Merge branch 'drm-next' of git://people.freedesktop.org/~airlied/linux

Pull drm updates from Dave Airlie:
 "Highlights:

   - AMD KFD driver merge

     This is the AMD HSA interface for exposing a lowlevel interface for
     GPGPU use. They have an open source userspace built on top of this
     interface, and the code looks as good as it was going to get out of
     tree.

   - Initial atomic modesetting work

     The need for an atomic modesetting interface to allow userspace to
     try and send a complete set of modesetting state to the driver has
     arisen, and been suffering from neglect this past year. No more,
     the start of the common code and changes for msm driver to use it
     are in this tree. Ongoing work to get the userspace ioctl finished
     and the code clean will probably wait until next kernel.

   - DisplayID 1.3 and tiled monitor exposed to userspace.

     Tiled monitor property is now exposed for userspace to make use of.

   - Rockchip drm driver merged.

   - imx gpu driver moved out of staging

  Other stuff:

   - core:
        panel - MIPI DSI + new panels.
        expose suggested x/y properties for virtual GPUs

   - i915:
        Initial Skylake (SKL) support
        gen3/4 reset work
        start of dri1/ums removal
        infoframe tracking
        fixes for lots of things.

   - nouveau:
        tegra k1 voltage support
        GM204 modesetting support
        GT21x memory reclocking work

   - radeon:
        CI dpm fixes
        GPUVM improvements
        Initial DPM fan control

   - rcar-du:
        HDMI support added
        removed some support for old boards
        slave encoder driver for Analog Devices adv7511

   - exynos:
        Exynos4415 SoC support

   - msm:
        a4xx gpu support
        atomic helper conversion

   - tegra:
        iommu support
        universal plane support
        ganged-mode DSI support

   - sti:
        HDMI i2c improvements

   - vmwgfx:
        some late fixes.

   - qxl:
        use suggested x/y properties"

* 'drm-next' of git://people.freedesktop.org/~airlied/linux: (969 commits)
  drm: sti: fix module compilation issue
  drm/i915: save/restore GMBUS freq across suspend/resume on gen4
  drm: sti: correctly cleanup CRTC and planes
  drm: sti: add HQVDP plane
  drm: sti: add cursor plane
  drm: sti: enable auxiliary CRTC
  drm: sti: fix delay in VTG programming
  drm: sti: prepare sti_tvout to support auxiliary crtc
  drm: sti: use drm_crtc_vblank_{on/off} instead of drm_vblank_{on/off}
  drm: sti: fix hdmi avi infoframe
  drm: sti: remove event lock while disabling vblank
  drm: sti: simplify gdp code
  drm: sti: clear all mixer control
  drm: sti: remove gpio for HDMI hot plug detection
  drm: sti: allow to change hdmi ddc i2c adapter
  drm/doc: Document drm_add_modes_noedid() usage
  drm/i915: Remove '& 0xffff' from the mask given to WA_REG()
  drm/i915: Invert the mask and val arguments in wa_add() and WA_REG()
  drm: Zero out DRM object memory upon cleanup
  drm/i915/bdw: Fix the write setting up the WIZ hashing mode
  ...
author: Linus Torvalds <torvalds@linux-foundation.org> 2014-12-15 15:52:01 -0800
committer: Linus Torvalds <torvalds@linux-foundation.org> 2014-12-15 15:52:01 -0800
commit: 988adfdffdd43cfd841df734664727993076d7cb
tree: 6794f7bba8f595500c2b7d33376ad6614adcfaf2
parent: x86: mm: consolidate VM_FAULT_RETRY handling
parent: drm: sti: fix module compilation issue
549 files changed, 53440 insertions, 14575 deletions
diff --git a/CREDITS b/CREDITS
index bb6278884f89..c56d8aa10131 100644
--- a/CREDITS
+++ b/CREDITS
@@ -1197,6 +1197,13 @@ S: R. Tocantins, 89 - Cristo Rei
 S: 80050-430 - Curitiba - Paraná
 S: Brazil
 
+N: Oded Gabbay
+E: oded.gabbay@gmail.com
+D: AMD KFD maintainer
+S: 12 Shraga Raphaeli
+S: Petah-Tikva, 4906418
+S: Israel
+
 N: Kumar Gala
 E: galak@kernel.crashing.org
 D: Embedded PowerPC 6xx/7xx/74xx/82xx/83xx/85xx support
diff --git a/Documentation/DocBook/drm.tmpl b/Documentation/DocBook/drm.tmpl
index be35bc328b77..4b592ffbafee 100644
--- a/Documentation/DocBook/drm.tmpl
+++ b/Documentation/DocBook/drm.tmpl
@@ -492,10 +492,10 @@ char *date;</synopsis>
     <sect2>
       <title>The Translation Table Manager (TTM)</title>
       <para>
-	TTM design background and information belongs here.
+        TTM design background and information belongs here.
       </para>
       <sect3>
-	<title>TTM initialization</title>
+        <title>TTM initialization</title>
         <warning><para>This section is outdated.</para></warning>
         <para>
           Drivers wishing to support TTM must fill out a drm_bo_driver
@@ -503,42 +503,42 @@ char *date;</synopsis>
           pointers for initializing the TTM, allocating and freeing memory,
           waiting for command completion and fence synchronization, and memory
           migration. See the radeon_ttm.c file for an example of usage.
-	</para>
-	<para>
-	  The ttm_global_reference structure is made up of several fields:
-	</para>
-	<programlisting>
-	  struct ttm_global_reference {
-	  	enum ttm_global_types global_type;
-	  	size_t size;
-	  	void *object;
-	  	int (*init) (struct ttm_global_reference *);
-	  	void (*release) (struct ttm_global_reference *);
-	  };
-	</programlisting>
-	<para>
-	  There should be one global reference structure for your memory
-	  manager as a whole, and there will be others for each object
-	  created by the memory manager at runtime.  Your global TTM should
-	  have a type of TTM_GLOBAL_TTM_MEM.  The size field for the global
-	  object should be sizeof(struct ttm_mem_global), and the init and
-	  release hooks should point at your driver-specific init and
-	  release routines, which probably eventually call
-	  ttm_mem_global_init and ttm_mem_global_release, respectively.
-	</para>
-	<para>
-	  Once your global TTM accounting structure is set up and initialized
-	  by calling ttm_global_item_ref() on it,
-	  you need to create a buffer object TTM to
-	  provide a pool for buffer object allocation by clients and the
-	  kernel itself.  The type of this object should be TTM_GLOBAL_TTM_BO,
-	  and its size should be sizeof(struct ttm_bo_global).  Again,
-	  driver-specific init and release functions may be provided,
-	  likely eventually calling ttm_bo_global_init() and
-	  ttm_bo_global_release(), respectively.  Also, like the previous
-	  object, ttm_global_item_ref() is used to create an initial reference
-	  count for the TTM, which will call your initialization function.
-	</para>
+        </para>
+        <para>
+          The ttm_global_reference structure is made up of several fields:
+        </para>
+        <programlisting>
+          struct ttm_global_reference {
+                  enum ttm_global_types global_type;
+                  size_t size;
+                  void *object;
+                  int (*init) (struct ttm_global_reference *);
+                  void (*release) (struct ttm_global_reference *);
+          };
+        </programlisting>
+        <para>
+          There should be one global reference structure for your memory
+          manager as a whole, and there will be others for each object
+          created by the memory manager at runtime.  Your global TTM should
+          have a type of TTM_GLOBAL_TTM_MEM.  The size field for the global
+          object should be sizeof(struct ttm_mem_global), and the init and
+          release hooks should point at your driver-specific init and
+          release routines, which probably eventually call
+          ttm_mem_global_init and ttm_mem_global_release, respectively.
+        </para>
+        <para>
+          Once your global TTM accounting structure is set up and initialized
+          by calling ttm_global_item_ref() on it,
+          you need to create a buffer object TTM to
+          provide a pool for buffer object allocation by clients and the
+          kernel itself.  The type of this object should be TTM_GLOBAL_TTM_BO,
+          and its size should be sizeof(struct ttm_bo_global).  Again,
+          driver-specific init and release functions may be provided,
+          likely eventually calling ttm_bo_global_init() and
+          ttm_bo_global_release(), respectively.  Also, like the previous
+          object, ttm_global_item_ref() is used to create an initial reference
+          count for the TTM, which will call your initialization function.
+        </para>
       </sect3>
     </sect2>
     <sect2 id="drm-gem">
@@ -566,19 +566,19 @@ char *date;</synopsis>
         using driver-specific ioctls.
       </para>
       <para>
-	On a fundamental level, GEM involves several operations:
-	<itemizedlist>
-	  <listitem>Memory allocation and freeing</listitem>
-	  <listitem>Command execution</listitem>
-	  <listitem>Aperture management at command execution time</listitem>
-	</itemizedlist>
-	Buffer object allocation is relatively straightforward and largely
+        On a fundamental level, GEM involves several operations:
+        <itemizedlist>
+          <listitem>Memory allocation and freeing</listitem>
+          <listitem>Command execution</listitem>
+          <listitem>Aperture management at command execution time</listitem>
+        </itemizedlist>
+        Buffer object allocation is relatively straightforward and largely
         provided by Linux's shmem layer, which provides memory to back each
         object.
       </para>
       <para>
         Device-specific operations, such as command execution, pinning, buffer
-	read &amp; write, mapping, and domain ownership transfers are left to
+        read &amp; write, mapping, and domain ownership transfers are left to
         driver-specific ioctls.
       </para>
       <sect3>
@@ -738,16 +738,16 @@ char *date;</synopsis>
           respectively. The conversion is handled by the DRM core without any
           driver-specific support.
         </para>
-	<para>
-	  GEM also supports buffer sharing with dma-buf file descriptors through
-	  PRIME. GEM-based drivers must use the provided helpers functions to
-	  implement the exporting and importing correctly. See <xref linkend="drm-prime-support" />.
-	  Since sharing file descriptors is inherently more secure than the
-	  easily guessable and global GEM names it is the preferred buffer
-	  sharing mechanism. Sharing buffers through GEM names is only supported
-	  for legacy userspace. Furthermore PRIME also allows cross-device
-	  buffer sharing since it is based on dma-bufs.
-	</para>
+        <para>
+          GEM also supports buffer sharing with dma-buf file descriptors through
+          PRIME. GEM-based drivers must use the provided helpers functions to
+          implement the exporting and importing correctly. See <xref linkend="drm-prime-support" />.
+          Since sharing file descriptors is inherently more secure than the
+          easily guessable and global GEM names it is the preferred buffer
+          sharing mechanism. Sharing buffers through GEM names is only supported
+          for legacy userspace. Furthermore PRIME also allows cross-device
+          buffer sharing since it is based on dma-bufs.
+        </para>
       </sect3>
       <sect3 id="drm-gem-objects-mapping">
         <title>GEM Objects Mapping</title>
@@ -852,7 +852,7 @@ char *date;</synopsis>
       <sect3>
         <title>Command Execution</title>
         <para>
-	  Perhaps the most important GEM function for GPU devices is providing a
+          Perhaps the most important GEM function for GPU devices is providing a
           command execution interface to clients. Client programs construct
           command buffers containing references to previously allocated memory
           objects, and then submit them to GEM. At that point, GEM takes care to
@@ -874,95 +874,101 @@ char *date;</synopsis>
         <title>GEM Function Reference</title>
 !Edrivers/gpu/drm/drm_gem.c
       </sect3>
-      </sect2>
-      <sect2>
-	<title>VMA Offset Manager</title>
+    </sect2>
+    <sect2>
+      <title>VMA Offset Manager</title>
 !Pdrivers/gpu/drm/drm_vma_manager.c vma offset manager
 !Edrivers/gpu/drm/drm_vma_manager.c
 !Iinclude/drm/drm_vma_manager.h
-      </sect2>
-      <sect2 id="drm-prime-support">
-	<title>PRIME Buffer Sharing</title>
-	<para>
-	  PRIME is the cross device buffer sharing framework in drm, originally
-	  created for the OPTIMUS range of multi-gpu platforms. To userspace
-	  PRIME buffers are dma-buf based file descriptors.
-	</para>
-	<sect3>
-	  <title>Overview and Driver Interface</title>
-	  <para>
-	    Similar to GEM global names, PRIME file descriptors are
-	    also used to share buffer objects across processes. They offer
-	    additional security: as file descriptors must be explicitly sent over
-	    UNIX domain sockets to be shared between applications, they can't be
-	    guessed like the globally unique GEM names.
-	  </para>
-	  <para>
-	    Drivers that support the PRIME
-	    API must set the DRIVER_PRIME bit in the struct
-	    <structname>drm_driver</structname>
-	    <structfield>driver_features</structfield> field, and implement the
-	    <methodname>prime_handle_to_fd</methodname> and
-	    <methodname>prime_fd_to_handle</methodname> operations.
-	  </para>
-	  <para>
-	    <synopsis>int (*prime_handle_to_fd)(struct drm_device *dev,
-			  struct drm_file *file_priv, uint32_t handle,
-			  uint32_t flags, int *prime_fd);
+    </sect2>
+    <sect2 id="drm-prime-support">
+      <title>PRIME Buffer Sharing</title>
+      <para>
+        PRIME is the cross device buffer sharing framework in drm, originally
+        created for the OPTIMUS range of multi-gpu platforms. To userspace
+        PRIME buffers are dma-buf based file descriptors.
+      </para>
+      <sect3>
+        <title>Overview and Driver Interface</title>
+        <para>
+          Similar to GEM global names, PRIME file descriptors are
+          also used to share buffer objects across processes. They offer
+          additional security: as file descriptors must be explicitly sent over
+          UNIX domain sockets to be shared between applications, they can't be
+          guessed like the globally unique GEM names.
+        </para>
+        <para>
+          Drivers that support the PRIME
+          API must set the DRIVER_PRIME bit in the struct
+          <structname>drm_driver</structname>
+          <structfield>driver_features</structfield> field, and implement the
+          <methodname>prime_handle_to_fd</methodname> and
+          <methodname>prime_fd_to_handle</methodname> operations.
+        </para>
+        <para>
+          <synopsis>int (*prime_handle_to_fd)(struct drm_device *dev,
+                          struct drm_file *file_priv, uint32_t handle,
+                          uint32_t flags, int *prime_fd);
 int (*prime_fd_to_handle)(struct drm_device *dev,
-			  struct drm_file *file_priv, int prime_fd,
-			  uint32_t *handle);</synopsis>
-	    Those two operations convert a handle to a PRIME file descriptor and
-	    vice versa. Drivers must use the kernel dma-buf buffer sharing framework
-	    to manage the PRIME file descriptors. Similar to the mode setting
-	    API PRIME is agnostic to the underlying buffer object manager, as
-	    long as handles are 32bit unsigned integers.
-	  </para>
-	  <para>
-	    While non-GEM drivers must implement the operations themselves, GEM
-	    drivers must use the <function>drm_gem_prime_handle_to_fd</function>
-	    and <function>drm_gem_prime_fd_to_handle</function> helper functions.
-	    Those helpers rely on the driver
-	    <methodname>gem_prime_export</methodname> and
-	    <methodname>gem_prime_import</methodname> operations to create a dma-buf
-	    instance from a GEM object (dma-buf exporter role) and to create a GEM
-	    object from a dma-buf instance (dma-buf importer role).
-	  </para>
-	  <para>
-	    <synopsis>struct dma_buf * (*gem_prime_export)(struct drm_device *dev,
-				     struct drm_gem_object *obj,
-				     int flags);
+                          struct drm_file *file_priv, int prime_fd,
+                          uint32_t *handle);</synopsis>
+            Those two operations convert a handle to a PRIME file descriptor and
+            vice versa. Drivers must use the kernel dma-buf buffer sharing framework
+            to manage the PRIME file descriptors. Similar to the mode setting
+            API PRIME is agnostic to the underlying buffer object manager, as
+            long as handles are 32bit unsigned integers.
+          </para>
+          <para>
+            While non-GEM drivers must implement the operations themselves, GEM
+            drivers must use the <function>drm_gem_prime_handle_to_fd</function>
+            and <function>drm_gem_prime_fd_to_handle</function> helper functions.
+            Those helpers rely on the driver
+            <methodname>gem_prime_export</methodname> and
+            <methodname>gem_prime_import</methodname> operations to create a dma-buf
+            instance from a GEM object (dma-buf exporter role) and to create a GEM
+            object from a dma-buf instance (dma-buf importer role).
+          </para>
+          <para>
+            <synopsis>struct dma_buf * (*gem_prime_export)(struct drm_device *dev,
+                             struct drm_gem_object *obj,
+                             int flags);
 struct drm_gem_object * (*gem_prime_import)(struct drm_device *dev,
-					    struct dma_buf *dma_buf);</synopsis>
-	    These two operations are mandatory for GEM drivers that support
-	    PRIME.
-	  </para>
-	</sect3>
-        <sect3>
-          <title>PRIME Helper Functions</title>
-!Pdrivers/gpu/drm/drm_prime.c PRIME Helpers
+                                            struct dma_buf *dma_buf);</synopsis>
+            These two operations are mandatory for GEM drivers that support
+            PRIME.
+          </para>
         </sect3>
-      </sect2>
-      <sect2>
-	<title>PRIME Function References</title>
+      <sect3>
+        <title>PRIME Helper Functions</title>
+!Pdrivers/gpu/drm/drm_prime.c PRIME Helpers
+      </sect3>
+    </sect2>
+    <sect2>
+      <title>PRIME Function References</title>
 !Edrivers/gpu/drm/drm_prime.c
-      </sect2>
-      <sect2>
-	<title>DRM MM Range Allocator</title>
-	<sect3>
-	  <title>Overview</title>
+    </sect2>
+    <sect2>
+      <title>DRM MM Range Allocator</title>
+      <sect3>
+        <title>Overview</title>
 !Pdrivers/gpu/drm/drm_mm.c Overview
-	</sect3>
-	<sect3>
-	  <title>LRU Scan/Eviction Support</title>
+      </sect3>
+      <sect3>
+        <title>LRU Scan/Eviction Support</title>
 !Pdrivers/gpu/drm/drm_mm.c lru scan roaster
-	</sect3>
+      </sect3>
       </sect2>
-      <sect2>
-	<title>DRM MM Range Allocator Function References</title>
+    <sect2>
+      <title>DRM MM Range Allocator Function References</title>
 !Edrivers/gpu/drm/drm_mm.c
 !Iinclude/drm/drm_mm.h
-      </sect2>
+    </sect2>
+    <sect2>
+      <title>CMA Helper Functions Reference</title>
+!Pdrivers/gpu/drm/drm_gem_cma_helper.c cma helpers
+!Edrivers/gpu/drm/drm_gem_cma_helper.c
+!Iinclude/drm/drm_gem_cma_helper.h
+    </sect2>
   </sect1>
 
   <!-- Internals: mode setting -->
@@ -996,6 +1002,10 @@ int max_width, max_height;</synopsis>
 !Edrivers/gpu/drm/drm_modes.c
     </sect2>
     <sect2>
+      <title>Atomic Mode Setting Function Reference</title>
+!Edrivers/gpu/drm/drm_atomic.c
+    </sect2>
+    <sect2>
       <title>Frame Buffer Creation</title>
       <synopsis>struct drm_framebuffer *(*fb_create)(struct drm_device *dev,
 				     struct drm_file *file_priv,
@@ -1827,6 +1837,10 @@ void intel_crt_init(struct drm_device *dev)
 !Edrivers/gpu/drm/drm_crtc.c
     </sect2>
     <sect2>
+      <title>KMS Data Structures</title>
+!Iinclude/drm/drm_crtc.h
+    </sect2>
+    <sect2>
       <title>KMS Locking</title>
 !Pdrivers/gpu/drm/drm_modeset_lock.c kms locking
 !Iinclude/drm/drm_modeset_lock.h
@@ -1933,10 +1947,16 @@ void intel_crt_init(struct drm_device *dev)
             and then retrieves a list of modes by calling the connector
             <methodname>get_modes</methodname> helper operation.
           </para>
+         <para>
+            If the helper operation returns no mode, and if the connector status
+            is connector_status_connected, standard VESA DMT modes up to
+            1024x768 are automatically added to the modes list by a call to
+            <function>drm_add_modes_noedid</function>.
+          </para>
           <para>
-            The function filters out modes larger than
+            The function then filters out modes larger than
             <parameter>max_width</parameter> and <parameter>max_height</parameter>
-            if specified. It then calls the optional connector
+            if specified. It finally calls the optional connector
             <methodname>mode_valid</methodname> helper operation for each mode in
             the probed list to check whether the mode is valid for the connector.
           </para>
@@ -2076,12 +2096,20 @@ void intel_crt_init(struct drm_device *dev)
           <synopsis>int (*get_modes)(struct drm_connector *connector);</synopsis>
           <para>
             Fill the connector's <structfield>probed_modes</structfield> list
-            by parsing EDID data with <function>drm_add_edid_modes</function> or
-            calling <function>drm_mode_probed_add</function> directly for every
+            by parsing EDID data with <function>drm_add_edid_modes</function>,
+            adding standard VESA DMT modes with <function>drm_add_modes_noedid</function>,
+            or calling <function>drm_mode_probed_add</function> directly for every
             supported mode and return the number of modes it has detected. This
             operation is mandatory.
           </para>
           <para>
+            Note that the caller function will automatically add standard VESA
+            DMT modes up to 1024x768 if the <methodname>get_modes</methodname>
+            helper operation returns no mode and if the connector status is
+            connector_status_connected. There is no need to call
+            <function>drm_add_edid_modes</function> manually in that case.
+          </para>
+          <para>
             When adding modes manually the driver creates each mode with a call to
             <function>drm_mode_create</function> and must fill the following fields.
             <itemizedlist>
@@ -2278,7 +2306,7 @@ void intel_crt_init(struct drm_device *dev)
             <function>drm_helper_probe_single_connector_modes</function>.
           </para>
           <para>
-            When parsing EDID data, <function>drm_add_edid_modes</function> fill the
+            When parsing EDID data, <function>drm_add_edid_modes</function> fills the
             connector <structfield>display_info</structfield>
             <structfield>width_mm</structfield> and
             <structfield>height_mm</structfield> fields. When creating modes
@@ -2316,8 +2344,26 @@ void intel_crt_init(struct drm_device *dev)
       </itemizedlist>
     </sect2>
     <sect2>
+      <title>Atomic Modeset Helper Functions Reference</title>
+      <sect3>
+	<title>Overview</title>
+!Pdrivers/gpu/drm/drm_atomic_helper.c overview
+      </sect3>
+      <sect3>
+	<title>Implementing Asynchronous Atomic Commit</title>
+!Pdrivers/gpu/drm/drm_atomic_helper.c implementing async commit
+      </sect3>
+      <sect3>
+	<title>Atomic State Reset and Initialization</title>
+!Pdrivers/gpu/drm/drm_atomic_helper.c atomic state reset and initialization
+      </sect3>
+!Iinclude/drm/drm_atomic_helper.h
+!Edrivers/gpu/drm/drm_atomic_helper.c
+    </sect2>
+    <sect2>
       <title>Modeset Helper Functions Reference</title>
 !Edrivers/gpu/drm/drm_crtc_helper.c
+!Pdrivers/gpu/drm/drm_crtc_helper.c overview
     </sect2>
     <sect2>
       <title>Output Probing Helper Functions Reference</title>
@@ -2343,6 +2389,12 @@ void intel_crt_init(struct drm_device *dev)
 !Edrivers/gpu/drm/drm_dp_mst_topology.c
     </sect2>
     <sect2>
+      <title>MIPI DSI Helper Functions Reference</title>
+!Pdrivers/gpu/drm/drm_mipi_dsi.c dsi helpers
+!Iinclude/drm/drm_mipi_dsi.h
+!Edrivers/gpu/drm/drm_mipi_dsi.c
+    </sect2>
+    <sect2>
       <title>EDID Helper Functions Reference</title>
 !Edrivers/gpu/drm/drm_edid.c
     </sect2>
@@ -2371,7 +2423,12 @@ void intel_crt_init(struct drm_device *dev)
     </sect2>
     <sect2>
       <title id="drm-kms-planehelpers">Plane Helper Reference</title>
-!Edrivers/gpu/drm/drm_plane_helper.c Plane Helpers
+!Edrivers/gpu/drm/drm_plane_helper.c
+!Pdrivers/gpu/drm/drm_plane_helper.c overview
+    </sect2>
+    <sect2>
+	  <title>Tile group</title>
+!Pdrivers/gpu/drm/drm_crtc.c Tile group
     </sect2>
   </sect1>
 
@@ -2507,8 +2564,8 @@ void intel_crt_init(struct drm_device *dev)
 	<td valign="top" >Description/Restrictions</td>
 	</tr>
 	<tr>
-	<td rowspan="21" valign="top" >DRM</td>
-	<td rowspan="2" valign="top" >Generic</td>
+	<td rowspan="25" valign="top" >DRM</td>
+	<td rowspan="4" valign="top" >Generic</td>
 	<td valign="top" >“EDID”</td>
 	<td valign="top" >BLOB | IMMUTABLE</td>
 	<td valign="top" >0</td>
@@ -2523,6 +2580,20 @@ void intel_crt_init(struct drm_device *dev)
 	<td valign="top" >Contains DPMS operation mode value.</td>
 	</tr>
 	<tr>
+	<td valign="top" >“PATH”</td>
+	<td valign="top" >BLOB | IMMUTABLE</td>
+	<td valign="top" >0</td>
+	<td valign="top" >Connector</td>
+	<td valign="top" >Contains topology path to a connector.</td>
+	</tr>
+	<tr>
+	<td valign="top" >“TILE”</td>
+	<td valign="top" >BLOB | IMMUTABLE</td>
+	<td valign="top" >0</td>
+	<td valign="top" >Connector</td>
+	<td valign="top" >Contains tiling information for a connector.</td>
+	</tr>
+	<tr>
 	<td rowspan="1" valign="top" >Plane</td>
 	<td valign="top" >“type”</td>
 	<td valign="top" >ENUM | IMMUTABLE</td>
@@ -2638,6 +2709,21 @@ void intel_crt_init(struct drm_device *dev)
 	<td valign="top" >TBD</td>
 	</tr>
 	<tr>
+	<td rowspan="2" valign="top" >Virtual GPU</td>
+	<td valign="top" >“suggested X”</td>
+	<td valign="top" >RANGE</td>
+	<td valign="top" >Min=0, Max=0xffffffff</td>
+	<td valign="top" >Connector</td>
+	<td valign="top" >property to suggest an X offset for a connector</td>
+	</tr>
+	<tr>
+	<td valign="top" >“suggested Y”</td>
+	<td valign="top" >RANGE</td>
+	<td valign="top" >Min=0, Max=0xffffffff</td>
+	<td valign="top" >Connector</td>
+	<td valign="top" >property to suggest an Y offset for a connector</td>
+	</tr>
+	<tr>
 	<td rowspan="3" valign="top" >Optional</td>
 	<td valign="top" >“scaling mode”</td>
 	<td valign="top" >ENUM</td>
@@ -3788,6 +3874,26 @@ int num_ioctls;</synopsis>
       those have basic support through the gma500 drm driver.
     </para>
     <sect1>
+      <title>Core Driver Infrastructure</title>
+      <para>
+	This section covers core driver infrastructure used by both the display
+	and the GEM parts of the driver.
+      </para>
+      <sect2>
+        <title>Runtime Power Management</title>
+!Pdrivers/gpu/drm/i915/intel_runtime_pm.c runtime pm
+!Idrivers/gpu/drm/i915/intel_runtime_pm.c
+      </sect2>
+      <sect2>
+        <title>Interrupt Handling</title>
+!Pdrivers/gpu/drm/i915/i915_irq.c interrupt handling
+!Fdrivers/gpu/drm/i915/i915_irq.c intel_irq_init intel_irq_init_hw intel_hpd_init
+!Fdrivers/gpu/drm/i915/i915_irq.c intel_irq_fini
+!Fdrivers/gpu/drm/i915/i915_irq.c intel_runtime_pm_disable_interrupts
+!Fdrivers/gpu/drm/i915/i915_irq.c intel_runtime_pm_enable_interrupts
+      </sect2>
+    </sect1>
+    <sect1>
       <title>Display Hardware Handling</title>
       <para>
         This section covers everything related to the display hardware including
@@ -3804,6 +3910,18 @@ int num_ioctls;</synopsis>
         </para>
       </sect2>
       <sect2>
+        <title>Frontbuffer Tracking</title>
+!Pdrivers/gpu/drm/i915/intel_frontbuffer.c frontbuffer tracking
+!Idrivers/gpu/drm/i915/intel_frontbuffer.c
+!Fdrivers/gpu/drm/i915/intel_drv.h intel_frontbuffer_flip
+!Fdrivers/gpu/drm/i915/i915_gem.c i915_gem_track_fb
+      </sect2>
+      <sect2>
+        <title>Display FIFO Underrun Reporting</title>
+!Pdrivers/gpu/drm/i915/intel_fifo_underrun.c fifo underrun handling
+!Idrivers/gpu/drm/i915/intel_fifo_underrun.c
+      </sect2>
+      <sect2>
         <title>Plane Configuration</title>
         <para>
 	  This section covers plane configuration and composition with the
@@ -3823,6 +3941,16 @@ int num_ioctls;</synopsis>
         </para>
       </sect2>
       <sect2>
+	<title>High Definition Audio</title>
+!Pdrivers/gpu/drm/i915/intel_audio.c High Definition Audio over HDMI and Display Port
+!Idrivers/gpu/drm/i915/intel_audio.c
+      </sect2>
+      <sect2>
+	<title>Panel Self Refresh PSR (PSR/SRD)</title>
+!Pdrivers/gpu/drm/i915/intel_psr.c Panel Self Refresh (PSR/SRD)
+!Idrivers/gpu/drm/i915/intel_psr.c
+      </sect2>
+      <sect2>
         <title>DPIO</title>
 !Pdrivers/gpu/drm/i915/i915_reg.h DPIO
 	<table id="dpiox2">
@@ -3931,6 +4059,28 @@ int num_ioctls;</synopsis>
 !Idrivers/gpu/drm/i915/intel_lrc.c
       </sect2>
     </sect1>
+
+    <sect1>
+      <title> Tracing </title>
+      <para>
+    This sections covers all things related to the tracepoints implemented in
+    the i915 driver.
+      </para>
+      <sect2>
+        <title> i915_ppgtt_create and i915_ppgtt_release </title>
+!Pdrivers/gpu/drm/i915/i915_trace.h i915_ppgtt_create and i915_ppgtt_release tracepoints
+      </sect2>
+      <sect2>
+        <title> i915_context_create and i915_context_free </title>
+!Pdrivers/gpu/drm/i915/i915_trace.h i915_context_create and i915_context_free tracepoints
+      </sect2>
+      <sect2>
+        <title> switch_mm </title>
+!Pdrivers/gpu/drm/i915/i915_trace.h switch_mm tracepoint
+      </sect2>
+    </sect1>
+
   </chapter>
+!Cdrivers/gpu/drm/i915/i915_irq.c
 </part>
 </book>
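
Note on the drm.tmpl hunks above: the new paragraphs describe a probing order in which the connector's get_modes helper runs first, standard VESA DMT modes up to 1024x768 are added only if it returned nothing and the connector is connected, and the size filter runs afterwards. A minimal userspace sketch of that order follows. Every type and function name in it (sketch_connector, sketch_probe and so on) is hypothetical and is not the kernel API; the DMT table is a small illustrative subset; treating a limit of 0 as "not specified" is an assumption; and the optional mode_valid step is left out.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for illustration only; not the kernel's types. */
enum sketch_status { SKETCH_CONNECTED, SKETCH_DISCONNECTED };

struct sketch_mode { int w, h; };

#define SKETCH_MAX_MODES 16

struct sketch_connector {
	enum sketch_status status;
	const struct sketch_mode *reported;	/* what get_modes would report */
	size_t num_reported;
	struct sketch_mode modes[SKETCH_MAX_MODES];	/* resulting mode list */
	size_t num_modes;
};

/* Illustrative subset of VESA DMT resolutions (assumption, not the full table). */
static const struct sketch_mode sketch_dmt[] = {
	{ 640, 480 }, { 800, 600 }, { 1024, 768 }, { 1280, 1024 },
};

static void sketch_add(struct sketch_connector *c, struct sketch_mode m)
{
	if (c->num_modes < SKETCH_MAX_MODES)
		c->modes[c->num_modes++] = m;
}

/* Returns the number of modes left on the connector after probing. */
size_t sketch_probe(struct sketch_connector *c, int max_w, int max_h)
{
	size_t i, kept = 0;

	assert(c != NULL);
	c->num_modes = 0;

	/* Step 1: take whatever the (simulated) get_modes helper reported. */
	for (i = 0; i < c->num_reported; i++)
		sketch_add(c, c->reported[i]);

	/* Step 2: no modes but connected, so fall back to DMT up to 1024x768. */
	if (c->num_modes == 0 && c->status == SKETCH_CONNECTED) {
		for (i = 0; i < sizeof(sketch_dmt) / sizeof(sketch_dmt[0]); i++)
			if (sketch_dmt[i].w <= 1024 && sketch_dmt[i].h <= 768)
				sketch_add(c, sketch_dmt[i]);
	}

	/* Step 3: drop modes larger than the limits; 0 means no limit here. */
	for (i = 0; i < c->num_modes; i++) {
		struct sketch_mode m = c->modes[i];

		if ((max_w > 0 && m.w > max_w) || (max_h > 0 && m.h > max_h))
			continue;
		c->modes[kept++] = m;
	}
	c->num_modes = kept;
	return kept;
}
```

Calling sketch_probe() on a connected connector with an empty reported list exercises the fallback path; a disconnected connector with no reported modes ends up with an empty list.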
diff --git a/Documentation/devicetree/bindings/staging/imx-drm/fsl-imx-drm.txt b/Documentation/devicetree/bindings/drm/imx/fsl-imx-drm.txt
index e75f0e549fff..e75f0e549fff 100644
--- a/Documentation/devicetree/bindings/staging/imx-drm/fsl-imx-drm.txt
+++ b/Documentation/devicetree/bindings/drm/imx/fsl-imx-drm.txt
diff --git a/Documentation/devicetree/bindings/staging/imx-drm/hdmi.txt b/Documentation/devicetree/bindings/drm/imx/hdmi.txt
index 1b756cf9afb0..1b756cf9afb0 100644
--- a/Documentation/devicetree/bindings/staging/imx-drm/hdmi.txt
+++ b/Documentation/devicetree/bindings/drm/imx/hdmi.txt
diff --git a/Documentation/devicetree/bindings/staging/imx-drm/ldb.txt b/Documentation/devicetree/bindings/drm/imx/ldb.txt
index 443bcb6134d5..443bcb6134d5 100644
--- a/Documentation/devicetree/bindings/staging/imx-drm/ldb.txt
+++ b/Documentation/devicetree/bindings/drm/imx/ldb.txt
diff --git a/Documentation/devicetree/bindings/gpu/nvidia,tegra20-host1x.txt b/Documentation/devicetree/bindings/gpu/nvidia,tegra20-host1x.txt
index b48f4ef31d93..4c32ef0b7db8 100644
--- a/Documentation/devicetree/bindings/gpu/nvidia,tegra20-host1x.txt
+++ b/Documentation/devicetree/bindings/gpu/nvidia,tegra20-host1x.txt
@@ -191,6 +191,8 @@ of the following host1x client modules:
   - nvidia,hpd-gpio: specifies a GPIO used for hotplug detection
   - nvidia,edid: supplies a binary EDID blob
   - nvidia,panel: phandle of a display panel
+  - nvidia,ganged-mode: contains a phandle to a second DSI controller to gang
+    up with in order to support up to 8 data lanes
 
 - sor: serial output resource
 
diff --git a/Documentation/devicetree/bindings/gpu/st,stih4xx.txt b/Documentation/devicetree/bindings/gpu/st,stih4xx.txt
index 2d150c311a05..c99eb34e640b 100644
--- a/Documentation/devicetree/bindings/gpu/st,stih4xx.txt
+++ b/Documentation/devicetree/bindings/gpu/st,stih4xx.txt
@@ -68,7 +68,7 @@ STMicroelectronics stih4xx platforms
     number of clocks may depend of the SoC type.
   - clock-names: names of the clocks listed in clocks property in the same
     order.
-  - hdmi,hpd-gpio: gpio id to detect if an hdmi cable is plugged or not.
+  - ddc: phandle of an I2C controller used for DDC EDID probing
 
 sti-hda:
   Required properties:
@@ -83,6 +83,22 @@ sti-hda:
   - clock-names: names of the clocks listed in clocks property in the same
     order.
 
+sti-hqvdp:
+  must be a child of sti-display-subsystem
+  Required properties:
+  - compatible: "st,stih<chip>-hqvdp"
+  - reg: Physical base address of the IP registers and length of memory mapped region.
+  - clocks: from common clock binding: handle hardware IP needed clocks, the
+    number of clocks may depend on the SoC type.
+    See ../clocks/clock-bindings.txt for details.
+  - clock-names: names of the clocks listed in clocks property in the same
+    order.
+  - resets: resets to be used by the device
+    See ../reset/reset.txt for details.
+  - reset-names: names of the resets listed in resets property in the same
+    order.
+  - st,vtg: phandle on vtg main device node.
+
 Example:
 
 / {
@@ -173,7 +189,6 @@ Example:
 				interrupt-names	= "irq";
 				clock-names	= "pix", "tmds", "phy", "audio";
 				clocks          = <&clockgen_c_vcc CLK_S_PIX_HDMI>, <&clockgen_c_vcc CLK_S_TMDS_HDMI>, <&clockgen_c_vcc CLK_S_HDMI_REJECT_PLL>, <&clockgen_b1 CLK_S_PCM_0>;
-				hdmi,hpd-gpio	= <&PIO2 5>;
 			};
 
 			sti-hda@fe85a000 {
@@ -184,6 +199,16 @@ Example:
 				clocks          = <&clockgen_c_vcc CLK_S_PIX_HD>, <&clockgen_c_vcc CLK_S_HDDAC>;
 			};
 		};
+
+		sti-hqvdp@9c000000 {
+				compatible	= "st,stih407-hqvdp";
+				reg		= <0x9C00000 0x100000>;
+				clock-names	= "hqvdp", "pix_main";
+				clocks		= <&clk_s_c0_flexgen CLK_MAIN_DISP>, <&clk_s_d2_flexgen CLK_PIX_MAIN_DISP>;
+				reset-names     = "hqvdp";
+				resets          = <&softreset STIH407_HDQVDP_SOFTRESET>;
+				st,vtg		= <&vtg_main>;
+			};
 	};
 	...
 };
diff --git a/Documentation/devicetree/bindings/panel/auo,b116xw03.txt b/Documentation/devicetree/bindings/panel/auo,b116xw03.txt
new file mode 100644
index 000000000000..690d0a568ef3
--- /dev/null
+++ b/Documentation/devicetree/bindings/panel/auo,b116xw03.txt
@@ -0,0 +1,7 @@
+AU Optronics Corporation 11.6" HD (1366x768) color TFT-LCD panel
+
+Required properties:
+- compatible: should be "auo,b116xw03"
+
+This binding is compatible with the simple-panel binding, which is specified
+in simple-panel.txt in this directory.
diff --git a/Documentation/devicetree/bindings/panel/hannstar,hsd070pww1.txt b/Documentation/devicetree/bindings/panel/hannstar,hsd070pww1.txt
new file mode 100644
index 000000000000..7da1d5c038ff
--- /dev/null
+++ b/Documentation/devicetree/bindings/panel/hannstar,hsd070pww1.txt
@@ -0,0 +1,7 @@
+HannStar Display Corp. HSD070PWW1 7.0" WXGA TFT LCD panel
+
+Required properties:
+- compatible: should be "hannstar,hsd070pww1"
+
+This binding is compatible with the simple-panel binding, which is specified
+in simple-panel.txt in this directory.
diff --git a/Documentation/devicetree/bindings/panel/hit,tx23d38vm0caa.txt b/Documentation/devicetree/bindings/panel/hit,tx23d38vm0caa.txt
new file mode 100644
index 000000000000..04caaae19af6
--- /dev/null
+++ b/Documentation/devicetree/bindings/panel/hit,tx23d38vm0caa.txt
@@ -0,0 +1,7 @@
+Hitachi Ltd. Corporation 9" WVGA (800x480) TFT LCD panel
+
+Required properties:
+- compatible: should be "hit,tx23d38vm0caa"
+
+This binding is compatible with the simple-panel binding, which is specified
+in simple-panel.txt in this directory.
diff --git a/Documentation/devicetree/bindings/panel/innolux,g121i1-l01.txt b/Documentation/devicetree/bindings/panel/innolux,g121i1-l01.txt
new file mode 100644
index 000000000000..2743b07cd2f2
--- /dev/null
+++ b/Documentation/devicetree/bindings/panel/innolux,g121i1-l01.txt
@@ -0,0 +1,7 @@
+Innolux Corporation 12.1" WXGA (1280x800) TFT LCD panel
+
+Required properties:
+- compatible: should be "innolux,g121i1-l01"
+
+This binding is compatible with the simple-panel binding, which is specified
+in simple-panel.txt in this directory.
diff --git a/Documentation/devicetree/bindings/panel/sharp,lq101r1sx01.txt b/Documentation/devicetree/bindings/panel/sharp,lq101r1sx01.txt
new file mode 100644
index 000000000000..f522bb8e47e1
--- /dev/null
+++ b/Documentation/devicetree/bindings/panel/sharp,lq101r1sx01.txt
@@ -0,0 +1,49 @@
+Sharp Microelectronics 10.1" WQXGA TFT LCD panel
+
+This panel requires a dual-channel DSI host to operate. It supports two modes:
+- left-right: each channel drives the left or right half of the screen
+- even-odd: each channel drives the even or odd lines of the screen
+
+Each of the DSI channels controls a separate DSI peripheral. The peripheral
+driven by the first link (DSI-LINK1), left or even, is considered the primary
+peripheral and controls the device. The 'link2' property contains a phandle
+to the peripheral driven by the second link (DSI-LINK2, right or odd).
+
+Note that in video mode the DSI-LINK1 interface always provides the left/even
+pixels and DSI-LINK2 always provides the right/odd pixels. In command mode it
+is possible to program either link to drive the left/even or right/odd pixels
+but for the sake of consistency this binding assumes that the same assignment
+is chosen as for video mode.
+
+Required properties:
+- compatible: should be "sharp,lq101r1sx01"
+- reg: DSI virtual channel of the peripheral
+
+Required properties (for DSI-LINK1 only):
+- link2: phandle to the DSI peripheral on the secondary link. Note that the
+  presence of this property marks the containing node as DSI-LINK1.
+- power-supply: phandle of the regulator that provides the supply voltage
+
+Optional properties (for DSI-LINK1 only):
+- backlight: phandle of the backlight device attached to the panel
+
+Example:
+
+	dsi@54300000 {
+		panel: panel@0 {
+			compatible = "sharp,lq101r1sx01";
+			reg = <0>;
+
+			link2 = <&secondary>;
+
+			power-supply = <...>;
+			backlight = <...>;
+		};
+	};
+
+	dsi@54400000 {
+		secondary: panel@0 {
+			compatible = "sharp,lq101r1sx01";
+			reg = <0>;
+		};
+	};
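
The left-right and even-odd modes described in the Sharp binding above can be sketched as a pixel-to-link mapping. This is only an illustration under stated assumptions: the enum and function names are hypothetical and not taken from any driver, the left/right split is assumed to fall at exactly half the panel width, and lines are assumed to be numbered from 0 so that line 0 counts as even.

```c
#include <assert.h>

/* Hypothetical names, for illustration only; not taken from any driver. */
enum dsi_split_mode {
	DSI_SPLIT_LEFT_RIGHT,	/* each link drives one half of the screen */
	DSI_SPLIT_EVEN_ODD,	/* each link drives every other line */
};

/*
 * Return 1 if the pixel at (x, y) is carried by DSI-LINK1 (left or even),
 * or 2 if it is carried by DSI-LINK2 (right or odd).
 */
int dsi_link_for_pixel(enum dsi_split_mode mode, unsigned int x,
		       unsigned int y, unsigned int width)
{
	assert(x < width);

	if (mode == DSI_SPLIT_LEFT_RIGHT)
		return x < width / 2 ? 1 : 2;

	/* Even-odd mode: the assignment depends on the line only. */
	return y % 2 == 0 ? 1 : 2;
}
```

For a 2560 pixel wide panel in left-right mode, columns 0 through 1279 would belong to DSI-LINK1 and the rest to DSI-LINK2, which matches the binding's rule that the node carrying 'link2' is the left or even peripheral.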
diff --git a/Documentation/devicetree/bindings/vendor-prefixes.txt b/Documentation/devicetree/bindings/vendor-prefixes.txt
index cc6151c431c8..423d47418e72 100644
--- a/Documentation/devicetree/bindings/vendor-prefixes.txt
+++ b/Documentation/devicetree/bindings/vendor-prefixes.txt
@@ -66,8 +66,10 @@ gmt	Global Mixed-mode Technology, Inc.
 google	Google, Inc.
 gumstix	Gumstix, Inc.
 gw	Gateworks Corporation
+hannstar	HannStar Display Corporation
 haoyu	Haoyu Microelectronic Co. Ltd.
 hisilicon	Hisilicon Limited.
+hit	Hitachi Ltd.
 honeywell	Honeywell
 hp	Hewlett Packard
 i2se	I2SE GmbH
diff --git a/Documentation/devicetree/bindings/video/adi,adv7511.txt b/Documentation/devicetree/bindings/video/adi,adv7511.txt
new file mode 100644
index 000000000000..96c25ee01501
--- /dev/null
+++ b/Documentation/devicetree/bindings/video/adi,adv7511.txt
@@ -0,0 +1,88 @@
+Analog Devices ADV7511(W)/13 HDMI Encoders
+------------------------------------------
+
+The ADV7511, ADV7511W and ADV7513 are HDMI audio and video transmitters
+compatible with HDMI 1.4 and DVI 1.0. They support color space conversion,
+S/PDIF, CEC and HDCP.
+
+Required properties:
+
+- compatible: Should be one of "adi,adv7511", "adi,adv7511w" or "adi,adv7513"
+- reg: I2C slave address
+
+The ADV7511 supports a large number of input data formats that differ by their
+color depth, color format, clock mode, bit justification and random
+arrangement of components on the data bus. The combination of the following
+properties describes the input and maps directly to the video input tables of
+the ADV7511 datasheet that document all the supported combinations.
+
+- adi,input-depth: Number of bits per color component at the input (8, 10 or
+  12).
+- adi,input-colorspace: The input color space, one of "rgb", "yuv422" or
+  "yuv444".
+- adi,input-clock: The input clock type, one of "1x" (one clock cycle per
+  pixel), "2x" (two clock cycles per pixel), "ddr" (one clock cycle per pixel,
+  data driven on both edges).
+
+The following input format properties are required except in "rgb 1x" and
+"yuv444 1x" modes, in which case they must not be specified.
+
+- adi,input-style: The input components arrangement variant (1, 2 or 3), as
+  listed in the input format tables in the datasheet.
+- adi,input-justification: The input bit justification ("left", "evenly",
+  "right").
+
+Optional properties:
+
+- interrupts: Specifier for the ADV7511 interrupt
+- pd-gpios: Specifier for the GPIO connected to the power down signal
+
+- adi,clock-delay: Video data clock delay relative to the pixel clock, in ps
+  (-1200 ps .. 1600 ps). Defaults to no delay.
+- adi,embedded-sync: The input uses synchronization signals embedded in the
+  data stream (similar to BT.656). Defaults to separate H/V synchronization
+  signals.
+
+Required nodes:
+
+The ADV7511 has two video ports. Their connections are modelled using the OF
+graph bindings specified in Documentation/devicetree/bindings/graph.txt.
+
+- Video port 0 for the RGB or YUV input
+- Video port 1 for the HDMI output
+
+
+Example
+-------
+
+	adv7511w: hdmi@39 {
+		compatible = "adi,adv7511w";
+		reg = <0x39>;
+		interrupt-parent = <&gpio3>;
+		interrupts = <29 IRQ_TYPE_EDGE_FALLING>;
+
+		adi,input-depth = <8>;
+		adi,input-colorspace = "rgb";
+		adi,input-clock = "1x";
+		adi,input-style = <1>;
+		adi,input-justification = "evenly";
+
+		ports {
+			#address-cells = <1>;
+			#size-cells = <0>;
+
+			port@0 {
+				reg = <0>;
+				adv7511w_in: endpoint {
+					remote-endpoint = <&dpi_out>;
+				};
+			};
+
+			port@1 {
+				reg = <1>;
+				adv7511_out: endpoint {
+					remote-endpoint = <&hdmi_connector_in>;
+				};
+			};
+		};
+	};
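
The ADV7511 binding above states that adi,input-style and adi,input-justification are required except in "rgb 1x" and "yuv444 1x" modes. A minimal sketch of that rule as a check follows. The function name is hypothetical and this is not the driver's parsing code; it only encodes the sentence from the binding text, using the string values the binding lists for adi,input-colorspace and adi,input-clock.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/*
 * Hypothetical helper, for illustration only. Returns true when the
 * binding requires adi,input-style and adi,input-justification, and
 * false when it says they must not be specified.
 */
bool adv7511_needs_style_and_justification(const char *colorspace,
					   const char *clock)
{
	assert(colorspace && clock);

	/* Only the "1x" clock mode has exceptions. */
	if (strcmp(clock, "1x") != 0)
		return true;

	/* "rgb 1x" and "yuv444 1x" take neither property. */
	if (strcmp(colorspace, "rgb") == 0 ||
	    strcmp(colorspace, "yuv444") == 0)
		return false;

	return true;
}
```

A device tree validator could call such a check after reading the two string properties and then insist on, or reject, the style and justification properties accordingly.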
diff --git a/Documentation/devicetree/bindings/video/exynos_dsim.txt b/Documentation/devicetree/bindings/video/exynos_dsim.txt
index e74243b4b317..ca2b4aacd9af 100644
--- a/Documentation/devicetree/bindings/video/exynos_dsim.txt
+++ b/Documentation/devicetree/bindings/video/exynos_dsim.txt
@@ -4,6 +4,7 @@ Required properties:
   - compatible: value should be one of the following
 		"samsung,exynos3250-mipi-dsi" /* for Exynos3250/3472 SoCs */
 		"samsung,exynos4210-mipi-dsi" /* for Exynos4 SoCs */
+		"samsung,exynos4415-mipi-dsi" /* for Exynos4415 SoC */
 		"samsung,exynos5410-mipi-dsi" /* for Exynos5410/5420/5440 SoCs */
   - reg: physical base address and length of the registers set for the device
   - interrupts: should contain DSI interrupt
diff --git a/Documentation/devicetree/bindings/video/rockchip-drm.txt b/Documentation/devicetree/bindings/video/rockchip-drm.txt
new file mode 100644
index 000000000000..7fff582495a2
--- /dev/null
+++ b/Documentation/devicetree/bindings/video/rockchip-drm.txt
@@ -0,0 +1,19 @@
+Rockchip DRM master device
+================================
+
+The Rockchip DRM master device is a virtual device needed to list all
+vop devices or other display interface nodes that comprise the
+graphics subsystem.
+
+Required properties:
+- compatible: Should be "rockchip,display-subsystem"
+- ports: Should contain a list of phandles pointing to display interface port
+  of vop devices. vop definitions as defined in
+  Documentation/devicetree/bindings/video/rockchip-vop.txt
+
+example:
+
+display-subsystem {
+	compatible = "rockchip,display-subsystem";
+	ports = <&vopl_out>, <&vopb_out>;
+};
diff --git a/Documentation/devicetree/bindings/video/rockchip-vop.txt b/Documentation/devicetree/bindings/video/rockchip-vop.txt
new file mode 100644
index 000000000000..d15351f2313d
--- /dev/null
+++ b/Documentation/devicetree/bindings/video/rockchip-vop.txt
@@ -0,0 +1,58 @@
+Device tree bindings for the Rockchip SoC display controller (VOP)
+
+VOP (Visual Output Processor) is the Display Controller for the Rockchip
+series of SoCs which transfers the image data from a video memory
+buffer to an external LCD interface.
+
+Required properties:
+- compatible: value should be one of the following
+		"rockchip,rk3288-vop";
+
+- interrupts: should contain a list of all VOP IP block interrupts in the
+		 order: VSYNC, LCD_SYSTEM. The interrupt specifier
+		 format depends on the interrupt controller used.
+
+- clocks: must include clock specifiers corresponding to entries in the
+		clock-names property.
+
+- clock-names: Must contain
+		aclk_vop: for ddr buffer transfer.
+		hclk_vop: for ahb bus to R/W the phy regs.
+		dclk_vop: pixel clock.
+
+- resets: Must contain an entry for each entry in reset-names.
+  See ../reset/reset.txt for details.
+- reset-names: Must include the following entries:
+  - axi
+  - ahb
+  - dclk
+
+- iommus: must reference an IOMMU node
+
+- port: A port node with endpoint definitions as defined in
+  Documentation/devicetree/bindings/media/video-interfaces.txt.
+
+Example:
+SoC specific DT entry:
+	vopb: vopb@ff930000 {
+		compatible = "rockchip,rk3288-vop";
+		reg = <0xff930000 0x19c>;
+		interrupts = <GIC_SPI 15 IRQ_TYPE_LEVEL_HIGH>;
+		clocks = <&cru ACLK_VOP0>, <&cru DCLK_VOP0>, <&cru HCLK_VOP0>;
+		clock-names = "aclk_vop", "dclk_vop", "hclk_vop";
+		resets = <&cru SRST_LCDC1_AXI>, <&cru SRST_LCDC1_AHB>, <&cru SRST_LCDC1_DCLK>;
+		reset-names = "axi", "ahb", "dclk";
+		iommus = <&vopb_mmu>;
+		vopb_out: port {
+			#address-cells = <1>;
+			#size-cells = <0>;
+			vopb_out_edp: endpoint@0 {
+				reg = <0>;
+				remote-endpoint=<&edp_in_vopb>;
+			};
+			vopb_out_hdmi: endpoint@1 {
+				reg = <1>;
+				remote-endpoint=<&hdmi_in_vopb>;
+			};
+		};
+	};
diff --git a/Documentation/devicetree/bindings/video/samsung-fimd.txt b/Documentation/devicetree/bindings/video/samsung-fimd.txt
index 4e6c77c85546..cf1af6371021 100644
--- a/Documentation/devicetree/bindings/video/samsung-fimd.txt
+++ b/Documentation/devicetree/bindings/video/samsung-fimd.txt
@@ -11,6 +11,7 @@ Required properties:
 		"samsung,s5pv210-fimd"; /* for S5PV210 SoC */
 		"samsung,exynos3250-fimd"; /* for Exynos3250/3472 SoCs */
 		"samsung,exynos4210-fimd"; /* for Exynos4 SoCs */
+		"samsung,exynos4415-fimd"; /* for Exynos4415 SoC */
 		"samsung,exynos5250-fimd"; /* for Exynos5 SoCs */
 
 - reg: physical base address and length of the FIMD registers set.
diff --git a/MAINTAINERS b/MAINTAINERS
index fdffe962a16a..c690b5a0d7b7 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -618,6 +618,16 @@ S:	Maintained
 F:	drivers/iommu/amd_iommu*.[ch]
 F:	include/linux/amd-iommu.h
 
+AMD KFD
+M:      Oded Gabbay <oded.gabbay@amd.com>
+L:      dri-devel@lists.freedesktop.org
+T:      git git://people.freedesktop.org/~gabbayo/linux.git
+S:      Supported
+F:      drivers/gpu/drm/amd/amdkfd/
+F:      drivers/gpu/drm/radeon/radeon_kfd.c
+F:      drivers/gpu/drm/radeon/radeon_kfd.h
+F:      include/uapi/linux/kfd_ioctl.h
+
 AMD MICROCODE UPDATE SUPPORT
 M:	Andreas Herrmann <herrmann.der.user@googlemail.com>
 L:	amd64-microcode@amd64.org
@@ -3297,6 +3307,13 @@ F:	drivers/gpu/drm/exynos/
 F:	include/drm/exynos*
 F:	include/uapi/drm/exynos*
 
+DRM DRIVERS FOR FREESCALE IMX
+M:	Philipp Zabel <p.zabel@pengutronix.de>
+L:	dri-devel@lists.freedesktop.org
+S:	Maintained
+F:	drivers/gpu/drm/imx/
+F:	Documentation/devicetree/bindings/drm/imx/
+
 DRM DRIVERS FOR NVIDIA TEGRA
 M:	Thierry Reding <thierry.reding@gmail.com>
 M:	Terje Bergström <tbergstrom@nvidia.com>
diff --git a/arch/arm/mach-shmobile/board-lager.c b/arch/arm/mach-shmobile/board-lager.c
index b47262afb240..f8197eb6e566 100644
--- a/arch/arm/mach-shmobile/board-lager.c
+++ b/arch/arm/mach-shmobile/board-lager.c
@@ -32,7 +32,6 @@
 #include <linux/pinctrl/machine.h>
 #include <linux/platform_data/camera-rcar.h>
 #include <linux/platform_data/gpio-rcar.h>
-#include <linux/platform_data/rcar-du.h>
 #include <linux/platform_data/usb-rcar-gen2-phy.h>
 #include <linux/platform_device.h>
 #include <linux/phy.h>
@@ -83,61 +82,6 @@
  *
  */
 
-/* DU */
-static struct rcar_du_encoder_data lager_du_encoders[] = {
-	{
-		.type = RCAR_DU_ENCODER_VGA,
-		.output = RCAR_DU_OUTPUT_DPAD0,
-	}, {
-		.type = RCAR_DU_ENCODER_NONE,
-		.output = RCAR_DU_OUTPUT_LVDS1,
-		.connector.lvds.panel = {
-			.width_mm = 210,
-			.height_mm = 158,
-			.mode = {
-				.pixelclock = 65000000,
-				.hactive = 1024,
-				.hfront_porch = 20,
-				.hback_porch = 160,
-				.hsync_len = 136,
-				.vactive = 768,
-				.vfront_porch = 3,
-				.vback_porch = 29,
-				.vsync_len = 6,
-			},
-		},
-	},
-};
-
-static const struct rcar_du_platform_data lager_du_pdata __initconst = {
-	.encoders = lager_du_encoders,
-	.num_encoders = ARRAY_SIZE(lager_du_encoders),
-};
-
-static const struct resource du_resources[] __initconst = {
-	DEFINE_RES_MEM(0xfeb00000, 0x70000),
-	DEFINE_RES_MEM_NAMED(0xfeb90000, 0x1c, "lvds.0"),
-	DEFINE_RES_MEM_NAMED(0xfeb94000, 0x1c, "lvds.1"),
-	DEFINE_RES_IRQ(gic_spi(256)),
-	DEFINE_RES_IRQ(gic_spi(268)),
-	DEFINE_RES_IRQ(gic_spi(269)),
-};
-
-static void __init lager_add_du_device(void)
-{
-	struct platform_device_info info = {
-		.name = "rcar-du-r8a7790",
-		.id = -1,
-		.res = du_resources,
-		.num_res = ARRAY_SIZE(du_resources),
-		.data = &lager_du_pdata,
-		.size_data = sizeof(lager_du_pdata),
-		.dma_mask = DMA_BIT_MASK(32),
-	};
-
-	platform_device_register_full(&info);
-}
-
 /* LEDS */
 static struct gpio_led lager_leds[] = {
 	{
@@ -800,8 +744,6 @@ static void __init lager_add_standard_devices(void)
 
 	platform_device_register_full(&ether_info);
 
-	lager_add_du_device();
-
 	platform_device_register_resndata(NULL, "qspi", 0,
 					  qspi_resources,
 					  ARRAY_SIZE(qspi_resources),
diff --git a/arch/arm/mach-shmobile/board-marzen.c b/arch/arm/mach-shmobile/board-marzen.c
index 994dc7d86ae2..598f704f76ae 100644
--- a/arch/arm/mach-shmobile/board-marzen.c
+++ b/arch/arm/mach-shmobile/board-marzen.c
@@ -27,7 +27,6 @@
 #include <linux/pinctrl/machine.h>
 #include <linux/platform_data/camera-rcar.h>
 #include <linux/platform_data/gpio-rcar.h>
-#include <linux/platform_data/rcar-du.h>
 #include <linux/platform_data/usb-rcar-phy.h>
 #include <linux/regulator/fixed.h>
 #include <linux/regulator/machine.h>
@@ -171,62 +170,6 @@ static struct platform_device hspi_device = {
 	.num_resources	= ARRAY_SIZE(hspi_resources),
 };
 
-/*
- * DU
- *
- * The panel only specifies the [hv]display and [hv]total values. The position
- * and width of the sync pulses don't matter, they're copied from VESA timings.
- */
-static struct rcar_du_encoder_data du_encoders[] = {
-	{
-		.type = RCAR_DU_ENCODER_VGA,
-		.output = RCAR_DU_OUTPUT_DPAD0,
-	}, {
-		.type = RCAR_DU_ENCODER_LVDS,
-		.output = RCAR_DU_OUTPUT_DPAD1,
-		.connector.lvds.panel = {
-			.width_mm = 210,
-			.height_mm = 158,
-			.mode = {
-				.pixelclock = 65000000,
-				.hactive = 1024,
-				.hfront_porch = 20,
-				.hback_porch = 160,
-				.hsync_len = 136,
-				.vactive = 768,
-				.vfront_porch = 3,
-				.vback_porch = 29,
-				.vsync_len = 6,
-			},
-		},
-	},
-};
-
-static const struct rcar_du_platform_data du_pdata __initconst = {
-	.encoders = du_encoders,
-	.num_encoders = ARRAY_SIZE(du_encoders),
-};
-
-static const struct resource du_resources[] __initconst = {
-	DEFINE_RES_MEM(0xfff80000, 0x40000),
-	DEFINE_RES_IRQ(gic_iid(0x3f)),
-};
-
-static void __init marzen_add_du_device(void)
-{
-	struct platform_device_info info = {
-		.name = "rcar-du-r8a7779",
-		.id = -1,
-		.res = du_resources,
-		.num_res = ARRAY_SIZE(du_resources),
-		.data = &du_pdata,
-		.size_data = sizeof(du_pdata),
-		.dma_mask = DMA_BIT_MASK(32),
-	};
-
-	platform_device_register_full(&info);
-}
-
 /* LEDS */
 static struct gpio_led marzen_leds[] = {
 	{
@@ -385,7 +328,6 @@ static void __init marzen_init(void)
 	platform_device_register_full(&vin1_info);
 	platform_device_register_full(&vin3_info);
 	platform_add_devices(marzen_devices, ARRAY_SIZE(marzen_devices));
-	marzen_add_du_device();
 }
 
 static const char *marzen_boards_compat_dt[] __initdata = {
diff --git a/arch/x86/kernel/early-quirks.c b/arch/x86/kernel/early-quirks.c
index 2e1a6853e00c..fe9f0b79a18b 100644
--- a/arch/x86/kernel/early-quirks.c
+++ b/arch/x86/kernel/early-quirks.c
@@ -455,6 +455,23 @@ struct intel_stolen_funcs {
 	u32 (*base)(int num, int slot, int func, size_t size);
 };
 
+static size_t __init gen9_stolen_size(int num, int slot, int func)
+{
+	u16 gmch_ctrl;
+
+	gmch_ctrl = read_pci_config_16(num, slot, func, SNB_GMCH_CTRL);
+	gmch_ctrl >>= BDW_GMCH_GMS_SHIFT;
+	gmch_ctrl &= BDW_GMCH_GMS_MASK;
+
+	if (gmch_ctrl < 0xf0)
+		return gmch_ctrl << 25; /* 32 MB units */
+	else
+		/* 4MB increments starting at 0xf0 for 4MB */
+		return (gmch_ctrl - 0xf0 + 1) << 22;
+}
+
+typedef size_t (*stolen_size_fn)(int num, int slot, int func);
+
 static const struct intel_stolen_funcs i830_stolen_funcs __initconst = {
 	.base = i830_stolen_base,
 	.size = i830_stolen_size,
@@ -490,6 +507,11 @@ static const struct intel_stolen_funcs gen8_stolen_funcs __initconst = {
 	.size = gen8_stolen_size,
 };
 
+static const struct intel_stolen_funcs gen9_stolen_funcs __initconst = {
+	.base = intel_stolen_base,
+	.size = gen9_stolen_size,
+};
+
 static const struct intel_stolen_funcs chv_stolen_funcs __initconst = {
 	.base = intel_stolen_base,
 	.size = chv_stolen_size,
@@ -523,6 +545,7 @@ static const struct pci_device_id intel_stolen_ids[] __initconst = {
 	INTEL_BDW_M_IDS(&gen8_stolen_funcs),
 	INTEL_BDW_D_IDS(&gen8_stolen_funcs),
 	INTEL_CHV_IDS(&chv_stolen_funcs),
+	INTEL_SKL_IDS(&gen9_stolen_funcs),
 };
 
 static void __init intel_graphics_stolen(int num, int slot, int func)
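
The size decoding done by gen9_stolen_size() above can be sketched as a standalone user-space function. This is an illustration rather than the kernel code: the function name is hypothetical, the GMS field is assumed to be 8 bits wide and already shifted and masked out of the GMCH control word, and the result is returned as a 64-bit value.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper, for illustration only. Converts the gen9 GMS
 * field (assumed 8 bits wide) into a stolen memory size in bytes.
 */
uint64_t gen9_gms_to_bytes(unsigned int gms)
{
	assert(gms <= 0xff);

	if (gms < 0xf0)
		return (uint64_t)gms << 25;	/* 32 MB units */

	/* 4 MB increments, with 0xf0 standing for 4 MB */
	return (uint64_t)(gms - 0xf0 + 1) << 22;
}
```

The sketch widens the value before shifting because, in 32 MB units, field values of 0x80 and above describe sizes that no longer fit in 32 bits.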
diff --git a/drivers/char/agp/intel-gtt.c b/drivers/char/agp/intel-gtt.c
index 9a024f899dd4..f3334829e55a 100644
--- a/drivers/char/agp/intel-gtt.c
+++ b/drivers/char/agp/intel-gtt.c
@@ -153,7 +153,6 @@ static struct page *i8xx_alloc_pages(void)
 		__free_pages(page, 2);
 		return NULL;
 	}
-	get_page(page);
 	atomic_inc(&agp_bridge->current_memory_agp);
 	return page;
 }
@@ -164,7 +163,6 @@ static void i8xx_destroy_pages(struct page *page)
 		return;
 
 	set_pages_wb(page, 4);
-	put_page(page);
 	__free_pages(page, 2);
 	atomic_dec(&agp_bridge->current_memory_agp);
 }
@@ -300,7 +298,6 @@ static int intel_gtt_setup_scratch_page(void)
 	page = alloc_page(GFP_KERNEL | GFP_DMA32 | __GFP_ZERO);
 	if (page == NULL)
 		return -ENOMEM;
-	get_page(page);
 	set_pages_uc(page, 1);
 
 	if (intel_private.needs_dmar) {
@@ -560,7 +557,6 @@ static void intel_gtt_teardown_scratch_page(void)
 	set_pages_wb(intel_private.scratch_page, 1);
 	pci_unmap_page(intel_private.pcidev, intel_private.scratch_page_dma,
 		       PAGE_SIZE, PCI_DMA_BIDIRECTIONAL);
-	put_page(intel_private.scratch_page);
 	__free_page(intel_private.scratch_page);
 }
 
diff --git a/drivers/gpu/drm/Kconfig b/drivers/gpu/drm/Kconfig
index e3b4b0f02b3d..c3413b6adb17 100644
--- a/drivers/gpu/drm/Kconfig
+++ b/drivers/gpu/drm/Kconfig
@@ -167,6 +167,8 @@ config DRM_SAVAGE
 
 source "drivers/gpu/drm/exynos/Kconfig"
 
+source "drivers/gpu/drm/rockchip/Kconfig"
+
 source "drivers/gpu/drm/vmwgfx/Kconfig"
 
 source "drivers/gpu/drm/gma500/Kconfig"
@@ -200,3 +202,7 @@ source "drivers/gpu/drm/tegra/Kconfig"
 source "drivers/gpu/drm/panel/Kconfig"
 
 source "drivers/gpu/drm/sti/Kconfig"
+
+source "drivers/gpu/drm/amd/amdkfd/Kconfig"
+
+source "drivers/gpu/drm/imx/Kconfig"
diff --git a/drivers/gpu/drm/Makefile b/drivers/gpu/drm/Makefile
index 9292a761ea6d..66e40398b3d3 100644
--- a/drivers/gpu/drm/Makefile
+++ b/drivers/gpu/drm/Makefile
@@ -14,7 +14,7 @@ drm-y       :=	drm_auth.o drm_bufs.o drm_cache.o \
 		drm_info.o drm_debugfs.o drm_encoder_slave.o \
 		drm_trace_points.o drm_global.o drm_prime.o \
 		drm_rect.o drm_vma_manager.o drm_flip_work.o \
-		drm_modeset_lock.o
+		drm_modeset_lock.o drm_atomic.o
 
 drm-$(CONFIG_COMPAT) += drm_ioc32.o
 drm-$(CONFIG_DRM_GEM_CMA_HELPER) += drm_gem_cma_helper.o
@@ -23,7 +23,7 @@ drm-$(CONFIG_DRM_PANEL) += drm_panel.o
 drm-$(CONFIG_OF) += drm_of.o
 
 drm_kms_helper-y := drm_crtc_helper.o drm_dp_helper.o drm_probe_helper.o \
-		drm_plane_helper.o drm_dp_mst_topology.o
+		drm_plane_helper.o drm_dp_mst_topology.o drm_atomic_helper.o
 drm_kms_helper-$(CONFIG_DRM_LOAD_EDID_FIRMWARE) += drm_edid_load.o
 drm_kms_helper-$(CONFIG_DRM_KMS_FB_HELPER) += drm_fb_helper.o
 drm_kms_helper-$(CONFIG_DRM_KMS_CMA_HELPER) += drm_fb_cma_helper.o
@@ -49,6 +49,7 @@ obj-$(CONFIG_DRM_VMWGFX)+= vmwgfx/
 obj-$(CONFIG_DRM_VIA)	+=via/
 obj-$(CONFIG_DRM_NOUVEAU) +=nouveau/
 obj-$(CONFIG_DRM_EXYNOS) +=exynos/
+obj-$(CONFIG_DRM_ROCKCHIP) +=rockchip/
 obj-$(CONFIG_DRM_GMA500) += gma500/
 obj-$(CONFIG_DRM_UDL) += udl/
 obj-$(CONFIG_DRM_AST) += ast/
@@ -62,6 +63,8 @@ obj-$(CONFIG_DRM_BOCHS) += bochs/
 obj-$(CONFIG_DRM_MSM) += msm/
 obj-$(CONFIG_DRM_TEGRA) += tegra/
 obj-$(CONFIG_DRM_STI) += sti/
+obj-$(CONFIG_DRM_IMX) += imx/
 obj-y			+= i2c/
 obj-y			+= panel/
 obj-y			+= bridge/
+obj-$(CONFIG_HSA_AMD) += amd/amdkfd/
diff --git a/drivers/gpu/drm/README.drm b/drivers/gpu/drm/README.drm
deleted file mode 100644
index b5b332722581..000000000000
--- a/drivers/gpu/drm/README.drm
+++ /dev/null
@@ -1,43 +0,0 @@
-************************************************************
-* For the very latest on DRI development, please see:      *
-*     http://dri.freedesktop.org/                          *
-************************************************************
-
-The Direct Rendering Manager (drm) is a device-independent kernel-level
-device driver that provides support for the XFree86 Direct Rendering
-Infrastructure (DRI).
-
-The DRM supports the Direct Rendering Infrastructure (DRI) in four major
-ways:
-
-    1. The DRM provides synchronized access to the graphics hardware via
-       the use of an optimized two-tiered lock.
-
-    2. The DRM enforces the DRI security policy for access to the graphics
-       hardware by only allowing authenticated X11 clients access to
-       restricted regions of memory.
-
-    3. The DRM provides a generic DMA engine, complete with multiple
-       queues and the ability to detect the need for an OpenGL context
-       switch.
-
-    4. The DRM is extensible via the use of small device-specific modules
-       that rely extensively on the API exported by the DRM module.
-
-
-Documentation on the DRI is available from:
-    http://dri.freedesktop.org/wiki/Documentation
-    http://sourceforge.net/project/showfiles.php?group_id=387
-    http://dri.sourceforge.net/doc/
-
-For specific information about kernel-level support, see:
-
-    The Direct Rendering Manager, Kernel Support for the Direct Rendering
-    Infrastructure
-    http://dri.sourceforge.net/doc/drm_low_level.html
-
-    Hardware Locking for the Direct Rendering Infrastructure
-    http://dri.sourceforge.net/doc/hardware_locking_low_level.html
-
-    A Security Analysis of the Direct Rendering Infrastructure
-    http://dri.sourceforge.net/doc/security_low_level.html
diff --git a/drivers/gpu/drm/amd/amdkfd/Kconfig b/drivers/gpu/drm/amd/amdkfd/Kconfig
new file mode 100644
index 000000000000..8dfac37ff327
--- /dev/null
+++ b/drivers/gpu/drm/amd/amdkfd/Kconfig
@@ -0,0 +1,9 @@
+#
+# Heterogeneous system architecture configuration
+#
+
+config HSA_AMD
+	tristate "HSA kernel driver for AMD GPU devices"
+	depends on DRM_RADEON && AMD_IOMMU_V2 && X86_64
+	help
+	  Enable this if you want to use HSA features on AMD GPU devices.
diff --git a/drivers/gpu/drm/amd/amdkfd/Makefile b/drivers/gpu/drm/amd/amdkfd/Makefile
new file mode 100644
index 000000000000..be6246de5091
--- /dev/null
+++ b/drivers/gpu/drm/amd/amdkfd/Makefile
@@ -0,0 +1,14 @@
+#
+# Makefile for Heterogeneous System Architecture support for AMD GPU devices
+#
+
+ccflags-y := -Iinclude/drm -Idrivers/gpu/drm/amd/include/
+
+amdkfd-y	:= kfd_module.o kfd_device.o kfd_chardev.o kfd_topology.o \
+		kfd_pasid.o kfd_doorbell.o kfd_flat_memory.o \
+		kfd_process.o kfd_queue.o kfd_mqd_manager.o \
+		kfd_kernel_queue.o kfd_packet_manager.o \
+		kfd_process_queue_manager.o kfd_device_queue_manager.o \
+		kfd_interrupt.o
+
+obj-$(CONFIG_HSA_AMD)	+= amdkfd.o
diff --git a/drivers/gpu/drm/amd/amdkfd/cik_regs.h b/drivers/gpu/drm/amd/amdkfd/cik_regs.h
new file mode 100644
index 000000000000..607fc5ceadbe
--- /dev/null
+++ b/drivers/gpu/drm/amd/amdkfd/cik_regs.h
@@ -0,0 +1,221 @@
+/*
+ * Copyright 2014 Advanced Micro Devices, Inc.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR
+ * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
+ * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
+ * OTHER DEALINGS IN THE SOFTWARE.
+ */
+
+#ifndef CIK_REGS_H
+#define CIK_REGS_H
+
+#define IH_VMID_0_LUT					0x3D40u
+
+#define BIF_DOORBELL_CNTL				0x530Cu
+
+#define	SRBM_GFX_CNTL					0xE44
+#define	PIPEID(x)					((x) << 0)
+#define	MEID(x)						((x) << 2)
+#define	VMID(x)						((x) << 4)
+#define	QUEUEID(x)					((x) << 8)
+
+#define	SQ_CONFIG					0x8C00
+
+#define	SH_MEM_BASES					0x8C28
+/* if PTR32, these are the bases for scratch and lds */
+#define	PRIVATE_BASE(x)					((x) << 0) /* scratch */
+#define	SHARED_BASE(x)					((x) << 16) /* LDS */
+#define	SH_MEM_APE1_BASE				0x8C2C
+/* if PTR32, this is the base location of GPUVM */
+#define	SH_MEM_APE1_LIMIT				0x8C30
+/* if PTR32, this is the upper limit of GPUVM */
+#define	SH_MEM_CONFIG					0x8C34
+#define	PTR32						(1 << 0)
+#define PRIVATE_ATC					(1 << 1)
+#define	ALIGNMENT_MODE(x)				((x) << 2)
+#define	SH_MEM_ALIGNMENT_MODE_DWORD			0
+#define	SH_MEM_ALIGNMENT_MODE_DWORD_STRICT		1
+#define	SH_MEM_ALIGNMENT_MODE_STRICT			2
+#define	SH_MEM_ALIGNMENT_MODE_UNALIGNED			3
+#define	DEFAULT_MTYPE(x)				((x) << 4)
+#define	APE1_MTYPE(x)					((x) << 7)
+
+/* valid for both DEFAULT_MTYPE and APE1_MTYPE */
+#define	MTYPE_CACHED					0
+#define	MTYPE_NONCACHED					3
+
+
+#define SH_STATIC_MEM_CONFIG				0x9604u
+
+#define	TC_CFG_L1_LOAD_POLICY0				0xAC68
+#define	TC_CFG_L1_LOAD_POLICY1				0xAC6C
+#define	TC_CFG_L1_STORE_POLICY				0xAC70
+#define	TC_CFG_L2_LOAD_POLICY0				0xAC74
+#define	TC_CFG_L2_LOAD_POLICY1				0xAC78
+#define	TC_CFG_L2_STORE_POLICY0				0xAC7C
+#define	TC_CFG_L2_STORE_POLICY1				0xAC80
+#define	TC_CFG_L2_ATOMIC_POLICY				0xAC84
+#define	TC_CFG_L1_VOLATILE				0xAC88
+#define	TC_CFG_L2_VOLATILE				0xAC8C
+
+#define CP_PQ_WPTR_POLL_CNTL				0xC20C
+#define	WPTR_POLL_EN					(1 << 31)
+
+#define CPC_INT_CNTL					0xC2D0
+#define CP_ME1_PIPE0_INT_CNTL				0xC214
+#define CP_ME1_PIPE1_INT_CNTL				0xC218
+#define CP_ME1_PIPE2_INT_CNTL				0xC21C
+#define CP_ME1_PIPE3_INT_CNTL				0xC220
+#define CP_ME2_PIPE0_INT_CNTL				0xC224
+#define CP_ME2_PIPE1_INT_CNTL				0xC228
+#define CP_ME2_PIPE2_INT_CNTL				0xC22C
+#define CP_ME2_PIPE3_INT_CNTL				0xC230
+#define DEQUEUE_REQUEST_INT_ENABLE			(1 << 13)
+#define WRM_POLL_TIMEOUT_INT_ENABLE			(1 << 17)
+#define PRIV_REG_INT_ENABLE				(1 << 23)
+#define TIME_STAMP_INT_ENABLE				(1 << 26)
+#define GENERIC2_INT_ENABLE				(1 << 29)
+#define GENERIC1_INT_ENABLE				(1 << 30)
+#define GENERIC0_INT_ENABLE				(1 << 31)
+#define CP_ME1_PIPE0_INT_STATUS				0xC214
+#define CP_ME1_PIPE1_INT_STATUS				0xC218
+#define CP_ME1_PIPE2_INT_STATUS				0xC21C
+#define CP_ME1_PIPE3_INT_STATUS				0xC220
+#define CP_ME2_PIPE0_INT_STATUS				0xC224
+#define CP_ME2_PIPE1_INT_STATUS				0xC228
+#define CP_ME2_PIPE2_INT_STATUS				0xC22C
+#define CP_ME2_PIPE3_INT_STATUS				0xC230
+#define DEQUEUE_REQUEST_INT_STATUS			(1 << 13)
+#define WRM_POLL_TIMEOUT_INT_STATUS			(1 << 17)
+#define PRIV_REG_INT_STATUS				(1 << 23)
+#define TIME_STAMP_INT_STATUS				(1 << 26)
+#define GENERIC2_INT_STATUS				(1 << 29)
+#define GENERIC1_INT_STATUS				(1 << 30)
+#define GENERIC0_INT_STATUS				(1 << 31)
+
+#define CP_HPD_EOP_BASE_ADDR				0xC904
+#define CP_HPD_EOP_BASE_ADDR_HI				0xC908
+#define CP_HPD_EOP_VMID					0xC90C
+#define CP_HPD_EOP_CONTROL				0xC910
+#define	EOP_SIZE(x)					((x) << 0)
+#define	EOP_SIZE_MASK					(0x3f << 0)
+#define CP_MQD_BASE_ADDR				0xC914
+#define CP_MQD_BASE_ADDR_HI				0xC918
+#define CP_HQD_ACTIVE					0xC91C
+#define CP_HQD_VMID					0xC920
+
+#define CP_HQD_PERSISTENT_STATE				0xC924u
+#define	DEFAULT_CP_HQD_PERSISTENT_STATE			(0x33U << 8)
+#define	PRELOAD_REQ					(1 << 0)
+
+#define CP_HQD_PIPE_PRIORITY				0xC928u
+#define CP_HQD_QUEUE_PRIORITY				0xC92Cu
+#define CP_HQD_QUANTUM					0xC930u
+#define	QUANTUM_EN					1U
+#define	QUANTUM_SCALE_1MS				(1U << 4)
+#define	QUANTUM_DURATION(x)				((x) << 8)
+
+#define CP_HQD_PQ_BASE					0xC934
+#define CP_HQD_PQ_BASE_HI				0xC938
+#define CP_HQD_PQ_RPTR					0xC93C
+#define CP_HQD_PQ_RPTR_REPORT_ADDR			0xC940
+#define CP_HQD_PQ_RPTR_REPORT_ADDR_HI			0xC944
+#define CP_HQD_PQ_WPTR_POLL_ADDR			0xC948
+#define CP_HQD_PQ_WPTR_POLL_ADDR_HI			0xC94C
+#define CP_HQD_PQ_DOORBELL_CONTROL			0xC950
+#define	DOORBELL_OFFSET(x)				((x) << 2)
+#define	DOORBELL_OFFSET_MASK				(0x1fffff << 2)
+#define	DOORBELL_SOURCE					(1 << 28)
+#define	DOORBELL_SCHD_HIT				(1 << 29)
+#define	DOORBELL_EN					(1 << 30)
+#define	DOORBELL_HIT					(1 << 31)
+#define CP_HQD_PQ_WPTR					0xC954
+#define CP_HQD_PQ_CONTROL				0xC958
+#define	QUEUE_SIZE(x)					((x) << 0)
+#define	QUEUE_SIZE_MASK					(0x3f << 0)
+#define	RPTR_BLOCK_SIZE(x)				((x) << 8)
+#define	RPTR_BLOCK_SIZE_MASK				(0x3f << 8)
+#define	MIN_AVAIL_SIZE(x)				((x) << 20)
+#define	PQ_ATC_EN					(1 << 23)
+#define	PQ_VOLATILE					(1 << 26)
+#define	NO_UPDATE_RPTR					(1 << 27)
+#define	UNORD_DISPATCH					(1 << 28)
+#define	ROQ_PQ_IB_FLIP					(1 << 29)
+#define	PRIV_STATE					(1 << 30)
+#define	KMD_QUEUE					(1 << 31)
+
+#define	DEFAULT_RPTR_BLOCK_SIZE				RPTR_BLOCK_SIZE(5)
+#define	DEFAULT_MIN_AVAIL_SIZE				MIN_AVAIL_SIZE(3)
+
+#define CP_HQD_IB_BASE_ADDR				0xC95Cu
+#define CP_HQD_IB_BASE_ADDR_HI				0xC960u
+#define CP_HQD_IB_RPTR					0xC964u
+#define CP_HQD_IB_CONTROL				0xC968u
+#define	IB_ATC_EN					(1U << 23)
+#define	DEFAULT_MIN_IB_AVAIL_SIZE			(3U << 20)
+
+#define CP_HQD_DEQUEUE_REQUEST				0xC974
+#define	DEQUEUE_REQUEST_DRAIN				1
+#define	DEQUEUE_REQUEST_RESET				2
+#define	DEQUEUE_INT					(1U << 8)
+
+#define CP_HQD_SEMA_CMD					0xC97Cu
+#define CP_HQD_MSG_TYPE					0xC980u
+#define CP_HQD_ATOMIC0_PREOP_LO				0xC984u
+#define CP_HQD_ATOMIC0_PREOP_HI				0xC988u
+#define CP_HQD_ATOMIC1_PREOP_LO				0xC98Cu
+#define CP_HQD_ATOMIC1_PREOP_HI				0xC990u
+#define CP_HQD_HQ_SCHEDULER0				0xC994u
+#define CP_HQD_HQ_SCHEDULER1				0xC998u
+
+
+#define CP_MQD_CONTROL					0xC99C
+#define	MQD_VMID(x)					((x) << 0)
+#define	MQD_VMID_MASK					(0xf << 0)
+#define	MQD_CONTROL_PRIV_STATE_EN			(1U << 8)
+
+#define GRBM_GFX_INDEX					0x30800
+#define	INSTANCE_INDEX(x)				((x) << 0)
+#define	SH_INDEX(x)					((x) << 8)
+#define	SE_INDEX(x)					((x) << 16)
+#define	SH_BROADCAST_WRITES				(1 << 29)
+#define	INSTANCE_BROADCAST_WRITES			(1 << 30)
+#define	SE_BROADCAST_WRITES				(1 << 31)
+
+#define SQC_CACHES					0x30d20
+#define SQC_POLICY					0x8C38u
+#define SQC_VOLATILE					0x8C3Cu
+
+#define CP_PERFMON_CNTL					0x36020
+
+#define ATC_VMID0_PASID_MAPPING				0x339Cu
+#define	ATC_VMID_PASID_MAPPING_UPDATE_STATUS		0x3398u
+#define	ATC_VMID_PASID_MAPPING_VALID			(1U << 31)
+
+#define ATC_VM_APERTURE0_CNTL				0x3310u
+#define	ATS_ACCESS_MODE_NEVER				0
+#define	ATS_ACCESS_MODE_ALWAYS				1
+
+#define ATC_VM_APERTURE0_CNTL2				0x3318u
+#define ATC_VM_APERTURE0_HIGH_ADDR			0x3308u
+#define ATC_VM_APERTURE0_LOW_ADDR			0x3300u
+#define ATC_VM_APERTURE1_CNTL				0x3314u
+#define ATC_VM_APERTURE1_CNTL2				0x331Cu
+#define ATC_VM_APERTURE1_HIGH_ADDR			0x330Cu
+#define ATC_VM_APERTURE1_LOW_ADDR			0x3304u
+
+#endif
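Reviewer sketch, not part of the patch: the field macros in the header above follow the usual shift-and-mask pattern, where FIELD(x) places a value and FIELD_MASK isolates it again. The snippet below illustrates that pattern in userspace for the CP_HQD_PQ_CONTROL fields. The helper names pq_control_pack and pq_control_field are hypothetical, and the macros are redefined locally with unsigned constants so the snippet builds on its own. What the hardware expects in each field is not stated in this header and is not assumed here.

```c
#include <assert.h>
#include <stdint.h>

/* Local, unsigned copies of the CP_HQD_PQ_CONTROL field macros (illustration only). */
#define QUEUE_SIZE(x)		((uint32_t)(x) << 0)
#define QUEUE_SIZE_MASK		(0x3fu << 0)
#define RPTR_BLOCK_SIZE(x)	((uint32_t)(x) << 8)
#define RPTR_BLOCK_SIZE_MASK	(0x3fu << 8)
#define MIN_AVAIL_SIZE(x)	((uint32_t)(x) << 20)

/* Hypothetical helper: OR the header's default field values with a queue-size field. */
static uint32_t pq_control_pack(uint32_t queue_size_field)
{
	/* Reject values that do not fit the 6-bit field instead of truncating. */
	assert((queue_size_field & ~QUEUE_SIZE_MASK) == 0);

	return QUEUE_SIZE(queue_size_field) |
	       RPTR_BLOCK_SIZE(5) |	/* same value as DEFAULT_RPTR_BLOCK_SIZE */
	       MIN_AVAIL_SIZE(3);	/* same value as DEFAULT_MIN_AVAIL_SIZE */
}

/* Hypothetical helper: extract one field given its mask and shift. */
static uint32_t pq_control_field(uint32_t reg, uint32_t mask, unsigned int shift)
{
	return (reg & mask) >> shift;
}
```

Using unsigned constants here also sidesteps the signed-overflow question that (1 << 31) raises in strictly conforming C.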
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
new file mode 100644
index 000000000000..4f7b275f2f7b
--- /dev/null
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
@@ -0,0 +1,595 @@
+/*
+ * Copyright 2014 Advanced Micro Devices, Inc.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR
+ * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
+ * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
+ * OTHER DEALINGS IN THE SOFTWARE.
+ */
+
+#include <linux/device.h>
+#include <linux/export.h>
+#include <linux/err.h>
+#include <linux/fs.h>
+#include <linux/sched.h>
+#include <linux/slab.h>
+#include <linux/uaccess.h>
+#include <linux/compat.h>
+#include <uapi/linux/kfd_ioctl.h>
+#include <linux/time.h>
+#include <linux/mm.h>
+#include <linux/uaccess.h>
+#include <uapi/asm-generic/mman-common.h>
+#include <asm/processor.h>
+#include "kfd_priv.h"
+#include "kfd_device_queue_manager.h"
+
+static long kfd_ioctl(struct file *, unsigned int, unsigned long);
+static int kfd_open(struct inode *, struct file *);
+static int kfd_mmap(struct file *, struct vm_area_struct *);
+
+static const char kfd_dev_name[] = "kfd";
+
+static const struct file_operations kfd_fops = {
+	.owner = THIS_MODULE,
+	.unlocked_ioctl = kfd_ioctl,
+	.compat_ioctl = kfd_ioctl,
+	.open = kfd_open,
+	.mmap = kfd_mmap,
+};
+
+static int kfd_char_dev_major = -1;
+static struct class *kfd_class;
+struct device *kfd_device;
+
+int kfd_chardev_init(void)
+{
+	int err = 0;
+
+	kfd_char_dev_major = register_chrdev(0, kfd_dev_name, &kfd_fops);
+	err = kfd_char_dev_major;
+	if (err < 0)
+		goto err_register_chrdev;
+
+	kfd_class = class_create(THIS_MODULE, kfd_dev_name);
+	err = PTR_ERR(kfd_class);
+	if (IS_ERR(kfd_class))
+		goto err_class_create;
+
+	kfd_device = device_create(kfd_class, NULL,
+					MKDEV(kfd_char_dev_major, 0),
+					NULL, kfd_dev_name);
+	err = PTR_ERR(kfd_device);
+	if (IS_ERR(kfd_device))
+		goto err_device_create;
+
+	return 0;
+
+err_device_create:
+	class_destroy(kfd_class);
+err_class_create:
+	unregister_chrdev(kfd_char_dev_major, kfd_dev_name);
+err_register_chrdev:
+	return err;
+}
+
+void kfd_chardev_exit(void)
+{
+	device_destroy(kfd_class, MKDEV(kfd_char_dev_major, 0));
+	class_destroy(kfd_class);
+	unregister_chrdev(kfd_char_dev_major, kfd_dev_name);
+}
+
+struct device *kfd_chardev(void)
+{
+	return kfd_device;
+}
+
+
+static int kfd_open(struct inode *inode, struct file *filep)
+{
+	struct kfd_process *process;
+	bool is_32bit_user_mode;
+
+	if (iminor(inode) != 0)
+		return -ENODEV;
+
+	is_32bit_user_mode = is_compat_task();
+
+	if (is_32bit_user_mode) {
+		dev_warn(kfd_device,
+			"Process %d (32-bit) failed to open /dev/kfd\n"
+			"32-bit processes are not supported by amdkfd\n",
+			current->pid);
+		return -EPERM;
+	}
+
+	process = kfd_create_process(current);
+	if (IS_ERR(process))
+		return PTR_ERR(process);
+
+	process->is_32bit_user_mode = is_32bit_user_mode;
+
+	dev_dbg(kfd_device, "process with PASID %d opened, compat mode (32 bit) - %d\n",
+		process->pasid, process->is_32bit_user_mode);
+
+	kfd_init_apertures(process);
+
+	return 0;
+}
+
+static long kfd_ioctl_get_version(struct file *filep, struct kfd_process *p,
+					void __user *arg)
+{
+	struct kfd_ioctl_get_version_args args;
+	int err = 0;
+
+	args.major_version = KFD_IOCTL_MAJOR_VERSION;
+	args.minor_version = KFD_IOCTL_MINOR_VERSION;
+
+	if (copy_to_user(arg, &args, sizeof(args)))
+		err = -EFAULT;
+
+	return err;
+}
+
+static int set_queue_properties_from_user(struct queue_properties *q_properties,
+				struct kfd_ioctl_create_queue_args *args)
+{
+	if (args->queue_percentage > KFD_MAX_QUEUE_PERCENTAGE) {
+		pr_err("kfd: queue percentage must be between 0 and KFD_MAX_QUEUE_PERCENTAGE\n");
+		return -EINVAL;
+	}
+
+	if (args->queue_priority > KFD_MAX_QUEUE_PRIORITY) {
+		pr_err("kfd: queue priority must be between 0 and KFD_MAX_QUEUE_PRIORITY\n");
+		return -EINVAL;
+	}
+
+	if ((args->ring_base_address) &&
+		(!access_ok(VERIFY_WRITE,
+			(const void __user *) args->ring_base_address,
+			sizeof(uint64_t)))) {
+		pr_err("kfd: can't access ring base address\n");
+		return -EFAULT;
+	}
+
+	if (!is_power_of_2(args->ring_size) && (args->ring_size != 0)) {
+		pr_err("kfd: ring size must be a power of 2 or 0\n");
+		return -EINVAL;
+	}
+
+	if (!access_ok(VERIFY_WRITE,
+			(const void __user *) args->read_pointer_address,
+			sizeof(uint32_t))) {
+		pr_err("kfd: can't access read pointer\n");
+		return -EFAULT;
+	}
+
+	if (!access_ok(VERIFY_WRITE,
+			(const void __user *) args->write_pointer_address,
+			sizeof(uint32_t))) {
+		pr_err("kfd: can't access write pointer\n");
+		return -EFAULT;
+	}
+
+	q_properties->is_interop = false;
+	q_properties->queue_percent = args->queue_percentage;
+	q_properties->priority = args->queue_priority;
+	q_properties->queue_address = args->ring_base_address;
+	q_properties->queue_size = args->ring_size;
+	q_properties->read_ptr = (uint32_t *) args->read_pointer_address;
+	q_properties->write_ptr = (uint32_t *) args->write_pointer_address;
+	if (args->queue_type == KFD_IOC_QUEUE_TYPE_COMPUTE ||
+		args->queue_type == KFD_IOC_QUEUE_TYPE_COMPUTE_AQL)
+		q_properties->type = KFD_QUEUE_TYPE_COMPUTE;
+	else
+		return -ENOTSUPP;
+
+	if (args->queue_type == KFD_IOC_QUEUE_TYPE_COMPUTE_AQL)
+		q_properties->format = KFD_QUEUE_FORMAT_AQL;
+	else
+		q_properties->format = KFD_QUEUE_FORMAT_PM4;
+
+	pr_debug("Queue Percentage (%d, %d)\n",
+			q_properties->queue_percent, args->queue_percentage);
+
+	pr_debug("Queue Priority (%d, %d)\n",
+			q_properties->priority, args->queue_priority);
+
+	pr_debug("Queue Address (0x%llX, 0x%llX)\n",
+			q_properties->queue_address, args->ring_base_address);
+
+	pr_debug("Queue Size (0x%llX, %u)\n",
+			q_properties->queue_size, args->ring_size);
+
+	pr_debug("Queue r/w Pointers (0x%llX, 0x%llX)\n",
+			(uint64_t) q_properties->read_ptr,
+			(uint64_t) q_properties->write_ptr);
+
+	pr_debug("Queue Format (%d)\n", q_properties->format);
+
+	return 0;
+}
+
+static long kfd_ioctl_create_queue(struct file *filep, struct kfd_process *p,
+					void __user *arg)
+{
+	struct kfd_ioctl_create_queue_args args;
+	struct kfd_dev *dev;
+	int err = 0;
+	unsigned int queue_id;
+	struct kfd_process_device *pdd;
+	struct queue_properties q_properties;
+
+	memset(&q_properties, 0, sizeof(struct queue_properties));
+
+	if (copy_from_user(&args, arg, sizeof(args)))
+		return -EFAULT;
+
+	pr_debug("kfd: creating queue ioctl\n");
+
+	err = set_queue_properties_from_user(&q_properties, &args);
+	if (err)
+		return err;
+
+	dev = kfd_device_by_id(args.gpu_id);
+	if (dev == NULL)
+		return -EINVAL;
+
+	mutex_lock(&p->mutex);
+
+	pdd = kfd_bind_process_to_device(dev, p);
+	if (IS_ERR(pdd)) {
+		err = PTR_ERR(pdd);
+		goto err_bind_process;
+	}
+
+	pr_debug("kfd: creating queue for PASID %d on GPU 0x%x\n",
+			p->pasid,
+			dev->id);
+
+	err = pqm_create_queue(&p->pqm, dev, filep, &q_properties, 0,
+				KFD_QUEUE_TYPE_COMPUTE, &queue_id);
+	if (err != 0)
+		goto err_create_queue;
+
+	args.queue_id = queue_id;
+
+	/* Return gpu_id as doorbell offset for mmap usage */
+	args.doorbell_offset = args.gpu_id << PAGE_SHIFT;
+
+	if (copy_to_user(arg, &args, sizeof(args))) {
+		err = -EFAULT;
+		goto err_copy_args_out;
+	}
+
+	mutex_unlock(&p->mutex);
+
+	pr_debug("kfd: queue id %d was created successfully\n", args.queue_id);
+
+	pr_debug("ring buffer address == 0x%016llX\n",
+			args.ring_base_address);
+
+	pr_debug("read ptr address    == 0x%016llX\n",
+			args.read_pointer_address);
+
+	pr_debug("write ptr address   == 0x%016llX\n",
+			args.write_pointer_address);
+
+	return 0;
+
+err_copy_args_out:
+	pqm_destroy_queue(&p->pqm, queue_id);
+err_create_queue:
+err_bind_process:
+	mutex_unlock(&p->mutex);
+	return err;
+}
+
+static int kfd_ioctl_destroy_queue(struct file *filp, struct kfd_process *p,
+					void __user *arg)
+{
+	int retval;
+	struct kfd_ioctl_destroy_queue_args args;
+
+	if (copy_from_user(&args, arg, sizeof(args)))
+		return -EFAULT;
+
+	pr_debug("kfd: destroying queue id %d for PASID %d\n",
+				args.queue_id,
+				p->pasid);
+
+	mutex_lock(&p->mutex);
+
+	retval = pqm_destroy_queue(&p->pqm, args.queue_id);
+
+	mutex_unlock(&p->mutex);
+	return retval;
+}
+
+static int kfd_ioctl_update_queue(struct file *filp, struct kfd_process *p,
+					void __user *arg)
+{
+	int retval;
+	struct kfd_ioctl_update_queue_args args;
+	struct queue_properties properties;
+
+	if (copy_from_user(&args, arg, sizeof(args)))
+		return -EFAULT;
+
+	if (args.queue_percentage > KFD_MAX_QUEUE_PERCENTAGE) {
+		pr_err("kfd: queue percentage must be between 0 and KFD_MAX_QUEUE_PERCENTAGE\n");
+		return -EINVAL;
+	}
+
+	if (args.queue_priority > KFD_MAX_QUEUE_PRIORITY) {
+		pr_err("kfd: queue priority must be between 0 and KFD_MAX_QUEUE_PRIORITY\n");
+		return -EINVAL;
+	}
+
+	if ((args.ring_base_address) &&
+		(!access_ok(VERIFY_WRITE,
+			(const void __user *) args.ring_base_address,
+			sizeof(uint64_t)))) {
+		pr_err("kfd: can't access ring base address\n");
+		return -EFAULT;
+	}
+
+	if (!is_power_of_2(args.ring_size) && (args.ring_size != 0)) {
+		pr_err("kfd: ring size must be a power of 2 or 0\n");
+		return -EINVAL;
+	}
+
+	properties.queue_address = args.ring_base_address;
+	properties.queue_size = args.ring_size;
+	properties.queue_percent = args.queue_percentage;
+	properties.priority = args.queue_priority;
+
+	pr_debug("kfd: updating queue id %d for PASID %d\n",
+			args.queue_id, p->pasid);
+
+	mutex_lock(&p->mutex);
+
+	retval = pqm_update_queue(&p->pqm, args.queue_id, &properties);
+
+	mutex_unlock(&p->mutex);
+
+	return retval;
+}
+
+static long kfd_ioctl_set_memory_policy(struct file *filep,
+				struct kfd_process *p, void __user *arg)
+{
+	struct kfd_ioctl_set_memory_policy_args args;
+	struct kfd_dev *dev;
+	int err = 0;
+	struct kfd_process_device *pdd;
+	enum cache_policy default_policy, alternate_policy;
+
+	if (copy_from_user(&args, arg, sizeof(args)))
+		return -EFAULT;
+
+	if (args.default_policy != KFD_IOC_CACHE_POLICY_COHERENT
+	    && args.default_policy != KFD_IOC_CACHE_POLICY_NONCOHERENT) {
+		return -EINVAL;
+	}
+
+	if (args.alternate_policy != KFD_IOC_CACHE_POLICY_COHERENT
+	    && args.alternate_policy != KFD_IOC_CACHE_POLICY_NONCOHERENT) {
+		return -EINVAL;
+	}
+
+	dev = kfd_device_by_id(args.gpu_id);
+	if (dev == NULL)
+		return -EINVAL;
+
+	mutex_lock(&p->mutex);
+
+	pdd = kfd_bind_process_to_device(dev, p);
+	if (IS_ERR(pdd)) {
+		err = PTR_ERR(pdd);
+		goto out;
+	}
+
+	default_policy = (args.default_policy == KFD_IOC_CACHE_POLICY_COHERENT)
+			 ? cache_policy_coherent : cache_policy_noncoherent;
+
+	alternate_policy =
+		(args.alternate_policy == KFD_IOC_CACHE_POLICY_COHERENT)
+		   ? cache_policy_coherent : cache_policy_noncoherent;
+
+	if (!dev->dqm->set_cache_memory_policy(dev->dqm,
+				&pdd->qpd,
+				default_policy,
+				alternate_policy,
+				(void __user *)args.alternate_aperture_base,
+				args.alternate_aperture_size))
+		err = -EINVAL;
+
+out:
+	mutex_unlock(&p->mutex);
+
+	return err;
+}
+
+static long kfd_ioctl_get_clock_counters(struct file *filep,
+				struct kfd_process *p, void __user *arg)
+{
+	struct kfd_ioctl_get_clock_counters_args args;
+	struct kfd_dev *dev;
+	struct timespec time;
+
+	if (copy_from_user(&args, arg, sizeof(args)))
+		return -EFAULT;
+
+	dev = kfd_device_by_id(args.gpu_id);
+	if (dev == NULL)
+		return -EINVAL;
+
+	/* Reading GPU clock counter from KGD */
+	args.gpu_clock_counter = kfd2kgd->get_gpu_clock_counter(dev->kgd);
+
+	/* No access to rdtsc. Using raw monotonic time */
+	getrawmonotonic(&time);
+	args.cpu_clock_counter = (uint64_t)timespec_to_ns(&time);
+
+	get_monotonic_boottime(&time);
+	args.system_clock_counter = (uint64_t)timespec_to_ns(&time);
+
+	/* Since the counter is in nanoseconds, we use a 1GHz frequency */
+	args.system_clock_freq = 1000000000;
+
+	if (copy_to_user(arg, &args, sizeof(args)))
+		return -EFAULT;
+
+	return 0;
+}
+
+
+static int kfd_ioctl_get_process_apertures(struct file *filp,
+				struct kfd_process *p, void __user *arg)
+{
+	struct kfd_ioctl_get_process_apertures_args args;
+	struct kfd_process_device_apertures *pAperture;
+	struct kfd_process_device *pdd;
+
+	dev_dbg(kfd_device, "get apertures for PASID %d\n", p->pasid);
+
+	if (copy_from_user(&args, arg, sizeof(args)))
+		return -EFAULT;
+
+	args.num_of_nodes = 0;
+
+	mutex_lock(&p->mutex);
+
+	/* if the process-device list isn't empty */
+	if (kfd_has_process_device_data(p)) {
+		/* Iterate over every pdd of the process */
+		pdd = kfd_get_first_process_device_data(p);
+		do {
+			pAperture = &args.process_apertures[args.num_of_nodes];
+			pAperture->gpu_id = pdd->dev->id;
+			pAperture->lds_base = pdd->lds_base;
+			pAperture->lds_limit = pdd->lds_limit;
+			pAperture->gpuvm_base = pdd->gpuvm_base;
+			pAperture->gpuvm_limit = pdd->gpuvm_limit;
+			pAperture->scratch_base = pdd->scratch_base;
+			pAperture->scratch_limit = pdd->scratch_limit;
+
+			dev_dbg(kfd_device,
+				"node id %u\n", args.num_of_nodes);
+			dev_dbg(kfd_device,
+				"gpu id %u\n", pdd->dev->id);
+			dev_dbg(kfd_device,
+				"lds_base %llX\n", pdd->lds_base);
+			dev_dbg(kfd_device,
+				"lds_limit %llX\n", pdd->lds_limit);
+			dev_dbg(kfd_device,
+				"gpuvm_base %llX\n", pdd->gpuvm_base);
+			dev_dbg(kfd_device,
+				"gpuvm_limit %llX\n", pdd->gpuvm_limit);
+			dev_dbg(kfd_device,
+				"scratch_base %llX\n", pdd->scratch_base);
+			dev_dbg(kfd_device,
+				"scratch_limit %llX\n", pdd->scratch_limit);
+
+			args.num_of_nodes++;
+		} while ((pdd = kfd_get_next_process_device_data(p, pdd)) != NULL &&
+				(args.num_of_nodes < NUM_OF_SUPPORTED_GPUS));
+	}
+
+	mutex_unlock(&p->mutex);
+
+	if (copy_to_user(arg, &args, sizeof(args)))
+		return -EFAULT;
+
+	return 0;
+}
+
+static long kfd_ioctl(struct file *filep, unsigned int cmd, unsigned long arg)
+{
+	struct kfd_process *process;
+	long err = -EINVAL;
+
+	dev_dbg(kfd_device,
+		"ioctl cmd 0x%x (#%d), arg 0x%lx\n",
+		cmd, _IOC_NR(cmd), arg);
+
+	process = kfd_get_process(current);
+	if (IS_ERR(process))
+		return PTR_ERR(process);
+
+	switch (cmd) {
+	case KFD_IOC_GET_VERSION:
+		err = kfd_ioctl_get_version(filep, process, (void __user *)arg);
+		break;
+	case KFD_IOC_CREATE_QUEUE:
+		err = kfd_ioctl_create_queue(filep, process,
+						(void __user *)arg);
+		break;
+
+	case KFD_IOC_DESTROY_QUEUE:
+		err = kfd_ioctl_destroy_queue(filep, process,
+						(void __user *)arg);
+		break;
+
+	case KFD_IOC_SET_MEMORY_POLICY:
+		err = kfd_ioctl_set_memory_policy(filep, process,
+						(void __user *)arg);
+		break;
+
+	case KFD_IOC_GET_CLOCK_COUNTERS:
+		err = kfd_ioctl_get_clock_counters(filep, process,
+						(void __user *)arg);
+		break;
+
+	case KFD_IOC_GET_PROCESS_APERTURES:
+		err = kfd_ioctl_get_process_apertures(filep, process,
+						(void __user *)arg);
+		break;
+
+	case KFD_IOC_UPDATE_QUEUE:
+		err = kfd_ioctl_update_queue(filep, process,
+						(void __user *)arg);
+		break;
+
+	default:
+		dev_err(kfd_device,
+			"unknown ioctl cmd 0x%x, arg 0x%lx\n",
+			cmd, arg);
+		err = -EINVAL;
+		break;
+	}
+
+	if (err < 0)
+		dev_err(kfd_device,
+			"ioctl error %ld for ioctl cmd 0x%x (#%d)\n",
+			err, cmd, _IOC_NR(cmd));
+
+	return err;
+}
+
+static int kfd_mmap(struct file *filp, struct vm_area_struct *vma)
+{
+	struct kfd_process *process;
+
+	process = kfd_get_process(current);
+	if (IS_ERR(process))
+		return PTR_ERR(process);
+
+	return kfd_doorbell_mmap(process, vma);
+}
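Reviewer sketch, not part of the patch: both the create-queue and update-queue paths above accept a ring_size that is either 0 or a power of two and reject everything else with -EINVAL. The userspace snippet below restates that rule so it can be exercised outside the kernel. The names is_pow2_u32 and kfd_ring_size_ok are hypothetical; is_pow2_u32 is a stand-in for the kernel's is_power_of_2().

```c
#include <stdbool.h>
#include <stdint.h>

/* Userspace stand-in for the kernel's is_power_of_2(): exactly one bit set. */
static bool is_pow2_u32(uint32_t n)
{
	return n != 0 && (n & (n - 1)) == 0;
}

/* Hypothetical helper mirroring the ring_size check: 0 or a power of two passes. */
static bool kfd_ring_size_ok(uint32_t ring_size)
{
	return ring_size == 0 || is_pow2_u32(ring_size);
}
```

A helper along these lines could replace the check that is duplicated in set_queue_properties_from_user() and kfd_ioctl_update_queue(), though the patch as posted keeps the two copies.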
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_crat.h b/drivers/gpu/drm/amd/amdkfd/kfd_crat.h
new file mode 100644
index 000000000000..a374fa3d3ee6
--- /dev/null
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_crat.h
@@ -0,0 +1,294 @@
+/*
+ * Copyright 2014 Advanced Micro Devices, Inc.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR
+ * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
+ * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
+ * OTHER DEALINGS IN THE SOFTWARE.
+ */
+
+#ifndef KFD_CRAT_H_INCLUDED
+#define KFD_CRAT_H_INCLUDED
+
+#include <linux/types.h>
+
+#pragma pack(1)
+
+/*
+ * 4CC signature values for the CRAT and CDIT ACPI tables
+ */
+
+#define CRAT_SIGNATURE	"CRAT"
+#define CDIT_SIGNATURE	"CDIT"
+
+/*
+ * Component Resource Association Table (CRAT)
+ */
+
+#define CRAT_OEMID_LENGTH	6
+#define CRAT_OEMTABLEID_LENGTH	8
+#define CRAT_RESERVED_LENGTH	6
+
+#define CRAT_OEMID_64BIT_MASK ((1ULL << (CRAT_OEMID_LENGTH * 8)) - 1)
+
+struct crat_header {
+	uint32_t	signature;
+	uint32_t	length;
+	uint8_t		revision;
+	uint8_t		checksum;
+	uint8_t		oem_id[CRAT_OEMID_LENGTH];
+	uint8_t		oem_table_id[CRAT_OEMTABLEID_LENGTH];
+	uint32_t	oem_revision;
+	uint32_t	creator_id;
+	uint32_t	creator_revision;
+	uint32_t	total_entries;
+	uint16_t	num_domains;
+	uint8_t		reserved[CRAT_RESERVED_LENGTH];
+};
+
+/*
+ * The header structure is immediately followed by total_entries data
+ * definitions (the subtype entries defined below)
+ */
+
+/*
+ * The currently defined subtype entries in the CRAT
+ */
+#define CRAT_SUBTYPE_COMPUTEUNIT_AFFINITY	0
+#define CRAT_SUBTYPE_MEMORY_AFFINITY		1
+#define CRAT_SUBTYPE_CACHE_AFFINITY		2
+#define CRAT_SUBTYPE_TLB_AFFINITY		3
+#define CRAT_SUBTYPE_CCOMPUTE_AFFINITY		4
+#define CRAT_SUBTYPE_IOLINK_AFFINITY		5
+#define CRAT_SUBTYPE_MAX			6
+
+#define CRAT_SIBLINGMAP_SIZE	32
+
+/*
+ * ComputeUnit Affinity structure and definitions
+ */
+#define CRAT_CU_FLAGS_ENABLED		0x00000001
+#define CRAT_CU_FLAGS_HOT_PLUGGABLE	0x00000002
+#define CRAT_CU_FLAGS_CPU_PRESENT	0x00000004
+#define CRAT_CU_FLAGS_GPU_PRESENT	0x00000008
+#define CRAT_CU_FLAGS_IOMMU_PRESENT	0x00000010
+#define CRAT_CU_FLAGS_RESERVED		0xffffffe0
+
+#define CRAT_COMPUTEUNIT_RESERVED_LENGTH 4
+
+struct crat_subtype_computeunit {
+	uint8_t		type;
+	uint8_t		length;
+	uint16_t	reserved;
+	uint32_t	flags;
+	uint32_t	proximity_domain;
+	uint32_t	processor_id_low;
+	uint16_t	num_cpu_cores;
+	uint16_t	num_simd_cores;
+	uint16_t	max_waves_simd;
+	uint16_t	io_count;
+	uint16_t	hsa_capability;
+	uint16_t	lds_size_in_kb;
+	uint8_t		wave_front_size;
+	uint8_t		num_banks;
+	uint16_t	micro_engine_id;
+	uint8_t		num_arrays;
+	uint8_t		num_cu_per_array;
+	uint8_t		num_simd_per_cu;
+	uint8_t		max_slots_scatch_cu;
+	uint8_t		reserved2[CRAT_COMPUTEUNIT_RESERVED_LENGTH];
+};
+
+/*
+ * HSA Memory Affinity structure and definitions
+ */
+#define CRAT_MEM_FLAGS_ENABLED		0x00000001
+#define CRAT_MEM_FLAGS_HOT_PLUGGABLE	0x00000002
+#define CRAT_MEM_FLAGS_NON_VOLATILE	0x00000004
+#define CRAT_MEM_FLAGS_RESERVED		0xfffffff8
+
+#define CRAT_MEMORY_RESERVED_LENGTH 8
+
+struct crat_subtype_memory {
+	uint8_t		type;
+	uint8_t		length;
+	uint16_t	reserved;
+	uint32_t	flags;
+	uint32_t	promixity_domain;
+	uint32_t	base_addr_low;
+	uint32_t	base_addr_high;
+	uint32_t	length_low;
+	uint32_t	length_high;
+	uint32_t	width;
+	uint8_t		reserved2[CRAT_MEMORY_RESERVED_LENGTH];
+};
+
+/*
+ * HSA Cache Affinity structure and definitions
+ */
+#define CRAT_CACHE_FLAGS_ENABLED	0x00000001
+#define CRAT_CACHE_FLAGS_DATA_CACHE	0x00000002
+#define CRAT_CACHE_FLAGS_INST_CACHE	0x00000004
+#define CRAT_CACHE_FLAGS_CPU_CACHE	0x00000008
+#define CRAT_CACHE_FLAGS_SIMD_CACHE	0x00000010
+#define CRAT_CACHE_FLAGS_RESERVED	0xffffffe0
+
+#define CRAT_CACHE_RESERVED_LENGTH 8
+
+struct crat_subtype_cache {
+	uint8_t		type;
+	uint8_t		length;
+	uint16_t	reserved;
+	uint32_t	flags;
+	uint32_t	processor_id_low;
+	uint8_t		sibling_map[CRAT_SIBLINGMAP_SIZE];
+	uint32_t	cache_size;
+	uint8_t		cache_level;
+	uint8_t		lines_per_tag;
+	uint16_t	cache_line_size;
+	uint8_t		associativity;
+	uint8_t		cache_properties;
+	uint16_t	cache_latency;
+	uint8_t		reserved2[CRAT_CACHE_RESERVED_LENGTH];
+};
+
+/*
+ * HSA TLB Affinity structure and definitions
+ */
+#define CRAT_TLB_FLAGS_ENABLED	0x00000001
+#define CRAT_TLB_FLAGS_DATA_TLB	0x00000002
+#define CRAT_TLB_FLAGS_INST_TLB	0x00000004
+#define CRAT_TLB_FLAGS_CPU_TLB	0x00000008
+#define CRAT_TLB_FLAGS_SIMD_TLB	0x00000010
+#define CRAT_TLB_FLAGS_RESERVED	0xffffffe0
+
+#define CRAT_TLB_RESERVED_LENGTH 4
+
+struct crat_subtype_tlb {
+	uint8_t		type;
+	uint8_t		length;
+	uint16_t	reserved;
+	uint32_t	flags;
+	uint32_t	processor_id_low;
+	uint8_t		sibling_map[CRAT_SIBLINGMAP_SIZE];
+	uint32_t	tlb_level;
+	uint8_t		data_tlb_associativity_2mb;
+	uint8_t		data_tlb_size_2mb;
+	uint8_t		instruction_tlb_associativity_2mb;
+	uint8_t		instruction_tlb_size_2mb;
+	uint8_t		data_tlb_associativity_4k;
+	uint8_t		data_tlb_size_4k;
+	uint8_t		instruction_tlb_associativity_4k;
+	uint8_t		instruction_tlb_size_4k;
+	uint8_t		data_tlb_associativity_1gb;
+	uint8_t		data_tlb_size_1gb;
+	uint8_t		instruction_tlb_associativity_1gb;
+	uint8_t		instruction_tlb_size_1gb;
+	uint8_t		reserved2[CRAT_TLB_RESERVED_LENGTH];
+};
+
+/*
+ * HSA CCompute/APU Affinity structure and definitions
+ */
+#define CRAT_CCOMPUTE_FLAGS_ENABLED	0x00000001
+#define CRAT_CCOMPUTE_FLAGS_RESERVED	0xfffffffe
+
+#define CRAT_CCOMPUTE_RESERVED_LENGTH 16
+
+struct crat_subtype_ccompute {
+	uint8_t		type;
+	uint8_t		length;
+	uint16_t	reserved;
+	uint32_t	flags;
+	uint32_t	processor_id_low;
+	uint8_t		sibling_map[CRAT_SIBLINGMAP_SIZE];
+	uint32_t	apu_size;
+	uint8_t		reserved2[CRAT_CCOMPUTE_RESERVED_LENGTH];
+};
+
+/*
+ * HSA IO Link Affinity structure and definitions
+ */
+#define CRAT_IOLINK_FLAGS_ENABLED	0x00000001
+#define CRAT_IOLINK_FLAGS_COHERENCY	0x00000002
+#define CRAT_IOLINK_FLAGS_RESERVED	0xfffffffc
+
+/*
+ * IO interface types
+ */
+#define CRAT_IOLINK_TYPE_UNDEFINED	0
+#define CRAT_IOLINK_TYPE_HYPERTRANSPORT	1
+#define CRAT_IOLINK_TYPE_PCIEXPRESS	2
+#define CRAT_IOLINK_TYPE_OTHER		3
+#define CRAT_IOLINK_TYPE_MAX		255
+
+#define CRAT_IOLINK_RESERVED_LENGTH 24
+
+struct crat_subtype_iolink {
+	uint8_t		type;
+	uint8_t		length;
+	uint16_t	reserved;
+	uint32_t	flags;
+	uint32_t	proximity_domain_from;
+	uint32_t	proximity_domain_to;
+	uint8_t		io_interface_type;
+	uint8_t		version_major;
+	uint16_t	version_minor;
+	uint32_t	minimum_latency;
+	uint32_t	maximum_latency;
+	uint32_t	minimum_bandwidth_mbs;
+	uint32_t	maximum_bandwidth_mbs;
+	uint32_t	recommended_transfer_size;
+	uint8_t		reserved2[CRAT_IOLINK_RESERVED_LENGTH];
+};
+
+/*
+ * HSA generic sub-type header
+ */
+
+#define CRAT_SUBTYPE_FLAGS_ENABLED 0x00000001
+
+struct crat_subtype_generic {
+	uint8_t		type;
+	uint8_t		length;
+	uint16_t	reserved;
+	uint32_t	flags;
+};
+
+/*
+ * Component Locality Distance Information Table (CDIT)
+ */
+#define CDIT_OEMID_LENGTH	6
+#define CDIT_OEMTABLEID_LENGTH	8
+
+struct cdit_header {
+	uint32_t	signature;
+	uint32_t	length;
+	uint8_t		revision;
+	uint8_t		checksum;
+	uint8_t		oem_id[CDIT_OEMID_LENGTH];
+	uint8_t		oem_table_id[CDIT_OEMTABLEID_LENGTH];
+	uint32_t	oem_revision;
+	uint32_t	creator_id;
+	uint32_t	creator_revision;
+	uint32_t	total_entries;
+	uint16_t	num_domains;
+	uint8_t		entry[1];
+};
+
+#pragma pack()
+
+#endif /* KFD_CRAT_H_INCLUDED */
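Reviewer sketch, not part of the patch: the header says that crat_header is followed by total_entries subtype entries, and every subtype begins with the same type/length/reserved/flags prefix (crat_subtype_generic). The userspace snippet below shows one way to walk such a buffer. It assumes that the length field gives the size in bytes of the whole entry, which the header does not state explicitly. The function name crat_count_subtype is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Same layout as crat_subtype_generic in the header above. */
#pragma pack(1)
struct crat_subtype_generic {
	uint8_t		type;
	uint8_t		length;
	uint16_t	reserved;
	uint32_t	flags;
};
#pragma pack()

/*
 * Hypothetical helper: count the entries of one subtype in a buffer that
 * holds total_entries back-to-back subtype entries. Returns -1 if an entry
 * is shorter than the generic prefix or runs past the end of the buffer.
 */
static int crat_count_subtype(const uint8_t *buf, size_t buf_len,
			      uint32_t total_entries, uint8_t wanted_type)
{
	size_t off = 0;
	int count = 0;
	uint32_t i;

	for (i = 0; i < total_entries; i++) {
		struct crat_subtype_generic hdr;

		if (buf_len - off < sizeof(hdr))
			return -1;
		memcpy(&hdr, buf + off, sizeof(hdr));

		if (hdr.length < sizeof(hdr) || hdr.length > buf_len - off)
			return -1;
		if (hdr.type == wanted_type)
			count++;

		/* Assumption: length covers the whole entry, prefix included. */
		off += hdr.length;
	}

	return count;
}
```

Copying the prefix with memcpy avoids unaligned reads, and the two bounds checks keep a malformed length from walking off the buffer.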
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device.c b/drivers/gpu/drm/amd/amdkfd/kfd_device.c
new file mode 100644
index 000000000000..43884ebd4303
--- /dev/null
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_device.c
@@ -0,0 +1,308 @@
+/*
+ * Copyright 2014 Advanced Micro Devices, Inc.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR
+ * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
+ * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
+ * OTHER DEALINGS IN THE SOFTWARE.
+ */
+
+#include <linux/amd-iommu.h>
+#include <linux/bsearch.h>
+#include <linux/pci.h>
+#include <linux/slab.h>
+#include "kfd_priv.h"
+#include "kfd_device_queue_manager.h"
+
+#define MQD_SIZE_ALIGNED 768
+
+static const struct kfd_device_info kaveri_device_info = {
+	.max_pasid_bits = 16,
+	.ih_ring_entry_size = 4 * sizeof(uint32_t),
+	.mqd_size_aligned = MQD_SIZE_ALIGNED
+};
+
+struct kfd_deviceid {
+	unsigned short did;
+	const struct kfd_device_info *device_info;
+};
+
+/* Please keep this sorted by increasing device id. */
+static const struct kfd_deviceid supported_devices[] = {
+	{ 0x1304, &kaveri_device_info },	/* Kaveri */
+	{ 0x1305, &kaveri_device_info },	/* Kaveri */
+	{ 0x1306, &kaveri_device_info },	/* Kaveri */
+	{ 0x1307, &kaveri_device_info },	/* Kaveri */
+	{ 0x1309, &kaveri_device_info },	/* Kaveri */
+	{ 0x130A, &kaveri_device_info },	/* Kaveri */
+	{ 0x130B, &kaveri_device_info },	/* Kaveri */
+	{ 0x130C, &kaveri_device_info },	/* Kaveri */
+	{ 0x130D, &kaveri_device_info },	/* Kaveri */
+	{ 0x130E, &kaveri_device_info },	/* Kaveri */
+	{ 0x130F, &kaveri_device_info },	/* Kaveri */
+	{ 0x1310, &kaveri_device_info },	/* Kaveri */
+	{ 0x1311, &kaveri_device_info },	/* Kaveri */
+	{ 0x1312, &kaveri_device_info },	/* Kaveri */
+	{ 0x1313, &kaveri_device_info },	/* Kaveri */
+	{ 0x1315, &kaveri_device_info },	/* Kaveri */
+	{ 0x1316, &kaveri_device_info },	/* Kaveri */
+	{ 0x1317, &kaveri_device_info },	/* Kaveri */
+	{ 0x1318, &kaveri_device_info },	/* Kaveri */
+	{ 0x131B, &kaveri_device_info },	/* Kaveri */
+	{ 0x131C, &kaveri_device_info },	/* Kaveri */
+	{ 0x131D, &kaveri_device_info },	/* Kaveri */
+};
+
+static const struct kfd_device_info *lookup_device_info(unsigned short did)
+{
+	size_t i;
+
+	for (i = 0; i < ARRAY_SIZE(supported_devices); i++) {
+		if (supported_devices[i].did == did) {
+			BUG_ON(supported_devices[i].device_info == NULL);
+			return supported_devices[i].device_info;
+		}
+	}
+
+	return NULL;
+}
+
+struct kfd_dev *kgd2kfd_probe(struct kgd_dev *kgd, struct pci_dev *pdev)
+{
+	struct kfd_dev *kfd;
+
+	const struct kfd_device_info *device_info =
+					lookup_device_info(pdev->device);
+
+	if (!device_info)
+		return NULL;
+
+	kfd = kzalloc(sizeof(*kfd), GFP_KERNEL);
+	if (!kfd)
+		return NULL;
+
+	kfd->kgd = kgd;
+	kfd->device_info = device_info;
+	kfd->pdev = pdev;
+	kfd->init_complete = false;
+
+	return kfd;
+}
+
+static bool device_iommu_pasid_init(struct kfd_dev *kfd)
+{
+	const u32 required_iommu_flags = AMD_IOMMU_DEVICE_FLAG_ATS_SUP |
+					AMD_IOMMU_DEVICE_FLAG_PRI_SUP |
+					AMD_IOMMU_DEVICE_FLAG_PASID_SUP;
+
+	struct amd_iommu_device_info iommu_info;
+	unsigned int pasid_limit;
+	int err;
+
+	err = amd_iommu_device_info(kfd->pdev, &iommu_info);
+	if (err < 0) {
+		dev_err(kfd_device,
+			"error getting iommu info. Is the iommu enabled?\n");
+		return false;
+	}
+
+	if ((iommu_info.flags & required_iommu_flags) != required_iommu_flags) {
+		dev_err(kfd_device, "error: missing required iommu flags: ats(%i), pri(%i), pasid(%i)\n",
+		       (iommu_info.flags & AMD_IOMMU_DEVICE_FLAG_ATS_SUP) != 0,
+		       (iommu_info.flags & AMD_IOMMU_DEVICE_FLAG_PRI_SUP) != 0,
+		       (iommu_info.flags & AMD_IOMMU_DEVICE_FLAG_PASID_SUP) != 0);
+		return false;
+	}
+
+	pasid_limit = min_t(unsigned int,
+			(unsigned int)1 << kfd->device_info->max_pasid_bits,
+			iommu_info.max_pasids);
+	/*
+	 * The last pasid is reserved for kernel queue doorbells.
+	 * In the future it might be used for a kernel thread.
+	 */
+	pasid_limit = min_t(unsigned int,
+				pasid_limit,
+				kfd->doorbell_process_limit - 1);
+
+	err = amd_iommu_init_device(kfd->pdev, pasid_limit);
+	if (err < 0) {
+		dev_err(kfd_device, "error initializing iommu device\n");
+		return false;
+	}
+
+	if (!kfd_set_pasid_limit(pasid_limit)) {
+		dev_err(kfd_device, "error setting pasid limit\n");
+		amd_iommu_free_device(kfd->pdev);
+		return false;
+	}
+
+	return true;
+}
+
+static void iommu_pasid_shutdown_callback(struct pci_dev *pdev, int pasid)
+{
+	struct kfd_dev *dev = kfd_device_by_pci_dev(pdev);
+
+	if (dev)
+		kfd_unbind_process_from_device(dev, pasid);
+}
+
+bool kgd2kfd_device_init(struct kfd_dev *kfd,
+			 const struct kgd2kfd_shared_resources *gpu_resources)
+{
+	unsigned int size;
+
+	kfd->shared_resources = *gpu_resources;
+
+	/* calculate max size of mqds needed for queues */
+	size = max_num_of_processes *
+		max_num_of_queues_per_process *
+		kfd->device_info->mqd_size_aligned;
+
+	/* add another 512KB for all other allocations on gart */
+	size += 512 * 1024;
+
+	if (kfd2kgd->init_sa_manager(kfd->kgd, size)) {
+		dev_err(kfd_device,
+			"Error initializing sa manager for device (%x:%x)\n",
+			kfd->pdev->vendor, kfd->pdev->device);
+		goto out;
+	}
+
+	kfd_doorbell_init(kfd);
+
+	if (kfd_topology_add_device(kfd) != 0) {
+		dev_err(kfd_device,
+			"Error adding device (%x:%x) to topology\n",
+			kfd->pdev->vendor, kfd->pdev->device);
+		goto kfd_topology_add_device_error;
+	}
+
+	if (kfd_interrupt_init(kfd)) {
+		dev_err(kfd_device,
+			"Error initializing interrupts for device (%x:%x)\n",
+			kfd->pdev->vendor, kfd->pdev->device);
+		goto kfd_interrupt_error;
+	}
+
+	if (!device_iommu_pasid_init(kfd)) {
+		dev_err(kfd_device,
+			"Error initializing iommuv2 for device (%x:%x)\n",
+			kfd->pdev->vendor, kfd->pdev->device);
+		goto device_iommu_pasid_error;
+	}
+	amd_iommu_set_invalidate_ctx_cb(kfd->pdev,
+						iommu_pasid_shutdown_callback);
+
+	kfd->dqm = device_queue_manager_init(kfd);
+	if (!kfd->dqm) {
+		dev_err(kfd_device,
+			"Error initializing queue manager for device (%x:%x)\n",
+			kfd->pdev->vendor, kfd->pdev->device);
+		goto device_queue_manager_error;
+	}
+
+	if (kfd->dqm->start(kfd->dqm) != 0) {
+		dev_err(kfd_device,
+			"Error starting queue manager for device (%x:%x)\n",
+			kfd->pdev->vendor, kfd->pdev->device);
+		goto dqm_start_error;
+	}
+
+	kfd->init_complete = true;
+	dev_info(kfd_device, "added device (%x:%x)\n", kfd->pdev->vendor,
+		 kfd->pdev->device);
+
+	pr_debug("kfd: Starting kfd with the following scheduling policy %d\n",
+		sched_policy);
+
+	goto out;
+
+dqm_start_error:
+	device_queue_manager_uninit(kfd->dqm);
+device_queue_manager_error:
+	amd_iommu_free_device(kfd->pdev);
+device_iommu_pasid_error:
+	kfd_interrupt_exit(kfd);
+kfd_interrupt_error:
+	kfd_topology_remove_device(kfd);
+kfd_topology_add_device_error:
+	kfd2kgd->fini_sa_manager(kfd->kgd);
+	dev_err(kfd_device,
+		"device (%x:%x) NOT added due to errors\n",
+		kfd->pdev->vendor, kfd->pdev->device);
+out:
+	return kfd->init_complete;
+}
+
+void kgd2kfd_device_exit(struct kfd_dev *kfd)
+{
+	if (kfd->init_complete) {
+		device_queue_manager_uninit(kfd->dqm);
+		amd_iommu_free_device(kfd->pdev);
+		kfd_interrupt_exit(kfd);
+		kfd_topology_remove_device(kfd);
+	}
+
+	kfree(kfd);
+}
+
+void kgd2kfd_suspend(struct kfd_dev *kfd)
+{
+	BUG_ON(kfd == NULL);
+
+	if (kfd->init_complete) {
+		kfd->dqm->stop(kfd->dqm);
+		amd_iommu_set_invalidate_ctx_cb(kfd->pdev, NULL);
+		amd_iommu_free_device(kfd->pdev);
+	}
+}
+
+int kgd2kfd_resume(struct kfd_dev *kfd)
+{
+	unsigned int pasid_limit;
+	int err;
+
+	BUG_ON(kfd == NULL);
+
+	pasid_limit = kfd_get_pasid_limit();
+
+	if (kfd->init_complete) {
+		err = amd_iommu_init_device(kfd->pdev, pasid_limit);
+		if (err < 0)
+			return -ENXIO;
+		amd_iommu_set_invalidate_ctx_cb(kfd->pdev,
+						iommu_pasid_shutdown_callback);
+		kfd->dqm->start(kfd->dqm);
+	}
+
+	return 0;
+}
+
+/* This is called directly from KGD at ISR. */
+void kgd2kfd_interrupt(struct kfd_dev *kfd, const void *ih_ring_entry)
+{
+	if (kfd->init_complete) {
+		spin_lock(&kfd->interrupt_lock);
+
+		if (kfd->interrupts_active
+		    && enqueue_ih_ring_entry(kfd, ih_ring_entry))
+			schedule_work(&kfd->interrupt_work);
+
+		spin_unlock(&kfd->interrupt_lock);
+	}
+}
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
new file mode 100644
index 000000000000..924e90c072e5
--- /dev/null
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
@@ -0,0 +1,1062 @@
+/*
+ * Copyright 2014 Advanced Micro Devices, Inc.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR
+ * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
+ * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
+ * OTHER DEALINGS IN THE SOFTWARE.
+ *
+ */
+
+#include <linux/slab.h>
+#include <linux/list.h>
+#include <linux/types.h>
+#include <linux/printk.h>
+#include <linux/bitops.h>
+#include "kfd_priv.h"
+#include "kfd_device_queue_manager.h"
+#include "kfd_mqd_manager.h"
+#include "cik_regs.h"
+#include "kfd_kernel_queue.h"
+#include "../../radeon/cik_reg.h"
+
+/* Size of the per-pipe EOP queue */
+#define CIK_HPD_EOP_BYTES_LOG2 11
+#define CIK_HPD_EOP_BYTES (1U << CIK_HPD_EOP_BYTES_LOG2)
+
+static bool is_mem_initialized;
+
+static int init_memory(struct device_queue_manager *dqm);
+static int set_pasid_vmid_mapping(struct device_queue_manager *dqm,
+					unsigned int pasid, unsigned int vmid);
+
+static int create_compute_queue_nocpsch(struct device_queue_manager *dqm,
+					struct queue *q,
+					struct qcm_process_device *qpd);
+static int execute_queues_cpsch(struct device_queue_manager *dqm, bool lock);
+static int destroy_queues_cpsch(struct device_queue_manager *dqm, bool lock);
+
+
+static inline unsigned int get_pipes_num(struct device_queue_manager *dqm)
+{
+	BUG_ON(!dqm || !dqm->dev);
+	return dqm->dev->shared_resources.compute_pipe_count;
+}
+
+static inline unsigned int get_first_pipe(struct device_queue_manager *dqm)
+{
+	BUG_ON(!dqm);
+	return dqm->dev->shared_resources.first_compute_pipe;
+}
+
+static inline unsigned int get_pipes_num_cpsch(void)
+{
+	return PIPE_PER_ME_CP_SCHEDULING;
+}
+
+static inline unsigned int
+get_sh_mem_bases_nybble_64(struct kfd_process_device *pdd)
+{
+	uint32_t nybble;
+
+	nybble = (pdd->lds_base >> 60) & 0x0E;
+
+	return nybble;
+
+}
+
+static inline unsigned int get_sh_mem_bases_32(struct kfd_process_device *pdd)
+{
+	unsigned int shared_base;
+
+	shared_base = (pdd->lds_base >> 16) & 0xFF;
+
+	return shared_base;
+}
+
+static uint32_t compute_sh_mem_bases_64bit(unsigned int top_address_nybble);
+static void init_process_memory(struct device_queue_manager *dqm,
+				struct qcm_process_device *qpd)
+{
+	struct kfd_process_device *pdd;
+	unsigned int temp;
+
+	BUG_ON(!dqm || !qpd);
+
+	pdd = qpd_to_pdd(qpd);
+
+	/* check if sh_mem_config register already configured */
+	if (qpd->sh_mem_config == 0) {
+		qpd->sh_mem_config =
+			ALIGNMENT_MODE(SH_MEM_ALIGNMENT_MODE_UNALIGNED) |
+			DEFAULT_MTYPE(MTYPE_NONCACHED) |
+			APE1_MTYPE(MTYPE_NONCACHED);
+		qpd->sh_mem_ape1_limit = 0;
+		qpd->sh_mem_ape1_base = 0;
+	}
+
+	if (qpd->pqm->process->is_32bit_user_mode) {
+		temp = get_sh_mem_bases_32(pdd);
+		qpd->sh_mem_bases = SHARED_BASE(temp);
+		qpd->sh_mem_config |= PTR32;
+	} else {
+		temp = get_sh_mem_bases_nybble_64(pdd);
+		qpd->sh_mem_bases = compute_sh_mem_bases_64bit(temp);
+	}
+
+	pr_debug("kfd: is32bit process: %d sh_mem_bases nybble: 0x%X and register 0x%X\n",
+		qpd->pqm->process->is_32bit_user_mode, temp, qpd->sh_mem_bases);
+}
+
+static void program_sh_mem_settings(struct device_queue_manager *dqm,
+					struct qcm_process_device *qpd)
+{
+	return kfd2kgd->program_sh_mem_settings(dqm->dev->kgd, qpd->vmid,
+						qpd->sh_mem_config,
+						qpd->sh_mem_ape1_base,
+						qpd->sh_mem_ape1_limit,
+						qpd->sh_mem_bases);
+}
+
+static int allocate_vmid(struct device_queue_manager *dqm,
+			struct qcm_process_device *qpd,
+			struct queue *q)
+{
+	int bit, allocated_vmid;
+
+	if (dqm->vmid_bitmap == 0)
+		return -ENOMEM;
+
+	bit = find_first_bit((unsigned long *)&dqm->vmid_bitmap, CIK_VMID_NUM);
+	clear_bit(bit, (unsigned long *)&dqm->vmid_bitmap);
+
+	/* Kaveri kfd vmids start from vmid 8 */
+	allocated_vmid = bit + KFD_VMID_START_OFFSET;
+	pr_debug("kfd: vmid allocation %d\n", allocated_vmid);
+	qpd->vmid = allocated_vmid;
+	q->properties.vmid = allocated_vmid;
+
+	set_pasid_vmid_mapping(dqm, q->process->pasid, q->properties.vmid);
+	program_sh_mem_settings(dqm, qpd);
+
+	return 0;
+}
+
+static void deallocate_vmid(struct device_queue_manager *dqm,
+				struct qcm_process_device *qpd,
+				struct queue *q)
+{
+	int bit = qpd->vmid - KFD_VMID_START_OFFSET;
+
+	set_bit(bit, (unsigned long *)&dqm->vmid_bitmap);
+	qpd->vmid = 0;
+	q->properties.vmid = 0;
+}
+
+static int create_queue_nocpsch(struct device_queue_manager *dqm,
+				struct queue *q,
+				struct qcm_process_device *qpd,
+				int *allocated_vmid)
+{
+	int retval;
+
+	BUG_ON(!dqm || !q || !qpd || !allocated_vmid);
+
+	pr_debug("kfd: In func %s\n", __func__);
+	print_queue(q);
+
+	mutex_lock(&dqm->lock);
+
+	if (list_empty(&qpd->queues_list)) {
+		retval = allocate_vmid(dqm, qpd, q);
+		if (retval != 0) {
+			mutex_unlock(&dqm->lock);
+			return retval;
+		}
+	}
+	*allocated_vmid = qpd->vmid;
+	q->properties.vmid = qpd->vmid;
+
+	retval = create_compute_queue_nocpsch(dqm, q, qpd);
+
+	if (retval != 0) {
+		if (list_empty(&qpd->queues_list)) {
+			deallocate_vmid(dqm, qpd, q);
+			*allocated_vmid = 0;
+		}
+		mutex_unlock(&dqm->lock);
+		return retval;
+	}
+
+	list_add(&q->list, &qpd->queues_list);
+	dqm->queue_count++;
+
+	mutex_unlock(&dqm->lock);
+	return 0;
+}
+
+static int allocate_hqd(struct device_queue_manager *dqm, struct queue *q)
+{
+	bool set;
+	int pipe, bit;
+
+	set = false;
+
+	for (pipe = dqm->next_pipe_to_allocate; pipe < get_pipes_num(dqm);
+			pipe = (pipe + 1) % get_pipes_num(dqm)) {
+		if (dqm->allocated_queues[pipe] != 0) {
+			bit = find_first_bit(
+				(unsigned long *)&dqm->allocated_queues[pipe],
+				QUEUES_PER_PIPE);
+
+			clear_bit(bit,
+				(unsigned long *)&dqm->allocated_queues[pipe]);
+			q->pipe = pipe;
+			q->queue = bit;
+			set = true;
+			break;
+		}
+	}
+
+	if (set == false)
+		return -EBUSY;
+
+	pr_debug("kfd: DQM %s hqd slot - pipe (%d) queue(%d)\n",
+				__func__, q->pipe, q->queue);
+	/* horizontal hqd allocation */
+	dqm->next_pipe_to_allocate = (pipe + 1) % get_pipes_num(dqm);
+
+	return 0;
+}
+
+static inline void deallocate_hqd(struct device_queue_manager *dqm,
+				struct queue *q)
+{
+	set_bit(q->queue, (unsigned long *)&dqm->allocated_queues[q->pipe]);
+}
+
+static int create_compute_queue_nocpsch(struct device_queue_manager *dqm,
+					struct queue *q,
+					struct qcm_process_device *qpd)
+{
+	int retval;
+	struct mqd_manager *mqd;
+
+	BUG_ON(!dqm || !q || !qpd);
+
+	mqd = dqm->get_mqd_manager(dqm, KFD_MQD_TYPE_CIK_COMPUTE);
+	if (mqd == NULL)
+		return -ENOMEM;
+
+	retval = allocate_hqd(dqm, q);
+	if (retval != 0)
+		return retval;
+
+	retval = mqd->init_mqd(mqd, &q->mqd, &q->mqd_mem_obj,
+				&q->gart_mqd_addr, &q->properties);
+	if (retval != 0) {
+		deallocate_hqd(dqm, q);
+		return retval;
+	}
+
+	return 0;
+}
+
+static int destroy_queue_nocpsch(struct device_queue_manager *dqm,
+				struct qcm_process_device *qpd,
+				struct queue *q)
+{
+	int retval;
+	struct mqd_manager *mqd;
+
+	BUG_ON(!dqm || !q || !q->mqd || !qpd);
+
+	retval = 0;
+
+	pr_debug("kfd: In func %s\n", __func__);
+
+	mutex_lock(&dqm->lock);
+	mqd = dqm->get_mqd_manager(dqm, KFD_MQD_TYPE_CIK_COMPUTE);
+	if (mqd == NULL) {
+		retval = -ENOMEM;
+		goto out;
+	}
+
+	retval = mqd->destroy_mqd(mqd, q->mqd,
+				KFD_PREEMPT_TYPE_WAVEFRONT,
+				QUEUE_PREEMPT_DEFAULT_TIMEOUT_MS,
+				q->pipe, q->queue);
+
+	if (retval != 0)
+		goto out;
+
+	deallocate_hqd(dqm, q);
+
+	mqd->uninit_mqd(mqd, q->mqd, q->mqd_mem_obj);
+
+	list_del(&q->list);
+	if (list_empty(&qpd->queues_list))
+		deallocate_vmid(dqm, qpd, q);
+	dqm->queue_count--;
+out:
+	mutex_unlock(&dqm->lock);
+	return retval;
+}
+
+static int update_queue(struct device_queue_manager *dqm, struct queue *q)
+{
+	int retval;
+	struct mqd_manager *mqd;
+
+	BUG_ON(!dqm || !q || !q->mqd);
+
+	mutex_lock(&dqm->lock);
+	mqd = dqm->get_mqd_manager(dqm, KFD_MQD_TYPE_CIK_COMPUTE);
+	if (mqd == NULL) {
+		mutex_unlock(&dqm->lock);
+		return -ENOMEM;
+	}
+
+	retval = mqd->update_mqd(mqd, q->mqd, &q->properties);
+	if (q->properties.is_active == true)
+		dqm->queue_count++;
+	else
+		dqm->queue_count--;
+
+	if (sched_policy != KFD_SCHED_POLICY_NO_HWS)
+		retval = execute_queues_cpsch(dqm, false);
+
+	mutex_unlock(&dqm->lock);
+	return retval;
+}
+
+static struct mqd_manager *get_mqd_manager_nocpsch(
+		struct device_queue_manager *dqm, enum KFD_MQD_TYPE type)
+{
+	struct mqd_manager *mqd;
+
+	BUG_ON(!dqm || type >= KFD_MQD_TYPE_MAX);
+
+	pr_debug("kfd: In func %s mqd type %d\n", __func__, type);
+
+	mqd = dqm->mqds[type];
+	if (!mqd) {
+		mqd = mqd_manager_init(type, dqm->dev);
+		if (mqd == NULL)
+			pr_err("kfd: mqd manager is NULL\n");
+		dqm->mqds[type] = mqd;
+	}
+
+	return mqd;
+}
+
+static int register_process_nocpsch(struct device_queue_manager *dqm,
+					struct qcm_process_device *qpd)
+{
+	struct device_process_node *n;
+
+	BUG_ON(!dqm || !qpd);
+
+	pr_debug("kfd: In func %s\n", __func__);
+
+	n = kzalloc(sizeof(struct device_process_node), GFP_KERNEL);
+	if (!n)
+		return -ENOMEM;
+
+	n->qpd = qpd;
+
+	mutex_lock(&dqm->lock);
+	list_add(&n->list, &dqm->queues);
+
+	init_process_memory(dqm, qpd);
+	dqm->processes_count++;
+
+	mutex_unlock(&dqm->lock);
+
+	return 0;
+}
+
+static int unregister_process_nocpsch(struct device_queue_manager *dqm,
+					struct qcm_process_device *qpd)
+{
+	int retval;
+	struct device_process_node *cur, *next;
+
+	BUG_ON(!dqm || !qpd);
+
+	BUG_ON(!list_empty(&qpd->queues_list));
+
+	pr_debug("kfd: In func %s\n", __func__);
+
+	retval = 0;
+	mutex_lock(&dqm->lock);
+
+	list_for_each_entry_safe(cur, next, &dqm->queues, list) {
+		if (qpd == cur->qpd) {
+			list_del(&cur->list);
+			kfree(cur);
+			dqm->processes_count--;
+			goto out;
+		}
+	}
+	/* qpd not found in dqm list */
+	retval = 1;
+out:
+	mutex_unlock(&dqm->lock);
+	return retval;
+}
+
+static int
+set_pasid_vmid_mapping(struct device_queue_manager *dqm, unsigned int pasid,
+			unsigned int vmid)
+{
+	uint32_t pasid_mapping;
+
+	pasid_mapping = (pasid == 0) ? 0 :
+		((uint32_t)pasid | ATC_VMID_PASID_MAPPING_VALID);
+	return kfd2kgd->set_pasid_vmid_mapping(dqm->dev->kgd, pasid_mapping,
+						vmid);
+}
+
+static uint32_t compute_sh_mem_bases_64bit(unsigned int top_address_nybble)
+{
+	/* In 64-bit mode, we can only control the top 3 bits of the LDS,
+	 * scratch and GPUVM apertures.
+	 * The hardware fills in the remaining 61 bits according to the
+	 * following pattern:
+	 * LDS:		X0000000'00000000 - X0000001'00000000 (4GB)
+	 * Scratch:	X0000001'00000000 - X0000002'00000000 (4GB)
+	 * GPUVM:	Y0010000'00000000 - Y0020000'00000000 (1TB)
+	 *
+	 * (where X/Y is the configurable nybble with the low-bit 0)
+	 *
+	 * LDS and scratch will have the same top nybble programmed in the
+	 * top 3 bits of SH_MEM_BASES.PRIVATE_BASE.
+	 * GPUVM can have a different top nybble programmed in the
+	 * top 3 bits of SH_MEM_BASES.SHARED_BASE.
+	 * We don't bother to support different top nybbles
+	 * for LDS/Scratch and GPUVM.
+	 */
+
+	BUG_ON((top_address_nybble & 1) || top_address_nybble > 0xE ||
+		top_address_nybble == 0);
+
+	return PRIVATE_BASE(top_address_nybble << 12) |
+			SHARED_BASE(top_address_nybble << 12);
+}
+
+static int init_memory(struct device_queue_manager *dqm)
+{
+	int i, retval;
+
+	for (i = 8; i < 16; i++)
+		set_pasid_vmid_mapping(dqm, 0, i);
+
+	retval = kfd2kgd->init_memory(dqm->dev->kgd);
+	if (retval == 0)
+		is_mem_initialized = true;
+	return retval;
+}
+
+
+static int init_pipelines(struct device_queue_manager *dqm,
+			unsigned int pipes_num, unsigned int first_pipe)
+{
+	void *hpdptr;
+	struct mqd_manager *mqd;
+	unsigned int i, err, inx;
+	uint64_t pipe_hpd_addr;
+
+	BUG_ON(!dqm || !dqm->dev);
+
+	pr_debug("kfd: In func %s\n", __func__);
+
+	/*
+	 * Allocate memory for the HPDs. This is hardware-owned per-pipe data.
+	 * The driver never accesses this memory after zeroing it.
+	 * It doesn't even have to be saved/restored on suspend/resume
+	 * because it contains no data when there are no active queues.
+	 */
+
+	err = kfd2kgd->allocate_mem(dqm->dev->kgd,
+				CIK_HPD_EOP_BYTES * pipes_num,
+				PAGE_SIZE,
+				KFD_MEMPOOL_SYSTEM_WRITECOMBINE,
+				(struct kgd_mem **) &dqm->pipeline_mem);
+
+	if (err) {
+		pr_err("kfd: error allocating vidmem, num pipes: %d\n",
+			pipes_num);
+		return -ENOMEM;
+	}
+
+	hpdptr = dqm->pipeline_mem->cpu_ptr;
+	dqm->pipelines_addr = dqm->pipeline_mem->gpu_addr;
+
+	memset(hpdptr, 0, CIK_HPD_EOP_BYTES * pipes_num);
+
+	mqd = dqm->get_mqd_manager(dqm, KFD_MQD_TYPE_CIK_COMPUTE);
+	if (mqd == NULL) {
+		kfd2kgd->free_mem(dqm->dev->kgd,
+				(struct kgd_mem *) dqm->pipeline_mem);
+		return -ENOMEM;
+	}
+
+	for (i = 0; i < pipes_num; i++) {
+		inx = i + first_pipe;
+		pipe_hpd_addr = dqm->pipelines_addr + i * CIK_HPD_EOP_BYTES;
+		pr_debug("kfd: pipeline address %llX\n", pipe_hpd_addr);
+		/* = log2(bytes/4)-1 */
+		kfd2kgd->init_pipeline(dqm->dev->kgd, inx,
+				CIK_HPD_EOP_BYTES_LOG2 - 3, pipe_hpd_addr);
+	}
+
+	return 0;
+}
+
+
+static int init_scheduler(struct device_queue_manager *dqm)
+{
+	int retval;
+
+	BUG_ON(!dqm);
+
+	pr_debug("kfd: In func %s\n", __func__);
+
+	retval = init_pipelines(dqm, get_pipes_num(dqm), KFD_DQM_FIRST_PIPE);
+	if (retval != 0)
+		return retval;
+
+	retval = init_memory(dqm);
+
+	return retval;
+}
+
+static int initialize_nocpsch(struct device_queue_manager *dqm)
+{
+	int i;
+
+	BUG_ON(!dqm);
+
+	pr_debug("kfd: In func %s num of pipes: %d\n",
+			__func__, get_pipes_num(dqm));
+
+	mutex_init(&dqm->lock);
+	INIT_LIST_HEAD(&dqm->queues);
+	dqm->queue_count = dqm->next_pipe_to_allocate = 0;
+	dqm->allocated_queues = kcalloc(get_pipes_num(dqm),
+					sizeof(unsigned int), GFP_KERNEL);
+	if (!dqm->allocated_queues) {
+		mutex_destroy(&dqm->lock);
+		return -ENOMEM;
+	}
+
+	for (i = 0; i < get_pipes_num(dqm); i++)
+		dqm->allocated_queues[i] = (1 << QUEUES_PER_PIPE) - 1;
+
+	dqm->vmid_bitmap = (1 << VMID_PER_DEVICE) - 1;
+
+	init_scheduler(dqm);
+	return 0;
+}
+
+static void uninitialize_nocpsch(struct device_queue_manager *dqm)
+{
+	int i;
+
+	BUG_ON(!dqm);
+
+	BUG_ON(dqm->queue_count > 0 || dqm->processes_count > 0);
+
+	kfree(dqm->allocated_queues);
+	for (i = 0 ; i < KFD_MQD_TYPE_MAX ; i++)
+		kfree(dqm->mqds[i]);
+	mutex_destroy(&dqm->lock);
+	kfd2kgd->free_mem(dqm->dev->kgd,
+			(struct kgd_mem *) dqm->pipeline_mem);
+}
+
+static int start_nocpsch(struct device_queue_manager *dqm)
+{
+	return 0;
+}
+
+static int stop_nocpsch(struct device_queue_manager *dqm)
+{
+	return 0;
+}
+
+/*
+ * Device Queue Manager implementation for cp scheduler
+ */
+
+static int set_sched_resources(struct device_queue_manager *dqm)
+{
+	struct scheduling_resources res;
+	unsigned int queue_num, queue_mask;
+
+	BUG_ON(!dqm);
+
+	pr_debug("kfd: In func %s\n", __func__);
+
+	queue_num = get_pipes_num_cpsch() * QUEUES_PER_PIPE;
+	queue_mask = (1 << queue_num) - 1;
+	res.vmid_mask = (1 << VMID_PER_DEVICE) - 1;
+	res.vmid_mask <<= KFD_VMID_START_OFFSET;
+	res.queue_mask = queue_mask << (get_first_pipe(dqm) * QUEUES_PER_PIPE);
+	res.gws_mask = res.oac_mask = res.gds_heap_base =
+						res.gds_heap_size = 0;
+
+	pr_debug("kfd: scheduling resources:\n"
+			"      vmid mask: 0x%8X\n"
+			"      queue mask: 0x%8llX\n",
+			res.vmid_mask, res.queue_mask);
+
+	return pm_send_set_resources(&dqm->packets, &res);
+}
+
+static int initialize_cpsch(struct device_queue_manager *dqm)
+{
+	int retval;
+
+	BUG_ON(!dqm);
+
+	pr_debug("kfd: In func %s num of pipes: %d\n",
+			__func__, get_pipes_num_cpsch());
+
+	mutex_init(&dqm->lock);
+	INIT_LIST_HEAD(&dqm->queues);
+	dqm->queue_count = dqm->processes_count = 0;
+	dqm->active_runlist = false;
+	retval = init_pipelines(dqm, get_pipes_num(dqm), 0);
+	if (retval != 0)
+		goto fail_init_pipelines;
+
+	return 0;
+
+fail_init_pipelines:
+	mutex_destroy(&dqm->lock);
+	return retval;
+}
+
+static int start_cpsch(struct device_queue_manager *dqm)
+{
+	struct device_process_node *node;
+	int retval;
+
+	BUG_ON(!dqm);
+
+	retval = 0;
+
+	retval = pm_init(&dqm->packets, dqm);
+	if (retval != 0)
+		goto fail_packet_manager_init;
+
+	retval = set_sched_resources(dqm);
+	if (retval != 0)
+		goto fail_set_sched_resources;
+
+	pr_debug("kfd: allocating fence memory\n");
+
+	/* allocate fence memory on the gart */
+	retval = kfd2kgd->allocate_mem(dqm->dev->kgd,
+					sizeof(*dqm->fence_addr),
+					32,
+					KFD_MEMPOOL_SYSTEM_WRITECOMBINE,
+					(struct kgd_mem **) &dqm->fence_mem);
+
+	if (retval != 0)
+		goto fail_allocate_vidmem;
+
+	dqm->fence_addr = dqm->fence_mem->cpu_ptr;
+	dqm->fence_gpu_addr = dqm->fence_mem->gpu_addr;
+
+	list_for_each_entry(node, &dqm->queues, list)
+		if (node->qpd->pqm->process && dqm->dev)
+			kfd_bind_process_to_device(dqm->dev,
+						node->qpd->pqm->process);
+
+	execute_queues_cpsch(dqm, true);
+
+	return 0;
+fail_allocate_vidmem:
+fail_set_sched_resources:
+	pm_uninit(&dqm->packets);
+fail_packet_manager_init:
+	return retval;
+}
+
+static int stop_cpsch(struct device_queue_manager *dqm)
+{
+	struct device_process_node *node;
+	struct kfd_process_device *pdd;
+
+	BUG_ON(!dqm);
+
+	destroy_queues_cpsch(dqm, true);
+
+	list_for_each_entry(node, &dqm->queues, list) {
+		pdd = qpd_to_pdd(node->qpd);
+		pdd->bound = false;
+	}
+	kfd2kgd->free_mem(dqm->dev->kgd,
+			(struct kgd_mem *) dqm->fence_mem);
+	pm_uninit(&dqm->packets);
+
+	return 0;
+}
+
+static int create_kernel_queue_cpsch(struct device_queue_manager *dqm,
+					struct kernel_queue *kq,
+					struct qcm_process_device *qpd)
+{
+	BUG_ON(!dqm || !kq || !qpd);
+
+	pr_debug("kfd: In func %s\n", __func__);
+
+	mutex_lock(&dqm->lock);
+	list_add(&kq->list, &qpd->priv_queue_list);
+	dqm->queue_count++;
+	qpd->is_debug = true;
+	execute_queues_cpsch(dqm, false);
+	mutex_unlock(&dqm->lock);
+
+	return 0;
+}
+
+static void destroy_kernel_queue_cpsch(struct device_queue_manager *dqm,
+					struct kernel_queue *kq,
+					struct qcm_process_device *qpd)
+{
+	BUG_ON(!dqm || !kq);
+
+	pr_debug("kfd: In func %s\n", __func__);
+
+	mutex_lock(&dqm->lock);
+	destroy_queues_cpsch(dqm, false);
+	list_del(&kq->list);
+	dqm->queue_count--;
+	qpd->is_debug = false;
+	execute_queues_cpsch(dqm, false);
+	mutex_unlock(&dqm->lock);
+}
+
+static int create_queue_cpsch(struct device_queue_manager *dqm, struct queue *q,
+			struct qcm_process_device *qpd, int *allocate_vmid)
+{
+	int retval;
+	struct mqd_manager *mqd;
+
+	BUG_ON(!dqm || !q || !qpd);
+
+	retval = 0;
+
+	if (allocate_vmid)
+		*allocate_vmid = 0;
+
+	mutex_lock(&dqm->lock);
+
+	mqd = dqm->get_mqd_manager(dqm, KFD_MQD_TYPE_CIK_CP);
+	if (mqd == NULL) {
+		mutex_unlock(&dqm->lock);
+		return -ENOMEM;
+	}
+
+	retval = mqd->init_mqd(mqd, &q->mqd, &q->mqd_mem_obj,
+				&q->gart_mqd_addr, &q->properties);
+	if (retval != 0)
+		goto out;
+
+	list_add(&q->list, &qpd->queues_list);
+	if (q->properties.is_active) {
+		dqm->queue_count++;
+		retval = execute_queues_cpsch(dqm, false);
+	}
+
+out:
+	mutex_unlock(&dqm->lock);
+	return retval;
+}
+
+static int fence_wait_timeout(unsigned int *fence_addr,
+				unsigned int fence_value,
+				unsigned long timeout)
+{
+	BUG_ON(!fence_addr);
+	timeout = jiffies + msecs_to_jiffies(timeout);
+
+	while (*fence_addr != fence_value) {
+		if (time_after(jiffies, timeout)) {
+			pr_err("kfd: qcm fence wait loop timeout expired\n");
+			return -ETIME;
+		}
+		cpu_relax();
+	}
+
+	return 0;
+}
+
+static int destroy_queues_cpsch(struct device_queue_manager *dqm, bool lock)
+{
+	int retval;
+
+	BUG_ON(!dqm);
+
+	retval = 0;
+
+	if (lock)
+		mutex_lock(&dqm->lock);
+	if (dqm->active_runlist == false)
+		goto out;
+	retval = pm_send_unmap_queue(&dqm->packets, KFD_QUEUE_TYPE_COMPUTE,
+			KFD_PREEMPT_TYPE_FILTER_ALL_QUEUES, 0, false, 0);
+	if (retval != 0)
+		goto out;
+
+	*dqm->fence_addr = KFD_FENCE_INIT;
+	pm_send_query_status(&dqm->packets, dqm->fence_gpu_addr,
+				KFD_FENCE_COMPLETED);
+	/* bounded wait: give up once the preemption timeout expires */
+	fence_wait_timeout(dqm->fence_addr, KFD_FENCE_COMPLETED,
+				QUEUE_PREEMPT_DEFAULT_TIMEOUT_MS);
+	pm_release_ib(&dqm->packets);
+	dqm->active_runlist = false;
+
+out:
+	if (lock)
+		mutex_unlock(&dqm->lock);
+	return retval;
+}
+
+static int execute_queues_cpsch(struct device_queue_manager *dqm, bool lock)
+{
+	int retval;
+
+	BUG_ON(!dqm);
+
+	if (lock)
+		mutex_lock(&dqm->lock);
+
+	retval = destroy_queues_cpsch(dqm, false);
+	if (retval != 0) {
+		pr_err("kfd: the cp might be in an unrecoverable state due to an unsuccessful queue preemption\n");
+		goto out;
+	}
+
+	if (dqm->queue_count <= 0 || dqm->processes_count <= 0) {
+		retval = 0;
+		goto out;
+	}
+
+	if (dqm->active_runlist) {
+		retval = 0;
+		goto out;
+	}
+
+	retval = pm_send_runlist(&dqm->packets, &dqm->queues);
+	if (retval != 0) {
+		pr_err("kfd: failed to execute runlist\n");
+		goto out;
+	}
+	dqm->active_runlist = true;
+
+out:
+	if (lock)
+		mutex_unlock(&dqm->lock);
+	return retval;
+}
+
+static int destroy_queue_cpsch(struct device_queue_manager *dqm,
+				struct qcm_process_device *qpd,
+				struct queue *q)
+{
+	int retval;
+	struct mqd_manager *mqd;
+
+	BUG_ON(!dqm || !qpd || !q);
+
+	retval = 0;
+
+	/* remove queue from list to prevent rescheduling after preemption */
+	mutex_lock(&dqm->lock);
+
+	mqd = dqm->get_mqd_manager(dqm, KFD_MQD_TYPE_CIK_CP);
+	if (!mqd) {
+		retval = -ENOMEM;
+		goto failed;
+	}
+
+	list_del(&q->list);
+	dqm->queue_count--;
+
+	execute_queues_cpsch(dqm, false);
+
+	mqd->uninit_mqd(mqd, q->mqd, q->mqd_mem_obj);
+
+	mutex_unlock(&dqm->lock);
+
+	return 0;
+
+failed:
+	mutex_unlock(&dqm->lock);
+	return retval;
+}
+
+/*
+ * Low bits must be 0000/FFFF as required by HW, high bits must be 0 to
+ * stay in user mode.
+ */
+#define APE1_FIXED_BITS_MASK 0xFFFF80000000FFFFULL
+/* APE1 limit is inclusive and 64K aligned. */
+#define APE1_LIMIT_ALIGNMENT 0xFFFF
+
+static bool set_cache_memory_policy(struct device_queue_manager *dqm,
+				   struct qcm_process_device *qpd,
+				   enum cache_policy default_policy,
+				   enum cache_policy alternate_policy,
+				   void __user *alternate_aperture_base,
+				   uint64_t alternate_aperture_size)
+{
+	uint32_t default_mtype;
+	uint32_t ape1_mtype;
+
+	pr_debug("kfd: In func %s\n", __func__);
+
+	mutex_lock(&dqm->lock);
+
+	if (alternate_aperture_size == 0) {
+		/* base > limit disables APE1 */
+		qpd->sh_mem_ape1_base = 1;
+		qpd->sh_mem_ape1_limit = 0;
+	} else {
+		/*
+		 * In FSA64, APE1_Base[63:0] = { 16{SH_MEM_APE1_BASE[31]},
+		 *			SH_MEM_APE1_BASE[31:0], 0x0000 }
+		 * APE1_Limit[63:0] = { 16{SH_MEM_APE1_LIMIT[31]},
+		 *			SH_MEM_APE1_LIMIT[31:0], 0xFFFF }
+		 * Verify that the base and size parameters can be
+		 * represented in this format and convert them.
+		 * Additionally restrict APE1 to user-mode addresses.
+		 */
+
+		uint64_t base = (uintptr_t)alternate_aperture_base;
+		uint64_t limit = base + alternate_aperture_size - 1;
+
+		if (limit <= base)
+			goto out;
+
+		if ((base & APE1_FIXED_BITS_MASK) != 0)
+			goto out;
+
+		if ((limit & APE1_FIXED_BITS_MASK) != APE1_LIMIT_ALIGNMENT)
+			goto out;
+
+		qpd->sh_mem_ape1_base = base >> 16;
+		qpd->sh_mem_ape1_limit = limit >> 16;
+	}
+
+	default_mtype = (default_policy == cache_policy_coherent) ?
+			MTYPE_NONCACHED :
+			MTYPE_CACHED;
+
+	ape1_mtype = (alternate_policy == cache_policy_coherent) ?
+			MTYPE_NONCACHED :
+			MTYPE_CACHED;
+
+	qpd->sh_mem_config = (qpd->sh_mem_config & PTR32)
+			| ALIGNMENT_MODE(SH_MEM_ALIGNMENT_MODE_UNALIGNED)
+			| DEFAULT_MTYPE(default_mtype)
+			| APE1_MTYPE(ape1_mtype);
+
+	if ((sched_policy == KFD_SCHED_POLICY_NO_HWS) && (qpd->vmid != 0))
+		program_sh_mem_settings(dqm, qpd);
+
+	pr_debug("kfd: sh_mem_config: 0x%x, ape1_base: 0x%x, ape1_limit: 0x%x\n",
+		qpd->sh_mem_config, qpd->sh_mem_ape1_base,
+		qpd->sh_mem_ape1_limit);
+
+	mutex_unlock(&dqm->lock);
+	return true;
+
+out:
+	mutex_unlock(&dqm->lock);
+	return false;
+}
+
+struct device_queue_manager *device_queue_manager_init(struct kfd_dev *dev)
+{
+	struct device_queue_manager *dqm;
+
+	BUG_ON(!dev);
+
+	dqm = kzalloc(sizeof(struct device_queue_manager), GFP_KERNEL);
+	if (!dqm)
+		return NULL;
+
+	dqm->dev = dev;
+	switch (sched_policy) {
+	case KFD_SCHED_POLICY_HWS:
+	case KFD_SCHED_POLICY_HWS_NO_OVERSUBSCRIPTION:
+		/* initialize dqm for cp scheduling */
+		dqm->create_queue = create_queue_cpsch;
+		dqm->initialize = initialize_cpsch;
+		dqm->start = start_cpsch;
+		dqm->stop = stop_cpsch;
+		dqm->destroy_queue = destroy_queue_cpsch;
+		dqm->update_queue = update_queue;
+		dqm->get_mqd_manager = get_mqd_manager_nocpsch;
+		dqm->register_process = register_process_nocpsch;
+		dqm->unregister_process = unregister_process_nocpsch;
+		dqm->uninitialize = uninitialize_nocpsch;
+		dqm->create_kernel_queue = create_kernel_queue_cpsch;
+		dqm->destroy_kernel_queue = destroy_kernel_queue_cpsch;
+		dqm->set_cache_memory_policy = set_cache_memory_policy;
+		break;
+	case KFD_SCHED_POLICY_NO_HWS:
+		/* initialize dqm for no cp scheduling */
+		dqm->start = start_nocpsch;
+		dqm->stop = stop_nocpsch;
+		dqm->create_queue = create_queue_nocpsch;
+		dqm->destroy_queue = destroy_queue_nocpsch;
+		dqm->update_queue = update_queue;
+		dqm->get_mqd_manager = get_mqd_manager_nocpsch;
+		dqm->register_process = register_process_nocpsch;
+		dqm->unregister_process = unregister_process_nocpsch;
+		dqm->initialize = initialize_nocpsch;
+		dqm->uninitialize = uninitialize_nocpsch;
+		dqm->set_cache_memory_policy = set_cache_memory_policy;
+		break;
+	default:
+		BUG();
+		break;
+	}
+
+	if (dqm->initialize(dqm) != 0) {
+		kfree(dqm);
+		return NULL;
+	}
+
+	return dqm;
+}
+
+void device_queue_manager_uninit(struct device_queue_manager *dqm)
+{
+	BUG_ON(!dqm);
+
+	dqm->uninitialize(dqm);
+	kfree(dqm);
+}
+
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.h b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.h
new file mode 100644
index 000000000000..c3f189e8ae35
--- /dev/null
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.h
@@ -0,0 +1,146 @@
+/*
+ * Copyright 2014 Advanced Micro Devices, Inc.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR
+ * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
+ * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
+ * OTHER DEALINGS IN THE SOFTWARE.
+ *
+ */
+
+#ifndef KFD_DEVICE_QUEUE_MANAGER_H_
+#define KFD_DEVICE_QUEUE_MANAGER_H_
+
+#include <linux/rwsem.h>
+#include <linux/list.h>
+#include "kfd_priv.h"
+#include "kfd_mqd_manager.h"
+
+#define QUEUE_PREEMPT_DEFAULT_TIMEOUT_MS	(500)
+#define QUEUES_PER_PIPE				(8)
+#define PIPE_PER_ME_CP_SCHEDULING		(3)
+#define CIK_VMID_NUM				(8)
+#define KFD_VMID_START_OFFSET			(8)
+#define VMID_PER_DEVICE				CIK_VMID_NUM
+#define KFD_DQM_FIRST_PIPE			(0)
+
+struct device_process_node {
+	struct qcm_process_device *qpd;
+	struct list_head list;
+};
+
+/**
+ * struct device_queue_manager
+ *
+ * @create_queue: Queue creation routine.
+ *
+ * @destroy_queue: Queue destruction routine.
+ *
+ * @update_queue: Queue update routine.
+ *
+ * @get_mqd_manager: Returns the mqd manager according to the mqd type.
+ *
+ * @execute_queues: Dispatches the queues list to the H/W.
+ *
+ * @register_process: Associates a specific process with the device.
+ *
+ * @unregister_process: Destroys the association between process and device.
+ *
+ * @initialize: Initializes the pipelines and memory module for that device.
+ *
+ * @start: Initializes the resources/modules the device needs for queue
+ * execution. This function is called on device initialization and after the
+ * system wakes up from suspend.
+ *
+ * @stop: Stops execution of all the active queues running on the H/W.
+ * This function is called on system suspend.
+ *
+ * @uninitialize: Destroys all the device queue manager resources allocated in
+ * initialize routine.
+ *
+ * @create_kernel_queue: Creates kernel queue. Used for debug queue.
+ *
+ * @destroy_kernel_queue: Destroys kernel queue. Used for debug queue.
+ *
+ * @set_cache_memory_policy: Sets memory policy (cached/non-cached) for the
+ * memory apertures.
+ *
+ * This struct is a base class for the kfd queues scheduler in the
+ * device level. The device base class should expose the basic operations
+ * for queue creation and queue destruction. This base class hides the
+ * scheduling mode of the driver and the specific implementation of the
+ * concrete device. This class is the only class in the queues scheduler
+ * that configures the H/W.
+ */
+
+struct device_queue_manager {
+	int	(*create_queue)(struct device_queue_manager *dqm,
+				struct queue *q,
+				struct qcm_process_device *qpd,
+				int *allocate_vmid);
+	int	(*destroy_queue)(struct device_queue_manager *dqm,
+				struct qcm_process_device *qpd,
+				struct queue *q);
+	int	(*update_queue)(struct device_queue_manager *dqm,
+				struct queue *q);
+
+	struct mqd_manager * (*get_mqd_manager)
+					(struct device_queue_manager *dqm,
+					enum KFD_MQD_TYPE type);
+
+	int	(*register_process)(struct device_queue_manager *dqm,
+					struct qcm_process_device *qpd);
+	int	(*unregister_process)(struct device_queue_manager *dqm,
+					struct qcm_process_device *qpd);
+	int	(*initialize)(struct device_queue_manager *dqm);
+	int	(*start)(struct device_queue_manager *dqm);
+	int	(*stop)(struct device_queue_manager *dqm);
+	void	(*uninitialize)(struct device_queue_manager *dqm);
+	int	(*create_kernel_queue)(struct device_queue_manager *dqm,
+					struct kernel_queue *kq,
+					struct qcm_process_device *qpd);
+	void	(*destroy_kernel_queue)(struct device_queue_manager *dqm,
+					struct kernel_queue *kq,
+					struct qcm_process_device *qpd);
+	bool	(*set_cache_memory_policy)(struct device_queue_manager *dqm,
+					   struct qcm_process_device *qpd,
+					   enum cache_policy default_policy,
+					   enum cache_policy alternate_policy,
+					   void __user *alternate_aperture_base,
+					   uint64_t alternate_aperture_size);
+
+
+	struct mqd_manager	*mqds[KFD_MQD_TYPE_MAX];
+	struct packet_manager	packets;
+	struct kfd_dev		*dev;
+	struct mutex		lock;
+	struct list_head	queues;
+	unsigned int		processes_count;
+	unsigned int		queue_count;
+	unsigned int		next_pipe_to_allocate;
+	unsigned int		*allocated_queues;
+	unsigned int		vmid_bitmap;
+	uint64_t		pipelines_addr;
+	struct kfd_mem_obj	*pipeline_mem;
+	uint64_t		fence_gpu_addr;
+	unsigned int		*fence_addr;
+	struct kfd_mem_obj	*fence_mem;
+	bool			active_runlist;
+};
+
+
+
+#endif /* KFD_DEVICE_QUEUE_MANAGER_H_ */
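Note on the header above: its kernel-doc describes struct device_queue_manager as a base class that hides the scheduling mode behind function pointers. A minimal userspace sketch of that pattern follows. All names in it (queue_manager, qm_init, the policy and backend enums, the two create functions) are hypothetical and simplified for illustration; they are not part of amdkfd, and the sketch returns -1 for an unknown policy where the driver calls BUG().

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the driver's scheduling policies. */
enum policy { POLICY_HWS, POLICY_NO_HWS };
enum backend { BACKEND_CPSCH, BACKEND_NOCPSCH };

/* Simplified "base class": callers only go through the function pointer. */
struct queue_manager {
	int (*create_queue)(struct queue_manager *qm, unsigned int queue_id);
	enum backend backend;		/* which implementation was selected */
	unsigned int queue_count;
};

/* Stand-in for the path where a hardware scheduler runs the queues. */
int create_queue_cpsch(struct queue_manager *qm, unsigned int queue_id)
{
	(void)queue_id;
	qm->queue_count++;
	return 0;
}

/* Stand-in for the path where the driver programs the queues directly. */
int create_queue_nocpsch(struct queue_manager *qm, unsigned int queue_id)
{
	(void)queue_id;
	qm->queue_count++;
	return 0;
}

/* Pick the implementation once; callers never test the policy again. */
int qm_init(struct queue_manager *qm, enum policy policy)
{
	assert(qm != NULL);

	qm->queue_count = 0;
	switch (policy) {
	case POLICY_HWS:
		qm->create_queue = create_queue_cpsch;
		qm->backend = BACKEND_CPSCH;
		return 0;
	case POLICY_NO_HWS:
		qm->create_queue = create_queue_nocpsch;
		qm->backend = BACKEND_NOCPSCH;
		return 0;
	default:
		return -1;	/* the sketch reports unknown policies */
	}
}
```

A caller would run qm_init() once and afterwards only call qm->create_queue(qm, id), which is the property the kernel-doc comment attributes to the real structure.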
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_doorbell.c b/drivers/gpu/drm/amd/amdkfd/kfd_doorbell.c
new file mode 100644
index 000000000000..b5791a5c7c06
--- /dev/null
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_doorbell.c
@@ -0,0 +1,256 @@
+/*
+ * Copyright 2014 Advanced Micro Devices, Inc.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR
+ * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
+ * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
+ * OTHER DEALINGS IN THE SOFTWARE.
+ */
+#include "kfd_priv.h"
+#include <linux/mm.h>
+#include <linux/mman.h>
+#include <linux/slab.h>
+#include <linux/io.h>
+
+/*
+ * This extension supports kernel-level doorbell management for
+ * the kernel queues.
+ * Basically the last doorbells page is devoted to kernel queues,
+ * which ensures that no user process can get access to the
+ * kernel doorbells page.
+ */
+static DEFINE_MUTEX(doorbell_mutex);
+static unsigned long doorbell_available_index[
+	DIV_ROUND_UP(KFD_MAX_NUM_OF_QUEUES_PER_PROCESS, BITS_PER_LONG)] = { 0 };
+
+#define KERNEL_DOORBELL_PASID 1
+#define KFD_SIZE_OF_DOORBELL_IN_BYTES 4
+
+/*
+ * Each device exposes a doorbell aperture, a PCI MMIO aperture that
+ * receives 32-bit writes that are passed to queues as wptr values.
+ * The doorbells are intended to be written by applications as part
+ * of queueing work on user-mode queues.
+ * We assign doorbells to applications in PAGE_SIZE-sized and aligned chunks.
+ * We map the doorbell address space into user-mode when a process creates
+ * its first queue on each device.
+ * Although the mapping is done by KFD, it is equivalent to an mmap of
+ * the /dev/kfd with the particular device encoded in the mmap offset.
+ * There will be other uses for mmap of /dev/kfd, so only a range of
+ * offsets (KFD_MMAP_DOORBELL_START-END) is used for doorbells.
+ */
+
+/* # of doorbell bytes allocated for each process. */
+static inline size_t doorbell_process_allocation(void)
+{
+	return roundup(KFD_SIZE_OF_DOORBELL_IN_BYTES *
+			KFD_MAX_NUM_OF_QUEUES_PER_PROCESS,
+			PAGE_SIZE);
+}
+
+/* Doorbell calculations for device init. */
+void kfd_doorbell_init(struct kfd_dev *kfd)
+{
+	size_t doorbell_start_offset;
+	size_t doorbell_aperture_size;
+	size_t doorbell_process_limit;
+
+	/*
+	 * We start with calculations in bytes because the input data might
+	 * only be byte-aligned.
+	 * Only after we have done the rounding can we assume any alignment.
+	 */
+
+	doorbell_start_offset =
+			roundup(kfd->shared_resources.doorbell_start_offset,
+					doorbell_process_allocation());
+
+	doorbell_aperture_size =
+			rounddown(kfd->shared_resources.doorbell_aperture_size,
+					doorbell_process_allocation());
+
+	if (doorbell_aperture_size > doorbell_start_offset)
+		doorbell_process_limit =
+			(doorbell_aperture_size - doorbell_start_offset) /
+						doorbell_process_allocation();
+	else
+		doorbell_process_limit = 0;
+
+	kfd->doorbell_base = kfd->shared_resources.doorbell_physical_address +
+				doorbell_start_offset;
+
+	kfd->doorbell_id_offset = doorbell_start_offset / sizeof(u32);
+	kfd->doorbell_process_limit = doorbell_process_limit - 1;
+
+	kfd->doorbell_kernel_ptr = ioremap(kfd->doorbell_base,
+						doorbell_process_allocation());
+
+	BUG_ON(!kfd->doorbell_kernel_ptr);
+
+	pr_debug("kfd: doorbell initialization:\n");
+	pr_debug("kfd: doorbell base           == 0x%08lX\n",
+			(uintptr_t)kfd->doorbell_base);
+
+	pr_debug("kfd: doorbell_id_offset      == 0x%08lX\n",
+			kfd->doorbell_id_offset);
+
+	pr_debug("kfd: doorbell_process_limit  == 0x%08zX\n",
+			doorbell_process_limit);
+
+	pr_debug("kfd: doorbell_kernel_offset  == 0x%08lX\n",
+			(uintptr_t)kfd->doorbell_base);
+
+	pr_debug("kfd: doorbell aperture size  == 0x%08lX\n",
+			kfd->shared_resources.doorbell_aperture_size);
+
+	pr_debug("kfd: doorbell kernel address == 0x%08lX\n",
+			(uintptr_t)kfd->doorbell_kernel_ptr);
+}
+
+int kfd_doorbell_mmap(struct kfd_process *process, struct vm_area_struct *vma)
+{
+	phys_addr_t address;
+	struct kfd_dev *dev;
+
+	/*
+	 * For simplicity we only allow mapping of the entire doorbell
+	 * allocation of a single device & process.
+	 */
+	if (vma->vm_end - vma->vm_start != doorbell_process_allocation())
+		return -EINVAL;
+
+	/* Find kfd device according to gpu id */
+	dev = kfd_device_by_id(vma->vm_pgoff);
+	if (dev == NULL)
+		return -EINVAL;
+
+	/* Find if pdd exists for combination of process and gpu id */
+	if (!kfd_get_process_device_data(dev, process, 0))
+		return -EINVAL;
+
+	/* Calculate physical address of doorbell */
+	address = kfd_get_process_doorbells(dev, process);
+
+	vma->vm_flags |= VM_IO | VM_DONTCOPY | VM_DONTEXPAND | VM_NORESERVE |
+				VM_DONTDUMP | VM_PFNMAP;
+
+	vma->vm_page_prot = pgprot_noncached(vma->vm_page_prot);
+
+	pr_debug("kfd: mapping doorbell page in kfd_doorbell_mmap\n"
+		 "     target user address == 0x%08llX\n"
+		 "     physical address    == 0x%08llX\n"
+		 "     vm_flags            == 0x%04lX\n"
+		 "     size                == 0x%04lX\n",
+		 (unsigned long long) vma->vm_start, address, vma->vm_flags,
+		 doorbell_process_allocation());
+
+
+	return io_remap_pfn_range(vma,
+				vma->vm_start,
+				address >> PAGE_SHIFT,
+				doorbell_process_allocation(),
+				vma->vm_page_prot);
+}
+
+
+/* get kernel iomem pointer for a doorbell */
+u32 __iomem *kfd_get_kernel_doorbell(struct kfd_dev *kfd,
+					unsigned int *doorbell_off)
+{
+	u32 inx;
+
+	BUG_ON(!kfd || !doorbell_off);
+
+	mutex_lock(&doorbell_mutex);
+	inx = find_first_zero_bit(doorbell_available_index,
+					KFD_MAX_NUM_OF_QUEUES_PER_PROCESS);
+	if (inx < KFD_MAX_NUM_OF_QUEUES_PER_PROCESS)
+		__set_bit(inx, doorbell_available_index);
+	mutex_unlock(&doorbell_mutex);
+
+	if (inx >= KFD_MAX_NUM_OF_QUEUES_PER_PROCESS)
+		return NULL;
+
+	/*
+	 * Calculate the kernel doorbell offset using a "faked" kernel
+	 * pasid that is allocated for kernel queues only.
+	 */
+	*doorbell_off = KERNEL_DOORBELL_PASID * (doorbell_process_allocation() /
+							sizeof(u32)) + inx;
+
+	pr_debug("kfd: get kernel queue doorbell\n"
+			 "     doorbell offset   == 0x%08X\n"
+			 "     kernel address    == 0x%08lX\n",
+		*doorbell_off, (uintptr_t)(kfd->doorbell_kernel_ptr + inx));
+
+	return kfd->doorbell_kernel_ptr + inx;
+}
+
+void kfd_release_kernel_doorbell(struct kfd_dev *kfd, u32 __iomem *db_addr)
+{
+	unsigned int inx;
+
+	BUG_ON(!kfd || !db_addr);
+
+	inx = (unsigned int)(db_addr - kfd->doorbell_kernel_ptr);
+
+	mutex_lock(&doorbell_mutex);
+	__clear_bit(inx, doorbell_available_index);
+	mutex_unlock(&doorbell_mutex);
+}
+
+inline void write_kernel_doorbell(u32 __iomem *db, u32 value)
+{
+	if (db) {
+		writel(value, db);
+		pr_debug("writing %u to doorbell address 0x%p\n", value, db);
+	}
+}
+
+/*
+ * queue_ids are in the range [0,MAX_PROCESS_QUEUES) and are mapped 1:1
+ * to doorbells within the process's doorbell page
+ */
+unsigned int kfd_queue_id_to_doorbell(struct kfd_dev *kfd,
+					struct kfd_process *process,
+					unsigned int queue_id)
+{
+	/*
+	 * doorbell_id_offset accounts for doorbells taken by KGD.
+	 * pasid * doorbell_process_allocation/sizeof(u32) adjusts
+	 * to the process's doorbells
+	 */
+	return kfd->doorbell_id_offset +
+		process->pasid * (doorbell_process_allocation()/sizeof(u32)) +
+		queue_id;
+}
+
+uint64_t kfd_get_number_elems(struct kfd_dev *kfd)
+{
+	uint64_t num_of_elems = (kfd->shared_resources.doorbell_aperture_size -
+				kfd->shared_resources.doorbell_start_offset) /
+					doorbell_process_allocation() + 1;
+
+	return num_of_elems;
+
+}
+
+phys_addr_t kfd_get_process_doorbells(struct kfd_dev *dev,
+					struct kfd_process *process)
+{
+	return dev->doorbell_base +
+		process->pasid * doorbell_process_allocation();
+}
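Note on kfd_doorbell.c above: the doorbell index and address arithmetic can be checked in isolation. The sketch below repeats the calculations of doorbell_process_allocation(), kfd_queue_id_to_doorbell() and kfd_get_process_doorbells() in userspace C. The three SKETCH_* constants are assumed values chosen for illustration (4 KiB pages, 1024 queues per process, 4-byte doorbells); the driver takes the real values from its own headers, and the sketch_* function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed values for illustration only. */
#define SKETCH_PAGE_SIZE		4096u
#define SKETCH_MAX_QUEUES_PER_PROCESS	1024u
#define SKETCH_DOORBELL_BYTES		4u

/* Bytes of doorbell space per process, rounded up to whole pages. */
size_t sketch_process_allocation(void)
{
	size_t bytes = (size_t)SKETCH_DOORBELL_BYTES *
			SKETCH_MAX_QUEUES_PER_PROCESS;

	return (bytes + SKETCH_PAGE_SIZE - 1) / SKETCH_PAGE_SIZE *
			SKETCH_PAGE_SIZE;
}

/*
 * Doorbell index, counted in 32-bit doorbells, for one queue of one
 * process: skip the doorbells before the KFD range (id_offset), then
 * skip one per-process chunk for every lower pasid.
 */
unsigned int sketch_queue_id_to_doorbell(size_t id_offset,
					 unsigned int pasid,
					 unsigned int queue_id)
{
	size_t doorbells_per_process =
		sketch_process_allocation() / sizeof(uint32_t);

	assert(queue_id < SKETCH_MAX_QUEUES_PER_PROCESS);

	return (unsigned int)(id_offset +
			      pasid * doorbells_per_process + queue_id);
}

/* Physical address of the first doorbell of a process's chunk. */
uint64_t sketch_process_doorbells(uint64_t doorbell_base,
				  unsigned int pasid)
{
	return doorbell_base +
		(uint64_t)pasid * sketch_process_allocation();
}
```

With these assumed constants each process owns exactly one page of doorbells, so pasid 2, queue 5 and an id offset of 16 give index 16 + 2 * 1024 + 5.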
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_flat_memory.c b/drivers/gpu/drm/amd/amdkfd/kfd_flat_memory.c
new file mode 100644
index 000000000000..66df4da01c29
--- /dev/null
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_flat_memory.c
@@ -0,0 +1,356 @@
+/*
+ * Copyright 2014 Advanced Micro Devices, Inc.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR
+ * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
+ * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
+ * OTHER DEALINGS IN THE SOFTWARE.
+ *
+ */
+
+#include <linux/device.h>
+#include <linux/export.h>
+#include <linux/err.h>
+#include <linux/fs.h>
+#include <linux/sched.h>
+#include <linux/slab.h>
+#include <linux/uaccess.h>
+#include <linux/compat.h>
+#include <uapi/linux/kfd_ioctl.h>
+#include <linux/time.h>
+#include "kfd_priv.h"
+#include <linux/mm.h>
+#include <uapi/asm-generic/mman-common.h>
+#include <asm/processor.h>
+
+/*
+ * The primary memory I/O features being added for revisions of gfxip
+ * beyond 7.0 (Kaveri) are:
+ *
+ * Access to ATC/IOMMU mapped memory w/ associated extension of VA to 48b
+ *
+ * “Flat” shader memory access – These are new shader vector memory
+ * operations that do not reference a T#/V# so a “pointer” is what is
+ * sourced from the vector gprs for direct access to memory.
+ * This pointer space has the Shared(LDS) and Private(Scratch) memory
+ * mapped into this pointer space as apertures.
+ * The hardware then determines how to direct the memory request
+ * based on what apertures the request falls in.
+ *
+ * Unaligned support and alignment check
+ *
+ *
+ * System Unified Address - SUA
+ *
+ * The standard usage for GPU virtual addresses is that they are mapped by
+ * a set of page tables we call GPUVM and these page tables are managed by
+ * a combination of vidMM/driver software components.  The current virtual
+ * address (VA) range for GPUVM is 40b.
+ *
+ * As of gfxip7.1 and beyond we’re adding the ability for compute memory
+ * clients (CP/RLC, DMA, SHADER(ifetch, scalar, and vector ops)) to access
+ * the same page tables used by host x86 processors and that are managed by
+ * the operating system. This is via a technique and hardware called ATC/IOMMU.
+ * The GPU has the capability of accessing both the GPUVM and ATC address
+ * spaces for a given VMID (process) simultaneously and we call this feature
+ * system unified address (SUA).
+ *
+ * There are three fundamental address modes of operation for a given VMID
+ * (process) on the GPU:
+ *
+ *	HSA64 – 64b pointers and the default address space is ATC
+ *	HSA32 – 32b pointers and the default address space is ATC
+ *	GPUVM – 64b pointers and the default address space is GPUVM (driver
+ *		model mode)
+ *
+ *
+ * HSA64 - ATC/IOMMU 64b
+ *
+ * A 64b pointer in the AMD64/IA64 CPU architecture is not fully utilized
+ * by the CPU: an AMD CPU can only access the high area
+ * (VA[63:47] == 0x1FFFF) and low area (VA[63:47] == 0) of the address space,
+ * so the actual VA carried to translation is 48b.  There is a “hole” in
+ * the middle of the 64b VA space.
+ *
+ * The GPU not only has access to all of the CPU accessible address space via
+ * ATC/IOMMU, but it also has access to the GPUVM address space.  The “system
+ * unified address” feature (SUA) is the mapping of GPUVM and ATC address
+ * spaces into a unified pointer space.  The method we take for 64b mode is
+ * to map the full 40b GPUVM address space into the hole of the 64b address
+ * space.
+ *
+ * The GPUVM_Base/GPUVM_Limit defines the aperture in the 64b space where we
+ * direct requests to be translated via GPUVM page tables instead of the
+ * IOMMU path.
+ *
+ *
+ * 64b to 49b Address conversion
+ *
+ * Note that there are still significant portions of unused regions (holes)
+ * in the 64b address space even for the GPU.  At several places in
+ * the pipeline (sw and hw) we wish to compress the 64b virtual address
+ * to a 49b address.  This 49b address consists of an “ATC” bit
+ * plus a 48b virtual address.  This 49b address is what is passed to the
+ * translation hardware.  ATC==0 means the 48b address is a GPUVM address
+ * (max of 2^40 – 1) intended to be translated via GPUVM page tables.
+ * ATC==1 means the 48b address is intended to be translated via IOMMU
+ * page tables.
+ *
+ * A 64b pointer is compared to the apertures that are defined (Base/Limit), in
+ * this case the GPUVM aperture (red) is defined and if a pointer falls in this
+ * aperture, we subtract the GPUVM_Base address and set the ATC bit to zero
+ * as part of the 64b to 49b conversion.
+ *
+ * Where this 64b to 49b conversion is done is a function of the usage.
+ * Most GPU memory access is via memory objects where the driver builds
+ * a descriptor which consists of a base address and a memory access by
+ * the GPU usually consists of some kind of an offset or Cartesian coordinate
+ * that references this memory descriptor.  This is the case for shader
+ * instructions that reference the T# or V# constants, or for specified
+ * locations of assets (ex. the shader program location).  In these cases
+ * the driver is what handles the 64b to 49b conversion and the base
+ * address in the descriptor (ex. V# or T# or shader program location)
+ * is defined as a 48b address w/ an ATC bit.  For this usage a given
+ * memory object cannot straddle multiple apertures in the 64b address
+ * space. For example a shader program cannot jump in/out between ATC
+ * and GPUVM space.
+ *
+ * In some cases we wish to pass a 64b pointer to the GPU hardware and
+ * the GPU hw does the 64b to 49b conversion before passing memory
+ * requests to the cache/memory system.  This is the case for the
+ * S_LOAD and FLAT_* shader memory instructions where we have 64b pointers
+ * in scalar and vector GPRs respectively.
+ *
+ * In all cases (no matter where the 64b -> 49b conversion is done), the gfxip
+ * hardware sends a 48b address along w/ an ATC bit, to the memory controller
+ * on the memory request interfaces.
+ *
+ *	<client>_MC_rdreq_atc   // read request ATC bit
+ *
+ *		0 : <client>_MC_rdreq_addr is a GPUVM VA
+ *
+ *		1 : <client>_MC_rdreq_addr is an ATC VA
+ *
+ *
+ * “Spare” aperture (APE1)
+ *
+ * We use the GPUVM aperture to differentiate ATC vs. GPUVM, but we also use
+ * apertures to set the Mtype field for S_LOAD/FLAT_* ops which is input to the
+ * config tables for setting cache policies. The “spare” (APE1) aperture is
+ * motivated by getting a different Mtype from the default.
+ * The default aperture isn’t an actual base/limit aperture; it is just the
+ * address space that doesn’t hit any defined base/limit apertures.
+ * The following diagram is a complete picture of the gfxip7.x SUA apertures.
+ * The APE1 can be placed either below or above
+ * the hole (cannot be in the hole).
+ *
+ *
+ * General Aperture definitions and rules
+ *
+ * An aperture register definition consists of a Base, Limit, Mtype, and
+ * usually an ATC bit indicating which translation tables that aperture uses.
+ * In all cases (for SUA and DUA apertures discussed later), aperture base
+ * and limit definitions are 64KB aligned.
+ *
+ *	<ape>_Base[63:0] = { <ape>_Base_register[63:16], 0x0000 }
+ *
+ *	<ape>_Limit[63:0] = { <ape>_Limit_register[63:16], 0xFFFF }
+ *
+ * The base and limit are considered inclusive to an aperture so being
+ * inside an aperture means (address >= Base) AND (address <= Limit).
+ *
+ * In no case is a payload that straddles multiple apertures expected to work.
+ * For example a load_dword_x4 that starts in one aperture and ends in another
+ * does not work.  For the vector FLAT_* ops we have detection capability in
+ * the shader for reporting a “memory violation” back to the
+ * SQ block for use in traps.
+ * A memory violation results when an op falls into the hole,
+ * or a payload straddles multiple apertures.  The S_LOAD instruction
+ * does not have this detection.
+ *
+ * Apertures cannot overlap.
+ *
+ *
+ *
+ * HSA32 - ATC/IOMMU 32b
+ *
+ * For HSA32 mode, the pointers are interpreted as 32 bits and use a single GPR
+ * instead of two for the S_LOAD and FLAT_* ops. The entire GPUVM space of 40b
+ * will not fit so there is only partial visibility to the GPUVM
+ * space (defined by the aperture) for S_LOAD and FLAT_* ops.
+ * There is no spare (APE1) aperture for HSA32 mode.
+ *
+ *
+ * GPUVM 64b mode (driver model)
+ *
+ * This mode is related to HSA64; the only real difference is that
+ * the default aperture is GPUVM (ATC==0) and not ATC space.
+ * We have gfxip7.x hardware that has FLAT_* and S_LOAD support for
+ * SUA GPUVM mode, but does not support HSA32/HSA64.
+ *
+ *
+ * Device Unified Address - DUA
+ *
+ * Device unified address (DUA) is the name of the feature that maps the
+ * Shared(LDS) memory and Private(Scratch) memory into the overall address
+ * space for use by the new FLAT_* vector memory ops.  The Shared and
+ * Private memories are mapped as apertures into the address space,
+ * and the hardware detects when a FLAT_* memory request is to be redirected
+ * to the LDS or Scratch memory when it falls into one of these apertures.
+ * Like the SUA apertures, the Shared/Private apertures are 64KB aligned and
+ * the base/limit is “in” the aperture. For both HSA64 and GPUVM SUA modes,
+ * the Shared/Private apertures are always placed in a limited selection of
+ * options in the hole of the 64b address space. For HSA32 mode, the
+ * Shared/Private apertures can be placed anywhere in the 32b space
+ * except at 0.
+ *
+ *
+ * HSA64 Apertures for FLAT_* vector ops
+ *
+ * For HSA64 SUA mode, the Shared and Private apertures are always placed
+ * in the hole w/ a limited selection of possible locations. The requests
+ * that fall in the private aperture are expanded as a function of the
+ * work-item id (tid) and redirected to the location of the
+ * “hidden private memory”. The hidden private can be placed in either GPUVM
+ * or ATC space. The addresses that fall in the shared aperture are
+ * re-directed to the on-chip LDS memory hardware.
+ *
+ *
+ * HSA32 Apertures for FLAT_* vector ops
+ *
+ * In HSA32 mode, the Private and Shared apertures can be placed anywhere
+ * in the 32b space except at 0 (Private or Shared Base at zero disables
+ * the apertures). If the base address of an aperture is non-zero
+ * (i.e. the aperture exists), its size is always 64KB.
+ *
+ *
+ * GPUVM Apertures for FLAT_* vector ops
+ *
+ * In GPUVM mode, the Shared/Private apertures are specified identically
+ * to HSA64 mode where they are always in the hole at a limited selection
+ * of locations.
+ *
+ *
+ * Aperture Definitions for SUA and DUA
+ *
+ * The interpretation of the aperture register definitions for a given
+ * VMID is a function of the “SUA Mode” which is one of HSA64, HSA32, or
+ * GPUVM64 discussed in previous sections. The mode is first decoded, and
+ * then the remaining register decode is a function of the mode.
+ *
+ *
+ * SUA Mode Decode
+ *
+ * For the S_LOAD and FLAT_* shader operations, the SUA mode is decoded from
+ * the COMPUTE_DISPATCH_INITIATOR:DATA_ATC bit and
+ * the SH_MEM_CONFIG:PTR32 bits.
+ *
+ * COMPUTE_DISPATCH_INITIATOR:DATA_ATC    SH_MEM_CONFIG:PTR32        Mode
+ *
+ * 1                                              0                  HSA64
+ *
+ * 1                                              1                  HSA32
+ *
+ * 0                                              X                 GPUVM64
+ *
+ * In general the hardware will ignore the PTR32 bit and treat
+ * it as “0” whenever DATA_ATC = “0”, but sw should set PTR32=0
+ * when DATA_ATC=0.
+ *
+ * The DATA_ATC bit is only set for compute dispatches.
+ * All “Draw” dispatches are hardcoded to GPUVM64 mode
+ * for FLAT_* / S_LOAD operations.
+ */
+
+#define MAKE_GPUVM_APP_BASE(gpu_num) \
+	(((uint64_t)(gpu_num) << 61) + 0x1000000000000L)
+
+#define MAKE_GPUVM_APP_LIMIT(base) \
+	(((uint64_t)(base) & \
+		0xFFFFFF0000000000UL) | 0xFFFFFFFFFFL)
+
+#define MAKE_SCRATCH_APP_BASE(gpu_num) \
+	(((uint64_t)(gpu_num) << 61) + 0x100000000L)
+
+#define MAKE_SCRATCH_APP_LIMIT(base) \
+	(((uint64_t)(base) & 0xFFFFFFFF00000000UL) | 0xFFFFFFFF)
+
+#define MAKE_LDS_APP_BASE(gpu_num) \
+	(((uint64_t)(gpu_num) << 61) + 0x0)
+#define MAKE_LDS_APP_LIMIT(base) \
+	(((uint64_t)(base) & 0xFFFFFFFF00000000UL) | 0xFFFFFFFF)
+
+int kfd_init_apertures(struct kfd_process *process)
+{
+	uint8_t id  = 0;
+	struct kfd_dev *dev;
+	struct kfd_process_device *pdd;
+
+	mutex_lock(&process->mutex);
+
+	/* Iterate over all devices */
+	while ((dev = kfd_topology_enum_kfd_devices(id)) != NULL &&
+		id < NUM_OF_SUPPORTED_GPUS) {
+
+		pdd = kfd_get_process_device_data(dev, process, 1);
+
+		/*
+		 * For a 64 bit process the apertures are statically reserved
+		 * in the x86_64 non-canonical process address space.
+		 * amdkfd currently doesn't support 32 bit process apertures.
+		 */
+		if (process->is_32bit_user_mode) {
+			pdd->lds_base = pdd->lds_limit = 0;
+			pdd->gpuvm_base = pdd->gpuvm_limit = 0;
+			pdd->scratch_base = pdd->scratch_limit = 0;
+		} else {
+			/*
+			 * node id can't be 0 - the three MSB bits of the
+			 * aperture shouldn't be 0
+			 */
+			pdd->lds_base = MAKE_LDS_APP_BASE(id + 1);
+
+			pdd->lds_limit = MAKE_LDS_APP_LIMIT(pdd->lds_base);
+
+			pdd->gpuvm_base = MAKE_GPUVM_APP_BASE(id + 1);
+
+			pdd->gpuvm_limit =
+					MAKE_GPUVM_APP_LIMIT(pdd->gpuvm_base);
+
+			pdd->scratch_base = MAKE_SCRATCH_APP_BASE(id + 1);
+
+			pdd->scratch_limit =
+				MAKE_SCRATCH_APP_LIMIT(pdd->scratch_base);
+		}
+
+		dev_dbg(kfd_device, "node id %u\n", id);
+		dev_dbg(kfd_device, "gpu id %u\n", pdd->dev->id);
+		dev_dbg(kfd_device, "lds_base %llX\n", pdd->lds_base);
+		dev_dbg(kfd_device, "lds_limit %llX\n", pdd->lds_limit);
+		dev_dbg(kfd_device, "gpuvm_base %llX\n", pdd->gpuvm_base);
+		dev_dbg(kfd_device, "gpuvm_limit %llX\n", pdd->gpuvm_limit);
+		dev_dbg(kfd_device, "scratch_base %llX\n", pdd->scratch_base);
+		dev_dbg(kfd_device, "scratch_limit %llX\n", pdd->scratch_limit);
+
+		id++;
+	}
+
+	mutex_unlock(&process->mutex);
+
+	return 0;
+}
+
+
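Note on kfd_flat_memory.c above: the long comment describes three small rules that are easy to express in code, namely the inclusive aperture test, the 64b to 49b conversion for the GPUVM aperture, and the SUA mode decode table. The userspace sketch below follows them under stated assumptions. The base and limit helpers repeat the arithmetic of MAKE_GPUVM_APP_BASE and MAKE_GPUVM_APP_LIMIT; every other name (in_aperture, addr49, to_49b, decode_sua_mode, enum sua_mode) is hypothetical. Taking the low 48 bits for the ATC path is an assumption made for the sketch, and the LDS, scratch and APE1 apertures as well as the memory-violation detection are left out.

```c
#include <stdbool.h>
#include <stdint.h>

enum sua_mode { SUA_HSA64, SUA_HSA32, SUA_GPUVM64 };

/* Same arithmetic as MAKE_GPUVM_APP_BASE in the patch. */
uint64_t gpuvm_app_base(unsigned int gpu_num)
{
	return ((uint64_t)gpu_num << 61) + 0x1000000000000ULL;
}

/* Same arithmetic as MAKE_GPUVM_APP_LIMIT: a 40b window above the base. */
uint64_t gpuvm_app_limit(uint64_t base)
{
	return (base & 0xFFFFFF0000000000ULL) | 0xFFFFFFFFFFULL;
}

/* Base and limit are both inclusive, as the comment states. */
bool in_aperture(uint64_t addr, uint64_t base, uint64_t limit)
{
	return addr >= base && addr <= limit;
}

/* A 49b address: one ATC bit plus a 48b virtual address. */
struct addr49 {
	bool atc;
	uint64_t va48;
};

/*
 * Inside the GPUVM aperture: subtract the base and clear ATC.
 * Elsewhere: set ATC and keep the low 48 bits (sketch assumption).
 */
struct addr49 to_49b(uint64_t ptr, uint64_t base, uint64_t limit)
{
	struct addr49 out;

	if (in_aperture(ptr, base, limit)) {
		out.atc = false;
		out.va48 = ptr - base;
	} else {
		out.atc = true;
		out.va48 = ptr & 0xFFFFFFFFFFFFULL;
	}
	return out;
}

/* DATA_ATC / PTR32 decode table; PTR32 is ignored when DATA_ATC is 0. */
enum sua_mode decode_sua_mode(bool data_atc, bool ptr32)
{
	if (!data_atc)
		return SUA_GPUVM64;
	return ptr32 ? SUA_HSA32 : SUA_HSA64;
}
```

For the first GPU (gpu_num 1, matching the id + 1 used by kfd_init_apertures()) the helpers give a GPUVM aperture of 0x2001000000000000 to 0x200100FFFFFFFFFF, which lies in the non-canonical hole of the x86_64 address space.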
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_interrupt.c b/drivers/gpu/drm/amd/amdkfd/kfd_interrupt.c
new file mode 100644
index 000000000000..5b999095a1f7
--- /dev/null
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_interrupt.c
@@ -0,0 +1,176 @@
+/*
+ * Copyright 2014 Advanced Micro Devices, Inc.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR
+ * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
+ * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
+ * OTHER DEALINGS IN THE SOFTWARE.
+ */
+
+/*
+ * KFD Interrupts.
+ *
+ * AMD GPUs deliver interrupts by pushing an interrupt description onto the
+ * interrupt ring and then sending an interrupt. KGD receives the interrupt
+ * in ISR and sends us a pointer to each new entry on the interrupt ring.
+ *
+ * We generally can't process interrupt-signaled events from ISR, so we call
+ * out to each interrupt client module (currently only the scheduler) to ask if
+ * each interrupt is interesting. If they return true, then it requires further
+ * processing so we copy it to an internal interrupt ring and call each
+ * interrupt client again from a work-queue.
+ *
+ * There's no acknowledgment for the interrupts we use. The hardware simply
+ * queues a new interrupt each time without waiting.
+ *
+ * The fixed-size internal queue means that it's possible for us to lose
+ * interrupts because we have no back-pressure to the hardware.
+ */
+
+#include <linux/slab.h>
+#include <linux/device.h>
+#include "kfd_priv.h"
+
+#define KFD_INTERRUPT_RING_SIZE 256
+
+static void interrupt_wq(struct work_struct *);
+
+int kfd_interrupt_init(struct kfd_dev *kfd)
+{
+	void *interrupt_ring = kmalloc_array(KFD_INTERRUPT_RING_SIZE,
+					kfd->device_info->ih_ring_entry_size,
+					GFP_KERNEL);
+	if (!interrupt_ring)
+		return -ENOMEM;
+
+	kfd->interrupt_ring = interrupt_ring;
+	kfd->interrupt_ring_size =
+		KFD_INTERRUPT_RING_SIZE * kfd->device_info->ih_ring_entry_size;
+	atomic_set(&kfd->interrupt_ring_wptr, 0);
+	atomic_set(&kfd->interrupt_ring_rptr, 0);
+
+	spin_lock_init(&kfd->interrupt_lock);
+
+	INIT_WORK(&kfd->interrupt_work, interrupt_wq);
+
+	kfd->interrupts_active = true;
+
+	/*
+	 * After this function returns, the interrupt will be enabled. This
+	 * barrier ensures that the interrupt running on a different processor
+	 * sees all the above writes.
+	 */
+	smp_wmb();
+
+	return 0;
+}
+
+void kfd_interrupt_exit(struct kfd_dev *kfd)
+{
+	/*
+	 * Stop the interrupt handler from writing to the ring and scheduling
+	 * workqueue items. The spinlock ensures that any interrupt running
+	 * after we have unlocked sees interrupts_active = false.
+	 */
+	unsigned long flags;
+
+	spin_lock_irqsave(&kfd->interrupt_lock, flags);
+	kfd->interrupts_active = false;
+	spin_unlock_irqrestore(&kfd->interrupt_lock, flags);
+
+	/*
+	 * flush_scheduled_work() ensures that there are no outstanding
+	 * work-queue items that will access interrupt_ring. New work items
+	 * can't be created because we stopped interrupt handling above.
+	 */
+	flush_scheduled_work();
+
+	kfree(kfd->interrupt_ring);
+}
+
+/*
+ * This assumes that it can't be called concurrently with itself
+ * but only with dequeue_ih_ring_entry.
+ */
+bool enqueue_ih_ring_entry(struct kfd_dev *kfd,	const void *ih_ring_entry)
+{
+	unsigned int rptr = atomic_read(&kfd->interrupt_ring_rptr);
+	unsigned int wptr = atomic_read(&kfd->interrupt_ring_wptr);
+
+	if ((rptr - wptr) % kfd->interrupt_ring_size ==
+					kfd->device_info->ih_ring_entry_size) {
+		/* This is very bad, the system is likely to hang. */
+		dev_err_ratelimited(kfd_chardev(),
+			"Interrupt ring overflow, dropping interrupt.\n");
+		return false;
+	}
+
+	memcpy(kfd->interrupt_ring + wptr, ih_ring_entry,
+			kfd->device_info->ih_ring_entry_size);
+
+	wptr = (wptr + kfd->device_info->ih_ring_entry_size) %
+			kfd->interrupt_ring_size;
+	smp_wmb(); /* Ensure memcpy'd data is visible before wptr update. */
+	atomic_set(&kfd->interrupt_ring_wptr, wptr);
+
+	return true;
+}
+
+/*
+ * This assumes that it can't be called concurrently with itself
+ * but only with enqueue_ih_ring_entry.
+ */
+static bool dequeue_ih_ring_entry(struct kfd_dev *kfd, void *ih_ring_entry)
+{
+	/*
+	 * Assume that work queues have an implicit barrier, i.e. anything that
+	 * happened in the ISR before it queued work is visible.
+	 */
+
+	unsigned int wptr = atomic_read(&kfd->interrupt_ring_wptr);
+	unsigned int rptr = atomic_read(&kfd->interrupt_ring_rptr);
+
+	if (rptr == wptr)
+		return false;
+
+	memcpy(ih_ring_entry, kfd->interrupt_ring + rptr,
+			kfd->device_info->ih_ring_entry_size);
+
+	rptr = (rptr + kfd->device_info->ih_ring_entry_size) %
+			kfd->interrupt_ring_size;
+
+	/*
+	 * Ensure the rptr write update is not visible until
+	 * memcpy has finished reading.
+	 */
+	smp_mb();
+	atomic_set(&kfd->interrupt_ring_rptr, rptr);
+
+	return true;
+}
+
+static void interrupt_wq(struct work_struct *work)
+{
+	struct kfd_dev *dev = container_of(work, struct kfd_dev,
+						interrupt_work);
+
+	uint32_t ih_ring_entry[DIV_ROUND_UP(
+				dev->device_info->ih_ring_entry_size,
+				sizeof(uint32_t))];
+
+	while (dequeue_ih_ring_entry(dev, ih_ring_entry))
+		;
+}
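Note on the interrupt ring in kfd_interrupt.c: enqueue_ih_ring_entry() and dequeue_ih_ring_entry() keep byte offsets rather than entry indices, and one slot is left unused so that rptr == wptr always means "empty". The following is a minimal, single-threaded userspace sketch of that indexing only. It is not the driver code: demo_ring, demo_enqueue, demo_dequeue, RING_ENTRIES and ENTRY_SIZE are hypothetical names and values, and the atomics and memory barriers used in the patch are deliberately left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define RING_ENTRIES 4	/* hypothetical; the patch uses 256 */
#define ENTRY_SIZE   8	/* hypothetical; the patch reads ih_ring_entry_size */
#define RING_BYTES   (RING_ENTRIES * ENTRY_SIZE)

/* The unsigned modulo trick below needs a power-of-two ring size. */
_Static_assert((RING_BYTES & (RING_BYTES - 1)) == 0, "ring size");

struct demo_ring {
	unsigned char data[RING_BYTES];
	unsigned int rptr;	/* byte offset of the next entry to read */
	unsigned int wptr;	/* byte offset of the next entry to write */
};

static bool demo_enqueue(struct demo_ring *r, const void *entry)
{
	assert(r && entry);

	/* Full when writing one more entry would make wptr catch up to rptr. */
	if ((r->rptr - r->wptr) % RING_BYTES == ENTRY_SIZE)
		return false;

	memcpy(r->data + r->wptr, entry, ENTRY_SIZE);
	r->wptr = (r->wptr + ENTRY_SIZE) % RING_BYTES;
	return true;
}

static bool demo_dequeue(struct demo_ring *r, void *entry)
{
	assert(r && entry);

	/* Equal offsets mean there is nothing to read. */
	if (r->rptr == r->wptr)
		return false;

	memcpy(entry, r->data + r->rptr, ENTRY_SIZE);
	r->rptr = (r->rptr + ENTRY_SIZE) % RING_BYTES;
	return true;
}
```

With RING_ENTRIES set to 4, three entries fit before demo_enqueue() reports overflow. By the same arithmetic, the 256-entry ring in the patch holds at most 255 pending interrupts.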
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_kernel_queue.c b/drivers/gpu/drm/amd/amdkfd/kfd_kernel_queue.c
new file mode 100644
index 000000000000..935071410724
--- /dev/null
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_kernel_queue.c
@@ -0,0 +1,353 @@
+/*
+ * Copyright 2014 Advanced Micro Devices, Inc.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR
+ * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
+ * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
+ * OTHER DEALINGS IN THE SOFTWARE.
+ *
+ */
+
+#include <linux/types.h>
+#include <linux/mutex.h>
+#include <linux/slab.h>
+#include <linux/printk.h>
+#include <linux/sched.h>
+#include "kfd_kernel_queue.h"
+#include "kfd_priv.h"
+#include "kfd_device_queue_manager.h"
+#include "kfd_pm4_headers.h"
+#include "kfd_pm4_opcodes.h"
+
+#define PM4_COUNT_ZERO (((1 << 15) - 1) << 16)
+
+static bool initialize(struct kernel_queue *kq, struct kfd_dev *dev,
+		enum kfd_queue_type type, unsigned int queue_size)
+{
+	struct queue_properties prop;
+	int retval;
+	union PM4_MES_TYPE_3_HEADER nop;
+
+	BUG_ON(!kq || !dev);
+	BUG_ON(type != KFD_QUEUE_TYPE_DIQ && type != KFD_QUEUE_TYPE_HIQ);
+
+	pr_debug("kfd: In func %s initializing queue type %d size %d\n",
+			__func__, type, queue_size);
+
+	nop.opcode = IT_NOP;
+	nop.type = PM4_TYPE_3;
+	nop.u32all |= PM4_COUNT_ZERO;
+
+	kq->dev = dev;
+	kq->nop_packet = nop.u32all;
+	switch (type) {
+	case KFD_QUEUE_TYPE_DIQ:
+	case KFD_QUEUE_TYPE_HIQ:
+		kq->mqd = dev->dqm->get_mqd_manager(dev->dqm,
+						KFD_MQD_TYPE_CIK_HIQ);
+		break;
+	default:
+		BUG();
+		break;
+	}
+
+	if (kq->mqd == NULL)
+		return false;
+
+	prop.doorbell_ptr = kfd_get_kernel_doorbell(dev, &prop.doorbell_off);
+
+	if (prop.doorbell_ptr == NULL)
+		goto err_get_kernel_doorbell;
+
+	retval = kfd2kgd->allocate_mem(dev->kgd,
+					queue_size,
+					PAGE_SIZE,
+					KFD_MEMPOOL_SYSTEM_WRITECOMBINE,
+					(struct kgd_mem **) &kq->pq);
+
+	if (retval != 0)
+		goto err_pq_allocate_vidmem;
+
+	kq->pq_kernel_addr = kq->pq->cpu_ptr;
+	kq->pq_gpu_addr = kq->pq->gpu_addr;
+
+	retval = kfd2kgd->allocate_mem(dev->kgd,
+					sizeof(*kq->rptr_kernel),
+					32,
+					KFD_MEMPOOL_SYSTEM_WRITECOMBINE,
+					(struct kgd_mem **) &kq->rptr_mem);
+
+	if (retval != 0)
+		goto err_rptr_allocate_vidmem;
+
+	kq->rptr_kernel = kq->rptr_mem->cpu_ptr;
+	kq->rptr_gpu_addr = kq->rptr_mem->gpu_addr;
+
+	retval = kfd2kgd->allocate_mem(dev->kgd,
+					sizeof(*kq->wptr_kernel),
+					32,
+					KFD_MEMPOOL_SYSTEM_WRITECOMBINE,
+					(struct kgd_mem **) &kq->wptr_mem);
+
+	if (retval != 0)
+		goto err_wptr_allocate_vidmem;
+
+	kq->wptr_kernel = kq->wptr_mem->cpu_ptr;
+	kq->wptr_gpu_addr = kq->wptr_mem->gpu_addr;
+
+	memset(kq->pq_kernel_addr, 0, queue_size);
+	memset(kq->rptr_kernel, 0, sizeof(*kq->rptr_kernel));
+	memset(kq->wptr_kernel, 0, sizeof(*kq->wptr_kernel));
+
+	prop.queue_size = queue_size;
+	prop.is_interop = false;
+	prop.priority = 1;
+	prop.queue_percent = 100;
+	prop.type = type;
+	prop.vmid = 0;
+	prop.queue_address = kq->pq_gpu_addr;
+	prop.read_ptr = (uint32_t *) kq->rptr_gpu_addr;
+	prop.write_ptr = (uint32_t *) kq->wptr_gpu_addr;
+
+	if (init_queue(&kq->queue, prop) != 0)
+		goto err_init_queue;
+
+	kq->queue->device = dev;
+	kq->queue->process = kfd_get_process(current);
+
+	retval = kq->mqd->init_mqd(kq->mqd, &kq->queue->mqd,
+					&kq->queue->mqd_mem_obj,
+					&kq->queue->gart_mqd_addr,
+					&kq->queue->properties);
+	if (retval != 0)
+		goto err_init_mqd;
+
+	/* assign HIQ to HQD */
+	if (type == KFD_QUEUE_TYPE_HIQ) {
+		pr_debug("assigning hiq to hqd\n");
+		kq->queue->pipe = KFD_CIK_HIQ_PIPE;
+		kq->queue->queue = KFD_CIK_HIQ_QUEUE;
+		kq->mqd->load_mqd(kq->mqd, kq->queue->mqd, kq->queue->pipe,
+					kq->queue->queue, NULL);
+	} else {
+		/* allocate fence for DIQ */
+
+		retval = kfd2kgd->allocate_mem(dev->kgd,
+					sizeof(uint32_t),
+					32,
+					KFD_MEMPOOL_SYSTEM_WRITECOMBINE,
+					(struct kgd_mem **) &kq->fence_mem_obj);
+
+		if (retval != 0)
+			goto err_alloc_fence;
+
+		kq->fence_kernel_address = kq->fence_mem_obj->cpu_ptr;
+		kq->fence_gpu_addr = kq->fence_mem_obj->gpu_addr;
+	}
+
+	print_queue(kq->queue);
+
+	return true;
+err_alloc_fence:
+err_init_mqd:
+	uninit_queue(kq->queue);
+err_init_queue:
+	kfd2kgd->free_mem(dev->kgd, (struct kgd_mem *) kq->wptr_mem);
+err_wptr_allocate_vidmem:
+	kfd2kgd->free_mem(dev->kgd, (struct kgd_mem *) kq->rptr_mem);
+err_rptr_allocate_vidmem:
+	kfd2kgd->free_mem(dev->kgd, (struct kgd_mem *) kq->pq);
+err_pq_allocate_vidmem:
+	pr_err("kfd: error init pq\n");
+	kfd_release_kernel_doorbell(dev, prop.doorbell_ptr);
+err_get_kernel_doorbell:
+	pr_err("kfd: error init doorbell\n");
+	return false;
+
+}
+
+static void uninitialize(struct kernel_queue *kq)
+{
+	BUG_ON(!kq);
+
+	if (kq->queue->properties.type == KFD_QUEUE_TYPE_HIQ)
+		kq->mqd->destroy_mqd(kq->mqd,
+					NULL,
+					false,
+					QUEUE_PREEMPT_DEFAULT_TIMEOUT_MS,
+					kq->queue->pipe,
+					kq->queue->queue);
+
+	kfd2kgd->free_mem(kq->dev->kgd, (struct kgd_mem *) kq->rptr_mem);
+	kfd2kgd->free_mem(kq->dev->kgd, (struct kgd_mem *) kq->wptr_mem);
+	kfd2kgd->free_mem(kq->dev->kgd, (struct kgd_mem *) kq->pq);
+	kfd_release_kernel_doorbell(kq->dev,
+					kq->queue->properties.doorbell_ptr);
+	uninit_queue(kq->queue);
+}
+
+static int acquire_packet_buffer(struct kernel_queue *kq,
+		size_t packet_size_in_dwords, unsigned int **buffer_ptr)
+{
+	size_t available_size;
+	size_t queue_size_dwords;
+	uint32_t wptr, rptr;
+	unsigned int *queue_address;
+
+	BUG_ON(!kq || !buffer_ptr);
+
+	rptr = *kq->rptr_kernel;
+	wptr = *kq->wptr_kernel;
+	queue_address = (unsigned int *)kq->pq_kernel_addr;
+	queue_size_dwords = kq->queue->properties.queue_size / sizeof(uint32_t);
+
+	pr_debug("kfd: In func %s\nrptr: %d\nwptr: %d\nqueue_address 0x%p\n",
+			__func__, rptr, wptr, queue_address);
+
+	available_size = (rptr - 1 - wptr + queue_size_dwords) %
+							queue_size_dwords;
+
+	if (packet_size_in_dwords >= queue_size_dwords ||
+			packet_size_in_dwords >= available_size) {
+		/*
+		 * make sure calling functions know
+		 * acquire_packet_buffer() failed
+		 */
+		*buffer_ptr = NULL;
+		return -ENOMEM;
+	}
+
+	if (wptr + packet_size_in_dwords >= queue_size_dwords) {
+		while (wptr > 0) {
+			queue_address[wptr] = kq->nop_packet;
+			wptr = (wptr + 1) % queue_size_dwords;
+		}
+	}
+
+	*buffer_ptr = &queue_address[wptr];
+	kq->pending_wptr = wptr + packet_size_in_dwords;
+
+	return 0;
+}
+
+static void submit_packet(struct kernel_queue *kq)
+{
+#ifdef DEBUG
+	int i;
+#endif
+
+	BUG_ON(!kq);
+
+#ifdef DEBUG
+	for (i = *kq->wptr_kernel; i < kq->pending_wptr; i++) {
+		pr_debug("0x%2X ", kq->pq_kernel_addr[i]);
+		if (i % 15 == 0)
+			pr_debug("\n");
+	}
+	pr_debug("\n");
+#endif
+
+	*kq->wptr_kernel = kq->pending_wptr;
+	write_kernel_doorbell(kq->queue->properties.doorbell_ptr,
+				kq->pending_wptr);
+}
+
+static int sync_with_hw(struct kernel_queue *kq, unsigned long timeout_ms)
+{
+	unsigned long org_timeout_ms;
+
+	BUG_ON(!kq);
+
+	org_timeout_ms = timeout_ms;
+	timeout_ms += jiffies * 1000 / HZ;
+	while (*kq->wptr_kernel != *kq->rptr_kernel) {
+		if (time_after(jiffies * 1000 / HZ, timeout_ms)) {
+			pr_err("kfd: kernel_queue %s timeout expired %lu\n",
+				__func__, org_timeout_ms);
+			pr_err("kfd: wptr: %d rptr: %d\n",
+				*kq->wptr_kernel, *kq->rptr_kernel);
+			return -ETIME;
+		}
+		schedule();
+	}
+
+	return 0;
+}
+
+static void rollback_packet(struct kernel_queue *kq)
+{
+	BUG_ON(!kq);
+	kq->pending_wptr = *kq->queue->properties.write_ptr;
+}
+
+struct kernel_queue *kernel_queue_init(struct kfd_dev *dev,
+					enum kfd_queue_type type)
+{
+	struct kernel_queue *kq;
+
+	BUG_ON(!dev);
+
+	kq = kzalloc(sizeof(struct kernel_queue), GFP_KERNEL);
+	if (!kq)
+		return NULL;
+
+	kq->initialize = initialize;
+	kq->uninitialize = uninitialize;
+	kq->acquire_packet_buffer = acquire_packet_buffer;
+	kq->submit_packet = submit_packet;
+	kq->sync_with_hw = sync_with_hw;
+	kq->rollback_packet = rollback_packet;
+
+	if (kq->initialize(kq, dev, type, KFD_KERNEL_QUEUE_SIZE) == false) {
+		pr_err("kfd: failed to init kernel queue\n");
+		kfree(kq);
+		return NULL;
+	}
+	return kq;
+}
+
+void kernel_queue_uninit(struct kernel_queue *kq)
+{
+	BUG_ON(!kq);
+
+	kq->uninitialize(kq);
+	kfree(kq);
+}
+
+static __attribute__((unused)) void test_kq(struct kfd_dev *dev)
+{
+	struct kernel_queue *kq;
+	uint32_t *buffer, i;
+	int retval;
+
+	BUG_ON(!dev);
+
+	pr_debug("kfd: starting kernel queue test\n");
+
+	kq = kernel_queue_init(dev, KFD_QUEUE_TYPE_HIQ);
+	BUG_ON(!kq);
+
+	retval = kq->acquire_packet_buffer(kq, 5, &buffer);
+	BUG_ON(retval != 0);
+	for (i = 0; i < 5; i++)
+		buffer[i] = kq->nop_packet;
+	kq->submit_packet(kq);
+	kq->sync_with_hw(kq, 1000);
+
+	pr_debug("kfd: ending kernel queue test\n");
+}
+
+
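Note on acquire_packet_buffer() in kfd_kernel_queue.c: the free-space formula keeps one dword unused, and a packet that would reach the end of the queue is preceded by NOP padding so that it starts at index 0. The sketch below isolates just those two calculations in userspace. It is an illustration under assumptions, not the driver code: demo_available, demo_pad_dwords and DEMO_QUEUE_DWORDS are hypothetical names, and the queue size is assumed to be a power of two.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_QUEUE_DWORDS 16u	/* hypothetical; must be a power of two */

/* Free dwords between wptr and rptr, keeping one dword unused. */
static size_t demo_available(uint32_t rptr, uint32_t wptr)
{
	assert(rptr < DEMO_QUEUE_DWORDS && wptr < DEMO_QUEUE_DWORDS);

	return (rptr - 1 - wptr + DEMO_QUEUE_DWORDS) % DEMO_QUEUE_DWORDS;
}

/* NOP dwords written before the packet start wraps around to index 0. */
static size_t demo_pad_dwords(uint32_t wptr, size_t packet_dwords)
{
	assert(wptr < DEMO_QUEUE_DWORDS);

	if (wptr + packet_dwords >= DEMO_QUEUE_DWORDS && wptr > 0)
		return DEMO_QUEUE_DWORDS - wptr;
	return 0;
}
```

In the patch, the comparison against available_size runs before any padding is written, so the padding dwords are not subtracted from the free space in that check.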
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_kernel_queue.h b/drivers/gpu/drm/amd/amdkfd/kfd_kernel_queue.h
new file mode 100644
index 000000000000..dcd2bdb68d44
--- /dev/null
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_kernel_queue.h
@@ -0,0 +1,69 @@
+/*
+ * Copyright 2014 Advanced Micro Devices, Inc.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR
+ * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
+ * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
+ * OTHER DEALINGS IN THE SOFTWARE.
+ *
+ */
+
+#ifndef KFD_KERNEL_QUEUE_H_
+#define KFD_KERNEL_QUEUE_H_
+
+#include <linux/list.h>
+#include <linux/types.h>
+#include "kfd_priv.h"
+
+struct kernel_queue {
+	/* interface */
+	bool	(*initialize)(struct kernel_queue *kq, struct kfd_dev *dev,
+			enum kfd_queue_type type, unsigned int queue_size);
+	void	(*uninitialize)(struct kernel_queue *kq);
+	int	(*acquire_packet_buffer)(struct kernel_queue *kq,
+					size_t packet_size_in_dwords,
+					unsigned int **buffer_ptr);
+
+	void	(*submit_packet)(struct kernel_queue *kq);
+	int	(*sync_with_hw)(struct kernel_queue *kq,
+				unsigned long timeout_ms);
+	void	(*rollback_packet)(struct kernel_queue *kq);
+
+	/* data */
+	struct kfd_dev		*dev;
+	struct mqd_manager	*mqd;
+	struct queue		*queue;
+	uint32_t		pending_wptr;
+	unsigned int		nop_packet;
+
+	struct kfd_mem_obj	*rptr_mem;
+	uint32_t		*rptr_kernel;
+	uint64_t		rptr_gpu_addr;
+	struct kfd_mem_obj	*wptr_mem;
+	uint32_t		*wptr_kernel;
+	uint64_t		wptr_gpu_addr;
+	struct kfd_mem_obj	*pq;
+	uint64_t		pq_gpu_addr;
+	uint32_t		*pq_kernel_addr;
+
+	struct kfd_mem_obj	*fence_mem_obj;
+	uint64_t		fence_gpu_addr;
+	void			*fence_kernel_address;
+
+	struct list_head	list;
+};
+
+#endif /* KFD_KERNEL_QUEUE_H_ */
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_module.c b/drivers/gpu/drm/amd/amdkfd/kfd_module.c
new file mode 100644
index 000000000000..95d5af138e6e
--- /dev/null
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_module.c
@@ -0,0 +1,159 @@
+/*
+ * Copyright 2014 Advanced Micro Devices, Inc.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR
+ * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
+ * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
+ * OTHER DEALINGS IN THE SOFTWARE.
+ */
+
+#include <linux/module.h>
+#include <linux/sched.h>
+#include <linux/moduleparam.h>
+#include <linux/device.h>
+#include "kfd_priv.h"
+
+#define KFD_DRIVER_AUTHOR	"AMD Inc. and others"
+
+#define KFD_DRIVER_DESC		"Standalone HSA driver for AMD's GPUs"
+#define KFD_DRIVER_DATE		"20141113"
+#define KFD_DRIVER_MAJOR	0
+#define KFD_DRIVER_MINOR	7
+#define KFD_DRIVER_PATCHLEVEL	0
+
+const struct kfd2kgd_calls *kfd2kgd;
+static const struct kgd2kfd_calls kgd2kfd = {
+	.exit		= kgd2kfd_exit,
+	.probe		= kgd2kfd_probe,
+	.device_init	= kgd2kfd_device_init,
+	.device_exit	= kgd2kfd_device_exit,
+	.interrupt	= kgd2kfd_interrupt,
+	.suspend	= kgd2kfd_suspend,
+	.resume		= kgd2kfd_resume,
+};
+
+int sched_policy = KFD_SCHED_POLICY_HWS;
+module_param(sched_policy, int, 0444);
+MODULE_PARM_DESC(sched_policy,
+	"Kernel cmdline parameter that defines the amdkfd scheduling policy");
+
+int max_num_of_processes = KFD_MAX_NUM_OF_PROCESSES_DEFAULT;
+module_param(max_num_of_processes, int, 0444);
+MODULE_PARM_DESC(max_num_of_processes,
+	"Kernel cmdline parameter that defines the amdkfd maximum number of supported processes");
+
+int max_num_of_queues_per_process = KFD_MAX_NUM_OF_QUEUES_PER_PROCESS_DEFAULT;
+module_param(max_num_of_queues_per_process, int, 0444);
+MODULE_PARM_DESC(max_num_of_queues_per_process,
+	"Kernel cmdline parameter that defines the amdkfd maximum number of supported queues per process");
+
+bool kgd2kfd_init(unsigned interface_version,
+		  const struct kfd2kgd_calls *f2g,
+		  const struct kgd2kfd_calls **g2f)
+{
+	/*
+	 * Only one interface version is supported,
+	 * no kfd/kgd version skew allowed.
+	 */
+	if (interface_version != KFD_INTERFACE_VERSION)
+		return false;
+
+	/* Protection against multiple amd kgd loads */
+	if (kfd2kgd)
+		return true;
+
+	kfd2kgd = f2g;
+	*g2f = &kgd2kfd;
+
+	return true;
+}
+EXPORT_SYMBOL(kgd2kfd_init);
+
+void kgd2kfd_exit(void)
+{
+}
+
+static int __init kfd_module_init(void)
+{
+	int err;
+
+	kfd2kgd = NULL;
+
+	/* Verify module parameters */
+	if ((sched_policy < KFD_SCHED_POLICY_HWS) ||
+		(sched_policy > KFD_SCHED_POLICY_NO_HWS)) {
+		pr_err("kfd: sched_policy has invalid value\n");
+		return -1;
+	}
+
+	/* Verify module parameters */
+	if ((max_num_of_processes < 0) ||
+		(max_num_of_processes > KFD_MAX_NUM_OF_PROCESSES)) {
+		pr_err("kfd: max_num_of_processes must be between 0 and KFD_MAX_NUM_OF_PROCESSES\n");
+		return -1;
+	}
+
+	if ((max_num_of_queues_per_process < 0) ||
+		(max_num_of_queues_per_process >
+			KFD_MAX_NUM_OF_QUEUES_PER_PROCESS)) {
+		pr_err("kfd: max_num_of_queues_per_process must be between 0 and KFD_MAX_NUM_OF_QUEUES_PER_PROCESS\n");
+		return -1;
+	}
+
+	err = kfd_pasid_init();
+	if (err < 0)
+		goto err_pasid;
+
+	err = kfd_chardev_init();
+	if (err < 0)
+		goto err_ioctl;
+
+	err = kfd_topology_init();
+	if (err < 0)
+		goto err_topology;
+
+	kfd_process_create_wq();
+
+	dev_info(kfd_device, "Initialized module\n");
+
+	return 0;
+
+err_topology:
+	kfd_chardev_exit();
+err_ioctl:
+	kfd_pasid_exit();
+err_pasid:
+	return err;
+}
+
+static void __exit kfd_module_exit(void)
+{
+	kfd_process_destroy_wq();
+	kfd_topology_shutdown();
+	kfd_chardev_exit();
+	kfd_pasid_exit();
+	dev_info(kfd_device, "Removed module\n");
+}
+
+module_init(kfd_module_init);
+module_exit(kfd_module_exit);
+
+MODULE_AUTHOR(KFD_DRIVER_AUTHOR);
+MODULE_DESCRIPTION(KFD_DRIVER_DESC);
+MODULE_LICENSE("GPL and additional rights");
+MODULE_VERSION(__stringify(KFD_DRIVER_MAJOR) "."
+	       __stringify(KFD_DRIVER_MINOR) "."
+	       __stringify(KFD_DRIVER_PATCHLEVEL));
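Note on kfd_module_init() in kfd_module.c: the error labels undo the completed initialization steps in reverse order, and each label falls through to the next one. The following userspace sketch shows that unwinding pattern in isolation. Everything in it is hypothetical (demo_init, demo_exit, demo_module_init, the single-letter tags and the log buffer). It only records the order of calls and performs no real initialization.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

static char demo_log[16];	/* records init (lowercase) and exit (uppercase) */

static void demo_note(char c)
{
	size_t n = strlen(demo_log);

	assert(n + 1 < sizeof(demo_log));
	demo_log[n] = c;
	demo_log[n + 1] = '\0';
}

/* Pretend to initialize one subsystem; fail if it matches fail_tag. */
static int demo_init(char tag, char fail_tag)
{
	if (tag == fail_tag)
		return -1;
	demo_note(tag);
	return 0;
}

static void demo_exit(char tag)
{
	demo_note((char)toupper((unsigned char)tag));
}

static int demo_module_init(char fail_tag)
{
	int err;

	demo_log[0] = '\0';

	err = demo_init('p', fail_tag);		/* stands in for pasid */
	if (err < 0)
		goto err_pasid;

	err = demo_init('c', fail_tag);		/* stands in for chardev */
	if (err < 0)
		goto err_chardev;

	err = demo_init('t', fail_tag);		/* stands in for topology */
	if (err < 0)
		goto err_topology;

	return 0;

err_topology:
	demo_exit('c');
err_chardev:
	demo_exit('p');
err_pasid:
	return err;
}
```

A failure in the third step leaves the log as "pcCP": two successful inits followed by their teardown in reverse order.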
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager.c b/drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager.c
new file mode 100644
index 000000000000..adc31474e786
--- /dev/null
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager.c
@@ -0,0 +1,346 @@
+/*
+ * Copyright 2014 Advanced Micro Devices, Inc.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR
+ * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
+ * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
+ * OTHER DEALINGS IN THE SOFTWARE.
+ *
+ */
+
+#include <linux/printk.h>
+#include <linux/slab.h>
+#include "kfd_priv.h"
+#include "kfd_mqd_manager.h"
+#include "cik_regs.h"
+#include "../../radeon/cik_reg.h"
+
+inline void busy_wait(unsigned long ms)
+{
+	while (time_before(jiffies, ms))
+		cpu_relax();
+}
+
+static inline struct cik_mqd *get_mqd(void *mqd)
+{
+	return (struct cik_mqd *)mqd;
+}
+
+static int init_mqd(struct mqd_manager *mm, void **mqd,
+		struct kfd_mem_obj **mqd_mem_obj, uint64_t *gart_addr,
+		struct queue_properties *q)
+{
+	uint64_t addr;
+	struct cik_mqd *m;
+	int retval;
+
+	BUG_ON(!mm || !q || !mqd);
+
+	pr_debug("kfd: In func %s\n", __func__);
+
+	retval = kfd2kgd->allocate_mem(mm->dev->kgd,
+					sizeof(struct cik_mqd),
+					256,
+					KFD_MEMPOOL_SYSTEM_WRITECOMBINE,
+					(struct kgd_mem **) mqd_mem_obj);
+
+	if (retval != 0)
+		return -ENOMEM;
+
+	m = (struct cik_mqd *) (*mqd_mem_obj)->cpu_ptr;
+	addr = (*mqd_mem_obj)->gpu_addr;
+
+	memset(m, 0, ALIGN(sizeof(struct cik_mqd), 256));
+
+	m->header = 0xC0310800;
+	m->compute_pipelinestat_enable = 1;
+	m->compute_static_thread_mgmt_se0 = 0xFFFFFFFF;
+	m->compute_static_thread_mgmt_se1 = 0xFFFFFFFF;
+	m->compute_static_thread_mgmt_se2 = 0xFFFFFFFF;
+	m->compute_static_thread_mgmt_se3 = 0xFFFFFFFF;
+
+	/*
+	 * Use the last queue state saved in the mqd when the cp reassigns
+	 * the queue, so that the context stays consistent when the queue is
+	 * switched on/off (e.g. oversubscription or quantum timeout).
+	 */
+	m->cp_hqd_persistent_state =
+				DEFAULT_CP_HQD_PERSISTENT_STATE | PRELOAD_REQ;
+
+	m->cp_mqd_control             = MQD_CONTROL_PRIV_STATE_EN;
+	m->cp_mqd_base_addr_lo        = lower_32_bits(addr);
+	m->cp_mqd_base_addr_hi        = upper_32_bits(addr);
+
+	m->cp_hqd_ib_control = DEFAULT_MIN_IB_AVAIL_SIZE | IB_ATC_EN;
+	/* Although WinKFD writes this, I suspect it should not be necessary */
+	m->cp_hqd_ib_control = IB_ATC_EN | DEFAULT_MIN_IB_AVAIL_SIZE;
+
+	m->cp_hqd_quantum = QUANTUM_EN | QUANTUM_SCALE_1MS |
+				QUANTUM_DURATION(10);
+
+	/*
+	 * Pipe Priority
+	 * Identifies the pipe relative priority when this queue is connected
+	 * to the pipeline. The pipe priority is against the GFX pipe and HP3D.
+	 * In KFD we are using a fixed pipe priority set to CS_MEDIUM.
+	 * 0 = CS_LOW (typically below GFX)
+	 * 1 = CS_MEDIUM (typically between HP3D and GFX)
+	 * 2 = CS_HIGH (typically above HP3D)
+	 */
+	m->cp_hqd_pipe_priority = 1;
+	m->cp_hqd_queue_priority = 15;
+
+	*mqd = m;
+	if (gart_addr != NULL)
+		*gart_addr = addr;
+	retval = mm->update_mqd(mm, m, q);
+
+	return retval;
+}
+
+static void uninit_mqd(struct mqd_manager *mm, void *mqd,
+			struct kfd_mem_obj *mqd_mem_obj)
+{
+	BUG_ON(!mm || !mqd);
+	kfd2kgd->free_mem(mm->dev->kgd, (struct kgd_mem *) mqd_mem_obj);
+}
+
+static int load_mqd(struct mqd_manager *mm, void *mqd, uint32_t pipe_id,
+			uint32_t queue_id, uint32_t __user *wptr)
+{
+	return kfd2kgd->hqd_load(mm->dev->kgd, mqd, pipe_id, queue_id, wptr);
+
+}
+
+static int update_mqd(struct mqd_manager *mm, void *mqd,
+			struct queue_properties *q)
+{
+	struct cik_mqd *m;
+
+	BUG_ON(!mm || !q || !mqd);
+
+	pr_debug("kfd: In func %s\n", __func__);
+
+	m = get_mqd(mqd);
+	m->cp_hqd_pq_control = DEFAULT_RPTR_BLOCK_SIZE |
+				DEFAULT_MIN_AVAIL_SIZE | PQ_ATC_EN;
+
+	/*
+	 * The queue size field is log2 of the queue size in dwords, minus 1;
+	 * the second -1 compensates for ffs() being 1-based.
+	 */
+	m->cp_hqd_pq_control |= ffs(q->queue_size / sizeof(unsigned int))
+								- 1 - 1;
+	m->cp_hqd_pq_base_lo = lower_32_bits((uint64_t)q->queue_address >> 8);
+	m->cp_hqd_pq_base_hi = upper_32_bits((uint64_t)q->queue_address >> 8);
+	m->cp_hqd_pq_rptr_report_addr_lo = lower_32_bits((uint64_t)q->read_ptr);
+	m->cp_hqd_pq_rptr_report_addr_hi = upper_32_bits((uint64_t)q->read_ptr);
+	m->cp_hqd_pq_doorbell_control = DOORBELL_EN |
+					DOORBELL_OFFSET(q->doorbell_off);
+
+	m->cp_hqd_vmid = q->vmid;
+
+	if (q->format == KFD_QUEUE_FORMAT_AQL) {
+		m->cp_hqd_iq_rptr = AQL_ENABLE;
+		m->cp_hqd_pq_control |= NO_UPDATE_RPTR;
+	}
+
+	m->cp_hqd_active = 0;
+	q->is_active = false;
+	if (q->queue_size > 0 &&
+			q->queue_address != 0 &&
+			q->queue_percent > 0) {
+		m->cp_hqd_active = 1;
+		q->is_active = true;
+	}
+
+	return 0;
+}
+
+static int destroy_mqd(struct mqd_manager *mm, void *mqd,
+			enum kfd_preempt_type type,
+			unsigned int timeout, uint32_t pipe_id,
+			uint32_t queue_id)
+{
+	return kfd2kgd->hqd_destroy(mm->dev->kgd, type, timeout,
+					pipe_id, queue_id);
+}
+
+static bool is_occupied(struct mqd_manager *mm, void *mqd,
+			uint64_t queue_address,	uint32_t pipe_id,
+			uint32_t queue_id)
+{
+
+	return kfd2kgd->hqd_is_occupies(mm->dev->kgd, queue_address,
+					pipe_id, queue_id);
+
+}
+
+/*
+ * HIQ MQD implementation: the concrete MQD implementation for the HIQ.
+ * The HIQ queue in Kaveri uses the same MQD structure as all the user mode
+ * queues but with different initial values.
+ */
+
+static int init_mqd_hiq(struct mqd_manager *mm, void **mqd,
+		struct kfd_mem_obj **mqd_mem_obj, uint64_t *gart_addr,
+		struct queue_properties *q)
+{
+	uint64_t addr;
+	struct cik_mqd *m;
+	int retval;
+
+	BUG_ON(!mm || !q || !mqd || !mqd_mem_obj);
+
+	pr_debug("kfd: In func %s\n", __func__);
+
+	retval = kfd2kgd->allocate_mem(mm->dev->kgd,
+					sizeof(struct cik_mqd),
+					256,
+					KFD_MEMPOOL_SYSTEM_WRITECOMBINE,
+					(struct kgd_mem **) mqd_mem_obj);
+
+	if (retval != 0)
+		return -ENOMEM;
+
+	m = (struct cik_mqd *) (*mqd_mem_obj)->cpu_ptr;
+	addr = (*mqd_mem_obj)->gpu_addr;
+
+	memset(m, 0, ALIGN(sizeof(struct cik_mqd), 256));
+
+	m->header = 0xC0310800;
+	m->compute_pipelinestat_enable = 1;
+	m->compute_static_thread_mgmt_se0 = 0xFFFFFFFF;
+	m->compute_static_thread_mgmt_se1 = 0xFFFFFFFF;
+	m->compute_static_thread_mgmt_se2 = 0xFFFFFFFF;
+	m->compute_static_thread_mgmt_se3 = 0xFFFFFFFF;
+
+	m->cp_hqd_persistent_state = DEFAULT_CP_HQD_PERSISTENT_STATE |
+					PRELOAD_REQ;
+	m->cp_hqd_quantum = QUANTUM_EN | QUANTUM_SCALE_1MS |
+				QUANTUM_DURATION(10);
+
+	m->cp_mqd_control             = MQD_CONTROL_PRIV_STATE_EN;
+	m->cp_mqd_base_addr_lo        = lower_32_bits(addr);
+	m->cp_mqd_base_addr_hi        = upper_32_bits(addr);
+
+	m->cp_hqd_ib_control = DEFAULT_MIN_IB_AVAIL_SIZE;
+
+	/*
+	 * Pipe Priority
+	 * Identifies the pipe relative priority when this queue is connected
+	 * to the pipeline. The pipe priority is against the GFX pipe and HP3D.
+	 * In KFD we are using a fixed pipe priority set to CS_MEDIUM.
+	 * 0 = CS_LOW (typically below GFX)
+	 * 1 = CS_MEDIUM (typically between HP3D and GFX)
+	 * 2 = CS_HIGH (typically above HP3D)
+	 */
+	m->cp_hqd_pipe_priority = 1;
+	m->cp_hqd_queue_priority = 15;
+
+	*mqd = m;
+	if (gart_addr)
+		*gart_addr = addr;
+	retval = mm->update_mqd(mm, m, q);
+
+	return retval;
+}
+
+static int update_mqd_hiq(struct mqd_manager *mm, void *mqd,
+				struct queue_properties *q)
+{
+	struct cik_mqd *m;
+
+	BUG_ON(!mm || !q || !mqd);
+
+	pr_debug("kfd: In func %s\n", __func__);
+
+	m = get_mqd(mqd);
+	m->cp_hqd_pq_control = DEFAULT_RPTR_BLOCK_SIZE |
+				DEFAULT_MIN_AVAIL_SIZE |
+				PRIV_STATE |
+				KMD_QUEUE;
+
+	/*
+	 * The queue size field is log2 of the queue size in dwords,
+	 * minus 1; the second -1 compensates for ffs() being 1-based.
+	 */
+	m->cp_hqd_pq_control |= ffs(q->queue_size / sizeof(unsigned int))
+								- 1 - 1;
+	m->cp_hqd_pq_base_lo = lower_32_bits((uint64_t)q->queue_address >> 8);
+	m->cp_hqd_pq_base_hi = upper_32_bits((uint64_t)q->queue_address >> 8);
+	m->cp_hqd_pq_rptr_report_addr_lo = lower_32_bits((uint64_t)q->read_ptr);
+	m->cp_hqd_pq_rptr_report_addr_hi = upper_32_bits((uint64_t)q->read_ptr);
+	m->cp_hqd_pq_doorbell_control = DOORBELL_EN |
+					DOORBELL_OFFSET(q->doorbell_off);
+
+	m->cp_hqd_vmid = q->vmid;
+
+	m->cp_hqd_active = 0;
+	q->is_active = false;
+	if (q->queue_size > 0 &&
+			q->queue_address != 0 &&
+			q->queue_percent > 0) {
+		m->cp_hqd_active = 1;
+		q->is_active = true;
+	}
+
+	return 0;
+}
+
+struct mqd_manager *mqd_manager_init(enum KFD_MQD_TYPE type,
+					struct kfd_dev *dev)
+{
+	struct mqd_manager *mqd;
+
+	BUG_ON(!dev);
+	BUG_ON(type >= KFD_MQD_TYPE_MAX);
+
+	pr_debug("kfd: In func %s\n", __func__);
+
+	mqd = kzalloc(sizeof(struct mqd_manager), GFP_KERNEL);
+	if (!mqd)
+		return NULL;
+
+	mqd->dev = dev;
+
+	switch (type) {
+	case KFD_MQD_TYPE_CIK_CP:
+	case KFD_MQD_TYPE_CIK_COMPUTE:
+		mqd->init_mqd = init_mqd;
+		mqd->uninit_mqd = uninit_mqd;
+		mqd->load_mqd = load_mqd;
+		mqd->update_mqd = update_mqd;
+		mqd->destroy_mqd = destroy_mqd;
+		mqd->is_occupied = is_occupied;
+		break;
+	case KFD_MQD_TYPE_CIK_HIQ:
+		mqd->init_mqd = init_mqd_hiq;
+		mqd->uninit_mqd = uninit_mqd;
+		mqd->load_mqd = load_mqd;
+		mqd->update_mqd = update_mqd_hiq;
+		mqd->destroy_mqd = destroy_mqd;
+		mqd->is_occupied = is_occupied;
+		break;
+	default:
+		kfree(mqd);
+		return NULL;
+	}
+
+	return mqd;
+}
+
+/* SDMA queues should be implemented here when the CP supports them */
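Note on update_mqd() in kfd_mqd_manager.c: the size field OR-ed into cp_hqd_pq_control is computed as ffs(dwords) - 1 - 1, and the ring base address is shifted right by 8 bits before being split into two 32-bit halves. The sketch below reproduces only that arithmetic in userspace. The demo_* names are hypothetical, demo_ffs() is a portable stand-in for the kernel's ffs(), and the queue size is assumed to be a power of two of at least 2 dwords.

```c
#include <assert.h>
#include <stdint.h>

/* 1-based index of the least significant set bit, 0 if none (like ffs()). */
static unsigned int demo_ffs(uint32_t v)
{
	unsigned int pos = 1;

	if (v == 0)
		return 0;
	while ((v & 1u) == 0) {
		v >>= 1;
		pos++;
	}
	return pos;
}

/* log2(queue size in dwords) - 1, computed the same way as in the patch. */
static uint32_t demo_pq_size_field(uint32_t queue_size_bytes)
{
	uint32_t dwords = queue_size_bytes / sizeof(uint32_t);

	assert(dwords >= 2 && (dwords & (dwords - 1)) == 0);
	return demo_ffs(dwords) - 1 - 1;
}

/* Low and high halves of the queue address after the >> 8 shift. */
static uint32_t demo_pq_base_lo(uint64_t gpu_addr)
{
	return (uint32_t)(gpu_addr >> 8);
}

static uint32_t demo_pq_base_hi(uint64_t gpu_addr)
{
	return (uint32_t)((gpu_addr >> 8) >> 32);
}
```

For a 4096-byte queue (1024 dwords), demo_ffs() returns 11 and the resulting field value is 9.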
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager.h b/drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager.h
new file mode 100644
index 000000000000..213a71e0b6c7
--- /dev/null
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager.h
@@ -0,0 +1,91 @@
+/*
+ * Copyright 2014 Advanced Micro Devices, Inc.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR
+ * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
+ * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
+ * OTHER DEALINGS IN THE SOFTWARE.
+ *
+ */
+
+#ifndef KFD_MQD_MANAGER_H_
+#define KFD_MQD_MANAGER_H_
+
+#include "kfd_priv.h"
+
+/**
+ * struct mqd_manager
+ *
+ * @init_mqd: Allocates the MQD buffer in local GPU memory and initializes it.
+ *
+ * @load_mqd: Loads the MQD into a concrete HQD slot. Used only in the no-CP
+ * scheduling mode.
+ *
+ * @update_mqd: Handles an update call for the MQD.
+ *
+ * @destroy_mqd: Destroys the HQD slot, thereby preempting the relevant queue.
+ * Used only in the no-CP scheduling mode.
+ *
+ * @uninit_mqd: Releases the MQD buffer from local GPU memory.
+ *
+ * @is_occupied: Checks if the relevant HQD slot is occupied.
+ *
+ * @mqd_mutex: MQD manager mutex.
+ *
+ * @dev: The kfd device structure coupled with this module.
+ *
+ * MQD stands for Memory Queue Descriptor, which represents the current queue
+ * state in memory and initiates the HQD (Hardware Queue Descriptor) state.
+ * This structure is a base class for the different types of MQD structures
+ * for the various ASICs that should be supported in the future.
+ * This base class also contains all the MQD-specific operations.
+ * Another important thing to mention is that each queue has an MQD that keeps
+ * its state (or context) after each preemption or reassignment.
+ * Basically, there is an instance of the MQD manager class per MQD type per
+ * ASIC. Currently the kfd driver supports only Kaveri, so there is one
+ * instance per KFD_MQD_TYPE for each device.
+ *
+ */
+
+struct mqd_manager {
+	int	(*init_mqd)(struct mqd_manager *mm, void **mqd,
+			struct kfd_mem_obj **mqd_mem_obj, uint64_t *gart_addr,
+			struct queue_properties *q);
+
+	int	(*load_mqd)(struct mqd_manager *mm, void *mqd,
+				uint32_t pipe_id, uint32_t queue_id,
+				uint32_t __user *wptr);
+
+	int	(*update_mqd)(struct mqd_manager *mm, void *mqd,
+				struct queue_properties *q);
+
+	int	(*destroy_mqd)(struct mqd_manager *mm, void *mqd,
+				enum kfd_preempt_type type,
+				unsigned int timeout, uint32_t pipe_id,
+				uint32_t queue_id);
+
+	void	(*uninit_mqd)(struct mqd_manager *mm, void *mqd,
+				struct kfd_mem_obj *mqd_mem_obj);
+
+	bool	(*is_occupied)(struct mqd_manager *mm, void *mqd,
+				uint64_t queue_address,	uint32_t pipe_id,
+				uint32_t queue_id);
+
+	struct mutex	mqd_mutex;
+	struct kfd_dev	*dev;
+};
+
+#endif /* KFD_MQD_MANAGER_H_ */
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_packet_manager.c b/drivers/gpu/drm/amd/amdkfd/kfd_packet_manager.c
new file mode 100644
index 000000000000..5ce9233d2004
--- /dev/null
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_packet_manager.c
@@ -0,0 +1,565 @@
+/*
+ * Copyright 2014 Advanced Micro Devices, Inc.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR
+ * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
+ * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
+ * OTHER DEALINGS IN THE SOFTWARE.
+ *
+ */
+
+#include <linux/slab.h>
+#include <linux/mutex.h>
+#include "kfd_device_queue_manager.h"
+#include "kfd_kernel_queue.h"
+#include "kfd_priv.h"
+#include "kfd_pm4_headers.h"
+#include "kfd_pm4_opcodes.h"
+
+static inline void inc_wptr(unsigned int *wptr, unsigned int increment_bytes,
+				unsigned int buffer_size_bytes)
+{
+	unsigned int temp = *wptr + increment_bytes / sizeof(uint32_t);
+
+	BUG_ON((temp * sizeof(uint32_t)) > buffer_size_bytes);
+	*wptr = temp;
+}
+
+static unsigned int build_pm4_header(unsigned int opcode, size_t packet_size)
+{
+	union PM4_MES_TYPE_3_HEADER header;
+
+	header.u32all = 0;
+	header.opcode = opcode;
+	header.count = packet_size / sizeof(uint32_t) - 2;
+	header.type = PM4_TYPE_3;
+
+	return header.u32all;
+}
+
+static void pm_calc_rlib_size(struct packet_manager *pm,
+				unsigned int *rlib_size,
+				bool *over_subscription)
+{
+	unsigned int process_count, queue_count;
+
+	BUG_ON(!pm || !rlib_size || !over_subscription);
+
+	process_count = pm->dqm->processes_count;
+	queue_count = pm->dqm->queue_count;
+
+	/* check if there is over subscription */
+	*over_subscription = false;
+	if ((process_count > 1) ||
+		queue_count > PIPE_PER_ME_CP_SCHEDULING * QUEUES_PER_PIPE) {
+		*over_subscription = true;
+		pr_debug("kfd: over subscribed runlist\n");
+	}
+
+	/* calculate run list ib allocation size */
+	*rlib_size = process_count * sizeof(struct pm4_map_process) +
+		     queue_count * sizeof(struct pm4_map_queues);
+
+	/*
+	 * Increase the allocation size in case we need a chained run list
+	 * when over subscribed
+	 */
+	if (*over_subscription)
+		*rlib_size += sizeof(struct pm4_runlist);
+
+	pr_debug("kfd: runlist ib size %u\n", *rlib_size);
+}
+
+static int pm_allocate_runlist_ib(struct packet_manager *pm,
+				unsigned int **rl_buffer,
+				uint64_t *rl_gpu_buffer,
+				unsigned int *rl_buffer_size,
+				bool *is_over_subscription)
+{
+	int retval;
+
+	BUG_ON(!pm);
+	BUG_ON(pm->allocated == true);
+	BUG_ON(is_over_subscription == NULL);
+
+	pm_calc_rlib_size(pm, rl_buffer_size, is_over_subscription);
+
+	retval = kfd2kgd->allocate_mem(pm->dqm->dev->kgd,
+					*rl_buffer_size,
+					PAGE_SIZE,
+					KFD_MEMPOOL_SYSTEM_WRITECOMBINE,
+					(struct kgd_mem **) &pm->ib_buffer_obj);
+
+	if (retval != 0) {
+		pr_err("kfd: failed to allocate runlist IB\n");
+		return retval;
+	}
+
+	*(void **)rl_buffer = pm->ib_buffer_obj->cpu_ptr;
+	*rl_gpu_buffer = pm->ib_buffer_obj->gpu_addr;
+
+	memset(*rl_buffer, 0, *rl_buffer_size);
+	pm->allocated = true;
+	return retval;
+}
+
+static int pm_create_runlist(struct packet_manager *pm, uint32_t *buffer,
+			uint64_t ib, size_t ib_size_in_dwords, bool chain)
+{
+	struct pm4_runlist *packet;
+
+	BUG_ON(!pm || !buffer || !ib);
+
+	packet = (struct pm4_runlist *)buffer;
+
+	memset(buffer, 0, sizeof(struct pm4_runlist));
+	packet->header.u32all = build_pm4_header(IT_RUN_LIST,
+						sizeof(struct pm4_runlist));
+
+	packet->bitfields4.ib_size = ib_size_in_dwords;
+	packet->bitfields4.chain = chain ? 1 : 0;
+	packet->bitfields4.offload_polling = 0;
+	packet->bitfields4.valid = 1;
+	packet->ordinal2 = lower_32_bits(ib);
+	packet->bitfields3.ib_base_hi = upper_32_bits(ib);
+
+	return 0;
+}
+
+static int pm_create_map_process(struct packet_manager *pm, uint32_t *buffer,
+				struct qcm_process_device *qpd)
+{
+	struct pm4_map_process *packet;
+	struct queue *cur;
+	uint32_t num_queues;
+
+	BUG_ON(!pm || !buffer || !qpd);
+
+	packet = (struct pm4_map_process *)buffer;
+
+	pr_debug("kfd: In func %s\n", __func__);
+
+	memset(buffer, 0, sizeof(struct pm4_map_process));
+
+	packet->header.u32all = build_pm4_header(IT_MAP_PROCESS,
+					sizeof(struct pm4_map_process));
+	packet->bitfields2.diq_enable = (qpd->is_debug) ? 1 : 0;
+	packet->bitfields2.process_quantum = 1;
+	packet->bitfields2.pasid = qpd->pqm->process->pasid;
+	packet->bitfields3.page_table_base = qpd->page_table_base;
+	packet->bitfields10.gds_size = qpd->gds_size;
+	packet->bitfields10.num_gws = qpd->num_gws;
+	packet->bitfields10.num_oac = qpd->num_oac;
+	num_queues = 0;
+	list_for_each_entry(cur, &qpd->queues_list, list)
+		num_queues++;
+	packet->bitfields10.num_queues = num_queues;
+
+	packet->sh_mem_config = qpd->sh_mem_config;
+	packet->sh_mem_bases = qpd->sh_mem_bases;
+	packet->sh_mem_ape1_base = qpd->sh_mem_ape1_base;
+	packet->sh_mem_ape1_limit = qpd->sh_mem_ape1_limit;
+
+	packet->gds_addr_lo = lower_32_bits(qpd->gds_context_area);
+	packet->gds_addr_hi = upper_32_bits(qpd->gds_context_area);
+
+	return 0;
+}
+
+static int pm_create_map_queue(struct packet_manager *pm, uint32_t *buffer,
+				struct queue *q)
+{
+	struct pm4_map_queues *packet;
+
+	BUG_ON(!pm || !buffer || !q);
+
+	pr_debug("kfd: In func %s\n", __func__);
+
+	packet = (struct pm4_map_queues *)buffer;
+	memset(buffer, 0, sizeof(struct pm4_map_queues));
+
+	packet->header.u32all = build_pm4_header(IT_MAP_QUEUES,
+						sizeof(struct pm4_map_queues));
+	packet->bitfields2.alloc_format =
+				alloc_format__mes_map_queues__one_per_pipe;
+	packet->bitfields2.num_queues = 1;
+	packet->bitfields2.queue_sel =
+		queue_sel__mes_map_queues__map_to_hws_determined_queue_slots;
+
+	packet->bitfields2.vidmem = (q->properties.is_interop) ?
+			vidmem__mes_map_queues__uses_video_memory :
+			vidmem__mes_map_queues__uses_no_video_memory;
+
+	switch (q->properties.type) {
+	case KFD_QUEUE_TYPE_COMPUTE:
+	case KFD_QUEUE_TYPE_DIQ:
+		packet->bitfields2.engine_sel =
+				engine_sel__mes_map_queues__compute;
+		break;
+	case KFD_QUEUE_TYPE_SDMA:
+		packet->bitfields2.engine_sel =
+				engine_sel__mes_map_queues__sdma0;
+		break;
+	default:
+		BUG();
+		break;
+	}
+
+	packet->mes_map_queues_ordinals[0].bitfields3.doorbell_offset =
+			q->properties.doorbell_off;
+
+	packet->mes_map_queues_ordinals[0].mqd_addr_lo =
+			lower_32_bits(q->gart_mqd_addr);
+
+	packet->mes_map_queues_ordinals[0].mqd_addr_hi =
+			upper_32_bits(q->gart_mqd_addr);
+
+	packet->mes_map_queues_ordinals[0].wptr_addr_lo =
+			lower_32_bits((uint64_t)q->properties.write_ptr);
+
+	packet->mes_map_queues_ordinals[0].wptr_addr_hi =
+			upper_32_bits((uint64_t)q->properties.write_ptr);
+
+	return 0;
+}
+
+static int pm_create_runlist_ib(struct packet_manager *pm,
+				struct list_head *queues,
+				uint64_t *rl_gpu_addr,
+				size_t *rl_size_bytes)
+{
+	unsigned int alloc_size_bytes;
+	unsigned int *rl_buffer, rl_wptr, i;
+	int retval, proccesses_mapped;
+	struct device_process_node *cur;
+	struct qcm_process_device *qpd;
+	struct queue *q;
+	struct kernel_queue *kq;
+	bool is_over_subscription;
+
+	BUG_ON(!pm || !queues || !rl_size_bytes || !rl_gpu_addr);
+
+	rl_wptr = retval = proccesses_mapped = 0;
+
+	retval = pm_allocate_runlist_ib(pm, &rl_buffer, rl_gpu_addr,
+				&alloc_size_bytes, &is_over_subscription);
+	if (retval != 0)
+		return retval;
+
+	*rl_size_bytes = alloc_size_bytes;
+
+	pr_debug("kfd: In func %s\n", __func__);
+	pr_debug("kfd: building runlist ib process count: %d queues count %d\n",
+		pm->dqm->processes_count, pm->dqm->queue_count);
+
+	/* build the run list ib packet */
+	list_for_each_entry(cur, queues, list) {
+		qpd = cur->qpd;
+		/* build map process packet */
+		if (proccesses_mapped >= pm->dqm->processes_count) {
+			pr_debug("kfd: not enough space left in runlist IB\n");
+			pm_release_ib(pm);
+			return -ENOMEM;
+		}
+		retval = pm_create_map_process(pm, &rl_buffer[rl_wptr], qpd);
+		if (retval != 0)
+			return retval;
+		proccesses_mapped++;
+		inc_wptr(&rl_wptr, sizeof(struct pm4_map_process),
+				alloc_size_bytes);
+
+		list_for_each_entry(kq, &qpd->priv_queue_list, list) {
+			if (kq->queue->properties.is_active != true)
+				continue;
+			retval = pm_create_map_queue(pm, &rl_buffer[rl_wptr],
+							kq->queue);
+			if (retval != 0)
+				return retval;
+			inc_wptr(&rl_wptr, sizeof(struct pm4_map_queues),
+					alloc_size_bytes);
+		}
+
+		list_for_each_entry(q, &qpd->queues_list, list) {
+			if (q->properties.is_active != true)
+				continue;
+			retval = pm_create_map_queue(pm,
+						&rl_buffer[rl_wptr], q);
+			if (retval != 0)
+				return retval;
+			inc_wptr(&rl_wptr, sizeof(struct pm4_map_queues),
+					alloc_size_bytes);
+		}
+	}
+
+	pr_debug("kfd: finished map process and queues to runlist\n");
+
+	if (is_over_subscription)
+		pm_create_runlist(pm, &rl_buffer[rl_wptr], *rl_gpu_addr,
+				alloc_size_bytes / sizeof(uint32_t), true);
+
+	for (i = 0; i < alloc_size_bytes / sizeof(uint32_t); i++)
+		pr_debug("0x%2X ", rl_buffer[i]);
+	pr_debug("\n");
+
+	return 0;
+}
+
+int pm_init(struct packet_manager *pm, struct device_queue_manager *dqm)
+{
+	BUG_ON(!dqm);
+
+	pm->dqm = dqm;
+	mutex_init(&pm->lock);
+	pm->priv_queue = kernel_queue_init(dqm->dev, KFD_QUEUE_TYPE_HIQ);
+	if (pm->priv_queue == NULL) {
+		mutex_destroy(&pm->lock);
+		return -ENOMEM;
+	}
+	pm->allocated = false;
+
+	return 0;
+}
+
+void pm_uninit(struct packet_manager *pm)
+{
+	BUG_ON(!pm);
+
+	mutex_destroy(&pm->lock);
+	kernel_queue_uninit(pm->priv_queue);
+}
+
+int pm_send_set_resources(struct packet_manager *pm,
+				struct scheduling_resources *res)
+{
+	struct pm4_set_resources *packet = NULL;
+
+	BUG_ON(!pm || !res);
+
+	pr_debug("kfd: In func %s\n", __func__);
+
+	mutex_lock(&pm->lock);
+	pm->priv_queue->acquire_packet_buffer(pm->priv_queue,
+					sizeof(*packet) / sizeof(uint32_t),
+			(unsigned int **)&packet);
+	if (packet == NULL) {
+		mutex_unlock(&pm->lock);
+		pr_err("kfd: failed to allocate buffer on kernel queue\n");
+		return -ENOMEM;
+	}
+
+	memset(packet, 0, sizeof(struct pm4_set_resources));
+	packet->header.u32all = build_pm4_header(IT_SET_RESOURCES,
+					sizeof(struct pm4_set_resources));
+
+	packet->bitfields2.queue_type =
+			queue_type__mes_set_resources__hsa_interface_queue_hiq;
+	packet->bitfields2.vmid_mask = res->vmid_mask;
+	packet->bitfields2.unmap_latency = KFD_UNMAP_LATENCY;
+	packet->bitfields7.oac_mask = res->oac_mask;
+	packet->bitfields8.gds_heap_base = res->gds_heap_base;
+	packet->bitfields8.gds_heap_size = res->gds_heap_size;
+
+	packet->gws_mask_lo = lower_32_bits(res->gws_mask);
+	packet->gws_mask_hi = upper_32_bits(res->gws_mask);
+
+	packet->queue_mask_lo = lower_32_bits(res->queue_mask);
+	packet->queue_mask_hi = upper_32_bits(res->queue_mask);
+
+	pm->priv_queue->submit_packet(pm->priv_queue);
+	pm->priv_queue->sync_with_hw(pm->priv_queue, KFD_HIQ_TIMEOUT);
+
+	mutex_unlock(&pm->lock);
+
+	return 0;
+}
+
+int pm_send_runlist(struct packet_manager *pm, struct list_head *dqm_queues)
+{
+	uint64_t rl_gpu_ib_addr;
+	uint32_t *rl_buffer;
+	size_t rl_ib_size, packet_size_dwords;
+	int retval;
+
+	BUG_ON(!pm || !dqm_queues);
+
+	retval = pm_create_runlist_ib(pm, dqm_queues, &rl_gpu_ib_addr,
+					&rl_ib_size);
+	if (retval != 0)
+		goto fail_create_runlist_ib;
+
+	pr_debug("kfd: runlist IB address: 0x%llX\n", rl_gpu_ib_addr);
+
+	packet_size_dwords = sizeof(struct pm4_runlist) / sizeof(uint32_t);
+	mutex_lock(&pm->lock);
+
+	retval = pm->priv_queue->acquire_packet_buffer(pm->priv_queue,
+					packet_size_dwords, &rl_buffer);
+	if (retval != 0)
+		goto fail_acquire_packet_buffer;
+
+	retval = pm_create_runlist(pm, rl_buffer, rl_gpu_ib_addr,
+					rl_ib_size / sizeof(uint32_t), false);
+	if (retval != 0)
+		goto fail_create_runlist;
+
+	pm->priv_queue->submit_packet(pm->priv_queue);
+	pm->priv_queue->sync_with_hw(pm->priv_queue, KFD_HIQ_TIMEOUT);
+
+	mutex_unlock(&pm->lock);
+
+	return retval;
+
+fail_create_runlist:
+	pm->priv_queue->rollback_packet(pm->priv_queue);
+fail_acquire_packet_buffer:
+	mutex_unlock(&pm->lock);
+fail_create_runlist_ib:
+	if (pm->allocated == true)
+		pm_release_ib(pm);
+	return retval;
+}
+
+int pm_send_query_status(struct packet_manager *pm, uint64_t fence_address,
+			uint32_t fence_value)
+{
+	int retval;
+	struct pm4_query_status *packet;
+
+	BUG_ON(!pm || !fence_address);
+
+	mutex_lock(&pm->lock);
+	retval = pm->priv_queue->acquire_packet_buffer(
+			pm->priv_queue,
+			sizeof(struct pm4_query_status) / sizeof(uint32_t),
+			(unsigned int **)&packet);
+	if (retval != 0)
+		goto fail_acquire_packet_buffer;
+
+	packet->header.u32all = build_pm4_header(IT_QUERY_STATUS,
+					sizeof(struct pm4_query_status));
+
+	packet->bitfields2.context_id = 0;
+	packet->bitfields2.interrupt_sel =
+			interrupt_sel__mes_query_status__completion_status;
+	packet->bitfields2.command =
+			command__mes_query_status__fence_only_after_write_ack;
+
+	packet->addr_hi = upper_32_bits((uint64_t)fence_address);
+	packet->addr_lo = lower_32_bits((uint64_t)fence_address);
+	packet->data_hi = upper_32_bits((uint64_t)fence_value);
+	packet->data_lo = lower_32_bits((uint64_t)fence_value);
+
+	pm->priv_queue->submit_packet(pm->priv_queue);
+	pm->priv_queue->sync_with_hw(pm->priv_queue, KFD_HIQ_TIMEOUT);
+	mutex_unlock(&pm->lock);
+
+	return 0;
+
+fail_acquire_packet_buffer:
+	mutex_unlock(&pm->lock);
+	return retval;
+}
+
+int pm_send_unmap_queue(struct packet_manager *pm, enum kfd_queue_type type,
+			enum kfd_preempt_type_filter mode,
+			uint32_t filter_param, bool reset,
+			unsigned int sdma_engine)
+{
+	int retval;
+	uint32_t *buffer;
+	struct pm4_unmap_queues *packet;
+
+	BUG_ON(!pm);
+
+	mutex_lock(&pm->lock);
+	retval = pm->priv_queue->acquire_packet_buffer(
+			pm->priv_queue,
+			sizeof(struct pm4_unmap_queues) / sizeof(uint32_t),
+			&buffer);
+	if (retval != 0)
+		goto err_acquire_packet_buffer;
+
+	packet = (struct pm4_unmap_queues *)buffer;
+	memset(buffer, 0, sizeof(struct pm4_unmap_queues));
+
+	packet->header.u32all = build_pm4_header(IT_UNMAP_QUEUES,
+					sizeof(struct pm4_unmap_queues));
+	switch (type) {
+	case KFD_QUEUE_TYPE_COMPUTE:
+	case KFD_QUEUE_TYPE_DIQ:
+		packet->bitfields2.engine_sel =
+			engine_sel__mes_unmap_queues__compute;
+		break;
+	case KFD_QUEUE_TYPE_SDMA:
+		packet->bitfields2.engine_sel =
+			engine_sel__mes_unmap_queues__sdma0 + sdma_engine;
+		break;
+	default:
+		BUG();
+		break;
+	}
+
+	if (reset)
+		packet->bitfields2.action =
+				action__mes_unmap_queues__reset_queues;
+	else
+		packet->bitfields2.action =
+				action__mes_unmap_queues__preempt_queues;
+
+	switch (mode) {
+	case KFD_PREEMPT_TYPE_FILTER_SINGLE_QUEUE:
+		packet->bitfields2.queue_sel =
+				queue_sel__mes_unmap_queues__perform_request_on_specified_queues;
+		packet->bitfields2.num_queues = 1;
+		packet->bitfields3b.doorbell_offset0 = filter_param;
+		break;
+	case KFD_PREEMPT_TYPE_FILTER_BY_PASID:
+		packet->bitfields2.queue_sel =
+				queue_sel__mes_unmap_queues__perform_request_on_pasid_queues;
+		packet->bitfields3a.pasid = filter_param;
+		break;
+	case KFD_PREEMPT_TYPE_FILTER_ALL_QUEUES:
+		packet->bitfields2.queue_sel =
+				queue_sel__mes_unmap_queues__perform_request_on_all_active_queues;
+		break;
+	default:
+		BUG();
+		break;
+	}
+
+	pm->priv_queue->submit_packet(pm->priv_queue);
+	pm->priv_queue->sync_with_hw(pm->priv_queue, KFD_HIQ_TIMEOUT);
+
+	mutex_unlock(&pm->lock);
+	return 0;
+
+err_acquire_packet_buffer:
+	mutex_unlock(&pm->lock);
+	return retval;
+}
+
+void pm_release_ib(struct packet_manager *pm)
+{
+	BUG_ON(!pm);
+
+	mutex_lock(&pm->lock);
+	if (pm->allocated) {
+		kfd2kgd->free_mem(pm->dqm->dev->kgd,
+				(struct kgd_mem *) pm->ib_buffer_obj);
+		pm->allocated = false;
+	}
+	mutex_unlock(&pm->lock);
+}
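
Note on run list sizing: the arithmetic in pm_calc_rlib_size() can be sketched in userspace as below. This is only an illustration, not the driver code. The byte sizes are obtained by counting the 32-bit ordinals of the packet structs declared in kfd_pm4_headers.h (10, 7 and 4 dwords), and `max_hw_queues` is a hypothetical stand-in for PIPE_PER_ME_CP_SCHEDULING * QUEUES_PER_PIPE, whose values are not shown in this part of the diff. All `sketch_` names are assumptions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Packet sizes in bytes, counted from the struct layouts (assumption:
 * every ordinal, including enum bitfields, occupies one 32-bit dword). */
#define SKETCH_MAP_PROCESS_BYTES (10 * sizeof(uint32_t))
#define SKETCH_MAP_QUEUES_BYTES  (7 * sizeof(uint32_t))
#define SKETCH_RUNLIST_BYTES     (4 * sizeof(uint32_t))

struct sketch_rlib {
	size_t size_bytes;	/* bytes to allocate for the run list IB */
	bool over_subscribed;	/* true if a chained run list is needed */
};

/* Mirror of the sizing rule: one MAP_PROCESS per process, one MAP_QUEUES
 * per queue, plus one RUN_LIST packet for chaining when over subscribed. */
static struct sketch_rlib sketch_calc_rlib_size(unsigned int processes,
						unsigned int queues,
						unsigned int max_hw_queues)
{
	struct sketch_rlib r;

	r.over_subscribed = processes > 1 || queues > max_hw_queues;
	r.size_bytes = processes * SKETCH_MAP_PROCESS_BYTES +
		       queues * SKETCH_MAP_QUEUES_BYTES;
	if (r.over_subscribed)
		r.size_bytes += SKETCH_RUNLIST_BYTES;

	return r;
}
```

With one process and two queues the result is 40 + 2 * 28 = 96 bytes with no chaining; a second process makes the run list over subscribed and adds the 16 bytes of the chained RUN_LIST packet.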
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_pasid.c b/drivers/gpu/drm/amd/amdkfd/kfd_pasid.c
new file mode 100644
index 000000000000..71699ad97d74
--- /dev/null
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_pasid.c
@@ -0,0 +1,96 @@
+/*
+ * Copyright 2014 Advanced Micro Devices, Inc.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR
+ * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
+ * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
+ * OTHER DEALINGS IN THE SOFTWARE.
+ */
+
+#include <linux/slab.h>
+#include <linux/types.h>
+#include "kfd_priv.h"
+
+static unsigned long *pasid_bitmap;
+static unsigned int pasid_limit;
+static DEFINE_MUTEX(pasid_mutex);
+
+int kfd_pasid_init(void)
+{
+	pasid_limit = max_num_of_processes;
+	pasid_bitmap = kcalloc(BITS_TO_LONGS(pasid_limit), sizeof(long),
+				GFP_KERNEL);
+	if (!pasid_bitmap)
+		return -ENOMEM;
+
+	set_bit(0, pasid_bitmap); /* PASID 0 is reserved. */
+
+	return 0;
+}
+
+void kfd_pasid_exit(void)
+{
+	kfree(pasid_bitmap);
+}
+
+bool kfd_set_pasid_limit(unsigned int new_limit)
+{
+	if (new_limit < pasid_limit) {
+		bool ok;
+
+		mutex_lock(&pasid_mutex);
+
+		/* ensure that no pasids >= new_limit are in-use */
+		ok = (find_next_bit(pasid_bitmap, pasid_limit, new_limit) ==
+								pasid_limit);
+		if (ok)
+			pasid_limit = new_limit;
+
+		mutex_unlock(&pasid_mutex);
+
+		return ok;
+	}
+
+	return true;
+}
+
+unsigned int kfd_get_pasid_limit(void)
+{
+	return pasid_limit;
+}
+
+unsigned int kfd_pasid_alloc(void)
+{
+	unsigned int found;
+
+	mutex_lock(&pasid_mutex);
+
+	found = find_first_zero_bit(pasid_bitmap, pasid_limit);
+	if (found == pasid_limit)
+		found = 0;
+	else
+		set_bit(found, pasid_bitmap);
+
+	mutex_unlock(&pasid_mutex);
+
+	return found;
+}
+
+void kfd_pasid_free(unsigned int pasid)
+{
+	BUG_ON(pasid == 0 || pasid >= pasid_limit);
+	clear_bit(pasid, pasid_bitmap);
+}
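
Note on the PASID bitmap: the allocator above is a first-fit bitmap with ID 0 reserved as the "no PASID" value. A minimal userspace sketch of the same idea follows. It assumes plain calloc() in place of the kernel allocator, leaves out the mutex, and uses hypothetical `sketch_` names throughout. The point worth noticing is that the bitmap must be sized in units of `unsigned long`, not bytes.

```c
#include <assert.h>
#include <limits.h>
#include <stdlib.h>

#define SKETCH_BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)
#define SKETCH_BITS_TO_LONGS(n) \
	(((n) + SKETCH_BITS_PER_LONG - 1) / SKETCH_BITS_PER_LONG)

struct sketch_pasid_pool {
	unsigned long *bitmap;	/* one bit per ID, set means in use */
	unsigned int limit;	/* IDs are in [0, limit) */
};

/* Returns 0 on success, -1 on allocation failure. limit must be >= 1. */
static int sketch_pasid_init(struct sketch_pasid_pool *p, unsigned int limit)
{
	/* One zeroed unsigned long per SKETCH_BITS_PER_LONG IDs. */
	p->bitmap = calloc(SKETCH_BITS_TO_LONGS(limit), sizeof(unsigned long));
	if (!p->bitmap)
		return -1;
	p->limit = limit;
	p->bitmap[0] |= 1ul;	/* ID 0 is reserved */
	return 0;
}

/* First-fit allocation; returns 0 when the pool is exhausted. */
static unsigned int sketch_pasid_alloc(struct sketch_pasid_pool *p)
{
	unsigned int id;

	for (id = 0; id < p->limit; id++) {
		unsigned long *word = &p->bitmap[id / SKETCH_BITS_PER_LONG];
		unsigned long mask = 1ul << (id % SKETCH_BITS_PER_LONG);

		if (!(*word & mask)) {
			*word |= mask;
			return id;
		}
	}
	return 0;
}

static void sketch_pasid_free(struct sketch_pasid_pool *p, unsigned int id)
{
	assert(id != 0 && id < p->limit);
	p->bitmap[id / SKETCH_BITS_PER_LONG] &=
		~(1ul << (id % SKETCH_BITS_PER_LONG));
}
```

Because 0 doubles as the failure value, callers only need a single comparison to detect an exhausted pool.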
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_pm4_headers.h b/drivers/gpu/drm/amd/amdkfd/kfd_pm4_headers.h
new file mode 100644
index 000000000000..071ad5724bd2
--- /dev/null
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_pm4_headers.h
@@ -0,0 +1,405 @@
+/*
+ * Copyright 2014 Advanced Micro Devices, Inc.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR
+ * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
+ * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
+ * OTHER DEALINGS IN THE SOFTWARE.
+ *
+ */
+
+#ifndef KFD_PM4_HEADERS_H_
+#define KFD_PM4_HEADERS_H_
+
+#ifndef PM4_MES_HEADER_DEFINED
+#define PM4_MES_HEADER_DEFINED
+union PM4_MES_TYPE_3_HEADER {
+	struct {
+		uint32_t reserved1:8;	/* < reserved */
+		uint32_t opcode:8;	/* < IT opcode */
+		uint32_t count:14;	/* < number of DWORDs - 1
+					 * in the information body.
+					 */
+		uint32_t type:2;	/* < packet identifier.
+					 * It should be 3 for type 3 packets
+					 */
+	};
+	uint32_t u32all;
+};
+#endif /* PM4_MES_HEADER_DEFINED */
+
+/* --------------------MES_SET_RESOURCES-------------------- */
+
+#ifndef PM4_MES_SET_RESOURCES_DEFINED
+#define PM4_MES_SET_RESOURCES_DEFINED
+enum set_resources_queue_type_enum {
+	queue_type__mes_set_resources__kernel_interface_queue_kiq = 0,
+	queue_type__mes_set_resources__hsa_interface_queue_hiq = 1,
+	queue_type__mes_set_resources__hsa_debug_interface_queue = 4
+};
+
+struct pm4_set_resources {
+	union {
+		union PM4_MES_TYPE_3_HEADER header;	/* header */
+		uint32_t ordinal1;
+	};
+
+	union {
+		struct {
+			uint32_t vmid_mask:16;
+			uint32_t unmap_latency:8;
+			uint32_t reserved1:5;
+			enum set_resources_queue_type_enum queue_type:3;
+		} bitfields2;
+		uint32_t ordinal2;
+	};
+
+	uint32_t queue_mask_lo;
+	uint32_t queue_mask_hi;
+	uint32_t gws_mask_lo;
+	uint32_t gws_mask_hi;
+
+	union {
+		struct {
+			uint32_t oac_mask:16;
+			uint32_t reserved2:16;
+		} bitfields7;
+		uint32_t ordinal7;
+	};
+
+	union {
+		struct {
+			uint32_t gds_heap_base:6;
+			uint32_t reserved3:5;
+			uint32_t gds_heap_size:6;
+			uint32_t reserved4:15;
+		} bitfields8;
+		uint32_t ordinal8;
+	};
+
+};
+#endif
+
+/*--------------------MES_RUN_LIST-------------------- */
+
+#ifndef PM4_MES_RUN_LIST_DEFINED
+#define PM4_MES_RUN_LIST_DEFINED
+
+struct pm4_runlist {
+	union {
+		union PM4_MES_TYPE_3_HEADER header;	/* header */
+		uint32_t ordinal1;
+	};
+
+	union {
+		struct {
+			uint32_t reserved1:2;
+			uint32_t ib_base_lo:30;
+		} bitfields2;
+		uint32_t ordinal2;
+	};
+
+	union {
+		struct {
+			uint32_t ib_base_hi:16;
+			uint32_t reserved2:16;
+		} bitfields3;
+		uint32_t ordinal3;
+	};
+
+	union {
+		struct {
+			uint32_t ib_size:20;
+			uint32_t chain:1;
+			uint32_t offload_polling:1;
+			uint32_t reserved3:1;
+			uint32_t valid:1;
+			uint32_t reserved4:8;
+		} bitfields4;
+		uint32_t ordinal4;
+	};
+
+};
+#endif
+
+/*--------------------MES_MAP_PROCESS-------------------- */
+
+#ifndef PM4_MES_MAP_PROCESS_DEFINED
+#define PM4_MES_MAP_PROCESS_DEFINED
+
+struct pm4_map_process {
+	union {
+		union PM4_MES_TYPE_3_HEADER header;	/* header */
+		uint32_t ordinal1;
+	};
+
+	union {
+		struct {
+			uint32_t pasid:16;
+			uint32_t reserved1:8;
+			uint32_t diq_enable:1;
+			uint32_t process_quantum:7;
+		} bitfields2;
+		uint32_t ordinal2;
+	};
+
+	union {
+		struct {
+			uint32_t page_table_base:28;
+			uint32_t reserved3:4;
+		} bitfields3;
+		uint32_t ordinal3;
+	};
+
+	uint32_t sh_mem_bases;
+	uint32_t sh_mem_ape1_base;
+	uint32_t sh_mem_ape1_limit;
+	uint32_t sh_mem_config;
+	uint32_t gds_addr_lo;
+	uint32_t gds_addr_hi;
+
+	union {
+		struct {
+			uint32_t num_gws:6;
+			uint32_t reserved4:2;
+			uint32_t num_oac:4;
+			uint32_t reserved5:4;
+			uint32_t gds_size:6;
+			uint32_t num_queues:10;
+		} bitfields10;
+		uint32_t ordinal10;
+	};
+
+};
+#endif
+
+/*--------------------MES_MAP_QUEUES--------------------*/
+
+#ifndef PM4_MES_MAP_QUEUES_DEFINED
+#define PM4_MES_MAP_QUEUES_DEFINED
+enum map_queues_queue_sel_enum {
+	queue_sel__mes_map_queues__map_to_specified_queue_slots = 0,
+	queue_sel__mes_map_queues__map_to_hws_determined_queue_slots = 1,
+	queue_sel__mes_map_queues__enable_process_queues = 2
+};
+
+enum map_queues_vidmem_enum {
+	vidmem__mes_map_queues__uses_no_video_memory = 0,
+	vidmem__mes_map_queues__uses_video_memory = 1
+};
+
+enum map_queues_alloc_format_enum {
+	alloc_format__mes_map_queues__one_per_pipe = 0,
+	alloc_format__mes_map_queues__all_on_one_pipe = 1
+};
+
+enum map_queues_engine_sel_enum {
+	engine_sel__mes_map_queues__compute = 0,
+	engine_sel__mes_map_queues__sdma0 = 2,
+	engine_sel__mes_map_queues__sdma1 = 3
+};
+
+struct pm4_map_queues {
+	union {
+		union PM4_MES_TYPE_3_HEADER header;	/* header */
+		uint32_t ordinal1;
+	};
+
+	union {
+		struct {
+			uint32_t reserved1:4;
+			enum map_queues_queue_sel_enum queue_sel:2;
+			uint32_t reserved2:2;
+			uint32_t vmid:4;
+			uint32_t reserved3:4;
+			enum map_queues_vidmem_enum vidmem:2;
+			uint32_t reserved4:6;
+			enum map_queues_alloc_format_enum alloc_format:2;
+			enum map_queues_engine_sel_enum engine_sel:3;
+			uint32_t num_queues:3;
+		} bitfields2;
+		uint32_t ordinal2;
+	};
+
+	struct {
+		union {
+			struct {
+				uint32_t reserved5:2;
+				uint32_t doorbell_offset:21;
+				uint32_t reserved6:3;
+				uint32_t queue:6;
+			} bitfields3;
+			uint32_t ordinal3;
+		};
+
+		uint32_t mqd_addr_lo;
+		uint32_t mqd_addr_hi;
+		uint32_t wptr_addr_lo;
+		uint32_t wptr_addr_hi;
+
+	} mes_map_queues_ordinals[1];	/* 1..N of these ordinal groups */
+
+};
+#endif
+
+/*--------------------MES_QUERY_STATUS--------------------*/
+
+#ifndef PM4_MES_QUERY_STATUS_DEFINED
+#define PM4_MES_QUERY_STATUS_DEFINED
+enum query_status_interrupt_sel_enum {
+	interrupt_sel__mes_query_status__completion_status = 0,
+	interrupt_sel__mes_query_status__process_status = 1,
+	interrupt_sel__mes_query_status__queue_status = 2
+};
+
+enum query_status_command_enum {
+	command__mes_query_status__interrupt_only = 0,
+	command__mes_query_status__fence_only_immediate = 1,
+	command__mes_query_status__fence_only_after_write_ack = 2,
+	command__mes_query_status__fence_wait_for_write_ack_send_interrupt = 3
+};
+
+enum query_status_engine_sel_enum {
+	engine_sel__mes_query_status__compute = 0,
+	engine_sel__mes_query_status__sdma0_queue = 2,
+	engine_sel__mes_query_status__sdma1_queue = 3
+};
+
+struct pm4_query_status {
+	union {
+		union PM4_MES_TYPE_3_HEADER header;	/* header */
+		uint32_t ordinal1;
+	};
+
+	union {
+		struct {
+			uint32_t context_id:28;
+			enum query_status_interrupt_sel_enum interrupt_sel:2;
+			enum query_status_command_enum command:2;
+		} bitfields2;
+		uint32_t ordinal2;
+	};
+
+	union {
+		struct {
+			uint32_t pasid:16;
+			uint32_t reserved1:16;
+		} bitfields3a;
+		struct {
+			uint32_t reserved2:2;
+			uint32_t doorbell_offset:21;
+			uint32_t reserved3:3;
+			enum query_status_engine_sel_enum engine_sel:3;
+			uint32_t reserved4:3;
+		} bitfields3b;
+		uint32_t ordinal3;
+	};
+
+	uint32_t addr_lo;
+	uint32_t addr_hi;
+	uint32_t data_lo;
+	uint32_t data_hi;
+};
+#endif
+
+/*--------------------MES_UNMAP_QUEUES--------------------*/
+
+#ifndef PM4_MES_UNMAP_QUEUES_DEFINED
+#define PM4_MES_UNMAP_QUEUES_DEFINED
+enum unmap_queues_action_enum {
+	action__mes_unmap_queues__preempt_queues = 0,
+	action__mes_unmap_queues__reset_queues = 1,
+	action__mes_unmap_queues__disable_process_queues = 2
+};
+
+enum unmap_queues_queue_sel_enum {
+	queue_sel__mes_unmap_queues__perform_request_on_specified_queues = 0,
+	queue_sel__mes_unmap_queues__perform_request_on_pasid_queues = 1,
+	queue_sel__mes_unmap_queues__perform_request_on_all_active_queues = 2
+};
+
+enum unmap_queues_engine_sel_enum {
+	engine_sel__mes_unmap_queues__compute = 0,
+	engine_sel__mes_unmap_queues__sdma0 = 2,
+	engine_sel__mes_unmap_queues__sdma1 = 3
+};
+
+struct pm4_unmap_queues {
+	union {
+		union PM4_MES_TYPE_3_HEADER header;	/* header */
+		uint32_t ordinal1;
+	};
+
+	union {
+		struct {
+			enum unmap_queues_action_enum action:2;
+			uint32_t reserved1:2;
+			enum unmap_queues_queue_sel_enum queue_sel:2;
+			uint32_t reserved2:20;
+			enum unmap_queues_engine_sel_enum engine_sel:3;
+			uint32_t num_queues:3;
+		} bitfields2;
+		uint32_t ordinal2;
+	};
+
+	union {
+		struct {
+			uint32_t pasid:16;
+			uint32_t reserved3:16;
+		} bitfields3a;
+		struct {
+			uint32_t reserved4:2;
+			uint32_t doorbell_offset0:21;
+			uint32_t reserved5:9;
+		} bitfields3b;
+		uint32_t ordinal3;
+	};
+
+	union {
+		struct {
+			uint32_t reserved6:2;
+			uint32_t doorbell_offset1:21;
+			uint32_t reserved7:9;
+		} bitfields4;
+		uint32_t ordinal4;
+	};
+
+	union {
+		struct {
+			uint32_t reserved8:2;
+			uint32_t doorbell_offset2:21;
+			uint32_t reserved9:9;
+		} bitfields5;
+		uint32_t ordinal5;
+	};
+
+	union {
+		struct {
+			uint32_t reserved10:2;
+			uint32_t doorbell_offset3:21;
+			uint32_t reserved11:9;
+		} bitfields6;
+		uint32_t ordinal6;
+	};
+
+};
+#endif
+
+enum {
+	CACHE_FLUSH_AND_INV_TS_EVENT = 0x00000014
+};
+
+#endif /* KFD_PM4_HEADERS_H_ */
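
Note on the type-3 header: the `count` field holds the number of body dwords minus one, which is why build_pm4_header() subtracts 2 from the total packet size in dwords. The sketch below packs the same fields with explicit shifts instead of bitfields. It assumes the compiler allocates bitfields starting at the least significant bit, as is usual on little-endian ABIs; the `SKETCH_` constants and the function name are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bit positions implied by union PM4_MES_TYPE_3_HEADER under the
 * LSB-first bitfield assumption: reserved1[7:0], opcode[15:8],
 * count[29:16], type[31:30]. */
#define SKETCH_PM4_OPCODE_SHIFT	8
#define SKETCH_PM4_COUNT_SHIFT	16
#define SKETCH_PM4_COUNT_MASK	0x3fffu
#define SKETCH_PM4_TYPE_SHIFT	30
#define SKETCH_PM4_TYPE_3	3u

/* packet_size is the full packet size in bytes, header included. */
static uint32_t sketch_pm4_header(uint32_t opcode, size_t packet_size)
{
	uint32_t dwords = (uint32_t)(packet_size / sizeof(uint32_t));

	assert(opcode <= 0xff);
	assert(dwords >= 2);	/* header plus at least one body dword */

	/* count = body dwords - 1 = total dwords - 2 */
	return (opcode << SKETCH_PM4_OPCODE_SHIFT) |
	       (((dwords - 2) & SKETCH_PM4_COUNT_MASK)
					<< SKETCH_PM4_COUNT_SHIFT) |
	       (SKETCH_PM4_TYPE_3 << SKETCH_PM4_TYPE_SHIFT);
}
```

For a 16-byte packet with opcode 0x10 this gives a count of 2 and a header word of 0xC0021000.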
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_pm4_opcodes.h b/drivers/gpu/drm/amd/amdkfd/kfd_pm4_opcodes.h
new file mode 100644
index 000000000000..b72fa3b8c2d4
--- /dev/null
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_pm4_opcodes.h
@@ -0,0 +1,107 @@
+/*
+ * Copyright 2014 Advanced Micro Devices, Inc.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR
+ * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
+ * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
+ * OTHER DEALINGS IN THE SOFTWARE.
+ *
+ */
+
+
+#ifndef KFD_PM4_OPCODES_H
+#define KFD_PM4_OPCODES_H
+
+enum it_opcode_type {
+	IT_NOP                               = 0x10,
+	IT_SET_BASE                          = 0x11,
+	IT_CLEAR_STATE                       = 0x12,
+	IT_INDEX_BUFFER_SIZE                 = 0x13,
+	IT_DISPATCH_DIRECT                   = 0x15,
+	IT_DISPATCH_INDIRECT                 = 0x16,
+	IT_ATOMIC_GDS                        = 0x1D,
+	IT_OCCLUSION_QUERY                   = 0x1F,
+	IT_SET_PREDICATION                   = 0x20,
+	IT_REG_RMW                           = 0x21,
+	IT_COND_EXEC                         = 0x22,
+	IT_PRED_EXEC                         = 0x23,
+	IT_DRAW_INDIRECT                     = 0x24,
+	IT_DRAW_INDEX_INDIRECT               = 0x25,
+	IT_INDEX_BASE                        = 0x26,
+	IT_DRAW_INDEX_2                      = 0x27,
+	IT_CONTEXT_CONTROL                   = 0x28,
+	IT_INDEX_TYPE                        = 0x2A,
+	IT_DRAW_INDIRECT_MULTI               = 0x2C,
+	IT_DRAW_INDEX_AUTO                   = 0x2D,
+	IT_NUM_INSTANCES                     = 0x2F,
+	IT_DRAW_INDEX_MULTI_AUTO             = 0x30,
+	IT_INDIRECT_BUFFER_CNST              = 0x33,
+	IT_STRMOUT_BUFFER_UPDATE             = 0x34,
+	IT_DRAW_INDEX_OFFSET_2               = 0x35,
+	IT_DRAW_PREAMBLE                     = 0x36,
+	IT_WRITE_DATA                        = 0x37,
+	IT_DRAW_INDEX_INDIRECT_MULTI         = 0x38,
+	IT_MEM_SEMAPHORE                     = 0x39,
+	IT_COPY_DW                           = 0x3B,
+	IT_WAIT_REG_MEM                      = 0x3C,
+	IT_INDIRECT_BUFFER                   = 0x3F,
+	IT_COPY_DATA                         = 0x40,
+	IT_PFP_SYNC_ME                       = 0x42,
+	IT_SURFACE_SYNC                      = 0x43,
+	IT_COND_WRITE                        = 0x45,
+	IT_EVENT_WRITE                       = 0x46,
+	IT_EVENT_WRITE_EOP                   = 0x47,
+	IT_EVENT_WRITE_EOS                   = 0x48,
+	IT_RELEASE_MEM                       = 0x49,
+	IT_PREAMBLE_CNTL                     = 0x4A,
+	IT_DMA_DATA                          = 0x50,
+	IT_ACQUIRE_MEM                       = 0x58,
+	IT_REWIND                            = 0x59,
+	IT_LOAD_UCONFIG_REG                  = 0x5E,
+	IT_LOAD_SH_REG                       = 0x5F,
+	IT_LOAD_CONFIG_REG                   = 0x60,
+	IT_LOAD_CONTEXT_REG                  = 0x61,
+	IT_SET_CONFIG_REG                    = 0x68,
+	IT_SET_CONTEXT_REG                   = 0x69,
+	IT_SET_CONTEXT_REG_INDIRECT          = 0x73,
+	IT_SET_SH_REG                        = 0x76,
+	IT_SET_SH_REG_OFFSET                 = 0x77,
+	IT_SET_QUEUE_REG                     = 0x78,
+	IT_SET_UCONFIG_REG                   = 0x79,
+	IT_SCRATCH_RAM_WRITE                 = 0x7D,
+	IT_SCRATCH_RAM_READ                  = 0x7E,
+	IT_LOAD_CONST_RAM                    = 0x80,
+	IT_WRITE_CONST_RAM                   = 0x81,
+	IT_DUMP_CONST_RAM                    = 0x83,
+	IT_INCREMENT_CE_COUNTER              = 0x84,
+	IT_INCREMENT_DE_COUNTER              = 0x85,
+	IT_WAIT_ON_CE_COUNTER                = 0x86,
+	IT_WAIT_ON_DE_COUNTER_DIFF           = 0x88,
+	IT_SWITCH_BUFFER                     = 0x8B,
+	IT_SET_RESOURCES                     = 0xA0,
+	IT_MAP_PROCESS                       = 0xA1,
+	IT_MAP_QUEUES                        = 0xA2,
+	IT_UNMAP_QUEUES                      = 0xA3,
+	IT_QUERY_STATUS                      = 0xA4,
+	IT_RUN_LIST                          = 0xA5,
+};
+
+#define PM4_TYPE_0 0
+#define PM4_TYPE_2 2
+#define PM4_TYPE_3 3
+
+#endif /* KFD_PM4_OPCODES_H */
+
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
new file mode 100644
index 000000000000..f9fb81e3bb09
--- /dev/null
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
@@ -0,0 +1,600 @@
+/*
+ * Copyright 2014 Advanced Micro Devices, Inc.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR
+ * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
+ * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
+ * OTHER DEALINGS IN THE SOFTWARE.
+ */
+
+#ifndef KFD_PRIV_H_INCLUDED
+#define KFD_PRIV_H_INCLUDED
+
+#include <linux/hashtable.h>
+#include <linux/mmu_notifier.h>
+#include <linux/mutex.h>
+#include <linux/types.h>
+#include <linux/atomic.h>
+#include <linux/workqueue.h>
+#include <linux/spinlock.h>
+#include <linux/kfd_ioctl.h>
+#include <kgd_kfd_interface.h>
+
+#define KFD_SYSFS_FILE_MODE 0444
+
+/*
+ * When working with the cp scheduler we should assign the HIQ manually or
+ * via the radeon driver to a fixed hqd slot. Here are the fixed HIQ hqd slot
+ * definitions for Kaveri. In Kaveri only the first ME's queues participate
+ * in the cp scheduling, so with that in mind we set the HIQ slot in the
+ * second ME.
+ */
+#define KFD_CIK_HIQ_PIPE 4
+#define KFD_CIK_HIQ_QUEUE 0
+
+/* GPU ID hash width in bits */
+#define KFD_GPU_ID_HASH_WIDTH 16
+
+/* Macro for allocating structures */
+#define kfd_alloc_struct(ptr_to_struct)	\
+	((typeof(ptr_to_struct)) kzalloc(sizeof(*ptr_to_struct), GFP_KERNEL))
+
+/* Kernel module parameter to specify maximum number of supported processes */
+extern int max_num_of_processes;
+
+#define KFD_MAX_NUM_OF_PROCESSES_DEFAULT 32
+#define KFD_MAX_NUM_OF_PROCESSES 512
+
+/*
+ * Kernel module parameter to specify maximum number of supported queues
+ * per process
+ */
+extern int max_num_of_queues_per_process;
+
+#define KFD_MAX_NUM_OF_QUEUES_PER_PROCESS_DEFAULT 128
+#define KFD_MAX_NUM_OF_QUEUES_PER_PROCESS 1024
+
+#define KFD_KERNEL_QUEUE_SIZE 2048
+
+/* Kernel module parameter to specify the scheduling policy */
+extern int sched_policy;
+
+/**
+ * enum kfd_sched_policy
+ *
+ * @KFD_SCHED_POLICY_HWS: H/W scheduling policy known as command processor (cp)
+ * scheduling. In this scheduling mode we're using the firmware code to
+ * schedule the user mode queues and kernel queues such as HIQ and DIQ.
+ * The HIQ queue is used as a special queue that dispatches the configuration
+ * to the cp and the user mode queues list that are currently running.
+ * The DIQ queue is a debugging queue that dispatches debugging commands to the
+ * firmware.
+ * In this scheduling mode the user mode queues over subscription feature is
+ * enabled.
+ *
+ * @KFD_SCHED_POLICY_HWS_NO_OVERSUBSCRIPTION: The same as above but with the
+ * over subscription feature disabled.
+ *
+ * @KFD_SCHED_POLICY_NO_HWS: no H/W scheduling policy is a mode which directly
+ * sets the command processor registers and sets the queues "manually". This
+ * mode is used *ONLY* for debugging purposes.
+ *
+ */
+enum kfd_sched_policy {
+	KFD_SCHED_POLICY_HWS = 0,
+	KFD_SCHED_POLICY_HWS_NO_OVERSUBSCRIPTION,
+	KFD_SCHED_POLICY_NO_HWS
+};
+
+enum cache_policy {
+	cache_policy_coherent,
+	cache_policy_noncoherent
+};
+
+struct kfd_device_info {
+	unsigned int max_pasid_bits;
+	size_t ih_ring_entry_size;
+	uint16_t mqd_size_aligned;
+};
+
+struct kfd_dev {
+	struct kgd_dev *kgd;
+
+	const struct kfd_device_info *device_info;
+	struct pci_dev *pdev;
+
+	unsigned int id;		/* topology stub index */
+
+	phys_addr_t doorbell_base;	/* Start of actual doorbells used by
+					 * KFD. It is aligned for mapping
+					 * into user mode
+					 */
+	size_t doorbell_id_offset;	/* Doorbell offset (from KFD doorbell
+					 * to HW doorbell, GFX reserved some
+					 * at the start)
+					 */
+	size_t doorbell_process_limit;	/* Number of processes we have doorbell
+					 * space for.
+					 */
+	u32 __iomem *doorbell_kernel_ptr; /* This is a pointer to a doorbell
+					   * page used by kernel queues
+					   */
+
+	struct kgd2kfd_shared_resources shared_resources;
+
+	void *interrupt_ring;
+	size_t interrupt_ring_size;
+	atomic_t interrupt_ring_rptr;
+	atomic_t interrupt_ring_wptr;
+	struct work_struct interrupt_work;
+	spinlock_t interrupt_lock;
+
+	/* QCM Device instance */
+	struct device_queue_manager *dqm;
+
+	bool init_complete;
+	/*
+	 * Interrupts of interest to KFD are copied
+	 * from the HW ring into a SW ring.
+	 */
+	bool interrupts_active;
+};
+
+/* KGD2KFD callbacks */
+void kgd2kfd_exit(void);
+struct kfd_dev *kgd2kfd_probe(struct kgd_dev *kgd, struct pci_dev *pdev);
+bool kgd2kfd_device_init(struct kfd_dev *kfd,
+			 const struct kgd2kfd_shared_resources *gpu_resources);
+void kgd2kfd_device_exit(struct kfd_dev *kfd);
+
+extern const struct kfd2kgd_calls *kfd2kgd;
+
+struct kfd_mem_obj {
+	void *bo;
+	uint64_t gpu_addr;
+	uint32_t *cpu_ptr;
+};
+
+enum kfd_mempool {
+	KFD_MEMPOOL_SYSTEM_CACHEABLE = 1,
+	KFD_MEMPOOL_SYSTEM_WRITECOMBINE = 2,
+	KFD_MEMPOOL_FRAMEBUFFER = 3,
+};
+
+/* Character device interface */
+int kfd_chardev_init(void);
+void kfd_chardev_exit(void);
+struct device *kfd_chardev(void);
+
+/**
+ * enum kfd_preempt_type_filter
+ *
+ * @KFD_PREEMPT_TYPE_FILTER_SINGLE_QUEUE: Preempts a single queue.
+ *
+ * @KFD_PREEMPT_TYPE_FILTER_ALL_QUEUES: Preempts all queues in the
+ *						running queues list.
+ *
+ * @KFD_PREEMPT_TYPE_FILTER_BY_PASID: Preempts queues that belong to
+ *						a specific process.
+ *
+ */
+enum kfd_preempt_type_filter {
+	KFD_PREEMPT_TYPE_FILTER_SINGLE_QUEUE,
+	KFD_PREEMPT_TYPE_FILTER_ALL_QUEUES,
+	KFD_PREEMPT_TYPE_FILTER_BY_PASID
+};
+
+enum kfd_preempt_type {
+	KFD_PREEMPT_TYPE_WAVEFRONT,
+	KFD_PREEMPT_TYPE_WAVEFRONT_RESET
+};
+
+/**
+ * enum kfd_queue_type
+ *
+ * @KFD_QUEUE_TYPE_COMPUTE: Regular user mode queue type.
+ *
+ * @KFD_QUEUE_TYPE_SDMA: Sdma user mode queue type.
+ *
+ * @KFD_QUEUE_TYPE_HIQ: HIQ queue type.
+ *
+ * @KFD_QUEUE_TYPE_DIQ: DIQ queue type.
+ */
+enum kfd_queue_type  {
+	KFD_QUEUE_TYPE_COMPUTE,
+	KFD_QUEUE_TYPE_SDMA,
+	KFD_QUEUE_TYPE_HIQ,
+	KFD_QUEUE_TYPE_DIQ
+};
+
+enum kfd_queue_format {
+	KFD_QUEUE_FORMAT_PM4,
+	KFD_QUEUE_FORMAT_AQL
+};
+
+/**
+ * struct queue_properties
+ *
+ * @type: The queue type.
+ *
+ * @queue_id: Queue identifier.
+ *
+ * @queue_address: Queue ring buffer address.
+ *
+ * @queue_size: Queue ring buffer size.
+ *
+ * @priority: Defines the queue priority relative to other queues in the
+ * process.
+ * This is just an indication and HW scheduling may override the priority as
+ * necessary while keeping the relative prioritization.
+ * The priority granularity is from 0 to f, where f is the highest priority.
+ * Currently all queues are initialized with the highest priority.
+ *
+ * @queue_percent: This field is partially implemented and currently a zero in
+ * this field means that the queue is not active.
+ *
+ * @read_ptr: User space address which points to the number of dwords the
+ * cp read from the ring buffer. This field is updated automatically by the H/W.
+ *
+ * @write_ptr: Defines the number of dwords written to the ring buffer.
+ *
+ * @doorbell_ptr: This field is used to notify the H/W of a new packet written
+ * to the queue ring buffer. This field should be similar to write_ptr and the
+ * user should update this field after updating the write_ptr.
+ *
+ * @doorbell_off: The doorbell offset in the doorbell pci-bar.
+ *
+ * @is_interop: Defines if this is an interop queue. Interop queue means that
+ * the queue can access both graphics and compute resources.
+ *
+ * @is_active: Defines if the queue is active or not.
+ *
+ * @vmid: If the scheduling mode is no cp scheduling, this field defines the
+ * vmid of the queue.
+ *
+ * This structure represents the queue properties for each queue, whether it
+ * is a user mode or a kernel mode queue.
+ *
+ */
+struct queue_properties {
+	enum kfd_queue_type type;
+	enum kfd_queue_format format;
+	unsigned int queue_id;
+	uint64_t queue_address;
+	uint64_t  queue_size;
+	uint32_t priority;
+	uint32_t queue_percent;
+	uint32_t *read_ptr;
+	uint32_t *write_ptr;
+	uint32_t __iomem *doorbell_ptr;
+	uint32_t doorbell_off;
+	bool is_interop;
+	bool is_active;
+	/* Not relevant for user mode queues in cp scheduling */
+	unsigned int vmid;
+};
+
+/**
+ * struct queue
+ *
+ * @list: Queue linked list.
+ *
+ * @mqd: The queue MQD.
+ *
+ * @mqd_mem_obj: The MQD local gpu memory object.
+ *
+ * @gart_mqd_addr: The MQD gart mc address.
+ *
+ * @properties: The queue properties.
+ *
+ * @mec: Used only in no cp scheduling mode and identifies the micro engine id
+ * that the queue should be executed on.
+ *
+ * @pipe: Used only in no cp scheduling mode and identifies the queue's pipe id.
+ *
+ * @queue: Used only in no cp scheduling mode and identifies the queue's slot.
+ *
+ * @process: The kfd process that created this queue.
+ *
+ * @device: The kfd device that created this queue.
+ *
+ * This structure represents user mode compute queues.
+ * It contains all the necessary data to handle such queues.
+ *
+ */
+
+struct queue {
+	struct list_head list;
+	void *mqd;
+	struct kfd_mem_obj *mqd_mem_obj;
+	uint64_t gart_mqd_addr;
+	struct queue_properties properties;
+
+	uint32_t mec;
+	uint32_t pipe;
+	uint32_t queue;
+
+	struct kfd_process	*process;
+	struct kfd_dev		*device;
+};
+
+/*
+ * Please read the kfd_mqd_manager.h description.
+ */
+enum KFD_MQD_TYPE {
+	KFD_MQD_TYPE_CIK_COMPUTE = 0, /* for no cp scheduling */
+	KFD_MQD_TYPE_CIK_HIQ, /* for hiq */
+	KFD_MQD_TYPE_CIK_CP, /* for cp queues and diq */
+	KFD_MQD_TYPE_CIK_SDMA, /* for sdma queues */
+	KFD_MQD_TYPE_MAX
+};
+
+struct scheduling_resources {
+	unsigned int vmid_mask;
+	enum kfd_queue_type type;
+	uint64_t queue_mask;
+	uint64_t gws_mask;
+	uint32_t oac_mask;
+	uint32_t gds_heap_base;
+	uint32_t gds_heap_size;
+};
+
+struct process_queue_manager {
+	/* data */
+	struct kfd_process	*process;
+	unsigned int		num_concurrent_processes;
+	struct list_head	queues;
+	unsigned long		*queue_slot_bitmap;
+};
+
+struct qcm_process_device {
+	/* The Device Queue Manager that owns this data */
+	struct device_queue_manager *dqm;
+	struct process_queue_manager *pqm;
+	/* Device Queue Manager lock */
+	struct mutex *lock;
+	/* Queues list */
+	struct list_head queues_list;
+	struct list_head priv_queue_list;
+
+	unsigned int queue_count;
+	unsigned int vmid;
+	bool is_debug;
+	/*
+	 * All the memory management data should be here too
+	 */
+	uint64_t gds_context_area;
+	uint32_t sh_mem_config;
+	uint32_t sh_mem_bases;
+	uint32_t sh_mem_ape1_base;
+	uint32_t sh_mem_ape1_limit;
+	uint32_t page_table_base;
+	uint32_t gds_size;
+	uint32_t num_gws;
+	uint32_t num_oac;
+};
+
+/* Data that is per-process-per device. */
+struct kfd_process_device {
+	/*
+	 * List of all per-device data for a process.
+	 * Starts from kfd_process.per_device_data.
+	 */
+	struct list_head per_device_list;
+
+	/* The device that owns this data. */
+	struct kfd_dev *dev;
+
+
+	/* per-process-per device QCM data structure */
+	struct qcm_process_device qpd;
+
+	/* Apertures */
+	uint64_t lds_base;
+	uint64_t lds_limit;
+	uint64_t gpuvm_base;
+	uint64_t gpuvm_limit;
+	uint64_t scratch_base;
+	uint64_t scratch_limit;
+
+	/* Is this process/pasid bound to this device? (amd_iommu_bind_pasid) */
+	bool bound;
+};
+
+#define qpd_to_pdd(x) container_of(x, struct kfd_process_device, qpd)
+
+/* Process data */
+struct kfd_process {
+	/*
+	 * kfd_process are stored in an mm_struct*->kfd_process*
+	 * hash table (kfd_processes in kfd_process.c)
+	 */
+	struct hlist_node kfd_processes;
+
+	struct mm_struct *mm;
+
+	struct mutex mutex;
+
+	/*
+	 * In any process, the thread that started main() is the lead
+	 * thread and outlives the rest.
+	 * It is here because amd_iommu_bind_pasid wants a task_struct.
+	 */
+	struct task_struct *lead_thread;
+
+	/* We want to receive a notification when the mm_struct is destroyed */
+	struct mmu_notifier mmu_notifier;
+
+	/* Used for delayed freeing of the kfd_process structure */
+	struct rcu_head	rcu;
+
+	unsigned int pasid;
+
+	/*
+	 * List of kfd_process_device structures,
+	 * one for each device the process is using.
+	 */
+	struct list_head per_device_data;
+
+	struct process_queue_manager pqm;
+
+	/* The process's queues. */
+	size_t queue_array_size;
+
+	/* Size is queue_array_size, up to MAX_PROCESS_QUEUES. */
+	struct kfd_queue **queues;
+
+	unsigned long allocated_queue_bitmap[DIV_ROUND_UP(KFD_MAX_NUM_OF_QUEUES_PER_PROCESS, BITS_PER_LONG)];
+
+	/* Is the user space process 32 bit? */
+	bool is_32bit_user_mode;
+};
+
+void kfd_process_create_wq(void);
+void kfd_process_destroy_wq(void);
+struct kfd_process *kfd_create_process(const struct task_struct *);
+struct kfd_process *kfd_get_process(const struct task_struct *);
+
+struct kfd_process_device *kfd_bind_process_to_device(struct kfd_dev *dev,
+							struct kfd_process *p);
+void kfd_unbind_process_from_device(struct kfd_dev *dev, unsigned int pasid);
+struct kfd_process_device *kfd_get_process_device_data(struct kfd_dev *dev,
+							struct kfd_process *p,
+							int create_pdd);
+
+/* Process device data iterator */
+struct kfd_process_device *kfd_get_first_process_device_data(struct kfd_process *p);
+struct kfd_process_device *kfd_get_next_process_device_data(struct kfd_process *p,
+						struct kfd_process_device *pdd);
+bool kfd_has_process_device_data(struct kfd_process *p);
+
+/* PASIDs */
+int kfd_pasid_init(void);
+void kfd_pasid_exit(void);
+bool kfd_set_pasid_limit(unsigned int new_limit);
+unsigned int kfd_get_pasid_limit(void);
+unsigned int kfd_pasid_alloc(void);
+void kfd_pasid_free(unsigned int pasid);
+
+/* Doorbells */
+void kfd_doorbell_init(struct kfd_dev *kfd);
+int kfd_doorbell_mmap(struct kfd_process *process, struct vm_area_struct *vma);
+u32 __iomem *kfd_get_kernel_doorbell(struct kfd_dev *kfd,
+					unsigned int *doorbell_off);
+void kfd_release_kernel_doorbell(struct kfd_dev *kfd, u32 __iomem *db_addr);
+u32 read_kernel_doorbell(u32 __iomem *db);
+void write_kernel_doorbell(u32 __iomem *db, u32 value);
+unsigned int kfd_queue_id_to_doorbell(struct kfd_dev *kfd,
+					struct kfd_process *process,
+					unsigned int queue_id);
+
+extern struct device *kfd_device;
+
+/* Topology */
+int kfd_topology_init(void);
+void kfd_topology_shutdown(void);
+int kfd_topology_add_device(struct kfd_dev *gpu);
+int kfd_topology_remove_device(struct kfd_dev *gpu);
+struct kfd_dev *kfd_device_by_id(uint32_t gpu_id);
+struct kfd_dev *kfd_device_by_pci_dev(const struct pci_dev *pdev);
+struct kfd_dev *kfd_topology_enum_kfd_devices(uint8_t idx);
+
+/* Interrupts */
+int kfd_interrupt_init(struct kfd_dev *dev);
+void kfd_interrupt_exit(struct kfd_dev *dev);
+void kgd2kfd_interrupt(struct kfd_dev *kfd, const void *ih_ring_entry);
+bool enqueue_ih_ring_entry(struct kfd_dev *kfd,	const void *ih_ring_entry);
+
+/* Power Management */
+void kgd2kfd_suspend(struct kfd_dev *kfd);
+int kgd2kfd_resume(struct kfd_dev *kfd);
+
+/* amdkfd Apertures */
+int kfd_init_apertures(struct kfd_process *process);
+
+/* Queue Context Management */
+inline uint32_t lower_32(uint64_t x);
+inline uint32_t upper_32(uint64_t x);
+
+int init_queue(struct queue **q, struct queue_properties properties);
+void uninit_queue(struct queue *q);
+void print_queue_properties(struct queue_properties *q);
+void print_queue(struct queue *q);
+
+struct mqd_manager *mqd_manager_init(enum KFD_MQD_TYPE type,
+					struct kfd_dev *dev);
+struct device_queue_manager *device_queue_manager_init(struct kfd_dev *dev);
+void device_queue_manager_uninit(struct device_queue_manager *dqm);
+struct kernel_queue *kernel_queue_init(struct kfd_dev *dev,
+					enum kfd_queue_type type);
+void kernel_queue_uninit(struct kernel_queue *kq);
+
+/* Process Queue Manager */
+struct process_queue_node {
+	struct queue *q;
+	struct kernel_queue *kq;
+	struct list_head process_queue_list;
+};
+
+int pqm_init(struct process_queue_manager *pqm, struct kfd_process *p);
+void pqm_uninit(struct process_queue_manager *pqm);
+int pqm_create_queue(struct process_queue_manager *pqm,
+			    struct kfd_dev *dev,
+			    struct file *f,
+			    struct queue_properties *properties,
+			    unsigned int flags,
+			    enum kfd_queue_type type,
+			    unsigned int *qid);
+int pqm_destroy_queue(struct process_queue_manager *pqm, unsigned int qid);
+int pqm_update_queue(struct process_queue_manager *pqm, unsigned int qid,
+			struct queue_properties *p);
+
+/* Packet Manager */
+
+#define KFD_HIQ_TIMEOUT (500)
+
+#define KFD_FENCE_COMPLETED (100)
+#define KFD_FENCE_INIT   (10)
+#define KFD_UNMAP_LATENCY (150)
+
+struct packet_manager {
+	struct device_queue_manager *dqm;
+	struct kernel_queue *priv_queue;
+	struct mutex lock;
+	bool allocated;
+	struct kfd_mem_obj *ib_buffer_obj;
+};
+
+int pm_init(struct packet_manager *pm, struct device_queue_manager *dqm);
+void pm_uninit(struct packet_manager *pm);
+int pm_send_set_resources(struct packet_manager *pm,
+				struct scheduling_resources *res);
+int pm_send_runlist(struct packet_manager *pm, struct list_head *dqm_queues);
+int pm_send_query_status(struct packet_manager *pm, uint64_t fence_address,
+				uint32_t fence_value);
+
+int pm_send_unmap_queue(struct packet_manager *pm, enum kfd_queue_type type,
+			enum kfd_preempt_type_filter mode,
+			uint32_t filter_param, bool reset,
+			unsigned int sdma_engine);
+
+void pm_release_ib(struct packet_manager *pm);
+
+uint64_t kfd_get_number_elems(struct kfd_dev *kfd);
+phys_addr_t kfd_get_process_doorbells(struct kfd_dev *dev,
+					struct kfd_process *process);
+
+#endif
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_process.c b/drivers/gpu/drm/amd/amdkfd/kfd_process.c
new file mode 100644
index 000000000000..b85eb0b830b4
--- /dev/null
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_process.c
@@ -0,0 +1,410 @@
+/*
+ * Copyright 2014 Advanced Micro Devices, Inc.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR
+ * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
+ * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
+ * OTHER DEALINGS IN THE SOFTWARE.
+ */
+
+#include <linux/mutex.h>
+#include <linux/log2.h>
+#include <linux/sched.h>
+#include <linux/slab.h>
+#include <linux/amd-iommu.h>
+#include <linux/notifier.h>
+struct mm_struct;
+
+#include "kfd_priv.h"
+
+/*
+ * Initial size for the array of queues.
+ * The allocated size is doubled each time
+ * it is exceeded up to MAX_PROCESS_QUEUES.
+ */
+#define INITIAL_QUEUE_ARRAY_SIZE 16
+
+/*
+ * List of struct kfd_process (field kfd_processes).
+ * Unique/indexed by mm_struct*
+ */
+#define KFD_PROCESS_TABLE_SIZE 5 /* bits: 32 entries */
+static DEFINE_HASHTABLE(kfd_processes_table, KFD_PROCESS_TABLE_SIZE);
+static DEFINE_MUTEX(kfd_processes_mutex);
+
+DEFINE_STATIC_SRCU(kfd_processes_srcu);
+
+static struct workqueue_struct *kfd_process_wq;
+
+struct kfd_process_release_work {
+	struct work_struct kfd_work;
+	struct kfd_process *p;
+};
+
+static struct kfd_process *find_process(const struct task_struct *thread);
+static struct kfd_process *create_process(const struct task_struct *thread);
+
+void kfd_process_create_wq(void)
+{
+	if (!kfd_process_wq)
+		kfd_process_wq = create_workqueue("kfd_process_wq");
+}
+
+void kfd_process_destroy_wq(void)
+{
+	if (kfd_process_wq) {
+		flush_workqueue(kfd_process_wq);
+		destroy_workqueue(kfd_process_wq);
+		kfd_process_wq = NULL;
+	}
+}
+
+struct kfd_process *kfd_create_process(const struct task_struct *thread)
+{
+	struct kfd_process *process;
+
+	BUG_ON(!kfd_process_wq);
+
+	if (thread->mm == NULL)
+		return ERR_PTR(-EINVAL);
+
+	/* Only the pthreads threading model is supported. */
+	if (thread->group_leader->mm != thread->mm)
+		return ERR_PTR(-EINVAL);
+
+	/* Take mmap_sem because we call __mmu_notifier_register inside */
+	down_write(&thread->mm->mmap_sem);
+
+	/*
+	 * Take the kfd processes mutex before starting process creation
+	 * so there won't be a case where two threads of the same process
+	 * create two kfd_process structures.
+	 */
+	mutex_lock(&kfd_processes_mutex);
+
+	/* A prior open of /dev/kfd could have already created the process. */
+	process = find_process(thread);
+	if (process)
+		pr_debug("kfd: process already found\n");
+
+	if (!process)
+		process = create_process(thread);
+
+	mutex_unlock(&kfd_processes_mutex);
+
+	up_write(&thread->mm->mmap_sem);
+
+	return process;
+}
+
+struct kfd_process *kfd_get_process(const struct task_struct *thread)
+{
+	struct kfd_process *process;
+
+	if (thread->mm == NULL)
+		return ERR_PTR(-EINVAL);
+
+	/* Only the pthreads threading model is supported. */
+	if (thread->group_leader->mm != thread->mm)
+		return ERR_PTR(-EINVAL);
+
+	process = find_process(thread);
+
+	return process;
+}
+
+static struct kfd_process *find_process_by_mm(const struct mm_struct *mm)
+{
+	struct kfd_process *process;
+
+	hash_for_each_possible_rcu(kfd_processes_table, process,
+					kfd_processes, (uintptr_t)mm)
+		if (process->mm == mm)
+			return process;
+
+	return NULL;
+}
+
+static struct kfd_process *find_process(const struct task_struct *thread)
+{
+	struct kfd_process *p;
+	int idx;
+
+	idx = srcu_read_lock(&kfd_processes_srcu);
+	p = find_process_by_mm(thread->mm);
+	srcu_read_unlock(&kfd_processes_srcu, idx);
+
+	return p;
+}
+
+static void kfd_process_wq_release(struct work_struct *work)
+{
+	struct kfd_process_release_work *my_work;
+	struct kfd_process_device *pdd, *temp;
+	struct kfd_process *p;
+
+	my_work = (struct kfd_process_release_work *) work;
+
+	p = my_work->p;
+
+	mutex_lock(&p->mutex);
+
+	list_for_each_entry_safe(pdd, temp, &p->per_device_data,
+							per_device_list) {
+		amd_iommu_unbind_pasid(pdd->dev->pdev, p->pasid);
+		list_del(&pdd->per_device_list);
+
+		kfree(pdd);
+	}
+
+	kfd_pasid_free(p->pasid);
+
+	mutex_unlock(&p->mutex);
+
+	mutex_destroy(&p->mutex);
+
+	kfree(p->queues);
+
+	kfree(p);
+
+	kfree((void *)work);
+}
+
+static void kfd_process_destroy_delayed(struct rcu_head *rcu)
+{
+	struct kfd_process_release_work *work;
+	struct kfd_process *p;
+
+	BUG_ON(!kfd_process_wq);
+
+	p = container_of(rcu, struct kfd_process, rcu);
+	BUG_ON(atomic_read(&p->mm->mm_count) <= 0);
+
+	mmdrop(p->mm);
+
+	work = (struct kfd_process_release_work *)
+		kmalloc(sizeof(struct kfd_process_release_work), GFP_ATOMIC);
+
+	if (work) {
+		INIT_WORK((struct work_struct *) work, kfd_process_wq_release);
+		work->p = p;
+		queue_work(kfd_process_wq, (struct work_struct *) work);
+	}
+}
+
+static void kfd_process_notifier_release(struct mmu_notifier *mn,
+					struct mm_struct *mm)
+{
+	struct kfd_process *p;
+
+	/*
+	 * The kfd_process structure cannot be freed because the
+	 * mmu_notifier srcu is read locked
+	 */
+	p = container_of(mn, struct kfd_process, mmu_notifier);
+	BUG_ON(p->mm != mm);
+
+	mutex_lock(&kfd_processes_mutex);
+	hash_del_rcu(&p->kfd_processes);
+	mutex_unlock(&kfd_processes_mutex);
+	synchronize_srcu(&kfd_processes_srcu);
+
+	mutex_lock(&p->mutex);
+
+	/* In case our notifier is called before IOMMU notifier */
+	pqm_uninit(&p->pqm);
+
+	mutex_unlock(&p->mutex);
+
+	/*
+	 * Because we drop mm_count inside kfd_process_destroy_delayed
+	 * and because the mmu_notifier_unregister function also drops
+	 * mm_count, we need to take an extra count here.
+	 */
+	atomic_inc(&p->mm->mm_count);
+	mmu_notifier_unregister_no_release(&p->mmu_notifier, p->mm);
+	mmu_notifier_call_srcu(&p->rcu, &kfd_process_destroy_delayed);
+}
+
+static const struct mmu_notifier_ops kfd_process_mmu_notifier_ops = {
+	.release = kfd_process_notifier_release,
+};
+
+static struct kfd_process *create_process(const struct task_struct *thread)
+{
+	struct kfd_process *process;
+	int err = -ENOMEM;
+
+	process = kzalloc(sizeof(*process), GFP_KERNEL);
+
+	if (!process)
+		goto err_alloc_process;
+
+	process->queues = kmalloc_array(INITIAL_QUEUE_ARRAY_SIZE,
+					sizeof(process->queues[0]), GFP_KERNEL);
+	if (!process->queues)
+		goto err_alloc_queues;
+
+	process->pasid = kfd_pasid_alloc();
+	if (process->pasid == 0)
+		goto err_alloc_pasid;
+
+	mutex_init(&process->mutex);
+
+	process->mm = thread->mm;
+
+	/* register notifier */
+	process->mmu_notifier.ops = &kfd_process_mmu_notifier_ops;
+	err = __mmu_notifier_register(&process->mmu_notifier, process->mm);
+	if (err)
+		goto err_mmu_notifier;
+
+	hash_add_rcu(kfd_processes_table, &process->kfd_processes,
+			(uintptr_t)process->mm);
+
+	process->lead_thread = thread->group_leader;
+
+	process->queue_array_size = INITIAL_QUEUE_ARRAY_SIZE;
+
+	INIT_LIST_HEAD(&process->per_device_data);
+
+	err = pqm_init(&process->pqm, process);
+	if (err != 0)
+		goto err_process_pqm_init;
+
+	return process;
+
+err_process_pqm_init:
+	hash_del_rcu(&process->kfd_processes);
+	synchronize_rcu();
+	mmu_notifier_unregister_no_release(&process->mmu_notifier, process->mm);
+err_mmu_notifier:
+	kfd_pasid_free(process->pasid);
+err_alloc_pasid:
+	kfree(process->queues);
+err_alloc_queues:
+	kfree(process);
+err_alloc_process:
+	return ERR_PTR(err);
+}
+
+struct kfd_process_device *kfd_get_process_device_data(struct kfd_dev *dev,
+							struct kfd_process *p,
+							int create_pdd)
+{
+	struct kfd_process_device *pdd = NULL;
+
+	list_for_each_entry(pdd, &p->per_device_data, per_device_list)
+		if (pdd->dev == dev)
+			return pdd;
+
+	if (create_pdd) {
+		pdd = kzalloc(sizeof(*pdd), GFP_KERNEL);
+		if (pdd != NULL) {
+			pdd->dev = dev;
+			INIT_LIST_HEAD(&pdd->qpd.queues_list);
+			INIT_LIST_HEAD(&pdd->qpd.priv_queue_list);
+			pdd->qpd.dqm = dev->dqm;
+			list_add(&pdd->per_device_list, &p->per_device_data);
+		}
+	}
+
+	return pdd;
+}
+
+/*
+ * Direct the IOMMU to bind the process (specifically the pasid->mm)
+ * to the device.
+ * Unbinding occurs when the process dies or the device is removed.
+ *
+ * Assumes that the process lock is held.
+ */
+struct kfd_process_device *kfd_bind_process_to_device(struct kfd_dev *dev,
+							struct kfd_process *p)
+{
+	struct kfd_process_device *pdd = kfd_get_process_device_data(dev, p, 1);
+	int err;
+
+	if (pdd == NULL)
+		return ERR_PTR(-ENOMEM);
+
+	if (pdd->bound)
+		return pdd;
+
+	err = amd_iommu_bind_pasid(dev->pdev, p->pasid, p->lead_thread);
+	if (err < 0)
+		return ERR_PTR(err);
+
+	pdd->bound = true;
+
+	return pdd;
+}
+
+void kfd_unbind_process_from_device(struct kfd_dev *dev, unsigned int pasid)
+{
+	struct kfd_process *p;
+	struct kfd_process_device *pdd;
+	int idx, i;
+
+	BUG_ON(dev == NULL);
+
+	idx = srcu_read_lock(&kfd_processes_srcu);
+
+	hash_for_each_rcu(kfd_processes_table, i, p, kfd_processes)
+		if (p->pasid == pasid)
+			break;
+
+	srcu_read_unlock(&kfd_processes_srcu, idx);
+
+	BUG_ON(p->pasid != pasid);
+
+	mutex_lock(&p->mutex);
+
+	pqm_uninit(&p->pqm);
+
+	pdd = kfd_get_process_device_data(dev, p, 0);
+
+	/*
+	 * Just mark pdd as unbound, because we still need it to call
+	 * amd_iommu_unbind_pasid() when the process exits.
+	 * We don't call amd_iommu_unbind_pasid() here
+	 * because the IOMMU called us.
+	 */
+	if (pdd)
+		pdd->bound = false;
+
+	mutex_unlock(&p->mutex);
+}
+
+struct kfd_process_device *kfd_get_first_process_device_data(struct kfd_process *p)
+{
+	return list_first_entry(&p->per_device_data,
+				struct kfd_process_device,
+				per_device_list);
+}
+
+struct kfd_process_device *kfd_get_next_process_device_data(struct kfd_process *p,
+						struct kfd_process_device *pdd)
+{
+	if (list_is_last(&pdd->per_device_list, &p->per_device_data))
+		return NULL;
+	return list_next_entry(pdd, per_device_list);
+}
+
+bool kfd_has_process_device_data(struct kfd_process *p)
+{
+	return !(list_empty(&p->per_device_data));
+}
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_process_queue_manager.c b/drivers/gpu/drm/amd/amdkfd/kfd_process_queue_manager.c
new file mode 100644
index 000000000000..47526780d736
--- /dev/null
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_process_queue_manager.c
@@ -0,0 +1,343 @@
+/*
+ * Copyright 2014 Advanced Micro Devices, Inc.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR
+ * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
+ * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
+ * OTHER DEALINGS IN THE SOFTWARE.
+ *
+ */
+
+#include <linux/slab.h>
+#include <linux/list.h>
+#include "kfd_device_queue_manager.h"
+#include "kfd_priv.h"
+#include "kfd_kernel_queue.h"
+
+static inline struct process_queue_node *get_queue_by_qid(
+			struct process_queue_manager *pqm, unsigned int qid)
+{
+	struct process_queue_node *pqn;
+
+	BUG_ON(!pqm);
+
+	list_for_each_entry(pqn, &pqm->queues, process_queue_list) {
+		if (pqn->q && pqn->q->properties.queue_id == qid)
+			return pqn;
+		if (pqn->kq && pqn->kq->queue->properties.queue_id == qid)
+			return pqn;
+	}
+
+	return NULL;
+}
+
+static int find_available_queue_slot(struct process_queue_manager *pqm,
+					unsigned int *qid)
+{
+	unsigned long found;
+
+	BUG_ON(!pqm || !qid);
+
+	pr_debug("kfd: in %s\n", __func__);
+
+	found = find_first_zero_bit(pqm->queue_slot_bitmap,
+			max_num_of_queues_per_process);
+
+	pr_debug("kfd: the new slot id %lu\n", found);
+
+	if (found >= max_num_of_queues_per_process) {
+		pr_info("amdkfd: Cannot open more queues for process with pasid %d\n",
+				pqm->process->pasid);
+		return -ENOMEM;
+	}
+
+	set_bit(found, pqm->queue_slot_bitmap);
+	*qid = found;
+
+	return 0;
+}
+
+int pqm_init(struct process_queue_manager *pqm, struct kfd_process *p)
+{
+	BUG_ON(!pqm);
+
+	INIT_LIST_HEAD(&pqm->queues);
+	pqm->queue_slot_bitmap =
+			kzalloc(DIV_ROUND_UP(max_num_of_queues_per_process,
+					BITS_PER_BYTE), GFP_KERNEL);
+	if (pqm->queue_slot_bitmap == NULL)
+		return -ENOMEM;
+	pqm->process = p;
+
+	return 0;
+}
+
+void pqm_uninit(struct process_queue_manager *pqm)
+{
+	int retval;
+	struct process_queue_node *pqn, *next;
+
+	BUG_ON(!pqm);
+
+	pr_debug("In func %s\n", __func__);
+
+	list_for_each_entry_safe(pqn, next, &pqm->queues, process_queue_list) {
+		retval = pqm_destroy_queue(
+				pqm,
+				(pqn->q != NULL) ?
+					pqn->q->properties.queue_id :
+					pqn->kq->queue->properties.queue_id);
+
+		if (retval != 0) {
+			pr_err("kfd: failed to destroy queue\n");
+			return;
+		}
+	}
+	kfree(pqm->queue_slot_bitmap);
+	pqm->queue_slot_bitmap = NULL;
+}
+
+static int create_cp_queue(struct process_queue_manager *pqm,
+				struct kfd_dev *dev, struct queue **q,
+				struct queue_properties *q_properties,
+				struct file *f, unsigned int qid)
+{
+	int retval;
+
+	retval = 0;
+
+	/* Doorbell initialized in user space */
+	q_properties->doorbell_ptr = NULL;
+
+	q_properties->doorbell_off =
+			kfd_queue_id_to_doorbell(dev, pqm->process, qid);
+
+	/* let DQM handle it */
+	q_properties->vmid = 0;
+	q_properties->queue_id = qid;
+	q_properties->type = KFD_QUEUE_TYPE_COMPUTE;
+
+	retval = init_queue(q, *q_properties);
+	if (retval != 0)
+		goto err_init_queue;
+
+	(*q)->device = dev;
+	(*q)->process = pqm->process;
+
+	pr_debug("kfd: PQM After init queue\n");
+
+	return retval;
+
+err_init_queue:
+	return retval;
+}
+
+int pqm_create_queue(struct process_queue_manager *pqm,
+			    struct kfd_dev *dev,
+			    struct file *f,
+			    struct queue_properties *properties,
+			    unsigned int flags,
+			    enum kfd_queue_type type,
+			    unsigned int *qid)
+{
+	int retval;
+	struct kfd_process_device *pdd;
+	struct queue_properties q_properties;
+	struct queue *q;
+	struct process_queue_node *pqn;
+	struct kernel_queue *kq;
+
+	BUG_ON(!pqm || !dev || !properties || !qid);
+
+	memset(&q_properties, 0, sizeof(struct queue_properties));
+	memcpy(&q_properties, properties, sizeof(struct queue_properties));
+	q = NULL;
+	kq = NULL;
+
+	pdd = kfd_get_process_device_data(dev, pqm->process, 1);
+	BUG_ON(!pdd);
+
+	retval = find_available_queue_slot(pqm, qid);
+	if (retval != 0)
+		return retval;
+
+	if (list_empty(&pqm->queues)) {
+		pdd->qpd.pqm = pqm;
+		dev->dqm->register_process(dev->dqm, &pdd->qpd);
+	}
+
+	pqn = kzalloc(sizeof(struct process_queue_node), GFP_KERNEL);
+	if (!pqn) {
+		retval = -ENOMEM;
+		goto err_allocate_pqn;
+	}
+
+	switch (type) {
+	case KFD_QUEUE_TYPE_COMPUTE:
+		/* check if there is over subscription */
+		if ((sched_policy == KFD_SCHED_POLICY_HWS_NO_OVERSUBSCRIPTION) &&
+		((dev->dqm->processes_count >= VMID_PER_DEVICE) ||
+		(dev->dqm->queue_count >= PIPE_PER_ME_CP_SCHEDULING * QUEUES_PER_PIPE))) {
+			pr_err("kfd: over-subscription is not allowed in radeon_kfd.sched_policy == 1\n");
+			retval = -EPERM;
+			goto err_create_queue;
+		}
+
+		retval = create_cp_queue(pqm, dev, &q, &q_properties, f, *qid);
+		if (retval != 0)
+			goto err_create_queue;
+		pqn->q = q;
+		pqn->kq = NULL;
+		retval = dev->dqm->create_queue(dev->dqm, q, &pdd->qpd,
+						&q->properties.vmid);
+		print_queue(q);
+		break;
+	case KFD_QUEUE_TYPE_DIQ:
+		kq = kernel_queue_init(dev, KFD_QUEUE_TYPE_DIQ);
+		if (kq == NULL) {
+			retval = -ENOMEM;
+			goto err_create_queue;
+		}
+		kq->queue->properties.queue_id = *qid;
+		pqn->kq = kq;
+		pqn->q = NULL;
+		retval = dev->dqm->create_kernel_queue(dev->dqm, kq, &pdd->qpd);
+		break;
+	default:
+		BUG();
+		break;
+	}
+
+	if (retval != 0) {
+		pr_err("kfd: error dqm create queue\n");
+		goto err_create_queue;
+	}
+
+	pr_debug("kfd: PQM After DQM create queue\n");
+
+	list_add(&pqn->process_queue_list, &pqm->queues);
+
+	if (q) {
+		*properties = q->properties;
+		pr_debug("kfd: PQM done creating queue\n");
+		print_queue_properties(properties);
+	}
+
+	return retval;
+
+err_create_queue:
+	kfree(pqn);
+err_allocate_pqn:
+	clear_bit(*qid, pqm->queue_slot_bitmap);
+	return retval;
+}
+
+int pqm_destroy_queue(struct process_queue_manager *pqm, unsigned int qid)
+{
+	struct process_queue_node *pqn;
+	struct kfd_process_device *pdd;
+	struct device_queue_manager *dqm;
+	struct kfd_dev *dev;
+	int retval;
+
+	dqm = NULL;
+
+	BUG_ON(!pqm);
+	retval = 0;
+
+	pr_debug("kfd: In Func %s\n", __func__);
+
+	pqn = get_queue_by_qid(pqm, qid);
+	if (pqn == NULL) {
+		pr_err("kfd: queue id does not match any known queue\n");
+		return -EINVAL;
+	}
+
+	dev = NULL;
+	if (pqn->kq)
+		dev = pqn->kq->dev;
+	if (pqn->q)
+		dev = pqn->q->device;
+	BUG_ON(!dev);
+
+	pdd = kfd_get_process_device_data(dev, pqm->process, 1);
+	BUG_ON(!pdd);
+
+	if (pqn->kq) {
+		/* destroy kernel queue (DIQ) */
+		dqm = pqn->kq->dev->dqm;
+		dqm->destroy_kernel_queue(dqm, pqn->kq, &pdd->qpd);
+		kernel_queue_uninit(pqn->kq);
+	}
+
+	if (pqn->q) {
+		dqm = pqn->q->device->dqm;
+		retval = dqm->destroy_queue(dqm, &pdd->qpd, pqn->q);
+		if (retval != 0)
+			return retval;
+
+		uninit_queue(pqn->q);
+	}
+
+	list_del(&pqn->process_queue_list);
+	kfree(pqn);
+	clear_bit(qid, pqm->queue_slot_bitmap);
+
+	if (list_empty(&pqm->queues))
+		dqm->unregister_process(dqm, &pdd->qpd);
+
+	return retval;
+}
+
+int pqm_update_queue(struct process_queue_manager *pqm, unsigned int qid,
+			struct queue_properties *p)
+{
+	int retval;
+	struct process_queue_node *pqn;
+
+	BUG_ON(!pqm);
+
+	pqn = get_queue_by_qid(pqm, qid);
+	BUG_ON(!pqn);
+
+	pqn->q->properties.queue_address = p->queue_address;
+	pqn->q->properties.queue_size = p->queue_size;
+	pqn->q->properties.queue_percent = p->queue_percent;
+	pqn->q->properties.priority = p->priority;
+
+	retval = pqn->q->device->dqm->update_queue(pqn->q->device->dqm, pqn->q);
+	if (retval != 0)
+		return retval;
+
+	return 0;
+}
+
+static __attribute__((unused)) struct kernel_queue *pqm_get_kernel_queue(
+					struct process_queue_manager *pqm,
+					unsigned int qid)
+{
+	struct process_queue_node *pqn;
+
+	BUG_ON(!pqm);
+
+	pqn = get_queue_by_qid(pqm, qid);
+	if (pqn && pqn->kq)
+		return pqn->kq;
+
+	return NULL;
+}
+
+
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_queue.c b/drivers/gpu/drm/amd/amdkfd/kfd_queue.c
new file mode 100644
index 000000000000..9a0c90b0702e
--- /dev/null
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_queue.c
@@ -0,0 +1,85 @@
+/*
+ * Copyright 2014 Advanced Micro Devices, Inc.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR
+ * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
+ * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
+ * OTHER DEALINGS IN THE SOFTWARE.
+ *
+ */
+
+#include <linux/slab.h>
+#include "kfd_priv.h"
+
+void print_queue_properties(struct queue_properties *q)
+{
+	if (!q)
+		return;
+
+	pr_debug("Printing queue properties:\n");
+	pr_debug("Queue Type: %u\n", q->type);
+	pr_debug("Queue Size: %llu\n", q->queue_size);
+	pr_debug("Queue percent: %u\n", q->queue_percent);
+	pr_debug("Queue Address: 0x%llX\n", q->queue_address);
+	pr_debug("Queue Id: %u\n", q->queue_id);
+	pr_debug("Queue Process Vmid: %u\n", q->vmid);
+	pr_debug("Queue Read Pointer: 0x%p\n", q->read_ptr);
+	pr_debug("Queue Write Pointer: 0x%p\n", q->write_ptr);
+	pr_debug("Queue Doorbell Pointer: 0x%p\n", q->doorbell_ptr);
+	pr_debug("Queue Doorbell Offset: %u\n", q->doorbell_off);
+}
+
+void print_queue(struct queue *q)
+{
+	if (!q)
+		return;
+	pr_debug("Printing queue:\n");
+	pr_debug("Queue Type: %u\n", q->properties.type);
+	pr_debug("Queue Size: %llu\n", q->properties.queue_size);
+	pr_debug("Queue percent: %u\n", q->properties.queue_percent);
+	pr_debug("Queue Address: 0x%llX\n", q->properties.queue_address);
+	pr_debug("Queue Id: %u\n", q->properties.queue_id);
+	pr_debug("Queue Process Vmid: %u\n", q->properties.vmid);
+	pr_debug("Queue Read Pointer: 0x%p\n", q->properties.read_ptr);
+	pr_debug("Queue Write Pointer: 0x%p\n", q->properties.write_ptr);
+	pr_debug("Queue Doorbell Pointer: 0x%p\n", q->properties.doorbell_ptr);
+	pr_debug("Queue Doorbell Offset: %u\n", q->properties.doorbell_off);
+	pr_debug("Queue MQD Address: 0x%p\n", q->mqd);
+	pr_debug("Queue MQD Gart: 0x%llX\n", q->gart_mqd_addr);
+	pr_debug("Queue Process Address: 0x%p\n", q->process);
+	pr_debug("Queue Device Address: 0x%p\n", q->device);
+}
+
+int init_queue(struct queue **q, struct queue_properties properties)
+{
+	struct queue *tmp;
+
+	BUG_ON(!q);
+
+	tmp = kzalloc(sizeof(struct queue), GFP_KERNEL);
+	if (!tmp)
+		return -ENOMEM;
+
+	memcpy(&tmp->properties, &properties, sizeof(struct queue_properties));
+
+	*q = tmp;
+	return 0;
+}
+
+void uninit_queue(struct queue *q)
+{
+	kfree(q);
+}
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_topology.c b/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
new file mode 100644
index 000000000000..5733e2859e8a
--- /dev/null
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
@@ -0,0 +1,1235 @@
+/*
+ * Copyright 2014 Advanced Micro Devices, Inc.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR
+ * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
+ * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
+ * OTHER DEALINGS IN THE SOFTWARE.
+ */
+
+#include <linux/types.h>
+#include <linux/kernel.h>
+#include <linux/pci.h>
+#include <linux/errno.h>
+#include <linux/acpi.h>
+#include <linux/hash.h>
+#include <linux/cpufreq.h>
+
+#include "kfd_priv.h"
+#include "kfd_crat.h"
+#include "kfd_topology.h"
+
+static struct list_head topology_device_list;
+static int topology_crat_parsed;
+static struct kfd_system_properties sys_props;
+
+static DECLARE_RWSEM(topology_lock);
+
+struct kfd_dev *kfd_device_by_id(uint32_t gpu_id)
+{
+	struct kfd_topology_device *top_dev;
+	struct kfd_dev *device = NULL;
+
+	down_read(&topology_lock);
+
+	list_for_each_entry(top_dev, &topology_device_list, list)
+		if (top_dev->gpu_id == gpu_id) {
+			device = top_dev->gpu;
+			break;
+		}
+
+	up_read(&topology_lock);
+
+	return device;
+}
+
+struct kfd_dev *kfd_device_by_pci_dev(const struct pci_dev *pdev)
+{
+	struct kfd_topology_device *top_dev;
+	struct kfd_dev *device = NULL;
+
+	down_read(&topology_lock);
+
+	list_for_each_entry(top_dev, &topology_device_list, list)
+		if (top_dev->gpu->pdev == pdev) {
+			device = top_dev->gpu;
+			break;
+		}
+
+	up_read(&topology_lock);
+
+	return device;
+}
+
+static int kfd_topology_get_crat_acpi(void *crat_image, size_t *size)
+{
+	struct acpi_table_header *crat_table;
+	acpi_status status;
+
+	if (!size)
+		return -EINVAL;
+
+	/*
+	 * Fetch the CRAT table from ACPI
+	 */
+	status = acpi_get_table(CRAT_SIGNATURE, 0, &crat_table);
+	if (status == AE_NOT_FOUND) {
+		pr_warn("CRAT table not found\n");
+		return -ENODATA;
+	} else if (ACPI_FAILURE(status)) {
+		const char *err = acpi_format_exception(status);
+
+		pr_err("CRAT table error: %s\n", err);
+		return -EINVAL;
+	}
+
+	if (*size >= crat_table->length && crat_image != NULL)
+		memcpy(crat_image, crat_table, crat_table->length);
+
+	*size = crat_table->length;
+
+	return 0;
+}
+
+static void kfd_populated_cu_info_cpu(struct kfd_topology_device *dev,
+		struct crat_subtype_computeunit *cu)
+{
+	BUG_ON(!dev);
+	BUG_ON(!cu);
+
+	dev->node_props.cpu_cores_count = cu->num_cpu_cores;
+	dev->node_props.cpu_core_id_base = cu->processor_id_low;
+	if (cu->hsa_capability & CRAT_CU_FLAGS_IOMMU_PRESENT)
+		dev->node_props.capability |= HSA_CAP_ATS_PRESENT;
+
+	pr_info("CU CPU: cores=%d id_base=%d\n", cu->num_cpu_cores,
+			cu->processor_id_low);
+}
+
+static void kfd_populated_cu_info_gpu(struct kfd_topology_device *dev,
+		struct crat_subtype_computeunit *cu)
+{
+	BUG_ON(!dev);
+	BUG_ON(!cu);
+
+	dev->node_props.simd_id_base = cu->processor_id_low;
+	dev->node_props.simd_count = cu->num_simd_cores;
+	dev->node_props.lds_size_in_kb = cu->lds_size_in_kb;
+	dev->node_props.max_waves_per_simd = cu->max_waves_simd;
+	dev->node_props.wave_front_size = cu->wave_front_size;
+	dev->node_props.mem_banks_count = cu->num_banks;
+	dev->node_props.array_count = cu->num_arrays;
+	dev->node_props.cu_per_simd_array = cu->num_cu_per_array;
+	dev->node_props.simd_per_cu = cu->num_simd_per_cu;
+	dev->node_props.max_slots_scratch_cu = cu->max_slots_scatch_cu;
+	if (cu->hsa_capability & CRAT_CU_FLAGS_HOT_PLUGGABLE)
+		dev->node_props.capability |= HSA_CAP_HOT_PLUGGABLE;
+	pr_info("CU GPU: simds=%d id_base=%d\n", cu->num_simd_cores,
+				cu->processor_id_low);
+}
+
+/* kfd_parse_subtype_cu is called when the topology lock is already acquired */
+static int kfd_parse_subtype_cu(struct crat_subtype_computeunit *cu)
+{
+	struct kfd_topology_device *dev;
+	int i = 0;
+
+	BUG_ON(!cu);
+
+	pr_info("Found CU entry in CRAT table with proximity_domain=%d caps=%x\n",
+			cu->proximity_domain, cu->hsa_capability);
+	list_for_each_entry(dev, &topology_device_list, list) {
+		if (cu->proximity_domain == i) {
+			if (cu->flags & CRAT_CU_FLAGS_CPU_PRESENT)
+				kfd_populated_cu_info_cpu(dev, cu);
+
+			if (cu->flags & CRAT_CU_FLAGS_GPU_PRESENT)
+				kfd_populated_cu_info_gpu(dev, cu);
+			break;
+		}
+		i++;
+	}
+
+	return 0;
+}
+
+/*
+ * kfd_parse_subtype_mem is called when the topology lock is
+ * already acquired
+ */
+static int kfd_parse_subtype_mem(struct crat_subtype_memory *mem)
+{
+	struct kfd_mem_properties *props;
+	struct kfd_topology_device *dev;
+	int i = 0;
+
+	BUG_ON(!mem);
+
+	pr_info("Found memory entry in CRAT table with proximity_domain=%d\n",
+			mem->promixity_domain);
+	list_for_each_entry(dev, &topology_device_list, list) {
+		if (mem->promixity_domain == i) {
+			props = kfd_alloc_struct(props);
+			if (props == NULL)
+				return -ENOMEM;
+
+			if (dev->node_props.cpu_cores_count == 0)
+				props->heap_type = HSA_MEM_HEAP_TYPE_FB_PRIVATE;
+			else
+				props->heap_type = HSA_MEM_HEAP_TYPE_SYSTEM;
+
+			if (mem->flags & CRAT_MEM_FLAGS_HOT_PLUGGABLE)
+				props->flags |= HSA_MEM_FLAGS_HOT_PLUGGABLE;
+			if (mem->flags & CRAT_MEM_FLAGS_NON_VOLATILE)
+				props->flags |= HSA_MEM_FLAGS_NON_VOLATILE;
+
+			props->size_in_bytes =
+				((uint64_t)mem->length_high << 32) +
+							mem->length_low;
+			props->width = mem->width;
+
+			dev->mem_bank_count++;
+			list_add_tail(&props->list, &dev->mem_props);
+
+			break;
+		}
+		i++;
+	}
+
+	return 0;
+}
+
+/*
+ * kfd_parse_subtype_cache is called when the topology lock
+ * is already acquired
+ */
+static int kfd_parse_subtype_cache(struct crat_subtype_cache *cache)
+{
+	struct kfd_cache_properties *props;
+	struct kfd_topology_device *dev;
+	uint32_t id;
+
+	BUG_ON(!cache);
+
+	id = cache->processor_id_low;
+
+	pr_info("Found cache entry in CRAT table with processor_id=%d\n", id);
+	list_for_each_entry(dev, &topology_device_list, list)
+		if (id == dev->node_props.cpu_core_id_base ||
+		    id == dev->node_props.simd_id_base) {
+			props = kfd_alloc_struct(props);
+			if (props == NULL)
+				return -ENOMEM;
+
+			props->processor_id_low = id;
+			props->cache_level = cache->cache_level;
+			props->cache_size = cache->cache_size;
+			props->cacheline_size = cache->cache_line_size;
+			props->cachelines_per_tag = cache->lines_per_tag;
+			props->cache_assoc = cache->associativity;
+			props->cache_latency = cache->cache_latency;
+
+			if (cache->flags & CRAT_CACHE_FLAGS_DATA_CACHE)
+				props->cache_type |= HSA_CACHE_TYPE_DATA;
+			if (cache->flags & CRAT_CACHE_FLAGS_INST_CACHE)
+				props->cache_type |= HSA_CACHE_TYPE_INSTRUCTION;
+			if (cache->flags & CRAT_CACHE_FLAGS_CPU_CACHE)
+				props->cache_type |= HSA_CACHE_TYPE_CPU;
+			if (cache->flags & CRAT_CACHE_FLAGS_SIMD_CACHE)
+				props->cache_type |= HSA_CACHE_TYPE_HSACU;
+
+			dev->cache_count++;
+			dev->node_props.caches_count++;
+			list_add_tail(&props->list, &dev->cache_props);
+
+			break;
+		}
+
+	return 0;
+}
+
+/*
+ * kfd_parse_subtype_iolink is called when the topology lock
+ * is already acquired
+ */
+static int kfd_parse_subtype_iolink(struct crat_subtype_iolink *iolink)
+{
+	struct kfd_iolink_properties *props;
+	struct kfd_topology_device *dev;
+	uint32_t i = 0;
+	uint32_t id_from;
+	uint32_t id_to;
+
+	BUG_ON(!iolink);
+
+	id_from = iolink->proximity_domain_from;
+	id_to = iolink->proximity_domain_to;
+
+	pr_info("Found IO link entry in CRAT table with id_from=%d\n", id_from);
+	list_for_each_entry(dev, &topology_device_list, list) {
+		if (id_from == i) {
+			props = kfd_alloc_struct(props);
+			if (props == NULL)
+				return -ENOMEM;
+
+			props->node_from = id_from;
+			props->node_to = id_to;
+			props->ver_maj = iolink->version_major;
+			props->ver_min = iolink->version_minor;
+
+			/*
+			 * weight factor (derived from CDIR), currently always 1
+			 */
+			props->weight = 1;
+
+			props->min_latency = iolink->minimum_latency;
+			props->max_latency = iolink->maximum_latency;
+			props->min_bandwidth = iolink->minimum_bandwidth_mbs;
+			props->max_bandwidth = iolink->maximum_bandwidth_mbs;
+			props->rec_transfer_size =
+					iolink->recommended_transfer_size;
+
+			dev->io_link_count++;
+			dev->node_props.io_links_count++;
+			list_add_tail(&props->list, &dev->io_link_props);
+
+			break;
+		}
+		i++;
+	}
+
+	return 0;
+}
+
+static int kfd_parse_subtype(struct crat_subtype_generic *sub_type_hdr)
+{
+	struct crat_subtype_computeunit *cu;
+	struct crat_subtype_memory *mem;
+	struct crat_subtype_cache *cache;
+	struct crat_subtype_iolink *iolink;
+	int ret = 0;
+
+	BUG_ON(!sub_type_hdr);
+
+	switch (sub_type_hdr->type) {
+	case CRAT_SUBTYPE_COMPUTEUNIT_AFFINITY:
+		cu = (struct crat_subtype_computeunit *)sub_type_hdr;
+		ret = kfd_parse_subtype_cu(cu);
+		break;
+	case CRAT_SUBTYPE_MEMORY_AFFINITY:
+		mem = (struct crat_subtype_memory *)sub_type_hdr;
+		ret = kfd_parse_subtype_mem(mem);
+		break;
+	case CRAT_SUBTYPE_CACHE_AFFINITY:
+		cache = (struct crat_subtype_cache *)sub_type_hdr;
+		ret = kfd_parse_subtype_cache(cache);
+		break;
+	case CRAT_SUBTYPE_TLB_AFFINITY:
+		/*
+		 * For now, nothing to do here
+		 */
+		pr_info("Found TLB entry in CRAT table (not processing)\n");
+		break;
+	case CRAT_SUBTYPE_CCOMPUTE_AFFINITY:
+		/*
+		 * For now, nothing to do here
+		 */
+		pr_info("Found CCOMPUTE entry in CRAT table (not processing)\n");
+		break;
+	case CRAT_SUBTYPE_IOLINK_AFFINITY:
+		iolink = (struct crat_subtype_iolink *)sub_type_hdr;
+		ret = kfd_parse_subtype_iolink(iolink);
+		break;
+	default:
+		pr_warn("Unknown subtype (%d) in CRAT\n",
+				sub_type_hdr->type);
+	}
+
+	return ret;
+}
+
+static void kfd_release_topology_device(struct kfd_topology_device *dev)
+{
+	struct kfd_mem_properties *mem;
+	struct kfd_cache_properties *cache;
+	struct kfd_iolink_properties *iolink;
+
+	BUG_ON(!dev);
+
+	list_del(&dev->list);
+
+	while (dev->mem_props.next != &dev->mem_props) {
+		mem = container_of(dev->mem_props.next,
+				struct kfd_mem_properties, list);
+		list_del(&mem->list);
+		kfree(mem);
+	}
+
+	while (dev->cache_props.next != &dev->cache_props) {
+		cache = container_of(dev->cache_props.next,
+				struct kfd_cache_properties, list);
+		list_del(&cache->list);
+		kfree(cache);
+	}
+
+	while (dev->io_link_props.next != &dev->io_link_props) {
+		iolink = container_of(dev->io_link_props.next,
+				struct kfd_iolink_properties, list);
+		list_del(&iolink->list);
+		kfree(iolink);
+	}
+
+	kfree(dev);
+
+	sys_props.num_devices--;
+}
+
+static void kfd_release_live_view(void)
+{
+	struct kfd_topology_device *dev;
+
+	while (topology_device_list.next != &topology_device_list) {
+		dev = container_of(topology_device_list.next,
+				 struct kfd_topology_device, list);
+		kfd_release_topology_device(dev);
+	}
+
+	memset(&sys_props, 0, sizeof(sys_props));
+}
+
+static struct kfd_topology_device *kfd_create_topology_device(void)
+{
+	struct kfd_topology_device *dev;
+
+	dev = kfd_alloc_struct(dev);
+	if (dev == NULL) {
+		pr_err("No memory to allocate a topology device\n");
+		return NULL;
+	}
+
+	INIT_LIST_HEAD(&dev->mem_props);
+	INIT_LIST_HEAD(&dev->cache_props);
+	INIT_LIST_HEAD(&dev->io_link_props);
+
+	list_add_tail(&dev->list, &topology_device_list);
+	sys_props.num_devices++;
+
+	return dev;
+}
+
+static int kfd_parse_crat_table(void *crat_image)
+{
+	struct kfd_topology_device *top_dev;
+	struct crat_subtype_generic *sub_type_hdr;
+	uint16_t node_id;
+	int ret;
+	struct crat_header *crat_table = (struct crat_header *)crat_image;
+	uint16_t num_nodes;
+	uint32_t image_len;
+
+	if (!crat_image)
+		return -EINVAL;
+
+	num_nodes = crat_table->num_domains;
+	image_len = crat_table->length;
+
+	pr_info("Parsing CRAT table with %d nodes\n", num_nodes);
+
+	for (node_id = 0; node_id < num_nodes; node_id++) {
+		top_dev = kfd_create_topology_device();
+		if (!top_dev) {
+			kfd_release_live_view();
+			return -ENOMEM;
+		}
+	}
+
+	sys_props.platform_id =
+		(*((uint64_t *)crat_table->oem_id)) & CRAT_OEMID_64BIT_MASK;
+	sys_props.platform_oem = *((uint64_t *)crat_table->oem_table_id);
+	sys_props.platform_rev = crat_table->revision;
+
+	sub_type_hdr = (struct crat_subtype_generic *)(crat_table+1);
+	while ((char *)sub_type_hdr + sizeof(struct crat_subtype_generic) <
+			((char *)crat_image) + image_len) {
+		if (sub_type_hdr->flags & CRAT_SUBTYPE_FLAGS_ENABLED) {
+			ret = kfd_parse_subtype(sub_type_hdr);
+			if (ret != 0) {
+				kfd_release_live_view();
+				return ret;
+			}
+		}
+
+		sub_type_hdr = (typeof(sub_type_hdr))((char *)sub_type_hdr +
+				sub_type_hdr->length);
+	}
+
+	sys_props.generation_count++;
+	topology_crat_parsed = 1;
+
+	return 0;
+}
+
+
+#define sysfs_show_gen_prop(buffer, fmt, ...) \
+		snprintf(buffer, PAGE_SIZE, "%s"fmt, buffer, __VA_ARGS__)
+#define sysfs_show_32bit_prop(buffer, name, value) \
+		sysfs_show_gen_prop(buffer, "%s %u\n", name, value)
+#define sysfs_show_64bit_prop(buffer, name, value) \
+		sysfs_show_gen_prop(buffer, "%s %llu\n", name, value)
+#define sysfs_show_32bit_val(buffer, value) \
+		sysfs_show_gen_prop(buffer, "%u\n", value)
+#define sysfs_show_str_val(buffer, value) \
+		sysfs_show_gen_prop(buffer, "%s\n", value)
+
+static ssize_t sysprops_show(struct kobject *kobj, struct attribute *attr,
+		char *buffer)
+{
+	ssize_t ret;
+
+	/* Making sure that the buffer is an empty string */
+	buffer[0] = 0;
+
+	if (attr == &sys_props.attr_genid) {
+		ret = sysfs_show_32bit_val(buffer, sys_props.generation_count);
+	} else if (attr == &sys_props.attr_props) {
+		sysfs_show_64bit_prop(buffer, "platform_oem",
+				sys_props.platform_oem);
+		sysfs_show_64bit_prop(buffer, "platform_id",
+				sys_props.platform_id);
+		ret = sysfs_show_64bit_prop(buffer, "platform_rev",
+				sys_props.platform_rev);
+	} else {
+		ret = -EINVAL;
+	}
+
+	return ret;
+}
+
+static const struct sysfs_ops sysprops_ops = {
+	.show = sysprops_show,
+};
+
+static struct kobj_type sysprops_type = {
+	.sysfs_ops = &sysprops_ops,
+};
+
+static ssize_t iolink_show(struct kobject *kobj, struct attribute *attr,
+		char *buffer)
+{
+	ssize_t ret;
+	struct kfd_iolink_properties *iolink;
+
+	/* Making sure that the buffer is an empty string */
+	buffer[0] = 0;
+
+	iolink = container_of(attr, struct kfd_iolink_properties, attr);
+	sysfs_show_32bit_prop(buffer, "type", iolink->iolink_type);
+	sysfs_show_32bit_prop(buffer, "version_major", iolink->ver_maj);
+	sysfs_show_32bit_prop(buffer, "version_minor", iolink->ver_min);
+	sysfs_show_32bit_prop(buffer, "node_from", iolink->node_from);
+	sysfs_show_32bit_prop(buffer, "node_to", iolink->node_to);
+	sysfs_show_32bit_prop(buffer, "weight", iolink->weight);
+	sysfs_show_32bit_prop(buffer, "min_latency", iolink->min_latency);
+	sysfs_show_32bit_prop(buffer, "max_latency", iolink->max_latency);
+	sysfs_show_32bit_prop(buffer, "min_bandwidth", iolink->min_bandwidth);
+	sysfs_show_32bit_prop(buffer, "max_bandwidth", iolink->max_bandwidth);
+	sysfs_show_32bit_prop(buffer, "recommended_transfer_size",
+			iolink->rec_transfer_size);
+	ret = sysfs_show_32bit_prop(buffer, "flags", iolink->flags);
+
+	return ret;
+}
+
+static const struct sysfs_ops iolink_ops = {
+	.show = iolink_show,
+};
+
+static struct kobj_type iolink_type = {
+	.sysfs_ops = &iolink_ops,
+};
+
+static ssize_t mem_show(struct kobject *kobj, struct attribute *attr,
+		char *buffer)
+{
+	ssize_t ret;
+	struct kfd_mem_properties *mem;
+
+	/* Making sure that the buffer is an empty string */
+	buffer[0] = 0;
+
+	mem = container_of(attr, struct kfd_mem_properties, attr);
+	sysfs_show_32bit_prop(buffer, "heap_type", mem->heap_type);
+	sysfs_show_64bit_prop(buffer, "size_in_bytes", mem->size_in_bytes);
+	sysfs_show_32bit_prop(buffer, "flags", mem->flags);
+	sysfs_show_32bit_prop(buffer, "width", mem->width);
+	ret = sysfs_show_32bit_prop(buffer, "mem_clk_max", mem->mem_clk_max);
+
+	return ret;
+}
+
+static const struct sysfs_ops mem_ops = {
+	.show = mem_show,
+};
+
+static struct kobj_type mem_type = {
+	.sysfs_ops = &mem_ops,
+};
+
+static ssize_t kfd_cache_show(struct kobject *kobj, struct attribute *attr,
+		char *buffer)
+{
+	ssize_t ret;
+	uint32_t i;
+	struct kfd_cache_properties *cache;
+
+	/* Making sure that the buffer is an empty string */
+	buffer[0] = 0;
+
+	cache = container_of(attr, struct kfd_cache_properties, attr);
+	sysfs_show_32bit_prop(buffer, "processor_id_low",
+			cache->processor_id_low);
+	sysfs_show_32bit_prop(buffer, "level", cache->cache_level);
+	sysfs_show_32bit_prop(buffer, "size", cache->cache_size);
+	sysfs_show_32bit_prop(buffer, "cache_line_size", cache->cacheline_size);
+	sysfs_show_32bit_prop(buffer, "cache_lines_per_tag",
+			cache->cachelines_per_tag);
+	sysfs_show_32bit_prop(buffer, "association", cache->cache_assoc);
+	sysfs_show_32bit_prop(buffer, "latency", cache->cache_latency);
+	sysfs_show_32bit_prop(buffer, "type", cache->cache_type);
+	snprintf(buffer, PAGE_SIZE, "%ssibling_map ", buffer);
+	for (i = 0; i < KFD_TOPOLOGY_CPU_SIBLINGS; i++)
+		ret = snprintf(buffer, PAGE_SIZE, "%s%d%s",
+				buffer, cache->sibling_map[i],
+				(i == KFD_TOPOLOGY_CPU_SIBLINGS-1) ?
+						"\n" : ",");
+
+	return ret;
+}
+
+static const struct sysfs_ops cache_ops = {
+	.show = kfd_cache_show,
+};
+
+static struct kobj_type cache_type = {
+	.sysfs_ops = &cache_ops,
+};
+
+static ssize_t node_show(struct kobject *kobj, struct attribute *attr,
+		char *buffer)
+{
+	ssize_t ret;
+	struct kfd_topology_device *dev;
+	char public_name[KFD_TOPOLOGY_PUBLIC_NAME_SIZE];
+	uint32_t i;
+
+	/* Making sure that the buffer is an empty string */
+	buffer[0] = 0;
+
+	if (strcmp(attr->name, "gpu_id") == 0) {
+		dev = container_of(attr, struct kfd_topology_device,
+				attr_gpuid);
+		ret = sysfs_show_32bit_val(buffer, dev->gpu_id);
+	} else if (strcmp(attr->name, "name") == 0) {
+		dev = container_of(attr, struct kfd_topology_device,
+				attr_name);
+		for (i = 0; i < KFD_TOPOLOGY_PUBLIC_NAME_SIZE; i++) {
+			public_name[i] =
+					(char)dev->node_props.marketing_name[i];
+			if (dev->node_props.marketing_name[i] == 0)
+				break;
+		}
+		public_name[KFD_TOPOLOGY_PUBLIC_NAME_SIZE-1] = 0x0;
+		ret = sysfs_show_str_val(buffer, public_name);
+	} else {
+		dev = container_of(attr, struct kfd_topology_device,
+				attr_props);
+		sysfs_show_32bit_prop(buffer, "cpu_cores_count",
+				dev->node_props.cpu_cores_count);
+		sysfs_show_32bit_prop(buffer, "simd_count",
+				dev->node_props.simd_count);
+
+		if (dev->mem_bank_count < dev->node_props.mem_banks_count) {
+			pr_warn("kfd: mem_banks_count truncated from %d to %d\n",
+					dev->node_props.mem_banks_count,
+					dev->mem_bank_count);
+			sysfs_show_32bit_prop(buffer, "mem_banks_count",
+					dev->mem_bank_count);
+		} else {
+			sysfs_show_32bit_prop(buffer, "mem_banks_count",
+					dev->node_props.mem_banks_count);
+		}
+
+		sysfs_show_32bit_prop(buffer, "caches_count",
+				dev->node_props.caches_count);
+		sysfs_show_32bit_prop(buffer, "io_links_count",
+				dev->node_props.io_links_count);
+		sysfs_show_32bit_prop(buffer, "cpu_core_id_base",
+				dev->node_props.cpu_core_id_base);
+		sysfs_show_32bit_prop(buffer, "simd_id_base",
+				dev->node_props.simd_id_base);
+		sysfs_show_32bit_prop(buffer, "capability",
+				dev->node_props.capability);
+		sysfs_show_32bit_prop(buffer, "max_waves_per_simd",
+				dev->node_props.max_waves_per_simd);
+		sysfs_show_32bit_prop(buffer, "lds_size_in_kb",
+				dev->node_props.lds_size_in_kb);
+		sysfs_show_32bit_prop(buffer, "gds_size_in_kb",
+				dev->node_props.gds_size_in_kb);
+		sysfs_show_32bit_prop(buffer, "wave_front_size",
+				dev->node_props.wave_front_size);
+		sysfs_show_32bit_prop(buffer, "array_count",
+				dev->node_props.array_count);
+		sysfs_show_32bit_prop(buffer, "simd_arrays_per_engine",
+				dev->node_props.simd_arrays_per_engine);
+		sysfs_show_32bit_prop(buffer, "cu_per_simd_array",
+				dev->node_props.cu_per_simd_array);
+		sysfs_show_32bit_prop(buffer, "simd_per_cu",
+				dev->node_props.simd_per_cu);
+		sysfs_show_32bit_prop(buffer, "max_slots_scratch_cu",
+				dev->node_props.max_slots_scratch_cu);
+		sysfs_show_32bit_prop(buffer, "engine_id",
+				dev->node_props.engine_id);
+		sysfs_show_32bit_prop(buffer, "vendor_id",
+				dev->node_props.vendor_id);
+		sysfs_show_32bit_prop(buffer, "device_id",
+				dev->node_props.device_id);
+		sysfs_show_32bit_prop(buffer, "location_id",
+				dev->node_props.location_id);
+
+		if (dev->gpu) {
+			sysfs_show_32bit_prop(buffer, "max_engine_clk_fcompute",
+					kfd2kgd->get_max_engine_clock_in_mhz(
+						dev->gpu->kgd));
+			sysfs_show_64bit_prop(buffer, "local_mem_size",
+					kfd2kgd->get_vmem_size(dev->gpu->kgd));
+		}
+
+		ret = sysfs_show_32bit_prop(buffer, "max_engine_clk_ccompute",
+				cpufreq_quick_get_max(0)/1000);
+	}
+
+	return ret;
+}
+
+static const struct sysfs_ops node_ops = {
+	.show = node_show,
+};
+
+static struct kobj_type node_type = {
+	.sysfs_ops = &node_ops,
+};
+
+static void kfd_remove_sysfs_file(struct kobject *kobj, struct attribute *attr)
+{
+	sysfs_remove_file(kobj, attr);
+	kobject_del(kobj);
+	kobject_put(kobj);
+}
+
+static void kfd_remove_sysfs_node_entry(struct kfd_topology_device *dev)
+{
+	struct kfd_iolink_properties *iolink;
+	struct kfd_cache_properties *cache;
+	struct kfd_mem_properties *mem;
+
+	BUG_ON(!dev);
+
+	if (dev->kobj_iolink) {
+		list_for_each_entry(iolink, &dev->io_link_props, list)
+			if (iolink->kobj) {
+				kfd_remove_sysfs_file(iolink->kobj,
+							&iolink->attr);
+				iolink->kobj = NULL;
+			}
+		kobject_del(dev->kobj_iolink);
+		kobject_put(dev->kobj_iolink);
+		dev->kobj_iolink = NULL;
+	}
+
+	if (dev->kobj_cache) {
+		list_for_each_entry(cache, &dev->cache_props, list)
+			if (cache->kobj) {
+				kfd_remove_sysfs_file(cache->kobj,
+							&cache->attr);
+				cache->kobj = NULL;
+			}
+		kobject_del(dev->kobj_cache);
+		kobject_put(dev->kobj_cache);
+		dev->kobj_cache = NULL;
+	}
+
+	if (dev->kobj_mem) {
+		list_for_each_entry(mem, &dev->mem_props, list)
+			if (mem->kobj) {
+				kfd_remove_sysfs_file(mem->kobj, &mem->attr);
+				mem->kobj = NULL;
+			}
+		kobject_del(dev->kobj_mem);
+		kobject_put(dev->kobj_mem);
+		dev->kobj_mem = NULL;
+	}
+
+	if (dev->kobj_node) {
+		sysfs_remove_file(dev->kobj_node, &dev->attr_gpuid);
+		sysfs_remove_file(dev->kobj_node, &dev->attr_name);
+		sysfs_remove_file(dev->kobj_node, &dev->attr_props);
+		kobject_del(dev->kobj_node);
+		kobject_put(dev->kobj_node);
+		dev->kobj_node = NULL;
+	}
+}
+
+static int kfd_build_sysfs_node_entry(struct kfd_topology_device *dev,
+		uint32_t id)
+{
+	struct kfd_iolink_properties *iolink;
+	struct kfd_cache_properties *cache;
+	struct kfd_mem_properties *mem;
+	int ret;
+	uint32_t i;
+
+	BUG_ON(!dev);
+
+	/*
+	 * Creating the sysfs folders
+	 */
+	BUG_ON(dev->kobj_node);
+	dev->kobj_node = kfd_alloc_struct(dev->kobj_node);
+	if (!dev->kobj_node)
+		return -ENOMEM;
+
+	ret = kobject_init_and_add(dev->kobj_node, &node_type,
+			sys_props.kobj_nodes, "%d", id);
+	if (ret < 0)
+		return ret;
+
+	dev->kobj_mem = kobject_create_and_add("mem_banks", dev->kobj_node);
+	if (!dev->kobj_mem)
+		return -ENOMEM;
+
+	dev->kobj_cache = kobject_create_and_add("caches", dev->kobj_node);
+	if (!dev->kobj_cache)
+		return -ENOMEM;
+
+	dev->kobj_iolink = kobject_create_and_add("io_links", dev->kobj_node);
+	if (!dev->kobj_iolink)
+		return -ENOMEM;
+
+	/*
+	 * Creating sysfs files for node properties
+	 */
+	dev->attr_gpuid.name = "gpu_id";
+	dev->attr_gpuid.mode = KFD_SYSFS_FILE_MODE;
+	sysfs_attr_init(&dev->attr_gpuid);
+	dev->attr_name.name = "name";
+	dev->attr_name.mode = KFD_SYSFS_FILE_MODE;
+	sysfs_attr_init(&dev->attr_name);
+	dev->attr_props.name = "properties";
+	dev->attr_props.mode = KFD_SYSFS_FILE_MODE;
+	sysfs_attr_init(&dev->attr_props);
+	ret = sysfs_create_file(dev->kobj_node, &dev->attr_gpuid);
+	if (ret < 0)
+		return ret;
+	ret = sysfs_create_file(dev->kobj_node, &dev->attr_name);
+	if (ret < 0)
+		return ret;
+	ret = sysfs_create_file(dev->kobj_node, &dev->attr_props);
+	if (ret < 0)
+		return ret;
+
+	i = 0;
+	list_for_each_entry(mem, &dev->mem_props, list) {
+		mem->kobj = kzalloc(sizeof(struct kobject), GFP_KERNEL);
+		if (!mem->kobj)
+			return -ENOMEM;
+		ret = kobject_init_and_add(mem->kobj, &mem_type,
+				dev->kobj_mem, "%d", i);
+		if (ret < 0)
+			return ret;
+
+		mem->attr.name = "properties";
+		mem->attr.mode = KFD_SYSFS_FILE_MODE;
+		sysfs_attr_init(&mem->attr);
+		ret = sysfs_create_file(mem->kobj, &mem->attr);
+		if (ret < 0)
+			return ret;
+		i++;
+	}
+
+	i = 0;
+	list_for_each_entry(cache, &dev->cache_props, list) {
+		cache->kobj = kzalloc(sizeof(struct kobject), GFP_KERNEL);
+		if (!cache->kobj)
+			return -ENOMEM;
+		ret = kobject_init_and_add(cache->kobj, &cache_type,
+				dev->kobj_cache, "%d", i);
+		if (ret < 0)
+			return ret;
+
+		cache->attr.name = "properties";
+		cache->attr.mode = KFD_SYSFS_FILE_MODE;
+		sysfs_attr_init(&cache->attr);
+		ret = sysfs_create_file(cache->kobj, &cache->attr);
+		if (ret < 0)
+			return ret;
+		i++;
+	}
+
+	i = 0;
+	list_for_each_entry(iolink, &dev->io_link_props, list) {
+		iolink->kobj = kzalloc(sizeof(struct kobject), GFP_KERNEL);
+		if (!iolink->kobj)
+			return -ENOMEM;
+		ret = kobject_init_and_add(iolink->kobj, &iolink_type,
+				dev->kobj_iolink, "%d", i);
+		if (ret < 0)
+			return ret;
+
+		iolink->attr.name = "properties";
+		iolink->attr.mode = KFD_SYSFS_FILE_MODE;
+		sysfs_attr_init(&iolink->attr);
+		ret = sysfs_create_file(iolink->kobj, &iolink->attr);
+		if (ret < 0)
+			return ret;
+		i++;
+	}
+
+	return 0;
+}
+
+static int kfd_build_sysfs_node_tree(void)
+{
+	struct kfd_topology_device *dev;
+	int ret;
+	uint32_t i = 0;
+
+	list_for_each_entry(dev, &topology_device_list, list) {
+		ret = kfd_build_sysfs_node_entry(dev, i);
+		if (ret < 0)
+			return ret;
+		i++;
+	}
+
+	return 0;
+}
+
+static void kfd_remove_sysfs_node_tree(void)
+{
+	struct kfd_topology_device *dev;
+
+	list_for_each_entry(dev, &topology_device_list, list)
+		kfd_remove_sysfs_node_entry(dev);
+}
+
+static int kfd_topology_update_sysfs(void)
+{
+	int ret;
+
+	pr_info("Creating topology SYSFS entries\n");
+	if (sys_props.kobj_topology == NULL) {
+		sys_props.kobj_topology =
+				kfd_alloc_struct(sys_props.kobj_topology);
+		if (!sys_props.kobj_topology)
+			return -ENOMEM;
+
+		ret = kobject_init_and_add(sys_props.kobj_topology,
+				&sysprops_type,  &kfd_device->kobj,
+				"topology");
+		if (ret < 0)
+			return ret;
+
+		sys_props.kobj_nodes = kobject_create_and_add("nodes",
+				sys_props.kobj_topology);
+		if (!sys_props.kobj_nodes)
+			return -ENOMEM;
+
+		sys_props.attr_genid.name = "generation_id";
+		sys_props.attr_genid.mode = KFD_SYSFS_FILE_MODE;
+		sysfs_attr_init(&sys_props.attr_genid);
+		ret = sysfs_create_file(sys_props.kobj_topology,
+				&sys_props.attr_genid);
+		if (ret < 0)
+			return ret;
+
+		sys_props.attr_props.name = "system_properties";
+		sys_props.attr_props.mode = KFD_SYSFS_FILE_MODE;
+		sysfs_attr_init(&sys_props.attr_props);
+		ret = sysfs_create_file(sys_props.kobj_topology,
+				&sys_props.attr_props);
+		if (ret < 0)
+			return ret;
+	}
+
+	kfd_remove_sysfs_node_tree();
+
+	return kfd_build_sysfs_node_tree();
+}
+
+static void kfd_topology_release_sysfs(void)
+{
+	kfd_remove_sysfs_node_tree();
+	if (sys_props.kobj_topology) {
+		sysfs_remove_file(sys_props.kobj_topology,
+				&sys_props.attr_genid);
+		sysfs_remove_file(sys_props.kobj_topology,
+				&sys_props.attr_props);
+		if (sys_props.kobj_nodes) {
+			kobject_del(sys_props.kobj_nodes);
+			kobject_put(sys_props.kobj_nodes);
+			sys_props.kobj_nodes = NULL;
+		}
+		kobject_del(sys_props.kobj_topology);
+		kobject_put(sys_props.kobj_topology);
+		sys_props.kobj_topology = NULL;
+	}
+}
+
+int kfd_topology_init(void)
+{
+	void *crat_image = NULL;
+	size_t image_size = 0;
+	int ret;
+
+	/*
+	 * Initialize the head for the topology device list
+	 */
+	INIT_LIST_HEAD(&topology_device_list);
+	init_rwsem(&topology_lock);
+	topology_crat_parsed = 0;
+
+	memset(&sys_props, 0, sizeof(sys_props));
+
+	/*
+	 * Get the CRAT image from the ACPI
+	 */
+	ret = kfd_topology_get_crat_acpi(crat_image, &image_size);
+	if (ret == 0 && image_size > 0) {
+		pr_info("Found CRAT image with size=%zu\n", image_size);
+		crat_image = kmalloc(image_size, GFP_KERNEL);
+		if (!crat_image) {
+			ret = -ENOMEM;
+			pr_err("No memory for allocating CRAT image\n");
+			goto err;
+		}
+		ret = kfd_topology_get_crat_acpi(crat_image, &image_size);
+
+		if (ret == 0) {
+			down_write(&topology_lock);
+			ret = kfd_parse_crat_table(crat_image);
+			if (ret == 0)
+				ret = kfd_topology_update_sysfs();
+			up_write(&topology_lock);
+		} else {
+			pr_err("Couldn't get CRAT table from ACPI\n");
+		}
+		kfree(crat_image);
+	} else if (ret == -ENODATA) {
+		ret = 0;
+	} else {
+		pr_err("Couldn't get CRAT table size from ACPI\n");
+	}
+
+err:
+	pr_info("Finished initializing topology ret=%d\n", ret);
+	return ret;
+}
+
+void kfd_topology_shutdown(void)
+{
+	kfd_topology_release_sysfs();
+	kfd_release_live_view();
+}
+
+static void kfd_debug_print_topology(void)
+{
+	struct kfd_topology_device *dev;
+	uint32_t i = 0;
+
+	pr_info("DEBUG PRINT OF TOPOLOGY:\n");
+	list_for_each_entry(dev, &topology_device_list, list) {
+		pr_info("Node: %d\n", i);
+		pr_info("\tGPU assigned: %s\n", (dev->gpu ? "yes" : "no"));
+		pr_info("\tCPU count: %d\n", dev->node_props.cpu_cores_count);
+		pr_info("\tSIMD count: %d\n", dev->node_props.simd_count);
+		i++;
+	}
+}
+
+static uint32_t kfd_generate_gpu_id(struct kfd_dev *gpu)
+{
+	uint32_t hashout;
+	uint32_t buf[7];
+	int i;
+
+	if (!gpu)
+		return 0;
+
+	buf[0] = gpu->pdev->devfn;
+	buf[1] = gpu->pdev->subsystem_vendor;
+	buf[2] = gpu->pdev->subsystem_device;
+	buf[3] = gpu->pdev->device;
+	buf[4] = gpu->pdev->bus->number;
+	buf[5] = (uint32_t)(kfd2kgd->get_vmem_size(gpu->kgd) & 0xffffffff);
+	buf[6] = (uint32_t)(kfd2kgd->get_vmem_size(gpu->kgd) >> 32);
+
+	for (i = 0, hashout = 0; i < 7; i++)
+		hashout ^= hash_32(buf[i], KFD_GPU_ID_HASH_WIDTH);
+
+	return hashout;
+}
+
+static struct kfd_topology_device *kfd_assign_gpu(struct kfd_dev *gpu)
+{
+	struct kfd_topology_device *dev;
+	struct kfd_topology_device *out_dev = NULL;
+
+	BUG_ON(!gpu);
+
+	list_for_each_entry(dev, &topology_device_list, list)
+		if (dev->gpu == NULL && dev->node_props.simd_count > 0) {
+			dev->gpu = gpu;
+			out_dev = dev;
+			break;
+		}
+
+	return out_dev;
+}
+
+static void kfd_notify_gpu_change(uint32_t gpu_id, int arrival)
+{
+	/*
+	 * TODO: Generate an event for thunk about the arrival/removal
+	 * of the GPU
+	 */
+}
+
+int kfd_topology_add_device(struct kfd_dev *gpu)
+{
+	uint32_t gpu_id;
+	struct kfd_topology_device *dev;
+	int res;
+
+	BUG_ON(!gpu);
+
+	gpu_id = kfd_generate_gpu_id(gpu);
+
+	pr_debug("kfd: Adding new GPU (ID: 0x%x) to topology\n", gpu_id);
+
+	down_write(&topology_lock);
+	/*
+	 * Try to assign the GPU to an existing topology device (generated
+	 * from the CRAT table)
+	 */
+	dev = kfd_assign_gpu(gpu);
+	if (!dev) {
+		pr_info("GPU was not found in the current topology. Extending.\n");
+		kfd_debug_print_topology();
+		dev = kfd_create_topology_device();
+		if (!dev) {
+			res = -ENOMEM;
+			goto err;
+		}
+		dev->gpu = gpu;
+
+		/*
+		 * TODO: Make a call to retrieve topology information from the
+		 * GPU vBIOS
+		 */
+
+		/*
+		 * Update the SYSFS tree, since we added another topology device
+		 */
+		if (kfd_topology_update_sysfs() < 0)
+			kfd_topology_release_sysfs();
+
+	}
+
+	dev->gpu_id = gpu_id;
+	gpu->id = gpu_id;
+	dev->node_props.vendor_id = gpu->pdev->vendor;
+	dev->node_props.device_id = gpu->pdev->device;
+	dev->node_props.location_id = (gpu->pdev->bus->number << 24) +
+			(gpu->pdev->devfn & 0xffffff);
+	/*
+	 * TODO: Retrieve max engine clock values from KGD
+	 */
+
+	res = 0;
+
+err:
+	up_write(&topology_lock);
+
+	if (res == 0)
+		kfd_notify_gpu_change(gpu_id, 1);
+
+	return res;
+}
+
+int kfd_topology_remove_device(struct kfd_dev *gpu)
+{
+	struct kfd_topology_device *dev;
+	uint32_t gpu_id;
+	int res = -ENODEV;
+
+	BUG_ON(!gpu);
+
+	down_write(&topology_lock);
+
+	list_for_each_entry(dev, &topology_device_list, list)
+		if (dev->gpu == gpu) {
+			gpu_id = dev->gpu_id;
+			kfd_remove_sysfs_node_entry(dev);
+			kfd_release_topology_device(dev);
+			res = 0;
+			if (kfd_topology_update_sysfs() < 0)
+				kfd_topology_release_sysfs();
+			break;
+		}
+
+	up_write(&topology_lock);
+
+	if (res == 0)
+		kfd_notify_gpu_change(gpu_id, 0);
+
+	return res;
+}
+
+/*
+ * When idx is out of bounds, the function will return NULL
+ */
+struct kfd_dev *kfd_topology_enum_kfd_devices(uint8_t idx)
+{
+
+	struct kfd_topology_device *top_dev;
+	struct kfd_dev *device = NULL;
+	uint8_t device_idx = 0;
+
+	down_read(&topology_lock);
+
+	list_for_each_entry(top_dev, &topology_device_list, list) {
+		if (device_idx == idx) {
+			device = top_dev->gpu;
+			break;
+		}
+
+		device_idx++;
+	}
+
+	up_read(&topology_lock);
+
+	return device;
+
+}
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_topology.h b/drivers/gpu/drm/amd/amdkfd/kfd_topology.h
new file mode 100644
index 000000000000..989624b3cd14
--- /dev/null
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_topology.h
@@ -0,0 +1,168 @@
+/*
+ * Copyright 2014 Advanced Micro Devices, Inc.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR
+ * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
+ * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
+ * OTHER DEALINGS IN THE SOFTWARE.
+ */
+
+#ifndef __KFD_TOPOLOGY_H__
+#define __KFD_TOPOLOGY_H__
+
+#include <linux/types.h>
+#include <linux/list.h>
+#include "kfd_priv.h"
+
+#define KFD_TOPOLOGY_PUBLIC_NAME_SIZE 128
+
+#define HSA_CAP_HOT_PLUGGABLE			0x00000001
+#define HSA_CAP_ATS_PRESENT			0x00000002
+#define HSA_CAP_SHARED_WITH_GRAPHICS		0x00000004
+#define HSA_CAP_QUEUE_SIZE_POW2			0x00000008
+#define HSA_CAP_QUEUE_SIZE_32BIT		0x00000010
+#define HSA_CAP_QUEUE_IDLE_EVENT		0x00000020
+#define HSA_CAP_VA_LIMIT			0x00000040
+#define HSA_CAP_WATCH_POINTS_SUPPORTED		0x00000080
+#define HSA_CAP_WATCH_POINTS_TOTALBITS_MASK	0x00000f00
+#define HSA_CAP_WATCH_POINTS_TOTALBITS_SHIFT	8
+#define HSA_CAP_RESERVED			0xfffff000
+
+struct kfd_node_properties {
+	uint32_t cpu_cores_count;
+	uint32_t simd_count;
+	uint32_t mem_banks_count;
+	uint32_t caches_count;
+	uint32_t io_links_count;
+	uint32_t cpu_core_id_base;
+	uint32_t simd_id_base;
+	uint32_t capability;
+	uint32_t max_waves_per_simd;
+	uint32_t lds_size_in_kb;
+	uint32_t gds_size_in_kb;
+	uint32_t wave_front_size;
+	uint32_t array_count;
+	uint32_t simd_arrays_per_engine;
+	uint32_t cu_per_simd_array;
+	uint32_t simd_per_cu;
+	uint32_t max_slots_scratch_cu;
+	uint32_t engine_id;
+	uint32_t vendor_id;
+	uint32_t device_id;
+	uint32_t location_id;
+	uint32_t max_engine_clk_fcompute;
+	uint32_t max_engine_clk_ccompute;
+	uint16_t marketing_name[KFD_TOPOLOGY_PUBLIC_NAME_SIZE];
+};
+
+#define HSA_MEM_HEAP_TYPE_SYSTEM	0
+#define HSA_MEM_HEAP_TYPE_FB_PUBLIC	1
+#define HSA_MEM_HEAP_TYPE_FB_PRIVATE	2
+#define HSA_MEM_HEAP_TYPE_GPU_GDS	3
+#define HSA_MEM_HEAP_TYPE_GPU_LDS	4
+#define HSA_MEM_HEAP_TYPE_GPU_SCRATCH	5
+
+#define HSA_MEM_FLAGS_HOT_PLUGGABLE	0x00000001
+#define HSA_MEM_FLAGS_NON_VOLATILE	0x00000002
+#define HSA_MEM_FLAGS_RESERVED		0xfffffffc
+
+struct kfd_mem_properties {
+	struct list_head	list;
+	uint32_t		heap_type;
+	uint64_t		size_in_bytes;
+	uint32_t		flags;
+	uint32_t		width;
+	uint32_t		mem_clk_max;
+	struct kobject		*kobj;
+	struct attribute	attr;
+};
+
+#define KFD_TOPOLOGY_CPU_SIBLINGS 256
+
+#define HSA_CACHE_TYPE_DATA		0x00000001
+#define HSA_CACHE_TYPE_INSTRUCTION	0x00000002
+#define HSA_CACHE_TYPE_CPU		0x00000004
+#define HSA_CACHE_TYPE_HSACU		0x00000008
+#define HSA_CACHE_TYPE_RESERVED		0xfffffff0
+
+struct kfd_cache_properties {
+	struct list_head	list;
+	uint32_t		processor_id_low;
+	uint32_t		cache_level;
+	uint32_t		cache_size;
+	uint32_t		cacheline_size;
+	uint32_t		cachelines_per_tag;
+	uint32_t		cache_assoc;
+	uint32_t		cache_latency;
+	uint32_t		cache_type;
+	uint8_t			sibling_map[KFD_TOPOLOGY_CPU_SIBLINGS];
+	struct kobject		*kobj;
+	struct attribute	attr;
+};
+
+struct kfd_iolink_properties {
+	struct list_head	list;
+	uint32_t		iolink_type;
+	uint32_t		ver_maj;
+	uint32_t		ver_min;
+	uint32_t		node_from;
+	uint32_t		node_to;
+	uint32_t		weight;
+	uint32_t		min_latency;
+	uint32_t		max_latency;
+	uint32_t		min_bandwidth;
+	uint32_t		max_bandwidth;
+	uint32_t		rec_transfer_size;
+	uint32_t		flags;
+	struct kobject		*kobj;
+	struct attribute	attr;
+};
+
+struct kfd_topology_device {
+	struct list_head		list;
+	uint32_t			gpu_id;
+	struct kfd_node_properties	node_props;
+	uint32_t			mem_bank_count;
+	struct list_head		mem_props;
+	uint32_t			cache_count;
+	struct list_head		cache_props;
+	uint32_t			io_link_count;
+	struct list_head		io_link_props;
+	struct kfd_dev			*gpu;
+	struct kobject			*kobj_node;
+	struct kobject			*kobj_mem;
+	struct kobject			*kobj_cache;
+	struct kobject			*kobj_iolink;
+	struct attribute		attr_gpuid;
+	struct attribute		attr_name;
+	struct attribute		attr_props;
+};
+
+struct kfd_system_properties {
+	uint32_t		num_devices;     /* Number of H-NUMA nodes */
+	uint32_t		generation_count;
+	uint64_t		platform_oem;
+	uint64_t		platform_id;
+	uint64_t		platform_rev;
+	struct kobject		*kobj_topology;
+	struct kobject		*kobj_nodes;
+	struct attribute	attr_genid;
+	struct attribute	attr_props;
+};
+
+
+
+#endif /* __KFD_TOPOLOGY_H__ */
diff --git a/drivers/gpu/drm/amd/include/kgd_kfd_interface.h b/drivers/gpu/drm/amd/include/kgd_kfd_interface.h
new file mode 100644
index 000000000000..9c729dd8dd50
--- /dev/null
+++ b/drivers/gpu/drm/amd/include/kgd_kfd_interface.h
@@ -0,0 +1,185 @@
+/*
+ * Copyright 2014 Advanced Micro Devices, Inc.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR
+ * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
+ * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
+ * OTHER DEALINGS IN THE SOFTWARE.
+ */
+
+/*
+ * This file defines the private interface between the
+ * AMD kernel graphics drivers and the AMD KFD.
+ */
+
+#ifndef KGD_KFD_INTERFACE_H_INCLUDED
+#define KGD_KFD_INTERFACE_H_INCLUDED
+
+#include <linux/types.h>
+
+struct pci_dev;
+
+#define KFD_INTERFACE_VERSION 1
+
+struct kfd_dev;
+struct kgd_dev;
+
+struct kgd_mem;
+
+enum kgd_memory_pool {
+	KGD_POOL_SYSTEM_CACHEABLE = 1,
+	KGD_POOL_SYSTEM_WRITECOMBINE = 2,
+	KGD_POOL_FRAMEBUFFER = 3,
+};
+
+struct kgd2kfd_shared_resources {
+	/* Bit n == 1 means VMID n is available for KFD. */
+	unsigned int compute_vmid_bitmap;
+
+	/* Compute pipes are counted starting from MEC0/pipe0 as 0. */
+	unsigned int first_compute_pipe;
+
+	/* Number of MEC pipes available for KFD. */
+	unsigned int compute_pipe_count;
+
+	/* Base address of doorbell aperture. */
+	phys_addr_t doorbell_physical_address;
+
+	/* Size in bytes of doorbell aperture. */
+	size_t doorbell_aperture_size;
+
+	/* Number of bytes at start of aperture reserved for KGD. */
+	size_t doorbell_start_offset;
+};
+
+/**
+ * struct kgd2kfd_calls
+ *
+ * @exit: Notifies amdkfd that kgd module is unloaded
+ *
+ * @probe: Notifies amdkfd about a probe done on a device in the kgd driver.
+ *
+ * @device_init: Initialize the newly probed device (if it is a device that
+ * amdkfd supports)
+ *
+ * @device_exit: Notifies amdkfd about a removal of a kgd device
+ *
+ * @suspend: Notifies amdkfd about a suspend action done to a kgd device
+ *
+ * @resume: Notifies amdkfd about a resume action done to a kgd device
+ *
+ * This structure contains function callback pointers so the kgd driver
+ * can notify amdkfd about certain status changes.
+ *
+ */
+struct kgd2kfd_calls {
+	void (*exit)(void);
+	struct kfd_dev* (*probe)(struct kgd_dev *kgd, struct pci_dev *pdev);
+	bool (*device_init)(struct kfd_dev *kfd,
+			const struct kgd2kfd_shared_resources *gpu_resources);
+	void (*device_exit)(struct kfd_dev *kfd);
+	void (*interrupt)(struct kfd_dev *kfd, const void *ih_ring_entry);
+	void (*suspend)(struct kfd_dev *kfd);
+	int (*resume)(struct kfd_dev *kfd);
+};
+
+/**
+ * struct kfd2kgd_calls
+ *
+ * @init_sa_manager: Initialize an instance of the sa manager, used by
+ * amdkfd for all system memory allocations that are mapped to the GART
+ * address space
+ *
+ * @fini_sa_manager: Releases all memory allocations for amdkfd that are
+ * handled by kgd sa manager
+ *
+ * @allocate_mem: Allocate a buffer from amdkfd's sa manager. The buffer can
+ * be used for mqds, hpds, kernel queue, fence and runlists
+ *
+ * @free_mem: Frees a buffer that was allocated by amdkfd's sa manager
+ *
+ * @get_vmem_size: Retrieves (physical) size of VRAM
+ *
+ * @get_gpu_clock_counter: Retrieves GPU clock counter
+ *
+ * @get_max_engine_clock_in_mhz: Retrieves maximum GPU clock in MHz
+ *
+ * @program_sh_mem_settings: Programs the memory properties, such as the
+ * main aperture memory type (cached / non-cached) and the secondary
+ * aperture base address, size and memory type.
+ * This function is used only for no cp scheduling mode.
+ *
+ * @set_pasid_vmid_mapping: Exposes a pasid/vmid pair to the H/W. Used only
+ * for no cp scheduling mode.
+ *
+ * @init_memory: Initializes memory apertures to fixed base/limit address
+ * and non cached memory types.
+ *
+ * @init_pipeline: Initializes the compute pipelines.
+ *
+ * @hqd_load: Loads the mqd structure to a H/W hqd slot. Used only for no cp
+ * scheduling mode.
+ *
+ * @hqd_is_occupies: Checks if a hqd slot is occupied.
+ *
+ * @hqd_destroy: Destructs and preempts the queue assigned to that hqd slot.
+ *
+ * This structure contains function pointers to services that the kgd driver
+ * provides to amdkfd driver.
+ *
+ */
+struct kfd2kgd_calls {
+	/* Memory management. */
+	int (*init_sa_manager)(struct kgd_dev *kgd, unsigned int size);
+	void (*fini_sa_manager)(struct kgd_dev *kgd);
+	int (*allocate_mem)(struct kgd_dev *kgd, size_t size, size_t alignment,
+			enum kgd_memory_pool pool, struct kgd_mem **mem);
+
+	void (*free_mem)(struct kgd_dev *kgd, struct kgd_mem *mem);
+
+	uint64_t (*get_vmem_size)(struct kgd_dev *kgd);
+	uint64_t (*get_gpu_clock_counter)(struct kgd_dev *kgd);
+
+	uint32_t (*get_max_engine_clock_in_mhz)(struct kgd_dev *kgd);
+
+	/* Register access functions */
+	void (*program_sh_mem_settings)(struct kgd_dev *kgd, uint32_t vmid,
+			uint32_t sh_mem_config,	uint32_t sh_mem_ape1_base,
+			uint32_t sh_mem_ape1_limit, uint32_t sh_mem_bases);
+
+	int (*set_pasid_vmid_mapping)(struct kgd_dev *kgd, unsigned int pasid,
+					unsigned int vmid);
+
+	int (*init_memory)(struct kgd_dev *kgd);
+	int (*init_pipeline)(struct kgd_dev *kgd, uint32_t pipe_id,
+				uint32_t hpd_size, uint64_t hpd_gpu_addr);
+
+	int (*hqd_load)(struct kgd_dev *kgd, void *mqd, uint32_t pipe_id,
+			uint32_t queue_id, uint32_t __user *wptr);
+
+	bool (*hqd_is_occupies)(struct kgd_dev *kgd, uint64_t queue_address,
+				uint32_t pipe_id, uint32_t queue_id);
+
+	int (*hqd_destroy)(struct kgd_dev *kgd, uint32_t reset_type,
+				unsigned int timeout, uint32_t pipe_id,
+				uint32_t queue_id);
+};
+
+bool kgd2kfd_init(unsigned interface_version,
+		  const struct kfd2kgd_calls *f2g,
+		  const struct kgd2kfd_calls **g2f);
+
+#endif /* KGD_KFD_INTERFACE_H_INCLUDED */
diff --git a/drivers/gpu/drm/armada/armada_crtc.c b/drivers/gpu/drm/armada/armada_crtc.c
index e4a1490b42c2..e3a7a5078e5c 100644
--- a/drivers/gpu/drm/armada/armada_crtc.c
+++ b/drivers/gpu/drm/armada/armada_crtc.c
@@ -12,6 +12,7 @@
 #include <linux/platform_device.h>
 #include <drm/drmP.h>
 #include <drm/drm_crtc_helper.h>
+#include <drm/drm_plane_helper.h>
 #include "armada_crtc.h"
 #include "armada_drm.h"
 #include "armada_fb.h"
diff --git a/drivers/gpu/drm/ast/ast_mode.c b/drivers/gpu/drm/ast/ast_mode.c
index 9dc0fd5c1ea4..b7ee2634e47c 100644
--- a/drivers/gpu/drm/ast/ast_mode.c
+++ b/drivers/gpu/drm/ast/ast_mode.c
@@ -31,6 +31,7 @@
 #include <drm/drmP.h>
 #include <drm/drm_crtc.h>
 #include <drm/drm_crtc_helper.h>
+#include <drm/drm_plane_helper.h>
 #include "ast_drv.h"
 
 #include "ast_tables.h"
diff --git a/drivers/gpu/drm/bochs/bochs_fbdev.c b/drivers/gpu/drm/bochs/bochs_fbdev.c
index fe95d31cd110..61dbf09dff5d 100644
--- a/drivers/gpu/drm/bochs/bochs_fbdev.c
+++ b/drivers/gpu/drm/bochs/bochs_fbdev.c
@@ -9,6 +9,17 @@
 
 /* ---------------------------------------------------------------------- */
 
+static int bochsfb_mmap(struct fb_info *info,
+			struct vm_area_struct *vma)
+{
+	struct drm_fb_helper *fb_helper = info->par;
+	struct bochs_device *bochs =
+		container_of(fb_helper, struct bochs_device, fb.helper);
+	struct bochs_bo *bo = gem_to_bochs_bo(bochs->fb.gfb.obj);
+
+	return ttm_fbdev_mmap(vma, &bo->bo);
+}
+
 static struct fb_ops bochsfb_ops = {
 	.owner = THIS_MODULE,
 	.fb_check_var = drm_fb_helper_check_var,
@@ -19,6 +30,7 @@ static struct fb_ops bochsfb_ops = {
 	.fb_pan_display = drm_fb_helper_pan_display,
 	.fb_blank = drm_fb_helper_blank,
 	.fb_setcmap = drm_fb_helper_setcmap,
+	.fb_mmap = bochsfb_mmap,
 };
 
 static int bochsfb_create_object(struct bochs_device *bochs,
@@ -123,11 +135,9 @@ static int bochsfb_create(struct drm_fb_helper *helper,
 	info->screen_base = bo->kmap.virtual;
 	info->screen_size = size;
 
-#if 0
-	/* FIXME: get this right for mmap(/dev/fb0) */
-	info->fix.smem_start = bochs_bo_mmap_offset(bo);
+	drm_vma_offset_remove(&bo->bo.bdev->vma_manager, &bo->bo.vma_node);
+	info->fix.smem_start = 0;
 	info->fix.smem_len = size;
-#endif
 
 	ret = fb_alloc_cmap(&info->cmap, 256, 0);
 	if (ret) {
diff --git a/drivers/gpu/drm/bochs/bochs_hw.c b/drivers/gpu/drm/bochs/bochs_hw.c
index dbe619e6aab4..460389702d31 100644
--- a/drivers/gpu/drm/bochs/bochs_hw.c
+++ b/drivers/gpu/drm/bochs/bochs_hw.c
@@ -51,11 +51,10 @@ int bochs_hw_init(struct drm_device *dev, uint32_t flags)
 {
 	struct bochs_device *bochs = dev->dev_private;
 	struct pci_dev *pdev = dev->pdev;
-	unsigned long addr, size, mem, ioaddr, iosize;
+	unsigned long addr, size, mem, ioaddr, iosize, qext_size;
 	u16 id;
 
-	if (/* (ent->driver_data == BOCHS_QEMU_STDVGA) && */
-	    (pdev->resource[2].flags & IORESOURCE_MEM)) {
+	if (pdev->resource[2].flags & IORESOURCE_MEM) {
 		/* mmio bar with vga and bochs registers present */
 		if (pci_request_region(pdev, 2, "bochs-drm") != 0) {
 			DRM_ERROR("Cannot request mmio region\n");
@@ -116,6 +115,24 @@ int bochs_hw_init(struct drm_device *dev, uint32_t flags)
 		 size / 1024, addr,
 		 bochs->ioports ? "ioports" : "mmio",
 		 ioaddr);
+
+	if (bochs->mmio && pdev->revision >= 2) {
+		qext_size = readl(bochs->mmio + 0x600);
+		if (qext_size < 4 || qext_size > iosize)
+			goto noext;
+		DRM_DEBUG("Found qemu ext regs, size %lu\n", qext_size);
+		if (qext_size >= 8) {
+#ifdef __BIG_ENDIAN
+			writel(0xbebebebe, bochs->mmio + 0x604);
+#else
+			writel(0x1e1e1e1e, bochs->mmio + 0x604);
+#endif
+			DRM_DEBUG("  qext endian: 0x%x\n",
+				  readl(bochs->mmio + 0x604));
+		}
+	}
+
+noext:
 	return 0;
 }
 
diff --git a/drivers/gpu/drm/bochs/bochs_kms.c b/drivers/gpu/drm/bochs/bochs_kms.c
index 6b7efcf363d6..85f0f8cf1fb8 100644
--- a/drivers/gpu/drm/bochs/bochs_kms.c
+++ b/drivers/gpu/drm/bochs/bochs_kms.c
@@ -6,6 +6,7 @@
  */
 
 #include "bochs.h"
+#include <drm/drm_plane_helper.h>
 
 static int defx = 1024;
 static int defy = 768;
@@ -108,11 +109,32 @@ static void bochs_crtc_gamma_set(struct drm_crtc *crtc, u16 *red, u16 *green,
 {
 }
 
+static int bochs_crtc_page_flip(struct drm_crtc *crtc,
+				struct drm_framebuffer *fb,
+				struct drm_pending_vblank_event *event,
+				uint32_t page_flip_flags)
+{
+	struct bochs_device *bochs =
+		container_of(crtc, struct bochs_device, crtc);
+	struct drm_framebuffer *old_fb = crtc->primary->fb;
+	unsigned long irqflags;
+
+	crtc->primary->fb = fb;
+	bochs_crtc_mode_set_base(crtc, 0, 0, old_fb);
+	if (event) {
+		spin_lock_irqsave(&bochs->dev->event_lock, irqflags);
+		drm_send_vblank_event(bochs->dev, -1, event);
+		spin_unlock_irqrestore(&bochs->dev->event_lock, irqflags);
+	}
+	return 0;
+}
+
 /* These provide the minimum set of functions required to handle a CRTC */
 static const struct drm_crtc_funcs bochs_crtc_funcs = {
 	.gamma_set = bochs_crtc_gamma_set,
 	.set_config = drm_crtc_helper_set_config,
 	.destroy = drm_crtc_cleanup,
+	.page_flip = bochs_crtc_page_flip,
 };
 
 static const struct drm_crtc_helper_funcs bochs_helper_funcs = {
diff --git a/drivers/gpu/drm/cirrus/cirrus_drv.h b/drivers/gpu/drm/cirrus/cirrus_drv.h
index d44e69daa239..693a4565c4ff 100644
--- a/drivers/gpu/drm/cirrus/cirrus_drv.h
+++ b/drivers/gpu/drm/cirrus/cirrus_drv.h
@@ -210,6 +210,9 @@ int cirrus_framebuffer_init(struct drm_device *dev,
 			    struct drm_mode_fb_cmd2 *mode_cmd,
 			    struct drm_gem_object *obj);
 
+bool cirrus_check_framebuffer(struct cirrus_device *cdev, int width, int height,
+			      int bpp, int pitch);
+
 				/* cirrus_display.c */
 int cirrus_modeset_init(struct cirrus_device *cdev);
 void cirrus_modeset_fini(struct cirrus_device *cdev);
diff --git a/drivers/gpu/drm/cirrus/cirrus_fbdev.c b/drivers/gpu/drm/cirrus/cirrus_fbdev.c
index d231b1c317af..502a89eb54b5 100644
--- a/drivers/gpu/drm/cirrus/cirrus_fbdev.c
+++ b/drivers/gpu/drm/cirrus/cirrus_fbdev.c
@@ -139,6 +139,7 @@ static int cirrusfb_create_object(struct cirrus_fbdev *afbdev,
 			       struct drm_gem_object **gobj_p)
 {
 	struct drm_device *dev = afbdev->helper.dev;
+	struct cirrus_device *cdev = dev->dev_private;
 	u32 bpp, depth;
 	u32 size;
 	struct drm_gem_object *gobj;
@@ -146,8 +147,10 @@ static int cirrusfb_create_object(struct cirrus_fbdev *afbdev,
 	int ret = 0;
 	drm_fb_get_bpp_depth(mode_cmd->pixel_format, &depth, &bpp);
 
-	if (bpp > 24)
+	if (!cirrus_check_framebuffer(cdev, mode_cmd->width, mode_cmd->height,
+				      bpp, mode_cmd->pitches[0]))
 		return -EINVAL;
+
 	size = mode_cmd->pitches[0] * mode_cmd->height;
 	ret = cirrus_gem_create(dev, size, true, &gobj);
 	if (ret)
diff --git a/drivers/gpu/drm/cirrus/cirrus_main.c b/drivers/gpu/drm/cirrus/cirrus_main.c
index 99c1983f99d2..4c2d68e9102d 100644
--- a/drivers/gpu/drm/cirrus/cirrus_main.c
+++ b/drivers/gpu/drm/cirrus/cirrus_main.c
@@ -49,14 +49,16 @@ cirrus_user_framebuffer_create(struct drm_device *dev,
 			       struct drm_file *filp,
 			       struct drm_mode_fb_cmd2 *mode_cmd)
 {
+	struct cirrus_device *cdev = dev->dev_private;
 	struct drm_gem_object *obj;
 	struct cirrus_framebuffer *cirrus_fb;
 	int ret;
 	u32 bpp, depth;
 
 	drm_fb_get_bpp_depth(mode_cmd->pixel_format, &depth, &bpp);
-	/* cirrus can't handle > 24bpp framebuffers at all */
-	if (bpp > 24)
+
+	if (!cirrus_check_framebuffer(cdev, mode_cmd->width, mode_cmd->height,
+				      bpp, mode_cmd->pitches[0]))
 		return ERR_PTR(-EINVAL);
 
 	obj = drm_gem_object_lookup(dev, filp, mode_cmd->handles[0]);
@@ -96,8 +98,7 @@ static int cirrus_vram_init(struct cirrus_device *cdev)
 {
 	/* BAR 0 is VRAM */
 	cdev->mc.vram_base = pci_resource_start(cdev->dev->pdev, 0);
-	/* We have 4MB of VRAM */
-	cdev->mc.vram_size = 4 * 1024 * 1024;
+	cdev->mc.vram_size = pci_resource_len(cdev->dev->pdev, 0);
 
 	if (!request_mem_region(cdev->mc.vram_base, cdev->mc.vram_size,
 				"cirrusdrmfb_vram")) {
@@ -179,17 +180,22 @@ int cirrus_driver_load(struct drm_device *dev, unsigned long flags)
 	}
 
 	r = cirrus_mm_init(cdev);
-	if (r)
+	if (r) {
 		dev_err(&dev->pdev->dev, "fatal err on mm init\n");
+		goto out;
+	}
 
 	r = cirrus_modeset_init(cdev);
-	if (r)
+	if (r) {
 		dev_err(&dev->pdev->dev, "Fatal error during modeset init: %d\n", r);
+		goto out;
+	}
 
 	dev->mode_config.funcs = (void *)&cirrus_mode_funcs;
+
+	return 0;
 out:
-	if (r)
-		cirrus_driver_unload(dev);
+	cirrus_driver_unload(dev);
 	return r;
 }
 
@@ -307,3 +313,21 @@ out_unlock:
 	return ret;
 
 }
+
+bool cirrus_check_framebuffer(struct cirrus_device *cdev, int width, int height,
+			      int bpp, int pitch)
+{
+	const int max_pitch = 0x1FF << 3; /* (4096 - 1) & ~111b bytes */
+	const int max_size = cdev->mc.vram_size;
+
+	if (bpp > 32)
+		return false;
+
+	if (pitch > max_pitch)
+		return false;
+
+	if (pitch * height > max_size)
+		return false;
+
+	return true;
+}
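Note on cirrus_check_framebuffer(): the three limits above (bpp, pitch, total
size) can be exercised outside the kernel. The following is only a userspace
sketch, not the driver code. The name sketch_check_framebuffer and the
vram_size parameter (standing in for cdev->mc.vram_size) are hypothetical, the
unused width argument is dropped, and the size product is computed in 64 bits,
which is an assumption of this sketch rather than what the patch does.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Largest pitch the check accepts: 0x1FF << 3 = 4088 bytes. */
#define SKETCH_MAX_PITCH (0x1FF << 3)

/*
 * Hypothetical userspace mirror of the three checks in
 * cirrus_check_framebuffer(). Returns true when the framebuffer
 * geometry passes the bpp, pitch and VRAM size limits.
 */
static bool sketch_check_framebuffer(uint64_t vram_size, int height,
				     int bpp, int pitch)
{
	if (bpp > 32)
		return false;

	if (pitch > SKETCH_MAX_PITCH)
		return false;

	/* Sketch-only choice: widen before multiplying to avoid int overflow. */
	if ((uint64_t)pitch * (uint64_t)height > vram_size)
		return false;

	return true;
}

/* Usage: a 1024x768, 16 bpp framebuffer (pitch 2048) fits in 4 MiB. */
static void sketch_example(void)
{
	assert(sketch_check_framebuffer(4u << 20, 768, 16, 2048));
}
```

With 4 MiB of VRAM, a pitch of 4096 bytes is rejected by the pitch limit alone,
and a pitch of 4088 bytes with 1200 lines is rejected by the size limit.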
diff --git a/drivers/gpu/drm/cirrus/cirrus_mode.c b/drivers/gpu/drm/cirrus/cirrus_mode.c
index c7c5a9d91fa0..99d4a74ffeaf 100644
--- a/drivers/gpu/drm/cirrus/cirrus_mode.c
+++ b/drivers/gpu/drm/cirrus/cirrus_mode.c
@@ -16,6 +16,7 @@
  */
 #include <drm/drmP.h>
 #include <drm/drm_crtc_helper.h>
+#include <drm/drm_plane_helper.h>
 
 #include <video/cirrus.h>
 
diff --git a/drivers/gpu/drm/drm_atomic.c b/drivers/gpu/drm/drm_atomic.c
new file mode 100644
index 000000000000..ff5f034cc405
--- /dev/null
+++ b/drivers/gpu/drm/drm_atomic.c
@@ -0,0 +1,657 @@
+/*
+ * Copyright (C) 2014 Red Hat
+ * Copyright (C) 2014 Intel Corp.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR
+ * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
+ * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
+ * OTHER DEALINGS IN THE SOFTWARE.
+ *
+ * Authors:
+ * Rob Clark <robdclark@gmail.com>
+ * Daniel Vetter <daniel.vetter@ffwll.ch>
+ */
+
+
+#include <drm/drmP.h>
+#include <drm/drm_atomic.h>
+#include <drm/drm_plane_helper.h>
+
+static void kfree_state(struct drm_atomic_state *state)
+{
+	kfree(state->connectors);
+	kfree(state->connector_states);
+	kfree(state->crtcs);
+	kfree(state->crtc_states);
+	kfree(state->planes);
+	kfree(state->plane_states);
+	kfree(state);
+}
+
+/**
+ * drm_atomic_state_alloc - allocate atomic state
+ * @dev: DRM device
+ *
+ * This allocates an empty atomic state to track updates.
+ */
+struct drm_atomic_state *
+drm_atomic_state_alloc(struct drm_device *dev)
+{
+	struct drm_atomic_state *state;
+
+	state = kzalloc(sizeof(*state), GFP_KERNEL);
+	if (!state)
+		return NULL;
+
+	state->num_connector = ACCESS_ONCE(dev->mode_config.num_connector);
+
+	state->crtcs = kcalloc(dev->mode_config.num_crtc,
+			       sizeof(*state->crtcs), GFP_KERNEL);
+	if (!state->crtcs)
+		goto fail;
+	state->crtc_states = kcalloc(dev->mode_config.num_crtc,
+				     sizeof(*state->crtc_states), GFP_KERNEL);
+	if (!state->crtc_states)
+		goto fail;
+	state->planes = kcalloc(dev->mode_config.num_total_plane,
+				sizeof(*state->planes), GFP_KERNEL);
+	if (!state->planes)
+		goto fail;
+	state->plane_states = kcalloc(dev->mode_config.num_total_plane,
+				      sizeof(*state->plane_states), GFP_KERNEL);
+	if (!state->plane_states)
+		goto fail;
+	state->connectors = kcalloc(state->num_connector,
+				    sizeof(*state->connectors),
+				    GFP_KERNEL);
+	if (!state->connectors)
+		goto fail;
+	state->connector_states = kcalloc(state->num_connector,
+					  sizeof(*state->connector_states),
+					  GFP_KERNEL);
+	if (!state->connector_states)
+		goto fail;
+
+	state->dev = dev;
+
+	DRM_DEBUG_KMS("Allocate atomic state %p\n", state);
+
+	return state;
+fail:
+	kfree_state(state);
+
+	return NULL;
+}
+EXPORT_SYMBOL(drm_atomic_state_alloc);
+
+/**
+ * drm_atomic_state_clear - clear state object
+ * @state: atomic state
+ *
+ * When the w/w mutex algorithm detects a deadlock we need to back off and drop
+ * all locks. Someone else could then sneak in and change the current modeset
+ * configuration, which means that all the state assembled in @state is no
+ * longer an atomic update to the current state, but to some arbitrary earlier
+ * state. That could break assumptions the driver's ->atomic_check likely
+ * relies on.
+ *
+ * Hence we must clear all cached state and completely start over, using this
+ * function.
+ */
+void drm_atomic_state_clear(struct drm_atomic_state *state)
+{
+	struct drm_device *dev = state->dev;
+	struct drm_mode_config *config = &dev->mode_config;
+	int i;
+
+	DRM_DEBUG_KMS("Clearing atomic state %p\n", state);
+
+	for (i = 0; i < state->num_connector; i++) {
+		struct drm_connector *connector = state->connectors[i];
+
+		if (!connector)
+			continue;
+
+		WARN_ON(!drm_modeset_is_locked(&config->connection_mutex));
+
+		connector->funcs->atomic_destroy_state(connector,
+						       state->connector_states[i]);
+	}
+
+	for (i = 0; i < config->num_crtc; i++) {
+		struct drm_crtc *crtc = state->crtcs[i];
+
+		if (!crtc)
+			continue;
+
+		crtc->funcs->atomic_destroy_state(crtc,
+						  state->crtc_states[i]);
+	}
+
+	for (i = 0; i < config->num_total_plane; i++) {
+		struct drm_plane *plane = state->planes[i];
+
+		if (!plane)
+			continue;
+
+		plane->funcs->atomic_destroy_state(plane,
+						   state->plane_states[i]);
+	}
+}
+EXPORT_SYMBOL(drm_atomic_state_clear);
+
+/**
+ * drm_atomic_state_free - free all memory for an atomic state
+ * @state: atomic state to deallocate
+ *
+ * This frees all memory associated with an atomic state, including all the
+ * per-object state for planes, crtcs and connectors.
+ */
+void drm_atomic_state_free(struct drm_atomic_state *state)
+{
+	drm_atomic_state_clear(state);
+
+	DRM_DEBUG_KMS("Freeing atomic state %p\n", state);
+
+	kfree_state(state);
+}
+EXPORT_SYMBOL(drm_atomic_state_free);
+
+/**
+ * drm_atomic_get_crtc_state - get crtc state
+ * @state: global atomic state object
+ * @crtc: crtc to get state object for
+ *
+ * This function returns the crtc state for the given crtc, allocating it if
+ * needed. It will also grab the relevant crtc lock to make sure that the state
+ * is consistent.
+ *
+ * Returns:
+ *
+ * Either the allocated state or the error code encoded into the pointer. When
+ * the error is EDEADLK then the w/w mutex code has detected a deadlock and the
+ * entire atomic sequence must be restarted. All other errors are fatal.
+ */
+struct drm_crtc_state *
+drm_atomic_get_crtc_state(struct drm_atomic_state *state,
+			  struct drm_crtc *crtc)
+{
+	int ret, index;
+	struct drm_crtc_state *crtc_state;
+
+	index = drm_crtc_index(crtc);
+
+	if (state->crtc_states[index])
+		return state->crtc_states[index];
+
+	ret = drm_modeset_lock(&crtc->mutex, state->acquire_ctx);
+	if (ret)
+		return ERR_PTR(ret);
+
+	crtc_state = crtc->funcs->atomic_duplicate_state(crtc);
+	if (!crtc_state)
+		return ERR_PTR(-ENOMEM);
+
+	state->crtc_states[index] = crtc_state;
+	state->crtcs[index] = crtc;
+	crtc_state->state = state;
+
+	DRM_DEBUG_KMS("Added [CRTC:%d] %p state to %p\n",
+		      crtc->base.id, crtc_state, state);
+
+	return crtc_state;
+}
+EXPORT_SYMBOL(drm_atomic_get_crtc_state);
+
+/**
+ * drm_atomic_get_plane_state - get plane state
+ * @state: global atomic state object
+ * @plane: plane to get state object for
+ *
+ * This function returns the plane state for the given plane, allocating it if
+ * needed. It will also grab the relevant plane lock to make sure that the state
+ * is consistent.
+ *
+ * Returns:
+ *
+ * Either the allocated state or the error code encoded into the pointer. When
+ * the error is EDEADLK then the w/w mutex code has detected a deadlock and the
+ * entire atomic sequence must be restarted. All other errors are fatal.
+ */
+struct drm_plane_state *
+drm_atomic_get_plane_state(struct drm_atomic_state *state,
+			  struct drm_plane *plane)
+{
+	int ret, index;
+	struct drm_plane_state *plane_state;
+
+	index = drm_plane_index(plane);
+
+	if (state->plane_states[index])
+		return state->plane_states[index];
+
+	ret = drm_modeset_lock(&plane->mutex, state->acquire_ctx);
+	if (ret)
+		return ERR_PTR(ret);
+
+	plane_state = plane->funcs->atomic_duplicate_state(plane);
+	if (!plane_state)
+		return ERR_PTR(-ENOMEM);
+
+	state->plane_states[index] = plane_state;
+	state->planes[index] = plane;
+	plane_state->state = state;
+
+	DRM_DEBUG_KMS("Added [PLANE:%d] %p state to %p\n",
+		      plane->base.id, plane_state, state);
+
+	if (plane_state->crtc) {
+		struct drm_crtc_state *crtc_state;
+
+		crtc_state = drm_atomic_get_crtc_state(state,
+						       plane_state->crtc);
+		if (IS_ERR(crtc_state))
+			return ERR_CAST(crtc_state);
+	}
+
+	return plane_state;
+}
+EXPORT_SYMBOL(drm_atomic_get_plane_state);
+
+/**
+ * drm_atomic_get_connector_state - get connector state
+ * @state: global atomic state object
+ * @connector: connector to get state object for
+ *
+ * This function returns the connector state for the given connector,
+ * allocating it if needed. It will also grab the relevant connector lock to
+ * make sure that the state is consistent.
+ *
+ * Returns:
+ *
+ * Either the allocated state or the error code encoded into the pointer. When
+ * the error is EDEADLK then the w/w mutex code has detected a deadlock and the
+ * entire atomic sequence must be restarted. All other errors are fatal.
+ */
+struct drm_connector_state *
+drm_atomic_get_connector_state(struct drm_atomic_state *state,
+			  struct drm_connector *connector)
+{
+	int ret, index;
+	struct drm_mode_config *config = &connector->dev->mode_config;
+	struct drm_connector_state *connector_state;
+
+	ret = drm_modeset_lock(&config->connection_mutex, state->acquire_ctx);
+	if (ret)
+		return ERR_PTR(ret);
+
+	index = drm_connector_index(connector);
+
+	/*
+	 * Construction of atomic state updates can race with a connector
+	 * hot-add which might overflow. In this case flip the table and just
+	 * restart the entire ioctl - no one is fast enough to livelock a cpu
+	 * with physical hotplug events anyway.
+	 *
+	 * Note that we only grab the indexes once we have the right lock to
+	 * prevent hotplug/unplugging of connectors. So removal is no problem,
+	 * at most the array is a bit too large.
+	 */
+	if (index >= state->num_connector) {
+		DRM_DEBUG_KMS("Hot-added connector would overflow state array, restarting\n");
+		return ERR_PTR(-EAGAIN);
+	}
+
+	if (state->connector_states[index])
+		return state->connector_states[index];
+
+	connector_state = connector->funcs->atomic_duplicate_state(connector);
+	if (!connector_state)
+		return ERR_PTR(-ENOMEM);
+
+	state->connector_states[index] = connector_state;
+	state->connectors[index] = connector;
+	connector_state->state = state;
+
+	DRM_DEBUG_KMS("Added [CONNECTOR:%d] %p state to %p\n",
+		      connector->base.id, connector_state, state);
+
+	if (connector_state->crtc) {
+		struct drm_crtc_state *crtc_state;
+
+		crtc_state = drm_atomic_get_crtc_state(state,
+						       connector_state->crtc);
+		if (IS_ERR(crtc_state))
+			return ERR_CAST(crtc_state);
+	}
+
+	return connector_state;
+}
+EXPORT_SYMBOL(drm_atomic_get_connector_state);
+
+/**
+ * drm_atomic_set_crtc_for_plane - set crtc for plane
+ * @state: the incoming atomic state
+ * @plane: the plane whose incoming state to update
+ * @crtc: crtc to use for the plane
+ *
+ * Changing the assigned crtc for a plane requires us to grab the lock and state
+ * for the new crtc, as needed. This function takes care of all these details
+ * besides updating the pointer in the state object itself.
+ *
+ * Returns:
+ * 0 on success or can fail with -EDEADLK or -ENOMEM. When the error is EDEADLK
+ * then the w/w mutex code has detected a deadlock and the entire atomic
+ * sequence must be restarted. All other errors are fatal.
+ */
+int
+drm_atomic_set_crtc_for_plane(struct drm_atomic_state *state,
+			      struct drm_plane *plane, struct drm_crtc *crtc)
+{
+	struct drm_plane_state *plane_state =
+			drm_atomic_get_plane_state(state, plane);
+	struct drm_crtc_state *crtc_state;
+
+	if (WARN_ON(IS_ERR(plane_state)))
+		return PTR_ERR(plane_state);
+
+	if (plane_state->crtc) {
+		crtc_state = drm_atomic_get_crtc_state(plane_state->state,
+						       plane_state->crtc);
+		if (WARN_ON(IS_ERR(crtc_state)))
+			return PTR_ERR(crtc_state);
+
+		crtc_state->plane_mask &= ~(1 << drm_plane_index(plane));
+	}
+
+	plane_state->crtc = crtc;
+
+	if (crtc) {
+		crtc_state = drm_atomic_get_crtc_state(plane_state->state,
+						       crtc);
+		if (IS_ERR(crtc_state))
+			return PTR_ERR(crtc_state);
+		crtc_state->plane_mask |= (1 << drm_plane_index(plane));
+	}
+
+	if (crtc)
+		DRM_DEBUG_KMS("Link plane state %p to [CRTC:%d]\n",
+			      plane_state, crtc->base.id);
+	else
+		DRM_DEBUG_KMS("Link plane state %p to [NOCRTC]\n", plane_state);
+
+	return 0;
+}
+EXPORT_SYMBOL(drm_atomic_set_crtc_for_plane);
+
+/**
+ * drm_atomic_set_fb_for_plane - set framebuffer for plane
+ * @plane_state: atomic state object for the plane
+ * @fb: fb to use for the plane
+ *
+ * Changing the assigned framebuffer for a plane requires us to grab a reference
+ * to the new fb and drop the reference to the old fb, if there is one. This
+ * function takes care of all these details besides updating the pointer in the
+ * state object itself.
+ */
+void
+drm_atomic_set_fb_for_plane(struct drm_plane_state *plane_state,
+			    struct drm_framebuffer *fb)
+{
+	if (plane_state->fb)
+		drm_framebuffer_unreference(plane_state->fb);
+	if (fb)
+		drm_framebuffer_reference(fb);
+	plane_state->fb = fb;
+
+	if (fb)
+		DRM_DEBUG_KMS("Set [FB:%d] for plane state %p\n",
+			      fb->base.id, plane_state);
+	else
+		DRM_DEBUG_KMS("Set [NOFB] for plane state %p\n", plane_state);
+}
+EXPORT_SYMBOL(drm_atomic_set_fb_for_plane);
+
+/**
+ * drm_atomic_set_crtc_for_connector - set crtc for connector
+ * @conn_state: atomic state object for the connector
+ * @crtc: crtc to use for the connector
+ *
+ * Changing the assigned crtc for a connector requires us to grab the lock and
+ * state for the new crtc, as needed. This function takes care of all these
+ * details besides updating the pointer in the state object itself.
+ *
+ * Returns:
+ * 0 on success or can fail with -EDEADLK or -ENOMEM. When the error is EDEADLK
+ * then the w/w mutex code has detected a deadlock and the entire atomic
+ * sequence must be restarted. All other errors are fatal.
+ */
+int
+drm_atomic_set_crtc_for_connector(struct drm_connector_state *conn_state,
+				  struct drm_crtc *crtc)
+{
+	struct drm_crtc_state *crtc_state;
+
+	if (crtc) {
+		crtc_state = drm_atomic_get_crtc_state(conn_state->state, crtc);
+		if (IS_ERR(crtc_state))
+			return PTR_ERR(crtc_state);
+	}
+
+	conn_state->crtc = crtc;
+
+	if (crtc)
+		DRM_DEBUG_KMS("Link connector state %p to [CRTC:%d]\n",
+			      conn_state, crtc->base.id);
+	else
+		DRM_DEBUG_KMS("Link connector state %p to [NOCRTC]\n",
+			      conn_state);
+
+	return 0;
+}
+EXPORT_SYMBOL(drm_atomic_set_crtc_for_connector);
+
+/**
+ * drm_atomic_add_affected_connectors - add connectors for crtc
+ * @state: atomic state
+ * @crtc: DRM crtc
+ *
+ * This function walks the current configuration and adds all connectors
+ * currently using @crtc to the atomic configuration @state. Note that this
+ * function must acquire the connection mutex. This can potentially cause
+ * unneeded serialization if the update is just for the planes on one crtc. Hence
+ * drivers and helpers should only call this when really needed (e.g. when a
+ * full modeset needs to happen due to some change).
+ *
+ * Returns:
+ * 0 on success or can fail with -EDEADLK or -ENOMEM. When the error is EDEADLK
+ * then the w/w mutex code has detected a deadlock and the entire atomic
+ * sequence must be restarted. All other errors are fatal.
+ */
+int
+drm_atomic_add_affected_connectors(struct drm_atomic_state *state,
+				   struct drm_crtc *crtc)
+{
+	struct drm_mode_config *config = &state->dev->mode_config;
+	struct drm_connector *connector;
+	struct drm_connector_state *conn_state;
+	int ret;
+
+	ret = drm_modeset_lock(&config->connection_mutex, state->acquire_ctx);
+	if (ret)
+		return ret;
+
+	DRM_DEBUG_KMS("Adding all current connectors for [CRTC:%d] to %p\n",
+		      crtc->base.id, state);
+
+	/*
+	 * Changed connectors are already in @state, so only need to look at the
+	 * current configuration.
+	 */
+	list_for_each_entry(connector, &config->connector_list, head) {
+		if (connector->state->crtc != crtc)
+			continue;
+
+		conn_state = drm_atomic_get_connector_state(state, connector);
+		if (IS_ERR(conn_state))
+			return PTR_ERR(conn_state);
+	}
+
+	return 0;
+}
+EXPORT_SYMBOL(drm_atomic_add_affected_connectors);
+
+/**
+ * drm_atomic_connectors_for_crtc - count number of connected outputs
+ * @state: atomic state
+ * @crtc: DRM crtc
+ *
+ * This function counts all connectors which will be connected to @crtc
+ * according to @state. Useful to recompute the enable state for @crtc.
+ */
+int
+drm_atomic_connectors_for_crtc(struct drm_atomic_state *state,
+			       struct drm_crtc *crtc)
+{
+	int i, num_connected_connectors = 0;
+
+	for (i = 0; i < state->num_connector; i++) {
+		struct drm_connector_state *conn_state;
+
+		conn_state = state->connector_states[i];
+
+		if (conn_state && conn_state->crtc == crtc)
+			num_connected_connectors++;
+	}
+
+	DRM_DEBUG_KMS("State %p has %i connectors for [CRTC:%d]\n",
+		      state, num_connected_connectors, crtc->base.id);
+
+	return num_connected_connectors;
+}
+EXPORT_SYMBOL(drm_atomic_connectors_for_crtc);
+
+/**
+ * drm_atomic_legacy_backoff - locking backoff for legacy ioctls
+ * @state: atomic state
+ *
+ * This function should be used by legacy entry points which don't understand
+ * -EDEADLK semantics. For simplicity this one will grab all modeset locks after
+ * the slowpath has completed.
+ */
+void drm_atomic_legacy_backoff(struct drm_atomic_state *state)
+{
+	int ret;
+
+retry:
+	drm_modeset_backoff(state->acquire_ctx);
+
+	ret = drm_modeset_lock(&state->dev->mode_config.connection_mutex,
+			       state->acquire_ctx);
+	if (ret)
+		goto retry;
+	ret = drm_modeset_lock_all_crtcs(state->dev,
+					 state->acquire_ctx);
+	if (ret)
+		goto retry;
+}
+EXPORT_SYMBOL(drm_atomic_legacy_backoff);
+
+/**
+ * drm_atomic_check_only - check whether a given config would work
+ * @state: atomic configuration to check
+ *
+ * Note that this function can return -EDEADLK if the driver needed to acquire
+ * more locks but encountered a deadlock. The caller must then do the usual w/w
+ * backoff dance and restart. All other errors are fatal.
+ *
+ * Returns:
+ * 0 on success, negative error code on failure.
+ */
+int drm_atomic_check_only(struct drm_atomic_state *state)
+{
+	struct drm_mode_config *config = &state->dev->mode_config;
+
+	DRM_DEBUG_KMS("checking %p\n", state);
+
+	if (config->funcs->atomic_check)
+		return config->funcs->atomic_check(state->dev, state);
+	else
+		return 0;
+}
+EXPORT_SYMBOL(drm_atomic_check_only);
+
+/**
+ * drm_atomic_commit - commit configuration atomically
+ * @state: atomic configuration to commit
+ *
+ * Note that this function can return -EDEADLK if the driver needed to acquire
+ * more locks but encountered a deadlock. The caller must then do the usual w/w
+ * backoff dance and restart. All other errors are fatal.
+ *
+ * Also note that on successful execution ownership of @state is transferred
+ * from the caller of this function to the function itself. The caller must not
+ * free or in any other way access @state. If the function fails then the caller
+ * must clean up @state itself.
+ *
+ * Returns:
+ * 0 on success, negative error code on failure.
+ */
+int drm_atomic_commit(struct drm_atomic_state *state)
+{
+	struct drm_mode_config *config = &state->dev->mode_config;
+	int ret;
+
+	ret = drm_atomic_check_only(state);
+	if (ret)
+		return ret;
+
+	DRM_DEBUG_KMS("committing %p\n", state);
+
+	return config->funcs->atomic_commit(state->dev, state, false);
+}
+EXPORT_SYMBOL(drm_atomic_commit);
+
+/**
+ * drm_atomic_async_commit - atomic and asynchronous configuration commit
+ * @state: atomic configuration to commit
+ *
+ * Note that this function can return -EDEADLK if the driver needed to acquire
+ * more locks but encountered a deadlock. The caller must then do the usual w/w
+ * backoff dance and restart. All other errors are fatal.
+ *
+ * Also note that on successful execution ownership of @state is transferred
+ * from the caller of this function to the function itself. The caller must not
+ * free or in any other way access @state. If the function fails then the caller
+ * must clean up @state itself.
+ *
+ * Returns:
+ * 0 on success, negative error code on failure.
+ */
+int drm_atomic_async_commit(struct drm_atomic_state *state)
+{
+	struct drm_mode_config *config = &state->dev->mode_config;
+	int ret;
+
+	ret = drm_atomic_check_only(state);
+	if (ret)
+		return ret;
+
+	DRM_DEBUG_KMS("committing %p asynchronously\n", state);
+
+	return config->funcs->atomic_commit(state->dev, state, true);
+}
+EXPORT_SYMBOL(drm_atomic_async_commit);
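Note on the -EDEADLK contract documented throughout drm_atomic.c: callers are
expected to drop everything assembled so far, back off, and rebuild the update
from scratch. The following is only a userspace model of that retry loop, not
kernel code. Every name in it (sketch_state, sketch_ctx, sketch_build,
sketch_clear, sketch_backoff, sketch_update) is hypothetical, and the deadlock
is simulated with a counter instead of real w/w mutexes.

```c
#include <assert.h>
#include <errno.h>

/* Stand-in for the accumulated atomic state. */
struct sketch_state {
	int entries;	/* object states collected so far */
	int clears;	/* how often the state had to be thrown away */
};

/* Stand-in for the lock acquire context. */
struct sketch_ctx {
	int contended;	/* attempts that will hit a simulated deadlock */
	int backoffs;	/* how often we backed off */
};

/* Build the update; fails with -EDEADLK while contention remains. */
static int sketch_build(struct sketch_state *s, struct sketch_ctx *ctx)
{
	s->entries++;			/* first object state added */
	if (ctx->contended > 0) {
		ctx->contended--;
		return -EDEADLK;	/* simulated w/w deadlock */
	}
	s->entries++;			/* second object state added */
	return 0;
}

/* Models clearing all cached state before a restart. */
static void sketch_clear(struct sketch_state *s)
{
	s->entries = 0;
	s->clears++;
}

/* Models dropping all locks and waiting for the contended one. */
static void sketch_backoff(struct sketch_ctx *ctx)
{
	ctx->backoffs++;
}

/* Retry until the build succeeds or fails with a fatal error. */
static int sketch_update(struct sketch_state *s, struct sketch_ctx *ctx)
{
	int ret;

	assert(s && ctx);

	for (;;) {
		ret = sketch_build(s, ctx);
		if (ret != -EDEADLK)
			return ret;	/* success or fatal error */

		sketch_clear(s);	/* stale state must not be reused */
		sketch_backoff(ctx);
	}
}
```

The point of the model is the ordering: partially built state is cleared before
each retry, so the final state only ever reflects one uninterrupted build.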
diff --git a/drivers/gpu/drm/drm_atomic_helper.c b/drivers/gpu/drm/drm_atomic_helper.c
new file mode 100644
index 000000000000..4a78a773151c
--- /dev/null
+++ b/drivers/gpu/drm/drm_atomic_helper.c
@@ -0,0 +1,1966 @@
+/*
+ * Copyright (C) 2014 Red Hat
+ * Copyright (C) 2014 Intel Corp.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR
+ * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
+ * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
+ * OTHER DEALINGS IN THE SOFTWARE.
+ *
+ * Authors:
+ * Rob Clark <robdclark@gmail.com>
+ * Daniel Vetter <daniel.vetter@ffwll.ch>
+ */
+
+#include <drm/drmP.h>
+#include <drm/drm_atomic.h>
+#include <drm/drm_plane_helper.h>
+#include <drm/drm_crtc_helper.h>
+#include <drm/drm_atomic_helper.h>
+#include <linux/fence.h>
+
+/**
+ * DOC: overview
+ *
+ * This helper library provides implementations of check and commit functions on
+ * top of the CRTC modeset helper callbacks and the plane helper callbacks. It
+ * also provides convenience implementations for the atomic state handling
+ * callbacks for drivers which don't need to subclass the drm core structures to
+ * add their own additional internal state.
+ *
+ * This library also provides default implementations for the check callback in
+ * drm_atomic_helper_check and for the commit callback with
+ * drm_atomic_helper_commit. But the individual stages and callbacks are exposed
+ * to allow drivers to mix and match and e.g. use the plane helpers only
+ * together with a driver private modeset implementation.
+ *
+ * This library also provides implementations for all the legacy driver
+ * interfaces on top of the atomic interface. See drm_atomic_helper_set_config,
+ * drm_atomic_helper_update_plane, drm_atomic_helper_disable_plane and the
+ * various functions to implement set_property callbacks. New drivers must not
+ * implement these functions themselves but must use the provided helpers.
+ */
+static void
+drm_atomic_helper_plane_changed(struct drm_atomic_state *state,
+				struct drm_plane_state *plane_state,
+				struct drm_plane *plane)
+{
+	struct drm_crtc_state *crtc_state;
+
+	if (plane->state->crtc) {
+		crtc_state = state->crtc_states[drm_crtc_index(plane->state->crtc)];
+
+		if (WARN_ON(!crtc_state))
+			return;
+
+		crtc_state->planes_changed = true;
+	}
+
+	if (plane_state->crtc) {
+		crtc_state =
+			state->crtc_states[drm_crtc_index(plane_state->crtc)];
+
+		if (WARN_ON(!crtc_state))
+			return;
+
+		crtc_state->planes_changed = true;
+	}
+}
+
+static struct drm_crtc *
+get_current_crtc_for_encoder(struct drm_device *dev,
+			     struct drm_encoder *encoder)
+{
+	struct drm_mode_config *config = &dev->mode_config;
+	struct drm_connector *connector;
+
+	WARN_ON(!drm_modeset_is_locked(&config->connection_mutex));
+
+	list_for_each_entry(connector, &config->connector_list, head) {
+		if (connector->state->best_encoder != encoder)
+			continue;
+
+		return connector->state->crtc;
+	}
+
+	return NULL;
+}
+
+static int
+steal_encoder(struct drm_atomic_state *state,
+	      struct drm_encoder *encoder,
+	      struct drm_crtc *encoder_crtc)
+{
+	struct drm_mode_config *config = &state->dev->mode_config;
+	struct drm_crtc_state *crtc_state;
+	struct drm_connector *connector;
+	struct drm_connector_state *connector_state;
+	int ret;
+
+	/*
+	 * We can only steal an encoder coming from a connector, which means we
+	 * must already hold the connection_mutex.
+	 */
+	WARN_ON(!drm_modeset_is_locked(&config->connection_mutex));
+
+	DRM_DEBUG_KMS("[ENCODER:%d:%s] in use on [CRTC:%d], stealing it\n",
+		      encoder->base.id, encoder->name,
+		      encoder_crtc->base.id);
+
+	crtc_state = drm_atomic_get_crtc_state(state, encoder_crtc);
+	if (IS_ERR(crtc_state))
+		return PTR_ERR(crtc_state);
+
+	crtc_state->mode_changed = true;
+
+	list_for_each_entry(connector, &config->connector_list, head) {
+		if (connector->state->best_encoder != encoder)
+			continue;
+
+		DRM_DEBUG_KMS("Stealing encoder from [CONNECTOR:%d:%s]\n",
+			      connector->base.id,
+			      connector->name);
+
+		connector_state = drm_atomic_get_connector_state(state,
+								 connector);
+		if (IS_ERR(connector_state))
+			return PTR_ERR(connector_state);
+
+		ret = drm_atomic_set_crtc_for_connector(connector_state, NULL);
+		if (ret)
+			return ret;
+		connector_state->best_encoder = NULL;
+	}
+
+	return 0;
+}
+
+static int
+update_connector_routing(struct drm_atomic_state *state, int conn_idx)
+{
+	struct drm_connector_helper_funcs *funcs;
+	struct drm_encoder *new_encoder;
+	struct drm_crtc *encoder_crtc;
+	struct drm_connector *connector;
+	struct drm_connector_state *connector_state;
+	struct drm_crtc_state *crtc_state;
+	int idx, ret;
+
+	connector = state->connectors[conn_idx];
+	connector_state = state->connector_states[conn_idx];
+
+	if (!connector)
+		return 0;
+
+	DRM_DEBUG_KMS("Updating routing for [CONNECTOR:%d:%s]\n",
+			connector->base.id,
+			connector->name);
+
+	if (connector->state->crtc != connector_state->crtc) {
+		if (connector->state->crtc) {
+			idx = drm_crtc_index(connector->state->crtc);
+
+			crtc_state = state->crtc_states[idx];
+			crtc_state->mode_changed = true;
+		}
+
+		if (connector_state->crtc) {
+			idx = drm_crtc_index(connector_state->crtc);
+
+			crtc_state = state->crtc_states[idx];
+			crtc_state->mode_changed = true;
+		}
+	}
+
+	if (!connector_state->crtc) {
+		DRM_DEBUG_KMS("Disabling [CONNECTOR:%d:%s]\n",
+				connector->base.id,
+				connector->name);
+
+		connector_state->best_encoder = NULL;
+
+		return 0;
+	}
+
+	funcs = connector->helper_private;
+	new_encoder = funcs->best_encoder(connector);
+
+	if (!new_encoder) {
+		DRM_DEBUG_KMS("No suitable encoder found for [CONNECTOR:%d:%s]\n",
+			      connector->base.id,
+			      connector->name);
+		return -EINVAL;
+	}
+
+	if (new_encoder == connector_state->best_encoder) {
+		DRM_DEBUG_KMS("[CONNECTOR:%d:%s] keeps [ENCODER:%d:%s], now on [CRTC:%d]\n",
+			      connector->base.id,
+			      connector->name,
+			      new_encoder->base.id,
+			      new_encoder->name,
+			      connector_state->crtc->base.id);
+
+		return 0;
+	}
+
+	encoder_crtc = get_current_crtc_for_encoder(state->dev,
+						    new_encoder);
+
+	if (encoder_crtc) {
+		ret = steal_encoder(state, new_encoder, encoder_crtc);
+		if (ret) {
+			DRM_DEBUG_KMS("Encoder stealing failed for [CONNECTOR:%d:%s]\n",
+				      connector->base.id,
+				      connector->name);
+			return ret;
+		}
+	}
+
+	connector_state->best_encoder = new_encoder;
+	idx = drm_crtc_index(connector_state->crtc);
+
+	crtc_state = state->crtc_states[idx];
+	crtc_state->mode_changed = true;
+
+	DRM_DEBUG_KMS("[CONNECTOR:%d:%s] using [ENCODER:%d:%s] on [CRTC:%d]\n",
+		      connector->base.id,
+		      connector->name,
+		      new_encoder->base.id,
+		      new_encoder->name,
+		      connector_state->crtc->base.id);
+
+	return 0;
+}
+
+static int
+mode_fixup(struct drm_atomic_state *state)
+{
+	int ncrtcs = state->dev->mode_config.num_crtc;
+	struct drm_crtc_state *crtc_state;
+	struct drm_connector_state *conn_state;
+	int i;
+	bool ret;
+
+	for (i = 0; i < ncrtcs; i++) {
+		crtc_state = state->crtc_states[i];
+
+		if (!crtc_state || !crtc_state->mode_changed)
+			continue;
+
+		drm_mode_copy(&crtc_state->adjusted_mode, &crtc_state->mode);
+	}
+
+	for (i = 0; i < state->num_connector; i++) {
+		struct drm_encoder_helper_funcs *funcs;
+		struct drm_encoder *encoder;
+
+		conn_state = state->connector_states[i];
+
+		if (!conn_state)
+			continue;
+
+		WARN_ON(!!conn_state->best_encoder != !!conn_state->crtc);
+
+		if (!conn_state->crtc || !conn_state->best_encoder)
+			continue;
+
+		crtc_state =
+			state->crtc_states[drm_crtc_index(conn_state->crtc)];
+
+		/*
+		 * Each encoder has at most one connector (since we always steal
+		 * it away), so we won't call ->mode_fixup twice.
+		 */
+		encoder = conn_state->best_encoder;
+		funcs = encoder->helper_private;
+
+		if (encoder->bridge && encoder->bridge->funcs->mode_fixup) {
+			ret = encoder->bridge->funcs->mode_fixup(
+					encoder->bridge, &crtc_state->mode,
+					&crtc_state->adjusted_mode);
+			if (!ret) {
+				DRM_DEBUG_KMS("Bridge fixup failed\n");
+				return -EINVAL;
+			}
+		}
+
+
+		ret = funcs->mode_fixup(encoder, &crtc_state->mode,
+					&crtc_state->adjusted_mode);
+		if (!ret) {
+			DRM_DEBUG_KMS("[ENCODER:%d:%s] fixup failed\n",
+				      encoder->base.id, encoder->name);
+			return -EINVAL;
+		}
+	}
+
+	for (i = 0; i < ncrtcs; i++) {
+		struct drm_crtc_helper_funcs *funcs;
+		struct drm_crtc *crtc;
+
+		crtc_state = state->crtc_states[i];
+		crtc = state->crtcs[i];
+
+		if (!crtc_state || !crtc_state->mode_changed)
+			continue;
+
+		funcs = crtc->helper_private;
+		ret = funcs->mode_fixup(crtc, &crtc_state->mode,
+					&crtc_state->adjusted_mode);
+		if (!ret) {
+			DRM_DEBUG_KMS("[CRTC:%d] fixup failed\n",
+				      crtc->base.id);
+			return -EINVAL;
+		}
+	}
+
+	return 0;
+}
+
+static int
+drm_atomic_helper_check_modeset(struct drm_device *dev,
+				struct drm_atomic_state *state)
+{
+	int ncrtcs = dev->mode_config.num_crtc;
+	struct drm_crtc *crtc;
+	struct drm_crtc_state *crtc_state;
+	int i, ret;
+
+	for (i = 0; i < ncrtcs; i++) {
+		crtc = state->crtcs[i];
+		crtc_state = state->crtc_states[i];
+
+		if (!crtc)
+			continue;
+
+		if (!drm_mode_equal(&crtc->state->mode, &crtc_state->mode)) {
+			DRM_DEBUG_KMS("[CRTC:%d] mode changed\n",
+				      crtc->base.id);
+			crtc_state->mode_changed = true;
+		}
+
+		if (crtc->state->enable != crtc_state->enable) {
+			DRM_DEBUG_KMS("[CRTC:%d] enable changed\n",
+				      crtc->base.id);
+			crtc_state->mode_changed = true;
+		}
+	}
+
+	for (i = 0; i < state->num_connector; i++) {
+		/*
+		 * This only sets crtc_state->mode_changed for routing changes;
+		 * drivers must set crtc_state->mode_changed themselves when
+		 * connector properties need to be updated.
+		 */
+		ret = update_connector_routing(state, i);
+		if (ret)
+			return ret;
+	}
+
+	/*
+	 * After all the routing has been prepared we need to add in any
+	 * connector which is itself unchanged, but whose crtc changes its
+	 * configuration. This must be done before calling mode_fixup in case a
+	 * crtc only changed its mode but has the same set of connectors.
+	 */
+	for (i = 0; i < ncrtcs; i++) {
+		int num_connectors;
+
+		crtc = state->crtcs[i];
+		crtc_state = state->crtc_states[i];
+
+		if (!crtc || !crtc_state->mode_changed)
+			continue;
+
+		DRM_DEBUG_KMS("[CRTC:%d] needs full modeset, enable: %c\n",
+			      crtc->base.id,
+			      crtc_state->enable ? 'y' : 'n');
+
+		ret = drm_atomic_add_affected_connectors(state, crtc);
+		if (ret != 0)
+			return ret;
+
+		num_connectors = drm_atomic_connectors_for_crtc(state,
+								crtc);
+
+		if (crtc_state->enable != !!num_connectors) {
+			DRM_DEBUG_KMS("[CRTC:%d] enabled/connectors mismatch\n",
+				      crtc->base.id);
+
+			return -EINVAL;
+		}
+	}
+
+	return mode_fixup(state);
+}
+
+/**
+ * drm_atomic_helper_check - validate state object
+ * @dev: DRM device
+ * @state: the driver state object
+ *
+ * Check the state object to see if the requested state is physically possible.
+ * Only crtcs and planes have check callbacks, so a driver that needs any
+ * additional (global) checking can simply wrap it around this function.
+ * Drivers without such needs can directly use this as their ->atomic_check()
+ * callback.
+ *
+ * RETURNS
+ * Zero for success or -errno
+ */
+int drm_atomic_helper_check(struct drm_device *dev,
+			    struct drm_atomic_state *state)
+{
+	int nplanes = dev->mode_config.num_total_plane;
+	int ncrtcs = dev->mode_config.num_crtc;
+	int i, ret = 0;
+
+	for (i = 0; i < nplanes; i++) {
+		struct drm_plane_helper_funcs *funcs;
+		struct drm_plane *plane = state->planes[i];
+		struct drm_plane_state *plane_state = state->plane_states[i];
+
+		if (!plane)
+			continue;
+
+		funcs = plane->helper_private;
+
+		drm_atomic_helper_plane_changed(state, plane_state, plane);
+
+		if (!funcs || !funcs->atomic_check)
+			continue;
+
+		ret = funcs->atomic_check(plane, plane_state);
+		if (ret) {
+			DRM_DEBUG_KMS("[PLANE:%d] atomic check failed\n",
+				      plane->base.id);
+			return ret;
+		}
+	}
+
+	for (i = 0; i < ncrtcs; i++) {
+		struct drm_crtc_helper_funcs *funcs;
+		struct drm_crtc *crtc = state->crtcs[i];
+
+		if (!crtc)
+			continue;
+
+		funcs = crtc->helper_private;
+
+		if (!funcs || !funcs->atomic_check)
+			continue;
+
+		ret = funcs->atomic_check(crtc, state->crtc_states[i]);
+		if (ret) {
+			DRM_DEBUG_KMS("[CRTC:%d] atomic check failed\n",
+				      crtc->base.id);
+			return ret;
+		}
+	}
+
+	ret = drm_atomic_helper_check_modeset(dev, state);
+	if (ret)
+		return ret;
+
+	return ret;
+}
+EXPORT_SYMBOL(drm_atomic_helper_check);
+
+static void
+disable_outputs(struct drm_device *dev, struct drm_atomic_state *old_state)
+{
+	int ncrtcs = old_state->dev->mode_config.num_crtc;
+	int i;
+
+	for (i = 0; i < old_state->num_connector; i++) {
+		struct drm_connector_state *old_conn_state;
+		struct drm_connector *connector;
+		struct drm_encoder_helper_funcs *funcs;
+		struct drm_encoder *encoder;
+
+		old_conn_state = old_state->connector_states[i];
+		connector = old_state->connectors[i];
+
+		/* Shut down everything that's in the changeset and currently
+		 * still on. So need to check the old, saved state. */
+		if (!old_conn_state || !old_conn_state->crtc)
+			continue;
+
+		encoder = old_conn_state->best_encoder;
+
+		/* We shouldn't get this far if we didn't previously have
+		 * an encoder.. but WARN_ON() rather than explode.
+		 */
+		if (WARN_ON(!encoder))
+			continue;
+
+		funcs = encoder->helper_private;
+
+		/*
+		 * Each encoder has at most one connector (since we always steal
+		 * it away), so we won't call disable hooks twice.
+		 */
+		if (encoder->bridge)
+			encoder->bridge->funcs->disable(encoder->bridge);
+
+		/* Right function depends upon target state. */
+		if (connector->state->crtc)
+			funcs->prepare(encoder);
+		else if (funcs->disable)
+			funcs->disable(encoder);
+		else
+			funcs->dpms(encoder, DRM_MODE_DPMS_OFF);
+
+		if (encoder->bridge)
+			encoder->bridge->funcs->post_disable(encoder->bridge);
+	}
+
+	for (i = 0; i < ncrtcs; i++) {
+		struct drm_crtc_helper_funcs *funcs;
+		struct drm_crtc *crtc;
+
+		crtc = old_state->crtcs[i];
+
+		/* Shut down everything that needs a full modeset. */
+		if (!crtc || !crtc->state->mode_changed)
+			continue;
+
+		funcs = crtc->helper_private;
+
+		/* Right function depends upon target state. */
+		if (crtc->state->enable)
+			funcs->prepare(crtc);
+		else if (funcs->disable)
+			funcs->disable(crtc);
+		else
+			funcs->dpms(crtc, DRM_MODE_DPMS_OFF);
+	}
+}
+
+static void
+set_routing_links(struct drm_device *dev, struct drm_atomic_state *old_state)
+{
+	int ncrtcs = old_state->dev->mode_config.num_crtc;
+	int i;
+
+	/* clear out existing links */
+	for (i = 0; i < old_state->num_connector; i++) {
+		struct drm_connector *connector;
+
+		connector = old_state->connectors[i];
+
+		if (!connector || !connector->encoder)
+			continue;
+
+		WARN_ON(!connector->encoder->crtc);
+
+		connector->encoder->crtc = NULL;
+		connector->encoder = NULL;
+	}
+
+	/* set new links */
+	for (i = 0; i < old_state->num_connector; i++) {
+		struct drm_connector *connector;
+
+		connector = old_state->connectors[i];
+
+		if (!connector || !connector->state->crtc)
+			continue;
+
+		if (WARN_ON(!connector->state->best_encoder))
+			continue;
+
+		connector->encoder = connector->state->best_encoder;
+		connector->encoder->crtc = connector->state->crtc;
+	}
+
+	/* set legacy state in the crtc structure */
+	for (i = 0; i < ncrtcs; i++) {
+		struct drm_crtc *crtc;
+
+		crtc = old_state->crtcs[i];
+
+		if (!crtc)
+			continue;
+
+		crtc->mode = crtc->state->mode;
+		crtc->enabled = crtc->state->enable;
+		crtc->x = crtc->primary->state->src_x >> 16;
+		crtc->y = crtc->primary->state->src_y >> 16;
+	}
+}
+
+static void
+crtc_set_mode(struct drm_device *dev, struct drm_atomic_state *old_state)
+{
+	int ncrtcs = old_state->dev->mode_config.num_crtc;
+	int i;
+
+	for (i = 0; i < ncrtcs; i++) {
+		struct drm_crtc_helper_funcs *funcs;
+		struct drm_crtc *crtc;
+
+		crtc = old_state->crtcs[i];
+
+		if (!crtc || !crtc->state->mode_changed)
+			continue;
+
+		funcs = crtc->helper_private;
+
+		if (crtc->state->enable)
+			funcs->mode_set_nofb(crtc);
+	}
+
+	for (i = 0; i < old_state->num_connector; i++) {
+		struct drm_connector *connector;
+		struct drm_crtc_state *new_crtc_state;
+		struct drm_encoder_helper_funcs *funcs;
+		struct drm_encoder *encoder;
+		struct drm_display_mode *mode, *adjusted_mode;
+
+		connector = old_state->connectors[i];
+
+		if (!connector || !connector->state->best_encoder)
+			continue;
+
+		encoder = connector->state->best_encoder;
+		funcs = encoder->helper_private;
+		new_crtc_state = connector->state->crtc->state;
+		mode = &new_crtc_state->mode;
+		adjusted_mode = &new_crtc_state->adjusted_mode;
+
+		/*
+		 * Each encoder has at most one connector (since we always steal
+		 * it away), so we won't call mode_set hooks twice.
+		 */
+		funcs->mode_set(encoder, mode, adjusted_mode);
+
+		if (encoder->bridge && encoder->bridge->funcs->mode_set)
+			encoder->bridge->funcs->mode_set(encoder->bridge,
+							 mode, adjusted_mode);
+	}
+}
+
+/**
+ * drm_atomic_helper_commit_pre_planes - modeset commit before plane updates
+ * @dev: DRM device
+ * @state: atomic state
+ *
+ * This function commits the modeset changes that need to be committed before
+ * updating planes. It shuts down all the outputs that need to be shut down and
+ * prepares them (if required) with the new mode.
+ */
+void drm_atomic_helper_commit_pre_planes(struct drm_device *dev,
+					 struct drm_atomic_state *state)
+{
+	disable_outputs(dev, state);
+	set_routing_links(dev, state);
+	crtc_set_mode(dev, state);
+}
+EXPORT_SYMBOL(drm_atomic_helper_commit_pre_planes);
+
+/**
+ * drm_atomic_helper_commit_post_planes - modeset commit after plane updates
+ * @dev: DRM device
+ * @old_state: atomic state object with old state structures
+ *
+ * This function commits the modeset changes that need to be committed after
+ * updating planes: It enables all the outputs with the new configuration which
+ * had to be turned off for the update.
+ */
+void drm_atomic_helper_commit_post_planes(struct drm_device *dev,
+					  struct drm_atomic_state *old_state)
+{
+	int ncrtcs = old_state->dev->mode_config.num_crtc;
+	int i;
+
+	for (i = 0; i < ncrtcs; i++) {
+		struct drm_crtc_helper_funcs *funcs;
+		struct drm_crtc *crtc;
+
+		crtc = old_state->crtcs[i];
+
+		/* Need to filter out CRTCs where only planes change. */
+		if (!crtc || !crtc->state->mode_changed)
+			continue;
+
+		funcs = crtc->helper_private;
+
+		if (crtc->state->enable)
+			funcs->commit(crtc);
+	}
+
+	for (i = 0; i < old_state->num_connector; i++) {
+		struct drm_connector *connector;
+		struct drm_encoder_helper_funcs *funcs;
+		struct drm_encoder *encoder;
+
+		connector = old_state->connectors[i];
+
+		if (!connector || !connector->state->best_encoder)
+			continue;
+
+		encoder = connector->state->best_encoder;
+		funcs = encoder->helper_private;
+
+		/*
+		 * Each encoder has at most one connector (since we always steal
+		 * it away), so we won't call enable hooks twice.
+		 */
+		if (encoder->bridge)
+			encoder->bridge->funcs->pre_enable(encoder->bridge);
+
+		funcs->commit(encoder);
+
+		if (encoder->bridge)
+			encoder->bridge->funcs->enable(encoder->bridge);
+	}
+}
+EXPORT_SYMBOL(drm_atomic_helper_commit_post_planes);
+
+static void wait_for_fences(struct drm_device *dev,
+			    struct drm_atomic_state *state)
+{
+	int nplanes = dev->mode_config.num_total_plane;
+	int i;
+
+	for (i = 0; i < nplanes; i++) {
+		struct drm_plane *plane = state->planes[i];
+
+		if (!plane || !plane->state->fence)
+			continue;
+
+		WARN_ON(!plane->state->fb);
+
+		fence_wait(plane->state->fence, false);
+		fence_put(plane->state->fence);
+		plane->state->fence = NULL;
+	}
+}
+
+static bool framebuffer_changed(struct drm_device *dev,
+				struct drm_atomic_state *old_state,
+				struct drm_crtc *crtc)
+{
+	struct drm_plane *plane;
+	struct drm_plane_state *old_plane_state;
+	int nplanes = old_state->dev->mode_config.num_total_plane;
+	int i;
+
+	for (i = 0; i < nplanes; i++) {
+		plane = old_state->planes[i];
+		old_plane_state = old_state->plane_states[i];
+
+		if (!plane)
+			continue;
+
+		if (plane->state->crtc != crtc &&
+		    old_plane_state->crtc != crtc)
+			continue;
+
+		if (plane->state->fb != old_plane_state->fb)
+			return true;
+	}
+
+	return false;
+}
+
+/**
+ * drm_atomic_helper_wait_for_vblanks - wait for vblank on crtcs
+ * @dev: DRM device
+ * @old_state: atomic state object with old state structures
+ *
+ * Helper to, after atomic commit, wait for vblanks on all affected
+ * crtcs (i.e. before cleaning up old framebuffers using
+ * drm_atomic_helper_cleanup_planes()). It will only wait on crtcs where the
+ * framebuffers have actually changed to optimize for the legacy cursor and
+ * plane update use-case.
+ */
+void
+drm_atomic_helper_wait_for_vblanks(struct drm_device *dev,
+		struct drm_atomic_state *old_state)
+{
+	struct drm_crtc *crtc;
+	struct drm_crtc_state *old_crtc_state;
+	int ncrtcs = old_state->dev->mode_config.num_crtc;
+	int i, ret;
+
+	for (i = 0; i < ncrtcs; i++) {
+		crtc = old_state->crtcs[i];
+		old_crtc_state = old_state->crtc_states[i];
+
+		if (!crtc)
+			continue;
+
+		/* No one cares about the old state, so abuse it for tracking
+		 * and store whether we hold a vblank reference (and should do a
+		 * vblank wait) in the ->enable boolean. */
+		old_crtc_state->enable = false;
+
+		if (!crtc->state->enable)
+			continue;
+
+		if (!framebuffer_changed(dev, old_state, crtc))
+			continue;
+
+		ret = drm_crtc_vblank_get(crtc);
+		if (ret != 0)
+			continue;
+
+		old_crtc_state->enable = true;
+		old_crtc_state->last_vblank_count = drm_vblank_count(dev, i);
+	}
+
+	for (i = 0; i < ncrtcs; i++) {
+		crtc = old_state->crtcs[i];
+		old_crtc_state = old_state->crtc_states[i];
+
+		if (!crtc || !old_crtc_state->enable)
+			continue;
+
+		ret = wait_event_timeout(dev->vblank[i].queue,
+				old_crtc_state->last_vblank_count !=
+					drm_vblank_count(dev, i),
+				msecs_to_jiffies(50));
+
+		drm_crtc_vblank_put(crtc);
+	}
+}
+EXPORT_SYMBOL(drm_atomic_helper_wait_for_vblanks);
+
+/**
+ * drm_atomic_helper_commit - commit validated state object
+ * @dev: DRM device
+ * @state: the driver state object
+ * @async: asynchronous commit
+ *
+ * This function commits a state object that has been pre-validated with
+ * drm_atomic_helper_check(). This can still fail when e.g. the framebuffer
+ * reservation fails. For now this doesn't implement asynchronous commits.
+ *
+ * RETURNS
+ * Zero for success or -errno.
+ */
+int drm_atomic_helper_commit(struct drm_device *dev,
+			     struct drm_atomic_state *state,
+			     bool async)
+{
+	int ret;
+
+	if (async)
+		return -EBUSY;
+
+	ret = drm_atomic_helper_prepare_planes(dev, state);
+	if (ret)
+		return ret;
+
+	/*
+	 * This is the point of no return - everything below never fails except
+	 * when the hw goes bonghits. Which means we can commit the new state on
+	 * the software side now.
+	 */
+
+	drm_atomic_helper_swap_state(dev, state);
+
+	/*
+	 * Everything below can be run asynchronously without the need to grab
+	 * any modeset locks at all under one condition: It must be guaranteed
+	 * that the asynchronous work has either been cancelled (if the driver
+	 * supports it, which at least requires that the framebuffers get
+	 * cleaned up with drm_atomic_helper_cleanup_planes()) or completed
+	 * before the new state gets committed on the software side with
+	 * drm_atomic_helper_swap_state().
+	 *
+	 * This scheme allows new atomic state updates to be prepared and
+	 * checked in parallel to the asynchronous completion of the previous
+	 * update. Which is important since compositors need to figure out the
+	 * composition of the next frame right after having submitted the
+	 * current layout.
+	 */
+
+	wait_for_fences(dev, state);
+
+	drm_atomic_helper_commit_pre_planes(dev, state);
+
+	drm_atomic_helper_commit_planes(dev, state);
+
+	drm_atomic_helper_commit_post_planes(dev, state);
+
+	drm_atomic_helper_wait_for_vblanks(dev, state);
+
+	drm_atomic_helper_cleanup_planes(dev, state);
+
+	drm_atomic_state_free(state);
+
+	return 0;
+}
+EXPORT_SYMBOL(drm_atomic_helper_commit);
+
+/**
+ * DOC: implementing async commit
+ *
+ * For now the atomic helpers don't support async commit directly. If there is
+ * real need it could be added though, using the dma-buf fence infrastructure
+ * for generic synchronization with outstanding rendering.
+ *
+ * For now drivers have to implement async commit themselves, with the following
+ * sequence being the recommended one:
+ *
+ * 1. Run drm_atomic_helper_prepare_planes() first. This is the only function
+ * which commit needs to call which can fail, so we want to run it first and
+ * synchronously.
+ *
+ * 2. Synchronize with any outstanding asynchronous commit worker threads which
+ * might be affected by the new state update. This can be done by either
+ * cancelling or flushing the work items, depending upon whether the driver can
+ * deal with cancelled updates. Note that it is important to ensure that the
+ * framebuffer cleanup is still done when cancelling.
+ *
+ * For sufficient parallelism it is recommended to have a work item per crtc
+ * (for updates which don't touch global state) and a global one. Then we only
+ * need to synchronize with the crtc work items for changed crtcs and the global
+ * work item, which allows nice concurrent updates on disjoint sets of crtcs.
+ *
+ * 3. The software state is updated synchronously with
+ * drm_atomic_helper_swap_state. Doing this under the protection of all modeset
+ * locks means concurrent callers never see inconsistent state. And doing this
+ * while it's guaranteed that no relevant async worker runs means that async
+ * workers do not need to grab any locks. Actually they must not grab locks, for
+ * otherwise the work flushing will deadlock.
+ *
+ * 4. Schedule a work item to do all subsequent steps, using the split-out
+ * commit helpers: a) pre-plane commit b) plane commit c) post-plane commit and
+ * then cleaning up the framebuffers after the old framebuffer is no longer
+ * being displayed.
+ */
+
+/**
+ * drm_atomic_helper_prepare_planes - prepare plane resources before commit
+ * @dev: DRM device
+ * @state: atomic state object with new state structures
+ *
+ * This function prepares plane state, specifically framebuffers, for the new
+ * configuration. If any failure is encountered this function will call
+ * ->cleanup_fb on any already successfully prepared framebuffer.
+ *
+ * Returns:
+ * 0 on success, negative error code on failure.
+ */
+int drm_atomic_helper_prepare_planes(struct drm_device *dev,
+				     struct drm_atomic_state *state)
+{
+	int nplanes = dev->mode_config.num_total_plane;
+	int ret, i;
+
+	for (i = 0; i < nplanes; i++) {
+		struct drm_plane_helper_funcs *funcs;
+		struct drm_plane *plane = state->planes[i];
+		struct drm_framebuffer *fb;
+
+		if (!plane)
+			continue;
+
+		funcs = plane->helper_private;
+
+		fb = state->plane_states[i]->fb;
+
+		if (fb && funcs->prepare_fb) {
+			ret = funcs->prepare_fb(plane, fb);
+			if (ret)
+				goto fail;
+		}
+	}
+
+	return 0;
+
+fail:
+	for (i--; i >= 0; i--) {
+		struct drm_plane_helper_funcs *funcs;
+		struct drm_plane *plane = state->planes[i];
+		struct drm_framebuffer *fb;
+
+		if (!plane)
+			continue;
+
+		funcs = plane->helper_private;
+
+		fb = state->plane_states[i]->fb;
+
+		if (fb && funcs->cleanup_fb)
+			funcs->cleanup_fb(plane, fb);
+
+	}
+
+	return ret;
+}
+EXPORT_SYMBOL(drm_atomic_helper_prepare_planes);
+
+/**
+ * drm_atomic_helper_commit_planes - commit plane state
+ * @dev: DRM device
+ * @old_state: atomic state object with old state structures
+ *
+ * This function commits the new plane state using the plane and atomic helper
+ * functions for planes and crtcs. It assumes that the atomic state has already
+ * been pushed into the relevant object state pointers, since this step can no
+ * longer fail.
+ *
+ * It still requires the global state object @old_state to know which planes and
+ * crtcs need to be updated though.
+ */
+void drm_atomic_helper_commit_planes(struct drm_device *dev,
+				     struct drm_atomic_state *old_state)
+{
+	int nplanes = dev->mode_config.num_total_plane;
+	int ncrtcs = dev->mode_config.num_crtc;
+	int i;
+
+	for (i = 0; i < ncrtcs; i++) {
+		struct drm_crtc_helper_funcs *funcs;
+		struct drm_crtc *crtc = old_state->crtcs[i];
+
+		if (!crtc)
+			continue;
+
+		funcs = crtc->helper_private;
+
+		if (!funcs || !funcs->atomic_begin)
+			continue;
+
+		funcs->atomic_begin(crtc);
+	}
+
+	for (i = 0; i < nplanes; i++) {
+		struct drm_plane_helper_funcs *funcs;
+		struct drm_plane *plane = old_state->planes[i];
+		struct drm_plane_state *old_plane_state;
+
+		if (!plane)
+			continue;
+
+		funcs = plane->helper_private;
+
+		if (!funcs || !funcs->atomic_update)
+			continue;
+
+		old_plane_state = old_state->plane_states[i];
+
+		funcs->atomic_update(plane, old_plane_state);
+	}
+
+	for (i = 0; i < ncrtcs; i++) {
+		struct drm_crtc_helper_funcs *funcs;
+		struct drm_crtc *crtc = old_state->crtcs[i];
+
+		if (!crtc)
+			continue;
+
+		funcs = crtc->helper_private;
+
+		if (!funcs || !funcs->atomic_flush)
+			continue;
+
+		funcs->atomic_flush(crtc);
+	}
+}
+EXPORT_SYMBOL(drm_atomic_helper_commit_planes);
+
+/**
+ * drm_atomic_helper_cleanup_planes - cleanup plane resources after commit
+ * @dev: DRM device
+ * @old_state: atomic state object with old state structures
+ *
+ * This function cleans up plane state, specifically framebuffers, from the old
+ * configuration. Hence the old configuration must be preserved in @old_state to
+ * be able to call this function.
+ *
+ * This function must also be called on the new state when the atomic update
+ * fails at any point after calling drm_atomic_helper_prepare_planes().
+ */
+void drm_atomic_helper_cleanup_planes(struct drm_device *dev,
+				      struct drm_atomic_state *old_state)
+{
+	int nplanes = dev->mode_config.num_total_plane;
+	int i;
+
+	for (i = 0; i < nplanes; i++) {
+		struct drm_plane_helper_funcs *funcs;
+		struct drm_plane *plane = old_state->planes[i];
+		struct drm_framebuffer *old_fb;
+
+		if (!plane)
+			continue;
+
+		funcs = plane->helper_private;
+
+		old_fb = old_state->plane_states[i]->fb;
+
+		if (old_fb && funcs->cleanup_fb)
+			funcs->cleanup_fb(plane, old_fb);
+	}
+}
+EXPORT_SYMBOL(drm_atomic_helper_cleanup_planes);
+
+/**
+ * drm_atomic_helper_swap_state - store atomic state into current sw state
+ * @dev: DRM device
+ * @state: atomic state
+ *
+ * This function stores the atomic state into the current state pointers in all
+ * driver objects. It should be called after all failing steps have been done
+ * and succeeded, but before the actual hardware state is committed.
+ *
+ * For cleanup and error recovery the current state for all changed objects will
+ * be swapped into @state.
+ *
+ * With that sequence it fits perfectly into the plane prepare/cleanup sequence:
+ *
+ * 1. Call drm_atomic_helper_prepare_planes() with the staged atomic state.
+ *
+ * 2. Do any other steps that might fail.
+ *
+ * 3. Put the staged state into the current state pointers with this function.
+ *
+ * 4. Actually commit the hardware state.
+ *
+ * 5. Call drm_atomic_helper_cleanup_planes() with @state, which since step 3
+ * contains the old state. Also do any other cleanup required with that state.
+ */
+void drm_atomic_helper_swap_state(struct drm_device *dev,
+				  struct drm_atomic_state *state)
+{
+	int i;
+
+	for (i = 0; i < dev->mode_config.num_connector; i++) {
+		struct drm_connector *connector = state->connectors[i];
+
+		if (!connector)
+			continue;
+
+		connector->state->state = state;
+		swap(state->connector_states[i], connector->state);
+		connector->state->state = NULL;
+	}
+
+	for (i = 0; i < dev->mode_config.num_crtc; i++) {
+		struct drm_crtc *crtc = state->crtcs[i];
+
+		if (!crtc)
+			continue;
+
+		crtc->state->state = state;
+		swap(state->crtc_states[i], crtc->state);
+		crtc->state->state = NULL;
+	}
+
+	for (i = 0; i < dev->mode_config.num_total_plane; i++) {
+		struct drm_plane *plane = state->planes[i];
+
+		if (!plane)
+			continue;
+
+		plane->state->state = state;
+		swap(state->plane_states[i], plane->state);
+		plane->state->state = NULL;
+	}
+}
+EXPORT_SYMBOL(drm_atomic_helper_swap_state);
+
+/**
+ * drm_atomic_helper_update_plane - Helper for primary plane update using atomic
+ * @plane: plane object to update
+ * @crtc: owning CRTC of owning plane
+ * @fb: framebuffer to flip onto plane
+ * @crtc_x: x offset of primary plane on crtc
+ * @crtc_y: y offset of primary plane on crtc
+ * @crtc_w: width of primary plane rectangle on crtc
+ * @crtc_h: height of primary plane rectangle on crtc
+ * @src_x: x offset of @fb for panning
+ * @src_y: y offset of @fb for panning
+ * @src_w: width of source rectangle in @fb
+ * @src_h: height of source rectangle in @fb
+ *
+ * Provides a default plane update handler using the atomic driver interface.
+ *
+ * RETURNS:
+ * Zero on success, error code on failure
+ */
+int drm_atomic_helper_update_plane(struct drm_plane *plane,
+				   struct drm_crtc *crtc,
+				   struct drm_framebuffer *fb,
+				   int crtc_x, int crtc_y,
+				   unsigned int crtc_w, unsigned int crtc_h,
+				   uint32_t src_x, uint32_t src_y,
+				   uint32_t src_w, uint32_t src_h)
+{
+	struct drm_atomic_state *state;
+	struct drm_plane_state *plane_state;
+	int ret = 0;
+
+	state = drm_atomic_state_alloc(plane->dev);
+	if (!state)
+		return -ENOMEM;
+
+	state->acquire_ctx = drm_modeset_legacy_acquire_ctx(crtc);
+retry:
+	plane_state = drm_atomic_get_plane_state(state, plane);
+	if (IS_ERR(plane_state)) {
+		ret = PTR_ERR(plane_state);
+		goto fail;
+	}
+
+	ret = drm_atomic_set_crtc_for_plane(state, plane, crtc);
+	if (ret != 0)
+		goto fail;
+	drm_atomic_set_fb_for_plane(plane_state, fb);
+	plane_state->crtc_x = crtc_x;
+	plane_state->crtc_y = crtc_y;
+	plane_state->crtc_h = crtc_h;
+	plane_state->crtc_w = crtc_w;
+	plane_state->src_x = src_x;
+	plane_state->src_y = src_y;
+	plane_state->src_h = src_h;
+	plane_state->src_w = src_w;
+
+	ret = drm_atomic_commit(state);
+	if (ret != 0)
+		goto fail;
+
+	/* Driver takes ownership of state on successful commit. */
+	return 0;
+fail:
+	if (ret == -EDEADLK)
+		goto backoff;
+
+	drm_atomic_state_free(state);
+
+	return ret;
+backoff:
+	drm_atomic_state_clear(state);
+	drm_atomic_legacy_backoff(state);
+
+	/*
+	 * Someone might have exchanged the framebuffer while we dropped locks
+	 * in the backoff code. We need to fix up the fb refcount tracking the
+	 * core does for us.
+	 */
+	plane->old_fb = plane->fb;
+
+	goto retry;
+}
+EXPORT_SYMBOL(drm_atomic_helper_update_plane);
+
+/**
+ * drm_atomic_helper_disable_plane - Helper for primary plane disable using atomic
+ * @plane: plane to disable
+ *
+ * Provides a default plane disable handler using the atomic driver interface.
+ *
+ * RETURNS:
+ * Zero on success, error code on failure
+ */
+int drm_atomic_helper_disable_plane(struct drm_plane *plane)
+{
+	struct drm_atomic_state *state;
+	struct drm_plane_state *plane_state;
+	int ret = 0;
+
+	/*
+	 * FIXME: Without plane->crtc set we can't get at the implicit legacy
+	 * acquire context. The real fix will be to wire the acquire ctx through
+	 * everywhere we need it, but meanwhile prevent chaos by just skipping
+	 * this noop. The critical case is the cursor ioctls which a) only grab
+	 * crtc/cursor-plane locks (so we need the crtc to get at the right
+	 * acquire context) and b) can try to disable the plane multiple times.
+	 */
+	if (!plane->crtc)
+		return 0;
+
+	state = drm_atomic_state_alloc(plane->dev);
+	if (!state)
+		return -ENOMEM;
+
+	state->acquire_ctx = drm_modeset_legacy_acquire_ctx(plane->crtc);
+retry:
+	plane_state = drm_atomic_get_plane_state(state, plane);
+	if (IS_ERR(plane_state)) {
+		ret = PTR_ERR(plane_state);
+		goto fail;
+	}
+
+	ret = drm_atomic_set_crtc_for_plane(state, plane, NULL);
+	if (ret != 0)
+		goto fail;
+	drm_atomic_set_fb_for_plane(plane_state, NULL);
+	plane_state->crtc_x = 0;
+	plane_state->crtc_y = 0;
+	plane_state->crtc_h = 0;
+	plane_state->crtc_w = 0;
+	plane_state->src_x = 0;
+	plane_state->src_y = 0;
+	plane_state->src_h = 0;
+	plane_state->src_w = 0;
+
+	ret = drm_atomic_commit(state);
+	if (ret != 0)
+		goto fail;
+
+	/* Driver takes ownership of state on successful commit. */
+	return 0;
+fail:
+	if (ret == -EDEADLK)
+		goto backoff;
+
+	drm_atomic_state_free(state);
+
+	return ret;
+backoff:
+	drm_atomic_state_clear(state);
+	drm_atomic_legacy_backoff(state);
+
+	/*
+	 * Someone might have exchanged the framebuffer while we dropped locks
+	 * in the backoff code. We need to fix up the fb refcount tracking the
+	 * core does for us.
+	 */
+	plane->old_fb = plane->fb;
+
+	goto retry;
+}
+EXPORT_SYMBOL(drm_atomic_helper_disable_plane);
+
+static int update_output_state(struct drm_atomic_state *state,
+			       struct drm_mode_set *set)
+{
+	struct drm_device *dev = set->crtc->dev;
+	struct drm_connector_state *conn_state;
+	int ncrtcs = state->dev->mode_config.num_crtc;
+	int ret, i, j;
+
+	ret = drm_modeset_lock(&dev->mode_config.connection_mutex,
+			       state->acquire_ctx);
+	if (ret)
+		return ret;
+
+	/* First grab all affected connector/crtc states. */
+	for (i = 0; i < set->num_connectors; i++) {
+		conn_state = drm_atomic_get_connector_state(state,
+							    set->connectors[i]);
+		if (IS_ERR(conn_state))
+			return PTR_ERR(conn_state);
+	}
+
+	for (i = 0; i < ncrtcs; i++) {
+		struct drm_crtc *crtc = state->crtcs[i];
+
+		if (!crtc)
+			continue;
+
+		ret = drm_atomic_add_affected_connectors(state, crtc);
+		if (ret)
+			return ret;
+	}
+
+	/* Then recompute connector->crtc links and crtc enabling state. */
+	for (i = 0; i < state->num_connector; i++) {
+		struct drm_connector *connector;
+
+		connector = state->connectors[i];
+		conn_state = state->connector_states[i];
+
+		if (!connector)
+			continue;
+
+		if (conn_state->crtc == set->crtc) {
+			ret = drm_atomic_set_crtc_for_connector(conn_state,
+								NULL);
+			if (ret)
+				return ret;
+		}
+
+		for (j = 0; j < set->num_connectors; j++) {
+			if (set->connectors[j] == connector) {
+				ret = drm_atomic_set_crtc_for_connector(conn_state,
+									set->crtc);
+				if (ret)
+					return ret;
+				break;
+			}
+		}
+	}
+
+	for (i = 0; i < ncrtcs; i++) {
+		struct drm_crtc *crtc = state->crtcs[i];
+		struct drm_crtc_state *crtc_state = state->crtc_states[i];
+
+		if (!crtc)
+			continue;
+
+		/* Don't update ->enable for the CRTC in the set_config request,
+		 * since a mismatch would indicate a bug in the upper layers.
+		 * The actual modeset code later on will catch any
+		 * inconsistencies here. */
+		if (crtc == set->crtc)
+			continue;
+
+		crtc_state->enable =
+			drm_atomic_connectors_for_crtc(state, crtc);
+	}
+
+	return 0;
+}
+
+/**
+ * drm_atomic_helper_set_config - set a new config from userspace
+ * @set: mode set configuration
+ *
+ * Provides a default crtc set_config handler using the atomic driver interface.
+ *
+ * Returns:
+ * Returns 0 on success, negative errno numbers on failure.
+ */
+int drm_atomic_helper_set_config(struct drm_mode_set *set)
+{
+	struct drm_atomic_state *state;
+	struct drm_crtc *crtc = set->crtc;
+	struct drm_crtc_state *crtc_state;
+	struct drm_plane_state *primary_state;
+	int ret = 0;
+
+	state = drm_atomic_state_alloc(crtc->dev);
+	if (!state)
+		return -ENOMEM;
+
+	state->acquire_ctx = drm_modeset_legacy_acquire_ctx(crtc);
+retry:
+	crtc_state = drm_atomic_get_crtc_state(state, crtc);
+	if (IS_ERR(crtc_state)) {
+		ret = PTR_ERR(crtc_state);
+		goto fail;
+	}
+
+	primary_state = drm_atomic_get_plane_state(state, crtc->primary);
+	if (IS_ERR(primary_state)) {
+		ret = PTR_ERR(primary_state);
+		goto fail;
+	}
+
+	if (!set->mode) {
+		WARN_ON(set->fb);
+		WARN_ON(set->num_connectors);
+
+		crtc_state->enable = false;
+
+		ret = drm_atomic_set_crtc_for_plane(state, crtc->primary, NULL);
+		if (ret != 0)
+			goto fail;
+
+		drm_atomic_set_fb_for_plane(primary_state, NULL);
+
+		goto commit;
+	}
+
+	WARN_ON(!set->fb);
+	WARN_ON(!set->num_connectors);
+
+	crtc_state->enable = true;
+	drm_mode_copy(&crtc_state->mode, set->mode);
+
+	ret = drm_atomic_set_crtc_for_plane(state, crtc->primary, crtc);
+	if (ret != 0)
+		goto fail;
+	drm_atomic_set_fb_for_plane(primary_state, set->fb);
+	primary_state->crtc_x = 0;
+	primary_state->crtc_y = 0;
+	primary_state->crtc_h = set->mode->vdisplay;
+	primary_state->crtc_w = set->mode->hdisplay;
+	primary_state->src_x = set->x << 16;
+	primary_state->src_y = set->y << 16;
+	primary_state->src_h = set->mode->vdisplay << 16;
+	primary_state->src_w = set->mode->hdisplay << 16;
+
+commit:
+	ret = update_output_state(state, set);
+	if (ret)
+		goto fail;
+
+	ret = drm_atomic_commit(state);
+	if (ret != 0)
+		goto fail;
+
+	/* Driver takes ownership of state on successful commit. */
+	return 0;
+fail:
+	if (ret == -EDEADLK)
+		goto backoff;
+
+	drm_atomic_state_free(state);
+
+	return ret;
+backoff:
+	drm_atomic_state_clear(state);
+	drm_atomic_legacy_backoff(state);
+
+	/*
+	 * Someone might have exchanged the framebuffer while we dropped locks
+	 * in the backoff code. We need to fix up the fb refcount tracking the
+	 * core does for us.
+	 */
+	crtc->primary->old_fb = crtc->primary->fb;
+
+	goto retry;
+}
+EXPORT_SYMBOL(drm_atomic_helper_set_config);
+
+/**
+ * drm_atomic_helper_crtc_set_property - helper for crtc properties
+ * @crtc: DRM crtc
+ * @property: DRM property
+ * @val: value of property
+ *
+ * Default crtc set_property handler using the atomic driver interface.
+ *
+ * RETURNS:
+ * Zero on success, error code on failure
+ */
+int
+drm_atomic_helper_crtc_set_property(struct drm_crtc *crtc,
+				    struct drm_property *property,
+				    uint64_t val)
+{
+	struct drm_atomic_state *state;
+	struct drm_crtc_state *crtc_state;
+	int ret = 0;
+
+	state = drm_atomic_state_alloc(crtc->dev);
+	if (!state)
+		return -ENOMEM;
+
+	/* ->set_property is always called with all locks held. */
+	state->acquire_ctx = crtc->dev->mode_config.acquire_ctx;
+retry:
+	crtc_state = drm_atomic_get_crtc_state(state, crtc);
+	if (IS_ERR(crtc_state)) {
+		ret = PTR_ERR(crtc_state);
+		goto fail;
+	}
+
+	ret = crtc->funcs->atomic_set_property(crtc, crtc_state,
+					       property, val);
+	if (ret)
+		goto fail;
+
+	ret = drm_atomic_commit(state);
+	if (ret != 0)
+		goto fail;
+
+	/* Driver takes ownership of state on successful commit. */
+	return 0;
+fail:
+	if (ret == -EDEADLK)
+		goto backoff;
+
+	drm_atomic_state_free(state);
+
+	return ret;
+backoff:
+	drm_atomic_state_clear(state);
+	drm_atomic_legacy_backoff(state);
+
+	goto retry;
+}
+EXPORT_SYMBOL(drm_atomic_helper_crtc_set_property);
+
+/**
+ * drm_atomic_helper_plane_set_property - helper for plane properties
+ * @plane: DRM plane
+ * @property: DRM property
+ * @val: value of property
+ *
+ * Default plane set_property handler using the atomic driver interface.
+ *
+ * RETURNS:
+ * Zero on success, error code on failure
+ */
+int
+drm_atomic_helper_plane_set_property(struct drm_plane *plane,
+				    struct drm_property *property,
+				    uint64_t val)
+{
+	struct drm_atomic_state *state;
+	struct drm_plane_state *plane_state;
+	int ret = 0;
+
+	state = drm_atomic_state_alloc(plane->dev);
+	if (!state)
+		return -ENOMEM;
+
+	/* ->set_property is always called with all locks held. */
+	state->acquire_ctx = plane->dev->mode_config.acquire_ctx;
+retry:
+	plane_state = drm_atomic_get_plane_state(state, plane);
+	if (IS_ERR(plane_state)) {
+		ret = PTR_ERR(plane_state);
+		goto fail;
+	}
+
+	ret = plane->funcs->atomic_set_property(plane, plane_state,
+					       property, val);
+	if (ret)
+		goto fail;
+
+	ret = drm_atomic_commit(state);
+	if (ret != 0)
+		goto fail;
+
+	/* Driver takes ownership of state on successful commit. */
+	return 0;
+fail:
+	if (ret == -EDEADLK)
+		goto backoff;
+
+	drm_atomic_state_free(state);
+
+	return ret;
+backoff:
+	drm_atomic_state_clear(state);
+	drm_atomic_legacy_backoff(state);
+
+	goto retry;
+}
+EXPORT_SYMBOL(drm_atomic_helper_plane_set_property);
+
+/**
+ * drm_atomic_helper_connector_set_property - helper for connector properties
+ * @connector: DRM connector
+ * @property: DRM property
+ * @val: value of property
+ *
+ * Default connector set_property handler using the atomic driver interface.
+ *
+ * RETURNS:
+ * Zero on success, error code on failure
+ */
+int
+drm_atomic_helper_connector_set_property(struct drm_connector *connector,
+				    struct drm_property *property,
+				    uint64_t val)
+{
+	struct drm_atomic_state *state;
+	struct drm_connector_state *connector_state;
+	int ret = 0;
+
+	state = drm_atomic_state_alloc(connector->dev);
+	if (!state)
+		return -ENOMEM;
+
+	/* ->set_property is always called with all locks held. */
+	state->acquire_ctx = connector->dev->mode_config.acquire_ctx;
+retry:
+	connector_state = drm_atomic_get_connector_state(state, connector);
+	if (IS_ERR(connector_state)) {
+		ret = PTR_ERR(connector_state);
+		goto fail;
+	}
+
+	ret = connector->funcs->atomic_set_property(connector, connector_state,
+					       property, val);
+	if (ret)
+		goto fail;
+
+	ret = drm_atomic_commit(state);
+	if (ret != 0)
+		goto fail;
+
+	/* Driver takes ownership of state on successful commit. */
+	return 0;
+fail:
+	if (ret == -EDEADLK)
+		goto backoff;
+
+	drm_atomic_state_free(state);
+
+	return ret;
+backoff:
+	drm_atomic_state_clear(state);
+	drm_atomic_legacy_backoff(state);
+
+	goto retry;
+}
+EXPORT_SYMBOL(drm_atomic_helper_connector_set_property);
+
+/**
+ * drm_atomic_helper_page_flip - execute a legacy page flip
+ * @crtc: DRM crtc
+ * @fb: DRM framebuffer
+ * @event: optional DRM event to signal upon completion
+ * @flags: flip flags for non-vblank sync'ed updates
+ *
+ * Provides a default page flip implementation using the atomic driver interface.
+ *
+ * Note that for now so-called async page flips (i.e. updates which are not
+ * synchronized to vblank) are not supported, since the atomic interfaces have
+ * no provisions for this yet.
+ *
+ * Returns:
+ * Returns 0 on success, negative errno numbers on failure.
+ */
+int drm_atomic_helper_page_flip(struct drm_crtc *crtc,
+				struct drm_framebuffer *fb,
+				struct drm_pending_vblank_event *event,
+				uint32_t flags)
+{
+	struct drm_plane *plane = crtc->primary;
+	struct drm_atomic_state *state;
+	struct drm_plane_state *plane_state;
+	struct drm_crtc_state *crtc_state;
+	int ret = 0;
+
+	if (flags & DRM_MODE_PAGE_FLIP_ASYNC)
+		return -EINVAL;
+
+	state = drm_atomic_state_alloc(plane->dev);
+	if (!state)
+		return -ENOMEM;
+
+	state->acquire_ctx = drm_modeset_legacy_acquire_ctx(crtc);
+retry:
+	crtc_state = drm_atomic_get_crtc_state(state, crtc);
+	if (IS_ERR(crtc_state)) {
+		ret = PTR_ERR(crtc_state);
+		goto fail;
+	}
+	crtc_state->event = event;
+
+	plane_state = drm_atomic_get_plane_state(state, plane);
+	if (IS_ERR(plane_state)) {
+		ret = PTR_ERR(plane_state);
+		goto fail;
+	}
+
+	ret = drm_atomic_set_crtc_for_plane(state, plane, crtc);
+	if (ret != 0)
+		goto fail;
+	drm_atomic_set_fb_for_plane(plane_state, fb);
+
+	ret = drm_atomic_async_commit(state);
+	if (ret != 0)
+		goto fail;
+
+	/* TODO: ->page_flip is the only driver callback where the core
+	 * doesn't update plane->fb. For now patch it up here. */
+	plane->fb = plane->state->fb;
+
+	/* Driver takes ownership of state on successful async commit. */
+	return 0;
+fail:
+	if (ret == -EDEADLK)
+		goto backoff;
+
+	drm_atomic_state_free(state);
+
+	return ret;
+backoff:
+	drm_atomic_state_clear(state);
+	drm_atomic_legacy_backoff(state);
+
+	/*
+	 * Someone might have exchanged the framebuffer while we dropped locks
+	 * in the backoff code. We need to fix up the fb refcount tracking the
+	 * core does for us.
+	 */
+	plane->old_fb = plane->fb;
+
+	goto retry;
+}
+EXPORT_SYMBOL(drm_atomic_helper_page_flip);
+
+/**
+ * DOC: atomic state reset and initialization
+ *
+ * Both the drm core and the atomic helpers assume that the full and correct
+ * atomic software state for all connectors, CRTCs and planes is always
+ * available. This is a bit of a problem at driver load and also after system
+ * suspend. One way to solve this is to have a hardware state read-out
+ * infrastructure which reconstructs the full software state (e.g. the i915
+ * driver).
+ *
+ * The simpler solution is to just reset the software state to everything off,
+ * which is easiest to do by calling drm_mode_config_reset(). To facilitate this
+ * the atomic helpers provide default reset implementations for all hooks.
+ */
+
+/**
+ * drm_atomic_helper_crtc_reset - default ->reset hook for CRTCs
+ * @crtc: drm CRTC
+ *
+ * Resets the atomic state for @crtc by freeing the state pointer (which might
+ * be NULL, e.g. at driver load time) and allocating a new empty state object.
+ */
+void drm_atomic_helper_crtc_reset(struct drm_crtc *crtc)
+{
+	kfree(crtc->state);
+	crtc->state = kzalloc(sizeof(*crtc->state), GFP_KERNEL);
+}
+EXPORT_SYMBOL(drm_atomic_helper_crtc_reset);
+
+/**
+ * drm_atomic_helper_crtc_duplicate_state - default state duplicate hook
+ * @crtc: drm CRTC
+ *
+ * Default CRTC state duplicate hook for drivers which don't have their own
+ * subclassed CRTC state structure.
+ */
+struct drm_crtc_state *
+drm_atomic_helper_crtc_duplicate_state(struct drm_crtc *crtc)
+{
+	struct drm_crtc_state *state;
+
+	if (WARN_ON(!crtc->state))
+		return NULL;
+
+	state = kmemdup(crtc->state, sizeof(*crtc->state), GFP_KERNEL);
+
+	if (state) {
+		state->mode_changed = false;
+		state->planes_changed = false;
+		state->event = NULL;
+	}
+
+	return state;
+}
+EXPORT_SYMBOL(drm_atomic_helper_crtc_duplicate_state);
+
+/**
+ * drm_atomic_helper_crtc_destroy_state - default state destroy hook
+ * @crtc: drm CRTC
+ * @state: CRTC state object to release
+ *
+ * Default CRTC state destroy hook for drivers which don't have their own
+ * subclassed CRTC state structure.
+ */
+void drm_atomic_helper_crtc_destroy_state(struct drm_crtc *crtc,
+					  struct drm_crtc_state *state)
+{
+	kfree(state);
+}
+EXPORT_SYMBOL(drm_atomic_helper_crtc_destroy_state);
+
+/**
+ * drm_atomic_helper_plane_reset - default ->reset hook for planes
+ * @plane: drm plane
+ *
+ * Resets the atomic state for @plane by freeing the state pointer (which might
+ * be NULL, e.g. at driver load time) and allocating a new empty state object.
+ */
+void drm_atomic_helper_plane_reset(struct drm_plane *plane)
+{
+	if (plane->state && plane->state->fb)
+		drm_framebuffer_unreference(plane->state->fb);
+
+	kfree(plane->state);
+	plane->state = kzalloc(sizeof(*plane->state), GFP_KERNEL);
+}
+EXPORT_SYMBOL(drm_atomic_helper_plane_reset);
+
+/**
+ * drm_atomic_helper_plane_duplicate_state - default state duplicate hook
+ * @plane: drm plane
+ *
+ * Default plane state duplicate hook for drivers which don't have their own
+ * subclassed plane state structure.
+ */
+struct drm_plane_state *
+drm_atomic_helper_plane_duplicate_state(struct drm_plane *plane)
+{
+	struct drm_plane_state *state;
+
+	if (WARN_ON(!plane->state))
+		return NULL;
+
+	state = kmemdup(plane->state, sizeof(*plane->state), GFP_KERNEL);
+
+	if (state && state->fb)
+		drm_framebuffer_reference(state->fb);
+
+	return state;
+}
+EXPORT_SYMBOL(drm_atomic_helper_plane_duplicate_state);
+
+/**
+ * drm_atomic_helper_plane_destroy_state - default state destroy hook
+ * @plane: drm plane
+ * @state: plane state object to release
+ *
+ * Default plane state destroy hook for drivers which don't have their own
+ * subclassed plane state structure.
+ */
+void drm_atomic_helper_plane_destroy_state(struct drm_plane *plane,
+					   struct drm_plane_state *state)
+{
+	if (state->fb)
+		drm_framebuffer_unreference(state->fb);
+
+	kfree(state);
+}
+EXPORT_SYMBOL(drm_atomic_helper_plane_destroy_state);
+
+/**
+ * drm_atomic_helper_connector_reset - default ->reset hook for connectors
+ * @connector: drm connector
+ *
+ * Resets the atomic state for @connector by freeing the state pointer (which
+ * might be NULL, e.g. at driver load time) and allocating a new empty state
+ * object.
+ */
+void drm_atomic_helper_connector_reset(struct drm_connector *connector)
+{
+	kfree(connector->state);
+	connector->state = kzalloc(sizeof(*connector->state), GFP_KERNEL);
+}
+EXPORT_SYMBOL(drm_atomic_helper_connector_reset);
+
+/**
+ * drm_atomic_helper_connector_duplicate_state - default state duplicate hook
+ * @connector: drm connector
+ *
+ * Default connector state duplicate hook for drivers which don't have their own
+ * subclassed connector state structure.
+ */
+struct drm_connector_state *
+drm_atomic_helper_connector_duplicate_state(struct drm_connector *connector)
+{
+	if (WARN_ON(!connector->state))
+		return NULL;
+
+	return kmemdup(connector->state, sizeof(*connector->state), GFP_KERNEL);
+}
+EXPORT_SYMBOL(drm_atomic_helper_connector_duplicate_state);
+
+/**
+ * drm_atomic_helper_connector_destroy_state - default state destroy hook
+ * @connector: drm connector
+ * @state: connector state object to release
+ *
+ * Default connector state destroy hook for drivers which don't have their own
+ * subclassed connector state structure.
+ */
+void drm_atomic_helper_connector_destroy_state(struct drm_connector *connector,
+					  struct drm_connector_state *state)
+{
+	kfree(state);
+}
+EXPORT_SYMBOL(drm_atomic_helper_connector_destroy_state);
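
Note on the retry/backoff control flow shared by the legacy-entry helpers above
(disable_plane, set_config, the set_property helpers and page_flip): each one
builds its state, attempts the commit, and on -EDEADLK clears the state, backs
off and retries, while any other error frees the state and is returned. The
following is only a userspace sketch of that flow, not kernel code. Every name
in it (fake_state, try_commit, clear_and_backoff, commit_with_backoff) is
hypothetical and stands in for the DRM functions, and lock contention is
simulated with a simple counter.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct drm_atomic_state; not a kernel type. */
struct fake_state {
	int contended;	/* how many more attempts will hit lock contention */
	int attempts;	/* attempts made so far */
	int value;	/* state built up for the current attempt */
};

/* Stand-in for "build the state, then commit it". */
static int try_commit(struct fake_state *s, int value)
{
	s->attempts++;
	if (value < 0)
		return -EINVAL;		/* hard failure: no retry */
	s->value = value;
	if (s->contended > 0) {
		s->contended--;
		return -EDEADLK;	/* contention: caller must back off */
	}
	return 0;
}

/* Stand-in for clearing the state and dropping locks before a retry. */
static void clear_and_backoff(struct fake_state *s)
{
	s->value = 0;
}

/*
 * Same shape as the helpers: retry on -EDEADLK, give up on any other error.
 * In the kernel the driver owns the state after a successful commit; this
 * sketch has no such consumer, so it frees the state on every exit path.
 */
static int commit_with_backoff(int contended, int value, int *attempts)
{
	struct fake_state *s;
	int ret;

	assert(attempts != NULL);

	s = calloc(1, sizeof(*s));
	if (!s)
		return -ENOMEM;
	s->contended = contended;

retry:
	ret = try_commit(s, value);
	if (ret == -EDEADLK) {
		clear_and_backoff(s);
		goto retry;
	}

	*attempts = s->attempts;
	free(s);
	return ret;
}
```

The point of clearing before retrying is that nothing computed under the
dropped locks may be reused, so every attempt rebuilds the state from scratch.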
diff --git a/drivers/gpu/drm/drm_crtc.c b/drivers/gpu/drm/drm_crtc.c
index e79c8d3700d8..5213da499d39 100644
--- a/drivers/gpu/drm/drm_crtc.c
+++ b/drivers/gpu/drm/drm_crtc.c
@@ -683,7 +683,7 @@ int drm_crtc_init_with_planes(struct drm_device *dev, struct drm_crtc *crtc,
 	drm_modeset_lock_init(&crtc->mutex);
 	ret = drm_mode_object_get(dev, &crtc->base, DRM_MODE_OBJECT_CRTC);
 	if (ret)
-		goto out;
+		return ret;
 
 	crtc->base.properties = &crtc->properties;
 
@@ -697,9 +697,7 @@ int drm_crtc_init_with_planes(struct drm_device *dev, struct drm_crtc *crtc,
 	if (cursor)
 		cursor->possible_crtcs = 1 << drm_crtc_index(crtc);
 
- out:
-
-	return ret;
+	return 0;
 }
 EXPORT_SYMBOL(drm_crtc_init_with_planes);
 
@@ -723,6 +721,12 @@ void drm_crtc_cleanup(struct drm_crtc *crtc)
 	drm_mode_object_put(dev, &crtc->base);
 	list_del(&crtc->head);
 	dev->mode_config.num_crtc--;
+
+	WARN_ON(crtc->state && !crtc->funcs->atomic_destroy_state);
+	if (crtc->state && crtc->funcs->atomic_destroy_state)
+		crtc->funcs->atomic_destroy_state(crtc, crtc->state);
+
+	memset(crtc, 0, sizeof(*crtc));
 }
 EXPORT_SYMBOL(drm_crtc_cleanup);
 
@@ -766,7 +770,6 @@ static void drm_mode_remove(struct drm_connector *connector,
 /**
  * drm_connector_get_cmdline_mode - reads the user's cmdline mode
  * @connector: connector to quwery
- * @mode: returned mode
  *
  * The kernel supports per-connector configration of its consoles through
  * use of the video= parameter. This function parses that option and
@@ -870,6 +873,8 @@ int drm_connector_init(struct drm_device *dev,
 
 	drm_connector_get_cmdline_mode(connector);
 
+	/* We should add connectors at the end to avoid upsetting the connector
+	 * index too much. */
 	list_add_tail(&connector->head, &dev->mode_config.connector_list);
 	dev->mode_config.num_connector++;
 
@@ -905,6 +910,11 @@ void drm_connector_cleanup(struct drm_connector *connector)
 	struct drm_device *dev = connector->dev;
 	struct drm_display_mode *mode, *t;
 
+	if (connector->tile_group) {
+		drm_mode_put_tile_group(dev, connector->tile_group);
+		connector->tile_group = NULL;
+	}
+
 	list_for_each_entry_safe(mode, t, &connector->probed_modes, head)
 		drm_mode_remove(connector, mode);
 
@@ -919,6 +929,13 @@ void drm_connector_cleanup(struct drm_connector *connector)
 	connector->name = NULL;
 	list_del(&connector->head);
 	dev->mode_config.num_connector--;
+
+	WARN_ON(connector->state && !connector->funcs->atomic_destroy_state);
+	if (connector->state && connector->funcs->atomic_destroy_state)
+		connector->funcs->atomic_destroy_state(connector,
+						       connector->state);
+
+	memset(connector, 0, sizeof(*connector));
 }
 EXPORT_SYMBOL(drm_connector_cleanup);
 
@@ -933,6 +950,9 @@ unsigned int drm_connector_index(struct drm_connector *connector)
 {
 	unsigned int index = 0;
 	struct drm_connector *tmp;
+	struct drm_mode_config *config = &connector->dev->mode_config;
+
+	WARN_ON(!drm_modeset_is_locked(&config->connection_mutex));
 
 	list_for_each_entry(tmp, &connector->dev->mode_config.connector_list, head) {
 		if (tmp == connector)
@@ -1057,6 +1077,8 @@ void drm_bridge_cleanup(struct drm_bridge *bridge)
 	list_del(&bridge->head);
 	dev->mode_config.num_bridge--;
 	drm_modeset_unlock_all(dev);
+
+	memset(bridge, 0, sizeof(*bridge));
 }
 EXPORT_SYMBOL(drm_bridge_cleanup);
 
@@ -1123,10 +1145,11 @@ void drm_encoder_cleanup(struct drm_encoder *encoder)
 	drm_modeset_lock_all(dev);
 	drm_mode_object_put(dev, &encoder->base);
 	kfree(encoder->name);
-	encoder->name = NULL;
 	list_del(&encoder->head);
 	dev->mode_config.num_encoder--;
 	drm_modeset_unlock_all(dev);
+
+	memset(encoder, 0, sizeof(*encoder));
 }
 EXPORT_SYMBOL(drm_encoder_cleanup);
 
@@ -1153,11 +1176,11 @@ int drm_universal_plane_init(struct drm_device *dev, struct drm_plane *plane,
 {
 	int ret;
 
-	drm_modeset_lock_all(dev);
-
 	ret = drm_mode_object_get(dev, &plane->base, DRM_MODE_OBJECT_PLANE);
 	if (ret)
-		goto out;
+		return ret;
+
+	drm_modeset_lock_init(&plane->mutex);
 
 	plane->base.properties = &plane->properties;
 	plane->dev = dev;
@@ -1167,8 +1190,7 @@ int drm_universal_plane_init(struct drm_device *dev, struct drm_plane *plane,
 	if (!plane->format_types) {
 		DRM_DEBUG_KMS("out of memory when allocating plane\n");
 		drm_mode_object_put(dev, &plane->base);
-		ret = -ENOMEM;
-		goto out;
+		return -ENOMEM;
 	}
 
 	memcpy(plane->format_types, formats, format_count * sizeof(uint32_t));
@@ -1185,10 +1207,7 @@ int drm_universal_plane_init(struct drm_device *dev, struct drm_plane *plane,
 				   dev->mode_config.plane_type_property,
 				   plane->type);
 
- out:
-	drm_modeset_unlock_all(dev);
-
-	return ret;
+	return 0;
 }
 EXPORT_SYMBOL(drm_universal_plane_init);
 
@@ -1246,6 +1265,12 @@ void drm_plane_cleanup(struct drm_plane *plane)
 	if (plane->type == DRM_PLANE_TYPE_OVERLAY)
 		dev->mode_config.num_overlay_plane--;
 	drm_modeset_unlock_all(dev);
+
+	WARN_ON(plane->state && !plane->funcs->atomic_destroy_state);
+	if (plane->state && plane->funcs->atomic_destroy_state)
+		plane->funcs->atomic_destroy_state(plane, plane->state);
+
+	memset(plane, 0, sizeof(*plane));
 }
 EXPORT_SYMBOL(drm_plane_cleanup);
 
@@ -1328,6 +1353,11 @@ static int drm_mode_create_standard_connector_properties(struct drm_device *dev)
 				       "PATH", 0);
 	dev->mode_config.path_property = dev_path;
 
+	dev->mode_config.tile_property = drm_property_create(dev,
+							     DRM_MODE_PROP_BLOB |
+							     DRM_MODE_PROP_IMMUTABLE,
+							     "TILE", 0);
+
 	return 0;
 }
 
@@ -1388,12 +1418,13 @@ EXPORT_SYMBOL(drm_mode_create_dvi_i_properties);
  * responsible for allocating a list of format names and passing them to
  * this routine.
  */
-int drm_mode_create_tv_properties(struct drm_device *dev, int num_modes,
+int drm_mode_create_tv_properties(struct drm_device *dev,
+				  unsigned int num_modes,
 				  char *modes[])
 {
 	struct drm_property *tv_selector;
 	struct drm_property *tv_subconnector;
-	int i;
+	unsigned int i;
 
 	if (dev->mode_config.tv_select_subconnector_property)
 		return 0;
@@ -1491,7 +1522,7 @@ EXPORT_SYMBOL(drm_mode_create_scaling_mode_property);
  * connectors.
  *
  * Returns:
- * Zero on success, errno on failure.
+ * Zero on success, negative errno on failure.
  */
 int drm_mode_create_aspect_ratio_property(struct drm_device *dev)
 {
@@ -1535,6 +1566,30 @@ int drm_mode_create_dirty_info_property(struct drm_device *dev)
 }
 EXPORT_SYMBOL(drm_mode_create_dirty_info_property);
 
+/**
+ * drm_mode_create_suggested_offset_properties - create suggested offset properties
+ * @dev: DRM device
+ *
+ * Create the suggested x/y offset properties for connectors.
+ */
+int drm_mode_create_suggested_offset_properties(struct drm_device *dev)
+{
+	if (dev->mode_config.suggested_x_property && dev->mode_config.suggested_y_property)
+		return 0;
+
+	dev->mode_config.suggested_x_property =
+		drm_property_create_range(dev, DRM_MODE_PROP_IMMUTABLE, "suggested X", 0, 0xffffffff);
+
+	dev->mode_config.suggested_y_property =
+		drm_property_create_range(dev, DRM_MODE_PROP_IMMUTABLE, "suggested Y", 0, 0xffffffff);
+
+	if (dev->mode_config.suggested_x_property == NULL ||
+	    dev->mode_config.suggested_y_property == NULL)
+		return -ENOMEM;
+	return 0;
+}
+EXPORT_SYMBOL(drm_mode_create_suggested_offset_properties);
+
 static int drm_mode_group_init(struct drm_device *dev, struct drm_mode_group *group)
 {
 	uint32_t total_objects = 0;
@@ -1651,7 +1706,7 @@ static void drm_crtc_convert_to_umode(struct drm_mode_modeinfo *out,
  * the caller.
  *
  * Returns:
- * Zero on success, errno on failure.
+ * Zero on success, negative errno on failure.
  */
 static int drm_crtc_convert_umode(struct drm_display_mode *out,
 				  const struct drm_mode_modeinfo *in)
@@ -1694,7 +1749,7 @@ static int drm_crtc_convert_umode(struct drm_display_mode *out,
  * Called by the user via ioctl.
  *
  * Returns:
- * Zero on success, errno on failure.
+ * Zero on success, negative errno on failure.
  */
 int drm_mode_getresources(struct drm_device *dev, void *data,
 			  struct drm_file *file_priv)
@@ -1745,7 +1800,9 @@ int drm_mode_getresources(struct drm_device *dev, void *data,
 	card_res->count_fbs = fb_count;
 	mutex_unlock(&file_priv->fbs_lock);
 
-	drm_modeset_lock_all(dev);
+	/* mode_config.mutex protects the connector list against e.g. DP MST
+	 * connector hot-adding. CRTC/Plane lists are invariant. */
+	mutex_lock(&dev->mode_config.mutex);
 	if (!drm_is_primary_client(file_priv)) {
 
 		mode_group = NULL;
@@ -1865,7 +1922,7 @@ int drm_mode_getresources(struct drm_device *dev, void *data,
 		  card_res->count_connectors, card_res->count_encoders);
 
 out:
-	drm_modeset_unlock_all(dev);
+	mutex_unlock(&dev->mode_config.mutex);
 	return ret;
 }
 
@@ -1880,26 +1937,22 @@ out:
  * Called by the user via ioctl.
  *
  * Returns:
- * Zero on success, errno on failure.
+ * Zero on success, negative errno on failure.
  */
 int drm_mode_getcrtc(struct drm_device *dev,
 		     void *data, struct drm_file *file_priv)
 {
 	struct drm_mode_crtc *crtc_resp = data;
 	struct drm_crtc *crtc;
-	int ret = 0;
 
 	if (!drm_core_check_feature(dev, DRIVER_MODESET))
 		return -EINVAL;
 
-	drm_modeset_lock_all(dev);
-
 	crtc = drm_crtc_find(dev, crtc_resp->crtc_id);
-	if (!crtc) {
-		ret = -ENOENT;
-		goto out;
-	}
+	if (!crtc)
+		return -ENOENT;
 
+	drm_modeset_lock_crtc(crtc, crtc->primary);
 	crtc_resp->x = crtc->x;
 	crtc_resp->y = crtc->y;
 	crtc_resp->gamma_size = crtc->gamma_size;
@@ -1916,10 +1969,9 @@ int drm_mode_getcrtc(struct drm_device *dev,
 	} else {
 		crtc_resp->mode_valid = 0;
 	}
+	drm_modeset_unlock_crtc(crtc);
 
-out:
-	drm_modeset_unlock_all(dev);
-	return ret;
+	return 0;
 }
 
 static bool drm_mode_expose_to_userspace(const struct drm_display_mode *mode,
@@ -1935,6 +1987,15 @@ static bool drm_mode_expose_to_userspace(const struct drm_display_mode *mode,
 	return true;
 }
 
+static struct drm_encoder *drm_connector_get_encoder(struct drm_connector *connector)
+{
+	/* For atomic drivers only state objects are synchronously updated and
+	 * protected by modeset locks, so check those first. */
+	if (connector->state)
+		return connector->state->best_encoder;
+	return connector->encoder;
+}
+
 /**
  * drm_mode_getconnector - get connector configuration
  * @dev: drm device for the ioctl
@@ -1946,13 +2007,14 @@ static bool drm_mode_expose_to_userspace(const struct drm_display_mode *mode,
  * Called by the user via ioctl.
  *
  * Returns:
- * Zero on success, errno on failure.
+ * Zero on success, negative errno on failure.
  */
 int drm_mode_getconnector(struct drm_device *dev, void *data,
 			  struct drm_file *file_priv)
 {
 	struct drm_mode_get_connector *out_resp = data;
 	struct drm_connector *connector;
+	struct drm_encoder *encoder;
 	struct drm_display_mode *mode;
 	int mode_count = 0;
 	int props_count = 0;
@@ -2008,8 +2070,10 @@ int drm_mode_getconnector(struct drm_device *dev, void *data,
 	out_resp->subpixel = connector->display_info.subpixel_order;
 	out_resp->connection = connector->status;
 	drm_modeset_lock(&dev->mode_config.connection_mutex, NULL);
-	if (connector->encoder)
-		out_resp->encoder_id = connector->encoder->base.id;
+
+	encoder = drm_connector_get_encoder(connector);
+	if (encoder)
+		out_resp->encoder_id = encoder->base.id;
 	else
 		out_resp->encoder_id = 0;
 	drm_modeset_unlock(&dev->mode_config.connection_mutex);
@@ -2079,6 +2143,33 @@ out:
 	return ret;
 }
 
+static struct drm_crtc *drm_encoder_get_crtc(struct drm_encoder *encoder)
+{
+	struct drm_connector *connector;
+	struct drm_device *dev = encoder->dev;
+	bool uses_atomic = false;
+
+	/* For atomic drivers only state objects are synchronously updated and
+	 * protected by modeset locks, so check those first. */
+	list_for_each_entry(connector, &dev->mode_config.connector_list, head) {
+		if (!connector->state)
+			continue;
+
+		uses_atomic = true;
+
+		if (connector->state->best_encoder != encoder)
+			continue;
+
+		return connector->state->crtc;
+	}
+
+	/* Don't return stale data (e.g. pending async disable). */
+	if (uses_atomic)
+		return NULL;
+
+	return encoder->crtc;
+}
+
 /**
  * drm_mode_getencoder - get encoder configuration
  * @dev: drm device for the ioctl
@@ -2090,37 +2181,38 @@ out:
  * Called by the user via ioctl.
  *
  * Returns:
- * Zero on success, errno on failure.
+ * Zero on success, negative errno on failure.
  */
 int drm_mode_getencoder(struct drm_device *dev, void *data,
 			struct drm_file *file_priv)
 {
 	struct drm_mode_get_encoder *enc_resp = data;
 	struct drm_encoder *encoder;
-	int ret = 0;
+	struct drm_crtc *crtc;
 
 	if (!drm_core_check_feature(dev, DRIVER_MODESET))
 		return -EINVAL;
 
-	drm_modeset_lock_all(dev);
 	encoder = drm_encoder_find(dev, enc_resp->encoder_id);
-	if (!encoder) {
-		ret = -ENOENT;
-		goto out;
-	}
+	if (!encoder)
+		return -ENOENT;
 
-	if (encoder->crtc)
+	drm_modeset_lock(&dev->mode_config.connection_mutex, NULL);
+	crtc = drm_encoder_get_crtc(encoder);
+	if (crtc)
+		enc_resp->crtc_id = crtc->base.id;
+	else if (encoder->crtc)
 		enc_resp->crtc_id = encoder->crtc->base.id;
 	else
 		enc_resp->crtc_id = 0;
+	drm_modeset_unlock(&dev->mode_config.connection_mutex);
+
 	enc_resp->encoder_type = encoder->encoder_type;
 	enc_resp->encoder_id = encoder->base.id;
 	enc_resp->possible_crtcs = encoder->possible_crtcs;
 	enc_resp->possible_clones = encoder->possible_clones;
 
-out:
-	drm_modeset_unlock_all(dev);
-	return ret;
+	return 0;
 }
 
 /**
@@ -2134,7 +2226,7 @@ out:
  * Called by the user via ioctl.
  *
  * Returns:
- * Zero on success, errno on failure.
+ * Zero on success, negative errno on failure.
  */
 int drm_mode_getplane_res(struct drm_device *dev, void *data,
 			  struct drm_file *file_priv)
@@ -2143,13 +2235,12 @@ int drm_mode_getplane_res(struct drm_device *dev, void *data,
 	struct drm_mode_config *config;
 	struct drm_plane *plane;
 	uint32_t __user *plane_ptr;
-	int copied = 0, ret = 0;
+	int copied = 0;
 	unsigned num_planes;
 
 	if (!drm_core_check_feature(dev, DRIVER_MODESET))
 		return -EINVAL;
 
-	drm_modeset_lock_all(dev);
 	config = &dev->mode_config;
 
 	if (file_priv->universal_planes)
@@ -2165,6 +2256,7 @@ int drm_mode_getplane_res(struct drm_device *dev, void *data,
 	    (plane_resp->count_planes >= num_planes)) {
 		plane_ptr = (uint32_t __user *)(unsigned long)plane_resp->plane_id_ptr;
 
+		/* Plane lists are invariant, no locking needed. */
 		list_for_each_entry(plane, &config->plane_list, head) {
 			/*
 			 * Unless userspace set the 'universal planes'
@@ -2174,18 +2266,14 @@ int drm_mode_getplane_res(struct drm_device *dev, void *data,
 			    !file_priv->universal_planes)
 				continue;
 
-			if (put_user(plane->base.id, plane_ptr + copied)) {
-				ret = -EFAULT;
-				goto out;
-			}
+			if (put_user(plane->base.id, plane_ptr + copied))
+				return -EFAULT;
 			copied++;
 		}
 	}
 	plane_resp->count_planes = num_planes;
 
-out:
-	drm_modeset_unlock_all(dev);
-	return ret;
+	return 0;
 }
 
 /**
@@ -2199,7 +2287,7 @@ out:
  * Called by the user via ioctl.
  *
  * Returns:
- * Zero on success, errno on failure.
+ * Zero on success, negative errno on failure.
  */
 int drm_mode_getplane(struct drm_device *dev, void *data,
 		      struct drm_file *file_priv)
@@ -2207,18 +2295,15 @@ int drm_mode_getplane(struct drm_device *dev, void *data,
 	struct drm_mode_get_plane *plane_resp = data;
 	struct drm_plane *plane;
 	uint32_t __user *format_ptr;
-	int ret = 0;
 
 	if (!drm_core_check_feature(dev, DRIVER_MODESET))
 		return -EINVAL;
 
-	drm_modeset_lock_all(dev);
 	plane = drm_plane_find(dev, plane_resp->plane_id);
-	if (!plane) {
-		ret = -ENOENT;
-		goto out;
-	}
+	if (!plane)
+		return -ENOENT;
 
+	drm_modeset_lock(&plane->mutex, NULL);
 	if (plane->crtc)
 		plane_resp->crtc_id = plane->crtc->base.id;
 	else
@@ -2228,6 +2313,7 @@ int drm_mode_getplane(struct drm_device *dev, void *data,
 		plane_resp->fb_id = plane->fb->base.id;
 	else
 		plane_resp->fb_id = 0;
+	drm_modeset_unlock(&plane->mutex);
 
 	plane_resp->plane_id = plane->base.id;
 	plane_resp->possible_crtcs = plane->possible_crtcs;
@@ -2243,15 +2329,12 @@ int drm_mode_getplane(struct drm_device *dev, void *data,
 		if (copy_to_user(format_ptr,
 				 plane->format_types,
 				 sizeof(uint32_t) * plane->format_count)) {
-			ret = -EFAULT;
-			goto out;
+			return -EFAULT;
 		}
 	}
 	plane_resp->count_format_types = plane->format_count;
 
-out:
-	drm_modeset_unlock_all(dev);
-	return ret;
+	return 0;
 }
 
 /*
@@ -2274,7 +2357,7 @@ static int __setplane_internal(struct drm_plane *plane,
 {
 	int ret = 0;
 	unsigned int fb_width, fb_height;
-	int i;
+	unsigned int i;
 
 	/* No fb means shut it down */
 	if (!fb) {
@@ -2378,13 +2461,12 @@ static int setplane_internal(struct drm_plane *plane,
  * valid crtc).
  *
  * Returns:
- * Zero on success, errno on failure.
+ * Zero on success, negative errno on failure.
  */
 int drm_mode_setplane(struct drm_device *dev, void *data,
 		      struct drm_file *file_priv)
 {
 	struct drm_mode_set_plane *plane_req = data;
-	struct drm_mode_object *obj;
 	struct drm_plane *plane;
 	struct drm_crtc *crtc = NULL;
 	struct drm_framebuffer *fb = NULL;
@@ -2407,14 +2489,12 @@ int drm_mode_setplane(struct drm_device *dev, void *data,
 	 * First, find the plane, crtc, and fb objects.  If not available,
 	 * we don't bother to call the driver.
 	 */
-	obj = drm_mode_object_find(dev, plane_req->plane_id,
-				   DRM_MODE_OBJECT_PLANE);
-	if (!obj) {
+	plane = drm_plane_find(dev, plane_req->plane_id);
+	if (!plane) {
 		DRM_DEBUG_KMS("Unknown plane ID %d\n",
 			      plane_req->plane_id);
 		return -ENOENT;
 	}
-	plane = obj_to_plane(obj);
 
 	if (plane_req->fb_id) {
 		fb = drm_framebuffer_lookup(dev, plane_req->fb_id);
@@ -2424,14 +2504,12 @@ int drm_mode_setplane(struct drm_device *dev, void *data,
 			return -ENOENT;
 		}
 
-		obj = drm_mode_object_find(dev, plane_req->crtc_id,
-					   DRM_MODE_OBJECT_CRTC);
-		if (!obj) {
+		crtc = drm_crtc_find(dev, plane_req->crtc_id);
+		if (!crtc) {
 			DRM_DEBUG_KMS("Unknown crtc ID %d\n",
 				      plane_req->crtc_id);
 			return -ENOENT;
 		}
-		crtc = obj_to_crtc(obj);
 	}
 
 	/*
@@ -2453,7 +2531,7 @@ int drm_mode_setplane(struct drm_device *dev, void *data,
  * interface. The only thing it adds is correct refcounting dance.
  * 
  * Returns:
- * Zero on success, errno on failure.
+ * Zero on success, negative errno on failure.
  */
 int drm_mode_set_config_internal(struct drm_mode_set *set)
 {
@@ -2546,7 +2624,7 @@ EXPORT_SYMBOL(drm_crtc_check_viewport);
  * Called by the user via ioctl.
  *
  * Returns:
- * Zero on success, errno on failure.
+ * Zero on success, negative errno on failure.
  */
 int drm_mode_setcrtc(struct drm_device *dev, void *data,
 		     struct drm_file *file_priv)
@@ -2709,7 +2787,7 @@ out:
  * userspace wants to make use of these capabilities.
  *
  * Returns:
- * Zero on success, errno on failure.
+ * Zero on success, negative errno on failure.
  */
 static int drm_mode_cursor_universal(struct drm_crtc *crtc,
 				     struct drm_mode_cursor2 *req,
@@ -2810,7 +2888,7 @@ static int drm_mode_cursor_common(struct drm_device *dev,
 	 * If this crtc has a universal cursor plane, call that plane's update
 	 * handler rather than using legacy cursor handlers.
 	 */
-	drm_modeset_lock_crtc(crtc);
+	drm_modeset_lock_crtc(crtc, crtc->cursor);
 	if (crtc->cursor) {
 		ret = drm_mode_cursor_universal(crtc, req, file_priv);
 		goto out;
@@ -2857,7 +2935,7 @@ out:
  * Called by the user via ioctl.
  *
  * Returns:
- * Zero on success, errno on failure.
+ * Zero on success, negative errno on failure.
  */
 int drm_mode_cursor_ioctl(struct drm_device *dev,
 			  void *data, struct drm_file *file_priv)
@@ -2884,7 +2962,7 @@ int drm_mode_cursor_ioctl(struct drm_device *dev,
  * Called by the user via ioctl.
  *
  * Returns:
- * Zero on success, errno on failure.
+ * Zero on success, negative errno on failure.
  */
 int drm_mode_cursor2_ioctl(struct drm_device *dev,
 			   void *data, struct drm_file *file_priv)
@@ -2943,23 +3021,21 @@ EXPORT_SYMBOL(drm_mode_legacy_fb_format);
  * @file_priv: drm file for the ioctl call
  *
  * Add a new FB to the specified CRTC, given a user request. This is the
- * original addfb ioclt which only supported RGB formats.
+ * original addfb ioctl which only supported RGB formats.
  *
  * Called by the user via ioctl.
  *
  * Returns:
- * Zero on success, errno on failure.
+ * Zero on success, negative errno on failure.
  */
 int drm_mode_addfb(struct drm_device *dev,
 		   void *data, struct drm_file *file_priv)
 {
 	struct drm_mode_fb_cmd *or = data;
 	struct drm_mode_fb_cmd2 r = {};
-	struct drm_mode_config *config = &dev->mode_config;
-	struct drm_framebuffer *fb;
-	int ret = 0;
+	int ret;
 
-	/* Use new struct with format internally */
+	/* convert to new format and call new ioctl */
 	r.fb_id = or->fb_id;
 	r.width = or->width;
 	r.height = or->height;
@@ -2967,28 +3043,13 @@ int drm_mode_addfb(struct drm_device *dev,
 	r.pixel_format = drm_mode_legacy_fb_format(or->bpp, or->depth);
 	r.handles[0] = or->handle;
 
-	if (!drm_core_check_feature(dev, DRIVER_MODESET))
-		return -EINVAL;
-
-	if ((config->min_width > r.width) || (r.width > config->max_width))
-		return -EINVAL;
-
-	if ((config->min_height > r.height) || (r.height > config->max_height))
-		return -EINVAL;
+	ret = drm_mode_addfb2(dev, &r, file_priv);
+	if (ret)
+		return ret;
 
-	fb = dev->mode_config.funcs->fb_create(dev, file_priv, &r);
-	if (IS_ERR(fb)) {
-		DRM_DEBUG_KMS("could not create framebuffer\n");
-		return PTR_ERR(fb);
-	}
+	or->fb_id = r.fb_id;
 
-	mutex_lock(&file_priv->fbs_lock);
-	or->fb_id = fb->base.id;
-	list_add(&fb->filp_head, &file_priv->fbs);
-	DRM_DEBUG_KMS("[FB:%d]\n", fb->base.id);
-	mutex_unlock(&file_priv->fbs_lock);
-
-	return ret;
+	return 0;
 }
 
 static int format_check(const struct drm_mode_fb_cmd2 *r)
@@ -3080,7 +3141,7 @@ static int framebuffer_check(const struct drm_mode_fb_cmd2 *r)
 	num_planes = drm_format_num_planes(r->pixel_format);
 
 	if (r->width == 0 || r->width % hsub) {
-		DRM_DEBUG_KMS("bad framebuffer width %u\n", r->height);
+		DRM_DEBUG_KMS("bad framebuffer width %u\n", r->width);
 		return -EINVAL;
 	}
 
@@ -3170,7 +3231,7 @@ static struct drm_framebuffer *add_framebuffer_internal(struct drm_device *dev,
  * Called by the user via ioctl.
  *
  * Returns:
- * Zero on success, errno on failure.
+ * Zero on success, negative errno on failure.
  */
 int drm_mode_addfb2(struct drm_device *dev,
 		    void *data, struct drm_file *file_priv)
@@ -3198,7 +3259,7 @@ int drm_mode_addfb2(struct drm_device *dev,
  * Called by the user via ioctl.
  *
  * Returns:
- * Zero on success, errno on failure.
+ * Zero on success, negative errno on failure.
  */
 int drm_mode_rmfb(struct drm_device *dev,
 		   void *data, struct drm_file *file_priv)
@@ -3252,7 +3313,7 @@ fail_lookup:
  * Called by the user via ioctl.
  *
  * Returns:
- * Zero on success, errno on failure.
+ * Zero on success, negative errno on failure.
  */
 int drm_mode_getfb(struct drm_device *dev,
 		   void *data, struct drm_file *file_priv)
@@ -3313,7 +3374,7 @@ int drm_mode_getfb(struct drm_device *dev,
  * Called by the user via ioctl.
  *
  * Returns:
- * Zero on success, errno on failure.
+ * Zero on success, negative errno on failure.
  */
 int drm_mode_dirtyfb_ioctl(struct drm_device *dev,
 			   void *data, struct drm_file *file_priv)
@@ -3393,7 +3454,7 @@ out_err1:
  * Called by the user via ioctl.
  *
  * Returns:
- * Zero on success, errno on failure.
+ * Zero on success, negative errno on failure.
  */
 void drm_fb_release(struct drm_file *priv)
 {
@@ -3402,7 +3463,7 @@ void drm_fb_release(struct drm_file *priv)
 
 	/*
 	 * When the file gets released that means no one else can access the fb
-	 * list any more, so no need to grab fpriv->fbs_lock. And we need to to
+	 * list any more, so no need to grab fpriv->fbs_lock. And we need to
 	 * avoid upsetting lockdep since the universal cursor code adds a
 	 * framebuffer while holding mutex locks.
 	 *
@@ -3435,6 +3496,10 @@ void drm_fb_release(struct drm_file *priv)
  * object with drm_object_attach_property. The returned property object must be
  * freed with drm_property_destroy.
  *
+ * Note that the DRM core keeps a per-device list of properties and that, if
+ * drm_mode_config_cleanup() is called, it will destroy all properties created
+ * by the driver.
+ *
  * Returns:
  * A pointer to the newly created property on success, NULL on failure.
  */
@@ -3462,7 +3527,7 @@ struct drm_property *drm_property_create(struct drm_device *dev, int flags,
 
 	property->flags = flags;
 	property->num_values = num_values;
-	INIT_LIST_HEAD(&property->enum_blob_list);
+	INIT_LIST_HEAD(&property->enum_list);
 
 	if (name) {
 		strncpy(property->name, name, DRM_PROP_NAME_LEN);
@@ -3611,7 +3676,7 @@ static struct drm_property *property_create_range(struct drm_device *dev,
  * object with drm_object_attach_property. The returned property object must be
  * freed with drm_property_destroy.
  *
- * Userspace is allowed to set any interger value in the (min, max) range
+ * Userspace is allowed to set any integer value in the (min, max) range
  * inclusive.
  *
  * Returns:
@@ -3684,8 +3749,8 @@ int drm_property_add_enum(struct drm_property *property, int index,
 			(value > 63))
 		return -EINVAL;
 
-	if (!list_empty(&property->enum_blob_list)) {
-		list_for_each_entry(prop_enum, &property->enum_blob_list, head) {
+	if (!list_empty(&property->enum_list)) {
+		list_for_each_entry(prop_enum, &property->enum_list, head) {
 			if (prop_enum->value == value) {
 				strncpy(prop_enum->name, name, DRM_PROP_NAME_LEN);
 				prop_enum->name[DRM_PROP_NAME_LEN-1] = '\0';
@@ -3703,7 +3768,7 @@ int drm_property_add_enum(struct drm_property *property, int index,
 	prop_enum->value = value;
 
 	property->values[index] = value;
-	list_add_tail(&prop_enum->head, &property->enum_blob_list);
+	list_add_tail(&prop_enum->head, &property->enum_list);
 	return 0;
 }
 EXPORT_SYMBOL(drm_property_add_enum);
@@ -3720,7 +3785,7 @@ void drm_property_destroy(struct drm_device *dev, struct drm_property *property)
 {
 	struct drm_property_enum *prop_enum, *pt;
 
-	list_for_each_entry_safe(prop_enum, pt, &property->enum_blob_list, head) {
+	list_for_each_entry_safe(prop_enum, pt, &property->enum_list, head) {
 		list_del(&prop_enum->head);
 		kfree(prop_enum);
 	}
@@ -3823,17 +3888,20 @@ int drm_object_property_get_value(struct drm_mode_object *obj,
 EXPORT_SYMBOL(drm_object_property_get_value);
 
 /**
- * drm_mode_getproperty_ioctl - get the current value of a connector's property
+ * drm_mode_getproperty_ioctl - get the property metadata
  * @dev: DRM device
  * @data: ioctl data
  * @file_priv: DRM file info
  *
- * This function retrieves the current value for an connectors's property.
+ * This function retrieves the metadata for a given property, like the different
+ * possible values for an enum property or the limits for a range property.
+ *
+ * Blob properties are special: their values are read via the getblob ioctl.
  *
  * Called by the user via ioctl.
  *
  * Returns:
- * Zero on success, errno on failure.
+ * Zero on success, negative errno on failure.
  */
 int drm_mode_getproperty_ioctl(struct drm_device *dev,
 			       void *data, struct drm_file *file_priv)
@@ -3841,16 +3909,12 @@ int drm_mode_getproperty_ioctl(struct drm_device *dev,
 	struct drm_mode_get_property *out_resp = data;
 	struct drm_property *property;
 	int enum_count = 0;
-	int blob_count = 0;
 	int value_count = 0;
 	int ret = 0, i;
 	int copied;
 	struct drm_property_enum *prop_enum;
 	struct drm_mode_property_enum __user *enum_ptr;
-	struct drm_property_blob *prop_blob;
-	uint32_t __user *blob_id_ptr;
 	uint64_t __user *values_ptr;
-	uint32_t __user *blob_length_ptr;
 
 	if (!drm_core_check_feature(dev, DRIVER_MODESET))
 		return -EINVAL;
@@ -3864,11 +3928,8 @@ int drm_mode_getproperty_ioctl(struct drm_device *dev,
 
 	if (drm_property_type_is(property, DRM_MODE_PROP_ENUM) ||
 			drm_property_type_is(property, DRM_MODE_PROP_BITMASK)) {
-		list_for_each_entry(prop_enum, &property->enum_blob_list, head)
+		list_for_each_entry(prop_enum, &property->enum_list, head)
 			enum_count++;
-	} else if (drm_property_type_is(property, DRM_MODE_PROP_BLOB)) {
-		list_for_each_entry(prop_blob, &property->enum_blob_list, head)
-			blob_count++;
 	}
 
 	value_count = property->num_values;
@@ -3893,7 +3954,7 @@ int drm_mode_getproperty_ioctl(struct drm_device *dev,
 		if ((out_resp->count_enum_blobs >= enum_count) && enum_count) {
 			copied = 0;
 			enum_ptr = (struct drm_mode_property_enum __user *)(unsigned long)out_resp->enum_blob_ptr;
-			list_for_each_entry(prop_enum, &property->enum_blob_list, head) {
+			list_for_each_entry(prop_enum, &property->enum_list, head) {
 
 				if (copy_to_user(&enum_ptr[copied].value, &prop_enum->value, sizeof(uint64_t))) {
 					ret = -EFAULT;
@@ -3911,35 +3972,24 @@ int drm_mode_getproperty_ioctl(struct drm_device *dev,
 		out_resp->count_enum_blobs = enum_count;
 	}
 
-	if (drm_property_type_is(property, DRM_MODE_PROP_BLOB)) {
-		if ((out_resp->count_enum_blobs >= blob_count) && blob_count) {
-			copied = 0;
-			blob_id_ptr = (uint32_t __user *)(unsigned long)out_resp->enum_blob_ptr;
-			blob_length_ptr = (uint32_t __user *)(unsigned long)out_resp->values_ptr;
-
-			list_for_each_entry(prop_blob, &property->enum_blob_list, head) {
-				if (put_user(prop_blob->base.id, blob_id_ptr + copied)) {
-					ret = -EFAULT;
-					goto done;
-				}
-
-				if (put_user(prop_blob->length, blob_length_ptr + copied)) {
-					ret = -EFAULT;
-					goto done;
-				}
-
-				copied++;
-			}
-		}
-		out_resp->count_enum_blobs = blob_count;
-	}
+	/*
+	 * NOTE: The idea seems to have been to use this to read all the blob
+	 * property values. But nothing ever added them to the corresponding
+	 * list, userspace always used the special-purpose get_blob ioctl to
+	 * read the value for a blob property. It also doesn't make a lot of
+	 * sense to return values here when everything else is just metadata for
+	 * the property itself.
+	 */
+	if (drm_property_type_is(property, DRM_MODE_PROP_BLOB))
+		out_resp->count_enum_blobs = 0;
 done:
 	drm_modeset_unlock_all(dev);
 	return ret;
 }
 
-static struct drm_property_blob *drm_property_create_blob(struct drm_device *dev, int length,
-							  void *data)
+static struct drm_property_blob *
+drm_property_create_blob(struct drm_device *dev, size_t length,
+			 const void *data)
 {
 	struct drm_property_blob *blob;
 	int ret;
@@ -3985,7 +4035,7 @@ static void drm_property_destroy_blob(struct drm_device *dev,
  * Called by the user via ioctl.
  *
  * Returns:
- * Zero on success, errno on failure.
+ * Zero on success, negative errno on failure.
  */
 int drm_mode_getblob_ioctl(struct drm_device *dev,
 			   void *data, struct drm_file *file_priv)
@@ -4019,12 +4069,25 @@ done:
 	return ret;
 }
 
+/**
+ * drm_mode_connector_set_path_property - set path property on connector
+ * @connector: connector to set property on.
+ * @path: path to use for property.
+ *
+ * This creates a property to expose to userspace to specify a
+ * connector path. This is mainly used for DisplayPort MST where
+ * connectors have a topology and we want to allow userspace to give
+ * them more meaningful names.
+ *
+ * Returns:
+ * Zero on success, negative errno on failure.
+ */
 int drm_mode_connector_set_path_property(struct drm_connector *connector,
-					 char *path)
+					 const char *path)
 {
 	struct drm_device *dev = connector->dev;
-	int ret, size;
-	size = strlen(path) + 1;
+	size_t size = strlen(path) + 1;
+	int ret;
 
 	connector->path_blob_ptr = drm_property_create_blob(connector->dev,
 							    size, path);
@@ -4039,6 +4102,52 @@ int drm_mode_connector_set_path_property(struct drm_connector *connector,
 EXPORT_SYMBOL(drm_mode_connector_set_path_property);
 
 /**
+ * drm_mode_connector_set_tile_property - set tile property on connector
+ * @connector: connector to set property on.
+ *
+ * This looks up the tile information for a connector, and creates a
+ * property for userspace to parse if it exists. The property is of
+ * the form of 8 integers using ':' as a separator.
+ *
+ * Returns:
+ * Zero on success, negative errno on failure.
+ */
+int drm_mode_connector_set_tile_property(struct drm_connector *connector)
+{
+	struct drm_device *dev = connector->dev;
+	int ret, size;
+	char tile[256];
+
+	if (connector->tile_blob_ptr)
+		drm_property_destroy_blob(dev, connector->tile_blob_ptr);
+
+	if (!connector->has_tile) {
+		connector->tile_blob_ptr = NULL;
+		ret = drm_object_property_set_value(&connector->base,
+						    dev->mode_config.tile_property, 0);
+		return ret;
+	}
+
+	snprintf(tile, 256, "%d:%d:%d:%d:%d:%d:%d:%d",
+		 connector->tile_group->id, connector->tile_is_single_monitor,
+		 connector->num_h_tile, connector->num_v_tile,
+		 connector->tile_h_loc, connector->tile_v_loc,
+		 connector->tile_h_size, connector->tile_v_size);
+	size = strlen(tile) + 1;
+
+	connector->tile_blob_ptr = drm_property_create_blob(connector->dev,
+							    size, tile);
+	if (!connector->tile_blob_ptr)
+		return -EINVAL;
+
+	ret = drm_object_property_set_value(&connector->base,
+					    dev->mode_config.tile_property,
+					    connector->tile_blob_ptr->base.id);
+	return ret;
+}
+EXPORT_SYMBOL(drm_mode_connector_set_tile_property);
+
+/**
  * drm_mode_connector_update_edid_property - update the edid property of a connector
  * @connector: drm connector
  * @edid: new value of the edid property
@@ -4047,13 +4156,14 @@ EXPORT_SYMBOL(drm_mode_connector_set_path_property);
  * connector's edid property.
  *
  * Returns:
- * Zero on success, errno on failure.
+ * Zero on success, negative errno on failure.
  */
 int drm_mode_connector_update_edid_property(struct drm_connector *connector,
-					    struct edid *edid)
+					    const struct edid *edid)
 {
 	struct drm_device *dev = connector->dev;
-	int ret, size;
+	size_t size;
+	int ret;
 
 	/* ignore requests to set edid when overridden */
 	if (connector->override_edid)
@@ -4143,7 +4253,7 @@ static bool drm_property_change_is_valid(struct drm_property *property,
  * Called by the user via ioctl.
  *
  * Returns:
- * Zero on success, errno on failure.
+ * Zero on success, negative errno on failure.
  */
 int drm_mode_connector_property_set_ioctl(struct drm_device *dev,
 				       void *data, struct drm_file *file_priv)
@@ -4226,7 +4336,7 @@ int drm_mode_plane_set_obj_prop(struct drm_plane *plane,
 EXPORT_SYMBOL(drm_mode_plane_set_obj_prop);
 
 /**
- * drm_mode_getproperty_ioctl - get the current value of a object's property
+ * drm_mode_obj_get_properties_ioctl - get the current value of an object's property
  * @dev: DRM device
  * @data: ioctl data
  * @file_priv: DRM file info
@@ -4238,7 +4348,7 @@ EXPORT_SYMBOL(drm_mode_plane_set_obj_prop);
  * Called by the user via ioctl.
  *
  * Returns:
- * Zero on success, errno on failure.
+ * Zero on success, negative errno on failure.
  */
 int drm_mode_obj_get_properties_ioctl(struct drm_device *dev, void *data,
 				      struct drm_file *file_priv)
@@ -4310,7 +4420,7 @@ out:
  * Called by the user via ioctl.
  *
  * Returns:
- * Zero on success, errno on failure.
+ * Zero on success, negative errno on failure.
  */
 int drm_mode_obj_set_property_ioctl(struct drm_device *dev, void *data,
 				    struct drm_file *file_priv)
@@ -4382,7 +4492,7 @@ out:
  * possible_clones and possible_crtcs bitmasks.
  *
  * Returns:
- * Zero on success, errno on failure.
+ * Zero on success, negative errno on failure.
  */
 int drm_mode_connector_attach_encoder(struct drm_connector *connector,
 				      struct drm_encoder *encoder)
@@ -4409,7 +4519,7 @@ EXPORT_SYMBOL(drm_mode_connector_attach_encoder);
  * fixed gamma table size.
  *
  * Returns:
- * Zero on success, errno on failure.
+ * Zero on success, negative errno on failure.
  */
 int drm_mode_crtc_set_gamma_size(struct drm_crtc *crtc,
 				 int gamma_size)
@@ -4438,7 +4548,7 @@ EXPORT_SYMBOL(drm_mode_crtc_set_gamma_size);
  * Called by the user via ioctl.
  *
  * Returns:
- * Zero on success, errno on failure.
+ * Zero on success, negative errno on failure.
  */
 int drm_mode_gamma_set_ioctl(struct drm_device *dev,
 			     void *data, struct drm_file *file_priv)
@@ -4510,7 +4620,7 @@ out:
  * Called by the user via ioctl.
  *
  * Returns:
- * Zero on success, errno on failure.
+ * Zero on success, negative errno on failure.
  */
 int drm_mode_gamma_get_ioctl(struct drm_device *dev,
 			     void *data, struct drm_file *file_priv)
@@ -4576,7 +4686,7 @@ out:
  * Called by the user via ioctl.
  *
  * Returns:
- * Zero on success, errno on failure.
+ * Zero on success, negative errno on failure.
  */
 int drm_mode_page_flip_ioctl(struct drm_device *dev,
 			     void *data, struct drm_file *file_priv)
@@ -4599,7 +4709,7 @@ int drm_mode_page_flip_ioctl(struct drm_device *dev,
 	if (!crtc)
 		return -ENOENT;
 
-	drm_modeset_lock_crtc(crtc);
+	drm_modeset_lock_crtc(crtc, crtc->primary);
 	if (crtc->primary->fb == NULL) {
 		/* The framebuffer is currently unbound, presumably
 		 * due to a hotplug event, that userspace has not
@@ -4742,7 +4852,7 @@ EXPORT_SYMBOL(drm_mode_config_reset);
  * Called by the user via ioctl.
  *
  * Returns:
- * Zero on success, errno on failure.
+ * Zero on success, negative errno on failure.
  */
 int drm_mode_create_dumb_ioctl(struct drm_device *dev,
 			       void *data, struct drm_file *file_priv)
@@ -4769,6 +4879,16 @@ int drm_mode_create_dumb_ioctl(struct drm_device *dev,
 	if (PAGE_ALIGN(size) == 0)
 		return -EINVAL;
 
+	/*
+	 * handle, pitch and size are output parameters. Zero them out to
+	 * prevent drivers from accidentally using uninitialized data. Since
+	 * not all existing userspace is clearing these fields properly we
+	 * cannot reject IOCTL with garbage in them.
+	 */
+	args->handle = 0;
+	args->pitch = 0;
+	args->size = 0;
+
 	return dev->driver->dumb_create(file_priv, dev, args);
 }
 
@@ -4784,7 +4904,7 @@ int drm_mode_create_dumb_ioctl(struct drm_device *dev,
  * Called by the user via ioctl.
  *
  * Returns:
- * Zero on success, errno on failure.
+ * Zero on success, negative errno on failure.
  */
 int drm_mode_mmap_dumb_ioctl(struct drm_device *dev,
 			     void *data, struct drm_file *file_priv)
@@ -4811,7 +4931,7 @@ int drm_mode_mmap_dumb_ioctl(struct drm_device *dev,
  * Called by the user via ioctl.
  *
  * Returns:
- * Zero on success, errno on failure.
+ * Zero on success, negative errno on failure.
  */
 int drm_mode_destroy_dumb_ioctl(struct drm_device *dev,
 				void *data, struct drm_file *file_priv)
@@ -5097,6 +5217,7 @@ void drm_mode_config_init(struct drm_device *dev)
 	INIT_LIST_HEAD(&dev->mode_config.property_blob_list);
 	INIT_LIST_HEAD(&dev->mode_config.plane_list);
 	idr_init(&dev->mode_config.crtc_idr);
+	idr_init(&dev->mode_config.tile_idr);
 
 	drm_modeset_lock_all(dev);
 	drm_mode_create_standard_connector_properties(dev);
@@ -5184,6 +5305,7 @@ void drm_mode_config_cleanup(struct drm_device *dev)
 		crtc->funcs->destroy(crtc);
 	}
 
+	idr_destroy(&dev->mode_config.tile_idr);
 	idr_destroy(&dev->mode_config.crtc_idr);
 	drm_modeset_lock_fini(&dev->mode_config.connection_mutex);
 }
@@ -5206,3 +5328,100 @@ struct drm_property *drm_mode_create_rotation_property(struct drm_device *dev,
 					   supported_rotations);
 }
 EXPORT_SYMBOL(drm_mode_create_rotation_property);
+
+/**
+ * DOC: Tile group
+ *
+ * Tile groups are used to represent tiled monitors with a unique
+ * integer identifier. Tiled monitors using DisplayID v1.3 have
+ * a unique 8-byte handle; we store this in a tile group, so we
+ * have a common identifier for all tiles in a monitor group.
+ */
+static void drm_tile_group_free(struct kref *kref)
+{
+	struct drm_tile_group *tg = container_of(kref, struct drm_tile_group, refcount);
+	struct drm_device *dev = tg->dev;
+	mutex_lock(&dev->mode_config.idr_mutex);
+	idr_remove(&dev->mode_config.tile_idr, tg->id);
+	mutex_unlock(&dev->mode_config.idr_mutex);
+	kfree(tg);
+}
+
+/**
+ * drm_mode_put_tile_group - drop a reference to a tile group.
+ * @dev: DRM device
+ * @tg: tile group to drop reference to.
+ *
+ * Drop a reference to the tile group and free it if the count reaches 0.
+ */
+void drm_mode_put_tile_group(struct drm_device *dev,
+			     struct drm_tile_group *tg)
+{
+	kref_put(&tg->refcount, drm_tile_group_free);
+}
+
+/**
+ * drm_mode_get_tile_group - get a reference to an existing tile group
+ * @dev: DRM device
+ * @topology: 8-bytes unique per monitor.
+ *
+ * Use the unique bytes to get a reference to an existing tile group.
+ *
+ * RETURNS:
+ * tile group or NULL if not found.
+ */
+struct drm_tile_group *drm_mode_get_tile_group(struct drm_device *dev,
+					       char topology[8])
+{
+	struct drm_tile_group *tg;
+	int id;
+	mutex_lock(&dev->mode_config.idr_mutex);
+	idr_for_each_entry(&dev->mode_config.tile_idr, tg, id) {
+		if (!memcmp(tg->group_data, topology, 8)) {
+			if (!kref_get_unless_zero(&tg->refcount))
+				tg = NULL;
+			mutex_unlock(&dev->mode_config.idr_mutex);
+			return tg;
+		}
+	}
+	mutex_unlock(&dev->mode_config.idr_mutex);
+	return NULL;
+}
+
+/**
+ * drm_mode_create_tile_group - create a tile group from a displayid description
+ * @dev: DRM device
+ * @topology: 8-bytes unique per monitor.
+ *
+ * Create a tile group for the unique monitor, and get a unique
+ * identifier for the tile group.
+ *
+ * RETURNS:
+ * new tile group or error.
+ */
+struct drm_tile_group *drm_mode_create_tile_group(struct drm_device *dev,
+						  char topology[8])
+{
+	struct drm_tile_group *tg;
+	int ret;
+
+	tg = kzalloc(sizeof(*tg), GFP_KERNEL);
+	if (!tg)
+		return ERR_PTR(-ENOMEM);
+
+	kref_init(&tg->refcount);
+	memcpy(tg->group_data, topology, 8);
+	tg->dev = dev;
+
+	mutex_lock(&dev->mode_config.idr_mutex);
+	ret = idr_alloc(&dev->mode_config.tile_idr, tg, 1, 0, GFP_KERNEL);
+	if (ret >= 0) {
+		tg->id = ret;
+	} else {
+		kfree(tg);
+		tg = ERR_PTR(ret);
+	}
+
+	mutex_unlock(&dev->mode_config.idr_mutex);
+	return tg;
+}
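Note on the tile property added in drm_mode_connector_set_tile_property()
above: the blob is a string of eight integers separated by ':', in the order
group id, is-single-monitor flag, number of horizontal and vertical tiles,
horizontal and vertical tile location, horizontal and vertical tile size. A
minimal userspace-side sketch of parsing such a string might look like the
following. The struct and function names here are hypothetical and not part
of the DRM or libdrm API; fetching the blob itself is left out.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical userspace mirror of the eight fields in the tile string. */
struct tile_info {
	int group_id;
	int is_single_monitor;
	int num_h_tile, num_v_tile;
	int h_loc, v_loc;
	int h_size, v_size;
};

/*
 * Parse a "a:b:c:d:e:f:g:h" tile string (hypothetical helper).
 * Returns 0 on success, -1 if the string is not exactly eight
 * ':'-separated integers.
 */
static int parse_tile_blob(const char *blob, struct tile_info *out)
{
	char trailing;
	int n;

	assert(blob != NULL && out != NULL);

	n = sscanf(blob, "%d:%d:%d:%d:%d:%d:%d:%d%c",
		   &out->group_id, &out->is_single_monitor,
		   &out->num_h_tile, &out->num_v_tile,
		   &out->h_loc, &out->v_loc,
		   &out->h_size, &out->v_size, &trailing);

	/* Eight conversions means success; a ninth means trailing garbage. */
	return n == 8 ? 0 : -1;
}
```

The extra "%c" conversion is only there to reject strings with trailing
characters; a stricter parser could also range-check each field.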
diff --git a/drivers/gpu/drm/drm_crtc_helper.c b/drivers/gpu/drm/drm_crtc_helper.c
index 6c65a0a28fbd..d552708409de 100644
--- a/drivers/gpu/drm/drm_crtc_helper.c
+++ b/drivers/gpu/drm/drm_crtc_helper.c
@@ -34,12 +34,35 @@
 #include <linux/moduleparam.h>
 
 #include <drm/drmP.h>
+#include <drm/drm_atomic.h>
 #include <drm/drm_crtc.h>
 #include <drm/drm_fourcc.h>
 #include <drm/drm_crtc_helper.h>
 #include <drm/drm_fb_helper.h>
+#include <drm/drm_plane_helper.h>
+#include <drm/drm_atomic_helper.h>
 #include <drm/drm_edid.h>
 
+/**
+ * DOC: overview
+ *
+ * The CRTC modeset helper library provides a default set_config implementation
+ * in drm_crtc_helper_set_config(), plus a few other convenience functions using
+ * the same callbacks, which drivers can use to e.g. restore the modeset
+ * configuration on resume with drm_helper_resume_force_mode().
+ *
+ * The driver callbacks are mostly compatible with the atomic modeset helpers,
+ * except for the handling of the primary plane: Atomic helpers require that the
+ * primary plane is implemented as a real standalone plane and not directly tied
+ * to the CRTC state. For easier transition this library provides functions to
+ * implement the old semantics required by the CRTC helpers using the new plane
+ * and atomic helper callbacks.
+ *
+ * Drivers are strongly urged to convert to the atomic helpers (by way of first
+ * converting to the plane helpers). New drivers must not use these functions
+ * but need to implement the atomic interface instead, potentially using the
+ * atomic helpers for that.
+ */
 MODULE_AUTHOR("David Airlie, Jesse Barnes");
 MODULE_DESCRIPTION("DRM KMS helper");
 MODULE_LICENSE("GPL and additional rights");
@@ -888,3 +911,112 @@ void drm_helper_resume_force_mode(struct drm_device *dev)
 	drm_modeset_unlock_all(dev);
 }
 EXPORT_SYMBOL(drm_helper_resume_force_mode);
+
+/**
+ * drm_helper_crtc_mode_set - mode_set implementation for atomic plane helpers
+ * @crtc: DRM CRTC
+ * @mode: DRM display mode which userspace requested
+ * @adjusted_mode: DRM display mode adjusted by ->mode_fixup callbacks
+ * @x: x offset of the CRTC scanout area on the underlying framebuffer
+ * @y: y offset of the CRTC scanout area on the underlying framebuffer
+ * @old_fb: previous framebuffer
+ *
+ * This function implements a callback useable as the ->mode_set callback
+ * required by the crtc helpers. Besides the atomic plane helper functions for
+ * the primary plane, the driver must also provide the ->mode_set_nofb callback
+ * to set up the crtc.
+ *
+ * This is a transitional helper useful for converting drivers to the atomic
+ * interfaces.
+ */
+int drm_helper_crtc_mode_set(struct drm_crtc *crtc, struct drm_display_mode *mode,
+			     struct drm_display_mode *adjusted_mode, int x, int y,
+			     struct drm_framebuffer *old_fb)
+{
+	struct drm_crtc_state *crtc_state;
+	struct drm_crtc_helper_funcs *crtc_funcs = crtc->helper_private;
+	int ret;
+
+	if (crtc->funcs->atomic_duplicate_state)
+		crtc_state = crtc->funcs->atomic_duplicate_state(crtc);
+	else if (crtc->state)
+		crtc_state = kmemdup(crtc->state, sizeof(*crtc_state),
+				     GFP_KERNEL);
+	else
+		crtc_state = kzalloc(sizeof(*crtc_state), GFP_KERNEL);
+	if (!crtc_state)
+		return -ENOMEM;
+
+	crtc_state->enable = true;
+	crtc_state->planes_changed = true;
+	crtc_state->mode_changed = true;
+	drm_mode_copy(&crtc_state->mode, mode);
+	drm_mode_copy(&crtc_state->adjusted_mode, adjusted_mode);
+
+	if (crtc_funcs->atomic_check) {
+		ret = crtc_funcs->atomic_check(crtc, crtc_state);
+		if (ret) {
+			kfree(crtc_state);
+
+			return ret;
+		}
+	}
+
+	swap(crtc->state, crtc_state);
+
+	crtc_funcs->mode_set_nofb(crtc);
+
+	if (crtc_state) {
+		if (crtc->funcs->atomic_destroy_state)
+			crtc->funcs->atomic_destroy_state(crtc, crtc_state);
+		else
+			kfree(crtc_state);
+	}
+
+	return drm_helper_crtc_mode_set_base(crtc, x, y, old_fb);
+}
+EXPORT_SYMBOL(drm_helper_crtc_mode_set);
+
+/**
+ * drm_helper_crtc_mode_set_base - mode_set_base implementation for atomic plane helpers
+ * @crtc: DRM CRTC
+ * @x: x offset of the CRTC scanout area on the underlying framebuffer
+ * @y: y offset of the CRTC scanout area on the underlying framebuffer
+ * @old_fb: previous framebuffer
+ *
+ * This function implements a callback useable as the ->mode_set_base callback
+ * required by the crtc helpers. The driver must provide the atomic plane helper
+ * functions for the primary plane.
+ *
+ * This is a transitional helper useful for converting drivers to the atomic
+ * interfaces.
+ */
+int drm_helper_crtc_mode_set_base(struct drm_crtc *crtc, int x, int y,
+				  struct drm_framebuffer *old_fb)
+{
+	struct drm_plane_state *plane_state;
+	struct drm_plane *plane = crtc->primary;
+
+	if (plane->funcs->atomic_duplicate_state)
+		plane_state = plane->funcs->atomic_duplicate_state(plane);
+	else if (plane->state)
+		plane_state = drm_atomic_helper_plane_duplicate_state(plane);
+	else
+		plane_state = kzalloc(sizeof(*plane_state), GFP_KERNEL);
+	if (!plane_state)
+		return -ENOMEM;
+
+	plane_state->crtc = crtc;
+	drm_atomic_set_fb_for_plane(plane_state, crtc->primary->fb);
+	plane_state->crtc_x = 0;
+	plane_state->crtc_y = 0;
+	plane_state->crtc_h = crtc->mode.vdisplay;
+	plane_state->crtc_w = crtc->mode.hdisplay;
+	plane_state->src_x = x << 16;
+	plane_state->src_y = y << 16;
+	plane_state->src_h = crtc->mode.vdisplay << 16;
+	plane_state->src_w = crtc->mode.hdisplay << 16;
+
+	return drm_plane_helper_commit(plane, plane_state, old_fb);
+}
+EXPORT_SYMBOL(drm_helper_crtc_mode_set_base);
diff --git a/drivers/gpu/drm/drm_dp_helper.c b/drivers/gpu/drm/drm_dp_helper.c
index 08e33b8b13a4..79968e39c8d0 100644
--- a/drivers/gpu/drm/drm_dp_helper.c
+++ b/drivers/gpu/drm/drm_dp_helper.c
@@ -39,198 +39,6 @@
  * blocks, ...
  */
 
-/* Run a single AUX_CH I2C transaction, writing/reading data as necessary */
-static int
-i2c_algo_dp_aux_transaction(struct i2c_adapter *adapter, int mode,
-			    uint8_t write_byte, uint8_t *read_byte)
-{
-	struct i2c_algo_dp_aux_data *algo_data = adapter->algo_data;
-	int ret;
-
-	ret = (*algo_data->aux_ch)(adapter, mode,
-				   write_byte, read_byte);
-	return ret;
-}
-
-/*
- * I2C over AUX CH
- */
-
-/*
- * Send the address. If the I2C link is running, this 'restarts'
- * the connection with the new address, this is used for doing
- * a write followed by a read (as needed for DDC)
- */
-static int
-i2c_algo_dp_aux_address(struct i2c_adapter *adapter, u16 address, bool reading)
-{
-	struct i2c_algo_dp_aux_data *algo_data = adapter->algo_data;
-	int mode = MODE_I2C_START;
-	int ret;
-
-	if (reading)
-		mode |= MODE_I2C_READ;
-	else
-		mode |= MODE_I2C_WRITE;
-	algo_data->address = address;
-	algo_data->running = true;
-	ret = i2c_algo_dp_aux_transaction(adapter, mode, 0, NULL);
-	return ret;
-}
-
-/*
- * Stop the I2C transaction. This closes out the link, sending
- * a bare address packet with the MOT bit turned off
- */
-static void
-i2c_algo_dp_aux_stop(struct i2c_adapter *adapter, bool reading)
-{
-	struct i2c_algo_dp_aux_data *algo_data = adapter->algo_data;
-	int mode = MODE_I2C_STOP;
-
-	if (reading)
-		mode |= MODE_I2C_READ;
-	else
-		mode |= MODE_I2C_WRITE;
-	if (algo_data->running) {
-		(void) i2c_algo_dp_aux_transaction(adapter, mode, 0, NULL);
-		algo_data->running = false;
-	}
-}
-
-/*
- * Write a single byte to the current I2C address, the
- * the I2C link must be running or this returns -EIO
- */
-static int
-i2c_algo_dp_aux_put_byte(struct i2c_adapter *adapter, u8 byte)
-{
-	struct i2c_algo_dp_aux_data *algo_data = adapter->algo_data;
-	int ret;
-
-	if (!algo_data->running)
-		return -EIO;
-
-	ret = i2c_algo_dp_aux_transaction(adapter, MODE_I2C_WRITE, byte, NULL);
-	return ret;
-}
-
-/*
- * Read a single byte from the current I2C address, the
- * I2C link must be running or this returns -EIO
- */
-static int
-i2c_algo_dp_aux_get_byte(struct i2c_adapter *adapter, u8 *byte_ret)
-{
-	struct i2c_algo_dp_aux_data *algo_data = adapter->algo_data;
-	int ret;
-
-	if (!algo_data->running)
-		return -EIO;
-
-	ret = i2c_algo_dp_aux_transaction(adapter, MODE_I2C_READ, 0, byte_ret);
-	return ret;
-}
-
-static int
-i2c_algo_dp_aux_xfer(struct i2c_adapter *adapter,
-		     struct i2c_msg *msgs,
-		     int num)
-{
-	int ret = 0;
-	bool reading = false;
-	int m;
-	int b;
-
-	for (m = 0; m < num; m++) {
-		u16 len = msgs[m].len;
-		u8 *buf = msgs[m].buf;
-		reading = (msgs[m].flags & I2C_M_RD) != 0;
-		ret = i2c_algo_dp_aux_address(adapter, msgs[m].addr, reading);
-		if (ret < 0)
-			break;
-		if (reading) {
-			for (b = 0; b < len; b++) {
-				ret = i2c_algo_dp_aux_get_byte(adapter, &buf[b]);
-				if (ret < 0)
-					break;
-			}
-		} else {
-			for (b = 0; b < len; b++) {
-				ret = i2c_algo_dp_aux_put_byte(adapter, buf[b]);
-				if (ret < 0)
-					break;
-			}
-		}
-		if (ret < 0)
-			break;
-	}
-	if (ret >= 0)
-		ret = num;
-	i2c_algo_dp_aux_stop(adapter, reading);
-	DRM_DEBUG_KMS("dp_aux_xfer return %d\n", ret);
-	return ret;
-}
-
-static u32
-i2c_algo_dp_aux_functionality(struct i2c_adapter *adapter)
-{
-	return I2C_FUNC_I2C | I2C_FUNC_SMBUS_EMUL |
-	       I2C_FUNC_SMBUS_READ_BLOCK_DATA |
-	       I2C_FUNC_SMBUS_BLOCK_PROC_CALL |
-	       I2C_FUNC_10BIT_ADDR;
-}
-
-static const struct i2c_algorithm i2c_dp_aux_algo = {
-	.master_xfer	= i2c_algo_dp_aux_xfer,
-	.functionality	= i2c_algo_dp_aux_functionality,
-};
-
-static void
-i2c_dp_aux_reset_bus(struct i2c_adapter *adapter)
-{
-	(void) i2c_algo_dp_aux_address(adapter, 0, false);
-	(void) i2c_algo_dp_aux_stop(adapter, false);
-}
-
-static int
-i2c_dp_aux_prepare_bus(struct i2c_adapter *adapter)
-{
-	adapter->algo = &i2c_dp_aux_algo;
-	adapter->retries = 3;
-	i2c_dp_aux_reset_bus(adapter);
-	return 0;
-}
-
-/**
- * i2c_dp_aux_add_bus() - register an i2c adapter using the aux ch helper
- * @adapter: i2c adapter to register
- *
- * This registers an i2c adapter that uses dp aux channel as it's underlaying
- * transport. The driver needs to fill out the &i2c_algo_dp_aux_data structure
- * and store it in the algo_data member of the @adapter argument. This will be
- * used by the i2c over dp aux algorithm to drive the hardware.
- *
- * RETURNS:
- * 0 on success, -ERRNO on failure.
- *
- * IMPORTANT:
- * This interface is deprecated, please switch to the new dp aux helpers and
- * drm_dp_aux_register().
- */
-int
-i2c_dp_aux_add_bus(struct i2c_adapter *adapter)
-{
-	int error;
-
-	error = i2c_dp_aux_prepare_bus(adapter);
-	if (error)
-		return error;
-	error = i2c_add_adapter(adapter);
-	return error;
-}
-EXPORT_SYMBOL(i2c_dp_aux_add_bus);
-
 /* Helpers for DP link training */
 static u8 dp_link_status(const u8 link_status[DP_LINK_STATUS_SIZE], int r)
 {
@@ -378,10 +186,11 @@ static int drm_dp_dpcd_access(struct drm_dp_aux *aux, u8 request,
 
 	/*
 	 * The specification doesn't give any recommendation on how often to
-	 * retry native transactions, so retry 7 times like for I2C-over-AUX
-	 * transactions.
+	 * retry native transactions. We used to retry 7 times like for
+	 * aux i2c transactions but for real world devices this wasn't
+	 * sufficient, so bump to 32, which makes Dell 4k monitors happier.
 	 */
-	for (retry = 0; retry < 7; retry++) {
+	for (retry = 0; retry < 32; retry++) {
 
 		mutex_lock(&aux->hw_mutex);
 		err = aux->transfer(aux, &msg);
@@ -654,10 +463,12 @@ static int drm_dp_i2c_do_msg(struct drm_dp_aux *aux, struct drm_dp_aux_msg *msg)
 
 		case DP_AUX_I2C_REPLY_NACK:
 			DRM_DEBUG_KMS("I2C nack\n");
+			aux->i2c_nack_count++;
 			return -EREMOTEIO;
 
 		case DP_AUX_I2C_REPLY_DEFER:
 			DRM_DEBUG_KMS("I2C defer\n");
+			aux->i2c_defer_count++;
 			usleep_range(400, 500);
 			continue;
 
diff --git a/drivers/gpu/drm/drm_dp_mst_topology.c b/drivers/gpu/drm/drm_dp_mst_topology.c
index 070f913d2dba..9a5b68717ec8 100644
--- a/drivers/gpu/drm/drm_dp_mst_topology.c
+++ b/drivers/gpu/drm/drm_dp_mst_topology.c
@@ -839,6 +839,8 @@ static void drm_dp_put_mst_branch_device(struct drm_dp_mst_branch *mstb)
 
 static void drm_dp_port_teardown_pdt(struct drm_dp_mst_port *port, int old_pdt)
 {
+	struct drm_dp_mst_branch *mstb;
+
 	switch (old_pdt) {
 	case DP_PEER_DEVICE_DP_LEGACY_CONV:
 	case DP_PEER_DEVICE_SST_SINK:
@@ -846,8 +848,9 @@ static void drm_dp_port_teardown_pdt(struct drm_dp_mst_port *port, int old_pdt)
 		drm_dp_mst_unregister_i2c_bus(&port->aux);
 		break;
 	case DP_PEER_DEVICE_MST_BRANCHING:
-		drm_dp_put_mst_branch_device(port->mstb);
+		mstb = port->mstb;
 		port->mstb = NULL;
+		drm_dp_put_mst_branch_device(mstb);
 		break;
 	}
 }
@@ -858,6 +861,8 @@ static void drm_dp_destroy_port(struct kref *kref)
 	struct drm_dp_mst_topology_mgr *mgr = port->mgr;
 	if (!port->input) {
 		port->vcpi.num_slots = 0;
+
+		kfree(port->cached_edid);
 		if (port->connector)
 			(*port->mgr->cbs->destroy_connector)(mgr, port->connector);
 		drm_dp_port_teardown_pdt(port, port->pdt);
@@ -1011,19 +1016,20 @@ static void drm_dp_check_port_guid(struct drm_dp_mst_branch *mstb,
 
 static void build_mst_prop_path(struct drm_dp_mst_port *port,
 				struct drm_dp_mst_branch *mstb,
-				char *proppath)
+				char *proppath,
+				size_t proppath_size)
 {
 	int i;
 	char temp[8];
-	snprintf(proppath, 255, "mst:%d", mstb->mgr->conn_base_id);
+	snprintf(proppath, proppath_size, "mst:%d", mstb->mgr->conn_base_id);
 	for (i = 0; i < (mstb->lct - 1); i++) {
 		int shift = (i % 2) ? 0 : 4;
 		int port_num = mstb->rad[i / 2] >> shift;
-		snprintf(temp, 8, "-%d", port_num);
-		strncat(proppath, temp, 255);
+		snprintf(temp, sizeof(temp), "-%d", port_num);
+		strlcat(proppath, temp, proppath_size);
 	}
-	snprintf(temp, 8, "-%d", port->port_num);
-	strncat(proppath, temp, 255);
+	snprintf(temp, sizeof(temp), "-%d", port->port_num);
+	strlcat(proppath, temp, proppath_size);
 }
 
 static void drm_dp_add_port(struct drm_dp_mst_branch *mstb,
@@ -1094,8 +1100,12 @@ static void drm_dp_add_port(struct drm_dp_mst_branch *mstb,
 
 	if (created && !port->input) {
 		char proppath[255];
-		build_mst_prop_path(port, mstb, proppath);
+		build_mst_prop_path(port, mstb, proppath, sizeof(proppath));
 		port->connector = (*mstb->mgr->cbs->add_connector)(mstb->mgr, port, proppath);
+
+		if (port->port_num >= 8) {
+			port->cached_edid = drm_get_edid(port->connector, &port->aux.ddc);
+		}
 	}
 
 	/* put reference to this port */
@@ -1798,17 +1808,27 @@ static int drm_dp_send_up_ack_reply(struct drm_dp_mst_topology_mgr *mgr,
 	return 0;
 }
 
-static int drm_dp_get_vc_payload_bw(int dp_link_bw, int dp_link_count)
+static bool drm_dp_get_vc_payload_bw(int dp_link_bw,
+				     int dp_link_count,
+				     int *out)
 {
 	switch (dp_link_bw) {
+	default:
+		DRM_DEBUG_KMS("invalid link bandwidth in DPCD: %x (link count: %d)\n",
+			      dp_link_bw, dp_link_count);
+		return false;
+
 	case DP_LINK_BW_1_62:
-		return 3 * dp_link_count;
+		*out = 3 * dp_link_count;
+		break;
 	case DP_LINK_BW_2_7:
-		return 5 * dp_link_count;
+		*out = 5 * dp_link_count;
+		break;
 	case DP_LINK_BW_5_4:
-		return 10 * dp_link_count;
+		*out = 10 * dp_link_count;
+		break;
 	}
-	BUG();
+	return true;
 }
 
 /**
@@ -1840,7 +1860,13 @@ int drm_dp_mst_topology_mgr_set_mst(struct drm_dp_mst_topology_mgr *mgr, bool ms
 			goto out_unlock;
 		}
 
-		mgr->pbn_div = drm_dp_get_vc_payload_bw(mgr->dpcd[1], mgr->dpcd[2] & DP_MAX_LANE_COUNT_MASK);
+		if (!drm_dp_get_vc_payload_bw(mgr->dpcd[1],
+					      mgr->dpcd[2] & DP_MAX_LANE_COUNT_MASK,
+					      &mgr->pbn_div)) {
+			ret = -EINVAL;
+			goto out_unlock;
+		}
+
 		mgr->total_pbn = 2560;
 		mgr->total_slots = DIV_ROUND_UP(mgr->total_pbn, mgr->pbn_div);
 		mgr->avail_slots = mgr->total_slots;
@@ -2150,7 +2176,8 @@ EXPORT_SYMBOL(drm_dp_mst_hpd_irq);
  * This returns the current connection state for a port. It validates the
  * port pointer still exists so the caller doesn't require a reference
  */
-enum drm_connector_status drm_dp_mst_detect_port(struct drm_dp_mst_topology_mgr *mgr, struct drm_dp_mst_port *port)
+enum drm_connector_status drm_dp_mst_detect_port(struct drm_connector *connector,
+						 struct drm_dp_mst_topology_mgr *mgr, struct drm_dp_mst_port *port)
 {
 	enum drm_connector_status status = connector_status_disconnected;
 
@@ -2169,6 +2196,10 @@ enum drm_connector_status drm_dp_mst_detect_port(struct drm_dp_mst_topology_mgr
 
 	case DP_PEER_DEVICE_SST_SINK:
 		status = connector_status_connected;
+		/* for logical ports - cache the EDID */
+		if (port->port_num >= 8 && !port->cached_edid) {
+			port->cached_edid = drm_get_edid(connector, &port->aux.ddc);
+		}
 		break;
 	case DP_PEER_DEVICE_DP_LEGACY_CONV:
 		if (port->ldps)
@@ -2200,7 +2231,12 @@ struct edid *drm_dp_mst_get_edid(struct drm_connector *connector, struct drm_dp_
 	if (!port)
 		return NULL;
 
-	edid = drm_get_edid(connector, &port->aux.ddc);
+	if (port->cached_edid)
+		edid = drm_edid_duplicate(port->cached_edid);
+	else
+		edid = drm_get_edid(connector, &port->aux.ddc);
+
+	drm_mode_connector_set_tile_property(connector);
 	drm_dp_put_port(port);
 	return edid;
 }
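Note on the build_mst_prop_path() hunk above: the change passes the real buffer size down and swaps strncat() for strlcat(), so the size argument now means "total size of the destination" rather than "bytes to append". The same bounded append can be sketched in plain userspace C with snprintf() alone. This is a minimal sketch, not the kernel code: build_prop_path() and its ports array are hypothetical stand-ins for the mstb->rad[] nibble walk.

```c
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical userspace sketch: build "mst:<base>-<port>-<port>..." into
 * out without ever writing past out_size bytes. Each append is bounded by
 * the space that is still free, which is what strlcat() does in the kernel.
 */
static void build_prop_path(char *out, size_t out_size, int base_id,
			    const int *ports, size_t nports)
{
	size_t used = 0, i;
	int n;

	if (out_size == 0)
		return;

	n = snprintf(out, out_size, "mst:%d", base_id);
	for (i = 0; ; i++) {
		/* stop on an output error or once the last piece did not fit */
		if (n < 0 || (size_t)n >= out_size - used)
			return;
		used += (size_t)n;
		if (i == nports)
			return;
		n = snprintf(out + used, out_size - used, "-%d", ports[i]);
	}
}
```

With a 32-byte buffer, base id 5 and ports {1, 8}, this should produce "mst:5-1-8". With an 8-byte buffer the result is cut to "mst:5-1" and stays NUL-terminated.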
diff --git a/drivers/gpu/drm/drm_drv.c b/drivers/gpu/drm/drm_drv.c
index bc3da32d4585..4f41377b0b80 100644
--- a/drivers/gpu/drm/drm_drv.c
+++ b/drivers/gpu/drm/drm_drv.c
@@ -56,7 +56,7 @@ static struct idr drm_minors_idr;
 struct class *drm_class;
 static struct dentry *drm_debugfs_root;
 
-void drm_err(const char *func, const char *format, ...)
+void drm_err(const char *format, ...)
 {
 	struct va_format vaf;
 	va_list args;
@@ -66,7 +66,8 @@ void drm_err(const char *func, const char *format, ...)
 	vaf.fmt = format;
 	vaf.va = &args;
 
-	printk(KERN_ERR "[" DRM_NAME ":%s] *ERROR* %pV", func, &vaf);
+	printk(KERN_ERR "[" DRM_NAME ":%pf] *ERROR* %pV",
+	       __builtin_return_address(0), &vaf);
 
 	va_end(args);
 }
@@ -534,6 +535,8 @@ static void drm_fs_inode_free(struct inode *inode)
  * The initial ref-count of the object is 1. Use drm_dev_ref() and
  * drm_dev_unref() to take and drop further ref-counts.
  *
+ * Note that for purely virtual devices @parent can be NULL.
+ *
  * RETURNS:
  * Pointer to new DRM device, or NULL if out of memory.
  */
diff --git a/drivers/gpu/drm/drm_edid.c b/drivers/gpu/drm/drm_edid.c
index 3bf999134bcc..53bc7a628909 100644
--- a/drivers/gpu/drm/drm_edid.c
+++ b/drivers/gpu/drm/drm_edid.c
@@ -34,6 +34,7 @@
 #include <linux/module.h>
 #include <drm/drmP.h>
 #include <drm/drm_edid.h>
+#include <drm/drm_displayid.h>
 
 #define version_greater(edid, maj, min) \
 	(((edid)->version > (maj)) || \
@@ -1014,6 +1015,27 @@ module_param_named(edid_fixup, edid_fixup, int, 0400);
 MODULE_PARM_DESC(edid_fixup,
 		 "Minimum number of valid EDID header bytes (0-8, default 6)");
 
+static void drm_get_displayid(struct drm_connector *connector,
+			      struct edid *edid);
+
+static int drm_edid_block_checksum(const u8 *raw_edid)
+{
+	int i;
+	u8 csum = 0;
+	for (i = 0; i < EDID_LENGTH; i++)
+		csum += raw_edid[i];
+
+	return csum;
+}
+
+static bool drm_edid_is_zero(const u8 *in_edid, int length)
+{
+	if (memchr_inv(in_edid, 0, length))
+		return false;
+
+	return true;
+}
+
 /**
  * drm_edid_block_valid - Sanity check the EDID block (base or extension)
  * @raw_edid: pointer to raw EDID block
@@ -1027,8 +1049,7 @@ MODULE_PARM_DESC(edid_fixup,
  */
 bool drm_edid_block_valid(u8 *raw_edid, int block, bool print_bad_edid)
 {
-	int i;
-	u8 csum = 0;
+	u8 csum;
 	struct edid *edid = (struct edid *)raw_edid;
 
 	if (WARN_ON(!raw_edid))
@@ -1048,8 +1069,7 @@ bool drm_edid_block_valid(u8 *raw_edid, int block, bool print_bad_edid)
 		}
 	}
 
-	for (i = 0; i < EDID_LENGTH; i++)
-		csum += raw_edid[i];
+	csum = drm_edid_block_checksum(raw_edid);
 	if (csum) {
 		if (print_bad_edid) {
 			DRM_ERROR("EDID checksum is invalid, remainder is %d\n", csum);
@@ -1080,9 +1100,13 @@ bool drm_edid_block_valid(u8 *raw_edid, int block, bool print_bad_edid)
 
 bad:
 	if (print_bad_edid) {
-		printk(KERN_ERR "Raw EDID:\n");
-		print_hex_dump(KERN_ERR, " \t", DUMP_PREFIX_NONE, 16, 1,
+		if (drm_edid_is_zero(raw_edid, EDID_LENGTH)) {
+			printk(KERN_ERR "EDID block is all zeroes\n");
+		} else {
+			printk(KERN_ERR "Raw EDID:\n");
+			print_hex_dump(KERN_ERR, " \t", DUMP_PREFIX_NONE, 16, 1,
 			       raw_edid, EDID_LENGTH, false);
+		}
 	}
 	return false;
 }
@@ -1115,7 +1139,7 @@ EXPORT_SYMBOL(drm_edid_is_valid);
 #define DDC_SEGMENT_ADDR 0x30
 /**
  * drm_do_probe_ddc_edid() - get EDID information via I2C
- * @adapter: I2C device adaptor
+ * @data: I2C device adapter
  * @buf: EDID data buffer to be filled
  * @block: 128 byte EDID block to start fetching from
  * @len: EDID data buffer length to fetch
@@ -1125,9 +1149,9 @@ EXPORT_SYMBOL(drm_edid_is_valid);
  * Return: 0 on success or -1 on failure.
  */
 static int
-drm_do_probe_ddc_edid(struct i2c_adapter *adapter, unsigned char *buf,
-		      int block, int len)
+drm_do_probe_ddc_edid(void *data, u8 *buf, unsigned int block, size_t len)
 {
+	struct i2c_adapter *adapter = data;
 	unsigned char start = block * EDID_LENGTH;
 	unsigned char segment = block >> 1;
 	unsigned char xfers = segment ? 3 : 2;
@@ -1176,16 +1200,26 @@ drm_do_probe_ddc_edid(struct i2c_adapter *adapter, unsigned char *buf,
 	return ret == xfers ? 0 : -1;
 }
 
-static bool drm_edid_is_zero(u8 *in_edid, int length)
-{
-	if (memchr_inv(in_edid, 0, length))
-		return false;
-
-	return true;
-}
-
-static u8 *
-drm_do_get_edid(struct drm_connector *connector, struct i2c_adapter *adapter)
+/**
+ * drm_do_get_edid - get EDID data using a custom EDID block read function
+ * @connector: connector we're probing
+ * @get_edid_block: EDID block read function
+ * @data: private data passed to the block read function
+ *
+ * When the I2C adapter connected to the DDC bus is hidden behind a device that
+ * exposes a different interface to read EDID blocks this function can be used
+ * to get EDID data using a custom block read function.
+ *
+ * As in the general case the DDC bus is accessible by the kernel at the I2C
+ * level, drivers must make all reasonable efforts to expose it as an I2C
+ * adapter and use drm_get_edid() instead of abusing this function.
+ *
+ * Return: Pointer to valid EDID or NULL if we couldn't find any.
+ */
+struct edid *drm_do_get_edid(struct drm_connector *connector,
+	int (*get_edid_block)(void *data, u8 *buf, unsigned int block,
+			      size_t len),
+	void *data)
 {
 	int i, j = 0, valid_extensions = 0;
 	u8 *block, *new;
@@ -1196,7 +1230,7 @@ drm_do_get_edid(struct drm_connector *connector, struct i2c_adapter *adapter)
 
 	/* base block fetch */
 	for (i = 0; i < 4; i++) {
-		if (drm_do_probe_ddc_edid(adapter, block, 0, EDID_LENGTH))
+		if (get_edid_block(data, block, 0, EDID_LENGTH))
 			goto out;
 		if (drm_edid_block_valid(block, 0, print_bad_edid))
 			break;
@@ -1210,7 +1244,7 @@ drm_do_get_edid(struct drm_connector *connector, struct i2c_adapter *adapter)
 
 	/* if there's no extensions, we're done */
 	if (block[0x7e] == 0)
-		return block;
+		return (struct edid *)block;
 
 	new = krealloc(block, (block[0x7e] + 1) * EDID_LENGTH, GFP_KERNEL);
 	if (!new)
@@ -1219,7 +1253,7 @@ drm_do_get_edid(struct drm_connector *connector, struct i2c_adapter *adapter)
 
 	for (j = 1; j <= block[0x7e]; j++) {
 		for (i = 0; i < 4; i++) {
-			if (drm_do_probe_ddc_edid(adapter,
+			if (get_edid_block(data,
 				  block + (valid_extensions + 1) * EDID_LENGTH,
 				  j, EDID_LENGTH))
 				goto out;
@@ -1247,7 +1281,7 @@ drm_do_get_edid(struct drm_connector *connector, struct i2c_adapter *adapter)
 		block = new;
 	}
 
-	return block;
+	return (struct edid *)block;
 
 carp:
 	if (print_bad_edid) {
@@ -1260,6 +1294,7 @@ out:
 	kfree(block);
 	return NULL;
 }
+EXPORT_SYMBOL_GPL(drm_do_get_edid);
 
 /**
  * drm_probe_ddc() - probe DDC presence
@@ -1289,11 +1324,14 @@ EXPORT_SYMBOL(drm_probe_ddc);
 struct edid *drm_get_edid(struct drm_connector *connector,
 			  struct i2c_adapter *adapter)
 {
-	struct edid *edid = NULL;
+	struct edid *edid;
 
-	if (drm_probe_ddc(adapter))
-		edid = (struct edid *)drm_do_get_edid(connector, adapter);
+	if (!drm_probe_ddc(adapter))
+		return NULL;
 
+	edid = drm_do_get_edid(connector, drm_do_probe_ddc_edid, adapter);
+	if (edid)
+		drm_get_displayid(connector, edid);
 	return edid;
 }
 EXPORT_SYMBOL(drm_get_edid);
@@ -2389,7 +2427,7 @@ add_detailed_modes(struct drm_connector *connector, struct edid *edid,
 /*
  * Search EDID for CEA extension block.
  */
-static u8 *drm_find_cea_extension(struct edid *edid)
+static u8 *drm_find_edid_extension(struct edid *edid, int ext_id)
 {
 	u8 *edid_ext = NULL;
 	int i;
@@ -2401,7 +2439,7 @@ static u8 *drm_find_cea_extension(struct edid *edid)
 	/* Find CEA extension */
 	for (i = 0; i < edid->extensions; i++) {
 		edid_ext = (u8 *)edid + EDID_LENGTH * (i + 1);
-		if (edid_ext[0] == CEA_EXT)
+		if (edid_ext[0] == ext_id)
 			break;
 	}
 
@@ -2411,6 +2449,16 @@ static u8 *drm_find_cea_extension(struct edid *edid)
 	return edid_ext;
 }
 
+static u8 *drm_find_cea_extension(struct edid *edid)
+{
+	return drm_find_edid_extension(edid, CEA_EXT);
+}
+
+static u8 *drm_find_displayid_extension(struct edid *edid)
+{
+	return drm_find_edid_extension(edid, DISPLAYID_EXT);
+}
+
 /*
  * Calculate the alternate clock for the CEA mode
  * (60Hz vs. 59.94Hz etc.)
@@ -3128,9 +3176,12 @@ void drm_edid_to_eld(struct drm_connector *connector, struct edid *edid)
 		}
 	}
 	eld[5] |= sad_count << 4;
-	eld[2] = (20 + mnl + sad_count * 3 + 3) / 4;
 
-	DRM_DEBUG_KMS("ELD size %d, SAD count %d\n", (int)eld[2], sad_count);
+	eld[DRM_ELD_BASELINE_ELD_LEN] =
+		DIV_ROUND_UP(drm_eld_calc_baseline_block_size(eld), 4);
+
+	DRM_DEBUG_KMS("ELD size %d, SAD count %d\n",
+		      drm_eld_size(eld), sad_count);
 }
 EXPORT_SYMBOL(drm_edid_to_eld);
 
@@ -3868,3 +3919,123 @@ drm_hdmi_vendor_infoframe_from_display_mode(struct hdmi_vendor_infoframe *frame,
 	return 0;
 }
 EXPORT_SYMBOL(drm_hdmi_vendor_infoframe_from_display_mode);
+
+static int drm_parse_display_id(struct drm_connector *connector,
+				u8 *displayid, int length,
+				bool is_edid_extension)
+{
+	/* if this is an EDID extension the first byte will be 0x70 */
+	int idx = 0;
+	struct displayid_hdr *base;
+	struct displayid_block *block;
+	u8 csum = 0;
+	int i;
+
+	if (is_edid_extension)
+		idx = 1;
+
+	base = (struct displayid_hdr *)&displayid[idx];
+
+	DRM_DEBUG_KMS("base revision 0x%x, length %d, %d %d\n",
+		      base->rev, base->bytes, base->prod_id, base->ext_count);
+
+	if (base->bytes + 5 > length - idx)
+		return -EINVAL;
+
+	for (i = idx; i <= base->bytes + 5; i++) {
+		csum += displayid[i];
+	}
+	if (csum) {
+		DRM_ERROR("DisplayID checksum invalid, remainder is %d\n", csum);
+		return -EINVAL;
+	}
+
+	block = (struct displayid_block *)&displayid[idx + 4];
+	DRM_DEBUG_KMS("block id %d, rev %d, len %d\n",
+		      block->tag, block->rev, block->num_bytes);
+
+	switch (block->tag) {
+	case DATA_BLOCK_TILED_DISPLAY: {
+		struct displayid_tiled_block *tile = (struct displayid_tiled_block *)block;
+
+		u16 w, h;
+		u8 tile_v_loc, tile_h_loc;
+		u8 num_v_tile, num_h_tile;
+		struct drm_tile_group *tg;
+
+		w = tile->tile_size[0] | tile->tile_size[1] << 8;
+		h = tile->tile_size[2] | tile->tile_size[3] << 8;
+
+		num_v_tile = (tile->topo[0] & 0xf) | (tile->topo[2] & 0x30);
+		num_h_tile = (tile->topo[0] >> 4) | ((tile->topo[2] >> 2) & 0x30);
+		tile_v_loc = (tile->topo[1] & 0xf) | ((tile->topo[2] & 0x3) << 4);
+		tile_h_loc = (tile->topo[1] >> 4) | (((tile->topo[2] >> 2) & 0x3) << 4);
+
+		connector->has_tile = true;
+		if (tile->tile_cap & 0x80)
+			connector->tile_is_single_monitor = true;
+
+		connector->num_h_tile = num_h_tile + 1;
+		connector->num_v_tile = num_v_tile + 1;
+		connector->tile_h_loc = tile_h_loc;
+		connector->tile_v_loc = tile_v_loc;
+		connector->tile_h_size = w + 1;
+		connector->tile_v_size = h + 1;
+
+		DRM_DEBUG_KMS("tile cap 0x%x\n", tile->tile_cap);
+		DRM_DEBUG_KMS("tile_size %d x %d\n", w + 1, h + 1);
+		DRM_DEBUG_KMS("topo num tiles %dx%d, location %dx%d\n",
+		       num_h_tile + 1, num_v_tile + 1, tile_h_loc, tile_v_loc);
+		DRM_DEBUG_KMS("vend %c%c%c\n", tile->topology_id[0], tile->topology_id[1], tile->topology_id[2]);
+
+		tg = drm_mode_get_tile_group(connector->dev, tile->topology_id);
+		if (!tg) {
+			tg = drm_mode_create_tile_group(connector->dev, tile->topology_id);
+		}
+		if (!tg)
+			return -ENOMEM;
+
+		if (connector->tile_group != tg) {
+			/* if the connector does not point at this group yet,
+			   keep the reference we just took and drop the old group's */
+			if (connector->tile_group) {
+				drm_mode_put_tile_group(connector->dev, connector->tile_group);
+			}
+			connector->tile_group = tg;
+		} else
+			/* if same tile group, then release the ref we just took. */
+			drm_mode_put_tile_group(connector->dev, tg);
+	}
+		break;
+	default:
+		printk("unknown displayid tag %d\n", block->tag);
+		break;
+	}
+	return 0;
+}
+
+static void drm_get_displayid(struct drm_connector *connector,
+			      struct edid *edid)
+{
+	void *displayid = NULL;
+	int ret;
+	connector->has_tile = false;
+	displayid = drm_find_displayid_extension(edid);
+	if (!displayid) {
+		/* drop reference to any tile group we had */
+		goto out_drop_ref;
+	}
+
+	ret = drm_parse_display_id(connector, displayid, EDID_LENGTH, true);
+	if (ret < 0)
+		goto out_drop_ref;
+	if (!connector->has_tile)
+		goto out_drop_ref;
+	return;
+out_drop_ref:
+	if (connector->tile_group) {
+		drm_mode_put_tile_group(connector->dev, connector->tile_group);
+		connector->tile_group = NULL;
+	}
+	return;
+}
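Note on the drm_edid.c hunks above: drm_edid_block_checksum() sums all bytes of one EDID block into a u8, and drm_edid_block_valid() treats any non-zero remainder as a bad block. The rule can be tried outside the kernel with the sketch below. It is an illustration under stated assumptions: EDID_BLOCK_LEN mirrors the kernel's EDID_LENGTH (128), and edid_block_fix_checksum() is a hypothetical helper that does not exist in the patch.

```c
#include <stddef.h>
#include <stdint.h>

#define EDID_BLOCK_LEN 128	/* assumed equal to the kernel's EDID_LENGTH */

/* Sum every byte of one block modulo 256; 0 means the checksum is good. */
static uint8_t edid_block_checksum(const uint8_t *raw)
{
	uint8_t csum = 0;
	size_t i;

	for (i = 0; i < EDID_BLOCK_LEN; i++)
		csum += raw[i];

	return csum;
}

/* Hypothetical helper: set the last byte so the whole block sums to zero. */
static void edid_block_fix_checksum(uint8_t *raw)
{
	raw[EDID_BLOCK_LEN - 1] = 0;
	raw[EDID_BLOCK_LEN - 1] = (uint8_t)(0x100 - edid_block_checksum(raw));
}
```

A block of zeroes also sums to zero, so the checksum alone cannot reject it. This may be why the error path in the patch reports an all-zero block separately instead of hex-dumping it.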
diff --git a/drivers/gpu/drm/drm_edid_load.c b/drivers/gpu/drm/drm_edid_load.c
index 0a235fe61c9b..732cb6f8e653 100644
--- a/drivers/gpu/drm/drm_edid_load.c
+++ b/drivers/gpu/drm/drm_edid_load.c
@@ -254,8 +254,7 @@ static void *edid_load(struct drm_connector *connector, const char *name,
 	    name, connector_name);
 
 out:
-	if (fw)
-		release_firmware(fw);
+	release_firmware(fw);
 	return edid;
 }
 
diff --git a/drivers/gpu/drm/drm_fb_helper.c b/drivers/gpu/drm/drm_fb_helper.c
index 0c0c39bac23d..52ce26d6b4fb 100644
--- a/drivers/gpu/drm/drm_fb_helper.c
+++ b/drivers/gpu/drm/drm_fb_helper.c
@@ -347,9 +347,18 @@ bool drm_fb_helper_restore_fbdev_mode_unlocked(struct drm_fb_helper *fb_helper)
 {
 	struct drm_device *dev = fb_helper->dev;
 	bool ret;
+	bool do_delayed = false;
+
 	drm_modeset_lock_all(dev);
 	ret = restore_fbdev_mode(fb_helper);
+
+	do_delayed = fb_helper->delayed_hotplug;
+	if (do_delayed)
+		fb_helper->delayed_hotplug = false;
 	drm_modeset_unlock_all(dev);
+
+	if (do_delayed)
+		drm_fb_helper_hotplug_event(fb_helper);
 	return ret;
 }
 EXPORT_SYMBOL(drm_fb_helper_restore_fbdev_mode_unlocked);
@@ -888,10 +897,6 @@ int drm_fb_helper_set_par(struct fb_info *info)
 
 	drm_fb_helper_restore_fbdev_mode_unlocked(fb_helper);
 
-	if (fb_helper->delayed_hotplug) {
-		fb_helper->delayed_hotplug = false;
-		drm_fb_helper_hotplug_event(fb_helper);
-	}
 	return 0;
 }
 EXPORT_SYMBOL(drm_fb_helper_set_par);
@@ -995,19 +1000,21 @@ static int drm_fb_helper_single_fb_probe(struct drm_fb_helper *fb_helper,
 	crtc_count = 0;
 	for (i = 0; i < fb_helper->crtc_count; i++) {
 		struct drm_display_mode *desired_mode;
+		int x, y;
 		desired_mode = fb_helper->crtc_info[i].desired_mode;
-
+		x = fb_helper->crtc_info[i].x;
+		y = fb_helper->crtc_info[i].y;
 		if (desired_mode) {
 			if (gamma_size == 0)
 				gamma_size = fb_helper->crtc_info[i].mode_set.crtc->gamma_size;
-			if (desired_mode->hdisplay < sizes.fb_width)
-				sizes.fb_width = desired_mode->hdisplay;
-			if (desired_mode->vdisplay < sizes.fb_height)
-				sizes.fb_height = desired_mode->vdisplay;
-			if (desired_mode->hdisplay > sizes.surface_width)
-				sizes.surface_width = desired_mode->hdisplay;
-			if (desired_mode->vdisplay > sizes.surface_height)
-				sizes.surface_height = desired_mode->vdisplay;
+			if (desired_mode->hdisplay + x < sizes.fb_width)
+				sizes.fb_width = desired_mode->hdisplay + x;
+			if (desired_mode->vdisplay + y < sizes.fb_height)
+				sizes.fb_height = desired_mode->vdisplay + y;
+			if (desired_mode->hdisplay + x > sizes.surface_width)
+				sizes.surface_width = desired_mode->hdisplay + x;
+			if (desired_mode->vdisplay + y > sizes.surface_height)
+				sizes.surface_height = desired_mode->vdisplay + y;
 			crtc_count++;
 		}
 	}
@@ -1307,6 +1314,7 @@ static void drm_enable_connectors(struct drm_fb_helper *fb_helper,
 
 static bool drm_target_cloned(struct drm_fb_helper *fb_helper,
 			      struct drm_display_mode **modes,
+			      struct drm_fb_offset *offsets,
 			      bool *enabled, int width, int height)
 {
 	int count, i, j;
@@ -1378,27 +1386,88 @@ static bool drm_target_cloned(struct drm_fb_helper *fb_helper,
 	return false;
 }
 
+static int drm_get_tile_offsets(struct drm_fb_helper *fb_helper,
+				struct drm_display_mode **modes,
+				struct drm_fb_offset *offsets,
+				int idx,
+				int h_idx, int v_idx)
+{
+	struct drm_fb_helper_connector *fb_helper_conn;
+	int i;
+	int hoffset = 0, voffset = 0;
+
+	for (i = 0; i < fb_helper->connector_count; i++) {
+		fb_helper_conn = fb_helper->connector_info[i];
+		if (!fb_helper_conn->connector->has_tile)
+			continue;
+
+		if (!modes[i] && (h_idx || v_idx)) {
+			DRM_DEBUG_KMS("no modes for connector tiled %d %d\n", i,
+				      fb_helper_conn->connector->base.id);
+			continue;
+		}
+		if (fb_helper_conn->connector->tile_h_loc < h_idx)
+			hoffset += modes[i]->hdisplay;
+
+		if (fb_helper_conn->connector->tile_v_loc < v_idx)
+			voffset += modes[i]->vdisplay;
+	}
+	offsets[idx].x = hoffset;
+	offsets[idx].y = voffset;
+	DRM_DEBUG_KMS("returned %d %d for %d %d\n", hoffset, voffset, h_idx, v_idx);
+	return 0;
+}
+
 static bool drm_target_preferred(struct drm_fb_helper *fb_helper,
 				 struct drm_display_mode **modes,
+				 struct drm_fb_offset *offsets,
 				 bool *enabled, int width, int height)
 {
 	struct drm_fb_helper_connector *fb_helper_conn;
 	int i;
-
+	uint64_t conn_configured = 0, mask;
+	int tile_pass = 0;
+	mask = (1 << fb_helper->connector_count) - 1;
+retry:
 	for (i = 0; i < fb_helper->connector_count; i++) {
 		fb_helper_conn = fb_helper->connector_info[i];
 
-		if (enabled[i] == false)
+		if (conn_configured & (1 << i))
 			continue;
 
+		if (enabled[i] == false) {
+			conn_configured |= (1 << i);
+			continue;
+		}
+
+		/* first pass over all the untiled connectors */
+		if (tile_pass == 0 && fb_helper_conn->connector->has_tile)
+			continue;
+
+		if (tile_pass == 1) {
+			if (fb_helper_conn->connector->tile_h_loc != 0 ||
+			    fb_helper_conn->connector->tile_v_loc != 0)
+				continue;
+
+		} else {
+			if (fb_helper_conn->connector->tile_h_loc != tile_pass -1 &&
+			    fb_helper_conn->connector->tile_v_loc != tile_pass - 1)
+			/* if this tile_pass doesn't cover any of the tiles - keep going */
+				continue;
+
+			/* find the tile offsets for this pass - need
+			   to find all tiles left and above */
+			drm_get_tile_offsets(fb_helper, modes, offsets,
+					     i, fb_helper_conn->connector->tile_h_loc, fb_helper_conn->connector->tile_v_loc);
+		}
 		DRM_DEBUG_KMS("looking for cmdline mode on connector %d\n",
 			      fb_helper_conn->connector->base.id);
 
 		/* got for command line mode first */
 		modes[i] = drm_pick_cmdline_mode(fb_helper_conn, width, height);
 		if (!modes[i]) {
-			DRM_DEBUG_KMS("looking for preferred mode on connector %d\n",
-				      fb_helper_conn->connector->base.id);
+			DRM_DEBUG_KMS("looking for preferred mode on connector %d %d\n",
+				      fb_helper_conn->connector->base.id, fb_helper_conn->connector->tile_group ? fb_helper_conn->connector->tile_group->id : 0);
 			modes[i] = drm_has_preferred_mode(fb_helper_conn, width, height);
 		}
 		/* No preferred modes, pick one off the list */
@@ -1408,6 +1477,12 @@ static bool drm_target_preferred(struct drm_fb_helper *fb_helper,
 		}
 		DRM_DEBUG_KMS("found mode %s\n", modes[i] ? modes[i]->name :
 			  "none");
+		conn_configured |= (1 << i);
+	}
+
+	if ((conn_configured & mask) != mask) {
+		tile_pass++;
+		goto retry;
 	}
 	return true;
 }
@@ -1497,6 +1572,7 @@ static void drm_setup_crtcs(struct drm_fb_helper *fb_helper)
 	struct drm_device *dev = fb_helper->dev;
 	struct drm_fb_helper_crtc **crtcs;
 	struct drm_display_mode **modes;
+	struct drm_fb_offset *offsets;
 	struct drm_mode_set *modeset;
 	bool *enabled;
 	int width, height;
@@ -1511,9 +1587,11 @@ static void drm_setup_crtcs(struct drm_fb_helper *fb_helper)
 			sizeof(struct drm_fb_helper_crtc *), GFP_KERNEL);
 	modes = kcalloc(dev->mode_config.num_connector,
 			sizeof(struct drm_display_mode *), GFP_KERNEL);
+	offsets = kcalloc(dev->mode_config.num_connector,
+			  sizeof(struct drm_fb_offset), GFP_KERNEL);
 	enabled = kcalloc(dev->mode_config.num_connector,
 			  sizeof(bool), GFP_KERNEL);
-	if (!crtcs || !modes || !enabled) {
+	if (!crtcs || !modes || !enabled || !offsets) {
 		DRM_ERROR("Memory allocation failed\n");
 		goto out;
 	}
@@ -1523,14 +1601,16 @@ static void drm_setup_crtcs(struct drm_fb_helper *fb_helper)
 
 	if (!(fb_helper->funcs->initial_config &&
 	      fb_helper->funcs->initial_config(fb_helper, crtcs, modes,
+					       offsets,
 					       enabled, width, height))) {
 		memset(modes, 0, dev->mode_config.num_connector*sizeof(modes[0]));
 		memset(crtcs, 0, dev->mode_config.num_connector*sizeof(crtcs[0]));
+		memset(offsets, 0, dev->mode_config.num_connector*sizeof(offsets[0]));
 
-		if (!drm_target_cloned(fb_helper,
-				       modes, enabled, width, height) &&
-		    !drm_target_preferred(fb_helper,
-					  modes, enabled, width, height))
+		if (!drm_target_cloned(fb_helper, modes, offsets,
+				       enabled, width, height) &&
+		    !drm_target_preferred(fb_helper, modes, offsets,
+					  enabled, width, height))
 			DRM_ERROR("Unable to find initial modes\n");
 
 		DRM_DEBUG_KMS("picking CRTCs for %dx%d config\n",
@@ -1550,18 +1630,23 @@ static void drm_setup_crtcs(struct drm_fb_helper *fb_helper)
 	for (i = 0; i < fb_helper->connector_count; i++) {
 		struct drm_display_mode *mode = modes[i];
 		struct drm_fb_helper_crtc *fb_crtc = crtcs[i];
+		struct drm_fb_offset *offset = &offsets[i];
 		modeset = &fb_crtc->mode_set;
 
 		if (mode && fb_crtc) {
-			DRM_DEBUG_KMS("desired mode %s set on crtc %d\n",
-				      mode->name, fb_crtc->mode_set.crtc->base.id);
+			DRM_DEBUG_KMS("desired mode %s set on crtc %d (%d,%d)\n",
+				      mode->name, fb_crtc->mode_set.crtc->base.id, offset->x, offset->y);
 			fb_crtc->desired_mode = mode;
+			fb_crtc->x = offset->x;
+			fb_crtc->y = offset->y;
 			if (modeset->mode)
 				drm_mode_destroy(dev, modeset->mode);
 			modeset->mode = drm_mode_duplicate(dev,
 							   fb_crtc->desired_mode);
 			modeset->connectors[modeset->num_connectors++] = fb_helper->connector_info[i]->connector;
 			modeset->fb = fb_helper->fb;
+			modeset->x = offset->x;
+			modeset->y = offset->y;
 		}
 	}
 
@@ -1570,7 +1655,6 @@ static void drm_setup_crtcs(struct drm_fb_helper *fb_helper)
 		modeset = &fb_helper->crtc_info[i].mode_set;
 		if (modeset->num_connectors == 0) {
 			BUG_ON(modeset->fb);
-			BUG_ON(modeset->num_connectors);
 			if (modeset->mode)
 				drm_mode_destroy(dev, modeset->mode);
 			modeset->mode = NULL;
@@ -1579,6 +1663,7 @@ static void drm_setup_crtcs(struct drm_fb_helper *fb_helper)
 out:
 	kfree(crtcs);
 	kfree(modes);
+	kfree(offsets);
 	kfree(enabled);
 }
 
diff --git a/drivers/gpu/drm/drm_flip_work.c b/drivers/gpu/drm/drm_flip_work.c
index f9c7fa3d0012..43d9b950ef9f 100644
--- a/drivers/gpu/drm/drm_flip_work.c
+++ b/drivers/gpu/drm/drm_flip_work.c
@@ -25,6 +25,44 @@
 #include "drm_flip_work.h"
 
 /**
+ * drm_flip_work_allocate_task - allocate a flip-work task
+ * @data: data associated to the task
+ * @flags: allocator flags
+ *
+ * Allocate a drm_flip_task object and attach private data to it.
+ */
+struct drm_flip_task *drm_flip_work_allocate_task(void *data, gfp_t flags)
+{
+	struct drm_flip_task *task;
+
+	task = kzalloc(sizeof(*task), flags);
+	if (task)
+		task->data = data;
+
+	return task;
+}
+EXPORT_SYMBOL(drm_flip_work_allocate_task);
+
+/**
+ * drm_flip_work_queue_task - queue a specific task
+ * @work: the flip-work
+ * @task: the task to handle
+ *
+ * Queues a task that will later be run (passed back to the drm_flip_func_t
+ * func) on a work queue after drm_flip_work_commit() is called.
+ */
+void drm_flip_work_queue_task(struct drm_flip_work *work,
+			      struct drm_flip_task *task)
+{
+	unsigned long flags;
+
+	spin_lock_irqsave(&work->lock, flags);
+	list_add_tail(&task->node, &work->queued);
+	spin_unlock_irqrestore(&work->lock, flags);
+}
+EXPORT_SYMBOL(drm_flip_work_queue_task);
+
+/**
  * drm_flip_work_queue - queue work
  * @work: the flip-work
  * @val: the value to queue
@@ -34,10 +72,14 @@
  */
 void drm_flip_work_queue(struct drm_flip_work *work, void *val)
 {
-	if (kfifo_put(&work->fifo, val)) {
-		atomic_inc(&work->pending);
+	struct drm_flip_task *task;
+
+	task = drm_flip_work_allocate_task(val,
+				drm_can_sleep() ? GFP_KERNEL : GFP_ATOMIC);
+	if (task) {
+		drm_flip_work_queue_task(work, task);
 	} else {
-		DRM_ERROR("%s fifo full!\n", work->name);
+		DRM_ERROR("%s could not allocate task!\n", work->name);
 		work->func(work, val);
 	}
 }
@@ -56,9 +98,12 @@ EXPORT_SYMBOL(drm_flip_work_queue);
 void drm_flip_work_commit(struct drm_flip_work *work,
 		struct workqueue_struct *wq)
 {
-	uint32_t pending = atomic_read(&work->pending);
-	atomic_add(pending, &work->count);
-	atomic_sub(pending, &work->pending);
+	unsigned long flags;
+
+	spin_lock_irqsave(&work->lock, flags);
+	list_splice_tail(&work->queued, &work->commited);
+	INIT_LIST_HEAD(&work->queued);
+	spin_unlock_irqrestore(&work->lock, flags);
 	queue_work(wq, &work->worker);
 }
 EXPORT_SYMBOL(drm_flip_work_commit);
@@ -66,47 +111,46 @@ EXPORT_SYMBOL(drm_flip_work_commit);
 static void flip_worker(struct work_struct *w)
 {
 	struct drm_flip_work *work = container_of(w, struct drm_flip_work, worker);
-	uint32_t count = atomic_read(&work->count);
-	void *val = NULL;
+	struct list_head tasks;
+	unsigned long flags;
+
+	while (1) {
+		struct drm_flip_task *task, *tmp;
+
+		INIT_LIST_HEAD(&tasks);
+		spin_lock_irqsave(&work->lock, flags);
+		list_splice_tail(&work->commited, &tasks);
+		INIT_LIST_HEAD(&work->commited);
+		spin_unlock_irqrestore(&work->lock, flags);
 
-	atomic_sub(count, &work->count);
+		if (list_empty(&tasks))
+			break;
 
-	while(count--)
-		if (!WARN_ON(!kfifo_get(&work->fifo, &val)))
-			work->func(work, val);
+		list_for_each_entry_safe(task, tmp, &tasks, node) {
+			work->func(work, task->data);
+			kfree(task);
+		}
+	}
 }
 
 /**
  * drm_flip_work_init - initialize flip-work
  * @work: the flip-work to initialize
- * @size: the max queue depth
  * @name: debug name
  * @func: the callback work function
  *
  * Initializes/allocates resources for the flip-work
- *
- * RETURNS:
- * Zero on success, error code on failure.
  */
-int drm_flip_work_init(struct drm_flip_work *work, int size,
+void drm_flip_work_init(struct drm_flip_work *work,
 		const char *name, drm_flip_func_t func)
 {
-	int ret;
-
 	work->name = name;
-	atomic_set(&work->count, 0);
-	atomic_set(&work->pending, 0);
+	INIT_LIST_HEAD(&work->queued);
+	INIT_LIST_HEAD(&work->commited);
+	spin_lock_init(&work->lock);
 	work->func = func;
 
-	ret = kfifo_alloc(&work->fifo, size, GFP_KERNEL);
-	if (ret) {
-		DRM_ERROR("could not allocate %s fifo\n", name);
-		return ret;
-	}
-
 	INIT_WORK(&work->worker, flip_worker);
-
-	return 0;
 }
 EXPORT_SYMBOL(drm_flip_work_init);
 
@@ -118,7 +162,6 @@ EXPORT_SYMBOL(drm_flip_work_init);
  */
 void drm_flip_work_cleanup(struct drm_flip_work *work)
 {
-	WARN_ON(!kfifo_is_empty(&work->fifo));
-	kfifo_free(&work->fifo);
+	WARN_ON(!list_empty(&work->queued) || !list_empty(&work->commited));
 }
 EXPORT_SYMBOL(drm_flip_work_cleanup);
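Review note on the drm_flip_work.c change above: the fixed-size kfifo and the two atomic counters are replaced by two lists, one for queued tasks and one for committed tasks, and the worker drains the committed list until it stays empty. The following is a minimal userspace sketch of that hand-off, not the kernel code. All names in it (struct task, struct task_list, flip_work_run(), mark_done()) are hypothetical stand-ins, a singly linked list replaces struct list_head, and the spinlock that the patch takes around every list operation is left out.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical userspace stand-ins: the patch itself uses struct list_head
 * and takes work->lock around every list operation; locking is omitted. */
struct task {
	struct task *next;
	void *data;
};

struct task_list {
	struct task *head;
	struct task **tail;	/* points at the last next pointer */
};

struct flip_work {
	struct task_list queued;	/* queued but not yet committed */
	struct task_list committed;	/* handed over to the worker */
	void (*func)(struct flip_work *work, void *data);
};

static void task_list_init(struct task_list *l)
{
	l->head = NULL;
	l->tail = &l->head;
}

/* Move every task from @from to the end of @to and leave @from empty. */
static void task_list_splice_tail(struct task_list *from, struct task_list *to)
{
	if (!from->head)
		return;
	*to->tail = from->head;
	to->tail = from->tail;
	task_list_init(from);
}

void flip_work_init(struct flip_work *work,
		    void (*func)(struct flip_work *work, void *data))
{
	assert(func);
	task_list_init(&work->queued);
	task_list_init(&work->committed);
	work->func = func;
}

/* Returns 0 on success, -1 if the task could not be allocated. */
int flip_work_queue(struct flip_work *work, void *data)
{
	struct task *t = calloc(1, sizeof(*t));

	if (!t)
		return -1;
	t->data = data;
	*work->queued.tail = t;
	work->queued.tail = &t->next;
	return 0;
}

void flip_work_commit(struct flip_work *work)
{
	task_list_splice_tail(&work->queued, &work->committed);
}

/* Mirrors flip_worker(): drain the committed list until it stays empty. */
void flip_work_run(struct flip_work *work)
{
	for (;;) {
		struct task_list tasks;
		struct task *t, *next;

		task_list_init(&tasks);
		task_list_splice_tail(&work->committed, &tasks);
		if (!tasks.head)
			break;

		for (t = tasks.head; t; t = next) {
			next = t->next;
			work->func(work, t->data);
			free(t);
		}
	}
}

/* Example callback: flags the int that @data points to. */
void mark_done(struct flip_work *work, void *data)
{
	(void)work;
	*(int *)data = 1;
}
```

A task queued after a commit stays on the queued list and is only seen by the worker after the next commit, which matches the semantics the patch keeps from the kfifo version. One difference worth noting: on allocation failure the sketch returns an error, whereas drm_flip_work_queue() in the patch falls back to calling the callback immediately.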
diff --git a/drivers/gpu/drm/drm_fops.c b/drivers/gpu/drm/drm_fops.c
index ed7bc68f7e87..0b9514b6cd64 100644
--- a/drivers/gpu/drm/drm_fops.c
+++ b/drivers/gpu/drm/drm_fops.c
@@ -515,16 +515,19 @@ ssize_t drm_read(struct file *filp, char __user *buffer,
 	size_t total;
 	ssize_t ret;
 
-	ret = wait_event_interruptible(file_priv->event_wait,
-				       !list_empty(&file_priv->event_list));
-	if (ret < 0)
-		return ret;
+	if ((filp->f_flags & O_NONBLOCK) == 0) {
+		ret = wait_event_interruptible(file_priv->event_wait,
+					       !list_empty(&file_priv->event_list));
+		if (ret < 0)
+			return ret;
+	}
 
 	total = 0;
 	while (drm_dequeue_event(file_priv, total, count, &e)) {
 		if (copy_to_user(buffer + total,
 				 e->event, e->event->length)) {
 			total = -EFAULT;
+			e->destroy(e);
 			break;
 		}
 
@@ -532,7 +535,7 @@ ssize_t drm_read(struct file *filp, char __user *buffer,
 		e->destroy(e);
 	}
 
-	return total;
+	return total ?: -EAGAIN;
 }
 EXPORT_SYMBOL(drm_read);
 
diff --git a/drivers/gpu/drm/drm_gem.c b/drivers/gpu/drm/drm_gem.c
index f6ca51259fa3..16a164770713 100644
--- a/drivers/gpu/drm/drm_gem.c
+++ b/drivers/gpu/drm/drm_gem.c
@@ -188,7 +188,7 @@ drm_gem_remove_prime_handles(struct drm_gem_object *obj, struct drm_file *filp)
 }
 
 /**
- * drm_gem_object_free - release resources bound to userspace handles
+ * drm_gem_object_handle_free - release resources bound to userspace handles
  * @obj: GEM object to clean up.
  *
  * Called after the last handle to the object has been closed
@@ -309,7 +309,7 @@ EXPORT_SYMBOL(drm_gem_dumb_destroy);
  * drm_gem_handle_create_tail - internal functions to create a handle
  * @file_priv: drm file-private structure to register the handle for
  * @obj: object to register
- * @handlep: pionter to return the created handle to the caller
+ * @handlep: pointer to return the created handle to the caller
  * 
  * This expects the dev->object_name_lock to be held already and will drop it
  * before returning. Used to avoid races in establishing new handles when
@@ -362,7 +362,7 @@ drm_gem_handle_create_tail(struct drm_file *file_priv,
 }
 
 /**
- * gem_handle_create - create a gem handle for an object
+ * drm_gem_handle_create - create a gem handle for an object
  * @file_priv: drm file-private structure to register the handle for
  * @obj: object to register
  * @handlep: pionter to return the created handle to the caller
@@ -371,10 +371,9 @@ drm_gem_handle_create_tail(struct drm_file *file_priv,
  * to the object, which includes a regular reference count. Callers
  * will likely want to dereference the object afterwards.
  */
-int
-drm_gem_handle_create(struct drm_file *file_priv,
-		       struct drm_gem_object *obj,
-		       u32 *handlep)
+int drm_gem_handle_create(struct drm_file *file_priv,
+			  struct drm_gem_object *obj,
+			  u32 *handlep)
 {
 	mutex_lock(&obj->dev->object_name_lock);
 
diff --git a/drivers/gpu/drm/drm_gem_cma_helper.c b/drivers/gpu/drm/drm_gem_cma_helper.c
index 0316310e2cc4..e419eedf751d 100644
--- a/drivers/gpu/drm/drm_gem_cma_helper.c
+++ b/drivers/gpu/drm/drm_gem_cma_helper.c
@@ -29,18 +29,31 @@
 #include <drm/drm_gem_cma_helper.h>
 #include <drm/drm_vma_manager.h>
 
-/*
+/**
+ * DOC: cma helpers
+ *
+ * The Contiguous Memory Allocator reserves a pool of memory at early boot
+ * that is used to service requests for large blocks of contiguous memory.
+ *
+ * The DRM GEM/CMA helpers use this allocator as a means to provide buffer
+ * objects that are physically contiguous in memory. This is useful for
+ * display drivers that are unable to map scattered buffers via an IOMMU.
+ */
+
+/**
  * __drm_gem_cma_create - Create a GEM CMA object without allocating memory
- * @drm: The drm device
- * @size: The GEM object size
+ * @drm: DRM device
+ * @size: size of the object to allocate
  *
- * This function creates and initializes a GEM CMA object of the given size, but
- * doesn't allocate any memory to back the object.
+ * This function creates and initializes a GEM CMA object of the given size,
+ * but doesn't allocate any memory to back the object.
  *
- * Return a struct drm_gem_cma_object* on success or ERR_PTR values on failure.
+ * Returns:
+ * A struct drm_gem_cma_object * on success or an ERR_PTR()-encoded negative
+ * error code on failure.
  */
 static struct drm_gem_cma_object *
-__drm_gem_cma_create(struct drm_device *drm, unsigned int size)
+__drm_gem_cma_create(struct drm_device *drm, size_t size)
 {
 	struct drm_gem_cma_object *cma_obj;
 	struct drm_gem_object *gem_obj;
@@ -69,14 +82,21 @@ error:
 	return ERR_PTR(ret);
 }
 
-/*
+/**
  * drm_gem_cma_create - allocate an object with the given size
+ * @drm: DRM device
+ * @size: size of the object to allocate
+ *
+ * This function creates a CMA GEM object and allocates a contiguous chunk of
+ * memory as backing store. The backing memory has the writecombine attribute
+ * set.
  *
- * returns a struct drm_gem_cma_object* on success or ERR_PTR values
- * on failure.
+ * Returns:
+ * A struct drm_gem_cma_object * on success or an ERR_PTR()-encoded negative
+ * error code on failure.
  */
 struct drm_gem_cma_object *drm_gem_cma_create(struct drm_device *drm,
-		unsigned int size)
+					      size_t size)
 {
 	struct drm_gem_cma_object *cma_obj;
 	int ret;
@@ -104,17 +124,26 @@ error:
 }
 EXPORT_SYMBOL_GPL(drm_gem_cma_create);
 
-/*
- * drm_gem_cma_create_with_handle - allocate an object with the given
- * size and create a gem handle on it
+/**
+ * drm_gem_cma_create_with_handle - allocate an object with the given size and
+ *     return a GEM handle to it
+ * @file_priv: DRM file-private structure to register the handle for
+ * @drm: DRM device
+ * @size: size of the object to allocate
+ * @handle: return location for the GEM handle
+ *
+ * This function creates a CMA GEM object, allocating a physically contiguous
+ * chunk of memory as backing store. The GEM object is then added to the list
+ * of objects associated with the given file and a handle to it is returned.
  *
- * returns a struct drm_gem_cma_object* on success or ERR_PTR values
- * on failure.
+ * Returns:
+ * A struct drm_gem_cma_object * on success or an ERR_PTR()-encoded negative
+ * error code on failure.
  */
-static struct drm_gem_cma_object *drm_gem_cma_create_with_handle(
-		struct drm_file *file_priv,
-		struct drm_device *drm, unsigned int size,
-		unsigned int *handle)
+static struct drm_gem_cma_object *
+drm_gem_cma_create_with_handle(struct drm_file *file_priv,
+			       struct drm_device *drm, size_t size,
+			       uint32_t *handle)
 {
 	struct drm_gem_cma_object *cma_obj;
 	struct drm_gem_object *gem_obj;
@@ -145,16 +174,19 @@ err_handle_create:
 	return ERR_PTR(ret);
 }
 
-/*
- * drm_gem_cma_free_object - (struct drm_driver)->gem_free_object callback
- * function
+/**
+ * drm_gem_cma_free_object - free resources associated with a CMA GEM object
+ * @gem_obj: GEM object to free
+ *
+ * This function frees the backing memory of the CMA GEM object, cleans up the
+ * GEM object state and frees the memory used to store the object itself.
+ * Drivers using the CMA helpers should set this as their DRM driver's
+ * ->gem_free_object() callback.
  */
 void drm_gem_cma_free_object(struct drm_gem_object *gem_obj)
 {
 	struct drm_gem_cma_object *cma_obj;
 
-	drm_gem_free_mmap_offset(gem_obj);
-
 	cma_obj = to_drm_gem_cma_obj(gem_obj);
 
 	if (cma_obj->vaddr) {
@@ -170,18 +202,26 @@ void drm_gem_cma_free_object(struct drm_gem_object *gem_obj)
 }
 EXPORT_SYMBOL_GPL(drm_gem_cma_free_object);
 
-/*
- * drm_gem_cma_dumb_create - (struct drm_driver)->dumb_create callback
- * function
+/**
+ * drm_gem_cma_dumb_create_internal - create a dumb buffer object
+ * @file_priv: DRM file-private structure to create the dumb buffer for
+ * @drm: DRM device
+ * @args: IOCTL data
+ *
+ * This aligns the pitch and size arguments to the minimum required. This is
+ * an internal helper that can be wrapped by a driver to account for hardware
+ * with more specific alignment requirements. It should not be used directly
+ * as the ->dumb_create() callback in a DRM driver.
  *
- * This aligns the pitch and size arguments to the minimum required. wrap
- * this into your own function if you need bigger alignment.
+ * Returns:
+ * 0 on success or a negative error code on failure.
  */
-int drm_gem_cma_dumb_create(struct drm_file *file_priv,
-		struct drm_device *dev, struct drm_mode_create_dumb *args)
+int drm_gem_cma_dumb_create_internal(struct drm_file *file_priv,
+				     struct drm_device *drm,
+				     struct drm_mode_create_dumb *args)
 {
+	unsigned int min_pitch = DIV_ROUND_UP(args->width * args->bpp, 8);
 	struct drm_gem_cma_object *cma_obj;
-	int min_pitch = DIV_ROUND_UP(args->width * args->bpp, 8);
 
 	if (args->pitch < min_pitch)
 		args->pitch = min_pitch;
@@ -189,18 +229,63 @@ int drm_gem_cma_dumb_create(struct drm_file *file_priv,
 	if (args->size < args->pitch * args->height)
 		args->size = args->pitch * args->height;
 
-	cma_obj = drm_gem_cma_create_with_handle(file_priv, dev,
-			args->size, &args->handle);
+	cma_obj = drm_gem_cma_create_with_handle(file_priv, drm, args->size,
+						 &args->handle);
+	return PTR_ERR_OR_ZERO(cma_obj);
+}
+EXPORT_SYMBOL_GPL(drm_gem_cma_dumb_create_internal);
+
+/**
+ * drm_gem_cma_dumb_create - create a dumb buffer object
+ * @file_priv: DRM file-private structure to create the dumb buffer for
+ * @drm: DRM device
+ * @args: IOCTL data
+ *
+ * This function computes the pitch of the dumb buffer and rounds it up to a
+ * whole number of bytes. Drivers for hardware that doesn't have
+ * any additional restrictions on the pitch can directly use this function as
+ * their ->dumb_create() callback.
+ *
+ * For hardware with additional restrictions, drivers can adjust the fields
+ * set up by userspace and pass the IOCTL data along to the
+ * drm_gem_cma_dumb_create_internal() function.
+ *
+ * Returns:
+ * 0 on success or a negative error code on failure.
+ */
+int drm_gem_cma_dumb_create(struct drm_file *file_priv,
+			    struct drm_device *drm,
+			    struct drm_mode_create_dumb *args)
+{
+	struct drm_gem_cma_object *cma_obj;
+
+	args->pitch = DIV_ROUND_UP(args->width * args->bpp, 8);
+	args->size = args->pitch * args->height;
+
+	cma_obj = drm_gem_cma_create_with_handle(file_priv, drm, args->size,
+						 &args->handle);
 	return PTR_ERR_OR_ZERO(cma_obj);
 }
 EXPORT_SYMBOL_GPL(drm_gem_cma_dumb_create);
 
-/*
- * drm_gem_cma_dumb_map_offset - (struct drm_driver)->dumb_map_offset callback
- * function
+/**
+ * drm_gem_cma_dumb_map_offset - return the fake mmap offset for a CMA GEM
+ *     object
+ * @file_priv: DRM file-private structure containing the GEM object
+ * @drm: DRM device
+ * @handle: GEM object handle
+ * @offset: return location for the fake mmap offset
+ *
+ * This function looks up an object by its handle and returns the fake mmap
+ * offset associated with it. Drivers using the CMA helpers should set this
+ * as their DRM driver's ->dumb_map_offset() callback.
+ *
+ * Returns:
+ * 0 on success or a negative error code on failure.
  */
 int drm_gem_cma_dumb_map_offset(struct drm_file *file_priv,
-		struct drm_device *drm, uint32_t handle, uint64_t *offset)
+				struct drm_device *drm, u32 handle,
+				u64 *offset)
 {
 	struct drm_gem_object *gem_obj;
 
@@ -208,7 +293,7 @@ int drm_gem_cma_dumb_map_offset(struct drm_file *file_priv,
 
 	gem_obj = drm_gem_object_lookup(drm, file_priv, handle);
 	if (!gem_obj) {
-		dev_err(drm->dev, "failed to lookup gem object\n");
+		dev_err(drm->dev, "failed to lookup GEM object\n");
 		mutex_unlock(&drm->struct_mutex);
 		return -EINVAL;
 	}
@@ -251,8 +336,20 @@ static int drm_gem_cma_mmap_obj(struct drm_gem_cma_object *cma_obj,
 	return ret;
 }
 
-/*
- * drm_gem_cma_mmap - (struct file_operation)->mmap callback function
+/**
+ * drm_gem_cma_mmap - memory-map a CMA GEM object
+ * @filp: file object
+ * @vma: VMA for the area to be mapped
+ *
+ * This function implements an augmented version of the GEM DRM file mmap
+ * operation for CMA objects: In addition to the usual GEM VMA setup it
+ * immediately faults in the entire object instead of using on-demand
+ * faulting. Drivers which employ the CMA helpers should use this function
+ * as their ->mmap() handler in the DRM device file's file_operations
+ * structure.
+ *
+ * Returns:
+ * 0 on success or a negative error code on failure.
  */
 int drm_gem_cma_mmap(struct file *filp, struct vm_area_struct *vma)
 {
@@ -272,7 +369,16 @@ int drm_gem_cma_mmap(struct file *filp, struct vm_area_struct *vma)
 EXPORT_SYMBOL_GPL(drm_gem_cma_mmap);
 
 #ifdef CONFIG_DEBUG_FS
-void drm_gem_cma_describe(struct drm_gem_cma_object *cma_obj, struct seq_file *m)
+/**
+ * drm_gem_cma_describe - describe a CMA GEM object for debugfs
+ * @cma_obj: CMA GEM object
+ * @m: debugfs file handle
+ *
+ * This function can be used to dump a human-readable representation of the
+ * CMA GEM object into a synthetic file.
+ */
+void drm_gem_cma_describe(struct drm_gem_cma_object *cma_obj,
+			  struct seq_file *m)
 {
 	struct drm_gem_object *obj = &cma_obj->base;
 	struct drm_device *dev = obj->dev;
@@ -291,7 +397,18 @@ void drm_gem_cma_describe(struct drm_gem_cma_object *cma_obj, struct seq_file *m
 EXPORT_SYMBOL_GPL(drm_gem_cma_describe);
 #endif
 
-/* low-level interface prime helpers */
+/**
+ * drm_gem_cma_prime_get_sg_table - provide a scatter/gather table of pinned
+ *     pages for a CMA GEM object
+ * @obj: GEM object
+ *
+ * This function exports a scatter/gather table suitable for PRIME usage by
+ * calling the standard DMA mapping API. Drivers using the CMA helpers should
+ * set this as their DRM driver's ->gem_prime_get_sg_table() callback.
+ *
+ * Returns:
+ * A pointer to the scatter/gather table of pinned pages or NULL on failure.
+ */
 struct sg_table *drm_gem_cma_prime_get_sg_table(struct drm_gem_object *obj)
 {
 	struct drm_gem_cma_object *cma_obj = to_drm_gem_cma_obj(obj);
@@ -315,6 +432,23 @@ out:
 }
 EXPORT_SYMBOL_GPL(drm_gem_cma_prime_get_sg_table);
 
+/**
+ * drm_gem_cma_prime_import_sg_table - produce a CMA GEM object from another
+ *     driver's scatter/gather table of pinned pages
+ * @dev: device to import into
+ * @attach: DMA-BUF attachment
+ * @sgt: scatter/gather table of pinned pages
+ *
+ * This function imports a scatter/gather table exported via DMA-BUF by
+ * another driver. Imported buffers must be physically contiguous in memory
+ * (i.e. the scatter/gather table must contain a single entry). Drivers that
+ * use the CMA helpers should set this as their DRM driver's
+ * ->gem_prime_import_sg_table() callback.
+ *
+ * Returns:
+ * A pointer to a newly created GEM object or an ERR_PTR-encoded negative
+ * error code on failure.
+ */
 struct drm_gem_object *
 drm_gem_cma_prime_import_sg_table(struct drm_device *dev,
 				  struct dma_buf_attachment *attach,
@@ -339,6 +473,18 @@ drm_gem_cma_prime_import_sg_table(struct drm_device *dev,
 }
 EXPORT_SYMBOL_GPL(drm_gem_cma_prime_import_sg_table);
 
+/**
+ * drm_gem_cma_prime_mmap - memory-map an exported CMA GEM object
+ * @obj: GEM object
+ * @vma: VMA for the area to be mapped
+ *
+ * This function maps a buffer imported via DRM PRIME into a userspace
+ * process's address space. Drivers that use the CMA helpers should set this
+ * as their DRM driver's ->gem_prime_mmap() callback.
+ *
+ * Returns:
+ * 0 on success or a negative error code on failure.
+ */
 int drm_gem_cma_prime_mmap(struct drm_gem_object *obj,
 			   struct vm_area_struct *vma)
 {
@@ -357,6 +503,20 @@ int drm_gem_cma_prime_mmap(struct drm_gem_object *obj,
 }
 EXPORT_SYMBOL_GPL(drm_gem_cma_prime_mmap);
 
+/**
+ * drm_gem_cma_prime_vmap - map a CMA GEM object into the kernel's virtual
+ *     address space
+ * @obj: GEM object
+ *
+ * This function maps a buffer exported via DRM PRIME into the kernel's
+ * virtual address space. Since the CMA buffers are already mapped into the
+ * kernel virtual address space this simply returns the cached virtual
+ * address. Drivers using the CMA helpers should set this as their DRM
+ * driver's ->gem_prime_vmap() callback.
+ *
+ * Returns:
+ * The kernel virtual address of the CMA GEM object's backing store.
+ */
 void *drm_gem_cma_prime_vmap(struct drm_gem_object *obj)
 {
 	struct drm_gem_cma_object *cma_obj = to_drm_gem_cma_obj(obj);
@@ -365,6 +525,17 @@ void *drm_gem_cma_prime_vmap(struct drm_gem_object *obj)
 }
 EXPORT_SYMBOL_GPL(drm_gem_cma_prime_vmap);
 
+/**
+ * drm_gem_cma_prime_vunmap - unmap a CMA GEM object from the kernel's virtual
+ *     address space
+ * @obj: GEM object
+ * @vaddr: kernel virtual address where the CMA GEM object was mapped
+ *
+ * This function removes a buffer exported via DRM PRIME from the kernel's
+ * virtual address space. This is a no-op because CMA buffers cannot be
+ * unmapped from kernel space. Drivers using the CMA helpers should set this
+ * as their DRM driver's ->gem_prime_vunmap() callback.
+ */
 void drm_gem_cma_prime_vunmap(struct drm_gem_object *obj, void *vaddr)
 {
 	/* Nothing to do */
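Review note on the dumb-buffer helpers in drm_gem_cma_helper.c above: drm_gem_cma_dumb_create() now overwrites pitch and size with the minimum values, while drm_gem_cma_dumb_create_internal() only raises values that are too small, so a driver wrapper can pre-set a larger pitch. The sketch below illustrates that arithmetic in userspace C under stated assumptions. struct dumb_args is a hypothetical subset of struct drm_mode_create_dumb, dumb_fixup_align64() is an invented example of a driver wrapper with a 64-byte pitch requirement, and the 64-bit cast on the size computation is an addition of this sketch, not something the patch does.

```c
#include <stdint.h>

#define DIV_ROUND_UP(n, d)	(((n) + (d) - 1) / (d))
#define ALIGN_UP(x, a)		(DIV_ROUND_UP((x), (a)) * (a))

/* Hypothetical subset of struct drm_mode_create_dumb. */
struct dumb_args {
	uint32_t width;		/* pixels */
	uint32_t height;	/* lines */
	uint32_t bpp;		/* bits per pixel */
	uint32_t pitch;		/* bytes per line */
	uint64_t size;		/* bytes */
};

/* Like the _internal helper: only raise values that are too small. */
void dumb_fixup_internal(struct dumb_args *args)
{
	uint32_t min_pitch = DIV_ROUND_UP(args->width * args->bpp, 8);
	uint64_t min_size;

	if (args->pitch < min_pitch)
		args->pitch = min_pitch;

	min_size = (uint64_t)args->pitch * args->height;
	if (args->size < min_size)
		args->size = min_size;
}

/* Like the plain helper: overwrite whatever userspace passed in. */
void dumb_fixup(struct dumb_args *args)
{
	args->pitch = DIV_ROUND_UP(args->width * args->bpp, 8);
	args->size = (uint64_t)args->pitch * args->height;
}

/* Hypothetical driver wrapper for hardware needing a 64-byte aligned pitch. */
void dumb_fixup_align64(struct dumb_args *args)
{
	args->pitch = ALIGN_UP(DIV_ROUND_UP(args->width * args->bpp, 8), 64);
	dumb_fixup_internal(args);
}
```

For a 1366x768 buffer at 32 bpp the plain variant yields a pitch of 5464 bytes, and the aligned wrapper yields 5504 bytes, with the size following from pitch times height in both cases.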
diff --git a/drivers/gpu/drm/drm_irq.c b/drivers/gpu/drm/drm_irq.c
index 5ef03c216a27..f5a5f18efa5b 100644
--- a/drivers/gpu/drm/drm_irq.c
+++ b/drivers/gpu/drm/drm_irq.c
@@ -166,7 +166,7 @@ static void vblank_disable_and_save(struct drm_device *dev, int crtc)
 	spin_lock_irqsave(&dev->vblank_time_lock, irqflags);
 
 	/*
-	 * If the vblank interrupt was already disbled update the count
+	 * If the vblank interrupt was already disabled update the count
 	 * and timestamp to maintain the appearance that the counter
 	 * has been ticking all along until this time. This makes the
 	 * count account for the entire time between drm_vblank_on() and
@@ -1029,7 +1029,8 @@ void drm_vblank_put(struct drm_device *dev, int crtc)
 {
 	struct drm_vblank_crtc *vblank = &dev->vblank[crtc];
 
-	BUG_ON(atomic_read(&vblank->refcount) == 0);
+	if (WARN_ON(atomic_read(&vblank->refcount) == 0))
+		return;
 
 	if (WARN_ON(crtc >= dev->num_crtcs))
 		return;
@@ -1190,7 +1191,7 @@ EXPORT_SYMBOL(drm_crtc_vblank_off);
  *
  * This functions restores the vblank interrupt state captured with
  * drm_vblank_off() again. Note that calls to drm_vblank_on() and
- * drm_vblank_off() can be unbalanced and so can also be unconditionaly called
+ * drm_vblank_off() can be unbalanced and so can also be unconditionally called
  * in driver load code to reflect the current hardware state of the crtc.
  *
  * This is the legacy version of drm_crtc_vblank_on().
@@ -1237,7 +1238,7 @@ EXPORT_SYMBOL(drm_vblank_on);
  *
  * This functions restores the vblank interrupt state captured with
  * drm_vblank_off() again. Note that calls to drm_vblank_on() and
- * drm_vblank_off() can be unbalanced and so can also be unconditionaly called
+ * drm_vblank_off() can be unbalanced and so can also be unconditionally called
  * in driver load code to reflect the current hardware state of the crtc.
  *
  * This is the native kms version of drm_vblank_on().
diff --git a/drivers/gpu/drm/drm_mipi_dsi.c b/drivers/gpu/drm/drm_mipi_dsi.c
index eb6dfe52cab2..c0644bb865f2 100644
--- a/drivers/gpu/drm/drm_mipi_dsi.c
+++ b/drivers/gpu/drm/drm_mipi_dsi.c
@@ -35,6 +35,16 @@
 
 #include <video/mipi_display.h>
 
+/**
+ * DOC: dsi helpers
+ *
+ * These functions contain some common logic and helpers to deal with MIPI DSI
+ * peripherals.
+ *
+ * Helpers are provided for a number of standard MIPI DSI commands as well as a
+ * subset of the MIPI DCS command set.
+ */
+
 static int mipi_dsi_device_match(struct device *dev, struct device_driver *drv)
 {
 	return of_driver_match_device(dev, drv);
@@ -57,6 +67,29 @@ static struct bus_type mipi_dsi_bus_type = {
 	.pm = &mipi_dsi_device_pm_ops,
 };
 
+static int of_device_match(struct device *dev, void *data)
+{
+	return dev->of_node == data;
+}
+
+/**
+ * of_find_mipi_dsi_device_by_node() - find the MIPI DSI device matching a
+ *    device tree node
+ * @np: device tree node
+ *
+ * Return: A pointer to the MIPI DSI device corresponding to @np or NULL if no
+ *    such device exists (or has not been registered yet).
+ */
+struct mipi_dsi_device *of_find_mipi_dsi_device_by_node(struct device_node *np)
+{
+	struct device *dev;
+
+	dev = bus_find_device(&mipi_dsi_bus_type, NULL, np, of_device_match);
+
+	return dev ? to_mipi_dsi_device(dev) : NULL;
+}
+EXPORT_SYMBOL(of_find_mipi_dsi_device_by_node);
+
 static void mipi_dsi_dev_release(struct device *dev)
 {
 	struct mipi_dsi_device *dsi = to_mipi_dsi_device(dev);
@@ -198,59 +231,351 @@ int mipi_dsi_detach(struct mipi_dsi_device *dsi)
 }
 EXPORT_SYMBOL(mipi_dsi_detach);
 
+static ssize_t mipi_dsi_device_transfer(struct mipi_dsi_device *dsi,
+					struct mipi_dsi_msg *msg)
+{
+	const struct mipi_dsi_host_ops *ops = dsi->host->ops;
+
+	if (!ops || !ops->transfer)
+		return -ENOSYS;
+
+	if (dsi->mode_flags & MIPI_DSI_MODE_LPM)
+		msg->flags |= MIPI_DSI_MSG_USE_LPM;
+
+	return ops->transfer(dsi->host, msg);
+}
+
 /**
- * mipi_dsi_dcs_write - send DCS write command
- * @dsi: DSI device
- * @data: pointer to the command followed by parameters
- * @len: length of @data
+ * mipi_dsi_packet_format_is_short - check if a packet is of the short format
+ * @type: MIPI DSI data type of the packet
+ *
+ * Return: true if the packet for the given data type is a short packet, false
+ * otherwise.
  */
-ssize_t mipi_dsi_dcs_write(struct mipi_dsi_device *dsi, const void *data,
-			    size_t len)
+bool mipi_dsi_packet_format_is_short(u8 type)
+{
+	switch (type) {
+	case MIPI_DSI_V_SYNC_START:
+	case MIPI_DSI_V_SYNC_END:
+	case MIPI_DSI_H_SYNC_START:
+	case MIPI_DSI_H_SYNC_END:
+	case MIPI_DSI_END_OF_TRANSMISSION:
+	case MIPI_DSI_COLOR_MODE_OFF:
+	case MIPI_DSI_COLOR_MODE_ON:
+	case MIPI_DSI_SHUTDOWN_PERIPHERAL:
+	case MIPI_DSI_TURN_ON_PERIPHERAL:
+	case MIPI_DSI_GENERIC_SHORT_WRITE_0_PARAM:
+	case MIPI_DSI_GENERIC_SHORT_WRITE_1_PARAM:
+	case MIPI_DSI_GENERIC_SHORT_WRITE_2_PARAM:
+	case MIPI_DSI_GENERIC_READ_REQUEST_0_PARAM:
+	case MIPI_DSI_GENERIC_READ_REQUEST_1_PARAM:
+	case MIPI_DSI_GENERIC_READ_REQUEST_2_PARAM:
+	case MIPI_DSI_DCS_SHORT_WRITE:
+	case MIPI_DSI_DCS_SHORT_WRITE_PARAM:
+	case MIPI_DSI_DCS_READ:
+	case MIPI_DSI_SET_MAXIMUM_RETURN_PACKET_SIZE:
+		return true;
+	}
+
+	return false;
+}
+EXPORT_SYMBOL(mipi_dsi_packet_format_is_short);
+
+/**
+ * mipi_dsi_packet_format_is_long - check if a packet is of the long format
+ * @type: MIPI DSI data type of the packet
+ *
+ * Return: true if the packet for the given data type is a long packet, false
+ * otherwise.
+ */
+bool mipi_dsi_packet_format_is_long(u8 type)
+{
+	switch (type) {
+	case MIPI_DSI_NULL_PACKET:
+	case MIPI_DSI_BLANKING_PACKET:
+	case MIPI_DSI_GENERIC_LONG_WRITE:
+	case MIPI_DSI_DCS_LONG_WRITE:
+	case MIPI_DSI_LOOSELY_PACKED_PIXEL_STREAM_YCBCR20:
+	case MIPI_DSI_PACKED_PIXEL_STREAM_YCBCR24:
+	case MIPI_DSI_PACKED_PIXEL_STREAM_YCBCR16:
+	case MIPI_DSI_PACKED_PIXEL_STREAM_30:
+	case MIPI_DSI_PACKED_PIXEL_STREAM_36:
+	case MIPI_DSI_PACKED_PIXEL_STREAM_YCBCR12:
+	case MIPI_DSI_PACKED_PIXEL_STREAM_16:
+	case MIPI_DSI_PACKED_PIXEL_STREAM_18:
+	case MIPI_DSI_PIXEL_STREAM_3BYTE_18:
+	case MIPI_DSI_PACKED_PIXEL_STREAM_24:
+		return true;
+	}
+
+	return false;
+}
+EXPORT_SYMBOL(mipi_dsi_packet_format_is_long);
+
+/**
+ * mipi_dsi_create_packet - create a packet from a message according to the
+ *     DSI protocol
+ * @packet: pointer to a DSI packet structure
+ * @msg: message to translate into a packet
+ *
+ * Return: 0 on success or a negative error code on failure.
+ */
+int mipi_dsi_create_packet(struct mipi_dsi_packet *packet,
+			   const struct mipi_dsi_msg *msg)
+{
+	const u8 *tx = msg->tx_buf;
+
+	if (!packet || !msg)
+		return -EINVAL;
+
+	/* do some minimum sanity checking */
+	if (!mipi_dsi_packet_format_is_short(msg->type) &&
+	    !mipi_dsi_packet_format_is_long(msg->type))
+		return -EINVAL;
+
+	if (msg->channel > 3)
+		return -EINVAL;
+
+	memset(packet, 0, sizeof(*packet));
+	packet->header[0] = ((msg->channel & 0x3) << 6) | (msg->type & 0x3f);
+
+	/* TODO: compute ECC if hardware support is not available */
+
+	/*
+	 * Long write packets contain the word count in header bytes 1 and 2.
+	 * The payload follows the header and is word count bytes long.
+	 *
+	 * Short write packets encode up to two parameters in header bytes 1
+	 * and 2.
+	 */
+	if (mipi_dsi_packet_format_is_long(msg->type)) {
+		packet->header[1] = (msg->tx_len >> 0) & 0xff;
+		packet->header[2] = (msg->tx_len >> 8) & 0xff;
+
+		packet->payload_length = msg->tx_len;
+		packet->payload = tx;
+	} else {
+		packet->header[1] = (msg->tx_len > 0) ? tx[0] : 0;
+		packet->header[2] = (msg->tx_len > 1) ? tx[1] : 0;
+	}
+
+	packet->size = sizeof(packet->header) + packet->payload_length;
+
+	return 0;
+}
+EXPORT_SYMBOL(mipi_dsi_create_packet);
+
+/**
+ * mipi_dsi_set_maximum_return_packet_size() - specify the maximum size of the
+ *    payload in a long packet transmitted from the peripheral back to the
+ *    host processor
+ * @dsi: DSI peripheral device
+ * @value: the maximum size of the payload
+ *
+ * Return: 0 on success or a negative error code on failure.
+ */
+int mipi_dsi_set_maximum_return_packet_size(struct mipi_dsi_device *dsi,
+					    u16 value)
+{
+	u8 tx[2] = { value & 0xff, value >> 8 };
+	struct mipi_dsi_msg msg = {
+		.channel = dsi->channel,
+		.type = MIPI_DSI_SET_MAXIMUM_RETURN_PACKET_SIZE,
+		.tx_len = sizeof(tx),
+		.tx_buf = tx,
+	};
+
+	return mipi_dsi_device_transfer(dsi, &msg);
+}
+EXPORT_SYMBOL(mipi_dsi_set_maximum_return_packet_size);
+
+/**
+ * mipi_dsi_generic_write() - transmit data using a generic write packet
+ * @dsi: DSI peripheral device
+ * @payload: buffer containing the payload
+ * @size: size of payload buffer
+ *
+ * This function will automatically choose the right data type depending on
+ * the payload length.
+ *
+ * Return: The number of bytes transmitted on success or a negative error code
+ * on failure.
+ */
+ssize_t mipi_dsi_generic_write(struct mipi_dsi_device *dsi, const void *payload,
+			       size_t size)
+{
+	struct mipi_dsi_msg msg = {
+		.channel = dsi->channel,
+		.tx_buf = payload,
+		.tx_len = size
+	};
+
+	switch (size) {
+	case 0:
+		msg.type = MIPI_DSI_GENERIC_SHORT_WRITE_0_PARAM;
+		break;
+
+	case 1:
+		msg.type = MIPI_DSI_GENERIC_SHORT_WRITE_1_PARAM;
+		break;
+
+	case 2:
+		msg.type = MIPI_DSI_GENERIC_SHORT_WRITE_2_PARAM;
+		break;
+
+	default:
+		msg.type = MIPI_DSI_GENERIC_LONG_WRITE;
+		break;
+	}
+
+	return mipi_dsi_device_transfer(dsi, &msg);
+}
+EXPORT_SYMBOL(mipi_dsi_generic_write);
+
+/**
+ * mipi_dsi_generic_read() - receive data using a generic read packet
+ * @dsi: DSI peripheral device
+ * @params: buffer containing the request parameters
+ * @num_params: number of request parameters
+ * @data: buffer in which to return the received data
+ * @size: size of receive buffer
+ *
+ * This function will automatically choose the right data type depending on
+ * the number of parameters passed in.
+ *
+ * Return: The number of bytes successfully read or a negative error code on
+ * failure.
+ */
+ssize_t mipi_dsi_generic_read(struct mipi_dsi_device *dsi, const void *params,
+			      size_t num_params, void *data, size_t size)
+{
+	struct mipi_dsi_msg msg = {
+		.channel = dsi->channel,
+		.tx_len = num_params,
+		.tx_buf = params,
+		.rx_len = size,
+		.rx_buf = data
+	};
+
+	switch (num_params) {
+	case 0:
+		msg.type = MIPI_DSI_GENERIC_READ_REQUEST_0_PARAM;
+		break;
+
+	case 1:
+		msg.type = MIPI_DSI_GENERIC_READ_REQUEST_1_PARAM;
+		break;
+
+	case 2:
+		msg.type = MIPI_DSI_GENERIC_READ_REQUEST_2_PARAM;
+		break;
+
+	default:
+		return -EINVAL;
+	}
+
+	return mipi_dsi_device_transfer(dsi, &msg);
+}
+EXPORT_SYMBOL(mipi_dsi_generic_read);
+
+/**
+ * mipi_dsi_dcs_write_buffer() - transmit a DCS command with payload
+ * @dsi: DSI peripheral device
+ * @data: buffer containing data to be transmitted
+ * @len: size of transmission buffer
+ *
+ * This function will automatically choose the right data type depending on
+ * the command payload length.
+ *
+ * Return: The number of bytes successfully transmitted or a negative error
+ * code on failure.
+ */
+ssize_t mipi_dsi_dcs_write_buffer(struct mipi_dsi_device *dsi,
+				  const void *data, size_t len)
 {
-	const struct mipi_dsi_host_ops *ops = dsi->host->ops;
 	struct mipi_dsi_msg msg = {
 		.channel = dsi->channel,
 		.tx_buf = data,
 		.tx_len = len
 	};
 
-	if (!ops || !ops->transfer)
-		return -ENOSYS;
-
 	switch (len) {
 	case 0:
 		return -EINVAL;
+
 	case 1:
 		msg.type = MIPI_DSI_DCS_SHORT_WRITE;
 		break;
+
 	case 2:
 		msg.type = MIPI_DSI_DCS_SHORT_WRITE_PARAM;
 		break;
+
 	default:
 		msg.type = MIPI_DSI_DCS_LONG_WRITE;
 		break;
 	}
 
-	if (dsi->mode_flags & MIPI_DSI_MODE_LPM)
-		msg.flags = MIPI_DSI_MSG_USE_LPM;
+	return mipi_dsi_device_transfer(dsi, &msg);
+}
+EXPORT_SYMBOL(mipi_dsi_dcs_write_buffer);
 
-	return ops->transfer(dsi->host, &msg);
+/**
+ * mipi_dsi_dcs_write() - send DCS write command
+ * @dsi: DSI peripheral device
+ * @cmd: DCS command
+ * @data: buffer containing the command payload
+ * @len: command payload length
+ *
+ * This function will automatically choose the right data type depending on
+ * the command payload length.
+ *
+ * Return: The number of bytes successfully transmitted or a negative error
+ * code on failure.
+ */
+ssize_t mipi_dsi_dcs_write(struct mipi_dsi_device *dsi, u8 cmd,
+			   const void *data, size_t len)
+{
+	ssize_t err;
+	size_t size;
+	u8 *tx;
+
+	if (len > 0) {
+		size = 1 + len;
+
+		tx = kmalloc(size, GFP_KERNEL);
+		if (!tx)
+			return -ENOMEM;
+
+		/* concatenate the DCS command byte and the payload */
+		tx[0] = cmd;
+		memcpy(&tx[1], data, len);
+	} else {
+		tx = &cmd;
+		size = 1;
+	}
+
+	err = mipi_dsi_dcs_write_buffer(dsi, tx, size);
+
+	if (len > 0)
+		kfree(tx);
+
+	return err;
 }
 EXPORT_SYMBOL(mipi_dsi_dcs_write);
 
 /**
- * mipi_dsi_dcs_read - send DCS read request command
- * @dsi: DSI device
- * @cmd: DCS read command
- * @data: pointer to read buffer
- * @len: length of @data
+ * mipi_dsi_dcs_read() - send DCS read request command
+ * @dsi: DSI peripheral device
+ * @cmd: DCS command
+ * @data: buffer in which to receive data
+ * @len: size of receive buffer
  *
- * Function returns number of read bytes or error code.
+ * Return: The number of bytes read or a negative error code on failure.
  */
 ssize_t mipi_dsi_dcs_read(struct mipi_dsi_device *dsi, u8 cmd, void *data,
 			  size_t len)
 {
-	const struct mipi_dsi_host_ops *ops = dsi->host->ops;
 	struct mipi_dsi_msg msg = {
 		.channel = dsi->channel,
 		.type = MIPI_DSI_DCS_READ,
@@ -260,15 +585,282 @@ ssize_t mipi_dsi_dcs_read(struct mipi_dsi_device *dsi, u8 cmd, void *data,
 		.rx_len = len
 	};
 
-	if (!ops || !ops->transfer)
-		return -ENOSYS;
+	return mipi_dsi_device_transfer(dsi, &msg);
+}
+EXPORT_SYMBOL(mipi_dsi_dcs_read);
 
-	if (dsi->mode_flags & MIPI_DSI_MODE_LPM)
-		msg.flags = MIPI_DSI_MSG_USE_LPM;
+/**
+ * mipi_dsi_dcs_nop() - send DCS nop packet
+ * @dsi: DSI peripheral device
+ *
+ * Return: 0 on success or a negative error code on failure.
+ */
+int mipi_dsi_dcs_nop(struct mipi_dsi_device *dsi)
+{
+	ssize_t err;
+
+	err = mipi_dsi_dcs_write(dsi, MIPI_DCS_NOP, NULL, 0);
+	if (err < 0)
+		return err;
 
-	return ops->transfer(dsi->host, &msg);
+	return 0;
 }
-EXPORT_SYMBOL(mipi_dsi_dcs_read);
+EXPORT_SYMBOL(mipi_dsi_dcs_nop);
+
+/**
+ * mipi_dsi_dcs_soft_reset() - perform a software reset of the display module
+ * @dsi: DSI peripheral device
+ *
+ * Return: 0 on success or a negative error code on failure.
+ */
+int mipi_dsi_dcs_soft_reset(struct mipi_dsi_device *dsi)
+{
+	ssize_t err;
+
+	err = mipi_dsi_dcs_write(dsi, MIPI_DCS_SOFT_RESET, NULL, 0);
+	if (err < 0)
+		return err;
+
+	return 0;
+}
+EXPORT_SYMBOL(mipi_dsi_dcs_soft_reset);
+
+/**
+ * mipi_dsi_dcs_get_power_mode() - query the display module's current power
+ *    mode
+ * @dsi: DSI peripheral device
+ * @mode: return location for the current power mode
+ *
+ * Return: 0 on success or a negative error code on failure.
+ */
+int mipi_dsi_dcs_get_power_mode(struct mipi_dsi_device *dsi, u8 *mode)
+{
+	ssize_t err;
+
+	err = mipi_dsi_dcs_read(dsi, MIPI_DCS_GET_POWER_MODE, mode,
+				sizeof(*mode));
+	if (err <= 0) {
+		if (err == 0)
+			err = -ENODATA;
+
+		return err;
+	}
+
+	return 0;
+}
+EXPORT_SYMBOL(mipi_dsi_dcs_get_power_mode);
+
+/**
+ * mipi_dsi_dcs_get_pixel_format() - gets the pixel format for the RGB image
+ *    data used by the interface
+ * @dsi: DSI peripheral device
+ * @format: return location for the pixel format
+ *
+ * Return: 0 on success or a negative error code on failure.
+ */
+int mipi_dsi_dcs_get_pixel_format(struct mipi_dsi_device *dsi, u8 *format)
+{
+	ssize_t err;
+
+	err = mipi_dsi_dcs_read(dsi, MIPI_DCS_GET_PIXEL_FORMAT, format,
+				sizeof(*format));
+	if (err <= 0) {
+		if (err == 0)
+			err = -ENODATA;
+
+		return err;
+	}
+
+	return 0;
+}
+EXPORT_SYMBOL(mipi_dsi_dcs_get_pixel_format);
+
+/**
+ * mipi_dsi_dcs_enter_sleep_mode() - disable all unnecessary blocks inside the
+ *    display module except interface communication
+ * @dsi: DSI peripheral device
+ *
+ * Return: 0 on success or a negative error code on failure.
+ */
+int mipi_dsi_dcs_enter_sleep_mode(struct mipi_dsi_device *dsi)
+{
+	ssize_t err;
+
+	err = mipi_dsi_dcs_write(dsi, MIPI_DCS_ENTER_SLEEP_MODE, NULL, 0);
+	if (err < 0)
+		return err;
+
+	return 0;
+}
+EXPORT_SYMBOL(mipi_dsi_dcs_enter_sleep_mode);
+
+/**
+ * mipi_dsi_dcs_exit_sleep_mode() - enable all blocks inside the display
+ *    module
+ * @dsi: DSI peripheral device
+ *
+ * Return: 0 on success or a negative error code on failure.
+ */
+int mipi_dsi_dcs_exit_sleep_mode(struct mipi_dsi_device *dsi)
+{
+	ssize_t err;
+
+	err = mipi_dsi_dcs_write(dsi, MIPI_DCS_EXIT_SLEEP_MODE, NULL, 0);
+	if (err < 0)
+		return err;
+
+	return 0;
+}
+EXPORT_SYMBOL(mipi_dsi_dcs_exit_sleep_mode);
+
+/**
+ * mipi_dsi_dcs_set_display_off() - stop displaying the image data on the
+ *    display device
+ * @dsi: DSI peripheral device
+ *
+ * Return: 0 on success or a negative error code on failure.
+ */
+int mipi_dsi_dcs_set_display_off(struct mipi_dsi_device *dsi)
+{
+	ssize_t err;
+
+	err = mipi_dsi_dcs_write(dsi, MIPI_DCS_SET_DISPLAY_OFF, NULL, 0);
+	if (err < 0)
+		return err;
+
+	return 0;
+}
+EXPORT_SYMBOL(mipi_dsi_dcs_set_display_off);
+
+/**
+ * mipi_dsi_dcs_set_display_on() - start displaying the image data on the
+ *    display device
+ * @dsi: DSI peripheral device
+ *
+ * Return: 0 on success or a negative error code on failure.
+ */
+int mipi_dsi_dcs_set_display_on(struct mipi_dsi_device *dsi)
+{
+	ssize_t err;
+
+	err = mipi_dsi_dcs_write(dsi, MIPI_DCS_SET_DISPLAY_ON, NULL, 0);
+	if (err < 0)
+		return err;
+
+	return 0;
+}
+EXPORT_SYMBOL(mipi_dsi_dcs_set_display_on);
+
+/**
+ * mipi_dsi_dcs_set_column_address() - define the column extent of the frame
+ *    memory accessed by the host processor
+ * @dsi: DSI peripheral device
+ * @start: first column of frame memory
+ * @end: last column of frame memory
+ *
+ * Return: 0 on success or a negative error code on failure.
+ */
+int mipi_dsi_dcs_set_column_address(struct mipi_dsi_device *dsi, u16 start,
+				    u16 end)
+{
+	u8 payload[4] = { start >> 8, start & 0xff, end >> 8, end & 0xff };
+	ssize_t err;
+
+	err = mipi_dsi_dcs_write(dsi, MIPI_DCS_SET_COLUMN_ADDRESS, payload,
+				 sizeof(payload));
+	if (err < 0)
+		return err;
+
+	return 0;
+}
+EXPORT_SYMBOL(mipi_dsi_dcs_set_column_address);
+
+/**
+ * mipi_dsi_dcs_set_page_address() - define the page extent of the frame
+ *    memory accessed by the host processor
+ * @dsi: DSI peripheral device
+ * @start: first page of frame memory
+ * @end: last page of frame memory
+ *
+ * Return: 0 on success or a negative error code on failure.
+ */
+int mipi_dsi_dcs_set_page_address(struct mipi_dsi_device *dsi, u16 start,
+				  u16 end)
+{
+	u8 payload[4] = { start >> 8, start & 0xff, end >> 8, end & 0xff };
+	ssize_t err;
+
+	err = mipi_dsi_dcs_write(dsi, MIPI_DCS_SET_PAGE_ADDRESS, payload,
+				 sizeof(payload));
+	if (err < 0)
+		return err;
+
+	return 0;
+}
+EXPORT_SYMBOL(mipi_dsi_dcs_set_page_address);
+
+/**
+ * mipi_dsi_dcs_set_tear_off() - turn off the display module's Tearing Effect
+ *    output signal on the TE signal line
+ * @dsi: DSI peripheral device
+ *
+ * Return: 0 on success or a negative error code on failure.
+ */
+int mipi_dsi_dcs_set_tear_off(struct mipi_dsi_device *dsi)
+{
+	ssize_t err;
+
+	err = mipi_dsi_dcs_write(dsi, MIPI_DCS_SET_TEAR_OFF, NULL, 0);
+	if (err < 0)
+		return err;
+
+	return 0;
+}
+EXPORT_SYMBOL(mipi_dsi_dcs_set_tear_off);
+
+/**
+ * mipi_dsi_dcs_set_tear_on() - turn on the display module's Tearing Effect
+ *    output signal on the TE signal line
+ * @dsi: DSI peripheral device
+ * @mode: the Tearing Effect Output Line mode
+ *
+ * Return: 0 on success or a negative error code on failure.
+ */
+int mipi_dsi_dcs_set_tear_on(struct mipi_dsi_device *dsi,
+			     enum mipi_dsi_dcs_tear_mode mode)
+{
+	u8 value = mode;
+	ssize_t err;
+
+	err = mipi_dsi_dcs_write(dsi, MIPI_DCS_SET_TEAR_ON, &value,
+				 sizeof(value));
+	if (err < 0)
+		return err;
+
+	return 0;
+}
+EXPORT_SYMBOL(mipi_dsi_dcs_set_tear_on);
+
+/**
+ * mipi_dsi_dcs_set_pixel_format() - sets the pixel format for the RGB image
+ *    data used by the interface
+ * @dsi: DSI peripheral device
+ * @format: pixel format
+ *
+ * Return: 0 on success or a negative error code on failure.
+ */
+int mipi_dsi_dcs_set_pixel_format(struct mipi_dsi_device *dsi, u8 format)
+{
+	ssize_t err;
+
+	err = mipi_dsi_dcs_write(dsi, MIPI_DCS_SET_PIXEL_FORMAT, &format,
+				 sizeof(format));
+	if (err < 0)
+		return err;
+
+	return 0;
+}
+EXPORT_SYMBOL(mipi_dsi_dcs_set_pixel_format);
 
 static int mipi_dsi_drv_probe(struct device *dev)
 {
@@ -295,12 +887,18 @@ static void mipi_dsi_drv_shutdown(struct device *dev)
 }
 
 /**
- * mipi_dsi_driver_register - register a driver for DSI devices
+ * mipi_dsi_driver_register_full() - register a driver for DSI devices
  * @drv: DSI driver structure
+ * @owner: owner module
+ *
+ * Return: 0 on success or a negative error code on failure.
  */
-int mipi_dsi_driver_register(struct mipi_dsi_driver *drv)
+int mipi_dsi_driver_register_full(struct mipi_dsi_driver *drv,
+				  struct module *owner)
 {
 	drv->driver.bus = &mipi_dsi_bus_type;
+	drv->driver.owner = owner;
+
 	if (drv->probe)
 		drv->driver.probe = mipi_dsi_drv_probe;
 	if (drv->remove)
@@ -310,11 +908,13 @@ int mipi_dsi_driver_register(struct mipi_dsi_driver *drv)
 
 	return driver_register(&drv->driver);
 }
-EXPORT_SYMBOL(mipi_dsi_driver_register);
+EXPORT_SYMBOL(mipi_dsi_driver_register_full);
 
 /**
- * mipi_dsi_driver_unregister - unregister a driver for DSI devices
+ * mipi_dsi_driver_unregister() - unregister a driver for DSI devices
  * @drv: DSI driver structure
+ *
+ * This function does not return a value.
  */
 void mipi_dsi_driver_unregister(struct mipi_dsi_driver *drv)
 {
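The header layout described in the mipi_dsi_create_packet() comment above can be tried out in userspace. What follows is a minimal sketch, not the kernel implementation: struct demo_packet and demo_create_packet() are hypothetical names, the long/short decision is passed in by the caller instead of being derived from the data type, and the data type values used when calling it (0x39 for a DCS long write, 0x15 for a DCS short write with one parameter) are assumed to match the MIPI DSI data type table.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for struct mipi_dsi_packet; not the kernel type. */
struct demo_packet {
	size_t size;
	uint8_t header[4];
	size_t payload_length;
	const uint8_t *payload;
};

/*
 * Fill in a packet the way the comment in the patch describes.
 * is_long is supplied by the caller in this sketch (an assumption).
 * Returns 0 on success or -1 if the virtual channel is out of range.
 */
int demo_create_packet(struct demo_packet *packet, uint8_t channel,
		       uint8_t type, int is_long,
		       const uint8_t *tx, size_t tx_len)
{
	assert(packet != NULL);

	if (channel > 3)
		return -1;

	memset(packet, 0, sizeof(*packet));

	/* Byte 0: virtual channel in bits 7:6, data type in bits 5:0. */
	packet->header[0] = (uint8_t)(((channel & 0x3) << 6) | (type & 0x3f));

	if (is_long) {
		/* Long packet: bytes 1 and 2 carry the word count, LSB first. */
		packet->header[1] = (tx_len >> 0) & 0xff;
		packet->header[2] = (tx_len >> 8) & 0xff;
		packet->payload_length = tx_len;
		packet->payload = tx;
	} else {
		/* Short packet: bytes 1 and 2 carry up to two parameters. */
		packet->header[1] = (tx_len > 0) ? tx[0] : 0;
		packet->header[2] = (tx_len > 1) ? tx[1] : 0;
	}

	/* Byte 3 (ECC) stays zero, matching the TODO in the patch. */
	packet->size = sizeof(packet->header) + packet->payload_length;

	return 0;
}
```

For a five-byte long write on channel 1, this gives a header starting with 0x79 followed by the word count 5, and a total size of nine bytes. A two-byte short write needs only the four header bytes.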
diff --git a/drivers/gpu/drm/drm_modes.c b/drivers/gpu/drm/drm_modes.c
index d1b7d2006529..6d8b941c8200 100644
--- a/drivers/gpu/drm/drm_modes.c
+++ b/drivers/gpu/drm/drm_modes.c
@@ -914,7 +914,7 @@ EXPORT_SYMBOL(drm_mode_equal_no_clocks_no_stereo);
  *
  * This function is a helper which can be used to validate modes against size
  * limitations of the DRM device/connector. If a mode is too big its status
- * memeber is updated with the appropriate validation failure code. The list
+ * member is updated with the appropriate validation failure code. The list
  * itself is not changed.
  */
 void drm_mode_validate_size(struct drm_device *dev,
diff --git a/drivers/gpu/drm/drm_modeset_lock.c b/drivers/gpu/drm/drm_modeset_lock.c
index 474e4d12a2d8..51cc47d827d8 100644
--- a/drivers/gpu/drm/drm_modeset_lock.c
+++ b/drivers/gpu/drm/drm_modeset_lock.c
@@ -157,14 +157,20 @@ void drm_modeset_unlock_all(struct drm_device *dev)
 EXPORT_SYMBOL(drm_modeset_unlock_all);
 
 /**
- * drm_modeset_lock_crtc - lock crtc with hidden acquire ctx
- * @crtc: drm crtc
+ * drm_modeset_lock_crtc - lock crtc with hidden acquire ctx for a plane update
+ * @crtc: DRM CRTC
+ * @plane: DRM plane to be updated on @crtc
+ *
+ * This function locks the given crtc and plane (which should be either the
+ * primary or cursor plane) using a hidden acquire context. This is necessary so
+ * that drivers internally using the atomic interfaces can grab further locks
+ * with the lock acquire context.
  *
- * This function locks the given crtc using a hidden acquire context. This is
- * necessary so that drivers internally using the atomic interfaces can grab
- * further locks with the lock acquire context.
+ * Note that @plane can be NULL, e.g. when the cursor support hasn't been
+ * converted to universal planes yet.
  */
-void drm_modeset_lock_crtc(struct drm_crtc *crtc)
+void drm_modeset_lock_crtc(struct drm_crtc *crtc,
+			   struct drm_plane *plane)
 {
 	struct drm_modeset_acquire_ctx *ctx;
 	int ret;
@@ -180,6 +186,18 @@ retry:
 	if (ret)
 		goto fail;
 
+	if (plane) {
+		ret = drm_modeset_lock(&plane->mutex, ctx);
+		if (ret)
+			goto fail;
+
+		if (plane->crtc) {
+			ret = drm_modeset_lock(&plane->crtc->mutex, ctx);
+			if (ret)
+				goto fail;
+		}
+	}
+
 	WARN_ON(crtc->acquire_ctx);
 
 	/* now we hold the locks, so now that it is safe, stash the
@@ -437,15 +455,14 @@ void drm_modeset_unlock(struct drm_modeset_lock *lock)
 }
 EXPORT_SYMBOL(drm_modeset_unlock);
 
-/* Temporary.. until we have sufficiently fine grained locking, there
- * are a couple scenarios where it is convenient to grab all crtc locks.
- * It is planned to remove this:
- */
+/* In some legacy codepaths it's convenient to just grab all the crtc and plane
+ * related locks. */
 int drm_modeset_lock_all_crtcs(struct drm_device *dev,
 		struct drm_modeset_acquire_ctx *ctx)
 {
 	struct drm_mode_config *config = &dev->mode_config;
 	struct drm_crtc *crtc;
+	struct drm_plane *plane;
 	int ret = 0;
 
 	list_for_each_entry(crtc, &config->crtc_list, head) {
@@ -454,6 +471,12 @@ int drm_modeset_lock_all_crtcs(struct drm_device *dev,
 			return ret;
 	}
 
+	list_for_each_entry(plane, &config->plane_list, head) {
+		ret = drm_modeset_lock(&plane->mutex, ctx);
+		if (ret)
+			return ret;
+	}
+
 	return 0;
 }
 EXPORT_SYMBOL(drm_modeset_lock_all_crtcs);
diff --git a/drivers/gpu/drm/drm_plane_helper.c b/drivers/gpu/drm/drm_plane_helper.c
index 827ec1a3040b..18a1ac6ac22f 100644
--- a/drivers/gpu/drm/drm_plane_helper.c
+++ b/drivers/gpu/drm/drm_plane_helper.c
@@ -27,10 +27,38 @@
 #include <drm/drmP.h>
 #include <drm/drm_plane_helper.h>
 #include <drm/drm_rect.h>
-#include <drm/drm_plane_helper.h>
+#include <drm/drm_atomic.h>
+#include <drm/drm_crtc_helper.h>
+#include <drm/drm_atomic_helper.h>
 
 #define SUBPIXEL_MASK 0xffff
 
+/**
+ * DOC: overview
+ *
+ * This helper library has two parts. The first part has support to implement
+ * primary plane support on top of the normal CRTC configuration interface.
+ * Since the legacy ->set_config interface ties the primary plane together with
+ * the CRTC state this does not allow userspace to disable the primary plane
+ * itself.  To avoid too much duplicated code use
+ * drm_plane_helper_check_update() which can be used to enforce the same
+ * restrictions as primary planes had thus far. The default primary plane only
+ * exposes XRGB8888 and ARGB8888 as valid pixel formats for the attached
+ * framebuffer.
+ *
+ * Drivers are highly recommended to implement proper support for primary
+ * planes, and newly merged drivers must not rely upon these transitional
+ * helpers.
+ *
+ * The second part also implements transitional helpers which allow drivers to
+ * gradually switch to the atomic helper infrastructure for plane updates. Once
+ * that switch is complete drivers shouldn't use these any longer, instead using
+ * the proper legacy implementations for update and disable plane hooks provided
+ * by the atomic helpers.
+ *
+ * Again drivers are strongly urged to switch to the new interfaces.
+ */
+
 /*
  * This is the minimal list of formats that seem to be safe for modeset use
  * with all current DRM drivers.  Most hardware can actually support more
@@ -127,6 +155,11 @@ int drm_plane_helper_check_update(struct drm_plane *plane,
 		return -ERANGE;
 	}
 
+	if (!fb) {
+		*visible = false;
+		return 0;
+	}
+
 	*visible = drm_rect_clip_scaled(src, dest, clip, hscale, vscale);
 	if (!*visible)
 		/*
@@ -369,3 +402,171 @@ int drm_crtc_init(struct drm_device *dev, struct drm_crtc *crtc,
 	return drm_crtc_init_with_planes(dev, crtc, primary, NULL, funcs);
 }
 EXPORT_SYMBOL(drm_crtc_init);
+
+int drm_plane_helper_commit(struct drm_plane *plane,
+			    struct drm_plane_state *plane_state,
+			    struct drm_framebuffer *old_fb)
+{
+	struct drm_plane_helper_funcs *plane_funcs;
+	struct drm_crtc *crtc[2];
+	struct drm_crtc_helper_funcs *crtc_funcs[2];
+	int i, ret = 0;
+
+	plane_funcs = plane->helper_private;
+
+	/* Since this is a transitional helper we can't assume that plane->state
+	 * is always valid. Hence we need to use plane->crtc instead of
+	 * plane->state->crtc as the old crtc. */
+	crtc[0] = plane->crtc;
+	crtc[1] = crtc[0] != plane_state->crtc ? plane_state->crtc : NULL;
+
+	for (i = 0; i < 2; i++)
+		crtc_funcs[i] = crtc[i] ? crtc[i]->helper_private : NULL;
+
+	if (plane_funcs->atomic_check) {
+		ret = plane_funcs->atomic_check(plane, plane_state);
+		if (ret)
+			goto out;
+	}
+
+	if (plane_funcs->prepare_fb && plane_state->fb) {
+		ret = plane_funcs->prepare_fb(plane, plane_state->fb);
+		if (ret)
+			goto out;
+	}
+
+	/* Point of no return, commit sw state. */
+	swap(plane->state, plane_state);
+
+	for (i = 0; i < 2; i++) {
+		if (crtc_funcs[i] && crtc_funcs[i]->atomic_begin)
+			crtc_funcs[i]->atomic_begin(crtc[i]);
+	}
+
+	plane_funcs->atomic_update(plane, plane_state);
+
+	for (i = 0; i < 2; i++) {
+		if (crtc_funcs[i] && crtc_funcs[i]->atomic_flush)
+			crtc_funcs[i]->atomic_flush(crtc[i]);
+	}
+
+	for (i = 0; i < 2; i++) {
+		if (!crtc[i])
+			continue;
+
+		/* There's no other way to figure out whether the crtc is running. */
+		ret = drm_crtc_vblank_get(crtc[i]);
+		if (ret == 0) {
+			drm_crtc_wait_one_vblank(crtc[i]);
+			drm_crtc_vblank_put(crtc[i]);
+		}
+
+		ret = 0;
+	}
+
+	if (plane_funcs->cleanup_fb && old_fb)
+		plane_funcs->cleanup_fb(plane, old_fb);
+out:
+	if (plane_state) {
+		if (plane->funcs->atomic_destroy_state)
+			plane->funcs->atomic_destroy_state(plane, plane_state);
+		else
+			drm_atomic_helper_plane_destroy_state(plane, plane_state);
+	}
+
+	return ret;
+}
+
+/**
+ * drm_plane_helper_update() - Helper for primary plane update
+ * @plane: plane object to update
+ * @crtc: owning CRTC of owning plane
+ * @fb: framebuffer to flip onto plane
+ * @crtc_x: x offset of primary plane on crtc
+ * @crtc_y: y offset of primary plane on crtc
+ * @crtc_w: width of primary plane rectangle on crtc
+ * @crtc_h: height of primary plane rectangle on crtc
+ * @src_x: x offset of @fb for panning
+ * @src_y: y offset of @fb for panning
+ * @src_w: width of source rectangle in @fb
+ * @src_h: height of source rectangle in @fb
+ *
+ * Provides a default plane update handler using the atomic plane update
+ * functions. It is fully left to the driver to check plane constraints and
+ * handle corner-cases like a fully occluded or otherwise invisible plane.
+ *
+ * This is useful for piecewise transitioning of a driver to the atomic helpers.
+ *
+ * RETURNS:
+ * Zero on success, error code on failure
+ */
+int drm_plane_helper_update(struct drm_plane *plane, struct drm_crtc *crtc,
+			    struct drm_framebuffer *fb,
+			    int crtc_x, int crtc_y,
+			    unsigned int crtc_w, unsigned int crtc_h,
+			    uint32_t src_x, uint32_t src_y,
+			    uint32_t src_w, uint32_t src_h)
+{
+	struct drm_plane_state *plane_state;
+
+	if (plane->funcs->atomic_duplicate_state)
+		plane_state = plane->funcs->atomic_duplicate_state(plane);
+	else if (plane->state)
+		plane_state = drm_atomic_helper_plane_duplicate_state(plane);
+	else
+		plane_state = kzalloc(sizeof(*plane_state), GFP_KERNEL);
+	if (!plane_state)
+		return -ENOMEM;
+
+	plane_state->crtc = crtc;
+	drm_atomic_set_fb_for_plane(plane_state, fb);
+	plane_state->crtc_x = crtc_x;
+	plane_state->crtc_y = crtc_y;
+	plane_state->crtc_h = crtc_h;
+	plane_state->crtc_w = crtc_w;
+	plane_state->src_x = src_x;
+	plane_state->src_y = src_y;
+	plane_state->src_h = src_h;
+	plane_state->src_w = src_w;
+
+	return drm_plane_helper_commit(plane, plane_state, plane->fb);
+}
+EXPORT_SYMBOL(drm_plane_helper_update);
+
+/**
+ * drm_plane_helper_disable() - Helper for primary plane disable
+ * @plane: plane to disable
+ *
+ * Provides a default plane disable handler using the atomic plane update
+ * functions. It is fully left to the driver to check plane constraints and
+ * handle corner-cases like a fully occluded or otherwise invisible plane.
+ *
+ * This is useful for piecewise transitioning of a driver to the atomic helpers.
+ *
+ * RETURNS:
+ * Zero on success, error code on failure
+ */
+int drm_plane_helper_disable(struct drm_plane *plane)
+{
+	struct drm_plane_state *plane_state;
+
+	/* crtc helpers love to call disable functions for already disabled hw
+	 * functions. So cope with that. */
+	if (!plane->crtc)
+		return 0;
+
+	if (plane->funcs->atomic_duplicate_state)
+		plane_state = plane->funcs->atomic_duplicate_state(plane);
+	else if (plane->state)
+		plane_state = drm_atomic_helper_plane_duplicate_state(plane);
+	else
+		plane_state = kzalloc(sizeof(*plane_state), GFP_KERNEL);
+	if (!plane_state)
+		return -ENOMEM;
+
+	plane_state->crtc = NULL;
+	drm_atomic_set_fb_for_plane(plane_state, NULL);
+
+	return drm_plane_helper_commit(plane, plane_state, plane->fb);
+}
+EXPORT_SYMBOL(drm_plane_helper_disable);
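drm_plane_helper_commit() above tracks at most two CRTCs: the one the plane was on (plane->crtc) and, only if it differs, the one the new state points at. NULL slots are skipped when the begin and flush hooks are called. The userspace sketch below mirrors just that bookkeeping. struct demo_crtc, demo_collect_crtcs() and demo_commit() are hypothetical names; the counters stand in for the atomic_begin and atomic_flush hooks.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct drm_crtc; only counts hook invocations. */
struct demo_crtc {
	int begin_calls;
	int flush_calls;
};

/*
 * Put the CRTC the plane was on in crtc[0] and the CRTC it moves to in
 * crtc[1]. crtc[1] stays NULL when both are the same, so that no CRTC is
 * visited twice.
 */
void demo_collect_crtcs(struct demo_crtc *old_crtc,
			struct demo_crtc *new_crtc,
			struct demo_crtc *crtc[2])
{
	assert(crtc != NULL);

	crtc[0] = old_crtc;
	crtc[1] = (old_crtc != new_crtc) ? new_crtc : NULL;
}

/* Run the begin/flush bracket once per affected CRTC, skipping NULL slots. */
void demo_commit(struct demo_crtc *old_crtc, struct demo_crtc *new_crtc)
{
	struct demo_crtc *crtc[2];
	int i;

	demo_collect_crtcs(old_crtc, new_crtc, crtc);

	for (i = 0; i < 2; i++)
		if (crtc[i])
			crtc[i]->begin_calls++;

	/* The plane update itself would happen here. */

	for (i = 0; i < 2; i++)
		if (crtc[i])
			crtc[i]->flush_calls++;
}
```

Updating a plane in place touches one CRTC. Moving it between CRTCs touches both. Enabling it (old CRTC NULL) or disabling it (new CRTC NULL) touches only the non-NULL side.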
diff --git a/drivers/gpu/drm/drm_prime.c b/drivers/gpu/drm/drm_prime.c
index 78ca30808422..7482b06cd08f 100644
--- a/drivers/gpu/drm/drm_prime.c
+++ b/drivers/gpu/drm/drm_prime.c
@@ -328,7 +328,7 @@ static const struct dma_buf_ops drm_gem_prime_dmabuf_ops =  {
  */
 
 /**
- * drm_gem_prime_export - helper library implemention of the export callback
+ * drm_gem_prime_export - helper library implementation of the export callback
  * @dev: drm_device to export from
  * @obj: GEM object to export
  * @flags: flags like DRM_CLOEXEC
@@ -483,7 +483,7 @@ out_unlock:
 EXPORT_SYMBOL(drm_gem_prime_handle_to_fd);
 
 /**
- * drm_gem_prime_import - helper library implemention of the import callback
+ * drm_gem_prime_import - helper library implementation of the import callback
  * @dev: drm_device to import into
  * @dma_buf: dma-buf object to import
  *
@@ -669,7 +669,7 @@ int drm_prime_fd_to_handle_ioctl(struct drm_device *dev, void *data,
  * the driver is responsible for mapping the pages into the
  * importers address space for use with dma_buf itself.
  */
-struct sg_table *drm_prime_pages_to_sg(struct page **pages, int nr_pages)
+struct sg_table *drm_prime_pages_to_sg(struct page **pages, unsigned int nr_pages)
 {
 	struct sg_table *sg = NULL;
 	int ret;
diff --git a/drivers/gpu/drm/drm_probe_helper.c b/drivers/gpu/drm/drm_probe_helper.c
index 6857e9ad6339..7483a47de8e4 100644
--- a/drivers/gpu/drm/drm_probe_helper.c
+++ b/drivers/gpu/drm/drm_probe_helper.c
@@ -118,7 +118,8 @@ static int drm_helper_probe_single_connector_modes_merge_bits(struct drm_connect
 		mode->status = MODE_UNVERIFIED;
 
 	if (connector->force) {
-		if (connector->force == DRM_FORCE_ON)
+		if (connector->force == DRM_FORCE_ON ||
+		    connector->force == DRM_FORCE_ON_DIGITAL)
 			connector->status = connector_status_connected;
 		else
 			connector->status = connector_status_disconnected;
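The hunk above makes a connector forced to "on, digital" report as connected, where it previously fell through to disconnected. A small userspace sketch of that decision follows. The demo_ enums and demo_forced_status() are hypothetical stand-ins, and the enumerator values are illustrative rather than copied from the DRM headers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical mirrors of the DRM force and status enums. */
enum demo_force {
	DEMO_FORCE_UNSPECIFIED,
	DEMO_FORCE_OFF,
	DEMO_FORCE_ON,
	DEMO_FORCE_ON_DIGITAL,
};

enum demo_status {
	DEMO_STATUS_CONNECTED,
	DEMO_STATUS_DISCONNECTED,
};

/*
 * Return true and store the forced status if the connector is forced.
 * Return false, leaving *status untouched, if normal detection should run.
 */
bool demo_forced_status(enum demo_force force, enum demo_status *status)
{
	assert(status != NULL);

	if (force == DEMO_FORCE_UNSPECIFIED)
		return false;

	if (force == DEMO_FORCE_ON || force == DEMO_FORCE_ON_DIGITAL)
		*status = DEMO_STATUS_CONNECTED;
	else
		*status = DEMO_STATUS_DISCONNECTED;

	return true;
}
```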
diff --git a/drivers/gpu/drm/exynos/exynos_dp_core.c b/drivers/gpu/drm/exynos/exynos_dp_core.c
index 6adb1e5cfb08..34d46aa75416 100644
--- a/drivers/gpu/drm/exynos/exynos_dp_core.c
+++ b/drivers/gpu/drm/exynos/exynos_dp_core.c
@@ -30,12 +30,17 @@
 #include <drm/drm_panel.h>
 #include <drm/bridge/ptn3460.h>
 
-#include "exynos_drm_drv.h"
 #include "exynos_dp_core.h"
 
 #define ctx_from_connector(c)	container_of(c, struct exynos_dp_device, \
 					connector)
 
+static inline struct exynos_dp_device *
+display_to_dp(struct exynos_drm_display *d)
+{
+	return container_of(d, struct exynos_dp_device, display);
+}
+
 struct bridge_init {
 	struct i2c_client *client;
 	struct device_node *node;
@@ -882,7 +887,7 @@ static void exynos_dp_hotplug(struct work_struct *work)
 
 static void exynos_dp_commit(struct exynos_drm_display *display)
 {
-	struct exynos_dp_device *dp = display->ctx;
+	struct exynos_dp_device *dp = display_to_dp(display);
 	int ret;
 
 	/* Keep the panel disabled while we configure video */
@@ -1020,7 +1025,7 @@ static int exynos_drm_attach_lcd_bridge(struct drm_device *dev,
 static int exynos_dp_create_connector(struct exynos_drm_display *display,
 				struct drm_encoder *encoder)
 {
-	struct exynos_dp_device *dp = display->ctx;
+	struct exynos_dp_device *dp = display_to_dp(display);
 	struct drm_connector *connector = &dp->connector;
 	int ret;
 
@@ -1052,33 +1057,19 @@ static int exynos_dp_create_connector(struct exynos_drm_display *display,
 
 static void exynos_dp_phy_init(struct exynos_dp_device *dp)
 {
-	if (dp->phy) {
+	if (dp->phy)
 		phy_power_on(dp->phy);
-	} else if (dp->phy_addr) {
-		u32 reg;
-
-		reg = __raw_readl(dp->phy_addr);
-		reg |= dp->enable_mask;
-		__raw_writel(reg, dp->phy_addr);
-	}
 }
 
 static void exynos_dp_phy_exit(struct exynos_dp_device *dp)
 {
-	if (dp->phy) {
+	if (dp->phy)
 		phy_power_off(dp->phy);
-	} else if (dp->phy_addr) {
-		u32 reg;
-
-		reg = __raw_readl(dp->phy_addr);
-		reg &= ~(dp->enable_mask);
-		__raw_writel(reg, dp->phy_addr);
-	}
 }
 
 static void exynos_dp_poweron(struct exynos_drm_display *display)
 {
-	struct exynos_dp_device *dp = display->ctx;
+	struct exynos_dp_device *dp = display_to_dp(display);
 
 	if (dp->dpms_mode == DRM_MODE_DPMS_ON)
 		return;
@@ -1099,7 +1090,7 @@ static void exynos_dp_poweron(struct exynos_drm_display *display)
 
 static void exynos_dp_poweroff(struct exynos_drm_display *display)
 {
-	struct exynos_dp_device *dp = display->ctx;
+	struct exynos_dp_device *dp = display_to_dp(display);
 
 	if (dp->dpms_mode != DRM_MODE_DPMS_ON)
 		return;
@@ -1124,7 +1115,7 @@ static void exynos_dp_poweroff(struct exynos_drm_display *display)
 
 static void exynos_dp_dpms(struct exynos_drm_display *display, int mode)
 {
-	struct exynos_dp_device *dp = display->ctx;
+	struct exynos_dp_device *dp = display_to_dp(display);
 
 	switch (mode) {
 	case DRM_MODE_DPMS_ON:
@@ -1147,11 +1138,6 @@ static struct exynos_drm_display_ops exynos_dp_display_ops = {
 	.commit = exynos_dp_commit,
 };
 
-static struct exynos_drm_display exynos_dp_display = {
-	.type = EXYNOS_DISPLAY_TYPE_LCD,
-	.ops = &exynos_dp_display_ops,
-};
-
 static struct video_info *exynos_dp_dt_parse_pdata(struct device *dev)
 {
 	struct device_node *dp_node = dev->of_node;
@@ -1210,44 +1196,6 @@ static struct video_info *exynos_dp_dt_parse_pdata(struct device *dev)
 	return dp_video_config;
 }
 
-static int exynos_dp_dt_parse_phydata(struct exynos_dp_device *dp)
-{
-	struct device_node *dp_phy_node = of_node_get(dp->dev->of_node);
-	u32 phy_base;
-	int ret = 0;
-
-	dp_phy_node = of_find_node_by_name(dp_phy_node, "dptx-phy");
-	if (!dp_phy_node) {
-		dp->phy = devm_phy_get(dp->dev, "dp");
-		return PTR_ERR_OR_ZERO(dp->phy);
-	}
-
-	if (of_property_read_u32(dp_phy_node, "reg", &phy_base)) {
-		dev_err(dp->dev, "failed to get reg for dptx-phy\n");
-		ret = -EINVAL;
-		goto err;
-	}
-
-	if (of_property_read_u32(dp_phy_node, "samsung,enable-mask",
-				&dp->enable_mask)) {
-		dev_err(dp->dev, "failed to get enable-mask for dptx-phy\n");
-		ret = -EINVAL;
-		goto err;
-	}
-
-	dp->phy_addr = ioremap(phy_base, SZ_4);
-	if (!dp->phy_addr) {
-		dev_err(dp->dev, "failed to ioremap dp-phy\n");
-		ret = -ENOMEM;
-		goto err;
-	}
-
-err:
-	of_node_put(dp_phy_node);
-
-	return ret;
-}
-
 static int exynos_dp_dt_parse_panel(struct exynos_dp_device *dp)
 {
 	int ret;
@@ -1263,10 +1211,10 @@ static int exynos_dp_dt_parse_panel(struct exynos_dp_device *dp)
 
 static int exynos_dp_bind(struct device *dev, struct device *master, void *data)
 {
+	struct exynos_dp_device *dp = dev_get_drvdata(dev);
 	struct platform_device *pdev = to_platform_device(dev);
 	struct drm_device *drm_dev = data;
 	struct resource *res;
-	struct exynos_dp_device *dp = exynos_dp_display.ctx;
 	unsigned int irq_flags;
 	int ret = 0;
 
@@ -1277,9 +1225,21 @@ static int exynos_dp_bind(struct device *dev, struct device *master, void *data)
 	if (IS_ERR(dp->video_info))
 		return PTR_ERR(dp->video_info);
 
-	ret = exynos_dp_dt_parse_phydata(dp);
-	if (ret)
-		return ret;
+	dp->phy = devm_phy_get(dp->dev, "dp");
+	if (IS_ERR(dp->phy)) {
+		dev_err(dp->dev, "no DP phy configured\n");
+		ret = PTR_ERR(dp->phy);
+		if (ret) {
+			/*
+			 * phy itself is not enabled, so we can move forward
+			 * assigning NULL to phy pointer.
+			 */
+			if (ret == -ENOSYS || ret == -ENODEV)
+				dp->phy = NULL;
+			else
+				return ret;
+		}
+	}
 
 	if (!dp->panel) {
 		ret = exynos_dp_dt_parse_panel(dp);
@@ -1346,17 +1306,15 @@ static int exynos_dp_bind(struct device *dev, struct device *master, void *data)
 
 	dp->drm_dev = drm_dev;
 
-	platform_set_drvdata(pdev, &exynos_dp_display);
-
-	return exynos_drm_create_enc_conn(drm_dev, &exynos_dp_display);
+	return exynos_drm_create_enc_conn(drm_dev, &dp->display);
 }
 
 static void exynos_dp_unbind(struct device *dev, struct device *master,
 				void *data)
 {
-	struct exynos_drm_display *display = dev_get_drvdata(dev);
+	struct exynos_dp_device *dp = dev_get_drvdata(dev);
 
-	exynos_dp_dpms(display, DRM_MODE_DPMS_OFF);
+	exynos_dp_dpms(&dp->display, DRM_MODE_DPMS_OFF);
 }
 
 static const struct component_ops exynos_dp_ops = {
@@ -1371,16 +1329,20 @@ static int exynos_dp_probe(struct platform_device *pdev)
 	struct exynos_dp_device *dp;
 	int ret;
 
-	ret = exynos_drm_component_add(&pdev->dev, EXYNOS_DEVICE_TYPE_CONNECTOR,
-					exynos_dp_display.type);
-	if (ret)
-		return ret;
-
 	dp = devm_kzalloc(&pdev->dev, sizeof(struct exynos_dp_device),
 				GFP_KERNEL);
 	if (!dp)
 		return -ENOMEM;
 
+	dp->display.type = EXYNOS_DISPLAY_TYPE_LCD;
+	dp->display.ops = &exynos_dp_display_ops;
+	platform_set_drvdata(pdev, dp);
+
+	ret = exynos_drm_component_add(&pdev->dev, EXYNOS_DEVICE_TYPE_CONNECTOR,
+					dp->display.type);
+	if (ret)
+		return ret;
+
 	panel_node = of_parse_phandle(dev->of_node, "panel", 0);
 	if (panel_node) {
 		dp->panel = of_drm_find_panel(panel_node);
@@ -1389,8 +1351,6 @@ static int exynos_dp_probe(struct platform_device *pdev)
 			return -EPROBE_DEFER;
 	}
 
-	exynos_dp_display.ctx = dp;
-
 	ret = component_add(&pdev->dev, &exynos_dp_ops);
 	if (ret)
 		exynos_drm_component_del(&pdev->dev,
@@ -1410,19 +1370,17 @@ static int exynos_dp_remove(struct platform_device *pdev)
 #ifdef CONFIG_PM_SLEEP
 static int exynos_dp_suspend(struct device *dev)
 {
-	struct platform_device *pdev = to_platform_device(dev);
-	struct exynos_drm_display *display = platform_get_drvdata(pdev);
+	struct exynos_dp_device *dp = dev_get_drvdata(dev);
 
-	exynos_dp_dpms(display, DRM_MODE_DPMS_OFF);
+	exynos_dp_dpms(&dp->display, DRM_MODE_DPMS_OFF);
 	return 0;
 }
 
 static int exynos_dp_resume(struct device *dev)
 {
-	struct platform_device *pdev = to_platform_device(dev);
-	struct exynos_drm_display *display = platform_get_drvdata(pdev);
+	struct exynos_dp_device *dp = dev_get_drvdata(dev);
 
-	exynos_dp_dpms(display, DRM_MODE_DPMS_ON);
+	exynos_dp_dpms(&dp->display, DRM_MODE_DPMS_ON);
 	return 0;
 }
 #endif
diff --git a/drivers/gpu/drm/exynos/exynos_dp_core.h b/drivers/gpu/drm/exynos/exynos_dp_core.h
index a1aee6931bd7..164f171168e7 100644
--- a/drivers/gpu/drm/exynos/exynos_dp_core.h
+++ b/drivers/gpu/drm/exynos/exynos_dp_core.h
@@ -17,6 +17,8 @@
 #include <drm/drm_dp_helper.h>
 #include <drm/exynos_drm.h>
 
+#include "exynos_drm_drv.h"
+
 #define DP_TIMEOUT_LOOP_COUNT 100
 #define MAX_CR_LOOP 5
 #define MAX_EQ_LOOP 5
@@ -145,6 +147,7 @@ struct link_train {
 };
 
 struct exynos_dp_device {
+	struct exynos_drm_display display;
 	struct device		*dev;
 	struct drm_device	*drm_dev;
 	struct drm_connector	connector;
@@ -153,8 +156,6 @@ struct exynos_dp_device {
 	struct clk		*clock;
 	unsigned int		irq;
 	void __iomem		*reg_base;
-	void __iomem		*phy_addr;
-	unsigned int		enable_mask;
 
 	struct video_info	*video_info;
 	struct link_train	link_train;
diff --git a/drivers/gpu/drm/exynos/exynos_drm_crtc.h b/drivers/gpu/drm/exynos/exynos_drm_crtc.h
index 690dcddab725..e353d353836f 100644
--- a/drivers/gpu/drm/exynos/exynos_drm_crtc.h
+++ b/drivers/gpu/drm/exynos/exynos_drm_crtc.h
@@ -15,10 +15,7 @@
 #ifndef _EXYNOS_DRM_CRTC_H_
 #define _EXYNOS_DRM_CRTC_H_
 
-struct drm_device;
-struct drm_crtc;
-struct exynos_drm_manager;
-struct exynos_drm_overlay;
+#include "exynos_drm_drv.h"
 
 int exynos_drm_crtc_create(struct exynos_drm_manager *manager);
 int exynos_drm_crtc_enable_vblank(struct drm_device *dev, int pipe);
diff --git a/drivers/gpu/drm/exynos/exynos_drm_dpi.c b/drivers/gpu/drm/exynos/exynos_drm_dpi.c
index 3dc678ed9949..37678cf4425a 100644
--- a/drivers/gpu/drm/exynos/exynos_drm_dpi.c
+++ b/drivers/gpu/drm/exynos/exynos_drm_dpi.c
@@ -22,6 +22,7 @@
 #include "exynos_drm_drv.h"
 
 struct exynos_dpi {
+	struct exynos_drm_display display;
 	struct device *dev;
 	struct device_node *panel_node;
 
@@ -35,6 +36,11 @@ struct exynos_dpi {
 
 #define connector_to_dpi(c) container_of(c, struct exynos_dpi, connector)
 
+static inline struct exynos_dpi *display_to_dpi(struct exynos_drm_display *d)
+{
+	return container_of(d, struct exynos_dpi, display);
+}
+
 static enum drm_connector_status
 exynos_dpi_detect(struct drm_connector *connector, bool force)
 {
@@ -100,7 +106,7 @@ static struct drm_connector_helper_funcs exynos_dpi_connector_helper_funcs = {
 static int exynos_dpi_create_connector(struct exynos_drm_display *display,
 				       struct drm_encoder *encoder)
 {
-	struct exynos_dpi *ctx = display->ctx;
+	struct exynos_dpi *ctx = display_to_dpi(display);
 	struct drm_connector *connector = &ctx->connector;
 	int ret;
 
@@ -141,7 +147,7 @@ static void exynos_dpi_poweroff(struct exynos_dpi *ctx)
 
 static void exynos_dpi_dpms(struct exynos_drm_display *display, int mode)
 {
-	struct exynos_dpi *ctx = display->ctx;
+	struct exynos_dpi *ctx = display_to_dpi(display);
 
 	switch (mode) {
 	case DRM_MODE_DPMS_ON:
@@ -165,11 +171,6 @@ static struct exynos_drm_display_ops exynos_dpi_display_ops = {
 	.dpms = exynos_dpi_dpms
 };
 
-static struct exynos_drm_display exynos_dpi_display = {
-	.type = EXYNOS_DISPLAY_TYPE_LCD,
-	.ops = &exynos_dpi_display_ops,
-};
-
 /* of_* functions will be removed after merge of of_graph patches */
 static struct device_node *
 of_get_child_by_name_reg(struct device_node *parent, const char *name, u32 reg)
@@ -299,20 +300,21 @@ struct exynos_drm_display *exynos_dpi_probe(struct device *dev)
 	struct exynos_dpi *ctx;
 	int ret;
 
-	ret = exynos_drm_component_add(dev,
-					EXYNOS_DEVICE_TYPE_CONNECTOR,
-					exynos_dpi_display.type);
-	if (ret)
-		return ERR_PTR(ret);
-
 	ctx = devm_kzalloc(dev, sizeof(*ctx), GFP_KERNEL);
 	if (!ctx)
-		goto err_del_component;
+		return ERR_PTR(-ENOMEM);
 
+	ctx->display.type = EXYNOS_DISPLAY_TYPE_LCD;
+	ctx->display.ops = &exynos_dpi_display_ops;
 	ctx->dev = dev;
-	exynos_dpi_display.ctx = ctx;
 	ctx->dpms_mode = DRM_MODE_DPMS_OFF;
 
+	ret = exynos_drm_component_add(dev,
+					EXYNOS_DEVICE_TYPE_CONNECTOR,
+					ctx->display.type);
+	if (ret)
+		return ERR_PTR(ret);
+
 	ret = exynos_dpi_parse_dt(ctx);
 	if (ret < 0) {
 		devm_kfree(dev, ctx);
@@ -328,7 +330,7 @@ struct exynos_drm_display *exynos_dpi_probe(struct device *dev)
 		}
 	}
 
-	return &exynos_dpi_display;
+	return &ctx->display;
 
 err_del_component:
 	exynos_drm_component_del(dev, EXYNOS_DEVICE_TYPE_CONNECTOR);
@@ -336,16 +338,16 @@ err_del_component:
 	return NULL;
 }
 
-int exynos_dpi_remove(struct device *dev)
+int exynos_dpi_remove(struct exynos_drm_display *display)
 {
-	struct exynos_dpi *ctx = exynos_dpi_display.ctx;
+	struct exynos_dpi *ctx = display_to_dpi(display);
 
-	exynos_dpi_dpms(&exynos_dpi_display, DRM_MODE_DPMS_OFF);
+	exynos_dpi_dpms(&ctx->display, DRM_MODE_DPMS_OFF);
 
 	if (ctx->panel)
 		drm_panel_detach(ctx->panel);
 
-	exynos_drm_component_del(dev, EXYNOS_DEVICE_TYPE_CONNECTOR);
+	exynos_drm_component_del(ctx->dev, EXYNOS_DEVICE_TYPE_CONNECTOR);
 
 	return 0;
 }
diff --git a/drivers/gpu/drm/exynos/exynos_drm_drv.c b/drivers/gpu/drm/exynos/exynos_drm_drv.c
index e277d4f12812..121470a83d1a 100644
--- a/drivers/gpu/drm/exynos/exynos_drm_drv.c
+++ b/drivers/gpu/drm/exynos/exynos_drm_drv.c
@@ -203,8 +203,6 @@ static int exynos_drm_resume(struct drm_device *dev)
 	}
 	drm_modeset_unlock_all(dev);
 
-	drm_helper_resume_force_mode(dev);
-
 	return 0;
 }
 
@@ -475,8 +473,6 @@ void exynos_drm_component_del(struct device *dev,
 			list_del(&cdev->list);
 			kfree(cdev);
 		}
-
-		break;
 	}
 
 	mutex_unlock(&drm_component_lock);
@@ -556,182 +552,68 @@ static const struct component_master_ops exynos_drm_ops = {
 	.unbind		= exynos_drm_unbind,
 };
 
-static int exynos_drm_platform_probe(struct platform_device *pdev)
-{
-	struct component_match *match;
-	int ret;
-
-	pdev->dev.coherent_dma_mask = DMA_BIT_MASK(32);
-	exynos_drm_driver.num_ioctls = ARRAY_SIZE(exynos_ioctls);
-
+static struct platform_driver *const exynos_drm_kms_drivers[] = {
 #ifdef CONFIG_DRM_EXYNOS_FIMD
-	ret = platform_driver_register(&fimd_driver);
-	if (ret < 0)
-		return ret;
+	&fimd_driver,
 #endif
-
 #ifdef CONFIG_DRM_EXYNOS_DP
-	ret = platform_driver_register(&dp_driver);
-	if (ret < 0)
-		goto err_unregister_fimd_drv;
+	&dp_driver,
 #endif
-
 #ifdef CONFIG_DRM_EXYNOS_DSI
-	ret = platform_driver_register(&dsi_driver);
-	if (ret < 0)
-		goto err_unregister_dp_drv;
+	&dsi_driver,
 #endif
-
 #ifdef CONFIG_DRM_EXYNOS_HDMI
-	ret = platform_driver_register(&mixer_driver);
-	if (ret < 0)
-		goto err_unregister_dsi_drv;
-	ret = platform_driver_register(&hdmi_driver);
-	if (ret < 0)
-		goto err_unregister_mixer_drv;
+	&mixer_driver,
+	&hdmi_driver,
 #endif
+};
 
-	match = exynos_drm_match_add(&pdev->dev);
-	if (IS_ERR(match)) {
-		ret = PTR_ERR(match);
-		goto err_unregister_hdmi_drv;
-	}
-
-	ret = component_master_add_with_match(&pdev->dev, &exynos_drm_ops,
-						match);
-	if (ret < 0)
-		goto err_unregister_hdmi_drv;
-
+static struct platform_driver *const exynos_drm_non_kms_drivers[] = {
 #ifdef CONFIG_DRM_EXYNOS_G2D
-	ret = platform_driver_register(&g2d_driver);
-	if (ret < 0)
-		goto err_del_component_master;
+	&g2d_driver,
 #endif
-
 #ifdef CONFIG_DRM_EXYNOS_FIMC
-	ret = platform_driver_register(&fimc_driver);
-	if (ret < 0)
-		goto err_unregister_g2d_drv;
+	&fimc_driver,
 #endif
-
 #ifdef CONFIG_DRM_EXYNOS_ROTATOR
-	ret = platform_driver_register(&rotator_driver);
-	if (ret < 0)
-		goto err_unregister_fimc_drv;
+	&rotator_driver,
 #endif
-
 #ifdef CONFIG_DRM_EXYNOS_GSC
-	ret = platform_driver_register(&gsc_driver);
-	if (ret < 0)
-		goto err_unregister_rotator_drv;
-#endif
-
-#ifdef CONFIG_DRM_EXYNOS_IPP
-	ret = platform_driver_register(&ipp_driver);
-	if (ret < 0)
-		goto err_unregister_gsc_drv;
-
-	ret = exynos_platform_device_ipp_register();
-	if (ret < 0)
-		goto err_unregister_ipp_drv;
+	&gsc_driver,
 #endif
-
-	return ret;
-
 #ifdef CONFIG_DRM_EXYNOS_IPP
-err_unregister_ipp_drv:
-	platform_driver_unregister(&ipp_driver);
-err_unregister_gsc_drv:
-#endif
-
-#ifdef CONFIG_DRM_EXYNOS_GSC
-	platform_driver_unregister(&gsc_driver);
-err_unregister_rotator_drv:
+	&ipp_driver,
 #endif
+};
 
-#ifdef CONFIG_DRM_EXYNOS_ROTATOR
-	platform_driver_unregister(&rotator_driver);
-err_unregister_fimc_drv:
-#endif
-
-#ifdef CONFIG_DRM_EXYNOS_FIMC
-	platform_driver_unregister(&fimc_driver);
-err_unregister_g2d_drv:
-#endif
-
-#ifdef CONFIG_DRM_EXYNOS_G2D
-	platform_driver_unregister(&g2d_driver);
-err_del_component_master:
-#endif
-	component_master_del(&pdev->dev, &exynos_drm_ops);
-
-err_unregister_hdmi_drv:
-#ifdef CONFIG_DRM_EXYNOS_HDMI
-	platform_driver_unregister(&hdmi_driver);
-err_unregister_mixer_drv:
-	platform_driver_unregister(&mixer_driver);
-err_unregister_dsi_drv:
-#endif
+static int exynos_drm_platform_probe(struct platform_device *pdev)
+{
+	struct component_match *match;
 
-#ifdef CONFIG_DRM_EXYNOS_DSI
-	platform_driver_unregister(&dsi_driver);
-err_unregister_dp_drv:
-#endif
+	pdev->dev.coherent_dma_mask = DMA_BIT_MASK(32);
+	exynos_drm_driver.num_ioctls = ARRAY_SIZE(exynos_ioctls);
 
-#ifdef CONFIG_DRM_EXYNOS_DP
-	platform_driver_unregister(&dp_driver);
-err_unregister_fimd_drv:
-#endif
+	match = exynos_drm_match_add(&pdev->dev);
+	if (IS_ERR(match)) {
+		return PTR_ERR(match);
+	}
 
-#ifdef CONFIG_DRM_EXYNOS_FIMD
-	platform_driver_unregister(&fimd_driver);
-#endif
-	return ret;
+	return component_master_add_with_match(&pdev->dev, &exynos_drm_ops,
+					       match);
 }
 
 static int exynos_drm_platform_remove(struct platform_device *pdev)
 {
-#ifdef CONFIG_DRM_EXYNOS_IPP
-	exynos_platform_device_ipp_unregister();
-	platform_driver_unregister(&ipp_driver);
-#endif
-
-#ifdef CONFIG_DRM_EXYNOS_GSC
-	platform_driver_unregister(&gsc_driver);
-#endif
-
-#ifdef CONFIG_DRM_EXYNOS_ROTATOR
-	platform_driver_unregister(&rotator_driver);
-#endif
-
-#ifdef CONFIG_DRM_EXYNOS_FIMC
-	platform_driver_unregister(&fimc_driver);
-#endif
-
-#ifdef CONFIG_DRM_EXYNOS_G2D
-	platform_driver_unregister(&g2d_driver);
-#endif
-
-#ifdef CONFIG_DRM_EXYNOS_HDMI
-	platform_driver_unregister(&mixer_driver);
-	platform_driver_unregister(&hdmi_driver);
-#endif
-
-#ifdef CONFIG_DRM_EXYNOS_FIMD
-	platform_driver_unregister(&fimd_driver);
-#endif
-
-#ifdef CONFIG_DRM_EXYNOS_DSI
-	platform_driver_unregister(&dsi_driver);
-#endif
-
-#ifdef CONFIG_DRM_EXYNOS_DP
-	platform_driver_unregister(&dp_driver);
-#endif
 	component_master_del(&pdev->dev, &exynos_drm_ops);
 	return 0;
 }
 
+static const char * const strings[] = {
+	"samsung,exynos3",
+	"samsung,exynos4",
+	"samsung,exynos5",
+};
+
 static struct platform_driver exynos_drm_platform_driver = {
 	.probe	= exynos_drm_platform_probe,
 	.remove	= exynos_drm_platform_remove,
@@ -743,7 +625,25 @@ static struct platform_driver exynos_drm_platform_driver = {
 
 static int exynos_drm_init(void)
 {
-	int ret;
+	bool is_exynos = false;
+	int ret, i, j;
+
+	/*
+	 * Register device object only in case of Exynos SoC.
+	 *
+	 * The code below temporarily works around an infinite loop issue
+	 * caused by the Exynos drm driver when using a multi-platform kernel.
+	 * It will be replaced with a more generic approach later.
+	 */
+	for (i = 0; i < ARRAY_SIZE(strings); i++) {
+		if (of_machine_is_compatible(strings[i])) {
+			is_exynos = true;
+			break;
+		}
+	}
+
+	if (!is_exynos)
+		return -ENODEV;
 
 	/*
 	 * Register device object only in case of Exynos SoC.
@@ -762,24 +662,50 @@ static int exynos_drm_init(void)
 	if (IS_ERR(exynos_drm_pdev))
 		return PTR_ERR(exynos_drm_pdev);
 
-#ifdef CONFIG_DRM_EXYNOS_VIDI
 	ret = exynos_drm_probe_vidi();
 	if (ret < 0)
 		goto err_unregister_pd;
+
+	for (i = 0; i < ARRAY_SIZE(exynos_drm_kms_drivers); ++i) {
+		ret = platform_driver_register(exynos_drm_kms_drivers[i]);
+		if (ret < 0)
+			goto err_unregister_kms_drivers;
+	}
+
+	for (j = 0; j < ARRAY_SIZE(exynos_drm_non_kms_drivers); ++j) {
+		ret = platform_driver_register(exynos_drm_non_kms_drivers[j]);
+		if (ret < 0)
+			goto err_unregister_non_kms_drivers;
+	}
+
+#ifdef CONFIG_DRM_EXYNOS_IPP
+	ret = exynos_platform_device_ipp_register();
+	if (ret < 0)
+		goto err_unregister_non_kms_drivers;
 #endif
 
 	ret = platform_driver_register(&exynos_drm_platform_driver);
 	if (ret)
-		goto err_remove_vidi;
+		goto err_unregister_resources;
 
 	return 0;
 
-err_remove_vidi:
-#ifdef CONFIG_DRM_EXYNOS_VIDI
+err_unregister_resources:
+#ifdef CONFIG_DRM_EXYNOS_IPP
+	exynos_platform_device_ipp_unregister();
+#endif
+
+err_unregister_non_kms_drivers:
+	while (--j >= 0)
+		platform_driver_unregister(exynos_drm_non_kms_drivers[j]);
+
+err_unregister_kms_drivers:
+	while (--i >= 0)
+		platform_driver_unregister(exynos_drm_kms_drivers[i]);
+
 	exynos_drm_remove_vidi();
 
 err_unregister_pd:
-#endif
 	platform_device_unregister(exynos_drm_pdev);
 
 	return ret;
@@ -787,10 +713,22 @@ err_unregister_pd:
 
 static void exynos_drm_exit(void)
 {
+	int i;
+
+#ifdef CONFIG_DRM_EXYNOS_IPP
+	exynos_platform_device_ipp_unregister();
+#endif
+
+	for (i = ARRAY_SIZE(exynos_drm_non_kms_drivers) - 1; i >= 0; --i)
+		platform_driver_unregister(exynos_drm_non_kms_drivers[i]);
+
+	for (i = ARRAY_SIZE(exynos_drm_kms_drivers) - 1; i >= 0; --i)
+		platform_driver_unregister(exynos_drm_kms_drivers[i]);
+
 	platform_driver_unregister(&exynos_drm_platform_driver);
-#ifdef CONFIG_DRM_EXYNOS_VIDI
+
 	exynos_drm_remove_vidi();
-#endif
+
 	platform_device_unregister(exynos_drm_pdev);
 }
 
diff --git a/drivers/gpu/drm/exynos/exynos_drm_drv.h b/drivers/gpu/drm/exynos/exynos_drm_drv.h
index d22e640f59a0..2e5063488c50 100644
--- a/drivers/gpu/drm/exynos/exynos_drm_drv.h
+++ b/drivers/gpu/drm/exynos/exynos_drm_drv.h
@@ -15,6 +15,7 @@
 #ifndef _EXYNOS_DRM_DRV_H_
 #define _EXYNOS_DRM_DRV_H_
 
+#include <drm/drmP.h>
 #include <linux/module.h>
 
 #define MAX_CRTC	3
@@ -22,24 +23,6 @@
 #define MAX_FB_BUFFER	4
 #define DEFAULT_ZPOS	-1
 
-#define _wait_for(COND, MS) ({ \
-	unsigned long timeout__ = jiffies + msecs_to_jiffies(MS);	\
-	int ret__ = 0;							\
-	while (!(COND)) {						\
-		if (time_after(jiffies, timeout__)) {			\
-			ret__ = -ETIMEDOUT;				\
-			break;						\
-		}							\
-	}								\
-	ret__;								\
-})
-
-#define wait_for(COND, MS) _wait_for(COND, MS)
-
-struct drm_device;
-struct exynos_drm_overlay;
-struct drm_connector;
-
 /* This enumerates device type. */
 enum exynos_drm_device_type {
 	EXYNOS_DEVICE_TYPE_NONE,
@@ -83,10 +66,10 @@ enum exynos_drm_output_type {
  * @dma_addr: array of bus(accessed by dma) address to the memory region
  *	      allocated for a overlay.
  * @zpos: order of overlay layer(z position).
- * @default_win: a window to be enabled.
- * @color_key: color key on or off.
  * @index_color: if using color key feature then this value would be used
  *			as index color.
+ * @default_win: a window to be enabled.
+ * @color_key: color key on or off.
  * @local_path: in case of lcd type, local path mode on or off.
  * @transparency: transparency on or off.
  * @activated: activated or not.
@@ -114,19 +97,20 @@ struct exynos_drm_overlay {
 	uint32_t pixel_format;
 	dma_addr_t dma_addr[MAX_FB_BUFFER];
 	int zpos;
-
-	bool default_win;
-	bool color_key;
 	unsigned int index_color;
-	bool local_path;
-	bool transparency;
-	bool activated;
+
+	bool default_win:1;
+	bool color_key:1;
+	bool local_path:1;
+	bool transparency:1;
+	bool activated:1;
 };
 
 /*
  * Exynos DRM Display Structure.
  *	- this structure is common to analog tv, digital tv and lcd panel.
  *
+ * @create_connector: initialize and register a new connector
  * @remove: cleans up the display for removal
  * @mode_fixup: fix mode data comparing to hw specific display mode.
  * @mode_set: convert drm_display_mode to hw specific display mode and
@@ -168,7 +152,6 @@ struct exynos_drm_display {
 	struct drm_encoder *encoder;
 	struct drm_connector *connector;
 	struct exynos_drm_display_ops *ops;
-	void *ctx;
 };
 
 /*
@@ -227,7 +210,6 @@ struct exynos_drm_manager {
 	struct drm_crtc *crtc;
 	int pipe;
 	struct exynos_drm_manager_ops *ops;
-	void *ctx;
 };
 
 struct exynos_drm_g2d_private {
@@ -279,8 +261,6 @@ struct exynos_drm_private {
  * @dev: pointer to device object for subdrv device driver.
  * @drm_dev: pointer to drm_device and this pointer would be set
  *	when sub driver calls exynos_drm_subdrv_register().
- * @manager: subdrv has its own manager to control a hardware appropriately
- *     and we can access a hardware drawing on this manager.
  * @probe: this callback would be called by exynos drm driver after
  *     subdrv is registered to it.
  * @remove: this callback is used to release resources created
@@ -312,45 +292,34 @@ int exynos_drm_device_subdrv_remove(struct drm_device *dev);
 int exynos_drm_subdrv_open(struct drm_device *dev, struct drm_file *file);
 void exynos_drm_subdrv_close(struct drm_device *dev, struct drm_file *file);
 
-/*
- * this function registers exynos drm hdmi platform device. It ensures only one
- * instance of the device is created.
- */
-int exynos_platform_device_hdmi_register(void);
-
-/*
- * this function unregisters exynos drm hdmi platform device if it exists.
- */
-void exynos_platform_device_hdmi_unregister(void);
-
-/*
- * this function registers exynos drm ipp platform device.
- */
+#ifdef CONFIG_DRM_EXYNOS_IPP
 int exynos_platform_device_ipp_register(void);
-
-/*
- * this function unregisters exynos drm ipp platform device if it exists.
- */
 void exynos_platform_device_ipp_unregister(void);
+#else
+static inline int exynos_platform_device_ipp_register(void) { return 0; }
+static inline void exynos_platform_device_ipp_unregister(void) {}
+#endif
+
 
 #ifdef CONFIG_DRM_EXYNOS_DPI
 struct exynos_drm_display * exynos_dpi_probe(struct device *dev);
-int exynos_dpi_remove(struct device *dev);
+int exynos_dpi_remove(struct exynos_drm_display *display);
 #else
 static inline struct exynos_drm_display *
 exynos_dpi_probe(struct device *dev) { return NULL; }
-static inline int exynos_dpi_remove(struct device *dev) { return 0; }
+static inline int exynos_dpi_remove(struct exynos_drm_display *display)
+{
+	return 0;
+}
 #endif
 
-/*
- * this function registers exynos drm vidi platform device/driver.
- */
+#ifdef CONFIG_DRM_EXYNOS_VIDI
 int exynos_drm_probe_vidi(void);
-
-/*
- * this function unregister exynos drm vidi platform device/driver.
- */
 void exynos_drm_remove_vidi(void);
+#else
+static inline int exynos_drm_probe_vidi(void) { return 0; }
+static inline void exynos_drm_remove_vidi(void) {}
+#endif
 
 /* This function creates a encoder and a connector, and initializes them. */
 int exynos_drm_create_enc_conn(struct drm_device *dev,
diff --git a/drivers/gpu/drm/exynos/exynos_drm_dsi.c b/drivers/gpu/drm/exynos/exynos_drm_dsi.c
index acf7e9e39dcd..05fe93dc57a8 100644
--- a/drivers/gpu/drm/exynos/exynos_drm_dsi.c
+++ b/drivers/gpu/drm/exynos/exynos_drm_dsi.c
@@ -268,9 +268,9 @@ struct exynos_dsi_driver_data {
 };
 
 struct exynos_dsi {
+	struct exynos_drm_display display;
 	struct mipi_dsi_host dsi_host;
 	struct drm_connector connector;
-	struct drm_encoder *encoder;
 	struct device_node *panel_node;
 	struct drm_panel *panel;
 	struct device *dev;
@@ -304,6 +304,11 @@ struct exynos_dsi {
 #define host_to_dsi(host) container_of(host, struct exynos_dsi, dsi_host)
 #define connector_to_dsi(c) container_of(c, struct exynos_dsi, connector)
 
+static inline struct exynos_dsi *display_to_dsi(struct exynos_drm_display *d)
+{
+	return container_of(d, struct exynos_dsi, display);
+}
+
 static struct exynos_dsi_driver_data exynos3_dsi_driver_data = {
 	.plltmr_reg = 0x50,
 	.has_freqband = 1,
@@ -316,6 +321,11 @@ static struct exynos_dsi_driver_data exynos4_dsi_driver_data = {
 	.has_clklane_stop = 1,
 };
 
+static struct exynos_dsi_driver_data exynos4415_dsi_driver_data = {
+	.plltmr_reg = 0x58,
+	.has_clklane_stop = 1,
+};
+
 static struct exynos_dsi_driver_data exynos5_dsi_driver_data = {
 	.plltmr_reg = 0x58,
 };
@@ -325,6 +335,8 @@ static struct of_device_id exynos_dsi_of_match[] = {
 	  .data = &exynos3_dsi_driver_data },
 	{ .compatible = "samsung,exynos4210-mipi-dsi",
 	  .data = &exynos4_dsi_driver_data },
+	{ .compatible = "samsung,exynos4415-mipi-dsi",
+	  .data = &exynos4415_dsi_driver_data },
 	{ .compatible = "samsung,exynos5410-mipi-dsi",
 	  .data = &exynos5_dsi_driver_data },
 	{ }
@@ -1104,7 +1116,7 @@ static irqreturn_t exynos_dsi_irq(int irq, void *dev_id)
 static irqreturn_t exynos_dsi_te_irq_handler(int irq, void *dev_id)
 {
 	struct exynos_dsi *dsi = (struct exynos_dsi *)dev_id;
-	struct drm_encoder *encoder = dsi->encoder;
+	struct drm_encoder *encoder = dsi->display.encoder;
 
 	if (dsi->state & DSIM_STATE_ENABLED)
 		exynos_drm_crtc_te_handler(encoder->crtc);
@@ -1143,6 +1155,7 @@ static int exynos_dsi_init(struct exynos_dsi *dsi)
 static int exynos_dsi_register_te_irq(struct exynos_dsi *dsi)
 {
 	int ret;
+	int te_gpio_irq;
 
 	dsi->te_gpio = of_get_named_gpio(dsi->panel_node, "te-gpios", 0);
 	if (!gpio_is_valid(dsi->te_gpio)) {
@@ -1157,14 +1170,10 @@ static int exynos_dsi_register_te_irq(struct exynos_dsi *dsi)
 		goto out;
 	}
 
-	/*
-	 * This TE GPIO IRQ should not be set to IRQ_NOAUTOEN, because panel
-	 * calls drm_panel_init() first then calls mipi_dsi_attach() in probe().
-	 * It means that te_gpio is invalid when exynos_dsi_enable_irq() is
-	 * called by drm_panel_init() before panel is attached.
-	 */
-	ret = request_threaded_irq(gpio_to_irq(dsi->te_gpio),
-					exynos_dsi_te_irq_handler, NULL,
+	te_gpio_irq = gpio_to_irq(dsi->te_gpio);
+
+	irq_set_status_flags(te_gpio_irq, IRQ_NOAUTOEN);
+	ret = request_threaded_irq(te_gpio_irq, exynos_dsi_te_irq_handler, NULL,
 					IRQF_TRIGGER_RISING, "TE", dsi);
 	if (ret) {
 		dev_err(dsi->dev, "request interrupt failed with %d\n", ret);
@@ -1195,9 +1204,6 @@ static int exynos_dsi_host_attach(struct mipi_dsi_host *host,
 	dsi->mode_flags = device->mode_flags;
 	dsi->panel_node = device->dev.of_node;
 
-	if (dsi->connector.dev)
-		drm_helper_hpd_irq_event(dsi->connector.dev);
-
 	/*
 	 * This is a temporary solution and should be made by more generic way.
 	 *
@@ -1211,6 +1217,9 @@ static int exynos_dsi_host_attach(struct mipi_dsi_host *host,
 			return ret;
 	}
 
+	if (dsi->connector.dev)
+		drm_helper_hpd_irq_event(dsi->connector.dev);
+
 	return 0;
 }
 
@@ -1236,7 +1245,7 @@ static bool exynos_dsi_is_short_dsi_type(u8 type)
 }
 
 static ssize_t exynos_dsi_host_transfer(struct mipi_dsi_host *host,
-				       struct mipi_dsi_msg *msg)
+				        const struct mipi_dsi_msg *msg)
 {
 	struct exynos_dsi *dsi = host_to_dsi(host);
 	struct exynos_dsi_transfer xfer;
@@ -1369,16 +1378,17 @@ static int exynos_dsi_enable(struct exynos_dsi *dsi)
 	exynos_dsi_set_display_mode(dsi);
 	exynos_dsi_set_display_enable(dsi, true);
 
+	dsi->state |= DSIM_STATE_ENABLED;
+
 	ret = drm_panel_enable(dsi->panel);
 	if (ret < 0) {
+		dsi->state &= ~DSIM_STATE_ENABLED;
 		exynos_dsi_set_display_enable(dsi, false);
 		drm_panel_unprepare(dsi->panel);
 		exynos_dsi_poweroff(dsi);
 		return ret;
 	}
 
-	dsi->state |= DSIM_STATE_ENABLED;
-
 	return 0;
 }
 
@@ -1397,7 +1407,7 @@ static void exynos_dsi_disable(struct exynos_dsi *dsi)
 
 static void exynos_dsi_dpms(struct exynos_drm_display *display, int mode)
 {
-	struct exynos_dsi *dsi = display->ctx;
+	struct exynos_dsi *dsi = display_to_dsi(display);
 
 	if (dsi->panel) {
 		switch (mode) {
@@ -1474,7 +1484,7 @@ exynos_dsi_best_encoder(struct drm_connector *connector)
 {
 	struct exynos_dsi *dsi = connector_to_dsi(connector);
 
-	return dsi->encoder;
+	return dsi->display.encoder;
 }
 
 static struct drm_connector_helper_funcs exynos_dsi_connector_helper_funcs = {
@@ -1486,12 +1496,10 @@ static struct drm_connector_helper_funcs exynos_dsi_connector_helper_funcs = {
 static int exynos_dsi_create_connector(struct exynos_drm_display *display,
 				       struct drm_encoder *encoder)
 {
-	struct exynos_dsi *dsi = display->ctx;
+	struct exynos_dsi *dsi = display_to_dsi(display);
 	struct drm_connector *connector = &dsi->connector;
 	int ret;
 
-	dsi->encoder = encoder;
-
 	connector->polled = DRM_CONNECTOR_POLL_HPD;
 
 	ret = drm_connector_init(encoder->dev, connector,
@@ -1512,7 +1520,7 @@ static int exynos_dsi_create_connector(struct exynos_drm_display *display,
 static void exynos_dsi_mode_set(struct exynos_drm_display *display,
 			 struct drm_display_mode *mode)
 {
-	struct exynos_dsi *dsi = display->ctx;
+	struct exynos_dsi *dsi = display_to_dsi(display);
 	struct videomode *vm = &dsi->vm;
 
 	vm->hactive = mode->hdisplay;
@@ -1531,10 +1539,6 @@ static struct exynos_drm_display_ops exynos_dsi_display_ops = {
 	.dpms = exynos_dsi_dpms
 };
 
-static struct exynos_drm_display exynos_dsi_display = {
-	.type = EXYNOS_DISPLAY_TYPE_LCD,
-	.ops = &exynos_dsi_display_ops,
-};
 MODULE_DEVICE_TABLE(of, exynos_dsi_of_match);
 
 /* of_* functions will be removed after merge of of_graph patches */
@@ -1640,28 +1644,28 @@ end:
 static int exynos_dsi_bind(struct device *dev, struct device *master,
 				void *data)
 {
+	struct exynos_drm_display *display = dev_get_drvdata(dev);
+	struct exynos_dsi *dsi = display_to_dsi(display);
 	struct drm_device *drm_dev = data;
-	struct exynos_dsi *dsi;
 	int ret;
 
-	ret = exynos_drm_create_enc_conn(drm_dev, &exynos_dsi_display);
+	ret = exynos_drm_create_enc_conn(drm_dev, display);
 	if (ret) {
 		DRM_ERROR("Encoder create [%d] failed with %d\n",
-				exynos_dsi_display.type, ret);
+			  display->type, ret);
 		return ret;
 	}
 
-	dsi = exynos_dsi_display.ctx;
-
 	return mipi_dsi_host_register(&dsi->dsi_host);
 }
 
 static void exynos_dsi_unbind(struct device *dev, struct device *master,
 				void *data)
 {
-	struct exynos_dsi *dsi = exynos_dsi_display.ctx;
+	struct exynos_drm_display *display = dev_get_drvdata(dev);
+	struct exynos_dsi *dsi = display_to_dsi(display);
 
-	exynos_dsi_dpms(&exynos_dsi_display, DRM_MODE_DPMS_OFF);
+	exynos_dsi_dpms(display, DRM_MODE_DPMS_OFF);
 
 	mipi_dsi_host_unregister(&dsi->dsi_host);
 }
@@ -1673,22 +1677,23 @@ static const struct component_ops exynos_dsi_component_ops = {
 
 static int exynos_dsi_probe(struct platform_device *pdev)
 {
+	struct device *dev = &pdev->dev;
 	struct resource *res;
 	struct exynos_dsi *dsi;
 	int ret;
 
-	ret = exynos_drm_component_add(&pdev->dev, EXYNOS_DEVICE_TYPE_CONNECTOR,
-					exynos_dsi_display.type);
+	dsi = devm_kzalloc(dev, sizeof(*dsi), GFP_KERNEL);
+	if (!dsi)
+		return -ENOMEM;
+
+	dsi->display.type = EXYNOS_DISPLAY_TYPE_LCD;
+	dsi->display.ops = &exynos_dsi_display_ops;
+
+	ret = exynos_drm_component_add(dev, EXYNOS_DEVICE_TYPE_CONNECTOR,
+				       dsi->display.type);
 	if (ret)
 		return ret;
 
-	dsi = devm_kzalloc(&pdev->dev, sizeof(*dsi), GFP_KERNEL);
-	if (!dsi) {
-		dev_err(&pdev->dev, "failed to allocate dsi object.\n");
-		ret = -ENOMEM;
-		goto err_del_component;
-	}
-
 	/* To be checked as invalid one */
 	dsi->te_gpio = -ENOENT;
 
@@ -1697,9 +1702,9 @@ static int exynos_dsi_probe(struct platform_device *pdev)
 	INIT_LIST_HEAD(&dsi->transfer_list);
 
 	dsi->dsi_host.ops = &exynos_dsi_ops;
-	dsi->dsi_host.dev = &pdev->dev;
+	dsi->dsi_host.dev = dev;
 
-	dsi->dev = &pdev->dev;
+	dsi->dev = dev;
 	dsi->driver_data = exynos_dsi_get_driver_data(pdev);
 
 	ret = exynos_dsi_parse_dt(dsi);
@@ -1708,70 +1713,68 @@ static int exynos_dsi_probe(struct platform_device *pdev)
 
 	dsi->supplies[0].supply = "vddcore";
 	dsi->supplies[1].supply = "vddio";
-	ret = devm_regulator_bulk_get(&pdev->dev, ARRAY_SIZE(dsi->supplies),
+	ret = devm_regulator_bulk_get(dev, ARRAY_SIZE(dsi->supplies),
 				      dsi->supplies);
 	if (ret) {
-		dev_info(&pdev->dev, "failed to get regulators: %d\n", ret);
+		dev_info(dev, "failed to get regulators: %d\n", ret);
 		return -EPROBE_DEFER;
 	}
 
-	dsi->pll_clk = devm_clk_get(&pdev->dev, "pll_clk");
+	dsi->pll_clk = devm_clk_get(dev, "pll_clk");
 	if (IS_ERR(dsi->pll_clk)) {
-		dev_info(&pdev->dev, "failed to get dsi pll input clock\n");
+		dev_info(dev, "failed to get dsi pll input clock\n");
 		ret = PTR_ERR(dsi->pll_clk);
 		goto err_del_component;
 	}
 
-	dsi->bus_clk = devm_clk_get(&pdev->dev, "bus_clk");
+	dsi->bus_clk = devm_clk_get(dev, "bus_clk");
 	if (IS_ERR(dsi->bus_clk)) {
-		dev_info(&pdev->dev, "failed to get dsi bus clock\n");
+		dev_info(dev, "failed to get dsi bus clock\n");
 		ret = PTR_ERR(dsi->bus_clk);
 		goto err_del_component;
 	}
 
 	res = platform_get_resource(pdev, IORESOURCE_MEM, 0);
-	dsi->reg_base = devm_ioremap_resource(&pdev->dev, res);
+	dsi->reg_base = devm_ioremap_resource(dev, res);
 	if (IS_ERR(dsi->reg_base)) {
-		dev_err(&pdev->dev, "failed to remap io region\n");
+		dev_err(dev, "failed to remap io region\n");
 		ret = PTR_ERR(dsi->reg_base);
 		goto err_del_component;
 	}
 
-	dsi->phy = devm_phy_get(&pdev->dev, "dsim");
+	dsi->phy = devm_phy_get(dev, "dsim");
 	if (IS_ERR(dsi->phy)) {
-		dev_info(&pdev->dev, "failed to get dsim phy\n");
+		dev_info(dev, "failed to get dsim phy\n");
 		ret = PTR_ERR(dsi->phy);
 		goto err_del_component;
 	}
 
 	dsi->irq = platform_get_irq(pdev, 0);
 	if (dsi->irq < 0) {
-		dev_err(&pdev->dev, "failed to request dsi irq resource\n");
+		dev_err(dev, "failed to request dsi irq resource\n");
 		ret = dsi->irq;
 		goto err_del_component;
 	}
 
 	irq_set_status_flags(dsi->irq, IRQ_NOAUTOEN);
-	ret = devm_request_threaded_irq(&pdev->dev, dsi->irq, NULL,
+	ret = devm_request_threaded_irq(dev, dsi->irq, NULL,
 					exynos_dsi_irq, IRQF_ONESHOT,
-					dev_name(&pdev->dev), dsi);
+					dev_name(dev), dsi);
 	if (ret) {
-		dev_err(&pdev->dev, "failed to request dsi irq\n");
+		dev_err(dev, "failed to request dsi irq\n");
 		goto err_del_component;
 	}
 
-	exynos_dsi_display.ctx = dsi;
-
-	platform_set_drvdata(pdev, &exynos_dsi_display);
+	platform_set_drvdata(pdev, &dsi->display);
 
-	ret = component_add(&pdev->dev, &exynos_dsi_component_ops);
+	ret = component_add(dev, &exynos_dsi_component_ops);
 	if (ret)
 		goto err_del_component;
 
 	return ret;
 
 err_del_component:
-	exynos_drm_component_del(&pdev->dev, EXYNOS_DEVICE_TYPE_CONNECTOR);
+	exynos_drm_component_del(dev, EXYNOS_DEVICE_TYPE_CONNECTOR);
 	return ret;
 }
 
diff --git a/drivers/gpu/drm/exynos/exynos_drm_encoder.h b/drivers/gpu/drm/exynos/exynos_drm_encoder.h
index b7a1620a7e79..26305d8dd93a 100644
--- a/drivers/gpu/drm/exynos/exynos_drm_encoder.h
+++ b/drivers/gpu/drm/exynos/exynos_drm_encoder.h
@@ -14,8 +14,6 @@
 #ifndef _EXYNOS_DRM_ENCODER_H_
 #define _EXYNOS_DRM_ENCODER_H_
 
-struct exynos_drm_manager;
-
 void exynos_drm_encoder_setup(struct drm_device *dev);
 struct drm_encoder *exynos_drm_encoder_create(struct drm_device *dev,
 			struct exynos_drm_display *mgr,
diff --git a/drivers/gpu/drm/exynos/exynos_drm_fimd.c b/drivers/gpu/drm/exynos/exynos_drm_fimd.c
index 085b066a9993..e5810d13bf9c 100644
--- a/drivers/gpu/drm/exynos/exynos_drm_fimd.c
+++ b/drivers/gpu/drm/exynos/exynos_drm_fimd.c
@@ -84,8 +84,6 @@
 /* FIMD has totally five hardware windows. */
 #define WINDOWS_NR	5
 
-#define get_fimd_manager(mgr)	platform_get_drvdata(to_platform_device(dev))
-
 struct fimd_driver_data {
 	unsigned int timing_base;
 	unsigned int lcdblk_offset;
@@ -96,6 +94,7 @@ struct fimd_driver_data {
 	unsigned int has_clksel:1;
 	unsigned int has_limited_fmt:1;
 	unsigned int has_vidoutcon:1;
+	unsigned int has_vtsel:1;
 };
 
 static struct fimd_driver_data s3c64xx_fimd_driver_data = {
@@ -118,6 +117,17 @@ static struct fimd_driver_data exynos4_fimd_driver_data = {
 	.lcdblk_vt_shift = 10,
 	.lcdblk_bypass_shift = 1,
 	.has_shadowcon = 1,
+	.has_vtsel = 1,
+};
+
+static struct fimd_driver_data exynos4415_fimd_driver_data = {
+	.timing_base = 0x20000,
+	.lcdblk_offset = 0x210,
+	.lcdblk_vt_shift = 10,
+	.lcdblk_bypass_shift = 1,
+	.has_shadowcon = 1,
+	.has_vidoutcon = 1,
+	.has_vtsel = 1,
 };
 
 static struct fimd_driver_data exynos5_fimd_driver_data = {
@@ -127,6 +137,7 @@ static struct fimd_driver_data exynos5_fimd_driver_data = {
 	.lcdblk_bypass_shift = 15,
 	.has_shadowcon = 1,
 	.has_vidoutcon = 1,
+	.has_vtsel = 1,
 };
 
 struct fimd_win_data {
@@ -146,6 +157,7 @@ struct fimd_win_data {
 };
 
 struct fimd_context {
+	struct exynos_drm_manager	manager;
 	struct device			*dev;
 	struct drm_device		*drm_dev;
 	struct clk			*bus_clk;
@@ -173,6 +185,11 @@ struct fimd_context {
 	struct exynos_drm_display *display;
 };
 
+static inline struct fimd_context *mgr_to_fimd(struct exynos_drm_manager *mgr)
+{
+	return container_of(mgr, struct fimd_context, manager);
+}
+
 static const struct of_device_id fimd_driver_dt_match[] = {
 	{ .compatible = "samsung,s3c6400-fimd",
 	  .data = &s3c64xx_fimd_driver_data },
@@ -180,6 +197,8 @@ static const struct of_device_id fimd_driver_dt_match[] = {
 	  .data = &exynos3_fimd_driver_data },
 	{ .compatible = "samsung,exynos4210-fimd",
 	  .data = &exynos4_fimd_driver_data },
+	{ .compatible = "samsung,exynos4415-fimd",
+	  .data = &exynos4415_fimd_driver_data },
 	{ .compatible = "samsung,exynos5250-fimd",
 	  .data = &exynos5_fimd_driver_data },
 	{},
@@ -197,7 +216,7 @@ static inline struct fimd_driver_data *drm_fimd_get_driver_data(
 
 static void fimd_wait_for_vblank(struct exynos_drm_manager *mgr)
 {
-	struct fimd_context *ctx = mgr->ctx;
+	struct fimd_context *ctx = mgr_to_fimd(mgr);
 
 	if (ctx->suspended)
 		return;
@@ -214,9 +233,35 @@ static void fimd_wait_for_vblank(struct exynos_drm_manager *mgr)
 		DRM_DEBUG_KMS("vblank wait timed out.\n");
 }
 
+static void fimd_enable_video_output(struct fimd_context *ctx, int win,
+					bool enable)
+{
+	u32 val = readl(ctx->regs + WINCON(win));
+
+	if (enable)
+		val |= WINCONx_ENWIN;
+	else
+		val &= ~WINCONx_ENWIN;
+
+	writel(val, ctx->regs + WINCON(win));
+}
+
+static void fimd_enable_shadow_channel_path(struct fimd_context *ctx, int win,
+						bool enable)
+{
+	u32 val = readl(ctx->regs + SHADOWCON);
+
+	if (enable)
+		val |= SHADOWCON_CHx_ENABLE(win);
+	else
+		val &= ~SHADOWCON_CHx_ENABLE(win);
+
+	writel(val, ctx->regs + SHADOWCON);
+}
+
 static void fimd_clear_channel(struct exynos_drm_manager *mgr)
 {
-	struct fimd_context *ctx = mgr->ctx;
+	struct fimd_context *ctx = mgr_to_fimd(mgr);
 	int win, ch_enabled = 0;
 
 	DRM_DEBUG_KMS("%s\n", __FILE__);
@@ -226,16 +271,12 @@ static void fimd_clear_channel(struct exynos_drm_manager *mgr)
 		u32 val = readl(ctx->regs + WINCON(win));
 
 		if (val & WINCONx_ENWIN) {
-			/* wincon */
-			val &= ~WINCONx_ENWIN;
-			writel(val, ctx->regs + WINCON(win));
-
-			/* unprotect windows */
-			if (ctx->driver_data->has_shadowcon) {
-				val = readl(ctx->regs + SHADOWCON);
-				val &= ~SHADOWCON_CHx_ENABLE(win);
-				writel(val, ctx->regs + SHADOWCON);
-			}
+			fimd_enable_video_output(ctx, win, false);
+
+			if (ctx->driver_data->has_shadowcon)
+				fimd_enable_shadow_channel_path(ctx, win,
+								false);
+
 			ch_enabled = 1;
 		}
 	}
@@ -253,7 +294,7 @@ static void fimd_clear_channel(struct exynos_drm_manager *mgr)
 static int fimd_mgr_initialize(struct exynos_drm_manager *mgr,
 			struct drm_device *drm_dev)
 {
-	struct fimd_context *ctx = mgr->ctx;
+	struct fimd_context *ctx = mgr_to_fimd(mgr);
 	struct exynos_drm_private *priv;
 	priv = drm_dev->dev_private;
 
@@ -275,7 +316,7 @@ static int fimd_mgr_initialize(struct exynos_drm_manager *mgr,
 
 static void fimd_mgr_remove(struct exynos_drm_manager *mgr)
 {
-	struct fimd_context *ctx = mgr->ctx;
+	struct fimd_context *ctx = mgr_to_fimd(mgr);
 
 	/* detach this sub driver from iommu mapping if supported. */
 	if (is_drm_iommu_supported(ctx->drm_dev))
@@ -315,14 +356,14 @@ static bool fimd_mode_fixup(struct exynos_drm_manager *mgr,
 static void fimd_mode_set(struct exynos_drm_manager *mgr,
 		const struct drm_display_mode *in_mode)
 {
-	struct fimd_context *ctx = mgr->ctx;
+	struct fimd_context *ctx = mgr_to_fimd(mgr);
 
 	drm_mode_copy(&ctx->mode, in_mode);
 }
 
 static void fimd_commit(struct exynos_drm_manager *mgr)
 {
-	struct fimd_context *ctx = mgr->ctx;
+	struct fimd_context *ctx = mgr_to_fimd(mgr);
 	struct drm_display_mode *mode = &ctx->mode;
 	struct fimd_driver_data *driver_data = ctx->driver_data;
 	void *timing_base = ctx->regs + driver_data->timing_base;
@@ -343,7 +384,8 @@ static void fimd_commit(struct exynos_drm_manager *mgr)
 		writel(0, timing_base + I80IFCONFBx(0));
 
 		/* set video type selection to I80 interface */
-		if (ctx->sysreg && regmap_update_bits(ctx->sysreg,
+		if (driver_data->has_vtsel && ctx->sysreg &&
+				regmap_update_bits(ctx->sysreg,
 					driver_data->lcdblk_offset,
 					0x3 << driver_data->lcdblk_vt_shift,
 					0x1 << driver_data->lcdblk_vt_shift)) {
@@ -421,7 +463,7 @@ static void fimd_commit(struct exynos_drm_manager *mgr)
 
 static int fimd_enable_vblank(struct exynos_drm_manager *mgr)
 {
-	struct fimd_context *ctx = mgr->ctx;
+	struct fimd_context *ctx = mgr_to_fimd(mgr);
 	u32 val;
 
 	if (ctx->suspended)
@@ -431,12 +473,19 @@ static int fimd_enable_vblank(struct exynos_drm_manager *mgr)
 		val = readl(ctx->regs + VIDINTCON0);
 
 		val |= VIDINTCON0_INT_ENABLE;
-		val |= VIDINTCON0_INT_FRAME;
 
-		val &= ~VIDINTCON0_FRAMESEL0_MASK;
-		val |= VIDINTCON0_FRAMESEL0_VSYNC;
-		val &= ~VIDINTCON0_FRAMESEL1_MASK;
-		val |= VIDINTCON0_FRAMESEL1_NONE;
+		if (ctx->i80_if) {
+			val |= VIDINTCON0_INT_I80IFDONE;
+			val |= VIDINTCON0_INT_SYSMAINCON;
+			val &= ~VIDINTCON0_INT_SYSSUBCON;
+		} else {
+			val |= VIDINTCON0_INT_FRAME;
+
+			val &= ~VIDINTCON0_FRAMESEL0_MASK;
+			val |= VIDINTCON0_FRAMESEL0_VSYNC;
+			val &= ~VIDINTCON0_FRAMESEL1_MASK;
+			val |= VIDINTCON0_FRAMESEL1_NONE;
+		}
 
 		writel(val, ctx->regs + VIDINTCON0);
 	}
@@ -446,7 +495,7 @@ static int fimd_enable_vblank(struct exynos_drm_manager *mgr)
 
 static void fimd_disable_vblank(struct exynos_drm_manager *mgr)
 {
-	struct fimd_context *ctx = mgr->ctx;
+	struct fimd_context *ctx = mgr_to_fimd(mgr);
 	u32 val;
 
 	if (ctx->suspended)
@@ -455,9 +504,15 @@ static void fimd_disable_vblank(struct exynos_drm_manager *mgr)
 	if (test_and_clear_bit(0, &ctx->irq_flags)) {
 		val = readl(ctx->regs + VIDINTCON0);
 
-		val &= ~VIDINTCON0_INT_FRAME;
 		val &= ~VIDINTCON0_INT_ENABLE;
 
+		if (ctx->i80_if) {
+			val &= ~VIDINTCON0_INT_I80IFDONE;
+			val &= ~VIDINTCON0_INT_SYSMAINCON;
+			val &= ~VIDINTCON0_INT_SYSSUBCON;
+		} else
+			val &= ~VIDINTCON0_INT_FRAME;
+
 		writel(val, ctx->regs + VIDINTCON0);
 	}
 }
@@ -465,7 +520,7 @@ static void fimd_disable_vblank(struct exynos_drm_manager *mgr)
 static void fimd_win_mode_set(struct exynos_drm_manager *mgr,
 			struct exynos_drm_overlay *overlay)
 {
-	struct fimd_context *ctx = mgr->ctx;
+	struct fimd_context *ctx = mgr_to_fimd(mgr);
 	struct fimd_win_data *win_data;
 	int win;
 	unsigned long offset;
@@ -623,7 +678,7 @@ static void fimd_shadow_protect_win(struct fimd_context *ctx,
 
 static void fimd_win_commit(struct exynos_drm_manager *mgr, int zpos)
 {
-	struct fimd_context *ctx = mgr->ctx;
+	struct fimd_context *ctx = mgr_to_fimd(mgr);
 	struct fimd_win_data *win_data;
 	int win = zpos;
 	unsigned long val, alpha, size;
@@ -730,20 +785,14 @@ static void fimd_win_commit(struct exynos_drm_manager *mgr, int zpos)
 	if (win != 0)
 		fimd_win_set_colkey(ctx, win);
 
-	/* wincon */
-	val = readl(ctx->regs + WINCON(win));
-	val |= WINCONx_ENWIN;
-	writel(val, ctx->regs + WINCON(win));
+	fimd_enable_video_output(ctx, win, true);
+
+	if (ctx->driver_data->has_shadowcon)
+		fimd_enable_shadow_channel_path(ctx, win, true);
 
 	/* Enable DMA channel and unprotect windows */
 	fimd_shadow_protect_win(ctx, win, false);
 
-	if (ctx->driver_data->has_shadowcon) {
-		val = readl(ctx->regs + SHADOWCON);
-		val |= SHADOWCON_CHx_ENABLE(win);
-		writel(val, ctx->regs + SHADOWCON);
-	}
-
 	win_data->enabled = true;
 
 	if (ctx->i80_if)
@@ -752,10 +801,9 @@ static void fimd_win_commit(struct exynos_drm_manager *mgr, int zpos)
 
 static void fimd_win_disable(struct exynos_drm_manager *mgr, int zpos)
 {
-	struct fimd_context *ctx = mgr->ctx;
+	struct fimd_context *ctx = mgr_to_fimd(mgr);
 	struct fimd_win_data *win_data;
 	int win = zpos;
-	u32 val;
 
 	if (win == DEFAULT_ZPOS)
 		win = ctx->default_win;
@@ -774,18 +822,12 @@ static void fimd_win_disable(struct exynos_drm_manager *mgr, int zpos)
 	/* protect windows */
 	fimd_shadow_protect_win(ctx, win, true);
 
-	/* wincon */
-	val = readl(ctx->regs + WINCON(win));
-	val &= ~WINCONx_ENWIN;
-	writel(val, ctx->regs + WINCON(win));
+	fimd_enable_video_output(ctx, win, false);
 
-	/* unprotect windows */
-	if (ctx->driver_data->has_shadowcon) {
-		val = readl(ctx->regs + SHADOWCON);
-		val &= ~SHADOWCON_CHx_ENABLE(win);
-		writel(val, ctx->regs + SHADOWCON);
-	}
+	if (ctx->driver_data->has_shadowcon)
+		fimd_enable_shadow_channel_path(ctx, win, false);
 
+	/* unprotect windows */
 	fimd_shadow_protect_win(ctx, win, false);
 
 	win_data->enabled = false;
@@ -793,7 +835,7 @@ static void fimd_win_disable(struct exynos_drm_manager *mgr, int zpos)
 
 static void fimd_window_suspend(struct exynos_drm_manager *mgr)
 {
-	struct fimd_context *ctx = mgr->ctx;
+	struct fimd_context *ctx = mgr_to_fimd(mgr);
 	struct fimd_win_data *win_data;
 	int i;
 
@@ -803,12 +845,11 @@ static void fimd_window_suspend(struct exynos_drm_manager *mgr)
 		if (win_data->enabled)
 			fimd_win_disable(mgr, i);
 	}
-	fimd_wait_for_vblank(mgr);
 }
 
 static void fimd_window_resume(struct exynos_drm_manager *mgr)
 {
-	struct fimd_context *ctx = mgr->ctx;
+	struct fimd_context *ctx = mgr_to_fimd(mgr);
 	struct fimd_win_data *win_data;
 	int i;
 
@@ -821,7 +862,7 @@ static void fimd_window_resume(struct exynos_drm_manager *mgr)
 
 static void fimd_apply(struct exynos_drm_manager *mgr)
 {
-	struct fimd_context *ctx = mgr->ctx;
+	struct fimd_context *ctx = mgr_to_fimd(mgr);
 	struct fimd_win_data *win_data;
 	int i;
 
@@ -838,7 +879,7 @@ static void fimd_apply(struct exynos_drm_manager *mgr)
 
 static int fimd_poweron(struct exynos_drm_manager *mgr)
 {
-	struct fimd_context *ctx = mgr->ctx;
+	struct fimd_context *ctx = mgr_to_fimd(mgr);
 	int ret;
 
 	if (!ctx->suspended)
@@ -886,7 +927,7 @@ bus_clk_err:
 
 static int fimd_poweroff(struct exynos_drm_manager *mgr)
 {
-	struct fimd_context *ctx = mgr->ctx;
+	struct fimd_context *ctx = mgr_to_fimd(mgr);
 
 	if (ctx->suspended)
 		return 0;
@@ -928,39 +969,41 @@ static void fimd_dpms(struct exynos_drm_manager *mgr, int mode)
 
 static void fimd_trigger(struct device *dev)
 {
-	struct exynos_drm_manager *mgr = get_fimd_manager(dev);
-	struct fimd_context *ctx = mgr->ctx;
+	struct fimd_context *ctx = dev_get_drvdata(dev);
 	struct fimd_driver_data *driver_data = ctx->driver_data;
 	void *timing_base = ctx->regs + driver_data->timing_base;
 	u32 reg;
 
-	atomic_set(&ctx->triggering, 1);
+	 /*
+	  * Skips triggering if in triggering state, because multiple triggering
+	  * requests can cause panel reset.
+	  */
+	if (atomic_read(&ctx->triggering))
+		return;
 
-	reg = readl(ctx->regs + VIDINTCON0);
-	reg |= (VIDINTCON0_INT_ENABLE | VIDINTCON0_INT_I80IFDONE |
-						VIDINTCON0_INT_SYSMAINCON);
-	writel(reg, ctx->regs + VIDINTCON0);
+	/* Enters triggering mode */
+	atomic_set(&ctx->triggering, 1);
 
 	reg = readl(timing_base + TRIGCON);
 	reg |= (TRGMODE_I80_RGB_ENABLE_I80 | SWTRGCMD_I80_RGB_ENABLE);
 	writel(reg, timing_base + TRIGCON);
+
+	/*
+	 * Exits triggering mode if vblank is not enabled yet, because when the
+	 * VIDINTCON0 register is not set, it can not exit from triggering mode.
+	 */
+	if (!test_bit(0, &ctx->irq_flags))
+		atomic_set(&ctx->triggering, 0);
 }
 
 static void fimd_te_handler(struct exynos_drm_manager *mgr)
 {
-	struct fimd_context *ctx = mgr->ctx;
+	struct fimd_context *ctx = mgr_to_fimd(mgr);
 
 	/* Checks the crtc is detached already from encoder */
 	if (ctx->pipe < 0 || !ctx->drm_dev)
 		return;
 
-	 /*
-	 * Skips to trigger if in triggering state, because multiple triggering
-	 * requests can cause panel reset.
-	 */
-	if (atomic_read(&ctx->triggering))
-		return;
-
 	/*
 	 * If there is a page flip request, triggers and handles the page flip
 	 * event so that current fb can be updated into panel GRAM.
@@ -972,10 +1015,10 @@ static void fimd_te_handler(struct exynos_drm_manager *mgr)
 	if (atomic_read(&ctx->wait_vsync_event)) {
 		atomic_set(&ctx->wait_vsync_event, 0);
 		wake_up(&ctx->wait_vsync_queue);
-
-		if (!atomic_read(&ctx->triggering))
-			drm_handle_vblank(ctx->drm_dev, ctx->pipe);
 	}
+
+	if (test_bit(0, &ctx->irq_flags))
+		drm_handle_vblank(ctx->drm_dev, ctx->pipe);
 }
 
 static struct exynos_drm_manager_ops fimd_manager_ops = {
@@ -992,11 +1035,6 @@ static struct exynos_drm_manager_ops fimd_manager_ops = {
 	.te_handler = fimd_te_handler,
 };
 
-static struct exynos_drm_manager fimd_manager = {
-	.type = EXYNOS_DISPLAY_TYPE_LCD,
-	.ops = &fimd_manager_ops,
-};
-
 static irqreturn_t fimd_irq_handler(int irq, void *dev_id)
 {
 	struct fimd_context *ctx = (struct fimd_context *)dev_id;
@@ -1013,16 +1051,10 @@ static irqreturn_t fimd_irq_handler(int irq, void *dev_id)
 		goto out;
 
 	if (ctx->i80_if) {
-		/* unset I80 frame done interrupt */
-		val = readl(ctx->regs + VIDINTCON0);
-		val &= ~(VIDINTCON0_INT_I80IFDONE | VIDINTCON0_INT_SYSMAINCON);
-		writel(val, ctx->regs + VIDINTCON0);
+		exynos_drm_crtc_finish_pageflip(ctx->drm_dev, ctx->pipe);
 
-		/* exit triggering mode */
+		/* Exits triggering mode */
 		atomic_set(&ctx->triggering, 0);
-
-		drm_handle_vblank(ctx->drm_dev, ctx->pipe);
-		exynos_drm_crtc_finish_pageflip(ctx->drm_dev, ctx->pipe);
 	} else {
 		drm_handle_vblank(ctx->drm_dev, ctx->pipe);
 		exynos_drm_crtc_finish_pageflip(ctx->drm_dev, ctx->pipe);
@@ -1040,11 +1072,11 @@ out:
 
 static int fimd_bind(struct device *dev, struct device *master, void *data)
 {
-	struct fimd_context *ctx = fimd_manager.ctx;
+	struct fimd_context *ctx = dev_get_drvdata(dev);
 	struct drm_device *drm_dev = data;
 
-	fimd_mgr_initialize(&fimd_manager, drm_dev);
-	exynos_drm_crtc_create(&fimd_manager);
+	fimd_mgr_initialize(&ctx->manager, drm_dev);
+	exynos_drm_crtc_create(&ctx->manager);
 	if (ctx->display)
 		exynos_drm_create_enc_conn(drm_dev, ctx->display);
 
@@ -1055,15 +1087,14 @@ static int fimd_bind(struct device *dev, struct device *master, void *data)
 static void fimd_unbind(struct device *dev, struct device *master,
 			void *data)
 {
-	struct exynos_drm_manager *mgr = dev_get_drvdata(dev);
-	struct fimd_context *ctx = fimd_manager.ctx;
+	struct fimd_context *ctx = dev_get_drvdata(dev);
 
-	fimd_dpms(mgr, DRM_MODE_DPMS_OFF);
+	fimd_dpms(&ctx->manager, DRM_MODE_DPMS_OFF);
 
 	if (ctx->display)
-		exynos_dpi_remove(dev);
+		exynos_dpi_remove(ctx->display);
 
-	fimd_mgr_remove(mgr);
+	fimd_mgr_remove(&ctx->manager);
 }
 
 static const struct component_ops fimd_component_ops = {
@@ -1079,21 +1110,20 @@ static int fimd_probe(struct platform_device *pdev)
 	struct resource *res;
 	int ret = -EINVAL;
 
-	ret = exynos_drm_component_add(&pdev->dev, EXYNOS_DEVICE_TYPE_CRTC,
-					fimd_manager.type);
-	if (ret)
-		return ret;
-
-	if (!dev->of_node) {
-		ret = -ENODEV;
-		goto err_del_component;
-	}
+	if (!dev->of_node)
+		return -ENODEV;
 
 	ctx = devm_kzalloc(dev, sizeof(*ctx), GFP_KERNEL);
-	if (!ctx) {
-		ret = -ENOMEM;
-		goto err_del_component;
-	}
+	if (!ctx)
+		return -ENOMEM;
+
+	ctx->manager.type = EXYNOS_DISPLAY_TYPE_LCD;
+	ctx->manager.ops = &fimd_manager_ops;
+
+	ret = exynos_drm_component_add(dev, EXYNOS_DEVICE_TYPE_CRTC,
+				       ctx->manager.type);
+	if (ret)
+		return ret;
 
 	ctx->dev = dev;
 	ctx->suspended = true;
@@ -1182,27 +1212,27 @@ static int fimd_probe(struct platform_device *pdev)
 	init_waitqueue_head(&ctx->wait_vsync_queue);
 	atomic_set(&ctx->wait_vsync_event, 0);
 
-	platform_set_drvdata(pdev, &fimd_manager);
-
-	fimd_manager.ctx = ctx;
+	platform_set_drvdata(pdev, ctx);
 
 	ctx->display = exynos_dpi_probe(dev);
-	if (IS_ERR(ctx->display))
-		return PTR_ERR(ctx->display);
+	if (IS_ERR(ctx->display)) {
+		ret = PTR_ERR(ctx->display);
+		goto err_del_component;
+	}
 
-	pm_runtime_enable(&pdev->dev);
+	pm_runtime_enable(dev);
 
-	ret = component_add(&pdev->dev, &fimd_component_ops);
+	ret = component_add(dev, &fimd_component_ops);
 	if (ret)
 		goto err_disable_pm_runtime;
 
 	return ret;
 
 err_disable_pm_runtime:
-	pm_runtime_disable(&pdev->dev);
+	pm_runtime_disable(dev);
 
 err_del_component:
-	exynos_drm_component_del(&pdev->dev, EXYNOS_DEVICE_TYPE_CRTC);
+	exynos_drm_component_del(dev, EXYNOS_DEVICE_TYPE_CRTC);
 	return ret;
 }
 
diff --git a/drivers/gpu/drm/exynos/exynos_drm_iommu.h b/drivers/gpu/drm/exynos/exynos_drm_iommu.h
index 72376d41c512..35d25889b476 100644
--- a/drivers/gpu/drm/exynos/exynos_drm_iommu.h
+++ b/drivers/gpu/drm/exynos/exynos_drm_iommu.h
@@ -40,7 +40,6 @@ static inline bool is_drm_iommu_supported(struct drm_device *drm_dev)
 
 #else
 
-struct dma_iommu_mapping;
 static inline int drm_create_iommu_mapping(struct drm_device *drm_dev)
 {
 	return 0;
diff --git a/drivers/gpu/drm/exynos/exynos_drm_ipp.c b/drivers/gpu/drm/exynos/exynos_drm_ipp.c
index 00d74b18f7cb..d5ad17dfc24d 100644
--- a/drivers/gpu/drm/exynos/exynos_drm_ipp.c
+++ b/drivers/gpu/drm/exynos/exynos_drm_ipp.c
@@ -426,18 +426,21 @@ int exynos_drm_ipp_set_property(struct drm_device *drm_dev, void *data,
 	c_node->start_work = ipp_create_cmd_work();
 	if (IS_ERR(c_node->start_work)) {
 		DRM_ERROR("failed to create start work.\n");
+		ret = PTR_ERR(c_node->start_work);
 		goto err_remove_id;
 	}
 
 	c_node->stop_work = ipp_create_cmd_work();
 	if (IS_ERR(c_node->stop_work)) {
 		DRM_ERROR("failed to create stop work.\n");
+		ret = PTR_ERR(c_node->stop_work);
 		goto err_free_start;
 	}
 
 	c_node->event_work = ipp_create_event_work();
 	if (IS_ERR(c_node->event_work)) {
 		DRM_ERROR("failed to create event work.\n");
+		ret = PTR_ERR(c_node->event_work);
 		goto err_free_stop;
 	}
 
diff --git a/drivers/gpu/drm/exynos/exynos_drm_vidi.c b/drivers/gpu/drm/exynos/exynos_drm_vidi.c
index 50faf913e574..45899fb63272 100644
--- a/drivers/gpu/drm/exynos/exynos_drm_vidi.c
+++ b/drivers/gpu/drm/exynos/exynos_drm_vidi.c
@@ -14,6 +14,7 @@
 
 #include <linux/kernel.h>
 #include <linux/platform_device.h>
+#include <linux/component.h>
 
 #include <drm/exynos_drm.h>
 
@@ -28,7 +29,6 @@
 /* vidi has totally three virtual windows. */
 #define WINDOWS_NR		3
 
-#define get_vidi_mgr(dev)	platform_get_drvdata(to_platform_device(dev))
 #define ctx_from_connector(c)	container_of(c, struct vidi_context, \
 					connector)
 
@@ -47,11 +47,13 @@ struct vidi_win_data {
 };
 
 struct vidi_context {
+	struct exynos_drm_manager	manager;
+	struct exynos_drm_display	display;
+	struct platform_device		*pdev;
 	struct drm_device		*drm_dev;
 	struct drm_crtc			*crtc;
 	struct drm_encoder		*encoder;
 	struct drm_connector		connector;
-	struct exynos_drm_subdrv	subdrv;
 	struct vidi_win_data		win_data[WINDOWS_NR];
 	struct edid			*raw_edid;
 	unsigned int			clkdiv;
@@ -66,6 +68,16 @@ struct vidi_context {
 	int				pipe;
 };
 
+static inline struct vidi_context *manager_to_vidi(struct exynos_drm_manager *m)
+{
+	return container_of(m, struct vidi_context, manager);
+}
+
+static inline struct vidi_context *display_to_vidi(struct exynos_drm_display *d)
+{
+	return container_of(d, struct vidi_context, display);
+}
+
 static const char fake_edid_info[] = {
 	0x00, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0x00, 0x4c, 0x2d, 0x05, 0x05,
 	0x00, 0x00, 0x00, 0x00, 0x30, 0x12, 0x01, 0x03, 0x80, 0x10, 0x09, 0x78,
@@ -93,7 +105,7 @@ static const char fake_edid_info[] = {
 
 static void vidi_apply(struct exynos_drm_manager *mgr)
 {
-	struct vidi_context *ctx = mgr->ctx;
+	struct vidi_context *ctx = manager_to_vidi(mgr);
 	struct exynos_drm_manager_ops *mgr_ops = mgr->ops;
 	struct vidi_win_data *win_data;
 	int i;
@@ -110,7 +122,7 @@ static void vidi_apply(struct exynos_drm_manager *mgr)
 
 static void vidi_commit(struct exynos_drm_manager *mgr)
 {
-	struct vidi_context *ctx = mgr->ctx;
+	struct vidi_context *ctx = manager_to_vidi(mgr);
 
 	if (ctx->suspended)
 		return;
@@ -118,7 +130,7 @@ static void vidi_commit(struct exynos_drm_manager *mgr)
 
 static int vidi_enable_vblank(struct exynos_drm_manager *mgr)
 {
-	struct vidi_context *ctx = mgr->ctx;
+	struct vidi_context *ctx = manager_to_vidi(mgr);
 
 	if (ctx->suspended)
 		return -EPERM;
@@ -140,7 +152,7 @@ static int vidi_enable_vblank(struct exynos_drm_manager *mgr)
 
 static void vidi_disable_vblank(struct exynos_drm_manager *mgr)
 {
-	struct vidi_context *ctx = mgr->ctx;
+	struct vidi_context *ctx = manager_to_vidi(mgr);
 
 	if (ctx->suspended)
 		return;
@@ -152,7 +164,7 @@ static void vidi_disable_vblank(struct exynos_drm_manager *mgr)
 static void vidi_win_mode_set(struct exynos_drm_manager *mgr,
 			struct exynos_drm_overlay *overlay)
 {
-	struct vidi_context *ctx = mgr->ctx;
+	struct vidi_context *ctx = manager_to_vidi(mgr);
 	struct vidi_win_data *win_data;
 	int win;
 	unsigned long offset;
@@ -204,7 +216,7 @@ static void vidi_win_mode_set(struct exynos_drm_manager *mgr,
 
 static void vidi_win_commit(struct exynos_drm_manager *mgr, int zpos)
 {
-	struct vidi_context *ctx = mgr->ctx;
+	struct vidi_context *ctx = manager_to_vidi(mgr);
 	struct vidi_win_data *win_data;
 	int win = zpos;
 
@@ -229,7 +241,7 @@ static void vidi_win_commit(struct exynos_drm_manager *mgr, int zpos)
 
 static void vidi_win_disable(struct exynos_drm_manager *mgr, int zpos)
 {
-	struct vidi_context *ctx = mgr->ctx;
+	struct vidi_context *ctx = manager_to_vidi(mgr);
 	struct vidi_win_data *win_data;
 	int win = zpos;
 
@@ -247,7 +259,7 @@ static void vidi_win_disable(struct exynos_drm_manager *mgr, int zpos)
 
 static int vidi_power_on(struct exynos_drm_manager *mgr, bool enable)
 {
-	struct vidi_context *ctx = mgr->ctx;
+	struct vidi_context *ctx = manager_to_vidi(mgr);
 
 	DRM_DEBUG_KMS("%s\n", __FILE__);
 
@@ -271,7 +283,7 @@ static int vidi_power_on(struct exynos_drm_manager *mgr, bool enable)
 
 static void vidi_dpms(struct exynos_drm_manager *mgr, int mode)
 {
-	struct vidi_context *ctx = mgr->ctx;
+	struct vidi_context *ctx = manager_to_vidi(mgr);
 
 	DRM_DEBUG_KMS("%d\n", mode);
 
@@ -297,7 +309,7 @@ static void vidi_dpms(struct exynos_drm_manager *mgr, int mode)
 static int vidi_mgr_initialize(struct exynos_drm_manager *mgr,
 			struct drm_device *drm_dev)
 {
-	struct vidi_context *ctx = mgr->ctx;
+	struct vidi_context *ctx = manager_to_vidi(mgr);
 	struct exynos_drm_private *priv = drm_dev->dev_private;
 
 	mgr->drm_dev = ctx->drm_dev = drm_dev;
@@ -316,11 +328,6 @@ static struct exynos_drm_manager_ops vidi_manager_ops = {
 	.win_disable = vidi_win_disable,
 };
 
-static struct exynos_drm_manager vidi_manager = {
-	.type = EXYNOS_DISPLAY_TYPE_VIDI,
-	.ops = &vidi_manager_ops,
-};
-
 static void vidi_fake_vblank_handler(struct work_struct *work)
 {
 	struct vidi_context *ctx = container_of(work, struct vidi_context,
@@ -349,9 +356,8 @@ static void vidi_fake_vblank_handler(struct work_struct *work)
 static int vidi_show_connection(struct device *dev,
 				struct device_attribute *attr, char *buf)
 {
+	struct vidi_context *ctx = dev_get_drvdata(dev);
 	int rc;
-	struct exynos_drm_manager *mgr = get_vidi_mgr(dev);
-	struct vidi_context *ctx = mgr->ctx;
 
 	mutex_lock(&ctx->lock);
 
@@ -366,8 +372,7 @@ static int vidi_store_connection(struct device *dev,
 				struct device_attribute *attr,
 				const char *buf, size_t len)
 {
-	struct exynos_drm_manager *mgr = get_vidi_mgr(dev);
-	struct vidi_context *ctx = mgr->ctx;
+	struct vidi_context *ctx = dev_get_drvdata(dev);
 	int ret;
 
 	ret = kstrtoint(buf, 0, &ctx->connected);
@@ -420,7 +425,7 @@ int vidi_connection_ioctl(struct drm_device *drm_dev, void *data,
 		display = exynos_drm_get_display(encoder);
 
 		if (display->type == EXYNOS_DISPLAY_TYPE_VIDI) {
-			ctx = display->ctx;
+			ctx = display_to_vidi(display);
 			break;
 		}
 	}
@@ -530,7 +535,7 @@ static struct drm_connector_helper_funcs vidi_connector_helper_funcs = {
 static int vidi_create_connector(struct exynos_drm_display *display,
 				struct drm_encoder *encoder)
 {
-	struct vidi_context *ctx = display->ctx;
+	struct vidi_context *ctx = display_to_vidi(display);
 	struct drm_connector *connector = &ctx->connector;
 	int ret;
 
@@ -556,27 +561,22 @@ static struct exynos_drm_display_ops vidi_display_ops = {
 	.create_connector = vidi_create_connector,
 };
 
-static struct exynos_drm_display vidi_display = {
-	.type = EXYNOS_DISPLAY_TYPE_VIDI,
-	.ops = &vidi_display_ops,
-};
-
-static int vidi_subdrv_probe(struct drm_device *drm_dev, struct device *dev)
+static int vidi_bind(struct device *dev, struct device *master, void *data)
 {
-	struct exynos_drm_manager *mgr = get_vidi_mgr(dev);
-	struct vidi_context *ctx = mgr->ctx;
+	struct vidi_context *ctx = dev_get_drvdata(dev);
+	struct drm_device *drm_dev = data;
 	struct drm_crtc *crtc = ctx->crtc;
 	int ret;
 
-	vidi_mgr_initialize(mgr, drm_dev);
+	vidi_mgr_initialize(&ctx->manager, drm_dev);
 
-	ret = exynos_drm_crtc_create(&vidi_manager);
+	ret = exynos_drm_crtc_create(&ctx->manager);
 	if (ret) {
 		DRM_ERROR("failed to create crtc.\n");
 		return ret;
 	}
 
-	ret = exynos_drm_create_enc_conn(drm_dev, &vidi_display);
+	ret = exynos_drm_create_enc_conn(drm_dev, &ctx->display);
 	if (ret) {
 		crtc->funcs->destroy(crtc);
 		DRM_ERROR("failed to create encoder and connector.\n");
@@ -586,9 +586,18 @@ static int vidi_subdrv_probe(struct drm_device *drm_dev, struct device *dev)
 	return 0;
 }
 
+
+static void vidi_unbind(struct device *dev, struct device *master, void *data)
+{
+}
+
+static const struct component_ops vidi_component_ops = {
+	.bind	= vidi_bind,
+	.unbind = vidi_unbind,
+};
+
 static int vidi_probe(struct platform_device *pdev)
 {
-	struct exynos_drm_subdrv *subdrv;
 	struct vidi_context *ctx;
 	int ret;
 
@@ -596,40 +605,54 @@ static int vidi_probe(struct platform_device *pdev)
 	if (!ctx)
 		return -ENOMEM;
 
+	ctx->manager.type = EXYNOS_DISPLAY_TYPE_VIDI;
+	ctx->manager.ops = &vidi_manager_ops;
+	ctx->display.type = EXYNOS_DISPLAY_TYPE_VIDI;
+	ctx->display.ops = &vidi_display_ops;
 	ctx->default_win = 0;
+	ctx->pdev = pdev;
 
-	INIT_WORK(&ctx->work, vidi_fake_vblank_handler);
-
-	vidi_manager.ctx = ctx;
-	vidi_display.ctx = ctx;
+	ret = exynos_drm_component_add(&pdev->dev, EXYNOS_DEVICE_TYPE_CRTC,
+					ctx->manager.type);
+	if (ret)
+		return ret;
 
-	mutex_init(&ctx->lock);
+	ret = exynos_drm_component_add(&pdev->dev, EXYNOS_DEVICE_TYPE_CONNECTOR,
+					ctx->display.type);
+	if (ret)
+		goto err_del_crtc_component;
 
-	platform_set_drvdata(pdev, &vidi_manager);
+	INIT_WORK(&ctx->work, vidi_fake_vblank_handler);
 
-	subdrv = &ctx->subdrv;
-	subdrv->dev = &pdev->dev;
-	subdrv->probe = vidi_subdrv_probe;
+	mutex_init(&ctx->lock);
 
-	ret = exynos_drm_subdrv_register(subdrv);
-	if (ret < 0) {
-		dev_err(&pdev->dev, "failed to register drm vidi device\n");
-		return ret;
-	}
+	platform_set_drvdata(pdev, ctx);
 
 	ret = device_create_file(&pdev->dev, &dev_attr_connection);
 	if (ret < 0) {
-		exynos_drm_subdrv_unregister(subdrv);
-		DRM_INFO("failed to create connection sysfs.\n");
+		DRM_ERROR("failed to create connection sysfs.\n");
+		goto err_del_conn_component;
 	}
 
-	return 0;
+	ret = component_add(&pdev->dev, &vidi_component_ops);
+	if (ret)
+		goto err_remove_file;
+
+	return ret;
+
+err_remove_file:
+	device_remove_file(&pdev->dev, &dev_attr_connection);
+err_del_conn_component:
+	exynos_drm_component_del(&pdev->dev, EXYNOS_DEVICE_TYPE_CONNECTOR);
+err_del_crtc_component:
+	exynos_drm_component_del(&pdev->dev, EXYNOS_DEVICE_TYPE_CRTC);
+
+	return ret;
 }
 
 static int vidi_remove(struct platform_device *pdev)
 {
-	struct exynos_drm_manager *mgr = platform_get_drvdata(pdev);
-	struct vidi_context *ctx = mgr->ctx;
+	struct vidi_context *ctx = platform_get_drvdata(pdev);
 
 	if (ctx->raw_edid != (struct edid *)fake_edid_info) {
 		kfree(ctx->raw_edid);
@@ -638,6 +661,10 @@ static int vidi_remove(struct platform_device *pdev)
 		return -EINVAL;
 	}
 
+	component_del(&pdev->dev, &vidi_component_ops);
+	exynos_drm_component_del(&pdev->dev, EXYNOS_DEVICE_TYPE_CONNECTOR);
+	exynos_drm_component_del(&pdev->dev, EXYNOS_DEVICE_TYPE_CRTC);
+
 	return 0;
 }
 
@@ -668,12 +695,19 @@ int exynos_drm_probe_vidi(void)
 	return ret;
 }
 
+static int exynos_drm_remove_vidi_device(struct device *dev, void *data)
+{
+	platform_device_unregister(to_platform_device(dev));
+
+	return 0;
+}
+
 void exynos_drm_remove_vidi(void)
 {
-	struct vidi_context *ctx = vidi_manager.ctx;
-	struct exynos_drm_subdrv *subdrv = &ctx->subdrv;
-	struct platform_device *pdev = to_platform_device(subdrv->dev);
+	int ret = driver_for_each_device(&vidi_driver.driver, NULL, NULL,
+					 exynos_drm_remove_vidi_device);
+	/* silence compiler warning */
+	(void)ret;
 
 	platform_driver_unregister(&vidi_driver);
-	platform_device_unregister(pdev);
 }
diff --git a/drivers/gpu/drm/exynos/exynos_hdmi.c b/drivers/gpu/drm/exynos/exynos_hdmi.c
index 563a19e62eb2..5765a161abdd 100644
--- a/drivers/gpu/drm/exynos/exynos_hdmi.c
+++ b/drivers/gpu/drm/exynos/exynos_hdmi.c
@@ -49,7 +49,6 @@
 #include <linux/gpio.h>
 #include <media/s5p_hdmi.h>
 
-#define get_hdmi_display(dev)	platform_get_drvdata(to_platform_device(dev))
 #define ctx_from_connector(c)	container_of(c, struct hdmi_context, connector)
 
 #define HOTPLUG_DEBOUNCE_MS		1100
@@ -182,6 +181,7 @@ struct hdmi_conf_regs {
 };
 
 struct hdmi_context {
+	struct exynos_drm_display	display;
 	struct device			*dev;
 	struct drm_device		*drm_dev;
 	struct drm_connector		connector;
@@ -213,6 +213,11 @@ struct hdmi_context {
 	enum hdmi_type			type;
 };
 
+static inline struct hdmi_context *display_to_hdmi(struct exynos_drm_display *d)
+{
+	return container_of(d, struct hdmi_context, display);
+}
+
 struct hdmiphy_config {
 	int pixel_clock;
 	u8 conf[32];
@@ -1123,7 +1128,7 @@ static struct drm_connector_helper_funcs hdmi_connector_helper_funcs = {
 static int hdmi_create_connector(struct exynos_drm_display *display,
 			struct drm_encoder *encoder)
 {
-	struct hdmi_context *hdata = display->ctx;
+	struct hdmi_context *hdata = display_to_hdmi(display);
 	struct drm_connector *connector = &hdata->connector;
 	int ret;
 
@@ -2000,7 +2005,7 @@ static void hdmi_v14_mode_set(struct hdmi_context *hdata,
 static void hdmi_mode_set(struct exynos_drm_display *display,
 			struct drm_display_mode *mode)
 {
-	struct hdmi_context *hdata = display->ctx;
+	struct hdmi_context *hdata = display_to_hdmi(display);
 	struct drm_display_mode *m = mode;
 
 	DRM_DEBUG_KMS("xres=%d, yres=%d, refresh=%d, intl=%s\n",
@@ -2019,7 +2024,7 @@ static void hdmi_mode_set(struct exynos_drm_display *display,
 
 static void hdmi_commit(struct exynos_drm_display *display)
 {
-	struct hdmi_context *hdata = display->ctx;
+	struct hdmi_context *hdata = display_to_hdmi(display);
 
 	mutex_lock(&hdata->hdmi_mutex);
 	if (!hdata->powered) {
@@ -2033,7 +2038,7 @@ static void hdmi_commit(struct exynos_drm_display *display)
 
 static void hdmi_poweron(struct exynos_drm_display *display)
 {
-	struct hdmi_context *hdata = display->ctx;
+	struct hdmi_context *hdata = display_to_hdmi(display);
 	struct hdmi_resources *res = &hdata->res;
 
 	mutex_lock(&hdata->hdmi_mutex);
@@ -2064,7 +2069,7 @@ static void hdmi_poweron(struct exynos_drm_display *display)
 
 static void hdmi_poweroff(struct exynos_drm_display *display)
 {
-	struct hdmi_context *hdata = display->ctx;
+	struct hdmi_context *hdata = display_to_hdmi(display);
 	struct hdmi_resources *res = &hdata->res;
 
 	mutex_lock(&hdata->hdmi_mutex);
@@ -2099,7 +2104,7 @@ out:
 
 static void hdmi_dpms(struct exynos_drm_display *display, int mode)
 {
-	struct hdmi_context *hdata = display->ctx;
+	struct hdmi_context *hdata = display_to_hdmi(display);
 	struct drm_encoder *encoder = hdata->encoder;
 	struct drm_crtc *crtc = encoder->crtc;
 	struct drm_crtc_helper_funcs *funcs = NULL;
@@ -2143,11 +2148,6 @@ static struct exynos_drm_display_ops hdmi_display_ops = {
 	.commit		= hdmi_commit,
 };
 
-static struct exynos_drm_display hdmi_display = {
-	.type = EXYNOS_DISPLAY_TYPE_HDMI,
-	.ops = &hdmi_display_ops,
-};
-
 static void hdmi_hotplug_work_func(struct work_struct *work)
 {
 	struct hdmi_context *hdata;
@@ -2302,12 +2302,11 @@ MODULE_DEVICE_TABLE (of, hdmi_match_types);
 static int hdmi_bind(struct device *dev, struct device *master, void *data)
 {
 	struct drm_device *drm_dev = data;
-	struct hdmi_context *hdata;
+	struct hdmi_context *hdata = dev_get_drvdata(dev);
 
-	hdata = hdmi_display.ctx;
 	hdata->drm_dev = drm_dev;
 
-	return exynos_drm_create_enc_conn(drm_dev, &hdmi_display);
+	return exynos_drm_create_enc_conn(drm_dev, &hdata->display);
 }
 
 static void hdmi_unbind(struct device *dev, struct device *master, void *data)
@@ -2349,31 +2348,28 @@ static int hdmi_probe(struct platform_device *pdev)
 	struct resource *res;
 	int ret;
 
-	ret = exynos_drm_component_add(&pdev->dev, EXYNOS_DEVICE_TYPE_CONNECTOR,
-					hdmi_display.type);
-	if (ret)
-		return ret;
-
-	if (!dev->of_node) {
-		ret = -ENODEV;
-		goto err_del_component;
-	}
+	if (!dev->of_node)
+		return -ENODEV;
 
 	pdata = drm_hdmi_dt_parse_pdata(dev);
-	if (!pdata) {
-		ret = -EINVAL;
-		goto err_del_component;
-	}
+	if (!pdata)
+		return -EINVAL;
 
 	hdata = devm_kzalloc(dev, sizeof(struct hdmi_context), GFP_KERNEL);
-	if (!hdata) {
-		ret = -ENOMEM;
-		goto err_del_component;
-	}
+	if (!hdata)
+		return -ENOMEM;
+
+	hdata->display.type = EXYNOS_DISPLAY_TYPE_HDMI;
+	hdata->display.ops = &hdmi_display_ops;
+
+	ret = exynos_drm_component_add(&pdev->dev, EXYNOS_DEVICE_TYPE_CONNECTOR,
+					hdata->display.type);
+	if (ret)
+		return ret;
 
 	mutex_init(&hdata->hdmi_mutex);
 
-	platform_set_drvdata(pdev, &hdmi_display);
+	platform_set_drvdata(pdev, hdata);
 
 	match = of_match_node(hdmi_match_types, dev->of_node);
 	if (!match) {
@@ -2485,7 +2481,6 @@ out_get_phy_port:
 	}
 
 	pm_runtime_enable(dev);
-	hdmi_display.ctx = hdata;
 
 	ret = component_add(&pdev->dev, &hdmi_component_ops);
 	if (ret)
@@ -2510,7 +2505,7 @@ err_del_component:
 
 static int hdmi_remove(struct platform_device *pdev)
 {
-	struct hdmi_context *hdata = hdmi_display.ctx;
+	struct hdmi_context *hdata = platform_get_drvdata(pdev);
 
 	cancel_delayed_work_sync(&hdata->hotplug_work);
 
diff --git a/drivers/gpu/drm/exynos/exynos_mixer.c b/drivers/gpu/drm/exynos/exynos_mixer.c
index a41c84ee3a2d..820b76234ef4 100644
--- a/drivers/gpu/drm/exynos/exynos_mixer.c
+++ b/drivers/gpu/drm/exynos/exynos_mixer.c
@@ -40,8 +40,6 @@
 #include "exynos_drm_iommu.h"
 #include "exynos_mixer.h"
 
-#define get_mixer_manager(dev)	platform_get_drvdata(to_platform_device(dev))
-
 #define MIXER_WIN_NR		3
 #define MIXER_DEFAULT_WIN	0
 
@@ -86,6 +84,7 @@ enum mixer_version_id {
 };
 
 struct mixer_context {
+	struct exynos_drm_manager manager;
 	struct platform_device *pdev;
 	struct device		*dev;
 	struct drm_device	*drm_dev;
@@ -104,6 +103,11 @@ struct mixer_context {
 	atomic_t		wait_vsync_event;
 };
 
+static inline struct mixer_context *mgr_to_mixer(struct exynos_drm_manager *mgr)
+{
+	return container_of(mgr, struct mixer_context, manager);
+}
+
 struct mixer_drv_data {
 	enum mixer_version_id	version;
 	bool					is_vp_enabled;
@@ -854,7 +858,7 @@ static int mixer_initialize(struct exynos_drm_manager *mgr,
 			struct drm_device *drm_dev)
 {
 	int ret;
-	struct mixer_context *mixer_ctx = mgr->ctx;
+	struct mixer_context *mixer_ctx = mgr_to_mixer(mgr);
 	struct exynos_drm_private *priv;
 	priv = drm_dev->dev_private;
 
@@ -885,7 +889,7 @@ static int mixer_initialize(struct exynos_drm_manager *mgr,
 
 static void mixer_mgr_remove(struct exynos_drm_manager *mgr)
 {
-	struct mixer_context *mixer_ctx = mgr->ctx;
+	struct mixer_context *mixer_ctx = mgr_to_mixer(mgr);
 
 	if (is_drm_iommu_supported(mixer_ctx->drm_dev))
 		drm_iommu_detach_device(mixer_ctx->drm_dev, mixer_ctx->dev);
@@ -893,7 +897,7 @@ static void mixer_mgr_remove(struct exynos_drm_manager *mgr)
 
 static int mixer_enable_vblank(struct exynos_drm_manager *mgr)
 {
-	struct mixer_context *mixer_ctx = mgr->ctx;
+	struct mixer_context *mixer_ctx = mgr_to_mixer(mgr);
 	struct mixer_resources *res = &mixer_ctx->mixer_res;
 
 	if (!mixer_ctx->powered) {
@@ -910,7 +914,7 @@ static int mixer_enable_vblank(struct exynos_drm_manager *mgr)
 
 static void mixer_disable_vblank(struct exynos_drm_manager *mgr)
 {
-	struct mixer_context *mixer_ctx = mgr->ctx;
+	struct mixer_context *mixer_ctx = mgr_to_mixer(mgr);
 	struct mixer_resources *res = &mixer_ctx->mixer_res;
 
 	/* disable vsync interrupt */
@@ -920,7 +924,7 @@ static void mixer_disable_vblank(struct exynos_drm_manager *mgr)
 static void mixer_win_mode_set(struct exynos_drm_manager *mgr,
 			struct exynos_drm_overlay *overlay)
 {
-	struct mixer_context *mixer_ctx = mgr->ctx;
+	struct mixer_context *mixer_ctx = mgr_to_mixer(mgr);
 	struct hdmi_win_data *win_data;
 	int win;
 
@@ -971,7 +975,7 @@ static void mixer_win_mode_set(struct exynos_drm_manager *mgr,
 
 static void mixer_win_commit(struct exynos_drm_manager *mgr, int zpos)
 {
-	struct mixer_context *mixer_ctx = mgr->ctx;
+	struct mixer_context *mixer_ctx = mgr_to_mixer(mgr);
 	int win = zpos == DEFAULT_ZPOS ? MIXER_DEFAULT_WIN : zpos;
 
 	DRM_DEBUG_KMS("win: %d\n", win);
@@ -993,7 +997,7 @@ static void mixer_win_commit(struct exynos_drm_manager *mgr, int zpos)
 
 static void mixer_win_disable(struct exynos_drm_manager *mgr, int zpos)
 {
-	struct mixer_context *mixer_ctx = mgr->ctx;
+	struct mixer_context *mixer_ctx = mgr_to_mixer(mgr);
 	struct mixer_resources *res = &mixer_ctx->mixer_res;
 	int win = zpos == DEFAULT_ZPOS ? MIXER_DEFAULT_WIN : zpos;
 	unsigned long flags;
@@ -1021,7 +1025,7 @@ static void mixer_win_disable(struct exynos_drm_manager *mgr, int zpos)
 
 static void mixer_wait_for_vblank(struct exynos_drm_manager *mgr)
 {
-	struct mixer_context *mixer_ctx = mgr->ctx;
+	struct mixer_context *mixer_ctx = mgr_to_mixer(mgr);
 
 	mutex_lock(&mixer_ctx->mixer_mutex);
 	if (!mixer_ctx->powered) {
@@ -1048,7 +1052,7 @@ static void mixer_wait_for_vblank(struct exynos_drm_manager *mgr)
 
 static void mixer_window_suspend(struct exynos_drm_manager *mgr)
 {
-	struct mixer_context *ctx = mgr->ctx;
+	struct mixer_context *ctx = mgr_to_mixer(mgr);
 	struct hdmi_win_data *win_data;
 	int i;
 
@@ -1062,7 +1066,7 @@ static void mixer_window_suspend(struct exynos_drm_manager *mgr)
 
 static void mixer_window_resume(struct exynos_drm_manager *mgr)
 {
-	struct mixer_context *ctx = mgr->ctx;
+	struct mixer_context *ctx = mgr_to_mixer(mgr);
 	struct hdmi_win_data *win_data;
 	int i;
 
@@ -1077,7 +1081,7 @@ static void mixer_window_resume(struct exynos_drm_manager *mgr)
 
 static void mixer_poweron(struct exynos_drm_manager *mgr)
 {
-	struct mixer_context *ctx = mgr->ctx;
+	struct mixer_context *ctx = mgr_to_mixer(mgr);
 	struct mixer_resources *res = &ctx->mixer_res;
 
 	mutex_lock(&ctx->mixer_mutex);
@@ -1111,7 +1115,7 @@ static void mixer_poweron(struct exynos_drm_manager *mgr)
 
 static void mixer_poweroff(struct exynos_drm_manager *mgr)
 {
-	struct mixer_context *ctx = mgr->ctx;
+	struct mixer_context *ctx = mgr_to_mixer(mgr);
 	struct mixer_resources *res = &ctx->mixer_res;
 
 	mutex_lock(&ctx->mixer_mutex);
@@ -1187,11 +1191,6 @@ static struct exynos_drm_manager_ops mixer_manager_ops = {
 	.win_disable		= mixer_win_disable,
 };
 
-static struct exynos_drm_manager mixer_manager = {
-	.type			= EXYNOS_DISPLAY_TYPE_HDMI,
-	.ops			= &mixer_manager_ops,
-};
-
 static struct mixer_drv_data exynos5420_mxr_drv_data = {
 	.version = MXR_VER_128_0_0_184,
 	.is_vp_enabled = 0,
@@ -1249,48 +1248,17 @@ MODULE_DEVICE_TABLE(of, mixer_match_types);
 
 static int mixer_bind(struct device *dev, struct device *manager, void *data)
 {
-	struct platform_device *pdev = to_platform_device(dev);
+	struct mixer_context *ctx = dev_get_drvdata(dev);
 	struct drm_device *drm_dev = data;
-	struct mixer_context *ctx;
-	struct mixer_drv_data *drv;
 	int ret;
 
-	dev_info(dev, "probe start\n");
-
-	ctx = devm_kzalloc(&pdev->dev, sizeof(*ctx), GFP_KERNEL);
-	if (!ctx) {
-		DRM_ERROR("failed to alloc mixer context.\n");
-		return -ENOMEM;
-	}
-
-	mutex_init(&ctx->mixer_mutex);
-
-	if (dev->of_node) {
-		const struct of_device_id *match;
-		match = of_match_node(mixer_match_types, dev->of_node);
-		drv = (struct mixer_drv_data *)match->data;
-	} else {
-		drv = (struct mixer_drv_data *)
-			platform_get_device_id(pdev)->driver_data;
-	}
-
-	ctx->pdev = pdev;
-	ctx->dev = dev;
-	ctx->vp_enabled = drv->is_vp_enabled;
-	ctx->has_sclk = drv->has_sclk;
-	ctx->mxr_ver = drv->version;
-	init_waitqueue_head(&ctx->wait_vsync_queue);
-	atomic_set(&ctx->wait_vsync_event, 0);
-
-	mixer_manager.ctx = ctx;
-	ret = mixer_initialize(&mixer_manager, drm_dev);
+	ret = mixer_initialize(&ctx->manager, drm_dev);
 	if (ret)
 		return ret;
 
-	platform_set_drvdata(pdev, &mixer_manager);
-	ret = exynos_drm_crtc_create(&mixer_manager);
+	ret = exynos_drm_crtc_create(&ctx->manager);
 	if (ret) {
-		mixer_mgr_remove(&mixer_manager);
+		mixer_mgr_remove(&ctx->manager);
 		return ret;
 	}
 
@@ -1301,11 +1269,9 @@ static int mixer_bind(struct device *dev, struct device *manager, void *data)
 
 static void mixer_unbind(struct device *dev, struct device *master, void *data)
 {
-	struct exynos_drm_manager *mgr = dev_get_drvdata(dev);
+	struct mixer_context *ctx = dev_get_drvdata(dev);
 
-	dev_info(dev, "remove successful\n");
-
-	mixer_mgr_remove(mgr);
+	mixer_mgr_remove(&ctx->manager);
 
 	pm_runtime_disable(dev);
 }
@@ -1317,22 +1283,62 @@ static const struct component_ops mixer_component_ops = {
 
 static int mixer_probe(struct platform_device *pdev)
 {
+	struct device *dev = &pdev->dev;
+	struct mixer_drv_data *drv;
+	struct mixer_context *ctx;
 	int ret;
 
+	ctx = devm_kzalloc(&pdev->dev, sizeof(*ctx), GFP_KERNEL);
+	if (!ctx) {
+		DRM_ERROR("failed to alloc mixer context.\n");
+		return -ENOMEM;
+	}
+
+	mutex_init(&ctx->mixer_mutex);
+
+	ctx->manager.type = EXYNOS_DISPLAY_TYPE_HDMI;
+	ctx->manager.ops = &mixer_manager_ops;
+
+	if (dev->of_node) {
+		const struct of_device_id *match;
+
+		match = of_match_node(mixer_match_types, dev->of_node);
+		drv = (struct mixer_drv_data *)match->data;
+	} else {
+		drv = (struct mixer_drv_data *)
+			platform_get_device_id(pdev)->driver_data;
+	}
+
+	ctx->pdev = pdev;
+	ctx->dev = dev;
+	ctx->vp_enabled = drv->is_vp_enabled;
+	ctx->has_sclk = drv->has_sclk;
+	ctx->mxr_ver = drv->version;
+	init_waitqueue_head(&ctx->wait_vsync_queue);
+	atomic_set(&ctx->wait_vsync_event, 0);
+
+	platform_set_drvdata(pdev, ctx);
+
 	ret = exynos_drm_component_add(&pdev->dev, EXYNOS_DEVICE_TYPE_CRTC,
-					mixer_manager.type);
+					ctx->manager.type);
 	if (ret)
 		return ret;
 
 	ret = component_add(&pdev->dev, &mixer_component_ops);
-	if (ret)
+	if (ret) {
 		exynos_drm_component_del(&pdev->dev, EXYNOS_DEVICE_TYPE_CRTC);
+		return ret;
+	}
+
+	pm_runtime_enable(dev);
 
 	return ret;
 }
 
 static int mixer_remove(struct platform_device *pdev)
 {
+	pm_runtime_disable(&pdev->dev);
+
 	component_del(&pdev->dev, &mixer_component_ops);
 	exynos_drm_component_del(&pdev->dev, EXYNOS_DEVICE_TYPE_CRTC);
 
diff --git a/drivers/gpu/drm/gma500/Makefile b/drivers/gpu/drm/gma500/Makefile
index b15315576376..190e55f2f891 100644
--- a/drivers/gpu/drm/gma500/Makefile
+++ b/drivers/gpu/drm/gma500/Makefile
@@ -39,6 +39,7 @@ gma500_gfx-$(CONFIG_DRM_GMA3600) +=  cdv_device.o \
 gma500_gfx-$(CONFIG_DRM_GMA600) += oaktrail_device.o \
 	  oaktrail_crtc.o \
 	  oaktrail_lvds.o \
+	  oaktrail_lvds_i2c.o \
 	  oaktrail_hdmi.o \
 	  oaktrail_hdmi_i2c.o
 
diff --git a/drivers/gpu/drm/gma500/cdv_intel_dp.c b/drivers/gpu/drm/gma500/cdv_intel_dp.c
index 9f158eab517a..0fafb8e2483a 100644
--- a/drivers/gpu/drm/gma500/cdv_intel_dp.c
+++ b/drivers/gpu/drm/gma500/cdv_intel_dp.c
@@ -37,6 +37,201 @@
 #include "gma_display.h"
 #include <drm/drm_dp_helper.h>
 
+/**
+ * struct i2c_algo_dp_aux_data - driver interface structure for i2c over dp
+ * 				 aux algorithm
+ * @running: set by the algo indicating whether an i2c is ongoing or whether
+ * 	     the i2c bus is quiescent
+ * @address: i2c target address for the currently ongoing transfer
+ * @aux_ch: driver callback to transfer a single byte of the i2c payload
+ */
+struct i2c_algo_dp_aux_data {
+	bool running;
+	u16 address;
+	int (*aux_ch) (struct i2c_adapter *adapter,
+		       int mode, uint8_t write_byte,
+		       uint8_t *read_byte);
+};
+
+/* Run a single AUX_CH I2C transaction, writing/reading data as necessary */
+static int
+i2c_algo_dp_aux_transaction(struct i2c_adapter *adapter, int mode,
+			    uint8_t write_byte, uint8_t *read_byte)
+{
+	struct i2c_algo_dp_aux_data *algo_data = adapter->algo_data;
+	int ret;
+
+	ret = (*algo_data->aux_ch)(adapter, mode,
+				   write_byte, read_byte);
+	return ret;
+}
+
+/*
+ * I2C over AUX CH
+ */
+
+/*
+ * Send the address. If the I2C link is running, this 'restarts'
+ * the connection with the new address; this is used for doing
+ * a write followed by a read (as needed for DDC)
+ */
+static int
+i2c_algo_dp_aux_address(struct i2c_adapter *adapter, u16 address, bool reading)
+{
+	struct i2c_algo_dp_aux_data *algo_data = adapter->algo_data;
+	int mode = MODE_I2C_START;
+	int ret;
+
+	if (reading)
+		mode |= MODE_I2C_READ;
+	else
+		mode |= MODE_I2C_WRITE;
+	algo_data->address = address;
+	algo_data->running = true;
+	ret = i2c_algo_dp_aux_transaction(adapter, mode, 0, NULL);
+	return ret;
+}
+
+/*
+ * Stop the I2C transaction. This closes out the link, sending
+ * a bare address packet with the MOT bit turned off
+ */
+static void
+i2c_algo_dp_aux_stop(struct i2c_adapter *adapter, bool reading)
+{
+	struct i2c_algo_dp_aux_data *algo_data = adapter->algo_data;
+	int mode = MODE_I2C_STOP;
+
+	if (reading)
+		mode |= MODE_I2C_READ;
+	else
+		mode |= MODE_I2C_WRITE;
+	if (algo_data->running) {
+		(void) i2c_algo_dp_aux_transaction(adapter, mode, 0, NULL);
+		algo_data->running = false;
+	}
+}
+
+/*
+ * Write a single byte to the current I2C address; the
+ * I2C link must be running or this returns -EIO
+ */
+static int
+i2c_algo_dp_aux_put_byte(struct i2c_adapter *adapter, u8 byte)
+{
+	struct i2c_algo_dp_aux_data *algo_data = adapter->algo_data;
+	int ret;
+
+	if (!algo_data->running)
+		return -EIO;
+
+	ret = i2c_algo_dp_aux_transaction(adapter, MODE_I2C_WRITE, byte, NULL);
+	return ret;
+}
+
+/*
+ * Read a single byte from the current I2C address, the
+ * I2C link must be running or this returns -EIO
+ */
+static int
+i2c_algo_dp_aux_get_byte(struct i2c_adapter *adapter, u8 *byte_ret)
+{
+	struct i2c_algo_dp_aux_data *algo_data = adapter->algo_data;
+	int ret;
+
+	if (!algo_data->running)
+		return -EIO;
+
+	ret = i2c_algo_dp_aux_transaction(adapter, MODE_I2C_READ, 0, byte_ret);
+	return ret;
+}
+
+static int
+i2c_algo_dp_aux_xfer(struct i2c_adapter *adapter,
+		     struct i2c_msg *msgs,
+		     int num)
+{
+	int ret = 0;
+	bool reading = false;
+	int m;
+	int b;
+
+	for (m = 0; m < num; m++) {
+		u16 len = msgs[m].len;
+		u8 *buf = msgs[m].buf;
+		reading = (msgs[m].flags & I2C_M_RD) != 0;
+		ret = i2c_algo_dp_aux_address(adapter, msgs[m].addr, reading);
+		if (ret < 0)
+			break;
+		if (reading) {
+			for (b = 0; b < len; b++) {
+				ret = i2c_algo_dp_aux_get_byte(adapter, &buf[b]);
+				if (ret < 0)
+					break;
+			}
+		} else {
+			for (b = 0; b < len; b++) {
+				ret = i2c_algo_dp_aux_put_byte(adapter, buf[b]);
+				if (ret < 0)
+					break;
+			}
+		}
+		if (ret < 0)
+			break;
+	}
+	if (ret >= 0)
+		ret = num;
+	i2c_algo_dp_aux_stop(adapter, reading);
+	DRM_DEBUG_KMS("dp_aux_xfer return %d\n", ret);
+	return ret;
+}
+
+static u32
+i2c_algo_dp_aux_functionality(struct i2c_adapter *adapter)
+{
+	return I2C_FUNC_I2C | I2C_FUNC_SMBUS_EMUL |
+	       I2C_FUNC_SMBUS_READ_BLOCK_DATA |
+	       I2C_FUNC_SMBUS_BLOCK_PROC_CALL |
+	       I2C_FUNC_10BIT_ADDR;
+}
+
+static const struct i2c_algorithm i2c_dp_aux_algo = {
+	.master_xfer	= i2c_algo_dp_aux_xfer,
+	.functionality	= i2c_algo_dp_aux_functionality,
+};
+
+static void
+i2c_dp_aux_reset_bus(struct i2c_adapter *adapter)
+{
+	(void) i2c_algo_dp_aux_address(adapter, 0, false);
+	(void) i2c_algo_dp_aux_stop(adapter, false);
+}
+
+static int
+i2c_dp_aux_prepare_bus(struct i2c_adapter *adapter)
+{
+	adapter->algo = &i2c_dp_aux_algo;
+	adapter->retries = 3;
+	i2c_dp_aux_reset_bus(adapter);
+	return 0;
+}
+
+/*
+ * FIXME: This is the old dp aux helper, gma500 is the last driver that needs to
+ * be ported over to the new helper code in drm_dp_helper.c like i915 or radeon.
+ */
+static int __deprecated
+i2c_dp_aux_add_bus(struct i2c_adapter *adapter)
+{
+	int error;
+
+	error = i2c_dp_aux_prepare_bus(adapter);
+	if (error)
+		return error;
+	error = i2c_add_adapter(adapter);
+	return error;
+}
+
 #define _wait_for(COND, MS, W) ({ \
         unsigned long timeout__ = jiffies + msecs_to_jiffies(MS);       \
         int ret__ = 0;                                                  \
diff --git a/drivers/gpu/drm/gma500/mdfld_dsi_pkg_sender.c b/drivers/gpu/drm/gma500/mdfld_dsi_pkg_sender.c
index 87885d8c06e8..6b43ae3ffd73 100644
--- a/drivers/gpu/drm/gma500/mdfld_dsi_pkg_sender.c
+++ b/drivers/gpu/drm/gma500/mdfld_dsi_pkg_sender.c
@@ -25,6 +25,7 @@
  */
 
 #include <linux/freezer.h>
+#include <video/mipi_display.h>
 
 #include "mdfld_dsi_output.h"
 #include "mdfld_dsi_pkg_sender.h"
@@ -32,20 +33,6 @@
 
 #define MDFLD_DSI_READ_MAX_COUNT		5000
 
-enum data_type {
-	DSI_DT_GENERIC_SHORT_WRITE_0	= 0x03,
-	DSI_DT_GENERIC_SHORT_WRITE_1	= 0x13,
-	DSI_DT_GENERIC_SHORT_WRITE_2	= 0x23,
-	DSI_DT_GENERIC_READ_0		= 0x04,
-	DSI_DT_GENERIC_READ_1		= 0x14,
-	DSI_DT_GENERIC_READ_2		= 0x24,
-	DSI_DT_GENERIC_LONG_WRITE	= 0x29,
-	DSI_DT_DCS_SHORT_WRITE_0	= 0x05,
-	DSI_DT_DCS_SHORT_WRITE_1	= 0x15,
-	DSI_DT_DCS_READ			= 0x06,
-	DSI_DT_DCS_LONG_WRITE		= 0x39,
-};
-
 enum {
 	MDFLD_DSI_PANEL_MODE_SLEEP = 0x1,
 };
@@ -321,9 +308,9 @@ static int send_pkg_prepare(struct mdfld_dsi_pkg_sender *sender, u8 data_type,
 	u8 cmd;
 
 	switch (data_type) {
-	case DSI_DT_DCS_SHORT_WRITE_0:
-	case DSI_DT_DCS_SHORT_WRITE_1:
-	case DSI_DT_DCS_LONG_WRITE:
+	case MIPI_DSI_DCS_SHORT_WRITE:
+	case MIPI_DSI_DCS_SHORT_WRITE_PARAM:
+	case MIPI_DSI_DCS_LONG_WRITE:
 		cmd = *data;
 		break;
 	default:
@@ -334,12 +321,12 @@ static int send_pkg_prepare(struct mdfld_dsi_pkg_sender *sender, u8 data_type,
 	sender->status = MDFLD_DSI_PKG_SENDER_BUSY;
 
 	/*wait for 120 milliseconds in case exit_sleep_mode just be sent*/
-	if (unlikely(cmd == DCS_ENTER_SLEEP_MODE)) {
+	if (unlikely(cmd == MIPI_DCS_ENTER_SLEEP_MODE)) {
 		/*TODO: replace it with msleep later*/
 		mdelay(120);
 	}
 
-	if (unlikely(cmd == DCS_EXIT_SLEEP_MODE)) {
+	if (unlikely(cmd == MIPI_DCS_EXIT_SLEEP_MODE)) {
 		/*TODO: replace it with msleep later*/
 		mdelay(120);
 	}
@@ -352,9 +339,9 @@ static int send_pkg_done(struct mdfld_dsi_pkg_sender *sender, u8 data_type,
 	u8 cmd;
 
 	switch (data_type) {
-	case DSI_DT_DCS_SHORT_WRITE_0:
-	case DSI_DT_DCS_SHORT_WRITE_1:
-	case DSI_DT_DCS_LONG_WRITE:
+	case MIPI_DSI_DCS_SHORT_WRITE:
+	case MIPI_DSI_DCS_SHORT_WRITE_PARAM:
+	case MIPI_DSI_DCS_LONG_WRITE:
 		cmd = *data;
 		break;
 	default:
@@ -362,15 +349,15 @@ static int send_pkg_done(struct mdfld_dsi_pkg_sender *sender, u8 data_type,
 	}
 
 	/*update panel status*/
-	if (unlikely(cmd == DCS_ENTER_SLEEP_MODE)) {
+	if (unlikely(cmd == MIPI_DCS_ENTER_SLEEP_MODE)) {
 		sender->panel_mode |= MDFLD_DSI_PANEL_MODE_SLEEP;
 		/*TODO: replace it with msleep later*/
 		mdelay(120);
-	} else if (unlikely(cmd == DCS_EXIT_SLEEP_MODE)) {
+	} else if (unlikely(cmd == MIPI_DCS_EXIT_SLEEP_MODE)) {
 		sender->panel_mode &= ~MDFLD_DSI_PANEL_MODE_SLEEP;
 		/*TODO: replace it with msleep later*/
 		mdelay(120);
-	} else if (unlikely(cmd == DCS_SOFT_RESET)) {
+	} else if (unlikely(cmd == MIPI_DCS_SOFT_RESET)) {
 		/*TODO: replace it with msleep later*/
 		mdelay(5);
 	}
@@ -405,19 +392,19 @@ static int send_pkg(struct mdfld_dsi_pkg_sender *sender, u8 data_type,
 	}
 
 	switch (data_type) {
-	case DSI_DT_GENERIC_SHORT_WRITE_0:
-	case DSI_DT_GENERIC_SHORT_WRITE_1:
-	case DSI_DT_GENERIC_SHORT_WRITE_2:
-	case DSI_DT_GENERIC_READ_0:
-	case DSI_DT_GENERIC_READ_1:
-	case DSI_DT_GENERIC_READ_2:
-	case DSI_DT_DCS_SHORT_WRITE_0:
-	case DSI_DT_DCS_SHORT_WRITE_1:
-	case DSI_DT_DCS_READ:
+	case MIPI_DSI_GENERIC_SHORT_WRITE_0_PARAM:
+	case MIPI_DSI_GENERIC_SHORT_WRITE_1_PARAM:
+	case MIPI_DSI_GENERIC_SHORT_WRITE_2_PARAM:
+	case MIPI_DSI_GENERIC_READ_REQUEST_0_PARAM:
+	case MIPI_DSI_GENERIC_READ_REQUEST_1_PARAM:
+	case MIPI_DSI_GENERIC_READ_REQUEST_2_PARAM:
+	case MIPI_DSI_DCS_SHORT_WRITE:
+	case MIPI_DSI_DCS_SHORT_WRITE_PARAM:
+	case MIPI_DSI_DCS_READ:
 		ret = send_short_pkg(sender, data_type, data[0], data[1], hs);
 		break;
-	case DSI_DT_GENERIC_LONG_WRITE:
-	case DSI_DT_DCS_LONG_WRITE:
+	case MIPI_DSI_GENERIC_LONG_WRITE:
+	case MIPI_DSI_DCS_LONG_WRITE:
 		ret = send_long_pkg(sender, data_type, data, len, hs);
 		break;
 	}
@@ -440,7 +427,7 @@ int mdfld_dsi_send_mcs_long(struct mdfld_dsi_pkg_sender *sender, u8 *data,
 	}
 
 	spin_lock_irqsave(&sender->lock, flags);
-	send_pkg(sender, DSI_DT_DCS_LONG_WRITE, data, len, hs);
+	send_pkg(sender, MIPI_DSI_DCS_LONG_WRITE, data, len, hs);
 	spin_unlock_irqrestore(&sender->lock, flags);
 
 	return 0;
@@ -461,10 +448,10 @@ int mdfld_dsi_send_mcs_short(struct mdfld_dsi_pkg_sender *sender, u8 cmd,
 	data[0] = cmd;
 
 	if (param_num) {
-		data_type = DSI_DT_DCS_SHORT_WRITE_1;
+		data_type = MIPI_DSI_DCS_SHORT_WRITE_PARAM;
 		data[1] = param;
 	} else {
-		data_type = DSI_DT_DCS_SHORT_WRITE_0;
+		data_type = MIPI_DSI_DCS_SHORT_WRITE;
 		data[1] = 0;
 	}
 
@@ -489,17 +476,17 @@ int mdfld_dsi_send_gen_short(struct mdfld_dsi_pkg_sender *sender, u8 param0,
 
 	switch (param_num) {
 	case 0:
-		data_type = DSI_DT_GENERIC_SHORT_WRITE_0;
+		data_type = MIPI_DSI_GENERIC_SHORT_WRITE_0_PARAM;
 		data[0] = 0;
 		data[1] = 0;
 		break;
 	case 1:
-		data_type = DSI_DT_GENERIC_SHORT_WRITE_1;
+		data_type = MIPI_DSI_GENERIC_SHORT_WRITE_1_PARAM;
 		data[0] = param0;
 		data[1] = 0;
 		break;
 	case 2:
-		data_type = DSI_DT_GENERIC_SHORT_WRITE_2;
+		data_type = MIPI_DSI_GENERIC_SHORT_WRITE_2_PARAM;
 		data[0] = param0;
 		data[1] = param1;
 		break;
@@ -523,7 +510,7 @@ int mdfld_dsi_send_gen_long(struct mdfld_dsi_pkg_sender *sender, u8 *data,
 	}
 
 	spin_lock_irqsave(&sender->lock, flags);
-	send_pkg(sender, DSI_DT_GENERIC_LONG_WRITE, data, len, hs);
+	send_pkg(sender, MIPI_DSI_GENERIC_LONG_WRITE, data, len, hs);
 	spin_unlock_irqrestore(&sender->lock, flags);
 
 	return 0;
@@ -594,7 +581,7 @@ int mdfld_dsi_read_mcs(struct mdfld_dsi_pkg_sender *sender, u8 cmd,
 		return -EINVAL;
 	}
 
-	return __read_panel_data(sender, DSI_DT_DCS_READ, &cmd, 1,
+	return __read_panel_data(sender, MIPI_DSI_DCS_READ, &cmd, 1,
 				data, len, hs);
 }
 
diff --git a/drivers/gpu/drm/gma500/mdfld_dsi_pkg_sender.h b/drivers/gpu/drm/gma500/mdfld_dsi_pkg_sender.h
index 459cd7ea8b81..0478a21c15d5 100644
--- a/drivers/gpu/drm/gma500/mdfld_dsi_pkg_sender.h
+++ b/drivers/gpu/drm/gma500/mdfld_dsi_pkg_sender.h
@@ -62,18 +62,6 @@ struct mdfld_dsi_pkg_sender {
 	u32 mipi_cmd_len_reg;
 };
 
-/* DCS definitions */
-#define DCS_SOFT_RESET			0x01
-#define DCS_ENTER_SLEEP_MODE		0x10
-#define DCS_EXIT_SLEEP_MODE		0x11
-#define DCS_SET_DISPLAY_OFF		0x28
-#define DCS_SET_DISPLAY_ON		0x29
-#define DCS_SET_COLUMN_ADDRESS		0x2a
-#define DCS_SET_PAGE_ADDRESS		0x2b
-#define DCS_WRITE_MEM_START		0x2c
-#define DCS_SET_TEAR_OFF		0x34
-#define DCS_SET_TEAR_ON			0x35
-
 extern int mdfld_dsi_pkg_sender_init(struct mdfld_dsi_connector *dsi_connector,
 					int pipe);
 extern void mdfld_dsi_pkg_sender_destroy(struct mdfld_dsi_pkg_sender *sender);
diff --git a/drivers/gpu/drm/gma500/oaktrail_lvds.c b/drivers/gpu/drm/gma500/oaktrail_lvds.c
index 0d39da6e8b7a..83bbc271bcfb 100644
--- a/drivers/gpu/drm/gma500/oaktrail_lvds.c
+++ b/drivers/gpu/drm/gma500/oaktrail_lvds.c
@@ -359,22 +359,26 @@ void oaktrail_lvds_init(struct drm_device *dev,
 	 *    if closed, act like it's not there for now
 	 */
 
+	edid = NULL;
 	mutex_lock(&dev->mode_config.mutex);
 	i2c_adap = i2c_get_adapter(dev_priv->ops->i2c_bus);
-	if (i2c_adap == NULL)
-		dev_err(dev->dev, "No ddc adapter available!\n");
+	if (i2c_adap)
+		edid = drm_get_edid(connector, i2c_adap);
+	if (edid == NULL && dev_priv->lpc_gpio_base) {
+		oaktrail_lvds_i2c_init(encoder);
+		if (gma_encoder->ddc_bus != NULL) {
+			i2c_adap = &gma_encoder->ddc_bus->adapter;
+			edid = drm_get_edid(connector, i2c_adap);
+		}
+	}
 	/*
 	 * Attempt to get the fixed panel mode from DDC.  Assume that the
 	 * preferred mode is the right one.
 	 */
-	if (i2c_adap) {
-		edid = drm_get_edid(connector, i2c_adap);
-		if (edid) {
-			drm_mode_connector_update_edid_property(connector,
-									edid);
-			drm_add_edid_modes(connector, edid);
-			kfree(edid);
-		}
+	if (edid) {
+		drm_mode_connector_update_edid_property(connector, edid);
+		drm_add_edid_modes(connector, edid);
+		kfree(edid);
 
 		list_for_each_entry(scan, &connector->probed_modes, head) {
 			if (scan->type & DRM_MODE_TYPE_PREFERRED) {
@@ -383,7 +387,8 @@ void oaktrail_lvds_init(struct drm_device *dev,
 				goto out;	/* FIXME: check for quirks */
 			}
 		}
-	}
+	} else
+		dev_err(dev->dev, "No ddc adapter available!\n");
 	/*
 	 * If we didn't get EDID, try geting panel timing
 	 * from configuration data
@@ -411,8 +416,10 @@ failed_find:
 	mutex_unlock(&dev->mode_config.mutex);
 
 	dev_dbg(dev->dev, "No LVDS modes found, disabling.\n");
-	if (gma_encoder->ddc_bus)
+	if (gma_encoder->ddc_bus) {
 		psb_intel_i2c_destroy(gma_encoder->ddc_bus);
+		gma_encoder->ddc_bus = NULL;
+	}
 
 /* failed_ddc: */
 
diff --git a/drivers/gpu/drm/gma500/oaktrail_lvds_i2c.c b/drivers/gpu/drm/gma500/oaktrail_lvds_i2c.c
new file mode 100644
index 000000000000..f913a62eee5f
--- /dev/null
+++ b/drivers/gpu/drm/gma500/oaktrail_lvds_i2c.c
@@ -0,0 +1,170 @@
+/*
+ * Copyright (c) 2002-2010, Intel Corporation.
+ * Copyright (c) 2014 ATRON electronic GmbH
+ *   Author: Jan Safrata <jan.nikitenko@gmail.com>
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"), to deal
+ * in the Software without restriction, including without limitation the rights
+ * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ * copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+ * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+ * THE SOFTWARE.
+ *
+ */
+
+#include <linux/kernel.h>
+#include <linux/module.h>
+#include <linux/pci.h>
+#include <linux/types.h>
+#include <linux/i2c.h>
+#include <linux/i2c-algo-bit.h>
+#include <linux/init.h>
+#include <linux/io.h>
+#include <linux/delay.h>
+
+#include <drm/drmP.h>
+#include "psb_drv.h"
+#include "psb_intel_reg.h"
+
+
+/*
+ * LPC GPIO based I2C bus for LVDS of Atom E6xx
+ */
+
+/*-----------------------------------------------------------------------------
+ * LPC Register Offsets. Used for LVDS GPIO Bit Bashing. Registers are part of
+ * Atom E6xx [D31:F0]
+ ----------------------------------------------------------------------------*/
+#define RGEN    0x20
+#define RGIO    0x24
+#define RGLVL   0x28
+#define RGTPE   0x2C
+#define RGTNE   0x30
+#define RGGPE   0x34
+#define RGSMI   0x38
+#define RGTS    0x3C
+
+/* The LVDS GPIO clock lines are GPIOSUS[3]
+ * The LVDS GPIO data lines are GPIOSUS[4]
+ */
+#define GPIO_CLOCK	0x08
+#define GPIO_DATA	0x10
+
+#define LPC_READ_REG(chan, r) inl((chan)->reg + (r))
+#define LPC_WRITE_REG(chan, r, val) outl((val), (chan)->reg + (r))
+
+static int get_clock(void *data)
+{
+	struct psb_intel_i2c_chan *chan = data;
+	u32 val, tmp;
+
+	val = LPC_READ_REG(chan, RGIO);
+	val |= GPIO_CLOCK;
+	LPC_WRITE_REG(chan, RGIO, val);
+	tmp = LPC_READ_REG(chan, RGLVL);
+	val = (LPC_READ_REG(chan, RGLVL) & GPIO_CLOCK) ? 1 : 0;
+
+	return val;
+}
+
+static int get_data(void *data)
+{
+	struct psb_intel_i2c_chan *chan = data;
+	u32 val, tmp;
+
+	val = LPC_READ_REG(chan, RGIO);
+	val |= GPIO_DATA;
+	LPC_WRITE_REG(chan, RGIO, val);
+	tmp = LPC_READ_REG(chan, RGLVL);
+	val = (LPC_READ_REG(chan, RGLVL) & GPIO_DATA) ? 1 : 0;
+
+	return val;
+}
+
+static void set_clock(void *data, int state_high)
+{
+	struct psb_intel_i2c_chan *chan = data;
+	u32 val;
+
+	if (state_high) {
+		val = LPC_READ_REG(chan, RGIO);
+		val |= GPIO_CLOCK;
+		LPC_WRITE_REG(chan, RGIO, val);
+	} else {
+		val = LPC_READ_REG(chan, RGIO);
+		val &= ~GPIO_CLOCK;
+		LPC_WRITE_REG(chan, RGIO, val);
+		val = LPC_READ_REG(chan, RGLVL);
+		val &= ~GPIO_CLOCK;
+		LPC_WRITE_REG(chan, RGLVL, val);
+	}
+}
+
+static void set_data(void *data, int state_high)
+{
+	struct psb_intel_i2c_chan *chan = data;
+	u32 val;
+
+	if (state_high) {
+		val = LPC_READ_REG(chan, RGIO);
+		val |= GPIO_DATA;
+		LPC_WRITE_REG(chan, RGIO, val);
+	} else {
+		val = LPC_READ_REG(chan, RGIO);
+		val &= ~GPIO_DATA;
+		LPC_WRITE_REG(chan, RGIO, val);
+		val = LPC_READ_REG(chan, RGLVL);
+		val &= ~GPIO_DATA;
+		LPC_WRITE_REG(chan, RGLVL, val);
+	}
+}
+
+void oaktrail_lvds_i2c_init(struct drm_encoder *encoder)
+{
+	struct drm_device *dev = encoder->dev;
+	struct gma_encoder *gma_encoder = to_gma_encoder(encoder);
+	struct drm_psb_private *dev_priv = dev->dev_private;
+	struct psb_intel_i2c_chan *chan;
+
+	chan = kzalloc(sizeof(struct psb_intel_i2c_chan), GFP_KERNEL);
+	if (!chan)
+		return;
+
+	chan->drm_dev = dev;
+	chan->reg = dev_priv->lpc_gpio_base;
+	strncpy(chan->adapter.name, "gma500 LPC",  I2C_NAME_SIZE - 1);
+	chan->adapter.owner = THIS_MODULE;
+	chan->adapter.algo_data = &chan->algo;
+	chan->adapter.dev.parent = &dev->pdev->dev;
+	chan->algo.setsda = set_data;
+	chan->algo.setscl = set_clock;
+	chan->algo.getsda = get_data;
+	chan->algo.getscl = get_clock;
+	chan->algo.udelay = 100;
+	chan->algo.timeout = usecs_to_jiffies(2200);
+	chan->algo.data = chan;
+
+	i2c_set_adapdata(&chan->adapter, chan);
+
+	set_data(chan, 1);
+	set_clock(chan, 1);
+	udelay(50);
+
+	if (i2c_bit_add_bus(&chan->adapter)) {
+		kfree(chan);
+		return;
+	}
+
+	gma_encoder->ddc_bus = chan;
+}
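The set_clock()/set_data() helpers above never drive a line high: they either switch the pin to input so the bus pull-up raises it, or switch it to output and write a zero. That is the usual open-drain emulation for bit-banged I2C. A minimal user-space sketch of the idea follows. It assumes that a set bit in RGIO selects input; the `sim_*` names, the bit positions, and the two-field register model are hypothetical and are not taken from the driver.

```c
#include <stdint.h>

/* Hypothetical bit positions, for illustration only. */
#define SIM_GPIO_CLOCK 0x1u
#define SIM_GPIO_DATA  0x2u

/* Hypothetical stand-in for the two LPC GPIO registers used above. */
struct sim_gpio {
	uint32_t dir;	/* models RGIO: assumed 1 = input (line released) */
	uint32_t out;	/* models the output latch written through RGLVL */
};

/* Release the line (the pull-up raises it) or drive it low; never drive high. */
static void sim_set_line(struct sim_gpio *g, uint32_t bit, int state_high)
{
	if (state_high) {
		g->dir |= bit;		/* input: the line floats */
	} else {
		g->dir &= ~bit;		/* output ... */
		g->out &= ~bit;		/* ... driving a zero */
	}
}

/* Level on the wire: low if either side pulls it low (wired-AND). */
static int sim_get_line(const struct sim_gpio *g, uint32_t bit,
			int peer_pulls_low)
{
	int we_drive_low = !(g->dir & bit) && !(g->out & bit);

	return !(we_drive_low || peer_pulls_low);
}
```

Because a released line can still be held low by the peer, a getter has to sample the wire level rather than trust the last value written, which is why get_clock() and get_data() read the level register back.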
diff --git a/drivers/gpu/drm/gma500/psb_drv.c b/drivers/gpu/drm/gma500/psb_drv.c
index 6ec3a905fdd2..92e7e5795398 100644
--- a/drivers/gpu/drm/gma500/psb_drv.c
+++ b/drivers/gpu/drm/gma500/psb_drv.c
@@ -212,6 +212,8 @@ static int psb_driver_unload(struct drm_device *dev)
 		}
 		if (dev_priv->aux_pdev)
 			pci_dev_put(dev_priv->aux_pdev);
+		if (dev_priv->lpc_pdev)
+			pci_dev_put(dev_priv->lpc_pdev);
 
 		/* Destroy VBT data */
 		psb_intel_destroy_bios(dev);
@@ -280,6 +282,24 @@ static int psb_driver_load(struct drm_device *dev, unsigned long flags)
 			DRM_DEBUG_KMS("Couldn't find aux pci device");
 		}
 		dev_priv->gmbus_reg = dev_priv->aux_reg;
+
+		dev_priv->lpc_pdev = pci_get_bus_and_slot(0, PCI_DEVFN(31, 0));
+		if (dev_priv->lpc_pdev) {
+			pci_read_config_word(dev_priv->lpc_pdev, PSB_LPC_GBA,
+				&dev_priv->lpc_gpio_base);
+			pci_write_config_dword(dev_priv->lpc_pdev, PSB_LPC_GBA,
+				(u32)dev_priv->lpc_gpio_base | (1L<<31));
+			pci_read_config_word(dev_priv->lpc_pdev, PSB_LPC_GBA,
+				&dev_priv->lpc_gpio_base);
+			dev_priv->lpc_gpio_base &= 0xffc0;
+			if (dev_priv->lpc_gpio_base)
+				DRM_DEBUG_KMS("Found LPC GPIO at 0x%04x\n",
+						dev_priv->lpc_gpio_base);
+			else {
+				pci_dev_put(dev_priv->lpc_pdev);
+				dev_priv->lpc_pdev = NULL;
+			}
+		}
 	} else {
 		dev_priv->gmbus_reg = dev_priv->vdc_reg;
 	}
diff --git a/drivers/gpu/drm/gma500/psb_drv.h b/drivers/gpu/drm/gma500/psb_drv.h
index 55ebe2bd88dd..e38057b91865 100644
--- a/drivers/gpu/drm/gma500/psb_drv.h
+++ b/drivers/gpu/drm/gma500/psb_drv.h
@@ -83,6 +83,7 @@ enum {
 #define PSB_PGETBL_CTL		 0x2020
 #define _PSB_PGETBL_ENABLED	 0x00000001
 #define PSB_SGX_2D_SLAVE_PORT	 0x4000
+#define PSB_LPC_GBA		 0x44
 
 /* TODO: To get rid of */
 #define PSB_TT_PRIV0_LIMIT	 (256*1024*1024)
@@ -441,6 +442,7 @@ struct psb_ops;
 struct drm_psb_private {
 	struct drm_device *dev;
 	struct pci_dev *aux_pdev; /* Currently only used by mrst */
+	struct pci_dev *lpc_pdev; /* Currently only used by mrst */
 	const struct psb_ops *ops;
 	const struct psb_offset *regmap;
 	
@@ -470,6 +472,7 @@ struct drm_psb_private {
 	uint8_t __iomem *sgx_reg;
 	uint8_t __iomem *vdc_reg;
 	uint8_t __iomem *aux_reg; /* Auxillary vdc pipe regs */
+	uint16_t lpc_gpio_base;
 	uint32_t gatt_free_offset;
 
 	/* Fencing / irq */
diff --git a/drivers/gpu/drm/gma500/psb_intel_display.c b/drivers/gpu/drm/gma500/psb_intel_display.c
index 87b50ba64ed4..b21a09451d1d 100644
--- a/drivers/gpu/drm/gma500/psb_intel_display.c
+++ b/drivers/gpu/drm/gma500/psb_intel_display.c
@@ -21,6 +21,7 @@
 #include <linux/i2c.h>
 
 #include <drm/drmP.h>
+#include <drm/drm_plane_helper.h>
 #include "framebuffer.h"
 #include "psb_drv.h"
 #include "psb_intel_drv.h"
diff --git a/drivers/gpu/drm/gma500/psb_intel_drv.h b/drivers/gpu/drm/gma500/psb_intel_drv.h
index 336bd3aa1a06..860dd2177ca1 100644
--- a/drivers/gpu/drm/gma500/psb_intel_drv.h
+++ b/drivers/gpu/drm/gma500/psb_intel_drv.h
@@ -223,6 +223,7 @@ extern void oaktrail_lvds_init(struct drm_device *dev,
 extern void oaktrail_wait_for_INTR_PKT_SENT(struct drm_device *dev);
 extern void oaktrail_dsi_init(struct drm_device *dev,
 			   struct psb_intel_mode_device *mode_dev);
+extern void oaktrail_lvds_i2c_init(struct drm_encoder *encoder);
 extern void mid_dsi_init(struct drm_device *dev,
 		    struct psb_intel_mode_device *mode_dev, int dsi_num);
 
diff --git a/drivers/gpu/drm/gma500/psb_intel_sdvo.c b/drivers/gpu/drm/gma500/psb_intel_sdvo.c
index 0be96fdb5e28..58529cea575d 100644
--- a/drivers/gpu/drm/gma500/psb_intel_sdvo.c
+++ b/drivers/gpu/drm/gma500/psb_intel_sdvo.c
@@ -1631,57 +1631,8 @@ static int psb_intel_sdvo_get_modes(struct drm_connector *connector)
 	return !list_empty(&connector->probed_modes);
 }
 
-static void
-psb_intel_sdvo_destroy_enhance_property(struct drm_connector *connector)
-{
-	struct psb_intel_sdvo_connector *psb_intel_sdvo_connector = to_psb_intel_sdvo_connector(connector);
-	struct drm_device *dev = connector->dev;
-
-	if (psb_intel_sdvo_connector->left)
-		drm_property_destroy(dev, psb_intel_sdvo_connector->left);
-	if (psb_intel_sdvo_connector->right)
-		drm_property_destroy(dev, psb_intel_sdvo_connector->right);
-	if (psb_intel_sdvo_connector->top)
-		drm_property_destroy(dev, psb_intel_sdvo_connector->top);
-	if (psb_intel_sdvo_connector->bottom)
-		drm_property_destroy(dev, psb_intel_sdvo_connector->bottom);
-	if (psb_intel_sdvo_connector->hpos)
-		drm_property_destroy(dev, psb_intel_sdvo_connector->hpos);
-	if (psb_intel_sdvo_connector->vpos)
-		drm_property_destroy(dev, psb_intel_sdvo_connector->vpos);
-	if (psb_intel_sdvo_connector->saturation)
-		drm_property_destroy(dev, psb_intel_sdvo_connector->saturation);
-	if (psb_intel_sdvo_connector->contrast)
-		drm_property_destroy(dev, psb_intel_sdvo_connector->contrast);
-	if (psb_intel_sdvo_connector->hue)
-		drm_property_destroy(dev, psb_intel_sdvo_connector->hue);
-	if (psb_intel_sdvo_connector->sharpness)
-		drm_property_destroy(dev, psb_intel_sdvo_connector->sharpness);
-	if (psb_intel_sdvo_connector->flicker_filter)
-		drm_property_destroy(dev, psb_intel_sdvo_connector->flicker_filter);
-	if (psb_intel_sdvo_connector->flicker_filter_2d)
-		drm_property_destroy(dev, psb_intel_sdvo_connector->flicker_filter_2d);
-	if (psb_intel_sdvo_connector->flicker_filter_adaptive)
-		drm_property_destroy(dev, psb_intel_sdvo_connector->flicker_filter_adaptive);
-	if (psb_intel_sdvo_connector->tv_luma_filter)
-		drm_property_destroy(dev, psb_intel_sdvo_connector->tv_luma_filter);
-	if (psb_intel_sdvo_connector->tv_chroma_filter)
-		drm_property_destroy(dev, psb_intel_sdvo_connector->tv_chroma_filter);
-	if (psb_intel_sdvo_connector->dot_crawl)
-		drm_property_destroy(dev, psb_intel_sdvo_connector->dot_crawl);
-	if (psb_intel_sdvo_connector->brightness)
-		drm_property_destroy(dev, psb_intel_sdvo_connector->brightness);
-}
-
 static void psb_intel_sdvo_destroy(struct drm_connector *connector)
 {
-	struct psb_intel_sdvo_connector *psb_intel_sdvo_connector = to_psb_intel_sdvo_connector(connector);
-
-	if (psb_intel_sdvo_connector->tv_format)
-		drm_property_destroy(connector->dev,
-				     psb_intel_sdvo_connector->tv_format);
-
-	psb_intel_sdvo_destroy_enhance_property(connector);
 	drm_connector_unregister(connector);
 	drm_connector_cleanup(connector);
 	kfree(connector);
diff --git a/drivers/gpu/drm/i2c/Kconfig b/drivers/gpu/drm/i2c/Kconfig
index 4d341db462a2..22c7ed63a001 100644
--- a/drivers/gpu/drm/i2c/Kconfig
+++ b/drivers/gpu/drm/i2c/Kconfig
@@ -1,6 +1,12 @@
 menu "I2C encoder or helper chips"
      depends on DRM && DRM_KMS_HELPER && I2C
 
+config DRM_I2C_ADV7511
+	tristate "ADV7511 encoder"
+	select REGMAP_I2C
+	help
+	  Support for the Analog Devices ADV7511(W) and ADV7513 HDMI encoders.
+
 config DRM_I2C_CH7006
 	tristate "Chrontel ch7006 TV encoder"
 	default m if DRM_NOUVEAU
diff --git a/drivers/gpu/drm/i2c/Makefile b/drivers/gpu/drm/i2c/Makefile
index 43aa33baebed..2c72eb584ab7 100644
--- a/drivers/gpu/drm/i2c/Makefile
+++ b/drivers/gpu/drm/i2c/Makefile
@@ -1,5 +1,7 @@
 ccflags-y := -Iinclude/drm
 
+obj-$(CONFIG_DRM_I2C_ADV7511) += adv7511.o
+
 ch7006-y := ch7006_drv.o ch7006_mode.o
 obj-$(CONFIG_DRM_I2C_CH7006) += ch7006.o
 
diff --git a/drivers/gpu/drm/i2c/adv7511.c b/drivers/gpu/drm/i2c/adv7511.c
new file mode 100644
index 000000000000..faf1c0c5ab2e
--- /dev/null
+++ b/drivers/gpu/drm/i2c/adv7511.c
@@ -0,0 +1,1010 @@
+/*
+ * Analog Devices ADV7511 HDMI transmitter driver
+ *
+ * Copyright 2012 Analog Devices Inc.
+ *
+ * Licensed under the GPL-2.
+ */
+
+#include <linux/device.h>
+#include <linux/gpio/consumer.h>
+#include <linux/i2c.h>
+#include <linux/module.h>
+#include <linux/regmap.h>
+#include <linux/slab.h>
+
+#include <drm/drmP.h>
+#include <drm/drm_crtc_helper.h>
+#include <drm/drm_edid.h>
+#include <drm/drm_encoder_slave.h>
+
+#include "adv7511.h"
+
+struct adv7511 {
+	struct i2c_client *i2c_main;
+	struct i2c_client *i2c_edid;
+
+	struct regmap *regmap;
+	struct regmap *packet_memory_regmap;
+	enum drm_connector_status status;
+	int dpms_mode;
+
+	unsigned int f_tmds;
+
+	unsigned int current_edid_segment;
+	uint8_t edid_buf[256];
+
+	wait_queue_head_t wq;
+	struct drm_encoder *encoder;
+
+	bool embedded_sync;
+	enum adv7511_sync_polarity vsync_polarity;
+	enum adv7511_sync_polarity hsync_polarity;
+	bool rgb;
+
+	struct edid *edid;
+
+	struct gpio_desc *gpio_pd;
+};
+
+static struct adv7511 *encoder_to_adv7511(struct drm_encoder *encoder)
+{
+	return to_encoder_slave(encoder)->slave_priv;
+}
+
+/* ADI recommended values for proper operation. */
+static const struct reg_default adv7511_fixed_registers[] = {
+	{ 0x98, 0x03 },
+	{ 0x9a, 0xe0 },
+	{ 0x9c, 0x30 },
+	{ 0x9d, 0x61 },
+	{ 0xa2, 0xa4 },
+	{ 0xa3, 0xa4 },
+	{ 0xe0, 0xd0 },
+	{ 0xf9, 0x00 },
+	{ 0x55, 0x02 },
+};
+
+/* -----------------------------------------------------------------------------
+ * Register access
+ */
+
+static const uint8_t adv7511_register_defaults[] = {
+	0x12, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, /* 00 */
+	0x00, 0x00, 0x01, 0x0e, 0xbc, 0x18, 0x01, 0x13,
+	0x25, 0x37, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, /* 10 */
+	0x46, 0x62, 0x04, 0xa8, 0x00, 0x00, 0x1c, 0x84,
+	0x1c, 0xbf, 0x04, 0xa8, 0x1e, 0x70, 0x02, 0x1e, /* 20 */
+	0x00, 0x00, 0x04, 0xa8, 0x08, 0x12, 0x1b, 0xac,
+	0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, /* 30 */
+	0x00, 0x00, 0x00, 0x80, 0x00, 0x00, 0x00, 0xb0,
+	0x00, 0x50, 0x90, 0x7e, 0x79, 0x70, 0x00, 0x00, /* 40 */
+	0x00, 0xa8, 0x80, 0x00, 0x00, 0x00, 0x00, 0x00,
+	0x00, 0x00, 0x02, 0x0d, 0x00, 0x00, 0x00, 0x00, /* 50 */
+	0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+	0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, /* 60 */
+	0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+	0x01, 0x0a, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, /* 70 */
+	0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+	0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, /* 80 */
+	0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+	0x00, 0x00, 0x00, 0x00, 0xc0, 0x00, 0x00, 0x00, /* 90 */
+	0x0b, 0x02, 0x00, 0x18, 0x5a, 0x60, 0x00, 0x00,
+	0x00, 0x00, 0x80, 0x80, 0x08, 0x04, 0x00, 0x00, /* a0 */
+	0x00, 0x00, 0x00, 0x40, 0x00, 0x00, 0x40, 0x14,
+	0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, /* b0 */
+	0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+	0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, /* c0 */
+	0x00, 0x03, 0x00, 0x00, 0x02, 0x00, 0x01, 0x04,
+	0x30, 0xff, 0x80, 0x80, 0x80, 0x00, 0x00, 0x00, /* d0 */
+	0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x10, 0x01,
+	0x80, 0x75, 0x00, 0x00, 0x60, 0x00, 0x00, 0x00, /* e0 */
+	0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+	0x00, 0x00, 0x00, 0x00, 0x00, 0x75, 0x11, 0x00, /* f0 */
+	0x00, 0x7c, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+};
+
+static bool adv7511_register_volatile(struct device *dev, unsigned int reg)
+{
+	switch (reg) {
+	case ADV7511_REG_CHIP_REVISION:
+	case ADV7511_REG_SPDIF_FREQ:
+	case ADV7511_REG_CTS_AUTOMATIC1:
+	case ADV7511_REG_CTS_AUTOMATIC2:
+	case ADV7511_REG_VIC_DETECTED:
+	case ADV7511_REG_VIC_SEND:
+	case ADV7511_REG_AUX_VIC_DETECTED:
+	case ADV7511_REG_STATUS:
+	case ADV7511_REG_GC(1):
+	case ADV7511_REG_INT(0):
+	case ADV7511_REG_INT(1):
+	case ADV7511_REG_PLL_STATUS:
+	case ADV7511_REG_AN(0):
+	case ADV7511_REG_AN(1):
+	case ADV7511_REG_AN(2):
+	case ADV7511_REG_AN(3):
+	case ADV7511_REG_AN(4):
+	case ADV7511_REG_AN(5):
+	case ADV7511_REG_AN(6):
+	case ADV7511_REG_AN(7):
+	case ADV7511_REG_HDCP_STATUS:
+	case ADV7511_REG_BCAPS:
+	case ADV7511_REG_BKSV(0):
+	case ADV7511_REG_BKSV(1):
+	case ADV7511_REG_BKSV(2):
+	case ADV7511_REG_BKSV(3):
+	case ADV7511_REG_BKSV(4):
+	case ADV7511_REG_DDC_STATUS:
+	case ADV7511_REG_BSTATUS(0):
+	case ADV7511_REG_BSTATUS(1):
+	case ADV7511_REG_CHIP_ID_HIGH:
+	case ADV7511_REG_CHIP_ID_LOW:
+		return true;
+	}
+
+	return false;
+}
+
+static const struct regmap_config adv7511_regmap_config = {
+	.reg_bits = 8,
+	.val_bits = 8,
+
+	.max_register = 0xff,
+	.cache_type = REGCACHE_RBTREE,
+	.reg_defaults_raw = adv7511_register_defaults,
+	.num_reg_defaults_raw = ARRAY_SIZE(adv7511_register_defaults),
+
+	.volatile_reg = adv7511_register_volatile,
+};
+
+/* -----------------------------------------------------------------------------
+ * Hardware configuration
+ */
+
+static void adv7511_set_colormap(struct adv7511 *adv7511, bool enable,
+				 const uint16_t *coeff,
+				 unsigned int scaling_factor)
+{
+	unsigned int i;
+
+	regmap_update_bits(adv7511->regmap, ADV7511_REG_CSC_UPPER(1),
+			   ADV7511_CSC_UPDATE_MODE, ADV7511_CSC_UPDATE_MODE);
+
+	if (enable) {
+		for (i = 0; i < 12; ++i) {
+			regmap_update_bits(adv7511->regmap,
+					   ADV7511_REG_CSC_UPPER(i),
+					   0x1f, coeff[i] >> 8);
+			regmap_write(adv7511->regmap,
+				     ADV7511_REG_CSC_LOWER(i),
+				     coeff[i] & 0xff);
+		}
+	}
+
+	if (enable)
+		regmap_update_bits(adv7511->regmap, ADV7511_REG_CSC_UPPER(0),
+				   0xe0, 0x80 | (scaling_factor << 5));
+	else
+		regmap_update_bits(adv7511->regmap, ADV7511_REG_CSC_UPPER(0),
+				   0x80, 0x00);
+
+	regmap_update_bits(adv7511->regmap, ADV7511_REG_CSC_UPPER(1),
+			   ADV7511_CSC_UPDATE_MODE, 0);
+}
+
+static int adv7511_packet_enable(struct adv7511 *adv7511, unsigned int packet)
+{
+	if (packet & 0xff)
+		regmap_update_bits(adv7511->regmap, ADV7511_REG_PACKET_ENABLE0,
+				   packet, 0xff);
+
+	if (packet & 0xff00) {
+		packet >>= 8;
+		regmap_update_bits(adv7511->regmap, ADV7511_REG_PACKET_ENABLE1,
+				   packet, 0xff);
+	}
+
+	return 0;
+}
+
+static int adv7511_packet_disable(struct adv7511 *adv7511, unsigned int packet)
+{
+	if (packet & 0xff)
+		regmap_update_bits(adv7511->regmap, ADV7511_REG_PACKET_ENABLE0,
+				   packet, 0x00);
+
+	if (packet & 0xff00) {
+		packet >>= 8;
+		regmap_update_bits(adv7511->regmap, ADV7511_REG_PACKET_ENABLE1,
+				   packet, 0x00);
+	}
+
+	return 0;
+}
+
+/* Coefficients for adv7511 color space conversion */
+static const uint16_t adv7511_csc_ycbcr_to_rgb[] = {
+	0x0734, 0x04ad, 0x0000, 0x1c1b,
+	0x1ddc, 0x04ad, 0x1f24, 0x0135,
+	0x0000, 0x04ad, 0x087c, 0x1b77,
+};
+
+static void adv7511_set_config_csc(struct adv7511 *adv7511,
+				   struct drm_connector *connector,
+				   bool rgb)
+{
+	struct adv7511_video_config config;
+	bool output_format_422, output_format_ycbcr;
+	unsigned int mode;
+	uint8_t infoframe[17];
+
+	if (adv7511->edid)
+		config.hdmi_mode = drm_detect_hdmi_monitor(adv7511->edid);
+	else
+		config.hdmi_mode = false;
+
+	hdmi_avi_infoframe_init(&config.avi_infoframe);
+
+	config.avi_infoframe.scan_mode = HDMI_SCAN_MODE_UNDERSCAN;
+
+	if (rgb) {
+		config.csc_enable = false;
+		config.avi_infoframe.colorspace = HDMI_COLORSPACE_RGB;
+	} else {
+		config.csc_scaling_factor = ADV7511_CSC_SCALING_4;
+		config.csc_coefficents = adv7511_csc_ycbcr_to_rgb;
+
+		if ((connector->display_info.color_formats &
+		     DRM_COLOR_FORMAT_YCRCB422) &&
+		    config.hdmi_mode) {
+			config.csc_enable = false;
+			config.avi_infoframe.colorspace =
+				HDMI_COLORSPACE_YUV422;
+		} else {
+			config.csc_enable = true;
+			config.avi_infoframe.colorspace = HDMI_COLORSPACE_RGB;
+		}
+	}
+
+	if (config.hdmi_mode) {
+		mode = ADV7511_HDMI_CFG_MODE_HDMI;
+
+		switch (config.avi_infoframe.colorspace) {
+		case HDMI_COLORSPACE_YUV444:
+			output_format_422 = false;
+			output_format_ycbcr = true;
+			break;
+		case HDMI_COLORSPACE_YUV422:
+			output_format_422 = true;
+			output_format_ycbcr = true;
+			break;
+		default:
+			output_format_422 = false;
+			output_format_ycbcr = false;
+			break;
+		}
+	} else {
+		mode = ADV7511_HDMI_CFG_MODE_DVI;
+		output_format_422 = false;
+		output_format_ycbcr = false;
+	}
+
+	adv7511_packet_disable(adv7511, ADV7511_PACKET_ENABLE_AVI_INFOFRAME);
+
+	adv7511_set_colormap(adv7511, config.csc_enable,
+			     config.csc_coefficents,
+			     config.csc_scaling_factor);
+
+	regmap_update_bits(adv7511->regmap, ADV7511_REG_VIDEO_INPUT_CFG1, 0x81,
+			   (output_format_422 << 7) | output_format_ycbcr);
+
+	regmap_update_bits(adv7511->regmap, ADV7511_REG_HDCP_HDMI_CFG,
+			   ADV7511_HDMI_CFG_MODE_MASK, mode);
+
+	hdmi_avi_infoframe_pack(&config.avi_infoframe, infoframe,
+				sizeof(infoframe));
+
+	/* The AVI infoframe id is not configurable */
+	regmap_bulk_write(adv7511->regmap, ADV7511_REG_AVI_INFOFRAME_VERSION,
+			  infoframe + 1, sizeof(infoframe) - 1);
+
+	adv7511_packet_enable(adv7511, ADV7511_PACKET_ENABLE_AVI_INFOFRAME);
+}
+
+static void adv7511_set_link_config(struct adv7511 *adv7511,
+				    const struct adv7511_link_config *config)
+{
+	/*
+	 * The input style values documented in the datasheet don't match the
+	 * hardware register field values :-(
+	 */
+	static const unsigned int input_styles[4] = { 0, 2, 1, 3 };
+
+	unsigned int clock_delay;
+	unsigned int color_depth;
+	unsigned int input_id;
+
+	clock_delay = (config->clock_delay + 1200) / 400;
+	color_depth = config->input_color_depth == 8 ? 3
+		    : (config->input_color_depth == 10 ? 1 : 2);
+
+	/* TODO Support input ID 6 */
+	if (config->input_colorspace != HDMI_COLORSPACE_YUV422)
+		input_id = config->input_clock == ADV7511_INPUT_CLOCK_DDR
+			 ? 5 : 0;
+	else if (config->input_clock == ADV7511_INPUT_CLOCK_DDR)
+		input_id = config->embedded_sync ? 8 : 7;
+	else if (config->input_clock == ADV7511_INPUT_CLOCK_2X)
+		input_id = config->embedded_sync ? 4 : 3;
+	else
+		input_id = config->embedded_sync ? 2 : 1;
+
+	regmap_update_bits(adv7511->regmap, ADV7511_REG_I2C_FREQ_ID_CFG, 0xf,
+			   input_id);
+	regmap_update_bits(adv7511->regmap, ADV7511_REG_VIDEO_INPUT_CFG1, 0x7e,
+			   (color_depth << 4) |
+			   (input_styles[config->input_style] << 2));
+	regmap_write(adv7511->regmap, ADV7511_REG_VIDEO_INPUT_CFG2,
+		     config->input_justification << 3);
+	regmap_write(adv7511->regmap, ADV7511_REG_TIMING_GEN_SEQ,
+		     config->sync_pulse << 2);
+
+	regmap_write(adv7511->regmap, 0xba, clock_delay << 5);
+
+	adv7511->embedded_sync = config->embedded_sync;
+	adv7511->hsync_polarity = config->hsync_polarity;
+	adv7511->vsync_polarity = config->vsync_polarity;
+	adv7511->rgb = config->input_colorspace == HDMI_COLORSPACE_RGB;
+}
+
+/* -----------------------------------------------------------------------------
+ * Interrupt and hotplug detection
+ */
+
+static bool adv7511_hpd(struct adv7511 *adv7511)
+{
+	unsigned int irq0;
+	int ret;
+
+	ret = regmap_read(adv7511->regmap, ADV7511_REG_INT(0), &irq0);
+	if (ret < 0)
+		return false;
+
+	if (irq0 & ADV7511_INT0_HDP) {
+		regmap_write(adv7511->regmap, ADV7511_REG_INT(0),
+			     ADV7511_INT0_HDP);
+		return true;
+	}
+
+	return false;
+}
+
+static irqreturn_t adv7511_irq_handler(int irq, void *devid)
+{
+	struct adv7511 *adv7511 = devid;
+
+	if (adv7511_hpd(adv7511))
+		drm_helper_hpd_irq_event(adv7511->encoder->dev);
+
+	wake_up_all(&adv7511->wq);
+
+	return IRQ_HANDLED;
+}
+
+static unsigned int adv7511_is_interrupt_pending(struct adv7511 *adv7511,
+						 unsigned int irq)
+{
+	unsigned int irq0, irq1;
+	unsigned int pending;
+	int ret;
+
+	ret = regmap_read(adv7511->regmap, ADV7511_REG_INT(0), &irq0);
+	if (ret < 0)
+		return 0;
+	ret = regmap_read(adv7511->regmap, ADV7511_REG_INT(1), &irq1);
+	if (ret < 0)
+		return 0;
+
+	pending = (irq1 << 8) | irq0;
+
+	return pending & irq;
+}
+
+static int adv7511_wait_for_interrupt(struct adv7511 *adv7511, int irq,
+				      int timeout)
+{
+	unsigned int pending;
+	int ret;
+
+	if (adv7511->i2c_main->irq) {
+		ret = wait_event_interruptible_timeout(adv7511->wq,
+				adv7511_is_interrupt_pending(adv7511, irq),
+				msecs_to_jiffies(timeout));
+		if (ret <= 0)
+			return 0;
+		pending = adv7511_is_interrupt_pending(adv7511, irq);
+	} else {
+		if (timeout < 25)
+			timeout = 25;
+		do {
+			pending = adv7511_is_interrupt_pending(adv7511, irq);
+			if (pending)
+				break;
+			msleep(25);
+			timeout -= 25;
+		} while (timeout >= 25);
+	}
+
+	return pending;
+}
+
+/* -----------------------------------------------------------------------------
+ * EDID retrieval
+ */
+
+static int adv7511_get_edid_block(void *data, u8 *buf, unsigned int block,
+				  size_t len)
+{
+	struct adv7511 *adv7511 = data;
+	struct i2c_msg xfer[2];
+	uint8_t offset;
+	unsigned int i;
+	int ret;
+
+	if (len > 128)
+		return -EINVAL;
+
+	if (adv7511->current_edid_segment != block / 2) {
+		unsigned int status;
+
+		ret = regmap_read(adv7511->regmap, ADV7511_REG_DDC_STATUS,
+				  &status);
+		if (ret < 0)
+			return ret;
+
+		if (status != 2) {
+			regmap_write(adv7511->regmap, ADV7511_REG_EDID_SEGMENT,
+				     block);
+			ret = adv7511_wait_for_interrupt(adv7511,
+					ADV7511_INT0_EDID_READY |
+					ADV7511_INT1_DDC_ERROR, 200);
+
+			if (!(ret & ADV7511_INT0_EDID_READY))
+				return -EIO;
+		}
+
+		regmap_write(adv7511->regmap, ADV7511_REG_INT(0),
+			     ADV7511_INT0_EDID_READY | ADV7511_INT1_DDC_ERROR);
+
+		/* Break this apart, hopefully more I2C controllers will
+		 * support 64 byte transfers than 256 byte transfers
+		 */
+
+		xfer[0].addr = adv7511->i2c_edid->addr;
+		xfer[0].flags = 0;
+		xfer[0].len = 1;
+		xfer[0].buf = &offset;
+		xfer[1].addr = adv7511->i2c_edid->addr;
+		xfer[1].flags = I2C_M_RD;
+		xfer[1].len = 64;
+		xfer[1].buf = adv7511->edid_buf;
+
+		offset = 0;
+
+		for (i = 0; i < 4; ++i) {
+			ret = i2c_transfer(adv7511->i2c_edid->adapter, xfer,
+					   ARRAY_SIZE(xfer));
+			if (ret < 0)
+				return ret;
+			else if (ret != 2)
+				return -EIO;
+
+			xfer[1].buf += 64;
+			offset += 64;
+		}
+
+		adv7511->current_edid_segment = block / 2;
+	}
+
+	if (block % 2 == 0)
+		memcpy(buf, adv7511->edid_buf, len);
+	else
+		memcpy(buf, adv7511->edid_buf + 128, len);
+
+	return 0;
+}
+
+/* -----------------------------------------------------------------------------
+ * Encoder operations
+ */
+
+static int adv7511_get_modes(struct drm_encoder *encoder,
+			     struct drm_connector *connector)
+{
+	struct adv7511 *adv7511 = encoder_to_adv7511(encoder);
+	struct edid *edid;
+	unsigned int count;
+
+	/* Reading the EDID only works if the device is powered */
+	if (adv7511->dpms_mode != DRM_MODE_DPMS_ON) {
+		regmap_write(adv7511->regmap, ADV7511_REG_INT(0),
+			     ADV7511_INT0_EDID_READY | ADV7511_INT1_DDC_ERROR);
+		regmap_update_bits(adv7511->regmap, ADV7511_REG_POWER,
+				   ADV7511_POWER_POWER_DOWN, 0);
+		adv7511->current_edid_segment = -1;
+	}
+
+	edid = drm_do_get_edid(connector, adv7511_get_edid_block, adv7511);
+
+	if (adv7511->dpms_mode != DRM_MODE_DPMS_ON)
+		regmap_update_bits(adv7511->regmap, ADV7511_REG_POWER,
+				   ADV7511_POWER_POWER_DOWN,
+				   ADV7511_POWER_POWER_DOWN);
+
+	kfree(adv7511->edid);
+	adv7511->edid = edid;
+	if (!edid)
+		return 0;
+
+	drm_mode_connector_update_edid_property(connector, edid);
+	count = drm_add_edid_modes(connector, edid);
+
+	adv7511_set_config_csc(adv7511, connector, adv7511->rgb);
+
+	return count;
+}
+
+static void adv7511_encoder_dpms(struct drm_encoder *encoder, int mode)
+{
+	struct adv7511 *adv7511 = encoder_to_adv7511(encoder);
+
+	switch (mode) {
+	case DRM_MODE_DPMS_ON:
+		adv7511->current_edid_segment = -1;
+
+		regmap_write(adv7511->regmap, ADV7511_REG_INT(0),
+			     ADV7511_INT0_EDID_READY | ADV7511_INT1_DDC_ERROR);
+		regmap_update_bits(adv7511->regmap, ADV7511_REG_POWER,
+				   ADV7511_POWER_POWER_DOWN, 0);
+		/*
+		 * Per spec it is allowed to pulse the HDP signal to indicate
+		 * that the EDID information has changed. Some monitors do this
+		 * when they wakeup from standby or are enabled. When the HDP
+		 * goes low the adv7511 is reset and the outputs are disabled
+		 * which might cause the monitor to go to standby again. To
+		 * avoid this we ignore the HDP pin for the first few seconds
+		 * after enabling the output.
+		 */
+		regmap_update_bits(adv7511->regmap, ADV7511_REG_POWER2,
+				   ADV7511_REG_POWER2_HDP_SRC_MASK,
+				   ADV7511_REG_POWER2_HDP_SRC_NONE);
+		/* Most of the registers are reset during power down or
+		 * when HPD is low
+		 */
+		regcache_sync(adv7511->regmap);
+		break;
+	default:
+		/* TODO: set up additional power down modes */
+		regmap_update_bits(adv7511->regmap, ADV7511_REG_POWER,
+				   ADV7511_POWER_POWER_DOWN,
+				   ADV7511_POWER_POWER_DOWN);
+		regcache_mark_dirty(adv7511->regmap);
+		break;
+	}
+
+	adv7511->dpms_mode = mode;
+}
+
+static enum drm_connector_status
+adv7511_encoder_detect(struct drm_encoder *encoder,
+		       struct drm_connector *connector)
+{
+	struct adv7511 *adv7511 = encoder_to_adv7511(encoder);
+	enum drm_connector_status status;
+	unsigned int val;
+	bool hpd;
+	int ret;
+
+	ret = regmap_read(adv7511->regmap, ADV7511_REG_STATUS, &val);
+	if (ret < 0)
+		return connector_status_disconnected;
+
+	if (val & ADV7511_STATUS_HPD)
+		status = connector_status_connected;
+	else
+		status = connector_status_disconnected;
+
+	hpd = adv7511_hpd(adv7511);
+
+	/* The chip resets itself when the cable is disconnected, so in case
+	 * there is a pending HPD interrupt and the cable is connected there was
+	 * at least one transition from disconnected to connected and the chip
+	 * has to be reinitialized. */
+	if (status == connector_status_connected && hpd &&
+	    adv7511->dpms_mode == DRM_MODE_DPMS_ON) {
+		regcache_mark_dirty(adv7511->regmap);
+		adv7511_encoder_dpms(encoder, adv7511->dpms_mode);
+		adv7511_get_modes(encoder, connector);
+		if (adv7511->status == connector_status_connected)
+			status = connector_status_disconnected;
+	} else {
+		/* Re-enable HDP sensing */
+		regmap_update_bits(adv7511->regmap, ADV7511_REG_POWER2,
+				   ADV7511_REG_POWER2_HDP_SRC_MASK,
+				   ADV7511_REG_POWER2_HDP_SRC_BOTH);
+	}
+
+	adv7511->status = status;
+	return status;
+}
+
+static int adv7511_encoder_mode_valid(struct drm_encoder *encoder,
+				      struct drm_display_mode *mode)
+{
+	if (mode->clock > 165000)
+		return MODE_CLOCK_HIGH;
+
+	if (mode->flags & DRM_MODE_FLAG_INTERLACE)
+		return MODE_NO_INTERLACE;
+
+	return MODE_OK;
+}
+
+static void adv7511_encoder_mode_set(struct drm_encoder *encoder,
+				     struct drm_display_mode *mode,
+				     struct drm_display_mode *adj_mode)
+{
+	struct adv7511 *adv7511 = encoder_to_adv7511(encoder);
+	unsigned int low_refresh_rate;
+	unsigned int hsync_polarity = 0;
+	unsigned int vsync_polarity = 0;
+
+	if (adv7511->embedded_sync) {
+		unsigned int hsync_offset, hsync_len;
+		unsigned int vsync_offset, vsync_len;
+
+		hsync_offset = adj_mode->crtc_hsync_start -
+			       adj_mode->crtc_hdisplay;
+		vsync_offset = adj_mode->crtc_vsync_start -
+			       adj_mode->crtc_vdisplay;
+		hsync_len = adj_mode->crtc_hsync_end -
+			    adj_mode->crtc_hsync_start;
+		vsync_len = adj_mode->crtc_vsync_end -
+			    adj_mode->crtc_vsync_start;
+
+		/* The hardware vsync generator has an off-by-one bug */
+		vsync_offset += 1;
+
+		regmap_write(adv7511->regmap, ADV7511_REG_HSYNC_PLACEMENT_MSB,
+			     ((hsync_offset >> 10) & 0x7) << 5);
+		regmap_write(adv7511->regmap, ADV7511_REG_SYNC_DECODER(0),
+			     (hsync_offset >> 2) & 0xff);
+		regmap_write(adv7511->regmap, ADV7511_REG_SYNC_DECODER(1),
+			     ((hsync_offset & 0x3) << 6) |
+			     ((hsync_len >> 4) & 0x3f));
+		regmap_write(adv7511->regmap, ADV7511_REG_SYNC_DECODER(2),
+			     ((hsync_len & 0xf) << 4) |
+			     ((vsync_offset >> 6) & 0xf));
+		regmap_write(adv7511->regmap, ADV7511_REG_SYNC_DECODER(3),
+			     ((vsync_offset & 0x3f) << 2) |
+			     ((vsync_len >> 8) & 0x3));
+		regmap_write(adv7511->regmap, ADV7511_REG_SYNC_DECODER(4),
+			     vsync_len & 0xff);
+
+		hsync_polarity = !(adj_mode->flags & DRM_MODE_FLAG_PHSYNC);
+		vsync_polarity = !(adj_mode->flags & DRM_MODE_FLAG_PVSYNC);
+	} else {
+		enum adv7511_sync_polarity mode_hsync_polarity;
+		enum adv7511_sync_polarity mode_vsync_polarity;
+
+		/*
+		 * If the input signal is always low or always high we want to
+		 * invert it or pass it through depending on the polarity of the
+		 * current mode.
+		 */
+		if (adj_mode->flags & DRM_MODE_FLAG_NHSYNC)
+			mode_hsync_polarity = ADV7511_SYNC_POLARITY_LOW;
+		else
+			mode_hsync_polarity = ADV7511_SYNC_POLARITY_HIGH;
+
+		if (adj_mode->flags & DRM_MODE_FLAG_NVSYNC)
+			mode_vsync_polarity = ADV7511_SYNC_POLARITY_LOW;
+		else
+			mode_vsync_polarity = ADV7511_SYNC_POLARITY_HIGH;
+
+		if (adv7511->hsync_polarity != mode_hsync_polarity &&
+		    adv7511->hsync_polarity !=
+		    ADV7511_SYNC_POLARITY_PASSTHROUGH)
+			hsync_polarity = 1;
+
+		if (adv7511->vsync_polarity != mode_vsync_polarity &&
+		    adv7511->vsync_polarity !=
+		    ADV7511_SYNC_POLARITY_PASSTHROUGH)
+			vsync_polarity = 1;
+	}
+
+	if (mode->vrefresh <= 24000)
+		low_refresh_rate = ADV7511_LOW_REFRESH_RATE_24HZ;
+	else if (mode->vrefresh <= 25000)
+		low_refresh_rate = ADV7511_LOW_REFRESH_RATE_25HZ;
+	else if (mode->vrefresh <= 30000)
+		low_refresh_rate = ADV7511_LOW_REFRESH_RATE_30HZ;
+	else
+		low_refresh_rate = ADV7511_LOW_REFRESH_RATE_NONE;
+
+	regmap_update_bits(adv7511->regmap, 0xfb,
+		0x6, low_refresh_rate << 1);
+	regmap_update_bits(adv7511->regmap, 0x17,
+		0x60, (vsync_polarity << 6) | (hsync_polarity << 5));
+
+	/*
+	 * TODO Test first order 4:2:2 to 4:4:4 up conversion method, which is
+	 * supposed to give better results.
+	 */
+
+	adv7511->f_tmds = mode->clock;
+}
+
+static struct drm_encoder_slave_funcs adv7511_encoder_funcs = {
+	.dpms = adv7511_encoder_dpms,
+	.mode_valid = adv7511_encoder_mode_valid,
+	.mode_set = adv7511_encoder_mode_set,
+	.detect = adv7511_encoder_detect,
+	.get_modes = adv7511_get_modes,
+};
+
+/* -----------------------------------------------------------------------------
+ * Probe & remove
+ */
+
+static int adv7511_parse_dt(struct device_node *np,
+			    struct adv7511_link_config *config)
+{
+	const char *str;
+	int ret;
+
+	memset(config, 0, sizeof(*config));
+
+	of_property_read_u32(np, "adi,input-depth", &config->input_color_depth);
+	if (config->input_color_depth != 8 && config->input_color_depth != 10 &&
+	    config->input_color_depth != 12)
+		return -EINVAL;
+
+	ret = of_property_read_string(np, "adi,input-colorspace", &str);
+	if (ret < 0)
+		return ret;
+
+	if (!strcmp(str, "rgb"))
+		config->input_colorspace = HDMI_COLORSPACE_RGB;
+	else if (!strcmp(str, "yuv422"))
+		config->input_colorspace = HDMI_COLORSPACE_YUV422;
+	else if (!strcmp(str, "yuv444"))
+		config->input_colorspace = HDMI_COLORSPACE_YUV444;
+	else
+		return -EINVAL;
+
+	ret = of_property_read_string(np, "adi,input-clock", &str);
+	if (ret < 0)
+		return ret;
+
+	if (!strcmp(str, "1x"))
+		config->input_clock = ADV7511_INPUT_CLOCK_1X;
+	else if (!strcmp(str, "2x"))
+		config->input_clock = ADV7511_INPUT_CLOCK_2X;
+	else if (!strcmp(str, "ddr"))
+		config->input_clock = ADV7511_INPUT_CLOCK_DDR;
+	else
+		return -EINVAL;
+
+	if (config->input_colorspace == HDMI_COLORSPACE_YUV422 ||
+	    config->input_clock != ADV7511_INPUT_CLOCK_1X) {
+		ret = of_property_read_u32(np, "adi,input-style",
+					   &config->input_style);
+		if (ret)
+			return ret;
+
+		if (config->input_style < 1 || config->input_style > 3)
+			return -EINVAL;
+
+		ret = of_property_read_string(np, "adi,input-justification",
+					      &str);
+		if (ret < 0)
+			return ret;
+
+		if (!strcmp(str, "left"))
+			config->input_justification =
+				ADV7511_INPUT_JUSTIFICATION_LEFT;
+		else if (!strcmp(str, "evenly"))
+			config->input_justification =
+				ADV7511_INPUT_JUSTIFICATION_EVENLY;
+		else if (!strcmp(str, "right"))
+			config->input_justification =
+				ADV7511_INPUT_JUSTIFICATION_RIGHT;
+		else
+			return -EINVAL;
+
+	} else {
+		config->input_style = 1;
+		config->input_justification = ADV7511_INPUT_JUSTIFICATION_LEFT;
+	}
+
+	of_property_read_u32(np, "adi,clock-delay", &config->clock_delay);
+	if (config->clock_delay < -1200 || config->clock_delay > 1600)
+		return -EINVAL;
+
+	config->embedded_sync = of_property_read_bool(np, "adi,embedded-sync");
+
+	/* Hardcode the sync pulse configurations for now. */
+	config->sync_pulse = ADV7511_INPUT_SYNC_PULSE_NONE;
+	config->vsync_polarity = ADV7511_SYNC_POLARITY_PASSTHROUGH;
+	config->hsync_polarity = ADV7511_SYNC_POLARITY_PASSTHROUGH;
+
+	return 0;
+}
+
+static const int edid_i2c_addr = 0x7e;
+static const int packet_i2c_addr = 0x70;
+static const int cec_i2c_addr = 0x78;
+
+static int adv7511_probe(struct i2c_client *i2c, const struct i2c_device_id *id)
+{
+	struct adv7511_link_config link_config;
+	struct adv7511 *adv7511;
+	struct device *dev = &i2c->dev;
+	unsigned int val;
+	int ret;
+
+	if (!dev->of_node)
+		return -EINVAL;
+
+	adv7511 = devm_kzalloc(dev, sizeof(*adv7511), GFP_KERNEL);
+	if (!adv7511)
+		return -ENOMEM;
+
+	adv7511->dpms_mode = DRM_MODE_DPMS_OFF;
+	adv7511->status = connector_status_disconnected;
+
+	ret = adv7511_parse_dt(dev->of_node, &link_config);
+	if (ret)
+		return ret;
+
+	/*
+	 * The power down GPIO is optional. If present, toggle it from active to
+	 * inactive to wake up the encoder.
+	 */
+	adv7511->gpio_pd = devm_gpiod_get_optional(dev, "pd", GPIOD_OUT_HIGH);
+	if (IS_ERR(adv7511->gpio_pd))
+		return PTR_ERR(adv7511->gpio_pd);
+
+	if (adv7511->gpio_pd) {
+		mdelay(5);
+		gpiod_set_value_cansleep(adv7511->gpio_pd, 0);
+	}
+
+	adv7511->regmap = devm_regmap_init_i2c(i2c, &adv7511_regmap_config);
+	if (IS_ERR(adv7511->regmap))
+		return PTR_ERR(adv7511->regmap);
+
+	ret = regmap_read(adv7511->regmap, ADV7511_REG_CHIP_REVISION, &val);
+	if (ret)
+		return ret;
+	dev_dbg(dev, "Rev. %u\n", val);
+
+	ret = regmap_register_patch(adv7511->regmap, adv7511_fixed_registers,
+				    ARRAY_SIZE(adv7511_fixed_registers));
+	if (ret)
+		return ret;
+
+	regmap_write(adv7511->regmap, ADV7511_REG_EDID_I2C_ADDR, edid_i2c_addr);
+	regmap_write(adv7511->regmap, ADV7511_REG_PACKET_I2C_ADDR,
+		     packet_i2c_addr);
+	regmap_write(adv7511->regmap, ADV7511_REG_CEC_I2C_ADDR, cec_i2c_addr);
+	adv7511_packet_disable(adv7511, 0xffff);
+
+	adv7511->i2c_main = i2c;
+	adv7511->i2c_edid = i2c_new_dummy(i2c->adapter, edid_i2c_addr >> 1);
+	if (!adv7511->i2c_edid)
+		return -ENOMEM;
+
+	if (i2c->irq) {
+		init_waitqueue_head(&adv7511->wq);
+
+		ret = devm_request_threaded_irq(dev, i2c->irq, NULL,
+						adv7511_irq_handler,
+						IRQF_ONESHOT, dev_name(dev),
+						adv7511);
+		if (ret)
+			goto err_i2c_unregister_device;
+	}
+
+	/* CEC is unused for now */
+	regmap_write(adv7511->regmap, ADV7511_REG_CEC_CTRL,
+		     ADV7511_CEC_CTRL_POWER_DOWN);
+
+	regmap_update_bits(adv7511->regmap, ADV7511_REG_POWER,
+			   ADV7511_POWER_POWER_DOWN, ADV7511_POWER_POWER_DOWN);
+
+	adv7511->current_edid_segment = -1;
+
+	i2c_set_clientdata(i2c, adv7511);
+
+	adv7511_set_link_config(adv7511, &link_config);
+
+	return 0;
+
+err_i2c_unregister_device:
+	i2c_unregister_device(adv7511->i2c_edid);
+
+	return ret;
+}
+
+static int adv7511_remove(struct i2c_client *i2c)
+{
+	struct adv7511 *adv7511 = i2c_get_clientdata(i2c);
+
+	i2c_unregister_device(adv7511->i2c_edid);
+
+	kfree(adv7511->edid);
+
+	return 0;
+}
+
+static int adv7511_encoder_init(struct i2c_client *i2c, struct drm_device *dev,
+				struct drm_encoder_slave *encoder)
+{
+
+	struct adv7511 *adv7511 = i2c_get_clientdata(i2c);
+
+	encoder->slave_priv = adv7511;
+	encoder->slave_funcs = &adv7511_encoder_funcs;
+
+	adv7511->encoder = &encoder->base;
+
+	return 0;
+}
+
+static const struct i2c_device_id adv7511_i2c_ids[] = {
+	{ "adv7511", 0 },
+	{ "adv7511w", 0 },
+	{ "adv7513", 0 },
+	{ }
+};
+MODULE_DEVICE_TABLE(i2c, adv7511_i2c_ids);
+
+static const struct of_device_id adv7511_of_ids[] = {
+	{ .compatible = "adi,adv7511", },
+	{ .compatible = "adi,adv7511w", },
+	{ .compatible = "adi,adv7513", },
+	{ }
+};
+MODULE_DEVICE_TABLE(of, adv7511_of_ids);
+
+static struct drm_i2c_encoder_driver adv7511_driver = {
+	.i2c_driver = {
+		.driver = {
+			.name = "adv7511",
+			.of_match_table = adv7511_of_ids,
+		},
+		.id_table = adv7511_i2c_ids,
+		.probe = adv7511_probe,
+		.remove = adv7511_remove,
+	},
+
+	.encoder_init = adv7511_encoder_init,
+};
+
+static int __init adv7511_init(void)
+{
+	return drm_i2c_encoder_register(THIS_MODULE, &adv7511_driver);
+}
+module_init(adv7511_init);
+
+static void __exit adv7511_exit(void)
+{
+	drm_i2c_encoder_unregister(&adv7511_driver);
+}
+module_exit(adv7511_exit);
+
+MODULE_AUTHOR("Lars-Peter Clausen <lars@metafoo.de>");
+MODULE_DESCRIPTION("ADV7511 HDMI transmitter driver");
+MODULE_LICENSE("GPL");
diff --git a/drivers/gpu/drm/i2c/adv7511.h b/drivers/gpu/drm/i2c/adv7511.h
new file mode 100644
index 000000000000..6599ed538426
--- /dev/null
+++ b/drivers/gpu/drm/i2c/adv7511.h
@@ -0,0 +1,289 @@
+/*
+ * Analog Devices ADV7511 HDMI transmitter driver
+ *
+ * Copyright 2012 Analog Devices Inc.
+ *
+ * Licensed under the GPL-2.
+ */
+
+#ifndef __DRM_I2C_ADV7511_H__
+#define __DRM_I2C_ADV7511_H__
+
+#include <linux/hdmi.h>
+
+#define ADV7511_REG_CHIP_REVISION		0x00
+#define ADV7511_REG_N0				0x01
+#define ADV7511_REG_N1				0x02
+#define ADV7511_REG_N2				0x03
+#define ADV7511_REG_SPDIF_FREQ			0x04
+#define ADV7511_REG_CTS_AUTOMATIC1		0x05
+#define ADV7511_REG_CTS_AUTOMATIC2		0x06
+#define ADV7511_REG_CTS_MANUAL0			0x07
+#define ADV7511_REG_CTS_MANUAL1			0x08
+#define ADV7511_REG_CTS_MANUAL2			0x09
+#define ADV7511_REG_AUDIO_SOURCE		0x0a
+#define ADV7511_REG_AUDIO_CONFIG		0x0b
+#define ADV7511_REG_I2S_CONFIG			0x0c
+#define ADV7511_REG_I2S_WIDTH			0x0d
+#define ADV7511_REG_AUDIO_SUB_SRC0		0x0e
+#define ADV7511_REG_AUDIO_SUB_SRC1		0x0f
+#define ADV7511_REG_AUDIO_SUB_SRC2		0x10
+#define ADV7511_REG_AUDIO_SUB_SRC3		0x11
+#define ADV7511_REG_AUDIO_CFG1			0x12
+#define ADV7511_REG_AUDIO_CFG2			0x13
+#define ADV7511_REG_AUDIO_CFG3			0x14
+#define ADV7511_REG_I2C_FREQ_ID_CFG		0x15
+#define ADV7511_REG_VIDEO_INPUT_CFG1		0x16
+#define ADV7511_REG_CSC_UPPER(x)		(0x18 + (x) * 2)
+#define ADV7511_REG_CSC_LOWER(x)		(0x19 + (x) * 2)
+#define ADV7511_REG_SYNC_DECODER(x)		(0x30 + (x))
+#define ADV7511_REG_DE_GENERATOR(x)		(0x35 + (x))
+#define ADV7511_REG_PIXEL_REPETITION		0x3b
+#define ADV7511_REG_VIC_MANUAL			0x3c
+#define ADV7511_REG_VIC_SEND			0x3d
+#define ADV7511_REG_VIC_DETECTED		0x3e
+#define ADV7511_REG_AUX_VIC_DETECTED		0x3f
+#define ADV7511_REG_PACKET_ENABLE0		0x40
+#define ADV7511_REG_POWER			0x41
+#define ADV7511_REG_STATUS			0x42
+#define ADV7511_REG_EDID_I2C_ADDR		0x43
+#define ADV7511_REG_PACKET_ENABLE1		0x44
+#define ADV7511_REG_PACKET_I2C_ADDR		0x45
+#define ADV7511_REG_DSD_ENABLE			0x46
+#define ADV7511_REG_VIDEO_INPUT_CFG2		0x48
+#define ADV7511_REG_INFOFRAME_UPDATE		0x4a
+#define ADV7511_REG_GC(x)			(0x4b + (x)) /* 0x4b - 0x51 */
+#define ADV7511_REG_AVI_INFOFRAME_VERSION	0x52
+#define ADV7511_REG_AVI_INFOFRAME_LENGTH	0x53
+#define ADV7511_REG_AVI_INFOFRAME_CHECKSUM	0x54
+#define ADV7511_REG_AVI_INFOFRAME(x)		(0x55 + (x)) /* 0x55 - 0x6f */
+#define ADV7511_REG_AUDIO_INFOFRAME_VERSION	0x70
+#define ADV7511_REG_AUDIO_INFOFRAME_LENGTH	0x71
+#define ADV7511_REG_AUDIO_INFOFRAME_CHECKSUM	0x72
+#define ADV7511_REG_AUDIO_INFOFRAME(x)		(0x73 + (x)) /* 0x73 - 0x7c */
+#define ADV7511_REG_INT_ENABLE(x)		(0x94 + (x))
+#define ADV7511_REG_INT(x)			(0x96 + (x))
+#define ADV7511_REG_INPUT_CLK_DIV		0x9d
+#define ADV7511_REG_PLL_STATUS			0x9e
+#define ADV7511_REG_HDMI_POWER			0xa1
+#define ADV7511_REG_HDCP_HDMI_CFG		0xaf
+#define ADV7511_REG_AN(x)			(0xb0 + (x)) /* 0xb0 - 0xb7 */
+#define ADV7511_REG_HDCP_STATUS			0xb8
+#define ADV7511_REG_BCAPS			0xbe
+#define ADV7511_REG_BKSV(x)			(0xc0 + (x)) /* 0xc0 - 0xc3 */
+#define ADV7511_REG_EDID_SEGMENT		0xc4
+#define ADV7511_REG_DDC_STATUS			0xc8
+#define ADV7511_REG_EDID_READ_CTRL		0xc9
+#define ADV7511_REG_BSTATUS(x)			(0xca + (x)) /* 0xca - 0xcb */
+#define ADV7511_REG_TIMING_GEN_SEQ		0xd0
+#define ADV7511_REG_POWER2			0xd6
+#define ADV7511_REG_HSYNC_PLACEMENT_MSB		0xfa
+
+#define ADV7511_REG_SYNC_ADJUSTMENT(x)		(0xd7 + (x)) /* 0xd7 - 0xdc */
+#define ADV7511_REG_TMDS_CLOCK_INV		0xde
+#define ADV7511_REG_ARC_CTRL			0xdf
+#define ADV7511_REG_CEC_I2C_ADDR		0xe1
+#define ADV7511_REG_CEC_CTRL			0xe2
+#define ADV7511_REG_CHIP_ID_HIGH		0xf5
+#define ADV7511_REG_CHIP_ID_LOW			0xf6
+
+#define ADV7511_CSC_ENABLE			BIT(7)
+#define ADV7511_CSC_UPDATE_MODE			BIT(5)
+
+#define ADV7511_INT0_HDP			BIT(7)
+#define ADV7511_INT0_VSYNC			BIT(5)
+#define ADV7511_INT0_AUDIO_FIFO_FULL		BIT(4)
+#define ADV7511_INT0_EDID_READY			BIT(2)
+#define ADV7511_INT0_HDCP_AUTHENTICATED		BIT(1)
+
+#define ADV7511_INT1_DDC_ERROR			BIT(7)
+#define ADV7511_INT1_BKSV			BIT(6)
+#define ADV7511_INT1_CEC_TX_READY		BIT(5)
+#define ADV7511_INT1_CEC_TX_ARBIT_LOST		BIT(4)
+#define ADV7511_INT1_CEC_TX_RETRY_TIMEOUT	BIT(3)
+#define ADV7511_INT1_CEC_RX_READY3		BIT(2)
+#define ADV7511_INT1_CEC_RX_READY2		BIT(1)
+#define ADV7511_INT1_CEC_RX_READY1		BIT(0)
+
+#define ADV7511_ARC_CTRL_POWER_DOWN		BIT(0)
+
+#define ADV7511_CEC_CTRL_POWER_DOWN		BIT(0)
+
+#define ADV7511_POWER_POWER_DOWN		BIT(6)
+
+#define ADV7511_HDMI_CFG_MODE_MASK		0x2
+#define ADV7511_HDMI_CFG_MODE_DVI		0x0
+#define ADV7511_HDMI_CFG_MODE_HDMI		0x2
+
+#define ADV7511_AUDIO_SELECT_I2C		0x0
+#define ADV7511_AUDIO_SELECT_SPDIF		0x1
+#define ADV7511_AUDIO_SELECT_DSD		0x2
+#define ADV7511_AUDIO_SELECT_HBR		0x3
+#define ADV7511_AUDIO_SELECT_DST		0x4
+
+#define ADV7511_I2S_SAMPLE_LEN_16		0x2
+#define ADV7511_I2S_SAMPLE_LEN_20		0x3
+#define ADV7511_I2S_SAMPLE_LEN_18		0x4
+#define ADV7511_I2S_SAMPLE_LEN_22		0x5
+#define ADV7511_I2S_SAMPLE_LEN_19		0x8
+#define ADV7511_I2S_SAMPLE_LEN_23		0x9
+#define ADV7511_I2S_SAMPLE_LEN_24		0xb
+#define ADV7511_I2S_SAMPLE_LEN_17		0xc
+#define ADV7511_I2S_SAMPLE_LEN_21		0xd
+
+#define ADV7511_SAMPLE_FREQ_44100		0x0
+#define ADV7511_SAMPLE_FREQ_48000		0x2
+#define ADV7511_SAMPLE_FREQ_32000		0x3
+#define ADV7511_SAMPLE_FREQ_88200		0x8
+#define ADV7511_SAMPLE_FREQ_96000		0xa
+#define ADV7511_SAMPLE_FREQ_176400		0xc
+#define ADV7511_SAMPLE_FREQ_192000		0xe
+
+#define ADV7511_STATUS_POWER_DOWN_POLARITY	BIT(7)
+#define ADV7511_STATUS_HPD			BIT(6)
+#define ADV7511_STATUS_MONITOR_SENSE		BIT(5)
+#define ADV7511_STATUS_I2S_32BIT_MODE		BIT(3)
+
+#define ADV7511_PACKET_ENABLE_N_CTS		BIT(8+6)
+#define ADV7511_PACKET_ENABLE_AUDIO_SAMPLE	BIT(8+5)
+#define ADV7511_PACKET_ENABLE_AVI_INFOFRAME	BIT(8+4)
+#define ADV7511_PACKET_ENABLE_AUDIO_INFOFRAME	BIT(8+3)
+#define ADV7511_PACKET_ENABLE_GC		BIT(7)
+#define ADV7511_PACKET_ENABLE_SPD		BIT(6)
+#define ADV7511_PACKET_ENABLE_MPEG		BIT(5)
+#define ADV7511_PACKET_ENABLE_ACP		BIT(4)
+#define ADV7511_PACKET_ENABLE_ISRC		BIT(3)
+#define ADV7511_PACKET_ENABLE_GM		BIT(2)
+#define ADV7511_PACKET_ENABLE_SPARE2		BIT(1)
+#define ADV7511_PACKET_ENABLE_SPARE1		BIT(0)
+
+#define ADV7511_REG_POWER2_HDP_SRC_MASK		0xc0
+#define ADV7511_REG_POWER2_HDP_SRC_BOTH		0x00
+#define ADV7511_REG_POWER2_HDP_SRC_HDP		0x40
+#define ADV7511_REG_POWER2_HDP_SRC_CEC		0x80
+#define ADV7511_REG_POWER2_HDP_SRC_NONE		0xc0
+#define ADV7511_REG_POWER2_TDMS_ENABLE		BIT(4)
+#define ADV7511_REG_POWER2_GATE_INPUT_CLK	BIT(0)
+
+#define ADV7511_LOW_REFRESH_RATE_NONE		0x0
+#define ADV7511_LOW_REFRESH_RATE_24HZ		0x1
+#define ADV7511_LOW_REFRESH_RATE_25HZ		0x2
+#define ADV7511_LOW_REFRESH_RATE_30HZ		0x3
+
+#define ADV7511_AUDIO_CFG3_LEN_MASK		0x0f
+#define ADV7511_I2C_FREQ_ID_CFG_RATE_MASK	0xf0
+
+#define ADV7511_AUDIO_SOURCE_I2S		0
+#define ADV7511_AUDIO_SOURCE_SPDIF		1
+
+#define ADV7511_I2S_FORMAT_I2S			0
+#define ADV7511_I2S_FORMAT_RIGHT_J		1
+#define ADV7511_I2S_FORMAT_LEFT_J		2
+
+#define ADV7511_PACKET(p, x)	    ((p) * 0x20 + (x))
+#define ADV7511_PACKET_SDP(x)	    ADV7511_PACKET(0, x)
+#define ADV7511_PACKET_MPEG(x)	    ADV7511_PACKET(1, x)
+#define ADV7511_PACKET_ACP(x)	    ADV7511_PACKET(2, x)
+#define ADV7511_PACKET_ISRC1(x)	    ADV7511_PACKET(3, x)
+#define ADV7511_PACKET_ISRC2(x)	    ADV7511_PACKET(4, x)
+#define ADV7511_PACKET_GM(x)	    ADV7511_PACKET(5, x)
+#define ADV7511_PACKET_SPARE(x)	    ADV7511_PACKET(6, x)
+
+enum adv7511_input_clock {
+	ADV7511_INPUT_CLOCK_1X,
+	ADV7511_INPUT_CLOCK_2X,
+	ADV7511_INPUT_CLOCK_DDR,
+};
+
+enum adv7511_input_justification {
+	ADV7511_INPUT_JUSTIFICATION_EVENLY = 0,
+	ADV7511_INPUT_JUSTIFICATION_RIGHT = 1,
+	ADV7511_INPUT_JUSTIFICATION_LEFT = 2,
+};
+
+enum adv7511_input_sync_pulse {
+	ADV7511_INPUT_SYNC_PULSE_DE = 0,
+	ADV7511_INPUT_SYNC_PULSE_HSYNC = 1,
+	ADV7511_INPUT_SYNC_PULSE_VSYNC = 2,
+	ADV7511_INPUT_SYNC_PULSE_NONE = 3,
+};
+
+/**
+ * enum adv7511_sync_polarity - Polarity for the input sync signals
+ * @ADV7511_SYNC_POLARITY_PASSTHROUGH:  Sync polarity matches that of
+ *				       the currently configured mode.
+ * @ADV7511_SYNC_POLARITY_LOW:	    Sync polarity is low
+ * @ADV7511_SYNC_POLARITY_HIGH:	    Sync polarity is high
+ *
+ * If the polarity is set to either LOW or HIGH the driver will configure the
+ * ADV7511 to internally invert the sync signal if required to match the sync
+ * polarity setting for the currently selected output mode.
+ *
+ * If the polarity is set to PASSTHROUGH, the ADV7511 will route the signal
+ * unchanged. This is used when the upstream graphics core already generates
+ * the sync signals with the correct polarity.
+ */
+enum adv7511_sync_polarity {
+	ADV7511_SYNC_POLARITY_PASSTHROUGH,
+	ADV7511_SYNC_POLARITY_LOW,
+	ADV7511_SYNC_POLARITY_HIGH,
+};
+
+/**
+ * struct adv7511_link_config - Describes adv7511 hardware configuration
+ * @input_color_depth:		Number of bits per color component (8, 10 or 12)
+ * @input_colorspace:		The input colorspace (RGB, YUV444, YUV422)
+ * @input_clock:		The input video clock style (1x, 2x, DDR)
+ * @input_style:		The input component arrangement variant
+ * @input_justification:	Video input format bit justification
+ * @clock_delay:		Clock delay for the input clock (in ps)
+ * @embedded_sync:		Video input uses BT.656-style embedded sync
+ * @sync_pulse:			Select the sync pulse
+ * @vsync_polarity:		vsync input signal configuration
+ * @hsync_polarity:		hsync input signal configuration
+ */
+struct adv7511_link_config {
+	unsigned int input_color_depth;
+	enum hdmi_colorspace input_colorspace;
+	enum adv7511_input_clock input_clock;
+	unsigned int input_style;
+	enum adv7511_input_justification input_justification;
+
+	int clock_delay;
+
+	bool embedded_sync;
+	enum adv7511_input_sync_pulse sync_pulse;
+	enum adv7511_sync_polarity vsync_polarity;
+	enum adv7511_sync_polarity hsync_polarity;
+};
+
+/**
+ * enum adv7511_csc_scaling - Scaling factor for the ADV7511 CSC
+ * @ADV7511_CSC_SCALING_1: CSC results are not scaled
+ * @ADV7511_CSC_SCALING_2: CSC results are scaled by a factor of two
+ * @ADV7511_CSC_SCALING_4: CSC results are scaled by a factor of four
+ */
+enum adv7511_csc_scaling {
+	ADV7511_CSC_SCALING_1 = 0,
+	ADV7511_CSC_SCALING_2 = 1,
+	ADV7511_CSC_SCALING_4 = 2,
+};
+
+/**
+ * struct adv7511_video_config - Describes adv7511 video configuration
+ * @csc_enable:			Whether to enable color space conversion
+ * @csc_scaling_factor:		Color space conversion scaling factor
+ * @csc_coefficents:		Color space conversion coefficients
+ * @hdmi_mode:			Whether to use HDMI or DVI output mode
+ * @avi_infoframe:		HDMI infoframe
+ */
+struct adv7511_video_config {
+	bool csc_enable;
+	enum adv7511_csc_scaling csc_scaling_factor;
+	const uint16_t *csc_coefficents;
+
+	bool hdmi_mode;
+	struct hdmi_avi_infoframe avi_infoframe;
+};
+
+#endif /* __DRM_I2C_ADV7511_H__ */
diff --git a/drivers/gpu/drm/i915/Makefile b/drivers/gpu/drm/i915/Makefile
index c1dd485aeb6c..e4083e41a600 100644
--- a/drivers/gpu/drm/i915/Makefile
+++ b/drivers/gpu/drm/i915/Makefile
@@ -11,7 +11,9 @@ i915-y := i915_drv.o \
 	  i915_params.o \
           i915_suspend.o \
 	  i915_sysfs.o \
-	  intel_pm.o
+	  intel_pm.o \
+	  intel_runtime_pm.o
+
 i915-$(CONFIG_COMPAT)   += i915_ioc32.o
 i915-$(CONFIG_DEBUG_FS) += i915_debugfs.o
 
@@ -38,13 +40,18 @@ i915-y += i915_cmd_parser.o \
 # autogenerated null render state
 i915-y += intel_renderstate_gen6.o \
 	  intel_renderstate_gen7.o \
-	  intel_renderstate_gen8.o
+	  intel_renderstate_gen8.o \
+	  intel_renderstate_gen9.o
 
 # modesetting core code
-i915-y += intel_bios.o \
+i915-y += intel_audio.o \
+	  intel_bios.o \
 	  intel_display.o \
+	  intel_fifo_underrun.o \
+	  intel_frontbuffer.o \
 	  intel_modes.o \
 	  intel_overlay.o \
+	  intel_psr.o \
 	  intel_sideband.o \
 	  intel_sprite.o
 i915-$(CONFIG_ACPI)		+= intel_acpi.o intel_opregion.o
diff --git a/drivers/gpu/drm/i915/i915_cmd_parser.c b/drivers/gpu/drm/i915/i915_cmd_parser.c
index 593b657d3e59..22c992a78ac6 100644
--- a/drivers/gpu/drm/i915/i915_cmd_parser.c
+++ b/drivers/gpu/drm/i915/i915_cmd_parser.c
@@ -73,7 +73,7 @@
  * those commands required by the parser. This generally works because command
  * opcode ranges have standard command length encodings. So for commands that
  * the parser does not need to check, it can easily skip them. This is
- * implementated via a per-ring length decoding vfunc.
+ * implemented via a per-ring length decoding vfunc.
  *
  * Unfortunately, there are a number of commands that do not follow the standard
  * length encoding for their opcode range, primarily amongst the MI_* commands.
@@ -138,6 +138,11 @@ static const struct drm_i915_cmd_descriptor common_cmds[] = {
 			.mask = MI_GLOBAL_GTT,
 			.expected = 0,
 	      }},						       ),
+	/*
+	 * MI_BATCH_BUFFER_START requires some special handling. It's not
+	 * really a 'skip' action but it doesn't seem like it's worth adding
+	 * a new action. See i915_parse_cmds().
+	 */
 	CMD(  MI_BATCH_BUFFER_START,            SMI,   !F,  0xFF,   S  ),
 };
 
@@ -408,6 +413,8 @@ static const u32 gen7_render_regs[] = {
 	REG64(PS_INVOCATION_COUNT),
 	REG64(PS_DEPTH_COUNT),
 	OACONTROL, /* Only allowed for LRI and SRM. See below. */
+	REG64(MI_PREDICATE_SRC0),
+	REG64(MI_PREDICATE_SRC1),
 	GEN7_3DPRIM_END_OFFSET,
 	GEN7_3DPRIM_START_VERTEX,
 	GEN7_3DPRIM_VERTEX_COUNT,
@@ -838,7 +845,7 @@ finish:
  * @ring: the ring in question
  *
  * Only certain platforms require software batch buffer command parsing, and
- * only when enabled via module paramter.
+ * only when enabled via module parameter.
  *
  * Return: true if the ring requires software command parsing
  */
@@ -847,12 +854,7 @@ bool i915_needs_cmd_parser(struct intel_engine_cs *ring)
 	if (!ring->needs_cmd_parser)
 		return false;
 
-	/*
-	 * XXX: VLV is Gen7 and therefore has cmd_tables, but has PPGTT
-	 * disabled. That will cause all of the parser's PPGTT checks to
-	 * fail. For now, disable parsing when PPGTT is off.
-	 */
-	if (USES_PPGTT(ring->dev))
+	if (!USES_PPGTT(ring->dev))
 		return false;
 
 	return (i915.enable_cmd_parser == 1);
@@ -888,8 +890,10 @@ static bool check_cmd(const struct intel_engine_cs *ring,
 		 * OACONTROL writes to only MI_LOAD_REGISTER_IMM commands.
 		 */
 		if (reg_addr == OACONTROL) {
-			if (desc->cmd.value == MI_LOAD_REGISTER_MEM)
+			if (desc->cmd.value == MI_LOAD_REGISTER_MEM) {
+				DRM_DEBUG_DRIVER("CMD: Rejected LRM to OACONTROL\n");
 				return false;
+			}
 
 			if (desc->cmd.value == MI_LOAD_REGISTER_IMM(1))
 				*oacontrol_set = (cmd[2] != 0);
@@ -958,7 +962,8 @@ static bool check_cmd(const struct intel_engine_cs *ring,
  * Parses the specified batch buffer looking for privilege violations as
  * described in the overview.
  *
- * Return: non-zero if the parser finds violations or otherwise fails
+ * Return: non-zero if the parser finds violations or otherwise fails; -EACCES
+ * if the batch appears legal but should use hardware parsing
  */
 int i915_parse_cmds(struct intel_engine_cs *ring,
 		    struct drm_i915_gem_object *batch_obj,
@@ -1005,6 +1010,16 @@ int i915_parse_cmds(struct intel_engine_cs *ring,
 			break;
 		}
 
+		/*
+		 * If the batch buffer contains a chained batch, return an
+		 * error that tells the caller to abort and dispatch the
+		 * workload as a non-secure batch.
+		 */
+		if (desc->cmd.value == MI_BATCH_BUFFER_START) {
+			ret = -EACCES;
+			break;
+		}
+
 		if (desc->flags & CMD_DESC_FIXED)
 			length = desc->length.fixed;
 		else
@@ -1059,6 +1074,8 @@ int i915_cmd_parser_get_version(void)
 	 *
 	 * 1. Initial version. Checks batches and reports violations, but leaves
 	 *    hardware parsing enabled (so does not allow new use cases).
+	 * 2. Allow access to the MI_PREDICATE_SRC0 and
+	 *    MI_PREDICATE_SRC1 registers.
 	 */
-	return 1;
+	return 2;
 }
diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
index 063b44817e08..779a275eb1fd 100644
--- a/drivers/gpu/drm/i915/i915_debugfs.c
+++ b/drivers/gpu/drm/i915/i915_debugfs.c
@@ -116,7 +116,7 @@ static const char *get_tiling_flag(struct drm_i915_gem_object *obj)
 
 static inline const char *get_global_flag(struct drm_i915_gem_object *obj)
 {
-	return obj->has_global_gtt_mapping ? "g" : " ";
+	return i915_gem_obj_to_ggtt(obj) ? "g" : " ";
 }
 
 static void
@@ -516,7 +516,6 @@ static int i915_gem_pageflip_info(struct seq_file *m, void *data)
 	struct drm_info_node *node = m->private;
 	struct drm_device *dev = node->minor->dev;
 	struct drm_i915_private *dev_priv = dev->dev_private;
-	unsigned long flags;
 	struct intel_crtc *crtc;
 	int ret;
 
@@ -529,7 +528,7 @@ static int i915_gem_pageflip_info(struct seq_file *m, void *data)
 		const char plane = plane_name(crtc->plane);
 		struct intel_unpin_work *work;
 
-		spin_lock_irqsave(&dev->event_lock, flags);
+		spin_lock_irq(&dev->event_lock);
 		work = crtc->unpin_work;
 		if (work == NULL) {
 			seq_printf(m, "No flip due on pipe %c (plane %c)\n",
@@ -575,7 +574,7 @@ static int i915_gem_pageflip_info(struct seq_file *m, void *data)
 				seq_printf(m, "MMIO update completed? %d\n",  addr == work->gtt_offset);
 			}
 		}
-		spin_unlock_irqrestore(&dev->event_lock, flags);
+		spin_unlock_irq(&dev->event_lock);
 	}
 
 	mutex_unlock(&dev->struct_mutex);
@@ -717,7 +716,7 @@ static int i915_interrupt_info(struct seq_file *m, void *data)
 		}
 
 		for_each_pipe(dev_priv, pipe) {
-			if (!intel_display_power_enabled(dev_priv,
+			if (!intel_display_power_is_enabled(dev_priv,
 						POWER_DOMAIN_PIPE(pipe))) {
 				seq_printf(m, "Pipe %c power disabled\n",
 					   pipe_name(pipe));
@@ -1241,11 +1240,12 @@ static int vlv_drpc_info(struct seq_file *m)
 	struct drm_info_node *node = m->private;
 	struct drm_device *dev = node->minor->dev;
 	struct drm_i915_private *dev_priv = dev->dev_private;
-	u32 rpmodectl1, rcctl1;
+	u32 rpmodectl1, rcctl1, pw_status;
 	unsigned fw_rendercount = 0, fw_mediacount = 0;
 
 	intel_runtime_pm_get(dev_priv);
 
+	pw_status = I915_READ(VLV_GTLC_PW_STATUS);
 	rpmodectl1 = I915_READ(GEN6_RP_CONTROL);
 	rcctl1 = I915_READ(GEN6_RC_CONTROL);
 
@@ -1264,11 +1264,9 @@ static int vlv_drpc_info(struct seq_file *m)
 		   yesno(rcctl1 & (GEN7_RC_CTL_TO_MODE |
 					GEN6_RC_CTL_EI_MODE(1))));
 	seq_printf(m, "Render Power Well: %s\n",
-			(I915_READ(VLV_GTLC_PW_STATUS) &
-				VLV_GTLC_PW_RENDER_STATUS_MASK) ? "Up" : "Down");
+		   (pw_status & VLV_GTLC_PW_RENDER_STATUS_MASK) ? "Up" : "Down");
 	seq_printf(m, "Media Power Well: %s\n",
-			(I915_READ(VLV_GTLC_PW_STATUS) &
-				VLV_GTLC_PW_MEDIA_STATUS_MASK) ? "Up" : "Down");
+		   (pw_status & VLV_GTLC_PW_MEDIA_STATUS_MASK) ? "Up" : "Down");
 
 	seq_printf(m, "Render RC6 residency since boot: %u\n",
 		   I915_READ(VLV_GT_RENDER_RC6));
@@ -1774,6 +1772,50 @@ static int i915_context_status(struct seq_file *m, void *unused)
 	return 0;
 }
 
+static void i915_dump_lrc_obj(struct seq_file *m,
+			      struct intel_engine_cs *ring,
+			      struct drm_i915_gem_object *ctx_obj)
+{
+	struct page *page;
+	uint32_t *reg_state;
+	int j;
+	unsigned long ggtt_offset = 0;
+
+	if (ctx_obj == NULL) {
+		seq_printf(m, "Context on %s with no gem object\n",
+			   ring->name);
+		return;
+	}
+
+	seq_printf(m, "CONTEXT: %s %u\n", ring->name,
+		   intel_execlists_ctx_id(ctx_obj));
+
+	if (!i915_gem_obj_ggtt_bound(ctx_obj))
+		seq_puts(m, "\tNot bound in GGTT\n");
+	else
+		ggtt_offset = i915_gem_obj_ggtt_offset(ctx_obj);
+
+	if (i915_gem_object_get_pages(ctx_obj)) {
+		seq_puts(m, "\tFailed to get pages for context object\n");
+		return;
+	}
+
+	page = i915_gem_object_get_page(ctx_obj, 1);
+	if (!WARN_ON(page == NULL)) {
+		reg_state = kmap_atomic(page);
+
+		for (j = 0; j < 0x600 / sizeof(u32) / 4; j += 4) {
+			seq_printf(m, "\t[0x%08lx] 0x%08x 0x%08x 0x%08x 0x%08x\n",
+				   ggtt_offset + 4096 + (j * 4),
+				   reg_state[j], reg_state[j + 1],
+				   reg_state[j + 2], reg_state[j + 3]);
+		}
+		kunmap_atomic(reg_state);
+	}
+
+	seq_putc(m, '\n');
+}
+
 static int i915_dump_lrc(struct seq_file *m, void *unused)
 {
 	struct drm_info_node *node = (struct drm_info_node *) m->private;
@@ -1794,29 +1836,9 @@ static int i915_dump_lrc(struct seq_file *m, void *unused)
 
 	list_for_each_entry(ctx, &dev_priv->context_list, link) {
 		for_each_ring(ring, dev_priv, i) {
-			struct drm_i915_gem_object *ctx_obj = ctx->engine[i].state;
-
-			if (ring->default_context == ctx)
-				continue;
-
-			if (ctx_obj) {
-				struct page *page = i915_gem_object_get_page(ctx_obj, 1);
-				uint32_t *reg_state = kmap_atomic(page);
-				int j;
-
-				seq_printf(m, "CONTEXT: %s %u\n", ring->name,
-						intel_execlists_ctx_id(ctx_obj));
-
-				for (j = 0; j < 0x600 / sizeof(u32) / 4; j += 4) {
-					seq_printf(m, "\t[0x%08lx] 0x%08x 0x%08x 0x%08x 0x%08x\n",
-					i915_gem_obj_ggtt_offset(ctx_obj) + 4096 + (j * 4),
-					reg_state[j], reg_state[j + 1],
-					reg_state[j + 2], reg_state[j + 3]);
-				}
-				kunmap_atomic(reg_state);
-
-				seq_putc(m, '\n');
-			}
+			if (ring->default_context != ctx)
+				i915_dump_lrc_obj(m, ring,
+						  ctx->engine[i].state);
 		}
 	}
 
@@ -1849,6 +1871,8 @@ static int i915_execlists(struct seq_file *m, void *data)
 	if (ret)
 		return ret;
 
+	intel_runtime_pm_get(dev_priv);
+
 	for_each_ring(ring, dev_priv, ring_id) {
 		struct intel_ctx_submit_request *head_req = NULL;
 		int count = 0;
@@ -1900,6 +1924,7 @@ static int i915_execlists(struct seq_file *m, void *data)
 		seq_putc(m, '\n');
 	}
 
+	intel_runtime_pm_put(dev_priv);
 	mutex_unlock(&dev->struct_mutex);
 
 	return 0;
@@ -1973,6 +1998,8 @@ static int i915_swizzle_info(struct seq_file *m, void *data)
 	if (IS_GEN3(dev) || IS_GEN4(dev)) {
 		seq_printf(m, "DDC = 0x%08x\n",
 			   I915_READ(DCC));
+		seq_printf(m, "DDC2 = 0x%08x\n",
+			   I915_READ(DCC2));
 		seq_printf(m, "C0DRB3 = 0x%04x\n",
 			   I915_READ16(C0DRB3));
 		seq_printf(m, "C1DRB3 = 0x%04x\n",
@@ -1986,7 +2013,7 @@ static int i915_swizzle_info(struct seq_file *m, void *data)
 			   I915_READ(MAD_DIMM_C2));
 		seq_printf(m, "TILECTL = 0x%08x\n",
 			   I915_READ(TILECTL));
-		if (IS_GEN8(dev))
+		if (INTEL_INFO(dev)->gen >= 8)
 			seq_printf(m, "GAMTARBMODE = 0x%08x\n",
 				   I915_READ(GAMTARBMODE));
 		else
@@ -1995,6 +2022,10 @@ static int i915_swizzle_info(struct seq_file *m, void *data)
 		seq_printf(m, "DISP_ARB_CTL = 0x%08x\n",
 			   I915_READ(DISP_ARB_CTL));
 	}
+
+	if (dev_priv->quirks & QUIRK_PIN_SWIZZLED_PAGES)
+		seq_puts(m, "L-shaped memory detected\n");
+
 	intel_runtime_pm_put(dev_priv);
 	mutex_unlock(&dev->struct_mutex);
 
@@ -2628,14 +2659,15 @@ static int i915_shared_dplls_info(struct seq_file *m, void *unused)
 		struct intel_shared_dpll *pll = &dev_priv->shared_dplls[i];
 
 		seq_printf(m, "DPLL%i: %s, id: %i\n", i, pll->name, pll->id);
-		seq_printf(m, " refcount: %i, active: %i, on: %s\n", pll->refcount,
-			   pll->active, yesno(pll->on));
+		seq_printf(m, " crtc_mask: 0x%08x, active: %d, on: %s\n",
+			   pll->config.crtc_mask, pll->active, yesno(pll->on));
 		seq_printf(m, " tracked hardware state:\n");
-		seq_printf(m, " dpll:    0x%08x\n", pll->hw_state.dpll);
-		seq_printf(m, " dpll_md: 0x%08x\n", pll->hw_state.dpll_md);
-		seq_printf(m, " fp0:     0x%08x\n", pll->hw_state.fp0);
-		seq_printf(m, " fp1:     0x%08x\n", pll->hw_state.fp1);
-		seq_printf(m, " wrpll:   0x%08x\n", pll->hw_state.wrpll);
+		seq_printf(m, " dpll:    0x%08x\n", pll->config.hw_state.dpll);
+		seq_printf(m, " dpll_md: 0x%08x\n",
+			   pll->config.hw_state.dpll_md);
+		seq_printf(m, " fp0:     0x%08x\n", pll->config.hw_state.fp0);
+		seq_printf(m, " fp1:     0x%08x\n", pll->config.hw_state.fp1);
+		seq_printf(m, " wrpll:   0x%08x\n", pll->config.hw_state.wrpll);
 	}
 	drm_modeset_unlock_all(dev);
 
@@ -2656,18 +2688,18 @@ static int i915_wa_registers(struct seq_file *m, void *unused)
 
 	intel_runtime_pm_get(dev_priv);
 
-	seq_printf(m, "Workarounds applied: %d\n", dev_priv->num_wa_regs);
-	for (i = 0; i < dev_priv->num_wa_regs; ++i) {
-		u32 addr, mask;
-
-		addr = dev_priv->intel_wa_regs[i].addr;
-		mask = dev_priv->intel_wa_regs[i].mask;
-		dev_priv->intel_wa_regs[i].value = I915_READ(addr) | mask;
-		if (dev_priv->intel_wa_regs[i].addr)
-			seq_printf(m, "0x%X: 0x%08X, mask: 0x%08X\n",
-				   dev_priv->intel_wa_regs[i].addr,
-				   dev_priv->intel_wa_regs[i].value,
-				   dev_priv->intel_wa_regs[i].mask);
+	seq_printf(m, "Workarounds applied: %d\n", dev_priv->workarounds.count);
+	for (i = 0; i < dev_priv->workarounds.count; ++i) {
+		u32 addr, mask, value, read;
+		bool ok;
+
+		addr = dev_priv->workarounds.reg[i].addr;
+		mask = dev_priv->workarounds.reg[i].mask;
+		value = dev_priv->workarounds.reg[i].value;
+		read = I915_READ(addr);
+		ok = (value & mask) == (read & mask);
+		seq_printf(m, "0x%X: 0x%08X, mask: 0x%08X, read: 0x%08x, status: %s\n",
+			   addr, value, mask, read, ok ? "OK" : "FAIL");
 	}
 
 	intel_runtime_pm_put(dev_priv);
@@ -2676,6 +2708,42 @@ static int i915_wa_registers(struct seq_file *m, void *unused)
 	return 0;
 }
 
+static int i915_ddb_info(struct seq_file *m, void *unused)
+{
+	struct drm_info_node *node = m->private;
+	struct drm_device *dev = node->minor->dev;
+	struct drm_i915_private *dev_priv = dev->dev_private;
+	struct skl_ddb_allocation *ddb;
+	struct skl_ddb_entry *entry;
+	enum pipe pipe;
+	int plane;
+
+	drm_modeset_lock_all(dev);
+
+	ddb = &dev_priv->wm.skl_hw.ddb;
+
+	seq_printf(m, "%-15s%8s%8s%8s\n", "", "Start", "End", "Size");
+
+	for_each_pipe(dev_priv, pipe) {
+		seq_printf(m, "Pipe %c\n", pipe_name(pipe));
+
+		for_each_plane(pipe, plane) {
+			entry = &ddb->plane[pipe][plane];
+			seq_printf(m, "  Plane%-8d%8u%8u%8u\n", plane + 1,
+				   entry->start, entry->end,
+				   skl_ddb_entry_size(entry));
+		}
+
+		entry = &ddb->cursor[pipe];
+		seq_printf(m, "  %-13s%8u%8u%8u\n", "Cursor", entry->start,
+			   entry->end, skl_ddb_entry_size(entry));
+	}
+
+	drm_modeset_unlock_all(dev);
+
+	return 0;
+}
+
 struct pipe_crc_info {
 	const char *name;
 	struct drm_device *dev;
@@ -2969,6 +3037,8 @@ static int i9xx_pipe_crc_auto_source(struct drm_device *dev, enum pipe pipe,
 				break;
 			}
 			break;
+		default:
+			break;
 		}
 	}
 	drm_modeset_unlock_all(dev);
@@ -3256,6 +3326,8 @@ static int pipe_crc_set_source(struct drm_device *dev, enum pipe pipe,
 {
 	struct drm_i915_private *dev_priv = dev->dev_private;
 	struct intel_pipe_crc *pipe_crc = &dev_priv->pipe_crc[pipe];
+	struct intel_crtc *crtc = to_intel_crtc(intel_get_crtc_for_pipe(dev,
+									pipe));
 	u32 val = 0; /* shut up gcc */
 	int ret;
 
@@ -3266,6 +3338,11 @@ static int pipe_crc_set_source(struct drm_device *dev, enum pipe pipe,
 	if (pipe_crc->source && source)
 		return -EINVAL;
 
+	if (!intel_display_power_is_enabled(dev_priv, POWER_DOMAIN_PIPE(pipe))) {
+		DRM_DEBUG_KMS("Trying to capture CRC while pipe is off\n");
+		return -EIO;
+	}
+
 	if (IS_GEN2(dev))
 		ret = i8xx_pipe_crc_ctl_reg(&source, &val);
 	else if (INTEL_INFO(dev)->gen < 5)
@@ -3291,6 +3368,14 @@ static int pipe_crc_set_source(struct drm_device *dev, enum pipe pipe,
 		if (!pipe_crc->entries)
 			return -ENOMEM;
 
+		/*
+		 * When IPS gets enabled, the pipe CRC changes. Since IPS gets
+		 * enabled and disabled dynamically based on package C states,
+		 * user space can't make reliable use of the CRCs, so let's just
+		 * completely disable it.
+		 */
+		hsw_disable_ips(crtc);
+
 		spin_lock_irq(&pipe_crc->lock);
 		pipe_crc->head = 0;
 		pipe_crc->tail = 0;
@@ -3329,6 +3414,8 @@ static int pipe_crc_set_source(struct drm_device *dev, enum pipe pipe,
 			vlv_undo_pipe_scramble_reset(dev, pipe);
 		else if (IS_HASWELL(dev) && pipe == PIPE_A)
 			hsw_undo_trans_edp_pipe_A_crc_wa(dev);
+
+		hsw_enable_ips(crtc);
 	}
 
 	return 0;
@@ -3506,7 +3593,7 @@ static const struct file_operations i915_display_crc_ctl_fops = {
 	.write = display_crc_ctl_write
 };
 
-static void wm_latency_show(struct seq_file *m, const uint16_t wm[5])
+static void wm_latency_show(struct seq_file *m, const uint16_t wm[8])
 {
 	struct drm_device *dev = m->private;
 	int num_levels = ilk_wm_max_level(dev) + 1;
@@ -3517,13 +3604,17 @@ static void wm_latency_show(struct seq_file *m, const uint16_t wm[5])
 	for (level = 0; level < num_levels; level++) {
 		unsigned int latency = wm[level];
 
-		/* WM1+ latency values in 0.5us units */
-		if (level > 0)
+		/*
+		 * - WM1+ latency values in 0.5us units
+		 * - latencies are in us on gen9
+		 */
+		if (INTEL_INFO(dev)->gen >= 9)
+			latency *= 10;
+		else if (level > 0)
 			latency *= 5;
 
 		seq_printf(m, "WM%d %u (%u.%u usec)\n",
-			   level, wm[level],
-			   latency / 10, latency % 10);
+			   level, wm[level], latency / 10, latency % 10);
 	}
 
 	drm_modeset_unlock_all(dev);
@@ -3532,8 +3623,15 @@ static void wm_latency_show(struct seq_file *m, const uint16_t wm[5])
 static int pri_wm_latency_show(struct seq_file *m, void *data)
 {
 	struct drm_device *dev = m->private;
+	struct drm_i915_private *dev_priv = dev->dev_private;
+	const uint16_t *latencies;
+
+	if (INTEL_INFO(dev)->gen >= 9)
+		latencies = dev_priv->wm.skl_latency;
+	else
+		latencies = to_i915(dev)->wm.pri_latency;
 
-	wm_latency_show(m, to_i915(dev)->wm.pri_latency);
+	wm_latency_show(m, latencies);
 
 	return 0;
 }
@@ -3541,8 +3639,15 @@ static int pri_wm_latency_show(struct seq_file *m, void *data)
 static int spr_wm_latency_show(struct seq_file *m, void *data)
 {
 	struct drm_device *dev = m->private;
+	struct drm_i915_private *dev_priv = dev->dev_private;
+	const uint16_t *latencies;
+
+	if (INTEL_INFO(dev)->gen >= 9)
+		latencies = dev_priv->wm.skl_latency;
+	else
+		latencies = to_i915(dev)->wm.spr_latency;
 
-	wm_latency_show(m, to_i915(dev)->wm.spr_latency);
+	wm_latency_show(m, latencies);
 
 	return 0;
 }
@@ -3550,8 +3655,15 @@ static int spr_wm_latency_show(struct seq_file *m, void *data)
 static int cur_wm_latency_show(struct seq_file *m, void *data)
 {
 	struct drm_device *dev = m->private;
+	struct drm_i915_private *dev_priv = dev->dev_private;
+	const uint16_t *latencies;
+
+	if (INTEL_INFO(dev)->gen >= 9)
+		latencies = dev_priv->wm.skl_latency;
+	else
+		latencies = to_i915(dev)->wm.cur_latency;
 
-	wm_latency_show(m, to_i915(dev)->wm.cur_latency);
+	wm_latency_show(m, latencies);
 
 	return 0;
 }
@@ -3587,11 +3699,11 @@ static int cur_wm_latency_open(struct inode *inode, struct file *file)
 }
 
 static ssize_t wm_latency_write(struct file *file, const char __user *ubuf,
-				size_t len, loff_t *offp, uint16_t wm[5])
+				size_t len, loff_t *offp, uint16_t wm[8])
 {
 	struct seq_file *m = file->private_data;
 	struct drm_device *dev = m->private;
-	uint16_t new[5] = { 0 };
+	uint16_t new[8] = { 0 };
 	int num_levels = ilk_wm_max_level(dev) + 1;
 	int level;
 	int ret;
@@ -3605,7 +3717,9 @@ static ssize_t wm_latency_write(struct file *file, const char __user *ubuf,
 
 	tmp[len] = '\0';
 
-	ret = sscanf(tmp, "%hu %hu %hu %hu %hu", &new[0], &new[1], &new[2], &new[3], &new[4]);
+	ret = sscanf(tmp, "%hu %hu %hu %hu %hu %hu %hu %hu",
+		     &new[0], &new[1], &new[2], &new[3],
+		     &new[4], &new[5], &new[6], &new[7]);
 	if (ret != num_levels)
 		return -EINVAL;
 
@@ -3625,8 +3739,15 @@ static ssize_t pri_wm_latency_write(struct file *file, const char __user *ubuf,
 {
 	struct seq_file *m = file->private_data;
 	struct drm_device *dev = m->private;
+	struct drm_i915_private *dev_priv = dev->dev_private;
+	uint16_t *latencies;
 
-	return wm_latency_write(file, ubuf, len, offp, to_i915(dev)->wm.pri_latency);
+	if (INTEL_INFO(dev)->gen >= 9)
+		latencies = dev_priv->wm.skl_latency;
+	else
+		latencies = to_i915(dev)->wm.pri_latency;
+
+	return wm_latency_write(file, ubuf, len, offp, latencies);
 }
 
 static ssize_t spr_wm_latency_write(struct file *file, const char __user *ubuf,
@@ -3634,8 +3755,15 @@ static ssize_t spr_wm_latency_write(struct file *file, const char __user *ubuf,
 {
 	struct seq_file *m = file->private_data;
 	struct drm_device *dev = m->private;
+	struct drm_i915_private *dev_priv = dev->dev_private;
+	uint16_t *latencies;
 
-	return wm_latency_write(file, ubuf, len, offp, to_i915(dev)->wm.spr_latency);
+	if (INTEL_INFO(dev)->gen >= 9)
+		latencies = dev_priv->wm.skl_latency;
+	else
+		latencies = to_i915(dev)->wm.spr_latency;
+
+	return wm_latency_write(file, ubuf, len, offp, latencies);
 }
 
 static ssize_t cur_wm_latency_write(struct file *file, const char __user *ubuf,
@@ -3643,8 +3771,15 @@ static ssize_t cur_wm_latency_write(struct file *file, const char __user *ubuf,
 {
 	struct seq_file *m = file->private_data;
 	struct drm_device *dev = m->private;
+	struct drm_i915_private *dev_priv = dev->dev_private;
+	uint16_t *latencies;
+
+	if (INTEL_INFO(dev)->gen >= 9)
+		latencies = dev_priv->wm.skl_latency;
+	else
+		latencies = to_i915(dev)->wm.cur_latency;
 
-	return wm_latency_write(file, ubuf, len, offp, to_i915(dev)->wm.cur_latency);
+	return wm_latency_write(file, ubuf, len, offp, latencies);
 }
 
 static const struct file_operations i915_pri_wm_latency_fops = {
@@ -4187,6 +4322,7 @@ static const struct drm_info_list i915_debugfs_list[] = {
 	{"i915_shared_dplls_info", i915_shared_dplls_info, 0},
 	{"i915_dp_mst_info", i915_dp_mst_info, 0},
 	{"i915_wa_registers", i915_wa_registers, 0},
+	{"i915_ddb_info", i915_ddb_info, 0},
 };
 #define I915_DEBUGFS_ENTRIES ARRAY_SIZE(i915_debugfs_list)
 
diff --git a/drivers/gpu/drm/i915/i915_dma.c b/drivers/gpu/drm/i915/i915_dma.c
index 318ade9bb5af..ecee3bcc8772 100644
--- a/drivers/gpu/drm/i915/i915_dma.c
+++ b/drivers/gpu/drm/i915/i915_dma.c
@@ -50,884 +50,6 @@
 #include <linux/pm_runtime.h>
 #include <linux/oom.h>
 
-#define LP_RING(d) (&((struct drm_i915_private *)(d))->ring[RCS])
-
-#define BEGIN_LP_RING(n) \
-	intel_ring_begin(LP_RING(dev_priv), (n))
-
-#define OUT_RING(x) \
-	intel_ring_emit(LP_RING(dev_priv), x)
-
-#define ADVANCE_LP_RING() \
-	__intel_ring_advance(LP_RING(dev_priv))
-
-/**
- * Lock test for when it's just for synchronization of ring access.
- *
- * In that case, we don't need to do it when GEM is initialized as nobody else
- * has access to the ring.
- */
-#define RING_LOCK_TEST_WITH_RETURN(dev, file) do {			\
-	if (LP_RING(dev->dev_private)->buffer->obj == NULL)			\
-		LOCK_TEST_WITH_RETURN(dev, file);			\
-} while (0)
-
-static inline u32
-intel_read_legacy_status_page(struct drm_i915_private *dev_priv, int reg)
-{
-	if (I915_NEED_GFX_HWS(dev_priv->dev))
-		return ioread32(dev_priv->dri1.gfx_hws_cpu_addr + reg);
-	else
-		return intel_read_status_page(LP_RING(dev_priv), reg);
-}
-
-#define READ_HWSP(dev_priv, reg) intel_read_legacy_status_page(dev_priv, reg)
-#define READ_BREADCRUMB(dev_priv) READ_HWSP(dev_priv, I915_BREADCRUMB_INDEX)
-#define I915_BREADCRUMB_INDEX		0x21
-
-void i915_update_dri1_breadcrumb(struct drm_device *dev)
-{
-	struct drm_i915_private *dev_priv = dev->dev_private;
-	struct drm_i915_master_private *master_priv;
-
-	/*
-	 * The dri breadcrumb update races against the drm master disappearing.
-	 * Instead of trying to fix this (this is by far not the only ums issue)
-	 * just don't do the update in kms mode.
-	 */
-	if (drm_core_check_feature(dev, DRIVER_MODESET))
-		return;
-
-	if (dev->primary->master) {
-		master_priv = dev->primary->master->driver_priv;
-		if (master_priv->sarea_priv)
-			master_priv->sarea_priv->last_dispatch =
-				READ_BREADCRUMB(dev_priv);
-	}
-}
-
-static void i915_write_hws_pga(struct drm_device *dev)
-{
-	struct drm_i915_private *dev_priv = dev->dev_private;
-	u32 addr;
-
-	addr = dev_priv->status_page_dmah->busaddr;
-	if (INTEL_INFO(dev)->gen >= 4)
-		addr |= (dev_priv->status_page_dmah->busaddr >> 28) & 0xf0;
-	I915_WRITE(HWS_PGA, addr);
-}
-
-/**
- * Frees the hardware status page, whether it's a physical address or a virtual
- * address set up by the X Server.
- */
-static void i915_free_hws(struct drm_device *dev)
-{
-	struct drm_i915_private *dev_priv = dev->dev_private;
-	struct intel_engine_cs *ring = LP_RING(dev_priv);
-
-	if (dev_priv->status_page_dmah) {
-		drm_pci_free(dev, dev_priv->status_page_dmah);
-		dev_priv->status_page_dmah = NULL;
-	}
-
-	if (ring->status_page.gfx_addr) {
-		ring->status_page.gfx_addr = 0;
-		iounmap(dev_priv->dri1.gfx_hws_cpu_addr);
-	}
-
-	/* Need to rewrite hardware status page */
-	I915_WRITE(HWS_PGA, 0x1ffff000);
-}
-
-void i915_kernel_lost_context(struct drm_device *dev)
-{
-	struct drm_i915_private *dev_priv = dev->dev_private;
-	struct drm_i915_master_private *master_priv;
-	struct intel_engine_cs *ring = LP_RING(dev_priv);
-	struct intel_ringbuffer *ringbuf = ring->buffer;
-
-	/*
-	 * We should never lose context on the ring with modesetting
-	 * as we don't expose it to userspace
-	 */
-	if (drm_core_check_feature(dev, DRIVER_MODESET))
-		return;
-
-	ringbuf->head = I915_READ_HEAD(ring) & HEAD_ADDR;
-	ringbuf->tail = I915_READ_TAIL(ring) & TAIL_ADDR;
-	ringbuf->space = ringbuf->head - (ringbuf->tail + I915_RING_FREE_SPACE);
-	if (ringbuf->space < 0)
-		ringbuf->space += ringbuf->size;
-
-	if (!dev->primary->master)
-		return;
-
-	master_priv = dev->primary->master->driver_priv;
-	if (ringbuf->head == ringbuf->tail && master_priv->sarea_priv)
-		master_priv->sarea_priv->perf_boxes |= I915_BOX_RING_EMPTY;
-}
-
-static int i915_dma_cleanup(struct drm_device *dev)
-{
-	struct drm_i915_private *dev_priv = dev->dev_private;
-	int i;
-
-	/* Make sure interrupts are disabled here because the uninstall ioctl
-	 * may not have been called from userspace and after dev_private
-	 * is freed, it's too late.
-	 */
-	if (dev->irq_enabled)
-		drm_irq_uninstall(dev);
-
-	mutex_lock(&dev->struct_mutex);
-	for (i = 0; i < I915_NUM_RINGS; i++)
-		intel_cleanup_ring_buffer(&dev_priv->ring[i]);
-	mutex_unlock(&dev->struct_mutex);
-
-	/* Clear the HWS virtual address at teardown */
-	if (I915_NEED_GFX_HWS(dev))
-		i915_free_hws(dev);
-
-	return 0;
-}
-
-static int i915_initialize(struct drm_device *dev, drm_i915_init_t *init)
-{
-	struct drm_i915_private *dev_priv = dev->dev_private;
-	struct drm_i915_master_private *master_priv = dev->primary->master->driver_priv;
-	int ret;
-
-	master_priv->sarea = drm_legacy_getsarea(dev);
-	if (master_priv->sarea) {
-		master_priv->sarea_priv = (drm_i915_sarea_t *)
-			((u8 *)master_priv->sarea->handle + init->sarea_priv_offset);
-	} else {
-		DRM_DEBUG_DRIVER("sarea not found assuming DRI2 userspace\n");
-	}
-
-	if (init->ring_size != 0) {
-		if (LP_RING(dev_priv)->buffer->obj != NULL) {
-			i915_dma_cleanup(dev);
-			DRM_ERROR("Client tried to initialize ringbuffer in "
-				  "GEM mode\n");
-			return -EINVAL;
-		}
-
-		ret = intel_render_ring_init_dri(dev,
-						 init->ring_start,
-						 init->ring_size);
-		if (ret) {
-			i915_dma_cleanup(dev);
-			return ret;
-		}
-	}
-
-	dev_priv->dri1.cpp = init->cpp;
-	dev_priv->dri1.back_offset = init->back_offset;
-	dev_priv->dri1.front_offset = init->front_offset;
-	dev_priv->dri1.current_page = 0;
-	if (master_priv->sarea_priv)
-		master_priv->sarea_priv->pf_current_page = 0;
-
-	/* Allow hardware batchbuffers unless told otherwise.
-	 */
-	dev_priv->dri1.allow_batchbuffer = 1;
-
-	return 0;
-}
-
-static int i915_dma_resume(struct drm_device *dev)
-{
-	struct drm_i915_private *dev_priv = dev->dev_private;
-	struct intel_engine_cs *ring = LP_RING(dev_priv);
-
-	DRM_DEBUG_DRIVER("%s\n", __func__);
-
-	if (ring->buffer->virtual_start == NULL) {
-		DRM_ERROR("can not ioremap virtual address for"
-			  " ring buffer\n");
-		return -ENOMEM;
-	}
-
-	/* Program Hardware Status Page */
-	if (!ring->status_page.page_addr) {
-		DRM_ERROR("Can not find hardware status page\n");
-		return -EINVAL;
-	}
-	DRM_DEBUG_DRIVER("hw status page @ %p\n",
-				ring->status_page.page_addr);
-	if (ring->status_page.gfx_addr != 0)
-		intel_ring_setup_status_page(ring);
-	else
-		i915_write_hws_pga(dev);
-
-	DRM_DEBUG_DRIVER("Enabled hardware status page\n");
-
-	return 0;
-}
-
-static int i915_dma_init(struct drm_device *dev, void *data,
-			 struct drm_file *file_priv)
-{
-	drm_i915_init_t *init = data;
-	int retcode = 0;
-
-	if (drm_core_check_feature(dev, DRIVER_MODESET))
-		return -ENODEV;
-
-	switch (init->func) {
-	case I915_INIT_DMA:
-		retcode = i915_initialize(dev, init);
-		break;
-	case I915_CLEANUP_DMA:
-		retcode = i915_dma_cleanup(dev);
-		break;
-	case I915_RESUME_DMA:
-		retcode = i915_dma_resume(dev);
-		break;
-	default:
-		retcode = -EINVAL;
-		break;
-	}
-
-	return retcode;
-}
-
-/* Implement basically the same security restrictions as hardware does
- * for MI_BATCH_NON_SECURE.  These can be made stricter at any time.
- *
- * Most of the calculations below involve calculating the size of a
- * particular instruction.  It's important to get the size right as
- * that tells us where the next instruction to check is.  Any illegal
- * instruction detected will be given a size of zero, which is a
- * signal to abort the rest of the buffer.
- */
-static int validate_cmd(int cmd)
-{
-	switch (((cmd >> 29) & 0x7)) {
-	case 0x0:
-		switch ((cmd >> 23) & 0x3f) {
-		case 0x0:
-			return 1;	/* MI_NOOP */
-		case 0x4:
-			return 1;	/* MI_FLUSH */
-		default:
-			return 0;	/* disallow everything else */
-		}
-		break;
-	case 0x1:
-		return 0;	/* reserved */
-	case 0x2:
-		return (cmd & 0xff) + 2;	/* 2d commands */
-	case 0x3:
-		if (((cmd >> 24) & 0x1f) <= 0x18)
-			return 1;
-
-		switch ((cmd >> 24) & 0x1f) {
-		case 0x1c:
-			return 1;
-		case 0x1d:
-			switch ((cmd >> 16) & 0xff) {
-			case 0x3:
-				return (cmd & 0x1f) + 2;
-			case 0x4:
-				return (cmd & 0xf) + 2;
-			default:
-				return (cmd & 0xffff) + 2;
-			}
-		case 0x1e:
-			if (cmd & (1 << 23))
-				return (cmd & 0xffff) + 1;
-			else
-				return 1;
-		case 0x1f:
-			if ((cmd & (1 << 23)) == 0)	/* inline vertices */
-				return (cmd & 0x1ffff) + 2;
-			else if (cmd & (1 << 17))	/* indirect random */
-				if ((cmd & 0xffff) == 0)
-					return 0;	/* unknown length, too hard */
-				else
-					return (((cmd & 0xffff) + 1) / 2) + 1;
-			else
-				return 2;	/* indirect sequential */
-		default:
-			return 0;
-		}
-	default:
-		return 0;
-	}
-
-	return 0;
-}
-
-static int i915_emit_cmds(struct drm_device *dev, int *buffer, int dwords)
-{
-	struct drm_i915_private *dev_priv = dev->dev_private;
-	int i, ret;
-
-	if ((dwords+1) * sizeof(int) >= LP_RING(dev_priv)->buffer->size - 8)
-		return -EINVAL;
-
-	for (i = 0; i < dwords;) {
-		int sz = validate_cmd(buffer[i]);
-
-		if (sz == 0 || i + sz > dwords)
-			return -EINVAL;
-		i += sz;
-	}
-
-	ret = BEGIN_LP_RING((dwords+1)&~1);
-	if (ret)
-		return ret;
-
-	for (i = 0; i < dwords; i++)
-		OUT_RING(buffer[i]);
-	if (dwords & 1)
-		OUT_RING(0);
-
-	ADVANCE_LP_RING();
-
-	return 0;
-}
-
-int
-i915_emit_box(struct drm_device *dev,
-	      struct drm_clip_rect *box,
-	      int DR1, int DR4)
-{
-	struct drm_i915_private *dev_priv = dev->dev_private;
-	int ret;
-
-	if (box->y2 <= box->y1 || box->x2 <= box->x1 ||
-	    box->y2 <= 0 || box->x2 <= 0) {
-		DRM_ERROR("Bad box %d,%d..%d,%d\n",
-			  box->x1, box->y1, box->x2, box->y2);
-		return -EINVAL;
-	}
-
-	if (INTEL_INFO(dev)->gen >= 4) {
-		ret = BEGIN_LP_RING(4);
-		if (ret)
-			return ret;
-
-		OUT_RING(GFX_OP_DRAWRECT_INFO_I965);
-		OUT_RING((box->x1 & 0xffff) | (box->y1 << 16));
-		OUT_RING(((box->x2 - 1) & 0xffff) | ((box->y2 - 1) << 16));
-		OUT_RING(DR4);
-	} else {
-		ret = BEGIN_LP_RING(6);
-		if (ret)
-			return ret;
-
-		OUT_RING(GFX_OP_DRAWRECT_INFO);
-		OUT_RING(DR1);
-		OUT_RING((box->x1 & 0xffff) | (box->y1 << 16));
-		OUT_RING(((box->x2 - 1) & 0xffff) | ((box->y2 - 1) << 16));
-		OUT_RING(DR4);
-		OUT_RING(0);
-	}
-	ADVANCE_LP_RING();
-
-	return 0;
-}
-
-/* XXX: Emitting the counter should really be moved to part of the IRQ
- * emit. For now, do it in both places:
- */
-
-static void i915_emit_breadcrumb(struct drm_device *dev)
-{
-	struct drm_i915_private *dev_priv = dev->dev_private;
-	struct drm_i915_master_private *master_priv = dev->primary->master->driver_priv;
-
-	dev_priv->dri1.counter++;
-	if (dev_priv->dri1.counter > 0x7FFFFFFFUL)
-		dev_priv->dri1.counter = 0;
-	if (master_priv->sarea_priv)
-		master_priv->sarea_priv->last_enqueue = dev_priv->dri1.counter;
-
-	if (BEGIN_LP_RING(4) == 0) {
-		OUT_RING(MI_STORE_DWORD_INDEX);
-		OUT_RING(I915_BREADCRUMB_INDEX << MI_STORE_DWORD_INDEX_SHIFT);
-		OUT_RING(dev_priv->dri1.counter);
-		OUT_RING(0);
-		ADVANCE_LP_RING();
-	}
-}
-
-static int i915_dispatch_cmdbuffer(struct drm_device *dev,
-				   drm_i915_cmdbuffer_t *cmd,
-				   struct drm_clip_rect *cliprects,
-				   void *cmdbuf)
-{
-	int nbox = cmd->num_cliprects;
-	int i = 0, count, ret;
-
-	if (cmd->sz & 0x3) {
-		DRM_ERROR("alignment");
-		return -EINVAL;
-	}
-
-	i915_kernel_lost_context(dev);
-
-	count = nbox ? nbox : 1;
-
-	for (i = 0; i < count; i++) {
-		if (i < nbox) {
-			ret = i915_emit_box(dev, &cliprects[i],
-					    cmd->DR1, cmd->DR4);
-			if (ret)
-				return ret;
-		}
-
-		ret = i915_emit_cmds(dev, cmdbuf, cmd->sz / 4);
-		if (ret)
-			return ret;
-	}
-
-	i915_emit_breadcrumb(dev);
-	return 0;
-}
-
-static int i915_dispatch_batchbuffer(struct drm_device *dev,
-				     drm_i915_batchbuffer_t *batch,
-				     struct drm_clip_rect *cliprects)
-{
-	struct drm_i915_private *dev_priv = dev->dev_private;
-	int nbox = batch->num_cliprects;
-	int i, count, ret;
-
-	if ((batch->start | batch->used) & 0x7) {
-		DRM_ERROR("alignment");
-		return -EINVAL;
-	}
-
-	i915_kernel_lost_context(dev);
-
-	count = nbox ? nbox : 1;
-	for (i = 0; i < count; i++) {
-		if (i < nbox) {
-			ret = i915_emit_box(dev, &cliprects[i],
-					    batch->DR1, batch->DR4);
-			if (ret)
-				return ret;
-		}
-
-		if (!IS_I830(dev) && !IS_845G(dev)) {
-			ret = BEGIN_LP_RING(2);
-			if (ret)
-				return ret;
-
-			if (INTEL_INFO(dev)->gen >= 4) {
-				OUT_RING(MI_BATCH_BUFFER_START | (2 << 6) | MI_BATCH_NON_SECURE_I965);
-				OUT_RING(batch->start);
-			} else {
-				OUT_RING(MI_BATCH_BUFFER_START | (2 << 6));
-				OUT_RING(batch->start | MI_BATCH_NON_SECURE);
-			}
-		} else {
-			ret = BEGIN_LP_RING(4);
-			if (ret)
-				return ret;
-
-			OUT_RING(MI_BATCH_BUFFER);
-			OUT_RING(batch->start | MI_BATCH_NON_SECURE);
-			OUT_RING(batch->start + batch->used - 4);
-			OUT_RING(0);
-		}
-		ADVANCE_LP_RING();
-	}
-
-
-	if (IS_G4X(dev) || IS_GEN5(dev)) {
-		if (BEGIN_LP_RING(2) == 0) {
-			OUT_RING(MI_FLUSH | MI_NO_WRITE_FLUSH | MI_INVALIDATE_ISP);
-			OUT_RING(MI_NOOP);
-			ADVANCE_LP_RING();
-		}
-	}
-
-	i915_emit_breadcrumb(dev);
-	return 0;
-}
-
-static int i915_dispatch_flip(struct drm_device *dev)
-{
-	struct drm_i915_private *dev_priv = dev->dev_private;
-	struct drm_i915_master_private *master_priv =
-		dev->primary->master->driver_priv;
-	int ret;
-
-	if (!master_priv->sarea_priv)
-		return -EINVAL;
-
-	DRM_DEBUG_DRIVER("%s: page=%d pfCurrentPage=%d\n",
-			  __func__,
-			 dev_priv->dri1.current_page,
-			 master_priv->sarea_priv->pf_current_page);
-
-	i915_kernel_lost_context(dev);
-
-	ret = BEGIN_LP_RING(10);
-	if (ret)
-		return ret;
-
-	OUT_RING(MI_FLUSH | MI_READ_FLUSH);
-	OUT_RING(0);
-
-	OUT_RING(CMD_OP_DISPLAYBUFFER_INFO | ASYNC_FLIP);
-	OUT_RING(0);
-	if (dev_priv->dri1.current_page == 0) {
-		OUT_RING(dev_priv->dri1.back_offset);
-		dev_priv->dri1.current_page = 1;
-	} else {
-		OUT_RING(dev_priv->dri1.front_offset);
-		dev_priv->dri1.current_page = 0;
-	}
-	OUT_RING(0);
-
-	OUT_RING(MI_WAIT_FOR_EVENT | MI_WAIT_FOR_PLANE_A_FLIP);
-	OUT_RING(0);
-
-	ADVANCE_LP_RING();
-
-	master_priv->sarea_priv->last_enqueue = dev_priv->dri1.counter++;
-
-	if (BEGIN_LP_RING(4) == 0) {
-		OUT_RING(MI_STORE_DWORD_INDEX);
-		OUT_RING(I915_BREADCRUMB_INDEX << MI_STORE_DWORD_INDEX_SHIFT);
-		OUT_RING(dev_priv->dri1.counter);
-		OUT_RING(0);
-		ADVANCE_LP_RING();
-	}
-
-	master_priv->sarea_priv->pf_current_page = dev_priv->dri1.current_page;
-	return 0;
-}
-
-static int i915_quiescent(struct drm_device *dev)
-{
-	i915_kernel_lost_context(dev);
-	return intel_ring_idle(LP_RING(dev->dev_private));
-}
-
-static int i915_flush_ioctl(struct drm_device *dev, void *data,
-			    struct drm_file *file_priv)
-{
-	int ret;
-
-	if (drm_core_check_feature(dev, DRIVER_MODESET))
-		return -ENODEV;
-
-	RING_LOCK_TEST_WITH_RETURN(dev, file_priv);
-
-	mutex_lock(&dev->struct_mutex);
-	ret = i915_quiescent(dev);
-	mutex_unlock(&dev->struct_mutex);
-
-	return ret;
-}
-
-static int i915_batchbuffer(struct drm_device *dev, void *data,
-			    struct drm_file *file_priv)
-{
-	struct drm_i915_private *dev_priv = dev->dev_private;
-	struct drm_i915_master_private *master_priv;
-	drm_i915_sarea_t *sarea_priv;
-	drm_i915_batchbuffer_t *batch = data;
-	int ret;
-	struct drm_clip_rect *cliprects = NULL;
-
-	if (drm_core_check_feature(dev, DRIVER_MODESET))
-		return -ENODEV;
-
-	master_priv = dev->primary->master->driver_priv;
-	sarea_priv = (drm_i915_sarea_t *) master_priv->sarea_priv;
-
-	if (!dev_priv->dri1.allow_batchbuffer) {
-		DRM_ERROR("Batchbuffer ioctl disabled\n");
-		return -EINVAL;
-	}
-
-	DRM_DEBUG_DRIVER("i915 batchbuffer, start %x used %d cliprects %d\n",
-			batch->start, batch->used, batch->num_cliprects);
-
-	RING_LOCK_TEST_WITH_RETURN(dev, file_priv);
-
-	if (batch->num_cliprects < 0)
-		return -EINVAL;
-
-	if (batch->num_cliprects) {
-		cliprects = kcalloc(batch->num_cliprects,
-				    sizeof(*cliprects),
-				    GFP_KERNEL);
-		if (cliprects == NULL)
-			return -ENOMEM;
-
-		ret = copy_from_user(cliprects, batch->cliprects,
-				     batch->num_cliprects *
-				     sizeof(struct drm_clip_rect));
-		if (ret != 0) {
-			ret = -EFAULT;
-			goto fail_free;
-		}
-	}
-
-	mutex_lock(&dev->struct_mutex);
-	ret = i915_dispatch_batchbuffer(dev, batch, cliprects);
-	mutex_unlock(&dev->struct_mutex);
-
-	if (sarea_priv)
-		sarea_priv->last_dispatch = READ_BREADCRUMB(dev_priv);
-
-fail_free:
-	kfree(cliprects);
-
-	return ret;
-}
-
-static int i915_cmdbuffer(struct drm_device *dev, void *data,
-			  struct drm_file *file_priv)
-{
-	struct drm_i915_private *dev_priv = dev->dev_private;
-	struct drm_i915_master_private *master_priv;
-	drm_i915_sarea_t *sarea_priv;
-	drm_i915_cmdbuffer_t *cmdbuf = data;
-	struct drm_clip_rect *cliprects = NULL;
-	void *batch_data;
-	int ret;
-
-	DRM_DEBUG_DRIVER("i915 cmdbuffer, buf %p sz %d cliprects %d\n",
-			cmdbuf->buf, cmdbuf->sz, cmdbuf->num_cliprects);
-
-	if (drm_core_check_feature(dev, DRIVER_MODESET))
-		return -ENODEV;
-
-	master_priv = dev->primary->master->driver_priv;
-	sarea_priv = (drm_i915_sarea_t *) master_priv->sarea_priv;
-
-	RING_LOCK_TEST_WITH_RETURN(dev, file_priv);
-
-	if (cmdbuf->num_cliprects < 0)
-		return -EINVAL;
-
-	batch_data = kmalloc(cmdbuf->sz, GFP_KERNEL);
-	if (batch_data == NULL)
-		return -ENOMEM;
-
-	ret = copy_from_user(batch_data, cmdbuf->buf, cmdbuf->sz);
-	if (ret != 0) {
-		ret = -EFAULT;
-		goto fail_batch_free;
-	}
-
-	if (cmdbuf->num_cliprects) {
-		cliprects = kcalloc(cmdbuf->num_cliprects,
-				    sizeof(*cliprects), GFP_KERNEL);
-		if (cliprects == NULL) {
-			ret = -ENOMEM;
-			goto fail_batch_free;
-		}
-
-		ret = copy_from_user(cliprects, cmdbuf->cliprects,
-				     cmdbuf->num_cliprects *
-				     sizeof(struct drm_clip_rect));
-		if (ret != 0) {
-			ret = -EFAULT;
-			goto fail_clip_free;
-		}
-	}
-
-	mutex_lock(&dev->struct_mutex);
-	ret = i915_dispatch_cmdbuffer(dev, cmdbuf, cliprects, batch_data);
-	mutex_unlock(&dev->struct_mutex);
-	if (ret) {
-		DRM_ERROR("i915_dispatch_cmdbuffer failed\n");
-		goto fail_clip_free;
-	}
-
-	if (sarea_priv)
-		sarea_priv->last_dispatch = READ_BREADCRUMB(dev_priv);
-
-fail_clip_free:
-	kfree(cliprects);
-fail_batch_free:
-	kfree(batch_data);
-
-	return ret;
-}
-
-static int i915_emit_irq(struct drm_device *dev)
-{
-	struct drm_i915_private *dev_priv = dev->dev_private;
-	struct drm_i915_master_private *master_priv = dev->primary->master->driver_priv;
-
-	i915_kernel_lost_context(dev);
-
-	DRM_DEBUG_DRIVER("\n");
-
-	dev_priv->dri1.counter++;
-	if (dev_priv->dri1.counter > 0x7FFFFFFFUL)
-		dev_priv->dri1.counter = 1;
-	if (master_priv->sarea_priv)
-		master_priv->sarea_priv->last_enqueue = dev_priv->dri1.counter;
-
-	if (BEGIN_LP_RING(4) == 0) {
-		OUT_RING(MI_STORE_DWORD_INDEX);
-		OUT_RING(I915_BREADCRUMB_INDEX << MI_STORE_DWORD_INDEX_SHIFT);
-		OUT_RING(dev_priv->dri1.counter);
-		OUT_RING(MI_USER_INTERRUPT);
-		ADVANCE_LP_RING();
-	}
-
-	return dev_priv->dri1.counter;
-}
-
-static int i915_wait_irq(struct drm_device *dev, int irq_nr)
-{
-	struct drm_i915_private *dev_priv = dev->dev_private;
-	struct drm_i915_master_private *master_priv = dev->primary->master->driver_priv;
-	int ret = 0;
-	struct intel_engine_cs *ring = LP_RING(dev_priv);
-
-	DRM_DEBUG_DRIVER("irq_nr=%d breadcrumb=%d\n", irq_nr,
-		  READ_BREADCRUMB(dev_priv));
-
-	if (READ_BREADCRUMB(dev_priv) >= irq_nr) {
-		if (master_priv->sarea_priv)
-			master_priv->sarea_priv->last_dispatch = READ_BREADCRUMB(dev_priv);
-		return 0;
-	}
-
-	if (master_priv->sarea_priv)
-		master_priv->sarea_priv->perf_boxes |= I915_BOX_WAIT;
-
-	if (ring->irq_get(ring)) {
-		DRM_WAIT_ON(ret, ring->irq_queue, 3 * HZ,
-			    READ_BREADCRUMB(dev_priv) >= irq_nr);
-		ring->irq_put(ring);
-	} else if (wait_for(READ_BREADCRUMB(dev_priv) >= irq_nr, 3000))
-		ret = -EBUSY;
-
-	if (ret == -EBUSY) {
-		DRM_ERROR("EBUSY -- rec: %d emitted: %d\n",
-			  READ_BREADCRUMB(dev_priv), (int)dev_priv->dri1.counter);
-	}
-
-	return ret;
-}
-
-/* Needs the lock as it touches the ring.
- */
-static int i915_irq_emit(struct drm_device *dev, void *data,
-			 struct drm_file *file_priv)
-{
-	struct drm_i915_private *dev_priv = dev->dev_private;
-	drm_i915_irq_emit_t *emit = data;
-	int result;
-
-	if (drm_core_check_feature(dev, DRIVER_MODESET))
-		return -ENODEV;
-
-	if (!dev_priv || !LP_RING(dev_priv)->buffer->virtual_start) {
-		DRM_ERROR("called with no initialization\n");
-		return -EINVAL;
-	}
-
-	RING_LOCK_TEST_WITH_RETURN(dev, file_priv);
-
-	mutex_lock(&dev->struct_mutex);
-	result = i915_emit_irq(dev);
-	mutex_unlock(&dev->struct_mutex);
-
-	if (copy_to_user(emit->irq_seq, &result, sizeof(int))) {
-		DRM_ERROR("copy_to_user\n");
-		return -EFAULT;
-	}
-
-	return 0;
-}
-
-/* Doesn't need the hardware lock.
- */
-static int i915_irq_wait(struct drm_device *dev, void *data,
-			 struct drm_file *file_priv)
-{
-	struct drm_i915_private *dev_priv = dev->dev_private;
-	drm_i915_irq_wait_t *irqwait = data;
-
-	if (drm_core_check_feature(dev, DRIVER_MODESET))
-		return -ENODEV;
-
-	if (!dev_priv) {
-		DRM_ERROR("called with no initialization\n");
-		return -EINVAL;
-	}
-
-	return i915_wait_irq(dev, irqwait->irq_seq);
-}
-
-static int i915_vblank_pipe_get(struct drm_device *dev, void *data,
-			 struct drm_file *file_priv)
-{
-	struct drm_i915_private *dev_priv = dev->dev_private;
-	drm_i915_vblank_pipe_t *pipe = data;
-
-	if (drm_core_check_feature(dev, DRIVER_MODESET))
-		return -ENODEV;
-
-	if (!dev_priv) {
-		DRM_ERROR("called with no initialization\n");
-		return -EINVAL;
-	}
-
-	pipe->pipe = DRM_I915_VBLANK_PIPE_A | DRM_I915_VBLANK_PIPE_B;
-
-	return 0;
-}
-
-/**
- * Schedule buffer swap at given vertical blank.
- */
-static int i915_vblank_swap(struct drm_device *dev, void *data,
-		     struct drm_file *file_priv)
-{
-	/* The delayed swap mechanism was fundamentally racy, and has been
-	 * removed.  The model was that the client requested a delayed flip/swap
-	 * from the kernel, then waited for vblank before continuing to perform
-	 * rendering.  The problem was that the kernel might wake the client
-	 * up before it dispatched the vblank swap (since the lock has to be
-	 * held while touching the ringbuffer), in which case the client would
-	 * clear and start the next frame before the swap occurred, and
-	 * flicker would occur in addition to likely missing the vblank.
-	 *
-	 * In the absence of this ioctl, userland falls back to a correct path
-	 * of waiting for a vblank, then dispatching the swap on its own.
-	 * Context switching to userland and back is plenty fast enough for
-	 * meeting the requirements of vblank swapping.
-	 */
-	return -EINVAL;
-}
-
-static int i915_flip_bufs(struct drm_device *dev, void *data,
-			  struct drm_file *file_priv)
-{
-	int ret;
-
-	if (drm_core_check_feature(dev, DRIVER_MODESET))
-		return -ENODEV;
-
-	DRM_DEBUG_DRIVER("%s\n", __func__);
-
-	RING_LOCK_TEST_WITH_RETURN(dev, file_priv);
-
-	mutex_lock(&dev->struct_mutex);
-	ret = i915_dispatch_flip(dev);
-	mutex_unlock(&dev->struct_mutex);
-
-	return ret;
-}
 
 static int i915_getparam(struct drm_device *dev, void *data,
 			 struct drm_file *file_priv)
@@ -936,21 +58,12 @@ static int i915_getparam(struct drm_device *dev, void *data,
 	drm_i915_getparam_t *param = data;
 	int value;
 
-	if (!dev_priv) {
-		DRM_ERROR("called with no initialization\n");
-		return -EINVAL;
-	}
-
 	switch (param->param) {
 	case I915_PARAM_IRQ_ACTIVE:
-		value = dev->pdev->irq ? 1 : 0;
-		break;
 	case I915_PARAM_ALLOW_BATCHBUFFER:
-		value = dev_priv->dri1.allow_batchbuffer ? 1 : 0;
-		break;
 	case I915_PARAM_LAST_DISPATCH:
-		value = READ_BREADCRUMB(dev_priv);
-		break;
+		/* Reject all old ums/dri params. */
+		return -ENODEV;
 	case I915_PARAM_CHIPSET_ID:
 		value = dev->pdev->device;
 		break;
@@ -1027,6 +140,9 @@ static int i915_getparam(struct drm_device *dev, void *data,
 	case I915_PARAM_CMD_PARSER_VERSION:
 		value = i915_cmd_parser_get_version();
 		break;
+	case I915_PARAM_HAS_COHERENT_PHYS_GTT:
+		value = 1;
+		break;
 	default:
 		DRM_DEBUG("Unknown parameter %d\n", param->param);
 		return -EINVAL;
@@ -1046,19 +162,13 @@ static int i915_setparam(struct drm_device *dev, void *data,
 	struct drm_i915_private *dev_priv = dev->dev_private;
 	drm_i915_setparam_t *param = data;
 
-	if (!dev_priv) {
-		DRM_ERROR("called with no initialization\n");
-		return -EINVAL;
-	}
-
 	switch (param->param) {
 	case I915_SETPARAM_USE_MI_BATCHBUFFER_START:
-		break;
 	case I915_SETPARAM_TEX_LRU_LOG_GRANULARITY:
-		break;
 	case I915_SETPARAM_ALLOW_BATCHBUFFER:
-		dev_priv->dri1.allow_batchbuffer = param->value ? 1 : 0;
-		break;
+		/* Reject all old ums/dri params. */
+		return -ENODEV;
+
 	case I915_SETPARAM_NUM_USED_FENCES:
 		if (param->value > dev_priv->num_fence_regs ||
 		    param->value < 0)
@@ -1075,54 +185,6 @@ static int i915_setparam(struct drm_device *dev, void *data,
 	return 0;
 }
 
-static int i915_set_status_page(struct drm_device *dev, void *data,
-				struct drm_file *file_priv)
-{
-	struct drm_i915_private *dev_priv = dev->dev_private;
-	drm_i915_hws_addr_t *hws = data;
-	struct intel_engine_cs *ring;
-
-	if (drm_core_check_feature(dev, DRIVER_MODESET))
-		return -ENODEV;
-
-	if (!I915_NEED_GFX_HWS(dev))
-		return -EINVAL;
-
-	if (!dev_priv) {
-		DRM_ERROR("called with no initialization\n");
-		return -EINVAL;
-	}
-
-	if (drm_core_check_feature(dev, DRIVER_MODESET)) {
-		WARN(1, "tried to set status page when mode setting active\n");
-		return 0;
-	}
-
-	DRM_DEBUG_DRIVER("set status page addr 0x%08x\n", (u32)hws->addr);
-
-	ring = LP_RING(dev_priv);
-	ring->status_page.gfx_addr = hws->addr & (0x1ffff<<12);
-
-	dev_priv->dri1.gfx_hws_cpu_addr =
-		ioremap_wc(dev_priv->gtt.mappable_base + hws->addr, 4096);
-	if (dev_priv->dri1.gfx_hws_cpu_addr == NULL) {
-		i915_dma_cleanup(dev);
-		ring->status_page.gfx_addr = 0;
-		DRM_ERROR("can not ioremap virtual address for"
-				" G33 hw status page\n");
-		return -ENOMEM;
-	}
-
-	memset_io(dev_priv->dri1.gfx_hws_cpu_addr, 0, PAGE_SIZE);
-	I915_WRITE(HWS_PGA, ring->status_page.gfx_addr);
-
-	DRM_DEBUG_DRIVER("load hws HWS_PGA with gfx mem 0x%x\n",
-			 ring->status_page.gfx_addr);
-	DRM_DEBUG_DRIVER("load hws at %p\n",
-			 ring->status_page.page_addr);
-	return 0;
-}
-
 static int i915_get_bridge_dev(struct drm_device *dev)
 {
 	struct drm_i915_private *dev_priv = dev->dev_private;
@@ -1275,12 +337,12 @@ static void i915_switcheroo_set_state(struct pci_dev *pdev, enum vga_switcheroo_
 		dev->switch_power_state = DRM_SWITCH_POWER_CHANGING;
 		/* i915 resume handler doesn't set to D0 */
 		pci_set_power_state(dev->pdev, PCI_D0);
-		i915_resume(dev);
+		i915_resume_legacy(dev);
 		dev->switch_power_state = DRM_SWITCH_POWER_ON;
 	} else {
 		pr_err("switched off\n");
 		dev->switch_power_state = DRM_SWITCH_POWER_CHANGING;
-		i915_suspend(dev, pmm);
+		i915_suspend_legacy(dev, pmm);
 		dev->switch_power_state = DRM_SWITCH_POWER_OFF;
 	}
 }
@@ -1338,14 +400,7 @@ static int i915_load_modeset_init(struct drm_device *dev)
 
 	intel_power_domains_init_hw(dev_priv);
 
-	/*
-	 * We enable some interrupt sources in our postinstall hooks, so mark
-	 * interrupts as enabled _before_ actually enabling them to avoid
-	 * special cases in our ordering checks.
-	 */
-	dev_priv->pm._irqs_disabled = false;
-
-	ret = drm_irq_install(dev, dev->pdev->irq);
+	ret = intel_irq_install(dev_priv);
 	if (ret)
 		goto cleanup_gem_stolen;
 
@@ -1370,7 +425,7 @@ static int i915_load_modeset_init(struct drm_device *dev)
 		goto cleanup_gem;
 
 	/* Only enable hotplug handling once the fbdev is fully set up. */
-	intel_hpd_init(dev);
+	intel_hpd_init(dev_priv);
 
 	/*
 	 * Some ports require correctly set-up hpd registers for detection to
@@ -1405,30 +460,6 @@ out:
 	return ret;
 }
 
-int i915_master_create(struct drm_device *dev, struct drm_master *master)
-{
-	struct drm_i915_master_private *master_priv;
-
-	master_priv = kzalloc(sizeof(*master_priv), GFP_KERNEL);
-	if (!master_priv)
-		return -ENOMEM;
-
-	master->driver_priv = master_priv;
-	return 0;
-}
-
-void i915_master_destroy(struct drm_device *dev, struct drm_master *master)
-{
-	struct drm_i915_master_private *master_priv = master->driver_priv;
-
-	if (!master_priv)
-		return;
-
-	kfree(master_priv);
-
-	master->driver_priv = NULL;
-}
-
 #if IS_ENABLED(CONFIG_FB)
 static int i915_kick_out_firmware_fb(struct drm_i915_private *dev_priv)
 {
@@ -1534,7 +565,7 @@ static void intel_device_info_runtime_init(struct drm_device *dev)
 
 	info = (struct intel_device_info *)&dev_priv->info;
 
-	if (IS_VALLEYVIEW(dev))
+	if (IS_VALLEYVIEW(dev) || INTEL_INFO(dev)->gen == 9)
 		for_each_pipe(dev_priv, pipe)
 			info->num_sprites[pipe] = 2;
 	else
@@ -1614,7 +645,7 @@ int i915_driver_load(struct drm_device *dev, unsigned long flags)
 
 	spin_lock_init(&dev_priv->irq_lock);
 	spin_lock_init(&dev_priv->gpu_error.lock);
-	spin_lock_init(&dev_priv->backlight_lock);
+	mutex_init(&dev_priv->backlight_lock);
 	spin_lock_init(&dev_priv->uncore.lock);
 	spin_lock_init(&dev_priv->mm.object_stat_lock);
 	spin_lock_init(&dev_priv->mmio_flip_lock);
@@ -1742,7 +773,7 @@ int i915_driver_load(struct drm_device *dev, unsigned long flags)
 		goto out_freewq;
 	}
 
-	intel_irq_init(dev);
+	intel_irq_init(dev_priv);
 	intel_uncore_sanitize(dev);
 
 	/* Try to make sure MCHBAR is enabled before poking at it */
@@ -1784,9 +815,6 @@ int i915_driver_load(struct drm_device *dev, unsigned long flags)
 			DRM_ERROR("failed to init modeset\n");
 			goto out_power_well;
 		}
-	} else {
-		/* Start out suspended in ums mode. */
-		dev_priv->ums.mm_suspended = 1;
 	}
 
 	i915_setup_sysfs(dev);
@@ -1800,12 +828,12 @@ int i915_driver_load(struct drm_device *dev, unsigned long flags)
 	if (IS_GEN5(dev))
 		intel_gpu_ips_init(dev_priv);
 
-	intel_init_runtime_pm(dev_priv);
+	intel_runtime_pm_enable(dev_priv);
 
 	return 0;
 
 out_power_well:
-	intel_power_domains_remove(dev_priv);
+	intel_power_domains_fini(dev_priv);
 	drm_vblank_cleanup(dev);
 out_gem_unload:
 	WARN_ON(unregister_oom_notifier(&dev_priv->mm.oom_notifier));
@@ -1848,16 +876,10 @@ int i915_driver_unload(struct drm_device *dev)
 		return ret;
 	}
 
-	intel_fini_runtime_pm(dev_priv);
+	intel_power_domains_fini(dev_priv);
 
 	intel_gpu_ips_teardown();
 
-	/* The i915.ko module is still not prepared to be loaded when
-	 * the power well is not enabled, so just enable it in case
-	 * we're going to unload/reload. */
-	intel_display_set_init_power(dev_priv, true);
-	intel_power_domains_remove(dev_priv);
-
 	i915_teardown_sysfs(dev);
 
 	WARN_ON(unregister_oom_notifier(&dev_priv->mm.oom_notifier));
@@ -1868,8 +890,12 @@ int i915_driver_unload(struct drm_device *dev)
 
 	acpi_video_unregister();
 
-	if (drm_core_check_feature(dev, DRIVER_MODESET)) {
+	if (drm_core_check_feature(dev, DRIVER_MODESET))
 		intel_fbdev_fini(dev);
+
+	drm_vblank_cleanup(dev);
+
+	if (drm_core_check_feature(dev, DRIVER_MODESET)) {
 		intel_modeset_cleanup(dev);
 
 		/*
@@ -1905,13 +931,8 @@ int i915_driver_unload(struct drm_device *dev)
 		i915_gem_context_fini(dev);
 		mutex_unlock(&dev->struct_mutex);
 		i915_gem_cleanup_stolen(dev);
-
-		if (!I915_NEED_GFX_HWS(dev))
-			i915_free_hws(dev);
 	}
 
-	drm_vblank_cleanup(dev);
-
 	intel_teardown_gmbus(dev);
 	intel_teardown_mchbar(dev);
 
@@ -1959,23 +980,8 @@ int i915_driver_open(struct drm_device *dev, struct drm_file *file)
  */
 void i915_driver_lastclose(struct drm_device *dev)
 {
-	struct drm_i915_private *dev_priv = dev->dev_private;
-
-	/* On gen6+ we refuse to init without kms enabled, but then the drm core
-	 * goes right around and calls lastclose. Check for this and don't clean
-	 * up anything. */
-	if (!dev_priv)
-		return;
-
-	if (drm_core_check_feature(dev, DRIVER_MODESET)) {
-		intel_fbdev_restore_mode(dev);
-		vga_switcheroo_process_delayed_switch();
-		return;
-	}
-
-	i915_gem_lastclose(dev);
-
-	i915_dma_cleanup(dev);
+	intel_fbdev_restore_mode(dev);
+	vga_switcheroo_process_delayed_switch();
 }
 
 void i915_driver_preclose(struct drm_device *dev, struct drm_file *file)
@@ -1999,24 +1005,24 @@ void i915_driver_postclose(struct drm_device *dev, struct drm_file *file)
 }
 
 const struct drm_ioctl_desc i915_ioctls[] = {
-	DRM_IOCTL_DEF_DRV(I915_INIT, i915_dma_init, DRM_AUTH|DRM_MASTER|DRM_ROOT_ONLY),
-	DRM_IOCTL_DEF_DRV(I915_FLUSH, i915_flush_ioctl, DRM_AUTH),
-	DRM_IOCTL_DEF_DRV(I915_FLIP, i915_flip_bufs, DRM_AUTH),
-	DRM_IOCTL_DEF_DRV(I915_BATCHBUFFER, i915_batchbuffer, DRM_AUTH),
-	DRM_IOCTL_DEF_DRV(I915_IRQ_EMIT, i915_irq_emit, DRM_AUTH),
-	DRM_IOCTL_DEF_DRV(I915_IRQ_WAIT, i915_irq_wait, DRM_AUTH),
+	DRM_IOCTL_DEF_DRV(I915_INIT, drm_noop, DRM_AUTH|DRM_MASTER|DRM_ROOT_ONLY),
+	DRM_IOCTL_DEF_DRV(I915_FLUSH, drm_noop, DRM_AUTH),
+	DRM_IOCTL_DEF_DRV(I915_FLIP, drm_noop, DRM_AUTH),
+	DRM_IOCTL_DEF_DRV(I915_BATCHBUFFER, drm_noop, DRM_AUTH),
+	DRM_IOCTL_DEF_DRV(I915_IRQ_EMIT, drm_noop, DRM_AUTH),
+	DRM_IOCTL_DEF_DRV(I915_IRQ_WAIT, drm_noop, DRM_AUTH),
 	DRM_IOCTL_DEF_DRV(I915_GETPARAM, i915_getparam, DRM_AUTH|DRM_RENDER_ALLOW),
 	DRM_IOCTL_DEF_DRV(I915_SETPARAM, i915_setparam, DRM_AUTH|DRM_MASTER|DRM_ROOT_ONLY),
 	DRM_IOCTL_DEF_DRV(I915_ALLOC, drm_noop, DRM_AUTH),
 	DRM_IOCTL_DEF_DRV(I915_FREE, drm_noop, DRM_AUTH),
 	DRM_IOCTL_DEF_DRV(I915_INIT_HEAP, drm_noop, DRM_AUTH|DRM_MASTER|DRM_ROOT_ONLY),
-	DRM_IOCTL_DEF_DRV(I915_CMDBUFFER, i915_cmdbuffer, DRM_AUTH),
+	DRM_IOCTL_DEF_DRV(I915_CMDBUFFER, drm_noop, DRM_AUTH),
 	DRM_IOCTL_DEF_DRV(I915_DESTROY_HEAP,  drm_noop, DRM_AUTH|DRM_MASTER|DRM_ROOT_ONLY),
 	DRM_IOCTL_DEF_DRV(I915_SET_VBLANK_PIPE,  drm_noop, DRM_AUTH|DRM_MASTER|DRM_ROOT_ONLY),
-	DRM_IOCTL_DEF_DRV(I915_GET_VBLANK_PIPE,  i915_vblank_pipe_get, DRM_AUTH),
-	DRM_IOCTL_DEF_DRV(I915_VBLANK_SWAP, i915_vblank_swap, DRM_AUTH),
-	DRM_IOCTL_DEF_DRV(I915_HWS_ADDR, i915_set_status_page, DRM_AUTH|DRM_MASTER|DRM_ROOT_ONLY),
-	DRM_IOCTL_DEF_DRV(I915_GEM_INIT, i915_gem_init_ioctl, DRM_AUTH|DRM_MASTER|DRM_ROOT_ONLY|DRM_UNLOCKED),
+	DRM_IOCTL_DEF_DRV(I915_GET_VBLANK_PIPE,  drm_noop, DRM_AUTH),
+	DRM_IOCTL_DEF_DRV(I915_VBLANK_SWAP, drm_noop, DRM_AUTH),
+	DRM_IOCTL_DEF_DRV(I915_HWS_ADDR, drm_noop, DRM_AUTH|DRM_MASTER|DRM_ROOT_ONLY),
+	DRM_IOCTL_DEF_DRV(I915_GEM_INIT, drm_noop, DRM_AUTH|DRM_MASTER|DRM_ROOT_ONLY|DRM_UNLOCKED),
 	DRM_IOCTL_DEF_DRV(I915_GEM_EXECBUFFER, i915_gem_execbuffer, DRM_AUTH|DRM_UNLOCKED),
 	DRM_IOCTL_DEF_DRV(I915_GEM_EXECBUFFER2, i915_gem_execbuffer2, DRM_AUTH|DRM_UNLOCKED|DRM_RENDER_ALLOW),
 	DRM_IOCTL_DEF_DRV(I915_GEM_PIN, i915_gem_pin_ioctl, DRM_AUTH|DRM_ROOT_ONLY|DRM_UNLOCKED),
@@ -2025,8 +1031,8 @@ const struct drm_ioctl_desc i915_ioctls[] = {
 	DRM_IOCTL_DEF_DRV(I915_GEM_SET_CACHING, i915_gem_set_caching_ioctl, DRM_UNLOCKED|DRM_RENDER_ALLOW),
 	DRM_IOCTL_DEF_DRV(I915_GEM_GET_CACHING, i915_gem_get_caching_ioctl, DRM_UNLOCKED|DRM_RENDER_ALLOW),
 	DRM_IOCTL_DEF_DRV(I915_GEM_THROTTLE, i915_gem_throttle_ioctl, DRM_AUTH|DRM_UNLOCKED|DRM_RENDER_ALLOW),
-	DRM_IOCTL_DEF_DRV(I915_GEM_ENTERVT, i915_gem_entervt_ioctl, DRM_AUTH|DRM_MASTER|DRM_ROOT_ONLY|DRM_UNLOCKED),
-	DRM_IOCTL_DEF_DRV(I915_GEM_LEAVEVT, i915_gem_leavevt_ioctl, DRM_AUTH|DRM_MASTER|DRM_ROOT_ONLY|DRM_UNLOCKED),
+	DRM_IOCTL_DEF_DRV(I915_GEM_ENTERVT, drm_noop, DRM_AUTH|DRM_MASTER|DRM_ROOT_ONLY|DRM_UNLOCKED),
+	DRM_IOCTL_DEF_DRV(I915_GEM_LEAVEVT, drm_noop, DRM_AUTH|DRM_MASTER|DRM_ROOT_ONLY|DRM_UNLOCKED),
 	DRM_IOCTL_DEF_DRV(I915_GEM_CREATE, i915_gem_create_ioctl, DRM_UNLOCKED|DRM_RENDER_ALLOW),
 	DRM_IOCTL_DEF_DRV(I915_GEM_PREAD, i915_gem_pread_ioctl, DRM_UNLOCKED|DRM_RENDER_ALLOW),
 	DRM_IOCTL_DEF_DRV(I915_GEM_PWRITE, i915_gem_pwrite_ioctl, DRM_UNLOCKED|DRM_RENDER_ALLOW),
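The i915_ioctls[] hunks above keep every legacy ioctl number in its slot and only point the retired UMS/DRI1 entries at drm_noop. A minimal standalone sketch of that pattern follows. All demo_* and DEMO_* names are hypothetical, and the no-op handler is assumed to do nothing and report success; this is not the DRM core's dispatch code.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical handler type; the real DRM handlers take more arguments. */
typedef int (*demo_ioctl_fn)(void *data);

/* Stand-in for a retired entry: accept the call, do nothing. */
static int demo_noop(void *data)
{
	(void)data;
	return 0;
}

/* Stand-in for an ioctl that is still implemented. */
static int demo_getparam(void *data)
{
	*(int *)data = 42;
	return 0;
}

/* Slot numbers stay stable even when a handler is retired. */
enum { DEMO_INIT, DEMO_FLUSH, DEMO_GETPARAM, DEMO_NR };

static const demo_ioctl_fn demo_ioctls[DEMO_NR] = {
	[DEMO_INIT]     = demo_noop,	/* retired, slot kept */
	[DEMO_FLUSH]    = demo_noop,	/* retired, slot kept */
	[DEMO_GETPARAM] = demo_getparam,
};

/* Look up the handler by number; unknown numbers are rejected. */
static int demo_dispatch(unsigned int nr, void *data)
{
	if (nr >= DEMO_NR)
		return -EINVAL;
	return demo_ioctls[nr](data);
}
```

Keeping the slots populated means later ioctl numbers do not shift, while old callers of the retired entries get a harmless result rather than reaching removed code.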
diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c
index 2318b4c7a8f8..f990ab4c3efb 100644
--- a/drivers/gpu/drm/i915/i915_drv.c
+++ b/drivers/gpu/drm/i915/i915_drv.c
@@ -356,6 +356,19 @@ static const struct intel_device_info intel_cherryview_info = {
 	CURSOR_OFFSETS,
 };
 
+static const struct intel_device_info intel_skylake_info = {
+	.is_preliminary = 1,
+	.is_skylake = 1,
+	.gen = 9, .num_pipes = 3,
+	.need_gfx_hws = 1, .has_hotplug = 1,
+	.ring_mask = RENDER_RING | BSD_RING | BLT_RING | VEBOX_RING,
+	.has_llc = 1,
+	.has_ddi = 1,
+	.has_fbc = 1,
+	GEN_DEFAULT_PIPEOFFSETS,
+	IVB_CURSOR_OFFSETS,
+};
+
 /*
  * Make sure any device matches here are from most specific to most
  * general.  For example, since the Quanta match is based on the subsystem
@@ -392,7 +405,8 @@ static const struct intel_device_info intel_cherryview_info = {
 	INTEL_BDW_GT12D_IDS(&intel_broadwell_d_info),	\
 	INTEL_BDW_GT3M_IDS(&intel_broadwell_gt3m_info),	\
 	INTEL_BDW_GT3D_IDS(&intel_broadwell_gt3d_info), \
-	INTEL_CHV_IDS(&intel_cherryview_info)
+	INTEL_CHV_IDS(&intel_cherryview_info),	\
+	INTEL_SKL_IDS(&intel_skylake_info)
 
 static const struct pci_device_id pciidlist[] = {		/* aka */
 	INTEL_PCI_IDS,
@@ -449,7 +463,7 @@ void intel_detect_pch(struct drm_device *dev)
 				dev_priv->pch_type = PCH_LPT;
 				DRM_DEBUG_KMS("Found LynxPoint PCH\n");
 				WARN_ON(!IS_HASWELL(dev));
-				WARN_ON(IS_ULT(dev));
+				WARN_ON(IS_HSW_ULT(dev));
 			} else if (IS_BROADWELL(dev)) {
 				dev_priv->pch_type = PCH_LPT;
 				dev_priv->pch_id =
@@ -460,7 +474,15 @@ void intel_detect_pch(struct drm_device *dev)
 				dev_priv->pch_type = PCH_LPT;
 				DRM_DEBUG_KMS("Found LynxPoint LP PCH\n");
 				WARN_ON(!IS_HASWELL(dev));
-				WARN_ON(!IS_ULT(dev));
+				WARN_ON(!IS_HSW_ULT(dev));
+			} else if (id == INTEL_PCH_SPT_DEVICE_ID_TYPE) {
+				dev_priv->pch_type = PCH_SPT;
+				DRM_DEBUG_KMS("Found SunrisePoint PCH\n");
+				WARN_ON(!IS_SKYLAKE(dev));
+			} else if (id == INTEL_PCH_SPT_LP_DEVICE_ID_TYPE) {
+				dev_priv->pch_type = PCH_SPT;
+				DRM_DEBUG_KMS("Found SunrisePoint LP PCH\n");
+				WARN_ON(!IS_SKYLAKE(dev));
 			} else
 				continue;
 
@@ -529,10 +551,10 @@ static void intel_suspend_encoders(struct drm_i915_private *dev_priv)
 }
 
 static int intel_suspend_complete(struct drm_i915_private *dev_priv);
-static int intel_resume_prepare(struct drm_i915_private *dev_priv,
-				bool rpm_resume);
+static int vlv_resume_prepare(struct drm_i915_private *dev_priv,
+			      bool rpm_resume);
 
-static int i915_drm_freeze(struct drm_device *dev)
+static int i915_drm_suspend(struct drm_device *dev)
 {
 	struct drm_i915_private *dev_priv = dev->dev_private;
 	struct drm_crtc *crtc;
@@ -562,6 +584,8 @@ static int i915_drm_freeze(struct drm_device *dev)
 			return error;
 		}
 
+		intel_suspend_gt_powersave(dev);
+
 		/*
 		 * Disable CRTCs directly since we want to preserve sw state
 		 * for _thaw. Also, power gate the CRTC power wells.
@@ -573,16 +597,12 @@ static int i915_drm_freeze(struct drm_device *dev)
 
 		intel_dp_mst_suspend(dev);
 
-		flush_delayed_work(&dev_priv->rps.delayed_resume_work);
-
-		intel_runtime_pm_disable_interrupts(dev);
+		intel_runtime_pm_disable_interrupts(dev_priv);
 		intel_hpd_cancel_work(dev_priv);
 
 		intel_suspend_encoders(dev_priv);
 
-		intel_suspend_gt_powersave(dev);
-
-		intel_modeset_suspend_hw(dev);
+		intel_suspend_hw(dev);
 	}
 
 	i915_gem_suspend_gtt_mappings(dev);
@@ -608,7 +628,26 @@ static int i915_drm_freeze(struct drm_device *dev)
 	return 0;
 }
 
-int i915_suspend(struct drm_device *dev, pm_message_t state)
+static int i915_drm_suspend_late(struct drm_device *drm_dev)
+{
+	struct drm_i915_private *dev_priv = drm_dev->dev_private;
+	int ret;
+
+	ret = intel_suspend_complete(dev_priv);
+
+	if (ret) {
+		DRM_ERROR("Suspend complete failed: %d\n", ret);
+
+		return ret;
+	}
+
+	pci_disable_device(drm_dev->pdev);
+	pci_set_power_state(drm_dev->pdev, PCI_D3hot);
+
+	return 0;
+}
+
+int i915_suspend_legacy(struct drm_device *dev, pm_message_t state)
 {
 	int error;
 
@@ -618,48 +657,25 @@ int i915_suspend(struct drm_device *dev, pm_message_t state)
 		return -ENODEV;
 	}
 
-	if (state.event == PM_EVENT_PRETHAW)
-		return 0;
-
+	if (WARN_ON_ONCE(state.event != PM_EVENT_SUSPEND &&
+			 state.event != PM_EVENT_FREEZE))
+		return -EINVAL;
 
 	if (dev->switch_power_state == DRM_SWITCH_POWER_OFF)
 		return 0;
 
-	error = i915_drm_freeze(dev);
+	error = i915_drm_suspend(dev);
 	if (error)
 		return error;
 
-	if (state.event == PM_EVENT_SUSPEND) {
-		/* Shut down the device */
-		pci_disable_device(dev->pdev);
-		pci_set_power_state(dev->pdev, PCI_D3hot);
-	}
-
-	return 0;
+	return i915_drm_suspend_late(dev);
 }
 
-static int i915_drm_thaw_early(struct drm_device *dev)
+static int i915_drm_resume(struct drm_device *dev)
 {
 	struct drm_i915_private *dev_priv = dev->dev_private;
-	int ret;
-
-	ret = intel_resume_prepare(dev_priv, false);
-	if (ret)
-		DRM_ERROR("Resume prepare failed: %d,Continuing resume\n", ret);
 
-	intel_uncore_early_sanitize(dev, true);
-	intel_uncore_sanitize(dev);
-	intel_power_domains_init_hw(dev_priv);
-
-	return ret;
-}
-
-static int __i915_drm_thaw(struct drm_device *dev, bool restore_gtt_mappings)
-{
-	struct drm_i915_private *dev_priv = dev->dev_private;
-
-	if (drm_core_check_feature(dev, DRIVER_MODESET) &&
-	    restore_gtt_mappings) {
+	if (drm_core_check_feature(dev, DRIVER_MODESET)) {
 		mutex_lock(&dev->struct_mutex);
 		i915_gem_restore_gtt_mappings(dev);
 		mutex_unlock(&dev->struct_mutex);
@@ -680,30 +696,29 @@ static int __i915_drm_thaw(struct drm_device *dev, bool restore_gtt_mappings)
 		}
 		mutex_unlock(&dev->struct_mutex);
 
-		intel_runtime_pm_restore_interrupts(dev);
+		/* We need working interrupts for modeset enabling ... */
+		intel_runtime_pm_enable_interrupts(dev_priv);
 
 		intel_modeset_init_hw(dev);
 
-		{
-			unsigned long irqflags;
-			spin_lock_irqsave(&dev_priv->irq_lock, irqflags);
-			if (dev_priv->display.hpd_irq_setup)
-				dev_priv->display.hpd_irq_setup(dev);
-			spin_unlock_irqrestore(&dev_priv->irq_lock, irqflags);
-		}
+		spin_lock_irq(&dev_priv->irq_lock);
+		if (dev_priv->display.hpd_irq_setup)
+			dev_priv->display.hpd_irq_setup(dev);
+		spin_unlock_irq(&dev_priv->irq_lock);
 
-		intel_dp_mst_resume(dev);
 		drm_modeset_lock_all(dev);
 		intel_modeset_setup_hw_state(dev, true);
 		drm_modeset_unlock_all(dev);
 
+		intel_dp_mst_resume(dev);
+
 		/*
 		 * ... but also need to make sure that hotplug processing
 		 * doesn't cause havoc. Like in the driver load code we don't
 		 * bother with the tiny race here where we might loose hotplug
 		 * notifications.
 		 * */
-		intel_hpd_init(dev);
+		intel_hpd_init(dev_priv);
 		/* Config may have changed between suspend and resume */
 		drm_helper_hpd_irq_event(dev);
 	}
@@ -718,21 +733,15 @@ static int __i915_drm_thaw(struct drm_device *dev, bool restore_gtt_mappings)
 
 	intel_opregion_notify_adapter(dev, PCI_D0);
 
-	return 0;
-}
-
-static int i915_drm_thaw(struct drm_device *dev)
-{
-	if (drm_core_check_feature(dev, DRIVER_MODESET))
-		i915_check_and_clear_faults(dev);
+	drm_kms_helper_poll_enable(dev);
 
-	return __i915_drm_thaw(dev, true);
+	return 0;
 }
 
-static int i915_resume_early(struct drm_device *dev)
+static int i915_drm_resume_early(struct drm_device *dev)
 {
-	if (dev->switch_power_state == DRM_SWITCH_POWER_OFF)
-		return 0;
+	struct drm_i915_private *dev_priv = dev->dev_private;
+	int ret = 0;
 
 	/*
 	 * We have a resume ordering issue with the snd-hda driver also
@@ -748,33 +757,34 @@ static int i915_resume_early(struct drm_device *dev)
 
 	pci_set_master(dev->pdev);
 
-	return i915_drm_thaw_early(dev);
+	if (IS_VALLEYVIEW(dev_priv))
+		ret = vlv_resume_prepare(dev_priv, false);
+	if (ret)
+		DRM_ERROR("Resume prepare failed: %d, continuing resume\n", ret);
+
+	intel_uncore_early_sanitize(dev, true);
+
+	if (IS_HASWELL(dev_priv) || IS_BROADWELL(dev_priv))
+		hsw_disable_pc8(dev_priv);
+
+	intel_uncore_sanitize(dev);
+	intel_power_domains_init_hw(dev_priv);
+
+	return ret;
 }
 
-int i915_resume(struct drm_device *dev)
+int i915_resume_legacy(struct drm_device *dev)
 {
-	struct drm_i915_private *dev_priv = dev->dev_private;
 	int ret;
 
-	/*
-	 * Platforms with opregion should have sane BIOS, older ones (gen3 and
-	 * earlier) need to restore the GTT mappings since the BIOS might clear
-	 * all our scratch PTEs.
-	 */
-	ret = __i915_drm_thaw(dev, !dev_priv->opregion.header);
+	if (dev->switch_power_state == DRM_SWITCH_POWER_OFF)
+		return 0;
+
+	ret = i915_drm_resume_early(dev);
 	if (ret)
 		return ret;
 
-	drm_kms_helper_poll_enable(dev);
-	return 0;
-}
-
-static int i915_resume_legacy(struct drm_device *dev)
-{
-	i915_resume_early(dev);
-	i915_resume(dev);
-
-	return 0;
+	return i915_drm_resume(dev);
 }
 
 /**
@@ -820,6 +830,9 @@ int i915_reset(struct drm_device *dev)
 		}
 	}
 
+	if (i915_stop_ring_allow_warn(dev_priv))
+		pr_notice("drm/i915: Resetting chip after gpu hang\n");
+
 	if (ret) {
 		DRM_ERROR("Failed to reset chip: %i\n", ret);
 		mutex_unlock(&dev->struct_mutex);
@@ -840,10 +853,7 @@ int i915_reset(struct drm_device *dev)
 	 * was running at the time of the reset (i.e. we weren't VT
 	 * switched away).
 	 */
-	if (drm_core_check_feature(dev, DRIVER_MODESET) ||
-			!dev_priv->ums.mm_suspended) {
-		dev_priv->ums.mm_suspended = 0;
-
+	if (drm_core_check_feature(dev, DRIVER_MODESET)) {
 		/* Used to prevent gem_check_wedged returning -EAGAIN during gpu reset */
 		dev_priv->gpu_error.reload_in_reset = true;
 
@@ -923,15 +933,13 @@ static int i915_pm_suspend(struct device *dev)
 	if (drm_dev->switch_power_state == DRM_SWITCH_POWER_OFF)
 		return 0;
 
-	return i915_drm_freeze(drm_dev);
+	return i915_drm_suspend(drm_dev);
 }
 
 static int i915_pm_suspend_late(struct device *dev)
 {
 	struct pci_dev *pdev = to_pci_dev(dev);
 	struct drm_device *drm_dev = pci_get_drvdata(pdev);
-	struct drm_i915_private *dev_priv = drm_dev->dev_private;
-	int ret;
 
 	/*
 	 * We have a suspedn ordering issue with the snd-hda driver also
@@ -945,16 +953,7 @@ static int i915_pm_suspend_late(struct device *dev)
 	if (drm_dev->switch_power_state == DRM_SWITCH_POWER_OFF)
 		return 0;
 
-	ret = intel_suspend_complete(dev_priv);
-
-	if (ret)
-		DRM_ERROR("Suspend complete failed: %d\n", ret);
-	else {
-		pci_disable_device(pdev);
-		pci_set_power_state(pdev, PCI_D3hot);
-	}
-
-	return ret;
+	return i915_drm_suspend_late(drm_dev);
 }
 
 static int i915_pm_resume_early(struct device *dev)
@@ -962,61 +961,21 @@ static int i915_pm_resume_early(struct device *dev)
 	struct pci_dev *pdev = to_pci_dev(dev);
 	struct drm_device *drm_dev = pci_get_drvdata(pdev);
 
-	return i915_resume_early(drm_dev);
-}
-
-static int i915_pm_resume(struct device *dev)
-{
-	struct pci_dev *pdev = to_pci_dev(dev);
-	struct drm_device *drm_dev = pci_get_drvdata(pdev);
-
-	return i915_resume(drm_dev);
-}
-
-static int i915_pm_freeze(struct device *dev)
-{
-	struct pci_dev *pdev = to_pci_dev(dev);
-	struct drm_device *drm_dev = pci_get_drvdata(pdev);
-
-	if (!drm_dev || !drm_dev->dev_private) {
-		dev_err(dev, "DRM not initialized, aborting suspend.\n");
-		return -ENODEV;
-	}
-
-	return i915_drm_freeze(drm_dev);
-}
-
-static int i915_pm_freeze_late(struct device *dev)
-{
-	struct pci_dev *pdev = to_pci_dev(dev);
-	struct drm_device *drm_dev = pci_get_drvdata(pdev);
-	struct drm_i915_private *dev_priv = drm_dev->dev_private;
-
-	return intel_suspend_complete(dev_priv);
-}
-
-static int i915_pm_thaw_early(struct device *dev)
-{
-	struct pci_dev *pdev = to_pci_dev(dev);
-	struct drm_device *drm_dev = pci_get_drvdata(pdev);
+	if (drm_dev->switch_power_state == DRM_SWITCH_POWER_OFF)
+		return 0;
 
-	return i915_drm_thaw_early(drm_dev);
+	return i915_drm_resume_early(drm_dev);
 }
 
-static int i915_pm_thaw(struct device *dev)
+static int i915_pm_resume(struct device *dev)
 {
 	struct pci_dev *pdev = to_pci_dev(dev);
 	struct drm_device *drm_dev = pci_get_drvdata(pdev);
 
-	return i915_drm_thaw(drm_dev);
-}
-
-static int i915_pm_poweroff(struct device *dev)
-{
-	struct pci_dev *pdev = to_pci_dev(dev);
-	struct drm_device *drm_dev = pci_get_drvdata(pdev);
+	if (drm_dev->switch_power_state == DRM_SWITCH_POWER_OFF)
+		return 0;
 
-	return i915_drm_freeze(drm_dev);
+	return i915_drm_resume(drm_dev);
 }
 
 static int hsw_suspend_complete(struct drm_i915_private *dev_priv)
@@ -1026,25 +985,6 @@ static int hsw_suspend_complete(struct drm_i915_private *dev_priv)
 	return 0;
 }
 
-static int snb_resume_prepare(struct drm_i915_private *dev_priv,
-				bool rpm_resume)
-{
-	struct drm_device *dev = dev_priv->dev;
-
-	if (rpm_resume)
-		intel_init_pch_refclk(dev);
-
-	return 0;
-}
-
-static int hsw_resume_prepare(struct drm_i915_private *dev_priv,
-				bool rpm_resume)
-{
-	hsw_disable_pc8(dev_priv);
-
-	return 0;
-}
-
 /*
  * Save all Gunit registers that may be lost after a D3 and a subsequent
  * S0i[R123] transition. The list of registers needing a save/restore is
@@ -1449,18 +1389,13 @@ static int intel_runtime_suspend(struct device *device)
 	i915_gem_release_all_mmaps(dev_priv);
 	mutex_unlock(&dev->struct_mutex);
 
-	/*
-	 * rps.work can't be rearmed here, since we get here only after making
-	 * sure the GPU is idle and the RPS freq is set to the minimum. See
-	 * intel_mark_idle().
-	 */
-	cancel_work_sync(&dev_priv->rps.work);
-	intel_runtime_pm_disable_interrupts(dev);
+	intel_suspend_gt_powersave(dev);
+	intel_runtime_pm_disable_interrupts(dev_priv);
 
 	ret = intel_suspend_complete(dev_priv);
 	if (ret) {
 		DRM_ERROR("Runtime suspend failed, disabling it (%d)\n", ret);
-		intel_runtime_pm_restore_interrupts(dev);
+		intel_runtime_pm_enable_interrupts(dev_priv);
 
 		return ret;
 	}
@@ -1502,7 +1437,7 @@ static int intel_runtime_resume(struct device *device)
 	struct pci_dev *pdev = to_pci_dev(device);
 	struct drm_device *dev = pci_get_drvdata(pdev);
 	struct drm_i915_private *dev_priv = dev->dev_private;
-	int ret;
+	int ret = 0;
 
 	if (WARN_ON_ONCE(!HAS_RUNTIME_PM(dev)))
 		return -ENODEV;
@@ -1512,7 +1447,13 @@ static int intel_runtime_resume(struct device *device)
 	intel_opregion_notify_adapter(dev, PCI_D0);
 	dev_priv->pm.suspended = false;
 
-	ret = intel_resume_prepare(dev_priv, true);
+	if (IS_GEN6(dev_priv))
+		intel_init_pch_refclk(dev);
+	else if (IS_HASWELL(dev_priv) || IS_BROADWELL(dev_priv))
+		hsw_disable_pc8(dev_priv);
+	else if (IS_VALLEYVIEW(dev_priv))
+		ret = vlv_resume_prepare(dev_priv, true);
+
 	/*
 	 * No point of rolling back things in case of an error, as the best
 	 * we can do is to hope that things will still work (and disable RPM).
@@ -1520,8 +1461,8 @@ static int intel_runtime_resume(struct device *device)
 	i915_gem_init_swizzling(dev);
 	gen6_update_ring_freq(dev);
 
-	intel_runtime_pm_restore_interrupts(dev);
-	intel_reset_gt_powersave(dev);
+	intel_runtime_pm_enable_interrupts(dev_priv);
+	intel_enable_gt_powersave(dev);
 
 	if (ret)
 		DRM_ERROR("Runtime resume failed, disabling it (%d)\n", ret);
@@ -1550,41 +1491,41 @@ static int intel_suspend_complete(struct drm_i915_private *dev_priv)
 	return ret;
 }
 
-/*
- * This function implements common functionality of runtime and system
- * resume sequence. Variable rpm_resume used for implementing different
- * code paths.
- */
-static int intel_resume_prepare(struct drm_i915_private *dev_priv,
-				bool rpm_resume)
-{
-	struct drm_device *dev = dev_priv->dev;
-	int ret;
-
-	if (IS_GEN6(dev))
-		ret = snb_resume_prepare(dev_priv, rpm_resume);
-	else if (IS_HASWELL(dev) || IS_BROADWELL(dev))
-		ret = hsw_resume_prepare(dev_priv, rpm_resume);
-	else if (IS_VALLEYVIEW(dev))
-		ret = vlv_resume_prepare(dev_priv, rpm_resume);
-	else
-		ret = 0;
-
-	return ret;
-}
-
 static const struct dev_pm_ops i915_pm_ops = {
+	/*
+	 * S0ix (via system suspend) and S3 event handlers [PMSG_SUSPEND,
+	 * PMSG_RESUME]
+	 */
 	.suspend = i915_pm_suspend,
 	.suspend_late = i915_pm_suspend_late,
 	.resume_early = i915_pm_resume_early,
 	.resume = i915_pm_resume,
-	.freeze = i915_pm_freeze,
-	.freeze_late = i915_pm_freeze_late,
-	.thaw_early = i915_pm_thaw_early,
-	.thaw = i915_pm_thaw,
-	.poweroff = i915_pm_poweroff,
+
+	/*
+	 * S4 event handlers
+	 * @freeze, @freeze_late    : called (1) before creating the
+	 *                            hibernation image [PMSG_FREEZE] and
+	 *                            (2) after rebooting, before restoring
+	 *                            the image [PMSG_QUIESCE]
+	 * @thaw, @thaw_early       : called (1) after creating the hibernation
+	 *                            image, before writing it [PMSG_THAW]
+	 *                            and (2) after failing to create or
+	 *                            restore the image [PMSG_RECOVER]
+	 * @poweroff, @poweroff_late: called after writing the hibernation
+	 *                            image, before rebooting [PMSG_HIBERNATE]
+	 * @restore, @restore_early : called after rebooting and restoring the
+	 *                            hibernation image [PMSG_RESTORE]
+	 */
+	.freeze = i915_pm_suspend,
+	.freeze_late = i915_pm_suspend_late,
+	.thaw_early = i915_pm_resume_early,
+	.thaw = i915_pm_resume,
+	.poweroff = i915_pm_suspend,
+	.poweroff_late = i915_pm_suspend_late,
 	.restore_early = i915_pm_resume_early,
 	.restore = i915_pm_resume,
+
+	/* S0ix (via runtime suspend) event handlers */
 	.runtime_suspend = intel_runtime_suspend,
 	.runtime_resume = intel_runtime_resume,
 };
@@ -1626,12 +1567,10 @@ static struct drm_driver driver = {
 	.set_busid = drm_pci_set_busid,
 
 	/* Used in place of i915_pm_ops for non-DRIVER_MODESET */
-	.suspend = i915_suspend,
+	.suspend = i915_suspend_legacy,
 	.resume = i915_resume_legacy,
 
 	.device_is_agp = i915_driver_device_is_agp,
-	.master_create = i915_master_create,
-	.master_destroy = i915_master_destroy,
 #if defined(CONFIG_DEBUG_FS)
 	.debugfs_init = i915_debugfs_init,
 	.debugfs_cleanup = i915_debugfs_cleanup,
@@ -1645,7 +1584,7 @@ static struct drm_driver driver = {
 	.gem_prime_import = i915_gem_prime_import,
 
 	.dumb_create = i915_gem_dumb_create,
-	.dumb_map_offset = i915_gem_mmap_gtt,
+	.dumb_map_offset = i915_gem_dumb_map_offset,
 	.dumb_destroy = drm_gem_dumb_destroy,
 	.ioctls = i915_ioctls,
 	.fops = &i915_driver_fops,
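The i915_pm_ops hunk above routes the freeze and poweroff callbacks to the suspend handlers, and the thaw and restore callbacks to the resume handlers. The sketch below shows that table-sharing idea in isolation. The demo_* names and the PH_* enum are hypothetical stand-ins, not kernel definitions, and the early/late callback variants are left out.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the system PM phases. */
enum pm_phase {
	PH_SUSPEND, PH_FREEZE, PH_POWEROFF,
	PH_RESUME, PH_THAW, PH_RESTORE,
	PH_COUNT
};

typedef int (*pm_handler)(void);

/* Call counters so the sharing is observable. */
static int suspend_calls, resume_calls;

static int demo_suspend(void) { suspend_calls++; return 0; }
static int demo_resume(void)  { resume_calls++;  return 0; }

/* One suspend path and one resume path serve all six phases. */
static const pm_handler demo_pm_ops[PH_COUNT] = {
	[PH_SUSPEND]  = demo_suspend,
	[PH_FREEZE]   = demo_suspend,
	[PH_POWEROFF] = demo_suspend,
	[PH_RESUME]   = demo_resume,
	[PH_THAW]     = demo_resume,
	[PH_RESTORE]  = demo_resume,
};

static int demo_pm_dispatch(enum pm_phase phase)
{
	assert((int)phase >= 0 && phase < PH_COUNT);
	return demo_pm_ops[phase]();
}
```

With a single pair of handlers there is one code path to keep correct for S3, S4 and S0ix, which matches the direction of the removed i915_pm_freeze/i915_pm_thaw/i915_pm_poweroff wrappers.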
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 16a6f6d187a1..63bcda5541ec 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -55,7 +55,10 @@
 
 #define DRIVER_NAME		"i915"
 #define DRIVER_DESC		"Intel Graphics"
-#define DRIVER_DATE		"20140905"
+#define DRIVER_DATE		"20141121"
+
+#undef WARN_ON
+#define WARN_ON(x)		WARN(x, "WARN_ON(" #x ")")
 
 enum pipe {
 	INVALID_PIPE = -1,
@@ -76,6 +79,14 @@ enum transcoder {
 };
 #define transcoder_name(t) ((t) + 'A')
 
+/*
+ * This is the maximum (across all platforms) number of planes (primary +
+ * sprites) that can be active at the same time on one pipe.
+ *
+ * This value doesn't count the cursor plane.
+ */
+#define I915_MAX_PLANES	3
+
 enum plane {
 	PLANE_A = 0,
 	PLANE_B,
@@ -202,10 +213,15 @@ enum intel_dpll_id {
 	/* real shared dpll ids must be >= 0 */
 	DPLL_ID_PCH_PLL_A = 0,
 	DPLL_ID_PCH_PLL_B = 1,
+	/* hsw/bdw */
 	DPLL_ID_WRPLL1 = 0,
 	DPLL_ID_WRPLL2 = 1,
+	/* skl */
+	DPLL_ID_SKL_DPLL1 = 0,
+	DPLL_ID_SKL_DPLL2 = 1,
+	DPLL_ID_SKL_DPLL3 = 2,
 };
-#define I915_NUM_PLLS 2
+#define I915_NUM_PLLS 3
 
 struct intel_dpll_hw_state {
 	/* i9xx, pch plls */
@@ -216,16 +232,33 @@ struct intel_dpll_hw_state {
 
 	/* hsw, bdw */
 	uint32_t wrpll;
+
+	/* skl */
+	/*
+	 * DPLL_CTRL1 has 6 bits for each DPLL. We store those in the
+	 * lower part of ctrl1 and they get shifted into position when writing
+	 * the register.  This allows us to easily compare the state to share
+	 * the DPLL.
+	 */
+	uint32_t ctrl1;
+	/* HDMI only, 0 when used for DP */
+	uint32_t cfgcr1, cfgcr2;
+};
+
+struct intel_shared_dpll_config {
+	unsigned crtc_mask; /* mask of CRTCs sharing this PLL */
+	struct intel_dpll_hw_state hw_state;
 };
 
 struct intel_shared_dpll {
-	int refcount; /* count of number of CRTCs sharing this PLL */
+	struct intel_shared_dpll_config config;
+	struct intel_shared_dpll_config *new_config;
+
 	int active; /* count of number of active CRTCs (i.e. DPMS on) */
 	bool on; /* is the PLL actually active? Disabled during modeset */
 	const char *name;
 	/* should match the index in the dev_priv->shared_dplls array */
 	enum intel_dpll_id id;
-	struct intel_dpll_hw_state hw_state;
 	/* The mode_set hook is optional and should be used together with the
 	 * intel_prepare_shared_dpll function. */
 	void (*mode_set)(struct drm_i915_private *dev_priv,
@@ -239,6 +272,11 @@ struct intel_shared_dpll {
 			     struct intel_dpll_hw_state *hw_state);
 };
 
+#define SKL_DPLL0 0
+#define SKL_DPLL1 1
+#define SKL_DPLL2 2
+#define SKL_DPLL3 3
+
 /* Used by dp and fdi links */
 struct intel_link_m_n {
 	uint32_t	tu;
@@ -267,7 +305,6 @@ void intel_link_compute_m_n(int bpp, int nlanes,
 #define DRIVER_PATCHLEVEL	0
 
 #define WATCH_LISTS	0
-#define WATCH_GTT	0
 
 struct opregion_header;
 struct opregion_acpi;
@@ -290,12 +327,6 @@ struct intel_opregion {
 struct intel_overlay;
 struct intel_overlay_error_state;
 
-struct drm_local_map;
-
-struct drm_i915_master_private {
-	struct drm_local_map *sarea;
-	struct _drm_i915_sarea *sarea_priv;
-};
 #define I915_FENCE_REG_NONE -1
 #define I915_MAX_NUM_FENCES 32
 /* 32 fences + sign bit for FENCE_REG_NONE */
@@ -426,6 +457,7 @@ struct drm_i915_error_state {
 };
 
 struct intel_connector;
+struct intel_encoder;
 struct intel_crtc_config;
 struct intel_plane_config;
 struct intel_crtc;
@@ -452,7 +484,7 @@ struct drm_i915_display_funcs {
 	 * Returns true on success, false on failure.
 	 */
 	bool (*find_dpll)(const struct intel_limit *limit,
-			  struct drm_crtc *crtc,
+			  struct intel_crtc *crtc,
 			  int target, int refclk,
 			  struct dpll *match_clock,
 			  struct dpll *best_clock);
@@ -468,15 +500,14 @@ struct drm_i915_display_funcs {
 				struct intel_crtc_config *);
 	void (*get_plane_config)(struct intel_crtc *,
 				 struct intel_plane_config *);
-	int (*crtc_mode_set)(struct drm_crtc *crtc,
-			     int x, int y,
-			     struct drm_framebuffer *old_fb);
+	int (*crtc_compute_clock)(struct intel_crtc *crtc);
 	void (*crtc_enable)(struct drm_crtc *crtc);
 	void (*crtc_disable)(struct drm_crtc *crtc);
 	void (*off)(struct drm_crtc *crtc);
-	void (*write_eld)(struct drm_connector *connector,
-			  struct drm_crtc *crtc,
-			  struct drm_display_mode *mode);
+	void (*audio_codec_enable)(struct drm_connector *connector,
+				   struct intel_encoder *encoder,
+				   struct drm_display_mode *mode);
+	void (*audio_codec_disable)(struct intel_encoder *encoder);
 	void (*fdi_link_train)(struct drm_crtc *crtc);
 	void (*init_clock_gating)(struct drm_device *dev);
 	int (*queue_flip)(struct drm_device *dev, struct drm_crtc *crtc,
@@ -494,7 +525,7 @@ struct drm_i915_display_funcs {
 	/* display clock increase/decrease */
 	/* pll clock increase/decrease */
 
-	int (*setup_backlight)(struct intel_connector *connector);
+	int (*setup_backlight)(struct intel_connector *connector, enum pipe pipe);
 	uint32_t (*get_backlight)(struct intel_connector *connector);
 	void (*set_backlight)(struct intel_connector *connector,
 			      uint32_t level);
@@ -533,6 +564,7 @@ struct intel_uncore {
 
 	unsigned fw_rendercount;
 	unsigned fw_mediacount;
+	unsigned fw_blittercount;
 
 	struct timer_list force_wake_timer;
 };
@@ -551,6 +583,7 @@ struct intel_uncore {
 	func(is_ivybridge) sep \
 	func(is_valleyview) sep \
 	func(is_haswell) sep \
+	func(is_skylake) sep \
 	func(is_preliminary) sep \
 	func(has_fbc) sep \
 	func(has_pipe_cxsr) sep \
@@ -646,6 +679,7 @@ struct intel_context {
 	struct {
 		struct drm_i915_gem_object *state;
 		struct intel_ringbuffer *ringbuf;
+		int unpin_count;
 	} engine[I915_NUM_RINGS];
 
 	struct list_head link;
@@ -663,6 +697,18 @@ struct i915_fbc {
 
 	bool false_color;
 
+	/* Tracks whether the HW is actually enabled, not whether the feature is
+	 * possible. */
+	bool enabled;
+
+	/* On gen8 some rings cannot perform fbc clean operation so for now
+	 * we are doing this on SW with mmio.
+	 * This variable works in the opposite information direction
+	 * of ring->fbc_dirty telling software on frontbuffer tracking
+	 * to perform the cache clean on sw side.
+	 */
+	bool need_sw_cache_clean;
+
 	struct intel_fbc_work {
 		struct delayed_work work;
 		struct drm_crtc *crtc;
@@ -704,6 +750,7 @@ enum intel_pch {
 	PCH_IBX,	/* Ibexpeak PCH */
 	PCH_CPT,	/* Cougarpoint PCH */
 	PCH_LPT,	/* Lynxpoint PCH */
+	PCH_SPT,        /* Sunrisepoint PCH */
 	PCH_NOP,
 };
 
@@ -717,6 +764,7 @@ enum intel_sbi_destination {
 #define QUIRK_INVERT_BRIGHTNESS (1<<2)
 #define QUIRK_BACKLIGHT_PRESENT (1<<3)
 #define QUIRK_PIPEB_FORCE (1<<4)
+#define QUIRK_PIN_SWIZZLED_PAGES (1<<5)
 
 struct intel_fbdev;
 struct intel_fbc_work;
@@ -768,7 +816,6 @@ struct i915_suspend_saved_registers {
 	u32 saveBLC_HIST_CTL;
 	u32 saveBLC_PWM_CTL;
 	u32 saveBLC_PWM_CTL2;
-	u32 saveBLC_HIST_CTL_B;
 	u32 saveBLC_CPU_PWM_CTL;
 	u32 saveBLC_CPU_PWM_CTL2;
 	u32 saveFPB0;
@@ -877,6 +924,7 @@ struct i915_suspend_saved_registers {
 	u32 savePIPEB_LINK_N1;
 	u32 saveMCHBAR_RENDER_STANDBY;
 	u32 savePCH_PORT_HOTPLUG;
+	u16 saveGCDGMBUS;
 };
 
 struct vlv_s0ix_state {
@@ -947,8 +995,12 @@ struct intel_rps_ei {
 };
 
 struct intel_gen6_power_mgmt {
-	/* work and pm_iir are protected by dev_priv->irq_lock */
+	/*
+	 * work, interrupts_enabled and pm_iir are protected by
+	 * dev_priv->irq_lock
+	 */
 	struct work_struct work;
+	bool interrupts_enabled;
 	u32 pm_iir;
 
 	/* Frequencies are stored in potentially platform dependent multiples.
@@ -1071,31 +1123,6 @@ struct i915_power_domains {
 	struct i915_power_well *power_wells;
 };
 
-struct i915_dri1_state {
-	unsigned allow_batchbuffer : 1;
-	u32 __iomem *gfx_hws_cpu_addr;
-
-	unsigned int cpp;
-	int back_offset;
-	int front_offset;
-	int current_page;
-	int page_flipping;
-
-	uint32_t counter;
-};
-
-struct i915_ums_state {
-	/**
-	 * Flag if the X Server, and thus DRM, is not currently in
-	 * control of the device.
-	 *
-	 * This is set between LeaveVT and EnterVT.  It needs to be
-	 * replaced with a semaphore.  It also needs to be
-	 * transitioned away from for kernel modesetting.
-	 */
-	int mm_suspended;
-};
-
 #define MAX_L3_SLICES 2
 struct intel_l3_parity {
 	u32 *remap_info[MAX_L3_SLICES];
@@ -1357,6 +1384,49 @@ struct ilk_wm_values {
 	enum intel_ddb_partitioning partitioning;
 };
 
+struct skl_ddb_entry {
+	uint16_t start, end;	/* in number of blocks, 'end' is exclusive */
+};
+
+static inline uint16_t skl_ddb_entry_size(const struct skl_ddb_entry *entry)
+{
+	return entry->end - entry->start;
+}
+
+static inline bool skl_ddb_entry_equal(const struct skl_ddb_entry *e1,
+				       const struct skl_ddb_entry *e2)
+{
+	if (e1->start == e2->start && e1->end == e2->end)
+		return true;
+
+	return false;
+}
+
+struct skl_ddb_allocation {
+	struct skl_ddb_entry pipe[I915_MAX_PIPES];
+	struct skl_ddb_entry plane[I915_MAX_PIPES][I915_MAX_PLANES];
+	struct skl_ddb_entry cursor[I915_MAX_PIPES];
+};
+
+struct skl_wm_values {
+	bool dirty[I915_MAX_PIPES];
+	struct skl_ddb_allocation ddb;
+	uint32_t wm_linetime[I915_MAX_PIPES];
+	uint32_t plane[I915_MAX_PIPES][I915_MAX_PLANES][8];
+	uint32_t cursor[I915_MAX_PIPES][8];
+	uint32_t plane_trans[I915_MAX_PIPES][I915_MAX_PLANES];
+	uint32_t cursor_trans[I915_MAX_PIPES];
+};
+
+struct skl_wm_level {
+	bool plane_en[I915_MAX_PLANES];
+	bool cursor_en;
+	uint16_t plane_res_b[I915_MAX_PLANES];
+	uint8_t plane_res_l[I915_MAX_PLANES];
+	uint16_t cursor_res_b;
+	uint8_t cursor_res_l;
+};
+
 /*
  * This struct helps tracking the state needed for runtime PM, which puts the
  * device in PCI D3 state. Notice that when this happens, nothing on the
@@ -1369,7 +1439,7 @@ struct ilk_wm_values {
  *
  * Our driver uses the autosuspend delay feature, which means we'll only really
  * suspend if we stay with zero refcount for a certain amount of time. The
- * default value is currently very conservative (see intel_init_runtime_pm), but
+ * default value is currently very conservative (see intel_runtime_pm_enable), but
  * it can be changed with the standard runtime PM files from sysfs.
  *
  * The irqs_disabled variable becomes true exactly after we disable the IRQs and
@@ -1382,7 +1452,7 @@ struct ilk_wm_values {
  */
 struct i915_runtime_pm {
 	bool suspended;
-	bool _irqs_disabled;
+	bool irqs_enabled;
 };
 
 enum intel_pipe_crc_source {
@@ -1426,6 +1496,20 @@ struct i915_frontbuffer_tracking {
 	unsigned flip_bits;
 };
 
+struct i915_wa_reg {
+	u32 addr;
+	u32 value;
+	/* bitmask representing WA bits */
+	u32 mask;
+};
+
+#define I915_MAX_WA_REGS 16
+
+struct i915_workarounds {
+	struct i915_wa_reg reg[I915_MAX_WA_REGS];
+	u32 count;
+};
+
 struct drm_i915_private {
 	struct drm_device *dev;
 	struct kmem_cache *slab;
@@ -1505,11 +1589,13 @@ struct drm_i915_private {
 	struct intel_opregion opregion;
 	struct intel_vbt_data vbt;
 
+	bool preserve_bios_swizzle;
+
 	/* overlay */
 	struct intel_overlay *overlay;
 
 	/* backlight registers and fields in struct intel_panel */
-	spinlock_t backlight_lock;
+	struct mutex backlight_lock;
 
 	/* LVDS info */
 	bool no_aux_handshake;
@@ -1523,6 +1609,7 @@ struct drm_i915_private {
 
 	unsigned int fsb_freq, mem_freq, is_ddr3;
 	unsigned int vlv_cdclk_freq;
+	unsigned int hpll_freq;
 
 	/**
 	 * wq - Driver workqueue for GEM.
@@ -1568,19 +1655,7 @@ struct drm_i915_private {
 	struct intel_shared_dpll shared_dplls[I915_NUM_PLLS];
 	int dpio_phy_iosf_port[I915_NUM_PHYS_VLV];
 
-	/*
-	 * workarounds are currently applied at different places and
-	 * changes are being done to consolidate them so exact count is
-	 * not clear at this point, use a max value for now.
-	 */
-#define I915_MAX_WA_REGS  16
-	struct {
-		u32 addr;
-		u32 value;
-		/* bitmask representing WA bits */
-		u32 mask;
-	} intel_wa_regs[I915_MAX_WA_REGS];
-	u32 num_wa_regs;
+	struct i915_workarounds workarounds;
 
 	/* Reclocking support */
 	bool render_reclock_avail;
@@ -1644,9 +1719,25 @@ struct drm_i915_private {
 		uint16_t spr_latency[5];
 		/* cursor */
 		uint16_t cur_latency[5];
+		/*
+		 * Raw watermark memory latency values
+		 * for SKL for all 8 levels
+		 * in 1us units.
+		 */
+		uint16_t skl_latency[8];
+
+		/*
+		 * The skl_wm_values structure is a bit too big for stack
+		 * allocation, so we keep the staging struct where we store
+		 * intermediate results here instead.
+		 */
+		struct skl_wm_values skl_results;
 
 		/* current hardware state */
-		struct ilk_wm_values hw;
+		union {
+			struct ilk_wm_values hw;
+			struct skl_wm_values skl_hw;
+		};
 	} wm;
 
 	struct i915_runtime_pm pm;
@@ -1667,12 +1758,6 @@ struct drm_i915_private {
 
 	uint32_t bios_vgacntr;
 
-	/* Old dri1 support infrastructure, beware the dragons ya fools entering
-	 * here! */
-	struct i915_dri1_state dri1;
-	/* Old ums support infrastructure, same warning applies. */
-	struct i915_ums_state ums;
-
 	/* Abstract the submission mechanism (legacy ringbuffer or execlists) away */
 	struct {
 		int (*do_execbuf)(struct drm_device *dev, struct drm_file *file,
@@ -1830,8 +1915,6 @@ struct drm_i915_gem_object {
 	unsigned long gt_ro:1;
 	unsigned int cache_level:3;
 
-	unsigned int has_aliasing_ppgtt_mapping:1;
-	unsigned int has_global_gtt_mapping:1;
 	unsigned int has_dma_mapping:1;
 
 	unsigned int frontbuffer_bits:INTEL_FRONTBUFFER_BITS;
@@ -1864,10 +1947,10 @@ struct drm_i915_gem_object {
 	unsigned long user_pin_count;
 	struct drm_file *pin_filp;
 
-	/** for phy allocated objects */
-	struct drm_dma_handle *phys_handle;
-
 	union {
+		/** for phy allocated objects */
+		struct drm_dma_handle *phys_handle;
+
 		struct i915_gem_userptr {
 			uintptr_t ptr;
 			unsigned read_only :1;
@@ -2073,6 +2156,7 @@ struct drm_i915_cmd_table {
 #define IS_CHERRYVIEW(dev)	(INTEL_INFO(dev)->is_valleyview && IS_GEN8(dev))
 #define IS_HASWELL(dev)	(INTEL_INFO(dev)->is_haswell)
 #define IS_BROADWELL(dev)	(!INTEL_INFO(dev)->is_valleyview && IS_GEN8(dev))
+#define IS_SKYLAKE(dev)	(INTEL_INFO(dev)->is_skylake)
 #define IS_MOBILE(dev)		(INTEL_INFO(dev)->is_mobile)
 #define IS_HSW_EARLY_SDV(dev)	(IS_HASWELL(dev) && \
 				 (INTEL_DEVID(dev) & 0xFF00) == 0x0C00)
@@ -2080,9 +2164,10 @@ struct drm_i915_cmd_table {
 				 ((INTEL_DEVID(dev) & 0xf) == 0x2  || \
 				 (INTEL_DEVID(dev) & 0xf) == 0x6 || \
 				 (INTEL_DEVID(dev) & 0xf) == 0xe))
+#define IS_BDW_GT3(dev)		(IS_BROADWELL(dev) && \
+				 (INTEL_DEVID(dev) & 0x00F0) == 0x0020)
 #define IS_HSW_ULT(dev)		(IS_HASWELL(dev) && \
 				 (INTEL_DEVID(dev) & 0xFF00) == 0x0A00)
-#define IS_ULT(dev)		(IS_HSW_ULT(dev) || IS_BDW_ULT(dev))
 #define IS_HSW_GT3(dev)		(IS_HASWELL(dev) && \
 				 (INTEL_DEVID(dev) & 0x00F0) == 0x0020)
 /* ULX machines are also considered ULT. */
@@ -2103,6 +2188,7 @@ struct drm_i915_cmd_table {
 #define IS_GEN6(dev)	(INTEL_INFO(dev)->gen == 6)
 #define IS_GEN7(dev)	(INTEL_INFO(dev)->gen == 7)
 #define IS_GEN8(dev)	(INTEL_INFO(dev)->gen == 8)
+#define IS_GEN9(dev)	(INTEL_INFO(dev)->gen == 9)
 
 #define RENDER_RING		(1<<RCS)
 #define BSD_RING		(1<<VCS)
@@ -2115,13 +2201,11 @@ struct drm_i915_cmd_table {
 #define HAS_VEBOX(dev)		(INTEL_INFO(dev)->ring_mask & VEBOX_RING)
 #define HAS_LLC(dev)		(INTEL_INFO(dev)->has_llc)
 #define HAS_WT(dev)		((IS_HASWELL(dev) || IS_BROADWELL(dev)) && \
-				 to_i915(dev)->ellc_size)
+				 __I915__(dev)->ellc_size)
 #define I915_NEED_GFX_HWS(dev)	(INTEL_INFO(dev)->need_gfx_hws)
 
 #define HAS_HW_CONTEXTS(dev)	(INTEL_INFO(dev)->gen >= 6)
 #define HAS_LOGICAL_RING_CONTEXTS(dev)	(INTEL_INFO(dev)->gen >= 8)
-#define HAS_ALIASING_PPGTT(dev)	(INTEL_INFO(dev)->gen >= 6)
-#define HAS_PPGTT(dev)		(INTEL_INFO(dev)->gen >= 7 && !IS_GEN8(dev))
 #define USES_PPGTT(dev)		(i915.enable_ppgtt)
 #define USES_FULL_PPGTT(dev)	(i915.enable_ppgtt == 2)
 
@@ -2154,13 +2238,15 @@ struct drm_i915_cmd_table {
 #define HAS_PIPE_CXSR(dev) (INTEL_INFO(dev)->has_pipe_cxsr)
 #define HAS_FBC(dev) (INTEL_INFO(dev)->has_fbc)
 
-#define HAS_IPS(dev)		(IS_ULT(dev) || IS_BROADWELL(dev))
+#define HAS_IPS(dev)		(IS_HSW_ULT(dev) || IS_BROADWELL(dev))
 
 #define HAS_DDI(dev)		(INTEL_INFO(dev)->has_ddi)
 #define HAS_FPGA_DBG_UNCLAIMED(dev)	(INTEL_INFO(dev)->has_fpga_dbg)
 #define HAS_PSR(dev)		(IS_HASWELL(dev) || IS_BROADWELL(dev))
 #define HAS_RUNTIME_PM(dev)	(IS_GEN6(dev) || IS_HASWELL(dev) || \
 				 IS_BROADWELL(dev) || IS_VALLEYVIEW(dev))
+#define HAS_RC6(dev)		(INTEL_INFO(dev)->gen >= 6)
+#define HAS_RC6p(dev)		(INTEL_INFO(dev)->gen == 6 || IS_IVYBRIDGE(dev))
 
 #define INTEL_PCH_DEVICE_ID_MASK		0xff00
 #define INTEL_PCH_IBX_DEVICE_ID_TYPE		0x3b00
@@ -2168,8 +2254,11 @@ struct drm_i915_cmd_table {
 #define INTEL_PCH_PPT_DEVICE_ID_TYPE		0x1e00
 #define INTEL_PCH_LPT_DEVICE_ID_TYPE		0x8c00
 #define INTEL_PCH_LPT_LP_DEVICE_ID_TYPE		0x9c00
+#define INTEL_PCH_SPT_DEVICE_ID_TYPE		0xA100
+#define INTEL_PCH_SPT_LP_DEVICE_ID_TYPE		0x9D00
 
-#define INTEL_PCH_TYPE(dev) (to_i915(dev)->pch_type)
+#define INTEL_PCH_TYPE(dev) (__I915__(dev)->pch_type)
+#define HAS_PCH_SPT(dev) (INTEL_PCH_TYPE(dev) == PCH_SPT)
 #define HAS_PCH_LPT(dev) (INTEL_PCH_TYPE(dev) == PCH_LPT)
 #define HAS_PCH_CPT(dev) (INTEL_PCH_TYPE(dev) == PCH_CPT)
 #define HAS_PCH_IBX(dev) (INTEL_PCH_TYPE(dev) == PCH_IBX)
@@ -2189,8 +2278,8 @@ struct drm_i915_cmd_table {
 extern const struct drm_ioctl_desc i915_ioctls[];
 extern int i915_max_ioctl;
 
-extern int i915_suspend(struct drm_device *dev, pm_message_t state);
-extern int i915_resume(struct drm_device *dev);
+extern int i915_suspend_legacy(struct drm_device *dev, pm_message_t state);
+extern int i915_resume_legacy(struct drm_device *dev);
 extern int i915_master_create(struct drm_device *dev, struct drm_master *master);
 extern void i915_master_destroy(struct drm_device *dev, struct drm_master *master);
 
@@ -2227,8 +2316,6 @@ struct i915_params {
 extern struct i915_params i915 __read_mostly;
 
 				/* i915_dma.c */
-void i915_update_dri1_breadcrumb(struct drm_device *dev);
-extern void i915_kernel_lost_context(struct drm_device * dev);
 extern int i915_driver_load(struct drm_device *, unsigned long flags);
 extern int i915_driver_unload(struct drm_device *);
 extern int i915_driver_open(struct drm_device *dev, struct drm_file *file);
@@ -2242,9 +2329,6 @@ extern int i915_driver_device_is_agp(struct drm_device * dev);
 extern long i915_compat_ioctl(struct file *filp, unsigned int cmd,
 			      unsigned long arg);
 #endif
-extern int i915_emit_box(struct drm_device *dev,
-			 struct drm_clip_rect *box,
-			 int DR1, int DR4);
 extern int intel_gpu_reset(struct drm_device *dev);
 extern int i915_reset(struct drm_device *dev);
 extern unsigned long i915_chipset_val(struct drm_i915_private *dev_priv);
@@ -2260,10 +2344,10 @@ __printf(3, 4)
 void i915_handle_error(struct drm_device *dev, bool wedged,
 		       const char *fmt, ...);
 
-void gen6_set_pm_mask(struct drm_i915_private *dev_priv, u32 pm_iir,
-							int new_delay);
-extern void intel_irq_init(struct drm_device *dev);
-extern void intel_hpd_init(struct drm_device *dev);
+extern void intel_irq_init(struct drm_i915_private *dev_priv);
+extern void intel_hpd_init(struct drm_i915_private *dev_priv);
+int intel_irq_install(struct drm_i915_private *dev_priv);
+void intel_irq_uninstall(struct drm_i915_private *dev_priv);
 
 extern void intel_uncore_sanitize(struct drm_device *dev);
 extern void intel_uncore_early_sanitize(struct drm_device *dev,
@@ -2283,10 +2367,19 @@ i915_disable_pipestat(struct drm_i915_private *dev_priv, enum pipe pipe,
 
 void valleyview_enable_display_irqs(struct drm_i915_private *dev_priv);
 void valleyview_disable_display_irqs(struct drm_i915_private *dev_priv);
+void
+ironlake_enable_display_irq(struct drm_i915_private *dev_priv, u32 mask);
+void
+ironlake_disable_display_irq(struct drm_i915_private *dev_priv, u32 mask);
+void ibx_display_interrupt_update(struct drm_i915_private *dev_priv,
+				  uint32_t interrupt_mask,
+				  uint32_t enabled_irq_mask);
+#define ibx_enable_display_interrupt(dev_priv, bits) \
+	ibx_display_interrupt_update((dev_priv), (bits), (bits))
+#define ibx_disable_display_interrupt(dev_priv, bits) \
+	ibx_display_interrupt_update((dev_priv), (bits), 0)
 
 /* i915_gem.c */
-int i915_gem_init_ioctl(struct drm_device *dev, void *data,
-			struct drm_file *file_priv);
 int i915_gem_create_ioctl(struct drm_device *dev, void *data,
 			  struct drm_file *file_priv);
 int i915_gem_pread_ioctl(struct drm_device *dev, void *data,
@@ -2333,10 +2426,6 @@ int i915_gem_throttle_ioctl(struct drm_device *dev, void *data,
 			    struct drm_file *file_priv);
 int i915_gem_madvise_ioctl(struct drm_device *dev, void *data,
 			   struct drm_file *file_priv);
-int i915_gem_entervt_ioctl(struct drm_device *dev, void *data,
-			   struct drm_file *file_priv);
-int i915_gem_leavevt_ioctl(struct drm_device *dev, void *data,
-			   struct drm_file *file_priv);
 int i915_gem_set_tiling(struct drm_device *dev, void *data,
 			struct drm_file *file_priv);
 int i915_gem_get_tiling(struct drm_device *dev, void *data,
@@ -2379,7 +2468,6 @@ int __must_check i915_vma_unbind(struct i915_vma *vma);
 int i915_gem_object_put_pages(struct drm_i915_gem_object *obj);
 void i915_gem_release_all_mmaps(struct drm_i915_private *dev_priv);
 void i915_gem_release_mmap(struct drm_i915_gem_object *obj);
-void i915_gem_lastclose(struct drm_device *dev);
 
 int i915_gem_obj_prepare_shmem_read(struct drm_i915_gem_object *obj,
 				    int *needs_clflush);
@@ -2413,8 +2501,9 @@ void i915_vma_move_to_active(struct i915_vma *vma,
 int i915_gem_dumb_create(struct drm_file *file_priv,
 			 struct drm_device *dev,
 			 struct drm_mode_create_dumb *args);
-int i915_gem_mmap_gtt(struct drm_file *file_priv, struct drm_device *dev,
-		      uint32_t handle, uint64_t *offset);
+int i915_gem_dumb_map_offset(struct drm_file *file_priv,
+			     struct drm_device *dev, uint32_t handle,
+			     uint64_t *offset);
 /**
  * Returns true if seq1 is later than seq2.
  */
@@ -2486,6 +2575,11 @@ int __i915_add_request(struct intel_engine_cs *ring,
 		       u32 *seqno);
 #define i915_add_request(ring, seqno) \
 	__i915_add_request(ring, NULL, NULL, seqno)
+int __i915_wait_seqno(struct intel_engine_cs *ring, u32 seqno,
+			unsigned reset_counter,
+			bool interruptible,
+			s64 *timeout,
+			struct drm_i915_file_private *file_priv);
 int __must_check i915_wait_seqno(struct intel_engine_cs *ring,
 				 uint32_t seqno);
 int i915_gem_fault(struct vm_area_struct *vma, struct vm_fault *vmf);
@@ -2755,7 +2849,6 @@ static inline bool intel_gmbus_is_forced_bit(struct i2c_adapter *adapter)
 extern void intel_i2c_reset(struct drm_device *dev);
 
 /* intel_opregion.c */
-struct intel_encoder;
 #ifdef CONFIG_ACPI
 extern int intel_opregion_setup(struct drm_device *dev);
 extern void intel_opregion_init(struct drm_device *dev);
@@ -2793,7 +2886,6 @@ static inline void intel_unregister_dsm_handler(void) { return; }
 
 /* modesetting */
 extern void intel_modeset_init_hw(struct drm_device *dev);
-extern void intel_modeset_suspend_hw(struct drm_device *dev);
 extern void intel_modeset_init(struct drm_device *dev);
 extern void intel_modeset_gem_init(struct drm_device *dev);
 extern void intel_modeset_cleanup(struct drm_device *dev);
@@ -2804,7 +2896,7 @@ extern void intel_modeset_setup_hw_state(struct drm_device *dev,
 extern void i915_redisable_vga(struct drm_device *dev);
 extern void i915_redisable_vga_power_on(struct drm_device *dev);
 extern bool intel_fbc_enabled(struct drm_device *dev);
-extern void gen8_fbc_sw_flush(struct drm_device *dev, u32 value);
+extern void bdw_fbc_sw_flush(struct drm_device *dev, u32 value);
 extern void intel_disable_fbc(struct drm_device *dev);
 extern bool ironlake_set_drps(struct drm_device *dev, u8 val);
 extern void intel_init_pch_refclk(struct drm_device *dev);
@@ -2842,8 +2934,8 @@ void gen6_gt_force_wake_get(struct drm_i915_private *dev_priv, int fw_engine);
 void gen6_gt_force_wake_put(struct drm_i915_private *dev_priv, int fw_engine);
 void assert_force_wake_inactive(struct drm_i915_private *dev_priv);
 
-int sandybridge_pcode_read(struct drm_i915_private *dev_priv, u8 mbox, u32 *val);
-int sandybridge_pcode_write(struct drm_i915_private *dev_priv, u8 mbox, u32 val);
+int sandybridge_pcode_read(struct drm_i915_private *dev_priv, u32 mbox, u32 *val);
+int sandybridge_pcode_write(struct drm_i915_private *dev_priv, u32 mbox, u32 val);
 
 /* intel_sideband.c */
 u32 vlv_punit_read(struct drm_i915_private *dev_priv, u8 addr);
@@ -2873,7 +2965,9 @@ int vlv_freq_opcode(struct drm_i915_private *dev_priv, int val);
 
 #define FORCEWAKE_RENDER	(1 << 0)
 #define FORCEWAKE_MEDIA		(1 << 1)
-#define FORCEWAKE_ALL		(FORCEWAKE_RENDER | FORCEWAKE_MEDIA)
+#define FORCEWAKE_BLITTER	(1 << 2)
+#define FORCEWAKE_ALL		(FORCEWAKE_RENDER | FORCEWAKE_MEDIA | \
+					FORCEWAKE_BLITTER)
 
 
 #define I915_READ8(reg)		dev_priv->uncore.funcs.mmio_readb(dev_priv, (reg), true)
@@ -2939,6 +3033,11 @@ static inline unsigned long msecs_to_jiffies_timeout(const unsigned int m)
 	return min_t(unsigned long, MAX_JIFFY_OFFSET, j + 1);
 }
 
+static inline unsigned long nsecs_to_jiffies_timeout(const u64 n)
+{
+	return min_t(u64, MAX_JIFFY_OFFSET, nsecs_to_jiffies64(n) + 1);
+}
+
 static inline unsigned long
 timespec_to_jiffies_timeout(const struct timespec *value)
 {
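Note on the nsecs_to_jiffies_timeout() helper added in the hunk above: it
rounds the converted value up by one jiffy, so a wait never expires early, and
clamps the result to MAX_JIFFY_OFFSET. The userspace sketch below illustrates
only that arithmetic. Everything prefixed SKETCH_/sketch_ is a hypothetical
stand-in, not kernel code: HZ is assumed to be 250, the clamp assumes a 32-bit
long, and the round-down division only approximates what nsecs_to_jiffies64()
does when NSEC_PER_SEC is a multiple of HZ.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for kernel constants (assumptions, not the real values). */
#define SKETCH_HZ		250ULL
#define SKETCH_NSEC_PER_SEC	1000000000ULL
/* Models (LONG_MAX >> 1) - 1 for an assumed 32-bit long. */
#define SKETCH_MAX_JIFFY_OFFSET	((uint64_t)((INT32_MAX >> 1) - 1))

static_assert(SKETCH_NSEC_PER_SEC % SKETCH_HZ == 0,
	      "sketch assumes a whole number of nanoseconds per jiffy");

/* Round-down conversion, standing in for nsecs_to_jiffies64(). */
static uint64_t sketch_nsecs_to_jiffies64(uint64_t nsecs)
{
	return nsecs / (SKETCH_NSEC_PER_SEC / SKETCH_HZ);
}

/*
 * Add one jiffy so a partially elapsed jiffy cannot shorten the wait,
 * then saturate at the largest offset that is safe to add to jiffies.
 */
static uint64_t sketch_nsecs_to_jiffies_timeout(uint64_t nsecs)
{
	uint64_t j = sketch_nsecs_to_jiffies64(nsecs) + 1;

	return j < SKETCH_MAX_JIFFY_OFFSET ? j : SKETCH_MAX_JIFFY_OFFSET;
}
```

Under these assumptions one jiffy is 4,000,000 ns, so a 1 ns timeout still
waits a full jiffy, and a very large nanosecond count saturates instead of
wrapping when added to jiffies.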
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 28f91df2604d..4a9faea626db 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -160,33 +160,6 @@ i915_gem_object_is_inactive(struct drm_i915_gem_object *obj)
 }
 
 int
-i915_gem_init_ioctl(struct drm_device *dev, void *data,
-		    struct drm_file *file)
-{
-	struct drm_i915_private *dev_priv = dev->dev_private;
-	struct drm_i915_gem_init *args = data;
-
-	if (drm_core_check_feature(dev, DRIVER_MODESET))
-		return -ENODEV;
-
-	if (args->gtt_start >= args->gtt_end ||
-	    (args->gtt_end | args->gtt_start) & (PAGE_SIZE - 1))
-		return -EINVAL;
-
-	/* GEM with user mode setting was never supported on ilk and later. */
-	if (INTEL_INFO(dev)->gen >= 5)
-		return -ENODEV;
-
-	mutex_lock(&dev->struct_mutex);
-	i915_gem_setup_global_gtt(dev, args->gtt_start, args->gtt_end,
-				  args->gtt_end);
-	dev_priv->gtt.mappable_end = args->gtt_end;
-	mutex_unlock(&dev->struct_mutex);
-
-	return 0;
-}
-
-int
 i915_gem_get_aperture_ioctl(struct drm_device *dev, void *data,
 			    struct drm_file *file)
 {
@@ -208,40 +181,137 @@ i915_gem_get_aperture_ioctl(struct drm_device *dev, void *data,
 	return 0;
 }
 
-static void i915_gem_object_detach_phys(struct drm_i915_gem_object *obj)
+static int
+i915_gem_object_get_pages_phys(struct drm_i915_gem_object *obj)
 {
-	drm_dma_handle_t *phys = obj->phys_handle;
+	struct address_space *mapping = file_inode(obj->base.filp)->i_mapping;
+	char *vaddr = obj->phys_handle->vaddr;
+	struct sg_table *st;
+	struct scatterlist *sg;
+	int i;
 
-	if (!phys)
-		return;
+	if (WARN_ON(i915_gem_object_needs_bit17_swizzle(obj)))
+		return -EINVAL;
 
-	if (obj->madv == I915_MADV_WILLNEED) {
+	for (i = 0; i < obj->base.size / PAGE_SIZE; i++) {
+		struct page *page;
+		char *src;
+
+		page = shmem_read_mapping_page(mapping, i);
+		if (IS_ERR(page))
+			return PTR_ERR(page);
+
+		src = kmap_atomic(page);
+		memcpy(vaddr, src, PAGE_SIZE);
+		drm_clflush_virt_range(vaddr, PAGE_SIZE);
+		kunmap_atomic(src);
+
+		page_cache_release(page);
+		vaddr += PAGE_SIZE;
+	}
+
+	i915_gem_chipset_flush(obj->base.dev);
+
+	st = kmalloc(sizeof(*st), GFP_KERNEL);
+	if (st == NULL)
+		return -ENOMEM;
+
+	if (sg_alloc_table(st, 1, GFP_KERNEL)) {
+		kfree(st);
+		return -ENOMEM;
+	}
+
+	sg = st->sgl;
+	sg->offset = 0;
+	sg->length = obj->base.size;
+
+	sg_dma_address(sg) = obj->phys_handle->busaddr;
+	sg_dma_len(sg) = obj->base.size;
+
+	obj->pages = st;
+	obj->has_dma_mapping = true;
+	return 0;
+}
+
+static void
+i915_gem_object_put_pages_phys(struct drm_i915_gem_object *obj)
+{
+	int ret;
+
+	BUG_ON(obj->madv == __I915_MADV_PURGED);
+
+	ret = i915_gem_object_set_to_cpu_domain(obj, true);
+	if (ret) {
+		/* In the event of a disaster, abandon all caches and
+		 * hope for the best.
+		 */
+		WARN_ON(ret != -EIO);
+		obj->base.read_domains = obj->base.write_domain = I915_GEM_DOMAIN_CPU;
+	}
+
+	if (obj->madv == I915_MADV_DONTNEED)
+		obj->dirty = 0;
+
+	if (obj->dirty) {
 		struct address_space *mapping = file_inode(obj->base.filp)->i_mapping;
-		char *vaddr = phys->vaddr;
+		char *vaddr = obj->phys_handle->vaddr;
 		int i;
 
 		for (i = 0; i < obj->base.size / PAGE_SIZE; i++) {
-			struct page *page = shmem_read_mapping_page(mapping, i);
-			if (!IS_ERR(page)) {
-				char *dst = kmap_atomic(page);
-				memcpy(dst, vaddr, PAGE_SIZE);
-				drm_clflush_virt_range(dst, PAGE_SIZE);
-				kunmap_atomic(dst);
-
-				set_page_dirty(page);
+			struct page *page;
+			char *dst;
+
+			page = shmem_read_mapping_page(mapping, i);
+			if (IS_ERR(page))
+				continue;
+
+			dst = kmap_atomic(page);
+			drm_clflush_virt_range(vaddr, PAGE_SIZE);
+			memcpy(dst, vaddr, PAGE_SIZE);
+			kunmap_atomic(dst);
+
+			set_page_dirty(page);
+			if (obj->madv == I915_MADV_WILLNEED)
 				mark_page_accessed(page);
-				page_cache_release(page);
-			}
+			page_cache_release(page);
 			vaddr += PAGE_SIZE;
 		}
-		i915_gem_chipset_flush(obj->base.dev);
+		obj->dirty = 0;
 	}
 
-#ifdef CONFIG_X86
-	set_memory_wb((unsigned long)phys->vaddr, phys->size / PAGE_SIZE);
-#endif
-	drm_pci_free(obj->base.dev, phys);
-	obj->phys_handle = NULL;
+	sg_free_table(obj->pages);
+	kfree(obj->pages);
+
+	obj->has_dma_mapping = false;
+}
+
+static void
+i915_gem_object_release_phys(struct drm_i915_gem_object *obj)
+{
+	drm_pci_free(obj->base.dev, obj->phys_handle);
+}
+
+static const struct drm_i915_gem_object_ops i915_gem_phys_ops = {
+	.get_pages = i915_gem_object_get_pages_phys,
+	.put_pages = i915_gem_object_put_pages_phys,
+	.release = i915_gem_object_release_phys,
+};
+
+static int
+drop_pages(struct drm_i915_gem_object *obj)
+{
+	struct i915_vma *vma, *next;
+	int ret;
+
+	drm_gem_object_reference(&obj->base);
+	list_for_each_entry_safe(vma, next, &obj->vma_list, vma_link)
+		if (i915_vma_unbind(vma))
+			break;
+
+	ret = i915_gem_object_put_pages(obj);
+	drm_gem_object_unreference(&obj->base);
+
+	return ret;
 }
 
 int
@@ -249,9 +319,7 @@ i915_gem_object_attach_phys(struct drm_i915_gem_object *obj,
 			    int align)
 {
 	drm_dma_handle_t *phys;
-	struct address_space *mapping;
-	char *vaddr;
-	int i;
+	int ret;
 
 	if (obj->phys_handle) {
 		if ((unsigned long)obj->phys_handle->vaddr & (align -1))
@@ -266,41 +334,19 @@ i915_gem_object_attach_phys(struct drm_i915_gem_object *obj,
 	if (obj->base.filp == NULL)
 		return -EINVAL;
 
+	ret = drop_pages(obj);
+	if (ret)
+		return ret;
+
 	/* create a new object */
 	phys = drm_pci_alloc(obj->base.dev, obj->base.size, align);
 	if (!phys)
 		return -ENOMEM;
 
-	vaddr = phys->vaddr;
-#ifdef CONFIG_X86
-	set_memory_wc((unsigned long)vaddr, phys->size / PAGE_SIZE);
-#endif
-	mapping = file_inode(obj->base.filp)->i_mapping;
-	for (i = 0; i < obj->base.size / PAGE_SIZE; i++) {
-		struct page *page;
-		char *src;
-
-		page = shmem_read_mapping_page(mapping, i);
-		if (IS_ERR(page)) {
-#ifdef CONFIG_X86
-			set_memory_wb((unsigned long)phys->vaddr, phys->size / PAGE_SIZE);
-#endif
-			drm_pci_free(obj->base.dev, phys);
-			return PTR_ERR(page);
-		}
-
-		src = kmap_atomic(page);
-		memcpy(vaddr, src, PAGE_SIZE);
-		kunmap_atomic(src);
-
-		mark_page_accessed(page);
-		page_cache_release(page);
-
-		vaddr += PAGE_SIZE;
-	}
-
 	obj->phys_handle = phys;
-	return 0;
+	obj->ops = &i915_gem_phys_ops;
+
+	return i915_gem_object_get_pages(obj);
 }
 
 static int
@@ -311,6 +357,14 @@ i915_gem_phys_pwrite(struct drm_i915_gem_object *obj,
 	struct drm_device *dev = obj->base.dev;
 	void *vaddr = obj->phys_handle->vaddr + args->offset;
 	char __user *user_data = to_user_ptr(args->data_ptr);
+	int ret;
+
+	/* We manually control the domain here and pretend that it
+	 * remains coherent i.e. in the GTT domain, like shmem_pwrite.
+	 */
+	ret = i915_gem_object_wait_rendering(obj, false);
+	if (ret)
+		return ret;
 
 	if (__copy_from_user_inatomic_nocache(vaddr, user_data, args->size)) {
 		unsigned long unwritten;
@@ -326,6 +380,7 @@ i915_gem_phys_pwrite(struct drm_i915_gem_object *obj,
 			return -EFAULT;
 	}
 
+	drm_clflush_virt_range(vaddr, args->size);
 	i915_gem_chipset_flush(dev);
 	return 0;
 }
@@ -346,6 +401,7 @@ static int
 i915_gem_create(struct drm_file *file,
 		struct drm_device *dev,
 		uint64_t size,
+		bool dumb,
 		uint32_t *handle_p)
 {
 	struct drm_i915_gem_object *obj;
@@ -361,6 +417,7 @@ i915_gem_create(struct drm_file *file,
 	if (obj == NULL)
 		return -ENOMEM;
 
+	obj->base.dumb = dumb;
 	ret = drm_gem_handle_create(file, &obj->base, &handle);
 	/* drop reference from allocate - handle holds it now */
 	drm_gem_object_unreference_unlocked(&obj->base);
@@ -380,7 +437,7 @@ i915_gem_dumb_create(struct drm_file *file,
 	args->pitch = ALIGN(args->width * DIV_ROUND_UP(args->bpp, 8), 64);
 	args->size = args->pitch * args->height;
 	return i915_gem_create(file, dev,
-			       args->size, &args->handle);
+			       args->size, true, &args->handle);
 }
 
 /**
@@ -393,7 +450,7 @@ i915_gem_create_ioctl(struct drm_device *dev, void *data,
 	struct drm_i915_gem_create *args = data;
 
 	return i915_gem_create(file, dev,
-			       args->size, &args->handle);
+			       args->size, false, &args->handle);
 }
 
 static inline int
@@ -1046,11 +1103,6 @@ i915_gem_pwrite_ioctl(struct drm_device *dev, void *data,
 	 * pread/pwrite currently are reading and writing from the CPU
 	 * perspective, requiring manual detiling by the client.
 	 */
-	if (obj->phys_handle) {
-		ret = i915_gem_phys_pwrite(obj, args, file);
-		goto out;
-	}
-
 	if (obj->tiling_mode == I915_TILING_NONE &&
 	    obj->base.write_domain != I915_GEM_DOMAIN_CPU &&
 	    cpu_write_needs_clflush(obj)) {
@@ -1060,8 +1112,12 @@ i915_gem_pwrite_ioctl(struct drm_device *dev, void *data,
 		 * textures). Fallback to the shmem path in that case. */
 	}
 
-	if (ret == -EFAULT || ret == -ENOSPC)
-		ret = i915_gem_shmem_pwrite(dev, obj, args, file);
+	if (ret == -EFAULT || ret == -ENOSPC) {
+		if (obj->phys_handle)
+			ret = i915_gem_phys_pwrite(obj, args, file);
+		else
+			ret = i915_gem_shmem_pwrite(dev, obj, args, file);
+	}
 
 out:
 	drm_gem_object_unreference(&obj->base);
@@ -1134,7 +1190,7 @@ static bool can_wait_boost(struct drm_i915_file_private *file_priv)
 }
 
 /**
- * __wait_seqno - wait until execution of seqno has finished
+ * __i915_wait_seqno - wait until execution of seqno has finished
  * @ring: the ring expected to report seqno
  * @seqno: duh!
  * @reset_counter: reset sequence associated with the given seqno
@@ -1151,7 +1207,7 @@ static bool can_wait_boost(struct drm_i915_file_private *file_priv)
  * Returns 0 if the seqno was found within the alloted time. Else returns the
  * errno with remaining time filled in timeout argument.
  */
-static int __wait_seqno(struct intel_engine_cs *ring, u32 seqno,
+int __i915_wait_seqno(struct intel_engine_cs *ring, u32 seqno,
 			unsigned reset_counter,
 			bool interruptible,
 			s64 *timeout,
@@ -1171,7 +1227,8 @@ static int __wait_seqno(struct intel_engine_cs *ring, u32 seqno,
 	if (i915_seqno_passed(ring->get_seqno(ring, true), seqno))
 		return 0;
 
-	timeout_expire = timeout ? jiffies + nsecs_to_jiffies((u64)*timeout) : 0;
+	timeout_expire = timeout ?
+		jiffies + nsecs_to_jiffies_timeout((u64)*timeout) : 0;
 
 	if (INTEL_INFO(dev)->gen >= 6 && ring->id == RCS && can_wait_boost(file_priv)) {
 		gen6_rps_boost(dev_priv);
@@ -1247,6 +1304,16 @@ static int __wait_seqno(struct intel_engine_cs *ring, u32 seqno,
 		s64 tres = *timeout - (now - before);
 
 		*timeout = tres < 0 ? 0 : tres;
+
+		/*
+		 * Apparently ktime isn't accurate enough and occasionally has a
+		 * bit of mismatch in the jiffies<->nsecs<->ktime loop. So patch
+		 * things up to make the test happy. We allow up to 1 jiffy.
+		 *
+		 * This is a regression from the timespec->ktime conversion.
+		 */
+		if (ret == -ETIME && *timeout < jiffies_to_usecs(1)*1000)
+			*timeout = 0;
 	}
 
 	return ret;
@@ -1262,6 +1329,7 @@ i915_wait_seqno(struct intel_engine_cs *ring, uint32_t seqno)
 	struct drm_device *dev = ring->dev;
 	struct drm_i915_private *dev_priv = dev->dev_private;
 	bool interruptible = dev_priv->mm.interruptible;
+	unsigned reset_counter;
 	int ret;
 
 	BUG_ON(!mutex_is_locked(&dev->struct_mutex));
@@ -1275,14 +1343,13 @@ i915_wait_seqno(struct intel_engine_cs *ring, uint32_t seqno)
 	if (ret)
 		return ret;
 
-	return __wait_seqno(ring, seqno,
-			    atomic_read(&dev_priv->gpu_error.reset_counter),
-			    interruptible, NULL, NULL);
+	reset_counter = atomic_read(&dev_priv->gpu_error.reset_counter);
+	return __i915_wait_seqno(ring, seqno, reset_counter, interruptible,
+				 NULL, NULL);
 }
 
 static int
-i915_gem_object_wait_rendering__tail(struct drm_i915_gem_object *obj,
-				     struct intel_engine_cs *ring)
+i915_gem_object_wait_rendering__tail(struct drm_i915_gem_object *obj)
 {
 	if (!obj->active)
 		return 0;
@@ -1319,7 +1386,7 @@ i915_gem_object_wait_rendering(struct drm_i915_gem_object *obj,
 	if (ret)
 		return ret;
 
-	return i915_gem_object_wait_rendering__tail(obj, ring);
+	return i915_gem_object_wait_rendering__tail(obj);
 }
 
 /* A nonblocking variant of the above wait. This is a highly dangerous routine
@@ -1354,12 +1421,13 @@ i915_gem_object_wait_rendering__nonblocking(struct drm_i915_gem_object *obj,
 
 	reset_counter = atomic_read(&dev_priv->gpu_error.reset_counter);
 	mutex_unlock(&dev->struct_mutex);
-	ret = __wait_seqno(ring, seqno, reset_counter, true, NULL, file_priv);
+	ret = __i915_wait_seqno(ring, seqno, reset_counter, true, NULL,
+				file_priv);
 	mutex_lock(&dev->struct_mutex);
 	if (ret)
 		return ret;
 
-	return i915_gem_object_wait_rendering__tail(obj, ring);
+	return i915_gem_object_wait_rendering__tail(obj);
 }
 
 /**
@@ -1466,6 +1534,16 @@ unlock:
  *
  * While the mapping holds a reference on the contents of the object, it doesn't
  * imply a ref on the object itself.
+ *
+ * IMPORTANT:
+ *
+ * DRM driver writers who look at this function as an example for how to do GEM
+ * mmap support, please don't implement mmap support like here. The modern way
+ * to implement DRM mmap support is with an mmap offset ioctl (like
+ * i915_gem_mmap_gtt) and then using the mmap syscall on the DRM fd directly.
+ * That way debug tooling like valgrind will understand what's going on; hiding
+ * the mmap call in a driver private ioctl will break that. The i915 driver only
+ * does cpu mmaps this way because we didn't know better.
  */
 int
 i915_gem_mmap_ioctl(struct drm_device *dev, void *data,
@@ -1762,10 +1840,10 @@ static void i915_gem_object_free_mmap_offset(struct drm_i915_gem_object *obj)
 	drm_gem_free_mmap_offset(&obj->base);
 }
 
-int
+static int
 i915_gem_mmap_gtt(struct drm_file *file,
 		  struct drm_device *dev,
-		  uint32_t handle,
+		  uint32_t handle, bool dumb,
 		  uint64_t *offset)
 {
 	struct drm_i915_private *dev_priv = dev->dev_private;
@@ -1782,6 +1860,13 @@ i915_gem_mmap_gtt(struct drm_file *file,
 		goto unlock;
 	}
 
+	/*
+	 * We don't allow dumb mmaps on objects created using another
+	 * interface.
+	 */
+	WARN_ONCE(dumb && !(obj->base.dumb || obj->base.import_attach),
+		  "Illegal dumb map of accelerated buffer.\n");
+
 	if (obj->base.size > dev_priv->gtt.mappable_end) {
 		ret = -E2BIG;
 		goto out;
@@ -1806,6 +1891,15 @@ unlock:
 	return ret;
 }
 
+int
+i915_gem_dumb_map_offset(struct drm_file *file,
+			 struct drm_device *dev,
+			 uint32_t handle,
+			 uint64_t *offset)
+{
+	return i915_gem_mmap_gtt(file, dev, handle, true, offset);
+}
+
 /**
  * i915_gem_mmap_gtt_ioctl - prepare an object for GTT mmap'ing
  * @dev: DRM device
@@ -1827,7 +1921,7 @@ i915_gem_mmap_gtt_ioctl(struct drm_device *dev, void *data,
 {
 	struct drm_i915_gem_mmap_gtt *args = data;
 
-	return i915_gem_mmap_gtt(file, dev, args->handle, &args->offset);
+	return i915_gem_mmap_gtt(file, dev, args->handle, false, &args->offset);
 }
 
 static inline int
@@ -1945,7 +2039,14 @@ unsigned long
 i915_gem_shrink(struct drm_i915_private *dev_priv,
 		long target, unsigned flags)
 {
-	const bool purgeable_only = flags & I915_SHRINK_PURGEABLE;
+	const struct {
+		struct list_head *list;
+		unsigned int bit;
+	} phases[] = {
+		{ &dev_priv->mm.unbound_list, I915_SHRINK_UNBOUND },
+		{ &dev_priv->mm.bound_list, I915_SHRINK_BOUND },
+		{ NULL, 0 },
+	}, *phase;
 	unsigned long count = 0;
 
 	/*
@@ -1967,48 +2068,30 @@ i915_gem_shrink(struct drm_i915_private *dev_priv,
 	 * dev->struct_mutex and so we won't ever be able to observe an
 	 * object on the bound_list with a reference count equals 0.
 	 */
-	if (flags & I915_SHRINK_UNBOUND) {
+	for (phase = phases; phase->list; phase++) {
 		struct list_head still_in_list;
 
-		INIT_LIST_HEAD(&still_in_list);
-		while (count < target && !list_empty(&dev_priv->mm.unbound_list)) {
-			struct drm_i915_gem_object *obj;
-
-			obj = list_first_entry(&dev_priv->mm.unbound_list,
-					       typeof(*obj), global_list);
-			list_move_tail(&obj->global_list, &still_in_list);
-
-			if (!i915_gem_object_is_purgeable(obj) && purgeable_only)
-				continue;
-
-			drm_gem_object_reference(&obj->base);
-
-			if (i915_gem_object_put_pages(obj) == 0)
-				count += obj->base.size >> PAGE_SHIFT;
-
-			drm_gem_object_unreference(&obj->base);
-		}
-		list_splice(&still_in_list, &dev_priv->mm.unbound_list);
-	}
-
-	if (flags & I915_SHRINK_BOUND) {
-		struct list_head still_in_list;
+		if ((flags & phase->bit) == 0)
+			continue;
 
 		INIT_LIST_HEAD(&still_in_list);
-		while (count < target && !list_empty(&dev_priv->mm.bound_list)) {
+		while (count < target && !list_empty(phase->list)) {
 			struct drm_i915_gem_object *obj;
 			struct i915_vma *vma, *v;
 
-			obj = list_first_entry(&dev_priv->mm.bound_list,
+			obj = list_first_entry(phase->list,
 					       typeof(*obj), global_list);
 			list_move_tail(&obj->global_list, &still_in_list);
 
-			if (!i915_gem_object_is_purgeable(obj) && purgeable_only)
+			if (flags & I915_SHRINK_PURGEABLE &&
+			    !i915_gem_object_is_purgeable(obj))
 				continue;
 
 			drm_gem_object_reference(&obj->base);
 
-			list_for_each_entry_safe(vma, v, &obj->vma_list, vma_link)
+			/* For the unbound phase, this should be a no-op! */
+			list_for_each_entry_safe(vma, v,
+						 &obj->vma_list, vma_link)
 				if (i915_vma_unbind(vma))
 					break;
 
@@ -2017,7 +2100,7 @@ i915_gem_shrink(struct drm_i915_private *dev_priv,
 
 			drm_gem_object_unreference(&obj->base);
 		}
-		list_splice(&still_in_list, &dev_priv->mm.bound_list);
+		list_splice(&still_in_list, phase->list);
 	}
 
 	return count;
@@ -2122,6 +2205,10 @@ i915_gem_object_get_pages_gtt(struct drm_i915_gem_object *obj)
 	if (i915_gem_object_needs_bit17_swizzle(obj))
 		i915_gem_object_do_bit_17_swizzle(obj);
 
+	if (obj->tiling_mode != I915_TILING_NONE &&
+	    dev_priv->quirks & QUIRK_PIN_SWIZZLED_PAGES)
+		i915_gem_object_pin_pages(obj);
+
 	return 0;
 
 err_pages:
@@ -2420,15 +2507,13 @@ int __i915_add_request(struct intel_engine_cs *ring,
 	ring->outstanding_lazy_seqno = 0;
 	ring->preallocated_lazy_request = NULL;
 
-	if (!dev_priv->ums.mm_suspended) {
-		i915_queue_hangcheck(ring->dev);
+	i915_queue_hangcheck(ring->dev);
 
-		cancel_delayed_work_sync(&dev_priv->mm.idle_work);
-		queue_delayed_work(dev_priv->wq,
-				   &dev_priv->mm.retire_work,
-				   round_jiffies_up_relative(HZ));
-		intel_mark_busy(dev_priv->dev);
-	}
+	cancel_delayed_work_sync(&dev_priv->mm.idle_work);
+	queue_delayed_work(dev_priv->wq,
+			   &dev_priv->mm.retire_work,
+			   round_jiffies_up_relative(HZ));
+	intel_mark_busy(dev_priv->dev);
 
 	if (out_seqno)
 		*out_seqno = request->seqno;
@@ -2495,12 +2580,20 @@ static void i915_set_reset_status(struct drm_i915_private *dev_priv,
 
 static void i915_gem_free_request(struct drm_i915_gem_request *request)
 {
+	struct intel_context *ctx = request->ctx;
+
 	list_del(&request->list);
 	i915_gem_request_remove_from_client(request);
 
-	if (request->ctx)
-		i915_gem_context_unreference(request->ctx);
+	if (ctx) {
+		if (i915.enable_execlists) {
+			struct intel_engine_cs *ring = request->ring;
 
+			if (ctx != ring->default_context)
+				intel_lr_context_unpin(ring, ctx);
+		}
+		i915_gem_context_unreference(ctx);
+	}
 	kfree(request);
 }
 
@@ -2555,6 +2648,23 @@ static void i915_gem_reset_ring_cleanup(struct drm_i915_private *dev_priv,
 	}
 
 	/*
+	 * Clear the execlists queue up before freeing the requests, as those
+	 * are the ones that keep the context and ringbuffer backing objects
+	 * pinned in place.
+	 */
+	while (!list_empty(&ring->execlist_queue)) {
+		struct intel_ctx_submit_request *submit_req;
+
+		submit_req = list_first_entry(&ring->execlist_queue,
+				struct intel_ctx_submit_request,
+				execlist_link);
+		list_del(&submit_req->execlist_link);
+		intel_runtime_pm_put(dev_priv);
+		i915_gem_context_unreference(submit_req->ctx);
+		kfree(submit_req);
+	}
+
+	/*
 	 * We must free the requests after all the corresponding objects have
 	 * been moved off active lists. Which is the same order as the normal
 	 * retire_requests function does. This is important if object hold
@@ -2571,18 +2681,6 @@ static void i915_gem_reset_ring_cleanup(struct drm_i915_private *dev_priv,
 		i915_gem_free_request(request);
 	}
 
-	while (!list_empty(&ring->execlist_queue)) {
-		struct intel_ctx_submit_request *submit_req;
-
-		submit_req = list_first_entry(&ring->execlist_queue,
-				struct intel_ctx_submit_request,
-				execlist_link);
-		list_del(&submit_req->execlist_link);
-		intel_runtime_pm_put(dev_priv);
-		i915_gem_context_unreference(submit_req->ctx);
-		kfree(submit_req);
-	}
-
 	/* These may not have been flush before the reset, do so now */
 	kfree(ring->preallocated_lazy_request);
 	ring->preallocated_lazy_request = NULL;
@@ -2719,6 +2817,15 @@ i915_gem_retire_requests(struct drm_device *dev)
 	for_each_ring(ring, dev_priv, i) {
 		i915_gem_retire_requests_ring(ring);
 		idle &= list_empty(&ring->request_list);
+		if (i915.enable_execlists) {
+			unsigned long flags;
+
+			spin_lock_irqsave(&ring->execlist_lock, flags);
+			idle &= list_empty(&ring->execlist_queue);
+			spin_unlock_irqrestore(&ring->execlist_lock, flags);
+
+			intel_execlists_retire_requests(ring);
+		}
 	}
 
 	if (idle)
@@ -2811,6 +2918,9 @@ i915_gem_wait_ioctl(struct drm_device *dev, void *data, struct drm_file *file)
 	u32 seqno = 0;
 	int ret = 0;
 
+	if (args->flags != 0)
+		return -EINVAL;
+
 	ret = i915_mutex_lock_interruptible(dev);
 	if (ret)
 		return ret;
@@ -2846,8 +2956,8 @@ i915_gem_wait_ioctl(struct drm_device *dev, void *data, struct drm_file *file)
 	reset_counter = atomic_read(&dev_priv->gpu_error.reset_counter);
 	mutex_unlock(&dev->struct_mutex);
 
-	return __wait_seqno(ring, seqno, reset_counter, true, &args->timeout_ns,
-			    file->driver_priv);
+	return __i915_wait_seqno(ring, seqno, reset_counter, true,
+				 &args->timeout_ns, file->driver_priv);
 
 out:
 	drm_gem_object_unreference(&obj->base);
@@ -3166,6 +3276,7 @@ static void i915_gem_write_fence(struct drm_device *dev, int reg,
 	     obj->stride, obj->tiling_mode);
 
 	switch (INTEL_INFO(dev)->gen) {
+	case 9:
 	case 8:
 	case 7:
 	case 6:
@@ -3384,46 +3495,6 @@ static bool i915_gem_valid_gtt_space(struct i915_vma *vma,
 	return true;
 }
 
-static void i915_gem_verify_gtt(struct drm_device *dev)
-{
-#if WATCH_GTT
-	struct drm_i915_private *dev_priv = dev->dev_private;
-	struct drm_i915_gem_object *obj;
-	int err = 0;
-
-	list_for_each_entry(obj, &dev_priv->mm.gtt_list, global_list) {
-		if (obj->gtt_space == NULL) {
-			printk(KERN_ERR "object found on GTT list with no space reserved\n");
-			err++;
-			continue;
-		}
-
-		if (obj->cache_level != obj->gtt_space->color) {
-			printk(KERN_ERR "object reserved space [%08lx, %08lx] with wrong color, cache_level=%x, color=%lx\n",
-			       i915_gem_obj_ggtt_offset(obj),
-			       i915_gem_obj_ggtt_offset(obj) + i915_gem_obj_ggtt_size(obj),
-			       obj->cache_level,
-			       obj->gtt_space->color);
-			err++;
-			continue;
-		}
-
-		if (!i915_gem_valid_gtt_space(dev,
-					      obj->gtt_space,
-					      obj->cache_level)) {
-			printk(KERN_ERR "invalid GTT space found at [%08lx, %08lx] - color=%x\n",
-			       i915_gem_obj_ggtt_offset(obj),
-			       i915_gem_obj_ggtt_offset(obj) + i915_gem_obj_ggtt_size(obj),
-			       obj->cache_level);
-			err++;
-			continue;
-		}
-	}
-
-	WARN_ON(err);
-#endif
-}
-
 /**
  * Finds free space in the GTT aperture and binds the object there.
  */
@@ -3514,25 +3585,10 @@ search_free:
 	list_move_tail(&obj->global_list, &dev_priv->mm.bound_list);
 	list_add_tail(&vma->mm_list, &vm->inactive_list);
 
-	if (i915_is_ggtt(vm)) {
-		bool mappable, fenceable;
-
-		fenceable = (vma->node.size == fence_size &&
-			     (vma->node.start & (fence_alignment - 1)) == 0);
-
-		mappable = (vma->node.start + obj->base.size <=
-			    dev_priv->gtt.mappable_end);
-
-		obj->map_and_fenceable = mappable && fenceable;
-	}
-
-	WARN_ON(flags & PIN_MAPPABLE && !obj->map_and_fenceable);
-
 	trace_i915_vma_bind(vma, flags);
 	vma->bind_vma(vma, obj->cache_level,
-		      flags & (PIN_MAPPABLE | PIN_GLOBAL) ? GLOBAL_BIND : 0);
+		      flags & PIN_GLOBAL ? GLOBAL_BIND : 0);
 
-	i915_gem_verify_gtt(dev);
 	return vma;
 
 err_remove_node:
@@ -3560,7 +3616,7 @@ i915_gem_clflush_object(struct drm_i915_gem_object *obj,
 	 * Stolen memory is always coherent with the GPU as it is explicitly
 	 * marked as wc by the system, or the system is cache-coherent.
 	 */
-	if (obj->stolen)
+	if (obj->stolen || obj->phys_handle)
 		return false;
 
 	/* If the GPU is snooping the contents of the CPU cache,
@@ -3739,7 +3795,7 @@ int i915_gem_object_set_cache_level(struct drm_i915_gem_object *obj,
 		list_for_each_entry(vma, &obj->vma_list, vma_link)
 			if (drm_mm_node_allocated(&vma->node))
 				vma->bind_vma(vma, cache_level,
-					      obj->has_global_gtt_mapping ? GLOBAL_BIND : 0);
+						vma->bound & GLOBAL_BIND);
 	}
 
 	list_for_each_entry(vma, &obj->vma_list, vma_link)
@@ -3769,7 +3825,6 @@ int i915_gem_object_set_cache_level(struct drm_i915_gem_object *obj,
 						    old_write_domain);
 	}
 
-	i915_gem_verify_gtt(dev);
 	return 0;
 }
 
@@ -4067,7 +4122,7 @@ i915_gem_ring_throttle(struct drm_device *dev, struct drm_file *file)
 	if (seqno == 0)
 		return 0;
 
-	ret = __wait_seqno(ring, seqno, reset_counter, true, NULL, NULL);
+	ret = __i915_wait_seqno(ring, seqno, reset_counter, true, NULL, NULL);
 	if (ret == 0)
 		queue_delayed_work(dev_priv->wq, &dev_priv->mm.retire_work, 0);
 
@@ -4101,6 +4156,7 @@ i915_gem_object_pin(struct drm_i915_gem_object *obj,
 {
 	struct drm_i915_private *dev_priv = obj->base.dev->dev_private;
 	struct i915_vma *vma;
+	unsigned bound;
 	int ret;
 
 	if (WARN_ON(vm == &dev_priv->mm.aliasing_ppgtt->base))
@@ -4109,6 +4165,9 @@ i915_gem_object_pin(struct drm_i915_gem_object *obj,
 	if (WARN_ON(flags & (PIN_GLOBAL | PIN_MAPPABLE) && !i915_is_ggtt(vm)))
 		return -EINVAL;
 
+	if (WARN_ON((flags & (PIN_MAPPABLE | PIN_GLOBAL)) == PIN_MAPPABLE))
+		return -EINVAL;
+
 	vma = i915_gem_obj_to_vma(obj, vm);
 	if (vma) {
 		if (WARN_ON(vma->pin_count == DRM_I915_GEM_OBJECT_MAX_PIN_COUNT))
@@ -4130,15 +4189,39 @@ i915_gem_object_pin(struct drm_i915_gem_object *obj,
 		}
 	}
 
+	bound = vma ? vma->bound : 0;
 	if (vma == NULL || !drm_mm_node_allocated(&vma->node)) {
 		vma = i915_gem_object_bind_to_vm(obj, vm, alignment, flags);
 		if (IS_ERR(vma))
 			return PTR_ERR(vma);
 	}
 
-	if (flags & PIN_GLOBAL && !obj->has_global_gtt_mapping)
+	if (flags & PIN_GLOBAL && !(vma->bound & GLOBAL_BIND))
 		vma->bind_vma(vma, obj->cache_level, GLOBAL_BIND);
 
+	if ((bound ^ vma->bound) & GLOBAL_BIND) {
+		bool mappable, fenceable;
+		u32 fence_size, fence_alignment;
+
+		fence_size = i915_gem_get_gtt_size(obj->base.dev,
+						   obj->base.size,
+						   obj->tiling_mode);
+		fence_alignment = i915_gem_get_gtt_alignment(obj->base.dev,
+							     obj->base.size,
+							     obj->tiling_mode,
+							     true);
+
+		fenceable = (vma->node.size == fence_size &&
+			     (vma->node.start & (fence_alignment - 1)) == 0);
+
+		mappable = (vma->node.start + obj->base.size <=
+			    dev_priv->gtt.mappable_end);
+
+		obj->map_and_fenceable = mappable && fenceable;
+	}
+
+	WARN_ON(flags & PIN_MAPPABLE && !obj->map_and_fenceable);
+
 	vma->pin_count++;
 	if (flags & PIN_MAPPABLE)
 		obj->pin_mappable |= true;
@@ -4193,7 +4276,7 @@ i915_gem_pin_ioctl(struct drm_device *dev, void *data,
 	struct drm_i915_gem_object *obj;
 	int ret;
 
-	if (INTEL_INFO(dev)->gen >= 6)
+	if (drm_core_check_feature(dev, DRIVER_MODESET))
 		return -ENODEV;
 
 	ret = i915_mutex_lock_interruptible(dev);
@@ -4249,6 +4332,9 @@ i915_gem_unpin_ioctl(struct drm_device *dev, void *data,
 	struct drm_i915_gem_object *obj;
 	int ret;
 
+	if (drm_core_check_feature(dev, DRIVER_MODESET))
+		return -ENODEV;
+
 	ret = i915_mutex_lock_interruptible(dev);
 	if (ret)
 		return ret;
@@ -4326,6 +4412,7 @@ int
 i915_gem_madvise_ioctl(struct drm_device *dev, void *data,
 		       struct drm_file *file_priv)
 {
+	struct drm_i915_private *dev_priv = dev->dev_private;
 	struct drm_i915_gem_madvise *args = data;
 	struct drm_i915_gem_object *obj;
 	int ret;
@@ -4353,6 +4440,15 @@ i915_gem_madvise_ioctl(struct drm_device *dev, void *data,
 		goto out;
 	}
 
+	if (obj->pages &&
+	    obj->tiling_mode != I915_TILING_NONE &&
+	    dev_priv->quirks & QUIRK_PIN_SWIZZLED_PAGES) {
+		if (obj->madv == I915_MADV_WILLNEED)
+			i915_gem_object_unpin_pages(obj);
+		if (args->madv == I915_MADV_WILLNEED)
+			i915_gem_object_pin_pages(obj);
+	}
+
 	if (obj->madv != __I915_MADV_PURGED)
 		obj->madv = args->madv;
 
@@ -4495,8 +4591,6 @@ void i915_gem_free_object(struct drm_gem_object *gem_obj)
 		}
 	}
 
-	i915_gem_object_detach_phys(obj);
-
 	/* Stolen objects don't hold a ref, but do hold pin count. Fix that up
 	 * before progressing. */
 	if (obj->stolen)
@@ -4504,6 +4598,11 @@ void i915_gem_free_object(struct drm_gem_object *gem_obj)
 
 	WARN_ON(obj->frontbuffer_bits);
 
+	if (obj->pages && obj->madv == I915_MADV_WILLNEED &&
+	    dev_priv->quirks & QUIRK_PIN_SWIZZLED_PAGES &&
+	    obj->tiling_mode != I915_TILING_NONE)
+		i915_gem_object_unpin_pages(obj);
+
 	if (WARN_ON(obj->pages_pin_count))
 		obj->pages_pin_count = 0;
 	if (discard_backing_storage(obj))
@@ -4576,9 +4675,6 @@ i915_gem_suspend(struct drm_device *dev)
 	int ret = 0;
 
 	mutex_lock(&dev->struct_mutex);
-	if (dev_priv->ums.mm_suspended)
-		goto err;
-
 	ret = i915_gpu_idle(dev);
 	if (ret)
 		goto err;
@@ -4589,15 +4685,7 @@ i915_gem_suspend(struct drm_device *dev)
 	if (!drm_core_check_feature(dev, DRIVER_MODESET))
 		i915_gem_evict_everything(dev);
 
-	i915_kernel_lost_context(dev);
 	i915_gem_stop_ringbuffers(dev);
-
-	/* Hack!  Don't let anybody do execbuf while we don't control the chip.
-	 * We need to replace this with a semaphore, or something.
-	 * And not confound ums.mm_suspended!
-	 */
-	dev_priv->ums.mm_suspended = !drm_core_check_feature(dev,
-							     DRIVER_MODESET);
 	mutex_unlock(&dev->struct_mutex);
 
 	del_timer_sync(&dev_priv->gpu_error.hangcheck_timer);
@@ -4888,9 +4976,6 @@ int i915_gem_init(struct drm_device *dev)
 	}
 	mutex_unlock(&dev->struct_mutex);
 
-	/* Allow hardware batchbuffers unless told otherwise, but not for KMS. */
-	if (!drm_core_check_feature(dev, DRIVER_MODESET))
-		dev_priv->dri1.allow_batchbuffer = 1;
 	return ret;
 }
 
@@ -4905,74 +4990,6 @@ i915_gem_cleanup_ringbuffer(struct drm_device *dev)
 		dev_priv->gt.cleanup_ring(ring);
 }
 
-int
-i915_gem_entervt_ioctl(struct drm_device *dev, void *data,
-		       struct drm_file *file_priv)
-{
-	struct drm_i915_private *dev_priv = dev->dev_private;
-	int ret;
-
-	if (drm_core_check_feature(dev, DRIVER_MODESET))
-		return 0;
-
-	if (i915_reset_in_progress(&dev_priv->gpu_error)) {
-		DRM_ERROR("Reenabling wedged hardware, good luck\n");
-		atomic_set(&dev_priv->gpu_error.reset_counter, 0);
-	}
-
-	mutex_lock(&dev->struct_mutex);
-	dev_priv->ums.mm_suspended = 0;
-
-	ret = i915_gem_init_hw(dev);
-	if (ret != 0) {
-		mutex_unlock(&dev->struct_mutex);
-		return ret;
-	}
-
-	BUG_ON(!list_empty(&dev_priv->gtt.base.active_list));
-
-	ret = drm_irq_install(dev, dev->pdev->irq);
-	if (ret)
-		goto cleanup_ringbuffer;
-	mutex_unlock(&dev->struct_mutex);
-
-	return 0;
-
-cleanup_ringbuffer:
-	i915_gem_cleanup_ringbuffer(dev);
-	dev_priv->ums.mm_suspended = 1;
-	mutex_unlock(&dev->struct_mutex);
-
-	return ret;
-}
-
-int
-i915_gem_leavevt_ioctl(struct drm_device *dev, void *data,
-		       struct drm_file *file_priv)
-{
-	if (drm_core_check_feature(dev, DRIVER_MODESET))
-		return 0;
-
-	mutex_lock(&dev->struct_mutex);
-	drm_irq_uninstall(dev);
-	mutex_unlock(&dev->struct_mutex);
-
-	return i915_gem_suspend(dev);
-}
-
-void
-i915_gem_lastclose(struct drm_device *dev)
-{
-	int ret;
-
-	if (drm_core_check_feature(dev, DRIVER_MODESET))
-		return;
-
-	ret = i915_gem_suspend(dev);
-	if (ret)
-		DRM_ERROR("failed to idle hardware: %d\n", ret);
-}
-
 static void
 init_ring_lists(struct intel_engine_cs *ring)
 {
@@ -5119,6 +5136,15 @@ int i915_gem_open(struct drm_device *dev, struct drm_file *file)
 	return ret;
 }
 
+/**
+ * i915_gem_track_fb - update frontbuffer tracking
+ * @old: current GEM buffer for the frontbuffer slots
+ * @new: new GEM buffer for the frontbuffer slots
+ * @frontbuffer_bits: bitmask of frontbuffer slots
+ *
+ * This updates the frontbuffer tracking bits @frontbuffer_bits by clearing them
+ * from @old and setting them in @new. Both @old and @new can be NULL.
+ */
 void i915_gem_track_fb(struct drm_i915_gem_object *old,
 		       struct drm_i915_gem_object *new,
 		       unsigned frontbuffer_bits)
@@ -5302,7 +5328,7 @@ i915_gem_shrinker_oom(struct notifier_block *nb, unsigned long event, void *ptr)
 	struct drm_device *dev = dev_priv->dev;
 	struct drm_i915_gem_object *obj;
 	unsigned long timeout = msecs_to_jiffies(5000) + 1;
-	unsigned long pinned, bound, unbound, freed;
+	unsigned long pinned, bound, unbound, freed_pages;
 	bool was_interruptible;
 	bool unlock;
 
@@ -5319,7 +5345,7 @@ i915_gem_shrinker_oom(struct notifier_block *nb, unsigned long event, void *ptr)
 	was_interruptible = dev_priv->mm.interruptible;
 	dev_priv->mm.interruptible = false;
 
-	freed = i915_gem_shrink_all(dev_priv);
+	freed_pages = i915_gem_shrink_all(dev_priv);
 
 	dev_priv->mm.interruptible = was_interruptible;
 
@@ -5350,14 +5376,15 @@ i915_gem_shrinker_oom(struct notifier_block *nb, unsigned long event, void *ptr)
 	if (unlock)
 		mutex_unlock(&dev->struct_mutex);
 
-	pr_info("Purging GPU memory, %lu bytes freed, %lu bytes still pinned.\n",
-		freed, pinned);
+	if (freed_pages || unbound || bound)
+		pr_info("Purging GPU memory, %lu bytes freed, %lu bytes still pinned.\n",
+			freed_pages << PAGE_SHIFT, pinned);
 	if (unbound || bound)
 		pr_err("%lu and %lu bytes still available in the "
 		       "bound and unbound GPU page lists.\n",
 		       bound, unbound);
 
-	*(unsigned long *)ptr += freed;
+	*(unsigned long *)ptr += freed_pages;
 	return NOTIFY_DONE;
 }
 
diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
index a5221d8f1580..d17ff435f276 100644
--- a/drivers/gpu/drm/i915/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/i915_gem_context.c
@@ -88,6 +88,7 @@
 #include <drm/drmP.h>
 #include <drm/i915_drm.h>
 #include "i915_drv.h"
+#include "i915_trace.h"
 
 /* This is a HW constraint. The value below is the largest known requirement
  * I've seen in a spec to date, and that was a workaround for a non-shipping
@@ -137,6 +138,8 @@ void i915_gem_context_free(struct kref *ctx_ref)
 	struct intel_context *ctx = container_of(ctx_ref,
 						 typeof(*ctx), ref);
 
+	trace_i915_context_free(ctx);
+
 	if (i915.enable_execlists)
 		intel_lr_context_free(ctx);
 
@@ -274,6 +277,8 @@ i915_gem_create_context(struct drm_device *dev,
 		ctx->ppgtt = ppgtt;
 	}
 
+	trace_i915_context_create(ctx);
+
 	return ctx;
 
 err_unpin:
@@ -522,6 +527,7 @@ static int do_switch(struct intel_engine_cs *ring,
 	struct intel_context *from = ring->last_context;
 	u32 hw_flags = 0;
 	bool uninitialized = false;
+	struct i915_vma *vma;
 	int ret, i;
 
 	if (from != NULL && ring == &dev_priv->ring[RCS]) {
@@ -548,6 +554,7 @@ static int do_switch(struct intel_engine_cs *ring,
 	from = ring->last_context;
 
 	if (to->ppgtt) {
+		trace_switch_mm(ring, to);
 		ret = to->ppgtt->switch_mm(to->ppgtt, ring);
 		if (ret)
 			goto unpin_out;
@@ -571,11 +578,10 @@ static int do_switch(struct intel_engine_cs *ring,
 	if (ret)
 		goto unpin_out;
 
-	if (!to->legacy_hw_ctx.rcs_state->has_global_gtt_mapping) {
-		struct i915_vma *vma = i915_gem_obj_to_vma(to->legacy_hw_ctx.rcs_state,
-							   &dev_priv->gtt.base);
-		vma->bind_vma(vma, to->legacy_hw_ctx.rcs_state->cache_level, GLOBAL_BIND);
-	}
+	vma = i915_gem_obj_to_ggtt(to->legacy_hw_ctx.rcs_state);
+	if (!(vma->bound & GLOBAL_BIND))
+		vma->bind_vma(vma, to->legacy_hw_ctx.rcs_state->cache_level,
+				GLOBAL_BIND);
 
 	if (!to->legacy_hw_ctx.initialized || i915_gem_context_is_default(to))
 		hw_flags |= MI_RESTORE_INHIBIT;
@@ -629,7 +635,7 @@ done:
 
 	if (uninitialized) {
 		if (ring->init_context) {
-			ret = ring->init_context(ring);
+			ret = ring->init_context(ring, to);
 			if (ret)
 				DRM_ERROR("ring init context: %d\n", ret);
 		}
diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
index 1a0611bb576b..f06027ba3ee5 100644
--- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
@@ -121,6 +121,9 @@ eb_lookup_vmas(struct eb_vmas *eb,
 			goto err;
 		}
 
+		WARN_ONCE(obj->base.dumb,
+			  "GPU use of dumb buffer is illegal.\n");
+
 		drm_gem_object_reference(&obj->base);
 		list_add_tail(&obj->obj_exec_link, &objects);
 	}
@@ -357,12 +360,9 @@ i915_gem_execbuffer_relocate_entry(struct drm_i915_gem_object *obj,
 	 * through the ppgtt for non_secure batchbuffers. */
 	if (unlikely(IS_GEN6(dev) &&
 	    reloc->write_domain == I915_GEM_DOMAIN_INSTRUCTION &&
-	    !target_i915_obj->has_global_gtt_mapping)) {
-		struct i915_vma *vma =
-			list_first_entry(&target_i915_obj->vma_list,
-					 typeof(*vma), vma_link);
-		vma->bind_vma(vma, target_i915_obj->cache_level, GLOBAL_BIND);
-	}
+	    !(target_vma->bound & GLOBAL_BIND)))
+		target_vma->bind_vma(target_vma, target_i915_obj->cache_level,
+				GLOBAL_BIND);
 
 	/* Validate that the target is in a valid r/w GPU domain */
 	if (unlikely(reloc->write_domain & (reloc->write_domain - 1))) {
@@ -531,7 +531,7 @@ i915_gem_execbuffer_reserve_vma(struct i915_vma *vma,
 
 	flags = 0;
 	if (entry->flags & __EXEC_OBJECT_NEEDS_MAP)
-		flags |= PIN_MAPPABLE;
+		flags |= PIN_GLOBAL | PIN_MAPPABLE;
 	if (entry->flags & EXEC_OBJECT_NEEDS_GTT)
 		flags |= PIN_GLOBAL;
 	if (entry->flags & __EXEC_OBJECT_NEEDS_BIAS)
@@ -1023,6 +1023,47 @@ i915_reset_gen7_sol_offsets(struct drm_device *dev,
 	return 0;
 }
 
+static int
+i915_emit_box(struct intel_engine_cs *ring,
+	      struct drm_clip_rect *box,
+	      int DR1, int DR4)
+{
+	int ret;
+
+	if (box->y2 <= box->y1 || box->x2 <= box->x1 ||
+	    box->y2 <= 0 || box->x2 <= 0) {
+		DRM_ERROR("Bad box %d,%d..%d,%d\n",
+			  box->x1, box->y1, box->x2, box->y2);
+		return -EINVAL;
+	}
+
+	if (INTEL_INFO(ring->dev)->gen >= 4) {
+		ret = intel_ring_begin(ring, 4);
+		if (ret)
+			return ret;
+
+		intel_ring_emit(ring, GFX_OP_DRAWRECT_INFO_I965);
+		intel_ring_emit(ring, (box->x1 & 0xffff) | box->y1 << 16);
+		intel_ring_emit(ring, ((box->x2 - 1) & 0xffff) | (box->y2 - 1) << 16);
+		intel_ring_emit(ring, DR4);
+	} else {
+		ret = intel_ring_begin(ring, 6);
+		if (ret)
+			return ret;
+
+		intel_ring_emit(ring, GFX_OP_DRAWRECT_INFO);
+		intel_ring_emit(ring, DR1);
+		intel_ring_emit(ring, (box->x1 & 0xffff) | box->y1 << 16);
+		intel_ring_emit(ring, ((box->x2 - 1) & 0xffff) | (box->y2 - 1) << 16);
+		intel_ring_emit(ring, DR4);
+		intel_ring_emit(ring, 0);
+	}
+	intel_ring_advance(ring);
+
+	return 0;
+}
+
+
 int
 i915_gem_ringbuffer_submission(struct drm_device *dev, struct drm_file *file,
 			       struct intel_engine_cs *ring,
@@ -1151,7 +1192,7 @@ i915_gem_ringbuffer_submission(struct drm_device *dev, struct drm_file *file,
 	exec_len = args->batch_len;
 	if (cliprects) {
 		for (i = 0; i < args->num_cliprects; i++) {
-			ret = i915_emit_box(dev, &cliprects[i],
+			ret = i915_emit_box(ring, &cliprects[i],
 					    args->DR1, args->DR4);
 			if (ret)
 				goto error;
@@ -1300,12 +1341,6 @@ i915_gem_do_execbuffer(struct drm_device *dev, void *data,
 	if (ret)
 		goto pre_mutex_err;
 
-	if (dev_priv->ums.mm_suspended) {
-		mutex_unlock(&dev->struct_mutex);
-		ret = -EBUSY;
-		goto pre_mutex_err;
-	}
-
 	ctx = i915_gem_validate_context(dev, file, ring, ctx_id);
 	if (IS_ERR(ctx)) {
 		mutex_unlock(&dev->struct_mutex);
@@ -1368,17 +1403,19 @@ i915_gem_do_execbuffer(struct drm_device *dev, void *data,
 				      batch_obj,
 				      args->batch_start_offset,
 				      file->is_master);
-		if (ret)
-			goto err;
-
-		/*
-		 * XXX: Actually do this when enabling batch copy...
-		 *
-		 * Set the DISPATCH_SECURE bit to remove the NON_SECURE bit
-		 * from MI_BATCH_BUFFER_START commands issued in the
-		 * dispatch_execbuffer implementations. We specifically don't
-		 * want that set when the command parser is enabled.
-		 */
+		if (ret) {
+			if (ret != -EACCES)
+				goto err;
+		} else {
+			/*
+			 * XXX: Actually do this when enabling batch copy...
+			 *
+			 * Set the DISPATCH_SECURE bit to remove the NON_SECURE bit
+			 * from MI_BATCH_BUFFER_START commands issued in the
+			 * dispatch_execbuffer implementations. We specifically don't
+			 * want that set when the command parser is enabled.
+			 */
+		}
 	}
 
 	/* snb/ivb/vlv conflate the "batch in ppgtt" bit with the "non-secure
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index 728938f02341..171f6eafdeee 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -35,13 +35,26 @@ static void chv_setup_private_ppat(struct drm_i915_private *dev_priv);
 
 static int sanitize_enable_ppgtt(struct drm_device *dev, int enable_ppgtt)
 {
-	if (enable_ppgtt == 0 || !HAS_ALIASING_PPGTT(dev))
+	bool has_aliasing_ppgtt;
+	bool has_full_ppgtt;
+
+	has_aliasing_ppgtt = INTEL_INFO(dev)->gen >= 6;
+	has_full_ppgtt = INTEL_INFO(dev)->gen >= 7;
+	if (IS_GEN8(dev))
+		has_full_ppgtt = false; /* XXX why? */
+
+	/*
+	 * We don't allow disabling PPGTT for gen9+ as it's a requirement for
+	 * execlists, the sole mechanism available to submit work.
+	 */
+	if (INTEL_INFO(dev)->gen < 9 &&
+	    (enable_ppgtt == 0 || !has_aliasing_ppgtt))
 		return 0;
 
 	if (enable_ppgtt == 1)
 		return 1;
 
-	if (enable_ppgtt == 2 && HAS_PPGTT(dev))
+	if (enable_ppgtt == 2 && has_full_ppgtt)
 		return 2;
 
 #ifdef CONFIG_INTEL_IOMMU
@@ -59,7 +72,7 @@ static int sanitize_enable_ppgtt(struct drm_device *dev, int enable_ppgtt)
 		return 0;
 	}
 
-	return HAS_ALIASING_PPGTT(dev) ? 1 : 0;
+	return has_aliasing_ppgtt ? 1 : 0;
 }
 
 
@@ -156,9 +169,6 @@ static gen6_gtt_pte_t byt_pte_encode(dma_addr_t addr,
 	gen6_gtt_pte_t pte = valid ? GEN6_PTE_VALID : 0;
 	pte |= GEN6_PTE_ADDR_ENCODE(addr);
 
-	/* Mark the page as writeable.  Other platforms don't have a
-	 * setting for read-only/writable, so this matches that behavior.
-	 */
 	if (!(flags & PTE_READ_ONLY))
 		pte |= BYT_PTE_WRITEABLE;
 
@@ -1092,7 +1102,7 @@ static int __hw_ppgtt_init(struct drm_device *dev, struct i915_hw_ppgtt *ppgtt)
 
 	if (INTEL_INFO(dev)->gen < 8)
 		return gen6_ppgtt_init(ppgtt);
-	else if (IS_GEN8(dev))
+	else if (IS_GEN8(dev) || IS_GEN9(dev))
 		return gen8_ppgtt_init(ppgtt, dev_priv->gtt.base.total);
 	else
 		BUG();
@@ -1166,6 +1176,8 @@ i915_ppgtt_create(struct drm_device *dev, struct drm_i915_file_private *fpriv)
 
 	ppgtt->file_priv = fpriv;
 
+	trace_i915_ppgtt_create(&ppgtt->base);
+
 	return ppgtt;
 }
 
@@ -1174,6 +1186,8 @@ void  i915_ppgtt_release(struct kref *kref)
 	struct i915_hw_ppgtt *ppgtt =
 		container_of(kref, struct i915_hw_ppgtt, ref);
 
+	trace_i915_ppgtt_release(&ppgtt->base);
+
 	/* vmas should already be unbound */
 	WARN_ON(!list_empty(&ppgtt->base.active_list));
 	WARN_ON(!list_empty(&ppgtt->base.inactive_list));
@@ -1258,7 +1272,7 @@ void i915_check_and_clear_faults(struct drm_device *dev)
 		fault_reg = I915_READ(RING_FAULT_REG(ring));
 		if (fault_reg & RING_FAULT_VALID) {
 			DRM_DEBUG_DRIVER("Unexpected fault\n"
-					 "\tAddr: 0x%08lx\\n"
+					 "\tAddr: 0x%08lx\n"
 					 "\tAddress space: %s\n"
 					 "\tSource ID: %d\n"
 					 "\tType: %d\n",
@@ -1328,7 +1342,7 @@ void i915_gem_restore_gtt_mappings(struct drm_device *dev)
 		 * Unfortunately above, we've just wiped out the mappings
 		 * without telling our object about it. So we need to fake it.
 		 */
-		obj->has_global_gtt_mapping = 0;
+		vma->bound &= ~GLOBAL_BIND;
 		vma->bind_vma(vma, obj->cache_level, GLOBAL_BIND);
 	}
 
@@ -1525,7 +1539,7 @@ static void i915_ggtt_bind_vma(struct i915_vma *vma,
 
 	BUG_ON(!i915_is_ggtt(vma->vm));
 	intel_gtt_insert_sg_entries(vma->obj->pages, entry, flags);
-	vma->obj->has_global_gtt_mapping = 1;
+	vma->bound = GLOBAL_BIND;
 }
 
 static void i915_ggtt_clear_range(struct i915_address_space *vm,
@@ -1544,7 +1558,7 @@ static void i915_ggtt_unbind_vma(struct i915_vma *vma)
 	const unsigned int size = vma->obj->base.size >> PAGE_SHIFT;
 
 	BUG_ON(!i915_is_ggtt(vma->vm));
-	vma->obj->has_global_gtt_mapping = 0;
+	vma->bound = 0;
 	intel_gtt_clear_range(first, size);
 }
 
@@ -1572,24 +1586,24 @@ static void ggtt_bind_vma(struct i915_vma *vma,
 	 * flags. At all other times, the GPU will use the aliasing PPGTT.
 	 */
 	if (!dev_priv->mm.aliasing_ppgtt || flags & GLOBAL_BIND) {
-		if (!obj->has_global_gtt_mapping ||
+		if (!(vma->bound & GLOBAL_BIND) ||
 		    (cache_level != obj->cache_level)) {
 			vma->vm->insert_entries(vma->vm, obj->pages,
 						vma->node.start,
 						cache_level, flags);
-			obj->has_global_gtt_mapping = 1;
+			vma->bound |= GLOBAL_BIND;
 		}
 	}
 
 	if (dev_priv->mm.aliasing_ppgtt &&
-	    (!obj->has_aliasing_ppgtt_mapping ||
+	    (!(vma->bound & LOCAL_BIND) ||
 	     (cache_level != obj->cache_level))) {
 		struct i915_hw_ppgtt *appgtt = dev_priv->mm.aliasing_ppgtt;
 		appgtt->base.insert_entries(&appgtt->base,
 					    vma->obj->pages,
 					    vma->node.start,
 					    cache_level, flags);
-		vma->obj->has_aliasing_ppgtt_mapping = 1;
+		vma->bound |= LOCAL_BIND;
 	}
 }
 
@@ -1599,21 +1613,21 @@ static void ggtt_unbind_vma(struct i915_vma *vma)
 	struct drm_i915_private *dev_priv = dev->dev_private;
 	struct drm_i915_gem_object *obj = vma->obj;
 
-	if (obj->has_global_gtt_mapping) {
+	if (vma->bound & GLOBAL_BIND) {
 		vma->vm->clear_range(vma->vm,
 				     vma->node.start,
 				     obj->base.size,
 				     true);
-		obj->has_global_gtt_mapping = 0;
+		vma->bound &= ~GLOBAL_BIND;
 	}
 
-	if (obj->has_aliasing_ppgtt_mapping) {
+	if (vma->bound & LOCAL_BIND) {
 		struct i915_hw_ppgtt *appgtt = dev_priv->mm.aliasing_ppgtt;
 		appgtt->base.clear_range(&appgtt->base,
 					 vma->node.start,
 					 obj->base.size,
 					 true);
-		obj->has_aliasing_ppgtt_mapping = 0;
+		vma->bound &= ~LOCAL_BIND;
 	}
 }
 
@@ -1650,10 +1664,10 @@ static void i915_gtt_color_adjust(struct drm_mm_node *node,
 	}
 }
 
-int i915_gem_setup_global_gtt(struct drm_device *dev,
-			      unsigned long start,
-			      unsigned long mappable_end,
-			      unsigned long end)
+static int i915_gem_setup_global_gtt(struct drm_device *dev,
+				     unsigned long start,
+				     unsigned long mappable_end,
+				     unsigned long end)
 {
 	/* Let GEM Manage all of the aperture.
 	 *
@@ -1691,7 +1705,7 @@ int i915_gem_setup_global_gtt(struct drm_device *dev,
 			DRM_DEBUG_KMS("Reservation failed: %i\n", ret);
 			return ret;
 		}
-		obj->has_global_gtt_mapping = 1;
+		vma->bound |= GLOBAL_BIND;
 	}
 
 	dev_priv->gtt.base.start = start;
@@ -1764,7 +1778,6 @@ static int setup_scratch_page(struct drm_device *dev)
 	page = alloc_page(GFP_KERNEL | GFP_DMA32 | __GFP_ZERO);
 	if (page == NULL)
 		return -ENOMEM;
-	get_page(page);
 	set_pages_uc(page, 1);
 
 #ifdef CONFIG_INTEL_IOMMU
@@ -1789,7 +1802,6 @@ static void teardown_scratch_page(struct drm_device *dev)
 	set_pages_wb(page, 1);
 	pci_unmap_page(dev->pdev, dev_priv->gtt.base.scratch.addr,
 		       PAGE_SIZE, PCI_DMA_BIDIRECTIONAL);
-	put_page(page);
 	__free_page(page);
 }
 
@@ -1859,6 +1871,18 @@ static size_t chv_get_stolen_size(u16 gmch_ctrl)
 		return (gmch_ctrl - 0x17 + 9) << 22;
 }
 
+static size_t gen9_get_stolen_size(u16 gen9_gmch_ctl)
+{
+	gen9_gmch_ctl >>= BDW_GMCH_GMS_SHIFT;
+	gen9_gmch_ctl &= BDW_GMCH_GMS_MASK;
+
+	if (gen9_gmch_ctl < 0xf0)
+		return gen9_gmch_ctl << 25; /* 32 MB units */
+	else
+		/* 4MB increments starting at 0xf0 for 4MB */
+		return (gen9_gmch_ctl - 0xf0 + 1) << 22;
+}
+
 static int ggtt_probe_common(struct drm_device *dev,
 			     size_t gtt_size)
 {
@@ -1934,9 +1958,17 @@ static void chv_setup_private_ppat(struct drm_i915_private *dev_priv)
 	 * Only the snoop bit has meaning for CHV, the rest is
 	 * ignored.
 	 *
-	 * Note that the harware enforces snooping for all page
-	 * table accesses. The snoop bit is actually ignored for
-	 * PDEs.
+	 * The hardware will never snoop for certain types of accesses:
+	 * - CPU GTT (GMADR->GGTT->no snoop->memory)
+	 * - PPGTT page tables
+	 * - some other special cycles
+	 *
+	 * As with BDW, we also need to consider the following for GT accesses:
+	 * "For GGTT, there is NO pat_sel[2:0] from the entry,
+	 * so RTL will always use the value corresponding to
+	 * pat_sel = 000".
+	 * Which means we must set the snoop bit in PAT entry 0
+	 * in order to keep the global status page working.
 	 */
 	pat = GEN8_PPAT(0, CHV_PPAT_SNOOP) |
 	      GEN8_PPAT(1, 0) |
@@ -1971,7 +2003,10 @@ static int gen8_gmch_probe(struct drm_device *dev,
 
 	pci_read_config_word(dev->pdev, SNB_GMCH_CTRL, &snb_gmch_ctl);
 
-	if (IS_CHERRYVIEW(dev)) {
+	if (INTEL_INFO(dev)->gen >= 9) {
+		*stolen = gen9_get_stolen_size(snb_gmch_ctl);
+		gtt_size = gen8_get_total_gtt_size(snb_gmch_ctl);
+	} else if (IS_CHERRYVIEW(dev)) {
 		*stolen = chv_get_stolen_size(snb_gmch_ctl);
 		gtt_size = chv_get_total_gtt_size(snb_gmch_ctl);
 	} else {
@@ -2143,6 +2178,7 @@ static struct i915_vma *__i915_gem_vma_create(struct drm_i915_gem_object *obj,
 	vma->obj = obj;
 
 	switch (INTEL_INFO(vm->dev)->gen) {
+	case 9:
 	case 8:
 	case 7:
 	case 6:
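
Note on the gen9 stolen-size decode added above in gen9_get_stolen_size(): the GMS field is read as a count of 32 MB units below 0xf0, and as 4 MB increments from 0xf0 upward (0xf0 meaning 4 MB). The following is a minimal standalone sketch of that arithmetic, not the driver code. The shift and mask values are assumptions, since this diff uses BDW_GMCH_GMS_SHIFT and BDW_GMCH_GMS_MASK without showing their definitions, and the name stolen_size_sketch is hypothetical. The sketch widens to 64 bits before shifting so that large field values do not overflow.

```c
#include <stdint.h>

/* Assumed field layout; the diff does not show these definitions. */
#define SKETCH_GMS_SHIFT 8
#define SKETCH_GMS_MASK  0xff

/* Hypothetical helper: decode a GMCH control word into a byte count. */
static uint64_t stolen_size_sketch(uint16_t gmch_ctl)
{
	uint64_t gms = (gmch_ctl >> SKETCH_GMS_SHIFT) & SKETCH_GMS_MASK;

	if (gms < 0xf0)
		return gms << 25;	/* 32 MB units */

	/* 4 MB increments, with 0xf0 meaning 4 MB */
	return (gms - 0xf0 + 1) << 22;
}
```

Under these assumptions a field value of 0x01 decodes to 32 MB and 0xf0 decodes to 4 MB.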
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.h b/drivers/gpu/drm/i915/i915_gem_gtt.h
index d5c14af51e99..beaf4bcfdac8 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.h
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.h
@@ -123,6 +123,12 @@ struct i915_vma {
 	struct drm_i915_gem_object *obj;
 	struct i915_address_space *vm;
 
+	/** Flags and address space this VMA is bound to */
+#define GLOBAL_BIND	(1<<0)
+#define LOCAL_BIND	(1<<1)
+#define PTE_READ_ONLY	(1<<2)
+	unsigned int bound : 4;
+
 	/** This object's place on the active/inactive lists */
 	struct list_head mm_list;
 
@@ -155,8 +161,6 @@ struct i915_vma {
 	 * setting the valid PTE entries to a reserved scratch page. */
 	void (*unbind_vma)(struct i915_vma *vma);
 	/* Map an object into an address space with the given cache flags. */
-#define GLOBAL_BIND (1<<0)
-#define PTE_READ_ONLY (1<<1)
 	void (*bind_vma)(struct i915_vma *vma,
 			 enum i915_cache_level cache_level,
 			 u32 flags);
@@ -270,8 +274,6 @@ struct i915_hw_ppgtt {
 
 int i915_gem_gtt_init(struct drm_device *dev);
 void i915_gem_init_global_gtt(struct drm_device *dev);
-int i915_gem_setup_global_gtt(struct drm_device *dev, unsigned long start,
-			      unsigned long mappable_end, unsigned long end);
 void i915_global_gtt_cleanup(struct drm_device *dev);
 
 
diff --git a/drivers/gpu/drm/i915/i915_gem_render_state.c b/drivers/gpu/drm/i915/i915_gem_render_state.c
index a9a62d75aa57..98dcd94acba8 100644
--- a/drivers/gpu/drm/i915/i915_gem_render_state.c
+++ b/drivers/gpu/drm/i915/i915_gem_render_state.c
@@ -38,6 +38,8 @@ render_state_get_rodata(struct drm_device *dev, const int gen)
 		return &gen7_null_state;
 	case 8:
 		return &gen8_null_state;
+	case 9:
+		return &gen9_null_state;
 	}
 
 	return NULL;
diff --git a/drivers/gpu/drm/i915/i915_gem_stolen.c b/drivers/gpu/drm/i915/i915_gem_stolen.c
index 85fda6b803e4..a2045848bd1a 100644
--- a/drivers/gpu/drm/i915/i915_gem_stolen.c
+++ b/drivers/gpu/drm/i915/i915_gem_stolen.c
@@ -137,7 +137,11 @@ static unsigned long i915_stolen_to_physical(struct drm_device *dev)
 		r = devm_request_mem_region(dev->dev, base + 1,
 					    dev_priv->gtt.stolen_size - 1,
 					    "Graphics Stolen Memory");
-		if (r == NULL) {
+		/*
+		 * GEN3 firmware likes to smash pci bridges into the stolen
+		 * range. Apparently this works.
+		 */
+		if (r == NULL && !IS_GEN3(dev)) {
 			DRM_ERROR("conflict detected with stolen region: [0x%08x - 0x%08x]\n",
 				  base, base + (uint32_t)dev_priv->gtt.stolen_size);
 			base = 0;
@@ -533,7 +537,7 @@ i915_gem_object_create_stolen_for_preallocated(struct drm_device *dev,
 		}
 	}
 
-	obj->has_global_gtt_mapping = 1;
+	vma->bound |= GLOBAL_BIND;
 
 	list_add_tail(&obj->global_list, &dev_priv->mm.bound_list);
 	list_add_tail(&vma->mm_list, &ggtt->inactive_list);
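
Note on the vma->bound conversion seen throughout these hunks: the two per-object booleans (has_global_gtt_mapping, has_aliasing_ppgtt_mapping) become bits in a per-VMA bitfield, set with |= and cleared with &= ~. A minimal standalone sketch of that pattern follows. The flag values mirror the ones added to i915_gem_gtt.h in this diff; the struct and helper names are hypothetical and exist only for illustration.

```c
#include <stdbool.h>

#define GLOBAL_BIND	(1 << 0)
#define LOCAL_BIND	(1 << 1)

/* Hypothetical stand-in for the relevant part of struct i915_vma. */
struct vma_sketch {
	unsigned int bound : 4;
};

static void sketch_bind(struct vma_sketch *vma, unsigned int flags)
{
	/* Record each address space the VMA is now mapped into. */
	vma->bound |= flags & (GLOBAL_BIND | LOCAL_BIND);
}

static void sketch_unbind(struct vma_sketch *vma, unsigned int flags)
{
	/* Clear only the requested bits; other bindings stay recorded. */
	vma->bound &= ~flags;
}

static bool sketch_is_bound(const struct vma_sketch *vma, unsigned int flag)
{
	return (vma->bound & flag) != 0;
}
```

Keeping the state on the VMA rather than on the object lets the global and aliasing-PPGTT bindings be tracked independently per mapping, which is what the ggtt_bind_vma() and ggtt_unbind_vma() hunks rely on.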
diff --git a/drivers/gpu/drm/i915/i915_gem_tiling.c b/drivers/gpu/drm/i915/i915_gem_tiling.c
index 2b1eaa29ada4..4727a4e2c87c 100644
--- a/drivers/gpu/drm/i915/i915_gem_tiling.c
+++ b/drivers/gpu/drm/i915/i915_gem_tiling.c
@@ -102,22 +102,33 @@ i915_gem_detect_bit_6_swizzle(struct drm_device *dev)
 		swizzle_x = I915_BIT_6_SWIZZLE_NONE;
 		swizzle_y = I915_BIT_6_SWIZZLE_NONE;
 	} else if (INTEL_INFO(dev)->gen >= 6) {
-		uint32_t dimm_c0, dimm_c1;
-		dimm_c0 = I915_READ(MAD_DIMM_C0);
-		dimm_c1 = I915_READ(MAD_DIMM_C1);
-		dimm_c0 &= MAD_DIMM_A_SIZE_MASK | MAD_DIMM_B_SIZE_MASK;
-		dimm_c1 &= MAD_DIMM_A_SIZE_MASK | MAD_DIMM_B_SIZE_MASK;
-		/* Enable swizzling when the channels are populated with
-		 * identically sized dimms. We don't need to check the 3rd
-		 * channel because no cpu with gpu attached ships in that
-		 * configuration. Also, swizzling only makes sense for 2
-		 * channels anyway. */
-		if (dimm_c0 == dimm_c1) {
-			swizzle_x = I915_BIT_6_SWIZZLE_9_10;
-			swizzle_y = I915_BIT_6_SWIZZLE_9;
+		if (dev_priv->preserve_bios_swizzle) {
+			if (I915_READ(DISP_ARB_CTL) &
+			    DISP_TILE_SURFACE_SWIZZLING) {
+				swizzle_x = I915_BIT_6_SWIZZLE_9_10;
+				swizzle_y = I915_BIT_6_SWIZZLE_9;
+			} else {
+				swizzle_x = I915_BIT_6_SWIZZLE_NONE;
+				swizzle_y = I915_BIT_6_SWIZZLE_NONE;
+			}
 		} else {
-			swizzle_x = I915_BIT_6_SWIZZLE_NONE;
-			swizzle_y = I915_BIT_6_SWIZZLE_NONE;
+			uint32_t dimm_c0, dimm_c1;
+			dimm_c0 = I915_READ(MAD_DIMM_C0);
+			dimm_c1 = I915_READ(MAD_DIMM_C1);
+			dimm_c0 &= MAD_DIMM_A_SIZE_MASK | MAD_DIMM_B_SIZE_MASK;
+			dimm_c1 &= MAD_DIMM_A_SIZE_MASK | MAD_DIMM_B_SIZE_MASK;
+			/* Enable swizzling when the channels are populated
+			 * with identically sized dimms. We don't need to check
+			 * the 3rd channel because no cpu with gpu attached
+			 * ships in that configuration. Also, swizzling only
+			 * makes sense for 2 channels anyway. */
+			if (dimm_c0 == dimm_c1) {
+				swizzle_x = I915_BIT_6_SWIZZLE_9_10;
+				swizzle_y = I915_BIT_6_SWIZZLE_9;
+			} else {
+				swizzle_x = I915_BIT_6_SWIZZLE_NONE;
+				swizzle_y = I915_BIT_6_SWIZZLE_NONE;
+			}
 		}
 	} else if (IS_GEN5(dev)) {
 		/* On Ironlake whatever DRAM config, GPU always do
@@ -167,6 +178,15 @@ i915_gem_detect_bit_6_swizzle(struct drm_device *dev)
 			}
 			break;
 		}
+
+		/* check for L-shaped memory aka modified enhanced addressing */
+		if (IS_GEN4(dev)) {
+			uint32_t ddc2 = I915_READ(DCC2);
+
+			if (!(ddc2 & DCC2_MODIFIED_ENHANCED_DISABLE))
+				dev_priv->quirks |= QUIRK_PIN_SWIZZLED_PAGES;
+		}
+
 		if (dcc == 0xffffffff) {
 			DRM_ERROR("Couldn't read from MCHBAR.  "
 				  "Disabling tiling.\n");
@@ -369,6 +389,15 @@ i915_gem_set_tiling(struct drm_device *dev, void *data,
 			ret = i915_gem_object_ggtt_unbind(obj);
 
 		if (ret == 0) {
+			if (obj->pages &&
+			    obj->madv == I915_MADV_WILLNEED &&
+			    dev_priv->quirks & QUIRK_PIN_SWIZZLED_PAGES) {
+				if (args->tiling_mode == I915_TILING_NONE)
+					i915_gem_object_unpin_pages(obj);
+				if (obj->tiling_mode == I915_TILING_NONE)
+					i915_gem_object_pin_pages(obj);
+			}
+
 			obj->fence_dirty =
 				obj->last_fenced_seqno ||
 				obj->fence_reg != I915_FENCE_REG_NONE;
@@ -434,6 +463,7 @@ i915_gem_get_tiling(struct drm_device *dev, void *data,
 	}
 
 	/* Hide bit 17 from the user -- see comment in i915_gem_set_tiling */
+	args->phys_swizzle_mode = args->swizzle_mode;
 	if (args->swizzle_mode == I915_BIT_6_SWIZZLE_9_17)
 		args->swizzle_mode = I915_BIT_6_SWIZZLE_9;
 	if (args->swizzle_mode == I915_BIT_6_SWIZZLE_9_10_17)
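
Note on the phys_swizzle_mode line added to i915_gem_get_tiling() above: the detected mode is copied into phys_swizzle_mode before the bit-17 variants are hidden from swizzle_mode. A rough standalone sketch of that ordering follows. The enum, struct and function names are hypothetical. The mapping of the 9_10_17 mode to 9_10 is an assumption, because the hunk context ends before that assignment is shown.

```c
/* Hypothetical local enum; not the uapi I915_BIT_6_SWIZZLE_* values. */
enum swizzle_sketch {
	SWZ_NONE,
	SWZ_9,
	SWZ_9_10,
	SWZ_9_17,
	SWZ_9_10_17,
};

struct get_tiling_sketch {
	enum swizzle_sketch swizzle_mode;	/* what userspace is told */
	enum swizzle_sketch phys_swizzle_mode;	/* what was detected */
};

static void report_swizzle_sketch(struct get_tiling_sketch *args,
				  enum swizzle_sketch detected)
{
	args->swizzle_mode = detected;

	/* Save the unmodified mode first, then hide bit 17. */
	args->phys_swizzle_mode = args->swizzle_mode;
	if (args->swizzle_mode == SWZ_9_17)
		args->swizzle_mode = SWZ_9;
	if (args->swizzle_mode == SWZ_9_10_17)
		args->swizzle_mode = SWZ_9_10;	/* assumed mapping */
}
```

The order matters: copying after the two checks would lose the bit-17 information the new field is meant to carry.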
diff --git a/drivers/gpu/drm/i915/i915_gpu_error.c b/drivers/gpu/drm/i915/i915_gpu_error.c
index 2c87a797213f..cdaee6ce05f8 100644
--- a/drivers/gpu/drm/i915/i915_gpu_error.c
+++ b/drivers/gpu/drm/i915/i915_gpu_error.c
@@ -242,11 +242,15 @@ static const char *hangcheck_action_to_str(enum intel_ring_hangcheck_action a)
 
 static void i915_ring_error_state(struct drm_i915_error_state_buf *m,
 				  struct drm_device *dev,
-				  struct drm_i915_error_ring *ring)
+				  struct drm_i915_error_state *error,
+				  int ring_idx)
 {
+	struct drm_i915_error_ring *ring = &error->ring[ring_idx];
+
 	if (!ring->valid)
 		return;
 
+	err_printf(m, "%s command stream:\n", ring_str(ring_idx));
 	err_printf(m, "  HEAD: 0x%08x\n", ring->head);
 	err_printf(m, "  TAIL: 0x%08x\n", ring->tail);
 	err_printf(m, "  CTL: 0x%08x\n", ring->ctl);
@@ -388,10 +392,8 @@ int i915_error_state_to_str(struct drm_i915_error_state_buf *m,
 	if (INTEL_INFO(dev)->gen == 7)
 		err_printf(m, "ERR_INT: 0x%08x\n", error->err_int);
 
-	for (i = 0; i < ARRAY_SIZE(error->ring); i++) {
-		err_printf(m, "%s command stream:\n", ring_str(i));
-		i915_ring_error_state(m, dev, &error->ring[i]);
-	}
+	for (i = 0; i < ARRAY_SIZE(error->ring); i++)
+		i915_ring_error_state(m, dev, error, i);
 
 	for (i = 0; i < error->vm_count; i++) {
 		err_printf(m, "vm[%d]\n", i);
@@ -565,6 +567,7 @@ i915_error_object_create(struct drm_i915_private *dev_priv,
 			 struct i915_address_space *vm)
 {
 	struct drm_i915_error_object *dst;
+	struct i915_vma *vma = NULL;
 	int num_pages;
 	bool use_ggtt;
 	int i = 0;
@@ -585,16 +588,17 @@ i915_error_object_create(struct drm_i915_private *dev_priv,
 		dst->gtt_offset = -1;
 
 	reloc_offset = dst->gtt_offset;
+	if (i915_is_ggtt(vm))
+		vma = i915_gem_obj_to_ggtt(src);
 	use_ggtt = (src->cache_level == I915_CACHE_NONE &&
-		    i915_is_ggtt(vm) &&
-		    src->has_global_gtt_mapping &&
-		    reloc_offset + num_pages * PAGE_SIZE <= dev_priv->gtt.mappable_end);
+		   vma && (vma->bound & GLOBAL_BIND) &&
+		   reloc_offset + num_pages * PAGE_SIZE <= dev_priv->gtt.mappable_end);
 
 	/* Cannot access stolen address directly, try to use the aperture */
 	if (src->stolen) {
 		use_ggtt = true;
 
-		if (!src->has_global_gtt_mapping)
+		if (!(vma && vma->bound & GLOBAL_BIND))
 			goto unwind;
 
 		reloc_offset = i915_gem_obj_ggtt_offset(src);
@@ -765,6 +769,7 @@ static void i915_gem_record_fences(struct drm_device *dev,
 
 	/* Fences */
 	switch (INTEL_INFO(dev)->gen) {
+	case 9:
 	case 8:
 	case 7:
 	case 6:
@@ -804,9 +809,8 @@ static void gen8_record_semaphore_state(struct drm_i915_private *dev_priv,
 
 	if (!error->semaphore_obj)
 		error->semaphore_obj =
-			i915_error_object_create(dev_priv,
-						 dev_priv->semaphore_obj,
-						 &dev_priv->gtt.base);
+			i915_error_ggtt_object_create(dev_priv,
+						      dev_priv->semaphore_obj);
 
 	for_each_ring(to, dev_priv, i) {
 		int idx;
@@ -923,6 +927,7 @@ static void i915_record_ring_state(struct drm_device *dev,
 		ering->vm_info.gfx_mode = I915_READ(RING_MODE_GEN7(ring));
 
 		switch (INTEL_INFO(dev)->gen) {
+		case 9:
 		case 8:
 			for (i = 0; i < 4; i++) {
 				ering->vm_info.pdp[i] =
@@ -1238,7 +1243,8 @@ static void i915_error_capture_msg(struct drm_device *dev,
 	ecode = i915_error_generate_code(dev_priv, error, &ring_id);
 
 	len = scnprintf(error->error_msg, sizeof(error->error_msg),
-			"GPU HANG: ecode %d:0x%08x", ring_id, ecode);
+			"GPU HANG: ecode %d:%d:0x%08x",
+			INTEL_INFO(dev)->gen, ring_id, ecode);
 
 	if (ring_id != -1 && error->ring[ring_id].pid != -1)
 		len += scnprintf(error->error_msg + len,
@@ -1326,13 +1332,12 @@ void i915_error_state_get(struct drm_device *dev,
 			  struct i915_error_state_file_priv *error_priv)
 {
 	struct drm_i915_private *dev_priv = dev->dev_private;
-	unsigned long flags;
 
-	spin_lock_irqsave(&dev_priv->gpu_error.lock, flags);
+	spin_lock_irq(&dev_priv->gpu_error.lock);
 	error_priv->error = dev_priv->gpu_error.first_error;
 	if (error_priv->error)
 		kref_get(&error_priv->error->ref);
-	spin_unlock_irqrestore(&dev_priv->gpu_error.lock, flags);
+	spin_unlock_irq(&dev_priv->gpu_error.lock);
 
 }
 
@@ -1346,12 +1351,11 @@ void i915_destroy_error_state(struct drm_device *dev)
 {
 	struct drm_i915_private *dev_priv = dev->dev_private;
 	struct drm_i915_error_state *error;
-	unsigned long flags;
 
-	spin_lock_irqsave(&dev_priv->gpu_error.lock, flags);
+	spin_lock_irq(&dev_priv->gpu_error.lock);
 	error = dev_priv->gpu_error.first_error;
 	dev_priv->gpu_error.first_error = NULL;
-	spin_unlock_irqrestore(&dev_priv->gpu_error.lock, flags);
+	spin_unlock_irq(&dev_priv->gpu_error.lock);
 
 	if (error)
 		kref_put(&error->ref, i915_error_state_free);
@@ -1389,6 +1393,7 @@ void i915_get_extra_instdone(struct drm_device *dev, uint32_t *instdone)
 		WARN_ONCE(1, "Unsupported platform\n");
 	case 7:
 	case 8:
+	case 9:
 		instdone[0] = I915_READ(GEN7_INSTDONE_1);
 		instdone[1] = I915_READ(GEN7_SC_INSTDONE);
 		instdone[2] = I915_READ(GEN7_SAMPLER_INSTDONE);
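
Note on the error message change in i915_error_capture_msg() above: the hang line now carries the hardware generation ahead of the ring id and error code. A small userspace sketch of the new format string follows. It uses snprintf() in place of the kernel's scnprintf(), and the helper name is hypothetical.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper reproducing the new "gen:ring:ecode" layout. */
static int format_hang_msg_sketch(char *buf, size_t len, int gen,
				  int ring_id, unsigned int ecode)
{
	return snprintf(buf, len, "GPU HANG: ecode %d:%d:0x%08x",
			gen, ring_id, ecode);
}
```

With gen 9, ring 0 and code 0x1234abcd this yields "GPU HANG: ecode 9:0:0x1234abcd", so anything that parsed the older two-field form would need to account for the extra leading field.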
diff --git a/drivers/gpu/drm/i915/i915_ioc32.c b/drivers/gpu/drm/i915/i915_ioc32.c
index 2e0613e26251..176de6322e4d 100644
--- a/drivers/gpu/drm/i915/i915_ioc32.c
+++ b/drivers/gpu/drm/i915/i915_ioc32.c
@@ -189,7 +189,6 @@ static drm_ioctl_compat_t *i915_compat_ioctls[] = {
 	[DRM_I915_ALLOC] = compat_i915_alloc
 };
 
-#ifdef CONFIG_COMPAT
 /**
  * Called whenever a 32-bit process running under a 64-bit kernel
  * performs an ioctl on /dev/dri/card<n>.
@@ -218,4 +217,3 @@ long i915_compat_ioctl(struct file *filp, unsigned int cmd, unsigned long arg)
 
 	return ret;
 }
-#endif
diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
index f66392b6e287..981834b0f9b6 100644
--- a/drivers/gpu/drm/i915/i915_irq.c
+++ b/drivers/gpu/drm/i915/i915_irq.c
@@ -37,6 +37,14 @@
 #include "i915_trace.h"
 #include "intel_drv.h"
 
+/**
+ * DOC: interrupt handling
+ *
+ * These functions provide the basic support for enabling and disabling the
+ * interrupt handling support. There's a lot more functionality in i915_irq.c
+ * and related files, but that will be described in separate chapters.
+ */
+
 static const u32 hpd_ibx[] = {
 	[HPD_CRT] = SDE_CRT_HOTPLUG,
 	[HPD_SDVO_B] = SDE_SDVOB_HOTPLUG,
@@ -118,20 +126,22 @@ static const u32 hpd_status_i915[] = { /* i915 and valleyview are the same */
 
 #define GEN8_IRQ_INIT_NDX(type, which, imr_val, ier_val) do { \
 	GEN5_ASSERT_IIR_IS_ZERO(GEN8_##type##_IIR(which)); \
-	I915_WRITE(GEN8_##type##_IMR(which), (imr_val)); \
 	I915_WRITE(GEN8_##type##_IER(which), (ier_val)); \
-	POSTING_READ(GEN8_##type##_IER(which)); \
+	I915_WRITE(GEN8_##type##_IMR(which), (imr_val)); \
+	POSTING_READ(GEN8_##type##_IMR(which)); \
 } while (0)
 
 #define GEN5_IRQ_INIT(type, imr_val, ier_val) do { \
 	GEN5_ASSERT_IIR_IS_ZERO(type##IIR); \
-	I915_WRITE(type##IMR, (imr_val)); \
 	I915_WRITE(type##IER, (ier_val)); \
-	POSTING_READ(type##IER); \
+	I915_WRITE(type##IMR, (imr_val)); \
+	POSTING_READ(type##IMR); \
 } while (0)
 
+static void gen6_rps_irq_handler(struct drm_i915_private *dev_priv, u32 pm_iir);
+
 /* For display hotplug interrupt */
-static void
+void
 ironlake_enable_display_irq(struct drm_i915_private *dev_priv, u32 mask)
 {
 	assert_spin_locked(&dev_priv->irq_lock);
@@ -146,7 +156,7 @@ ironlake_enable_display_irq(struct drm_i915_private *dev_priv, u32 mask)
 	}
 }
 
-static void
+void
 ironlake_disable_display_irq(struct drm_i915_private *dev_priv, u32 mask)
 {
 	assert_spin_locked(&dev_priv->irq_lock);
@@ -192,71 +202,28 @@ void gen5_disable_gt_irq(struct drm_i915_private *dev_priv, uint32_t mask)
 	ilk_update_gt_irq(dev_priv, mask, 0);
 }
 
-/**
-  * snb_update_pm_irq - update GEN6_PMIMR
-  * @dev_priv: driver private
-  * @interrupt_mask: mask of interrupt bits to update
-  * @enabled_irq_mask: mask of interrupt bits to enable
-  */
-static void snb_update_pm_irq(struct drm_i915_private *dev_priv,
-			      uint32_t interrupt_mask,
-			      uint32_t enabled_irq_mask)
-{
-	uint32_t new_val;
-
-	assert_spin_locked(&dev_priv->irq_lock);
-
-	if (WARN_ON(!intel_irqs_enabled(dev_priv)))
-		return;
-
-	new_val = dev_priv->pm_irq_mask;
-	new_val &= ~interrupt_mask;
-	new_val |= (~enabled_irq_mask & interrupt_mask);
-
-	if (new_val != dev_priv->pm_irq_mask) {
-		dev_priv->pm_irq_mask = new_val;
-		I915_WRITE(GEN6_PMIMR, dev_priv->pm_irq_mask);
-		POSTING_READ(GEN6_PMIMR);
-	}
-}
-
-void gen6_enable_pm_irq(struct drm_i915_private *dev_priv, uint32_t mask)
+static u32 gen6_pm_iir(struct drm_i915_private *dev_priv)
 {
-	snb_update_pm_irq(dev_priv, mask, mask);
+	return INTEL_INFO(dev_priv)->gen >= 8 ? GEN8_GT_IIR(2) : GEN6_PMIIR;
 }
 
-void gen6_disable_pm_irq(struct drm_i915_private *dev_priv, uint32_t mask)
+static u32 gen6_pm_imr(struct drm_i915_private *dev_priv)
 {
-	snb_update_pm_irq(dev_priv, mask, 0);
+	return INTEL_INFO(dev_priv)->gen >= 8 ? GEN8_GT_IMR(2) : GEN6_PMIMR;
 }
 
-static bool ivb_can_enable_err_int(struct drm_device *dev)
+static u32 gen6_pm_ier(struct drm_i915_private *dev_priv)
 {
-	struct drm_i915_private *dev_priv = dev->dev_private;
-	struct intel_crtc *crtc;
-	enum pipe pipe;
-
-	assert_spin_locked(&dev_priv->irq_lock);
-
-	for_each_pipe(dev_priv, pipe) {
-		crtc = to_intel_crtc(dev_priv->pipe_to_crtc_mapping[pipe]);
-
-		if (crtc->cpu_fifo_underrun_disabled)
-			return false;
-	}
-
-	return true;
+	return INTEL_INFO(dev_priv)->gen >= 8 ? GEN8_GT_IER(2) : GEN6_PMIER;
 }
 
 /**
-  * bdw_update_pm_irq - update GT interrupt 2
+  * snb_update_pm_irq - update GEN6_PMIMR
   * @dev_priv: driver private
   * @interrupt_mask: mask of interrupt bits to update
   * @enabled_irq_mask: mask of interrupt bits to enable
-  *
-  * Copied from the snb function, updated with relevant register offsets
   */
-static void bdw_update_pm_irq(struct drm_i915_private *dev_priv,
+static void snb_update_pm_irq(struct drm_i915_private *dev_priv,
 			      uint32_t interrupt_mask,
 			      uint32_t enabled_irq_mask)
 {
@@ -264,144 +231,87 @@ static void bdw_update_pm_irq(struct drm_i915_private *dev_priv,
 
 	assert_spin_locked(&dev_priv->irq_lock);
 
-	if (WARN_ON(!intel_irqs_enabled(dev_priv)))
-		return;
-
 	new_val = dev_priv->pm_irq_mask;
 	new_val &= ~interrupt_mask;
 	new_val |= (~enabled_irq_mask & interrupt_mask);
 
 	if (new_val != dev_priv->pm_irq_mask) {
 		dev_priv->pm_irq_mask = new_val;
-		I915_WRITE(GEN8_GT_IMR(2), dev_priv->pm_irq_mask);
-		POSTING_READ(GEN8_GT_IMR(2));
+		I915_WRITE(gen6_pm_imr(dev_priv), dev_priv->pm_irq_mask);
+		POSTING_READ(gen6_pm_imr(dev_priv));
 	}
 }
 
-void gen8_enable_pm_irq(struct drm_i915_private *dev_priv, uint32_t mask)
+void gen6_enable_pm_irq(struct drm_i915_private *dev_priv, uint32_t mask)
 {
-	bdw_update_pm_irq(dev_priv, mask, mask);
-}
+	if (WARN_ON(!intel_irqs_enabled(dev_priv)))
+		return;
 
-void gen8_disable_pm_irq(struct drm_i915_private *dev_priv, uint32_t mask)
-{
-	bdw_update_pm_irq(dev_priv, mask, 0);
+	snb_update_pm_irq(dev_priv, mask, mask);
 }
 
-static bool cpt_can_enable_serr_int(struct drm_device *dev)
+static void __gen6_disable_pm_irq(struct drm_i915_private *dev_priv,
+				  uint32_t mask)
 {
-	struct drm_i915_private *dev_priv = dev->dev_private;
-	enum pipe pipe;
-	struct intel_crtc *crtc;
-
-	assert_spin_locked(&dev_priv->irq_lock);
-
-	for_each_pipe(dev_priv, pipe) {
-		crtc = to_intel_crtc(dev_priv->pipe_to_crtc_mapping[pipe]);
-
-		if (crtc->pch_fifo_underrun_disabled)
-			return false;
-	}
-
-	return true;
+	snb_update_pm_irq(dev_priv, mask, 0);
 }
 
-void i9xx_check_fifo_underruns(struct drm_device *dev)
+void gen6_disable_pm_irq(struct drm_i915_private *dev_priv, uint32_t mask)
 {
-	struct drm_i915_private *dev_priv = dev->dev_private;
-	struct intel_crtc *crtc;
-	unsigned long flags;
-
-	spin_lock_irqsave(&dev_priv->irq_lock, flags);
-
-	for_each_intel_crtc(dev, crtc) {
-		u32 reg = PIPESTAT(crtc->pipe);
-		u32 pipestat;
-
-		if (crtc->cpu_fifo_underrun_disabled)
-			continue;
-
-		pipestat = I915_READ(reg) & 0xffff0000;
-		if ((pipestat & PIPE_FIFO_UNDERRUN_STATUS) == 0)
-			continue;
-
-		I915_WRITE(reg, pipestat | PIPE_FIFO_UNDERRUN_STATUS);
-		POSTING_READ(reg);
-
-		DRM_ERROR("pipe %c underrun\n", pipe_name(crtc->pipe));
-	}
+	if (WARN_ON(!intel_irqs_enabled(dev_priv)))
+		return;
 
-	spin_unlock_irqrestore(&dev_priv->irq_lock, flags);
+	__gen6_disable_pm_irq(dev_priv, mask);
 }
 
-static void i9xx_set_fifo_underrun_reporting(struct drm_device *dev,
-					     enum pipe pipe,
-					     bool enable, bool old)
+void gen6_reset_rps_interrupts(struct drm_device *dev)
 {
 	struct drm_i915_private *dev_priv = dev->dev_private;
-	u32 reg = PIPESTAT(pipe);
-	u32 pipestat = I915_READ(reg) & 0xffff0000;
-
-	assert_spin_locked(&dev_priv->irq_lock);
+	uint32_t reg = gen6_pm_iir(dev_priv);
 
-	if (enable) {
-		I915_WRITE(reg, pipestat | PIPE_FIFO_UNDERRUN_STATUS);
-		POSTING_READ(reg);
-	} else {
-		if (old && pipestat & PIPE_FIFO_UNDERRUN_STATUS)
-			DRM_ERROR("pipe %c underrun\n", pipe_name(pipe));
-	}
+	spin_lock_irq(&dev_priv->irq_lock);
+	I915_WRITE(reg, dev_priv->pm_rps_events);
+	I915_WRITE(reg, dev_priv->pm_rps_events);
+	POSTING_READ(reg);
+	spin_unlock_irq(&dev_priv->irq_lock);
 }
 
-static void ironlake_set_fifo_underrun_reporting(struct drm_device *dev,
-						 enum pipe pipe, bool enable)
+void gen6_enable_rps_interrupts(struct drm_device *dev)
 {
 	struct drm_i915_private *dev_priv = dev->dev_private;
-	uint32_t bit = (pipe == PIPE_A) ? DE_PIPEA_FIFO_UNDERRUN :
-					  DE_PIPEB_FIFO_UNDERRUN;
 
-	if (enable)
-		ironlake_enable_display_irq(dev_priv, bit);
-	else
-		ironlake_disable_display_irq(dev_priv, bit);
+	spin_lock_irq(&dev_priv->irq_lock);
+	WARN_ON(dev_priv->rps.pm_iir);
+	WARN_ON(I915_READ(gen6_pm_iir(dev_priv)) & dev_priv->pm_rps_events);
+	dev_priv->rps.interrupts_enabled = true;
+	gen6_enable_pm_irq(dev_priv, dev_priv->pm_rps_events);
+	spin_unlock_irq(&dev_priv->irq_lock);
 }
 
-static void ivybridge_set_fifo_underrun_reporting(struct drm_device *dev,
-						  enum pipe pipe,
-						  bool enable, bool old)
+void gen6_disable_rps_interrupts(struct drm_device *dev)
 {
 	struct drm_i915_private *dev_priv = dev->dev_private;
-	if (enable) {
-		I915_WRITE(GEN7_ERR_INT, ERR_INT_FIFO_UNDERRUN(pipe));
 
-		if (!ivb_can_enable_err_int(dev))
-			return;
+	spin_lock_irq(&dev_priv->irq_lock);
+	dev_priv->rps.interrupts_enabled = false;
+	spin_unlock_irq(&dev_priv->irq_lock);
 
-		ironlake_enable_display_irq(dev_priv, DE_ERR_INT_IVB);
-	} else {
-		ironlake_disable_display_irq(dev_priv, DE_ERR_INT_IVB);
+	cancel_work_sync(&dev_priv->rps.work);
 
-		if (old &&
-		    I915_READ(GEN7_ERR_INT) & ERR_INT_FIFO_UNDERRUN(pipe)) {
-			DRM_ERROR("uncleared fifo underrun on pipe %c\n",
-				  pipe_name(pipe));
-		}
-	}
-}
+	spin_lock_irq(&dev_priv->irq_lock);
 
-static void broadwell_set_fifo_underrun_reporting(struct drm_device *dev,
-						  enum pipe pipe, bool enable)
-{
-	struct drm_i915_private *dev_priv = dev->dev_private;
+	I915_WRITE(GEN6_PMINTRMSK, INTEL_INFO(dev_priv)->gen >= 8 ?
+		   ~GEN8_PMINTR_REDIRECT_TO_NON_DISP : ~0);
 
-	assert_spin_locked(&dev_priv->irq_lock);
+	__gen6_disable_pm_irq(dev_priv, dev_priv->pm_rps_events);
+	I915_WRITE(gen6_pm_ier(dev_priv), I915_READ(gen6_pm_ier(dev_priv)) &
+				~dev_priv->pm_rps_events);
+	I915_WRITE(gen6_pm_iir(dev_priv), dev_priv->pm_rps_events);
+	I915_WRITE(gen6_pm_iir(dev_priv), dev_priv->pm_rps_events);
 
-	if (enable)
-		dev_priv->de_irq_mask[pipe] &= ~GEN8_PIPE_FIFO_UNDERRUN;
-	else
-		dev_priv->de_irq_mask[pipe] |= GEN8_PIPE_FIFO_UNDERRUN;
-	I915_WRITE(GEN8_DE_PIPE_IMR(pipe), dev_priv->de_irq_mask[pipe]);
-	POSTING_READ(GEN8_DE_PIPE_IMR(pipe));
+	dev_priv->rps.pm_iir = 0;
+
+	spin_unlock_irq(&dev_priv->irq_lock);
 }
 
 /**
@@ -410,9 +320,9 @@ static void broadwell_set_fifo_underrun_reporting(struct drm_device *dev,
  * @interrupt_mask: mask of interrupt bits to update
  * @enabled_irq_mask: mask of interrupt bits to enable
  */
-static void ibx_display_interrupt_update(struct drm_i915_private *dev_priv,
-					 uint32_t interrupt_mask,
-					 uint32_t enabled_irq_mask)
+void ibx_display_interrupt_update(struct drm_i915_private *dev_priv,
+				  uint32_t interrupt_mask,
+				  uint32_t enabled_irq_mask)
 {
 	uint32_t sdeimr = I915_READ(SDEIMR);
 	sdeimr &= ~interrupt_mask;
@@ -426,160 +336,6 @@ static void ibx_display_interrupt_update(struct drm_i915_private *dev_priv,
 	I915_WRITE(SDEIMR, sdeimr);
 	POSTING_READ(SDEIMR);
 }
-#define ibx_enable_display_interrupt(dev_priv, bits) \
-	ibx_display_interrupt_update((dev_priv), (bits), (bits))
-#define ibx_disable_display_interrupt(dev_priv, bits) \
-	ibx_display_interrupt_update((dev_priv), (bits), 0)
-
-static void ibx_set_fifo_underrun_reporting(struct drm_device *dev,
-					    enum transcoder pch_transcoder,
-					    bool enable)
-{
-	struct drm_i915_private *dev_priv = dev->dev_private;
-	uint32_t bit = (pch_transcoder == TRANSCODER_A) ?
-		       SDE_TRANSA_FIFO_UNDER : SDE_TRANSB_FIFO_UNDER;
-
-	if (enable)
-		ibx_enable_display_interrupt(dev_priv, bit);
-	else
-		ibx_disable_display_interrupt(dev_priv, bit);
-}
-
-static void cpt_set_fifo_underrun_reporting(struct drm_device *dev,
-					    enum transcoder pch_transcoder,
-					    bool enable, bool old)
-{
-	struct drm_i915_private *dev_priv = dev->dev_private;
-
-	if (enable) {
-		I915_WRITE(SERR_INT,
-			   SERR_INT_TRANS_FIFO_UNDERRUN(pch_transcoder));
-
-		if (!cpt_can_enable_serr_int(dev))
-			return;
-
-		ibx_enable_display_interrupt(dev_priv, SDE_ERROR_CPT);
-	} else {
-		ibx_disable_display_interrupt(dev_priv, SDE_ERROR_CPT);
-
-		if (old && I915_READ(SERR_INT) &
-		    SERR_INT_TRANS_FIFO_UNDERRUN(pch_transcoder)) {
-			DRM_ERROR("uncleared pch fifo underrun on pch transcoder %c\n",
-				  transcoder_name(pch_transcoder));
-		}
-	}
-}
-
-/**
- * intel_set_cpu_fifo_underrun_reporting - enable/disable FIFO underrun messages
- * @dev: drm device
- * @pipe: pipe
- * @enable: true if we want to report FIFO underrun errors, false otherwise
- *
- * This function makes us disable or enable CPU fifo underruns for a specific
- * pipe. Notice that on some Gens (e.g. IVB, HSW), disabling FIFO underrun
- * reporting for one pipe may also disable all the other CPU error interruts for
- * the other pipes, due to the fact that there's just one interrupt mask/enable
- * bit for all the pipes.
- *
- * Returns the previous state of underrun reporting.
- */
-static bool __intel_set_cpu_fifo_underrun_reporting(struct drm_device *dev,
-						    enum pipe pipe, bool enable)
-{
-	struct drm_i915_private *dev_priv = dev->dev_private;
-	struct drm_crtc *crtc = dev_priv->pipe_to_crtc_mapping[pipe];
-	struct intel_crtc *intel_crtc = to_intel_crtc(crtc);
-	bool old;
-
-	assert_spin_locked(&dev_priv->irq_lock);
-
-	old = !intel_crtc->cpu_fifo_underrun_disabled;
-	intel_crtc->cpu_fifo_underrun_disabled = !enable;
-
-	if (HAS_GMCH_DISPLAY(dev))
-		i9xx_set_fifo_underrun_reporting(dev, pipe, enable, old);
-	else if (IS_GEN5(dev) || IS_GEN6(dev))
-		ironlake_set_fifo_underrun_reporting(dev, pipe, enable);
-	else if (IS_GEN7(dev))
-		ivybridge_set_fifo_underrun_reporting(dev, pipe, enable, old);
-	else if (IS_GEN8(dev))
-		broadwell_set_fifo_underrun_reporting(dev, pipe, enable);
-
-	return old;
-}
-
-bool intel_set_cpu_fifo_underrun_reporting(struct drm_device *dev,
-					   enum pipe pipe, bool enable)
-{
-	struct drm_i915_private *dev_priv = dev->dev_private;
-	unsigned long flags;
-	bool ret;
-
-	spin_lock_irqsave(&dev_priv->irq_lock, flags);
-	ret = __intel_set_cpu_fifo_underrun_reporting(dev, pipe, enable);
-	spin_unlock_irqrestore(&dev_priv->irq_lock, flags);
-
-	return ret;
-}
-
-static bool __cpu_fifo_underrun_reporting_enabled(struct drm_device *dev,
-						  enum pipe pipe)
-{
-	struct drm_i915_private *dev_priv = dev->dev_private;
-	struct drm_crtc *crtc = dev_priv->pipe_to_crtc_mapping[pipe];
-	struct intel_crtc *intel_crtc = to_intel_crtc(crtc);
-
-	return !intel_crtc->cpu_fifo_underrun_disabled;
-}
-
-/**
- * intel_set_pch_fifo_underrun_reporting - enable/disable FIFO underrun messages
- * @dev: drm device
- * @pch_transcoder: the PCH transcoder (same as pipe on IVB and older)
- * @enable: true if we want to report FIFO underrun errors, false otherwise
- *
- * This function makes us disable or enable PCH fifo underruns for a specific
- * PCH transcoder. Notice that on some PCHs (e.g. CPT/PPT), disabling FIFO
- * underrun reporting for one transcoder may also disable all the other PCH
- * error interruts for the other transcoders, due to the fact that there's just
- * one interrupt mask/enable bit for all the transcoders.
- *
- * Returns the previous state of underrun reporting.
- */
-bool intel_set_pch_fifo_underrun_reporting(struct drm_device *dev,
-					   enum transcoder pch_transcoder,
-					   bool enable)
-{
-	struct drm_i915_private *dev_priv = dev->dev_private;
-	struct drm_crtc *crtc = dev_priv->pipe_to_crtc_mapping[pch_transcoder];
-	struct intel_crtc *intel_crtc = to_intel_crtc(crtc);
-	unsigned long flags;
-	bool old;
-
-	/*
-	 * NOTE: Pre-LPT has a fixed cpu pipe -> pch transcoder mapping, but LPT
-	 * has only one pch transcoder A that all pipes can use. To avoid racy
-	 * pch transcoder -> pipe lookups from interrupt code simply store the
-	 * underrun statistics in crtc A. Since we never expose this anywhere
-	 * nor use it outside of the fifo underrun code here using the "wrong"
-	 * crtc on LPT won't cause issues.
-	 */
-
-	spin_lock_irqsave(&dev_priv->irq_lock, flags);
-
-	old = !intel_crtc->pch_fifo_underrun_disabled;
-	intel_crtc->pch_fifo_underrun_disabled = !enable;
-
-	if (HAS_PCH_IBX(dev))
-		ibx_set_fifo_underrun_reporting(dev, pch_transcoder, enable);
-	else
-		cpt_set_fifo_underrun_reporting(dev, pch_transcoder, enable, old);
-
-	spin_unlock_irqrestore(&dev_priv->irq_lock, flags);
-	return old;
-}
-
 
 static void
 __i915_enable_pipestat(struct drm_i915_private *dev_priv, enum pipe pipe,
@@ -589,6 +345,7 @@ __i915_enable_pipestat(struct drm_i915_private *dev_priv, enum pipe pipe,
 	u32 pipestat = I915_READ(reg) & PIPESTAT_INT_ENABLE_MASK;
 
 	assert_spin_locked(&dev_priv->irq_lock);
+	WARN_ON(!intel_irqs_enabled(dev_priv));
 
 	if (WARN_ONCE(enable_mask & ~PIPESTAT_INT_ENABLE_MASK ||
 		      status_mask & ~PIPESTAT_INT_STATUS_MASK,
@@ -615,6 +372,7 @@ __i915_disable_pipestat(struct drm_i915_private *dev_priv, enum pipe pipe,
 	u32 pipestat = I915_READ(reg) & PIPESTAT_INT_ENABLE_MASK;
 
 	assert_spin_locked(&dev_priv->irq_lock);
+	WARN_ON(!intel_irqs_enabled(dev_priv));
 
 	if (WARN_ONCE(enable_mask & ~PIPESTAT_INT_ENABLE_MASK ||
 		      status_mask & ~PIPESTAT_INT_STATUS_MASK,
@@ -694,19 +452,18 @@ i915_disable_pipestat(struct drm_i915_private *dev_priv, enum pipe pipe,
 static void i915_enable_asle_pipestat(struct drm_device *dev)
 {
 	struct drm_i915_private *dev_priv = dev->dev_private;
-	unsigned long irqflags;
 
 	if (!dev_priv->opregion.asle || !IS_MOBILE(dev))
 		return;
 
-	spin_lock_irqsave(&dev_priv->irq_lock, irqflags);
+	spin_lock_irq(&dev_priv->irq_lock);
 
 	i915_enable_pipestat(dev_priv, PIPE_B, PIPE_LEGACY_BLC_EVENT_STATUS);
 	if (INTEL_INFO(dev)->gen >= 4)
 		i915_enable_pipestat(dev_priv, PIPE_A,
 				     PIPE_LEGACY_BLC_EVENT_STATUS);
 
-	spin_unlock_irqrestore(&dev_priv->irq_lock, irqflags);
+	spin_unlock_irq(&dev_priv->irq_lock);
 }
 
 /**
@@ -1094,18 +851,17 @@ static void i915_digport_work_func(struct work_struct *work)
 {
 	struct drm_i915_private *dev_priv =
 		container_of(work, struct drm_i915_private, dig_port_work);
-	unsigned long irqflags;
 	u32 long_port_mask, short_port_mask;
 	struct intel_digital_port *intel_dig_port;
 	int i, ret;
 	u32 old_bits = 0;
 
-	spin_lock_irqsave(&dev_priv->irq_lock, irqflags);
+	spin_lock_irq(&dev_priv->irq_lock);
 	long_port_mask = dev_priv->long_hpd_port_mask;
 	dev_priv->long_hpd_port_mask = 0;
 	short_port_mask = dev_priv->short_hpd_port_mask;
 	dev_priv->short_hpd_port_mask = 0;
-	spin_unlock_irqrestore(&dev_priv->irq_lock, irqflags);
+	spin_unlock_irq(&dev_priv->irq_lock);
 
 	for (i = 0; i < I915_MAX_PORTS; i++) {
 		bool valid = false;
@@ -1130,9 +886,9 @@ static void i915_digport_work_func(struct work_struct *work)
 	}
 
 	if (old_bits) {
-		spin_lock_irqsave(&dev_priv->irq_lock, irqflags);
+		spin_lock_irq(&dev_priv->irq_lock);
 		dev_priv->hpd_event_bits |= old_bits;
-		spin_unlock_irqrestore(&dev_priv->irq_lock, irqflags);
+		spin_unlock_irq(&dev_priv->irq_lock);
 		schedule_work(&dev_priv->hotplug_work);
 	}
 }
@@ -1151,7 +907,6 @@ static void i915_hotplug_work_func(struct work_struct *work)
 	struct intel_connector *intel_connector;
 	struct intel_encoder *intel_encoder;
 	struct drm_connector *connector;
-	unsigned long irqflags;
 	bool hpd_disabled = false;
 	bool changed = false;
 	u32 hpd_event_bits;
@@ -1159,7 +914,7 @@ static void i915_hotplug_work_func(struct work_struct *work)
 	mutex_lock(&mode_config->mutex);
 	DRM_DEBUG_KMS("running encoder hotplug functions\n");
 
-	spin_lock_irqsave(&dev_priv->irq_lock, irqflags);
+	spin_lock_irq(&dev_priv->irq_lock);
 
 	hpd_event_bits = dev_priv->hpd_event_bits;
 	dev_priv->hpd_event_bits = 0;
@@ -1193,7 +948,7 @@ static void i915_hotplug_work_func(struct work_struct *work)
 				 msecs_to_jiffies(I915_REENABLE_HOTPLUG_DELAY));
 	}
 
-	spin_unlock_irqrestore(&dev_priv->irq_lock, irqflags);
+	spin_unlock_irq(&dev_priv->irq_lock);
 
 	list_for_each_entry(connector, &mode_config->connector_list, head) {
 		intel_connector = to_intel_connector(connector);
@@ -1260,11 +1015,7 @@ static void notify_ring(struct drm_device *dev,
 
 	trace_i915_gem_request_complete(ring);
 
-	if (drm_core_check_feature(dev, DRIVER_MODESET))
-		intel_notify_mmio_flip(ring);
-
 	wake_up_all(&ring->irq_queue);
-	i915_queue_hangcheck(dev);
 }
 
 static u32 vlv_c0_residency(struct drm_i915_private *dev_priv,
@@ -1400,14 +1151,15 @@ static void gen6_pm_rps_work(struct work_struct *work)
 	int new_delay, adj;
 
 	spin_lock_irq(&dev_priv->irq_lock);
+	/* Speed up work cancelation during disabling rps interrupts. */
+	if (!dev_priv->rps.interrupts_enabled) {
+		spin_unlock_irq(&dev_priv->irq_lock);
+		return;
+	}
 	pm_iir = dev_priv->rps.pm_iir;
 	dev_priv->rps.pm_iir = 0;
-	if (INTEL_INFO(dev_priv->dev)->gen >= 8)
-		gen8_enable_pm_irq(dev_priv, dev_priv->pm_rps_events);
-	else {
-		/* Make sure not to corrupt PMIMR state used by ringbuffer */
-		gen6_enable_pm_irq(dev_priv, dev_priv->pm_rps_events);
-	}
+	/* Make sure not to corrupt PMIMR state used by ringbuffer on GEN6 */
+	gen6_enable_pm_irq(dev_priv, dev_priv->pm_rps_events);
 	spin_unlock_irq(&dev_priv->irq_lock);
 
 	/* Make sure we didn't queue anything we're not going to process. */
@@ -1488,7 +1240,6 @@ static void ivybridge_parity_work(struct work_struct *work)
 	u32 error_status, row, bank, subbank;
 	char *parity_event[6];
 	uint32_t misccpctl;
-	unsigned long flags;
 	uint8_t slice = 0;
 
 	/* We must turn off DOP level clock gating to access the L3 registers.
@@ -1547,9 +1298,9 @@ static void ivybridge_parity_work(struct work_struct *work)
 
 out:
 	WARN_ON(dev_priv->l3_parity.which_slice);
-	spin_lock_irqsave(&dev_priv->irq_lock, flags);
+	spin_lock_irq(&dev_priv->irq_lock);
 	gen5_enable_gt_irq(dev_priv, GT_PARITY_ERROR(dev_priv->dev));
-	spin_unlock_irqrestore(&dev_priv->irq_lock, flags);
+	spin_unlock_irq(&dev_priv->irq_lock);
 
 	mutex_unlock(&dev_priv->dev->struct_mutex);
 }
@@ -1601,28 +1352,13 @@ static void snb_gt_irq_handler(struct drm_device *dev,
 
 	if (gt_iir & (GT_BLT_CS_ERROR_INTERRUPT |
 		      GT_BSD_CS_ERROR_INTERRUPT |
-		      GT_RENDER_CS_MASTER_ERROR_INTERRUPT)) {
-		i915_handle_error(dev, false, "GT error interrupt 0x%08x",
-				  gt_iir);
-	}
+		      GT_RENDER_CS_MASTER_ERROR_INTERRUPT))
+		DRM_DEBUG("Command parser error, gt_iir 0x%08x\n", gt_iir);
 
 	if (gt_iir & GT_PARITY_ERROR(dev))
 		ivybridge_parity_error_irq_handler(dev, gt_iir);
 }
 
-static void gen8_rps_irq_handler(struct drm_i915_private *dev_priv, u32 pm_iir)
-{
-	if ((pm_iir & dev_priv->pm_rps_events) == 0)
-		return;
-
-	spin_lock(&dev_priv->irq_lock);
-	dev_priv->rps.pm_iir |= pm_iir & dev_priv->pm_rps_events;
-	gen8_disable_pm_irq(dev_priv, pm_iir & dev_priv->pm_rps_events);
-	spin_unlock(&dev_priv->irq_lock);
-
-	queue_work(dev_priv->wq, &dev_priv->rps.work);
-}
-
 static irqreturn_t gen8_gt_irq_handler(struct drm_device *dev,
 				       struct drm_i915_private *dev_priv,
 				       u32 master_ctl)
@@ -1684,7 +1420,7 @@ static irqreturn_t gen8_gt_irq_handler(struct drm_device *dev,
 			I915_WRITE(GEN8_GT_IIR(2),
 				   tmp & dev_priv->pm_rps_events);
 			ret = IRQ_HANDLED;
-			gen8_rps_irq_handler(dev_priv, tmp);
+			gen6_rps_irq_handler(dev_priv, tmp);
 		} else
 			DRM_ERROR("The master control interrupt lied (PM)!\n");
 	}
@@ -1898,7 +1634,7 @@ static void display_pipe_crc_irq_handler(struct drm_device *dev, enum pipe pipe,
 
 	if (!pipe_crc->entries) {
 		spin_unlock(&pipe_crc->lock);
-		DRM_ERROR("spurious interrupt\n");
+		DRM_DEBUG_KMS("spurious interrupt\n");
 		return;
 	}
 
@@ -1984,24 +1720,30 @@ static void i9xx_pipe_crc_irq_handler(struct drm_device *dev, enum pipe pipe)
  * the work queue. */
 static void gen6_rps_irq_handler(struct drm_i915_private *dev_priv, u32 pm_iir)
 {
+	/* TODO: RPS on GEN9+ is not supported yet. */
+	if (WARN_ONCE(INTEL_INFO(dev_priv)->gen >= 9,
+		      "GEN9+: unexpected RPS IRQ\n"))
+		return;
+
 	if (pm_iir & dev_priv->pm_rps_events) {
 		spin_lock(&dev_priv->irq_lock);
-		dev_priv->rps.pm_iir |= pm_iir & dev_priv->pm_rps_events;
 		gen6_disable_pm_irq(dev_priv, pm_iir & dev_priv->pm_rps_events);
+		if (dev_priv->rps.interrupts_enabled) {
+			dev_priv->rps.pm_iir |= pm_iir & dev_priv->pm_rps_events;
+			queue_work(dev_priv->wq, &dev_priv->rps.work);
+		}
 		spin_unlock(&dev_priv->irq_lock);
-
-		queue_work(dev_priv->wq, &dev_priv->rps.work);
 	}
 
+	if (INTEL_INFO(dev_priv)->gen >= 8)
+		return;
+
 	if (HAS_VEBOX(dev_priv->dev)) {
 		if (pm_iir & PM_VEBOX_USER_INTERRUPT)
 			notify_ring(dev_priv->dev, &dev_priv->ring[VECS]);
 
-		if (pm_iir & PM_VEBOX_CS_ERROR_INTERRUPT) {
-			i915_handle_error(dev_priv->dev, false,
-					  "VEBOX CS error interrupt 0x%08x",
-					  pm_iir);
-		}
+		if (pm_iir & PM_VEBOX_CS_ERROR_INTERRUPT)
+			DRM_DEBUG("Command parser error, pm_iir 0x%08x\n", pm_iir);
 	}
 }
 
@@ -2031,9 +1773,9 @@ static void valleyview_pipestat_irq_handler(struct drm_device *dev, u32 iir)
 		 * we need to be careful that we only handle what we want to
 		 * handle.
 		 */
-		mask = 0;
-		if (__cpu_fifo_underrun_reporting_enabled(dev, pipe))
-			mask |= PIPE_FIFO_UNDERRUN_STATUS;
+
+		/* fifo underruns are filtered in the underrun handler. */
+		mask = PIPE_FIFO_UNDERRUN_STATUS;
 
 		switch (pipe) {
 		case PIPE_A:
@@ -2078,9 +1820,8 @@ static void valleyview_pipestat_irq_handler(struct drm_device *dev, u32 iir)
 		if (pipe_stats[pipe] & PIPE_CRC_DONE_INTERRUPT_STATUS)
 			i9xx_pipe_crc_irq_handler(dev, pipe);
 
-		if (pipe_stats[pipe] & PIPE_FIFO_UNDERRUN_STATUS &&
-		    intel_set_cpu_fifo_underrun_reporting(dev, pipe, false))
-			DRM_ERROR("pipe %c underrun\n", pipe_name(pipe));
+		if (pipe_stats[pipe] & PIPE_FIFO_UNDERRUN_STATUS)
+			intel_cpu_fifo_underrun_irq_handler(dev_priv, pipe);
 	}
 
 	if (pipe_stats[0] & PIPE_GMBUS_INTERRUPT_STATUS)
@@ -2247,14 +1988,10 @@ static void ibx_irq_handler(struct drm_device *dev, u32 pch_iir)
 		DRM_DEBUG_DRIVER("PCH transcoder CRC error interrupt\n");
 
 	if (pch_iir & SDE_TRANSA_FIFO_UNDER)
-		if (intel_set_pch_fifo_underrun_reporting(dev, TRANSCODER_A,
-							  false))
-			DRM_ERROR("PCH transcoder A FIFO underrun\n");
+		intel_pch_fifo_underrun_irq_handler(dev_priv, TRANSCODER_A);
 
 	if (pch_iir & SDE_TRANSB_FIFO_UNDER)
-		if (intel_set_pch_fifo_underrun_reporting(dev, TRANSCODER_B,
-							  false))
-			DRM_ERROR("PCH transcoder B FIFO underrun\n");
+		intel_pch_fifo_underrun_irq_handler(dev_priv, TRANSCODER_B);
 }
 
 static void ivb_err_int_handler(struct drm_device *dev)
@@ -2267,12 +2004,8 @@ static void ivb_err_int_handler(struct drm_device *dev)
 		DRM_ERROR("Poison interrupt\n");
 
 	for_each_pipe(dev_priv, pipe) {
-		if (err_int & ERR_INT_FIFO_UNDERRUN(pipe)) {
-			if (intel_set_cpu_fifo_underrun_reporting(dev, pipe,
-								  false))
-				DRM_ERROR("Pipe %c FIFO underrun\n",
-					  pipe_name(pipe));
-		}
+		if (err_int & ERR_INT_FIFO_UNDERRUN(pipe))
+			intel_cpu_fifo_underrun_irq_handler(dev_priv, pipe);
 
 		if (err_int & ERR_INT_PIPE_CRC_DONE(pipe)) {
 			if (IS_IVYBRIDGE(dev))
@@ -2294,19 +2027,13 @@ static void cpt_serr_int_handler(struct drm_device *dev)
 		DRM_ERROR("PCH poison interrupt\n");
 
 	if (serr_int & SERR_INT_TRANS_A_FIFO_UNDERRUN)
-		if (intel_set_pch_fifo_underrun_reporting(dev, TRANSCODER_A,
-							  false))
-			DRM_ERROR("PCH transcoder A FIFO underrun\n");
+		intel_pch_fifo_underrun_irq_handler(dev_priv, TRANSCODER_A);
 
 	if (serr_int & SERR_INT_TRANS_B_FIFO_UNDERRUN)
-		if (intel_set_pch_fifo_underrun_reporting(dev, TRANSCODER_B,
-							  false))
-			DRM_ERROR("PCH transcoder B FIFO underrun\n");
+		intel_pch_fifo_underrun_irq_handler(dev_priv, TRANSCODER_B);
 
 	if (serr_int & SERR_INT_TRANS_C_FIFO_UNDERRUN)
-		if (intel_set_pch_fifo_underrun_reporting(dev, TRANSCODER_C,
-							  false))
-			DRM_ERROR("PCH transcoder C FIFO underrun\n");
+		intel_pch_fifo_underrun_irq_handler(dev_priv, TRANSCODER_C);
 
 	I915_WRITE(SERR_INT, serr_int);
 }
@@ -2372,9 +2099,7 @@ static void ilk_display_irq_handler(struct drm_device *dev, u32 de_iir)
 			intel_check_page_flip(dev, pipe);
 
 		if (de_iir & DE_PIPE_FIFO_UNDERRUN(pipe))
-			if (intel_set_cpu_fifo_underrun_reporting(dev, pipe, false))
-				DRM_ERROR("Pipe %c FIFO underrun\n",
-					  pipe_name(pipe));
+			intel_cpu_fifo_underrun_irq_handler(dev_priv, pipe);
 
 		if (de_iir & DE_PIPE_CRC_DONE(pipe))
 			i9xx_pipe_crc_irq_handler(dev, pipe);
@@ -2524,6 +2249,11 @@ static irqreturn_t gen8_irq_handler(int irq, void *arg)
 	irqreturn_t ret = IRQ_NONE;
 	uint32_t tmp = 0;
 	enum pipe pipe;
+	u32 aux_mask = GEN8_AUX_CHANNEL_A;
+
+	if (IS_GEN9(dev))
+		aux_mask |= GEN9_AUX_CHANNEL_B | GEN9_AUX_CHANNEL_C |
+			GEN9_AUX_CHANNEL_D;
 
 	master_ctl = I915_READ(GEN8_MASTER_IRQ);
 	master_ctl &= ~GEN8_MASTER_IRQ_CONTROL;
@@ -2556,7 +2286,8 @@ static irqreturn_t gen8_irq_handler(int irq, void *arg)
 		if (tmp) {
 			I915_WRITE(GEN8_DE_PORT_IIR, tmp);
 			ret = IRQ_HANDLED;
-			if (tmp & GEN8_AUX_CHANNEL_A)
+
+			if (tmp & aux_mask)
 				dp_aux_irq_handler(dev);
 			else
 				DRM_ERROR("Unexpected DE Port interrupt\n");
@@ -2566,7 +2297,7 @@ static irqreturn_t gen8_irq_handler(int irq, void *arg)
 	}
 
 	for_each_pipe(dev_priv, pipe) {
-		uint32_t pipe_iir;
+		uint32_t pipe_iir, flip_done = 0, fault_errors = 0;
 
 		if (!(master_ctl & GEN8_DE_PIPE_IRQ(pipe)))
 			continue;
@@ -2575,11 +2306,17 @@ static irqreturn_t gen8_irq_handler(int irq, void *arg)
 		if (pipe_iir) {
 			ret = IRQ_HANDLED;
 			I915_WRITE(GEN8_DE_PIPE_IIR(pipe), pipe_iir);
+
 			if (pipe_iir & GEN8_PIPE_VBLANK &&
 			    intel_pipe_handle_vblank(dev, pipe))
 				intel_check_page_flip(dev, pipe);
 
-			if (pipe_iir & GEN8_PIPE_PRIMARY_FLIP_DONE) {
+			if (IS_GEN9(dev))
+				flip_done = pipe_iir & GEN9_PIPE_PLANE1_FLIP_DONE;
+			else
+				flip_done = pipe_iir & GEN8_PIPE_PRIMARY_FLIP_DONE;
+
+			if (flip_done) {
 				intel_prepare_page_flip(dev, pipe);
 				intel_finish_page_flip_plane(dev, pipe);
 			}
@@ -2587,18 +2324,20 @@ static irqreturn_t gen8_irq_handler(int irq, void *arg)
 			if (pipe_iir & GEN8_PIPE_CDCLK_CRC_DONE)
 				hsw_pipe_crc_irq_handler(dev, pipe);
 
-			if (pipe_iir & GEN8_PIPE_FIFO_UNDERRUN) {
-				if (intel_set_cpu_fifo_underrun_reporting(dev, pipe,
-									  false))
-					DRM_ERROR("Pipe %c FIFO underrun\n",
-						  pipe_name(pipe));
-			}
+			if (pipe_iir & GEN8_PIPE_FIFO_UNDERRUN)
+				intel_cpu_fifo_underrun_irq_handler(dev_priv,
+								    pipe);
+
 
-			if (pipe_iir & GEN8_DE_PIPE_IRQ_FAULT_ERRORS) {
+			if (IS_GEN9(dev))
+				fault_errors = pipe_iir & GEN9_DE_PIPE_IRQ_FAULT_ERRORS;
+			else
+				fault_errors = pipe_iir & GEN8_DE_PIPE_IRQ_FAULT_ERRORS;
+
+			if (fault_errors)
 				DRM_ERROR("Fault errors on pipe %c\n: 0x%08x",
 					  pipe_name(pipe),
 					  pipe_iir & GEN8_DE_PIPE_IRQ_FAULT_ERRORS);
-			}
 		} else
 			DRM_ERROR("The master control interrupt lied (DE PIPE)!\n");
 	}
@@ -2697,6 +2436,9 @@ static void i915_error_work_func(struct work_struct *work)
 		 * simulated reset via debugs, so get an RPM reference.
 		 */
 		intel_runtime_pm_get(dev_priv);
+
+		intel_prepare_reset(dev);
+
 		/*
 		 * All state reset _must_ be completed before we update the
 		 * reset counter, for otherwise waiters might miss the reset
@@ -2705,7 +2447,7 @@ static void i915_error_work_func(struct work_struct *work)
 		 */
 		ret = i915_reset(dev);
 
-		intel_display_handle_reset(dev);
+		intel_finish_reset(dev);
 
 		intel_runtime_pm_put(dev_priv);
 
@@ -3330,10 +3072,15 @@ static void i915_hangcheck_elapsed(unsigned long data)
 void i915_queue_hangcheck(struct drm_device *dev)
 {
 	struct drm_i915_private *dev_priv = dev->dev_private;
+	struct timer_list *timer = &dev_priv->gpu_error.hangcheck_timer;
+
 	if (!i915.enable_hangcheck)
 		return;
 
-	mod_timer(&dev_priv->gpu_error.hangcheck_timer,
+	/* Don't continually defer the hangcheck, but make sure it is active */
+	if (timer_pending(timer))
+		return;
+	mod_timer(timer,
 		  round_jiffies_up(jiffies + DRM_I915_HANGCHECK_JIFFIES));
 }
 
@@ -3396,10 +3143,22 @@ static void ironlake_irq_reset(struct drm_device *dev)
 	ibx_irq_reset(dev);
 }
 
+static void vlv_display_irq_reset(struct drm_i915_private *dev_priv)
+{
+	enum pipe pipe;
+
+	I915_WRITE(PORT_HOTPLUG_EN, 0);
+	I915_WRITE(PORT_HOTPLUG_STAT, I915_READ(PORT_HOTPLUG_STAT));
+
+	for_each_pipe(dev_priv, pipe)
+		I915_WRITE(PIPESTAT(pipe), 0xffff);
+
+	GEN5_IRQ_RESET(VLV_);
+}
+
 static void valleyview_irq_preinstall(struct drm_device *dev)
 {
 	struct drm_i915_private *dev_priv = dev->dev_private;
-	int pipe;
 
 	/* VLV magic */
 	I915_WRITE(VLV_IMR, 0);
@@ -3407,22 +3166,11 @@ static void valleyview_irq_preinstall(struct drm_device *dev)
 	I915_WRITE(RING_IMR(GEN6_BSD_RING_BASE), 0);
 	I915_WRITE(RING_IMR(BLT_RING_BASE), 0);
 
-	/* and GT */
-	I915_WRITE(GTIIR, I915_READ(GTIIR));
-	I915_WRITE(GTIIR, I915_READ(GTIIR));
-
 	gen5_gt_irq_reset(dev);
 
-	I915_WRITE(DPINVGTT, 0xff);
+	I915_WRITE(DPINVGTT, DPINVGTT_STATUS_MASK);
 
-	I915_WRITE(PORT_HOTPLUG_EN, 0);
-	I915_WRITE(PORT_HOTPLUG_STAT, I915_READ(PORT_HOTPLUG_STAT));
-	for_each_pipe(dev_priv, pipe)
-		I915_WRITE(PIPESTAT(pipe), 0xffff);
-	I915_WRITE(VLV_IIR, 0xffffffff);
-	I915_WRITE(VLV_IMR, 0xffffffff);
-	I915_WRITE(VLV_IER, 0x0);
-	POSTING_READ(VLV_IER);
+	vlv_display_irq_reset(dev_priv);
 }
 
 static void gen8_gt_irq_reset(struct drm_i915_private *dev_priv)
@@ -3444,8 +3192,8 @@ static void gen8_irq_reset(struct drm_device *dev)
 	gen8_gt_irq_reset(dev_priv);
 
 	for_each_pipe(dev_priv, pipe)
-		if (intel_display_power_enabled(dev_priv,
-						POWER_DOMAIN_PIPE(pipe)))
+		if (intel_display_power_is_enabled(dev_priv,
+						   POWER_DOMAIN_PIPE(pipe)))
 			GEN8_IRQ_RESET_NDX(DE_PIPE, pipe);
 
 	GEN5_IRQ_RESET(GEN8_DE_PORT_);
@@ -3457,21 +3205,19 @@ static void gen8_irq_reset(struct drm_device *dev)
 
 void gen8_irq_power_well_post_enable(struct drm_i915_private *dev_priv)
 {
-	unsigned long irqflags;
 	uint32_t extra_ier = GEN8_PIPE_VBLANK | GEN8_PIPE_FIFO_UNDERRUN;
 
-	spin_lock_irqsave(&dev_priv->irq_lock, irqflags);
+	spin_lock_irq(&dev_priv->irq_lock);
 	GEN8_IRQ_INIT_NDX(DE_PIPE, PIPE_B, dev_priv->de_irq_mask[PIPE_B],
 			  ~dev_priv->de_irq_mask[PIPE_B] | extra_ier);
 	GEN8_IRQ_INIT_NDX(DE_PIPE, PIPE_C, dev_priv->de_irq_mask[PIPE_C],
 			  ~dev_priv->de_irq_mask[PIPE_C] | extra_ier);
-	spin_unlock_irqrestore(&dev_priv->irq_lock, irqflags);
+	spin_unlock_irq(&dev_priv->irq_lock);
 }
 
 static void cherryview_irq_preinstall(struct drm_device *dev)
 {
 	struct drm_i915_private *dev_priv = dev->dev_private;
-	int pipe;
 
 	I915_WRITE(GEN8_MASTER_IRQ, 0);
 	POSTING_READ(GEN8_MASTER_IRQ);
@@ -3480,20 +3226,9 @@ static void cherryview_irq_preinstall(struct drm_device *dev)
 
 	GEN5_IRQ_RESET(GEN8_PCU_);
 
-	POSTING_READ(GEN8_PCU_IIR);
-
 	I915_WRITE(DPINVGTT, DPINVGTT_STATUS_MASK_CHV);
 
-	I915_WRITE(PORT_HOTPLUG_EN, 0);
-	I915_WRITE(PORT_HOTPLUG_STAT, I915_READ(PORT_HOTPLUG_STAT));
-
-	for_each_pipe(dev_priv, pipe)
-		I915_WRITE(PIPESTAT(pipe), 0xffff);
-
-	I915_WRITE(VLV_IMR, 0xffffffff);
-	I915_WRITE(VLV_IER, 0x0);
-	I915_WRITE(VLV_IIR, 0xffffffff);
-	POSTING_READ(VLV_IIR);
+	vlv_display_irq_reset(dev_priv);
 }
 
 static void ibx_hpd_irq_setup(struct drm_device *dev)
@@ -3584,7 +3319,6 @@ static void gen5_gt_irq_postinstall(struct drm_device *dev)
 
 static int ironlake_irq_postinstall(struct drm_device *dev)
 {
-	unsigned long irqflags;
 	struct drm_i915_private *dev_priv = dev->dev_private;
 	u32 display_mask, extra_mask;
 
@@ -3623,9 +3357,9 @@ static int ironlake_irq_postinstall(struct drm_device *dev)
 		 * spinlocking not required here for correctness since interrupt
 		 * setup is guaranteed to run in single-threaded context. But we
 		 * need it to make the assert_spin_locked happy. */
-		spin_lock_irqsave(&dev_priv->irq_lock, irqflags);
+		spin_lock_irq(&dev_priv->irq_lock);
 		ironlake_enable_display_irq(dev_priv, DE_PCU_EVENT);
-		spin_unlock_irqrestore(&dev_priv->irq_lock, irqflags);
+		spin_unlock_irq(&dev_priv->irq_lock);
 	}
 
 	return 0;
@@ -3635,45 +3369,51 @@ static void valleyview_display_irqs_install(struct drm_i915_private *dev_priv)
 {
 	u32 pipestat_mask;
 	u32 iir_mask;
+	enum pipe pipe;
 
 	pipestat_mask = PIPESTAT_INT_STATUS_MASK |
 			PIPE_FIFO_UNDERRUN_STATUS;
 
-	I915_WRITE(PIPESTAT(PIPE_A), pipestat_mask);
-	I915_WRITE(PIPESTAT(PIPE_B), pipestat_mask);
+	for_each_pipe(dev_priv, pipe)
+		I915_WRITE(PIPESTAT(pipe), pipestat_mask);
 	POSTING_READ(PIPESTAT(PIPE_A));
 
 	pipestat_mask = PLANE_FLIP_DONE_INT_STATUS_VLV |
 			PIPE_CRC_DONE_INTERRUPT_STATUS;
 
-	i915_enable_pipestat(dev_priv, PIPE_A, pipestat_mask |
-					       PIPE_GMBUS_INTERRUPT_STATUS);
-	i915_enable_pipestat(dev_priv, PIPE_B, pipestat_mask);
+	i915_enable_pipestat(dev_priv, PIPE_A, PIPE_GMBUS_INTERRUPT_STATUS);
+	for_each_pipe(dev_priv, pipe)
+		i915_enable_pipestat(dev_priv, pipe, pipestat_mask);
 
 	iir_mask = I915_DISPLAY_PORT_INTERRUPT |
 		   I915_DISPLAY_PIPE_A_EVENT_INTERRUPT |
 		   I915_DISPLAY_PIPE_B_EVENT_INTERRUPT;
+	if (IS_CHERRYVIEW(dev_priv))
+		iir_mask |= I915_DISPLAY_PIPE_C_EVENT_INTERRUPT;
 	dev_priv->irq_mask &= ~iir_mask;
 
 	I915_WRITE(VLV_IIR, iir_mask);
 	I915_WRITE(VLV_IIR, iir_mask);
-	I915_WRITE(VLV_IMR, dev_priv->irq_mask);
 	I915_WRITE(VLV_IER, ~dev_priv->irq_mask);
-	POSTING_READ(VLV_IER);
+	I915_WRITE(VLV_IMR, dev_priv->irq_mask);
+	POSTING_READ(VLV_IMR);
 }
 
 static void valleyview_display_irqs_uninstall(struct drm_i915_private *dev_priv)
 {
 	u32 pipestat_mask;
 	u32 iir_mask;
+	enum pipe pipe;
 
 	iir_mask = I915_DISPLAY_PORT_INTERRUPT |
 		   I915_DISPLAY_PIPE_A_EVENT_INTERRUPT |
 		   I915_DISPLAY_PIPE_B_EVENT_INTERRUPT;
+	if (IS_CHERRYVIEW(dev_priv))
+		iir_mask |= I915_DISPLAY_PIPE_C_EVENT_INTERRUPT;
 
 	dev_priv->irq_mask |= iir_mask;
-	I915_WRITE(VLV_IER, ~dev_priv->irq_mask);
 	I915_WRITE(VLV_IMR, dev_priv->irq_mask);
+	I915_WRITE(VLV_IER, ~dev_priv->irq_mask);
 	I915_WRITE(VLV_IIR, iir_mask);
 	I915_WRITE(VLV_IIR, iir_mask);
 	POSTING_READ(VLV_IIR);
@@ -3681,14 +3421,15 @@ static void valleyview_display_irqs_uninstall(struct drm_i915_private *dev_priv)
 	pipestat_mask = PLANE_FLIP_DONE_INT_STATUS_VLV |
 			PIPE_CRC_DONE_INTERRUPT_STATUS;
 
-	i915_disable_pipestat(dev_priv, PIPE_A, pipestat_mask |
-					        PIPE_GMBUS_INTERRUPT_STATUS);
-	i915_disable_pipestat(dev_priv, PIPE_B, pipestat_mask);
+	i915_disable_pipestat(dev_priv, PIPE_A, PIPE_GMBUS_INTERRUPT_STATUS);
+	for_each_pipe(dev_priv, pipe)
+		i915_disable_pipestat(dev_priv, pipe, pipestat_mask);
 
 	pipestat_mask = PIPESTAT_INT_STATUS_MASK |
 			PIPE_FIFO_UNDERRUN_STATUS;
-	I915_WRITE(PIPESTAT(PIPE_A), pipestat_mask);
-	I915_WRITE(PIPESTAT(PIPE_B), pipestat_mask);
+
+	for_each_pipe(dev_priv, pipe)
+		I915_WRITE(PIPESTAT(pipe), pipestat_mask);
 	POSTING_READ(PIPESTAT(PIPE_A));
 }
 
@@ -3701,7 +3442,7 @@ void valleyview_enable_display_irqs(struct drm_i915_private *dev_priv)
 
 	dev_priv->display_irqs_enabled = true;
 
-	if (dev_priv->dev->irq_enabled)
+	if (intel_irqs_enabled(dev_priv))
 		valleyview_display_irqs_install(dev_priv);
 }
 
@@ -3714,34 +3455,36 @@ void valleyview_disable_display_irqs(struct drm_i915_private *dev_priv)
 
 	dev_priv->display_irqs_enabled = false;
 
-	if (dev_priv->dev->irq_enabled)
+	if (intel_irqs_enabled(dev_priv))
 		valleyview_display_irqs_uninstall(dev_priv);
 }
 
-static int valleyview_irq_postinstall(struct drm_device *dev)
+static void vlv_display_irq_postinstall(struct drm_i915_private *dev_priv)
 {
-	struct drm_i915_private *dev_priv = dev->dev_private;
-	unsigned long irqflags;
-
 	dev_priv->irq_mask = ~0;
 
 	I915_WRITE(PORT_HOTPLUG_EN, 0);
 	POSTING_READ(PORT_HOTPLUG_EN);
 
-	I915_WRITE(VLV_IMR, dev_priv->irq_mask);
-	I915_WRITE(VLV_IER, ~dev_priv->irq_mask);
 	I915_WRITE(VLV_IIR, 0xffffffff);
-	POSTING_READ(VLV_IER);
+	I915_WRITE(VLV_IIR, 0xffffffff);
+	I915_WRITE(VLV_IER, ~dev_priv->irq_mask);
+	I915_WRITE(VLV_IMR, dev_priv->irq_mask);
+	POSTING_READ(VLV_IMR);
 
 	/* Interrupt setup is already guaranteed to be single-threaded, this is
 	 * just to make the assert_spin_locked check happy. */
-	spin_lock_irqsave(&dev_priv->irq_lock, irqflags);
+	spin_lock_irq(&dev_priv->irq_lock);
 	if (dev_priv->display_irqs_enabled)
 		valleyview_display_irqs_install(dev_priv);
-	spin_unlock_irqrestore(&dev_priv->irq_lock, irqflags);
+	spin_unlock_irq(&dev_priv->irq_lock);
+}
 
-	I915_WRITE(VLV_IIR, 0xffffffff);
-	I915_WRITE(VLV_IIR, 0xffffffff);
+static int valleyview_irq_postinstall(struct drm_device *dev)
+{
+	struct drm_i915_private *dev_priv = dev->dev_private;
+
+	vlv_display_irq_postinstall(dev_priv);
 
 	gen5_gt_irq_postinstall(dev);
 
@@ -3783,24 +3526,35 @@ static void gen8_gt_irq_postinstall(struct drm_i915_private *dev_priv)
 
 static void gen8_de_irq_postinstall(struct drm_i915_private *dev_priv)
 {
-	uint32_t de_pipe_masked = GEN8_PIPE_PRIMARY_FLIP_DONE |
-		GEN8_PIPE_CDCLK_CRC_DONE |
-		GEN8_DE_PIPE_IRQ_FAULT_ERRORS;
-	uint32_t de_pipe_enables = de_pipe_masked | GEN8_PIPE_VBLANK |
-		GEN8_PIPE_FIFO_UNDERRUN;
+	uint32_t de_pipe_masked = GEN8_PIPE_CDCLK_CRC_DONE;
+	uint32_t de_pipe_enables;
 	int pipe;
+	u32 aux_en = GEN8_AUX_CHANNEL_A;
+
+	if (IS_GEN9(dev_priv)) {
+		de_pipe_masked |= GEN9_PIPE_PLANE1_FLIP_DONE |
+				  GEN9_DE_PIPE_IRQ_FAULT_ERRORS;
+		aux_en |= GEN9_AUX_CHANNEL_B | GEN9_AUX_CHANNEL_C |
+			GEN9_AUX_CHANNEL_D;
+	} else
+		de_pipe_masked |= GEN8_PIPE_PRIMARY_FLIP_DONE |
+				  GEN8_DE_PIPE_IRQ_FAULT_ERRORS;
+
+	de_pipe_enables = de_pipe_masked | GEN8_PIPE_VBLANK |
+					   GEN8_PIPE_FIFO_UNDERRUN;
+
 	dev_priv->de_irq_mask[PIPE_A] = ~de_pipe_masked;
 	dev_priv->de_irq_mask[PIPE_B] = ~de_pipe_masked;
 	dev_priv->de_irq_mask[PIPE_C] = ~de_pipe_masked;
 
 	for_each_pipe(dev_priv, pipe)
-		if (intel_display_power_enabled(dev_priv,
+		if (intel_display_power_is_enabled(dev_priv,
 				POWER_DOMAIN_PIPE(pipe)))
 			GEN8_IRQ_INIT_NDX(DE_PIPE, pipe,
 					  dev_priv->de_irq_mask[pipe],
 					  de_pipe_enables);
 
-	GEN5_IRQ_INIT(GEN8_DE_PORT_, ~GEN8_AUX_CHANNEL_A, GEN8_AUX_CHANNEL_A);
+	GEN5_IRQ_INIT(GEN8_DE_PORT_, ~aux_en, aux_en);
 }
 
 static int gen8_irq_postinstall(struct drm_device *dev)
@@ -3823,33 +3577,8 @@ static int gen8_irq_postinstall(struct drm_device *dev)
 static int cherryview_irq_postinstall(struct drm_device *dev)
 {
 	struct drm_i915_private *dev_priv = dev->dev_private;
-	u32 enable_mask = I915_DISPLAY_PORT_INTERRUPT |
-		I915_DISPLAY_PIPE_A_EVENT_INTERRUPT |
-		I915_DISPLAY_PIPE_B_EVENT_INTERRUPT |
-		I915_DISPLAY_PIPE_C_EVENT_INTERRUPT;
-	u32 pipestat_enable = PLANE_FLIP_DONE_INT_STATUS_VLV |
-		PIPE_CRC_DONE_INTERRUPT_STATUS;
-	unsigned long irqflags;
-	int pipe;
 
-	/*
-	 * Leave vblank interrupts masked initially.  enable/disable will
-	 * toggle them based on usage.
-	 */
-	dev_priv->irq_mask = ~enable_mask;
-
-	for_each_pipe(dev_priv, pipe)
-		I915_WRITE(PIPESTAT(pipe), 0xffff);
-
-	spin_lock_irqsave(&dev_priv->irq_lock, irqflags);
-	i915_enable_pipestat(dev_priv, PIPE_A, PIPE_GMBUS_INTERRUPT_STATUS);
-	for_each_pipe(dev_priv, pipe)
-		i915_enable_pipestat(dev_priv, pipe, pipestat_enable);
-	spin_unlock_irqrestore(&dev_priv->irq_lock, irqflags);
-
-	I915_WRITE(VLV_IIR, 0xffffffff);
-	I915_WRITE(VLV_IMR, dev_priv->irq_mask);
-	I915_WRITE(VLV_IER, enable_mask);
+	vlv_display_irq_postinstall(dev_priv);
 
 	gen8_gt_irq_postinstall(dev_priv);
 
@@ -3869,41 +3598,39 @@ static void gen8_irq_uninstall(struct drm_device *dev)
 	gen8_irq_reset(dev);
 }
 
+static void vlv_display_irq_uninstall(struct drm_i915_private *dev_priv)
+{
+	/* Interrupt setup is already guaranteed to be single-threaded, this is
+	 * just to make the assert_spin_locked check happy. */
+	spin_lock_irq(&dev_priv->irq_lock);
+	if (dev_priv->display_irqs_enabled)
+		valleyview_display_irqs_uninstall(dev_priv);
+	spin_unlock_irq(&dev_priv->irq_lock);
+
+	vlv_display_irq_reset(dev_priv);
+
+	dev_priv->irq_mask = 0;
+}
+
 static void valleyview_irq_uninstall(struct drm_device *dev)
 {
 	struct drm_i915_private *dev_priv = dev->dev_private;
-	unsigned long irqflags;
-	int pipe;
 
 	if (!dev_priv)
 		return;
 
 	I915_WRITE(VLV_MASTER_IER, 0);
 
-	for_each_pipe(dev_priv, pipe)
-		I915_WRITE(PIPESTAT(pipe), 0xffff);
+	gen5_gt_irq_reset(dev);
 
 	I915_WRITE(HWSTAM, 0xffffffff);
-	I915_WRITE(PORT_HOTPLUG_EN, 0);
-	I915_WRITE(PORT_HOTPLUG_STAT, I915_READ(PORT_HOTPLUG_STAT));
-
-	spin_lock_irqsave(&dev_priv->irq_lock, irqflags);
-	if (dev_priv->display_irqs_enabled)
-		valleyview_display_irqs_uninstall(dev_priv);
-	spin_unlock_irqrestore(&dev_priv->irq_lock, irqflags);
 
-	dev_priv->irq_mask = 0;
-
-	I915_WRITE(VLV_IIR, 0xffffffff);
-	I915_WRITE(VLV_IMR, 0xffffffff);
-	I915_WRITE(VLV_IER, 0x0);
-	POSTING_READ(VLV_IER);
+	vlv_display_irq_uninstall(dev_priv);
 }
 
 static void cherryview_irq_uninstall(struct drm_device *dev)
 {
 	struct drm_i915_private *dev_priv = dev->dev_private;
-	int pipe;
 
 	if (!dev_priv)
 		return;
@@ -3911,44 +3638,11 @@ static void cherryview_irq_uninstall(struct drm_device *dev)
 	I915_WRITE(GEN8_MASTER_IRQ, 0);
 	POSTING_READ(GEN8_MASTER_IRQ);
 
-#define GEN8_IRQ_FINI_NDX(type, which)				\
-do {								\
-	I915_WRITE(GEN8_##type##_IMR(which), 0xffffffff);	\
-	I915_WRITE(GEN8_##type##_IER(which), 0);		\
-	I915_WRITE(GEN8_##type##_IIR(which), 0xffffffff);	\
-	POSTING_READ(GEN8_##type##_IIR(which));			\
-	I915_WRITE(GEN8_##type##_IIR(which), 0xffffffff);	\
-} while (0)
-
-#define GEN8_IRQ_FINI(type)				\
-do {							\
-	I915_WRITE(GEN8_##type##_IMR, 0xffffffff);	\
-	I915_WRITE(GEN8_##type##_IER, 0);		\
-	I915_WRITE(GEN8_##type##_IIR, 0xffffffff);	\
-	POSTING_READ(GEN8_##type##_IIR);		\
-	I915_WRITE(GEN8_##type##_IIR, 0xffffffff);	\
-} while (0)
-
-	GEN8_IRQ_FINI_NDX(GT, 0);
-	GEN8_IRQ_FINI_NDX(GT, 1);
-	GEN8_IRQ_FINI_NDX(GT, 2);
-	GEN8_IRQ_FINI_NDX(GT, 3);
-
-	GEN8_IRQ_FINI(PCU);
-
-#undef GEN8_IRQ_FINI
-#undef GEN8_IRQ_FINI_NDX
-
-	I915_WRITE(PORT_HOTPLUG_EN, 0);
-	I915_WRITE(PORT_HOTPLUG_STAT, I915_READ(PORT_HOTPLUG_STAT));
+	gen8_gt_irq_reset(dev_priv);
 
-	for_each_pipe(dev_priv, pipe)
-		I915_WRITE(PIPESTAT(pipe), 0xffff);
+	GEN5_IRQ_RESET(GEN8_PCU_);
 
-	I915_WRITE(VLV_IMR, 0xffffffff);
-	I915_WRITE(VLV_IER, 0x0);
-	I915_WRITE(VLV_IIR, 0xffffffff);
-	POSTING_READ(VLV_IIR);
+	vlv_display_irq_uninstall(dev_priv);
 }
 
 static void ironlake_irq_uninstall(struct drm_device *dev)
@@ -3976,7 +3670,6 @@ static void i8xx_irq_preinstall(struct drm_device * dev)
 static int i8xx_irq_postinstall(struct drm_device *dev)
 {
 	struct drm_i915_private *dev_priv = dev->dev_private;
-	unsigned long irqflags;
 
 	I915_WRITE16(EMR,
 		     ~(I915_ERROR_PAGE_TABLE | I915_ERROR_MEMORY_REFRESH));
@@ -3999,10 +3692,10 @@ static int i8xx_irq_postinstall(struct drm_device *dev)
 
 	/* Interrupt setup is already guaranteed to be single-threaded, this is
 	 * just to make the assert_spin_locked check happy. */
-	spin_lock_irqsave(&dev_priv->irq_lock, irqflags);
+	spin_lock_irq(&dev_priv->irq_lock);
 	i915_enable_pipestat(dev_priv, PIPE_A, PIPE_CRC_DONE_INTERRUPT_STATUS);
 	i915_enable_pipestat(dev_priv, PIPE_B, PIPE_CRC_DONE_INTERRUPT_STATUS);
-	spin_unlock_irqrestore(&dev_priv->irq_lock, irqflags);
+	spin_unlock_irq(&dev_priv->irq_lock);
 
 	return 0;
 }
@@ -4047,7 +3740,6 @@ static irqreturn_t i8xx_irq_handler(int irq, void *arg)
 	struct drm_i915_private *dev_priv = dev->dev_private;
 	u16 iir, new_iir;
 	u32 pipe_stats[2];
-	unsigned long irqflags;
 	int pipe;
 	u16 flip_mask =
 		I915_DISPLAY_PLANE_A_FLIP_PENDING_INTERRUPT |
@@ -4063,11 +3755,9 @@ static irqreturn_t i8xx_irq_handler(int irq, void *arg)
 		 * It doesn't set the bit in iir again, but it still produces
 		 * interrupts (for non-MSI).
 		 */
-		spin_lock_irqsave(&dev_priv->irq_lock, irqflags);
+		spin_lock(&dev_priv->irq_lock);
 		if (iir & I915_RENDER_COMMAND_PARSER_ERROR_INTERRUPT)
-			i915_handle_error(dev, false,
-					  "Command parser error, iir 0x%08x",
-					  iir);
+			DRM_DEBUG("Command parser error, iir 0x%08x\n", iir);
 
 		for_each_pipe(dev_priv, pipe) {
 			int reg = PIPESTAT(pipe);
@@ -4079,13 +3769,11 @@ static irqreturn_t i8xx_irq_handler(int irq, void *arg)
 			if (pipe_stats[pipe] & 0x8000ffff)
 				I915_WRITE(reg, pipe_stats[pipe]);
 		}
-		spin_unlock_irqrestore(&dev_priv->irq_lock, irqflags);
+		spin_unlock(&dev_priv->irq_lock);
 
 		I915_WRITE16(IIR, iir & ~flip_mask);
 		new_iir = I915_READ16(IIR); /* Flush posted writes */
 
-		i915_update_dri1_breadcrumb(dev);
-
 		if (iir & I915_USER_INTERRUPT)
 			notify_ring(dev, &dev_priv->ring[RCS]);
 
@@ -4101,9 +3789,9 @@ static irqreturn_t i8xx_irq_handler(int irq, void *arg)
 			if (pipe_stats[pipe] & PIPE_CRC_DONE_INTERRUPT_STATUS)
 				i9xx_pipe_crc_irq_handler(dev, pipe);
 
-			if (pipe_stats[pipe] & PIPE_FIFO_UNDERRUN_STATUS &&
-			    intel_set_cpu_fifo_underrun_reporting(dev, pipe, false))
-				DRM_ERROR("pipe %c underrun\n", pipe_name(pipe));
+			if (pipe_stats[pipe] & PIPE_FIFO_UNDERRUN_STATUS)
+				intel_cpu_fifo_underrun_irq_handler(dev_priv,
+								    pipe);
 		}
 
 		iir = new_iir;
@@ -4149,7 +3837,6 @@ static int i915_irq_postinstall(struct drm_device *dev)
 {
 	struct drm_i915_private *dev_priv = dev->dev_private;
 	u32 enable_mask;
-	unsigned long irqflags;
 
 	I915_WRITE(EMR, ~(I915_ERROR_PAGE_TABLE | I915_ERROR_MEMORY_REFRESH));
 
@@ -4187,10 +3874,10 @@ static int i915_irq_postinstall(struct drm_device *dev)
 
 	/* Interrupt setup is already guaranteed to be single-threaded, this is
 	 * just to make the assert_spin_locked check happy. */
-	spin_lock_irqsave(&dev_priv->irq_lock, irqflags);
+	spin_lock_irq(&dev_priv->irq_lock);
 	i915_enable_pipestat(dev_priv, PIPE_A, PIPE_CRC_DONE_INTERRUPT_STATUS);
 	i915_enable_pipestat(dev_priv, PIPE_B, PIPE_CRC_DONE_INTERRUPT_STATUS);
-	spin_unlock_irqrestore(&dev_priv->irq_lock, irqflags);
+	spin_unlock_irq(&dev_priv->irq_lock);
 
 	return 0;
 }
@@ -4234,7 +3921,6 @@ static irqreturn_t i915_irq_handler(int irq, void *arg)
 	struct drm_device *dev = arg;
 	struct drm_i915_private *dev_priv = dev->dev_private;
 	u32 iir, new_iir, pipe_stats[I915_MAX_PIPES];
-	unsigned long irqflags;
 	u32 flip_mask =
 		I915_DISPLAY_PLANE_A_FLIP_PENDING_INTERRUPT |
 		I915_DISPLAY_PLANE_B_FLIP_PENDING_INTERRUPT;
@@ -4250,11 +3936,9 @@ static irqreturn_t i915_irq_handler(int irq, void *arg)
 		 * It doesn't set the bit in iir again, but it still produces
 		 * interrupts (for non-MSI).
 		 */
-		spin_lock_irqsave(&dev_priv->irq_lock, irqflags);
+		spin_lock(&dev_priv->irq_lock);
 		if (iir & I915_RENDER_COMMAND_PARSER_ERROR_INTERRUPT)
-			i915_handle_error(dev, false,
-					  "Command parser error, iir 0x%08x",
-					  iir);
+			DRM_DEBUG("Command parser error, iir 0x%08x\n", iir);
 
 		for_each_pipe(dev_priv, pipe) {
 			int reg = PIPESTAT(pipe);
@@ -4266,7 +3950,7 @@ static irqreturn_t i915_irq_handler(int irq, void *arg)
 				irq_received = true;
 			}
 		}
-		spin_unlock_irqrestore(&dev_priv->irq_lock, irqflags);
+		spin_unlock(&dev_priv->irq_lock);
 
 		if (!irq_received)
 			break;
@@ -4297,9 +3981,9 @@ static irqreturn_t i915_irq_handler(int irq, void *arg)
 			if (pipe_stats[pipe] & PIPE_CRC_DONE_INTERRUPT_STATUS)
 				i9xx_pipe_crc_irq_handler(dev, pipe);
 
-			if (pipe_stats[pipe] & PIPE_FIFO_UNDERRUN_STATUS &&
-			    intel_set_cpu_fifo_underrun_reporting(dev, pipe, false))
-				DRM_ERROR("pipe %c underrun\n", pipe_name(pipe));
+			if (pipe_stats[pipe] & PIPE_FIFO_UNDERRUN_STATUS)
+				intel_cpu_fifo_underrun_irq_handler(dev_priv,
+								    pipe);
 		}
 
 		if (blc_event || (iir & I915_ASLE_INTERRUPT))
@@ -4324,8 +4008,6 @@ static irqreturn_t i915_irq_handler(int irq, void *arg)
 		iir = new_iir;
 	} while (iir & ~flip_mask);
 
-	i915_update_dri1_breadcrumb(dev);
-
 	return ret;
 }
 
@@ -4372,7 +4054,6 @@ static int i965_irq_postinstall(struct drm_device *dev)
 	struct drm_i915_private *dev_priv = dev->dev_private;
 	u32 enable_mask;
 	u32 error_mask;
-	unsigned long irqflags;
 
 	/* Unmask the interrupts that we always want on. */
 	dev_priv->irq_mask = ~(I915_ASLE_INTERRUPT |
@@ -4393,11 +4074,11 @@ static int i965_irq_postinstall(struct drm_device *dev)
 
 	/* Interrupt setup is already guaranteed to be single-threaded, this is
 	 * just to make the assert_spin_locked check happy. */
-	spin_lock_irqsave(&dev_priv->irq_lock, irqflags);
+	spin_lock_irq(&dev_priv->irq_lock);
 	i915_enable_pipestat(dev_priv, PIPE_A, PIPE_GMBUS_INTERRUPT_STATUS);
 	i915_enable_pipestat(dev_priv, PIPE_A, PIPE_CRC_DONE_INTERRUPT_STATUS);
 	i915_enable_pipestat(dev_priv, PIPE_B, PIPE_CRC_DONE_INTERRUPT_STATUS);
-	spin_unlock_irqrestore(&dev_priv->irq_lock, irqflags);
+	spin_unlock_irq(&dev_priv->irq_lock);
 
 	/*
 	 * Enable some error detection, note the instruction error mask
@@ -4462,7 +4143,6 @@ static irqreturn_t i965_irq_handler(int irq, void *arg)
 	struct drm_i915_private *dev_priv = dev->dev_private;
 	u32 iir, new_iir;
 	u32 pipe_stats[I915_MAX_PIPES];
-	unsigned long irqflags;
 	int ret = IRQ_NONE, pipe;
 	u32 flip_mask =
 		I915_DISPLAY_PLANE_A_FLIP_PENDING_INTERRUPT |
@@ -4479,11 +4159,9 @@ static irqreturn_t i965_irq_handler(int irq, void *arg)
 		 * It doesn't set the bit in iir again, but it still produces
 		 * interrupts (for non-MSI).
 		 */
-		spin_lock_irqsave(&dev_priv->irq_lock, irqflags);
+		spin_lock(&dev_priv->irq_lock);
 		if (iir & I915_RENDER_COMMAND_PARSER_ERROR_INTERRUPT)
-			i915_handle_error(dev, false,
-					  "Command parser error, iir 0x%08x",
-					  iir);
+			DRM_DEBUG("Command parser error, iir 0x%08x\n", iir);
 
 		for_each_pipe(dev_priv, pipe) {
 			int reg = PIPESTAT(pipe);
@@ -4497,7 +4175,7 @@ static irqreturn_t i965_irq_handler(int irq, void *arg)
 				irq_received = true;
 			}
 		}
-		spin_unlock_irqrestore(&dev_priv->irq_lock, irqflags);
+		spin_unlock(&dev_priv->irq_lock);
 
 		if (!irq_received)
 			break;
@@ -4527,9 +4205,8 @@ static irqreturn_t i965_irq_handler(int irq, void *arg)
 			if (pipe_stats[pipe] & PIPE_CRC_DONE_INTERRUPT_STATUS)
 				i9xx_pipe_crc_irq_handler(dev, pipe);
 
-			if (pipe_stats[pipe] & PIPE_FIFO_UNDERRUN_STATUS &&
-			    intel_set_cpu_fifo_underrun_reporting(dev, pipe, false))
-				DRM_ERROR("pipe %c underrun\n", pipe_name(pipe));
+			if (pipe_stats[pipe] & PIPE_FIFO_UNDERRUN_STATUS)
+				intel_cpu_fifo_underrun_irq_handler(dev_priv, pipe);
 		}
 
 		if (blc_event || (iir & I915_ASLE_INTERRUPT))
@@ -4556,8 +4233,6 @@ static irqreturn_t i965_irq_handler(int irq, void *arg)
 		iir = new_iir;
 	}
 
-	i915_update_dri1_breadcrumb(dev);
-
 	return ret;
 }
 
@@ -4584,19 +4259,18 @@ static void i965_irq_uninstall(struct drm_device * dev)
 	I915_WRITE(IIR, I915_READ(IIR));
 }
 
-static void intel_hpd_irq_reenable(struct work_struct *work)
+static void intel_hpd_irq_reenable_work(struct work_struct *work)
 {
 	struct drm_i915_private *dev_priv =
 		container_of(work, typeof(*dev_priv),
 			     hotplug_reenable_work.work);
 	struct drm_device *dev = dev_priv->dev;
 	struct drm_mode_config *mode_config = &dev->mode_config;
-	unsigned long irqflags;
 	int i;
 
 	intel_runtime_pm_get(dev_priv);
 
-	spin_lock_irqsave(&dev_priv->irq_lock, irqflags);
+	spin_lock_irq(&dev_priv->irq_lock);
 	for (i = (HPD_NONE + 1); i < HPD_NUM_PINS; i++) {
 		struct drm_connector *connector;
 
@@ -4620,14 +4294,21 @@ static void intel_hpd_irq_reenable(struct work_struct *work)
 	}
 	if (dev_priv->display.hpd_irq_setup)
 		dev_priv->display.hpd_irq_setup(dev);
-	spin_unlock_irqrestore(&dev_priv->irq_lock, irqflags);
+	spin_unlock_irq(&dev_priv->irq_lock);
 
 	intel_runtime_pm_put(dev_priv);
 }
 
-void intel_irq_init(struct drm_device *dev)
+/**
+ * intel_irq_init - initializes irq support
+ * @dev_priv: i915 device instance
+ *
+ * This function initializes all the irq support including work items, timers
+ * and all the vtables. It does not set up the interrupt itself, though.
+ */
+void intel_irq_init(struct drm_i915_private *dev_priv)
 {
-	struct drm_i915_private *dev_priv = dev->dev_private;
+	struct drm_device *dev = dev_priv->dev;
 
 	INIT_WORK(&dev_priv->hotplug_work, i915_hotplug_work_func);
 	INIT_WORK(&dev_priv->dig_port_work, i915_digport_work_func);
@@ -4636,7 +4317,7 @@ void intel_irq_init(struct drm_device *dev)
 	INIT_WORK(&dev_priv->l3_parity.error_work, ivybridge_parity_work);
 
 	/* Let's track the enabled rps events */
-	if (IS_VALLEYVIEW(dev) && !IS_CHERRYVIEW(dev))
+	if (IS_VALLEYVIEW(dev_priv) && !IS_CHERRYVIEW(dev_priv))
 		/* WaGsvRC0ResidencyMethod:vlv */
 		dev_priv->pm_rps_events = GEN6_PM_RP_UP_EI_EXPIRED;
 	else
@@ -4646,17 +4327,14 @@ void intel_irq_init(struct drm_device *dev)
 		    i915_hangcheck_elapsed,
 		    (unsigned long) dev);
 	INIT_DELAYED_WORK(&dev_priv->hotplug_reenable_work,
-			  intel_hpd_irq_reenable);
+			  intel_hpd_irq_reenable_work);
 
 	pm_qos_add_request(&dev_priv->pm_qos, PM_QOS_CPU_DMA_LATENCY, PM_QOS_DEFAULT_VALUE);
 
-	/* Haven't installed the IRQ handler yet */
-	dev_priv->pm._irqs_disabled = true;
-
-	if (IS_GEN2(dev)) {
+	if (IS_GEN2(dev_priv)) {
 		dev->max_vblank_count = 0;
 		dev->driver->get_vblank_counter = i8xx_get_vblank_counter;
-	} else if (IS_G4X(dev) || INTEL_INFO(dev)->gen >= 5) {
+	} else if (IS_G4X(dev_priv) || INTEL_INFO(dev_priv)->gen >= 5) {
 		dev->max_vblank_count = 0xffffffff; /* full 32 bit counter */
 		dev->driver->get_vblank_counter = gm45_get_vblank_counter;
 	} else {
@@ -4669,7 +4347,7 @@ void intel_irq_init(struct drm_device *dev)
 	 * Gen2 doesn't have a hardware frame counter and so depends on
 	 * vblank interrupts to produce sane vblank seuquence numbers.
 	 */
-	if (!IS_GEN2(dev))
+	if (!IS_GEN2(dev_priv))
 		dev->vblank_disable_immediate = true;
 
 	if (drm_core_check_feature(dev, DRIVER_MODESET)) {
@@ -4677,7 +4355,7 @@ void intel_irq_init(struct drm_device *dev)
 		dev->driver->get_scanout_position = i915_get_crtc_scanoutpos;
 	}
 
-	if (IS_CHERRYVIEW(dev)) {
+	if (IS_CHERRYVIEW(dev_priv)) {
 		dev->driver->irq_handler = cherryview_irq_handler;
 		dev->driver->irq_preinstall = cherryview_irq_preinstall;
 		dev->driver->irq_postinstall = cherryview_irq_postinstall;
@@ -4685,7 +4363,7 @@ void intel_irq_init(struct drm_device *dev)
 		dev->driver->enable_vblank = valleyview_enable_vblank;
 		dev->driver->disable_vblank = valleyview_disable_vblank;
 		dev_priv->display.hpd_irq_setup = i915_hpd_irq_setup;
-	} else if (IS_VALLEYVIEW(dev)) {
+	} else if (IS_VALLEYVIEW(dev_priv)) {
 		dev->driver->irq_handler = valleyview_irq_handler;
 		dev->driver->irq_preinstall = valleyview_irq_preinstall;
 		dev->driver->irq_postinstall = valleyview_irq_postinstall;
@@ -4693,7 +4371,7 @@ void intel_irq_init(struct drm_device *dev)
 		dev->driver->enable_vblank = valleyview_enable_vblank;
 		dev->driver->disable_vblank = valleyview_disable_vblank;
 		dev_priv->display.hpd_irq_setup = i915_hpd_irq_setup;
-	} else if (IS_GEN8(dev)) {
+	} else if (INTEL_INFO(dev_priv)->gen >= 8) {
 		dev->driver->irq_handler = gen8_irq_handler;
 		dev->driver->irq_preinstall = gen8_irq_reset;
 		dev->driver->irq_postinstall = gen8_irq_postinstall;
@@ -4710,12 +4388,12 @@ void intel_irq_init(struct drm_device *dev)
 		dev->driver->disable_vblank = ironlake_disable_vblank;
 		dev_priv->display.hpd_irq_setup = ibx_hpd_irq_setup;
 	} else {
-		if (INTEL_INFO(dev)->gen == 2) {
+		if (INTEL_INFO(dev_priv)->gen == 2) {
 			dev->driver->irq_preinstall = i8xx_irq_preinstall;
 			dev->driver->irq_postinstall = i8xx_irq_postinstall;
 			dev->driver->irq_handler = i8xx_irq_handler;
 			dev->driver->irq_uninstall = i8xx_irq_uninstall;
-		} else if (INTEL_INFO(dev)->gen == 3) {
+		} else if (INTEL_INFO(dev_priv)->gen == 3) {
 			dev->driver->irq_preinstall = i915_irq_preinstall;
 			dev->driver->irq_postinstall = i915_irq_postinstall;
 			dev->driver->irq_uninstall = i915_irq_uninstall;
@@ -4733,12 +4411,23 @@ void intel_irq_init(struct drm_device *dev)
 	}
 }
 
-void intel_hpd_init(struct drm_device *dev)
+/**
+ * intel_hpd_init - initializes and enables hpd support
+ * @dev_priv: i915 device instance
+ *
+ * This function enables the hotplug support. It requires that interrupts have
+ * already been enabled with intel_irq_install(). From this point on hotplug and
+ * poll requests can run concurrently to other code, so locking rules must be
+ * obeyed.
+ *
+ * This is a separate step from interrupt enabling to simplify the locking rules
+ * in the driver load and resume code.
+ */
+void intel_hpd_init(struct drm_i915_private *dev_priv)
 {
-	struct drm_i915_private *dev_priv = dev->dev_private;
+	struct drm_device *dev = dev_priv->dev;
 	struct drm_mode_config *mode_config = &dev->mode_config;
 	struct drm_connector *connector;
-	unsigned long irqflags;
 	int i;
 
 	for (i = 1; i < HPD_NUM_PINS; i++) {
@@ -4756,27 +4445,72 @@ void intel_hpd_init(struct drm_device *dev)
 
 	/* Interrupt setup is already guaranteed to be single-threaded, this is
 	 * just to make the assert_spin_locked checks happy. */
-	spin_lock_irqsave(&dev_priv->irq_lock, irqflags);
+	spin_lock_irq(&dev_priv->irq_lock);
 	if (dev_priv->display.hpd_irq_setup)
 		dev_priv->display.hpd_irq_setup(dev);
-	spin_unlock_irqrestore(&dev_priv->irq_lock, irqflags);
+	spin_unlock_irq(&dev_priv->irq_lock);
 }
 
-/* Disable interrupts so we can allow runtime PM. */
-void intel_runtime_pm_disable_interrupts(struct drm_device *dev)
+/**
+ * intel_irq_install - enables the hardware interrupt
+ * @dev_priv: i915 device instance
+ *
+ * This function enables the hardware interrupt handling, but leaves the hotplug
+ * handling still disabled. It is called after intel_irq_init().
+ *
+ * In the driver load and resume code we need working interrupts in a few places
+ * but don't want to deal with the hassle of concurrent probe and hotplug
+ * workers. Hence the split into this two-stage approach.
+ */
+int intel_irq_install(struct drm_i915_private *dev_priv)
 {
-	struct drm_i915_private *dev_priv = dev->dev_private;
+	/*
+	 * We enable some interrupt sources in our postinstall hooks, so mark
+	 * interrupts as enabled _before_ actually enabling them to avoid
+	 * special cases in our ordering checks.
+	 */
+	dev_priv->pm.irqs_enabled = true;
 
-	dev->driver->irq_uninstall(dev);
-	dev_priv->pm._irqs_disabled = true;
+	return drm_irq_install(dev_priv->dev, dev_priv->dev->pdev->irq);
 }
 
-/* Restore interrupts so we can recover from runtime PM. */
-void intel_runtime_pm_restore_interrupts(struct drm_device *dev)
+/**
+ * intel_irq_uninstall - finalizes all irq handling
+ * @dev_priv: i915 device instance
+ *
+ * This stops interrupt and hotplug handling and unregisters and frees all
+ * resources acquired in the init functions.
+ */
+void intel_irq_uninstall(struct drm_i915_private *dev_priv)
 {
-	struct drm_i915_private *dev_priv = dev->dev_private;
+	drm_irq_uninstall(dev_priv->dev);
+	intel_hpd_cancel_work(dev_priv);
+	dev_priv->pm.irqs_enabled = false;
+}
 
-	dev_priv->pm._irqs_disabled = false;
-	dev->driver->irq_preinstall(dev);
-	dev->driver->irq_postinstall(dev);
+/**
+ * intel_runtime_pm_disable_interrupts - runtime interrupt disabling
+ * @dev_priv: i915 device instance
+ *
+ * This function is used to disable interrupts at runtime, both in the runtime
+ * pm and the system suspend/resume code.
+ */
+void intel_runtime_pm_disable_interrupts(struct drm_i915_private *dev_priv)
+{
+	dev_priv->dev->driver->irq_uninstall(dev_priv->dev);
+	dev_priv->pm.irqs_enabled = false;
+}
+
+/**
+ * intel_runtime_pm_enable_interrupts - runtime interrupt enabling
+ * @dev_priv: i915 device instance
+ *
+ * This function is used to enable interrupts at runtime, both in the runtime
+ * pm and the system suspend/resume code.
+ */
+void intel_runtime_pm_enable_interrupts(struct drm_i915_private *dev_priv)
+{
+	dev_priv->pm.irqs_enabled = true;
+	dev_priv->dev->driver->irq_preinstall(dev_priv->dev);
+	dev_priv->dev->driver->irq_postinstall(dev_priv->dev);
 }
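
Note on the ordering in intel_irq_install() above: the comment says that
pm.irqs_enabled is set before the postinstall hooks run, because those hooks
enable interrupt sources and the driver has ordering checks on that flag. The
following is a minimal userspace sketch of that idea, not driver code. Every
name in it (model_priv, model_enable_source, model_postinstall,
model_irq_install, model_irq_uninstall) is hypothetical, and the check inside
model_enable_source() is an assumed stand-in for the driver's real ordering
checks.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the driver's private state; not the real struct. */
struct model_priv {
	bool irqs_enabled;	/* plays the role of pm.irqs_enabled */
	int sources_enabled;	/* counts interrupt sources switched on */
};

/*
 * Assumed model of an ordering check: refuse to enable an interrupt
 * source while interrupts are still marked as disabled.
 */
static bool model_enable_source(struct model_priv *p)
{
	if (!p->irqs_enabled)
		return false;
	p->sources_enabled++;
	return true;
}

/* Models a postinstall hook, which enables some interrupt sources. */
static int model_postinstall(struct model_priv *p)
{
	return model_enable_source(p) ? 0 : -1;
}

/* Models the install step: set the flag first, then run the hook. */
static int model_irq_install(struct model_priv *p)
{
	p->irqs_enabled = true;
	return model_postinstall(p);
}

/* Models the uninstall step: tear down first, then clear the flag. */
static void model_irq_uninstall(struct model_priv *p)
{
	p->sources_enabled = 0;
	p->irqs_enabled = false;
}
```

In this model, calling model_postinstall() on a freshly zeroed model_priv
fails the check, while going through model_irq_install() succeeds. That is the
special case the flag-first ordering is meant to avoid.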
diff --git a/drivers/gpu/drm/i915/i915_reg.h b/drivers/gpu/drm/i915/i915_reg.h
index c01e5f31430e..eefdc238f70b 100644
--- a/drivers/gpu/drm/i915/i915_reg.h
+++ b/drivers/gpu/drm/i915/i915_reg.h
@@ -26,14 +26,25 @@
 #define _I915_REG_H_
 
 #define _PIPE(pipe, a, b) ((a) + (pipe)*((b)-(a)))
+#define _PLANE(plane, a, b) _PIPE(plane, a, b)
 #define _TRANSCODER(tran, a, b) ((a) + (tran)*((b)-(a)))
-
 #define _PORT(port, a, b) ((a) + (port)*((b)-(a)))
 #define _PIPE3(pipe, a, b, c) ((pipe) == PIPE_A ? (a) : \
 			       (pipe) == PIPE_B ? (b) : (c))
 
-#define _MASKED_BIT_ENABLE(a) (((a) << 16) | (a))
-#define _MASKED_BIT_DISABLE(a) ((a) << 16)
+#define _MASKED_FIELD(mask, value) ({					   \
+	if (__builtin_constant_p(mask))					   \
+		BUILD_BUG_ON_MSG(((mask) & 0xffff0000), "Incorrect mask"); \
+	if (__builtin_constant_p(value))				   \
+		BUILD_BUG_ON_MSG((value) & 0xffff0000, "Incorrect value"); \
+	if (__builtin_constant_p(mask) && __builtin_constant_p(value))	   \
+		BUILD_BUG_ON_MSG((value) & ~(mask),			   \
+				 "Incorrect value for mask");		   \
+	(mask) << 16 | (value); })
+#define _MASKED_BIT_ENABLE(a)	({ typeof(a) _a = (a); _MASKED_FIELD(_a, _a); })
+#define _MASKED_BIT_DISABLE(a)	(_MASKED_FIELD((a), 0))
+
+
 
 /* PCI config space */
 
@@ -74,15 +85,17 @@
 #define   I915_GC_RENDER_CLOCK_166_MHZ	(0 << 0)
 #define   I915_GC_RENDER_CLOCK_200_MHZ	(1 << 0)
 #define   I915_GC_RENDER_CLOCK_333_MHZ	(4 << 0)
+#define GCDGMBUS 0xcc
 #define PCI_LBPC 0xf4 /* legacy/combination backlight modes, also called LBB */
 
 
 /* Graphics reset regs */
-#define I965_GDRST 0xc0 /* PCI config register */
+#define I915_GDRST 0xc0 /* PCI config register */
 #define  GRDOM_FULL	(0<<2)
 #define  GRDOM_RENDER	(1<<2)
 #define  GRDOM_MEDIA	(3<<2)
 #define  GRDOM_MASK	(3<<2)
+#define  GRDOM_RESET_STATUS (1<<1)
 #define  GRDOM_RESET_ENABLE (1<<0)
 
 #define ILK_GDSR 0x2ca4 /* MCHBAR offset */
@@ -248,6 +261,16 @@
 #define   MI_DISPLAY_FLIP_IVB_SPRITE_B (3 << 19)
 #define   MI_DISPLAY_FLIP_IVB_PLANE_C  (4 << 19)
 #define   MI_DISPLAY_FLIP_IVB_SPRITE_C (5 << 19)
+/* SKL ones */
+#define   MI_DISPLAY_FLIP_SKL_PLANE_1_A	(0 << 8)
+#define   MI_DISPLAY_FLIP_SKL_PLANE_1_B	(1 << 8)
+#define   MI_DISPLAY_FLIP_SKL_PLANE_1_C	(2 << 8)
+#define   MI_DISPLAY_FLIP_SKL_PLANE_2_A	(4 << 8)
+#define   MI_DISPLAY_FLIP_SKL_PLANE_2_B	(5 << 8)
+#define   MI_DISPLAY_FLIP_SKL_PLANE_2_C	(6 << 8)
+#define   MI_DISPLAY_FLIP_SKL_PLANE_3_A	(7 << 8)
+#define   MI_DISPLAY_FLIP_SKL_PLANE_3_B	(8 << 8)
+#define   MI_DISPLAY_FLIP_SKL_PLANE_3_C	(9 << 8)
 #define MI_SEMAPHORE_MBOX	MI_INSTR(0x16, 1) /* gen6, gen7 */
 #define   MI_SEMAPHORE_GLOBAL_GTT    (1<<22)
 #define   MI_SEMAPHORE_UPDATE	    (1<<21)
@@ -314,6 +337,8 @@
 #define   MI_BATCH_GTT		    (2<<6) /* aliased with (1<<7) on gen4 */
 #define MI_BATCH_BUFFER_START_GEN8	MI_INSTR(0x31, 1)
 
+#define MI_PREDICATE_SRC0	(0x2400)
+#define MI_PREDICATE_SRC1	(0x2408)
 
 #define MI_PREDICATE_RESULT_2	(0x2214)
 #define  LOWER_SLICE_ENABLED	(1<<0)
@@ -564,6 +589,7 @@ enum punit_power_well {
 #define PUNIT_REG_GPU_LFM			0xd3
 #define PUNIT_REG_GPU_FREQ_REQ			0xd4
 #define PUNIT_REG_GPU_FREQ_STS			0xd8
+#define   GPLLENABLE				(1<<4)
 #define   GENFREQSTATUS				(1<<0)
 #define PUNIT_REG_MEDIA_TURBO_FREQ_REQ		0xdc
 #define PUNIT_REG_CZ_TIMESTAMP			0xce
@@ -672,7 +698,7 @@ enum punit_power_well {
  * need to be accessed during AUX communication,
  *
  * Generally the common lane corresponds to the pipe and
- * the spline (PCS/TX) correponds to the port.
+ * the spline (PCS/TX) corresponds to the port.
  *
  * For dual channel PHY (VLV/CHV):
  *
@@ -796,6 +822,8 @@ enum punit_power_well {
 #define _VLV_PCS_DW0_CH1		0x8400
 #define   DPIO_PCS_TX_LANE2_RESET	(1<<16)
 #define   DPIO_PCS_TX_LANE1_RESET	(1<<7)
+#define   DPIO_LEFT_TXFIFO_RST_MASTER2	(1<<4)
+#define   DPIO_RIGHT_TXFIFO_RST_MASTER2	(1<<3)
 #define VLV_PCS_DW0(ch) _PORT(ch, _VLV_PCS_DW0_CH0, _VLV_PCS_DW0_CH1)
 
 #define _VLV_PCS01_DW0_CH0		0x200
@@ -836,12 +864,31 @@ enum punit_power_well {
 
 #define _VLV_PCS_DW9_CH0		0x8224
 #define _VLV_PCS_DW9_CH1		0x8424
+#define   DPIO_PCS_TX2MARGIN_MASK	(0x7<<13)
+#define   DPIO_PCS_TX2MARGIN_000	(0<<13)
+#define   DPIO_PCS_TX2MARGIN_101	(1<<13)
+#define   DPIO_PCS_TX1MARGIN_MASK	(0x7<<10)
+#define   DPIO_PCS_TX1MARGIN_000	(0<<10)
+#define   DPIO_PCS_TX1MARGIN_101	(1<<10)
 #define	VLV_PCS_DW9(ch) _PORT(ch, _VLV_PCS_DW9_CH0, _VLV_PCS_DW9_CH1)
 
+#define _VLV_PCS01_DW9_CH0		0x224
+#define _VLV_PCS23_DW9_CH0		0x424
+#define _VLV_PCS01_DW9_CH1		0x2624
+#define _VLV_PCS23_DW9_CH1		0x2824
+#define VLV_PCS01_DW9(ch) _PORT(ch, _VLV_PCS01_DW9_CH0, _VLV_PCS01_DW9_CH1)
+#define VLV_PCS23_DW9(ch) _PORT(ch, _VLV_PCS23_DW9_CH0, _VLV_PCS23_DW9_CH1)
+
 #define _CHV_PCS_DW10_CH0		0x8228
 #define _CHV_PCS_DW10_CH1		0x8428
 #define   DPIO_PCS_SWING_CALC_TX0_TX2	(1<<30)
 #define   DPIO_PCS_SWING_CALC_TX1_TX3	(1<<31)
+#define   DPIO_PCS_TX2DEEMP_MASK	(0xf<<24)
+#define   DPIO_PCS_TX2DEEMP_9P5		(0<<24)
+#define   DPIO_PCS_TX2DEEMP_6P0		(2<<24)
+#define   DPIO_PCS_TX1DEEMP_MASK	(0xf<<16)
+#define   DPIO_PCS_TX1DEEMP_9P5		(0<<16)
+#define   DPIO_PCS_TX1DEEMP_6P0		(2<<16)
 #define CHV_PCS_DW10(ch) _PORT(ch, _CHV_PCS_DW10_CH0, _CHV_PCS_DW10_CH1)
 
 #define _VLV_PCS01_DW10_CH0		0x0228
@@ -853,8 +900,18 @@ enum punit_power_well {
 
 #define _VLV_PCS_DW11_CH0		0x822c
 #define _VLV_PCS_DW11_CH1		0x842c
+#define   DPIO_LANEDESKEW_STRAP_OVRD	(1<<3)
+#define   DPIO_LEFT_TXFIFO_RST_MASTER	(1<<1)
+#define   DPIO_RIGHT_TXFIFO_RST_MASTER	(1<<0)
 #define VLV_PCS_DW11(ch) _PORT(ch, _VLV_PCS_DW11_CH0, _VLV_PCS_DW11_CH1)
 
+#define _VLV_PCS01_DW11_CH0		0x022c
+#define _VLV_PCS23_DW11_CH0		0x042c
+#define _VLV_PCS01_DW11_CH1		0x262c
+#define _VLV_PCS23_DW11_CH1		0x282c
+#define VLV_PCS01_DW11(ch) _PORT(ch, _VLV_PCS01_DW11_CH0, _VLV_PCS01_DW11_CH1)
+#define VLV_PCS23_DW11(ch) _PORT(ch, _VLV_PCS23_DW11_CH0, _VLV_PCS23_DW11_CH1)
+
 #define _VLV_PCS_DW12_CH0		0x8230
 #define _VLV_PCS_DW12_CH1		0x8430
 #define VLV_PCS_DW12(ch) _PORT(ch, _VLV_PCS_DW12_CH0, _VLV_PCS_DW12_CH1)
@@ -1237,7 +1294,7 @@ enum punit_power_well {
 #define   GEN6_WIZ_HASHING_8x8				GEN6_WIZ_HASHING(0, 0)
 #define   GEN6_WIZ_HASHING_8x4				GEN6_WIZ_HASHING(0, 1)
 #define   GEN6_WIZ_HASHING_16x4				GEN6_WIZ_HASHING(1, 0)
-#define   GEN6_WIZ_HASHING_MASK				(GEN6_WIZ_HASHING(1, 1) << 16)
+#define   GEN6_WIZ_HASHING_MASK				GEN6_WIZ_HASHING(1, 1)
 #define   GEN6_TD_FOUR_ROW_DISPATCH_DISABLE		(1 << 5)
 
 #define GFX_MODE	0x02520
@@ -1999,6 +2056,8 @@ enum punit_power_well {
 #define DCC_ADDRESSING_MODE_MASK			(3 << 0)
 #define DCC_CHANNEL_XOR_DISABLE				(1 << 10)
 #define DCC_CHANNEL_XOR_BIT_17				(1 << 9)
+#define DCC2			0x10204
+#define DCC2_MODIFIED_ENHANCED_DISABLE			(1 << 20)
 
 /* Pineview MCH register contains DDR3 setting */
 #define CSHRDDR3CTL            0x101a8
@@ -2282,7 +2341,6 @@ enum punit_power_well {
 
 #define GEN6_GT_THREAD_STATUS_REG 0x13805c
 #define GEN6_GT_THREAD_STATUS_CORE_MASK 0x7
-#define GEN6_GT_THREAD_STATUS_CORE_MASK_HSW (0x7 | (0x07 << 16))
 
 #define GEN6_GT_PERF_STATUS	(MCHBAR_MIRROR_BASE_SNB + 0x5948)
 #define GEN6_RP_STATE_LIMITS	(MCHBAR_MIRROR_BASE_SNB + 0x5994)
@@ -2506,9 +2564,7 @@ enum punit_power_well {
 
 #define EDP_PSR_AUX_CTL(dev)			(EDP_PSR_BASE(dev) + 0x10)
 #define EDP_PSR_AUX_DATA1(dev)			(EDP_PSR_BASE(dev) + 0x14)
-#define   EDP_PSR_DPCD_COMMAND		0x80060000
 #define EDP_PSR_AUX_DATA2(dev)			(EDP_PSR_BASE(dev) + 0x18)
-#define   EDP_PSR_DPCD_NORMAL_OPERATION	(1<<24)
 #define EDP_PSR_AUX_DATA3(dev)			(EDP_PSR_BASE(dev) + 0x1c)
 #define EDP_PSR_AUX_DATA4(dev)			(EDP_PSR_BASE(dev) + 0x20)
 #define EDP_PSR_AUX_DATA5(dev)			(EDP_PSR_BASE(dev) + 0x24)
@@ -3645,6 +3701,7 @@ enum punit_power_well {
 #define   DP_AUX_CH_CTL_PRECHARGE_TEST	    (1 << 11)
 #define   DP_AUX_CH_CTL_BIT_CLOCK_2X_MASK    (0x7ff)
 #define   DP_AUX_CH_CTL_BIT_CLOCK_2X_SHIFT   0
+#define   DP_AUX_CH_CTL_SYNC_PULSE_SKL(c)   ((c) - 1)
 
 /*
  * Computing GMCH M and N values for the Display Port link
@@ -4024,17 +4081,18 @@ enum punit_power_well {
 #define   DSPFW_PLANEA_WM1_HI_MASK	(1<<0)
 
 /* drain latency register values*/
+#define DRAIN_LATENCY_PRECISION_16	16
 #define DRAIN_LATENCY_PRECISION_32	32
 #define DRAIN_LATENCY_PRECISION_64	64
 #define VLV_DDL(pipe)			(VLV_DISPLAY_BASE + 0x70050 + 4 * (pipe))
-#define DDL_CURSOR_PRECISION_64		(1<<31)
-#define DDL_CURSOR_PRECISION_32		(0<<31)
+#define DDL_CURSOR_PRECISION_HIGH	(1<<31)
+#define DDL_CURSOR_PRECISION_LOW	(0<<31)
 #define DDL_CURSOR_SHIFT		24
-#define DDL_SPRITE_PRECISION_64(sprite)	(1<<(15+8*(sprite)))
-#define DDL_SPRITE_PRECISION_32(sprite)	(0<<(15+8*(sprite)))
+#define DDL_SPRITE_PRECISION_HIGH(sprite)	(1<<(15+8*(sprite)))
+#define DDL_SPRITE_PRECISION_LOW(sprite)	(0<<(15+8*(sprite)))
 #define DDL_SPRITE_SHIFT(sprite)	(8+8*(sprite))
-#define DDL_PLANE_PRECISION_64		(1<<7)
-#define DDL_PLANE_PRECISION_32		(0<<7)
+#define DDL_PLANE_PRECISION_HIGH	(1<<7)
+#define DDL_PLANE_PRECISION_LOW		(0<<7)
 #define DDL_PLANE_SHIFT			0
 #define DRAIN_LATENCY_MASK		0x7f
 
@@ -4071,6 +4129,41 @@ enum punit_power_well {
 #define I965_CURSOR_MAX_WM	32
 #define I965_CURSOR_DFT_WM	8
 
+/* Watermark register definitions for SKL */
+#define CUR_WM_A_0		0x70140
+#define CUR_WM_B_0		0x71140
+#define PLANE_WM_1_A_0		0x70240
+#define PLANE_WM_1_B_0		0x71240
+#define PLANE_WM_2_A_0		0x70340
+#define PLANE_WM_2_B_0		0x71340
+#define PLANE_WM_TRANS_1_A_0	0x70268
+#define PLANE_WM_TRANS_1_B_0	0x71268
+#define PLANE_WM_TRANS_2_A_0	0x70368
+#define PLANE_WM_TRANS_2_B_0	0x71368
+#define CUR_WM_TRANS_A_0	0x70168
+#define CUR_WM_TRANS_B_0	0x71168
+#define   PLANE_WM_EN		(1 << 31)
+#define   PLANE_WM_LINES_SHIFT	14
+#define   PLANE_WM_LINES_MASK	0x1f
+#define   PLANE_WM_BLOCKS_MASK	0x3ff
+
+#define CUR_WM_0(pipe) _PIPE(pipe, CUR_WM_A_0, CUR_WM_B_0)
+#define CUR_WM(pipe, level) (CUR_WM_0(pipe) + ((4) * (level)))
+#define CUR_WM_TRANS(pipe) _PIPE(pipe, CUR_WM_TRANS_A_0, CUR_WM_TRANS_B_0)
+
+#define _PLANE_WM_1(pipe) _PIPE(pipe, PLANE_WM_1_A_0, PLANE_WM_1_B_0)
+#define _PLANE_WM_2(pipe) _PIPE(pipe, PLANE_WM_2_A_0, PLANE_WM_2_B_0)
+#define _PLANE_WM_BASE(pipe, plane)	\
+			_PLANE(plane, _PLANE_WM_1(pipe), _PLANE_WM_2(pipe))
+#define PLANE_WM(pipe, plane, level)	\
+			(_PLANE_WM_BASE(pipe, plane) + ((4) * (level)))
+#define _PLANE_WM_TRANS_1(pipe)	\
+			_PIPE(pipe, PLANE_WM_TRANS_1_A_0, PLANE_WM_TRANS_1_B_0)
+#define _PLANE_WM_TRANS_2(pipe)	\
+			_PIPE(pipe, PLANE_WM_TRANS_2_A_0, PLANE_WM_TRANS_2_B_0)
+#define PLANE_WM_TRANS(pipe, plane)	\
+		_PLANE(plane, _PLANE_WM_TRANS_1(pipe), _PLANE_WM_TRANS_2(pipe))
+
 /* define the Watermark register on Ironlake */
 #define WM0_PIPEA_ILK		0x45100
 #define  WM0_PIPE_PLANE_MASK	(0xffff<<16)
@@ -4177,6 +4270,7 @@ enum punit_power_well {
 #define   MCURSOR_PIPE_A	0x00
 #define   MCURSOR_PIPE_B	(1 << 28)
 #define   MCURSOR_GAMMA_ENABLE  (1 << 26)
+#define   CURSOR_ROTATE_180	(1<<15)
 #define   CURSOR_TRICKLE_FEED_DISABLE	(1 << 14)
 #define _CURABASE		0x70084
 #define _CURAPOS		0x70088
@@ -4240,9 +4334,11 @@ enum punit_power_well {
 #define   DISPPLANE_NO_LINE_DOUBLE		0
 #define   DISPPLANE_STEREO_POLARITY_FIRST	0
 #define   DISPPLANE_STEREO_POLARITY_SECOND	(1<<18)
-#define   DISPPLANE_ROTATE_180         (1<<15)
+#define   DISPPLANE_ALPHA_PREMULTIPLY		(1<<16) /* CHV pipe B */
+#define   DISPPLANE_ROTATE_180			(1<<15)
 #define   DISPPLANE_TRICKLE_FEED_DISABLE	(1<<14) /* Ironlake */
 #define   DISPPLANE_TILED			(1<<10)
+#define   DISPPLANE_MIRROR			(1<<8) /* CHV pipe B */
 #define _DSPAADDR				0x70184
 #define _DSPASTRIDE				0x70188
 #define _DSPAPOS				0x7018C /* reserved */
@@ -4263,6 +4359,24 @@ enum punit_power_well {
 #define DSPOFFSET(plane) _PIPE2(plane, _DSPAOFFSET)
 #define DSPSURFLIVE(plane) _PIPE2(plane, _DSPASURFLIVE)
 
+/* CHV pipe B blender and primary plane */
+#define _CHV_BLEND_A		0x60a00
+#define   CHV_BLEND_LEGACY		(0<<30)
+#define   CHV_BLEND_ANDROID		(1<<30)
+#define   CHV_BLEND_MPO			(2<<30)
+#define   CHV_BLEND_MASK		(3<<30)
+#define _CHV_CANVAS_A		0x60a04
+#define _PRIMPOS_A		0x60a08
+#define _PRIMSIZE_A		0x60a0c
+#define _PRIMCNSTALPHA_A	0x60a10
+#define   PRIM_CONST_ALPHA_ENABLE	(1<<31)
+
+#define CHV_BLEND(pipe) _TRANSCODER2(pipe, _CHV_BLEND_A)
+#define CHV_CANVAS(pipe) _TRANSCODER2(pipe, _CHV_CANVAS_A)
+#define PRIMPOS(plane) _TRANSCODER2(plane, _PRIMPOS_A)
+#define PRIMSIZE(plane) _TRANSCODER2(plane, _PRIMSIZE_A)
+#define PRIMCNSTALPHA(plane) _TRANSCODER2(plane, _PRIMCNSTALPHA_A)
+
 /* Display/Sprite base address macros */
 #define DISP_BASEADDR_MASK	(0xfffff000)
 #define I915_LO_DISPBASE(val)	(val & ~DISP_BASEADDR_MASK)
@@ -4464,6 +4578,7 @@ enum punit_power_well {
 #define   SP_FORMAT_RGBA1010102		(9<<26)
 #define   SP_FORMAT_RGBX8888		(0xe<<26)
 #define   SP_FORMAT_RGBA8888		(0xf<<26)
+#define   SP_ALPHA_PREMULTIPLY		(1<<23) /* CHV pipe B */
 #define   SP_SOURCE_KEY			(1<<22)
 #define   SP_YUV_BYTE_ORDER_MASK	(3<<16)
 #define   SP_YUV_ORDER_YUYV		(0<<16)
@@ -4472,6 +4587,7 @@ enum punit_power_well {
 #define   SP_YUV_ORDER_VYUY		(3<<16)
 #define   SP_ROTATE_180			(1<<15)
 #define   SP_TILED			(1<<10)
+#define   SP_MIRROR			(1<<8) /* CHV pipe B */
 #define _SPALINOFF		(VLV_DISPLAY_BASE + 0x72184)
 #define _SPASTRIDE		(VLV_DISPLAY_BASE + 0x72188)
 #define _SPAPOS			(VLV_DISPLAY_BASE + 0x7218c)
@@ -4482,6 +4598,7 @@ enum punit_power_well {
 #define _SPAKEYMAXVAL		(VLV_DISPLAY_BASE + 0x721a0)
 #define _SPATILEOFF		(VLV_DISPLAY_BASE + 0x721a4)
 #define _SPACONSTALPHA		(VLV_DISPLAY_BASE + 0x721a8)
+#define   SP_CONST_ALPHA_ENABLE		(1<<31)
 #define _SPAGAMC		(VLV_DISPLAY_BASE + 0x721f4)
 
 #define _SPBCNTR		(VLV_DISPLAY_BASE + 0x72280)
@@ -4510,6 +4627,195 @@ enum punit_power_well {
 #define SPCONSTALPHA(pipe, plane) _PIPE(pipe * 2 + plane, _SPACONSTALPHA, _SPBCONSTALPHA)
 #define SPGAMC(pipe, plane) _PIPE(pipe * 2 + plane, _SPAGAMC, _SPBGAMC)
 
+/*
+ * CHV pipe B sprite CSC
+ *
+ * |cr|   |c0 c1 c2|   |cr + cr_ioff|   |cr_ooff|
+ * |yg| = |c3 c4 c5| x |yg + yg_ioff| + |yg_ooff|
+ * |cb|   |c6 c7 c8|   |cb + cb_ioff|   |cb_ooff|
+ */
+#define SPCSCYGOFF(sprite)	(VLV_DISPLAY_BASE + 0x6d900 + (sprite) * 0x1000)
+#define SPCSCCBOFF(sprite)	(VLV_DISPLAY_BASE + 0x6d904 + (sprite) * 0x1000)
+#define SPCSCCROFF(sprite)	(VLV_DISPLAY_BASE + 0x6d908 + (sprite) * 0x1000)
+#define  SPCSC_OOFF(x)		(((x) & 0x7ff) << 16) /* s11 */
+#define  SPCSC_IOFF(x)		(((x) & 0x7ff) << 0) /* s11 */
+
+#define SPCSCC01(sprite)	(VLV_DISPLAY_BASE + 0x6d90c + (sprite) * 0x1000)
+#define SPCSCC23(sprite)	(VLV_DISPLAY_BASE + 0x6d910 + (sprite) * 0x1000)
+#define SPCSCC45(sprite)	(VLV_DISPLAY_BASE + 0x6d914 + (sprite) * 0x1000)
+#define SPCSCC67(sprite)	(VLV_DISPLAY_BASE + 0x6d918 + (sprite) * 0x1000)
+#define SPCSCC8(sprite)		(VLV_DISPLAY_BASE + 0x6d91c + (sprite) * 0x1000)
+#define  SPCSC_C1(x)		(((x) & 0x7fff) << 16) /* s3.12 */
+#define  SPCSC_C0(x)		(((x) & 0x7fff) << 0) /* s3.12 */
+
+#define SPCSCYGICLAMP(sprite)	(VLV_DISPLAY_BASE + 0x6d920 + (sprite) * 0x1000)
+#define SPCSCCBICLAMP(sprite)	(VLV_DISPLAY_BASE + 0x6d924 + (sprite) * 0x1000)
+#define SPCSCCRICLAMP(sprite)	(VLV_DISPLAY_BASE + 0x6d928 + (sprite) * 0x1000)
+#define  SPCSC_IMAX(x)		(((x) & 0x7ff) << 16) /* s11 */
+#define  SPCSC_IMIN(x)		(((x) & 0x7ff) << 0) /* s11 */
+
+#define SPCSCYGOCLAMP(sprite)	(VLV_DISPLAY_BASE + 0x6d92c + (sprite) * 0x1000)
+#define SPCSCCBOCLAMP(sprite)	(VLV_DISPLAY_BASE + 0x6d930 + (sprite) * 0x1000)
+#define SPCSCCROCLAMP(sprite)	(VLV_DISPLAY_BASE + 0x6d934 + (sprite) * 0x1000)
+#define  SPCSC_OMAX(x)		((x) << 16) /* u10 */
+#define  SPCSC_OMIN(x)		((x) << 0) /* u10 */
+
+/* Skylake plane registers */
+
+#define _PLANE_CTL_1_A				0x70180
+#define _PLANE_CTL_2_A				0x70280
+#define _PLANE_CTL_3_A				0x70380
+#define   PLANE_CTL_ENABLE			(1 << 31)
+#define   PLANE_CTL_PIPE_GAMMA_ENABLE		(1 << 30)
+#define   PLANE_CTL_FORMAT_MASK			(0xf << 24)
+#define   PLANE_CTL_FORMAT_YUV422		(  0 << 24)
+#define   PLANE_CTL_FORMAT_NV12			(  1 << 24)
+#define   PLANE_CTL_FORMAT_XRGB_2101010		(  2 << 24)
+#define   PLANE_CTL_FORMAT_XRGB_8888		(  4 << 24)
+#define   PLANE_CTL_FORMAT_XRGB_16161616F	(  6 << 24)
+#define   PLANE_CTL_FORMAT_AYUV			(  8 << 24)
+#define   PLANE_CTL_FORMAT_INDEXED		( 12 << 24)
+#define   PLANE_CTL_FORMAT_RGB_565		( 14 << 24)
+#define   PLANE_CTL_PIPE_CSC_ENABLE		(1 << 23)
+#define   PLANE_CTL_KEY_ENABLE_MASK		(0x3 << 21)
+#define   PLANE_CTL_KEY_ENABLE_SOURCE		(  1 << 21)
+#define   PLANE_CTL_KEY_ENABLE_DESTINATION	(  2 << 21)
+#define   PLANE_CTL_ORDER_BGRX			(0 << 20)
+#define   PLANE_CTL_ORDER_RGBX			(1 << 20)
+#define   PLANE_CTL_YUV422_ORDER_MASK		(0x3 << 16)
+#define   PLANE_CTL_YUV422_YUYV			(  0 << 16)
+#define   PLANE_CTL_YUV422_UYVY			(  1 << 16)
+#define   PLANE_CTL_YUV422_YVYU			(  2 << 16)
+#define   PLANE_CTL_YUV422_VYUY			(  3 << 16)
+#define   PLANE_CTL_DECOMPRESSION_ENABLE	(1 << 15)
+#define   PLANE_CTL_TRICKLE_FEED_DISABLE	(1 << 14)
+#define   PLANE_CTL_PLANE_GAMMA_DISABLE		(1 << 13)
+#define   PLANE_CTL_TILED_MASK			(0x7 << 10)
+#define   PLANE_CTL_TILED_LINEAR		(  0 << 10)
+#define   PLANE_CTL_TILED_X			(  1 << 10)
+#define   PLANE_CTL_TILED_Y			(  4 << 10)
+#define   PLANE_CTL_TILED_YF			(  5 << 10)
+#define   PLANE_CTL_ALPHA_MASK			(0x3 << 4)
+#define   PLANE_CTL_ALPHA_DISABLE		(  0 << 4)
+#define   PLANE_CTL_ALPHA_SW_PREMULTIPLY	(  2 << 4)
+#define   PLANE_CTL_ALPHA_HW_PREMULTIPLY	(  3 << 4)
+#define   PLANE_CTL_ROTATE_MASK			0x3
+#define   PLANE_CTL_ROTATE_0			0x0
+#define   PLANE_CTL_ROTATE_180			0x2
+#define _PLANE_STRIDE_1_A			0x70188
+#define _PLANE_STRIDE_2_A			0x70288
+#define _PLANE_STRIDE_3_A			0x70388
+#define _PLANE_POS_1_A				0x7018c
+#define _PLANE_POS_2_A				0x7028c
+#define _PLANE_POS_3_A				0x7038c
+#define _PLANE_SIZE_1_A				0x70190
+#define _PLANE_SIZE_2_A				0x70290
+#define _PLANE_SIZE_3_A				0x70390
+#define _PLANE_SURF_1_A				0x7019c
+#define _PLANE_SURF_2_A				0x7029c
+#define _PLANE_SURF_3_A				0x7039c
+#define _PLANE_OFFSET_1_A			0x701a4
+#define _PLANE_OFFSET_2_A			0x702a4
+#define _PLANE_OFFSET_3_A			0x703a4
+#define _PLANE_KEYVAL_1_A			0x70194
+#define _PLANE_KEYVAL_2_A			0x70294
+#define _PLANE_KEYMSK_1_A			0x70198
+#define _PLANE_KEYMSK_2_A			0x70298
+#define _PLANE_KEYMAX_1_A			0x701a0
+#define _PLANE_KEYMAX_2_A			0x702a0
+#define _PLANE_BUF_CFG_1_A			0x7027c
+#define _PLANE_BUF_CFG_2_A			0x7037c
+
+#define _PLANE_CTL_1_B				0x71180
+#define _PLANE_CTL_2_B				0x71280
+#define _PLANE_CTL_3_B				0x71380
+#define _PLANE_CTL_1(pipe)	_PIPE(pipe, _PLANE_CTL_1_A, _PLANE_CTL_1_B)
+#define _PLANE_CTL_2(pipe)	_PIPE(pipe, _PLANE_CTL_2_A, _PLANE_CTL_2_B)
+#define _PLANE_CTL_3(pipe)	_PIPE(pipe, _PLANE_CTL_3_A, _PLANE_CTL_3_B)
+#define PLANE_CTL(pipe, plane)	\
+	_PLANE(plane, _PLANE_CTL_1(pipe), _PLANE_CTL_2(pipe))
+
+#define _PLANE_STRIDE_1_B			0x71188
+#define _PLANE_STRIDE_2_B			0x71288
+#define _PLANE_STRIDE_3_B			0x71388
+#define _PLANE_STRIDE_1(pipe)	\
+	_PIPE(pipe, _PLANE_STRIDE_1_A, _PLANE_STRIDE_1_B)
+#define _PLANE_STRIDE_2(pipe)	\
+	_PIPE(pipe, _PLANE_STRIDE_2_A, _PLANE_STRIDE_2_B)
+#define _PLANE_STRIDE_3(pipe)	\
+	_PIPE(pipe, _PLANE_STRIDE_3_A, _PLANE_STRIDE_3_B)
+#define PLANE_STRIDE(pipe, plane)	\
+	_PLANE(plane, _PLANE_STRIDE_1(pipe), _PLANE_STRIDE_2(pipe))
+
+#define _PLANE_POS_1_B				0x7118c
+#define _PLANE_POS_2_B				0x7128c
+#define _PLANE_POS_3_B				0x7138c
+#define _PLANE_POS_1(pipe)	_PIPE(pipe, _PLANE_POS_1_A, _PLANE_POS_1_B)
+#define _PLANE_POS_2(pipe)	_PIPE(pipe, _PLANE_POS_2_A, _PLANE_POS_2_B)
+#define _PLANE_POS_3(pipe)	_PIPE(pipe, _PLANE_POS_3_A, _PLANE_POS_3_B)
+#define PLANE_POS(pipe, plane)	\
+	_PLANE(plane, _PLANE_POS_1(pipe), _PLANE_POS_2(pipe))
+
+#define _PLANE_SIZE_1_B				0x71190
+#define _PLANE_SIZE_2_B				0x71290
+#define _PLANE_SIZE_3_B				0x71390
+#define _PLANE_SIZE_1(pipe)	_PIPE(pipe, _PLANE_SIZE_1_A, _PLANE_SIZE_1_B)
+#define _PLANE_SIZE_2(pipe)	_PIPE(pipe, _PLANE_SIZE_2_A, _PLANE_SIZE_2_B)
+#define _PLANE_SIZE_3(pipe)	_PIPE(pipe, _PLANE_SIZE_3_A, _PLANE_SIZE_3_B)
+#define PLANE_SIZE(pipe, plane)	\
+	_PLANE(plane, _PLANE_SIZE_1(pipe), _PLANE_SIZE_2(pipe))
+
+#define _PLANE_SURF_1_B				0x7119c
+#define _PLANE_SURF_2_B				0x7129c
+#define _PLANE_SURF_3_B				0x7139c
+#define _PLANE_SURF_1(pipe)	_PIPE(pipe, _PLANE_SURF_1_A, _PLANE_SURF_1_B)
+#define _PLANE_SURF_2(pipe)	_PIPE(pipe, _PLANE_SURF_2_A, _PLANE_SURF_2_B)
+#define _PLANE_SURF_3(pipe)	_PIPE(pipe, _PLANE_SURF_3_A, _PLANE_SURF_3_B)
+#define PLANE_SURF(pipe, plane)	\
+	_PLANE(plane, _PLANE_SURF_1(pipe), _PLANE_SURF_2(pipe))
+
+#define _PLANE_OFFSET_1_B			0x711a4
+#define _PLANE_OFFSET_2_B			0x712a4
+#define _PLANE_OFFSET_1(pipe) _PIPE(pipe, _PLANE_OFFSET_1_A, _PLANE_OFFSET_1_B)
+#define _PLANE_OFFSET_2(pipe) _PIPE(pipe, _PLANE_OFFSET_2_A, _PLANE_OFFSET_2_B)
+#define PLANE_OFFSET(pipe, plane)	\
+	_PLANE(plane, _PLANE_OFFSET_1(pipe), _PLANE_OFFSET_2(pipe))
+
+#define _PLANE_KEYVAL_1_B			0x71194
+#define _PLANE_KEYVAL_2_B			0x71294
+#define _PLANE_KEYVAL_1(pipe) _PIPE(pipe, _PLANE_KEYVAL_1_A, _PLANE_KEYVAL_1_B)
+#define _PLANE_KEYVAL_2(pipe) _PIPE(pipe, _PLANE_KEYVAL_2_A, _PLANE_KEYVAL_2_B)
+#define PLANE_KEYVAL(pipe, plane)	\
+	_PLANE(plane, _PLANE_KEYVAL_1(pipe), _PLANE_KEYVAL_2(pipe))
+
+#define _PLANE_KEYMSK_1_B			0x71198
+#define _PLANE_KEYMSK_2_B			0x71298
+#define _PLANE_KEYMSK_1(pipe) _PIPE(pipe, _PLANE_KEYMSK_1_A, _PLANE_KEYMSK_1_B)
+#define _PLANE_KEYMSK_2(pipe) _PIPE(pipe, _PLANE_KEYMSK_2_A, _PLANE_KEYMSK_2_B)
+#define PLANE_KEYMSK(pipe, plane)	\
+	_PLANE(plane, _PLANE_KEYMSK_1(pipe), _PLANE_KEYMSK_2(pipe))
+
+#define _PLANE_KEYMAX_1_B			0x711a0
+#define _PLANE_KEYMAX_2_B			0x712a0
+#define _PLANE_KEYMAX_1(pipe) _PIPE(pipe, _PLANE_KEYMAX_1_A, _PLANE_KEYMAX_1_B)
+#define _PLANE_KEYMAX_2(pipe) _PIPE(pipe, _PLANE_KEYMAX_2_A, _PLANE_KEYMAX_2_B)
+#define PLANE_KEYMAX(pipe, plane)	\
+	_PLANE(plane, _PLANE_KEYMAX_1(pipe), _PLANE_KEYMAX_2(pipe))
+
+#define _PLANE_BUF_CFG_1_B			0x7127c
+#define _PLANE_BUF_CFG_2_B			0x7137c
+#define _PLANE_BUF_CFG_1(pipe)	\
+	_PIPE(pipe, _PLANE_BUF_CFG_1_A, _PLANE_BUF_CFG_1_B)
+#define _PLANE_BUF_CFG_2(pipe)	\
+	_PIPE(pipe, _PLANE_BUF_CFG_2_A, _PLANE_BUF_CFG_2_B)
+#define PLANE_BUF_CFG(pipe, plane)	\
+	_PLANE(plane, _PLANE_BUF_CFG_1(pipe), _PLANE_BUF_CFG_2(pipe))
+
+/* SKL new cursor registers */
+#define _CUR_BUF_CFG_A				0x7017c
+#define _CUR_BUF_CFG_B				0x7117c
+#define CUR_BUF_CFG(pipe)	_PIPE(pipe, _CUR_BUF_CFG_A, _CUR_BUF_CFG_B)
+
 /* VBIOS regs */
 #define VGACNTRL		0x71400
 # define VGA_DISP_DISABLE			(1 << 31)
@@ -4625,6 +4931,18 @@ enum punit_power_well {
 #define PF_VSCALE(pipe)		_PIPE(pipe, _PFA_VSCALE, _PFB_VSCALE)
 #define PF_HSCALE(pipe)		_PIPE(pipe, _PFA_HSCALE, _PFB_HSCALE)
 
+#define _PSA_CTL		0x68180
+#define _PSB_CTL		0x68980
+#define PS_ENABLE		(1<<31)
+#define _PSA_WIN_SZ		0x68174
+#define _PSB_WIN_SZ		0x68974
+#define _PSA_WIN_POS		0x68170
+#define _PSB_WIN_POS		0x68970
+
+#define PS_CTL(pipe)		_PIPE(pipe, _PSA_CTL, _PSB_CTL)
+#define PS_WIN_SZ(pipe)		_PIPE(pipe, _PSA_WIN_SZ, _PSB_WIN_SZ)
+#define PS_WIN_POS(pipe)	_PIPE(pipe, _PSA_WIN_POS, _PSB_WIN_POS)
+
 /* legacy palette */
 #define _LGC_PALETTE_A           0x4a000
 #define _LGC_PALETTE_B           0x4a800
@@ -4746,16 +5064,32 @@ enum punit_power_well {
 #define  GEN8_PIPE_SCAN_LINE_EVENT	(1 << 2)
 #define  GEN8_PIPE_VSYNC		(1 << 1)
 #define  GEN8_PIPE_VBLANK		(1 << 0)
+#define  GEN9_PIPE_CURSOR_FAULT		(1 << 11)
+#define  GEN9_PIPE_PLANE3_FAULT		(1 << 9)
+#define  GEN9_PIPE_PLANE2_FAULT		(1 << 8)
+#define  GEN9_PIPE_PLANE1_FAULT		(1 << 7)
+#define  GEN9_PIPE_PLANE3_FLIP_DONE	(1 << 5)
+#define  GEN9_PIPE_PLANE2_FLIP_DONE	(1 << 4)
+#define  GEN9_PIPE_PLANE1_FLIP_DONE	(1 << 3)
+#define  GEN9_PIPE_PLANE_FLIP_DONE(p)	(1 << (3 + (p)))
 #define GEN8_DE_PIPE_IRQ_FAULT_ERRORS \
 	(GEN8_PIPE_CURSOR_FAULT | \
 	 GEN8_PIPE_SPRITE_FAULT | \
 	 GEN8_PIPE_PRIMARY_FAULT)
+#define GEN9_DE_PIPE_IRQ_FAULT_ERRORS \
+	(GEN9_PIPE_CURSOR_FAULT | \
+	 GEN9_PIPE_PLANE3_FAULT | \
+	 GEN9_PIPE_PLANE2_FAULT | \
+	 GEN9_PIPE_PLANE1_FAULT)
 
 #define GEN8_DE_PORT_ISR 0x44440
 #define GEN8_DE_PORT_IMR 0x44444
 #define GEN8_DE_PORT_IIR 0x44448
 #define GEN8_DE_PORT_IER 0x4444c
 #define  GEN8_PORT_DP_A_HOTPLUG		(1 << 3)
+#define  GEN9_AUX_CHANNEL_D		(1 << 27)
+#define  GEN9_AUX_CHANNEL_C		(1 << 26)
+#define  GEN9_AUX_CHANNEL_B		(1 << 25)
 #define  GEN8_AUX_CHANNEL_A		(1 << 0)
 
 #define GEN8_DE_MISC_ISR 0x44460
@@ -4839,6 +5173,8 @@ enum punit_power_well {
 /* GEN8 chicken */
 #define HDC_CHICKEN0				0x7300
 #define  HDC_FORCE_NON_COHERENT			(1<<4)
+#define  HDC_DONOT_FETCH_MEM_WHEN_MASKED	(1<<11)
+#define  HDC_FENCE_DEST_SLM_DISABLE		(1<<14)
 
 /* WaCatErrorRejectionIssue */
 #define GEN7_SQ_CHICKEN_MBCUNIT_CONFIG		0x9030
@@ -5540,6 +5876,12 @@ enum punit_power_well {
 #define   VLV_GTLC_PW_MEDIA_STATUS_MASK		(1 << 5)
 #define   VLV_GTLC_PW_RENDER_STATUS_MASK	(1 << 7)
 #define  FORCEWAKE_MT				0xa188 /* multi-threaded */
+#define  FORCEWAKE_MEDIA_GEN9			0xa270
+#define  FORCEWAKE_RENDER_GEN9			0xa278
+#define  FORCEWAKE_BLITTER_GEN9			0xa188
+#define  FORCEWAKE_ACK_MEDIA_GEN9		0x0D88
+#define  FORCEWAKE_ACK_RENDER_GEN9		0x0D84
+#define  FORCEWAKE_ACK_BLITTER_GEN9		0x130044
 #define   FORCEWAKE_KERNEL			0x1
 #define   FORCEWAKE_USER			0x2
 #define  FORCEWAKE_MT_ACK			0x130040
@@ -5711,9 +6053,17 @@ enum punit_power_well {
 #define   GEN6_ENCODE_RC6_VID(mv)		(((mv) - 245) / 5)
 #define   GEN6_DECODE_RC6_VID(vids)		(((vids) * 5) + 245)
 #define   DISPLAY_IPS_CONTROL			0x19
+#define   HSW_PCODE_DYNAMIC_DUTY_CYCLE_CONTROL	0x1A
 #define GEN6_PCODE_DATA				0x138128
 #define   GEN6_PCODE_FREQ_IA_RATIO_SHIFT	8
 #define   GEN6_PCODE_FREQ_RING_RATIO_SHIFT	16
+#define GEN6_PCODE_DATA1			0x13812C
+
+#define   GEN9_PCODE_READ_MEM_LATENCY		0x6
+#define   GEN9_MEM_LATENCY_LEVEL_MASK		0xFF
+#define   GEN9_MEM_LATENCY_LEVEL_1_5_SHIFT	8
+#define   GEN9_MEM_LATENCY_LEVEL_2_6_SHIFT	16
+#define   GEN9_MEM_LATENCY_LEVEL_3_7_SHIFT	24
 
 #define GEN6_GT_CORE_STATUS		0x138060
 #define   GEN6_CORE_CPD_STATE_MASK	(7<<4)
@@ -5751,6 +6101,9 @@ enum punit_power_well {
 #define   GEN7_SINGLE_SUBSCAN_DISPATCH_ENABLE	(1<<10)
 #define   GEN7_PSD_SINGLE_PORT_DISPATCH_ENABLE	(1<<3)
 
+#define GEN9_HALF_SLICE_CHICKEN5	0xe188
+#define   GEN9_DG_MIRROR_FIX_ENABLE	(1<<5)
+
 #define GEN8_ROW_CHICKEN		0xe4f0
 #define   PARTIAL_INSTRUCTION_SHOOTDOWN_DISABLE	(1<<8)
 #define   STALL_DOP_GATING_DISABLE		(1<<5)
@@ -5766,57 +6119,58 @@ enum punit_power_well {
 #define   GEN8_CENTROID_PIXEL_OPT_DIS	(1<<8)
 #define   GEN8_SAMPLER_POWER_BYPASS_DIS	(1<<1)
 
+/* Audio */
 #define G4X_AUD_VID_DID			(dev_priv->info.display_mmio_offset + 0x62020)
-#define INTEL_AUDIO_DEVCL		0x808629FB
-#define INTEL_AUDIO_DEVBLC		0x80862801
-#define INTEL_AUDIO_DEVCTG		0x80862802
+#define   INTEL_AUDIO_DEVCL		0x808629FB
+#define   INTEL_AUDIO_DEVBLC		0x80862801
+#define   INTEL_AUDIO_DEVCTG		0x80862802
 
 #define G4X_AUD_CNTL_ST			0x620B4
-#define G4X_ELDV_DEVCL_DEVBLC		(1 << 13)
-#define G4X_ELDV_DEVCTG			(1 << 14)
-#define G4X_ELD_ADDR			(0xf << 5)
-#define G4X_ELD_ACK			(1 << 4)
+#define   G4X_ELDV_DEVCL_DEVBLC		(1 << 13)
+#define   G4X_ELDV_DEVCTG		(1 << 14)
+#define   G4X_ELD_ADDR_MASK		(0xf << 5)
+#define   G4X_ELD_ACK			(1 << 4)
 #define G4X_HDMIW_HDMIEDID		0x6210C
 
-#define IBX_HDMIW_HDMIEDID_A		0xE2050
-#define IBX_HDMIW_HDMIEDID_B		0xE2150
+#define _IBX_HDMIW_HDMIEDID_A		0xE2050
+#define _IBX_HDMIW_HDMIEDID_B		0xE2150
 #define IBX_HDMIW_HDMIEDID(pipe) _PIPE(pipe, \
-					IBX_HDMIW_HDMIEDID_A, \
-					IBX_HDMIW_HDMIEDID_B)
-#define IBX_AUD_CNTL_ST_A		0xE20B4
-#define IBX_AUD_CNTL_ST_B		0xE21B4
+					_IBX_HDMIW_HDMIEDID_A, \
+					_IBX_HDMIW_HDMIEDID_B)
+#define _IBX_AUD_CNTL_ST_A		0xE20B4
+#define _IBX_AUD_CNTL_ST_B		0xE21B4
 #define IBX_AUD_CNTL_ST(pipe) _PIPE(pipe, \
-					IBX_AUD_CNTL_ST_A, \
-					IBX_AUD_CNTL_ST_B)
-#define IBX_ELD_BUFFER_SIZE		(0x1f << 10)
-#define IBX_ELD_ADDRESS			(0x1f << 5)
-#define IBX_ELD_ACK			(1 << 4)
+					_IBX_AUD_CNTL_ST_A, \
+					_IBX_AUD_CNTL_ST_B)
+#define   IBX_ELD_BUFFER_SIZE_MASK	(0x1f << 10)
+#define   IBX_ELD_ADDRESS_MASK		(0x1f << 5)
+#define   IBX_ELD_ACK			(1 << 4)
 #define IBX_AUD_CNTL_ST2		0xE20C0
-#define IBX_ELD_VALIDB			(1 << 0)
-#define IBX_CP_READYB			(1 << 1)
+#define   IBX_CP_READY(port)		((1 << 1) << (((port) - 1) * 4))
+#define   IBX_ELD_VALID(port)		((1 << 0) << (((port) - 1) * 4))
 
-#define CPT_HDMIW_HDMIEDID_A		0xE5050
-#define CPT_HDMIW_HDMIEDID_B		0xE5150
+#define _CPT_HDMIW_HDMIEDID_A		0xE5050
+#define _CPT_HDMIW_HDMIEDID_B		0xE5150
 #define CPT_HDMIW_HDMIEDID(pipe) _PIPE(pipe, \
-					CPT_HDMIW_HDMIEDID_A, \
-					CPT_HDMIW_HDMIEDID_B)
-#define CPT_AUD_CNTL_ST_A		0xE50B4
-#define CPT_AUD_CNTL_ST_B		0xE51B4
+					_CPT_HDMIW_HDMIEDID_A, \
+					_CPT_HDMIW_HDMIEDID_B)
+#define _CPT_AUD_CNTL_ST_A		0xE50B4
+#define _CPT_AUD_CNTL_ST_B		0xE51B4
 #define CPT_AUD_CNTL_ST(pipe) _PIPE(pipe, \
-					CPT_AUD_CNTL_ST_A, \
-					CPT_AUD_CNTL_ST_B)
+					_CPT_AUD_CNTL_ST_A, \
+					_CPT_AUD_CNTL_ST_B)
 #define CPT_AUD_CNTRL_ST2		0xE50C0
 
-#define VLV_HDMIW_HDMIEDID_A		(VLV_DISPLAY_BASE + 0x62050)
-#define VLV_HDMIW_HDMIEDID_B		(VLV_DISPLAY_BASE + 0x62150)
+#define _VLV_HDMIW_HDMIEDID_A		(VLV_DISPLAY_BASE + 0x62050)
+#define _VLV_HDMIW_HDMIEDID_B		(VLV_DISPLAY_BASE + 0x62150)
 #define VLV_HDMIW_HDMIEDID(pipe) _PIPE(pipe, \
-					VLV_HDMIW_HDMIEDID_A, \
-					VLV_HDMIW_HDMIEDID_B)
-#define VLV_AUD_CNTL_ST_A		(VLV_DISPLAY_BASE + 0x620B4)
-#define VLV_AUD_CNTL_ST_B		(VLV_DISPLAY_BASE + 0x621B4)
+					_VLV_HDMIW_HDMIEDID_A, \
+					_VLV_HDMIW_HDMIEDID_B)
+#define _VLV_AUD_CNTL_ST_A		(VLV_DISPLAY_BASE + 0x620B4)
+#define _VLV_AUD_CNTL_ST_B		(VLV_DISPLAY_BASE + 0x621B4)
 #define VLV_AUD_CNTL_ST(pipe) _PIPE(pipe, \
-					VLV_AUD_CNTL_ST_A, \
-					VLV_AUD_CNTL_ST_B)
+					_VLV_AUD_CNTL_ST_A, \
+					_VLV_AUD_CNTL_ST_B)
 #define VLV_AUD_CNTL_ST2		(VLV_DISPLAY_BASE + 0x620C0)
 
 /* These are the 4 32-bit write offset registers for each stream
@@ -5825,28 +6179,28 @@ enum punit_power_well {
  */
 #define GEN7_SO_WRITE_OFFSET(n)		(0x5280 + (n) * 4)
 
-#define IBX_AUD_CONFIG_A			0xe2000
-#define IBX_AUD_CONFIG_B			0xe2100
+#define _IBX_AUD_CONFIG_A		0xe2000
+#define _IBX_AUD_CONFIG_B		0xe2100
 #define IBX_AUD_CFG(pipe) _PIPE(pipe, \
-					IBX_AUD_CONFIG_A, \
-					IBX_AUD_CONFIG_B)
-#define CPT_AUD_CONFIG_A			0xe5000
-#define CPT_AUD_CONFIG_B			0xe5100
+					_IBX_AUD_CONFIG_A, \
+					_IBX_AUD_CONFIG_B)
+#define _CPT_AUD_CONFIG_A		0xe5000
+#define _CPT_AUD_CONFIG_B		0xe5100
 #define CPT_AUD_CFG(pipe) _PIPE(pipe, \
-					CPT_AUD_CONFIG_A, \
-					CPT_AUD_CONFIG_B)
-#define VLV_AUD_CONFIG_A		(VLV_DISPLAY_BASE + 0x62000)
-#define VLV_AUD_CONFIG_B		(VLV_DISPLAY_BASE + 0x62100)
+					_CPT_AUD_CONFIG_A, \
+					_CPT_AUD_CONFIG_B)
+#define _VLV_AUD_CONFIG_A		(VLV_DISPLAY_BASE + 0x62000)
+#define _VLV_AUD_CONFIG_B		(VLV_DISPLAY_BASE + 0x62100)
 #define VLV_AUD_CFG(pipe) _PIPE(pipe, \
-					VLV_AUD_CONFIG_A, \
-					VLV_AUD_CONFIG_B)
+					_VLV_AUD_CONFIG_A, \
+					_VLV_AUD_CONFIG_B)
 
 #define   AUD_CONFIG_N_VALUE_INDEX		(1 << 29)
 #define   AUD_CONFIG_N_PROG_ENABLE		(1 << 28)
 #define   AUD_CONFIG_UPPER_N_SHIFT		20
-#define   AUD_CONFIG_UPPER_N_VALUE		(0xff << 20)
+#define   AUD_CONFIG_UPPER_N_MASK		(0xff << 20)
 #define   AUD_CONFIG_LOWER_N_SHIFT		4
-#define   AUD_CONFIG_LOWER_N_VALUE		(0xfff << 4)
+#define   AUD_CONFIG_LOWER_N_MASK		(0xfff << 4)
 #define   AUD_CONFIG_PIXEL_CLOCK_HDMI_SHIFT	16
 #define   AUD_CONFIG_PIXEL_CLOCK_HDMI_MASK	(0xf << 16)
 #define   AUD_CONFIG_PIXEL_CLOCK_HDMI_25175	(0 << 16)
@@ -5862,52 +6216,44 @@ enum punit_power_well {
 #define   AUD_CONFIG_DISABLE_NCTS		(1 << 3)
 
 /* HSW Audio */
-#define   HSW_AUD_CONFIG_A		0x65000 /* Audio Configuration Transcoder A */
-#define   HSW_AUD_CONFIG_B		0x65100 /* Audio Configuration Transcoder B */
-#define   HSW_AUD_CFG(pipe) _PIPE(pipe, \
-					HSW_AUD_CONFIG_A, \
-					HSW_AUD_CONFIG_B)
-
-#define   HSW_AUD_MISC_CTRL_A		0x65010 /* Audio Misc Control Convert 1 */
-#define   HSW_AUD_MISC_CTRL_B		0x65110 /* Audio Misc Control Convert 2 */
-#define   HSW_AUD_MISC_CTRL(pipe) _PIPE(pipe, \
-					HSW_AUD_MISC_CTRL_A, \
-					HSW_AUD_MISC_CTRL_B)
-
-#define   HSW_AUD_DIP_ELD_CTRL_ST_A	0x650b4 /* Audio DIP and ELD Control State Transcoder A */
-#define   HSW_AUD_DIP_ELD_CTRL_ST_B	0x651b4 /* Audio DIP and ELD Control State Transcoder B */
-#define   HSW_AUD_DIP_ELD_CTRL(pipe) _PIPE(pipe, \
-					HSW_AUD_DIP_ELD_CTRL_ST_A, \
-					HSW_AUD_DIP_ELD_CTRL_ST_B)
+#define _HSW_AUD_CONFIG_A		0x65000
+#define _HSW_AUD_CONFIG_B		0x65100
+#define HSW_AUD_CFG(pipe) _PIPE(pipe, \
+					_HSW_AUD_CONFIG_A, \
+					_HSW_AUD_CONFIG_B)
+
+#define _HSW_AUD_MISC_CTRL_A		0x65010
+#define _HSW_AUD_MISC_CTRL_B		0x65110
+#define HSW_AUD_MISC_CTRL(pipe) _PIPE(pipe, \
+					_HSW_AUD_MISC_CTRL_A, \
+					_HSW_AUD_MISC_CTRL_B)
+
+#define _HSW_AUD_DIP_ELD_CTRL_ST_A	0x650b4
+#define _HSW_AUD_DIP_ELD_CTRL_ST_B	0x651b4
+#define HSW_AUD_DIP_ELD_CTRL(pipe) _PIPE(pipe, \
+					_HSW_AUD_DIP_ELD_CTRL_ST_A, \
+					_HSW_AUD_DIP_ELD_CTRL_ST_B)
 
 /* Audio Digital Converter */
-#define   HSW_AUD_DIG_CNVT_1		0x65080 /* Audio Converter 1 */
-#define   HSW_AUD_DIG_CNVT_2		0x65180 /* Audio Converter 1 */
-#define   AUD_DIG_CNVT(pipe) _PIPE(pipe, \
-					HSW_AUD_DIG_CNVT_1, \
-					HSW_AUD_DIG_CNVT_2)
-#define   DIP_PORT_SEL_MASK		0x3
-
-#define   HSW_AUD_EDID_DATA_A		0x65050
-#define   HSW_AUD_EDID_DATA_B		0x65150
-#define   HSW_AUD_EDID_DATA(pipe) _PIPE(pipe, \
-					HSW_AUD_EDID_DATA_A, \
-					HSW_AUD_EDID_DATA_B)
-
-#define   HSW_AUD_PIPE_CONV_CFG		0x6507c /* Audio pipe and converter configs */
-#define   HSW_AUD_PIN_ELD_CP_VLD	0x650c0 /* Audio ELD and CP Ready Status */
-#define   AUDIO_INACTIVE_C		(1<<11)
-#define   AUDIO_INACTIVE_B		(1<<7)
-#define   AUDIO_INACTIVE_A		(1<<3)
-#define   AUDIO_OUTPUT_ENABLE_A		(1<<2)
-#define   AUDIO_OUTPUT_ENABLE_B		(1<<6)
-#define   AUDIO_OUTPUT_ENABLE_C		(1<<10)
-#define   AUDIO_ELD_VALID_A		(1<<0)
-#define   AUDIO_ELD_VALID_B		(1<<4)
-#define   AUDIO_ELD_VALID_C		(1<<8)
-#define   AUDIO_CP_READY_A		(1<<1)
-#define   AUDIO_CP_READY_B		(1<<5)
-#define   AUDIO_CP_READY_C		(1<<9)
+#define _HSW_AUD_DIG_CNVT_1		0x65080
+#define _HSW_AUD_DIG_CNVT_2		0x65180
+#define AUD_DIG_CNVT(pipe) _PIPE(pipe, \
+					_HSW_AUD_DIG_CNVT_1, \
+					_HSW_AUD_DIG_CNVT_2)
+#define DIP_PORT_SEL_MASK		0x3
+
+#define _HSW_AUD_EDID_DATA_A		0x65050
+#define _HSW_AUD_EDID_DATA_B		0x65150
+#define HSW_AUD_EDID_DATA(pipe) _PIPE(pipe, \
+					_HSW_AUD_EDID_DATA_A, \
+					_HSW_AUD_EDID_DATA_B)
+
+#define HSW_AUD_PIPE_CONV_CFG		0x6507c
+#define HSW_AUD_PIN_ELD_CP_VLD		0x650c0
+#define   AUDIO_INACTIVE(trans)		((1 << 3) << ((trans) * 4))
+#define   AUDIO_OUTPUT_ENABLE(trans)	((1 << 2) << ((trans) * 4))
+#define   AUDIO_CP_READY(trans)		((1 << 1) << ((trans) * 4))
+#define   AUDIO_ELD_VALID(trans)	((1 << 0) << ((trans) * 4))
 
 /* HSW Power Wells */
 #define HSW_PWR_WELL_BIOS			0x45400 /* CTL1 */
@@ -6125,6 +6471,83 @@ enum punit_power_well {
 #define  LCPLL_CD_SOURCE_FCLK		(1<<21)
 #define  LCPLL_CD_SOURCE_FCLK_DONE	(1<<19)
 
+/*
+ * SKL Clocks
+ */
+
+/* CDCLK_CTL */
+#define CDCLK_CTL			0x46000
+#define  CDCLK_FREQ_SEL_MASK		(3<<26)
+#define  CDCLK_FREQ_450_432		(0<<26)
+#define  CDCLK_FREQ_540			(1<<26)
+#define  CDCLK_FREQ_337_308		(2<<26)
+#define  CDCLK_FREQ_675_617		(3<<26)
+#define  CDCLK_FREQ_DECIMAL_MASK	(0x7ff)
+
+/* LCPLL_CTL */
+#define LCPLL1_CTL		0x46010
+#define LCPLL2_CTL		0x46014
+#define  LCPLL_PLL_ENABLE	(1<<31)
+
+/* DPLL control1 */
+#define DPLL_CTRL1		0x6C058
+#define  DPLL_CTRL1_HDMI_MODE(id)		(1<<((id)*6+5))
+#define  DPLL_CTRL1_SSC(id)			(1<<((id)*6+4))
+#define  DPLL_CRTL1_LINK_RATE_MASK(id)		(7<<((id)*6+1))
+#define  DPLL_CRTL1_LINK_RATE_SHIFT(id)		((id)*6+1)
+#define  DPLL_CRTL1_LINK_RATE(linkrate, id)	((linkrate)<<((id)*6+1))
+#define  DPLL_CTRL1_OVERRIDE(id)		(1<<((id)*6))
+#define  DPLL_CRTL1_LINK_RATE_2700		0
+#define  DPLL_CRTL1_LINK_RATE_1350		1
+#define  DPLL_CRTL1_LINK_RATE_810		2
+#define  DPLL_CRTL1_LINK_RATE_1620		3
+#define  DPLL_CRTL1_LINK_RATE_1080		4
+#define  DPLL_CRTL1_LINK_RATE_2160		5
+
+/* DPLL control2 */
+#define DPLL_CTRL2				0x6C05C
+#define  DPLL_CTRL2_DDI_CLK_OFF(port)		(1<<((port)+15))
+#define  DPLL_CTRL2_DDI_CLK_SEL_MASK(port)	(3<<((port)*3+1))
+#define  DPLL_CTRL2_DDI_CLK_SEL_SHIFT(port)    ((port)*3+1)
+#define  DPLL_CTRL2_DDI_CLK_SEL(clk, port)	((clk)<<((port)*3+1))
+#define  DPLL_CTRL2_DDI_SEL_OVERRIDE(port)     (1<<((port)*3))
+
+/* DPLL Status */
+#define DPLL_STATUS	0x6C060
+#define  DPLL_LOCK(id) (1<<((id)*8))
+
+/* DPLL cfg */
+#define DPLL1_CFGCR1	0x6C040
+#define DPLL2_CFGCR1	0x6C048
+#define DPLL3_CFGCR1	0x6C050
+#define  DPLL_CFGCR1_FREQ_ENABLE	(1<<31)
+#define  DPLL_CFGCR1_DCO_FRACTION_MASK	(0x7fff<<9)
+#define  DPLL_CFGCR1_DCO_FRACTION(x)	((x)<<9)
+#define  DPLL_CFGCR1_DCO_INTEGER_MASK	(0x1ff)
+
+#define DPLL1_CFGCR2	0x6C044
+#define DPLL2_CFGCR2	0x6C04C
+#define DPLL3_CFGCR2	0x6C054
+#define  DPLL_CFGCR2_QDIV_RATIO_MASK	(0xff<<8)
+#define  DPLL_CFGCR2_QDIV_RATIO(x)	((x)<<8)
+#define  DPLL_CFGCR2_QDIV_MODE(x)	((x)<<7)
+#define  DPLL_CFGCR2_KDIV_MASK		(3<<5)
+#define  DPLL_CFGCR2_KDIV(x)		((x)<<5)
+#define  DPLL_CFGCR2_KDIV_5		(0<<5)
+#define  DPLL_CFGCR2_KDIV_2		(1<<5)
+#define  DPLL_CFGCR2_KDIV_3		(2<<5)
+#define  DPLL_CFGCR2_KDIV_1		(3<<5)
+#define  DPLL_CFGCR2_PDIV_MASK		(7<<2)
+#define  DPLL_CFGCR2_PDIV(x)		((x)<<2)
+#define  DPLL_CFGCR2_PDIV_1		(0<<2)
+#define  DPLL_CFGCR2_PDIV_2		(1<<2)
+#define  DPLL_CFGCR2_PDIV_3		(2<<2)
+#define  DPLL_CFGCR2_PDIV_7		(4<<2)
+#define  DPLL_CFGCR2_CENTRAL_FREQ_MASK	(3)
+
+#define GET_CFG_CR1_REG(id) (DPLL1_CFGCR1 + ((id) - SKL_DPLL1) * 8)
+#define GET_CFG_CR2_REG(id) (DPLL1_CFGCR2 + ((id) - SKL_DPLL1) * 8)
+
 /* Please see hsw_read_dcomp() and hsw_write_dcomp() before using this register,
  * since on HSW we can't write to it using I915_WRITE. */
 #define D_COMP_HSW			(MCHBAR_MIRROR_BASE_SNB + 0x5F0C)
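Note on the SKL DPLL macros in the i915_reg.h hunks above: DPLL_CTRL1 packs one 6-bit field per DPLL (override, link rate, SSC, HDMI mode), and the CFGCR1/CFGCR2 registers for DPLL1..3 sit 8 bytes apart. The following user-space sketch shows how those macros compose. The helper example_dpll_ctrl1_set_link_rate() and the enum values are assumptions made for illustration, not code from this series; in particular, the enum that defines SKL_DPLL1 is not shown in this part of the diff, and whether the override bit must be set together with the link rate is an assumption here.

```c
#include <stdint.h>

/*
 * Field layout copied from the DPLL_CTRL1 macros in the hunk above (the
 * "CRTL1" spelling is kept as it appears in the patch). Unsigned constants
 * are used so that the shifts are well defined in a user-space build.
 */
#define DPLL_CTRL1_HDMI_MODE(id)		(1u << ((id) * 6 + 5))
#define DPLL_CTRL1_SSC(id)			(1u << ((id) * 6 + 4))
#define DPLL_CRTL1_LINK_RATE_MASK(id)		(7u << ((id) * 6 + 1))
#define DPLL_CRTL1_LINK_RATE(linkrate, id)	((uint32_t)(linkrate) << ((id) * 6 + 1))
#define DPLL_CTRL1_OVERRIDE(id)			(1u << ((id) * 6))
#define DPLL_CRTL1_LINK_RATE_1350		1
#define DPLL_CRTL1_LINK_RATE_810		2

#define DPLL1_CFGCR1	0x6C040
#define DPLL1_CFGCR2	0x6C044

/* Assumption: DPLL ids are numbered from 0, so SKL_DPLL1 == 1. */
enum example_skl_dpll { SKL_DPLL0, SKL_DPLL1, SKL_DPLL2, SKL_DPLL3 };

/* Only meaningful for SKL_DPLL1..SKL_DPLL3; no DPLL0 CFGCR is defined above. */
#define GET_CFG_CR1_REG(id) (DPLL1_CFGCR1 + ((id) - SKL_DPLL1) * 8)
#define GET_CFG_CR2_REG(id) (DPLL1_CFGCR2 + ((id) - SKL_DPLL1) * 8)

/*
 * Hypothetical helper: given a DPLL_CTRL1 value, rewrite the field that
 * belongs to DPLL 'id' so that it selects 'rate' (one of the
 * DPLL_CRTL1_LINK_RATE_* codes) with SSC and HDMI mode cleared.
 * Fields that belong to other DPLLs are left untouched.
 */
static uint32_t example_dpll_ctrl1_set_link_rate(uint32_t ctrl1,
						 unsigned int id,
						 uint32_t rate)
{
	ctrl1 &= ~(DPLL_CTRL1_HDMI_MODE(id) | DPLL_CTRL1_SSC(id) |
		   DPLL_CRTL1_LINK_RATE_MASK(id));
	ctrl1 |= DPLL_CTRL1_OVERRIDE(id) | DPLL_CRTL1_LINK_RATE(rate, id);
	return ctrl1;
}
```

With these definitions, selecting DPLL_CRTL1_LINK_RATE_1350 for DPLL 1 on an all-zero value sets bit 6 (override) and bit 7 (rate code 1), giving 0xC0, and GET_CFG_CR1_REG(SKL_DPLL2) evaluates to 0x6C048, which matches DPLL2_CFGCR1 in the patch.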
diff --git a/drivers/gpu/drm/i915/i915_suspend.c b/drivers/gpu/drm/i915/i915_suspend.c
index 043123c77a1f..26368822a33f 100644
--- a/drivers/gpu/drm/i915/i915_suspend.c
+++ b/drivers/gpu/drm/i915/i915_suspend.c
@@ -203,34 +203,19 @@ static void i915_save_display(struct drm_device *dev)
 		i915_save_display_reg(dev);
 
 	/* LVDS state */
-	if (HAS_PCH_SPLIT(dev)) {
-		dev_priv->regfile.savePP_CONTROL = I915_READ(PCH_PP_CONTROL);
-		if (HAS_PCH_IBX(dev) || HAS_PCH_CPT(dev))
-			dev_priv->regfile.saveLVDS = I915_READ(PCH_LVDS);
-	} else if (IS_VALLEYVIEW(dev)) {
-		dev_priv->regfile.savePP_CONTROL = I915_READ(PP_CONTROL);
-		dev_priv->regfile.savePFIT_PGM_RATIOS = I915_READ(PFIT_PGM_RATIOS);
-
-		dev_priv->regfile.saveBLC_HIST_CTL =
-			I915_READ(VLV_BLC_HIST_CTL(PIPE_A));
-		dev_priv->regfile.saveBLC_HIST_CTL_B =
-			I915_READ(VLV_BLC_HIST_CTL(PIPE_B));
-	} else {
-		dev_priv->regfile.savePP_CONTROL = I915_READ(PP_CONTROL);
-		dev_priv->regfile.savePFIT_PGM_RATIOS = I915_READ(PFIT_PGM_RATIOS);
-		dev_priv->regfile.saveBLC_HIST_CTL = I915_READ(BLC_HIST_CTL);
-		if (IS_MOBILE(dev) && !IS_I830(dev))
-			dev_priv->regfile.saveLVDS = I915_READ(LVDS);
-	}
-
-	if (!IS_I830(dev) && !IS_845G(dev) && !HAS_PCH_SPLIT(dev))
-		dev_priv->regfile.savePFIT_CONTROL = I915_READ(PFIT_CONTROL);
+	if (HAS_PCH_IBX(dev) || HAS_PCH_CPT(dev))
+		dev_priv->regfile.saveLVDS = I915_READ(PCH_LVDS);
+	else if (INTEL_INFO(dev)->gen <= 4 && IS_MOBILE(dev) && !IS_I830(dev))
+		dev_priv->regfile.saveLVDS = I915_READ(LVDS);
 
+	/* Panel power sequencer */
 	if (HAS_PCH_SPLIT(dev)) {
+		dev_priv->regfile.savePP_CONTROL = I915_READ(PCH_PP_CONTROL);
 		dev_priv->regfile.savePP_ON_DELAYS = I915_READ(PCH_PP_ON_DELAYS);
 		dev_priv->regfile.savePP_OFF_DELAYS = I915_READ(PCH_PP_OFF_DELAYS);
 		dev_priv->regfile.savePP_DIVISOR = I915_READ(PCH_PP_DIVISOR);
-	} else {
+	} else if (!IS_VALLEYVIEW(dev)) {
+		dev_priv->regfile.savePP_CONTROL = I915_READ(PP_CONTROL);
 		dev_priv->regfile.savePP_ON_DELAYS = I915_READ(PP_ON_DELAYS);
 		dev_priv->regfile.savePP_OFF_DELAYS = I915_READ(PP_OFF_DELAYS);
 		dev_priv->regfile.savePP_DIVISOR = I915_READ(PP_DIVISOR);
@@ -259,29 +244,19 @@ static void i915_restore_display(struct drm_device *dev)
 	if (drm_core_check_feature(dev, DRIVER_MODESET))
 		mask = ~LVDS_PORT_EN;
 
+	/* LVDS state */
 	if (HAS_PCH_IBX(dev) || HAS_PCH_CPT(dev))
 		I915_WRITE(PCH_LVDS, dev_priv->regfile.saveLVDS & mask);
 	else if (INTEL_INFO(dev)->gen <= 4 && IS_MOBILE(dev) && !IS_I830(dev))
 		I915_WRITE(LVDS, dev_priv->regfile.saveLVDS & mask);
 
-	if (!IS_I830(dev) && !IS_845G(dev) && !HAS_PCH_SPLIT(dev))
-		I915_WRITE(PFIT_CONTROL, dev_priv->regfile.savePFIT_CONTROL);
-
+	/* Panel power sequencer */
 	if (HAS_PCH_SPLIT(dev)) {
 		I915_WRITE(PCH_PP_ON_DELAYS, dev_priv->regfile.savePP_ON_DELAYS);
 		I915_WRITE(PCH_PP_OFF_DELAYS, dev_priv->regfile.savePP_OFF_DELAYS);
 		I915_WRITE(PCH_PP_DIVISOR, dev_priv->regfile.savePP_DIVISOR);
 		I915_WRITE(PCH_PP_CONTROL, dev_priv->regfile.savePP_CONTROL);
-		I915_WRITE(RSTDBYCTL,
-			   dev_priv->regfile.saveMCHBAR_RENDER_STANDBY);
-	} else if (IS_VALLEYVIEW(dev)) {
-		I915_WRITE(VLV_BLC_HIST_CTL(PIPE_A),
-			   dev_priv->regfile.saveBLC_HIST_CTL);
-		I915_WRITE(VLV_BLC_HIST_CTL(PIPE_B),
-			   dev_priv->regfile.saveBLC_HIST_CTL);
-	} else {
-		I915_WRITE(PFIT_PGM_RATIOS, dev_priv->regfile.savePFIT_PGM_RATIOS);
-		I915_WRITE(BLC_HIST_CTL, dev_priv->regfile.saveBLC_HIST_CTL);
+	} else if (!IS_VALLEYVIEW(dev)) {
 		I915_WRITE(PP_ON_DELAYS, dev_priv->regfile.savePP_ON_DELAYS);
 		I915_WRITE(PP_OFF_DELAYS, dev_priv->regfile.savePP_OFF_DELAYS);
 		I915_WRITE(PP_DIVISOR, dev_priv->regfile.savePP_DIVISOR);
@@ -328,6 +303,10 @@ int i915_save_state(struct drm_device *dev)
 		}
 	}
 
+	if (IS_GEN4(dev))
+		pci_read_config_word(dev->pdev, GCDGMBUS,
+				     &dev_priv->regfile.saveGCDGMBUS);
+
 	/* Cache mode state */
 	if (INTEL_INFO(dev)->gen < 7)
 		dev_priv->regfile.saveCACHE_MODE_0 = I915_READ(CACHE_MODE_0);
@@ -356,6 +335,10 @@ int i915_restore_state(struct drm_device *dev)
 	mutex_lock(&dev->struct_mutex);
 
 	i915_gem_restore_fences(dev);
+
+	if (IS_GEN4(dev))
+		pci_write_config_word(dev->pdev, GCDGMBUS,
+				      dev_priv->regfile.saveGCDGMBUS);
 	i915_restore_display(dev);
 
 	if (!drm_core_check_feature(dev, DRIVER_MODESET)) {
@@ -368,6 +351,8 @@ int i915_restore_state(struct drm_device *dev)
 			I915_WRITE(_FDI_RXA_IMR, dev_priv->regfile.saveFDI_RXA_IMR);
 			I915_WRITE(_FDI_RXB_IMR, dev_priv->regfile.saveFDI_RXB_IMR);
 			I915_WRITE(PCH_PORT_HOTPLUG, dev_priv->regfile.savePCH_PORT_HOTPLUG);
+			I915_WRITE(RSTDBYCTL,
+				   dev_priv->regfile.saveMCHBAR_RENDER_STANDBY);
 		} else {
 			I915_WRITE(IER, dev_priv->regfile.saveIER);
 			I915_WRITE(IMR, dev_priv->regfile.saveIMR);
diff --git a/drivers/gpu/drm/i915/i915_sysfs.c b/drivers/gpu/drm/i915/i915_sysfs.c
index 503847f18fdd..4a5af695307e 100644
--- a/drivers/gpu/drm/i915/i915_sysfs.c
+++ b/drivers/gpu/drm/i915/i915_sysfs.c
@@ -139,8 +139,6 @@ static DEVICE_ATTR(rc6pp_residency_ms, S_IRUGO, show_rc6pp_ms, NULL);
 static struct attribute *rc6_attrs[] = {
 	&dev_attr_rc6_enable.attr,
 	&dev_attr_rc6_residency_ms.attr,
-	&dev_attr_rc6p_residency_ms.attr,
-	&dev_attr_rc6pp_residency_ms.attr,
 	NULL
 };
 
@@ -148,6 +146,17 @@ static struct attribute_group rc6_attr_group = {
 	.name = power_group_name,
 	.attrs =  rc6_attrs
 };
+
+static struct attribute *rc6p_attrs[] = {
+	&dev_attr_rc6p_residency_ms.attr,
+	&dev_attr_rc6pp_residency_ms.attr,
+	NULL
+};
+
+static struct attribute_group rc6p_attr_group = {
+	.name = power_group_name,
+	.attrs =  rc6p_attrs
+};
 #endif
 
 static int l3_access_valid(struct drm_device *dev, loff_t offset)
@@ -595,12 +604,18 @@ void i915_setup_sysfs(struct drm_device *dev)
 	int ret;
 
 #ifdef CONFIG_PM
-	if (INTEL_INFO(dev)->gen >= 6) {
+	if (HAS_RC6(dev)) {
 		ret = sysfs_merge_group(&dev->primary->kdev->kobj,
 					&rc6_attr_group);
 		if (ret)
 			DRM_ERROR("RC6 residency sysfs setup failed\n");
 	}
+	if (HAS_RC6p(dev)) {
+		ret = sysfs_merge_group(&dev->primary->kdev->kobj,
+					&rc6p_attr_group);
+		if (ret)
+			DRM_ERROR("RC6p residency sysfs setup failed\n");
+	}
 #endif
 	if (HAS_L3_DPF(dev)) {
 		ret = device_create_bin_file(dev->primary->kdev, &dpf_attrs);
@@ -640,5 +655,6 @@ void i915_teardown_sysfs(struct drm_device *dev)
 	device_remove_bin_file(dev->primary->kdev,  &dpf_attrs);
 #ifdef CONFIG_PM
 	sysfs_unmerge_group(&dev->primary->kdev->kobj, &rc6_attr_group);
+	sysfs_unmerge_group(&dev->primary->kdev->kobj, &rc6p_attr_group);
 #endif
 }
diff --git a/drivers/gpu/drm/i915/i915_trace.h b/drivers/gpu/drm/i915/i915_trace.h
index f5aa0067755a..751d4ad14d62 100644
--- a/drivers/gpu/drm/i915/i915_trace.h
+++ b/drivers/gpu/drm/i915/i915_trace.h
@@ -587,6 +587,110 @@ TRACE_EVENT(intel_gpu_freq_change,
 	    TP_printk("new_freq=%u", __entry->freq)
 );
 
+/**
+ * DOC: i915_ppgtt_create and i915_ppgtt_release tracepoints
+ *
+ * With full ppgtt enabled each process using drm will allocate at least one
+ * translation table. With these traces it is possible to keep track of the
+ * allocation and of the lifetime of the tables; this can be used during
+ * testing/debug to verify that we are not leaking ppgtts.
+ * These traces identify the ppgtt through the vm pointer, which is also printed
+ * by the i915_vma_bind and i915_vma_unbind tracepoints.
+ */
+DECLARE_EVENT_CLASS(i915_ppgtt,
+	TP_PROTO(struct i915_address_space *vm),
+	TP_ARGS(vm),
+
+	TP_STRUCT__entry(
+			__field(struct i915_address_space *, vm)
+			__field(u32, dev)
+	),
+
+	TP_fast_assign(
+			__entry->vm = vm;
+			__entry->dev = vm->dev->primary->index;
+	),
+
+	TP_printk("dev=%u, vm=%p", __entry->dev, __entry->vm)
+)
+
+DEFINE_EVENT(i915_ppgtt, i915_ppgtt_create,
+	TP_PROTO(struct i915_address_space *vm),
+	TP_ARGS(vm)
+);
+
+DEFINE_EVENT(i915_ppgtt, i915_ppgtt_release,
+	TP_PROTO(struct i915_address_space *vm),
+	TP_ARGS(vm)
+);
+
+/**
+ * DOC: i915_context_create and i915_context_free tracepoints
+ *
+ * These tracepoints are used to track creation and deletion of contexts.
+ * If full ppgtt is enabled, they also print the address of the vm assigned to
+ * the context.
+ */
+DECLARE_EVENT_CLASS(i915_context,
+	TP_PROTO(struct intel_context *ctx),
+	TP_ARGS(ctx),
+
+	TP_STRUCT__entry(
+			__field(u32, dev)
+			__field(struct intel_context *, ctx)
+			__field(struct i915_address_space *, vm)
+	),
+
+	TP_fast_assign(
+			__entry->ctx = ctx;
+			__entry->vm = ctx->ppgtt ? &ctx->ppgtt->base : NULL;
+			__entry->dev = ctx->file_priv->dev_priv->dev->primary->index;
+	),
+
+	TP_printk("dev=%u, ctx=%p, ctx_vm=%p",
+		  __entry->dev, __entry->ctx, __entry->vm)
+)
+
+DEFINE_EVENT(i915_context, i915_context_create,
+	TP_PROTO(struct intel_context *ctx),
+	TP_ARGS(ctx)
+);
+
+DEFINE_EVENT(i915_context, i915_context_free,
+	TP_PROTO(struct intel_context *ctx),
+	TP_ARGS(ctx)
+);
+
+/**
+ * DOC: switch_mm tracepoint
+ *
+ * This tracepoint allows tracking of the mm switch, which is an important point
+ * in the lifetime of the vm in the legacy submission path. This tracepoint is
+ * called only if full ppgtt is enabled.
+ */
+TRACE_EVENT(switch_mm,
+	TP_PROTO(struct intel_engine_cs *ring, struct intel_context *to),
+
+	TP_ARGS(ring, to),
+
+	TP_STRUCT__entry(
+			__field(u32, ring)
+			__field(struct intel_context *, to)
+			__field(struct i915_address_space *, vm)
+			__field(u32, dev)
+	),
+
+	TP_fast_assign(
+			__entry->ring = ring->id;
+			__entry->to = to;
+			__entry->vm = to->ppgtt ? &to->ppgtt->base : NULL;
+			__entry->dev = ring->dev->primary->index;
+	),
+
+	TP_printk("dev=%u, ring=%u, ctx=%p, ctx_vm=%p",
+		  __entry->dev, __entry->ring, __entry->to, __entry->vm)
+);
+
 #endif /* _I915_TRACE_H_ */
 
 /* This part must be outside protection */
diff --git a/drivers/gpu/drm/i915/i915_ums.c b/drivers/gpu/drm/i915/i915_ums.c
index 480da593e6c0..d10fe3e9c49f 100644
--- a/drivers/gpu/drm/i915/i915_ums.c
+++ b/drivers/gpu/drm/i915/i915_ums.c
@@ -270,6 +270,12 @@ void i915_save_display_reg(struct drm_device *dev)
 	}
 	/* FIXME: regfile.save TV & SDVO state */
 
+	/* Panel fitter */
+	if (!IS_I830(dev) && !IS_845G(dev) && !HAS_PCH_SPLIT(dev)) {
+		dev_priv->regfile.savePFIT_CONTROL = I915_READ(PFIT_CONTROL);
+		dev_priv->regfile.savePFIT_PGM_RATIOS = I915_READ(PFIT_PGM_RATIOS);
+	}
+
 	/* Backlight */
 	if (INTEL_INFO(dev)->gen <= 4)
 		pci_read_config_byte(dev->pdev, PCI_LBPC,
@@ -284,6 +290,7 @@ void i915_save_display_reg(struct drm_device *dev)
 		dev_priv->regfile.saveBLC_PWM_CTL = I915_READ(BLC_PWM_CTL);
 		if (INTEL_INFO(dev)->gen >= 4)
 			dev_priv->regfile.saveBLC_PWM_CTL2 = I915_READ(BLC_PWM_CTL2);
+		dev_priv->regfile.saveBLC_HIST_CTL = I915_READ(BLC_HIST_CTL);
 	}
 
 	return;
@@ -313,6 +320,13 @@ void i915_restore_display_reg(struct drm_device *dev)
 		if (INTEL_INFO(dev)->gen >= 4)
 			I915_WRITE(BLC_PWM_CTL2, dev_priv->regfile.saveBLC_PWM_CTL2);
 		I915_WRITE(BLC_PWM_CTL, dev_priv->regfile.saveBLC_PWM_CTL);
+		I915_WRITE(BLC_HIST_CTL, dev_priv->regfile.saveBLC_HIST_CTL);
+	}
+
+	/* Panel fitter */
+	if (!IS_I830(dev) && !IS_845G(dev) && !HAS_PCH_SPLIT(dev)) {
+		I915_WRITE(PFIT_PGM_RATIOS, dev_priv->regfile.savePFIT_PGM_RATIOS);
+		I915_WRITE(PFIT_CONTROL, dev_priv->regfile.savePFIT_CONTROL);
 	}
 
 	/* Display port ratios (must be done before clock is set) */
diff --git a/drivers/gpu/drm/i915/intel_audio.c b/drivers/gpu/drm/i915/intel_audio.c
new file mode 100644
index 000000000000..2c7ed5cb29c0
--- /dev/null
+++ b/drivers/gpu/drm/i915/intel_audio.c
@@ -0,0 +1,463 @@
+/*
+ * Copyright © 2014 Intel Corporation
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
+ * DEALINGS IN THE SOFTWARE.
+ */
+
+#include <linux/kernel.h>
+
+#include <drm/drmP.h>
+#include <drm/drm_edid.h>
+#include "intel_drv.h"
+#include "i915_drv.h"
+
+/**
+ * DOC: High Definition Audio over HDMI and Display Port
+ *
+ * The graphics and audio drivers together support High Definition Audio over
+ * HDMI and Display Port. The audio programming sequences are divided into audio
+ * codec and controller enable and disable sequences. The graphics driver
+ * handles the audio codec sequences, while the audio driver handles the audio
+ * controller sequences.
+ *
+ * The disable sequences must be performed before disabling the transcoder or
+ * port. The enable sequences may only be performed after enabling the
+ * transcoder and port, and after link training has completed.
+ *
+ * The codec and controller sequences can run either in parallel or serially,
+ * but generally the ELDV/PD change in the codec sequence indicates to the audio
+ * driver that the controller sequence should start. Indeed, most of the
+ * co-operation between the graphics and audio drivers is handled via
+ * audio-related registers. (The notable exception is power management, which
+ * is not covered here.)
+ */
+
+static const struct {
+	int clock;
+	u32 config;
+} hdmi_audio_clock[] = {
+	{ DIV_ROUND_UP(25200 * 1000, 1001), AUD_CONFIG_PIXEL_CLOCK_HDMI_25175 },
+	{ 25200, AUD_CONFIG_PIXEL_CLOCK_HDMI_25200 }, /* default per bspec */
+	{ 27000, AUD_CONFIG_PIXEL_CLOCK_HDMI_27000 },
+	{ 27000 * 1001 / 1000, AUD_CONFIG_PIXEL_CLOCK_HDMI_27027 },
+	{ 54000, AUD_CONFIG_PIXEL_CLOCK_HDMI_54000 },
+	{ 54000 * 1001 / 1000, AUD_CONFIG_PIXEL_CLOCK_HDMI_54054 },
+	{ DIV_ROUND_UP(74250 * 1000, 1001), AUD_CONFIG_PIXEL_CLOCK_HDMI_74176 },
+	{ 74250, AUD_CONFIG_PIXEL_CLOCK_HDMI_74250 },
+	{ DIV_ROUND_UP(148500 * 1000, 1001), AUD_CONFIG_PIXEL_CLOCK_HDMI_148352 },
+	{ 148500, AUD_CONFIG_PIXEL_CLOCK_HDMI_148500 },
+};
+
+/* get AUD_CONFIG_PIXEL_CLOCK_HDMI_* value for mode */
+static u32 audio_config_hdmi_pixel_clock(struct drm_display_mode *mode)
+{
+	int i;
+
+	for (i = 0; i < ARRAY_SIZE(hdmi_audio_clock); i++) {
+		if (mode->clock == hdmi_audio_clock[i].clock)
+			break;
+	}
+
+	if (i == ARRAY_SIZE(hdmi_audio_clock)) {
+		DRM_DEBUG_KMS("HDMI audio pixel clock setting for %d not found, falling back to defaults\n", mode->clock);
+		i = 1;
+	}
+
+	DRM_DEBUG_KMS("Configuring HDMI audio for pixel clock %d (0x%08x)\n",
+		      hdmi_audio_clock[i].clock,
+		      hdmi_audio_clock[i].config);
+
+	return hdmi_audio_clock[i].config;
+}
+
+static bool intel_eld_uptodate(struct drm_connector *connector,
+			       int reg_eldv, uint32_t bits_eldv,
+			       int reg_elda, uint32_t bits_elda,
+			       int reg_edid)
+{
+	struct drm_i915_private *dev_priv = connector->dev->dev_private;
+	uint8_t *eld = connector->eld;
+	uint32_t tmp;
+	int i;
+
+	tmp = I915_READ(reg_eldv);
+	tmp &= bits_eldv;
+
+	if (!tmp)
+		return false;
+
+	tmp = I915_READ(reg_elda);
+	tmp &= ~bits_elda;
+	I915_WRITE(reg_elda, tmp);
+
+	for (i = 0; i < drm_eld_size(eld) / 4; i++)
+		if (I915_READ(reg_edid) != *((uint32_t *)eld + i))
+			return false;
+
+	return true;
+}
+
+static void g4x_audio_codec_disable(struct intel_encoder *encoder)
+{
+	struct drm_i915_private *dev_priv = encoder->base.dev->dev_private;
+	uint32_t eldv, tmp;
+
+	DRM_DEBUG_KMS("Disable audio codec\n");
+
+	tmp = I915_READ(G4X_AUD_VID_DID);
+	if (tmp == INTEL_AUDIO_DEVBLC || tmp == INTEL_AUDIO_DEVCL)
+		eldv = G4X_ELDV_DEVCL_DEVBLC;
+	else
+		eldv = G4X_ELDV_DEVCTG;
+
+	/* Invalidate ELD */
+	tmp = I915_READ(G4X_AUD_CNTL_ST);
+	tmp &= ~eldv;
+	I915_WRITE(G4X_AUD_CNTL_ST, tmp);
+}
+
+static void g4x_audio_codec_enable(struct drm_connector *connector,
+				   struct intel_encoder *encoder,
+				   struct drm_display_mode *mode)
+{
+	struct drm_i915_private *dev_priv = connector->dev->dev_private;
+	uint8_t *eld = connector->eld;
+	uint32_t eldv;
+	uint32_t tmp;
+	int len, i;
+
+	DRM_DEBUG_KMS("Enable audio codec, %u bytes ELD\n", drm_eld_size(eld));
+
+	tmp = I915_READ(G4X_AUD_VID_DID);
+	if (tmp == INTEL_AUDIO_DEVBLC || tmp == INTEL_AUDIO_DEVCL)
+		eldv = G4X_ELDV_DEVCL_DEVBLC;
+	else
+		eldv = G4X_ELDV_DEVCTG;
+
+	if (intel_eld_uptodate(connector,
+			       G4X_AUD_CNTL_ST, eldv,
+			       G4X_AUD_CNTL_ST, G4X_ELD_ADDR_MASK,
+			       G4X_HDMIW_HDMIEDID))
+		return;
+
+	tmp = I915_READ(G4X_AUD_CNTL_ST);
+	tmp &= ~(eldv | G4X_ELD_ADDR_MASK);
+	len = (tmp >> 9) & 0x1f;		/* ELD buffer size */
+	I915_WRITE(G4X_AUD_CNTL_ST, tmp);
+
+	len = min(drm_eld_size(eld) / 4, len);
+	DRM_DEBUG_DRIVER("ELD size %d\n", len);
+	for (i = 0; i < len; i++)
+		I915_WRITE(G4X_HDMIW_HDMIEDID, *((uint32_t *)eld + i));
+
+	tmp = I915_READ(G4X_AUD_CNTL_ST);
+	tmp |= eldv;
+	I915_WRITE(G4X_AUD_CNTL_ST, tmp);
+}
+
+static void hsw_audio_codec_disable(struct intel_encoder *encoder)
+{
+	struct drm_i915_private *dev_priv = encoder->base.dev->dev_private;
+	struct intel_crtc *intel_crtc = to_intel_crtc(encoder->base.crtc);
+	enum pipe pipe = intel_crtc->pipe;
+	uint32_t tmp;
+
+	DRM_DEBUG_KMS("Disable audio codec on pipe %c\n", pipe_name(pipe));
+
+	/* Disable timestamps */
+	tmp = I915_READ(HSW_AUD_CFG(pipe));
+	tmp &= ~AUD_CONFIG_N_VALUE_INDEX;
+	tmp |= AUD_CONFIG_N_PROG_ENABLE;
+	tmp &= ~AUD_CONFIG_UPPER_N_MASK;
+	tmp &= ~AUD_CONFIG_LOWER_N_MASK;
+	if (intel_pipe_has_type(intel_crtc, INTEL_OUTPUT_DISPLAYPORT))
+		tmp |= AUD_CONFIG_N_VALUE_INDEX;
+	I915_WRITE(HSW_AUD_CFG(pipe), tmp);
+
+	/* Invalidate ELD */
+	tmp = I915_READ(HSW_AUD_PIN_ELD_CP_VLD);
+	tmp &= ~AUDIO_ELD_VALID(pipe);
+	tmp &= ~AUDIO_OUTPUT_ENABLE(pipe);
+	I915_WRITE(HSW_AUD_PIN_ELD_CP_VLD, tmp);
+}
+
+static void hsw_audio_codec_enable(struct drm_connector *connector,
+				   struct intel_encoder *encoder,
+				   struct drm_display_mode *mode)
+{
+	struct drm_i915_private *dev_priv = connector->dev->dev_private;
+	struct intel_crtc *intel_crtc = to_intel_crtc(encoder->base.crtc);
+	enum pipe pipe = intel_crtc->pipe;
+	const uint8_t *eld = connector->eld;
+	uint32_t tmp;
+	int len, i;
+
+	DRM_DEBUG_KMS("Enable audio codec on pipe %c, %u bytes ELD\n",
+		      pipe_name(pipe), drm_eld_size(eld));
+
+	/* Enable audio presence detect, invalidate ELD */
+	tmp = I915_READ(HSW_AUD_PIN_ELD_CP_VLD);
+	tmp |= AUDIO_OUTPUT_ENABLE(pipe);
+	tmp &= ~AUDIO_ELD_VALID(pipe);
+	I915_WRITE(HSW_AUD_PIN_ELD_CP_VLD, tmp);
+
+	/*
+	 * FIXME: We're supposed to wait for vblank here, but we have vblanks
+	 * disabled during the mode set. The proper fix would be to push the
+	 * rest of the setup into a vblank work item, queued here, but the
+	 * infrastructure is not there yet.
+	 */
+
+	/* Reset ELD write address */
+	tmp = I915_READ(HSW_AUD_DIP_ELD_CTRL(pipe));
+	tmp &= ~IBX_ELD_ADDRESS_MASK;
+	I915_WRITE(HSW_AUD_DIP_ELD_CTRL(pipe), tmp);
+
+	/* Up to 84 bytes of hw ELD buffer */
+	len = min(drm_eld_size(eld), 84);
+	for (i = 0; i < len / 4; i++)
+		I915_WRITE(HSW_AUD_EDID_DATA(pipe), *((uint32_t *)eld + i));
+
+	/* ELD valid */
+	tmp = I915_READ(HSW_AUD_PIN_ELD_CP_VLD);
+	tmp |= AUDIO_ELD_VALID(pipe);
+	I915_WRITE(HSW_AUD_PIN_ELD_CP_VLD, tmp);
+
+	/* Enable timestamps */
+	tmp = I915_READ(HSW_AUD_CFG(pipe));
+	tmp &= ~AUD_CONFIG_N_VALUE_INDEX;
+	tmp &= ~AUD_CONFIG_N_PROG_ENABLE;
+	tmp &= ~AUD_CONFIG_PIXEL_CLOCK_HDMI_MASK;
+	if (intel_pipe_has_type(intel_crtc, INTEL_OUTPUT_DISPLAYPORT))
+		tmp |= AUD_CONFIG_N_VALUE_INDEX;
+	else
+		tmp |= audio_config_hdmi_pixel_clock(mode);
+	I915_WRITE(HSW_AUD_CFG(pipe), tmp);
+}
+
+static void ilk_audio_codec_disable(struct intel_encoder *encoder)
+{
+	struct drm_i915_private *dev_priv = encoder->base.dev->dev_private;
+	struct intel_crtc *intel_crtc = to_intel_crtc(encoder->base.crtc);
+	struct intel_digital_port *intel_dig_port =
+		enc_to_dig_port(&encoder->base);
+	enum port port = intel_dig_port->port;
+	enum pipe pipe = intel_crtc->pipe;
+	uint32_t tmp, eldv;
+	int aud_config;
+	int aud_cntrl_st2;
+
+	DRM_DEBUG_KMS("Disable audio codec on port %c, pipe %c\n",
+		      port_name(port), pipe_name(pipe));
+
+	if (HAS_PCH_IBX(dev_priv->dev)) {
+		aud_config = IBX_AUD_CFG(pipe);
+		aud_cntrl_st2 = IBX_AUD_CNTL_ST2;
+	} else if (IS_VALLEYVIEW(dev_priv)) {
+		aud_config = VLV_AUD_CFG(pipe);
+		aud_cntrl_st2 = VLV_AUD_CNTL_ST2;
+	} else {
+		aud_config = CPT_AUD_CFG(pipe);
+		aud_cntrl_st2 = CPT_AUD_CNTRL_ST2;
+	}
+
+	/* Disable timestamps */
+	tmp = I915_READ(aud_config);
+	tmp &= ~AUD_CONFIG_N_VALUE_INDEX;
+	tmp |= AUD_CONFIG_N_PROG_ENABLE;
+	tmp &= ~AUD_CONFIG_UPPER_N_MASK;
+	tmp &= ~AUD_CONFIG_LOWER_N_MASK;
+	if (intel_pipe_has_type(intel_crtc, INTEL_OUTPUT_DISPLAYPORT))
+		tmp |= AUD_CONFIG_N_VALUE_INDEX;
+	I915_WRITE(aud_config, tmp);
+
+	if (WARN_ON(!port)) {
+		eldv = IBX_ELD_VALID(PORT_B) | IBX_ELD_VALID(PORT_C) |
+			IBX_ELD_VALID(PORT_D);
+	} else {
+		eldv = IBX_ELD_VALID(port);
+	}
+
+	/* Invalidate ELD */
+	tmp = I915_READ(aud_cntrl_st2);
+	tmp &= ~eldv;
+	I915_WRITE(aud_cntrl_st2, tmp);
+}
+
+static void ilk_audio_codec_enable(struct drm_connector *connector,
+				   struct intel_encoder *encoder,
+				   struct drm_display_mode *mode)
+{
+	struct drm_i915_private *dev_priv = connector->dev->dev_private;
+	struct intel_crtc *intel_crtc = to_intel_crtc(encoder->base.crtc);
+	struct intel_digital_port *intel_dig_port =
+		enc_to_dig_port(&encoder->base);
+	enum port port = intel_dig_port->port;
+	enum pipe pipe = intel_crtc->pipe;
+	uint8_t *eld = connector->eld;
+	uint32_t eldv;
+	uint32_t tmp;
+	int len, i;
+	int hdmiw_hdmiedid;
+	int aud_config;
+	int aud_cntl_st;
+	int aud_cntrl_st2;
+
+	DRM_DEBUG_KMS("Enable audio codec on port %c, pipe %c, %u bytes ELD\n",
+		      port_name(port), pipe_name(pipe), drm_eld_size(eld));
+
+	/*
+	 * FIXME: We're supposed to wait for vblank here, but we have vblanks
+	 * disabled during the mode set. The proper fix would be to push the
+	 * rest of the setup into a vblank work item, queued here, but the
+	 * infrastructure is not there yet.
+	 */
+
+	if (HAS_PCH_IBX(connector->dev)) {
+		hdmiw_hdmiedid = IBX_HDMIW_HDMIEDID(pipe);
+		aud_config = IBX_AUD_CFG(pipe);
+		aud_cntl_st = IBX_AUD_CNTL_ST(pipe);
+		aud_cntrl_st2 = IBX_AUD_CNTL_ST2;
+	} else if (IS_VALLEYVIEW(connector->dev)) {
+		hdmiw_hdmiedid = VLV_HDMIW_HDMIEDID(pipe);
+		aud_config = VLV_AUD_CFG(pipe);
+		aud_cntl_st = VLV_AUD_CNTL_ST(pipe);
+		aud_cntrl_st2 = VLV_AUD_CNTL_ST2;
+	} else {
+		hdmiw_hdmiedid = CPT_HDMIW_HDMIEDID(pipe);
+		aud_config = CPT_AUD_CFG(pipe);
+		aud_cntl_st = CPT_AUD_CNTL_ST(pipe);
+		aud_cntrl_st2 = CPT_AUD_CNTRL_ST2;
+	}
+
+	if (WARN_ON(!port)) {
+		eldv = IBX_ELD_VALID(PORT_B) | IBX_ELD_VALID(PORT_C) |
+			IBX_ELD_VALID(PORT_D);
+	} else {
+		eldv = IBX_ELD_VALID(port);
+	}
+
+	/* Invalidate ELD */
+	tmp = I915_READ(aud_cntrl_st2);
+	tmp &= ~eldv;
+	I915_WRITE(aud_cntrl_st2, tmp);
+
+	/* Reset ELD write address */
+	tmp = I915_READ(aud_cntl_st);
+	tmp &= ~IBX_ELD_ADDRESS_MASK;
+	I915_WRITE(aud_cntl_st, tmp);
+
+	/* Up to 84 bytes of hw ELD buffer */
+	len = min(drm_eld_size(eld), 84);
+	for (i = 0; i < len / 4; i++)
+		I915_WRITE(hdmiw_hdmiedid, *((uint32_t *)eld + i));
+
+	/* ELD valid */
+	tmp = I915_READ(aud_cntrl_st2);
+	tmp |= eldv;
+	I915_WRITE(aud_cntrl_st2, tmp);
+
+	/* Enable timestamps */
+	tmp = I915_READ(aud_config);
+	tmp &= ~AUD_CONFIG_N_VALUE_INDEX;
+	tmp &= ~AUD_CONFIG_N_PROG_ENABLE;
+	tmp &= ~AUD_CONFIG_PIXEL_CLOCK_HDMI_MASK;
+	if (intel_pipe_has_type(intel_crtc, INTEL_OUTPUT_DISPLAYPORT))
+		tmp |= AUD_CONFIG_N_VALUE_INDEX;
+	else
+		tmp |= audio_config_hdmi_pixel_clock(mode);
+	I915_WRITE(aud_config, tmp);
+}
+
+/**
+ * intel_audio_codec_enable - Enable the audio codec for HD audio
+ * @intel_encoder: encoder on which to enable audio
+ *
+ * The enable sequences may only be performed after enabling the transcoder and
+ * port, and after link training has completed.
+ */
+void intel_audio_codec_enable(struct intel_encoder *intel_encoder)
+{
+	struct drm_encoder *encoder = &intel_encoder->base;
+	struct intel_crtc *crtc = to_intel_crtc(encoder->crtc);
+	struct drm_display_mode *mode = &crtc->config.adjusted_mode;
+	struct drm_connector *connector;
+	struct drm_device *dev = encoder->dev;
+	struct drm_i915_private *dev_priv = dev->dev_private;
+
+	connector = drm_select_eld(encoder, mode);
+	if (!connector)
+		return;
+
+	DRM_DEBUG_DRIVER("ELD on [CONNECTOR:%d:%s], [ENCODER:%d:%s]\n",
+			 connector->base.id,
+			 connector->name,
+			 connector->encoder->base.id,
+			 connector->encoder->name);
+
+	/* ELD Conn_Type */
+	connector->eld[5] &= ~(3 << 2);
+	if (intel_pipe_has_type(crtc, INTEL_OUTPUT_DISPLAYPORT))
+		connector->eld[5] |= (1 << 2);
+
+	connector->eld[6] = drm_av_sync_delay(connector, mode) / 2;
+
+	if (dev_priv->display.audio_codec_enable)
+		dev_priv->display.audio_codec_enable(connector, intel_encoder, mode);
+}
+
+/**
+ * intel_audio_codec_disable - Disable the audio codec for HD audio
+ * @encoder: encoder on which to disable audio
+ *
+ * The disable sequences must be performed before disabling the transcoder or
+ * port.
+ */
+void intel_audio_codec_disable(struct intel_encoder *encoder)
+{
+	struct drm_device *dev = encoder->base.dev;
+	struct drm_i915_private *dev_priv = dev->dev_private;
+
+	if (dev_priv->display.audio_codec_disable)
+		dev_priv->display.audio_codec_disable(encoder);
+}
+
+/**
+ * intel_init_audio - Set up chip specific audio functions
+ * @dev: drm device
+ */
+void intel_init_audio(struct drm_device *dev)
+{
+	struct drm_i915_private *dev_priv = dev->dev_private;
+
+	if (IS_G4X(dev)) {
+		dev_priv->display.audio_codec_enable = g4x_audio_codec_enable;
+		dev_priv->display.audio_codec_disable = g4x_audio_codec_disable;
+	} else if (IS_VALLEYVIEW(dev)) {
+		dev_priv->display.audio_codec_enable = ilk_audio_codec_enable;
+		dev_priv->display.audio_codec_disable = ilk_audio_codec_disable;
+	} else if (IS_HASWELL(dev) || INTEL_INFO(dev)->gen >= 8) {
+		dev_priv->display.audio_codec_enable = hsw_audio_codec_enable;
+		dev_priv->display.audio_codec_disable = hsw_audio_codec_disable;
+	} else if (HAS_PCH_SPLIT(dev)) {
+		dev_priv->display.audio_codec_enable = ilk_audio_codec_enable;
+		dev_priv->display.audio_codec_disable = ilk_audio_codec_disable;
+	}
+}
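Reviewer sketch (not part of the patch): the pixel-clock lookup in audio_config_hdmi_pixel_clock() above walks a small table and falls back to the 25.2 MHz entry at index 1 when the mode clock is not listed. The standalone C below illustrates that pattern under stated assumptions: the `CFG_*` values, `struct clock_entry`, `clock_table` and `lookup_config()` are hypothetical stand-ins, not the driver's `AUD_CONFIG_PIXEL_CLOCK_HDMI_*` register encodings, and only a subset of the clocks is shown.

```c
#include <stddef.h>
#include <stdint.h>

/* Round-up integer division, written to mirror the kernel's DIV_ROUND_UP(). */
#define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))
#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

/* Hypothetical placeholders for the real register field values. */
enum { CFG_25175, CFG_25200, CFG_27000, CFG_74176, CFG_74250 };

struct clock_entry {
	int clock_khz;		/* pixel clock in kHz, as in mode->clock */
	uint32_t config;	/* value to OR into the audio config register */
};

static const struct clock_entry clock_table[] = {
	{ DIV_ROUND_UP(25200 * 1000, 1001), CFG_25175 },	/* 25.2 MHz / 1.001 */
	{ 25200, CFG_25200 },					/* fallback entry */
	{ 27000, CFG_27000 },
	{ DIV_ROUND_UP(74250 * 1000, 1001), CFG_74176 },	/* 74.25 MHz / 1.001 */
	{ 74250, CFG_74250 },
};

/* Return the config for an exact clock match, else the index 1 default. */
static uint32_t lookup_config(int clock_khz)
{
	size_t i;

	for (i = 0; i < ARRAY_SIZE(clock_table); i++)
		if (clock_table[i].clock_khz == clock_khz)
			return clock_table[i].config;

	return clock_table[1].config;
}
```

The rounded-up division is what makes the fractional (1000/1001) rates match the integer kHz values that mode clocks carry, for example 25175 and 74176.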
diff --git a/drivers/gpu/drm/i915/intel_bios.h b/drivers/gpu/drm/i915/intel_bios.h
index 905999bee2ac..7603765c91fc 100644
--- a/drivers/gpu/drm/i915/intel_bios.h
+++ b/drivers/gpu/drm/i915/intel_bios.h
@@ -46,7 +46,7 @@ struct bdb_header {
 	u16 version;			/**< decimal */
 	u16 header_size;		/**< in bytes */
 	u16 bdb_size;			/**< in bytes */
-};
+} __packed;
 
 /* strictly speaking, this is a "skip" block, but it has interesting info */
 struct vbios_data {
@@ -252,7 +252,7 @@ union child_device_config {
 	/* This one should also be safe to use anywhere, even without version
 	 * checks. */
 	struct common_child_dev_config common;
-};
+} __packed;
 
 struct bdb_general_definitions {
 	/* DDC GPIO */
@@ -888,12 +888,12 @@ struct mipi_pps_data {
 	u16 bl_disable_delay;
 	u16 panel_off_delay;
 	u16 panel_power_cycle_delay;
-};
+} __packed;
 
 struct bdb_mipi_config {
 	struct mipi_config config[MAX_MIPI_CONFIGURATIONS];
 	struct mipi_pps_data pps[MAX_MIPI_CONFIGURATIONS];
-};
+} __packed;
 
 /* Block 53 contains MIPI sequences as needed by the panel
  * for enabling it. This block can be variable in size and
@@ -902,7 +902,7 @@ struct bdb_mipi_config {
 struct bdb_mipi_sequence {
 	u8 version;
 	u8 data[0];
-};
+} __packed;
 
 /* MIPI Sequnece Block definitions */
 enum mipi_seq {
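Reviewer sketch (not part of the patch): the `__packed` annotations added above presumably exist because these structures are overlaid on a byte-exact firmware table, where any compiler-inserted padding would shift later fields. The standalone C below shows the effect on a hypothetical record. `struct natural_rec` and `struct packed_rec` are invented for illustration and do not correspond to any BDB block, and `__attribute__((packed))` assumes a GCC- or Clang-compatible compiler (the kernel's `__packed` expands to that attribute).

```c
#include <stddef.h>
#include <stdint.h>

/* Default layout: the compiler may pad after 'version' to align 'size'. */
struct natural_rec {
	uint8_t version;
	uint16_t size;
	uint32_t offset;
};

/* Packed layout: fields sit back to back, matching a byte stream. */
struct packed_rec {
	uint8_t version;
	uint16_t size;
	uint32_t offset;
} __attribute__((packed));

/* With packing, the record is exactly 1 + 2 + 4 bytes long. */
_Static_assert(sizeof(struct packed_rec) == 7, "packed size");
_Static_assert(offsetof(struct packed_rec, size) == 1, "no pad after version");
_Static_assert(offsetof(struct packed_rec, offset) == 3, "no pad after size");
```

On common ABIs the unpacked variant is 8 bytes with `size` at offset 2, so reading a 7-byte on-disk record through it would misplace every field after the first.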
diff --git a/drivers/gpu/drm/i915/intel_crt.c b/drivers/gpu/drm/i915/intel_crt.c
index 9212e6504e0f..a9af9a4866db 100644
--- a/drivers/gpu/drm/i915/intel_crt.c
+++ b/drivers/gpu/drm/i915/intel_crt.c
@@ -72,7 +72,7 @@ static bool intel_crt_get_hw_state(struct intel_encoder *encoder,
 	u32 tmp;
 
 	power_domain = intel_display_port_power_domain(encoder);
-	if (!intel_display_power_enabled(dev_priv, power_domain))
+	if (!intel_display_power_is_enabled(dev_priv, power_domain))
 		return false;
 
 	tmp = I915_READ(crt->adpa_reg);
@@ -775,7 +775,7 @@ static void intel_crt_reset(struct drm_connector *connector)
 		I915_WRITE(crt->adpa_reg, adpa);
 		POSTING_READ(crt->adpa_reg);
 
-		DRM_DEBUG_KMS("pch crt adpa set to 0x%x\n", adpa);
+		DRM_DEBUG_KMS("crt adpa set to 0x%x\n", adpa);
 		crt->force_hotplug_required = 1;
 	}
 
diff --git a/drivers/gpu/drm/i915/intel_ddi.c b/drivers/gpu/drm/i915/intel_ddi.c
index b63d4fa204a3..e6b45cd150d3 100644
--- a/drivers/gpu/drm/i915/intel_ddi.c
+++ b/drivers/gpu/drm/i915/intel_ddi.c
@@ -95,8 +95,8 @@ static const struct ddi_buf_trans bdw_ddi_translations_dp[] = {
 	{ 0x00BEFFFF, 0x00140006 },
 	{ 0x80B2CFFF, 0x001B0002 },
 	{ 0x00FFFFFF, 0x000E000A },
-	{ 0x00D75FFF, 0x00180004 },
-	{ 0x80CB2FFF, 0x001B0002 },
+	{ 0x00DB6FFF, 0x00160005 },
+	{ 0x80C71FFF, 0x001A0002 },
 	{ 0x00F7DFFF, 0x00180004 },
 	{ 0x80D75FFF, 0x001B0002 },
 };
@@ -127,6 +127,32 @@ static const struct ddi_buf_trans bdw_ddi_translations_hdmi[] = {
 	{ 0x80FFFFFF, 0x001B0002 },	/* 9:	1000	1000	0	*/
 };
 
+static const struct ddi_buf_trans skl_ddi_translations_dp[] = {
+	{ 0x00000018, 0x000000a0 },
+	{ 0x00004014, 0x00000098 },
+	{ 0x00006012, 0x00000088 },
+	{ 0x00008010, 0x00000080 },
+	{ 0x00000018, 0x00000098 },
+	{ 0x00004014, 0x00000088 },
+	{ 0x00006012, 0x00000080 },
+	{ 0x00000018, 0x00000088 },
+	{ 0x00004014, 0x00000080 },
+};
+
+static const struct ddi_buf_trans skl_ddi_translations_hdmi[] = {
+					/* Idx	NT mV   T mV    db  */
+	{ 0x00000018, 0x000000a0 },	/* 0:	400	400	0   */
+	{ 0x00004014, 0x00000098 },	/* 1:	400	600	3.5 */
+	{ 0x00006012, 0x00000088 },	/* 2:	400	800	6   */
+	{ 0x00000018, 0x0000003c },	/* 3:	450	450	0   */
+	{ 0x00000018, 0x00000098 },	/* 4:	600	600	0   */
+	{ 0x00003015, 0x00000088 },	/* 5:	600	800	2.5 */
+	{ 0x00005013, 0x00000080 },	/* 6:	600	1000	4.5 */
+	{ 0x00000018, 0x00000088 },	/* 7:	800	800	0   */
+	{ 0x00000096, 0x00000080 },	/* 8:	800	1000	2   */
+	{ 0x00000018, 0x00000080 },	/* 9:	1200	1200	0   */
+};
+
 enum port intel_ddi_get_encoder_port(struct intel_encoder *intel_encoder)
 {
 	struct drm_encoder *encoder = &intel_encoder->base;
@@ -169,7 +195,14 @@ static void intel_prepare_ddi_buffers(struct drm_device *dev, enum port port)
 	const struct ddi_buf_trans *ddi_translations_hdmi;
 	const struct ddi_buf_trans *ddi_translations;
 
-	if (IS_BROADWELL(dev)) {
+	if (IS_SKYLAKE(dev)) {
+		ddi_translations_fdi = NULL;
+		ddi_translations_dp = skl_ddi_translations_dp;
+		ddi_translations_edp = skl_ddi_translations_dp;
+		ddi_translations_hdmi = skl_ddi_translations_hdmi;
+		n_hdmi_entries = ARRAY_SIZE(skl_ddi_translations_hdmi);
+		hdmi_800mV_0dB = 7;
+	} else if (IS_BROADWELL(dev)) {
 		ddi_translations_fdi = bdw_ddi_translations_fdi;
 		ddi_translations_dp = bdw_ddi_translations_dp;
 		ddi_translations_edp = bdw_ddi_translations_edp;
@@ -208,7 +241,10 @@ static void intel_prepare_ddi_buffers(struct drm_device *dev, enum port port)
 			ddi_translations = ddi_translations_dp;
 		break;
 	case PORT_E:
-		ddi_translations = ddi_translations_fdi;
+		if (ddi_translations_fdi)
+			ddi_translations = ddi_translations_fdi;
+		else
+			ddi_translations = ddi_translations_dp;
 		break;
 	default:
 		BUG();
@@ -423,6 +459,27 @@ intel_ddi_get_crtc_encoder(struct drm_crtc *crtc)
 	return ret;
 }
 
+static struct intel_encoder *
+intel_ddi_get_crtc_new_encoder(struct intel_crtc *crtc)
+{
+	struct drm_device *dev = crtc->base.dev;
+	struct intel_encoder *intel_encoder, *ret = NULL;
+	int num_encoders = 0;
+
+	for_each_intel_encoder(dev, intel_encoder) {
+		if (intel_encoder->new_crtc == crtc) {
+			ret = intel_encoder;
+			num_encoders++;
+		}
+	}
+
+	WARN(num_encoders != 1, "%d encoders on crtc for pipe %c\n", num_encoders,
+	     pipe_name(crtc->pipe));
+
+	BUG_ON(ret == NULL);
+	return ret;
+}
+
 #define LC_FREQ 2700
 #define LC_FREQ_2K U64_C(LC_FREQ * 2000)
 
@@ -613,6 +670,111 @@ static int intel_ddi_calc_wrpll_link(struct drm_i915_private *dev_priv,
 	return (refclk * n * 100) / (p * r);
 }
 
+static int skl_calc_wrpll_link(struct drm_i915_private *dev_priv,
+			       uint32_t dpll)
+{
+	uint32_t cfgcr1_reg, cfgcr2_reg;
+	uint32_t cfgcr1_val, cfgcr2_val;
+	uint32_t p0, p1, p2, dco_freq;
+
+	cfgcr1_reg = GET_CFG_CR1_REG(dpll);
+	cfgcr2_reg = GET_CFG_CR2_REG(dpll);
+
+	cfgcr1_val = I915_READ(cfgcr1_reg);
+	cfgcr2_val = I915_READ(cfgcr2_reg);
+
+	p0 = cfgcr2_val & DPLL_CFGCR2_PDIV_MASK;
+	p2 = cfgcr2_val & DPLL_CFGCR2_KDIV_MASK;
+
+	if (cfgcr2_val & DPLL_CFGCR2_QDIV_MODE(1))
+		p1 = (cfgcr2_val & DPLL_CFGCR2_QDIV_RATIO_MASK) >> 8;
+	else
+		p1 = 1;
+
+
+	switch (p0) {
+	case DPLL_CFGCR2_PDIV_1:
+		p0 = 1;
+		break;
+	case DPLL_CFGCR2_PDIV_2:
+		p0 = 2;
+		break;
+	case DPLL_CFGCR2_PDIV_3:
+		p0 = 3;
+		break;
+	case DPLL_CFGCR2_PDIV_7:
+		p0 = 7;
+		break;
+	}
+
+	switch (p2) {
+	case DPLL_CFGCR2_KDIV_5:
+		p2 = 5;
+		break;
+	case DPLL_CFGCR2_KDIV_2:
+		p2 = 2;
+		break;
+	case DPLL_CFGCR2_KDIV_3:
+		p2 = 3;
+		break;
+	case DPLL_CFGCR2_KDIV_1:
+		p2 = 1;
+		break;
+	}
+
+	dco_freq = (cfgcr1_val & DPLL_CFGCR1_DCO_INTEGER_MASK) * 24 * 1000;
+
+	dco_freq += (((cfgcr1_val & DPLL_CFGCR1_DCO_FRACTION_MASK) >> 9) * 24 *
+		1000) / 0x8000;
+
+	return dco_freq / (p0 * p1 * p2 * 5);
+}
+
+
+static void skl_ddi_clock_get(struct intel_encoder *encoder,
+				struct intel_crtc_config *pipe_config)
+{
+	struct drm_i915_private *dev_priv = encoder->base.dev->dev_private;
+	int link_clock = 0;
+	uint32_t dpll_ctl1, dpll;
+
+	dpll = pipe_config->ddi_pll_sel;
+
+	dpll_ctl1 = I915_READ(DPLL_CTRL1);
+
+	if (dpll_ctl1 & DPLL_CTRL1_HDMI_MODE(dpll)) {
+		link_clock = skl_calc_wrpll_link(dev_priv, dpll);
+	} else {
+		link_clock = dpll_ctl1 & DPLL_CRTL1_LINK_RATE_MASK(dpll);
+		link_clock >>= DPLL_CRTL1_LINK_RATE_SHIFT(dpll);
+
+		switch (link_clock) {
+		case DPLL_CRTL1_LINK_RATE_810:
+			link_clock = 81000;
+			break;
+		case DPLL_CRTL1_LINK_RATE_1350:
+			link_clock = 135000;
+			break;
+		case DPLL_CRTL1_LINK_RATE_2700:
+			link_clock = 270000;
+			break;
+		default:
+			WARN(1, "Unsupported link rate\n");
+			break;
+		}
+		link_clock *= 2;
+	}
+
+	pipe_config->port_clock = link_clock;
+
+	if (pipe_config->has_dp_encoder)
+		pipe_config->adjusted_mode.crtc_clock =
+			intel_dotclock_calculate(pipe_config->port_clock,
+						 &pipe_config->dp_m_n);
+	else
+		pipe_config->adjusted_mode.crtc_clock = pipe_config->port_clock;
+}
+
 static void hsw_ddi_clock_get(struct intel_encoder *encoder,
 			      struct intel_crtc_config *pipe_config)
 {
@@ -756,7 +918,7 @@ hsw_ddi_pll_select(struct intel_crtc *intel_crtc,
 		      WRPLL_DIVIDER_REFERENCE(r2) | WRPLL_DIVIDER_FEEDBACK(n2) |
 		      WRPLL_DIVIDER_POST(p);
 
-		intel_crtc->config.dpll_hw_state.wrpll = val;
+		intel_crtc->new_config->dpll_hw_state.wrpll = val;
 
 		pll = intel_get_shared_dpll(intel_crtc);
 		if (pll == NULL) {
@@ -765,12 +927,234 @@ hsw_ddi_pll_select(struct intel_crtc *intel_crtc,
 			return false;
 		}
 
-		intel_crtc->config.ddi_pll_sel = PORT_CLK_SEL_WRPLL(pll->id);
+		intel_crtc->new_config->ddi_pll_sel = PORT_CLK_SEL_WRPLL(pll->id);
 	}
 
 	return true;
 }
 
+struct skl_wrpll_params {
+	uint32_t        dco_fraction;
+	uint32_t        dco_integer;
+	uint32_t        qdiv_ratio;
+	uint32_t        qdiv_mode;
+	uint32_t        kdiv;
+	uint32_t        pdiv;
+	uint32_t        central_freq;
+};
+
+static void
+skl_ddi_calculate_wrpll(int clock /* in Hz */,
+			struct skl_wrpll_params *wrpll_params)
+{
+	uint64_t afe_clock = (uint64_t)clock * 5; /* AFE Clock is 5x Pixel clock */
+	uint64_t dco_central_freq[3] = {8400000000ULL,
+					9000000000ULL,
+					9600000000ULL};
+	uint32_t min_dco_deviation = 400;
+	uint32_t min_dco_index = 3;
+	uint32_t P0[4] = {1, 2, 3, 7};
+	uint32_t P2[4] = {1, 2, 3, 5};
+	bool found = false;
+	uint32_t candidate_p = 0;
+	uint32_t candidate_p0[3] = {0}, candidate_p1[3] = {0};
+	uint32_t candidate_p2[3] = {0};
+	uint32_t dco_central_freq_deviation[3];
+	uint32_t i, P1, k, dco_count;
+	bool retry_with_odd = false;
+	uint64_t dco_freq;
+
+	/* Determine P0, P1 and P2 */
+	for (dco_count = 0; dco_count < 3; dco_count++) {
+		found = false;
+		candidate_p =
+			div64_u64(dco_central_freq[dco_count], afe_clock);
+		if (retry_with_odd == false)
+			candidate_p = (candidate_p % 2 == 0 ?
+				candidate_p : candidate_p + 1);
+
+		for (P1 = 1; P1 < candidate_p; P1++) {
+			for (i = 0; i < 4; i++) {
+				if (!(P0[i] != 1 || P1 == 1))
+					continue;
+
+				for (k = 0; k < 4; k++) {
+					if (P1 != 1 && P2[k] != 2)
+						continue;
+
+					if (candidate_p == P0[i] * P1 * P2[k]) {
+						/* Found possible P0, P1, P2 */
+						found = true;
+						candidate_p0[dco_count] = P0[i];
+						candidate_p1[dco_count] = P1;
+						candidate_p2[dco_count] = P2[k];
+						goto found;
+					}
+
+				}
+			}
+		}
+
+found:
+		if (found) {
+			dco_central_freq_deviation[dco_count] =
+				div64_u64(10000 *
+					  abs_diff((candidate_p * afe_clock),
+						   dco_central_freq[dco_count]),
+					  dco_central_freq[dco_count]);
+
+			if (dco_central_freq_deviation[dco_count] <
+				min_dco_deviation) {
+				min_dco_deviation =
+					dco_central_freq_deviation[dco_count];
+				min_dco_index = dco_count;
+			}
+		}
+
+		if (min_dco_index > 2 && dco_count == 2) {
+			retry_with_odd = true;
+			dco_count = 0;
+		}
+	}
+
+	if (min_dco_index > 2) {
+		WARN(1, "No valid values found for the given pixel clock\n");
+	} else {
+		 wrpll_params->central_freq = dco_central_freq[min_dco_index];
+
+		 switch (dco_central_freq[min_dco_index]) {
+		 case 9600000000ULL:
+			wrpll_params->central_freq = 0;
+			break;
+		 case 9000000000ULL:
+			wrpll_params->central_freq = 1;
+			break;
+		 case 8400000000ULL:
+			wrpll_params->central_freq = 3;
+		 }
+
+		 switch (candidate_p0[min_dco_index]) {
+		 case 1:
+			wrpll_params->pdiv = 0;
+			break;
+		 case 2:
+			wrpll_params->pdiv = 1;
+			break;
+		 case 3:
+			wrpll_params->pdiv = 2;
+			break;
+		 case 7:
+			wrpll_params->pdiv = 4;
+			break;
+		 default:
+			WARN(1, "Incorrect PDiv\n");
+		 }
+
+		 switch (candidate_p2[min_dco_index]) {
+		 case 5:
+			wrpll_params->kdiv = 0;
+			break;
+		 case 2:
+			wrpll_params->kdiv = 1;
+			break;
+		 case 3:
+			wrpll_params->kdiv = 2;
+			break;
+		 case 1:
+			wrpll_params->kdiv = 3;
+			break;
+		 default:
+			WARN(1, "Incorrect KDiv\n");
+		 }
+
+		 wrpll_params->qdiv_ratio = candidate_p1[min_dco_index];
+		 wrpll_params->qdiv_mode =
+			(wrpll_params->qdiv_ratio == 1) ? 0 : 1;
+
+		 dco_freq = candidate_p0[min_dco_index] *
+			 candidate_p1[min_dco_index] *
+			 candidate_p2[min_dco_index] * afe_clock;
+
+		/*
+		 * Intermediate values are in Hz.
+		 * Divide by MHz to match bspec
+		 */
+		 wrpll_params->dco_integer = div_u64(dco_freq, (24 * MHz(1)));
+		 wrpll_params->dco_fraction =
+			 div_u64(((div_u64(dco_freq, 24) -
+				   wrpll_params->dco_integer * MHz(1)) * 0x8000), MHz(1));
+
+	}
+}
+
+
+static bool
+skl_ddi_pll_select(struct intel_crtc *intel_crtc,
+		   struct intel_encoder *intel_encoder,
+		   int clock)
+{
+	struct intel_shared_dpll *pll;
+	uint32_t ctrl1, cfgcr1, cfgcr2;
+
+	/*
+	 * See comment in intel_dpll_hw_state to understand why we always use 0
+	 * as the DPLL id in this function.
+	 */
+
+	ctrl1 = DPLL_CTRL1_OVERRIDE(0);
+
+	if (intel_encoder->type == INTEL_OUTPUT_HDMI) {
+		struct skl_wrpll_params wrpll_params = { 0, };
+
+		ctrl1 |= DPLL_CTRL1_HDMI_MODE(0);
+
+		skl_ddi_calculate_wrpll(clock * 1000, &wrpll_params);
+
+		cfgcr1 = DPLL_CFGCR1_FREQ_ENABLE |
+			 DPLL_CFGCR1_DCO_FRACTION(wrpll_params.dco_fraction) |
+			 wrpll_params.dco_integer;
+
+		cfgcr2 = DPLL_CFGCR2_QDIV_RATIO(wrpll_params.qdiv_ratio) |
+			 DPLL_CFGCR2_QDIV_MODE(wrpll_params.qdiv_mode) |
+			 DPLL_CFGCR2_KDIV(wrpll_params.kdiv) |
+			 DPLL_CFGCR2_PDIV(wrpll_params.pdiv) |
+			 wrpll_params.central_freq;
+	} else if (intel_encoder->type == INTEL_OUTPUT_DISPLAYPORT) {
+		struct drm_encoder *encoder = &intel_encoder->base;
+		struct intel_dp *intel_dp = enc_to_intel_dp(encoder);
+
+		switch (intel_dp->link_bw) {
+		case DP_LINK_BW_1_62:
+			ctrl1 |= DPLL_CRTL1_LINK_RATE(DPLL_CRTL1_LINK_RATE_810, 0);
+			break;
+		case DP_LINK_BW_2_7:
+			ctrl1 |= DPLL_CRTL1_LINK_RATE(DPLL_CRTL1_LINK_RATE_1350, 0);
+			break;
+		case DP_LINK_BW_5_4:
+			ctrl1 |= DPLL_CRTL1_LINK_RATE(DPLL_CRTL1_LINK_RATE_2700, 0);
+			break;
+		}
+
+		cfgcr1 = cfgcr2 = 0;
+	} else /* eDP */
+		return true;
+
+	intel_crtc->new_config->dpll_hw_state.ctrl1 = ctrl1;
+	intel_crtc->new_config->dpll_hw_state.cfgcr1 = cfgcr1;
+	intel_crtc->new_config->dpll_hw_state.cfgcr2 = cfgcr2;
+
+	pll = intel_get_shared_dpll(intel_crtc);
+	if (pll == NULL) {
+		DRM_DEBUG_DRIVER("failed to find PLL for pipe %c\n",
+				 pipe_name(intel_crtc->pipe));
+		return false;
+	}
+
+	/* shared DPLL id 0 is DPLL 1 */
+	intel_crtc->new_config->ddi_pll_sel = pll->id + 1;
+
+	return true;
+}
 
 /*
  * Tries to find a *shared* PLL for the CRTC and store it in
@@ -781,13 +1165,15 @@ hsw_ddi_pll_select(struct intel_crtc *intel_crtc,
  */
 bool intel_ddi_pll_select(struct intel_crtc *intel_crtc)
 {
-	struct drm_crtc *crtc = &intel_crtc->base;
-	struct intel_encoder *intel_encoder = intel_ddi_get_crtc_encoder(crtc);
-	int clock = intel_crtc->config.port_clock;
-
-	intel_put_shared_dpll(intel_crtc);
+	struct drm_device *dev = intel_crtc->base.dev;
+	struct intel_encoder *intel_encoder =
+		intel_ddi_get_crtc_new_encoder(intel_crtc);
+	int clock = intel_crtc->new_config->port_clock;
 
-	return hsw_ddi_pll_select(intel_crtc, intel_encoder, clock);
+	if (IS_SKYLAKE(dev))
+		return skl_ddi_pll_select(intel_crtc, intel_encoder, clock);
+	else
+		return hsw_ddi_pll_select(intel_crtc, intel_encoder, clock);
 }
 
 void intel_ddi_set_pipe_settings(struct drm_crtc *crtc)
@@ -962,7 +1348,7 @@ bool intel_ddi_connector_get_hw_state(struct intel_connector *intel_connector)
 	uint32_t tmp;
 
 	power_domain = intel_display_port_power_domain(intel_encoder);
-	if (!intel_display_power_enabled(dev_priv, power_domain))
+	if (!intel_display_power_is_enabled(dev_priv, power_domain))
 		return false;
 
 	if (!intel_encoder->get_hw_state(intel_encoder, &pipe))
@@ -1008,7 +1394,7 @@ bool intel_ddi_get_hw_state(struct intel_encoder *encoder,
 	int i;
 
 	power_domain = intel_display_port_power_domain(encoder);
-	if (!intel_display_power_enabled(dev_priv, power_domain))
+	if (!intel_display_power_is_enabled(dev_priv, power_domain))
 		return false;
 
 	tmp = I915_READ(DDI_BUF_CTL(port));
@@ -1079,27 +1465,53 @@ void intel_ddi_disable_pipe_clock(struct intel_crtc *intel_crtc)
 static void intel_ddi_pre_enable(struct intel_encoder *intel_encoder)
 {
 	struct drm_encoder *encoder = &intel_encoder->base;
-	struct drm_i915_private *dev_priv = encoder->dev->dev_private;
+	struct drm_device *dev = encoder->dev;
+	struct drm_i915_private *dev_priv = dev->dev_private;
 	struct intel_crtc *crtc = to_intel_crtc(encoder->crtc);
 	enum port port = intel_ddi_get_encoder_port(intel_encoder);
 	int type = intel_encoder->type;
 
-	if (crtc->config.has_audio) {
-		DRM_DEBUG_DRIVER("Audio on pipe %c on DDI\n",
-				 pipe_name(crtc->pipe));
-
-		/* write eld */
-		DRM_DEBUG_DRIVER("DDI audio: write eld information\n");
-		intel_write_eld(encoder, &crtc->config.adjusted_mode);
-	}
-
 	if (type == INTEL_OUTPUT_EDP) {
 		struct intel_dp *intel_dp = enc_to_intel_dp(encoder);
 		intel_edp_panel_on(intel_dp);
 	}
 
-	WARN_ON(crtc->config.ddi_pll_sel == PORT_CLK_SEL_NONE);
-	I915_WRITE(PORT_CLK_SEL(port), crtc->config.ddi_pll_sel);
+	if (IS_SKYLAKE(dev)) {
+		uint32_t dpll = crtc->config.ddi_pll_sel;
+		uint32_t val;
+
+		/*
+		 * DPLL0 is used for eDP and is the only "private" DPLL (as
+		 * opposed to shared) on SKL
+		 */
+		if (type == INTEL_OUTPUT_EDP) {
+			WARN_ON(dpll != SKL_DPLL0);
+
+			val = I915_READ(DPLL_CTRL1);
+
+			val &= ~(DPLL_CTRL1_HDMI_MODE(dpll) |
+				 DPLL_CTRL1_SSC(dpll) |
+				 DPLL_CRTL1_LINK_RATE_MASK(dpll));
+			val |= crtc->config.dpll_hw_state.ctrl1 << (dpll * 6);
+
+			I915_WRITE(DPLL_CTRL1, val);
+			POSTING_READ(DPLL_CTRL1);
+		}
+
+		/* DDI -> PLL mapping */
+		val = I915_READ(DPLL_CTRL2);
+
+		val &= ~(DPLL_CTRL2_DDI_CLK_OFF(port) |
+			DPLL_CTRL2_DDI_CLK_SEL_MASK(port));
+		val |= (DPLL_CTRL2_DDI_CLK_SEL(dpll, port) |
+			DPLL_CTRL2_DDI_SEL_OVERRIDE(port));
+
+		I915_WRITE(DPLL_CTRL2, val);
+
+	} else {
+		WARN_ON(crtc->config.ddi_pll_sel == PORT_CLK_SEL_NONE);
+		I915_WRITE(PORT_CLK_SEL(port), crtc->config.ddi_pll_sel);
+	}
 
 	if (type == INTEL_OUTPUT_DISPLAYPORT || type == INTEL_OUTPUT_EDP) {
 		struct intel_dp *intel_dp = enc_to_intel_dp(encoder);
@@ -1109,7 +1521,7 @@ static void intel_ddi_pre_enable(struct intel_encoder *intel_encoder)
 		intel_dp_sink_dpms(intel_dp, DRM_MODE_DPMS_ON);
 		intel_dp_start_link_train(intel_dp);
 		intel_dp_complete_link_train(intel_dp);
-		if (port != PORT_A)
+		if (port != PORT_A || INTEL_INFO(dev)->gen >= 9)
 			intel_dp_stop_link_train(intel_dp);
 	} else if (type == INTEL_OUTPUT_HDMI) {
 		struct intel_hdmi *intel_hdmi = enc_to_intel_hdmi(encoder);
@@ -1123,7 +1535,8 @@ static void intel_ddi_pre_enable(struct intel_encoder *intel_encoder)
 static void intel_ddi_post_disable(struct intel_encoder *intel_encoder)
 {
 	struct drm_encoder *encoder = &intel_encoder->base;
-	struct drm_i915_private *dev_priv = encoder->dev->dev_private;
+	struct drm_device *dev = encoder->dev;
+	struct drm_i915_private *dev_priv = dev->dev_private;
 	enum port port = intel_ddi_get_encoder_port(intel_encoder);
 	int type = intel_encoder->type;
 	uint32_t val;
@@ -1151,7 +1564,11 @@ static void intel_ddi_post_disable(struct intel_encoder *intel_encoder)
 		intel_edp_panel_off(intel_dp);
 	}
 
-	I915_WRITE(PORT_CLK_SEL(port), PORT_CLK_SEL_NONE);
+	if (IS_SKYLAKE(dev))
+		I915_WRITE(DPLL_CTRL2, (I915_READ(DPLL_CTRL2) |
+					DPLL_CTRL2_DDI_CLK_OFF(port)));
+	else
+		I915_WRITE(PORT_CLK_SEL(port), PORT_CLK_SEL_NONE);
 }
 
 static void intel_enable_ddi(struct intel_encoder *intel_encoder)
@@ -1159,12 +1576,10 @@ static void intel_enable_ddi(struct intel_encoder *intel_encoder)
 	struct drm_encoder *encoder = &intel_encoder->base;
 	struct drm_crtc *crtc = encoder->crtc;
 	struct intel_crtc *intel_crtc = to_intel_crtc(crtc);
-	int pipe = intel_crtc->pipe;
 	struct drm_device *dev = encoder->dev;
 	struct drm_i915_private *dev_priv = dev->dev_private;
 	enum port port = intel_ddi_get_encoder_port(intel_encoder);
 	int type = intel_encoder->type;
-	uint32_t tmp;
 
 	if (type == INTEL_OUTPUT_HDMI) {
 		struct intel_digital_port *intel_dig_port =
@@ -1180,18 +1595,16 @@ static void intel_enable_ddi(struct intel_encoder *intel_encoder)
 	} else if (type == INTEL_OUTPUT_EDP) {
 		struct intel_dp *intel_dp = enc_to_intel_dp(encoder);
 
-		if (port == PORT_A)
+		if (port == PORT_A && INTEL_INFO(dev)->gen < 9)
 			intel_dp_stop_link_train(intel_dp);
 
 		intel_edp_backlight_on(intel_dp);
-		intel_edp_psr_enable(intel_dp);
+		intel_psr_enable(intel_dp);
 	}
 
 	if (intel_crtc->config.has_audio) {
 		intel_display_power_get(dev_priv, POWER_DOMAIN_AUDIO);
-		tmp = I915_READ(HSW_AUD_PIN_ELD_CP_VLD);
-		tmp |= ((AUDIO_OUTPUT_ENABLE_A | AUDIO_ELD_VALID_A) << (pipe * 4));
-		I915_WRITE(HSW_AUD_PIN_ELD_CP_VLD, tmp);
+		intel_audio_codec_enable(intel_encoder);
 	}
 }
 
@@ -1200,30 +1613,71 @@ static void intel_disable_ddi(struct intel_encoder *intel_encoder)
 	struct drm_encoder *encoder = &intel_encoder->base;
 	struct drm_crtc *crtc = encoder->crtc;
 	struct intel_crtc *intel_crtc = to_intel_crtc(crtc);
-	int pipe = intel_crtc->pipe;
 	int type = intel_encoder->type;
 	struct drm_device *dev = encoder->dev;
 	struct drm_i915_private *dev_priv = dev->dev_private;
-	uint32_t tmp;
 
-	/* We can't touch HSW_AUD_PIN_ELD_CP_VLD uncionditionally because this
-	 * register is part of the power well on Haswell. */
 	if (intel_crtc->config.has_audio) {
-		tmp = I915_READ(HSW_AUD_PIN_ELD_CP_VLD);
-		tmp &= ~((AUDIO_OUTPUT_ENABLE_A | AUDIO_ELD_VALID_A) <<
-			 (pipe * 4));
-		I915_WRITE(HSW_AUD_PIN_ELD_CP_VLD, tmp);
+		intel_audio_codec_disable(intel_encoder);
 		intel_display_power_put(dev_priv, POWER_DOMAIN_AUDIO);
 	}
 
 	if (type == INTEL_OUTPUT_EDP) {
 		struct intel_dp *intel_dp = enc_to_intel_dp(encoder);
 
-		intel_edp_psr_disable(intel_dp);
+		intel_psr_disable(intel_dp);
 		intel_edp_backlight_off(intel_dp);
 	}
 }
 
+static int skl_get_cdclk_freq(struct drm_i915_private *dev_priv)
+{
+	uint32_t lcpll1 = I915_READ(LCPLL1_CTL);
+	uint32_t cdctl = I915_READ(CDCLK_CTL);
+	uint32_t linkrate;
+
+	if (!(lcpll1 & LCPLL_PLL_ENABLE)) {
+		WARN(1, "LCPLL1 not enabled\n");
+		return 24000; /* 24MHz is the cd freq with NSSC ref */
+	}
+
+	if ((cdctl & CDCLK_FREQ_SEL_MASK) == CDCLK_FREQ_540)
+		return 540000;
+
+	linkrate = (I915_READ(DPLL_CTRL1) &
+		    DPLL_CRTL1_LINK_RATE_MASK(SKL_DPLL0)) >> 1;
+
+	if (linkrate == DPLL_CRTL1_LINK_RATE_2160 ||
+	    linkrate == DPLL_CRTL1_LINK_RATE_1080) {
+		/* vco 8640 */
+		switch (cdctl & CDCLK_FREQ_SEL_MASK) {
+		case CDCLK_FREQ_450_432:
+			return 432000;
+		case CDCLK_FREQ_337_308:
+			return 308570;
+		case CDCLK_FREQ_675_617:
+			return 617140;
+		default:
+			WARN(1, "Unknown cd freq selection\n");
+		}
+	} else {
+		/* vco 8100 */
+		switch (cdctl & CDCLK_FREQ_SEL_MASK) {
+		case CDCLK_FREQ_450_432:
+			return 450000;
+		case CDCLK_FREQ_337_308:
+			return 337500;
+		case CDCLK_FREQ_675_617:
+			return 675000;
+		default:
+			WARN(1, "Unknown cd freq selection\n");
+		}
+	}
+
+	/* error case, do as if DPLL0 isn't enabled */
+	return 24000;
+}
+
 static int bdw_get_cdclk_freq(struct drm_i915_private *dev_priv)
 {
 	uint32_t lcpll = I915_READ(LCPLL_CTL);
@@ -1255,7 +1709,7 @@ static int hsw_get_cdclk_freq(struct drm_i915_private *dev_priv)
 		return 450000;
 	else if (freq == LCPLL_CLK_FREQ_450)
 		return 450000;
-	else if (IS_ULT(dev))
+	else if (IS_HSW_ULT(dev))
 		return 337500;
 	else
 		return 540000;
@@ -1265,6 +1719,9 @@ int intel_ddi_get_cdclk_freq(struct drm_i915_private *dev_priv)
 {
 	struct drm_device *dev = dev_priv->dev;
 
+	if (IS_SKYLAKE(dev))
+		return skl_get_cdclk_freq(dev_priv);
+
 	if (IS_BROADWELL(dev))
 		return bdw_get_cdclk_freq(dev_priv);
 
@@ -1275,7 +1732,7 @@ int intel_ddi_get_cdclk_freq(struct drm_i915_private *dev_priv)
 static void hsw_ddi_pll_enable(struct drm_i915_private *dev_priv,
 			       struct intel_shared_dpll *pll)
 {
-	I915_WRITE(WRPLL_CTL(pll->id), pll->hw_state.wrpll);
+	I915_WRITE(WRPLL_CTL(pll->id), pll->config.hw_state.wrpll);
 	POSTING_READ(WRPLL_CTL(pll->id));
 	udelay(20);
 }
@@ -1296,7 +1753,7 @@ static bool hsw_ddi_pll_get_hw_state(struct drm_i915_private *dev_priv,
 {
 	uint32_t val;
 
-	if (!intel_display_power_enabled(dev_priv, POWER_DOMAIN_PLLS))
+	if (!intel_display_power_is_enabled(dev_priv, POWER_DOMAIN_PLLS))
 		return false;
 
 	val = I915_READ(WRPLL_CTL(pll->id));
@@ -1326,26 +1783,156 @@ static void hsw_shared_dplls_init(struct drm_i915_private *dev_priv)
 	}
 }
 
+static const char * const skl_ddi_pll_names[] = {
+	"DPLL 1",
+	"DPLL 2",
+	"DPLL 3",
+};
+
+struct skl_dpll_regs {
+	u32 ctl, cfgcr1, cfgcr2;
+};
+
+/* this array is indexed by the *shared* pll id */
+static const struct skl_dpll_regs skl_dpll_regs[3] = {
+	{
+		/* DPLL 1 */
+		.ctl = LCPLL2_CTL,
+		.cfgcr1 = DPLL1_CFGCR1,
+		.cfgcr2 = DPLL1_CFGCR2,
+	},
+	{
+		/* DPLL 2 */
+		.ctl = WRPLL_CTL1,
+		.cfgcr1 = DPLL2_CFGCR1,
+		.cfgcr2 = DPLL2_CFGCR2,
+	},
+	{
+		/* DPLL 3 */
+		.ctl = WRPLL_CTL2,
+		.cfgcr1 = DPLL3_CFGCR1,
+		.cfgcr2 = DPLL3_CFGCR2,
+	},
+};
+
+static void skl_ddi_pll_enable(struct drm_i915_private *dev_priv,
+			       struct intel_shared_dpll *pll)
+{
+	uint32_t val;
+	unsigned int dpll;
+	const struct skl_dpll_regs *regs = skl_dpll_regs;
+
+	/* DPLL0 is not part of the shared DPLLs, so pll->id is 0 for DPLL1 */
+	dpll = pll->id + 1;
+
+	val = I915_READ(DPLL_CTRL1);
+
+	val &= ~(DPLL_CTRL1_HDMI_MODE(dpll) | DPLL_CTRL1_SSC(dpll) |
+		 DPLL_CRTL1_LINK_RATE_MASK(dpll));
+	val |= pll->config.hw_state.ctrl1 << (dpll * 6);
+
+	I915_WRITE(DPLL_CTRL1, val);
+	POSTING_READ(DPLL_CTRL1);
+
+	I915_WRITE(regs[pll->id].cfgcr1, pll->config.hw_state.cfgcr1);
+	I915_WRITE(regs[pll->id].cfgcr2, pll->config.hw_state.cfgcr2);
+	POSTING_READ(regs[pll->id].cfgcr1);
+	POSTING_READ(regs[pll->id].cfgcr2);
+
+	/* the enable bit is always bit 31 */
+	I915_WRITE(regs[pll->id].ctl,
+		   I915_READ(regs[pll->id].ctl) | LCPLL_PLL_ENABLE);
+
+	if (wait_for(I915_READ(DPLL_STATUS) & DPLL_LOCK(dpll), 5))
+		DRM_ERROR("DPLL %d not locked\n", dpll);
+}
+
+static void skl_ddi_pll_disable(struct drm_i915_private *dev_priv,
+				struct intel_shared_dpll *pll)
+{
+	const struct skl_dpll_regs *regs = skl_dpll_regs;
+
+	/* the enable bit is always bit 31 */
+	I915_WRITE(regs[pll->id].ctl,
+		   I915_READ(regs[pll->id].ctl) & ~LCPLL_PLL_ENABLE);
+	POSTING_READ(regs[pll->id].ctl);
+}
+
+static bool skl_ddi_pll_get_hw_state(struct drm_i915_private *dev_priv,
+				     struct intel_shared_dpll *pll,
+				     struct intel_dpll_hw_state *hw_state)
+{
+	uint32_t val;
+	unsigned int dpll;
+	const struct skl_dpll_regs *regs = skl_dpll_regs;
+
+	if (!intel_display_power_is_enabled(dev_priv, POWER_DOMAIN_PLLS))
+		return false;
+
+	/* DPLL0 is not part of the shared DPLLs, so pll->id is 0 for DPLL1 */
+	dpll = pll->id + 1;
+
+	val = I915_READ(regs[pll->id].ctl);
+	if (!(val & LCPLL_PLL_ENABLE))
+		return false;
+
+	val = I915_READ(DPLL_CTRL1);
+	hw_state->ctrl1 = (val >> (dpll * 6)) & 0x3f;
+
+	/* avoid reading back stale values if HDMI mode is not enabled */
+	if (val & DPLL_CTRL1_HDMI_MODE(dpll)) {
+		hw_state->cfgcr1 = I915_READ(regs[pll->id].cfgcr1);
+		hw_state->cfgcr2 = I915_READ(regs[pll->id].cfgcr2);
+	}
+
+	return true;
+}
+
+static void skl_shared_dplls_init(struct drm_i915_private *dev_priv)
+{
+	int i;
+
+	dev_priv->num_shared_dpll = 3;
+
+	for (i = 0; i < dev_priv->num_shared_dpll; i++) {
+		dev_priv->shared_dplls[i].id = i;
+		dev_priv->shared_dplls[i].name = skl_ddi_pll_names[i];
+		dev_priv->shared_dplls[i].disable = skl_ddi_pll_disable;
+		dev_priv->shared_dplls[i].enable = skl_ddi_pll_enable;
+		dev_priv->shared_dplls[i].get_hw_state =
+			skl_ddi_pll_get_hw_state;
+	}
+}
+
 void intel_ddi_pll_init(struct drm_device *dev)
 {
 	struct drm_i915_private *dev_priv = dev->dev_private;
 	uint32_t val = I915_READ(LCPLL_CTL);
 
-	hsw_shared_dplls_init(dev_priv);
-
-	/* The LCPLL register should be turned on by the BIOS. For now let's
-	 * just check its state and print errors in case something is wrong.
-	 * Don't even try to turn it on.
-	 */
+	if (IS_SKYLAKE(dev))
+		skl_shared_dplls_init(dev_priv);
+	else
+		hsw_shared_dplls_init(dev_priv);
 
 	DRM_DEBUG_KMS("CDCLK running at %dKHz\n",
 		      intel_ddi_get_cdclk_freq(dev_priv));
 
-	if (val & LCPLL_CD_SOURCE_FCLK)
-		DRM_ERROR("CDCLK source is not LCPLL\n");
+	if (IS_SKYLAKE(dev)) {
+		if (!(I915_READ(LCPLL1_CTL) & LCPLL_PLL_ENABLE))
+			DRM_ERROR("LCPLL1 is disabled\n");
+	} else {
+		/*
+		 * The LCPLL register should be turned on by the BIOS. For now
+		 * let's just check its state and print errors in case
+		 * something is wrong.  Don't even try to turn it on.
+		 */
 
-	if (val & LCPLL_PLL_DISABLE)
-		DRM_ERROR("LCPLL is disabled\n");
+		if (val & LCPLL_CD_SOURCE_FCLK)
+			DRM_ERROR("CDCLK source is not LCPLL\n");
+
+		if (val & LCPLL_PLL_DISABLE)
+			DRM_ERROR("LCPLL is disabled\n");
+	}
 }
 
 void intel_ddi_prepare_link_retrain(struct drm_encoder *encoder)
@@ -1440,7 +2027,9 @@ void intel_ddi_get_config(struct intel_encoder *encoder,
 	struct drm_i915_private *dev_priv = encoder->base.dev->dev_private;
 	struct intel_crtc *intel_crtc = to_intel_crtc(encoder->base.crtc);
 	enum transcoder cpu_transcoder = intel_crtc->config.cpu_transcoder;
+	struct intel_hdmi *intel_hdmi;
 	u32 temp, flags = 0;
+	struct drm_device *dev = dev_priv->dev;
 
 	temp = I915_READ(TRANS_DDI_FUNC_CTL(cpu_transcoder));
 	if (temp & TRANS_DDI_PHSYNC)
@@ -1474,6 +2063,11 @@ void intel_ddi_get_config(struct intel_encoder *encoder,
 	switch (temp & TRANS_DDI_MODE_SELECT_MASK) {
 	case TRANS_DDI_MODE_SELECT_HDMI:
 		pipe_config->has_hdmi_sink = true;
+		intel_hdmi = enc_to_intel_hdmi(&encoder->base);
+
+		if (intel_hdmi->infoframe_enabled(&encoder->base))
+			pipe_config->has_infoframe = true;
+		break;
 	case TRANS_DDI_MODE_SELECT_DVI:
 	case TRANS_DDI_MODE_SELECT_FDI:
 		break;
@@ -1486,9 +2080,9 @@ void intel_ddi_get_config(struct intel_encoder *encoder,
 		break;
 	}
 
-	if (intel_display_power_enabled(dev_priv, POWER_DOMAIN_AUDIO)) {
+	if (intel_display_power_is_enabled(dev_priv, POWER_DOMAIN_AUDIO)) {
 		temp = I915_READ(HSW_AUD_PIN_ELD_CP_VLD);
-		if (temp & (AUDIO_OUTPUT_ENABLE_A << (intel_crtc->pipe * 4)))
+		if (temp & AUDIO_OUTPUT_ENABLE(intel_crtc->pipe))
 			pipe_config->has_audio = true;
 	}
 
@@ -1512,7 +2106,10 @@ void intel_ddi_get_config(struct intel_encoder *encoder,
 		dev_priv->vbt.edp_bpp = pipe_config->pipe_bpp;
 	}
 
-	hsw_ddi_clock_get(encoder, pipe_config);
+	if (INTEL_INFO(dev)->gen <= 8)
+		hsw_ddi_clock_get(encoder, pipe_config);
+	else
+		skl_ddi_clock_get(encoder, pipe_config);
 }
 
 static void intel_ddi_destroy(struct drm_encoder *encoder)
diff --git a/drivers/gpu/drm/i915/intel_display.c b/drivers/gpu/drm/i915/intel_display.c
index 9cb5c95d5898..fb3e3d429191 100644
--- a/drivers/gpu/drm/i915/intel_display.c
+++ b/drivers/gpu/drm/i915/intel_display.c
@@ -73,8 +73,6 @@ static const uint32_t intel_cursor_formats[] = {
 	DRM_FORMAT_ARGB8888,
 };
 
-static void intel_increase_pllclock(struct drm_device *dev,
-				    enum pipe pipe);
 static void intel_crtc_update_cursor(struct drm_crtc *crtc, bool on);
 
 static void i9xx_crtc_clock_get(struct intel_crtc *crtc,
@@ -96,8 +94,10 @@ static void intel_cpu_transcoder_set_m_n(struct intel_crtc *crtc,
 static void ironlake_set_pipeconf(struct drm_crtc *crtc);
 static void haswell_set_pipeconf(struct drm_crtc *crtc);
 static void intel_set_pipe_csc(struct drm_crtc *crtc);
-static void vlv_prepare_pll(struct intel_crtc *crtc);
-static void chv_prepare_pll(struct intel_crtc *crtc);
+static void vlv_prepare_pll(struct intel_crtc *crtc,
+			    const struct intel_crtc_config *pipe_config);
+static void chv_prepare_pll(struct intel_crtc *crtc,
+			    const struct intel_crtc_config *pipe_config);
 
 static struct intel_encoder *intel_find_encoder(struct intel_connector *connector, int pipe)
 {
@@ -408,25 +408,43 @@ static void vlv_clock(int refclk, intel_clock_t *clock)
 /**
  * Returns whether any output on the specified pipe is of the specified type
  */
-static bool intel_pipe_has_type(struct drm_crtc *crtc, int type)
+bool intel_pipe_has_type(struct intel_crtc *crtc, enum intel_output_type type)
 {
-	struct drm_device *dev = crtc->dev;
+	struct drm_device *dev = crtc->base.dev;
 	struct intel_encoder *encoder;
 
-	for_each_encoder_on_crtc(dev, crtc, encoder)
+	for_each_encoder_on_crtc(dev, &crtc->base, encoder)
 		if (encoder->type == type)
 			return true;
 
 	return false;
 }
 
-static const intel_limit_t *intel_ironlake_limit(struct drm_crtc *crtc,
+/**
+ * Returns whether any output on the specified pipe will have the specified
+ * type after a staged modeset is complete, i.e., the same as
+ * intel_pipe_has_type() but looking at encoder->new_crtc instead of
+ * encoder->crtc.
+ */
+static bool intel_pipe_will_have_type(struct intel_crtc *crtc, int type)
+{
+	struct drm_device *dev = crtc->base.dev;
+	struct intel_encoder *encoder;
+
+	for_each_intel_encoder(dev, encoder)
+		if (encoder->new_crtc == crtc && encoder->type == type)
+			return true;
+
+	return false;
+}
+
+static const intel_limit_t *intel_ironlake_limit(struct intel_crtc *crtc,
 						int refclk)
 {
-	struct drm_device *dev = crtc->dev;
+	struct drm_device *dev = crtc->base.dev;
 	const intel_limit_t *limit;
 
-	if (intel_pipe_has_type(crtc, INTEL_OUTPUT_LVDS)) {
+	if (intel_pipe_will_have_type(crtc, INTEL_OUTPUT_LVDS)) {
 		if (intel_is_dual_link_lvds(dev)) {
 			if (refclk == 100000)
 				limit = &intel_limits_ironlake_dual_lvds_100m;
@@ -444,20 +462,20 @@ static const intel_limit_t *intel_ironlake_limit(struct drm_crtc *crtc,
 	return limit;
 }
 
-static const intel_limit_t *intel_g4x_limit(struct drm_crtc *crtc)
+static const intel_limit_t *intel_g4x_limit(struct intel_crtc *crtc)
 {
-	struct drm_device *dev = crtc->dev;
+	struct drm_device *dev = crtc->base.dev;
 	const intel_limit_t *limit;
 
-	if (intel_pipe_has_type(crtc, INTEL_OUTPUT_LVDS)) {
+	if (intel_pipe_will_have_type(crtc, INTEL_OUTPUT_LVDS)) {
 		if (intel_is_dual_link_lvds(dev))
 			limit = &intel_limits_g4x_dual_channel_lvds;
 		else
 			limit = &intel_limits_g4x_single_channel_lvds;
-	} else if (intel_pipe_has_type(crtc, INTEL_OUTPUT_HDMI) ||
-		   intel_pipe_has_type(crtc, INTEL_OUTPUT_ANALOG)) {
+	} else if (intel_pipe_will_have_type(crtc, INTEL_OUTPUT_HDMI) ||
+		   intel_pipe_will_have_type(crtc, INTEL_OUTPUT_ANALOG)) {
 		limit = &intel_limits_g4x_hdmi;
-	} else if (intel_pipe_has_type(crtc, INTEL_OUTPUT_SDVO)) {
+	} else if (intel_pipe_will_have_type(crtc, INTEL_OUTPUT_SDVO)) {
 		limit = &intel_limits_g4x_sdvo;
 	} else /* The option is for other outputs */
 		limit = &intel_limits_i9xx_sdvo;
@@ -465,9 +483,9 @@ static const intel_limit_t *intel_g4x_limit(struct drm_crtc *crtc)
 	return limit;
 }
 
-static const intel_limit_t *intel_limit(struct drm_crtc *crtc, int refclk)
+static const intel_limit_t *intel_limit(struct intel_crtc *crtc, int refclk)
 {
-	struct drm_device *dev = crtc->dev;
+	struct drm_device *dev = crtc->base.dev;
 	const intel_limit_t *limit;
 
 	if (HAS_PCH_SPLIT(dev))
@@ -475,7 +493,7 @@ static const intel_limit_t *intel_limit(struct drm_crtc *crtc, int refclk)
 	else if (IS_G4X(dev)) {
 		limit = intel_g4x_limit(crtc);
 	} else if (IS_PINEVIEW(dev)) {
-		if (intel_pipe_has_type(crtc, INTEL_OUTPUT_LVDS))
+		if (intel_pipe_will_have_type(crtc, INTEL_OUTPUT_LVDS))
 			limit = &intel_limits_pineview_lvds;
 		else
 			limit = &intel_limits_pineview_sdvo;
@@ -484,14 +502,14 @@ static const intel_limit_t *intel_limit(struct drm_crtc *crtc, int refclk)
 	} else if (IS_VALLEYVIEW(dev)) {
 		limit = &intel_limits_vlv;
 	} else if (!IS_GEN2(dev)) {
-		if (intel_pipe_has_type(crtc, INTEL_OUTPUT_LVDS))
+		if (intel_pipe_will_have_type(crtc, INTEL_OUTPUT_LVDS))
 			limit = &intel_limits_i9xx_lvds;
 		else
 			limit = &intel_limits_i9xx_sdvo;
 	} else {
-		if (intel_pipe_has_type(crtc, INTEL_OUTPUT_LVDS))
+		if (intel_pipe_will_have_type(crtc, INTEL_OUTPUT_LVDS))
 			limit = &intel_limits_i8xx_lvds;
-		else if (intel_pipe_has_type(crtc, INTEL_OUTPUT_DVO))
+		else if (intel_pipe_will_have_type(crtc, INTEL_OUTPUT_DVO))
 			limit = &intel_limits_i8xx_dvo;
 		else
 			limit = &intel_limits_i8xx_dac;
@@ -578,15 +596,15 @@ static bool intel_PLL_is_valid(struct drm_device *dev,
 }
 
 static bool
-i9xx_find_best_dpll(const intel_limit_t *limit, struct drm_crtc *crtc,
+i9xx_find_best_dpll(const intel_limit_t *limit, struct intel_crtc *crtc,
 		    int target, int refclk, intel_clock_t *match_clock,
 		    intel_clock_t *best_clock)
 {
-	struct drm_device *dev = crtc->dev;
+	struct drm_device *dev = crtc->base.dev;
 	intel_clock_t clock;
 	int err = target;
 
-	if (intel_pipe_has_type(crtc, INTEL_OUTPUT_LVDS)) {
+	if (intel_pipe_will_have_type(crtc, INTEL_OUTPUT_LVDS)) {
 		/*
 		 * For LVDS just rely on its current settings for dual-channel.
 		 * We haven't figured out how to reliably set up different
@@ -639,15 +657,15 @@ i9xx_find_best_dpll(const intel_limit_t *limit, struct drm_crtc *crtc,
 }
 
 static bool
-pnv_find_best_dpll(const intel_limit_t *limit, struct drm_crtc *crtc,
+pnv_find_best_dpll(const intel_limit_t *limit, struct intel_crtc *crtc,
 		   int target, int refclk, intel_clock_t *match_clock,
 		   intel_clock_t *best_clock)
 {
-	struct drm_device *dev = crtc->dev;
+	struct drm_device *dev = crtc->base.dev;
 	intel_clock_t clock;
 	int err = target;
 
-	if (intel_pipe_has_type(crtc, INTEL_OUTPUT_LVDS)) {
+	if (intel_pipe_will_have_type(crtc, INTEL_OUTPUT_LVDS)) {
 		/*
 		 * For LVDS just rely on its current settings for dual-channel.
 		 * We haven't figured out how to reliably set up different
@@ -698,11 +716,11 @@ pnv_find_best_dpll(const intel_limit_t *limit, struct drm_crtc *crtc,
 }
 
 static bool
-g4x_find_best_dpll(const intel_limit_t *limit, struct drm_crtc *crtc,
+g4x_find_best_dpll(const intel_limit_t *limit, struct intel_crtc *crtc,
 		   int target, int refclk, intel_clock_t *match_clock,
 		   intel_clock_t *best_clock)
 {
-	struct drm_device *dev = crtc->dev;
+	struct drm_device *dev = crtc->base.dev;
 	intel_clock_t clock;
 	int max_n;
 	bool found;
@@ -710,7 +728,7 @@ g4x_find_best_dpll(const intel_limit_t *limit, struct drm_crtc *crtc,
 	int err_most = (target >> 8) + (target >> 9);
 	found = false;
 
-	if (intel_pipe_has_type(crtc, INTEL_OUTPUT_LVDS)) {
+	if (intel_pipe_will_have_type(crtc, INTEL_OUTPUT_LVDS)) {
 		if (intel_is_dual_link_lvds(dev))
 			clock.p2 = limit->p2.p2_fast;
 		else
@@ -755,11 +773,11 @@ g4x_find_best_dpll(const intel_limit_t *limit, struct drm_crtc *crtc,
 }
 
 static bool
-vlv_find_best_dpll(const intel_limit_t *limit, struct drm_crtc *crtc,
+vlv_find_best_dpll(const intel_limit_t *limit, struct intel_crtc *crtc,
 		   int target, int refclk, intel_clock_t *match_clock,
 		   intel_clock_t *best_clock)
 {
-	struct drm_device *dev = crtc->dev;
+	struct drm_device *dev = crtc->base.dev;
 	intel_clock_t clock;
 	unsigned int bestppm = 1000000;
 	/* min update 19.2 MHz */
@@ -812,11 +830,11 @@ vlv_find_best_dpll(const intel_limit_t *limit, struct drm_crtc *crtc,
 }
 
 static bool
-chv_find_best_dpll(const intel_limit_t *limit, struct drm_crtc *crtc,
+chv_find_best_dpll(const intel_limit_t *limit, struct intel_crtc *crtc,
 		   int target, int refclk, intel_clock_t *match_clock,
 		   intel_clock_t *best_clock)
 {
-	struct drm_device *dev = crtc->dev;
+	struct drm_device *dev = crtc->base.dev;
 	intel_clock_t clock;
 	uint64_t m2;
 	int found = false;
@@ -889,60 +907,6 @@ enum transcoder intel_pipe_to_cpu_transcoder(struct drm_i915_private *dev_priv,
 	return intel_crtc->config.cpu_transcoder;
 }
 
-static void g4x_wait_for_vblank(struct drm_device *dev, int pipe)
-{
-	struct drm_i915_private *dev_priv = dev->dev_private;
-	u32 frame, frame_reg = PIPE_FRMCOUNT_GM45(pipe);
-
-	frame = I915_READ(frame_reg);
-
-	if (wait_for(I915_READ_NOTRACE(frame_reg) != frame, 50))
-		WARN(1, "vblank wait on pipe %c timed out\n",
-		     pipe_name(pipe));
-}
-
-/**
- * intel_wait_for_vblank - wait for vblank on a given pipe
- * @dev: drm device
- * @pipe: pipe to wait for
- *
- * Wait for vblank to occur on a given pipe.  Needed for various bits of
- * mode setting code.
- */
-void intel_wait_for_vblank(struct drm_device *dev, int pipe)
-{
-	struct drm_i915_private *dev_priv = dev->dev_private;
-	int pipestat_reg = PIPESTAT(pipe);
-
-	if (IS_G4X(dev) || INTEL_INFO(dev)->gen >= 5) {
-		g4x_wait_for_vblank(dev, pipe);
-		return;
-	}
-
-	/* Clear existing vblank status. Note this will clear any other
-	 * sticky status fields as well.
-	 *
-	 * This races with i915_driver_irq_handler() with the result
-	 * that either function could miss a vblank event.  Here it is not
-	 * fatal, as we will either wait upon the next vblank interrupt or
-	 * timeout.  Generally speaking intel_wait_for_vblank() is only
-	 * called during modeset at which time the GPU should be idle and
-	 * should *not* be performing page flips and thus not waiting on
-	 * vblanks...
-	 * Currently, the result of us stealing a vblank from the irq
-	 * handler is that a single frame will be skipped during swapbuffers.
-	 */
-	I915_WRITE(pipestat_reg,
-		   I915_READ(pipestat_reg) | PIPE_VBLANK_INTERRUPT_STATUS);
-
-	/* Wait for vblank interrupt bit to set */
-	if (wait_for(I915_READ(pipestat_reg) &
-		     PIPE_VBLANK_INTERRUPT_STATUS,
-		     50))
-		DRM_DEBUG_KMS("vblank wait on pipe %c timed out\n",
-			      pipe_name(pipe));
-}
-
 static bool pipe_dsl_stopped(struct drm_device *dev, enum pipe pipe)
 {
 	struct drm_i915_private *dev_priv = dev->dev_private;
@@ -1189,8 +1153,8 @@ void assert_fdi_rx_pll(struct drm_i915_private *dev_priv,
 	     state_string(state), state_string(cur_state));
 }
 
-static void assert_panel_unlocked(struct drm_i915_private *dev_priv,
-				  enum pipe pipe)
+void assert_panel_unlocked(struct drm_i915_private *dev_priv,
+			   enum pipe pipe)
 {
 	struct drm_device *dev = dev_priv->dev;
 	int pp_reg;
@@ -1263,7 +1227,7 @@ void assert_pipe(struct drm_i915_private *dev_priv,
 	    (pipe == PIPE_B && dev_priv->quirks & QUIRK_PIPEB_FORCE))
 		state = true;
 
-	if (!intel_display_power_enabled(dev_priv,
+	if (!intel_display_power_is_enabled(dev_priv,
 				POWER_DOMAIN_TRANSCODER(cpu_transcoder))) {
 		cur_state = false;
 	} else {
@@ -1332,7 +1296,14 @@ static void assert_sprites_disabled(struct drm_i915_private *dev_priv,
 	int reg, sprite;
 	u32 val;
 
-	if (IS_VALLEYVIEW(dev)) {
+	if (INTEL_INFO(dev)->gen >= 9) {
+		for_each_sprite(pipe, sprite) {
+			val = I915_READ(PLANE_CTL(pipe, sprite));
+			WARN(val & PLANE_CTL_ENABLE,
+			     "plane %d assertion failure, should be off on pipe %c but is still active\n",
+			     sprite, pipe_name(pipe));
+		}
+	} else if (IS_VALLEYVIEW(dev)) {
 		for_each_sprite(pipe, sprite) {
 			reg = SPCNTR(pipe, sprite);
 			val = I915_READ(reg);
@@ -1533,12 +1504,13 @@ static void intel_init_dpio(struct drm_device *dev)
 	}
 }
 
-static void vlv_enable_pll(struct intel_crtc *crtc)
+static void vlv_enable_pll(struct intel_crtc *crtc,
+			   const struct intel_crtc_config *pipe_config)
 {
 	struct drm_device *dev = crtc->base.dev;
 	struct drm_i915_private *dev_priv = dev->dev_private;
 	int reg = DPLL(crtc->pipe);
-	u32 dpll = crtc->config.dpll_hw_state.dpll;
+	u32 dpll = pipe_config->dpll_hw_state.dpll;
 
 	assert_pipe_disabled(dev_priv, crtc->pipe);
 
@@ -1556,7 +1528,7 @@ static void vlv_enable_pll(struct intel_crtc *crtc)
 	if (wait_for(((I915_READ(reg) & DPLL_LOCK_VLV) == DPLL_LOCK_VLV), 1))
 		DRM_ERROR("DPLL %d failed to lock\n", crtc->pipe);
 
-	I915_WRITE(DPLL_MD(crtc->pipe), crtc->config.dpll_hw_state.dpll_md);
+	I915_WRITE(DPLL_MD(crtc->pipe), pipe_config->dpll_hw_state.dpll_md);
 	POSTING_READ(DPLL_MD(crtc->pipe));
 
 	/* We do this three times for luck */
@@ -1571,7 +1543,8 @@ static void vlv_enable_pll(struct intel_crtc *crtc)
 	udelay(150); /* wait for warmup */
 }
 
-static void chv_enable_pll(struct intel_crtc *crtc)
+static void chv_enable_pll(struct intel_crtc *crtc,
+			   const struct intel_crtc_config *pipe_config)
 {
 	struct drm_device *dev = crtc->base.dev;
 	struct drm_i915_private *dev_priv = dev->dev_private;
@@ -1596,14 +1569,14 @@ static void chv_enable_pll(struct intel_crtc *crtc)
 	udelay(1);
 
 	/* Enable PLL */
-	I915_WRITE(DPLL(pipe), crtc->config.dpll_hw_state.dpll);
+	I915_WRITE(DPLL(pipe), pipe_config->dpll_hw_state.dpll);
 
 	/* Check PLL is locked */
 	if (wait_for(((I915_READ(DPLL(pipe)) & DPLL_LOCK_VLV) == DPLL_LOCK_VLV), 1))
 		DRM_ERROR("PLL %d failed to lock\n", pipe);
 
 	/* not sure when this should be written */
-	I915_WRITE(DPLL_MD(pipe), crtc->config.dpll_hw_state.dpll_md);
+	I915_WRITE(DPLL_MD(pipe), pipe_config->dpll_hw_state.dpll_md);
 	POSTING_READ(DPLL_MD(pipe));
 
 	mutex_unlock(&dev_priv->dpio_lock);
@@ -1616,7 +1589,7 @@ static int intel_num_dvo_pipes(struct drm_device *dev)
 
 	for_each_intel_crtc(dev, crtc)
 		count += crtc->active &&
-			intel_pipe_has_type(&crtc->base, INTEL_OUTPUT_DVO);
+			intel_pipe_has_type(crtc, INTEL_OUTPUT_DVO);
 
 	return count;
 }
@@ -1695,7 +1668,7 @@ static void i9xx_disable_pll(struct intel_crtc *crtc)
 
 	/* Disable DVO 2x clock on both PLLs if necessary */
 	if (IS_I830(dev) &&
-	    intel_pipe_has_type(&crtc->base, INTEL_OUTPUT_DVO) &&
+	    intel_pipe_has_type(crtc, INTEL_OUTPUT_DVO) &&
 	    intel_num_dvo_pipes(dev) == 1) {
 		I915_WRITE(DPLL(PIPE_B),
 			   I915_READ(DPLL(PIPE_B)) & ~DPLL_DVO_2X_MODE);
@@ -1806,7 +1779,7 @@ static void intel_prepare_shared_dpll(struct intel_crtc *crtc)
 	if (WARN_ON(pll == NULL))
 		return;
 
-	WARN_ON(!pll->refcount);
+	WARN_ON(!pll->config.crtc_mask);
 	if (pll->active == 0) {
 		DRM_DEBUG_DRIVER("setting up %s\n", pll->name);
 		WARN_ON(pll->on);
@@ -1833,7 +1806,7 @@ static void intel_enable_shared_dpll(struct intel_crtc *crtc)
 	if (WARN_ON(pll == NULL))
 		return;
 
-	if (WARN_ON(pll->refcount == 0))
+	if (WARN_ON(pll->config.crtc_mask == 0))
 		return;
 
 	DRM_DEBUG_KMS("enable %s (active %d, on? %d) for crtc %d\n",
@@ -1865,7 +1838,7 @@ static void intel_disable_shared_dpll(struct intel_crtc *crtc)
 	if (WARN_ON(pll == NULL))
 	       return;
 
-	if (WARN_ON(pll->refcount == 0))
+	if (WARN_ON(pll->config.crtc_mask == 0))
 		return;
 
 	DRM_DEBUG_KMS("disable %s (active %d, on? %d) for crtc %d\n",
@@ -1933,7 +1906,7 @@ static void ironlake_enable_pch_transcoder(struct drm_i915_private *dev_priv,
 	val &= ~TRANS_INTERLACE_MASK;
 	if ((pipeconf_val & PIPECONF_INTERLACE_MASK) == PIPECONF_INTERLACED_ILK)
 		if (HAS_PCH_IBX(dev_priv->dev) &&
-		    intel_pipe_has_type(crtc, INTEL_OUTPUT_SDVO))
+		    intel_pipe_has_type(intel_crtc, INTEL_OUTPUT_SDVO))
 			val |= TRANS_LEGACY_INTERLACED_ILK;
 		else
 			val |= TRANS_INTERLACED;
@@ -2056,7 +2029,7 @@ static void intel_enable_pipe(struct intel_crtc *crtc)
 	 * need the check.
 	 */
 	if (!HAS_PCH_SPLIT(dev_priv->dev))
-		if (intel_pipe_has_type(&crtc->base, INTEL_OUTPUT_DSI))
+		if (intel_pipe_has_type(crtc, INTEL_OUTPUT_DSI))
 			assert_dsi_pll_enabled(dev_priv);
 		else
 			assert_pll_enabled(dev_priv, pipe);
@@ -2221,11 +2194,13 @@ static int intel_align_height(struct drm_device *dev, int height, bool tiled)
 }
 
 int
-intel_pin_and_fence_fb_obj(struct drm_device *dev,
-			   struct drm_i915_gem_object *obj,
+intel_pin_and_fence_fb_obj(struct drm_plane *plane,
+			   struct drm_framebuffer *fb,
 			   struct intel_engine_cs *pipelined)
 {
+	struct drm_device *dev = fb->dev;
 	struct drm_i915_private *dev_priv = dev->dev_private;
+	struct drm_i915_gem_object *obj = intel_fb_obj(fb);
 	u32 alignment;
 	int ret;
 
@@ -2233,7 +2208,9 @@ intel_pin_and_fence_fb_obj(struct drm_device *dev,
 
 	switch (obj->tiling_mode) {
 	case I915_TILING_NONE:
-		if (IS_BROADWATER(dev) || IS_CRESTLINE(dev))
+		if (INTEL_INFO(dev)->gen >= 9)
+			alignment = 256 * 1024;
+		else if (IS_BROADWATER(dev) || IS_CRESTLINE(dev))
 			alignment = 128 * 1024;
 		else if (INTEL_INFO(dev)->gen >= 4)
 			alignment = 4 * 1024;
@@ -2241,8 +2218,12 @@ intel_pin_and_fence_fb_obj(struct drm_device *dev,
 			alignment = 64 * 1024;
 		break;
 	case I915_TILING_X:
-		/* pin() will align the object as required by fence */
-		alignment = 0;
+		if (INTEL_INFO(dev)->gen >= 9)
+			alignment = 256 * 1024;
+		else {
+			/* pin() will align the object as required by fence */
+			alignment = 0;
+		}
 		break;
 	case I915_TILING_Y:
 		WARN(1, "Y tiled bo slipped through, driver bug!\n");
@@ -2402,6 +2383,7 @@ static void intel_find_plane_obj(struct intel_crtc *intel_crtc,
 				 struct intel_plane_config *plane_config)
 {
 	struct drm_device *dev = intel_crtc->base.dev;
+	struct drm_i915_private *dev_priv = dev->dev_private;
 	struct drm_crtc *c;
 	struct intel_crtc *i;
 	struct drm_i915_gem_object *obj;
@@ -2433,6 +2415,9 @@ static void intel_find_plane_obj(struct intel_crtc *intel_crtc,
 			continue;
 
 		if (i915_gem_obj_ggtt_offset(obj) == plane_config->base) {
+			if (obj->tiling_mode != I915_TILING_NONE)
+				dev_priv->preserve_bios_swizzle = true;
+
 			drm_framebuffer_reference(c->primary->fb);
 			intel_crtc->base.primary->fb = c->primary->fb;
 			obj->frontbuffer_bits |= INTEL_FRONTBUFFER_PRIMARY(intel_crtc->pipe);
@@ -2486,6 +2471,12 @@ static void i9xx_update_primary_plane(struct drm_crtc *crtc,
 			   ((intel_crtc->config.pipe_src_h - 1) << 16) |
 			   (intel_crtc->config.pipe_src_w - 1));
 		I915_WRITE(DSPPOS(plane), 0);
+	} else if (IS_CHERRYVIEW(dev) && plane == PLANE_B) {
+		I915_WRITE(PRIMSIZE(plane),
+			   ((intel_crtc->config.pipe_src_h - 1) << 16) |
+			   (intel_crtc->config.pipe_src_w - 1));
+		I915_WRITE(PRIMPOS(plane), 0);
+		I915_WRITE(PRIMCNSTALPHA(plane), 0);
 	}
 
 	switch (fb->pixel_format) {
@@ -2672,6 +2663,92 @@ static void ironlake_update_primary_plane(struct drm_crtc *crtc,
 	POSTING_READ(reg);
 }
 
+static void skylake_update_primary_plane(struct drm_crtc *crtc,
+					 struct drm_framebuffer *fb,
+					 int x, int y)
+{
+	struct drm_device *dev = crtc->dev;
+	struct drm_i915_private *dev_priv = dev->dev_private;
+	struct intel_crtc *intel_crtc = to_intel_crtc(crtc);
+	struct intel_framebuffer *intel_fb;
+	struct drm_i915_gem_object *obj;
+	int pipe = intel_crtc->pipe;
+	u32 plane_ctl, stride;
+
+	if (!intel_crtc->primary_enabled) {
+		I915_WRITE(PLANE_CTL(pipe, 0), 0);
+		I915_WRITE(PLANE_SURF(pipe, 0), 0);
+		POSTING_READ(PLANE_CTL(pipe, 0));
+		return;
+	}
+
+	plane_ctl = PLANE_CTL_ENABLE |
+		    PLANE_CTL_PIPE_GAMMA_ENABLE |
+		    PLANE_CTL_PIPE_CSC_ENABLE;
+
+	switch (fb->pixel_format) {
+	case DRM_FORMAT_RGB565:
+		plane_ctl |= PLANE_CTL_FORMAT_RGB_565;
+		break;
+	case DRM_FORMAT_XRGB8888:
+		plane_ctl |= PLANE_CTL_FORMAT_XRGB_8888;
+		break;
+	case DRM_FORMAT_XBGR8888:
+		plane_ctl |= PLANE_CTL_ORDER_RGBX;
+		plane_ctl |= PLANE_CTL_FORMAT_XRGB_8888;
+		break;
+	case DRM_FORMAT_XRGB2101010:
+		plane_ctl |= PLANE_CTL_FORMAT_XRGB_2101010;
+		break;
+	case DRM_FORMAT_XBGR2101010:
+		plane_ctl |= PLANE_CTL_ORDER_RGBX;
+		plane_ctl |= PLANE_CTL_FORMAT_XRGB_2101010;
+		break;
+	default:
+		BUG();
+	}
+
+	intel_fb = to_intel_framebuffer(fb);
+	obj = intel_fb->obj;
+
+	/*
+	 * The stride is expressed either in 64-byte chunks for linear
+	 * buffers or in number of tiles for tiled buffers.
+	 */
+	switch (obj->tiling_mode) {
+	case I915_TILING_NONE:
+		stride = fb->pitches[0] >> 6;
+		break;
+	case I915_TILING_X:
+		plane_ctl |= PLANE_CTL_TILED_X;
+		stride = fb->pitches[0] >> 9;
+		break;
+	default:
+		BUG();
+	}
+
+	plane_ctl |= PLANE_CTL_PLANE_GAMMA_DISABLE;
+	if (to_intel_plane(crtc->primary)->rotation == BIT(DRM_ROTATE_180))
+		plane_ctl |= PLANE_CTL_ROTATE_180;
+
+	I915_WRITE(PLANE_CTL(pipe, 0), plane_ctl);
+
+	DRM_DEBUG_KMS("Writing base %08lX %d,%d,%d,%d pitch=%d\n",
+		      i915_gem_obj_ggtt_offset(obj),
+		      x, y, fb->width, fb->height,
+		      fb->pitches[0]);
+
+	I915_WRITE(PLANE_POS(pipe, 0), 0);
+	I915_WRITE(PLANE_OFFSET(pipe, 0), (y << 16) | x);
+	I915_WRITE(PLANE_SIZE(pipe, 0),
+		   (intel_crtc->config.pipe_src_h - 1) << 16 |
+		   (intel_crtc->config.pipe_src_w - 1));
+	I915_WRITE(PLANE_STRIDE(pipe, 0), stride);
+	I915_WRITE(PLANE_SURF(pipe, 0), i915_gem_obj_ggtt_offset(obj));
+
+	POSTING_READ(PLANE_SURF(pipe, 0));
+}
+
 /* Assume fb object is pinned & idle & fenced and just update base pointers */
 static int
 intel_pipe_set_base_atomic(struct drm_crtc *crtc, struct drm_framebuffer *fb,
@@ -2682,32 +2759,16 @@ intel_pipe_set_base_atomic(struct drm_crtc *crtc, struct drm_framebuffer *fb,
 
 	if (dev_priv->display.disable_fbc)
 		dev_priv->display.disable_fbc(dev);
-	intel_increase_pllclock(dev, to_intel_crtc(crtc)->pipe);
 
 	dev_priv->display.update_primary_plane(crtc, fb, x, y);
 
 	return 0;
 }
 
-void intel_display_handle_reset(struct drm_device *dev)
+static void intel_complete_page_flips(struct drm_device *dev)
 {
-	struct drm_i915_private *dev_priv = dev->dev_private;
 	struct drm_crtc *crtc;
 
-	/*
-	 * Flips in the rings have been nuked by the reset,
-	 * so complete all pending flips so that user space
-	 * will get its events and not get stuck.
-	 *
-	 * Also update the base address of all primary
-	 * planes to the the last fb to make sure we're
-	 * showing the correct fb after a reset.
-	 *
-	 * Need to make two loops over the crtcs so that we
-	 * don't try to grab a crtc mutex before the
-	 * pending_flip_queue really got woken up.
-	 */
-
 	for_each_crtc(dev, crtc) {
 		struct intel_crtc *intel_crtc = to_intel_crtc(crtc);
 		enum plane plane = intel_crtc->plane;
@@ -2715,6 +2776,12 @@ void intel_display_handle_reset(struct drm_device *dev)
 		intel_prepare_page_flip(dev, plane);
 		intel_finish_page_flip_plane(dev, plane);
 	}
+}
+
+static void intel_update_primary_planes(struct drm_device *dev)
+{
+	struct drm_i915_private *dev_priv = dev->dev_private;
+	struct drm_crtc *crtc;
 
 	for_each_crtc(dev, crtc) {
 		struct intel_crtc *intel_crtc = to_intel_crtc(crtc);
@@ -2734,6 +2801,79 @@ void intel_display_handle_reset(struct drm_device *dev)
 	}
 }
 
+void intel_prepare_reset(struct drm_device *dev)
+{
+	struct drm_i915_private *dev_priv = to_i915(dev);
+	struct intel_crtc *crtc;
+
+	/* no reset support for gen2 */
+	if (IS_GEN2(dev))
+		return;
+
+	/* reset doesn't touch the display */
+	if (INTEL_INFO(dev)->gen >= 5 || IS_G4X(dev))
+		return;
+
+	drm_modeset_lock_all(dev);
+
+	/*
+	 * Disabling the crtcs gracefully seems nicer. Also the
+	 * g33 docs say we should at least disable all the planes.
+	 */
+	for_each_intel_crtc(dev, crtc) {
+		if (crtc->active)
+			dev_priv->display.crtc_disable(&crtc->base);
+	}
+}
+
+void intel_finish_reset(struct drm_device *dev)
+{
+	struct drm_i915_private *dev_priv = to_i915(dev);
+
+	/*
+	 * Flips in the rings will be nuked by the reset,
+	 * so complete all pending flips so that user space
+	 * will get its events and not get stuck.
+	 */
+	intel_complete_page_flips(dev);
+
+	/* no reset support for gen2 */
+	if (IS_GEN2(dev))
+		return;
+
+	/* reset doesn't touch the display */
+	if (INTEL_INFO(dev)->gen >= 5 || IS_G4X(dev)) {
+		/*
+		 * Flips in the rings have been nuked by the reset,
+		 * so update the base address of all primary
+		 * planes to the last fb to make sure we're
+		 * showing the correct fb after a reset.
+		 */
+		intel_update_primary_planes(dev);
+		return;
+	}
+
+	/*
+	 * The display has been reset as well,
+	 * so need a full re-initialization.
+	 */
+	intel_runtime_pm_disable_interrupts(dev_priv);
+	intel_runtime_pm_enable_interrupts(dev_priv);
+
+	intel_modeset_init_hw(dev);
+
+	spin_lock_irq(&dev_priv->irq_lock);
+	if (dev_priv->display.hpd_irq_setup)
+		dev_priv->display.hpd_irq_setup(dev);
+	spin_unlock_irq(&dev_priv->irq_lock);
+
+	intel_modeset_setup_hw_state(dev, true);
+
+	intel_hpd_init(dev_priv);
+
+	drm_modeset_unlock_all(dev);
+}
+
 static int
 intel_finish_fb(struct drm_framebuffer *old_fb)
 {
@@ -2762,20 +2902,58 @@ static bool intel_crtc_has_pending_flip(struct drm_crtc *crtc)
 	struct drm_device *dev = crtc->dev;
 	struct drm_i915_private *dev_priv = dev->dev_private;
 	struct intel_crtc *intel_crtc = to_intel_crtc(crtc);
-	unsigned long flags;
 	bool pending;
 
 	if (i915_reset_in_progress(&dev_priv->gpu_error) ||
 	    intel_crtc->reset_counter != atomic_read(&dev_priv->gpu_error.reset_counter))
 		return false;
 
-	spin_lock_irqsave(&dev->event_lock, flags);
+	spin_lock_irq(&dev->event_lock);
 	pending = to_intel_crtc(crtc)->unpin_work != NULL;
-	spin_unlock_irqrestore(&dev->event_lock, flags);
+	spin_unlock_irq(&dev->event_lock);
 
 	return pending;
 }
 
+static void intel_update_pipe_size(struct intel_crtc *crtc)
+{
+	struct drm_device *dev = crtc->base.dev;
+	struct drm_i915_private *dev_priv = dev->dev_private;
+	const struct drm_display_mode *adjusted_mode;
+
+	if (!i915.fastboot)
+		return;
+
+	/*
+	 * Update pipe size and adjust fitter if needed: the reason for this is
+	 * that in compute_mode_changes we check the native mode (not the pfit
+	 * mode) to see if we can flip rather than do a full mode set. In the
+	 * fastboot case, we'll flip, but if we don't update the pipesrc and
+	 * pfit state, we'll end up with a big fb scanned out into the wrong
+	 * sized surface.
+	 *
+	 * To fix this properly, we need to hoist the checks up into
+	 * compute_mode_changes (or above), check the actual pfit state and
+	 * whether the platform allows pfit disable with pipe active, and only
+	 * then update the pipesrc and pfit state, even on the flip path.
+	 */
+
+	adjusted_mode = &crtc->config.adjusted_mode;
+
+	I915_WRITE(PIPESRC(crtc->pipe),
+		   ((adjusted_mode->crtc_hdisplay - 1) << 16) |
+		   (adjusted_mode->crtc_vdisplay - 1));
+	if (!crtc->config.pch_pfit.enabled &&
+	    (intel_pipe_has_type(crtc, INTEL_OUTPUT_LVDS) ||
+	     intel_pipe_has_type(crtc, INTEL_OUTPUT_EDP))) {
+		I915_WRITE(PF_CTL(crtc->pipe), 0);
+		I915_WRITE(PF_WIN_POS(crtc->pipe), 0);
+		I915_WRITE(PF_WIN_SZ(crtc->pipe), 0);
+	}
+	crtc->config.pipe_src_w = adjusted_mode->crtc_hdisplay;
+	crtc->config.pipe_src_h = adjusted_mode->crtc_vdisplay;
+}
+
 static int
 intel_pipe_set_base(struct drm_crtc *crtc, int x, int y,
 		    struct drm_framebuffer *fb)
@@ -2785,7 +2963,6 @@ intel_pipe_set_base(struct drm_crtc *crtc, int x, int y,
 	struct intel_crtc *intel_crtc = to_intel_crtc(crtc);
 	enum pipe pipe = intel_crtc->pipe;
 	struct drm_framebuffer *old_fb = crtc->primary->fb;
-	struct drm_i915_gem_object *obj = intel_fb_obj(fb);
 	struct drm_i915_gem_object *old_obj = intel_fb_obj(old_fb);
 	int ret;
 
@@ -2808,9 +2985,9 @@ intel_pipe_set_base(struct drm_crtc *crtc, int x, int y,
 	}
 
 	mutex_lock(&dev->struct_mutex);
-	ret = intel_pin_and_fence_fb_obj(dev, obj, NULL);
+	ret = intel_pin_and_fence_fb_obj(crtc->primary, fb, NULL);
 	if (ret == 0)
-		i915_gem_track_fb(old_obj, obj,
+		i915_gem_track_fb(old_obj, intel_fb_obj(fb),
 				  INTEL_FRONTBUFFER_PRIMARY(pipe));
 	mutex_unlock(&dev->struct_mutex);
 	if (ret != 0) {
@@ -2818,37 +2995,6 @@ intel_pipe_set_base(struct drm_crtc *crtc, int x, int y,
 		return ret;
 	}
 
-	/*
-	 * Update pipe size and adjust fitter if needed: the reason for this is
-	 * that in compute_mode_changes we check the native mode (not the pfit
-	 * mode) to see if we can flip rather than do a full mode set. In the
-	 * fastboot case, we'll flip, but if we don't update the pipesrc and
-	 * pfit state, we'll end up with a big fb scanned out into the wrong
-	 * sized surface.
-	 *
-	 * To fix this properly, we need to hoist the checks up into
-	 * compute_mode_changes (or above), check the actual pfit state and
-	 * whether the platform allows pfit disable with pipe active, and only
-	 * then update the pipesrc and pfit state, even on the flip path.
-	 */
-	if (i915.fastboot) {
-		const struct drm_display_mode *adjusted_mode =
-			&intel_crtc->config.adjusted_mode;
-
-		I915_WRITE(PIPESRC(intel_crtc->pipe),
-			   ((adjusted_mode->crtc_hdisplay - 1) << 16) |
-			   (adjusted_mode->crtc_vdisplay - 1));
-		if (!intel_crtc->config.pch_pfit.enabled &&
-		    (intel_pipe_has_type(crtc, INTEL_OUTPUT_LVDS) ||
-		     intel_pipe_has_type(crtc, INTEL_OUTPUT_EDP))) {
-			I915_WRITE(PF_CTL(intel_crtc->pipe), 0);
-			I915_WRITE(PF_WIN_POS(intel_crtc->pipe), 0);
-			I915_WRITE(PF_WIN_SZ(intel_crtc->pipe), 0);
-		}
-		intel_crtc->config.pipe_src_w = adjusted_mode->crtc_hdisplay;
-		intel_crtc->config.pipe_src_h = adjusted_mode->crtc_vdisplay;
-	}
-
 	dev_priv->display.update_primary_plane(crtc, fb, x, y);
 
 	if (intel_crtc->active)
@@ -3472,14 +3618,13 @@ void intel_crtc_wait_for_pending_flips(struct drm_crtc *crtc)
 				       !intel_crtc_has_pending_flip(crtc),
 				       60*HZ) == 0)) {
 		struct intel_crtc *intel_crtc = to_intel_crtc(crtc);
-		unsigned long flags;
 
-		spin_lock_irqsave(&dev->event_lock, flags);
+		spin_lock_irq(&dev->event_lock);
 		if (intel_crtc->unpin_work) {
 			WARN_ONCE(1, "Removing stuck page flip\n");
 			page_flip_completed(intel_crtc);
 		}
-		spin_unlock_irqrestore(&dev->event_lock, flags);
+		spin_unlock_irq(&dev->event_lock);
 	}
 
 	if (crtc->primary->fb) {
@@ -3704,9 +3849,7 @@ static void ironlake_pch_enable(struct drm_crtc *crtc)
 	intel_fdi_normal_train(crtc);
 
 	/* For PCH DP, enable TRANS_DP_CTL */
-	if (HAS_PCH_CPT(dev) &&
-	    (intel_pipe_has_type(crtc, INTEL_OUTPUT_DISPLAYPORT) ||
-	     intel_pipe_has_type(crtc, INTEL_OUTPUT_EDP))) {
+	if (HAS_PCH_CPT(dev) && intel_crtc->config.has_dp_encoder) {
 		u32 bpc = (I915_READ(PIPECONF(pipe)) & PIPECONF_BPC_MASK) >> 5;
 		reg = TRANS_DP_CTL(pipe);
 		temp = I915_READ(reg);
@@ -3766,12 +3909,13 @@ void intel_put_shared_dpll(struct intel_crtc *crtc)
 	if (pll == NULL)
 		return;
 
-	if (pll->refcount == 0) {
-		WARN(1, "bad %s refcount\n", pll->name);
+	if (!(pll->config.crtc_mask & (1 << crtc->pipe))) {
+		WARN(1, "bad %s crtc mask\n", pll->name);
 		return;
 	}
 
-	if (--pll->refcount == 0) {
+	pll->config.crtc_mask &= ~(1 << crtc->pipe);
+	if (pll->config.crtc_mask == 0) {
 		WARN_ON(pll->on);
 		WARN_ON(pll->active);
 	}
@@ -3782,15 +3926,9 @@ void intel_put_shared_dpll(struct intel_crtc *crtc)
 struct intel_shared_dpll *intel_get_shared_dpll(struct intel_crtc *crtc)
 {
 	struct drm_i915_private *dev_priv = crtc->base.dev->dev_private;
-	struct intel_shared_dpll *pll = intel_crtc_to_shared_dpll(crtc);
+	struct intel_shared_dpll *pll;
 	enum intel_dpll_id i;
 
-	if (pll) {
-		DRM_DEBUG_KMS("CRTC:%d dropping existing %s\n",
-			      crtc->base.base.id, pll->name);
-		intel_put_shared_dpll(crtc);
-	}
-
 	if (HAS_PCH_IBX(dev_priv->dev)) {
 		/* Ironlake PCH has a fixed PLL->PCH pipe mapping. */
 		i = (enum intel_dpll_id) crtc->pipe;
@@ -3799,7 +3937,7 @@ struct intel_shared_dpll *intel_get_shared_dpll(struct intel_crtc *crtc)
 		DRM_DEBUG_KMS("CRTC:%d using pre-allocated %s\n",
 			      crtc->base.base.id, pll->name);
 
-		WARN_ON(pll->refcount);
+		WARN_ON(pll->new_config->crtc_mask);
 
 		goto found;
 	}
@@ -3808,15 +3946,16 @@ struct intel_shared_dpll *intel_get_shared_dpll(struct intel_crtc *crtc)
 		pll = &dev_priv->shared_dplls[i];
 
 		/* Only want to check enabled timings first */
-		if (pll->refcount == 0)
+		if (pll->new_config->crtc_mask == 0)
 			continue;
 
-		if (memcmp(&crtc->config.dpll_hw_state, &pll->hw_state,
-			   sizeof(pll->hw_state)) == 0) {
-			DRM_DEBUG_KMS("CRTC:%d sharing existing %s (refcount %d, ative %d)\n",
-				      crtc->base.base.id,
-				      pll->name, pll->refcount, pll->active);
-
+		if (memcmp(&crtc->new_config->dpll_hw_state,
+			   &pll->new_config->hw_state,
+			   sizeof(pll->new_config->hw_state)) == 0) {
+			DRM_DEBUG_KMS("CRTC:%d sharing existing %s (crtc mask 0x%08x, active %d)\n",
+				      crtc->base.base.id, pll->name,
+				      pll->new_config->crtc_mask,
+				      pll->active);
 			goto found;
 		}
 	}
@@ -3824,7 +3963,7 @@ struct intel_shared_dpll *intel_get_shared_dpll(struct intel_crtc *crtc)
 	/* Ok no matching timings, maybe there's a free one? */
 	for (i = 0; i < dev_priv->num_shared_dpll; i++) {
 		pll = &dev_priv->shared_dplls[i];
-		if (pll->refcount == 0) {
+		if (pll->new_config->crtc_mask == 0) {
 			DRM_DEBUG_KMS("CRTC:%d allocated %s\n",
 				      crtc->base.base.id, pll->name);
 			goto found;
@@ -3834,18 +3973,86 @@ struct intel_shared_dpll *intel_get_shared_dpll(struct intel_crtc *crtc)
 	return NULL;
 
 found:
-	if (pll->refcount == 0)
-		pll->hw_state = crtc->config.dpll_hw_state;
+	if (pll->new_config->crtc_mask == 0)
+		pll->new_config->hw_state = crtc->new_config->dpll_hw_state;
 
-	crtc->config.shared_dpll = i;
+	crtc->new_config->shared_dpll = i;
 	DRM_DEBUG_DRIVER("using %s for pipe %c\n", pll->name,
 			 pipe_name(crtc->pipe));
 
-	pll->refcount++;
+	pll->new_config->crtc_mask |= 1 << crtc->pipe;
 
 	return pll;
 }
 
+/**
+ * intel_shared_dpll_start_config - start a new PLL staged config
+ * @dev_priv: DRM device
+ * @clear_pipes: mask of pipes that will have their PLLs freed
+ *
+ * Starts a new PLL staged config, copying the current config but
+ * releasing the references of pipes specified in clear_pipes.
+ */
+static int intel_shared_dpll_start_config(struct drm_i915_private *dev_priv,
+					  unsigned clear_pipes)
+{
+	struct intel_shared_dpll *pll;
+	enum intel_dpll_id i;
+
+	for (i = 0; i < dev_priv->num_shared_dpll; i++) {
+		pll = &dev_priv->shared_dplls[i];
+
+		pll->new_config = kmemdup(&pll->config, sizeof pll->config,
+					  GFP_KERNEL);
+		if (!pll->new_config)
+			goto cleanup;
+
+		pll->new_config->crtc_mask &= ~clear_pipes;
+	}
+
+	return 0;
+
+cleanup:
+	while (--i >= 0) {
+		pll = &dev_priv->shared_dplls[i];
+		kfree(pll->new_config);
+		pll->new_config = NULL;
+	}
+
+	return -ENOMEM;
+}
+
+static void intel_shared_dpll_commit(struct drm_i915_private *dev_priv)
+{
+	struct intel_shared_dpll *pll;
+	enum intel_dpll_id i;
+
+	for (i = 0; i < dev_priv->num_shared_dpll; i++) {
+		pll = &dev_priv->shared_dplls[i];
+
+		WARN_ON(pll->new_config == &pll->config);
+
+		pll->config = *pll->new_config;
+		kfree(pll->new_config);
+		pll->new_config = NULL;
+	}
+}
+
+static void intel_shared_dpll_abort_config(struct drm_i915_private *dev_priv)
+{
+	struct intel_shared_dpll *pll;
+	enum intel_dpll_id i;
+
+	for (i = 0; i < dev_priv->num_shared_dpll; i++) {
+		pll = &dev_priv->shared_dplls[i];
+
+		WARN_ON(pll->new_config == &pll->config);
+
+		kfree(pll->new_config);
+		pll->new_config = NULL;
+	}
+}
+
 static void cpt_verify_modeset(struct drm_device *dev, int pipe)
 {
 	struct drm_i915_private *dev_priv = dev->dev_private;
@@ -3860,6 +4067,19 @@ static void cpt_verify_modeset(struct drm_device *dev, int pipe)
 	}
 }
 
+static void skylake_pfit_enable(struct intel_crtc *crtc)
+{
+	struct drm_device *dev = crtc->base.dev;
+	struct drm_i915_private *dev_priv = dev->dev_private;
+	int pipe = crtc->pipe;
+
+	if (crtc->config.pch_pfit.enabled) {
+		I915_WRITE(PS_CTL(pipe), PS_ENABLE);
+		I915_WRITE(PS_WIN_POS(pipe), crtc->config.pch_pfit.pos);
+		I915_WRITE(PS_WIN_SZ(pipe), crtc->config.pch_pfit.size);
+	}
+}
+
 static void ironlake_pfit_enable(struct intel_crtc *crtc)
 {
 	struct drm_device *dev = crtc->base.dev;
@@ -3983,7 +4203,7 @@ static void intel_crtc_load_lut(struct drm_crtc *crtc)
 		return;
 
 	if (!HAS_PCH_SPLIT(dev_priv->dev)) {
-		if (intel_pipe_has_type(crtc, INTEL_OUTPUT_DSI))
+		if (intel_pipe_has_type(intel_crtc, INTEL_OUTPUT_DSI))
 			assert_dsi_pll_enabled(dev_priv);
 		else
 			assert_pll_enabled(dev_priv, pipe);
@@ -4038,10 +4258,6 @@ static void intel_crtc_enable_planes(struct drm_crtc *crtc)
 	struct intel_crtc *intel_crtc = to_intel_crtc(crtc);
 	int pipe = intel_crtc->pipe;
 
-	assert_vblank_disabled(crtc);
-
-	drm_vblank_on(dev, pipe);
-
 	intel_enable_primary_hw_plane(crtc->primary, crtc);
 	intel_enable_planes(crtc);
 	intel_crtc_update_cursor(crtc, true);
@@ -4087,10 +4303,6 @@ static void intel_crtc_disable_planes(struct drm_crtc *crtc)
 	 * consider this a flip to a NULL plane.
 	 */
 	intel_frontbuffer_flip(dev, INTEL_FRONTBUFFER_ALL_MASK(pipe));
-
-	drm_vblank_off(dev, pipe);
-
-	assert_vblank_disabled(crtc);
 }
 
 static void ironlake_crtc_enable(struct drm_crtc *crtc)
@@ -4123,8 +4335,8 @@ static void ironlake_crtc_enable(struct drm_crtc *crtc)
 
 	intel_crtc->active = true;
 
-	intel_set_cpu_fifo_underrun_reporting(dev, pipe, true);
-	intel_set_pch_fifo_underrun_reporting(dev, pipe, true);
+	intel_set_cpu_fifo_underrun_reporting(dev_priv, pipe, true);
+	intel_set_pch_fifo_underrun_reporting(dev_priv, pipe, true);
 
 	for_each_encoder_on_crtc(dev, crtc, encoder)
 		if (encoder->pre_enable)
@@ -4160,6 +4372,9 @@ static void ironlake_crtc_enable(struct drm_crtc *crtc)
 	if (HAS_PCH_CPT(dev))
 		cpt_verify_modeset(dev, intel_crtc->pipe);
 
+	assert_vblank_disabled(crtc);
+	drm_crtc_vblank_on(crtc);
+
 	intel_crtc_enable_planes(crtc);
 }
 
@@ -4235,19 +4450,23 @@ static void haswell_crtc_enable(struct drm_crtc *crtc)
 
 	intel_crtc->active = true;
 
-	intel_set_cpu_fifo_underrun_reporting(dev, pipe, true);
+	intel_set_cpu_fifo_underrun_reporting(dev_priv, pipe, true);
 	for_each_encoder_on_crtc(dev, crtc, encoder)
 		if (encoder->pre_enable)
 			encoder->pre_enable(encoder);
 
 	if (intel_crtc->config.has_pch_encoder) {
-		intel_set_pch_fifo_underrun_reporting(dev, TRANSCODER_A, true);
+		intel_set_pch_fifo_underrun_reporting(dev_priv, TRANSCODER_A,
+						      true);
 		dev_priv->display.fdi_link_train(crtc);
 	}
 
 	intel_ddi_enable_pipe_clock(intel_crtc);
 
-	ironlake_pfit_enable(intel_crtc);
+	if (IS_SKYLAKE(dev))
+		skylake_pfit_enable(intel_crtc);
+	else
+		ironlake_pfit_enable(intel_crtc);
 
 	/*
 	 * On ILK+ LUT must be loaded before the pipe is running but with
@@ -4272,12 +4491,30 @@ static void haswell_crtc_enable(struct drm_crtc *crtc)
 		intel_opregion_notify_encoder(encoder, true);
 	}
 
+	assert_vblank_disabled(crtc);
+	drm_crtc_vblank_on(crtc);
+
 	/* If we change the relative order between pipe/planes enabling, we need
 	 * to change the workaround. */
 	haswell_mode_set_planes_workaround(intel_crtc);
 	intel_crtc_enable_planes(crtc);
 }
 
+static void skylake_pfit_disable(struct intel_crtc *crtc)
+{
+	struct drm_device *dev = crtc->base.dev;
+	struct drm_i915_private *dev_priv = dev->dev_private;
+	int pipe = crtc->pipe;
+
+	/* To avoid upsetting the power well on haswell only disable the pfit if
+	 * it's in use. The hw state code will make sure we get this right. */
+	if (crtc->config.pch_pfit.enabled) {
+		I915_WRITE(PS_CTL(pipe), 0);
+		I915_WRITE(PS_WIN_POS(pipe), 0);
+		I915_WRITE(PS_WIN_SZ(pipe), 0);
+	}
+}
+
 static void ironlake_pfit_disable(struct intel_crtc *crtc)
 {
 	struct drm_device *dev = crtc->base.dev;
@@ -4307,11 +4544,14 @@ static void ironlake_crtc_disable(struct drm_crtc *crtc)
 
 	intel_crtc_disable_planes(crtc);
 
+	drm_crtc_vblank_off(crtc);
+	assert_vblank_disabled(crtc);
+
 	for_each_encoder_on_crtc(dev, crtc, encoder)
 		encoder->disable(encoder);
 
 	if (intel_crtc->config.has_pch_encoder)
-		intel_set_pch_fifo_underrun_reporting(dev, pipe, false);
+		intel_set_pch_fifo_underrun_reporting(dev_priv, pipe, false);
 
 	intel_disable_pipe(intel_crtc);
 
@@ -4368,13 +4608,17 @@ static void haswell_crtc_disable(struct drm_crtc *crtc)
 
 	intel_crtc_disable_planes(crtc);
 
+	drm_crtc_vblank_off(crtc);
+	assert_vblank_disabled(crtc);
+
 	for_each_encoder_on_crtc(dev, crtc, encoder) {
 		intel_opregion_notify_encoder(encoder, false);
 		encoder->disable(encoder);
 	}
 
 	if (intel_crtc->config.has_pch_encoder)
-		intel_set_pch_fifo_underrun_reporting(dev, TRANSCODER_A, false);
+		intel_set_pch_fifo_underrun_reporting(dev_priv, TRANSCODER_A,
+						      false);
 	intel_disable_pipe(intel_crtc);
 
 	if (intel_crtc->config.dp_encoder_is_mst)
@@ -4382,7 +4626,10 @@ static void haswell_crtc_disable(struct drm_crtc *crtc)
 
 	intel_ddi_disable_transcoder_func(dev_priv, cpu_transcoder);
 
-	ironlake_pfit_disable(intel_crtc);
+	if (IS_SKYLAKE(dev))
+		skylake_pfit_disable(intel_crtc);
+	else
+		ironlake_pfit_disable(intel_crtc);
 
 	intel_ddi_disable_pipe_clock(intel_crtc);
 
@@ -4508,20 +4755,6 @@ static unsigned long get_crtc_power_domains(struct drm_crtc *crtc)
 	return mask;
 }
 
-void intel_display_set_init_power(struct drm_i915_private *dev_priv,
-				  bool enable)
-{
-	if (dev_priv->power_domains.init_power_on == enable)
-		return;
-
-	if (enable)
-		intel_display_power_get(dev_priv, POWER_DOMAIN_INIT);
-	else
-		intel_display_power_put(dev_priv, POWER_DOMAIN_INIT);
-
-	dev_priv->power_domains.init_power_on = enable;
-}
-
 static void modeset_update_crtc_power_domains(struct drm_device *dev)
 {
 	struct drm_i915_private *dev_priv = dev->dev_private;
@@ -4544,6 +4777,9 @@ static void modeset_update_crtc_power_domains(struct drm_device *dev)
 			intel_display_power_get(dev_priv, domain);
 	}
 
+	if (dev_priv->display.modeset_global_resources)
+		dev_priv->display.modeset_global_resources(dev);
+
 	for_each_intel_crtc(dev, crtc) {
 		enum intel_display_power_domain domain;
 
@@ -4575,7 +4811,7 @@ static void vlv_update_cdclk(struct drm_device *dev)
 	struct drm_i915_private *dev_priv = dev->dev_private;
 
 	dev_priv->vlv_cdclk_freq = dev_priv->display.get_display_clock_speed(dev);
-	DRM_DEBUG_DRIVER("Current CD clock rate: %d kHz",
+	DRM_DEBUG_DRIVER("Current CD clock rate: %d kHz\n",
 			 dev_priv->vlv_cdclk_freq);
 
 	/*
@@ -4614,10 +4850,9 @@ static void valleyview_set_cdclk(struct drm_device *dev, int cdclk)
 	mutex_unlock(&dev_priv->rps.hw_lock);
 
 	if (cdclk == 400000) {
-		u32 divider, vco;
+		u32 divider;
 
-		vco = valleyview_get_vco(dev_priv);
-		divider = DIV_ROUND_CLOSEST(vco << 1, cdclk) - 1;
+		divider = DIV_ROUND_CLOSEST(dev_priv->hpll_freq << 1, cdclk) - 1;
 
 		mutex_lock(&dev_priv->dpio_lock);
 		/* adjust cdclk divider */
@@ -4696,8 +4931,7 @@ static void cherryview_set_cdclk(struct drm_device *dev, int cdclk)
 static int valleyview_calc_cdclk(struct drm_i915_private *dev_priv,
 				 int max_pixclk)
 {
-	int vco = valleyview_get_vco(dev_priv);
-	int freq_320 = (vco <<  1) % 320000 != 0 ? 333333 : 320000;
+	int freq_320 = (dev_priv->hpll_freq <<  1) % 320000 != 0 ? 333333 : 320000;
 
 	/* FIXME: Punit isn't quite ready yet */
 	if (IS_CHERRYVIEW(dev_priv->dev))
@@ -4766,18 +5000,30 @@ static void valleyview_modeset_global_resources(struct drm_device *dev)
 	int req_cdclk = valleyview_calc_cdclk(dev_priv, max_pixclk);
 
 	if (req_cdclk != dev_priv->vlv_cdclk_freq) {
+		/*
+		 * FIXME: We can end up here with all power domains off, yet
+		 * with a CDCLK frequency other than the minimum. To account
+		 * for this take the PIPE-A power domain, which covers the HW
+		 * blocks needed for the following programming. This can be
+		 * removed once it's guaranteed that we get here either with
+		 * the minimum CDCLK set, or the required power domains
+		 * enabled.
+		 */
+		intel_display_power_get(dev_priv, POWER_DOMAIN_PIPE_A);
+
 		if (IS_CHERRYVIEW(dev))
 			cherryview_set_cdclk(dev, req_cdclk);
 		else
 			valleyview_set_cdclk(dev, req_cdclk);
-	}
 
-	modeset_update_crtc_power_domains(dev);
+		intel_display_power_put(dev_priv, POWER_DOMAIN_PIPE_A);
+	}
 }
 
 static void valleyview_crtc_enable(struct drm_crtc *crtc)
 {
 	struct drm_device *dev = crtc->dev;
+	struct drm_i915_private *dev_priv = to_i915(dev);
 	struct intel_crtc *intel_crtc = to_intel_crtc(crtc);
 	struct intel_encoder *encoder;
 	int pipe = intel_crtc->pipe;
@@ -4788,13 +5034,13 @@ static void valleyview_crtc_enable(struct drm_crtc *crtc)
 	if (intel_crtc->active)
 		return;
 
-	is_dsi = intel_pipe_has_type(crtc, INTEL_OUTPUT_DSI);
+	is_dsi = intel_pipe_has_type(intel_crtc, INTEL_OUTPUT_DSI);
 
 	if (!is_dsi) {
 		if (IS_CHERRYVIEW(dev))
-			chv_prepare_pll(intel_crtc);
+			chv_prepare_pll(intel_crtc, &intel_crtc->config);
 		else
-			vlv_prepare_pll(intel_crtc);
+			vlv_prepare_pll(intel_crtc, &intel_crtc->config);
 	}
 
 	if (intel_crtc->config.has_dp_encoder)
@@ -4802,11 +5048,18 @@ static void valleyview_crtc_enable(struct drm_crtc *crtc)
 
 	intel_set_pipe_timings(intel_crtc);
 
+	if (IS_CHERRYVIEW(dev) && pipe == PIPE_B) {
+		struct drm_i915_private *dev_priv = dev->dev_private;
+
+		I915_WRITE(CHV_BLEND(pipe), CHV_BLEND_LEGACY);
+		I915_WRITE(CHV_CANVAS(pipe), 0);
+	}
+
 	i9xx_set_pipeconf(intel_crtc);
 
 	intel_crtc->active = true;
 
-	intel_set_cpu_fifo_underrun_reporting(dev, pipe, true);
+	intel_set_cpu_fifo_underrun_reporting(dev_priv, pipe, true);
 
 	for_each_encoder_on_crtc(dev, crtc, encoder)
 		if (encoder->pre_pll_enable)
@@ -4814,9 +5067,9 @@ static void valleyview_crtc_enable(struct drm_crtc *crtc)
 
 	if (!is_dsi) {
 		if (IS_CHERRYVIEW(dev))
-			chv_enable_pll(intel_crtc);
+			chv_enable_pll(intel_crtc, &intel_crtc->config);
 		else
-			vlv_enable_pll(intel_crtc);
+			vlv_enable_pll(intel_crtc, &intel_crtc->config);
 	}
 
 	for_each_encoder_on_crtc(dev, crtc, encoder)
@@ -4833,10 +5086,13 @@ static void valleyview_crtc_enable(struct drm_crtc *crtc)
 	for_each_encoder_on_crtc(dev, crtc, encoder)
 		encoder->enable(encoder);
 
+	assert_vblank_disabled(crtc);
+	drm_crtc_vblank_on(crtc);
+
 	intel_crtc_enable_planes(crtc);
 
 	/* Underruns don't raise interrupts, so check manually. */
-	i9xx_check_fifo_underruns(dev);
+	i9xx_check_fifo_underruns(dev_priv);
 }
 
 static void i9xx_set_pll_dividers(struct intel_crtc *crtc)
@@ -4851,6 +5107,7 @@ static void i9xx_set_pll_dividers(struct intel_crtc *crtc)
 static void i9xx_crtc_enable(struct drm_crtc *crtc)
 {
 	struct drm_device *dev = crtc->dev;
+	struct drm_i915_private *dev_priv = to_i915(dev);
 	struct intel_crtc *intel_crtc = to_intel_crtc(crtc);
 	struct intel_encoder *encoder;
 	int pipe = intel_crtc->pipe;
@@ -4872,7 +5129,7 @@ static void i9xx_crtc_enable(struct drm_crtc *crtc)
 	intel_crtc->active = true;
 
 	if (!IS_GEN2(dev))
-		intel_set_cpu_fifo_underrun_reporting(dev, pipe, true);
+		intel_set_cpu_fifo_underrun_reporting(dev_priv, pipe, true);
 
 	for_each_encoder_on_crtc(dev, crtc, encoder)
 		if (encoder->pre_enable)
@@ -4890,6 +5147,9 @@ static void i9xx_crtc_enable(struct drm_crtc *crtc)
 	for_each_encoder_on_crtc(dev, crtc, encoder)
 		encoder->enable(encoder);
 
+	assert_vblank_disabled(crtc);
+	drm_crtc_vblank_on(crtc);
+
 	intel_crtc_enable_planes(crtc);
 
 	/*
@@ -4900,10 +5160,10 @@ static void i9xx_crtc_enable(struct drm_crtc *crtc)
 	 * but leave the pipe running.
 	 */
 	if (IS_GEN2(dev))
-		intel_set_cpu_fifo_underrun_reporting(dev, pipe, true);
+		intel_set_cpu_fifo_underrun_reporting(dev_priv, pipe, true);
 
 	/* Underruns don't raise interrupts, so check manually. */
-	i9xx_check_fifo_underruns(dev);
+	i9xx_check_fifo_underruns(dev_priv);
 }
 
 static void i9xx_pfit_disable(struct intel_crtc *crtc)
@@ -4939,7 +5199,7 @@ static void i9xx_crtc_disable(struct drm_crtc *crtc)
 	 * but leave the pipe running.
 	 */
 	if (IS_GEN2(dev))
-		intel_set_cpu_fifo_underrun_reporting(dev, pipe, false);
+		intel_set_cpu_fifo_underrun_reporting(dev_priv, pipe, false);
 
 	/*
 	 * Vblank time updates from the shadow to live plane control register
@@ -4953,9 +5213,6 @@ static void i9xx_crtc_disable(struct drm_crtc *crtc)
 	intel_set_memory_cxsr(dev_priv, false);
 	intel_crtc_disable_planes(crtc);
 
-	for_each_encoder_on_crtc(dev, crtc, encoder)
-		encoder->disable(encoder);
-
 	/*
 	 * On gen2 planes are double buffered but the pipe isn't, so we must
 	 * wait for planes to fully turn off before disabling the pipe.
@@ -4964,6 +5221,12 @@ static void i9xx_crtc_disable(struct drm_crtc *crtc)
 	 */
 	intel_wait_for_vblank(dev, pipe);
 
+	drm_crtc_vblank_off(crtc);
+	assert_vblank_disabled(crtc);
+
+	for_each_encoder_on_crtc(dev, crtc, encoder)
+		encoder->disable(encoder);
+
 	intel_disable_pipe(intel_crtc);
 
 	i9xx_pfit_disable(intel_crtc);
@@ -4972,7 +5235,7 @@ static void i9xx_crtc_disable(struct drm_crtc *crtc)
 		if (encoder->post_disable)
 			encoder->post_disable(encoder);
 
-	if (!intel_pipe_has_type(crtc, INTEL_OUTPUT_DSI)) {
+	if (!intel_pipe_has_type(intel_crtc, INTEL_OUTPUT_DSI)) {
 		if (IS_CHERRYVIEW(dev))
 			chv_disable_pll(dev_priv, pipe);
 		else if (IS_VALLEYVIEW(dev))
@@ -4982,7 +5245,7 @@ static void i9xx_crtc_disable(struct drm_crtc *crtc)
 	}
 
 	if (!IS_GEN2(dev))
-		intel_set_cpu_fifo_underrun_reporting(dev, pipe, false);
+		intel_set_cpu_fifo_underrun_reporting(dev_priv, pipe, false);
 
 	intel_crtc->active = false;
 	intel_update_watermarks(crtc);
@@ -4996,36 +5259,6 @@ static void i9xx_crtc_off(struct drm_crtc *crtc)
 {
 }
 
-static void intel_crtc_update_sarea(struct drm_crtc *crtc,
-				    bool enabled)
-{
-	struct drm_device *dev = crtc->dev;
-	struct drm_i915_master_private *master_priv;
-	struct intel_crtc *intel_crtc = to_intel_crtc(crtc);
-	int pipe = intel_crtc->pipe;
-
-	if (!dev->primary->master)
-		return;
-
-	master_priv = dev->primary->master->driver_priv;
-	if (!master_priv->sarea_priv)
-		return;
-
-	switch (pipe) {
-	case 0:
-		master_priv->sarea_priv->pipeA_w = enabled ? crtc->mode.hdisplay : 0;
-		master_priv->sarea_priv->pipeA_h = enabled ? crtc->mode.vdisplay : 0;
-		break;
-	case 1:
-		master_priv->sarea_priv->pipeB_w = enabled ? crtc->mode.hdisplay : 0;
-		master_priv->sarea_priv->pipeB_h = enabled ? crtc->mode.vdisplay : 0;
-		break;
-	default:
-		DRM_ERROR("Can't update pipe %c in SAREA\n", pipe_name(pipe));
-		break;
-	}
-}
-
 /* Master function to enable/disable CRTC and corresponding power wells */
 void intel_crtc_control(struct drm_crtc *crtc, bool enable)
 {
@@ -5069,8 +5302,6 @@ void intel_crtc_update_dpms(struct drm_crtc *crtc)
 		enable |= intel_encoder->connectors_active;
 
 	intel_crtc_control(crtc, enable);
-
-	intel_crtc_update_sarea(crtc, enable);
 }
 
 static void intel_crtc_disable(struct drm_crtc *crtc)
@@ -5085,7 +5316,6 @@ static void intel_crtc_disable(struct drm_crtc *crtc)
 	WARN_ON(!crtc->enabled);
 
 	dev_priv->display.crtc_disable(crtc);
-	intel_crtc_update_sarea(crtc, false);
 	dev_priv->display.off(crtc);
 
 	if (crtc->primary->fb) {
@@ -5324,11 +5554,11 @@ static int intel_crtc_compute_config(struct intel_crtc *crtc,
 				     struct intel_crtc_config *pipe_config)
 {
 	struct drm_device *dev = crtc->base.dev;
+	struct drm_i915_private *dev_priv = dev->dev_private;
 	struct drm_display_mode *adjusted_mode = &pipe_config->adjusted_mode;
 
 	/* FIXME should check pixel clock limits on all platforms */
 	if (INTEL_INFO(dev)->gen < 4) {
-		struct drm_i915_private *dev_priv = dev->dev_private;
 		int clock_limit =
 			dev_priv->display.get_display_clock_speed(dev);
 
@@ -5355,7 +5585,7 @@ static int intel_crtc_compute_config(struct intel_crtc *crtc,
 	 * - LVDS dual channel mode
 	 * - Double wide pipe
 	 */
-	if ((intel_pipe_has_type(&crtc->base, INTEL_OUTPUT_LVDS) &&
+	if ((intel_pipe_has_type(crtc, INTEL_OUTPUT_LVDS) &&
 	     intel_is_dual_link_lvds(dev)) || pipe_config->double_wide)
 		pipe_config->pipe_src_w &= ~1;
 
@@ -5377,13 +5607,6 @@ static int intel_crtc_compute_config(struct intel_crtc *crtc,
 	if (HAS_IPS(dev))
 		hsw_compute_ips_config(crtc, pipe_config);
 
-	/*
-	 * XXX: PCH/WRPLL clock sharing is done in ->mode_set, so make sure the
-	 * old clock survives for now.
-	 */
-	if (HAS_PCH_IBX(dev) || HAS_PCH_CPT(dev) || HAS_DDI(dev))
-		pipe_config->shared_dpll = crtc->config.shared_dpll;
-
 	if (pipe_config->has_pch_encoder)
 		return ironlake_fdi_compute_config(crtc, pipe_config);
 
@@ -5393,7 +5616,6 @@ static int intel_crtc_compute_config(struct intel_crtc *crtc,
 static int valleyview_get_display_clock_speed(struct drm_device *dev)
 {
 	struct drm_i915_private *dev_priv = dev->dev_private;
-	int vco = valleyview_get_vco(dev_priv);
 	u32 val;
 	int divider;
 
@@ -5401,6 +5623,9 @@ static int valleyview_get_display_clock_speed(struct drm_device *dev)
 	if (IS_CHERRYVIEW(dev))
 		return 400000;
 
+	if (dev_priv->hpll_freq == 0)
+		dev_priv->hpll_freq = valleyview_get_vco(dev_priv);
+
 	mutex_lock(&dev_priv->dpio_lock);
 	val = vlv_cck_read(dev_priv, CCK_DISPLAY_CLOCK_CONTROL);
 	mutex_unlock(&dev_priv->dpio_lock);
@@ -5411,7 +5636,7 @@ static int valleyview_get_display_clock_speed(struct drm_device *dev)
 	     (divider << DISPLAY_FREQUENCY_STATUS_SHIFT),
 	     "cdclk change in progress\n");
 
-	return DIV_ROUND_CLOSEST(vco << 1, divider + 1);
+	return DIV_ROUND_CLOSEST(dev_priv->hpll_freq << 1, divider + 1);
 }
 
 static int i945_get_display_clock_speed(struct drm_device *dev)
@@ -5543,15 +5768,15 @@ static inline bool intel_panel_use_ssc(struct drm_i915_private *dev_priv)
 		&& !(dev_priv->quirks & QUIRK_LVDS_SSC_DISABLE);
 }
 
-static int i9xx_get_refclk(struct drm_crtc *crtc, int num_connectors)
+static int i9xx_get_refclk(struct intel_crtc *crtc, int num_connectors)
 {
-	struct drm_device *dev = crtc->dev;
+	struct drm_device *dev = crtc->base.dev;
 	struct drm_i915_private *dev_priv = dev->dev_private;
 	int refclk;
 
 	if (IS_VALLEYVIEW(dev)) {
 		refclk = 100000;
-	} else if (intel_pipe_has_type(crtc, INTEL_OUTPUT_LVDS) &&
+	} else if (intel_pipe_will_have_type(crtc, INTEL_OUTPUT_LVDS) &&
 	    intel_panel_use_ssc(dev_priv) && num_connectors < 2) {
 		refclk = dev_priv->vbt.lvds_ssc_freq;
 		DRM_DEBUG_KMS("using SSC reference clock of %d kHz\n", refclk);
@@ -5581,24 +5806,24 @@ static void i9xx_update_pll_dividers(struct intel_crtc *crtc,
 	u32 fp, fp2 = 0;
 
 	if (IS_PINEVIEW(dev)) {
-		fp = pnv_dpll_compute_fp(&crtc->config.dpll);
+		fp = pnv_dpll_compute_fp(&crtc->new_config->dpll);
 		if (reduced_clock)
 			fp2 = pnv_dpll_compute_fp(reduced_clock);
 	} else {
-		fp = i9xx_dpll_compute_fp(&crtc->config.dpll);
+		fp = i9xx_dpll_compute_fp(&crtc->new_config->dpll);
 		if (reduced_clock)
 			fp2 = i9xx_dpll_compute_fp(reduced_clock);
 	}
 
-	crtc->config.dpll_hw_state.fp0 = fp;
+	crtc->new_config->dpll_hw_state.fp0 = fp;
 
 	crtc->lowfreq_avail = false;
-	if (intel_pipe_has_type(&crtc->base, INTEL_OUTPUT_LVDS) &&
+	if (intel_pipe_will_have_type(crtc, INTEL_OUTPUT_LVDS) &&
 	    reduced_clock && i915.powersave) {
-		crtc->config.dpll_hw_state.fp1 = fp2;
+		crtc->new_config->dpll_hw_state.fp1 = fp2;
 		crtc->lowfreq_avail = true;
 	} else {
-		crtc->config.dpll_hw_state.fp1 = fp;
+		crtc->new_config->dpll_hw_state.fp1 = fp;
 	}
 }
 
@@ -5687,7 +5912,8 @@ void intel_dp_set_m_n(struct intel_crtc *crtc)
 						   &crtc->config.dp_m2_n2);
 }
 
-static void vlv_update_pll(struct intel_crtc *crtc)
+static void vlv_update_pll(struct intel_crtc *crtc,
+			   struct intel_crtc_config *pipe_config)
 {
 	u32 dpll, dpll_md;
 
@@ -5702,14 +5928,15 @@ static void vlv_update_pll(struct intel_crtc *crtc)
 	if (crtc->pipe == PIPE_B)
 		dpll |= DPLL_INTEGRATED_CRI_CLK_VLV;
 	dpll |= DPLL_VCO_ENABLE;
-	crtc->config.dpll_hw_state.dpll = dpll;
+	pipe_config->dpll_hw_state.dpll = dpll;
 
-	dpll_md = (crtc->config.pixel_multiplier - 1)
+	dpll_md = (pipe_config->pixel_multiplier - 1)
 		<< DPLL_MD_UDI_MULTIPLIER_SHIFT;
-	crtc->config.dpll_hw_state.dpll_md = dpll_md;
+	pipe_config->dpll_hw_state.dpll_md = dpll_md;
 }
 
-static void vlv_prepare_pll(struct intel_crtc *crtc)
+static void vlv_prepare_pll(struct intel_crtc *crtc,
+			    const struct intel_crtc_config *pipe_config)
 {
 	struct drm_device *dev = crtc->base.dev;
 	struct drm_i915_private *dev_priv = dev->dev_private;
@@ -5720,11 +5947,11 @@ static void vlv_prepare_pll(struct intel_crtc *crtc)
 
 	mutex_lock(&dev_priv->dpio_lock);
 
-	bestn = crtc->config.dpll.n;
-	bestm1 = crtc->config.dpll.m1;
-	bestm2 = crtc->config.dpll.m2;
-	bestp1 = crtc->config.dpll.p1;
-	bestp2 = crtc->config.dpll.p2;
+	bestn = pipe_config->dpll.n;
+	bestm1 = pipe_config->dpll.m1;
+	bestm2 = pipe_config->dpll.m2;
+	bestp1 = pipe_config->dpll.p1;
+	bestp2 = pipe_config->dpll.p2;
 
 	/* See eDP HDMI DPIO driver vbios notes doc */
 
@@ -5761,17 +5988,16 @@ static void vlv_prepare_pll(struct intel_crtc *crtc)
 	vlv_dpio_write(dev_priv, pipe, VLV_PLL_DW3(pipe), mdiv);
 
 	/* Set HBR and RBR LPF coefficients */
-	if (crtc->config.port_clock == 162000 ||
-	    intel_pipe_has_type(&crtc->base, INTEL_OUTPUT_ANALOG) ||
-	    intel_pipe_has_type(&crtc->base, INTEL_OUTPUT_HDMI))
+	if (pipe_config->port_clock == 162000 ||
+	    intel_pipe_has_type(crtc, INTEL_OUTPUT_ANALOG) ||
+	    intel_pipe_has_type(crtc, INTEL_OUTPUT_HDMI))
 		vlv_dpio_write(dev_priv, pipe, VLV_PLL_DW10(pipe),
 				 0x009f0003);
 	else
 		vlv_dpio_write(dev_priv, pipe, VLV_PLL_DW10(pipe),
 				 0x00d0000f);
 
-	if (intel_pipe_has_type(&crtc->base, INTEL_OUTPUT_EDP) ||
-	    intel_pipe_has_type(&crtc->base, INTEL_OUTPUT_DISPLAYPORT)) {
+	if (crtc->config.has_dp_encoder) {
 		/* Use SSC source */
 		if (pipe == PIPE_A)
 			vlv_dpio_write(dev_priv, pipe, VLV_PLL_DW5(pipe),
@@ -5791,8 +6017,8 @@ static void vlv_prepare_pll(struct intel_crtc *crtc)
 
 	coreclk = vlv_dpio_read(dev_priv, pipe, VLV_PLL_DW7(pipe));
 	coreclk = (coreclk & 0x0000ff00) | 0x01c00000;
-	if (intel_pipe_has_type(&crtc->base, INTEL_OUTPUT_DISPLAYPORT) ||
-	    intel_pipe_has_type(&crtc->base, INTEL_OUTPUT_EDP))
+	if (intel_pipe_has_type(crtc, INTEL_OUTPUT_DISPLAYPORT) ||
+	    intel_pipe_has_type(crtc, INTEL_OUTPUT_EDP))
 		coreclk |= 0x01000000;
 	vlv_dpio_write(dev_priv, pipe, VLV_PLL_DW7(pipe), coreclk);
 
@@ -5800,19 +6026,21 @@ static void vlv_prepare_pll(struct intel_crtc *crtc)
 	mutex_unlock(&dev_priv->dpio_lock);
 }
 
-static void chv_update_pll(struct intel_crtc *crtc)
+static void chv_update_pll(struct intel_crtc *crtc,
+			   struct intel_crtc_config *pipe_config)
 {
-	crtc->config.dpll_hw_state.dpll = DPLL_SSC_REF_CLOCK_CHV |
+	pipe_config->dpll_hw_state.dpll = DPLL_SSC_REF_CLOCK_CHV |
 		DPLL_REFA_CLK_ENABLE_VLV | DPLL_VGA_MODE_DIS |
 		DPLL_VCO_ENABLE;
 	if (crtc->pipe != PIPE_A)
-		crtc->config.dpll_hw_state.dpll |= DPLL_INTEGRATED_CRI_CLK_VLV;
+		pipe_config->dpll_hw_state.dpll |= DPLL_INTEGRATED_CRI_CLK_VLV;
 
-	crtc->config.dpll_hw_state.dpll_md =
-		(crtc->config.pixel_multiplier - 1) << DPLL_MD_UDI_MULTIPLIER_SHIFT;
+	pipe_config->dpll_hw_state.dpll_md =
+		(pipe_config->pixel_multiplier - 1) << DPLL_MD_UDI_MULTIPLIER_SHIFT;
 }
 
-static void chv_prepare_pll(struct intel_crtc *crtc)
+static void chv_prepare_pll(struct intel_crtc *crtc,
+			    const struct intel_crtc_config *pipe_config)
 {
 	struct drm_device *dev = crtc->base.dev;
 	struct drm_i915_private *dev_priv = dev->dev_private;
@@ -5823,18 +6051,18 @@ static void chv_prepare_pll(struct intel_crtc *crtc)
 	u32 bestn, bestm1, bestm2, bestp1, bestp2, bestm2_frac;
 	int refclk;
 
-	bestn = crtc->config.dpll.n;
-	bestm2_frac = crtc->config.dpll.m2 & 0x3fffff;
-	bestm1 = crtc->config.dpll.m1;
-	bestm2 = crtc->config.dpll.m2 >> 22;
-	bestp1 = crtc->config.dpll.p1;
-	bestp2 = crtc->config.dpll.p2;
+	bestn = pipe_config->dpll.n;
+	bestm2_frac = pipe_config->dpll.m2 & 0x3fffff;
+	bestm1 = pipe_config->dpll.m1;
+	bestm2 = pipe_config->dpll.m2 >> 22;
+	bestp1 = pipe_config->dpll.p1;
+	bestp2 = pipe_config->dpll.p2;
 
 	/*
 	 * Enable Refclk and SSC
 	 */
 	I915_WRITE(dpll_reg,
-		   crtc->config.dpll_hw_state.dpll & ~DPLL_VCO_ENABLE);
+		   pipe_config->dpll_hw_state.dpll & ~DPLL_VCO_ENABLE);
 
 	mutex_lock(&dev_priv->dpio_lock);
 
@@ -5862,7 +6090,7 @@ static void chv_prepare_pll(struct intel_crtc *crtc)
 		       (2 << DPIO_CHV_FEEDFWD_GAIN_SHIFT));
 
 	/* Loop filter */
-	refclk = i9xx_get_refclk(&crtc->base, 0);
+	refclk = i9xx_get_refclk(crtc, 0);
 	loopfilter = 5 << DPIO_CHV_PROP_COEFF_SHIFT |
 		2 << DPIO_CHV_GAIN_CTRL_SHIFT;
 	if (refclk == 100000)
@@ -5882,6 +6110,53 @@ static void chv_prepare_pll(struct intel_crtc *crtc)
 	mutex_unlock(&dev_priv->dpio_lock);
 }
 
+/**
+ * vlv_force_pll_on - forcibly enable just the PLL
+ * @dev: drm device
+ * @pipe: pipe PLL to enable
+ * @dpll: PLL configuration
+ *
+ * Enable the PLL for @pipe using the supplied @dpll config. To be used
+ * in cases where we need the PLL enabled even when @pipe is not going to
+ * be enabled.
+ */
+void vlv_force_pll_on(struct drm_device *dev, enum pipe pipe,
+		      const struct dpll *dpll)
+{
+	struct intel_crtc *crtc =
+		to_intel_crtc(intel_get_crtc_for_pipe(dev, pipe));
+	struct intel_crtc_config pipe_config = {
+		.pixel_multiplier = 1,
+		.dpll = *dpll,
+	};
+
+	if (IS_CHERRYVIEW(dev)) {
+		chv_update_pll(crtc, &pipe_config);
+		chv_prepare_pll(crtc, &pipe_config);
+		chv_enable_pll(crtc, &pipe_config);
+	} else {
+		vlv_update_pll(crtc, &pipe_config);
+		vlv_prepare_pll(crtc, &pipe_config);
+		vlv_enable_pll(crtc, &pipe_config);
+	}
+}
+
+/**
+ * vlv_force_pll_off - forcibly disable just the PLL
+ * @dev: drm device
+ * @pipe: pipe PLL to disable
+ *
+ * Disable the PLL for @pipe. To be used to undo vlv_force_pll_on() in
+ * cases where the PLL was enabled even though @pipe was not.
+ */
+void vlv_force_pll_off(struct drm_device *dev, enum pipe pipe)
+{
+	if (IS_CHERRYVIEW(dev))
+		chv_disable_pll(to_i915(dev), pipe);
+	else
+		vlv_disable_pll(to_i915(dev), pipe);
+}
+
 static void i9xx_update_pll(struct intel_crtc *crtc,
 			    intel_clock_t *reduced_clock,
 			    int num_connectors)
@@ -5890,29 +6165,29 @@ static void i9xx_update_pll(struct intel_crtc *crtc,
 	struct drm_i915_private *dev_priv = dev->dev_private;
 	u32 dpll;
 	bool is_sdvo;
-	struct dpll *clock = &crtc->config.dpll;
+	struct dpll *clock = &crtc->new_config->dpll;
 
 	i9xx_update_pll_dividers(crtc, reduced_clock);
 
-	is_sdvo = intel_pipe_has_type(&crtc->base, INTEL_OUTPUT_SDVO) ||
-		intel_pipe_has_type(&crtc->base, INTEL_OUTPUT_HDMI);
+	is_sdvo = intel_pipe_will_have_type(crtc, INTEL_OUTPUT_SDVO) ||
+		intel_pipe_will_have_type(crtc, INTEL_OUTPUT_HDMI);
 
 	dpll = DPLL_VGA_MODE_DIS;
 
-	if (intel_pipe_has_type(&crtc->base, INTEL_OUTPUT_LVDS))
+	if (intel_pipe_will_have_type(crtc, INTEL_OUTPUT_LVDS))
 		dpll |= DPLLB_MODE_LVDS;
 	else
 		dpll |= DPLLB_MODE_DAC_SERIAL;
 
 	if (IS_I945G(dev) || IS_I945GM(dev) || IS_G33(dev)) {
-		dpll |= (crtc->config.pixel_multiplier - 1)
+		dpll |= (crtc->new_config->pixel_multiplier - 1)
 			<< SDVO_MULTIPLIER_SHIFT_HIRES;
 	}
 
 	if (is_sdvo)
 		dpll |= DPLL_SDVO_HIGH_SPEED;
 
-	if (intel_pipe_has_type(&crtc->base, INTEL_OUTPUT_DISPLAYPORT))
+	if (crtc->new_config->has_dp_encoder)
 		dpll |= DPLL_SDVO_HIGH_SPEED;
 
 	/* compute bitmask from p1 value */
@@ -5940,21 +6215,21 @@ static void i9xx_update_pll(struct intel_crtc *crtc,
 	if (INTEL_INFO(dev)->gen >= 4)
 		dpll |= (6 << PLL_LOAD_PULSE_PHASE_SHIFT);
 
-	if (crtc->config.sdvo_tv_clock)
+	if (crtc->new_config->sdvo_tv_clock)
 		dpll |= PLL_REF_INPUT_TVCLKINBC;
-	else if (intel_pipe_has_type(&crtc->base, INTEL_OUTPUT_LVDS) &&
+	else if (intel_pipe_will_have_type(crtc, INTEL_OUTPUT_LVDS) &&
 		 intel_panel_use_ssc(dev_priv) && num_connectors < 2)
 		dpll |= PLLB_REF_INPUT_SPREADSPECTRUMIN;
 	else
 		dpll |= PLL_REF_INPUT_DREFCLK;
 
 	dpll |= DPLL_VCO_ENABLE;
-	crtc->config.dpll_hw_state.dpll = dpll;
+	crtc->new_config->dpll_hw_state.dpll = dpll;
 
 	if (INTEL_INFO(dev)->gen >= 4) {
-		u32 dpll_md = (crtc->config.pixel_multiplier - 1)
+		u32 dpll_md = (crtc->new_config->pixel_multiplier - 1)
 			<< DPLL_MD_UDI_MULTIPLIER_SHIFT;
-		crtc->config.dpll_hw_state.dpll_md = dpll_md;
+		crtc->new_config->dpll_hw_state.dpll_md = dpll_md;
 	}
 }
 
@@ -5965,13 +6240,13 @@ static void i8xx_update_pll(struct intel_crtc *crtc,
 	struct drm_device *dev = crtc->base.dev;
 	struct drm_i915_private *dev_priv = dev->dev_private;
 	u32 dpll;
-	struct dpll *clock = &crtc->config.dpll;
+	struct dpll *clock = &crtc->new_config->dpll;
 
 	i9xx_update_pll_dividers(crtc, reduced_clock);
 
 	dpll = DPLL_VGA_MODE_DIS;
 
-	if (intel_pipe_has_type(&crtc->base, INTEL_OUTPUT_LVDS)) {
+	if (intel_pipe_will_have_type(crtc, INTEL_OUTPUT_LVDS)) {
 		dpll |= (1 << (clock->p1 - 1)) << DPLL_FPA01_P1_POST_DIV_SHIFT;
 	} else {
 		if (clock->p1 == 2)
@@ -5982,17 +6257,17 @@ static void i8xx_update_pll(struct intel_crtc *crtc,
 			dpll |= PLL_P2_DIVIDE_BY_4;
 	}
 
-	if (!IS_I830(dev) && intel_pipe_has_type(&crtc->base, INTEL_OUTPUT_DVO))
+	if (!IS_I830(dev) && intel_pipe_will_have_type(crtc, INTEL_OUTPUT_DVO))
 		dpll |= DPLL_DVO_2X_MODE;
 
-	if (intel_pipe_has_type(&crtc->base, INTEL_OUTPUT_LVDS) &&
+	if (intel_pipe_will_have_type(crtc, INTEL_OUTPUT_LVDS) &&
 		 intel_panel_use_ssc(dev_priv) && num_connectors < 2)
 		dpll |= PLLB_REF_INPUT_SPREADSPECTRUMIN;
 	else
 		dpll |= PLL_REF_INPUT_DREFCLK;
 
 	dpll |= DPLL_VCO_ENABLE;
-	crtc->config.dpll_hw_state.dpll = dpll;
+	crtc->new_config->dpll_hw_state.dpll = dpll;
 }
 
 static void intel_set_pipe_timings(struct intel_crtc *intel_crtc)
@@ -6016,7 +6291,7 @@ static void intel_set_pipe_timings(struct intel_crtc *intel_crtc)
 		crtc_vtotal -= 1;
 		crtc_vblank_end -= 1;
 
-		if (intel_pipe_has_type(&intel_crtc->base, INTEL_OUTPUT_SDVO))
+		if (intel_pipe_has_type(intel_crtc, INTEL_OUTPUT_SDVO))
 			vsyncshift = (adjusted_mode->crtc_htotal - 1) / 2;
 		else
 			vsyncshift = adjusted_mode->crtc_hsync_start -
@@ -6174,7 +6449,7 @@ static void i9xx_set_pipeconf(struct intel_crtc *intel_crtc)
 
 	if (intel_crtc->config.adjusted_mode.flags & DRM_MODE_FLAG_INTERLACE) {
 		if (INTEL_INFO(dev)->gen < 4 ||
-		    intel_pipe_has_type(&intel_crtc->base, INTEL_OUTPUT_SDVO))
+		    intel_pipe_has_type(intel_crtc, INTEL_OUTPUT_SDVO))
 			pipeconf |= PIPECONF_INTERLACE_W_FIELD_INDICATION;
 		else
 			pipeconf |= PIPECONF_INTERLACE_W_SYNC_SHIFT;
@@ -6188,13 +6463,10 @@ static void i9xx_set_pipeconf(struct intel_crtc *intel_crtc)
 	POSTING_READ(PIPECONF(intel_crtc->pipe));
 }
 
-static int i9xx_crtc_mode_set(struct drm_crtc *crtc,
-			      int x, int y,
-			      struct drm_framebuffer *fb)
+static int i9xx_crtc_compute_clock(struct intel_crtc *crtc)
 {
-	struct drm_device *dev = crtc->dev;
+	struct drm_device *dev = crtc->base.dev;
 	struct drm_i915_private *dev_priv = dev->dev_private;
-	struct intel_crtc *intel_crtc = to_intel_crtc(crtc);
 	int refclk, num_connectors = 0;
 	intel_clock_t clock, reduced_clock;
 	bool ok, has_reduced_clock = false;
@@ -6202,7 +6474,10 @@ static int i9xx_crtc_mode_set(struct drm_crtc *crtc,
 	struct intel_encoder *encoder;
 	const intel_limit_t *limit;
 
-	for_each_encoder_on_crtc(dev, crtc, encoder) {
+	for_each_intel_encoder(dev, encoder) {
+		if (encoder->new_crtc != crtc)
+			continue;
+
 		switch (encoder->type) {
 		case INTEL_OUTPUT_LVDS:
 			is_lvds = true;
@@ -6210,6 +6485,8 @@ static int i9xx_crtc_mode_set(struct drm_crtc *crtc,
 		case INTEL_OUTPUT_DSI:
 			is_dsi = true;
 			break;
+		default:
+			break;
 		}
 
 		num_connectors++;
@@ -6218,7 +6495,7 @@ static int i9xx_crtc_mode_set(struct drm_crtc *crtc,
 	if (is_dsi)
 		return 0;
 
-	if (!intel_crtc->config.clock_set) {
+	if (!crtc->new_config->clock_set) {
 		refclk = i9xx_get_refclk(crtc, num_connectors);
 
 		/*
@@ -6229,7 +6506,7 @@ static int i9xx_crtc_mode_set(struct drm_crtc *crtc,
 		 */
 		limit = intel_limit(crtc, refclk);
 		ok = dev_priv->display.find_dpll(limit, crtc,
-						 intel_crtc->config.port_clock,
+						 crtc->new_config->port_clock,
 						 refclk, NULL, &clock);
 		if (!ok) {
 			DRM_ERROR("Couldn't find PLL settings for mode!\n");
@@ -6250,23 +6527,23 @@ static int i9xx_crtc_mode_set(struct drm_crtc *crtc,
 							    &reduced_clock);
 		}
 		/* Compat-code for transition, will disappear. */
-		intel_crtc->config.dpll.n = clock.n;
-		intel_crtc->config.dpll.m1 = clock.m1;
-		intel_crtc->config.dpll.m2 = clock.m2;
-		intel_crtc->config.dpll.p1 = clock.p1;
-		intel_crtc->config.dpll.p2 = clock.p2;
+		crtc->new_config->dpll.n = clock.n;
+		crtc->new_config->dpll.m1 = clock.m1;
+		crtc->new_config->dpll.m2 = clock.m2;
+		crtc->new_config->dpll.p1 = clock.p1;
+		crtc->new_config->dpll.p2 = clock.p2;
 	}
 
 	if (IS_GEN2(dev)) {
-		i8xx_update_pll(intel_crtc,
+		i8xx_update_pll(crtc,
 				has_reduced_clock ? &reduced_clock : NULL,
 				num_connectors);
 	} else if (IS_CHERRYVIEW(dev)) {
-		chv_update_pll(intel_crtc);
+		chv_update_pll(crtc, crtc->new_config);
 	} else if (IS_VALLEYVIEW(dev)) {
-		vlv_update_pll(intel_crtc);
+		vlv_update_pll(crtc, crtc->new_config);
 	} else {
-		i9xx_update_pll(intel_crtc,
+		i9xx_update_pll(crtc,
 				has_reduced_clock ? &reduced_clock : NULL,
 				num_connectors);
 	}
@@ -6432,8 +6709,8 @@ static bool i9xx_get_pipe_config(struct intel_crtc *crtc,
 	struct drm_i915_private *dev_priv = dev->dev_private;
 	uint32_t tmp;
 
-	if (!intel_display_power_enabled(dev_priv,
-					 POWER_DOMAIN_PIPE(crtc->pipe)))
+	if (!intel_display_power_is_enabled(dev_priv,
+					    POWER_DOMAIN_PIPE(crtc->pipe)))
 		return false;
 
 	pipe_config->cpu_transcoder = (enum transcoder) crtc->pipe;
@@ -6538,6 +6815,8 @@ static void ironlake_init_pch_refclk(struct drm_device *dev)
 			if (enc_to_dig_port(&encoder->base)->port == PORT_A)
 				has_cpu_edp = true;
 			break;
+		default:
+			break;
 		}
 	}
 
@@ -6842,6 +7121,8 @@ static void lpt_init_pch_refclk(struct drm_device *dev)
 		case INTEL_OUTPUT_ANALOG:
 			has_vga = true;
 			break;
+		default:
+			break;
 		}
 	}
 
@@ -6870,11 +7151,16 @@ static int ironlake_get_refclk(struct drm_crtc *crtc)
 	int num_connectors = 0;
 	bool is_lvds = false;
 
-	for_each_encoder_on_crtc(dev, crtc, encoder) {
+	for_each_intel_encoder(dev, encoder) {
+		if (encoder->new_crtc != to_intel_crtc(crtc))
+			continue;
+
 		switch (encoder->type) {
 		case INTEL_OUTPUT_LVDS:
 			is_lvds = true;
 			break;
+		default:
+			break;
 		}
 		num_connectors++;
 	}
@@ -7019,7 +7305,7 @@ static void haswell_set_pipeconf(struct drm_crtc *crtc)
 	I915_WRITE(GAMMA_MODE(intel_crtc->pipe), GAMMA_MODE_MODE_8BIT);
 	POSTING_READ(GAMMA_MODE(intel_crtc->pipe));
 
-	if (IS_BROADWELL(dev)) {
+	if (IS_BROADWELL(dev) || INTEL_INFO(dev)->gen >= 9) {
 		val = 0;
 
 		switch (intel_crtc->config.pipe_bpp) {
@@ -7054,18 +7340,12 @@ static bool ironlake_compute_clocks(struct drm_crtc *crtc,
 {
 	struct drm_device *dev = crtc->dev;
 	struct drm_i915_private *dev_priv = dev->dev_private;
-	struct intel_encoder *intel_encoder;
+	struct intel_crtc *intel_crtc = to_intel_crtc(crtc);
 	int refclk;
 	const intel_limit_t *limit;
 	bool ret, is_lvds = false;
 
-	for_each_encoder_on_crtc(dev, crtc, intel_encoder) {
-		switch (intel_encoder->type) {
-		case INTEL_OUTPUT_LVDS:
-			is_lvds = true;
-			break;
-		}
-	}
+	is_lvds = intel_pipe_will_have_type(intel_crtc, INTEL_OUTPUT_LVDS);
 
 	refclk = ironlake_get_refclk(crtc);
 
@@ -7074,9 +7354,9 @@ static bool ironlake_compute_clocks(struct drm_crtc *crtc,
 	 * refclk, or FALSE.  The returned values represent the clock equation:
 	 * reflck * (5 * (m1 + 2) + (m2 + 2)) / (n + 2) / p1 / p2.
 	 */
-	limit = intel_limit(crtc, refclk);
-	ret = dev_priv->display.find_dpll(limit, crtc,
-					  to_intel_crtc(crtc)->config.port_clock,
+	limit = intel_limit(intel_crtc, refclk);
+	ret = dev_priv->display.find_dpll(limit, intel_crtc,
+					  intel_crtc->new_config->port_clock,
 					  refclk, NULL, clock);
 	if (!ret)
 		return false;
@@ -7089,7 +7369,7 @@ static bool ironlake_compute_clocks(struct drm_crtc *crtc,
 		 * downclock feature.
 		*/
 		*has_reduced_clock =
-			dev_priv->display.find_dpll(limit, crtc,
+			dev_priv->display.find_dpll(limit, intel_crtc,
 						    dev_priv->lvds_downclock,
 						    refclk, clock,
 						    reduced_clock);
@@ -7126,7 +7406,10 @@ static uint32_t ironlake_compute_dpll(struct intel_crtc *intel_crtc,
 	int factor, num_connectors = 0;
 	bool is_lvds = false, is_sdvo = false;
 
-	for_each_encoder_on_crtc(dev, crtc, intel_encoder) {
+	for_each_intel_encoder(dev, intel_encoder) {
+		if (intel_encoder->new_crtc != to_intel_crtc(crtc))
+			continue;
+
 		switch (intel_encoder->type) {
 		case INTEL_OUTPUT_LVDS:
 			is_lvds = true;
@@ -7135,6 +7418,8 @@ static uint32_t ironlake_compute_dpll(struct intel_crtc *intel_crtc,
 		case INTEL_OUTPUT_HDMI:
 			is_sdvo = true;
 			break;
+		default:
+			break;
 		}
 
 		num_connectors++;
@@ -7147,10 +7432,10 @@ static uint32_t ironlake_compute_dpll(struct intel_crtc *intel_crtc,
 		     dev_priv->vbt.lvds_ssc_freq == 100000) ||
 		    (HAS_PCH_IBX(dev) && intel_is_dual_link_lvds(dev)))
 			factor = 25;
-	} else if (intel_crtc->config.sdvo_tv_clock)
+	} else if (intel_crtc->new_config->sdvo_tv_clock)
 		factor = 20;
 
-	if (ironlake_needs_fb_cb_tune(&intel_crtc->config.dpll, factor))
+	if (ironlake_needs_fb_cb_tune(&intel_crtc->new_config->dpll, factor))
 		*fp |= FP_CB_TUNE;
 
 	if (fp2 && (reduced_clock->m < factor * reduced_clock->n))
@@ -7163,20 +7448,20 @@ static uint32_t ironlake_compute_dpll(struct intel_crtc *intel_crtc,
 	else
 		dpll |= DPLLB_MODE_DAC_SERIAL;
 
-	dpll |= (intel_crtc->config.pixel_multiplier - 1)
+	dpll |= (intel_crtc->new_config->pixel_multiplier - 1)
 		<< PLL_REF_SDVO_HDMI_MULTIPLIER_SHIFT;
 
 	if (is_sdvo)
 		dpll |= DPLL_SDVO_HIGH_SPEED;
-	if (intel_crtc->config.has_dp_encoder)
+	if (intel_crtc->new_config->has_dp_encoder)
 		dpll |= DPLL_SDVO_HIGH_SPEED;
 
 	/* compute bitmask from p1 value */
-	dpll |= (1 << (intel_crtc->config.dpll.p1 - 1)) << DPLL_FPA01_P1_POST_DIV_SHIFT;
+	dpll |= (1 << (intel_crtc->new_config->dpll.p1 - 1)) << DPLL_FPA01_P1_POST_DIV_SHIFT;
 	/* also FPA1 */
-	dpll |= (1 << (intel_crtc->config.dpll.p1 - 1)) << DPLL_FPA1_P1_POST_DIV_SHIFT;
+	dpll |= (1 << (intel_crtc->new_config->dpll.p1 - 1)) << DPLL_FPA1_P1_POST_DIV_SHIFT;
 
-	switch (intel_crtc->config.dpll.p2) {
+	switch (intel_crtc->new_config->dpll.p2) {
 	case 5:
 		dpll |= DPLL_DAC_SERIAL_P2_CLOCK_DIV_5;
 		break;
@@ -7199,78 +7484,64 @@ static uint32_t ironlake_compute_dpll(struct intel_crtc *intel_crtc,
 	return dpll | DPLL_VCO_ENABLE;
 }
 
-static int ironlake_crtc_mode_set(struct drm_crtc *crtc,
-				  int x, int y,
-				  struct drm_framebuffer *fb)
+static int ironlake_crtc_compute_clock(struct intel_crtc *crtc)
 {
-	struct drm_device *dev = crtc->dev;
-	struct intel_crtc *intel_crtc = to_intel_crtc(crtc);
-	int num_connectors = 0;
+	struct drm_device *dev = crtc->base.dev;
 	intel_clock_t clock, reduced_clock;
 	u32 dpll = 0, fp = 0, fp2 = 0;
 	bool ok, has_reduced_clock = false;
 	bool is_lvds = false;
-	struct intel_encoder *encoder;
 	struct intel_shared_dpll *pll;
 
-	for_each_encoder_on_crtc(dev, crtc, encoder) {
-		switch (encoder->type) {
-		case INTEL_OUTPUT_LVDS:
-			is_lvds = true;
-			break;
-		}
-
-		num_connectors++;
-	}
+	is_lvds = intel_pipe_has_type(crtc, INTEL_OUTPUT_LVDS);
 
 	WARN(!(HAS_PCH_IBX(dev) || HAS_PCH_CPT(dev)),
 	     "Unexpected PCH type %d\n", INTEL_PCH_TYPE(dev));
 
-	ok = ironlake_compute_clocks(crtc, &clock,
+	ok = ironlake_compute_clocks(&crtc->base, &clock,
 				     &has_reduced_clock, &reduced_clock);
-	if (!ok && !intel_crtc->config.clock_set) {
+	if (!ok && !crtc->new_config->clock_set) {
 		DRM_ERROR("Couldn't find PLL settings for mode!\n");
 		return -EINVAL;
 	}
 	/* Compat-code for transition, will disappear. */
-	if (!intel_crtc->config.clock_set) {
-		intel_crtc->config.dpll.n = clock.n;
-		intel_crtc->config.dpll.m1 = clock.m1;
-		intel_crtc->config.dpll.m2 = clock.m2;
-		intel_crtc->config.dpll.p1 = clock.p1;
-		intel_crtc->config.dpll.p2 = clock.p2;
+	if (!crtc->new_config->clock_set) {
+		crtc->new_config->dpll.n = clock.n;
+		crtc->new_config->dpll.m1 = clock.m1;
+		crtc->new_config->dpll.m2 = clock.m2;
+		crtc->new_config->dpll.p1 = clock.p1;
+		crtc->new_config->dpll.p2 = clock.p2;
 	}
 
 	/* CPU eDP is the only output that doesn't need a PCH PLL of its own. */
-	if (intel_crtc->config.has_pch_encoder) {
-		fp = i9xx_dpll_compute_fp(&intel_crtc->config.dpll);
+	if (crtc->new_config->has_pch_encoder) {
+		fp = i9xx_dpll_compute_fp(&crtc->new_config->dpll);
 		if (has_reduced_clock)
 			fp2 = i9xx_dpll_compute_fp(&reduced_clock);
 
-		dpll = ironlake_compute_dpll(intel_crtc,
+		dpll = ironlake_compute_dpll(crtc,
 					     &fp, &reduced_clock,
 					     has_reduced_clock ? &fp2 : NULL);
 
-		intel_crtc->config.dpll_hw_state.dpll = dpll;
-		intel_crtc->config.dpll_hw_state.fp0 = fp;
+		crtc->new_config->dpll_hw_state.dpll = dpll;
+		crtc->new_config->dpll_hw_state.fp0 = fp;
 		if (has_reduced_clock)
-			intel_crtc->config.dpll_hw_state.fp1 = fp2;
+			crtc->new_config->dpll_hw_state.fp1 = fp2;
 		else
-			intel_crtc->config.dpll_hw_state.fp1 = fp;
+			crtc->new_config->dpll_hw_state.fp1 = fp;
 
-		pll = intel_get_shared_dpll(intel_crtc);
+		pll = intel_get_shared_dpll(crtc);
 		if (pll == NULL) {
 			DRM_DEBUG_DRIVER("failed to find PLL for pipe %c\n",
-					 pipe_name(intel_crtc->pipe));
+					 pipe_name(crtc->pipe));
 			return -EINVAL;
 		}
-	} else
-		intel_put_shared_dpll(intel_crtc);
+	}
 
 	if (is_lvds && has_reduced_clock && i915.powersave)
-		intel_crtc->lowfreq_avail = true;
+		crtc->lowfreq_avail = true;
 	else
-		intel_crtc->lowfreq_avail = false;
+		crtc->lowfreq_avail = false;
 
 	return 0;
 }
@@ -7351,6 +7622,22 @@ static void ironlake_get_fdi_m_n_config(struct intel_crtc *crtc,
 				     &pipe_config->fdi_m_n, NULL);
 }
 
+static void skylake_get_pfit_config(struct intel_crtc *crtc,
+				    struct intel_crtc_config *pipe_config)
+{
+	struct drm_device *dev = crtc->base.dev;
+	struct drm_i915_private *dev_priv = dev->dev_private;
+	uint32_t tmp;
+
+	tmp = I915_READ(PS_CTL(crtc->pipe));
+
+	if (tmp & PS_ENABLE) {
+		pipe_config->pch_pfit.enabled = true;
+		pipe_config->pch_pfit.pos = I915_READ(PS_WIN_POS(crtc->pipe));
+		pipe_config->pch_pfit.size = I915_READ(PS_WIN_SZ(crtc->pipe));
+	}
+}
+
 static void ironlake_get_pfit_config(struct intel_crtc *crtc,
 				     struct intel_crtc_config *pipe_config)
 {
@@ -7442,8 +7729,8 @@ static bool ironlake_get_pipe_config(struct intel_crtc *crtc,
 	struct drm_i915_private *dev_priv = dev->dev_private;
 	uint32_t tmp;
 
-	if (!intel_display_power_enabled(dev_priv,
-					 POWER_DOMAIN_PIPE(crtc->pipe)))
+	if (!intel_display_power_is_enabled(dev_priv,
+					    POWER_DOMAIN_PIPE(crtc->pipe)))
 		return false;
 
 	pipe_config->cpu_transcoder = (enum transcoder) crtc->pipe;
@@ -7636,7 +7923,6 @@ static void hsw_disable_lcpll(struct drm_i915_private *dev_priv,
 static void hsw_restore_lcpll(struct drm_i915_private *dev_priv)
 {
 	uint32_t val;
-	unsigned long irqflags;
 
 	val = I915_READ(LCPLL_CTL);
 
@@ -7656,10 +7942,10 @@ static void hsw_restore_lcpll(struct drm_i915_private *dev_priv)
 	 * to call special forcewake code that doesn't touch runtime PM and
 	 * doesn't enable the forcewake delayed work.
 	 */
-	spin_lock_irqsave(&dev_priv->uncore.lock, irqflags);
+	spin_lock_irq(&dev_priv->uncore.lock);
 	if (dev_priv->uncore.forcewake_count++ == 0)
 		dev_priv->uncore.funcs.force_wake_get(dev_priv, FORCEWAKE_ALL);
-	spin_unlock_irqrestore(&dev_priv->uncore.lock, irqflags);
+	spin_unlock_irq(&dev_priv->uncore.lock);
 
 	if (val & LCPLL_POWER_DOWN_ALLOW) {
 		val &= ~LCPLL_POWER_DOWN_ALLOW;
@@ -7690,10 +7976,10 @@ static void hsw_restore_lcpll(struct drm_i915_private *dev_priv)
 	}
 
 	/* See the big comment above. */
-	spin_lock_irqsave(&dev_priv->uncore.lock, irqflags);
+	spin_lock_irq(&dev_priv->uncore.lock);
 	if (--dev_priv->uncore.forcewake_count == 0)
 		dev_priv->uncore.funcs.force_wake_put(dev_priv, FORCEWAKE_ALL);
-	spin_unlock_irqrestore(&dev_priv->uncore.lock, irqflags);
+	spin_unlock_irq(&dev_priv->uncore.lock);
 }
 
 /*
@@ -7755,28 +8041,36 @@ void hsw_disable_pc8(struct drm_i915_private *dev_priv)
 	intel_prepare_ddi(dev);
 }
 
-static void snb_modeset_global_resources(struct drm_device *dev)
+static int haswell_crtc_compute_clock(struct intel_crtc *crtc)
 {
-	modeset_update_crtc_power_domains(dev);
-}
+	if (!intel_ddi_pll_select(crtc))
+		return -EINVAL;
 
-static void haswell_modeset_global_resources(struct drm_device *dev)
-{
-	modeset_update_crtc_power_domains(dev);
+	crtc->lowfreq_avail = false;
+
+	return 0;
 }
 
-static int haswell_crtc_mode_set(struct drm_crtc *crtc,
-				 int x, int y,
-				 struct drm_framebuffer *fb)
+static void skylake_get_ddi_pll(struct drm_i915_private *dev_priv,
+				enum port port,
+				struct intel_crtc_config *pipe_config)
 {
-	struct intel_crtc *intel_crtc = to_intel_crtc(crtc);
-
-	if (!intel_ddi_pll_select(intel_crtc))
-		return -EINVAL;
+	u32 temp;
 
-	intel_crtc->lowfreq_avail = false;
+	temp = I915_READ(DPLL_CTRL2) & DPLL_CTRL2_DDI_CLK_SEL_MASK(port);
+	pipe_config->ddi_pll_sel = temp >> (port * 3 + 1);
 
-	return 0;
+	switch (pipe_config->ddi_pll_sel) {
+	case SKL_DPLL1:
+		pipe_config->shared_dpll = DPLL_ID_SKL_DPLL1;
+		break;
+	case SKL_DPLL2:
+		pipe_config->shared_dpll = DPLL_ID_SKL_DPLL2;
+		break;
+	case SKL_DPLL3:
+		pipe_config->shared_dpll = DPLL_ID_SKL_DPLL3;
+		break;
+	}
 }
 
 static void haswell_get_ddi_pll(struct drm_i915_private *dev_priv,
@@ -7808,7 +8102,10 @@ static void haswell_get_ddi_port_state(struct intel_crtc *crtc,
 
 	port = (tmp & TRANS_DDI_PORT_MASK) >> TRANS_DDI_PORT_SHIFT;
 
-	haswell_get_ddi_pll(dev_priv, port, pipe_config);
+	if (IS_SKYLAKE(dev))
+		skylake_get_ddi_pll(dev_priv, port, pipe_config);
+	else
+		haswell_get_ddi_pll(dev_priv, port, pipe_config);
 
 	if (pipe_config->shared_dpll >= 0) {
 		pll = &dev_priv->shared_dplls[pipe_config->shared_dpll];
@@ -7822,7 +8119,8 @@ static void haswell_get_ddi_port_state(struct intel_crtc *crtc,
 	 * DDI E. So just check whether this pipe is wired to DDI E and whether
 	 * the PCH transcoder is on.
 	 */
-	if ((port == PORT_E) && I915_READ(LPT_TRANSCONF) & TRANS_ENABLE) {
+	if (INTEL_INFO(dev)->gen < 9 &&
+	    (port == PORT_E) && I915_READ(LPT_TRANSCONF) & TRANS_ENABLE) {
 		pipe_config->has_pch_encoder = true;
 
 		tmp = I915_READ(FDI_RX_CTL(PIPE_A));
@@ -7841,7 +8139,7 @@ static bool haswell_get_pipe_config(struct intel_crtc *crtc,
 	enum intel_display_power_domain pfit_domain;
 	uint32_t tmp;
 
-	if (!intel_display_power_enabled(dev_priv,
+	if (!intel_display_power_is_enabled(dev_priv,
 					 POWER_DOMAIN_PIPE(crtc->pipe)))
 		return false;
 
@@ -7870,7 +8168,7 @@ static bool haswell_get_pipe_config(struct intel_crtc *crtc,
 			pipe_config->cpu_transcoder = TRANSCODER_EDP;
 	}
 
-	if (!intel_display_power_enabled(dev_priv,
+	if (!intel_display_power_is_enabled(dev_priv,
 			POWER_DOMAIN_TRANSCODER(pipe_config->cpu_transcoder)))
 		return false;
 
@@ -7883,8 +8181,12 @@ static bool haswell_get_pipe_config(struct intel_crtc *crtc,
 	intel_get_pipe_timings(crtc, pipe_config);
 
 	pfit_domain = POWER_DOMAIN_PIPE_PANEL_FITTER(crtc->pipe);
-	if (intel_display_power_enabled(dev_priv, pfit_domain))
-		ironlake_get_pfit_config(crtc, pipe_config);
+	if (intel_display_power_is_enabled(dev_priv, pfit_domain)) {
+		if (IS_SKYLAKE(dev))
+			skylake_get_pfit_config(crtc, pipe_config);
+		else
+			ironlake_get_pfit_config(crtc, pipe_config);
+	}
 
 	if (IS_HASWELL(dev))
 		pipe_config->ips_enabled = hsw_crtc_supports_ips(crtc) &&
@@ -7900,314 +8202,6 @@ static bool haswell_get_pipe_config(struct intel_crtc *crtc,
 	return true;
 }
 
-static struct {
-	int clock;
-	u32 config;
-} hdmi_audio_clock[] = {
-	{ DIV_ROUND_UP(25200 * 1000, 1001), AUD_CONFIG_PIXEL_CLOCK_HDMI_25175 },
-	{ 25200, AUD_CONFIG_PIXEL_CLOCK_HDMI_25200 }, /* default per bspec */
-	{ 27000, AUD_CONFIG_PIXEL_CLOCK_HDMI_27000 },
-	{ 27000 * 1001 / 1000, AUD_CONFIG_PIXEL_CLOCK_HDMI_27027 },
-	{ 54000, AUD_CONFIG_PIXEL_CLOCK_HDMI_54000 },
-	{ 54000 * 1001 / 1000, AUD_CONFIG_PIXEL_CLOCK_HDMI_54054 },
-	{ DIV_ROUND_UP(74250 * 1000, 1001), AUD_CONFIG_PIXEL_CLOCK_HDMI_74176 },
-	{ 74250, AUD_CONFIG_PIXEL_CLOCK_HDMI_74250 },
-	{ DIV_ROUND_UP(148500 * 1000, 1001), AUD_CONFIG_PIXEL_CLOCK_HDMI_148352 },
-	{ 148500, AUD_CONFIG_PIXEL_CLOCK_HDMI_148500 },
-};
-
-/* get AUD_CONFIG_PIXEL_CLOCK_HDMI_* value for mode */
-static u32 audio_config_hdmi_pixel_clock(struct drm_display_mode *mode)
-{
-	int i;
-
-	for (i = 0; i < ARRAY_SIZE(hdmi_audio_clock); i++) {
-		if (mode->clock == hdmi_audio_clock[i].clock)
-			break;
-	}
-
-	if (i == ARRAY_SIZE(hdmi_audio_clock)) {
-		DRM_DEBUG_KMS("HDMI audio pixel clock setting for %d not found, falling back to defaults\n", mode->clock);
-		i = 1;
-	}
-
-	DRM_DEBUG_KMS("Configuring HDMI audio for pixel clock %d (0x%08x)\n",
-		      hdmi_audio_clock[i].clock,
-		      hdmi_audio_clock[i].config);
-
-	return hdmi_audio_clock[i].config;
-}
-
-static bool intel_eld_uptodate(struct drm_connector *connector,
-			       int reg_eldv, uint32_t bits_eldv,
-			       int reg_elda, uint32_t bits_elda,
-			       int reg_edid)
-{
-	struct drm_i915_private *dev_priv = connector->dev->dev_private;
-	uint8_t *eld = connector->eld;
-	uint32_t i;
-
-	i = I915_READ(reg_eldv);
-	i &= bits_eldv;
-
-	if (!eld[0])
-		return !i;
-
-	if (!i)
-		return false;
-
-	i = I915_READ(reg_elda);
-	i &= ~bits_elda;
-	I915_WRITE(reg_elda, i);
-
-	for (i = 0; i < eld[2]; i++)
-		if (I915_READ(reg_edid) != *((uint32_t *)eld + i))
-			return false;
-
-	return true;
-}
-
-static void g4x_write_eld(struct drm_connector *connector,
-			  struct drm_crtc *crtc,
-			  struct drm_display_mode *mode)
-{
-	struct drm_i915_private *dev_priv = connector->dev->dev_private;
-	uint8_t *eld = connector->eld;
-	uint32_t eldv;
-	uint32_t len;
-	uint32_t i;
-
-	i = I915_READ(G4X_AUD_VID_DID);
-
-	if (i == INTEL_AUDIO_DEVBLC || i == INTEL_AUDIO_DEVCL)
-		eldv = G4X_ELDV_DEVCL_DEVBLC;
-	else
-		eldv = G4X_ELDV_DEVCTG;
-
-	if (intel_eld_uptodate(connector,
-			       G4X_AUD_CNTL_ST, eldv,
-			       G4X_AUD_CNTL_ST, G4X_ELD_ADDR,
-			       G4X_HDMIW_HDMIEDID))
-		return;
-
-	i = I915_READ(G4X_AUD_CNTL_ST);
-	i &= ~(eldv | G4X_ELD_ADDR);
-	len = (i >> 9) & 0x1f;		/* ELD buffer size */
-	I915_WRITE(G4X_AUD_CNTL_ST, i);
-
-	if (!eld[0])
-		return;
-
-	len = min_t(uint8_t, eld[2], len);
-	DRM_DEBUG_DRIVER("ELD size %d\n", len);
-	for (i = 0; i < len; i++)
-		I915_WRITE(G4X_HDMIW_HDMIEDID, *((uint32_t *)eld + i));
-
-	i = I915_READ(G4X_AUD_CNTL_ST);
-	i |= eldv;
-	I915_WRITE(G4X_AUD_CNTL_ST, i);
-}
-
-static void haswell_write_eld(struct drm_connector *connector,
-			      struct drm_crtc *crtc,
-			      struct drm_display_mode *mode)
-{
-	struct drm_i915_private *dev_priv = connector->dev->dev_private;
-	uint8_t *eld = connector->eld;
-	uint32_t eldv;
-	uint32_t i;
-	int len;
-	int pipe = to_intel_crtc(crtc)->pipe;
-	int tmp;
-
-	int hdmiw_hdmiedid = HSW_AUD_EDID_DATA(pipe);
-	int aud_cntl_st = HSW_AUD_DIP_ELD_CTRL(pipe);
-	int aud_config = HSW_AUD_CFG(pipe);
-	int aud_cntrl_st2 = HSW_AUD_PIN_ELD_CP_VLD;
-
-	/* Audio output enable */
-	DRM_DEBUG_DRIVER("HDMI audio: enable codec\n");
-	tmp = I915_READ(aud_cntrl_st2);
-	tmp |= (AUDIO_OUTPUT_ENABLE_A << (pipe * 4));
-	I915_WRITE(aud_cntrl_st2, tmp);
-	POSTING_READ(aud_cntrl_st2);
-
-	assert_pipe_disabled(dev_priv, to_intel_crtc(crtc)->pipe);
-
-	/* Set ELD valid state */
-	tmp = I915_READ(aud_cntrl_st2);
-	DRM_DEBUG_DRIVER("HDMI audio: pin eld vld status=0x%08x\n", tmp);
-	tmp |= (AUDIO_ELD_VALID_A << (pipe * 4));
-	I915_WRITE(aud_cntrl_st2, tmp);
-	tmp = I915_READ(aud_cntrl_st2);
-	DRM_DEBUG_DRIVER("HDMI audio: eld vld status=0x%08x\n", tmp);
-
-	/* Enable HDMI mode */
-	tmp = I915_READ(aud_config);
-	DRM_DEBUG_DRIVER("HDMI audio: audio conf: 0x%08x\n", tmp);
-	/* clear N_programing_enable and N_value_index */
-	tmp &= ~(AUD_CONFIG_N_VALUE_INDEX | AUD_CONFIG_N_PROG_ENABLE);
-	I915_WRITE(aud_config, tmp);
-
-	DRM_DEBUG_DRIVER("ELD on pipe %c\n", pipe_name(pipe));
-
-	eldv = AUDIO_ELD_VALID_A << (pipe * 4);
-
-	if (intel_pipe_has_type(crtc, INTEL_OUTPUT_DISPLAYPORT)) {
-		DRM_DEBUG_DRIVER("ELD: DisplayPort detected\n");
-		eld[5] |= (1 << 2);	/* Conn_Type, 0x1 = DisplayPort */
-		I915_WRITE(aud_config, AUD_CONFIG_N_VALUE_INDEX); /* 0x1 = DP */
-	} else {
-		I915_WRITE(aud_config, audio_config_hdmi_pixel_clock(mode));
-	}
-
-	if (intel_eld_uptodate(connector,
-			       aud_cntrl_st2, eldv,
-			       aud_cntl_st, IBX_ELD_ADDRESS,
-			       hdmiw_hdmiedid))
-		return;
-
-	i = I915_READ(aud_cntrl_st2);
-	i &= ~eldv;
-	I915_WRITE(aud_cntrl_st2, i);
-
-	if (!eld[0])
-		return;
-
-	i = I915_READ(aud_cntl_st);
-	i &= ~IBX_ELD_ADDRESS;
-	I915_WRITE(aud_cntl_st, i);
-	i = (i >> 29) & DIP_PORT_SEL_MASK;		/* DIP_Port_Select, 0x1 = PortB */
-	DRM_DEBUG_DRIVER("port num:%d\n", i);
-
-	len = min_t(uint8_t, eld[2], 21);	/* 84 bytes of hw ELD buffer */
-	DRM_DEBUG_DRIVER("ELD size %d\n", len);
-	for (i = 0; i < len; i++)
-		I915_WRITE(hdmiw_hdmiedid, *((uint32_t *)eld + i));
-
-	i = I915_READ(aud_cntrl_st2);
-	i |= eldv;
-	I915_WRITE(aud_cntrl_st2, i);
-
-}
-
-static void ironlake_write_eld(struct drm_connector *connector,
-			       struct drm_crtc *crtc,
-			       struct drm_display_mode *mode)
-{
-	struct drm_i915_private *dev_priv = connector->dev->dev_private;
-	uint8_t *eld = connector->eld;
-	uint32_t eldv;
-	uint32_t i;
-	int len;
-	int hdmiw_hdmiedid;
-	int aud_config;
-	int aud_cntl_st;
-	int aud_cntrl_st2;
-	int pipe = to_intel_crtc(crtc)->pipe;
-
-	if (HAS_PCH_IBX(connector->dev)) {
-		hdmiw_hdmiedid = IBX_HDMIW_HDMIEDID(pipe);
-		aud_config = IBX_AUD_CFG(pipe);
-		aud_cntl_st = IBX_AUD_CNTL_ST(pipe);
-		aud_cntrl_st2 = IBX_AUD_CNTL_ST2;
-	} else if (IS_VALLEYVIEW(connector->dev)) {
-		hdmiw_hdmiedid = VLV_HDMIW_HDMIEDID(pipe);
-		aud_config = VLV_AUD_CFG(pipe);
-		aud_cntl_st = VLV_AUD_CNTL_ST(pipe);
-		aud_cntrl_st2 = VLV_AUD_CNTL_ST2;
-	} else {
-		hdmiw_hdmiedid = CPT_HDMIW_HDMIEDID(pipe);
-		aud_config = CPT_AUD_CFG(pipe);
-		aud_cntl_st = CPT_AUD_CNTL_ST(pipe);
-		aud_cntrl_st2 = CPT_AUD_CNTRL_ST2;
-	}
-
-	DRM_DEBUG_DRIVER("ELD on pipe %c\n", pipe_name(pipe));
-
-	if (IS_VALLEYVIEW(connector->dev))  {
-		struct intel_encoder *intel_encoder;
-		struct intel_digital_port *intel_dig_port;
-
-		intel_encoder = intel_attached_encoder(connector);
-		intel_dig_port = enc_to_dig_port(&intel_encoder->base);
-		i = intel_dig_port->port;
-	} else {
-		i = I915_READ(aud_cntl_st);
-		i = (i >> 29) & DIP_PORT_SEL_MASK;
-		/* DIP_Port_Select, 0x1 = PortB */
-	}
-
-	if (!i) {
-		DRM_DEBUG_DRIVER("Audio directed to unknown port\n");
-		/* operate blindly on all ports */
-		eldv = IBX_ELD_VALIDB;
-		eldv |= IBX_ELD_VALIDB << 4;
-		eldv |= IBX_ELD_VALIDB << 8;
-	} else {
-		DRM_DEBUG_DRIVER("ELD on port %c\n", port_name(i));
-		eldv = IBX_ELD_VALIDB << ((i - 1) * 4);
-	}
-
-	if (intel_pipe_has_type(crtc, INTEL_OUTPUT_DISPLAYPORT)) {
-		DRM_DEBUG_DRIVER("ELD: DisplayPort detected\n");
-		eld[5] |= (1 << 2);	/* Conn_Type, 0x1 = DisplayPort */
-		I915_WRITE(aud_config, AUD_CONFIG_N_VALUE_INDEX); /* 0x1 = DP */
-	} else {
-		I915_WRITE(aud_config, audio_config_hdmi_pixel_clock(mode));
-	}
-
-	if (intel_eld_uptodate(connector,
-			       aud_cntrl_st2, eldv,
-			       aud_cntl_st, IBX_ELD_ADDRESS,
-			       hdmiw_hdmiedid))
-		return;
-
-	i = I915_READ(aud_cntrl_st2);
-	i &= ~eldv;
-	I915_WRITE(aud_cntrl_st2, i);
-
-	if (!eld[0])
-		return;
-
-	i = I915_READ(aud_cntl_st);
-	i &= ~IBX_ELD_ADDRESS;
-	I915_WRITE(aud_cntl_st, i);
-
-	len = min_t(uint8_t, eld[2], 21);	/* 84 bytes of hw ELD buffer */
-	DRM_DEBUG_DRIVER("ELD size %d\n", len);
-	for (i = 0; i < len; i++)
-		I915_WRITE(hdmiw_hdmiedid, *((uint32_t *)eld + i));
-
-	i = I915_READ(aud_cntrl_st2);
-	i |= eldv;
-	I915_WRITE(aud_cntrl_st2, i);
-}
-
-void intel_write_eld(struct drm_encoder *encoder,
-		     struct drm_display_mode *mode)
-{
-	struct drm_crtc *crtc = encoder->crtc;
-	struct drm_connector *connector;
-	struct drm_device *dev = encoder->dev;
-	struct drm_i915_private *dev_priv = dev->dev_private;
-
-	connector = drm_select_eld(encoder, mode);
-	if (!connector)
-		return;
-
-	DRM_DEBUG_DRIVER("ELD on [CONNECTOR:%d:%s], [ENCODER:%d:%s]\n",
-			 connector->base.id,
-			 connector->name,
-			 connector->encoder->base.id,
-			 connector->encoder->name);
-
-	connector->eld[6] = drm_av_sync_delay(connector, mode) / 2;
-
-	if (dev_priv->display.write_eld)
-		dev_priv->display.write_eld(connector, crtc, mode);
-}
-
 static void i845_update_cursor(struct drm_crtc *crtc, u32 base)
 {
 	struct drm_device *dev = crtc->dev;
@@ -8253,8 +8247,10 @@ static void i845_update_cursor(struct drm_crtc *crtc, u32 base)
 		intel_crtc->cursor_cntl = 0;
 	}
 
-	if (intel_crtc->cursor_base != base)
+	if (intel_crtc->cursor_base != base) {
 		I915_WRITE(_CURABASE, base);
+		intel_crtc->cursor_base = base;
+	}
 
 	if (intel_crtc->cursor_size != size) {
 		I915_WRITE(CURSIZE, size);
@@ -8294,9 +8290,13 @@ static void i9xx_update_cursor(struct drm_crtc *crtc, u32 base)
 				return;
 		}
 		cntl |= pipe << 28; /* Connect to correct pipe */
+
+		if (IS_HASWELL(dev) || IS_BROADWELL(dev))
+			cntl |= CURSOR_PIPE_CSC_ENABLE;
 	}
-	if (IS_HASWELL(dev) || IS_BROADWELL(dev))
-		cntl |= CURSOR_PIPE_CSC_ENABLE;
+
+	if (to_intel_plane(crtc->cursor)->rotation == BIT(DRM_ROTATE_180))
+		cntl |= CURSOR_ROTATE_180;
 
 	if (intel_crtc->cursor_cntl != cntl) {
 		I915_WRITE(CURCNTR(pipe), cntl);
@@ -8307,6 +8307,8 @@ static void i9xx_update_cursor(struct drm_crtc *crtc, u32 base)
 	/* and commit changes on next vblank */
 	I915_WRITE(CURBASE(pipe), base);
 	POSTING_READ(CURBASE(pipe));
+
+	intel_crtc->cursor_base = base;
 }
 
 /* If no-part of the cursor is visible on the framebuffer, then the GPU may hang... */
@@ -8353,11 +8355,17 @@ static void intel_crtc_update_cursor(struct drm_crtc *crtc,
 
 	I915_WRITE(CURPOS(pipe), pos);
 
+	/* ILK+ do this automagically */
+	if (HAS_GMCH_DISPLAY(dev) &&
+		to_intel_plane(crtc->cursor)->rotation == BIT(DRM_ROTATE_180)) {
+		base += (intel_crtc->cursor_height *
+			intel_crtc->cursor_width - 1) * 4;
+	}
+
 	if (IS_845G(dev) || IS_I865G(dev))
 		i845_update_cursor(crtc, base);
 	else
 		i9xx_update_cursor(crtc, base);
-	intel_crtc->cursor_base = base;
 }
 
 static bool cursor_size_ok(struct drm_device *dev,
@@ -8397,22 +8405,15 @@ static bool cursor_size_ok(struct drm_device *dev,
 	return true;
 }
 
-/*
- * intel_crtc_cursor_set_obj - Set cursor to specified GEM object
- *
- * Note that the object's reference will be consumed if the update fails.  If
- * the update succeeds, the reference of the old object (if any) will be
- * consumed.
- */
 static int intel_crtc_cursor_set_obj(struct drm_crtc *crtc,
 				     struct drm_i915_gem_object *obj,
 				     uint32_t width, uint32_t height)
 {
 	struct drm_device *dev = crtc->dev;
-	struct drm_i915_private *dev_priv = dev->dev_private;
+	struct drm_i915_private *dev_priv = to_i915(dev);
 	struct intel_crtc *intel_crtc = to_intel_crtc(crtc);
 	enum pipe pipe = intel_crtc->pipe;
-	unsigned old_width, stride;
+	unsigned old_width;
 	uint32_t addr;
 	int ret;
 
@@ -8424,30 +8425,11 @@ static int intel_crtc_cursor_set_obj(struct drm_crtc *crtc,
 		goto finish;
 	}
 
-	/* Check for which cursor types we support */
-	if (!cursor_size_ok(dev, width, height)) {
-		DRM_DEBUG("Cursor dimension not supported\n");
-		return -EINVAL;
-	}
-
-	stride = roundup_pow_of_two(width) * 4;
-	if (obj->base.size < stride * height) {
-		DRM_DEBUG_KMS("buffer is too small\n");
-		ret = -ENOMEM;
-		goto fail;
-	}
-
 	/* we only need to pin inside GTT if cursor is non-phy */
 	mutex_lock(&dev->struct_mutex);
 	if (!INTEL_INFO(dev)->cursor_needs_physical) {
 		unsigned alignment;
 
-		if (obj->tiling_mode) {
-			DRM_DEBUG_KMS("cursor cannot be tiled\n");
-			ret = -EINVAL;
-			goto fail_locked;
-		}
-
 		/*
 		 * Global gtt pte registers are special registers which actually
 		 * forward writes to a chunk of system memory. Which means that
@@ -8514,17 +8496,15 @@ static int intel_crtc_cursor_set_obj(struct drm_crtc *crtc,
 		if (old_width != width)
 			intel_update_watermarks(crtc);
 		intel_crtc_update_cursor(crtc, intel_crtc->cursor_bo != NULL);
-	}
 
-	intel_frontbuffer_flip(dev, INTEL_FRONTBUFFER_CURSOR(pipe));
+		intel_frontbuffer_flip(dev, INTEL_FRONTBUFFER_CURSOR(pipe));
+	}
 
 	return 0;
 fail_unpin:
 	i915_gem_object_unpin_from_display_plane(obj);
 fail_locked:
 	mutex_unlock(&dev->struct_mutex);
-fail:
-	drm_gem_object_unreference_unlocked(&obj->base);
 	return ret;
 }
 
@@ -8559,7 +8539,7 @@ __intel_framebuffer_create(struct drm_device *dev,
 
 	intel_fb = kzalloc(sizeof(*intel_fb), GFP_KERNEL);
 	if (!intel_fb) {
-		drm_gem_object_unreference_unlocked(&obj->base);
+		drm_gem_object_unreference(&obj->base);
 		return ERR_PTR(-ENOMEM);
 	}
 
@@ -8569,7 +8549,7 @@ __intel_framebuffer_create(struct drm_device *dev,
 
 	return &intel_fb->base;
 err:
-	drm_gem_object_unreference_unlocked(&obj->base);
+	drm_gem_object_unreference(&obj->base);
 	kfree(intel_fb);
 
 	return ERR_PTR(ret);
@@ -8702,6 +8682,9 @@ retry:
 		ret = drm_modeset_lock(&crtc->mutex, ctx);
 		if (ret)
 			goto fail_unlock;
+		ret = drm_modeset_lock(&crtc->primary->mutex, ctx);
+		if (ret)
+			goto fail_unlock;
 
 		old->dpms_mode = connector->dpms;
 		old->load_detect_temp = false;
@@ -8739,6 +8722,9 @@ retry:
 	ret = drm_modeset_lock(&crtc->mutex, ctx);
 	if (ret)
 		goto fail_unlock;
+	ret = drm_modeset_lock(&crtc->primary->mutex, ctx);
+	if (ret)
+		goto fail_unlock;
 	intel_encoder->new_crtc = to_intel_crtc(crtc);
 	to_intel_connector(connector)->new_encoder = intel_encoder;
 
@@ -9021,35 +9007,6 @@ struct drm_display_mode *intel_crtc_mode_get(struct drm_device *dev,
 	return mode;
 }
 
-static void intel_increase_pllclock(struct drm_device *dev,
-				    enum pipe pipe)
-{
-	struct drm_i915_private *dev_priv = dev->dev_private;
-	int dpll_reg = DPLL(pipe);
-	int dpll;
-
-	if (!HAS_GMCH_DISPLAY(dev))
-		return;
-
-	if (!dev_priv->lvds_downclock_avail)
-		return;
-
-	dpll = I915_READ(dpll_reg);
-	if (!HAS_PIPE_CXSR(dev) && (dpll & DISPLAY_RATE_SELECT_FPA1)) {
-		DRM_DEBUG_DRIVER("upclocking LVDS\n");
-
-		assert_panel_unlocked(dev_priv, pipe);
-
-		dpll &= ~DISPLAY_RATE_SELECT_FPA1;
-		I915_WRITE(dpll_reg, dpll);
-		intel_wait_for_vblank(dev, pipe);
-
-		dpll = I915_READ(dpll_reg);
-		if (dpll & DISPLAY_RATE_SELECT_FPA1)
-			DRM_DEBUG_DRIVER("failed to upclock LVDS!\n");
-	}
-}
-
 static void intel_decrease_pllclock(struct drm_crtc *crtc)
 {
 	struct drm_device *dev = crtc->dev;
@@ -9125,199 +9082,16 @@ out:
 	intel_runtime_pm_put(dev_priv);
 }
 
-
-/**
- * intel_mark_fb_busy - mark given planes as busy
- * @dev: DRM device
- * @frontbuffer_bits: bits for the affected planes
- * @ring: optional ring for asynchronous commands
- *
- * This function gets called every time the screen contents change. It can be
- * used to keep e.g. the update rate at the nominal refresh rate with DRRS.
- */
-static void intel_mark_fb_busy(struct drm_device *dev,
-			       unsigned frontbuffer_bits,
-			       struct intel_engine_cs *ring)
-{
-	struct drm_i915_private *dev_priv = dev->dev_private;
-	enum pipe pipe;
-
-	if (!i915.powersave)
-		return;
-
-	for_each_pipe(dev_priv, pipe) {
-		if (!(frontbuffer_bits & INTEL_FRONTBUFFER_ALL_MASK(pipe)))
-			continue;
-
-		intel_increase_pllclock(dev, pipe);
-		if (ring && intel_fbc_enabled(dev))
-			ring->fbc_dirty = true;
-	}
-}
-
-/**
- * intel_fb_obj_invalidate - invalidate frontbuffer object
- * @obj: GEM object to invalidate
- * @ring: set for asynchronous rendering
- *
- * This function gets called every time rendering on the given object starts and
- * frontbuffer caching (fbc, low refresh rate for DRRS, panel self refresh) must
- * be invalidated. If @ring is non-NULL any subsequent invalidation will be delayed
- * until the rendering completes or a flip on this frontbuffer plane is
- * scheduled.
- */
-void intel_fb_obj_invalidate(struct drm_i915_gem_object *obj,
-			     struct intel_engine_cs *ring)
-{
-	struct drm_device *dev = obj->base.dev;
-	struct drm_i915_private *dev_priv = dev->dev_private;
-
-	WARN_ON(!mutex_is_locked(&dev->struct_mutex));
-
-	if (!obj->frontbuffer_bits)
-		return;
-
-	if (ring) {
-		mutex_lock(&dev_priv->fb_tracking.lock);
-		dev_priv->fb_tracking.busy_bits
-			|= obj->frontbuffer_bits;
-		dev_priv->fb_tracking.flip_bits
-			&= ~obj->frontbuffer_bits;
-		mutex_unlock(&dev_priv->fb_tracking.lock);
-	}
-
-	intel_mark_fb_busy(dev, obj->frontbuffer_bits, ring);
-
-	intel_edp_psr_invalidate(dev, obj->frontbuffer_bits);
-}
-
-/**
- * intel_frontbuffer_flush - flush frontbuffer
- * @dev: DRM device
- * @frontbuffer_bits: frontbuffer plane tracking bits
- *
- * This function gets called every time rendering on the given planes has
- * completed and frontbuffer caching can be started again. Flushes will get
- * delayed if they're blocked by some oustanding asynchronous rendering.
- *
- * Can be called without any locks held.
- */
-void intel_frontbuffer_flush(struct drm_device *dev,
-			     unsigned frontbuffer_bits)
-{
-	struct drm_i915_private *dev_priv = dev->dev_private;
-
-	/* Delay flushing when rings are still busy.*/
-	mutex_lock(&dev_priv->fb_tracking.lock);
-	frontbuffer_bits &= ~dev_priv->fb_tracking.busy_bits;
-	mutex_unlock(&dev_priv->fb_tracking.lock);
-
-	intel_mark_fb_busy(dev, frontbuffer_bits, NULL);
-
-	intel_edp_psr_flush(dev, frontbuffer_bits);
-
-	/*
-	 * FIXME: Unconditional fbc flushing here is a rather gross hack and
-	 * needs to be reworked into a proper frontbuffer tracking scheme like
-	 * psr employs.
-	 */
-	if (IS_BROADWELL(dev))
-		gen8_fbc_sw_flush(dev, FBC_REND_CACHE_CLEAN);
-}
-
-/**
- * intel_fb_obj_flush - flush frontbuffer object
- * @obj: GEM object to flush
- * @retire: set when retiring asynchronous rendering
- *
- * This function gets called every time rendering on the given object has
- * completed and frontbuffer caching can be started again. If @retire is true
- * then any delayed flushes will be unblocked.
- */
-void intel_fb_obj_flush(struct drm_i915_gem_object *obj,
-			bool retire)
-{
-	struct drm_device *dev = obj->base.dev;
-	struct drm_i915_private *dev_priv = dev->dev_private;
-	unsigned frontbuffer_bits;
-
-	WARN_ON(!mutex_is_locked(&dev->struct_mutex));
-
-	if (!obj->frontbuffer_bits)
-		return;
-
-	frontbuffer_bits = obj->frontbuffer_bits;
-
-	if (retire) {
-		mutex_lock(&dev_priv->fb_tracking.lock);
-		/* Filter out new bits since rendering started. */
-		frontbuffer_bits &= dev_priv->fb_tracking.busy_bits;
-
-		dev_priv->fb_tracking.busy_bits &= ~frontbuffer_bits;
-		mutex_unlock(&dev_priv->fb_tracking.lock);
-	}
-
-	intel_frontbuffer_flush(dev, frontbuffer_bits);
-}
-
-/**
- * intel_frontbuffer_flip_prepare - prepare asnychronous frontbuffer flip
- * @dev: DRM device
- * @frontbuffer_bits: frontbuffer plane tracking bits
- *
- * This function gets called after scheduling a flip on @obj. The actual
- * frontbuffer flushing will be delayed until completion is signalled with
- * intel_frontbuffer_flip_complete. If an invalidate happens in between this
- * flush will be cancelled.
- *
- * Can be called without any locks held.
- */
-void intel_frontbuffer_flip_prepare(struct drm_device *dev,
-				    unsigned frontbuffer_bits)
-{
-	struct drm_i915_private *dev_priv = dev->dev_private;
-
-	mutex_lock(&dev_priv->fb_tracking.lock);
-	dev_priv->fb_tracking.flip_bits
-		|= frontbuffer_bits;
-	mutex_unlock(&dev_priv->fb_tracking.lock);
-}
-
-/**
- * intel_frontbuffer_flip_complete - complete asynchronous frontbuffer flush
- * @dev: DRM device
- * @frontbuffer_bits: frontbuffer plane tracking bits
- *
- * This function gets called after the flip has been latched and will complete
- * on the next vblank. It will execute the fush if it hasn't been cancalled yet.
- *
- * Can be called without any locks held.
- */
-void intel_frontbuffer_flip_complete(struct drm_device *dev,
-				     unsigned frontbuffer_bits)
-{
-	struct drm_i915_private *dev_priv = dev->dev_private;
-
-	mutex_lock(&dev_priv->fb_tracking.lock);
-	/* Mask any cancelled flips. */
-	frontbuffer_bits &= dev_priv->fb_tracking.flip_bits;
-	dev_priv->fb_tracking.flip_bits &= ~frontbuffer_bits;
-	mutex_unlock(&dev_priv->fb_tracking.lock);
-
-	intel_frontbuffer_flush(dev, frontbuffer_bits);
-}
-
 static void intel_crtc_destroy(struct drm_crtc *crtc)
 {
 	struct intel_crtc *intel_crtc = to_intel_crtc(crtc);
 	struct drm_device *dev = crtc->dev;
 	struct intel_unpin_work *work;
-	unsigned long flags;
 
-	spin_lock_irqsave(&dev->event_lock, flags);
+	spin_lock_irq(&dev->event_lock);
 	work = intel_crtc->unpin_work;
 	intel_crtc->unpin_work = NULL;
-	spin_unlock_irqrestore(&dev->event_lock, flags);
+	spin_unlock_irq(&dev->event_lock);
 
 	if (work) {
 		cancel_work_sync(&work->work);
@@ -9363,6 +9137,10 @@ static void do_intel_finish_page_flip(struct drm_device *dev,
 	if (intel_crtc == NULL)
 		return;
 
+	/*
+	 * This is called both by irq handlers and the reset code (to complete
+	 * lost pageflips) so needs the full irqsave spinlocks.
+	 */
 	spin_lock_irqsave(&dev->event_lock, flags);
 	work = intel_crtc->unpin_work;
 
@@ -9448,7 +9226,12 @@ void intel_prepare_page_flip(struct drm_device *dev, int plane)
 		to_intel_crtc(dev_priv->plane_to_crtc_mapping[plane]);
 	unsigned long flags;
 
-	/* NB: An MMIO update of the plane base pointer will also
+
+	/*
+	 * This is called both by irq handlers and the reset code (to complete
+	 * lost pageflips) so needs the full irqsave spinlocks.
+	 *
+	 * NB: An MMIO update of the plane base pointer will also
 	 * generate a page-flip completion irq, i.e. every modeset
 	 * is also accompanied by a spurious intel_prepare_page_flip().
 	 */
@@ -9738,115 +9521,128 @@ static void intel_do_mmio_flip(struct intel_crtc *intel_crtc)
 	struct intel_framebuffer *intel_fb =
 		to_intel_framebuffer(intel_crtc->base.primary->fb);
 	struct drm_i915_gem_object *obj = intel_fb->obj;
+	bool atomic_update;
+	u32 start_vbl_count;
 	u32 dspcntr;
 	u32 reg;
 
 	intel_mark_page_flip_active(intel_crtc);
 
+	atomic_update = intel_pipe_update_start(intel_crtc, &start_vbl_count);
+
 	reg = DSPCNTR(intel_crtc->plane);
 	dspcntr = I915_READ(reg);
 
-	if (INTEL_INFO(dev)->gen >= 4) {
-		if (obj->tiling_mode != I915_TILING_NONE)
-			dspcntr |= DISPPLANE_TILED;
-		else
-			dspcntr &= ~DISPPLANE_TILED;
-	}
+	if (obj->tiling_mode != I915_TILING_NONE)
+		dspcntr |= DISPPLANE_TILED;
+	else
+		dspcntr &= ~DISPPLANE_TILED;
+
 	I915_WRITE(reg, dspcntr);
 
 	I915_WRITE(DSPSURF(intel_crtc->plane),
 		   intel_crtc->unpin_work->gtt_offset);
 	POSTING_READ(DSPSURF(intel_crtc->plane));
+
+	if (atomic_update)
+		intel_pipe_update_end(intel_crtc, start_vbl_count);
 }
 
-static int intel_postpone_flip(struct drm_i915_gem_object *obj)
+static void intel_mmio_flip_work_func(struct work_struct *work)
 {
+	struct intel_crtc *intel_crtc =
+		container_of(work, struct intel_crtc, mmio_flip.work);
 	struct intel_engine_cs *ring;
-	int ret;
+	uint32_t seqno;
 
-	lockdep_assert_held(&obj->base.dev->struct_mutex);
+	seqno = intel_crtc->mmio_flip.seqno;
+	ring = intel_crtc->mmio_flip.ring;
 
-	if (!obj->last_write_seqno)
-		return 0;
+	if (seqno)
+		WARN_ON(__i915_wait_seqno(ring, seqno,
+					  intel_crtc->reset_counter,
+					  false, NULL, NULL) != 0);
 
-	ring = obj->ring;
-
-	if (i915_seqno_passed(ring->get_seqno(ring, true),
-			      obj->last_write_seqno))
-		return 0;
-
-	ret = i915_gem_check_olr(ring, obj->last_write_seqno);
-	if (ret)
-		return ret;
-
-	if (WARN_ON(!ring->irq_get(ring)))
-		return 0;
-
-	return 1;
+	intel_do_mmio_flip(intel_crtc);
 }
 
-void intel_notify_mmio_flip(struct intel_engine_cs *ring)
+static int intel_queue_mmio_flip(struct drm_device *dev,
+				 struct drm_crtc *crtc,
+				 struct drm_framebuffer *fb,
+				 struct drm_i915_gem_object *obj,
+				 struct intel_engine_cs *ring,
+				 uint32_t flags)
 {
-	struct drm_i915_private *dev_priv = to_i915(ring->dev);
-	struct intel_crtc *intel_crtc;
-	unsigned long irq_flags;
-	u32 seqno;
-
-	seqno = ring->get_seqno(ring, false);
-
-	spin_lock_irqsave(&dev_priv->mmio_flip_lock, irq_flags);
-	for_each_intel_crtc(ring->dev, intel_crtc) {
-		struct intel_mmio_flip *mmio_flip;
+	struct intel_crtc *intel_crtc = to_intel_crtc(crtc);
 
-		mmio_flip = &intel_crtc->mmio_flip;
-		if (mmio_flip->seqno == 0)
-			continue;
+	intel_crtc->mmio_flip.seqno = obj->last_write_seqno;
+	intel_crtc->mmio_flip.ring = obj->ring;
 
-		if (ring->id != mmio_flip->ring_id)
-			continue;
+	schedule_work(&intel_crtc->mmio_flip.work);
 
-		if (i915_seqno_passed(seqno, mmio_flip->seqno)) {
-			intel_do_mmio_flip(intel_crtc);
-			mmio_flip->seqno = 0;
-			ring->irq_put(ring);
-		}
-	}
-	spin_unlock_irqrestore(&dev_priv->mmio_flip_lock, irq_flags);
+	return 0;
 }
 
-static int intel_queue_mmio_flip(struct drm_device *dev,
+static int intel_gen9_queue_flip(struct drm_device *dev,
 				 struct drm_crtc *crtc,
 				 struct drm_framebuffer *fb,
 				 struct drm_i915_gem_object *obj,
 				 struct intel_engine_cs *ring,
 				 uint32_t flags)
 {
-	struct drm_i915_private *dev_priv = dev->dev_private;
 	struct intel_crtc *intel_crtc = to_intel_crtc(crtc);
-	unsigned long irq_flags;
+	uint32_t plane = 0, stride;
 	int ret;
 
-	if (WARN_ON(intel_crtc->mmio_flip.seqno))
-		return -EBUSY;
+	switch(intel_crtc->pipe) {
+	case PIPE_A:
+		plane = MI_DISPLAY_FLIP_SKL_PLANE_1_A;
+		break;
+	case PIPE_B:
+		plane = MI_DISPLAY_FLIP_SKL_PLANE_1_B;
+		break;
+	case PIPE_C:
+		plane = MI_DISPLAY_FLIP_SKL_PLANE_1_C;
+		break;
+	default:
+		WARN_ONCE(1, "unknown plane in flip command\n");
+		return -ENODEV;
+	}
 
-	ret = intel_postpone_flip(obj);
-	if (ret < 0)
-		return ret;
-	if (ret == 0) {
-		intel_do_mmio_flip(intel_crtc);
-		return 0;
+	switch (obj->tiling_mode) {
+	case I915_TILING_NONE:
+		stride = fb->pitches[0] >> 6;
+		break;
+	case I915_TILING_X:
+		stride = fb->pitches[0] >> 9;
+		break;
+	default:
+		WARN_ONCE(1, "unknown tiling in flip command\n");
+		return -ENODEV;
 	}
 
-	spin_lock_irqsave(&dev_priv->mmio_flip_lock, irq_flags);
-	intel_crtc->mmio_flip.seqno = obj->last_write_seqno;
-	intel_crtc->mmio_flip.ring_id = obj->ring->id;
-	spin_unlock_irqrestore(&dev_priv->mmio_flip_lock, irq_flags);
+	ret = intel_ring_begin(ring, 10);
+	if (ret)
+		return ret;
+
+	intel_ring_emit(ring, MI_LOAD_REGISTER_IMM(1));
+	intel_ring_emit(ring, DERRMR);
+	intel_ring_emit(ring, ~(DERRMR_PIPEA_PRI_FLIP_DONE |
+				DERRMR_PIPEB_PRI_FLIP_DONE |
+				DERRMR_PIPEC_PRI_FLIP_DONE));
+	intel_ring_emit(ring, MI_STORE_REGISTER_MEM_GEN8(1) |
+			      MI_SRM_LRM_GLOBAL_GTT);
+	intel_ring_emit(ring, DERRMR);
+	intel_ring_emit(ring, ring->scratch.gtt_offset + 256);
+	intel_ring_emit(ring, 0);
+
+	intel_ring_emit(ring, MI_DISPLAY_FLIP_I915 | plane);
+	intel_ring_emit(ring, stride << 6 | obj->tiling_mode);
+	intel_ring_emit(ring, intel_crtc->unpin_work->gtt_offset);
+
+	intel_mark_page_flip_active(intel_crtc);
+	__intel_ring_advance(ring);
 
-	/*
-	 * Double check to catch cases where irq fired before
-	 * mmio flip data was ready
-	 */
-	intel_notify_mmio_flip(obj->ring);
 	return 0;
 }
 
@@ -9905,18 +9701,19 @@ void intel_check_page_flip(struct drm_device *dev, int pipe)
 	struct drm_i915_private *dev_priv = dev->dev_private;
 	struct drm_crtc *crtc = dev_priv->pipe_to_crtc_mapping[pipe];
 	struct intel_crtc *intel_crtc = to_intel_crtc(crtc);
-	unsigned long flags;
+
+	WARN_ON(!in_irq());
 
 	if (crtc == NULL)
 		return;
 
-	spin_lock_irqsave(&dev->event_lock, flags);
+	spin_lock(&dev->event_lock);
 	if (intel_crtc->unpin_work && __intel_pageflip_stall_check(dev, crtc)) {
 		WARN_ONCE(1, "Kicking stuck page flip: queued at %d, now %d\n",
 			 intel_crtc->unpin_work->flip_queued_vblank, drm_vblank_count(dev, pipe));
 		page_flip_completed(intel_crtc);
 	}
-	spin_unlock_irqrestore(&dev->event_lock, flags);
+	spin_unlock(&dev->event_lock);
 }
 
 static int intel_crtc_page_flip(struct drm_crtc *crtc,
@@ -9932,7 +9729,6 @@ static int intel_crtc_page_flip(struct drm_crtc *crtc,
 	enum pipe pipe = intel_crtc->pipe;
 	struct intel_unpin_work *work;
 	struct intel_engine_cs *ring;
-	unsigned long flags;
 	int ret;
 
 	/*
@@ -9973,7 +9769,7 @@ static int intel_crtc_page_flip(struct drm_crtc *crtc,
 		goto free_work;
 
 	/* We borrow the event spin lock for protecting unpin_work */
-	spin_lock_irqsave(&dev->event_lock, flags);
+	spin_lock_irq(&dev->event_lock);
 	if (intel_crtc->unpin_work) {
 		/* Before declaring the flip queue wedged, check if
 		 * the hardware completed the operation behind our backs.
@@ -9983,7 +9779,7 @@ static int intel_crtc_page_flip(struct drm_crtc *crtc,
 			page_flip_completed(intel_crtc);
 		} else {
 			DRM_DEBUG_DRIVER("flip queue: crtc already busy\n");
-			spin_unlock_irqrestore(&dev->event_lock, flags);
+			spin_unlock_irq(&dev->event_lock);
 
 			drm_crtc_vblank_put(crtc);
 			kfree(work);
@@ -9991,7 +9787,7 @@ static int intel_crtc_page_flip(struct drm_crtc *crtc,
 		}
 	}
 	intel_crtc->unpin_work = work;
-	spin_unlock_irqrestore(&dev->event_lock, flags);
+	spin_unlock_irq(&dev->event_lock);
 
 	if (atomic_read(&intel_crtc->unpin_work_count) >= 2)
 		flush_workqueue(dev_priv->wq);
@@ -10029,7 +9825,7 @@ static int intel_crtc_page_flip(struct drm_crtc *crtc,
 		ring = &dev_priv->ring[RCS];
 	}
 
-	ret = intel_pin_and_fence_fb_obj(dev, obj, ring);
+	ret = intel_pin_and_fence_fb_obj(crtc->primary, fb, ring);
 	if (ret)
 		goto cleanup_pending;
 
@@ -10078,9 +9874,9 @@ cleanup_pending:
 	mutex_unlock(&dev->struct_mutex);
 
 cleanup:
-	spin_lock_irqsave(&dev->event_lock, flags);
+	spin_lock_irq(&dev->event_lock);
 	intel_crtc->unpin_work = NULL;
-	spin_unlock_irqrestore(&dev->event_lock, flags);
+	spin_unlock_irq(&dev->event_lock);
 
 	drm_crtc_vblank_put(crtc);
 free_work:
@@ -10091,9 +9887,9 @@ out_hang:
 		intel_crtc_wait_for_pending_flips(crtc);
 		ret = intel_pipe_set_base(crtc, crtc->x, crtc->y, fb);
 		if (ret == 0 && event) {
-			spin_lock_irqsave(&dev->event_lock, flags);
+			spin_lock_irq(&dev->event_lock);
 			drm_send_vblank_event(dev, pipe, event);
-			spin_unlock_irqrestore(&dev->event_lock, flags);
+			spin_unlock_irq(&dev->event_lock);
 		}
 	}
 	return ret;
@@ -10289,6 +10085,10 @@ static void intel_dump_pipe_config(struct intel_crtc *crtc,
 		      pipe_config->dp_m2_n2.link_n,
 		      pipe_config->dp_m2_n2.tu);
 
+	DRM_DEBUG_KMS("audio: %i, infoframes: %i\n",
+		      pipe_config->has_audio,
+		      pipe_config->has_infoframe);
+
 	DRM_DEBUG_KMS("requested mode:\n");
 	drm_mode_debug_printmodeline(&pipe_config->requested_mode);
 	DRM_DEBUG_KMS("adjusted mode:\n");
@@ -10350,6 +10150,48 @@ static bool check_encoder_cloning(struct intel_crtc *crtc)
 	return true;
 }
 
+static bool check_digital_port_conflicts(struct drm_device *dev)
+{
+	struct intel_connector *connector;
+	unsigned int used_ports = 0;
+
+	/*
+	 * Walk the connector list instead of the encoder
+	 * list to detect the problem on ddi platforms
+	 * where there's just one encoder per digital port.
+	 */
+	list_for_each_entry(connector,
+			    &dev->mode_config.connector_list, base.head) {
+		struct intel_encoder *encoder = connector->new_encoder;
+
+		if (!encoder)
+			continue;
+
+		WARN_ON(!encoder->new_crtc);
+
+		switch (encoder->type) {
+			unsigned int port_mask;
+		case INTEL_OUTPUT_UNKNOWN:
+			if (WARN_ON(!HAS_DDI(dev)))
+				break;
+		case INTEL_OUTPUT_DISPLAYPORT:
+		case INTEL_OUTPUT_HDMI:
+		case INTEL_OUTPUT_EDP:
+			port_mask = 1 << enc_to_dig_port(&encoder->base)->port;
+
+			/* the same port mustn't appear more than once */
+			if (used_ports & port_mask)
+				return false;
+
+			used_ports |= port_mask;
+		default:
+			break;
+		}
+	}
+
+	return true;
+}
+
 static struct intel_crtc_config *
 intel_modeset_pipe_config(struct drm_crtc *crtc,
 			  struct drm_framebuffer *fb,
@@ -10366,6 +10208,11 @@ intel_modeset_pipe_config(struct drm_crtc *crtc,
 		return ERR_PTR(-EINVAL);
 	}
 
+	if (!check_digital_port_conflicts(dev)) {
+		DRM_DEBUG_KMS("rejecting conflicting digital port configuration\n");
+		return ERR_PTR(-EINVAL);
+	}
+
 	pipe_config = kzalloc(sizeof(*pipe_config), GFP_KERNEL);
 	if (!pipe_config)
 		return ERR_PTR(-ENOMEM);
@@ -10571,10 +10418,13 @@ static bool intel_crtc_in_use(struct drm_crtc *crtc)
 static void
 intel_modeset_update_state(struct drm_device *dev, unsigned prepare_pipes)
 {
+	struct drm_i915_private *dev_priv = dev->dev_private;
 	struct intel_encoder *intel_encoder;
 	struct intel_crtc *intel_crtc;
 	struct drm_connector *connector;
 
+	intel_shared_dpll_commit(dev_priv);
+
 	for_each_intel_encoder(dev, intel_encoder) {
 		if (!intel_encoder->base.crtc)
 			continue;
@@ -10754,6 +10604,7 @@ intel_pipe_config_compare(struct drm_device *dev,
 	if ((INTEL_INFO(dev)->gen < 8 && !IS_HASWELL(dev)) ||
 	    IS_VALLEYVIEW(dev))
 		PIPE_CONF_CHECK_I(limited_color_range);
+	PIPE_CONF_CHECK_I(has_infoframe);
 
 	PIPE_CONF_CHECK_I(has_audio);
 
@@ -10810,6 +10661,9 @@ intel_pipe_config_compare(struct drm_device *dev,
 	PIPE_CONF_CHECK_X(dpll_hw_state.fp0);
 	PIPE_CONF_CHECK_X(dpll_hw_state.fp1);
 	PIPE_CONF_CHECK_X(dpll_hw_state.wrpll);
+	PIPE_CONF_CHECK_X(dpll_hw_state.ctrl1);
+	PIPE_CONF_CHECK_X(dpll_hw_state.cfgcr1);
+	PIPE_CONF_CHECK_X(dpll_hw_state.cfgcr2);
 
 	if (IS_G4X(dev) || INTEL_INFO(dev)->gen >= 5)
 		PIPE_CONF_CHECK_I(pipe_bpp);
@@ -10827,6 +10681,56 @@ intel_pipe_config_compare(struct drm_device *dev,
 	return true;
 }
 
+static void check_wm_state(struct drm_device *dev)
+{
+	struct drm_i915_private *dev_priv = dev->dev_private;
+	struct skl_ddb_allocation hw_ddb, *sw_ddb;
+	struct intel_crtc *intel_crtc;
+	int plane;
+
+	if (INTEL_INFO(dev)->gen < 9)
+		return;
+
+	skl_ddb_get_hw_state(dev_priv, &hw_ddb);
+	sw_ddb = &dev_priv->wm.skl_hw.ddb;
+
+	for_each_intel_crtc(dev, intel_crtc) {
+		struct skl_ddb_entry *hw_entry, *sw_entry;
+		const enum pipe pipe = intel_crtc->pipe;
+
+		if (!intel_crtc->active)
+			continue;
+
+		/* planes */
+		for_each_plane(pipe, plane) {
+			hw_entry = &hw_ddb.plane[pipe][plane];
+			sw_entry = &sw_ddb->plane[pipe][plane];
+
+			if (skl_ddb_entry_equal(hw_entry, sw_entry))
+				continue;
+
+			DRM_ERROR("mismatch in DDB state pipe %c plane %d "
+				  "(expected (%u,%u), found (%u,%u))\n",
+				  pipe_name(pipe), plane + 1,
+				  sw_entry->start, sw_entry->end,
+				  hw_entry->start, hw_entry->end);
+		}
+
+		/* cursor */
+		hw_entry = &hw_ddb.cursor[pipe];
+		sw_entry = &sw_ddb->cursor[pipe];
+
+		if (skl_ddb_entry_equal(hw_entry, sw_entry))
+			continue;
+
+		DRM_ERROR("mismatch in DDB state pipe %c cursor "
+			  "(expected (%u,%u), found (%u,%u))\n",
+			  pipe_name(pipe),
+			  sw_entry->start, sw_entry->end,
+			  hw_entry->start, hw_entry->end);
+	}
+}
+
 static void
 check_connector_state(struct drm_device *dev)
 {
@@ -10993,9 +10897,9 @@ check_shared_dpll_state(struct drm_device *dev)
 
 		active = pll->get_hw_state(dev_priv, pll, &dpll_hw_state);
 
-		WARN(pll->active > pll->refcount,
+		WARN(pll->active > hweight32(pll->config.crtc_mask),
 		     "more active pll users than references: %i vs %i\n",
-		     pll->active, pll->refcount);
+		     pll->active, hweight32(pll->config.crtc_mask));
 		WARN(pll->active && !pll->on,
 		     "pll in active use but not on in sw tracking\n");
 		WARN(pll->on && !pll->active,
@@ -11013,11 +10917,11 @@ check_shared_dpll_state(struct drm_device *dev)
 		WARN(pll->active != active_crtcs,
 		     "pll active crtcs mismatch (expected %i, found %i)\n",
 		     pll->active, active_crtcs);
-		WARN(pll->refcount != enabled_crtcs,
+		WARN(hweight32(pll->config.crtc_mask) != enabled_crtcs,
 		     "pll enabled crtcs mismatch (expected %i, found %i)\n",
-		     pll->refcount, enabled_crtcs);
+		     hweight32(pll->config.crtc_mask), enabled_crtcs);
 
-		WARN(pll->on && memcmp(&pll->hw_state, &dpll_hw_state,
+		WARN(pll->on && memcmp(&pll->config.hw_state, &dpll_hw_state,
 				       sizeof(dpll_hw_state)),
 		     "pll hw state mismatch\n");
 	}
@@ -11026,6 +10930,7 @@ check_shared_dpll_state(struct drm_device *dev)
 void
 intel_modeset_check_state(struct drm_device *dev)
 {
+	check_wm_state(dev);
 	check_connector_state(dev);
 	check_encoder_state(dev);
 	check_crtc_state(dev);
@@ -11076,50 +10981,67 @@ static void update_scanline_offset(struct intel_crtc *crtc)
 
 		crtc->scanline_offset = vtotal - 1;
 	} else if (HAS_DDI(dev) &&
-		   intel_pipe_has_type(&crtc->base, INTEL_OUTPUT_HDMI)) {
+		   intel_pipe_has_type(crtc, INTEL_OUTPUT_HDMI)) {
 		crtc->scanline_offset = 2;
 	} else
 		crtc->scanline_offset = 1;
 }
 
+static struct intel_crtc_config *
+intel_modeset_compute_config(struct drm_crtc *crtc,
+			     struct drm_display_mode *mode,
+			     struct drm_framebuffer *fb,
+			     unsigned *modeset_pipes,
+			     unsigned *prepare_pipes,
+			     unsigned *disable_pipes)
+{
+	struct intel_crtc_config *pipe_config = NULL;
+
+	intel_modeset_affected_pipes(crtc, modeset_pipes,
+				     prepare_pipes, disable_pipes);
+
+	if ((*modeset_pipes) == 0)
+		goto out;
+
+	/*
+	 * Note this needs changes when we start tracking multiple modes
+	 * and crtcs.  At that point we'll need to compute the whole config
+	 * (i.e. one pipe_config for each crtc) rather than just the one
+	 * for this crtc.
+	 */
+	pipe_config = intel_modeset_pipe_config(crtc, fb, mode);
+	if (IS_ERR(pipe_config)) {
+		goto out;
+	}
+	intel_dump_pipe_config(to_intel_crtc(crtc), pipe_config,
+			       "[modeset]");
+
+out:
+	return pipe_config;
+}
+
 static int __intel_set_mode(struct drm_crtc *crtc,
 			    struct drm_display_mode *mode,
-			    int x, int y, struct drm_framebuffer *fb)
+			    int x, int y, struct drm_framebuffer *fb,
+			    struct intel_crtc_config *pipe_config,
+			    unsigned modeset_pipes,
+			    unsigned prepare_pipes,
+			    unsigned disable_pipes)
 {
 	struct drm_device *dev = crtc->dev;
 	struct drm_i915_private *dev_priv = dev->dev_private;
 	struct drm_display_mode *saved_mode;
-	struct intel_crtc_config *pipe_config = NULL;
 	struct intel_crtc *intel_crtc;
-	unsigned disable_pipes, prepare_pipes, modeset_pipes;
 	int ret = 0;
 
 	saved_mode = kmalloc(sizeof(*saved_mode), GFP_KERNEL);
 	if (!saved_mode)
 		return -ENOMEM;
 
-	intel_modeset_affected_pipes(crtc, &modeset_pipes,
-				     &prepare_pipes, &disable_pipes);
-
 	*saved_mode = crtc->mode;
 
-	/* Hack: Because we don't (yet) support global modeset on multiple
-	 * crtcs, we don't keep track of the new mode for more than one crtc.
-	 * Hence simply check whether any bit is set in modeset_pipes in all the
-	 * pieces of code that are not yet converted to deal with mutliple crtcs
-	 * changing their mode at the same time. */
-	if (modeset_pipes) {
-		pipe_config = intel_modeset_pipe_config(crtc, fb, mode);
-		if (IS_ERR(pipe_config)) {
-			ret = PTR_ERR(pipe_config);
-			pipe_config = NULL;
-
-			goto out;
-		}
-		intel_dump_pipe_config(to_intel_crtc(crtc), pipe_config,
-				       "[modeset]");
+	if (modeset_pipes)
 		to_intel_crtc(crtc)->new_config = pipe_config;
-	}
 
 	/*
 	 * See if the config requires any additional preparation, e.g.
@@ -11135,6 +11057,22 @@ static int __intel_set_mode(struct drm_crtc *crtc,
 		prepare_pipes &= ~disable_pipes;
 	}
 
+	if (dev_priv->display.crtc_compute_clock) {
+		unsigned clear_pipes = modeset_pipes | disable_pipes;
+
+		ret = intel_shared_dpll_start_config(dev_priv, clear_pipes);
+		if (ret)
+			goto done;
+
+		for_each_intel_crtc_masked(dev, modeset_pipes, intel_crtc) {
+			ret = dev_priv->display.crtc_compute_clock(intel_crtc);
+			if (ret) {
+				intel_shared_dpll_abort_config(dev_priv);
+				goto done;
+			}
+		}
+	}
+
 	for_each_intel_crtc_masked(dev, disable_pipes, intel_crtc)
 		intel_crtc_disable(&intel_crtc->base);
 
@@ -11145,6 +11083,10 @@ static int __intel_set_mode(struct drm_crtc *crtc,
 
 	/* crtc->mode is already used by the ->mode_set callbacks, hence we need
 	 * to set it here already despite that we pass it down the callchain.
+	 *
+	 * Note we'll need to fix this up when we start tracking multiple
+	 * pipes; here we assume a single modeset_pipe and only track the
+	 * single crtc and mode.
 	 */
 	if (modeset_pipes) {
 		crtc->mode = *mode;
@@ -11166,8 +11108,7 @@ static int __intel_set_mode(struct drm_crtc *crtc,
 	 * update the the output configuration. */
 	intel_modeset_update_state(dev, prepare_pipes);
 
-	if (dev_priv->display.modeset_global_resources)
-		dev_priv->display.modeset_global_resources(dev);
+	modeset_update_crtc_power_domains(dev);
 
 	/* Set up the DPLL and any encoders state that needs to adjust or depend
 	 * on the DPLL.
@@ -11178,9 +11119,7 @@ static int __intel_set_mode(struct drm_crtc *crtc,
 		struct drm_i915_gem_object *obj = intel_fb_obj(fb);
 
 		mutex_lock(&dev->struct_mutex);
-		ret = intel_pin_and_fence_fb_obj(dev,
-						 obj,
-						 NULL);
+		ret = intel_pin_and_fence_fb_obj(crtc->primary, fb, NULL);
 		if (ret != 0) {
 			DRM_ERROR("pin & fence failed\n");
 			mutex_unlock(&dev->struct_mutex);
@@ -11195,11 +11134,6 @@ static int __intel_set_mode(struct drm_crtc *crtc,
 		crtc->primary->fb = fb;
 		crtc->x = x;
 		crtc->y = y;
-
-		ret = dev_priv->display.crtc_mode_set(&intel_crtc->base,
-						      x, y, fb);
-		if (ret)
-			goto done;
 	}
 
 	/* Now enable the clocks, plane, pipe, and connectors that we set up. */
@@ -11214,19 +11148,23 @@ done:
 	if (ret && crtc->enabled)
 		crtc->mode = *saved_mode;
 
-out:
 	kfree(pipe_config);
 	kfree(saved_mode);
 	return ret;
 }
 
-static int intel_set_mode(struct drm_crtc *crtc,
-			  struct drm_display_mode *mode,
-			  int x, int y, struct drm_framebuffer *fb)
+static int intel_set_mode_pipes(struct drm_crtc *crtc,
+				struct drm_display_mode *mode,
+				int x, int y, struct drm_framebuffer *fb,
+				struct intel_crtc_config *pipe_config,
+				unsigned modeset_pipes,
+				unsigned prepare_pipes,
+				unsigned disable_pipes)
 {
 	int ret;
 
-	ret = __intel_set_mode(crtc, mode, x, y, fb);
+	ret = __intel_set_mode(crtc, mode, x, y, fb, pipe_config, modeset_pipes,
+			       prepare_pipes, disable_pipes);
 
 	if (ret == 0)
 		intel_modeset_check_state(crtc->dev);
@@ -11234,6 +11172,26 @@ static int intel_set_mode(struct drm_crtc *crtc,
 	return ret;
 }
 
+static int intel_set_mode(struct drm_crtc *crtc,
+			  struct drm_display_mode *mode,
+			  int x, int y, struct drm_framebuffer *fb)
+{
+	struct intel_crtc_config *pipe_config;
+	unsigned modeset_pipes, prepare_pipes, disable_pipes;
+
+	pipe_config = intel_modeset_compute_config(crtc, mode, fb,
+						   &modeset_pipes,
+						   &prepare_pipes,
+						   &disable_pipes);
+
+	if (IS_ERR(pipe_config))
+		return PTR_ERR(pipe_config);
+
+	return intel_set_mode_pipes(crtc, mode, x, y, fb, pipe_config,
+				    modeset_pipes, prepare_pipes,
+				    disable_pipes);
+}
+
 void intel_crtc_restore_mode(struct drm_crtc *crtc)
 {
 	intel_set_mode(crtc, &crtc->mode, crtc->x, crtc->y, crtc->primary->fb);
@@ -11562,6 +11520,8 @@ static int intel_crtc_set_config(struct drm_mode_set *set)
 	struct drm_device *dev;
 	struct drm_mode_set save_set;
 	struct intel_set_config *config;
+	struct intel_crtc_config *pipe_config;
+	unsigned modeset_pipes, prepare_pipes, disable_pipes;
 	int ret;
 
 	BUG_ON(!set);
@@ -11607,9 +11567,38 @@ static int intel_crtc_set_config(struct drm_mode_set *set)
 	if (ret)
 		goto fail;
 
+	pipe_config = intel_modeset_compute_config(set->crtc, set->mode,
+						   set->fb,
+						   &modeset_pipes,
+						   &prepare_pipes,
+						   &disable_pipes);
+	if (IS_ERR(pipe_config)) {
+		ret = PTR_ERR(pipe_config);
+		goto fail;
+	} else if (pipe_config) {
+		if (pipe_config->has_audio !=
+		    to_intel_crtc(set->crtc)->config.has_audio)
+			config->mode_changed = true;
+
+		/*
+		 * Note we have an issue here with infoframes: current code
+		 * only updates them on the full mode set path per hw
+		 * requirements.  So here we should be checking for any
+		 * required changes and forcing a mode set.
+		 */
+	}
+
+	/* set_mode will free it in the mode_changed case */
+	if (!config->mode_changed)
+		kfree(pipe_config);
+
+	intel_update_pipe_size(to_intel_crtc(set->crtc));
+
 	if (config->mode_changed) {
-		ret = intel_set_mode(set->crtc, set->mode,
-				     set->x, set->y, set->fb);
+		ret = intel_set_mode_pipes(set->crtc, set->mode,
+					   set->x, set->y, set->fb, pipe_config,
+					   modeset_pipes, prepare_pipes,
+					   disable_pipes);
 	} else if (config->fb_changed) {
 		struct intel_crtc *intel_crtc = to_intel_crtc(set->crtc);
 
@@ -11679,7 +11668,7 @@ static bool ibx_pch_dpll_get_hw_state(struct drm_i915_private *dev_priv,
 {
 	uint32_t val;
 
-	if (!intel_display_power_enabled(dev_priv, POWER_DOMAIN_PLLS))
+	if (!intel_display_power_is_enabled(dev_priv, POWER_DOMAIN_PLLS))
 		return false;
 
 	val = I915_READ(PCH_DPLL(pll->id));
@@ -11693,8 +11682,8 @@ static bool ibx_pch_dpll_get_hw_state(struct drm_i915_private *dev_priv,
 static void ibx_pch_dpll_mode_set(struct drm_i915_private *dev_priv,
 				  struct intel_shared_dpll *pll)
 {
-	I915_WRITE(PCH_FP0(pll->id), pll->hw_state.fp0);
-	I915_WRITE(PCH_FP1(pll->id), pll->hw_state.fp1);
+	I915_WRITE(PCH_FP0(pll->id), pll->config.hw_state.fp0);
+	I915_WRITE(PCH_FP1(pll->id), pll->config.hw_state.fp1);
 }
 
 static void ibx_pch_dpll_enable(struct drm_i915_private *dev_priv,
@@ -11703,7 +11692,7 @@ static void ibx_pch_dpll_enable(struct drm_i915_private *dev_priv,
 	/* PCH refclock must be enabled first */
 	ibx_assert_pch_refclk_enabled(dev_priv);
 
-	I915_WRITE(PCH_DPLL(pll->id), pll->hw_state.dpll);
+	I915_WRITE(PCH_DPLL(pll->id), pll->config.hw_state.dpll);
 
 	/* Wait for the clocks to stabilize. */
 	POSTING_READ(PCH_DPLL(pll->id));
@@ -11714,7 +11703,7 @@ static void ibx_pch_dpll_enable(struct drm_i915_private *dev_priv,
 	 *
 	 * So write it again.
 	 */
-	I915_WRITE(PCH_DPLL(pll->id), pll->hw_state.dpll);
+	I915_WRITE(PCH_DPLL(pll->id), pll->config.hw_state.dpll);
 	POSTING_READ(PCH_DPLL(pll->id));
 	udelay(200);
 }
@@ -11813,161 +11802,195 @@ disable_unpin:
 }
 
 static int
-intel_primary_plane_setplane(struct drm_plane *plane, struct drm_crtc *crtc,
-			     struct drm_framebuffer *fb, int crtc_x, int crtc_y,
-			     unsigned int crtc_w, unsigned int crtc_h,
-			     uint32_t src_x, uint32_t src_y,
-			     uint32_t src_w, uint32_t src_h)
+intel_check_primary_plane(struct drm_plane *plane,
+			  struct intel_plane_state *state)
 {
+	struct drm_crtc *crtc = state->crtc;
+	struct drm_framebuffer *fb = state->fb;
+	struct drm_rect *dest = &state->dst;
+	struct drm_rect *src = &state->src;
+	const struct drm_rect *clip = &state->clip;
+
+	return drm_plane_helper_check_update(plane, crtc, fb,
+					     src, dest, clip,
+					     DRM_PLANE_HELPER_NO_SCALING,
+					     DRM_PLANE_HELPER_NO_SCALING,
+					     false, true, &state->visible);
+}
+
+static int
+intel_prepare_primary_plane(struct drm_plane *plane,
+			    struct intel_plane_state *state)
+{
+	struct drm_crtc *crtc = state->crtc;
+	struct drm_framebuffer *fb = state->fb;
 	struct drm_device *dev = crtc->dev;
-	struct drm_i915_private *dev_priv = dev->dev_private;
 	struct intel_crtc *intel_crtc = to_intel_crtc(crtc);
+	enum pipe pipe = intel_crtc->pipe;
 	struct drm_i915_gem_object *obj = intel_fb_obj(fb);
 	struct drm_i915_gem_object *old_obj = intel_fb_obj(plane->fb);
-	struct drm_rect dest = {
-		/* integer pixels */
-		.x1 = crtc_x,
-		.y1 = crtc_y,
-		.x2 = crtc_x + crtc_w,
-		.y2 = crtc_y + crtc_h,
-	};
-	struct drm_rect src = {
-		/* 16.16 fixed point */
-		.x1 = src_x,
-		.y1 = src_y,
-		.x2 = src_x + src_w,
-		.y2 = src_y + src_h,
-	};
-	const struct drm_rect clip = {
-		/* integer pixels */
-		.x2 = intel_crtc->active ? intel_crtc->config.pipe_src_w : 0,
-		.y2 = intel_crtc->active ? intel_crtc->config.pipe_src_h : 0,
-	};
-	const struct {
-		int crtc_x, crtc_y;
-		unsigned int crtc_w, crtc_h;
-		uint32_t src_x, src_y, src_w, src_h;
-	} orig = {
-		.crtc_x = crtc_x,
-		.crtc_y = crtc_y,
-		.crtc_w = crtc_w,
-		.crtc_h = crtc_h,
-		.src_x = src_x,
-		.src_y = src_y,
-		.src_w = src_w,
-		.src_h = src_h,
-	};
-	struct intel_plane *intel_plane = to_intel_plane(plane);
-	bool visible;
 	int ret;
 
-	ret = drm_plane_helper_check_update(plane, crtc, fb,
-					    &src, &dest, &clip,
-					    DRM_PLANE_HELPER_NO_SCALING,
-					    DRM_PLANE_HELPER_NO_SCALING,
-					    false, true, &visible);
+	intel_crtc_wait_for_pending_flips(crtc);
 
-	if (ret)
-		return ret;
+	if (intel_crtc_has_pending_flip(crtc)) {
+		DRM_ERROR("pipe is still busy with an old pageflip\n");
+		return -EBUSY;
+	}
 
-	/*
-	 * If the CRTC isn't enabled, we're just pinning the framebuffer,
-	 * updating the fb pointer, and returning without touching the
-	 * hardware.  This allows us to later do a drmModeSetCrtc with fb=-1 to
-	 * turn on the display with all planes setup as desired.
-	 */
-	if (!crtc->enabled) {
+	if (old_obj != obj) {
 		mutex_lock(&dev->struct_mutex);
-
-		/*
-		 * If we already called setplane while the crtc was disabled,
-		 * we may have an fb pinned; unpin it.
-		 */
-		if (plane->fb)
-			intel_unpin_fb_obj(old_obj);
-
-		i915_gem_track_fb(old_obj, obj,
-				  INTEL_FRONTBUFFER_PRIMARY(intel_crtc->pipe));
-
-		/* Pin and return without programming hardware */
-		ret = intel_pin_and_fence_fb_obj(dev, obj, NULL);
+		ret = intel_pin_and_fence_fb_obj(plane, fb, NULL);
+		if (ret == 0)
+			i915_gem_track_fb(old_obj, obj,
+					  INTEL_FRONTBUFFER_PRIMARY(pipe));
 		mutex_unlock(&dev->struct_mutex);
-
-		return ret;
+		if (ret != 0) {
+			DRM_DEBUG_KMS("pin & fence failed\n");
+			return ret;
+		}
 	}
 
-	intel_crtc_wait_for_pending_flips(crtc);
+	return 0;
+}
 
-	/*
-	 * If clipping results in a non-visible primary plane, we'll disable
-	 * the primary plane.  Note that this is a bit different than what
-	 * happens if userspace explicitly disables the plane by passing fb=0
-	 * because plane->fb still gets set and pinned.
-	 */
-	if (!visible) {
-		mutex_lock(&dev->struct_mutex);
+static void
+intel_commit_primary_plane(struct drm_plane *plane,
+			   struct intel_plane_state *state)
+{
+	struct drm_crtc *crtc = state->crtc;
+	struct drm_framebuffer *fb = state->fb;
+	struct drm_device *dev = crtc->dev;
+	struct drm_i915_private *dev_priv = dev->dev_private;
+	struct intel_crtc *intel_crtc = to_intel_crtc(crtc);
+	enum pipe pipe = intel_crtc->pipe;
+	struct drm_framebuffer *old_fb = plane->fb;
+	struct drm_i915_gem_object *obj = intel_fb_obj(fb);
+	struct drm_i915_gem_object *old_obj = intel_fb_obj(plane->fb);
+	struct intel_plane *intel_plane = to_intel_plane(plane);
+	struct drm_rect *src = &state->src;
 
+	crtc->primary->fb = fb;
+	crtc->x = src->x1 >> 16;
+	crtc->y = src->y1 >> 16;
+
+	intel_plane->crtc_x = state->orig_dst.x1;
+	intel_plane->crtc_y = state->orig_dst.y1;
+	intel_plane->crtc_w = drm_rect_width(&state->orig_dst);
+	intel_plane->crtc_h = drm_rect_height(&state->orig_dst);
+	intel_plane->src_x = state->orig_src.x1;
+	intel_plane->src_y = state->orig_src.y1;
+	intel_plane->src_w = drm_rect_width(&state->orig_src);
+	intel_plane->src_h = drm_rect_height(&state->orig_src);
+	intel_plane->obj = obj;
+
+	if (intel_crtc->active) {
 		/*
-		 * Try to pin the new fb first so that we can bail out if we
-		 * fail.
+		 * FBC does not work on some platforms for rotated
+		 * planes, so disable it when rotation is not 0 and
+		 * update it when rotation is set back to 0.
+		 *
+		 * FIXME: This is redundant with the fbc update done in
+		 * the primary plane enable function except that that
+		 * one is done too late. We eventually need to unify
+		 * this.
 		 */
-		if (plane->fb != fb) {
-			ret = intel_pin_and_fence_fb_obj(dev, obj, NULL);
-			if (ret) {
-				mutex_unlock(&dev->struct_mutex);
-				return ret;
-			}
+		if (intel_crtc->primary_enabled &&
+		    INTEL_INFO(dev)->gen <= 4 && !IS_G4X(dev) &&
+		    dev_priv->fbc.plane == intel_crtc->plane &&
+		    intel_plane->rotation != BIT(DRM_ROTATE_0)) {
+			intel_disable_fbc(dev);
 		}
 
-		i915_gem_track_fb(old_obj, obj,
-				  INTEL_FRONTBUFFER_PRIMARY(intel_crtc->pipe));
-
-		if (intel_crtc->primary_enabled)
-			intel_disable_primary_hw_plane(plane, crtc);
+		if (state->visible) {
+			bool was_enabled = intel_crtc->primary_enabled;
 
+			/* FIXME: kill this fastboot hack */
+			intel_update_pipe_size(intel_crtc);
 
-		if (plane->fb != fb)
-			if (plane->fb)
-				intel_unpin_fb_obj(old_obj);
+			intel_crtc->primary_enabled = true;
 
-		mutex_unlock(&dev->struct_mutex);
+			dev_priv->display.update_primary_plane(crtc, plane->fb,
+					crtc->x, crtc->y);
 
-	} else {
-		if (intel_crtc && intel_crtc->active &&
-		    intel_crtc->primary_enabled) {
 			/*
-			 * FBC does not work on some platforms for rotated
-			 * planes, so disable it when rotation is not 0 and
-			 * update it when rotation is set back to 0.
-			 *
-			 * FIXME: This is redundant with the fbc update done in
-			 * the primary plane enable function except that that
-			 * one is done too late. We eventually need to unify
-			 * this.
+			 * BDW signals flip done immediately if the plane
+			 * is disabled, even if the plane enable is already
+			 * armed to occur at the next vblank :(
 			 */
-			if (INTEL_INFO(dev)->gen <= 4 && !IS_G4X(dev) &&
-			    dev_priv->fbc.plane == intel_crtc->plane &&
-			    intel_plane->rotation != BIT(DRM_ROTATE_0)) {
-				intel_disable_fbc(dev);
-			}
+			if (IS_BROADWELL(dev) && !was_enabled)
+				intel_wait_for_vblank(dev, intel_crtc->pipe);
+		} else {
+			/*
+			 * If clipping results in a non-visible primary plane,
+			 * we'll disable the primary plane.  Note that this is
+			 * a bit different than what happens if userspace
+			 * explicitly disables the plane by passing fb=0
+			 * because plane->fb still gets set and pinned.
+			 */
+			intel_disable_primary_hw_plane(plane, crtc);
 		}
-		ret = intel_pipe_set_base(crtc, src.x1, src.y1, fb);
-		if (ret)
-			return ret;
 
-		if (!intel_crtc->primary_enabled)
-			intel_enable_primary_hw_plane(plane, crtc);
+		intel_frontbuffer_flip(dev, INTEL_FRONTBUFFER_PRIMARY(pipe));
+
+		mutex_lock(&dev->struct_mutex);
+		intel_update_fbc(dev);
+		mutex_unlock(&dev->struct_mutex);
 	}
 
-	intel_plane->crtc_x = orig.crtc_x;
-	intel_plane->crtc_y = orig.crtc_y;
-	intel_plane->crtc_w = orig.crtc_w;
-	intel_plane->crtc_h = orig.crtc_h;
-	intel_plane->src_x = orig.src_x;
-	intel_plane->src_y = orig.src_y;
-	intel_plane->src_w = orig.src_w;
-	intel_plane->src_h = orig.src_h;
-	intel_plane->obj = obj;
+	if (old_fb && old_fb != fb) {
+		if (intel_crtc->active)
+			intel_wait_for_vblank(dev, intel_crtc->pipe);
+
+		mutex_lock(&dev->struct_mutex);
+		intel_unpin_fb_obj(old_obj);
+		mutex_unlock(&dev->struct_mutex);
+	}
+}
+
+static int
+intel_primary_plane_setplane(struct drm_plane *plane, struct drm_crtc *crtc,
+			     struct drm_framebuffer *fb, int crtc_x, int crtc_y,
+			     unsigned int crtc_w, unsigned int crtc_h,
+			     uint32_t src_x, uint32_t src_y,
+			     uint32_t src_w, uint32_t src_h)
+{
+	struct intel_plane_state state;
+	struct intel_crtc *intel_crtc = to_intel_crtc(crtc);
+	int ret;
+
+	state.crtc = crtc;
+	state.fb = fb;
+
+	/* sample coordinates in 16.16 fixed point */
+	state.src.x1 = src_x;
+	state.src.x2 = src_x + src_w;
+	state.src.y1 = src_y;
+	state.src.y2 = src_y + src_h;
+
+	/* integer pixels */
+	state.dst.x1 = crtc_x;
+	state.dst.x2 = crtc_x + crtc_w;
+	state.dst.y1 = crtc_y;
+	state.dst.y2 = crtc_y + crtc_h;
+
+	state.clip.x1 = 0;
+	state.clip.y1 = 0;
+	state.clip.x2 = intel_crtc->active ? intel_crtc->config.pipe_src_w : 0;
+	state.clip.y2 = intel_crtc->active ? intel_crtc->config.pipe_src_h : 0;
+
+	state.orig_src = state.src;
+	state.orig_dst = state.dst;
+
+	ret = intel_check_primary_plane(plane, &state);
+	if (ret)
+		return ret;
+
+	ret = intel_prepare_primary_plane(plane, &state);
+	if (ret)
+		return ret;
+
+	intel_commit_primary_plane(plane, &state);
 
 	return 0;
 }
@@ -12046,51 +12069,92 @@ intel_cursor_plane_disable(struct drm_plane *plane)
 }
 
 static int
-intel_cursor_plane_update(struct drm_plane *plane, struct drm_crtc *crtc,
-			  struct drm_framebuffer *fb, int crtc_x, int crtc_y,
-			  unsigned int crtc_w, unsigned int crtc_h,
-			  uint32_t src_x, uint32_t src_y,
-			  uint32_t src_w, uint32_t src_h)
+intel_check_cursor_plane(struct drm_plane *plane,
+			 struct intel_plane_state *state)
 {
-	struct intel_crtc *intel_crtc = to_intel_crtc(crtc);
-	struct intel_framebuffer *intel_fb = to_intel_framebuffer(fb);
-	struct drm_i915_gem_object *obj = intel_fb->obj;
-	struct drm_rect dest = {
-		/* integer pixels */
-		.x1 = crtc_x,
-		.y1 = crtc_y,
-		.x2 = crtc_x + crtc_w,
-		.y2 = crtc_y + crtc_h,
-	};
-	struct drm_rect src = {
-		/* 16.16 fixed point */
-		.x1 = src_x,
-		.y1 = src_y,
-		.x2 = src_x + src_w,
-		.y2 = src_y + src_h,
-	};
-	const struct drm_rect clip = {
-		/* integer pixels */
-		.x2 = intel_crtc->active ? intel_crtc->config.pipe_src_w : 0,
-		.y2 = intel_crtc->active ? intel_crtc->config.pipe_src_h : 0,
-	};
-	bool visible;
+	struct drm_crtc *crtc = state->crtc;
+	struct drm_device *dev = crtc->dev;
+	struct drm_framebuffer *fb = state->fb;
+	struct drm_rect *dest = &state->dst;
+	struct drm_rect *src = &state->src;
+	const struct drm_rect *clip = &state->clip;
+	struct drm_i915_gem_object *obj = intel_fb_obj(fb);
+	int crtc_w, crtc_h;
+	unsigned stride;
 	int ret;
 
 	ret = drm_plane_helper_check_update(plane, crtc, fb,
-					    &src, &dest, &clip,
+					    src, dest, clip,
 					    DRM_PLANE_HELPER_NO_SCALING,
 					    DRM_PLANE_HELPER_NO_SCALING,
-					    true, true, &visible);
+					    true, true, &state->visible);
 	if (ret)
 		return ret;
 
-	crtc->cursor_x = crtc_x;
-	crtc->cursor_y = crtc_y;
+
+	/* if we want to turn off the cursor ignore width and height */
+	if (!obj)
+		return 0;
+
+	/* Check for which cursor types we support */
+	crtc_w = drm_rect_width(&state->orig_dst);
+	crtc_h = drm_rect_height(&state->orig_dst);
+	if (!cursor_size_ok(dev, crtc_w, crtc_h)) {
+		DRM_DEBUG("Cursor dimension not supported\n");
+		return -EINVAL;
+	}
+
+	stride = roundup_pow_of_two(crtc_w) * 4;
+	if (obj->base.size < stride * crtc_h) {
+		DRM_DEBUG_KMS("buffer is too small\n");
+		return -ENOMEM;
+	}
+
+	if (fb == crtc->cursor->fb)
+		return 0;
+
+	/* we only need to pin inside GTT if cursor is non-phy */
+	mutex_lock(&dev->struct_mutex);
+	if (!INTEL_INFO(dev)->cursor_needs_physical && obj->tiling_mode) {
+		DRM_DEBUG_KMS("cursor cannot be tiled\n");
+		ret = -EINVAL;
+	}
+	mutex_unlock(&dev->struct_mutex);
+
+	return ret;
+}
+
+static int
+intel_commit_cursor_plane(struct drm_plane *plane,
+			  struct intel_plane_state *state)
+{
+	struct drm_crtc *crtc = state->crtc;
+	struct drm_framebuffer *fb = state->fb;
+	struct intel_crtc *intel_crtc = to_intel_crtc(crtc);
+	struct intel_plane *intel_plane = to_intel_plane(plane);
+	struct intel_framebuffer *intel_fb = to_intel_framebuffer(fb);
+	struct drm_i915_gem_object *obj = intel_fb->obj;
+	int crtc_w, crtc_h;
+
+	crtc->cursor_x = state->orig_dst.x1;
+	crtc->cursor_y = state->orig_dst.y1;
+
+	intel_plane->crtc_x = state->orig_dst.x1;
+	intel_plane->crtc_y = state->orig_dst.y1;
+	intel_plane->crtc_w = drm_rect_width(&state->orig_dst);
+	intel_plane->crtc_h = drm_rect_height(&state->orig_dst);
+	intel_plane->src_x = state->orig_src.x1;
+	intel_plane->src_y = state->orig_src.y1;
+	intel_plane->src_w = drm_rect_width(&state->orig_src);
+	intel_plane->src_h = drm_rect_height(&state->orig_src);
+	intel_plane->obj = obj;
+
 	if (fb != crtc->cursor->fb) {
+		crtc_w = drm_rect_width(&state->orig_dst);
+		crtc_h = drm_rect_height(&state->orig_dst);
 		return intel_crtc_cursor_set_obj(crtc, obj, crtc_w, crtc_h);
 	} else {
-		intel_crtc_update_cursor(crtc, visible);
+		intel_crtc_update_cursor(crtc, state->visible);
 
 		intel_frontbuffer_flip(crtc->dev,
 				       INTEL_FRONTBUFFER_CURSOR(intel_crtc->pipe));
@@ -12098,10 +12162,53 @@ intel_cursor_plane_update(struct drm_plane *plane, struct drm_crtc *crtc,
 		return 0;
 	}
 }
+
+static int
+intel_cursor_plane_update(struct drm_plane *plane, struct drm_crtc *crtc,
+			  struct drm_framebuffer *fb, int crtc_x, int crtc_y,
+			  unsigned int crtc_w, unsigned int crtc_h,
+			  uint32_t src_x, uint32_t src_y,
+			  uint32_t src_w, uint32_t src_h)
+{
+	struct intel_crtc *intel_crtc = to_intel_crtc(crtc);
+	struct intel_plane_state state;
+	int ret;
+
+	state.crtc = crtc;
+	state.fb = fb;
+
+	/* sample coordinates in 16.16 fixed point */
+	state.src.x1 = src_x;
+	state.src.x2 = src_x + src_w;
+	state.src.y1 = src_y;
+	state.src.y2 = src_y + src_h;
+
+	/* integer pixels */
+	state.dst.x1 = crtc_x;
+	state.dst.x2 = crtc_x + crtc_w;
+	state.dst.y1 = crtc_y;
+	state.dst.y2 = crtc_y + crtc_h;
+
+	state.clip.x1 = 0;
+	state.clip.y1 = 0;
+	state.clip.x2 = intel_crtc->active ? intel_crtc->config.pipe_src_w : 0;
+	state.clip.y2 = intel_crtc->active ? intel_crtc->config.pipe_src_h : 0;
+
+	state.orig_src = state.src;
+	state.orig_dst = state.dst;
+
+	ret = intel_check_cursor_plane(plane, &state);
+	if (ret)
+		return ret;
+
+	return intel_commit_cursor_plane(plane, &state);
+}
+
 static const struct drm_plane_funcs intel_cursor_plane_funcs = {
 	.update_plane = intel_cursor_plane_update,
 	.disable_plane = intel_cursor_plane_disable,
 	.destroy = intel_plane_destroy,
+	.set_property = intel_plane_set_property,
 };
 
 static struct drm_plane *intel_cursor_plane_create(struct drm_device *dev,
@@ -12117,12 +12224,26 @@ static struct drm_plane *intel_cursor_plane_create(struct drm_device *dev,
 	cursor->max_downscale = 1;
 	cursor->pipe = pipe;
 	cursor->plane = pipe;
+	cursor->rotation = BIT(DRM_ROTATE_0);
 
 	drm_universal_plane_init(dev, &cursor->base, 0,
 				 &intel_cursor_plane_funcs,
 				 intel_cursor_formats,
 				 ARRAY_SIZE(intel_cursor_formats),
 				 DRM_PLANE_TYPE_CURSOR);
+
+	if (INTEL_INFO(dev)->gen >= 4) {
+		if (!dev->mode_config.rotation_property)
+			dev->mode_config.rotation_property =
+				drm_mode_create_rotation_property(dev,
+							BIT(DRM_ROTATE_0) |
+							BIT(DRM_ROTATE_180));
+		if (dev->mode_config.rotation_property)
+			drm_object_attach_property(&cursor->base.base,
+				dev->mode_config.rotation_property,
+				cursor->rotation);
+	}
+
 	return &cursor->base;
 }
 
@@ -12178,6 +12299,8 @@ static void intel_crtc_init(struct drm_device *dev, int pipe)
 	dev_priv->plane_to_crtc_mapping[intel_crtc->plane] = &intel_crtc->base;
 	dev_priv->pipe_to_crtc_mapping[intel_crtc->pipe] = &intel_crtc->base;
 
+	INIT_WORK(&intel_crtc->mmio_flip.work, intel_mmio_flip_work_func);
+
 	drm_crtc_helper_add(&intel_crtc->base, &intel_helper_funcs);
 
 	WARN_ON(drm_crtc_index(&intel_crtc->base) != intel_crtc->pipe);
@@ -12198,7 +12321,7 @@ enum pipe intel_get_pipe_from_connector(struct intel_connector *connector)
 
 	WARN_ON(!drm_modeset_is_locked(&dev->mode_config.connection_mutex));
 
-	if (!encoder)
+	if (!encoder || WARN_ON(!encoder->crtc))
 		return INVALID_PIPE;
 
 	return to_intel_crtc(encoder->crtc)->pipe;
@@ -12286,7 +12409,10 @@ static bool intel_crt_present(struct drm_device *dev)
 {
 	struct drm_i915_private *dev_priv = dev->dev_private;
 
-	if (IS_ULT(dev))
+	if (INTEL_INFO(dev)->gen >= 9)
+		return false;
+
+	if (IS_HSW_ULT(dev) || IS_BDW_ULT(dev))
 		return false;
 
 	if (IS_CHERRYVIEW(dev))
@@ -12430,7 +12556,7 @@ static void intel_setup_outputs(struct drm_device *dev)
 	if (SUPPORTS_TV(dev))
 		intel_tv_init(dev);
 
-	intel_edp_psr_init(dev);
+	intel_psr_init(dev);
 
 	for_each_intel_encoder(dev, encoder) {
 		encoder->base.possible_crtcs = encoder->crtc_mask;
@@ -12634,16 +12760,22 @@ static void intel_init_display(struct drm_device *dev)
 	if (HAS_DDI(dev)) {
 		dev_priv->display.get_pipe_config = haswell_get_pipe_config;
 		dev_priv->display.get_plane_config = ironlake_get_plane_config;
-		dev_priv->display.crtc_mode_set = haswell_crtc_mode_set;
+		dev_priv->display.crtc_compute_clock =
+			haswell_crtc_compute_clock;
 		dev_priv->display.crtc_enable = haswell_crtc_enable;
 		dev_priv->display.crtc_disable = haswell_crtc_disable;
 		dev_priv->display.off = ironlake_crtc_off;
-		dev_priv->display.update_primary_plane =
-			ironlake_update_primary_plane;
+		if (INTEL_INFO(dev)->gen >= 9)
+			dev_priv->display.update_primary_plane =
+				skylake_update_primary_plane;
+		else
+			dev_priv->display.update_primary_plane =
+				ironlake_update_primary_plane;
 	} else if (HAS_PCH_SPLIT(dev)) {
 		dev_priv->display.get_pipe_config = ironlake_get_pipe_config;
 		dev_priv->display.get_plane_config = ironlake_get_plane_config;
-		dev_priv->display.crtc_mode_set = ironlake_crtc_mode_set;
+		dev_priv->display.crtc_compute_clock =
+			ironlake_crtc_compute_clock;
 		dev_priv->display.crtc_enable = ironlake_crtc_enable;
 		dev_priv->display.crtc_disable = ironlake_crtc_disable;
 		dev_priv->display.off = ironlake_crtc_off;
@@ -12652,7 +12784,7 @@ static void intel_init_display(struct drm_device *dev)
 	} else if (IS_VALLEYVIEW(dev)) {
 		dev_priv->display.get_pipe_config = i9xx_get_pipe_config;
 		dev_priv->display.get_plane_config = i9xx_get_plane_config;
-		dev_priv->display.crtc_mode_set = i9xx_crtc_mode_set;
+		dev_priv->display.crtc_compute_clock = i9xx_crtc_compute_clock;
 		dev_priv->display.crtc_enable = valleyview_crtc_enable;
 		dev_priv->display.crtc_disable = i9xx_crtc_disable;
 		dev_priv->display.off = i9xx_crtc_off;
@@ -12661,7 +12793,7 @@ static void intel_init_display(struct drm_device *dev)
 	} else {
 		dev_priv->display.get_pipe_config = i9xx_get_pipe_config;
 		dev_priv->display.get_plane_config = i9xx_get_plane_config;
-		dev_priv->display.crtc_mode_set = i9xx_crtc_mode_set;
+		dev_priv->display.crtc_compute_clock = i9xx_crtc_compute_clock;
 		dev_priv->display.crtc_enable = i9xx_crtc_enable;
 		dev_priv->display.crtc_disable = i9xx_crtc_disable;
 		dev_priv->display.off = i9xx_crtc_off;
@@ -12698,31 +12830,20 @@ static void intel_init_display(struct drm_device *dev)
 		dev_priv->display.get_display_clock_speed =
 			i830_get_display_clock_speed;
 
-	if (IS_G4X(dev)) {
-		dev_priv->display.write_eld = g4x_write_eld;
-	} else if (IS_GEN5(dev)) {
+	if (IS_GEN5(dev)) {
 		dev_priv->display.fdi_link_train = ironlake_fdi_link_train;
-		dev_priv->display.write_eld = ironlake_write_eld;
 	} else if (IS_GEN6(dev)) {
 		dev_priv->display.fdi_link_train = gen6_fdi_link_train;
-		dev_priv->display.write_eld = ironlake_write_eld;
-		dev_priv->display.modeset_global_resources =
-			snb_modeset_global_resources;
 	} else if (IS_IVYBRIDGE(dev)) {
 		/* FIXME: detect B0+ stepping and use auto training */
 		dev_priv->display.fdi_link_train = ivb_manual_fdi_link_train;
-		dev_priv->display.write_eld = ironlake_write_eld;
 		dev_priv->display.modeset_global_resources =
 			ivb_modeset_global_resources;
 	} else if (IS_HASWELL(dev) || IS_BROADWELL(dev)) {
 		dev_priv->display.fdi_link_train = hsw_fdi_link_train;
-		dev_priv->display.write_eld = haswell_write_eld;
-		dev_priv->display.modeset_global_resources =
-			haswell_modeset_global_resources;
 	} else if (IS_VALLEYVIEW(dev)) {
 		dev_priv->display.modeset_global_resources =
 			valleyview_modeset_global_resources;
-		dev_priv->display.write_eld = ironlake_write_eld;
 	}
 
 	/* Default just returns -ENODEV to indicate unsupported */
@@ -12749,6 +12870,9 @@ static void intel_init_display(struct drm_device *dev)
 	case 8: /* FIXME(BDW): Check that the gen8 RCS flip works. */
 		dev_priv->display.queue_flip = intel_gen7_queue_flip;
 		break;
+	case 9:
+		dev_priv->display.queue_flip = intel_gen9_queue_flip;
+		break;
 	}
 
 	intel_panel_init_backlight_funcs(dev);
@@ -12953,11 +13077,6 @@ void intel_modeset_init_hw(struct drm_device *dev)
 	intel_enable_gt_powersave(dev);
 }
 
-void intel_modeset_suspend_hw(struct drm_device *dev)
-{
-	intel_suspend_hw(dev);
-}
-
 void intel_modeset_init(struct drm_device *dev)
 {
 	struct drm_i915_private *dev_priv = dev->dev_private;
@@ -12983,6 +13102,7 @@ void intel_modeset_init(struct drm_device *dev)
 		return;
 
 	intel_init_display(dev);
+	intel_init_audio(dev);
 
 	if (IS_GEN2(dev)) {
 		dev->mode_config.max_width = 2048;
@@ -13293,7 +13413,7 @@ void i915_redisable_vga(struct drm_device *dev)
 	 * level, just check if the power well is enabled instead of trying to
 	 * follow the "don't touch the power well if we don't need it" policy
 	 * the rest of the driver uses. */
-	if (!intel_display_power_enabled(dev_priv, POWER_DOMAIN_VGA))
+	if (!intel_display_power_is_enabled(dev_priv, POWER_DOMAIN_VGA))
 		return;
 
 	i915_redisable_vga_power_on(dev);
@@ -13337,18 +13457,21 @@ static void intel_modeset_readout_hw_state(struct drm_device *dev)
 	for (i = 0; i < dev_priv->num_shared_dpll; i++) {
 		struct intel_shared_dpll *pll = &dev_priv->shared_dplls[i];
 
-		pll->on = pll->get_hw_state(dev_priv, pll, &pll->hw_state);
+		pll->on = pll->get_hw_state(dev_priv, pll,
+					    &pll->config.hw_state);
 		pll->active = 0;
+		pll->config.crtc_mask = 0;
 		for_each_intel_crtc(dev, crtc) {
-			if (crtc->active && intel_crtc_to_shared_dpll(crtc) == pll)
+			if (crtc->active && intel_crtc_to_shared_dpll(crtc) == pll) {
 				pll->active++;
+				pll->config.crtc_mask |= 1 << crtc->pipe;
+			}
 		}
-		pll->refcount = pll->active;
 
-		DRM_DEBUG_KMS("%s hw state readout: refcount %i, on %i\n",
-			      pll->name, pll->refcount, pll->on);
+		DRM_DEBUG_KMS("%s hw state readout: crtc_mask 0x%08x, on %i\n",
+			      pll->name, pll->config.crtc_mask, pll->on);
 
-		if (pll->refcount)
+		if (pll->config.crtc_mask)
 			intel_display_power_get(dev_priv, POWER_DOMAIN_PLLS);
 	}
 
@@ -13438,7 +13561,9 @@ void intel_modeset_setup_hw_state(struct drm_device *dev,
 		pll->on = false;
 	}
 
-	if (HAS_PCH_SPLIT(dev))
+	if (IS_GEN9(dev))
+		skl_wm_get_hw_state(dev);
+	else if (HAS_PCH_SPLIT(dev))
 		ilk_wm_get_hw_state(dev);
 
 	if (force_restore) {
@@ -13452,8 +13577,8 @@ void intel_modeset_setup_hw_state(struct drm_device *dev,
 			struct drm_crtc *crtc =
 				dev_priv->pipe_to_crtc_mapping[pipe];
 
-			__intel_set_mode(crtc, &crtc->mode, crtc->x, crtc->y,
-					 crtc->primary->fb);
+			intel_set_mode(crtc, &crtc->mode, crtc->x, crtc->y,
+				       crtc->primary->fb);
 		}
 	} else {
 		intel_modeset_update_staged_output_state(dev);
@@ -13464,6 +13589,7 @@ void intel_modeset_setup_hw_state(struct drm_device *dev,
 
 void intel_modeset_gem_init(struct drm_device *dev)
 {
+	struct drm_i915_private *dev_priv = dev->dev_private;
 	struct drm_crtc *c;
 	struct drm_i915_gem_object *obj;
 
@@ -13471,6 +13597,16 @@ void intel_modeset_gem_init(struct drm_device *dev)
 	intel_init_gt_powersave(dev);
 	mutex_unlock(&dev->struct_mutex);
 
+	/*
+	 * There may be no VBT; and if the BIOS enabled SSC we can
+	 * just keep using it to avoid unnecessary flicker.  Whereas if the
+	 * BIOS isn't using it, don't assume it will work even if the VBT
+	 * indicates as much.
+	 */
+	if (HAS_PCH_IBX(dev) || HAS_PCH_CPT(dev))
+		dev_priv->vbt.lvds_use_ssc = !!(I915_READ(PCH_DREF_CONTROL) &
+						DREF_SSC1_ENABLE);
+
 	intel_modeset_init_hw(dev);
 
 	intel_setup_overlay(dev);
@@ -13486,7 +13622,9 @@ void intel_modeset_gem_init(struct drm_device *dev)
 		if (obj == NULL)
 			continue;
 
-		if (intel_pin_and_fence_fb_obj(dev, obj, NULL)) {
+		if (intel_pin_and_fence_fb_obj(c->primary,
+					       c->primary->fb,
+					       NULL)) {
 			DRM_ERROR("failed to pin boot fb on pipe %d\n",
 				  to_intel_crtc(c)->pipe);
 			drm_framebuffer_unreference(c->primary->fb);
@@ -13494,6 +13632,8 @@ void intel_modeset_gem_init(struct drm_device *dev)
 		}
 	}
 	mutex_unlock(&dev->struct_mutex);
+
+	intel_backlight_register(dev);
 }
 
 void intel_connector_unregister(struct intel_connector *intel_connector)
@@ -13509,14 +13649,16 @@ void intel_modeset_cleanup(struct drm_device *dev)
 	struct drm_i915_private *dev_priv = dev->dev_private;
 	struct drm_connector *connector;
 
+	intel_disable_gt_powersave(dev);
+
+	intel_backlight_unregister(dev);
+
 	/*
 	 * Interrupts and polling as the first thing to avoid creating havoc.
-	 * Too much stuff here (turning of rps, connectors, ...) would
+	 * Too much stuff here (turning off connectors, ...) would
 	 * experience fancy races otherwise.
 	 */
-	drm_irq_uninstall(dev);
-	intel_hpd_cancel_work(dev_priv);
-	dev_priv->pm._irqs_disabled = true;
+	intel_irq_uninstall(dev_priv);
 
 	/*
 	 * Due to the hpd irq storm handling the hotplug work can re-arm the
@@ -13530,8 +13672,6 @@ void intel_modeset_cleanup(struct drm_device *dev)
 
 	intel_disable_fbc(dev);
 
-	intel_disable_gt_powersave(dev);
-
 	ironlake_teardown_rc6(dev);
 
 	mutex_unlock(&dev->struct_mutex);
@@ -13671,8 +13811,8 @@ intel_display_capture_error_state(struct drm_device *dev)
 
 	for_each_pipe(dev_priv, i) {
 		error->pipe[i].power_domain_on =
-			intel_display_power_enabled_unlocked(dev_priv,
-							   POWER_DOMAIN_PIPE(i));
+			__intel_display_power_is_enabled(dev_priv,
+							 POWER_DOMAIN_PIPE(i));
 		if (!error->pipe[i].power_domain_on)
 			continue;
 
@@ -13707,7 +13847,7 @@ intel_display_capture_error_state(struct drm_device *dev)
 		enum transcoder cpu_transcoder = transcoders[i];
 
 		error->transcoder[i].power_domain_on =
-			intel_display_power_enabled_unlocked(dev_priv,
+			__intel_display_power_is_enabled(dev_priv,
 				POWER_DOMAIN_TRANSCODER(cpu_transcoder));
 		if (!error->transcoder[i].power_domain_on)
 			continue;
@@ -13791,9 +13931,8 @@ void intel_modeset_preclose(struct drm_device *dev, struct drm_file *file)
 
 	for_each_intel_crtc(dev, crtc) {
 		struct intel_unpin_work *work;
-		unsigned long irqflags;
 
-		spin_lock_irqsave(&dev->event_lock, irqflags);
+		spin_lock_irq(&dev->event_lock);
 
 		work = crtc->unpin_work;
 
@@ -13803,6 +13942,6 @@ void intel_modeset_preclose(struct drm_device *dev, struct drm_file *file)
 			work->event = NULL;
 		}
 
-		spin_unlock_irqrestore(&dev->event_lock, irqflags);
+		spin_unlock_irq(&dev->event_lock);
 	}
 }
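
Note on the cursor size check in intel_check_cursor_plane() above: the buffer test there computes a stride from the cursor width rounded up to a power of two, multiplied by 4, and rejects objects smaller than stride * height. The following is a minimal user-space sketch of that arithmetic only, not the driver code. The `sketch_` function names are hypothetical, the rounding helper is a simple stand-in for the kernel's roundup_pow_of_two(), and the factor of 4 is assumed to be the bytes per pixel of a 32-bit cursor format.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for the kernel's roundup_pow_of_two().
 * Precondition: 1 <= v <= 2^31, which holds for any realistic cursor width.
 */
static uint32_t sketch_roundup_pow_of_two(uint32_t v)
{
	uint32_t p = 1;

	while (p < v)
		p <<= 1;
	return p;
}

/*
 * Minimum backing-store size in bytes for a width x height cursor:
 * the stride is the power-of-two-rounded width times 4 (assumed bytes
 * per pixel), mirroring "stride = roundup_pow_of_two(crtc_w) * 4".
 */
static size_t sketch_cursor_min_size(uint32_t width, uint32_t height)
{
	size_t stride = (size_t)sketch_roundup_pow_of_two(width) * 4;

	return stride * height;
}

/* Mirrors the "obj->base.size < stride * crtc_h" rejection, inverted. */
static bool sketch_cursor_buffer_ok(size_t obj_size, uint32_t width,
				    uint32_t height)
{
	return obj_size >= sketch_cursor_min_size(width, height);
}
```

Under these assumptions a 64x64 cursor needs at least 64 * 4 * 64 = 16384 bytes, and a 100-pixel-wide cursor is charged for a 128-pixel stride.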
diff --git a/drivers/gpu/drm/i915/intel_dp.c b/drivers/gpu/drm/i915/intel_dp.c
index 4bcd91757321..5cecc20efa71 100644
--- a/drivers/gpu/drm/i915/intel_dp.c
+++ b/drivers/gpu/drm/i915/intel_dp.c
@@ -113,6 +113,9 @@ static struct intel_dp *intel_attached_dp(struct drm_connector *connector)
 static void intel_dp_link_down(struct intel_dp *intel_dp);
 static bool edp_panel_vdd_on(struct intel_dp *intel_dp);
 static void edp_panel_vdd_off(struct intel_dp *intel_dp, bool sync);
+static void vlv_init_panel_power_sequencer(struct intel_dp *intel_dp);
+static void vlv_steal_power_sequencer(struct drm_device *dev,
+				      enum pipe pipe);
 
 int
 intel_dp_max_link_bw(struct intel_dp *intel_dp)
@@ -224,8 +227,7 @@ intel_dp_mode_valid(struct drm_connector *connector,
 	return MODE_OK;
 }
 
-static uint32_t
-pack_aux(uint8_t *src, int src_bytes)
+uint32_t intel_dp_pack_aux(const uint8_t *src, int src_bytes)
 {
 	int	i;
 	uint32_t v = 0;
@@ -237,8 +239,7 @@ pack_aux(uint8_t *src, int src_bytes)
 	return v;
 }
 
-static void
-unpack_aux(uint32_t src, uint8_t *dst, int dst_bytes)
+void intel_dp_unpack_aux(uint32_t src, uint8_t *dst, int dst_bytes)
 {
 	int i;
 	if (dst_bytes > 4)
@@ -283,12 +284,10 @@ intel_hrawclk(struct drm_device *dev)
 
 static void
 intel_dp_init_panel_power_sequencer(struct drm_device *dev,
-				    struct intel_dp *intel_dp,
-				    struct edp_power_seq *out);
+				    struct intel_dp *intel_dp);
 static void
 intel_dp_init_panel_power_sequencer_registers(struct drm_device *dev,
-					      struct intel_dp *intel_dp,
-					      struct edp_power_seq *out);
+					      struct intel_dp *intel_dp);
 
 static void pps_lock(struct intel_dp *intel_dp)
 {
@@ -322,6 +321,66 @@ static void pps_unlock(struct intel_dp *intel_dp)
 	intel_display_power_put(dev_priv, power_domain);
 }
 
+static void
+vlv_power_sequencer_kick(struct intel_dp *intel_dp)
+{
+	struct intel_digital_port *intel_dig_port = dp_to_dig_port(intel_dp);
+	struct drm_device *dev = intel_dig_port->base.base.dev;
+	struct drm_i915_private *dev_priv = dev->dev_private;
+	enum pipe pipe = intel_dp->pps_pipe;
+	bool pll_enabled;
+	uint32_t DP;
+
+	if (WARN(I915_READ(intel_dp->output_reg) & DP_PORT_EN,
+		 "skipping pipe %c power sequencer kick due to port %c being active\n",
+		 pipe_name(pipe), port_name(intel_dig_port->port)))
+		return;
+
+	DRM_DEBUG_KMS("kicking pipe %c power sequencer for port %c\n",
+		      pipe_name(pipe), port_name(intel_dig_port->port));
+
+	/* Preserve the BIOS-computed detected bit. This is
+	 * supposed to be read-only.
+	 */
+	DP = I915_READ(intel_dp->output_reg) & DP_DETECTED;
+	DP |= DP_VOLTAGE_0_4 | DP_PRE_EMPHASIS_0;
+	DP |= DP_PORT_WIDTH(1);
+	DP |= DP_LINK_TRAIN_PAT_1;
+
+	if (IS_CHERRYVIEW(dev))
+		DP |= DP_PIPE_SELECT_CHV(pipe);
+	else if (pipe == PIPE_B)
+		DP |= DP_PIPEB_SELECT;
+
+	pll_enabled = I915_READ(DPLL(pipe)) & DPLL_VCO_ENABLE;
+
+	/*
+	 * The DPLL for the pipe must be enabled for this to work.
+	 * So temporarily enable it if it's not already enabled.
+	 */
+	if (!pll_enabled)
+		vlv_force_pll_on(dev, pipe, IS_CHERRYVIEW(dev) ?
+				 &chv_dpll[0].dpll : &vlv_dpll[0].dpll);
+
+	/*
+	 * Similar magic as in intel_dp_enable_port().
+	 * We _must_ do this port enable + disable trick
+	 * to make this power sequencer lock onto the port.
+	 * Otherwise even VDD force bit won't work.
+	 */
+	I915_WRITE(intel_dp->output_reg, DP);
+	POSTING_READ(intel_dp->output_reg);
+
+	I915_WRITE(intel_dp->output_reg, DP | DP_PORT_EN);
+	POSTING_READ(intel_dp->output_reg);
+
+	I915_WRITE(intel_dp->output_reg, DP & ~DP_PORT_EN);
+	POSTING_READ(intel_dp->output_reg);
+
+	if (!pll_enabled)
+		vlv_force_pll_off(dev, pipe);
+}
+
 static enum pipe
 vlv_power_sequencer_pipe(struct intel_dp *intel_dp)
 {
@@ -330,10 +389,13 @@ vlv_power_sequencer_pipe(struct intel_dp *intel_dp)
 	struct drm_i915_private *dev_priv = dev->dev_private;
 	struct intel_encoder *encoder;
 	unsigned int pipes = (1 << PIPE_A) | (1 << PIPE_B);
-	struct edp_power_seq power_seq;
+	enum pipe pipe;
 
 	lockdep_assert_held(&dev_priv->pps_mutex);
 
+	/* We should never land here with regular DP ports */
+	WARN_ON(!is_edp(intel_dp));
+
 	if (intel_dp->pps_pipe != INVALID_PIPE)
 		return intel_dp->pps_pipe;
 
@@ -359,18 +421,26 @@ vlv_power_sequencer_pipe(struct intel_dp *intel_dp)
 	 * are two power sequencers and up to two eDP ports.
 	 */
 	if (WARN_ON(pipes == 0))
-		return PIPE_A;
+		pipe = PIPE_A;
+	else
+		pipe = ffs(pipes) - 1;
 
-	intel_dp->pps_pipe = ffs(pipes) - 1;
+	vlv_steal_power_sequencer(dev, pipe);
+	intel_dp->pps_pipe = pipe;
 
 	DRM_DEBUG_KMS("picked pipe %c power sequencer for port %c\n",
 		      pipe_name(intel_dp->pps_pipe),
 		      port_name(intel_dig_port->port));
 
 	/* init power sequencer on this pipe and port */
-	intel_dp_init_panel_power_sequencer(dev, intel_dp, &power_seq);
-	intel_dp_init_panel_power_sequencer_registers(dev, intel_dp,
-						      &power_seq);
+	intel_dp_init_panel_power_sequencer(dev, intel_dp);
+	intel_dp_init_panel_power_sequencer_registers(dev, intel_dp);
+
+	/*
+	 * Even vdd force doesn't work until we've made
+	 * the power sequencer lock in on the port.
+	 */
+	vlv_power_sequencer_kick(intel_dp);
 
 	return intel_dp->pps_pipe;
 }
@@ -425,7 +495,6 @@ vlv_initial_power_sequencer_setup(struct intel_dp *intel_dp)
 	struct intel_digital_port *intel_dig_port = dp_to_dig_port(intel_dp);
 	struct drm_device *dev = intel_dig_port->base.base.dev;
 	struct drm_i915_private *dev_priv = dev->dev_private;
-	struct edp_power_seq power_seq;
 	enum port port = intel_dig_port->port;
 
 	lockdep_assert_held(&dev_priv->pps_mutex);
@@ -453,9 +522,8 @@ vlv_initial_power_sequencer_setup(struct intel_dp *intel_dp)
 	DRM_DEBUG_KMS("initial power sequencer for port %c: pipe %c\n",
 		      port_name(port), pipe_name(intel_dp->pps_pipe));
 
-	intel_dp_init_panel_power_sequencer(dev, intel_dp, &power_seq);
-	intel_dp_init_panel_power_sequencer_registers(dev, intel_dp,
-						      &power_seq);
+	intel_dp_init_panel_power_sequencer(dev, intel_dp);
+	intel_dp_init_panel_power_sequencer_registers(dev, intel_dp);
 }
 
 void vlv_power_sequencer_reset(struct drm_i915_private *dev_priv)
@@ -550,6 +618,10 @@ static bool edp_have_panel_power(struct intel_dp *intel_dp)
 
 	lockdep_assert_held(&dev_priv->pps_mutex);
 
+	if (IS_VALLEYVIEW(dev) &&
+	    intel_dp->pps_pipe == INVALID_PIPE)
+		return false;
+
 	return (I915_READ(_pp_stat_reg(intel_dp)) & PP_ON) != 0;
 }
 
@@ -560,6 +632,10 @@ static bool edp_have_panel_vdd(struct intel_dp *intel_dp)
 
 	lockdep_assert_held(&dev_priv->pps_mutex);
 
+	if (IS_VALLEYVIEW(dev) &&
+	    intel_dp->pps_pipe == INVALID_PIPE)
+		return false;
+
 	return I915_READ(_pp_ctrl_reg(intel_dp)) & EDP_FORCE_VDD;
 }
 
@@ -661,6 +737,16 @@ static uint32_t vlv_get_aux_clock_divider(struct intel_dp *intel_dp, int index)
 	return index ? 0 : 100;
 }
 
+static uint32_t skl_get_aux_clock_divider(struct intel_dp *intel_dp, int index)
+{
+	/*
+	 * SKL doesn't need us to program the AUX clock divider (Hardware will
+	 * derive the clock from CDCLK automatically). We still implement the
+	 * get_aux_clock_divider vfunc to plug into the existing code.
+	 */
+	return index ? 0 : 1;
+}
+
 static uint32_t i9xx_get_aux_send_ctl(struct intel_dp *intel_dp,
 				      bool has_aux_irq,
 				      int send_bytes,
@@ -691,9 +777,24 @@ static uint32_t i9xx_get_aux_send_ctl(struct intel_dp *intel_dp,
 	       (aux_clock_divider << DP_AUX_CH_CTL_BIT_CLOCK_2X_SHIFT);
 }
 
+static uint32_t skl_get_aux_send_ctl(struct intel_dp *intel_dp,
+				      bool has_aux_irq,
+				      int send_bytes,
+				      uint32_t unused)
+{
+	return DP_AUX_CH_CTL_SEND_BUSY |
+	       DP_AUX_CH_CTL_DONE |
+	       (has_aux_irq ? DP_AUX_CH_CTL_INTERRUPT : 0) |
+	       DP_AUX_CH_CTL_TIME_OUT_ERROR |
+	       DP_AUX_CH_CTL_TIME_OUT_1600us |
+	       DP_AUX_CH_CTL_RECEIVE_ERROR |
+	       (send_bytes << DP_AUX_CH_CTL_MESSAGE_SIZE_SHIFT) |
+	       DP_AUX_CH_CTL_SYNC_PULSE_SKL(32);
+}
+
 static int
 intel_dp_aux_ch(struct intel_dp *intel_dp,
-		uint8_t *send, int send_bytes,
+		const uint8_t *send, int send_bytes,
 		uint8_t *recv, int recv_size)
 {
 	struct intel_digital_port *intel_dig_port = dp_to_dig_port(intel_dp);
@@ -760,7 +861,8 @@ intel_dp_aux_ch(struct intel_dp *intel_dp,
 			/* Load the send data into the aux channel data registers */
 			for (i = 0; i < send_bytes; i += 4)
 				I915_WRITE(ch_data + i,
-					   pack_aux(send + i, send_bytes - i));
+					   intel_dp_pack_aux(send + i,
+							     send_bytes - i));
 
 			/* Send the command and wait for it to complete */
 			I915_WRITE(ch_ctl, send_ctl);
@@ -814,8 +916,8 @@ intel_dp_aux_ch(struct intel_dp *intel_dp,
 		recv_bytes = recv_size;
 
 	for (i = 0; i < recv_bytes; i += 4)
-		unpack_aux(I915_READ(ch_data + i),
-			   recv + i, recv_bytes - i);
+		intel_dp_unpack_aux(I915_READ(ch_data + i),
+				    recv + i, recv_bytes - i);
 
 	ret = recv_bytes;
 out:
@@ -925,7 +1027,16 @@ intel_dp_aux_init(struct intel_dp *intel_dp, struct intel_connector *connector)
 		BUG();
 	}
 
-	if (!HAS_DDI(dev))
+	/*
+	 * The AUX_CTL register is usually DP_CTL + 0x10.
+	 *
+	 * On Haswell and Broadwell though:
+	 *   - Both port A DDI_BUF_CTL and DDI_AUX_CTL are on the CPU
+	 *   - Port B/C/D AUX channels are on the PCH, DDI_BUF_CTL on the CPU
+	 *
+	 * Skylake moves AUX_CTL back next to DDI_BUF_CTL, on the CPU.
+	 */
+	if (!IS_HASWELL(dev) && !IS_BROADWELL(dev))
 		intel_dp->aux_ch_ctl_reg = intel_dp->output_reg + 0x10;
 
 	intel_dp->aux.name = name;
@@ -963,6 +1074,33 @@ intel_dp_connector_unregister(struct intel_connector *intel_connector)
 }
 
 static void
+skl_edp_set_pll_config(struct intel_crtc_config *pipe_config, int link_bw)
+{
+	u32 ctrl1;
+
+	pipe_config->ddi_pll_sel = SKL_DPLL0;
+	pipe_config->dpll_hw_state.cfgcr1 = 0;
+	pipe_config->dpll_hw_state.cfgcr2 = 0;
+
+	ctrl1 = DPLL_CTRL1_OVERRIDE(SKL_DPLL0);
+	switch (link_bw) {
+	case DP_LINK_BW_1_62:
+		ctrl1 |= DPLL_CRTL1_LINK_RATE(DPLL_CRTL1_LINK_RATE_810,
+					      SKL_DPLL0);
+		break;
+	case DP_LINK_BW_2_7:
+		ctrl1 |= DPLL_CRTL1_LINK_RATE(DPLL_CRTL1_LINK_RATE_1350,
+					      SKL_DPLL0);
+		break;
+	case DP_LINK_BW_5_4:
+		ctrl1 |= DPLL_CRTL1_LINK_RATE(DPLL_CRTL1_LINK_RATE_2700,
+					      SKL_DPLL0);
+		break;
+	}
+	pipe_config->dpll_hw_state.ctrl1 = ctrl1;
+}
+
+static void
 hsw_dp_set_ddi_pll_sel(struct intel_crtc_config *pipe_config, int link_bw)
 {
 	switch (link_bw) {
@@ -1139,7 +1277,9 @@ found:
 				&pipe_config->dp_m2_n2);
 	}
 
-	if (IS_HASWELL(dev) || IS_BROADWELL(dev))
+	if (IS_SKYLAKE(dev) && is_edp(intel_dp))
+		skl_edp_set_pll_config(pipe_config, intel_dp->link_bw);
+	else if (IS_HASWELL(dev) || IS_BROADWELL(dev))
 		hsw_dp_set_ddi_pll_sel(pipe_config, intel_dp->link_bw);
 	else
 		intel_dp_set_clock(encoder, pipe_config, intel_dp->link_bw);
@@ -1212,12 +1352,8 @@ static void intel_dp_prepare(struct intel_encoder *encoder)
 	intel_dp->DP |= DP_VOLTAGE_0_4 | DP_PRE_EMPHASIS_0;
 	intel_dp->DP |= DP_PORT_WIDTH(intel_dp->lane_count);
 
-	if (crtc->config.has_audio) {
-		DRM_DEBUG_DRIVER("Enabling DP audio on pipe %c\n",
-				 pipe_name(crtc->pipe));
+	if (crtc->config.has_audio)
 		intel_dp->DP |= DP_AUDIO_OUTPUT_ENABLE;
-		intel_write_eld(&encoder->base, adjusted_mode);
-	}
 
 	/* Split out the IBX/CPU vs CPT settings */
 
@@ -1367,6 +1503,7 @@ static bool edp_panel_vdd_on(struct intel_dp *intel_dp)
 	if (!is_edp(intel_dp))
 		return false;
 
+	cancel_delayed_work(&intel_dp->panel_vdd_work);
 	intel_dp->want_panel_vdd = true;
 
 	if (edp_have_panel_vdd(intel_dp))
@@ -1375,7 +1512,8 @@ static bool edp_panel_vdd_on(struct intel_dp *intel_dp)
 	power_domain = intel_display_port_power_domain(intel_encoder);
 	intel_display_power_get(dev_priv, power_domain);
 
-	DRM_DEBUG_KMS("Turning eDP VDD on\n");
+	DRM_DEBUG_KMS("Turning eDP port %c VDD on\n",
+		      port_name(intel_dig_port->port));
 
 	if (!edp_have_panel_power(intel_dp))
 		wait_panel_power_cycle(intel_dp);
@@ -1394,7 +1532,8 @@ static bool edp_panel_vdd_on(struct intel_dp *intel_dp)
 	 * If the panel wasn't on, delay before accessing aux channel
 	 */
 	if (!edp_have_panel_power(intel_dp)) {
-		DRM_DEBUG_KMS("eDP was not running\n");
+		DRM_DEBUG_KMS("eDP port %c panel power wasn't enabled\n",
+			      port_name(intel_dig_port->port));
 		msleep(intel_dp->panel_power_up_delay);
 	}
 
@@ -1419,7 +1558,8 @@ void intel_edp_panel_vdd_on(struct intel_dp *intel_dp)
 	vdd = edp_panel_vdd_on(intel_dp);
 	pps_unlock(intel_dp);
 
-	WARN(!vdd, "eDP VDD already requested on\n");
+	WARN(!vdd, "eDP port %c VDD already requested on\n",
+	     port_name(dp_to_dig_port(intel_dp)->port));
 }
 
 static void edp_panel_vdd_off_sync(struct intel_dp *intel_dp)
@@ -1440,7 +1580,8 @@ static void edp_panel_vdd_off_sync(struct intel_dp *intel_dp)
 	if (!edp_have_panel_vdd(intel_dp))
 		return;
 
-	DRM_DEBUG_KMS("Turning eDP VDD off\n");
+	DRM_DEBUG_KMS("Turning eDP port %c VDD off\n",
+		      port_name(intel_dig_port->port));
 
 	pp = ironlake_get_pp_control(intel_dp);
 	pp &= ~EDP_FORCE_VDD;
@@ -1501,7 +1642,8 @@ static void edp_panel_vdd_off(struct intel_dp *intel_dp, bool sync)
 	if (!is_edp(intel_dp))
 		return;
 
-	WARN(!intel_dp->want_panel_vdd, "eDP VDD not forced on");
+	WARN(!intel_dp->want_panel_vdd, "eDP port %c VDD not forced on",
+	     port_name(dp_to_dig_port(intel_dp)->port));
 
 	intel_dp->want_panel_vdd = false;
 
@@ -1511,40 +1653,25 @@ static void edp_panel_vdd_off(struct intel_dp *intel_dp, bool sync)
 		edp_panel_vdd_schedule_off(intel_dp);
 }
 
-/*
- * Must be paired with intel_edp_panel_vdd_on().
- * Nested calls to these functions are not allowed since
- * we drop the lock. Caller must use some higher level
- * locking to prevent nested calls from other threads.
- */
-static void intel_edp_panel_vdd_off(struct intel_dp *intel_dp, bool sync)
-{
-	if (!is_edp(intel_dp))
-		return;
-
-	pps_lock(intel_dp);
-	edp_panel_vdd_off(intel_dp, sync);
-	pps_unlock(intel_dp);
-}
-
-void intel_edp_panel_on(struct intel_dp *intel_dp)
+static void edp_panel_on(struct intel_dp *intel_dp)
 {
 	struct drm_device *dev = intel_dp_to_dev(intel_dp);
 	struct drm_i915_private *dev_priv = dev->dev_private;
 	u32 pp;
 	u32 pp_ctrl_reg;
 
+	lockdep_assert_held(&dev_priv->pps_mutex);
+
 	if (!is_edp(intel_dp))
 		return;
 
-	DRM_DEBUG_KMS("Turn eDP power on\n");
+	DRM_DEBUG_KMS("Turn eDP port %c panel power on\n",
+		      port_name(dp_to_dig_port(intel_dp)->port));
 
-	pps_lock(intel_dp);
-
-	if (edp_have_panel_power(intel_dp)) {
-		DRM_DEBUG_KMS("eDP power already on\n");
-		goto out;
-	}
+	if (WARN(edp_have_panel_power(intel_dp),
+		 "eDP port %c panel power already on\n",
+		 port_name(dp_to_dig_port(intel_dp)->port)))
+		return;
 
 	wait_panel_power_cycle(intel_dp);
 
@@ -1572,12 +1699,20 @@ void intel_edp_panel_on(struct intel_dp *intel_dp)
 		I915_WRITE(pp_ctrl_reg, pp);
 		POSTING_READ(pp_ctrl_reg);
 	}
+}
+
+void intel_edp_panel_on(struct intel_dp *intel_dp)
+{
+	if (!is_edp(intel_dp))
+		return;
 
- out:
+	pps_lock(intel_dp);
+	edp_panel_on(intel_dp);
 	pps_unlock(intel_dp);
 }
 
-void intel_edp_panel_off(struct intel_dp *intel_dp)
+
+static void edp_panel_off(struct intel_dp *intel_dp)
 {
 	struct intel_digital_port *intel_dig_port = dp_to_dig_port(intel_dp);
 	struct intel_encoder *intel_encoder = &intel_dig_port->base;
@@ -1587,14 +1722,16 @@ void intel_edp_panel_off(struct intel_dp *intel_dp)
 	u32 pp;
 	u32 pp_ctrl_reg;
 
+	lockdep_assert_held(&dev_priv->pps_mutex);
+
 	if (!is_edp(intel_dp))
 		return;
 
-	DRM_DEBUG_KMS("Turn eDP power off\n");
-
-	pps_lock(intel_dp);
+	DRM_DEBUG_KMS("Turn eDP port %c panel power off\n",
+		      port_name(dp_to_dig_port(intel_dp)->port));
 
-	WARN(!intel_dp->want_panel_vdd, "Need VDD to turn off panel\n");
+	WARN(!intel_dp->want_panel_vdd, "Need eDP port %c VDD to turn off panel\n",
+	     port_name(dp_to_dig_port(intel_dp)->port));
 
 	pp = ironlake_get_pp_control(intel_dp);
 	/* We need to switch off panel power _and_ force vdd, for otherwise some
@@ -1615,7 +1752,15 @@ void intel_edp_panel_off(struct intel_dp *intel_dp)
 	/* We got a reference when we enabled the VDD. */
 	power_domain = intel_display_port_power_domain(intel_encoder);
 	intel_display_power_put(dev_priv, power_domain);
+}
+
+void intel_edp_panel_off(struct intel_dp *intel_dp)
+{
+	if (!is_edp(intel_dp))
+		return;
 
+	pps_lock(intel_dp);
+	edp_panel_off(intel_dp);
 	pps_unlock(intel_dp);
 }
 
@@ -1819,7 +1964,7 @@ static bool intel_dp_get_hw_state(struct intel_encoder *encoder,
 	u32 tmp;
 
 	power_domain = intel_display_port_power_domain(encoder);
-	if (!intel_display_power_enabled(dev_priv, power_domain))
+	if (!intel_display_power_is_enabled(dev_priv, power_domain))
 		return false;
 
 	tmp = I915_READ(intel_dp->output_reg);
@@ -1951,368 +2096,14 @@ static void intel_dp_get_config(struct intel_encoder *encoder,
 	}
 }
 
-static bool is_edp_psr(struct intel_dp *intel_dp)
-{
-	return intel_dp->psr_dpcd[0] & DP_PSR_IS_SUPPORTED;
-}
-
-static bool intel_edp_is_psr_enabled(struct drm_device *dev)
-{
-	struct drm_i915_private *dev_priv = dev->dev_private;
-
-	if (!HAS_PSR(dev))
-		return false;
-
-	return I915_READ(EDP_PSR_CTL(dev)) & EDP_PSR_ENABLE;
-}
-
-static void intel_edp_psr_write_vsc(struct intel_dp *intel_dp,
-				    struct edp_vsc_psr *vsc_psr)
-{
-	struct intel_digital_port *dig_port = dp_to_dig_port(intel_dp);
-	struct drm_device *dev = dig_port->base.base.dev;
-	struct drm_i915_private *dev_priv = dev->dev_private;
-	struct intel_crtc *crtc = to_intel_crtc(dig_port->base.base.crtc);
-	u32 ctl_reg = HSW_TVIDEO_DIP_CTL(crtc->config.cpu_transcoder);
-	u32 data_reg = HSW_TVIDEO_DIP_VSC_DATA(crtc->config.cpu_transcoder);
-	uint32_t *data = (uint32_t *) vsc_psr;
-	unsigned int i;
-
-	/* As per BSPec (Pipe Video Data Island Packet), we need to disable
-	   the video DIP being updated before program video DIP data buffer
-	   registers for DIP being updated. */
-	I915_WRITE(ctl_reg, 0);
-	POSTING_READ(ctl_reg);
-
-	for (i = 0; i < VIDEO_DIP_VSC_DATA_SIZE; i += 4) {
-		if (i < sizeof(struct edp_vsc_psr))
-			I915_WRITE(data_reg + i, *data++);
-		else
-			I915_WRITE(data_reg + i, 0);
-	}
-
-	I915_WRITE(ctl_reg, VIDEO_DIP_ENABLE_VSC_HSW);
-	POSTING_READ(ctl_reg);
-}
-
-static void intel_edp_psr_setup(struct intel_dp *intel_dp)
-{
-	struct drm_device *dev = intel_dp_to_dev(intel_dp);
-	struct drm_i915_private *dev_priv = dev->dev_private;
-	struct edp_vsc_psr psr_vsc;
-
-	/* Prepare VSC packet as per EDP 1.3 spec, Table 3.10 */
-	memset(&psr_vsc, 0, sizeof(psr_vsc));
-	psr_vsc.sdp_header.HB0 = 0;
-	psr_vsc.sdp_header.HB1 = 0x7;
-	psr_vsc.sdp_header.HB2 = 0x2;
-	psr_vsc.sdp_header.HB3 = 0x8;
-	intel_edp_psr_write_vsc(intel_dp, &psr_vsc);
-
-	/* Avoid continuous PSR exit by masking memup and hpd */
-	I915_WRITE(EDP_PSR_DEBUG_CTL(dev), EDP_PSR_DEBUG_MASK_MEMUP |
-		   EDP_PSR_DEBUG_MASK_HPD | EDP_PSR_DEBUG_MASK_LPSP);
-}
-
-static void intel_edp_psr_enable_sink(struct intel_dp *intel_dp)
-{
-	struct intel_digital_port *dig_port = dp_to_dig_port(intel_dp);
-	struct drm_device *dev = dig_port->base.base.dev;
-	struct drm_i915_private *dev_priv = dev->dev_private;
-	uint32_t aux_clock_divider;
-	int precharge = 0x3;
-	int msg_size = 5;       /* Header(4) + Message(1) */
-	bool only_standby = false;
-
-	aux_clock_divider = intel_dp->get_aux_clock_divider(intel_dp, 0);
-
-	if (IS_BROADWELL(dev) && dig_port->port != PORT_A)
-		only_standby = true;
-
-	/* Enable PSR in sink */
-	if (intel_dp->psr_dpcd[1] & DP_PSR_NO_TRAIN_ON_EXIT || only_standby)
-		drm_dp_dpcd_writeb(&intel_dp->aux, DP_PSR_EN_CFG,
-				   DP_PSR_ENABLE & ~DP_PSR_MAIN_LINK_ACTIVE);
-	else
-		drm_dp_dpcd_writeb(&intel_dp->aux, DP_PSR_EN_CFG,
-				   DP_PSR_ENABLE | DP_PSR_MAIN_LINK_ACTIVE);
-
-	/* Setup AUX registers */
-	I915_WRITE(EDP_PSR_AUX_DATA1(dev), EDP_PSR_DPCD_COMMAND);
-	I915_WRITE(EDP_PSR_AUX_DATA2(dev), EDP_PSR_DPCD_NORMAL_OPERATION);
-	I915_WRITE(EDP_PSR_AUX_CTL(dev),
-		   DP_AUX_CH_CTL_TIME_OUT_400us |
-		   (msg_size << DP_AUX_CH_CTL_MESSAGE_SIZE_SHIFT) |
-		   (precharge << DP_AUX_CH_CTL_PRECHARGE_2US_SHIFT) |
-		   (aux_clock_divider << DP_AUX_CH_CTL_BIT_CLOCK_2X_SHIFT));
-}
-
-static void intel_edp_psr_enable_source(struct intel_dp *intel_dp)
-{
-	struct intel_digital_port *dig_port = dp_to_dig_port(intel_dp);
-	struct drm_device *dev = dig_port->base.base.dev;
-	struct drm_i915_private *dev_priv = dev->dev_private;
-	uint32_t max_sleep_time = 0x1f;
-	uint32_t idle_frames = 1;
-	uint32_t val = 0x0;
-	const uint32_t link_entry_time = EDP_PSR_MIN_LINK_ENTRY_TIME_8_LINES;
-	bool only_standby = false;
-
-	if (IS_BROADWELL(dev) && dig_port->port != PORT_A)
-		only_standby = true;
-
-	if (intel_dp->psr_dpcd[1] & DP_PSR_NO_TRAIN_ON_EXIT || only_standby) {
-		val |= EDP_PSR_LINK_STANDBY;
-		val |= EDP_PSR_TP2_TP3_TIME_0us;
-		val |= EDP_PSR_TP1_TIME_0us;
-		val |= EDP_PSR_SKIP_AUX_EXIT;
-		val |= IS_BROADWELL(dev) ? BDW_PSR_SINGLE_FRAME : 0;
-	} else
-		val |= EDP_PSR_LINK_DISABLE;
-
-	I915_WRITE(EDP_PSR_CTL(dev), val |
-		   (IS_BROADWELL(dev) ? 0 : link_entry_time) |
-		   max_sleep_time << EDP_PSR_MAX_SLEEP_TIME_SHIFT |
-		   idle_frames << EDP_PSR_IDLE_FRAME_SHIFT |
-		   EDP_PSR_ENABLE);
-}
-
-static bool intel_edp_psr_match_conditions(struct intel_dp *intel_dp)
-{
-	struct intel_digital_port *dig_port = dp_to_dig_port(intel_dp);
-	struct drm_device *dev = dig_port->base.base.dev;
-	struct drm_i915_private *dev_priv = dev->dev_private;
-	struct drm_crtc *crtc = dig_port->base.base.crtc;
-	struct intel_crtc *intel_crtc = to_intel_crtc(crtc);
-
-	lockdep_assert_held(&dev_priv->psr.lock);
-	WARN_ON(!drm_modeset_is_locked(&dev->mode_config.connection_mutex));
-	WARN_ON(!drm_modeset_is_locked(&crtc->mutex));
-
-	dev_priv->psr.source_ok = false;
-
-	if (IS_HASWELL(dev) && dig_port->port != PORT_A) {
-		DRM_DEBUG_KMS("HSW ties PSR to DDI A (eDP)\n");
-		return false;
-	}
-
-	if (!i915.enable_psr) {
-		DRM_DEBUG_KMS("PSR disable by flag\n");
-		return false;
-	}
-
-	/* Below limitations aren't valid for Broadwell */
-	if (IS_BROADWELL(dev))
-		goto out;
-
-	if (I915_READ(HSW_STEREO_3D_CTL(intel_crtc->config.cpu_transcoder)) &
-	    S3D_ENABLE) {
-		DRM_DEBUG_KMS("PSR condition failed: Stereo 3D is Enabled\n");
-		return false;
-	}
-
-	if (intel_crtc->config.adjusted_mode.flags & DRM_MODE_FLAG_INTERLACE) {
-		DRM_DEBUG_KMS("PSR condition failed: Interlaced is Enabled\n");
-		return false;
-	}
-
- out:
-	dev_priv->psr.source_ok = true;
-	return true;
-}
-
-static void intel_edp_psr_do_enable(struct intel_dp *intel_dp)
-{
-	struct intel_digital_port *intel_dig_port = dp_to_dig_port(intel_dp);
-	struct drm_device *dev = intel_dig_port->base.base.dev;
-	struct drm_i915_private *dev_priv = dev->dev_private;
-
-	WARN_ON(I915_READ(EDP_PSR_CTL(dev)) & EDP_PSR_ENABLE);
-	WARN_ON(dev_priv->psr.active);
-	lockdep_assert_held(&dev_priv->psr.lock);
-
-	/* Enable PSR on the panel */
-	intel_edp_psr_enable_sink(intel_dp);
-
-	/* Enable PSR on the host */
-	intel_edp_psr_enable_source(intel_dp);
-
-	dev_priv->psr.active = true;
-}
-
-void intel_edp_psr_enable(struct intel_dp *intel_dp)
-{
-	struct drm_device *dev = intel_dp_to_dev(intel_dp);
-	struct drm_i915_private *dev_priv = dev->dev_private;
-
-	if (!HAS_PSR(dev)) {
-		DRM_DEBUG_KMS("PSR not supported on this platform\n");
-		return;
-	}
-
-	if (!is_edp_psr(intel_dp)) {
-		DRM_DEBUG_KMS("PSR not supported by this panel\n");
-		return;
-	}
-
-	mutex_lock(&dev_priv->psr.lock);
-	if (dev_priv->psr.enabled) {
-		DRM_DEBUG_KMS("PSR already in use\n");
-		mutex_unlock(&dev_priv->psr.lock);
-		return;
-	}
-
-	dev_priv->psr.busy_frontbuffer_bits = 0;
-
-	/* Setup PSR once */
-	intel_edp_psr_setup(intel_dp);
-
-	if (intel_edp_psr_match_conditions(intel_dp))
-		dev_priv->psr.enabled = intel_dp;
-	mutex_unlock(&dev_priv->psr.lock);
-}
-
-void intel_edp_psr_disable(struct intel_dp *intel_dp)
-{
-	struct drm_device *dev = intel_dp_to_dev(intel_dp);
-	struct drm_i915_private *dev_priv = dev->dev_private;
-
-	mutex_lock(&dev_priv->psr.lock);
-	if (!dev_priv->psr.enabled) {
-		mutex_unlock(&dev_priv->psr.lock);
-		return;
-	}
-
-	if (dev_priv->psr.active) {
-		I915_WRITE(EDP_PSR_CTL(dev),
-			   I915_READ(EDP_PSR_CTL(dev)) & ~EDP_PSR_ENABLE);
-
-		/* Wait till PSR is idle */
-		if (_wait_for((I915_READ(EDP_PSR_STATUS_CTL(dev)) &
-			       EDP_PSR_STATUS_STATE_MASK) == 0, 2000, 10))
-			DRM_ERROR("Timed out waiting for PSR Idle State\n");
-
-		dev_priv->psr.active = false;
-	} else {
-		WARN_ON(I915_READ(EDP_PSR_CTL(dev)) & EDP_PSR_ENABLE);
-	}
-
-	dev_priv->psr.enabled = NULL;
-	mutex_unlock(&dev_priv->psr.lock);
-
-	cancel_delayed_work_sync(&dev_priv->psr.work);
-}
-
-static void intel_edp_psr_work(struct work_struct *work)
-{
-	struct drm_i915_private *dev_priv =
-		container_of(work, typeof(*dev_priv), psr.work.work);
-	struct intel_dp *intel_dp = dev_priv->psr.enabled;
-
-	mutex_lock(&dev_priv->psr.lock);
-	intel_dp = dev_priv->psr.enabled;
-
-	if (!intel_dp)
-		goto unlock;
-
-	/*
-	 * The delayed work can race with an invalidate hence we need to
-	 * recheck. Since psr_flush first clears this and then reschedules we
-	 * won't ever miss a flush when bailing out here.
-	 */
-	if (dev_priv->psr.busy_frontbuffer_bits)
-		goto unlock;
-
-	intel_edp_psr_do_enable(intel_dp);
-unlock:
-	mutex_unlock(&dev_priv->psr.lock);
-}
-
-static void intel_edp_psr_do_exit(struct drm_device *dev)
-{
-	struct drm_i915_private *dev_priv = dev->dev_private;
-
-	if (dev_priv->psr.active) {
-		u32 val = I915_READ(EDP_PSR_CTL(dev));
-
-		WARN_ON(!(val & EDP_PSR_ENABLE));
-
-		I915_WRITE(EDP_PSR_CTL(dev), val & ~EDP_PSR_ENABLE);
-
-		dev_priv->psr.active = false;
-	}
-
-}
-
-void intel_edp_psr_invalidate(struct drm_device *dev,
-			      unsigned frontbuffer_bits)
-{
-	struct drm_i915_private *dev_priv = dev->dev_private;
-	struct drm_crtc *crtc;
-	enum pipe pipe;
-
-	mutex_lock(&dev_priv->psr.lock);
-	if (!dev_priv->psr.enabled) {
-		mutex_unlock(&dev_priv->psr.lock);
-		return;
-	}
-
-	crtc = dp_to_dig_port(dev_priv->psr.enabled)->base.base.crtc;
-	pipe = to_intel_crtc(crtc)->pipe;
-
-	intel_edp_psr_do_exit(dev);
-
-	frontbuffer_bits &= INTEL_FRONTBUFFER_ALL_MASK(pipe);
-
-	dev_priv->psr.busy_frontbuffer_bits |= frontbuffer_bits;
-	mutex_unlock(&dev_priv->psr.lock);
-}
-
-void intel_edp_psr_flush(struct drm_device *dev,
-			 unsigned frontbuffer_bits)
-{
-	struct drm_i915_private *dev_priv = dev->dev_private;
-	struct drm_crtc *crtc;
-	enum pipe pipe;
-
-	mutex_lock(&dev_priv->psr.lock);
-	if (!dev_priv->psr.enabled) {
-		mutex_unlock(&dev_priv->psr.lock);
-		return;
-	}
-
-	crtc = dp_to_dig_port(dev_priv->psr.enabled)->base.base.crtc;
-	pipe = to_intel_crtc(crtc)->pipe;
-	dev_priv->psr.busy_frontbuffer_bits &= ~frontbuffer_bits;
-
-	/*
-	 * On Haswell sprite plane updates don't result in a psr invalidating
-	 * signal in the hardware. Which means we need to manually fake this in
-	 * software for all flushes, not just when we've seen a preceding
-	 * invalidation through frontbuffer rendering.
-	 */
-	if (IS_HASWELL(dev) &&
-	    (frontbuffer_bits & INTEL_FRONTBUFFER_SPRITE(pipe)))
-		intel_edp_psr_do_exit(dev);
-
-	if (!dev_priv->psr.active && !dev_priv->psr.busy_frontbuffer_bits)
-		schedule_delayed_work(&dev_priv->psr.work,
-				      msecs_to_jiffies(100));
-	mutex_unlock(&dev_priv->psr.lock);
-}
-
-void intel_edp_psr_init(struct drm_device *dev)
-{
-	struct drm_i915_private *dev_priv = dev->dev_private;
-
-	INIT_DELAYED_WORK(&dev_priv->psr.work, intel_edp_psr_work);
-	mutex_init(&dev_priv->psr.lock);
-}
-
 static void intel_disable_dp(struct intel_encoder *encoder)
 {
 	struct intel_dp *intel_dp = enc_to_intel_dp(&encoder->base);
 	struct drm_device *dev = encoder->base.dev;
+	struct intel_crtc *crtc = to_intel_crtc(encoder->base.crtc);
+
+	if (crtc->config.has_audio)
+		intel_audio_codec_disable(encoder);
 
 	/* Make sure the panel is off before trying to change the mode. But also
 	 * ensure that we have vdd while we switch off the panel. */
@@ -2467,14 +2258,23 @@ static void intel_dp_enable_port(struct intel_dp *intel_dp)
 	struct drm_device *dev = intel_dp_to_dev(intel_dp);
 	struct drm_i915_private *dev_priv = dev->dev_private;
 
-	intel_dp->DP |= DP_PORT_EN;
-
 	/* enable with pattern 1 (as per spec) */
 	_intel_dp_set_link_train(intel_dp, &intel_dp->DP,
 				 DP_TRAINING_PATTERN_1);
 
 	I915_WRITE(intel_dp->output_reg, intel_dp->DP);
 	POSTING_READ(intel_dp->output_reg);
+
+	/*
+	 * Magic for VLV/CHV. We _must_ first set up the register
+	 * without actually enabling the port, and then do another
+	 * write to enable the port. Otherwise link training will
+	 * fail when the power sequencer is freshly used for this port.
+	 */
+	intel_dp->DP |= DP_PORT_EN;
+
+	I915_WRITE(intel_dp->output_reg, intel_dp->DP);
+	POSTING_READ(intel_dp->output_reg);
 }
 
 static void intel_enable_dp(struct intel_encoder *encoder)
@@ -2482,19 +2282,38 @@ static void intel_enable_dp(struct intel_encoder *encoder)
 	struct intel_dp *intel_dp = enc_to_intel_dp(&encoder->base);
 	struct drm_device *dev = encoder->base.dev;
 	struct drm_i915_private *dev_priv = dev->dev_private;
+	struct intel_crtc *crtc = to_intel_crtc(encoder->base.crtc);
 	uint32_t dp_reg = I915_READ(intel_dp->output_reg);
 
 	if (WARN_ON(dp_reg & DP_PORT_EN))
 		return;
 
+	pps_lock(intel_dp);
+
+	if (IS_VALLEYVIEW(dev))
+		vlv_init_panel_power_sequencer(intel_dp);
+
 	intel_dp_enable_port(intel_dp);
-	intel_edp_panel_vdd_on(intel_dp);
-	intel_edp_panel_on(intel_dp);
-	intel_edp_panel_vdd_off(intel_dp, true);
+
+	edp_panel_vdd_on(intel_dp);
+	edp_panel_on(intel_dp);
+	edp_panel_vdd_off(intel_dp, true);
+
+	pps_unlock(intel_dp);
+
+	if (IS_VALLEYVIEW(dev))
+		vlv_wait_port_ready(dev_priv, dp_to_dig_port(intel_dp));
+
 	intel_dp_sink_dpms(intel_dp, DRM_MODE_DPMS_ON);
 	intel_dp_start_link_train(intel_dp);
 	intel_dp_complete_link_train(intel_dp);
 	intel_dp_stop_link_train(intel_dp);
+
+	if (crtc->config.has_audio) {
+		DRM_DEBUG_DRIVER("Enabling DP audio on pipe %c\n",
+				 pipe_name(crtc->pipe));
+		intel_audio_codec_enable(encoder);
+	}
 }
 
 static void g4x_enable_dp(struct intel_encoder *encoder)
@@ -2526,6 +2345,32 @@ static void g4x_pre_enable_dp(struct intel_encoder *encoder)
 	}
 }
 
+static void vlv_detach_power_sequencer(struct intel_dp *intel_dp)
+{
+	struct intel_digital_port *intel_dig_port = dp_to_dig_port(intel_dp);
+	struct drm_i915_private *dev_priv = intel_dig_port->base.base.dev->dev_private;
+	enum pipe pipe = intel_dp->pps_pipe;
+	int pp_on_reg = VLV_PIPE_PP_ON_DELAYS(pipe);
+
+	edp_panel_vdd_off_sync(intel_dp);
+
+	/*
+	 * VLV seems to get confused when multiple power sequencers
+	 * have the same port selected (even if only one has power/vdd
+	 * enabled). The failure manifests as vlv_wait_port_ready() failing.
+	 * CHV on the other hand doesn't seem to mind having the same port
+	 * selected in multiple power sequencers, but let's clear the
+	 * port select always when logically disconnecting a power sequencer
+	 * from a port.
+	 */
+	DRM_DEBUG_KMS("detaching pipe %c power sequencer from port %c\n",
+		      pipe_name(pipe), port_name(intel_dig_port->port));
+	I915_WRITE(pp_on_reg, 0);
+	POSTING_READ(pp_on_reg);
+
+	intel_dp->pps_pipe = INVALID_PIPE;
+}
+
 static void vlv_steal_power_sequencer(struct drm_device *dev,
 				      enum pipe pipe)
 {
@@ -2534,6 +2379,9 @@ static void vlv_steal_power_sequencer(struct drm_device *dev,
 
 	lockdep_assert_held(&dev_priv->pps_mutex);
 
+	if (WARN_ON(pipe != PIPE_A && pipe != PIPE_B))
+		return;
+
 	list_for_each_entry(encoder, &dev->mode_config.encoder_list,
 			    base.head) {
 		struct intel_dp *intel_dp;
@@ -2551,10 +2399,12 @@ static void vlv_steal_power_sequencer(struct drm_device *dev,
 		DRM_DEBUG_KMS("stealing pipe %c power sequencer from port %c\n",
 			      pipe_name(pipe), port_name(port));
 
-		/* make sure vdd is off before we steal it */
-		edp_panel_vdd_off_sync(intel_dp);
+		WARN(encoder->connectors_active,
+		     "stealing pipe %c power sequencer from active eDP port %c\n",
+		     pipe_name(pipe), port_name(port));
 
-		intel_dp->pps_pipe = INVALID_PIPE;
+		/* make sure vdd is off before we steal it */
+		vlv_detach_power_sequencer(intel_dp);
 	}
 }
 
@@ -2565,10 +2415,12 @@ static void vlv_init_panel_power_sequencer(struct intel_dp *intel_dp)
 	struct drm_device *dev = encoder->base.dev;
 	struct drm_i915_private *dev_priv = dev->dev_private;
 	struct intel_crtc *crtc = to_intel_crtc(encoder->base.crtc);
-	struct edp_power_seq power_seq;
 
 	lockdep_assert_held(&dev_priv->pps_mutex);
 
+	if (!is_edp(intel_dp))
+		return;
+
 	if (intel_dp->pps_pipe == crtc->pipe)
 		return;
 
@@ -2578,7 +2430,7 @@ static void vlv_init_panel_power_sequencer(struct intel_dp *intel_dp)
 	 * we still have control of it.
 	 */
 	if (intel_dp->pps_pipe != INVALID_PIPE)
-		edp_panel_vdd_off_sync(intel_dp);
+		vlv_detach_power_sequencer(intel_dp);
 
 	/*
 	 * We may be stealing the power
@@ -2593,9 +2445,8 @@ static void vlv_init_panel_power_sequencer(struct intel_dp *intel_dp)
 		      pipe_name(intel_dp->pps_pipe), port_name(intel_dig_port->port));
 
 	/* init power sequencer on this pipe and port */
-	intel_dp_init_panel_power_sequencer(dev, intel_dp, &power_seq);
-	intel_dp_init_panel_power_sequencer_registers(dev, intel_dp,
-						      &power_seq);
+	intel_dp_init_panel_power_sequencer(dev, intel_dp);
+	intel_dp_init_panel_power_sequencer_registers(dev, intel_dp);
 }
 
 static void vlv_pre_enable_dp(struct intel_encoder *encoder)
@@ -2624,15 +2475,7 @@ static void vlv_pre_enable_dp(struct intel_encoder *encoder)
 
 	mutex_unlock(&dev_priv->dpio_lock);
 
-	if (is_edp(intel_dp)) {
-		pps_lock(intel_dp);
-		vlv_init_panel_power_sequencer(intel_dp);
-		pps_unlock(intel_dp);
-	}
-
 	intel_enable_dp(encoder);
-
-	vlv_wait_port_ready(dev_priv, dport);
 }
 
 static void vlv_dp_pre_pll_enable(struct intel_encoder *encoder)
@@ -2680,6 +2523,15 @@ static void chv_pre_enable_dp(struct intel_encoder *encoder)
 
 	mutex_lock(&dev_priv->dpio_lock);
 
+	/* allow hardware to manage TX FIFO reset source */
+	val = vlv_dpio_read(dev_priv, pipe, VLV_PCS01_DW11(ch));
+	val &= ~DPIO_LANEDESKEW_STRAP_OVRD;
+	vlv_dpio_write(dev_priv, pipe, VLV_PCS01_DW11(ch), val);
+
+	val = vlv_dpio_read(dev_priv, pipe, VLV_PCS23_DW11(ch));
+	val &= ~DPIO_LANEDESKEW_STRAP_OVRD;
+	vlv_dpio_write(dev_priv, pipe, VLV_PCS23_DW11(ch), val);
+
 	/* Deassert soft data lane reset*/
 	val = vlv_dpio_read(dev_priv, pipe, VLV_PCS01_DW1(ch));
 	val |= CHV_PCS_REQ_SOFTRESET_EN;
@@ -2715,15 +2567,7 @@ static void chv_pre_enable_dp(struct intel_encoder *encoder)
 
 	mutex_unlock(&dev_priv->dpio_lock);
 
-	if (is_edp(intel_dp)) {
-		pps_lock(intel_dp);
-		vlv_init_panel_power_sequencer(intel_dp);
-		pps_unlock(intel_dp);
-	}
-
 	intel_enable_dp(encoder);
-
-	vlv_wait_port_ready(dev_priv, dport);
 }
 
 static void chv_dp_pre_pll_enable(struct intel_encoder *encoder)
@@ -2843,7 +2687,9 @@ intel_dp_voltage_max(struct intel_dp *intel_dp)
 	struct drm_device *dev = intel_dp_to_dev(intel_dp);
 	enum port port = dp_to_dig_port(intel_dp)->port;
 
-	if (IS_VALLEYVIEW(dev))
+	if (INTEL_INFO(dev)->gen >= 9)
+		return DP_TRAIN_VOLTAGE_SWING_LEVEL_2;
+	else if (IS_VALLEYVIEW(dev))
 		return DP_TRAIN_VOLTAGE_SWING_LEVEL_3;
 	else if (IS_GEN7(dev) && port == PORT_A)
 		return DP_TRAIN_VOLTAGE_SWING_LEVEL_2;
@@ -2859,7 +2705,18 @@ intel_dp_pre_emphasis_max(struct intel_dp *intel_dp, uint8_t voltage_swing)
 	struct drm_device *dev = intel_dp_to_dev(intel_dp);
 	enum port port = dp_to_dig_port(intel_dp)->port;
 
-	if (IS_HASWELL(dev) || IS_BROADWELL(dev)) {
+	if (INTEL_INFO(dev)->gen >= 9) {
+		switch (voltage_swing & DP_TRAIN_VOLTAGE_SWING_MASK) {
+		case DP_TRAIN_VOLTAGE_SWING_LEVEL_0:
+			return DP_TRAIN_PRE_EMPH_LEVEL_3;
+		case DP_TRAIN_VOLTAGE_SWING_LEVEL_1:
+			return DP_TRAIN_PRE_EMPH_LEVEL_2;
+		case DP_TRAIN_VOLTAGE_SWING_LEVEL_2:
+			return DP_TRAIN_PRE_EMPH_LEVEL_1;
+		default:
+			return DP_TRAIN_PRE_EMPH_LEVEL_0;
+		}
+	} else if (IS_HASWELL(dev) || IS_BROADWELL(dev)) {
 		switch (voltage_swing & DP_TRAIN_VOLTAGE_SWING_MASK) {
 		case DP_TRAIN_VOLTAGE_SWING_LEVEL_0:
 			return DP_TRAIN_PRE_EMPH_LEVEL_3;
@@ -3095,12 +2952,26 @@ static uint32_t intel_chv_signal_levels(struct intel_dp *intel_dp)
 	/* Clear calc init */
 	val = vlv_dpio_read(dev_priv, pipe, VLV_PCS01_DW10(ch));
 	val &= ~(DPIO_PCS_SWING_CALC_TX0_TX2 | DPIO_PCS_SWING_CALC_TX1_TX3);
+	val &= ~(DPIO_PCS_TX1DEEMP_MASK | DPIO_PCS_TX2DEEMP_MASK);
+	val |= DPIO_PCS_TX1DEEMP_9P5 | DPIO_PCS_TX2DEEMP_9P5;
 	vlv_dpio_write(dev_priv, pipe, VLV_PCS01_DW10(ch), val);
 
 	val = vlv_dpio_read(dev_priv, pipe, VLV_PCS23_DW10(ch));
 	val &= ~(DPIO_PCS_SWING_CALC_TX0_TX2 | DPIO_PCS_SWING_CALC_TX1_TX3);
+	val &= ~(DPIO_PCS_TX1DEEMP_MASK | DPIO_PCS_TX2DEEMP_MASK);
+	val |= DPIO_PCS_TX1DEEMP_9P5 | DPIO_PCS_TX2DEEMP_9P5;
 	vlv_dpio_write(dev_priv, pipe, VLV_PCS23_DW10(ch), val);
 
+	val = vlv_dpio_read(dev_priv, pipe, VLV_PCS01_DW9(ch));
+	val &= ~(DPIO_PCS_TX1MARGIN_MASK | DPIO_PCS_TX2MARGIN_MASK);
+	val |= DPIO_PCS_TX1MARGIN_000 | DPIO_PCS_TX2MARGIN_000;
+	vlv_dpio_write(dev_priv, pipe, VLV_PCS01_DW9(ch), val);
+
+	val = vlv_dpio_read(dev_priv, pipe, VLV_PCS23_DW9(ch));
+	val &= ~(DPIO_PCS_TX1MARGIN_MASK | DPIO_PCS_TX2MARGIN_MASK);
+	val |= DPIO_PCS_TX1MARGIN_000 | DPIO_PCS_TX2MARGIN_000;
+	vlv_dpio_write(dev_priv, pipe, VLV_PCS23_DW9(ch), val);
+
 	/* Program swing deemph */
 	for (i = 0; i < 4; i++) {
 		val = vlv_dpio_read(dev_priv, pipe, CHV_TX_DW4(ch, i));
@@ -3341,7 +3212,7 @@ intel_dp_set_signal_levels(struct intel_dp *intel_dp, uint32_t *DP)
 	uint32_t signal_levels, mask;
 	uint8_t train_set = intel_dp->train_set[0];
 
-	if (IS_HASWELL(dev) || IS_BROADWELL(dev)) {
+	if (IS_HASWELL(dev) || IS_BROADWELL(dev) || INTEL_INFO(dev)->gen >= 9) {
 		signal_levels = intel_hsw_signal_levels(train_set);
 		mask = DDI_BUF_EMP_MASK;
 	} else if (IS_CHERRYVIEW(dev)) {
@@ -3605,7 +3476,6 @@ intel_dp_complete_link_train(struct intel_dp *intel_dp)
 
 		/* Try 5 times, then try clock recovery if that fails */
 		if (tries > 5) {
-			intel_dp_link_down(intel_dp);
 			intel_dp_start_link_train(intel_dp);
 			intel_dp_set_link_train(intel_dp, &DP,
 						training_pattern |
@@ -3763,8 +3633,6 @@ intel_dp_probe_oui(struct intel_dp *intel_dp)
 	if (!(intel_dp->dpcd[DP_DOWN_STREAM_PORT_COUNT] & DP_OUI_SUPPORT))
 		return;
 
-	intel_edp_panel_vdd_on(intel_dp);
-
 	if (intel_dp_dpcd_read_wake(&intel_dp->aux, DP_SINK_OUI, buf, 3) == 3)
 		DRM_DEBUG_KMS("Sink OUI: %02hx%02hx%02hx\n",
 			      buf[0], buf[1], buf[2]);
@@ -3772,8 +3640,6 @@ intel_dp_probe_oui(struct intel_dp *intel_dp)
 	if (intel_dp_dpcd_read_wake(&intel_dp->aux, DP_BRANCH_OUI, buf, 3) == 3)
 		DRM_DEBUG_KMS("Branch OUI: %02hx%02hx%02hx\n",
 			      buf[0], buf[1], buf[2]);
-
-	intel_edp_panel_vdd_off(intel_dp, false);
 }
 
 static bool
@@ -3787,7 +3653,6 @@ intel_dp_probe_mst(struct intel_dp *intel_dp)
 	if (intel_dp->dpcd[DP_DPCD_REV] < 0x12)
 		return false;
 
-	intel_edp_panel_vdd_on(intel_dp);
 	if (intel_dp_dpcd_read_wake(&intel_dp->aux, DP_MSTM_CAP, buf, 1)) {
 		if (buf[0] & DP_MST_CAP) {
 			DRM_DEBUG_KMS("Sink is MST capable\n");
@@ -3797,7 +3662,6 @@ intel_dp_probe_mst(struct intel_dp *intel_dp)
 			intel_dp->is_mst = false;
 		}
 	}
-	intel_edp_panel_vdd_off(intel_dp, false);
 
 	drm_dp_mst_topology_mgr_set_mst(&intel_dp->mst_mgr, intel_dp->is_mst);
 	return intel_dp->is_mst;
@@ -3809,26 +3673,48 @@ int intel_dp_sink_crc(struct intel_dp *intel_dp, u8 *crc)
 	struct drm_device *dev = intel_dig_port->base.base.dev;
 	struct intel_crtc *intel_crtc =
 		to_intel_crtc(intel_dig_port->base.base.crtc);
-	u8 buf[1];
+	u8 buf;
+	int test_crc_count;
+	int attempts = 6;
 
-	if (drm_dp_dpcd_readb(&intel_dp->aux, DP_TEST_SINK_MISC, buf) < 0)
+	if (drm_dp_dpcd_readb(&intel_dp->aux, DP_TEST_SINK_MISC, &buf) < 0)
 		return -EIO;
 
-	if (!(buf[0] & DP_TEST_CRC_SUPPORTED))
+	if (!(buf & DP_TEST_CRC_SUPPORTED))
 		return -ENOTTY;
 
+	if (drm_dp_dpcd_readb(&intel_dp->aux, DP_TEST_SINK, &buf) < 0)
+		return -EIO;
+
 	if (drm_dp_dpcd_writeb(&intel_dp->aux, DP_TEST_SINK,
-			       DP_TEST_SINK_START) < 0)
+				buf | DP_TEST_SINK_START) < 0)
+		return -EIO;
+
+	if (drm_dp_dpcd_readb(&intel_dp->aux, DP_TEST_SINK_MISC, &buf) < 0)
 		return -EIO;
+	test_crc_count = buf & DP_TEST_COUNT_MASK;
 
-	/* Wait 2 vblanks to be sure we will have the correct CRC value */
-	intel_wait_for_vblank(dev, intel_crtc->pipe);
-	intel_wait_for_vblank(dev, intel_crtc->pipe);
+	do {
+		if (drm_dp_dpcd_readb(&intel_dp->aux,
+				      DP_TEST_SINK_MISC, &buf) < 0)
+			return -EIO;
+		intel_wait_for_vblank(dev, intel_crtc->pipe);
+	} while (--attempts && (buf & DP_TEST_COUNT_MASK) == test_crc_count);
+
+	if (attempts == 0) {
+		DRM_DEBUG_KMS("Panel is unable to calculate CRC after 6 vblanks\n");
+		return -ETIMEDOUT;
+	}
 
 	if (drm_dp_dpcd_read(&intel_dp->aux, DP_TEST_CRC_R_CR, crc, 6) < 0)
 		return -EIO;
 
-	drm_dp_dpcd_writeb(&intel_dp->aux, DP_TEST_SINK, 0);
+	if (drm_dp_dpcd_readb(&intel_dp->aux, DP_TEST_SINK, &buf) < 0)
+		return -EIO;
+	if (drm_dp_dpcd_writeb(&intel_dp->aux, DP_TEST_SINK,
+			       buf & ~DP_TEST_SINK_START) < 0)
+		return -EIO;
+
 	return 0;
 }
 
@@ -4456,9 +4342,52 @@ static void intel_dp_encoder_suspend(struct intel_encoder *intel_encoder)
 	pps_unlock(intel_dp);
 }
 
+static void intel_edp_panel_vdd_sanitize(struct intel_dp *intel_dp)
+{
+	struct intel_digital_port *intel_dig_port = dp_to_dig_port(intel_dp);
+	struct drm_device *dev = intel_dig_port->base.base.dev;
+	struct drm_i915_private *dev_priv = dev->dev_private;
+	enum intel_display_power_domain power_domain;
+
+	lockdep_assert_held(&dev_priv->pps_mutex);
+
+	if (!edp_have_panel_vdd(intel_dp))
+		return;
+
+	/*
+	 * The VDD bit needs a power domain reference, so if the bit is
+	 * already enabled when we boot or resume, grab this reference and
+	 * schedule a vdd off, so we don't hold on to the reference
+	 * indefinitely.
+	 */
+	DRM_DEBUG_KMS("VDD left on by BIOS, adjusting state tracking\n");
+	power_domain = intel_display_port_power_domain(&intel_dig_port->base);
+	intel_display_power_get(dev_priv, power_domain);
+
+	edp_panel_vdd_schedule_off(intel_dp);
+}
+
 static void intel_dp_encoder_reset(struct drm_encoder *encoder)
 {
-	intel_edp_panel_vdd_sanitize(to_intel_encoder(encoder));
+	struct intel_dp *intel_dp;
+
+	if (to_intel_encoder(encoder)->type != INTEL_OUTPUT_EDP)
+		return;
+
+	intel_dp = enc_to_intel_dp(encoder);
+
+	pps_lock(intel_dp);
+
+	/*
+	 * Read out the current power sequencer assignment,
+	 * in case the BIOS did something with it.
+	 */
+	if (IS_VALLEYVIEW(encoder->dev))
+		vlv_initial_power_sequencer_setup(intel_dp);
+
+	intel_edp_panel_vdd_sanitize(intel_dp);
+
+	pps_unlock(intel_dp);
 }
 
 static const struct drm_connector_funcs intel_dp_connector_funcs = {
@@ -4645,16 +4574,20 @@ static void intel_dp_init_panel_power_timestamps(struct intel_dp *intel_dp)
 
 static void
 intel_dp_init_panel_power_sequencer(struct drm_device *dev,
-				    struct intel_dp *intel_dp,
-				    struct edp_power_seq *out)
+				    struct intel_dp *intel_dp)
 {
 	struct drm_i915_private *dev_priv = dev->dev_private;
-	struct edp_power_seq cur, vbt, spec, final;
+	struct edp_power_seq cur, vbt, spec,
+		*final = &intel_dp->pps_delays;
 	u32 pp_on, pp_off, pp_div, pp;
 	int pp_ctrl_reg, pp_on_reg, pp_off_reg, pp_div_reg;
 
 	lockdep_assert_held(&dev_priv->pps_mutex);
 
+	/* already initialized? */
+	if (final->t11_t12 != 0)
+		return;
+
 	if (HAS_PCH_SPLIT(dev)) {
 		pp_ctrl_reg = PCH_PP_CONTROL;
 		pp_on_reg = PCH_PP_ON_DELAYS;
@@ -4716,7 +4649,7 @@ intel_dp_init_panel_power_sequencer(struct drm_device *dev,
 
 	/* Use the max of the register settings and vbt. If both are
 	 * unset, fall back to the spec limits. */
-#define assign_final(field)	final.field = (max(cur.field, vbt.field) == 0 ? \
+#define assign_final(field)	final->field = (max(cur.field, vbt.field) == 0 ? \
 				       spec.field : \
 				       max(cur.field, vbt.field))
 	assign_final(t1_t3);
@@ -4726,7 +4659,7 @@ intel_dp_init_panel_power_sequencer(struct drm_device *dev,
 	assign_final(t11_t12);
 #undef assign_final
 
-#define get_delay(field)	(DIV_ROUND_UP(final.field, 10))
+#define get_delay(field)	(DIV_ROUND_UP(final->field, 10))
 	intel_dp->panel_power_up_delay = get_delay(t1_t3);
 	intel_dp->backlight_on_delay = get_delay(t8);
 	intel_dp->backlight_off_delay = get_delay(t9);
@@ -4740,21 +4673,18 @@ intel_dp_init_panel_power_sequencer(struct drm_device *dev,
 
 	DRM_DEBUG_KMS("backlight on delay %d, off delay %d\n",
 		      intel_dp->backlight_on_delay, intel_dp->backlight_off_delay);
-
-	if (out)
-		*out = final;
 }
 
 static void
 intel_dp_init_panel_power_sequencer_registers(struct drm_device *dev,
-					      struct intel_dp *intel_dp,
-					      struct edp_power_seq *seq)
+					      struct intel_dp *intel_dp)
 {
 	struct drm_i915_private *dev_priv = dev->dev_private;
 	u32 pp_on, pp_off, pp_div, port_sel = 0;
 	int div = HAS_PCH_SPLIT(dev) ? intel_pch_rawclk(dev) : intel_hrawclk(dev);
 	int pp_on_reg, pp_off_reg, pp_div_reg;
 	enum port port = dp_to_dig_port(intel_dp)->port;
+	const struct edp_power_seq *seq = &intel_dp->pps_delays;
 
 	lockdep_assert_held(&dev_priv->pps_mutex);
 
@@ -4837,7 +4767,7 @@ void intel_dp_set_drrs_state(struct drm_device *dev, int refresh_rate)
 	 * hard to tell without seeing the user of this function of this code.
 	 * Check locking and ordering once that lands.
 	 */
-	if (INTEL_INFO(dev)->gen < 8 && intel_edp_is_psr_enabled(dev)) {
+	if (INTEL_INFO(dev)->gen < 8 && intel_psr_is_enabled(dev)) {
 		DRM_DEBUG_KMS("DRRS is disabled as PSR is enabled\n");
 		return;
 	}
@@ -4940,40 +4870,8 @@ intel_dp_drrs_init(struct intel_digital_port *intel_dig_port,
 	return downclock_mode;
 }
 
-void intel_edp_panel_vdd_sanitize(struct intel_encoder *intel_encoder)
-{
-	struct drm_device *dev = intel_encoder->base.dev;
-	struct drm_i915_private *dev_priv = dev->dev_private;
-	struct intel_dp *intel_dp;
-	enum intel_display_power_domain power_domain;
-
-	if (intel_encoder->type != INTEL_OUTPUT_EDP)
-		return;
-
-	intel_dp = enc_to_intel_dp(&intel_encoder->base);
-
-	pps_lock(intel_dp);
-
-	if (!edp_have_panel_vdd(intel_dp))
-		goto out;
-	/*
-	 * The VDD bit needs a power domain reference, so if the bit is
-	 * already enabled when we boot or resume, grab this reference and
-	 * schedule a vdd off, so we don't hold on to the reference
-	 * indefinitely.
-	 */
-	DRM_DEBUG_KMS("VDD left on by BIOS, adjusting state tracking\n");
-	power_domain = intel_display_port_power_domain(intel_encoder);
-	intel_display_power_get(dev_priv, power_domain);
-
-	edp_panel_vdd_schedule_off(intel_dp);
- out:
-	pps_unlock(intel_dp);
-}
-
 static bool intel_edp_init_connector(struct intel_dp *intel_dp,
-				     struct intel_connector *intel_connector,
-				     struct edp_power_seq *power_seq)
+				     struct intel_connector *intel_connector)
 {
 	struct drm_connector *connector = &intel_connector->base;
 	struct intel_digital_port *intel_dig_port = dp_to_dig_port(intel_dp);
@@ -4985,18 +4883,19 @@ static bool intel_edp_init_connector(struct intel_dp *intel_dp,
 	bool has_dpcd;
 	struct drm_display_mode *scan;
 	struct edid *edid;
+	enum pipe pipe = INVALID_PIPE;
 
 	intel_dp->drrs_state.type = DRRS_NOT_SUPPORTED;
 
 	if (!is_edp(intel_dp))
 		return true;
 
-	intel_edp_panel_vdd_sanitize(intel_encoder);
+	pps_lock(intel_dp);
+	intel_edp_panel_vdd_sanitize(intel_dp);
+	pps_unlock(intel_dp);
 
 	/* Cache DPCD and EDID for edp. */
-	intel_edp_panel_vdd_on(intel_dp);
 	has_dpcd = intel_dp_get_dpcd(intel_dp);
-	intel_edp_panel_vdd_off(intel_dp, false);
 
 	if (has_dpcd) {
 		if (intel_dp->dpcd[DP_DPCD_REV] >= 0x11)
@@ -5011,7 +4910,7 @@ static bool intel_edp_init_connector(struct intel_dp *intel_dp,
 
 	/* We now know it's not a ghost, init power sequence regs. */
 	pps_lock(intel_dp);
-	intel_dp_init_panel_power_sequencer_registers(dev, intel_dp, power_seq);
+	intel_dp_init_panel_power_sequencer_registers(dev, intel_dp);
 	pps_unlock(intel_dp);
 
 	mutex_lock(&dev->mode_config.mutex);
@@ -5053,11 +4952,30 @@ static bool intel_edp_init_connector(struct intel_dp *intel_dp,
 	if (IS_VALLEYVIEW(dev)) {
 		intel_dp->edp_notifier.notifier_call = edp_notify_handler;
 		register_reboot_notifier(&intel_dp->edp_notifier);
+
+		/*
+		 * Figure out the current pipe for the initial backlight setup.
+		 * If the current pipe isn't valid, try the PPS pipe, and if that
+		 * fails just assume pipe A.
+		 */
+		if (IS_CHERRYVIEW(dev))
+			pipe = DP_PORT_TO_PIPE_CHV(intel_dp->DP);
+		else
+			pipe = PORT_TO_PIPE(intel_dp->DP);
+
+		if (pipe != PIPE_A && pipe != PIPE_B)
+			pipe = intel_dp->pps_pipe;
+
+		if (pipe != PIPE_A && pipe != PIPE_B)
+			pipe = PIPE_A;
+
+		DRM_DEBUG_KMS("using pipe %c for initial backlight setup\n",
+			      pipe_name(pipe));
 	}
 
 	intel_panel_init(&intel_connector->panel, fixed_mode, downclock_mode);
 	intel_connector->panel.backlight_power = intel_edp_backlight_power;
-	intel_panel_setup_backlight(connector);
+	intel_panel_setup_backlight(connector, pipe);
 
 	return true;
 }
@@ -5072,13 +4990,14 @@ intel_dp_init_connector(struct intel_digital_port *intel_dig_port,
 	struct drm_device *dev = intel_encoder->base.dev;
 	struct drm_i915_private *dev_priv = dev->dev_private;
 	enum port port = intel_dig_port->port;
-	struct edp_power_seq power_seq = { 0 };
 	int type;
 
 	intel_dp->pps_pipe = INVALID_PIPE;
 
 	/* intel_dp vfuncs */
-	if (IS_VALLEYVIEW(dev))
+	if (INTEL_INFO(dev)->gen >= 9)
+		intel_dp->get_aux_clock_divider = skl_get_aux_clock_divider;
+	else if (IS_VALLEYVIEW(dev))
 		intel_dp->get_aux_clock_divider = vlv_get_aux_clock_divider;
 	else if (IS_HASWELL(dev) || IS_BROADWELL(dev))
 		intel_dp->get_aux_clock_divider = hsw_get_aux_clock_divider;
@@ -5087,7 +5006,10 @@ intel_dp_init_connector(struct intel_digital_port *intel_dig_port,
 	else
 		intel_dp->get_aux_clock_divider = i9xx_get_aux_clock_divider;
 
-	intel_dp->get_aux_send_ctl = i9xx_get_aux_send_ctl;
+	if (INTEL_INFO(dev)->gen >= 9)
+		intel_dp->get_aux_send_ctl = skl_get_aux_send_ctl;
+	else
+		intel_dp->get_aux_send_ctl = i9xx_get_aux_send_ctl;
 
 	/* Preserve the current hw state. */
 	intel_dp->DP = I915_READ(intel_dp->output_reg);
@@ -5106,6 +5028,11 @@ intel_dp_init_connector(struct intel_digital_port *intel_dig_port,
 	if (type == DRM_MODE_CONNECTOR_eDP)
 		intel_encoder->type = INTEL_OUTPUT_EDP;
 
+	/* eDP only on port B and/or C on vlv/chv */
+	if (WARN_ON(IS_VALLEYVIEW(dev) && is_edp(intel_dp) &&
+		    port != PORT_B && port != PORT_C))
+		return false;
+
 	DRM_DEBUG_KMS("Adding %s connector on port %c\n",
 			type == DRM_MODE_CONNECTOR_eDP ? "eDP" : "DP",
 			port_name(port));
@@ -5148,13 +5075,11 @@ intel_dp_init_connector(struct intel_digital_port *intel_dig_port,
 
 	if (is_edp(intel_dp)) {
 		pps_lock(intel_dp);
-		if (IS_VALLEYVIEW(dev)) {
+		intel_dp_init_panel_power_timestamps(intel_dp);
+		if (IS_VALLEYVIEW(dev))
 			vlv_initial_power_sequencer_setup(intel_dp);
-		} else {
-			intel_dp_init_panel_power_timestamps(intel_dp);
-			intel_dp_init_panel_power_sequencer(dev, intel_dp,
-							    &power_seq);
-		}
+		else
+			intel_dp_init_panel_power_sequencer(dev, intel_dp);
 		pps_unlock(intel_dp);
 	}
 
@@ -5168,7 +5093,7 @@ intel_dp_init_connector(struct intel_digital_port *intel_dig_port,
 		}
 	}
 
-	if (!intel_edp_init_connector(intel_dp, intel_connector, &power_seq)) {
+	if (!intel_edp_init_connector(intel_dp, intel_connector)) {
 		drm_dp_aux_unregister(&intel_dp->aux);
 		if (is_edp(intel_dp)) {
 			cancel_delayed_work_sync(&intel_dp->panel_vdd_work);
diff --git a/drivers/gpu/drm/i915/intel_dp_mst.c b/drivers/gpu/drm/i915/intel_dp_mst.c
index d9a7a7865f66..7f8c6a66680a 100644
--- a/drivers/gpu/drm/i915/intel_dp_mst.c
+++ b/drivers/gpu/drm/i915/intel_dp_mst.c
@@ -278,20 +278,12 @@ static int intel_dp_mst_get_ddc_modes(struct drm_connector *connector)
 }
 
 static enum drm_connector_status
-intel_mst_port_dp_detect(struct drm_connector *connector)
+intel_dp_mst_detect(struct drm_connector *connector, bool force)
 {
 	struct intel_connector *intel_connector = to_intel_connector(connector);
 	struct intel_dp *intel_dp = intel_connector->mst_port;
 
-	return drm_dp_mst_detect_port(&intel_dp->mst_mgr, intel_connector->port);
-}
-
-static enum drm_connector_status
-intel_dp_mst_detect(struct drm_connector *connector, bool force)
-{
-	enum drm_connector_status status;
-	status = intel_mst_port_dp_detect(connector);
-	return status;
+	return drm_dp_mst_detect_port(connector, &intel_dp->mst_mgr, intel_connector->port);
 }
 
 static int
@@ -393,7 +385,7 @@ static void intel_connector_remove_from_fbdev(struct intel_connector *connector)
 #endif
 }
 
-static struct drm_connector *intel_dp_add_mst_connector(struct drm_dp_mst_topology_mgr *mgr, struct drm_dp_mst_port *port, char *pathprop)
+static struct drm_connector *intel_dp_add_mst_connector(struct drm_dp_mst_topology_mgr *mgr, struct drm_dp_mst_port *port, const char *pathprop)
 {
 	struct intel_dp *intel_dp = container_of(mgr, struct intel_dp, mst_mgr);
 	struct intel_digital_port *intel_dig_port = dp_to_dig_port(intel_dp);
@@ -422,6 +414,8 @@ static struct drm_connector *intel_dp_add_mst_connector(struct drm_dp_mst_topolo
 	intel_dp_add_properties(intel_dp, connector);
 
 	drm_object_attach_property(&connector->base, dev->mode_config.path_property, 0);
+	drm_object_attach_property(&connector->base, dev->mode_config.tile_property, 0);
+
 	drm_mode_connector_set_path_property(connector, pathprop);
 	drm_reinit_primary_mode_group(dev);
 	mutex_lock(&dev->mode_config.mutex);
diff --git a/drivers/gpu/drm/i915/intel_drv.h b/drivers/gpu/drm/i915/intel_drv.h
index ba715229a540..25fdbb16d4e0 100644
--- a/drivers/gpu/drm/i915/intel_drv.h
+++ b/drivers/gpu/drm/i915/intel_drv.h
@@ -34,6 +34,7 @@
 #include <drm/drm_crtc_helper.h>
 #include <drm/drm_fb_helper.h>
 #include <drm/drm_dp_mst_helper.h>
+#include <drm/drm_rect.h>
 
 #define DIV_ROUND_CLOSEST_ULL(ll, d)	\
 ({ unsigned long long _tmp = (ll)+(d)/2; do_div(_tmp, d); _tmp; })
@@ -93,18 +94,20 @@
 
 /* these are outputs from the chip - integrated only
    external chips are via DVO or SDVO output */
-#define INTEL_OUTPUT_UNUSED 0
-#define INTEL_OUTPUT_ANALOG 1
-#define INTEL_OUTPUT_DVO 2
-#define INTEL_OUTPUT_SDVO 3
-#define INTEL_OUTPUT_LVDS 4
-#define INTEL_OUTPUT_TVOUT 5
-#define INTEL_OUTPUT_HDMI 6
-#define INTEL_OUTPUT_DISPLAYPORT 7
-#define INTEL_OUTPUT_EDP 8
-#define INTEL_OUTPUT_DSI 9
-#define INTEL_OUTPUT_UNKNOWN 10
-#define INTEL_OUTPUT_DP_MST 11
+enum intel_output_type {
+	INTEL_OUTPUT_UNUSED = 0,
+	INTEL_OUTPUT_ANALOG = 1,
+	INTEL_OUTPUT_DVO = 2,
+	INTEL_OUTPUT_SDVO = 3,
+	INTEL_OUTPUT_LVDS = 4,
+	INTEL_OUTPUT_TVOUT = 5,
+	INTEL_OUTPUT_HDMI = 6,
+	INTEL_OUTPUT_DISPLAYPORT = 7,
+	INTEL_OUTPUT_EDP = 8,
+	INTEL_OUTPUT_DSI = 9,
+	INTEL_OUTPUT_UNKNOWN = 10,
+	INTEL_OUTPUT_DP_MST = 11,
+};
 
 #define INTEL_DVO_CHIP_NONE 0
 #define INTEL_DVO_CHIP_LVDS 1
@@ -135,7 +138,7 @@ struct intel_encoder {
 	 */
 	struct intel_crtc *new_crtc;
 
-	int type;
+	enum intel_output_type type;
 	unsigned int cloneable;
 	bool connectors_active;
 	void (*hot_plug)(struct intel_encoder *);
@@ -240,6 +243,17 @@ typedef struct dpll {
 	int	p;
 } intel_clock_t;
 
+struct intel_plane_state {
+	struct drm_crtc *crtc;
+	struct drm_framebuffer *fb;
+	struct drm_rect src;
+	struct drm_rect dst;
+	struct drm_rect clip;
+	struct drm_rect orig_src;
+	struct drm_rect orig_dst;
+	bool visible;
+};
+
 struct intel_plane_config {
 	bool tiled;
 	int size;
@@ -278,6 +292,9 @@ struct intel_crtc_config {
 	 * between pch encoders and cpu encoders. */
 	bool has_pch_encoder;
 
+	/* Are we sending infoframes on the attached port? */
+	bool has_infoframe;
+
 	/* CPU Transcoder for the pipe. Currently this can only differ from the
 	 * pipe on Haswell (where we have a special eDP transcoder). */
 	enum transcoder cpu_transcoder;
@@ -326,7 +343,10 @@ struct intel_crtc_config {
 	/* Selected dpll when shared or DPLL_ID_PRIVATE. */
 	enum intel_dpll_id shared_dpll;
 
-	/* PORT_CLK_SEL for DDI ports. */
+	/*
+	 * - PORT_CLK_SEL for DDI ports on HSW/BDW.
+	 * - enum skl_dpll on SKL
+	 */
 	uint32_t ddi_pll_sel;
 
 	/* Actual register state of the dpll, for shared dpll cross-checking. */
@@ -387,7 +407,14 @@ struct intel_pipe_wm {
 
 struct intel_mmio_flip {
 	u32 seqno;
-	u32 ring_id;
+	struct intel_engine_cs *ring;
+	struct work_struct work;
+};
+
+struct skl_pipe_wm {
+	struct skl_wm_level wm[8];
+	struct skl_wm_level trans_wm;
+	uint32_t linetime;
 };
 
 struct intel_crtc {
@@ -437,6 +464,8 @@ struct intel_crtc {
 	struct {
 		/* watermarks currently being used  */
 		struct intel_pipe_wm active;
+		/* SKL wm values currently in use */
+		struct skl_pipe_wm skl_active;
 	} wm;
 
 	int scanline_offset;
@@ -529,6 +558,7 @@ struct intel_hdmi {
 	void (*set_infoframes)(struct drm_encoder *encoder,
 			       bool enable,
 			       struct drm_display_mode *adjusted_mode);
+	bool (*infoframe_enabled)(struct drm_encoder *encoder);
 };
 
 struct intel_dp_mst_encoder;
@@ -578,6 +608,7 @@ struct intel_dp {
 	 * this port. Only relevant on VLV/CHV.
 	 */
 	enum pipe pps_pipe;
+	struct edp_power_seq pps_delays;
 
 	bool use_tps3;
 	bool can_mst; /* this port supports mst */
@@ -734,32 +765,47 @@ hdmi_to_dig_port(struct intel_hdmi *intel_hdmi)
 	return container_of(intel_hdmi, struct intel_digital_port, hdmi);
 }
 
+/*
+ * Returns the number of planes for this pipe, i.e. the number of sprites + 1
+ * (primary plane). The cursor plane is not counted.
+ */
+static inline unsigned int intel_num_planes(struct intel_crtc *crtc)
+{
+	return INTEL_INFO(crtc->base.dev)->num_sprites[crtc->pipe] + 1;
+}
 
-/* i915_irq.c */
-bool intel_set_cpu_fifo_underrun_reporting(struct drm_device *dev,
+/* intel_fifo_underrun.c */
+bool intel_set_cpu_fifo_underrun_reporting(struct drm_i915_private *dev_priv,
 					   enum pipe pipe, bool enable);
-bool intel_set_pch_fifo_underrun_reporting(struct drm_device *dev,
+bool intel_set_pch_fifo_underrun_reporting(struct drm_i915_private *dev_priv,
 					   enum transcoder pch_transcoder,
 					   bool enable);
+void intel_cpu_fifo_underrun_irq_handler(struct drm_i915_private *dev_priv,
+					 enum pipe pipe);
+void intel_pch_fifo_underrun_irq_handler(struct drm_i915_private *dev_priv,
+					 enum transcoder pch_transcoder);
+void i9xx_check_fifo_underruns(struct drm_i915_private *dev_priv);
+
+/* i915_irq.c */
 void gen5_enable_gt_irq(struct drm_i915_private *dev_priv, uint32_t mask);
 void gen5_disable_gt_irq(struct drm_i915_private *dev_priv, uint32_t mask);
 void gen6_enable_pm_irq(struct drm_i915_private *dev_priv, uint32_t mask);
 void gen6_disable_pm_irq(struct drm_i915_private *dev_priv, uint32_t mask);
-void gen8_enable_pm_irq(struct drm_i915_private *dev_priv, uint32_t mask);
-void gen8_disable_pm_irq(struct drm_i915_private *dev_priv, uint32_t mask);
-void intel_runtime_pm_disable_interrupts(struct drm_device *dev);
-void intel_runtime_pm_restore_interrupts(struct drm_device *dev);
+void gen6_reset_rps_interrupts(struct drm_device *dev);
+void gen6_enable_rps_interrupts(struct drm_device *dev);
+void gen6_disable_rps_interrupts(struct drm_device *dev);
+void intel_runtime_pm_disable_interrupts(struct drm_i915_private *dev_priv);
+void intel_runtime_pm_enable_interrupts(struct drm_i915_private *dev_priv);
 static inline bool intel_irqs_enabled(struct drm_i915_private *dev_priv)
 {
 	/*
 	 * We only use drm_irq_uninstall() at unload and VT switch, so
 	 * this is the only thing we need to check.
 	 */
-	return !dev_priv->pm._irqs_disabled;
+	return dev_priv->pm.irqs_enabled;
 }
 
 int intel_get_crtc_scanline(struct intel_crtc *crtc);
-void i9xx_check_fifo_underruns(struct drm_device *dev);
 void gen8_irq_power_well_post_enable(struct drm_i915_private *dev_priv);
 
 /* intel_crt.c */
@@ -792,11 +838,7 @@ void intel_ddi_clock_get(struct intel_encoder *encoder,
 			 struct intel_crtc_config *pipe_config);
 void intel_ddi_set_vc_payload_alloc(struct drm_crtc *crtc, bool state);
 
-/* intel_display.c */
-const char *intel_output_name(int output);
-bool intel_has_pending_fb_unpin(struct drm_device *dev);
-int intel_pch_rawclk(struct drm_device *dev);
-void intel_mark_busy(struct drm_device *dev);
+/* intel_frontbuffer.c */
 void intel_fb_obj_invalidate(struct drm_i915_gem_object *obj,
 			     struct intel_engine_cs *ring);
 void intel_frontbuffer_flip_prepare(struct drm_device *dev,
@@ -806,7 +848,7 @@ void intel_frontbuffer_flip_complete(struct drm_device *dev,
 void intel_frontbuffer_flush(struct drm_device *dev,
 			     unsigned frontbuffer_bits);
 /**
- * intel_frontbuffer_flip - prepare frontbuffer flip
+ * intel_frontbuffer_flip - synchronous frontbuffer flip
  * @dev: DRM device
  * @frontbuffer_bits: frontbuffer plane tracking bits
  *
@@ -824,6 +866,18 @@ void intel_frontbuffer_flip(struct drm_device *dev,
 }
 
 void intel_fb_obj_flush(struct drm_i915_gem_object *obj, bool retire);
+
+
+/* intel_audio.c */
+void intel_init_audio(struct drm_device *dev);
+void intel_audio_codec_enable(struct intel_encoder *encoder);
+void intel_audio_codec_disable(struct intel_encoder *encoder);
+
+/* intel_display.c */
+const char *intel_output_name(int output);
+bool intel_has_pending_fb_unpin(struct drm_device *dev);
+int intel_pch_rawclk(struct drm_device *dev);
+void intel_mark_busy(struct drm_device *dev);
 void intel_mark_idle(struct drm_device *dev);
 void intel_crtc_restore_mode(struct drm_crtc *crtc);
 void intel_crtc_control(struct drm_crtc *crtc, bool enable);
@@ -844,7 +898,12 @@ int intel_get_pipe_from_crtc_id(struct drm_device *dev, void *data,
 				struct drm_file *file_priv);
 enum transcoder intel_pipe_to_cpu_transcoder(struct drm_i915_private *dev_priv,
 					     enum pipe pipe);
-void intel_wait_for_vblank(struct drm_device *dev, int pipe);
+bool intel_pipe_has_type(struct intel_crtc *crtc, enum intel_output_type type);
+static inline void
+intel_wait_for_vblank(struct drm_device *dev, int pipe)
+{
+	drm_wait_one_vblank(dev, pipe);
+}
 int ironlake_get_lanes_required(int target_clock, int link_bw, int bpp);
 void vlv_wait_port_ready(struct drm_i915_private *dev_priv,
 			 struct intel_digital_port *dport);
@@ -854,8 +913,8 @@ bool intel_get_load_detect_pipe(struct drm_connector *connector,
 				struct drm_modeset_acquire_ctx *ctx);
 void intel_release_load_detect_pipe(struct drm_connector *connector,
 				    struct intel_load_detect_pipe *old);
-int intel_pin_and_fence_fb_obj(struct drm_device *dev,
-			       struct drm_i915_gem_object *obj,
+int intel_pin_and_fence_fb_obj(struct drm_plane *plane,
+			       struct drm_framebuffer *fb,
 			       struct intel_engine_cs *pipelined);
 void intel_unpin_fb_obj(struct drm_i915_gem_object *obj);
 struct drm_framebuffer *
@@ -877,7 +936,13 @@ void assert_shared_dpll(struct drm_i915_private *dev_priv,
 struct intel_shared_dpll *intel_get_shared_dpll(struct intel_crtc *crtc);
 void intel_put_shared_dpll(struct intel_crtc *crtc);
 
+void vlv_force_pll_on(struct drm_device *dev, enum pipe pipe,
+		      const struct dpll *dpll);
+void vlv_force_pll_off(struct drm_device *dev, enum pipe pipe);
+
 /* modesetting asserts */
+void assert_panel_unlocked(struct drm_i915_private *dev_priv,
+			   enum pipe pipe);
 void assert_pll(struct drm_i915_private *dev_priv,
 		enum pipe pipe, bool state);
 #define assert_pll_enabled(d, p) assert_pll(d, p, true)
@@ -889,13 +954,12 @@ void assert_fdi_rx_pll(struct drm_i915_private *dev_priv,
 void assert_pipe(struct drm_i915_private *dev_priv, enum pipe pipe, bool state);
 #define assert_pipe_enabled(d, p) assert_pipe(d, p, true)
 #define assert_pipe_disabled(d, p) assert_pipe(d, p, false)
-void intel_write_eld(struct drm_encoder *encoder,
-		     struct drm_display_mode *mode);
 unsigned long intel_gen4_compute_page_offset(int *x, int *y,
 					     unsigned int tiling_mode,
 					     unsigned int bpp,
 					     unsigned int pitch);
-void intel_display_handle_reset(struct drm_device *dev);
+void intel_prepare_reset(struct drm_device *dev);
+void intel_finish_reset(struct drm_device *dev);
 void hsw_enable_pc8(struct drm_i915_private *dev_priv);
 void hsw_disable_pc8(struct drm_i915_private *dev_priv);
 void intel_dp_get_m_n(struct intel_crtc *crtc,
@@ -908,7 +972,6 @@ ironlake_check_encoder_dotclock(const struct intel_crtc_config *pipe_config,
 bool intel_crtc_active(struct drm_crtc *crtc);
 void hsw_enable_ips(struct intel_crtc *crtc);
 void hsw_disable_ips(struct intel_crtc *crtc);
-void intel_display_set_init_power(struct drm_i915_private *dev, bool enable);
 enum intel_display_power_domain
 intel_display_port_power_domain(struct intel_encoder *intel_encoder);
 void intel_mode_from_pipe_config(struct drm_display_mode *mode,
@@ -936,25 +999,18 @@ bool intel_dp_hpd_pulse(struct intel_digital_port *intel_dig_port,
 void intel_edp_backlight_on(struct intel_dp *intel_dp);
 void intel_edp_backlight_off(struct intel_dp *intel_dp);
 void intel_edp_panel_vdd_on(struct intel_dp *intel_dp);
-void intel_edp_panel_vdd_sanitize(struct intel_encoder *intel_encoder);
 void intel_edp_panel_on(struct intel_dp *intel_dp);
 void intel_edp_panel_off(struct intel_dp *intel_dp);
-void intel_edp_psr_enable(struct intel_dp *intel_dp);
-void intel_edp_psr_disable(struct intel_dp *intel_dp);
 void intel_dp_set_drrs_state(struct drm_device *dev, int refresh_rate);
-void intel_edp_psr_invalidate(struct drm_device *dev,
-			      unsigned frontbuffer_bits);
-void intel_edp_psr_flush(struct drm_device *dev,
-			 unsigned frontbuffer_bits);
-void intel_edp_psr_init(struct drm_device *dev);
-
-int intel_dp_handle_hpd_irq(struct intel_digital_port *digport, bool long_hpd);
 void intel_dp_add_properties(struct intel_dp *intel_dp, struct drm_connector *connector);
 void intel_dp_mst_suspend(struct drm_device *dev);
 void intel_dp_mst_resume(struct drm_device *dev);
 int intel_dp_max_link_bw(struct intel_dp *intel_dp);
 void intel_dp_hot_plug(struct intel_encoder *intel_encoder);
 void vlv_power_sequencer_reset(struct drm_i915_private *dev_priv);
+uint32_t intel_dp_pack_aux(const uint8_t *src, int src_bytes);
+void intel_dp_unpack_aux(uint32_t src, uint8_t *dst, int dst_bytes);
+
 /* intel_dp_mst.c */
 int intel_dp_mst_encoder_init(struct intel_digital_port *intel_dig_port, int conn_id);
 void intel_dp_mst_encoder_cleanup(struct intel_digital_port *intel_dig_port);
@@ -1044,7 +1100,7 @@ void intel_gmch_panel_fitting(struct intel_crtc *crtc,
 			      int fitting_mode);
 void intel_panel_set_backlight_acpi(struct intel_connector *connector,
 				    u32 level, u32 max);
-int intel_panel_setup_backlight(struct drm_connector *connector);
+int intel_panel_setup_backlight(struct drm_connector *connector, enum pipe pipe);
 void intel_panel_enable_backlight(struct intel_connector *connector);
 void intel_panel_disable_backlight(struct intel_connector *connector);
 void intel_panel_destroy_backlight(struct drm_connector *connector);
@@ -1054,6 +1110,41 @@ extern struct drm_display_mode *intel_find_panel_downclock(
 				struct drm_device *dev,
 				struct drm_display_mode *fixed_mode,
 				struct drm_connector *connector);
+void intel_backlight_register(struct drm_device *dev);
+void intel_backlight_unregister(struct drm_device *dev);
+
+
+/* intel_psr.c */
+bool intel_psr_is_enabled(struct drm_device *dev);
+void intel_psr_enable(struct intel_dp *intel_dp);
+void intel_psr_disable(struct intel_dp *intel_dp);
+void intel_psr_invalidate(struct drm_device *dev,
+			      unsigned frontbuffer_bits);
+void intel_psr_flush(struct drm_device *dev,
+			 unsigned frontbuffer_bits);
+void intel_psr_init(struct drm_device *dev);
+
+/* intel_runtime_pm.c */
+int intel_power_domains_init(struct drm_i915_private *);
+void intel_power_domains_fini(struct drm_i915_private *);
+void intel_power_domains_init_hw(struct drm_i915_private *dev_priv);
+void intel_runtime_pm_enable(struct drm_i915_private *dev_priv);
+
+bool intel_display_power_is_enabled(struct drm_i915_private *dev_priv,
+				    enum intel_display_power_domain domain);
+bool __intel_display_power_is_enabled(struct drm_i915_private *dev_priv,
+				      enum intel_display_power_domain domain);
+void intel_display_power_get(struct drm_i915_private *dev_priv,
+			     enum intel_display_power_domain domain);
+void intel_display_power_put(struct drm_i915_private *dev_priv,
+			     enum intel_display_power_domain domain);
+void intel_aux_display_runtime_get(struct drm_i915_private *dev_priv);
+void intel_aux_display_runtime_put(struct drm_i915_private *dev_priv);
+void intel_runtime_pm_get(struct drm_i915_private *dev_priv);
+void intel_runtime_pm_get_noresume(struct drm_i915_private *dev_priv);
+void intel_runtime_pm_put(struct drm_i915_private *dev_priv);
+
+void intel_display_set_init_power(struct drm_i915_private *dev, bool enable);
 
 /* intel_pm.c */
 void intel_init_clock_gating(struct drm_device *dev);
@@ -1072,17 +1163,6 @@ bool intel_fbc_enabled(struct drm_device *dev);
 void intel_update_fbc(struct drm_device *dev);
 void intel_gpu_ips_init(struct drm_i915_private *dev_priv);
 void intel_gpu_ips_teardown(void);
-int intel_power_domains_init(struct drm_i915_private *);
-void intel_power_domains_remove(struct drm_i915_private *);
-bool intel_display_power_enabled(struct drm_i915_private *dev_priv,
-				 enum intel_display_power_domain domain);
-bool intel_display_power_enabled_unlocked(struct drm_i915_private *dev_priv,
-					  enum intel_display_power_domain domain);
-void intel_display_power_get(struct drm_i915_private *dev_priv,
-			     enum intel_display_power_domain domain);
-void intel_display_power_put(struct drm_i915_private *dev_priv,
-			     enum intel_display_power_domain domain);
-void intel_power_domains_init_hw(struct drm_i915_private *dev_priv);
 void intel_init_gt_powersave(struct drm_device *dev);
 void intel_cleanup_gt_powersave(struct drm_device *dev);
 void intel_enable_gt_powersave(struct drm_device *dev);
@@ -1093,14 +1173,10 @@ void ironlake_teardown_rc6(struct drm_device *dev);
 void gen6_update_ring_freq(struct drm_device *dev);
 void gen6_rps_idle(struct drm_i915_private *dev_priv);
 void gen6_rps_boost(struct drm_i915_private *dev_priv);
-void intel_aux_display_runtime_get(struct drm_i915_private *dev_priv);
-void intel_aux_display_runtime_put(struct drm_i915_private *dev_priv);
-void intel_runtime_pm_get(struct drm_i915_private *dev_priv);
-void intel_runtime_pm_get_noresume(struct drm_i915_private *dev_priv);
-void intel_runtime_pm_put(struct drm_i915_private *dev_priv);
-void intel_init_runtime_pm(struct drm_i915_private *dev_priv);
-void intel_fini_runtime_pm(struct drm_i915_private *dev_priv);
 void ilk_wm_get_hw_state(struct drm_device *dev);
+void skl_wm_get_hw_state(struct drm_device *dev);
+void skl_ddb_get_hw_state(struct drm_i915_private *dev_priv,
+			  struct skl_ddb_allocation *ddb /* out */);
 
 
 /* intel_sdvo.c */
@@ -1120,7 +1196,9 @@ int intel_sprite_set_colorkey(struct drm_device *dev, void *data,
 			      struct drm_file *file_priv);
 int intel_sprite_get_colorkey(struct drm_device *dev, void *data,
 			      struct drm_file *file_priv);
-
+bool intel_pipe_update_start(struct intel_crtc *crtc,
+			     uint32_t *start_vbl_count);
+void intel_pipe_update_end(struct intel_crtc *crtc, u32 start_vbl_count);
 
 /* intel_tv.c */
 void intel_tv_init(struct drm_device *dev);
diff --git a/drivers/gpu/drm/i915/intel_dsi.c b/drivers/gpu/drm/i915/intel_dsi.c
index 5bd9e09ad3c5..0b184079de14 100644
--- a/drivers/gpu/drm/i915/intel_dsi.c
+++ b/drivers/gpu/drm/i915/intel_dsi.c
@@ -344,7 +344,7 @@ static bool intel_dsi_get_hw_state(struct intel_encoder *encoder,
 	DRM_DEBUG_KMS("\n");
 
 	power_domain = intel_display_port_power_domain(encoder);
-	if (!intel_display_power_enabled(dev_priv, power_domain))
+	if (!intel_display_power_is_enabled(dev_priv, power_domain))
 		return false;
 
 	/* XXX: this only works for one DSI output */
diff --git a/drivers/gpu/drm/i915/intel_fbdev.c b/drivers/gpu/drm/i915/intel_fbdev.c
index 9b584f3fbb99..850cf7d6578c 100644
--- a/drivers/gpu/drm/i915/intel_fbdev.c
+++ b/drivers/gpu/drm/i915/intel_fbdev.c
@@ -119,25 +119,25 @@ static int intelfb_alloc(struct drm_fb_helper *helper,
 		goto out;
 	}
 
-	/* Flush everything out, we'll be doing GTT only from now on */
-	ret = intel_pin_and_fence_fb_obj(dev, obj, NULL);
-	if (ret) {
-		DRM_ERROR("failed to pin obj: %d\n", ret);
-		goto out_unref;
-	}
-
 	fb = __intel_framebuffer_create(dev, &mode_cmd, obj);
 	if (IS_ERR(fb)) {
 		ret = PTR_ERR(fb);
-		goto out_unpin;
+		goto out_unref;
+	}
+
+	/* Flush everything out, we'll be doing GTT only from now on */
+	ret = intel_pin_and_fence_fb_obj(NULL, fb, NULL);
+	if (ret) {
+		DRM_ERROR("failed to pin obj: %d\n", ret);
+		goto out_fb;
 	}
 
 	ifbdev->fb = to_intel_framebuffer(fb);
 
 	return 0;
 
-out_unpin:
-	i915_gem_object_ggtt_unpin(obj);
+out_fb:
+	drm_framebuffer_remove(fb);
 out_unref:
 	drm_gem_object_unreference(&obj->base);
 out:
@@ -324,6 +324,7 @@ intel_fb_helper_crtc(struct drm_fb_helper *fb_helper, struct drm_crtc *crtc)
 static bool intel_fb_initial_config(struct drm_fb_helper *fb_helper,
 				    struct drm_fb_helper_crtc **crtcs,
 				    struct drm_display_mode **modes,
+				    struct drm_fb_offset *offsets,
 				    bool *enabled, int width, int height)
 {
 	struct drm_device *dev = fb_helper->dev;
@@ -332,6 +333,8 @@ static bool intel_fb_initial_config(struct drm_fb_helper *fb_helper,
 	bool fallback = true;
 	int num_connectors_enabled = 0;
 	int num_connectors_detected = 0;
+	uint64_t conn_configured = 0, mask;
+	int pass = 0;
 
 	save_enabled = kcalloc(dev->mode_config.num_connector, sizeof(bool),
 			       GFP_KERNEL);
@@ -339,7 +342,8 @@ static bool intel_fb_initial_config(struct drm_fb_helper *fb_helper,
 		return false;
 
 	memcpy(save_enabled, enabled, dev->mode_config.num_connector);
-
+	mask = (1 << fb_helper->connector_count) - 1;
+retry:
 	for (i = 0; i < fb_helper->connector_count; i++) {
 		struct drm_fb_helper_connector *fb_conn;
 		struct drm_connector *connector;
@@ -349,12 +353,19 @@ static bool intel_fb_initial_config(struct drm_fb_helper *fb_helper,
 		fb_conn = fb_helper->connector_info[i];
 		connector = fb_conn->connector;
 
+		if (conn_configured & (1 << i))
+			continue;
+
+		if (pass == 0 && !connector->has_tile)
+			continue;
+
 		if (connector->status == connector_status_connected)
 			num_connectors_detected++;
 
 		if (!enabled[i]) {
 			DRM_DEBUG_KMS("connector %s not enabled, skipping\n",
 				      connector->name);
+			conn_configured |= (1 << i);
 			continue;
 		}
 
@@ -373,6 +384,7 @@ static bool intel_fb_initial_config(struct drm_fb_helper *fb_helper,
 			DRM_DEBUG_KMS("connector %s has no encoder or crtc, skipping\n",
 				      connector->name);
 			enabled[i] = false;
+			conn_configured |= (1 << i);
 			continue;
 		}
 
@@ -400,8 +412,8 @@ static bool intel_fb_initial_config(struct drm_fb_helper *fb_helper,
 
 		/* try for preferred next */
 		if (!modes[i]) {
-			DRM_DEBUG_KMS("looking for preferred mode on connector %s\n",
-				      connector->name);
+			DRM_DEBUG_KMS("looking for preferred mode on connector %s %d\n",
+				      connector->name, connector->has_tile);
 			modes[i] = drm_has_preferred_mode(fb_conn, width,
 							  height);
 		}
@@ -444,6 +456,12 @@ static bool intel_fb_initial_config(struct drm_fb_helper *fb_helper,
 			      modes[i]->flags & DRM_MODE_FLAG_INTERLACE ? "i" :"");
 
 		fallback = false;
+		conn_configured |= (1 << i);
+	}
+
+	if ((conn_configured & mask) != mask) {
+		pass++;
+		goto retry;
 	}
 
 	/*
diff --git a/drivers/gpu/drm/i915/intel_fifo_underrun.c b/drivers/gpu/drm/i915/intel_fifo_underrun.c
new file mode 100644
index 000000000000..77af512d2d35
--- /dev/null
+++ b/drivers/gpu/drm/i915/intel_fifo_underrun.c
@@ -0,0 +1,381 @@
+/*
+ * Copyright © 2014 Intel Corporation
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
+ * IN THE SOFTWARE.
+ *
+ * Authors:
+ *    Daniel Vetter <daniel.vetter@ffwll.ch>
+ *
+ */
+
+#include "i915_drv.h"
+#include "intel_drv.h"
+
+/**
+ * DOC: fifo underrun handling
+ *
+ * The i915 driver checks for display fifo underruns using the interrupt signals
+ * provided by the hardware. This is enabled by default and fairly useful to
+ * debug display issues, especially watermark settings.
+ *
+ * If an underrun is detected, it is logged to dmesg. To avoid flooding logs
+ * and occupying the CPU, underrun interrupts are disabled after the first
+ * occurrence until the next modeset on a given pipe.
+ *
+ * Note that underrun detection on GMCH platforms is a bit uglier since there
+ * is no interrupt (even though the signalling bit is in the PIPESTAT pipe
+ * interrupt register). Also, on some other platforms underrun interrupts are
+ * shared, which means that if we detect an underrun we need to disable underrun
+ * reporting on all pipes.
+ *
+ * The code also supports underrun detection on the PCH transcoder.
+ */
+
+static bool ivb_can_enable_err_int(struct drm_device *dev)
+{
+	struct drm_i915_private *dev_priv = dev->dev_private;
+	struct intel_crtc *crtc;
+	enum pipe pipe;
+
+	assert_spin_locked(&dev_priv->irq_lock);
+
+	for_each_pipe(dev_priv, pipe) {
+		crtc = to_intel_crtc(dev_priv->pipe_to_crtc_mapping[pipe]);
+
+		if (crtc->cpu_fifo_underrun_disabled)
+			return false;
+	}
+
+	return true;
+}
+
+static bool cpt_can_enable_serr_int(struct drm_device *dev)
+{
+	struct drm_i915_private *dev_priv = dev->dev_private;
+	enum pipe pipe;
+	struct intel_crtc *crtc;
+
+	assert_spin_locked(&dev_priv->irq_lock);
+
+	for_each_pipe(dev_priv, pipe) {
+		crtc = to_intel_crtc(dev_priv->pipe_to_crtc_mapping[pipe]);
+
+		if (crtc->pch_fifo_underrun_disabled)
+			return false;
+	}
+
+	return true;
+}
+
+/**
+ * i9xx_check_fifo_underruns - check for fifo underruns
+ * @dev_priv: i915 device instance
+ *
+ * This function checks for fifo underruns on GMCH platforms. This needs to be
+ * done manually on modeset to make sure that we catch all underruns since they
+ * do not generate an interrupt by themselves on these platforms.
+ */
+void i9xx_check_fifo_underruns(struct drm_i915_private *dev_priv)
+{
+	struct intel_crtc *crtc;
+
+	spin_lock_irq(&dev_priv->irq_lock);
+
+	for_each_intel_crtc(dev_priv->dev, crtc) {
+		u32 reg = PIPESTAT(crtc->pipe);
+		u32 pipestat;
+
+		if (crtc->cpu_fifo_underrun_disabled)
+			continue;
+
+		pipestat = I915_READ(reg) & 0xffff0000;
+		if ((pipestat & PIPE_FIFO_UNDERRUN_STATUS) == 0)
+			continue;
+
+		I915_WRITE(reg, pipestat | PIPE_FIFO_UNDERRUN_STATUS);
+		POSTING_READ(reg);
+
+		DRM_ERROR("pipe %c underrun\n", pipe_name(crtc->pipe));
+	}
+
+	spin_unlock_irq(&dev_priv->irq_lock);
+}
+
+static void i9xx_set_fifo_underrun_reporting(struct drm_device *dev,
+					     enum pipe pipe,
+					     bool enable, bool old)
+{
+	struct drm_i915_private *dev_priv = dev->dev_private;
+	u32 reg = PIPESTAT(pipe);
+	u32 pipestat = I915_READ(reg) & 0xffff0000;
+
+	assert_spin_locked(&dev_priv->irq_lock);
+
+	if (enable) {
+		I915_WRITE(reg, pipestat | PIPE_FIFO_UNDERRUN_STATUS);
+		POSTING_READ(reg);
+	} else {
+		if (old && pipestat & PIPE_FIFO_UNDERRUN_STATUS)
+			DRM_ERROR("pipe %c underrun\n", pipe_name(pipe));
+	}
+}
+
+static void ironlake_set_fifo_underrun_reporting(struct drm_device *dev,
+						 enum pipe pipe, bool enable)
+{
+	struct drm_i915_private *dev_priv = dev->dev_private;
+	uint32_t bit = (pipe == PIPE_A) ? DE_PIPEA_FIFO_UNDERRUN :
+					  DE_PIPEB_FIFO_UNDERRUN;
+
+	if (enable)
+		ironlake_enable_display_irq(dev_priv, bit);
+	else
+		ironlake_disable_display_irq(dev_priv, bit);
+}
+
+static void ivybridge_set_fifo_underrun_reporting(struct drm_device *dev,
+						  enum pipe pipe,
+						  bool enable, bool old)
+{
+	struct drm_i915_private *dev_priv = dev->dev_private;
+	if (enable) {
+		I915_WRITE(GEN7_ERR_INT, ERR_INT_FIFO_UNDERRUN(pipe));
+
+		if (!ivb_can_enable_err_int(dev))
+			return;
+
+		ironlake_enable_display_irq(dev_priv, DE_ERR_INT_IVB);
+	} else {
+		ironlake_disable_display_irq(dev_priv, DE_ERR_INT_IVB);
+
+		if (old &&
+		    I915_READ(GEN7_ERR_INT) & ERR_INT_FIFO_UNDERRUN(pipe)) {
+			DRM_ERROR("uncleared fifo underrun on pipe %c\n",
+				  pipe_name(pipe));
+		}
+	}
+}
+
+static void broadwell_set_fifo_underrun_reporting(struct drm_device *dev,
+						  enum pipe pipe, bool enable)
+{
+	struct drm_i915_private *dev_priv = dev->dev_private;
+
+	assert_spin_locked(&dev_priv->irq_lock);
+
+	if (enable)
+		dev_priv->de_irq_mask[pipe] &= ~GEN8_PIPE_FIFO_UNDERRUN;
+	else
+		dev_priv->de_irq_mask[pipe] |= GEN8_PIPE_FIFO_UNDERRUN;
+	I915_WRITE(GEN8_DE_PIPE_IMR(pipe), dev_priv->de_irq_mask[pipe]);
+	POSTING_READ(GEN8_DE_PIPE_IMR(pipe));
+}
+
+static void ibx_set_fifo_underrun_reporting(struct drm_device *dev,
+					    enum transcoder pch_transcoder,
+					    bool enable)
+{
+	struct drm_i915_private *dev_priv = dev->dev_private;
+	uint32_t bit = (pch_transcoder == TRANSCODER_A) ?
+		       SDE_TRANSA_FIFO_UNDER : SDE_TRANSB_FIFO_UNDER;
+
+	if (enable)
+		ibx_enable_display_interrupt(dev_priv, bit);
+	else
+		ibx_disable_display_interrupt(dev_priv, bit);
+}
+
+static void cpt_set_fifo_underrun_reporting(struct drm_device *dev,
+					    enum transcoder pch_transcoder,
+					    bool enable, bool old)
+{
+	struct drm_i915_private *dev_priv = dev->dev_private;
+
+	if (enable) {
+		I915_WRITE(SERR_INT,
+			   SERR_INT_TRANS_FIFO_UNDERRUN(pch_transcoder));
+
+		if (!cpt_can_enable_serr_int(dev))
+			return;
+
+		ibx_enable_display_interrupt(dev_priv, SDE_ERROR_CPT);
+	} else {
+		ibx_disable_display_interrupt(dev_priv, SDE_ERROR_CPT);
+
+		if (old && I915_READ(SERR_INT) &
+		    SERR_INT_TRANS_FIFO_UNDERRUN(pch_transcoder)) {
+			DRM_ERROR("uncleared pch fifo underrun on pch transcoder %c\n",
+				  transcoder_name(pch_transcoder));
+		}
+	}
+}
+
+static bool __intel_set_cpu_fifo_underrun_reporting(struct drm_device *dev,
+						    enum pipe pipe, bool enable)
+{
+	struct drm_i915_private *dev_priv = dev->dev_private;
+	struct drm_crtc *crtc = dev_priv->pipe_to_crtc_mapping[pipe];
+	struct intel_crtc *intel_crtc = to_intel_crtc(crtc);
+	bool old;
+
+	assert_spin_locked(&dev_priv->irq_lock);
+
+	old = !intel_crtc->cpu_fifo_underrun_disabled;
+	intel_crtc->cpu_fifo_underrun_disabled = !enable;
+
+	if (HAS_GMCH_DISPLAY(dev))
+		i9xx_set_fifo_underrun_reporting(dev, pipe, enable, old);
+	else if (IS_GEN5(dev) || IS_GEN6(dev))
+		ironlake_set_fifo_underrun_reporting(dev, pipe, enable);
+	else if (IS_GEN7(dev))
+		ivybridge_set_fifo_underrun_reporting(dev, pipe, enable, old);
+	else if (IS_GEN8(dev) || IS_GEN9(dev))
+		broadwell_set_fifo_underrun_reporting(dev, pipe, enable);
+
+	return old;
+}
+
+/**
+ * intel_set_cpu_fifo_underrun_reporting - set cpu fifo underrun reporting state
+ * @dev_priv: i915 device instance
+ * @pipe: (CPU) pipe to set state for
+ * @enable: whether underruns should be reported or not
+ *
+ * This function sets the fifo underrun state for @pipe. It is used in the
+ * modeset code to avoid false positives since on many platforms underruns are
+ * expected when disabling or enabling the pipe.
+ *
+ * Notice that on some platforms disabling underrun reports for one pipe
+ * disables for all due to shared interrupts. Actual reporting is still per-pipe
+ * though.
+ *
+ * Returns the previous state of underrun reporting.
+ */
+bool intel_set_cpu_fifo_underrun_reporting(struct drm_i915_private *dev_priv,
+					   enum pipe pipe, bool enable)
+{
+	unsigned long flags;
+	bool ret;
+
+	spin_lock_irqsave(&dev_priv->irq_lock, flags);
+	ret = __intel_set_cpu_fifo_underrun_reporting(dev_priv->dev, pipe,
+						      enable);
+	spin_unlock_irqrestore(&dev_priv->irq_lock, flags);
+
+	return ret;
+}
+
+static bool
+__cpu_fifo_underrun_reporting_enabled(struct drm_i915_private *dev_priv,
+				      enum pipe pipe)
+{
+	struct drm_crtc *crtc = dev_priv->pipe_to_crtc_mapping[pipe];
+	struct intel_crtc *intel_crtc = to_intel_crtc(crtc);
+
+	return !intel_crtc->cpu_fifo_underrun_disabled;
+}
+
+/**
+ * intel_set_pch_fifo_underrun_reporting - set PCH fifo underrun reporting state
+ * @dev_priv: i915 device instance
+ * @pch_transcoder: the PCH transcoder (same as pipe on IVB and older)
+ * @enable: whether underruns should be reported or not
+ *
+ * This function makes us disable or enable PCH fifo underruns for a specific
+ * PCH transcoder. Notice that on some PCHs (e.g. CPT/PPT), disabling FIFO
+ * underrun reporting for one transcoder may also disable all the other PCH
+ * error interrupts for the other transcoders, due to the fact that there's just
+ * one interrupt mask/enable bit for all the transcoders.
+ *
+ * Returns the previous state of underrun reporting.
+ */
+bool intel_set_pch_fifo_underrun_reporting(struct drm_i915_private *dev_priv,
+					   enum transcoder pch_transcoder,
+					   bool enable)
+{
+	struct drm_crtc *crtc = dev_priv->pipe_to_crtc_mapping[pch_transcoder];
+	struct intel_crtc *intel_crtc = to_intel_crtc(crtc);
+	unsigned long flags;
+	bool old;
+
+	/*
+	 * NOTE: Pre-LPT has a fixed cpu pipe -> pch transcoder mapping, but LPT
+	 * has only one pch transcoder A that all pipes can use. To avoid racy
+	 * pch transcoder -> pipe lookups from interrupt code simply store the
+	 * underrun statistics in crtc A. Since we never expose this anywhere
+	 * nor use it outside of the fifo underrun code here using the "wrong"
+	 * crtc on LPT won't cause issues.
+	 */
+
+	spin_lock_irqsave(&dev_priv->irq_lock, flags);
+
+	old = !intel_crtc->pch_fifo_underrun_disabled;
+	intel_crtc->pch_fifo_underrun_disabled = !enable;
+
+	if (HAS_PCH_IBX(dev_priv->dev))
+		ibx_set_fifo_underrun_reporting(dev_priv->dev, pch_transcoder,
+						enable);
+	else
+		cpt_set_fifo_underrun_reporting(dev_priv->dev, pch_transcoder,
+						enable, old);
+
+	spin_unlock_irqrestore(&dev_priv->irq_lock, flags);
+	return old;
+}
+
+/**
+ * intel_cpu_fifo_underrun_irq_handler - handle CPU fifo underrun interrupt
+ * @dev_priv: i915 device instance
+ * @pipe: (CPU) pipe to set state for
+ *
+ * This handles a CPU fifo underrun interrupt, generating an underrun warning
+ * into dmesg if underrun reporting is enabled and then disables the underrun
+ * interrupt to avoid an irq storm.
+ */
+void intel_cpu_fifo_underrun_irq_handler(struct drm_i915_private *dev_priv,
+					 enum pipe pipe)
+{
+	/* GMCH can't disable fifo underruns, filter them. */
+	if (HAS_GMCH_DISPLAY(dev_priv->dev) &&
+	    !__cpu_fifo_underrun_reporting_enabled(dev_priv, pipe))
+		return;
+
+	if (intel_set_cpu_fifo_underrun_reporting(dev_priv, pipe, false))
+		DRM_ERROR("CPU pipe %c FIFO underrun\n",
+			  pipe_name(pipe));
+}
+
+/**
+ * intel_pch_fifo_underrun_irq_handler - handle PCH fifo underrun interrupt
+ * @dev_priv: i915 device instance
+ * @pch_transcoder: the PCH transcoder (same as pipe on IVB and older)
+ *
+ * This handles a PCH fifo underrun interrupt, generating an underrun warning
+ * into dmesg if underrun reporting is enabled and then disables the underrun
+ * interrupt to avoid an irq storm.
+ */
+void intel_pch_fifo_underrun_irq_handler(struct drm_i915_private *dev_priv,
+					 enum transcoder pch_transcoder)
+{
+	if (intel_set_pch_fifo_underrun_reporting(dev_priv, pch_transcoder,
+						  false))
+		DRM_ERROR("PCH transcoder %c FIFO underrun\n",
+			  transcoder_name(pch_transcoder));
+}
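
The interrupt handlers above rely on a test-and-disable pattern: the setter returns the previous reporting state, so the handler logs an underrun only once and then stays quiet until reporting is re-enabled. A minimal user-space sketch of that idea follows. The names `sketch_crtc`, `sketch_set_underrun_reporting` and `sketch_underrun_irq` are hypothetical and not part of i915, and the `irq_lock` spinlock and all register access are deliberately left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for the per-CRTC bookkeeping; not the i915 struct. */
struct sketch_crtc {
	bool underrun_disabled;	/* plays the role of cpu_fifo_underrun_disabled */
	unsigned int reports;	/* number of underruns that were logged */
};

/* Set the reporting state and return the previous one. */
static bool sketch_set_underrun_reporting(struct sketch_crtc *crtc, bool enable)
{
	bool old;

	assert(crtc);
	old = !crtc->underrun_disabled;
	crtc->underrun_disabled = !enable;

	return old;
}

/* Interrupt path: disable reporting first, log only if it was enabled. */
static void sketch_underrun_irq(struct sketch_crtc *crtc, char pipe_name)
{
	if (sketch_set_underrun_reporting(crtc, false)) {
		crtc->reports++;
		printf("CPU pipe %c FIFO underrun\n", pipe_name);
	}
}
```

Calling `sketch_underrun_irq()` twice in a row on a freshly initialised `sketch_crtc` logs a single message; a later `sketch_set_underrun_reporting(crtc, true)` re-arms it, which is roughly what the modeset code is expected to do once the pipe is stable again.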
diff --git a/drivers/gpu/drm/i915/intel_frontbuffer.c b/drivers/gpu/drm/i915/intel_frontbuffer.c
new file mode 100644
index 000000000000..79f6d72179c5
--- /dev/null
+++ b/drivers/gpu/drm/i915/intel_frontbuffer.c
@@ -0,0 +1,279 @@
+/*
+ * Copyright © 2014 Intel Corporation
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
+ * DEALINGS IN THE SOFTWARE.
+ *
+ * Authors:
+ *	Daniel Vetter <daniel.vetter@ffwll.ch>
+ */
+
+/**
+ * DOC: frontbuffer tracking
+ *
+ * Many features require us to track changes to the currently active
+ * frontbuffer, especially rendering targeted at the frontbuffer.
+ *
+ * To be able to do so GEM tracks frontbuffers using a bitmask for all possible
+ * frontbuffer slots through i915_gem_track_fb(). The functions in this file are
+ * then called when the contents of the frontbuffer are invalidated, when
+ * frontbuffer rendering has stopped again to flush out all the changes and when
+ * the frontbuffer is exchanged with a flip. Subsystems interested in
+ * frontbuffer changes (e.g. PSR, FBC, DRRS) should directly put their callbacks
+ * into the relevant places and filter for the frontbuffer slots that they are
+ * interested in.
+ *
+ * On a high level there are two types of powersaving features. Features of the
+ * first type work like a special cache (FBC and PSR) and are interested in when
+ * they should stop caching and when to restart caching. This is done by placing
+ * callbacks into the invalidate and the flush functions: At invalidate the
+ * caching must be stopped and at flush time it can be restarted. They may also
+ * need to know when the frontbuffer changes (e.g. when the hw doesn't initiate
+ * an invalidate and flush on its own), which can be achieved by placing
+ * callbacks into the flip functions.
+ *
+ * The other type of display power saving feature only cares about busyness
+ * (e.g. DRRS). In that case all three (invalidate, flush and flip) indicate
+ * busyness. There is no direct way to detect idleness. Instead an idle timer
+ * delayed work should be started from the flush and flip functions and
+ * cancelled as soon as busyness is detected.
+ *
+ * Note that there's also an older frontbuffer activity tracking scheme which
+ * just tracks general activity. This is done by the various mark_busy and
+ * mark_idle functions. For display power management features using these
+ * functions is deprecated and should be avoided.
+ */
+
+#include <drm/drmP.h>
+
+#include "intel_drv.h"
+#include "i915_drv.h"
+
+static void intel_increase_pllclock(struct drm_device *dev,
+				    enum pipe pipe)
+{
+	struct drm_i915_private *dev_priv = dev->dev_private;
+	int dpll_reg = DPLL(pipe);
+	int dpll;
+
+	if (!HAS_GMCH_DISPLAY(dev))
+		return;
+
+	if (!dev_priv->lvds_downclock_avail)
+		return;
+
+	dpll = I915_READ(dpll_reg);
+	if (!HAS_PIPE_CXSR(dev) && (dpll & DISPLAY_RATE_SELECT_FPA1)) {
+		DRM_DEBUG_DRIVER("upclocking LVDS\n");
+
+		assert_panel_unlocked(dev_priv, pipe);
+
+		dpll &= ~DISPLAY_RATE_SELECT_FPA1;
+		I915_WRITE(dpll_reg, dpll);
+		intel_wait_for_vblank(dev, pipe);
+
+		dpll = I915_READ(dpll_reg);
+		if (dpll & DISPLAY_RATE_SELECT_FPA1)
+			DRM_DEBUG_DRIVER("failed to upclock LVDS!\n");
+	}
+}
+
+/**
+ * intel_mark_fb_busy - mark given planes as busy
+ * @dev: DRM device
+ * @frontbuffer_bits: bits for the affected planes
+ * @ring: optional ring for asynchronous commands
+ *
+ * This function gets called every time the screen contents change. It can be
+ * used to keep e.g. the update rate at the nominal refresh rate with DRRS.
+ */
+static void intel_mark_fb_busy(struct drm_device *dev,
+			       unsigned frontbuffer_bits,
+			       struct intel_engine_cs *ring)
+{
+	struct drm_i915_private *dev_priv = dev->dev_private;
+	enum pipe pipe;
+
+	if (!i915.powersave)
+		return;
+
+	for_each_pipe(dev_priv, pipe) {
+		if (!(frontbuffer_bits & INTEL_FRONTBUFFER_ALL_MASK(pipe)))
+			continue;
+
+		intel_increase_pllclock(dev, pipe);
+		if (ring && intel_fbc_enabled(dev))
+			ring->fbc_dirty = true;
+	}
+}
+
+/**
+ * intel_fb_obj_invalidate - invalidate frontbuffer object
+ * @obj: GEM object to invalidate
+ * @ring: set for asynchronous rendering
+ *
+ * This function gets called every time rendering on the given object starts and
+ * frontbuffer caching (fbc, low refresh rate for DRRS, panel self refresh) must
+ * be invalidated. If @ring is non-NULL any subsequent invalidation will be delayed
+ * until the rendering completes or a flip on this frontbuffer plane is
+ * scheduled.
+ */
+void intel_fb_obj_invalidate(struct drm_i915_gem_object *obj,
+			     struct intel_engine_cs *ring)
+{
+	struct drm_device *dev = obj->base.dev;
+	struct drm_i915_private *dev_priv = dev->dev_private;
+
+	WARN_ON(!mutex_is_locked(&dev->struct_mutex));
+
+	if (!obj->frontbuffer_bits)
+		return;
+
+	if (ring) {
+		mutex_lock(&dev_priv->fb_tracking.lock);
+		dev_priv->fb_tracking.busy_bits
+			|= obj->frontbuffer_bits;
+		dev_priv->fb_tracking.flip_bits
+			&= ~obj->frontbuffer_bits;
+		mutex_unlock(&dev_priv->fb_tracking.lock);
+	}
+
+	intel_mark_fb_busy(dev, obj->frontbuffer_bits, ring);
+
+	intel_psr_invalidate(dev, obj->frontbuffer_bits);
+}
+
+/**
+ * intel_frontbuffer_flush - flush frontbuffer
+ * @dev: DRM device
+ * @frontbuffer_bits: frontbuffer plane tracking bits
+ *
+ * This function gets called every time rendering on the given planes has
+ * completed and frontbuffer caching can be started again. Flushes will get
+ * delayed if they're blocked by some outstanding asynchronous rendering.
+ *
+ * Can be called without any locks held.
+ */
+void intel_frontbuffer_flush(struct drm_device *dev,
+			     unsigned frontbuffer_bits)
+{
+	struct drm_i915_private *dev_priv = dev->dev_private;
+
+	/* Delay flushing when rings are still busy.*/
+	mutex_lock(&dev_priv->fb_tracking.lock);
+	frontbuffer_bits &= ~dev_priv->fb_tracking.busy_bits;
+	mutex_unlock(&dev_priv->fb_tracking.lock);
+
+	intel_mark_fb_busy(dev, frontbuffer_bits, NULL);
+
+	intel_psr_flush(dev, frontbuffer_bits);
+
+	/*
+	 * FIXME: Unconditional fbc flushing here is a rather gross hack and
+	 * needs to be reworked into a proper frontbuffer tracking scheme like
+	 * psr employs.
+	 */
+	if (dev_priv->fbc.need_sw_cache_clean) {
+		dev_priv->fbc.need_sw_cache_clean = false;
+		bdw_fbc_sw_flush(dev, FBC_REND_CACHE_CLEAN);
+	}
+}
+
+/**
+ * intel_fb_obj_flush - flush frontbuffer object
+ * @obj: GEM object to flush
+ * @retire: set when retiring asynchronous rendering
+ *
+ * This function gets called every time rendering on the given object has
+ * completed and frontbuffer caching can be started again. If @retire is true
+ * then any delayed flushes will be unblocked.
+ */
+void intel_fb_obj_flush(struct drm_i915_gem_object *obj,
+			bool retire)
+{
+	struct drm_device *dev = obj->base.dev;
+	struct drm_i915_private *dev_priv = dev->dev_private;
+	unsigned frontbuffer_bits;
+
+	WARN_ON(!mutex_is_locked(&dev->struct_mutex));
+
+	if (!obj->frontbuffer_bits)
+		return;
+
+	frontbuffer_bits = obj->frontbuffer_bits;
+
+	if (retire) {
+		mutex_lock(&dev_priv->fb_tracking.lock);
+		/* Filter out new bits since rendering started. */
+		frontbuffer_bits &= dev_priv->fb_tracking.busy_bits;
+
+		dev_priv->fb_tracking.busy_bits &= ~frontbuffer_bits;
+		mutex_unlock(&dev_priv->fb_tracking.lock);
+	}
+
+	intel_frontbuffer_flush(dev, frontbuffer_bits);
+}
+
+/**
+ * intel_frontbuffer_flip_prepare - prepare asynchronous frontbuffer flip
+ * @dev: DRM device
+ * @frontbuffer_bits: frontbuffer plane tracking bits
+ *
+ * This function gets called after scheduling a flip on the planes given by
+ * @frontbuffer_bits. The actual frontbuffer flushing will be delayed until
+ * completion is signalled with intel_frontbuffer_flip_complete. If an
+ * invalidate happens in between this flush will be cancelled.
+ *
+ * Can be called without any locks held.
+ */
+void intel_frontbuffer_flip_prepare(struct drm_device *dev,
+				    unsigned frontbuffer_bits)
+{
+	struct drm_i915_private *dev_priv = dev->dev_private;
+
+	mutex_lock(&dev_priv->fb_tracking.lock);
+	dev_priv->fb_tracking.flip_bits |= frontbuffer_bits;
+	/* Remove stale busy bits due to the old buffer. */
+	dev_priv->fb_tracking.busy_bits &= ~frontbuffer_bits;
+	mutex_unlock(&dev_priv->fb_tracking.lock);
+}
+
+/**
+ * intel_frontbuffer_flip_complete - complete asynchronous frontbuffer flip
+ * @dev: DRM device
+ * @frontbuffer_bits: frontbuffer plane tracking bits
+ *
+ * This function gets called after the flip has been latched and will complete
+ * on the next vblank. It will execute the flush if it hasn't been cancelled yet.
+ *
+ * Can be called without any locks held.
+ */
+void intel_frontbuffer_flip_complete(struct drm_device *dev,
+				     unsigned frontbuffer_bits)
+{
+	struct drm_i915_private *dev_priv = dev->dev_private;
+
+	mutex_lock(&dev_priv->fb_tracking.lock);
+	/* Mask any cancelled flips. */
+	frontbuffer_bits &= dev_priv->fb_tracking.flip_bits;
+	dev_priv->fb_tracking.flip_bits &= ~frontbuffer_bits;
+	mutex_unlock(&dev_priv->fb_tracking.lock);
+
+	intel_frontbuffer_flush(dev, frontbuffer_bits);
+}
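
The interplay of `busy_bits` and `flip_bits` in the new file can be hard to follow from the individual functions. The sketch below models only the bitmask bookkeeping in plain C, assuming a single-threaded caller. The `sketch_*` names are hypothetical, the `fb_tracking.lock` mutex is omitted, and the PSR/FBC callbacks are reduced to returning the bits that would actually be flushed.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of dev_priv->fb_tracking, without the mutex. */
struct sketch_fb_tracking {
	unsigned busy_bits;	/* planes with outstanding async rendering */
	unsigned flip_bits;	/* planes with a flip that is still pending */
};

/* Rendering starts; async rendering blocks flushes and cancels flips. */
static void sketch_invalidate(struct sketch_fb_tracking *t, unsigned bits,
			      bool async)
{
	assert(t);
	if (async) {
		t->busy_bits |= bits;
		t->flip_bits &= ~bits;
	}
}

/* Returns the bits that are really flushed; busy planes are delayed. */
static unsigned sketch_flush(struct sketch_fb_tracking *t, unsigned bits)
{
	return bits & ~t->busy_bits;
}

/* Object flush; with retire set, delayed flushes are unblocked. */
static unsigned sketch_obj_flush(struct sketch_fb_tracking *t, unsigned bits,
				 bool retire)
{
	if (retire) {
		bits &= t->busy_bits;	/* filter out bits added since */
		t->busy_bits &= ~bits;
	}
	return sketch_flush(t, bits);
}

/* A flip was scheduled: remember it and drop stale busy bits. */
static void sketch_flip_prepare(struct sketch_fb_tracking *t, unsigned bits)
{
	t->flip_bits |= bits;
	t->busy_bits &= ~bits;
}

/* The flip was latched: flush unless an invalidate cancelled it. */
static unsigned sketch_flip_complete(struct sketch_fb_tracking *t,
				     unsigned bits)
{
	bits &= t->flip_bits;
	t->flip_bits &= ~bits;
	return sketch_flush(t, bits);
}
```

In this model a flush requested while a plane is busy returns 0 for that plane, and a flip whose plane was invalidated between prepare and complete also flushes nothing, which matches the "delayed" and "cancelled" wording of the kernel-doc comments.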
diff --git a/drivers/gpu/drm/i915/intel_hdmi.c b/drivers/gpu/drm/i915/intel_hdmi.c
index 29ec1535992d..3abc2000fce9 100644
--- a/drivers/gpu/drm/i915/intel_hdmi.c
+++ b/drivers/gpu/drm/i915/intel_hdmi.c
@@ -166,6 +166,19 @@ static void g4x_write_infoframe(struct drm_encoder *encoder,
 	POSTING_READ(VIDEO_DIP_CTL);
 }
 
+static bool g4x_infoframe_enabled(struct drm_encoder *encoder)
+{
+	struct drm_device *dev = encoder->dev;
+	struct drm_i915_private *dev_priv = dev->dev_private;
+	struct intel_digital_port *intel_dig_port = enc_to_dig_port(encoder);
+	u32 val = I915_READ(VIDEO_DIP_CTL);
+
+	if (VIDEO_DIP_PORT(intel_dig_port->port) == (val & VIDEO_DIP_PORT_MASK))
+		return val & VIDEO_DIP_ENABLE;
+
+	return false;
+}
+
 static void ibx_write_infoframe(struct drm_encoder *encoder,
 				enum hdmi_infoframe_type type,
 				const void *frame, ssize_t len)
@@ -204,6 +217,17 @@ static void ibx_write_infoframe(struct drm_encoder *encoder,
 	POSTING_READ(reg);
 }
 
+static bool ibx_infoframe_enabled(struct drm_encoder *encoder)
+{
+	struct drm_device *dev = encoder->dev;
+	struct drm_i915_private *dev_priv = dev->dev_private;
+	struct intel_crtc *intel_crtc = to_intel_crtc(encoder->crtc);
+	int reg = TVIDEO_DIP_CTL(intel_crtc->pipe);
+	u32 val = I915_READ(reg);
+
+	return val & VIDEO_DIP_ENABLE;
+}
+
 static void cpt_write_infoframe(struct drm_encoder *encoder,
 				enum hdmi_infoframe_type type,
 				const void *frame, ssize_t len)
@@ -245,6 +269,17 @@ static void cpt_write_infoframe(struct drm_encoder *encoder,
 	POSTING_READ(reg);
 }
 
+static bool cpt_infoframe_enabled(struct drm_encoder *encoder)
+{
+	struct drm_device *dev = encoder->dev;
+	struct drm_i915_private *dev_priv = dev->dev_private;
+	struct intel_crtc *intel_crtc = to_intel_crtc(encoder->crtc);
+	int reg = TVIDEO_DIP_CTL(intel_crtc->pipe);
+	u32 val = I915_READ(reg);
+
+	return val & VIDEO_DIP_ENABLE;
+}
+
 static void vlv_write_infoframe(struct drm_encoder *encoder,
 				enum hdmi_infoframe_type type,
 				const void *frame, ssize_t len)
@@ -283,6 +318,17 @@ static void vlv_write_infoframe(struct drm_encoder *encoder,
 	POSTING_READ(reg);
 }
 
+static bool vlv_infoframe_enabled(struct drm_encoder *encoder)
+{
+	struct drm_device *dev = encoder->dev;
+	struct drm_i915_private *dev_priv = dev->dev_private;
+	struct intel_crtc *intel_crtc = to_intel_crtc(encoder->crtc);
+	int reg = VLV_TVIDEO_DIP_CTL(intel_crtc->pipe);
+	u32 val = I915_READ(reg);
+
+	return val & VIDEO_DIP_ENABLE;
+}
+
 static void hsw_write_infoframe(struct drm_encoder *encoder,
 				enum hdmi_infoframe_type type,
 				const void *frame, ssize_t len)
@@ -320,6 +366,18 @@ static void hsw_write_infoframe(struct drm_encoder *encoder,
 	POSTING_READ(ctl_reg);
 }
 
+static bool hsw_infoframe_enabled(struct drm_encoder *encoder)
+{
+	struct drm_device *dev = encoder->dev;
+	struct drm_i915_private *dev_priv = dev->dev_private;
+	struct intel_crtc *intel_crtc = to_intel_crtc(encoder->crtc);
+	u32 ctl_reg = HSW_TVIDEO_DIP_CTL(intel_crtc->config.cpu_transcoder);
+	u32 val = I915_READ(ctl_reg);
+
+	return val & (VIDEO_DIP_ENABLE_AVI_HSW | VIDEO_DIP_ENABLE_SPD_HSW |
+		      VIDEO_DIP_ENABLE_VS_HSW);
+}
+
 /*
  * The data we write to the DIP data buffer registers is 1 byte bigger than the
  * HDMI infoframe size because of an ECC/reserved byte at position 3 (starting
@@ -661,14 +719,6 @@ static void intel_hdmi_prepare(struct intel_encoder *encoder)
 	if (crtc->config.has_hdmi_sink)
 		hdmi_val |= HDMI_MODE_SELECT_HDMI;
 
-	if (crtc->config.has_audio) {
-		WARN_ON(!crtc->config.has_hdmi_sink);
-		DRM_DEBUG_DRIVER("Enabling HDMI audio on pipe %c\n",
-				 pipe_name(crtc->pipe));
-		hdmi_val |= SDVO_AUDIO_ENABLE;
-		intel_write_eld(&encoder->base, adjusted_mode);
-	}
-
 	if (HAS_PCH_CPT(dev))
 		hdmi_val |= SDVO_PIPE_SEL_CPT(crtc->pipe);
 	else if (IS_CHERRYVIEW(dev))
@@ -690,7 +740,7 @@ static bool intel_hdmi_get_hw_state(struct intel_encoder *encoder,
 	u32 tmp;
 
 	power_domain = intel_display_port_power_domain(encoder);
-	if (!intel_display_power_enabled(dev_priv, power_domain))
+	if (!intel_display_power_is_enabled(dev_priv, power_domain))
 		return false;
 
 	tmp = I915_READ(intel_hdmi->hdmi_reg);
@@ -732,6 +782,9 @@ static void intel_hdmi_get_config(struct intel_encoder *encoder,
 	if (tmp & HDMI_MODE_SELECT_HDMI)
 		pipe_config->has_hdmi_sink = true;
 
+	if (intel_hdmi->infoframe_enabled(&encoder->base))
+		pipe_config->has_infoframe = true;
+
 	if (tmp & SDVO_AUDIO_ENABLE)
 		pipe_config->has_audio = true;
 
@@ -791,6 +844,13 @@ static void intel_enable_hdmi(struct intel_encoder *encoder)
 		I915_WRITE(intel_hdmi->hdmi_reg, temp);
 		POSTING_READ(intel_hdmi->hdmi_reg);
 	}
+
+	if (intel_crtc->config.has_audio) {
+		WARN_ON(!intel_crtc->config.has_hdmi_sink);
+		DRM_DEBUG_DRIVER("Enabling HDMI audio on pipe %c\n",
+				 pipe_name(intel_crtc->pipe));
+		intel_audio_codec_enable(encoder);
+	}
 }
 
 static void vlv_enable_hdmi(struct intel_encoder *encoder)
@@ -802,9 +862,13 @@ static void intel_disable_hdmi(struct intel_encoder *encoder)
 	struct drm_device *dev = encoder->base.dev;
 	struct drm_i915_private *dev_priv = dev->dev_private;
 	struct intel_hdmi *intel_hdmi = enc_to_intel_hdmi(&encoder->base);
+	struct intel_crtc *crtc = to_intel_crtc(encoder->base.crtc);
 	u32 temp;
 	u32 enable_bits = SDVO_ENABLE | SDVO_AUDIO_ENABLE;
 
+	if (crtc->config.has_audio)
+		intel_audio_codec_disable(encoder);
+
 	temp = I915_READ(intel_hdmi->hdmi_reg);
 
 	/* HW workaround for IBX, we need to move the port to transcoder A
@@ -922,6 +986,9 @@ bool intel_hdmi_compute_config(struct intel_encoder *encoder,
 
 	pipe_config->has_hdmi_sink = intel_hdmi->has_hdmi_sink;
 
+	if (pipe_config->has_hdmi_sink)
+		pipe_config->has_infoframe = true;
+
 	if (intel_hdmi->color_range_auto) {
 		/* See CEA-861-E - 5.1 Default Encoding Parameters */
 		if (pipe_config->has_hdmi_sink &&
@@ -1394,10 +1461,13 @@ static void chv_hdmi_post_disable(struct intel_encoder *encoder)
 static void chv_hdmi_pre_enable(struct intel_encoder *encoder)
 {
 	struct intel_digital_port *dport = enc_to_dig_port(&encoder->base);
+	struct intel_hdmi *intel_hdmi = &dport->hdmi;
 	struct drm_device *dev = encoder->base.dev;
 	struct drm_i915_private *dev_priv = dev->dev_private;
 	struct intel_crtc *intel_crtc =
 		to_intel_crtc(encoder->base.crtc);
+	struct drm_display_mode *adjusted_mode =
+		&intel_crtc->config.adjusted_mode;
 	enum dpio_channel ch = vlv_dport_to_channel(dport);
 	int pipe = intel_crtc->pipe;
 	int data, i;
@@ -1405,6 +1475,15 @@ static void chv_hdmi_pre_enable(struct intel_encoder *encoder)
 
 	mutex_lock(&dev_priv->dpio_lock);
 
+	/* allow hardware to manage TX FIFO reset source */
+	val = vlv_dpio_read(dev_priv, pipe, VLV_PCS01_DW11(ch));
+	val &= ~DPIO_LANEDESKEW_STRAP_OVRD;
+	vlv_dpio_write(dev_priv, pipe, VLV_PCS01_DW11(ch), val);
+
+	val = vlv_dpio_read(dev_priv, pipe, VLV_PCS23_DW11(ch));
+	val &= ~DPIO_LANEDESKEW_STRAP_OVRD;
+	vlv_dpio_write(dev_priv, pipe, VLV_PCS23_DW11(ch), val);
+
 	/* Deassert soft data lane reset*/
 	val = vlv_dpio_read(dev_priv, pipe, VLV_PCS01_DW1(ch));
 	val |= CHV_PCS_REQ_SOFTRESET_EN;
@@ -1441,12 +1520,26 @@ static void chv_hdmi_pre_enable(struct intel_encoder *encoder)
 	/* Clear calc init */
 	val = vlv_dpio_read(dev_priv, pipe, VLV_PCS01_DW10(ch));
 	val &= ~(DPIO_PCS_SWING_CALC_TX0_TX2 | DPIO_PCS_SWING_CALC_TX1_TX3);
+	val &= ~(DPIO_PCS_TX1DEEMP_MASK | DPIO_PCS_TX2DEEMP_MASK);
+	val |= DPIO_PCS_TX1DEEMP_9P5 | DPIO_PCS_TX2DEEMP_9P5;
 	vlv_dpio_write(dev_priv, pipe, VLV_PCS01_DW10(ch), val);
 
 	val = vlv_dpio_read(dev_priv, pipe, VLV_PCS23_DW10(ch));
 	val &= ~(DPIO_PCS_SWING_CALC_TX0_TX2 | DPIO_PCS_SWING_CALC_TX1_TX3);
+	val &= ~(DPIO_PCS_TX1DEEMP_MASK | DPIO_PCS_TX2DEEMP_MASK);
+	val |= DPIO_PCS_TX1DEEMP_9P5 | DPIO_PCS_TX2DEEMP_9P5;
 	vlv_dpio_write(dev_priv, pipe, VLV_PCS23_DW10(ch), val);
 
+	val = vlv_dpio_read(dev_priv, pipe, VLV_PCS01_DW9(ch));
+	val &= ~(DPIO_PCS_TX1MARGIN_MASK | DPIO_PCS_TX2MARGIN_MASK);
+	val |= DPIO_PCS_TX1MARGIN_000 | DPIO_PCS_TX2MARGIN_000;
+	vlv_dpio_write(dev_priv, pipe, VLV_PCS01_DW9(ch), val);
+
+	val = vlv_dpio_read(dev_priv, pipe, VLV_PCS23_DW9(ch));
+	val &= ~(DPIO_PCS_TX1MARGIN_MASK | DPIO_PCS_TX2MARGIN_MASK);
+	val |= DPIO_PCS_TX1MARGIN_000 | DPIO_PCS_TX2MARGIN_000;
+	vlv_dpio_write(dev_priv, pipe, VLV_PCS23_DW9(ch), val);
+
 	/* FIXME: Program the support xxx V-dB */
 	/* Use 800mV-0dB */
 	for (i = 0; i < 4; i++) {
@@ -1499,6 +1592,10 @@ static void chv_hdmi_pre_enable(struct intel_encoder *encoder)
 
 	mutex_unlock(&dev_priv->dpio_lock);
 
+	intel_hdmi->set_infoframes(&encoder->base,
+				   intel_crtc->config.has_hdmi_sink,
+				   adjusted_mode);
+
 	intel_enable_hdmi(encoder);
 
 	vlv_wait_port_ready(dev_priv, dport);
@@ -1593,18 +1690,23 @@ void intel_hdmi_init_connector(struct intel_digital_port *intel_dig_port,
 	if (IS_VALLEYVIEW(dev)) {
 		intel_hdmi->write_infoframe = vlv_write_infoframe;
 		intel_hdmi->set_infoframes = vlv_set_infoframes;
+		intel_hdmi->infoframe_enabled = vlv_infoframe_enabled;
 	} else if (IS_G4X(dev)) {
 		intel_hdmi->write_infoframe = g4x_write_infoframe;
 		intel_hdmi->set_infoframes = g4x_set_infoframes;
+		intel_hdmi->infoframe_enabled = g4x_infoframe_enabled;
 	} else if (HAS_DDI(dev)) {
 		intel_hdmi->write_infoframe = hsw_write_infoframe;
 		intel_hdmi->set_infoframes = hsw_set_infoframes;
+		intel_hdmi->infoframe_enabled = hsw_infoframe_enabled;
 	} else if (HAS_PCH_IBX(dev)) {
 		intel_hdmi->write_infoframe = ibx_write_infoframe;
 		intel_hdmi->set_infoframes = ibx_set_infoframes;
+		intel_hdmi->infoframe_enabled = ibx_infoframe_enabled;
 	} else {
 		intel_hdmi->write_infoframe = cpt_write_infoframe;
 		intel_hdmi->set_infoframes = cpt_set_infoframes;
+		intel_hdmi->infoframe_enabled = cpt_infoframe_enabled;
 	}
 
 	if (HAS_DDI(dev))
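
Of the new `*_infoframe_enabled()` hooks, the g4x variant is the only one that checks the port select field as well as the enable bit, since a single `VIDEO_DIP_CTL` register is shared. The following stand-alone sketch shows just that check on a raw register value. The `SKETCH_*` bit positions are illustrative assumptions and are not copied from `i915_reg.h`; `sketch_g4x_infoframe_enabled` is a hypothetical name.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative bit layout only; the real values live in i915_reg.h. */
#define SKETCH_DIP_ENABLE	(1u << 31)
#define SKETCH_DIP_PORT(p)	((uint32_t)(p) << 29)
#define SKETCH_DIP_PORT_MASK	(3u << 29)

/*
 * Infoframes count as enabled for @port only if the register currently
 * selects that port and the enable bit is set.
 */
static bool sketch_g4x_infoframe_enabled(uint32_t val, unsigned int port)
{
	assert(port < 4);	/* must fit the 2-bit port field assumed above */

	if (SKETCH_DIP_PORT(port) == (val & SKETCH_DIP_PORT_MASK))
		return (val & SKETCH_DIP_ENABLE) != 0;

	return false;
}
```

The other platforms use per-pipe or per-transcoder control registers, so their hooks in the diff can test the enable bits directly without a port comparison.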
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index bafd38b5703e..e588376227ea 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -136,11 +136,10 @@
 #include <drm/i915_drm.h>
 #include "i915_drv.h"
 
+#define GEN9_LR_CONTEXT_RENDER_SIZE (22 * PAGE_SIZE)
 #define GEN8_LR_CONTEXT_RENDER_SIZE (20 * PAGE_SIZE)
 #define GEN8_LR_CONTEXT_OTHER_SIZE (2 * PAGE_SIZE)
 
-#define GEN8_LR_CONTEXT_ALIGN 4096
-
 #define RING_EXECLIST_QFULL		(1 << 0x2)
 #define RING_EXECLIST1_VALID		(1 << 0x3)
 #define RING_EXECLIST0_VALID		(1 << 0x4)
@@ -204,6 +203,9 @@ enum {
 };
 #define GEN8_CTX_ID_SHIFT 32
 
+static int intel_lr_context_pin(struct intel_engine_cs *ring,
+		struct intel_context *ctx);
+
 /**
  * intel_sanitize_enable_execlists() - sanitize i915.enable_execlists
  * @dev: DRM device.
@@ -219,6 +221,9 @@ int intel_sanitize_enable_execlists(struct drm_device *dev, int enable_execlists
 {
 	WARN_ON(i915.enable_ppgtt == -1);
 
+	if (INTEL_INFO(dev)->gen >= 9)
+		return 1;
+
 	if (enable_execlists == 0)
 		return 0;
 
@@ -275,7 +280,8 @@ static void execlists_elsp_write(struct intel_engine_cs *ring,
 				 struct drm_i915_gem_object *ctx_obj0,
 				 struct drm_i915_gem_object *ctx_obj1)
 {
-	struct drm_i915_private *dev_priv = ring->dev->dev_private;
+	struct drm_device *dev = ring->dev;
+	struct drm_i915_private *dev_priv = dev->dev_private;
 	uint64_t temp = 0;
 	uint32_t desc[4];
 	unsigned long flags;
@@ -300,13 +306,18 @@ static void execlists_elsp_write(struct intel_engine_cs *ring,
 	 * Instead, we do the runtime_pm_get/put when creating/destroying requests.
 	 */
 	spin_lock_irqsave(&dev_priv->uncore.lock, flags);
-	if (IS_CHERRYVIEW(dev_priv->dev)) {
+	if (IS_CHERRYVIEW(dev) || INTEL_INFO(dev)->gen >= 9) {
 		if (dev_priv->uncore.fw_rendercount++ == 0)
 			dev_priv->uncore.funcs.force_wake_get(dev_priv,
 							      FORCEWAKE_RENDER);
 		if (dev_priv->uncore.fw_mediacount++ == 0)
 			dev_priv->uncore.funcs.force_wake_get(dev_priv,
 							      FORCEWAKE_MEDIA);
+		if (INTEL_INFO(dev)->gen >= 9) {
+			if (dev_priv->uncore.fw_blittercount++ == 0)
+				dev_priv->uncore.funcs.force_wake_get(dev_priv,
+							FORCEWAKE_BLITTER);
+		}
 	} else {
 		if (dev_priv->uncore.forcewake_count++ == 0)
 			dev_priv->uncore.funcs.force_wake_get(dev_priv,
@@ -325,13 +336,18 @@ static void execlists_elsp_write(struct intel_engine_cs *ring,
 
 	/* Release Force Wakeup (see the big comment above). */
 	spin_lock_irqsave(&dev_priv->uncore.lock, flags);
-	if (IS_CHERRYVIEW(dev_priv->dev)) {
+	if (IS_CHERRYVIEW(dev) || INTEL_INFO(dev)->gen >= 9) {
 		if (--dev_priv->uncore.fw_rendercount == 0)
 			dev_priv->uncore.funcs.force_wake_put(dev_priv,
 							      FORCEWAKE_RENDER);
 		if (--dev_priv->uncore.fw_mediacount == 0)
 			dev_priv->uncore.funcs.force_wake_put(dev_priv,
 							      FORCEWAKE_MEDIA);
+		if (INTEL_INFO(dev)->gen >= 9) {
+			if (--dev_priv->uncore.fw_blittercount == 0)
+				dev_priv->uncore.funcs.force_wake_put(dev_priv,
+							FORCEWAKE_BLITTER);
+		}
 	} else {
 		if (--dev_priv->uncore.forcewake_count == 0)
 			dev_priv->uncore.funcs.force_wake_put(dev_priv,
@@ -341,7 +357,9 @@ static void execlists_elsp_write(struct intel_engine_cs *ring,
 	spin_unlock_irqrestore(&dev_priv->uncore.lock, flags);
 }
 
-static int execlists_ctx_write_tail(struct drm_i915_gem_object *ctx_obj, u32 tail)
+static int execlists_update_context(struct drm_i915_gem_object *ctx_obj,
+				    struct drm_i915_gem_object *ring_obj,
+				    u32 tail)
 {
 	struct page *page;
 	uint32_t *reg_state;
@@ -350,43 +368,45 @@ static int execlists_ctx_write_tail(struct drm_i915_gem_object *ctx_obj, u32 tai
 	reg_state = kmap_atomic(page);
 
 	reg_state[CTX_RING_TAIL+1] = tail;
+	reg_state[CTX_RING_BUFFER_START+1] = i915_gem_obj_ggtt_offset(ring_obj);
 
 	kunmap_atomic(reg_state);
 
 	return 0;
 }
 
-static int execlists_submit_context(struct intel_engine_cs *ring,
-				    struct intel_context *to0, u32 tail0,
-				    struct intel_context *to1, u32 tail1)
+static void execlists_submit_contexts(struct intel_engine_cs *ring,
+				      struct intel_context *to0, u32 tail0,
+				      struct intel_context *to1, u32 tail1)
 {
-	struct drm_i915_gem_object *ctx_obj0;
+	struct drm_i915_gem_object *ctx_obj0 = to0->engine[ring->id].state;
+	struct intel_ringbuffer *ringbuf0 = to0->engine[ring->id].ringbuf;
 	struct drm_i915_gem_object *ctx_obj1 = NULL;
+	struct intel_ringbuffer *ringbuf1 = NULL;
 
-	ctx_obj0 = to0->engine[ring->id].state;
 	BUG_ON(!ctx_obj0);
 	WARN_ON(!i915_gem_obj_is_pinned(ctx_obj0));
+	WARN_ON(!i915_gem_obj_is_pinned(ringbuf0->obj));
 
-	execlists_ctx_write_tail(ctx_obj0, tail0);
+	execlists_update_context(ctx_obj0, ringbuf0->obj, tail0);
 
 	if (to1) {
+		ringbuf1 = to1->engine[ring->id].ringbuf;
 		ctx_obj1 = to1->engine[ring->id].state;
 		BUG_ON(!ctx_obj1);
 		WARN_ON(!i915_gem_obj_is_pinned(ctx_obj1));
+		WARN_ON(!i915_gem_obj_is_pinned(ringbuf1->obj));
 
-		execlists_ctx_write_tail(ctx_obj1, tail1);
+		execlists_update_context(ctx_obj1, ringbuf1->obj, tail1);
 	}
 
 	execlists_elsp_write(ring, ctx_obj0, ctx_obj1);
-
-	return 0;
 }
 
 static void execlists_context_unqueue(struct intel_engine_cs *ring)
 {
 	struct intel_ctx_submit_request *req0 = NULL, *req1 = NULL;
 	struct intel_ctx_submit_request *cursor = NULL, *tmp = NULL;
-	struct drm_i915_private *dev_priv = ring->dev->dev_private;
 
 	assert_spin_locked(&ring->execlist_lock);
 
@@ -403,7 +423,8 @@ static void execlists_context_unqueue(struct intel_engine_cs *ring)
 			 * will update tail past first request's workload */
 			cursor->elsp_submitted = req0->elsp_submitted;
 			list_del(&req0->execlist_link);
-			queue_work(dev_priv->wq, &req0->work);
+			list_add_tail(&req0->execlist_link,
+				&ring->execlist_retired_req_list);
 			req0 = cursor;
 		} else {
 			req1 = cursor;
@@ -413,9 +434,9 @@ static void execlists_context_unqueue(struct intel_engine_cs *ring)
 
 	WARN_ON(req1 && req1->elsp_submitted);
 
-	WARN_ON(execlists_submit_context(ring, req0->ctx, req0->tail,
-					 req1 ? req1->ctx : NULL,
-					 req1 ? req1->tail : 0));
+	execlists_submit_contexts(ring, req0->ctx, req0->tail,
+				  req1 ? req1->ctx : NULL,
+				  req1 ? req1->tail : 0);
 
 	req0->elsp_submitted++;
 	if (req1)
@@ -425,7 +446,6 @@ static void execlists_context_unqueue(struct intel_engine_cs *ring)
 static bool execlists_check_remove_request(struct intel_engine_cs *ring,
 					   u32 request_id)
 {
-	struct drm_i915_private *dev_priv = ring->dev->dev_private;
 	struct intel_ctx_submit_request *head_req;
 
 	assert_spin_locked(&ring->execlist_lock);
@@ -443,7 +463,8 @@ static bool execlists_check_remove_request(struct intel_engine_cs *ring,
 
 			if (--head_req->elsp_submitted <= 0) {
 				list_del(&head_req->execlist_link);
-				queue_work(dev_priv->wq, &head_req->work);
+				list_add_tail(&head_req->execlist_link,
+					&ring->execlist_retired_req_list);
 				return true;
 			}
 		}
@@ -512,22 +533,6 @@ void intel_execlists_handle_ctx_events(struct intel_engine_cs *ring)
 		   ((u32)ring->next_context_status_buffer & 0x07) << 8);
 }
 
-static void execlists_free_request_task(struct work_struct *work)
-{
-	struct intel_ctx_submit_request *req =
-		container_of(work, struct intel_ctx_submit_request, work);
-	struct drm_device *dev = req->ring->dev;
-	struct drm_i915_private *dev_priv = dev->dev_private;
-
-	intel_runtime_pm_put(dev_priv);
-
-	mutex_lock(&dev->struct_mutex);
-	i915_gem_context_unreference(req->ctx);
-	mutex_unlock(&dev->struct_mutex);
-
-	kfree(req);
-}
-
 static int execlists_context_queue(struct intel_engine_cs *ring,
 				   struct intel_context *to,
 				   u32 tail)
@@ -542,9 +547,12 @@ static int execlists_context_queue(struct intel_engine_cs *ring,
 		return -ENOMEM;
 	req->ctx = to;
 	i915_gem_context_reference(req->ctx);
+
+	if (to != ring->default_context)
+		intel_lr_context_pin(ring, to);
+
 	req->ring = ring;
 	req->tail = tail;
-	INIT_WORK(&req->work, execlists_free_request_task);
 
 	intel_runtime_pm_get(dev_priv);
 
@@ -563,9 +571,10 @@ static int execlists_context_queue(struct intel_engine_cs *ring,
 
 		if (to == tail_req->ctx) {
 			WARN(tail_req->elsp_submitted != 0,
-			     "More than 2 already-submitted reqs queued\n");
+				"More than 2 already-submitted reqs queued\n");
 			list_del(&tail_req->execlist_link);
-			queue_work(dev_priv->wq, &tail_req->work);
+			list_add_tail(&tail_req->execlist_link,
+				&ring->execlist_retired_req_list);
 		}
 	}
 
@@ -733,6 +742,36 @@ int intel_execlists_submission(struct drm_device *dev, struct drm_file *file,
 	return 0;
 }
 
+void intel_execlists_retire_requests(struct intel_engine_cs *ring)
+{
+	struct intel_ctx_submit_request *req, *tmp;
+	struct drm_i915_private *dev_priv = ring->dev->dev_private;
+	unsigned long flags;
+	struct list_head retired_list;
+
+	WARN_ON(!mutex_is_locked(&ring->dev->struct_mutex));
+	if (list_empty(&ring->execlist_retired_req_list))
+		return;
+
+	INIT_LIST_HEAD(&retired_list);
+	spin_lock_irqsave(&ring->execlist_lock, flags);
+	list_replace_init(&ring->execlist_retired_req_list, &retired_list);
+	spin_unlock_irqrestore(&ring->execlist_lock, flags);
+
+	list_for_each_entry_safe(req, tmp, &retired_list, execlist_link) {
+		struct intel_context *ctx = req->ctx;
+		struct drm_i915_gem_object *ctx_obj =
+				ctx->engine[ring->id].state;
+
+		if (ctx_obj && (ctx != ring->default_context))
+			intel_lr_context_unpin(ring, ctx);
+		intel_runtime_pm_put(dev_priv);
+		i915_gem_context_unreference(req->ctx);
+		list_del(&req->execlist_link);
+		kfree(req);
+	}
+}
+
 void intel_logical_ring_stop(struct intel_engine_cs *ring)
 {
 	struct drm_i915_private *dev_priv = ring->dev->dev_private;
@@ -793,9 +832,55 @@ void intel_logical_ring_advance_and_submit(struct intel_ringbuffer *ringbuf)
 	execlists_context_queue(ring, ctx, ringbuf->tail);
 }
 
+static int intel_lr_context_pin(struct intel_engine_cs *ring,
+		struct intel_context *ctx)
+{
+	struct drm_i915_gem_object *ctx_obj = ctx->engine[ring->id].state;
+	struct intel_ringbuffer *ringbuf = ctx->engine[ring->id].ringbuf;
+	int ret = 0;
+
+	WARN_ON(!mutex_is_locked(&ring->dev->struct_mutex));
+	if (ctx->engine[ring->id].unpin_count++ == 0) {
+		ret = i915_gem_obj_ggtt_pin(ctx_obj,
+				GEN8_LR_CONTEXT_ALIGN, 0);
+		if (ret)
+			goto reset_unpin_count;
+
+		ret = intel_pin_and_map_ringbuffer_obj(ring->dev, ringbuf);
+		if (ret)
+			goto unpin_ctx_obj;
+	}
+
+	return ret;
+
+unpin_ctx_obj:
+	i915_gem_object_ggtt_unpin(ctx_obj);
+reset_unpin_count:
+	ctx->engine[ring->id].unpin_count = 0;
+
+	return ret;
+}
+
+void intel_lr_context_unpin(struct intel_engine_cs *ring,
+		struct intel_context *ctx)
+{
+	struct drm_i915_gem_object *ctx_obj = ctx->engine[ring->id].state;
+	struct intel_ringbuffer *ringbuf = ctx->engine[ring->id].ringbuf;
+
+	if (ctx_obj) {
+		WARN_ON(!mutex_is_locked(&ring->dev->struct_mutex));
+		if (--ctx->engine[ring->id].unpin_count == 0) {
+			intel_unpin_ringbuffer_obj(ringbuf);
+			i915_gem_object_ggtt_unpin(ctx_obj);
+		}
+	}
+}
+
 static int logical_ring_alloc_seqno(struct intel_engine_cs *ring,
 				    struct intel_context *ctx)
 {
+	int ret;
+
 	if (ring->outstanding_lazy_seqno)
 		return 0;
 
@@ -806,6 +891,14 @@ static int logical_ring_alloc_seqno(struct intel_engine_cs *ring,
 		if (request == NULL)
 			return -ENOMEM;
 
+		if (ctx != ring->default_context) {
+			ret = intel_lr_context_pin(ring, ctx);
+			if (ret) {
+				kfree(request);
+				return ret;
+			}
+		}
+
 		/* Hold a reference to the context this request belongs to
 		 * (we will need it when the time comes to emit/retire the
 		 * request).
@@ -991,6 +1084,44 @@ int intel_logical_ring_begin(struct intel_ringbuffer *ringbuf, int num_dwords)
 	return 0;
 }
 
+static int intel_logical_ring_workarounds_emit(struct intel_engine_cs *ring,
+					       struct intel_context *ctx)
+{
+	int ret, i;
+	struct intel_ringbuffer *ringbuf = ctx->engine[ring->id].ringbuf;
+	struct drm_device *dev = ring->dev;
+	struct drm_i915_private *dev_priv = dev->dev_private;
+	struct i915_workarounds *w = &dev_priv->workarounds;
+
+	if (WARN_ON(w->count == 0))
+		return 0;
+
+	ring->gpu_caches_dirty = true;
+	ret = logical_ring_flush_all_caches(ringbuf);
+	if (ret)
+		return ret;
+
+	ret = intel_logical_ring_begin(ringbuf, w->count * 2 + 2);
+	if (ret)
+		return ret;
+
+	intel_logical_ring_emit(ringbuf, MI_LOAD_REGISTER_IMM(w->count));
+	for (i = 0; i < w->count; i++) {
+		intel_logical_ring_emit(ringbuf, w->reg[i].addr);
+		intel_logical_ring_emit(ringbuf, w->reg[i].value);
+	}
+	intel_logical_ring_emit(ringbuf, MI_NOOP);
+
+	intel_logical_ring_advance(ringbuf);
+
+	ring->gpu_caches_dirty = true;
+	ret = logical_ring_flush_all_caches(ringbuf);
+	if (ret)
+		return ret;
+
+	return 0;
+}
+
 static int gen8_init_common_ring(struct intel_engine_cs *ring)
 {
 	struct drm_device *dev = ring->dev;
@@ -1034,7 +1165,7 @@ static int gen8_init_render_ring(struct intel_engine_cs *ring)
 
 	I915_WRITE(INSTPM, _MASKED_BIT_ENABLE(INSTPM_FORCE_ORDERING));
 
-	return ret;
+	return init_workarounds_ring(ring);
 }
 
 static int gen8_emit_bb_start(struct intel_ringbuffer *ringbuf,
@@ -1063,7 +1194,7 @@ static bool gen8_logical_ring_get_irq(struct intel_engine_cs *ring)
 	struct drm_i915_private *dev_priv = dev->dev_private;
 	unsigned long flags;
 
-	if (!dev->irq_enabled)
+	if (WARN_ON(!intel_irqs_enabled(dev_priv)))
 		return false;
 
 	spin_lock_irqsave(&dev_priv->irq_lock, flags);
@@ -1214,11 +1345,13 @@ static int gen8_emit_request(struct intel_ringbuffer *ringbuf)
  */
 void intel_logical_ring_cleanup(struct intel_engine_cs *ring)
 {
-	struct drm_i915_private *dev_priv = ring->dev->dev_private;
+	struct drm_i915_private *dev_priv;
 
 	if (!intel_ring_initialized(ring))
 		return;
 
+	dev_priv = ring->dev->dev_private;
+
 	intel_logical_ring_stop(ring);
 	WARN_ON((I915_READ_MODE(ring) & MODE_IDLE) == 0);
 	ring->preallocated_lazy_request = NULL;
@@ -1248,6 +1381,7 @@ static int logical_ring_init(struct drm_device *dev, struct intel_engine_cs *rin
 	init_waitqueue_head(&ring->irq_queue);
 
 	INIT_LIST_HEAD(&ring->execlist_queue);
+	INIT_LIST_HEAD(&ring->execlist_retired_req_list);
 	spin_lock_init(&ring->execlist_lock);
 	ring->next_context_status_buffer = 0;
 
@@ -1282,6 +1416,7 @@ static int logical_render_ring_init(struct drm_device *dev)
 		ring->irq_keep_mask |= GT_RENDER_L3_PARITY_ERROR_INTERRUPT;
 
 	ring->init = gen8_init_render_ring;
+	ring->init_context = intel_logical_ring_workarounds_emit;
 	ring->cleanup = intel_fini_pipe_control;
 	ring->get_seqno = gen8_get_seqno;
 	ring->set_seqno = gen8_set_seqno;
@@ -1495,7 +1630,6 @@ populate_lr_context(struct intel_context *ctx, struct drm_i915_gem_object *ctx_o
 {
 	struct drm_device *dev = ring->dev;
 	struct drm_i915_private *dev_priv = dev->dev_private;
-	struct drm_i915_gem_object *ring_obj = ringbuf->obj;
 	struct i915_hw_ppgtt *ppgtt = ctx->ppgtt;
 	struct page *page;
 	uint32_t *reg_state;
@@ -1541,7 +1675,9 @@ populate_lr_context(struct intel_context *ctx, struct drm_i915_gem_object *ctx_o
 	reg_state[CTX_RING_TAIL] = RING_TAIL(ring->mmio_base);
 	reg_state[CTX_RING_TAIL+1] = 0;
 	reg_state[CTX_RING_BUFFER_START] = RING_START(ring->mmio_base);
-	reg_state[CTX_RING_BUFFER_START+1] = i915_gem_obj_ggtt_offset(ring_obj);
+	/* Ring buffer start address is not known until the buffer is pinned.
+	 * It is written to the context image in execlists_update_context().
+	 */
 	reg_state[CTX_RING_BUFFER_CONTROL] = RING_CTL(ring->mmio_base);
 	reg_state[CTX_RING_BUFFER_CONTROL+1] =
 			((ringbuf->size - PAGE_SIZE) & RING_NR_PAGES) | RING_VALID;
@@ -1617,12 +1753,18 @@ void intel_lr_context_free(struct intel_context *ctx)
 
 	for (i = 0; i < I915_NUM_RINGS; i++) {
 		struct drm_i915_gem_object *ctx_obj = ctx->engine[i].state;
-		struct intel_ringbuffer *ringbuf = ctx->engine[i].ringbuf;
 
 		if (ctx_obj) {
+			struct intel_ringbuffer *ringbuf =
+					ctx->engine[i].ringbuf;
+			struct intel_engine_cs *ring = ringbuf->ring;
+
+			if (ctx == ring->default_context) {
+				intel_unpin_ringbuffer_obj(ringbuf);
+				i915_gem_object_ggtt_unpin(ctx_obj);
+			}
 			intel_destroy_ringbuffer_obj(ringbuf);
 			kfree(ringbuf);
-			i915_gem_object_ggtt_unpin(ctx_obj);
 			drm_gem_object_unreference(&ctx_obj->base);
 		}
 	}
@@ -1632,11 +1774,14 @@ static uint32_t get_lr_context_size(struct intel_engine_cs *ring)
 {
 	int ret = 0;
 
-	WARN_ON(INTEL_INFO(ring->dev)->gen != 8);
+	WARN_ON(INTEL_INFO(ring->dev)->gen < 8);
 
 	switch (ring->id) {
 	case RCS:
-		ret = GEN8_LR_CONTEXT_RENDER_SIZE;
+		if (INTEL_INFO(ring->dev)->gen >= 9)
+			ret = GEN9_LR_CONTEXT_RENDER_SIZE;
+		else
+			ret = GEN8_LR_CONTEXT_RENDER_SIZE;
 		break;
 	case VCS:
 	case BCS:
@@ -1649,6 +1794,23 @@ static uint32_t get_lr_context_size(struct intel_engine_cs *ring)
 	return ret;
 }
 
+static void lrc_setup_hardware_status_page(struct intel_engine_cs *ring,
+		struct drm_i915_gem_object *default_ctx_obj)
+{
+	struct drm_i915_private *dev_priv = ring->dev->dev_private;
+
+	/* The status page is offset 0 from the default context object
+	 * in LRC mode. */
+	ring->status_page.gfx_addr = i915_gem_obj_ggtt_offset(default_ctx_obj);
+	ring->status_page.page_addr =
+			kmap(sg_page(default_ctx_obj->pages->sgl));
+	ring->status_page.obj = default_ctx_obj;
+
+	I915_WRITE(RING_HWS_PGA(ring->mmio_base),
+			(u32)ring->status_page.gfx_addr);
+	POSTING_READ(RING_HWS_PGA(ring->mmio_base));
+}
+
 /**
  * intel_lr_context_deferred_create() - create the LRC specific bits of a context
  * @ctx: LR context to create.
@@ -1660,11 +1822,12 @@ static uint32_t get_lr_context_size(struct intel_engine_cs *ring)
  * the creation is a deferred call: it's better to make sure first that we need to use
  * a given ring with the context.
  *
- * Return: non-zero on eror.
+ * Return: non-zero on error.
  */
 int intel_lr_context_deferred_create(struct intel_context *ctx,
 				     struct intel_engine_cs *ring)
 {
+	const bool is_global_default_ctx = (ctx == ring->default_context);
 	struct drm_device *dev = ring->dev;
 	struct drm_i915_gem_object *ctx_obj;
 	uint32_t context_size;
@@ -1684,21 +1847,22 @@ int intel_lr_context_deferred_create(struct intel_context *ctx,
 		return ret;
 	}
 
-	ret = i915_gem_obj_ggtt_pin(ctx_obj, GEN8_LR_CONTEXT_ALIGN, 0);
-	if (ret) {
-		DRM_DEBUG_DRIVER("Pin LRC backing obj failed: %d\n", ret);
-		drm_gem_object_unreference(&ctx_obj->base);
-		return ret;
+	if (is_global_default_ctx) {
+		ret = i915_gem_obj_ggtt_pin(ctx_obj, GEN8_LR_CONTEXT_ALIGN, 0);
+		if (ret) {
+			DRM_DEBUG_DRIVER("Pin LRC backing obj failed: %d\n",
+					ret);
+			drm_gem_object_unreference(&ctx_obj->base);
+			return ret;
+		}
 	}
 
 	ringbuf = kzalloc(sizeof(*ringbuf), GFP_KERNEL);
 	if (!ringbuf) {
 		DRM_DEBUG_DRIVER("Failed to allocate ringbuffer %s\n",
 				ring->name);
-		i915_gem_object_ggtt_unpin(ctx_obj);
-		drm_gem_object_unreference(&ctx_obj->base);
 		ret = -ENOMEM;
-		return ret;
+		goto error_unpin_ctx;
 	}
 
 	ringbuf->ring = ring;
@@ -1711,46 +1875,51 @@ int intel_lr_context_deferred_create(struct intel_context *ctx,
 	ringbuf->space = ringbuf->size;
 	ringbuf->last_retired_head = -1;
 
-	/* TODO: For now we put this in the mappable region so that we can reuse
-	 * the existing ringbuffer code which ioremaps it. When we start
-	 * creating many contexts, this will no longer work and we must switch
-	 * to a kmapish interface.
-	 */
-	ret = intel_alloc_ringbuffer_obj(dev, ringbuf);
-	if (ret) {
-		DRM_DEBUG_DRIVER("Failed to allocate ringbuffer obj %s: %d\n",
+	if (ringbuf->obj == NULL) {
+		ret = intel_alloc_ringbuffer_obj(dev, ringbuf);
+		if (ret) {
+			DRM_DEBUG_DRIVER(
+				"Failed to allocate ringbuffer obj %s: %d\n",
 				ring->name, ret);
-		goto error;
+			goto error_free_rbuf;
+		}
+
+		if (is_global_default_ctx) {
+			ret = intel_pin_and_map_ringbuffer_obj(dev, ringbuf);
+			if (ret) {
+				DRM_ERROR(
+					"Failed to pin and map ringbuffer %s: %d\n",
+					ring->name, ret);
+				goto error_destroy_rbuf;
+			}
+		}
+
 	}
 
 	ret = populate_lr_context(ctx, ctx_obj, ring, ringbuf);
 	if (ret) {
 		DRM_DEBUG_DRIVER("Failed to populate LRC: %d\n", ret);
-		intel_destroy_ringbuffer_obj(ringbuf);
 		goto error;
 	}
 
 	ctx->engine[ring->id].ringbuf = ringbuf;
 	ctx->engine[ring->id].state = ctx_obj;
 
-	if (ctx == ring->default_context) {
-		/* The status page is offset 0 from the default context object
-		 * in LRC mode. */
-		ring->status_page.gfx_addr = i915_gem_obj_ggtt_offset(ctx_obj);
-		ring->status_page.page_addr =
-				kmap(sg_page(ctx_obj->pages->sgl));
-		if (ring->status_page.page_addr == NULL)
-			return -ENOMEM;
-		ring->status_page.obj = ctx_obj;
-	}
+	if (ctx == ring->default_context)
+		lrc_setup_hardware_status_page(ring, ctx_obj);
 
 	if (ring->id == RCS && !ctx->rcs_initialized) {
+		if (ring->init_context) {
+			ret = ring->init_context(ring, ctx);
+			if (ret)
+				DRM_ERROR("ring init context: %d\n", ret);
+		}
+
 		ret = intel_lr_context_render_state_init(ring, ctx);
 		if (ret) {
 			DRM_ERROR("Init render state failed: %d\n", ret);
 			ctx->engine[ring->id].ringbuf = NULL;
 			ctx->engine[ring->id].state = NULL;
-			intel_destroy_ringbuffer_obj(ringbuf);
 			goto error;
 		}
 		ctx->rcs_initialized = true;
@@ -1759,8 +1928,15 @@ int intel_lr_context_deferred_create(struct intel_context *ctx,
 	return 0;
 
 error:
+	if (is_global_default_ctx)
+		intel_unpin_ringbuffer_obj(ringbuf);
+error_destroy_rbuf:
+	intel_destroy_ringbuffer_obj(ringbuf);
+error_free_rbuf:
 	kfree(ringbuf);
-	i915_gem_object_ggtt_unpin(ctx_obj);
+error_unpin_ctx:
+	if (is_global_default_ctx)
+		i915_gem_object_ggtt_unpin(ctx_obj);
 	drm_gem_object_unreference(&ctx_obj->base);
 	return ret;
 }
diff --git a/drivers/gpu/drm/i915/intel_lrc.h b/drivers/gpu/drm/i915/intel_lrc.h
index 33c3b4bf28c5..14b216b9be7f 100644
--- a/drivers/gpu/drm/i915/intel_lrc.h
+++ b/drivers/gpu/drm/i915/intel_lrc.h
@@ -24,6 +24,8 @@
 #ifndef _INTEL_LRC_H_
 #define _INTEL_LRC_H_
 
+#define GEN8_LR_CONTEXT_ALIGN 4096
+
 /* Execlists regs */
 #define RING_ELSP(ring)			((ring)->mmio_base+0x230)
 #define RING_EXECLIST_STATUS(ring)	((ring)->mmio_base+0x234)
@@ -67,6 +69,8 @@ int intel_lr_context_render_state_init(struct intel_engine_cs *ring,
 void intel_lr_context_free(struct intel_context *ctx);
 int intel_lr_context_deferred_create(struct intel_context *ctx,
 				     struct intel_engine_cs *ring);
+void intel_lr_context_unpin(struct intel_engine_cs *ring,
+		struct intel_context *ctx);
 
 /* Execlists */
 int intel_sanitize_enable_execlists(struct drm_device *dev, int enable_execlists);
@@ -104,11 +108,11 @@ struct intel_ctx_submit_request {
 	u32 tail;
 
 	struct list_head execlist_link;
-	struct work_struct work;
 
 	int elsp_submitted;
 };
 
 void intel_execlists_handle_ctx_events(struct intel_engine_cs *ring);
+void intel_execlists_retire_requests(struct intel_engine_cs *ring);
 
 #endif /* _INTEL_LRC_H_ */
diff --git a/drivers/gpu/drm/i915/intel_lvds.c b/drivers/gpu/drm/i915/intel_lvds.c
index c0bbf2172446..14654d628ca4 100644
--- a/drivers/gpu/drm/i915/intel_lvds.c
+++ b/drivers/gpu/drm/i915/intel_lvds.c
@@ -76,7 +76,7 @@ static bool intel_lvds_get_hw_state(struct intel_encoder *encoder,
 	u32 tmp;
 
 	power_domain = intel_display_port_power_domain(encoder);
-	if (!intel_display_power_enabled(dev_priv, power_domain))
+	if (!intel_display_power_is_enabled(dev_priv, power_domain))
 		return false;
 
 	tmp = I915_READ(lvds_encoder->reg);
@@ -1116,7 +1116,7 @@ out:
 	drm_connector_register(connector);
 
 	intel_panel_init(&intel_connector->panel, fixed_mode, downclock_mode);
-	intel_panel_setup_backlight(connector);
+	intel_panel_setup_backlight(connector, INVALID_PIPE);
 
 	return;
 
diff --git a/drivers/gpu/drm/i915/intel_panel.c b/drivers/gpu/drm/i915/intel_panel.c
index 41b3be217493..4d63839bd9b4 100644
--- a/drivers/gpu/drm/i915/intel_panel.c
+++ b/drivers/gpu/drm/i915/intel_panel.c
@@ -521,6 +521,9 @@ static u32 _vlv_get_backlight(struct drm_device *dev, enum pipe pipe)
 {
 	struct drm_i915_private *dev_priv = dev->dev_private;
 
+	if (WARN_ON(pipe != PIPE_A && pipe != PIPE_B))
+		return 0;
+
 	return I915_READ(VLV_BLC_PWM_CTL(pipe)) & BACKLIGHT_DUTY_CYCLE_MASK;
 }
 
@@ -536,15 +539,17 @@ static u32 intel_panel_get_backlight(struct intel_connector *connector)
 {
 	struct drm_device *dev = connector->base.dev;
 	struct drm_i915_private *dev_priv = dev->dev_private;
-	u32 val;
-	unsigned long flags;
+	struct intel_panel *panel = &connector->panel;
+	u32 val = 0;
 
-	spin_lock_irqsave(&dev_priv->backlight_lock, flags);
+	mutex_lock(&dev_priv->backlight_lock);
 
-	val = dev_priv->display.get_backlight(connector);
-	val = intel_panel_compute_brightness(connector, val);
+	if (panel->backlight.enabled) {
+		val = dev_priv->display.get_backlight(connector);
+		val = intel_panel_compute_brightness(connector, val);
+	}
 
-	spin_unlock_irqrestore(&dev_priv->backlight_lock, flags);
+	mutex_unlock(&dev_priv->backlight_lock);
 
 	DRM_DEBUG_DRIVER("get backlight PWM = %d\n", val);
 	return val;
@@ -603,6 +608,9 @@ static void vlv_set_backlight(struct intel_connector *connector, u32 level)
 	enum pipe pipe = intel_get_pipe_from_connector(connector);
 	u32 tmp;
 
+	if (WARN_ON(pipe != PIPE_A && pipe != PIPE_B))
+		return;
+
 	tmp = I915_READ(VLV_BLC_PWM_CTL(pipe)) & ~BACKLIGHT_DUTY_CYCLE_MASK;
 	I915_WRITE(VLV_BLC_PWM_CTL(pipe), tmp | level);
 }
@@ -626,14 +634,12 @@ static void intel_panel_set_backlight(struct intel_connector *connector,
 	struct drm_device *dev = connector->base.dev;
 	struct drm_i915_private *dev_priv = dev->dev_private;
 	struct intel_panel *panel = &connector->panel;
-	enum pipe pipe = intel_get_pipe_from_connector(connector);
 	u32 hw_level;
-	unsigned long flags;
 
-	if (!panel->backlight.present || pipe == INVALID_PIPE)
+	if (!panel->backlight.present)
 		return;
 
-	spin_lock_irqsave(&dev_priv->backlight_lock, flags);
+	mutex_lock(&dev_priv->backlight_lock);
 
 	WARN_ON(panel->backlight.max == 0);
 
@@ -643,7 +649,7 @@ static void intel_panel_set_backlight(struct intel_connector *connector,
 	if (panel->backlight.enabled)
 		intel_panel_actually_set_backlight(connector, hw_level);
 
-	spin_unlock_irqrestore(&dev_priv->backlight_lock, flags);
+	mutex_unlock(&dev_priv->backlight_lock);
 }
 
 /* set backlight brightness to level in range [0..max], assuming hw min is
@@ -657,12 +663,17 @@ void intel_panel_set_backlight_acpi(struct intel_connector *connector,
 	struct intel_panel *panel = &connector->panel;
 	enum pipe pipe = intel_get_pipe_from_connector(connector);
 	u32 hw_level;
-	unsigned long flags;
 
+	/*
+	 * INVALID_PIPE may occur during driver init because
+	 * connection_mutex isn't held across the entire backlight
+	 * setup + modeset readout, and the BIOS can issue the
+	 * requests at any time.
+	 */
 	if (!panel->backlight.present || pipe == INVALID_PIPE)
 		return;
 
-	spin_lock_irqsave(&dev_priv->backlight_lock, flags);
+	mutex_lock(&dev_priv->backlight_lock);
 
 	WARN_ON(panel->backlight.max == 0);
 
@@ -678,7 +689,7 @@ void intel_panel_set_backlight_acpi(struct intel_connector *connector,
 	if (panel->backlight.enabled)
 		intel_panel_actually_set_backlight(connector, hw_level);
 
-	spin_unlock_irqrestore(&dev_priv->backlight_lock, flags);
+	mutex_unlock(&dev_priv->backlight_lock);
 }
 
 static void pch_disable_backlight(struct intel_connector *connector)
@@ -720,6 +731,9 @@ static void vlv_disable_backlight(struct intel_connector *connector)
 	enum pipe pipe = intel_get_pipe_from_connector(connector);
 	u32 tmp;
 
+	if (WARN_ON(pipe != PIPE_A && pipe != PIPE_B))
+		return;
+
 	intel_panel_actually_set_backlight(connector, 0);
 
 	tmp = I915_READ(VLV_BLC_PWM_CTL2(pipe));
@@ -731,10 +745,8 @@ void intel_panel_disable_backlight(struct intel_connector *connector)
 	struct drm_device *dev = connector->base.dev;
 	struct drm_i915_private *dev_priv = dev->dev_private;
 	struct intel_panel *panel = &connector->panel;
-	enum pipe pipe = intel_get_pipe_from_connector(connector);
-	unsigned long flags;
 
-	if (!panel->backlight.present || pipe == INVALID_PIPE)
+	if (!panel->backlight.present)
 		return;
 
 	/*
@@ -748,14 +760,14 @@ void intel_panel_disable_backlight(struct intel_connector *connector)
 		return;
 	}
 
-	spin_lock_irqsave(&dev_priv->backlight_lock, flags);
+	mutex_lock(&dev_priv->backlight_lock);
 
 	if (panel->backlight.device)
 		panel->backlight.device->props.power = FB_BLANK_POWERDOWN;
 	panel->backlight.enabled = false;
 	dev_priv->display.disable_backlight(connector);
 
-	spin_unlock_irqrestore(&dev_priv->backlight_lock, flags);
+	mutex_unlock(&dev_priv->backlight_lock);
 }
 
 static void bdw_enable_backlight(struct intel_connector *connector)
@@ -779,8 +791,9 @@ static void bdw_enable_backlight(struct intel_connector *connector)
 	if (panel->backlight.active_low_pwm)
 		pch_ctl1 |= BLM_PCH_POLARITY;
 
-	/* BDW always uses the pch pwm controls. */
-	pch_ctl1 |= BLM_PCH_OVERRIDE_ENABLE;
+	/* After LPT, override is the default. */
+	if (HAS_PCH_LPT(dev_priv))
+		pch_ctl1 |= BLM_PCH_OVERRIDE_ENABLE;
 
 	I915_WRITE(BLC_PWM_PCH_CTL1, pch_ctl1);
 	POSTING_READ(BLC_PWM_PCH_CTL1);
@@ -909,6 +922,9 @@ static void vlv_enable_backlight(struct intel_connector *connector)
 	enum pipe pipe = intel_get_pipe_from_connector(connector);
 	u32 ctl, ctl2;
 
+	if (WARN_ON(pipe != PIPE_A && pipe != PIPE_B))
+		return;
+
 	ctl2 = I915_READ(VLV_BLC_PWM_CTL2(pipe));
 	if (ctl2 & BLM_PWM_ENABLE) {
 		DRM_DEBUG_KMS("backlight already enabled\n");
@@ -936,14 +952,13 @@ void intel_panel_enable_backlight(struct intel_connector *connector)
 	struct drm_i915_private *dev_priv = dev->dev_private;
 	struct intel_panel *panel = &connector->panel;
 	enum pipe pipe = intel_get_pipe_from_connector(connector);
-	unsigned long flags;
 
-	if (!panel->backlight.present || pipe == INVALID_PIPE)
+	if (!panel->backlight.present)
 		return;
 
 	DRM_DEBUG_KMS("pipe %c\n", pipe_name(pipe));
 
-	spin_lock_irqsave(&dev_priv->backlight_lock, flags);
+	mutex_lock(&dev_priv->backlight_lock);
 
 	WARN_ON(panel->backlight.max == 0);
 
@@ -961,7 +976,7 @@ void intel_panel_enable_backlight(struct intel_connector *connector)
 	if (panel->backlight.device)
 		panel->backlight.device->props.power = FB_BLANK_UNBLANK;
 
-	spin_unlock_irqrestore(&dev_priv->backlight_lock, flags);
+	mutex_unlock(&dev_priv->backlight_lock);
 }
 
 #if IS_ENABLED(CONFIG_BACKLIGHT_CLASS_DEVICE)
@@ -1030,6 +1045,9 @@ static int intel_backlight_device_register(struct intel_connector *connector)
 	if (WARN_ON(panel->backlight.device))
 		return -ENODEV;
 
+	if (!panel->backlight.present)
+		return 0;
+
 	WARN_ON(panel->backlight.max == 0);
 
 	memset(&props, 0, sizeof(props));
@@ -1065,6 +1083,10 @@ static int intel_backlight_device_register(struct intel_connector *connector)
 		panel->backlight.device = NULL;
 		return -ENODEV;
 	}
+
+	DRM_DEBUG_KMS("Connector %s backlight sysfs interface registered\n",
+		      connector->base.name);
+
 	return 0;
 }
 
@@ -1119,7 +1141,7 @@ static u32 get_backlight_min_vbt(struct intel_connector *connector)
 	return scale(min, 0, 255, 0, panel->backlight.max);
 }
 
-static int bdw_setup_backlight(struct intel_connector *connector)
+static int bdw_setup_backlight(struct intel_connector *connector, enum pipe unused)
 {
 	struct drm_device *dev = connector->base.dev;
 	struct drm_i915_private *dev_priv = dev->dev_private;
@@ -1145,7 +1167,7 @@ static int bdw_setup_backlight(struct intel_connector *connector)
 	return 0;
 }
 
-static int pch_setup_backlight(struct intel_connector *connector)
+static int pch_setup_backlight(struct intel_connector *connector, enum pipe unused)
 {
 	struct drm_device *dev = connector->base.dev;
 	struct drm_i915_private *dev_priv = dev->dev_private;
@@ -1172,7 +1194,7 @@ static int pch_setup_backlight(struct intel_connector *connector)
 	return 0;
 }
 
-static int i9xx_setup_backlight(struct intel_connector *connector)
+static int i9xx_setup_backlight(struct intel_connector *connector, enum pipe unused)
 {
 	struct drm_device *dev = connector->base.dev;
 	struct drm_i915_private *dev_priv = dev->dev_private;
@@ -1204,7 +1226,7 @@ static int i9xx_setup_backlight(struct intel_connector *connector)
 	return 0;
 }
 
-static int i965_setup_backlight(struct intel_connector *connector)
+static int i965_setup_backlight(struct intel_connector *connector, enum pipe unused)
 {
 	struct drm_device *dev = connector->base.dev;
 	struct drm_i915_private *dev_priv = dev->dev_private;
@@ -1234,37 +1256,40 @@ static int i965_setup_backlight(struct intel_connector *connector)
 	return 0;
 }
 
-static int vlv_setup_backlight(struct intel_connector *connector)
+static int vlv_setup_backlight(struct intel_connector *connector, enum pipe pipe)
 {
 	struct drm_device *dev = connector->base.dev;
 	struct drm_i915_private *dev_priv = dev->dev_private;
 	struct intel_panel *panel = &connector->panel;
-	enum pipe pipe;
+	enum pipe p;
 	u32 ctl, ctl2, val;
 
-	for_each_pipe(dev_priv, pipe) {
-		u32 cur_val = I915_READ(VLV_BLC_PWM_CTL(pipe));
+	for_each_pipe(dev_priv, p) {
+		u32 cur_val = I915_READ(VLV_BLC_PWM_CTL(p));
 
 		/* Skip if the modulation freq is already set */
 		if (cur_val & ~BACKLIGHT_DUTY_CYCLE_MASK)
 			continue;
 
 		cur_val &= BACKLIGHT_DUTY_CYCLE_MASK;
-		I915_WRITE(VLV_BLC_PWM_CTL(pipe), (0xf42 << 16) |
+		I915_WRITE(VLV_BLC_PWM_CTL(p), (0xf42 << 16) |
 			   cur_val);
 	}
 
-	ctl2 = I915_READ(VLV_BLC_PWM_CTL2(PIPE_A));
+	if (WARN_ON(pipe != PIPE_A && pipe != PIPE_B))
+		return -ENODEV;
+
+	ctl2 = I915_READ(VLV_BLC_PWM_CTL2(pipe));
 	panel->backlight.active_low_pwm = ctl2 & BLM_POLARITY_I965;
 
-	ctl = I915_READ(VLV_BLC_PWM_CTL(PIPE_A));
+	ctl = I915_READ(VLV_BLC_PWM_CTL(pipe));
 	panel->backlight.max = ctl >> 16;
 	if (!panel->backlight.max)
 		return -ENODEV;
 
 	panel->backlight.min = get_backlight_min_vbt(connector);
 
-	val = _vlv_get_backlight(dev, PIPE_A);
+	val = _vlv_get_backlight(dev, pipe);
 	panel->backlight.level = intel_panel_compute_brightness(connector, val);
 
 	panel->backlight.enabled = (ctl2 & BLM_PWM_ENABLE) &&
@@ -1273,13 +1298,12 @@ static int vlv_setup_backlight(struct intel_connector *connector)
 	return 0;
 }
 
-int intel_panel_setup_backlight(struct drm_connector *connector)
+int intel_panel_setup_backlight(struct drm_connector *connector, enum pipe pipe)
 {
 	struct drm_device *dev = connector->dev;
 	struct drm_i915_private *dev_priv = dev->dev_private;
 	struct intel_connector *intel_connector = to_intel_connector(connector);
 	struct intel_panel *panel = &intel_connector->panel;
-	unsigned long flags;
 	int ret;
 
 	if (!dev_priv->vbt.backlight.present) {
@@ -1292,9 +1316,9 @@ int intel_panel_setup_backlight(struct drm_connector *connector)
 	}
 
 	/* set level and max in panel struct */
-	spin_lock_irqsave(&dev_priv->backlight_lock, flags);
-	ret = dev_priv->display.setup_backlight(intel_connector);
-	spin_unlock_irqrestore(&dev_priv->backlight_lock, flags);
+	mutex_lock(&dev_priv->backlight_lock);
+	ret = dev_priv->display.setup_backlight(intel_connector, pipe);
+	mutex_unlock(&dev_priv->backlight_lock);
 
 	if (ret) {
 		DRM_DEBUG_KMS("failed to setup backlight for connector %s\n",
@@ -1302,15 +1326,12 @@ int intel_panel_setup_backlight(struct drm_connector *connector)
 		return ret;
 	}
 
-	intel_backlight_device_register(intel_connector);
-
 	panel->backlight.present = true;
 
-	DRM_DEBUG_KMS("backlight initialized, %s, brightness %u/%u, "
-		      "sysfs interface %sregistered\n",
+	DRM_DEBUG_KMS("Connector %s backlight initialized, %s, brightness %u/%u\n",
+		      connector->name,
 		      panel->backlight.enabled ? "enabled" : "disabled",
-		      panel->backlight.level, panel->backlight.max,
-		      panel->backlight.device ? "" : "not ");
+		      panel->backlight.level, panel->backlight.max);
 
 	return 0;
 }
@@ -1321,7 +1342,6 @@ void intel_panel_destroy_backlight(struct drm_connector *connector)
 	struct intel_panel *panel = &intel_connector->panel;
 
 	panel->backlight.present = false;
-	intel_backlight_device_unregister(intel_connector);
 }
 
 /* Set up chip specific backlight functions */
@@ -1329,7 +1349,7 @@ void intel_panel_init_backlight_funcs(struct drm_device *dev)
 {
 	struct drm_i915_private *dev_priv = dev->dev_private;
 
-	if (IS_BROADWELL(dev)) {
+	if (IS_BROADWELL(dev) || (INTEL_INFO(dev)->gen >= 9)) {
 		dev_priv->display.setup_backlight = bdw_setup_backlight;
 		dev_priv->display.enable_backlight = bdw_enable_backlight;
 		dev_priv->display.disable_backlight = pch_disable_backlight;
@@ -1384,3 +1404,19 @@ void intel_panel_fini(struct intel_panel *panel)
 		drm_mode_destroy(intel_connector->base.dev,
 				panel->downclock_mode);
 }
+
+void intel_backlight_register(struct drm_device *dev)
+{
+	struct intel_connector *connector;
+
+	list_for_each_entry(connector, &dev->mode_config.connector_list, base.head)
+		intel_backlight_device_register(connector);
+}
+
+void intel_backlight_unregister(struct drm_device *dev)
+{
+	struct intel_connector *connector;
+
+	list_for_each_entry(connector, &dev->mode_config.connector_list, base.head)
+		intel_backlight_device_unregister(connector);
+}
diff --git a/drivers/gpu/drm/i915/intel_pm.c b/drivers/gpu/drm/i915/intel_pm.c
index ad2fd605f76b..1f4b56e273c8 100644
--- a/drivers/gpu/drm/i915/intel_pm.c
+++ b/drivers/gpu/drm/i915/intel_pm.c
@@ -30,9 +30,6 @@
 #include "intel_drv.h"
 #include "../../../platform/x86/intel_ips.h"
 #include <linux/module.h>
-#include <linux/vgaarb.h>
-#include <drm/i915_powerwell.h>
-#include <linux/pm_runtime.h>
 
 /**
  * RC6 is a special power stage which allows the GPU to enter an very
@@ -66,11 +63,37 @@
  * i915.i915_enable_fbc parameter
  */
 
+static void gen9_init_clock_gating(struct drm_device *dev)
+{
+	struct drm_i915_private *dev_priv = dev->dev_private;
+
+	/*
+	 * WaDisableSDEUnitClockGating:skl
+	 * This seems to be a pre-production w/a.
+	 */
+	I915_WRITE(GEN8_UCGCTL6, I915_READ(GEN8_UCGCTL6) |
+		   GEN8_SDEUNIT_CLOCK_GATE_DISABLE);
+
+	/*
+	 * WaDisableDgMirrorFixInHalfSliceChicken5:skl
+	 * This is a pre-production w/a.
+	 */
+	I915_WRITE(GEN9_HALF_SLICE_CHICKEN5,
+		   I915_READ(GEN9_HALF_SLICE_CHICKEN5) &
+		   ~GEN9_DG_MIRROR_FIX_ENABLE);
+
+	/* Wa4x4STCOptimizationDisable:skl */
+	I915_WRITE(CACHE_MODE_1,
+		   _MASKED_BIT_ENABLE(GEN8_4x4_STC_OPTIMIZATION_DISABLE));
+}
+
 static void i8xx_disable_fbc(struct drm_device *dev)
 {
 	struct drm_i915_private *dev_priv = dev->dev_private;
 	u32 fbc_ctl;
 
+	dev_priv->fbc.enabled = false;
+
 	/* Disable compression */
 	fbc_ctl = I915_READ(FBC_CONTROL);
 	if ((fbc_ctl & FBC_CTL_EN) == 0)
@@ -99,6 +122,8 @@ static void i8xx_enable_fbc(struct drm_crtc *crtc)
 	int i;
 	u32 fbc_ctl;
 
+	dev_priv->fbc.enabled = true;
+
 	cfb_pitch = dev_priv->fbc.size / FBC_LL_SIZE;
 	if (fb->pitches[0] < cfb_pitch)
 		cfb_pitch = fb->pitches[0];
@@ -153,6 +178,8 @@ static void g4x_enable_fbc(struct drm_crtc *crtc)
 	struct intel_crtc *intel_crtc = to_intel_crtc(crtc);
 	u32 dpfc_ctl;
 
+	dev_priv->fbc.enabled = true;
+
 	dpfc_ctl = DPFC_CTL_PLANE(intel_crtc->plane) | DPFC_SR_EN;
 	if (drm_format_plane_cpp(fb->pixel_format, 0) == 2)
 		dpfc_ctl |= DPFC_CTL_LIMIT_2X;
@@ -173,6 +200,8 @@ static void g4x_disable_fbc(struct drm_device *dev)
 	struct drm_i915_private *dev_priv = dev->dev_private;
 	u32 dpfc_ctl;
 
+	dev_priv->fbc.enabled = false;
+
 	/* Disable compression */
 	dpfc_ctl = I915_READ(DPFC_CONTROL);
 	if (dpfc_ctl & DPFC_CTL_EN) {
@@ -224,6 +253,8 @@ static void ironlake_enable_fbc(struct drm_crtc *crtc)
 	struct intel_crtc *intel_crtc = to_intel_crtc(crtc);
 	u32 dpfc_ctl;
 
+	dev_priv->fbc.enabled = true;
+
 	dpfc_ctl = DPFC_CTL_PLANE(intel_crtc->plane);
 	if (drm_format_plane_cpp(fb->pixel_format, 0) == 2)
 		dev_priv->fbc.threshold++;
@@ -264,6 +295,8 @@ static void ironlake_disable_fbc(struct drm_device *dev)
 	struct drm_i915_private *dev_priv = dev->dev_private;
 	u32 dpfc_ctl;
 
+	dev_priv->fbc.enabled = false;
+
 	/* Disable compression */
 	dpfc_ctl = I915_READ(ILK_DPFC_CONTROL);
 	if (dpfc_ctl & DPFC_CTL_EN) {
@@ -290,6 +323,8 @@ static void gen7_enable_fbc(struct drm_crtc *crtc)
 	struct intel_crtc *intel_crtc = to_intel_crtc(crtc);
 	u32 dpfc_ctl;
 
+	dev_priv->fbc.enabled = true;
+
 	dpfc_ctl = IVB_DPFC_CTL_PLANE(intel_crtc->plane);
 	if (drm_format_plane_cpp(fb->pixel_format, 0) == 2)
 		dev_priv->fbc.threshold++;
@@ -339,19 +374,19 @@ bool intel_fbc_enabled(struct drm_device *dev)
 {
 	struct drm_i915_private *dev_priv = dev->dev_private;
 
-	if (!dev_priv->display.fbc_enabled)
-		return false;
-
-	return dev_priv->display.fbc_enabled(dev);
+	return dev_priv->fbc.enabled;
 }
 
-void gen8_fbc_sw_flush(struct drm_device *dev, u32 value)
+void bdw_fbc_sw_flush(struct drm_device *dev, u32 value)
 {
 	struct drm_i915_private *dev_priv = dev->dev_private;
 
 	if (!IS_GEN8(dev))
 		return;
 
+	if (!intel_fbc_enabled(dev))
+		return;
+
 	I915_WRITE(MSG_FBC_REND_STATE, value);
 }
 
@@ -1310,6 +1345,7 @@ static bool vlv_compute_drain_latency(struct drm_crtc *crtc,
 				      int *prec_mult,
 				      int *drain_latency)
 {
+	struct drm_device *dev = crtc->dev;
 	int entries;
 	int clock = to_intel_crtc(crtc)->config.adjusted_mode.crtc_clock;
 
@@ -1320,8 +1356,12 @@ static bool vlv_compute_drain_latency(struct drm_crtc *crtc,
 		return false;
 
 	entries = DIV_ROUND_UP(clock, 1000) * pixel_size;
-	*prec_mult = (entries > 128) ? DRAIN_LATENCY_PRECISION_64 :
-				       DRAIN_LATENCY_PRECISION_32;
+	if (IS_CHERRYVIEW(dev))
+		*prec_mult = (entries > 128) ? DRAIN_LATENCY_PRECISION_32 :
+					       DRAIN_LATENCY_PRECISION_16;
+	else
+		*prec_mult = (entries > 128) ? DRAIN_LATENCY_PRECISION_64 :
+					       DRAIN_LATENCY_PRECISION_32;
 	*drain_latency = (64 * (*prec_mult) * 4) / entries;
 
 	if (*drain_latency > DRAIN_LATENCY_MASK)
@@ -1340,15 +1380,18 @@ static bool vlv_compute_drain_latency(struct drm_crtc *crtc,
 
 static void vlv_update_drain_latency(struct drm_crtc *crtc)
 {
-	struct drm_i915_private *dev_priv = crtc->dev->dev_private;
+	struct drm_device *dev = crtc->dev;
+	struct drm_i915_private *dev_priv = dev->dev_private;
 	struct intel_crtc *intel_crtc = to_intel_crtc(crtc);
 	int pixel_size;
 	int drain_latency;
 	enum pipe pipe = intel_crtc->pipe;
 	int plane_prec, prec_mult, plane_dl;
+	const int high_precision = IS_CHERRYVIEW(dev) ?
+		DRAIN_LATENCY_PRECISION_32 : DRAIN_LATENCY_PRECISION_64;
 
-	plane_dl = I915_READ(VLV_DDL(pipe)) & ~(DDL_PLANE_PRECISION_64 |
-		   DRAIN_LATENCY_MASK | DDL_CURSOR_PRECISION_64 |
+	plane_dl = I915_READ(VLV_DDL(pipe)) & ~(DDL_PLANE_PRECISION_HIGH |
+		   DRAIN_LATENCY_MASK | DDL_CURSOR_PRECISION_HIGH |
 		   (DRAIN_LATENCY_MASK << DDL_CURSOR_SHIFT));
 
 	if (!intel_crtc_active(crtc)) {
@@ -1359,9 +1402,9 @@ static void vlv_update_drain_latency(struct drm_crtc *crtc)
 	/* Primary plane Drain Latency */
 	pixel_size = crtc->primary->fb->bits_per_pixel / 8;	/* BPP */
 	if (vlv_compute_drain_latency(crtc, pixel_size, &prec_mult, &drain_latency)) {
-		plane_prec = (prec_mult == DRAIN_LATENCY_PRECISION_64) ?
-					   DDL_PLANE_PRECISION_64 :
-					   DDL_PLANE_PRECISION_32;
+		plane_prec = (prec_mult == high_precision) ?
+					   DDL_PLANE_PRECISION_HIGH :
+					   DDL_PLANE_PRECISION_LOW;
 		plane_dl |= plane_prec | drain_latency;
 	}
 
@@ -1373,9 +1416,9 @@ static void vlv_update_drain_latency(struct drm_crtc *crtc)
 	/* Program cursor DL only if it is enabled */
 	if (intel_crtc->cursor_base &&
 	    vlv_compute_drain_latency(crtc, pixel_size, &prec_mult, &drain_latency)) {
-		plane_prec = (prec_mult == DRAIN_LATENCY_PRECISION_64) ?
-					   DDL_CURSOR_PRECISION_64 :
-					   DDL_CURSOR_PRECISION_32;
+		plane_prec = (prec_mult == high_precision) ?
+					   DDL_CURSOR_PRECISION_HIGH :
+					   DDL_CURSOR_PRECISION_LOW;
 		plane_dl |= plane_prec | (drain_latency << DDL_CURSOR_SHIFT);
 	}
 
@@ -1543,15 +1586,17 @@ static void valleyview_update_sprite_wm(struct drm_plane *plane,
 	int plane_prec;
 	int sprite_dl;
 	int prec_mult;
+	const int high_precision = IS_CHERRYVIEW(dev) ?
+		DRAIN_LATENCY_PRECISION_32 : DRAIN_LATENCY_PRECISION_64;
 
-	sprite_dl = I915_READ(VLV_DDL(pipe)) & ~(DDL_SPRITE_PRECISION_64(sprite) |
+	sprite_dl = I915_READ(VLV_DDL(pipe)) & ~(DDL_SPRITE_PRECISION_HIGH(sprite) |
 		    (DRAIN_LATENCY_MASK << DDL_SPRITE_SHIFT(sprite)));
 
 	if (enabled && vlv_compute_drain_latency(crtc, pixel_size, &prec_mult,
 						 &drain_latency)) {
-		plane_prec = (prec_mult == DRAIN_LATENCY_PRECISION_64) ?
-					   DDL_SPRITE_PRECISION_64(sprite) :
-					   DDL_SPRITE_PRECISION_32(sprite);
+		plane_prec = (prec_mult == high_precision) ?
+					   DDL_SPRITE_PRECISION_HIGH(sprite) :
+					   DDL_SPRITE_PRECISION_LOW(sprite);
 		sprite_dl |= plane_prec |
 			     (drain_latency << DDL_SPRITE_SHIFT(sprite));
 	}
@@ -1915,6 +1960,14 @@ static uint32_t ilk_wm_fbc(uint32_t pri_val, uint32_t horiz_pixels,
 	return DIV_ROUND_UP(pri_val * 64, horiz_pixels * bytes_per_pixel) + 2;
 }
 
+struct skl_pipe_wm_parameters {
+	bool active;
+	uint32_t pipe_htotal;
+	uint32_t pixel_rate; /* in kHz */
+	struct intel_plane_wm_parameters plane[I915_MAX_PLANES];
+	struct intel_plane_wm_parameters cursor;
+};
+
 struct ilk_pipe_wm_parameters {
 	bool active;
 	uint32_t pipe_htotal;
@@ -2226,11 +2279,82 @@ hsw_compute_linetime_wm(struct drm_device *dev, struct drm_crtc *crtc)
 	       PIPE_WM_LINETIME_TIME(linetime);
 }
 
-static void intel_read_wm_latency(struct drm_device *dev, uint16_t wm[5])
+static void intel_read_wm_latency(struct drm_device *dev, uint16_t wm[8])
 {
 	struct drm_i915_private *dev_priv = dev->dev_private;
 
-	if (IS_HASWELL(dev) || IS_BROADWELL(dev)) {
+	if (IS_GEN9(dev)) {
+		uint32_t val;
+		int ret, i;
+		int level, max_level = ilk_wm_max_level(dev);
+
+		/* read the first set of memory latencies[0:3] */
+		val = 0; /* data0 to be programmed to 0 for first set */
+		mutex_lock(&dev_priv->rps.hw_lock);
+		ret = sandybridge_pcode_read(dev_priv,
+					     GEN9_PCODE_READ_MEM_LATENCY,
+					     &val);
+		mutex_unlock(&dev_priv->rps.hw_lock);
+
+		if (ret) {
+			DRM_ERROR("SKL Mailbox read error = %d\n", ret);
+			return;
+		}
+
+		wm[0] = val & GEN9_MEM_LATENCY_LEVEL_MASK;
+		wm[1] = (val >> GEN9_MEM_LATENCY_LEVEL_1_5_SHIFT) &
+				GEN9_MEM_LATENCY_LEVEL_MASK;
+		wm[2] = (val >> GEN9_MEM_LATENCY_LEVEL_2_6_SHIFT) &
+				GEN9_MEM_LATENCY_LEVEL_MASK;
+		wm[3] = (val >> GEN9_MEM_LATENCY_LEVEL_3_7_SHIFT) &
+				GEN9_MEM_LATENCY_LEVEL_MASK;
+
+		/* read the second set of memory latencies[4:7] */
+		val = 1; /* data0 to be programmed to 1 for second set */
+		mutex_lock(&dev_priv->rps.hw_lock);
+		ret = sandybridge_pcode_read(dev_priv,
+					     GEN9_PCODE_READ_MEM_LATENCY,
+					     &val);
+		mutex_unlock(&dev_priv->rps.hw_lock);
+		if (ret) {
+			DRM_ERROR("SKL Mailbox read error = %d\n", ret);
+			return;
+		}
+
+		wm[4] = val & GEN9_MEM_LATENCY_LEVEL_MASK;
+		wm[5] = (val >> GEN9_MEM_LATENCY_LEVEL_1_5_SHIFT) &
+				GEN9_MEM_LATENCY_LEVEL_MASK;
+		wm[6] = (val >> GEN9_MEM_LATENCY_LEVEL_2_6_SHIFT) &
+				GEN9_MEM_LATENCY_LEVEL_MASK;
+		wm[7] = (val >> GEN9_MEM_LATENCY_LEVEL_3_7_SHIFT) &
+				GEN9_MEM_LATENCY_LEVEL_MASK;
+
+		/*
+		 * punit doesn't take into account the read latency so we need
+		 * to add 2us to the various latency levels we retrieve from
+		 * the punit.
+		 *   - W0 is a bit special in that it's the only level that
+		 *   can't be disabled if we want to have display working, so
+		 *   we always add 2us there.
+		 *   - For levels >=1, punit returns 0us latency when they are
+		 *   disabled, so we respect that and don't add 2us then
+		 *
+		 * Additionally, if a level n (n > 1) has a 0us latency, all
+		 * levels m (m >= n) need to be disabled. We make sure to
+		 * sanitize the values out of the punit to satisfy this
+		 * requirement.
+		 */
+		wm[0] += 2;
+		for (level = 1; level <= max_level; level++)
+			if (wm[level] != 0)
+				wm[level] += 2;
+			else {
+				for (i = level + 1; i <= max_level; i++)
+					wm[i] = 0;
+
+				break;
+			}
+	} else if (IS_HASWELL(dev) || IS_BROADWELL(dev)) {
 		uint64_t sskpd = I915_READ64(MCH_SSKPD);
 
 		wm[0] = (sskpd >> 56) & 0xFF;
@@ -2278,7 +2402,9 @@ static void intel_fixup_cur_wm_latency(struct drm_device *dev, uint16_t wm[5])
 int ilk_wm_max_level(const struct drm_device *dev)
 {
 	/* how many WM levels are we expecting */
-	if (IS_HASWELL(dev) || IS_BROADWELL(dev))
+	if (IS_GEN9(dev))
+		return 7;
+	else if (IS_HASWELL(dev) || IS_BROADWELL(dev))
 		return 4;
 	else if (INTEL_INFO(dev)->gen >= 6)
 		return 3;
@@ -2288,7 +2414,7 @@ int ilk_wm_max_level(const struct drm_device *dev)
 
 static void intel_print_wm_latency(struct drm_device *dev,
 				   const char *name,
-				   const uint16_t wm[5])
+				   const uint16_t wm[8])
 {
 	int level, max_level = ilk_wm_max_level(dev);
 
@@ -2301,8 +2427,13 @@ static void intel_print_wm_latency(struct drm_device *dev,
 			continue;
 		}
 
-		/* WM1+ latency values in 0.5us units */
-		if (level > 0)
+		/*
+		 * - latencies are in us on gen9.
+		 * - before then, WM1+ latency values are in 0.5us units
+		 */
+		if (IS_GEN9(dev))
+			latency *= 10;
+		else if (level > 0)
 			latency *= 5;
 
 		DRM_DEBUG_KMS("%s WM%d latency %u (%u.%u usec)\n",
@@ -2370,6 +2501,14 @@ static void ilk_setup_wm_latency(struct drm_device *dev)
 		snb_wm_latency_quirk(dev);
 }
 
+static void skl_setup_wm_latency(struct drm_device *dev)
+{
+	struct drm_i915_private *dev_priv = dev->dev_private;
+
+	intel_read_wm_latency(dev, dev_priv->wm.skl_latency);
+	intel_print_wm_latency(dev, "Gen9 Plane", dev_priv->wm.skl_latency);
+}
+
 static void ilk_compute_wm_parameters(struct drm_crtc *crtc,
 				      struct ilk_pipe_wm_parameters *p)
 {
@@ -2860,6 +2999,769 @@ static bool ilk_disable_lp_wm(struct drm_device *dev)
 	return _ilk_disable_lp_wm(dev_priv, WM_DIRTY_LP_ALL);
 }
 
+/*
+ * On gen9, we need to allocate Display Data Buffer (DDB) portions to the
+ * different active planes.
+ */
+
+#define SKL_DDB_SIZE		896	/* in blocks */
+
+static void
+skl_ddb_get_pipe_allocation_limits(struct drm_device *dev,
+				   struct drm_crtc *for_crtc,
+				   const struct intel_wm_config *config,
+				   const struct skl_pipe_wm_parameters *params,
+				   struct skl_ddb_entry *alloc /* out */)
+{
+	struct drm_crtc *crtc;
+	unsigned int pipe_size, ddb_size;
+	int nth_active_pipe;
+
+	if (!params->active) {
+		alloc->start = 0;
+		alloc->end = 0;
+		return;
+	}
+
+	ddb_size = SKL_DDB_SIZE;
+
+	ddb_size -= 4; /* 4 blocks for bypass path allocation */
+
+	nth_active_pipe = 0;
+	for_each_crtc(dev, crtc) {
+		if (!intel_crtc_active(crtc))
+			continue;
+
+		if (crtc == for_crtc)
+			break;
+
+		nth_active_pipe++;
+	}
+
+	pipe_size = ddb_size / config->num_pipes_active;
+	alloc->start = nth_active_pipe * ddb_size / config->num_pipes_active;
+	alloc->end = alloc->start + pipe_size;
+}
+
+static unsigned int skl_cursor_allocation(const struct intel_wm_config *config)
+{
+	if (config->num_pipes_active == 1)
+		return 32;
+
+	return 8;
+}
+
+static void skl_ddb_entry_init_from_hw(struct skl_ddb_entry *entry, u32 reg)
+{
+	entry->start = reg & 0x3ff;
+	entry->end = (reg >> 16) & 0x3ff;
+	if (entry->end)
+		entry->end += 1;
+}
+
+void skl_ddb_get_hw_state(struct drm_i915_private *dev_priv,
+			  struct skl_ddb_allocation *ddb /* out */)
+{
+	struct drm_device *dev = dev_priv->dev;
+	enum pipe pipe;
+	int plane;
+	u32 val;
+
+	for_each_pipe(dev_priv, pipe) {
+		for_each_plane(pipe, plane) {
+			val = I915_READ(PLANE_BUF_CFG(pipe, plane));
+			skl_ddb_entry_init_from_hw(&ddb->plane[pipe][plane],
+						   val);
+		}
+
+		val = I915_READ(CUR_BUF_CFG(pipe));
+		skl_ddb_entry_init_from_hw(&ddb->cursor[pipe], val);
+	}
+}
+
+static unsigned int
+skl_plane_relative_data_rate(const struct intel_plane_wm_parameters *p)
+{
+	return p->horiz_pixels * p->vert_pixels * p->bytes_per_pixel;
+}
+
+/*
+ * We don't overflow 32 bits. Worst case is 3 planes enabled, each fetching
+ * an 8192x4096@32bpp framebuffer:
+ *   3 * 4096 * 8192 * 4 < 2^32
+ */
+static unsigned int
+skl_get_total_relative_data_rate(struct intel_crtc *intel_crtc,
+				 const struct skl_pipe_wm_parameters *params)
+{
+	unsigned int total_data_rate = 0;
+	int plane;
+
+	for (plane = 0; plane < intel_num_planes(intel_crtc); plane++) {
+		const struct intel_plane_wm_parameters *p;
+
+		p = &params->plane[plane];
+		if (!p->enabled)
+			continue;
+
+		total_data_rate += skl_plane_relative_data_rate(p);
+	}
+
+	return total_data_rate;
+}
+
+static void
+skl_allocate_pipe_ddb(struct drm_crtc *crtc,
+		      const struct intel_wm_config *config,
+		      const struct skl_pipe_wm_parameters *params,
+		      struct skl_ddb_allocation *ddb /* out */)
+{
+	struct drm_device *dev = crtc->dev;
+	struct intel_crtc *intel_crtc = to_intel_crtc(crtc);
+	enum pipe pipe = intel_crtc->pipe;
+	struct skl_ddb_entry *alloc = &ddb->pipe[pipe];
+	uint16_t alloc_size, start, cursor_blocks;
+	unsigned int total_data_rate;
+	int plane;
+
+	skl_ddb_get_pipe_allocation_limits(dev, crtc, config, params, alloc);
+	alloc_size = skl_ddb_entry_size(alloc);
+	if (alloc_size == 0) {
+		memset(ddb->plane[pipe], 0, sizeof(ddb->plane[pipe]));
+		memset(&ddb->cursor[pipe], 0, sizeof(ddb->cursor[pipe]));
+		return;
+	}
+
+	cursor_blocks = skl_cursor_allocation(config);
+	ddb->cursor[pipe].start = alloc->end - cursor_blocks;
+	ddb->cursor[pipe].end = alloc->end;
+
+	alloc_size -= cursor_blocks;
+	alloc->end -= cursor_blocks;
+
+	/*
+	 * Each active plane gets a portion of the remaining space, in
+	 * proportion to the amount of data it needs to fetch from memory.
+	 *
+	 * FIXME: we may not allocate every single block here.
+	 */
+	total_data_rate = skl_get_total_relative_data_rate(intel_crtc, params);
+
+	start = alloc->start;
+	for (plane = 0; plane < intel_num_planes(intel_crtc); plane++) {
+		const struct intel_plane_wm_parameters *p;
+		unsigned int data_rate;
+		uint16_t plane_blocks;
+
+		p = &params->plane[plane];
+		if (!p->enabled)
+			continue;
+
+		data_rate = skl_plane_relative_data_rate(p);
+
+		/*
+		 * promote the expression to 64 bits to avoid overflowing, the
+		 * result is <= alloc_size as data_rate / total_data_rate <= 1
+		 */
+		plane_blocks = div_u64((uint64_t)alloc_size * data_rate,
+				       total_data_rate);
+
+		ddb->plane[pipe][plane].start = start;
+		ddb->plane[pipe][plane].end = start + plane_blocks;
+
+		start += plane_blocks;
+	}
+
+}
+
+static uint32_t skl_pipe_pixel_rate(const struct intel_crtc_config *config)
+{
+	/* TODO: Take into account the scalers once we support them */
+	return config->adjusted_mode.crtc_clock;
+}
+
+/*
+ * The max latency should be 257 (max the punit can code is 255 and we add 2us
+ * for the read latency) and bytes_per_pixel should always be <= 8, so that
+ * should allow pixel_rate up to ~2 GHz which seems sufficient since max
+ * 2xcdclk is 1350 MHz and the pixel rate should never exceed that.
+ */
+static uint32_t skl_wm_method1(uint32_t pixel_rate, uint8_t bytes_per_pixel,
+			       uint32_t latency)
+{
+	uint32_t wm_intermediate_val, ret;
+
+	if (latency == 0)
+		return UINT_MAX;
+
+	wm_intermediate_val = latency * pixel_rate * bytes_per_pixel;
+	ret = DIV_ROUND_UP(wm_intermediate_val, 1000);
+
+	return ret;
+}
+
+static uint32_t skl_wm_method2(uint32_t pixel_rate, uint32_t pipe_htotal,
+			       uint32_t horiz_pixels, uint8_t bytes_per_pixel,
+			       uint32_t latency)
+{
+	uint32_t ret, plane_bytes_per_line, wm_intermediate_val;
+
+	if (latency == 0)
+		return UINT_MAX;
+
+	plane_bytes_per_line = horiz_pixels * bytes_per_pixel;
+	wm_intermediate_val = latency * pixel_rate;
+	ret = DIV_ROUND_UP(wm_intermediate_val, pipe_htotal * 1000) *
+				plane_bytes_per_line;
+
+	return ret;
+}
+
+static bool skl_ddb_allocation_changed(const struct skl_ddb_allocation *new_ddb,
+				       const struct intel_crtc *intel_crtc)
+{
+	struct drm_device *dev = intel_crtc->base.dev;
+	struct drm_i915_private *dev_priv = dev->dev_private;
+	const struct skl_ddb_allocation *cur_ddb = &dev_priv->wm.skl_hw.ddb;
+	enum pipe pipe = intel_crtc->pipe;
+
+	if (memcmp(new_ddb->plane[pipe], cur_ddb->plane[pipe],
+		   sizeof(new_ddb->plane[pipe])))
+		return true;
+
+	if (memcmp(&new_ddb->cursor[pipe], &cur_ddb->cursor[pipe],
+		    sizeof(new_ddb->cursor[pipe])))
+		return true;
+
+	return false;
+}
+
+static void skl_compute_wm_global_parameters(struct drm_device *dev,
+					     struct intel_wm_config *config)
+{
+	struct drm_crtc *crtc;
+	struct drm_plane *plane;
+
+	list_for_each_entry(crtc, &dev->mode_config.crtc_list, head)
+		config->num_pipes_active += intel_crtc_active(crtc);
+
+	/* FIXME: I don't think we need those two global parameters on SKL */
+	list_for_each_entry(plane, &dev->mode_config.plane_list, head) {
+		struct intel_plane *intel_plane = to_intel_plane(plane);
+
+		config->sprites_enabled |= intel_plane->wm.enabled;
+		config->sprites_scaled |= intel_plane->wm.scaled;
+	}
+}
+
+static void skl_compute_wm_pipe_parameters(struct drm_crtc *crtc,
+					   struct skl_pipe_wm_parameters *p)
+{
+	struct drm_device *dev = crtc->dev;
+	struct intel_crtc *intel_crtc = to_intel_crtc(crtc);
+	enum pipe pipe = intel_crtc->pipe;
+	struct drm_plane *plane;
+	int i = 1; /* Index for sprite planes start */
+
+	p->active = intel_crtc_active(crtc);
+	if (p->active) {
+		p->pipe_htotal = intel_crtc->config.adjusted_mode.crtc_htotal;
+		p->pixel_rate = skl_pipe_pixel_rate(&intel_crtc->config);
+
+		/*
+		 * For now, assume primary and cursor planes are always enabled.
+		 */
+		p->plane[0].enabled = true;
+		p->plane[0].bytes_per_pixel =
+			crtc->primary->fb->bits_per_pixel / 8;
+		p->plane[0].horiz_pixels = intel_crtc->config.pipe_src_w;
+		p->plane[0].vert_pixels = intel_crtc->config.pipe_src_h;
+
+		p->cursor.enabled = true;
+		p->cursor.bytes_per_pixel = 4;
+		p->cursor.horiz_pixels = intel_crtc->cursor_width ?
+					 intel_crtc->cursor_width : 64;
+	}
+
+	list_for_each_entry(plane, &dev->mode_config.plane_list, head) {
+		struct intel_plane *intel_plane = to_intel_plane(plane);
+
+		if (intel_plane->pipe == pipe)
+			p->plane[i++] = intel_plane->wm;
+	}
+}
+
+static bool skl_compute_plane_wm(struct skl_pipe_wm_parameters *p,
+				 struct intel_plane_wm_parameters *p_params,
+				 uint16_t ddb_allocation,
+				 uint32_t mem_value,
+				 uint16_t *out_blocks, /* out */
+				 uint8_t *out_lines /* out */)
+{
+	uint32_t method1, method2, plane_bytes_per_line, res_blocks, res_lines;
+	uint32_t result_bytes;
+
+	if (mem_value == 0 || !p->active || !p_params->enabled)
+		return false;
+
+	method1 = skl_wm_method1(p->pixel_rate,
+				 p_params->bytes_per_pixel,
+				 mem_value);
+	method2 = skl_wm_method2(p->pixel_rate,
+				 p->pipe_htotal,
+				 p_params->horiz_pixels,
+				 p_params->bytes_per_pixel,
+				 mem_value);
+
+	plane_bytes_per_line = p_params->horiz_pixels *
+					p_params->bytes_per_pixel;
+
+	/* For now xtile and linear */
+	if (((ddb_allocation * 512) / plane_bytes_per_line) >= 1)
+		result_bytes = min(method1, method2);
+	else
+		result_bytes = method1;
+
+	res_blocks = DIV_ROUND_UP(result_bytes, 512) + 1;
+	res_lines = DIV_ROUND_UP(result_bytes, plane_bytes_per_line);
+
+	if (res_blocks > ddb_allocation || res_lines > 31)
+		return false;
+
+	*out_blocks = res_blocks;
+	*out_lines = res_lines;
+
+	return true;
+}
+
+static void skl_compute_wm_level(const struct drm_i915_private *dev_priv,
+				 struct skl_ddb_allocation *ddb,
+				 struct skl_pipe_wm_parameters *p,
+				 enum pipe pipe,
+				 int level,
+				 int num_planes,
+				 struct skl_wm_level *result)
+{
+	uint16_t latency = dev_priv->wm.skl_latency[level];
+	uint16_t ddb_blocks;
+	int i;
+
+	for (i = 0; i < num_planes; i++) {
+		ddb_blocks = skl_ddb_entry_size(&ddb->plane[pipe][i]);
+
+		result->plane_en[i] = skl_compute_plane_wm(p, &p->plane[i],
+						ddb_blocks,
+						latency,
+						&result->plane_res_b[i],
+						&result->plane_res_l[i]);
+	}
+
+	ddb_blocks = skl_ddb_entry_size(&ddb->cursor[pipe]);
+	result->cursor_en = skl_compute_plane_wm(p, &p->cursor, ddb_blocks,
+						 latency, &result->cursor_res_b,
+						 &result->cursor_res_l);
+}
+
+static uint32_t
+skl_compute_linetime_wm(struct drm_crtc *crtc, struct skl_pipe_wm_parameters *p)
+{
+	if (!intel_crtc_active(crtc))
+		return 0;
+
+	return DIV_ROUND_UP(8 * p->pipe_htotal * 1000, p->pixel_rate);
+
+}
+
+static void skl_compute_transition_wm(struct drm_crtc *crtc,
+				      struct skl_pipe_wm_parameters *params,
+				      struct skl_wm_level *trans_wm /* out */)
+{
+	struct intel_crtc *intel_crtc = to_intel_crtc(crtc);
+	int i;
+
+	if (!params->active)
+		return;
+
+	/* Until we know more, just disable transition WMs */
+	for (i = 0; i < intel_num_planes(intel_crtc); i++)
+		trans_wm->plane_en[i] = false;
+	trans_wm->cursor_en = false;
+}
+
+static void skl_compute_pipe_wm(struct drm_crtc *crtc,
+				struct skl_ddb_allocation *ddb,
+				struct skl_pipe_wm_parameters *params,
+				struct skl_pipe_wm *pipe_wm)
+{
+	struct drm_device *dev = crtc->dev;
+	const struct drm_i915_private *dev_priv = dev->dev_private;
+	struct intel_crtc *intel_crtc = to_intel_crtc(crtc);
+	int level, max_level = ilk_wm_max_level(dev);
+
+	for (level = 0; level <= max_level; level++) {
+		skl_compute_wm_level(dev_priv, ddb, params, intel_crtc->pipe,
+				     level, intel_num_planes(intel_crtc),
+				     &pipe_wm->wm[level]);
+	}
+	pipe_wm->linetime = skl_compute_linetime_wm(crtc, params);
+
+	skl_compute_transition_wm(crtc, params, &pipe_wm->trans_wm);
+}
+
+static void skl_compute_wm_results(struct drm_device *dev,
+				   struct skl_pipe_wm_parameters *p,
+				   struct skl_pipe_wm *p_wm,
+				   struct skl_wm_values *r,
+				   struct intel_crtc *intel_crtc)
+{
+	int level, max_level = ilk_wm_max_level(dev);
+	enum pipe pipe = intel_crtc->pipe;
+	uint32_t temp;
+	int i;
+
+	for (level = 0; level <= max_level; level++) {
+		for (i = 0; i < intel_num_planes(intel_crtc); i++) {
+			temp = 0;
+
+			temp |= p_wm->wm[level].plane_res_l[i] <<
+					PLANE_WM_LINES_SHIFT;
+			temp |= p_wm->wm[level].plane_res_b[i];
+			if (p_wm->wm[level].plane_en[i])
+				temp |= PLANE_WM_EN;
+
+			r->plane[pipe][i][level] = temp;
+		}
+
+		temp = 0;
+
+		temp |= p_wm->wm[level].cursor_res_l << PLANE_WM_LINES_SHIFT;
+		temp |= p_wm->wm[level].cursor_res_b;
+
+		if (p_wm->wm[level].cursor_en)
+			temp |= PLANE_WM_EN;
+
+		r->cursor[pipe][level] = temp;
+
+	}
+
+	/* transition WMs */
+	for (i = 0; i < intel_num_planes(intel_crtc); i++) {
+		temp = 0;
+		temp |= p_wm->trans_wm.plane_res_l[i] << PLANE_WM_LINES_SHIFT;
+		temp |= p_wm->trans_wm.plane_res_b[i];
+		if (p_wm->trans_wm.plane_en[i])
+			temp |= PLANE_WM_EN;
+
+		r->plane_trans[pipe][i] = temp;
+	}
+
+	temp = 0;
+	temp |= p_wm->trans_wm.cursor_res_l << PLANE_WM_LINES_SHIFT;
+	temp |= p_wm->trans_wm.cursor_res_b;
+	if (p_wm->trans_wm.cursor_en)
+		temp |= PLANE_WM_EN;
+
+	r->cursor_trans[pipe] = temp;
+
+	r->wm_linetime[pipe] = p_wm->linetime;
+}
+
+static void skl_ddb_entry_write(struct drm_i915_private *dev_priv, uint32_t reg,
+				const struct skl_ddb_entry *entry)
+{
+	if (entry->end)
+		I915_WRITE(reg, (entry->end - 1) << 16 | entry->start);
+	else
+		I915_WRITE(reg, 0);
+}
+
+static void skl_write_wm_values(struct drm_i915_private *dev_priv,
+				const struct skl_wm_values *new)
+{
+	struct drm_device *dev = dev_priv->dev;
+	struct intel_crtc *crtc;
+
+	list_for_each_entry(crtc, &dev->mode_config.crtc_list, base.head) {
+		int i, level, max_level = ilk_wm_max_level(dev);
+		enum pipe pipe = crtc->pipe;
+
+		if (!new->dirty[pipe])
+			continue;
+
+		I915_WRITE(PIPE_WM_LINETIME(pipe), new->wm_linetime[pipe]);
+
+		for (level = 0; level <= max_level; level++) {
+			for (i = 0; i < intel_num_planes(crtc); i++)
+				I915_WRITE(PLANE_WM(pipe, i, level),
+					   new->plane[pipe][i][level]);
+			I915_WRITE(CUR_WM(pipe, level),
+				   new->cursor[pipe][level]);
+		}
+		for (i = 0; i < intel_num_planes(crtc); i++)
+			I915_WRITE(PLANE_WM_TRANS(pipe, i),
+				   new->plane_trans[pipe][i]);
+		I915_WRITE(CUR_WM_TRANS(pipe), new->cursor_trans[pipe]);
+
+		for (i = 0; i < intel_num_planes(crtc); i++)
+			skl_ddb_entry_write(dev_priv,
+					    PLANE_BUF_CFG(pipe, i),
+					    &new->ddb.plane[pipe][i]);
+
+		skl_ddb_entry_write(dev_priv, CUR_BUF_CFG(pipe),
+				    &new->ddb.cursor[pipe]);
+	}
+}
+
+/*
+ * When setting up a new DDB allocation arrangement, we need to correctly
+ * sequence the times at which the new allocations for the pipes are taken into
+ * account or we'll have pipes fetching from space previously allocated to
+ * another pipe.
+ *
+ * Roughly the sequence looks like:
+ *  1. re-allocate the pipe(s) with the allocation being reduced and not
+ *     overlapping with a previously lit-up pipe (another way to put it is:
+ *     pipes with their new allocation strictly included in their old ones).
+ *  2. re-allocate the other pipes that get their allocation reduced
+ *  3. allocate the pipes having their allocation increased
+ *
+ * Steps 1. and 2. are here to take care of the following case:
+ * - Initially DDB looks like this:
+ *     |   B    |   C    |
+ * - enable pipe A.
+ * - pipe B has a reduced DDB allocation that overlaps with the old pipe C
+ *   allocation
+ *     |  A  |  B  |  C  |
+ *
+ * We need to sequence the re-allocation: C, B, A (and not B, C, A).
+ */
+
+static void
+skl_wm_flush_pipe(struct drm_i915_private *dev_priv, enum pipe pipe, int pass)
+{
+	struct drm_device *dev = dev_priv->dev;
+	int plane;
+
+	DRM_DEBUG_KMS("flush pipe %c (pass %d)\n", pipe_name(pipe), pass);
+
+	for_each_plane(pipe, plane) {
+		I915_WRITE(PLANE_SURF(pipe, plane),
+			   I915_READ(PLANE_SURF(pipe, plane)));
+	}
+	I915_WRITE(CURBASE(pipe), I915_READ(CURBASE(pipe)));
+}
+
+static bool
+skl_ddb_allocation_included(const struct skl_ddb_allocation *old,
+			    const struct skl_ddb_allocation *new,
+			    enum pipe pipe)
+{
+	uint16_t old_size, new_size;
+
+	old_size = skl_ddb_entry_size(&old->pipe[pipe]);
+	new_size = skl_ddb_entry_size(&new->pipe[pipe]);
+
+	return old_size != new_size &&
+	       new->pipe[pipe].start >= old->pipe[pipe].start &&
+	       new->pipe[pipe].end <= old->pipe[pipe].end;
+}
+
+static void skl_flush_wm_values(struct drm_i915_private *dev_priv,
+				struct skl_wm_values *new_values)
+{
+	struct drm_device *dev = dev_priv->dev;
+	struct skl_ddb_allocation *cur_ddb, *new_ddb;
+	bool reallocated[I915_MAX_PIPES] = {false, false, false};
+	struct intel_crtc *crtc;
+	enum pipe pipe;
+
+	new_ddb = &new_values->ddb;
+	cur_ddb = &dev_priv->wm.skl_hw.ddb;
+
+	/*
+	 * First pass: flush the pipes with the new allocation contained in
+	 * the old space.
+	 *
+	 * We'll wait for the vblank on those pipes to ensure we can safely
+	 * re-allocate the freed space without this pipe fetching from it.
+	 */
+	for_each_intel_crtc(dev, crtc) {
+		if (!crtc->active)
+			continue;
+
+		pipe = crtc->pipe;
+
+		if (!skl_ddb_allocation_included(cur_ddb, new_ddb, pipe))
+			continue;
+
+		skl_wm_flush_pipe(dev_priv, pipe, 1);
+		intel_wait_for_vblank(dev, pipe);
+
+		reallocated[pipe] = true;
+	}
+
+
+	/*
+	 * Second pass: flush the pipes that are having their allocation
+	 * reduced, but overlapping with a previous allocation.
+	 *
+	 * Here as well we need to wait for the vblank to make sure the freed
+	 * space is not used anymore.
+	 */
+	for_each_intel_crtc(dev, crtc) {
+		if (!crtc->active)
+			continue;
+
+		pipe = crtc->pipe;
+
+		if (reallocated[pipe])
+			continue;
+
+		if (skl_ddb_entry_size(&new_ddb->pipe[pipe]) <
+		    skl_ddb_entry_size(&cur_ddb->pipe[pipe])) {
+			skl_wm_flush_pipe(dev_priv, pipe, 2);
+			intel_wait_for_vblank(dev, pipe);
+		}
+
+		reallocated[pipe] = true;
+	}
+
+	/*
+	 * Third pass: flush the pipes that got more space allocated.
+	 *
+	 * We don't need to actively wait for the update here: the next vblank
+	 * will just get more DDB space with the correct WM values.
+	 */
+	for_each_intel_crtc(dev, crtc) {
+		if (!crtc->active)
+			continue;
+
+		pipe = crtc->pipe;
+
+		/*
+		 * At this point, only the pipes that get more space than before
+		 * are left to re-allocate.
+		 */
+		if (reallocated[pipe])
+			continue;
+
+		skl_wm_flush_pipe(dev_priv, pipe, 3);
+	}
+}
+
+static bool skl_update_pipe_wm(struct drm_crtc *crtc,
+			       struct skl_pipe_wm_parameters *params,
+			       struct intel_wm_config *config,
+			       struct skl_ddb_allocation *ddb, /* out */
+			       struct skl_pipe_wm *pipe_wm /* out */)
+{
+	struct intel_crtc *intel_crtc = to_intel_crtc(crtc);
+
+	skl_compute_wm_pipe_parameters(crtc, params);
+	skl_allocate_pipe_ddb(crtc, config, params, ddb);
+	skl_compute_pipe_wm(crtc, ddb, params, pipe_wm);
+
+	if (!memcmp(&intel_crtc->wm.skl_active, pipe_wm, sizeof(*pipe_wm)))
+		return false;
+
+	intel_crtc->wm.skl_active = *pipe_wm;
+	return true;
+}
+
+static void skl_update_other_pipe_wm(struct drm_device *dev,
+				     struct drm_crtc *crtc,
+				     struct intel_wm_config *config,
+				     struct skl_wm_values *r)
+{
+	struct intel_crtc *intel_crtc;
+	struct intel_crtc *this_crtc = to_intel_crtc(crtc);
+
+	/*
+	 * If the WM update hasn't changed the allocation for this_crtc (the
+	 * crtc we are currently computing the new WM values for), other
+	 * enabled crtcs will keep the same allocation and we don't need to
+	 * recompute anything for them.
+	 */
+	if (!skl_ddb_allocation_changed(&r->ddb, this_crtc))
+		return;
+
+	/*
+	 * Otherwise, because of this_crtc being freshly enabled/disabled, the
+	 * other active pipes need new DDB allocation and WM values.
+	 */
+	list_for_each_entry(intel_crtc, &dev->mode_config.crtc_list,
+				base.head) {
+		struct skl_pipe_wm_parameters params = {};
+		struct skl_pipe_wm pipe_wm = {};
+		bool wm_changed;
+
+		if (this_crtc->pipe == intel_crtc->pipe)
+			continue;
+
+		if (!intel_crtc->active)
+			continue;
+
+		wm_changed = skl_update_pipe_wm(&intel_crtc->base,
+						&params, config,
+						&r->ddb, &pipe_wm);
+
+		/*
+		 * If we end up re-computing the other pipe WM values, it's
+		 * because it was really needed, so we expect the WM values to
+		 * be different.
+		 */
+		WARN_ON(!wm_changed);
+
+		skl_compute_wm_results(dev, &params, &pipe_wm, r, intel_crtc);
+		r->dirty[intel_crtc->pipe] = true;
+	}
+}
+
+static void skl_update_wm(struct drm_crtc *crtc)
+{
+	struct intel_crtc *intel_crtc = to_intel_crtc(crtc);
+	struct drm_device *dev = crtc->dev;
+	struct drm_i915_private *dev_priv = dev->dev_private;
+	struct skl_pipe_wm_parameters params = {};
+	struct skl_wm_values *results = &dev_priv->wm.skl_results;
+	struct skl_pipe_wm pipe_wm = {};
+	struct intel_wm_config config = {};
+
+	memset(results, 0, sizeof(*results));
+
+	skl_compute_wm_global_parameters(dev, &config);
+
+	if (!skl_update_pipe_wm(crtc, &params, &config,
+				&results->ddb, &pipe_wm))
+		return;
+
+	skl_compute_wm_results(dev, &params, &pipe_wm, results, intel_crtc);
+	results->dirty[intel_crtc->pipe] = true;
+
+	skl_update_other_pipe_wm(dev, crtc, &config, results);
+	skl_write_wm_values(dev_priv, results);
+	skl_flush_wm_values(dev_priv, results);
+
+	/* store the new configuration */
+	dev_priv->wm.skl_hw = *results;
+}
+
+static void
+skl_update_sprite_wm(struct drm_plane *plane, struct drm_crtc *crtc,
+		     uint32_t sprite_width, uint32_t sprite_height,
+		     int pixel_size, bool enabled, bool scaled)
+{
+	struct intel_plane *intel_plane = to_intel_plane(plane);
+
+	intel_plane->wm.enabled = enabled;
+	intel_plane->wm.scaled = scaled;
+	intel_plane->wm.horiz_pixels = sprite_width;
+	intel_plane->wm.vert_pixels = sprite_height;
+	intel_plane->wm.bytes_per_pixel = pixel_size;
+
+	skl_update_wm(crtc);
+}
+
 static void ilk_update_wm(struct drm_crtc *crtc)
 {
 	struct intel_crtc *intel_crtc = to_intel_crtc(crtc);
@@ -2934,6 +3836,113 @@ ilk_update_sprite_wm(struct drm_plane *plane,
 	ilk_update_wm(crtc);
 }
 
+static void skl_pipe_wm_active_state(uint32_t val,
+				     struct skl_pipe_wm *active,
+				     bool is_transwm,
+				     bool is_cursor,
+				     int i,
+				     int level)
+{
+	bool is_enabled = (val & PLANE_WM_EN) != 0;
+
+	if (!is_transwm) {
+		if (!is_cursor) {
+			active->wm[level].plane_en[i] = is_enabled;
+			active->wm[level].plane_res_b[i] =
+					val & PLANE_WM_BLOCKS_MASK;
+			active->wm[level].plane_res_l[i] =
+					(val >> PLANE_WM_LINES_SHIFT) &
+						PLANE_WM_LINES_MASK;
+		} else {
+			active->wm[level].cursor_en = is_enabled;
+			active->wm[level].cursor_res_b =
+					val & PLANE_WM_BLOCKS_MASK;
+			active->wm[level].cursor_res_l =
+					(val >> PLANE_WM_LINES_SHIFT) &
+						PLANE_WM_LINES_MASK;
+		}
+	} else {
+		if (!is_cursor) {
+			active->trans_wm.plane_en[i] = is_enabled;
+			active->trans_wm.plane_res_b[i] =
+					val & PLANE_WM_BLOCKS_MASK;
+			active->trans_wm.plane_res_l[i] =
+					(val >> PLANE_WM_LINES_SHIFT) &
+						PLANE_WM_LINES_MASK;
+		} else {
+			active->trans_wm.cursor_en = is_enabled;
+			active->trans_wm.cursor_res_b =
+					val & PLANE_WM_BLOCKS_MASK;
+			active->trans_wm.cursor_res_l =
+					(val >> PLANE_WM_LINES_SHIFT) &
+						PLANE_WM_LINES_MASK;
+		}
+	}
+}
+
+static void skl_pipe_wm_get_hw_state(struct drm_crtc *crtc)
+{
+	struct drm_device *dev = crtc->dev;
+	struct drm_i915_private *dev_priv = dev->dev_private;
+	struct skl_wm_values *hw = &dev_priv->wm.skl_hw;
+	struct intel_crtc *intel_crtc = to_intel_crtc(crtc);
+	struct skl_pipe_wm *active = &intel_crtc->wm.skl_active;
+	enum pipe pipe = intel_crtc->pipe;
+	int level, i, max_level;
+	uint32_t temp;
+
+	max_level = ilk_wm_max_level(dev);
+
+	hw->wm_linetime[pipe] = I915_READ(PIPE_WM_LINETIME(pipe));
+
+	for (level = 0; level <= max_level; level++) {
+		for (i = 0; i < intel_num_planes(intel_crtc); i++)
+			hw->plane[pipe][i][level] =
+					I915_READ(PLANE_WM(pipe, i, level));
+		hw->cursor[pipe][level] = I915_READ(CUR_WM(pipe, level));
+	}
+
+	for (i = 0; i < intel_num_planes(intel_crtc); i++)
+		hw->plane_trans[pipe][i] = I915_READ(PLANE_WM_TRANS(pipe, i));
+	hw->cursor_trans[pipe] = I915_READ(CUR_WM_TRANS(pipe));
+
+	if (!intel_crtc_active(crtc))
+		return;
+
+	hw->dirty[pipe] = true;
+
+	active->linetime = hw->wm_linetime[pipe];
+
+	for (level = 0; level <= max_level; level++) {
+		for (i = 0; i < intel_num_planes(intel_crtc); i++) {
+			temp = hw->plane[pipe][i][level];
+			skl_pipe_wm_active_state(temp, active, false,
+						false, i, level);
+		}
+		temp = hw->cursor[pipe][level];
+		skl_pipe_wm_active_state(temp, active, false, true, i, level);
+	}
+
+	for (i = 0; i < intel_num_planes(intel_crtc); i++) {
+		temp = hw->plane_trans[pipe][i];
+		skl_pipe_wm_active_state(temp, active, true, false, i, 0);
+	}
+
+	temp = hw->cursor_trans[pipe];
+	skl_pipe_wm_active_state(temp, active, true, true, i, 0);
+}
+
+void skl_wm_get_hw_state(struct drm_device *dev)
+{
+	struct drm_i915_private *dev_priv = dev->dev_private;
+	struct skl_ddb_allocation *ddb = &dev_priv->wm.skl_hw.ddb;
+	struct drm_crtc *crtc;
+
+	skl_ddb_get_hw_state(dev_priv, ddb);
+	list_for_each_entry(crtc, &dev->mode_config.crtc_list, head)
+		skl_pipe_wm_get_hw_state(crtc);
+}
+
 static void ilk_pipe_wm_get_hw_state(struct drm_crtc *crtc)
 {
 	struct drm_device *dev = crtc->dev;
@@ -3442,7 +4451,7 @@ static void vlv_set_rps_idle(struct drm_i915_private *dev_priv)
 					dev_priv->rps.min_freq_softlimit);
 
 	if (wait_for(((vlv_punit_read(dev_priv, PUNIT_REG_GPU_FREQ_STS))
-				& GENFREQSTATUS) == 0, 5))
+				& GENFREQSTATUS) == 0, 100))
 		DRM_ERROR("timed out waiting for Punit\n");
 
 	vlv_force_gfx_clock(dev_priv, false);
@@ -3495,14 +4504,8 @@ void valleyview_set_rps(struct drm_device *dev, u8 val)
 		      "Odd GPU freq value\n"))
 		val &= ~1;
 
-	if (val != dev_priv->rps.cur_freq) {
-		DRM_DEBUG_DRIVER("GPU freq request from %d MHz (%u) to %d MHz (%u)\n",
-				 vlv_gpu_freq(dev_priv, dev_priv->rps.cur_freq),
-				 dev_priv->rps.cur_freq,
-				 vlv_gpu_freq(dev_priv, val), val);
-
+	if (val != dev_priv->rps.cur_freq)
 		vlv_punit_write(dev_priv, PUNIT_REG_GPU_FREQ_REQ, val);
-	}
 
 	I915_WRITE(GEN6_PMINTRMSK, gen6_rps_pm_mask(dev_priv, val));
 
@@ -3510,43 +4513,11 @@ void valleyview_set_rps(struct drm_device *dev, u8 val)
 	trace_intel_gpu_freq_change(vlv_gpu_freq(dev_priv, val));
 }
 
-static void gen8_disable_rps_interrupts(struct drm_device *dev)
-{
-	struct drm_i915_private *dev_priv = dev->dev_private;
-
-	I915_WRITE(GEN6_PMINTRMSK, ~GEN8_PMINTR_REDIRECT_TO_NON_DISP);
-	I915_WRITE(GEN8_GT_IER(2), I915_READ(GEN8_GT_IER(2)) &
-				   ~dev_priv->pm_rps_events);
-	/* Complete PM interrupt masking here doesn't race with the rps work
-	 * item again unmasking PM interrupts because that is using a different
-	 * register (GEN8_GT_IMR(2)) to mask PM interrupts. The only risk is in
-	 * leaving stale bits in GEN8_GT_IIR(2) and GEN8_GT_IMR(2) which
-	 * gen8_enable_rps will clean up. */
-
-	spin_lock_irq(&dev_priv->irq_lock);
-	dev_priv->rps.pm_iir = 0;
-	spin_unlock_irq(&dev_priv->irq_lock);
-
-	I915_WRITE(GEN8_GT_IIR(2), dev_priv->pm_rps_events);
-}
-
-static void gen6_disable_rps_interrupts(struct drm_device *dev)
+static void gen9_disable_rps(struct drm_device *dev)
 {
 	struct drm_i915_private *dev_priv = dev->dev_private;
 
-	I915_WRITE(GEN6_PMINTRMSK, 0xffffffff);
-	I915_WRITE(GEN6_PMIER, I915_READ(GEN6_PMIER) &
-				~dev_priv->pm_rps_events);
-	/* Complete PM interrupt masking here doesn't race with the rps work
-	 * item again unmasking PM interrupts because that is using a different
-	 * register (PMIMR) to mask PM interrupts. The only risk is in leaving
-	 * stale bits in PMIIR and PMIMR which gen6_enable_rps will clean up. */
-
-	spin_lock_irq(&dev_priv->irq_lock);
-	dev_priv->rps.pm_iir = 0;
-	spin_unlock_irq(&dev_priv->irq_lock);
-
-	I915_WRITE(GEN6_PMIIR, dev_priv->pm_rps_events);
+	I915_WRITE(GEN6_RC_CONTROL, 0);
 }
 
 static void gen6_disable_rps(struct drm_device *dev)
@@ -3555,11 +4526,6 @@ static void gen6_disable_rps(struct drm_device *dev)
 
 	I915_WRITE(GEN6_RC_CONTROL, 0);
 	I915_WRITE(GEN6_RPNSWREQ, 1 << 31);
-
-	if (IS_BROADWELL(dev))
-		gen8_disable_rps_interrupts(dev);
-	else
-		gen6_disable_rps_interrupts(dev);
 }
 
 static void cherryview_disable_rps(struct drm_device *dev)
@@ -3567,8 +4533,6 @@ static void cherryview_disable_rps(struct drm_device *dev)
 	struct drm_i915_private *dev_priv = dev->dev_private;
 
 	I915_WRITE(GEN6_RC_CONTROL, 0);
-
-	gen8_disable_rps_interrupts(dev);
 }
 
 static void valleyview_disable_rps(struct drm_device *dev)
@@ -3582,8 +4546,6 @@ static void valleyview_disable_rps(struct drm_device *dev)
 	I915_WRITE(GEN6_RC_CONTROL, 0);
 
 	gen6_gt_force_wake_put(dev_priv, FORCEWAKE_ALL);
-
-	gen6_disable_rps_interrupts(dev);
 }
 
 static void intel_print_rc6_info(struct drm_device *dev, u32 mode)
@@ -3594,10 +4556,15 @@ static void intel_print_rc6_info(struct drm_device *dev, u32 mode)
 		else
 			mode = 0;
 	}
-	DRM_DEBUG_KMS("Enabling RC6 states: RC6 %s, RC6p %s, RC6pp %s\n",
-		      (mode & GEN6_RC_CTL_RC6_ENABLE) ? "on" : "off",
-		      (mode & GEN6_RC_CTL_RC6p_ENABLE) ? "on" : "off",
-		      (mode & GEN6_RC_CTL_RC6pp_ENABLE) ? "on" : "off");
+	if (HAS_RC6p(dev))
+		DRM_DEBUG_KMS("Enabling RC6 states: RC6 %s RC6p %s RC6pp %s\n",
+			      (mode & GEN6_RC_CTL_RC6_ENABLE) ? "on" : "off",
+			      (mode & GEN6_RC_CTL_RC6p_ENABLE) ? "on" : "off",
+			      (mode & GEN6_RC_CTL_RC6pp_ENABLE) ? "on" : "off");
+
+	else
+		DRM_DEBUG_KMS("Enabling RC6 states: RC6 %s\n",
+			      (mode & GEN6_RC_CTL_RC6_ENABLE) ? "on" : "off");
 }
 
 static int sanitize_rc6_option(const struct drm_device *dev, int enable_rc6)
@@ -3614,7 +4581,7 @@ static int sanitize_rc6_option(const struct drm_device *dev, int enable_rc6)
 	if (enable_rc6 >= 0) {
 		int mask;
 
-		if (INTEL_INFO(dev)->gen == 6 || IS_IVYBRIDGE(dev))
+		if (HAS_RC6p(dev))
 			mask = INTEL_RC6_ENABLE | INTEL_RC6p_ENABLE |
 			       INTEL_RC6pp_ENABLE;
 		else
@@ -3642,54 +4609,92 @@ int intel_enable_rc6(const struct drm_device *dev)
 	return i915.enable_rc6;
 }
 
-static void gen8_enable_rps_interrupts(struct drm_device *dev)
-{
-	struct drm_i915_private *dev_priv = dev->dev_private;
-
-	spin_lock_irq(&dev_priv->irq_lock);
-	WARN_ON(dev_priv->rps.pm_iir);
-	gen8_enable_pm_irq(dev_priv, dev_priv->pm_rps_events);
-	I915_WRITE(GEN8_GT_IIR(2), dev_priv->pm_rps_events);
-	spin_unlock_irq(&dev_priv->irq_lock);
-}
-
-static void gen6_enable_rps_interrupts(struct drm_device *dev)
+static void gen6_init_rps_frequencies(struct drm_device *dev)
 {
 	struct drm_i915_private *dev_priv = dev->dev_private;
+	uint32_t rp_state_cap;
+	u32 ddcc_status = 0;
+	int ret;
 
-	spin_lock_irq(&dev_priv->irq_lock);
-	WARN_ON(dev_priv->rps.pm_iir);
-	gen6_enable_pm_irq(dev_priv, dev_priv->pm_rps_events);
-	I915_WRITE(GEN6_PMIIR, dev_priv->pm_rps_events);
-	spin_unlock_irq(&dev_priv->irq_lock);
-}
-
-static void parse_rp_state_cap(struct drm_i915_private *dev_priv, u32 rp_state_cap)
-{
+	rp_state_cap = I915_READ(GEN6_RP_STATE_CAP);
 	/* All of these values are in units of 50MHz */
 	dev_priv->rps.cur_freq		= 0;
-	/* static values from HW: RP0 < RPe < RP1 < RPn (min_freq) */
-	dev_priv->rps.rp1_freq		= (rp_state_cap >>  8) & 0xff;
+	/* static values from HW: RP0 > RP1 > RPn (min_freq) */
 	dev_priv->rps.rp0_freq		= (rp_state_cap >>  0) & 0xff;
+	dev_priv->rps.rp1_freq		= (rp_state_cap >>  8) & 0xff;
 	dev_priv->rps.min_freq		= (rp_state_cap >> 16) & 0xff;
-	/* XXX: only BYT has a special efficient freq */
-	dev_priv->rps.efficient_freq	= dev_priv->rps.rp1_freq;
 	/* hw_max = RP0 until we check for overclocking */
 	dev_priv->rps.max_freq		= dev_priv->rps.rp0_freq;
 
+	dev_priv->rps.efficient_freq = dev_priv->rps.rp1_freq;
+	if (IS_HASWELL(dev) || IS_BROADWELL(dev)) {
+		ret = sandybridge_pcode_read(dev_priv,
+					HSW_PCODE_DYNAMIC_DUTY_CYCLE_CONTROL,
+					&ddcc_status);
+		if (0 == ret)
+			dev_priv->rps.efficient_freq =
+				(ddcc_status >> 8) & 0xff;
+	}
+
 	/* Preserve min/max settings in case of re-init */
 	if (dev_priv->rps.max_freq_softlimit == 0)
 		dev_priv->rps.max_freq_softlimit = dev_priv->rps.max_freq;
 
-	if (dev_priv->rps.min_freq_softlimit == 0)
-		dev_priv->rps.min_freq_softlimit = dev_priv->rps.min_freq;
+	if (dev_priv->rps.min_freq_softlimit == 0) {
+		if (IS_HASWELL(dev) || IS_BROADWELL(dev))
+			dev_priv->rps.min_freq_softlimit =
+				/* max(RPe, 450 MHz) */
+				max(dev_priv->rps.efficient_freq, (u8) 9);
+		else
+			dev_priv->rps.min_freq_softlimit =
+				dev_priv->rps.min_freq;
+	}
+}
+
+static void gen9_enable_rps(struct drm_device *dev)
+{
+	struct drm_i915_private *dev_priv = dev->dev_private;
+	struct intel_engine_cs *ring;
+	uint32_t rc6_mask = 0;
+	int unused;
+
+	/* 1a: Software RC state - RC0 */
+	I915_WRITE(GEN6_RC_STATE, 0);
+
+	/* 1b: Get forcewake during program sequence. Although the driver
+	 * hasn't enabled a state yet where we need forcewake, BIOS may have.*/
+	gen6_gt_force_wake_get(dev_priv, FORCEWAKE_ALL);
+
+	/* 2a: Disable RC states. */
+	I915_WRITE(GEN6_RC_CONTROL, 0);
+
+	/* 2b: Program RC6 thresholds.*/
+	I915_WRITE(GEN6_RC6_WAKE_RATE_LIMIT, 54 << 16);
+	I915_WRITE(GEN6_RC_EVALUATION_INTERVAL, 125000); /* 125000 * 1280ns */
+	I915_WRITE(GEN6_RC_IDLE_HYSTERSIS, 25); /* 25 * 1280ns */
+	for_each_ring(ring, dev_priv, unused)
+		I915_WRITE(RING_MAX_IDLE(ring->mmio_base), 10);
+	I915_WRITE(GEN6_RC_SLEEP, 0);
+	I915_WRITE(GEN6_RC6_THRESHOLD, 37500); /* 37.5/125ms per EI */
+
+	/* 3a: Enable RC6 */
+	if (intel_enable_rc6(dev) & INTEL_RC6_ENABLE)
+		rc6_mask = GEN6_RC_CTL_RC6_ENABLE;
+	DRM_INFO("RC6 %s\n", (rc6_mask & GEN6_RC_CTL_RC6_ENABLE) ?
+			"on" : "off");
+	I915_WRITE(GEN6_RC_CONTROL, GEN6_RC_CTL_HW_ENABLE |
+				   GEN6_RC_CTL_EI_MODE(1) |
+				   rc6_mask);
+
+	gen6_gt_force_wake_put(dev_priv, FORCEWAKE_ALL);
+
 }
 
 static void gen8_enable_rps(struct drm_device *dev)
 {
 	struct drm_i915_private *dev_priv = dev->dev_private;
 	struct intel_engine_cs *ring;
-	uint32_t rc6_mask = 0, rp_state_cap;
+	uint32_t rc6_mask = 0;
 	int unused;
 
 	/* 1a: Software RC state - RC0 */
@@ -3702,8 +4707,8 @@ static void gen8_enable_rps(struct drm_device *dev)
 	/* 2a: Disable RC states. */
 	I915_WRITE(GEN6_RC_CONTROL, 0);
 
-	rp_state_cap = I915_READ(GEN6_RP_STATE_CAP);
-	parse_rp_state_cap(dev_priv, rp_state_cap);
+	/* Initialize rps frequencies */
+	gen6_init_rps_frequencies(dev);
 
 	/* 2b: Program RC6 thresholds.*/
 	I915_WRITE(GEN6_RC6_WAKE_RATE_LIMIT, 40 << 16);
@@ -3761,9 +4766,8 @@ static void gen8_enable_rps(struct drm_device *dev)
 
 	/* 6: Ring frequency + overclocking (our driver does this later */
 
-	gen6_set_rps(dev, (I915_READ(GEN6_GT_PERF_STATUS) & 0xff00) >> 8);
-
-	gen8_enable_rps_interrupts(dev);
+	dev_priv->rps.power = HIGH_POWER; /* force a reset */
+	gen6_set_rps(dev_priv->dev, dev_priv->rps.min_freq_softlimit);
 
 	gen6_gt_force_wake_put(dev_priv, FORCEWAKE_ALL);
 }
@@ -3772,7 +4776,6 @@ static void gen6_enable_rps(struct drm_device *dev)
 {
 	struct drm_i915_private *dev_priv = dev->dev_private;
 	struct intel_engine_cs *ring;
-	u32 rp_state_cap;
 	u32 rc6vids, pcu_mbox = 0, rc6_mask = 0;
 	u32 gtfifodbg;
 	int rc6_mode;
@@ -3796,9 +4799,8 @@ static void gen6_enable_rps(struct drm_device *dev)
 
 	gen6_gt_force_wake_get(dev_priv, FORCEWAKE_ALL);
 
-	rp_state_cap = I915_READ(GEN6_RP_STATE_CAP);
-
-	parse_rp_state_cap(dev_priv, rp_state_cap);
+	/* Initialize rps frequencies */
+	gen6_init_rps_frequencies(dev);
 
 	/* disable the counters and set deterministic thresholds */
 	I915_WRITE(GEN6_RC_CONTROL, 0);
@@ -3861,8 +4863,6 @@ static void gen6_enable_rps(struct drm_device *dev)
 	dev_priv->rps.power = HIGH_POWER; /* force a reset */
 	gen6_set_rps(dev_priv->dev, dev_priv->rps.min_freq_softlimit);
 
-	gen6_enable_rps_interrupts(dev);
-
 	rc6vids = 0;
 	ret = sandybridge_pcode_read(dev_priv, GEN6_PCODE_READ_RC6VIDS, &rc6vids);
 	if (IS_GEN6(dev) && ret) {
@@ -3915,9 +4915,9 @@ static void __gen6_update_ring_freq(struct drm_device *dev)
 	 * to use for memory access.  We do this by specifying the IA frequency
 	 * the PCU should use as a reference to determine the ring frequency.
 	 */
-	for (gpu_freq = dev_priv->rps.max_freq_softlimit; gpu_freq >= dev_priv->rps.min_freq_softlimit;
+	for (gpu_freq = dev_priv->rps.max_freq; gpu_freq >= dev_priv->rps.min_freq;
 	     gpu_freq--) {
-		int diff = dev_priv->rps.max_freq_softlimit - gpu_freq;
+		int diff = dev_priv->rps.max_freq - gpu_freq;
 		unsigned int ia_freq = 0, ring_freq = 0;
 
 		if (INTEL_INFO(dev)->gen >= 8) {
@@ -4072,12 +5072,15 @@ static void cherryview_setup_pctx(struct drm_device *dev)
 
 	pcbr = I915_READ(VLV_PCBR);
 	if ((pcbr >> VLV_PCBR_ADDR_SHIFT) == 0) {
+		DRM_DEBUG_DRIVER("BIOS didn't set up PCBR, fixing up\n");
 		paddr = (dev_priv->mm.stolen_base +
 			 (gtt->stolen_size - pctx_size));
 
 		pctx_paddr = (paddr & (~4095));
 		I915_WRITE(VLV_PCBR, pctx_paddr);
 	}
+
+	DRM_DEBUG_DRIVER("PCBR: 0x%08x\n", I915_READ(VLV_PCBR));
 }
 
 static void valleyview_setup_pctx(struct drm_device *dev)
@@ -4103,6 +5106,8 @@ static void valleyview_setup_pctx(struct drm_device *dev)
 		goto out;
 	}
 
+	DRM_DEBUG_DRIVER("BIOS didn't set up PCBR, fixing up\n");
+
 	/*
 	 * From the Gunit register HAS:
 	 * The Gfx driver is expected to program this register and ensure
@@ -4121,6 +5126,7 @@ static void valleyview_setup_pctx(struct drm_device *dev)
 	I915_WRITE(VLV_PCBR, pctx_paddr);
 
 out:
+	DRM_DEBUG_DRIVER("PCBR: 0x%08x\n", I915_READ(VLV_PCBR));
 	dev_priv->vlv_pctx = pctx;
 }
 
@@ -4157,7 +5163,7 @@ static void valleyview_init_gt_powersave(struct drm_device *dev)
 		dev_priv->mem_freq = 1333;
 		break;
 	}
-	DRM_DEBUG_DRIVER("DDR speed: %d MHz", dev_priv->mem_freq);
+	DRM_DEBUG_DRIVER("DDR speed: %d MHz\n", dev_priv->mem_freq);
 
 	dev_priv->rps.max_freq = valleyview_rps_max_freq(dev_priv);
 	dev_priv->rps.rp0_freq = dev_priv->rps.max_freq;
@@ -4199,7 +5205,10 @@ static void cherryview_init_gt_powersave(struct drm_device *dev)
 
 	mutex_lock(&dev_priv->rps.hw_lock);
 
-	val = vlv_punit_read(dev_priv, CCK_FUSE_REG);
+	mutex_lock(&dev_priv->dpio_lock);
+	val = vlv_cck_read(dev_priv, CCK_FUSE_REG);
+	mutex_unlock(&dev_priv->dpio_lock);
+
 	switch ((val >> 2) & 0x7) {
 	case 0:
 	case 1:
@@ -4223,7 +5232,7 @@ static void cherryview_init_gt_powersave(struct drm_device *dev)
 		dev_priv->mem_freq = 1600;
 		break;
 	}
-	DRM_DEBUG_DRIVER("DDR speed: %d MHz", dev_priv->mem_freq);
+	DRM_DEBUG_DRIVER("DDR speed: %d MHz\n", dev_priv->mem_freq);
 
 	dev_priv->rps.max_freq = cherryview_rps_max_freq(dev_priv);
 	dev_priv->rps.rp0_freq = dev_priv->rps.max_freq;
@@ -4309,8 +5318,6 @@ static void cherryview_enable_rps(struct drm_device *dev)
 	/* For now we assume BIOS is allocating and populating the PCBR  */
 	pcbr = I915_READ(VLV_PCBR);
 
-	DRM_DEBUG_DRIVER("PCBR offset : 0x%x\n", pcbr);
-
 	/* 3: Enable RC6 */
 	if ((intel_enable_rc6(dev) & INTEL_RC6_ENABLE) &&
 						(pcbr >> VLV_PCBR_ADDR_SHIFT))
@@ -4340,7 +5347,10 @@ static void cherryview_enable_rps(struct drm_device *dev)
 
 	val = vlv_punit_read(dev_priv, PUNIT_REG_GPU_FREQ_STS);
 
-	DRM_DEBUG_DRIVER("GPLL enabled? %s\n", val & 0x10 ? "yes" : "no");
+	/* RPS code assumes GPLL is used */
+	WARN_ONCE((val & GPLLENABLE) == 0, "GPLL not enabled\n");
+
+	DRM_DEBUG_DRIVER("GPLL enabled? %s\n", val & GPLLENABLE ? "yes" : "no");
 	DRM_DEBUG_DRIVER("GPU status: 0x%08x\n", val);
 
 	dev_priv->rps.cur_freq = (val >> 8) & 0xff;
@@ -4354,8 +5364,6 @@ static void cherryview_enable_rps(struct drm_device *dev)
 
 	valleyview_set_rps(dev_priv->dev, dev_priv->rps.efficient_freq);
 
-	gen8_enable_rps_interrupts(dev);
-
 	gen6_gt_force_wake_put(dev_priv, FORCEWAKE_ALL);
 }
 
@@ -4420,7 +5428,10 @@ static void valleyview_enable_rps(struct drm_device *dev)
 
 	val = vlv_punit_read(dev_priv, PUNIT_REG_GPU_FREQ_STS);
 
-	DRM_DEBUG_DRIVER("GPLL enabled? %s\n", val & 0x10 ? "yes" : "no");
+	/* RPS code assumes GPLL is used */
+	WARN_ONCE((val & GPLLENABLE) == 0, "GPLL not enabled\n");
+
+	DRM_DEBUG_DRIVER("GPLL enabled? %s\n", val & GPLLENABLE ? "yes" : "no");
 	DRM_DEBUG_DRIVER("GPU status: 0x%08x\n", val);
 
 	dev_priv->rps.cur_freq = (val >> 8) & 0xff;
@@ -4434,8 +5445,6 @@ static void valleyview_enable_rps(struct drm_device *dev)
 
 	valleyview_set_rps(dev_priv->dev, dev_priv->rps.efficient_freq);
 
-	gen6_enable_rps_interrupts(dev);
-
 	gen6_gt_force_wake_put(dev_priv, FORCEWAKE_ALL);
 }
 
@@ -5194,12 +6203,17 @@ void intel_suspend_gt_powersave(struct drm_device *dev)
 {
 	struct drm_i915_private *dev_priv = dev->dev_private;
 
-	/* Interrupts should be disabled already to avoid re-arming. */
-	WARN_ON(intel_irqs_enabled(dev_priv));
+	if (INTEL_INFO(dev)->gen < 6)
+		return;
 
 	flush_delayed_work(&dev_priv->rps.delayed_resume_work);
 
-	cancel_work_sync(&dev_priv->rps.work);
+	/*
+	 * TODO: disable RPS interrupts on GEN9+ too once RPS support
+	 * is added for it.
+	 */
+	if (INTEL_INFO(dev)->gen < 9)
+		gen6_disable_rps_interrupts(dev);
 
 	/* Force GPU to min freq during suspend */
 	gen6_rps_idle(dev_priv);
@@ -5209,9 +6223,6 @@ void intel_disable_gt_powersave(struct drm_device *dev)
 {
 	struct drm_i915_private *dev_priv = dev->dev_private;
 
-	/* Interrupts should be disabled already to avoid re-arming. */
-	WARN_ON(intel_irqs_enabled(dev_priv));
-
 	if (IS_IRONLAKE_M(dev)) {
 		ironlake_disable_drps(dev);
 		ironlake_disable_rc6(dev);
@@ -5219,12 +6230,15 @@ void intel_disable_gt_powersave(struct drm_device *dev)
 		intel_suspend_gt_powersave(dev);
 
 		mutex_lock(&dev_priv->rps.hw_lock);
-		if (IS_CHERRYVIEW(dev))
+		if (INTEL_INFO(dev)->gen >= 9)
+			gen9_disable_rps(dev);
+		else if (IS_CHERRYVIEW(dev))
 			cherryview_disable_rps(dev);
 		else if (IS_VALLEYVIEW(dev))
 			valleyview_disable_rps(dev);
 		else
 			gen6_disable_rps(dev);
+
 		dev_priv->rps.enabled = false;
 		mutex_unlock(&dev_priv->rps.hw_lock);
 	}
@@ -5239,10 +6253,19 @@ static void intel_gen6_powersave_work(struct work_struct *work)
 
 	mutex_lock(&dev_priv->rps.hw_lock);
 
+	/*
+	 * TODO: reset/enable RPS interrupts on GEN9+ too, once RPS support is
+	 * added for it.
+	 */
+	if (INTEL_INFO(dev)->gen < 9)
+		gen6_reset_rps_interrupts(dev);
+
 	if (IS_CHERRYVIEW(dev)) {
 		cherryview_enable_rps(dev);
 	} else if (IS_VALLEYVIEW(dev)) {
 		valleyview_enable_rps(dev);
+	} else if (INTEL_INFO(dev)->gen >= 9) {
+		gen9_enable_rps(dev);
 	} else if (IS_BROADWELL(dev)) {
 		gen8_enable_rps(dev);
 		__gen6_update_ring_freq(dev);
@@ -5251,6 +6274,10 @@ static void intel_gen6_powersave_work(struct work_struct *work)
 		__gen6_update_ring_freq(dev);
 	}
 	dev_priv->rps.enabled = true;
+
+	if (INTEL_INFO(dev)->gen < 9)
+		gen6_enable_rps_interrupts(dev);
+
 	mutex_unlock(&dev_priv->rps.hw_lock);
 
 	intel_runtime_pm_put(dev_priv);
@@ -5481,7 +6508,7 @@ static void gen6_init_clock_gating(struct drm_device *dev)
 	 * to keep in mind (see 3DSTATE_PS and 3DSTATE_WM).
 	 */
 	I915_WRITE(GEN6_GT_MODE,
-		   GEN6_WIZ_HASHING_MASK | GEN6_WIZ_HASHING_16x4);
+		   _MASKED_FIELD(GEN6_WIZ_HASHING_MASK, GEN6_WIZ_HASHING_16x4));
 
 	ilk_init_lp_watermarks(dev);
 
@@ -5609,16 +6636,6 @@ static void broadwell_init_clock_gating(struct drm_device *dev)
 	I915_WRITE(WM2_LP_ILK, 0);
 	I915_WRITE(WM1_LP_ILK, 0);
 
-	/* FIXME(BDW): Check all the w/a, some might only apply to
-	 * pre-production hw. */
-
-
-	I915_WRITE(GAMTARBMODE, _MASKED_BIT_ENABLE(ARB_MODE_BWGTLB_DISABLE));
-
-	I915_WRITE(_3D_CHICKEN3,
-		   _MASKED_BIT_ENABLE(_3D_CHICKEN_SDE_LIMIT_FIFO_POLY_DEPTH(2)));
-
-
 	/* WaSwitchSolVfFArbitrationPriority:bdw */
 	I915_WRITE(GAM_ECOCHK, I915_READ(GAM_ECOCHK) | HSW_ECOCHK_ARB_PRIO_SOL);
 
@@ -5689,7 +6706,7 @@ static void haswell_init_clock_gating(struct drm_device *dev)
 	 * to keep in mind (see 3DSTATE_PS and 3DSTATE_WM).
 	 */
 	I915_WRITE(GEN7_GT_MODE,
-		   GEN6_WIZ_HASHING_MASK | GEN6_WIZ_HASHING_16x4);
+		   _MASKED_FIELD(GEN6_WIZ_HASHING_MASK, GEN6_WIZ_HASHING_16x4));
 
 	/* WaSwitchSolVfFArbitrationPriority:hsw */
 	I915_WRITE(GAM_ECOCHK, I915_READ(GAM_ECOCHK) | HSW_ECOCHK_ARB_PRIO_SOL);
@@ -5786,7 +6803,7 @@ static void ivybridge_init_clock_gating(struct drm_device *dev)
 	 * to keep in mind (see 3DSTATE_PS and 3DSTATE_WM).
 	 */
 	I915_WRITE(GEN7_GT_MODE,
-		   GEN6_WIZ_HASHING_MASK | GEN6_WIZ_HASHING_16x4);
+		   _MASKED_FIELD(GEN6_WIZ_HASHING_MASK, GEN6_WIZ_HASHING_16x4));
 
 	snpcr = I915_READ(GEN6_MBCUNIT_SNPCR);
 	snpcr &= ~GEN6_MBC_SNPCR_MASK;
@@ -5899,18 +6916,6 @@ static void cherryview_init_clock_gating(struct drm_device *dev)
 	/* WaDisableSDEUnitClockGating:chv */
 	I915_WRITE(GEN8_UCGCTL6, I915_READ(GEN8_UCGCTL6) |
 		   GEN8_SDEUNIT_CLOCK_GATE_DISABLE);
-
-	/* WaDisableGunitClockGating:chv (pre-production hw) */
-	I915_WRITE(VLV_GUNIT_CLOCK_GATE, I915_READ(VLV_GUNIT_CLOCK_GATE) |
-		   GINT_DIS);
-
-	/* WaDisableFfDopClockGating:chv (pre-production hw) */
-	I915_WRITE(GEN6_RC_SLEEP_PSMI_CONTROL,
-		   _MASKED_BIT_ENABLE(GEN8_FF_DOP_CLOCK_GATE_DISABLE));
-
-	/* WaDisableDopClockGating:chv (pre-production hw) */
-	I915_WRITE(GEN6_UCGCTL1, I915_READ(GEN6_UCGCTL1) |
-		   GEN6_EU_TCUNIT_CLOCK_GATE_DISABLE);
 }
 
 static void g4x_init_clock_gating(struct drm_device *dev)
@@ -6036,1161 +7041,35 @@ void intel_suspend_hw(struct drm_device *dev)
 		lpt_suspend_hw(dev);
 }
 
-#define for_each_power_well(i, power_well, domain_mask, power_domains)	\
-	for (i = 0;							\
-	     i < (power_domains)->power_well_count &&			\
-		 ((power_well) = &(power_domains)->power_wells[i]);	\
-	     i++)							\
-		if ((power_well)->domains & (domain_mask))
-
-#define for_each_power_well_rev(i, power_well, domain_mask, power_domains) \
-	for (i = (power_domains)->power_well_count - 1;			 \
-	     i >= 0 && ((power_well) = &(power_domains)->power_wells[i]);\
-	     i--)							 \
-		if ((power_well)->domains & (domain_mask))
-
-/**
- * We should only use the power well if we explicitly asked the hardware to
- * enable it, so check if it's enabled and also check if we've requested it to
- * be enabled.
- */
-static bool hsw_power_well_enabled(struct drm_i915_private *dev_priv,
-				   struct i915_power_well *power_well)
-{
-	return I915_READ(HSW_PWR_WELL_DRIVER) ==
-		     (HSW_PWR_WELL_ENABLE_REQUEST | HSW_PWR_WELL_STATE_ENABLED);
-}
-
-bool intel_display_power_enabled_unlocked(struct drm_i915_private *dev_priv,
-					  enum intel_display_power_domain domain)
-{
-	struct i915_power_domains *power_domains;
-	struct i915_power_well *power_well;
-	bool is_enabled;
-	int i;
-
-	if (dev_priv->pm.suspended)
-		return false;
-
-	power_domains = &dev_priv->power_domains;
-
-	is_enabled = true;
-
-	for_each_power_well_rev(i, power_well, BIT(domain), power_domains) {
-		if (power_well->always_on)
-			continue;
-
-		if (!power_well->hw_enabled) {
-			is_enabled = false;
-			break;
-		}
-	}
-
-	return is_enabled;
-}
-
-bool intel_display_power_enabled(struct drm_i915_private *dev_priv,
-				 enum intel_display_power_domain domain)
-{
-	struct i915_power_domains *power_domains;
-	bool ret;
-
-	power_domains = &dev_priv->power_domains;
-
-	mutex_lock(&power_domains->lock);
-	ret = intel_display_power_enabled_unlocked(dev_priv, domain);
-	mutex_unlock(&power_domains->lock);
-
-	return ret;
-}
-
-/*
- * Starting with Haswell, we have a "Power Down Well" that can be turned off
- * when not needed anymore. We have 4 registers that can request the power well
- * to be enabled, and it will only be disabled if none of the registers is
- * requesting it to be enabled.
- */
-static void hsw_power_well_post_enable(struct drm_i915_private *dev_priv)
-{
-	struct drm_device *dev = dev_priv->dev;
-
-	/*
-	 * After we re-enable the power well, if we touch VGA register 0x3d5
-	 * we'll get unclaimed register interrupts. This stops after we write
-	 * anything to the VGA MSR register. The vgacon module uses this
-	 * register all the time, so if we unbind our driver and, as a
-	 * consequence, bind vgacon, we'll get stuck in an infinite loop at
-	 * console_unlock(). So make here we touch the VGA MSR register, making
-	 * sure vgacon can keep working normally without triggering interrupts
-	 * and error messages.
-	 */
-	vga_get_uninterruptible(dev->pdev, VGA_RSRC_LEGACY_IO);
-	outb(inb(VGA_MSR_READ), VGA_MSR_WRITE);
-	vga_put(dev->pdev, VGA_RSRC_LEGACY_IO);
-
-	if (IS_BROADWELL(dev))
-		gen8_irq_power_well_post_enable(dev_priv);
-}
-
-static void hsw_set_power_well(struct drm_i915_private *dev_priv,
-			       struct i915_power_well *power_well, bool enable)
-{
-	bool is_enabled, enable_requested;
-	uint32_t tmp;
-
-	tmp = I915_READ(HSW_PWR_WELL_DRIVER);
-	is_enabled = tmp & HSW_PWR_WELL_STATE_ENABLED;
-	enable_requested = tmp & HSW_PWR_WELL_ENABLE_REQUEST;
-
-	if (enable) {
-		if (!enable_requested)
-			I915_WRITE(HSW_PWR_WELL_DRIVER,
-				   HSW_PWR_WELL_ENABLE_REQUEST);
-
-		if (!is_enabled) {
-			DRM_DEBUG_KMS("Enabling power well\n");
-			if (wait_for((I915_READ(HSW_PWR_WELL_DRIVER) &
-				      HSW_PWR_WELL_STATE_ENABLED), 20))
-				DRM_ERROR("Timeout enabling power well\n");
-		}
-
-		hsw_power_well_post_enable(dev_priv);
-	} else {
-		if (enable_requested) {
-			I915_WRITE(HSW_PWR_WELL_DRIVER, 0);
-			POSTING_READ(HSW_PWR_WELL_DRIVER);
-			DRM_DEBUG_KMS("Requesting to disable the power well\n");
-		}
-	}
-}
-
-static void hsw_power_well_sync_hw(struct drm_i915_private *dev_priv,
-				   struct i915_power_well *power_well)
-{
-	hsw_set_power_well(dev_priv, power_well, power_well->count > 0);
-
-	/*
-	 * We're taking over the BIOS, so clear any requests made by it since
-	 * the driver is in charge now.
-	 */
-	if (I915_READ(HSW_PWR_WELL_BIOS) & HSW_PWR_WELL_ENABLE_REQUEST)
-		I915_WRITE(HSW_PWR_WELL_BIOS, 0);
-}
-
-static void hsw_power_well_enable(struct drm_i915_private *dev_priv,
-				  struct i915_power_well *power_well)
-{
-	hsw_set_power_well(dev_priv, power_well, true);
-}
-
-static void hsw_power_well_disable(struct drm_i915_private *dev_priv,
-				   struct i915_power_well *power_well)
-{
-	hsw_set_power_well(dev_priv, power_well, false);
-}
-
-static void i9xx_always_on_power_well_noop(struct drm_i915_private *dev_priv,
-					   struct i915_power_well *power_well)
+static void intel_init_fbc(struct drm_i915_private *dev_priv)
 {
-}
-
-static bool i9xx_always_on_power_well_enabled(struct drm_i915_private *dev_priv,
-					     struct i915_power_well *power_well)
-{
-	return true;
-}
-
-static void vlv_set_power_well(struct drm_i915_private *dev_priv,
-			       struct i915_power_well *power_well, bool enable)
-{
-	enum punit_power_well power_well_id = power_well->data;
-	u32 mask;
-	u32 state;
-	u32 ctrl;
-
-	mask = PUNIT_PWRGT_MASK(power_well_id);
-	state = enable ? PUNIT_PWRGT_PWR_ON(power_well_id) :
-			 PUNIT_PWRGT_PWR_GATE(power_well_id);
-
-	mutex_lock(&dev_priv->rps.hw_lock);
-
-#define COND \
-	((vlv_punit_read(dev_priv, PUNIT_REG_PWRGT_STATUS) & mask) == state)
-
-	if (COND)
-		goto out;
-
-	ctrl = vlv_punit_read(dev_priv, PUNIT_REG_PWRGT_CTRL);
-	ctrl &= ~mask;
-	ctrl |= state;
-	vlv_punit_write(dev_priv, PUNIT_REG_PWRGT_CTRL, ctrl);
-
-	if (wait_for(COND, 100))
-		DRM_ERROR("timout setting power well state %08x (%08x)\n",
-			  state,
-			  vlv_punit_read(dev_priv, PUNIT_REG_PWRGT_CTRL));
-
-#undef COND
-
-out:
-	mutex_unlock(&dev_priv->rps.hw_lock);
-}
-
-static void vlv_power_well_sync_hw(struct drm_i915_private *dev_priv,
-				   struct i915_power_well *power_well)
-{
-	vlv_set_power_well(dev_priv, power_well, power_well->count > 0);
-}
-
-static void vlv_power_well_enable(struct drm_i915_private *dev_priv,
-				  struct i915_power_well *power_well)
-{
-	vlv_set_power_well(dev_priv, power_well, true);
-}
-
-static void vlv_power_well_disable(struct drm_i915_private *dev_priv,
-				   struct i915_power_well *power_well)
-{
-	vlv_set_power_well(dev_priv, power_well, false);
-}
-
-static bool vlv_power_well_enabled(struct drm_i915_private *dev_priv,
-				   struct i915_power_well *power_well)
-{
-	int power_well_id = power_well->data;
-	bool enabled = false;
-	u32 mask;
-	u32 state;
-	u32 ctrl;
-
-	mask = PUNIT_PWRGT_MASK(power_well_id);
-	ctrl = PUNIT_PWRGT_PWR_ON(power_well_id);
-
-	mutex_lock(&dev_priv->rps.hw_lock);
-
-	state = vlv_punit_read(dev_priv, PUNIT_REG_PWRGT_STATUS) & mask;
-	/*
-	 * We only ever set the power-on and power-gate states, anything
-	 * else is unexpected.
-	 */
-	WARN_ON(state != PUNIT_PWRGT_PWR_ON(power_well_id) &&
-		state != PUNIT_PWRGT_PWR_GATE(power_well_id));
-	if (state == ctrl)
-		enabled = true;
-
-	/*
-	 * A transient state at this point would mean some unexpected party
-	 * is poking at the power controls too.
-	 */
-	ctrl = vlv_punit_read(dev_priv, PUNIT_REG_PWRGT_CTRL) & mask;
-	WARN_ON(ctrl != state);
-
-	mutex_unlock(&dev_priv->rps.hw_lock);
-
-	return enabled;
-}
-
-static void vlv_display_power_well_enable(struct drm_i915_private *dev_priv,
-					  struct i915_power_well *power_well)
-{
-	WARN_ON_ONCE(power_well->data != PUNIT_POWER_WELL_DISP2D);
-
-	vlv_set_power_well(dev_priv, power_well, true);
-
-	spin_lock_irq(&dev_priv->irq_lock);
-	valleyview_enable_display_irqs(dev_priv);
-	spin_unlock_irq(&dev_priv->irq_lock);
-
-	/*
-	 * During driver initialization/resume we can avoid restoring the
-	 * part of the HW/SW state that will be inited anyway explicitly.
-	 */
-	if (dev_priv->power_domains.initializing)
+	if (!HAS_FBC(dev_priv)) {
+		dev_priv->fbc.enabled = false;
 		return;
-
-	intel_hpd_init(dev_priv->dev);
-
-	i915_redisable_vga_power_on(dev_priv->dev);
-}
-
-static void vlv_display_power_well_disable(struct drm_i915_private *dev_priv,
-					   struct i915_power_well *power_well)
-{
-	WARN_ON_ONCE(power_well->data != PUNIT_POWER_WELL_DISP2D);
-
-	spin_lock_irq(&dev_priv->irq_lock);
-	valleyview_disable_display_irqs(dev_priv);
-	spin_unlock_irq(&dev_priv->irq_lock);
-
-	vlv_set_power_well(dev_priv, power_well, false);
-
-	vlv_power_sequencer_reset(dev_priv);
-}
-
-static void vlv_dpio_cmn_power_well_enable(struct drm_i915_private *dev_priv,
-					   struct i915_power_well *power_well)
-{
-	WARN_ON_ONCE(power_well->data != PUNIT_POWER_WELL_DPIO_CMN_BC);
-
-	/*
-	 * Enable the CRI clock source so we can get at the
-	 * display and the reference clock for VGA
-	 * hotplug / manual detection.
-	 */
-	I915_WRITE(DPLL(PIPE_B), I915_READ(DPLL(PIPE_B)) |
-		   DPLL_REFA_CLK_ENABLE_VLV | DPLL_INTEGRATED_CRI_CLK_VLV);
-	udelay(1); /* >10ns for cmnreset, >0ns for sidereset */
-
-	vlv_set_power_well(dev_priv, power_well, true);
-
-	/*
-	 * From VLV2A0_DP_eDP_DPIO_driver_vbios_notes_10.docx -
-	 *  6.	De-assert cmn_reset/side_reset. Same as VLV X0.
-	 *   a.	GUnit 0x2110 bit[0] set to 1 (def 0)
-	 *   b.	The other bits such as sfr settings / modesel may all
-	 *	be set to 0.
-	 *
-	 * This should only be done on init and resume from S3 with
-	 * both PLLs disabled, or we risk losing DPIO and PLL
-	 * synchronization.
-	 */
-	I915_WRITE(DPIO_CTL, I915_READ(DPIO_CTL) | DPIO_CMNRST);
-}
-
-static void vlv_dpio_cmn_power_well_disable(struct drm_i915_private *dev_priv,
-					    struct i915_power_well *power_well)
-{
-	enum pipe pipe;
-
-	WARN_ON_ONCE(power_well->data != PUNIT_POWER_WELL_DPIO_CMN_BC);
-
-	for_each_pipe(dev_priv, pipe)
-		assert_pll_disabled(dev_priv, pipe);
-
-	/* Assert common reset */
-	I915_WRITE(DPIO_CTL, I915_READ(DPIO_CTL) & ~DPIO_CMNRST);
-
-	vlv_set_power_well(dev_priv, power_well, false);
-}
-
-static void chv_dpio_cmn_power_well_enable(struct drm_i915_private *dev_priv,
-					   struct i915_power_well *power_well)
-{
-	enum dpio_phy phy;
-
-	WARN_ON_ONCE(power_well->data != PUNIT_POWER_WELL_DPIO_CMN_BC &&
-		     power_well->data != PUNIT_POWER_WELL_DPIO_CMN_D);
-
-	/*
-	 * Enable the CRI clock source so we can get at the
-	 * display and the reference clock for VGA
-	 * hotplug / manual detection.
-	 */
-	if (power_well->data == PUNIT_POWER_WELL_DPIO_CMN_BC) {
-		phy = DPIO_PHY0;
-		I915_WRITE(DPLL(PIPE_B), I915_READ(DPLL(PIPE_B)) |
-			   DPLL_REFA_CLK_ENABLE_VLV);
-		I915_WRITE(DPLL(PIPE_B), I915_READ(DPLL(PIPE_B)) |
-			   DPLL_REFA_CLK_ENABLE_VLV | DPLL_INTEGRATED_CRI_CLK_VLV);
-	} else {
-		phy = DPIO_PHY1;
-		I915_WRITE(DPLL(PIPE_C), I915_READ(DPLL(PIPE_C)) |
-			   DPLL_REFA_CLK_ENABLE_VLV | DPLL_INTEGRATED_CRI_CLK_VLV);
 	}
-	udelay(1); /* >10ns for cmnreset, >0ns for sidereset */
-	vlv_set_power_well(dev_priv, power_well, true);
 
-	/* Poll for phypwrgood signal */
-	if (wait_for(I915_READ(DISPLAY_PHY_STATUS) & PHY_POWERGOOD(phy), 1))
-		DRM_ERROR("Display PHY %d is not power up\n", phy);
-
-	I915_WRITE(DISPLAY_PHY_CONTROL, I915_READ(DISPLAY_PHY_CONTROL) |
-		   PHY_COM_LANE_RESET_DEASSERT(phy));
-}
-
-static void chv_dpio_cmn_power_well_disable(struct drm_i915_private *dev_priv,
-					    struct i915_power_well *power_well)
-{
-	enum dpio_phy phy;
-
-	WARN_ON_ONCE(power_well->data != PUNIT_POWER_WELL_DPIO_CMN_BC &&
-		     power_well->data != PUNIT_POWER_WELL_DPIO_CMN_D);
-
-	if (power_well->data == PUNIT_POWER_WELL_DPIO_CMN_BC) {
-		phy = DPIO_PHY0;
-		assert_pll_disabled(dev_priv, PIPE_A);
-		assert_pll_disabled(dev_priv, PIPE_B);
+	if (INTEL_INFO(dev_priv)->gen >= 7) {
+		dev_priv->display.fbc_enabled = ironlake_fbc_enabled;
+		dev_priv->display.enable_fbc = gen7_enable_fbc;
+		dev_priv->display.disable_fbc = ironlake_disable_fbc;
+	} else if (INTEL_INFO(dev_priv)->gen >= 5) {
+		dev_priv->display.fbc_enabled = ironlake_fbc_enabled;
+		dev_priv->display.enable_fbc = ironlake_enable_fbc;
+		dev_priv->display.disable_fbc = ironlake_disable_fbc;
+	} else if (IS_GM45(dev_priv)) {
+		dev_priv->display.fbc_enabled = g4x_fbc_enabled;
+		dev_priv->display.enable_fbc = g4x_enable_fbc;
+		dev_priv->display.disable_fbc = g4x_disable_fbc;
 	} else {
-		phy = DPIO_PHY1;
-		assert_pll_disabled(dev_priv, PIPE_C);
-	}
+		dev_priv->display.fbc_enabled = i8xx_fbc_enabled;
+		dev_priv->display.enable_fbc = i8xx_enable_fbc;
+		dev_priv->display.disable_fbc = i8xx_disable_fbc;
 
-	I915_WRITE(DISPLAY_PHY_CONTROL, I915_READ(DISPLAY_PHY_CONTROL) &
-		   ~PHY_COM_LANE_RESET_DEASSERT(phy));
-
-	vlv_set_power_well(dev_priv, power_well, false);
-}
-
-static bool chv_pipe_power_well_enabled(struct drm_i915_private *dev_priv,
-					struct i915_power_well *power_well)
-{
-	enum pipe pipe = power_well->data;
-	bool enabled;
-	u32 state, ctrl;
-
-	mutex_lock(&dev_priv->rps.hw_lock);
-
-	state = vlv_punit_read(dev_priv, PUNIT_REG_DSPFREQ) & DP_SSS_MASK(pipe);
-	/*
-	 * We only ever set the power-on and power-gate states, anything
-	 * else is unexpected.
-	 */
-	WARN_ON(state != DP_SSS_PWR_ON(pipe) && state != DP_SSS_PWR_GATE(pipe));
-	enabled = state == DP_SSS_PWR_ON(pipe);
-
-	/*
-	 * A transient state at this point would mean some unexpected party
-	 * is poking at the power controls too.
-	 */
-	ctrl = vlv_punit_read(dev_priv, PUNIT_REG_DSPFREQ) & DP_SSC_MASK(pipe);
-	WARN_ON(ctrl << 16 != state);
-
-	mutex_unlock(&dev_priv->rps.hw_lock);
-
-	return enabled;
-}
-
-static void chv_set_pipe_power_well(struct drm_i915_private *dev_priv,
-				    struct i915_power_well *power_well,
-				    bool enable)
-{
-	enum pipe pipe = power_well->data;
-	u32 state;
-	u32 ctrl;
-
-	state = enable ? DP_SSS_PWR_ON(pipe) : DP_SSS_PWR_GATE(pipe);
-
-	mutex_lock(&dev_priv->rps.hw_lock);
-
-#define COND \
-	((vlv_punit_read(dev_priv, PUNIT_REG_DSPFREQ) & DP_SSS_MASK(pipe)) == state)
-
-	if (COND)
-		goto out;
-
-	ctrl = vlv_punit_read(dev_priv, PUNIT_REG_DSPFREQ);
-	ctrl &= ~DP_SSC_MASK(pipe);
-	ctrl |= enable ? DP_SSC_PWR_ON(pipe) : DP_SSC_PWR_GATE(pipe);
-	vlv_punit_write(dev_priv, PUNIT_REG_DSPFREQ, ctrl);
-
-	if (wait_for(COND, 100))
-		DRM_ERROR("timout setting power well state %08x (%08x)\n",
-			  state,
-			  vlv_punit_read(dev_priv, PUNIT_REG_DSPFREQ));
-
-#undef COND
-
-out:
-	mutex_unlock(&dev_priv->rps.hw_lock);
-}
-
-static void chv_pipe_power_well_sync_hw(struct drm_i915_private *dev_priv,
-					struct i915_power_well *power_well)
-{
-	chv_set_pipe_power_well(dev_priv, power_well, power_well->count > 0);
-}
-
-static void chv_pipe_power_well_enable(struct drm_i915_private *dev_priv,
-				       struct i915_power_well *power_well)
-{
-	WARN_ON_ONCE(power_well->data != PIPE_A &&
-		     power_well->data != PIPE_B &&
-		     power_well->data != PIPE_C);
-
-	chv_set_pipe_power_well(dev_priv, power_well, true);
-}
-
-static void chv_pipe_power_well_disable(struct drm_i915_private *dev_priv,
-					struct i915_power_well *power_well)
-{
-	WARN_ON_ONCE(power_well->data != PIPE_A &&
-		     power_well->data != PIPE_B &&
-		     power_well->data != PIPE_C);
-
-	chv_set_pipe_power_well(dev_priv, power_well, false);
-}
-
-static void check_power_well_state(struct drm_i915_private *dev_priv,
-				   struct i915_power_well *power_well)
-{
-	bool enabled = power_well->ops->is_enabled(dev_priv, power_well);
-
-	if (power_well->always_on || !i915.disable_power_well) {
-		if (!enabled)
-			goto mismatch;
-
-		return;
-	}
-
-	if (enabled != (power_well->count > 0))
-		goto mismatch;
-
-	return;
-
-mismatch:
-	WARN(1, "state mismatch for '%s' (always_on %d hw state %d use-count %d disable_power_well %d\n",
-		  power_well->name, power_well->always_on, enabled,
-		  power_well->count, i915.disable_power_well);
-}
-
-void intel_display_power_get(struct drm_i915_private *dev_priv,
-			     enum intel_display_power_domain domain)
-{
-	struct i915_power_domains *power_domains;
-	struct i915_power_well *power_well;
-	int i;
-
-	intel_runtime_pm_get(dev_priv);
-
-	power_domains = &dev_priv->power_domains;
-
-	mutex_lock(&power_domains->lock);
-
-	for_each_power_well(i, power_well, BIT(domain), power_domains) {
-		if (!power_well->count++) {
-			DRM_DEBUG_KMS("enabling %s\n", power_well->name);
-			power_well->ops->enable(dev_priv, power_well);
-			power_well->hw_enabled = true;
-		}
-
-		check_power_well_state(dev_priv, power_well);
+		/* This value was pulled out of someone's hat */
+		I915_WRITE(FBC_CONTROL, 500 << FBC_CTL_INTERVAL_SHIFT);
 	}
 
-	power_domains->domain_use_count[domain]++;
-
-	mutex_unlock(&power_domains->lock);
-}
-
-void intel_display_power_put(struct drm_i915_private *dev_priv,
-			     enum intel_display_power_domain domain)
-{
-	struct i915_power_domains *power_domains;
-	struct i915_power_well *power_well;
-	int i;
-
-	power_domains = &dev_priv->power_domains;
-
-	mutex_lock(&power_domains->lock);
-
-	WARN_ON(!power_domains->domain_use_count[domain]);
-	power_domains->domain_use_count[domain]--;
-
-	for_each_power_well_rev(i, power_well, BIT(domain), power_domains) {
-		WARN_ON(!power_well->count);
-
-		if (!--power_well->count && i915.disable_power_well) {
-			DRM_DEBUG_KMS("disabling %s\n", power_well->name);
-			power_well->hw_enabled = false;
-			power_well->ops->disable(dev_priv, power_well);
-		}
-
-		check_power_well_state(dev_priv, power_well);
-	}
-
-	mutex_unlock(&power_domains->lock);
-
-	intel_runtime_pm_put(dev_priv);
-}
-
-static struct i915_power_domains *hsw_pwr;
-
-/* Display audio driver power well request */
-int i915_request_power_well(void)
-{
-	struct drm_i915_private *dev_priv;
-
-	if (!hsw_pwr)
-		return -ENODEV;
-
-	dev_priv = container_of(hsw_pwr, struct drm_i915_private,
-				power_domains);
-	intel_display_power_get(dev_priv, POWER_DOMAIN_AUDIO);
-	return 0;
-}
-EXPORT_SYMBOL_GPL(i915_request_power_well);
-
-/* Display audio driver power well release */
-int i915_release_power_well(void)
-{
-	struct drm_i915_private *dev_priv;
-
-	if (!hsw_pwr)
-		return -ENODEV;
-
-	dev_priv = container_of(hsw_pwr, struct drm_i915_private,
-				power_domains);
-	intel_display_power_put(dev_priv, POWER_DOMAIN_AUDIO);
-	return 0;
-}
-EXPORT_SYMBOL_GPL(i915_release_power_well);
-
-/*
- * Private interface for the audio driver to get CDCLK in kHz.
- *
- * Caller must request power well using i915_request_power_well() prior to
- * making the call.
- */
-int i915_get_cdclk_freq(void)
-{
-	struct drm_i915_private *dev_priv;
-
-	if (!hsw_pwr)
-		return -ENODEV;
-
-	dev_priv = container_of(hsw_pwr, struct drm_i915_private,
-				power_domains);
-
-	return intel_ddi_get_cdclk_freq(dev_priv);
-}
-EXPORT_SYMBOL_GPL(i915_get_cdclk_freq);
-
-
-#define POWER_DOMAIN_MASK (BIT(POWER_DOMAIN_NUM) - 1)
-
-#define HSW_ALWAYS_ON_POWER_DOMAINS (			\
-	BIT(POWER_DOMAIN_PIPE_A) |			\
-	BIT(POWER_DOMAIN_TRANSCODER_EDP) |		\
-	BIT(POWER_DOMAIN_PORT_DDI_A_2_LANES) |		\
-	BIT(POWER_DOMAIN_PORT_DDI_A_4_LANES) |		\
-	BIT(POWER_DOMAIN_PORT_DDI_B_2_LANES) |		\
-	BIT(POWER_DOMAIN_PORT_DDI_B_4_LANES) |		\
-	BIT(POWER_DOMAIN_PORT_DDI_C_2_LANES) |		\
-	BIT(POWER_DOMAIN_PORT_DDI_C_4_LANES) |		\
-	BIT(POWER_DOMAIN_PORT_DDI_D_2_LANES) |		\
-	BIT(POWER_DOMAIN_PORT_DDI_D_4_LANES) |		\
-	BIT(POWER_DOMAIN_PORT_CRT) |			\
-	BIT(POWER_DOMAIN_PLLS) |			\
-	BIT(POWER_DOMAIN_INIT))
-#define HSW_DISPLAY_POWER_DOMAINS (				\
-	(POWER_DOMAIN_MASK & ~HSW_ALWAYS_ON_POWER_DOMAINS) |	\
-	BIT(POWER_DOMAIN_INIT))
-
-#define BDW_ALWAYS_ON_POWER_DOMAINS (			\
-	HSW_ALWAYS_ON_POWER_DOMAINS |			\
-	BIT(POWER_DOMAIN_PIPE_A_PANEL_FITTER))
-#define BDW_DISPLAY_POWER_DOMAINS (				\
-	(POWER_DOMAIN_MASK & ~BDW_ALWAYS_ON_POWER_DOMAINS) |	\
-	BIT(POWER_DOMAIN_INIT))
-
-#define VLV_ALWAYS_ON_POWER_DOMAINS	BIT(POWER_DOMAIN_INIT)
-#define VLV_DISPLAY_POWER_DOMAINS	POWER_DOMAIN_MASK
-
-#define VLV_DPIO_CMN_BC_POWER_DOMAINS (		\
-	BIT(POWER_DOMAIN_PORT_DDI_B_2_LANES) |	\
-	BIT(POWER_DOMAIN_PORT_DDI_B_4_LANES) |	\
-	BIT(POWER_DOMAIN_PORT_DDI_C_2_LANES) |	\
-	BIT(POWER_DOMAIN_PORT_DDI_C_4_LANES) |	\
-	BIT(POWER_DOMAIN_PORT_CRT) |		\
-	BIT(POWER_DOMAIN_INIT))
-
-#define VLV_DPIO_TX_B_LANES_01_POWER_DOMAINS (	\
-	BIT(POWER_DOMAIN_PORT_DDI_B_2_LANES) |	\
-	BIT(POWER_DOMAIN_PORT_DDI_B_4_LANES) |	\
-	BIT(POWER_DOMAIN_INIT))
-
-#define VLV_DPIO_TX_B_LANES_23_POWER_DOMAINS (	\
-	BIT(POWER_DOMAIN_PORT_DDI_B_4_LANES) |	\
-	BIT(POWER_DOMAIN_INIT))
-
-#define VLV_DPIO_TX_C_LANES_01_POWER_DOMAINS (	\
-	BIT(POWER_DOMAIN_PORT_DDI_C_2_LANES) |	\
-	BIT(POWER_DOMAIN_PORT_DDI_C_4_LANES) |	\
-	BIT(POWER_DOMAIN_INIT))
-
-#define VLV_DPIO_TX_C_LANES_23_POWER_DOMAINS (	\
-	BIT(POWER_DOMAIN_PORT_DDI_C_4_LANES) |	\
-	BIT(POWER_DOMAIN_INIT))
-
-#define CHV_PIPE_A_POWER_DOMAINS (	\
-	BIT(POWER_DOMAIN_PIPE_A) |	\
-	BIT(POWER_DOMAIN_INIT))
-
-#define CHV_PIPE_B_POWER_DOMAINS (	\
-	BIT(POWER_DOMAIN_PIPE_B) |	\
-	BIT(POWER_DOMAIN_INIT))
-
-#define CHV_PIPE_C_POWER_DOMAINS (	\
-	BIT(POWER_DOMAIN_PIPE_C) |	\
-	BIT(POWER_DOMAIN_INIT))
-
-#define CHV_DPIO_CMN_BC_POWER_DOMAINS (		\
-	BIT(POWER_DOMAIN_PORT_DDI_B_2_LANES) |	\
-	BIT(POWER_DOMAIN_PORT_DDI_B_4_LANES) |	\
-	BIT(POWER_DOMAIN_PORT_DDI_C_2_LANES) |	\
-	BIT(POWER_DOMAIN_PORT_DDI_C_4_LANES) |	\
-	BIT(POWER_DOMAIN_INIT))
-
-#define CHV_DPIO_CMN_D_POWER_DOMAINS (		\
-	BIT(POWER_DOMAIN_PORT_DDI_D_2_LANES) |	\
-	BIT(POWER_DOMAIN_PORT_DDI_D_4_LANES) |	\
-	BIT(POWER_DOMAIN_INIT))
-
-#define CHV_DPIO_TX_D_LANES_01_POWER_DOMAINS (	\
-	BIT(POWER_DOMAIN_PORT_DDI_D_2_LANES) |	\
-	BIT(POWER_DOMAIN_PORT_DDI_D_4_LANES) |	\
-	BIT(POWER_DOMAIN_INIT))
-
-#define CHV_DPIO_TX_D_LANES_23_POWER_DOMAINS (	\
-	BIT(POWER_DOMAIN_PORT_DDI_D_4_LANES) |	\
-	BIT(POWER_DOMAIN_INIT))
-
-static const struct i915_power_well_ops i9xx_always_on_power_well_ops = {
-	.sync_hw = i9xx_always_on_power_well_noop,
-	.enable = i9xx_always_on_power_well_noop,
-	.disable = i9xx_always_on_power_well_noop,
-	.is_enabled = i9xx_always_on_power_well_enabled,
-};
-
-static const struct i915_power_well_ops chv_pipe_power_well_ops = {
-	.sync_hw = chv_pipe_power_well_sync_hw,
-	.enable = chv_pipe_power_well_enable,
-	.disable = chv_pipe_power_well_disable,
-	.is_enabled = chv_pipe_power_well_enabled,
-};
-
-static const struct i915_power_well_ops chv_dpio_cmn_power_well_ops = {
-	.sync_hw = vlv_power_well_sync_hw,
-	.enable = chv_dpio_cmn_power_well_enable,
-	.disable = chv_dpio_cmn_power_well_disable,
-	.is_enabled = vlv_power_well_enabled,
-};
-
-static struct i915_power_well i9xx_always_on_power_well[] = {
-	{
-		.name = "always-on",
-		.always_on = 1,
-		.domains = POWER_DOMAIN_MASK,
-		.ops = &i9xx_always_on_power_well_ops,
-	},
-};
-
-static const struct i915_power_well_ops hsw_power_well_ops = {
-	.sync_hw = hsw_power_well_sync_hw,
-	.enable = hsw_power_well_enable,
-	.disable = hsw_power_well_disable,
-	.is_enabled = hsw_power_well_enabled,
-};
-
-static struct i915_power_well hsw_power_wells[] = {
-	{
-		.name = "always-on",
-		.always_on = 1,
-		.domains = HSW_ALWAYS_ON_POWER_DOMAINS,
-		.ops = &i9xx_always_on_power_well_ops,
-	},
-	{
-		.name = "display",
-		.domains = HSW_DISPLAY_POWER_DOMAINS,
-		.ops = &hsw_power_well_ops,
-	},
-};
-
-static struct i915_power_well bdw_power_wells[] = {
-	{
-		.name = "always-on",
-		.always_on = 1,
-		.domains = BDW_ALWAYS_ON_POWER_DOMAINS,
-		.ops = &i9xx_always_on_power_well_ops,
-	},
-	{
-		.name = "display",
-		.domains = BDW_DISPLAY_POWER_DOMAINS,
-		.ops = &hsw_power_well_ops,
-	},
-};
-
-static const struct i915_power_well_ops vlv_display_power_well_ops = {
-	.sync_hw = vlv_power_well_sync_hw,
-	.enable = vlv_display_power_well_enable,
-	.disable = vlv_display_power_well_disable,
-	.is_enabled = vlv_power_well_enabled,
-};
-
-static const struct i915_power_well_ops vlv_dpio_cmn_power_well_ops = {
-	.sync_hw = vlv_power_well_sync_hw,
-	.enable = vlv_dpio_cmn_power_well_enable,
-	.disable = vlv_dpio_cmn_power_well_disable,
-	.is_enabled = vlv_power_well_enabled,
-};
-
-static const struct i915_power_well_ops vlv_dpio_power_well_ops = {
-	.sync_hw = vlv_power_well_sync_hw,
-	.enable = vlv_power_well_enable,
-	.disable = vlv_power_well_disable,
-	.is_enabled = vlv_power_well_enabled,
-};
-
-static struct i915_power_well vlv_power_wells[] = {
-	{
-		.name = "always-on",
-		.always_on = 1,
-		.domains = VLV_ALWAYS_ON_POWER_DOMAINS,
-		.ops = &i9xx_always_on_power_well_ops,
-	},
-	{
-		.name = "display",
-		.domains = VLV_DISPLAY_POWER_DOMAINS,
-		.data = PUNIT_POWER_WELL_DISP2D,
-		.ops = &vlv_display_power_well_ops,
-	},
-	{
-		.name = "dpio-tx-b-01",
-		.domains = VLV_DPIO_TX_B_LANES_01_POWER_DOMAINS |
-			   VLV_DPIO_TX_B_LANES_23_POWER_DOMAINS |
-			   VLV_DPIO_TX_C_LANES_01_POWER_DOMAINS |
-			   VLV_DPIO_TX_C_LANES_23_POWER_DOMAINS,
-		.ops = &vlv_dpio_power_well_ops,
-		.data = PUNIT_POWER_WELL_DPIO_TX_B_LANES_01,
-	},
-	{
-		.name = "dpio-tx-b-23",
-		.domains = VLV_DPIO_TX_B_LANES_01_POWER_DOMAINS |
-			   VLV_DPIO_TX_B_LANES_23_POWER_DOMAINS |
-			   VLV_DPIO_TX_C_LANES_01_POWER_DOMAINS |
-			   VLV_DPIO_TX_C_LANES_23_POWER_DOMAINS,
-		.ops = &vlv_dpio_power_well_ops,
-		.data = PUNIT_POWER_WELL_DPIO_TX_B_LANES_23,
-	},
-	{
-		.name = "dpio-tx-c-01",
-		.domains = VLV_DPIO_TX_B_LANES_01_POWER_DOMAINS |
-			   VLV_DPIO_TX_B_LANES_23_POWER_DOMAINS |
-			   VLV_DPIO_TX_C_LANES_01_POWER_DOMAINS |
-			   VLV_DPIO_TX_C_LANES_23_POWER_DOMAINS,
-		.ops = &vlv_dpio_power_well_ops,
-		.data = PUNIT_POWER_WELL_DPIO_TX_C_LANES_01,
-	},
-	{
-		.name = "dpio-tx-c-23",
-		.domains = VLV_DPIO_TX_B_LANES_01_POWER_DOMAINS |
-			   VLV_DPIO_TX_B_LANES_23_POWER_DOMAINS |
-			   VLV_DPIO_TX_C_LANES_01_POWER_DOMAINS |
-			   VLV_DPIO_TX_C_LANES_23_POWER_DOMAINS,
-		.ops = &vlv_dpio_power_well_ops,
-		.data = PUNIT_POWER_WELL_DPIO_TX_C_LANES_23,
-	},
-	{
-		.name = "dpio-common",
-		.domains = VLV_DPIO_CMN_BC_POWER_DOMAINS,
-		.data = PUNIT_POWER_WELL_DPIO_CMN_BC,
-		.ops = &vlv_dpio_cmn_power_well_ops,
-	},
-};
-
-static struct i915_power_well chv_power_wells[] = {
-	{
-		.name = "always-on",
-		.always_on = 1,
-		.domains = VLV_ALWAYS_ON_POWER_DOMAINS,
-		.ops = &i9xx_always_on_power_well_ops,
-	},
-#if 0
-	{
-		.name = "display",
-		.domains = VLV_DISPLAY_POWER_DOMAINS,
-		.data = PUNIT_POWER_WELL_DISP2D,
-		.ops = &vlv_display_power_well_ops,
-	},
-	{
-		.name = "pipe-a",
-		.domains = CHV_PIPE_A_POWER_DOMAINS,
-		.data = PIPE_A,
-		.ops = &chv_pipe_power_well_ops,
-	},
-	{
-		.name = "pipe-b",
-		.domains = CHV_PIPE_B_POWER_DOMAINS,
-		.data = PIPE_B,
-		.ops = &chv_pipe_power_well_ops,
-	},
-	{
-		.name = "pipe-c",
-		.domains = CHV_PIPE_C_POWER_DOMAINS,
-		.data = PIPE_C,
-		.ops = &chv_pipe_power_well_ops,
-	},
-#endif
-	{
-		.name = "dpio-common-bc",
-		/*
-		 * XXX: cmnreset for one PHY seems to disturb the other.
-		 * As a workaround keep both powered on at the same
-		 * time for now.
-		 */
-		.domains = CHV_DPIO_CMN_BC_POWER_DOMAINS | CHV_DPIO_CMN_D_POWER_DOMAINS,
-		.data = PUNIT_POWER_WELL_DPIO_CMN_BC,
-		.ops = &chv_dpio_cmn_power_well_ops,
-	},
-	{
-		.name = "dpio-common-d",
-		/*
-		 * XXX: cmnreset for one PHY seems to disturb the other.
-		 * As a workaround keep both powered on at the same
-		 * time for now.
-		 */
-		.domains = CHV_DPIO_CMN_BC_POWER_DOMAINS | CHV_DPIO_CMN_D_POWER_DOMAINS,
-		.data = PUNIT_POWER_WELL_DPIO_CMN_D,
-		.ops = &chv_dpio_cmn_power_well_ops,
-	},
-#if 0
-	{
-		.name = "dpio-tx-b-01",
-		.domains = VLV_DPIO_TX_B_LANES_01_POWER_DOMAINS |
-			   VLV_DPIO_TX_B_LANES_23_POWER_DOMAINS,
-		.ops = &vlv_dpio_power_well_ops,
-		.data = PUNIT_POWER_WELL_DPIO_TX_B_LANES_01,
-	},
-	{
-		.name = "dpio-tx-b-23",
-		.domains = VLV_DPIO_TX_B_LANES_01_POWER_DOMAINS |
-			   VLV_DPIO_TX_B_LANES_23_POWER_DOMAINS,
-		.ops = &vlv_dpio_power_well_ops,
-		.data = PUNIT_POWER_WELL_DPIO_TX_B_LANES_23,
-	},
-	{
-		.name = "dpio-tx-c-01",
-		.domains = VLV_DPIO_TX_C_LANES_01_POWER_DOMAINS |
-			   VLV_DPIO_TX_C_LANES_23_POWER_DOMAINS,
-		.ops = &vlv_dpio_power_well_ops,
-		.data = PUNIT_POWER_WELL_DPIO_TX_C_LANES_01,
-	},
-	{
-		.name = "dpio-tx-c-23",
-		.domains = VLV_DPIO_TX_C_LANES_01_POWER_DOMAINS |
-			   VLV_DPIO_TX_C_LANES_23_POWER_DOMAINS,
-		.ops = &vlv_dpio_power_well_ops,
-		.data = PUNIT_POWER_WELL_DPIO_TX_C_LANES_23,
-	},
-	{
-		.name = "dpio-tx-d-01",
-		.domains = CHV_DPIO_TX_D_LANES_01_POWER_DOMAINS |
-			   CHV_DPIO_TX_D_LANES_23_POWER_DOMAINS,
-		.ops = &vlv_dpio_power_well_ops,
-		.data = PUNIT_POWER_WELL_DPIO_TX_D_LANES_01,
-	},
-	{
-		.name = "dpio-tx-d-23",
-		.domains = CHV_DPIO_TX_D_LANES_01_POWER_DOMAINS |
-			   CHV_DPIO_TX_D_LANES_23_POWER_DOMAINS,
-		.ops = &vlv_dpio_power_well_ops,
-		.data = PUNIT_POWER_WELL_DPIO_TX_D_LANES_23,
-	},
-#endif
-};
-
-static struct i915_power_well *lookup_power_well(struct drm_i915_private *dev_priv,
-						 enum punit_power_well power_well_id)
-{
-	struct i915_power_domains *power_domains = &dev_priv->power_domains;
-	struct i915_power_well *power_well;
-	int i;
-
-	for_each_power_well(i, power_well, POWER_DOMAIN_MASK, power_domains) {
-		if (power_well->data == power_well_id)
-			return power_well;
-	}
-
-	return NULL;
-}
-
-#define set_power_wells(power_domains, __power_wells) ({		\
-	(power_domains)->power_wells = (__power_wells);			\
-	(power_domains)->power_well_count = ARRAY_SIZE(__power_wells);	\
-})
-
-int intel_power_domains_init(struct drm_i915_private *dev_priv)
-{
-	struct i915_power_domains *power_domains = &dev_priv->power_domains;
-
-	mutex_init(&power_domains->lock);
-
-	/*
-	 * The enabling order will be from lower to higher indexed wells,
-	 * the disabling order is reversed.
-	 */
-	if (IS_HASWELL(dev_priv->dev)) {
-		set_power_wells(power_domains, hsw_power_wells);
-		hsw_pwr = power_domains;
-	} else if (IS_BROADWELL(dev_priv->dev)) {
-		set_power_wells(power_domains, bdw_power_wells);
-		hsw_pwr = power_domains;
-	} else if (IS_CHERRYVIEW(dev_priv->dev)) {
-		set_power_wells(power_domains, chv_power_wells);
-	} else if (IS_VALLEYVIEW(dev_priv->dev)) {
-		set_power_wells(power_domains, vlv_power_wells);
-	} else {
-		set_power_wells(power_domains, i9xx_always_on_power_well);
-	}
-
-	return 0;
-}
-
-void intel_power_domains_remove(struct drm_i915_private *dev_priv)
-{
-	hsw_pwr = NULL;
-}
-
-static void intel_power_domains_resume(struct drm_i915_private *dev_priv)
-{
-	struct i915_power_domains *power_domains = &dev_priv->power_domains;
-	struct i915_power_well *power_well;
-	int i;
-
-	mutex_lock(&power_domains->lock);
-	for_each_power_well(i, power_well, POWER_DOMAIN_MASK, power_domains) {
-		power_well->ops->sync_hw(dev_priv, power_well);
-		power_well->hw_enabled = power_well->ops->is_enabled(dev_priv,
-								     power_well);
-	}
-	mutex_unlock(&power_domains->lock);
-}
-
-static void vlv_cmnlane_wa(struct drm_i915_private *dev_priv)
-{
-	struct i915_power_well *cmn =
-		lookup_power_well(dev_priv, PUNIT_POWER_WELL_DPIO_CMN_BC);
-	struct i915_power_well *disp2d =
-		lookup_power_well(dev_priv, PUNIT_POWER_WELL_DISP2D);
-
-	/* nothing to do if common lane is already off */
-	if (!cmn->ops->is_enabled(dev_priv, cmn))
-		return;
-
-	/* If the display might be already active skip this */
-	if (disp2d->ops->is_enabled(dev_priv, disp2d) &&
-	    I915_READ(DPIO_CTL) & DPIO_CMNRST)
-		return;
-
-	DRM_DEBUG_KMS("toggling display PHY side reset\n");
-
-	/* cmnlane needs DPLL registers */
-	disp2d->ops->enable(dev_priv, disp2d);
-
-	/*
-	 * From VLV2A0_DP_eDP_HDMI_DPIO_driver_vbios_notes_11.docx:
-	 * Need to assert and de-assert PHY SB reset by gating the
-	 * common lane power, then un-gating it.
-	 * Simply ungating isn't enough to reset the PHY enough to get
-	 * ports and lanes running.
-	 */
-	cmn->ops->disable(dev_priv, cmn);
-}
-
-void intel_power_domains_init_hw(struct drm_i915_private *dev_priv)
-{
-	struct drm_device *dev = dev_priv->dev;
-	struct i915_power_domains *power_domains = &dev_priv->power_domains;
-
-	power_domains->initializing = true;
-
-	if (IS_VALLEYVIEW(dev) && !IS_CHERRYVIEW(dev)) {
-		mutex_lock(&power_domains->lock);
-		vlv_cmnlane_wa(dev_priv);
-		mutex_unlock(&power_domains->lock);
-	}
-
-	/* For now, we need the power well to be always enabled. */
-	intel_display_set_init_power(dev_priv, true);
-	intel_power_domains_resume(dev_priv);
-	power_domains->initializing = false;
-}
-
-void intel_aux_display_runtime_get(struct drm_i915_private *dev_priv)
-{
-	intel_runtime_pm_get(dev_priv);
-}
-
-void intel_aux_display_runtime_put(struct drm_i915_private *dev_priv)
-{
-	intel_runtime_pm_put(dev_priv);
-}
-
-void intel_runtime_pm_get(struct drm_i915_private *dev_priv)
-{
-	struct drm_device *dev = dev_priv->dev;
-	struct device *device = &dev->pdev->dev;
-
-	if (!HAS_RUNTIME_PM(dev))
-		return;
-
-	pm_runtime_get_sync(device);
-	WARN(dev_priv->pm.suspended, "Device still suspended.\n");
-}
-
-void intel_runtime_pm_get_noresume(struct drm_i915_private *dev_priv)
-{
-	struct drm_device *dev = dev_priv->dev;
-	struct device *device = &dev->pdev->dev;
-
-	if (!HAS_RUNTIME_PM(dev))
-		return;
-
-	WARN(dev_priv->pm.suspended, "Getting nosync-ref while suspended.\n");
-	pm_runtime_get_noresume(device);
-}
-
-void intel_runtime_pm_put(struct drm_i915_private *dev_priv)
-{
-	struct drm_device *dev = dev_priv->dev;
-	struct device *device = &dev->pdev->dev;
-
-	if (!HAS_RUNTIME_PM(dev))
-		return;
-
-	pm_runtime_mark_last_busy(device);
-	pm_runtime_put_autosuspend(device);
-}
-
-void intel_init_runtime_pm(struct drm_i915_private *dev_priv)
-{
-	struct drm_device *dev = dev_priv->dev;
-	struct device *device = &dev->pdev->dev;
-
-	if (!HAS_RUNTIME_PM(dev))
-		return;
-
-	pm_runtime_set_active(device);
-
-	/*
-	 * RPM depends on RC6 to save restore the GT HW context, so make RC6 a
-	 * requirement.
-	 */
-	if (!intel_enable_rc6(dev)) {
-		DRM_INFO("RC6 disabled, disabling runtime PM support\n");
-		return;
-	}
-
-	pm_runtime_set_autosuspend_delay(device, 10000); /* 10s */
-	pm_runtime_mark_last_busy(device);
-	pm_runtime_use_autosuspend(device);
-
-	pm_runtime_put_autosuspend(device);
-}
-
-void intel_fini_runtime_pm(struct drm_i915_private *dev_priv)
-{
-	struct drm_device *dev = dev_priv->dev;
-	struct device *device = &dev->pdev->dev;
-
-	if (!HAS_RUNTIME_PM(dev))
-		return;
-
-	if (!intel_enable_rc6(dev))
-		return;
-
-	/* Make sure we're not suspended first. */
-	pm_runtime_get_sync(device);
-	pm_runtime_disable(device);
+	dev_priv->fbc.enabled = dev_priv->display.fbc_enabled(dev_priv->dev);
 }
 
 /* Set up chip specific power management-related functions */
@@ -7198,28 +7077,7 @@ void intel_init_pm(struct drm_device *dev)
 {
 	struct drm_i915_private *dev_priv = dev->dev_private;
 
-	if (HAS_FBC(dev)) {
-		if (INTEL_INFO(dev)->gen >= 7) {
-			dev_priv->display.fbc_enabled = ironlake_fbc_enabled;
-			dev_priv->display.enable_fbc = gen7_enable_fbc;
-			dev_priv->display.disable_fbc = ironlake_disable_fbc;
-		} else if (INTEL_INFO(dev)->gen >= 5) {
-			dev_priv->display.fbc_enabled = ironlake_fbc_enabled;
-			dev_priv->display.enable_fbc = ironlake_enable_fbc;
-			dev_priv->display.disable_fbc = ironlake_disable_fbc;
-		} else if (IS_GM45(dev)) {
-			dev_priv->display.fbc_enabled = g4x_fbc_enabled;
-			dev_priv->display.enable_fbc = g4x_enable_fbc;
-			dev_priv->display.disable_fbc = g4x_disable_fbc;
-		} else {
-			dev_priv->display.fbc_enabled = i8xx_fbc_enabled;
-			dev_priv->display.enable_fbc = i8xx_enable_fbc;
-			dev_priv->display.disable_fbc = i8xx_disable_fbc;
-
-			/* This value was pulled out of someone's hat */
-			I915_WRITE(FBC_CONTROL, 500 << FBC_CTL_INTERVAL_SHIFT);
-		}
-	}
+	intel_init_fbc(dev_priv);
 
 	/* For cxsr */
 	if (IS_PINEVIEW(dev))
@@ -7228,7 +7086,13 @@ void intel_init_pm(struct drm_device *dev)
 		i915_ironlake_get_mem_freq(dev);
 
 	/* For FIFO watermark updates */
-	if (HAS_PCH_SPLIT(dev)) {
+	if (INTEL_INFO(dev)->gen >= 9) {
+		skl_setup_wm_latency(dev);
+
+		dev_priv->display.init_clock_gating = gen9_init_clock_gating;
+		dev_priv->display.update_wm = skl_update_wm;
+		dev_priv->display.update_sprite_wm = skl_update_sprite_wm;
+	} else if (HAS_PCH_SPLIT(dev)) {
 		ilk_setup_wm_latency(dev);
 
 		if ((IS_GEN5(dev) && dev_priv->wm.pri_latency[1] &&
@@ -7309,7 +7173,7 @@ void intel_init_pm(struct drm_device *dev)
 	}
 }
 
-int sandybridge_pcode_read(struct drm_i915_private *dev_priv, u8 mbox, u32 *val)
+int sandybridge_pcode_read(struct drm_i915_private *dev_priv, u32 mbox, u32 *val)
 {
 	WARN_ON(!mutex_is_locked(&dev_priv->rps.hw_lock));
 
@@ -7319,6 +7183,7 @@ int sandybridge_pcode_read(struct drm_i915_private *dev_priv, u8 mbox, u32 *val)
 	}
 
 	I915_WRITE(GEN6_PCODE_DATA, *val);
+	I915_WRITE(GEN6_PCODE_DATA1, 0);
 	I915_WRITE(GEN6_PCODE_MAILBOX, GEN6_PCODE_READY | mbox);
 
 	if (wait_for((I915_READ(GEN6_PCODE_MAILBOX) & GEN6_PCODE_READY) == 0,
@@ -7333,7 +7198,7 @@ int sandybridge_pcode_read(struct drm_i915_private *dev_priv, u8 mbox, u32 *val)
 	return 0;
 }
 
-int sandybridge_pcode_write(struct drm_i915_private *dev_priv, u8 mbox, u32 val)
+int sandybridge_pcode_write(struct drm_i915_private *dev_priv, u32 mbox, u32 val)
 {
 	WARN_ON(!mutex_is_locked(&dev_priv->rps.hw_lock));
 
@@ -7356,99 +7221,66 @@ int sandybridge_pcode_write(struct drm_i915_private *dev_priv, u8 mbox, u32 val)
 	return 0;
 }
 
-static int byt_gpu_freq(struct drm_i915_private *dev_priv, int val)
+static int vlv_gpu_freq_div(unsigned int czclk_freq)
 {
-	int div;
-
-	/* 4 x czclk */
-	switch (dev_priv->mem_freq) {
-	case 800:
-		div = 10;
-		break;
-	case 1066:
-		div = 12;
-		break;
-	case 1333:
-		div = 16;
-		break;
+	switch (czclk_freq) {
+	case 200:
+		return 10;
+	case 267:
+		return 12;
+	case 320:
+	case 333:
+		return 16;
+	case 400:
+		return 20;
 	default:
 		return -1;
 	}
+}
+
+static int byt_gpu_freq(struct drm_i915_private *dev_priv, int val)
+{
+	int div, czclk_freq = DIV_ROUND_CLOSEST(dev_priv->mem_freq, 4);
+
+	div = vlv_gpu_freq_div(czclk_freq);
+	if (div < 0)
+		return div;
 
-	return DIV_ROUND_CLOSEST(dev_priv->mem_freq * (val + 6 - 0xbd), 4 * div);
+	return DIV_ROUND_CLOSEST(czclk_freq * (val + 6 - 0xbd), div);
 }
 
 static int byt_freq_opcode(struct drm_i915_private *dev_priv, int val)
 {
-	int mul;
+	int mul, czclk_freq = DIV_ROUND_CLOSEST(dev_priv->mem_freq, 4);
 
-	/* 4 x czclk */
-	switch (dev_priv->mem_freq) {
-	case 800:
-		mul = 10;
-		break;
-	case 1066:
-		mul = 12;
-		break;
-	case 1333:
-		mul = 16;
-		break;
-	default:
-		return -1;
-	}
+	mul = vlv_gpu_freq_div(czclk_freq);
+	if (mul < 0)
+		return mul;
 
-	return DIV_ROUND_CLOSEST(4 * mul * val, dev_priv->mem_freq) + 0xbd - 6;
+	return DIV_ROUND_CLOSEST(mul * val, czclk_freq) + 0xbd - 6;
 }
 
 static int chv_gpu_freq(struct drm_i915_private *dev_priv, int val)
 {
-	int div, freq;
+	int div, czclk_freq = dev_priv->rps.cz_freq;
 
-	switch (dev_priv->rps.cz_freq) {
-	case 200:
-		div = 5;
-		break;
-	case 267:
-		div = 6;
-		break;
-	case 320:
-	case 333:
-	case 400:
-		div = 8;
-		break;
-	default:
-		return -1;
-	}
-
-	freq = (DIV_ROUND_CLOSEST((dev_priv->rps.cz_freq * val), 2 * div) / 2);
+	div = vlv_gpu_freq_div(czclk_freq);
+	if (div < 0)
+		return div;
 
-	return freq;
+	return DIV_ROUND_CLOSEST(czclk_freq * val, 2 * (div / 2)) / 2;
 }
 
 static int chv_freq_opcode(struct drm_i915_private *dev_priv, int val)
 {
-	int mul, opcode;
+	int mul, czclk_freq = dev_priv->rps.cz_freq;
 
-	switch (dev_priv->rps.cz_freq) {
-	case 200:
-		mul = 5;
-		break;
-	case 267:
-		mul = 6;
-		break;
-	case 320:
-	case 333:
-	case 400:
-		mul = 8;
-		break;
-	default:
-		return -1;
-	}
+	mul = vlv_gpu_freq_div(czclk_freq);
+	if (mul < 0)
+		return mul;
 
 	/* CHV needs even values */
-	opcode = (DIV_ROUND_CLOSEST((val * 2 * mul), dev_priv->rps.cz_freq) * 2);
-
-	return opcode;
+	return DIV_ROUND_CLOSEST(val * 2 * (mul / 2), czclk_freq) * 2;
 }
 
 int vlv_gpu_freq(struct drm_i915_private *dev_priv, int val)
@@ -7485,5 +7317,4 @@ void intel_pm_setup(struct drm_device *dev)
 			  intel_gen6_powersave_work);
 
 	dev_priv->pm.suspended = false;
-	dev_priv->pm._irqs_disabled = false;
 }
diff --git a/drivers/gpu/drm/i915/intel_psr.c b/drivers/gpu/drm/i915/intel_psr.c
new file mode 100644
index 000000000000..716b8a961eea
--- /dev/null
+++ b/drivers/gpu/drm/i915/intel_psr.c
@@ -0,0 +1,481 @@
+/*
+ * Copyright © 2014 Intel Corporation
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
+ * DEALINGS IN THE SOFTWARE.
+ */
+
+/**
+ * DOC: Panel Self Refresh (PSR/SRD)
+ *
+ * Since Haswell, the display controller supports Panel Self-Refresh on display
+ * panels which have a remote frame buffer (RFB) implemented according to the
+ * PSR spec in eDP 1.3. PSR allows the display to go to lower standby states
+ * when the system is idle but the display is on, as it eliminates display
+ * refresh requests to DDR memory completely as long as the frame buffer for
+ * that display is unchanged.
+ *
+ * Panel Self Refresh must be supported by both Hardware (source) and
+ * Panel (sink).
+ *
+ * PSR saves power by caching the framebuffer in the panel RFB, which allows us
+ * to power down the link and memory controller. For DSI panels the same idea
+ * is called "manual mode".
+ *
+ * The implementation uses the hardware-based PSR support which automatically
+ * enters/exits self-refresh mode. The hardware takes care of sending the
+ * required DP aux message and could even retrain the link (that part isn't
+ * enabled yet though). The hardware also keeps track of any frontbuffer
+ * changes to know when to exit self-refresh mode again. Unfortunately that
+ * part doesn't work too well, which is why the i915 PSR support uses the
+ * software frontbuffer tracking to make sure it doesn't miss a screen
+ * update. For this integration intel_psr_invalidate() and intel_psr_flush()
+ * get called by the frontbuffer tracking code. Note that because of locking
+ * issues the self-refresh re-enable code is done from a work queue, which
+ * must be correctly synchronized/cancelled when shutting down the pipe.
+ */
+
+#include <drm/drmP.h>
+
+#include "intel_drv.h"
+#include "i915_drv.h"
+
+static bool is_edp_psr(struct intel_dp *intel_dp)
+{
+	return intel_dp->psr_dpcd[0] & DP_PSR_IS_SUPPORTED;
+}
+
+bool intel_psr_is_enabled(struct drm_device *dev)
+{
+	struct drm_i915_private *dev_priv = dev->dev_private;
+
+	if (!HAS_PSR(dev))
+		return false;
+
+	return I915_READ(EDP_PSR_CTL(dev)) & EDP_PSR_ENABLE;
+}
+
+static void intel_psr_write_vsc(struct intel_dp *intel_dp,
+				    struct edp_vsc_psr *vsc_psr)
+{
+	struct intel_digital_port *dig_port = dp_to_dig_port(intel_dp);
+	struct drm_device *dev = dig_port->base.base.dev;
+	struct drm_i915_private *dev_priv = dev->dev_private;
+	struct intel_crtc *crtc = to_intel_crtc(dig_port->base.base.crtc);
+	u32 ctl_reg = HSW_TVIDEO_DIP_CTL(crtc->config.cpu_transcoder);
+	u32 data_reg = HSW_TVIDEO_DIP_VSC_DATA(crtc->config.cpu_transcoder);
+	uint32_t *data = (uint32_t *) vsc_psr;
+	unsigned int i;
+
+	/* As per BSpec (Pipe Video Data Island Packet), we need to disable
+	   the video DIP being updated before programming the video DIP data
+	   buffer registers for that DIP. */
+	I915_WRITE(ctl_reg, 0);
+	POSTING_READ(ctl_reg);
+
+	for (i = 0; i < VIDEO_DIP_VSC_DATA_SIZE; i += 4) {
+		if (i < sizeof(struct edp_vsc_psr))
+			I915_WRITE(data_reg + i, *data++);
+		else
+			I915_WRITE(data_reg + i, 0);
+	}
+
+	I915_WRITE(ctl_reg, VIDEO_DIP_ENABLE_VSC_HSW);
+	POSTING_READ(ctl_reg);
+}
+
+static void intel_psr_setup_vsc(struct intel_dp *intel_dp)
+{
+	struct edp_vsc_psr psr_vsc;
+
+	/* Prepare VSC packet as per EDP 1.3 spec, Table 3.10 */
+	memset(&psr_vsc, 0, sizeof(psr_vsc));
+	psr_vsc.sdp_header.HB0 = 0;
+	psr_vsc.sdp_header.HB1 = 0x7;
+	psr_vsc.sdp_header.HB2 = 0x2;
+	psr_vsc.sdp_header.HB3 = 0x8;
+	intel_psr_write_vsc(intel_dp, &psr_vsc);
+}
+
+static void intel_psr_enable_sink(struct intel_dp *intel_dp)
+{
+	struct intel_digital_port *dig_port = dp_to_dig_port(intel_dp);
+	struct drm_device *dev = dig_port->base.base.dev;
+	struct drm_i915_private *dev_priv = dev->dev_private;
+	uint32_t aux_clock_divider;
+	int precharge = 0x3;
+	bool only_standby = false;
+	static const uint8_t aux_msg[] = {
+		[0] = DP_AUX_NATIVE_WRITE << 4,
+		[1] = DP_SET_POWER >> 8,
+		[2] = DP_SET_POWER & 0xff,
+		[3] = 1 - 1,
+		[4] = DP_SET_POWER_D0,
+	};
+	int i;
+
+	BUILD_BUG_ON(sizeof(aux_msg) > 20);
+
+	aux_clock_divider = intel_dp->get_aux_clock_divider(intel_dp, 0);
+
+	if (IS_BROADWELL(dev) && dig_port->port != PORT_A)
+		only_standby = true;
+
+	/* Enable PSR in sink */
+	if (intel_dp->psr_dpcd[1] & DP_PSR_NO_TRAIN_ON_EXIT || only_standby)
+		drm_dp_dpcd_writeb(&intel_dp->aux, DP_PSR_EN_CFG,
+				   DP_PSR_ENABLE & ~DP_PSR_MAIN_LINK_ACTIVE);
+	else
+		drm_dp_dpcd_writeb(&intel_dp->aux, DP_PSR_EN_CFG,
+				   DP_PSR_ENABLE | DP_PSR_MAIN_LINK_ACTIVE);
+
+	/* Setup AUX registers */
+	for (i = 0; i < sizeof(aux_msg); i += 4)
+		I915_WRITE(EDP_PSR_AUX_DATA1(dev) + i,
+			   intel_dp_pack_aux(&aux_msg[i], sizeof(aux_msg) - i));
+
+	I915_WRITE(EDP_PSR_AUX_CTL(dev),
+		   DP_AUX_CH_CTL_TIME_OUT_400us |
+		   (sizeof(aux_msg) << DP_AUX_CH_CTL_MESSAGE_SIZE_SHIFT) |
+		   (precharge << DP_AUX_CH_CTL_PRECHARGE_2US_SHIFT) |
+		   (aux_clock_divider << DP_AUX_CH_CTL_BIT_CLOCK_2X_SHIFT));
+}
+
+static void intel_psr_enable_source(struct intel_dp *intel_dp)
+{
+	struct intel_digital_port *dig_port = dp_to_dig_port(intel_dp);
+	struct drm_device *dev = dig_port->base.base.dev;
+	struct drm_i915_private *dev_priv = dev->dev_private;
+	uint32_t max_sleep_time = 0x1f;
+	uint32_t idle_frames = 1;
+	uint32_t val = 0x0;
+	const uint32_t link_entry_time = EDP_PSR_MIN_LINK_ENTRY_TIME_8_LINES;
+	bool only_standby = false;
+
+	if (IS_BROADWELL(dev) && dig_port->port != PORT_A)
+		only_standby = true;
+
+	if (intel_dp->psr_dpcd[1] & DP_PSR_NO_TRAIN_ON_EXIT || only_standby) {
+		val |= EDP_PSR_LINK_STANDBY;
+		val |= EDP_PSR_TP2_TP3_TIME_0us;
+		val |= EDP_PSR_TP1_TIME_0us;
+		val |= EDP_PSR_SKIP_AUX_EXIT;
+		val |= IS_BROADWELL(dev) ? BDW_PSR_SINGLE_FRAME : 0;
+	} else
+		val |= EDP_PSR_LINK_DISABLE;
+
+	I915_WRITE(EDP_PSR_CTL(dev), val |
+		   (IS_BROADWELL(dev) ? 0 : link_entry_time) |
+		   max_sleep_time << EDP_PSR_MAX_SLEEP_TIME_SHIFT |
+		   idle_frames << EDP_PSR_IDLE_FRAME_SHIFT |
+		   EDP_PSR_ENABLE);
+}
+
+static bool intel_psr_match_conditions(struct intel_dp *intel_dp)
+{
+	struct intel_digital_port *dig_port = dp_to_dig_port(intel_dp);
+	struct drm_device *dev = dig_port->base.base.dev;
+	struct drm_i915_private *dev_priv = dev->dev_private;
+	struct drm_crtc *crtc = dig_port->base.base.crtc;
+	struct intel_crtc *intel_crtc = to_intel_crtc(crtc);
+
+	lockdep_assert_held(&dev_priv->psr.lock);
+	WARN_ON(!drm_modeset_is_locked(&dev->mode_config.connection_mutex));
+	WARN_ON(!drm_modeset_is_locked(&crtc->mutex));
+
+	dev_priv->psr.source_ok = false;
+
+	if (IS_HASWELL(dev) && dig_port->port != PORT_A) {
+		DRM_DEBUG_KMS("HSW ties PSR to DDI A (eDP)\n");
+		return false;
+	}
+
+	if (!i915.enable_psr) {
+		DRM_DEBUG_KMS("PSR disabled by flag\n");
+		return false;
+	}
+
+	/* Below limitations aren't valid for Broadwell */
+	if (IS_BROADWELL(dev))
+		goto out;
+
+	if (I915_READ(HSW_STEREO_3D_CTL(intel_crtc->config.cpu_transcoder)) &
+	    S3D_ENABLE) {
+		DRM_DEBUG_KMS("PSR condition failed: Stereo 3D is Enabled\n");
+		return false;
+	}
+
+	if (intel_crtc->config.adjusted_mode.flags & DRM_MODE_FLAG_INTERLACE) {
+		DRM_DEBUG_KMS("PSR condition failed: Interlaced is Enabled\n");
+		return false;
+	}
+
+ out:
+	dev_priv->psr.source_ok = true;
+	return true;
+}
+
+static void intel_psr_do_enable(struct intel_dp *intel_dp)
+{
+	struct intel_digital_port *intel_dig_port = dp_to_dig_port(intel_dp);
+	struct drm_device *dev = intel_dig_port->base.base.dev;
+	struct drm_i915_private *dev_priv = dev->dev_private;
+
+	WARN_ON(I915_READ(EDP_PSR_CTL(dev)) & EDP_PSR_ENABLE);
+	WARN_ON(dev_priv->psr.active);
+	lockdep_assert_held(&dev_priv->psr.lock);
+
+	/* Enable/Re-enable PSR on the host */
+	intel_psr_enable_source(intel_dp);
+
+	dev_priv->psr.active = true;
+}
+
+/**
+ * intel_psr_enable - Enable PSR
+ * @intel_dp: Intel DP
+ *
+ * This function can only be called after the pipe is fully trained and enabled.
+ */
+void intel_psr_enable(struct intel_dp *intel_dp)
+{
+	struct intel_digital_port *intel_dig_port = dp_to_dig_port(intel_dp);
+	struct drm_device *dev = intel_dig_port->base.base.dev;
+	struct drm_i915_private *dev_priv = dev->dev_private;
+
+	if (!HAS_PSR(dev)) {
+		DRM_DEBUG_KMS("PSR not supported on this platform\n");
+		return;
+	}
+
+	if (!is_edp_psr(intel_dp)) {
+		DRM_DEBUG_KMS("PSR not supported by this panel\n");
+		return;
+	}
+
+	mutex_lock(&dev_priv->psr.lock);
+	if (dev_priv->psr.enabled) {
+		DRM_DEBUG_KMS("PSR already in use\n");
+		goto unlock;
+	}
+
+	if (!intel_psr_match_conditions(intel_dp))
+		goto unlock;
+
+	dev_priv->psr.busy_frontbuffer_bits = 0;
+
+	intel_psr_setup_vsc(intel_dp);
+
+	/* Avoid continuous PSR exit by masking memup and hpd */
+	I915_WRITE(EDP_PSR_DEBUG_CTL(dev), EDP_PSR_DEBUG_MASK_MEMUP |
+		   EDP_PSR_DEBUG_MASK_HPD | EDP_PSR_DEBUG_MASK_LPSP);
+
+	/* Enable PSR on the panel */
+	intel_psr_enable_sink(intel_dp);
+
+	dev_priv->psr.enabled = intel_dp;
+unlock:
+	mutex_unlock(&dev_priv->psr.lock);
+}
+
+/**
+ * intel_psr_disable - Disable PSR
+ * @intel_dp: Intel DP
+ *
+ * This function needs to be called before disabling the pipe.
+ */
+void intel_psr_disable(struct intel_dp *intel_dp)
+{
+	struct intel_digital_port *intel_dig_port = dp_to_dig_port(intel_dp);
+	struct drm_device *dev = intel_dig_port->base.base.dev;
+	struct drm_i915_private *dev_priv = dev->dev_private;
+
+	mutex_lock(&dev_priv->psr.lock);
+	if (!dev_priv->psr.enabled) {
+		mutex_unlock(&dev_priv->psr.lock);
+		return;
+	}
+
+	if (dev_priv->psr.active) {
+		I915_WRITE(EDP_PSR_CTL(dev),
+			   I915_READ(EDP_PSR_CTL(dev)) & ~EDP_PSR_ENABLE);
+
+		/* Wait till PSR is idle */
+		if (_wait_for((I915_READ(EDP_PSR_STATUS_CTL(dev)) &
+			       EDP_PSR_STATUS_STATE_MASK) == 0, 2000, 10))
+			DRM_ERROR("Timed out waiting for PSR Idle State\n");
+
+		dev_priv->psr.active = false;
+	} else {
+		WARN_ON(I915_READ(EDP_PSR_CTL(dev)) & EDP_PSR_ENABLE);
+	}
+
+	dev_priv->psr.enabled = NULL;
+	mutex_unlock(&dev_priv->psr.lock);
+
+	cancel_delayed_work_sync(&dev_priv->psr.work);
+}
+
+static void intel_psr_work(struct work_struct *work)
+{
+	struct drm_i915_private *dev_priv =
+		container_of(work, typeof(*dev_priv), psr.work.work);
+	struct intel_dp *intel_dp = dev_priv->psr.enabled;
+
+	/* We have to make sure PSR is ready for re-enable,
+	 * otherwise it stays disabled until the next full enable/disable cycle.
+	 * PSR might take some time to get fully disabled
+	 * and be ready for re-enable.
+	 */
+	if (wait_for((I915_READ(EDP_PSR_STATUS_CTL(dev_priv->dev)) &
+		      EDP_PSR_STATUS_STATE_MASK) == 0, 50)) {
+		DRM_ERROR("Timed out waiting for PSR Idle for re-enable\n");
+		return;
+	}
+
+	mutex_lock(&dev_priv->psr.lock);
+	intel_dp = dev_priv->psr.enabled;
+
+	if (!intel_dp)
+		goto unlock;
+
+	/*
+	 * The delayed work can race with an invalidate hence we need to
+	 * recheck. Since psr_flush first clears this and then reschedules we
+	 * won't ever miss a flush when bailing out here.
+	 */
+	if (dev_priv->psr.busy_frontbuffer_bits)
+		goto unlock;
+
+	intel_psr_do_enable(intel_dp);
+unlock:
+	mutex_unlock(&dev_priv->psr.lock);
+}
+
+static void intel_psr_exit(struct drm_device *dev)
+{
+	struct drm_i915_private *dev_priv = dev->dev_private;
+
+	if (dev_priv->psr.active) {
+		u32 val = I915_READ(EDP_PSR_CTL(dev));
+
+		WARN_ON(!(val & EDP_PSR_ENABLE));
+
+		I915_WRITE(EDP_PSR_CTL(dev), val & ~EDP_PSR_ENABLE);
+
+		dev_priv->psr.active = false;
+	}
+
+}
+
+/**
+ * intel_psr_invalidate - Invalidate PSR
+ * @dev: DRM device
+ * @frontbuffer_bits: frontbuffer plane tracking bits
+ *
+ * Since the hardware frontbuffer tracking has gaps we need to integrate
+ * with the software frontbuffer tracking. This function gets called every
+ * time frontbuffer rendering starts and a buffer gets dirtied. PSR must be
+ * disabled if the frontbuffer mask contains a buffer relevant to PSR.
+ *
+ * Dirty frontbuffers relevant to PSR are tracked in busy_frontbuffer_bits.
+ */
+void intel_psr_invalidate(struct drm_device *dev,
+			      unsigned frontbuffer_bits)
+{
+	struct drm_i915_private *dev_priv = dev->dev_private;
+	struct drm_crtc *crtc;
+	enum pipe pipe;
+
+	mutex_lock(&dev_priv->psr.lock);
+	if (!dev_priv->psr.enabled) {
+		mutex_unlock(&dev_priv->psr.lock);
+		return;
+	}
+
+	crtc = dp_to_dig_port(dev_priv->psr.enabled)->base.base.crtc;
+	pipe = to_intel_crtc(crtc)->pipe;
+
+	intel_psr_exit(dev);
+
+	frontbuffer_bits &= INTEL_FRONTBUFFER_ALL_MASK(pipe);
+
+	dev_priv->psr.busy_frontbuffer_bits |= frontbuffer_bits;
+	mutex_unlock(&dev_priv->psr.lock);
+}
+
+/**
+ * intel_psr_flush - Flush PSR
+ * @dev: DRM device
+ * @frontbuffer_bits: frontbuffer plane tracking bits
+ *
+ * Since the hardware frontbuffer tracking has gaps we need to integrate
+ * with the software frontbuffer tracking. This function gets called every
+ * time frontbuffer rendering has completed and flushed out to memory. PSR
+ * can be enabled again if no other frontbuffer relevant to PSR is dirty.
+ *
+ * Dirty frontbuffers relevant to PSR are tracked in busy_frontbuffer_bits.
+ */
+void intel_psr_flush(struct drm_device *dev,
+			 unsigned frontbuffer_bits)
+{
+	struct drm_i915_private *dev_priv = dev->dev_private;
+	struct drm_crtc *crtc;
+	enum pipe pipe;
+
+	mutex_lock(&dev_priv->psr.lock);
+	if (!dev_priv->psr.enabled) {
+		mutex_unlock(&dev_priv->psr.lock);
+		return;
+	}
+
+	crtc = dp_to_dig_port(dev_priv->psr.enabled)->base.base.crtc;
+	pipe = to_intel_crtc(crtc)->pipe;
+	dev_priv->psr.busy_frontbuffer_bits &= ~frontbuffer_bits;
+
+	/*
+	 * On Haswell, sprite plane updates don't result in a PSR invalidating
+	 * signal in the hardware, which means we need to manually fake this in
+	 * software for all flushes, not just when we've seen a preceding
+	 * invalidation through frontbuffer rendering.
+	 */
+	if (IS_HASWELL(dev) &&
+	    (frontbuffer_bits & INTEL_FRONTBUFFER_SPRITE(pipe)))
+		intel_psr_exit(dev);
+
+	if (!dev_priv->psr.active && !dev_priv->psr.busy_frontbuffer_bits)
+		schedule_delayed_work(&dev_priv->psr.work,
+				      msecs_to_jiffies(100));
+	mutex_unlock(&dev_priv->psr.lock);
+}
+
+/**
+ * intel_psr_init - Init basic PSR work and mutex.
+ * @dev: DRM device
+ *
+ * This function is called only once at driver load to initialize the basic
+ * PSR state (delayed work and mutex).
+ */
+void intel_psr_init(struct drm_device *dev)
+{
+	struct drm_i915_private *dev_priv = dev->dev_private;
+
+	INIT_DELAYED_WORK(&dev_priv->psr.work, intel_psr_work);
+	mutex_init(&dev_priv->psr.lock);
+}
diff --git a/drivers/gpu/drm/i915/intel_renderstate.h b/drivers/gpu/drm/i915/intel_renderstate.h
index 6c792d3a9c9c..5bd69852752c 100644
--- a/drivers/gpu/drm/i915/intel_renderstate.h
+++ b/drivers/gpu/drm/i915/intel_renderstate.h
@@ -29,6 +29,7 @@
 extern const struct intel_renderstate_rodata gen6_null_state;
 extern const struct intel_renderstate_rodata gen7_null_state;
 extern const struct intel_renderstate_rodata gen8_null_state;
+extern const struct intel_renderstate_rodata gen9_null_state;
 
 #define RO_RENDERSTATE(_g)						\
 	const struct intel_renderstate_rodata gen ## _g ## _null_state = { \
diff --git a/drivers/gpu/drm/i915/intel_renderstate_gen8.c b/drivers/gpu/drm/i915/intel_renderstate_gen8.c
index 75ef1b5de45c..78011d73fa9f 100644
--- a/drivers/gpu/drm/i915/intel_renderstate_gen8.c
+++ b/drivers/gpu/drm/i915/intel_renderstate_gen8.c
@@ -1,16 +1,134 @@
 #include "intel_renderstate.h"
 
 static const u32 gen8_null_state_relocs[] = {
-	0x00000048,
-	0x00000050,
-	0x00000060,
-	0x000003ec,
+	0x00000798,
+	0x000007a4,
+	0x000007ac,
+	0x000007bc,
 	-1,
 };
 
 static const u32 gen8_null_state_batch[] = {
+	0x7a000004,
+	0x01000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
 	0x69040000,
-	0x61020001,
+	0x78140000,
+	0x04000000,
+	0x7820000a,
+	0x00000000,
+	0x00000000,
+	0x80000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x78130002,
+	0x00000000,
+	0x00000000,
+	0x02001808,
+	0x781f0002,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x78510009,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x78100007,
+	0x00000000,
+	0x00000000,
+	0x00010000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x781b0007,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000800,
+	0x00000000,
+	0x78110008,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x781e0003,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x781d0007,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x78120002,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x78500003,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x781c0002,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x780c0000,
+	0x00000000,
+	0x78520003,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x78300000,
+	0x08010040,
+	0x78310000,
+	0x1e000000,
+	0x78320000,
+	0x1e000000,
+	0x78330000,
+	0x1e000000,
+	0x79190002,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x791a0002,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x791b0002,
+	0x00000000,
 	0x00000000,
 	0x00000000,
 	0x79120000,
@@ -23,48 +141,435 @@ static const u32 gen8_null_state_batch[] = {
 	0x00000000,
 	0x79160000,
 	0x00000000,
-	0x6101000e,
-	0x00000001,
+	0x78150009,
 	0x00000000,
-	0x00000001,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x78190009,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x781a0009,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x78160009,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x78170009,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x78490001,
+	0x00000000,
+	0x00000000,
+	0x784a0000,
+	0x00000000,
+	0x784b0000,
+	0x00000004,
+	0x79170101,
+	0x00000000,
+	0x00000080,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x79180006,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x79180006,
+	0x20000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x79180006,
+	0x40000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x79180006,
+	0x60000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x6101000e,
 	0x00000001,	 /* reloc */
 	0x00000000,
+	0x00000000,
 	0x00000001,	 /* reloc */
 	0x00000000,
+	0x00000001,	 /* reloc */
 	0x00000000,
+	0x00000001,
 	0x00000000,
 	0x00000001,	 /* reloc */
 	0x00000000,
-	0xfffff001,
 	0x00001001,
-	0xfffff001,
 	0x00001001,
-	0x78230000,
-	0x000006e0,
-	0x78210000,
-	0x00000700,
-	0x78300000,
-	0x08010040,
-	0x78330000,
-	0x08000000,
-	0x78310000,
-	0x08000000,
-	0x78320000,
-	0x08000000,
-	0x78240000,
-	0x00000641,
-	0x780e0000,
-	0x00000601,
+	0x00000001,
+	0x00001001,
+	0x61020001,
+	0x00000000,
+	0x00000000,
+	0x79000002,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x78050006,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x79040002,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x79040002,
+	0x40000000,
+	0x00000000,
+	0x00000000,
+	0x79040002,
+	0x80000000,
+	0x00000000,
+	0x00000000,
+	0x79040002,
+	0xc0000000,
+	0x00000000,
+	0x00000000,
+	0x79080001,
+	0x00000000,
+	0x00000000,
+	0x790a0001,
+	0x00000000,
+	0x00000000,
+	0x78060003,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x78070003,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x78040001,
+	0x00000000,
+	0x00000000,
+	0x79110000,
+	0x00000000,
 	0x780d0000,
 	0x00000000,
-	0x78180000,
-	0x00000001,
-	0x78520003,
+	0x79060000,
 	0x00000000,
+	0x7907001f,
 	0x00000000,
 	0x00000000,
 	0x00000000,
-	0x78190009,
 	0x00000000,
 	0x00000000,
 	0x00000000,
@@ -75,7 +580,6 @@ static const u32 gen8_null_state_batch[] = {
 	0x00000000,
 	0x00000000,
 	0x00000000,
-	0x781b0007,
 	0x00000000,
 	0x00000000,
 	0x00000000,
@@ -84,26 +588,22 @@ static const u32 gen8_null_state_batch[] = {
 	0x00000000,
 	0x00000000,
 	0x00000000,
-	0x78270000,
 	0x00000000,
-	0x782c0000,
 	0x00000000,
-	0x781c0002,
 	0x00000000,
 	0x00000000,
 	0x00000000,
-	0x78160009,
 	0x00000000,
 	0x00000000,
 	0x00000000,
 	0x00000000,
 	0x00000000,
 	0x00000000,
+	0x7902000f,
 	0x00000000,
 	0x00000000,
 	0x00000000,
 	0x00000000,
-	0x78110008,
 	0x00000000,
 	0x00000000,
 	0x00000000,
@@ -113,12 +613,10 @@ static const u32 gen8_null_state_batch[] = {
 	0x00000000,
 	0x00000000,
 	0x00000000,
-	0x78290000,
 	0x00000000,
-	0x782e0000,
 	0x00000000,
-	0x781a0009,
 	0x00000000,
+	0x790c000f,
 	0x00000000,
 	0x00000000,
 	0x00000000,
@@ -128,7 +626,6 @@ static const u32 gen8_null_state_batch[] = {
 	0x00000000,
 	0x00000000,
 	0x00000000,
-	0x781d0007,
 	0x00000000,
 	0x00000000,
 	0x00000000,
@@ -136,153 +633,153 @@ static const u32 gen8_null_state_batch[] = {
 	0x00000000,
 	0x00000000,
 	0x00000000,
+	0x780a0003,
 	0x00000000,
-	0x78280000,
 	0x00000000,
-	0x782d0000,
 	0x00000000,
-	0x78260000,
 	0x00000000,
-	0x782b0000,
+	0x78080083,
+	0x00004000,
 	0x00000000,
-	0x78150009,
 	0x00000000,
 	0x00000000,
+	0x04004000,
 	0x00000000,
 	0x00000000,
 	0x00000000,
+	0x08004000,
 	0x00000000,
 	0x00000000,
 	0x00000000,
+	0x0c004000,
 	0x00000000,
 	0x00000000,
-	0x78100007,
 	0x00000000,
+	0x10004000,
 	0x00000000,
 	0x00000000,
 	0x00000000,
+	0x14004000,
 	0x00000000,
 	0x00000000,
 	0x00000000,
+	0x18004000,
 	0x00000000,
-	0x781e0003,
 	0x00000000,
 	0x00000000,
+	0x1c004000,
 	0x00000000,
 	0x00000000,
-	0x78120002,
 	0x00000000,
+	0x20004000,
 	0x00000000,
 	0x00000000,
-	0x781f0002,
-	0x30400820,
 	0x00000000,
+	0x24004000,
 	0x00000000,
-	0x78510009,
 	0x00000000,
 	0x00000000,
+	0x28004000,
 	0x00000000,
 	0x00000000,
 	0x00000000,
+	0x2c004000,
 	0x00000000,
 	0x00000000,
 	0x00000000,
+	0x30004000,
 	0x00000000,
 	0x00000000,
-	0x78500003,
-	0x00210000,
 	0x00000000,
+	0x34004000,
 	0x00000000,
 	0x00000000,
-	0x78130002,
 	0x00000000,
+	0x38004000,
 	0x00000000,
 	0x00000000,
-	0x782a0000,
-	0x00000480,
-	0x782f0000,
-	0x00000540,
-	0x78140000,
-	0x00000800,
-	0x78170009,
 	0x00000000,
+	0x3c004000,
 	0x00000000,
 	0x00000000,
 	0x00000000,
+	0x40004000,
 	0x00000000,
 	0x00000000,
 	0x00000000,
+	0x44004000,
 	0x00000000,
 	0x00000000,
 	0x00000000,
-	0x7820000a,
-	0x00000580,
+	0x48004000,
 	0x00000000,
-	0x08080000,
 	0x00000000,
 	0x00000000,
-	0x1f000002,
-	0x00060000,
+	0x4c004000,
 	0x00000000,
 	0x00000000,
 	0x00000000,
+	0x50004000,
 	0x00000000,
-	0x784d0000,
-	0x40000000,
-	0x784f0000,
-	0x80000100,
-	0x780f0000,
-	0x00000740,
-	0x78050006,
 	0x00000000,
 	0x00000000,
+	0x54004000,
 	0x00000000,
 	0x00000000,
 	0x00000000,
+	0x58004000,
 	0x00000000,
 	0x00000000,
-	0x78070003,
 	0x00000000,
+	0x5c004000,
 	0x00000000,
 	0x00000000,
 	0x00000000,
-	0x78060003,
+	0x60004000,
 	0x00000000,
 	0x00000000,
 	0x00000000,
+	0x64004000,
 	0x00000000,
-	0x78040001,
 	0x00000000,
-	0x00000001,
-	0x79000002,
-	0xffffffff,
+	0x00000000,
+	0x68004000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x6c004000,
+	0x00000000,
 	0x00000000,
 	0x00000000,
-	0x78080003,
-	0x00006000,
-	0x000005e0,	 /* reloc */
+	0x70004000,
 	0x00000000,
 	0x00000000,
-	0x78090005,
+	0x00000000,
+	0x74004000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x78004000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x7c004000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x80004000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x78090043,
 	0x02000000,
 	0x22220000,
-	0x02f60000,
-	0x11230000,
-	0x02850004,
-	0x11230000,
-	0x784b0000,
-	0x0000000f,
-	0x78490001,
 	0x00000000,
 	0x00000000,
-	0x7b000005,
 	0x00000000,
-	0x00000003,
 	0x00000000,
-	0x00000001,
 	0x00000000,
 	0x00000000,
-	0x05000000,	 /* cmds end */
 	0x00000000,
 	0x00000000,
 	0x00000000,
@@ -297,8 +794,6 @@ static const u32 gen8_null_state_batch[] = {
 	0x00000000,
 	0x00000000,
 	0x00000000,
-	0x000004c0,	 /* state start */
-	0x00000500,
 	0x00000000,
 	0x00000000,
 	0x00000000,
@@ -345,46 +840,65 @@ static const u32 gen8_null_state_batch[] = {
 	0x00000000,
 	0x00000000,
 	0x00000000,
+	0x680b0001,
+	0x78260000,
+	0x00000000,
+	0x78270000,
+	0x00000000,
+	0x78280000,
+	0x00000000,
+	0x78290000,
+	0x00000000,
+	0x782a0000,
+	0x00000000,
+	0x780e0000,
+	0x00000dc1,
+	0x78240000,
+	0x00000e01,
+	0x784f0000,
+	0x80000100,
+	0x784d0000,
+	0x40000000,
+	0x782b0000,
+	0x00000000,
+	0x782c0000,
+	0x00000000,
+	0x782d0000,
 	0x00000000,
+	0x782e0000,
 	0x00000000,
+	0x782f0000,
 	0x00000000,
-	0x00000092,
+	0x780f0000,
 	0x00000000,
+	0x78230000,
+	0x00000e60,
+	0x78210000,
+	0x00000e80,
+	0x7b000005,
+	0x00000004,
+	0x00000001,
 	0x00000000,
+	0x00000001,
 	0x00000000,
 	0x00000000,
+	0x05000000,	 /* cmds end */
 	0x00000000,
 	0x00000000,
 	0x00000000,
 	0x00000000,
 	0x00000000,
 	0x00000000,
+	0x00000000,	 /* state start */
+	0x00000000,
+	0x3f800000,
+	0x3f800000,
+	0x3f800000,
+	0x3f800000,
+	0x00000000,
+	0x00000000,
 	0x00000000,
 	0x00000000,
-	0x0060005a,
-	0x21403ae8,
-	0x3a0000c0,
-	0x008d0040,
-	0x0060005a,
-	0x21603ae8,
-	0x3a0000c0,
-	0x008d0080,
-	0x0060005a,
-	0x21803ae8,
-	0x3a0000d0,
-	0x008d0040,
-	0x0060005a,
-	0x21a03ae8,
-	0x3a0000d0,
-	0x008d0080,
-	0x02800031,
-	0x2e0022e8,
-	0x0e000140,
-	0x08840001,
-	0x05800031,
-	0x200022e0,
-	0x0e000e00,
-	0x90031000,
 	0x00000000,
 	0x00000000,
 	0x00000000,
@@ -410,38 +924,6 @@ static const u32 gen8_null_state_batch[] = {
 	0x00000000,
 	0x00000000,
 	0x00000000,
-	0x06200000,
-	0x00000002,
-	0x06200000,
-	0x00000002,
-	0x06200000,
-	0x00000002,
-	0x06200000,
-	0x00000002,
-	0x06200000,
-	0x00000002,
-	0x06200000,
-	0x00000002,
-	0x06200000,
-	0x00000002,
-	0x06200000,
-	0x00000002,
-	0x06200000,
-	0x00000002,
-	0x06200000,
-	0x00000002,
-	0x06200000,
-	0x00000002,
-	0x06200000,
-	0x00000002,
-	0x06200000,
-	0x00000002,
-	0x06200000,
-	0x00000002,
-	0x06200000,
-	0x00000002,
-	0x06200000,
-	0x00000002,
 	0x00000000,
 	0x00000000,
 	0x00000000,
@@ -449,8 +931,6 @@ static const u32 gen8_null_state_batch[] = {
 	0x00000000,
 	0x00000000,
 	0x00000000,
-	0xf99a130c,
-	0x799a130c,
 	0x00000000,
 	0x00000000,
 	0x00000000,
@@ -466,9 +946,7 @@ static const u32 gen8_null_state_batch[] = {
 	0x00000000,
 	0x00000000,
 	0x00000000,
-	0x3f800000,
 	0x00000000,
-	0x3f800000,
 	0x00000000,
 	0x00000000,
 	0x00000000,
diff --git a/drivers/gpu/drm/i915/intel_renderstate_gen9.c b/drivers/gpu/drm/i915/intel_renderstate_gen9.c
new file mode 100644
index 000000000000..875075373807
--- /dev/null
+++ b/drivers/gpu/drm/i915/intel_renderstate_gen9.c
@@ -0,0 +1,974 @@
+#include "intel_renderstate.h"
+
+static const u32 gen9_null_state_relocs[] = {
+	0x000007a8,
+	0x000007b4,
+	0x000007bc,
+	0x000007cc,
+	-1,
+};
+
+static const u32 gen9_null_state_batch[] = {
+	0x7a000004,
+	0x01000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x69040300,
+	0x78140000,
+	0x04000000,
+	0x7820000a,
+	0x00000000,
+	0x00000000,
+	0x80000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x78130002,
+	0x00000000,
+	0x00000000,
+	0x02001808,
+	0x781f0004,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x78510009,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x78100007,
+	0x00000000,
+	0x00000000,
+	0x00010000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x781b0007,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000800,
+	0x00000000,
+	0x78110008,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x781e0003,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x781d0009,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x78120002,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x78500003,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x781c0002,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x780c0000,
+	0x00000000,
+	0x78520003,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x78300000,
+	0x08010040,
+	0x78310000,
+	0x1e000000,
+	0x78320000,
+	0x1e000000,
+	0x78330000,
+	0x1e000000,
+	0x79190002,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x791a0002,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x791b0002,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x79120000,
+	0x00000000,
+	0x79130000,
+	0x00000000,
+	0x79140000,
+	0x00000000,
+	0x79150000,
+	0x00000000,
+	0x79160000,
+	0x00000000,
+	0x78150009,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x78190009,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x781a0009,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x78160009,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x78170009,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x78490001,
+	0x00000000,
+	0x00000000,
+	0x784a0000,
+	0x00000000,
+	0x784b0000,
+	0x00000004,
+	0x79170101,
+	0x00000000,
+	0x00000080,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x79180006,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x79180006,
+	0x20000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x79180006,
+	0x40000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x79180006,
+	0x60000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x61010011,
+	0x00000001,	 /* reloc */
+	0x00000000,
+	0x00000000,
+	0x00000001,	 /* reloc */
+	0x00000000,
+	0x00000001,	 /* reloc */
+	0x00000000,
+	0x00000001,
+	0x00000000,
+	0x00000001,	 /* reloc */
+	0x00000000,
+	0x00001001,
+	0x00001001,
+	0x00000001,
+	0x00001001,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x61020001,
+	0x00000000,
+	0x00000000,
+	0x79000002,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x78050006,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x79040002,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x79040002,
+	0x40000000,
+	0x00000000,
+	0x00000000,
+	0x79040002,
+	0x80000000,
+	0x00000000,
+	0x00000000,
+	0x79040002,
+	0xc0000000,
+	0x00000000,
+	0x00000000,
+	0x79080001,
+	0x00000000,
+	0x00000000,
+	0x790a0001,
+	0x00000000,
+	0x00000000,
+	0x78060003,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x78070003,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x78040001,
+	0x00000000,
+	0x00000000,
+	0x79110000,
+	0x00000000,
+	0x780d0000,
+	0x00000000,
+	0x79060000,
+	0x00000000,
+	0x7907001f,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x7902000f,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x790c000f,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x780a0003,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x78080083,
+	0x00004000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x04004000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x08004000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x0c004000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x10004000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x14004000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x18004000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x1c004000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x20004000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x24004000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x28004000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x2c004000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x30004000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x34004000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x38004000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x3c004000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x40004000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x44004000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x48004000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x4c004000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x50004000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x54004000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x58004000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x5c004000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x60004000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x64004000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x68004000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x6c004000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x70004000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x74004000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x78004000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x7c004000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x80004000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x78090043,
+	0x02000000,
+	0x22220000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x78550003,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x680b0001,
+	0x780e0000,
+	0x00000e01,
+	0x78240000,
+	0x00000e41,
+	0x784f0000,
+	0x80000100,
+	0x784d0000,
+	0x40000000,
+	0x782b0000,
+	0x00000000,
+	0x782c0000,
+	0x00000000,
+	0x782d0000,
+	0x00000000,
+	0x782e0000,
+	0x00000000,
+	0x782f0000,
+	0x00000000,
+	0x780f0000,
+	0x00000000,
+	0x78230000,
+	0x00000ea0,
+	0x78210000,
+	0x00000ec0,
+	0x78260000,
+	0x00000000,
+	0x78270000,
+	0x00000000,
+	0x78280000,
+	0x00000000,
+	0x78290000,
+	0x00000000,
+	0x782a0000,
+	0x00000000,
+	0x7b000005,
+	0x00000004,
+	0x00000001,
+	0x00000000,
+	0x00000001,
+	0x00000000,
+	0x00000000,
+	0x05000000,	 /* cmds end */
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,	 /* state start */
+	0x00000000,
+	0x3f800000,
+	0x3f800000,
+	0x3f800000,
+	0x3f800000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,	 /* state end */
+};
+
+RO_RENDERSTATE(9);
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
index 0a80e419b589..9f445e9a75d1 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.c
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
@@ -589,14 +589,10 @@ static int init_ring_common(struct intel_engine_cs *ring)
 		goto out;
 	}
 
-	if (!drm_core_check_feature(ring->dev, DRIVER_MODESET))
-		i915_kernel_lost_context(ring->dev);
-	else {
-		ringbuf->head = I915_READ_HEAD(ring);
-		ringbuf->tail = I915_READ_TAIL(ring) & TAIL_ADDR;
-		ringbuf->space = intel_ring_space(ringbuf);
-		ringbuf->last_retired_head = -1;
-	}
+	ringbuf->head = I915_READ_HEAD(ring);
+	ringbuf->tail = I915_READ_TAIL(ring) & TAIL_ADDR;
+	ringbuf->space = intel_ring_space(ringbuf);
+	ringbuf->last_retired_head = -1;
 
 	memset(&ring->hangcheck, 0, sizeof(ring->hangcheck));
 
@@ -665,76 +661,112 @@ err:
 	return ret;
 }
 
-static inline void intel_ring_emit_wa(struct intel_engine_cs *ring,
-				       u32 addr, u32 value)
+static int intel_ring_workarounds_emit(struct intel_engine_cs *ring,
+				       struct intel_context *ctx)
 {
+	int ret, i;
 	struct drm_device *dev = ring->dev;
 	struct drm_i915_private *dev_priv = dev->dev_private;
+	struct i915_workarounds *w = &dev_priv->workarounds;
 
-	if (WARN_ON(dev_priv->num_wa_regs >= I915_MAX_WA_REGS))
-		return;
+	if (WARN_ON(w->count == 0))
+		return 0;
 
-	intel_ring_emit(ring, MI_LOAD_REGISTER_IMM(1));
-	intel_ring_emit(ring, addr);
-	intel_ring_emit(ring, value);
+	ring->gpu_caches_dirty = true;
+	ret = intel_ring_flush_all_caches(ring);
+	if (ret)
+		return ret;
 
-	dev_priv->intel_wa_regs[dev_priv->num_wa_regs].addr = addr;
-	dev_priv->intel_wa_regs[dev_priv->num_wa_regs].mask = value & 0xFFFF;
-	/* value is updated with the status of remaining bits of this
-	 * register when it is read from debugfs file
-	 */
-	dev_priv->intel_wa_regs[dev_priv->num_wa_regs].value = value;
-	dev_priv->num_wa_regs++;
+	ret = intel_ring_begin(ring, (w->count * 2 + 2));
+	if (ret)
+		return ret;
+
+	intel_ring_emit(ring, MI_LOAD_REGISTER_IMM(w->count));
+	for (i = 0; i < w->count; i++) {
+		intel_ring_emit(ring, w->reg[i].addr);
+		intel_ring_emit(ring, w->reg[i].value);
+	}
+	intel_ring_emit(ring, MI_NOOP);
+
+	intel_ring_advance(ring);
 
-	return;
+	ring->gpu_caches_dirty = true;
+	ret = intel_ring_flush_all_caches(ring);
+	if (ret)
+		return ret;
+
+	DRM_DEBUG_DRIVER("Number of Workarounds emitted: %d\n", w->count);
+
+	return 0;
 }
 
+static int wa_add(struct drm_i915_private *dev_priv,
+		  const u32 addr, const u32 mask, const u32 val)
+{
+	const u32 idx = dev_priv->workarounds.count;
+
+	if (WARN_ON(idx >= I915_MAX_WA_REGS))
+		return -ENOSPC;
+
+	dev_priv->workarounds.reg[idx].addr = addr;
+	dev_priv->workarounds.reg[idx].value = val;
+	dev_priv->workarounds.reg[idx].mask = mask;
+
+	dev_priv->workarounds.count++;
+
+	return 0;
+}
+
+#define WA_REG(addr, mask, val) { \
+		const int r = wa_add(dev_priv, (addr), (mask), (val)); \
+		if (r) \
+			return r; \
+	}
+
+#define WA_SET_BIT_MASKED(addr, mask) \
+	WA_REG(addr, (mask), _MASKED_BIT_ENABLE(mask))
+
+#define WA_CLR_BIT_MASKED(addr, mask) \
+	WA_REG(addr, (mask), _MASKED_BIT_DISABLE(mask))
+
+#define WA_SET_FIELD_MASKED(addr, mask, value) \
+	WA_REG(addr, mask, _MASKED_FIELD(mask, value))
+
+#define WA_SET_BIT(addr, mask) WA_REG(addr, mask, I915_READ(addr) | (mask))
+#define WA_CLR_BIT(addr, mask) WA_REG(addr, mask, I915_READ(addr) & ~(mask))
+
+#define WA_WRITE(addr, val) WA_REG(addr, 0xffffffff, val)
+
 static int bdw_init_workarounds(struct intel_engine_cs *ring)
 {
-	int ret;
 	struct drm_device *dev = ring->dev;
 	struct drm_i915_private *dev_priv = dev->dev_private;
 
-	/*
-	 * workarounds applied in this fn are part of register state context,
-	 * they need to be re-initialized followed by gpu reset, suspend/resume,
-	 * module reload.
-	 */
-	dev_priv->num_wa_regs = 0;
-	memset(dev_priv->intel_wa_regs, 0, sizeof(dev_priv->intel_wa_regs));
-
-	/*
-	 * update the number of dwords required based on the
-	 * actual number of workarounds applied
-	 */
-	ret = intel_ring_begin(ring, 18);
-	if (ret)
-		return ret;
-
 	/* WaDisablePartialInstShootdown:bdw */
-	/* WaDisableThreadStallDopClockGating:bdw */
-	/* FIXME: Unclear whether we really need this on production bdw. */
-	intel_ring_emit_wa(ring, GEN8_ROW_CHICKEN,
-			   _MASKED_BIT_ENABLE(PARTIAL_INSTRUCTION_SHOOTDOWN_DISABLE
-					     | STALL_DOP_GATING_DISABLE));
+	/* WaDisableThreadStallDopClockGating:bdw (pre-production) */
+	WA_SET_BIT_MASKED(GEN8_ROW_CHICKEN,
+			  PARTIAL_INSTRUCTION_SHOOTDOWN_DISABLE |
+			  STALL_DOP_GATING_DISABLE);
 
-	/* WaDisableDopClockGating:bdw May not be needed for production */
-	intel_ring_emit_wa(ring, GEN7_ROW_CHICKEN2,
-			   _MASKED_BIT_ENABLE(DOP_CLOCK_GATING_DISABLE));
+	/* WaDisableDopClockGating:bdw */
+	WA_SET_BIT_MASKED(GEN7_ROW_CHICKEN2,
+			  DOP_CLOCK_GATING_DISABLE);
 
-	intel_ring_emit_wa(ring, HALF_SLICE_CHICKEN3,
-			   _MASKED_BIT_ENABLE(GEN8_SAMPLER_POWER_BYPASS_DIS));
+	WA_SET_BIT_MASKED(HALF_SLICE_CHICKEN3,
+			  GEN8_SAMPLER_POWER_BYPASS_DIS);
 
 	/* Use Force Non-Coherent whenever executing a 3D context. This is a
 	 * workaround for a possible hang in the unlikely event a TLB
 	 * invalidation occurs during a PSD flush.
 	 */
-	intel_ring_emit_wa(ring, HDC_CHICKEN0,
-			   _MASKED_BIT_ENABLE(HDC_FORCE_NON_COHERENT));
+	/* WaDisableFenceDestinationToSLM:bdw (GT3 pre-production) */
+	WA_SET_BIT_MASKED(HDC_CHICKEN0,
+			  HDC_FORCE_NON_COHERENT |
+			  (IS_BDW_GT3(dev) ? HDC_FENCE_DEST_SLM_DISABLE : 0));
 
 	/* Wa4x4STCOptimizationDisable:bdw */
-	intel_ring_emit_wa(ring, CACHE_MODE_1,
-			   _MASKED_BIT_ENABLE(GEN8_4x4_STC_OPTIMIZATION_DISABLE));
+	WA_SET_BIT_MASKED(CACHE_MODE_1,
+			  GEN8_4x4_STC_OPTIMIZATION_DISABLE);
 
 	/*
 	 * BSpec recommends 8x4 when MSAA is used,
@@ -744,52 +776,51 @@ static int bdw_init_workarounds(struct intel_engine_cs *ring)
 	 * disable bit, which we don't touch here, but it's good
 	 * to keep in mind (see 3DSTATE_PS and 3DSTATE_WM).
 	 */
-	intel_ring_emit_wa(ring, GEN7_GT_MODE,
-			   GEN6_WIZ_HASHING_MASK | GEN6_WIZ_HASHING_16x4);
-
-	intel_ring_advance(ring);
-
-	DRM_DEBUG_DRIVER("Number of Workarounds applied: %d\n",
-			 dev_priv->num_wa_regs);
+	WA_SET_FIELD_MASKED(GEN7_GT_MODE,
+			    GEN6_WIZ_HASHING_MASK,
+			    GEN6_WIZ_HASHING_16x4);
 
 	return 0;
 }
 
 static int chv_init_workarounds(struct intel_engine_cs *ring)
 {
-	int ret;
 	struct drm_device *dev = ring->dev;
 	struct drm_i915_private *dev_priv = dev->dev_private;
 
-	/*
-	 * workarounds applied in this fn are part of register state context,
-	 * they need to be re-initialized followed by gpu reset, suspend/resume,
-	 * module reload.
+	/* WaDisablePartialInstShootdown:chv */
+	/* WaDisableThreadStallDopClockGating:chv */
+	WA_SET_BIT_MASKED(GEN8_ROW_CHICKEN,
+			  PARTIAL_INSTRUCTION_SHOOTDOWN_DISABLE |
+			  STALL_DOP_GATING_DISABLE);
+
+	/* Use Force Non-Coherent whenever executing a 3D context. This is a
+	 * workaround for a possible hang in the unlikely event a TLB
+	 * invalidation occurs during a PSD flush.
 	 */
-	dev_priv->num_wa_regs = 0;
-	memset(dev_priv->intel_wa_regs, 0, sizeof(dev_priv->intel_wa_regs));
+	/* WaForceEnableNonCoherent:chv */
+	/* WaHdcDisableFetchWhenMasked:chv */
+	WA_SET_BIT_MASKED(HDC_CHICKEN0,
+			  HDC_FORCE_NON_COHERENT |
+			  HDC_DONOT_FETCH_MEM_WHEN_MASKED);
 
-	ret = intel_ring_begin(ring, 12);
-	if (ret)
-		return ret;
+	return 0;
+}
 
-	/* WaDisablePartialInstShootdown:chv */
-	intel_ring_emit_wa(ring, GEN8_ROW_CHICKEN,
-			   _MASKED_BIT_ENABLE(PARTIAL_INSTRUCTION_SHOOTDOWN_DISABLE));
+int init_workarounds_ring(struct intel_engine_cs *ring)
+{
+	struct drm_device *dev = ring->dev;
+	struct drm_i915_private *dev_priv = dev->dev_private;
 
-	/* WaDisableThreadStallDopClockGating:chv */
-	intel_ring_emit_wa(ring, GEN8_ROW_CHICKEN,
-			   _MASKED_BIT_ENABLE(STALL_DOP_GATING_DISABLE));
+	WARN_ON(ring->id != RCS);
 
-	/* WaDisableDopClockGating:chv (pre-production hw) */
-	intel_ring_emit_wa(ring, GEN7_ROW_CHICKEN2,
-			   _MASKED_BIT_ENABLE(DOP_CLOCK_GATING_DISABLE));
+	dev_priv->workarounds.count = 0;
 
-	/* WaDisableSamplerPowerBypass:chv (pre-production hw) */
-	intel_ring_emit_wa(ring, HALF_SLICE_CHICKEN3,
-			   _MASKED_BIT_ENABLE(GEN8_SAMPLER_POWER_BYPASS_DIS));
+	if (IS_BROADWELL(dev))
+		return bdw_init_workarounds(ring);
 
-	intel_ring_advance(ring);
+	if (IS_CHERRYVIEW(dev))
+		return chv_init_workarounds(ring);
 
 	return 0;
 }
@@ -812,7 +843,7 @@ static int init_render_ring(struct intel_engine_cs *ring)
 	 *
 	 * WaDisableAsyncFlipPerfMode:snb,ivb,hsw,vlv,bdw,chv
 	 */
-	if (INTEL_INFO(dev)->gen >= 6)
+	if (INTEL_INFO(dev)->gen >= 6 && INTEL_INFO(dev)->gen < 9)
 		I915_WRITE(MI_MODE, _MASKED_BIT_ENABLE(ASYNC_FLIP_PERF_DISABLE));
 
 	/* Required for the hardware to program scanline values for waiting */
@@ -849,7 +880,7 @@ static int init_render_ring(struct intel_engine_cs *ring)
 	if (HAS_L3_DPF(dev))
 		I915_WRITE_IMR(ring, ~GT_PARITY_ERROR(dev));
 
-	return ret;
+	return init_workarounds_ring(ring);
 }
 
 static void render_ring_cleanup(struct intel_engine_cs *ring)
@@ -1186,7 +1217,7 @@ gen5_ring_get_irq(struct intel_engine_cs *ring)
 	struct drm_i915_private *dev_priv = dev->dev_private;
 	unsigned long flags;
 
-	if (!dev->irq_enabled)
+	if (WARN_ON(!intel_irqs_enabled(dev_priv)))
 		return false;
 
 	spin_lock_irqsave(&dev_priv->irq_lock, flags);
@@ -1217,7 +1248,7 @@ i9xx_ring_get_irq(struct intel_engine_cs *ring)
 	struct drm_i915_private *dev_priv = dev->dev_private;
 	unsigned long flags;
 
-	if (!dev->irq_enabled)
+	if (!intel_irqs_enabled(dev_priv))
 		return false;
 
 	spin_lock_irqsave(&dev_priv->irq_lock, flags);
@@ -1254,7 +1285,7 @@ i8xx_ring_get_irq(struct intel_engine_cs *ring)
 	struct drm_i915_private *dev_priv = dev->dev_private;
 	unsigned long flags;
 
-	if (!dev->irq_enabled)
+	if (!intel_irqs_enabled(dev_priv))
 		return false;
 
 	spin_lock_irqsave(&dev_priv->irq_lock, flags);
@@ -1388,8 +1419,8 @@ gen6_ring_get_irq(struct intel_engine_cs *ring)
 	struct drm_i915_private *dev_priv = dev->dev_private;
 	unsigned long flags;
 
-	if (!dev->irq_enabled)
-	       return false;
+	if (WARN_ON(!intel_irqs_enabled(dev_priv)))
+		return false;
 
 	spin_lock_irqsave(&dev_priv->irq_lock, flags);
 	if (ring->irq_refcount++ == 0) {
@@ -1431,7 +1462,7 @@ hsw_vebox_get_irq(struct intel_engine_cs *ring)
 	struct drm_i915_private *dev_priv = dev->dev_private;
 	unsigned long flags;
 
-	if (!dev->irq_enabled)
+	if (WARN_ON(!intel_irqs_enabled(dev_priv)))
 		return false;
 
 	spin_lock_irqsave(&dev_priv->irq_lock, flags);
@@ -1451,9 +1482,6 @@ hsw_vebox_put_irq(struct intel_engine_cs *ring)
 	struct drm_i915_private *dev_priv = dev->dev_private;
 	unsigned long flags;
 
-	if (!dev->irq_enabled)
-		return;
-
 	spin_lock_irqsave(&dev_priv->irq_lock, flags);
 	if (--ring->irq_refcount == 0) {
 		I915_WRITE_IMR(ring, ~0);
@@ -1469,7 +1497,7 @@ gen8_ring_get_irq(struct intel_engine_cs *ring)
 	struct drm_i915_private *dev_priv = dev->dev_private;
 	unsigned long flags;
 
-	if (!dev->irq_enabled)
+	if (WARN_ON(!intel_irqs_enabled(dev_priv)))
 		return false;
 
 	spin_lock_irqsave(&dev_priv->irq_lock, flags);
@@ -1694,13 +1722,42 @@ static int init_phys_status_page(struct intel_engine_cs *ring)
 	return 0;
 }
 
-void intel_destroy_ringbuffer_obj(struct intel_ringbuffer *ringbuf)
+void intel_unpin_ringbuffer_obj(struct intel_ringbuffer *ringbuf)
 {
-	if (!ringbuf->obj)
-		return;
-
 	iounmap(ringbuf->virtual_start);
+	ringbuf->virtual_start = NULL;
 	i915_gem_object_ggtt_unpin(ringbuf->obj);
+}
+
+int intel_pin_and_map_ringbuffer_obj(struct drm_device *dev,
+				     struct intel_ringbuffer *ringbuf)
+{
+	struct drm_i915_private *dev_priv = to_i915(dev);
+	struct drm_i915_gem_object *obj = ringbuf->obj;
+	int ret;
+
+	ret = i915_gem_obj_ggtt_pin(obj, PAGE_SIZE, PIN_MAPPABLE);
+	if (ret)
+		return ret;
+
+	ret = i915_gem_object_set_to_gtt_domain(obj, true);
+	if (ret) {
+		i915_gem_object_ggtt_unpin(obj);
+		return ret;
+	}
+
+	ringbuf->virtual_start = ioremap_wc(dev_priv->gtt.mappable_base +
+			i915_gem_obj_ggtt_offset(obj), ringbuf->size);
+	if (ringbuf->virtual_start == NULL) {
+		i915_gem_object_ggtt_unpin(obj);
+		return -EINVAL;
+	}
+
+	return 0;
+}
+
+void intel_destroy_ringbuffer_obj(struct intel_ringbuffer *ringbuf)
+{
 	drm_gem_object_unreference(&ringbuf->obj->base);
 	ringbuf->obj = NULL;
 }
@@ -1708,12 +1765,7 @@ void intel_destroy_ringbuffer_obj(struct intel_ringbuffer *ringbuf)
 int intel_alloc_ringbuffer_obj(struct drm_device *dev,
 			       struct intel_ringbuffer *ringbuf)
 {
-	struct drm_i915_private *dev_priv = to_i915(dev);
 	struct drm_i915_gem_object *obj;
-	int ret;
-
-	if (ringbuf->obj)
-		return 0;
 
 	obj = NULL;
 	if (!HAS_LLC(dev))
@@ -1726,30 +1778,9 @@ int intel_alloc_ringbuffer_obj(struct drm_device *dev,
 	/* mark ring buffers as read-only from GPU side by default */
 	obj->gt_ro = 1;
 
-	ret = i915_gem_obj_ggtt_pin(obj, PAGE_SIZE, PIN_MAPPABLE);
-	if (ret)
-		goto err_unref;
-
-	ret = i915_gem_object_set_to_gtt_domain(obj, true);
-	if (ret)
-		goto err_unpin;
-
-	ringbuf->virtual_start =
-		ioremap_wc(dev_priv->gtt.mappable_base + i915_gem_obj_ggtt_offset(obj),
-				ringbuf->size);
-	if (ringbuf->virtual_start == NULL) {
-		ret = -EINVAL;
-		goto err_unpin;
-	}
-
 	ringbuf->obj = obj;
-	return 0;
 
-err_unpin:
-	i915_gem_object_ggtt_unpin(obj);
-err_unref:
-	drm_gem_object_unreference(&obj->base);
-	return ret;
+	return 0;
 }
 
 static int intel_init_ring_buffer(struct drm_device *dev,
@@ -1786,10 +1817,21 @@ static int intel_init_ring_buffer(struct drm_device *dev,
 			goto error;
 	}
 
-	ret = intel_alloc_ringbuffer_obj(dev, ringbuf);
-	if (ret) {
-		DRM_ERROR("Failed to allocate ringbuffer %s: %d\n", ring->name, ret);
-		goto error;
+	if (ringbuf->obj == NULL) {
+		ret = intel_alloc_ringbuffer_obj(dev, ringbuf);
+		if (ret) {
+			DRM_ERROR("Failed to allocate ringbuffer %s: %d\n",
+					ring->name, ret);
+			goto error;
+		}
+
+		ret = intel_pin_and_map_ringbuffer_obj(dev, ringbuf);
+		if (ret) {
+			DRM_ERROR("Failed to pin and map ringbuffer %s: %d\n",
+					ring->name, ret);
+			intel_destroy_ringbuffer_obj(ringbuf);
+			goto error;
+		}
 	}
 
 	/* Workaround an erratum on the i830 which causes a hang if
@@ -1818,15 +1860,19 @@ error:
 
 void intel_cleanup_ring_buffer(struct intel_engine_cs *ring)
 {
-	struct drm_i915_private *dev_priv = to_i915(ring->dev);
-	struct intel_ringbuffer *ringbuf = ring->buffer;
+	struct drm_i915_private *dev_priv;
+	struct intel_ringbuffer *ringbuf;
 
 	if (!intel_ring_initialized(ring))
 		return;
 
+	dev_priv = to_i915(ring->dev);
+	ringbuf = ring->buffer;
+
 	intel_stop_ring_buffer(ring);
 	WARN_ON(!IS_GEN2(ring->dev) && (I915_READ_MODE(ring) & MODE_IDLE) == 0);
 
+	intel_unpin_ringbuffer_obj(ringbuf);
 	intel_destroy_ringbuffer_obj(ringbuf);
 	ring->preallocated_lazy_request = NULL;
 	ring->outstanding_lazy_seqno = 0;
@@ -1912,13 +1958,6 @@ static int ring_wait_for_space(struct intel_engine_cs *ring, int n)
 			break;
 		}
 
-		if (!drm_core_check_feature(dev, DRIVER_MODESET) &&
-		    dev->primary->master) {
-			struct drm_i915_master_private *master_priv = dev->primary->master->driver_priv;
-			if (master_priv->sarea_priv)
-				master_priv->sarea_priv->perf_boxes |= I915_BOX_WAIT;
-		}
-
 		msleep(1);
 
 		if (dev_priv->mm.interruptible && signal_pending(current)) {
@@ -2229,6 +2268,7 @@ static int gen6_ring_flush(struct intel_engine_cs *ring,
 			   u32 invalidate, u32 flush)
 {
 	struct drm_device *dev = ring->dev;
+	struct drm_i915_private *dev_priv = dev->dev_private;
 	uint32_t cmd;
 	int ret;
 
@@ -2259,8 +2299,12 @@ static int gen6_ring_flush(struct intel_engine_cs *ring,
 	}
 	intel_ring_advance(ring);
 
-	if (IS_GEN7(dev) && !invalidate && flush)
-		return gen7_ring_fbc_flush(ring, FBC_REND_CACHE_CLEAN);
+	if (!invalidate && flush) {
+		if (IS_GEN7(dev))
+			return gen7_ring_fbc_flush(ring, FBC_REND_CACHE_CLEAN);
+		else if (IS_BROADWELL(dev))
+			dev_priv->fbc.need_sw_cache_clean = true;
+	}
 
 	return 0;
 }
@@ -2293,10 +2337,8 @@ int intel_init_render_ring_buffer(struct drm_device *dev)
 					dev_priv->semaphore_obj = obj;
 			}
 		}
-		if (IS_CHERRYVIEW(dev))
-			ring->init_context = chv_init_workarounds;
-		else
-			ring->init_context = bdw_init_workarounds;
+
+		ring->init_context = intel_ring_workarounds_emit;
 		ring->add_request = gen6_add_request;
 		ring->flush = gen8_render_ring_flush;
 		ring->irq_get = gen8_ring_get_irq;
@@ -2406,91 +2448,6 @@ int intel_init_render_ring_buffer(struct drm_device *dev)
 	return intel_init_ring_buffer(dev, ring);
 }
 
-int intel_render_ring_init_dri(struct drm_device *dev, u64 start, u32 size)
-{
-	struct drm_i915_private *dev_priv = dev->dev_private;
-	struct intel_engine_cs *ring = &dev_priv->ring[RCS];
-	struct intel_ringbuffer *ringbuf = ring->buffer;
-	int ret;
-
-	if (ringbuf == NULL) {
-		ringbuf = kzalloc(sizeof(*ringbuf), GFP_KERNEL);
-		if (!ringbuf)
-			return -ENOMEM;
-		ring->buffer = ringbuf;
-	}
-
-	ring->name = "render ring";
-	ring->id = RCS;
-	ring->mmio_base = RENDER_RING_BASE;
-
-	if (INTEL_INFO(dev)->gen >= 6) {
-		/* non-kms not supported on gen6+ */
-		ret = -ENODEV;
-		goto err_ringbuf;
-	}
-
-	/* Note: gem is not supported on gen5/ilk without kms (the corresponding
-	 * gem_init ioctl returns with -ENODEV). Hence we do not need to set up
-	 * the special gen5 functions. */
-	ring->add_request = i9xx_add_request;
-	if (INTEL_INFO(dev)->gen < 4)
-		ring->flush = gen2_render_ring_flush;
-	else
-		ring->flush = gen4_render_ring_flush;
-	ring->get_seqno = ring_get_seqno;
-	ring->set_seqno = ring_set_seqno;
-	if (IS_GEN2(dev)) {
-		ring->irq_get = i8xx_ring_get_irq;
-		ring->irq_put = i8xx_ring_put_irq;
-	} else {
-		ring->irq_get = i9xx_ring_get_irq;
-		ring->irq_put = i9xx_ring_put_irq;
-	}
-	ring->irq_enable_mask = I915_USER_INTERRUPT;
-	ring->write_tail = ring_write_tail;
-	if (INTEL_INFO(dev)->gen >= 4)
-		ring->dispatch_execbuffer = i965_dispatch_execbuffer;
-	else if (IS_I830(dev) || IS_845G(dev))
-		ring->dispatch_execbuffer = i830_dispatch_execbuffer;
-	else
-		ring->dispatch_execbuffer = i915_dispatch_execbuffer;
-	ring->init = init_render_ring;
-	ring->cleanup = render_ring_cleanup;
-
-	ring->dev = dev;
-	INIT_LIST_HEAD(&ring->active_list);
-	INIT_LIST_HEAD(&ring->request_list);
-
-	ringbuf->size = size;
-	ringbuf->effective_size = ringbuf->size;
-	if (IS_I830(ring->dev) || IS_845G(ring->dev))
-		ringbuf->effective_size -= 2 * CACHELINE_BYTES;
-
-	ringbuf->virtual_start = ioremap_wc(start, size);
-	if (ringbuf->virtual_start == NULL) {
-		DRM_ERROR("can not ioremap virtual address for"
-			  " ring buffer\n");
-		ret = -ENOMEM;
-		goto err_ringbuf;
-	}
-
-	if (!I915_NEED_GFX_HWS(dev)) {
-		ret = init_phys_status_page(ring);
-		if (ret)
-			goto err_vstart;
-	}
-
-	return 0;
-
-err_vstart:
-	iounmap(ringbuf->virtual_start);
-err_ringbuf:
-	kfree(ringbuf);
-	ring->buffer = NULL;
-	return ret;
-}
-
 int intel_init_bsd_ring_buffer(struct drm_device *dev)
 {
 	struct drm_i915_private *dev_priv = dev->dev_private;
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
index 96479c89f4bd..fe426cff598b 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.h
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
@@ -148,7 +148,8 @@ struct  intel_engine_cs {
 
 	int		(*init)(struct intel_engine_cs *ring);
 
-	int		(*init_context)(struct intel_engine_cs *ring);
+	int		(*init_context)(struct intel_engine_cs *ring,
+					struct intel_context *ctx);
 
 	void		(*write_tail)(struct intel_engine_cs *ring,
 				      u32 value);
@@ -235,6 +236,7 @@ struct  intel_engine_cs {
 	/* Execlists */
 	spinlock_t execlist_lock;
 	struct list_head execlist_queue;
+	struct list_head execlist_retired_req_list;
 	u8 next_context_status_buffer;
 	u32             irq_keep_mask; /* bitmask for interrupts that should not be masked */
 	int		(*emit_request)(struct intel_ringbuffer *ringbuf);
@@ -381,6 +383,9 @@ intel_write_status_page(struct intel_engine_cs *ring,
 #define I915_GEM_HWS_SCRATCH_INDEX	0x30
 #define I915_GEM_HWS_SCRATCH_ADDR (I915_GEM_HWS_SCRATCH_INDEX << MI_STORE_DWORD_INDEX_SHIFT)
 
+void intel_unpin_ringbuffer_obj(struct intel_ringbuffer *ringbuf);
+int intel_pin_and_map_ringbuffer_obj(struct drm_device *dev,
+				     struct intel_ringbuffer *ringbuf);
 void intel_destroy_ringbuffer_obj(struct intel_ringbuffer *ringbuf);
 int intel_alloc_ringbuffer_obj(struct drm_device *dev,
 			       struct intel_ringbuffer *ringbuf);
@@ -424,6 +429,8 @@ int intel_init_vebox_ring_buffer(struct drm_device *dev);
 u64 intel_ring_get_active_head(struct intel_engine_cs *ring);
 void intel_ring_setup_status_page(struct intel_engine_cs *ring);
 
+int init_workarounds_ring(struct intel_engine_cs *ring);
+
 static inline u32 intel_ring_get_tail(struct intel_ringbuffer *ringbuf)
 {
 	return ringbuf->tail;
@@ -441,7 +448,4 @@ static inline void i915_trace_irq_get(struct intel_engine_cs *ring, u32 seqno)
 		ring->trace_irq_seqno = seqno;
 }
 
-/* DRI warts */
-int intel_render_ring_init_dri(struct drm_device *dev, u64 start, u32 size);
-
 #endif /* _INTEL_RINGBUFFER_H_ */
diff --git a/drivers/gpu/drm/i915/intel_runtime_pm.c b/drivers/gpu/drm/i915/intel_runtime_pm.c
new file mode 100644
index 000000000000..f5a78d53e297
--- /dev/null
+++ b/drivers/gpu/drm/i915/intel_runtime_pm.c
@@ -0,0 +1,1406 @@
+/*
+ * Copyright © 2012-2014 Intel Corporation
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
+ * IN THE SOFTWARE.
+ *
+ * Authors:
+ *    Eugeni Dodonov <eugeni.dodonov@intel.com>
+ *    Daniel Vetter <daniel.vetter@ffwll.ch>
+ *
+ */
+
+#include <linux/pm_runtime.h>
+#include <linux/vgaarb.h>
+
+#include "i915_drv.h"
+#include "intel_drv.h"
+#include <drm/i915_powerwell.h>
+
+/**
+ * DOC: runtime pm
+ *
+ * The i915 driver supports dynamic enabling and disabling of entire hardware
+ * blocks at runtime. This is especially important on the display side where
+ * software is supposed to control many power gates manually on recent hardware,
+ * since on the GT side a lot of the power management is done by the hardware.
+ * But even there some manual control at the device level is required.
+ *
+ * Since i915 supports a diverse set of platforms with a unified codebase and
+ * hardware engineers just love to shuffle functionality around between power
+ * domains there's a sizeable amount of indirection required. This file provides
+ * generic functions to the driver for grabbing and releasing references for
+ * abstract power domains. It then maps those to the actual power wells
+ * present for a given platform.
+ */
+
+static struct i915_power_domains *hsw_pwr;
+
+#define for_each_power_well(i, power_well, domain_mask, power_domains)	\
+	for (i = 0;							\
+	     i < (power_domains)->power_well_count &&			\
+		 ((power_well) = &(power_domains)->power_wells[i]);	\
+	     i++)							\
+		if ((power_well)->domains & (domain_mask))
+
+#define for_each_power_well_rev(i, power_well, domain_mask, power_domains) \
+	for (i = (power_domains)->power_well_count - 1;			 \
+	     i >= 0 && ((power_well) = &(power_domains)->power_wells[i]);\
+	     i--)							 \
+		if ((power_well)->domains & (domain_mask))
+
+/*
+ * We should only use the power well if we explicitly asked the hardware to
+ * enable it, so check if it's enabled and also check if we've requested it to
+ * be enabled.
+ */
+static bool hsw_power_well_enabled(struct drm_i915_private *dev_priv,
+				   struct i915_power_well *power_well)
+{
+	return I915_READ(HSW_PWR_WELL_DRIVER) ==
+		     (HSW_PWR_WELL_ENABLE_REQUEST | HSW_PWR_WELL_STATE_ENABLED);
+}
+
+/**
+ * __intel_display_power_is_enabled - unlocked check for a power domain
+ * @dev_priv: i915 device instance
+ * @domain: power domain to check
+ *
+ * This is the unlocked version of intel_display_power_is_enabled() and should
+ * only be used from error capture and recovery code where deadlocks are
+ * possible.
+ *
+ * Returns:
+ * True when the power domain is enabled, false otherwise.
+ */
+bool __intel_display_power_is_enabled(struct drm_i915_private *dev_priv,
+				      enum intel_display_power_domain domain)
+{
+	struct i915_power_domains *power_domains;
+	struct i915_power_well *power_well;
+	bool is_enabled;
+	int i;
+
+	if (dev_priv->pm.suspended)
+		return false;
+
+	power_domains = &dev_priv->power_domains;
+
+	is_enabled = true;
+
+	for_each_power_well_rev(i, power_well, BIT(domain), power_domains) {
+		if (power_well->always_on)
+			continue;
+
+		if (!power_well->hw_enabled) {
+			is_enabled = false;
+			break;
+		}
+	}
+
+	return is_enabled;
+}
+
+/**
+ * intel_display_power_is_enabled - check for a power domain
+ * @dev_priv: i915 device instance
+ * @domain: power domain to check
+ *
+ * This function can be used to check the hw power domain state. It is mostly
+ * used in hardware state readout functions. Everywhere else code should rely
+ * upon explicit power domain reference counting to ensure that the hardware
+ * block is powered up before accessing it.
+ *
+ * Callers must hold the relevant modesetting locks to ensure that concurrent
+ * threads can't disable the power well while the caller tries to read a few
+ * registers.
+ *
+ * Returns:
+ * True when the power domain is enabled, false otherwise.
+ */
+bool intel_display_power_is_enabled(struct drm_i915_private *dev_priv,
+				    enum intel_display_power_domain domain)
+{
+	struct i915_power_domains *power_domains;
+	bool ret;
+
+	power_domains = &dev_priv->power_domains;
+
+	mutex_lock(&power_domains->lock);
+	ret = __intel_display_power_is_enabled(dev_priv, domain);
+	mutex_unlock(&power_domains->lock);
+
+	return ret;
+}
+
+/**
+ * intel_display_set_init_power - set the initial power domain state
+ * @dev_priv: i915 device instance
+ * @enable: whether to enable or disable the initial power domain state
+ *
+ * For simplicity our driver load/unload and system suspend/resume code assumes
+ * that all power domains are always enabled. This function controls the state
+ * of this little hack. While the initial power domain state is enabled runtime
+ * pm is effectively disabled.
+ */
+void intel_display_set_init_power(struct drm_i915_private *dev_priv,
+				  bool enable)
+{
+	if (dev_priv->power_domains.init_power_on == enable)
+		return;
+
+	if (enable)
+		intel_display_power_get(dev_priv, POWER_DOMAIN_INIT);
+	else
+		intel_display_power_put(dev_priv, POWER_DOMAIN_INIT);
+
+	dev_priv->power_domains.init_power_on = enable;
+}
+
+/*
+ * Starting with Haswell, we have a "Power Down Well" that can be turned off
+ * when not needed anymore. We have 4 registers that can request the power well
+ * to be enabled, and it will only be disabled if none of the registers is
+ * requesting it to be enabled.
+ */
+static void hsw_power_well_post_enable(struct drm_i915_private *dev_priv)
+{
+	struct drm_device *dev = dev_priv->dev;
+
+	/*
+	 * After we re-enable the power well, if we touch VGA register 0x3d5
+	 * we'll get unclaimed register interrupts. This stops after we write
+	 * anything to the VGA MSR register. The vgacon module uses this
+	 * register all the time, so if we unbind our driver and, as a
+	 * consequence, bind vgacon, we'll get stuck in an infinite loop at
+	 * console_unlock(). So here we touch the VGA MSR register, making
+	 * sure vgacon can keep working normally without triggering interrupts
+	 * and error messages.
+	 */
+	vga_get_uninterruptible(dev->pdev, VGA_RSRC_LEGACY_IO);
+	outb(inb(VGA_MSR_READ), VGA_MSR_WRITE);
+	vga_put(dev->pdev, VGA_RSRC_LEGACY_IO);
+
+	if (IS_BROADWELL(dev) || (INTEL_INFO(dev)->gen >= 9))
+		gen8_irq_power_well_post_enable(dev_priv);
+}
+
+static void hsw_set_power_well(struct drm_i915_private *dev_priv,
+			       struct i915_power_well *power_well, bool enable)
+{
+	bool is_enabled, enable_requested;
+	uint32_t tmp;
+
+	tmp = I915_READ(HSW_PWR_WELL_DRIVER);
+	is_enabled = tmp & HSW_PWR_WELL_STATE_ENABLED;
+	enable_requested = tmp & HSW_PWR_WELL_ENABLE_REQUEST;
+
+	if (enable) {
+		if (!enable_requested)
+			I915_WRITE(HSW_PWR_WELL_DRIVER,
+				   HSW_PWR_WELL_ENABLE_REQUEST);
+
+		if (!is_enabled) {
+			DRM_DEBUG_KMS("Enabling power well\n");
+			if (wait_for((I915_READ(HSW_PWR_WELL_DRIVER) &
+				      HSW_PWR_WELL_STATE_ENABLED), 20))
+				DRM_ERROR("Timeout enabling power well\n");
+			hsw_power_well_post_enable(dev_priv);
+		}
+
+	} else {
+		if (enable_requested) {
+			I915_WRITE(HSW_PWR_WELL_DRIVER, 0);
+			POSTING_READ(HSW_PWR_WELL_DRIVER);
+			DRM_DEBUG_KMS("Requesting to disable the power well\n");
+		}
+	}
+}
+
+static void hsw_power_well_sync_hw(struct drm_i915_private *dev_priv,
+				   struct i915_power_well *power_well)
+{
+	hsw_set_power_well(dev_priv, power_well, power_well->count > 0);
+
+	/*
+	 * We're taking over the BIOS, so clear any requests made by it since
+	 * the driver is in charge now.
+	 */
+	if (I915_READ(HSW_PWR_WELL_BIOS) & HSW_PWR_WELL_ENABLE_REQUEST)
+		I915_WRITE(HSW_PWR_WELL_BIOS, 0);
+}
+
+static void hsw_power_well_enable(struct drm_i915_private *dev_priv,
+				  struct i915_power_well *power_well)
+{
+	hsw_set_power_well(dev_priv, power_well, true);
+}
+
+static void hsw_power_well_disable(struct drm_i915_private *dev_priv,
+				   struct i915_power_well *power_well)
+{
+	hsw_set_power_well(dev_priv, power_well, false);
+}
+
+static void i9xx_always_on_power_well_noop(struct drm_i915_private *dev_priv,
+					   struct i915_power_well *power_well)
+{
+}
+
+static bool i9xx_always_on_power_well_enabled(struct drm_i915_private *dev_priv,
+					     struct i915_power_well *power_well)
+{
+	return true;
+}
+
+static void vlv_set_power_well(struct drm_i915_private *dev_priv,
+			       struct i915_power_well *power_well, bool enable)
+{
+	enum punit_power_well power_well_id = power_well->data;
+	u32 mask;
+	u32 state;
+	u32 ctrl;
+
+	mask = PUNIT_PWRGT_MASK(power_well_id);
+	state = enable ? PUNIT_PWRGT_PWR_ON(power_well_id) :
+			 PUNIT_PWRGT_PWR_GATE(power_well_id);
+
+	mutex_lock(&dev_priv->rps.hw_lock);
+
+#define COND \
+	((vlv_punit_read(dev_priv, PUNIT_REG_PWRGT_STATUS) & mask) == state)
+
+	if (COND)
+		goto out;
+
+	ctrl = vlv_punit_read(dev_priv, PUNIT_REG_PWRGT_CTRL);
+	ctrl &= ~mask;
+	ctrl |= state;
+	vlv_punit_write(dev_priv, PUNIT_REG_PWRGT_CTRL, ctrl);
+
+	if (wait_for(COND, 100))
+		DRM_ERROR("timeout setting power well state %08x (%08x)\n",
+			  state,
+			  vlv_punit_read(dev_priv, PUNIT_REG_PWRGT_CTRL));
+
+#undef COND
+
+out:
+	mutex_unlock(&dev_priv->rps.hw_lock);
+}
+
+static void vlv_power_well_sync_hw(struct drm_i915_private *dev_priv,
+				   struct i915_power_well *power_well)
+{
+	vlv_set_power_well(dev_priv, power_well, power_well->count > 0);
+}
+
+static void vlv_power_well_enable(struct drm_i915_private *dev_priv,
+				  struct i915_power_well *power_well)
+{
+	vlv_set_power_well(dev_priv, power_well, true);
+}
+
+static void vlv_power_well_disable(struct drm_i915_private *dev_priv,
+				   struct i915_power_well *power_well)
+{
+	vlv_set_power_well(dev_priv, power_well, false);
+}
+
+static bool vlv_power_well_enabled(struct drm_i915_private *dev_priv,
+				   struct i915_power_well *power_well)
+{
+	int power_well_id = power_well->data;
+	bool enabled = false;
+	u32 mask;
+	u32 state;
+	u32 ctrl;
+
+	mask = PUNIT_PWRGT_MASK(power_well_id);
+	ctrl = PUNIT_PWRGT_PWR_ON(power_well_id);
+
+	mutex_lock(&dev_priv->rps.hw_lock);
+
+	state = vlv_punit_read(dev_priv, PUNIT_REG_PWRGT_STATUS) & mask;
+	/*
+	 * We only ever set the power-on and power-gate states, anything
+	 * else is unexpected.
+	 */
+	WARN_ON(state != PUNIT_PWRGT_PWR_ON(power_well_id) &&
+		state != PUNIT_PWRGT_PWR_GATE(power_well_id));
+	if (state == ctrl)
+		enabled = true;
+
+	/*
+	 * A transient state at this point would mean some unexpected party
+	 * is poking at the power controls too.
+	 */
+	ctrl = vlv_punit_read(dev_priv, PUNIT_REG_PWRGT_CTRL) & mask;
+	WARN_ON(ctrl != state);
+
+	mutex_unlock(&dev_priv->rps.hw_lock);
+
+	return enabled;
+}
+
+static void vlv_display_power_well_enable(struct drm_i915_private *dev_priv,
+					  struct i915_power_well *power_well)
+{
+	WARN_ON_ONCE(power_well->data != PUNIT_POWER_WELL_DISP2D);
+
+	vlv_set_power_well(dev_priv, power_well, true);
+
+	spin_lock_irq(&dev_priv->irq_lock);
+	valleyview_enable_display_irqs(dev_priv);
+	spin_unlock_irq(&dev_priv->irq_lock);
+
+	/*
+	 * During driver initialization/resume we can avoid restoring the
+	 * part of the HW/SW state that will be inited anyway explicitly.
+	 */
+	if (dev_priv->power_domains.initializing)
+		return;
+
+	intel_hpd_init(dev_priv);
+
+	i915_redisable_vga_power_on(dev_priv->dev);
+}
+
+static void vlv_display_power_well_disable(struct drm_i915_private *dev_priv,
+					   struct i915_power_well *power_well)
+{
+	WARN_ON_ONCE(power_well->data != PUNIT_POWER_WELL_DISP2D);
+
+	spin_lock_irq(&dev_priv->irq_lock);
+	valleyview_disable_display_irqs(dev_priv);
+	spin_unlock_irq(&dev_priv->irq_lock);
+
+	vlv_set_power_well(dev_priv, power_well, false);
+
+	vlv_power_sequencer_reset(dev_priv);
+}
+
+static void vlv_dpio_cmn_power_well_enable(struct drm_i915_private *dev_priv,
+					   struct i915_power_well *power_well)
+{
+	WARN_ON_ONCE(power_well->data != PUNIT_POWER_WELL_DPIO_CMN_BC);
+
+	/*
+	 * Enable the CRI clock source so we can get at the
+	 * display and the reference clock for VGA
+	 * hotplug / manual detection.
+	 */
+	I915_WRITE(DPLL(PIPE_B), I915_READ(DPLL(PIPE_B)) |
+		   DPLL_REFA_CLK_ENABLE_VLV | DPLL_INTEGRATED_CRI_CLK_VLV);
+	udelay(1); /* >10ns for cmnreset, >0ns for sidereset */
+
+	vlv_set_power_well(dev_priv, power_well, true);
+
+	/*
+	 * From VLV2A0_DP_eDP_DPIO_driver_vbios_notes_10.docx -
+	 *  6.	De-assert cmn_reset/side_reset. Same as VLV X0.
+	 *   a.	GUnit 0x2110 bit[0] set to 1 (def 0)
+	 *   b.	The other bits such as sfr settings / modesel may all
+	 *	be set to 0.
+	 *
+	 * This should only be done on init and resume from S3 with
+	 * both PLLs disabled, or we risk losing DPIO and PLL
+	 * synchronization.
+	 */
+	I915_WRITE(DPIO_CTL, I915_READ(DPIO_CTL) | DPIO_CMNRST);
+}
+
+static void vlv_dpio_cmn_power_well_disable(struct drm_i915_private *dev_priv,
+					    struct i915_power_well *power_well)
+{
+	enum pipe pipe;
+
+	WARN_ON_ONCE(power_well->data != PUNIT_POWER_WELL_DPIO_CMN_BC);
+
+	for_each_pipe(dev_priv, pipe)
+		assert_pll_disabled(dev_priv, pipe);
+
+	/* Assert common reset */
+	I915_WRITE(DPIO_CTL, I915_READ(DPIO_CTL) & ~DPIO_CMNRST);
+
+	vlv_set_power_well(dev_priv, power_well, false);
+}
+
+static void chv_dpio_cmn_power_well_enable(struct drm_i915_private *dev_priv,
+					   struct i915_power_well *power_well)
+{
+	enum dpio_phy phy;
+
+	WARN_ON_ONCE(power_well->data != PUNIT_POWER_WELL_DPIO_CMN_BC &&
+		     power_well->data != PUNIT_POWER_WELL_DPIO_CMN_D);
+
+	/*
+	 * Enable the CRI clock source so we can get at the
+	 * display and the reference clock for VGA
+	 * hotplug / manual detection.
+	 */
+	if (power_well->data == PUNIT_POWER_WELL_DPIO_CMN_BC) {
+		phy = DPIO_PHY0;
+		I915_WRITE(DPLL(PIPE_B), I915_READ(DPLL(PIPE_B)) |
+			   DPLL_REFA_CLK_ENABLE_VLV);
+		I915_WRITE(DPLL(PIPE_B), I915_READ(DPLL(PIPE_B)) |
+			   DPLL_REFA_CLK_ENABLE_VLV | DPLL_INTEGRATED_CRI_CLK_VLV);
+	} else {
+		phy = DPIO_PHY1;
+		I915_WRITE(DPLL(PIPE_C), I915_READ(DPLL(PIPE_C)) |
+			   DPLL_REFA_CLK_ENABLE_VLV | DPLL_INTEGRATED_CRI_CLK_VLV);
+	}
+	udelay(1); /* >10ns for cmnreset, >0ns for sidereset */
+	vlv_set_power_well(dev_priv, power_well, true);
+
+	/* Poll for phypwrgood signal */
+	if (wait_for(I915_READ(DISPLAY_PHY_STATUS) & PHY_POWERGOOD(phy), 1))
+		DRM_ERROR("Display PHY %d is not powered up\n", phy);
+
+	I915_WRITE(DISPLAY_PHY_CONTROL, I915_READ(DISPLAY_PHY_CONTROL) |
+		   PHY_COM_LANE_RESET_DEASSERT(phy));
+}
+
+static void chv_dpio_cmn_power_well_disable(struct drm_i915_private *dev_priv,
+					    struct i915_power_well *power_well)
+{
+	enum dpio_phy phy;
+
+	WARN_ON_ONCE(power_well->data != PUNIT_POWER_WELL_DPIO_CMN_BC &&
+		     power_well->data != PUNIT_POWER_WELL_DPIO_CMN_D);
+
+	if (power_well->data == PUNIT_POWER_WELL_DPIO_CMN_BC) {
+		phy = DPIO_PHY0;
+		assert_pll_disabled(dev_priv, PIPE_A);
+		assert_pll_disabled(dev_priv, PIPE_B);
+	} else {
+		phy = DPIO_PHY1;
+		assert_pll_disabled(dev_priv, PIPE_C);
+	}
+
+	I915_WRITE(DISPLAY_PHY_CONTROL, I915_READ(DISPLAY_PHY_CONTROL) &
+		   ~PHY_COM_LANE_RESET_DEASSERT(phy));
+
+	vlv_set_power_well(dev_priv, power_well, false);
+}
+
+static bool chv_pipe_power_well_enabled(struct drm_i915_private *dev_priv,
+					struct i915_power_well *power_well)
+{
+	enum pipe pipe = power_well->data;
+	bool enabled;
+	u32 state, ctrl;
+
+	mutex_lock(&dev_priv->rps.hw_lock);
+
+	state = vlv_punit_read(dev_priv, PUNIT_REG_DSPFREQ) & DP_SSS_MASK(pipe);
+	/*
+	 * We only ever set the power-on and power-gate states, anything
+	 * else is unexpected.
+	 */
+	WARN_ON(state != DP_SSS_PWR_ON(pipe) && state != DP_SSS_PWR_GATE(pipe));
+	enabled = state == DP_SSS_PWR_ON(pipe);
+
+	/*
+	 * A transient state at this point would mean some unexpected party
+	 * is poking at the power controls too.
+	 */
+	ctrl = vlv_punit_read(dev_priv, PUNIT_REG_DSPFREQ) & DP_SSC_MASK(pipe);
+	WARN_ON(ctrl << 16 != state);
+
+	mutex_unlock(&dev_priv->rps.hw_lock);
+
+	return enabled;
+}
+
+static void chv_set_pipe_power_well(struct drm_i915_private *dev_priv,
+				    struct i915_power_well *power_well,
+				    bool enable)
+{
+	enum pipe pipe = power_well->data;
+	u32 state;
+	u32 ctrl;
+
+	state = enable ? DP_SSS_PWR_ON(pipe) : DP_SSS_PWR_GATE(pipe);
+
+	mutex_lock(&dev_priv->rps.hw_lock);
+
+#define COND \
+	((vlv_punit_read(dev_priv, PUNIT_REG_DSPFREQ) & DP_SSS_MASK(pipe)) == state)
+
+	if (COND)
+		goto out;
+
+	ctrl = vlv_punit_read(dev_priv, PUNIT_REG_DSPFREQ);
+	ctrl &= ~DP_SSC_MASK(pipe);
+	ctrl |= enable ? DP_SSC_PWR_ON(pipe) : DP_SSC_PWR_GATE(pipe);
+	vlv_punit_write(dev_priv, PUNIT_REG_DSPFREQ, ctrl);
+
+	if (wait_for(COND, 100))
+		DRM_ERROR("timeout setting power well state %08x (%08x)\n",
+			  state,
+			  vlv_punit_read(dev_priv, PUNIT_REG_DSPFREQ));
+
+#undef COND
+
+out:
+	mutex_unlock(&dev_priv->rps.hw_lock);
+}
+
+static void chv_pipe_power_well_sync_hw(struct drm_i915_private *dev_priv,
+					struct i915_power_well *power_well)
+{
+	chv_set_pipe_power_well(dev_priv, power_well, power_well->count > 0);
+}
+
+static void chv_pipe_power_well_enable(struct drm_i915_private *dev_priv,
+				       struct i915_power_well *power_well)
+{
+	WARN_ON_ONCE(power_well->data != PIPE_A &&
+		     power_well->data != PIPE_B &&
+		     power_well->data != PIPE_C);
+
+	chv_set_pipe_power_well(dev_priv, power_well, true);
+
+	if (power_well->data == PIPE_A) {
+		spin_lock_irq(&dev_priv->irq_lock);
+		valleyview_enable_display_irqs(dev_priv);
+		spin_unlock_irq(&dev_priv->irq_lock);
+
+		/*
+		 * During driver initialization/resume we can avoid restoring the
+		 * part of the HW/SW state that will be inited anyway explicitly.
+		 */
+		if (dev_priv->power_domains.initializing)
+			return;
+
+		intel_hpd_init(dev_priv);
+
+		i915_redisable_vga_power_on(dev_priv->dev);
+	}
+}
+
+static void chv_pipe_power_well_disable(struct drm_i915_private *dev_priv,
+					struct i915_power_well *power_well)
+{
+	WARN_ON_ONCE(power_well->data != PIPE_A &&
+		     power_well->data != PIPE_B &&
+		     power_well->data != PIPE_C);
+
+	if (power_well->data == PIPE_A) {
+		spin_lock_irq(&dev_priv->irq_lock);
+		valleyview_disable_display_irqs(dev_priv);
+		spin_unlock_irq(&dev_priv->irq_lock);
+	}
+
+	chv_set_pipe_power_well(dev_priv, power_well, false);
+
+	if (power_well->data == PIPE_A)
+		vlv_power_sequencer_reset(dev_priv);
+}
+
+static void check_power_well_state(struct drm_i915_private *dev_priv,
+				   struct i915_power_well *power_well)
+{
+	bool enabled = power_well->ops->is_enabled(dev_priv, power_well);
+
+	if (power_well->always_on || !i915.disable_power_well) {
+		if (!enabled)
+			goto mismatch;
+
+		return;
+	}
+
+	if (enabled != (power_well->count > 0))
+		goto mismatch;
+
+	return;
+
+mismatch:
+	WARN(1, "state mismatch for '%s' (always_on %d hw state %d use-count %d disable_power_well %d)\n",
+		  power_well->name, power_well->always_on, enabled,
+		  power_well->count, i915.disable_power_well);
+}
+
+/**
+ * intel_display_power_get - grab a power domain reference
+ * @dev_priv: i915 device instance
+ * @domain: power domain to reference
+ *
+ * This function grabs a power domain reference for @domain and ensures that the
+ * power domain and all its parents are powered up. Therefore users should only
+ * grab a reference to the innermost power domain they need.
+ *
+ * Any power domain reference obtained by this function must have a symmetric
+ * call to intel_display_power_put() to release the reference again.
+ */
+void intel_display_power_get(struct drm_i915_private *dev_priv,
+			     enum intel_display_power_domain domain)
+{
+	struct i915_power_domains *power_domains;
+	struct i915_power_well *power_well;
+	int i;
+
+	intel_runtime_pm_get(dev_priv);
+
+	power_domains = &dev_priv->power_domains;
+
+	mutex_lock(&power_domains->lock);
+
+	for_each_power_well(i, power_well, BIT(domain), power_domains) {
+		if (!power_well->count++) {
+			DRM_DEBUG_KMS("enabling %s\n", power_well->name);
+			power_well->ops->enable(dev_priv, power_well);
+			power_well->hw_enabled = true;
+		}
+
+		check_power_well_state(dev_priv, power_well);
+	}
+
+	power_domains->domain_use_count[domain]++;
+
+	mutex_unlock(&power_domains->lock);
+}
+
+/**
+ * intel_display_power_put - release a power domain reference
+ * @dev_priv: i915 device instance
+ * @domain: power domain to reference
+ *
+ * This function drops the power domain reference obtained by
+ * intel_display_power_get() and might power down the corresponding hardware
+ * block right away if this is the last reference.
+ */
+void intel_display_power_put(struct drm_i915_private *dev_priv,
+			     enum intel_display_power_domain domain)
+{
+	struct i915_power_domains *power_domains;
+	struct i915_power_well *power_well;
+	int i;
+
+	power_domains = &dev_priv->power_domains;
+
+	mutex_lock(&power_domains->lock);
+
+	WARN_ON(!power_domains->domain_use_count[domain]);
+	power_domains->domain_use_count[domain]--;
+
+	for_each_power_well_rev(i, power_well, BIT(domain), power_domains) {
+		WARN_ON(!power_well->count);
+
+		if (!--power_well->count && i915.disable_power_well) {
+			DRM_DEBUG_KMS("disabling %s\n", power_well->name);
+			power_well->hw_enabled = false;
+			power_well->ops->disable(dev_priv, power_well);
+		}
+
+		check_power_well_state(dev_priv, power_well);
+	}
+
+	mutex_unlock(&power_domains->lock);
+
+	intel_runtime_pm_put(dev_priv);
+}
+
+#define POWER_DOMAIN_MASK (BIT(POWER_DOMAIN_NUM) - 1)
+
+#define HSW_ALWAYS_ON_POWER_DOMAINS (			\
+	BIT(POWER_DOMAIN_PIPE_A) |			\
+	BIT(POWER_DOMAIN_TRANSCODER_EDP) |		\
+	BIT(POWER_DOMAIN_PORT_DDI_A_2_LANES) |		\
+	BIT(POWER_DOMAIN_PORT_DDI_A_4_LANES) |		\
+	BIT(POWER_DOMAIN_PORT_DDI_B_2_LANES) |		\
+	BIT(POWER_DOMAIN_PORT_DDI_B_4_LANES) |		\
+	BIT(POWER_DOMAIN_PORT_DDI_C_2_LANES) |		\
+	BIT(POWER_DOMAIN_PORT_DDI_C_4_LANES) |		\
+	BIT(POWER_DOMAIN_PORT_DDI_D_2_LANES) |		\
+	BIT(POWER_DOMAIN_PORT_DDI_D_4_LANES) |		\
+	BIT(POWER_DOMAIN_PORT_CRT) |			\
+	BIT(POWER_DOMAIN_PLLS) |			\
+	BIT(POWER_DOMAIN_INIT))
+#define HSW_DISPLAY_POWER_DOMAINS (				\
+	(POWER_DOMAIN_MASK & ~HSW_ALWAYS_ON_POWER_DOMAINS) |	\
+	BIT(POWER_DOMAIN_INIT))
+
+#define BDW_ALWAYS_ON_POWER_DOMAINS (			\
+	HSW_ALWAYS_ON_POWER_DOMAINS |			\
+	BIT(POWER_DOMAIN_PIPE_A_PANEL_FITTER))
+#define BDW_DISPLAY_POWER_DOMAINS (				\
+	(POWER_DOMAIN_MASK & ~BDW_ALWAYS_ON_POWER_DOMAINS) |	\
+	BIT(POWER_DOMAIN_INIT))
+
+#define VLV_ALWAYS_ON_POWER_DOMAINS	BIT(POWER_DOMAIN_INIT)
+#define VLV_DISPLAY_POWER_DOMAINS	POWER_DOMAIN_MASK
+
+#define VLV_DPIO_CMN_BC_POWER_DOMAINS (		\
+	BIT(POWER_DOMAIN_PORT_DDI_B_2_LANES) |	\
+	BIT(POWER_DOMAIN_PORT_DDI_B_4_LANES) |	\
+	BIT(POWER_DOMAIN_PORT_DDI_C_2_LANES) |	\
+	BIT(POWER_DOMAIN_PORT_DDI_C_4_LANES) |	\
+	BIT(POWER_DOMAIN_PORT_CRT) |		\
+	BIT(POWER_DOMAIN_INIT))
+
+#define VLV_DPIO_TX_B_LANES_01_POWER_DOMAINS (	\
+	BIT(POWER_DOMAIN_PORT_DDI_B_2_LANES) |	\
+	BIT(POWER_DOMAIN_PORT_DDI_B_4_LANES) |	\
+	BIT(POWER_DOMAIN_INIT))
+
+#define VLV_DPIO_TX_B_LANES_23_POWER_DOMAINS (	\
+	BIT(POWER_DOMAIN_PORT_DDI_B_4_LANES) |	\
+	BIT(POWER_DOMAIN_INIT))
+
+#define VLV_DPIO_TX_C_LANES_01_POWER_DOMAINS (	\
+	BIT(POWER_DOMAIN_PORT_DDI_C_2_LANES) |	\
+	BIT(POWER_DOMAIN_PORT_DDI_C_4_LANES) |	\
+	BIT(POWER_DOMAIN_INIT))
+
+#define VLV_DPIO_TX_C_LANES_23_POWER_DOMAINS (	\
+	BIT(POWER_DOMAIN_PORT_DDI_C_4_LANES) |	\
+	BIT(POWER_DOMAIN_INIT))
+
+#define CHV_PIPE_A_POWER_DOMAINS (	\
+	BIT(POWER_DOMAIN_PIPE_A) |	\
+	BIT(POWER_DOMAIN_INIT))
+
+#define CHV_PIPE_B_POWER_DOMAINS (	\
+	BIT(POWER_DOMAIN_PIPE_B) |	\
+	BIT(POWER_DOMAIN_INIT))
+
+#define CHV_PIPE_C_POWER_DOMAINS (	\
+	BIT(POWER_DOMAIN_PIPE_C) |	\
+	BIT(POWER_DOMAIN_INIT))
+
+#define CHV_DPIO_CMN_BC_POWER_DOMAINS (		\
+	BIT(POWER_DOMAIN_PORT_DDI_B_2_LANES) |	\
+	BIT(POWER_DOMAIN_PORT_DDI_B_4_LANES) |	\
+	BIT(POWER_DOMAIN_PORT_DDI_C_2_LANES) |	\
+	BIT(POWER_DOMAIN_PORT_DDI_C_4_LANES) |	\
+	BIT(POWER_DOMAIN_INIT))
+
+#define CHV_DPIO_CMN_D_POWER_DOMAINS (		\
+	BIT(POWER_DOMAIN_PORT_DDI_D_2_LANES) |	\
+	BIT(POWER_DOMAIN_PORT_DDI_D_4_LANES) |	\
+	BIT(POWER_DOMAIN_INIT))
+
+#define CHV_DPIO_TX_D_LANES_01_POWER_DOMAINS (	\
+	BIT(POWER_DOMAIN_PORT_DDI_D_2_LANES) |	\
+	BIT(POWER_DOMAIN_PORT_DDI_D_4_LANES) |	\
+	BIT(POWER_DOMAIN_INIT))
+
+#define CHV_DPIO_TX_D_LANES_23_POWER_DOMAINS (	\
+	BIT(POWER_DOMAIN_PORT_DDI_D_4_LANES) |	\
+	BIT(POWER_DOMAIN_INIT))
+
+static const struct i915_power_well_ops i9xx_always_on_power_well_ops = {
+	.sync_hw = i9xx_always_on_power_well_noop,
+	.enable = i9xx_always_on_power_well_noop,
+	.disable = i9xx_always_on_power_well_noop,
+	.is_enabled = i9xx_always_on_power_well_enabled,
+};
+
+static const struct i915_power_well_ops chv_pipe_power_well_ops = {
+	.sync_hw = chv_pipe_power_well_sync_hw,
+	.enable = chv_pipe_power_well_enable,
+	.disable = chv_pipe_power_well_disable,
+	.is_enabled = chv_pipe_power_well_enabled,
+};
+
+static const struct i915_power_well_ops chv_dpio_cmn_power_well_ops = {
+	.sync_hw = vlv_power_well_sync_hw,
+	.enable = chv_dpio_cmn_power_well_enable,
+	.disable = chv_dpio_cmn_power_well_disable,
+	.is_enabled = vlv_power_well_enabled,
+};
+
+static struct i915_power_well i9xx_always_on_power_well[] = {
+	{
+		.name = "always-on",
+		.always_on = 1,
+		.domains = POWER_DOMAIN_MASK,
+		.ops = &i9xx_always_on_power_well_ops,
+	},
+};
+
+static const struct i915_power_well_ops hsw_power_well_ops = {
+	.sync_hw = hsw_power_well_sync_hw,
+	.enable = hsw_power_well_enable,
+	.disable = hsw_power_well_disable,
+	.is_enabled = hsw_power_well_enabled,
+};
+
+static struct i915_power_well hsw_power_wells[] = {
+	{
+		.name = "always-on",
+		.always_on = 1,
+		.domains = HSW_ALWAYS_ON_POWER_DOMAINS,
+		.ops = &i9xx_always_on_power_well_ops,
+	},
+	{
+		.name = "display",
+		.domains = HSW_DISPLAY_POWER_DOMAINS,
+		.ops = &hsw_power_well_ops,
+	},
+};
+
+static struct i915_power_well bdw_power_wells[] = {
+	{
+		.name = "always-on",
+		.always_on = 1,
+		.domains = BDW_ALWAYS_ON_POWER_DOMAINS,
+		.ops = &i9xx_always_on_power_well_ops,
+	},
+	{
+		.name = "display",
+		.domains = BDW_DISPLAY_POWER_DOMAINS,
+		.ops = &hsw_power_well_ops,
+	},
+};
+
+static const struct i915_power_well_ops vlv_display_power_well_ops = {
+	.sync_hw = vlv_power_well_sync_hw,
+	.enable = vlv_display_power_well_enable,
+	.disable = vlv_display_power_well_disable,
+	.is_enabled = vlv_power_well_enabled,
+};
+
+static const struct i915_power_well_ops vlv_dpio_cmn_power_well_ops = {
+	.sync_hw = vlv_power_well_sync_hw,
+	.enable = vlv_dpio_cmn_power_well_enable,
+	.disable = vlv_dpio_cmn_power_well_disable,
+	.is_enabled = vlv_power_well_enabled,
+};
+
+static const struct i915_power_well_ops vlv_dpio_power_well_ops = {
+	.sync_hw = vlv_power_well_sync_hw,
+	.enable = vlv_power_well_enable,
+	.disable = vlv_power_well_disable,
+	.is_enabled = vlv_power_well_enabled,
+};
+
+static struct i915_power_well vlv_power_wells[] = {
+	{
+		.name = "always-on",
+		.always_on = 1,
+		.domains = VLV_ALWAYS_ON_POWER_DOMAINS,
+		.ops = &i9xx_always_on_power_well_ops,
+	},
+	{
+		.name = "display",
+		.domains = VLV_DISPLAY_POWER_DOMAINS,
+		.data = PUNIT_POWER_WELL_DISP2D,
+		.ops = &vlv_display_power_well_ops,
+	},
+	{
+		.name = "dpio-tx-b-01",
+		.domains = VLV_DPIO_TX_B_LANES_01_POWER_DOMAINS |
+			   VLV_DPIO_TX_B_LANES_23_POWER_DOMAINS |
+			   VLV_DPIO_TX_C_LANES_01_POWER_DOMAINS |
+			   VLV_DPIO_TX_C_LANES_23_POWER_DOMAINS,
+		.ops = &vlv_dpio_power_well_ops,
+		.data = PUNIT_POWER_WELL_DPIO_TX_B_LANES_01,
+	},
+	{
+		.name = "dpio-tx-b-23",
+		.domains = VLV_DPIO_TX_B_LANES_01_POWER_DOMAINS |
+			   VLV_DPIO_TX_B_LANES_23_POWER_DOMAINS |
+			   VLV_DPIO_TX_C_LANES_01_POWER_DOMAINS |
+			   VLV_DPIO_TX_C_LANES_23_POWER_DOMAINS,
+		.ops = &vlv_dpio_power_well_ops,
+		.data = PUNIT_POWER_WELL_DPIO_TX_B_LANES_23,
+	},
+	{
+		.name = "dpio-tx-c-01",
+		.domains = VLV_DPIO_TX_B_LANES_01_POWER_DOMAINS |
+			   VLV_DPIO_TX_B_LANES_23_POWER_DOMAINS |
+			   VLV_DPIO_TX_C_LANES_01_POWER_DOMAINS |
+			   VLV_DPIO_TX_C_LANES_23_POWER_DOMAINS,
+		.ops = &vlv_dpio_power_well_ops,
+		.data = PUNIT_POWER_WELL_DPIO_TX_C_LANES_01,
+	},
+	{
+		.name = "dpio-tx-c-23",
+		.domains = VLV_DPIO_TX_B_LANES_01_POWER_DOMAINS |
+			   VLV_DPIO_TX_B_LANES_23_POWER_DOMAINS |
+			   VLV_DPIO_TX_C_LANES_01_POWER_DOMAINS |
+			   VLV_DPIO_TX_C_LANES_23_POWER_DOMAINS,
+		.ops = &vlv_dpio_power_well_ops,
+		.data = PUNIT_POWER_WELL_DPIO_TX_C_LANES_23,
+	},
+	{
+		.name = "dpio-common",
+		.domains = VLV_DPIO_CMN_BC_POWER_DOMAINS,
+		.data = PUNIT_POWER_WELL_DPIO_CMN_BC,
+		.ops = &vlv_dpio_cmn_power_well_ops,
+	},
+};
+
+static struct i915_power_well chv_power_wells[] = {
+	{
+		.name = "always-on",
+		.always_on = 1,
+		.domains = VLV_ALWAYS_ON_POWER_DOMAINS,
+		.ops = &i9xx_always_on_power_well_ops,
+	},
+#if 0
+	{
+		.name = "display",
+		.domains = VLV_DISPLAY_POWER_DOMAINS,
+		.data = PUNIT_POWER_WELL_DISP2D,
+		.ops = &vlv_display_power_well_ops,
+	},
+#endif
+	{
+		.name = "pipe-a",
+		/*
+		 * FIXME: pipe A power well seems to be the new disp2d well.
+		 * At least all registers seem to be housed there. Figure
+		 * out if this is a temporary situation in pre-production
+		 * hardware or a permanent state of affairs.
+		 */
+		.domains = CHV_PIPE_A_POWER_DOMAINS | VLV_DISPLAY_POWER_DOMAINS,
+		.data = PIPE_A,
+		.ops = &chv_pipe_power_well_ops,
+	},
+#if 0
+	{
+		.name = "pipe-b",
+		.domains = CHV_PIPE_B_POWER_DOMAINS,
+		.data = PIPE_B,
+		.ops = &chv_pipe_power_well_ops,
+	},
+	{
+		.name = "pipe-c",
+		.domains = CHV_PIPE_C_POWER_DOMAINS,
+		.data = PIPE_C,
+		.ops = &chv_pipe_power_well_ops,
+	},
+#endif
+	{
+		.name = "dpio-common-bc",
+		/*
+		 * XXX: cmnreset for one PHY seems to disturb the other.
+		 * As a workaround keep both powered on at the same
+		 * time for now.
+		 */
+		.domains = CHV_DPIO_CMN_BC_POWER_DOMAINS | CHV_DPIO_CMN_D_POWER_DOMAINS,
+		.data = PUNIT_POWER_WELL_DPIO_CMN_BC,
+		.ops = &chv_dpio_cmn_power_well_ops,
+	},
+	{
+		.name = "dpio-common-d",
+		/*
+		 * XXX: cmnreset for one PHY seems to disturb the other.
+		 * As a workaround keep both powered on at the same
+		 * time for now.
+		 */
+		.domains = CHV_DPIO_CMN_BC_POWER_DOMAINS | CHV_DPIO_CMN_D_POWER_DOMAINS,
+		.data = PUNIT_POWER_WELL_DPIO_CMN_D,
+		.ops = &chv_dpio_cmn_power_well_ops,
+	},
+#if 0
+	{
+		.name = "dpio-tx-b-01",
+		.domains = VLV_DPIO_TX_B_LANES_01_POWER_DOMAINS |
+			   VLV_DPIO_TX_B_LANES_23_POWER_DOMAINS,
+		.ops = &vlv_dpio_power_well_ops,
+		.data = PUNIT_POWER_WELL_DPIO_TX_B_LANES_01,
+	},
+	{
+		.name = "dpio-tx-b-23",
+		.domains = VLV_DPIO_TX_B_LANES_01_POWER_DOMAINS |
+			   VLV_DPIO_TX_B_LANES_23_POWER_DOMAINS,
+		.ops = &vlv_dpio_power_well_ops,
+		.data = PUNIT_POWER_WELL_DPIO_TX_B_LANES_23,
+	},
+	{
+		.name = "dpio-tx-c-01",
+		.domains = VLV_DPIO_TX_C_LANES_01_POWER_DOMAINS |
+			   VLV_DPIO_TX_C_LANES_23_POWER_DOMAINS,
+		.ops = &vlv_dpio_power_well_ops,
+		.data = PUNIT_POWER_WELL_DPIO_TX_C_LANES_01,
+	},
+	{
+		.name = "dpio-tx-c-23",
+		.domains = VLV_DPIO_TX_C_LANES_01_POWER_DOMAINS |
+			   VLV_DPIO_TX_C_LANES_23_POWER_DOMAINS,
+		.ops = &vlv_dpio_power_well_ops,
+		.data = PUNIT_POWER_WELL_DPIO_TX_C_LANES_23,
+	},
+	{
+		.name = "dpio-tx-d-01",
+		.domains = CHV_DPIO_TX_D_LANES_01_POWER_DOMAINS |
+			   CHV_DPIO_TX_D_LANES_23_POWER_DOMAINS,
+		.ops = &vlv_dpio_power_well_ops,
+		.data = PUNIT_POWER_WELL_DPIO_TX_D_LANES_01,
+	},
+	{
+		.name = "dpio-tx-d-23",
+		.domains = CHV_DPIO_TX_D_LANES_01_POWER_DOMAINS |
+			   CHV_DPIO_TX_D_LANES_23_POWER_DOMAINS,
+		.ops = &vlv_dpio_power_well_ops,
+		.data = PUNIT_POWER_WELL_DPIO_TX_D_LANES_23,
+	},
+#endif
+};
+
+static struct i915_power_well *lookup_power_well(struct drm_i915_private *dev_priv,
+						 enum punit_power_well power_well_id)
+{
+	struct i915_power_domains *power_domains = &dev_priv->power_domains;
+	struct i915_power_well *power_well;
+	int i;
+
+	for_each_power_well(i, power_well, POWER_DOMAIN_MASK, power_domains) {
+		if (power_well->data == power_well_id)
+			return power_well;
+	}
+
+	return NULL;
+}
+
+#define set_power_wells(power_domains, __power_wells) ({		\
+	(power_domains)->power_wells = (__power_wells);			\
+	(power_domains)->power_well_count = ARRAY_SIZE(__power_wells);	\
+})
+
+/**
+ * intel_power_domains_init - initializes the power domain structures
+ * @dev_priv: i915 device instance
+ *
+ * Initializes the power domain structures for @dev_priv depending upon the
+ * supported platform.
+ */
+int intel_power_domains_init(struct drm_i915_private *dev_priv)
+{
+	struct i915_power_domains *power_domains = &dev_priv->power_domains;
+
+	mutex_init(&power_domains->lock);
+
+	/*
+	 * The enabling order will be from lower to higher indexed wells,
+	 * the disabling order is reversed.
+	 */
+	if (IS_HASWELL(dev_priv->dev)) {
+		set_power_wells(power_domains, hsw_power_wells);
+		hsw_pwr = power_domains;
+	} else if (IS_BROADWELL(dev_priv->dev)) {
+		set_power_wells(power_domains, bdw_power_wells);
+		hsw_pwr = power_domains;
+	} else if (IS_CHERRYVIEW(dev_priv->dev)) {
+		set_power_wells(power_domains, chv_power_wells);
+	} else if (IS_VALLEYVIEW(dev_priv->dev)) {
+		set_power_wells(power_domains, vlv_power_wells);
+	} else {
+		set_power_wells(power_domains, i9xx_always_on_power_well);
+	}
+
+	return 0;
+}
+
+static void intel_runtime_pm_disable(struct drm_i915_private *dev_priv)
+{
+	struct drm_device *dev = dev_priv->dev;
+	struct device *device = &dev->pdev->dev;
+
+	if (!HAS_RUNTIME_PM(dev))
+		return;
+
+	if (!intel_enable_rc6(dev))
+		return;
+
+	/* Make sure we're not suspended first. */
+	pm_runtime_get_sync(device);
+	pm_runtime_disable(device);
+}
+
+/**
+ * intel_power_domains_fini - finalizes the power domain structures
+ * @dev_priv: i915 device instance
+ *
+ * Finalizes the power domain structures for @dev_priv depending upon the
+ * supported platform. This function also disables runtime pm and ensures that
+ * the device stays powered up so that the driver can be reloaded.
+ */
+void intel_power_domains_fini(struct drm_i915_private *dev_priv)
+{
+	intel_runtime_pm_disable(dev_priv);
+
+	/* The i915.ko module is still not prepared to be loaded when
+	 * the power well is not enabled, so just enable it in case
+	 * we're going to unload/reload. */
+	intel_display_set_init_power(dev_priv, true);
+
+	hsw_pwr = NULL;
+}
+
+static void intel_power_domains_resume(struct drm_i915_private *dev_priv)
+{
+	struct i915_power_domains *power_domains = &dev_priv->power_domains;
+	struct i915_power_well *power_well;
+	int i;
+
+	mutex_lock(&power_domains->lock);
+	for_each_power_well(i, power_well, POWER_DOMAIN_MASK, power_domains) {
+		power_well->ops->sync_hw(dev_priv, power_well);
+		power_well->hw_enabled = power_well->ops->is_enabled(dev_priv,
+								     power_well);
+	}
+	mutex_unlock(&power_domains->lock);
+}
+
+static void vlv_cmnlane_wa(struct drm_i915_private *dev_priv)
+{
+	struct i915_power_well *cmn =
+		lookup_power_well(dev_priv, PUNIT_POWER_WELL_DPIO_CMN_BC);
+	struct i915_power_well *disp2d =
+		lookup_power_well(dev_priv, PUNIT_POWER_WELL_DISP2D);
+
+	/* If the display might be already active skip this */
+	if (cmn->ops->is_enabled(dev_priv, cmn) &&
+	    disp2d->ops->is_enabled(dev_priv, disp2d) &&
+	    I915_READ(DPIO_CTL) & DPIO_CMNRST)
+		return;
+
+	DRM_DEBUG_KMS("toggling display PHY side reset\n");
+
+	/* cmnlane needs DPLL registers */
+	disp2d->ops->enable(dev_priv, disp2d);
+
+	/*
+	 * From VLV2A0_DP_eDP_HDMI_DPIO_driver_vbios_notes_11.docx:
+	 * Need to assert and de-assert PHY SB reset by gating the
+	 * common lane power, then un-gating it.
+	 * Simply ungating isn't enough to reset the PHY enough to get
+	 * ports and lanes running.
+	 */
+	cmn->ops->disable(dev_priv, cmn);
+}
+
+/**
+ * intel_power_domains_init_hw - initialize hardware power domain state
+ * @dev_priv: i915 device instance
+ *
+ * This function initializes the hardware power domain state and enables all
+ * power domains using intel_display_set_init_power().
+ */
+void intel_power_domains_init_hw(struct drm_i915_private *dev_priv)
+{
+	struct drm_device *dev = dev_priv->dev;
+	struct i915_power_domains *power_domains = &dev_priv->power_domains;
+
+	power_domains->initializing = true;
+
+	if (IS_VALLEYVIEW(dev) && !IS_CHERRYVIEW(dev)) {
+		mutex_lock(&power_domains->lock);
+		vlv_cmnlane_wa(dev_priv);
+		mutex_unlock(&power_domains->lock);
+	}
+
+	/* For now, we need the power well to be always enabled. */
+	intel_display_set_init_power(dev_priv, true);
+	intel_power_domains_resume(dev_priv);
+	power_domains->initializing = false;
+}
+
+/**
+ * intel_aux_display_runtime_get - grab an auxiliary power domain reference
+ * @dev_priv: i915 device instance
+ *
+ * This function grabs a power domain reference for the auxiliary power domain
+ * (for access to the GMBUS and DP AUX blocks) and ensures that it and all its
+ * parents are powered up. Therefore users should only grab a reference to the
+ * innermost power domain they need.
+ *
+ * Any power domain reference obtained by this function must have a symmetric
+ * call to intel_aux_display_runtime_put() to release the reference again.
+ */
+void intel_aux_display_runtime_get(struct drm_i915_private *dev_priv)
+{
+	intel_runtime_pm_get(dev_priv);
+}
+
+/**
+ * intel_aux_display_runtime_put - release an auxiliary power domain reference
+ * @dev_priv: i915 device instance
+ *
+ * This function drops the auxiliary power domain reference obtained by
+ * intel_aux_display_runtime_get() and might power down the corresponding
+ * hardware block right away if this is the last reference.
+ */
+void intel_aux_display_runtime_put(struct drm_i915_private *dev_priv)
+{
+	intel_runtime_pm_put(dev_priv);
+}
+
+/**
+ * intel_runtime_pm_get - grab a runtime pm reference
+ * @dev_priv: i915 device instance
+ *
+ * This function grabs a device-level runtime pm reference (mostly used for GEM
+ * code to ensure the GTT or GT is on) and ensures that it is powered up.
+ *
+ * Any runtime pm reference obtained by this function must have a symmetric
+ * call to intel_runtime_pm_put() to release the reference again.
+ */
+void intel_runtime_pm_get(struct drm_i915_private *dev_priv)
+{
+	struct drm_device *dev = dev_priv->dev;
+	struct device *device = &dev->pdev->dev;
+
+	if (!HAS_RUNTIME_PM(dev))
+		return;
+
+	pm_runtime_get_sync(device);
+	WARN(dev_priv->pm.suspended, "Device still suspended.\n");
+}
+
+/**
+ * intel_runtime_pm_get_noresume - grab a runtime pm reference
+ * @dev_priv: i915 device instance
+ *
+ * This function grabs a device-level runtime pm reference (mostly used for GEM
+ * code to ensure the GTT or GT is on).
+ *
+ * It will _not_ power up the device but instead only check that it's powered
+ * on.  Therefore it is only valid to call this function from contexts where
+ * the device is known to be powered up and where trying to power it up would
+ * result in hilarity and deadlocks. That pretty much means only the system
+ * suspend/resume code where this is used to grab runtime pm references for
+ * delayed setup down in work items.
+ *
+ * Any runtime pm reference obtained by this function must have a symmetric
+ * call to intel_runtime_pm_put() to release the reference again.
+ */
+void intel_runtime_pm_get_noresume(struct drm_i915_private *dev_priv)
+{
+	struct drm_device *dev = dev_priv->dev;
+	struct device *device = &dev->pdev->dev;
+
+	if (!HAS_RUNTIME_PM(dev))
+		return;
+
+	WARN(dev_priv->pm.suspended, "Getting nosync-ref while suspended.\n");
+	pm_runtime_get_noresume(device);
+}
+
+/**
+ * intel_runtime_pm_put - release a runtime pm reference
+ * @dev_priv: i915 device instance
+ *
+ * This function drops the device-level runtime pm reference obtained by
+ * intel_runtime_pm_get() and might power down the corresponding
+ * hardware block right away if this is the last reference.
+ */
+void intel_runtime_pm_put(struct drm_i915_private *dev_priv)
+{
+	struct drm_device *dev = dev_priv->dev;
+	struct device *device = &dev->pdev->dev;
+
+	if (!HAS_RUNTIME_PM(dev))
+		return;
+
+	pm_runtime_mark_last_busy(device);
+	pm_runtime_put_autosuspend(device);
+}
+
+/**
+ * intel_runtime_pm_enable - enable runtime pm
+ * @dev_priv: i915 device instance
+ *
+ * This function enables runtime pm at the end of the driver load sequence.
+ *
+ * Note that this function currently does not enable runtime pm for the
+ * subordinate display power domains. That is only done on the first modeset
+ * using intel_display_set_init_power().
+ */
+void intel_runtime_pm_enable(struct drm_i915_private *dev_priv)
+{
+	struct drm_device *dev = dev_priv->dev;
+	struct device *device = &dev->pdev->dev;
+
+	if (!HAS_RUNTIME_PM(dev))
+		return;
+
+	pm_runtime_set_active(device);
+
+	/*
+	 * RPM depends on RC6 to save/restore the GT HW context, so make RC6 a
+	 * requirement.
+	 */
+	if (!intel_enable_rc6(dev)) {
+		DRM_INFO("RC6 disabled, disabling runtime PM support\n");
+		return;
+	}
+
+	pm_runtime_set_autosuspend_delay(device, 10000); /* 10s */
+	pm_runtime_mark_last_busy(device);
+	pm_runtime_use_autosuspend(device);
+
+	pm_runtime_put_autosuspend(device);
+}
+
+/* Display audio driver power well request */
+int i915_request_power_well(void)
+{
+	struct drm_i915_private *dev_priv;
+
+	if (!hsw_pwr)
+		return -ENODEV;
+
+	dev_priv = container_of(hsw_pwr, struct drm_i915_private,
+				power_domains);
+	intel_display_power_get(dev_priv, POWER_DOMAIN_AUDIO);
+	return 0;
+}
+EXPORT_SYMBOL_GPL(i915_request_power_well);
+
+/* Display audio driver power well release */
+int i915_release_power_well(void)
+{
+	struct drm_i915_private *dev_priv;
+
+	if (!hsw_pwr)
+		return -ENODEV;
+
+	dev_priv = container_of(hsw_pwr, struct drm_i915_private,
+				power_domains);
+	intel_display_power_put(dev_priv, POWER_DOMAIN_AUDIO);
+	return 0;
+}
+EXPORT_SYMBOL_GPL(i915_release_power_well);
+
+/*
+ * Private interface for the audio driver to get CDCLK in kHz.
+ *
+ * Caller must request power well using i915_request_power_well() prior to
+ * making the call.
+ */
+int i915_get_cdclk_freq(void)
+{
+	struct drm_i915_private *dev_priv;
+
+	if (!hsw_pwr)
+		return -ENODEV;
+
+	dev_priv = container_of(hsw_pwr, struct drm_i915_private,
+				power_domains);
+
+	return intel_ddi_get_cdclk_freq(dev_priv);
+}
+EXPORT_SYMBOL_GPL(i915_get_cdclk_freq);
diff --git a/drivers/gpu/drm/i915/intel_sdvo.c b/drivers/gpu/drm/i915/intel_sdvo.c
index 9350edd6728d..6d7a277458b5 100644
--- a/drivers/gpu/drm/i915/intel_sdvo.c
+++ b/drivers/gpu/drm/i915/intel_sdvo.c
@@ -1991,57 +1991,10 @@ static int intel_sdvo_get_modes(struct drm_connector *connector)
 	return !list_empty(&connector->probed_modes);
 }
 
-static void
-intel_sdvo_destroy_enhance_property(struct drm_connector *connector)
-{
-	struct intel_sdvo_connector *intel_sdvo_connector = to_intel_sdvo_connector(connector);
-	struct drm_device *dev = connector->dev;
-
-	if (intel_sdvo_connector->left)
-		drm_property_destroy(dev, intel_sdvo_connector->left);
-	if (intel_sdvo_connector->right)
-		drm_property_destroy(dev, intel_sdvo_connector->right);
-	if (intel_sdvo_connector->top)
-		drm_property_destroy(dev, intel_sdvo_connector->top);
-	if (intel_sdvo_connector->bottom)
-		drm_property_destroy(dev, intel_sdvo_connector->bottom);
-	if (intel_sdvo_connector->hpos)
-		drm_property_destroy(dev, intel_sdvo_connector->hpos);
-	if (intel_sdvo_connector->vpos)
-		drm_property_destroy(dev, intel_sdvo_connector->vpos);
-	if (intel_sdvo_connector->saturation)
-		drm_property_destroy(dev, intel_sdvo_connector->saturation);
-	if (intel_sdvo_connector->contrast)
-		drm_property_destroy(dev, intel_sdvo_connector->contrast);
-	if (intel_sdvo_connector->hue)
-		drm_property_destroy(dev, intel_sdvo_connector->hue);
-	if (intel_sdvo_connector->sharpness)
-		drm_property_destroy(dev, intel_sdvo_connector->sharpness);
-	if (intel_sdvo_connector->flicker_filter)
-		drm_property_destroy(dev, intel_sdvo_connector->flicker_filter);
-	if (intel_sdvo_connector->flicker_filter_2d)
-		drm_property_destroy(dev, intel_sdvo_connector->flicker_filter_2d);
-	if (intel_sdvo_connector->flicker_filter_adaptive)
-		drm_property_destroy(dev, intel_sdvo_connector->flicker_filter_adaptive);
-	if (intel_sdvo_connector->tv_luma_filter)
-		drm_property_destroy(dev, intel_sdvo_connector->tv_luma_filter);
-	if (intel_sdvo_connector->tv_chroma_filter)
-		drm_property_destroy(dev, intel_sdvo_connector->tv_chroma_filter);
-	if (intel_sdvo_connector->dot_crawl)
-		drm_property_destroy(dev, intel_sdvo_connector->dot_crawl);
-	if (intel_sdvo_connector->brightness)
-		drm_property_destroy(dev, intel_sdvo_connector->brightness);
-}
-
 static void intel_sdvo_destroy(struct drm_connector *connector)
 {
 	struct intel_sdvo_connector *intel_sdvo_connector = to_intel_sdvo_connector(connector);
 
-	if (intel_sdvo_connector->tv_format)
-		drm_property_destroy(connector->dev,
-				     intel_sdvo_connector->tv_format);
-
-	intel_sdvo_destroy_enhance_property(connector);
 	drm_connector_cleanup(connector);
 	kfree(intel_sdvo_connector);
 }
diff --git a/drivers/gpu/drm/i915/intel_sprite.c b/drivers/gpu/drm/i915/intel_sprite.c
index 07a74ef589bd..7d9c340f7693 100644
--- a/drivers/gpu/drm/i915/intel_sprite.c
+++ b/drivers/gpu/drm/i915/intel_sprite.c
@@ -37,6 +37,20 @@
 #include <drm/i915_drm.h>
 #include "i915_drv.h"
 
+static bool
+format_is_yuv(uint32_t format)
+{
+	switch (format) {
+	case DRM_FORMAT_YUYV:
+	case DRM_FORMAT_UYVY:
+	case DRM_FORMAT_VYUY:
+	case DRM_FORMAT_YVYU:
+		return true;
+	default:
+		return false;
+	}
+}
+
 static int usecs_to_scanlines(const struct drm_display_mode *mode, int usecs)
 {
 	/* paranoia */
@@ -46,7 +60,23 @@ static int usecs_to_scanlines(const struct drm_display_mode *mode, int usecs)
 	return DIV_ROUND_UP(usecs * mode->crtc_clock, 1000 * mode->crtc_htotal);
 }
 
-static bool intel_pipe_update_start(struct intel_crtc *crtc, uint32_t *start_vbl_count)
+/**
+ * intel_pipe_update_start() - start update of a set of display registers
+ * @crtc: the crtc of which the registers are going to be updated
+ * @start_vbl_count: vblank counter return pointer used for error checking
+ *
+ * Mark the start of an update to pipe registers that should be updated
+ * atomically regarding vblank. If the next vblank will happen within
+ * the next 100 us, this function waits until the vblank passes.
+ *
+ * After a successful call to this function, interrupts will be disabled
+ * until a subsequent call to intel_pipe_update_end(). That is done to
+ * avoid random delays. The value written to @start_vbl_count should be
+ * supplied to intel_pipe_update_end() for error checking.
+ *
+ * Return: true if the call was successful
+ */
+bool intel_pipe_update_start(struct intel_crtc *crtc, uint32_t *start_vbl_count)
 {
 	struct drm_device *dev = crtc->base.dev;
 	const struct drm_display_mode *mode = &crtc->config.adjusted_mode;
@@ -56,8 +86,6 @@ static bool intel_pipe_update_start(struct intel_crtc *crtc, uint32_t *start_vbl
 	wait_queue_head_t *wq = drm_crtc_vblank_waitqueue(&crtc->base);
 	DEFINE_WAIT(wait);
 
-	WARN_ON(!drm_modeset_is_locked(&crtc->base.mutex));
-
 	vblank_start = mode->crtc_vblank_start;
 	if (mode->flags & DRM_MODE_FLAG_INTERLACE)
 		vblank_start = DIV_ROUND_UP(vblank_start, 2);
@@ -112,7 +140,16 @@ static bool intel_pipe_update_start(struct intel_crtc *crtc, uint32_t *start_vbl
 	return true;
 }
 
-static void intel_pipe_update_end(struct intel_crtc *crtc, u32 start_vbl_count)
+/**
+ * intel_pipe_update_end() - end update of a set of display registers
+ * @crtc: the crtc of which the registers were updated
+ * @start_vbl_count: start vblank counter (used for error checking)
+ *
+ * Mark the end of an update started with intel_pipe_update_start(). This
+ * re-enables interrupts and verifies the update was actually completed
+ * before a vblank using the value of @start_vbl_count.
+ */
+void intel_pipe_update_end(struct intel_crtc *crtc, u32 start_vbl_count)
 {
 	struct drm_device *dev = crtc->base.dev;
 	enum pipe pipe = crtc->pipe;
@@ -139,6 +176,226 @@ static void intel_update_primary_plane(struct intel_crtc *crtc)
 }
 
 static void
+skl_update_plane(struct drm_plane *drm_plane, struct drm_crtc *crtc,
+		 struct drm_framebuffer *fb,
+		 struct drm_i915_gem_object *obj, int crtc_x, int crtc_y,
+		 unsigned int crtc_w, unsigned int crtc_h,
+		 uint32_t x, uint32_t y,
+		 uint32_t src_w, uint32_t src_h)
+{
+	struct drm_device *dev = drm_plane->dev;
+	struct drm_i915_private *dev_priv = dev->dev_private;
+	struct intel_plane *intel_plane = to_intel_plane(drm_plane);
+	const int pipe = intel_plane->pipe;
+	const int plane = intel_plane->plane + 1;
+	u32 plane_ctl, stride;
+	int pixel_size = drm_format_plane_cpp(fb->pixel_format, 0);
+
+	plane_ctl = I915_READ(PLANE_CTL(pipe, plane));
+
+	/* Mask out pixel format bits in case we change it */
+	plane_ctl &= ~PLANE_CTL_FORMAT_MASK;
+	plane_ctl &= ~PLANE_CTL_ORDER_RGBX;
+	plane_ctl &= ~PLANE_CTL_YUV422_ORDER_MASK;
+	plane_ctl &= ~PLANE_CTL_TILED_MASK;
+	plane_ctl &= ~PLANE_CTL_ALPHA_MASK;
+	plane_ctl &= ~PLANE_CTL_ROTATE_MASK;
+
+	/* Trickle feed has to be enabled */
+	plane_ctl &= ~PLANE_CTL_TRICKLE_FEED_DISABLE;
+
+	switch (fb->pixel_format) {
+	case DRM_FORMAT_RGB565:
+		plane_ctl |= PLANE_CTL_FORMAT_RGB_565;
+		break;
+	case DRM_FORMAT_XBGR8888:
+		plane_ctl |= PLANE_CTL_FORMAT_XRGB_8888 | PLANE_CTL_ORDER_RGBX;
+		break;
+	case DRM_FORMAT_XRGB8888:
+		plane_ctl |= PLANE_CTL_FORMAT_XRGB_8888;
+		break;
+	/*
+	 * XXX: For ARGB/ABGR formats we default to expecting scanout buffers
+	 * to be already pre-multiplied. We need to add a knob (or a different
+	 * DRM_FORMAT) for user-space to configure that.
+	 */
+	case DRM_FORMAT_ABGR8888:
+		plane_ctl |= PLANE_CTL_FORMAT_XRGB_8888 |
+			     PLANE_CTL_ORDER_RGBX |
+			     PLANE_CTL_ALPHA_SW_PREMULTIPLY;
+		break;
+	case DRM_FORMAT_ARGB8888:
+		plane_ctl |= PLANE_CTL_FORMAT_XRGB_8888 |
+			     PLANE_CTL_ALPHA_SW_PREMULTIPLY;
+		break;
+	case DRM_FORMAT_YUYV:
+		plane_ctl |= PLANE_CTL_FORMAT_YUV422 | PLANE_CTL_YUV422_YUYV;
+		break;
+	case DRM_FORMAT_YVYU:
+		plane_ctl |= PLANE_CTL_FORMAT_YUV422 | PLANE_CTL_YUV422_YVYU;
+		break;
+	case DRM_FORMAT_UYVY:
+		plane_ctl |= PLANE_CTL_FORMAT_YUV422 | PLANE_CTL_YUV422_UYVY;
+		break;
+	case DRM_FORMAT_VYUY:
+		plane_ctl |= PLANE_CTL_FORMAT_YUV422 | PLANE_CTL_YUV422_VYUY;
+		break;
+	default:
+		BUG();
+	}
+
+	switch (obj->tiling_mode) {
+	case I915_TILING_NONE:
+		stride = fb->pitches[0] >> 6;
+		break;
+	case I915_TILING_X:
+		plane_ctl |= PLANE_CTL_TILED_X;
+		stride = fb->pitches[0] >> 9;
+		break;
+	default:
+		BUG();
+	}
+	if (intel_plane->rotation == BIT(DRM_ROTATE_180))
+		plane_ctl |= PLANE_CTL_ROTATE_180;
+
+	plane_ctl |= PLANE_CTL_ENABLE;
+	plane_ctl |= PLANE_CTL_PIPE_CSC_ENABLE;
+
+	intel_update_sprite_watermarks(drm_plane, crtc, src_w, src_h,
+				       pixel_size, true,
+				       src_w != crtc_w || src_h != crtc_h);
+
+	/* Sizes are 0 based */
+	src_w--;
+	src_h--;
+	crtc_w--;
+	crtc_h--;
+
+	I915_WRITE(PLANE_OFFSET(pipe, plane), (y << 16) | x);
+	I915_WRITE(PLANE_STRIDE(pipe, plane), stride);
+	I915_WRITE(PLANE_POS(pipe, plane), (crtc_y << 16) | crtc_x);
+	I915_WRITE(PLANE_SIZE(pipe, plane), (crtc_h << 16) | crtc_w);
+	I915_WRITE(PLANE_CTL(pipe, plane), plane_ctl);
+	I915_WRITE(PLANE_SURF(pipe, plane), i915_gem_obj_ggtt_offset(obj));
+	POSTING_READ(PLANE_SURF(pipe, plane));
+}
+
+static void
+skl_disable_plane(struct drm_plane *drm_plane, struct drm_crtc *crtc)
+{
+	struct drm_device *dev = drm_plane->dev;
+	struct drm_i915_private *dev_priv = dev->dev_private;
+	struct intel_plane *intel_plane = to_intel_plane(drm_plane);
+	const int pipe = intel_plane->pipe;
+	const int plane = intel_plane->plane + 1;
+
+	I915_WRITE(PLANE_CTL(pipe, plane),
+		   I915_READ(PLANE_CTL(pipe, plane)) & ~PLANE_CTL_ENABLE);
+
+	/* Activate double buffered register update */
+	I915_WRITE(PLANE_CTL(pipe, plane), 0);
+	POSTING_READ(PLANE_CTL(pipe, plane));
+
+	intel_update_sprite_watermarks(drm_plane, crtc, 0, 0, 0, false, false);
+}
+
+static int
+skl_update_colorkey(struct drm_plane *drm_plane,
+		    struct drm_intel_sprite_colorkey *key)
+{
+	struct drm_device *dev = drm_plane->dev;
+	struct drm_i915_private *dev_priv = dev->dev_private;
+	struct intel_plane *intel_plane = to_intel_plane(drm_plane);
+	const int pipe = intel_plane->pipe;
+	const int plane = intel_plane->plane;
+	u32 plane_ctl;
+
+	I915_WRITE(PLANE_KEYVAL(pipe, plane), key->min_value);
+	I915_WRITE(PLANE_KEYMAX(pipe, plane), key->max_value);
+	I915_WRITE(PLANE_KEYMSK(pipe, plane), key->channel_mask);
+
+	plane_ctl = I915_READ(PLANE_CTL(pipe, plane));
+	plane_ctl &= ~PLANE_CTL_KEY_ENABLE_MASK;
+	if (key->flags & I915_SET_COLORKEY_DESTINATION)
+		plane_ctl |= PLANE_CTL_KEY_ENABLE_DESTINATION;
+	else if (key->flags & I915_SET_COLORKEY_SOURCE)
+		plane_ctl |= PLANE_CTL_KEY_ENABLE_SOURCE;
+	I915_WRITE(PLANE_CTL(pipe, plane), plane_ctl);
+
+	POSTING_READ(PLANE_CTL(pipe, plane));
+
+	return 0;
+}
+
+static void
+skl_get_colorkey(struct drm_plane *drm_plane,
+		 struct drm_intel_sprite_colorkey *key)
+{
+	struct drm_device *dev = drm_plane->dev;
+	struct drm_i915_private *dev_priv = dev->dev_private;
+	struct intel_plane *intel_plane = to_intel_plane(drm_plane);
+	const int pipe = intel_plane->pipe;
+	const int plane = intel_plane->plane;
+	u32 plane_ctl;
+
+	key->min_value = I915_READ(PLANE_KEYVAL(pipe, plane));
+	key->max_value = I915_READ(PLANE_KEYMAX(pipe, plane));
+	key->channel_mask = I915_READ(PLANE_KEYMSK(pipe, plane));
+
+	plane_ctl = I915_READ(PLANE_CTL(pipe, plane));
+
+	switch (plane_ctl & PLANE_CTL_KEY_ENABLE_MASK) {
+	case PLANE_CTL_KEY_ENABLE_DESTINATION:
+		key->flags = I915_SET_COLORKEY_DESTINATION;
+		break;
+	case PLANE_CTL_KEY_ENABLE_SOURCE:
+		key->flags = I915_SET_COLORKEY_SOURCE;
+		break;
+	default:
+		key->flags = I915_SET_COLORKEY_NONE;
+	}
+}
+
+static void
+chv_update_csc(struct intel_plane *intel_plane, uint32_t format)
+{
+	struct drm_i915_private *dev_priv = intel_plane->base.dev->dev_private;
+	int plane = intel_plane->plane;
+
+	/* Seems RGB data bypasses the CSC always */
+	if (!format_is_yuv(format))
+		return;
+
+	/*
+	 * BT.601 limited range YCbCr -> full range RGB
+	 *
+	 * |r|   | 6537 4769     0|   |cr  |
+	 * |g| = |-3330 4769 -1605| x |y-64|
+	 * |b|   |    0 4769  8263|   |cb  |
+	 *
+	 * Cb and Cr apparently come in as signed already, so no
+	 * need for any offset. For Y we need to remove the offset.
+	 */
+	I915_WRITE(SPCSCYGOFF(plane), SPCSC_OOFF(0) | SPCSC_IOFF(-64));
+	I915_WRITE(SPCSCCBOFF(plane), SPCSC_OOFF(0) | SPCSC_IOFF(0));
+	I915_WRITE(SPCSCCROFF(plane), SPCSC_OOFF(0) | SPCSC_IOFF(0));
+
+	I915_WRITE(SPCSCC01(plane), SPCSC_C1(4769) | SPCSC_C0(6537));
+	I915_WRITE(SPCSCC23(plane), SPCSC_C1(-3330) | SPCSC_C0(0));
+	I915_WRITE(SPCSCC45(plane), SPCSC_C1(-1605) | SPCSC_C0(4769));
+	I915_WRITE(SPCSCC67(plane), SPCSC_C1(4769) | SPCSC_C0(0));
+	I915_WRITE(SPCSCC8(plane), SPCSC_C0(8263));
+
+	I915_WRITE(SPCSCYGICLAMP(plane), SPCSC_IMAX(940) | SPCSC_IMIN(64));
+	I915_WRITE(SPCSCCBICLAMP(plane), SPCSC_IMAX(448) | SPCSC_IMIN(-448));
+	I915_WRITE(SPCSCCRICLAMP(plane), SPCSC_IMAX(448) | SPCSC_IMIN(-448));
+
+	I915_WRITE(SPCSCYGOCLAMP(plane), SPCSC_OMAX(1023) | SPCSC_OMIN(0));
+	I915_WRITE(SPCSCCBOCLAMP(plane), SPCSC_OMAX(1023) | SPCSC_OMIN(0));
+	I915_WRITE(SPCSCCROCLAMP(plane), SPCSC_OMAX(1023) | SPCSC_OMIN(0));
+}
+
+static void
 vlv_update_plane(struct drm_plane *dplane, struct drm_crtc *crtc,
 		 struct drm_framebuffer *fb,
 		 struct drm_i915_gem_object *obj, int crtc_x, int crtc_y,
@@ -249,6 +506,9 @@ vlv_update_plane(struct drm_plane *dplane, struct drm_crtc *crtc,
 
 	intel_update_primary_plane(intel_crtc);
 
+	if (IS_CHERRYVIEW(dev) && pipe == PIPE_B)
+		chv_update_csc(intel_plane, fb->pixel_format);
+
 	I915_WRITE(SPSTRIDE(pipe, plane), fb->pitches[0]);
 	I915_WRITE(SPPOS(pipe, plane), (crtc_y << 16) | crtc_x);
 
@@ -257,6 +517,8 @@ vlv_update_plane(struct drm_plane *dplane, struct drm_crtc *crtc,
 	else
 		I915_WRITE(SPLINOFF(pipe, plane), linear_offset);
 
+	I915_WRITE(SPCONSTALPHA(pipe, plane), 0);
+
 	I915_WRITE(SPSIZE(pipe, plane), (crtc_h << 16) | crtc_w);
 	I915_WRITE(SPCNTR(pipe, plane), sprctl);
 	I915_WRITE(SPSURF(pipe, plane), i915_gem_obj_ggtt_offset(obj) +
@@ -821,20 +1083,6 @@ ilk_get_colorkey(struct drm_plane *plane, struct drm_intel_sprite_colorkey *key)
 		key->flags = I915_SET_COLORKEY_NONE;
 }
 
-static bool
-format_is_yuv(uint32_t format)
-{
-	switch (format) {
-	case DRM_FORMAT_YUYV:
-	case DRM_FORMAT_UYVY:
-	case DRM_FORMAT_VYUY:
-	case DRM_FORMAT_YVYU:
-		return true;
-	default:
-		return false;
-	}
-}
-
 static bool colorkey_enabled(struct intel_plane *intel_plane)
 {
 	struct drm_intel_sprite_colorkey key;
@@ -845,57 +1093,23 @@ static bool colorkey_enabled(struct intel_plane *intel_plane)
 }
 
 static int
-intel_update_plane(struct drm_plane *plane, struct drm_crtc *crtc,
-		   struct drm_framebuffer *fb, int crtc_x, int crtc_y,
-		   unsigned int crtc_w, unsigned int crtc_h,
-		   uint32_t src_x, uint32_t src_y,
-		   uint32_t src_w, uint32_t src_h)
+intel_check_sprite_plane(struct drm_plane *plane,
+			 struct intel_plane_state *state)
 {
-	struct drm_device *dev = plane->dev;
-	struct intel_crtc *intel_crtc = to_intel_crtc(crtc);
+	struct intel_crtc *intel_crtc = to_intel_crtc(state->crtc);
 	struct intel_plane *intel_plane = to_intel_plane(plane);
-	enum pipe pipe = intel_crtc->pipe;
-	struct intel_framebuffer *intel_fb = to_intel_framebuffer(fb);
-	struct drm_i915_gem_object *obj = intel_fb->obj;
-	struct drm_i915_gem_object *old_obj = intel_plane->obj;
-	int ret;
-	bool primary_enabled;
-	bool visible;
+	struct drm_framebuffer *fb = state->fb;
+	struct drm_i915_gem_object *obj = intel_fb_obj(fb);
+	int crtc_x, crtc_y;
+	unsigned int crtc_w, crtc_h;
+	uint32_t src_x, src_y, src_w, src_h;
+	struct drm_rect *src = &state->src;
+	struct drm_rect *dst = &state->dst;
+	struct drm_rect *orig_src = &state->orig_src;
+	const struct drm_rect *clip = &state->clip;
 	int hscale, vscale;
 	int max_scale, min_scale;
 	int pixel_size = drm_format_plane_cpp(fb->pixel_format, 0);
-	struct drm_rect src = {
-		/* sample coordinates in 16.16 fixed point */
-		.x1 = src_x,
-		.x2 = src_x + src_w,
-		.y1 = src_y,
-		.y2 = src_y + src_h,
-	};
-	struct drm_rect dst = {
-		/* integer pixels */
-		.x1 = crtc_x,
-		.x2 = crtc_x + crtc_w,
-		.y1 = crtc_y,
-		.y2 = crtc_y + crtc_h,
-	};
-	const struct drm_rect clip = {
-		.x2 = intel_crtc->active ? intel_crtc->config.pipe_src_w : 0,
-		.y2 = intel_crtc->active ? intel_crtc->config.pipe_src_h : 0,
-	};
-	const struct {
-		int crtc_x, crtc_y;
-		unsigned int crtc_w, crtc_h;
-		uint32_t src_x, src_y, src_w, src_h;
-	} orig = {
-		.crtc_x = crtc_x,
-		.crtc_y = crtc_y,
-		.crtc_w = crtc_w,
-		.crtc_h = crtc_h,
-		.src_x = src_x,
-		.src_y = src_y,
-		.src_w = src_w,
-		.src_h = src_h,
-	};
 
 	/* Don't modify another pipe's plane */
 	if (intel_plane->pipe != intel_crtc->pipe) {
@@ -927,55 +1141,55 @@ intel_update_plane(struct drm_plane *plane, struct drm_crtc *crtc,
 	max_scale = intel_plane->max_downscale << 16;
 	min_scale = intel_plane->can_scale ? 1 : (1 << 16);
 
-	drm_rect_rotate(&src, fb->width << 16, fb->height << 16,
+	drm_rect_rotate(src, fb->width << 16, fb->height << 16,
 			intel_plane->rotation);
 
-	hscale = drm_rect_calc_hscale_relaxed(&src, &dst, min_scale, max_scale);
+	hscale = drm_rect_calc_hscale_relaxed(src, dst, min_scale, max_scale);
 	BUG_ON(hscale < 0);
 
-	vscale = drm_rect_calc_vscale_relaxed(&src, &dst, min_scale, max_scale);
+	vscale = drm_rect_calc_vscale_relaxed(src, dst, min_scale, max_scale);
 	BUG_ON(vscale < 0);
 
-	visible = drm_rect_clip_scaled(&src, &dst, &clip, hscale, vscale);
+	state->visible = drm_rect_clip_scaled(src, dst, clip, hscale, vscale);
 
-	crtc_x = dst.x1;
-	crtc_y = dst.y1;
-	crtc_w = drm_rect_width(&dst);
-	crtc_h = drm_rect_height(&dst);
+	crtc_x = dst->x1;
+	crtc_y = dst->y1;
+	crtc_w = drm_rect_width(dst);
+	crtc_h = drm_rect_height(dst);
 
-	if (visible) {
+	if (state->visible) {
 		/* check again in case clipping clamped the results */
-		hscale = drm_rect_calc_hscale(&src, &dst, min_scale, max_scale);
+		hscale = drm_rect_calc_hscale(src, dst, min_scale, max_scale);
 		if (hscale < 0) {
 			DRM_DEBUG_KMS("Horizontal scaling factor out of limits\n");
-			drm_rect_debug_print(&src, true);
-			drm_rect_debug_print(&dst, false);
+			drm_rect_debug_print(src, true);
+			drm_rect_debug_print(dst, false);
 
 			return hscale;
 		}
 
-		vscale = drm_rect_calc_vscale(&src, &dst, min_scale, max_scale);
+		vscale = drm_rect_calc_vscale(src, dst, min_scale, max_scale);
 		if (vscale < 0) {
 			DRM_DEBUG_KMS("Vertical scaling factor out of limits\n");
-			drm_rect_debug_print(&src, true);
-			drm_rect_debug_print(&dst, false);
+			drm_rect_debug_print(src, true);
+			drm_rect_debug_print(dst, false);
 
 			return vscale;
 		}
 
 		/* Make the source viewport size an exact multiple of the scaling factors. */
-		drm_rect_adjust_size(&src,
-				     drm_rect_width(&dst) * hscale - drm_rect_width(&src),
-				     drm_rect_height(&dst) * vscale - drm_rect_height(&src));
+		drm_rect_adjust_size(src,
+				     drm_rect_width(dst) * hscale - drm_rect_width(src),
+				     drm_rect_height(dst) * vscale - drm_rect_height(src));
 
-		drm_rect_rotate_inv(&src, fb->width << 16, fb->height << 16,
+		drm_rect_rotate_inv(src, fb->width << 16, fb->height << 16,
 				    intel_plane->rotation);
 
 		/* sanity check to make sure the src viewport wasn't enlarged */
-		WARN_ON(src.x1 < (int) src_x ||
-			src.y1 < (int) src_y ||
-			src.x2 > (int) (src_x + src_w) ||
-			src.y2 > (int) (src_y + src_h));
+		WARN_ON(src->x1 < (int) orig_src->x1 ||
+			src->y1 < (int) orig_src->y1 ||
+			src->x2 > (int) orig_src->x2 ||
+			src->y2 > (int) orig_src->y2);
 
 		/*
 		 * Hardware doesn't handle subpixel coordinates.
@@ -983,10 +1197,10 @@ intel_update_plane(struct drm_plane *plane, struct drm_crtc *crtc,
 		 * increase the source viewport size, because that could
 		 * push the downscaling factor out of bounds.
 		 */
-		src_x = src.x1 >> 16;
-		src_w = drm_rect_width(&src) >> 16;
-		src_y = src.y1 >> 16;
-		src_h = drm_rect_height(&src) >> 16;
+		src_x = src->x1 >> 16;
+		src_w = drm_rect_width(src) >> 16;
+		src_y = src->y1 >> 16;
+		src_h = drm_rect_height(src) >> 16;
 
 		if (format_is_yuv(fb->pixel_format)) {
 			src_x &= ~1;
@@ -1000,12 +1214,12 @@ intel_update_plane(struct drm_plane *plane, struct drm_crtc *crtc,
 				crtc_w &= ~1;
 
 			if (crtc_w == 0)
-				visible = false;
+				state->visible = false;
 		}
 	}
 
 	/* Check size restrictions when scaling */
-	if (visible && (src_w != crtc_w || src_h != crtc_h)) {
+	if (state->visible && (src_w != crtc_w || src_h != crtc_h)) {
 		unsigned int width_bytes;
 
 		WARN_ON(!intel_plane->can_scale);
@@ -1013,12 +1227,13 @@ intel_update_plane(struct drm_plane *plane, struct drm_crtc *crtc,
 		/* FIXME interlacing min height is 6 */
 
 		if (crtc_w < 3 || crtc_h < 3)
-			visible = false;
+			state->visible = false;
 
 		if (src_w < 3 || src_h < 3)
-			visible = false;
+			state->visible = false;
 
-		width_bytes = ((src_x * pixel_size) & 63) + src_w * pixel_size;
+		width_bytes = ((src_x * pixel_size) & 63) +
+					src_w * pixel_size;
 
 		if (src_w > 2048 || src_h > 2048 ||
 		    width_bytes > 4096 || fb->pitches[0] > 4096) {
@@ -1027,42 +1242,90 @@ intel_update_plane(struct drm_plane *plane, struct drm_crtc *crtc,
 		}
 	}
 
-	dst.x1 = crtc_x;
-	dst.x2 = crtc_x + crtc_w;
-	dst.y1 = crtc_y;
-	dst.y2 = crtc_y + crtc_h;
+	if (state->visible) {
+		src->x1 = src_x;
+		src->x2 = src_x + src_w;
+		src->y1 = src_y;
+		src->y2 = src_y + src_h;
+	}
 
-	/*
-	 * If the sprite is completely covering the primary plane,
-	 * we can disable the primary and save power.
-	 */
-	primary_enabled = !drm_rect_equals(&dst, &clip) || colorkey_enabled(intel_plane);
-	WARN_ON(!primary_enabled && !visible && intel_crtc->active);
+	dst->x1 = crtc_x;
+	dst->x2 = crtc_x + crtc_w;
+	dst->y1 = crtc_y;
+	dst->y2 = crtc_y + crtc_h;
 
-	mutex_lock(&dev->struct_mutex);
+	return 0;
+}
 
-	/* Note that this will apply the VT-d workaround for scanouts,
-	 * which is more restrictive than required for sprites. (The
-	 * primary plane requires 256KiB alignment with 64 PTE padding,
-	 * the sprite planes only require 128KiB alignment and 32 PTE padding.
-	 */
-	ret = intel_pin_and_fence_fb_obj(dev, obj, NULL);
+static int
+intel_prepare_sprite_plane(struct drm_plane *plane,
+			   struct intel_plane_state *state)
+{
+	struct drm_device *dev = plane->dev;
+	struct drm_crtc *crtc = state->crtc;
+	struct intel_crtc *intel_crtc = to_intel_crtc(crtc);
+	struct intel_plane *intel_plane = to_intel_plane(plane);
+	enum pipe pipe = intel_crtc->pipe;
+	struct drm_framebuffer *fb = state->fb;
+	struct drm_i915_gem_object *obj = intel_fb_obj(fb);
+	struct drm_i915_gem_object *old_obj = intel_plane->obj;
+	int ret;
 
-	i915_gem_track_fb(old_obj, obj,
-			  INTEL_FRONTBUFFER_SPRITE(pipe));
-	mutex_unlock(&dev->struct_mutex);
+	if (old_obj != obj) {
+		mutex_lock(&dev->struct_mutex);
 
-	if (ret)
-		return ret;
+		/* Note that this will apply the VT-d workaround for scanouts,
+		 * which is more restrictive than required for sprites. (The
+		 * primary plane requires 256KiB alignment with 64 PTE padding,
+		 * the sprite planes only require 128KiB alignment and 32 PTE
+		 * padding.)
+		 */
+		ret = intel_pin_and_fence_fb_obj(plane, fb, NULL);
+		if (ret == 0)
+			i915_gem_track_fb(old_obj, obj,
+					  INTEL_FRONTBUFFER_SPRITE(pipe));
+		mutex_unlock(&dev->struct_mutex);
+		if (ret)
+			return ret;
+	}
+
+	return 0;
+}
 
-	intel_plane->crtc_x = orig.crtc_x;
-	intel_plane->crtc_y = orig.crtc_y;
-	intel_plane->crtc_w = orig.crtc_w;
-	intel_plane->crtc_h = orig.crtc_h;
-	intel_plane->src_x = orig.src_x;
-	intel_plane->src_y = orig.src_y;
-	intel_plane->src_w = orig.src_w;
-	intel_plane->src_h = orig.src_h;
+static void
+intel_commit_sprite_plane(struct drm_plane *plane,
+			  struct intel_plane_state *state)
+{
+	struct drm_device *dev = plane->dev;
+	struct drm_crtc *crtc = state->crtc;
+	struct intel_crtc *intel_crtc = to_intel_crtc(crtc);
+	struct intel_plane *intel_plane = to_intel_plane(plane);
+	enum pipe pipe = intel_crtc->pipe;
+	struct drm_framebuffer *fb = state->fb;
+	struct drm_i915_gem_object *obj = intel_fb_obj(fb);
+	struct drm_i915_gem_object *old_obj = intel_plane->obj;
+	int crtc_x, crtc_y;
+	unsigned int crtc_w, crtc_h;
+	uint32_t src_x, src_y, src_w, src_h;
+	struct drm_rect *dst = &state->dst;
+	const struct drm_rect *clip = &state->clip;
+	bool primary_enabled;
+
+	/*
+	 * If the sprite is completely covering the primary plane,
+	 * we can disable the primary and save power.
+	 */
+	primary_enabled = !drm_rect_equals(dst, clip) || colorkey_enabled(intel_plane);
+	WARN_ON(!primary_enabled && !state->visible && intel_crtc->active);
+
+	intel_plane->crtc_x = state->orig_dst.x1;
+	intel_plane->crtc_y = state->orig_dst.y1;
+	intel_plane->crtc_w = drm_rect_width(&state->orig_dst);
+	intel_plane->crtc_h = drm_rect_height(&state->orig_dst);
+	intel_plane->src_x = state->orig_src.x1;
+	intel_plane->src_y = state->orig_src.y1;
+	intel_plane->src_w = drm_rect_width(&state->orig_src);
+	intel_plane->src_h = drm_rect_height(&state->orig_src);
 	intel_plane->obj = obj;
 
 	if (intel_crtc->active) {
@@ -1076,12 +1339,22 @@ intel_update_plane(struct drm_plane *plane, struct drm_crtc *crtc,
 		if (primary_was_enabled && !primary_enabled)
 			intel_pre_disable_primary(crtc);
 
-		if (visible)
+		if (state->visible) {
+			crtc_x = state->dst.x1;
+			crtc_y = state->dst.y1;
+			crtc_w = drm_rect_width(&state->dst);
+			crtc_h = drm_rect_height(&state->dst);
+			src_x = state->src.x1;
+			src_y = state->src.y1;
+			src_w = drm_rect_width(&state->src);
+			src_h = drm_rect_height(&state->src);
 			intel_plane->update_plane(plane, crtc, fb, obj,
 						  crtc_x, crtc_y, crtc_w, crtc_h,
 						  src_x, src_y, src_w, src_h);
-		else
+		} else {
 			intel_plane->disable_plane(plane, crtc);
+		}
+
 
 		intel_frontbuffer_flip(dev, INTEL_FRONTBUFFER_SPRITE(pipe));
 
@@ -1090,21 +1363,65 @@ intel_update_plane(struct drm_plane *plane, struct drm_crtc *crtc,
 	}
 
 	/* Unpin old obj after new one is active to avoid ugliness */
-	if (old_obj) {
+	if (old_obj && old_obj != obj) {
+
 		/*
 		 * It's fairly common to simply update the position of
 		 * an existing object.  In that case, we don't need to
 		 * wait for vblank to avoid ugliness, we only need to
 		 * do the pin & ref bookkeeping.
 		 */
-		if (old_obj != obj && intel_crtc->active)
+		if (intel_crtc->active)
 			intel_wait_for_vblank(dev, intel_crtc->pipe);
 
 		mutex_lock(&dev->struct_mutex);
 		intel_unpin_fb_obj(old_obj);
 		mutex_unlock(&dev->struct_mutex);
 	}
+}
+
+static int
+intel_update_plane(struct drm_plane *plane, struct drm_crtc *crtc,
+		   struct drm_framebuffer *fb, int crtc_x, int crtc_y,
+		   unsigned int crtc_w, unsigned int crtc_h,
+		   uint32_t src_x, uint32_t src_y,
+		   uint32_t src_w, uint32_t src_h)
+{
+	struct intel_plane_state state;
+	struct intel_crtc *intel_crtc = to_intel_crtc(crtc);
+	int ret;
 
+	state.crtc = crtc;
+	state.fb = fb;
+
+	/* sample coordinates in 16.16 fixed point */
+	state.src.x1 = src_x;
+	state.src.x2 = src_x + src_w;
+	state.src.y1 = src_y;
+	state.src.y2 = src_y + src_h;
+
+	/* integer pixels */
+	state.dst.x1 = crtc_x;
+	state.dst.x2 = crtc_x + crtc_w;
+	state.dst.y1 = crtc_y;
+	state.dst.y2 = crtc_y + crtc_h;
+
+	state.clip.x1 = 0;
+	state.clip.y1 = 0;
+	state.clip.x2 = intel_crtc->active ? intel_crtc->config.pipe_src_w : 0;
+	state.clip.y2 = intel_crtc->active ? intel_crtc->config.pipe_src_h : 0;
+	state.orig_src = state.src;
+	state.orig_dst = state.dst;
+
+	ret = intel_check_sprite_plane(plane, &state);
+	if (ret)
+		return ret;
+
+	ret = intel_prepare_sprite_plane(plane, &state);
+	if (ret)
+		return ret;
+
+	intel_commit_sprite_plane(plane, &state);
 	return 0;
 }
 
@@ -1305,6 +1622,18 @@ static uint32_t vlv_plane_formats[] = {
 	DRM_FORMAT_VYUY,
 };
 
+static uint32_t skl_plane_formats[] = {
+	DRM_FORMAT_RGB565,
+	DRM_FORMAT_ABGR8888,
+	DRM_FORMAT_ARGB8888,
+	DRM_FORMAT_XBGR8888,
+	DRM_FORMAT_XRGB8888,
+	DRM_FORMAT_YUYV,
+	DRM_FORMAT_YVYU,
+	DRM_FORMAT_UYVY,
+	DRM_FORMAT_VYUY,
+};
+
 int
 intel_plane_init(struct drm_device *dev, enum pipe pipe, int plane)
 {
@@ -1368,7 +1697,21 @@ intel_plane_init(struct drm_device *dev, enum pipe pipe, int plane)
 			num_plane_formats = ARRAY_SIZE(snb_plane_formats);
 		}
 		break;
-
+	case 9:
+		/*
+		 * FIXME: Skylake planes can be scaled (with some restrictions),
+		 * but this is for another time.
+		 */
+		intel_plane->can_scale = false;
+		intel_plane->max_downscale = 1;
+		intel_plane->update_plane = skl_update_plane;
+		intel_plane->disable_plane = skl_disable_plane;
+		intel_plane->update_colorkey = skl_update_colorkey;
+		intel_plane->get_colorkey = skl_get_colorkey;
+
+		plane_formats = skl_plane_formats;
+		num_plane_formats = ARRAY_SIZE(skl_plane_formats);
+		break;
 	default:
 		kfree(intel_plane);
 		return -ENODEV;
diff --git a/drivers/gpu/drm/i915/intel_tv.c b/drivers/gpu/drm/i915/intel_tv.c
index c14341ca3ef9..6f5f59b880f5 100644
--- a/drivers/gpu/drm/i915/intel_tv.c
+++ b/drivers/gpu/drm/i915/intel_tv.c
@@ -1182,18 +1182,17 @@ intel_tv_detect_type(struct intel_tv *intel_tv,
 	struct intel_crtc *intel_crtc = to_intel_crtc(crtc);
 	struct drm_device *dev = encoder->dev;
 	struct drm_i915_private *dev_priv = dev->dev_private;
-	unsigned long irqflags;
 	u32 tv_ctl, save_tv_ctl;
 	u32 tv_dac, save_tv_dac;
 	int type;
 
 	/* Disable TV interrupts around load detect or we'll recurse */
 	if (connector->polled & DRM_CONNECTOR_POLL_HPD) {
-		spin_lock_irqsave(&dev_priv->irq_lock, irqflags);
+		spin_lock_irq(&dev_priv->irq_lock);
 		i915_disable_pipestat(dev_priv, 0,
 				      PIPE_HOTPLUG_INTERRUPT_STATUS |
 				      PIPE_HOTPLUG_TV_INTERRUPT_STATUS);
-		spin_unlock_irqrestore(&dev_priv->irq_lock, irqflags);
+		spin_unlock_irq(&dev_priv->irq_lock);
 	}
 
 	save_tv_dac = tv_dac = I915_READ(TV_DAC);
@@ -1266,11 +1265,11 @@ intel_tv_detect_type(struct intel_tv *intel_tv,
 
 	/* Restore interrupt config */
 	if (connector->polled & DRM_CONNECTOR_POLL_HPD) {
-		spin_lock_irqsave(&dev_priv->irq_lock, irqflags);
+		spin_lock_irq(&dev_priv->irq_lock);
 		i915_enable_pipestat(dev_priv, 0,
 				     PIPE_HOTPLUG_INTERRUPT_STATUS |
 				     PIPE_HOTPLUG_TV_INTERRUPT_STATUS);
-		spin_unlock_irqrestore(&dev_priv->irq_lock, irqflags);
+		spin_unlock_irq(&dev_priv->irq_lock);
 	}
 
 	return type;
diff --git a/drivers/gpu/drm/i915/intel_uncore.c b/drivers/gpu/drm/i915/intel_uncore.c
index 918b76163965..46de8d75b4bf 100644
--- a/drivers/gpu/drm/i915/intel_uncore.c
+++ b/drivers/gpu/drm/i915/intel_uncore.c
@@ -43,23 +43,17 @@
 static void
 assert_device_not_suspended(struct drm_i915_private *dev_priv)
 {
-	WARN(HAS_RUNTIME_PM(dev_priv->dev) && dev_priv->pm.suspended,
-	     "Device suspended\n");
+	WARN_ONCE(HAS_RUNTIME_PM(dev_priv->dev) && dev_priv->pm.suspended,
+		  "Device suspended\n");
 }
 
 static void __gen6_gt_wait_for_thread_c0(struct drm_i915_private *dev_priv)
 {
-	u32 gt_thread_status_mask;
-
-	if (IS_HASWELL(dev_priv->dev))
-		gt_thread_status_mask = GEN6_GT_THREAD_STATUS_CORE_MASK_HSW;
-	else
-		gt_thread_status_mask = GEN6_GT_THREAD_STATUS_CORE_MASK;
-
 	/* w/a for a sporadic read returning 0 by waiting for the GT
 	 * thread to wake up.
 	 */
-	if (wait_for_atomic_us((__raw_i915_read32(dev_priv, GEN6_GT_THREAD_STATUS_REG) & gt_thread_status_mask) == 0, 500))
+	if (wait_for_atomic_us((__raw_i915_read32(dev_priv, GEN6_GT_THREAD_STATUS_REG) &
+				GEN6_GT_THREAD_STATUS_CORE_MASK) == 0, 500))
 		DRM_ERROR("GT thread status wait timed out\n");
 }
 
@@ -120,8 +114,7 @@ static void __gen7_gt_force_wake_mt_get(struct drm_i915_private *dev_priv,
 		DRM_ERROR("Timed out waiting for forcewake to ack request.\n");
 
 	/* WaRsForcewakeWaitTC0:ivb,hsw */
-	if (INTEL_INFO(dev_priv->dev)->gen < 8)
-		__gen6_gt_wait_for_thread_c0(dev_priv);
+	__gen6_gt_wait_for_thread_c0(dev_priv);
 }
 
 static void gen6_gt_check_fifodbg(struct drm_i915_private *dev_priv)
@@ -229,10 +222,6 @@ static void __vlv_force_wake_get(struct drm_i915_private *dev_priv,
 					FORCEWAKE_ACK_TIMEOUT_MS))
 			DRM_ERROR("Timed out: waiting for media to ack.\n");
 	}
-
-	/* WaRsForcewakeWaitTC0:vlv */
-	if (!IS_CHERRYVIEW(dev_priv->dev))
-		__gen6_gt_wait_for_thread_c0(dev_priv);
 }
 
 static void __vlv_force_wake_put(struct drm_i915_private *dev_priv,
@@ -299,6 +288,154 @@ static void vlv_force_wake_put(struct drm_i915_private *dev_priv, int fw_engine)
 	spin_unlock_irqrestore(&dev_priv->uncore.lock, irqflags);
 }
 
+static void __gen9_gt_force_wake_mt_reset(struct drm_i915_private *dev_priv)
+{
+	__raw_i915_write32(dev_priv, FORCEWAKE_RENDER_GEN9,
+			_MASKED_BIT_DISABLE(0xffff));
+
+	__raw_i915_write32(dev_priv, FORCEWAKE_MEDIA_GEN9,
+			_MASKED_BIT_DISABLE(0xffff));
+
+	__raw_i915_write32(dev_priv, FORCEWAKE_BLITTER_GEN9,
+			_MASKED_BIT_DISABLE(0xffff));
+}
+
+static void
+__gen9_force_wake_get(struct drm_i915_private *dev_priv, int fw_engine)
+{
+	/* Check for Render Engine */
+	if (FORCEWAKE_RENDER & fw_engine) {
+		if (wait_for_atomic((__raw_i915_read32(dev_priv,
+						FORCEWAKE_ACK_RENDER_GEN9) &
+						FORCEWAKE_KERNEL) == 0,
+					FORCEWAKE_ACK_TIMEOUT_MS))
+			DRM_ERROR("Timed out: Render forcewake old ack to clear.\n");
+
+		__raw_i915_write32(dev_priv, FORCEWAKE_RENDER_GEN9,
+				   _MASKED_BIT_ENABLE(FORCEWAKE_KERNEL));
+
+		if (wait_for_atomic((__raw_i915_read32(dev_priv,
+						FORCEWAKE_ACK_RENDER_GEN9) &
+						FORCEWAKE_KERNEL),
+					FORCEWAKE_ACK_TIMEOUT_MS))
+			DRM_ERROR("Timed out: waiting for Render to ack.\n");
+	}
+
+	/* Check for Media Engine */
+	if (FORCEWAKE_MEDIA & fw_engine) {
+		if (wait_for_atomic((__raw_i915_read32(dev_priv,
+						FORCEWAKE_ACK_MEDIA_GEN9) &
+						FORCEWAKE_KERNEL) == 0,
+					FORCEWAKE_ACK_TIMEOUT_MS))
+			DRM_ERROR("Timed out: Media forcewake old ack to clear.\n");
+
+		__raw_i915_write32(dev_priv, FORCEWAKE_MEDIA_GEN9,
+				   _MASKED_BIT_ENABLE(FORCEWAKE_KERNEL));
+
+		if (wait_for_atomic((__raw_i915_read32(dev_priv,
+						FORCEWAKE_ACK_MEDIA_GEN9) &
+						FORCEWAKE_KERNEL),
+					FORCEWAKE_ACK_TIMEOUT_MS))
+			DRM_ERROR("Timed out: waiting for Media to ack.\n");
+	}
+
+	/* Check for Blitter Engine */
+	if (FORCEWAKE_BLITTER & fw_engine) {
+		if (wait_for_atomic((__raw_i915_read32(dev_priv,
+						FORCEWAKE_ACK_BLITTER_GEN9) &
+						FORCEWAKE_KERNEL) == 0,
+					FORCEWAKE_ACK_TIMEOUT_MS))
+			DRM_ERROR("Timed out: Blitter forcewake old ack to clear.\n");
+
+		__raw_i915_write32(dev_priv, FORCEWAKE_BLITTER_GEN9,
+				   _MASKED_BIT_ENABLE(FORCEWAKE_KERNEL));
+
+		if (wait_for_atomic((__raw_i915_read32(dev_priv,
+						FORCEWAKE_ACK_BLITTER_GEN9) &
+						FORCEWAKE_KERNEL),
+					FORCEWAKE_ACK_TIMEOUT_MS))
+			DRM_ERROR("Timed out: waiting for Blitter to ack.\n");
+	}
+}
+
+static void
+__gen9_force_wake_put(struct drm_i915_private *dev_priv, int fw_engine)
+{
+	/* Check for Render Engine */
+	if (FORCEWAKE_RENDER & fw_engine)
+		__raw_i915_write32(dev_priv, FORCEWAKE_RENDER_GEN9,
+				_MASKED_BIT_DISABLE(FORCEWAKE_KERNEL));
+
+	/* Check for Media Engine */
+	if (FORCEWAKE_MEDIA & fw_engine)
+		__raw_i915_write32(dev_priv, FORCEWAKE_MEDIA_GEN9,
+				_MASKED_BIT_DISABLE(FORCEWAKE_KERNEL));
+
+	/* Check for Blitter Engine */
+	if (FORCEWAKE_BLITTER & fw_engine)
+		__raw_i915_write32(dev_priv, FORCEWAKE_BLITTER_GEN9,
+				_MASKED_BIT_DISABLE(FORCEWAKE_KERNEL));
+}
+
+static void
+gen9_force_wake_get(struct drm_i915_private *dev_priv, int fw_engine)
+{
+	unsigned long irqflags;
+
+	spin_lock_irqsave(&dev_priv->uncore.lock, irqflags);
+
+	if (FORCEWAKE_RENDER & fw_engine) {
+		if (dev_priv->uncore.fw_rendercount++ == 0)
+			dev_priv->uncore.funcs.force_wake_get(dev_priv,
+							FORCEWAKE_RENDER);
+	}
+
+	if (FORCEWAKE_MEDIA & fw_engine) {
+		if (dev_priv->uncore.fw_mediacount++ == 0)
+			dev_priv->uncore.funcs.force_wake_get(dev_priv,
+							FORCEWAKE_MEDIA);
+	}
+
+	if (FORCEWAKE_BLITTER & fw_engine) {
+		if (dev_priv->uncore.fw_blittercount++ == 0)
+			dev_priv->uncore.funcs.force_wake_get(dev_priv,
+							FORCEWAKE_BLITTER);
+	}
+
+	spin_unlock_irqrestore(&dev_priv->uncore.lock, irqflags);
+}
+
+static void
+gen9_force_wake_put(struct drm_i915_private *dev_priv, int fw_engine)
+{
+	unsigned long irqflags;
+
+	spin_lock_irqsave(&dev_priv->uncore.lock, irqflags);
+
+	if (FORCEWAKE_RENDER & fw_engine) {
+		WARN_ON(dev_priv->uncore.fw_rendercount == 0);
+		if (--dev_priv->uncore.fw_rendercount == 0)
+			dev_priv->uncore.funcs.force_wake_put(dev_priv,
+							FORCEWAKE_RENDER);
+	}
+
+	if (FORCEWAKE_MEDIA & fw_engine) {
+		WARN_ON(dev_priv->uncore.fw_mediacount == 0);
+		if (--dev_priv->uncore.fw_mediacount == 0)
+			dev_priv->uncore.funcs.force_wake_put(dev_priv,
+							FORCEWAKE_MEDIA);
+	}
+
+	if (FORCEWAKE_BLITTER & fw_engine) {
+		WARN_ON(dev_priv->uncore.fw_blittercount == 0);
+		if (--dev_priv->uncore.fw_blittercount == 0)
+			dev_priv->uncore.funcs.force_wake_put(dev_priv,
+							FORCEWAKE_BLITTER);
+	}
+
+	spin_unlock_irqrestore(&dev_priv->uncore.lock, irqflags);
+}
+
 static void gen6_force_wake_timer(unsigned long arg)
 {
 	struct drm_i915_private *dev_priv = (void *)arg;
@@ -337,6 +474,9 @@ void intel_uncore_forcewake_reset(struct drm_device *dev, bool restore)
 	if (IS_IVYBRIDGE(dev) || IS_HASWELL(dev) || IS_BROADWELL(dev))
 		__gen7_gt_force_wake_mt_reset(dev_priv);
 
+	if (IS_GEN9(dev))
+		__gen9_gt_force_wake_mt_reset(dev_priv);
+
 	if (restore) { /* If reset with a user forcewake, try to restore */
 		unsigned fw = 0;
 
@@ -346,6 +486,15 @@ void intel_uncore_forcewake_reset(struct drm_device *dev, bool restore)
 
 			if (dev_priv->uncore.fw_mediacount)
 				fw |= FORCEWAKE_MEDIA;
+		} else if (IS_GEN9(dev)) {
+			if (dev_priv->uncore.fw_rendercount)
+				fw |= FORCEWAKE_RENDER;
+
+			if (dev_priv->uncore.fw_mediacount)
+				fw |= FORCEWAKE_MEDIA;
+
+			if (dev_priv->uncore.fw_blittercount)
+				fw |= FORCEWAKE_BLITTER;
 		} else {
 			if (dev_priv->uncore.forcewake_count)
 				fw = FORCEWAKE_ALL;
@@ -363,7 +512,8 @@ void intel_uncore_forcewake_reset(struct drm_device *dev, bool restore)
 	spin_unlock_irqrestore(&dev_priv->uncore.lock, irqflags);
 }
 
-void intel_uncore_early_sanitize(struct drm_device *dev, bool restore_forcewake)
+static void __intel_uncore_early_sanitize(struct drm_device *dev,
+					  bool restore_forcewake)
 {
 	struct drm_i915_private *dev_priv = dev->dev_private;
 
@@ -389,6 +539,12 @@ void intel_uncore_early_sanitize(struct drm_device *dev, bool restore_forcewake)
 	intel_uncore_forcewake_reset(dev, restore_forcewake);
 }
 
+void intel_uncore_early_sanitize(struct drm_device *dev, bool restore_forcewake)
+{
+	__intel_uncore_early_sanitize(dev, restore_forcewake);
+	i915_check_and_clear_faults(dev);
+}
+
 void intel_uncore_sanitize(struct drm_device *dev)
 {
 	/* BIOS often leaves RC6 enabled, but disable it for hw init */
@@ -410,6 +566,10 @@ void gen6_gt_force_wake_get(struct drm_i915_private *dev_priv, int fw_engine)
 
 	intel_runtime_pm_get(dev_priv);
 
+	/* Redirect to Gen9 specific routine */
+	if (IS_GEN9(dev_priv->dev))
+		return gen9_force_wake_get(dev_priv, fw_engine);
+
 	/* Redirect to VLV specific routine */
 	if (IS_VALLEYVIEW(dev_priv->dev))
 		return vlv_force_wake_get(dev_priv, fw_engine);
@@ -431,6 +591,12 @@ void gen6_gt_force_wake_put(struct drm_i915_private *dev_priv, int fw_engine)
 	if (!dev_priv->uncore.funcs.force_wake_put)
 		return;
 
+	/* Redirect to Gen9 specific routine */
+	if (IS_GEN9(dev_priv->dev)) {
+		gen9_force_wake_put(dev_priv, fw_engine);
+		goto out;
+	}
+
 	/* Redirect to VLV specific routine */
 	if (IS_VALLEYVIEW(dev_priv->dev)) {
 		vlv_force_wake_put(dev_priv, fw_engine);
@@ -504,6 +670,38 @@ void assert_force_wake_inactive(struct drm_i915_private *dev_priv)
 	 REG_RANGE((reg), 0x14000, 0x14400) || \
 	 REG_RANGE((reg), 0x22000, 0x24000))
 
+#define FORCEWAKE_GEN9_UNCORE_RANGE_OFFSET(reg) \
+	REG_RANGE((reg), 0xB00,  0x2000)
+
+#define FORCEWAKE_GEN9_RENDER_RANGE_OFFSET(reg) \
+	(REG_RANGE((reg), 0x2000, 0x2700) || \
+	 REG_RANGE((reg), 0x3000, 0x4000) || \
+	 REG_RANGE((reg), 0x5200, 0x8000) || \
+	 REG_RANGE((reg), 0x8140, 0x8160) || \
+	 REG_RANGE((reg), 0x8300, 0x8500) || \
+	 REG_RANGE((reg), 0x8C00, 0x8D00) || \
+	 REG_RANGE((reg), 0xB000, 0xB480) || \
+	 REG_RANGE((reg), 0xE000, 0xE900) || \
+	 REG_RANGE((reg), 0x24400, 0x24800))
+
+#define FORCEWAKE_GEN9_MEDIA_RANGE_OFFSET(reg) \
+	(REG_RANGE((reg), 0x8130, 0x8140) || \
+	 REG_RANGE((reg), 0x8800, 0x8A00) || \
+	 REG_RANGE((reg), 0xD000, 0xD800) || \
+	 REG_RANGE((reg), 0x12000, 0x14000) || \
+	 REG_RANGE((reg), 0x1A000, 0x1EA00) || \
+	 REG_RANGE((reg), 0x30000, 0x40000))
+
+#define FORCEWAKE_GEN9_COMMON_RANGE_OFFSET(reg) \
+	REG_RANGE((reg), 0x9400, 0x9800)
+
+#define FORCEWAKE_GEN9_BLITTER_RANGE_OFFSET(reg) \
+	((reg) < 0x40000 &&\
+	 !FORCEWAKE_GEN9_UNCORE_RANGE_OFFSET(reg) && \
+	 !FORCEWAKE_GEN9_RENDER_RANGE_OFFSET(reg) && \
+	 !FORCEWAKE_GEN9_MEDIA_RANGE_OFFSET(reg) && \
+	 !FORCEWAKE_GEN9_COMMON_RANGE_OFFSET(reg))
+
 static void
 ilk_dummy_write(struct drm_i915_private *dev_priv)
 {
@@ -634,6 +832,45 @@ chv_read##x(struct drm_i915_private *dev_priv, off_t reg, bool trace) { \
 	REG_READ_FOOTER; \
 }
 
+#define SKL_NEEDS_FORCE_WAKE(dev_priv, reg)	\
+	 ((reg) < 0x40000 && !FORCEWAKE_GEN9_UNCORE_RANGE_OFFSET(reg))
+
+#define __gen9_read(x) \
+static u##x \
+gen9_read##x(struct drm_i915_private *dev_priv, off_t reg, bool trace) { \
+	REG_READ_HEADER(x); \
+	if (!SKL_NEEDS_FORCE_WAKE((dev_priv), (reg))) { \
+		val = __raw_i915_read##x(dev_priv, reg); \
+	} else { \
+		unsigned fwengine = 0; \
+		if (FORCEWAKE_GEN9_RENDER_RANGE_OFFSET(reg)) { \
+			if (dev_priv->uncore.fw_rendercount == 0) \
+				fwengine = FORCEWAKE_RENDER; \
+		} else if (FORCEWAKE_GEN9_MEDIA_RANGE_OFFSET(reg)) { \
+			if (dev_priv->uncore.fw_mediacount == 0) \
+				fwengine = FORCEWAKE_MEDIA; \
+		} else if (FORCEWAKE_GEN9_COMMON_RANGE_OFFSET(reg)) { \
+			if (dev_priv->uncore.fw_rendercount == 0) \
+				fwengine |= FORCEWAKE_RENDER; \
+			if (dev_priv->uncore.fw_mediacount == 0) \
+				fwengine |= FORCEWAKE_MEDIA; \
+		} else { \
+			if (dev_priv->uncore.fw_blittercount == 0) \
+				fwengine = FORCEWAKE_BLITTER; \
+		} \
+		if (fwengine) \
+			dev_priv->uncore.funcs.force_wake_get(dev_priv, fwengine); \
+		val = __raw_i915_read##x(dev_priv, reg); \
+		if (fwengine) \
+			dev_priv->uncore.funcs.force_wake_put(dev_priv, fwengine); \
+	} \
+	REG_READ_FOOTER; \
+}
+
+__gen9_read(8)
+__gen9_read(16)
+__gen9_read(32)
+__gen9_read(64)
 __chv_read(8)
 __chv_read(16)
 __chv_read(32)
@@ -655,6 +892,7 @@ __gen4_read(16)
 __gen4_read(32)
 __gen4_read(64)
 
+#undef __gen9_read
 #undef __chv_read
 #undef __vlv_read
 #undef __gen6_read
@@ -792,6 +1030,69 @@ chv_write##x(struct drm_i915_private *dev_priv, off_t reg, u##x val, bool trace)
 	REG_WRITE_FOOTER; \
 }
 
+static const u32 gen9_shadowed_regs[] = {
+	RING_TAIL(RENDER_RING_BASE),
+	RING_TAIL(GEN6_BSD_RING_BASE),
+	RING_TAIL(VEBOX_RING_BASE),
+	RING_TAIL(BLT_RING_BASE),
+	FORCEWAKE_BLITTER_GEN9,
+	FORCEWAKE_RENDER_GEN9,
+	FORCEWAKE_MEDIA_GEN9,
+	GEN6_RPNSWREQ,
+	GEN6_RC_VIDEO_FREQ,
+	/* TODO: Other registers are not yet used */
+};
+
+static bool is_gen9_shadowed(struct drm_i915_private *dev_priv, u32 reg)
+{
+	int i;
+	for (i = 0; i < ARRAY_SIZE(gen9_shadowed_regs); i++)
+		if (reg == gen9_shadowed_regs[i])
+			return true;
+
+	return false;
+}
+
+#define __gen9_write(x) \
+static void \
+gen9_write##x(struct drm_i915_private *dev_priv, off_t reg, u##x val, \
+		bool trace) { \
+	REG_WRITE_HEADER; \
+	if (!SKL_NEEDS_FORCE_WAKE((dev_priv), (reg)) || \
+			is_gen9_shadowed(dev_priv, reg)) { \
+		__raw_i915_write##x(dev_priv, reg, val); \
+	} else { \
+		unsigned fwengine = 0; \
+		if (FORCEWAKE_GEN9_RENDER_RANGE_OFFSET(reg)) { \
+			if (dev_priv->uncore.fw_rendercount == 0) \
+				fwengine = FORCEWAKE_RENDER; \
+		} else if (FORCEWAKE_GEN9_MEDIA_RANGE_OFFSET(reg)) { \
+			if (dev_priv->uncore.fw_mediacount == 0) \
+				fwengine = FORCEWAKE_MEDIA; \
+		} else if (FORCEWAKE_GEN9_COMMON_RANGE_OFFSET(reg)) { \
+			if (dev_priv->uncore.fw_rendercount == 0) \
+				fwengine |= FORCEWAKE_RENDER; \
+			if (dev_priv->uncore.fw_mediacount == 0) \
+				fwengine |= FORCEWAKE_MEDIA; \
+		} else { \
+			if (dev_priv->uncore.fw_blittercount == 0) \
+				fwengine = FORCEWAKE_BLITTER; \
+		} \
+		if (fwengine) \
+			dev_priv->uncore.funcs.force_wake_get(dev_priv, \
+					fwengine); \
+		__raw_i915_write##x(dev_priv, reg, val); \
+		if (fwengine) \
+			dev_priv->uncore.funcs.force_wake_put(dev_priv, \
+					fwengine); \
+	} \
+	REG_WRITE_FOOTER; \
+}
+
+__gen9_write(8)
+__gen9_write(16)
+__gen9_write(32)
+__gen9_write(64)
 __chv_write(8)
 __chv_write(16)
 __chv_write(32)
@@ -817,6 +1118,7 @@ __gen4_write(16)
 __gen4_write(32)
 __gen4_write(64)
 
+#undef __gen9_write
 #undef __chv_write
 #undef __gen8_write
 #undef __hsw_write
@@ -826,6 +1128,22 @@ __gen4_write(64)
 #undef REG_WRITE_FOOTER
 #undef REG_WRITE_HEADER
 
+#define ASSIGN_WRITE_MMIO_VFUNCS(x) \
+do { \
+	dev_priv->uncore.funcs.mmio_writeb = x##_write8; \
+	dev_priv->uncore.funcs.mmio_writew = x##_write16; \
+	dev_priv->uncore.funcs.mmio_writel = x##_write32; \
+	dev_priv->uncore.funcs.mmio_writeq = x##_write64; \
+} while (0)
+
+#define ASSIGN_READ_MMIO_VFUNCS(x) \
+do { \
+	dev_priv->uncore.funcs.mmio_readb = x##_read8; \
+	dev_priv->uncore.funcs.mmio_readw = x##_read16; \
+	dev_priv->uncore.funcs.mmio_readl = x##_read32; \
+	dev_priv->uncore.funcs.mmio_readq = x##_read64; \
+} while (0)
+
 void intel_uncore_init(struct drm_device *dev)
 {
 	struct drm_i915_private *dev_priv = dev->dev_private;
@@ -833,9 +1151,12 @@ void intel_uncore_init(struct drm_device *dev)
 	setup_timer(&dev_priv->uncore.force_wake_timer,
 		    gen6_force_wake_timer, (unsigned long)dev_priv);
 
-	intel_uncore_early_sanitize(dev, false);
+	__intel_uncore_early_sanitize(dev, false);
 
-	if (IS_VALLEYVIEW(dev)) {
+	if (IS_GEN9(dev)) {
+		dev_priv->uncore.funcs.force_wake_get = __gen9_force_wake_get;
+		dev_priv->uncore.funcs.force_wake_put = __gen9_force_wake_put;
+	} else if (IS_VALLEYVIEW(dev)) {
 		dev_priv->uncore.funcs.force_wake_get = __vlv_force_wake_get;
 		dev_priv->uncore.funcs.force_wake_put = __vlv_force_wake_put;
 	} else if (IS_HASWELL(dev) || IS_BROADWELL(dev)) {
@@ -881,77 +1202,52 @@ void intel_uncore_init(struct drm_device *dev)
 
 	switch (INTEL_INFO(dev)->gen) {
 	default:
+		WARN_ON(1);
+		return;
+	case 9:
+		ASSIGN_WRITE_MMIO_VFUNCS(gen9);
+		ASSIGN_READ_MMIO_VFUNCS(gen9);
+		break;
+	case 8:
 		if (IS_CHERRYVIEW(dev)) {
-			dev_priv->uncore.funcs.mmio_writeb  = chv_write8;
-			dev_priv->uncore.funcs.mmio_writew  = chv_write16;
-			dev_priv->uncore.funcs.mmio_writel  = chv_write32;
-			dev_priv->uncore.funcs.mmio_writeq  = chv_write64;
-			dev_priv->uncore.funcs.mmio_readb  = chv_read8;
-			dev_priv->uncore.funcs.mmio_readw  = chv_read16;
-			dev_priv->uncore.funcs.mmio_readl  = chv_read32;
-			dev_priv->uncore.funcs.mmio_readq  = chv_read64;
+			ASSIGN_WRITE_MMIO_VFUNCS(chv);
+			ASSIGN_READ_MMIO_VFUNCS(chv);
 
 		} else {
-			dev_priv->uncore.funcs.mmio_writeb  = gen8_write8;
-			dev_priv->uncore.funcs.mmio_writew  = gen8_write16;
-			dev_priv->uncore.funcs.mmio_writel  = gen8_write32;
-			dev_priv->uncore.funcs.mmio_writeq  = gen8_write64;
-			dev_priv->uncore.funcs.mmio_readb  = gen6_read8;
-			dev_priv->uncore.funcs.mmio_readw  = gen6_read16;
-			dev_priv->uncore.funcs.mmio_readl  = gen6_read32;
-			dev_priv->uncore.funcs.mmio_readq  = gen6_read64;
+			ASSIGN_WRITE_MMIO_VFUNCS(gen8);
+			ASSIGN_READ_MMIO_VFUNCS(gen6);
 		}
 		break;
 	case 7:
 	case 6:
 		if (IS_HASWELL(dev)) {
-			dev_priv->uncore.funcs.mmio_writeb  = hsw_write8;
-			dev_priv->uncore.funcs.mmio_writew  = hsw_write16;
-			dev_priv->uncore.funcs.mmio_writel  = hsw_write32;
-			dev_priv->uncore.funcs.mmio_writeq  = hsw_write64;
+			ASSIGN_WRITE_MMIO_VFUNCS(hsw);
 		} else {
-			dev_priv->uncore.funcs.mmio_writeb  = gen6_write8;
-			dev_priv->uncore.funcs.mmio_writew  = gen6_write16;
-			dev_priv->uncore.funcs.mmio_writel  = gen6_write32;
-			dev_priv->uncore.funcs.mmio_writeq  = gen6_write64;
+			ASSIGN_WRITE_MMIO_VFUNCS(gen6);
 		}
 
 		if (IS_VALLEYVIEW(dev)) {
-			dev_priv->uncore.funcs.mmio_readb  = vlv_read8;
-			dev_priv->uncore.funcs.mmio_readw  = vlv_read16;
-			dev_priv->uncore.funcs.mmio_readl  = vlv_read32;
-			dev_priv->uncore.funcs.mmio_readq  = vlv_read64;
+			ASSIGN_READ_MMIO_VFUNCS(vlv);
 		} else {
-			dev_priv->uncore.funcs.mmio_readb  = gen6_read8;
-			dev_priv->uncore.funcs.mmio_readw  = gen6_read16;
-			dev_priv->uncore.funcs.mmio_readl  = gen6_read32;
-			dev_priv->uncore.funcs.mmio_readq  = gen6_read64;
+			ASSIGN_READ_MMIO_VFUNCS(gen6);
 		}
 		break;
 	case 5:
-		dev_priv->uncore.funcs.mmio_writeb  = gen5_write8;
-		dev_priv->uncore.funcs.mmio_writew  = gen5_write16;
-		dev_priv->uncore.funcs.mmio_writel  = gen5_write32;
-		dev_priv->uncore.funcs.mmio_writeq  = gen5_write64;
-		dev_priv->uncore.funcs.mmio_readb  = gen5_read8;
-		dev_priv->uncore.funcs.mmio_readw  = gen5_read16;
-		dev_priv->uncore.funcs.mmio_readl  = gen5_read32;
-		dev_priv->uncore.funcs.mmio_readq  = gen5_read64;
+		ASSIGN_WRITE_MMIO_VFUNCS(gen5);
+		ASSIGN_READ_MMIO_VFUNCS(gen5);
 		break;
 	case 4:
 	case 3:
 	case 2:
-		dev_priv->uncore.funcs.mmio_writeb  = gen4_write8;
-		dev_priv->uncore.funcs.mmio_writew  = gen4_write16;
-		dev_priv->uncore.funcs.mmio_writel  = gen4_write32;
-		dev_priv->uncore.funcs.mmio_writeq  = gen4_write64;
-		dev_priv->uncore.funcs.mmio_readb  = gen4_read8;
-		dev_priv->uncore.funcs.mmio_readw  = gen4_read16;
-		dev_priv->uncore.funcs.mmio_readl  = gen4_read32;
-		dev_priv->uncore.funcs.mmio_readq  = gen4_read64;
+		ASSIGN_WRITE_MMIO_VFUNCS(gen4);
+		ASSIGN_READ_MMIO_VFUNCS(gen4);
 		break;
 	}
+
+	i915_check_and_clear_faults(dev);
 }
+#undef ASSIGN_WRITE_MMIO_VFUNCS
+#undef ASSIGN_READ_MMIO_VFUNCS
 
 void intel_uncore_fini(struct drm_device *dev)
 {
@@ -968,7 +1264,7 @@ static const struct register_whitelist {
 	/* supported gens, 0x10 for 4, 0x30 for 4 and 5, etc. */
 	uint32_t gen_bitmask;
 } whitelist[] = {
-	{ RING_TIMESTAMP(RENDER_RING_BASE), 8, GEN_RANGE(4, 8) },
+	{ RING_TIMESTAMP(RENDER_RING_BASE), 8, GEN_RANGE(4, 9) },
 };
 
 int i915_reg_read_ioctl(struct drm_device *dev,
@@ -1053,41 +1349,34 @@ int i915_get_reset_stats_ioctl(struct drm_device *dev,
 	return 0;
 }
 
-static int i965_reset_complete(struct drm_device *dev)
+static int i915_reset_complete(struct drm_device *dev)
 {
 	u8 gdrst;
-	pci_read_config_byte(dev->pdev, I965_GDRST, &gdrst);
-	return (gdrst & GRDOM_RESET_ENABLE) == 0;
+	pci_read_config_byte(dev->pdev, I915_GDRST, &gdrst);
+	return (gdrst & GRDOM_RESET_STATUS) == 0;
 }
 
-static int i965_do_reset(struct drm_device *dev)
+static int i915_do_reset(struct drm_device *dev)
 {
-	int ret;
-
-	/* FIXME: i965g/gm need a display save/restore for gpu reset. */
-	return -ENODEV;
+	/* assert reset for at least 20 usec */
+	pci_write_config_byte(dev->pdev, I915_GDRST, GRDOM_RESET_ENABLE);
+	udelay(20);
+	pci_write_config_byte(dev->pdev, I915_GDRST, 0);
 
-	/*
-	 * Set the domains we want to reset (GRDOM/bits 2 and 3) as
-	 * well as the reset bit (GR/bit 0).  Setting the GR bit
-	 * triggers the reset; when done, the hardware will clear it.
-	 */
-	pci_write_config_byte(dev->pdev, I965_GDRST,
-			      GRDOM_RENDER | GRDOM_RESET_ENABLE);
-	ret =  wait_for(i965_reset_complete(dev), 500);
-	if (ret)
-		return ret;
-
-	pci_write_config_byte(dev->pdev, I965_GDRST,
-			      GRDOM_MEDIA | GRDOM_RESET_ENABLE);
-
-	ret =  wait_for(i965_reset_complete(dev), 500);
-	if (ret)
-		return ret;
+	return wait_for(i915_reset_complete(dev), 500);
+}
 
-	pci_write_config_byte(dev->pdev, I965_GDRST, 0);
+static int g4x_reset_complete(struct drm_device *dev)
+{
+	u8 gdrst;
+	pci_read_config_byte(dev->pdev, I915_GDRST, &gdrst);
+	return (gdrst & GRDOM_RESET_ENABLE) == 0;
+}
 
-	return 0;
+static int g33_do_reset(struct drm_device *dev)
+{
+	pci_write_config_byte(dev->pdev, I915_GDRST, GRDOM_RESET_ENABLE);
+	return wait_for(g4x_reset_complete(dev), 500);
 }
 
 static int g4x_do_reset(struct drm_device *dev)
@@ -1095,9 +1384,9 @@ static int g4x_do_reset(struct drm_device *dev)
 	struct drm_i915_private *dev_priv = dev->dev_private;
 	int ret;
 
-	pci_write_config_byte(dev->pdev, I965_GDRST,
+	pci_write_config_byte(dev->pdev, I915_GDRST,
 			      GRDOM_RENDER | GRDOM_RESET_ENABLE);
-	ret =  wait_for(i965_reset_complete(dev), 500);
+	ret =  wait_for(g4x_reset_complete(dev), 500);
 	if (ret)
 		return ret;
 
@@ -1105,9 +1394,9 @@ static int g4x_do_reset(struct drm_device *dev)
 	I915_WRITE(VDECCLK_GATE_D, I915_READ(VDECCLK_GATE_D) | VCP_UNIT_CLOCK_GATE_DISABLE);
 	POSTING_READ(VDECCLK_GATE_D);
 
-	pci_write_config_byte(dev->pdev, I965_GDRST,
+	pci_write_config_byte(dev->pdev, I915_GDRST,
 			      GRDOM_MEDIA | GRDOM_RESET_ENABLE);
-	ret =  wait_for(i965_reset_complete(dev), 500);
+	ret =  wait_for(g4x_reset_complete(dev), 500);
 	if (ret)
 		return ret;
 
@@ -1115,7 +1404,7 @@ static int g4x_do_reset(struct drm_device *dev)
 	I915_WRITE(VDECCLK_GATE_D, I915_READ(VDECCLK_GATE_D) & ~VCP_UNIT_CLOCK_GATE_DISABLE);
 	POSTING_READ(VDECCLK_GATE_D);
 
-	pci_write_config_byte(dev->pdev, I965_GDRST, 0);
+	pci_write_config_byte(dev->pdev, I915_GDRST, 0);
 
 	return 0;
 }
@@ -1173,8 +1462,10 @@ int intel_gpu_reset(struct drm_device *dev)
 		return ironlake_do_reset(dev);
 	else if (IS_G4X(dev))
 		return g4x_do_reset(dev);
-	else if (IS_GEN4(dev))
-		return i965_do_reset(dev);
+	else if (IS_G33(dev))
+		return g33_do_reset(dev);
+	else if (INTEL_INFO(dev)->gen >= 3)
+		return i915_do_reset(dev);
 	else
 		return -ENODEV;
 }
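Note on the intel_uncore.c hunks above: the ASSIGN_WRITE_MMIO_VFUNCS() and ASSIGN_READ_MMIO_VFUNCS() calls replace eight hand-written assignments per platform with one macro call each. The macro definitions fall outside the hunks shown here, so the following is only a minimal standalone sketch of the token-pasting pattern they appear to use. The struct, the dummy readers, and the names ASSIGN_READ_VFUNCS and init_funcs are hypothetical stand-ins, not the driver's code.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the driver's table of MMIO read accessors. */
struct mmio_funcs {
	uint8_t  (*mmio_readb)(uint32_t reg);
	uint16_t (*mmio_readw)(uint32_t reg);
	uint32_t (*mmio_readl)(uint32_t reg);
	uint64_t (*mmio_readq)(uint32_t reg);
};

/* Dummy readers: each returns a constant that identifies its family. */
static uint8_t  gen5_read8(uint32_t reg)  { (void)reg; return 5; }
static uint16_t gen5_read16(uint32_t reg) { (void)reg; return 5; }
static uint32_t gen5_read32(uint32_t reg) { (void)reg; return 5; }
static uint64_t gen5_read64(uint32_t reg) { (void)reg; return 5; }

static uint8_t  gen6_read8(uint32_t reg)  { (void)reg; return 6; }
static uint16_t gen6_read16(uint32_t reg) { (void)reg; return 6; }
static uint32_t gen6_read32(uint32_t reg) { (void)reg; return 6; }
static uint64_t gen6_read64(uint32_t reg) { (void)reg; return 6; }

/* Paste the family prefix x onto each accessor name (x##_read8, ...). */
#define ASSIGN_READ_VFUNCS(funcs, x) \
do { \
	(funcs)->mmio_readb = x##_read8; \
	(funcs)->mmio_readw = x##_read16; \
	(funcs)->mmio_readl = x##_read32; \
	(funcs)->mmio_readq = x##_read64; \
} while (0)

/* Pick an accessor family by generation; unknown generations are rejected. */
static int init_funcs(struct mmio_funcs *funcs, int gen)
{
	switch (gen) {
	default:
		return -1;
	case 7:
	case 6:
		ASSIGN_READ_VFUNCS(funcs, gen6);
		break;
	case 5:
		ASSIGN_READ_VFUNCS(funcs, gen5);
		break;
	}
	return 0;
}
#undef ASSIGN_READ_VFUNCS
```

As in the patch, the macro is undefined right after its only user, and the default case bails out instead of silently reusing the newest platform's accessors.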
diff --git a/drivers/staging/imx-drm/Kconfig b/drivers/gpu/drm/imx/Kconfig
index 82fb758a29bc..82fb758a29bc 100644
--- a/drivers/staging/imx-drm/Kconfig
+++ b/drivers/gpu/drm/imx/Kconfig
diff --git a/drivers/staging/imx-drm/Makefile b/drivers/gpu/drm/imx/Makefile
index 582c438d8cbd..582c438d8cbd 100644
--- a/drivers/staging/imx-drm/Makefile
+++ b/drivers/gpu/drm/imx/Makefile
diff --git a/drivers/staging/imx-drm/imx-drm-core.c b/drivers/gpu/drm/imx/imx-drm-core.c
index ad6173500bfc..e48b2211d2d6 100644
--- a/drivers/staging/imx-drm/imx-drm-core.c
+++ b/drivers/gpu/drm/imx/imx-drm-core.c
@@ -24,6 +24,7 @@
 #include <drm/drm_crtc_helper.h>
 #include <drm/drm_gem_cma_helper.h>
 #include <drm/drm_fb_cma_helper.h>
+#include <drm/drm_plane_helper.h>
 
 #include "imx-drm.h"
 
diff --git a/drivers/staging/imx-drm/imx-drm.h b/drivers/gpu/drm/imx/imx-drm.h
index 7453ae00c412..7453ae00c412 100644
--- a/drivers/staging/imx-drm/imx-drm.h
+++ b/drivers/gpu/drm/imx/imx-drm.h
diff --git a/drivers/staging/imx-drm/imx-hdmi.c b/drivers/gpu/drm/imx/imx-hdmi.c
index ddc53e039530..ddc53e039530 100644
--- a/drivers/staging/imx-drm/imx-hdmi.c
+++ b/drivers/gpu/drm/imx/imx-hdmi.c
diff --git a/drivers/staging/imx-drm/imx-hdmi.h b/drivers/gpu/drm/imx/imx-hdmi.h
index 39b677689db6..39b677689db6 100644
--- a/drivers/staging/imx-drm/imx-hdmi.h
+++ b/drivers/gpu/drm/imx/imx-hdmi.h
diff --git a/drivers/staging/imx-drm/imx-ldb.c b/drivers/gpu/drm/imx/imx-ldb.c
index 2638dc1671d0..2638dc1671d0 100644
--- a/drivers/staging/imx-drm/imx-ldb.c
+++ b/drivers/gpu/drm/imx/imx-ldb.c
diff --git a/drivers/staging/imx-drm/imx-tve.c b/drivers/gpu/drm/imx/imx-tve.c
index 64b54d7f996c..64b54d7f996c 100644
--- a/drivers/staging/imx-drm/imx-tve.c
+++ b/drivers/gpu/drm/imx/imx-tve.c
diff --git a/drivers/staging/imx-drm/ipuv3-crtc.c b/drivers/gpu/drm/imx/ipuv3-crtc.c
index 11e84a251773..11e84a251773 100644
--- a/drivers/staging/imx-drm/ipuv3-crtc.c
+++ b/drivers/gpu/drm/imx/ipuv3-crtc.c
diff --git a/drivers/staging/imx-drm/ipuv3-plane.c b/drivers/gpu/drm/imx/ipuv3-plane.c
index 944962b692bb..944962b692bb 100644
--- a/drivers/staging/imx-drm/ipuv3-plane.c
+++ b/drivers/gpu/drm/imx/ipuv3-plane.c
diff --git a/drivers/staging/imx-drm/ipuv3-plane.h b/drivers/gpu/drm/imx/ipuv3-plane.h
index c0aae5bcb5d4..c0aae5bcb5d4 100644
--- a/drivers/staging/imx-drm/ipuv3-plane.h
+++ b/drivers/gpu/drm/imx/ipuv3-plane.h
diff --git a/drivers/staging/imx-drm/parallel-display.c b/drivers/gpu/drm/imx/parallel-display.c
index 8a76a5c1c34b..8a76a5c1c34b 100644
--- a/drivers/staging/imx-drm/parallel-display.c
+++ b/drivers/gpu/drm/imx/parallel-display.c
diff --git a/drivers/gpu/drm/mgag200/mgag200_mode.c b/drivers/gpu/drm/mgag200/mgag200_mode.c
index 83485ab81ce8..9872ba9abf1a 100644
--- a/drivers/gpu/drm/mgag200/mgag200_mode.c
+++ b/drivers/gpu/drm/mgag200/mgag200_mode.c
@@ -15,6 +15,7 @@
 
 #include <drm/drmP.h>
 #include <drm/drm_crtc_helper.h>
+#include <drm/drm_plane_helper.h>
 
 #include "mgag200_drv.h"
 
diff --git a/drivers/gpu/drm/msm/Kconfig b/drivers/gpu/drm/msm/Kconfig
index 9d907c526c94..5b2a1ff95d3d 100644
--- a/drivers/gpu/drm/msm/Kconfig
+++ b/drivers/gpu/drm/msm/Kconfig
@@ -3,6 +3,7 @@ config DRM_MSM
 	tristate "MSM DRM"
 	depends on DRM
 	depends on ARCH_QCOM || (ARM && COMPILE_TEST)
+	select REGULATOR
 	select DRM_KMS_HELPER
 	select DRM_PANEL
 	select SHMEM
diff --git a/drivers/gpu/drm/msm/Makefile b/drivers/gpu/drm/msm/Makefile
index 6283dcb96af5..143d988f8add 100644
--- a/drivers/gpu/drm/msm/Makefile
+++ b/drivers/gpu/drm/msm/Makefile
@@ -7,6 +7,7 @@ msm-y := \
 	adreno/adreno_device.o \
 	adreno/adreno_gpu.o \
 	adreno/a3xx_gpu.o \
+	adreno/a4xx_gpu.o \
 	hdmi/hdmi.o \
 	hdmi/hdmi_audio.o \
 	hdmi/hdmi_bridge.o \
@@ -24,12 +25,15 @@ msm-y := \
 	mdp/mdp4/mdp4_irq.o \
 	mdp/mdp4/mdp4_kms.o \
 	mdp/mdp4/mdp4_plane.o \
+	mdp/mdp5/mdp5_cfg.o \
+	mdp/mdp5/mdp5_ctl.o \
 	mdp/mdp5/mdp5_crtc.o \
 	mdp/mdp5/mdp5_encoder.o \
 	mdp/mdp5/mdp5_irq.o \
 	mdp/mdp5/mdp5_kms.o \
 	mdp/mdp5/mdp5_plane.o \
 	mdp/mdp5/mdp5_smp.o \
+	msm_atomic.o \
 	msm_drv.o \
 	msm_fb.o \
 	msm_gem.o \
diff --git a/drivers/gpu/drm/msm/adreno/a2xx.xml.h b/drivers/gpu/drm/msm/adreno/a2xx.xml.h
index a3104598c27f..22882cc0a573 100644
--- a/drivers/gpu/drm/msm/adreno/a2xx.xml.h
+++ b/drivers/gpu/drm/msm/adreno/a2xx.xml.h
@@ -11,10 +11,10 @@ The rules-ng-ng source files this header was generated from are:
 - /home/robclark/src/freedreno/envytools/rnndb/adreno.xml               (    364 bytes, from 2013-11-30 14:47:15)
 - /home/robclark/src/freedreno/envytools/rnndb/freedreno_copyright.xml  (   1453 bytes, from 2013-03-31 16:51:27)
 - /home/robclark/src/freedreno/envytools/rnndb/adreno/a2xx.xml          (  32901 bytes, from 2014-06-02 15:21:30)
-- /home/robclark/src/freedreno/envytools/rnndb/adreno/adreno_common.xml (   9859 bytes, from 2014-06-02 15:21:30)
-- /home/robclark/src/freedreno/envytools/rnndb/adreno/adreno_pm4.xml    (  14960 bytes, from 2014-07-27 17:22:13)
-- /home/robclark/src/freedreno/envytools/rnndb/adreno/a3xx.xml          (  58020 bytes, from 2014-08-01 12:22:48)
-- /home/robclark/src/freedreno/envytools/rnndb/adreno/a4xx.xml          (  41068 bytes, from 2014-08-01 12:22:48)
+- /home/robclark/src/freedreno/envytools/rnndb/adreno/adreno_common.xml (  10551 bytes, from 2014-11-13 22:44:30)
+- /home/robclark/src/freedreno/envytools/rnndb/adreno/adreno_pm4.xml    (  15053 bytes, from 2014-11-09 15:45:47)
+- /home/robclark/src/freedreno/envytools/rnndb/adreno/a3xx.xml          (  63169 bytes, from 2014-11-13 22:44:18)
+- /home/robclark/src/freedreno/envytools/rnndb/adreno/a4xx.xml          (  49097 bytes, from 2014-11-14 15:38:00)
 
 Copyright (C) 2013-2014 by the following authors:
 - Rob Clark <robdclark@gmail.com> (robclark)
@@ -926,11 +926,11 @@ static inline uint32_t A2XX_VGT_DRAW_INITIATOR_INDEX_SIZE(enum pc_di_index_size
 #define A2XX_VGT_DRAW_INITIATOR_NOT_EOP				0x00001000
 #define A2XX_VGT_DRAW_INITIATOR_SMALL_INDEX			0x00002000
 #define A2XX_VGT_DRAW_INITIATOR_PRE_DRAW_INITIATOR_ENABLE	0x00004000
-#define A2XX_VGT_DRAW_INITIATOR_NUM_INDICES__MASK		0xffff0000
-#define A2XX_VGT_DRAW_INITIATOR_NUM_INDICES__SHIFT		16
-static inline uint32_t A2XX_VGT_DRAW_INITIATOR_NUM_INDICES(uint32_t val)
+#define A2XX_VGT_DRAW_INITIATOR_NUM_INSTANCES__MASK		0xff000000
+#define A2XX_VGT_DRAW_INITIATOR_NUM_INSTANCES__SHIFT		24
+static inline uint32_t A2XX_VGT_DRAW_INITIATOR_NUM_INSTANCES(uint32_t val)
 {
-	return ((val) << A2XX_VGT_DRAW_INITIATOR_NUM_INDICES__SHIFT) & A2XX_VGT_DRAW_INITIATOR_NUM_INDICES__MASK;
+	return ((val) << A2XX_VGT_DRAW_INITIATOR_NUM_INSTANCES__SHIFT) & A2XX_VGT_DRAW_INITIATOR_NUM_INSTANCES__MASK;
 }
 
 #define REG_A2XX_VGT_IMMED_DATA					0x000021fd
@@ -1243,13 +1243,13 @@ static inline uint32_t A2XX_CLEAR_COLOR_ALPHA(uint32_t val)
 #define A2XX_PA_SU_POINT_SIZE_HEIGHT__SHIFT			0
 static inline uint32_t A2XX_PA_SU_POINT_SIZE_HEIGHT(float val)
 {
-	return ((((uint32_t)(val * 8.0))) << A2XX_PA_SU_POINT_SIZE_HEIGHT__SHIFT) & A2XX_PA_SU_POINT_SIZE_HEIGHT__MASK;
+	return ((((uint32_t)(val * 16.0))) << A2XX_PA_SU_POINT_SIZE_HEIGHT__SHIFT) & A2XX_PA_SU_POINT_SIZE_HEIGHT__MASK;
 }
 #define A2XX_PA_SU_POINT_SIZE_WIDTH__MASK			0xffff0000
 #define A2XX_PA_SU_POINT_SIZE_WIDTH__SHIFT			16
 static inline uint32_t A2XX_PA_SU_POINT_SIZE_WIDTH(float val)
 {
-	return ((((uint32_t)(val * 8.0))) << A2XX_PA_SU_POINT_SIZE_WIDTH__SHIFT) & A2XX_PA_SU_POINT_SIZE_WIDTH__MASK;
+	return ((((uint32_t)(val * 16.0))) << A2XX_PA_SU_POINT_SIZE_WIDTH__SHIFT) & A2XX_PA_SU_POINT_SIZE_WIDTH__MASK;
 }
 
 #define REG_A2XX_PA_SU_POINT_MINMAX				0x00002281
@@ -1257,13 +1257,13 @@ static inline uint32_t A2XX_PA_SU_POINT_SIZE_WIDTH(float val)
 #define A2XX_PA_SU_POINT_MINMAX_MIN__SHIFT			0
 static inline uint32_t A2XX_PA_SU_POINT_MINMAX_MIN(float val)
 {
-	return ((((uint32_t)(val * 8.0))) << A2XX_PA_SU_POINT_MINMAX_MIN__SHIFT) & A2XX_PA_SU_POINT_MINMAX_MIN__MASK;
+	return ((((uint32_t)(val * 16.0))) << A2XX_PA_SU_POINT_MINMAX_MIN__SHIFT) & A2XX_PA_SU_POINT_MINMAX_MIN__MASK;
 }
 #define A2XX_PA_SU_POINT_MINMAX_MAX__MASK			0xffff0000
 #define A2XX_PA_SU_POINT_MINMAX_MAX__SHIFT			16
 static inline uint32_t A2XX_PA_SU_POINT_MINMAX_MAX(float val)
 {
-	return ((((uint32_t)(val * 8.0))) << A2XX_PA_SU_POINT_MINMAX_MAX__SHIFT) & A2XX_PA_SU_POINT_MINMAX_MAX__MASK;
+	return ((((uint32_t)(val * 16.0))) << A2XX_PA_SU_POINT_MINMAX_MAX__SHIFT) & A2XX_PA_SU_POINT_MINMAX_MAX__MASK;
 }
 
 #define REG_A2XX_PA_SU_LINE_CNTL				0x00002282
@@ -1271,7 +1271,7 @@ static inline uint32_t A2XX_PA_SU_POINT_MINMAX_MAX(float val)
 #define A2XX_PA_SU_LINE_CNTL_WIDTH__SHIFT			0
 static inline uint32_t A2XX_PA_SU_LINE_CNTL_WIDTH(float val)
 {
-	return ((((uint32_t)(val * 8.0))) << A2XX_PA_SU_LINE_CNTL_WIDTH__SHIFT) & A2XX_PA_SU_LINE_CNTL_WIDTH__MASK;
+	return ((((uint32_t)(val * 16.0))) << A2XX_PA_SU_LINE_CNTL_WIDTH__SHIFT) & A2XX_PA_SU_LINE_CNTL_WIDTH__MASK;
 }
 
 #define REG_A2XX_PA_SC_LINE_STIPPLE				0x00002283
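Note on the a2xx.xml.h hunks above: the point-size and line-width helpers change their scale factor from 8.0 to 16.0. One plausible reading, an inference from the factor alone and not something the generated header states, is that these fields carry four fractional bits. The sketch below mirrors the shape of the generated packers with hypothetical names (DEMO_WIDTH__MASK, demo_pack_width) to show what the new factor produces.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical field in the upper 16 bits of a register word. */
#define DEMO_WIDTH__MASK	0xffff0000u
#define DEMO_WIDTH__SHIFT	16

/* Scale by 16.0 (assumed: four fractional bits), truncate, shift, mask. */
static inline uint32_t demo_pack_width(float val)
{
	return (((uint32_t)(val * 16.0)) << DEMO_WIDTH__SHIFT) & DEMO_WIDTH__MASK;
}

/* Inverse, for illustration only: recover the value from a packed word. */
static inline float demo_unpack_width(uint32_t word)
{
	return (float)((word & DEMO_WIDTH__MASK) >> DEMO_WIDTH__SHIFT) / 16.0f;
}
```

Under this reading a width of 1.0 packs to 16 in the field and 2.5 packs to 40. With the old factor of 8.0 the same inputs would have produced 8 and 20, half the value the new factor yields.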
diff --git a/drivers/gpu/drm/msm/adreno/a3xx.xml.h b/drivers/gpu/drm/msm/adreno/a3xx.xml.h
index 82d015279b47..109e9a263daf 100644
--- a/drivers/gpu/drm/msm/adreno/a3xx.xml.h
+++ b/drivers/gpu/drm/msm/adreno/a3xx.xml.h
@@ -11,10 +11,10 @@ The rules-ng-ng source files this header was generated from are:
 - /home/robclark/src/freedreno/envytools/rnndb/adreno.xml               (    364 bytes, from 2013-11-30 14:47:15)
 - /home/robclark/src/freedreno/envytools/rnndb/freedreno_copyright.xml  (   1453 bytes, from 2013-03-31 16:51:27)
 - /home/robclark/src/freedreno/envytools/rnndb/adreno/a2xx.xml          (  32901 bytes, from 2014-06-02 15:21:30)
-- /home/robclark/src/freedreno/envytools/rnndb/adreno/adreno_common.xml (   9859 bytes, from 2014-06-02 15:21:30)
-- /home/robclark/src/freedreno/envytools/rnndb/adreno/adreno_pm4.xml    (  14960 bytes, from 2014-07-27 17:22:13)
-- /home/robclark/src/freedreno/envytools/rnndb/adreno/a3xx.xml          (  58020 bytes, from 2014-08-01 12:22:48)
-- /home/robclark/src/freedreno/envytools/rnndb/adreno/a4xx.xml          (  41068 bytes, from 2014-08-01 12:22:48)
+- /home/robclark/src/freedreno/envytools/rnndb/adreno/adreno_common.xml (  10551 bytes, from 2014-11-13 22:44:30)
+- /home/robclark/src/freedreno/envytools/rnndb/adreno/adreno_pm4.xml    (  15053 bytes, from 2014-11-09 15:45:47)
+- /home/robclark/src/freedreno/envytools/rnndb/adreno/a3xx.xml          (  63169 bytes, from 2014-11-13 22:44:18)
+- /home/robclark/src/freedreno/envytools/rnndb/adreno/a4xx.xml          (  49097 bytes, from 2014-11-14 15:38:00)
 
 Copyright (C) 2013-2014 by the following authors:
 - Rob Clark <robdclark@gmail.com> (robclark)
@@ -86,6 +86,14 @@ enum a3xx_vtx_fmt {
 	VFMT_NORM_USHORT_16_16 = 29,
 	VFMT_NORM_USHORT_16_16_16 = 30,
 	VFMT_NORM_USHORT_16_16_16_16 = 31,
+	VFMT_UINT_32 = 32,
+	VFMT_UINT_32_32 = 33,
+	VFMT_UINT_32_32_32 = 34,
+	VFMT_UINT_32_32_32_32 = 35,
+	VFMT_INT_32 = 36,
+	VFMT_INT_32_32 = 37,
+	VFMT_INT_32_32_32 = 38,
+	VFMT_INT_32_32_32_32 = 39,
 	VFMT_UBYTE_8 = 40,
 	VFMT_UBYTE_8_8 = 41,
 	VFMT_UBYTE_8_8_8 = 42,
@@ -112,7 +120,9 @@ enum a3xx_tex_fmt {
 	TFMT_NORM_USHORT_565 = 4,
 	TFMT_NORM_USHORT_5551 = 6,
 	TFMT_NORM_USHORT_4444 = 7,
+	TFMT_NORM_USHORT_Z16 = 9,
 	TFMT_NORM_UINT_X8Z24 = 10,
+	TFMT_FLOAT_Z32 = 11,
 	TFMT_NORM_UINT_NV12_UV_TILED = 17,
 	TFMT_NORM_UINT_NV12_Y_TILED = 19,
 	TFMT_NORM_UINT_NV12_UV = 21,
@@ -121,18 +131,38 @@ enum a3xx_tex_fmt {
 	TFMT_NORM_UINT_I420_U = 26,
 	TFMT_NORM_UINT_I420_V = 27,
 	TFMT_NORM_UINT_2_10_10_10 = 41,
+	TFMT_FLOAT_9_9_9_E5 = 42,
+	TFMT_FLOAT_10_11_11 = 43,
 	TFMT_NORM_UINT_A8 = 44,
 	TFMT_NORM_UINT_L8_A8 = 47,
 	TFMT_NORM_UINT_8 = 48,
 	TFMT_NORM_UINT_8_8 = 49,
 	TFMT_NORM_UINT_8_8_8 = 50,
 	TFMT_NORM_UINT_8_8_8_8 = 51,
+	TFMT_NORM_SINT_8_8 = 53,
+	TFMT_NORM_SINT_8_8_8_8 = 55,
+	TFMT_UINT_8_8 = 57,
+	TFMT_UINT_8_8_8_8 = 59,
+	TFMT_SINT_8_8 = 61,
+	TFMT_SINT_8_8_8_8 = 63,
 	TFMT_FLOAT_16 = 64,
 	TFMT_FLOAT_16_16 = 65,
 	TFMT_FLOAT_16_16_16_16 = 67,
+	TFMT_UINT_16 = 68,
+	TFMT_UINT_16_16 = 69,
+	TFMT_UINT_16_16_16_16 = 71,
+	TFMT_SINT_16 = 72,
+	TFMT_SINT_16_16 = 73,
+	TFMT_SINT_16_16_16_16 = 75,
 	TFMT_FLOAT_32 = 84,
 	TFMT_FLOAT_32_32 = 85,
 	TFMT_FLOAT_32_32_32_32 = 87,
+	TFMT_UINT_32 = 88,
+	TFMT_UINT_32_32 = 89,
+	TFMT_UINT_32_32_32_32 = 91,
+	TFMT_SINT_32 = 92,
+	TFMT_SINT_32_32 = 93,
+	TFMT_SINT_32_32_32_32 = 95,
 };
 
 enum a3xx_tex_fetchsize {
@@ -145,19 +175,34 @@ enum a3xx_tex_fetchsize {
 };
 
 enum a3xx_color_fmt {
+	RB_R5G6B5_UNORM = 0,
+	RB_R5G5B5A1_UNORM = 1,
+	RB_R4G4B4A4_UNORM = 3,
 	RB_R8G8B8_UNORM = 4,
 	RB_R8G8B8A8_UNORM = 8,
-	RB_Z16_UNORM = 12,
+	RB_R8G8B8A8_UINT = 10,
+	RB_R8G8B8A8_SINT = 11,
+	RB_R8G8_UNORM = 12,
+	RB_R8_UINT = 14,
+	RB_R8_SINT = 15,
+	RB_R10G10B10A2_UNORM = 16,
 	RB_A8_UNORM = 20,
+	RB_R8_UNORM = 21,
 	RB_R16G16B16A16_FLOAT = 27,
+	RB_R11G11B10_FLOAT = 28,
+	RB_R16_SINT = 40,
+	RB_R16G16_SINT = 41,
+	RB_R16G16B16A16_SINT = 43,
+	RB_R16_UINT = 44,
+	RB_R16G16_UINT = 45,
+	RB_R16G16B16A16_UINT = 47,
 	RB_R32G32B32A32_FLOAT = 51,
-};
-
-enum a3xx_color_swap {
-	WZYX = 0,
-	WXYZ = 1,
-	ZYXW = 2,
-	XYZW = 3,
+	RB_R32_SINT = 52,
+	RB_R32G32_SINT = 53,
+	RB_R32G32B32A32_SINT = 55,
+	RB_R32_UINT = 56,
+	RB_R32G32_UINT = 57,
+	RB_R32G32B32A32_UINT = 59,
 };
 
 enum a3xx_sp_perfcounter_select {
@@ -194,6 +239,11 @@ enum a3xx_rb_blend_opcode {
 	BLEND_MAX_DST_SRC = 4,
 };
 
+enum a3xx_intp_mode {
+	SMOOTH = 0,
+	FLAT = 1,
+};
+
 enum a3xx_tex_filter {
 	A3XX_TEX_NEAREST = 0,
 	A3XX_TEX_LINEAR = 1,
@@ -536,6 +586,10 @@ enum a3xx_tex_type {
 
 #define REG_A3XX_CP_MEQ_DATA					0x000001db
 
+#define REG_A3XX_CP_WFI_PEND_CTR				0x000001f5
+
+#define REG_A3XX_RBBM_PM_OVERRIDE2				0x0000039d
+
 #define REG_A3XX_CP_PERFCOUNTER_SELECT				0x00000445
 
 #define REG_A3XX_CP_HW_FAULT					0x0000045c
@@ -550,6 +604,12 @@ static inline uint32_t REG_A3XX_CP_PROTECT_REG(uint32_t i0) { return 0x00000460
 
 #define REG_A3XX_CP_AHB_FAULT					0x0000054d
 
+#define REG_A3XX_SQ_GPR_MANAGEMENT				0x00000d00
+
+#define REG_A3XX_SQ_INST_STORE_MANAGMENT			0x00000d02
+
+#define REG_A3XX_TP0_CHICKEN					0x00000e1e
+
 #define REG_A3XX_SP_GLOBAL_MEM_SIZE				0x00000e22
 
 #define REG_A3XX_SP_GLOBAL_MEM_ADDR				0x00000e23
@@ -632,13 +692,13 @@ static inline uint32_t A3XX_GRAS_CL_VPORT_ZSCALE(float val)
 #define A3XX_GRAS_SU_POINT_MINMAX_MIN__SHIFT			0
 static inline uint32_t A3XX_GRAS_SU_POINT_MINMAX_MIN(float val)
 {
-	return ((((uint32_t)(val * 8.0))) << A3XX_GRAS_SU_POINT_MINMAX_MIN__SHIFT) & A3XX_GRAS_SU_POINT_MINMAX_MIN__MASK;
+	return ((((uint32_t)(val * 16.0))) << A3XX_GRAS_SU_POINT_MINMAX_MIN__SHIFT) & A3XX_GRAS_SU_POINT_MINMAX_MIN__MASK;
 }
 #define A3XX_GRAS_SU_POINT_MINMAX_MAX__MASK			0xffff0000
 #define A3XX_GRAS_SU_POINT_MINMAX_MAX__SHIFT			16
 static inline uint32_t A3XX_GRAS_SU_POINT_MINMAX_MAX(float val)
 {
-	return ((((uint32_t)(val * 8.0))) << A3XX_GRAS_SU_POINT_MINMAX_MAX__SHIFT) & A3XX_GRAS_SU_POINT_MINMAX_MAX__MASK;
+	return ((((uint32_t)(val * 16.0))) << A3XX_GRAS_SU_POINT_MINMAX_MAX__SHIFT) & A3XX_GRAS_SU_POINT_MINMAX_MAX__MASK;
 }
 
 #define REG_A3XX_GRAS_SU_POINT_SIZE				0x00002069
@@ -646,7 +706,7 @@ static inline uint32_t A3XX_GRAS_SU_POINT_MINMAX_MAX(float val)
 #define A3XX_GRAS_SU_POINT_SIZE__SHIFT				0
 static inline uint32_t A3XX_GRAS_SU_POINT_SIZE(float val)
 {
-	return ((((uint32_t)(val * 8.0))) << A3XX_GRAS_SU_POINT_SIZE__SHIFT) & A3XX_GRAS_SU_POINT_SIZE__MASK;
+	return ((((int32_t)(val * 16.0))) << A3XX_GRAS_SU_POINT_SIZE__SHIFT) & A3XX_GRAS_SU_POINT_SIZE__MASK;
 }
 
 #define REG_A3XX_GRAS_SU_POLY_OFFSET_SCALE			0x0000206c
@@ -654,7 +714,7 @@ static inline uint32_t A3XX_GRAS_SU_POINT_SIZE(float val)
 #define A3XX_GRAS_SU_POLY_OFFSET_SCALE_VAL__SHIFT		0
 static inline uint32_t A3XX_GRAS_SU_POLY_OFFSET_SCALE_VAL(float val)
 {
-	return ((((uint32_t)(val * 28.0))) << A3XX_GRAS_SU_POLY_OFFSET_SCALE_VAL__SHIFT) & A3XX_GRAS_SU_POLY_OFFSET_SCALE_VAL__MASK;
+	return ((((int32_t)(val * 16384.0))) << A3XX_GRAS_SU_POLY_OFFSET_SCALE_VAL__SHIFT) & A3XX_GRAS_SU_POLY_OFFSET_SCALE_VAL__MASK;
 }
 
 #define REG_A3XX_GRAS_SU_POLY_OFFSET_OFFSET			0x0000206d
@@ -662,7 +722,7 @@ static inline uint32_t A3XX_GRAS_SU_POLY_OFFSET_SCALE_VAL(float val)
 #define A3XX_GRAS_SU_POLY_OFFSET_OFFSET__SHIFT			0
 static inline uint32_t A3XX_GRAS_SU_POLY_OFFSET_OFFSET(float val)
 {
-	return ((((uint32_t)(val * 28.0))) << A3XX_GRAS_SU_POLY_OFFSET_OFFSET__SHIFT) & A3XX_GRAS_SU_POLY_OFFSET_OFFSET__MASK;
+	return ((((int32_t)(val * 16384.0))) << A3XX_GRAS_SU_POLY_OFFSET_OFFSET__SHIFT) & A3XX_GRAS_SU_POLY_OFFSET_OFFSET__MASK;
 }
 
 #define REG_A3XX_GRAS_SU_MODE_CONTROL				0x00002070
@@ -673,7 +733,7 @@ static inline uint32_t A3XX_GRAS_SU_POLY_OFFSET_OFFSET(float val)
 #define A3XX_GRAS_SU_MODE_CONTROL_LINEHALFWIDTH__SHIFT		3
 static inline uint32_t A3XX_GRAS_SU_MODE_CONTROL_LINEHALFWIDTH(float val)
 {
-	return ((((uint32_t)(val * 4.0))) << A3XX_GRAS_SU_MODE_CONTROL_LINEHALFWIDTH__SHIFT) & A3XX_GRAS_SU_MODE_CONTROL_LINEHALFWIDTH__MASK;
+	return ((((int32_t)(val * 4.0))) << A3XX_GRAS_SU_MODE_CONTROL_LINEHALFWIDTH__SHIFT) & A3XX_GRAS_SU_MODE_CONTROL_LINEHALFWIDTH__MASK;
 }
 #define A3XX_GRAS_SU_MODE_CONTROL_POLY_OFFSET			0x00000800
 
@@ -863,6 +923,7 @@ static inline uint32_t A3XX_RB_MRT_BUF_INFO_COLOR_SWAP(enum a3xx_color_swap val)
 {
 	return ((val) << A3XX_RB_MRT_BUF_INFO_COLOR_SWAP__SHIFT) & A3XX_RB_MRT_BUF_INFO_COLOR_SWAP__MASK;
 }
+#define A3XX_RB_MRT_BUF_INFO_COLOR_SRGB				0x00004000
 #define A3XX_RB_MRT_BUF_INFO_COLOR_BUF_PITCH__MASK		0xfffe0000
 #define A3XX_RB_MRT_BUF_INFO_COLOR_BUF_PITCH__SHIFT		17
 static inline uint32_t A3XX_RB_MRT_BUF_INFO_COLOR_BUF_PITCH(uint32_t val)
@@ -1001,6 +1062,7 @@ static inline uint32_t A3XX_RB_COPY_CONTROL_FASTCLEAR(uint32_t val)
 {
 	return ((val) << A3XX_RB_COPY_CONTROL_FASTCLEAR__SHIFT) & A3XX_RB_COPY_CONTROL_FASTCLEAR__MASK;
 }
+#define A3XX_RB_COPY_CONTROL_UNK12				0x00001000
 #define A3XX_RB_COPY_CONTROL_GMEM_BASE__MASK			0xffffc000
 #define A3XX_RB_COPY_CONTROL_GMEM_BASE__SHIFT			14
 static inline uint32_t A3XX_RB_COPY_CONTROL_GMEM_BASE(uint32_t val)
@@ -1079,7 +1141,7 @@ static inline uint32_t A3XX_RB_DEPTH_CONTROL_ZFUNC(enum adreno_compare_func val)
 #define REG_A3XX_RB_DEPTH_CLEAR					0x00002101
 
 #define REG_A3XX_RB_DEPTH_INFO					0x00002102
-#define A3XX_RB_DEPTH_INFO_DEPTH_FORMAT__MASK			0x00000001
+#define A3XX_RB_DEPTH_INFO_DEPTH_FORMAT__MASK			0x00000003
 #define A3XX_RB_DEPTH_INFO_DEPTH_FORMAT__SHIFT			0
 static inline uint32_t A3XX_RB_DEPTH_INFO_DEPTH_FORMAT(enum adreno_rb_depth_format val)
 {
@@ -1265,6 +1327,7 @@ static inline uint32_t A3XX_PC_PRIM_VTX_CNTL_POLYMODE_BACK_PTYPE(enum adreno_pa_
 {
 	return ((val) << A3XX_PC_PRIM_VTX_CNTL_POLYMODE_BACK_PTYPE__SHIFT) & A3XX_PC_PRIM_VTX_CNTL_POLYMODE_BACK_PTYPE__MASK;
 }
+#define A3XX_PC_PRIM_VTX_CNTL_PRIMITIVE_RESTART			0x00100000
 #define A3XX_PC_PRIM_VTX_CNTL_PROVOKING_VTX_LAST		0x02000000
 #define A3XX_PC_PRIM_VTX_CNTL_PSIZE				0x04000000
 
@@ -1281,7 +1344,12 @@ static inline uint32_t A3XX_HLSQ_CONTROL_0_REG_FSTHREADSIZE(enum a3xx_threadsize
 #define A3XX_HLSQ_CONTROL_0_REG_SPSHADERRESTART			0x00000200
 #define A3XX_HLSQ_CONTROL_0_REG_RESERVED2			0x00000400
 #define A3XX_HLSQ_CONTROL_0_REG_CHUNKDISABLE			0x04000000
-#define A3XX_HLSQ_CONTROL_0_REG_CONSTSWITCHMODE			0x08000000
+#define A3XX_HLSQ_CONTROL_0_REG_CONSTMODE__MASK			0x08000000
+#define A3XX_HLSQ_CONTROL_0_REG_CONSTMODE__SHIFT		27
+static inline uint32_t A3XX_HLSQ_CONTROL_0_REG_CONSTMODE(uint32_t val)
+{
+	return ((val) << A3XX_HLSQ_CONTROL_0_REG_CONSTMODE__SHIFT) & A3XX_HLSQ_CONTROL_0_REG_CONSTMODE__MASK;
+}
 #define A3XX_HLSQ_CONTROL_0_REG_LAZYUPDATEDISABLE		0x10000000
 #define A3XX_HLSQ_CONTROL_0_REG_SPCONSTFULLUPDATE		0x20000000
 #define A3XX_HLSQ_CONTROL_0_REG_TPFULLUPDATE			0x40000000
@@ -1484,6 +1552,8 @@ static inline uint32_t A3XX_VFD_CONTROL_1_REGID4INST(uint32_t val)
 
 #define REG_A3XX_VFD_INDEX_OFFSET				0x00002245
 
+#define REG_A3XX_VFD_INDEX_OFFSET				0x00002245
+
 static inline uint32_t REG_A3XX_VFD_FETCH(uint32_t i0) { return 0x00002246 + 0x2*i0; }
 
 static inline uint32_t REG_A3XX_VFD_FETCH_INSTR_0(uint32_t i0) { return 0x00002246 + 0x2*i0; }
@@ -1537,6 +1607,7 @@ static inline uint32_t A3XX_VFD_DECODE_INSTR_REGID(uint32_t val)
 {
 	return ((val) << A3XX_VFD_DECODE_INSTR_REGID__SHIFT) & A3XX_VFD_DECODE_INSTR_REGID__MASK;
 }
+#define A3XX_VFD_DECODE_INSTR_INT				0x00100000
 #define A3XX_VFD_DECODE_INSTR_SWAP__MASK			0x00c00000
 #define A3XX_VFD_DECODE_INSTR_SWAP__SHIFT			22
 static inline uint32_t A3XX_VFD_DECODE_INSTR_SWAP(enum a3xx_color_swap val)
@@ -1604,6 +1675,102 @@ static inline uint32_t A3XX_VPC_PACK_NUMNONPOSVSVAR(uint32_t val)
 static inline uint32_t REG_A3XX_VPC_VARYING_INTERP(uint32_t i0) { return 0x00002282 + 0x1*i0; }
 
 static inline uint32_t REG_A3XX_VPC_VARYING_INTERP_MODE(uint32_t i0) { return 0x00002282 + 0x1*i0; }
+#define A3XX_VPC_VARYING_INTERP_MODE_C0__MASK			0x00000003
+#define A3XX_VPC_VARYING_INTERP_MODE_C0__SHIFT			0
+static inline uint32_t A3XX_VPC_VARYING_INTERP_MODE_C0(enum a3xx_intp_mode val)
+{
+	return ((val) << A3XX_VPC_VARYING_INTERP_MODE_C0__SHIFT) & A3XX_VPC_VARYING_INTERP_MODE_C0__MASK;
+}
+#define A3XX_VPC_VARYING_INTERP_MODE_C1__MASK			0x0000000c
+#define A3XX_VPC_VARYING_INTERP_MODE_C1__SHIFT			2
+static inline uint32_t A3XX_VPC_VARYING_INTERP_MODE_C1(enum a3xx_intp_mode val)
+{
+	return ((val) << A3XX_VPC_VARYING_INTERP_MODE_C1__SHIFT) & A3XX_VPC_VARYING_INTERP_MODE_C1__MASK;
+}
+#define A3XX_VPC_VARYING_INTERP_MODE_C2__MASK			0x00000030
+#define A3XX_VPC_VARYING_INTERP_MODE_C2__SHIFT			4
+static inline uint32_t A3XX_VPC_VARYING_INTERP_MODE_C2(enum a3xx_intp_mode val)
+{
+	return ((val) << A3XX_VPC_VARYING_INTERP_MODE_C2__SHIFT) & A3XX_VPC_VARYING_INTERP_MODE_C2__MASK;
+}
+#define A3XX_VPC_VARYING_INTERP_MODE_C3__MASK			0x000000c0
+#define A3XX_VPC_VARYING_INTERP_MODE_C3__SHIFT			6
+static inline uint32_t A3XX_VPC_VARYING_INTERP_MODE_C3(enum a3xx_intp_mode val)
+{
+	return ((val) << A3XX_VPC_VARYING_INTERP_MODE_C3__SHIFT) & A3XX_VPC_VARYING_INTERP_MODE_C3__MASK;
+}
+#define A3XX_VPC_VARYING_INTERP_MODE_C4__MASK			0x00000300
+#define A3XX_VPC_VARYING_INTERP_MODE_C4__SHIFT			8
+static inline uint32_t A3XX_VPC_VARYING_INTERP_MODE_C4(enum a3xx_intp_mode val)
+{
+	return ((val) << A3XX_VPC_VARYING_INTERP_MODE_C4__SHIFT) & A3XX_VPC_VARYING_INTERP_MODE_C4__MASK;
+}
+#define A3XX_VPC_VARYING_INTERP_MODE_C5__MASK			0x00000c00
+#define A3XX_VPC_VARYING_INTERP_MODE_C5__SHIFT			10
+static inline uint32_t A3XX_VPC_VARYING_INTERP_MODE_C5(enum a3xx_intp_mode val)
+{
+	return ((val) << A3XX_VPC_VARYING_INTERP_MODE_C5__SHIFT) & A3XX_VPC_VARYING_INTERP_MODE_C5__MASK;
+}
+#define A3XX_VPC_VARYING_INTERP_MODE_C6__MASK			0x00003000
+#define A3XX_VPC_VARYING_INTERP_MODE_C6__SHIFT			12
+static inline uint32_t A3XX_VPC_VARYING_INTERP_MODE_C6(enum a3xx_intp_mode val)
+{
+	return ((val) << A3XX_VPC_VARYING_INTERP_MODE_C6__SHIFT) & A3XX_VPC_VARYING_INTERP_MODE_C6__MASK;
+}
+#define A3XX_VPC_VARYING_INTERP_MODE_C7__MASK			0x0000c000
+#define A3XX_VPC_VARYING_INTERP_MODE_C7__SHIFT			14
+static inline uint32_t A3XX_VPC_VARYING_INTERP_MODE_C7(enum a3xx_intp_mode val)
+{
+	return ((val) << A3XX_VPC_VARYING_INTERP_MODE_C7__SHIFT) & A3XX_VPC_VARYING_INTERP_MODE_C7__MASK;
+}
+#define A3XX_VPC_VARYING_INTERP_MODE_C8__MASK			0x00030000
+#define A3XX_VPC_VARYING_INTERP_MODE_C8__SHIFT			16
+static inline uint32_t A3XX_VPC_VARYING_INTERP_MODE_C8(enum a3xx_intp_mode val)
+{
+	return ((val) << A3XX_VPC_VARYING_INTERP_MODE_C8__SHIFT) & A3XX_VPC_VARYING_INTERP_MODE_C8__MASK;
+}
+#define A3XX_VPC_VARYING_INTERP_MODE_C9__MASK			0x000c0000
+#define A3XX_VPC_VARYING_INTERP_MODE_C9__SHIFT			18
+static inline uint32_t A3XX_VPC_VARYING_INTERP_MODE_C9(enum a3xx_intp_mode val)
+{
+	return ((val) << A3XX_VPC_VARYING_INTERP_MODE_C9__SHIFT) & A3XX_VPC_VARYING_INTERP_MODE_C9__MASK;
+}
+#define A3XX_VPC_VARYING_INTERP_MODE_CA__MASK			0x00300000
+#define A3XX_VPC_VARYING_INTERP_MODE_CA__SHIFT			20
+static inline uint32_t A3XX_VPC_VARYING_INTERP_MODE_CA(enum a3xx_intp_mode val)
+{
+	return ((val) << A3XX_VPC_VARYING_INTERP_MODE_CA__SHIFT) & A3XX_VPC_VARYING_INTERP_MODE_CA__MASK;
+}
+#define A3XX_VPC_VARYING_INTERP_MODE_CB__MASK			0x00c00000
+#define A3XX_VPC_VARYING_INTERP_MODE_CB__SHIFT			22
+static inline uint32_t A3XX_VPC_VARYING_INTERP_MODE_CB(enum a3xx_intp_mode val)
+{
+	return ((val) << A3XX_VPC_VARYING_INTERP_MODE_CB__SHIFT) & A3XX_VPC_VARYING_INTERP_MODE_CB__MASK;
+}
+#define A3XX_VPC_VARYING_INTERP_MODE_CC__MASK			0x03000000
+#define A3XX_VPC_VARYING_INTERP_MODE_CC__SHIFT			24
+static inline uint32_t A3XX_VPC_VARYING_INTERP_MODE_CC(enum a3xx_intp_mode val)
+{
+	return ((val) << A3XX_VPC_VARYING_INTERP_MODE_CC__SHIFT) & A3XX_VPC_VARYING_INTERP_MODE_CC__MASK;
+}
+#define A3XX_VPC_VARYING_INTERP_MODE_CD__MASK			0x0c000000
+#define A3XX_VPC_VARYING_INTERP_MODE_CD__SHIFT			26
+static inline uint32_t A3XX_VPC_VARYING_INTERP_MODE_CD(enum a3xx_intp_mode val)
+{
+	return ((val) << A3XX_VPC_VARYING_INTERP_MODE_CD__SHIFT) & A3XX_VPC_VARYING_INTERP_MODE_CD__MASK;
+}
+#define A3XX_VPC_VARYING_INTERP_MODE_CE__MASK			0x30000000
+#define A3XX_VPC_VARYING_INTERP_MODE_CE__SHIFT			28
+static inline uint32_t A3XX_VPC_VARYING_INTERP_MODE_CE(enum a3xx_intp_mode val)
+{
+	return ((val) << A3XX_VPC_VARYING_INTERP_MODE_CE__SHIFT) & A3XX_VPC_VARYING_INTERP_MODE_CE__MASK;
+}
+#define A3XX_VPC_VARYING_INTERP_MODE_CF__MASK			0xc0000000
+#define A3XX_VPC_VARYING_INTERP_MODE_CF__SHIFT			30
+static inline uint32_t A3XX_VPC_VARYING_INTERP_MODE_CF(enum a3xx_intp_mode val)
+{
+	return ((val) << A3XX_VPC_VARYING_INTERP_MODE_CF__SHIFT) & A3XX_VPC_VARYING_INTERP_MODE_CF__MASK;
+}
 
 static inline uint32_t REG_A3XX_VPC_VARYING_PS_REPL(uint32_t i0) { return 0x00002286 + 0x1*i0; }
 
@@ -1928,6 +2095,8 @@ static inline uint32_t A3XX_SP_FS_MRT_REG_REGID(uint32_t val)
 	return ((val) << A3XX_SP_FS_MRT_REG_REGID__SHIFT) & A3XX_SP_FS_MRT_REG_REGID__MASK;
 }
 #define A3XX_SP_FS_MRT_REG_HALF_PRECISION			0x00000100
+#define A3XX_SP_FS_MRT_REG_SINT					0x00000400
+#define A3XX_SP_FS_MRT_REG_UINT					0x00000800
 
 static inline uint32_t REG_A3XX_SP_FS_IMAGE_OUTPUT(uint32_t i0) { return 0x000022f4 + 0x1*i0; }
 
@@ -1947,6 +2116,8 @@ static inline uint32_t A3XX_SP_FS_LENGTH_REG_SHADERLENGTH(uint32_t val)
 	return ((val) << A3XX_SP_FS_LENGTH_REG_SHADERLENGTH__SHIFT) & A3XX_SP_FS_LENGTH_REG_SHADERLENGTH__MASK;
 }
 
+#define REG_A3XX_PA_SC_AA_CONFIG				0x00002301
+
 #define REG_A3XX_TPL1_TP_VS_TEX_OFFSET				0x00002340
 #define A3XX_TPL1_TP_VS_TEX_OFFSET_SAMPLEROFFSET__MASK		0x000000ff
 #define A3XX_TPL1_TP_VS_TEX_OFFSET_SAMPLEROFFSET__SHIFT		0
@@ -2297,11 +2468,11 @@ static inline uint32_t A3XX_VGT_DRAW_INITIATOR_INDEX_SIZE(enum pc_di_index_size
 #define A3XX_VGT_DRAW_INITIATOR_NOT_EOP				0x00001000
 #define A3XX_VGT_DRAW_INITIATOR_SMALL_INDEX			0x00002000
 #define A3XX_VGT_DRAW_INITIATOR_PRE_DRAW_INITIATOR_ENABLE	0x00004000
-#define A3XX_VGT_DRAW_INITIATOR_NUM_INDICES__MASK		0xffff0000
-#define A3XX_VGT_DRAW_INITIATOR_NUM_INDICES__SHIFT		16
-static inline uint32_t A3XX_VGT_DRAW_INITIATOR_NUM_INDICES(uint32_t val)
+#define A3XX_VGT_DRAW_INITIATOR_NUM_INSTANCES__MASK		0xff000000
+#define A3XX_VGT_DRAW_INITIATOR_NUM_INSTANCES__SHIFT		24
+static inline uint32_t A3XX_VGT_DRAW_INITIATOR_NUM_INSTANCES(uint32_t val)
 {
-	return ((val) << A3XX_VGT_DRAW_INITIATOR_NUM_INDICES__SHIFT) & A3XX_VGT_DRAW_INITIATOR_NUM_INDICES__MASK;
+	return ((val) << A3XX_VGT_DRAW_INITIATOR_NUM_INSTANCES__SHIFT) & A3XX_VGT_DRAW_INITIATOR_NUM_INSTANCES__MASK;
 }
 
 #define REG_A3XX_VGT_IMMED_DATA					0x000021fd
@@ -2347,17 +2518,23 @@ static inline uint32_t A3XX_TEX_SAMP_0_COMPARE_FUNC(enum adreno_compare_func val
 #define A3XX_TEX_SAMP_0_UNNORM_COORDS				0x80000000
 
 #define REG_A3XX_TEX_SAMP_1					0x00000001
+#define A3XX_TEX_SAMP_1_LOD_BIAS__MASK				0x000007ff
+#define A3XX_TEX_SAMP_1_LOD_BIAS__SHIFT				0
+static inline uint32_t A3XX_TEX_SAMP_1_LOD_BIAS(float val)
+{
+	return ((((int32_t)(val * 64.0))) << A3XX_TEX_SAMP_1_LOD_BIAS__SHIFT) & A3XX_TEX_SAMP_1_LOD_BIAS__MASK;
+}
 #define A3XX_TEX_SAMP_1_MAX_LOD__MASK				0x003ff000
 #define A3XX_TEX_SAMP_1_MAX_LOD__SHIFT				12
 static inline uint32_t A3XX_TEX_SAMP_1_MAX_LOD(float val)
 {
-	return ((((uint32_t)(val * 12.0))) << A3XX_TEX_SAMP_1_MAX_LOD__SHIFT) & A3XX_TEX_SAMP_1_MAX_LOD__MASK;
+	return ((((uint32_t)(val * 64.0))) << A3XX_TEX_SAMP_1_MAX_LOD__SHIFT) & A3XX_TEX_SAMP_1_MAX_LOD__MASK;
 }
 #define A3XX_TEX_SAMP_1_MIN_LOD__MASK				0xffc00000
 #define A3XX_TEX_SAMP_1_MIN_LOD__SHIFT				22
 static inline uint32_t A3XX_TEX_SAMP_1_MIN_LOD(float val)
 {
-	return ((((uint32_t)(val * 12.0))) << A3XX_TEX_SAMP_1_MIN_LOD__SHIFT) & A3XX_TEX_SAMP_1_MIN_LOD__MASK;
+	return ((((uint32_t)(val * 64.0))) << A3XX_TEX_SAMP_1_MIN_LOD__SHIFT) & A3XX_TEX_SAMP_1_MIN_LOD__MASK;
 }
 
 #define REG_A3XX_TEX_CONST_0					0x00000000
@@ -2448,6 +2625,24 @@ static inline uint32_t A3XX_TEX_CONST_2_SWAP(enum a3xx_color_swap val)
 }
 
 #define REG_A3XX_TEX_CONST_3					0x00000003
+#define A3XX_TEX_CONST_3_LAYERSZ1__MASK				0x0000000f
+#define A3XX_TEX_CONST_3_LAYERSZ1__SHIFT			0
+static inline uint32_t A3XX_TEX_CONST_3_LAYERSZ1(uint32_t val)
+{
+	return ((val >> 12) << A3XX_TEX_CONST_3_LAYERSZ1__SHIFT) & A3XX_TEX_CONST_3_LAYERSZ1__MASK;
+}
+#define A3XX_TEX_CONST_3_DEPTH__MASK				0x0ffe0000
+#define A3XX_TEX_CONST_3_DEPTH__SHIFT				17
+static inline uint32_t A3XX_TEX_CONST_3_DEPTH(uint32_t val)
+{
+	return ((val) << A3XX_TEX_CONST_3_DEPTH__SHIFT) & A3XX_TEX_CONST_3_DEPTH__MASK;
+}
+#define A3XX_TEX_CONST_3_LAYERSZ2__MASK				0xf0000000
+#define A3XX_TEX_CONST_3_LAYERSZ2__SHIFT			28
+static inline uint32_t A3XX_TEX_CONST_3_LAYERSZ2(uint32_t val)
+{
+	return ((val >> 12) << A3XX_TEX_CONST_3_LAYERSZ2__SHIFT) & A3XX_TEX_CONST_3_LAYERSZ2__MASK;
+}
 
 
 #endif /* A3XX_XML */
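
Note on the sampler/texture field helpers changed above: the LOD helpers now scale the float argument by 64 (six fractional bits) before shifting and masking, and the LAYERSZ helpers drop the low 12 bits of the byte size before storing it. The following is only a minimal standalone sketch of that packing; the masks, shifts and helper bodies are copied from the hunks above, while sketch_field_get() and lod_sketch_selfcheck() are hypothetical names added for illustration and are not part of the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Masks, shifts and helper bodies copied from the a3xx.xml.h hunks above. */
#define A3XX_TEX_SAMP_1_MAX_LOD__MASK		0x003ff000
#define A3XX_TEX_SAMP_1_MAX_LOD__SHIFT		12
static inline uint32_t A3XX_TEX_SAMP_1_MAX_LOD(float val)
{
	return ((((uint32_t)(val * 64.0))) << A3XX_TEX_SAMP_1_MAX_LOD__SHIFT) & A3XX_TEX_SAMP_1_MAX_LOD__MASK;
}

#define A3XX_TEX_CONST_3_LAYERSZ1__MASK		0x0000000f
#define A3XX_TEX_CONST_3_LAYERSZ1__SHIFT	0
static inline uint32_t A3XX_TEX_CONST_3_LAYERSZ1(uint32_t val)
{
	return ((val >> 12) << A3XX_TEX_CONST_3_LAYERSZ1__SHIFT) & A3XX_TEX_CONST_3_LAYERSZ1__MASK;
}

/* Hypothetical helper (not in the patch): read a field back out of a word. */
static inline uint32_t sketch_field_get(uint32_t reg, uint32_t mask, unsigned int shift)
{
	return (reg & mask) >> shift;
}

/* Hypothetical self-check showing what the helpers produce. */
static void lod_sketch_selfcheck(void)
{
	/* 2.5 * 64 = 160 (0xa0), placed at bit 12. */
	uint32_t samp1 = A3XX_TEX_SAMP_1_MAX_LOD(2.5f);

	assert(samp1 == 0x000a0000);
	assert(sketch_field_get(samp1, A3XX_TEX_SAMP_1_MAX_LOD__MASK,
				A3XX_TEX_SAMP_1_MAX_LOD__SHIFT) == 160);

	/* A 0x3000-byte layer is stored as 3 units of 4096 bytes. */
	assert(A3XX_TEX_CONST_3_LAYERSZ1(0x3000) == 3);
}
```

Because each helper masks its result, a value that does not fit the field is silently truncated rather than rejected, so callers are presumably expected to clamp beforehand.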
diff --git a/drivers/gpu/drm/msm/adreno/a3xx_gpu.c b/drivers/gpu/drm/msm/adreno/a3xx_gpu.c
index 218c5b060398..b66c53bdc039 100644
--- a/drivers/gpu/drm/msm/adreno/a3xx_gpu.c
+++ b/drivers/gpu/drm/msm/adreno/a3xx_gpu.c
@@ -2,6 +2,8 @@
  * Copyright (C) 2013 Red Hat
  * Author: Rob Clark <robdclark@gmail.com>
  *
+ * Copyright (c) 2014 The Linux Foundation. All rights reserved.
+ *
  * This program is free software; you can redistribute it and/or modify it
  * under the terms of the GNU General Public License version 2 as published by
  * the Free Software Foundation.
@@ -406,6 +408,94 @@ static void a3xx_dump(struct msm_gpu *gpu)
 			gpu_read(gpu, REG_A3XX_RBBM_STATUS));
 	adreno_dump(gpu);
 }
+/* Register offset defines for A3XX */
+static const unsigned int a3xx_register_offsets[REG_ADRENO_REGISTER_MAX] = {
+	REG_ADRENO_DEFINE(REG_ADRENO_CP_DEBUG, REG_AXXX_CP_DEBUG),
+	REG_ADRENO_DEFINE(REG_ADRENO_CP_ME_RAM_WADDR, REG_AXXX_CP_ME_RAM_WADDR),
+	REG_ADRENO_DEFINE(REG_ADRENO_CP_ME_RAM_DATA, REG_AXXX_CP_ME_RAM_DATA),
+	REG_ADRENO_DEFINE(REG_ADRENO_CP_PFP_UCODE_DATA,
+			REG_A3XX_CP_PFP_UCODE_DATA),
+	REG_ADRENO_DEFINE(REG_ADRENO_CP_PFP_UCODE_ADDR,
+			REG_A3XX_CP_PFP_UCODE_ADDR),
+	REG_ADRENO_DEFINE(REG_ADRENO_CP_WFI_PEND_CTR, REG_A3XX_CP_WFI_PEND_CTR),
+	REG_ADRENO_DEFINE(REG_ADRENO_CP_RB_BASE, REG_AXXX_CP_RB_BASE),
+	REG_ADRENO_DEFINE(REG_ADRENO_CP_RB_RPTR_ADDR, REG_AXXX_CP_RB_RPTR_ADDR),
+	REG_ADRENO_DEFINE(REG_ADRENO_CP_RB_RPTR, REG_AXXX_CP_RB_RPTR),
+	REG_ADRENO_DEFINE(REG_ADRENO_CP_RB_WPTR, REG_AXXX_CP_RB_WPTR),
+	REG_ADRENO_DEFINE(REG_ADRENO_CP_PROTECT_CTRL, REG_A3XX_CP_PROTECT_CTRL),
+	REG_ADRENO_DEFINE(REG_ADRENO_CP_ME_CNTL, REG_AXXX_CP_ME_CNTL),
+	REG_ADRENO_DEFINE(REG_ADRENO_CP_RB_CNTL, REG_AXXX_CP_RB_CNTL),
+	REG_ADRENO_DEFINE(REG_ADRENO_CP_IB1_BASE, REG_AXXX_CP_IB1_BASE),
+	REG_ADRENO_DEFINE(REG_ADRENO_CP_IB1_BUFSZ, REG_AXXX_CP_IB1_BUFSZ),
+	REG_ADRENO_DEFINE(REG_ADRENO_CP_IB2_BASE, REG_AXXX_CP_IB2_BASE),
+	REG_ADRENO_DEFINE(REG_ADRENO_CP_IB2_BUFSZ, REG_AXXX_CP_IB2_BUFSZ),
+	REG_ADRENO_DEFINE(REG_ADRENO_CP_TIMESTAMP, REG_AXXX_CP_SCRATCH_REG0),
+	REG_ADRENO_DEFINE(REG_ADRENO_CP_ME_RAM_RADDR, REG_AXXX_CP_ME_RAM_RADDR),
+	REG_ADRENO_DEFINE(REG_ADRENO_SCRATCH_ADDR, REG_AXXX_SCRATCH_ADDR),
+	REG_ADRENO_DEFINE(REG_ADRENO_SCRATCH_UMSK, REG_AXXX_SCRATCH_UMSK),
+	REG_ADRENO_DEFINE(REG_ADRENO_CP_ROQ_ADDR, REG_A3XX_CP_ROQ_ADDR),
+	REG_ADRENO_DEFINE(REG_ADRENO_CP_ROQ_DATA, REG_A3XX_CP_ROQ_DATA),
+	REG_ADRENO_DEFINE(REG_ADRENO_CP_MERCIU_ADDR, REG_A3XX_CP_MERCIU_ADDR),
+	REG_ADRENO_DEFINE(REG_ADRENO_CP_MERCIU_DATA, REG_A3XX_CP_MERCIU_DATA),
+	REG_ADRENO_DEFINE(REG_ADRENO_CP_MERCIU_DATA2, REG_A3XX_CP_MERCIU_DATA2),
+	REG_ADRENO_DEFINE(REG_ADRENO_CP_MEQ_ADDR, REG_A3XX_CP_MEQ_ADDR),
+	REG_ADRENO_DEFINE(REG_ADRENO_CP_MEQ_DATA, REG_A3XX_CP_MEQ_DATA),
+	REG_ADRENO_DEFINE(REG_ADRENO_CP_HW_FAULT, REG_A3XX_CP_HW_FAULT),
+	REG_ADRENO_DEFINE(REG_ADRENO_CP_PROTECT_STATUS,
+			REG_A3XX_CP_PROTECT_STATUS),
+	REG_ADRENO_DEFINE(REG_ADRENO_RBBM_STATUS, REG_A3XX_RBBM_STATUS),
+	REG_ADRENO_DEFINE(REG_ADRENO_RBBM_PERFCTR_CTL,
+			REG_A3XX_RBBM_PERFCTR_CTL),
+	REG_ADRENO_DEFINE(REG_ADRENO_RBBM_PERFCTR_LOAD_CMD0,
+			REG_A3XX_RBBM_PERFCTR_LOAD_CMD0),
+	REG_ADRENO_DEFINE(REG_ADRENO_RBBM_PERFCTR_LOAD_CMD1,
+			REG_A3XX_RBBM_PERFCTR_LOAD_CMD1),
+	REG_ADRENO_DEFINE(REG_ADRENO_RBBM_PERFCTR_PWR_1_LO,
+			REG_A3XX_RBBM_PERFCTR_PWR_1_LO),
+	REG_ADRENO_DEFINE(REG_ADRENO_RBBM_INT_0_MASK, REG_A3XX_RBBM_INT_0_MASK),
+	REG_ADRENO_DEFINE(REG_ADRENO_RBBM_INT_0_STATUS,
+			REG_A3XX_RBBM_INT_0_STATUS),
+	REG_ADRENO_DEFINE(REG_ADRENO_RBBM_AHB_ERROR_STATUS,
+			REG_A3XX_RBBM_AHB_ERROR_STATUS),
+	REG_ADRENO_DEFINE(REG_ADRENO_RBBM_AHB_CMD, REG_A3XX_RBBM_AHB_CMD),
+	REG_ADRENO_DEFINE(REG_ADRENO_RBBM_INT_CLEAR_CMD,
+			REG_A3XX_RBBM_INT_CLEAR_CMD),
+	REG_ADRENO_DEFINE(REG_ADRENO_RBBM_CLOCK_CTL, REG_A3XX_RBBM_CLOCK_CTL),
+	REG_ADRENO_DEFINE(REG_ADRENO_VPC_DEBUG_RAM_SEL,
+			REG_A3XX_VPC_VPC_DEBUG_RAM_SEL),
+	REG_ADRENO_DEFINE(REG_ADRENO_VPC_DEBUG_RAM_READ,
+			REG_A3XX_VPC_VPC_DEBUG_RAM_READ),
+	REG_ADRENO_DEFINE(REG_ADRENO_VSC_SIZE_ADDRESS,
+			REG_A3XX_VSC_SIZE_ADDRESS),
+	REG_ADRENO_DEFINE(REG_ADRENO_VFD_CONTROL_0, REG_A3XX_VFD_CONTROL_0),
+	REG_ADRENO_DEFINE(REG_ADRENO_VFD_INDEX_MAX, REG_A3XX_VFD_INDEX_MAX),
+	REG_ADRENO_DEFINE(REG_ADRENO_SP_VS_PVT_MEM_ADDR_REG,
+			REG_A3XX_SP_VS_PVT_MEM_ADDR_REG),
+	REG_ADRENO_DEFINE(REG_ADRENO_SP_FS_PVT_MEM_ADDR_REG,
+			REG_A3XX_SP_FS_PVT_MEM_ADDR_REG),
+	REG_ADRENO_DEFINE(REG_ADRENO_SP_VS_OBJ_START_REG,
+			REG_A3XX_SP_VS_OBJ_START_REG),
+	REG_ADRENO_DEFINE(REG_ADRENO_SP_FS_OBJ_START_REG,
+			REG_A3XX_SP_FS_OBJ_START_REG),
+	REG_ADRENO_DEFINE(REG_ADRENO_PA_SC_AA_CONFIG, REG_A3XX_PA_SC_AA_CONFIG),
+	REG_ADRENO_DEFINE(REG_ADRENO_RBBM_PM_OVERRIDE2,
+			REG_A3XX_RBBM_PM_OVERRIDE2),
+	REG_ADRENO_DEFINE(REG_ADRENO_SCRATCH_REG2, REG_AXXX_CP_SCRATCH_REG2),
+	REG_ADRENO_DEFINE(REG_ADRENO_SQ_GPR_MANAGEMENT,
+			REG_A3XX_SQ_GPR_MANAGEMENT),
+	REG_ADRENO_DEFINE(REG_ADRENO_SQ_INST_STORE_MANAGMENT,
+			REG_A3XX_SQ_INST_STORE_MANAGMENT),
+	REG_ADRENO_DEFINE(REG_ADRENO_TP0_CHICKEN, REG_A3XX_TP0_CHICKEN),
+	REG_ADRENO_DEFINE(REG_ADRENO_RBBM_RBBM_CTL, REG_A3XX_RBBM_RBBM_CTL),
+	REG_ADRENO_DEFINE(REG_ADRENO_RBBM_SW_RESET_CMD,
+			REG_A3XX_RBBM_SW_RESET_CMD),
+	REG_ADRENO_DEFINE(REG_ADRENO_UCHE_INVALIDATE0,
+			REG_A3XX_UCHE_CACHE_INVALIDATE0_REG),
+	REG_ADRENO_DEFINE(REG_ADRENO_RBBM_PERFCTR_LOAD_VALUE_LO,
+			REG_A3XX_RBBM_PERFCTR_LOAD_VALUE_LO),
+	REG_ADRENO_DEFINE(REG_ADRENO_RBBM_PERFCTR_LOAD_VALUE_HI,
+			REG_A3XX_RBBM_PERFCTR_LOAD_VALUE_HI),
+};
 
 static const struct adreno_gpu_funcs funcs = {
 	.base = {
@@ -463,6 +553,7 @@ struct msm_gpu *a3xx_gpu_init(struct drm_device *dev)
 	gpu->num_perfcntrs = ARRAY_SIZE(perfcntrs);
 
 	adreno_gpu->registers = a3xx_registers;
+	adreno_gpu->reg_offsets = a3xx_register_offsets;
 
 	ret = adreno_gpu_init(dev, pdev, adreno_gpu, &funcs);
 	if (ret)
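
Note on a3xx_register_offsets above: the table maps generation-independent REG_ADRENO_* names to A3XX register offsets so that shared adreno code can address registers by logical name. The definition of REG_ADRENO_DEFINE is not shown in this part of the diff, so the sketch below only assumes one plausible encoding (store offset + 1, so that a zero-initialised slot means "no mapping"). Every sketch_* / SKETCH_* name is hypothetical; only the REG_A3XX_PA_SC_AA_CONFIG value is taken from this diff.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical logical register names, standing in for REG_ADRENO_*. */
enum sketch_reg {
	SKETCH_REG_PA_SC_AA_CONFIG,
	SKETCH_REG_NOT_ON_THIS_GPU,
	SKETCH_REG_MAX,
};

/* Offset taken from the a3xx.xml.h hunk in this diff. */
#define REG_A3XX_PA_SC_AA_CONFIG	0x00002301

/* Assumed encoding: offset + 1, so an unset (zero) slot means "unmapped". */
#define SKETCH_DEFINE(_name, _reg)	[_name] = (_reg) + 1

static const unsigned int sketch_offsets[SKETCH_REG_MAX] = {
	SKETCH_DEFINE(SKETCH_REG_PA_SC_AA_CONFIG, REG_A3XX_PA_SC_AA_CONFIG),
	/* SKETCH_REG_NOT_ON_THIS_GPU is deliberately left unset. */
};

/* Resolve a logical name; returns false when the table has no mapping. */
static bool sketch_lookup(enum sketch_reg name, uint32_t *offset)
{
	if ((unsigned int)name >= SKETCH_REG_MAX || sketch_offsets[name] == 0)
		return false;
	*offset = sketch_offsets[name] - 1;
	return true;
}

/* Hypothetical self-check exercising a mapped and an unmapped name. */
static void offsets_sketch_selfcheck(void)
{
	uint32_t off = 0;

	assert(sketch_lookup(SKETCH_REG_PA_SC_AA_CONFIG, &off));
	assert(off == REG_A3XX_PA_SC_AA_CONFIG);
	assert(!sketch_lookup(SKETCH_REG_NOT_ON_THIS_GPU, &off));
}
```

Designated initializers keep the table sparse and order-independent, which is why a per-generation table such as the one in this hunk can list only the registers that generation actually has.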
diff --git a/drivers/gpu/drm/msm/adreno/a4xx.xml.h b/drivers/gpu/drm/msm/adreno/a4xx.xml.h
new file mode 100644
index 000000000000..5a24c416d2dd
--- /dev/null
+++ b/drivers/gpu/drm/msm/adreno/a4xx.xml.h
@@ -0,0 +1,2144 @@
+#ifndef A4XX_XML
+#define A4XX_XML
+
+/* Autogenerated file, DO NOT EDIT manually!
+
+This file was generated by the rules-ng-ng headergen tool in this git repository:
+http://github.com/freedreno/envytools/
+git clone https://github.com/freedreno/envytools.git
+
+The rules-ng-ng source files this header was generated from are:
+- /home/robclark/src/freedreno/envytools/rnndb/adreno.xml               (    364 bytes, from 2013-11-30 14:47:15)
+- /home/robclark/src/freedreno/envytools/rnndb/freedreno_copyright.xml  (   1453 bytes, from 2013-03-31 16:51:27)
+- /home/robclark/src/freedreno/envytools/rnndb/adreno/a2xx.xml          (  32901 bytes, from 2014-06-02 15:21:30)
+- /home/robclark/src/freedreno/envytools/rnndb/adreno/adreno_common.xml (  10551 bytes, from 2014-11-13 22:44:30)
+- /home/robclark/src/freedreno/envytools/rnndb/adreno/adreno_pm4.xml    (  15053 bytes, from 2014-11-09 15:45:47)
+- /home/robclark/src/freedreno/envytools/rnndb/adreno/a3xx.xml          (  63169 bytes, from 2014-11-13 22:44:18)
+- /home/robclark/src/freedreno/envytools/rnndb/adreno/a4xx.xml          (  49097 bytes, from 2014-11-14 15:38:00)
+
+Copyright (C) 2013-2014 by the following authors:
+- Rob Clark <robdclark@gmail.com> (robclark)
+
+Permission is hereby granted, free of charge, to any person obtaining
+a copy of this software and associated documentation files (the
+"Software"), to deal in the Software without restriction, including
+without limitation the rights to use, copy, modify, merge, publish,
+distribute, sublicense, and/or sell copies of the Software, and to
+permit persons to whom the Software is furnished to do so, subject to
+the following conditions:
+
+The above copyright notice and this permission notice (including the
+next paragraph) shall be included in all copies or substantial
+portions of the Software.
+
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.
+IN NO EVENT SHALL THE COPYRIGHT OWNER(S) AND/OR ITS SUPPLIERS BE
+LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION
+OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION
+WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
+*/
+
+
+enum a4xx_color_fmt {
+	RB4_A8_UNORM = 1,
+	RB4_R5G6R5_UNORM = 14,
+	RB4_Z16_UNORM = 15,
+	RB4_R8G8B8_UNORM = 25,
+	RB4_R8G8B8A8_UNORM = 26,
+};
+
+enum a4xx_tile_mode {
+	TILE4_LINEAR = 0,
+	TILE4_3 = 3,
+};
+
+enum a4xx_rb_blend_opcode {
+	BLEND_DST_PLUS_SRC = 0,
+	BLEND_SRC_MINUS_DST = 1,
+	BLEND_DST_MINUS_SRC = 2,
+	BLEND_MIN_DST_SRC = 3,
+	BLEND_MAX_DST_SRC = 4,
+};
+
+enum a4xx_vtx_fmt {
+	VFMT4_FLOAT_32 = 1,
+	VFMT4_FLOAT_32_32 = 2,
+	VFMT4_FLOAT_32_32_32 = 3,
+	VFMT4_FLOAT_32_32_32_32 = 4,
+	VFMT4_FLOAT_16 = 5,
+	VFMT4_FLOAT_16_16 = 6,
+	VFMT4_FLOAT_16_16_16 = 7,
+	VFMT4_FLOAT_16_16_16_16 = 8,
+	VFMT4_FIXED_32 = 9,
+	VFMT4_FIXED_32_32 = 10,
+	VFMT4_FIXED_32_32_32 = 11,
+	VFMT4_FIXED_32_32_32_32 = 12,
+	VFMT4_SHORT_16 = 16,
+	VFMT4_SHORT_16_16 = 17,
+	VFMT4_SHORT_16_16_16 = 18,
+	VFMT4_SHORT_16_16_16_16 = 19,
+	VFMT4_USHORT_16 = 20,
+	VFMT4_USHORT_16_16 = 21,
+	VFMT4_USHORT_16_16_16 = 22,
+	VFMT4_USHORT_16_16_16_16 = 23,
+	VFMT4_NORM_SHORT_16 = 24,
+	VFMT4_NORM_SHORT_16_16 = 25,
+	VFMT4_NORM_SHORT_16_16_16 = 26,
+	VFMT4_NORM_SHORT_16_16_16_16 = 27,
+	VFMT4_NORM_USHORT_16 = 28,
+	VFMT4_NORM_USHORT_16_16 = 29,
+	VFMT4_NORM_USHORT_16_16_16 = 30,
+	VFMT4_NORM_USHORT_16_16_16_16 = 31,
+	VFMT4_UBYTE_8 = 40,
+	VFMT4_UBYTE_8_8 = 41,
+	VFMT4_UBYTE_8_8_8 = 42,
+	VFMT4_UBYTE_8_8_8_8 = 43,
+	VFMT4_NORM_UBYTE_8 = 44,
+	VFMT4_NORM_UBYTE_8_8 = 45,
+	VFMT4_NORM_UBYTE_8_8_8 = 46,
+	VFMT4_NORM_UBYTE_8_8_8_8 = 47,
+	VFMT4_BYTE_8 = 48,
+	VFMT4_BYTE_8_8 = 49,
+	VFMT4_BYTE_8_8_8 = 50,
+	VFMT4_BYTE_8_8_8_8 = 51,
+	VFMT4_NORM_BYTE_8 = 52,
+	VFMT4_NORM_BYTE_8_8 = 53,
+	VFMT4_NORM_BYTE_8_8_8 = 54,
+	VFMT4_NORM_BYTE_8_8_8_8 = 55,
+	VFMT4_UINT_10_10_10_2 = 60,
+	VFMT4_NORM_UINT_10_10_10_2 = 61,
+	VFMT4_INT_10_10_10_2 = 62,
+	VFMT4_NORM_INT_10_10_10_2 = 63,
+};
+
+enum a4xx_tex_fmt {
+	TFMT4_NORM_USHORT_565 = 11,
+	TFMT4_NORM_USHORT_5551 = 10,
+	TFMT4_NORM_USHORT_4444 = 8,
+	TFMT4_NORM_UINT_X8Z24 = 71,
+	TFMT4_NORM_UINT_2_10_10_10 = 33,
+	TFMT4_NORM_UINT_A8 = 3,
+	TFMT4_NORM_UINT_L8_A8 = 13,
+	TFMT4_NORM_UINT_8 = 4,
+	TFMT4_NORM_UINT_8_8_8_8 = 28,
+	TFMT4_FLOAT_16 = 20,
+	TFMT4_FLOAT_16_16 = 40,
+	TFMT4_FLOAT_16_16_16_16 = 53,
+	TFMT4_FLOAT_32 = 43,
+	TFMT4_FLOAT_32_32 = 56,
+	TFMT4_FLOAT_32_32_32_32 = 63,
+};
+
+enum a4xx_depth_format {
+	DEPTH4_NONE = 0,
+	DEPTH4_16 = 1,
+	DEPTH4_24_8 = 2,
+};
+
+enum a4xx_tex_filter {
+	A4XX_TEX_NEAREST = 0,
+	A4XX_TEX_LINEAR = 1,
+};
+
+enum a4xx_tex_clamp {
+	A4XX_TEX_REPEAT = 0,
+	A4XX_TEX_CLAMP_TO_EDGE = 1,
+	A4XX_TEX_MIRROR_REPEAT = 2,
+	A4XX_TEX_CLAMP_NONE = 3,
+};
+
+enum a4xx_tex_swiz {
+	A4XX_TEX_X = 0,
+	A4XX_TEX_Y = 1,
+	A4XX_TEX_Z = 2,
+	A4XX_TEX_W = 3,
+	A4XX_TEX_ZERO = 4,
+	A4XX_TEX_ONE = 5,
+};
+
+enum a4xx_tex_type {
+	A4XX_TEX_1D = 0,
+	A4XX_TEX_2D = 1,
+	A4XX_TEX_CUBE = 2,
+	A4XX_TEX_3D = 3,
+};
+
+#define A4XX_CGC_HLSQ_EARLY_CYC__MASK				0x00700000
+#define A4XX_CGC_HLSQ_EARLY_CYC__SHIFT				20
+static inline uint32_t A4XX_CGC_HLSQ_EARLY_CYC(uint32_t val)
+{
+	return ((val) << A4XX_CGC_HLSQ_EARLY_CYC__SHIFT) & A4XX_CGC_HLSQ_EARLY_CYC__MASK;
+}
+#define A4XX_INT0_RBBM_GPU_IDLE					0x00000001
+#define A4XX_INT0_RBBM_AHB_ERROR				0x00000002
+#define A4XX_INT0_RBBM_REG_TIMEOUT				0x00000004
+#define A4XX_INT0_RBBM_ME_MS_TIMEOUT				0x00000008
+#define A4XX_INT0_RBBM_PFP_MS_TIMEOUT				0x00000010
+#define A4XX_INT0_RBBM_ATB_BUS_OVERFLOW				0x00000020
+#define A4XX_INT0_VFD_ERROR					0x00000040
+#define A4XX_INT0_CP_SW_INT					0x00000080
+#define A4XX_INT0_CP_T0_PACKET_IN_IB				0x00000100
+#define A4XX_INT0_CP_OPCODE_ERROR				0x00000200
+#define A4XX_INT0_CP_RESERVED_BIT_ERROR				0x00000400
+#define A4XX_INT0_CP_HW_FAULT					0x00000800
+#define A4XX_INT0_CP_DMA					0x00001000
+#define A4XX_INT0_CP_IB2_INT					0x00002000
+#define A4XX_INT0_CP_IB1_INT					0x00004000
+#define A4XX_INT0_CP_RB_INT					0x00008000
+#define A4XX_INT0_CP_REG_PROTECT_FAULT				0x00010000
+#define A4XX_INT0_CP_RB_DONE_TS					0x00020000
+#define A4XX_INT0_CP_VS_DONE_TS					0x00040000
+#define A4XX_INT0_CP_PS_DONE_TS					0x00080000
+#define A4XX_INT0_CACHE_FLUSH_TS				0x00100000
+#define A4XX_INT0_CP_AHB_ERROR_HALT				0x00200000
+#define A4XX_INT0_MISC_HANG_DETECT				0x01000000
+#define A4XX_INT0_UCHE_OOB_ACCESS				0x02000000
+#define REG_A4XX_RB_GMEM_BASE_ADDR				0x00000cc0
+
+#define REG_A4XX_RB_PERFCTR_RB_SEL_0				0x00000cc7
+
+#define REG_A4XX_RB_PERFCTR_RB_SEL_1				0x00000cc8
+
+#define REG_A4XX_RB_PERFCTR_RB_SEL_2				0x00000cc9
+
+#define REG_A4XX_RB_PERFCTR_RB_SEL_3				0x00000cca
+
+#define REG_A4XX_RB_PERFCTR_RB_SEL_4				0x00000ccb
+
+#define REG_A4XX_RB_PERFCTR_RB_SEL_5				0x00000ccc
+
+#define REG_A4XX_RB_PERFCTR_RB_SEL_6				0x00000ccd
+
+#define REG_A4XX_RB_PERFCTR_RB_SEL_7				0x00000cce
+
+#define REG_A4XX_RB_PERFCTR_CCU_SEL_3				0x00000cd2
+
+#define REG_A4XX_RB_FRAME_BUFFER_DIMENSION			0x00000ce0
+#define A4XX_RB_FRAME_BUFFER_DIMENSION_WIDTH__MASK		0x00003fff
+#define A4XX_RB_FRAME_BUFFER_DIMENSION_WIDTH__SHIFT		0
+static inline uint32_t A4XX_RB_FRAME_BUFFER_DIMENSION_WIDTH(uint32_t val)
+{
+	return ((val) << A4XX_RB_FRAME_BUFFER_DIMENSION_WIDTH__SHIFT) & A4XX_RB_FRAME_BUFFER_DIMENSION_WIDTH__MASK;
+}
+#define A4XX_RB_FRAME_BUFFER_DIMENSION_HEIGHT__MASK		0x3fff0000
+#define A4XX_RB_FRAME_BUFFER_DIMENSION_HEIGHT__SHIFT		16
+static inline uint32_t A4XX_RB_FRAME_BUFFER_DIMENSION_HEIGHT(uint32_t val)
+{
+	return ((val) << A4XX_RB_FRAME_BUFFER_DIMENSION_HEIGHT__SHIFT) & A4XX_RB_FRAME_BUFFER_DIMENSION_HEIGHT__MASK;
+}
+
+#define REG_A4XX_RB_CLEAR_COLOR_DW0				0x000020cc
+
+#define REG_A4XX_RB_CLEAR_COLOR_DW1				0x000020cd
+
+#define REG_A4XX_RB_CLEAR_COLOR_DW2				0x000020ce
+
+#define REG_A4XX_RB_CLEAR_COLOR_DW3				0x000020cf
+
+#define REG_A4XX_RB_MODE_CONTROL				0x000020a0
+#define A4XX_RB_MODE_CONTROL_WIDTH__MASK			0x0000003f
+#define A4XX_RB_MODE_CONTROL_WIDTH__SHIFT			0
+static inline uint32_t A4XX_RB_MODE_CONTROL_WIDTH(uint32_t val)
+{
+	return ((val >> 5) << A4XX_RB_MODE_CONTROL_WIDTH__SHIFT) & A4XX_RB_MODE_CONTROL_WIDTH__MASK;
+}
+#define A4XX_RB_MODE_CONTROL_HEIGHT__MASK			0x00003f00
+#define A4XX_RB_MODE_CONTROL_HEIGHT__SHIFT			8
+static inline uint32_t A4XX_RB_MODE_CONTROL_HEIGHT(uint32_t val)
+{
+	return ((val >> 5) << A4XX_RB_MODE_CONTROL_HEIGHT__SHIFT) & A4XX_RB_MODE_CONTROL_HEIGHT__MASK;
+}
+
+#define REG_A4XX_RB_RENDER_CONTROL				0x000020a1
+#define A4XX_RB_RENDER_CONTROL_BINNING_PASS			0x00000001
+#define A4XX_RB_RENDER_CONTROL_DISABLE_COLOR_PIPE		0x00000020
+
+#define REG_A4XX_RB_MSAA_CONTROL				0x000020a2
+#define A4XX_RB_MSAA_CONTROL_DISABLE				0x00001000
+#define A4XX_RB_MSAA_CONTROL_SAMPLES__MASK			0x0000e000
+#define A4XX_RB_MSAA_CONTROL_SAMPLES__SHIFT			13
+static inline uint32_t A4XX_RB_MSAA_CONTROL_SAMPLES(uint32_t val)
+{
+	return ((val) << A4XX_RB_MSAA_CONTROL_SAMPLES__SHIFT) & A4XX_RB_MSAA_CONTROL_SAMPLES__MASK;
+}
+
+#define REG_A4XX_RB_MSAA_CONTROL2				0x000020a3
+#define A4XX_RB_MSAA_CONTROL2_MSAA_SAMPLES__MASK		0x00000380
+#define A4XX_RB_MSAA_CONTROL2_MSAA_SAMPLES__SHIFT		7
+static inline uint32_t A4XX_RB_MSAA_CONTROL2_MSAA_SAMPLES(uint32_t val)
+{
+	return ((val) << A4XX_RB_MSAA_CONTROL2_MSAA_SAMPLES__SHIFT) & A4XX_RB_MSAA_CONTROL2_MSAA_SAMPLES__MASK;
+}
+#define A4XX_RB_MSAA_CONTROL2_VARYING				0x00001000
+
+static inline uint32_t REG_A4XX_RB_MRT(uint32_t i0) { return 0x000020a4 + 0x5*i0; }
+
+static inline uint32_t REG_A4XX_RB_MRT_CONTROL(uint32_t i0) { return 0x000020a4 + 0x5*i0; }
+#define A4XX_RB_MRT_CONTROL_READ_DEST_ENABLE			0x00000008
+#define A4XX_RB_MRT_CONTROL_BLEND				0x00000010
+#define A4XX_RB_MRT_CONTROL_BLEND2				0x00000020
+#define A4XX_RB_MRT_CONTROL_FASTCLEAR				0x00000400
+#define A4XX_RB_MRT_CONTROL_B11					0x00000800
+#define A4XX_RB_MRT_CONTROL_COMPONENT_ENABLE__MASK		0x0f000000
+#define A4XX_RB_MRT_CONTROL_COMPONENT_ENABLE__SHIFT		24
+static inline uint32_t A4XX_RB_MRT_CONTROL_COMPONENT_ENABLE(uint32_t val)
+{
+	return ((val) << A4XX_RB_MRT_CONTROL_COMPONENT_ENABLE__SHIFT) & A4XX_RB_MRT_CONTROL_COMPONENT_ENABLE__MASK;
+}
+
+static inline uint32_t REG_A4XX_RB_MRT_BUF_INFO(uint32_t i0) { return 0x000020a5 + 0x5*i0; }
+#define A4XX_RB_MRT_BUF_INFO_COLOR_FORMAT__MASK			0x0000003f
+#define A4XX_RB_MRT_BUF_INFO_COLOR_FORMAT__SHIFT		0
+static inline uint32_t A4XX_RB_MRT_BUF_INFO_COLOR_FORMAT(enum a4xx_color_fmt val)
+{
+	return ((val) << A4XX_RB_MRT_BUF_INFO_COLOR_FORMAT__SHIFT) & A4XX_RB_MRT_BUF_INFO_COLOR_FORMAT__MASK;
+}
+#define A4XX_RB_MRT_BUF_INFO_DITHER_MODE__MASK			0x00000600
+#define A4XX_RB_MRT_BUF_INFO_DITHER_MODE__SHIFT			9
+static inline uint32_t A4XX_RB_MRT_BUF_INFO_DITHER_MODE(enum adreno_rb_dither_mode val)
+{
+	return ((val) << A4XX_RB_MRT_BUF_INFO_DITHER_MODE__SHIFT) & A4XX_RB_MRT_BUF_INFO_DITHER_MODE__MASK;
+}
+#define A4XX_RB_MRT_BUF_INFO_COLOR_SWAP__MASK			0x00001800
+#define A4XX_RB_MRT_BUF_INFO_COLOR_SWAP__SHIFT			11
+static inline uint32_t A4XX_RB_MRT_BUF_INFO_COLOR_SWAP(enum a3xx_color_swap val)
+{
+	return ((val) << A4XX_RB_MRT_BUF_INFO_COLOR_SWAP__SHIFT) & A4XX_RB_MRT_BUF_INFO_COLOR_SWAP__MASK;
+}
+#define A4XX_RB_MRT_BUF_INFO_COLOR_BUF_PITCH__MASK		0x007fc000
+#define A4XX_RB_MRT_BUF_INFO_COLOR_BUF_PITCH__SHIFT		14
+static inline uint32_t A4XX_RB_MRT_BUF_INFO_COLOR_BUF_PITCH(uint32_t val)
+{
+	return ((val >> 4) << A4XX_RB_MRT_BUF_INFO_COLOR_BUF_PITCH__SHIFT) & A4XX_RB_MRT_BUF_INFO_COLOR_BUF_PITCH__MASK;
+}
+
+static inline uint32_t REG_A4XX_RB_MRT_BASE(uint32_t i0) { return 0x000020a6 + 0x5*i0; }
+
+static inline uint32_t REG_A4XX_RB_MRT_CONTROL3(uint32_t i0) { return 0x000020a7 + 0x5*i0; }
+#define A4XX_RB_MRT_CONTROL3_STRIDE__MASK			0x0001fff8
+#define A4XX_RB_MRT_CONTROL3_STRIDE__SHIFT			3
+static inline uint32_t A4XX_RB_MRT_CONTROL3_STRIDE(uint32_t val)
+{
+	return ((val) << A4XX_RB_MRT_CONTROL3_STRIDE__SHIFT) & A4XX_RB_MRT_CONTROL3_STRIDE__MASK;
+}
+
+static inline uint32_t REG_A4XX_RB_MRT_BLEND_CONTROL(uint32_t i0) { return 0x000020a8 + 0x5*i0; }
+#define A4XX_RB_MRT_BLEND_CONTROL_RGB_SRC_FACTOR__MASK		0x0000001f
+#define A4XX_RB_MRT_BLEND_CONTROL_RGB_SRC_FACTOR__SHIFT		0
+static inline uint32_t A4XX_RB_MRT_BLEND_CONTROL_RGB_SRC_FACTOR(enum adreno_rb_blend_factor val)
+{
+	return ((val) << A4XX_RB_MRT_BLEND_CONTROL_RGB_SRC_FACTOR__SHIFT) & A4XX_RB_MRT_BLEND_CONTROL_RGB_SRC_FACTOR__MASK;
+}
+#define A4XX_RB_MRT_BLEND_CONTROL_RGB_BLEND_OPCODE__MASK	0x000000e0
+#define A4XX_RB_MRT_BLEND_CONTROL_RGB_BLEND_OPCODE__SHIFT	5
+static inline uint32_t A4XX_RB_MRT_BLEND_CONTROL_RGB_BLEND_OPCODE(enum a4xx_rb_blend_opcode val)
+{
+	return ((val) << A4XX_RB_MRT_BLEND_CONTROL_RGB_BLEND_OPCODE__SHIFT) & A4XX_RB_MRT_BLEND_CONTROL_RGB_BLEND_OPCODE__MASK;
+}
+#define A4XX_RB_MRT_BLEND_CONTROL_RGB_DEST_FACTOR__MASK		0x00001f00
+#define A4XX_RB_MRT_BLEND_CONTROL_RGB_DEST_FACTOR__SHIFT	8
+static inline uint32_t A4XX_RB_MRT_BLEND_CONTROL_RGB_DEST_FACTOR(enum adreno_rb_blend_factor val)
+{
+	return ((val) << A4XX_RB_MRT_BLEND_CONTROL_RGB_DEST_FACTOR__SHIFT) & A4XX_RB_MRT_BLEND_CONTROL_RGB_DEST_FACTOR__MASK;
+}
+#define A4XX_RB_MRT_BLEND_CONTROL_ALPHA_SRC_FACTOR__MASK	0x001f0000
+#define A4XX_RB_MRT_BLEND_CONTROL_ALPHA_SRC_FACTOR__SHIFT	16
+static inline uint32_t A4XX_RB_MRT_BLEND_CONTROL_ALPHA_SRC_FACTOR(enum adreno_rb_blend_factor val)
+{
+	return ((val) << A4XX_RB_MRT_BLEND_CONTROL_ALPHA_SRC_FACTOR__SHIFT) & A4XX_RB_MRT_BLEND_CONTROL_ALPHA_SRC_FACTOR__MASK;
+}
+#define A4XX_RB_MRT_BLEND_CONTROL_ALPHA_BLEND_OPCODE__MASK	0x00e00000
+#define A4XX_RB_MRT_BLEND_CONTROL_ALPHA_BLEND_OPCODE__SHIFT	21
+static inline uint32_t A4XX_RB_MRT_BLEND_CONTROL_ALPHA_BLEND_OPCODE(enum a4xx_rb_blend_opcode val)
+{
+	return ((val) << A4XX_RB_MRT_BLEND_CONTROL_ALPHA_BLEND_OPCODE__SHIFT) & A4XX_RB_MRT_BLEND_CONTROL_ALPHA_BLEND_OPCODE__MASK;
+}
+#define A4XX_RB_MRT_BLEND_CONTROL_ALPHA_DEST_FACTOR__MASK	0x1f000000
+#define A4XX_RB_MRT_BLEND_CONTROL_ALPHA_DEST_FACTOR__SHIFT	24
+static inline uint32_t A4XX_RB_MRT_BLEND_CONTROL_ALPHA_DEST_FACTOR(enum adreno_rb_blend_factor val)
+{
+	return ((val) << A4XX_RB_MRT_BLEND_CONTROL_ALPHA_DEST_FACTOR__SHIFT) & A4XX_RB_MRT_BLEND_CONTROL_ALPHA_DEST_FACTOR__MASK;
+}
+
+#define REG_A4XX_RB_ALPHA_CONTROL				0x000020f8
+#define A4XX_RB_ALPHA_CONTROL_ALPHA_TEST			0x00000100
+#define A4XX_RB_ALPHA_CONTROL_ALPHA_TEST_FUNC__MASK		0x00000e00
+#define A4XX_RB_ALPHA_CONTROL_ALPHA_TEST_FUNC__SHIFT		9
+static inline uint32_t A4XX_RB_ALPHA_CONTROL_ALPHA_TEST_FUNC(enum adreno_compare_func val)
+{
+	return ((val) << A4XX_RB_ALPHA_CONTROL_ALPHA_TEST_FUNC__SHIFT) & A4XX_RB_ALPHA_CONTROL_ALPHA_TEST_FUNC__MASK;
+}
+
+#define REG_A4XX_RB_FS_OUTPUT					0x000020f9
+#define A4XX_RB_FS_OUTPUT_ENABLE_COLOR_PIPE			0x00000001
+#define A4XX_RB_FS_OUTPUT_FAST_CLEAR				0x00000100
+#define A4XX_RB_FS_OUTPUT_SAMPLE_MASK__MASK			0xffff0000
+#define A4XX_RB_FS_OUTPUT_SAMPLE_MASK__SHIFT			16
+static inline uint32_t A4XX_RB_FS_OUTPUT_SAMPLE_MASK(uint32_t val)
+{
+	return ((val) << A4XX_RB_FS_OUTPUT_SAMPLE_MASK__SHIFT) & A4XX_RB_FS_OUTPUT_SAMPLE_MASK__MASK;
+}
+
+#define REG_A4XX_RB_RENDER_CONTROL3				0x000020fb
+#define A4XX_RB_RENDER_CONTROL3_COMPONENT_ENABLE__MASK		0x0000001f
+#define A4XX_RB_RENDER_CONTROL3_COMPONENT_ENABLE__SHIFT		0
+static inline uint32_t A4XX_RB_RENDER_CONTROL3_COMPONENT_ENABLE(uint32_t val)
+{
+	return ((val) << A4XX_RB_RENDER_CONTROL3_COMPONENT_ENABLE__SHIFT) & A4XX_RB_RENDER_CONTROL3_COMPONENT_ENABLE__MASK;
+}
+
+#define REG_A4XX_RB_COPY_CONTROL				0x000020fc
+#define A4XX_RB_COPY_CONTROL_MSAA_RESOLVE__MASK			0x00000003
+#define A4XX_RB_COPY_CONTROL_MSAA_RESOLVE__SHIFT		0
+static inline uint32_t A4XX_RB_COPY_CONTROL_MSAA_RESOLVE(enum a3xx_msaa_samples val)
+{
+	return ((val) << A4XX_RB_COPY_CONTROL_MSAA_RESOLVE__SHIFT) & A4XX_RB_COPY_CONTROL_MSAA_RESOLVE__MASK;
+}
+#define A4XX_RB_COPY_CONTROL_MODE__MASK				0x00000070
+#define A4XX_RB_COPY_CONTROL_MODE__SHIFT			4
+static inline uint32_t A4XX_RB_COPY_CONTROL_MODE(enum adreno_rb_copy_control_mode val)
+{
+	return ((val) << A4XX_RB_COPY_CONTROL_MODE__SHIFT) & A4XX_RB_COPY_CONTROL_MODE__MASK;
+}
+#define A4XX_RB_COPY_CONTROL_FASTCLEAR__MASK			0x00000f00
+#define A4XX_RB_COPY_CONTROL_FASTCLEAR__SHIFT			8
+static inline uint32_t A4XX_RB_COPY_CONTROL_FASTCLEAR(uint32_t val)
+{
+	return ((val) << A4XX_RB_COPY_CONTROL_FASTCLEAR__SHIFT) & A4XX_RB_COPY_CONTROL_FASTCLEAR__MASK;
+}
+#define A4XX_RB_COPY_CONTROL_GMEM_BASE__MASK			0xffffc000
+#define A4XX_RB_COPY_CONTROL_GMEM_BASE__SHIFT			14
+static inline uint32_t A4XX_RB_COPY_CONTROL_GMEM_BASE(uint32_t val)
+{
+	return ((val >> 14) << A4XX_RB_COPY_CONTROL_GMEM_BASE__SHIFT) & A4XX_RB_COPY_CONTROL_GMEM_BASE__MASK;
+}
+
+#define REG_A4XX_RB_COPY_DEST_BASE				0x000020fd
+#define A4XX_RB_COPY_DEST_BASE_BASE__MASK			0xfffffff0
+#define A4XX_RB_COPY_DEST_BASE_BASE__SHIFT			4
+static inline uint32_t A4XX_RB_COPY_DEST_BASE_BASE(uint32_t val)
+{
+	return ((val >> 4) << A4XX_RB_COPY_DEST_BASE_BASE__SHIFT) & A4XX_RB_COPY_DEST_BASE_BASE__MASK;
+}
+
+#define REG_A4XX_RB_COPY_DEST_PITCH				0x000020fe
+#define A4XX_RB_COPY_DEST_PITCH_PITCH__MASK			0xffffffff
+#define A4XX_RB_COPY_DEST_PITCH_PITCH__SHIFT			0
+static inline uint32_t A4XX_RB_COPY_DEST_PITCH_PITCH(uint32_t val)
+{
+	return ((val >> 5) << A4XX_RB_COPY_DEST_PITCH_PITCH__SHIFT) & A4XX_RB_COPY_DEST_PITCH_PITCH__MASK;
+}
+
+#define REG_A4XX_RB_COPY_DEST_INFO				0x000020ff
+#define A4XX_RB_COPY_DEST_INFO_FORMAT__MASK			0x000000fc
+#define A4XX_RB_COPY_DEST_INFO_FORMAT__SHIFT			2
+static inline uint32_t A4XX_RB_COPY_DEST_INFO_FORMAT(enum a4xx_color_fmt val)
+{
+	return ((val) << A4XX_RB_COPY_DEST_INFO_FORMAT__SHIFT) & A4XX_RB_COPY_DEST_INFO_FORMAT__MASK;
+}
+#define A4XX_RB_COPY_DEST_INFO_SWAP__MASK			0x00000300
+#define A4XX_RB_COPY_DEST_INFO_SWAP__SHIFT			8
+static inline uint32_t A4XX_RB_COPY_DEST_INFO_SWAP(enum a3xx_color_swap val)
+{
+	return ((val) << A4XX_RB_COPY_DEST_INFO_SWAP__SHIFT) & A4XX_RB_COPY_DEST_INFO_SWAP__MASK;
+}
+#define A4XX_RB_COPY_DEST_INFO_DITHER_MODE__MASK		0x00000c00
+#define A4XX_RB_COPY_DEST_INFO_DITHER_MODE__SHIFT		10
+static inline uint32_t A4XX_RB_COPY_DEST_INFO_DITHER_MODE(enum adreno_rb_dither_mode val)
+{
+	return ((val) << A4XX_RB_COPY_DEST_INFO_DITHER_MODE__SHIFT) & A4XX_RB_COPY_DEST_INFO_DITHER_MODE__MASK;
+}
+#define A4XX_RB_COPY_DEST_INFO_COMPONENT_ENABLE__MASK		0x0003c000
+#define A4XX_RB_COPY_DEST_INFO_COMPONENT_ENABLE__SHIFT		14
+static inline uint32_t A4XX_RB_COPY_DEST_INFO_COMPONENT_ENABLE(uint32_t val)
+{
+	return ((val) << A4XX_RB_COPY_DEST_INFO_COMPONENT_ENABLE__SHIFT) & A4XX_RB_COPY_DEST_INFO_COMPONENT_ENABLE__MASK;
+}
+#define A4XX_RB_COPY_DEST_INFO_ENDIAN__MASK			0x001c0000
+#define A4XX_RB_COPY_DEST_INFO_ENDIAN__SHIFT			18
+static inline uint32_t A4XX_RB_COPY_DEST_INFO_ENDIAN(enum adreno_rb_surface_endian val)
+{
+	return ((val) << A4XX_RB_COPY_DEST_INFO_ENDIAN__SHIFT) & A4XX_RB_COPY_DEST_INFO_ENDIAN__MASK;
+}
+#define A4XX_RB_COPY_DEST_INFO_TILE__MASK			0x03000000
+#define A4XX_RB_COPY_DEST_INFO_TILE__SHIFT			24
+static inline uint32_t A4XX_RB_COPY_DEST_INFO_TILE(enum a4xx_tile_mode val)
+{
+	return ((val) << A4XX_RB_COPY_DEST_INFO_TILE__SHIFT) & A4XX_RB_COPY_DEST_INFO_TILE__MASK;
+}
+
+#define REG_A4XX_RB_FS_OUTPUT_REG				0x00002100
+#define A4XX_RB_FS_OUTPUT_REG_COLOR_PIPE_ENABLE			0x00000001
+#define A4XX_RB_FS_OUTPUT_REG_FRAG_WRITES_Z			0x00000020
+
+#define REG_A4XX_RB_DEPTH_CONTROL				0x00002101
+#define A4XX_RB_DEPTH_CONTROL_FRAG_WRITES_Z			0x00000001
+#define A4XX_RB_DEPTH_CONTROL_Z_ENABLE				0x00000002
+#define A4XX_RB_DEPTH_CONTROL_Z_WRITE_ENABLE			0x00000004
+#define A4XX_RB_DEPTH_CONTROL_ZFUNC__MASK			0x00000070
+#define A4XX_RB_DEPTH_CONTROL_ZFUNC__SHIFT			4
+static inline uint32_t A4XX_RB_DEPTH_CONTROL_ZFUNC(enum adreno_compare_func val)
+{
+	return ((val) << A4XX_RB_DEPTH_CONTROL_ZFUNC__SHIFT) & A4XX_RB_DEPTH_CONTROL_ZFUNC__MASK;
+}
+#define A4XX_RB_DEPTH_CONTROL_BF_ENABLE				0x00000080
+#define A4XX_RB_DEPTH_CONTROL_EARLY_Z_DISABLE			0x00010000
+#define A4XX_RB_DEPTH_CONTROL_Z_TEST_ENABLE			0x80000000
+
+#define REG_A4XX_RB_DEPTH_CLEAR					0x00002102
+
+#define REG_A4XX_RB_DEPTH_INFO					0x00002103
+#define A4XX_RB_DEPTH_INFO_DEPTH_FORMAT__MASK			0x00000003
+#define A4XX_RB_DEPTH_INFO_DEPTH_FORMAT__SHIFT			0
+static inline uint32_t A4XX_RB_DEPTH_INFO_DEPTH_FORMAT(enum a4xx_depth_format val)
+{
+	return ((val) << A4XX_RB_DEPTH_INFO_DEPTH_FORMAT__SHIFT) & A4XX_RB_DEPTH_INFO_DEPTH_FORMAT__MASK;
+}
+#define A4XX_RB_DEPTH_INFO_DEPTH_BASE__MASK			0xfffff000
+#define A4XX_RB_DEPTH_INFO_DEPTH_BASE__SHIFT			12
+static inline uint32_t A4XX_RB_DEPTH_INFO_DEPTH_BASE(uint32_t val)
+{
+	return ((val >> 12) << A4XX_RB_DEPTH_INFO_DEPTH_BASE__SHIFT) & A4XX_RB_DEPTH_INFO_DEPTH_BASE__MASK;
+}
+
+#define REG_A4XX_RB_DEPTH_PITCH					0x00002104
+#define A4XX_RB_DEPTH_PITCH__MASK				0xffffffff
+#define A4XX_RB_DEPTH_PITCH__SHIFT				0
+static inline uint32_t A4XX_RB_DEPTH_PITCH(uint32_t val)
+{
+	return ((val >> 4) << A4XX_RB_DEPTH_PITCH__SHIFT) & A4XX_RB_DEPTH_PITCH__MASK;
+}
+
+#define REG_A4XX_RB_DEPTH_PITCH2				0x00002105
+#define A4XX_RB_DEPTH_PITCH2__MASK				0xffffffff
+#define A4XX_RB_DEPTH_PITCH2__SHIFT				0
+static inline uint32_t A4XX_RB_DEPTH_PITCH2(uint32_t val)
+{
+	return ((val >> 4) << A4XX_RB_DEPTH_PITCH2__SHIFT) & A4XX_RB_DEPTH_PITCH2__MASK;
+}
+
+#define REG_A4XX_RB_STENCIL_CONTROL				0x00002106
+#define A4XX_RB_STENCIL_CONTROL_STENCIL_ENABLE			0x00000001
+#define A4XX_RB_STENCIL_CONTROL_STENCIL_ENABLE_BF		0x00000002
+#define A4XX_RB_STENCIL_CONTROL_STENCIL_READ			0x00000004
+#define A4XX_RB_STENCIL_CONTROL_FUNC__MASK			0x00000700
+#define A4XX_RB_STENCIL_CONTROL_FUNC__SHIFT			8
+static inline uint32_t A4XX_RB_STENCIL_CONTROL_FUNC(enum adreno_compare_func val)
+{
+	return ((val) << A4XX_RB_STENCIL_CONTROL_FUNC__SHIFT) & A4XX_RB_STENCIL_CONTROL_FUNC__MASK;
+}
+#define A4XX_RB_STENCIL_CONTROL_FAIL__MASK			0x00003800
+#define A4XX_RB_STENCIL_CONTROL_FAIL__SHIFT			11
+static inline uint32_t A4XX_RB_STENCIL_CONTROL_FAIL(enum adreno_stencil_op val)
+{
+	return ((val) << A4XX_RB_STENCIL_CONTROL_FAIL__SHIFT) & A4XX_RB_STENCIL_CONTROL_FAIL__MASK;
+}
+#define A4XX_RB_STENCIL_CONTROL_ZPASS__MASK			0x0001c000
+#define A4XX_RB_STENCIL_CONTROL_ZPASS__SHIFT			14
+static inline uint32_t A4XX_RB_STENCIL_CONTROL_ZPASS(enum adreno_stencil_op val)
+{
+	return ((val) << A4XX_RB_STENCIL_CONTROL_ZPASS__SHIFT) & A4XX_RB_STENCIL_CONTROL_ZPASS__MASK;
+}
+#define A4XX_RB_STENCIL_CONTROL_ZFAIL__MASK			0x000e0000
+#define A4XX_RB_STENCIL_CONTROL_ZFAIL__SHIFT			17
+static inline uint32_t A4XX_RB_STENCIL_CONTROL_ZFAIL(enum adreno_stencil_op val)
+{
+	return ((val) << A4XX_RB_STENCIL_CONTROL_ZFAIL__SHIFT) & A4XX_RB_STENCIL_CONTROL_ZFAIL__MASK;
+}
+#define A4XX_RB_STENCIL_CONTROL_FUNC_BF__MASK			0x00700000
+#define A4XX_RB_STENCIL_CONTROL_FUNC_BF__SHIFT			20
+static inline uint32_t A4XX_RB_STENCIL_CONTROL_FUNC_BF(enum adreno_compare_func val)
+{
+	return ((val) << A4XX_RB_STENCIL_CONTROL_FUNC_BF__SHIFT) & A4XX_RB_STENCIL_CONTROL_FUNC_BF__MASK;
+}
+#define A4XX_RB_STENCIL_CONTROL_FAIL_BF__MASK			0x03800000
+#define A4XX_RB_STENCIL_CONTROL_FAIL_BF__SHIFT			23
+static inline uint32_t A4XX_RB_STENCIL_CONTROL_FAIL_BF(enum adreno_stencil_op val)
+{
+	return ((val) << A4XX_RB_STENCIL_CONTROL_FAIL_BF__SHIFT) & A4XX_RB_STENCIL_CONTROL_FAIL_BF__MASK;
+}
+#define A4XX_RB_STENCIL_CONTROL_ZPASS_BF__MASK			0x1c000000
+#define A4XX_RB_STENCIL_CONTROL_ZPASS_BF__SHIFT			26
+static inline uint32_t A4XX_RB_STENCIL_CONTROL_ZPASS_BF(enum adreno_stencil_op val)
+{
+	return ((val) << A4XX_RB_STENCIL_CONTROL_ZPASS_BF__SHIFT) & A4XX_RB_STENCIL_CONTROL_ZPASS_BF__MASK;
+}
+#define A4XX_RB_STENCIL_CONTROL_ZFAIL_BF__MASK			0xe0000000
+#define A4XX_RB_STENCIL_CONTROL_ZFAIL_BF__SHIFT			29
+static inline uint32_t A4XX_RB_STENCIL_CONTROL_ZFAIL_BF(enum adreno_stencil_op val)
+{
+	return ((val) << A4XX_RB_STENCIL_CONTROL_ZFAIL_BF__SHIFT) & A4XX_RB_STENCIL_CONTROL_ZFAIL_BF__MASK;
+}
+
+#define REG_A4XX_RB_STENCIL_CONTROL2				0x00002107
+#define A4XX_RB_STENCIL_CONTROL2_STENCIL_BUFFER			0x00000001
+
+#define REG_A4XX_RB_STENCILREFMASK				0x0000210b
+#define A4XX_RB_STENCILREFMASK_STENCILREF__MASK			0x000000ff
+#define A4XX_RB_STENCILREFMASK_STENCILREF__SHIFT		0
+static inline uint32_t A4XX_RB_STENCILREFMASK_STENCILREF(uint32_t val)
+{
+	return ((val) << A4XX_RB_STENCILREFMASK_STENCILREF__SHIFT) & A4XX_RB_STENCILREFMASK_STENCILREF__MASK;
+}
+#define A4XX_RB_STENCILREFMASK_STENCILMASK__MASK		0x0000ff00
+#define A4XX_RB_STENCILREFMASK_STENCILMASK__SHIFT		8
+static inline uint32_t A4XX_RB_STENCILREFMASK_STENCILMASK(uint32_t val)
+{
+	return ((val) << A4XX_RB_STENCILREFMASK_STENCILMASK__SHIFT) & A4XX_RB_STENCILREFMASK_STENCILMASK__MASK;
+}
+#define A4XX_RB_STENCILREFMASK_STENCILWRITEMASK__MASK		0x00ff0000
+#define A4XX_RB_STENCILREFMASK_STENCILWRITEMASK__SHIFT		16
+static inline uint32_t A4XX_RB_STENCILREFMASK_STENCILWRITEMASK(uint32_t val)
+{
+	return ((val) << A4XX_RB_STENCILREFMASK_STENCILWRITEMASK__SHIFT) & A4XX_RB_STENCILREFMASK_STENCILWRITEMASK__MASK;
+}
+
+#define REG_A4XX_RB_STENCILREFMASK_BF				0x0000210c
+#define A4XX_RB_STENCILREFMASK_BF_STENCILREF__MASK		0x000000ff
+#define A4XX_RB_STENCILREFMASK_BF_STENCILREF__SHIFT		0
+static inline uint32_t A4XX_RB_STENCILREFMASK_BF_STENCILREF(uint32_t val)
+{
+	return ((val) << A4XX_RB_STENCILREFMASK_BF_STENCILREF__SHIFT) & A4XX_RB_STENCILREFMASK_BF_STENCILREF__MASK;
+}
+#define A4XX_RB_STENCILREFMASK_BF_STENCILMASK__MASK		0x0000ff00
+#define A4XX_RB_STENCILREFMASK_BF_STENCILMASK__SHIFT		8
+static inline uint32_t A4XX_RB_STENCILREFMASK_BF_STENCILMASK(uint32_t val)
+{
+	return ((val) << A4XX_RB_STENCILREFMASK_BF_STENCILMASK__SHIFT) & A4XX_RB_STENCILREFMASK_BF_STENCILMASK__MASK;
+}
+#define A4XX_RB_STENCILREFMASK_BF_STENCILWRITEMASK__MASK	0x00ff0000
+#define A4XX_RB_STENCILREFMASK_BF_STENCILWRITEMASK__SHIFT	16
+static inline uint32_t A4XX_RB_STENCILREFMASK_BF_STENCILWRITEMASK(uint32_t val)
+{
+	return ((val) << A4XX_RB_STENCILREFMASK_BF_STENCILWRITEMASK__SHIFT) & A4XX_RB_STENCILREFMASK_BF_STENCILWRITEMASK__MASK;
+}
+
+#define REG_A4XX_RB_BIN_OFFSET					0x0000210d
+#define A4XX_RB_BIN_OFFSET_WINDOW_OFFSET_DISABLE		0x80000000
+#define A4XX_RB_BIN_OFFSET_X__MASK				0x00007fff
+#define A4XX_RB_BIN_OFFSET_X__SHIFT				0
+static inline uint32_t A4XX_RB_BIN_OFFSET_X(uint32_t val)
+{
+	return ((val) << A4XX_RB_BIN_OFFSET_X__SHIFT) & A4XX_RB_BIN_OFFSET_X__MASK;
+}
+#define A4XX_RB_BIN_OFFSET_Y__MASK				0x7fff0000
+#define A4XX_RB_BIN_OFFSET_Y__SHIFT				16
+static inline uint32_t A4XX_RB_BIN_OFFSET_Y(uint32_t val)
+{
+	return ((val) << A4XX_RB_BIN_OFFSET_Y__SHIFT) & A4XX_RB_BIN_OFFSET_Y__MASK;
+}
+
+#define REG_A4XX_RB_VPORT_Z_CLAMP_MAX_15			0x0000213f
+
+#define REG_A4XX_RBBM_HW_VERSION				0x00000000
+
+#define REG_A4XX_RBBM_HW_CONFIGURATION				0x00000002
+
+static inline uint32_t REG_A4XX_RBBM_CLOCK_CTL_TP(uint32_t i0) { return 0x00000004 + 0x1*i0; }
+
+static inline uint32_t REG_A4XX_RBBM_CLOCK_CTL_TP_REG(uint32_t i0) { return 0x00000004 + 0x1*i0; }
+
+static inline uint32_t REG_A4XX_RBBM_CLOCK_CTL2_TP(uint32_t i0) { return 0x00000008 + 0x1*i0; }
+
+static inline uint32_t REG_A4XX_RBBM_CLOCK_CTL2_TP_REG(uint32_t i0) { return 0x00000008 + 0x1*i0; }
+
+static inline uint32_t REG_A4XX_RBBM_CLOCK_HYST_TP(uint32_t i0) { return 0x0000000c + 0x1*i0; }
+
+static inline uint32_t REG_A4XX_RBBM_CLOCK_HYST_TP_REG(uint32_t i0) { return 0x0000000c + 0x1*i0; }
+
+static inline uint32_t REG_A4XX_RBBM_CLOCK_DELAY_TP(uint32_t i0) { return 0x00000010 + 0x1*i0; }
+
+static inline uint32_t REG_A4XX_RBBM_CLOCK_DELAY_TP_REG(uint32_t i0) { return 0x00000010 + 0x1*i0; }
+
+#define REG_A4XX_RBBM_CLOCK_CTL_UCHE				0x00000014
+
+#define REG_A4XX_RBBM_CLOCK_CTL2_UCHE				0x00000015
+
+#define REG_A4XX_RBBM_CLOCK_CTL3_UCHE				0x00000016
+
+#define REG_A4XX_RBBM_CLOCK_CTL4_UCHE				0x00000017
+
+#define REG_A4XX_RBBM_CLOCK_HYST_UCHE				0x00000018
+
+#define REG_A4XX_RBBM_CLOCK_DELAY_UCHE				0x00000019
+
+#define REG_A4XX_RBBM_CLOCK_MODE_GPC				0x0000001a
+
+#define REG_A4XX_RBBM_CLOCK_DELAY_GPC				0x0000001b
+
+#define REG_A4XX_RBBM_CLOCK_HYST_GPC				0x0000001c
+
+#define REG_A4XX_RBBM_CLOCK_CTL_TSE_RAS_RBBM			0x0000001d
+
+#define REG_A4XX_RBBM_CLOCK_HYST_TSE_RAS_RBBM			0x0000001e
+
+#define REG_A4XX_RBBM_CLOCK_DELAY_TSE_RAS_RBBM			0x0000001f
+
+#define REG_A4XX_RBBM_CLOCK_CTL					0x00000020
+
+#define REG_A4XX_RBBM_SP_HYST_CNT				0x00000021
+
+#define REG_A4XX_RBBM_SW_RESET_CMD				0x00000022
+
+#define REG_A4XX_RBBM_AHB_CTL0					0x00000023
+
+#define REG_A4XX_RBBM_AHB_CTL1					0x00000024
+
+#define REG_A4XX_RBBM_AHB_CMD					0x00000025
+
+#define REG_A4XX_RBBM_RB_SUB_BLOCK_SEL_CTL			0x00000026
+
+#define REG_A4XX_RBBM_RAM_ACC_63_32				0x00000028
+
+#define REG_A4XX_RBBM_WAIT_IDLE_CLOCKS_CTL			0x0000002b
+
+#define REG_A4XX_RBBM_INTERFACE_HANG_INT_CTL			0x0000002f
+
+#define REG_A4XX_RBBM_INTERFACE_HANG_MASK_CTL4			0x00000034
+
+#define REG_A4XX_RBBM_INT_CLEAR_CMD				0x00000036
+
+#define REG_A4XX_RBBM_INT_0_MASK				0x00000037
+
+#define REG_A4XX_RBBM_RBBM_CTL					0x0000003e
+
+#define REG_A4XX_RBBM_AHB_DEBUG_CTL				0x0000003f
+
+#define REG_A4XX_RBBM_VBIF_DEBUG_CTL				0x00000041
+
+#define REG_A4XX_RBBM_CLOCK_CTL2				0x00000042
+
+#define REG_A4XX_RBBM_BLOCK_SW_RESET_CMD			0x00000045
+
+#define REG_A4XX_RBBM_RESET_CYCLES				0x00000047
+
+#define REG_A4XX_RBBM_EXT_TRACE_BUS_CTL				0x00000049
+
+#define REG_A4XX_RBBM_CFG_DEBBUS_SEL_A				0x0000004a
+
+#define REG_A4XX_RBBM_CFG_DEBBUS_SEL_B				0x0000004b
+
+#define REG_A4XX_RBBM_CFG_DEBBUS_SEL_C				0x0000004c
+
+#define REG_A4XX_RBBM_CFG_DEBBUS_SEL_D				0x0000004d
+
+#define REG_A4XX_RBBM_PERFCTR_CP_0_LO				0x0000009c
+
+static inline uint32_t REG_A4XX_RBBM_CLOCK_CTL_SP(uint32_t i0) { return 0x00000068 + 0x1*i0; }
+
+static inline uint32_t REG_A4XX_RBBM_CLOCK_CTL_SP_REG(uint32_t i0) { return 0x00000068 + 0x1*i0; }
+
+static inline uint32_t REG_A4XX_RBBM_CLOCK_CTL2_SP(uint32_t i0) { return 0x0000006c + 0x1*i0; }
+
+static inline uint32_t REG_A4XX_RBBM_CLOCK_CTL2_SP_REG(uint32_t i0) { return 0x0000006c + 0x1*i0; }
+
+static inline uint32_t REG_A4XX_RBBM_CLOCK_HYST_SP(uint32_t i0) { return 0x00000070 + 0x1*i0; }
+
+static inline uint32_t REG_A4XX_RBBM_CLOCK_HYST_SP_REG(uint32_t i0) { return 0x00000070 + 0x1*i0; }
+
+static inline uint32_t REG_A4XX_RBBM_CLOCK_DELAY_SP(uint32_t i0) { return 0x00000074 + 0x1*i0; }
+
+static inline uint32_t REG_A4XX_RBBM_CLOCK_DELAY_SP_REG(uint32_t i0) { return 0x00000074 + 0x1*i0; }
+
+static inline uint32_t REG_A4XX_RBBM_CLOCK_CTL_RB(uint32_t i0) { return 0x00000078 + 0x1*i0; }
+
+static inline uint32_t REG_A4XX_RBBM_CLOCK_CTL_RB_REG(uint32_t i0) { return 0x00000078 + 0x1*i0; }
+
+static inline uint32_t REG_A4XX_RBBM_CLOCK_CTL2_RB(uint32_t i0) { return 0x0000007c + 0x1*i0; }
+
+static inline uint32_t REG_A4XX_RBBM_CLOCK_CTL2_RB_REG(uint32_t i0) { return 0x0000007c + 0x1*i0; }
+
+static inline uint32_t REG_A4XX_RBBM_CLOCK_CTL_MARB_CCU(uint32_t i0) { return 0x00000082 + 0x1*i0; }
+
+static inline uint32_t REG_A4XX_RBBM_CLOCK_CTL_MARB_CCU_REG(uint32_t i0) { return 0x00000082 + 0x1*i0; }
+
+static inline uint32_t REG_A4XX_RBBM_CLOCK_HYST_RB_MARB_CCU(uint32_t i0) { return 0x00000086 + 0x1*i0; }
+
+static inline uint32_t REG_A4XX_RBBM_CLOCK_HYST_RB_MARB_CCU_REG(uint32_t i0) { return 0x00000086 + 0x1*i0; }
+
+#define REG_A4XX_RBBM_CLOCK_HYST_COM_DCOM			0x00000080
+
+#define REG_A4XX_RBBM_CLOCK_CTL_COM_DCOM			0x00000081
+
+#define REG_A4XX_RBBM_CLOCK_CTL_HLSQ				0x0000008a
+
+#define REG_A4XX_RBBM_CLOCK_HYST_HLSQ				0x0000008b
+
+#define REG_A4XX_RBBM_CLOCK_DELAY_HLSQ				0x0000008c
+
+#define REG_A4XX_RBBM_CLOCK_DELAY_COM_DCOM			0x0000008d
+
+static inline uint32_t REG_A4XX_RBBM_CLOCK_DELAY_RB_MARB_CCU_L1(uint32_t i0) { return 0x0000008e + 0x1*i0; }
+
+static inline uint32_t REG_A4XX_RBBM_CLOCK_DELAY_RB_MARB_CCU_L1_REG(uint32_t i0) { return 0x0000008e + 0x1*i0; }
+
+#define REG_A4XX_RBBM_PERFCTR_PWR_1_LO				0x00000168
+
+#define REG_A4XX_RBBM_PERFCTR_CTL				0x00000170
+
+#define REG_A4XX_RBBM_PERFCTR_LOAD_CMD0				0x00000171
+
+#define REG_A4XX_RBBM_PERFCTR_LOAD_CMD1				0x00000172
+
+#define REG_A4XX_RBBM_PERFCTR_LOAD_CMD2				0x00000173
+
+#define REG_A4XX_RBBM_PERFCTR_LOAD_VALUE_LO			0x00000174
+
+#define REG_A4XX_RBBM_PERFCTR_LOAD_VALUE_HI			0x00000175
+
+#define REG_A4XX_RBBM_GPU_BUSY_MASKED				0x0000017a
+
+#define REG_A4XX_RBBM_INT_0_STATUS				0x0000017d
+
+#define REG_A4XX_RBBM_CLOCK_STATUS				0x00000182
+
+#define REG_A4XX_RBBM_AHB_STATUS				0x00000189
+
+#define REG_A4XX_RBBM_AHB_ME_SPLIT_STATUS			0x0000018c
+
+#define REG_A4XX_RBBM_AHB_PFP_SPLIT_STATUS			0x0000018d
+
+#define REG_A4XX_RBBM_AHB_ERROR_STATUS				0x0000018f
+
+#define REG_A4XX_RBBM_STATUS					0x00000191
+#define A4XX_RBBM_STATUS_HI_BUSY				0x00000001
+#define A4XX_RBBM_STATUS_CP_ME_BUSY				0x00000002
+#define A4XX_RBBM_STATUS_CP_PFP_BUSY				0x00000004
+#define A4XX_RBBM_STATUS_CP_NRT_BUSY				0x00004000
+#define A4XX_RBBM_STATUS_VBIF_BUSY				0x00008000
+#define A4XX_RBBM_STATUS_TSE_BUSY				0x00010000
+#define A4XX_RBBM_STATUS_RAS_BUSY				0x00020000
+#define A4XX_RBBM_STATUS_RB_BUSY				0x00040000
+#define A4XX_RBBM_STATUS_PC_DCALL_BUSY				0x00080000
+#define A4XX_RBBM_STATUS_PC_VSD_BUSY				0x00100000
+#define A4XX_RBBM_STATUS_VFD_BUSY				0x00200000
+#define A4XX_RBBM_STATUS_VPC_BUSY				0x00400000
+#define A4XX_RBBM_STATUS_UCHE_BUSY				0x00800000
+#define A4XX_RBBM_STATUS_SP_BUSY				0x01000000
+#define A4XX_RBBM_STATUS_TPL1_BUSY				0x02000000
+#define A4XX_RBBM_STATUS_MARB_BUSY				0x04000000
+#define A4XX_RBBM_STATUS_VSC_BUSY				0x08000000
+#define A4XX_RBBM_STATUS_ARB_BUSY				0x10000000
+#define A4XX_RBBM_STATUS_HLSQ_BUSY				0x20000000
+#define A4XX_RBBM_STATUS_GPU_BUSY_NOHC				0x40000000
+#define A4XX_RBBM_STATUS_GPU_BUSY				0x80000000
+
+#define REG_A4XX_RBBM_INTERFACE_RRDY_STATUS5			0x0000019f
+
+#define REG_A4XX_CP_SCRATCH_UMASK				0x00000228
+
+#define REG_A4XX_CP_SCRATCH_ADDR				0x00000229
+
+#define REG_A4XX_CP_RB_BASE					0x00000200
+
+#define REG_A4XX_CP_RB_CNTL					0x00000201
+
+#define REG_A4XX_CP_RB_WPTR					0x00000205
+
+#define REG_A4XX_CP_RB_RPTR_ADDR				0x00000203
+
+#define REG_A4XX_CP_RB_RPTR					0x00000204
+
+#define REG_A4XX_CP_IB1_BASE					0x00000206
+
+#define REG_A4XX_CP_IB1_BUFSZ					0x00000207
+
+#define REG_A4XX_CP_IB2_BASE					0x00000208
+
+#define REG_A4XX_CP_IB2_BUFSZ					0x00000209
+
+#define REG_A4XX_CP_ME_RB_DONE_DATA				0x00000217
+
+#define REG_A4XX_CP_QUEUE_THRESH2				0x00000219
+
+#define REG_A4XX_CP_MERCIU_SIZE					0x0000021b
+
+#define REG_A4XX_CP_ROQ_ADDR					0x0000021c
+
+#define REG_A4XX_CP_ROQ_DATA					0x0000021d
+
+#define REG_A4XX_CP_MEQ_ADDR					0x0000021e
+
+#define REG_A4XX_CP_MEQ_DATA					0x0000021f
+
+#define REG_A4XX_CP_MERCIU_ADDR					0x00000220
+
+#define REG_A4XX_CP_MERCIU_DATA					0x00000221
+
+#define REG_A4XX_CP_MERCIU_DATA2				0x00000222
+
+#define REG_A4XX_CP_PFP_UCODE_ADDR				0x00000223
+
+#define REG_A4XX_CP_PFP_UCODE_DATA				0x00000224
+
+#define REG_A4XX_CP_ME_RAM_WADDR				0x00000225
+
+#define REG_A4XX_CP_ME_RAM_RADDR				0x00000226
+
+#define REG_A4XX_CP_ME_RAM_DATA					0x00000227
+
+#define REG_A4XX_CP_PREEMPT					0x0000022a
+
+#define REG_A4XX_CP_CNTL					0x0000022c
+
+#define REG_A4XX_CP_ME_CNTL					0x0000022d
+
+#define REG_A4XX_CP_DEBUG					0x0000022e
+
+#define REG_A4XX_CP_DEBUG_ECO_CONTROL				0x00000231
+
+#define REG_A4XX_CP_DRAW_STATE_ADDR				0x00000232
+
+#define REG_A4XX_CP_PROTECT_REG_0				0x00000240
+
+static inline uint32_t REG_A4XX_CP_PROTECT(uint32_t i0) { return 0x00000240 + 0x1*i0; }
+
+static inline uint32_t REG_A4XX_CP_PROTECT_REG(uint32_t i0) { return 0x00000240 + 0x1*i0; }
+
+#define REG_A4XX_CP_PROTECT_CTRL				0x00000250
+
+#define REG_A4XX_CP_ST_BASE					0x000004c0
+
+#define REG_A4XX_CP_STQ_AVAIL					0x000004ce
+
+#define REG_A4XX_CP_MERCIU_STAT					0x000004d0
+
+#define REG_A4XX_CP_WFI_PEND_CTR				0x000004d2
+
+#define REG_A4XX_CP_HW_FAULT					0x000004d8
+
+#define REG_A4XX_CP_PROTECT_STATUS				0x000004da
+
+#define REG_A4XX_CP_EVENTS_IN_FLIGHT				0x000004dd
+
+#define REG_A4XX_CP_PERFCTR_CP_SEL_0				0x00000500
+
+#define REG_A4XX_CP_PERFCOMBINER_SELECT				0x0000050b
+
+static inline uint32_t REG_A4XX_CP_SCRATCH(uint32_t i0) { return 0x00000578 + 0x1*i0; }
+
+static inline uint32_t REG_A4XX_CP_SCRATCH_REG(uint32_t i0) { return 0x00000578 + 0x1*i0; }
+
+#define REG_A4XX_SP_VS_STATUS					0x00000ec0
+
+#define REG_A4XX_SP_PERFCTR_SP_SEL_11				0x00000ecf
+
+#define REG_A4XX_SP_SP_CTRL_REG					0x000022c0
+#define A4XX_SP_SP_CTRL_REG_BINNING_PASS			0x00080000
+
+#define REG_A4XX_SP_INSTR_CACHE_CTRL				0x000022c1
+
+#define REG_A4XX_SP_VS_CTRL_REG0				0x000022c4
+#define A4XX_SP_VS_CTRL_REG0_THREADMODE__MASK			0x00000001
+#define A4XX_SP_VS_CTRL_REG0_THREADMODE__SHIFT			0
+static inline uint32_t A4XX_SP_VS_CTRL_REG0_THREADMODE(enum a3xx_threadmode val)
+{
+	return ((val) << A4XX_SP_VS_CTRL_REG0_THREADMODE__SHIFT) & A4XX_SP_VS_CTRL_REG0_THREADMODE__MASK;
+}
+#define A4XX_SP_VS_CTRL_REG0_VARYING				0x00000002
+#define A4XX_SP_VS_CTRL_REG0_CACHEINVALID			0x00000004
+#define A4XX_SP_VS_CTRL_REG0_HALFREGFOOTPRINT__MASK		0x000003f0
+#define A4XX_SP_VS_CTRL_REG0_HALFREGFOOTPRINT__SHIFT		4
+static inline uint32_t A4XX_SP_VS_CTRL_REG0_HALFREGFOOTPRINT(uint32_t val)
+{
+	return ((val) << A4XX_SP_VS_CTRL_REG0_HALFREGFOOTPRINT__SHIFT) & A4XX_SP_VS_CTRL_REG0_HALFREGFOOTPRINT__MASK;
+}
+#define A4XX_SP_VS_CTRL_REG0_FULLREGFOOTPRINT__MASK		0x0003fc00
+#define A4XX_SP_VS_CTRL_REG0_FULLREGFOOTPRINT__SHIFT		10
+static inline uint32_t A4XX_SP_VS_CTRL_REG0_FULLREGFOOTPRINT(uint32_t val)
+{
+	return ((val) << A4XX_SP_VS_CTRL_REG0_FULLREGFOOTPRINT__SHIFT) & A4XX_SP_VS_CTRL_REG0_FULLREGFOOTPRINT__MASK;
+}
+#define A4XX_SP_VS_CTRL_REG0_INOUTREGOVERLAP__MASK		0x000c0000
+#define A4XX_SP_VS_CTRL_REG0_INOUTREGOVERLAP__SHIFT		18
+static inline uint32_t A4XX_SP_VS_CTRL_REG0_INOUTREGOVERLAP(uint32_t val)
+{
+	return ((val) << A4XX_SP_VS_CTRL_REG0_INOUTREGOVERLAP__SHIFT) & A4XX_SP_VS_CTRL_REG0_INOUTREGOVERLAP__MASK;
+}
+#define A4XX_SP_VS_CTRL_REG0_THREADSIZE__MASK			0x00100000
+#define A4XX_SP_VS_CTRL_REG0_THREADSIZE__SHIFT			20
+static inline uint32_t A4XX_SP_VS_CTRL_REG0_THREADSIZE(enum a3xx_threadsize val)
+{
+	return ((val) << A4XX_SP_VS_CTRL_REG0_THREADSIZE__SHIFT) & A4XX_SP_VS_CTRL_REG0_THREADSIZE__MASK;
+}
+#define A4XX_SP_VS_CTRL_REG0_SUPERTHREADMODE			0x00200000
+#define A4XX_SP_VS_CTRL_REG0_PIXLODENABLE			0x00400000
+
+#define REG_A4XX_SP_VS_CTRL_REG1				0x000022c5
+#define A4XX_SP_VS_CTRL_REG1_CONSTLENGTH__MASK			0x000000ff
+#define A4XX_SP_VS_CTRL_REG1_CONSTLENGTH__SHIFT			0
+static inline uint32_t A4XX_SP_VS_CTRL_REG1_CONSTLENGTH(uint32_t val)
+{
+	return ((val) << A4XX_SP_VS_CTRL_REG1_CONSTLENGTH__SHIFT) & A4XX_SP_VS_CTRL_REG1_CONSTLENGTH__MASK;
+}
+#define A4XX_SP_VS_CTRL_REG1_INITIALOUTSTANDING__MASK		0x7f000000
+#define A4XX_SP_VS_CTRL_REG1_INITIALOUTSTANDING__SHIFT		24
+static inline uint32_t A4XX_SP_VS_CTRL_REG1_INITIALOUTSTANDING(uint32_t val)
+{
+	return ((val) << A4XX_SP_VS_CTRL_REG1_INITIALOUTSTANDING__SHIFT) & A4XX_SP_VS_CTRL_REG1_INITIALOUTSTANDING__MASK;
+}
+
+#define REG_A4XX_SP_VS_PARAM_REG				0x000022c6
+#define A4XX_SP_VS_PARAM_REG_POSREGID__MASK			0x000000ff
+#define A4XX_SP_VS_PARAM_REG_POSREGID__SHIFT			0
+static inline uint32_t A4XX_SP_VS_PARAM_REG_POSREGID(uint32_t val)
+{
+	return ((val) << A4XX_SP_VS_PARAM_REG_POSREGID__SHIFT) & A4XX_SP_VS_PARAM_REG_POSREGID__MASK;
+}
+#define A4XX_SP_VS_PARAM_REG_PSIZEREGID__MASK			0x0000ff00
+#define A4XX_SP_VS_PARAM_REG_PSIZEREGID__SHIFT			8
+static inline uint32_t A4XX_SP_VS_PARAM_REG_PSIZEREGID(uint32_t val)
+{
+	return ((val) << A4XX_SP_VS_PARAM_REG_PSIZEREGID__SHIFT) & A4XX_SP_VS_PARAM_REG_PSIZEREGID__MASK;
+}
+#define A4XX_SP_VS_PARAM_REG_TOTALVSOUTVAR__MASK		0xfff00000
+#define A4XX_SP_VS_PARAM_REG_TOTALVSOUTVAR__SHIFT		20
+static inline uint32_t A4XX_SP_VS_PARAM_REG_TOTALVSOUTVAR(uint32_t val)
+{
+	return ((val) << A4XX_SP_VS_PARAM_REG_TOTALVSOUTVAR__SHIFT) & A4XX_SP_VS_PARAM_REG_TOTALVSOUTVAR__MASK;
+}
+
+static inline uint32_t REG_A4XX_SP_VS_OUT(uint32_t i0) { return 0x000022c7 + 0x1*i0; }
+
+static inline uint32_t REG_A4XX_SP_VS_OUT_REG(uint32_t i0) { return 0x000022c7 + 0x1*i0; }
+#define A4XX_SP_VS_OUT_REG_A_REGID__MASK			0x000001ff
+#define A4XX_SP_VS_OUT_REG_A_REGID__SHIFT			0
+static inline uint32_t A4XX_SP_VS_OUT_REG_A_REGID(uint32_t val)
+{
+	return ((val) << A4XX_SP_VS_OUT_REG_A_REGID__SHIFT) & A4XX_SP_VS_OUT_REG_A_REGID__MASK;
+}
+#define A4XX_SP_VS_OUT_REG_A_COMPMASK__MASK			0x00001e00
+#define A4XX_SP_VS_OUT_REG_A_COMPMASK__SHIFT			9
+static inline uint32_t A4XX_SP_VS_OUT_REG_A_COMPMASK(uint32_t val)
+{
+	return ((val) << A4XX_SP_VS_OUT_REG_A_COMPMASK__SHIFT) & A4XX_SP_VS_OUT_REG_A_COMPMASK__MASK;
+}
+#define A4XX_SP_VS_OUT_REG_B_REGID__MASK			0x01ff0000
+#define A4XX_SP_VS_OUT_REG_B_REGID__SHIFT			16
+static inline uint32_t A4XX_SP_VS_OUT_REG_B_REGID(uint32_t val)
+{
+	return ((val) << A4XX_SP_VS_OUT_REG_B_REGID__SHIFT) & A4XX_SP_VS_OUT_REG_B_REGID__MASK;
+}
+#define A4XX_SP_VS_OUT_REG_B_COMPMASK__MASK			0x1e000000
+#define A4XX_SP_VS_OUT_REG_B_COMPMASK__SHIFT			25
+static inline uint32_t A4XX_SP_VS_OUT_REG_B_COMPMASK(uint32_t val)
+{
+	return ((val) << A4XX_SP_VS_OUT_REG_B_COMPMASK__SHIFT) & A4XX_SP_VS_OUT_REG_B_COMPMASK__MASK;
+}
+
+static inline uint32_t REG_A4XX_SP_VS_VPC_DST(uint32_t i0) { return 0x000022d8 + 0x1*i0; }
+
+static inline uint32_t REG_A4XX_SP_VS_VPC_DST_REG(uint32_t i0) { return 0x000022d8 + 0x1*i0; }
+#define A4XX_SP_VS_VPC_DST_REG_OUTLOC0__MASK			0x000000ff
+#define A4XX_SP_VS_VPC_DST_REG_OUTLOC0__SHIFT			0
+static inline uint32_t A4XX_SP_VS_VPC_DST_REG_OUTLOC0(uint32_t val)
+{
+	return ((val) << A4XX_SP_VS_VPC_DST_REG_OUTLOC0__SHIFT) & A4XX_SP_VS_VPC_DST_REG_OUTLOC0__MASK;
+}
+#define A4XX_SP_VS_VPC_DST_REG_OUTLOC1__MASK			0x0000ff00
+#define A4XX_SP_VS_VPC_DST_REG_OUTLOC1__SHIFT			8
+static inline uint32_t A4XX_SP_VS_VPC_DST_REG_OUTLOC1(uint32_t val)
+{
+	return ((val) << A4XX_SP_VS_VPC_DST_REG_OUTLOC1__SHIFT) & A4XX_SP_VS_VPC_DST_REG_OUTLOC1__MASK;
+}
+#define A4XX_SP_VS_VPC_DST_REG_OUTLOC2__MASK			0x00ff0000
+#define A4XX_SP_VS_VPC_DST_REG_OUTLOC2__SHIFT			16
+static inline uint32_t A4XX_SP_VS_VPC_DST_REG_OUTLOC2(uint32_t val)
+{
+	return ((val) << A4XX_SP_VS_VPC_DST_REG_OUTLOC2__SHIFT) & A4XX_SP_VS_VPC_DST_REG_OUTLOC2__MASK;
+}
+#define A4XX_SP_VS_VPC_DST_REG_OUTLOC3__MASK			0xff000000
+#define A4XX_SP_VS_VPC_DST_REG_OUTLOC3__SHIFT			24
+static inline uint32_t A4XX_SP_VS_VPC_DST_REG_OUTLOC3(uint32_t val)
+{
+	return ((val) << A4XX_SP_VS_VPC_DST_REG_OUTLOC3__SHIFT) & A4XX_SP_VS_VPC_DST_REG_OUTLOC3__MASK;
+}
+
+#define REG_A4XX_SP_VS_OBJ_OFFSET_REG				0x000022e0
+#define A4XX_SP_VS_OBJ_OFFSET_REG_CONSTOBJECTOFFSET__MASK	0x01ff0000
+#define A4XX_SP_VS_OBJ_OFFSET_REG_CONSTOBJECTOFFSET__SHIFT	16
+static inline uint32_t A4XX_SP_VS_OBJ_OFFSET_REG_CONSTOBJECTOFFSET(uint32_t val)
+{
+	return ((val) << A4XX_SP_VS_OBJ_OFFSET_REG_CONSTOBJECTOFFSET__SHIFT) & A4XX_SP_VS_OBJ_OFFSET_REG_CONSTOBJECTOFFSET__MASK;
+}
+#define A4XX_SP_VS_OBJ_OFFSET_REG_SHADEROBJOFFSET__MASK		0xfe000000
+#define A4XX_SP_VS_OBJ_OFFSET_REG_SHADEROBJOFFSET__SHIFT	25
+static inline uint32_t A4XX_SP_VS_OBJ_OFFSET_REG_SHADEROBJOFFSET(uint32_t val)
+{
+	return ((val) << A4XX_SP_VS_OBJ_OFFSET_REG_SHADEROBJOFFSET__SHIFT) & A4XX_SP_VS_OBJ_OFFSET_REG_SHADEROBJOFFSET__MASK;
+}
+
+#define REG_A4XX_SP_VS_OBJ_START				0x000022e1
+
+#define REG_A4XX_SP_VS_PVT_MEM_PARAM				0x000022e2
+
+#define REG_A4XX_SP_VS_PVT_MEM_ADDR				0x000022e3
+
+#define REG_A4XX_SP_VS_LENGTH_REG				0x000022e5
+
+#define REG_A4XX_SP_FS_CTRL_REG0				0x000022e8
+#define A4XX_SP_FS_CTRL_REG0_THREADMODE__MASK			0x00000001
+#define A4XX_SP_FS_CTRL_REG0_THREADMODE__SHIFT			0
+static inline uint32_t A4XX_SP_FS_CTRL_REG0_THREADMODE(enum a3xx_threadmode val)
+{
+	return ((val) << A4XX_SP_FS_CTRL_REG0_THREADMODE__SHIFT) & A4XX_SP_FS_CTRL_REG0_THREADMODE__MASK;
+}
+#define A4XX_SP_FS_CTRL_REG0_VARYING				0x00000002
+#define A4XX_SP_FS_CTRL_REG0_CACHEINVALID			0x00000004
+#define A4XX_SP_FS_CTRL_REG0_HALFREGFOOTPRINT__MASK		0x000003f0
+#define A4XX_SP_FS_CTRL_REG0_HALFREGFOOTPRINT__SHIFT		4
+static inline uint32_t A4XX_SP_FS_CTRL_REG0_HALFREGFOOTPRINT(uint32_t val)
+{
+	return ((val) << A4XX_SP_FS_CTRL_REG0_HALFREGFOOTPRINT__SHIFT) & A4XX_SP_FS_CTRL_REG0_HALFREGFOOTPRINT__MASK;
+}
+#define A4XX_SP_FS_CTRL_REG0_FULLREGFOOTPRINT__MASK		0x0003fc00
+#define A4XX_SP_FS_CTRL_REG0_FULLREGFOOTPRINT__SHIFT		10
+static inline uint32_t A4XX_SP_FS_CTRL_REG0_FULLREGFOOTPRINT(uint32_t val)
+{
+	return ((val) << A4XX_SP_FS_CTRL_REG0_FULLREGFOOTPRINT__SHIFT) & A4XX_SP_FS_CTRL_REG0_FULLREGFOOTPRINT__MASK;
+}
+#define A4XX_SP_FS_CTRL_REG0_INOUTREGOVERLAP__MASK		0x000c0000
+#define A4XX_SP_FS_CTRL_REG0_INOUTREGOVERLAP__SHIFT		18
+static inline uint32_t A4XX_SP_FS_CTRL_REG0_INOUTREGOVERLAP(uint32_t val)
+{
+	return ((val) << A4XX_SP_FS_CTRL_REG0_INOUTREGOVERLAP__SHIFT) & A4XX_SP_FS_CTRL_REG0_INOUTREGOVERLAP__MASK;
+}
+#define A4XX_SP_FS_CTRL_REG0_THREADSIZE__MASK			0x00100000
+#define A4XX_SP_FS_CTRL_REG0_THREADSIZE__SHIFT			20
+static inline uint32_t A4XX_SP_FS_CTRL_REG0_THREADSIZE(enum a3xx_threadsize val)
+{
+	return ((val) << A4XX_SP_FS_CTRL_REG0_THREADSIZE__SHIFT) & A4XX_SP_FS_CTRL_REG0_THREADSIZE__MASK;
+}
+#define A4XX_SP_FS_CTRL_REG0_SUPERTHREADMODE			0x00200000
+#define A4XX_SP_FS_CTRL_REG0_PIXLODENABLE			0x00400000
+
+#define REG_A4XX_SP_FS_CTRL_REG1				0x000022e9
+#define A4XX_SP_FS_CTRL_REG1_CONSTLENGTH__MASK			0x000000ff
+#define A4XX_SP_FS_CTRL_REG1_CONSTLENGTH__SHIFT			0
+static inline uint32_t A4XX_SP_FS_CTRL_REG1_CONSTLENGTH(uint32_t val)
+{
+	return ((val) << A4XX_SP_FS_CTRL_REG1_CONSTLENGTH__SHIFT) & A4XX_SP_FS_CTRL_REG1_CONSTLENGTH__MASK;
+}
+#define A4XX_SP_FS_CTRL_REG1_VARYING				0x00100000
+
+#define REG_A4XX_SP_FS_OBJ_OFFSET_REG				0x000022ea
+#define A4XX_SP_FS_OBJ_OFFSET_REG_CONSTOBJECTOFFSET__MASK	0x01ff0000
+#define A4XX_SP_FS_OBJ_OFFSET_REG_CONSTOBJECTOFFSET__SHIFT	16
+static inline uint32_t A4XX_SP_FS_OBJ_OFFSET_REG_CONSTOBJECTOFFSET(uint32_t val)
+{
+	return ((val) << A4XX_SP_FS_OBJ_OFFSET_REG_CONSTOBJECTOFFSET__SHIFT) & A4XX_SP_FS_OBJ_OFFSET_REG_CONSTOBJECTOFFSET__MASK;
+}
+#define A4XX_SP_FS_OBJ_OFFSET_REG_SHADEROBJOFFSET__MASK		0xfe000000
+#define A4XX_SP_FS_OBJ_OFFSET_REG_SHADEROBJOFFSET__SHIFT	25
+static inline uint32_t A4XX_SP_FS_OBJ_OFFSET_REG_SHADEROBJOFFSET(uint32_t val)
+{
+	return ((val) << A4XX_SP_FS_OBJ_OFFSET_REG_SHADEROBJOFFSET__SHIFT) & A4XX_SP_FS_OBJ_OFFSET_REG_SHADEROBJOFFSET__MASK;
+}
+
+#define REG_A4XX_SP_FS_OBJ_START				0x000022eb
+
+#define REG_A4XX_SP_FS_PVT_MEM_PARAM				0x000022ec
+
+#define REG_A4XX_SP_FS_PVT_MEM_ADDR				0x000022ed
+
+#define REG_A4XX_SP_FS_LENGTH_REG				0x000022ef
+
+#define REG_A4XX_SP_FS_OUTPUT_REG				0x000022f0
+#define A4XX_SP_FS_OUTPUT_REG_DEPTH_ENABLE			0x00000080
+#define A4XX_SP_FS_OUTPUT_REG_DEPTH_REGID__MASK			0x0000ff00
+#define A4XX_SP_FS_OUTPUT_REG_DEPTH_REGID__SHIFT		8
+static inline uint32_t A4XX_SP_FS_OUTPUT_REG_DEPTH_REGID(uint32_t val)
+{
+	return ((val) << A4XX_SP_FS_OUTPUT_REG_DEPTH_REGID__SHIFT) & A4XX_SP_FS_OUTPUT_REG_DEPTH_REGID__MASK;
+}
+
+static inline uint32_t REG_A4XX_SP_FS_MRT(uint32_t i0) { return 0x000022f1 + 0x1*i0; }
+
+static inline uint32_t REG_A4XX_SP_FS_MRT_REG(uint32_t i0) { return 0x000022f1 + 0x1*i0; }
+#define A4XX_SP_FS_MRT_REG_REGID__MASK				0x000000ff
+#define A4XX_SP_FS_MRT_REG_REGID__SHIFT				0
+static inline uint32_t A4XX_SP_FS_MRT_REG_REGID(uint32_t val)
+{
+	return ((val) << A4XX_SP_FS_MRT_REG_REGID__SHIFT) & A4XX_SP_FS_MRT_REG_REGID__MASK;
+}
+#define A4XX_SP_FS_MRT_REG_HALF_PRECISION			0x00000100
+#define A4XX_SP_FS_MRT_REG_MRTFORMAT__MASK			0x0003f000
+#define A4XX_SP_FS_MRT_REG_MRTFORMAT__SHIFT			12
+static inline uint32_t A4XX_SP_FS_MRT_REG_MRTFORMAT(enum a4xx_color_fmt val)
+{
+	return ((val) << A4XX_SP_FS_MRT_REG_MRTFORMAT__SHIFT) & A4XX_SP_FS_MRT_REG_MRTFORMAT__MASK;
+}
+
+#define REG_A4XX_SP_HS_OBJ_OFFSET_REG				0x0000230d
+#define A4XX_SP_HS_OBJ_OFFSET_REG_CONSTOBJECTOFFSET__MASK	0x01ff0000
+#define A4XX_SP_HS_OBJ_OFFSET_REG_CONSTOBJECTOFFSET__SHIFT	16
+static inline uint32_t A4XX_SP_HS_OBJ_OFFSET_REG_CONSTOBJECTOFFSET(uint32_t val)
+{
+	return ((val) << A4XX_SP_HS_OBJ_OFFSET_REG_CONSTOBJECTOFFSET__SHIFT) & A4XX_SP_HS_OBJ_OFFSET_REG_CONSTOBJECTOFFSET__MASK;
+}
+#define A4XX_SP_HS_OBJ_OFFSET_REG_SHADEROBJOFFSET__MASK		0xfe000000
+#define A4XX_SP_HS_OBJ_OFFSET_REG_SHADEROBJOFFSET__SHIFT	25
+static inline uint32_t A4XX_SP_HS_OBJ_OFFSET_REG_SHADEROBJOFFSET(uint32_t val)
+{
+	return ((val) << A4XX_SP_HS_OBJ_OFFSET_REG_SHADEROBJOFFSET__SHIFT) & A4XX_SP_HS_OBJ_OFFSET_REG_SHADEROBJOFFSET__MASK;
+}
+
+#define REG_A4XX_SP_DS_OBJ_OFFSET_REG				0x00002334
+#define A4XX_SP_DS_OBJ_OFFSET_REG_CONSTOBJECTOFFSET__MASK	0x01ff0000
+#define A4XX_SP_DS_OBJ_OFFSET_REG_CONSTOBJECTOFFSET__SHIFT	16
+static inline uint32_t A4XX_SP_DS_OBJ_OFFSET_REG_CONSTOBJECTOFFSET(uint32_t val)
+{
+	return ((val) << A4XX_SP_DS_OBJ_OFFSET_REG_CONSTOBJECTOFFSET__SHIFT) & A4XX_SP_DS_OBJ_OFFSET_REG_CONSTOBJECTOFFSET__MASK;
+}
+#define A4XX_SP_DS_OBJ_OFFSET_REG_SHADEROBJOFFSET__MASK		0xfe000000
+#define A4XX_SP_DS_OBJ_OFFSET_REG_SHADEROBJOFFSET__SHIFT	25
+static inline uint32_t A4XX_SP_DS_OBJ_OFFSET_REG_SHADEROBJOFFSET(uint32_t val)
+{
+	return ((val) << A4XX_SP_DS_OBJ_OFFSET_REG_SHADEROBJOFFSET__SHIFT) & A4XX_SP_DS_OBJ_OFFSET_REG_SHADEROBJOFFSET__MASK;
+}
+
+#define REG_A4XX_SP_GS_OBJ_OFFSET_REG				0x0000235b
+#define A4XX_SP_GS_OBJ_OFFSET_REG_CONSTOBJECTOFFSET__MASK	0x01ff0000
+#define A4XX_SP_GS_OBJ_OFFSET_REG_CONSTOBJECTOFFSET__SHIFT	16
+static inline uint32_t A4XX_SP_GS_OBJ_OFFSET_REG_CONSTOBJECTOFFSET(uint32_t val)
+{
+	return ((val) << A4XX_SP_GS_OBJ_OFFSET_REG_CONSTOBJECTOFFSET__SHIFT) & A4XX_SP_GS_OBJ_OFFSET_REG_CONSTOBJECTOFFSET__MASK;
+}
+#define A4XX_SP_GS_OBJ_OFFSET_REG_SHADEROBJOFFSET__MASK		0xfe000000
+#define A4XX_SP_GS_OBJ_OFFSET_REG_SHADEROBJOFFSET__SHIFT	25
+static inline uint32_t A4XX_SP_GS_OBJ_OFFSET_REG_SHADEROBJOFFSET(uint32_t val)
+{
+	return ((val) << A4XX_SP_GS_OBJ_OFFSET_REG_SHADEROBJOFFSET__SHIFT) & A4XX_SP_GS_OBJ_OFFSET_REG_SHADEROBJOFFSET__MASK;
+}
+
+#define REG_A4XX_SP_GS_LENGTH_REG				0x00002360
+
+#define REG_A4XX_VPC_DEBUG_RAM_SEL				0x00000e60
+
+#define REG_A4XX_VPC_DEBUG_RAM_READ				0x00000e61
+
+#define REG_A4XX_VPC_DEBUG_ECO_CONTROL				0x00000e64
+
+#define REG_A4XX_VPC_PERFCTR_VPC_SEL_3				0x00000e68
+
+#define REG_A4XX_VPC_ATTR					0x00002140
+#define A4XX_VPC_ATTR_TOTALATTR__MASK				0x000001ff
+#define A4XX_VPC_ATTR_TOTALATTR__SHIFT				0
+static inline uint32_t A4XX_VPC_ATTR_TOTALATTR(uint32_t val)
+{
+	return ((val) << A4XX_VPC_ATTR_TOTALATTR__SHIFT) & A4XX_VPC_ATTR_TOTALATTR__MASK;
+}
+#define A4XX_VPC_ATTR_PSIZE					0x00000200
+#define A4XX_VPC_ATTR_THRDASSIGN__MASK				0x00003000
+#define A4XX_VPC_ATTR_THRDASSIGN__SHIFT				12
+static inline uint32_t A4XX_VPC_ATTR_THRDASSIGN(uint32_t val)
+{
+	return ((val) << A4XX_VPC_ATTR_THRDASSIGN__SHIFT) & A4XX_VPC_ATTR_THRDASSIGN__MASK;
+}
+#define A4XX_VPC_ATTR_ENABLE					0x02000000
+
+#define REG_A4XX_VPC_PACK					0x00002141
+#define A4XX_VPC_PACK_NUMBYPASSVAR__MASK			0x000000ff
+#define A4XX_VPC_PACK_NUMBYPASSVAR__SHIFT			0
+static inline uint32_t A4XX_VPC_PACK_NUMBYPASSVAR(uint32_t val)
+{
+	return ((val) << A4XX_VPC_PACK_NUMBYPASSVAR__SHIFT) & A4XX_VPC_PACK_NUMBYPASSVAR__MASK;
+}
+#define A4XX_VPC_PACK_NUMFPNONPOSVAR__MASK			0x0000ff00
+#define A4XX_VPC_PACK_NUMFPNONPOSVAR__SHIFT			8
+static inline uint32_t A4XX_VPC_PACK_NUMFPNONPOSVAR(uint32_t val)
+{
+	return ((val) << A4XX_VPC_PACK_NUMFPNONPOSVAR__SHIFT) & A4XX_VPC_PACK_NUMFPNONPOSVAR__MASK;
+}
+#define A4XX_VPC_PACK_NUMNONPOSVSVAR__MASK			0x00ff0000
+#define A4XX_VPC_PACK_NUMNONPOSVSVAR__SHIFT			16
+static inline uint32_t A4XX_VPC_PACK_NUMNONPOSVSVAR(uint32_t val)
+{
+	return ((val) << A4XX_VPC_PACK_NUMNONPOSVSVAR__SHIFT) & A4XX_VPC_PACK_NUMNONPOSVSVAR__MASK;
+}
+
+static inline uint32_t REG_A4XX_VPC_VARYING_INTERP(uint32_t i0) { return 0x00002142 + 0x1*i0; }
+
+static inline uint32_t REG_A4XX_VPC_VARYING_INTERP_MODE(uint32_t i0) { return 0x00002142 + 0x1*i0; }
+
+static inline uint32_t REG_A4XX_VPC_VARYING_PS_REPL(uint32_t i0) { return 0x0000214a + 0x1*i0; }
+
+static inline uint32_t REG_A4XX_VPC_VARYING_PS_REPL_MODE(uint32_t i0) { return 0x0000214a + 0x1*i0; }
+
+#define REG_A4XX_VPC_SO_FLUSH_WADDR_3				0x0000216e
+
+#define REG_A4XX_VSC_BIN_SIZE					0x00000c00
+#define A4XX_VSC_BIN_SIZE_WIDTH__MASK				0x0000001f
+#define A4XX_VSC_BIN_SIZE_WIDTH__SHIFT				0
+static inline uint32_t A4XX_VSC_BIN_SIZE_WIDTH(uint32_t val)
+{
+	return ((val >> 5) << A4XX_VSC_BIN_SIZE_WIDTH__SHIFT) & A4XX_VSC_BIN_SIZE_WIDTH__MASK;
+}
+#define A4XX_VSC_BIN_SIZE_HEIGHT__MASK				0x000003e0
+#define A4XX_VSC_BIN_SIZE_HEIGHT__SHIFT				5
+static inline uint32_t A4XX_VSC_BIN_SIZE_HEIGHT(uint32_t val)
+{
+	return ((val >> 5) << A4XX_VSC_BIN_SIZE_HEIGHT__SHIFT) & A4XX_VSC_BIN_SIZE_HEIGHT__MASK;
+}
+
+#define REG_A4XX_VSC_SIZE_ADDRESS				0x00000c01
+
+#define REG_A4XX_VSC_SIZE_ADDRESS2				0x00000c02
+
+#define REG_A4XX_VSC_DEBUG_ECO_CONTROL				0x00000c03
+
+static inline uint32_t REG_A4XX_VSC_PIPE_CONFIG(uint32_t i0) { return 0x00000c08 + 0x1*i0; }
+
+static inline uint32_t REG_A4XX_VSC_PIPE_CONFIG_REG(uint32_t i0) { return 0x00000c08 + 0x1*i0; }
+#define A4XX_VSC_PIPE_CONFIG_REG_X__MASK			0x000003ff
+#define A4XX_VSC_PIPE_CONFIG_REG_X__SHIFT			0
+static inline uint32_t A4XX_VSC_PIPE_CONFIG_REG_X(uint32_t val)
+{
+	return ((val) << A4XX_VSC_PIPE_CONFIG_REG_X__SHIFT) & A4XX_VSC_PIPE_CONFIG_REG_X__MASK;
+}
+#define A4XX_VSC_PIPE_CONFIG_REG_Y__MASK			0x000ffc00
+#define A4XX_VSC_PIPE_CONFIG_REG_Y__SHIFT			10
+static inline uint32_t A4XX_VSC_PIPE_CONFIG_REG_Y(uint32_t val)
+{
+	return ((val) << A4XX_VSC_PIPE_CONFIG_REG_Y__SHIFT) & A4XX_VSC_PIPE_CONFIG_REG_Y__MASK;
+}
+#define A4XX_VSC_PIPE_CONFIG_REG_W__MASK			0x00f00000
+#define A4XX_VSC_PIPE_CONFIG_REG_W__SHIFT			20
+static inline uint32_t A4XX_VSC_PIPE_CONFIG_REG_W(uint32_t val)
+{
+	return ((val) << A4XX_VSC_PIPE_CONFIG_REG_W__SHIFT) & A4XX_VSC_PIPE_CONFIG_REG_W__MASK;
+}
+#define A4XX_VSC_PIPE_CONFIG_REG_H__MASK			0x0f000000
+#define A4XX_VSC_PIPE_CONFIG_REG_H__SHIFT			24
+static inline uint32_t A4XX_VSC_PIPE_CONFIG_REG_H(uint32_t val)
+{
+	return ((val) << A4XX_VSC_PIPE_CONFIG_REG_H__SHIFT) & A4XX_VSC_PIPE_CONFIG_REG_H__MASK;
+}
+
+static inline uint32_t REG_A4XX_VSC_PIPE_DATA_ADDRESS(uint32_t i0) { return 0x00000c10 + 0x1*i0; }
+
+static inline uint32_t REG_A4XX_VSC_PIPE_DATA_ADDRESS_REG(uint32_t i0) { return 0x00000c10 + 0x1*i0; }
+
+static inline uint32_t REG_A4XX_VSC_PIPE_DATA_LENGTH(uint32_t i0) { return 0x00000c18 + 0x1*i0; }
+
+static inline uint32_t REG_A4XX_VSC_PIPE_DATA_LENGTH_REG(uint32_t i0) { return 0x00000c18 + 0x1*i0; }
+
+#define REG_A4XX_VSC_PIPE_PARTIAL_POSN_1			0x00000c41
+
+#define REG_A4XX_VSC_PERFCTR_VSC_SEL_0				0x00000c50
+
+#define REG_A4XX_VSC_PERFCTR_VSC_SEL_1				0x00000c51
+
+#define REG_A4XX_VFD_DEBUG_CONTROL				0x00000e40
+
+#define REG_A4XX_VFD_PERFCTR_VFD_SEL_7				0x00000e4a
+
+#define REG_A4XX_VFD_CONTROL_0					0x00002200
+#define A4XX_VFD_CONTROL_0_TOTALATTRTOVS__MASK			0x000000ff
+#define A4XX_VFD_CONTROL_0_TOTALATTRTOVS__SHIFT			0
+static inline uint32_t A4XX_VFD_CONTROL_0_TOTALATTRTOVS(uint32_t val)
+{
+	return ((val) << A4XX_VFD_CONTROL_0_TOTALATTRTOVS__SHIFT) & A4XX_VFD_CONTROL_0_TOTALATTRTOVS__MASK;
+}
+#define A4XX_VFD_CONTROL_0_BYPASSATTROVS__MASK			0x0001fe00
+#define A4XX_VFD_CONTROL_0_BYPASSATTROVS__SHIFT			9
+static inline uint32_t A4XX_VFD_CONTROL_0_BYPASSATTROVS(uint32_t val)
+{
+	return ((val) << A4XX_VFD_CONTROL_0_BYPASSATTROVS__SHIFT) & A4XX_VFD_CONTROL_0_BYPASSATTROVS__MASK;
+}
+#define A4XX_VFD_CONTROL_0_STRMDECINSTRCNT__MASK		0x03f00000
+#define A4XX_VFD_CONTROL_0_STRMDECINSTRCNT__SHIFT		20
+static inline uint32_t A4XX_VFD_CONTROL_0_STRMDECINSTRCNT(uint32_t val)
+{
+	return ((val) << A4XX_VFD_CONTROL_0_STRMDECINSTRCNT__SHIFT) & A4XX_VFD_CONTROL_0_STRMDECINSTRCNT__MASK;
+}
+#define A4XX_VFD_CONTROL_0_STRMFETCHINSTRCNT__MASK		0xfc000000
+#define A4XX_VFD_CONTROL_0_STRMFETCHINSTRCNT__SHIFT		26
+static inline uint32_t A4XX_VFD_CONTROL_0_STRMFETCHINSTRCNT(uint32_t val)
+{
+	return ((val) << A4XX_VFD_CONTROL_0_STRMFETCHINSTRCNT__SHIFT) & A4XX_VFD_CONTROL_0_STRMFETCHINSTRCNT__MASK;
+}
+
+#define REG_A4XX_VFD_CONTROL_1					0x00002201
+#define A4XX_VFD_CONTROL_1_MAXSTORAGE__MASK			0x0000ffff
+#define A4XX_VFD_CONTROL_1_MAXSTORAGE__SHIFT			0
+static inline uint32_t A4XX_VFD_CONTROL_1_MAXSTORAGE(uint32_t val)
+{
+	return ((val) << A4XX_VFD_CONTROL_1_MAXSTORAGE__SHIFT) & A4XX_VFD_CONTROL_1_MAXSTORAGE__MASK;
+}
+#define A4XX_VFD_CONTROL_1_REGID4VTX__MASK			0x00ff0000
+#define A4XX_VFD_CONTROL_1_REGID4VTX__SHIFT			16
+static inline uint32_t A4XX_VFD_CONTROL_1_REGID4VTX(uint32_t val)
+{
+	return ((val) << A4XX_VFD_CONTROL_1_REGID4VTX__SHIFT) & A4XX_VFD_CONTROL_1_REGID4VTX__MASK;
+}
+#define A4XX_VFD_CONTROL_1_REGID4INST__MASK			0xff000000
+#define A4XX_VFD_CONTROL_1_REGID4INST__SHIFT			24
+static inline uint32_t A4XX_VFD_CONTROL_1_REGID4INST(uint32_t val)
+{
+	return ((val) << A4XX_VFD_CONTROL_1_REGID4INST__SHIFT) & A4XX_VFD_CONTROL_1_REGID4INST__MASK;
+}
+
+#define REG_A4XX_VFD_CONTROL_2					0x00002202
+
+#define REG_A4XX_VFD_CONTROL_3					0x00002203
+
+#define REG_A4XX_VFD_CONTROL_4					0x00002204
+
+#define REG_A4XX_VFD_INDEX_OFFSET				0x00002208
+
+static inline uint32_t REG_A4XX_VFD_FETCH(uint32_t i0) { return 0x0000220a + 0x4*i0; }
+
+static inline uint32_t REG_A4XX_VFD_FETCH_INSTR_0(uint32_t i0) { return 0x0000220a + 0x4*i0; }
+#define A4XX_VFD_FETCH_INSTR_0_FETCHSIZE__MASK			0x0000007f
+#define A4XX_VFD_FETCH_INSTR_0_FETCHSIZE__SHIFT			0
+static inline uint32_t A4XX_VFD_FETCH_INSTR_0_FETCHSIZE(uint32_t val)
+{
+	return ((val) << A4XX_VFD_FETCH_INSTR_0_FETCHSIZE__SHIFT) & A4XX_VFD_FETCH_INSTR_0_FETCHSIZE__MASK;
+}
+#define A4XX_VFD_FETCH_INSTR_0_BUFSTRIDE__MASK			0x0001ff80
+#define A4XX_VFD_FETCH_INSTR_0_BUFSTRIDE__SHIFT			7
+static inline uint32_t A4XX_VFD_FETCH_INSTR_0_BUFSTRIDE(uint32_t val)
+{
+	return ((val) << A4XX_VFD_FETCH_INSTR_0_BUFSTRIDE__SHIFT) & A4XX_VFD_FETCH_INSTR_0_BUFSTRIDE__MASK;
+}
+#define A4XX_VFD_FETCH_INSTR_0_SWITCHNEXT			0x00080000
+#define A4XX_VFD_FETCH_INSTR_0_STEPRATE__MASK			0xff000000
+#define A4XX_VFD_FETCH_INSTR_0_STEPRATE__SHIFT			24
+static inline uint32_t A4XX_VFD_FETCH_INSTR_0_STEPRATE(uint32_t val)
+{
+	return ((val) << A4XX_VFD_FETCH_INSTR_0_STEPRATE__SHIFT) & A4XX_VFD_FETCH_INSTR_0_STEPRATE__MASK;
+}
+
+static inline uint32_t REG_A4XX_VFD_FETCH_INSTR_1(uint32_t i0) { return 0x0000220b + 0x4*i0; }
+
+static inline uint32_t REG_A4XX_VFD_FETCH_INSTR_2(uint32_t i0) { return 0x0000220c + 0x4*i0; }
+#define A4XX_VFD_FETCH_INSTR_2_SIZE__MASK			0xfffffff0
+#define A4XX_VFD_FETCH_INSTR_2_SIZE__SHIFT			4
+static inline uint32_t A4XX_VFD_FETCH_INSTR_2_SIZE(uint32_t val)
+{
+	return ((val >> 4) << A4XX_VFD_FETCH_INSTR_2_SIZE__SHIFT) & A4XX_VFD_FETCH_INSTR_2_SIZE__MASK;
+}
+
+static inline uint32_t REG_A4XX_VFD_FETCH_INSTR_3(uint32_t i0) { return 0x0000220d + 0x4*i0; }
+
+static inline uint32_t REG_A4XX_VFD_DECODE(uint32_t i0) { return 0x0000228a + 0x1*i0; }
+
+static inline uint32_t REG_A4XX_VFD_DECODE_INSTR(uint32_t i0) { return 0x0000228a + 0x1*i0; }
+#define A4XX_VFD_DECODE_INSTR_WRITEMASK__MASK			0x0000000f
+#define A4XX_VFD_DECODE_INSTR_WRITEMASK__SHIFT			0
+static inline uint32_t A4XX_VFD_DECODE_INSTR_WRITEMASK(uint32_t val)
+{
+	return ((val) << A4XX_VFD_DECODE_INSTR_WRITEMASK__SHIFT) & A4XX_VFD_DECODE_INSTR_WRITEMASK__MASK;
+}
+#define A4XX_VFD_DECODE_INSTR_CONSTFILL				0x00000010
+#define A4XX_VFD_DECODE_INSTR_FORMAT__MASK			0x00000fc0
+#define A4XX_VFD_DECODE_INSTR_FORMAT__SHIFT			6
+static inline uint32_t A4XX_VFD_DECODE_INSTR_FORMAT(enum a4xx_vtx_fmt val)
+{
+	return ((val) << A4XX_VFD_DECODE_INSTR_FORMAT__SHIFT) & A4XX_VFD_DECODE_INSTR_FORMAT__MASK;
+}
+#define A4XX_VFD_DECODE_INSTR_REGID__MASK			0x000ff000
+#define A4XX_VFD_DECODE_INSTR_REGID__SHIFT			12
+static inline uint32_t A4XX_VFD_DECODE_INSTR_REGID(uint32_t val)
+{
+	return ((val) << A4XX_VFD_DECODE_INSTR_REGID__SHIFT) & A4XX_VFD_DECODE_INSTR_REGID__MASK;
+}
+#define A4XX_VFD_DECODE_INSTR_SWAP__MASK			0x00c00000
+#define A4XX_VFD_DECODE_INSTR_SWAP__SHIFT			22
+static inline uint32_t A4XX_VFD_DECODE_INSTR_SWAP(enum a3xx_color_swap val)
+{
+	return ((val) << A4XX_VFD_DECODE_INSTR_SWAP__SHIFT) & A4XX_VFD_DECODE_INSTR_SWAP__MASK;
+}
+#define A4XX_VFD_DECODE_INSTR_SHIFTCNT__MASK			0x1f000000
+#define A4XX_VFD_DECODE_INSTR_SHIFTCNT__SHIFT			24
+static inline uint32_t A4XX_VFD_DECODE_INSTR_SHIFTCNT(uint32_t val)
+{
+	return ((val) << A4XX_VFD_DECODE_INSTR_SHIFTCNT__SHIFT) & A4XX_VFD_DECODE_INSTR_SHIFTCNT__MASK;
+}
+#define A4XX_VFD_DECODE_INSTR_LASTCOMPVALID			0x20000000
+#define A4XX_VFD_DECODE_INSTR_SWITCHNEXT			0x40000000
+
+#define REG_A4XX_TPL1_DEBUG_ECO_CONTROL				0x00000f00
+
+#define REG_A4XX_TPL1_PERFCTR_TP_SEL_7				0x00000f0b
+
+#define REG_A4XX_TPL1_TP_TEX_OFFSET				0x00002380
+
+#define REG_A4XX_TPL1_TP_CS_TEXMEMOBJ_BASE_ADDR			0x000023a6
+
+#define REG_A4XX_GRAS_TSE_STATUS				0x00000c80
+
+#define REG_A4XX_GRAS_DEBUG_ECO_CONTROL				0x00000c81
+
+#define REG_A4XX_GRAS_PERFCTR_TSE_SEL_0				0x00000c88
+
+#define REG_A4XX_GRAS_PERFCTR_TSE_SEL_3				0x00000c8b
+
+#define REG_A4XX_GRAS_CL_CLIP_CNTL				0x00002000
+
+#define REG_A4XX_GRAS_CLEAR_CNTL				0x00002003
+#define A4XX_GRAS_CLEAR_CNTL_NOT_FASTCLEAR			0x00000001
+
+#define REG_A4XX_GRAS_CL_GB_CLIP_ADJ				0x00002004
+#define A4XX_GRAS_CL_GB_CLIP_ADJ_HORZ__MASK			0x000003ff
+#define A4XX_GRAS_CL_GB_CLIP_ADJ_HORZ__SHIFT			0
+static inline uint32_t A4XX_GRAS_CL_GB_CLIP_ADJ_HORZ(uint32_t val)
+{
+	return ((val) << A4XX_GRAS_CL_GB_CLIP_ADJ_HORZ__SHIFT) & A4XX_GRAS_CL_GB_CLIP_ADJ_HORZ__MASK;
+}
+#define A4XX_GRAS_CL_GB_CLIP_ADJ_VERT__MASK			0x000ffc00
+#define A4XX_GRAS_CL_GB_CLIP_ADJ_VERT__SHIFT			10
+static inline uint32_t A4XX_GRAS_CL_GB_CLIP_ADJ_VERT(uint32_t val)
+{
+	return ((val) << A4XX_GRAS_CL_GB_CLIP_ADJ_VERT__SHIFT) & A4XX_GRAS_CL_GB_CLIP_ADJ_VERT__MASK;
+}
+
+#define REG_A4XX_GRAS_CL_VPORT_XOFFSET_0			0x00002008
+#define A4XX_GRAS_CL_VPORT_XOFFSET_0__MASK			0xffffffff
+#define A4XX_GRAS_CL_VPORT_XOFFSET_0__SHIFT			0
+static inline uint32_t A4XX_GRAS_CL_VPORT_XOFFSET_0(float val)
+{
+	return ((fui(val)) << A4XX_GRAS_CL_VPORT_XOFFSET_0__SHIFT) & A4XX_GRAS_CL_VPORT_XOFFSET_0__MASK;
+}
+
+#define REG_A4XX_GRAS_CL_VPORT_XSCALE_0				0x00002009
+#define A4XX_GRAS_CL_VPORT_XSCALE_0__MASK			0xffffffff
+#define A4XX_GRAS_CL_VPORT_XSCALE_0__SHIFT			0
+static inline uint32_t A4XX_GRAS_CL_VPORT_XSCALE_0(float val)
+{
+	return ((fui(val)) << A4XX_GRAS_CL_VPORT_XSCALE_0__SHIFT) & A4XX_GRAS_CL_VPORT_XSCALE_0__MASK;
+}
+
+#define REG_A4XX_GRAS_CL_VPORT_YOFFSET_0			0x0000200a
+#define A4XX_GRAS_CL_VPORT_YOFFSET_0__MASK			0xffffffff
+#define A4XX_GRAS_CL_VPORT_YOFFSET_0__SHIFT			0
+static inline uint32_t A4XX_GRAS_CL_VPORT_YOFFSET_0(float val)
+{
+	return ((fui(val)) << A4XX_GRAS_CL_VPORT_YOFFSET_0__SHIFT) & A4XX_GRAS_CL_VPORT_YOFFSET_0__MASK;
+}
+
+#define REG_A4XX_GRAS_CL_VPORT_YSCALE_0				0x0000200b
+#define A4XX_GRAS_CL_VPORT_YSCALE_0__MASK			0xffffffff
+#define A4XX_GRAS_CL_VPORT_YSCALE_0__SHIFT			0
+static inline uint32_t A4XX_GRAS_CL_VPORT_YSCALE_0(float val)
+{
+	return ((fui(val)) << A4XX_GRAS_CL_VPORT_YSCALE_0__SHIFT) & A4XX_GRAS_CL_VPORT_YSCALE_0__MASK;
+}
+
+#define REG_A4XX_GRAS_CL_VPORT_ZOFFSET_0			0x0000200c
+#define A4XX_GRAS_CL_VPORT_ZOFFSET_0__MASK			0xffffffff
+#define A4XX_GRAS_CL_VPORT_ZOFFSET_0__SHIFT			0
+static inline uint32_t A4XX_GRAS_CL_VPORT_ZOFFSET_0(float val)
+{
+	return ((fui(val)) << A4XX_GRAS_CL_VPORT_ZOFFSET_0__SHIFT) & A4XX_GRAS_CL_VPORT_ZOFFSET_0__MASK;
+}
+
+#define REG_A4XX_GRAS_CL_VPORT_ZSCALE_0				0x0000200d
+#define A4XX_GRAS_CL_VPORT_ZSCALE_0__MASK			0xffffffff
+#define A4XX_GRAS_CL_VPORT_ZSCALE_0__SHIFT			0
+static inline uint32_t A4XX_GRAS_CL_VPORT_ZSCALE_0(float val)
+{
+	return ((fui(val)) << A4XX_GRAS_CL_VPORT_ZSCALE_0__SHIFT) & A4XX_GRAS_CL_VPORT_ZSCALE_0__MASK;
+}
+
+#define REG_A4XX_GRAS_SU_POINT_MINMAX				0x00002070
+#define A4XX_GRAS_SU_POINT_MINMAX_MIN__MASK			0x0000ffff
+#define A4XX_GRAS_SU_POINT_MINMAX_MIN__SHIFT			0
+static inline uint32_t A4XX_GRAS_SU_POINT_MINMAX_MIN(float val)
+{
+	return ((((uint32_t)(val * 16.0))) << A4XX_GRAS_SU_POINT_MINMAX_MIN__SHIFT) & A4XX_GRAS_SU_POINT_MINMAX_MIN__MASK;
+}
+#define A4XX_GRAS_SU_POINT_MINMAX_MAX__MASK			0xffff0000
+#define A4XX_GRAS_SU_POINT_MINMAX_MAX__SHIFT			16
+static inline uint32_t A4XX_GRAS_SU_POINT_MINMAX_MAX(float val)
+{
+	return ((((uint32_t)(val * 16.0))) << A4XX_GRAS_SU_POINT_MINMAX_MAX__SHIFT) & A4XX_GRAS_SU_POINT_MINMAX_MAX__MASK;
+}
+
+#define REG_A4XX_GRAS_SU_POINT_SIZE				0x00002071
+#define A4XX_GRAS_SU_POINT_SIZE__MASK				0xffffffff
+#define A4XX_GRAS_SU_POINT_SIZE__SHIFT				0
+static inline uint32_t A4XX_GRAS_SU_POINT_SIZE(float val)
+{
+	return ((((int32_t)(val * 16.0))) << A4XX_GRAS_SU_POINT_SIZE__SHIFT) & A4XX_GRAS_SU_POINT_SIZE__MASK;
+}
+
+#define REG_A4XX_GRAS_ALPHA_CONTROL				0x00002073
+#define A4XX_GRAS_ALPHA_CONTROL_ALPHA_TEST_ENABLE		0x00000004
+
+#define REG_A4XX_GRAS_SU_POLY_OFFSET_SCALE			0x00002074
+#define A4XX_GRAS_SU_POLY_OFFSET_SCALE__MASK			0xffffffff
+#define A4XX_GRAS_SU_POLY_OFFSET_SCALE__SHIFT			0
+static inline uint32_t A4XX_GRAS_SU_POLY_OFFSET_SCALE(float val)
+{
+	return ((fui(val)) << A4XX_GRAS_SU_POLY_OFFSET_SCALE__SHIFT) & A4XX_GRAS_SU_POLY_OFFSET_SCALE__MASK;
+}
+
+#define REG_A4XX_GRAS_SU_POLY_OFFSET_OFFSET			0x00002075
+#define A4XX_GRAS_SU_POLY_OFFSET_OFFSET__MASK			0xffffffff
+#define A4XX_GRAS_SU_POLY_OFFSET_OFFSET__SHIFT			0
+static inline uint32_t A4XX_GRAS_SU_POLY_OFFSET_OFFSET(float val)
+{
+	return ((fui(val)) << A4XX_GRAS_SU_POLY_OFFSET_OFFSET__SHIFT) & A4XX_GRAS_SU_POLY_OFFSET_OFFSET__MASK;
+}
+
+#define REG_A4XX_GRAS_SC_EXTENT_WINDOW_TL			0x0000209f
+
+#define REG_A4XX_GRAS_SC_SCREEN_SCISSOR_TL			0x0000207c
+#define A4XX_GRAS_SC_SCREEN_SCISSOR_TL_WINDOW_OFFSET_DISABLE	0x80000000
+#define A4XX_GRAS_SC_SCREEN_SCISSOR_TL_X__MASK			0x00007fff
+#define A4XX_GRAS_SC_SCREEN_SCISSOR_TL_X__SHIFT			0
+static inline uint32_t A4XX_GRAS_SC_SCREEN_SCISSOR_TL_X(uint32_t val)
+{
+	return ((val) << A4XX_GRAS_SC_SCREEN_SCISSOR_TL_X__SHIFT) & A4XX_GRAS_SC_SCREEN_SCISSOR_TL_X__MASK;
+}
+#define A4XX_GRAS_SC_SCREEN_SCISSOR_TL_Y__MASK			0x7fff0000
+#define A4XX_GRAS_SC_SCREEN_SCISSOR_TL_Y__SHIFT			16
+static inline uint32_t A4XX_GRAS_SC_SCREEN_SCISSOR_TL_Y(uint32_t val)
+{
+	return ((val) << A4XX_GRAS_SC_SCREEN_SCISSOR_TL_Y__SHIFT) & A4XX_GRAS_SC_SCREEN_SCISSOR_TL_Y__MASK;
+}
+
+#define REG_A4XX_GRAS_SC_SCREEN_SCISSOR_BR			0x0000207d
+#define A4XX_GRAS_SC_SCREEN_SCISSOR_BR_WINDOW_OFFSET_DISABLE	0x80000000
+#define A4XX_GRAS_SC_SCREEN_SCISSOR_BR_X__MASK			0x00007fff
+#define A4XX_GRAS_SC_SCREEN_SCISSOR_BR_X__SHIFT			0
+static inline uint32_t A4XX_GRAS_SC_SCREEN_SCISSOR_BR_X(uint32_t val)
+{
+	return ((val) << A4XX_GRAS_SC_SCREEN_SCISSOR_BR_X__SHIFT) & A4XX_GRAS_SC_SCREEN_SCISSOR_BR_X__MASK;
+}
+#define A4XX_GRAS_SC_SCREEN_SCISSOR_BR_Y__MASK			0x7fff0000
+#define A4XX_GRAS_SC_SCREEN_SCISSOR_BR_Y__SHIFT			16
+static inline uint32_t A4XX_GRAS_SC_SCREEN_SCISSOR_BR_Y(uint32_t val)
+{
+	return ((val) << A4XX_GRAS_SC_SCREEN_SCISSOR_BR_Y__SHIFT) & A4XX_GRAS_SC_SCREEN_SCISSOR_BR_Y__MASK;
+}
+
+#define REG_A4XX_GRAS_SC_WINDOW_SCISSOR_BR			0x0000209c
+#define A4XX_GRAS_SC_WINDOW_SCISSOR_BR_WINDOW_OFFSET_DISABLE	0x80000000
+#define A4XX_GRAS_SC_WINDOW_SCISSOR_BR_X__MASK			0x00007fff
+#define A4XX_GRAS_SC_WINDOW_SCISSOR_BR_X__SHIFT			0
+static inline uint32_t A4XX_GRAS_SC_WINDOW_SCISSOR_BR_X(uint32_t val)
+{
+	return ((val) << A4XX_GRAS_SC_WINDOW_SCISSOR_BR_X__SHIFT) & A4XX_GRAS_SC_WINDOW_SCISSOR_BR_X__MASK;
+}
+#define A4XX_GRAS_SC_WINDOW_SCISSOR_BR_Y__MASK			0x7fff0000
+#define A4XX_GRAS_SC_WINDOW_SCISSOR_BR_Y__SHIFT			16
+static inline uint32_t A4XX_GRAS_SC_WINDOW_SCISSOR_BR_Y(uint32_t val)
+{
+	return ((val) << A4XX_GRAS_SC_WINDOW_SCISSOR_BR_Y__SHIFT) & A4XX_GRAS_SC_WINDOW_SCISSOR_BR_Y__MASK;
+}
+
+#define REG_A4XX_GRAS_SC_WINDOW_SCISSOR_TL			0x0000209d
+#define A4XX_GRAS_SC_WINDOW_SCISSOR_TL_WINDOW_OFFSET_DISABLE	0x80000000
+#define A4XX_GRAS_SC_WINDOW_SCISSOR_TL_X__MASK			0x00007fff
+#define A4XX_GRAS_SC_WINDOW_SCISSOR_TL_X__SHIFT			0
+static inline uint32_t A4XX_GRAS_SC_WINDOW_SCISSOR_TL_X(uint32_t val)
+{
+	return ((val) << A4XX_GRAS_SC_WINDOW_SCISSOR_TL_X__SHIFT) & A4XX_GRAS_SC_WINDOW_SCISSOR_TL_X__MASK;
+}
+#define A4XX_GRAS_SC_WINDOW_SCISSOR_TL_Y__MASK			0x7fff0000
+#define A4XX_GRAS_SC_WINDOW_SCISSOR_TL_Y__SHIFT			16
+static inline uint32_t A4XX_GRAS_SC_WINDOW_SCISSOR_TL_Y(uint32_t val)
+{
+	return ((val) << A4XX_GRAS_SC_WINDOW_SCISSOR_TL_Y__SHIFT) & A4XX_GRAS_SC_WINDOW_SCISSOR_TL_Y__MASK;
+}
+
+#define REG_A4XX_GRAS_DEPTH_CONTROL				0x00002077
+#define A4XX_GRAS_DEPTH_CONTROL_FORMAT__MASK			0x00000003
+#define A4XX_GRAS_DEPTH_CONTROL_FORMAT__SHIFT			0
+static inline uint32_t A4XX_GRAS_DEPTH_CONTROL_FORMAT(enum a4xx_depth_format val)
+{
+	return ((val) << A4XX_GRAS_DEPTH_CONTROL_FORMAT__SHIFT) & A4XX_GRAS_DEPTH_CONTROL_FORMAT__MASK;
+}
+
+#define REG_A4XX_GRAS_SU_MODE_CONTROL				0x00002078
+#define A4XX_GRAS_SU_MODE_CONTROL_CULL_FRONT			0x00000001
+#define A4XX_GRAS_SU_MODE_CONTROL_CULL_BACK			0x00000002
+#define A4XX_GRAS_SU_MODE_CONTROL_FRONT_CW			0x00000004
+#define A4XX_GRAS_SU_MODE_CONTROL_LINEHALFWIDTH__MASK		0x000007f8
+#define A4XX_GRAS_SU_MODE_CONTROL_LINEHALFWIDTH__SHIFT		3
+static inline uint32_t A4XX_GRAS_SU_MODE_CONTROL_LINEHALFWIDTH(float val)
+{
+	return ((((int32_t)(val * 4.0))) << A4XX_GRAS_SU_MODE_CONTROL_LINEHALFWIDTH__SHIFT) & A4XX_GRAS_SU_MODE_CONTROL_LINEHALFWIDTH__MASK;
+}
+#define A4XX_GRAS_SU_MODE_CONTROL_POLY_OFFSET			0x00000800
+#define A4XX_GRAS_SU_MODE_CONTROL_RENDERING_PASS		0x00100000
+
+#define REG_A4XX_GRAS_SC_CONTROL				0x0000207b
+#define A4XX_GRAS_SC_CONTROL_RENDER_MODE__MASK			0x0000000c
+#define A4XX_GRAS_SC_CONTROL_RENDER_MODE__SHIFT			2
+static inline uint32_t A4XX_GRAS_SC_CONTROL_RENDER_MODE(enum a3xx_render_mode val)
+{
+	return ((val) << A4XX_GRAS_SC_CONTROL_RENDER_MODE__SHIFT) & A4XX_GRAS_SC_CONTROL_RENDER_MODE__MASK;
+}
+#define A4XX_GRAS_SC_CONTROL_MSAA_SAMPLES__MASK			0x00000380
+#define A4XX_GRAS_SC_CONTROL_MSAA_SAMPLES__SHIFT		7
+static inline uint32_t A4XX_GRAS_SC_CONTROL_MSAA_SAMPLES(uint32_t val)
+{
+	return ((val) << A4XX_GRAS_SC_CONTROL_MSAA_SAMPLES__SHIFT) & A4XX_GRAS_SC_CONTROL_MSAA_SAMPLES__MASK;
+}
+#define A4XX_GRAS_SC_CONTROL_MSAA_DISABLE			0x00000800
+#define A4XX_GRAS_SC_CONTROL_RASTER_MODE__MASK			0x0000f000
+#define A4XX_GRAS_SC_CONTROL_RASTER_MODE__SHIFT			12
+static inline uint32_t A4XX_GRAS_SC_CONTROL_RASTER_MODE(uint32_t val)
+{
+	return ((val) << A4XX_GRAS_SC_CONTROL_RASTER_MODE__SHIFT) & A4XX_GRAS_SC_CONTROL_RASTER_MODE__MASK;
+}
+
+#define REG_A4XX_UCHE_CACHE_MODE_CONTROL			0x00000e80
+
+#define REG_A4XX_UCHE_TRAP_BASE_LO				0x00000e83
+
+#define REG_A4XX_UCHE_TRAP_BASE_HI				0x00000e84
+
+#define REG_A4XX_UCHE_CACHE_STATUS				0x00000e88
+
+#define REG_A4XX_UCHE_INVALIDATE0				0x00000e8a
+
+#define REG_A4XX_UCHE_INVALIDATE1				0x00000e8b
+
+#define REG_A4XX_UCHE_CACHE_WAYS_VFD				0x00000e8c
+
+#define REG_A4XX_UCHE_PERFCTR_UCHE_SEL_7			0x00000e95
+
+#define REG_A4XX_HLSQ_TIMEOUT_THRESHOLD				0x00000e00
+
+#define REG_A4XX_HLSQ_DEBUG_ECO_CONTROL				0x00000e04
+
+#define REG_A4XX_HLSQ_PERF_PIPE_MASK				0x00000e0e
+
+#define REG_A4XX_HLSQ_CONTROL_0_REG				0x000023c0
+#define A4XX_HLSQ_CONTROL_0_REG_FSTHREADSIZE__MASK		0x00000010
+#define A4XX_HLSQ_CONTROL_0_REG_FSTHREADSIZE__SHIFT		4
+static inline uint32_t A4XX_HLSQ_CONTROL_0_REG_FSTHREADSIZE(enum a3xx_threadsize val)
+{
+	return ((val) << A4XX_HLSQ_CONTROL_0_REG_FSTHREADSIZE__SHIFT) & A4XX_HLSQ_CONTROL_0_REG_FSTHREADSIZE__MASK;
+}
+#define A4XX_HLSQ_CONTROL_0_REG_FSSUPERTHREADENABLE		0x00000040
+#define A4XX_HLSQ_CONTROL_0_REG_SPSHADERRESTART			0x00000200
+#define A4XX_HLSQ_CONTROL_0_REG_RESERVED2			0x00000400
+#define A4XX_HLSQ_CONTROL_0_REG_CHUNKDISABLE			0x04000000
+#define A4XX_HLSQ_CONTROL_0_REG_CONSTMODE__MASK			0x08000000
+#define A4XX_HLSQ_CONTROL_0_REG_CONSTMODE__SHIFT		27
+static inline uint32_t A4XX_HLSQ_CONTROL_0_REG_CONSTMODE(uint32_t val)
+{
+	return ((val) << A4XX_HLSQ_CONTROL_0_REG_CONSTMODE__SHIFT) & A4XX_HLSQ_CONTROL_0_REG_CONSTMODE__MASK;
+}
+#define A4XX_HLSQ_CONTROL_0_REG_LAZYUPDATEDISABLE		0x10000000
+#define A4XX_HLSQ_CONTROL_0_REG_SPCONSTFULLUPDATE		0x20000000
+#define A4XX_HLSQ_CONTROL_0_REG_TPFULLUPDATE			0x40000000
+#define A4XX_HLSQ_CONTROL_0_REG_SINGLECONTEXT			0x80000000
+
+#define REG_A4XX_HLSQ_CONTROL_1_REG				0x000023c1
+#define A4XX_HLSQ_CONTROL_1_REG_VSTHREADSIZE__MASK		0x00000040
+#define A4XX_HLSQ_CONTROL_1_REG_VSTHREADSIZE__SHIFT		6
+static inline uint32_t A4XX_HLSQ_CONTROL_1_REG_VSTHREADSIZE(enum a3xx_threadsize val)
+{
+	return ((val) << A4XX_HLSQ_CONTROL_1_REG_VSTHREADSIZE__SHIFT) & A4XX_HLSQ_CONTROL_1_REG_VSTHREADSIZE__MASK;
+}
+#define A4XX_HLSQ_CONTROL_1_REG_VSSUPERTHREADENABLE		0x00000100
+#define A4XX_HLSQ_CONTROL_1_REG_RESERVED1			0x00000200
+#define A4XX_HLSQ_CONTROL_1_REG_ZWCOORD				0x02000000
+
+#define REG_A4XX_HLSQ_CONTROL_2_REG				0x000023c2
+#define A4XX_HLSQ_CONTROL_2_REG_PRIMALLOCTHRESHOLD__MASK	0xfc000000
+#define A4XX_HLSQ_CONTROL_2_REG_PRIMALLOCTHRESHOLD__SHIFT	26
+static inline uint32_t A4XX_HLSQ_CONTROL_2_REG_PRIMALLOCTHRESHOLD(uint32_t val)
+{
+	return ((val) << A4XX_HLSQ_CONTROL_2_REG_PRIMALLOCTHRESHOLD__SHIFT) & A4XX_HLSQ_CONTROL_2_REG_PRIMALLOCTHRESHOLD__MASK;
+}
+
+#define REG_A4XX_HLSQ_CONTROL_3_REG				0x000023c3
+#define A4XX_HLSQ_CONTROL_3_REG_REGID__MASK			0x000000ff
+#define A4XX_HLSQ_CONTROL_3_REG_REGID__SHIFT			0
+static inline uint32_t A4XX_HLSQ_CONTROL_3_REG_REGID(uint32_t val)
+{
+	return ((val) << A4XX_HLSQ_CONTROL_3_REG_REGID__SHIFT) & A4XX_HLSQ_CONTROL_3_REG_REGID__MASK;
+}
+
+#define REG_A4XX_HLSQ_VS_CONTROL_REG				0x000023c5
+#define A4XX_HLSQ_VS_CONTROL_REG_CONSTLENGTH__MASK		0x000000ff
+#define A4XX_HLSQ_VS_CONTROL_REG_CONSTLENGTH__SHIFT		0
+static inline uint32_t A4XX_HLSQ_VS_CONTROL_REG_CONSTLENGTH(uint32_t val)
+{
+	return ((val) << A4XX_HLSQ_VS_CONTROL_REG_CONSTLENGTH__SHIFT) & A4XX_HLSQ_VS_CONTROL_REG_CONSTLENGTH__MASK;
+}
+#define A4XX_HLSQ_VS_CONTROL_REG_CONSTOBJECTOFFSET__MASK	0x0000ff00
+#define A4XX_HLSQ_VS_CONTROL_REG_CONSTOBJECTOFFSET__SHIFT	8
+static inline uint32_t A4XX_HLSQ_VS_CONTROL_REG_CONSTOBJECTOFFSET(uint32_t val)
+{
+	return ((val) << A4XX_HLSQ_VS_CONTROL_REG_CONSTOBJECTOFFSET__SHIFT) & A4XX_HLSQ_VS_CONTROL_REG_CONSTOBJECTOFFSET__MASK;
+}
+#define A4XX_HLSQ_VS_CONTROL_REG_SHADEROBJOFFSET__MASK		0x00fe0000
+#define A4XX_HLSQ_VS_CONTROL_REG_SHADEROBJOFFSET__SHIFT		17
+static inline uint32_t A4XX_HLSQ_VS_CONTROL_REG_SHADEROBJOFFSET(uint32_t val)
+{
+	return ((val) << A4XX_HLSQ_VS_CONTROL_REG_SHADEROBJOFFSET__SHIFT) & A4XX_HLSQ_VS_CONTROL_REG_SHADEROBJOFFSET__MASK;
+}
+#define A4XX_HLSQ_VS_CONTROL_REG_INSTRLENGTH__MASK		0xff000000
+#define A4XX_HLSQ_VS_CONTROL_REG_INSTRLENGTH__SHIFT		24
+static inline uint32_t A4XX_HLSQ_VS_CONTROL_REG_INSTRLENGTH(uint32_t val)
+{
+	return ((val) << A4XX_HLSQ_VS_CONTROL_REG_INSTRLENGTH__SHIFT) & A4XX_HLSQ_VS_CONTROL_REG_INSTRLENGTH__MASK;
+}
+
+#define REG_A4XX_HLSQ_FS_CONTROL_REG				0x000023c6
+#define A4XX_HLSQ_FS_CONTROL_REG_CONSTLENGTH__MASK		0x000000ff
+#define A4XX_HLSQ_FS_CONTROL_REG_CONSTLENGTH__SHIFT		0
+static inline uint32_t A4XX_HLSQ_FS_CONTROL_REG_CONSTLENGTH(uint32_t val)
+{
+	return ((val) << A4XX_HLSQ_FS_CONTROL_REG_CONSTLENGTH__SHIFT) & A4XX_HLSQ_FS_CONTROL_REG_CONSTLENGTH__MASK;
+}
+#define A4XX_HLSQ_FS_CONTROL_REG_CONSTOBJECTOFFSET__MASK	0x0000ff00
+#define A4XX_HLSQ_FS_CONTROL_REG_CONSTOBJECTOFFSET__SHIFT	8
+static inline uint32_t A4XX_HLSQ_FS_CONTROL_REG_CONSTOBJECTOFFSET(uint32_t val)
+{
+	return ((val) << A4XX_HLSQ_FS_CONTROL_REG_CONSTOBJECTOFFSET__SHIFT) & A4XX_HLSQ_FS_CONTROL_REG_CONSTOBJECTOFFSET__MASK;
+}
+#define A4XX_HLSQ_FS_CONTROL_REG_SHADEROBJOFFSET__MASK		0x00fe0000
+#define A4XX_HLSQ_FS_CONTROL_REG_SHADEROBJOFFSET__SHIFT		17
+static inline uint32_t A4XX_HLSQ_FS_CONTROL_REG_SHADEROBJOFFSET(uint32_t val)
+{
+	return ((val) << A4XX_HLSQ_FS_CONTROL_REG_SHADEROBJOFFSET__SHIFT) & A4XX_HLSQ_FS_CONTROL_REG_SHADEROBJOFFSET__MASK;
+}
+#define A4XX_HLSQ_FS_CONTROL_REG_INSTRLENGTH__MASK		0xff000000
+#define A4XX_HLSQ_FS_CONTROL_REG_INSTRLENGTH__SHIFT		24
+static inline uint32_t A4XX_HLSQ_FS_CONTROL_REG_INSTRLENGTH(uint32_t val)
+{
+	return ((val) << A4XX_HLSQ_FS_CONTROL_REG_INSTRLENGTH__SHIFT) & A4XX_HLSQ_FS_CONTROL_REG_INSTRLENGTH__MASK;
+}
+
+#define REG_A4XX_HLSQ_HS_CONTROL_REG				0x000023c7
+#define A4XX_HLSQ_HS_CONTROL_REG_CONSTLENGTH__MASK		0x000000ff
+#define A4XX_HLSQ_HS_CONTROL_REG_CONSTLENGTH__SHIFT		0
+static inline uint32_t A4XX_HLSQ_HS_CONTROL_REG_CONSTLENGTH(uint32_t val)
+{
+	return ((val) << A4XX_HLSQ_HS_CONTROL_REG_CONSTLENGTH__SHIFT) & A4XX_HLSQ_HS_CONTROL_REG_CONSTLENGTH__MASK;
+}
+#define A4XX_HLSQ_HS_CONTROL_REG_CONSTOBJECTOFFSET__MASK	0x0000ff00
+#define A4XX_HLSQ_HS_CONTROL_REG_CONSTOBJECTOFFSET__SHIFT	8
+static inline uint32_t A4XX_HLSQ_HS_CONTROL_REG_CONSTOBJECTOFFSET(uint32_t val)
+{
+	return ((val) << A4XX_HLSQ_HS_CONTROL_REG_CONSTOBJECTOFFSET__SHIFT) & A4XX_HLSQ_HS_CONTROL_REG_CONSTOBJECTOFFSET__MASK;
+}
+#define A4XX_HLSQ_HS_CONTROL_REG_SHADEROBJOFFSET__MASK		0x00fe0000
+#define A4XX_HLSQ_HS_CONTROL_REG_SHADEROBJOFFSET__SHIFT		17
+static inline uint32_t A4XX_HLSQ_HS_CONTROL_REG_SHADEROBJOFFSET(uint32_t val)
+{
+	return ((val) << A4XX_HLSQ_HS_CONTROL_REG_SHADEROBJOFFSET__SHIFT) & A4XX_HLSQ_HS_CONTROL_REG_SHADEROBJOFFSET__MASK;
+}
+#define A4XX_HLSQ_HS_CONTROL_REG_INSTRLENGTH__MASK		0xff000000
+#define A4XX_HLSQ_HS_CONTROL_REG_INSTRLENGTH__SHIFT		24
+static inline uint32_t A4XX_HLSQ_HS_CONTROL_REG_INSTRLENGTH(uint32_t val)
+{
+	return ((val) << A4XX_HLSQ_HS_CONTROL_REG_INSTRLENGTH__SHIFT) & A4XX_HLSQ_HS_CONTROL_REG_INSTRLENGTH__MASK;
+}
+
+#define REG_A4XX_HLSQ_DS_CONTROL_REG				0x000023c8
+#define A4XX_HLSQ_DS_CONTROL_REG_CONSTLENGTH__MASK		0x000000ff
+#define A4XX_HLSQ_DS_CONTROL_REG_CONSTLENGTH__SHIFT		0
+static inline uint32_t A4XX_HLSQ_DS_CONTROL_REG_CONSTLENGTH(uint32_t val)
+{
+	return ((val) << A4XX_HLSQ_DS_CONTROL_REG_CONSTLENGTH__SHIFT) & A4XX_HLSQ_DS_CONTROL_REG_CONSTLENGTH__MASK;
+}
+#define A4XX_HLSQ_DS_CONTROL_REG_CONSTOBJECTOFFSET__MASK	0x0000ff00
+#define A4XX_HLSQ_DS_CONTROL_REG_CONSTOBJECTOFFSET__SHIFT	8
+static inline uint32_t A4XX_HLSQ_DS_CONTROL_REG_CONSTOBJECTOFFSET(uint32_t val)
+{
+	return ((val) << A4XX_HLSQ_DS_CONTROL_REG_CONSTOBJECTOFFSET__SHIFT) & A4XX_HLSQ_DS_CONTROL_REG_CONSTOBJECTOFFSET__MASK;
+}
+#define A4XX_HLSQ_DS_CONTROL_REG_SHADEROBJOFFSET__MASK		0x00fe0000
+#define A4XX_HLSQ_DS_CONTROL_REG_SHADEROBJOFFSET__SHIFT		17
+static inline uint32_t A4XX_HLSQ_DS_CONTROL_REG_SHADEROBJOFFSET(uint32_t val)
+{
+	return ((val) << A4XX_HLSQ_DS_CONTROL_REG_SHADEROBJOFFSET__SHIFT) & A4XX_HLSQ_DS_CONTROL_REG_SHADEROBJOFFSET__MASK;
+}
+#define A4XX_HLSQ_DS_CONTROL_REG_INSTRLENGTH__MASK		0xff000000
+#define A4XX_HLSQ_DS_CONTROL_REG_INSTRLENGTH__SHIFT		24
+static inline uint32_t A4XX_HLSQ_DS_CONTROL_REG_INSTRLENGTH(uint32_t val)
+{
+	return ((val) << A4XX_HLSQ_DS_CONTROL_REG_INSTRLENGTH__SHIFT) & A4XX_HLSQ_DS_CONTROL_REG_INSTRLENGTH__MASK;
+}
+
+#define REG_A4XX_HLSQ_GS_CONTROL_REG				0x000023c9
+#define A4XX_HLSQ_GS_CONTROL_REG_CONSTLENGTH__MASK		0x000000ff
+#define A4XX_HLSQ_GS_CONTROL_REG_CONSTLENGTH__SHIFT		0
+static inline uint32_t A4XX_HLSQ_GS_CONTROL_REG_CONSTLENGTH(uint32_t val)
+{
+	return ((val) << A4XX_HLSQ_GS_CONTROL_REG_CONSTLENGTH__SHIFT) & A4XX_HLSQ_GS_CONTROL_REG_CONSTLENGTH__MASK;
+}
+#define A4XX_HLSQ_GS_CONTROL_REG_CONSTOBJECTOFFSET__MASK	0x0000ff00
+#define A4XX_HLSQ_GS_CONTROL_REG_CONSTOBJECTOFFSET__SHIFT	8
+static inline uint32_t A4XX_HLSQ_GS_CONTROL_REG_CONSTOBJECTOFFSET(uint32_t val)
+{
+	return ((val) << A4XX_HLSQ_GS_CONTROL_REG_CONSTOBJECTOFFSET__SHIFT) & A4XX_HLSQ_GS_CONTROL_REG_CONSTOBJECTOFFSET__MASK;
+}
+#define A4XX_HLSQ_GS_CONTROL_REG_SHADEROBJOFFSET__MASK		0x00fe0000
+#define A4XX_HLSQ_GS_CONTROL_REG_SHADEROBJOFFSET__SHIFT		17
+static inline uint32_t A4XX_HLSQ_GS_CONTROL_REG_SHADEROBJOFFSET(uint32_t val)
+{
+	return ((val) << A4XX_HLSQ_GS_CONTROL_REG_SHADEROBJOFFSET__SHIFT) & A4XX_HLSQ_GS_CONTROL_REG_SHADEROBJOFFSET__MASK;
+}
+#define A4XX_HLSQ_GS_CONTROL_REG_INSTRLENGTH__MASK		0xff000000
+#define A4XX_HLSQ_GS_CONTROL_REG_INSTRLENGTH__SHIFT		24
+static inline uint32_t A4XX_HLSQ_GS_CONTROL_REG_INSTRLENGTH(uint32_t val)
+{
+	return ((val) << A4XX_HLSQ_GS_CONTROL_REG_INSTRLENGTH__SHIFT) & A4XX_HLSQ_GS_CONTROL_REG_INSTRLENGTH__MASK;
+}
+
+#define REG_A4XX_HLSQ_UPDATE_CONTROL				0x000023db
+
+#define REG_A4XX_PC_BINNING_COMMAND				0x00000d00
+#define A4XX_PC_BINNING_COMMAND_BINNING_ENABLE			0x00000001
+
+#define REG_A4XX_PC_DRAWCALL_SETUP_OVERRIDE			0x00000d0c
+
+#define REG_A4XX_PC_PERFCTR_PC_SEL_0				0x00000d10
+
+#define REG_A4XX_PC_PERFCTR_PC_SEL_7				0x00000d17
+
+#define REG_A4XX_PC_BIN_BASE					0x000021c0
+
+#define REG_A4XX_PC_PRIM_VTX_CNTL				0x000021c4
+#define A4XX_PC_PRIM_VTX_CNTL_VAROUT				0x00000001
+#define A4XX_PC_PRIM_VTX_CNTL_PROVOKING_VTX_LAST		0x02000000
+#define A4XX_PC_PRIM_VTX_CNTL_PSIZE				0x04000000
+
+#define REG_A4XX_UNKNOWN_21C5					0x000021c5
+
+#define REG_A4XX_PC_RESTART_INDEX				0x000021c6
+
+#define REG_A4XX_PC_GS_PARAM					0x000021e5
+
+#define REG_A4XX_PC_HS_PARAM					0x000021e7
+
+#define REG_A4XX_VBIF_VERSION					0x00003000
+
+#define REG_A4XX_VBIF_CLKON					0x00003001
+#define A4XX_VBIF_CLKON_FORCE_ON_TESTBUS			0x00000001
+
+#define REG_A4XX_VBIF_ABIT_SORT					0x0000301c
+
+#define REG_A4XX_VBIF_ABIT_SORT_CONF				0x0000301d
+
+#define REG_A4XX_VBIF_GATE_OFF_WRREQ_EN				0x0000302a
+
+#define REG_A4XX_VBIF_IN_RD_LIM_CONF0				0x0000302c
+
+#define REG_A4XX_VBIF_IN_RD_LIM_CONF1				0x0000302d
+
+#define REG_A4XX_VBIF_IN_WR_LIM_CONF0				0x00003030
+
+#define REG_A4XX_VBIF_IN_WR_LIM_CONF1				0x00003031
+
+#define REG_A4XX_VBIF_ROUND_ROBIN_QOS_ARB			0x00003049
+
+#define REG_A4XX_UNKNOWN_0CC5					0x00000cc5
+
+#define REG_A4XX_UNKNOWN_0CC6					0x00000cc6
+
+#define REG_A4XX_UNKNOWN_0D01					0x00000d01
+
+#define REG_A4XX_UNKNOWN_0E05					0x00000e05
+
+#define REG_A4XX_UNKNOWN_0E42					0x00000e42
+
+#define REG_A4XX_UNKNOWN_0EC2					0x00000ec2
+
+#define REG_A4XX_UNKNOWN_0EC3					0x00000ec3
+
+#define REG_A4XX_UNKNOWN_0F03					0x00000f03
+
+#define REG_A4XX_UNKNOWN_2001					0x00002001
+
+#define REG_A4XX_UNKNOWN_209B					0x0000209b
+
+#define REG_A4XX_UNKNOWN_20EF					0x000020ef
+
+#define REG_A4XX_UNKNOWN_20F0					0x000020f0
+
+#define REG_A4XX_UNKNOWN_20F1					0x000020f1
+
+#define REG_A4XX_UNKNOWN_20F2					0x000020f2
+
+#define REG_A4XX_UNKNOWN_20F3					0x000020f3
+
+#define REG_A4XX_UNKNOWN_20F4					0x000020f4
+
+#define REG_A4XX_UNKNOWN_20F5					0x000020f5
+
+#define REG_A4XX_UNKNOWN_20F6					0x000020f6
+
+#define REG_A4XX_UNKNOWN_20F7					0x000020f7
+
+#define REG_A4XX_UNKNOWN_2152					0x00002152
+
+#define REG_A4XX_UNKNOWN_2153					0x00002153
+
+#define REG_A4XX_UNKNOWN_2154					0x00002154
+
+#define REG_A4XX_UNKNOWN_2155					0x00002155
+
+#define REG_A4XX_UNKNOWN_2156					0x00002156
+
+#define REG_A4XX_UNKNOWN_2157					0x00002157
+
+#define REG_A4XX_UNKNOWN_21C3					0x000021c3
+
+#define REG_A4XX_UNKNOWN_21E6					0x000021e6
+
+#define REG_A4XX_UNKNOWN_2209					0x00002209
+
+#define REG_A4XX_UNKNOWN_22D7					0x000022d7
+
+#define REG_A4XX_UNKNOWN_2381					0x00002381
+
+#define REG_A4XX_UNKNOWN_23A0					0x000023a0
+
+#define REG_A4XX_TEX_SAMP_0					0x00000000
+#define A4XX_TEX_SAMP_0_XY_MAG__MASK				0x00000006
+#define A4XX_TEX_SAMP_0_XY_MAG__SHIFT				1
+static inline uint32_t A4XX_TEX_SAMP_0_XY_MAG(enum a4xx_tex_filter val)
+{
+	return ((val) << A4XX_TEX_SAMP_0_XY_MAG__SHIFT) & A4XX_TEX_SAMP_0_XY_MAG__MASK;
+}
+#define A4XX_TEX_SAMP_0_XY_MIN__MASK				0x00000018
+#define A4XX_TEX_SAMP_0_XY_MIN__SHIFT				3
+static inline uint32_t A4XX_TEX_SAMP_0_XY_MIN(enum a4xx_tex_filter val)
+{
+	return ((val) << A4XX_TEX_SAMP_0_XY_MIN__SHIFT) & A4XX_TEX_SAMP_0_XY_MIN__MASK;
+}
+#define A4XX_TEX_SAMP_0_WRAP_S__MASK				0x000000e0
+#define A4XX_TEX_SAMP_0_WRAP_S__SHIFT				5
+static inline uint32_t A4XX_TEX_SAMP_0_WRAP_S(enum a4xx_tex_clamp val)
+{
+	return ((val) << A4XX_TEX_SAMP_0_WRAP_S__SHIFT) & A4XX_TEX_SAMP_0_WRAP_S__MASK;
+}
+#define A4XX_TEX_SAMP_0_WRAP_T__MASK				0x00000700
+#define A4XX_TEX_SAMP_0_WRAP_T__SHIFT				8
+static inline uint32_t A4XX_TEX_SAMP_0_WRAP_T(enum a4xx_tex_clamp val)
+{
+	return ((val) << A4XX_TEX_SAMP_0_WRAP_T__SHIFT) & A4XX_TEX_SAMP_0_WRAP_T__MASK;
+}
+#define A4XX_TEX_SAMP_0_WRAP_R__MASK				0x00003800
+#define A4XX_TEX_SAMP_0_WRAP_R__SHIFT				11
+static inline uint32_t A4XX_TEX_SAMP_0_WRAP_R(enum a4xx_tex_clamp val)
+{
+	return ((val) << A4XX_TEX_SAMP_0_WRAP_R__SHIFT) & A4XX_TEX_SAMP_0_WRAP_R__MASK;
+}
+
+#define REG_A4XX_TEX_SAMP_1					0x00000001
+#define A4XX_TEX_SAMP_1_COMPARE_FUNC__MASK			0x0000000e
+#define A4XX_TEX_SAMP_1_COMPARE_FUNC__SHIFT			1
+static inline uint32_t A4XX_TEX_SAMP_1_COMPARE_FUNC(enum adreno_compare_func val)
+{
+	return ((val) << A4XX_TEX_SAMP_1_COMPARE_FUNC__SHIFT) & A4XX_TEX_SAMP_1_COMPARE_FUNC__MASK;
+}
+#define A4XX_TEX_SAMP_1_MAX_LOD__MASK				0x000fff00
+#define A4XX_TEX_SAMP_1_MAX_LOD__SHIFT				8
+static inline uint32_t A4XX_TEX_SAMP_1_MAX_LOD(float val)
+{
+	return ((((uint32_t)(val * 64.0))) << A4XX_TEX_SAMP_1_MAX_LOD__SHIFT) & A4XX_TEX_SAMP_1_MAX_LOD__MASK;
+}
+#define A4XX_TEX_SAMP_1_MIN_LOD__MASK				0xfff00000
+#define A4XX_TEX_SAMP_1_MIN_LOD__SHIFT				20
+static inline uint32_t A4XX_TEX_SAMP_1_MIN_LOD(float val)
+{
+	return ((((uint32_t)(val * 64.0))) << A4XX_TEX_SAMP_1_MIN_LOD__SHIFT) & A4XX_TEX_SAMP_1_MIN_LOD__MASK;
+}
+
+#define REG_A4XX_TEX_CONST_0					0x00000000
+#define A4XX_TEX_CONST_0_TILED					0x00000001
+#define A4XX_TEX_CONST_0_SWIZ_X__MASK				0x00000070
+#define A4XX_TEX_CONST_0_SWIZ_X__SHIFT				4
+static inline uint32_t A4XX_TEX_CONST_0_SWIZ_X(enum a4xx_tex_swiz val)
+{
+	return ((val) << A4XX_TEX_CONST_0_SWIZ_X__SHIFT) & A4XX_TEX_CONST_0_SWIZ_X__MASK;
+}
+#define A4XX_TEX_CONST_0_SWIZ_Y__MASK				0x00000380
+#define A4XX_TEX_CONST_0_SWIZ_Y__SHIFT				7
+static inline uint32_t A4XX_TEX_CONST_0_SWIZ_Y(enum a4xx_tex_swiz val)
+{
+	return ((val) << A4XX_TEX_CONST_0_SWIZ_Y__SHIFT) & A4XX_TEX_CONST_0_SWIZ_Y__MASK;
+}
+#define A4XX_TEX_CONST_0_SWIZ_Z__MASK				0x00001c00
+#define A4XX_TEX_CONST_0_SWIZ_Z__SHIFT				10
+static inline uint32_t A4XX_TEX_CONST_0_SWIZ_Z(enum a4xx_tex_swiz val)
+{
+	return ((val) << A4XX_TEX_CONST_0_SWIZ_Z__SHIFT) & A4XX_TEX_CONST_0_SWIZ_Z__MASK;
+}
+#define A4XX_TEX_CONST_0_SWIZ_W__MASK				0x0000e000
+#define A4XX_TEX_CONST_0_SWIZ_W__SHIFT				13
+static inline uint32_t A4XX_TEX_CONST_0_SWIZ_W(enum a4xx_tex_swiz val)
+{
+	return ((val) << A4XX_TEX_CONST_0_SWIZ_W__SHIFT) & A4XX_TEX_CONST_0_SWIZ_W__MASK;
+}
+#define A4XX_TEX_CONST_0_FMT__MASK				0x1fc00000
+#define A4XX_TEX_CONST_0_FMT__SHIFT				22
+static inline uint32_t A4XX_TEX_CONST_0_FMT(enum a4xx_tex_fmt val)
+{
+	return ((val) << A4XX_TEX_CONST_0_FMT__SHIFT) & A4XX_TEX_CONST_0_FMT__MASK;
+}
+#define A4XX_TEX_CONST_0_TYPE__MASK				0x60000000
+#define A4XX_TEX_CONST_0_TYPE__SHIFT				29
+static inline uint32_t A4XX_TEX_CONST_0_TYPE(enum a4xx_tex_type val)
+{
+	return ((val) << A4XX_TEX_CONST_0_TYPE__SHIFT) & A4XX_TEX_CONST_0_TYPE__MASK;
+}
+
+#define REG_A4XX_TEX_CONST_1					0x00000001
+#define A4XX_TEX_CONST_1_HEIGHT__MASK				0x00007fff
+#define A4XX_TEX_CONST_1_HEIGHT__SHIFT				0
+static inline uint32_t A4XX_TEX_CONST_1_HEIGHT(uint32_t val)
+{
+	return ((val) << A4XX_TEX_CONST_1_HEIGHT__SHIFT) & A4XX_TEX_CONST_1_HEIGHT__MASK;
+}
+#define A4XX_TEX_CONST_1_WIDTH__MASK				0x1fff8000
+#define A4XX_TEX_CONST_1_WIDTH__SHIFT				15
+static inline uint32_t A4XX_TEX_CONST_1_WIDTH(uint32_t val)
+{
+	return ((val) << A4XX_TEX_CONST_1_WIDTH__SHIFT) & A4XX_TEX_CONST_1_WIDTH__MASK;
+}
+
+#define REG_A4XX_TEX_CONST_2					0x00000002
+#define A4XX_TEX_CONST_2_PITCH__MASK				0x3ffffe00
+#define A4XX_TEX_CONST_2_PITCH__SHIFT				9
+static inline uint32_t A4XX_TEX_CONST_2_PITCH(uint32_t val)
+{
+	return ((val) << A4XX_TEX_CONST_2_PITCH__SHIFT) & A4XX_TEX_CONST_2_PITCH__MASK;
+}
+#define A4XX_TEX_CONST_2_SWAP__MASK				0xc0000000
+#define A4XX_TEX_CONST_2_SWAP__SHIFT				30
+static inline uint32_t A4XX_TEX_CONST_2_SWAP(enum a3xx_color_swap val)
+{
+	return ((val) << A4XX_TEX_CONST_2_SWAP__SHIFT) & A4XX_TEX_CONST_2_SWAP__MASK;
+}
+
+#define REG_A4XX_TEX_CONST_3					0x00000003
+#define A4XX_TEX_CONST_3_LAYERSZ__MASK				0x0000000f
+#define A4XX_TEX_CONST_3_LAYERSZ__SHIFT				0
+static inline uint32_t A4XX_TEX_CONST_3_LAYERSZ(uint32_t val)
+{
+	return ((val >> 12) << A4XX_TEX_CONST_3_LAYERSZ__SHIFT) & A4XX_TEX_CONST_3_LAYERSZ__MASK;
+}
+
+#define REG_A4XX_TEX_CONST_4					0x00000004
+#define A4XX_TEX_CONST_4_BASE__MASK				0xffffffff
+#define A4XX_TEX_CONST_4_BASE__SHIFT				0
+static inline uint32_t A4XX_TEX_CONST_4_BASE(uint32_t val)
+{
+	return ((val) << A4XX_TEX_CONST_4_BASE__SHIFT) & A4XX_TEX_CONST_4_BASE__MASK;
+}
+
+#define REG_A4XX_TEX_CONST_5					0x00000005
+
+#define REG_A4XX_TEX_CONST_6					0x00000006
+
+#define REG_A4XX_TEX_CONST_7					0x00000007
+
+
+#endif /* A4XX_XML */
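
Reviewer note on the generated field helpers above: each helper shifts its argument into position and then applies the field mask, so a value that does not fit the field is silently truncated rather than rejected. The LOD helpers additionally scale by 64 before the shift, which looks like a fixed-point encoding with six fractional bits. The sketch below copies three of the definitions from this header into a stand-alone C file to show the resulting bit layout. `pack_tex_const_1()` and `tex_field_selfcheck()` are hypothetical helpers written for this note; they are not part of the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Copied verbatim from the a4xx.xml.h hunk above. */
#define A4XX_TEX_CONST_1_HEIGHT__MASK	0x00007fff
#define A4XX_TEX_CONST_1_HEIGHT__SHIFT	0
static inline uint32_t A4XX_TEX_CONST_1_HEIGHT(uint32_t val)
{
	return ((val) << A4XX_TEX_CONST_1_HEIGHT__SHIFT) & A4XX_TEX_CONST_1_HEIGHT__MASK;
}
#define A4XX_TEX_CONST_1_WIDTH__MASK	0x1fff8000
#define A4XX_TEX_CONST_1_WIDTH__SHIFT	15
static inline uint32_t A4XX_TEX_CONST_1_WIDTH(uint32_t val)
{
	return ((val) << A4XX_TEX_CONST_1_WIDTH__SHIFT) & A4XX_TEX_CONST_1_WIDTH__MASK;
}
#define A4XX_TEX_SAMP_1_MAX_LOD__MASK	0x000fff00
#define A4XX_TEX_SAMP_1_MAX_LOD__SHIFT	8
static inline uint32_t A4XX_TEX_SAMP_1_MAX_LOD(float val)
{
	return ((((uint32_t)(val * 64.0))) << A4XX_TEX_SAMP_1_MAX_LOD__SHIFT) & A4XX_TEX_SAMP_1_MAX_LOD__MASK;
}

/* Hypothetical helper (not in the patch): build one TEX_CONST_1 dword. */
static inline uint32_t pack_tex_const_1(uint32_t width, uint32_t height)
{
	return A4XX_TEX_CONST_1_WIDTH(width) | A4XX_TEX_CONST_1_HEIGHT(height);
}

/* Hypothetical self-check illustrating the layout and the truncation. */
void tex_field_selfcheck(void)
{
	/* 1920 lands in bits 15..28, 1080 in bits 0..14. */
	assert(pack_tex_const_1(1920, 1080) == 0x03C00438u);
	/* 0x8000 needs 16 bits; the 15-bit HEIGHT field drops it to 0. */
	assert(A4XX_TEX_CONST_1_HEIGHT(0x8000) == 0);
	/* 1.5 * 64 = 96, shifted left by 8. */
	assert(A4XX_TEX_SAMP_1_MAX_LOD(1.5f) == 0x6000u);
}
```

Because the mask is applied after the shift, callers that need range checking have to do it themselves before calling the helper.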
diff --git a/drivers/gpu/drm/msm/adreno/a4xx_gpu.c b/drivers/gpu/drm/msm/adreno/a4xx_gpu.c
new file mode 100644
index 000000000000..91221836c5ad
--- /dev/null
+++ b/drivers/gpu/drm/msm/adreno/a4xx_gpu.c
@@ -0,0 +1,604 @@
+/* Copyright (c) 2014 The Linux Foundation. All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 and
+ * only version 2 as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ */
+#include "a4xx_gpu.h"
+#ifdef CONFIG_MSM_OCMEM
+#  include <soc/qcom/ocmem.h>
+#endif
+
+#define A4XX_INT0_MASK \
+	(A4XX_INT0_RBBM_AHB_ERROR |        \
+	 A4XX_INT0_RBBM_ATB_BUS_OVERFLOW | \
+	 A4XX_INT0_CP_T0_PACKET_IN_IB |    \
+	 A4XX_INT0_CP_OPCODE_ERROR |       \
+	 A4XX_INT0_CP_RESERVED_BIT_ERROR | \
+	 A4XX_INT0_CP_HW_FAULT |           \
+	 A4XX_INT0_CP_IB1_INT |            \
+	 A4XX_INT0_CP_IB2_INT |            \
+	 A4XX_INT0_CP_RB_INT |             \
+	 A4XX_INT0_CP_REG_PROTECT_FAULT |  \
+	 A4XX_INT0_CP_AHB_ERROR_HALT |     \
+	 A4XX_INT0_UCHE_OOB_ACCESS)
+
+extern bool hang_debug;
+static void a4xx_dump(struct msm_gpu *gpu);
+
+/*
+ * a4xx_enable_hwcg() - Program the clock control registers
+ * @gpu: The msm_gpu pointer
+ */
+static void a4xx_enable_hwcg(struct msm_gpu *gpu)
+{
+	struct adreno_gpu *adreno_gpu = to_adreno_gpu(gpu);
+	unsigned int i;
+	for (i = 0; i < 4; i++)
+		gpu_write(gpu, REG_A4XX_RBBM_CLOCK_CTL_TP(i), 0x02222202);
+	for (i = 0; i < 4; i++)
+		gpu_write(gpu, REG_A4XX_RBBM_CLOCK_CTL2_TP(i), 0x00002222);
+	for (i = 0; i < 4; i++)
+		gpu_write(gpu, REG_A4XX_RBBM_CLOCK_HYST_TP(i), 0x0E739CE7);
+	for (i = 0; i < 4; i++)
+		gpu_write(gpu, REG_A4XX_RBBM_CLOCK_DELAY_TP(i), 0x00111111);
+	for (i = 0; i < 4; i++)
+		gpu_write(gpu, REG_A4XX_RBBM_CLOCK_CTL_SP(i), 0x22222222);
+	for (i = 0; i < 4; i++)
+		gpu_write(gpu, REG_A4XX_RBBM_CLOCK_CTL2_SP(i), 0x00222222);
+	for (i = 0; i < 4; i++)
+		gpu_write(gpu, REG_A4XX_RBBM_CLOCK_HYST_SP(i), 0x00000104);
+	for (i = 0; i < 4; i++)
+		gpu_write(gpu, REG_A4XX_RBBM_CLOCK_DELAY_SP(i), 0x00000081);
+	gpu_write(gpu, REG_A4XX_RBBM_CLOCK_CTL_UCHE, 0x22222222);
+	gpu_write(gpu, REG_A4XX_RBBM_CLOCK_CTL2_UCHE, 0x02222222);
+	gpu_write(gpu, REG_A4XX_RBBM_CLOCK_CTL3_UCHE, 0x00000000);
+	gpu_write(gpu, REG_A4XX_RBBM_CLOCK_CTL4_UCHE, 0x00000000);
+	gpu_write(gpu, REG_A4XX_RBBM_CLOCK_HYST_UCHE, 0x00004444);
+	gpu_write(gpu, REG_A4XX_RBBM_CLOCK_DELAY_UCHE, 0x00001112);
+	for (i = 0; i < 4; i++)
+		gpu_write(gpu, REG_A4XX_RBBM_CLOCK_CTL_RB(i), 0x22222222);
+
+	/* Disable L1 clocking in A420 due to CCU issues with it */
+	for (i = 0; i < 4; i++) {
+		if (adreno_is_a420(adreno_gpu)) {
+			gpu_write(gpu, REG_A4XX_RBBM_CLOCK_CTL2_RB(i),
+					0x00002020);
+		} else {
+			gpu_write(gpu, REG_A4XX_RBBM_CLOCK_CTL2_RB(i),
+					0x00022020);
+		}
+	}
+
+	for (i = 0; i < 4; i++) {
+		gpu_write(gpu, REG_A4XX_RBBM_CLOCK_CTL_MARB_CCU(i),
+				0x00000922);
+	}
+
+	for (i = 0; i < 4; i++) {
+		gpu_write(gpu, REG_A4XX_RBBM_CLOCK_HYST_RB_MARB_CCU(i),
+				0x00000000);
+	}
+
+	for (i = 0; i < 4; i++) {
+		gpu_write(gpu, REG_A4XX_RBBM_CLOCK_DELAY_RB_MARB_CCU_L1(i),
+				0x00000001);
+	}
+
+	gpu_write(gpu, REG_A4XX_RBBM_CLOCK_MODE_GPC, 0x02222222);
+	gpu_write(gpu, REG_A4XX_RBBM_CLOCK_HYST_GPC, 0x04100104);
+	gpu_write(gpu, REG_A4XX_RBBM_CLOCK_DELAY_GPC, 0x00022222);
+	gpu_write(gpu, REG_A4XX_RBBM_CLOCK_CTL_COM_DCOM, 0x00000022);
+	gpu_write(gpu, REG_A4XX_RBBM_CLOCK_HYST_COM_DCOM, 0x0000010F);
+	gpu_write(gpu, REG_A4XX_RBBM_CLOCK_DELAY_COM_DCOM, 0x00000022);
+	gpu_write(gpu, REG_A4XX_RBBM_CLOCK_CTL_TSE_RAS_RBBM, 0x00222222);
+	gpu_write(gpu, REG_A4XX_RBBM_CLOCK_HYST_TSE_RAS_RBBM, 0x00004104);
+	gpu_write(gpu, REG_A4XX_RBBM_CLOCK_DELAY_TSE_RAS_RBBM, 0x00000222);
+	gpu_write(gpu, REG_A4XX_RBBM_CLOCK_CTL_HLSQ, 0x00000000);
+	gpu_write(gpu, REG_A4XX_RBBM_CLOCK_HYST_HLSQ, 0x00000000);
+	gpu_write(gpu, REG_A4XX_RBBM_CLOCK_DELAY_HLSQ, 0x00020000);
+	gpu_write(gpu, REG_A4XX_RBBM_CLOCK_CTL, 0xAAAAAAAA);
+	gpu_write(gpu, REG_A4XX_RBBM_CLOCK_CTL2, 0);
+}
+
+static void a4xx_me_init(struct msm_gpu *gpu)
+{
+	struct msm_ringbuffer *ring = gpu->rb;
+
+	OUT_PKT3(ring, CP_ME_INIT, 17);
+	OUT_RING(ring, 0x000003f7);
+	OUT_RING(ring, 0x00000000);
+	OUT_RING(ring, 0x00000000);
+	OUT_RING(ring, 0x00000000);
+	OUT_RING(ring, 0x00000080);
+	OUT_RING(ring, 0x00000100);
+	OUT_RING(ring, 0x00000180);
+	OUT_RING(ring, 0x00006600);
+	OUT_RING(ring, 0x00000150);
+	OUT_RING(ring, 0x0000014e);
+	OUT_RING(ring, 0x00000154);
+	OUT_RING(ring, 0x00000001);
+	OUT_RING(ring, 0x00000000);
+	OUT_RING(ring, 0x00000000);
+	OUT_RING(ring, 0x00000000);
+	OUT_RING(ring, 0x00000000);
+	OUT_RING(ring, 0x00000000);
+
+	gpu->funcs->flush(gpu);
+	gpu->funcs->idle(gpu);
+}
+
+static int a4xx_hw_init(struct msm_gpu *gpu)
+{
+	struct adreno_gpu *adreno_gpu = to_adreno_gpu(gpu);
+	struct a4xx_gpu *a4xx_gpu = to_a4xx_gpu(adreno_gpu);
+	uint32_t *ptr, len;
+	int i, ret;
+
+	if (adreno_is_a4xx(adreno_gpu)) {
+		gpu_write(gpu, REG_A4XX_VBIF_ABIT_SORT, 0x0001001F);
+		gpu_write(gpu, REG_A4XX_VBIF_ABIT_SORT_CONF, 0x000000A4);
+		gpu_write(gpu, REG_A4XX_VBIF_GATE_OFF_WRREQ_EN, 0x00000001);
+		gpu_write(gpu, REG_A4XX_VBIF_IN_RD_LIM_CONF0, 0x18181818);
+		gpu_write(gpu, REG_A4XX_VBIF_IN_RD_LIM_CONF1, 0x00000018);
+		gpu_write(gpu, REG_A4XX_VBIF_IN_WR_LIM_CONF0, 0x18181818);
+		gpu_write(gpu, REG_A4XX_VBIF_IN_WR_LIM_CONF1, 0x00000018);
+		gpu_write(gpu, REG_A4XX_VBIF_ROUND_ROBIN_QOS_ARB, 0x00000003);
+	} else {
+		BUG();
+	}
+
+	/* Make all blocks contribute to the GPU BUSY perf counter */
+	gpu_write(gpu, REG_A4XX_RBBM_GPU_BUSY_MASKED, 0xffffffff);
+
+	/* Tune the hysteresis counters for SP and CP idle detection */
+	gpu_write(gpu, REG_A4XX_RBBM_SP_HYST_CNT, 0x10);
+	gpu_write(gpu, REG_A4XX_RBBM_WAIT_IDLE_CLOCKS_CTL, 0x10);
+
+	/* Enable the RBBM error reporting bits */
+	gpu_write(gpu, REG_A4XX_RBBM_AHB_CTL0, 0x00000001);
+
+	/* Enable AHB error reporting */
+	gpu_write(gpu, REG_A4XX_RBBM_AHB_CTL1, 0xa6ffffff);
+
+	/* Enable power counters */
+	gpu_write(gpu, REG_A4XX_RBBM_RBBM_CTL, 0x00000030);
+
+	/*
+	 * Turn on hang detection - this spews a lot of useful information
+	 * into the RBBM registers on a hang:
+	 */
+	gpu_write(gpu, REG_A4XX_RBBM_INTERFACE_HANG_INT_CTL,
+			(1 << 30) | 0xFFFF);
+
+	gpu_write(gpu, REG_A4XX_RB_GMEM_BASE_ADDR,
+			(unsigned int)(a4xx_gpu->ocmem_base >> 14));
+
+	/* Turn on performance counters: */
+	gpu_write(gpu, REG_A4XX_RBBM_PERFCTR_CTL, 0x01);
+
+	/* Disable L2 bypass to avoid UCHE out of bounds errors */
+	gpu_write(gpu, REG_A4XX_UCHE_TRAP_BASE_LO, 0xffff0000);
+	gpu_write(gpu, REG_A4XX_UCHE_TRAP_BASE_HI, 0xffff0000);
+
+	gpu_write(gpu, REG_A4XX_CP_DEBUG, (1 << 25) |
+			(adreno_is_a420(adreno_gpu) ? (1 << 29) : 0));
+
+	a4xx_enable_hwcg(gpu);
+
+	/*
+	 * For A420 set RBBM_CLOCK_DELAY_HLSQ.CGC_HLSQ_TP_EARLY_CYC >= 2
+	 * due to timing issue with HLSQ_TP_CLK_EN
+	 */
+	if (adreno_is_a420(adreno_gpu)) {
+		unsigned int val;
+		val = gpu_read(gpu, REG_A4XX_RBBM_CLOCK_DELAY_HLSQ);
+		val &= ~A4XX_CGC_HLSQ_EARLY_CYC__MASK;
+		val |= 2 << A4XX_CGC_HLSQ_EARLY_CYC__SHIFT;
+		gpu_write(gpu, REG_A4XX_RBBM_CLOCK_DELAY_HLSQ, val);
+	}
+
+	ret = adreno_hw_init(gpu);
+	if (ret)
+		return ret;
+
+	/* setup access protection: */
+	gpu_write(gpu, REG_A4XX_CP_PROTECT_CTRL, 0x00000007);
+
+	/* RBBM registers */
+	gpu_write(gpu, REG_A4XX_CP_PROTECT(0), 0x62000010);
+	gpu_write(gpu, REG_A4XX_CP_PROTECT(1), 0x63000020);
+	gpu_write(gpu, REG_A4XX_CP_PROTECT(2), 0x64000040);
+	gpu_write(gpu, REG_A4XX_CP_PROTECT(3), 0x65000080);
+	gpu_write(gpu, REG_A4XX_CP_PROTECT(4), 0x66000100);
+	gpu_write(gpu, REG_A4XX_CP_PROTECT(5), 0x64000200);
+
+	/* CP registers */
+	gpu_write(gpu, REG_A4XX_CP_PROTECT(6), 0x67000800);
+	gpu_write(gpu, REG_A4XX_CP_PROTECT(7), 0x64001600);
+
+
+	/* RB registers */
+	gpu_write(gpu, REG_A4XX_CP_PROTECT(8), 0x60003300);
+
+	/* HLSQ registers */
+	gpu_write(gpu, REG_A4XX_CP_PROTECT(9), 0x60003800);
+
+	/* VPC registers */
+	gpu_write(gpu, REG_A4XX_CP_PROTECT(10), 0x61003980);
+
+	/* SMMU registers */
+	gpu_write(gpu, REG_A4XX_CP_PROTECT(11), 0x6e010000);
+
+	gpu_write(gpu, REG_A4XX_RBBM_INT_0_MASK, A4XX_INT0_MASK);
+
+	ret = adreno_hw_init(gpu);
+	if (ret)
+		return ret;
+
+	/* Load PM4: */
+	ptr = (uint32_t *)(adreno_gpu->pm4->data);
+	len = adreno_gpu->pm4->size / 4;
+	DBG("loading PM4 ucode version: %u", ptr[0]);
+	gpu_write(gpu, REG_A4XX_CP_ME_RAM_WADDR, 0);
+	for (i = 1; i < len; i++)
+		gpu_write(gpu, REG_A4XX_CP_ME_RAM_DATA, ptr[i]);
+
+	/* Load PFP: */
+	ptr = (uint32_t *)(adreno_gpu->pfp->data);
+	len = adreno_gpu->pfp->size / 4;
+	DBG("loading PFP ucode version: %u", ptr[0]);
+
+	gpu_write(gpu, REG_A4XX_CP_PFP_UCODE_ADDR, 0);
+	for (i = 1; i < len; i++)
+		gpu_write(gpu, REG_A4XX_CP_PFP_UCODE_DATA, ptr[i]);
+
+	/* clear ME_HALT to start micro engine */
+	gpu_write(gpu, REG_A4XX_CP_ME_CNTL, 0);
+
+	a4xx_me_init(gpu);
+	return 0;
+}
+
+static void a4xx_recover(struct msm_gpu *gpu)
+{
+	/* dump registers before resetting gpu, if enabled: */
+	if (hang_debug)
+		a4xx_dump(gpu);
+
+	gpu_write(gpu, REG_A4XX_RBBM_SW_RESET_CMD, 1);
+	gpu_read(gpu, REG_A4XX_RBBM_SW_RESET_CMD);
+	gpu_write(gpu, REG_A4XX_RBBM_SW_RESET_CMD, 0);
+	adreno_recover(gpu);
+}
+
+static void a4xx_destroy(struct msm_gpu *gpu)
+{
+	struct adreno_gpu *adreno_gpu = to_adreno_gpu(gpu);
+	struct a4xx_gpu *a4xx_gpu = to_a4xx_gpu(adreno_gpu);
+
+	DBG("%s", gpu->name);
+
+	adreno_gpu_cleanup(adreno_gpu);
+
+#ifdef CONFIG_MSM_OCMEM
+	if (a4xx_gpu->ocmem_base)
+		ocmem_free(OCMEM_GRAPHICS, a4xx_gpu->ocmem_hdl);
+#endif
+
+	kfree(a4xx_gpu);
+}
+
+static void a4xx_idle(struct msm_gpu *gpu)
+{
+	/* wait for ringbuffer to drain: */
+	adreno_idle(gpu);
+
+	/* then wait for GPU to finish: */
+	if (spin_until(!(gpu_read(gpu, REG_A4XX_RBBM_STATUS) &
+					A4XX_RBBM_STATUS_GPU_BUSY)))
+		DRM_ERROR("%s: timeout waiting for GPU to idle!\n", gpu->name);
+
+	/* TODO maybe we need to reset GPU here to recover from hang? */
+}
+
+static irqreturn_t a4xx_irq(struct msm_gpu *gpu)
+{
+	uint32_t status;
+
+	status = gpu_read(gpu, REG_A4XX_RBBM_INT_0_STATUS);
+	DBG("%s: Int status %08x", gpu->name, status);
+
+	gpu_write(gpu, REG_A4XX_RBBM_INT_CLEAR_CMD, status);
+
+	msm_gpu_retire(gpu);
+
+	return IRQ_HANDLED;
+}
+
+static const unsigned int a4xx_registers[] = {
+	/* RBBM */
+	0x0000, 0x0002, 0x0004, 0x0021, 0x0023, 0x0024, 0x0026, 0x0026,
+	0x0028, 0x002B, 0x002E, 0x0034, 0x0037, 0x0044, 0x0047, 0x0066,
+	0x0068, 0x0095, 0x009C, 0x0170, 0x0174, 0x01AF,
+	/* CP */
+	0x0200, 0x0233, 0x0240, 0x0250, 0x04C0, 0x04DD, 0x0500, 0x050B,
+	0x0578, 0x058F,
+	/* VSC */
+	0x0C00, 0x0C03, 0x0C08, 0x0C41, 0x0C50, 0x0C51,
+	/* GRAS */
+	0x0C80, 0x0C81, 0x0C88, 0x0C8F,
+	/* RB */
+	0x0CC0, 0x0CC0, 0x0CC4, 0x0CD2,
+	/* PC */
+	0x0D00, 0x0D0C, 0x0D10, 0x0D17, 0x0D20, 0x0D23,
+	/* VFD */
+	0x0E40, 0x0E4A,
+	/* VPC */
+	0x0E60, 0x0E61, 0x0E63, 0x0E68,
+	/* UCHE */
+	0x0E80, 0x0E84, 0x0E88, 0x0E95,
+	/* VMIDMT */
+	0x1000, 0x1000, 0x1002, 0x1002, 0x1004, 0x1004, 0x1008, 0x100A,
+	0x100C, 0x100D, 0x100F, 0x1010, 0x1012, 0x1016, 0x1024, 0x1024,
+	0x1027, 0x1027, 0x1100, 0x1100, 0x1102, 0x1102, 0x1104, 0x1104,
+	0x1110, 0x1110, 0x1112, 0x1116, 0x1124, 0x1124, 0x1300, 0x1300,
+	0x1380, 0x1380,
+	/* GRAS CTX 0 */
+	0x2000, 0x2004, 0x2008, 0x2067, 0x2070, 0x2078, 0x207B, 0x216E,
+	/* PC CTX 0 */
+	0x21C0, 0x21C6, 0x21D0, 0x21D0, 0x21D9, 0x21D9, 0x21E5, 0x21E7,
+	/* VFD CTX 0 */
+	0x2200, 0x2204, 0x2208, 0x22A9,
+	/* GRAS CTX 1 */
+	0x2400, 0x2404, 0x2408, 0x2467, 0x2470, 0x2478, 0x247B, 0x256E,
+	/* PC CTX 1 */
+	0x25C0, 0x25C6, 0x25D0, 0x25D0, 0x25D9, 0x25D9, 0x25E5, 0x25E7,
+	/* VFD CTX 1 */
+	0x2600, 0x2604, 0x2608, 0x26A9,
+	/* XPU */
+	0x2C00, 0x2C01, 0x2C10, 0x2C10, 0x2C12, 0x2C16, 0x2C1D, 0x2C20,
+	0x2C28, 0x2C28, 0x2C30, 0x2C30, 0x2C32, 0x2C36, 0x2C40, 0x2C40,
+	0x2C50, 0x2C50, 0x2C52, 0x2C56, 0x2C80, 0x2C80, 0x2C94, 0x2C95,
+	/* VBIF */
+	0x3000, 0x3007, 0x300C, 0x3014, 0x3018, 0x301D, 0x3020, 0x3022,
+	0x3024, 0x3026, 0x3028, 0x302A, 0x302C, 0x302D, 0x3030, 0x3031,
+	0x3034, 0x3036, 0x3038, 0x3038, 0x303C, 0x303D, 0x3040, 0x3040,
+	0x3049, 0x3049, 0x3058, 0x3058, 0x305B, 0x3061, 0x3064, 0x3068,
+	0x306C, 0x306D, 0x3080, 0x3088, 0x308B, 0x308C, 0x3090, 0x3094,
+	0x3098, 0x3098, 0x309C, 0x309C, 0x30C0, 0x30C0, 0x30C8, 0x30C8,
+	0x30D0, 0x30D0, 0x30D8, 0x30D8, 0x30E0, 0x30E0, 0x3100, 0x3100,
+	0x3108, 0x3108, 0x3110, 0x3110, 0x3118, 0x3118, 0x3120, 0x3120,
+	0x3124, 0x3125, 0x3129, 0x3129, 0x3131, 0x3131, 0x330C, 0x330C,
+	0x3310, 0x3310, 0x3400, 0x3401, 0x3410, 0x3410, 0x3412, 0x3416,
+	0x341D, 0x3420, 0x3428, 0x3428, 0x3430, 0x3430, 0x3432, 0x3436,
+	0x3440, 0x3440, 0x3450, 0x3450, 0x3452, 0x3456, 0x3480, 0x3480,
+	0x3494, 0x3495, 0x4000, 0x4000, 0x4002, 0x4002, 0x4004, 0x4004,
+	0x4008, 0x400A, 0x400C, 0x400D, 0x400F, 0x4012, 0x4014, 0x4016,
+	0x401D, 0x401D, 0x4020, 0x4027, 0x4060, 0x4062, 0x4200, 0x4200,
+	0x4300, 0x4300, 0x4400, 0x4400, 0x4500, 0x4500, 0x4800, 0x4802,
+	0x480F, 0x480F, 0x4811, 0x4811, 0x4813, 0x4813, 0x4815, 0x4816,
+	0x482B, 0x482B, 0x4857, 0x4857, 0x4883, 0x4883, 0x48AF, 0x48AF,
+	0x48C5, 0x48C5, 0x48E5, 0x48E5, 0x4905, 0x4905, 0x4925, 0x4925,
+	0x4945, 0x4945, 0x4950, 0x4950, 0x495B, 0x495B, 0x4980, 0x498E,
+	0x4B00, 0x4B00, 0x4C00, 0x4C00, 0x4D00, 0x4D00, 0x4E00, 0x4E00,
+	0x4E80, 0x4E80, 0x4F00, 0x4F00, 0x4F08, 0x4F08, 0x4F10, 0x4F10,
+	0x4F18, 0x4F18, 0x4F20, 0x4F20, 0x4F30, 0x4F30, 0x4F60, 0x4F60,
+	0x4F80, 0x4F81, 0x4F88, 0x4F89, 0x4FEE, 0x4FEE, 0x4FF3, 0x4FF3,
+	0x6000, 0x6001, 0x6008, 0x600F, 0x6014, 0x6016, 0x6018, 0x601B,
+	0x61FD, 0x61FD, 0x623C, 0x623C, 0x6380, 0x6380, 0x63A0, 0x63A0,
+	0x63C0, 0x63C1, 0x63C8, 0x63C9, 0x63D0, 0x63D4, 0x63D6, 0x63D6,
+	0x63EE, 0x63EE, 0x6400, 0x6401, 0x6408, 0x640F, 0x6414, 0x6416,
+	0x6418, 0x641B, 0x65FD, 0x65FD, 0x663C, 0x663C, 0x6780, 0x6780,
+	0x67A0, 0x67A0, 0x67C0, 0x67C1, 0x67C8, 0x67C9, 0x67D0, 0x67D4,
+	0x67D6, 0x67D6, 0x67EE, 0x67EE, 0x6800, 0x6801, 0x6808, 0x680F,
+	0x6814, 0x6816, 0x6818, 0x681B, 0x69FD, 0x69FD, 0x6A3C, 0x6A3C,
+	0x6B80, 0x6B80, 0x6BA0, 0x6BA0, 0x6BC0, 0x6BC1, 0x6BC8, 0x6BC9,
+	0x6BD0, 0x6BD4, 0x6BD6, 0x6BD6, 0x6BEE, 0x6BEE,
+	~0 /* sentinel */
+};
+
+#ifdef CONFIG_DEBUG_FS
+static void a4xx_show(struct msm_gpu *gpu, struct seq_file *m)
+{
+	gpu->funcs->pm_resume(gpu);
+
+	seq_printf(m, "status:   %08x\n",
+			gpu_read(gpu, REG_A4XX_RBBM_STATUS));
+	gpu->funcs->pm_suspend(gpu);
+
+	adreno_show(gpu, m);
+
+}
+#endif
+
+/* Register offset defines for A4XX, in order of enum adreno_regs */
+static const unsigned int a4xx_register_offsets[REG_ADRENO_REGISTER_MAX] = {
+	REG_ADRENO_DEFINE(REG_ADRENO_CP_DEBUG, REG_A4XX_CP_DEBUG),
+	REG_ADRENO_DEFINE(REG_ADRENO_CP_ME_RAM_WADDR, REG_A4XX_CP_ME_RAM_WADDR),
+	REG_ADRENO_DEFINE(REG_ADRENO_CP_ME_RAM_DATA, REG_A4XX_CP_ME_RAM_DATA),
+	REG_ADRENO_DEFINE(REG_ADRENO_CP_PFP_UCODE_DATA,
+			REG_A4XX_CP_PFP_UCODE_DATA),
+	REG_ADRENO_DEFINE(REG_ADRENO_CP_PFP_UCODE_ADDR,
+			REG_A4XX_CP_PFP_UCODE_ADDR),
+	REG_ADRENO_DEFINE(REG_ADRENO_CP_WFI_PEND_CTR, REG_A4XX_CP_WFI_PEND_CTR),
+	REG_ADRENO_DEFINE(REG_ADRENO_CP_RB_BASE, REG_A4XX_CP_RB_BASE),
+	REG_ADRENO_DEFINE(REG_ADRENO_CP_RB_RPTR_ADDR, REG_A4XX_CP_RB_RPTR_ADDR),
+	REG_ADRENO_DEFINE(REG_ADRENO_CP_RB_RPTR, REG_A4XX_CP_RB_RPTR),
+	REG_ADRENO_DEFINE(REG_ADRENO_CP_RB_WPTR, REG_A4XX_CP_RB_WPTR),
+	REG_ADRENO_DEFINE(REG_ADRENO_CP_PROTECT_CTRL, REG_A4XX_CP_PROTECT_CTRL),
+	REG_ADRENO_DEFINE(REG_ADRENO_CP_ME_CNTL, REG_A4XX_CP_ME_CNTL),
+	REG_ADRENO_DEFINE(REG_ADRENO_CP_RB_CNTL, REG_A4XX_CP_RB_CNTL),
+	REG_ADRENO_DEFINE(REG_ADRENO_CP_IB1_BASE, REG_A4XX_CP_IB1_BASE),
+	REG_ADRENO_DEFINE(REG_ADRENO_CP_IB1_BUFSZ, REG_A4XX_CP_IB1_BUFSZ),
+	REG_ADRENO_DEFINE(REG_ADRENO_CP_IB2_BASE, REG_A4XX_CP_IB2_BASE),
+	REG_ADRENO_DEFINE(REG_ADRENO_CP_IB2_BUFSZ, REG_A4XX_CP_IB2_BUFSZ),
+	REG_ADRENO_DEFINE(REG_ADRENO_CP_TIMESTAMP, REG_AXXX_CP_SCRATCH_REG0),
+	REG_ADRENO_DEFINE(REG_ADRENO_CP_ME_RAM_RADDR, REG_A4XX_CP_ME_RAM_RADDR),
+	REG_ADRENO_DEFINE(REG_ADRENO_CP_ROQ_ADDR, REG_A4XX_CP_ROQ_ADDR),
+	REG_ADRENO_DEFINE(REG_ADRENO_CP_ROQ_DATA, REG_A4XX_CP_ROQ_DATA),
+	REG_ADRENO_DEFINE(REG_ADRENO_CP_MERCIU_ADDR, REG_A4XX_CP_MERCIU_ADDR),
+	REG_ADRENO_DEFINE(REG_ADRENO_CP_MERCIU_DATA, REG_A4XX_CP_MERCIU_DATA),
+	REG_ADRENO_DEFINE(REG_ADRENO_CP_MERCIU_DATA2, REG_A4XX_CP_MERCIU_DATA2),
+	REG_ADRENO_DEFINE(REG_ADRENO_CP_MEQ_ADDR, REG_A4XX_CP_MEQ_ADDR),
+	REG_ADRENO_DEFINE(REG_ADRENO_CP_MEQ_DATA, REG_A4XX_CP_MEQ_DATA),
+	REG_ADRENO_DEFINE(REG_ADRENO_CP_HW_FAULT, REG_A4XX_CP_HW_FAULT),
+	REG_ADRENO_DEFINE(REG_ADRENO_CP_PROTECT_STATUS,
+			REG_A4XX_CP_PROTECT_STATUS),
+	REG_ADRENO_DEFINE(REG_ADRENO_SCRATCH_ADDR, REG_A4XX_CP_SCRATCH_ADDR),
+	REG_ADRENO_DEFINE(REG_ADRENO_SCRATCH_UMSK, REG_A4XX_CP_SCRATCH_UMASK),
+	REG_ADRENO_DEFINE(REG_ADRENO_RBBM_STATUS, REG_A4XX_RBBM_STATUS),
+	REG_ADRENO_DEFINE(REG_ADRENO_RBBM_PERFCTR_CTL,
+			REG_A4XX_RBBM_PERFCTR_CTL),
+	REG_ADRENO_DEFINE(REG_ADRENO_RBBM_PERFCTR_LOAD_CMD0,
+			REG_A4XX_RBBM_PERFCTR_LOAD_CMD0),
+	REG_ADRENO_DEFINE(REG_ADRENO_RBBM_PERFCTR_LOAD_CMD1,
+			REG_A4XX_RBBM_PERFCTR_LOAD_CMD1),
+	REG_ADRENO_DEFINE(REG_ADRENO_RBBM_PERFCTR_LOAD_CMD2,
+			REG_A4XX_RBBM_PERFCTR_LOAD_CMD2),
+	REG_ADRENO_DEFINE(REG_ADRENO_RBBM_PERFCTR_PWR_1_LO,
+			REG_A4XX_RBBM_PERFCTR_PWR_1_LO),
+	REG_ADRENO_DEFINE(REG_ADRENO_RBBM_INT_0_MASK, REG_A4XX_RBBM_INT_0_MASK),
+	REG_ADRENO_DEFINE(REG_ADRENO_RBBM_INT_0_STATUS,
+			REG_A4XX_RBBM_INT_0_STATUS),
+	REG_ADRENO_DEFINE(REG_ADRENO_RBBM_AHB_ERROR_STATUS,
+			REG_A4XX_RBBM_AHB_ERROR_STATUS),
+	REG_ADRENO_DEFINE(REG_ADRENO_RBBM_AHB_CMD, REG_A4XX_RBBM_AHB_CMD),
+	REG_ADRENO_DEFINE(REG_ADRENO_RBBM_CLOCK_CTL, REG_A4XX_RBBM_CLOCK_CTL),
+	REG_ADRENO_DEFINE(REG_ADRENO_RBBM_AHB_ME_SPLIT_STATUS,
+			REG_A4XX_RBBM_AHB_ME_SPLIT_STATUS),
+	REG_ADRENO_DEFINE(REG_ADRENO_RBBM_AHB_PFP_SPLIT_STATUS,
+			REG_A4XX_RBBM_AHB_PFP_SPLIT_STATUS),
+	REG_ADRENO_DEFINE(REG_ADRENO_VPC_DEBUG_RAM_SEL,
+			REG_A4XX_VPC_DEBUG_RAM_SEL),
+	REG_ADRENO_DEFINE(REG_ADRENO_VPC_DEBUG_RAM_READ,
+			REG_A4XX_VPC_DEBUG_RAM_READ),
+	REG_ADRENO_DEFINE(REG_ADRENO_RBBM_INT_CLEAR_CMD,
+			REG_A4XX_RBBM_INT_CLEAR_CMD),
+	REG_ADRENO_DEFINE(REG_ADRENO_VSC_SIZE_ADDRESS,
+			REG_A4XX_VSC_SIZE_ADDRESS),
+	REG_ADRENO_DEFINE(REG_ADRENO_VFD_CONTROL_0, REG_A4XX_VFD_CONTROL_0),
+	REG_ADRENO_DEFINE(REG_ADRENO_SP_VS_PVT_MEM_ADDR_REG,
+			REG_A4XX_SP_VS_PVT_MEM_ADDR),
+	REG_ADRENO_DEFINE(REG_ADRENO_SP_FS_PVT_MEM_ADDR_REG,
+			REG_A4XX_SP_FS_PVT_MEM_ADDR),
+	REG_ADRENO_DEFINE(REG_ADRENO_SP_VS_OBJ_START_REG,
+			REG_A4XX_SP_VS_OBJ_START),
+	REG_ADRENO_DEFINE(REG_ADRENO_SP_FS_OBJ_START_REG,
+			REG_A4XX_SP_FS_OBJ_START),
+	REG_ADRENO_DEFINE(REG_ADRENO_RBBM_RBBM_CTL, REG_A4XX_RBBM_RBBM_CTL),
+	REG_ADRENO_DEFINE(REG_ADRENO_RBBM_SW_RESET_CMD,
+			REG_A4XX_RBBM_SW_RESET_CMD),
+	REG_ADRENO_DEFINE(REG_ADRENO_UCHE_INVALIDATE0,
+			REG_A4XX_UCHE_INVALIDATE0),
+	REG_ADRENO_DEFINE(REG_ADRENO_RBBM_PERFCTR_LOAD_VALUE_LO,
+			REG_A4XX_RBBM_PERFCTR_LOAD_VALUE_LO),
+	REG_ADRENO_DEFINE(REG_ADRENO_RBBM_PERFCTR_LOAD_VALUE_HI,
+			REG_A4XX_RBBM_PERFCTR_LOAD_VALUE_HI),
+};
+
+static void a4xx_dump(struct msm_gpu *gpu)
+{
+	adreno_dump(gpu);
+	printk("status:   %08x\n",
+			gpu_read(gpu, REG_A4XX_RBBM_STATUS));
+	adreno_dump(gpu);
+}
+
+static const struct adreno_gpu_funcs funcs = {
+	.base = {
+		.get_param = adreno_get_param,
+		.hw_init = a4xx_hw_init,
+		.pm_suspend = msm_gpu_pm_suspend,
+		.pm_resume = msm_gpu_pm_resume,
+		.recover = a4xx_recover,
+		.last_fence = adreno_last_fence,
+		.submit = adreno_submit,
+		.flush = adreno_flush,
+		.idle = a4xx_idle,
+		.irq = a4xx_irq,
+		.destroy = a4xx_destroy,
+#ifdef CONFIG_DEBUG_FS
+		.show = a4xx_show,
+#endif
+	},
+};
+
+struct msm_gpu *a4xx_gpu_init(struct drm_device *dev)
+{
+	struct a4xx_gpu *a4xx_gpu = NULL;
+	struct adreno_gpu *adreno_gpu;
+	struct msm_gpu *gpu;
+	struct msm_drm_private *priv = dev->dev_private;
+	struct platform_device *pdev = priv->gpu_pdev;
+	int ret;
+
+	if (!pdev) {
+		dev_err(dev->dev, "no a4xx device\n");
+		ret = -ENXIO;
+		goto fail;
+	}
+
+	a4xx_gpu = kzalloc(sizeof(*a4xx_gpu), GFP_KERNEL);
+	if (!a4xx_gpu) {
+		ret = -ENOMEM;
+		goto fail;
+	}
+
+	adreno_gpu = &a4xx_gpu->base;
+	gpu = &adreno_gpu->base;
+
+	a4xx_gpu->pdev = pdev;
+
+	gpu->perfcntrs = NULL;
+	gpu->num_perfcntrs = 0;
+
+	adreno_gpu->registers = a4xx_registers;
+	adreno_gpu->reg_offsets = a4xx_register_offsets;
+
+	ret = adreno_gpu_init(dev, pdev, adreno_gpu, &funcs);
+	if (ret)
+		goto fail;
+
+	/* if needed, allocate gmem: */
+	if (adreno_is_a4xx(adreno_gpu)) {
+#ifdef CONFIG_MSM_OCMEM
+		/* TODO this is different/missing upstream: */
+		struct ocmem_buf *ocmem_hdl =
+				ocmem_allocate(OCMEM_GRAPHICS, adreno_gpu->gmem);
+
+		a4xx_gpu->ocmem_hdl = ocmem_hdl;
+		a4xx_gpu->ocmem_base = ocmem_hdl->addr;
+		adreno_gpu->gmem = ocmem_hdl->len;
+		DBG("using %dK of OCMEM at 0x%08x", adreno_gpu->gmem / 1024,
+				a4xx_gpu->ocmem_base);
+#endif
+	}
+
+	if (!gpu->mmu) {
+		/* TODO we think it is possible to configure the GPU to
+		 * restrict access to VRAM carveout.  But the required
+		 * registers are unknown.  For now just bail out and
+		 * limp along with just modesetting.  If it turns out
+		 * to not be possible to restrict access, then we must
+		 * implement a cmdstream validator.
+		 */
+		dev_err(dev->dev, "No memory protection without IOMMU\n");
+		ret = -ENXIO;
+		goto fail;
+	}
+
+	return gpu;
+
+fail:
+	if (a4xx_gpu)
+		a4xx_destroy(&a4xx_gpu->base.base);
+
+	return ERR_PTR(ret);
+}
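
Reviewer note on `a4xx_registers`: the table reads as (first, last) offset pairs terminated by `~0`. Assuming each pair is an inclusive range, which is the reading under which singleton entries such as `0x0026, 0x0026` make sense, a consumer can walk it as sketched below. `count_dumped_regs()` and the shortened table are hypothetical and exist only for this note; the code that actually consumes the table lives in the common adreno layer and is not shown in this section.

```c
#include <assert.h>
#include <stddef.h>

/* First four pairs from a4xx_registers, plus the sentinel. */
static const unsigned int sample_registers[] = {
	0x0000, 0x0002, 0x0004, 0x0021, 0x0023, 0x0024, 0x0026, 0x0026,
	~0u /* sentinel */
};

/*
 * Hypothetical helper: count the registers covered by a range table,
 * treating each (first, last) pair as inclusive (an assumption).
 */
size_t count_dumped_regs(const unsigned int *regs)
{
	size_t n = 0;
	size_t i;

	assert(regs != NULL);

	/* Step two entries at a time until the ~0 sentinel is reached. */
	for (i = 0; regs[i] != ~0u; i += 2)
		n += regs[i + 1] - regs[i] + 1;

	return n;
}
```

With the four sample pairs this covers 3 + 30 + 2 + 1 registers. The loop relies on the table always holding an even number of entries before the sentinel.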
diff --git a/drivers/gpu/drm/msm/adreno/a4xx_gpu.h b/drivers/gpu/drm/msm/adreno/a4xx_gpu.h
new file mode 100644
index 000000000000..01247204ac92
--- /dev/null
+++ b/drivers/gpu/drm/msm/adreno/a4xx_gpu.h
@@ -0,0 +1,34 @@
+/* Copyright (c) 2014 The Linux Foundation. All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 and
+ * only version 2 as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ */
+#ifndef __A4XX_GPU_H__
+#define __A4XX_GPU_H__
+
+#include "adreno_gpu.h"
+
+/* arrg, somehow fb.h is getting pulled in: */
+#undef ROP_COPY
+#undef ROP_XOR
+
+#include "a4xx.xml.h"
+
+struct a4xx_gpu {
+	struct adreno_gpu base;
+	struct platform_device *pdev;
+
+	/* if OCMEM is used for GMEM: */
+	uint32_t ocmem_base;
+	void *ocmem_hdl;
+};
+#define to_a4xx_gpu(x) container_of(x, struct a4xx_gpu, base)
+
+#endif /* __A4XX_GPU_H__ */
diff --git a/drivers/gpu/drm/msm/adreno/adreno_common.xml.h b/drivers/gpu/drm/msm/adreno/adreno_common.xml.h
index cc341bc62b51..a4b33af9338d 100644
--- a/drivers/gpu/drm/msm/adreno/adreno_common.xml.h
+++ b/drivers/gpu/drm/msm/adreno/adreno_common.xml.h
@@ -11,10 +11,10 @@ The rules-ng-ng source files this header was generated from are:
 - /home/robclark/src/freedreno/envytools/rnndb/adreno.xml               (    364 bytes, from 2013-11-30 14:47:15)
 - /home/robclark/src/freedreno/envytools/rnndb/freedreno_copyright.xml  (   1453 bytes, from 2013-03-31 16:51:27)
 - /home/robclark/src/freedreno/envytools/rnndb/adreno/a2xx.xml          (  32901 bytes, from 2014-06-02 15:21:30)
-- /home/robclark/src/freedreno/envytools/rnndb/adreno/adreno_common.xml (   9859 bytes, from 2014-06-02 15:21:30)
-- /home/robclark/src/freedreno/envytools/rnndb/adreno/adreno_pm4.xml    (  14960 bytes, from 2014-07-27 17:22:13)
-- /home/robclark/src/freedreno/envytools/rnndb/adreno/a3xx.xml          (  58020 bytes, from 2014-08-01 12:22:48)
-- /home/robclark/src/freedreno/envytools/rnndb/adreno/a4xx.xml          (  41068 bytes, from 2014-08-01 12:22:48)
+- /home/robclark/src/freedreno/envytools/rnndb/adreno/adreno_common.xml (  10551 bytes, from 2014-11-13 22:44:30)
+- /home/robclark/src/freedreno/envytools/rnndb/adreno/adreno_pm4.xml    (  15053 bytes, from 2014-11-09 15:45:47)
+- /home/robclark/src/freedreno/envytools/rnndb/adreno/a3xx.xml          (  63169 bytes, from 2014-11-13 22:44:18)
+- /home/robclark/src/freedreno/envytools/rnndb/adreno/a4xx.xml          (  49097 bytes, from 2014-11-14 15:38:00)
 
 Copyright (C) 2013-2014 by the following authors:
 - Rob Clark <robdclark@gmail.com> (robclark)
@@ -105,6 +105,7 @@ enum adreno_rb_dither_mode {
 enum adreno_rb_depth_format {
 	DEPTHX_16 = 0,
 	DEPTHX_24_8 = 1,
+	DEPTHX_32 = 2,
 };
 
 enum adreno_rb_copy_control_mode {
@@ -132,6 +133,7 @@ enum a3xx_threadmode {
 };
 
 enum a3xx_instrbuffermode {
+	CACHE = 0,
 	BUFFER = 1,
 };
 
@@ -140,6 +142,13 @@ enum a3xx_threadsize {
 	FOUR_QUADS = 1,
 };
 
+enum a3xx_color_swap {
+	WZYX = 0,
+	WXYZ = 1,
+	ZYXW = 2,
+	XYZW = 3,
+};
+
 #define REG_AXXX_CP_RB_BASE					0x000001c0
 
 #define REG_AXXX_CP_RB_CNTL					0x000001c1
diff --git a/drivers/gpu/drm/msm/adreno/adreno_device.c b/drivers/gpu/drm/msm/adreno/adreno_device.c
index 7ab85af3a7db..be83dee83d08 100644
--- a/drivers/gpu/drm/msm/adreno/adreno_device.c
+++ b/drivers/gpu/drm/msm/adreno/adreno_device.c
@@ -2,6 +2,8 @@
  * Copyright (C) 2013-2014 Red Hat
  * Author: Rob Clark <robdclark@gmail.com>
  *
+ * Copyright (c) 2014 The Linux Foundation. All rights reserved.
+ *
  * This program is free software; you can redistribute it and/or modify it
  * under the terms of the GNU General Public License version 2 as published by
  * the Free Software Foundation.
@@ -28,6 +30,7 @@ MODULE_PARM_DESC(hang_debug, "Dump registers when hang is detected (can be slow!
 module_param_named(hang_debug, hang_debug, bool, 0600);
 
 struct msm_gpu *a3xx_gpu_init(struct drm_device *dev);
+struct msm_gpu *a4xx_gpu_init(struct drm_device *dev);
 
 static const struct adreno_info gpulist[] = {
 	{
@@ -54,6 +57,14 @@ static const struct adreno_info gpulist[] = {
 		.pfpfw = "a330_pfp.fw",
 		.gmem  = SZ_1M,
 		.init  = a3xx_gpu_init,
+	}, {
+		.rev   = ADRENO_REV(4, 2, 0, ANY_ID),
+		.revn  = 420,
+		.name  = "A420",
+		.pm4fw = "a420_pm4.fw",
+		.pfpfw = "a420_pfp.fw",
+		.gmem  = (SZ_1M + SZ_512K),
+		.init  = a4xx_gpu_init,
 	},
 };
 
@@ -61,6 +72,8 @@ MODULE_FIRMWARE("a300_pm4.fw");
 MODULE_FIRMWARE("a300_pfp.fw");
 MODULE_FIRMWARE("a330_pm4.fw");
 MODULE_FIRMWARE("a330_pfp.fw");
+MODULE_FIRMWARE("a420_pm4.fw");
+MODULE_FIRMWARE("a420_pfp.fw");
 
 static inline bool _rev_match(uint8_t entry, uint8_t id)
 {
diff --git a/drivers/gpu/drm/msm/adreno/adreno_gpu.c b/drivers/gpu/drm/msm/adreno/adreno_gpu.c
index 6afa29167fee..aa873048308b 100644
--- a/drivers/gpu/drm/msm/adreno/adreno_gpu.c
+++ b/drivers/gpu/drm/msm/adreno/adreno_gpu.c
@@ -2,6 +2,8 @@
  * Copyright (C) 2013 Red Hat
  * Author: Rob Clark <robdclark@gmail.com>
  *
+ * Copyright (c) 2014 The Linux Foundation. All rights reserved.
+ *
  * This program is free software; you can redistribute it and/or modify it
  * under the terms of the GNU General Public License version 2 as published by
  * the Free Software Foundation.
@@ -63,19 +65,21 @@ int adreno_hw_init(struct msm_gpu *gpu)
 	}
 
 	/* Setup REG_CP_RB_CNTL: */
-	gpu_write(gpu, REG_AXXX_CP_RB_CNTL,
+	adreno_gpu_write(adreno_gpu, REG_ADRENO_CP_RB_CNTL,
 			/* size is log2(quad-words): */
 			AXXX_CP_RB_CNTL_BUFSZ(ilog2(gpu->rb->size / 8)) |
 			AXXX_CP_RB_CNTL_BLKSZ(ilog2(RB_BLKSIZE / 8)));
 
 	/* Setup ringbuffer address: */
-	gpu_write(gpu, REG_AXXX_CP_RB_BASE, gpu->rb_iova);
-	gpu_write(gpu, REG_AXXX_CP_RB_RPTR_ADDR, rbmemptr(adreno_gpu, rptr));
+	adreno_gpu_write(adreno_gpu, REG_ADRENO_CP_RB_BASE, gpu->rb_iova);
+	adreno_gpu_write(adreno_gpu, REG_ADRENO_CP_RB_RPTR_ADDR,
+			rbmemptr(adreno_gpu, rptr));
 
 	/* Setup scratch/timestamp: */
-	gpu_write(gpu, REG_AXXX_SCRATCH_ADDR, rbmemptr(adreno_gpu, fence));
+	adreno_gpu_write(adreno_gpu, REG_ADRENO_SCRATCH_ADDR,
+			rbmemptr(adreno_gpu, fence));
 
-	gpu_write(gpu, REG_AXXX_SCRATCH_UMSK, 0x1);
+	adreno_gpu_write(adreno_gpu, REG_ADRENO_SCRATCH_UMSK, 0x1);
 
 	return 0;
 }
@@ -151,7 +155,7 @@ int adreno_submit(struct msm_gpu *gpu, struct msm_gem_submit *submit,
 	OUT_PKT0(ring, REG_AXXX_CP_SCRATCH_REG2, 1);
 	OUT_RING(ring, submit->fence);
 
-	if (adreno_is_a3xx(adreno_gpu)) {
+	if (adreno_is_a3xx(adreno_gpu) || adreno_is_a4xx(adreno_gpu)) {
 		/* Flush HLSQ lazy updates to make sure there is nothing
 		 * pending for indirect loads after the timestamp has
 		 * passed:
@@ -188,12 +192,13 @@ int adreno_submit(struct msm_gpu *gpu, struct msm_gem_submit *submit,
 
 void adreno_flush(struct msm_gpu *gpu)
 {
+	struct adreno_gpu *adreno_gpu = to_adreno_gpu(gpu);
 	uint32_t wptr = get_wptr(gpu->rb);
 
 	/* ensure writes to ringbuffer have hit system memory: */
 	mb();
 
-	gpu_write(gpu, REG_AXXX_CP_RB_WPTR, wptr);
+	adreno_gpu_write(adreno_gpu, REG_ADRENO_CP_RB_WPTR, wptr);
 }
 
 void adreno_idle(struct msm_gpu *gpu)
@@ -319,6 +324,12 @@ int adreno_gpu_init(struct drm_device *drm, struct platform_device *pdev,
 	DBG("fast_rate=%u, slow_rate=%u, bus_freq=%u",
 			gpu->fast_rate, gpu->slow_rate, gpu->bus_freq);
 
+	ret = msm_gpu_init(drm, pdev, &adreno_gpu->base, &funcs->base,
+			adreno_gpu->info->name, "kgsl_3d0_reg_memory", "kgsl_3d0_irq",
+			RB_SIZE);
+	if (ret)
+		return ret;
+
 	ret = request_firmware(&adreno_gpu->pm4, adreno_gpu->info->pm4fw, drm->dev);
 	if (ret) {
 		dev_err(drm->dev, "failed to load %s PM4 firmware: %d\n",
@@ -333,12 +344,6 @@ int adreno_gpu_init(struct drm_device *drm, struct platform_device *pdev,
 		return ret;
 	}
 
-	ret = msm_gpu_init(drm, pdev, &adreno_gpu->base, &funcs->base,
-			adreno_gpu->info->name, "kgsl_3d0_reg_memory", "kgsl_3d0_irq",
-			RB_SIZE);
-	if (ret)
-		return ret;
-
 	mmu = gpu->mmu;
 	if (mmu) {
 		ret = mmu->funcs->attach(mmu, iommu_ports,
diff --git a/drivers/gpu/drm/msm/adreno/adreno_gpu.h b/drivers/gpu/drm/msm/adreno/adreno_gpu.h
index 52f051579753..a0cc30977e67 100644
--- a/drivers/gpu/drm/msm/adreno/adreno_gpu.h
+++ b/drivers/gpu/drm/msm/adreno/adreno_gpu.h
@@ -2,6 +2,8 @@
  * Copyright (C) 2013 Red Hat
  * Author: Rob Clark <robdclark@gmail.com>
  *
+ * Copyright (c) 2014 The Linux Foundation. All rights reserved.
+ *
  * This program is free software; you can redistribute it and/or modify it
  * under the terms of the GNU General Public License version 2 as published by
  * the Free Software Foundation.
@@ -25,6 +27,81 @@
 #include "adreno_common.xml.h"
 #include "adreno_pm4.xml.h"
 
+#define REG_ADRENO_DEFINE(_offset, _reg) [_offset] = (_reg) + 1
+/**
+ * adreno_regs: List of registers that are used across all
+ * 3D devices. Each device type has a different offset value for the same
+ * register, so an array of register offsets is declared for every device
+ * and is indexed by the enumeration values defined in this enum.
+ */
+enum adreno_regs {
+	REG_ADRENO_CP_DEBUG,
+	REG_ADRENO_CP_ME_RAM_WADDR,
+	REG_ADRENO_CP_ME_RAM_DATA,
+	REG_ADRENO_CP_PFP_UCODE_DATA,
+	REG_ADRENO_CP_PFP_UCODE_ADDR,
+	REG_ADRENO_CP_WFI_PEND_CTR,
+	REG_ADRENO_CP_RB_BASE,
+	REG_ADRENO_CP_RB_RPTR_ADDR,
+	REG_ADRENO_CP_RB_RPTR,
+	REG_ADRENO_CP_RB_WPTR,
+	REG_ADRENO_CP_PROTECT_CTRL,
+	REG_ADRENO_CP_ME_CNTL,
+	REG_ADRENO_CP_RB_CNTL,
+	REG_ADRENO_CP_IB1_BASE,
+	REG_ADRENO_CP_IB1_BUFSZ,
+	REG_ADRENO_CP_IB2_BASE,
+	REG_ADRENO_CP_IB2_BUFSZ,
+	REG_ADRENO_CP_TIMESTAMP,
+	REG_ADRENO_CP_ME_RAM_RADDR,
+	REG_ADRENO_CP_ROQ_ADDR,
+	REG_ADRENO_CP_ROQ_DATA,
+	REG_ADRENO_CP_MERCIU_ADDR,
+	REG_ADRENO_CP_MERCIU_DATA,
+	REG_ADRENO_CP_MERCIU_DATA2,
+	REG_ADRENO_CP_MEQ_ADDR,
+	REG_ADRENO_CP_MEQ_DATA,
+	REG_ADRENO_CP_HW_FAULT,
+	REG_ADRENO_CP_PROTECT_STATUS,
+	REG_ADRENO_SCRATCH_ADDR,
+	REG_ADRENO_SCRATCH_UMSK,
+	REG_ADRENO_SCRATCH_REG2,
+	REG_ADRENO_RBBM_STATUS,
+	REG_ADRENO_RBBM_PERFCTR_CTL,
+	REG_ADRENO_RBBM_PERFCTR_LOAD_CMD0,
+	REG_ADRENO_RBBM_PERFCTR_LOAD_CMD1,
+	REG_ADRENO_RBBM_PERFCTR_LOAD_CMD2,
+	REG_ADRENO_RBBM_PERFCTR_PWR_1_LO,
+	REG_ADRENO_RBBM_INT_0_MASK,
+	REG_ADRENO_RBBM_INT_0_STATUS,
+	REG_ADRENO_RBBM_AHB_ERROR_STATUS,
+	REG_ADRENO_RBBM_PM_OVERRIDE2,
+	REG_ADRENO_RBBM_AHB_CMD,
+	REG_ADRENO_RBBM_INT_CLEAR_CMD,
+	REG_ADRENO_RBBM_SW_RESET_CMD,
+	REG_ADRENO_RBBM_CLOCK_CTL,
+	REG_ADRENO_RBBM_AHB_ME_SPLIT_STATUS,
+	REG_ADRENO_RBBM_AHB_PFP_SPLIT_STATUS,
+	REG_ADRENO_VPC_DEBUG_RAM_SEL,
+	REG_ADRENO_VPC_DEBUG_RAM_READ,
+	REG_ADRENO_VSC_SIZE_ADDRESS,
+	REG_ADRENO_VFD_CONTROL_0,
+	REG_ADRENO_VFD_INDEX_MAX,
+	REG_ADRENO_SP_VS_PVT_MEM_ADDR_REG,
+	REG_ADRENO_SP_FS_PVT_MEM_ADDR_REG,
+	REG_ADRENO_SP_VS_OBJ_START_REG,
+	REG_ADRENO_SP_FS_OBJ_START_REG,
+	REG_ADRENO_PA_SC_AA_CONFIG,
+	REG_ADRENO_SQ_GPR_MANAGEMENT,
+	REG_ADRENO_SQ_INST_STORE_MANAGMENT,
+	REG_ADRENO_TP0_CHICKEN,
+	REG_ADRENO_RBBM_RBBM_CTL,
+	REG_ADRENO_UCHE_INVALIDATE0,
+	REG_ADRENO_RBBM_PERFCTR_LOAD_VALUE_LO,
+	REG_ADRENO_RBBM_PERFCTR_LOAD_VALUE_HI,
+	REG_ADRENO_REGISTER_MAX,
+};
+
 struct adreno_rev {
 	uint8_t  core;
 	uint8_t  major;
@@ -76,6 +153,13 @@ struct adreno_gpu {
 	struct adreno_rbmemptrs *memptrs;
 	struct drm_gem_object *memptrs_bo;
 	uint32_t memptrs_iova;
+
+	/*
+	 * Register offsets are different between some GPUs.
+	 * GPU specific offsets will be exported by GPU specific
+	 * code (a3xx_gpu.c) and stored in this common location.
+	 */
+	const unsigned int *reg_offsets;
 };
 #define to_adreno_gpu(x) container_of(x, struct adreno_gpu, base)
 
@@ -128,6 +212,16 @@ static inline bool adreno_is_a330v2(struct adreno_gpu *gpu)
 	return adreno_is_a330(gpu) && (gpu->rev.patchid > 0);
 }
 
+static inline bool adreno_is_a4xx(struct adreno_gpu *gpu)
+{
+	return (gpu->revn >= 400) && (gpu->revn < 500);
+}
+
+static inline int adreno_is_a420(struct adreno_gpu *gpu)
+{
+	return gpu->revn == 420;
+}
+
 int adreno_get_param(struct msm_gpu *gpu, uint32_t param, uint64_t *value);
 int adreno_hw_init(struct msm_gpu *gpu);
 uint32_t adreno_last_fence(struct msm_gpu *gpu);
@@ -171,5 +265,37 @@ OUT_PKT3(struct msm_ringbuffer *ring, uint8_t opcode, uint16_t cnt)
 	OUT_RING(ring, CP_TYPE3_PKT | ((cnt-1) << 16) | ((opcode & 0xFF) << 8));
 }
 
+/*
+ * adreno_reg_check() - Checks the validity of a register enum
+ * @gpu:		Pointer to struct adreno_gpu
+ * @offset_name:	The register enum that is checked
+ */
+static inline bool adreno_reg_check(struct adreno_gpu *gpu,
+		enum adreno_regs offset_name)
+{
+	if (offset_name >= REG_ADRENO_REGISTER_MAX ||
+			!gpu->reg_offsets[offset_name]) {
+		BUG();
+	}
+	return true;
+}
+
+static inline u32 adreno_gpu_read(struct adreno_gpu *gpu,
+		enum adreno_regs offset_name)
+{
+	u32 reg = gpu->reg_offsets[offset_name];
+	u32 val = 0;
+	if (adreno_reg_check(gpu, offset_name))
+		val = gpu_read(&gpu->base, reg - 1);
+	return val;
+}
+
+static inline void adreno_gpu_write(struct adreno_gpu *gpu,
+		enum adreno_regs offset_name, u32 data)
+{
+	u32 reg = gpu->reg_offsets[offset_name];
+	if (adreno_reg_check(gpu, offset_name))
+		gpu_write(&gpu->base, reg - 1, data);
+}
 
 #endif /* __ADRENO_GPU_H__ */
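
Note on the register indirection introduced in adreno_gpu.h above: REG_ADRENO_DEFINE stores each hardware offset plus one, so a zero entry in the per-GPU table can mean "this register is not defined for this GPU" even though offset 0 is itself a legitimate register. The accessors subtract one again before touching the hardware. A minimal standalone sketch of that idea follows. All DEMO_/demo_ names and the offset values are hypothetical and are not taken from the driver, and the sketch validates the index before reading the table, which is an assumption about what is desirable rather than what the patch does.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical logical register names (not the driver's enum). */
enum demo_reg {
	DEMO_REG_RB_BASE,
	DEMO_REG_RB_WPTR,
	DEMO_REG_SCRATCH,	/* deliberately left unmapped below */
	DEMO_REG_MAX,
};

/* Store offset + 1 so that a zero entry means "not defined for this GPU". */
#define DEMO_REG_DEFINE(_name, _offset) [_name] = (_offset) + 1

/* Hypothetical per-GPU table; the offsets are made-up values. */
static const unsigned int demo_offsets[DEMO_REG_MAX] = {
	DEMO_REG_DEFINE(DEMO_REG_RB_BASE, 0x0000),
	DEMO_REG_DEFINE(DEMO_REG_RB_WPTR, 0x01c5),
};

/* True if the name is in range and has a mapping in this table. */
static bool demo_reg_valid(const unsigned int *tbl, enum demo_reg name)
{
	return name < DEMO_REG_MAX && tbl[name] != 0;
}

/* Translate a logical name to the real offset, undoing the +1 bias. */
static uint32_t demo_reg_offset(const unsigned int *tbl, enum demo_reg name)
{
	assert(demo_reg_valid(tbl, name));
	return tbl[name] - 1;
}
```

With this layout a caller would pass demo_reg_offset(demo_offsets, DEMO_REG_RB_WPTR) to its MMIO write helper, and an unmapped name such as DEMO_REG_SCRATCH is caught by the validity check rather than silently resolving to offset 0.
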
diff --git a/drivers/gpu/drm/msm/adreno/adreno_pm4.xml.h b/drivers/gpu/drm/msm/adreno/adreno_pm4.xml.h
index 6ef43f66c30a..6a75cee94d81 100644
--- a/drivers/gpu/drm/msm/adreno/adreno_pm4.xml.h
+++ b/drivers/gpu/drm/msm/adreno/adreno_pm4.xml.h
@@ -11,10 +11,10 @@ The rules-ng-ng source files this header was generated from are:
 - /home/robclark/src/freedreno/envytools/rnndb/adreno.xml               (    364 bytes, from 2013-11-30 14:47:15)
 - /home/robclark/src/freedreno/envytools/rnndb/freedreno_copyright.xml  (   1453 bytes, from 2013-03-31 16:51:27)
 - /home/robclark/src/freedreno/envytools/rnndb/adreno/a2xx.xml          (  32901 bytes, from 2014-06-02 15:21:30)
-- /home/robclark/src/freedreno/envytools/rnndb/adreno/adreno_common.xml (   9859 bytes, from 2014-06-02 15:21:30)
-- /home/robclark/src/freedreno/envytools/rnndb/adreno/adreno_pm4.xml    (  14960 bytes, from 2014-07-27 17:22:13)
-- /home/robclark/src/freedreno/envytools/rnndb/adreno/a3xx.xml          (  58020 bytes, from 2014-08-01 12:22:48)
-- /home/robclark/src/freedreno/envytools/rnndb/adreno/a4xx.xml          (  41068 bytes, from 2014-08-01 12:22:48)
+- /home/robclark/src/freedreno/envytools/rnndb/adreno/adreno_common.xml (  10551 bytes, from 2014-11-13 22:44:30)
+- /home/robclark/src/freedreno/envytools/rnndb/adreno/adreno_pm4.xml    (  15053 bytes, from 2014-11-09 15:45:47)
+- /home/robclark/src/freedreno/envytools/rnndb/adreno/a3xx.xml          (  63169 bytes, from 2014-11-13 22:44:18)
+- /home/robclark/src/freedreno/envytools/rnndb/adreno/a4xx.xml          (  49097 bytes, from 2014-11-14 15:38:00)
 
 Copyright (C) 2013-2014 by the following authors:
 - Rob Clark <robdclark@gmail.com> (robclark)
@@ -157,6 +157,7 @@ enum adreno_pm4_type3_packets {
 	CP_IM_STORE = 44,
 	CP_SET_DRAW_INIT_FLAGS = 75,
 	CP_SET_PROTECTED_MODE = 95,
+	CP_BOOTSTRAP_UCODE = 111,
 	CP_LOAD_STATE = 48,
 	CP_COND_INDIRECT_BUFFER_PFE = 58,
 	CP_COND_INDIRECT_BUFFER_PFD = 50,
@@ -278,11 +279,11 @@ static inline uint32_t CP_DRAW_INDX_1_INDEX_SIZE(enum pc_di_index_size val)
 #define CP_DRAW_INDX_1_NOT_EOP					0x00001000
 #define CP_DRAW_INDX_1_SMALL_INDEX				0x00002000
 #define CP_DRAW_INDX_1_PRE_DRAW_INITIATOR_ENABLE		0x00004000
-#define CP_DRAW_INDX_1_NUM_INDICES__MASK			0xffff0000
-#define CP_DRAW_INDX_1_NUM_INDICES__SHIFT			16
-static inline uint32_t CP_DRAW_INDX_1_NUM_INDICES(uint32_t val)
+#define CP_DRAW_INDX_1_NUM_INSTANCES__MASK			0xff000000
+#define CP_DRAW_INDX_1_NUM_INSTANCES__SHIFT			24
+static inline uint32_t CP_DRAW_INDX_1_NUM_INSTANCES(uint32_t val)
 {
-	return ((val) << CP_DRAW_INDX_1_NUM_INDICES__SHIFT) & CP_DRAW_INDX_1_NUM_INDICES__MASK;
+	return ((val) << CP_DRAW_INDX_1_NUM_INSTANCES__SHIFT) & CP_DRAW_INDX_1_NUM_INSTANCES__MASK;
 }
 
 #define REG_CP_DRAW_INDX_2					0x00000002
@@ -293,20 +294,20 @@ static inline uint32_t CP_DRAW_INDX_2_NUM_INDICES(uint32_t val)
 	return ((val) << CP_DRAW_INDX_2_NUM_INDICES__SHIFT) & CP_DRAW_INDX_2_NUM_INDICES__MASK;
 }
 
-#define REG_CP_DRAW_INDX_2					0x00000002
-#define CP_DRAW_INDX_2_INDX_BASE__MASK				0xffffffff
-#define CP_DRAW_INDX_2_INDX_BASE__SHIFT				0
-static inline uint32_t CP_DRAW_INDX_2_INDX_BASE(uint32_t val)
+#define REG_CP_DRAW_INDX_3					0x00000003
+#define CP_DRAW_INDX_3_INDX_BASE__MASK				0xffffffff
+#define CP_DRAW_INDX_3_INDX_BASE__SHIFT				0
+static inline uint32_t CP_DRAW_INDX_3_INDX_BASE(uint32_t val)
 {
-	return ((val) << CP_DRAW_INDX_2_INDX_BASE__SHIFT) & CP_DRAW_INDX_2_INDX_BASE__MASK;
+	return ((val) << CP_DRAW_INDX_3_INDX_BASE__SHIFT) & CP_DRAW_INDX_3_INDX_BASE__MASK;
 }
 
-#define REG_CP_DRAW_INDX_2					0x00000002
-#define CP_DRAW_INDX_2_INDX_SIZE__MASK				0xffffffff
-#define CP_DRAW_INDX_2_INDX_SIZE__SHIFT				0
-static inline uint32_t CP_DRAW_INDX_2_INDX_SIZE(uint32_t val)
+#define REG_CP_DRAW_INDX_4					0x00000004
+#define CP_DRAW_INDX_4_INDX_SIZE__MASK				0xffffffff
+#define CP_DRAW_INDX_4_INDX_SIZE__SHIFT				0
+static inline uint32_t CP_DRAW_INDX_4_INDX_SIZE(uint32_t val)
 {
-	return ((val) << CP_DRAW_INDX_2_INDX_SIZE__SHIFT) & CP_DRAW_INDX_2_INDX_SIZE__MASK;
+	return ((val) << CP_DRAW_INDX_4_INDX_SIZE__SHIFT) & CP_DRAW_INDX_4_INDX_SIZE__MASK;
 }
 
 #define REG_CP_DRAW_INDX_2_0					0x00000000
@@ -345,11 +346,11 @@ static inline uint32_t CP_DRAW_INDX_2_1_INDEX_SIZE(enum pc_di_index_size val)
 #define CP_DRAW_INDX_2_1_NOT_EOP				0x00001000
 #define CP_DRAW_INDX_2_1_SMALL_INDEX				0x00002000
 #define CP_DRAW_INDX_2_1_PRE_DRAW_INITIATOR_ENABLE		0x00004000
-#define CP_DRAW_INDX_2_1_NUM_INDICES__MASK			0xffff0000
-#define CP_DRAW_INDX_2_1_NUM_INDICES__SHIFT			16
-static inline uint32_t CP_DRAW_INDX_2_1_NUM_INDICES(uint32_t val)
+#define CP_DRAW_INDX_2_1_NUM_INSTANCES__MASK			0xff000000
+#define CP_DRAW_INDX_2_1_NUM_INSTANCES__SHIFT			24
+static inline uint32_t CP_DRAW_INDX_2_1_NUM_INSTANCES(uint32_t val)
 {
-	return ((val) << CP_DRAW_INDX_2_1_NUM_INDICES__SHIFT) & CP_DRAW_INDX_2_1_NUM_INDICES__MASK;
+	return ((val) << CP_DRAW_INDX_2_1_NUM_INSTANCES__SHIFT) & CP_DRAW_INDX_2_1_NUM_INSTANCES__MASK;
 }
 
 #define REG_CP_DRAW_INDX_2_2					0x00000002
@@ -388,11 +389,11 @@ static inline uint32_t CP_DRAW_INDX_OFFSET_0_INDEX_SIZE(enum pc_di_index_size va
 #define CP_DRAW_INDX_OFFSET_0_NOT_EOP				0x00001000
 #define CP_DRAW_INDX_OFFSET_0_SMALL_INDEX			0x00002000
 #define CP_DRAW_INDX_OFFSET_0_PRE_DRAW_INITIATOR_ENABLE		0x00004000
-#define CP_DRAW_INDX_OFFSET_0_NUM_INDICES__MASK			0xffff0000
-#define CP_DRAW_INDX_OFFSET_0_NUM_INDICES__SHIFT		16
-static inline uint32_t CP_DRAW_INDX_OFFSET_0_NUM_INDICES(uint32_t val)
+#define CP_DRAW_INDX_OFFSET_0_NUM_INSTANCES__MASK		0xffff0000
+#define CP_DRAW_INDX_OFFSET_0_NUM_INSTANCES__SHIFT		16
+static inline uint32_t CP_DRAW_INDX_OFFSET_0_NUM_INSTANCES(uint32_t val)
 {
-	return ((val) << CP_DRAW_INDX_OFFSET_0_NUM_INDICES__SHIFT) & CP_DRAW_INDX_OFFSET_0_NUM_INDICES__MASK;
+	return ((val) << CP_DRAW_INDX_OFFSET_0_NUM_INSTANCES__SHIFT) & CP_DRAW_INDX_OFFSET_0_NUM_INSTANCES__MASK;
 }
 
 #define REG_CP_DRAW_INDX_OFFSET_1				0x00000001
@@ -405,20 +406,22 @@ static inline uint32_t CP_DRAW_INDX_OFFSET_2_NUM_INDICES(uint32_t val)
 	return ((val) << CP_DRAW_INDX_OFFSET_2_NUM_INDICES__SHIFT) & CP_DRAW_INDX_OFFSET_2_NUM_INDICES__MASK;
 }
 
-#define REG_CP_DRAW_INDX_OFFSET_2				0x00000002
-#define CP_DRAW_INDX_OFFSET_2_INDX_BASE__MASK			0xffffffff
-#define CP_DRAW_INDX_OFFSET_2_INDX_BASE__SHIFT			0
-static inline uint32_t CP_DRAW_INDX_OFFSET_2_INDX_BASE(uint32_t val)
+#define REG_CP_DRAW_INDX_OFFSET_3				0x00000003
+
+#define REG_CP_DRAW_INDX_OFFSET_4				0x00000004
+#define CP_DRAW_INDX_OFFSET_4_INDX_BASE__MASK			0xffffffff
+#define CP_DRAW_INDX_OFFSET_4_INDX_BASE__SHIFT			0
+static inline uint32_t CP_DRAW_INDX_OFFSET_4_INDX_BASE(uint32_t val)
 {
-	return ((val) << CP_DRAW_INDX_OFFSET_2_INDX_BASE__SHIFT) & CP_DRAW_INDX_OFFSET_2_INDX_BASE__MASK;
+	return ((val) << CP_DRAW_INDX_OFFSET_4_INDX_BASE__SHIFT) & CP_DRAW_INDX_OFFSET_4_INDX_BASE__MASK;
 }
 
-#define REG_CP_DRAW_INDX_OFFSET_2				0x00000002
-#define CP_DRAW_INDX_OFFSET_2_INDX_SIZE__MASK			0xffffffff
-#define CP_DRAW_INDX_OFFSET_2_INDX_SIZE__SHIFT			0
-static inline uint32_t CP_DRAW_INDX_OFFSET_2_INDX_SIZE(uint32_t val)
+#define REG_CP_DRAW_INDX_OFFSET_5				0x00000005
+#define CP_DRAW_INDX_OFFSET_5_INDX_SIZE__MASK			0xffffffff
+#define CP_DRAW_INDX_OFFSET_5_INDX_SIZE__SHIFT			0
+static inline uint32_t CP_DRAW_INDX_OFFSET_5_INDX_SIZE(uint32_t val)
 {
-	return ((val) << CP_DRAW_INDX_OFFSET_2_INDX_SIZE__SHIFT) & CP_DRAW_INDX_OFFSET_2_INDX_SIZE__MASK;
+	return ((val) << CP_DRAW_INDX_OFFSET_5_INDX_SIZE__SHIFT) & CP_DRAW_INDX_OFFSET_5_INDX_SIZE__MASK;
 }
 
 #define REG_CP_SET_DRAW_STATE_0					0x00000000
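
Note on the generated field helpers in adreno_pm4.xml.h above: each packet field gets a __MASK, a __SHIFT, and an inline function that shifts a value into place and masks off anything that does not fit. The sketch below reproduces that pattern for an 8-bit field in bits 31:24, the same layout the diff gives CP_DRAW_INDX_1_NUM_INSTANCES. The DEMO_ names and the unpack helper are hypothetical additions for illustration and are not part of the generated header.

```c
#include <stdint.h>

/* Hypothetical 8-bit field occupying bits 31:24 of a packet dword. */
#define DEMO_NUM_INSTANCES__MASK	0xff000000u
#define DEMO_NUM_INSTANCES__SHIFT	24

/* Pack: shift the value into position, then drop bits outside the field. */
static inline uint32_t DEMO_NUM_INSTANCES(uint32_t val)
{
	return (val << DEMO_NUM_INSTANCES__SHIFT) & DEMO_NUM_INSTANCES__MASK;
}

/* Unpack (hypothetical helper): isolate the field, then shift it down. */
static inline uint32_t demo_get_num_instances(uint32_t dword)
{
	return (dword & DEMO_NUM_INSTANCES__MASK) >> DEMO_NUM_INSTANCES__SHIFT;
}
```

Because the mask is applied after the shift, a value wider than the field is truncated rather than spilling into neighbouring bits, so several such helpers can be OR-ed together into one dword.
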
diff --git a/drivers/gpu/drm/msm/dsi/dsi.xml.h b/drivers/gpu/drm/msm/dsi/dsi.xml.h
index e965898dfda6..448438b759b4 100644
--- a/drivers/gpu/drm/msm/dsi/dsi.xml.h
+++ b/drivers/gpu/drm/msm/dsi/dsi.xml.h
@@ -10,12 +10,12 @@ git clone https://github.com/freedreno/envytools.git
 The rules-ng-ng source files this header was generated from are:
 - /home/robclark/src/freedreno/envytools/rnndb/msm.xml                 (    647 bytes, from 2013-11-30 14:45:35)
 - /home/robclark/src/freedreno/envytools/rnndb/freedreno_copyright.xml (   1453 bytes, from 2013-03-31 16:51:27)
-- /home/robclark/src/freedreno/envytools/rnndb/mdp/mdp4.xml            (  20457 bytes, from 2014-08-01 12:22:48)
-- /home/robclark/src/freedreno/envytools/rnndb/mdp/mdp_common.xml      (   1615 bytes, from 2014-07-17 15:34:33)
-- /home/robclark/src/freedreno/envytools/rnndb/mdp/mdp5.xml            (  22517 bytes, from 2014-07-17 15:34:33)
+- /home/robclark/src/freedreno/envytools/rnndb/mdp/mdp4.xml            (  20136 bytes, from 2014-10-31 16:51:39)
+- /home/robclark/src/freedreno/envytools/rnndb/mdp/mdp_common.xml      (   1940 bytes, from 2014-10-31 16:51:39)
+- /home/robclark/src/freedreno/envytools/rnndb/mdp/mdp5.xml            (  23963 bytes, from 2014-10-31 16:51:46)
 - /home/robclark/src/freedreno/envytools/rnndb/dsi/dsi.xml             (  11712 bytes, from 2013-08-17 17:13:43)
 - /home/robclark/src/freedreno/envytools/rnndb/dsi/sfpb.xml            (    344 bytes, from 2013-08-11 19:26:32)
-- /home/robclark/src/freedreno/envytools/rnndb/dsi/mmss_cc.xml         (   1686 bytes, from 2014-08-01 12:23:53)
+- /home/robclark/src/freedreno/envytools/rnndb/dsi/mmss_cc.xml         (   1686 bytes, from 2014-10-31 16:48:57)
 - /home/robclark/src/freedreno/envytools/rnndb/hdmi/qfprom.xml         (    600 bytes, from 2013-07-05 19:21:12)
 - /home/robclark/src/freedreno/envytools/rnndb/hdmi/hdmi.xml           (  23613 bytes, from 2014-07-17 15:33:30)
 
diff --git a/drivers/gpu/drm/msm/dsi/mmss_cc.xml.h b/drivers/gpu/drm/msm/dsi/mmss_cc.xml.h
index f2bdda957205..c102a7f074ac 100644
--- a/drivers/gpu/drm/msm/dsi/mmss_cc.xml.h
+++ b/drivers/gpu/drm/msm/dsi/mmss_cc.xml.h
@@ -10,12 +10,12 @@ git clone https://github.com/freedreno/envytools.git
 The rules-ng-ng source files this header was generated from are:
 - /home/robclark/src/freedreno/envytools/rnndb/msm.xml                 (    647 bytes, from 2013-11-30 14:45:35)
 - /home/robclark/src/freedreno/envytools/rnndb/freedreno_copyright.xml (   1453 bytes, from 2013-03-31 16:51:27)
-- /home/robclark/src/freedreno/envytools/rnndb/mdp/mdp4.xml            (  20457 bytes, from 2014-08-01 12:22:48)
-- /home/robclark/src/freedreno/envytools/rnndb/mdp/mdp_common.xml      (   1615 bytes, from 2014-07-17 15:34:33)
-- /home/robclark/src/freedreno/envytools/rnndb/mdp/mdp5.xml            (  22517 bytes, from 2014-07-17 15:34:33)
+- /home/robclark/src/freedreno/envytools/rnndb/mdp/mdp4.xml            (  20136 bytes, from 2014-10-31 16:51:39)
+- /home/robclark/src/freedreno/envytools/rnndb/mdp/mdp_common.xml      (   1940 bytes, from 2014-10-31 16:51:39)
+- /home/robclark/src/freedreno/envytools/rnndb/mdp/mdp5.xml            (  23963 bytes, from 2014-10-31 16:51:46)
 - /home/robclark/src/freedreno/envytools/rnndb/dsi/dsi.xml             (  11712 bytes, from 2013-08-17 17:13:43)
 - /home/robclark/src/freedreno/envytools/rnndb/dsi/sfpb.xml            (    344 bytes, from 2013-08-11 19:26:32)
-- /home/robclark/src/freedreno/envytools/rnndb/dsi/mmss_cc.xml         (   1686 bytes, from 2014-08-01 12:23:53)
+- /home/robclark/src/freedreno/envytools/rnndb/dsi/mmss_cc.xml         (   1686 bytes, from 2014-10-31 16:48:57)
 - /home/robclark/src/freedreno/envytools/rnndb/hdmi/qfprom.xml         (    600 bytes, from 2013-07-05 19:21:12)
 - /home/robclark/src/freedreno/envytools/rnndb/hdmi/hdmi.xml           (  23613 bytes, from 2014-07-17 15:33:30)
 
diff --git a/drivers/gpu/drm/msm/dsi/sfpb.xml.h b/drivers/gpu/drm/msm/dsi/sfpb.xml.h
index e5b071ffd865..a900134bdf33 100644
--- a/drivers/gpu/drm/msm/dsi/sfpb.xml.h
+++ b/drivers/gpu/drm/msm/dsi/sfpb.xml.h
@@ -10,12 +10,12 @@ git clone https://github.com/freedreno/envytools.git
 The rules-ng-ng source files this header was generated from are:
 - /home/robclark/src/freedreno/envytools/rnndb/msm.xml                 (    647 bytes, from 2013-11-30 14:45:35)
 - /home/robclark/src/freedreno/envytools/rnndb/freedreno_copyright.xml (   1453 bytes, from 2013-03-31 16:51:27)
-- /home/robclark/src/freedreno/envytools/rnndb/mdp/mdp4.xml            (  20457 bytes, from 2014-08-01 12:22:48)
-- /home/robclark/src/freedreno/envytools/rnndb/mdp/mdp_common.xml      (   1615 bytes, from 2014-07-17 15:34:33)
-- /home/robclark/src/freedreno/envytools/rnndb/mdp/mdp5.xml            (  22517 bytes, from 2014-07-17 15:34:33)
+- /home/robclark/src/freedreno/envytools/rnndb/mdp/mdp4.xml            (  20136 bytes, from 2014-10-31 16:51:39)
+- /home/robclark/src/freedreno/envytools/rnndb/mdp/mdp_common.xml      (   1940 bytes, from 2014-10-31 16:51:39)
+- /home/robclark/src/freedreno/envytools/rnndb/mdp/mdp5.xml            (  23963 bytes, from 2014-10-31 16:51:46)
 - /home/robclark/src/freedreno/envytools/rnndb/dsi/dsi.xml             (  11712 bytes, from 2013-08-17 17:13:43)
 - /home/robclark/src/freedreno/envytools/rnndb/dsi/sfpb.xml            (    344 bytes, from 2013-08-11 19:26:32)
-- /home/robclark/src/freedreno/envytools/rnndb/dsi/mmss_cc.xml         (   1686 bytes, from 2014-08-01 12:23:53)
+- /home/robclark/src/freedreno/envytools/rnndb/dsi/mmss_cc.xml         (   1686 bytes, from 2014-10-31 16:48:57)
 - /home/robclark/src/freedreno/envytools/rnndb/hdmi/qfprom.xml         (    600 bytes, from 2013-07-05 19:21:12)
 - /home/robclark/src/freedreno/envytools/rnndb/hdmi/hdmi.xml           (  23613 bytes, from 2014-07-17 15:33:30)
 
diff --git a/drivers/gpu/drm/msm/hdmi/hdmi.c b/drivers/gpu/drm/msm/hdmi/hdmi.c
index 9d00dcba6959..062c68725376 100644
--- a/drivers/gpu/drm/msm/hdmi/hdmi.c
+++ b/drivers/gpu/drm/msm/hdmi/hdmi.c
@@ -15,6 +15,7 @@
  * this program.  If not, see <http://www.gnu.org/licenses/>.
  */
 
+#include <linux/of_irq.h>
 #include "hdmi.h"
 
 void hdmi_set_mode(struct hdmi *hdmi, bool power_on)
@@ -39,7 +40,7 @@ void hdmi_set_mode(struct hdmi *hdmi, bool power_on)
 			power_on ? "Enable" : "Disable", ctrl);
 }
 
-irqreturn_t hdmi_irq(int irq, void *dev_id)
+static irqreturn_t hdmi_irq(int irq, void *dev_id)
 {
 	struct hdmi *hdmi = dev_id;
 
@@ -54,9 +55,8 @@ irqreturn_t hdmi_irq(int irq, void *dev_id)
 	return IRQ_HANDLED;
 }
 
-void hdmi_destroy(struct kref *kref)
+static void hdmi_destroy(struct hdmi *hdmi)
 {
-	struct hdmi *hdmi = container_of(kref, struct hdmi, refcount);
 	struct hdmi_phy *phy = hdmi->phy;
 
 	if (phy)
@@ -68,37 +68,24 @@ void hdmi_destroy(struct kref *kref)
 	platform_set_drvdata(hdmi->pdev, NULL);
 }
 
-/* initialize connector */
-struct hdmi *hdmi_init(struct drm_device *dev, struct drm_encoder *encoder)
+/* construct hdmi at bind/probe time, grab all the resources.  If
+ * we are to EPROBE_DEFER we want to do it here, rather than later
+ * at modeset_init() time
+ */
+static struct hdmi *hdmi_init(struct platform_device *pdev)
 {
+	struct hdmi_platform_config *config = pdev->dev.platform_data;
 	struct hdmi *hdmi = NULL;
-	struct msm_drm_private *priv = dev->dev_private;
-	struct platform_device *pdev = priv->hdmi_pdev;
-	struct hdmi_platform_config *config;
 	int i, ret;
 
-	if (!pdev) {
-		dev_err(dev->dev, "no hdmi device\n");
-		ret = -ENXIO;
-		goto fail;
-	}
-
-	config = pdev->dev.platform_data;
-
-	hdmi = kzalloc(sizeof(*hdmi), GFP_KERNEL);
+	hdmi = devm_kzalloc(&pdev->dev, sizeof(*hdmi), GFP_KERNEL);
 	if (!hdmi) {
 		ret = -ENOMEM;
 		goto fail;
 	}
 
-	kref_init(&hdmi->refcount);
-
-	hdmi->dev = dev;
 	hdmi->pdev = pdev;
 	hdmi->config = config;
-	hdmi->encoder = encoder;
-
-	hdmi_audio_infoframe_init(&hdmi->audio.infoframe);
 
 	/* not sure about which phy maps to which msm.. probably I miss some */
 	if (config->phy_init)
@@ -108,7 +95,7 @@ struct hdmi *hdmi_init(struct drm_device *dev, struct drm_encoder *encoder)
 
 	if (IS_ERR(hdmi->phy)) {
 		ret = PTR_ERR(hdmi->phy);
-		dev_err(dev->dev, "failed to load phy: %d\n", ret);
+		dev_err(&pdev->dev, "failed to load phy: %d\n", ret);
 		hdmi->phy = NULL;
 		goto fail;
 	}
@@ -127,7 +114,7 @@ struct hdmi *hdmi_init(struct drm_device *dev, struct drm_encoder *encoder)
 				config->hpd_reg_names[i]);
 		if (IS_ERR(reg)) {
 			ret = PTR_ERR(reg);
-			dev_err(dev->dev, "failed to get hpd regulator: %s (%d)\n",
+			dev_err(&pdev->dev, "failed to get hpd regulator: %s (%d)\n",
 					config->hpd_reg_names[i], ret);
 			goto fail;
 		}
@@ -143,7 +130,7 @@ struct hdmi *hdmi_init(struct drm_device *dev, struct drm_encoder *encoder)
 				config->pwr_reg_names[i]);
 		if (IS_ERR(reg)) {
 			ret = PTR_ERR(reg);
-			dev_err(dev->dev, "failed to get pwr regulator: %s (%d)\n",
+			dev_err(&pdev->dev, "failed to get pwr regulator: %s (%d)\n",
 					config->pwr_reg_names[i], ret);
 			goto fail;
 		}
@@ -158,7 +145,7 @@ struct hdmi *hdmi_init(struct drm_device *dev, struct drm_encoder *encoder)
 		clk = devm_clk_get(&pdev->dev, config->hpd_clk_names[i]);
 		if (IS_ERR(clk)) {
 			ret = PTR_ERR(clk);
-			dev_err(dev->dev, "failed to get hpd clk: %s (%d)\n",
+			dev_err(&pdev->dev, "failed to get hpd clk: %s (%d)\n",
 					config->hpd_clk_names[i], ret);
 			goto fail;
 		}
@@ -173,7 +160,7 @@ struct hdmi *hdmi_init(struct drm_device *dev, struct drm_encoder *encoder)
 		clk = devm_clk_get(&pdev->dev, config->pwr_clk_names[i]);
 		if (IS_ERR(clk)) {
 			ret = PTR_ERR(clk);
-			dev_err(dev->dev, "failed to get pwr clk: %s (%d)\n",
+			dev_err(&pdev->dev, "failed to get pwr clk: %s (%d)\n",
 					config->pwr_clk_names[i], ret);
 			goto fail;
 		}
@@ -184,11 +171,40 @@ struct hdmi *hdmi_init(struct drm_device *dev, struct drm_encoder *encoder)
 	hdmi->i2c = hdmi_i2c_init(hdmi);
 	if (IS_ERR(hdmi->i2c)) {
 		ret = PTR_ERR(hdmi->i2c);
-		dev_err(dev->dev, "failed to get i2c: %d\n", ret);
+		dev_err(&pdev->dev, "failed to get i2c: %d\n", ret);
 		hdmi->i2c = NULL;
 		goto fail;
 	}
 
+	return hdmi;
+
+fail:
+	if (hdmi)
+		hdmi_destroy(hdmi);
+
+	return ERR_PTR(ret);
+}
+
+/* Second part of initialization, the drm/kms level modeset_init,
+ * constructs/initializes mode objects, etc, is called from master
+ * driver (not hdmi sub-device's probe/bind!)
+ *
+ * Any resource (regulator/clk/etc) which could be missing at boot
+ * should be handled in hdmi_init() so that failure happens from
+ * hdmi sub-device's probe.
+ */
+int hdmi_modeset_init(struct hdmi *hdmi,
+		struct drm_device *dev, struct drm_encoder *encoder)
+{
+	struct msm_drm_private *priv = dev->dev_private;
+	struct platform_device *pdev = hdmi->pdev;
+	int ret;
+
+	hdmi->dev = dev;
+	hdmi->encoder = encoder;
+
+	hdmi_audio_infoframe_init(&hdmi->audio.infoframe);
+
 	hdmi->bridge = hdmi_bridge_init(hdmi);
 	if (IS_ERR(hdmi->bridge)) {
 		ret = PTR_ERR(hdmi->bridge);
@@ -205,22 +221,20 @@ struct hdmi *hdmi_init(struct drm_device *dev, struct drm_encoder *encoder)
 		goto fail;
 	}
 
-	if (!config->shared_irq) {
-		hdmi->irq = platform_get_irq(pdev, 0);
-		if (hdmi->irq < 0) {
-			ret = hdmi->irq;
-			dev_err(dev->dev, "failed to get irq: %d\n", ret);
-			goto fail;
-		}
+	hdmi->irq = irq_of_parse_and_map(pdev->dev.of_node, 0);
+	if (!hdmi->irq) {
+		ret = -EINVAL;
+		dev_err(dev->dev, "failed to get irq: %d\n", ret);
+		goto fail;
+	}
 
-		ret = devm_request_threaded_irq(&pdev->dev, hdmi->irq,
-				NULL, hdmi_irq, IRQF_TRIGGER_HIGH | IRQF_ONESHOT,
-				"hdmi_isr", hdmi);
-		if (ret < 0) {
-			dev_err(dev->dev, "failed to request IRQ%u: %d\n",
-					hdmi->irq, ret);
-			goto fail;
-		}
+	ret = devm_request_irq(&pdev->dev, hdmi->irq,
+			hdmi_irq, IRQF_TRIGGER_HIGH | IRQF_ONESHOT,
+			"hdmi_isr", hdmi);
+	if (ret < 0) {
+		dev_err(dev->dev, "failed to request IRQ%u: %d\n",
+				hdmi->irq, ret);
+		goto fail;
 	}
 
 	encoder->bridge = hdmi->bridge;
@@ -230,19 +244,20 @@ struct hdmi *hdmi_init(struct drm_device *dev, struct drm_encoder *encoder)
 
 	platform_set_drvdata(pdev, hdmi);
 
-	return hdmi;
+	return 0;
 
 fail:
-	if (hdmi) {
-		/* bridge/connector are normally destroyed by drm: */
-		if (hdmi->bridge)
-			hdmi->bridge->funcs->destroy(hdmi->bridge);
-		if (hdmi->connector)
-			hdmi->connector->funcs->destroy(hdmi->connector);
-		hdmi_destroy(&hdmi->refcount);
+	/* bridge/connector are normally destroyed by drm: */
+	if (hdmi->bridge) {
+		hdmi->bridge->funcs->destroy(hdmi->bridge);
+		hdmi->bridge = NULL;
+	}
+	if (hdmi->connector) {
+		hdmi->connector->funcs->destroy(hdmi->connector);
+		hdmi->connector = NULL;
 	}
 
-	return ERR_PTR(ret);
+	return ret;
 }
 
 /*
@@ -251,13 +266,6 @@ fail:
 
 #include <linux/of_gpio.h>
 
-static void set_hdmi_pdev(struct drm_device *dev,
-		struct platform_device *pdev)
-{
-	struct msm_drm_private *priv = dev->dev_private;
-	priv->hdmi_pdev = pdev;
-}
-
 #ifdef CONFIG_OF
 static int get_gpio(struct device *dev, struct device_node *of_node, const char *name)
 {
@@ -278,7 +286,10 @@ static int get_gpio(struct device *dev, struct device_node *of_node, const char
 
 static int hdmi_bind(struct device *dev, struct device *master, void *data)
 {
+	struct drm_device *drm = dev_get_drvdata(master);
+	struct msm_drm_private *priv = drm->dev_private;
 	static struct hdmi_platform_config config = {};
+	struct hdmi *hdmi;
 #ifdef CONFIG_OF
 	struct device_node *of_node = dev->of_node;
 
@@ -298,7 +309,6 @@ static int hdmi_bind(struct device *dev, struct device *master, void *data)
 		config.hpd_clk_cnt   = ARRAY_SIZE(hpd_clk_names);
 		config.pwr_clk_names = pwr_clk_names;
 		config.pwr_clk_cnt   = ARRAY_SIZE(pwr_clk_names);
-		config.shared_irq    = true;
 	} else if (of_device_is_compatible(of_node, "qcom,hdmi-tx-8960")) {
 		static const char *hpd_clk_names[] = {"core_clk", "master_iface_clk", "slave_iface_clk"};
 		static const char *hpd_reg_names[] = {"core-vdda", "hdmi-mux"};
@@ -369,14 +379,22 @@ static int hdmi_bind(struct device *dev, struct device *master, void *data)
 	}
 #endif
 	dev->platform_data = &config;
-	set_hdmi_pdev(dev_get_drvdata(master), to_platform_device(dev));
+	hdmi = hdmi_init(to_platform_device(dev));
+	if (IS_ERR(hdmi))
+		return PTR_ERR(hdmi);
+	priv->hdmi = hdmi;
 	return 0;
 }
 
 static void hdmi_unbind(struct device *dev, struct device *master,
 		void *data)
 {
-	set_hdmi_pdev(dev_get_drvdata(master), NULL);
+	struct drm_device *drm = dev_get_drvdata(master);
+	struct msm_drm_private *priv = drm->dev_private;
+	if (priv->hdmi) {
+		hdmi_destroy(priv->hdmi);
+		priv->hdmi = NULL;
+	}
 }
 
 static const struct component_ops hdmi_ops = {
diff --git a/drivers/gpu/drm/msm/hdmi/hdmi.h b/drivers/gpu/drm/msm/hdmi/hdmi.h
index b981995410b5..43e654f751b7 100644
--- a/drivers/gpu/drm/msm/hdmi/hdmi.h
+++ b/drivers/gpu/drm/msm/hdmi/hdmi.h
@@ -38,8 +38,6 @@ struct hdmi_audio {
 };
 
 struct hdmi {
-	struct kref refcount;
-
 	struct drm_device *dev;
 	struct platform_device *pdev;
 
@@ -97,13 +95,9 @@ struct hdmi_platform_config {
 	/* gpio's: */
 	int ddc_clk_gpio, ddc_data_gpio, hpd_gpio, mux_en_gpio, mux_sel_gpio;
 	int mux_lpm_gpio;
-
-	/* older devices had their own irq, mdp5+ it is shared w/ mdp: */
-	bool shared_irq;
 };
 
 void hdmi_set_mode(struct hdmi *hdmi, bool power_on);
-void hdmi_destroy(struct kref *kref);
 
 static inline void hdmi_write(struct hdmi *hdmi, u32 reg, u32 data)
 {
@@ -115,17 +109,6 @@ static inline u32 hdmi_read(struct hdmi *hdmi, u32 reg)
 	return msm_readl(hdmi->mmio + reg);
 }
 
-static inline struct hdmi * hdmi_reference(struct hdmi *hdmi)
-{
-	kref_get(&hdmi->refcount);
-	return hdmi;
-}
-
-static inline void hdmi_unreference(struct hdmi *hdmi)
-{
-	kref_put(&hdmi->refcount, hdmi_destroy);
-}
-
 /*
  * The phy appears to be different, for example between 8960 and 8x60,
  * so split the phy related functions out and load the correct one at
diff --git a/drivers/gpu/drm/msm/hdmi/hdmi.xml.h b/drivers/gpu/drm/msm/hdmi/hdmi.xml.h
index 76fd0cfc6558..5b0844befbab 100644
--- a/drivers/gpu/drm/msm/hdmi/hdmi.xml.h
+++ b/drivers/gpu/drm/msm/hdmi/hdmi.xml.h
@@ -10,12 +10,12 @@ git clone https://github.com/freedreno/envytools.git
 The rules-ng-ng source files this header was generated from are:
 - /home/robclark/src/freedreno/envytools/rnndb/msm.xml                 (    647 bytes, from 2013-11-30 14:45:35)
 - /home/robclark/src/freedreno/envytools/rnndb/freedreno_copyright.xml (   1453 bytes, from 2013-03-31 16:51:27)
-- /home/robclark/src/freedreno/envytools/rnndb/mdp/mdp4.xml            (  20457 bytes, from 2014-08-01 12:22:48)
-- /home/robclark/src/freedreno/envytools/rnndb/mdp/mdp_common.xml      (   1615 bytes, from 2014-07-17 15:34:33)
-- /home/robclark/src/freedreno/envytools/rnndb/mdp/mdp5.xml            (  22517 bytes, from 2014-07-17 15:34:33)
+- /home/robclark/src/freedreno/envytools/rnndb/mdp/mdp4.xml            (  20136 bytes, from 2014-10-31 16:51:39)
+- /home/robclark/src/freedreno/envytools/rnndb/mdp/mdp_common.xml      (   1940 bytes, from 2014-10-31 16:51:39)
+- /home/robclark/src/freedreno/envytools/rnndb/mdp/mdp5.xml            (  23963 bytes, from 2014-10-31 16:51:46)
 - /home/robclark/src/freedreno/envytools/rnndb/dsi/dsi.xml             (  11712 bytes, from 2013-08-17 17:13:43)
 - /home/robclark/src/freedreno/envytools/rnndb/dsi/sfpb.xml            (    344 bytes, from 2013-08-11 19:26:32)
-- /home/robclark/src/freedreno/envytools/rnndb/dsi/mmss_cc.xml         (   1686 bytes, from 2014-08-01 12:23:53)
+- /home/robclark/src/freedreno/envytools/rnndb/dsi/mmss_cc.xml         (   1686 bytes, from 2014-10-31 16:48:57)
 - /home/robclark/src/freedreno/envytools/rnndb/hdmi/qfprom.xml         (    600 bytes, from 2013-07-05 19:21:12)
 - /home/robclark/src/freedreno/envytools/rnndb/hdmi/hdmi.xml           (  23613 bytes, from 2014-07-17 15:33:30)
 
diff --git a/drivers/gpu/drm/msm/hdmi/hdmi_bridge.c b/drivers/gpu/drm/msm/hdmi/hdmi_bridge.c
index f6cf745c249e..6902ad6da710 100644
--- a/drivers/gpu/drm/msm/hdmi/hdmi_bridge.c
+++ b/drivers/gpu/drm/msm/hdmi/hdmi_bridge.c
@@ -26,7 +26,6 @@ struct hdmi_bridge {
 static void hdmi_bridge_destroy(struct drm_bridge *bridge)
 {
 	struct hdmi_bridge *hdmi_bridge = to_hdmi_bridge(bridge);
-	hdmi_unreference(hdmi_bridge->hdmi);
 	drm_bridge_cleanup(bridge);
 	kfree(hdmi_bridge);
 }
@@ -218,7 +217,7 @@ struct drm_bridge *hdmi_bridge_init(struct hdmi *hdmi)
 		goto fail;
 	}
 
-	hdmi_bridge->hdmi = hdmi_reference(hdmi);
+	hdmi_bridge->hdmi = hdmi;
 
 	bridge = &hdmi_bridge->base;
 
diff --git a/drivers/gpu/drm/msm/hdmi/hdmi_connector.c b/drivers/gpu/drm/msm/hdmi/hdmi_connector.c
index 4aca2a3c667c..fbebb0405d76 100644
--- a/drivers/gpu/drm/msm/hdmi/hdmi_connector.c
+++ b/drivers/gpu/drm/msm/hdmi/hdmi_connector.c
@@ -330,8 +330,6 @@ static void hdmi_connector_destroy(struct drm_connector *connector)
 	drm_connector_unregister(connector);
 	drm_connector_cleanup(connector);
 
-	hdmi_unreference(hdmi_connector->hdmi);
-
 	kfree(hdmi_connector);
 }
 
@@ -401,6 +399,9 @@ static const struct drm_connector_funcs hdmi_connector_funcs = {
 	.detect = hdmi_connector_detect,
 	.fill_modes = drm_helper_probe_single_connector_modes,
 	.destroy = hdmi_connector_destroy,
+	.reset = drm_atomic_helper_connector_reset,
+	.atomic_duplicate_state = drm_atomic_helper_connector_duplicate_state,
+	.atomic_destroy_state = drm_atomic_helper_connector_destroy_state,
 };
 
 static const struct drm_connector_helper_funcs hdmi_connector_helper_funcs = {
@@ -422,7 +423,7 @@ struct drm_connector *hdmi_connector_init(struct hdmi *hdmi)
 		goto fail;
 	}
 
-	hdmi_connector->hdmi = hdmi_reference(hdmi);
+	hdmi_connector->hdmi = hdmi;
 	INIT_WORK(&hdmi_connector->hpd_work, hotplug_work);
 
 	connector = &hdmi_connector->base;
diff --git a/drivers/gpu/drm/msm/hdmi/hdmi_phy_8960.c b/drivers/gpu/drm/msm/hdmi/hdmi_phy_8960.c
index f408b69486a8..eeed006eed13 100644
--- a/drivers/gpu/drm/msm/hdmi/hdmi_phy_8960.c
+++ b/drivers/gpu/drm/msm/hdmi/hdmi_phy_8960.c
@@ -510,7 +510,7 @@ struct hdmi_phy *hdmi_phy_8960_init(struct hdmi *hdmi)
 
 #ifdef CONFIG_COMMON_CLK
 	phy_8960->pll_hw.init = &pll_init;
-	phy_8960->pll = devm_clk_register(hdmi->dev->dev, &phy_8960->pll_hw);
+	phy_8960->pll = devm_clk_register(&hdmi->pdev->dev, &phy_8960->pll_hw);
 	if (IS_ERR(phy_8960->pll)) {
 		ret = PTR_ERR(phy_8960->pll);
 		phy_8960->pll = NULL;
diff --git a/drivers/gpu/drm/msm/hdmi/qfprom.xml.h b/drivers/gpu/drm/msm/hdmi/qfprom.xml.h
index d53c29327df9..29bd796797de 100644
--- a/drivers/gpu/drm/msm/hdmi/qfprom.xml.h
+++ b/drivers/gpu/drm/msm/hdmi/qfprom.xml.h
@@ -10,12 +10,12 @@ git clone https://github.com/freedreno/envytools.git
 The rules-ng-ng source files this header was generated from are:
 - /home/robclark/src/freedreno/envytools/rnndb/msm.xml                 (    647 bytes, from 2013-11-30 14:45:35)
 - /home/robclark/src/freedreno/envytools/rnndb/freedreno_copyright.xml (   1453 bytes, from 2013-03-31 16:51:27)
-- /home/robclark/src/freedreno/envytools/rnndb/mdp/mdp4.xml            (  20457 bytes, from 2014-08-01 12:22:48)
-- /home/robclark/src/freedreno/envytools/rnndb/mdp/mdp_common.xml      (   1615 bytes, from 2014-07-17 15:34:33)
-- /home/robclark/src/freedreno/envytools/rnndb/mdp/mdp5.xml            (  22517 bytes, from 2014-07-17 15:34:33)
+- /home/robclark/src/freedreno/envytools/rnndb/mdp/mdp4.xml            (  20136 bytes, from 2014-10-31 16:51:39)
+- /home/robclark/src/freedreno/envytools/rnndb/mdp/mdp_common.xml      (   1940 bytes, from 2014-10-31 16:51:39)
+- /home/robclark/src/freedreno/envytools/rnndb/mdp/mdp5.xml            (  23963 bytes, from 2014-10-31 16:51:46)
 - /home/robclark/src/freedreno/envytools/rnndb/dsi/dsi.xml             (  11712 bytes, from 2013-08-17 17:13:43)
 - /home/robclark/src/freedreno/envytools/rnndb/dsi/sfpb.xml            (    344 bytes, from 2013-08-11 19:26:32)
-- /home/robclark/src/freedreno/envytools/rnndb/dsi/mmss_cc.xml         (   1686 bytes, from 2014-08-01 12:23:53)
+- /home/robclark/src/freedreno/envytools/rnndb/dsi/mmss_cc.xml         (   1686 bytes, from 2014-10-31 16:48:57)
 - /home/robclark/src/freedreno/envytools/rnndb/hdmi/qfprom.xml         (    600 bytes, from 2013-07-05 19:21:12)
 - /home/robclark/src/freedreno/envytools/rnndb/hdmi/hdmi.xml           (  23613 bytes, from 2014-07-17 15:33:30)
 
diff --git a/drivers/gpu/drm/msm/mdp/mdp4/mdp4.xml.h b/drivers/gpu/drm/msm/mdp/mdp4/mdp4.xml.h
index 03c0bd9cd5b9..a4a7f8c7122a 100644
--- a/drivers/gpu/drm/msm/mdp/mdp4/mdp4.xml.h
+++ b/drivers/gpu/drm/msm/mdp/mdp4/mdp4.xml.h
@@ -10,12 +10,12 @@ git clone https://github.com/freedreno/envytools.git
 The rules-ng-ng source files this header was generated from are:
 - /home/robclark/src/freedreno/envytools/rnndb/msm.xml                 (    647 bytes, from 2013-11-30 14:45:35)
 - /home/robclark/src/freedreno/envytools/rnndb/freedreno_copyright.xml (   1453 bytes, from 2013-03-31 16:51:27)
-- /home/robclark/src/freedreno/envytools/rnndb/mdp/mdp4.xml            (  20457 bytes, from 2014-08-01 12:22:48)
-- /home/robclark/src/freedreno/envytools/rnndb/mdp/mdp_common.xml      (   1615 bytes, from 2014-07-17 15:34:33)
-- /home/robclark/src/freedreno/envytools/rnndb/mdp/mdp5.xml            (  22517 bytes, from 2014-07-17 15:34:33)
+- /home/robclark/src/freedreno/envytools/rnndb/mdp/mdp4.xml            (  20136 bytes, from 2014-10-31 16:51:39)
+- /home/robclark/src/freedreno/envytools/rnndb/mdp/mdp_common.xml      (   1940 bytes, from 2014-10-31 16:51:39)
+- /home/robclark/src/freedreno/envytools/rnndb/mdp/mdp5.xml            (  23963 bytes, from 2014-10-31 16:51:46)
 - /home/robclark/src/freedreno/envytools/rnndb/dsi/dsi.xml             (  11712 bytes, from 2013-08-17 17:13:43)
 - /home/robclark/src/freedreno/envytools/rnndb/dsi/sfpb.xml            (    344 bytes, from 2013-08-11 19:26:32)
-- /home/robclark/src/freedreno/envytools/rnndb/dsi/mmss_cc.xml         (   1686 bytes, from 2014-08-01 12:23:53)
+- /home/robclark/src/freedreno/envytools/rnndb/dsi/mmss_cc.xml         (   1686 bytes, from 2014-10-31 16:48:57)
 - /home/robclark/src/freedreno/envytools/rnndb/hdmi/qfprom.xml         (    600 bytes, from 2013-07-05 19:21:12)
 - /home/robclark/src/freedreno/envytools/rnndb/hdmi/hdmi.xml           (  23613 bytes, from 2014-07-17 15:33:30)
 
diff --git a/drivers/gpu/drm/msm/mdp/mdp4/mdp4_crtc.c b/drivers/gpu/drm/msm/mdp/mdp4/mdp4_crtc.c
index 7d00f7fb5773..a7672e100d8b 100644
--- a/drivers/gpu/drm/msm/mdp/mdp4/mdp4_crtc.c
+++ b/drivers/gpu/drm/msm/mdp/mdp4/mdp4_crtc.c
@@ -25,8 +25,6 @@
 struct mdp4_crtc {
 	struct drm_crtc base;
 	char name[8];
-	struct drm_plane *plane;
-	struct drm_plane *planes[8];
 	int id;
 	int ovlp;
 	enum mdp4_dma dma;
@@ -52,25 +50,11 @@ struct mdp4_crtc {
 
 	/* if there is a pending flip, these will be non-null: */
 	struct drm_pending_vblank_event *event;
-	struct msm_fence_cb pageflip_cb;
 
 #define PENDING_CURSOR 0x1
 #define PENDING_FLIP   0x2
 	atomic_t pending;
 
-	/* the fb that we logically (from PoV of KMS API) hold a ref
-	 * to.  Which we may not yet be scanning out (we may still
-	 * be scanning out previous in case of page_flip while waiting
-	 * for gpu rendering to complete:
-	 */
-	struct drm_framebuffer *fb;
-
-	/* the fb that we currently hold a scanout ref to: */
-	struct drm_framebuffer *scanout_fb;
-
-	/* for unref'ing framebuffers after scanout completes: */
-	struct drm_flip_work unref_fb_work;
-
 	/* for unref'ing cursor bo's after scanout completes: */
 	struct drm_flip_work unref_cursor_work;
 
@@ -97,15 +81,14 @@ static void crtc_flush(struct drm_crtc *crtc)
 {
 	struct mdp4_crtc *mdp4_crtc = to_mdp4_crtc(crtc);
 	struct mdp4_kms *mdp4_kms = get_kms(crtc);
-	uint32_t i, flush = 0;
+	struct drm_plane *plane;
+	uint32_t flush = 0;
 
-	for (i = 0; i < ARRAY_SIZE(mdp4_crtc->planes); i++) {
-		struct drm_plane *plane = mdp4_crtc->planes[i];
-		if (plane) {
-			enum mdp4_pipe pipe_id = mdp4_plane_pipe(plane);
-			flush |= pipe2flush(pipe_id);
-		}
+	drm_atomic_crtc_for_each_plane(plane, crtc) {
+		enum mdp4_pipe pipe_id = mdp4_plane_pipe(plane);
+		flush |= pipe2flush(pipe_id);
 	}
+
 	flush |= ovlp2flush(mdp4_crtc->ovlp);
 
 	DBG("%s: flush=%08x", mdp4_crtc->name, flush);
@@ -113,47 +96,6 @@ static void crtc_flush(struct drm_crtc *crtc)
 	mdp4_write(mdp4_kms, REG_MDP4_OVERLAY_FLUSH, flush);
 }
 
-static void update_fb(struct drm_crtc *crtc, struct drm_framebuffer *new_fb)
-{
-	struct mdp4_crtc *mdp4_crtc = to_mdp4_crtc(crtc);
-	struct drm_framebuffer *old_fb = mdp4_crtc->fb;
-
-	/* grab reference to incoming scanout fb: */
-	drm_framebuffer_reference(new_fb);
-	mdp4_crtc->base.primary->fb = new_fb;
-	mdp4_crtc->fb = new_fb;
-
-	if (old_fb)
-		drm_flip_work_queue(&mdp4_crtc->unref_fb_work, old_fb);
-}
-
-/* unlike update_fb(), take a ref to the new scanout fb *before* updating
- * plane, then call this.  Needed to ensure we don't unref the buffer that
- * is actually still being scanned out.
- *
- * Note that this whole thing goes away with atomic.. since we can defer
- * calling into driver until rendering is done.
- */
-static void update_scanout(struct drm_crtc *crtc, struct drm_framebuffer *fb)
-{
-	struct mdp4_crtc *mdp4_crtc = to_mdp4_crtc(crtc);
-
-	/* flush updates, to make sure hw is updated to new scanout fb,
-	 * so that we can safely queue unref to current fb (ie. next
-	 * vblank we know hw is done w/ previous scanout_fb).
-	 */
-	crtc_flush(crtc);
-
-	if (mdp4_crtc->scanout_fb)
-		drm_flip_work_queue(&mdp4_crtc->unref_fb_work,
-				mdp4_crtc->scanout_fb);
-
-	mdp4_crtc->scanout_fb = fb;
-
-	/* enable vblank to complete flip: */
-	request_pending(crtc, PENDING_FLIP);
-}
-
 /* if file!=NULL, this is preclose potential cancel-flip path */
 static void complete_flip(struct drm_crtc *crtc, struct drm_file *file)
 {
@@ -171,38 +113,13 @@ static void complete_flip(struct drm_crtc *crtc, struct drm_file *file)
 		 */
 		if (!file || (event->base.file_priv == file)) {
 			mdp4_crtc->event = NULL;
+			DBG("%s: send event: %p", mdp4_crtc->name, event);
 			drm_send_vblank_event(dev, mdp4_crtc->id, event);
 		}
 	}
 	spin_unlock_irqrestore(&dev->event_lock, flags);
 }
 
-static void pageflip_cb(struct msm_fence_cb *cb)
-{
-	struct mdp4_crtc *mdp4_crtc =
-		container_of(cb, struct mdp4_crtc, pageflip_cb);
-	struct drm_crtc *crtc = &mdp4_crtc->base;
-	struct drm_framebuffer *fb = crtc->primary->fb;
-
-	if (!fb)
-		return;
-
-	drm_framebuffer_reference(fb);
-	mdp4_plane_set_scanout(mdp4_crtc->plane, fb);
-	update_scanout(crtc, fb);
-}
-
-static void unref_fb_worker(struct drm_flip_work *work, void *val)
-{
-	struct mdp4_crtc *mdp4_crtc =
-		container_of(work, struct mdp4_crtc, unref_fb_work);
-	struct drm_device *dev = mdp4_crtc->base.dev;
-
-	mutex_lock(&dev->mode_config.mutex);
-	drm_framebuffer_unreference(val);
-	mutex_unlock(&dev->mode_config.mutex);
-}
-
 static void unref_cursor_worker(struct drm_flip_work *work, void *val)
 {
 	struct mdp4_crtc *mdp4_crtc =
@@ -218,7 +135,6 @@ static void mdp4_crtc_destroy(struct drm_crtc *crtc)
 	struct mdp4_crtc *mdp4_crtc = to_mdp4_crtc(crtc);
 
 	drm_crtc_cleanup(crtc);
-	drm_flip_work_cleanup(&mdp4_crtc->unref_fb_work);
 	drm_flip_work_cleanup(&mdp4_crtc->unref_cursor_work);
 
 	kfree(mdp4_crtc);
@@ -251,57 +167,70 @@ static bool mdp4_crtc_mode_fixup(struct drm_crtc *crtc,
 	return true;
 }
 
-static void blend_setup(struct drm_crtc *crtc)
+/* statically (for now) map planes to mixer stage (z-order): */
+static const int idxs[] = {
+		[VG1]  = 1,
+		[VG2]  = 2,
+		[RGB1] = 0,
+		[RGB2] = 0,
+		[RGB3] = 0,
+		[VG3]  = 3,
+		[VG4]  = 4,
+
+};
+
+/* setup mixer config, for which we need to consider all crtc's and
+ * the planes attached to them
+ *
+ * TODO may possibly need some extra locking here
+ */
+static void setup_mixer(struct mdp4_kms *mdp4_kms)
 {
-	struct mdp4_crtc *mdp4_crtc = to_mdp4_crtc(crtc);
-	struct mdp4_kms *mdp4_kms = get_kms(crtc);
-	int i, ovlp = mdp4_crtc->ovlp;
+	struct drm_mode_config *config = &mdp4_kms->dev->mode_config;
+	struct drm_crtc *crtc;
 	uint32_t mixer_cfg = 0;
 	static const enum mdp_mixer_stage_id stages[] = {
 			STAGE_BASE, STAGE0, STAGE1, STAGE2, STAGE3,
 	};
-	/* statically (for now) map planes to mixer stage (z-order): */
-	static const int idxs[] = {
-			[VG1]  = 1,
-			[VG2]  = 2,
-			[RGB1] = 0,
-			[RGB2] = 0,
-			[RGB3] = 0,
-			[VG3]  = 3,
-			[VG4]  = 4,
 
-	};
-	bool alpha[4]= { false, false, false, false };
+	list_for_each_entry(crtc, &config->crtc_list, head) {
+		struct mdp4_crtc *mdp4_crtc = to_mdp4_crtc(crtc);
+		struct drm_plane *plane;
 
-	/* Don't rely on value read back from hw, but instead use our
-	 * own shadowed value.  Possibly disable/reenable looses the
-	 * previous value and goes back to power-on default?
-	 */
-	mixer_cfg = mdp4_kms->mixer_cfg;
+		drm_atomic_crtc_for_each_plane(plane, crtc) {
+			enum mdp4_pipe pipe_id = mdp4_plane_pipe(plane);
+			int idx = idxs[pipe_id];
+			mixer_cfg = mixercfg(mixer_cfg, mdp4_crtc->mixer,
+					pipe_id, stages[idx]);
+		}
+	}
+
+	mdp4_write(mdp4_kms, REG_MDP4_LAYERMIXER_IN_CFG, mixer_cfg);
+}
+
+static void blend_setup(struct drm_crtc *crtc)
+{
+	struct mdp4_crtc *mdp4_crtc = to_mdp4_crtc(crtc);
+	struct mdp4_kms *mdp4_kms = get_kms(crtc);
+	struct drm_plane *plane;
+	int i, ovlp = mdp4_crtc->ovlp;
+	bool alpha[4]= { false, false, false, false };
 
 	mdp4_write(mdp4_kms, REG_MDP4_OVLP_TRANSP_LOW0(ovlp), 0);
 	mdp4_write(mdp4_kms, REG_MDP4_OVLP_TRANSP_LOW1(ovlp), 0);
 	mdp4_write(mdp4_kms, REG_MDP4_OVLP_TRANSP_HIGH0(ovlp), 0);
 	mdp4_write(mdp4_kms, REG_MDP4_OVLP_TRANSP_HIGH1(ovlp), 0);
 
-	for (i = 0; i < ARRAY_SIZE(mdp4_crtc->planes); i++) {
-		struct drm_plane *plane = mdp4_crtc->planes[i];
-		if (plane) {
-			enum mdp4_pipe pipe_id = mdp4_plane_pipe(plane);
-			int idx = idxs[pipe_id];
-			if (idx > 0) {
-				const struct mdp_format *format =
+	drm_atomic_crtc_for_each_plane(plane, crtc) {
+		enum mdp4_pipe pipe_id = mdp4_plane_pipe(plane);
+		int idx = idxs[pipe_id];
+		if (idx > 0) {
+			const struct mdp_format *format =
 					to_mdp_format(msm_framebuffer_format(plane->fb));
-				alpha[idx-1] = format->alpha_enable;
-			}
-			mixer_cfg = mixercfg(mixer_cfg, mdp4_crtc->mixer,
-					pipe_id, stages[idx]);
+			alpha[idx-1] = format->alpha_enable;
 		}
 	}
 
-	/* this shouldn't happen.. and seems to cause underflow: */
-	WARN_ON(!mixer_cfg);
-
 	for (i = 0; i < 4; i++) {
 		uint32_t op;
 
@@ -324,22 +253,21 @@ static void blend_setup(struct drm_crtc *crtc)
 		mdp4_write(mdp4_kms, REG_MDP4_OVLP_STAGE_TRANSP_HIGH1(ovlp, i), 0);
 	}
 
-	mdp4_kms->mixer_cfg = mixer_cfg;
-	mdp4_write(mdp4_kms, REG_MDP4_LAYERMIXER_IN_CFG, mixer_cfg);
+	setup_mixer(mdp4_kms);
 }
 
-static int mdp4_crtc_mode_set(struct drm_crtc *crtc,
-		struct drm_display_mode *mode,
-		struct drm_display_mode *adjusted_mode,
-		int x, int y,
-		struct drm_framebuffer *old_fb)
+static void mdp4_crtc_mode_set_nofb(struct drm_crtc *crtc)
 {
 	struct mdp4_crtc *mdp4_crtc = to_mdp4_crtc(crtc);
 	struct mdp4_kms *mdp4_kms = get_kms(crtc);
 	enum mdp4_dma dma = mdp4_crtc->dma;
-	int ret, ovlp = mdp4_crtc->ovlp;
+	int ovlp = mdp4_crtc->ovlp;
+	struct drm_display_mode *mode;
+
+	if (WARN_ON(!crtc->state))
+		return;
 
-	mode = adjusted_mode;
+	mode = &crtc->state->adjusted_mode;
 
 	DBG("%s: set mode: %d:\"%s\" %d %d %d %d %d %d %d %d %d %d 0x%x 0x%x",
 			mdp4_crtc->name, mode->base.id, mode->name,
@@ -350,28 +278,13 @@ static int mdp4_crtc_mode_set(struct drm_crtc *crtc,
 			mode->vsync_end, mode->vtotal,
 			mode->type, mode->flags);
 
-	/* grab extra ref for update_scanout() */
-	drm_framebuffer_reference(crtc->primary->fb);
-
-	ret = mdp4_plane_mode_set(mdp4_crtc->plane, crtc, crtc->primary->fb,
-			0, 0, mode->hdisplay, mode->vdisplay,
-			x << 16, y << 16,
-			mode->hdisplay << 16, mode->vdisplay << 16);
-	if (ret) {
-		drm_framebuffer_unreference(crtc->primary->fb);
-		dev_err(crtc->dev->dev, "%s: failed to set mode on plane: %d\n",
-				mdp4_crtc->name, ret);
-		return ret;
-	}
-
 	mdp4_write(mdp4_kms, REG_MDP4_DMA_SRC_SIZE(dma),
 			MDP4_DMA_SRC_SIZE_WIDTH(mode->hdisplay) |
 			MDP4_DMA_SRC_SIZE_HEIGHT(mode->vdisplay));
 
 	/* take data from pipe: */
 	mdp4_write(mdp4_kms, REG_MDP4_DMA_SRC_BASE(dma), 0);
-	mdp4_write(mdp4_kms, REG_MDP4_DMA_SRC_STRIDE(dma),
-			crtc->primary->fb->pitches[0]);
+	mdp4_write(mdp4_kms, REG_MDP4_DMA_SRC_STRIDE(dma), 0);
 	mdp4_write(mdp4_kms, REG_MDP4_DMA_DST_SIZE(dma),
 			MDP4_DMA_DST_SIZE_WIDTH(0) |
 			MDP4_DMA_DST_SIZE_HEIGHT(0));
@@ -380,8 +293,7 @@ static int mdp4_crtc_mode_set(struct drm_crtc *crtc,
 	mdp4_write(mdp4_kms, REG_MDP4_OVLP_SIZE(ovlp),
 			MDP4_OVLP_SIZE_WIDTH(mode->hdisplay) |
 			MDP4_OVLP_SIZE_HEIGHT(mode->vdisplay));
-	mdp4_write(mdp4_kms, REG_MDP4_OVLP_STRIDE(ovlp),
-			crtc->primary->fb->pitches[0]);
+	mdp4_write(mdp4_kms, REG_MDP4_OVLP_STRIDE(ovlp), 0);
 
 	mdp4_write(mdp4_kms, REG_MDP4_OVLP_CFG(ovlp), 1);
 
@@ -390,11 +302,6 @@ static int mdp4_crtc_mode_set(struct drm_crtc *crtc,
 		mdp4_write(mdp4_kms, REG_MDP4_DMA_E_QUANT(1), 0x00ff0000);
 		mdp4_write(mdp4_kms, REG_MDP4_DMA_E_QUANT(2), 0x00ff0000);
 	}
-
-	update_fb(crtc, crtc->primary->fb);
-	update_scanout(crtc, crtc->primary->fb);
-
-	return 0;
 }
 
 static void mdp4_crtc_prepare(struct drm_crtc *crtc)
@@ -416,60 +323,51 @@ static void mdp4_crtc_commit(struct drm_crtc *crtc)
 	drm_crtc_vblank_put(crtc);
 }
 
-static int mdp4_crtc_mode_set_base(struct drm_crtc *crtc, int x, int y,
-		struct drm_framebuffer *old_fb)
+static void mdp4_crtc_load_lut(struct drm_crtc *crtc)
+{
+}
+
+static int mdp4_crtc_atomic_check(struct drm_crtc *crtc,
+		struct drm_crtc_state *state)
 {
 	struct mdp4_crtc *mdp4_crtc = to_mdp4_crtc(crtc);
-	struct drm_plane *plane = mdp4_crtc->plane;
-	struct drm_display_mode *mode = &crtc->mode;
-	int ret;
+	struct drm_device *dev = crtc->dev;
 
-	/* grab extra ref for update_scanout() */
-	drm_framebuffer_reference(crtc->primary->fb);
+	DBG("%s: check", mdp4_crtc->name);
 
-	ret = mdp4_plane_mode_set(plane, crtc, crtc->primary->fb,
-			0, 0, mode->hdisplay, mode->vdisplay,
-			x << 16, y << 16,
-			mode->hdisplay << 16, mode->vdisplay << 16);
-	if (ret) {
-		drm_framebuffer_unreference(crtc->primary->fb);
-		return ret;
+	if (mdp4_crtc->event) {
+		dev_err(dev->dev, "already pending flip!\n");
+		return -EBUSY;
 	}
 
-	update_fb(crtc, crtc->primary->fb);
-	update_scanout(crtc, crtc->primary->fb);
+	// TODO anything else to check?
 
 	return 0;
 }
 
-static void mdp4_crtc_load_lut(struct drm_crtc *crtc)
+static void mdp4_crtc_atomic_begin(struct drm_crtc *crtc)
 {
+	struct mdp4_crtc *mdp4_crtc = to_mdp4_crtc(crtc);
+	DBG("%s: begin", mdp4_crtc->name);
 }
 
-static int mdp4_crtc_page_flip(struct drm_crtc *crtc,
-		struct drm_framebuffer *new_fb,
-		struct drm_pending_vblank_event *event,
-		uint32_t page_flip_flags)
+static void mdp4_crtc_atomic_flush(struct drm_crtc *crtc)
 {
 	struct mdp4_crtc *mdp4_crtc = to_mdp4_crtc(crtc);
 	struct drm_device *dev = crtc->dev;
-	struct drm_gem_object *obj;
 	unsigned long flags;
 
-	if (mdp4_crtc->event) {
-		dev_err(dev->dev, "already pending flip!\n");
-		return -EBUSY;
-	}
+	DBG("%s: flush", mdp4_crtc->name);
 
-	obj = msm_framebuffer_bo(new_fb, 0);
+	WARN_ON(mdp4_crtc->event);
 
 	spin_lock_irqsave(&dev->event_lock, flags);
-	mdp4_crtc->event = event;
+	mdp4_crtc->event = crtc->state->event;
 	spin_unlock_irqrestore(&dev->event_lock, flags);
 
-	update_fb(crtc, new_fb);
-
-	return msm_gem_queue_inactive_cb(obj, &mdp4_crtc->pageflip_cb);
+	blend_setup(crtc);
+	crtc_flush(crtc);
+	request_pending(crtc, PENDING_FLIP);
 }
 
 static int mdp4_crtc_set_property(struct drm_crtc *crtc,
@@ -607,22 +505,29 @@ static int mdp4_crtc_cursor_move(struct drm_crtc *crtc, int x, int y)
 }
 
 static const struct drm_crtc_funcs mdp4_crtc_funcs = {
-	.set_config = drm_crtc_helper_set_config,
+	.set_config = drm_atomic_helper_set_config,
 	.destroy = mdp4_crtc_destroy,
-	.page_flip = mdp4_crtc_page_flip,
+	.page_flip = drm_atomic_helper_page_flip,
 	.set_property = mdp4_crtc_set_property,
 	.cursor_set = mdp4_crtc_cursor_set,
 	.cursor_move = mdp4_crtc_cursor_move,
+	.reset = drm_atomic_helper_crtc_reset,
+	.atomic_duplicate_state = drm_atomic_helper_crtc_duplicate_state,
+	.atomic_destroy_state = drm_atomic_helper_crtc_destroy_state,
 };
 
 static const struct drm_crtc_helper_funcs mdp4_crtc_helper_funcs = {
 	.dpms = mdp4_crtc_dpms,
 	.mode_fixup = mdp4_crtc_mode_fixup,
-	.mode_set = mdp4_crtc_mode_set,
+	.mode_set_nofb = mdp4_crtc_mode_set_nofb,
+	.mode_set = drm_helper_crtc_mode_set,
+	.mode_set_base = drm_helper_crtc_mode_set_base,
 	.prepare = mdp4_crtc_prepare,
 	.commit = mdp4_crtc_commit,
-	.mode_set_base = mdp4_crtc_mode_set_base,
 	.load_lut = mdp4_crtc_load_lut,
+	.atomic_check = mdp4_crtc_atomic_check,
+	.atomic_begin = mdp4_crtc_atomic_begin,
+	.atomic_flush = mdp4_crtc_atomic_flush,
 };
 
 static void mdp4_crtc_vblank_irq(struct mdp_irq *irq, uint32_t irqstatus)
@@ -638,7 +543,6 @@ static void mdp4_crtc_vblank_irq(struct mdp_irq *irq, uint32_t irqstatus)
 
 	if (pending & PENDING_FLIP) {
 		complete_flip(crtc, NULL);
-		drm_flip_work_commit(&mdp4_crtc->unref_fb_work, priv->wq);
 	}
 
 	if (pending & PENDING_CURSOR) {
@@ -663,7 +567,8 @@ uint32_t mdp4_crtc_vblank(struct drm_crtc *crtc)
 
 void mdp4_crtc_cancel_pending_flip(struct drm_crtc *crtc, struct drm_file *file)
 {
-	DBG("cancel: %p", file);
+	struct mdp4_crtc *mdp4_crtc = to_mdp4_crtc(crtc);
+	DBG("%s: cancel: %p", mdp4_crtc->name, file);
 	complete_flip(crtc, file);
 }
 
@@ -717,35 +622,6 @@ void mdp4_crtc_set_intf(struct drm_crtc *crtc, enum mdp4_intf intf, int mixer)
 	mdp4_write(mdp4_kms, REG_MDP4_DISP_INTF_SEL, intf_sel);
 }
 
-static void set_attach(struct drm_crtc *crtc, enum mdp4_pipe pipe_id,
-		struct drm_plane *plane)
-{
-	struct mdp4_crtc *mdp4_crtc = to_mdp4_crtc(crtc);
-
-	BUG_ON(pipe_id >= ARRAY_SIZE(mdp4_crtc->planes));
-
-	if (mdp4_crtc->planes[pipe_id] == plane)
-		return;
-
-	mdp4_crtc->planes[pipe_id] = plane;
-	blend_setup(crtc);
-	if (mdp4_crtc->enabled && (plane != mdp4_crtc->plane))
-		crtc_flush(crtc);
-}
-
-void mdp4_crtc_attach(struct drm_crtc *crtc, struct drm_plane *plane)
-{
-	set_attach(crtc, mdp4_plane_pipe(plane), plane);
-}
-
-void mdp4_crtc_detach(struct drm_crtc *crtc, struct drm_plane *plane)
-{
-	/* don't actually detatch our primary plane: */
-	if (to_mdp4_crtc(crtc)->plane == plane)
-		return;
-	set_attach(crtc, mdp4_plane_pipe(plane), NULL);
-}
-
 static const char *dma_names[] = {
 		"DMA_P", "DMA_S", "DMA_E",
 };
@@ -757,17 +633,13 @@ struct drm_crtc *mdp4_crtc_init(struct drm_device *dev,
 {
 	struct drm_crtc *crtc = NULL;
 	struct mdp4_crtc *mdp4_crtc;
-	int ret;
 
 	mdp4_crtc = kzalloc(sizeof(*mdp4_crtc), GFP_KERNEL);
-	if (!mdp4_crtc) {
-		ret = -ENOMEM;
-		goto fail;
-	}
+	if (!mdp4_crtc)
+		return ERR_PTR(-ENOMEM);
 
 	crtc = &mdp4_crtc->base;
 
-	mdp4_crtc->plane = plane;
 	mdp4_crtc->id = id;
 
 	mdp4_crtc->ovlp = ovlp_id;
@@ -784,26 +656,14 @@ struct drm_crtc *mdp4_crtc_init(struct drm_device *dev,
 
 	spin_lock_init(&mdp4_crtc->cursor.lock);
 
-	ret = drm_flip_work_init(&mdp4_crtc->unref_fb_work, 16,
-			"unref fb", unref_fb_worker);
-	if (ret)
-		goto fail;
-
-	ret = drm_flip_work_init(&mdp4_crtc->unref_cursor_work, 64,
+	drm_flip_work_init(&mdp4_crtc->unref_cursor_work,
 			"unref cursor", unref_cursor_worker);
 
-	INIT_FENCE_CB(&mdp4_crtc->pageflip_cb, pageflip_cb);
-
 	drm_crtc_init_with_planes(dev, crtc, plane, NULL, &mdp4_crtc_funcs);
 	drm_crtc_helper_add(crtc, &mdp4_crtc_helper_funcs);
+	plane->crtc = crtc;
 
-	mdp4_plane_install_properties(mdp4_crtc->plane, &crtc->base);
+	mdp4_plane_install_properties(plane, &crtc->base);
 
 	return crtc;
-
-fail:
-	if (crtc)
-		mdp4_crtc_destroy(crtc);
-
-	return ERR_PTR(ret);
 }
diff --git a/drivers/gpu/drm/msm/mdp/mdp4/mdp4_kms.c b/drivers/gpu/drm/msm/mdp/mdp4/mdp4_kms.c
index 79d804e61cc4..a62109e4ae0d 100644
--- a/drivers/gpu/drm/msm/mdp/mdp4/mdp4_kms.c
+++ b/drivers/gpu/drm/msm/mdp/mdp4/mdp4_kms.c
@@ -228,7 +228,6 @@ static int modeset_init(struct mdp4_kms *mdp4_kms)
 	struct drm_encoder *encoder;
 	struct drm_connector *connector;
 	struct drm_panel *panel;
-	struct hdmi *hdmi;
 	int ret;
 
 	/* construct non-private planes: */
@@ -326,11 +325,13 @@ static int modeset_init(struct mdp4_kms *mdp4_kms)
 	priv->crtcs[priv->num_crtcs++] = crtc;
 	priv->encoders[priv->num_encoders++] = encoder;
 
-	hdmi = hdmi_init(dev, encoder);
-	if (IS_ERR(hdmi)) {
-		ret = PTR_ERR(hdmi);
-		dev_err(dev->dev, "failed to initialize HDMI: %d\n", ret);
-		goto fail;
+	if (priv->hdmi) {
+		/* Construct bridge/connector for HDMI: */
+		ret = hdmi_modeset_init(priv->hdmi, dev, encoder);
+		if (ret) {
+			dev_err(dev->dev, "failed to initialize HDMI: %d\n", ret);
+			goto fail;
+		}
 	}
 
 	return 0;
@@ -381,6 +382,10 @@ struct msm_kms *mdp4_kms_init(struct drm_device *dev)
 	if (IS_ERR(mdp4_kms->dsi_pll_vddio))
 		mdp4_kms->dsi_pll_vddio = NULL;
 
+	/* NOTE: driver for this regulator still missing upstream.. use
+	 * _get_exclusive() and ignore the error if it does not exist
+	 * (and hope that the bootloader left it on for us)
+	 */
 	mdp4_kms->vdd = devm_regulator_get_exclusive(&pdev->dev, "vdd");
 	if (IS_ERR(mdp4_kms->vdd))
 		mdp4_kms->vdd = NULL;
diff --git a/drivers/gpu/drm/msm/mdp/mdp4/mdp4_kms.h b/drivers/gpu/drm/msm/mdp/mdp4/mdp4_kms.h
index 9ff6e7ccfe90..cbd77bc626d5 100644
--- a/drivers/gpu/drm/msm/mdp/mdp4/mdp4_kms.h
+++ b/drivers/gpu/drm/msm/mdp/mdp4/mdp4_kms.h
@@ -32,13 +32,6 @@ struct mdp4_kms {
 
 	int rev;
 
-	/* Shadow value for MDP4_LAYERMIXER_IN_CFG.. since setup for all
-	 * crtcs/encoders is in one shared register, we need to update it
-	 * via read/modify/write.  But to avoid getting confused by power-
-	 * on-default values after resume, use this shadow value instead:
-	 */
-	uint32_t mixer_cfg;
-
 	/* mapper-id used to request GEM buffer mapped for scanout: */
 	int id;
 
@@ -194,14 +187,6 @@ uint32_t mdp4_get_formats(enum mdp4_pipe pipe_id, uint32_t *pixel_formats,
 
 void mdp4_plane_install_properties(struct drm_plane *plane,
 		struct drm_mode_object *obj);
-void mdp4_plane_set_scanout(struct drm_plane *plane,
-		struct drm_framebuffer *fb);
-int mdp4_plane_mode_set(struct drm_plane *plane,
-		struct drm_crtc *crtc, struct drm_framebuffer *fb,
-		int crtc_x, int crtc_y,
-		unsigned int crtc_w, unsigned int crtc_h,
-		uint32_t src_x, uint32_t src_y,
-		uint32_t src_w, uint32_t src_h);
 enum mdp4_pipe mdp4_plane_pipe(struct drm_plane *plane);
 struct drm_plane *mdp4_plane_init(struct drm_device *dev,
 		enum mdp4_pipe pipe_id, bool private_plane);
@@ -210,8 +195,6 @@ uint32_t mdp4_crtc_vblank(struct drm_crtc *crtc);
 void mdp4_crtc_cancel_pending_flip(struct drm_crtc *crtc, struct drm_file *file);
 void mdp4_crtc_set_config(struct drm_crtc *crtc, uint32_t config);
 void mdp4_crtc_set_intf(struct drm_crtc *crtc, enum mdp4_intf intf, int mixer);
-void mdp4_crtc_attach(struct drm_crtc *crtc, struct drm_plane *plane);
-void mdp4_crtc_detach(struct drm_crtc *crtc, struct drm_plane *plane);
 struct drm_crtc *mdp4_crtc_init(struct drm_device *dev,
 		struct drm_plane *plane, int id, int ovlp_id,
 		enum mdp4_dma dma_id);
diff --git a/drivers/gpu/drm/msm/mdp/mdp4/mdp4_lvds_connector.c b/drivers/gpu/drm/msm/mdp/mdp4/mdp4_lvds_connector.c
index 310034688c15..4ddc28e1275b 100644
--- a/drivers/gpu/drm/msm/mdp/mdp4/mdp4_lvds_connector.c
+++ b/drivers/gpu/drm/msm/mdp/mdp4/mdp4_lvds_connector.c
@@ -98,6 +98,9 @@ static const struct drm_connector_funcs mdp4_lvds_connector_funcs = {
 	.detect = mdp4_lvds_connector_detect,
 	.fill_modes = drm_helper_probe_single_connector_modes,
 	.destroy = mdp4_lvds_connector_destroy,
+	.reset = drm_atomic_helper_connector_reset,
+	.atomic_duplicate_state = drm_atomic_helper_connector_duplicate_state,
+	.atomic_destroy_state = drm_atomic_helper_connector_destroy_state,
 };
 
 static const struct drm_connector_helper_funcs mdp4_lvds_connector_helper_funcs = {
diff --git a/drivers/gpu/drm/msm/mdp/mdp4/mdp4_plane.c b/drivers/gpu/drm/msm/mdp/mdp4/mdp4_plane.c
index 66f33dba1ebb..1e5ebe83647d 100644
--- a/drivers/gpu/drm/msm/mdp/mdp4/mdp4_plane.c
+++ b/drivers/gpu/drm/msm/mdp/mdp4/mdp4_plane.c
@@ -31,47 +31,26 @@ struct mdp4_plane {
 };
 #define to_mdp4_plane(x) container_of(x, struct mdp4_plane, base)
 
-static struct mdp4_kms *get_kms(struct drm_plane *plane)
-{
-	struct msm_drm_private *priv = plane->dev->dev_private;
-	return to_mdp4_kms(to_mdp_kms(priv->kms));
-}
-
-static int mdp4_plane_update(struct drm_plane *plane,
+static void mdp4_plane_set_scanout(struct drm_plane *plane,
+		struct drm_framebuffer *fb);
+static int mdp4_plane_mode_set(struct drm_plane *plane,
 		struct drm_crtc *crtc, struct drm_framebuffer *fb,
 		int crtc_x, int crtc_y,
 		unsigned int crtc_w, unsigned int crtc_h,
 		uint32_t src_x, uint32_t src_y,
-		uint32_t src_w, uint32_t src_h)
-{
-	struct mdp4_plane *mdp4_plane = to_mdp4_plane(plane);
-
-	mdp4_plane->enabled = true;
-
-	if (plane->fb)
-		drm_framebuffer_unreference(plane->fb);
-
-	drm_framebuffer_reference(fb);
-
-	return mdp4_plane_mode_set(plane, crtc, fb,
-			crtc_x, crtc_y, crtc_w, crtc_h,
-			src_x, src_y, src_w, src_h);
-}
+		uint32_t src_w, uint32_t src_h);
 
-static int mdp4_plane_disable(struct drm_plane *plane)
+static struct mdp4_kms *get_kms(struct drm_plane *plane)
 {
-	struct mdp4_plane *mdp4_plane = to_mdp4_plane(plane);
-	DBG("%s: disable", mdp4_plane->name);
-	if (plane->crtc)
-		mdp4_crtc_detach(plane->crtc, plane);
-	return 0;
+	struct msm_drm_private *priv = plane->dev->dev_private;
+	return to_mdp4_kms(to_mdp_kms(priv->kms));
 }
 
 static void mdp4_plane_destroy(struct drm_plane *plane)
 {
 	struct mdp4_plane *mdp4_plane = to_mdp4_plane(plane);
 
-	mdp4_plane_disable(plane);
+	drm_plane_helper_disable(plane);
 	drm_plane_cleanup(plane);
 
 	kfree(mdp4_plane);
@@ -92,19 +71,75 @@ int mdp4_plane_set_property(struct drm_plane *plane,
 }
 
 static const struct drm_plane_funcs mdp4_plane_funcs = {
-		.update_plane = mdp4_plane_update,
-		.disable_plane = mdp4_plane_disable,
+		.update_plane = drm_atomic_helper_update_plane,
+		.disable_plane = drm_atomic_helper_disable_plane,
 		.destroy = mdp4_plane_destroy,
 		.set_property = mdp4_plane_set_property,
+		.reset = drm_atomic_helper_plane_reset,
+		.atomic_duplicate_state = drm_atomic_helper_plane_duplicate_state,
+		.atomic_destroy_state = drm_atomic_helper_plane_destroy_state,
 };
 
-void mdp4_plane_set_scanout(struct drm_plane *plane,
+static int mdp4_plane_prepare_fb(struct drm_plane *plane,
+		struct drm_framebuffer *fb)
+{
+	struct mdp4_plane *mdp4_plane = to_mdp4_plane(plane);
+	struct mdp4_kms *mdp4_kms = get_kms(plane);
+
+	DBG("%s: prepare: FB[%u]", mdp4_plane->name, fb->base.id);
+	return msm_framebuffer_prepare(fb, mdp4_kms->id);
+}
+
+static void mdp4_plane_cleanup_fb(struct drm_plane *plane,
+		struct drm_framebuffer *fb)
+{
+	struct mdp4_plane *mdp4_plane = to_mdp4_plane(plane);
+	struct mdp4_kms *mdp4_kms = get_kms(plane);
+
+	DBG("%s: cleanup: FB[%u]", mdp4_plane->name, fb->base.id);
+	msm_framebuffer_cleanup(fb, mdp4_kms->id);
+}
+
+
+static int mdp4_plane_atomic_check(struct drm_plane *plane,
+		struct drm_plane_state *state)
+{
+	return 0;
+}
+
+static void mdp4_plane_atomic_update(struct drm_plane *plane,
+				     struct drm_plane_state *old_state)
+{
+	struct drm_plane_state *state = plane->state;
+	int ret;
+
+	ret = mdp4_plane_mode_set(plane,
+			state->crtc, state->fb,
+			state->crtc_x, state->crtc_y,
+			state->crtc_w, state->crtc_h,
+			state->src_x,  state->src_y,
+			state->src_w, state->src_h);
+	/* atomic_check should have ensured that this doesn't fail */
+	WARN_ON(ret < 0);
+}
+
+static const struct drm_plane_helper_funcs mdp4_plane_helper_funcs = {
+		.prepare_fb = mdp4_plane_prepare_fb,
+		.cleanup_fb = mdp4_plane_cleanup_fb,
+		.atomic_check = mdp4_plane_atomic_check,
+		.atomic_update = mdp4_plane_atomic_update,
+};
+
+static void mdp4_plane_set_scanout(struct drm_plane *plane,
 		struct drm_framebuffer *fb)
 {
 	struct mdp4_plane *mdp4_plane = to_mdp4_plane(plane);
 	struct mdp4_kms *mdp4_kms = get_kms(plane);
 	enum mdp4_pipe pipe = mdp4_plane->pipe;
-	uint32_t iova;
+	uint32_t iova = msm_framebuffer_iova(fb, mdp4_kms->id, 0);
+
+	DBG("%s: set_scanout: %08x (%u)", mdp4_plane->name,
+			iova, fb->pitches[0]);
 
 	mdp4_write(mdp4_kms, REG_MDP4_PIPE_SRC_STRIDE_A(pipe),
 			MDP4_PIPE_SRC_STRIDE_A_P0(fb->pitches[0]) |
@@ -114,7 +149,6 @@ void mdp4_plane_set_scanout(struct drm_plane *plane,
 			MDP4_PIPE_SRC_STRIDE_B_P2(fb->pitches[2]) |
 			MDP4_PIPE_SRC_STRIDE_B_P3(fb->pitches[3]));
 
-	msm_gem_get_iova(msm_framebuffer_bo(fb, 0), mdp4_kms->id, &iova);
 	mdp4_write(mdp4_kms, REG_MDP4_PIPE_SRCP0_BASE(pipe), iova);
 
 	plane->fb = fb;
@@ -122,7 +156,7 @@ void mdp4_plane_set_scanout(struct drm_plane *plane,
 
 #define MDP4_VG_PHASE_STEP_DEFAULT	0x20000000
 
-int mdp4_plane_mode_set(struct drm_plane *plane,
+static int mdp4_plane_mode_set(struct drm_plane *plane,
 		struct drm_crtc *crtc, struct drm_framebuffer *fb,
 		int crtc_x, int crtc_y,
 		unsigned int crtc_w, unsigned int crtc_h,
@@ -137,6 +171,11 @@ int mdp4_plane_mode_set(struct drm_plane *plane,
 	uint32_t phasex_step = MDP4_VG_PHASE_STEP_DEFAULT;
 	uint32_t phasey_step = MDP4_VG_PHASE_STEP_DEFAULT;
 
+	if (!(crtc && fb)) {
+		DBG("%s: disabled!", mdp4_plane->name);
+		return 0;
+	}
+
 	/* src values are in Q16 fixed point, convert to integer: */
 	src_x = src_x >> 16;
 	src_y = src_y >> 16;
@@ -197,9 +236,6 @@ int mdp4_plane_mode_set(struct drm_plane *plane,
 	mdp4_write(mdp4_kms, REG_MDP4_PIPE_PHASEX_STEP(pipe), phasex_step);
 	mdp4_write(mdp4_kms, REG_MDP4_PIPE_PHASEY_STEP(pipe), phasey_step);
 
-	/* TODO detach from old crtc (if we had more than one) */
-	mdp4_crtc_attach(crtc, plane);
-
 	return 0;
 }
 
@@ -239,9 +275,12 @@ struct drm_plane *mdp4_plane_init(struct drm_device *dev,
 			ARRAY_SIZE(mdp4_plane->formats));
 
 	type = private_plane ? DRM_PLANE_TYPE_PRIMARY : DRM_PLANE_TYPE_OVERLAY;
-	drm_universal_plane_init(dev, plane, 0xff, &mdp4_plane_funcs,
-				 mdp4_plane->formats, mdp4_plane->nformats,
-				 type);
+	ret = drm_universal_plane_init(dev, plane, 0xff, &mdp4_plane_funcs,
+				 mdp4_plane->formats, mdp4_plane->nformats, type);
+	if (ret)
+		goto fail;
+
+	drm_plane_helper_add(plane, &mdp4_plane_helper_funcs);
 
 	mdp4_plane_install_properties(plane, &plane->base);
 
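Note on the Q16 source coordinates used by mdp4_plane_mode_set() above: the
src_* values arrive in 16.16 fixed point and are shifted down to whole pixels.
The following is a minimal user-space sketch of that conversion, not driver
code. The helper names q16_to_int() and int_to_q16() are hypothetical and exist
only for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: drop the 16 fractional bits of a 16.16 value. */
uint32_t q16_to_int(uint32_t v)
{
	return v >> 16;
}

/* Hypothetical helper: encode a whole number as 16.16 fixed point. */
uint32_t int_to_q16(uint32_t v)
{
	/* only 16 integer bits are available */
	assert(v <= 0xffff);
	return v << 16;
}
```

For example, a 1920 pixel wide source round-trips unchanged, while 0x00018000
(1.5 in Q16) truncates to 1, so any fractional part is discarded rather than
rounded.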
diff --git a/drivers/gpu/drm/msm/mdp/mdp5/mdp5.xml.h b/drivers/gpu/drm/msm/mdp/mdp5/mdp5.xml.h
index 67f4f896ba8c..e87ef5512cb0 100644
--- a/drivers/gpu/drm/msm/mdp/mdp5/mdp5.xml.h
+++ b/drivers/gpu/drm/msm/mdp/mdp5/mdp5.xml.h
@@ -10,14 +10,14 @@ git clone https://github.com/freedreno/envytools.git
 The rules-ng-ng source files this header was generated from are:
 - /home/robclark/src/freedreno/envytools/rnndb/msm.xml                 (    647 bytes, from 2013-11-30 14:45:35)
 - /home/robclark/src/freedreno/envytools/rnndb/freedreno_copyright.xml (   1453 bytes, from 2013-03-31 16:51:27)
-- /home/robclark/src/freedreno/envytools/rnndb/mdp/mdp4.xml            (  17996 bytes, from 2013-12-01 19:10:31)
-- /home/robclark/src/freedreno/envytools/rnndb/mdp/mdp_common.xml      (   1615 bytes, from 2013-11-30 15:00:52)
-- /home/robclark/src/freedreno/envytools/rnndb/mdp/mdp5.xml            (  22517 bytes, from 2014-06-25 12:55:02)
+- /home/robclark/src/freedreno/envytools/rnndb/mdp/mdp4.xml            (  20136 bytes, from 2014-10-31 16:51:39)
+- /home/robclark/src/freedreno/envytools/rnndb/mdp/mdp_common.xml      (   1940 bytes, from 2014-10-31 16:51:39)
+- /home/robclark/src/freedreno/envytools/rnndb/mdp/mdp5.xml            (  23963 bytes, from 2014-10-31 16:51:46)
 - /home/robclark/src/freedreno/envytools/rnndb/dsi/dsi.xml             (  11712 bytes, from 2013-08-17 17:13:43)
 - /home/robclark/src/freedreno/envytools/rnndb/dsi/sfpb.xml            (    344 bytes, from 2013-08-11 19:26:32)
-- /home/robclark/src/freedreno/envytools/rnndb/dsi/mmss_cc.xml         (   1544 bytes, from 2013-08-16 19:17:05)
+- /home/robclark/src/freedreno/envytools/rnndb/dsi/mmss_cc.xml         (   1686 bytes, from 2014-10-31 16:48:57)
 - /home/robclark/src/freedreno/envytools/rnndb/hdmi/qfprom.xml         (    600 bytes, from 2013-07-05 19:21:12)
-- /home/robclark/src/freedreno/envytools/rnndb/hdmi/hdmi.xml           (  23613 bytes, from 2014-06-25 12:53:44)
+- /home/robclark/src/freedreno/envytools/rnndb/hdmi/hdmi.xml           (  23613 bytes, from 2014-07-17 15:33:30)
 
 Copyright (C) 2013-2014 by the following authors:
 - Rob Clark <robdclark@gmail.com> (robclark)
diff --git a/drivers/gpu/drm/msm/mdp/mdp5/mdp5_cfg.c b/drivers/gpu/drm/msm/mdp/mdp5/mdp5_cfg.c
new file mode 100644
index 000000000000..b0a44310cf2a
--- /dev/null
+++ b/drivers/gpu/drm/msm/mdp/mdp5/mdp5_cfg.c
@@ -0,0 +1,207 @@
+/*
+ * Copyright (c) 2014 The Linux Foundation. All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 and
+ * only version 2 as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ */
+
+#include "mdp5_kms.h"
+#include "mdp5_cfg.h"
+
+struct mdp5_cfg_handler {
+	int revision;
+	struct mdp5_cfg config;
+};
+
+/* mdp5_cfg must be exposed (used in mdp5.xml.h) */
+const struct mdp5_cfg_hw *mdp5_cfg = NULL;
+
+const struct mdp5_cfg_hw msm8x74_config = {
+	.name = "msm8x74",
+	.smp = {
+		.mmb_count = 22,
+		.mmb_size = 4096,
+	},
+	.ctl = {
+		.count = 5,
+		.base = { 0x00600, 0x00700, 0x00800, 0x00900, 0x00a00 },
+	},
+	.pipe_vig = {
+		.count = 3,
+		.base = { 0x01200, 0x01600, 0x01a00 },
+	},
+	.pipe_rgb = {
+		.count = 3,
+		.base = { 0x01e00, 0x02200, 0x02600 },
+	},
+	.pipe_dma = {
+		.count = 2,
+		.base = { 0x02a00, 0x02e00 },
+	},
+	.lm = {
+		.count = 5,
+		.base = { 0x03200, 0x03600, 0x03a00, 0x03e00, 0x04200 },
+		.nb_stages = 5,
+	},
+	.dspp = {
+		.count = 3,
+		.base = { 0x04600, 0x04a00, 0x04e00 },
+	},
+	.ad = {
+		.count = 2,
+		.base = { 0x13100, 0x13300 }, /* NOTE: no ad in v1.0 */
+	},
+	.intf = {
+		.count = 4,
+		.base = { 0x12500, 0x12700, 0x12900, 0x12b00 },
+	},
+	.max_clk = 200000000,
+};
+
+const struct mdp5_cfg_hw apq8084_config = {
+	.name = "apq8084",
+	.smp = {
+		.mmb_count = 44,
+		.mmb_size = 8192,
+		.reserved_state[0] = GENMASK(7, 0),	/* first 8 MMBs */
+		.reserved[CID_RGB0] = 2,
+		.reserved[CID_RGB1] = 2,
+		.reserved[CID_RGB2] = 2,
+		.reserved[CID_RGB3] = 2,
+	},
+	.ctl = {
+		.count = 5,
+		.base = { 0x00600, 0x00700, 0x00800, 0x00900, 0x00a00 },
+	},
+	.pipe_vig = {
+		.count = 4,
+		.base = { 0x01200, 0x01600, 0x01a00, 0x01e00 },
+	},
+	.pipe_rgb = {
+		.count = 4,
+		.base = { 0x02200, 0x02600, 0x02a00, 0x02e00 },
+	},
+	.pipe_dma = {
+		.count = 2,
+		.base = { 0x03200, 0x03600 },
+	},
+	.lm = {
+		.count = 6,
+		.base = { 0x03a00, 0x03e00, 0x04200, 0x04600, 0x04a00, 0x04e00 },
+		.nb_stages = 5,
+	},
+	.dspp = {
+		.count = 4,
+		.base = { 0x05200, 0x05600, 0x05a00, 0x05e00 },
+
+	},
+	.ad = {
+		.count = 3,
+		.base = { 0x13500, 0x13700, 0x13900 },
+	},
+	.intf = {
+		.count = 5,
+		.base = { 0x12500, 0x12700, 0x12900, 0x12b00, 0x12d00 },
+	},
+	.max_clk = 320000000,
+};
+
+static const struct mdp5_cfg_handler cfg_handlers[] = {
+	{ .revision = 0, .config = { .hw = &msm8x74_config } },
+	{ .revision = 2, .config = { .hw = &msm8x74_config } },
+	{ .revision = 3, .config = { .hw = &apq8084_config } },
+};
+
+
+static struct mdp5_cfg_platform *mdp5_get_config(struct platform_device *dev);
+
+const struct mdp5_cfg_hw *mdp5_cfg_get_hw_config(struct mdp5_cfg_handler *cfg_handler)
+{
+	return cfg_handler->config.hw;
+}
+
+struct mdp5_cfg *mdp5_cfg_get_config(struct mdp5_cfg_handler *cfg_handler)
+{
+	return &cfg_handler->config;
+}
+
+int mdp5_cfg_get_hw_rev(struct mdp5_cfg_handler *cfg_handler)
+{
+	return cfg_handler->revision;
+}
+
+void mdp5_cfg_destroy(struct mdp5_cfg_handler *cfg_handler)
+{
+	kfree(cfg_handler);
+}
+
+struct mdp5_cfg_handler *mdp5_cfg_init(struct mdp5_kms *mdp5_kms,
+		uint32_t major, uint32_t minor)
+{
+	struct drm_device *dev = mdp5_kms->dev;
+	struct platform_device *pdev = dev->platformdev;
+	struct mdp5_cfg_handler *cfg_handler;
+	struct mdp5_cfg_platform *pconfig;
+	int i, ret = 0;
+
+	cfg_handler = kzalloc(sizeof(*cfg_handler), GFP_KERNEL);
+	if (unlikely(!cfg_handler)) {
+		ret = -ENOMEM;
+		goto fail;
+	}
+
+	if (major != 1) {
+		dev_err(dev->dev, "unexpected MDP major version: v%d.%d\n",
+				major, minor);
+		ret = -ENXIO;
+		goto fail;
+	}
+
+	/* only after mdp5_cfg global pointer's init can we access the hw */
+	for (i = 0; i < ARRAY_SIZE(cfg_handlers); i++) {
+		if (cfg_handlers[i].revision != minor)
+			continue;
+		mdp5_cfg = cfg_handlers[i].config.hw;
+
+		break;
+	}
+	if (unlikely(!mdp5_cfg)) {
+		dev_err(dev->dev, "unexpected MDP minor revision: v%d.%d\n",
+				major, minor);
+		ret = -ENXIO;
+		goto fail;
+	}
+
+	cfg_handler->revision = minor;
+	cfg_handler->config.hw = mdp5_cfg;
+
+	pconfig = mdp5_get_config(pdev);
+	memcpy(&cfg_handler->config.platform, pconfig, sizeof(*pconfig));
+
+	DBG("MDP5: %s hw config selected", mdp5_cfg->name);
+
+	return cfg_handler;
+
+fail:
+	if (cfg_handler)
+		mdp5_cfg_destroy(cfg_handler);
+
+	return NULL;
+}
+
+static struct mdp5_cfg_platform *mdp5_get_config(struct platform_device *dev)
+{
+	static struct mdp5_cfg_platform config = {};
+#ifdef CONFIG_OF
+	/* TODO */
+#endif
+	config.iommu = iommu_domain_alloc(&platform_bus_type);
+
+	return &config;
+}
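Note on the revision lookup in mdp5_cfg_init() above: the hardware config is
chosen by matching the MDP minor revision against a small static table, after
rejecting any major version other than 1. The sketch below mirrors that idea in
plain user-space C under simplified assumptions. The types hw_config and
cfg_entry and the function lookup_hw_config() are hypothetical stand-ins, and
only the name and max_clk fields from the diff are reproduced.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for struct mdp5_cfg_hw. */
struct hw_config {
	const char *name;
	unsigned int max_clk;
};

/* Hypothetical stand-in for one cfg_handlers[] entry. */
struct cfg_entry {
	unsigned int revision;		/* MDP minor revision */
	const struct hw_config *hw;
};

const struct hw_config msm8x74 = { "msm8x74", 200000000 };
const struct hw_config apq8084 = { "apq8084", 320000000 };

const struct cfg_entry cfg_table[] = {
	{ 0, &msm8x74 },
	{ 2, &msm8x74 },
	{ 3, &apq8084 },
};

/* Return the config for version major.minor, or NULL if it is unknown. */
const struct hw_config *lookup_hw_config(unsigned int major,
					 unsigned int minor)
{
	size_t i;

	if (major != 1)
		return NULL;

	for (i = 0; i < sizeof(cfg_table) / sizeof(cfg_table[0]); i++) {
		if (cfg_table[i].revision == minor)
			return cfg_table[i].hw;
	}

	return NULL;
}

/* Small self-check exercising known and unknown revisions. */
void lookup_selfcheck(void)
{
	assert(lookup_hw_config(1, 3) == &apq8084);
	assert(lookup_hw_config(1, 1) == NULL);
	assert(lookup_hw_config(2, 0) == NULL);
}
```

Unlike the driver, this sketch returns the match directly instead of caching it
in a global pointer, which keeps a stale result from an earlier call from
masking an unknown revision.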
diff --git a/drivers/gpu/drm/msm/mdp/mdp5/mdp5_cfg.h b/drivers/gpu/drm/msm/mdp/mdp5/mdp5_cfg.h
new file mode 100644
index 000000000000..dba4d52cceeb
--- /dev/null
+++ b/drivers/gpu/drm/msm/mdp/mdp5/mdp5_cfg.h
@@ -0,0 +1,91 @@
+/*
+ * Copyright (c) 2014 The Linux Foundation. All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 and
+ * only version 2 as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ */
+
+#ifndef __MDP5_CFG_H__
+#define __MDP5_CFG_H__
+
+#include "msm_drv.h"
+
+/*
+ * mdp5_cfg
+ *
+ * This module configures the dynamic offsets used by mdp5.xml.h
+ * (initialized in mdp5_cfg.c)
+ */
+extern const struct mdp5_cfg_hw *mdp5_cfg;
+
+#define MAX_CTL			8
+#define MAX_BASES		8
+#define MAX_SMP_BLOCKS		44
+#define MAX_CLIENTS		32
+
+typedef DECLARE_BITMAP(mdp5_smp_state_t, MAX_SMP_BLOCKS);
+
+#define MDP5_SUB_BLOCK_DEFINITION \
+	int count; \
+	uint32_t base[MAX_BASES]
+
+struct mdp5_sub_block {
+	MDP5_SUB_BLOCK_DEFINITION;
+};
+
+struct mdp5_lm_block {
+	MDP5_SUB_BLOCK_DEFINITION;
+	uint32_t nb_stages;		/* number of stages per blender */
+};
+
+struct mdp5_smp_block {
+	int mmb_count;			/* number of SMP MMBs */
+	int mmb_size;			/* MMB: size in bytes */
+	mdp5_smp_state_t reserved_state;/* SMP MMBs statically allocated */
+	int reserved[MAX_CLIENTS];	/* # of MMBs allocated per client */
+};
+
+struct mdp5_cfg_hw {
+	char  *name;
+
+	struct mdp5_smp_block smp;
+	struct mdp5_sub_block ctl;
+	struct mdp5_sub_block pipe_vig;
+	struct mdp5_sub_block pipe_rgb;
+	struct mdp5_sub_block pipe_dma;
+	struct mdp5_lm_block  lm;
+	struct mdp5_sub_block dspp;
+	struct mdp5_sub_block ad;
+	struct mdp5_sub_block intf;
+
+	uint32_t max_clk;
+};
+
+/* platform config data (ie. from DT, or pdata) */
+struct mdp5_cfg_platform {
+	struct iommu_domain *iommu;
+};
+
+struct mdp5_cfg {
+	const struct mdp5_cfg_hw *hw;
+	struct mdp5_cfg_platform platform;
+};
+
+struct mdp5_kms;
+struct mdp5_cfg_handler;
+
+const struct mdp5_cfg_hw *mdp5_cfg_get_hw_config(struct mdp5_cfg_handler *cfg_hnd);
+struct mdp5_cfg *mdp5_cfg_get_config(struct mdp5_cfg_handler *cfg_hnd);
+int mdp5_cfg_get_hw_rev(struct mdp5_cfg_handler *cfg_hnd);
+
+struct mdp5_cfg_handler *mdp5_cfg_init(struct mdp5_kms *mdp5_kms,
+		uint32_t major, uint32_t minor);
+void mdp5_cfg_destroy(struct mdp5_cfg_handler *cfg_hnd);
+
+#endif /* __MDP5_CFG_H__ */
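Note on MDP5_SUB_BLOCK_DEFINITION above: the macro pastes the same leading
members (count, base[]) into several block structs, so mdp5_lm_block starts with
the same layout as mdp5_sub_block and only appends nb_stages. A minimal
user-space sketch of that pattern follows. The shortened names and the
bounds-checked accessor sub_block_base() are hypothetical and not part of the
driver; the sample values are the msm8x74 pipe_vig bases from mdp5_cfg.c.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_BASES 8

/* Shared leading members, pasted into every block struct. */
#define SUB_BLOCK_DEFINITION \
	int count; \
	uint32_t base[MAX_BASES]

struct sub_block {
	SUB_BLOCK_DEFINITION;
};

struct lm_block {
	SUB_BLOCK_DEFINITION;
	uint32_t nb_stages;	/* extra member only the mixer block needs */
};

/* Hypothetical accessor: fetch base[idx], or return -1 if idx is invalid. */
int sub_block_base(const struct sub_block *blk, int idx, uint32_t *out)
{
	if (idx < 0 || idx >= blk->count || idx >= MAX_BASES)
		return -1;

	*out = blk->base[idx];
	return 0;
}

/* Both structs place the shared members at identical offsets. */
void layout_selfcheck(void)
{
	assert(offsetof(struct lm_block, count) ==
	       offsetof(struct sub_block, count));
	assert(offsetof(struct lm_block, base) ==
	       offsetof(struct sub_block, base));
}
```

The accessor checks idx against count so that the unused tail of base[], which
is zero-initialized in a designated initializer, is never mistaken for a real
register offset.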
diff --git a/drivers/gpu/drm/msm/mdp/mdp5/mdp5_crtc.c b/drivers/gpu/drm/msm/mdp/mdp5/mdp5_crtc.c
index ebe2e60f3ab1..0e9a2e3a82d7 100644
--- a/drivers/gpu/drm/msm/mdp/mdp5/mdp5_crtc.c
+++ b/drivers/gpu/drm/msm/mdp/mdp5/mdp5_crtc.c
@@ -1,4 +1,5 @@
 /*
+ * Copyright (c) 2014 The Linux Foundation. All rights reserved.
  * Copyright (C) 2013 Red Hat
  * Author: Rob Clark <robdclark@gmail.com>
  *
@@ -17,43 +18,35 @@
 
 #include "mdp5_kms.h"
 
+#include <linux/sort.h>
 #include <drm/drm_mode.h>
 #include "drm_crtc.h"
 #include "drm_crtc_helper.h"
 #include "drm_flip_work.h"
 
+#define SSPP_MAX	(SSPP_RGB3 + 1) /* TODO: Add SSPP_MAX in mdp5.xml.h */
+
 struct mdp5_crtc {
 	struct drm_crtc base;
 	char name[8];
-	struct drm_plane *plane;
-	struct drm_plane *planes[8];
 	int id;
 	bool enabled;
 
-	/* which mixer/encoder we route output to: */
-	int mixer;
+	/* layer mixer used for this CRTC (+ its lock): */
+#define GET_LM_ID(crtc_id)	((crtc_id == 3) ? 5 : crtc_id)
+	int lm;
+	spinlock_t lm_lock;	/* protect REG_MDP5_LM_* registers */
+
+	/* CTL used for this CRTC: */
+	struct mdp5_ctl *ctl;
 
 	/* if there is a pending flip, these will be non-null: */
 	struct drm_pending_vblank_event *event;
-	struct msm_fence_cb pageflip_cb;
 
 #define PENDING_CURSOR 0x1
 #define PENDING_FLIP   0x2
 	atomic_t pending;
 
-	/* the fb that we logically (from PoV of KMS API) hold a ref
-	 * to.  Which we may not yet be scanning out (we may still
-	 * be scanning out previous in case of page_flip while waiting
-	 * for gpu rendering to complete:
-	 */
-	struct drm_framebuffer *fb;
-
-	/* the fb that we currently hold a scanout ref to: */
-	struct drm_framebuffer *scanout_fb;
-
-	/* for unref'ing framebuffers after scanout completes: */
-	struct drm_flip_work unref_fb_work;
-
 	struct mdp_irq vblank;
 	struct mdp_irq err;
 };
@@ -73,67 +66,38 @@ static void request_pending(struct drm_crtc *crtc, uint32_t pending)
 	mdp_irq_register(&get_kms(crtc)->base, &mdp5_crtc->vblank);
 }
 
-static void crtc_flush(struct drm_crtc *crtc)
-{
-	struct mdp5_crtc *mdp5_crtc = to_mdp5_crtc(crtc);
-	struct mdp5_kms *mdp5_kms = get_kms(crtc);
-	int id = mdp5_crtc->id;
-	uint32_t i, flush = 0;
-
-	for (i = 0; i < ARRAY_SIZE(mdp5_crtc->planes); i++) {
-		struct drm_plane *plane = mdp5_crtc->planes[i];
-		if (plane) {
-			enum mdp5_pipe pipe = mdp5_plane_pipe(plane);
-			flush |= pipe2flush(pipe);
-		}
-	}
-	flush |= mixer2flush(mdp5_crtc->id);
-	flush |= MDP5_CTL_FLUSH_CTL;
-
-	DBG("%s: flush=%08x", mdp5_crtc->name, flush);
-
-	mdp5_write(mdp5_kms, REG_MDP5_CTL_FLUSH(id), flush);
-}
+#define mdp5_lm_get_flush(lm)	mdp_ctl_flush_mask_lm(lm)
 
-static void update_fb(struct drm_crtc *crtc, struct drm_framebuffer *new_fb)
+static void crtc_flush(struct drm_crtc *crtc, u32 flush_mask)
 {
 	struct mdp5_crtc *mdp5_crtc = to_mdp5_crtc(crtc);
-	struct drm_framebuffer *old_fb = mdp5_crtc->fb;
-
-	/* grab reference to incoming scanout fb: */
-	drm_framebuffer_reference(new_fb);
-	mdp5_crtc->base.primary->fb = new_fb;
-	mdp5_crtc->fb = new_fb;
 
-	if (old_fb)
-		drm_flip_work_queue(&mdp5_crtc->unref_fb_work, old_fb);
+	DBG("%s: flush=%08x", mdp5_crtc->name, flush_mask);
+	mdp5_ctl_commit(mdp5_crtc->ctl, flush_mask);
 }
 
-/* unlike update_fb(), take a ref to the new scanout fb *before* updating
- * plane, then call this.  Needed to ensure we don't unref the buffer that
- * is actually still being scanned out.
- *
- * Note that this whole thing goes away with atomic.. since we can defer
- * calling into driver until rendering is done.
+/*
+ * flush updates, to make sure hw is updated to new scanout fb,
+ * so that we can safely queue unref to current fb (ie. next
+ * vblank we know hw is done w/ previous scanout_fb).
  */
-static void update_scanout(struct drm_crtc *crtc, struct drm_framebuffer *fb)
+static void crtc_flush_all(struct drm_crtc *crtc)
 {
 	struct mdp5_crtc *mdp5_crtc = to_mdp5_crtc(crtc);
+	struct drm_plane *plane;
+	uint32_t flush_mask = 0;
 
-	/* flush updates, to make sure hw is updated to new scanout fb,
-	 * so that we can safely queue unref to current fb (ie. next
-	 * vblank we know hw is done w/ previous scanout_fb).
-	 */
-	crtc_flush(crtc);
-
-	if (mdp5_crtc->scanout_fb)
-		drm_flip_work_queue(&mdp5_crtc->unref_fb_work,
-				mdp5_crtc->scanout_fb);
+	/* we could have already released CTL in the disable path: */
+	if (!mdp5_crtc->ctl)
+		return;
 
-	mdp5_crtc->scanout_fb = fb;
+	drm_atomic_crtc_for_each_plane(plane, crtc) {
+		flush_mask |= mdp5_plane_get_flush(plane);
+	}
+	flush_mask |= mdp5_ctl_get_flush(mdp5_crtc->ctl);
+	flush_mask |= mdp5_lm_get_flush(mdp5_crtc->lm);
 
-	/* enable vblank to complete flip: */
-	request_pending(crtc, PENDING_FLIP);
+	crtc_flush(crtc, flush_mask);
 }
 
 /* if file!=NULL, this is preclose potential cancel-flip path */
@@ -142,7 +106,8 @@ static void complete_flip(struct drm_crtc *crtc, struct drm_file *file)
 	struct mdp5_crtc *mdp5_crtc = to_mdp5_crtc(crtc);
 	struct drm_device *dev = crtc->dev;
 	struct drm_pending_vblank_event *event;
-	unsigned long flags, i;
+	struct drm_plane *plane;
+	unsigned long flags;
 
 	spin_lock_irqsave(&dev->event_lock, flags);
 	event = mdp5_crtc->event;
@@ -153,50 +118,22 @@ static void complete_flip(struct drm_crtc *crtc, struct drm_file *file)
 		 */
 		if (!file || (event->base.file_priv == file)) {
 			mdp5_crtc->event = NULL;
+			DBG("%s: send event: %p", mdp5_crtc->name, event);
 			drm_send_vblank_event(dev, mdp5_crtc->id, event);
 		}
 	}
 	spin_unlock_irqrestore(&dev->event_lock, flags);
 
-	for (i = 0; i < ARRAY_SIZE(mdp5_crtc->planes); i++) {
-		struct drm_plane *plane = mdp5_crtc->planes[i];
-		if (plane)
-			mdp5_plane_complete_flip(plane);
+	drm_atomic_crtc_for_each_plane(plane, crtc) {
+		mdp5_plane_complete_flip(plane);
 	}
 }
 
-static void pageflip_cb(struct msm_fence_cb *cb)
-{
-	struct mdp5_crtc *mdp5_crtc =
-		container_of(cb, struct mdp5_crtc, pageflip_cb);
-	struct drm_crtc *crtc = &mdp5_crtc->base;
-	struct drm_framebuffer *fb = mdp5_crtc->fb;
-
-	if (!fb)
-		return;
-
-	drm_framebuffer_reference(fb);
-	mdp5_plane_set_scanout(mdp5_crtc->plane, fb);
-	update_scanout(crtc, fb);
-}
-
-static void unref_fb_worker(struct drm_flip_work *work, void *val)
-{
-	struct mdp5_crtc *mdp5_crtc =
-		container_of(work, struct mdp5_crtc, unref_fb_work);
-	struct drm_device *dev = mdp5_crtc->base.dev;
-
-	mutex_lock(&dev->mode_config.mutex);
-	drm_framebuffer_unreference(val);
-	mutex_unlock(&dev->mode_config.mutex);
-}
-
 static void mdp5_crtc_destroy(struct drm_crtc *crtc)
 {
 	struct mdp5_crtc *mdp5_crtc = to_mdp5_crtc(crtc);
 
 	drm_crtc_cleanup(crtc);
-	drm_flip_work_cleanup(&mdp5_crtc->unref_fb_work);
 
 	kfree(mdp5_crtc);
 }
@@ -214,6 +151,8 @@ static void mdp5_crtc_dpms(struct drm_crtc *crtc, int mode)
 			mdp5_enable(mdp5_kms);
 			mdp_irq_register(&mdp5_kms->base, &mdp5_crtc->err);
 		} else {
+			/* set STAGE_UNUSED for all layers */
+			mdp5_ctl_blend(mdp5_crtc->ctl, mdp5_crtc->lm, 0x00000000);
 			mdp_irq_unregister(&mdp5_kms->base, &mdp5_crtc->err);
 			mdp5_disable(mdp5_kms);
 		}
@@ -228,54 +167,78 @@ static bool mdp5_crtc_mode_fixup(struct drm_crtc *crtc,
 	return true;
 }
 
+/*
+ * blend_setup() - blend all the planes of a CRTC
+ *
+ * When border is enabled, the border color will ALWAYS be the base layer.
+ * Therefore, the first plane (private RGB pipe) will start at STAGE0.
+ * If disabled, the first plane starts at STAGE_BASE.
+ *
+ * Note:
+ * Border is not enabled here because the private plane is exactly
+ * the CRTC resolution.
+ */
 static void blend_setup(struct drm_crtc *crtc)
 {
 	struct mdp5_crtc *mdp5_crtc = to_mdp5_crtc(crtc);
 	struct mdp5_kms *mdp5_kms = get_kms(crtc);
-	int id = mdp5_crtc->id;
+	struct drm_plane *plane;
+	const struct mdp5_cfg_hw *hw_cfg;
+	uint32_t lm = mdp5_crtc->lm, blend_cfg = 0;
+	unsigned long flags;
+#define blender(stage)	((stage) - STAGE_BASE)
 
-	/*
-	 * Hard-coded setup for now until I figure out how the
-	 * layer-mixer works
-	 */
+	hw_cfg = mdp5_cfg_get_hw_config(mdp5_kms->cfg);
 
-	/* LM[id]: */
-	mdp5_write(mdp5_kms, REG_MDP5_LM_BLEND_COLOR_OUT(id),
-			MDP5_LM_BLEND_COLOR_OUT_STAGE0_FG_ALPHA);
-	mdp5_write(mdp5_kms, REG_MDP5_LM_BLEND_OP_MODE(id, 0),
-			MDP5_LM_BLEND_OP_MODE_FG_ALPHA(FG_CONST) |
-			MDP5_LM_BLEND_OP_MODE_BG_ALPHA(FG_PIXEL) |
-			MDP5_LM_BLEND_OP_MODE_BG_INV_ALPHA);
-	mdp5_write(mdp5_kms, REG_MDP5_LM_BLEND_FG_ALPHA(id, 0), 0xff);
-	mdp5_write(mdp5_kms, REG_MDP5_LM_BLEND_BG_ALPHA(id, 0), 0x00);
-
-	/* NOTE: seems that LM[n] and CTL[m], we do not need n==m.. but
-	 * we want to be setting CTL[m].LAYER[n].  Not sure what the
-	 * point of having CTL[m].LAYER[o] (for o!=n).. maybe that is
-	 * used when chaining up mixers for high resolution displays?
-	 */
+	spin_lock_irqsave(&mdp5_crtc->lm_lock, flags);
+
+	/* ctl could be released already when we are shutting down: */
+	if (!mdp5_crtc->ctl)
+		goto out;
 
-	/* CTL[id]: */
-	mdp5_write(mdp5_kms, REG_MDP5_CTL_LAYER_REG(id, 0),
-			MDP5_CTL_LAYER_REG_RGB0(STAGE0) |
-			MDP5_CTL_LAYER_REG_BORDER_COLOR);
-	mdp5_write(mdp5_kms, REG_MDP5_CTL_LAYER_REG(id, 1), 0);
-	mdp5_write(mdp5_kms, REG_MDP5_CTL_LAYER_REG(id, 2), 0);
-	mdp5_write(mdp5_kms, REG_MDP5_CTL_LAYER_REG(id, 3), 0);
-	mdp5_write(mdp5_kms, REG_MDP5_CTL_LAYER_REG(id, 4), 0);
+	drm_atomic_crtc_for_each_plane(plane, crtc) {
+		enum mdp_mixer_stage_id stage =
+			to_mdp5_plane_state(plane->state)->stage;
+
+		/*
+		 * Note: this cannot happen with the current implementation,
+		 * but it must be checked once the z property is added.
+		 */
+		BUG_ON(stage > hw_cfg->lm.nb_stages);
+
+		/* LM */
+		mdp5_write(mdp5_kms,
+				REG_MDP5_LM_BLEND_OP_MODE(lm, blender(stage)),
+				MDP5_LM_BLEND_OP_MODE_FG_ALPHA(FG_CONST) |
+				MDP5_LM_BLEND_OP_MODE_BG_ALPHA(BG_CONST));
+		mdp5_write(mdp5_kms, REG_MDP5_LM_BLEND_FG_ALPHA(lm,
+				blender(stage)), 0xff);
+		mdp5_write(mdp5_kms, REG_MDP5_LM_BLEND_BG_ALPHA(lm,
+				blender(stage)), 0x00);
+		/* CTL */
+		blend_cfg |= mdp_ctl_blend_mask(mdp5_plane_pipe(plane), stage);
+		DBG("%s: blending pipe %s on stage=%d", mdp5_crtc->name,
+				pipe2name(mdp5_plane_pipe(plane)), stage);
+	}
+
+	DBG("%s: lm%d: blend config = 0x%08x", mdp5_crtc->name, lm, blend_cfg);
+	mdp5_ctl_blend(mdp5_crtc->ctl, lm, blend_cfg);
+
+out:
+	spin_unlock_irqrestore(&mdp5_crtc->lm_lock, flags);
 }
 
-static int mdp5_crtc_mode_set(struct drm_crtc *crtc,
-		struct drm_display_mode *mode,
-		struct drm_display_mode *adjusted_mode,
-		int x, int y,
-		struct drm_framebuffer *old_fb)
+static void mdp5_crtc_mode_set_nofb(struct drm_crtc *crtc)
 {
 	struct mdp5_crtc *mdp5_crtc = to_mdp5_crtc(crtc);
 	struct mdp5_kms *mdp5_kms = get_kms(crtc);
-	int ret;
+	unsigned long flags;
+	struct drm_display_mode *mode;
 
-	mode = adjusted_mode;
+	if (WARN_ON(!crtc->state))
+		return;
+
+	mode = &crtc->state->adjusted_mode;
 
 	DBG("%s: set mode: %d:\"%s\" %d %d %d %d %d %d %d %d %d %d 0x%x 0x%x",
 			mdp5_crtc->name, mode->base.id, mode->name,
@@ -286,28 +249,11 @@ static int mdp5_crtc_mode_set(struct drm_crtc *crtc,
 			mode->vsync_end, mode->vtotal,
 			mode->type, mode->flags);
 
-	/* grab extra ref for update_scanout() */
-	drm_framebuffer_reference(crtc->primary->fb);
-
-	ret = mdp5_plane_mode_set(mdp5_crtc->plane, crtc, crtc->primary->fb,
-			0, 0, mode->hdisplay, mode->vdisplay,
-			x << 16, y << 16,
-			mode->hdisplay << 16, mode->vdisplay << 16);
-	if (ret) {
-		drm_framebuffer_unreference(crtc->primary->fb);
-		dev_err(crtc->dev->dev, "%s: failed to set mode on plane: %d\n",
-				mdp5_crtc->name, ret);
-		return ret;
-	}
-
-	mdp5_write(mdp5_kms, REG_MDP5_LM_OUT_SIZE(mdp5_crtc->id),
+	spin_lock_irqsave(&mdp5_crtc->lm_lock, flags);
+	mdp5_write(mdp5_kms, REG_MDP5_LM_OUT_SIZE(mdp5_crtc->lm),
 			MDP5_LM_OUT_SIZE_WIDTH(mode->hdisplay) |
 			MDP5_LM_OUT_SIZE_HEIGHT(mode->vdisplay));
-
-	update_fb(crtc, crtc->primary->fb);
-	update_scanout(crtc, crtc->primary->fb);
-
-	return 0;
+	spin_unlock_irqrestore(&mdp5_crtc->lm_lock, flags);
 }
 
 static void mdp5_crtc_prepare(struct drm_crtc *crtc)
@@ -321,66 +267,119 @@ static void mdp5_crtc_prepare(struct drm_crtc *crtc)
 
 static void mdp5_crtc_commit(struct drm_crtc *crtc)
 {
+	struct mdp5_crtc *mdp5_crtc = to_mdp5_crtc(crtc);
+	DBG("%s", mdp5_crtc->name);
 	mdp5_crtc_dpms(crtc, DRM_MODE_DPMS_ON);
-	crtc_flush(crtc);
+	crtc_flush_all(crtc);
 	/* drop the ref to mdp clk's that we got in prepare: */
 	mdp5_disable(get_kms(crtc));
 }
 
-static int mdp5_crtc_mode_set_base(struct drm_crtc *crtc, int x, int y,
-		struct drm_framebuffer *old_fb)
+static void mdp5_crtc_load_lut(struct drm_crtc *crtc)
+{
+}
+
+struct plane_state {
+	struct drm_plane *plane;
+	struct mdp5_plane_state *state;
+};
+
+static int pstate_cmp(const void *a, const void *b)
+{
+	const struct plane_state *pa = a;
+	const struct plane_state *pb = b;
+	return pa->state->zpos - pb->state->zpos;
+}
+
+static int mdp5_crtc_atomic_check(struct drm_crtc *crtc,
+		struct drm_crtc_state *state)
 {
 	struct mdp5_crtc *mdp5_crtc = to_mdp5_crtc(crtc);
-	struct drm_plane *plane = mdp5_crtc->plane;
-	struct drm_display_mode *mode = &crtc->mode;
-	int ret;
-
-	/* grab extra ref for update_scanout() */
-	drm_framebuffer_reference(crtc->primary->fb);
-
-	ret = mdp5_plane_mode_set(plane, crtc, crtc->primary->fb,
-			0, 0, mode->hdisplay, mode->vdisplay,
-			x << 16, y << 16,
-			mode->hdisplay << 16, mode->vdisplay << 16);
-	if (ret) {
-		drm_framebuffer_unreference(crtc->primary->fb);
-		return ret;
+	struct mdp5_kms *mdp5_kms = get_kms(crtc);
+	struct drm_plane *plane;
+	struct drm_device *dev = crtc->dev;
+	struct plane_state pstates[STAGE3 + 1];
+	int cnt = 0, i;
+
+	DBG("%s: check", mdp5_crtc->name);
+
+	if (mdp5_crtc->event) {
+		dev_err(dev->dev, "already pending flip!\n");
+		return -EBUSY;
 	}
 
-	update_fb(crtc, crtc->primary->fb);
-	update_scanout(crtc, crtc->primary->fb);
+	/* request a free CTL, if none is already allocated for this CRTC */
+	if (state->enable && !mdp5_crtc->ctl) {
+		mdp5_crtc->ctl = mdp5_ctlm_request(mdp5_kms->ctlm, crtc);
+		if (WARN_ON(!mdp5_crtc->ctl))
+			return -EINVAL;
+	}
+
+	/* verify that there are not too many planes attached to crtc
+	 * and that we don't have conflicting mixer stages:
+	 */
+	drm_atomic_crtc_state_for_each_plane(plane, state) {
+		struct drm_plane_state *pstate;
+
+		if (cnt >= ARRAY_SIZE(pstates)) {
+			dev_err(dev->dev, "too many planes!\n");
+			return -EINVAL;
+		}
+
+		pstate = state->state->plane_states[drm_plane_index(plane)];
+
+		/* plane might not have changed, in which case take
+		 * current state:
+		 */
+		if (!pstate)
+			pstate = plane->state;
+
+		pstates[cnt].plane = plane;
+		pstates[cnt].state = to_mdp5_plane_state(pstate);
+
+		cnt++;
+	}
+
+	sort(pstates, cnt, sizeof(pstates[0]), pstate_cmp, NULL);
+
+	for (i = 0; i < cnt; i++) {
+		pstates[i].state->stage = STAGE_BASE + i;
+		DBG("%s: assign pipe %s on stage=%d", mdp5_crtc->name,
+				pipe2name(mdp5_plane_pipe(pstates[i].plane)),
+				pstates[i].state->stage);
+	}
 
 	return 0;
 }
 
-static void mdp5_crtc_load_lut(struct drm_crtc *crtc)
+static void mdp5_crtc_atomic_begin(struct drm_crtc *crtc)
 {
+	struct mdp5_crtc *mdp5_crtc = to_mdp5_crtc(crtc);
+	DBG("%s: begin", mdp5_crtc->name);
 }
 
-static int mdp5_crtc_page_flip(struct drm_crtc *crtc,
-		struct drm_framebuffer *new_fb,
-		struct drm_pending_vblank_event *event,
-		uint32_t page_flip_flags)
+static void mdp5_crtc_atomic_flush(struct drm_crtc *crtc)
 {
 	struct mdp5_crtc *mdp5_crtc = to_mdp5_crtc(crtc);
 	struct drm_device *dev = crtc->dev;
-	struct drm_gem_object *obj;
 	unsigned long flags;
 
-	if (mdp5_crtc->event) {
-		dev_err(dev->dev, "already pending flip!\n");
-		return -EBUSY;
-	}
+	DBG("%s: flush", mdp5_crtc->name);
 
-	obj = msm_framebuffer_bo(new_fb, 0);
+	WARN_ON(mdp5_crtc->event);
 
 	spin_lock_irqsave(&dev->event_lock, flags);
-	mdp5_crtc->event = event;
+	mdp5_crtc->event = crtc->state->event;
 	spin_unlock_irqrestore(&dev->event_lock, flags);
 
-	update_fb(crtc, new_fb);
+	blend_setup(crtc);
+	crtc_flush_all(crtc);
+	request_pending(crtc, PENDING_FLIP);
 
-	return msm_gem_queue_inactive_cb(obj, &mdp5_crtc->pageflip_cb);
+	if (mdp5_crtc->ctl && !crtc->state->enable) {
+		mdp5_ctl_release(mdp5_crtc->ctl);
+		mdp5_crtc->ctl = NULL;
+	}
 }
 
 static int mdp5_crtc_set_property(struct drm_crtc *crtc,
@@ -391,27 +390,33 @@ static int mdp5_crtc_set_property(struct drm_crtc *crtc,
 }
 
 static const struct drm_crtc_funcs mdp5_crtc_funcs = {
-	.set_config = drm_crtc_helper_set_config,
+	.set_config = drm_atomic_helper_set_config,
 	.destroy = mdp5_crtc_destroy,
-	.page_flip = mdp5_crtc_page_flip,
+	.page_flip = drm_atomic_helper_page_flip,
 	.set_property = mdp5_crtc_set_property,
+	.reset = drm_atomic_helper_crtc_reset,
+	.atomic_duplicate_state = drm_atomic_helper_crtc_duplicate_state,
+	.atomic_destroy_state = drm_atomic_helper_crtc_destroy_state,
 };
 
 static const struct drm_crtc_helper_funcs mdp5_crtc_helper_funcs = {
 	.dpms = mdp5_crtc_dpms,
 	.mode_fixup = mdp5_crtc_mode_fixup,
-	.mode_set = mdp5_crtc_mode_set,
+	.mode_set_nofb = mdp5_crtc_mode_set_nofb,
+	.mode_set = drm_helper_crtc_mode_set,
+	.mode_set_base = drm_helper_crtc_mode_set_base,
 	.prepare = mdp5_crtc_prepare,
 	.commit = mdp5_crtc_commit,
-	.mode_set_base = mdp5_crtc_mode_set_base,
 	.load_lut = mdp5_crtc_load_lut,
+	.atomic_check = mdp5_crtc_atomic_check,
+	.atomic_begin = mdp5_crtc_atomic_begin,
+	.atomic_flush = mdp5_crtc_atomic_flush,
 };
 
 static void mdp5_crtc_vblank_irq(struct mdp_irq *irq, uint32_t irqstatus)
 {
 	struct mdp5_crtc *mdp5_crtc = container_of(irq, struct mdp5_crtc, vblank);
 	struct drm_crtc *crtc = &mdp5_crtc->base;
-	struct msm_drm_private *priv = crtc->dev->dev_private;
 	unsigned pending;
 
 	mdp_irq_unregister(&get_kms(crtc)->base, &mdp5_crtc->vblank);
@@ -420,16 +425,14 @@ static void mdp5_crtc_vblank_irq(struct mdp_irq *irq, uint32_t irqstatus)
 
 	if (pending & PENDING_FLIP) {
 		complete_flip(crtc, NULL);
-		drm_flip_work_commit(&mdp5_crtc->unref_fb_work, priv->wq);
 	}
 }
 
 static void mdp5_crtc_err_irq(struct mdp_irq *irq, uint32_t irqstatus)
 {
 	struct mdp5_crtc *mdp5_crtc = container_of(irq, struct mdp5_crtc, err);
-	struct drm_crtc *crtc = &mdp5_crtc->base;
+
 	DBG("%s: error: %08x", mdp5_crtc->name, irqstatus);
-	crtc_flush(crtc);
 }
 
 uint32_t mdp5_crtc_vblank(struct drm_crtc *crtc)
@@ -450,10 +453,9 @@ void mdp5_crtc_set_intf(struct drm_crtc *crtc, int intf,
 {
 	struct mdp5_crtc *mdp5_crtc = to_mdp5_crtc(crtc);
 	struct mdp5_kms *mdp5_kms = get_kms(crtc);
-	static const enum mdp5_intfnum intfnum[] = {
-			INTF0, INTF1, INTF2, INTF3,
-	};
+	uint32_t flush_mask = 0;
 	uint32_t intf_sel;
+	unsigned long flags;
 
 	/* now that we know what irq's we want: */
 	mdp5_crtc->err.irqmask = intf2err(intf);
@@ -463,6 +465,7 @@ void mdp5_crtc_set_intf(struct drm_crtc *crtc, int intf,
 	if (!mdp5_kms)
 		return;
 
+	spin_lock_irqsave(&mdp5_kms->resource_lock, flags);
 	intf_sel = mdp5_read(mdp5_kms, REG_MDP5_DISP_INTF_SEL);
 
 	switch (intf) {
@@ -487,45 +490,25 @@ void mdp5_crtc_set_intf(struct drm_crtc *crtc, int intf,
 		break;
 	}
 
-	blend_setup(crtc);
+	mdp5_write(mdp5_kms, REG_MDP5_DISP_INTF_SEL, intf_sel);
+	spin_unlock_irqrestore(&mdp5_kms->resource_lock, flags);
 
 	DBG("%s: intf_sel=%08x", mdp5_crtc->name, intf_sel);
+	mdp5_ctl_set_intf(mdp5_crtc->ctl, intf);
+	flush_mask |= mdp5_ctl_get_flush(mdp5_crtc->ctl);
+	flush_mask |= mdp5_lm_get_flush(mdp5_crtc->lm);
 
-	mdp5_write(mdp5_kms, REG_MDP5_DISP_INTF_SEL, intf_sel);
-	mdp5_write(mdp5_kms, REG_MDP5_CTL_OP(mdp5_crtc->id),
-			MDP5_CTL_OP_MODE(MODE_NONE) |
-			MDP5_CTL_OP_INTF_NUM(intfnum[intf]));
-
-	crtc_flush(crtc);
+	crtc_flush(crtc, flush_mask);
 }
 
-static void set_attach(struct drm_crtc *crtc, enum mdp5_pipe pipe_id,
-		struct drm_plane *plane)
+int mdp5_crtc_get_lm(struct drm_crtc *crtc)
 {
 	struct mdp5_crtc *mdp5_crtc = to_mdp5_crtc(crtc);
 
-	BUG_ON(pipe_id >= ARRAY_SIZE(mdp5_crtc->planes));
+	if (WARN_ON(!crtc))
+		return -EINVAL;
 
-	if (mdp5_crtc->planes[pipe_id] == plane)
-		return;
-
-	mdp5_crtc->planes[pipe_id] = plane;
-	blend_setup(crtc);
-	if (mdp5_crtc->enabled && (plane != mdp5_crtc->plane))
-		crtc_flush(crtc);
-}
-
-void mdp5_crtc_attach(struct drm_crtc *crtc, struct drm_plane *plane)
-{
-	set_attach(crtc, mdp5_plane_pipe(plane), plane);
-}
-
-void mdp5_crtc_detach(struct drm_crtc *crtc, struct drm_plane *plane)
-{
-	/* don't actually detatch our primary plane: */
-	if (to_mdp5_crtc(crtc)->plane == plane)
-		return;
-	set_attach(crtc, mdp5_plane_pipe(plane), NULL);
+	return mdp5_crtc->lm;
 }
 
 /* initialize crtc */
@@ -534,18 +517,17 @@ struct drm_crtc *mdp5_crtc_init(struct drm_device *dev,
 {
 	struct drm_crtc *crtc = NULL;
 	struct mdp5_crtc *mdp5_crtc;
-	int ret;
 
 	mdp5_crtc = kzalloc(sizeof(*mdp5_crtc), GFP_KERNEL);
-	if (!mdp5_crtc) {
-		ret = -ENOMEM;
-		goto fail;
-	}
+	if (!mdp5_crtc)
+		return ERR_PTR(-ENOMEM);
 
 	crtc = &mdp5_crtc->base;
 
-	mdp5_crtc->plane = plane;
 	mdp5_crtc->id = id;
+	mdp5_crtc->lm = GET_LM_ID(id);
+
+	spin_lock_init(&mdp5_crtc->lm_lock);
 
 	mdp5_crtc->vblank.irq = mdp5_crtc_vblank_irq;
 	mdp5_crtc->err.irq = mdp5_crtc_err_irq;
@@ -553,23 +535,11 @@ struct drm_crtc *mdp5_crtc_init(struct drm_device *dev,
 	snprintf(mdp5_crtc->name, sizeof(mdp5_crtc->name), "%s:%d",
 			pipe2name(mdp5_plane_pipe(plane)), id);
 
-	ret = drm_flip_work_init(&mdp5_crtc->unref_fb_work, 16,
-			"unref fb", unref_fb_worker);
-	if (ret)
-		goto fail;
-
-	INIT_FENCE_CB(&mdp5_crtc->pageflip_cb, pageflip_cb);
-
 	drm_crtc_init_with_planes(dev, crtc, plane, NULL, &mdp5_crtc_funcs);
 	drm_crtc_helper_add(crtc, &mdp5_crtc_helper_funcs);
+	plane->crtc = crtc;
 
-	mdp5_plane_install_properties(mdp5_crtc->plane, &crtc->base);
+	mdp5_plane_install_properties(plane, &crtc->base);
 
 	return crtc;
-
-fail:
-	if (crtc)
-		mdp5_crtc_destroy(crtc);
-
-	return ERR_PTR(ret);
 }
diff --git a/drivers/gpu/drm/msm/mdp/mdp5/mdp5_ctl.c b/drivers/gpu/drm/msm/mdp/mdp5/mdp5_ctl.c
new file mode 100644
index 000000000000..dea4505ac963
--- /dev/null
+++ b/drivers/gpu/drm/msm/mdp/mdp5/mdp5_ctl.c
@@ -0,0 +1,322 @@
+/*
+ * Copyright (c) 2014 The Linux Foundation. All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 and
+ * only version 2 as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ */
+
+#include "mdp5_kms.h"
+#include "mdp5_ctl.h"
+
+/*
+ * CTL - MDP Control Pool Manager
+ *
+ * Controls are shared between all CRTCs.
+ *
+ * They are intended to be used for data path configuration.
+ * The top level register programming describes the complete data path for
+ * a specific data path ID - REG_MDP5_CTL_*(<id>, ...)
+ *
+ * Hardware capabilities determine the number of concurrent data paths.
+ *
+ * In certain use cases (high-resolution dual pipe), one single CTL can be
+ * shared across multiple CRTCs.
+ *
+ * Because the number of CTLs can be less than the number of CRTCs,
+ * CTLs are dynamically allocated from a pool of CTLs, only once a CRTC is
+ * requested by the client (in mdp5_crtc_mode_set()).
+ */
+
+struct mdp5_ctl {
+	struct mdp5_ctl_manager *ctlm;
+
+	u32 id;
+
+	/* whether this CTL has been allocated or not: */
+	bool busy;
+
+	/* memory output connection (@see mdp5_ctl_mode): */
+	u32 mode;
+
+	/* REG_MDP5_CTL_*(<id>) registers access info + lock: */
+	spinlock_t hw_lock;
+	u32 reg_offset;
+
+	/* flush mask used to commit CTL registers */
+	u32 flush_mask;
+
+	bool cursor_on;
+
+	struct drm_crtc *crtc;
+};
+
+struct mdp5_ctl_manager {
+	struct drm_device *dev;
+
+	/* number of CTL / Layer Mixers in this hw config: */
+	u32 nlm;
+	u32 nctl;
+
+	/* pool of CTLs + lock to protect resource allocation (ctls[i].busy) */
+	spinlock_t pool_lock;
+	struct mdp5_ctl ctls[MAX_CTL];
+};
+
+static inline
+struct mdp5_kms *get_kms(struct mdp5_ctl_manager *ctl_mgr)
+{
+	struct msm_drm_private *priv = ctl_mgr->dev->dev_private;
+
+	return to_mdp5_kms(to_mdp_kms(priv->kms));
+}
+
+static inline
+void ctl_write(struct mdp5_ctl *ctl, u32 reg, u32 data)
+{
+	struct mdp5_kms *mdp5_kms = get_kms(ctl->ctlm);
+
+	(void)ctl->reg_offset; /* TODO use this instead of mdp5_write */
+	mdp5_write(mdp5_kms, reg, data);
+}
+
+static inline
+u32 ctl_read(struct mdp5_ctl *ctl, u32 reg)
+{
+	struct mdp5_kms *mdp5_kms = get_kms(ctl->ctlm);
+
+	(void)ctl->reg_offset; /* TODO use this instead of mdp5_read */
+	return mdp5_read(mdp5_kms, reg);
+}
+
+
+int mdp5_ctl_set_intf(struct mdp5_ctl *ctl, enum mdp5_intf intf)
+{
+	unsigned long flags;
+	static const enum mdp5_intfnum intfnum[] = {
+			INTF0, INTF1, INTF2, INTF3,
+	};
+
+	spin_lock_irqsave(&ctl->hw_lock, flags);
+	ctl_write(ctl, REG_MDP5_CTL_OP(ctl->id),
+			MDP5_CTL_OP_MODE(ctl->mode) |
+			MDP5_CTL_OP_INTF_NUM(intfnum[intf]));
+	spin_unlock_irqrestore(&ctl->hw_lock, flags);
+
+	return 0;
+}
+
+int mdp5_ctl_set_cursor(struct mdp5_ctl *ctl, bool enable)
+{
+	struct mdp5_ctl_manager *ctl_mgr = ctl->ctlm;
+	unsigned long flags;
+	u32 blend_cfg;
+	int lm;
+
+	lm = mdp5_crtc_get_lm(ctl->crtc);
+	if (unlikely(WARN_ON(lm < 0))) {
+		dev_err(ctl_mgr->dev->dev, "CTL %d cannot find LM: %d",
+				ctl->id, lm);
+		return -EINVAL;
+	}
+
+	spin_lock_irqsave(&ctl->hw_lock, flags);
+
+	blend_cfg = ctl_read(ctl, REG_MDP5_CTL_LAYER_REG(ctl->id, lm));
+
+	if (enable)
+		blend_cfg |=  MDP5_CTL_LAYER_REG_CURSOR_OUT;
+	else
+		blend_cfg &= ~MDP5_CTL_LAYER_REG_CURSOR_OUT;
+
+	ctl_write(ctl, REG_MDP5_CTL_LAYER_REG(ctl->id, lm), blend_cfg);
+
+	spin_unlock_irqrestore(&ctl->hw_lock, flags);
+
+	ctl->cursor_on = enable;
+
+	return 0;
+}
+
+
+int mdp5_ctl_blend(struct mdp5_ctl *ctl, u32 lm, u32 blend_cfg)
+{
+	unsigned long flags;
+
+	if (ctl->cursor_on)
+		blend_cfg |=  MDP5_CTL_LAYER_REG_CURSOR_OUT;
+	else
+		blend_cfg &= ~MDP5_CTL_LAYER_REG_CURSOR_OUT;
+
+	spin_lock_irqsave(&ctl->hw_lock, flags);
+	ctl_write(ctl, REG_MDP5_CTL_LAYER_REG(ctl->id, lm), blend_cfg);
+	spin_unlock_irqrestore(&ctl->hw_lock, flags);
+
+	return 0;
+}
+
+int mdp5_ctl_commit(struct mdp5_ctl *ctl, u32 flush_mask)
+{
+	struct mdp5_ctl_manager *ctl_mgr = ctl->ctlm;
+	unsigned long flags;
+
+	if (flush_mask & MDP5_CTL_FLUSH_CURSOR_DUMMY) {
+		int lm = mdp5_crtc_get_lm(ctl->crtc);
+
+		if (unlikely(WARN_ON(lm < 0))) {
+			dev_err(ctl_mgr->dev->dev, "CTL %d cannot find LM: %d",
+					ctl->id, lm);
+			return -EINVAL;
+		}
+
+		/* for current targets, cursor bit is the same as LM bit */
+		flush_mask |= mdp_ctl_flush_mask_lm(lm);
+	}
+
+	spin_lock_irqsave(&ctl->hw_lock, flags);
+	ctl_write(ctl, REG_MDP5_CTL_FLUSH(ctl->id), flush_mask);
+	spin_unlock_irqrestore(&ctl->hw_lock, flags);
+
+	return 0;
+}
+
+u32 mdp5_ctl_get_flush(struct mdp5_ctl *ctl)
+{
+	return ctl->flush_mask;
+}
+
+void mdp5_ctl_release(struct mdp5_ctl *ctl)
+{
+	struct mdp5_ctl_manager *ctl_mgr = ctl->ctlm;
+	unsigned long flags;
+
+	if (unlikely(WARN_ON(ctl->id >= MAX_CTL) || !ctl->busy)) {
+		dev_err(ctl_mgr->dev->dev, "CTL %d in bad state (%d)",
+				ctl->id, ctl->busy);
+		return;
+	}
+
+	spin_lock_irqsave(&ctl_mgr->pool_lock, flags);
+	ctl->busy = false;
+	spin_unlock_irqrestore(&ctl_mgr->pool_lock, flags);
+
+	DBG("CTL %d released", ctl->id);
+}
+
+/*
+ * mdp5_ctlm_request() - CTL dynamic allocation
+ *
+ * Note: the current implementation assumes only one CRTC per CTL.
+ *
+ * @return first free CTL, or NULL if every CTL is busy
+ */
+struct mdp5_ctl *mdp5_ctlm_request(struct mdp5_ctl_manager *ctl_mgr,
+		struct drm_crtc *crtc)
+{
+	struct mdp5_ctl *ctl = NULL;
+	unsigned long flags;
+	int c;
+
+	spin_lock_irqsave(&ctl_mgr->pool_lock, flags);
+
+	for (c = 0; c < ctl_mgr->nctl; c++)
+		if (!ctl_mgr->ctls[c].busy)
+			break;
+
+	if (unlikely(c >= ctl_mgr->nctl)) {
+		dev_err(ctl_mgr->dev->dev, "No more CTL available!");
+		goto unlock;
+	}
+
+	ctl = &ctl_mgr->ctls[c];
+
+	ctl->crtc = crtc;
+	ctl->busy = true;
+	DBG("CTL %d allocated", ctl->id);
+
+unlock:
+	spin_unlock_irqrestore(&ctl_mgr->pool_lock, flags);
+	return ctl;
+}
+
+void mdp5_ctlm_hw_reset(struct mdp5_ctl_manager *ctl_mgr)
+{
+	unsigned long flags;
+	int c;
+
+	for (c = 0; c < ctl_mgr->nctl; c++) {
+		struct mdp5_ctl *ctl = &ctl_mgr->ctls[c];
+
+		spin_lock_irqsave(&ctl->hw_lock, flags);
+		ctl_write(ctl, REG_MDP5_CTL_OP(ctl->id), 0);
+		spin_unlock_irqrestore(&ctl->hw_lock, flags);
+	}
+}
+
+void mdp5_ctlm_destroy(struct mdp5_ctl_manager *ctl_mgr)
+{
+	kfree(ctl_mgr);
+}
+
+struct mdp5_ctl_manager *mdp5_ctlm_init(struct drm_device *dev,
+		void __iomem *mmio_base, const struct mdp5_cfg_hw *hw_cfg)
+{
+	struct mdp5_ctl_manager *ctl_mgr;
+	const struct mdp5_sub_block *ctl_cfg = &hw_cfg->ctl;
+	unsigned long flags;
+	int c, ret;
+
+	ctl_mgr = kzalloc(sizeof(*ctl_mgr), GFP_KERNEL);
+	if (!ctl_mgr) {
+		dev_err(dev->dev, "failed to allocate CTL manager\n");
+		ret = -ENOMEM;
+		goto fail;
+	}
+
+	if (unlikely(WARN_ON(ctl_cfg->count > MAX_CTL))) {
+		dev_err(dev->dev, "Increase static pool size to at least %d\n",
+				ctl_cfg->count);
+		ret = -ENOSPC;
+		goto fail;
+	}
+
+	/* initialize the CTL manager: */
+	ctl_mgr->dev = dev;
+	ctl_mgr->nlm = hw_cfg->lm.count;
+	ctl_mgr->nctl = ctl_cfg->count;
+	spin_lock_init(&ctl_mgr->pool_lock);
+
+	/* initialize each CTL of the pool: */
+	spin_lock_irqsave(&ctl_mgr->pool_lock, flags);
+	for (c = 0; c < ctl_mgr->nctl; c++) {
+		struct mdp5_ctl *ctl = &ctl_mgr->ctls[c];
+
+		if (WARN_ON(!ctl_cfg->base[c])) {
+			dev_err(dev->dev, "CTL_%d: base is null!\n", c);
+			ret = -EINVAL;
+			goto fail;
+		}
+		ctl->ctlm = ctl_mgr;
+		ctl->id = c;
+		ctl->mode = MODE_NONE;
+		ctl->reg_offset = ctl_cfg->base[c];
+		ctl->flush_mask = MDP5_CTL_FLUSH_CTL;
+		ctl->busy = false;
+		spin_lock_init(&ctl->hw_lock);
+	}
+	spin_unlock_irqrestore(&ctl_mgr->pool_lock, flags);
+	DBG("Pool of %d CTLs created.", ctl_mgr->nctl);
+
+	return ctl_mgr;
+
+fail:
+	if (ctl_mgr)
+		mdp5_ctlm_destroy(ctl_mgr);
+
+	return ERR_PTR(ret);
+}
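
Note on the CTL pool in mdp5_ctl.c above: the request/release pair is a first-fit allocator over a fixed array of slots with a per-slot busy flag. The following is a minimal userspace sketch of that idea, not the driver code. The names ctl_slot, ctl_pool, pool_init, pool_request, pool_release and POOL_SIZE are hypothetical, and locking is deliberately left out (the driver guards the busy flags with pool_lock).

```c
#include <stdbool.h>
#include <stddef.h>

#define POOL_SIZE 4	/* hypothetical; the driver sizes its pool with MAX_CTL */

struct ctl_slot {
	unsigned int id;
	bool busy;	/* true once handed out to a client */
};

struct ctl_pool {
	unsigned int count;	/* slots actually present in this config */
	struct ctl_slot slots[POOL_SIZE];
};

/* Prepare 'count' free slots; returns -1 if the static pool is too small. */
static int pool_init(struct ctl_pool *pool, unsigned int count)
{
	unsigned int i;

	if (count > POOL_SIZE)
		return -1;

	pool->count = count;
	for (i = 0; i < count; i++) {
		pool->slots[i].id = i;
		pool->slots[i].busy = false;
	}
	return 0;
}

/* First-fit allocation: return the first free slot, or NULL if none is left. */
static struct ctl_slot *pool_request(struct ctl_pool *pool)
{
	unsigned int i;

	for (i = 0; i < pool->count; i++) {
		if (!pool->slots[i].busy) {
			pool->slots[i].busy = true;
			return &pool->slots[i];
		}
	}
	return NULL;
}

/* Return a slot to the pool; releasing a free slot is reported as an error. */
static int pool_release(struct ctl_slot *slot)
{
	if (!slot->busy)
		return -1;
	slot->busy = false;
	return 0;
}
```

In this sketch a released slot is the first candidate for the next request if it has the lowest index, which matches the linear scan used by mdp5_ctlm_request().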
diff --git a/drivers/gpu/drm/msm/mdp/mdp5/mdp5_ctl.h b/drivers/gpu/drm/msm/mdp/mdp5/mdp5_ctl.h
new file mode 100644
index 000000000000..1018519b6af2
--- /dev/null
+++ b/drivers/gpu/drm/msm/mdp/mdp5/mdp5_ctl.h
@@ -0,0 +1,122 @@
+/*
+ * Copyright (c) 2014 The Linux Foundation. All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 and
+ * only version 2 as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ */
+
+#ifndef __MDP5_CTL_H__
+#define __MDP5_CTL_H__
+
+#include "msm_drv.h"
+
+/*
+ * CTL Manager prototypes:
+ * mdp5_ctlm_init() returns a ctlm (CTL Manager) handle,
+ * which is then used to call the other mdp5_ctlm_*(ctlm, ...) functions.
+ */
+struct mdp5_ctl_manager;
+struct mdp5_ctl_manager *mdp5_ctlm_init(struct drm_device *dev,
+		void __iomem *mmio_base, const struct mdp5_cfg_hw *hw_cfg);
+void mdp5_ctlm_hw_reset(struct mdp5_ctl_manager *ctlm);
+void mdp5_ctlm_destroy(struct mdp5_ctl_manager *ctlm);
+
+/*
+ * CTL prototypes:
+ * mdp5_ctlm_request(ctlm, ...) returns a ctl (CTL resource) handle,
+ * which is then used to call the other mdp5_ctl_*(ctl, ...) functions.
+ */
+struct mdp5_ctl *mdp5_ctlm_request(struct mdp5_ctl_manager *ctlm, struct drm_crtc *crtc);
+
+int mdp5_ctl_set_intf(struct mdp5_ctl *ctl, enum mdp5_intf intf);
+
+int mdp5_ctl_set_cursor(struct mdp5_ctl *ctl, bool enable);
+
+/* @blend_cfg: see LM blender config definition below */
+int mdp5_ctl_blend(struct mdp5_ctl *ctl, u32 lm, u32 blend_cfg);
+
+/* @flush_mask: see CTL flush masks definitions below */
+int mdp5_ctl_commit(struct mdp5_ctl *ctl, u32 flush_mask);
+u32 mdp5_ctl_get_flush(struct mdp5_ctl *ctl);
+
+void mdp5_ctl_release(struct mdp5_ctl *ctl);
+
+/*
+ * blend_cfg (LM blender config):
+ *
+ * The function below allows the caller of mdp5_ctl_blend() to specify how pipes
+ * are being blended according to their stage (z-order), through @blend_cfg arg.
+ */
+static inline u32 mdp_ctl_blend_mask(enum mdp5_pipe pipe,
+		enum mdp_mixer_stage_id stage)
+{
+	switch (pipe) {
+	case SSPP_VIG0: return MDP5_CTL_LAYER_REG_VIG0(stage);
+	case SSPP_VIG1: return MDP5_CTL_LAYER_REG_VIG1(stage);
+	case SSPP_VIG2: return MDP5_CTL_LAYER_REG_VIG2(stage);
+	case SSPP_RGB0: return MDP5_CTL_LAYER_REG_RGB0(stage);
+	case SSPP_RGB1: return MDP5_CTL_LAYER_REG_RGB1(stage);
+	case SSPP_RGB2: return MDP5_CTL_LAYER_REG_RGB2(stage);
+	case SSPP_DMA0: return MDP5_CTL_LAYER_REG_DMA0(stage);
+	case SSPP_DMA1: return MDP5_CTL_LAYER_REG_DMA1(stage);
+	case SSPP_VIG3: return MDP5_CTL_LAYER_REG_VIG3(stage);
+	case SSPP_RGB3: return MDP5_CTL_LAYER_REG_RGB3(stage);
+	default:	return 0;
+	}
+}
+
+/*
+ * flush_mask (CTL flush masks):
+ *
+ * The following functions allow each DRM entity to get and store
+ * its own flush mask.
+ * Once stored, these masks are accessed through each DRM entity's
+ * interface and used by the caller of mdp5_ctl_commit() to specify
+ * which block(s) need to be flushed, through the @flush_mask parameter.
+ */
+
+#define MDP5_CTL_FLUSH_CURSOR_DUMMY	0x80000000
+
+static inline u32 mdp_ctl_flush_mask_cursor(int cursor_id)
+{
+	/* TODO: use id once multiple cursor support is present */
+	(void)cursor_id;
+
+	return MDP5_CTL_FLUSH_CURSOR_DUMMY;
+}
+
+static inline u32 mdp_ctl_flush_mask_lm(int lm)
+{
+	switch (lm) {
+	case 0:  return MDP5_CTL_FLUSH_LM0;
+	case 1:  return MDP5_CTL_FLUSH_LM1;
+	case 2:  return MDP5_CTL_FLUSH_LM2;
+	case 5:  return MDP5_CTL_FLUSH_LM5;
+	default: return 0;
+	}
+}
+
+static inline u32 mdp_ctl_flush_mask_pipe(enum mdp5_pipe pipe)
+{
+	switch (pipe) {
+	case SSPP_VIG0: return MDP5_CTL_FLUSH_VIG0;
+	case SSPP_VIG1: return MDP5_CTL_FLUSH_VIG1;
+	case SSPP_VIG2: return MDP5_CTL_FLUSH_VIG2;
+	case SSPP_RGB0: return MDP5_CTL_FLUSH_RGB0;
+	case SSPP_RGB1: return MDP5_CTL_FLUSH_RGB1;
+	case SSPP_RGB2: return MDP5_CTL_FLUSH_RGB2;
+	case SSPP_DMA0: return MDP5_CTL_FLUSH_DMA0;
+	case SSPP_DMA1: return MDP5_CTL_FLUSH_DMA1;
+	case SSPP_VIG3: return MDP5_CTL_FLUSH_VIG3;
+	case SSPP_RGB3: return MDP5_CTL_FLUSH_RGB3;
+	default:        return 0;
+	}
+}
+
+#endif /* __MDP5_CTL_H__ */
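
Note on the flush masks in mdp5_ctl.h above: callers OR together per-block bits, and a cursor request is carried as a software-only marker that the commit path turns into the flush bit of the owning layer mixer. The sketch below illustrates that resolution step only. The FLUSH_PIPE, FLUSH_LM and FLUSH_CTL bit positions are placeholders assumed for illustration (the real MDP5_CTL_FLUSH_* values come from mdp5.xml.h), and resolve_flush_mask is a hypothetical name. Only the 0x80000000 marker value is taken from the header.

```c
#include <stdint.h>

/* Placeholder bit positions, assumed for illustration only. */
#define FLUSH_PIPE(n)		(UINT32_C(1) << (n))		/* hypothetical */
#define FLUSH_LM(n)		(UINT32_C(1) << (8 + (n)))	/* hypothetical */
#define FLUSH_CTL		(UINT32_C(1) << 17)		/* hypothetical */

/* Software-only marker, same value as MDP5_CTL_FLUSH_CURSOR_DUMMY. */
#define FLUSH_CURSOR_DUMMY	UINT32_C(0x80000000)

/*
 * If the caller asked for a cursor flush, add the flush bit of layer
 * mixer 'lm'. The marker bit itself is left set, as in the patch.
 */
static uint32_t resolve_flush_mask(uint32_t flush_mask, int lm)
{
	if (flush_mask & FLUSH_CURSOR_DUMMY)
		flush_mask |= FLUSH_LM(lm);
	return flush_mask;
}
```

A caller would build the mask incrementally, for example FLUSH_CTL | FLUSH_PIPE(0) | FLUSH_CURSOR_DUMMY, and resolve it once just before the register write.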
diff --git a/drivers/gpu/drm/msm/mdp/mdp5/mdp5_encoder.c b/drivers/gpu/drm/msm/mdp/mdp5/mdp5_encoder.c
index edec7bfaa952..0254bfdeb92f 100644
--- a/drivers/gpu/drm/msm/mdp/mdp5/mdp5_encoder.c
+++ b/drivers/gpu/drm/msm/mdp/mdp5/mdp5_encoder.c
@@ -24,6 +24,7 @@ struct mdp5_encoder {
 	struct drm_encoder base;
 	int intf;
 	enum mdp5_intf intf_id;
+	spinlock_t intf_lock;	/* protect REG_MDP5_INTF_* registers */
 	bool enabled;
 	uint32_t bsc;
 };
@@ -115,6 +116,7 @@ static void mdp5_encoder_dpms(struct drm_encoder *encoder, int mode)
 	struct mdp5_kms *mdp5_kms = get_kms(encoder);
 	int intf = mdp5_encoder->intf;
 	bool enabled = (mode == DRM_MODE_DPMS_ON);
+	unsigned long flags;
 
 	DBG("mode=%d", mode);
 
@@ -123,9 +125,24 @@ static void mdp5_encoder_dpms(struct drm_encoder *encoder, int mode)
 
 	if (enabled) {
 		bs_set(mdp5_encoder, 1);
+		spin_lock_irqsave(&mdp5_encoder->intf_lock, flags);
 		mdp5_write(mdp5_kms, REG_MDP5_INTF_TIMING_ENGINE_EN(intf), 1);
+		spin_unlock_irqrestore(&mdp5_encoder->intf_lock, flags);
 	} else {
+		spin_lock_irqsave(&mdp5_encoder->intf_lock, flags);
 		mdp5_write(mdp5_kms, REG_MDP5_INTF_TIMING_ENGINE_EN(intf), 0);
+		spin_unlock_irqrestore(&mdp5_encoder->intf_lock, flags);
+
+		/*
+		 * Wait for a vsync so we know ENABLE=0 has latched before
+		 * the (connector) source of the vsyncs gets disabled.
+		 * Otherwise, re-enabling before the disable latches leaves
+		 * the hardware in an inconsistent state in which some of
+		 * the settings for the new modeset (like the new scanout
+		 * buffer) don't latch properly.
+		 */
+		mdp_irq_wait(&mdp5_kms->base, intf2vblank(intf));
+
 		bs_set(mdp5_encoder, 0);
 	}
 
@@ -150,6 +167,7 @@ static void mdp5_encoder_mode_set(struct drm_encoder *encoder,
 	uint32_t display_v_start, display_v_end;
 	uint32_t hsync_start_x, hsync_end_x;
 	uint32_t format;
+	unsigned long flags;
 
 	mode = adjusted_mode;
 
@@ -180,6 +198,8 @@ static void mdp5_encoder_mode_set(struct drm_encoder *encoder,
 	display_v_start = (mode->vtotal - mode->vsync_start) * mode->htotal + dtv_hsync_skew;
 	display_v_end = vsync_period - ((mode->vsync_start - mode->vdisplay) * mode->htotal) + dtv_hsync_skew - 1;
 
+	spin_lock_irqsave(&mdp5_encoder->intf_lock, flags);
+
 	mdp5_write(mdp5_kms, REG_MDP5_INTF_HSYNC_CTL(intf),
 			MDP5_INTF_HSYNC_CTL_PULSEW(mode->hsync_end - mode->hsync_start) |
 			MDP5_INTF_HSYNC_CTL_PERIOD(mode->htotal));
@@ -201,6 +221,8 @@ static void mdp5_encoder_mode_set(struct drm_encoder *encoder,
 	mdp5_write(mdp5_kms, REG_MDP5_INTF_ACTIVE_VEND_F0(intf), 0);
 	mdp5_write(mdp5_kms, REG_MDP5_INTF_PANEL_FORMAT(intf), format);
 	mdp5_write(mdp5_kms, REG_MDP5_INTF_FRAME_LINE_COUNT_EN(intf), 0x3);  /* frame+line? */
+
+	spin_unlock_irqrestore(&mdp5_encoder->intf_lock, flags);
 }
 
 static void mdp5_encoder_prepare(struct drm_encoder *encoder)
@@ -242,6 +264,8 @@ struct drm_encoder *mdp5_encoder_init(struct drm_device *dev, int intf,
 	mdp5_encoder->intf_id = intf_id;
 	encoder = &mdp5_encoder->base;
 
+	spin_lock_init(&mdp5_encoder->intf_lock);
+
 	drm_encoder_init(dev, encoder, &mdp5_encoder_funcs,
 			 DRM_MODE_ENCODER_TMDS);
 	drm_encoder_helper_add(encoder, &mdp5_encoder_helper_funcs);
diff --git a/drivers/gpu/drm/msm/mdp/mdp5/mdp5_irq.c b/drivers/gpu/drm/msm/mdp/mdp5/mdp5_irq.c
index f2b985bc2adf..70ac81edd40f 100644
--- a/drivers/gpu/drm/msm/mdp/mdp5/mdp5_irq.c
+++ b/drivers/gpu/drm/msm/mdp/mdp5/mdp5_irq.c
@@ -15,6 +15,8 @@
  * this program.  If not, see <http://www.gnu.org/licenses/>.
  */
 
+#include <linux/irqdomain.h>
+#include <linux/irq.h>
 
 #include "msm_drv.h"
 #include "mdp5_kms.h"
@@ -88,11 +90,17 @@ irqreturn_t mdp5_irq(struct msm_kms *kms)
 
 	VERB("intr=%08x", intr);
 
-	if (intr & MDP5_HW_INTR_STATUS_INTR_MDP)
+	if (intr & MDP5_HW_INTR_STATUS_INTR_MDP) {
 		mdp5_irq_mdp(mdp_kms);
+		intr &= ~MDP5_HW_INTR_STATUS_INTR_MDP;
+	}
 
-	if (intr & MDP5_HW_INTR_STATUS_INTR_HDMI)
-		hdmi_irq(0, mdp5_kms->hdmi);
+	while (intr) {
+		irq_hw_number_t hwirq = fls(intr) - 1;
+		generic_handle_irq(irq_find_mapping(
+				mdp5_kms->irqcontroller.domain, hwirq));
+		intr &= ~(1 << hwirq);
+	}
 
 	return IRQ_HANDLED;
 }
@@ -109,3 +117,82 @@ void mdp5_disable_vblank(struct msm_kms *kms, struct drm_crtc *crtc)
 	mdp_update_vblank_mask(to_mdp_kms(kms),
 			mdp5_crtc_vblank(crtc), false);
 }
+
+/*
+ * Interrupt-controller implementation, so that sub-blocks (hdmi/eDP/dsi/etc)
+ * can register to have their irqs delivered.
+ */
+
+#define VALID_IRQS  (MDP5_HW_INTR_STATUS_INTR_DSI0 | \
+		MDP5_HW_INTR_STATUS_INTR_DSI1 | \
+		MDP5_HW_INTR_STATUS_INTR_HDMI | \
+		MDP5_HW_INTR_STATUS_INTR_EDP)
+
+static void mdp5_hw_mask_irq(struct irq_data *irqd)
+{
+	struct mdp5_kms *mdp5_kms = irq_data_get_irq_chip_data(irqd);
+	smp_mb__before_atomic();
+	clear_bit(irqd->hwirq, &mdp5_kms->irqcontroller.enabled_mask);
+	smp_mb__after_atomic();
+}
+
+static void mdp5_hw_unmask_irq(struct irq_data *irqd)
+{
+	struct mdp5_kms *mdp5_kms = irq_data_get_irq_chip_data(irqd);
+	smp_mb__before_atomic();
+	set_bit(irqd->hwirq, &mdp5_kms->irqcontroller.enabled_mask);
+	smp_mb__after_atomic();
+}
+
+static struct irq_chip mdp5_hw_irq_chip = {
+	.name		= "mdp5",
+	.irq_mask	= mdp5_hw_mask_irq,
+	.irq_unmask	= mdp5_hw_unmask_irq,
+};
+
+static int mdp5_hw_irqdomain_map(struct irq_domain *d,
+		unsigned int irq, irq_hw_number_t hwirq)
+{
+	struct mdp5_kms *mdp5_kms = d->host_data;
+
+	if (!(VALID_IRQS & (1 << hwirq)))
+		return -EPERM;
+
+	irq_set_chip_and_handler(irq, &mdp5_hw_irq_chip, handle_level_irq);
+	irq_set_chip_data(irq, mdp5_kms);
+	set_irq_flags(irq, IRQF_VALID);
+
+	return 0;
+}
+
+static struct irq_domain_ops mdp5_hw_irqdomain_ops = {
+	.map = mdp5_hw_irqdomain_map,
+	.xlate = irq_domain_xlate_onecell,
+};
+
+
+int mdp5_irq_domain_init(struct mdp5_kms *mdp5_kms)
+{
+	struct device *dev = mdp5_kms->dev->dev;
+	struct irq_domain *d;
+
+	d = irq_domain_add_linear(dev->of_node, 32,
+			&mdp5_hw_irqdomain_ops, mdp5_kms);
+	if (!d) {
+		dev_err(dev, "mdp5 irq domain add failed\n");
+		return -ENXIO;
+	}
+
+	mdp5_kms->irqcontroller.enabled_mask = 0;
+	mdp5_kms->irqcontroller.domain = d;
+
+	return 0;
+}
+
+void mdp5_irq_domain_fini(struct mdp5_kms *mdp5_kms)
+{
+	if (mdp5_kms->irqcontroller.domain) {
+		irq_domain_remove(mdp5_kms->irqcontroller.domain);
+		mdp5_kms->irqcontroller.domain = NULL;
+	}
+}
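
Note on the dispatch loop added to mdp5_irq() above: after the MDP bit has been handled and cleared, the remaining status bits are walked from the most significant set bit downwards, and each bit is cleared once its handler has been called. The following userspace sketch shows the same bit walk. fls32 is a portable stand-in for the kernel's fls(), and collect_pending is a hypothetical helper that records the order in which bits would be dispatched instead of calling generic_handle_irq().

```c
#include <stdint.h>

/*
 * Stand-in for the kernel's fls(): 1-based index of the most
 * significant set bit, or 0 when no bit is set.
 */
static int fls32(uint32_t x)
{
	int pos = 0;

	while (x) {
		pos++;
		x >>= 1;
	}
	return pos;
}

/*
 * Store the pending hwirq numbers in 'order', highest first, and return
 * how many were found. 'order' must have room for 32 entries.
 */
static unsigned int collect_pending(uint32_t intr, unsigned int *order)
{
	unsigned int n = 0;

	while (intr) {
		unsigned int hwirq = (unsigned int)fls32(intr) - 1;

		order[n++] = hwirq;
		/* unsigned constant, so clearing bit 31 is well defined */
		intr &= ~(UINT32_C(1) << hwirq);
	}
	return n;
}
```

Because every iteration clears the bit it just handled, the loop ends after as many iterations as there were set bits in the status word.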
diff --git a/drivers/gpu/drm/msm/mdp/mdp5/mdp5_kms.c b/drivers/gpu/drm/msm/mdp/mdp5/mdp5_kms.c
index 31a2c6331a1d..a11f1b80c488 100644
--- a/drivers/gpu/drm/msm/mdp/mdp5/mdp5_kms.c
+++ b/drivers/gpu/drm/msm/mdp/mdp5/mdp5_kms.c
@@ -1,4 +1,5 @@
 /*
+ * Copyright (c) 2014, The Linux Foundation. All rights reserved.
  * Copyright (C) 2013 Red Hat
  * Author: Rob Clark <robdclark@gmail.com>
  *
@@ -24,145 +25,11 @@ static const char *iommu_ports[] = {
 		"mdp_0",
 };
 
-static struct mdp5_platform_config *mdp5_get_config(struct platform_device *dev);
-
-const struct mdp5_config *mdp5_cfg;
-
-static const struct mdp5_config msm8x74_config = {
-	.name = "msm8x74",
-	.ctl = {
-		.count = 5,
-		.base = { 0x00600, 0x00700, 0x00800, 0x00900, 0x00a00 },
-	},
-	.pipe_vig = {
-		.count = 3,
-		.base = { 0x01200, 0x01600, 0x01a00 },
-	},
-	.pipe_rgb = {
-		.count = 3,
-		.base = { 0x01e00, 0x02200, 0x02600 },
-	},
-	.pipe_dma = {
-		.count = 2,
-		.base = { 0x02a00, 0x02e00 },
-	},
-	.lm = {
-		.count = 5,
-		.base = { 0x03200, 0x03600, 0x03a00, 0x03e00, 0x04200 },
-	},
-	.dspp = {
-		.count = 3,
-		.base = { 0x04600, 0x04a00, 0x04e00 },
-	},
-	.ad = {
-		.count = 2,
-		.base = { 0x13100, 0x13300 }, /* NOTE: no ad in v1.0 */
-	},
-	.intf = {
-		.count = 4,
-		.base = { 0x12500, 0x12700, 0x12900, 0x12b00 },
-	},
-};
-
-static const struct mdp5_config apq8084_config = {
-	.name = "apq8084",
-	.ctl = {
-		.count = 5,
-		.base = { 0x00600, 0x00700, 0x00800, 0x00900, 0x00a00 },
-	},
-	.pipe_vig = {
-		.count = 4,
-		.base = { 0x01200, 0x01600, 0x01a00, 0x01e00 },
-	},
-	.pipe_rgb = {
-		.count = 4,
-		.base = { 0x02200, 0x02600, 0x02a00, 0x02e00 },
-	},
-	.pipe_dma = {
-		.count = 2,
-		.base = { 0x03200, 0x03600 },
-	},
-	.lm = {
-		.count = 6,
-		.base = { 0x03a00, 0x03e00, 0x04200, 0x04600, 0x04a00, 0x04e00 },
-	},
-	.dspp = {
-		.count = 4,
-		.base = { 0x05200, 0x05600, 0x05a00, 0x05e00 },
-
-	},
-	.ad = {
-		.count = 3,
-		.base = { 0x13500, 0x13700, 0x13900 },
-	},
-	.intf = {
-		.count = 5,
-		.base = { 0x12500, 0x12700, 0x12900, 0x12b00, 0x12d00 },
-	},
-};
-
-struct mdp5_config_entry {
-	int revision;
-	const struct mdp5_config *config;
-};
-
-static const struct mdp5_config_entry mdp5_configs[] = {
-	{ .revision = 0, .config = &msm8x74_config },
-	{ .revision = 2, .config = &msm8x74_config },
-	{ .revision = 3, .config = &apq8084_config },
-};
-
-static int mdp5_select_hw_cfg(struct msm_kms *kms)
-{
-	struct mdp5_kms *mdp5_kms = to_mdp5_kms(to_mdp_kms(kms));
-	struct drm_device *dev = mdp5_kms->dev;
-	uint32_t version, major, minor;
-	int i, ret = 0;
-
-	mdp5_enable(mdp5_kms);
-	version = mdp5_read(mdp5_kms, REG_MDP5_MDP_VERSION);
-	mdp5_disable(mdp5_kms);
-
-	major = FIELD(version, MDP5_MDP_VERSION_MAJOR);
-	minor = FIELD(version, MDP5_MDP_VERSION_MINOR);
-
-	DBG("found MDP5 version v%d.%d", major, minor);
-
-	if (major != 1) {
-		dev_err(dev->dev, "unexpected MDP major version: v%d.%d\n",
-				major, minor);
-		ret = -ENXIO;
-		goto out;
-	}
-
-	mdp5_kms->rev = minor;
-
-	/* only after mdp5_cfg global pointer's init can we access the hw */
-	for (i = 0; i < ARRAY_SIZE(mdp5_configs); i++) {
-		if (mdp5_configs[i].revision != minor)
-			continue;
-		mdp5_kms->hw_cfg = mdp5_cfg = mdp5_configs[i].config;
-		break;
-	}
-	if (unlikely(!mdp5_kms->hw_cfg)) {
-		dev_err(dev->dev, "unexpected MDP minor revision: v%d.%d\n",
-				major, minor);
-		ret = -ENXIO;
-		goto out;
-	}
-
-	DBG("MDP5: %s config selected", mdp5_kms->hw_cfg->name);
-
-	return 0;
-out:
-	return ret;
-}
-
 static int mdp5_hw_init(struct msm_kms *kms)
 {
 	struct mdp5_kms *mdp5_kms = to_mdp5_kms(to_mdp_kms(kms));
 	struct drm_device *dev = mdp5_kms->dev;
-	int i;
+	unsigned long flags;
 
 	pm_runtime_get_sync(dev->dev);
 
@@ -190,10 +57,11 @@ static int mdp5_hw_init(struct msm_kms *kms)
 	 * care.
 	 */
 
+	spin_lock_irqsave(&mdp5_kms->resource_lock, flags);
 	mdp5_write(mdp5_kms, REG_MDP5_DISP_INTF_SEL, 0);
+	spin_unlock_irqrestore(&mdp5_kms->resource_lock, flags);
 
-	for (i = 0; i < mdp5_kms->hw_cfg->ctl.count; i++)
-		mdp5_write(mdp5_kms, REG_MDP5_CTL_OP(i), 0);
+	mdp5_ctlm_hw_reset(mdp5_kms->ctlm);
 
 	pm_runtime_put_sync(dev->dev);
 
@@ -221,10 +89,20 @@ static void mdp5_destroy(struct msm_kms *kms)
 	struct mdp5_kms *mdp5_kms = to_mdp5_kms(to_mdp_kms(kms));
 	struct msm_mmu *mmu = mdp5_kms->mmu;
 
+	mdp5_irq_domain_fini(mdp5_kms);
+
 	if (mmu) {
 		mmu->funcs->detach(mmu, iommu_ports, ARRAY_SIZE(iommu_ports));
 		mmu->funcs->destroy(mmu);
 	}
+
+	if (mdp5_kms->ctlm)
+		mdp5_ctlm_destroy(mdp5_kms->ctlm);
+	if (mdp5_kms->smp)
+		mdp5_smp_destroy(mdp5_kms->smp);
+	if (mdp5_kms->cfg)
+		mdp5_cfg_destroy(mdp5_kms->cfg);
+
 	kfree(mdp5_kms);
 }
 
@@ -274,17 +152,31 @@ static int modeset_init(struct mdp5_kms *mdp5_kms)
 	static const enum mdp5_pipe crtcs[] = {
 			SSPP_RGB0, SSPP_RGB1, SSPP_RGB2, SSPP_RGB3,
 	};
+	static const enum mdp5_pipe pub_planes[] = {
+			SSPP_VIG0, SSPP_VIG1, SSPP_VIG2, SSPP_VIG3,
+	};
 	struct drm_device *dev = mdp5_kms->dev;
 	struct msm_drm_private *priv = dev->dev_private;
 	struct drm_encoder *encoder;
+	const struct mdp5_cfg_hw *hw_cfg;
 	int i, ret;
 
-	/* construct CRTCs: */
-	for (i = 0; i < mdp5_kms->hw_cfg->pipe_rgb.count; i++) {
+	hw_cfg = mdp5_cfg_get_hw_config(mdp5_kms->cfg);
+
+	/* register our interrupt-controller for hdmi/eDP/dsi/etc
+	 * to use for irqs routed through mdp:
+	 */
+	ret = mdp5_irq_domain_init(mdp5_kms);
+	if (ret)
+		goto fail;
+
+	/* construct CRTCs and their private planes: */
+	for (i = 0; i < hw_cfg->pipe_rgb.count; i++) {
 		struct drm_plane *plane;
 		struct drm_crtc *crtc;
 
-		plane = mdp5_plane_init(dev, crtcs[i], true);
+		plane = mdp5_plane_init(dev, crtcs[i], true,
+				hw_cfg->pipe_rgb.base[i]);
 		if (IS_ERR(plane)) {
 			ret = PTR_ERR(plane);
 			dev_err(dev->dev, "failed to construct plane for %s (%d)\n",
@@ -302,6 +194,20 @@ static int modeset_init(struct mdp5_kms *mdp5_kms)
 		priv->crtcs[priv->num_crtcs++] = crtc;
 	}
 
+	/* Construct public planes: */
+	for (i = 0; i < hw_cfg->pipe_vig.count; i++) {
+		struct drm_plane *plane;
+
+		plane = mdp5_plane_init(dev, pub_planes[i], false,
+				hw_cfg->pipe_vig.base[i]);
+		if (IS_ERR(plane)) {
+			ret = PTR_ERR(plane);
+			dev_err(dev->dev, "failed to construct %s plane: %d\n",
+					pipe2name(pub_planes[i]), ret);
+			goto fail;
+		}
+	}
+
 	/* Construct encoder for HDMI: */
 	encoder = mdp5_encoder_init(dev, 3, INTF_HDMI);
 	if (IS_ERR(encoder)) {
@@ -324,11 +230,12 @@ static int modeset_init(struct mdp5_kms *mdp5_kms)
 	priv->encoders[priv->num_encoders++] = encoder;
 
 	/* Construct bridge/connector for HDMI: */
-	mdp5_kms->hdmi = hdmi_init(dev, encoder);
-	if (IS_ERR(mdp5_kms->hdmi)) {
-		ret = PTR_ERR(mdp5_kms->hdmi);
-		dev_err(dev->dev, "failed to initialize HDMI: %d\n", ret);
-		goto fail;
+	if (priv->hdmi) {
+		ret = hdmi_modeset_init(priv->hdmi, dev, encoder);
+		if (ret) {
+			dev_err(dev->dev, "failed to initialize HDMI: %d\n", ret);
+			goto fail;
+		}
 	}
 
 	return 0;
@@ -337,6 +244,21 @@ fail:
 	return ret;
 }
 
+static void read_hw_revision(struct mdp5_kms *mdp5_kms,
+		uint32_t *major, uint32_t *minor)
+{
+	uint32_t version;
+
+	mdp5_enable(mdp5_kms);
+	version = mdp5_read(mdp5_kms, REG_MDP5_MDP_VERSION);
+	mdp5_disable(mdp5_kms);
+
+	*major = FIELD(version, MDP5_MDP_VERSION_MAJOR);
+	*minor = FIELD(version, MDP5_MDP_VERSION_MINOR);
+
+	DBG("MDP5 version v%d.%d", *major, *minor);
+}
+
 static int get_clk(struct platform_device *pdev, struct clk **clkp,
 		const char *name)
 {
@@ -353,10 +275,11 @@ static int get_clk(struct platform_device *pdev, struct clk **clkp,
 struct msm_kms *mdp5_kms_init(struct drm_device *dev)
 {
 	struct platform_device *pdev = dev->platformdev;
-	struct mdp5_platform_config *config = mdp5_get_config(pdev);
+	struct mdp5_cfg *config;
 	struct mdp5_kms *mdp5_kms;
 	struct msm_kms *kms = NULL;
 	struct msm_mmu *mmu;
+	uint32_t major, minor;
 	int i, ret;
 
 	mdp5_kms = kzalloc(sizeof(*mdp5_kms), GFP_KERNEL);
@@ -366,12 +289,13 @@ struct msm_kms *mdp5_kms_init(struct drm_device *dev)
 		goto fail;
 	}
 
+	spin_lock_init(&mdp5_kms->resource_lock);
+
 	mdp_kms_init(&mdp5_kms->base, &kms_funcs);
 
 	kms = &mdp5_kms->base.base;
 
 	mdp5_kms->dev = dev;
-	mdp5_kms->smp_blk_cnt = config->smp_blk_cnt;
 
 	mdp5_kms->mmio = msm_ioremap(pdev, "mdp_phys", "MDP5");
 	if (IS_ERR(mdp5_kms->mmio)) {
@@ -416,24 +340,52 @@ struct msm_kms *mdp5_kms_init(struct drm_device *dev)
 	if (ret)
 		goto fail;
 
-	ret = clk_set_rate(mdp5_kms->src_clk, config->max_clk);
+	/* we need to set a default rate before enabling.  Set a safe
+	 * rate first, then figure out hw revision, and then set a
+	 * more optimal rate:
+	 */
+	clk_set_rate(mdp5_kms->src_clk, 200000000);
+
+	read_hw_revision(mdp5_kms, &major, &minor);
 
-	ret = mdp5_select_hw_cfg(kms);
-	if (ret)
+	mdp5_kms->cfg = mdp5_cfg_init(mdp5_kms, major, minor);
+	if (IS_ERR(mdp5_kms->cfg)) {
+		ret = PTR_ERR(mdp5_kms->cfg);
+		mdp5_kms->cfg = NULL;
 		goto fail;
+	}
+
+	config = mdp5_cfg_get_config(mdp5_kms->cfg);
+
+	/* TODO: compute core clock rate at runtime */
+	clk_set_rate(mdp5_kms->src_clk, config->hw->max_clk);
+
+	mdp5_kms->smp = mdp5_smp_init(mdp5_kms->dev, &config->hw->smp);
+	if (IS_ERR(mdp5_kms->smp)) {
+		ret = PTR_ERR(mdp5_kms->smp);
+		mdp5_kms->smp = NULL;
+		goto fail;
+	}
+
+	mdp5_kms->ctlm = mdp5_ctlm_init(dev, mdp5_kms->mmio, config->hw);
+	if (IS_ERR(mdp5_kms->ctlm)) {
+		ret = PTR_ERR(mdp5_kms->ctlm);
+		mdp5_kms->ctlm = NULL;
+		goto fail;
+	}
 
 	/* make sure things are off before attaching iommu (bootloader could
 	 * have left things on, in which case we'll start getting faults if
 	 * we don't disable):
 	 */
 	mdp5_enable(mdp5_kms);
-	for (i = 0; i < mdp5_kms->hw_cfg->intf.count; i++)
+	for (i = 0; i < config->hw->intf.count; i++)
 		mdp5_write(mdp5_kms, REG_MDP5_INTF_TIMING_ENGINE_EN(i), 0);
 	mdp5_disable(mdp5_kms);
 	mdelay(16);
 
-	if (config->iommu) {
-		mmu = msm_iommu_new(&pdev->dev, config->iommu);
+	if (config->platform.iommu) {
+		mmu = msm_iommu_new(&pdev->dev, config->platform.iommu);
 		if (IS_ERR(mmu)) {
 			ret = PTR_ERR(mmu);
 			dev_err(dev->dev, "failed to init iommu: %d\n", ret);
@@ -474,18 +426,3 @@ fail:
 		mdp5_destroy(kms);
 	return ERR_PTR(ret);
 }
-
-static struct mdp5_platform_config *mdp5_get_config(struct platform_device *dev)
-{
-	static struct mdp5_platform_config config = {};
-#ifdef CONFIG_OF
-	/* TODO */
-#endif
-	config.iommu = iommu_domain_alloc(&platform_bus_type);
-	/* TODO hard-coded in downstream mdss, but should it be? */
-	config.max_clk = 200000000;
-	/* TODO get from DT: */
-	config.smp_blk_cnt = 22;
-
-	return &config;
-}
diff --git a/drivers/gpu/drm/msm/mdp/mdp5/mdp5_kms.h b/drivers/gpu/drm/msm/mdp/mdp5/mdp5_kms.h
index 5bf340dd0f00..dd69c77c0d64 100644
--- a/drivers/gpu/drm/msm/mdp/mdp5/mdp5_kms.h
+++ b/drivers/gpu/drm/msm/mdp/mdp5/mdp5_kms.h
@@ -21,25 +21,9 @@
 #include "msm_drv.h"
 #include "msm_kms.h"
 #include "mdp/mdp_kms.h"
-/* dynamic offsets used by mdp5.xml.h (initialized in mdp5_kms.c) */
-#define MDP5_MAX_BASES		8
-struct mdp5_sub_block {
-	int	count;
-	uint32_t base[MDP5_MAX_BASES];
-};
-struct mdp5_config {
-	char  *name;
-	struct mdp5_sub_block ctl;
-	struct mdp5_sub_block pipe_vig;
-	struct mdp5_sub_block pipe_rgb;
-	struct mdp5_sub_block pipe_dma;
-	struct mdp5_sub_block lm;
-	struct mdp5_sub_block dspp;
-	struct mdp5_sub_block ad;
-	struct mdp5_sub_block intf;
-};
-extern const struct mdp5_config *mdp5_cfg;
+#include "mdp5_cfg.h"	/* must be included before mdp5.xml.h */
 #include "mdp5.xml.h"
+#include "mdp5_ctl.h"
 #include "mdp5_smp.h"
 
 struct mdp5_kms {
@@ -47,17 +31,14 @@ struct mdp5_kms {
 
 	struct drm_device *dev;
 
-	int rev;
-	const struct mdp5_config *hw_cfg;
+	struct mdp5_cfg_handler *cfg;
 
 	/* mapper-id used to request GEM buffer mapped for scanout: */
 	int id;
 	struct msm_mmu *mmu;
 
-	/* for tracking smp allocation amongst pipes: */
-	mdp5_smp_state_t smp_state;
-	struct mdp5_client_smp_state smp_client_state[CID_MAX];
-	int smp_blk_cnt;
+	struct mdp5_smp *smp;
+	struct mdp5_ctl_manager *ctlm;
 
 	/* io/register spaces: */
 	void __iomem *mmio, *vbif;
@@ -71,18 +52,47 @@ struct mdp5_kms {
 	struct clk *lut_clk;
 	struct clk *vsync_clk;
 
-	struct hdmi *hdmi;
+	/*
+	 * lock to protect access to global resources: ie., following register:
+	 *	- REG_MDP5_DISP_INTF_SEL
+	 */
+	spinlock_t resource_lock;
 
 	struct mdp_irq error_handler;
+
+	struct {
+		volatile unsigned long enabled_mask;
+		struct irq_domain *domain;
+	} irqcontroller;
 };
 #define to_mdp5_kms(x) container_of(x, struct mdp5_kms, base)
 
-/* platform config data (ie. from DT, or pdata) */
-struct mdp5_platform_config {
-	struct iommu_domain *iommu;
-	uint32_t max_clk;
-	int smp_blk_cnt;
+struct mdp5_plane_state {
+	struct drm_plane_state base;
+
+	/* "virtual" zpos.. we calculate actual mixer-stage at runtime
+	 * by sorting the attached planes by zpos and then assigning
+	 * mixer stage lowest to highest.  Private planes get default
+	 * zpos of zero, and public planes a unique value that is
+	 * greater than zero.  This way, things work out if a naive
+	 * userspace assigns planes to a crtc without setting zpos.
+	 */
+	int zpos;
+
+	/* the actual mixer stage, calculated in crtc->atomic_check()
+	 * NOTE: this should move to mdp5_crtc_state, when that exists
+	 */
+	enum mdp_mixer_stage_id stage;
+
+	/* some additional transactional status to help us know in the
+	 * apply path whether we need to update SMP allocation, and
+	 * whether current update is still pending:
+	 */
+	bool mode_changed : 1;
+	bool pending : 1;
 };
+#define to_mdp5_plane_state(x) \
+		container_of(x, struct mdp5_plane_state, base)
 
 static inline void mdp5_write(struct mdp5_kms *mdp5_kms, u32 reg, u32 data)
 {
@@ -107,23 +117,6 @@ static inline const char *pipe2name(enum mdp5_pipe pipe)
 	return names[pipe];
 }
 
-static inline uint32_t pipe2flush(enum mdp5_pipe pipe)
-{
-	switch (pipe) {
-	case SSPP_VIG0: return MDP5_CTL_FLUSH_VIG0;
-	case SSPP_VIG1: return MDP5_CTL_FLUSH_VIG1;
-	case SSPP_VIG2: return MDP5_CTL_FLUSH_VIG2;
-	case SSPP_RGB0: return MDP5_CTL_FLUSH_RGB0;
-	case SSPP_RGB1: return MDP5_CTL_FLUSH_RGB1;
-	case SSPP_RGB2: return MDP5_CTL_FLUSH_RGB2;
-	case SSPP_DMA0: return MDP5_CTL_FLUSH_DMA0;
-	case SSPP_DMA1: return MDP5_CTL_FLUSH_DMA1;
-	case SSPP_VIG3: return MDP5_CTL_FLUSH_VIG3;
-	case SSPP_RGB3: return MDP5_CTL_FLUSH_RGB3;
-	default:        return 0;
-	}
-}
-
 static inline int pipe2nclients(enum mdp5_pipe pipe)
 {
 	switch (pipe) {
@@ -137,34 +130,6 @@ static inline int pipe2nclients(enum mdp5_pipe pipe)
 	}
 }
 
-static inline enum mdp5_client_id pipe2client(enum mdp5_pipe pipe, int plane)
-{
-	WARN_ON(plane >= pipe2nclients(pipe));
-	switch (pipe) {
-	case SSPP_VIG0: return CID_VIG0_Y + plane;
-	case SSPP_VIG1: return CID_VIG1_Y + plane;
-	case SSPP_VIG2: return CID_VIG2_Y + plane;
-	case SSPP_RGB0: return CID_RGB0;
-	case SSPP_RGB1: return CID_RGB1;
-	case SSPP_RGB2: return CID_RGB2;
-	case SSPP_DMA0: return CID_DMA0_Y + plane;
-	case SSPP_DMA1: return CID_DMA1_Y + plane;
-	case SSPP_VIG3: return CID_VIG3_Y + plane;
-	case SSPP_RGB3: return CID_RGB3;
-	default:        return CID_UNUSED;
-	}
-}
-
-static inline uint32_t mixer2flush(int lm)
-{
-	switch (lm) {
-	case 0:  return MDP5_CTL_FLUSH_LM0;
-	case 1:  return MDP5_CTL_FLUSH_LM1;
-	case 2:  return MDP5_CTL_FLUSH_LM2;
-	default: return 0;
-	}
-}
-
 static inline uint32_t intf2err(int intf)
 {
 	switch (intf) {
@@ -197,6 +162,8 @@ void mdp5_irq_uninstall(struct msm_kms *kms);
 irqreturn_t mdp5_irq(struct msm_kms *kms);
 int mdp5_enable_vblank(struct msm_kms *kms, struct drm_crtc *crtc);
 void mdp5_disable_vblank(struct msm_kms *kms, struct drm_crtc *crtc);
+int mdp5_irq_domain_init(struct mdp5_kms *mdp5_kms);
+void mdp5_irq_domain_fini(struct mdp5_kms *mdp5_kms);
 
 static inline
 uint32_t mdp5_get_formats(enum mdp5_pipe pipe, uint32_t *pixel_formats,
@@ -210,26 +177,18 @@ uint32_t mdp5_get_formats(enum mdp5_pipe pipe, uint32_t *pixel_formats,
 
 void mdp5_plane_install_properties(struct drm_plane *plane,
 		struct drm_mode_object *obj);
-void mdp5_plane_set_scanout(struct drm_plane *plane,
-		struct drm_framebuffer *fb);
-int mdp5_plane_mode_set(struct drm_plane *plane,
-		struct drm_crtc *crtc, struct drm_framebuffer *fb,
-		int crtc_x, int crtc_y,
-		unsigned int crtc_w, unsigned int crtc_h,
-		uint32_t src_x, uint32_t src_y,
-		uint32_t src_w, uint32_t src_h);
+uint32_t mdp5_plane_get_flush(struct drm_plane *plane);
 void mdp5_plane_complete_flip(struct drm_plane *plane);
 enum mdp5_pipe mdp5_plane_pipe(struct drm_plane *plane);
 struct drm_plane *mdp5_plane_init(struct drm_device *dev,
-		enum mdp5_pipe pipe, bool private_plane);
+		enum mdp5_pipe pipe, bool private_plane, uint32_t reg_offset);
 
 uint32_t mdp5_crtc_vblank(struct drm_crtc *crtc);
 
+int mdp5_crtc_get_lm(struct drm_crtc *crtc);
 void mdp5_crtc_cancel_pending_flip(struct drm_crtc *crtc, struct drm_file *file);
 void mdp5_crtc_set_intf(struct drm_crtc *crtc, int intf,
 		enum mdp5_intf intf_id);
-void mdp5_crtc_attach(struct drm_crtc *crtc, struct drm_plane *plane);
-void mdp5_crtc_detach(struct drm_crtc *crtc, struct drm_plane *plane);
 struct drm_crtc *mdp5_crtc_init(struct drm_device *dev,
 		struct drm_plane *plane, int id);
 
diff --git a/drivers/gpu/drm/msm/mdp/mdp5/mdp5_plane.c b/drivers/gpu/drm/msm/mdp/mdp5/mdp5_plane.c
index f3daec4412ad..26e5fdea6594 100644
--- a/drivers/gpu/drm/msm/mdp/mdp5/mdp5_plane.c
+++ b/drivers/gpu/drm/msm/mdp/mdp5/mdp5_plane.c
@@ -1,4 +1,5 @@
 /*
+ * Copyright (c) 2014 The Linux Foundation. All rights reserved.
  * Copyright (C) 2013 Red Hat
  * Author: Rob Clark <robdclark@gmail.com>
  *
@@ -17,6 +18,7 @@
 
 #include "mdp5_kms.h"
 
+#define MAX_PLANE	4
 
 struct mdp5_plane {
 	struct drm_plane base;
@@ -24,6 +26,11 @@ struct mdp5_plane {
 
 	enum mdp5_pipe pipe;
 
+	spinlock_t pipe_lock;	/* protect REG_MDP5_PIPE_* registers */
+	uint32_t reg_offset;
+
+	uint32_t flush_mask;	/* used to commit pipe registers */
+
 	uint32_t nformats;
 	uint32_t formats[32];
 
@@ -31,31 +38,24 @@ struct mdp5_plane {
 };
 #define to_mdp5_plane(x) container_of(x, struct mdp5_plane, base)
 
+static int mdp5_plane_mode_set(struct drm_plane *plane,
+		struct drm_crtc *crtc, struct drm_framebuffer *fb,
+		int crtc_x, int crtc_y,
+		unsigned int crtc_w, unsigned int crtc_h,
+		uint32_t src_x, uint32_t src_y,
+		uint32_t src_w, uint32_t src_h);
+static void set_scanout_locked(struct drm_plane *plane,
+		struct drm_framebuffer *fb);
+
 static struct mdp5_kms *get_kms(struct drm_plane *plane)
 {
 	struct msm_drm_private *priv = plane->dev->dev_private;
 	return to_mdp5_kms(to_mdp_kms(priv->kms));
 }
 
-static int mdp5_plane_update(struct drm_plane *plane,
-		struct drm_crtc *crtc, struct drm_framebuffer *fb,
-		int crtc_x, int crtc_y,
-		unsigned int crtc_w, unsigned int crtc_h,
-		uint32_t src_x, uint32_t src_y,
-		uint32_t src_w, uint32_t src_h)
+static bool plane_enabled(struct drm_plane_state *state)
 {
-	struct mdp5_plane *mdp5_plane = to_mdp5_plane(plane);
-
-	mdp5_plane->enabled = true;
-
-	if (plane->fb)
-		drm_framebuffer_unreference(plane->fb);
-
-	drm_framebuffer_reference(fb);
-
-	return mdp5_plane_mode_set(plane, crtc, fb,
-			crtc_x, crtc_y, crtc_w, crtc_h,
-			src_x, src_y, src_w, src_h);
+	return state->fb && state->crtc;
 }
 
 static int mdp5_plane_disable(struct drm_plane *plane)
@@ -63,21 +63,13 @@ static int mdp5_plane_disable(struct drm_plane *plane)
 	struct mdp5_plane *mdp5_plane = to_mdp5_plane(plane);
 	struct mdp5_kms *mdp5_kms = get_kms(plane);
 	enum mdp5_pipe pipe = mdp5_plane->pipe;
-	int i;
 
 	DBG("%s: disable", mdp5_plane->name);
 
-	/* update our SMP request to zero (release all our blks): */
-	for (i = 0; i < pipe2nclients(pipe); i++)
-		mdp5_smp_request(mdp5_kms, pipe2client(pipe, i), 0);
-
-	/* TODO detaching now will cause us not to get the last
-	 * vblank and mdp5_smp_commit().. so other planes will
-	 * still see smp blocks previously allocated to us as
-	 * in-use..
-	 */
-	if (plane->crtc)
-		mdp5_crtc_detach(plane->crtc, plane);
+	if (mdp5_kms) {
+		/* Release the memory we requested earlier from the SMP: */
+		mdp5_smp_release(mdp5_kms->smp, pipe);
+	}
 
 	return 0;
 }
@@ -85,11 +77,8 @@ static int mdp5_plane_disable(struct drm_plane *plane)
 static void mdp5_plane_destroy(struct drm_plane *plane)
 {
 	struct mdp5_plane *mdp5_plane = to_mdp5_plane(plane);
-	struct msm_drm_private *priv = plane->dev->dev_private;
-
-	if (priv->kms)
-		mdp5_plane_disable(plane);
 
+	drm_plane_helper_disable(plane);
 	drm_plane_cleanup(plane);
 
 	kfree(mdp5_plane);
@@ -109,109 +98,186 @@ int mdp5_plane_set_property(struct drm_plane *plane,
 	return -EINVAL;
 }
 
+static void mdp5_plane_reset(struct drm_plane *plane)
+{
+	struct mdp5_plane_state *mdp5_state;
+
+	if (plane->state && plane->state->fb)
+		drm_framebuffer_unreference(plane->state->fb);
+
+	kfree(to_mdp5_plane_state(plane->state));
+	mdp5_state = kzalloc(sizeof(*mdp5_state), GFP_KERNEL);
+
+	if (plane->type == DRM_PLANE_TYPE_PRIMARY) {
+		mdp5_state->zpos = 0;
+	} else {
+		mdp5_state->zpos = 1 + drm_plane_index(plane);
+	}
+
+	plane->state = &mdp5_state->base;
+}
+
+static struct drm_plane_state *
+mdp5_plane_duplicate_state(struct drm_plane *plane)
+{
+	struct mdp5_plane_state *mdp5_state;
+
+	if (WARN_ON(!plane->state))
+		return NULL;
+
+	mdp5_state = kmemdup(to_mdp5_plane_state(plane->state),
+			sizeof(*mdp5_state), GFP_KERNEL);
+
+	if (mdp5_state && mdp5_state->base.fb)
+		drm_framebuffer_reference(mdp5_state->base.fb);
+
+	mdp5_state->mode_changed = false;
+	mdp5_state->pending = false;
+
+	return &mdp5_state->base;
+}
+
+static void mdp5_plane_destroy_state(struct drm_plane *plane,
+		struct drm_plane_state *state)
+{
+	if (state->fb)
+		drm_framebuffer_unreference(state->fb);
+
+	kfree(to_mdp5_plane_state(state));
+}
+
 static const struct drm_plane_funcs mdp5_plane_funcs = {
-		.update_plane = mdp5_plane_update,
-		.disable_plane = mdp5_plane_disable,
+		.update_plane = drm_atomic_helper_update_plane,
+		.disable_plane = drm_atomic_helper_disable_plane,
 		.destroy = mdp5_plane_destroy,
 		.set_property = mdp5_plane_set_property,
+		.reset = mdp5_plane_reset,
+		.atomic_duplicate_state = mdp5_plane_duplicate_state,
+		.atomic_destroy_state = mdp5_plane_destroy_state,
 };
 
-void mdp5_plane_set_scanout(struct drm_plane *plane,
+static int mdp5_plane_prepare_fb(struct drm_plane *plane,
 		struct drm_framebuffer *fb)
 {
 	struct mdp5_plane *mdp5_plane = to_mdp5_plane(plane);
 	struct mdp5_kms *mdp5_kms = get_kms(plane);
-	enum mdp5_pipe pipe = mdp5_plane->pipe;
-	uint32_t nplanes = drm_format_num_planes(fb->pixel_format);
-	uint32_t iova[4];
-	int i;
-
-	for (i = 0; i < nplanes; i++) {
-		struct drm_gem_object *bo = msm_framebuffer_bo(fb, i);
-		msm_gem_get_iova(bo, mdp5_kms->id, &iova[i]);
-	}
-	for (; i < 4; i++)
-		iova[i] = 0;
 
-	mdp5_write(mdp5_kms, REG_MDP5_PIPE_SRC_STRIDE_A(pipe),
-			MDP5_PIPE_SRC_STRIDE_A_P0(fb->pitches[0]) |
-			MDP5_PIPE_SRC_STRIDE_A_P1(fb->pitches[1]));
-
-	mdp5_write(mdp5_kms, REG_MDP5_PIPE_SRC_STRIDE_B(pipe),
-			MDP5_PIPE_SRC_STRIDE_B_P2(fb->pitches[2]) |
-			MDP5_PIPE_SRC_STRIDE_B_P3(fb->pitches[3]));
-
-	mdp5_write(mdp5_kms, REG_MDP5_PIPE_SRC0_ADDR(pipe), iova[0]);
-	mdp5_write(mdp5_kms, REG_MDP5_PIPE_SRC1_ADDR(pipe), iova[1]);
-	mdp5_write(mdp5_kms, REG_MDP5_PIPE_SRC2_ADDR(pipe), iova[2]);
-	mdp5_write(mdp5_kms, REG_MDP5_PIPE_SRC3_ADDR(pipe), iova[3]);
-
-	plane->fb = fb;
+	DBG("%s: prepare: FB[%u]", mdp5_plane->name, fb->base.id);
+	return msm_framebuffer_prepare(fb, mdp5_kms->id);
 }
 
-/* NOTE: looks like if horizontal decimation is used (if we supported that)
- * then the width used to calculate SMP block requirements is the post-
- * decimated width.  Ie. SMP buffering sits downstream of decimation (which
- * presumably happens during the dma from scanout buffer).
- */
-static int request_smp_blocks(struct drm_plane *plane, uint32_t format,
-		uint32_t nplanes, uint32_t width)
+static void mdp5_plane_cleanup_fb(struct drm_plane *plane,
+		struct drm_framebuffer *fb)
 {
-	struct drm_device *dev = plane->dev;
 	struct mdp5_plane *mdp5_plane = to_mdp5_plane(plane);
 	struct mdp5_kms *mdp5_kms = get_kms(plane);
-	enum mdp5_pipe pipe = mdp5_plane->pipe;
-	int i, hsub, nlines, nblks, ret;
 
-	hsub = drm_format_horz_chroma_subsampling(format);
+	DBG("%s: cleanup: FB[%u]", mdp5_plane->name, fb->base.id);
+	msm_framebuffer_cleanup(fb, mdp5_kms->id);
+}
 
-	/* different if BWC (compressed framebuffer?) enabled: */
-	nlines = 2;
+static int mdp5_plane_atomic_check(struct drm_plane *plane,
+		struct drm_plane_state *state)
+{
+	struct mdp5_plane *mdp5_plane = to_mdp5_plane(plane);
+	struct drm_plane_state *old_state = plane->state;
 
-	for (i = 0, nblks = 0; i < nplanes; i++) {
-		int n, fetch_stride, cpp;
+	DBG("%s: check (%d -> %d)", mdp5_plane->name,
+			plane_enabled(old_state), plane_enabled(state));
 
-		cpp = drm_format_plane_cpp(format, i);
-		fetch_stride = width * cpp / (i ? hsub : 1);
+	if (plane_enabled(state) && plane_enabled(old_state)) {
+		/* we cannot change SMP block configuration during scanout: */
+		bool full_modeset = false;
+		if (state->fb->pixel_format != old_state->fb->pixel_format) {
+			DBG("%s: pixel_format change!", mdp5_plane->name);
+			full_modeset = true;
+		}
+		if (state->src_w != old_state->src_w) {
+			DBG("%s: src_w change!", mdp5_plane->name);
+			full_modeset = true;
+		}
+		if (to_mdp5_plane_state(old_state)->pending) {
+			DBG("%s: still pending!", mdp5_plane->name);
+			full_modeset = true;
+		}
+		if (full_modeset) {
+			struct drm_crtc_state *crtc_state =
+					drm_atomic_get_crtc_state(state->state, state->crtc);
+			crtc_state->mode_changed = true;
+			to_mdp5_plane_state(state)->mode_changed = true;
+		}
+	} else {
+		to_mdp5_plane_state(state)->mode_changed = true;
+	}
 
-		n = DIV_ROUND_UP(fetch_stride * nlines, SMP_BLK_SIZE);
+	return 0;
+}
 
-		/* for hw rev v1.00 */
-		if (mdp5_kms->rev == 0)
-			n = roundup_pow_of_two(n);
+static void mdp5_plane_atomic_update(struct drm_plane *plane,
+				     struct drm_plane_state *old_state)
+{
+	struct mdp5_plane *mdp5_plane = to_mdp5_plane(plane);
+	struct drm_plane_state *state = plane->state;
 
-		DBG("%s[%d]: request %d SMP blocks", mdp5_plane->name, i, n);
-		ret = mdp5_smp_request(mdp5_kms, pipe2client(pipe, i), n);
-		if (ret) {
-			dev_err(dev->dev, "Could not allocate %d SMP blocks: %d\n",
-					n, ret);
-			return ret;
-		}
+	DBG("%s: update", mdp5_plane->name);
 
-		nblks += n;
+	if (!plane_enabled(state)) {
+		to_mdp5_plane_state(state)->pending = true;
+		mdp5_plane_disable(plane);
+	} else if (to_mdp5_plane_state(state)->mode_changed) {
+		int ret;
+		to_mdp5_plane_state(state)->pending = true;
+		ret = mdp5_plane_mode_set(plane,
+				state->crtc, state->fb,
+				state->crtc_x, state->crtc_y,
+				state->crtc_w, state->crtc_h,
+				state->src_x,  state->src_y,
+				state->src_w, state->src_h);
+		/* atomic_check should have ensured that this doesn't fail */
+		WARN_ON(ret < 0);
+	} else {
+		unsigned long flags;
+		spin_lock_irqsave(&mdp5_plane->pipe_lock, flags);
+		set_scanout_locked(plane, state->fb);
+		spin_unlock_irqrestore(&mdp5_plane->pipe_lock, flags);
 	}
-
-	/* in success case, return total # of blocks allocated: */
-	return nblks;
 }
 
-static void set_fifo_thresholds(struct drm_plane *plane, int nblks)
+static const struct drm_plane_helper_funcs mdp5_plane_helper_funcs = {
+		.prepare_fb = mdp5_plane_prepare_fb,
+		.cleanup_fb = mdp5_plane_cleanup_fb,
+		.atomic_check = mdp5_plane_atomic_check,
+		.atomic_update = mdp5_plane_atomic_update,
+};
+
+static void set_scanout_locked(struct drm_plane *plane,
+		struct drm_framebuffer *fb)
 {
 	struct mdp5_plane *mdp5_plane = to_mdp5_plane(plane);
 	struct mdp5_kms *mdp5_kms = get_kms(plane);
 	enum mdp5_pipe pipe = mdp5_plane->pipe;
-	uint32_t val;
 
-	/* 1/4 of SMP pool that is being fetched */
-	val = (nblks * SMP_ENTRIES_PER_BLK) / 4;
+	mdp5_write(mdp5_kms, REG_MDP5_PIPE_SRC_STRIDE_A(pipe),
+			MDP5_PIPE_SRC_STRIDE_A_P0(fb->pitches[0]) |
+			MDP5_PIPE_SRC_STRIDE_A_P1(fb->pitches[1]));
 
-	mdp5_write(mdp5_kms, REG_MDP5_PIPE_REQPRIO_FIFO_WM_0(pipe), val * 1);
-	mdp5_write(mdp5_kms, REG_MDP5_PIPE_REQPRIO_FIFO_WM_1(pipe), val * 2);
-	mdp5_write(mdp5_kms, REG_MDP5_PIPE_REQPRIO_FIFO_WM_2(pipe), val * 3);
+	mdp5_write(mdp5_kms, REG_MDP5_PIPE_SRC_STRIDE_B(pipe),
+			MDP5_PIPE_SRC_STRIDE_B_P2(fb->pitches[2]) |
+			MDP5_PIPE_SRC_STRIDE_B_P3(fb->pitches[3]));
+
+	mdp5_write(mdp5_kms, REG_MDP5_PIPE_SRC0_ADDR(pipe),
+			msm_framebuffer_iova(fb, mdp5_kms->id, 0));
+	mdp5_write(mdp5_kms, REG_MDP5_PIPE_SRC1_ADDR(pipe),
+			msm_framebuffer_iova(fb, mdp5_kms->id, 1));
+	mdp5_write(mdp5_kms, REG_MDP5_PIPE_SRC2_ADDR(pipe),
+			msm_framebuffer_iova(fb, mdp5_kms->id, 2));
+	mdp5_write(mdp5_kms, REG_MDP5_PIPE_SRC3_ADDR(pipe),
+			msm_framebuffer_iova(fb, mdp5_kms->id, 3));
 
+	plane->fb = fb;
 }
 
-int mdp5_plane_mode_set(struct drm_plane *plane,
+static int mdp5_plane_mode_set(struct drm_plane *plane,
 		struct drm_crtc *crtc, struct drm_framebuffer *fb,
 		int crtc_x, int crtc_y,
 		unsigned int crtc_w, unsigned int crtc_h,
@@ -225,7 +291,8 @@ int mdp5_plane_mode_set(struct drm_plane *plane,
 	uint32_t nplanes, config = 0;
 	uint32_t phasex_step = 0, phasey_step = 0;
 	uint32_t hdecm = 0, vdecm = 0;
-	int i, nblks;
+	unsigned long flags;
+	int ret;
 
 	nplanes = drm_format_num_planes(fb->pixel_format);
 
@@ -243,12 +310,11 @@ int mdp5_plane_mode_set(struct drm_plane *plane,
 			fb->base.id, src_x, src_y, src_w, src_h,
 			crtc->base.id, crtc_x, crtc_y, crtc_w, crtc_h);
 
-	/*
-	 * Calculate and request required # of smp blocks:
-	 */
-	nblks = request_smp_blocks(plane, fb->pixel_format, nplanes, src_w);
-	if (nblks < 0)
-		return nblks;
+	/* Request some memory from the SMP: */
+	ret = mdp5_smp_request(mdp5_kms->smp,
+			mdp5_plane->pipe, fb->pixel_format, src_w);
+	if (ret)
+		return ret;
 
 	/*
 	 * Currently we update the hw for allocations/requests immediately,
@@ -256,8 +322,7 @@ int mdp5_plane_mode_set(struct drm_plane *plane,
 	 * would move into atomic->check_plane_state(), while updating the
 	 * hw would remain here:
 	 */
-	for (i = 0; i < pipe2nclients(pipe); i++)
-		mdp5_smp_configure(mdp5_kms, pipe2client(pipe, i));
+	mdp5_smp_configure(mdp5_kms->smp, pipe);
 
 	if (src_w != crtc_w) {
 		config |= MDP5_PIPE_SCALE_CONFIG_SCALEX_EN;
@@ -269,6 +334,8 @@ int mdp5_plane_mode_set(struct drm_plane *plane,
 		/* TODO calc phasey_step, vdecm */
 	}
 
+	spin_lock_irqsave(&mdp5_plane->pipe_lock, flags);
+
 	mdp5_write(mdp5_kms, REG_MDP5_PIPE_SRC_IMG_SIZE(pipe),
 			MDP5_PIPE_SRC_IMG_SIZE_WIDTH(src_w) |
 			MDP5_PIPE_SRC_IMG_SIZE_HEIGHT(src_h));
@@ -289,8 +356,6 @@ int mdp5_plane_mode_set(struct drm_plane *plane,
 			MDP5_PIPE_OUT_XY_X(crtc_x) |
 			MDP5_PIPE_OUT_XY_Y(crtc_y));
 
-	mdp5_plane_set_scanout(plane, fb);
-
 	format = to_mdp_format(msm_framebuffer_format(fb));
 
 	mdp5_write(mdp5_kms, REG_MDP5_PIPE_SRC_FORMAT(pipe),
@@ -330,22 +395,24 @@ int mdp5_plane_mode_set(struct drm_plane *plane,
 			MDP5_PIPE_SCALE_CONFIG_SCALEX_MAX_FILTER(SCALE_FILTER_NEAREST) |
 			MDP5_PIPE_SCALE_CONFIG_SCALEY_MAX_FILTER(SCALE_FILTER_NEAREST));
 
-	set_fifo_thresholds(plane, nblks);
+	set_scanout_locked(plane, fb);
 
-	/* TODO detach from old crtc (if we had more than one) */
-	mdp5_crtc_attach(crtc, plane);
+	spin_unlock_irqrestore(&mdp5_plane->pipe_lock, flags);
 
-	return 0;
+	return ret;
 }
 
 void mdp5_plane_complete_flip(struct drm_plane *plane)
 {
 	struct mdp5_kms *mdp5_kms = get_kms(plane);
-	enum mdp5_pipe pipe = to_mdp5_plane(plane)->pipe;
-	int i;
+	struct mdp5_plane *mdp5_plane = to_mdp5_plane(plane);
+	enum mdp5_pipe pipe = mdp5_plane->pipe;
+
+	DBG("%s: complete flip", mdp5_plane->name);
 
-	for (i = 0; i < pipe2nclients(pipe); i++)
-		mdp5_smp_commit(mdp5_kms, pipe2client(pipe, i));
+	mdp5_smp_commit(mdp5_kms->smp, pipe);
+
+	to_mdp5_plane_state(plane->state)->pending = false;
 }
 
 enum mdp5_pipe mdp5_plane_pipe(struct drm_plane *plane)
@@ -354,9 +421,16 @@ enum mdp5_pipe mdp5_plane_pipe(struct drm_plane *plane)
 	return mdp5_plane->pipe;
 }
 
+uint32_t mdp5_plane_get_flush(struct drm_plane *plane)
+{
+	struct mdp5_plane *mdp5_plane = to_mdp5_plane(plane);
+
+	return mdp5_plane->flush_mask;
+}
+
 /* initialize plane */
 struct drm_plane *mdp5_plane_init(struct drm_device *dev,
-		enum mdp5_pipe pipe, bool private_plane)
+		enum mdp5_pipe pipe, bool private_plane, uint32_t reg_offset)
 {
 	struct drm_plane *plane = NULL;
 	struct mdp5_plane *mdp5_plane;
@@ -377,10 +451,18 @@ struct drm_plane *mdp5_plane_init(struct drm_device *dev,
 	mdp5_plane->nformats = mdp5_get_formats(pipe, mdp5_plane->formats,
 			ARRAY_SIZE(mdp5_plane->formats));
 
+	mdp5_plane->flush_mask = mdp_ctl_flush_mask_pipe(pipe);
+	mdp5_plane->reg_offset = reg_offset;
+	spin_lock_init(&mdp5_plane->pipe_lock);
+
 	type = private_plane ? DRM_PLANE_TYPE_PRIMARY : DRM_PLANE_TYPE_OVERLAY;
-	drm_universal_plane_init(dev, plane, 0xff, &mdp5_plane_funcs,
+	ret = drm_universal_plane_init(dev, plane, 0xff, &mdp5_plane_funcs,
 				 mdp5_plane->formats, mdp5_plane->nformats,
 				 type);
+	if (ret)
+		goto fail;
+
+	drm_plane_helper_add(plane, &mdp5_plane_helper_funcs);
 
 	mdp5_plane_install_properties(plane, &plane->base);
 
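Note on the mdp5_plane.c hunks above: mdp5_plane_atomic_check() decides whether a plane update can be handled as a plain scanout flip or needs the full mdp5_plane_mode_set() path. The following is a minimal userspace sketch of that decision, assuming a simplified state struct; the sketch_* names are hypothetical and are not part of the driver.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for the fields the check reads. */
struct sketch_plane_state {
	bool has_fb;           /* models state->fb != NULL */
	bool has_crtc;         /* models state->crtc != NULL */
	uint32_t pixel_format; /* models state->fb->pixel_format */
	uint32_t src_w;        /* models state->src_w */
	bool pending;          /* previous update not yet completed */
};

/* Mirrors plane_enabled(): a plane is on only with both an fb and a crtc. */
static bool sketch_enabled(const struct sketch_plane_state *s)
{
	return s->has_fb && s->has_crtc;
}

/*
 * Returns the value the check would store in the new plane state's
 * mode_changed flag: SMP blocks cannot be re-sized during scanout, so
 * a format change, a source width change, or a still-pending update
 * forces the full path. Any transition involving a disabled plane
 * also sets the flag.
 */
static bool sketch_mode_changed(const struct sketch_plane_state *old_s,
		const struct sketch_plane_state *new_s)
{
	if (sketch_enabled(new_s) && sketch_enabled(old_s))
		return new_s->pixel_format != old_s->pixel_format ||
		       new_s->src_w != old_s->src_w ||
		       old_s->pending;
	return true;
}
```

When the flag stays false on an enabled plane, the update path in the patch only rewrites the stride and address registers under pipe_lock.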
diff --git a/drivers/gpu/drm/msm/mdp/mdp5/mdp5_smp.c b/drivers/gpu/drm/msm/mdp/mdp5/mdp5_smp.c
index 2d0236b963a6..bf551885e019 100644
--- a/drivers/gpu/drm/msm/mdp/mdp5/mdp5_smp.c
+++ b/drivers/gpu/drm/msm/mdp/mdp5/mdp5_smp.c
@@ -1,4 +1,5 @@
 /*
+ * Copyright (c) 2014, The Linux Foundation. All rights reserved.
  * Copyright (C) 2013 Red Hat
  * Author: Rob Clark <robdclark@gmail.com>
  *
@@ -29,8 +30,11 @@
  * Based on the size of the attached scanout buffer, a certain # of
  * blocks must be allocated to that client out of the shared pool.
  *
- * For each block, it can be either free, or pending/in-use by a
- * client.  The updates happen in three steps:
+ * In some hw, some blocks are statically allocated for certain pipes
+ * and CANNOT be re-allocated (eg: MMB0 and MMB1 both tied to RGB0).
+ *
+ * For each block that can be dynamically allocated, it can be either
+ * free, or pending/in-use by a client. The updates happen in three steps:
  *
  *  1) mdp5_smp_request():
  *     When plane scanout is setup, calculate required number of
@@ -61,21 +65,68 @@
  * inuse and pending state of all clients..
  */
 
-static DEFINE_SPINLOCK(smp_lock);
+struct mdp5_smp {
+	struct drm_device *dev;
+
+	int blk_cnt;
+	int blk_size;
+
+	spinlock_t state_lock;
+	mdp5_smp_state_t state; /* to track smp allocation amongst pipes: */
+
+	struct mdp5_client_smp_state client_state[CID_MAX];
+};
 
+static inline
+struct mdp5_kms *get_kms(struct mdp5_smp *smp)
+{
+	struct msm_drm_private *priv = smp->dev->dev_private;
+
+	return to_mdp5_kms(to_mdp_kms(priv->kms));
+}
+
+static inline enum mdp5_client_id pipe2client(enum mdp5_pipe pipe, int plane)
+{
+	WARN_ON(plane >= pipe2nclients(pipe));
+	switch (pipe) {
+	case SSPP_VIG0: return CID_VIG0_Y + plane;
+	case SSPP_VIG1: return CID_VIG1_Y + plane;
+	case SSPP_VIG2: return CID_VIG2_Y + plane;
+	case SSPP_RGB0: return CID_RGB0;
+	case SSPP_RGB1: return CID_RGB1;
+	case SSPP_RGB2: return CID_RGB2;
+	case SSPP_DMA0: return CID_DMA0_Y + plane;
+	case SSPP_DMA1: return CID_DMA1_Y + plane;
+	case SSPP_VIG3: return CID_VIG3_Y + plane;
+	case SSPP_RGB3: return CID_RGB3;
+	default:        return CID_UNUSED;
+	}
+}
 
 /* step #1: update # of blocks pending for the client: */
-int mdp5_smp_request(struct mdp5_kms *mdp5_kms,
+static int smp_request_block(struct mdp5_smp *smp,
 		enum mdp5_client_id cid, int nblks)
 {
-	struct mdp5_client_smp_state *ps = &mdp5_kms->smp_client_state[cid];
-	int i, ret, avail, cur_nblks, cnt = mdp5_kms->smp_blk_cnt;
+	struct mdp5_kms *mdp5_kms = get_kms(smp);
+	const struct mdp5_cfg_hw *hw_cfg;
+	struct mdp5_client_smp_state *ps = &smp->client_state[cid];
+	int i, ret, avail, cur_nblks, cnt = smp->blk_cnt;
+	int reserved;
 	unsigned long flags;
 
-	spin_lock_irqsave(&smp_lock, flags);
+	hw_cfg = mdp5_cfg_get_hw_config(mdp5_kms->cfg);
+	reserved = hw_cfg->smp.reserved[cid];
+
+	spin_lock_irqsave(&smp->state_lock, flags);
 
-	avail = cnt - bitmap_weight(mdp5_kms->smp_state, cnt);
+	nblks -= reserved;
+	if (reserved)
+		DBG("%d MMBs allocated (%d reserved)", nblks, reserved);
+
+	avail = cnt - bitmap_weight(smp->state, cnt);
 	if (nblks > avail) {
+		dev_err(mdp5_kms->dev->dev, "out of blks (req=%d > avail=%d)\n",
+				nblks, avail);
 		ret = -ENOSPC;
 		goto fail;
 	}
@@ -84,9 +135,9 @@ int mdp5_smp_request(struct mdp5_kms *mdp5_kms,
 	if (nblks > cur_nblks) {
 		/* grow the existing pending reservation: */
 		for (i = cur_nblks; i < nblks; i++) {
-			int blk = find_first_zero_bit(mdp5_kms->smp_state, cnt);
+			int blk = find_first_zero_bit(smp->state, cnt);
 			set_bit(blk, ps->pending);
-			set_bit(blk, mdp5_kms->smp_state);
+			set_bit(blk, smp->state);
 		}
 	} else {
 		/* shrink the existing pending reservation: */
@@ -98,15 +149,88 @@ int mdp5_smp_request(struct mdp5_kms *mdp5_kms,
 	}
 
 fail:
-	spin_unlock_irqrestore(&smp_lock, flags);
+	spin_unlock_irqrestore(&smp->state_lock, flags);
+	return 0;
+}
+
+static void set_fifo_thresholds(struct mdp5_smp *smp,
+		enum mdp5_pipe pipe, int nblks)
+{
+	struct mdp5_kms *mdp5_kms = get_kms(smp);
+	u32 smp_entries_per_blk = smp->blk_size / (128 / BITS_PER_BYTE);
+	u32 val;
+
+	/* 1/4 of SMP pool that is being fetched */
+	val = (nblks * smp_entries_per_blk) / 4;
+
+	mdp5_write(mdp5_kms, REG_MDP5_PIPE_REQPRIO_FIFO_WM_0(pipe), val * 1);
+	mdp5_write(mdp5_kms, REG_MDP5_PIPE_REQPRIO_FIFO_WM_1(pipe), val * 2);
+	mdp5_write(mdp5_kms, REG_MDP5_PIPE_REQPRIO_FIFO_WM_2(pipe), val * 3);
+}
+
+/*
+ * NOTE: looks like if horizontal decimation is used (if we supported that)
+ * then the width used to calculate SMP block requirements is the post-
+ * decimated width.  Ie. SMP buffering sits downstream of decimation (which
+ * presumably happens during the dma from scanout buffer).
+ */
+int mdp5_smp_request(struct mdp5_smp *smp, enum mdp5_pipe pipe, u32 fmt, u32 width)
+{
+	struct mdp5_kms *mdp5_kms = get_kms(smp);
+	struct drm_device *dev = mdp5_kms->dev;
+	int rev = mdp5_cfg_get_hw_rev(mdp5_kms->cfg);
+	int i, hsub, nplanes, nlines, nblks, ret;
+
+	nplanes = drm_format_num_planes(fmt);
+	hsub = drm_format_horz_chroma_subsampling(fmt);
+
+	/* different if BWC (compressed framebuffer?) enabled: */
+	nlines = 2;
+
+	for (i = 0, nblks = 0; i < nplanes; i++) {
+		int n, fetch_stride, cpp;
+
+		cpp = drm_format_plane_cpp(fmt, i);
+		fetch_stride = width * cpp / (i ? hsub : 1);
+
+		n = DIV_ROUND_UP(fetch_stride * nlines, smp->blk_size);
+
+		/* for hw rev v1.00 */
+		if (rev == 0)
+			n = roundup_pow_of_two(n);
+
+		DBG("%s[%d]: request %d SMP blocks", pipe2name(pipe), i, n);
+		ret = smp_request_block(smp, pipe2client(pipe, i), n);
+		if (ret) {
+			dev_err(dev->dev, "Cannot allocate %d SMP blocks: %d\n",
+					n, ret);
+			return ret;
+		}
+
+		nblks += n;
+	}
+
+	set_fifo_thresholds(smp, pipe, nblks);
+
 	return 0;
 }
 
-static void update_smp_state(struct mdp5_kms *mdp5_kms,
+/* Release SMP blocks for all clients of the pipe */
+void mdp5_smp_release(struct mdp5_smp *smp, enum mdp5_pipe pipe)
+{
+	int i, nblks;
+
+	for (i = 0, nblks = 0; i < pipe2nclients(pipe); i++)
+		smp_request_block(smp, pipe2client(pipe, i), 0);
+	set_fifo_thresholds(smp, pipe, 0);
+}
+
+static void update_smp_state(struct mdp5_smp *smp,
 		enum mdp5_client_id cid, mdp5_smp_state_t *assigned)
 {
-	int cnt = mdp5_kms->smp_blk_cnt;
-	uint32_t blk, val;
+	struct mdp5_kms *mdp5_kms = get_kms(smp);
+	int cnt = smp->blk_cnt;
+	u32 blk, val;
 
 	for_each_set_bit(blk, *assigned, cnt) {
 		int idx = blk / 3;
@@ -135,39 +259,80 @@ static void update_smp_state(struct mdp5_kms *mdp5_kms,
 }
 
 /* step #2: configure hw for union(pending, inuse): */
-void mdp5_smp_configure(struct mdp5_kms *mdp5_kms, enum mdp5_client_id cid)
+void mdp5_smp_configure(struct mdp5_smp *smp, enum mdp5_pipe pipe)
 {
-	struct mdp5_client_smp_state *ps = &mdp5_kms->smp_client_state[cid];
-	int cnt = mdp5_kms->smp_blk_cnt;
+	int cnt = smp->blk_cnt;
 	mdp5_smp_state_t assigned;
+	int i;
 
-	bitmap_or(assigned, ps->inuse, ps->pending, cnt);
-	update_smp_state(mdp5_kms, cid, &assigned);
+	for (i = 0; i < pipe2nclients(pipe); i++) {
+		enum mdp5_client_id cid = pipe2client(pipe, i);
+		struct mdp5_client_smp_state *ps = &smp->client_state[cid];
+
+		bitmap_or(assigned, ps->inuse, ps->pending, cnt);
+		update_smp_state(smp, cid, &assigned);
+	}
 }
 
 /* step #3: after vblank, copy pending -> inuse: */
-void mdp5_smp_commit(struct mdp5_kms *mdp5_kms, enum mdp5_client_id cid)
+void mdp5_smp_commit(struct mdp5_smp *smp, enum mdp5_pipe pipe)
 {
-	struct mdp5_client_smp_state *ps = &mdp5_kms->smp_client_state[cid];
-	int cnt = mdp5_kms->smp_blk_cnt;
+	int cnt = smp->blk_cnt;
 	mdp5_smp_state_t released;
+	int i;
+
+	for (i = 0; i < pipe2nclients(pipe); i++) {
+		enum mdp5_client_id cid = pipe2client(pipe, i);
+		struct mdp5_client_smp_state *ps = &smp->client_state[cid];
+
+		/*
+		 * Figure out if there are any blocks we were previously
+		 * using, which can be released and made available to other
+		 * clients:
+		 */
+		if (bitmap_andnot(released, ps->inuse, ps->pending, cnt)) {
+			unsigned long flags;
+
+			spin_lock_irqsave(&smp->state_lock, flags);
+			/* clear released blocks: */
+			bitmap_andnot(smp->state, smp->state, released, cnt);
+			spin_unlock_irqrestore(&smp->state_lock, flags);
 
-	/*
-	 * Figure out if there are any blocks we where previously
-	 * using, which can be released and made available to other
-	 * clients:
-	 */
-	if (bitmap_andnot(released, ps->inuse, ps->pending, cnt)) {
-		unsigned long flags;
-
-		spin_lock_irqsave(&smp_lock, flags);
-		/* clear released blocks: */
-		bitmap_andnot(mdp5_kms->smp_state, mdp5_kms->smp_state,
-				released, cnt);
-		spin_unlock_irqrestore(&smp_lock, flags);
-
-		update_smp_state(mdp5_kms, CID_UNUSED, &released);
+			update_smp_state(smp, CID_UNUSED, &released);
+		}
+
+		bitmap_copy(ps->inuse, ps->pending, cnt);
 	}
+}
+
+void mdp5_smp_destroy(struct mdp5_smp *smp)
+{
+	kfree(smp);
+}
+
+struct mdp5_smp *mdp5_smp_init(struct drm_device *dev, const struct mdp5_smp_block *cfg)
+{
+	struct mdp5_smp *smp = NULL;
+	int ret;
+
+	smp = kzalloc(sizeof(*smp), GFP_KERNEL);
+	if (unlikely(!smp)) {
+		ret = -ENOMEM;
+		goto fail;
+	}
+
+	smp->dev = dev;
+	smp->blk_cnt = cfg->mmb_count;
+	smp->blk_size = cfg->mmb_size;
+
+	/* statically tied MMBs cannot be re-allocated: */
+	bitmap_copy(smp->state, cfg->reserved_state, smp->blk_cnt);
+	spin_lock_init(&smp->state_lock);
+
+	return smp;
+fail:
+	if (smp)
+		mdp5_smp_destroy(smp);
 
-	bitmap_copy(ps->inuse, ps->pending, cnt);
+	return ERR_PTR(ret);
 }
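Note on the mdp5_smp.c hunks above: the comment block in that file describes a three-step update (request, configure, commit). The following is a minimal userspace sketch of the bookkeeping in steps 1 and 3, assuming a 22-block pool held in a plain 32-bit mask. The sketch_* names are hypothetical, the hardware programming of step 2 is left out, and the availability check mirrors the driver's comparison against the global free count.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_BLK_CNT 22u /* assumed pool size for this sketch */

struct sketch_client {
	uint32_t inuse;   /* blocks the client is using now */
	uint32_t pending; /* blocks the client wants after the next vblank */
};

struct sketch_smp {
	uint32_t state;   /* blocks that are pending or in use by any client */
};

/* Count set bits (stands in for bitmap_weight()). */
static int sketch_weight(uint32_t v)
{
	int n = 0;

	for (; v; v &= v - 1)
		n++;
	return n;
}

/* Step 1: grow or shrink the client's pending set to nblks blocks. */
static int sketch_request(struct sketch_smp *smp, struct sketch_client *c,
		int nblks)
{
	int cur = sketch_weight(c->pending);
	int avail = (int)SKETCH_BLK_CNT - sketch_weight(smp->state);
	uint32_t blk;
	int i;

	if (nblks > avail)
		return -1; /* the driver reports -ENOSPC here */

	for (i = cur; i < nblks; i++) {
		/* grow: take the first block nobody holds */
		for (blk = 0; blk < SKETCH_BLK_CNT; blk++)
			if (!(smp->state & (1u << blk)))
				break;
		c->pending |= 1u << blk;
		smp->state |= 1u << blk;
	}
	for (i = cur; i > nblks; i--) {
		/* shrink: drop from pending only; state is cleared at commit */
		for (blk = 0; blk < SKETCH_BLK_CNT; blk++)
			if (c->pending & (1u << blk))
				break;
		c->pending &= ~(1u << blk);
	}
	return 0;
}

/* Step 3: after vblank, free what is no longer wanted; pending -> inuse. */
static void sketch_commit(struct sketch_smp *smp, struct sketch_client *c)
{
	uint32_t released = c->inuse & ~c->pending;

	smp->state &= ~released;
	c->inuse = c->pending;
}
```

Deferring the release to the commit step keeps a shrinking client's old blocks reserved until the hardware has stopped fetching through them.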
diff --git a/drivers/gpu/drm/msm/mdp/mdp5/mdp5_smp.h b/drivers/gpu/drm/msm/mdp/mdp5/mdp5_smp.h
index 0ab739e1a1dd..e47179f63585 100644
--- a/drivers/gpu/drm/msm/mdp/mdp5/mdp5_smp.h
+++ b/drivers/gpu/drm/msm/mdp/mdp5/mdp5_smp.h
@@ -1,4 +1,5 @@
 /*
+ * Copyright (c) 2014, The Linux Foundation. All rights reserved.
  * Copyright (C) 2013 Red Hat
  * Author: Rob Clark <robdclark@gmail.com>
  *
@@ -20,22 +21,26 @@
 
 #include "msm_drv.h"
 
-#define MAX_SMP_BLOCKS  22
-#define SMP_BLK_SIZE    4096
-#define SMP_ENTRIES_PER_BLK (SMP_BLK_SIZE / 16)
-
-typedef DECLARE_BITMAP(mdp5_smp_state_t, MAX_SMP_BLOCKS);
-
 struct mdp5_client_smp_state {
 	mdp5_smp_state_t inuse;
 	mdp5_smp_state_t pending;
 };
 
 struct mdp5_kms;
+struct mdp5_smp;
+
+/*
+ * SMP module prototypes:
+ * mdp5_smp_init() returns a SMP @handler,
+ * which is then used to call the other mdp5_smp_*(handler, ...) functions.
+ */
 
-int mdp5_smp_request(struct mdp5_kms *mdp5_kms, enum mdp5_client_id cid, int nblks);
-void mdp5_smp_configure(struct mdp5_kms *mdp5_kms, enum mdp5_client_id cid);
-void mdp5_smp_commit(struct mdp5_kms *mdp5_kms, enum mdp5_client_id cid);
+struct mdp5_smp *mdp5_smp_init(struct drm_device *dev, const struct mdp5_smp_block *cfg);
+void  mdp5_smp_destroy(struct mdp5_smp *smp);
 
+int  mdp5_smp_request(struct mdp5_smp *smp, enum mdp5_pipe pipe, u32 fmt, u32 width);
+void mdp5_smp_configure(struct mdp5_smp *smp, enum mdp5_pipe pipe);
+void mdp5_smp_commit(struct mdp5_smp *smp, enum mdp5_pipe pipe);
+void mdp5_smp_release(struct mdp5_smp *smp, enum mdp5_pipe pipe);
 
 #endif /* __MDP5_SMP_H__ */
diff --git a/drivers/gpu/drm/msm/msm_atomic.c b/drivers/gpu/drm/msm/msm_atomic.c
new file mode 100644
index 000000000000..f0de412e13dc
--- /dev/null
+++ b/drivers/gpu/drm/msm/msm_atomic.c
@@ -0,0 +1,163 @@
+/*
+ * Copyright (C) 2014 Red Hat
+ * Author: Rob Clark <robdclark@gmail.com>
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License version 2 as published by
+ * the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License along with
+ * this program.  If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include "msm_drv.h"
+#include "msm_kms.h"
+#include "msm_gem.h"
+
+struct msm_commit {
+	struct drm_atomic_state *state;
+	uint32_t fence;
+	struct msm_fence_cb fence_cb;
+};
+
+static void fence_cb(struct msm_fence_cb *cb);
+
+static struct msm_commit *new_commit(struct drm_atomic_state *state)
+{
+	struct msm_commit *c = kzalloc(sizeof(*c), GFP_KERNEL);
+
+	if (!c)
+		return NULL;
+
+	c->state = state;
+	/* TODO we might need a way to indicate to run the cb on a
+	 * different wq so wait_for_vblanks() doesn't block retiring
+	 * bo's..
+	 */
+	INIT_FENCE_CB(&c->fence_cb, fence_cb);
+
+	return c;
+}
+
+/* The (potentially) asynchronous part of the commit.  At this point
+ * nothing can fail short of armageddon.
+ */
+static void complete_commit(struct msm_commit *c)
+{
+	struct drm_atomic_state *state = c->state;
+	struct drm_device *dev = state->dev;
+
+	drm_atomic_helper_commit_pre_planes(dev, state);
+
+	drm_atomic_helper_commit_planes(dev, state);
+
+	drm_atomic_helper_commit_post_planes(dev, state);
+
+	drm_atomic_helper_wait_for_vblanks(dev, state);
+
+	drm_atomic_helper_cleanup_planes(dev, state);
+
+	drm_atomic_state_free(state);
+
+	kfree(c);
+}
+
+static void fence_cb(struct msm_fence_cb *cb)
+{
+	struct msm_commit *c =
+			container_of(cb, struct msm_commit, fence_cb);
+	complete_commit(c);
+}
+
+static void add_fb(struct msm_commit *c, struct drm_framebuffer *fb)
+{
+	struct drm_gem_object *obj = msm_framebuffer_bo(fb, 0);
+	c->fence = max(c->fence, msm_gem_fence(to_msm_bo(obj), MSM_PREP_READ));
+}
+
+
+/**
+ * msm_atomic_commit - commit validated state object
+ * @dev: DRM device
+ * @state: the driver state object
+ * @async: asynchronous commit
+ *
+ * This function commits a state object pre-validated with
+ * drm_atomic_helper_check(). This can still fail when e.g. the framebuffer
+ * reservation fails. If @async is set, completion runs from a fence callback.
+ *
+ * RETURNS
+ * Zero for success or -errno.
+ */
+int msm_atomic_commit(struct drm_device *dev,
+		struct drm_atomic_state *state, bool async)
+{
+	struct msm_commit *c;
+	int nplanes = dev->mode_config.num_total_plane;
+	int i, ret;
+
+	ret = drm_atomic_helper_prepare_planes(dev, state);
+	if (ret)
+		return ret;
+
+	c = new_commit(state);
+
+	/*
+	 * Figure out what fence to wait for:
+	 */
+	for (i = 0; i < nplanes; i++) {
+		struct drm_plane *plane = state->planes[i];
+		struct drm_plane_state *new_state = state->plane_states[i];
+
+		if (!plane)
+			continue;
+
+		if ((plane->state->fb != new_state->fb) && new_state->fb)
+			add_fb(c, new_state->fb);
+	}
+
+	/*
+	 * This is the point of no return - everything below never fails except
+	 * when the hw goes bonghits. Which means we can commit the new state on
+	 * the software side now.
+	 */
+
+	drm_atomic_helper_swap_state(dev, state);
+
+	/*
+	 * Everything below can be run asynchronously without the need to grab
+	 * any modeset locks at all under one condition: It must be guaranteed
+	 * that the asynchronous work has either been cancelled (if the driver
+	 * supports it, which at least requires that the framebuffers get
+	 * cleaned up with drm_atomic_helper_cleanup_planes()) or completed
+	 * before the new state gets committed on the software side with
+	 * drm_atomic_helper_swap_state().
+	 *
+	 * This scheme allows new atomic state updates to be prepared and
+	 * checked in parallel to the asynchronous completion of the previous
+	 * update. Which is important since compositors need to figure out the
+	 * composition of the next frame right after having submitted the
+	 * current layout.
+	 */
+
+	if (async) {
+		msm_queue_fence_cb(dev, &c->fence_cb, c->fence);
+		return 0;
+	}
+
+	ret = msm_wait_fence_interruptable(dev, c->fence, NULL);
+	if (ret) {
+		WARN_ON(ret);  // TODO unswap state back?  or??
+		kfree(c);
+		return ret;
+	}
+
+	complete_commit(c);
+
+	return 0;
+}
diff --git a/drivers/gpu/drm/msm/msm_drv.c b/drivers/gpu/drm/msm/msm_drv.c
index 42e1c48eef28..c795217e1bfc 100644
--- a/drivers/gpu/drm/msm/msm_drv.c
+++ b/drivers/gpu/drm/msm/msm_drv.c
@@ -29,6 +29,8 @@ static void msm_fb_output_poll_changed(struct drm_device *dev)
 static const struct drm_mode_config_funcs mode_config_funcs = {
 	.fb_create = msm_framebuffer_create,
 	.output_poll_changed = msm_fb_output_poll_changed,
+	.atomic_check = drm_atomic_helper_check,
+	.atomic_commit = msm_atomic_commit,
 };
 
 int msm_register_mmu(struct drm_device *dev, struct msm_mmu *mmu)
@@ -294,6 +296,8 @@ static int msm_load(struct drm_device *dev, unsigned long flags)
 		goto fail;
 	}
 
+	drm_mode_config_reset(dev);
+
 #ifdef CONFIG_DRM_MSM_FBDEV
 	priv->fbdev = msm_fbdev_init(dev);
 #endif
@@ -619,6 +623,26 @@ int msm_wait_fence_interruptable(struct drm_device *dev, uint32_t fence,
 	return ret;
 }
 
+int msm_queue_fence_cb(struct drm_device *dev,
+		struct msm_fence_cb *cb, uint32_t fence)
+{
+	struct msm_drm_private *priv = dev->dev_private;
+	int ret = 0;
+
+	mutex_lock(&dev->struct_mutex);
+	if (!list_empty(&cb->work.entry)) {
+		ret = -EINVAL;
+	} else if (fence > priv->completed_fence) {
+		cb->fence = fence;
+		list_add_tail(&cb->work.entry, &priv->fence_cbs);
+	} else {
+		queue_work(priv->wq, &cb->work);
+	}
+	mutex_unlock(&dev->struct_mutex);
+
+	return ret;
+}
+
 /* called from workqueue */
 void msm_update_fence(struct drm_device *dev, uint32_t fence)
 {
@@ -832,6 +856,7 @@ static struct drm_driver msm_driver = {
 	.gem_prime_import_sg_table = msm_gem_prime_import_sg_table,
 	.gem_prime_vmap     = msm_gem_prime_vmap,
 	.gem_prime_vunmap   = msm_gem_prime_vunmap,
+	.gem_prime_mmap     = msm_gem_prime_mmap,
 #ifdef CONFIG_DEBUG_FS
 	.debugfs_init       = msm_debugfs_init,
 	.debugfs_cleanup    = msm_debugfs_cleanup,
diff --git a/drivers/gpu/drm/msm/msm_drv.h b/drivers/gpu/drm/msm/msm_drv.h
index 67f9d0a2332c..136303818436 100644
--- a/drivers/gpu/drm/msm/msm_drv.h
+++ b/drivers/gpu/drm/msm/msm_drv.h
@@ -32,15 +32,6 @@
 #include <linux/types.h>
 #include <asm/sizes.h>
 
-
-#if defined(CONFIG_COMPILE_TEST) && !defined(CONFIG_ARCH_QCOM)
-/* stubs we need for compile-test: */
-static inline struct device *msm_iommu_get_ctx(const char *ctx_name)
-{
-	return NULL;
-}
-#endif
-
 #ifndef CONFIG_OF
 #include <mach/board.h>
 #include <mach/socinfo.h>
@@ -48,7 +39,10 @@ static inline struct device *msm_iommu_get_ctx(const char *ctx_name)
 #endif
 
 #include <drm/drmP.h>
+#include <drm/drm_atomic.h>
+#include <drm/drm_atomic_helper.h>
 #include <drm/drm_crtc_helper.h>
+#include <drm/drm_plane_helper.h>
 #include <drm/drm_fb_helper.h>
 #include <drm/msm_drm.h>
 #include <drm/drm_gem.h>
@@ -75,7 +69,12 @@ struct msm_drm_private {
 	struct msm_kms *kms;
 
 	/* subordinate devices, if present: */
-	struct platform_device *hdmi_pdev, *gpu_pdev;
+	struct platform_device *gpu_pdev;
+
+	/* possibly this should be in the kms component, but it is
+	 * shared by both mdp4 and mdp5..
+	 */
+	struct hdmi *hdmi;
 
 	/* when we have more than one 'msm_gpu' these need to be an array: */
 	struct msm_gpu *gpu;
@@ -145,21 +144,29 @@ void __msm_fence_worker(struct work_struct *work);
 		(_cb)->func = _func;                         \
 	} while (0)
 
+int msm_atomic_commit(struct drm_device *dev,
+		struct drm_atomic_state *state, bool async);
+
 int msm_register_mmu(struct drm_device *dev, struct msm_mmu *mmu);
 
 int msm_wait_fence_interruptable(struct drm_device *dev, uint32_t fence,
 		struct timespec *timeout);
+int msm_queue_fence_cb(struct drm_device *dev,
+		struct msm_fence_cb *cb, uint32_t fence);
 void msm_update_fence(struct drm_device *dev, uint32_t fence);
 
 int msm_ioctl_gem_submit(struct drm_device *dev, void *data,
 		struct drm_file *file);
 
+int msm_gem_mmap_obj(struct drm_gem_object *obj,
+			struct vm_area_struct *vma);
 int msm_gem_mmap(struct file *filp, struct vm_area_struct *vma);
 int msm_gem_fault(struct vm_area_struct *vma, struct vm_fault *vmf);
 uint64_t msm_gem_mmap_offset(struct drm_gem_object *obj);
 int msm_gem_get_iova_locked(struct drm_gem_object *obj, int id,
 		uint32_t *iova);
 int msm_gem_get_iova(struct drm_gem_object *obj, int id, uint32_t *iova);
+uint32_t msm_gem_iova(struct drm_gem_object *obj, int id);
 struct page **msm_gem_get_pages(struct drm_gem_object *obj);
 void msm_gem_put_pages(struct drm_gem_object *obj);
 void msm_gem_put_iova(struct drm_gem_object *obj, int id);
@@ -170,6 +177,7 @@ int msm_gem_dumb_map_offset(struct drm_file *file, struct drm_device *dev,
 struct sg_table *msm_gem_prime_get_sg_table(struct drm_gem_object *obj);
 void *msm_gem_prime_vmap(struct drm_gem_object *obj);
 void msm_gem_prime_vunmap(struct drm_gem_object *obj, void *vaddr);
+int msm_gem_prime_mmap(struct drm_gem_object *obj, struct vm_area_struct *vma);
 struct drm_gem_object *msm_gem_prime_import_sg_table(struct drm_device *dev,
 		struct dma_buf_attachment *attach, struct sg_table *sg);
 int msm_gem_prime_pin(struct drm_gem_object *obj);
@@ -192,6 +200,9 @@ struct drm_gem_object *msm_gem_new(struct drm_device *dev,
 struct drm_gem_object *msm_gem_import(struct drm_device *dev,
 		uint32_t size, struct sg_table *sgt);
 
+int msm_framebuffer_prepare(struct drm_framebuffer *fb, int id);
+void msm_framebuffer_cleanup(struct drm_framebuffer *fb, int id);
+uint32_t msm_framebuffer_iova(struct drm_framebuffer *fb, int id, int plane);
 struct drm_gem_object *msm_framebuffer_bo(struct drm_framebuffer *fb, int plane);
 const struct msm_format *msm_framebuffer_format(struct drm_framebuffer *fb);
 struct drm_framebuffer *msm_framebuffer_init(struct drm_device *dev,
@@ -202,8 +213,8 @@ struct drm_framebuffer *msm_framebuffer_create(struct drm_device *dev,
 struct drm_fb_helper *msm_fbdev_init(struct drm_device *dev);
 
 struct hdmi;
-struct hdmi *hdmi_init(struct drm_device *dev, struct drm_encoder *encoder);
-irqreturn_t hdmi_irq(int irq, void *dev_id);
+int hdmi_modeset_init(struct hdmi *hdmi, struct drm_device *dev,
+		struct drm_encoder *encoder);
 void __init hdmi_register(void);
 void __exit hdmi_unregister(void);
 
diff --git a/drivers/gpu/drm/msm/msm_fb.c b/drivers/gpu/drm/msm/msm_fb.c
index 81bafdf19ab3..84dec161d836 100644
--- a/drivers/gpu/drm/msm/msm_fb.c
+++ b/drivers/gpu/drm/msm/msm_fb.c
@@ -24,7 +24,7 @@
 struct msm_framebuffer {
 	struct drm_framebuffer base;
 	const struct msm_format *format;
-	struct drm_gem_object *planes[2];
+	struct drm_gem_object *planes[3];
 };
 #define to_msm_framebuffer(x) container_of(x, struct msm_framebuffer, base)
 
@@ -87,6 +87,44 @@ void msm_framebuffer_describe(struct drm_framebuffer *fb, struct seq_file *m)
 }
 #endif
 
+/* prepare/pin all the fb's bo's for scanout.  Note that it is not valid
+ * to prepare an fb for multiple different initiator 'id's.  But that
+ * should be fine, since only the scanout (mdpN) side of things needs
+ * this, the gpu doesn't care about fb's.
+ */
+int msm_framebuffer_prepare(struct drm_framebuffer *fb, int id)
+{
+	struct msm_framebuffer *msm_fb = to_msm_framebuffer(fb);
+	int ret, i, n = drm_format_num_planes(fb->pixel_format);
+	uint32_t iova;
+
+	for (i = 0; i < n; i++) {
+		ret = msm_gem_get_iova(msm_fb->planes[i], id, &iova);
+		DBG("FB[%u]: iova[%d]: %08x (%d)", fb->base.id, i, iova, ret);
+		if (ret)
+			return ret;
+	}
+
+	return 0;
+}
+
+void msm_framebuffer_cleanup(struct drm_framebuffer *fb, int id)
+{
+	struct msm_framebuffer *msm_fb = to_msm_framebuffer(fb);
+	int i, n = drm_format_num_planes(fb->pixel_format);
+
+	for (i = 0; i < n; i++)
+		msm_gem_put_iova(msm_fb->planes[i], id);
+}
+
+uint32_t msm_framebuffer_iova(struct drm_framebuffer *fb, int id, int plane)
+{
+	struct msm_framebuffer *msm_fb = to_msm_framebuffer(fb);
+	if (!msm_fb->planes[plane])
+		return 0;
+	return msm_gem_iova(msm_fb->planes[plane], id);
+}
+
 struct drm_gem_object *msm_framebuffer_bo(struct drm_framebuffer *fb, int plane)
 {
 	struct msm_framebuffer *msm_fb = to_msm_framebuffer(fb);
@@ -166,6 +204,11 @@ struct drm_framebuffer *msm_framebuffer_init(struct drm_device *dev,
 
 	msm_fb->format = format;
 
+	if (n > ARRAY_SIZE(msm_fb->planes)) {
+		ret = -EINVAL;
+		goto fail;
+	}
+
 	for (i = 0; i < n; i++) {
 		unsigned int width = mode_cmd->width / (i ? hsub : 1);
 		unsigned int height = mode_cmd->height / (i ? vsub : 1);
diff --git a/drivers/gpu/drm/msm/msm_fbdev.c b/drivers/gpu/drm/msm/msm_fbdev.c
index ab5bfd2d0ebf..94d55e526b4e 100644
--- a/drivers/gpu/drm/msm/msm_fbdev.c
+++ b/drivers/gpu/drm/msm/msm_fbdev.c
@@ -93,9 +93,6 @@ static int msm_fbdev_create(struct drm_fb_helper *helper,
 	uint32_t paddr;
 	int ret, size;
 
-	sizes->surface_bpp = 32;
-	sizes->surface_depth = 24;
-
 	DBG("create fbdev: %dx%d@%d (%dx%d)", sizes->surface_width,
 			sizes->surface_height, sizes->surface_bpp,
 			sizes->fb_width, sizes->fb_height);
diff --git a/drivers/gpu/drm/msm/msm_gem.c b/drivers/gpu/drm/msm/msm_gem.c
index 4b1b82adabde..4a6f0e49d5b5 100644
--- a/drivers/gpu/drm/msm/msm_gem.c
+++ b/drivers/gpu/drm/msm/msm_gem.c
@@ -309,6 +309,7 @@ int msm_gem_get_iova_locked(struct drm_gem_object *obj, int id,
 	return ret;
 }
 
+/* get iova, taking a reference.  Should have a matching put */
 int msm_gem_get_iova(struct drm_gem_object *obj, int id, uint32_t *iova)
 {
 	struct msm_gem_object *msm_obj = to_msm_bo(obj);
@@ -328,6 +329,16 @@ int msm_gem_get_iova(struct drm_gem_object *obj, int id, uint32_t *iova)
 	return ret;
 }
 
+/* get iova without taking a reference, used in places where you have
+ * already done a 'msm_gem_get_iova()'.
+ */
+uint32_t msm_gem_iova(struct drm_gem_object *obj, int id)
+{
+	struct msm_gem_object *msm_obj = to_msm_bo(obj);
+	WARN_ON(!msm_obj->domain[id].iova);
+	return msm_obj->domain[id].iova;
+}
+
 void msm_gem_put_iova(struct drm_gem_object *obj, int id)
 {
 	// XXX TODO ..
@@ -397,23 +408,10 @@ void *msm_gem_vaddr(struct drm_gem_object *obj)
 int msm_gem_queue_inactive_cb(struct drm_gem_object *obj,
 		struct msm_fence_cb *cb)
 {
-	struct drm_device *dev = obj->dev;
-	struct msm_drm_private *priv = dev->dev_private;
 	struct msm_gem_object *msm_obj = to_msm_bo(obj);
-	int ret = 0;
-
-	mutex_lock(&dev->struct_mutex);
-	if (!list_empty(&cb->work.entry)) {
-		ret = -EINVAL;
-	} else if (is_active(msm_obj)) {
-		cb->fence = max(msm_obj->read_fence, msm_obj->write_fence);
-		list_add_tail(&cb->work.entry, &priv->fence_cbs);
-	} else {
-		queue_work(priv->wq, &cb->work);
-	}
-	mutex_unlock(&dev->struct_mutex);
-
-	return ret;
+	uint32_t fence = msm_gem_fence(msm_obj,
+			MSM_PREP_READ | MSM_PREP_WRITE);
+	return msm_queue_fence_cb(obj->dev, cb, fence);
 }
 
 void msm_gem_move_to_active(struct drm_gem_object *obj,
@@ -452,12 +450,8 @@ int msm_gem_cpu_prep(struct drm_gem_object *obj, uint32_t op,
 	int ret = 0;
 
 	if (is_active(msm_obj)) {
-		uint32_t fence = 0;
+		uint32_t fence = msm_gem_fence(msm_obj, op);
 
-		if (op & MSM_PREP_READ)
-			fence = msm_obj->write_fence;
-		if (op & MSM_PREP_WRITE)
-			fence = max(fence, msm_obj->read_fence);
 		if (op & MSM_PREP_NOSYNC)
 			timeout = NULL;
 
@@ -525,13 +519,11 @@ void msm_gem_free_object(struct drm_gem_object *obj)
 	for (id = 0; id < ARRAY_SIZE(msm_obj->domain); id++) {
 		struct msm_mmu *mmu = priv->mmus[id];
 		if (mmu && msm_obj->domain[id].iova) {
-			uint32_t offset = (uint32_t)mmap_offset(obj);
+			uint32_t offset = msm_obj->domain[id].iova;
 			mmu->funcs->unmap(mmu, offset, msm_obj->sgt, obj->size);
 		}
 	}
 
-	drm_gem_free_mmap_offset(obj);
-
 	if (obj->import_attach) {
 		if (msm_obj->vaddr)
 			dma_buf_vunmap(obj->import_attach->dmabuf, msm_obj->vaddr);
diff --git a/drivers/gpu/drm/msm/msm_gem.h b/drivers/gpu/drm/msm/msm_gem.h
index bfb052688f8e..8fbbd0594c46 100644
--- a/drivers/gpu/drm/msm/msm_gem.h
+++ b/drivers/gpu/drm/msm/msm_gem.h
@@ -70,6 +70,19 @@ static inline bool is_active(struct msm_gem_object *msm_obj)
 	return msm_obj->gpu != NULL;
 }
 
+static inline uint32_t msm_gem_fence(struct msm_gem_object *msm_obj,
+		uint32_t op)
+{
+	uint32_t fence = 0;
+
+	if (op & MSM_PREP_READ)
+		fence = msm_obj->write_fence;
+	if (op & MSM_PREP_WRITE)
+		fence = max(fence, msm_obj->read_fence);
+
+	return fence;
+}
+
 #define MAX_CMDS 4
 
 /* Created per submit-ioctl, to track bo's and cmdstream bufs, etc,
diff --git a/drivers/gpu/drm/msm/msm_gem_prime.c b/drivers/gpu/drm/msm/msm_gem_prime.c
index ad772fe36115..dd7a7ab603e2 100644
--- a/drivers/gpu/drm/msm/msm_gem_prime.c
+++ b/drivers/gpu/drm/msm/msm_gem_prime.c
@@ -37,6 +37,19 @@ void msm_gem_prime_vunmap(struct drm_gem_object *obj, void *vaddr)
 	/* TODO msm_gem_vunmap() */
 }
 
+int msm_gem_prime_mmap(struct drm_gem_object *obj, struct vm_area_struct *vma)
+{
+	int ret;
+
+	mutex_lock(&obj->dev->struct_mutex);
+	ret = drm_gem_mmap_obj(obj, obj->size, vma);
+	mutex_unlock(&obj->dev->struct_mutex);
+	if (ret < 0)
+		return ret;
+
+	return msm_gem_mmap_obj(vma->vm_private_data, vma);
+}
+
 struct drm_gem_object *msm_gem_prime_import_sg_table(struct drm_device *dev,
 		struct dma_buf_attachment *attach, struct sg_table *sg)
 {
diff --git a/drivers/gpu/drm/nouveau/Makefile b/drivers/gpu/drm/nouveau/Makefile
index 12c24c8abf7f..6461e3565afe 100644
--- a/drivers/gpu/drm/nouveau/Makefile
+++ b/drivers/gpu/drm/nouveau/Makefile
@@ -41,17 +41,28 @@ nouveau-y += core/subdev/bios/extdev.o
 nouveau-y += core/subdev/bios/fan.o
 nouveau-y += core/subdev/bios/gpio.o
 nouveau-y += core/subdev/bios/i2c.o
+nouveau-y += core/subdev/bios/image.o
 nouveau-y += core/subdev/bios/init.o
 nouveau-y += core/subdev/bios/mxm.o
+nouveau-y += core/subdev/bios/npde.o
+nouveau-y += core/subdev/bios/pcir.o
 nouveau-y += core/subdev/bios/perf.o
 nouveau-y += core/subdev/bios/pll.o
+nouveau-y += core/subdev/bios/pmu.o
 nouveau-y += core/subdev/bios/ramcfg.o
 nouveau-y += core/subdev/bios/rammap.o
+nouveau-y += core/subdev/bios/shadow.o
+nouveau-y += core/subdev/bios/shadowacpi.o
+nouveau-y += core/subdev/bios/shadowof.o
+nouveau-y += core/subdev/bios/shadowpci.o
+nouveau-y += core/subdev/bios/shadowramin.o
+nouveau-y += core/subdev/bios/shadowrom.o
 nouveau-y += core/subdev/bios/timing.o
 nouveau-y += core/subdev/bios/therm.o
 nouveau-y += core/subdev/bios/vmap.o
 nouveau-y += core/subdev/bios/volt.o
 nouveau-y += core/subdev/bios/xpio.o
+nouveau-y += core/subdev/bios/M0203.o
 nouveau-y += core/subdev/bios/M0205.o
 nouveau-y += core/subdev/bios/M0209.o
 nouveau-y += core/subdev/bios/P0260.o
@@ -86,6 +97,7 @@ nouveau-y += core/subdev/devinit/nva3.o
 nouveau-y += core/subdev/devinit/nvaf.o
 nouveau-y += core/subdev/devinit/nvc0.o
 nouveau-y += core/subdev/devinit/gm107.o
+nouveau-y += core/subdev/devinit/gm204.o
 nouveau-y += core/subdev/fb/base.o
 nouveau-y += core/subdev/fb/nv04.o
 nouveau-y += core/subdev/fb/nv10.o
@@ -129,6 +141,7 @@ nouveau-y += core/subdev/fb/ramgk20a.o
 nouveau-y += core/subdev/fb/ramgm107.o
 nouveau-y += core/subdev/fb/sddr2.o
 nouveau-y += core/subdev/fb/sddr3.o
+nouveau-y += core/subdev/fb/gddr3.o
 nouveau-y += core/subdev/fb/gddr5.o
 nouveau-y += core/subdev/fuse/base.o
 nouveau-y += core/subdev/fuse/g80.o
@@ -147,6 +160,7 @@ nouveau-y += core/subdev/i2c/bit.o
 nouveau-y += core/subdev/i2c/pad.o
 nouveau-y += core/subdev/i2c/padnv04.o
 nouveau-y += core/subdev/i2c/padnv94.o
+nouveau-y += core/subdev/i2c/padgm204.o
 nouveau-y += core/subdev/i2c/nv04.o
 nouveau-y += core/subdev/i2c/nv4e.o
 nouveau-y += core/subdev/i2c/nv50.o
@@ -154,6 +168,7 @@ nouveau-y += core/subdev/i2c/nv94.o
 nouveau-y += core/subdev/i2c/nvd0.o
 nouveau-y += core/subdev/i2c/gf117.o
 nouveau-y += core/subdev/i2c/nve0.o
+nouveau-y += core/subdev/i2c/gm204.o
 nouveau-y += core/subdev/ibus/nvc0.o
 nouveau-y += core/subdev/ibus/nve0.o
 nouveau-y += core/subdev/ibus/gk20a.o
@@ -211,6 +226,7 @@ nouveau-y += core/subdev/vm/nvc0.o
 nouveau-y += core/subdev/volt/base.o
 nouveau-y += core/subdev/volt/gpio.o
 nouveau-y += core/subdev/volt/nv40.o
+nouveau-y += core/subdev/volt/gk20a.o
 
 nouveau-y += core/engine/falcon.o
 nouveau-y += core/engine/xtensa.o
@@ -254,6 +270,7 @@ nouveau-y += core/engine/disp/nvd0.o
 nouveau-y += core/engine/disp/nve0.o
 nouveau-y += core/engine/disp/nvf0.o
 nouveau-y += core/engine/disp/gm107.o
+nouveau-y += core/engine/disp/gm204.o
 nouveau-y += core/engine/disp/dacnv50.o
 nouveau-y += core/engine/disp/dport.o
 nouveau-y += core/engine/disp/hdanva3.o
@@ -266,6 +283,7 @@ nouveau-y += core/engine/disp/piornv50.o
 nouveau-y += core/engine/disp/sornv50.o
 nouveau-y += core/engine/disp/sornv94.o
 nouveau-y += core/engine/disp/sornvd0.o
+nouveau-y += core/engine/disp/sorgm204.o
 nouveau-y += core/engine/disp/vga.o
 nouveau-y += core/engine/fifo/base.o
 nouveau-y += core/engine/fifo/nv04.o
diff --git a/drivers/gpu/drm/nouveau/core/core/handle.c b/drivers/gpu/drm/nouveau/core/core/handle.c
index a490b805d7e3..13f816cb08bd 100644
--- a/drivers/gpu/drm/nouveau/core/core/handle.c
+++ b/drivers/gpu/drm/nouveau/core/core/handle.c
@@ -222,116 +222,3 @@ nouveau_handle_put(struct nouveau_handle *handle)
 	if (handle)
 		nouveau_namedb_put(handle);
 }
-
-int
-nouveau_handle_new(struct nouveau_object *client, u32 _parent, u32 _handle,
-		   u16 _oclass, void *data, u32 size,
-		   struct nouveau_object **pobject)
-{
-	struct nouveau_object *parent = NULL;
-	struct nouveau_object *engctx = NULL;
-	struct nouveau_object *object = NULL;
-	struct nouveau_object *engine;
-	struct nouveau_oclass *oclass;
-	struct nouveau_handle *handle;
-	int ret;
-
-	/* lookup parent object and ensure it *is* a parent */
-	parent = nouveau_handle_ref(client, _parent);
-	if (!parent) {
-		nv_error(client, "parent 0x%08x not found\n", _parent);
-		return -ENOENT;
-	}
-
-	if (!nv_iclass(parent, NV_PARENT_CLASS)) {
-		nv_error(parent, "cannot have children\n");
-		ret = -EINVAL;
-		goto fail_class;
-	}
-
-	/* check that parent supports the requested subclass */
-	ret = nouveau_parent_sclass(parent, _oclass, &engine, &oclass);
-	if (ret) {
-		nv_debug(parent, "illegal class 0x%04x\n", _oclass);
-		goto fail_class;
-	}
-
-	/* make sure engine init has been completed *before* any objects
-	 * it controls are created - the constructors may depend on
-	 * state calculated at init (ie. default context construction)
-	 */
-	if (engine) {
-		ret = nouveau_object_inc(engine);
-		if (ret)
-			goto fail_class;
-	}
-
-	/* if engine requires it, create a context object to insert
-	 * between the parent and its children (eg. PGRAPH context)
-	 */
-	if (engine && nv_engine(engine)->cclass) {
-		ret = nouveau_object_ctor(parent, engine,
-					  nv_engine(engine)->cclass,
-					  data, size, &engctx);
-		if (ret)
-			goto fail_engctx;
-	} else {
-		nouveau_object_ref(parent, &engctx);
-	}
-
-	/* finally, create new object and bind it to its handle */
-	ret = nouveau_object_ctor(engctx, engine, oclass, data, size, &object);
-	*pobject = object;
-	if (ret)
-		goto fail_ctor;
-
-	ret = nouveau_object_inc(object);
-	if (ret)
-		goto fail_init;
-
-	ret = nouveau_handle_create(parent, _parent, _handle, object, &handle);
-	if (ret)
-		goto fail_handle;
-
-	ret = nouveau_handle_init(handle);
-	if (ret)
-		nouveau_handle_destroy(handle);
-
-fail_handle:
-	nouveau_object_dec(object, false);
-fail_init:
-	nouveau_object_ref(NULL, &object);
-fail_ctor:
-	nouveau_object_ref(NULL, &engctx);
-fail_engctx:
-	if (engine)
-		nouveau_object_dec(engine, false);
-fail_class:
-	nouveau_object_ref(NULL, &parent);
-	return ret;
-}
-
-int
-nouveau_handle_del(struct nouveau_object *client, u32 _parent, u32 _handle)
-{
-	struct nouveau_object *parent = NULL;
-	struct nouveau_object *namedb = NULL;
-	struct nouveau_handle *handle = NULL;
-
-	parent = nouveau_handle_ref(client, _parent);
-	if (!parent)
-		return -ENOENT;
-
-	namedb = nv_pclass(parent, NV_NAMEDB_CLASS);
-	if (namedb) {
-		handle = nouveau_namedb_get(nv_namedb(namedb), _handle);
-		if (handle) {
-			nouveau_namedb_put(handle);
-			nouveau_handle_fini(handle, false);
-			nouveau_handle_destroy(handle);
-		}
-	}
-
-	nouveau_object_ref(NULL, &parent);
-	return handle ? 0 : -EINVAL;
-}
diff --git a/drivers/gpu/drm/nouveau/core/engine/device/base.c b/drivers/gpu/drm/nouveau/core/engine/device/base.c
index 0ef5a5713182..137e0b0faeae 100644
--- a/drivers/gpu/drm/nouveau/core/engine/device/base.c
+++ b/drivers/gpu/drm/nouveau/core/engine/device/base.c
@@ -29,6 +29,7 @@
 #include <nvif/unpack.h>
 #include <nvif/class.h>
 
+#include <subdev/bios.h>
 #include <subdev/fb.h>
 #include <subdev/instmem.h>
 
@@ -138,7 +139,7 @@ nouveau_devobj_info(struct nouveau_object *object, void *data, u32 size)
 	}
 
 	args->v0.chipset  = device->chipset;
-	args->v0.revision = device->chipset >= 0x10 ? nv_rd32(device, 0) : 0x00;
+	args->v0.revision = device->chiprev;
 	if (pfb)  args->v0.ram_size = args->v0.ram_user = pfb->ram->size;
 	else      args->v0.ram_size = args->v0.ram_user = 0;
 	if (imem) args->v0.ram_user = args->v0.ram_user - imem->reserved;
@@ -222,6 +223,7 @@ static const u64 disable_map[] = {
 	[NVDEV_SUBDEV_VOLT]	= NV_DEVICE_V0_DISABLE_CORE,
 	[NVDEV_SUBDEV_THERM]	= NV_DEVICE_V0_DISABLE_CORE,
 	[NVDEV_SUBDEV_PWR]	= NV_DEVICE_V0_DISABLE_CORE,
+	[NVDEV_SUBDEV_FUSE]	= NV_DEVICE_V0_DISABLE_CORE,
 	[NVDEV_ENGINE_DMAOBJ]	= NV_DEVICE_V0_DISABLE_CORE,
 	[NVDEV_ENGINE_PERFMON]  = NV_DEVICE_V0_DISABLE_CORE,
 	[NVDEV_ENGINE_FIFO]	= NV_DEVICE_V0_DISABLE_FIFO,
@@ -235,6 +237,7 @@ static const u64 disable_map[] = {
 	[NVDEV_ENGINE_PPP]	= NV_DEVICE_V0_DISABLE_PPP,
 	[NVDEV_ENGINE_COPY0]	= NV_DEVICE_V0_DISABLE_COPY0,
 	[NVDEV_ENGINE_COPY1]	= NV_DEVICE_V0_DISABLE_COPY1,
+	[NVDEV_ENGINE_COPY2]	= NV_DEVICE_V0_DISABLE_COPY1,
 	[NVDEV_ENGINE_VIC]	= NV_DEVICE_V0_DISABLE_VIC,
 	[NVDEV_ENGINE_VENC]	= NV_DEVICE_V0_DISABLE_VENC,
 	[NVDEV_ENGINE_DISP]	= NV_DEVICE_V0_DISABLE_DISP,
@@ -352,12 +355,14 @@ nouveau_devobj_ctor(struct nouveau_object *parent,
 		/* determine chipset and derive architecture from it */
 		if ((boot0 & 0x1f000000) > 0) {
 			device->chipset = (boot0 & 0x1ff00000) >> 20;
+			device->chiprev = (boot0 & 0x000000ff);
 			switch (device->chipset & 0x1f0) {
 			case 0x010: {
 				if (0x461 & (1 << (device->chipset & 0xf)))
 					device->card_type = NV_10;
 				else
 					device->card_type = NV_11;
+				device->chiprev = 0x00;
 				break;
 			}
 			case 0x020: device->card_type = NV_20; break;
@@ -373,7 +378,8 @@ nouveau_devobj_ctor(struct nouveau_object *parent,
 			case 0x0e0:
 			case 0x0f0:
 			case 0x100: device->card_type = NV_E0; break;
-			case 0x110: device->card_type = GM100; break;
+			case 0x110:
+			case 0x120: device->card_type = GM100; break;
 			default:
 				break;
 			}
@@ -427,6 +433,10 @@ nouveau_devobj_ctor(struct nouveau_object *parent,
 		}
 
 		nv_debug(device, "crystal freq: %dKHz\n", device->crystal);
+	} else
+	if (args->v0.disable & NV_DEVICE_V0_DISABLE_IDENTIFY) {
+		device->cname = "NULL";
+		device->oclass[NVDEV_SUBDEV_VBIOS] = &nouveau_bios_oclass;
 	}
 
 	if (!(args->v0.disable & NV_DEVICE_V0_DISABLE_MMIO) &&
diff --git a/drivers/gpu/drm/nouveau/core/engine/device/gm100.c b/drivers/gpu/drm/nouveau/core/engine/device/gm100.c
index 6295668e29a5..4e74a3376de8 100644
--- a/drivers/gpu/drm/nouveau/core/engine/device/gm100.c
+++ b/drivers/gpu/drm/nouveau/core/engine/device/gm100.c
@@ -98,6 +98,49 @@ gm100_identify(struct nouveau_device *device)
 		device->oclass[NVDEV_ENGINE_PPP    ] = &nvc0_ppp_oclass;
 #endif
 		break;
+	case 0x124:
+		device->cname = "GM204";
+		device->oclass[NVDEV_SUBDEV_VBIOS  ] = &nouveau_bios_oclass;
+		device->oclass[NVDEV_SUBDEV_GPIO   ] =  nve0_gpio_oclass;
+		device->oclass[NVDEV_SUBDEV_I2C    ] =  gm204_i2c_oclass;
+		device->oclass[NVDEV_SUBDEV_FUSE   ] = &gm107_fuse_oclass;
+#if 0
+		/* looks to be some non-trivial changes */
+		device->oclass[NVDEV_SUBDEV_CLOCK  ] = &nve0_clock_oclass;
+		/* priv ring says no to 0x10eb14 writes */
+		device->oclass[NVDEV_SUBDEV_THERM  ] = &gm107_therm_oclass;
+#endif
+		device->oclass[NVDEV_SUBDEV_MXM    ] = &nv50_mxm_oclass;
+		device->oclass[NVDEV_SUBDEV_DEVINIT] =  gm204_devinit_oclass;
+		device->oclass[NVDEV_SUBDEV_MC     ] =  gk20a_mc_oclass;
+		device->oclass[NVDEV_SUBDEV_BUS    ] =  nvc0_bus_oclass;
+		device->oclass[NVDEV_SUBDEV_TIMER  ] = &gk20a_timer_oclass;
+		device->oclass[NVDEV_SUBDEV_FB     ] =  gm107_fb_oclass;
+		device->oclass[NVDEV_SUBDEV_LTC    ] =  gm107_ltc_oclass;
+		device->oclass[NVDEV_SUBDEV_IBUS   ] = &nve0_ibus_oclass;
+		device->oclass[NVDEV_SUBDEV_INSTMEM] =  nv50_instmem_oclass;
+		device->oclass[NVDEV_SUBDEV_VM     ] = &nvc0_vmmgr_oclass;
+		device->oclass[NVDEV_SUBDEV_BAR    ] = &nvc0_bar_oclass;
+		device->oclass[NVDEV_SUBDEV_PWR    ] =  nv108_pwr_oclass;
+#if 0
+		device->oclass[NVDEV_SUBDEV_VOLT   ] = &nv40_volt_oclass;
+#endif
+		device->oclass[NVDEV_ENGINE_DMAOBJ ] =  nvd0_dmaeng_oclass;
+#if 0
+		device->oclass[NVDEV_ENGINE_FIFO   ] =  nv108_fifo_oclass;
+		device->oclass[NVDEV_ENGINE_SW     ] =  nvc0_software_oclass;
+		device->oclass[NVDEV_ENGINE_GR     ] =  gm107_graph_oclass;
+#endif
+		device->oclass[NVDEV_ENGINE_DISP   ] =  gm204_disp_oclass;
+#if 0
+		device->oclass[NVDEV_ENGINE_COPY0  ] = &gm204_copy0_oclass;
+		device->oclass[NVDEV_ENGINE_COPY1  ] = &gm204_copy1_oclass;
+		device->oclass[NVDEV_ENGINE_COPY2  ] = &gm204_copy2_oclass;
+		device->oclass[NVDEV_ENGINE_BSP    ] = &nve0_bsp_oclass;
+		device->oclass[NVDEV_ENGINE_VP     ] = &nve0_vp_oclass;
+		device->oclass[NVDEV_ENGINE_PPP    ] = &nvc0_ppp_oclass;
+#endif
+		break;
 	default:
 		nv_fatal(device, "unknown Maxwell chipset\n");
 		return -EINVAL;
diff --git a/drivers/gpu/drm/nouveau/core/engine/device/nve0.c b/drivers/gpu/drm/nouveau/core/engine/device/nve0.c
index b1b2e484ecfa..674da1f095b2 100644
--- a/drivers/gpu/drm/nouveau/core/engine/device/nve0.c
+++ b/drivers/gpu/drm/nouveau/core/engine/device/nve0.c
@@ -179,6 +179,7 @@ nve0_identify(struct nouveau_device *device)
 		device->oclass[NVDEV_ENGINE_GR     ] =  gk20a_graph_oclass;
 		device->oclass[NVDEV_ENGINE_COPY2  ] = &nve0_copy2_oclass;
 		device->oclass[NVDEV_ENGINE_PERFMON] = &nve0_perfmon_oclass;
+		device->oclass[NVDEV_SUBDEV_VOLT   ] = &gk20a_volt_oclass;
 		break;
 	case 0xf0:
 		device->cname = "GK110";
diff --git a/drivers/gpu/drm/nouveau/core/engine/disp/dport.c b/drivers/gpu/drm/nouveau/core/engine/disp/dport.c
index 39890221b91c..16db08dfba6e 100644
--- a/drivers/gpu/drm/nouveau/core/engine/disp/dport.c
+++ b/drivers/gpu/drm/nouveau/core/engine/disp/dport.c
@@ -28,7 +28,7 @@
 #include <subdev/bios/init.h>
 #include <subdev/i2c.h>
 
-#include <engine/disp.h>
+#include "nv50.h"
 
 #include <nvif/class.h>
 
@@ -326,7 +326,7 @@ void
 nouveau_dp_train(struct work_struct *w)
 {
 	struct nvkm_output_dp *outp = container_of(w, typeof(*outp), lt.work);
-	struct nouveau_disp *disp = nouveau_disp(outp);
+	struct nv50_disp_priv *priv = (void *)nouveau_disp(outp);
 	const struct dp_rates *cfg = nouveau_dp_rates;
 	struct dp_state _dp = {
 		.outp = outp,
@@ -334,8 +334,11 @@ nouveau_dp_train(struct work_struct *w)
 	u32 datarate = 0;
 	int ret;
 
+	if (!outp->base.info.location && priv->sor.magic)
+		priv->sor.magic(&outp->base);
+
 	/* bring capabilities within encoder limits */
-	if (nv_mclass(disp) < GF110_DISP)
+	if (nv_mclass(priv) < GF110_DISP)
 		outp->dpcd[2] &= ~DPCD_RC02_TPS3_SUPPORTED;
 	if ((outp->dpcd[2] & 0x1f) > outp->base.info.dpconf.link_nr) {
 		outp->dpcd[2] &= ~DPCD_RC02_MAX_LANE_COUNT;
diff --git a/drivers/gpu/drm/nouveau/core/engine/disp/gm107.c b/drivers/gpu/drm/nouveau/core/engine/disp/gm107.c
index b3df3fe2dc09..e2ad0543fb31 100644
--- a/drivers/gpu/drm/nouveau/core/engine/disp/gm107.c
+++ b/drivers/gpu/drm/nouveau/core/engine/disp/gm107.c
@@ -35,8 +35,8 @@
 
 static struct nouveau_oclass
 gm107_disp_sclass[] = {
-	{ GM107_DISP_CORE_CHANNEL_DMA, &nvd0_disp_mast_ofuncs.base },
-	{ GK110_DISP_BASE_CHANNEL_DMA, &nvd0_disp_sync_ofuncs.base },
+	{ GM107_DISP_CORE_CHANNEL_DMA, &nvd0_disp_core_ofuncs.base },
+	{ GK110_DISP_BASE_CHANNEL_DMA, &nvd0_disp_base_ofuncs.base },
 	{ GK104_DISP_OVERLAY_CONTROL_DMA, &nvd0_disp_ovly_ofuncs.base },
 	{ GK104_DISP_OVERLAY, &nvd0_disp_oimm_ofuncs.base },
 	{ GK104_DISP_CURSOR, &nvd0_disp_curs_ofuncs.base },
@@ -44,8 +44,8 @@ gm107_disp_sclass[] = {
 };
 
 static struct nouveau_oclass
-gm107_disp_base_oclass[] = {
-	{ GM107_DISP, &nvd0_disp_base_ofuncs },
+gm107_disp_main_oclass[] = {
+	{ GM107_DISP, &nvd0_disp_main_ofuncs },
 	{}
 };
 
@@ -72,7 +72,7 @@ gm107_disp_ctor(struct nouveau_object *parent, struct nouveau_object *engine,
 	if (ret)
 		return ret;
 
-	nv_engine(priv)->sclass = gm107_disp_base_oclass;
+	nv_engine(priv)->sclass = gm107_disp_main_oclass;
 	nv_engine(priv)->cclass = &nv50_disp_cclass;
 	nv_subdev(priv)->intr = nvd0_disp_intr;
 	INIT_WORK(&priv->supervisor, nvd0_disp_intr_supervisor);
@@ -99,9 +99,9 @@ gm107_disp_oclass = &(struct nv50_disp_impl) {
 	},
 	.base.vblank = &nvd0_disp_vblank_func,
 	.base.outp =  nvd0_disp_outp_sclass,
-	.mthd.core = &nve0_disp_mast_mthd_chan,
-	.mthd.base = &nvd0_disp_sync_mthd_chan,
+	.mthd.core = &nve0_disp_core_mthd_chan,
+	.mthd.base = &nvd0_disp_base_mthd_chan,
 	.mthd.ovly = &nve0_disp_ovly_mthd_chan,
 	.mthd.prev = -0x020000,
-	.head.scanoutpos = nvd0_disp_base_scanoutpos,
+	.head.scanoutpos = nvd0_disp_main_scanoutpos,
 }.base.base;
diff --git a/drivers/gpu/drm/nouveau/core/engine/disp/gm204.c b/drivers/gpu/drm/nouveau/core/engine/disp/gm204.c
new file mode 100644
index 000000000000..672ded79b2a9
--- /dev/null
+++ b/drivers/gpu/drm/nouveau/core/engine/disp/gm204.c
@@ -0,0 +1,114 @@
+/*
+ * Copyright 2012 Red Hat Inc.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR
+ * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
+ * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
+ * OTHER DEALINGS IN THE SOFTWARE.
+ *
+ * Authors: Ben Skeggs
+ */
+
+#include <engine/software.h>
+#include <engine/disp.h>
+
+#include <nvif/class.h>
+
+#include "nv50.h"
+
+/*******************************************************************************
+ * Base display object
+ ******************************************************************************/
+
+static struct nouveau_oclass
+gm204_disp_sclass[] = {
+	{ GM204_DISP_CORE_CHANNEL_DMA, &nvd0_disp_core_ofuncs.base },
+	{ GK110_DISP_BASE_CHANNEL_DMA, &nvd0_disp_base_ofuncs.base },
+	{ GK104_DISP_OVERLAY_CONTROL_DMA, &nvd0_disp_ovly_ofuncs.base },
+	{ GK104_DISP_OVERLAY, &nvd0_disp_oimm_ofuncs.base },
+	{ GK104_DISP_CURSOR, &nvd0_disp_curs_ofuncs.base },
+	{}
+};
+
+static struct nouveau_oclass
+gm204_disp_main_oclass[] = {
+	{ GM204_DISP, &nvd0_disp_main_ofuncs },
+	{}
+};
+
+/*******************************************************************************
+ * Display engine implementation
+ ******************************************************************************/
+
+static int
+gm204_disp_ctor(struct nouveau_object *parent, struct nouveau_object *engine,
+	       struct nouveau_oclass *oclass, void *data, u32 size,
+	       struct nouveau_object **pobject)
+{
+	struct nv50_disp_priv *priv;
+	int heads = nv_rd32(parent, 0x022448);
+	int ret;
+
+	ret = nouveau_disp_create(parent, engine, oclass, heads,
+				  "PDISP", "display", &priv);
+	*pobject = nv_object(priv);
+	if (ret)
+		return ret;
+
+	ret = nvkm_event_init(&nvd0_disp_chan_uevent, 1, 17, &priv->uevent);
+	if (ret)
+		return ret;
+
+	nv_engine(priv)->sclass = gm204_disp_main_oclass;
+	nv_engine(priv)->cclass = &nv50_disp_cclass;
+	nv_subdev(priv)->intr = nvd0_disp_intr;
+	INIT_WORK(&priv->supervisor, nvd0_disp_intr_supervisor);
+	priv->sclass = gm204_disp_sclass;
+	priv->head.nr = heads;
+	priv->dac.nr = 3;
+	priv->sor.nr = 4;
+	priv->dac.power = nv50_dac_power;
+	priv->dac.sense = nv50_dac_sense;
+	priv->sor.power = nv50_sor_power;
+	priv->sor.hda_eld = nvd0_hda_eld;
+	priv->sor.hdmi = nvd0_hdmi_ctrl;
+	priv->sor.magic = gm204_sor_magic;
+	return 0;
+}
+
+struct nouveau_oclass *
+gm204_disp_outp_sclass[] = {
+	&gm204_sor_dp_impl.base.base,
+	NULL
+};
+
+struct nouveau_oclass *
+gm204_disp_oclass = &(struct nv50_disp_impl) {
+	.base.base.handle = NV_ENGINE(DISP, 0x07),
+	.base.base.ofuncs = &(struct nouveau_ofuncs) {
+		.ctor = gm204_disp_ctor,
+		.dtor = _nouveau_disp_dtor,
+		.init = _nouveau_disp_init,
+		.fini = _nouveau_disp_fini,
+	},
+	.base.vblank = &nvd0_disp_vblank_func,
+	.base.outp =  gm204_disp_outp_sclass,
+	.mthd.core = &nve0_disp_core_mthd_chan,
+	.mthd.base = &nvd0_disp_base_mthd_chan,
+	.mthd.ovly = &nve0_disp_ovly_mthd_chan,
+	.mthd.prev = -0x020000,
+	.head.scanoutpos = nvd0_disp_main_scanoutpos,
+}.base.base;
diff --git a/drivers/gpu/drm/nouveau/core/engine/disp/nv50.c b/drivers/gpu/drm/nouveau/core/engine/disp/nv50.c
index 2df3a937037d..44a8290aaea5 100644
--- a/drivers/gpu/drm/nouveau/core/engine/disp/nv50.c
+++ b/drivers/gpu/drm/nouveau/core/engine/disp/nv50.c
@@ -88,12 +88,14 @@ nv50_disp_chan_uevent_fini(struct nvkm_event *event, int type, int index)
 {
 	struct nv50_disp_priv *priv = container_of(event, typeof(*priv), uevent);
 	nv_mask(priv, 0x610028, 0x00000001 << index, 0x00000000 << index);
+	nv_wr32(priv, 0x610020, 0x00000001 << index);
 }
 
 static void
 nv50_disp_chan_uevent_init(struct nvkm_event *event, int types, int index)
 {
 	struct nv50_disp_priv *priv = container_of(event, typeof(*priv), uevent);
+	nv_wr32(priv, 0x610020, 0x00000001 << index);
 	nv_mask(priv, 0x610028, 0x00000001 << index, 0x00000001 << index);
 }
 
@@ -374,7 +376,7 @@ nv50_disp_mthd_chan(struct nv50_disp_priv *priv, int debug, int head,
 }
 
 const struct nv50_disp_mthd_list
-nv50_disp_mast_mthd_base = {
+nv50_disp_core_mthd_base = {
 	.mthd = 0x0000,
 	.addr = 0x000000,
 	.data = {
@@ -387,7 +389,7 @@ nv50_disp_mast_mthd_base = {
 };
 
 static const struct nv50_disp_mthd_list
-nv50_disp_mast_mthd_dac = {
+nv50_disp_core_mthd_dac = {
 	.mthd = 0x0080,
 	.addr = 0x000008,
 	.data = {
@@ -399,7 +401,7 @@ nv50_disp_mast_mthd_dac = {
 };
 
 const struct nv50_disp_mthd_list
-nv50_disp_mast_mthd_sor = {
+nv50_disp_core_mthd_sor = {
 	.mthd = 0x0040,
 	.addr = 0x000008,
 	.data = {
@@ -409,7 +411,7 @@ nv50_disp_mast_mthd_sor = {
 };
 
 const struct nv50_disp_mthd_list
-nv50_disp_mast_mthd_pior = {
+nv50_disp_core_mthd_pior = {
 	.mthd = 0x0040,
 	.addr = 0x000008,
 	.data = {
@@ -419,7 +421,7 @@ nv50_disp_mast_mthd_pior = {
 };
 
 static const struct nv50_disp_mthd_list
-nv50_disp_mast_mthd_head = {
+nv50_disp_core_mthd_head = {
 	.mthd = 0x0400,
 	.addr = 0x000540,
 	.data = {
@@ -466,21 +468,21 @@ nv50_disp_mast_mthd_head = {
 };
 
 static const struct nv50_disp_mthd_chan
-nv50_disp_mast_mthd_chan = {
+nv50_disp_core_mthd_chan = {
 	.name = "Core",
 	.addr = 0x000000,
 	.data = {
-		{ "Global", 1, &nv50_disp_mast_mthd_base },
-		{    "DAC", 3, &nv50_disp_mast_mthd_dac  },
-		{    "SOR", 2, &nv50_disp_mast_mthd_sor  },
-		{   "PIOR", 3, &nv50_disp_mast_mthd_pior },
-		{   "HEAD", 2, &nv50_disp_mast_mthd_head },
+		{ "Global", 1, &nv50_disp_core_mthd_base },
+		{    "DAC", 3, &nv50_disp_core_mthd_dac  },
+		{    "SOR", 2, &nv50_disp_core_mthd_sor  },
+		{   "PIOR", 3, &nv50_disp_core_mthd_pior },
+		{   "HEAD", 2, &nv50_disp_core_mthd_head },
 		{}
 	}
 };
 
 int
-nv50_disp_mast_ctor(struct nouveau_object *parent,
+nv50_disp_core_ctor(struct nouveau_object *parent,
 		    struct nouveau_object *engine,
 		    struct nouveau_oclass *oclass, void *data, u32 size,
 		    struct nouveau_object **pobject)
@@ -509,7 +511,7 @@ nv50_disp_mast_ctor(struct nouveau_object *parent,
 }
 
 static int
-nv50_disp_mast_init(struct nouveau_object *object)
+nv50_disp_core_init(struct nouveau_object *object)
 {
 	struct nv50_disp_priv *priv = (void *)object->engine;
 	struct nv50_disp_dmac *mast = (void *)object;
@@ -546,7 +548,7 @@ nv50_disp_mast_init(struct nouveau_object *object)
 }
 
 static int
-nv50_disp_mast_fini(struct nouveau_object *object, bool suspend)
+nv50_disp_core_fini(struct nouveau_object *object, bool suspend)
 {
 	struct nv50_disp_priv *priv = (void *)object->engine;
 	struct nv50_disp_dmac *mast = (void *)object;
@@ -567,11 +569,11 @@ nv50_disp_mast_fini(struct nouveau_object *object, bool suspend)
 }
 
 struct nv50_disp_chan_impl
-nv50_disp_mast_ofuncs = {
-	.base.ctor = nv50_disp_mast_ctor,
+nv50_disp_core_ofuncs = {
+	.base.ctor = nv50_disp_core_ctor,
 	.base.dtor = nv50_disp_dmac_dtor,
-	.base.init = nv50_disp_mast_init,
-	.base.fini = nv50_disp_mast_fini,
+	.base.init = nv50_disp_core_init,
+	.base.fini = nv50_disp_core_fini,
 	.base.map  = nv50_disp_chan_map,
 	.base.ntfy = nv50_disp_chan_ntfy,
 	.base.rd32 = nv50_disp_chan_rd32,
@@ -586,7 +588,7 @@ nv50_disp_mast_ofuncs = {
  ******************************************************************************/
 
 static const struct nv50_disp_mthd_list
-nv50_disp_sync_mthd_base = {
+nv50_disp_base_mthd_base = {
 	.mthd = 0x0000,
 	.addr = 0x000000,
 	.data = {
@@ -611,7 +613,7 @@ nv50_disp_sync_mthd_base = {
 };
 
 const struct nv50_disp_mthd_list
-nv50_disp_sync_mthd_image = {
+nv50_disp_base_mthd_image = {
 	.mthd = 0x0400,
 	.addr = 0x000000,
 	.data = {
@@ -625,18 +627,18 @@ nv50_disp_sync_mthd_image = {
 };
 
 static const struct nv50_disp_mthd_chan
-nv50_disp_sync_mthd_chan = {
+nv50_disp_base_mthd_chan = {
 	.name = "Base",
 	.addr = 0x000540,
 	.data = {
-		{ "Global", 1, &nv50_disp_sync_mthd_base },
-		{  "Image", 2, &nv50_disp_sync_mthd_image },
+		{ "Global", 1, &nv50_disp_base_mthd_base },
+		{  "Image", 2, &nv50_disp_base_mthd_image },
 		{}
 	}
 };
 
 int
-nv50_disp_sync_ctor(struct nouveau_object *parent,
+nv50_disp_base_ctor(struct nouveau_object *parent,
 		    struct nouveau_object *engine,
 		    struct nouveau_oclass *oclass, void *data, u32 size,
 		    struct nouveau_object **pobject)
@@ -669,8 +671,8 @@ nv50_disp_sync_ctor(struct nouveau_object *parent,
 }
 
 struct nv50_disp_chan_impl
-nv50_disp_sync_ofuncs = {
-	.base.ctor = nv50_disp_sync_ctor,
+nv50_disp_base_ofuncs = {
+	.base.ctor = nv50_disp_base_ctor,
 	.base.dtor = nv50_disp_dmac_dtor,
 	.base.init = nv50_disp_dmac_init,
 	.base.fini = nv50_disp_dmac_fini,
@@ -942,7 +944,7 @@ nv50_disp_curs_ofuncs = {
  ******************************************************************************/
 
 int
-nv50_disp_base_scanoutpos(NV50_DISP_MTHD_V0)
+nv50_disp_main_scanoutpos(NV50_DISP_MTHD_V0)
 {
 	const u32 blanke = nv_rd32(priv, 0x610aec + (head * 0x540));
 	const u32 blanks = nv_rd32(priv, 0x610af4 + (head * 0x540));
@@ -974,7 +976,7 @@ nv50_disp_base_scanoutpos(NV50_DISP_MTHD_V0)
 }
 
 int
-nv50_disp_base_mthd(struct nouveau_object *object, u32 mthd,
+nv50_disp_main_mthd(struct nouveau_object *object, u32 mthd,
 		    void *data, u32 size)
 {
 	const struct nv50_disp_impl *impl = (void *)nv_oclass(object->engine);
@@ -1098,7 +1100,7 @@ nv50_disp_base_mthd(struct nouveau_object *object, u32 mthd,
 }
 
 int
-nv50_disp_base_ctor(struct nouveau_object *parent,
+nv50_disp_main_ctor(struct nouveau_object *parent,
 		    struct nouveau_object *engine,
 		    struct nouveau_oclass *oclass, void *data, u32 size,
 		    struct nouveau_object **pobject)
@@ -1118,7 +1120,7 @@ nv50_disp_base_ctor(struct nouveau_object *parent,
 }
 
 void
-nv50_disp_base_dtor(struct nouveau_object *object)
+nv50_disp_main_dtor(struct nouveau_object *object)
 {
 	struct nv50_disp_base *base = (void *)object;
 	nouveau_ramht_ref(NULL, &base->ramht);
@@ -1126,7 +1128,7 @@ nv50_disp_base_dtor(struct nouveau_object *object)
 }
 
 static int
-nv50_disp_base_init(struct nouveau_object *object)
+nv50_disp_main_init(struct nouveau_object *object)
 {
 	struct nv50_disp_priv *priv = (void *)object->engine;
 	struct nv50_disp_base *base = (void *)object;
@@ -1194,7 +1196,7 @@ nv50_disp_base_init(struct nouveau_object *object)
 }
 
 static int
-nv50_disp_base_fini(struct nouveau_object *object, bool suspend)
+nv50_disp_main_fini(struct nouveau_object *object, bool suspend)
 {
 	struct nv50_disp_priv *priv = (void *)object->engine;
 	struct nv50_disp_base *base = (void *)object;
@@ -1207,25 +1209,25 @@ nv50_disp_base_fini(struct nouveau_object *object, bool suspend)
 }
 
 struct nouveau_ofuncs
-nv50_disp_base_ofuncs = {
-	.ctor = nv50_disp_base_ctor,
-	.dtor = nv50_disp_base_dtor,
-	.init = nv50_disp_base_init,
-	.fini = nv50_disp_base_fini,
-	.mthd = nv50_disp_base_mthd,
+nv50_disp_main_ofuncs = {
+	.ctor = nv50_disp_main_ctor,
+	.dtor = nv50_disp_main_dtor,
+	.init = nv50_disp_main_init,
+	.fini = nv50_disp_main_fini,
+	.mthd = nv50_disp_main_mthd,
 	.ntfy = nouveau_disp_ntfy,
 };
 
 static struct nouveau_oclass
-nv50_disp_base_oclass[] = {
-	{ NV50_DISP, &nv50_disp_base_ofuncs },
+nv50_disp_main_oclass[] = {
+	{ NV50_DISP, &nv50_disp_main_ofuncs },
 	{}
 };
 
 static struct nouveau_oclass
 nv50_disp_sclass[] = {
-	{ NV50_DISP_CORE_CHANNEL_DMA, &nv50_disp_mast_ofuncs.base },
-	{ NV50_DISP_BASE_CHANNEL_DMA, &nv50_disp_sync_ofuncs.base },
+	{ NV50_DISP_CORE_CHANNEL_DMA, &nv50_disp_core_ofuncs.base },
+	{ NV50_DISP_BASE_CHANNEL_DMA, &nv50_disp_base_ofuncs.base },
 	{ NV50_DISP_OVERLAY_CHANNEL_DMA, &nv50_disp_ovly_ofuncs.base },
 	{ NV50_DISP_OVERLAY, &nv50_disp_oimm_ofuncs.base },
 	{ NV50_DISP_CURSOR, &nv50_disp_curs_ofuncs.base },
@@ -1974,7 +1976,7 @@ nv50_disp_ctor(struct nouveau_object *parent, struct nouveau_object *engine,
 	if (ret)
 		return ret;
 
-	nv_engine(priv)->sclass = nv50_disp_base_oclass;
+	nv_engine(priv)->sclass = nv50_disp_main_oclass;
 	nv_engine(priv)->cclass = &nv50_disp_cclass;
 	nv_subdev(priv)->intr = nv50_disp_intr;
 	INIT_WORK(&priv->supervisor, nv50_disp_intr_supervisor);
@@ -2007,9 +2009,9 @@ nv50_disp_oclass = &(struct nv50_disp_impl) {
 	},
 	.base.vblank = &nv50_disp_vblank_func,
 	.base.outp =  nv50_disp_outp_sclass,
-	.mthd.core = &nv50_disp_mast_mthd_chan,
-	.mthd.base = &nv50_disp_sync_mthd_chan,
+	.mthd.core = &nv50_disp_core_mthd_chan,
+	.mthd.base = &nv50_disp_base_mthd_chan,
 	.mthd.ovly = &nv50_disp_ovly_mthd_chan,
 	.mthd.prev = 0x000004,
-	.head.scanoutpos = nv50_disp_base_scanoutpos,
+	.head.scanoutpos = nv50_disp_main_scanoutpos,
 }.base.base;
diff --git a/drivers/gpu/drm/nouveau/core/engine/disp/nv50.h b/drivers/gpu/drm/nouveau/core/engine/disp/nv50.h
index 5279feefec06..7f08078ee925 100644
--- a/drivers/gpu/drm/nouveau/core/engine/disp/nv50.h
+++ b/drivers/gpu/drm/nouveau/core/engine/disp/nv50.h
@@ -42,6 +42,7 @@ struct nv50_disp_priv {
 		int (*hda_eld)(NV50_DISP_MTHD_V1);
 		int (*hdmi)(NV50_DISP_MTHD_V1);
 		u32 lvdsconf;
+		void (*magic)(struct nvkm_output *);
 	} sor;
 	struct {
 		int nr;
@@ -63,10 +64,10 @@ struct nv50_disp_impl {
 	} head;
 };
 
-int nv50_disp_base_scanoutpos(NV50_DISP_MTHD_V0);
-int nv50_disp_base_mthd(struct nouveau_object *, u32, void *, u32);
+int nv50_disp_main_scanoutpos(NV50_DISP_MTHD_V0);
+int nv50_disp_main_mthd(struct nouveau_object *, u32, void *, u32);
 
-int nvd0_disp_base_scanoutpos(NV50_DISP_MTHD_V0);
+int nvd0_disp_main_scanoutpos(NV50_DISP_MTHD_V0);
 
 int nv50_dac_power(NV50_DISP_MTHD_V1);
 int nv50_dac_sense(NV50_DISP_MTHD_V1);
@@ -169,18 +170,18 @@ struct nv50_disp_mthd_chan {
 	} data[];
 };
 
-extern struct nv50_disp_chan_impl nv50_disp_mast_ofuncs;
-int nv50_disp_mast_ctor(struct nouveau_object *, struct nouveau_object *,
+extern struct nv50_disp_chan_impl nv50_disp_core_ofuncs;
+int nv50_disp_core_ctor(struct nouveau_object *, struct nouveau_object *,
 			struct nouveau_oclass *, void *, u32,
 			struct nouveau_object **);
-extern const struct nv50_disp_mthd_list nv50_disp_mast_mthd_base;
-extern const struct nv50_disp_mthd_list nv50_disp_mast_mthd_sor;
-extern const struct nv50_disp_mthd_list nv50_disp_mast_mthd_pior;
-extern struct nv50_disp_chan_impl nv50_disp_sync_ofuncs;
-int nv50_disp_sync_ctor(struct nouveau_object *, struct nouveau_object *,
+extern const struct nv50_disp_mthd_list nv50_disp_core_mthd_base;
+extern const struct nv50_disp_mthd_list nv50_disp_core_mthd_sor;
+extern const struct nv50_disp_mthd_list nv50_disp_core_mthd_pior;
+extern struct nv50_disp_chan_impl nv50_disp_base_ofuncs;
+int nv50_disp_base_ctor(struct nouveau_object *, struct nouveau_object *,
 			struct nouveau_oclass *, void *, u32,
 			struct nouveau_object **);
-extern const struct nv50_disp_mthd_list nv50_disp_sync_mthd_image;
+extern const struct nv50_disp_mthd_list nv50_disp_base_mthd_image;
 extern struct nv50_disp_chan_impl nv50_disp_ovly_ofuncs;
 int nv50_disp_ovly_ctor(struct nouveau_object *, struct nouveau_object *,
 			struct nouveau_oclass *, void *, u32,
@@ -194,12 +195,12 @@ extern struct nv50_disp_chan_impl nv50_disp_curs_ofuncs;
 int nv50_disp_curs_ctor(struct nouveau_object *, struct nouveau_object *,
 			struct nouveau_oclass *, void *, u32,
 			struct nouveau_object **);
-extern struct nouveau_ofuncs nv50_disp_base_ofuncs;
-int  nv50_disp_base_ctor(struct nouveau_object *, struct nouveau_object *,
+extern struct nouveau_ofuncs nv50_disp_main_ofuncs;
+int  nv50_disp_main_ctor(struct nouveau_object *, struct nouveau_object *,
 			 struct nouveau_oclass *, void *, u32,
 			 struct nouveau_object **);
-void nv50_disp_base_dtor(struct nouveau_object *);
-extern struct nouveau_omthds nv50_disp_base_omthds[];
+void nv50_disp_main_dtor(struct nouveau_object *);
+extern struct nouveau_omthds nv50_disp_main_omthds[];
 extern struct nouveau_oclass nv50_disp_cclass;
 void nv50_disp_mthd_chan(struct nv50_disp_priv *, int debug, int head,
 			 const struct nv50_disp_mthd_chan *);
@@ -207,31 +208,31 @@ void nv50_disp_intr_supervisor(struct work_struct *);
 void nv50_disp_intr(struct nouveau_subdev *);
 extern const struct nvkm_event_func nv50_disp_vblank_func;
 
-extern const struct nv50_disp_mthd_chan nv84_disp_mast_mthd_chan;
-extern const struct nv50_disp_mthd_list nv84_disp_mast_mthd_dac;
-extern const struct nv50_disp_mthd_list nv84_disp_mast_mthd_head;
-extern const struct nv50_disp_mthd_chan nv84_disp_sync_mthd_chan;
+extern const struct nv50_disp_mthd_chan nv84_disp_core_mthd_chan;
+extern const struct nv50_disp_mthd_list nv84_disp_core_mthd_dac;
+extern const struct nv50_disp_mthd_list nv84_disp_core_mthd_head;
+extern const struct nv50_disp_mthd_chan nv84_disp_base_mthd_chan;
 extern const struct nv50_disp_mthd_chan nv84_disp_ovly_mthd_chan;
 
-extern const struct nv50_disp_mthd_chan nv94_disp_mast_mthd_chan;
+extern const struct nv50_disp_mthd_chan nv94_disp_core_mthd_chan;
 
-extern struct nv50_disp_chan_impl nvd0_disp_mast_ofuncs;
-extern const struct nv50_disp_mthd_list nvd0_disp_mast_mthd_base;
-extern const struct nv50_disp_mthd_list nvd0_disp_mast_mthd_dac;
-extern const struct nv50_disp_mthd_list nvd0_disp_mast_mthd_sor;
-extern const struct nv50_disp_mthd_list nvd0_disp_mast_mthd_pior;
-extern struct nv50_disp_chan_impl nvd0_disp_sync_ofuncs;
+extern struct nv50_disp_chan_impl nvd0_disp_core_ofuncs;
+extern const struct nv50_disp_mthd_list nvd0_disp_core_mthd_base;
+extern const struct nv50_disp_mthd_list nvd0_disp_core_mthd_dac;
+extern const struct nv50_disp_mthd_list nvd0_disp_core_mthd_sor;
+extern const struct nv50_disp_mthd_list nvd0_disp_core_mthd_pior;
+extern struct nv50_disp_chan_impl nvd0_disp_base_ofuncs;
 extern struct nv50_disp_chan_impl nvd0_disp_ovly_ofuncs;
-extern const struct nv50_disp_mthd_chan nvd0_disp_sync_mthd_chan;
+extern const struct nv50_disp_mthd_chan nvd0_disp_base_mthd_chan;
 extern struct nv50_disp_chan_impl nvd0_disp_oimm_ofuncs;
 extern struct nv50_disp_chan_impl nvd0_disp_curs_ofuncs;
-extern struct nouveau_ofuncs nvd0_disp_base_ofuncs;
+extern struct nouveau_ofuncs nvd0_disp_main_ofuncs;
 extern struct nouveau_oclass nvd0_disp_cclass;
 void nvd0_disp_intr_supervisor(struct work_struct *);
 void nvd0_disp_intr(struct nouveau_subdev *);
 extern const struct nvkm_event_func nvd0_disp_vblank_func;
 
-extern const struct nv50_disp_mthd_chan nve0_disp_mast_mthd_chan;
+extern const struct nv50_disp_mthd_chan nve0_disp_core_mthd_chan;
 extern const struct nv50_disp_mthd_chan nve0_disp_ovly_mthd_chan;
 
 extern struct nvkm_output_dp_impl nv50_pior_dp_impl;
@@ -242,6 +243,10 @@ int nv94_sor_dp_lnk_pwr(struct nvkm_output_dp *, int);
 extern struct nouveau_oclass *nv94_disp_outp_sclass[];
 
 extern struct nvkm_output_dp_impl nvd0_sor_dp_impl;
+int nvd0_sor_dp_lnk_ctl(struct nvkm_output_dp *, int, int, bool);
 extern struct nouveau_oclass *nvd0_disp_outp_sclass[];
 
+void gm204_sor_magic(struct nvkm_output *outp);
+extern struct nvkm_output_dp_impl gm204_sor_dp_impl;
+
 #endif
diff --git a/drivers/gpu/drm/nouveau/core/engine/disp/nv84.c b/drivers/gpu/drm/nouveau/core/engine/disp/nv84.c
index d36284715b2a..13eff5e4ee51 100644
--- a/drivers/gpu/drm/nouveau/core/engine/disp/nv84.c
+++ b/drivers/gpu/drm/nouveau/core/engine/disp/nv84.c
@@ -34,7 +34,7 @@
  ******************************************************************************/
 
 const struct nv50_disp_mthd_list
-nv84_disp_mast_mthd_dac = {
+nv84_disp_core_mthd_dac = {
 	.mthd = 0x0080,
 	.addr = 0x000008,
 	.data = {
@@ -46,7 +46,7 @@ nv84_disp_mast_mthd_dac = {
 };
 
 const struct nv50_disp_mthd_list
-nv84_disp_mast_mthd_head = {
+nv84_disp_core_mthd_head = {
 	.mthd = 0x0400,
 	.addr = 0x000540,
 	.data = {
@@ -98,15 +98,15 @@ nv84_disp_mast_mthd_head = {
 };
 
 const struct nv50_disp_mthd_chan
-nv84_disp_mast_mthd_chan = {
+nv84_disp_core_mthd_chan = {
 	.name = "Core",
 	.addr = 0x000000,
 	.data = {
-		{ "Global", 1, &nv50_disp_mast_mthd_base },
-		{    "DAC", 3, &nv84_disp_mast_mthd_dac  },
-		{    "SOR", 2, &nv50_disp_mast_mthd_sor  },
-		{   "PIOR", 3, &nv50_disp_mast_mthd_pior },
-		{   "HEAD", 2, &nv84_disp_mast_mthd_head },
+		{ "Global", 1, &nv50_disp_core_mthd_base },
+		{    "DAC", 3, &nv84_disp_core_mthd_dac  },
+		{    "SOR", 2, &nv50_disp_core_mthd_sor  },
+		{   "PIOR", 3, &nv50_disp_core_mthd_pior },
+		{   "HEAD", 2, &nv84_disp_core_mthd_head },
 		{}
 	}
 };
@@ -116,7 +116,7 @@ nv84_disp_mast_mthd_chan = {
  ******************************************************************************/
 
 static const struct nv50_disp_mthd_list
-nv84_disp_sync_mthd_base = {
+nv84_disp_base_mthd_base = {
 	.mthd = 0x0000,
 	.addr = 0x000000,
 	.data = {
@@ -146,12 +146,12 @@ nv84_disp_sync_mthd_base = {
 };
 
 const struct nv50_disp_mthd_chan
-nv84_disp_sync_mthd_chan = {
+nv84_disp_base_mthd_chan = {
 	.name = "Base",
 	.addr = 0x000540,
 	.data = {
-		{ "Global", 1, &nv84_disp_sync_mthd_base },
-		{  "Image", 2, &nv50_disp_sync_mthd_image },
+		{ "Global", 1, &nv84_disp_base_mthd_base },
+		{  "Image", 2, &nv50_disp_base_mthd_image },
 		{}
 	}
 };
@@ -204,8 +204,8 @@ nv84_disp_ovly_mthd_chan = {
 
 static struct nouveau_oclass
 nv84_disp_sclass[] = {
-	{ G82_DISP_CORE_CHANNEL_DMA, &nv50_disp_mast_ofuncs.base },
-	{ G82_DISP_BASE_CHANNEL_DMA, &nv50_disp_sync_ofuncs.base },
+	{ G82_DISP_CORE_CHANNEL_DMA, &nv50_disp_core_ofuncs.base },
+	{ G82_DISP_BASE_CHANNEL_DMA, &nv50_disp_base_ofuncs.base },
 	{ G82_DISP_OVERLAY_CHANNEL_DMA, &nv50_disp_ovly_ofuncs.base },
 	{ G82_DISP_OVERLAY, &nv50_disp_oimm_ofuncs.base },
 	{ G82_DISP_CURSOR, &nv50_disp_curs_ofuncs.base },
@@ -213,8 +213,8 @@ nv84_disp_sclass[] = {
 };
 
 static struct nouveau_oclass
-nv84_disp_base_oclass[] = {
-	{ G82_DISP, &nv50_disp_base_ofuncs },
+nv84_disp_main_oclass[] = {
+	{ G82_DISP, &nv50_disp_main_ofuncs },
 	{}
 };
 
@@ -240,7 +240,7 @@ nv84_disp_ctor(struct nouveau_object *parent, struct nouveau_object *engine,
 	if (ret)
 		return ret;
 
-	nv_engine(priv)->sclass = nv84_disp_base_oclass;
+	nv_engine(priv)->sclass = nv84_disp_main_oclass;
 	nv_engine(priv)->cclass = &nv50_disp_cclass;
 	nv_subdev(priv)->intr = nv50_disp_intr;
 	INIT_WORK(&priv->supervisor, nv50_disp_intr_supervisor);
@@ -268,9 +268,9 @@ nv84_disp_oclass = &(struct nv50_disp_impl) {
 	},
 	.base.vblank = &nv50_disp_vblank_func,
 	.base.outp =  nv50_disp_outp_sclass,
-	.mthd.core = &nv84_disp_mast_mthd_chan,
-	.mthd.base = &nv84_disp_sync_mthd_chan,
+	.mthd.core = &nv84_disp_core_mthd_chan,
+	.mthd.base = &nv84_disp_base_mthd_chan,
 	.mthd.ovly = &nv84_disp_ovly_mthd_chan,
 	.mthd.prev = 0x000004,
-	.head.scanoutpos = nv50_disp_base_scanoutpos,
+	.head.scanoutpos = nv50_disp_main_scanoutpos,
 }.base.base;
diff --git a/drivers/gpu/drm/nouveau/core/engine/disp/nv94.c b/drivers/gpu/drm/nouveau/core/engine/disp/nv94.c
index a117064002b1..2bb7ac5cd0e6 100644
--- a/drivers/gpu/drm/nouveau/core/engine/disp/nv94.c
+++ b/drivers/gpu/drm/nouveau/core/engine/disp/nv94.c
@@ -34,7 +34,7 @@
  ******************************************************************************/
 
 const struct nv50_disp_mthd_list
-nv94_disp_mast_mthd_sor = {
+nv94_disp_core_mthd_sor = {
 	.mthd = 0x0040,
 	.addr = 0x000008,
 	.data = {
@@ -44,15 +44,15 @@ nv94_disp_mast_mthd_sor = {
 };
 
 const struct nv50_disp_mthd_chan
-nv94_disp_mast_mthd_chan = {
+nv94_disp_core_mthd_chan = {
 	.name = "Core",
 	.addr = 0x000000,
 	.data = {
-		{ "Global", 1, &nv50_disp_mast_mthd_base },
-		{    "DAC", 3, &nv84_disp_mast_mthd_dac  },
-		{    "SOR", 4, &nv94_disp_mast_mthd_sor  },
-		{   "PIOR", 3, &nv50_disp_mast_mthd_pior },
-		{   "HEAD", 2, &nv84_disp_mast_mthd_head },
+		{ "Global", 1, &nv50_disp_core_mthd_base },
+		{    "DAC", 3, &nv84_disp_core_mthd_dac  },
+		{    "SOR", 4, &nv94_disp_core_mthd_sor  },
+		{   "PIOR", 3, &nv50_disp_core_mthd_pior },
+		{   "HEAD", 2, &nv84_disp_core_mthd_head },
 		{}
 	}
 };
@@ -63,8 +63,8 @@ nv94_disp_mast_mthd_chan = {
 
 static struct nouveau_oclass
 nv94_disp_sclass[] = {
-	{ GT206_DISP_CORE_CHANNEL_DMA, &nv50_disp_mast_ofuncs.base },
-	{ GT200_DISP_BASE_CHANNEL_DMA, &nv50_disp_sync_ofuncs.base },
+	{ GT206_DISP_CORE_CHANNEL_DMA, &nv50_disp_core_ofuncs.base },
+	{ GT200_DISP_BASE_CHANNEL_DMA, &nv50_disp_base_ofuncs.base },
 	{ GT200_DISP_OVERLAY_CHANNEL_DMA, &nv50_disp_ovly_ofuncs.base },
 	{ G82_DISP_OVERLAY, &nv50_disp_oimm_ofuncs.base },
 	{ G82_DISP_CURSOR, &nv50_disp_curs_ofuncs.base },
@@ -72,8 +72,8 @@ nv94_disp_sclass[] = {
 };
 
 static struct nouveau_oclass
-nv94_disp_base_oclass[] = {
-	{ GT206_DISP, &nv50_disp_base_ofuncs },
+nv94_disp_main_oclass[] = {
+	{ GT206_DISP, &nv50_disp_main_ofuncs },
 	{}
 };
 
@@ -99,7 +99,7 @@ nv94_disp_ctor(struct nouveau_object *parent, struct nouveau_object *engine,
 	if (ret)
 		return ret;
 
-	nv_engine(priv)->sclass = nv94_disp_base_oclass;
+	nv_engine(priv)->sclass = nv94_disp_main_oclass;
 	nv_engine(priv)->cclass = &nv50_disp_cclass;
 	nv_subdev(priv)->intr = nv50_disp_intr;
 	INIT_WORK(&priv->supervisor, nv50_disp_intr_supervisor);
@@ -134,9 +134,9 @@ nv94_disp_oclass = &(struct nv50_disp_impl) {
 	},
 	.base.vblank = &nv50_disp_vblank_func,
 	.base.outp =  nv94_disp_outp_sclass,
-	.mthd.core = &nv94_disp_mast_mthd_chan,
-	.mthd.base = &nv84_disp_sync_mthd_chan,
+	.mthd.core = &nv94_disp_core_mthd_chan,
+	.mthd.base = &nv84_disp_base_mthd_chan,
 	.mthd.ovly = &nv84_disp_ovly_mthd_chan,
 	.mthd.prev = 0x000004,
-	.head.scanoutpos = nv50_disp_base_scanoutpos,
+	.head.scanoutpos = nv50_disp_main_scanoutpos,
 }.base.base;
diff --git a/drivers/gpu/drm/nouveau/core/engine/disp/nva0.c b/drivers/gpu/drm/nouveau/core/engine/disp/nva0.c
index c67e68aadd45..b32456c9494f 100644
--- a/drivers/gpu/drm/nouveau/core/engine/disp/nva0.c
+++ b/drivers/gpu/drm/nouveau/core/engine/disp/nva0.c
@@ -80,8 +80,8 @@ nva0_disp_ovly_mthd_chan = {
 
 static struct nouveau_oclass
 nva0_disp_sclass[] = {
-	{ GT200_DISP_CORE_CHANNEL_DMA, &nv50_disp_mast_ofuncs.base },
-	{ GT200_DISP_BASE_CHANNEL_DMA, &nv50_disp_sync_ofuncs.base },
+	{ GT200_DISP_CORE_CHANNEL_DMA, &nv50_disp_core_ofuncs.base },
+	{ GT200_DISP_BASE_CHANNEL_DMA, &nv50_disp_base_ofuncs.base },
 	{ GT200_DISP_OVERLAY_CHANNEL_DMA, &nv50_disp_ovly_ofuncs.base },
 	{ G82_DISP_OVERLAY, &nv50_disp_oimm_ofuncs.base },
 	{ G82_DISP_CURSOR, &nv50_disp_curs_ofuncs.base },
@@ -89,8 +89,8 @@ nva0_disp_sclass[] = {
 };
 
 static struct nouveau_oclass
-nva0_disp_base_oclass[] = {
-	{ GT200_DISP, &nv50_disp_base_ofuncs },
+nva0_disp_main_oclass[] = {
+	{ GT200_DISP, &nv50_disp_main_ofuncs },
 	{}
 };
 
@@ -116,7 +116,7 @@ nva0_disp_ctor(struct nouveau_object *parent, struct nouveau_object *engine,
 	if (ret)
 		return ret;
 
-	nv_engine(priv)->sclass = nva0_disp_base_oclass;
+	nv_engine(priv)->sclass = nva0_disp_main_oclass;
 	nv_engine(priv)->cclass = &nv50_disp_cclass;
 	nv_subdev(priv)->intr = nv50_disp_intr;
 	INIT_WORK(&priv->supervisor, nv50_disp_intr_supervisor);
@@ -144,9 +144,9 @@ nva0_disp_oclass = &(struct nv50_disp_impl) {
 	},
 	.base.vblank = &nv50_disp_vblank_func,
 	.base.outp =  nv50_disp_outp_sclass,
-	.mthd.core = &nv84_disp_mast_mthd_chan,
-	.mthd.base = &nv84_disp_sync_mthd_chan,
+	.mthd.core = &nv84_disp_core_mthd_chan,
+	.mthd.base = &nv84_disp_base_mthd_chan,
 	.mthd.ovly = &nva0_disp_ovly_mthd_chan,
 	.mthd.prev = 0x000004,
-	.head.scanoutpos = nv50_disp_base_scanoutpos,
+	.head.scanoutpos = nv50_disp_main_scanoutpos,
 }.base.base;
diff --git a/drivers/gpu/drm/nouveau/core/engine/disp/nva3.c b/drivers/gpu/drm/nouveau/core/engine/disp/nva3.c
index 22969f355aae..951d79f9b781 100644
--- a/drivers/gpu/drm/nouveau/core/engine/disp/nva3.c
+++ b/drivers/gpu/drm/nouveau/core/engine/disp/nva3.c
@@ -35,8 +35,8 @@
 
 static struct nouveau_oclass
 nva3_disp_sclass[] = {
-	{ GT214_DISP_CORE_CHANNEL_DMA, &nv50_disp_mast_ofuncs.base },
-	{ GT214_DISP_BASE_CHANNEL_DMA, &nv50_disp_sync_ofuncs.base },
+	{ GT214_DISP_CORE_CHANNEL_DMA, &nv50_disp_core_ofuncs.base },
+	{ GT214_DISP_BASE_CHANNEL_DMA, &nv50_disp_base_ofuncs.base },
 	{ GT214_DISP_OVERLAY_CHANNEL_DMA, &nv50_disp_ovly_ofuncs.base },
 	{ GT214_DISP_OVERLAY, &nv50_disp_oimm_ofuncs.base },
 	{ GT214_DISP_CURSOR, &nv50_disp_curs_ofuncs.base },
@@ -44,8 +44,8 @@ nva3_disp_sclass[] = {
 };
 
 static struct nouveau_oclass
-nva3_disp_base_oclass[] = {
-	{ GT214_DISP, &nv50_disp_base_ofuncs },
+nva3_disp_main_oclass[] = {
+	{ GT214_DISP, &nv50_disp_main_ofuncs },
 	{}
 };
 
@@ -71,7 +71,7 @@ nva3_disp_ctor(struct nouveau_object *parent, struct nouveau_object *engine,
 	if (ret)
 		return ret;
 
-	nv_engine(priv)->sclass = nva3_disp_base_oclass;
+	nv_engine(priv)->sclass = nva3_disp_main_oclass;
 	nv_engine(priv)->cclass = &nv50_disp_cclass;
 	nv_subdev(priv)->intr = nv50_disp_intr;
 	INIT_WORK(&priv->supervisor, nv50_disp_intr_supervisor);
@@ -100,9 +100,9 @@ nva3_disp_oclass = &(struct nv50_disp_impl) {
 	},
 	.base.vblank = &nv50_disp_vblank_func,
 	.base.outp =  nv94_disp_outp_sclass,
-	.mthd.core = &nv94_disp_mast_mthd_chan,
-	.mthd.base = &nv84_disp_sync_mthd_chan,
+	.mthd.core = &nv94_disp_core_mthd_chan,
+	.mthd.base = &nv84_disp_base_mthd_chan,
 	.mthd.ovly = &nv84_disp_ovly_mthd_chan,
 	.mthd.prev = 0x000004,
-	.head.scanoutpos = nv50_disp_base_scanoutpos,
+	.head.scanoutpos = nv50_disp_main_scanoutpos,
 }.base.base;
diff --git a/drivers/gpu/drm/nouveau/core/engine/disp/nvd0.c b/drivers/gpu/drm/nouveau/core/engine/disp/nvd0.c
index 747e64bb9c06..181a2d57e356 100644
--- a/drivers/gpu/drm/nouveau/core/engine/disp/nvd0.c
+++ b/drivers/gpu/drm/nouveau/core/engine/disp/nvd0.c
@@ -51,12 +51,14 @@ nvd0_disp_chan_uevent_fini(struct nvkm_event *event, int type, int index)
 {
 	struct nv50_disp_priv *priv = container_of(event, typeof(*priv), uevent);
 	nv_mask(priv, 0x610090, 0x00000001 << index, 0x00000000 << index);
+	nv_wr32(priv, 0x61008c, 0x00000001 << index);
 }
 
 static void
 nvd0_disp_chan_uevent_init(struct nvkm_event *event, int types, int index)
 {
 	struct nv50_disp_priv *priv = container_of(event, typeof(*priv), uevent);
+	nv_wr32(priv, 0x61008c, 0x00000001 << index);
 	nv_mask(priv, 0x610090, 0x00000001 << index, 0x00000001 << index);
 }
 
@@ -151,7 +153,7 @@ nvd0_disp_dmac_fini(struct nouveau_object *object, bool suspend)
  ******************************************************************************/
 
 const struct nv50_disp_mthd_list
-nvd0_disp_mast_mthd_base = {
+nvd0_disp_core_mthd_base = {
 	.mthd = 0x0000,
 	.addr = 0x000000,
 	.data = {
@@ -164,7 +166,7 @@ nvd0_disp_mast_mthd_base = {
 };
 
 const struct nv50_disp_mthd_list
-nvd0_disp_mast_mthd_dac = {
+nvd0_disp_core_mthd_dac = {
 	.mthd = 0x0020,
 	.addr = 0x000020,
 	.data = {
@@ -177,7 +179,7 @@ nvd0_disp_mast_mthd_dac = {
 };
 
 const struct nv50_disp_mthd_list
-nvd0_disp_mast_mthd_sor = {
+nvd0_disp_core_mthd_sor = {
 	.mthd = 0x0020,
 	.addr = 0x000020,
 	.data = {
@@ -190,7 +192,7 @@ nvd0_disp_mast_mthd_sor = {
 };
 
 const struct nv50_disp_mthd_list
-nvd0_disp_mast_mthd_pior = {
+nvd0_disp_core_mthd_pior = {
 	.mthd = 0x0020,
 	.addr = 0x000020,
 	.data = {
@@ -203,7 +205,7 @@ nvd0_disp_mast_mthd_pior = {
 };
 
 static const struct nv50_disp_mthd_list
-nvd0_disp_mast_mthd_head = {
+nvd0_disp_core_mthd_head = {
 	.mthd = 0x0300,
 	.addr = 0x000300,
 	.data = {
@@ -277,21 +279,21 @@ nvd0_disp_mast_mthd_head = {
 };
 
 static const struct nv50_disp_mthd_chan
-nvd0_disp_mast_mthd_chan = {
+nvd0_disp_core_mthd_chan = {
 	.name = "Core",
 	.addr = 0x000000,
 	.data = {
-		{ "Global", 1, &nvd0_disp_mast_mthd_base },
-		{    "DAC", 3, &nvd0_disp_mast_mthd_dac  },
-		{    "SOR", 8, &nvd0_disp_mast_mthd_sor  },
-		{   "PIOR", 4, &nvd0_disp_mast_mthd_pior },
-		{   "HEAD", 4, &nvd0_disp_mast_mthd_head },
+		{ "Global", 1, &nvd0_disp_core_mthd_base },
+		{    "DAC", 3, &nvd0_disp_core_mthd_dac  },
+		{    "SOR", 8, &nvd0_disp_core_mthd_sor  },
+		{   "PIOR", 4, &nvd0_disp_core_mthd_pior },
+		{   "HEAD", 4, &nvd0_disp_core_mthd_head },
 		{}
 	}
 };
 
 static int
-nvd0_disp_mast_init(struct nouveau_object *object)
+nvd0_disp_core_init(struct nouveau_object *object)
 {
 	struct nv50_disp_priv *priv = (void *)object->engine;
 	struct nv50_disp_dmac *mast = (void *)object;
@@ -322,7 +324,7 @@ nvd0_disp_mast_init(struct nouveau_object *object)
 }
 
 static int
-nvd0_disp_mast_fini(struct nouveau_object *object, bool suspend)
+nvd0_disp_core_fini(struct nouveau_object *object, bool suspend)
 {
 	struct nv50_disp_priv *priv = (void *)object->engine;
 	struct nv50_disp_dmac *mast = (void *)object;
@@ -344,11 +346,11 @@ nvd0_disp_mast_fini(struct nouveau_object *object, bool suspend)
 }
 
 struct nv50_disp_chan_impl
-nvd0_disp_mast_ofuncs = {
-	.base.ctor = nv50_disp_mast_ctor,
+nvd0_disp_core_ofuncs = {
+	.base.ctor = nv50_disp_core_ctor,
 	.base.dtor = nv50_disp_dmac_dtor,
-	.base.init = nvd0_disp_mast_init,
-	.base.fini = nvd0_disp_mast_fini,
+	.base.init = nvd0_disp_core_init,
+	.base.fini = nvd0_disp_core_fini,
 	.base.ntfy = nv50_disp_chan_ntfy,
 	.base.map  = nv50_disp_chan_map,
 	.base.rd32 = nv50_disp_chan_rd32,
@@ -363,7 +365,7 @@ nvd0_disp_mast_ofuncs = {
  ******************************************************************************/
 
 static const struct nv50_disp_mthd_list
-nvd0_disp_sync_mthd_base = {
+nvd0_disp_base_mthd_base = {
 	.mthd = 0x0000,
 	.addr = 0x000000,
 	.data = {
@@ -413,7 +415,7 @@ nvd0_disp_sync_mthd_base = {
 };
 
 static const struct nv50_disp_mthd_list
-nvd0_disp_sync_mthd_image = {
+nvd0_disp_base_mthd_image = {
 	.mthd = 0x0400,
 	.addr = 0x000400,
 	.data = {
@@ -427,19 +429,19 @@ nvd0_disp_sync_mthd_image = {
 };
 
 const struct nv50_disp_mthd_chan
-nvd0_disp_sync_mthd_chan = {
+nvd0_disp_base_mthd_chan = {
 	.name = "Base",
 	.addr = 0x001000,
 	.data = {
-		{ "Global", 1, &nvd0_disp_sync_mthd_base },
-		{  "Image", 2, &nvd0_disp_sync_mthd_image },
+		{ "Global", 1, &nvd0_disp_base_mthd_base },
+		{  "Image", 2, &nvd0_disp_base_mthd_image },
 		{}
 	}
 };
 
 struct nv50_disp_chan_impl
-nvd0_disp_sync_ofuncs = {
-	.base.ctor = nv50_disp_sync_ctor,
+nvd0_disp_base_ofuncs = {
+	.base.ctor = nv50_disp_base_ctor,
 	.base.dtor = nv50_disp_dmac_dtor,
 	.base.init = nvd0_disp_dmac_init,
 	.base.fini = nvd0_disp_dmac_fini,
@@ -624,7 +626,7 @@ nvd0_disp_curs_ofuncs = {
  ******************************************************************************/
 
 int
-nvd0_disp_base_scanoutpos(NV50_DISP_MTHD_V0)
+nvd0_disp_main_scanoutpos(NV50_DISP_MTHD_V0)
 {
 	const u32 total  = nv_rd32(priv, 0x640414 + (head * 0x300));
 	const u32 blanke = nv_rd32(priv, 0x64041c + (head * 0x300));
@@ -656,7 +658,7 @@ nvd0_disp_base_scanoutpos(NV50_DISP_MTHD_V0)
 }
 
 static int
-nvd0_disp_base_init(struct nouveau_object *object)
+nvd0_disp_main_init(struct nouveau_object *object)
 {
 	struct nv50_disp_priv *priv = (void *)object->engine;
 	struct nv50_disp_base *base = (void *)object;
@@ -725,7 +727,7 @@ nvd0_disp_base_init(struct nouveau_object *object)
 }
 
 static int
-nvd0_disp_base_fini(struct nouveau_object *object, bool suspend)
+nvd0_disp_main_fini(struct nouveau_object *object, bool suspend)
 {
 	struct nv50_disp_priv *priv = (void *)object->engine;
 	struct nv50_disp_base *base = (void *)object;
@@ -737,25 +739,25 @@ nvd0_disp_base_fini(struct nouveau_object *object, bool suspend)
 }
 
 struct nouveau_ofuncs
-nvd0_disp_base_ofuncs = {
-	.ctor = nv50_disp_base_ctor,
-	.dtor = nv50_disp_base_dtor,
-	.init = nvd0_disp_base_init,
-	.fini = nvd0_disp_base_fini,
-	.mthd = nv50_disp_base_mthd,
+nvd0_disp_main_ofuncs = {
+	.ctor = nv50_disp_main_ctor,
+	.dtor = nv50_disp_main_dtor,
+	.init = nvd0_disp_main_init,
+	.fini = nvd0_disp_main_fini,
+	.mthd = nv50_disp_main_mthd,
 	.ntfy = nouveau_disp_ntfy,
 };
 
 static struct nouveau_oclass
-nvd0_disp_base_oclass[] = {
-	{ GF110_DISP, &nvd0_disp_base_ofuncs },
+nvd0_disp_main_oclass[] = {
+	{ GF110_DISP, &nvd0_disp_main_ofuncs },
 	{}
 };
 
 static struct nouveau_oclass
 nvd0_disp_sclass[] = {
-	{ GF110_DISP_CORE_CHANNEL_DMA, &nvd0_disp_mast_ofuncs.base },
-	{ GF110_DISP_BASE_CHANNEL_DMA, &nvd0_disp_sync_ofuncs.base },
+	{ GF110_DISP_CORE_CHANNEL_DMA, &nvd0_disp_core_ofuncs.base },
+	{ GF110_DISP_BASE_CHANNEL_DMA, &nvd0_disp_base_ofuncs.base },
 	{ GF110_DISP_OVERLAY_CONTROL_DMA, &nvd0_disp_ovly_ofuncs.base },
 	{ GF110_DISP_OVERLAY, &nvd0_disp_oimm_ofuncs.base },
 	{ GF110_DISP_CURSOR, &nvd0_disp_curs_ofuncs.base },
@@ -1055,6 +1057,9 @@ nvd0_disp_intr_unk2_2(struct nv50_disp_priv *priv, int head)
 
 		if (nvkm_output_dp_train(outp, pclk, true))
 			ERR("link not trained before attach\n");
+	} else {
+		if (priv->sor.magic)
+			priv->sor.magic(outp);
 	}
 
 	exec_clkcmp(priv, head, 0, pclk, &conf);
@@ -1063,10 +1068,18 @@ nvd0_disp_intr_unk2_2(struct nv50_disp_priv *priv, int head)
 		addr = 0x612280 + (ffs(outp->info.or) - 1) * 0x800;
 		data = 0x00000000;
 	} else {
-		if (outp->info.type == DCB_OUTPUT_DP)
-			nvd0_disp_intr_unk2_2_tu(priv, head, &outp->info);
 		addr = 0x612300 + (ffs(outp->info.or) - 1) * 0x800;
 		data = (conf & 0x0100) ? 0x00000101 : 0x00000000;
+		switch (outp->info.type) {
+		case DCB_OUTPUT_TMDS:
+			nv_mask(priv, addr, 0x007c0000, 0x00280000);
+			break;
+		case DCB_OUTPUT_DP:
+			nvd0_disp_intr_unk2_2_tu(priv, head, &outp->info);
+			break;
+		default:
+			break;
+		}
 	}
 
 	nv_mask(priv, addr, 0x00000707, data);
@@ -1259,7 +1272,7 @@ nvd0_disp_ctor(struct nouveau_object *parent, struct nouveau_object *engine,
 	if (ret)
 		return ret;
 
-	nv_engine(priv)->sclass = nvd0_disp_base_oclass;
+	nv_engine(priv)->sclass = nvd0_disp_main_oclass;
 	nv_engine(priv)->cclass = &nv50_disp_cclass;
 	nv_subdev(priv)->intr = nvd0_disp_intr;
 	INIT_WORK(&priv->supervisor, nvd0_disp_intr_supervisor);
@@ -1292,9 +1305,9 @@ nvd0_disp_oclass = &(struct nv50_disp_impl) {
 	},
 	.base.vblank = &nvd0_disp_vblank_func,
 	.base.outp =  nvd0_disp_outp_sclass,
-	.mthd.core = &nvd0_disp_mast_mthd_chan,
-	.mthd.base = &nvd0_disp_sync_mthd_chan,
+	.mthd.core = &nvd0_disp_core_mthd_chan,
+	.mthd.base = &nvd0_disp_base_mthd_chan,
 	.mthd.ovly = &nvd0_disp_ovly_mthd_chan,
 	.mthd.prev = -0x020000,
-	.head.scanoutpos = nvd0_disp_base_scanoutpos,
+	.head.scanoutpos = nvd0_disp_main_scanoutpos,
 }.base.base;
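Reviewer note (not part of the patch): the nvd0_disp_intr_unk2_2() hunk above computes a per-output register address as a base plus (ffs(or) - 1) * 0x800. A minimal sketch of that arithmetic, outside the kernel, is given here. The helper name or_ctrl_addr() and the example function are hypothetical; only the bases 0x612300 and 0x612280 and the 0x800 stride are taken from the diff.

```c
#include <assert.h>
#include <stdint.h>
#include <strings.h>	/* ffs() (POSIX) */

/*
 * Hypothetical helper: address of the control register for the output
 * resource selected by or_mask, using the lowest set bit as the index.
 * or_mask must be non-zero; ffs(0) returns 0 and the index would wrap.
 */
uint32_t or_ctrl_addr(uint32_t base, int or_mask)
{
	return base + (uint32_t)(ffs(or_mask) - 1) * 0x800;
}

/* Worked values for the two bases that appear in the hunk. */
void or_ctrl_addr_example(void)
{
	assert(or_ctrl_addr(0x612300, 0x1) == 0x612300);
	assert(or_ctrl_addr(0x612300, 0x2) == 0x612b00);
	assert(or_ctrl_addr(0x612300, 0x4) == 0x613300);
	assert(or_ctrl_addr(0x612280, 0x2) == 0x612a80);
}
```

Because addr is now assigned before the switch on outp->info.type, the TMDS case can apply its nv_mask() to the same register that the final nv_mask(priv, addr, 0x00000707, data) updates.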
diff --git a/drivers/gpu/drm/nouveau/core/engine/disp/nve0.c b/drivers/gpu/drm/nouveau/core/engine/disp/nve0.c
index db144b2cf06b..55debec7e68f 100644
--- a/drivers/gpu/drm/nouveau/core/engine/disp/nve0.c
+++ b/drivers/gpu/drm/nouveau/core/engine/disp/nve0.c
@@ -34,7 +34,7 @@
  ******************************************************************************/
 
 static const struct nv50_disp_mthd_list
-nve0_disp_mast_mthd_head = {
+nve0_disp_core_mthd_head = {
 	.mthd = 0x0300,
 	.addr = 0x000300,
 	.data = {
@@ -113,15 +113,15 @@ nve0_disp_mast_mthd_head = {
 };
 
 const struct nv50_disp_mthd_chan
-nve0_disp_mast_mthd_chan = {
+nve0_disp_core_mthd_chan = {
 	.name = "Core",
 	.addr = 0x000000,
 	.data = {
-		{ "Global", 1, &nvd0_disp_mast_mthd_base },
-		{    "DAC", 3, &nvd0_disp_mast_mthd_dac  },
-		{    "SOR", 8, &nvd0_disp_mast_mthd_sor  },
-		{   "PIOR", 4, &nvd0_disp_mast_mthd_pior },
-		{   "HEAD", 4, &nve0_disp_mast_mthd_head },
+		{ "Global", 1, &nvd0_disp_core_mthd_base },
+		{    "DAC", 3, &nvd0_disp_core_mthd_dac  },
+		{    "SOR", 8, &nvd0_disp_core_mthd_sor  },
+		{   "PIOR", 4, &nvd0_disp_core_mthd_pior },
+		{   "HEAD", 4, &nve0_disp_core_mthd_head },
 		{}
 	}
 };
@@ -200,8 +200,8 @@ nve0_disp_ovly_mthd_chan = {
 
 static struct nouveau_oclass
 nve0_disp_sclass[] = {
-	{ GK104_DISP_CORE_CHANNEL_DMA, &nvd0_disp_mast_ofuncs.base },
-	{ GK104_DISP_BASE_CHANNEL_DMA, &nvd0_disp_sync_ofuncs.base },
+	{ GK104_DISP_CORE_CHANNEL_DMA, &nvd0_disp_core_ofuncs.base },
+	{ GK104_DISP_BASE_CHANNEL_DMA, &nvd0_disp_base_ofuncs.base },
 	{ GK104_DISP_OVERLAY_CONTROL_DMA, &nvd0_disp_ovly_ofuncs.base },
 	{ GK104_DISP_OVERLAY, &nvd0_disp_oimm_ofuncs.base },
 	{ GK104_DISP_CURSOR, &nvd0_disp_curs_ofuncs.base },
@@ -209,8 +209,8 @@ nve0_disp_sclass[] = {
 };
 
 static struct nouveau_oclass
-nve0_disp_base_oclass[] = {
-	{ GK104_DISP, &nvd0_disp_base_ofuncs },
+nve0_disp_main_oclass[] = {
+	{ GK104_DISP, &nvd0_disp_main_ofuncs },
 	{}
 };
 
@@ -237,7 +237,7 @@ nve0_disp_ctor(struct nouveau_object *parent, struct nouveau_object *engine,
 	if (ret)
 		return ret;
 
-	nv_engine(priv)->sclass = nve0_disp_base_oclass;
+	nv_engine(priv)->sclass = nve0_disp_main_oclass;
 	nv_engine(priv)->cclass = &nv50_disp_cclass;
 	nv_subdev(priv)->intr = nvd0_disp_intr;
 	INIT_WORK(&priv->supervisor, nvd0_disp_intr_supervisor);
@@ -264,9 +264,9 @@ nve0_disp_oclass = &(struct nv50_disp_impl) {
 	},
 	.base.vblank = &nvd0_disp_vblank_func,
 	.base.outp =  nvd0_disp_outp_sclass,
-	.mthd.core = &nve0_disp_mast_mthd_chan,
-	.mthd.base = &nvd0_disp_sync_mthd_chan,
+	.mthd.core = &nve0_disp_core_mthd_chan,
+	.mthd.base = &nvd0_disp_base_mthd_chan,
 	.mthd.ovly = &nve0_disp_ovly_mthd_chan,
 	.mthd.prev = -0x020000,
-	.head.scanoutpos = nvd0_disp_base_scanoutpos,
+	.head.scanoutpos = nvd0_disp_main_scanoutpos,
 }.base.base;
diff --git a/drivers/gpu/drm/nouveau/core/engine/disp/nvf0.c b/drivers/gpu/drm/nouveau/core/engine/disp/nvf0.c
index 402d7d67d806..3e7e2d28744c 100644
--- a/drivers/gpu/drm/nouveau/core/engine/disp/nvf0.c
+++ b/drivers/gpu/drm/nouveau/core/engine/disp/nvf0.c
@@ -35,8 +35,8 @@
 
 static struct nouveau_oclass
 nvf0_disp_sclass[] = {
-	{ GK110_DISP_CORE_CHANNEL_DMA, &nvd0_disp_mast_ofuncs.base },
-	{ GK110_DISP_BASE_CHANNEL_DMA, &nvd0_disp_sync_ofuncs.base },
+	{ GK110_DISP_CORE_CHANNEL_DMA, &nvd0_disp_core_ofuncs.base },
+	{ GK110_DISP_BASE_CHANNEL_DMA, &nvd0_disp_base_ofuncs.base },
 	{ GK104_DISP_OVERLAY_CONTROL_DMA, &nvd0_disp_ovly_ofuncs.base },
 	{ GK104_DISP_OVERLAY, &nvd0_disp_oimm_ofuncs.base },
 	{ GK104_DISP_CURSOR, &nvd0_disp_curs_ofuncs.base },
@@ -44,8 +44,8 @@ nvf0_disp_sclass[] = {
 };
 
 static struct nouveau_oclass
-nvf0_disp_base_oclass[] = {
-	{ GK110_DISP, &nvd0_disp_base_ofuncs },
+nvf0_disp_main_oclass[] = {
+	{ GK110_DISP, &nvd0_disp_main_ofuncs },
 	{}
 };
 
@@ -72,7 +72,7 @@ nvf0_disp_ctor(struct nouveau_object *parent, struct nouveau_object *engine,
 	if (ret)
 		return ret;
 
-	nv_engine(priv)->sclass = nvf0_disp_base_oclass;
+	nv_engine(priv)->sclass = nvf0_disp_main_oclass;
 	nv_engine(priv)->cclass = &nv50_disp_cclass;
 	nv_subdev(priv)->intr = nvd0_disp_intr;
 	INIT_WORK(&priv->supervisor, nvd0_disp_intr_supervisor);
@@ -99,9 +99,9 @@ nvf0_disp_oclass = &(struct nv50_disp_impl) {
 	},
 	.base.vblank = &nvd0_disp_vblank_func,
 	.base.outp =  nvd0_disp_outp_sclass,
-	.mthd.core = &nve0_disp_mast_mthd_chan,
-	.mthd.base = &nvd0_disp_sync_mthd_chan,
+	.mthd.core = &nve0_disp_core_mthd_chan,
+	.mthd.base = &nvd0_disp_base_mthd_chan,
 	.mthd.ovly = &nve0_disp_ovly_mthd_chan,
 	.mthd.prev = -0x020000,
-	.head.scanoutpos = nvd0_disp_base_scanoutpos,
+	.head.scanoutpos = nvd0_disp_main_scanoutpos,
 }.base.base;
diff --git a/drivers/gpu/drm/nouveau/core/engine/disp/outp.c b/drivers/gpu/drm/nouveau/core/engine/disp/outp.c
index a5ff00a9cedc..bbd9b6fdc90f 100644
--- a/drivers/gpu/drm/nouveau/core/engine/disp/outp.c
+++ b/drivers/gpu/drm/nouveau/core/engine/disp/outp.c
@@ -85,7 +85,10 @@ nvkm_output_create_(struct nouveau_object *parent,
 	    dcbE->sorconf.link : 0, dcbE->connector, dcbE->i2c_index,
 	    dcbE->bus, dcbE->heads);
 
-	outp->port = i2c->find(i2c, outp->info.i2c_index);
+	if (outp->info.type != DCB_OUTPUT_DP)
+		outp->port = i2c->find(i2c, NV_I2C_PORT(outp->info.i2c_index));
+	else
+		outp->port = i2c->find(i2c, NV_I2C_AUX(outp->info.i2c_index));
 	outp->edid = outp->port;
 
 	data = nvbios_connEp(bios, outp->info.connector, &ver, &hdr, &connE);
diff --git a/drivers/gpu/drm/nouveau/core/engine/disp/sorgm204.c b/drivers/gpu/drm/nouveau/core/engine/disp/sorgm204.c
new file mode 100644
index 000000000000..0b4fad39e9a6
--- /dev/null
+++ b/drivers/gpu/drm/nouveau/core/engine/disp/sorgm204.c
@@ -0,0 +1,144 @@
+/*
+ * Copyright 2012 Red Hat Inc.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR
+ * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
+ * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
+ * OTHER DEALINGS IN THE SOFTWARE.
+ *
+ * Authors: Ben Skeggs
+ */
+
+#include <core/os.h>
+
+#include <subdev/bios.h>
+#include <subdev/bios/dcb.h>
+#include <subdev/bios/dp.h>
+#include <subdev/bios/init.h>
+#include <subdev/timer.h>
+
+#include "nv50.h"
+
+static inline u32
+gm204_sor_soff(struct nvkm_output_dp *outp)
+{
+	return (ffs(outp->base.info.or) - 1) * 0x800;
+}
+
+static inline u32
+gm204_sor_loff(struct nvkm_output_dp *outp)
+{
+	return gm204_sor_soff(outp) + !(outp->base.info.sorconf.link & 1) * 0x80;
+}
+
+void
+gm204_sor_magic(struct nvkm_output *outp)
+{
+	struct nv50_disp_priv *priv = (void *)nouveau_disp(outp);
+	const u32 soff = outp->or * 0x100;
+	const u32 data = outp->or + 1;
+	if (outp->info.sorconf.link & 1)
+		nv_mask(priv, 0x612308 + soff, 0x0000001f, 0x00000000 | data);
+	if (outp->info.sorconf.link & 2)
+		nv_mask(priv, 0x612388 + soff, 0x0000001f, 0x00000010 | data);
+}
+
+static inline u32
+gm204_sor_dp_lane_map(struct nv50_disp_priv *priv, u8 lane)
+{
+	return lane * 0x08;
+}
+
+static int
+gm204_sor_dp_pattern(struct nvkm_output_dp *outp, int pattern)
+{
+	struct nv50_disp_priv *priv = (void *)nouveau_disp(outp);
+	const u32 soff = gm204_sor_soff(outp);
+	const u32 data = 0x01010101 * pattern;
+	if (outp->base.info.sorconf.link & 1)
+		nv_mask(priv, 0x61c110 + soff, 0x0f0f0f0f, data);
+	else
+		nv_mask(priv, 0x61c12c + soff, 0x0f0f0f0f, data);
+	return 0;
+}
+
+static int
+gm204_sor_dp_lnk_pwr(struct nvkm_output_dp *outp, int nr)
+{
+	struct nv50_disp_priv *priv = (void *)nouveau_disp(outp);
+	const u32 soff = gm204_sor_soff(outp);
+	const u32 loff = gm204_sor_loff(outp);
+	u32 mask = 0, i;
+
+	for (i = 0; i < nr; i++)
+		mask |= 1 << (gm204_sor_dp_lane_map(priv, i) >> 3);
+
+	nv_mask(priv, 0x61c130 + loff, 0x0000000f, mask);
+	nv_mask(priv, 0x61c034 + soff, 0x80000000, 0x80000000);
+	nv_wait(priv, 0x61c034 + soff, 0x80000000, 0x00000000);
+	return 0;
+}
+
+static int
+gm204_sor_dp_drv_ctl(struct nvkm_output_dp *outp, int ln, int vs, int pe, int pc)
+{
+	struct nv50_disp_priv *priv = (void *)nouveau_disp(outp);
+	struct nouveau_bios *bios = nouveau_bios(priv);
+	const u32 shift = gm204_sor_dp_lane_map(priv, ln);
+	const u32 loff = gm204_sor_loff(outp);
+	u32 addr, data[4];
+	u8  ver, hdr, cnt, len;
+	struct nvbios_dpout info;
+	struct nvbios_dpcfg ocfg;
+
+	addr = nvbios_dpout_match(bios, outp->base.info.hasht,
+					outp->base.info.hashm,
+				 &ver, &hdr, &cnt, &len, &info);
+	if (!addr)
+		return -ENODEV;
+
+	addr = nvbios_dpcfg_match(bios, addr, pc, vs, pe,
+				 &ver, &hdr, &cnt, &len, &ocfg);
+	if (!addr)
+		return -EINVAL;
+
+	data[0] = nv_rd32(priv, 0x61c118 + loff) & ~(0x000000ff << shift);
+	data[1] = nv_rd32(priv, 0x61c120 + loff) & ~(0x000000ff << shift);
+	data[2] = nv_rd32(priv, 0x61c130 + loff);
+	if ((data[2] & 0x0000ff00) < (ocfg.tx_pu << 8) || ln == 0)
+		data[2] = (data[2] & ~0x0000ff00) | (ocfg.tx_pu << 8);
+	nv_wr32(priv, 0x61c118 + loff, data[0] | (ocfg.dc << shift));
+	nv_wr32(priv, 0x61c120 + loff, data[1] | (ocfg.pe << shift));
+	nv_wr32(priv, 0x61c130 + loff, data[2] | (ocfg.tx_pu << 8));
+	data[3] = nv_rd32(priv, 0x61c13c + loff) & ~(0x000000ff << shift);
+	nv_wr32(priv, 0x61c13c + loff, data[3] | (ocfg.pc << shift));
+	return 0;
+}
+
+struct nvkm_output_dp_impl
+gm204_sor_dp_impl = {
+	.base.base.handle = DCB_OUTPUT_DP,
+	.base.base.ofuncs = &(struct nouveau_ofuncs) {
+		.ctor = _nvkm_output_dp_ctor,
+		.dtor = _nvkm_output_dp_dtor,
+		.init = _nvkm_output_dp_init,
+		.fini = _nvkm_output_dp_fini,
+	},
+	.pattern = gm204_sor_dp_pattern,
+	.lnk_pwr = gm204_sor_dp_lnk_pwr,
+	.lnk_ctl = nvd0_sor_dp_lnk_ctl,
+	.drv_ctl = gm204_sor_dp_drv_ctl,
+};
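Reviewer note (not part of the patch): gm204_sor_dp_lnk_pwr() above builds a lane-enable mask from gm204_sor_dp_lane_map() and applies it with nv_mask(). The sketch below models that in user space. It assumes nv_mask() is a read-modify-write that clears the masked bits, ORs in the data and returns the previous value; the register array and all function names here are hypothetical stand-ins, not driver API.

```c
#include <assert.h>
#include <stdint.h>

/* Simulated register file standing in for MMIO (assumption). */
static uint32_t regs[4];

/* Assumed nv_mask() semantics: clear 'mask' bits, OR in 'data', return old. */
uint32_t reg_mask(unsigned idx, uint32_t mask, uint32_t data)
{
	uint32_t old = regs[idx];

	regs[idx] = (old & ~mask) | data;
	return old;
}

/* Same mapping as gm204_sor_dp_lane_map(): lane n lives at bit offset n * 8. */
uint32_t lane_map(unsigned lane)
{
	return lane * 0x08;
}

/* One enable bit per active lane, as in the loop in gm204_sor_dp_lnk_pwr(). */
uint32_t lane_power_mask(unsigned nr)
{
	uint32_t mask = 0;
	unsigned i;

	for (i = 0; i < nr; i++)
		mask |= 1u << (lane_map(i) >> 3);
	return mask;
}

/* Only the low nibble changes; the upper bits of the register survive. */
void lane_power_example(void)
{
	regs[0] = 0xabcd0005;
	reg_mask(0, 0x0000000f, lane_power_mask(2));
	assert(regs[0] == 0xabcd0003);
}
```

With the identity lane map used on GM204, the mask for nr lanes reduces to (1 << nr) - 1; the loop form is kept because it follows the driver code.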
diff --git a/drivers/gpu/drm/nouveau/core/engine/disp/sornvd0.c b/drivers/gpu/drm/nouveau/core/engine/disp/sornvd0.c
index 7b7bbc3e459e..fdab2939070c 100644
--- a/drivers/gpu/drm/nouveau/core/engine/disp/sornvd0.c
+++ b/drivers/gpu/drm/nouveau/core/engine/disp/sornvd0.c
@@ -60,7 +60,7 @@ nvd0_sor_dp_pattern(struct nvkm_output_dp *outp, int pattern)
 	return 0;
 }
 
-static int
+int
 nvd0_sor_dp_lnk_ctl(struct nvkm_output_dp *outp, int nr, int bw, bool ef)
 {
 	struct nv50_disp_priv *priv = (void *)nouveau_disp(outp);
diff --git a/drivers/gpu/drm/nouveau/core/engine/dmaobj/nvd0.c b/drivers/gpu/drm/nouveau/core/engine/dmaobj/nvd0.c
index 3fc4f0b0eaca..19f5f6522962 100644
--- a/drivers/gpu/drm/nouveau/core/engine/dmaobj/nvd0.c
+++ b/drivers/gpu/drm/nouveau/core/engine/dmaobj/nvd0.c
@@ -51,6 +51,7 @@ nvd0_dmaobj_bind(struct nouveau_dmaobj *dmaobj,
 		case GK104_DISP_CORE_CHANNEL_DMA:
 		case GK110_DISP_CORE_CHANNEL_DMA:
 		case GM107_DISP_CORE_CHANNEL_DMA:
+		case GM204_DISP_CORE_CHANNEL_DMA:
 		case GF110_DISP_BASE_CHANNEL_DMA:
 		case GK104_DISP_BASE_CHANNEL_DMA:
 		case GK110_DISP_BASE_CHANNEL_DMA:
diff --git a/drivers/gpu/drm/nouveau/core/engine/fifo/nve0.c b/drivers/gpu/drm/nouveau/core/engine/fifo/nve0.c
index f8734eb74eaa..6a8db7c80bd1 100644
--- a/drivers/gpu/drm/nouveau/core/engine/fifo/nve0.c
+++ b/drivers/gpu/drm/nouveau/core/engine/fifo/nve0.c
@@ -792,7 +792,7 @@ nve0_fifo_intr_fault(struct nve0_fifo_priv *priv, int unit)
 	nouveau_engctx_put(engctx);
 }
 
-static const struct nouveau_bitfield nve0_fifo_pbdma_intr[] = {
+static const struct nouveau_bitfield nve0_fifo_pbdma_intr_0[] = {
 	{ 0x00000001, "MEMREQ" },
 	{ 0x00000002, "MEMACK_TIMEOUT" },
 	{ 0x00000004, "MEMACK_EXTRA" },
@@ -827,9 +827,10 @@ static const struct nouveau_bitfield nve0_fifo_pbdma_intr[] = {
 };
 
 static void
-nve0_fifo_intr_pbdma(struct nve0_fifo_priv *priv, int unit)
+nve0_fifo_intr_pbdma_0(struct nve0_fifo_priv *priv, int unit)
 {
-	u32 stat = nv_rd32(priv, 0x040108 + (unit * 0x2000));
+	u32 mask = nv_rd32(priv, 0x04010c + (unit * 0x2000));
+	u32 stat = nv_rd32(priv, 0x040108 + (unit * 0x2000)) & mask;
 	u32 addr = nv_rd32(priv, 0x0400c0 + (unit * 0x2000));
 	u32 data = nv_rd32(priv, 0x0400c4 + (unit * 0x2000));
 	u32 chid = nv_rd32(priv, 0x040120 + (unit * 0x2000)) & 0xfff;
@@ -840,11 +841,12 @@ nve0_fifo_intr_pbdma(struct nve0_fifo_priv *priv, int unit)
 	if (stat & 0x00800000) {
 		if (!nve0_fifo_swmthd(priv, chid, mthd, data))
 			show &= ~0x00800000;
+		nv_wr32(priv, 0x0400c0 + (unit * 0x2000), 0x80600008);
 	}
 
 	if (show) {
 		nv_error(priv, "PBDMA%d:", unit);
-		nouveau_bitfield_print(nve0_fifo_pbdma_intr, show);
+		nouveau_bitfield_print(nve0_fifo_pbdma_intr_0, show);
 		pr_cont("\n");
 		nv_error(priv,
 			 "PBDMA%d: ch %d [%s] subc %d mthd 0x%04x data 0x%08x\n",
@@ -853,10 +855,37 @@ nve0_fifo_intr_pbdma(struct nve0_fifo_priv *priv, int unit)
 			 subc, mthd, data);
 	}
 
-	nv_wr32(priv, 0x0400c0 + (unit * 0x2000), 0x80600008);
 	nv_wr32(priv, 0x040108 + (unit * 0x2000), stat);
 }
 
+static const struct nouveau_bitfield nve0_fifo_pbdma_intr_1[] = {
+	{ 0x00000001, "HCE_RE_ILLEGAL_OP" },
+	{ 0x00000002, "HCE_RE_ALIGNB" },
+	{ 0x00000004, "HCE_PRIV" },
+	{ 0x00000008, "HCE_ILLEGAL_MTHD" },
+	{ 0x00000010, "HCE_ILLEGAL_CLASS" },
+	{}
+};
+
+static void
+nve0_fifo_intr_pbdma_1(struct nve0_fifo_priv *priv, int unit)
+{
+	u32 mask = nv_rd32(priv, 0x04014c + (unit * 0x2000));
+	u32 stat = nv_rd32(priv, 0x040148 + (unit * 0x2000)) & mask;
+	u32 chid = nv_rd32(priv, 0x040120 + (unit * 0x2000)) & 0xfff;
+
+	if (stat) {
+		nv_error(priv, "PBDMA%d:", unit);
+		nouveau_bitfield_print(nve0_fifo_pbdma_intr_1, stat);
+		pr_cont("\n");
+		nv_error(priv, "PBDMA%d: ch %d %08x %08x\n", unit, chid,
+			 nv_rd32(priv, 0x040150 + (unit * 0x2000)),
+			 nv_rd32(priv, 0x040154 + (unit * 0x2000)));
+	}
+
+	nv_wr32(priv, 0x040148 + (unit * 0x2000), stat);
+}
+
 static void
 nve0_fifo_intr_runlist(struct nve0_fifo_priv *priv)
 {
@@ -939,7 +968,8 @@ nve0_fifo_intr(struct nouveau_subdev *subdev)
 		u32 mask = nv_rd32(priv, 0x0025a0);
 		while (mask) {
 			u32 unit = __ffs(mask);
-			nve0_fifo_intr_pbdma(priv, unit);
+			nve0_fifo_intr_pbdma_0(priv, unit);
+			nve0_fifo_intr_pbdma_1(priv, unit);
 			nv_wr32(priv, 0x0025a0, (1 << unit));
 			mask &= ~(1 << unit);
 		}
@@ -1022,6 +1052,12 @@ nve0_fifo_init(struct nouveau_object *object)
 		nv_wr32(priv, 0x04010c + (i * 0x2000), 0xfffffeff); /* INTREN */
 	}
 
+	/* PBDMA[n].HCE */
+	for (i = 0; i < priv->spoon_nr; i++) {
+		nv_wr32(priv, 0x040148 + (i * 0x2000), 0xffffffff); /* INTR */
+		nv_wr32(priv, 0x04014c + (i * 0x2000), 0xffffffff); /* INTREN */
+	}
+
 	nv_wr32(priv, 0x002254, 0x10000000 | priv->user.bar.offset >> 12);
 
 	nv_wr32(priv, 0x002100, 0xffffffff);
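Reviewer note (not part of the patch): both nve0_fifo_intr_pbdma_0() and nve0_fifo_intr_pbdma_1() above now AND the raw status with the enable register before handling it, then write the filtered value back. A small model of that pattern follows. It assumes the status register is write-one-to-clear, which the diff does not state; the struct and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Simulated PBDMA interrupt registers (stand-in for MMIO, assumption). */
struct pbdma_sim {
	uint32_t intr;		/* raw status, e.g. 0x040108 + unit * 0x2000 */
	uint32_t intr_en;	/* enable mask, e.g. 0x04010c + unit * 0x2000 */
};

/* Only bits that are both raised and enabled are reported. */
uint32_t pbdma_pending(const struct pbdma_sim *p)
{
	return p->intr & p->intr_en;
}

/* Acknowledge exactly the bits that were handled (assumed W1C behaviour). */
void pbdma_ack(struct pbdma_sim *p, uint32_t stat)
{
	p->intr &= ~stat;
}

void pbdma_example(void)
{
	/* 0xfffffeff is the INTREN value written in nve0_fifo_init(). */
	struct pbdma_sim p = { .intr = 0x00000101, .intr_en = 0xfffffeff };
	uint32_t stat = pbdma_pending(&p);

	assert(stat == 0x00000001);	/* bit 8 is disabled, so it is ignored */
	pbdma_ack(&p, stat);
	assert(p.intr == 0x00000100);	/* the disabled bit is left untouched */
}
```

Filtering first means a disabled source is neither reported through nv_error() nor acknowledged by the handler.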
diff --git a/drivers/gpu/drm/nouveau/core/engine/graph/nvc0.c b/drivers/gpu/drm/nouveau/core/engine/graph/nvc0.c
index 30fd1dc64f93..17251e4b9e86 100644
--- a/drivers/gpu/drm/nouveau/core/engine/graph/nvc0.c
+++ b/drivers/gpu/drm/nouveau/core/engine/graph/nvc0.c
@@ -1557,7 +1557,7 @@ nvc0_graph_ctor(struct nouveau_object *parent, struct nouveau_object *engine,
 		    nvc0_graph_ctor_fw(priv, "fuc409d", &priv->fuc409d) ||
 		    nvc0_graph_ctor_fw(priv, "fuc41ac", &priv->fuc41ac) ||
 		    nvc0_graph_ctor_fw(priv, "fuc41ad", &priv->fuc41ad))
-			return -EINVAL;
+			return -ENODEV;
 		priv->firmware = true;
 	}
 
diff --git a/drivers/gpu/drm/nouveau/core/include/core/device.h b/drivers/gpu/drm/nouveau/core/include/core/device.h
index 1d9d893929bb..2ec2e50d3676 100644
--- a/drivers/gpu/drm/nouveau/core/include/core/device.h
+++ b/drivers/gpu/drm/nouveau/core/include/core/device.h
@@ -16,6 +16,7 @@ enum nv_subdev_type {
 	 * to during POST.
 	 */
 	NVDEV_SUBDEV_DEVINIT,
+	NVDEV_SUBDEV_IBUS,
 	NVDEV_SUBDEV_GPIO,
 	NVDEV_SUBDEV_I2C,
 	NVDEV_SUBDEV_DEVINIT_LAST = NVDEV_SUBDEV_I2C,
@@ -31,7 +32,6 @@ enum nv_subdev_type {
 	NVDEV_SUBDEV_TIMER,
 	NVDEV_SUBDEV_FB,
 	NVDEV_SUBDEV_LTC,
-	NVDEV_SUBDEV_IBUS,
 	NVDEV_SUBDEV_INSTMEM,
 	NVDEV_SUBDEV_VM,
 	NVDEV_SUBDEV_BAR,
@@ -92,6 +92,7 @@ struct nouveau_device {
 		GM100    = 0x110,
 	} card_type;
 	u32 chipset;
+	u8  chiprev;
 	u32 crystal;
 
 	struct nouveau_oclass *oclass[NVDEV_SUBDEV_NR];
@@ -158,6 +159,12 @@ nv_device_is_pci(struct nouveau_device *device)
 	return device->pdev != NULL;
 }
 
+static inline bool
+nv_device_is_cpu_coherent(struct nouveau_device *device)
+{
+	return (!IS_ENABLED(CONFIG_ARM) && nv_device_is_pci(device));
+}
+
 static inline struct device *
 nv_device_base(struct nouveau_device *device)
 {
diff --git a/drivers/gpu/drm/nouveau/core/include/core/handle.h b/drivers/gpu/drm/nouveau/core/include/core/handle.h
index ceb67d770875..d22a59138a9b 100644
--- a/drivers/gpu/drm/nouveau/core/include/core/handle.h
+++ b/drivers/gpu/drm/nouveau/core/include/core/handle.h
@@ -23,11 +23,6 @@ void nouveau_handle_destroy(struct nouveau_handle *);
 int  nouveau_handle_init(struct nouveau_handle *);
 int  nouveau_handle_fini(struct nouveau_handle *, bool suspend);
 
-int  nouveau_handle_new(struct nouveau_object *, u32 parent, u32 handle,
-			u16 oclass, void *data, u32 size,
-			struct nouveau_object **);
-int  nouveau_handle_del(struct nouveau_object *, u32 parent, u32 handle);
-
 struct nouveau_object *
 nouveau_handle_ref(struct nouveau_object *, u32 name);
 
diff --git a/drivers/gpu/drm/nouveau/core/include/core/object.h b/drivers/gpu/drm/nouveau/core/include/core/object.h
index d7039482d6fd..2e2afa502c99 100644
--- a/drivers/gpu/drm/nouveau/core/include/core/object.h
+++ b/drivers/gpu/drm/nouveau/core/include/core/object.h
@@ -203,21 +203,4 @@ nv_memcmp(void *obj, u32 addr, const char *str, u32 len)
 	return 0;
 }
 
-#include <core/handle.h>
-
-static inline int
-nouveau_object_new(struct nouveau_object *client, u32 parent, u32 handle,
-		   u16 oclass, void *data, u32 size,
-		   struct nouveau_object **pobject)
-{
-	return nouveau_handle_new(client, parent, handle, oclass,
-				  data, size, pobject);
-}
-
-static inline int
-nouveau_object_del(struct nouveau_object *client, u32 parent, u32 handle)
-{
-	return nouveau_handle_del(client, parent, handle);
-}
-
 #endif
diff --git a/drivers/gpu/drm/nouveau/core/include/engine/disp.h b/drivers/gpu/drm/nouveau/core/include/engine/disp.h
index 7a64f347b385..fc307f1317ff 100644
--- a/drivers/gpu/drm/nouveau/core/include/engine/disp.h
+++ b/drivers/gpu/drm/nouveau/core/include/engine/disp.h
@@ -31,5 +31,6 @@ extern struct nouveau_oclass *nvd0_disp_oclass;
 extern struct nouveau_oclass *nve0_disp_oclass;
 extern struct nouveau_oclass *nvf0_disp_oclass;
 extern struct nouveau_oclass *gm107_disp_oclass;
+extern struct nouveau_oclass *gm204_disp_oclass;
 
 #endif
diff --git a/drivers/gpu/drm/nouveau/core/include/subdev/bios/M0203.h b/drivers/gpu/drm/nouveau/core/include/subdev/bios/M0203.h
new file mode 100644
index 000000000000..1f84d3612dd8
--- /dev/null
+++ b/drivers/gpu/drm/nouveau/core/include/subdev/bios/M0203.h
@@ -0,0 +1,31 @@
+#ifndef __NVBIOS_M0203_H__
+#define __NVBIOS_M0203_H__
+
+struct nvbios_M0203T {
+#define M0203T_TYPE_RAMCFG 0x00
+	u8  type;
+	u16 pointer;
+};
+
+u32 nvbios_M0203Te(struct nouveau_bios *, u8 *ver, u8 *hdr, u8 *cnt, u8 *len);
+u32 nvbios_M0203Tp(struct nouveau_bios *, u8 *ver, u8 *hdr, u8 *cnt, u8 *len,
+		   struct nvbios_M0203T *);
+
+struct nvbios_M0203E {
+#define M0203E_TYPE_DDR2  0x0
+#define M0203E_TYPE_DDR3  0x1
+#define M0203E_TYPE_GDDR3 0x2
+#define M0203E_TYPE_GDDR5 0x3
+#define M0203E_TYPE_SKIP  0xf
+	u8 type;
+	u8 strap;
+	u8 group;
+};
+
+u32 nvbios_M0203Ee(struct nouveau_bios *, int idx, u8 *ver, u8 *hdr);
+u32 nvbios_M0203Ep(struct nouveau_bios *, int idx, u8 *ver, u8 *hdr,
+		   struct nvbios_M0203E *);
+u32 nvbios_M0203Em(struct nouveau_bios *, u8 ramcfg, u8 *ver, u8 *hdr,
+		   struct nvbios_M0203E *);
+
+#endif
diff --git a/drivers/gpu/drm/nouveau/core/include/subdev/bios/i2c.h b/drivers/gpu/drm/nouveau/core/include/subdev/bios/i2c.h
index 10b57a19a7de..c9bb112895af 100644
--- a/drivers/gpu/drm/nouveau/core/include/subdev/bios/i2c.h
+++ b/drivers/gpu/drm/nouveau/core/include/subdev/bios/i2c.h
@@ -4,11 +4,14 @@
 struct nouveau_bios;
 
 enum dcb_i2c_type {
-	DCB_I2C_NV04_BIT = 0,
-	DCB_I2C_NV4E_BIT = 4,
-	DCB_I2C_NVIO_BIT = 5,
-	DCB_I2C_NVIO_AUX = 6,
-	DCB_I2C_UNUSED = 0xff
+	/* matches bios type field prior to ccb 4.1 */
+	DCB_I2C_NV04_BIT = 0x00,
+	DCB_I2C_NV4E_BIT = 0x04,
+	DCB_I2C_NVIO_BIT = 0x05,
+	DCB_I2C_NVIO_AUX = 0x06,
+	/* made up - mostly */
+	DCB_I2C_PMGR     = 0x80,
+	DCB_I2C_UNUSED   = 0xff
 };
 
 struct dcb_i2c_entry {
@@ -16,6 +19,7 @@ struct dcb_i2c_entry {
 	u8 drive;
 	u8 sense;
 	u8 share;
+	u8 auxch;
 };
 
 u16 dcb_i2c_table(struct nouveau_bios *, u8 *ver, u8 *hdr, u8 *cnt, u8 *len);
diff --git a/drivers/gpu/drm/nouveau/core/include/subdev/bios/image.h b/drivers/gpu/drm/nouveau/core/include/subdev/bios/image.h
new file mode 100644
index 000000000000..3348b4580843
--- /dev/null
+++ b/drivers/gpu/drm/nouveau/core/include/subdev/bios/image.h
@@ -0,0 +1,13 @@
+#ifndef __NVBIOS_IMAGE_H__
+#define __NVBIOS_IMAGE_H__
+
+struct nvbios_image {
+	u32  base;
+	u32  size;
+	u8   type;
+	bool last;
+};
+
+bool nvbios_image(struct nouveau_bios *, int, struct nvbios_image *);
+
+#endif
diff --git a/drivers/gpu/drm/nouveau/core/include/subdev/bios/npde.h b/drivers/gpu/drm/nouveau/core/include/subdev/bios/npde.h
new file mode 100644
index 000000000000..b18413d951e5
--- /dev/null
+++ b/drivers/gpu/drm/nouveau/core/include/subdev/bios/npde.h
@@ -0,0 +1,12 @@
+#ifndef __NVBIOS_NPDE_H__
+#define __NVBIOS_NPDE_H__
+
+struct nvbios_npdeT {
+	u32 image_size;
+	bool last;
+};
+
+u32 nvbios_npdeTe(struct nouveau_bios *, u32);
+u32 nvbios_npdeTp(struct nouveau_bios *, u32, struct nvbios_npdeT *);
+
+#endif
diff --git a/drivers/gpu/drm/nouveau/core/include/subdev/bios/pcir.h b/drivers/gpu/drm/nouveau/core/include/subdev/bios/pcir.h
new file mode 100644
index 000000000000..3d634a06dca1
--- /dev/null
+++ b/drivers/gpu/drm/nouveau/core/include/subdev/bios/pcir.h
@@ -0,0 +1,18 @@
+#ifndef __NVBIOS_PCIR_H__
+#define __NVBIOS_PCIR_H__
+
+struct nvbios_pcirT {
+	u16 vendor_id;
+	u16 device_id;
+	u8  class_code[3];
+	u32 image_size;
+	u16 image_rev;
+	u8  image_type;
+	bool last;
+};
+
+u32 nvbios_pcirTe(struct nouveau_bios *, u32, u8 *ver, u16 *hdr);
+u32 nvbios_pcirTp(struct nouveau_bios *, u32, u8 *ver, u16 *hdr,
+		  struct nvbios_pcirT *);
+
+#endif
diff --git a/drivers/gpu/drm/nouveau/core/include/subdev/bios/pmu.h b/drivers/gpu/drm/nouveau/core/include/subdev/bios/pmu.h
new file mode 100644
index 000000000000..9de593deaea8
--- /dev/null
+++ b/drivers/gpu/drm/nouveau/core/include/subdev/bios/pmu.h
@@ -0,0 +1,37 @@
+#ifndef __NVBIOS_PMU_H__
+#define __NVBIOS_PMU_H__
+
+struct nvbios_pmuT {
+};
+
+u32 nvbios_pmuTe(struct nouveau_bios *, u8 *ver, u8 *hdr, u8 *cnt, u8 *len);
+u32 nvbios_pmuTp(struct nouveau_bios *, u8 *ver, u8 *hdr, u8 *cnt, u8 *len,
+		 struct nvbios_pmuT *);
+
+struct nvbios_pmuE {
+	u8  type;
+	u32 data;
+};
+
+u32 nvbios_pmuEe(struct nouveau_bios *, int idx, u8 *ver, u8 *hdr);
+u32 nvbios_pmuEp(struct nouveau_bios *, int idx, u8 *ver, u8 *hdr,
+		 struct nvbios_pmuE *);
+
+struct nvbios_pmuR {
+	u32 boot_addr_pmu;
+	u32 boot_addr;
+	u32 boot_size;
+	u32 code_addr_pmu;
+	u32 code_addr;
+	u32 code_size;
+	u32 init_addr_pmu;
+
+	u32 data_addr_pmu;
+	u32 data_addr;
+	u32 data_size;
+	u32 args_addr_pmu;
+};
+
+bool nvbios_pmuRm(struct nouveau_bios *, u8 type, struct nvbios_pmuR *);
+
+#endif
diff --git a/drivers/gpu/drm/nouveau/core/include/subdev/bios/ramcfg.h b/drivers/gpu/drm/nouveau/core/include/subdev/bios/ramcfg.h
index a685bbd04568..4a0e0ceb41ba 100644
--- a/drivers/gpu/drm/nouveau/core/include/subdev/bios/ramcfg.h
+++ b/drivers/gpu/drm/nouveau/core/include/subdev/bios/ramcfg.h
@@ -43,8 +43,9 @@ struct nvbios_ramcfg {
 			unsigned ramcfg_10_02_08:1;
 			unsigned ramcfg_10_02_10:1;
 			unsigned ramcfg_10_02_20:1;
-			unsigned ramcfg_10_02_40:1;
+			unsigned ramcfg_10_DLLoff:1;
 			unsigned ramcfg_10_03_0f:4;
+			unsigned ramcfg_10_04_01:1;
 			unsigned ramcfg_10_05:8;
 			unsigned ramcfg_10_06:8;
 			unsigned ramcfg_10_07:8;
@@ -95,9 +96,29 @@ struct nvbios_ramcfg {
 	union {
 		struct {
 			unsigned timing_10_WR:8;
+			unsigned timing_10_WTR:8;
 			unsigned timing_10_CL:8;
+			unsigned timing_10_RC:8;
+			/* empty: 4 */
+			unsigned timing_10_RFC:8;        /* Byte 5 */
+			/* empty: 6 */
+			unsigned timing_10_RAS:8;        /* Byte 7 */
+			/* empty: 8 */
+			unsigned timing_10_RP:8;         /* Byte 9 */
+			unsigned timing_10_RCDRD:8;
+			unsigned timing_10_RCDWR:8;
+			unsigned timing_10_RRD:8;
+			unsigned timing_10_13:8;
 			unsigned timing_10_ODT:3;
+			/* empty: 15 */
+			unsigned timing_10_16:8;
+			/* empty: 17 */
+			unsigned timing_10_18:8;
 			unsigned timing_10_CWL:8;
+			unsigned timing_10_20:8;
+			unsigned timing_10_21:8;
+			/* empty: 22, 23 */
+			unsigned timing_10_24:8;
 		};
 		struct {
 			unsigned timing_20_2e_03:2;
diff --git a/drivers/gpu/drm/nouveau/core/include/subdev/devinit.h b/drivers/gpu/drm/nouveau/core/include/subdev/devinit.h
index e292271a84e4..e007a9d44683 100644
--- a/drivers/gpu/drm/nouveau/core/include/subdev/devinit.h
+++ b/drivers/gpu/drm/nouveau/core/include/subdev/devinit.h
@@ -30,5 +30,6 @@ extern struct nouveau_oclass *nva3_devinit_oclass;
 extern struct nouveau_oclass *nvaf_devinit_oclass;
 extern struct nouveau_oclass *nvc0_devinit_oclass;
 extern struct nouveau_oclass *gm107_devinit_oclass;
+extern struct nouveau_oclass *gm204_devinit_oclass;
 
 #endif
diff --git a/drivers/gpu/drm/nouveau/core/include/subdev/i2c.h b/drivers/gpu/drm/nouveau/core/include/subdev/i2c.h
index 1b937c2c25ae..d94ccacb40bf 100644
--- a/drivers/gpu/drm/nouveau/core/include/subdev/i2c.h
+++ b/drivers/gpu/drm/nouveau/core/include/subdev/i2c.h
@@ -8,6 +8,8 @@
 #include <subdev/bios/i2c.h>
 
 #define NV_I2C_PORT(n)    (0x00 + (n))
+#define NV_I2C_AUX(n)     (0x10 + (n))
+#define NV_I2C_EXT(n)     (0x20 + (n))
 #define NV_I2C_DEFAULT(n) (0x80 + (n))
 
 #define NV_I2C_TYPE_DCBI2C(n) (0x0000 | (n))
@@ -89,6 +91,7 @@ extern struct nouveau_oclass *nv94_i2c_oclass;
 extern struct nouveau_oclass *nvd0_i2c_oclass;
 extern struct nouveau_oclass *gf117_i2c_oclass;
 extern struct nouveau_oclass *nve0_i2c_oclass;
+extern struct nouveau_oclass *gm204_i2c_oclass;
 
 static inline int
 nv_rdi2cr(struct nouveau_i2c_port *port, u8 addr, u8 reg)
diff --git a/drivers/gpu/drm/nouveau/core/include/subdev/pwr.h b/drivers/gpu/drm/nouveau/core/include/subdev/pwr.h
index bf3d1f611333..f2427bf5aeed 100644
--- a/drivers/gpu/drm/nouveau/core/include/subdev/pwr.h
+++ b/drivers/gpu/drm/nouveau/core/include/subdev/pwr.h
@@ -48,6 +48,8 @@ void nouveau_memx_wait(struct nouveau_memx *,
 		       u32 addr, u32 mask, u32 data, u32 nsec);
 void nouveau_memx_nsec(struct nouveau_memx *, u32 nsec);
 void nouveau_memx_wait_vblank(struct nouveau_memx *);
+void nouveau_memx_train(struct nouveau_memx *);
+int  nouveau_memx_train_result(struct nouveau_pwr *, u32 *, int);
 void nouveau_memx_block(struct nouveau_memx *);
 void nouveau_memx_unblock(struct nouveau_memx *);
 
diff --git a/drivers/gpu/drm/nouveau/core/include/subdev/volt.h b/drivers/gpu/drm/nouveau/core/include/subdev/volt.h
index 820b62ffd75b..67db5e58880d 100644
--- a/drivers/gpu/drm/nouveau/core/include/subdev/volt.h
+++ b/drivers/gpu/drm/nouveau/core/include/subdev/volt.h
@@ -52,6 +52,7 @@ int  _nouveau_volt_init(struct nouveau_object *);
 #define _nouveau_volt_fini _nouveau_subdev_fini
 
 extern struct nouveau_oclass nv40_volt_oclass;
+extern struct nouveau_oclass gk20a_volt_oclass;
 
 int nouveau_voltgpio_init(struct nouveau_volt *);
 int nouveau_voltgpio_get(struct nouveau_volt *);
diff --git a/drivers/gpu/drm/nouveau/core/os.h b/drivers/gpu/drm/nouveau/core/os.h
index ccfa21d72ddc..bdd05ee7ec72 100644
--- a/drivers/gpu/drm/nouveau/core/os.h
+++ b/drivers/gpu/drm/nouveau/core/os.h
@@ -23,6 +23,7 @@
 #include <linux/pm_runtime.h>
 #include <linux/power_supply.h>
 #include <linux/clk.h>
+#include <linux/regulator/consumer.h>
 
 #include <asm/unaligned.h>
 
diff --git a/drivers/gpu/drm/nouveau/core/subdev/bios/M0203.c b/drivers/gpu/drm/nouveau/core/subdev/bios/M0203.c
new file mode 100644
index 000000000000..28906b16d4e5
--- /dev/null
+++ b/drivers/gpu/drm/nouveau/core/subdev/bios/M0203.c
@@ -0,0 +1,129 @@
+/*
+ * Copyright 2014 Red Hat Inc.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR
+ * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
+ * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
+ * OTHER DEALINGS IN THE SOFTWARE.
+ *
+ * Authors: Ben Skeggs
+ */
+
+#include <subdev/bios.h>
+#include <subdev/bios/bit.h>
+#include <subdev/bios/M0203.h>
+
+u32
+nvbios_M0203Te(struct nouveau_bios *bios, u8 *ver, u8 *hdr, u8 *cnt, u8 *len)
+{
+	struct bit_entry bit_M;
+	u32 data = 0x00000000;
+
+	if (!bit_entry(bios, 'M', &bit_M)) {
+		if (bit_M.version == 2 && bit_M.length > 0x04)
+			data = nv_ro16(bios, bit_M.offset + 0x03);
+		if (data) {
+			*ver = nv_ro08(bios, data + 0x00);
+			switch (*ver) {
+			case 0x10:
+				*hdr = nv_ro08(bios, data + 0x01);
+				*len = nv_ro08(bios, data + 0x02);
+				*cnt = nv_ro08(bios, data + 0x03);
+				return data;
+			default:
+				break;
+			}
+		}
+	}
+
+	return 0x00000000;
+}
+
+u32
+nvbios_M0203Tp(struct nouveau_bios *bios, u8 *ver, u8 *hdr, u8 *cnt, u8 *len,
+	       struct nvbios_M0203T *info)
+{
+	u32 data = nvbios_M0203Te(bios, ver, hdr, cnt, len);
+	memset(info, 0x00, sizeof(*info));
+	switch (!!data * *ver) {
+	case 0x10:
+		info->type    = nv_ro08(bios, data + 0x04);
+		info->pointer = nv_ro16(bios, data + 0x05);
+		break;
+	default:
+		break;
+	}
+	return data;
+}
+
+u32
+nvbios_M0203Ee(struct nouveau_bios *bios, int idx, u8 *ver, u8 *hdr)
+{
+	u8  cnt, len;
+	u32 data = nvbios_M0203Te(bios, ver, hdr, &cnt, &len);
+	if (data && idx < cnt) {
+		data = data + *hdr + idx * len;
+		*hdr = len;
+		return data;
+	}
+	return 0x00000000;
+}
+
+u32
+nvbios_M0203Ep(struct nouveau_bios *bios, int idx, u8 *ver, u8 *hdr,
+	       struct nvbios_M0203E *info)
+{
+	u32 data = nvbios_M0203Ee(bios, idx, ver, hdr);
+	memset(info, 0x00, sizeof(*info));
+	switch (!!data * *ver) {
+	case 0x10:
+		info->type  = (nv_ro08(bios, data + 0x00) & 0x0f) >> 0;
+		info->strap = (nv_ro08(bios, data + 0x00) & 0xf0) >> 4;
+		info->group = (nv_ro08(bios, data + 0x01) & 0x0f) >> 0;
+		return data;
+	default:
+		break;
+	}
+	return 0x00000000;
+}
+
+u32
+nvbios_M0203Em(struct nouveau_bios *bios, u8 ramcfg, u8 *ver, u8 *hdr,
+	       struct nvbios_M0203E *info)
+{
+	struct nvbios_M0203T M0203T;
+	u8  cnt, len, idx = 0xff;
+	u32 data;
+
+	if (!nvbios_M0203Tp(bios, ver, hdr, &cnt, &len, &M0203T)) {
+		nv_warn(bios, "M0203T not found\n");
+		return 0x00000000;
+	}
+
+	while ((data = nvbios_M0203Ep(bios, ++idx, ver, hdr, info))) {
+		switch (M0203T.type) {
+		case M0203T_TYPE_RAMCFG:
+			if (info->strap != ramcfg)
+				continue;
+			return data;
+		default:
+			nv_warn(bios, "M0203T type %02x\n", M0203T.type);
+			return 0x00000000;
+		}
+	}
+
+	return data;
+}
diff --git a/drivers/gpu/drm/nouveau/core/subdev/bios/base.c b/drivers/gpu/drm/nouveau/core/subdev/bios/base.c
index d45704a2c2df..7df3a273553d 100644
--- a/drivers/gpu/drm/nouveau/core/subdev/bios/base.c
+++ b/drivers/gpu/drm/nouveau/core/subdev/bios/base.c
@@ -31,6 +31,8 @@
 #include <subdev/bios/bmp.h>
 #include <subdev/bios/bit.h>
 
+#include "priv.h"
+
 u8
 nvbios_checksum(const u8 *data, int size)
 {
@@ -56,362 +58,21 @@ nvbios_findstr(const u8 *data, int size, const char *str, int len)
 	return 0;
 }
 
-#if defined(__powerpc__)
-static void
-nouveau_bios_shadow_of(struct nouveau_bios *bios)
+int
+nvbios_extend(struct nouveau_bios *bios, u32 length)
 {
-	struct pci_dev *pdev = nv_device(bios)->pdev;
-	struct device_node *dn;
-	const u32 *data;
-	int size;
-
-	dn = pci_device_to_OF_node(pdev);
-	if (!dn) {
-		nv_info(bios, "Unable to get the OF node\n");
-		return;
-	}
-
-	data = of_get_property(dn, "NVDA,BMP", &size);
-	if (data && size) {
-		bios->size = size;
-		bios->data = kmalloc(bios->size, GFP_KERNEL);
-		if (bios->data)
-			memcpy(bios->data, data, size);
-	}
-}
-#endif
-
-static void
-nouveau_bios_shadow_pramin(struct nouveau_bios *bios)
-{
-	struct nouveau_device *device = nv_device(bios);
-	u64 addr = 0;
-	u32 bar0 = 0;
-	int i;
-
-	if (device->card_type >= NV_50) {
-		if (device->card_type >= NV_C0 && device->card_type < GM100) {
-			if (nv_rd32(bios, 0x022500) & 0x00000001)
-				return;
-		} else
-		if (device->card_type >= GM100) {
-			if (nv_rd32(bios, 0x021c04) & 0x00000001)
-				return;
-		}
-
-		addr = nv_rd32(bios, 0x619f04);
-		if (!(addr & 0x00000008)) {
-			nv_debug(bios, "... not enabled\n");
-			return;
+	if (bios->size < length) {
+		u8 *prev = bios->data;
+		if (!(bios->data = kmalloc(length, GFP_KERNEL))) {
+			bios->data = prev;
+			return -ENOMEM;
 		}
-		if ( (addr & 0x00000003) != 1) {
-			nv_debug(bios, "... not in vram\n");
-			return;
-		}
-
-		addr = (addr & 0xffffff00) << 8;
-		if (!addr) {
-			addr  = (u64)nv_rd32(bios, 0x001700) << 16;
-			addr += 0xf0000;
-		}
-
-		bar0 = nv_mask(bios, 0x001700, 0xffffffff, addr >> 16);
-	}
-
-	/* bail if no rom signature */
-	if (nv_rd08(bios, 0x700000) != 0x55 ||
-	    nv_rd08(bios, 0x700001) != 0xaa)
-		goto out;
-
-	bios->size = nv_rd08(bios, 0x700002) * 512;
-	if (!bios->size)
-		goto out;
-
-	bios->data = kmalloc(bios->size, GFP_KERNEL);
-	if (bios->data) {
-		for (i = 0; i < bios->size; i++)
-			nv_wo08(bios, i, nv_rd08(bios, 0x700000 + i));
-	}
-
-out:
-	if (device->card_type >= NV_50)
-		nv_wr32(bios, 0x001700, bar0);
-}
-
-static void
-nouveau_bios_shadow_prom(struct nouveau_bios *bios)
-{
-	struct nouveau_device *device = nv_device(bios);
-	u32 pcireg, access;
-	u16 pcir;
-	int i;
-
-	/* there is no prom on nv4x IGP's */
-	if (device->card_type == NV_40 && device->chipset >= 0x4c)
-		return;
-
-	/* enable access to rom */
-	if (device->card_type >= NV_50)
-		pcireg = 0x088050;
-	else
-		pcireg = 0x001850;
-	access = nv_mask(bios, pcireg, 0x00000001, 0x00000000);
-
-	/* WARNING: PROM accesses should always be 32-bits aligned. Other
-	 * accesses work on most chipset but do not on Kepler chipsets
-	 */
-
-	/* bail if no rom signature, with a workaround for a PROM reading
-	 * issue on some chipsets.  the first read after a period of
-	 * inactivity returns the wrong result, so retry the first header
-	 * byte a few times before giving up as a workaround
-	 */
-	i = 16;
-	do {
-		u32 data = le32_to_cpu(nv_rd32(bios, 0x300000)) & 0xffff;
-		if (data == 0xaa55)
-			break;
-	} while (i--);
-
-	if (!i)
-		goto out;
-
-	/* read entire bios image to system memory */
-	bios->size = (le32_to_cpu(nv_rd32(bios, 0x300000)) >> 16) & 0xff;
-	bios->size = bios->size * 512;
-	if (!bios->size)
-		goto out;
-
-	bios->data = kmalloc(bios->size, GFP_KERNEL);
-	if (!bios->data)
-		goto out;
-
-	for (i = 0; i < bios->size; i += 4)
-		((u32 *)bios->data)[i/4] = nv_rd32(bios, 0x300000 + i);
-
-	/* check the PCI record header */
-	pcir = nv_ro16(bios, 0x0018);
-	if (bios->data[pcir + 0] != 'P' ||
-	    bios->data[pcir + 1] != 'C' ||
-	    bios->data[pcir + 2] != 'I' ||
-	    bios->data[pcir + 3] != 'R') {
-		bios->size = 0;
-		kfree(bios->data);
-	}
-
-out:
-	/* disable access to rom */
-	nv_wr32(bios, pcireg, access);
-}
-
-#if defined(CONFIG_ACPI) && defined(CONFIG_X86)
-int nouveau_acpi_get_bios_chunk(uint8_t *bios, int offset, int len);
-bool nouveau_acpi_rom_supported(struct pci_dev *pdev);
-#else
-static inline bool
-nouveau_acpi_rom_supported(struct pci_dev *pdev) {
-	return false;
-}
-
-static inline int
-nouveau_acpi_get_bios_chunk(uint8_t *bios, int offset, int len) {
-	return -EINVAL;
-}
-#endif
-
-static void
-nouveau_bios_shadow_acpi(struct nouveau_bios *bios)
-{
-	struct pci_dev *pdev = nv_device(bios)->pdev;
-	int ret, cnt, i;
-
-	if (!nouveau_acpi_rom_supported(pdev)) {
-		bios->data = NULL;
-		return;
-	}
-
-	bios->size = 0;
-	bios->data = kmalloc(4096, GFP_KERNEL);
-	if (bios->data) {
-		if (nouveau_acpi_get_bios_chunk(bios->data, 0, 4096) == 4096)
-			bios->size = bios->data[2] * 512;
-		kfree(bios->data);
+		memcpy(bios->data, prev, bios->size);
+		bios->size = length;
+		kfree(prev);
+		return 1;
 	}
-
-	if (!bios->size)
-		return;
-
-	bios->data = kmalloc(bios->size, GFP_KERNEL);
-	if (bios->data) {
-		/* disobey the acpi spec - much faster on at least w530 ... */
-		ret = nouveau_acpi_get_bios_chunk(bios->data, 0, bios->size);
-		if (ret != bios->size ||
-		    nvbios_checksum(bios->data, bios->size)) {
-			/* ... that didn't work, ok, i'll be good now */
-			for (i = 0; i < bios->size; i += cnt) {
-				cnt = min((bios->size - i), (u32)4096);
-				ret = nouveau_acpi_get_bios_chunk(bios->data, i, cnt);
-				if (ret != cnt)
-					break;
-			}
-		}
-	}
-}
-
-static void
-nouveau_bios_shadow_pci(struct nouveau_bios *bios)
-{
-	struct pci_dev *pdev = nv_device(bios)->pdev;
-	size_t size;
-
-	if (!pci_enable_rom(pdev)) {
-		void __iomem *rom = pci_map_rom(pdev, &size);
-		if (rom && size) {
-			bios->data = kmalloc(size, GFP_KERNEL);
-			if (bios->data) {
-				memcpy_fromio(bios->data, rom, size);
-				bios->size = size;
-			}
-		}
-		if (rom)
-			pci_unmap_rom(pdev, rom);
-
-		pci_disable_rom(pdev);
-	}
-}
-
-static void
-nouveau_bios_shadow_platform(struct nouveau_bios *bios)
-{
-	struct pci_dev *pdev = nv_device(bios)->pdev;
-	size_t size;
-
-	void __iomem *rom = pci_platform_rom(pdev, &size);
-	if (rom && size) {
-		bios->data = kmalloc(size, GFP_KERNEL);
-		if (bios->data) {
-			memcpy_fromio(bios->data, rom, size);
-			bios->size = size;
-		}
-	}
-}
-
-static int
-nouveau_bios_score(struct nouveau_bios *bios, const bool writeable)
-{
-	if (bios->size < 3 || !bios->data || bios->data[0] != 0x55 ||
-			bios->data[1] != 0xAA) {
-		nv_info(bios, "... signature not found\n");
-		return 0;
-	}
-
-	if (nvbios_checksum(bios->data,
-			min_t(u32, bios->data[2] * 512, bios->size))) {
-		nv_info(bios, "... checksum invalid\n");
-		/* if a ro image is somewhat bad, it's probably all rubbish */
-		return writeable ? 2 : 1;
-	}
-
-	nv_info(bios, "... appears to be valid\n");
-	return 3;
-}
-
-struct methods {
-	const char desc[16];
-	void (*shadow)(struct nouveau_bios *);
-	const bool rw;
-	int score;
-	u32 size;
-	u8 *data;
-};
-
-static int
-nouveau_bios_shadow(struct nouveau_bios *bios)
-{
-	struct methods shadow_methods[] = {
-#if defined(__powerpc__)
-		{ "OpenFirmware", nouveau_bios_shadow_of, true, 0, 0, NULL },
-#endif
-		{ "PRAMIN", nouveau_bios_shadow_pramin, true, 0, 0, NULL },
-		{ "PROM", nouveau_bios_shadow_prom, false, 0, 0, NULL },
-		{ "ACPI", nouveau_bios_shadow_acpi, true, 0, 0, NULL },
-		{ "PCIROM", nouveau_bios_shadow_pci, true, 0, 0, NULL },
-		{ "PLATFORM", nouveau_bios_shadow_platform, true, 0, 0, NULL },
-		{}
-	};
-	struct methods *mthd, *best;
-	const struct firmware *fw;
-	const char *optarg;
-	int optlen, ret;
-	char *source;
-
-	optarg = nouveau_stropt(nv_device(bios)->cfgopt, "NvBios", &optlen);
-	source = optarg ? kstrndup(optarg, optlen, GFP_KERNEL) : NULL;
-	if (source) {
-		/* try to match one of the built-in methods */
-		mthd = shadow_methods;
-		do {
-			if (strcasecmp(source, mthd->desc))
-				continue;
-			nv_info(bios, "source: %s\n", mthd->desc);
-
-			mthd->shadow(bios);
-			mthd->score = nouveau_bios_score(bios, mthd->rw);
-			if (mthd->score) {
-				kfree(source);
-				return 0;
-			}
-		} while ((++mthd)->shadow);
-
-		/* attempt to load firmware image */
-		ret = request_firmware(&fw, source, &nv_device(bios)->pdev->dev);
-		if (ret == 0) {
-			bios->size = fw->size;
-			bios->data = kmemdup(fw->data, fw->size, GFP_KERNEL);
-			release_firmware(fw);
-
-			nv_info(bios, "image: %s\n", source);
-			if (nouveau_bios_score(bios, 1)) {
-				kfree(source);
-				return 0;
-			}
-
-			kfree(bios->data);
-			bios->data = NULL;
-		}
-
-		nv_error(bios, "source \'%s\' invalid\n", source);
-		kfree(source);
-	}
-
-	mthd = shadow_methods;
-	do {
-		nv_info(bios, "checking %s for image...\n", mthd->desc);
-		mthd->shadow(bios);
-		mthd->score = nouveau_bios_score(bios, mthd->rw);
-		mthd->size = bios->size;
-		mthd->data = bios->data;
-		bios->data = NULL;
-	} while (mthd->score != 3 && (++mthd)->shadow);
-
-	mthd = shadow_methods;
-	best = mthd;
-	do {
-		if (mthd->score > best->score) {
-			kfree(best->data);
-			best = mthd;
-		}
-	} while ((++mthd)->shadow);
-
-	if (best->score) {
-		nv_info(bios, "using image from %s\n", best->desc);
-		bios->size = best->size;
-		bios->data = best->data;
-		return 0;
-	}
-
-	nv_error(bios, "unable to locate usable image\n");
-	return -EINVAL;
+	return 0;
 }
 
 static u8
@@ -472,7 +133,7 @@ nouveau_bios_ctor(struct nouveau_object *parent,
 	if (ret)
 		return ret;
 
-	ret = nouveau_bios_shadow(bios);
+	ret = nvbios_shadow(bios);
 	if (ret)
 		return ret;
 
diff --git a/drivers/gpu/drm/nouveau/core/subdev/bios/dcb.c b/drivers/gpu/drm/nouveau/core/subdev/bios/dcb.c
index bd8d348385b3..96099aff8b41 100644
--- a/drivers/gpu/drm/nouveau/core/subdev/bios/dcb.c
+++ b/drivers/gpu/drm/nouveau/core/subdev/bios/dcb.c
@@ -42,7 +42,7 @@ dcb_table(struct nouveau_bios *bios, u8 *ver, u8 *hdr, u8 *cnt, u8 *len)
 
 	*ver = nv_ro08(bios, dcb);
 
-	if (*ver >= 0x41) {
+	if (*ver >= 0x42) {
 		nv_warn(bios, "DCB version 0x%02x unknown\n", *ver);
 		return 0x0000;
 	} else
@@ -157,17 +157,20 @@ dcb_outp_parse(struct nouveau_bios *bios, u8 idx, u8 *ver, u8 *len,
 					break;
 				}
 
-				switch (conf & 0x0f000000) {
-				case 0x0f000000:
-					outp->dpconf.link_nr = 4;
-					break;
-				case 0x03000000:
-					outp->dpconf.link_nr = 2;
-					break;
-				case 0x01000000:
-				default:
-					outp->dpconf.link_nr = 1;
-					break;
+				outp->dpconf.link_nr = (conf & 0x0f000000) >> 24;
+				if (*ver < 0x41) {
+					switch (outp->dpconf.link_nr) {
+					case 0x0f:
+						outp->dpconf.link_nr = 4;
+						break;
+					case 0x03:
+						outp->dpconf.link_nr = 2;
+						break;
+					case 0x01:
+					default:
+						outp->dpconf.link_nr = 1;
+						break;
+					}
 				}
 
 				/* fall-through... */
diff --git a/drivers/gpu/drm/nouveau/core/subdev/bios/disp.c b/drivers/gpu/drm/nouveau/core/subdev/bios/disp.c
index 7f16e52d9bea..51f355599694 100644
--- a/drivers/gpu/drm/nouveau/core/subdev/bios/disp.c
+++ b/drivers/gpu/drm/nouveau/core/subdev/bios/disp.c
@@ -40,6 +40,7 @@ nvbios_disp_table(struct nouveau_bios *bios,
 				switch (*ver) {
 				case 0x20:
 				case 0x21:
+				case 0x22:
 					*hdr = nv_ro08(bios, data + 0x01);
 					*len = nv_ro08(bios, data + 0x02);
 					*cnt = nv_ro08(bios, data + 0x03);
diff --git a/drivers/gpu/drm/nouveau/core/subdev/bios/dp.c b/drivers/gpu/drm/nouveau/core/subdev/bios/dp.c
index f309dd657250..cef53f81f12b 100644
--- a/drivers/gpu/drm/nouveau/core/subdev/bios/dp.c
+++ b/drivers/gpu/drm/nouveau/core/subdev/bios/dp.c
@@ -41,6 +41,7 @@ nvbios_dp_table(struct nouveau_bios *bios, u8 *ver, u8 *hdr, u8 *cnt, u8 *len)
 				case 0x21:
 				case 0x30:
 				case 0x40:
+				case 0x41:
 					*hdr = nv_ro08(bios, data + 0x01);
 					*len = nv_ro08(bios, data + 0x02);
 					*cnt = nv_ro08(bios, data + 0x03);
@@ -70,6 +71,7 @@ nvbios_dpout_entry(struct nouveau_bios *bios, u8 idx,
 			*cnt = nv_ro08(bios, outp + 0x04);
 			break;
 		case 0x40:
+		case 0x41:
 			*hdr = nv_ro08(bios, data + 0x04);
 			*cnt = 0;
 			*len = 0;
@@ -108,6 +110,7 @@ nvbios_dpout_parse(struct nouveau_bios *bios, u8 idx,
 				info->script[4] = nv_ro16(bios, data + 0x10);
 			break;
 		case 0x40:
+		case 0x41:
 			info->flags     = nv_ro08(bios, data + 0x04);
 			info->script[0] = nv_ro16(bios, data + 0x05);
 			info->script[1] = nv_ro16(bios, data + 0x07);
@@ -172,10 +175,11 @@ nvbios_dpcfg_parse(struct nouveau_bios *bios, u16 outp, u8 idx,
 			break;
 		case 0x30:
 		case 0x40:
+		case 0x41:
 			info->pc    = nv_ro08(bios, data + 0x00);
 			info->dc    = nv_ro08(bios, data + 0x01);
 			info->pe    = nv_ro08(bios, data + 0x02);
-			info->tx_pu = nv_ro08(bios, data + 0x03);
+			info->tx_pu = nv_ro08(bios, data + 0x03) & 0x0f;
 			break;
 		default:
 			data = 0x0000;
@@ -194,6 +198,10 @@ nvbios_dpcfg_match(struct nouveau_bios *bios, u16 outp, u8 pc, u8 vs, u8 pe,
 	u16 data;
 
 	if (*ver >= 0x30) {
+		/*XXX: there's a second set of these on at least 4.1, that
+		 *     i've witnessed nvidia using instead of the first
+		 *     on gm204.  figure out what/why
+		 */
 		const u8 vsoff[] = { 0, 4, 7, 9 };
 		idx = (pc * 10) + vsoff[vs] + pe;
 	} else {
diff --git a/drivers/gpu/drm/nouveau/core/subdev/bios/extdev.c b/drivers/gpu/drm/nouveau/core/subdev/bios/extdev.c
index b2a676e53580..49285d4f7ca5 100644
--- a/drivers/gpu/drm/nouveau/core/subdev/bios/extdev.c
+++ b/drivers/gpu/drm/nouveau/core/subdev/bios/extdev.c
@@ -90,7 +90,7 @@ nvbios_extdev_find(struct nouveau_bios *bios, enum nvbios_extdev_type type,
 	u16 entry;
 
 	i = 0;
-	while (!(entry = nvbios_extdev_entry(bios, i++, &ver, &len))) {
+	while ((entry = nvbios_extdev_entry(bios, i++, &ver, &len))) {
 		extdev_parse_entry(bios, entry, func);
 		if (func->type == type)
 			return 0;
diff --git a/drivers/gpu/drm/nouveau/core/subdev/bios/i2c.c b/drivers/gpu/drm/nouveau/core/subdev/bios/i2c.c
index cfb9288c6d28..282320ba9264 100644
--- a/drivers/gpu/drm/nouveau/core/subdev/bios/i2c.c
+++ b/drivers/gpu/drm/nouveau/core/subdev/bios/i2c.c
@@ -39,6 +39,11 @@ dcb_i2c_table(struct nouveau_bios *bios, u8 *ver, u8 *hdr, u8 *cnt, u8 *len)
 			i2c = nv_ro16(bios, dcb + 4);
 	}
 
+	if (i2c && *ver >= 0x42) {
+		nv_warn(bios, "ccb %02x not supported\n", *ver);
+		return 0x0000;
+	}
+
 	if (i2c && *ver >= 0x30) {
 		*ver = nv_ro08(bios, i2c + 0);
 		*hdr = nv_ro08(bios, i2c + 1);
@@ -70,14 +75,25 @@ dcb_i2c_parse(struct nouveau_bios *bios, u8 idx, struct dcb_i2c_entry *info)
 	u8  ver, len;
 	u16 ent = dcb_i2c_entry(bios, idx, &ver, &len);
 	if (ent) {
-		info->type  = nv_ro08(bios, ent + 3);
-		info->share = DCB_I2C_UNUSED;
-		if (ver < 0x30) {
-			info->type &= 0x07;
+		if (ver >= 0x41) {
+			if (!(nv_ro32(bios, ent) & 0x80000000))
+				info->type = DCB_I2C_UNUSED;
+			else
+				info->type = DCB_I2C_PMGR;
+		} else
+		if (ver >= 0x30) {
+			info->type = nv_ro08(bios, ent + 0x03);
+		} else {
+			info->type = nv_ro08(bios, ent + 0x03) & 0x07;
 			if (info->type == 0x07)
 				info->type = DCB_I2C_UNUSED;
 		}
 
+		info->drive = DCB_I2C_UNUSED;
+		info->sense = DCB_I2C_UNUSED;
+		info->share = DCB_I2C_UNUSED;
+		info->auxch = DCB_I2C_UNUSED;
+
 		switch (info->type) {
 		case DCB_I2C_NV04_BIT:
 			info->drive = nv_ro08(bios, ent + 0);
@@ -87,12 +103,23 @@ dcb_i2c_parse(struct nouveau_bios *bios, u8 idx, struct dcb_i2c_entry *info)
 			info->drive = nv_ro08(bios, ent + 1);
 			return 0;
 		case DCB_I2C_NVIO_BIT:
-		case DCB_I2C_NVIO_AUX:
 			info->drive = nv_ro08(bios, ent + 0) & 0x0f;
-			if (nv_ro08(bios, ent + 1) & 0x01) {
-				info->share  = nv_ro08(bios, ent + 1) >> 1;
-				info->share &= 0x0f;
-			}
+			if (nv_ro08(bios, ent + 1) & 0x01)
+				info->share = nv_ro08(bios, ent + 1) >> 1;
+			return 0;
+		case DCB_I2C_NVIO_AUX:
+			info->auxch = nv_ro08(bios, ent + 0) & 0x0f;
+			if (nv_ro08(bios, ent + 1) & 0x01)
+				info->share = info->auxch;
+			return 0;
+		case DCB_I2C_PMGR:
+			info->drive = (nv_ro16(bios, ent + 0) & 0x01f) >> 0;
+			if (info->drive == 0x1f)
+				info->drive = DCB_I2C_UNUSED;
+			info->auxch = (nv_ro16(bios, ent + 0) & 0x3e0) >> 5;
+			if (info->auxch == 0x1f)
+				info->auxch = DCB_I2C_UNUSED;
+			info->share = info->auxch;
 			return 0;
 		case DCB_I2C_UNUSED:
 			return 0;
diff --git a/drivers/gpu/drm/nouveau/core/subdev/bios/image.c b/drivers/gpu/drm/nouveau/core/subdev/bios/image.c
new file mode 100644
index 000000000000..373f9a564ac9
--- /dev/null
+++ b/drivers/gpu/drm/nouveau/core/subdev/bios/image.c
@@ -0,0 +1,78 @@
+/*
+ * Copyright 2014 Red Hat Inc.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR
+ * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
+ * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
+ * OTHER DEALINGS IN THE SOFTWARE.
+ *
+ * Authors: Ben Skeggs <bskeggs@redhat.com>
+ */
+
+#include <subdev/bios.h>
+#include <subdev/bios/image.h>
+#include <subdev/bios/pcir.h>
+#include <subdev/bios/npde.h>
+
+static bool
+nvbios_imagen(struct nouveau_bios *bios, struct nvbios_image *image)
+{
+	struct nvbios_pcirT pcir;
+	struct nvbios_npdeT npde;
+	u8  ver;
+	u16 hdr;
+	u32 data;
+
+	switch ((data = nv_ro16(bios, image->base + 0x00))) {
+	case 0xaa55:
+	case 0xbb77:
+	case 0x4e56: /* NV */
+		break;
+	default:
+		nv_debug(bios, "%08x: ROM signature (%04x) unknown\n",
+			 image->base, data);
+		return false;
+	}
+
+	if (!(data = nvbios_pcirTp(bios, image->base, &ver, &hdr, &pcir)))
+		return false;
+	image->size = pcir.image_size;
+	image->type = pcir.image_type;
+	image->last = pcir.last;
+
+	if (image->type != 0x70) {
+		if (!(data = nvbios_npdeTp(bios, image->base, &npde)))
+			return true;
+		image->size = npde.image_size;
+		image->last = npde.last;
+	} else {
+		image->last = true;
+	}
+
+	return true;
+}
+
+bool
+nvbios_image(struct nouveau_bios *bios, int idx, struct nvbios_image *image)
+{
+	memset(image, 0x00, sizeof(*image));
+	do {
+		image->base += image->size;
+		if (image->last || !nvbios_imagen(bios, image))
+			return false;
+	} while (idx--);
+	return true;
+}
diff --git a/drivers/gpu/drm/nouveau/core/subdev/bios/init.c b/drivers/gpu/drm/nouveau/core/subdev/bios/init.c
index 626380f9e4c0..c6579ef32cd1 100644
--- a/drivers/gpu/drm/nouveau/core/subdev/bios/init.c
+++ b/drivers/gpu/drm/nouveau/core/subdev/bios/init.c
@@ -255,6 +255,8 @@ init_i2c(struct nvbios_init *init, int index)
 		}
 
 		index = init->outp->i2c_index;
+		if (init->outp->type == DCB_OUTPUT_DP)
+			index += NV_I2C_AUX(0);
 	}
 
 	return i2c->find(i2c, index);
@@ -278,7 +280,7 @@ init_wri2cr(struct nvbios_init *init, u8 index, u8 addr, u8 reg, u8 val)
 	return -ENODEV;
 }
 
-static int
+static u8
 init_rdauxr(struct nvbios_init *init, u32 addr)
 {
 	struct nouveau_i2c_port *port = init_i2c(init, -2);
@@ -286,20 +288,24 @@ init_rdauxr(struct nvbios_init *init, u32 addr)
 
 	if (port && init_exec(init)) {
 		int ret = nv_rdaux(port, addr, &data, 1);
-		if (ret)
-			return ret;
-		return data;
+		if (ret == 0)
+			return data;
+		trace("auxch read failed with %d\n", ret);
 	}
 
-	return -ENODEV;
+	return 0x00;
 }
 
 static int
 init_wrauxr(struct nvbios_init *init, u32 addr, u8 data)
 {
 	struct nouveau_i2c_port *port = init_i2c(init, -2);
-	if (port && init_exec(init))
-		return nv_wraux(port, addr, &data, 1);
+	if (port && init_exec(init)) {
+		int ret = nv_wraux(port, addr, &data, 1);
+		if (ret)
+			trace("auxch write failed with %d\n", ret);
+		return ret;
+	}
 	return -ENODEV;
 }
 
@@ -838,6 +844,40 @@ init_io_or(struct nvbios_init *init)
 }
 
 /**
+ * INIT_ANDN_REG - opcode 0x47
+ *
+ * Clears the bits given by 'mask' in the register: R[reg] &= ~mask
+static void
+init_andn_reg(struct nvbios_init *init)
+{
+	struct nouveau_bios *bios = init->bios;
+	u32  reg = nv_ro32(bios, init->offset + 1);
+	u32 mask = nv_ro32(bios, init->offset + 5);
+
+	trace("ANDN_REG\tR[0x%06x] &= ~0x%08x\n", reg, mask);
+	init->offset += 9;
+
+	init_mask(init, reg, mask, 0);
+}
+
+/**
+ * INIT_OR_REG - opcode 0x48
+ *
+ * Sets the bits given by 'mask' in the register: R[reg] |= mask
+static void
+init_or_reg(struct nvbios_init *init)
+{
+	struct nouveau_bios *bios = init->bios;
+	u32  reg = nv_ro32(bios, init->offset + 1);
+	u32 mask = nv_ro32(bios, init->offset + 5);
+
+	trace("OR_REG\tR[0x%06x] |= 0x%08x\n", reg, mask);
+	init->offset += 9;
+
+	init_mask(init, reg, 0, mask);
+}
+
+/**
  * INIT_INDEX_ADDRESS_LATCHED - opcode 0x49
  *
  */
@@ -2068,6 +2108,8 @@ static struct nvbios_init_opcode {
 	[0x3a] = { init_dp_condition },
 	[0x3b] = { init_io_mask_or },
 	[0x3c] = { init_io_or },
+	[0x47] = { init_andn_reg },
+	[0x48] = { init_or_reg },
 	[0x49] = { init_idx_addr_latched },
 	[0x4a] = { init_io_restrict_pll2 },
 	[0x4b] = { init_pll2 },
diff --git a/drivers/gpu/drm/nouveau/core/subdev/bios/npde.c b/drivers/gpu/drm/nouveau/core/subdev/bios/npde.c
new file mode 100644
index 000000000000..d694716a166c
--- /dev/null
+++ b/drivers/gpu/drm/nouveau/core/subdev/bios/npde.c
@@ -0,0 +1,59 @@
+/*
+ * Copyright 2014 Red Hat Inc.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR
+ * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
+ * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
+ * OTHER DEALINGS IN THE SOFTWARE.
+ *
+ * Authors: Ben Skeggs <bskeggs@redhat.com>
+ */
+
+#include <subdev/bios.h>
+#include <subdev/bios/npde.h>
+#include <subdev/bios/pcir.h>
+
+u32
+nvbios_npdeTe(struct nouveau_bios *bios, u32 base)
+{
+	struct nvbios_pcirT pcir;
+	u8  ver; u16 hdr;
+	u32 data = nvbios_pcirTp(bios, base, &ver, &hdr, &pcir);
+	if (data = (data + hdr + 0x0f) & ~0x0f, data) {
+		switch (nv_ro32(bios, data + 0x00)) {
+		case 0x4544504e: /* NPDE */
+			break;
+		default:
+			nv_debug(bios, "%08x: NPDE signature (%08x) unknown\n",
+				 data, nv_ro32(bios, data + 0x00));
+			data = 0;
+			break;
+		}
+	}
+	return data;
+}
+
+u32
+nvbios_npdeTp(struct nouveau_bios *bios, u32 base, struct nvbios_npdeT *info)
+{
+	u32 data = nvbios_npdeTe(bios, base);
+	memset(info, 0x00, sizeof(*info));
+	if (data) {
+		info->image_size = nv_ro16(bios, data + 0x08) * 512;
+		info->last = nv_ro08(bios, data + 0x0a) & 0x80;
+	}
+	return data;
+}
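
Note on the arithmetic in nvbios_npdeTe() above: the NPDE block is looked up at the first 16-byte boundary after the PCIR header, and the signature constant is the ASCII tag as seen through a little-endian 32-bit read. A minimal standalone sketch of both steps follows. The helper names align_up and le32_tag are hypothetical and not part of the driver.

```c
#include <assert.h>
#include <stdint.h>

/* Round addr up to the next multiple of align.
 * align must be a non-zero power of two, as in (x + 0x0f) & ~0x0f. */
static uint32_t align_up(uint32_t addr, uint32_t align)
{
	assert(align != 0 && (align & (align - 1)) == 0);
	return (addr + (align - 1)) & ~(align - 1);
}

/* Pack a four-character tag the way a little-endian 32-bit read
 * of those four bytes would return it. */
static uint32_t le32_tag(const char *tag)
{
	return (uint32_t)(uint8_t)tag[0] |
	       (uint32_t)(uint8_t)tag[1] << 8 |
	       (uint32_t)(uint8_t)tag[2] << 16 |
	       (uint32_t)(uint8_t)tag[3] << 24;
}
```

With these helpers, le32_tag("NPDE") gives 0x4544504e, the value tested in the switch, and align_up(0x34, 16) gives 0x40.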
diff --git a/drivers/gpu/drm/nouveau/core/subdev/bios/pcir.c b/drivers/gpu/drm/nouveau/core/subdev/bios/pcir.c
new file mode 100644
index 000000000000..91dae26bc50f
--- /dev/null
+++ b/drivers/gpu/drm/nouveau/core/subdev/bios/pcir.c
@@ -0,0 +1,69 @@
+/*
+ * Copyright 2014 Red Hat Inc.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR
+ * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
+ * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
+ * OTHER DEALINGS IN THE SOFTWARE.
+ *
+ * Authors: Ben Skeggs <bskeggs@redhat.com>
+ */
+
+#include <subdev/bios.h>
+#include <subdev/bios/pcir.h>
+
+u32
+nvbios_pcirTe(struct nouveau_bios *bios, u32 base, u8 *ver, u16 *hdr)
+{
+	u32 data = nv_ro16(bios, base + 0x18);
+	if (data) {
+		data += base;
+		switch (nv_ro32(bios, data + 0x00)) {
+		case 0x52494350: /* PCIR */
+		case 0x53494752: /* RGIS */
+		case 0x5344504e: /* NPDS */
+			*hdr = nv_ro16(bios, data + 0x0a);
+			*ver = nv_ro08(bios, data + 0x0c);
+			break;
+		default:
+			nv_debug(bios, "%08x: PCIR signature (%08x) unknown\n",
+				 data, nv_ro32(bios, data + 0x00));
+			data = 0;
+			break;
+		}
+	}
+	return data;
+}
+
+u32
+nvbios_pcirTp(struct nouveau_bios *bios, u32 base, u8 *ver, u16 *hdr,
+	      struct nvbios_pcirT *info)
+{
+	u32 data = nvbios_pcirTe(bios, base, ver, hdr);
+	memset(info, 0x00, sizeof(*info));
+	if (data) {
+		info->vendor_id = nv_ro16(bios, data + 0x04);
+		info->device_id = nv_ro16(bios, data + 0x06);
+		info->class_code[0] = nv_ro08(bios, data + 0x0d);
+		info->class_code[1] = nv_ro08(bios, data + 0x0e);
+		info->class_code[2] = nv_ro08(bios, data + 0x0f);
+		info->image_size = nv_ro16(bios, data + 0x10) * 512;
+		info->image_rev = nv_ro16(bios, data + 0x12);
+		info->image_type = nv_ro08(bios, data + 0x14);
+		info->last = nv_ro08(bios, data + 0x15) & 0x80;
+	}
+	return data;
+}
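
For readers without the driver's nv_ro08/nv_ro16 accessors to hand, the field offsets used by nvbios_pcirTp() above can be tried out on a plain byte buffer. This is only a sketch under assumptions: demo_pcir, demo_le16 and demo_pcir_parse are hypothetical names, only a subset of the fields is copied, and only the "PCIR" tag is accepted here, whereas the code above also accepts "RGIS" and "NPDS".

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, reduced counterpart of struct nvbios_pcirT. */
struct demo_pcir {
	uint16_t vendor_id;
	uint16_t device_id;
	uint32_t image_size;	/* bytes */
	uint8_t  image_type;
	int      last;		/* non-zero on the final image */
};

/* Read an unaligned little-endian 16-bit value. */
static uint16_t demo_le16(const uint8_t *p)
{
	return (uint16_t)(p[0] | p[1] << 8);
}

/* Parse a PCI data structure starting at pcir[0].
 * Returns 0 on success, -1 if the buffer is short or the tag is wrong. */
static int demo_pcir_parse(const uint8_t *pcir, size_t len,
			   struct demo_pcir *out)
{
	assert(pcir && out);
	if (len < 0x16)
		return -1;
	if (pcir[0] != 'P' || pcir[1] != 'C' ||
	    pcir[2] != 'I' || pcir[3] != 'R')
		return -1;
	out->vendor_id  = demo_le16(pcir + 0x04);
	out->device_id  = demo_le16(pcir + 0x06);
	/* the size field counts 512-byte units */
	out->image_size = (uint32_t)demo_le16(pcir + 0x10) * 512;
	out->image_type = pcir[0x14];
	/* bit 7 of the indicator byte marks the last image */
	out->last       = (pcir[0x15] & 0x80) != 0;
	return 0;
}
```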
diff --git a/drivers/gpu/drm/nouveau/core/subdev/bios/pmu.c b/drivers/gpu/drm/nouveau/core/subdev/bios/pmu.c
new file mode 100644
index 000000000000..66c56ba07d1b
--- /dev/null
+++ b/drivers/gpu/drm/nouveau/core/subdev/bios/pmu.c
@@ -0,0 +1,135 @@
+/*
+ * Copyright 2014 Red Hat Inc.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR
+ * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
+ * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
+ * OTHER DEALINGS IN THE SOFTWARE.
+ *
+ * Authors: Ben Skeggs <bskeggs@redhat.com>
+ */
+
+#include <subdev/bios.h>
+#include <subdev/bios/bit.h>
+#include <subdev/bios/image.h>
+#include <subdev/bios/pmu.h>
+
+static u32
+weirdo_pointer(struct nouveau_bios *bios, u32 data)
+{
+	struct nvbios_image image;
+	int idx = 0;
+	if (nvbios_image(bios, idx++, &image)) {
+		data -= image.size;
+		while (nvbios_image(bios, idx++, &image)) {
+			if (image.type == 0xe0)
+				return image.base + data;
+		}
+	}
+	return 0;
+}
+
+u32
+nvbios_pmuTe(struct nouveau_bios *bios, u8 *ver, u8 *hdr, u8 *cnt, u8 *len)
+{
+	struct bit_entry bit_p;
+	u32 data = 0;
+
+	if (!bit_entry(bios, 'p', &bit_p)) {
+		if (bit_p.version == 2 && bit_p.length >= 4)
+			data = nv_ro32(bios, bit_p.offset + 0x00);
+		if ((data = weirdo_pointer(bios, data))) {
+			*ver = nv_ro08(bios, data + 0x00); /* maybe? */
+			*hdr = nv_ro08(bios, data + 0x01);
+			*len = nv_ro08(bios, data + 0x02);
+			*cnt = nv_ro08(bios, data + 0x03);
+		}
+	}
+
+	return data;
+}
+
+u32
+nvbios_pmuTp(struct nouveau_bios *bios, u8 *ver, u8 *hdr, u8 *cnt, u8 *len,
+	     struct nvbios_pmuT *info)
+{
+	u32 data = nvbios_pmuTe(bios, ver, hdr, cnt, len);
+	memset(info, 0x00, sizeof(*info));
+	switch (!!data * *ver) {
+	default:
+		break;
+	}
+	return data;
+}
+
+u32
+nvbios_pmuEe(struct nouveau_bios *bios, int idx, u8 *ver, u8 *hdr)
+{
+	u8  cnt, len;
+	u32 data = nvbios_pmuTe(bios, ver, hdr, &cnt, &len);
+	if (data && idx < cnt) {
+		data = data + *hdr + (idx * len);
+		*hdr = len;
+		return data;
+	}
+	return 0;
+}
+
+u32
+nvbios_pmuEp(struct nouveau_bios *bios, int idx, u8 *ver, u8 *hdr,
+	     struct nvbios_pmuE *info)
+{
+	u32 data = nvbios_pmuEe(bios, idx, ver, hdr);
+	memset(info, 0x00, sizeof(*info));
+	switch (!!data * *ver) {
+	default:
+		info->type = nv_ro08(bios, data + 0x00);
+		info->data = nv_ro32(bios, data + 0x02);
+		break;
+	}
+	return data;
+}
+
+bool
+nvbios_pmuRm(struct nouveau_bios *bios, u8 type, struct nvbios_pmuR *info)
+{
+	struct nvbios_pmuE pmuE;
+	u8  ver, hdr, idx = 0;
+	u32 data;
+	memset(info, 0x00, sizeof(*info));
+	while ((data = nvbios_pmuEp(bios, idx++, &ver, &hdr, &pmuE))) {
+		if ( pmuE.type == type &&
+		    (data = weirdo_pointer(bios, pmuE.data))) {
+			info->init_addr_pmu = nv_ro32(bios, data + 0x08);
+			info->args_addr_pmu = nv_ro32(bios, data + 0x0c);
+			info->boot_addr     = data + 0x30;
+			info->boot_addr_pmu = nv_ro32(bios, data + 0x10) +
+					      nv_ro32(bios, data + 0x18);
+			info->boot_size     = nv_ro32(bios, data + 0x1c) -
+					      nv_ro32(bios, data + 0x18);
+			info->code_addr     = info->boot_addr + info->boot_size;
+			info->code_addr_pmu = info->boot_addr_pmu +
+					      info->boot_size;
+			info->code_size     = nv_ro32(bios, data + 0x20);
+			info->data_addr     = data + 0x30 +
+					      nv_ro32(bios, data + 0x24);
+			info->data_addr_pmu = nv_ro32(bios, data + 0x28);
+			info->data_size     = nv_ro32(bios, data + 0x2c);
+			return true;
+		}
+	}
+	return false;
+}
diff --git a/drivers/gpu/drm/nouveau/core/subdev/bios/priv.h b/drivers/gpu/drm/nouveau/core/subdev/bios/priv.h
new file mode 100644
index 000000000000..187d225bd1e9
--- /dev/null
+++ b/drivers/gpu/drm/nouveau/core/subdev/bios/priv.h
@@ -0,0 +1,25 @@
+#ifndef __NVKM_BIOS_PRIV_H__
+#define __NVKM_BIOS_PRIV_H__
+
+#include <subdev/bios.h>
+
+struct nvbios_source {
+	const char *name;
+	void *(*init)(struct nouveau_bios *, const char *);
+	void  (*fini)(void *);
+	u32   (*read)(void *, u32 offset, u32 length, struct nouveau_bios *);
+	bool rw;
+};
+
+int nvbios_extend(struct nouveau_bios *, u32 length);
+int nvbios_shadow(struct nouveau_bios *);
+
+extern const struct nvbios_source nvbios_rom;
+extern const struct nvbios_source nvbios_ramin;
+extern const struct nvbios_source nvbios_acpi_fast;
+extern const struct nvbios_source nvbios_acpi_slow;
+extern const struct nvbios_source nvbios_pcirom;
+extern const struct nvbios_source nvbios_platform;
+extern const struct nvbios_source nvbios_of;
+
+#endif
diff --git a/drivers/gpu/drm/nouveau/core/subdev/bios/ramcfg.c b/drivers/gpu/drm/nouveau/core/subdev/bios/ramcfg.c
index 6c401f70ab99..1623c8dfe797 100644
--- a/drivers/gpu/drm/nouveau/core/subdev/bios/ramcfg.c
+++ b/drivers/gpu/drm/nouveau/core/subdev/bios/ramcfg.c
@@ -25,6 +25,7 @@
 #include <subdev/bios.h>
 #include <subdev/bios/bit.h>
 #include <subdev/bios/ramcfg.h>
+#include <subdev/bios/M0203.h>
 
 static u8
 nvbios_ramcfg_strap(struct nouveau_subdev *subdev)
@@ -54,12 +55,22 @@ nvbios_ramcfg_index(struct nouveau_subdev *subdev)
 	u8 strap = nvbios_ramcfg_strap(subdev);
 	u32 xlat = 0x00000000;
 	struct bit_entry bit_M;
+	struct nvbios_M0203E M0203E;
+	u8 ver, hdr;
 
 	if (!bit_entry(bios, 'M', &bit_M)) {
 		if (bit_M.version == 1 && bit_M.length >= 5)
 			xlat = nv_ro16(bios, bit_M.offset + 3);
-		if (bit_M.version == 2 && bit_M.length >= 3)
+		if (bit_M.version == 2 && bit_M.length >= 3) {
+			/*XXX: is M ever shorter than this?
+			 *     if not - what is xlat used for now?
+			 *     also - sigh..
+			 */
+			if (bit_M.length >= 7 &&
+			    nvbios_M0203Em(bios, strap, &ver, &hdr, &M0203E))
+				return M0203E.group;
 			xlat = nv_ro16(bios, bit_M.offset + 1);
+		}
 	}
 
 	if (xlat)
diff --git a/drivers/gpu/drm/nouveau/core/subdev/bios/rammap.c b/drivers/gpu/drm/nouveau/core/subdev/bios/rammap.c
index 585e69331ccc..c5685228c322 100644
--- a/drivers/gpu/drm/nouveau/core/subdev/bios/rammap.c
+++ b/drivers/gpu/drm/nouveau/core/subdev/bios/rammap.c
@@ -162,8 +162,9 @@ nvbios_rammapSp(struct nouveau_bios *bios, u32 data,
 		p->ramcfg_10_02_08 = (nv_ro08(bios, data + 0x02) & 0x08) >> 3;
 		p->ramcfg_10_02_10 = (nv_ro08(bios, data + 0x02) & 0x10) >> 4;
 		p->ramcfg_10_02_20 = (nv_ro08(bios, data + 0x02) & 0x20) >> 5;
-		p->ramcfg_10_02_40 = (nv_ro08(bios, data + 0x02) & 0x40) >> 6;
+		p->ramcfg_10_DLLoff = (nv_ro08(bios, data + 0x02) & 0x40) >> 6;
 		p->ramcfg_10_03_0f = (nv_ro08(bios, data + 0x03) & 0x0f) >> 0;
+		p->ramcfg_10_04_01 = (nv_ro08(bios, data + 0x04) & 0x01) >> 0;
 		p->ramcfg_10_05    = (nv_ro08(bios, data + 0x05) & 0xff) >> 0;
 		p->ramcfg_10_06    = (nv_ro08(bios, data + 0x06) & 0xff) >> 0;
 		p->ramcfg_10_07    = (nv_ro08(bios, data + 0x07) & 0xff) >> 0;
diff --git a/drivers/gpu/drm/nouveau/core/subdev/bios/shadow.c b/drivers/gpu/drm/nouveau/core/subdev/bios/shadow.c
new file mode 100644
index 000000000000..bb9e0018d936
--- /dev/null
+++ b/drivers/gpu/drm/nouveau/core/subdev/bios/shadow.c
@@ -0,0 +1,270 @@
+/*
+ * Copyright 2014 Red Hat Inc.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR
+ * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
+ * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
+ * OTHER DEALINGS IN THE SOFTWARE.
+ *
+ * Authors: Ben Skeggs <bskeggs@redhat.com>
+ */
+
+#include "priv.h"
+#include <core/option.h>
+#include <subdev/bios/image.h>
+
+struct shadow {
+	struct nouveau_oclass base;
+	u32 skip;
+	const struct nvbios_source *func;
+	void *data;
+	u32 size;
+	int score;
+};
+
+static bool
+shadow_fetch(struct nouveau_bios *bios, u32 upto)
+{
+	struct shadow *mthd = (void *)nv_object(bios)->oclass;
+	const u32 limit = (upto + 3) & ~3;
+	const u32 start = bios->size;
+	void *data = mthd->data;
+	if (nvbios_extend(bios, limit) > 0) {
+		u32 read = mthd->func->read(data, start, limit - start, bios);
+		bios->size = start + read;
+	}
+	return bios->size >= limit;
+}
+
+static u8
+shadow_rd08(struct nouveau_object *object, u64 addr)
+{
+	struct nouveau_bios *bios = (void *)object;
+	if (shadow_fetch(bios, addr + 1))
+		return bios->data[addr];
+	return 0x00;
+}
+
+static u16
+shadow_rd16(struct nouveau_object *object, u64 addr)
+{
+	struct nouveau_bios *bios = (void *)object;
+	if (shadow_fetch(bios, addr + 2))
+		return get_unaligned_le16(&bios->data[addr]);
+	return 0x0000;
+}
+
+static u32
+shadow_rd32(struct nouveau_object *object, u64 addr)
+{
+	struct nouveau_bios *bios = (void *)object;
+	if (shadow_fetch(bios, addr + 4))
+		return get_unaligned_le32(&bios->data[addr]);
+	return 0x00000000;
+}
+
+static struct nouveau_oclass
+shadow_class = {
+	.handle = NV_SUBDEV(VBIOS, 0x00),
+	.ofuncs = &(struct nouveau_ofuncs) {
+		.rd08 = shadow_rd08,
+		.rd16 = shadow_rd16,
+		.rd32 = shadow_rd32,
+	},
+};
+
+static int
+shadow_image(struct nouveau_bios *bios, int idx, struct shadow *mthd)
+{
+	struct nvbios_image image;
+	int score = 1;
+
+	if (!nvbios_image(bios, idx, &image)) {
+		nv_debug(bios, "image %d invalid\n", idx);
+		return 0;
+	}
+	nv_debug(bios, "%08x: type %02x, %d bytes\n",
+		 image.base, image.type, image.size);
+
+	if (!shadow_fetch(bios, image.size)) {
+		nv_debug(bios, "%08x: fetch failed\n", image.base);
+		return 0;
+	}
+
+	switch (image.type) {
+	case 0x00:
+		if (nvbios_checksum(&bios->data[image.base], image.size)) {
+			nv_debug(bios, "%08x: checksum failed\n", image.base);
+			if (mthd->func->rw)
+				score += 1;
+			score += 1;
+		} else {
+			score += 3;
+		}
+		break;
+	default:
+		score += 3;
+		break;
+	}
+
+	if (!image.last)
+		score += shadow_image(bios, idx + 1, mthd);
+	return score;
+}
+
+static int
+shadow_score(struct nouveau_bios *bios, struct shadow *mthd)
+{
+	struct nouveau_oclass *oclass = nv_object(bios)->oclass;
+	int score;
+	nv_object(bios)->oclass = &mthd->base;
+	score = shadow_image(bios, 0, mthd);
+	nv_object(bios)->oclass = oclass;
+	return score;
+
+}
+
+static int
+shadow_method(struct nouveau_bios *bios, struct shadow *mthd, const char *name)
+{
+	const struct nvbios_source *func = mthd->func;
+	if (func->name) {
+		nv_debug(bios, "trying %s...\n", name ? name : func->name);
+		if (func->init) {
+			mthd->data = func->init(bios, name);
+			if (IS_ERR(mthd->data)) {
+				mthd->data = NULL;
+				return 0;
+			}
+		}
+		mthd->score = shadow_score(bios, mthd);
+		if (func->fini)
+			func->fini(mthd->data);
+		nv_debug(bios, "scored %d\n", mthd->score);
+		mthd->data = bios->data;
+		mthd->size = bios->size;
+		bios->data  = NULL;
+		bios->size  = 0;
+	}
+	return mthd->score;
+}
+
+static u32
+shadow_fw_read(void *data, u32 offset, u32 length, struct nouveau_bios *bios)
+{
+	const struct firmware *fw = data;
+	if (offset + length <= fw->size) {
+		memcpy(bios->data + offset, fw->data + offset, length);
+		return length;
+	}
+	return 0;
+}
+
+static void *
+shadow_fw_init(struct nouveau_bios *bios, const char *name)
+{
+	struct device *dev = &nv_device(bios)->pdev->dev;
+	const struct firmware *fw;
+	int ret = request_firmware(&fw, name, dev);
+	if (ret)
+		return ERR_PTR(-ENOENT);
+	return (void *)fw;
+}
+
+static const struct nvbios_source
+shadow_fw = {
+	.name = "firmware",
+	.init = shadow_fw_init,
+	.fini = (void(*)(void *))release_firmware,
+	.read = shadow_fw_read,
+	.rw = false,
+};
+
+int
+nvbios_shadow(struct nouveau_bios *bios)
+{
+	struct shadow mthds[] = {
+		{ shadow_class, 0, &nvbios_of },
+		{ shadow_class, 0, &nvbios_ramin },
+		{ shadow_class, 0, &nvbios_rom },
+		{ shadow_class, 0, &nvbios_acpi_fast },
+		{ shadow_class, 4, &nvbios_acpi_slow },
+		{ shadow_class, 1, &nvbios_pcirom },
+		{ shadow_class, 1, &nvbios_platform },
+		{ shadow_class }
+	}, *mthd = mthds, *best = NULL;
+	const char *optarg;
+	char *source;
+	int optlen;
+
+	/* handle user-specified bios source */
+	optarg = nouveau_stropt(nv_device(bios)->cfgopt, "NvBios", &optlen);
+	source = optarg ? kstrndup(optarg, optlen, GFP_KERNEL) : NULL;
+	if (source) {
+		/* try to match one of the built-in methods */
+		for (mthd = mthds; mthd->func; mthd++) {
+			if (mthd->func->name &&
+			    !strcasecmp(source, mthd->func->name)) {
+				best = mthd;
+				if (shadow_method(bios, mthd, NULL))
+					break;
+			}
+		}
+
+		/* otherwise, attempt to load as firmware */
+		if (!best && (best = mthd)) {
+			mthd->func = &shadow_fw;
+			shadow_method(bios, mthd, source);
+			mthd->func = NULL;
+		}
+
+		if (!best->score) {
+			nv_error(bios, "%s invalid\n", source);
+			kfree(source);
+			source = NULL;
+		}
+	}
+
+	/* scan all potential bios sources, looking for best image */
+	if (!best || !best->score) {
+		for (mthd = mthds, best = mthd; mthd->func; mthd++) {
+			if (!mthd->skip || best->score < mthd->skip) {
+				if (shadow_method(bios, mthd, NULL)) {
+					if (mthd->score > best->score)
+						best = mthd;
+				}
+			}
+		}
+	}
+
+	/* cleanup the ones we didn't use */
+	for (mthd = mthds; mthd->func; mthd++) {
+		if (mthd != best)
+			kfree(mthd->data);
+	}
+
+	if (!best->score) {
+		nv_fatal(bios, "unable to locate usable image\n");
+		return -EINVAL;
+	}
+
+	nv_info(bios, "using image from %s\n", best->func ?
+		best->func->name : source);
+	bios->data = best->data;
+	bios->size = best->size;
+	kfree(source);
+	return 0;
+}
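
The scoring in shadow_image() above can be summarised as: each valid, fetched image is worth 1 point, plus 3 if it is not a type 0x00 image or its checksum passes, otherwise plus 1 (plus 1 more when the source is flagged rw). The sketch below restates that rule over a pre-parsed array so the totals can be checked by hand. It is a simplification under assumptions: demo_image and demo_score are hypothetical names, the checksum result is supplied rather than computed, and the early "return 0" paths for an invalid image or a failed fetch are not modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for one ROM image in the chain. */
struct demo_image {
	unsigned char type;	/* 0x00 is the type that gets checksummed */
	bool checksum_ok;	/* precomputed checksum result */
	bool last;		/* set on the final image in the chain */
};

/* Sum the per-image scores, stopping after the image marked last. */
static int demo_score(const struct demo_image *img, size_t count,
		      bool source_rw)
{
	int score = 0;
	size_t i;

	assert(img != NULL || count == 0);
	for (i = 0; i < count; i++) {
		score += 1;	/* image parsed and fetched */
		if (img[i].type == 0x00 && !img[i].checksum_ok)
			score += source_rw ? 2 : 1;	/* bad checksum */
		else
			score += 3;	/* good checksum or unchecked type */
		if (img[i].last)
			break;
	}
	return score;
}
```

Under this rule a single type 0x00 image with a good checksum scores 4, which is why the skip value of 4 on nvbios_acpi_slow in nvbios_shadow() keeps the slow ACPI path from running once an earlier source has reached that score.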
diff --git a/drivers/gpu/drm/nouveau/core/subdev/bios/shadowacpi.c b/drivers/gpu/drm/nouveau/core/subdev/bios/shadowacpi.c
new file mode 100644
index 000000000000..bc130c12ec06
--- /dev/null
+++ b/drivers/gpu/drm/nouveau/core/subdev/bios/shadowacpi.c
@@ -0,0 +1,111 @@
+/*
+ * Copyright 2012 Red Hat Inc.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR
+ * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
+ * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
+ * OTHER DEALINGS IN THE SOFTWARE.
+ *
+ */
+
+#include "priv.h"
+
+#if defined(CONFIG_ACPI) && defined(CONFIG_X86)
+int nouveau_acpi_get_bios_chunk(uint8_t *bios, int offset, int len);
+bool nouveau_acpi_rom_supported(struct pci_dev *pdev);
+#else
+static inline bool
+nouveau_acpi_rom_supported(struct pci_dev *pdev)
+{
+	return false;
+}
+
+static inline int
+nouveau_acpi_get_bios_chunk(uint8_t *bios, int offset, int len)
+{
+	return -EINVAL;
+}
+#endif
+
+/* This version of the shadow function disobeys the ACPI spec and tries
+ * to fetch in units of more than 4KiB at a time.  This is a LOT faster
+ * on some systems, such as Lenovo W530.
+ */
+static u32
+acpi_read_fast(void *data, u32 offset, u32 length, struct nouveau_bios *bios)
+{
+	u32 limit = (offset + length + 0xfff) & ~0xfff;
+	u32 start = offset & ~0x00000fff;
+	u32 fetch = limit - start;
+
+	if (nvbios_extend(bios, limit) > 0) {
+		int ret = nouveau_acpi_get_bios_chunk(bios->data, start, fetch);
+		if (ret == fetch)
+			return fetch;
+	}
+
+	return 0;
+}
+
+/* Other systems, such as the one in fdo#55948, will report a success
+ * but only return 4KiB of data.  The common bios fetching logic will
+ * detect an invalid image, and fall back to this version of the read
+ * function.
+ */
+static u32
+acpi_read_slow(void *data, u32 offset, u32 length, struct nouveau_bios *bios)
+{
+	u32 limit = (offset + length + 0xfff) & ~0xfff;
+	u32 start = offset & ~0xfff;
+	u32 fetch = 0;
+
+	if (nvbios_extend(bios, limit) > 0) {
+		while (start + fetch < limit) {
+			int ret = nouveau_acpi_get_bios_chunk(bios->data,
+							      start + fetch,
+							      0x1000);
+			if (ret != 0x1000)
+				break;
+			fetch += 0x1000;
+		}
+	}
+
+	return fetch;
+}
+
+static void *
+acpi_init(struct nouveau_bios *bios, const char *name)
+{
+	if (!nouveau_acpi_rom_supported(nv_device(bios)->pdev))
+		return ERR_PTR(-ENODEV);
+	return NULL;
+}
+
+const struct nvbios_source
+nvbios_acpi_fast = {
+	.name = "ACPI",
+	.init = acpi_init,
+	.read = acpi_read_fast,
+	.rw = false,
+};
+
+const struct nvbios_source
+nvbios_acpi_slow = {
+	.name = "ACPI",
+	.init = acpi_init,
+	.read = acpi_read_slow,
+	.rw = false,
+};
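
Both ACPI read functions above widen the requested range to whole 4KiB pages before fetching: the start is rounded down, the end is rounded up, and the difference is what gets fetched (in one call for the fast path, in 0x1000-byte steps for the slow path). A small sketch of that window computation, with the hypothetical names demo_window and demo_window_for, assuming offset + length does not wrap a 32-bit value:

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_PAGE 0x1000u

/* Page-aligned window covering [offset, offset + length). */
struct demo_window {
	uint32_t start;	/* offset rounded down to a page boundary */
	uint32_t limit;	/* offset + length rounded up to a page boundary */
	uint32_t fetch;	/* number of bytes between start and limit */
};

static struct demo_window demo_window_for(uint32_t offset, uint32_t length)
{
	struct demo_window w;

	assert(offset + length >= offset);	/* no 32-bit wrap */
	w.limit = (offset + length + (DEMO_PAGE - 1)) & ~(DEMO_PAGE - 1);
	w.start = offset & ~(DEMO_PAGE - 1);
	w.fetch = w.limit - w.start;
	return w;
}
```

For example, a request for 0x20 bytes at offset 0x0ff0 straddles a page boundary, so the window is 0x0000 to 0x2000 and two pages are fetched.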
diff --git a/drivers/gpu/drm/nouveau/core/subdev/bios/shadowof.c b/drivers/gpu/drm/nouveau/core/subdev/bios/shadowof.c
new file mode 100644
index 000000000000..3abe487a6025
--- /dev/null
+++ b/drivers/gpu/drm/nouveau/core/subdev/bios/shadowof.c
@@ -0,0 +1,71 @@
+/*
+ * Copyright 2012 Red Hat Inc.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR
+ * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
+ * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
+ * OTHER DEALINGS IN THE SOFTWARE.
+ *
+ */
+
+#include "priv.h"
+
+#if defined(__powerpc__)
+struct priv {
+	const void __iomem *data;
+	int size;
+};
+
+static u32
+of_read(void *data, u32 offset, u32 length, struct nouveau_bios *bios)
+{
+	struct priv *priv = data;
+	if (offset + length <= priv->size) {
+		memcpy_fromio(bios->data + offset, priv->data + offset, length);
+		return length;
+	}
+	return 0;
+}
+
+static void *
+of_init(struct nouveau_bios *bios, const char *name)
+{
+	struct pci_dev *pdev = nv_device(bios)->pdev;
+	struct device_node *dn;
+	struct priv *priv;
+	if (!(dn = pci_device_to_OF_node(pdev)))
+		return ERR_PTR(-ENODEV);
+	if (!(priv = kzalloc(sizeof(*priv), GFP_KERNEL)))
+		return ERR_PTR(-ENOMEM);
+	if ((priv->data = of_get_property(dn, "NVDA,BMP", &priv->size)))
+		return priv;
+	kfree(priv);
+	return ERR_PTR(-EINVAL);
+}
+
+const struct nvbios_source
+nvbios_of = {
+	.name = "OpenFirmware",
+	.init = of_init,
+	.fini = (void(*)(void *))kfree,
+	.read = of_read,
+	.rw = false,
+};
+#else
+const struct nvbios_source
+nvbios_of = {
+};
+#endif
diff --git a/drivers/gpu/drm/nouveau/core/subdev/bios/shadowpci.c b/drivers/gpu/drm/nouveau/core/subdev/bios/shadowpci.c
new file mode 100644
index 000000000000..1d0389c0abef
--- /dev/null
+++ b/drivers/gpu/drm/nouveau/core/subdev/bios/shadowpci.c
@@ -0,0 +1,108 @@
+/*
+ * Copyright 2012 Red Hat Inc.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR
+ * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
+ * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
+ * OTHER DEALINGS IN THE SOFTWARE.
+ *
+ */
+
+#include "priv.h"
+
+struct priv {
+	struct pci_dev *pdev;
+	void __iomem *rom;
+	size_t size;
+};
+
+static u32
+pcirom_read(void *data, u32 offset, u32 length, struct nouveau_bios *bios)
+{
+	struct priv *priv = data;
+	if (offset + length <= priv->size) {
+		memcpy_fromio(bios->data + offset, priv->rom + offset, length);
+		return length;
+	}
+	return 0;
+}
+
+static void
+pcirom_fini(void *data)
+{
+	struct priv *priv = data;
+	pci_unmap_rom(priv->pdev, priv->rom);
+	pci_disable_rom(priv->pdev);
+	kfree(priv);
+}
+
+static void *
+pcirom_init(struct nouveau_bios *bios, const char *name)
+{
+	struct pci_dev *pdev = nv_device(bios)->pdev;
+	struct priv *priv = NULL;
+	int ret;
+
+	if (!(ret = pci_enable_rom(pdev))) {
+		if (ret = -ENOMEM,
+		    (priv = kmalloc(sizeof(*priv), GFP_KERNEL))) {
+			if (ret = -EFAULT,
+			    (priv->rom = pci_map_rom(pdev, &priv->size))) {
+				priv->pdev = pdev;
+				return priv;
+			}
+			kfree(priv);
+		}
+		pci_disable_rom(pdev);
+	}
+
+	return ERR_PTR(ret);
+}
+
+const struct nvbios_source
+nvbios_pcirom = {
+	.name = "PCIROM",
+	.init = pcirom_init,
+	.fini = pcirom_fini,
+	.read = pcirom_read,
+	.rw = true,
+};
+
+static void *
+platform_init(struct nouveau_bios *bios, const char *name)
+{
+	struct pci_dev *pdev = nv_device(bios)->pdev;
+	struct priv *priv;
+	int ret = -ENOMEM;
+
+	if ((priv = kmalloc(sizeof(*priv), GFP_KERNEL))) {
+		if (ret = -ENODEV,
+		    (priv->rom = pci_platform_rom(pdev, &priv->size)))
+			return priv;
+		kfree(priv);
+	}
+
+	return ERR_PTR(ret);
+}
+
+const struct nvbios_source
+nvbios_platform = {
+	.name = "PLATFORM",
+	.init = platform_init,
+	.fini = (void(*)(void *))kfree,
+	.read = pcirom_read,
+	.rw = true,
+};
diff --git a/drivers/gpu/drm/nouveau/core/subdev/bios/shadowramin.c b/drivers/gpu/drm/nouveau/core/subdev/bios/shadowramin.c
new file mode 100644
index 000000000000..5e58bba0dd5c
--- /dev/null
+++ b/drivers/gpu/drm/nouveau/core/subdev/bios/shadowramin.c
@@ -0,0 +1,112 @@
+/*
+ * Copyright 2012 Red Hat Inc.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR
+ * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
+ * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
+ * OTHER DEALINGS IN THE SOFTWARE.
+ *
+ */
+
+#include "priv.h"
+
+struct priv {
+	struct nouveau_bios *bios;
+	u32 bar0;
+};
+
+static u32
+pramin_read(void *data, u32 offset, u32 length, struct nouveau_bios *bios)
+{
+	u32 i;
+	if (offset + length <= 0x00100000) {
+		for (i = offset; i < offset + length; i += 4)
+			*(u32 *)&bios->data[i] = nv_rd32(bios, 0x700000 + i);
+		return length;
+	}
+	return 0;
+}
+
+static void
+pramin_fini(void *data)
+{
+	struct priv *priv = data;
+	nv_wr32(priv->bios, 0x001700, priv->bar0);
+	kfree(priv);
+}
+
+static void *
+pramin_init(struct nouveau_bios *bios, const char *name)
+{
+	struct priv *priv = NULL;
+	u64 addr = 0;
+
+	/* PRAMIN always potentially available prior to nv50 */
+	if (nv_device(bios)->card_type < NV_50)
+		return NULL;
+
+	/* we can't get the bios image pointer without PDISP */
+	if (nv_device(bios)->card_type >= GM100)
+		addr = nv_rd32(bios, 0x021c04);
+	else
+	if (nv_device(bios)->card_type >= NV_C0)
+		addr = nv_rd32(bios, 0x022500);
+	if (addr & 0x00000001) {
+		nv_debug(bios, "... display disabled\n");
+		return ERR_PTR(-ENODEV);
+	}
+
+	/* check that the window is enabled and in vram, particularly
+	 * important as we don't want to be touching vram on an
+	 * uninitialised board
+	 */
+	addr = nv_rd32(bios, 0x619f04);
+	if (!(addr & 0x00000008)) {
+		nv_debug(bios, "... not enabled\n");
+		return ERR_PTR(-ENODEV);
+	}
+	if ( (addr & 0x00000003) != 1) {
+		nv_debug(bios, "... not in vram\n");
+		return ERR_PTR(-ENODEV);
+	}
+
+	/* some alternate method inherited from xf86-video-nv... */
+	addr = (addr & 0xffffff00) << 8;
+	if (!addr) {
+		addr  = (u64)nv_rd32(bios, 0x001700) << 16;
+		addr += 0xf0000;
+	}
+
+	/* modify bar0 PRAMIN window to cover the bios image */
+	if (!(priv = kmalloc(sizeof(*priv), GFP_KERNEL))) {
+		nv_error(bios, "... out of memory\n");
+		return ERR_PTR(-ENOMEM);
+	}
+
+	priv->bios = bios;
+	priv->bar0 = nv_rd32(bios, 0x001700);
+	nv_wr32(bios, 0x001700, addr >> 16);
+	return priv;
+}
+
+const struct nvbios_source
+nvbios_ramin = {
+	.name = "PRAMIN",
+	.init = pramin_init,
+	.fini = pramin_fini,
+	.read = pramin_read,
+	.rw = true,
+};
diff --git a/drivers/gpu/drm/nouveau/core/subdev/bios/shadowrom.c b/drivers/gpu/drm/nouveau/core/subdev/bios/shadowrom.c
new file mode 100644
index 000000000000..b7992bc3ffa5
--- /dev/null
+++ b/drivers/gpu/drm/nouveau/core/subdev/bios/shadowrom.c
@@ -0,0 +1,69 @@
+/*
+ * Copyright 2012 Red Hat Inc.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR
+ * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
+ * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
+ * OTHER DEALINGS IN THE SOFTWARE.
+ *
+ */
+
+#include "priv.h"
+
+static u32
+prom_read(void *data, u32 offset, u32 length, struct nouveau_bios *bios)
+{
+	u32 i;
+	if (offset + length <= 0x00100000) {
+		for (i = offset; i < offset + length; i += 4)
+			*(u32 *)&bios->data[i] = nv_rd32(bios, 0x300000 + i);
+		return length;
+	}
+	return 0;
+}
+
+static void
+prom_fini(void *data)
+{
+	struct nouveau_bios *bios = data;
+	if (nv_device(bios)->card_type < NV_50)
+		nv_mask(bios, 0x001850, 0x00000001, 0x00000001);
+	else
+		nv_mask(bios, 0x088050, 0x00000001, 0x00000001);
+}
+
+static void *
+prom_init(struct nouveau_bios *bios, const char *name)
+{
+	if (nv_device(bios)->card_type < NV_50) {
+		if (nv_device(bios)->card_type == NV_40 &&
+		    nv_device(bios)->chipset >= 0x4c)
+			return ERR_PTR(-ENODEV);
+		nv_mask(bios, 0x001850, 0x00000001, 0x00000000);
+	} else {
+		nv_mask(bios, 0x088050, 0x00000001, 0x00000000);
+	}
+	return bios;
+}
+
+const struct nvbios_source
+nvbios_rom = {
+	.name = "PROM",
+	.init = prom_init,
+	.fini = prom_fini,
+	.read = prom_read,
+	.rw = false,
+};
diff --git a/drivers/gpu/drm/nouveau/core/subdev/bios/timing.c b/drivers/gpu/drm/nouveau/core/subdev/bios/timing.c
index 46d955eb51eb..8521eca1ed9c 100644
--- a/drivers/gpu/drm/nouveau/core/subdev/bios/timing.c
+++ b/drivers/gpu/drm/nouveau/core/subdev/bios/timing.c
@@ -93,10 +93,44 @@ nvbios_timingEp(struct nouveau_bios *bios, int idx,
 	p->timing_hdr = *hdr;
 	switch (!!data * *ver) {
 	case 0x10:
-		p->timing_10_WR = nv_ro08(bios, data + 0x00);
-		p->timing_10_CL = nv_ro08(bios, data + 0x02);
-		p->timing_10_ODT = nv_ro08(bios, data + 0x0e) & 0x07;
-		p->timing_10_CWL = nv_ro08(bios, data + 0x13);
+		p->timing_10_WR    = nv_ro08(bios, data + 0x00);
+		p->timing_10_WTR   = nv_ro08(bios, data + 0x01);
+		p->timing_10_CL    = nv_ro08(bios, data + 0x02);
+		p->timing_10_RC    = nv_ro08(bios, data + 0x03);
+		p->timing_10_RFC   = nv_ro08(bios, data + 0x05);
+		p->timing_10_RAS   = nv_ro08(bios, data + 0x07);
+		p->timing_10_RP    = nv_ro08(bios, data + 0x09);
+		p->timing_10_RCDRD = nv_ro08(bios, data + 0x0a);
+		p->timing_10_RCDWR = nv_ro08(bios, data + 0x0b);
+		p->timing_10_RRD   = nv_ro08(bios, data + 0x0c);
+		p->timing_10_13    = nv_ro08(bios, data + 0x0d);
+		p->timing_10_ODT   = nv_ro08(bios, data + 0x0e) & 0x07;
+
+		p->timing_10_24  = 0xff;
+		p->timing_10_21  = 0;
+		p->timing_10_20  = 0;
+		p->timing_10_CWL = 0;
+		p->timing_10_18  = 0;
+		p->timing_10_16  = 0;
+
+		switch (min_t(u8, *hdr, 25)) {
+		case 25:
+			p->timing_10_24  = nv_ro08(bios, data + 0x18);
+		case 24:
+		case 23:
+		case 22:
+			p->timing_10_21  = nv_ro08(bios, data + 0x15);
+		case 21:
+			p->timing_10_20  = nv_ro08(bios, data + 0x14);
+		case 20:
+			p->timing_10_CWL = nv_ro08(bios, data + 0x13);
+		case 19:
+			p->timing_10_18  = nv_ro08(bios, data + 0x12);
+		case 18:
+		case 17:
+			p->timing_10_16  = nv_ro08(bios, data + 0x10);
+		}
+
 		break;
 	case 0x20:
 		p->timing[0] = nv_ro32(bios, data + 0x00);
diff --git a/drivers/gpu/drm/nouveau/core/subdev/clock/gk20a.c b/drivers/gpu/drm/nouveau/core/subdev/clock/gk20a.c
index 425a8d5e9129..fb4fad374bdd 100644
--- a/drivers/gpu/drm/nouveau/core/subdev/clock/gk20a.c
+++ b/drivers/gpu/drm/nouveau/core/subdev/clock/gk20a.c
@@ -109,7 +109,7 @@ struct gk20a_clk_pllg_params {
 };
 
 static const struct gk20a_clk_pllg_params gk20a_pllg_params = {
-	.min_vco = 1000, .max_vco = 1700,
+	.min_vco = 1000, .max_vco = 2064,
 	.min_u = 12, .max_u = 38,
 	.min_m = 1, .max_m = 255,
 	.min_n = 8, .max_n = 255,
@@ -470,76 +470,91 @@ gk20a_pstates[] = {
 	{
 		.base = {
 			.domain[nv_clk_src_gpc] = 72000,
+			.voltage = 0,
 		},
 	},
 	{
 		.base = {
 			.domain[nv_clk_src_gpc] = 108000,
+			.voltage = 1,
 		},
 	},
 	{
 		.base = {
 			.domain[nv_clk_src_gpc] = 180000,
+			.voltage = 2,
 		},
 	},
 	{
 		.base = {
 			.domain[nv_clk_src_gpc] = 252000,
+			.voltage = 3,
 		},
 	},
 	{
 		.base = {
 			.domain[nv_clk_src_gpc] = 324000,
+			.voltage = 4,
 		},
 	},
 	{
 		.base = {
 			.domain[nv_clk_src_gpc] = 396000,
+			.voltage = 5,
 		},
 	},
 	{
 		.base = {
 			.domain[nv_clk_src_gpc] = 468000,
+			.voltage = 6,
 		},
 	},
 	{
 		.base = {
 			.domain[nv_clk_src_gpc] = 540000,
+			.voltage = 7,
 		},
 	},
 	{
 		.base = {
 			.domain[nv_clk_src_gpc] = 612000,
+			.voltage = 8,
 		},
 	},
 	{
 		.base = {
 			.domain[nv_clk_src_gpc] = 648000,
+			.voltage = 9,
 		},
 	},
 	{
 		.base = {
 			.domain[nv_clk_src_gpc] = 684000,
+			.voltage = 10,
 		},
 	},
 	{
 		.base = {
 			.domain[nv_clk_src_gpc] = 708000,
+			.voltage = 11,
 		},
 	},
 	{
 		.base = {
 			.domain[nv_clk_src_gpc] = 756000,
+			.voltage = 12,
 		},
 	},
 	{
 		.base = {
 			.domain[nv_clk_src_gpc] = 804000,
+			.voltage = 13,
 		},
 	},
 	{
 		.base = {
 			.domain[nv_clk_src_gpc] = 852000,
+			.voltage = 14,
 		},
 	},
 };
diff --git a/drivers/gpu/drm/nouveau/core/subdev/clock/nva3.c b/drivers/gpu/drm/nouveau/core/subdev/clock/nva3.c
index 094551d8ad9b..07ad01247675 100644
--- a/drivers/gpu/drm/nouveau/core/subdev/clock/nva3.c
+++ b/drivers/gpu/drm/nouveau/core/subdev/clock/nva3.c
@@ -510,7 +510,7 @@ nva3_clock_ctor(struct nouveau_object *parent, struct nouveau_object *engine,
 	int ret;
 
 	ret = nouveau_clock_create(parent, engine, oclass, nva3_domain, NULL, 0,
-				   false, &priv);
+				   true, &priv);
 	*pobject = nv_object(priv);
 	if (ret)
 		return ret;
diff --git a/drivers/gpu/drm/nouveau/core/subdev/devinit/base.c b/drivers/gpu/drm/nouveau/core/subdev/devinit/base.c
index 239acfe876c3..0e45cee82463 100644
--- a/drivers/gpu/drm/nouveau/core/subdev/devinit/base.c
+++ b/drivers/gpu/drm/nouveau/core/subdev/devinit/base.c
@@ -24,8 +24,6 @@
 
 #include <core/option.h>
 
-#include <subdev/bios.h>
-#include <subdev/bios/init.h>
 #include <subdev/vga.h>
 
 #include "priv.h"
@@ -56,7 +54,7 @@ _nouveau_devinit_init(struct nouveau_object *object)
 	if (ret)
 		return ret;
 
-	ret = nvbios_init(&devinit->base, devinit->post);
+	ret = impl->post(&devinit->base, devinit->post);
 	if (ret)
 		return ret;
 
diff --git a/drivers/gpu/drm/nouveau/core/subdev/devinit/gm107.c b/drivers/gpu/drm/nouveau/core/subdev/devinit/gm107.c
index c69bc7f54e37..4ba43d6a1ec8 100644
--- a/drivers/gpu/drm/nouveau/core/subdev/devinit/gm107.c
+++ b/drivers/gpu/drm/nouveau/core/subdev/devinit/gm107.c
@@ -24,7 +24,7 @@
 
 #include "nv50.h"
 
-static u64
+u64
 gm107_devinit_disable(struct nouveau_devinit *devinit)
 {
 	struct nv50_devinit_priv *priv = (void *)devinit;
@@ -53,4 +53,5 @@ gm107_devinit_oclass = &(struct nouveau_devinit_impl) {
 	},
 	.pll_set = nvc0_devinit_pll_set,
 	.disable = gm107_devinit_disable,
+	.post = nvbios_init,
 }.base;
diff --git a/drivers/gpu/drm/nouveau/core/subdev/devinit/gm204.c b/drivers/gpu/drm/nouveau/core/subdev/devinit/gm204.c
new file mode 100644
index 000000000000..e44a86662a2a
--- /dev/null
+++ b/drivers/gpu/drm/nouveau/core/subdev/devinit/gm204.c
@@ -0,0 +1,173 @@
+/*
+ * Copyright 2013 Red Hat Inc.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR
+ * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
+ * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
+ * OTHER DEALINGS IN THE SOFTWARE.
+ *
+ * Authors: Ben Skeggs
+ */
+
+#include <subdev/bios.h>
+#include <subdev/bios/bit.h>
+#include <subdev/bios/pmu.h>
+
+#include "nv50.h"
+
+static void
+pmu_code(struct nv50_devinit_priv *priv, u32 pmu, u32 img, u32 len, bool sec)
+{
+	struct nouveau_bios *bios = nouveau_bios(priv);
+	int i;
+
+	nv_wr32(priv, 0x10a180, 0x01000000 | (sec ? 0x10000000 : 0) | pmu);
+	for (i = 0; i < len; i += 4) {
+		if ((i & 0xff) == 0)
+			nv_wr32(priv, 0x10a188, (pmu + i) >> 8);
+		nv_wr32(priv, 0x10a184, nv_ro32(bios, img + i));
+	}
+
+	while (i & 0xff) {
+		nv_wr32(priv, 0x10a184, 0x00000000);
+		i += 4;
+	}
+}
+
+static void
+pmu_data(struct nv50_devinit_priv *priv, u32 pmu, u32 img, u32 len)
+{
+	struct nouveau_bios *bios = nouveau_bios(priv);
+	int i;
+
+	nv_wr32(priv, 0x10a1c0, 0x01000000 | pmu);
+	for (i = 0; i < len; i += 4)
+		nv_wr32(priv, 0x10a1c4, nv_ro32(bios, img + i));
+}
+
+static u32
+pmu_args(struct nv50_devinit_priv *priv, u32 argp, u32 argi)
+{
+	nv_wr32(priv, 0x10a1c0, argp);
+	nv_wr32(priv, 0x10a1c0, nv_rd32(priv, 0x10a1c4) + argi);
+	return nv_rd32(priv, 0x10a1c4);
+}
+
+static void
+pmu_exec(struct nv50_devinit_priv *priv, u32 init_addr)
+{
+	nv_wr32(priv, 0x10a104, init_addr);
+	nv_wr32(priv, 0x10a10c, 0x00000000);
+	nv_wr32(priv, 0x10a100, 0x00000002);
+}
+
+static int
+pmu_load(struct nv50_devinit_priv *priv, u8 type, bool post,
+	 u32 *init_addr_pmu, u32 *args_addr_pmu)
+{
+	struct nouveau_bios *bios = nouveau_bios(priv);
+	struct nvbios_pmuR pmu;
+
+	if (!nvbios_pmuRm(bios, type, &pmu)) {
+		nv_error(priv, "VBIOS PMU fuc %02x not found\n", type);
+		return -EINVAL;
+	}
+
+	if (!post)
+		return 0;
+
+	pmu_code(priv, pmu.boot_addr_pmu, pmu.boot_addr, pmu.boot_size, false);
+	pmu_code(priv, pmu.code_addr_pmu, pmu.code_addr, pmu.code_size, true);
+	pmu_data(priv, pmu.data_addr_pmu, pmu.data_addr, pmu.data_size);
+
+	if (init_addr_pmu) {
+		*init_addr_pmu = pmu.init_addr_pmu;
+		*args_addr_pmu = pmu.args_addr_pmu;
+		return 0;
+	}
+
+	return pmu_exec(priv, pmu.init_addr_pmu), 0;
+}
+
+static int
+gm204_devinit_post(struct nouveau_subdev *subdev, bool post)
+{
+	struct nv50_devinit_priv *priv = (void *)nouveau_devinit(subdev);
+	struct nouveau_bios *bios = nouveau_bios(priv);
+	struct bit_entry bit_I;
+	u32 init, args;
+	int ret;
+
+	if (bit_entry(bios, 'I', &bit_I) || bit_I.version != 1 ||
+					    bit_I.length < 0x1c) {
+		nv_error(priv, "VBIOS PMU init data not found\n");
+		return -EINVAL;
+	}
+
+	/* reset PMU and load init table parser ucode */
+	if (post) {
+		nv_mask(priv, 0x000200, 0x00002000, 0x00000000);
+		nv_mask(priv, 0x000200, 0x00002000, 0x00002000);
+		nv_rd32(priv, 0x000200);
+		while (nv_rd32(priv, 0x10a10c) & 0x00000006) {
+		}
+	}
+
+	ret = pmu_load(priv, 0x04, post, &init, &args);
+	if (ret)
+		return ret;
+
+	/* upload first chunk of init data */
+	if (post) {
+		u32 pmu = pmu_args(priv, args + 0x08, 0x08);
+		u32 img = nv_ro16(bios, bit_I.offset + 0x14);
+		u32 len = nv_ro16(bios, bit_I.offset + 0x16);
+		pmu_data(priv, pmu, img, len);
+	}
+
+	/* upload second chunk of init data */
+	if (post) {
+		u32 pmu = pmu_args(priv, args + 0x08, 0x10);
+		u32 img = nv_ro16(bios, bit_I.offset + 0x18);
+		u32 len = nv_ro16(bios, bit_I.offset + 0x1a);
+		pmu_data(priv, pmu, img, len);
+	}
+
+	/* execute init tables */
+	if (post) {
+		nv_wr32(priv, 0x10a040, 0x00005000);
+		pmu_exec(priv, init);
+		while (!(nv_rd32(priv, 0x10a040) & 0x00002000)) {
+		}
+	}
+
+	/* load and execute some other ucode image (bios therm?) */
+	return pmu_load(priv, 0x01, post, NULL, NULL);
+}
+
+struct nouveau_oclass *
+gm204_devinit_oclass = &(struct nouveau_devinit_impl) {
+	.base.handle = NV_SUBDEV(DEVINIT, 0x07),
+	.base.ofuncs = &(struct nouveau_ofuncs) {
+		.ctor = nv50_devinit_ctor,
+		.dtor = _nouveau_devinit_dtor,
+		.init = nv50_devinit_init,
+		.fini = _nouveau_devinit_fini,
+	},
+	.pll_set = nvc0_devinit_pll_set,
+	.disable = gm107_devinit_disable,
+	.post = gm204_devinit_post,
+}.base;
diff --git a/drivers/gpu/drm/nouveau/core/subdev/devinit/nv04.c b/drivers/gpu/drm/nouveau/core/subdev/devinit/nv04.c
index 052ad690b468..65651c50f6ea 100644
--- a/drivers/gpu/drm/nouveau/core/subdev/devinit/nv04.c
+++ b/drivers/gpu/drm/nouveau/core/subdev/devinit/nv04.c
@@ -464,4 +464,5 @@ nv04_devinit_oclass = &(struct nouveau_devinit_impl) {
 	},
 	.meminit = nv04_devinit_meminit,
 	.pll_set = nv04_devinit_pll_set,
+	.post = nvbios_init,
 }.base;
diff --git a/drivers/gpu/drm/nouveau/core/subdev/devinit/nv05.c b/drivers/gpu/drm/nouveau/core/subdev/devinit/nv05.c
index 4a19c10e5178..a2007a3efc4d 100644
--- a/drivers/gpu/drm/nouveau/core/subdev/devinit/nv05.c
+++ b/drivers/gpu/drm/nouveau/core/subdev/devinit/nv05.c
@@ -136,4 +136,5 @@ nv05_devinit_oclass = &(struct nouveau_devinit_impl) {
 	},
 	.meminit = nv05_devinit_meminit,
 	.pll_set = nv04_devinit_pll_set,
+	.post = nvbios_init,
 }.base;
diff --git a/drivers/gpu/drm/nouveau/core/subdev/devinit/nv10.c b/drivers/gpu/drm/nouveau/core/subdev/devinit/nv10.c
index 3b8d657da279..178b46f79b50 100644
--- a/drivers/gpu/drm/nouveau/core/subdev/devinit/nv10.c
+++ b/drivers/gpu/drm/nouveau/core/subdev/devinit/nv10.c
@@ -107,4 +107,5 @@ nv10_devinit_oclass = &(struct nouveau_devinit_impl) {
 	},
 	.meminit = nv10_devinit_meminit,
 	.pll_set = nv04_devinit_pll_set,
+	.post = nvbios_init,
 }.base;
diff --git a/drivers/gpu/drm/nouveau/core/subdev/devinit/nv1a.c b/drivers/gpu/drm/nouveau/core/subdev/devinit/nv1a.c
index 526d0c6faacd..995dd97af3e9 100644
--- a/drivers/gpu/drm/nouveau/core/subdev/devinit/nv1a.c
+++ b/drivers/gpu/drm/nouveau/core/subdev/devinit/nv1a.c
@@ -34,4 +34,5 @@ nv1a_devinit_oclass = &(struct nouveau_devinit_impl) {
 		.fini = nv04_devinit_fini,
 	},
 	.pll_set = nv04_devinit_pll_set,
+	.post = nvbios_init,
 }.base;
diff --git a/drivers/gpu/drm/nouveau/core/subdev/devinit/nv20.c b/drivers/gpu/drm/nouveau/core/subdev/devinit/nv20.c
index 04bc9732644c..915089fb46f7 100644
--- a/drivers/gpu/drm/nouveau/core/subdev/devinit/nv20.c
+++ b/drivers/gpu/drm/nouveau/core/subdev/devinit/nv20.c
@@ -71,4 +71,5 @@ nv20_devinit_oclass = &(struct nouveau_devinit_impl) {
 	},
 	.meminit = nv20_devinit_meminit,
 	.pll_set = nv04_devinit_pll_set,
+	.post = nvbios_init,
 }.base;
diff --git a/drivers/gpu/drm/nouveau/core/subdev/devinit/nv50.c b/drivers/gpu/drm/nouveau/core/subdev/devinit/nv50.c
index b46c62a1d5d8..968334d1dca4 100644
--- a/drivers/gpu/drm/nouveau/core/subdev/devinit/nv50.c
+++ b/drivers/gpu/drm/nouveau/core/subdev/devinit/nv50.c
@@ -26,6 +26,7 @@
 #include <subdev/bios/dcb.h>
 #include <subdev/bios/disp.h>
 #include <subdev/bios/init.h>
+#include <subdev/ibus.h>
 #include <subdev/vga.h>
 
 #include "nv50.h"
@@ -91,6 +92,7 @@ int
 nv50_devinit_init(struct nouveau_object *object)
 {
 	struct nouveau_bios *bios = nouveau_bios(object);
+	struct nouveau_ibus *ibus = nouveau_ibus(object);
 	struct nv50_devinit_priv *priv = (void *)object;
 	struct nvbios_outp info;
 	struct dcb_output outp;
@@ -105,6 +107,13 @@ nv50_devinit_init(struct nouveau_object *object)
 		}
 	}
 
+	/* some boards appear to require certain priv register timeouts
+	 * to be bumped before running devinit scripts.  not a clue why
+	 * the vbios engineers didn't make the scripts just work...
+	 */
+	if (priv->base.post && ibus)
+		nv_ofuncs(ibus)->init(nv_object(ibus));
+
 	ret = nouveau_devinit_init(&priv->base);
 	if (ret)
 		return ret;
@@ -160,4 +169,5 @@ nv50_devinit_oclass = &(struct nouveau_devinit_impl) {
 	},
 	.pll_set = nv50_devinit_pll_set,
 	.disable = nv50_devinit_disable,
+	.post = nvbios_init,
 }.base;
diff --git a/drivers/gpu/drm/nouveau/core/subdev/devinit/nv50.h b/drivers/gpu/drm/nouveau/core/subdev/devinit/nv50.h
index 51d5076333ec..f412bb7f780e 100644
--- a/drivers/gpu/drm/nouveau/core/subdev/devinit/nv50.h
+++ b/drivers/gpu/drm/nouveau/core/subdev/devinit/nv50.h
@@ -18,4 +18,6 @@ int  nva3_devinit_pll_set(struct nouveau_devinit *, u32, u32);
 
 int  nvc0_devinit_pll_set(struct nouveau_devinit *, u32, u32);
 
+u64  gm107_devinit_disable(struct nouveau_devinit *);
+
 #endif
diff --git a/drivers/gpu/drm/nouveau/core/subdev/devinit/nv84.c b/drivers/gpu/drm/nouveau/core/subdev/devinit/nv84.c
index 787422505d87..a7c80ded77cd 100644
--- a/drivers/gpu/drm/nouveau/core/subdev/devinit/nv84.c
+++ b/drivers/gpu/drm/nouveau/core/subdev/devinit/nv84.c
@@ -60,4 +60,5 @@ nv84_devinit_oclass = &(struct nouveau_devinit_impl) {
 	},
 	.pll_set = nv50_devinit_pll_set,
 	.disable = nv84_devinit_disable,
+	.post = nvbios_init,
 }.base;
diff --git a/drivers/gpu/drm/nouveau/core/subdev/devinit/nv98.c b/drivers/gpu/drm/nouveau/core/subdev/devinit/nv98.c
index 2b0e963fc6f0..a773253a17f6 100644
--- a/drivers/gpu/drm/nouveau/core/subdev/devinit/nv98.c
+++ b/drivers/gpu/drm/nouveau/core/subdev/devinit/nv98.c
@@ -59,4 +59,5 @@ nv98_devinit_oclass = &(struct nouveau_devinit_impl) {
 	},
 	.pll_set = nv50_devinit_pll_set,
 	.disable = nv98_devinit_disable,
+	.post = nvbios_init,
 }.base;
diff --git a/drivers/gpu/drm/nouveau/core/subdev/devinit/nva3.c b/drivers/gpu/drm/nouveau/core/subdev/devinit/nva3.c
index 006cf348bda7..b9cd9e53f760 100644
--- a/drivers/gpu/drm/nouveau/core/subdev/devinit/nva3.c
+++ b/drivers/gpu/drm/nouveau/core/subdev/devinit/nva3.c
@@ -142,4 +142,5 @@ nva3_devinit_oclass = &(struct nouveau_devinit_impl) {
 	.pll_set = nva3_devinit_pll_set,
 	.disable = nva3_devinit_disable,
 	.mmio    = nva3_devinit_mmio,
+	.post = nvbios_init,
 }.base;
diff --git a/drivers/gpu/drm/nouveau/core/subdev/devinit/nvaf.c b/drivers/gpu/drm/nouveau/core/subdev/devinit/nvaf.c
index 4fc68d27eff3..3729846a8e5c 100644
--- a/drivers/gpu/drm/nouveau/core/subdev/devinit/nvaf.c
+++ b/drivers/gpu/drm/nouveau/core/subdev/devinit/nvaf.c
@@ -60,4 +60,5 @@ nvaf_devinit_oclass = &(struct nouveau_devinit_impl) {
 	},
 	.pll_set = nva3_devinit_pll_set,
 	.disable = nvaf_devinit_disable,
+	.post = nvbios_init,
 }.base;
diff --git a/drivers/gpu/drm/nouveau/core/subdev/devinit/nvc0.c b/drivers/gpu/drm/nouveau/core/subdev/devinit/nvc0.c
index 30c765747eea..80bd7f5eda3d 100644
--- a/drivers/gpu/drm/nouveau/core/subdev/devinit/nvc0.c
+++ b/drivers/gpu/drm/nouveau/core/subdev/devinit/nvc0.c
@@ -115,4 +115,5 @@ nvc0_devinit_oclass = &(struct nouveau_devinit_impl) {
 	},
 	.pll_set = nvc0_devinit_pll_set,
 	.disable = nvc0_devinit_disable,
+	.post = nvbios_init,
 }.base;
diff --git a/drivers/gpu/drm/nouveau/core/subdev/devinit/priv.h b/drivers/gpu/drm/nouveau/core/subdev/devinit/priv.h
index f0e8683ad840..cbcd51852472 100644
--- a/drivers/gpu/drm/nouveau/core/subdev/devinit/priv.h
+++ b/drivers/gpu/drm/nouveau/core/subdev/devinit/priv.h
@@ -3,6 +3,7 @@
 
 #include <subdev/bios.h>
 #include <subdev/bios/pll.h>
+#include <subdev/bios/init.h>
 #include <subdev/clock/pll.h>
 #include <subdev/devinit.h>
 
@@ -12,6 +13,7 @@ struct nouveau_devinit_impl {
 	int  (*pll_set)(struct nouveau_devinit *, u32 type, u32 freq);
 	u64  (*disable)(struct nouveau_devinit *);
 	u32  (*mmio)(struct nouveau_devinit *, u32);
+	int  (*post)(struct nouveau_subdev *, bool);
 };
 
 #define nouveau_devinit_create(p,e,o,d)                                        \
diff --git a/drivers/gpu/drm/nouveau/core/subdev/fb/base.c b/drivers/gpu/drm/nouveau/core/subdev/fb/base.c
index f009d8a39d9d..c866148c440f 100644
--- a/drivers/gpu/drm/nouveau/core/subdev/fb/base.c
+++ b/drivers/gpu/drm/nouveau/core/subdev/fb/base.c
@@ -23,37 +23,30 @@
  */
 
 #include <subdev/bios.h>
-#include <subdev/bios/bit.h>
+#include <subdev/bios/M0203.h>
 
 #include "priv.h"
 
 int
 nouveau_fb_bios_memtype(struct nouveau_bios *bios)
 {
-	struct bit_entry M;
-	u8 ramcfg;
-
-	ramcfg = (nv_rd32(bios, 0x101000) & 0x0000003c) >> 2;
-	if (!bit_entry(bios, 'M', &M) && M.version == 2 && M.length >= 5) {
-		u16 table   = nv_ro16(bios, M.offset + 3);
-		u8  version = nv_ro08(bios, table + 0);
-		u8  header  = nv_ro08(bios, table + 1);
-		u8  record  = nv_ro08(bios, table + 2);
-		u8  entries = nv_ro08(bios, table + 3);
-		if (table && version == 0x10 && ramcfg < entries) {
-			u16 entry = table + header + (ramcfg * record);
-			switch (nv_ro08(bios, entry) & 0x0f) {
-			case 0: return NV_MEM_TYPE_DDR2;
-			case 1: return NV_MEM_TYPE_DDR3;
-			case 2: return NV_MEM_TYPE_GDDR3;
-			case 3: return NV_MEM_TYPE_GDDR5;
-			default:
-				break;
-			}
-
+	const u8 ramcfg = (nv_rd32(bios, 0x101000) & 0x0000003c) >> 2;
+	struct nvbios_M0203E M0203E;
+	u8 ver, hdr;
+
+	if (nvbios_M0203Em(bios, ramcfg, &ver, &hdr, &M0203E)) {
+		switch (M0203E.type) {
+		case M0203E_TYPE_DDR2 : return NV_MEM_TYPE_DDR2;
+		case M0203E_TYPE_DDR3 : return NV_MEM_TYPE_DDR3;
+		case M0203E_TYPE_GDDR3: return NV_MEM_TYPE_GDDR3;
+		case M0203E_TYPE_GDDR5: return NV_MEM_TYPE_GDDR5;
+		default:
+			nv_warn(bios, "M0203E type %02x\n", M0203E.type);
+			return NV_MEM_TYPE_UNKNOWN;
 		}
 	}
 
+	nv_warn(bios, "M0203E not matched!\n");
 	return NV_MEM_TYPE_UNKNOWN;
 }
 
diff --git a/drivers/gpu/drm/nouveau/core/subdev/fb/gddr3.c b/drivers/gpu/drm/nouveau/core/subdev/fb/gddr3.c
new file mode 100644
index 000000000000..d85a25d027ee
--- /dev/null
+++ b/drivers/gpu/drm/nouveau/core/subdev/fb/gddr3.c
@@ -0,0 +1,117 @@
+/*
+ * Copyright 2013 Red Hat Inc.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR
+ * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
+ * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
+ * OTHER DEALINGS IN THE SOFTWARE.
+ *
+ * Authors: Ben Skeggs <bskeggs@redhat.com>
+ * 	    Roy Spliet <rspliet@eclipso.eu>
+ */
+
+#include <subdev/bios.h>
+#include "priv.h"
+
+struct ramxlat {
+	int id;
+	u8 enc;
+};
+
+static inline int
+ramxlat(const struct ramxlat *xlat, int id)
+{
+	while (xlat->id >= 0) {
+		if (xlat->id == id)
+			return xlat->enc;
+		xlat++;
+	}
+	return -EINVAL;
+}
+
+static const struct ramxlat
+ramgddr3_cl_lo[] = {
+	{ 7, 7 }, { 8, 0 }, { 9, 1 }, { 10, 2 }, { 11, 3 },
+	/* the below are mentioned in some, but not all, gddr3 docs */
+	{ 12, 4 }, { 13, 5 }, { 14, 6 },
+	/* XXX: Per Samsung docs, are these used? They overlap with Qimonda */
+	/* { 4, 4 }, { 5, 5 }, { 6, 6 }, { 12, 8 }, { 13, 9 }, { 14, 10 },
+	 * { 15, 11 }, */
+	{ -1 }
+};
+
+static const struct ramxlat
+ramgddr3_cl_hi[] = {
+	{ 10, 2 }, { 11, 3 }, { 12, 4 }, { 13, 5 }, { 14, 6 }, { 15, 7 },
+	{ 16, 0 }, { 17, 1 },
+	{ -1 }
+};
+
+static const struct ramxlat
+ramgddr3_wr_lo[] = {
+	{ 5, 2 }, { 7, 4 }, { 8, 5 }, { 9, 6 }, { 10, 7 },
+	{ 11, 0 },
+	/* the below are mentioned in some, but not all, gddr3 docs */
+	{ 4, 1 }, { 6, 3 }, { 12, 1 }, { 13, 2 },
+	{ -1 }
+};
+
+int
+nouveau_gddr3_calc(struct nouveau_ram *ram)
+{
+	int CL, WR, CWL, DLL = 0, ODT = 0, hi;
+
+	switch (ram->next->bios.timing_ver) {
+	case 0x10:
+		CWL = ram->next->bios.timing_10_CWL;
+		CL  = ram->next->bios.timing_10_CL;
+		WR  = ram->next->bios.timing_10_WR;
+		DLL = !ram->next->bios.ramcfg_10_DLLoff;
+		ODT = ram->next->bios.timing_10_ODT;
+		break;
+	case 0x20:
+		CWL = (ram->next->bios.timing[1] & 0x00000f80) >> 7;
+		CL  = (ram->next->bios.timing[1] & 0x0000001f) >> 0;
+		WR  = (ram->next->bios.timing[2] & 0x007f0000) >> 16;
+		/* XXX: Get these values from the VBIOS instead */
+		DLL = !(ram->mr[1] & 0x1);
+		ODT =  (ram->mr[1] & 0x004) >> 2 |
+		       (ram->mr[1] & 0x040) >> 5 |
+		       (ram->mr[1] & 0x200) >> 7;
+		break;
+	default:
+		return -ENOSYS;
+	}
+
+	hi = ram->mr[2] & 0x1;
+	CL  = ramxlat(hi ? ramgddr3_cl_hi : ramgddr3_cl_lo, CL);
+	WR  = ramxlat(ramgddr3_wr_lo, WR);
+	if (CL < 0 || CWL < 1 || CWL > 7 || WR < 0)
+		return -EINVAL;
+
+	ram->mr[0] &= ~0xf74;
+	ram->mr[0] |= (CWL & 0x07) << 9;
+	ram->mr[0] |= (CL & 0x07) << 4;
+	ram->mr[0] |= (CL & 0x08) >> 1;
+
+	ram->mr[1] &= ~0x3fc;
+	ram->mr[1] |= (ODT & 0x03) << 2;
+	ram->mr[1] |= (ODT & 0x03) << 8;
+	ram->mr[1] |= (WR  & 0x03) << 4;
+	ram->mr[1] |= (WR  & 0x04) << 5;
+	ram->mr[1] |= !DLL << 6;
+	return 0;
+}
diff --git a/drivers/gpu/drm/nouveau/core/subdev/fb/priv.h b/drivers/gpu/drm/nouveau/core/subdev/fb/priv.h
index 60322e906dd4..283863f7aa9b 100644
--- a/drivers/gpu/drm/nouveau/core/subdev/fb/priv.h
+++ b/drivers/gpu/drm/nouveau/core/subdev/fb/priv.h
@@ -37,6 +37,7 @@ extern struct nouveau_oclass gm107_ram_oclass;
 
 int nouveau_sddr2_calc(struct nouveau_ram *ram);
 int nouveau_sddr3_calc(struct nouveau_ram *ram);
+int nouveau_gddr3_calc(struct nouveau_ram *ram);
 int nouveau_gddr5_calc(struct nouveau_ram *ram, bool nuts);
 
 #define nouveau_fb_create(p,e,c,d)                                             \
diff --git a/drivers/gpu/drm/nouveau/core/subdev/fb/ramfuc.h b/drivers/gpu/drm/nouveau/core/subdev/fb/ramfuc.h
index d1fbbe4b00a2..0ac7256443bb 100644
--- a/drivers/gpu/drm/nouveau/core/subdev/fb/ramfuc.h
+++ b/drivers/gpu/drm/nouveau/core/subdev/fb/ramfuc.h
@@ -141,6 +141,20 @@ ramfuc_wait_vblank(struct ramfuc *ram)
 }
 
 static inline void
+ramfuc_train(struct ramfuc *ram)
+{
+	nouveau_memx_train(ram->memx);
+}
+
+static inline int
+ramfuc_train_result(struct nouveau_fb *pfb, u32 *result, u32 rsize)
+{
+	struct nouveau_pwr *ppwr = nouveau_pwr(pfb);
+
+	return nouveau_memx_train_result(ppwr, result, rsize);
+}
+
+static inline void
 ramfuc_block(struct ramfuc *ram)
 {
 	nouveau_memx_block(ram->memx);
@@ -162,6 +176,8 @@ ramfuc_unblock(struct ramfuc *ram)
 #define ram_wait(s,r,m,d,n)  ramfuc_wait(&(s)->base, (r), (m), (d), (n))
 #define ram_nsec(s,n)        ramfuc_nsec(&(s)->base, (n))
 #define ram_wait_vblank(s)   ramfuc_wait_vblank(&(s)->base)
+#define ram_train(s)         ramfuc_train(&(s)->base)
+#define ram_train_result(s,r,l) ramfuc_train_result((s), (r), (l))
 #define ram_block(s)         ramfuc_block(&(s)->base)
 #define ram_unblock(s)       ramfuc_unblock(&(s)->base)
 
diff --git a/drivers/gpu/drm/nouveau/core/subdev/fb/ramnva3.c b/drivers/gpu/drm/nouveau/core/subdev/fb/ramnva3.c
index 3601deca0bd5..3b38a538845d 100644
--- a/drivers/gpu/drm/nouveau/core/subdev/fb/ramnva3.c
+++ b/drivers/gpu/drm/nouveau/core/subdev/fb/ramnva3.c
@@ -20,86 +20,512 @@
  * OTHER DEALINGS IN THE SOFTWARE.
  *
  * Authors: Ben Skeggs
+ * 	    Roy Spliet <rspliet@eclipso.eu>
  */
 
 #include <subdev/bios.h>
 #include <subdev/bios/bit.h>
 #include <subdev/bios/pll.h>
 #include <subdev/bios/rammap.h>
+#include <subdev/bios/M0205.h>
 #include <subdev/bios/timing.h>
 
 #include <subdev/clock/nva3.h>
 #include <subdev/clock/pll.h>
 
+#include <subdev/gpio.h>
+
+#include <subdev/timer.h>
+
+#include <engine/fifo.h>
+
 #include <core/option.h>
 
 #include "ramfuc.h"
 
 #include "nv50.h"
 
+/* XXX: Remove when memx gains GPIO support */
+extern int nv50_gpio_location(int line, u32 *reg, u32 *shift);
+
 struct nva3_ramfuc {
 	struct ramfuc base;
+	struct ramfuc_reg r_0x001610;
+	struct ramfuc_reg r_0x001700;
+	struct ramfuc_reg r_0x002504;
 	struct ramfuc_reg r_0x004000;
 	struct ramfuc_reg r_0x004004;
 	struct ramfuc_reg r_0x004018;
 	struct ramfuc_reg r_0x004128;
 	struct ramfuc_reg r_0x004168;
+	struct ramfuc_reg r_0x100080;
 	struct ramfuc_reg r_0x100200;
 	struct ramfuc_reg r_0x100210;
 	struct ramfuc_reg r_0x100220[9];
+	struct ramfuc_reg r_0x100264;
 	struct ramfuc_reg r_0x1002d0;
 	struct ramfuc_reg r_0x1002d4;
 	struct ramfuc_reg r_0x1002dc;
 	struct ramfuc_reg r_0x10053c;
 	struct ramfuc_reg r_0x1005a0;
 	struct ramfuc_reg r_0x1005a4;
+	struct ramfuc_reg r_0x100700;
 	struct ramfuc_reg r_0x100714;
 	struct ramfuc_reg r_0x100718;
 	struct ramfuc_reg r_0x10071c;
+	struct ramfuc_reg r_0x100720;
 	struct ramfuc_reg r_0x100760;
 	struct ramfuc_reg r_0x1007a0;
 	struct ramfuc_reg r_0x1007e0;
+	struct ramfuc_reg r_0x100da0;
 	struct ramfuc_reg r_0x10f804;
 	struct ramfuc_reg r_0x1110e0;
 	struct ramfuc_reg r_0x111100;
 	struct ramfuc_reg r_0x111104;
+	struct ramfuc_reg r_0x1111e0;
+	struct ramfuc_reg r_0x111400;
 	struct ramfuc_reg r_0x611200;
 	struct ramfuc_reg r_mr[4];
+	struct ramfuc_reg r_gpioFBVREF;
+};
+
+struct nva3_ltrain {
+	enum {
+		NVA3_TRAIN_UNKNOWN,
+		NVA3_TRAIN_UNSUPPORTED,
+		NVA3_TRAIN_ONCE,
+		NVA3_TRAIN_EXEC,
+		NVA3_TRAIN_DONE
+	} state;
+	u32 r_100720;
+	u32 r_1111e0;
+	u32 r_111400;
+	struct nouveau_mem *mem;
 };
 
 struct nva3_ram {
 	struct nouveau_ram base;
 	struct nva3_ramfuc fuc;
+	struct nva3_ltrain ltrain;
 };
 
+void
+nva3_link_train_calc(u32 *vals, struct nva3_ltrain *train)
+{
+	int i, lo, hi;
+	u8 median[8], bins[4] = {0, 0, 0, 0}, bin = 0, qty = 0;
+
+	for (i = 0; i < 8; i++) {
+		for (lo = 0; lo < 0x40; lo++) {
+			if (!(vals[lo] & 0x80000000))
+				continue;
+			if (vals[lo] & (0x101 << i))
+				break;
+		}
+
+		if (lo == 0x40)
+			return;
+
+		for (hi = lo + 1; hi < 0x40; hi++) {
+			if (!(vals[lo] & 0x80000000))
+				continue;
+			if (!(vals[hi] & (0x101 << i))) {
+				hi--;
+				break;
+			}
+		}
+
+		median[i] = ((hi - lo) >> 1) + lo;
+		bins[(median[i] & 0xf0) >> 4]++;
+		median[i] += 0x30;
+	}
+
+	/* Find the best value for 0x1111e0 */
+	for (i = 0; i < 4; i++) {
+		if (bins[i] > qty) {
+			bin = i + 3;
+			qty = bins[i];
+		}
+	}
+
+	train->r_100720 = 0;
+	for (i = 0; i < 8; i++) {
+		median[i] = max(median[i], (u8) (bin << 4));
+		median[i] = min(median[i], (u8) ((bin << 4) | 0xf));
+
+		train->r_100720 |= ((median[i] & 0x0f) << (i << 2));
+	}
+
+	train->r_1111e0 = 0x02000000 | (bin * 0x101);
+	train->r_111400 = 0x0;
+}
+
+/*
+ * Link training for (at least) DDR3
+ */
+int
+nva3_link_train(struct nouveau_fb *pfb)
+{
+	struct nouveau_bios *bios = nouveau_bios(pfb);
+	struct nva3_ram *ram = (void *)pfb->ram;
+	struct nouveau_clock *clk = nouveau_clock(pfb);
+	struct nva3_ltrain *train = &ram->ltrain;
+	struct nouveau_device *device = nv_device(pfb);
+	struct nva3_ramfuc *fuc = &ram->fuc;
+	u32 *result, r1700;
+	int ret, i;
+	struct nvbios_M0205T M0205T = { 0 };
+	u8 ver, hdr, cnt, len, snr, ssz;
+	unsigned int clk_current;
+	unsigned long flags;
+	unsigned long *f = &flags;
+
+	if (nouveau_boolopt(device->cfgopt, "NvMemExec", true) != true)
+		return -ENOSYS;
+
+	/* XXX: Multiple partitions? */
+	result = kmalloc(64 * sizeof(u32), GFP_KERNEL);
+	if (!result)
+		return -ENOMEM;
+
+	train->state = NVA3_TRAIN_EXEC;
+	/* Clock speeds for training and back */
+	nvbios_M0205Tp(bios, &ver, &hdr, &cnt, &len, &snr, &ssz, &M0205T);
+	if (M0205T.freq == 0) {
+		kfree(result);
+		return -ENOENT;
+	}
+
+	clk_current = clk->read(clk, nv_clk_src_mem);
+	ret = nva3_clock_pre(clk, f);
+	if (ret)
+		goto out;
+
+	/* First: clock up/down */
+	ret = ram->base.calc(pfb, (u32) M0205T.freq * 1000);
+	if (ret)
+		goto out;
+
+	/* Do this *after* calc, eliminates write in script */
+	nv_wr32(pfb, 0x111400, 0x00000000);
+	/* XXX: Magic writes that improve train reliability? */
+	nv_mask(pfb, 0x100674, 0x0000ffff, 0x00000000);
+	nv_mask(pfb, 0x1005e4, 0x0000ffff, 0x00000000);
+	nv_mask(pfb, 0x100b0c, 0x000000ff, 0x00000000);
+	nv_wr32(pfb, 0x100c04, 0x00000400);
+
+	/* Now the training script */
+	r1700 = ram_rd32(fuc, 0x001700);
+
+	ram_mask(fuc, 0x100200, 0x00000800, 0x00000000);
+	ram_wr32(fuc, 0x611200, 0x3300);
+	ram_wait_vblank(fuc);
+	ram_wait(fuc, 0x611200, 0x00000003, 0x00000000, 500000);
+	ram_mask(fuc, 0x001610, 0x00000083, 0x00000003);
+	ram_mask(fuc, 0x100080, 0x00000020, 0x00000000);
+	ram_mask(fuc, 0x10f804, 0x80000000, 0x00000000);
+	ram_wr32(fuc, 0x001700, 0x00000000);
+
+	ram_train(fuc);
+
+	/* Reset */
+	ram_mask(fuc, 0x10f804, 0x80000000, 0x80000000);
+	ram_wr32(fuc, 0x10053c, 0x0);
+	ram_wr32(fuc, 0x100720, train->r_100720);
+	ram_wr32(fuc, 0x1111e0, train->r_1111e0);
+	ram_wr32(fuc, 0x111400, train->r_111400);
+	ram_nuke(fuc, 0x100080);
+	ram_mask(fuc, 0x100080, 0x00000020, 0x00000020);
+	ram_nsec(fuc, 1000);
+
+	ram_wr32(fuc, 0x001700, r1700);
+	ram_mask(fuc, 0x001610, 0x00000083, 0x00000080);
+	ram_wr32(fuc, 0x611200, 0x3330);
+	ram_mask(fuc, 0x100200, 0x00000800, 0x00000800);
+
+	ram_exec(fuc, true);
+
+	ram->base.calc(pfb, clk_current);
+	ram_exec(fuc, true);
+
+	/* Post-processing, avoids flicker */
+	nv_mask(pfb, 0x616308, 0x10, 0x10);
+	nv_mask(pfb, 0x616b08, 0x10, 0x10);
+
+	nva3_clock_post(clk, f);
+
+	ram_train_result(pfb, result, 64);
+	for (i = 0; i < 64; i++)
+		nv_debug(pfb, "Train: %08x\n", result[i]);
+	nva3_link_train_calc(result, train);
+
+	nv_debug(pfb, "Train: %08x %08x %08x\n", train->r_100720,
+			train->r_1111e0, train->r_111400);
+
+	kfree(result);
+
+	train->state = NVA3_TRAIN_DONE;
+
+	return ret;
+
+out:
+	if (ret == -EBUSY)
+		f = NULL;
+
+	train->state = NVA3_TRAIN_UNSUPPORTED;
+	nva3_clock_post(clk, f);
+	kfree(result);
+	return ret;
+}
+
+int
+nva3_link_train_init(struct nouveau_fb *pfb)
+{
+	static const u32 pattern[16] = {
+		0xaaaaaaaa, 0xcccccccc, 0xdddddddd, 0xeeeeeeee,
+		0x00000000, 0x11111111, 0x44444444, 0xdddddddd,
+		0x33333333, 0x55555555, 0x77777777, 0x66666666,
+		0x99999999, 0x88888888, 0xeeeeeeee, 0xbbbbbbbb,
+	};
+	struct nouveau_bios *bios = nouveau_bios(pfb);
+	struct nva3_ram *ram = (void *)pfb->ram;
+	struct nva3_ltrain *train = &ram->ltrain;
+	struct nouveau_mem *mem;
+	struct nvbios_M0205E M0205E;
+	u8 ver, hdr, cnt, len;
+	u32 r001700;
+	int ret, i = 0;
+
+	train->state = NVA3_TRAIN_UNSUPPORTED;
+
+	/* We support type "5"
+	 * XXX: training pattern table appears to be unused for this routine */
+	if (!nvbios_M0205Ep(bios, i, &ver, &hdr, &cnt, &len, &M0205E))
+		return -ENOENT;
+
+	if (M0205E.type != 5)
+		return 0;
+
+	train->state = NVA3_TRAIN_ONCE;
+
+	ret = pfb->ram->get(pfb, 0x8000, 0x10000, 0, 0x800, &ram->ltrain.mem);
+	if (ret)
+		return ret;
+
+	mem = ram->ltrain.mem;
+
+	nv_wr32(pfb, 0x100538, 0x10000000 | (mem->offset >> 16));
+	nv_wr32(pfb, 0x1005a8, 0x0000ffff);
+	nv_mask(pfb, 0x10f800, 0x00000001, 0x00000001);
+
+	for (i = 0; i < 0x30; i++) {
+		nv_wr32(pfb, 0x10f8c0, (i << 8) | i);
+		nv_wr32(pfb, 0x10f900, pattern[i % 16]);
+	}
+
+	for (i = 0; i < 0x30; i++) {
+		nv_wr32(pfb, 0x10f8e0, (i << 8) | i);
+		nv_wr32(pfb, 0x10f920, pattern[i % 16]);
+	}
+
+	/* And upload the pattern */
+	r001700 = nv_rd32(pfb, 0x1700);
+	nv_wr32(pfb, 0x1700, mem->offset >> 16);
+	for (i = 0; i < 16; i++)
+		nv_wr32(pfb, 0x700000 + (i << 2), pattern[i]);
+	for (i = 0; i < 16; i++)
+		nv_wr32(pfb, 0x700100 + (i << 2), pattern[i]);
+	nv_wr32(pfb, 0x1700, r001700);
+
+	train->r_100720 = nv_rd32(pfb, 0x100720);
+	train->r_1111e0 = nv_rd32(pfb, 0x1111e0);
+	train->r_111400 = nv_rd32(pfb, 0x111400);
+
+	return 0;
+}
+
+void
+nva3_link_train_fini(struct nouveau_fb *pfb)
+{
+	struct nva3_ram *ram = (void *)pfb->ram;
+
+	if (ram->ltrain.mem)
+		pfb->ram->put(pfb, &ram->ltrain.mem);
+}
+
+/*
+ * RAM reclocking
+ */
+#define T(t) cfg->timing_10_##t
+static int
+nva3_ram_timing_calc(struct nouveau_fb *pfb, u32 *timing)
+{
+	struct nva3_ram *ram = (void *)pfb->ram;
+	struct nvbios_ramcfg *cfg = &ram->base.target.bios;
+	int tUNK_base, tUNK_40_0, prevCL;
+	u32 cur2, cur3, cur7, cur8;
+
+	cur2 = nv_rd32(pfb, 0x100228);
+	cur3 = nv_rd32(pfb, 0x10022c);
+	cur7 = nv_rd32(pfb, 0x10023c);
+	cur8 = nv_rd32(pfb, 0x100240);
+
+
+	switch ((!T(CWL)) * ram->base.type) {
+	case NV_MEM_TYPE_DDR2:
+		T(CWL) = T(CL) - 1;
+		break;
+	case NV_MEM_TYPE_GDDR3:
+		T(CWL) = ((cur2 & 0xff000000) >> 24) + 1;
+		break;
+	}
+
+	prevCL = (cur3 & 0x000000ff) + 1;
+	tUNK_base = ((cur7 & 0x00ff0000) >> 16) - prevCL;
+
+	timing[0] = (T(RP) << 24 | T(RAS) << 16 | T(RFC) << 8 | T(RC));
+	timing[1] = (T(WR) + 1 + T(CWL)) << 24 |
+		    max_t(u8, T(18), 1) << 16 |
+		    (T(WTR) + 1 + T(CWL)) << 8 |
+		    (5 + T(CL) - T(CWL));
+	timing[2] = (T(CWL) - 1) << 24 |
+		    (T(RRD) << 16) |
+		    (T(RCDWR) << 8) |
+		    T(RCDRD);
+	timing[3] = (cur3 & 0x00ff0000) |
+		    (0x30 + T(CL)) << 24 |
+		    (0xb + T(CL)) << 8 |
+		    (T(CL) - 1);
+	timing[4] = T(20) << 24 |
+		    T(21) << 16 |
+		    T(13) << 8 |
+		    T(13);
+	timing[5] = T(RFC) << 24 |
+		    max_t(u8, T(RCDRD), T(RCDWR)) << 16 |
+		    max_t(u8, (T(CWL) + 6), (T(CL) + 2)) << 8 |
+		    T(RP);
+	timing[6] = (0x5a + T(CL)) << 16 |
+		    max_t(u8, 1, (6 - T(CL) + T(CWL))) << 8 |
+		    (0x50 + T(CL) - T(CWL));
+	timing[7] = (cur7 & 0xff000000) |
+		    ((tUNK_base + T(CL)) << 16) |
+		    0x202;
+	timing[8] = cur8 & 0xffffff00;
+
+	switch (ram->base.type) {
+	case NV_MEM_TYPE_DDR2:
+	case NV_MEM_TYPE_GDDR3:
+		tUNK_40_0 = prevCL - (cur8 & 0xff);
+		if (tUNK_40_0 > 0)
+			timing[8] |= T(CL);
+		break;
+	default:
+		break;
+	}
+
+	nv_debug(pfb, "Entry: 220: %08x %08x %08x %08x\n",
+			timing[0], timing[1], timing[2], timing[3]);
+	nv_debug(pfb, "  230: %08x %08x %08x %08x\n",
+			timing[4], timing[5], timing[6], timing[7]);
+	nv_debug(pfb, "  240: %08x\n", timing[8]);
+	return 0;
+}
+#undef T
+
+static void
+nouveau_sddr2_dll_reset(struct nva3_ramfuc *fuc)
+{
+	ram_mask(fuc, mr[0], 0x100, 0x100);
+	ram_nsec(fuc, 1000);
+	ram_mask(fuc, mr[0], 0x100, 0x000);
+	ram_nsec(fuc, 1000);
+}
+
+static void
+nouveau_sddr3_dll_disable(struct nva3_ramfuc *fuc, u32 *mr)
+{
+	u32 mr1_old = ram_rd32(fuc, mr[1]);
+
+	if (!(mr1_old & 0x1)) {
+		ram_wr32(fuc, 0x1002d4, 0x00000001);
+		ram_wr32(fuc, mr[1], mr[1]);
+		ram_nsec(fuc, 1000);
+	}
+}
+
+static void
+nouveau_gddr3_dll_disable(struct nva3_ramfuc *fuc, u32 *mr)
+{
+	u32 mr1_old = ram_rd32(fuc, mr[1]);
+
+	if (!(mr1_old & 0x40)) {
+		ram_wr32(fuc, mr[1], mr[1]);
+		ram_nsec(fuc, 1000);
+	}
+}
+
+static void
+nva3_ram_lock_pll(struct nva3_ramfuc *fuc, struct nva3_clock_info *mclk)
+{
+	ram_wr32(fuc, 0x004004, mclk->pll);
+	ram_mask(fuc, 0x004000, 0x00000001, 0x00000001);
+	ram_mask(fuc, 0x004000, 0x00000010, 0x00000000);
+	ram_wait(fuc, 0x004000, 0x00020000, 0x00020000, 64000);
+	ram_mask(fuc, 0x004000, 0x00000010, 0x00000010);
+}
+
+static void
+nva3_ram_fbvref(struct nva3_ramfuc *fuc, u32 val)
+{
+	struct nouveau_gpio *gpio = nouveau_gpio(fuc->base.pfb);
+	struct dcb_gpio_func func;
+	u32 reg, sh, gpio_val;
+	int ret;
+
+	if (gpio->get(gpio, 0, 0x2e, DCB_GPIO_UNUSED) != val) {
+		ret = gpio->find(gpio, 0, 0x2e, DCB_GPIO_UNUSED, &func);
+		if (ret)
+			return;
+
+		nv50_gpio_location(func.line, &reg, &sh);
+		gpio_val = ram_rd32(fuc, gpioFBVREF);
+		if (gpio_val & (8 << sh))
+			val = !val;
+
+		ram_mask(fuc, gpioFBVREF, (0x3 << sh), ((val | 0x2) << sh));
+		ram_nsec(fuc, 20000);
+	}
+}
+
 static int
 nva3_ram_calc(struct nouveau_fb *pfb, u32 freq)
 {
 	struct nouveau_bios *bios = nouveau_bios(pfb);
 	struct nva3_ram *ram = (void *)pfb->ram;
 	struct nva3_ramfuc *fuc = &ram->fuc;
+	struct nva3_ltrain *train = &ram->ltrain;
 	struct nva3_clock_info mclk;
 	struct nouveau_ram_data *next;
 	u8  ver, hdr, cnt, len, strap;
 	u32 data;
-	u32 r004018, r100760, ctrl;
+	u32 r004018, r100760, r100da0, r111100, ctrl;
 	u32 unk714, unk718, unk71c;
 	int ret, i;
+	u32 timing[9];
+	bool pll2pll;
 
 	next = &ram->base.target;
 	next->freq = freq;
 	ram->base.next = next;
 
+	if (ram->ltrain.state == NVA3_TRAIN_ONCE)
+		nva3_link_train(pfb);
+
 	/* lookup memory config data relevant to the target frequency */
 	i = 0;
-	while ((data = nvbios_rammapEp(bios, i++, &ver, &hdr, &cnt, &len,
-				      &next->bios))) {
-		if (freq / 1000 >= next->bios.rammap_min &&
-		    freq / 1000 <= next->bios.rammap_max)
-			break;
-	}
-
-	if (!data || ver != 0x10 || hdr < 0x0e) {
+	data = nvbios_rammapEm(bios, freq / 1000, &ver, &hdr, &cnt, &len,
+				      &next->bios);
+	if (!data || ver != 0x10 || hdr < 0x05) {
 		nv_error(pfb, "invalid/missing rammap entry\n");
 		return -EINVAL;
 	}
@@ -113,7 +539,7 @@ nva3_ram_calc(struct nouveau_fb *pfb, u32 freq)
 
 	data = nvbios_rammapSp(bios, data, ver, hdr, cnt, len, strap,
 			       &ver, &hdr, &next->bios);
-	if (!data || ver != 0x10 || hdr < 0x0e) {
+	if (!data || ver != 0x10 || hdr < 0x09) {
 		nv_error(pfb, "invalid/missing ramcfg entry\n");
 		return -EINVAL;
 	}
@@ -123,7 +549,7 @@ nva3_ram_calc(struct nouveau_fb *pfb, u32 freq)
 		data = nvbios_timingEp(bios, next->bios.ramcfg_timing,
 				       &ver, &hdr, &cnt, &len,
 				       &next->bios);
-		if (!data || ver != 0x10 || hdr < 0x19) {
+		if (!data || ver != 0x10 || hdr < 0x17) {
 			nv_error(pfb, "invalid/missing timing entry\n");
 			return -EINVAL;
 		}
@@ -135,53 +561,99 @@ nva3_ram_calc(struct nouveau_fb *pfb, u32 freq)
 		return ret;
 	}
 
+	nva3_ram_timing_calc(pfb, timing);
+
 	ret = ram_init(fuc, pfb);
 	if (ret)
 		return ret;
 
+	/* Determine ram-specific MR values */
+	ram->base.mr[0] = ram_rd32(fuc, mr[0]);
+	ram->base.mr[1] = ram_rd32(fuc, mr[1]);
+	ram->base.mr[2] = ram_rd32(fuc, mr[2]);
+
+	switch (ram->base.type) {
+	case NV_MEM_TYPE_DDR2:
+		ret = nouveau_sddr2_calc(&ram->base);
+		break;
+	case NV_MEM_TYPE_DDR3:
+		ret = nouveau_sddr3_calc(&ram->base);
+		break;
+	case NV_MEM_TYPE_GDDR3:
+		ret = nouveau_gddr3_calc(&ram->base);
+		break;
+	default:
+		ret = -ENOSYS;
+		break;
+	}
+
+	if (ret)
+		return ret;
+
 	/* XXX: where the fuck does 750MHz come from? */
 	if (freq <= 750000) {
 		r004018 = 0x10000000;
 		r100760 = 0x22222222;
+		r100da0 = 0x00000010;
 	} else {
 		r004018 = 0x00000000;
 		r100760 = 0x00000000;
+		r100da0 = 0x00000000;
 	}
 
+	if (!next->bios.ramcfg_10_DLLoff)
+		r004018 |= 0x00004000;
+
+	/* pll2pll requires switching to a safe clock first */
 	ctrl = ram_rd32(fuc, 0x004000);
-	if (ctrl & 0x00000008) {
-		if (mclk.pll) {
-			ram_mask(fuc, 0x004128, 0x00000101, 0x00000101);
-			ram_wr32(fuc, 0x004004, mclk.pll);
-			ram_wr32(fuc, 0x004000, (ctrl |= 0x00000001));
-			ram_wr32(fuc, 0x004000, (ctrl &= 0xffffffef));
-			ram_wait(fuc, 0x004000, 0x00020000, 0x00020000, 64000);
-			ram_wr32(fuc, 0x004000, (ctrl |= 0x00000010));
-			ram_wr32(fuc, 0x004018, 0x00005000 | r004018);
-			ram_wr32(fuc, 0x004000, (ctrl |= 0x00000004));
-		}
-	} else {
-		u32 ssel = 0x00000101;
-		if (mclk.clk)
-			ssel |= mclk.clk;
-		else
-			ssel |= 0x00080000; /* 324MHz, shouldn't matter... */
-		ram_mask(fuc, 0x004168, 0x003f3141, ctrl);
-	}
+	pll2pll = (!(ctrl & 0x00000008)) && mclk.pll;
 
+	/* Pre, NVIDIA does this outside the script */
 	if (next->bios.ramcfg_10_02_10) {
 		ram_mask(fuc, 0x111104, 0x00000600, 0x00000000);
 	} else {
 		ram_mask(fuc, 0x111100, 0x40000000, 0x40000000);
 		ram_mask(fuc, 0x111104, 0x00000180, 0x00000000);
 	}
+	/* Always disable this bit during reclock */
+	ram_mask(fuc, 0x100200, 0x00000800, 0x00000000);
+
+	/* If switching from non-pll to pll, lock before disabling FB */
+	if (mclk.pll && !pll2pll) {
+		ram_mask(fuc, 0x004128, 0x003f3141, mclk.clk | 0x00000101);
+		nva3_ram_lock_pll(fuc, &mclk);
+	}
+
+	/* Start with disabling some CRTCs and PFIFO? */
+	ram_wait_vblank(fuc);
+	ram_wr32(fuc, 0x611200, 0x3300);
+	ram_mask(fuc, 0x002504, 0x1, 0x1);
+	ram_nsec(fuc, 10000);
+	ram_wait(fuc, 0x002504, 0x10, 0x10, 20000); /* XXX: or longer? */
+	ram_block(fuc);
+	ram_nsec(fuc, 2000);
+
+	if (!next->bios.ramcfg_10_02_10) {
+		if (ram->base.type == NV_MEM_TYPE_GDDR3)
+			ram_mask(fuc, 0x111100, 0x04020000, 0x00020000);
+		else
+			ram_mask(fuc, 0x111100, 0x04020000, 0x04020000);
+	}
+
+	/* If we're disabling the DLL, do it now */
+	switch (next->bios.ramcfg_10_DLLoff * ram->base.type) {
+	case NV_MEM_TYPE_DDR3:
+		nouveau_sddr3_dll_disable(fuc, ram->base.mr);
+		break;
+	case NV_MEM_TYPE_GDDR3:
+		nouveau_gddr3_dll_disable(fuc, ram->base.mr);
+		break;
+	}
 
-	if (!next->bios.rammap_10_04_02)
-		ram_mask(fuc, 0x100200, 0x00000800, 0x00000000);
-	ram_wr32(fuc, 0x611200, 0x00003300);
-	if (!next->bios.ramcfg_10_02_10)
-		ram_wr32(fuc, 0x111100, 0x4c020000); /*XXX*/
+	if (fuc->r_gpioFBVREF.addr && next->bios.timing_10_ODT)
+		nva3_ram_fbvref(fuc, 0);
 
+	/* Brace RAM for impact */
 	ram_wr32(fuc, 0x1002d4, 0x00000001);
 	ram_wr32(fuc, 0x1002d0, 0x00000001);
 	ram_wr32(fuc, 0x1002d0, 0x00000001);
@@ -189,24 +661,38 @@ nva3_ram_calc(struct nouveau_fb *pfb, u32 freq)
 	ram_wr32(fuc, 0x1002dc, 0x00000001);
 	ram_nsec(fuc, 2000);
 
-	ctrl = ram_rd32(fuc, 0x004000);
-	if (!(ctrl & 0x00000008) && mclk.pll) {
-		ram_wr32(fuc, 0x004000, (ctrl |=  0x00000008));
+	if (nv_device(pfb)->chipset == 0xa3 && freq <= 500000)
+		ram_mask(fuc, 0x100700, 0x00000006, 0x00000006);
+
+	/* Fiddle with clocks */
+	/* There are 4 scenarios:
+	 * pll->pll: first switch to a 324MHz clock, set up new PLL, switch
+	 * clk->pll: Set up new PLL, switch
+	 * pll->clk: Set up clock, switch
+	 * clk->clk: Overwrite ctrl and other bits, switch */
+
+	/* Switch to regular clock - 324MHz */
+	if (pll2pll) {
+		ram_mask(fuc, 0x004000, 0x00000004, 0x00000004);
+		ram_mask(fuc, 0x004168, 0x003f3141, 0x00083101);
+		ram_mask(fuc, 0x004000, 0x00000008, 0x00000008);
 		ram_mask(fuc, 0x1110e0, 0x00088000, 0x00088000);
 		ram_wr32(fuc, 0x004018, 0x00001000);
-		ram_wr32(fuc, 0x004000, (ctrl &= ~0x00000001));
-		ram_wr32(fuc, 0x004004, mclk.pll);
-		ram_wr32(fuc, 0x004000, (ctrl |=  0x00000001));
-		udelay(64);
-		ram_wr32(fuc, 0x004018, 0x00005000 | r004018);
-		udelay(20);
-	} else
-	if (!mclk.pll) {
-		ram_mask(fuc, 0x004168, 0x003f3040, mclk.clk);
-		ram_wr32(fuc, 0x004000, (ctrl |= 0x00000008));
+		nva3_ram_lock_pll(fuc, &mclk);
+	}
+
+	if (mclk.pll) {
+		ram_mask(fuc, 0x004000, 0x00000105, 0x00000105);
+		ram_wr32(fuc, 0x004018, 0x00001000 | r004018);
+		ram_wr32(fuc, 0x100da0, r100da0);
+	} else {
+		ram_mask(fuc, 0x004168, 0x003f3141, mclk.clk | 0x00000101);
+		ram_mask(fuc, 0x004000, 0x00000108, 0x00000008);
 		ram_mask(fuc, 0x1110e0, 0x00088000, 0x00088000);
-		ram_wr32(fuc, 0x004018, 0x0000d000 | r004018);
+		ram_wr32(fuc, 0x004018, 0x00009000 | r004018);
+		ram_wr32(fuc, 0x100da0, r100da0);
 	}
+	ram_nsec(fuc, 20000);
 
 	if (next->bios.rammap_10_04_08) {
 		ram_wr32(fuc, 0x1005a0, next->bios.ramcfg_10_06 << 16 |
@@ -220,6 +706,12 @@ nva3_ram_calc(struct nouveau_fb *pfb, u32 freq)
 					0x80000000);
 		ram_mask(fuc, 0x10053c, 0x00001000, 0x00000000);
 	} else {
+		if (train->state == NVA3_TRAIN_DONE) {
+			ram_wr32(fuc, 0x100080, 0x1020);
+			ram_mask(fuc, 0x111400, 0xffffffff, train->r_111400);
+			ram_mask(fuc, 0x1111e0, 0xffffffff, train->r_1111e0);
+			ram_mask(fuc, 0x100720, 0xffffffff, train->r_100720);
+		}
 		ram_mask(fuc, 0x10053c, 0x00001000, 0x00001000);
 		ram_mask(fuc, 0x10f804, 0x80000000, 0x00000000);
 		ram_mask(fuc, 0x100760, 0x22222222, r100760);
@@ -227,65 +719,131 @@ nva3_ram_calc(struct nouveau_fb *pfb, u32 freq)
 		ram_mask(fuc, 0x1007e0, 0x22222222, r100760);
 	}
 
+	if (nv_device(pfb)->chipset == 0xa3 && freq > 500000) {
+		ram_mask(fuc, 0x100700, 0x00000006, 0x00000000);
+	}
+
+	/* Final switch */
 	if (mclk.pll) {
 		ram_mask(fuc, 0x1110e0, 0x00088000, 0x00011000);
-		ram_wr32(fuc, 0x004000, (ctrl &= ~0x00000008));
+		ram_mask(fuc, 0x004000, 0x00000008, 0x00000000);
 	}
 
-	/*XXX: LEAVE */
 	ram_wr32(fuc, 0x1002dc, 0x00000000);
 	ram_wr32(fuc, 0x1002d4, 0x00000001);
 	ram_wr32(fuc, 0x100210, 0x80000000);
-	ram_nsec(fuc, 1000);
-	ram_nsec(fuc, 1000);
+	ram_nsec(fuc, 2000);
 
-	ram_mask(fuc, mr[2], 0x00000000, 0x00000000);
-	ram_nsec(fuc, 1000);
-	ram_nuke(fuc, mr[0]);
-	ram_mask(fuc, mr[0], 0x00000000, 0x00000000);
-	ram_nsec(fuc, 1000);
+	/* Set RAM MR parameters and timings */
+	for (i = 2; i >= 0; i--) {
+		if (ram_rd32(fuc, mr[i]) != ram->base.mr[i]) {
+			ram_wr32(fuc, mr[i], ram->base.mr[i]);
+			ram_nsec(fuc, 1000);
+		}
+	}
 
-	ram_mask(fuc, 0x100220[3], 0x00000000, 0x00000000);
-	ram_mask(fuc, 0x100220[1], 0x00000000, 0x00000000);
-	ram_mask(fuc, 0x100220[6], 0x00000000, 0x00000000);
-	ram_mask(fuc, 0x100220[7], 0x00000000, 0x00000000);
-	ram_mask(fuc, 0x100220[2], 0x00000000, 0x00000000);
-	ram_mask(fuc, 0x100220[4], 0x00000000, 0x00000000);
-	ram_mask(fuc, 0x100220[5], 0x00000000, 0x00000000);
-	ram_mask(fuc, 0x100220[0], 0x00000000, 0x00000000);
-	ram_mask(fuc, 0x100220[8], 0x00000000, 0x00000000);
+	ram_wr32(fuc, 0x100220[3], timing[3]);
+	ram_wr32(fuc, 0x100220[1], timing[1]);
+	ram_wr32(fuc, 0x100220[6], timing[6]);
+	ram_wr32(fuc, 0x100220[7], timing[7]);
+	ram_wr32(fuc, 0x100220[2], timing[2]);
+	ram_wr32(fuc, 0x100220[4], timing[4]);
+	ram_wr32(fuc, 0x100220[5], timing[5]);
+	ram_wr32(fuc, 0x100220[0], timing[0]);
+	ram_wr32(fuc, 0x100220[8], timing[8]);
 
+	/* Misc */
 	ram_mask(fuc, 0x100200, 0x00001000, !next->bios.ramcfg_10_02_08 << 12);
 
-	unk714 = ram_rd32(fuc, 0x100714) & ~0xf0000010;
-	unk718 = ram_rd32(fuc, 0x100718) & ~0x00000100;
-	unk71c = ram_rd32(fuc, 0x10071c) & ~0x00000100;
+	/* XXX: A lot of "chipset"/"ram type" specific stuff...? */
+	unk714  = ram_rd32(fuc, 0x100714) & ~0xf0000130;
+	unk718  = ram_rd32(fuc, 0x100718) & ~0x00000100;
+	unk71c  = ram_rd32(fuc, 0x10071c) & ~0x00000100;
+	r111100 = ram_rd32(fuc, 0x111100) & ~0x3a800000;
+
+	if (next->bios.ramcfg_10_02_04) {
+		switch (ram->base.type) {
+		case NV_MEM_TYPE_DDR3:
+			if (nv_device(pfb)->chipset != 0xa8)
+				r111100 |= 0x00000004;
+			/* fall through */
+		case NV_MEM_TYPE_DDR2:
+			r111100 |= 0x08000000;
+			break;
+		default:
+			break;
+		}
+	} else {
+		switch (ram->base.type) {
+		case NV_MEM_TYPE_DDR2:
+			r111100 |= 0x1a800000;
+			unk714  |= 0x00000010;
+			break;
+		case NV_MEM_TYPE_DDR3:
+			if (nv_device(pfb)->chipset == 0xa8) {
+				r111100 |=  0x08000000;
+			} else {
+				r111100 &= ~0x00000004;
+				r111100 |=  0x12800000;
+			}
+			unk714  |= 0x00000010;
+			break;
+		case NV_MEM_TYPE_GDDR3:
+			r111100 |= 0x30000000;
+			unk714  |= 0x00000020;
+			break;
+		default:
+			break;
+		}
+	}
+
+	unk714 |= (next->bios.ramcfg_10_04_01) << 8;
+
 	if (next->bios.ramcfg_10_02_20)
 		unk714 |= 0xf0000000;
-	if (!next->bios.ramcfg_10_02_04)
-		unk714 |= 0x00000010;
-	ram_wr32(fuc, 0x100714, unk714);
-
+	if (next->bios.ramcfg_10_02_02)
+		unk718 |= 0x00000100;
 	if (next->bios.ramcfg_10_02_01)
 		unk71c |= 0x00000100;
-	ram_wr32(fuc, 0x10071c, unk71c);
+	if (next->bios.timing_10_24 != 0xff) {
+		unk718 &= ~0xf0000000;
+		unk718 |= next->bios.timing_10_24 << 28;
+	}
+	if (next->bios.ramcfg_10_02_10)
+		r111100 &= ~0x04020000;
 
-	if (next->bios.ramcfg_10_02_02)
-		unk718 |= 0x00000100;
-	ram_wr32(fuc, 0x100718, unk718);
+	ram_mask(fuc, 0x100714, 0xffffffff, unk714);
+	ram_mask(fuc, 0x10071c, 0xffffffff, unk71c);
+	ram_mask(fuc, 0x100718, 0xffffffff, unk718);
+	ram_mask(fuc, 0x111100, 0xffffffff, r111100);
 
-	if (next->bios.ramcfg_10_02_10)
-		ram_wr32(fuc, 0x111100, 0x48000000); /*XXX*/
+	if (fuc->r_gpioFBVREF.addr && !next->bios.timing_10_ODT)
+		nva3_ram_fbvref(fuc, 1);
 
-	ram_mask(fuc, mr[0], 0x100, 0x100);
-	ram_nsec(fuc, 1000);
-	ram_mask(fuc, mr[0], 0x100, 0x000);
-	ram_nsec(fuc, 1000);
+	/* Reset DLL */
+	if (!next->bios.ramcfg_10_DLLoff)
+		nouveau_sddr2_dll_reset(fuc);
 
-	ram_nsec(fuc, 2000);
-	ram_nsec(fuc, 12000);
+	if (ram->base.type == NV_MEM_TYPE_GDDR3) {
+		ram_nsec(fuc, 31000);
+	} else {
+		ram_nsec(fuc, 14000);
+	}
+
+	if (ram->base.type == NV_MEM_TYPE_DDR3) {
+		ram_wr32(fuc, 0x100264, 0x1);
+		ram_nsec(fuc, 2000);
+	}
 
-	ram_wr32(fuc, 0x611200, 0x00003330);
+	ram_nuke(fuc, 0x100700);
+	ram_mask(fuc, 0x100700, 0x01000000, 0x01000000);
+	ram_mask(fuc, 0x100700, 0x01000000, 0x00000000);
+
+	/* Re-enable FB */
+	ram_unblock(fuc);
+	ram_wr32(fuc, 0x611200, 0x3330);
+
+	/* Post fiddlings */
 	if (next->bios.rammap_10_04_02)
 		ram_mask(fuc, 0x100200, 0x00000800, 0x00000800);
 	if (next->bios.ramcfg_10_02_10) {
@@ -313,7 +871,22 @@ nva3_ram_prog(struct nouveau_fb *pfb)
 	struct nouveau_device *device = nv_device(pfb);
 	struct nva3_ram *ram = (void *)pfb->ram;
 	struct nva3_ramfuc *fuc = &ram->fuc;
-	ram_exec(fuc, nouveau_boolopt(device->cfgopt, "NvMemExec", true));
+	bool exec = nouveau_boolopt(device->cfgopt, "NvMemExec", true);
+
+	if (exec) {
+		nv_mask(pfb, 0x001534, 0x2, 0x2);
+
+		ram_exec(fuc, true);
+
+		/* Post-processing, avoids flicker */
+		nv_mask(pfb, 0x002504, 0x1, 0x0);
+		nv_mask(pfb, 0x001534, 0x2, 0x0);
+
+		nv_mask(pfb, 0x616308, 0x10, 0x10);
+		nv_mask(pfb, 0x616b08, 0x10, 0x10);
+	} else {
+		ram_exec(fuc, false);
+	}
 	return 0;
 }
 
@@ -330,38 +903,24 @@ nva3_ram_init(struct nouveau_object *object)
 {
 	struct nouveau_fb *pfb = (void *)object->parent;
 	struct nva3_ram   *ram = (void *)object;
-	int ret, i;
+	int ret;
 
 	ret = nouveau_ram_init(&ram->base);
 	if (ret)
 		return ret;
 
-	/* prepare for ddr link training, and load training patterns */
-	switch (ram->base.type) {
-	case NV_MEM_TYPE_DDR3: {
-		if (nv_device(pfb)->chipset == 0xa8) {
-			static const u32 pattern[16] = {
-				0xaaaaaaaa, 0xcccccccc, 0xdddddddd, 0xeeeeeeee,
-				0x00000000, 0x11111111, 0x44444444, 0xdddddddd,
-				0x33333333, 0x55555555, 0x77777777, 0x66666666,
-				0x99999999, 0x88888888, 0xeeeeeeee, 0xbbbbbbbb,
-			};
-
-			nv_wr32(pfb, 0x100538, 0x10001ff6); /*XXX*/
-			nv_wr32(pfb, 0x1005a8, 0x0000ffff);
-			nv_mask(pfb, 0x10f800, 0x00000001, 0x00000001);
-			for (i = 0; i < 0x30; i++) {
-				nv_wr32(pfb, 0x10f8c0, (i << 8) | i);
-				nv_wr32(pfb, 0x10f8e0, (i << 8) | i);
-				nv_wr32(pfb, 0x10f900, pattern[i % 16]);
-				nv_wr32(pfb, 0x10f920, pattern[i % 16]);
-			}
-		}
-	}
-		break;
-	default:
-		break;
-	}
+	nva3_link_train_init(pfb);
+
+	return 0;
+}
+
+static int
+nva3_ram_fini(struct nouveau_object *object, bool suspend)
+{
+	struct nouveau_fb *pfb = (void *)object->parent;
+
+	if (!suspend)
+		nva3_link_train_fini(pfb);
 
 	return 0;
 }
@@ -371,8 +930,12 @@ nva3_ram_ctor(struct nouveau_object *parent, struct nouveau_object *engine,
 	      struct nouveau_oclass *oclass, void *data, u32 datasize,
 	      struct nouveau_object **pobject)
 {
+	struct nouveau_fb *pfb = nouveau_fb(parent);
+	struct nouveau_gpio *gpio = nouveau_gpio(pfb);
+	struct dcb_gpio_func func;
 	struct nva3_ram *ram;
 	int ret, i;
+	u32 reg, shift;
 
 	ret = nv50_ram_create(parent, engine, oclass, &ram);
 	*pobject = nv_object(ram);
@@ -380,7 +943,9 @@ nva3_ram_ctor(struct nouveau_object *parent, struct nouveau_object *engine,
 		return ret;
 
 	switch (ram->base.type) {
+	case NV_MEM_TYPE_DDR2:
 	case NV_MEM_TYPE_DDR3:
+	case NV_MEM_TYPE_GDDR3:
 		ram->base.calc = nva3_ram_calc;
 		ram->base.prog = nva3_ram_prog;
 		ram->base.tidy = nva3_ram_tidy;
@@ -390,31 +955,41 @@ nva3_ram_ctor(struct nouveau_object *parent, struct nouveau_object *engine,
 		return 0;
 	}
 
+	ram->fuc.r_0x001610 = ramfuc_reg(0x001610);
+	ram->fuc.r_0x001700 = ramfuc_reg(0x001700);
+	ram->fuc.r_0x002504 = ramfuc_reg(0x002504);
 	ram->fuc.r_0x004000 = ramfuc_reg(0x004000);
 	ram->fuc.r_0x004004 = ramfuc_reg(0x004004);
 	ram->fuc.r_0x004018 = ramfuc_reg(0x004018);
 	ram->fuc.r_0x004128 = ramfuc_reg(0x004128);
 	ram->fuc.r_0x004168 = ramfuc_reg(0x004168);
+	ram->fuc.r_0x100080 = ramfuc_reg(0x100080);
 	ram->fuc.r_0x100200 = ramfuc_reg(0x100200);
 	ram->fuc.r_0x100210 = ramfuc_reg(0x100210);
 	for (i = 0; i < 9; i++)
 		ram->fuc.r_0x100220[i] = ramfuc_reg(0x100220 + (i * 4));
+	ram->fuc.r_0x100264 = ramfuc_reg(0x100264);
 	ram->fuc.r_0x1002d0 = ramfuc_reg(0x1002d0);
 	ram->fuc.r_0x1002d4 = ramfuc_reg(0x1002d4);
 	ram->fuc.r_0x1002dc = ramfuc_reg(0x1002dc);
 	ram->fuc.r_0x10053c = ramfuc_reg(0x10053c);
 	ram->fuc.r_0x1005a0 = ramfuc_reg(0x1005a0);
 	ram->fuc.r_0x1005a4 = ramfuc_reg(0x1005a4);
+	ram->fuc.r_0x100700 = ramfuc_reg(0x100700);
 	ram->fuc.r_0x100714 = ramfuc_reg(0x100714);
 	ram->fuc.r_0x100718 = ramfuc_reg(0x100718);
 	ram->fuc.r_0x10071c = ramfuc_reg(0x10071c);
+	ram->fuc.r_0x100720 = ramfuc_reg(0x100720);
 	ram->fuc.r_0x100760 = ramfuc_stride(0x100760, 4, ram->base.part_mask);
 	ram->fuc.r_0x1007a0 = ramfuc_stride(0x1007a0, 4, ram->base.part_mask);
 	ram->fuc.r_0x1007e0 = ramfuc_stride(0x1007e0, 4, ram->base.part_mask);
+	ram->fuc.r_0x100da0 = ramfuc_stride(0x100da0, 4, ram->base.part_mask);
 	ram->fuc.r_0x10f804 = ramfuc_reg(0x10f804);
 	ram->fuc.r_0x1110e0 = ramfuc_stride(0x1110e0, 4, ram->base.part_mask);
 	ram->fuc.r_0x111100 = ramfuc_reg(0x111100);
 	ram->fuc.r_0x111104 = ramfuc_reg(0x111104);
+	ram->fuc.r_0x1111e0 = ramfuc_reg(0x1111e0);
+	ram->fuc.r_0x111400 = ramfuc_reg(0x111400);
 	ram->fuc.r_0x611200 = ramfuc_reg(0x611200);
 
 	if (ram->base.ranks > 1) {
@@ -429,6 +1004,12 @@ nva3_ram_ctor(struct nouveau_object *parent, struct nouveau_object *engine,
 		ram->fuc.r_mr[3] = ramfuc_reg(0x1002e4);
 	}
 
+	ret = gpio->find(gpio, 0, 0x2e, DCB_GPIO_UNUSED, &func);
+	if (ret == 0) {
+		nv50_gpio_location(func.line, &reg, &shift);
+		ram->fuc.r_gpioFBVREF = ramfuc_reg(reg);
+	}
+
 	return 0;
 }
 
@@ -438,6 +1019,6 @@ nva3_ram_oclass = {
 		.ctor = nva3_ram_ctor,
 		.dtor = _nouveau_ram_dtor,
 		.init = nva3_ram_init,
-		.fini = _nouveau_ram_fini,
+		.fini = nva3_ram_fini,
 	},
 };
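
Reviewer note on nva3_link_train_calc() in the nva3 RAM changes above: the routine finds, for each of 8 lanes, the window of delay settings that pass training, takes the middle of that window, votes on a shared coarse bin, and packs the clamped per-lane fine delays into one register value. The standalone sketch below restates that logic outside the driver so it can be compiled and checked in userspace. The names ltrain_calc and struct ltrain_result are hypothetical, the reading of bit 31 as a "sample valid" flag and of bits i and i+8 as "lane i passed" is an assumption drawn from the masks in the patch, and the upper-bound scan here skips invalid samples, which is a guess at the intent rather than a statement of what the hardware requires.

```c
#include <stdint.h>

/* Hypothetical container for the three values the driver stores. */
struct ltrain_result {
	uint32_t r_100720;	/* one fine-delay nibble per lane */
	uint32_t r_1111e0;	/* coarse bin, replicated into two bytes */
	uint32_t r_111400;
};

/*
 * vals[] holds 64 training samples, one per delay setting.
 * Assumption: bit 31 marks a valid sample, bits i and i+8 mean lane i passed.
 * Returns 0 on success, -1 if some lane never passes.
 */
int ltrain_calc(const uint32_t vals[64], struct ltrain_result *out)
{
	uint8_t median[8], bins[4] = { 0 }, bin = 0, qty = 0;
	int i, lo, hi;

	for (i = 0; i < 8; i++) {
		uint32_t lane = 0x101u << i;

		/* First valid sample at which this lane passes. */
		for (lo = 0; lo < 64; lo++)
			if ((vals[lo] & 0x80000000u) && (vals[lo] & lane))
				break;
		if (lo == 64)
			return -1;

		/* Last sample before the lane stops passing. */
		for (hi = lo + 1; hi < 64; hi++) {
			if (!(vals[hi] & 0x80000000u))
				continue;
			if (!(vals[hi] & lane)) {
				hi--;
				break;
			}
		}

		/* Middle of the passing window; its high nibble votes for a bin. */
		median[i] = (uint8_t)(((hi - lo) >> 1) + lo);
		bins[median[i] >> 4]++;
		median[i] += 0x30;
	}

	/* Pick the most popular coarse bin (offset by 3, matching the +0x30). */
	for (i = 0; i < 4; i++) {
		if (bins[i] > qty) {
			bin = (uint8_t)(i + 3);
			qty = bins[i];
		}
	}

	/* Clamp each lane into the chosen bin and pack its low nibble. */
	out->r_100720 = 0;
	for (i = 0; i < 8; i++) {
		uint8_t min_v = (uint8_t)(bin << 4);
		uint8_t max_v = (uint8_t)(min_v | 0x0f);

		if (median[i] < min_v)
			median[i] = min_v;
		if (median[i] > max_v)
			median[i] = max_v;
		out->r_100720 |= (uint32_t)(median[i] & 0x0f) << (i * 4);
	}

	out->r_1111e0 = 0x02000000u | (bin * 0x101u);
	out->r_111400 = 0;
	return 0;
}
```

Feeding it a table in which every lane passes for samples 16 through 31 gives a window midpoint of 23 (0x17), so all lanes vote for the same bin and receive the same fine-delay nibble. A lane whose window falls in a different bin is clamped to the nearest edge of the winning bin instead of being dropped.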
diff --git a/drivers/gpu/drm/nouveau/core/subdev/fb/sddr2.c b/drivers/gpu/drm/nouveau/core/subdev/fb/sddr2.c
index bb1eb8f3e639..252575f3aa29 100644
--- a/drivers/gpu/drm/nouveau/core/subdev/fb/sddr2.c
+++ b/drivers/gpu/drm/nouveau/core/subdev/fb/sddr2.c
@@ -66,7 +66,7 @@ nouveau_sddr2_calc(struct nouveau_ram *ram)
 	case 0x10:
 		CL  = ram->next->bios.timing_10_CL;
 		WR  = ram->next->bios.timing_10_WR;
-		DLL = !ram->next->bios.ramcfg_10_02_40;
+		DLL = !ram->next->bios.ramcfg_10_DLLoff;
 		ODT = ram->next->bios.timing_10_ODT & 3;
 		break;
 	case 0x20:
diff --git a/drivers/gpu/drm/nouveau/core/subdev/fb/sddr3.c b/drivers/gpu/drm/nouveau/core/subdev/fb/sddr3.c
index 83949b11833a..a2dca4869e52 100644
--- a/drivers/gpu/drm/nouveau/core/subdev/fb/sddr3.c
+++ b/drivers/gpu/drm/nouveau/core/subdev/fb/sddr3.c
@@ -80,7 +80,7 @@ nouveau_sddr3_calc(struct nouveau_ram *ram)
 		CWL = ram->next->bios.timing_10_CWL;
 		CL  = ram->next->bios.timing_10_CL;
 		WR  = ram->next->bios.timing_10_WR;
-		DLL = !ram->next->bios.ramcfg_10_02_40;
+		DLL = !ram->next->bios.ramcfg_10_DLLoff;
 		ODT = ram->next->bios.timing_10_ODT;
 		break;
 	case 0x20:
diff --git a/drivers/gpu/drm/nouveau/core/subdev/gpio/nv50.c b/drivers/gpu/drm/nouveau/core/subdev/gpio/nv50.c
index 1864fa98e6b1..2e30d5a62d6e 100644
--- a/drivers/gpu/drm/nouveau/core/subdev/gpio/nv50.c
+++ b/drivers/gpu/drm/nouveau/core/subdev/gpio/nv50.c
@@ -54,7 +54,7 @@ nv50_gpio_reset(struct nouveau_gpio *gpio, u8 match)
 	}
 }
 
-static int
+int
 nv50_gpio_location(int line, u32 *reg, u32 *shift)
 {
 	const u32 nv50_gpio_reg[4] = { 0xe104, 0xe108, 0xe280, 0xe284 };
diff --git a/drivers/gpu/drm/nouveau/core/subdev/i2c/base.c b/drivers/gpu/drm/nouveau/core/subdev/i2c/base.c
index 2b1bf545e488..0dc605db7ec8 100644
--- a/drivers/gpu/drm/nouveau/core/subdev/i2c/base.c
+++ b/drivers/gpu/drm/nouveau/core/subdev/i2c/base.c
@@ -473,18 +473,56 @@ nouveau_i2c_extdev_sclass[] = {
 	nouveau_anx9805_sclass,
 };
 
+static void
+nouveau_i2c_create_port(struct nouveau_i2c *i2c, int index, u8 type,
+			struct dcb_i2c_entry *info)
+{
+	const struct nouveau_i2c_impl *impl = (void *)nv_oclass(i2c);
+	struct nouveau_oclass *oclass;
+	struct nouveau_object *parent;
+	struct nouveau_object *object;
+	int ret, pad;
+
+	if (info->share != DCB_I2C_UNUSED) {
+		pad    = info->share;
+		oclass = impl->pad_s;
+	} else {
+		if (type != DCB_I2C_NVIO_AUX)
+			pad = 0x100 + info->drive;
+		else
+			pad = 0x100 + info->auxch;
+		oclass = impl->pad_x;
+	}
+
+	ret = nouveau_object_ctor(NULL, nv_object(i2c), oclass, NULL, pad,
+				 &parent);
+	if (ret < 0)
+		return;
+
+	oclass = impl->sclass;
+	do {
+		ret = -EINVAL;
+		if (oclass->handle == type) {
+			ret = nouveau_object_ctor(parent, nv_object(i2c),
+						  oclass, info, index,
+						 &object);
+		}
+	} while (ret && (++oclass)->handle);
+
+	nouveau_object_ref(NULL, &parent);
+}
+
 int
 nouveau_i2c_create_(struct nouveau_object *parent,
 		    struct nouveau_object *engine,
 		    struct nouveau_oclass *oclass,
 		    int length, void **pobject)
 {
-	const struct nouveau_i2c_impl *impl = (void *)oclass;
 	struct nouveau_bios *bios = nouveau_bios(parent);
 	struct nouveau_i2c *i2c;
 	struct nouveau_object *object;
 	struct dcb_i2c_entry info;
-	int ret, i, j, index = -1, pad;
+	int ret, i, j, index = -1;
 	struct dcb_output outp;
 	u8  ver, hdr;
 	u32 data;
@@ -507,43 +545,40 @@ nouveau_i2c_create_(struct nouveau_object *parent,
 	INIT_LIST_HEAD(&i2c->ports);
 
 	while (!dcb_i2c_parse(bios, ++index, &info)) {
-		if (info.type == DCB_I2C_UNUSED)
+		switch (info.type) {
+		case DCB_I2C_NV04_BIT:
+		case DCB_I2C_NV4E_BIT:
+		case DCB_I2C_NVIO_BIT:
+			nouveau_i2c_create_port(i2c, NV_I2C_PORT(index),
+						info.type, &info);
+			break;
+		case DCB_I2C_NVIO_AUX:
+			nouveau_i2c_create_port(i2c, NV_I2C_AUX(index),
+						info.type, &info);
+			break;
+		case DCB_I2C_PMGR:
+			if (info.drive != DCB_I2C_UNUSED) {
+				nouveau_i2c_create_port(i2c, NV_I2C_PORT(index),
+							DCB_I2C_NVIO_BIT,
+							&info);
+			}
+			if (info.auxch != DCB_I2C_UNUSED) {
+				nouveau_i2c_create_port(i2c, NV_I2C_AUX(index),
+							DCB_I2C_NVIO_AUX,
+							&info);
+			}
+			break;
+		case DCB_I2C_UNUSED:
+		default:
 			continue;
-
-		if (info.share != DCB_I2C_UNUSED) {
-			if (info.type == DCB_I2C_NVIO_AUX)
-				pad = info.drive;
-			else
-				pad = info.share;
-			oclass = impl->pad_s;
-		} else {
-			pad = 0x100 + info.drive;
-			oclass = impl->pad_x;
 		}
-
-		ret = nouveau_object_ctor(NULL, *pobject, oclass,
-					  NULL, pad, &parent);
-		if (ret < 0)
-			continue;
-
-		oclass = impl->sclass;
-		do {
-			ret = -EINVAL;
-			if (oclass->handle == info.type) {
-				ret = nouveau_object_ctor(parent, *pobject,
-							  oclass, &info,
-							  index, &object);
-			}
-		} while (ret && (++oclass)->handle);
-
-		nouveau_object_ref(NULL, &parent);
 	}
 
 	/* in addition to the busses specified in the i2c table, there
 	 * may be ddc/aux channels hiding behind external tmds/dp/etc
 	 * transmitters.
 	 */
-	index = ((index + 0x0f) / 0x10) * 0x10;
+	index = NV_I2C_EXT(0);
 	i = -1;
 	while ((data = dcb_outp_parse(bios, ++i, &ver, &hdr, &outp))) {
 		if (!outp.location || !outp.extdev)
diff --git a/drivers/gpu/drm/nouveau/core/subdev/i2c/gm204.c b/drivers/gpu/drm/nouveau/core/subdev/i2c/gm204.c
new file mode 100644
index 000000000000..06a2b87ccbf1
--- /dev/null
+++ b/drivers/gpu/drm/nouveau/core/subdev/i2c/gm204.c
@@ -0,0 +1,221 @@
+/*
+ * Copyright 2012 Red Hat Inc.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR
+ * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
+ * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
+ * OTHER DEALINGS IN THE SOFTWARE.
+ *
+ * Authors: Ben Skeggs
+ */
+
+#include "nv50.h"
+
+#define AUX_DBG(fmt, args...) nv_debug(aux, "AUXCH(%d): " fmt, ch, ##args)
+#define AUX_ERR(fmt, args...) nv_error(aux, "AUXCH(%d): " fmt, ch, ##args)
+
+static void
+auxch_fini(struct nouveau_i2c *aux, int ch)
+{
+	nv_mask(aux, 0x00d954 + (ch * 0x50), 0x00310000, 0x00000000);
+}
+
+static int
+auxch_init(struct nouveau_i2c *aux, int ch)
+{
+	const u32 unksel = 1; /* nfi which to use, or if it matters.. */
+	const u32 ureq = unksel ? 0x00100000 : 0x00200000;
+	const u32 urep = unksel ? 0x01000000 : 0x02000000;
+	u32 ctrl, timeout;
+
+	/* wait up to 1ms for any previous transaction to be done... */
+	timeout = 1000;
+	do {
+		ctrl = nv_rd32(aux, 0x00d954 + (ch * 0x50));
+		udelay(1);
+		if (!timeout--) {
+			AUX_ERR("begin idle timeout 0x%08x\n", ctrl);
+			return -EBUSY;
+		}
+	} while (ctrl & 0x03010000);
+
+	/* set some magic, and wait up to 1ms for it to appear */
+	nv_mask(aux, 0x00d954 + (ch * 0x50), 0x00300000, ureq);
+	timeout = 1000;
+	do {
+		ctrl = nv_rd32(aux, 0x00d954 + (ch * 0x50));
+		udelay(1);
+		if (!timeout--) {
+			AUX_ERR("magic wait 0x%08x\n", ctrl);
+			auxch_fini(aux, ch);
+			return -EBUSY;
+		}
+	} while ((ctrl & 0x03000000) != urep);
+
+	return 0;
+}
+
+int
+gm204_aux(struct nouveau_i2c_port *base, bool retry,
+	 u8 type, u32 addr, u8 *data, u8 size)
+{
+	struct nouveau_i2c *aux = nouveau_i2c(base);
+	struct nv50_i2c_port *port = (void *)base;
+	u32 ctrl, stat, timeout, retries;
+	u32 xbuf[4] = {};
+	int ch = port->addr;
+	int ret, i;
+
+	AUX_DBG("%d: 0x%08x %d\n", type, addr, size);
+
+	ret = auxch_init(aux, ch);
+	if (ret)
+		goto out;
+
+	stat = nv_rd32(aux, 0x00d958 + (ch * 0x50));
+	if (!(stat & 0x10000000)) {
+		AUX_DBG("sink not detected\n");
+		ret = -ENXIO;
+		goto out;
+	}
+
+	if (!(type & 1)) {
+		memcpy(xbuf, data, size);
+		for (i = 0; i < 16; i += 4) {
+			AUX_DBG("wr 0x%08x\n", xbuf[i / 4]);
+			nv_wr32(aux, 0x00d930 + (ch * 0x50) + i, xbuf[i / 4]);
+		}
+	}
+
+	ctrl  = nv_rd32(aux, 0x00d954 + (ch * 0x50));
+	ctrl &= ~0x0001f0ff;
+	ctrl |= type << 12;
+	ctrl |= size - 1;
+	nv_wr32(aux, 0x00d950 + (ch * 0x50), addr);
+
+	/* (maybe) retry transaction a number of times on failure... */
+	for (retries = 0; !ret && retries < 32; retries++) {
+		/* reset, and delay a while if this is a retry */
+		nv_wr32(aux, 0x00d954 + (ch * 0x50), 0x80000000 | ctrl);
+		nv_wr32(aux, 0x00d954 + (ch * 0x50), 0x00000000 | ctrl);
+		if (retries)
+			udelay(400);
+
+		/* transaction request, wait up to 1ms for it to complete */
+		nv_wr32(aux, 0x00d954 + (ch * 0x50), 0x00010000 | ctrl);
+
+		timeout = 1000;
+		do {
+			ctrl = nv_rd32(aux, 0x00d954 + (ch * 0x50));
+			udelay(1);
+			if (!timeout--) {
+				AUX_ERR("tx req timeout 0x%08x\n", ctrl);
+				ret = -EIO;
+				goto out;
+			}
+		} while (ctrl & 0x00010000);
+		ret = 1;
+
+		/* read status, and check if transaction completed ok */
+		stat = nv_mask(aux, 0x00d958 + (ch * 0x50), 0, 0);
+		if ((stat & 0x000f0000) == 0x00080000 ||
+		    (stat & 0x000f0000) == 0x00020000)
+			ret = retry ? 0 : 1;
+		if ((stat & 0x00000100))
+			ret = -ETIMEDOUT;
+		if ((stat & 0x00000e00))
+			ret = -EIO;
+
+		AUX_DBG("%02d 0x%08x 0x%08x\n", retries, ctrl, stat);
+	}
+
+	if (type & 1) {
+		for (i = 0; i < 16; i += 4) {
+			xbuf[i / 4] = nv_rd32(aux, 0x00d940 + (ch * 0x50) + i);
+			AUX_DBG("rd 0x%08x\n", xbuf[i / 4]);
+		}
+		memcpy(data, xbuf, size);
+	}
+
+out:
+	auxch_fini(aux, ch);
+	return ret < 0 ? ret : (stat & 0x000f0000) >> 16;
+}
+
+static const struct nouveau_i2c_func
+gm204_aux_func = {
+	.aux       = gm204_aux,
+};
+
+int
+gm204_aux_port_ctor(struct nouveau_object *parent,
+		    struct nouveau_object *engine,
+		    struct nouveau_oclass *oclass, void *data, u32 index,
+		    struct nouveau_object **pobject)
+{
+	struct dcb_i2c_entry *info = data;
+	struct nv50_i2c_port *port;
+	int ret;
+
+	ret = nouveau_i2c_port_create(parent, engine, oclass, index,
+				      &nouveau_i2c_aux_algo, &gm204_aux_func,
+				      &port);
+	*pobject = nv_object(port);
+	if (ret)
+		return ret;
+
+	port->base.aux = info->auxch;
+	port->addr = info->auxch;
+	return 0;
+}
+
+struct nouveau_oclass
+gm204_i2c_sclass[] = {
+	{ .handle = NV_I2C_TYPE_DCBI2C(DCB_I2C_NVIO_BIT),
+	  .ofuncs = &(struct nouveau_ofuncs) {
+		  .ctor = nvd0_i2c_port_ctor,
+		  .dtor = _nouveau_i2c_port_dtor,
+		  .init = nv50_i2c_port_init,
+		  .fini = _nouveau_i2c_port_fini,
+	  },
+	},
+	{ .handle = NV_I2C_TYPE_DCBI2C(DCB_I2C_NVIO_AUX),
+	  .ofuncs = &(struct nouveau_ofuncs) {
+		  .ctor = gm204_aux_port_ctor,
+		  .dtor = _nouveau_i2c_port_dtor,
+		  .init = _nouveau_i2c_port_init,
+		  .fini = _nouveau_i2c_port_fini,
+	  },
+	},
+	{}
+};
+
+struct nouveau_oclass *
+gm204_i2c_oclass = &(struct nouveau_i2c_impl) {
+	.base.handle = NV_SUBDEV(I2C, 0x24),
+	.base.ofuncs = &(struct nouveau_ofuncs) {
+		.ctor = _nouveau_i2c_ctor,
+		.dtor = _nouveau_i2c_dtor,
+		.init = _nouveau_i2c_init,
+		.fini = _nouveau_i2c_fini,
+	},
+	.sclass = gm204_i2c_sclass,
+	.pad_x = &nv04_i2c_pad_oclass,
+	.pad_s = &gm204_i2c_pad_oclass,
+	.aux = 8,
+	.aux_stat = nve0_aux_stat,
+	.aux_mask = nve0_aux_mask,
+}.base;
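Note on the polling pattern in gm204.c above: auxch_init() and gm204_aux() each read a control register repeatedly until a masked value matches, and give up after a bounded number of reads. The sketch below isolates that pattern in plain C so it can be exercised outside the kernel. It is an illustration only, not code from this series: poll_reg(), reg_read_fn, struct fake_reg and fake_read() are hypothetical names, and the read callback stands in for nv_rd32().

```c
#include <stdint.h>

/* Hypothetical register-read callback; stands in for nv_rd32() in the diff. */
typedef uint32_t (*reg_read_fn)(void *ctx);

/*
 * Poll until (value & mask) == want, giving up after max_polls reads.
 * Returns 0 on success and -1 on timeout. The last value read is stored
 * in *last so a caller can log it, as the AUX_ERR() calls do.
 */
static int poll_reg(reg_read_fn rd, void *ctx, uint32_t mask, uint32_t want,
		    unsigned int max_polls, uint32_t *last)
{
	while (max_polls--) {
		*last = rd(ctx);
		if ((*last & mask) == want)
			return 0;
		/* a real driver would delay between reads, e.g. udelay(1) */
	}
	return -1;
}

/* Simulated hardware: reports busy (bit 16 set) for a fixed number of reads. */
struct fake_reg {
	unsigned int busy_reads;
};

static uint32_t fake_read(void *ctx)
{
	struct fake_reg *reg = ctx;

	if (reg->busy_reads) {
		reg->busy_reads--;
		return 0x00010000;
	}
	return 0x00000000;
}
```

With a mask of 0x03010000 and a wanted value of 0, this mirrors the "previous transaction done" wait in auxch_init(); the second wait there uses mask 0x03000000 and compares against urep instead.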
diff --git a/drivers/gpu/drm/nouveau/core/subdev/i2c/nv50.h b/drivers/gpu/drm/nouveau/core/subdev/i2c/nv50.h
index 5d2a77421c74..9ef965692fb1 100644
--- a/drivers/gpu/drm/nouveau/core/subdev/i2c/nv50.h
+++ b/drivers/gpu/drm/nouveau/core/subdev/i2c/nv50.h
@@ -10,8 +10,6 @@ struct nv50_i2c_priv {
 struct nv50_i2c_port {
 	struct nouveau_i2c_port base;
 	u32 addr;
-	u32 ctrl;
-	u32 data;
 	u32 state;
 };
 
@@ -29,4 +27,8 @@ int  nv94_aux_port_ctor(struct nouveau_object *, struct nouveau_object *,
 void nv94_i2c_acquire(struct nouveau_i2c_port *);
 void nv94_i2c_release(struct nouveau_i2c_port *);
 
+int  nvd0_i2c_port_ctor(struct nouveau_object *, struct nouveau_object *,
+			struct nouveau_oclass *, void *, u32,
+			struct nouveau_object **);
+
 #endif
diff --git a/drivers/gpu/drm/nouveau/core/subdev/i2c/nv94.c b/drivers/gpu/drm/nouveau/core/subdev/i2c/nv94.c
index f59c3a255462..e383ee81f4d2 100644
--- a/drivers/gpu/drm/nouveau/core/subdev/i2c/nv94.c
+++ b/drivers/gpu/drm/nouveau/core/subdev/i2c/nv94.c
@@ -214,10 +214,6 @@ nv94_i2c_port_ctor(struct nouveau_object *parent, struct nouveau_object *engine,
 
 	port->state = 7;
 	port->addr = nv50_i2c_addr[info->drive];
-	if (info->share != DCB_I2C_UNUSED) {
-		port->ctrl = 0x00e500 + (info->share * 0x50);
-		port->data = 0x0000e001;
-	}
 	return 0;
 }
 
@@ -242,13 +238,8 @@ nv94_aux_port_ctor(struct nouveau_object *parent, struct nouveau_object *engine,
 	if (ret)
 		return ret;
 
-	port->base.aux = info->drive;
-	port->addr = info->drive;
-	if (info->share != DCB_I2C_UNUSED) {
-		port->ctrl = 0x00e500 + (info->drive * 0x50);
-		port->data = 0x00002002;
-	}
-
+	port->base.aux = info->auxch;
+	port->addr = info->auxch;
 	return 0;
 }
 
diff --git a/drivers/gpu/drm/nouveau/core/subdev/i2c/nvd0.c b/drivers/gpu/drm/nouveau/core/subdev/i2c/nvd0.c
index 364ddb1c5f03..fd99380502ec 100644
--- a/drivers/gpu/drm/nouveau/core/subdev/i2c/nvd0.c
+++ b/drivers/gpu/drm/nouveau/core/subdev/i2c/nvd0.c
@@ -48,7 +48,7 @@ nvd0_i2c_func = {
 	.sense_sda = nvd0_i2c_sense_sda,
 };
 
-static int
+int
 nvd0_i2c_port_ctor(struct nouveau_object *parent, struct nouveau_object *engine,
 		   struct nouveau_oclass *oclass, void *data, u32 index,
 		   struct nouveau_object **pobject)
@@ -66,10 +66,6 @@ nvd0_i2c_port_ctor(struct nouveau_object *parent, struct nouveau_object *engine,
 
 	port->state = 0x00000007;
 	port->addr = 0x00d014 + (info->drive * 0x20);
-	if (info->share != DCB_I2C_UNUSED) {
-		port->ctrl = 0x00e500 + (info->share * 0x50);
-		port->data = 0x0000e001;
-	}
 	return 0;
 }
 
diff --git a/drivers/gpu/drm/nouveau/core/subdev/i2c/nve0.c b/drivers/gpu/drm/nouveau/core/subdev/i2c/nve0.c
index cae77e1ad8dc..25fe5c2d110e 100644
--- a/drivers/gpu/drm/nouveau/core/subdev/i2c/nve0.c
+++ b/drivers/gpu/drm/nouveau/core/subdev/i2c/nve0.c
@@ -24,7 +24,7 @@
 
 #include "nv50.h"
 
-static void
+void
 nve0_aux_stat(struct nouveau_i2c *i2c, u32 *hi, u32 *lo, u32 *rq, u32 *tx)
 {
 	u32 intr = nv_rd32(i2c, 0x00dc60);
@@ -38,7 +38,7 @@ nve0_aux_stat(struct nouveau_i2c *i2c, u32 *hi, u32 *lo, u32 *rq, u32 *tx)
 	nv_wr32(i2c, 0x00dc60, intr);
 }
 
-static void
+void
 nve0_aux_mask(struct nouveau_i2c *i2c, u32 type, u32 mask, u32 data)
 {
 	u32 temp = nv_rd32(i2c, 0x00dc68), i;
diff --git a/drivers/gpu/drm/nouveau/core/subdev/i2c/padgm204.c b/drivers/gpu/drm/nouveau/core/subdev/i2c/padgm204.c
new file mode 100644
index 000000000000..f0e6fbbaa8cd
--- /dev/null
+++ b/drivers/gpu/drm/nouveau/core/subdev/i2c/padgm204.c
@@ -0,0 +1,86 @@
+/*
+ * Copyright 2014 Red Hat Inc.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR
+ * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
+ * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
+ * OTHER DEALINGS IN THE SOFTWARE.
+ *
+ * Authors: Ben Skeggs
+ */
+
+#include "pad.h"
+
+struct gm204_i2c_pad {
+	struct nvkm_i2c_pad base;
+	int addr;
+};
+
+static int
+gm204_i2c_pad_fini(struct nouveau_object *object, bool suspend)
+{
+	struct nouveau_i2c *i2c = (void *)object->engine;
+	struct gm204_i2c_pad *pad = (void *)object;
+	nv_mask(i2c, 0x00d97c + pad->addr, 0x00000001, 0x00000001);
+	return nvkm_i2c_pad_fini(&pad->base, suspend);
+}
+
+static int
+gm204_i2c_pad_init(struct nouveau_object *object)
+{
+	struct nouveau_i2c *i2c = (void *)object->engine;
+	struct gm204_i2c_pad *pad = (void *)object;
+
+	switch (nv_oclass(pad->base.next)->handle) {
+	case NV_I2C_TYPE_DCBI2C(DCB_I2C_NVIO_AUX):
+		nv_mask(i2c, 0x00d970 + pad->addr, 0x0000c003, 0x00000002);
+		break;
+	case NV_I2C_TYPE_DCBI2C(DCB_I2C_NVIO_BIT):
+	default:
+		nv_mask(i2c, 0x00d970 + pad->addr, 0x0000c003, 0x0000c001);
+		break;
+	}
+
+	nv_mask(i2c, 0x00d97c + pad->addr, 0x00000001, 0x00000000);
+	return nvkm_i2c_pad_init(&pad->base);
+}
+
+static int
+gm204_i2c_pad_ctor(struct nouveau_object *parent, struct nouveau_object *engine,
+		  struct nouveau_oclass *oclass, void *data, u32 index,
+		  struct nouveau_object **pobject)
+{
+	struct gm204_i2c_pad *pad;
+	int ret;
+
+	ret = nvkm_i2c_pad_create(parent, engine, oclass, index, &pad);
+	*pobject = nv_object(pad);
+	if (ret)
+		return ret;
+
+	pad->addr = index * 0x50;
+	return 0;
+}
+
+struct nouveau_oclass
+gm204_i2c_pad_oclass = {
+	.ofuncs = &(struct nouveau_ofuncs) {
+		.ctor = gm204_i2c_pad_ctor,
+		.dtor = _nvkm_i2c_pad_dtor,
+		.init = gm204_i2c_pad_init,
+		.fini = gm204_i2c_pad_fini,
+	},
+};
diff --git a/drivers/gpu/drm/nouveau/core/subdev/i2c/priv.h b/drivers/gpu/drm/nouveau/core/subdev/i2c/priv.h
index 780090b6425a..4fe7ae3fde4e 100644
--- a/drivers/gpu/drm/nouveau/core/subdev/i2c/priv.h
+++ b/drivers/gpu/drm/nouveau/core/subdev/i2c/priv.h
@@ -5,6 +5,7 @@
 
 extern struct nouveau_oclass nv04_i2c_pad_oclass;
 extern struct nouveau_oclass nv94_i2c_pad_oclass;
+extern struct nouveau_oclass gm204_i2c_pad_oclass;
 
 #define nouveau_i2c_port_create(p,e,o,i,a,f,d)                                 \
 	nouveau_i2c_port_create_((p), (e), (o), (i), (a), (f),                 \
@@ -82,4 +83,7 @@ struct nouveau_i2c_impl {
 void nv94_aux_stat(struct nouveau_i2c *, u32 *, u32 *, u32 *, u32 *);
 void nv94_aux_mask(struct nouveau_i2c *, u32, u32, u32);
 
+void nve0_aux_stat(struct nouveau_i2c *, u32 *, u32 *, u32 *, u32 *);
+void nve0_aux_mask(struct nouveau_i2c *, u32, u32, u32);
+
 #endif
diff --git a/drivers/gpu/drm/nouveau/core/subdev/pwr/fuc/memx.fuc b/drivers/gpu/drm/nouveau/core/subdev/pwr/fuc/memx.fuc
index e89789a53b80..ec03f9a4290b 100644
--- a/drivers/gpu/drm/nouveau/core/subdev/pwr/fuc/memx.fuc
+++ b/drivers/gpu/drm/nouveau/core/subdev/pwr/fuc/memx.fuc
@@ -50,6 +50,7 @@ handler(WR32  , 0x0000, 0x0002, #memx_func_wr32)
 handler(WAIT  , 0x0004, 0x0000, #memx_func_wait)
 handler(DELAY , 0x0001, 0x0000, #memx_func_delay)
 handler(VBLANK, 0x0001, 0x0000, #memx_func_wait_vblank)
+handler(TRAIN , 0x0000, 0x0000, #memx_func_train)
 memx_func_tail:
 
 .equ #memx_func_size #memx_func_next - #memx_func_head
@@ -63,6 +64,10 @@ memx_ts_end:
 memx_data_head:
 .skip 0x0800
 memx_data_tail:
+
+memx_train_head:
+.skip 0x0100
+memx_train_tail:
 #endif
 
 /******************************************************************************
@@ -260,6 +265,101 @@ memx_func_delay:
 // description
 //
 // $r15 - current (memx)
+// $r4  - packet length
+// $r3  - opcode description
+// $r0  - zero
+memx_func_train:
+#if NVKM_PPWR_CHIPSET == GT215
+// $r5 - outer loop counter
+// $r6 - inner loop counter
+// $r7 - entry counter (#memx_train_head + $r7)
+	movw $r5 0x3
+	movw $r7 0x0
+
+// Read random memory to wake up... things
+	imm32($r9, 0x700000)
+	nv_rd32($r8,$r9)
+	movw $r14 0x2710
+	call(nsec)
+
+	memx_func_train_loop_outer:
+		mulu $r8 $r5 0x101
+		sethi $r8 0x02000000
+		imm32($r9, 0x1111e0)
+		nv_wr32($r9, $r8)
+		push $r5
+
+		movw $r6 0x0
+		memx_func_train_loop_inner:
+			movw $r8 0x1111
+			mulu $r9 $r6 $r8
+			shl b32 $r8 $r9 0x10
+			or $r8 $r9
+			imm32($r9, 0x100720)
+			nv_wr32($r9, $r8)
+
+			imm32($r9, 0x100080)
+			nv_rd32($r8, $r9)
+			or $r8 $r8 0x20
+			nv_wr32($r9, $r8)
+
+			imm32($r9, 0x10053c)
+			imm32($r8, 0x80003002)
+			nv_wr32($r9, $r8)
+
+			imm32($r14, 0x100560)
+			imm32($r13, 0x80000000)
+			add b32 $r12 $r13 0
+			imm32($r11, 0x001e8480)
+			call(wait)
+
+			// $r5 - inner inner loop counter
+			// $r9 - result
+			movw $r5 0
+			imm32($r9, 0x8300ffff)
+			memx_func_train_loop_4x:
+				imm32($r10, 0x100080)
+				nv_rd32($r8, $r10)
+				imm32($r11, 0xffffffdf)
+				and $r8 $r11
+				nv_wr32($r10, $r8)
+
+				imm32($r10, 0x10053c)
+				imm32($r8, 0x80003002)
+				nv_wr32($r10, $r8)
+
+				imm32($r14, 0x100560)
+				imm32($r13, 0x80000000)
+				mov b32 $r12 $r13
+				imm32($r11, 0x00002710)
+				call(wait)
+
+				nv_rd32($r13, $r14)
+				and $r9 $r9 $r13
+
+				add b32 $r5 1
+				cmp b16 $r5 0x4
+				bra l #memx_func_train_loop_4x
+
+			add b32 $r10 $r7 #memx_train_head
+			st b32 D[$r10 + 0] $r9
+			add b32 $r6 1
+			add b32 $r7 4
+
+			cmp b16 $r6 0x10
+			bra l #memx_func_train_loop_inner
+
+		pop $r5
+		add b32 $r5 1
+		cmp b16 $r5 7
+		bra l #memx_func_train_loop_outer
+
+#endif
+	ret
+
+// description
+//
+// $r15 - current (memx)
 // $r14 - sender process name
 // $r13 - message (exec)
 // $r12 - head of script
@@ -307,8 +407,19 @@ memx_exec:
 // $r11 - data1
 // $r0  - zero
 memx_info:
+	cmp b16 $r12 0x1
+	bra e #memx_info_train
+
+	memx_info_data:
 	mov $r12 #memx_data_head
 	mov $r11 #memx_data_tail - #memx_data_head
+	bra #memx_info_send
+
+	memx_info_train:
+	mov $r12 #memx_train_head
+	mov $r11 #memx_train_tail - #memx_train_head
+
+	memx_info_send:
 	call(send)
 	ret
 
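Note on memx_func_train above: the GT215 training loop appears to write one value per outer iteration to 0x1111e0 and one pattern per inner iteration to 0x100720, then store one 32-bit result per (outer, inner) pair starting at memx_train_head. The C sketch below reproduces that arithmetic as I read the assembly. It assumes that sethi replaces only the upper 16 bits of the register and that "bra l" loops while the counter is below the compared value. The function names are hypothetical and nothing here is taken from the driver.

```c
#include <stdint.h>

/* Outer-loop value for 0x1111e0: low half from mulu, high half from sethi. */
static uint32_t train_outer_value(uint32_t outer)
{
	return 0x02000000u | ((outer * 0x101u) & 0xffffu);
}

/* Inner-loop value for 0x100720: a 16-bit pattern copied into both halves. */
static uint32_t train_inner_pattern(uint32_t inner)
{
	uint32_t lo = (inner * 0x1111u) & 0xffffu;

	return (lo << 16) | lo;
}

/* Bytes of results stored: one 32-bit word per (outer, inner) pair. */
static uint32_t train_result_bytes(void)
{
	uint32_t bytes = 0;

	for (uint32_t outer = 3; outer < 7; outer++)
		for (uint32_t inner = 0; inner < 0x10; inner++)
			bytes += 4;
	return bytes;
}
```

Under these assumptions the loops run 4 x 16 times and store 0x100 bytes, which matches the ".skip 0x0100" reserved between memx_train_head and memx_train_tail.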
diff --git a/drivers/gpu/drm/nouveau/core/subdev/pwr/fuc/nv108.fuc.h b/drivers/gpu/drm/nouveau/core/subdev/pwr/fuc/nv108.fuc.h
index 4d278a96b2bb..713e11e2953d 100644
--- a/drivers/gpu/drm/nouveau/core/subdev/pwr/fuc/nv108.fuc.h
+++ b/drivers/gpu/drm/nouveau/core/subdev/pwr/fuc/nv108.fuc.h
@@ -46,8 +46,8 @@ uint32_t nv108_pwr_data[] = {
 	0x00000000,
 	0x00000000,
 	0x584d454d,
-	0x0000061c,
-	0x0000060e,
+	0x0000062d,
+	0x0000061f,
 	0x00000000,
 	0x00000000,
 	0x00000000,
@@ -68,8 +68,8 @@ uint32_t nv108_pwr_data[] = {
 	0x00000000,
 	0x00000000,
 	0x46524550,
-	0x00000620,
-	0x0000061e,
+	0x00000631,
+	0x0000062f,
 	0x00000000,
 	0x00000000,
 	0x00000000,
@@ -90,8 +90,8 @@ uint32_t nv108_pwr_data[] = {
 	0x00000000,
 	0x00000000,
 	0x5f433249,
-	0x00000a24,
-	0x000008cb,
+	0x00000a35,
+	0x000008dc,
 	0x00000000,
 	0x00000000,
 	0x00000000,
@@ -112,8 +112,8 @@ uint32_t nv108_pwr_data[] = {
 	0x00000000,
 	0x00000000,
 	0x54534554,
-	0x00000a45,
-	0x00000a26,
+	0x00000a56,
+	0x00000a37,
 	0x00000000,
 	0x00000000,
 	0x00000000,
@@ -134,8 +134,8 @@ uint32_t nv108_pwr_data[] = {
 	0x00000000,
 	0x00000000,
 	0x454c4449,
-	0x00000a50,
-	0x00000a4e,
+	0x00000a61,
+	0x00000a5f,
 	0x00000000,
 	0x00000000,
 	0x00000000,
@@ -246,13 +246,15 @@ uint32_t nv108_pwr_data[] = {
 	0x00010006,
 	0x00000000,
 	0x0000057b,
-/* 0x03b8: memx_func_tail */
-/* 0x03b8: memx_ts_start */
+	0x00000007,
 	0x00000000,
-/* 0x03bc: memx_ts_end */
+	0x000005c3,
+/* 0x03c4: memx_func_tail */
+/* 0x03c4: memx_ts_start */
 	0x00000000,
-/* 0x03c0: memx_data_head */
+/* 0x03c8: memx_ts_end */
 	0x00000000,
+/* 0x03cc: memx_data_head */
 	0x00000000,
 	0x00000000,
 	0x00000000,
@@ -764,8 +766,75 @@ uint32_t nv108_pwr_data[] = {
 	0x00000000,
 	0x00000000,
 	0x00000000,
-/* 0x0bc0: memx_data_tail */
-/* 0x0bc0: i2c_scl_map */
+	0x00000000,
+/* 0x0bcc: memx_data_tail */
+/* 0x0bcc: memx_train_head */
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+/* 0x0ccc: memx_train_tail */
+/* 0x0ccc: i2c_scl_map */
 	0x00000400,
 	0x00000800,
 	0x00001000,
@@ -776,7 +845,7 @@ uint32_t nv108_pwr_data[] = {
 	0x00020000,
 	0x00040000,
 	0x00080000,
-/* 0x0be8: i2c_sda_map */
+/* 0x0cf4: i2c_sda_map */
 	0x00100000,
 	0x00200000,
 	0x00400000,
@@ -844,9 +913,6 @@ uint32_t nv108_pwr_data[] = {
 	0x00000000,
 	0x00000000,
 	0x00000000,
-	0x00000000,
-	0x00000000,
-	0x00000000,
 };
 
 uint32_t nv108_pwr_code[] = {
@@ -1215,10 +1281,10 @@ uint32_t nv108_pwr_code[] = {
 	0xf40464f0,
 	0x2c06f70b,
 	0xb50066cf,
-	0x00f8ee06,
+	0x00f8f106,
 /* 0x0500: memx_func_leave */
 	0x66cf2c06,
-	0xef06b500,
+	0xf206b500,
 	0xe4400406,
 	0x0006f607,
 /* 0x0512: memx_func_leave_wait */
@@ -1270,370 +1336,374 @@ uint32_t nv108_pwr_code[] = {
 	0x9800f800,
 	0x10b6001e,
 	0x005d7e04,
-/* 0x05c3: memx_exec */
-	0xf900f800,
-	0xb2d0f9e0,
-/* 0x05cb: memx_exec_next */
-	0x98b2b2c1,
-	0x10b60013,
-	0xf034e704,
-	0xe033e701,
-	0x0132b601,
-	0x980c30f0,
-	0x55f9de35,
-	0x1ef412a6,
-	0xee0b98e5,
-	0xbbef0c98,
-	0xc44b02cb,
-	0x00bbcf07,
-	0xe0fcd0fc,
-	0x0002c27e,
-/* 0x0602: memx_info */
-	0xc04c00f8,
+/* 0x05c3: memx_func_train */
+	0xf800f800,
+/* 0x05c5: memx_exec */
+	0xf9e0f900,
+	0xb2c1b2d0,
+/* 0x05cd: memx_exec_next */
+	0x001398b2,
+	0xe70410b6,
+	0xe701f034,
+	0xb601e033,
+	0x30f00132,
+	0xde35980c,
+	0x12a655f9,
+	0x98e51ef4,
+	0x0c98f10b,
+	0x02cbbbf2,
+	0xcf07c44b,
+	0xd0fc00bb,
+	0xc27ee0fc,
+	0x00f80002,
+/* 0x0604: memx_info */
+	0xf401c670,
+/* 0x060a: memx_info_data */
+	0xcc4c0c0b,
 	0x08004b03,
-	0x0002c27e,
-/* 0x060e: memx_recv */
-	0xd6b000f8,
-	0xb20bf401,
-	0xf400d6b0,
-	0x00f8eb0b,
-/* 0x061c: memx_init */
-/* 0x061e: perf_recv */
-	0x00f800f8,
-/* 0x0620: perf_init */
-/* 0x0622: i2c_drive_scl */
-	0x36b000f8,
-	0x0d0bf400,
-	0xf607e040,
-	0x04bd0001,
-/* 0x0632: i2c_drive_scl_lo */
-	0xe44000f8,
-	0x0001f607,
-	0x00f804bd,
-/* 0x063c: i2c_drive_sda */
-	0xf40036b0,
-	0xe0400d0b,
-	0x0002f607,
-	0x00f804bd,
-/* 0x064c: i2c_drive_sda_lo */
-	0xf607e440,
-	0x04bd0002,
-/* 0x0656: i2c_sense_scl */
-	0x32f400f8,
-	0x07c44301,
-	0xfd0033cf,
-	0x0bf40431,
-	0x0131f406,
-/* 0x0668: i2c_sense_scl_done */
-/* 0x066a: i2c_sense_sda */
-	0x32f400f8,
-	0x07c44301,
-	0xfd0033cf,
-	0x0bf40432,
-	0x0131f406,
-/* 0x067c: i2c_sense_sda_done */
-/* 0x067e: i2c_raise_scl */
-	0x40f900f8,
-	0x03089844,
-	0x06227e01,
-/* 0x0689: i2c_raise_scl_wait */
-	0x03e84e00,
-	0x00005d7e,
-	0x0006567e,
-	0xb60901f4,
-	0x1bf40142,
-/* 0x069d: i2c_raise_scl_done */
-	0xf840fcef,
-/* 0x06a1: i2c_start */
-	0x06567e00,
-	0x0d11f400,
-	0x00066a7e,
-	0xf40611f4,
-/* 0x06b2: i2c_start_rep */
-	0x00032e0e,
-	0x0006227e,
-	0x3c7e0103,
-	0x76bb0006,
-	0x0465b600,
-	0x659450f9,
-	0x0256bb04,
-	0x75fd50bd,
-	0x7e50fc04,
-	0xb600067e,
-	0x11f40464,
-/* 0x06dd: i2c_start_send */
-	0x7e00031d,
-	0x4e00063c,
-	0x5d7e1388,
-	0x00030000,
-	0x0006227e,
-	0x7e13884e,
-/* 0x06f7: i2c_start_out */
-	0xf800005d,
-/* 0x06f9: i2c_stop */
-	0x7e000300,
-	0x03000622,
-	0x063c7e00,
-	0x03e84e00,
-	0x00005d7e,
-	0x227e0103,
-	0x884e0006,
-	0x005d7e13,
+/* 0x0613: memx_info_train */
+	0x4c090ef4,
+	0x004b0bcc,
+/* 0x0619: memx_info_send */
+	0x02c27e01,
+/* 0x061f: memx_recv */
+	0xb000f800,
+	0x0bf401d6,
+	0x00d6b0a3,
+	0xf8dc0bf4,
+/* 0x062d: memx_init */
+/* 0x062f: perf_recv */
+	0xf800f800,
+/* 0x0631: perf_init */
+/* 0x0633: i2c_drive_scl */
+	0xb000f800,
+	0x0bf40036,
+	0x07e0400d,
+	0xbd0001f6,
+/* 0x0643: i2c_drive_scl_lo */
+	0x4000f804,
+	0x01f607e4,
+	0xf804bd00,
+/* 0x064d: i2c_drive_sda */
+	0x0036b000,
+	0x400d0bf4,
+	0x02f607e0,
+	0xf804bd00,
+/* 0x065d: i2c_drive_sda_lo */
+	0x07e44000,
+	0xbd0002f6,
+/* 0x0667: i2c_sense_scl */
+	0xf400f804,
+	0xc4430132,
+	0x0033cf07,
+	0xf40431fd,
+	0x31f4060b,
+/* 0x0679: i2c_sense_scl_done */
+/* 0x067b: i2c_sense_sda */
+	0xf400f801,
+	0xc4430132,
+	0x0033cf07,
+	0xf40432fd,
+	0x31f4060b,
+/* 0x068d: i2c_sense_sda_done */
+/* 0x068f: i2c_raise_scl */
+	0xf900f801,
+	0x08984440,
+	0x337e0103,
+/* 0x069a: i2c_raise_scl_wait */
+	0xe84e0006,
+	0x005d7e03,
+	0x06677e00,
+	0x0901f400,
+	0xf40142b6,
+/* 0x06ae: i2c_raise_scl_done */
+	0x40fcef1b,
+/* 0x06b2: i2c_start */
+	0x677e00f8,
+	0x11f40006,
+	0x067b7e0d,
+	0x0611f400,
+/* 0x06c3: i2c_start_rep */
+	0x032e0ef4,
+	0x06337e00,
 	0x7e010300,
-	0x4e00063c,
-	0x5d7e1388,
-	0x00f80000,
-/* 0x0728: i2c_bitw */
-	0x00063c7e,
-	0x7e03e84e,
-	0xbb00005d,
+	0xbb00064d,
 	0x65b60076,
 	0x9450f904,
 	0x56bb0465,
 	0xfd50bd02,
 	0x50fc0475,
-	0x00067e7e,
+	0x00068f7e,
 	0xf40464b6,
-	0x884e1711,
-	0x005d7e13,
-	0x7e000300,
-	0x4e000622,
-	0x5d7e1388,
-/* 0x0766: i2c_bitw_out */
-	0x00f80000,
-/* 0x0768: i2c_bitr */
-	0x3c7e0103,
+/* 0x06ee: i2c_start_send */
+	0x00031d11,
+	0x00064d7e,
+	0x7e13884e,
+	0x0300005d,
+	0x06337e00,
+	0x13884e00,
+	0x00005d7e,
+/* 0x0708: i2c_start_out */
+/* 0x070a: i2c_stop */
+	0x000300f8,
+	0x0006337e,
+	0x4d7e0003,
 	0xe84e0006,
 	0x005d7e03,
-	0x0076bb00,
-	0xf90465b6,
-	0x04659450,
-	0xbd0256bb,
-	0x0475fd50,
-	0x7e7e50fc,
-	0x64b60006,
-	0x1a11f404,
-	0x00066a7e,
-	0x227e0003,
-	0x884e0006,
-	0x005d7e13,
-	0x013cf000,
-/* 0x07ab: i2c_bitr_done */
-	0xf80131f4,
-/* 0x07ad: i2c_get_byte */
-	0x04000500,
-/* 0x07b1: i2c_get_byte_next */
-	0x0154b608,
+	0x7e010300,
+	0x4e000633,
+	0x5d7e1388,
+	0x01030000,
+	0x00064d7e,
+	0x7e13884e,
+	0xf800005d,
+/* 0x0739: i2c_bitw */
+	0x064d7e00,
+	0x03e84e00,
+	0x00005d7e,
 	0xb60076bb,
 	0x50f90465,
 	0xbb046594,
 	0x50bd0256,
 	0xfc0475fd,
-	0x07687e50,
+	0x068f7e50,
 	0x0464b600,
-	0xfd2a11f4,
-	0x42b60553,
-	0xd81bf401,
-	0x76bb0103,
+	0x4e1711f4,
+	0x5d7e1388,
+	0x00030000,
+	0x0006337e,
+	0x7e13884e,
+/* 0x0777: i2c_bitw_out */
+	0xf800005d,
+/* 0x0779: i2c_bitr */
+	0x7e010300,
+	0x4e00064d,
+	0x5d7e03e8,
+	0x76bb0000,
 	0x0465b600,
 	0x659450f9,
 	0x0256bb04,
 	0x75fd50bd,
 	0x7e50fc04,
-	0xb6000728,
-/* 0x07fa: i2c_get_byte_done */
-	0x00f80464,
-/* 0x07fc: i2c_put_byte */
-/* 0x07fe: i2c_put_byte_next */
-	0x42b60804,
-	0x3854ff01,
-	0xb60076bb,
-	0x50f90465,
-	0xbb046594,
-	0x50bd0256,
-	0xfc0475fd,
-	0x07287e50,
-	0x0464b600,
-	0xb03411f4,
-	0x1bf40046,
-	0x0076bbd8,
+	0xb600068f,
+	0x11f40464,
+	0x067b7e1a,
+	0x7e000300,
+	0x4e000633,
+	0x5d7e1388,
+	0x3cf00000,
+	0x0131f401,
+/* 0x07bc: i2c_bitr_done */
+/* 0x07be: i2c_get_byte */
+	0x000500f8,
+/* 0x07c2: i2c_get_byte_next */
+	0x54b60804,
+	0x0076bb01,
+	0xf90465b6,
+	0x04659450,
+	0xbd0256bb,
+	0x0475fd50,
+	0x797e50fc,
+	0x64b60007,
+	0x2a11f404,
+	0xb60553fd,
+	0x1bf40142,
+	0xbb0103d8,
+	0x65b60076,
+	0x9450f904,
+	0x56bb0465,
+	0xfd50bd02,
+	0x50fc0475,
+	0x0007397e,
+/* 0x080b: i2c_get_byte_done */
+	0xf80464b6,
+/* 0x080d: i2c_put_byte */
+/* 0x080f: i2c_put_byte_next */
+	0xb6080400,
+	0x54ff0142,
+	0x0076bb38,
 	0xf90465b6,
 	0x04659450,
 	0xbd0256bb,
 	0x0475fd50,
-	0x687e50fc,
+	0x397e50fc,
 	0x64b60007,
-	0x0f11f404,
-	0xb00076bb,
-	0x1bf40136,
-	0x0132f406,
-/* 0x0854: i2c_put_byte_done */
-/* 0x0856: i2c_addr */
-	0x76bb00f8,
+	0x3411f404,
+	0xf40046b0,
+	0x76bbd81b,
 	0x0465b600,
 	0x659450f9,
 	0x0256bb04,
 	0x75fd50bd,
 	0x7e50fc04,
-	0xb60006a1,
+	0xb6000779,
 	0x11f40464,
-	0x2ec3e729,
-	0x0134b601,
-	0xbb0553fd,
+	0x0076bb0f,
+	0xf40136b0,
+	0x32f4061b,
+/* 0x0865: i2c_put_byte_done */
+/* 0x0867: i2c_addr */
+	0xbb00f801,
 	0x65b60076,
 	0x9450f904,
 	0x56bb0465,
 	0xfd50bd02,
 	0x50fc0475,
-	0x0007fc7e,
-/* 0x089b: i2c_addr_done */
-	0xf80464b6,
-/* 0x089d: i2c_acquire_addr */
-	0xf8cec700,
-	0xb705e4b6,
-	0xf8d014e0,
-/* 0x08a9: i2c_acquire */
-	0x089d7e00,
-	0x00047e00,
-	0x03d9f000,
-	0x00002e7e,
-/* 0x08ba: i2c_release */
-	0x9d7e00f8,
+	0x0006b27e,
+	0xf40464b6,
+	0xc3e72911,
+	0x34b6012e,
+	0x0553fd01,
+	0xb60076bb,
+	0x50f90465,
+	0xbb046594,
+	0x50bd0256,
+	0xfc0475fd,
+	0x080d7e50,
+	0x0464b600,
+/* 0x08ac: i2c_addr_done */
+/* 0x08ae: i2c_acquire_addr */
+	0xcec700f8,
+	0x05e4b6f8,
+	0xd014e0b7,
+/* 0x08ba: i2c_acquire */
+	0xae7e00f8,
 	0x047e0008,
-	0xdaf00000,
+	0xd9f00000,
 	0x002e7e03,
-/* 0x08cb: i2c_recv */
-	0xf400f800,
-	0xc1c70132,
-	0x0214b6f8,
-	0xf52816b0,
-	0xb801371f,
-	0x000be813,
-	0xb8003298,
-	0x000bc013,
-	0xf4003198,
-	0xd0f90231,
-	0xd0f9e0f9,
-	0x000067f1,
-	0x100063f1,
-	0xbb016792,
+/* 0x08cb: i2c_release */
+	0x7e00f800,
+	0x7e0008ae,
+	0xf0000004,
+	0x2e7e03da,
+	0x00f80000,
+/* 0x08dc: i2c_recv */
+	0xc70132f4,
+	0x14b6f8c1,
+	0x2816b002,
+	0x01371ff5,
+	0x0cf413b8,
+	0x00329800,
+	0x0ccc13b8,
+	0x00319800,
+	0xf90231f4,
+	0xf9e0f9d0,
+	0x0067f1d0,
+	0x0063f100,
+	0x01679210,
+	0xb60076bb,
+	0x50f90465,
+	0xbb046594,
+	0x50bd0256,
+	0xfc0475fd,
+	0x08ba7e50,
+	0x0464b600,
+	0xd6b0d0fc,
+	0xb01bf500,
+	0xbb000500,
 	0x65b60076,
 	0x9450f904,
 	0x56bb0465,
 	0xfd50bd02,
 	0x50fc0475,
-	0x0008a97e,
-	0xfc0464b6,
-	0x00d6b0d0,
-	0x00b01bf5,
-	0x76bb0005,
+	0x0008677e,
+	0xf50464b6,
+	0xc700cc11,
+	0x76bbe0c5,
 	0x0465b600,
 	0x659450f9,
 	0x0256bb04,
 	0x75fd50bd,
 	0x7e50fc04,
-	0xb6000856,
+	0xb600080d,
 	0x11f50464,
-	0xc5c700cc,
-	0x0076bbe0,
-	0xf90465b6,
-	0x04659450,
-	0xbd0256bb,
-	0x0475fd50,
-	0xfc7e50fc,
-	0x64b60007,
-	0xa911f504,
-	0xbb010500,
-	0x65b60076,
-	0x9450f904,
-	0x56bb0465,
-	0xfd50bd02,
-	0x50fc0475,
-	0x0008567e,
-	0xf50464b6,
-	0xbb008711,
-	0x65b60076,
-	0x9450f904,
-	0x56bb0465,
-	0xfd50bd02,
-	0x50fc0475,
-	0x0007ad7e,
-	0xf40464b6,
-	0x5bcb6711,
-	0x0076bbe0,
-	0xf90465b6,
-	0x04659450,
-	0xbd0256bb,
-	0x0475fd50,
-	0xf97e50fc,
-	0x64b60006,
-	0xbd5bb204,
-	0x410ef474,
-/* 0x09d0: i2c_recv_not_rd08 */
-	0xf401d6b0,
-	0x00053b1b,
-	0x0008567e,
-	0xc73211f4,
-	0xfc7ee0c5,
-	0x11f40007,
-	0x7e000528,
-	0xf4000856,
-	0xb5c71f11,
-	0x07fc7ee0,
-	0x1511f400,
-	0x0006f97e,
-	0xc5c774bd,
-	0x091bf408,
-	0xf40232f4,
-/* 0x0a0e: i2c_recv_not_wr08 */
-/* 0x0a0e: i2c_recv_done */
-	0xcec7030e,
-	0x08ba7ef8,
-	0xfce0fc00,
-	0x0912f4d0,
-	0xc27e7cb2,
-/* 0x0a22: i2c_recv_exit */
-	0x00f80002,
-/* 0x0a24: i2c_init */
-/* 0x0a26: test_recv */
-	0x584100f8,
-	0x0011cf04,
-	0x400110b6,
-	0x01f60458,
-	0xf104bd00,
-	0xf1d900e7,
-	0x7e134fe3,
-	0xf8000201,
-/* 0x0a45: test_init */
-	0x08004e00,
-	0x0002017e,
-/* 0x0a4e: idle_recv */
-	0x00f800f8,
-/* 0x0a50: idle */
-	0x410031f4,
-	0x11cf0454,
+	0x010500a9,
+	0xb60076bb,
+	0x50f90465,
+	0xbb046594,
+	0x50bd0256,
+	0xfc0475fd,
+	0x08677e50,
+	0x0464b600,
+	0x008711f5,
+	0xb60076bb,
+	0x50f90465,
+	0xbb046594,
+	0x50bd0256,
+	0xfc0475fd,
+	0x07be7e50,
+	0x0464b600,
+	0xcb6711f4,
+	0x76bbe05b,
+	0x0465b600,
+	0x659450f9,
+	0x0256bb04,
+	0x75fd50bd,
+	0x7e50fc04,
+	0xb600070a,
+	0x5bb20464,
+	0x0ef474bd,
+/* 0x09e1: i2c_recv_not_rd08 */
+	0x01d6b041,
+	0x053b1bf4,
+	0x08677e00,
+	0x3211f400,
+	0x7ee0c5c7,
+	0xf400080d,
+	0x00052811,
+	0x0008677e,
+	0xc71f11f4,
+	0x0d7ee0b5,
+	0x11f40008,
+	0x070a7e15,
+	0xc774bd00,
+	0x1bf408c5,
+	0x0232f409,
+/* 0x0a1f: i2c_recv_not_wr08 */
+/* 0x0a1f: i2c_recv_done */
+	0xc7030ef4,
+	0xcb7ef8ce,
+	0xe0fc0008,
+	0x12f4d0fc,
+	0x7e7cb209,
+/* 0x0a33: i2c_recv_exit */
+	0xf80002c2,
+/* 0x0a35: i2c_init */
+/* 0x0a37: test_recv */
+	0x4100f800,
+	0x11cf0458,
 	0x0110b600,
-	0xf6045440,
+	0xf6045840,
 	0x04bd0001,
-/* 0x0a64: idle_loop */
-	0x32f45801,
-/* 0x0a69: idle_proc */
-/* 0x0a69: idle_proc_exec */
-	0xb210f902,
-	0x02cb7e1e,
-	0xf410fc00,
-	0x31f40911,
-	0xf00ef402,
-/* 0x0a7c: idle_proc_next */
-	0xa65810b6,
-	0xe81bf41f,
-	0xf4e002f4,
-	0x0ef40028,
-	0x000000c6,
-	0x00000000,
-	0x00000000,
-	0x00000000,
-	0x00000000,
+	0xd900e7f1,
+	0x134fe3f1,
+	0x0002017e,
+/* 0x0a56: test_init */
+	0x004e00f8,
+	0x02017e08,
+/* 0x0a5f: idle_recv */
+	0xf800f800,
+/* 0x0a61: idle */
+	0x0031f400,
+	0xcf045441,
+	0x10b60011,
+	0x04544001,
+	0xbd0001f6,
+/* 0x0a75: idle_loop */
+	0xf4580104,
+/* 0x0a7a: idle_proc */
+/* 0x0a7a: idle_proc_exec */
+	0x10f90232,
+	0xcb7e1eb2,
+	0x10fc0002,
+	0xf40911f4,
+	0x0ef40231,
+/* 0x0a8d: idle_proc_next */
+	0x5810b6f0,
+	0x1bf41fa6,
+	0xe002f4e8,
+	0xf40028f4,
+	0x0000c60e,
 	0x00000000,
 	0x00000000,
 	0x00000000,
diff --git a/drivers/gpu/drm/nouveau/core/subdev/pwr/fuc/nva3.fuc.h b/drivers/gpu/drm/nouveau/core/subdev/pwr/fuc/nva3.fuc.h
index 64e97baabc3c..d1f9b6cb66d7 100644
--- a/drivers/gpu/drm/nouveau/core/subdev/pwr/fuc/nva3.fuc.h
+++ b/drivers/gpu/drm/nouveau/core/subdev/pwr/fuc/nva3.fuc.h
@@ -46,8 +46,8 @@ uint32_t nva3_pwr_data[] = {
 	0x00000000,
 	0x00000000,
 	0x584d454d,
-	0x000006e0,
-	0x000006d2,
+	0x00000842,
+	0x00000834,
 	0x00000000,
 	0x00000000,
 	0x00000000,
@@ -68,8 +68,8 @@ uint32_t nva3_pwr_data[] = {
 	0x00000000,
 	0x00000000,
 	0x46524550,
-	0x000006e4,
-	0x000006e2,
+	0x00000846,
+	0x00000844,
 	0x00000000,
 	0x00000000,
 	0x00000000,
@@ -90,8 +90,8 @@ uint32_t nva3_pwr_data[] = {
 	0x00000000,
 	0x00000000,
 	0x5f433249,
-	0x00000b14,
-	0x000009b7,
+	0x00000c76,
+	0x00000b19,
 	0x00000000,
 	0x00000000,
 	0x00000000,
@@ -112,8 +112,8 @@ uint32_t nva3_pwr_data[] = {
 	0x00000000,
 	0x00000000,
 	0x54534554,
-	0x00000b3d,
-	0x00000b16,
+	0x00000c9f,
+	0x00000c78,
 	0x00000000,
 	0x00000000,
 	0x00000000,
@@ -134,8 +134,8 @@ uint32_t nva3_pwr_data[] = {
 	0x00000000,
 	0x00000000,
 	0x454c4449,
-	0x00000b49,
-	0x00000b47,
+	0x00000cab,
+	0x00000ca9,
 	0x00000000,
 	0x00000000,
 	0x00000000,
@@ -246,13 +246,15 @@ uint32_t nva3_pwr_data[] = {
 	0x00010006,
 	0x00000000,
 	0x000005f8,
-/* 0x03b8: memx_func_tail */
-/* 0x03b8: memx_ts_start */
+	0x00000007,
 	0x00000000,
-/* 0x03bc: memx_ts_end */
+	0x0000067e,
+/* 0x03c4: memx_func_tail */
+/* 0x03c4: memx_ts_start */
 	0x00000000,
-/* 0x03c0: memx_data_head */
+/* 0x03c8: memx_ts_end */
 	0x00000000,
+/* 0x03cc: memx_data_head */
 	0x00000000,
 	0x00000000,
 	0x00000000,
@@ -764,8 +766,75 @@ uint32_t nva3_pwr_data[] = {
 	0x00000000,
 	0x00000000,
 	0x00000000,
-/* 0x0bc0: memx_data_tail */
-/* 0x0bc0: i2c_scl_map */
+	0x00000000,
+/* 0x0bcc: memx_data_tail */
+/* 0x0bcc: memx_train_head */
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+/* 0x0ccc: memx_train_tail */
+/* 0x0ccc: i2c_scl_map */
 	0x00001000,
 	0x00004000,
 	0x00010000,
@@ -776,7 +845,7 @@ uint32_t nva3_pwr_data[] = {
 	0x01000000,
 	0x04000000,
 	0x10000000,
-/* 0x0be8: i2c_sda_map */
+/* 0x0cf4: i2c_sda_map */
 	0x00002000,
 	0x00008000,
 	0x00020000,
@@ -787,7 +856,7 @@ uint32_t nva3_pwr_data[] = {
 	0x02000000,
 	0x08000000,
 	0x20000000,
-/* 0x0c10: i2c_ctrl */
+/* 0x0d1c: i2c_ctrl */
 	0x0000e138,
 	0x0000e150,
 	0x0000e168,
@@ -845,9 +914,6 @@ uint32_t nva3_pwr_data[] = {
 	0x00000000,
 	0x00000000,
 	0x00000000,
-	0x00000000,
-	0x00000000,
-	0x00000000,
 };
 
 uint32_t nva3_pwr_code[] = {
@@ -1258,11 +1324,11 @@ uint32_t nva3_pwr_code[] = {
 	0x67f0f30b,
 	0x0664b62c,
 	0x800066cf,
-	0x00f8ee06,
+	0x00f8f106,
 /* 0x05a8: memx_func_leave */
 	0xb62c67f0,
 	0x66cf0664,
-	0xef068000,
+	0xf2068000,
 	0xf10467f0,
 	0xb607e407,
 	0x06d00604,
@@ -1323,408 +1389,479 @@ uint32_t nva3_pwr_code[] = {
 	0x9800f8a4,
 	0x10b6001e,
 	0x7f21f404,
-/* 0x067e: memx_exec */
-	0xe0f900f8,
-	0xc1b9d0f9,
-	0x02b2b902,
-/* 0x0688: memx_exec_next */
-	0xb6001398,
-	0x34e70410,
-	0x33e701f0,
-	0x32b601e0,
-	0x0c30f001,
-	0xf9de3598,
-	0x0612b855,
-	0x98e41ef4,
-	0x0c98ee0b,
-	0x02cbbbef,
-	0x07c4b7f1,
-	0xcf06b4b6,
-	0xd0fc00bb,
-	0x21f5e0fc,
+/* 0x067e: memx_func_train */
+	0x57f100f8,
+	0x77f10003,
+	0x97f10000,
+	0x93f00000,
+	0x029eb970,
+	0xb90421f4,
+	0xe7f102d8,
+	0x21f42710,
+/* 0x069d: memx_func_train_loop_outer */
+	0x0158e07f,
+	0x0083f101,
+	0xe097f102,
+	0x1193f011,
+	0x80f990f9,
+	0xe0fcd0fc,
+	0xf93f21f4,
+	0x0067f150,
+/* 0x06bd: memx_func_train_loop_inner */
+	0x1187f100,
+	0x9068ff11,
+	0xfd109894,
+	0x97f10589,
+	0x93f00720,
+	0xf990f910,
+	0xfcd0fc80,
+	0x3f21f4e0,
+	0x008097f1,
+	0xb91093f0,
+	0x21f4029e,
+	0x02d8b904,
+	0xf92088c5,
+	0xfc80f990,
+	0xf4e0fcd0,
+	0x97f13f21,
+	0x93f0053c,
+	0x0287f110,
+	0x0083f130,
+	0xf990f980,
+	0xfcd0fc80,
+	0x3f21f4e0,
+	0x0560e7f1,
+	0xf110e3f0,
+	0xf10000d7,
+	0x908000d3,
+	0xb7f100dc,
+	0xb3f08480,
+	0xa421f41e,
+	0x000057f1,
+	0xffff97f1,
+	0x830093f1,
+/* 0x073c: memx_func_train_loop_4x */
+	0x0080a7f1,
+	0xb910a3f0,
+	0x21f402ae,
+	0x02d8b904,
+	0xffdfb7f1,
+	0xffffb3f1,
+	0xf9048bfd,
+	0xfc80f9a0,
+	0xf4e0fcd0,
+	0xa7f13f21,
+	0xa3f0053c,
+	0x0287f110,
+	0x0083f130,
+	0xf9a0f980,
+	0xfcd0fc80,
+	0x3f21f4e0,
+	0x0560e7f1,
+	0xf110e3f0,
+	0xf10000d7,
+	0xb98000d3,
+	0xb7f102dc,
+	0xb3f02710,
+	0xa421f400,
+	0xf402eeb9,
+	0xddb90421,
+	0x949dff02,
+	0x700150b6,
+	0x1ef40456,
+	0xcc7aa092,
+	0x00a9800b,
+	0xb60160b6,
+	0x66700470,
+	0x001ef510,
+	0xb650fcff,
+	0x56700150,
+	0xd41ef507,
+/* 0x07cf: memx_exec */
+	0xf900f8fe,
+	0xb9d0f9e0,
+	0xb2b902c1,
+/* 0x07d9: memx_exec_next */
+	0x00139802,
+	0xe70410b6,
+	0xe701f034,
+	0xb601e033,
+	0x30f00132,
+	0xde35980c,
+	0x12b855f9,
+	0xe41ef406,
+	0x98f10b98,
+	0xcbbbf20c,
+	0xc4b7f102,
+	0x06b4b607,
+	0xfc00bbcf,
+	0xf5e0fcd0,
+	0xf8034221,
+/* 0x0815: memx_info */
+	0x01c67000,
+/* 0x081b: memx_info_data */
+	0xf10e0bf4,
+	0xf103ccc7,
+	0xf40800b7,
+/* 0x0826: memx_info_train */
+	0xc7f10b0e,
+	0xb7f10bcc,
+/* 0x082e: memx_info_send */
+	0x21f50100,
 	0x00f80342,
-/* 0x06c4: memx_info */
-	0x03c0c7f1,
-	0x0800b7f1,
-	0x034221f5,
-/* 0x06d2: memx_recv */
-	0xd6b000f8,
-	0xa90bf401,
-	0xf400d6b0,
-	0x00f8e90b,
-/* 0x06e0: memx_init */
-/* 0x06e2: perf_recv */
+/* 0x0834: memx_recv */
+	0xf401d6b0,
+	0xd6b0980b,
+	0xd80bf400,
+/* 0x0842: memx_init */
+	0x00f800f8,
+/* 0x0844: perf_recv */
+/* 0x0846: perf_init */
 	0x00f800f8,
-/* 0x06e4: perf_init */
-/* 0x06e6: i2c_drive_scl */
+/* 0x0848: i2c_drive_scl */
+	0xf40036b0,
+	0x07f1110b,
+	0x04b607e0,
+	0x0001d006,
+	0x00f804bd,
+/* 0x085c: i2c_drive_scl_lo */
+	0x07e407f1,
+	0xd00604b6,
+	0x04bd0001,
+/* 0x086a: i2c_drive_sda */
 	0x36b000f8,
 	0x110bf400,
 	0x07e007f1,
 	0xd00604b6,
-	0x04bd0001,
-/* 0x06fa: i2c_drive_scl_lo */
+	0x04bd0002,
+/* 0x087e: i2c_drive_sda_lo */
 	0x07f100f8,
 	0x04b607e4,
-	0x0001d006,
-	0x00f804bd,
-/* 0x0708: i2c_drive_sda */
-	0xf40036b0,
-	0x07f1110b,
-	0x04b607e0,
 	0x0002d006,
 	0x00f804bd,
-/* 0x071c: i2c_drive_sda_lo */
-	0x07e407f1,
-	0xd00604b6,
-	0x04bd0002,
-/* 0x072a: i2c_sense_scl */
-	0x32f400f8,
-	0xc437f101,
-	0x0634b607,
-	0xfd0033cf,
-	0x0bf40431,
-	0x0131f406,
-/* 0x0740: i2c_sense_scl_done */
-/* 0x0742: i2c_sense_sda */
-	0x32f400f8,
-	0xc437f101,
-	0x0634b607,
-	0xfd0033cf,
-	0x0bf40432,
-	0x0131f406,
-/* 0x0758: i2c_sense_sda_done */
-/* 0x075a: i2c_raise_scl */
-	0x40f900f8,
-	0x089847f1,
-	0xf50137f0,
-/* 0x0767: i2c_raise_scl_wait */
-	0xf106e621,
-	0xf403e8e7,
-	0x21f57f21,
-	0x01f4072a,
-	0x0142b609,
-/* 0x077b: i2c_raise_scl_done */
-	0xfcef1bf4,
-/* 0x077f: i2c_start */
-	0xf500f840,
-	0xf4072a21,
-	0x21f50d11,
-	0x11f40742,
-	0x300ef406,
-/* 0x0790: i2c_start_rep */
-	0xf50037f0,
-	0xf006e621,
-	0x21f50137,
-	0x76bb0708,
-	0x0465b600,
-	0x659450f9,
-	0x0256bb04,
-	0x75fd50bd,
-	0xf550fc04,
-	0xb6075a21,
-	0x11f40464,
-/* 0x07bd: i2c_start_send */
-	0x0037f01f,
-	0x070821f5,
-	0x1388e7f1,
-	0xf07f21f4,
-	0x21f50037,
-	0xe7f106e6,
-	0x21f41388,
-/* 0x07d9: i2c_start_out */
-/* 0x07db: i2c_stop */
-	0xf000f87f,
-	0x21f50037,
-	0x37f006e6,
-	0x0821f500,
-	0xe8e7f107,
+/* 0x088c: i2c_sense_scl */
+	0xf10132f4,
+	0xb607c437,
+	0x33cf0634,
+	0x0431fd00,
+	0xf4060bf4,
+/* 0x08a2: i2c_sense_scl_done */
+	0x00f80131,
+/* 0x08a4: i2c_sense_sda */
+	0xf10132f4,
+	0xb607c437,
+	0x33cf0634,
+	0x0432fd00,
+	0xf4060bf4,
+/* 0x08ba: i2c_sense_sda_done */
+	0x00f80131,
+/* 0x08bc: i2c_raise_scl */
+	0x47f140f9,
+	0x37f00898,
+	0x4821f501,
+/* 0x08c9: i2c_raise_scl_wait */
+	0xe8e7f108,
 	0x7f21f403,
-	0xf50137f0,
-	0xf106e621,
-	0xf41388e7,
-	0x37f07f21,
-	0x0821f501,
-	0x88e7f107,
-	0x7f21f413,
-/* 0x080e: i2c_bitw */
-	0x21f500f8,
-	0xe7f10708,
-	0x21f403e8,
-	0x0076bb7f,
-	0xf90465b6,
-	0x04659450,
-	0xbd0256bb,
-	0x0475fd50,
-	0x21f550fc,
-	0x64b6075a,
-	0x1811f404,
-	0x1388e7f1,
-	0xf07f21f4,
+	0x088c21f5,
+	0xb60901f4,
+	0x1bf40142,
+/* 0x08dd: i2c_raise_scl_done */
+	0xf840fcef,
+/* 0x08e1: i2c_start */
+	0x8c21f500,
+	0x0d11f408,
+	0x08a421f5,
+	0xf40611f4,
+/* 0x08f2: i2c_start_rep */
+	0x37f0300e,
+	0x4821f500,
+	0x0137f008,
+	0x086a21f5,
+	0xb60076bb,
+	0x50f90465,
+	0xbb046594,
+	0x50bd0256,
+	0xfc0475fd,
+	0xbc21f550,
+	0x0464b608,
+/* 0x091f: i2c_start_send */
+	0xf01f11f4,
 	0x21f50037,
-	0xe7f106e6,
+	0xe7f1086a,
 	0x21f41388,
-/* 0x084d: i2c_bitw_out */
-/* 0x084f: i2c_bitr */
-	0xf000f87f,
-	0x21f50137,
-	0xe7f10708,
-	0x21f403e8,
-	0x0076bb7f,
-	0xf90465b6,
-	0x04659450,
-	0xbd0256bb,
-	0x0475fd50,
-	0x21f550fc,
-	0x64b6075a,
-	0x1b11f404,
-	0x074221f5,
+	0x0037f07f,
+	0x084821f5,
+	0x1388e7f1,
+/* 0x093b: i2c_start_out */
+	0xf87f21f4,
+/* 0x093d: i2c_stop */
+	0x0037f000,
+	0x084821f5,
 	0xf50037f0,
-	0xf106e621,
+	0xf1086a21,
+	0xf403e8e7,
+	0x37f07f21,
+	0x4821f501,
+	0x88e7f108,
+	0x7f21f413,
+	0xf50137f0,
+	0xf1086a21,
 	0xf41388e7,
-	0x3cf07f21,
-	0x0131f401,
-/* 0x0894: i2c_bitr_done */
-/* 0x0896: i2c_get_byte */
-	0x57f000f8,
-	0x0847f000,
-/* 0x089c: i2c_get_byte_next */
-	0xbb0154b6,
+	0x00f87f21,
+/* 0x0970: i2c_bitw */
+	0x086a21f5,
+	0x03e8e7f1,
+	0xbb7f21f4,
 	0x65b60076,
 	0x9450f904,
 	0x56bb0465,
 	0xfd50bd02,
 	0x50fc0475,
-	0x084f21f5,
+	0x08bc21f5,
 	0xf40464b6,
-	0x53fd2b11,
-	0x0142b605,
-	0xf0d81bf4,
-	0x76bb0137,
-	0x0465b600,
-	0x659450f9,
-	0x0256bb04,
-	0x75fd50bd,
-	0xf550fc04,
-	0xb6080e21,
-/* 0x08e6: i2c_get_byte_done */
-	0x00f80464,
-/* 0x08e8: i2c_put_byte */
-/* 0x08eb: i2c_put_byte_next */
-	0xb60847f0,
-	0x54ff0142,
-	0x0076bb38,
+	0xe7f11811,
+	0x21f41388,
+	0x0037f07f,
+	0x084821f5,
+	0x1388e7f1,
+/* 0x09af: i2c_bitw_out */
+	0xf87f21f4,
+/* 0x09b1: i2c_bitr */
+	0x0137f000,
+	0x086a21f5,
+	0x03e8e7f1,
+	0xbb7f21f4,
+	0x65b60076,
+	0x9450f904,
+	0x56bb0465,
+	0xfd50bd02,
+	0x50fc0475,
+	0x08bc21f5,
+	0xf40464b6,
+	0x21f51b11,
+	0x37f008a4,
+	0x4821f500,
+	0x88e7f108,
+	0x7f21f413,
+	0xf4013cf0,
+/* 0x09f6: i2c_bitr_done */
+	0x00f80131,
+/* 0x09f8: i2c_get_byte */
+	0xf00057f0,
+/* 0x09fe: i2c_get_byte_next */
+	0x54b60847,
+	0x0076bb01,
 	0xf90465b6,
 	0x04659450,
 	0xbd0256bb,
 	0x0475fd50,
 	0x21f550fc,
-	0x64b6080e,
-	0x3411f404,
-	0xf40046b0,
-	0x76bbd81b,
-	0x0465b600,
-	0x659450f9,
-	0x0256bb04,
-	0x75fd50bd,
-	0xf550fc04,
-	0xb6084f21,
-	0x11f40464,
-	0x0076bb0f,
-	0xf40136b0,
-	0x32f4061b,
-/* 0x0941: i2c_put_byte_done */
-/* 0x0943: i2c_addr */
-	0xbb00f801,
+	0x64b609b1,
+	0x2b11f404,
+	0xb60553fd,
+	0x1bf40142,
+	0x0137f0d8,
+	0xb60076bb,
+	0x50f90465,
+	0xbb046594,
+	0x50bd0256,
+	0xfc0475fd,
+	0x7021f550,
+	0x0464b609,
+/* 0x0a48: i2c_get_byte_done */
+/* 0x0a4a: i2c_put_byte */
+	0x47f000f8,
+/* 0x0a4d: i2c_put_byte_next */
+	0x0142b608,
+	0xbb3854ff,
 	0x65b60076,
 	0x9450f904,
 	0x56bb0465,
 	0xfd50bd02,
 	0x50fc0475,
-	0x077f21f5,
+	0x097021f5,
 	0xf40464b6,
-	0xc3e72911,
-	0x34b6012e,
-	0x0553fd01,
+	0x46b03411,
+	0xd81bf400,
 	0xb60076bb,
 	0x50f90465,
 	0xbb046594,
 	0x50bd0256,
 	0xfc0475fd,
-	0xe821f550,
-	0x0464b608,
-/* 0x0988: i2c_addr_done */
-/* 0x098a: i2c_acquire_addr */
-	0xcec700f8,
-	0x02e4b6f8,
-	0x0c10e0b7,
-	0xf800ee98,
-/* 0x0999: i2c_acquire */
-	0x8a21f500,
-	0x0421f409,
-	0xf403d9f0,
-	0x00f83f21,
-/* 0x09a8: i2c_release */
-	0x098a21f5,
-	0xf00421f4,
-	0x21f403da,
-/* 0x09b7: i2c_recv */
-	0xf400f83f,
-	0xc1c70132,
-	0x0214b6f8,
-	0xf52816b0,
-	0xa0013a1f,
-	0x980be813,
-	0x13a00032,
-	0x31980bc0,
-	0x0231f400,
-	0xe0f9d0f9,
-	0x67f1d0f9,
-	0x63f10000,
-	0x67921000,
-	0x0076bb01,
-	0xf90465b6,
-	0x04659450,
-	0xbd0256bb,
-	0x0475fd50,
-	0x21f550fc,
-	0x64b60999,
-	0xb0d0fc04,
-	0x1bf500d6,
-	0x57f000b3,
+	0xb121f550,
+	0x0464b609,
+	0xbb0f11f4,
+	0x36b00076,
+	0x061bf401,
+/* 0x0aa3: i2c_put_byte_done */
+	0xf80132f4,
+/* 0x0aa5: i2c_addr */
 	0x0076bb00,
 	0xf90465b6,
 	0x04659450,
 	0xbd0256bb,
 	0x0475fd50,
 	0x21f550fc,
-	0x64b60943,
-	0xd011f504,
-	0xe0c5c700,
-	0xb60076bb,
-	0x50f90465,
-	0xbb046594,
-	0x50bd0256,
-	0xfc0475fd,
-	0xe821f550,
-	0x0464b608,
-	0x00ad11f5,
-	0xbb0157f0,
+	0x64b608e1,
+	0x2911f404,
+	0x012ec3e7,
+	0xfd0134b6,
+	0x76bb0553,
+	0x0465b600,
+	0x659450f9,
+	0x0256bb04,
+	0x75fd50bd,
+	0xf550fc04,
+	0xb60a4a21,
+/* 0x0aea: i2c_addr_done */
+	0x00f80464,
+/* 0x0aec: i2c_acquire_addr */
+	0xb6f8cec7,
+	0xe0b702e4,
+	0xee980d1c,
+/* 0x0afb: i2c_acquire */
+	0xf500f800,
+	0xf40aec21,
+	0xd9f00421,
+	0x3f21f403,
+/* 0x0b0a: i2c_release */
+	0x21f500f8,
+	0x21f40aec,
+	0x03daf004,
+	0xf83f21f4,
+/* 0x0b19: i2c_recv */
+	0x0132f400,
+	0xb6f8c1c7,
+	0x16b00214,
+	0x3a1ff528,
+	0xf413a001,
+	0x0032980c,
+	0x0ccc13a0,
+	0xf4003198,
+	0xd0f90231,
+	0xd0f9e0f9,
+	0x000067f1,
+	0x100063f1,
+	0xbb016792,
 	0x65b60076,
 	0x9450f904,
 	0x56bb0465,
 	0xfd50bd02,
 	0x50fc0475,
-	0x094321f5,
-	0xf50464b6,
-	0xbb008a11,
+	0x0afb21f5,
+	0xfc0464b6,
+	0x00d6b0d0,
+	0x00b31bf5,
+	0xbb0057f0,
 	0x65b60076,
 	0x9450f904,
 	0x56bb0465,
 	0xfd50bd02,
 	0x50fc0475,
-	0x089621f5,
-	0xf40464b6,
-	0x5bcb6a11,
-	0x0076bbe0,
+	0x0aa521f5,
+	0xf50464b6,
+	0xc700d011,
+	0x76bbe0c5,
+	0x0465b600,
+	0x659450f9,
+	0x0256bb04,
+	0x75fd50bd,
+	0xf550fc04,
+	0xb60a4a21,
+	0x11f50464,
+	0x57f000ad,
+	0x0076bb01,
 	0xf90465b6,
 	0x04659450,
 	0xbd0256bb,
 	0x0475fd50,
 	0x21f550fc,
-	0x64b607db,
-	0x025bb904,
-	0x0ef474bd,
-/* 0x0abd: i2c_recv_not_rd08 */
-	0x01d6b043,
-	0xf03d1bf4,
-	0x21f50057,
-	0x11f40943,
-	0xe0c5c733,
-	0x08e821f5,
-	0xf02911f4,
-	0x21f50057,
-	0x11f40943,
-	0xe0b5c71f,
-	0x08e821f5,
-	0xf51511f4,
-	0xbd07db21,
-	0x08c5c774,
-	0xf4091bf4,
-	0x0ef40232,
-/* 0x0afd: i2c_recv_not_wr08 */
-/* 0x0afd: i2c_recv_done */
-	0xf8cec703,
-	0x09a821f5,
-	0xd0fce0fc,
-	0xb90a12f4,
-	0x21f5027c,
-/* 0x0b12: i2c_recv_exit */
-	0x00f80342,
-/* 0x0b14: i2c_init */
-/* 0x0b16: test_recv */
-	0x17f100f8,
-	0x14b605d8,
-	0x0011cf06,
-	0xf10110b6,
-	0xb605d807,
-	0x01d00604,
-	0xf104bd00,
-	0xf1d900e7,
-	0xf5134fe3,
-	0xf8026221,
-/* 0x0b3d: test_init */
-	0x00e7f100,
-	0x6221f508,
-/* 0x0b47: idle_recv */
-	0xf800f802,
-/* 0x0b49: idle */
-	0x0031f400,
-	0x05d417f1,
+	0x64b60aa5,
+	0x8a11f504,
+	0x0076bb00,
+	0xf90465b6,
+	0x04659450,
+	0xbd0256bb,
+	0x0475fd50,
+	0x21f550fc,
+	0x64b609f8,
+	0x6a11f404,
+	0xbbe05bcb,
+	0x65b60076,
+	0x9450f904,
+	0x56bb0465,
+	0xfd50bd02,
+	0x50fc0475,
+	0x093d21f5,
+	0xb90464b6,
+	0x74bd025b,
+/* 0x0c1f: i2c_recv_not_rd08 */
+	0xb0430ef4,
+	0x1bf401d6,
+	0x0057f03d,
+	0x0aa521f5,
+	0xc73311f4,
+	0x21f5e0c5,
+	0x11f40a4a,
+	0x0057f029,
+	0x0aa521f5,
+	0xc71f11f4,
+	0x21f5e0b5,
+	0x11f40a4a,
+	0x3d21f515,
+	0xc774bd09,
+	0x1bf408c5,
+	0x0232f409,
+/* 0x0c5f: i2c_recv_not_wr08 */
+/* 0x0c5f: i2c_recv_done */
+	0xc7030ef4,
+	0x21f5f8ce,
+	0xe0fc0b0a,
+	0x12f4d0fc,
+	0x027cb90a,
+	0x034221f5,
+/* 0x0c74: i2c_recv_exit */
+/* 0x0c76: i2c_init */
+	0x00f800f8,
+/* 0x0c78: test_recv */
+	0x05d817f1,
 	0xcf0614b6,
 	0x10b60011,
-	0xd407f101,
+	0xd807f101,
 	0x0604b605,
 	0xbd0001d0,
-/* 0x0b65: idle_loop */
-	0x5817f004,
-/* 0x0b6b: idle_proc */
-/* 0x0b6b: idle_proc_exec */
-	0xf90232f4,
-	0x021eb910,
-	0x034b21f5,
-	0x11f410fc,
-	0x0231f409,
-/* 0x0b7f: idle_proc_next */
-	0xb6ef0ef4,
-	0x1fb85810,
-	0xe61bf406,
-	0xf4dd02f4,
-	0x0ef40028,
-	0x000000bb,
-	0x00000000,
-	0x00000000,
-	0x00000000,
-	0x00000000,
-	0x00000000,
-	0x00000000,
-	0x00000000,
-	0x00000000,
-	0x00000000,
-	0x00000000,
-	0x00000000,
-	0x00000000,
-	0x00000000,
-	0x00000000,
-	0x00000000,
-	0x00000000,
-	0x00000000,
-	0x00000000,
-	0x00000000,
-	0x00000000,
-	0x00000000,
-	0x00000000,
-	0x00000000,
-	0x00000000,
+	0x00e7f104,
+	0x4fe3f1d9,
+	0x6221f513,
+/* 0x0c9f: test_init */
+	0xf100f802,
+	0xf50800e7,
+	0xf8026221,
+/* 0x0ca9: idle_recv */
+/* 0x0cab: idle */
+	0xf400f800,
+	0x17f10031,
+	0x14b605d4,
+	0x0011cf06,
+	0xf10110b6,
+	0xb605d407,
+	0x01d00604,
+/* 0x0cc7: idle_loop */
+	0xf004bd00,
+	0x32f45817,
+/* 0x0ccd: idle_proc */
+/* 0x0ccd: idle_proc_exec */
+	0xb910f902,
+	0x21f5021e,
+	0x10fc034b,
+	0xf40911f4,
+	0x0ef40231,
+/* 0x0ce1: idle_proc_next */
+	0x5810b6ef,
+	0xf4061fb8,
+	0x02f4e61b,
+	0x0028f4dd,
+	0x00bb0ef4,
 	0x00000000,
 	0x00000000,
 	0x00000000,
diff --git a/drivers/gpu/drm/nouveau/core/subdev/pwr/fuc/nvc0.fuc.h b/drivers/gpu/drm/nouveau/core/subdev/pwr/fuc/nvc0.fuc.h
index ca30fa4011b5..90221d973f84 100644
--- a/drivers/gpu/drm/nouveau/core/subdev/pwr/fuc/nvc0.fuc.h
+++ b/drivers/gpu/drm/nouveau/core/subdev/pwr/fuc/nvc0.fuc.h
@@ -46,8 +46,8 @@ uint32_t nvc0_pwr_data[] = {
 	0x00000000,
 	0x00000000,
 	0x584d454d,
-	0x0000074b,
-	0x0000073d,
+	0x0000075e,
+	0x00000750,
 	0x00000000,
 	0x00000000,
 	0x00000000,
@@ -68,8 +68,8 @@ uint32_t nvc0_pwr_data[] = {
 	0x00000000,
 	0x00000000,
 	0x46524550,
-	0x0000074f,
-	0x0000074d,
+	0x00000762,
+	0x00000760,
 	0x00000000,
 	0x00000000,
 	0x00000000,
@@ -90,8 +90,8 @@ uint32_t nvc0_pwr_data[] = {
 	0x00000000,
 	0x00000000,
 	0x5f433249,
-	0x00000b7f,
-	0x00000a22,
+	0x00000b92,
+	0x00000a35,
 	0x00000000,
 	0x00000000,
 	0x00000000,
@@ -112,8 +112,8 @@ uint32_t nvc0_pwr_data[] = {
 	0x00000000,
 	0x00000000,
 	0x54534554,
-	0x00000ba8,
-	0x00000b81,
+	0x00000bbb,
+	0x00000b94,
 	0x00000000,
 	0x00000000,
 	0x00000000,
@@ -134,8 +134,8 @@ uint32_t nvc0_pwr_data[] = {
 	0x00000000,
 	0x00000000,
 	0x454c4449,
-	0x00000bb4,
-	0x00000bb2,
+	0x00000bc7,
+	0x00000bc5,
 	0x00000000,
 	0x00000000,
 	0x00000000,
@@ -246,13 +246,15 @@ uint32_t nvc0_pwr_data[] = {
 	0x00010006,
 	0x00000000,
 	0x00000663,
-/* 0x03b8: memx_func_tail */
-/* 0x03b8: memx_ts_start */
+	0x00000007,
 	0x00000000,
-/* 0x03bc: memx_ts_end */
+	0x000006e9,
+/* 0x03c4: memx_func_tail */
+/* 0x03c4: memx_ts_start */
 	0x00000000,
-/* 0x03c0: memx_data_head */
+/* 0x03c8: memx_ts_end */
 	0x00000000,
+/* 0x03cc: memx_data_head */
 	0x00000000,
 	0x00000000,
 	0x00000000,
@@ -764,8 +766,75 @@ uint32_t nvc0_pwr_data[] = {
 	0x00000000,
 	0x00000000,
 	0x00000000,
-/* 0x0bc0: memx_data_tail */
-/* 0x0bc0: i2c_scl_map */
+	0x00000000,
+/* 0x0bcc: memx_data_tail */
+/* 0x0bcc: memx_train_head */
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+/* 0x0ccc: memx_train_tail */
+/* 0x0ccc: i2c_scl_map */
 	0x00001000,
 	0x00004000,
 	0x00010000,
@@ -776,7 +845,7 @@ uint32_t nvc0_pwr_data[] = {
 	0x01000000,
 	0x04000000,
 	0x10000000,
-/* 0x0be8: i2c_sda_map */
+/* 0x0cf4: i2c_sda_map */
 	0x00002000,
 	0x00008000,
 	0x00020000,
@@ -787,7 +856,7 @@ uint32_t nvc0_pwr_data[] = {
 	0x02000000,
 	0x08000000,
 	0x20000000,
-/* 0x0c10: i2c_ctrl */
+/* 0x0d1c: i2c_ctrl */
 	0x0000e138,
 	0x0000e150,
 	0x0000e168,
@@ -845,9 +914,6 @@ uint32_t nvc0_pwr_data[] = {
 	0x00000000,
 	0x00000000,
 	0x00000000,
-	0x00000000,
-	0x00000000,
-	0x00000000,
 };
 
 uint32_t nvc0_pwr_code[] = {
@@ -1272,10 +1338,10 @@ uint32_t nvc0_pwr_code[] = {
 	0xcf0664b6,
 	0x06800066,
 /* 0x05db: memx_func_leave */
-	0xf000f8ee,
+	0xf000f8f1,
 	0x64b62c67,
 	0x0066cf06,
-	0xf0ef0680,
+	0xf0f20680,
 	0x07f10467,
 	0x04b607e4,
 	0x0006d006,
@@ -1350,382 +1416,450 @@ uint32_t nvc0_pwr_code[] = {
 	0x1e9800f8,
 	0x0410b600,
 	0xf87f21f4,
-/* 0x06e9: memx_exec */
-	0xf9e0f900,
-	0x02c1b9d0,
-/* 0x06f3: memx_exec_next */
-	0x9802b2b9,
-	0x10b60013,
-	0xf034e704,
-	0xe033e701,
-	0x0132b601,
-	0x980c30f0,
-	0x55f9de35,
-	0xf40612b8,
-	0x0b98e41e,
-	0xef0c98ee,
-	0xf102cbbb,
-	0xb607c4b7,
-	0xbbcf06b4,
-	0xfcd0fc00,
-	0x4221f5e0,
-/* 0x072f: memx_info */
-	0xf100f803,
-	0xf103c0c7,
-	0xf50800b7,
+/* 0x06e9: memx_func_train */
+/* 0x06eb: memx_exec */
+	0xf900f800,
+	0xb9d0f9e0,
+	0xb2b902c1,
+/* 0x06f5: memx_exec_next */
+	0x00139802,
+	0xe70410b6,
+	0xe701f034,
+	0xb601e033,
+	0x30f00132,
+	0xde35980c,
+	0x12b855f9,
+	0xe41ef406,
+	0x98f10b98,
+	0xcbbbf20c,
+	0xc4b7f102,
+	0x06b4b607,
+	0xfc00bbcf,
+	0xf5e0fcd0,
 	0xf8034221,
-/* 0x073d: memx_recv */
-	0x01d6b000,
-	0xb0a90bf4,
-	0x0bf400d6,
-/* 0x074b: memx_init */
-	0xf800f8e9,
-/* 0x074d: perf_recv */
-/* 0x074f: perf_init */
-	0xf800f800,
-/* 0x0751: i2c_drive_scl */
-	0x0036b000,
-	0xf1110bf4,
-	0xb607e007,
-	0x01d00604,
-	0xf804bd00,
-/* 0x0765: i2c_drive_scl_lo */
-	0xe407f100,
-	0x0604b607,
-	0xbd0001d0,
-/* 0x0773: i2c_drive_sda */
-	0xb000f804,
-	0x0bf40036,
-	0xe007f111,
-	0x0604b607,
-	0xbd0002d0,
-/* 0x0787: i2c_drive_sda_lo */
-	0xf100f804,
-	0xb607e407,
-	0x02d00604,
-	0xf804bd00,
-/* 0x0795: i2c_sense_scl */
-	0x0132f400,
-	0x07c437f1,
-	0xcf0634b6,
-	0x31fd0033,
-	0x060bf404,
-/* 0x07ab: i2c_sense_scl_done */
-	0xf80131f4,
-/* 0x07ad: i2c_sense_sda */
-	0x0132f400,
-	0x07c437f1,
-	0xcf0634b6,
-	0x32fd0033,
-	0x060bf404,
-/* 0x07c3: i2c_sense_sda_done */
-	0xf80131f4,
-/* 0x07c5: i2c_raise_scl */
-	0xf140f900,
-	0xf0089847,
-	0x21f50137,
-/* 0x07d2: i2c_raise_scl_wait */
-	0xe7f10751,
-	0x21f403e8,
-	0x9521f57f,
-	0x0901f407,
-	0xf40142b6,
-/* 0x07e6: i2c_raise_scl_done */
-	0x40fcef1b,
-/* 0x07ea: i2c_start */
-	0x21f500f8,
-	0x11f40795,
-	0xad21f50d,
-	0x0611f407,
-/* 0x07fb: i2c_start_rep */
-	0xf0300ef4,
-	0x21f50037,
-	0x37f00751,
-	0x7321f501,
-	0x0076bb07,
-	0xf90465b6,
-	0x04659450,
-	0xbd0256bb,
-	0x0475fd50,
-	0x21f550fc,
-	0x64b607c5,
-	0x1f11f404,
-/* 0x0828: i2c_start_send */
-	0xf50037f0,
-	0xf1077321,
-	0xf41388e7,
-	0x37f07f21,
-	0x5121f500,
-	0x88e7f107,
-	0x7f21f413,
-/* 0x0844: i2c_start_out */
-/* 0x0846: i2c_stop */
-	0x37f000f8,
-	0x5121f500,
-	0x0037f007,
-	0x077321f5,
-	0x03e8e7f1,
-	0xf07f21f4,
-	0x21f50137,
-	0xe7f10751,
-	0x21f41388,
-	0x0137f07f,
-	0x077321f5,
-	0x1388e7f1,
-	0xf87f21f4,
-/* 0x0879: i2c_bitw */
-	0x7321f500,
+/* 0x0731: memx_info */
+	0x01c67000,
+/* 0x0737: memx_info_data */
+	0xf10e0bf4,
+	0xf103ccc7,
+	0xf40800b7,
+/* 0x0742: memx_info_train */
+	0xc7f10b0e,
+	0xb7f10bcc,
+/* 0x074a: memx_info_send */
+	0x21f50100,
+	0x00f80342,
+/* 0x0750: memx_recv */
+	0xf401d6b0,
+	0xd6b0980b,
+	0xd80bf400,
+/* 0x075e: memx_init */
+	0x00f800f8,
+/* 0x0760: perf_recv */
+/* 0x0762: perf_init */
+	0x00f800f8,
+/* 0x0764: i2c_drive_scl */
+	0xf40036b0,
+	0x07f1110b,
+	0x04b607e0,
+	0x0001d006,
+	0x00f804bd,
+/* 0x0778: i2c_drive_scl_lo */
+	0x07e407f1,
+	0xd00604b6,
+	0x04bd0001,
+/* 0x0786: i2c_drive_sda */
+	0x36b000f8,
+	0x110bf400,
+	0x07e007f1,
+	0xd00604b6,
+	0x04bd0002,
+/* 0x079a: i2c_drive_sda_lo */
+	0x07f100f8,
+	0x04b607e4,
+	0x0002d006,
+	0x00f804bd,
+/* 0x07a8: i2c_sense_scl */
+	0xf10132f4,
+	0xb607c437,
+	0x33cf0634,
+	0x0431fd00,
+	0xf4060bf4,
+/* 0x07be: i2c_sense_scl_done */
+	0x00f80131,
+/* 0x07c0: i2c_sense_sda */
+	0xf10132f4,
+	0xb607c437,
+	0x33cf0634,
+	0x0432fd00,
+	0xf4060bf4,
+/* 0x07d6: i2c_sense_sda_done */
+	0x00f80131,
+/* 0x07d8: i2c_raise_scl */
+	0x47f140f9,
+	0x37f00898,
+	0x6421f501,
+/* 0x07e5: i2c_raise_scl_wait */
 	0xe8e7f107,
 	0x7f21f403,
+	0x07a821f5,
+	0xb60901f4,
+	0x1bf40142,
+/* 0x07f9: i2c_raise_scl_done */
+	0xf840fcef,
+/* 0x07fd: i2c_start */
+	0xa821f500,
+	0x0d11f407,
+	0x07c021f5,
+	0xf40611f4,
+/* 0x080e: i2c_start_rep */
+	0x37f0300e,
+	0x6421f500,
+	0x0137f007,
+	0x078621f5,
 	0xb60076bb,
 	0x50f90465,
 	0xbb046594,
 	0x50bd0256,
 	0xfc0475fd,
-	0xc521f550,
+	0xd821f550,
 	0x0464b607,
-	0xf11811f4,
-	0xf41388e7,
+/* 0x083b: i2c_start_send */
+	0xf01f11f4,
+	0x21f50037,
+	0xe7f10786,
+	0x21f41388,
+	0x0037f07f,
+	0x076421f5,
+	0x1388e7f1,
+/* 0x0857: i2c_start_out */
+	0xf87f21f4,
+/* 0x0859: i2c_stop */
+	0x0037f000,
+	0x076421f5,
+	0xf50037f0,
+	0xf1078621,
+	0xf403e8e7,
 	0x37f07f21,
-	0x5121f500,
+	0x6421f501,
 	0x88e7f107,
 	0x7f21f413,
-/* 0x08b8: i2c_bitw_out */
-/* 0x08ba: i2c_bitr */
-	0x37f000f8,
-	0x7321f501,
-	0xe8e7f107,
-	0x7f21f403,
-	0xb60076bb,
-	0x50f90465,
-	0xbb046594,
-	0x50bd0256,
-	0xfc0475fd,
-	0xc521f550,
-	0x0464b607,
-	0xf51b11f4,
-	0xf007ad21,
-	0x21f50037,
-	0xe7f10751,
+	0xf50137f0,
+	0xf1078621,
+	0xf41388e7,
+	0x00f87f21,
+/* 0x088c: i2c_bitw */
+	0x078621f5,
+	0x03e8e7f1,
+	0xbb7f21f4,
+	0x65b60076,
+	0x9450f904,
+	0x56bb0465,
+	0xfd50bd02,
+	0x50fc0475,
+	0x07d821f5,
+	0xf40464b6,
+	0xe7f11811,
 	0x21f41388,
-	0x013cf07f,
-/* 0x08ff: i2c_bitr_done */
-	0xf80131f4,
-/* 0x0901: i2c_get_byte */
-	0x0057f000,
-/* 0x0907: i2c_get_byte_next */
-	0xb60847f0,
-	0x76bb0154,
-	0x0465b600,
-	0x659450f9,
-	0x0256bb04,
-	0x75fd50bd,
-	0xf550fc04,
-	0xb608ba21,
-	0x11f40464,
-	0x0553fd2b,
-	0xf40142b6,
-	0x37f0d81b,
+	0x0037f07f,
+	0x076421f5,
+	0x1388e7f1,
+/* 0x08cb: i2c_bitw_out */
+	0xf87f21f4,
+/* 0x08cd: i2c_bitr */
+	0x0137f000,
+	0x078621f5,
+	0x03e8e7f1,
+	0xbb7f21f4,
+	0x65b60076,
+	0x9450f904,
+	0x56bb0465,
+	0xfd50bd02,
+	0x50fc0475,
+	0x07d821f5,
+	0xf40464b6,
+	0x21f51b11,
+	0x37f007c0,
+	0x6421f500,
+	0x88e7f107,
+	0x7f21f413,
+	0xf4013cf0,
+/* 0x0912: i2c_bitr_done */
+	0x00f80131,
+/* 0x0914: i2c_get_byte */
+	0xf00057f0,
+/* 0x091a: i2c_get_byte_next */
+	0x54b60847,
 	0x0076bb01,
 	0xf90465b6,
 	0x04659450,
 	0xbd0256bb,
 	0x0475fd50,
 	0x21f550fc,
-	0x64b60879,
-/* 0x0951: i2c_get_byte_done */
-/* 0x0953: i2c_put_byte */
-	0xf000f804,
-/* 0x0956: i2c_put_byte_next */
-	0x42b60847,
-	0x3854ff01,
+	0x64b608cd,
+	0x2b11f404,
+	0xb60553fd,
+	0x1bf40142,
+	0x0137f0d8,
 	0xb60076bb,
 	0x50f90465,
 	0xbb046594,
 	0x50bd0256,
 	0xfc0475fd,
-	0x7921f550,
+	0x8c21f550,
 	0x0464b608,
-	0xb03411f4,
-	0x1bf40046,
-	0x0076bbd8,
+/* 0x0964: i2c_get_byte_done */
+/* 0x0966: i2c_put_byte */
+	0x47f000f8,
+/* 0x0969: i2c_put_byte_next */
+	0x0142b608,
+	0xbb3854ff,
+	0x65b60076,
+	0x9450f904,
+	0x56bb0465,
+	0xfd50bd02,
+	0x50fc0475,
+	0x088c21f5,
+	0xf40464b6,
+	0x46b03411,
+	0xd81bf400,
+	0xb60076bb,
+	0x50f90465,
+	0xbb046594,
+	0x50bd0256,
+	0xfc0475fd,
+	0xcd21f550,
+	0x0464b608,
+	0xbb0f11f4,
+	0x36b00076,
+	0x061bf401,
+/* 0x09bf: i2c_put_byte_done */
+	0xf80132f4,
+/* 0x09c1: i2c_addr */
+	0x0076bb00,
 	0xf90465b6,
 	0x04659450,
 	0xbd0256bb,
 	0x0475fd50,
 	0x21f550fc,
-	0x64b608ba,
-	0x0f11f404,
-	0xb00076bb,
-	0x1bf40136,
-	0x0132f406,
-/* 0x09ac: i2c_put_byte_done */
-/* 0x09ae: i2c_addr */
-	0x76bb00f8,
+	0x64b607fd,
+	0x2911f404,
+	0x012ec3e7,
+	0xfd0134b6,
+	0x76bb0553,
 	0x0465b600,
 	0x659450f9,
 	0x0256bb04,
 	0x75fd50bd,
 	0xf550fc04,
-	0xb607ea21,
-	0x11f40464,
-	0x2ec3e729,
-	0x0134b601,
-	0xbb0553fd,
+	0xb6096621,
+/* 0x0a06: i2c_addr_done */
+	0x00f80464,
+/* 0x0a08: i2c_acquire_addr */
+	0xb6f8cec7,
+	0xe0b702e4,
+	0xee980d1c,
+/* 0x0a17: i2c_acquire */
+	0xf500f800,
+	0xf40a0821,
+	0xd9f00421,
+	0x3f21f403,
+/* 0x0a26: i2c_release */
+	0x21f500f8,
+	0x21f40a08,
+	0x03daf004,
+	0xf83f21f4,
+/* 0x0a35: i2c_recv */
+	0x0132f400,
+	0xb6f8c1c7,
+	0x16b00214,
+	0x3a1ff528,
+	0xf413a001,
+	0x0032980c,
+	0x0ccc13a0,
+	0xf4003198,
+	0xd0f90231,
+	0xd0f9e0f9,
+	0x000067f1,
+	0x100063f1,
+	0xbb016792,
 	0x65b60076,
 	0x9450f904,
 	0x56bb0465,
 	0xfd50bd02,
 	0x50fc0475,
-	0x095321f5,
-/* 0x09f3: i2c_addr_done */
-	0xf80464b6,
-/* 0x09f5: i2c_acquire_addr */
-	0xf8cec700,
-	0xb702e4b6,
-	0x980c10e0,
-	0x00f800ee,
-/* 0x0a04: i2c_acquire */
-	0x09f521f5,
-	0xf00421f4,
-	0x21f403d9,
-/* 0x0a13: i2c_release */
-	0xf500f83f,
-	0xf409f521,
-	0xdaf00421,
-	0x3f21f403,
-/* 0x0a22: i2c_recv */
-	0x32f400f8,
-	0xf8c1c701,
-	0xb00214b6,
-	0x1ff52816,
-	0x13a0013a,
-	0x32980be8,
-	0xc013a000,
-	0x0031980b,
-	0xf90231f4,
-	0xf9e0f9d0,
-	0x0067f1d0,
-	0x0063f100,
-	0x01679210,
-	0xb60076bb,
-	0x50f90465,
-	0xbb046594,
-	0x50bd0256,
-	0xfc0475fd,
-	0x0421f550,
-	0x0464b60a,
-	0xd6b0d0fc,
-	0xb31bf500,
-	0x0057f000,
-	0xb60076bb,
-	0x50f90465,
-	0xbb046594,
-	0x50bd0256,
-	0xfc0475fd,
-	0xae21f550,
-	0x0464b609,
-	0x00d011f5,
-	0xbbe0c5c7,
+	0x0a1721f5,
+	0xfc0464b6,
+	0x00d6b0d0,
+	0x00b31bf5,
+	0xbb0057f0,
 	0x65b60076,
 	0x9450f904,
 	0x56bb0465,
 	0xfd50bd02,
 	0x50fc0475,
-	0x095321f5,
+	0x09c121f5,
 	0xf50464b6,
-	0xf000ad11,
-	0x76bb0157,
+	0xc700d011,
+	0x76bbe0c5,
 	0x0465b600,
 	0x659450f9,
 	0x0256bb04,
 	0x75fd50bd,
 	0xf550fc04,
-	0xb609ae21,
+	0xb6096621,
 	0x11f50464,
-	0x76bb008a,
-	0x0465b600,
-	0x659450f9,
-	0x0256bb04,
-	0x75fd50bd,
-	0xf550fc04,
-	0xb6090121,
-	0x11f40464,
-	0xe05bcb6a,
-	0xb60076bb,
-	0x50f90465,
-	0xbb046594,
-	0x50bd0256,
-	0xfc0475fd,
-	0x4621f550,
-	0x0464b608,
-	0xbd025bb9,
-	0x430ef474,
-/* 0x0b28: i2c_recv_not_rd08 */
-	0xf401d6b0,
-	0x57f03d1b,
-	0xae21f500,
-	0x3311f409,
-	0xf5e0c5c7,
-	0xf4095321,
-	0x57f02911,
-	0xae21f500,
-	0x1f11f409,
-	0xf5e0b5c7,
-	0xf4095321,
-	0x21f51511,
-	0x74bd0846,
-	0xf408c5c7,
-	0x32f4091b,
-	0x030ef402,
-/* 0x0b68: i2c_recv_not_wr08 */
-/* 0x0b68: i2c_recv_done */
-	0xf5f8cec7,
-	0xfc0a1321,
-	0xf4d0fce0,
-	0x7cb90a12,
-	0x4221f502,
-/* 0x0b7d: i2c_recv_exit */
-/* 0x0b7f: i2c_init */
-	0xf800f803,
-/* 0x0b81: test_recv */
-	0xd817f100,
-	0x0614b605,
-	0xb60011cf,
-	0x07f10110,
-	0x04b605d8,
-	0x0001d006,
-	0xe7f104bd,
-	0xe3f1d900,
-	0x21f5134f,
-	0x00f80262,
-/* 0x0ba8: test_init */
-	0x0800e7f1,
-	0x026221f5,
-/* 0x0bb2: idle_recv */
+	0x57f000ad,
+	0x0076bb01,
+	0xf90465b6,
+	0x04659450,
+	0xbd0256bb,
+	0x0475fd50,
+	0x21f550fc,
+	0x64b609c1,
+	0x8a11f504,
+	0x0076bb00,
+	0xf90465b6,
+	0x04659450,
+	0xbd0256bb,
+	0x0475fd50,
+	0x21f550fc,
+	0x64b60914,
+	0x6a11f404,
+	0xbbe05bcb,
+	0x65b60076,
+	0x9450f904,
+	0x56bb0465,
+	0xfd50bd02,
+	0x50fc0475,
+	0x085921f5,
+	0xb90464b6,
+	0x74bd025b,
+/* 0x0b3b: i2c_recv_not_rd08 */
+	0xb0430ef4,
+	0x1bf401d6,
+	0x0057f03d,
+	0x09c121f5,
+	0xc73311f4,
+	0x21f5e0c5,
+	0x11f40966,
+	0x0057f029,
+	0x09c121f5,
+	0xc71f11f4,
+	0x21f5e0b5,
+	0x11f40966,
+	0x5921f515,
+	0xc774bd08,
+	0x1bf408c5,
+	0x0232f409,
+/* 0x0b7b: i2c_recv_not_wr08 */
+/* 0x0b7b: i2c_recv_done */
+	0xc7030ef4,
+	0x21f5f8ce,
+	0xe0fc0a26,
+	0x12f4d0fc,
+	0x027cb90a,
+	0x034221f5,
+/* 0x0b90: i2c_recv_exit */
+/* 0x0b92: i2c_init */
 	0x00f800f8,
-/* 0x0bb4: idle */
-	0xf10031f4,
-	0xb605d417,
-	0x11cf0614,
-	0x0110b600,
-	0x05d407f1,
-	0xd00604b6,
-	0x04bd0001,
-/* 0x0bd0: idle_loop */
-	0xf45817f0,
-/* 0x0bd6: idle_proc */
-/* 0x0bd6: idle_proc_exec */
-	0x10f90232,
-	0xf5021eb9,
-	0xfc034b21,
-	0x0911f410,
-	0xf40231f4,
-/* 0x0bea: idle_proc_next */
-	0x10b6ef0e,
-	0x061fb858,
-	0xf4e61bf4,
-	0x28f4dd02,
-	0xbb0ef400,
+/* 0x0b94: test_recv */
+	0x05d817f1,
+	0xcf0614b6,
+	0x10b60011,
+	0xd807f101,
+	0x0604b605,
+	0xbd0001d0,
+	0x00e7f104,
+	0x4fe3f1d9,
+	0x6221f513,
+/* 0x0bbb: test_init */
+	0xf100f802,
+	0xf50800e7,
+	0xf8026221,
+/* 0x0bc5: idle_recv */
+/* 0x0bc7: idle */
+	0xf400f800,
+	0x17f10031,
+	0x14b605d4,
+	0x0011cf06,
+	0xf10110b6,
+	0xb605d407,
+	0x01d00604,
+/* 0x0be3: idle_loop */
+	0xf004bd00,
+	0x32f45817,
+/* 0x0be9: idle_proc */
+/* 0x0be9: idle_proc_exec */
+	0xb910f902,
+	0x21f5021e,
+	0x10fc034b,
+	0xf40911f4,
+	0x0ef40231,
+/* 0x0bfd: idle_proc_next */
+	0x5810b6ef,
+	0xf4061fb8,
+	0x02f4e61b,
+	0x0028f4dd,
+	0x00bb0ef4,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
 	0x00000000,
 };
diff --git a/drivers/gpu/drm/nouveau/core/subdev/pwr/fuc/nvd0.fuc.h b/drivers/gpu/drm/nouveau/core/subdev/pwr/fuc/nvd0.fuc.h
index 12d86f72ad10..7e16aab44d85 100644
--- a/drivers/gpu/drm/nouveau/core/subdev/pwr/fuc/nvd0.fuc.h
+++ b/drivers/gpu/drm/nouveau/core/subdev/pwr/fuc/nvd0.fuc.h
@@ -46,8 +46,8 @@ uint32_t nvd0_pwr_data[] = {
 	0x00000000,
 	0x00000000,
 	0x584d454d,
-	0x00000678,
-	0x0000066a,
+	0x0000068b,
+	0x0000067d,
 	0x00000000,
 	0x00000000,
 	0x00000000,
@@ -68,8 +68,8 @@ uint32_t nvd0_pwr_data[] = {
 	0x00000000,
 	0x00000000,
 	0x46524550,
-	0x0000067c,
-	0x0000067a,
+	0x0000068f,
+	0x0000068d,
 	0x00000000,
 	0x00000000,
 	0x00000000,
@@ -90,8 +90,8 @@ uint32_t nvd0_pwr_data[] = {
 	0x00000000,
 	0x00000000,
 	0x5f433249,
-	0x00000a97,
-	0x0000093a,
+	0x00000aaa,
+	0x0000094d,
 	0x00000000,
 	0x00000000,
 	0x00000000,
@@ -112,8 +112,8 @@ uint32_t nvd0_pwr_data[] = {
 	0x00000000,
 	0x00000000,
 	0x54534554,
-	0x00000aba,
-	0x00000a99,
+	0x00000acd,
+	0x00000aac,
 	0x00000000,
 	0x00000000,
 	0x00000000,
@@ -134,8 +134,8 @@ uint32_t nvd0_pwr_data[] = {
 	0x00000000,
 	0x00000000,
 	0x454c4449,
-	0x00000ac6,
-	0x00000ac4,
+	0x00000ad9,
+	0x00000ad7,
 	0x00000000,
 	0x00000000,
 	0x00000000,
@@ -246,13 +246,15 @@ uint32_t nvd0_pwr_data[] = {
 	0x00010006,
 	0x00000000,
 	0x000005d3,
-/* 0x03b8: memx_func_tail */
-/* 0x03b8: memx_ts_start */
+	0x00000007,
 	0x00000000,
-/* 0x03bc: memx_ts_end */
+	0x00000619,
+/* 0x03c4: memx_func_tail */
+/* 0x03c4: memx_ts_start */
 	0x00000000,
-/* 0x03c0: memx_data_head */
+/* 0x03c8: memx_ts_end */
 	0x00000000,
+/* 0x03cc: memx_data_head */
 	0x00000000,
 	0x00000000,
 	0x00000000,
@@ -764,8 +766,75 @@ uint32_t nvd0_pwr_data[] = {
 	0x00000000,
 	0x00000000,
 	0x00000000,
-/* 0x0bc0: memx_data_tail */
-/* 0x0bc0: i2c_scl_map */
+	0x00000000,
+/* 0x0bcc: memx_data_tail */
+/* 0x0bcc: memx_train_head */
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+/* 0x0ccc: memx_train_tail */
+/* 0x0ccc: i2c_scl_map */
 	0x00000400,
 	0x00000800,
 	0x00001000,
@@ -776,7 +845,7 @@ uint32_t nvd0_pwr_data[] = {
 	0x00020000,
 	0x00040000,
 	0x00080000,
-/* 0x0be8: i2c_sda_map */
+/* 0x0cf4: i2c_sda_map */
 	0x00100000,
 	0x00200000,
 	0x00400000,
@@ -844,9 +913,6 @@ uint32_t nvd0_pwr_data[] = {
 	0x00000000,
 	0x00000000,
 	0x00000000,
-	0x00000000,
-	0x00000000,
-	0x00000000,
 };
 
 uint32_t nvd0_pwr_code[] = {
@@ -1236,11 +1302,11 @@ uint32_t nvd0_pwr_code[] = {
 	0x0bf40464,
 	0x2c67f0f6,
 	0x800066cf,
-	0x00f8ee06,
+	0x00f8f106,
 /* 0x0554: memx_func_leave */
 	0xcf2c67f0,
 	0x06800066,
-	0x0467f0ef,
+	0x0467f0f2,
 	0x07e407f1,
 	0xbd0006d0,
 /* 0x0569: memx_func_leave_wait */
@@ -1292,379 +1358,383 @@ uint32_t nvd0_pwr_code[] = {
 	0x1e9800f8,
 	0x0410b600,
 	0xf86721f4,
-/* 0x0619: memx_exec */
-	0xf9e0f900,
-	0x02c1b9d0,
-/* 0x0623: memx_exec_next */
-	0x9802b2b9,
-	0x10b60013,
-	0xf034e704,
-	0xe033e701,
-	0x0132b601,
-	0x980c30f0,
-	0x55f9de35,
-	0xf40612b8,
-	0x0b98e41e,
-	0xef0c98ee,
-	0xf102cbbb,
-	0xcf07c4b7,
-	0xd0fc00bb,
-	0x21f5e0fc,
-	0x00f802f1,
-/* 0x065c: memx_info */
-	0x03c0c7f1,
-	0x0800b7f1,
+/* 0x0619: memx_func_train */
+/* 0x061b: memx_exec */
+	0xf900f800,
+	0xb9d0f9e0,
+	0xb2b902c1,
+/* 0x0625: memx_exec_next */
+	0x00139802,
+	0xe70410b6,
+	0xe701f034,
+	0xb601e033,
+	0x30f00132,
+	0xde35980c,
+	0x12b855f9,
+	0xe41ef406,
+	0x98f10b98,
+	0xcbbbf20c,
+	0xc4b7f102,
+	0x00bbcf07,
+	0xe0fcd0fc,
 	0x02f121f5,
-/* 0x066a: memx_recv */
-	0xd6b000f8,
-	0xac0bf401,
-	0xf400d6b0,
-	0x00f8e90b,
-/* 0x0678: memx_init */
-/* 0x067a: perf_recv */
-	0x00f800f8,
-/* 0x067c: perf_init */
-/* 0x067e: i2c_drive_scl */
-	0x36b000f8,
-	0x0e0bf400,
-	0x07e007f1,
-	0xbd0001d0,
-/* 0x068f: i2c_drive_scl_lo */
-	0xf100f804,
-	0xd007e407,
+/* 0x065e: memx_info */
+	0xc67000f8,
+	0x0e0bf401,
+/* 0x0664: memx_info_data */
+	0x03ccc7f1,
+	0x0800b7f1,
+/* 0x066f: memx_info_train */
+	0xf10b0ef4,
+	0xf10bccc7,
+/* 0x0677: memx_info_send */
+	0xf50100b7,
+	0xf802f121,
+/* 0x067d: memx_recv */
+	0x01d6b000,
+	0xb09b0bf4,
+	0x0bf400d6,
+/* 0x068b: memx_init */
+	0xf800f8d8,
+/* 0x068d: perf_recv */
+/* 0x068f: perf_init */
+	0xf800f800,
+/* 0x0691: i2c_drive_scl */
+	0x0036b000,
+	0xf10e0bf4,
+	0xd007e007,
 	0x04bd0001,
-/* 0x069a: i2c_drive_sda */
-	0x36b000f8,
-	0x0e0bf400,
-	0x07e007f1,
-	0xbd0002d0,
-/* 0x06ab: i2c_drive_sda_lo */
-	0xf100f804,
-	0xd007e407,
+/* 0x06a2: i2c_drive_scl_lo */
+	0x07f100f8,
+	0x01d007e4,
+	0xf804bd00,
+/* 0x06ad: i2c_drive_sda */
+	0x0036b000,
+	0xf10e0bf4,
+	0xd007e007,
 	0x04bd0002,
-/* 0x06b6: i2c_sense_scl */
+/* 0x06be: i2c_drive_sda_lo */
+	0x07f100f8,
+	0x02d007e4,
+	0xf804bd00,
+/* 0x06c9: i2c_sense_scl */
+	0x0132f400,
+	0x07c437f1,
+	0xfd0033cf,
+	0x0bf40431,
+	0x0131f406,
+/* 0x06dc: i2c_sense_scl_done */
+/* 0x06de: i2c_sense_sda */
 	0x32f400f8,
 	0xc437f101,
 	0x0033cf07,
-	0xf40431fd,
+	0xf40432fd,
 	0x31f4060b,
-/* 0x06c9: i2c_sense_scl_done */
-/* 0x06cb: i2c_sense_sda */
-	0xf400f801,
-	0x37f10132,
-	0x33cf07c4,
-	0x0432fd00,
-	0xf4060bf4,
-/* 0x06de: i2c_sense_sda_done */
-	0x00f80131,
-/* 0x06e0: i2c_raise_scl */
-	0x47f140f9,
-	0x37f00898,
-	0x7e21f501,
-/* 0x06ed: i2c_raise_scl_wait */
-	0xe8e7f106,
-	0x6721f403,
-	0x06b621f5,
-	0xb60901f4,
-	0x1bf40142,
-/* 0x0701: i2c_raise_scl_done */
-	0xf840fcef,
-/* 0x0705: i2c_start */
-	0xb621f500,
-	0x0d11f406,
-	0x06cb21f5,
-	0xf40611f4,
-/* 0x0716: i2c_start_rep */
-	0x37f0300e,
-	0x7e21f500,
-	0x0137f006,
-	0x069a21f5,
-	0xb60076bb,
-	0x50f90465,
-	0xbb046594,
-	0x50bd0256,
-	0xfc0475fd,
-	0xe021f550,
-	0x0464b606,
-/* 0x0743: i2c_start_send */
-	0xf01f11f4,
-	0x21f50037,
-	0xe7f1069a,
-	0x21f41388,
-	0x0037f067,
-	0x067e21f5,
-	0x1388e7f1,
-/* 0x075f: i2c_start_out */
-	0xf86721f4,
-/* 0x0761: i2c_stop */
-	0x0037f000,
-	0x067e21f5,
-	0xf50037f0,
-	0xf1069a21,
-	0xf403e8e7,
-	0x37f06721,
-	0x7e21f501,
-	0x88e7f106,
-	0x6721f413,
-	0xf50137f0,
-	0xf1069a21,
-	0xf41388e7,
-	0x00f86721,
-/* 0x0794: i2c_bitw */
-	0x069a21f5,
+/* 0x06f1: i2c_sense_sda_done */
+/* 0x06f3: i2c_raise_scl */
+	0xf900f801,
+	0x9847f140,
+	0x0137f008,
+	0x069121f5,
+/* 0x0700: i2c_raise_scl_wait */
 	0x03e8e7f1,
-	0xbb6721f4,
-	0x65b60076,
-	0x9450f904,
-	0x56bb0465,
-	0xfd50bd02,
-	0x50fc0475,
-	0x06e021f5,
-	0xf40464b6,
-	0xe7f11811,
-	0x21f41388,
-	0x0037f067,
-	0x067e21f5,
-	0x1388e7f1,
-/* 0x07d3: i2c_bitw_out */
-	0xf86721f4,
-/* 0x07d5: i2c_bitr */
-	0x0137f000,
-	0x069a21f5,
-	0x03e8e7f1,
-	0xbb6721f4,
+	0xf56721f4,
+	0xf406c921,
+	0x42b60901,
+	0xef1bf401,
+/* 0x0714: i2c_raise_scl_done */
+	0x00f840fc,
+/* 0x0718: i2c_start */
+	0x06c921f5,
+	0xf50d11f4,
+	0xf406de21,
+	0x0ef40611,
+/* 0x0729: i2c_start_rep */
+	0x0037f030,
+	0x069121f5,
+	0xf50137f0,
+	0xbb06ad21,
 	0x65b60076,
 	0x9450f904,
 	0x56bb0465,
 	0xfd50bd02,
 	0x50fc0475,
-	0x06e021f5,
+	0x06f321f5,
 	0xf40464b6,
-	0x21f51b11,
-	0x37f006cb,
-	0x7e21f500,
+/* 0x0756: i2c_start_send */
+	0x37f01f11,
+	0xad21f500,
 	0x88e7f106,
 	0x6721f413,
-	0xf4013cf0,
-/* 0x081a: i2c_bitr_done */
-	0x00f80131,
-/* 0x081c: i2c_get_byte */
-	0xf00057f0,
-/* 0x0822: i2c_get_byte_next */
-	0x54b60847,
-	0x0076bb01,
-	0xf90465b6,
-	0x04659450,
-	0xbd0256bb,
-	0x0475fd50,
-	0x21f550fc,
-	0x64b607d5,
-	0x2b11f404,
-	0xb60553fd,
-	0x1bf40142,
-	0x0137f0d8,
+	0xf50037f0,
+	0xf1069121,
+	0xf41388e7,
+/* 0x0772: i2c_start_out */
+	0x00f86721,
+/* 0x0774: i2c_stop */
+	0xf50037f0,
+	0xf0069121,
+	0x21f50037,
+	0xe7f106ad,
+	0x21f403e8,
+	0x0137f067,
+	0x069121f5,
+	0x1388e7f1,
+	0xf06721f4,
+	0x21f50137,
+	0xe7f106ad,
+	0x21f41388,
+/* 0x07a7: i2c_bitw */
+	0xf500f867,
+	0xf106ad21,
+	0xf403e8e7,
+	0x76bb6721,
+	0x0465b600,
+	0x659450f9,
+	0x0256bb04,
+	0x75fd50bd,
+	0xf550fc04,
+	0xb606f321,
+	0x11f40464,
+	0x88e7f118,
+	0x6721f413,
+	0xf50037f0,
+	0xf1069121,
+	0xf41388e7,
+/* 0x07e6: i2c_bitw_out */
+	0x00f86721,
+/* 0x07e8: i2c_bitr */
+	0xf50137f0,
+	0xf106ad21,
+	0xf403e8e7,
+	0x76bb6721,
+	0x0465b600,
+	0x659450f9,
+	0x0256bb04,
+	0x75fd50bd,
+	0xf550fc04,
+	0xb606f321,
+	0x11f40464,
+	0xde21f51b,
+	0x0037f006,
+	0x069121f5,
+	0x1388e7f1,
+	0xf06721f4,
+	0x31f4013c,
+/* 0x082d: i2c_bitr_done */
+/* 0x082f: i2c_get_byte */
+	0xf000f801,
+	0x47f00057,
+/* 0x0835: i2c_get_byte_next */
+	0x0154b608,
 	0xb60076bb,
 	0x50f90465,
 	0xbb046594,
 	0x50bd0256,
 	0xfc0475fd,
-	0x9421f550,
+	0xe821f550,
 	0x0464b607,
-/* 0x086c: i2c_get_byte_done */
-/* 0x086e: i2c_put_byte */
-	0x47f000f8,
-/* 0x0871: i2c_put_byte_next */
-	0x0142b608,
-	0xbb3854ff,
+	0xfd2b11f4,
+	0x42b60553,
+	0xd81bf401,
+	0xbb0137f0,
+	0x65b60076,
+	0x9450f904,
+	0x56bb0465,
+	0xfd50bd02,
+	0x50fc0475,
+	0x07a721f5,
+/* 0x087f: i2c_get_byte_done */
+	0xf80464b6,
+/* 0x0881: i2c_put_byte */
+	0x0847f000,
+/* 0x0884: i2c_put_byte_next */
+	0xff0142b6,
+	0x76bb3854,
+	0x0465b600,
+	0x659450f9,
+	0x0256bb04,
+	0x75fd50bd,
+	0xf550fc04,
+	0xb607a721,
+	0x11f40464,
+	0x0046b034,
+	0xbbd81bf4,
 	0x65b60076,
 	0x9450f904,
 	0x56bb0465,
 	0xfd50bd02,
 	0x50fc0475,
-	0x079421f5,
+	0x07e821f5,
 	0xf40464b6,
-	0x46b03411,
-	0xd81bf400,
+	0x76bb0f11,
+	0x0136b000,
+	0xf4061bf4,
+/* 0x08da: i2c_put_byte_done */
+	0x00f80132,
+/* 0x08dc: i2c_addr */
 	0xb60076bb,
 	0x50f90465,
 	0xbb046594,
 	0x50bd0256,
 	0xfc0475fd,
-	0xd521f550,
+	0x1821f550,
 	0x0464b607,
-	0xbb0f11f4,
-	0x36b00076,
-	0x061bf401,
-/* 0x08c7: i2c_put_byte_done */
-	0xf80132f4,
-/* 0x08c9: i2c_addr */
-	0x0076bb00,
+	0xe72911f4,
+	0xb6012ec3,
+	0x53fd0134,
+	0x0076bb05,
 	0xf90465b6,
 	0x04659450,
 	0xbd0256bb,
 	0x0475fd50,
 	0x21f550fc,
-	0x64b60705,
-	0x2911f404,
-	0x012ec3e7,
-	0xfd0134b6,
-	0x76bb0553,
-	0x0465b600,
-	0x659450f9,
-	0x0256bb04,
-	0x75fd50bd,
-	0xf550fc04,
-	0xb6086e21,
-/* 0x090e: i2c_addr_done */
-	0x00f80464,
-/* 0x0910: i2c_acquire_addr */
-	0xb6f8cec7,
-	0xe0b705e4,
-	0x00f8d014,
-/* 0x091c: i2c_acquire */
-	0x091021f5,
-	0xf00421f4,
-	0x21f403d9,
-/* 0x092b: i2c_release */
-	0xf500f833,
-	0xf4091021,
-	0xdaf00421,
+	0x64b60881,
+/* 0x0921: i2c_addr_done */
+/* 0x0923: i2c_acquire_addr */
+	0xc700f804,
+	0xe4b6f8ce,
+	0x14e0b705,
+/* 0x092f: i2c_acquire */
+	0xf500f8d0,
+	0xf4092321,
+	0xd9f00421,
 	0x3321f403,
-/* 0x093a: i2c_recv */
-	0x32f400f8,
-	0xf8c1c701,
-	0xb00214b6,
-	0x1ff52816,
-	0x13a0013a,
-	0x32980be8,
-	0xc013a000,
-	0x0031980b,
-	0xf90231f4,
-	0xf9e0f9d0,
-	0x0067f1d0,
-	0x0063f100,
-	0x01679210,
-	0xb60076bb,
-	0x50f90465,
-	0xbb046594,
-	0x50bd0256,
-	0xfc0475fd,
-	0x1c21f550,
-	0x0464b609,
-	0xd6b0d0fc,
-	0xb31bf500,
-	0x0057f000,
-	0xb60076bb,
-	0x50f90465,
-	0xbb046594,
-	0x50bd0256,
-	0xfc0475fd,
-	0xc921f550,
-	0x0464b608,
-	0x00d011f5,
-	0xbbe0c5c7,
+/* 0x093e: i2c_release */
+	0x21f500f8,
+	0x21f40923,
+	0x03daf004,
+	0xf83321f4,
+/* 0x094d: i2c_recv */
+	0x0132f400,
+	0xb6f8c1c7,
+	0x16b00214,
+	0x3a1ff528,
+	0xf413a001,
+	0x0032980c,
+	0x0ccc13a0,
+	0xf4003198,
+	0xd0f90231,
+	0xd0f9e0f9,
+	0x000067f1,
+	0x100063f1,
+	0xbb016792,
 	0x65b60076,
 	0x9450f904,
 	0x56bb0465,
 	0xfd50bd02,
 	0x50fc0475,
-	0x086e21f5,
+	0x092f21f5,
+	0xfc0464b6,
+	0x00d6b0d0,
+	0x00b31bf5,
+	0xbb0057f0,
+	0x65b60076,
+	0x9450f904,
+	0x56bb0465,
+	0xfd50bd02,
+	0x50fc0475,
+	0x08dc21f5,
 	0xf50464b6,
-	0xf000ad11,
-	0x76bb0157,
+	0xc700d011,
+	0x76bbe0c5,
 	0x0465b600,
 	0x659450f9,
 	0x0256bb04,
 	0x75fd50bd,
 	0xf550fc04,
-	0xb608c921,
+	0xb6088121,
 	0x11f50464,
-	0x76bb008a,
-	0x0465b600,
-	0x659450f9,
-	0x0256bb04,
-	0x75fd50bd,
-	0xf550fc04,
-	0xb6081c21,
-	0x11f40464,
-	0xe05bcb6a,
-	0xb60076bb,
-	0x50f90465,
-	0xbb046594,
-	0x50bd0256,
-	0xfc0475fd,
-	0x6121f550,
-	0x0464b607,
-	0xbd025bb9,
-	0x430ef474,
-/* 0x0a40: i2c_recv_not_rd08 */
-	0xf401d6b0,
-	0x57f03d1b,
-	0xc921f500,
-	0x3311f408,
-	0xf5e0c5c7,
-	0xf4086e21,
-	0x57f02911,
-	0xc921f500,
-	0x1f11f408,
-	0xf5e0b5c7,
-	0xf4086e21,
-	0x21f51511,
-	0x74bd0761,
-	0xf408c5c7,
-	0x32f4091b,
-	0x030ef402,
-/* 0x0a80: i2c_recv_not_wr08 */
-/* 0x0a80: i2c_recv_done */
-	0xf5f8cec7,
-	0xfc092b21,
-	0xf4d0fce0,
-	0x7cb90a12,
-	0xf121f502,
-/* 0x0a95: i2c_recv_exit */
-/* 0x0a97: i2c_init */
+	0x57f000ad,
+	0x0076bb01,
+	0xf90465b6,
+	0x04659450,
+	0xbd0256bb,
+	0x0475fd50,
+	0x21f550fc,
+	0x64b608dc,
+	0x8a11f504,
+	0x0076bb00,
+	0xf90465b6,
+	0x04659450,
+	0xbd0256bb,
+	0x0475fd50,
+	0x21f550fc,
+	0x64b6082f,
+	0x6a11f404,
+	0xbbe05bcb,
+	0x65b60076,
+	0x9450f904,
+	0x56bb0465,
+	0xfd50bd02,
+	0x50fc0475,
+	0x077421f5,
+	0xb90464b6,
+	0x74bd025b,
+/* 0x0a53: i2c_recv_not_rd08 */
+	0xb0430ef4,
+	0x1bf401d6,
+	0x0057f03d,
+	0x08dc21f5,
+	0xc73311f4,
+	0x21f5e0c5,
+	0x11f40881,
+	0x0057f029,
+	0x08dc21f5,
+	0xc71f11f4,
+	0x21f5e0b5,
+	0x11f40881,
+	0x7421f515,
+	0xc774bd07,
+	0x1bf408c5,
+	0x0232f409,
+/* 0x0a93: i2c_recv_not_wr08 */
+/* 0x0a93: i2c_recv_done */
+	0xc7030ef4,
+	0x21f5f8ce,
+	0xe0fc093e,
+	0x12f4d0fc,
+	0x027cb90a,
+	0x02f121f5,
+/* 0x0aa8: i2c_recv_exit */
+/* 0x0aaa: i2c_init */
+	0x00f800f8,
+/* 0x0aac: test_recv */
+	0x05d817f1,
+	0xb60011cf,
+	0x07f10110,
+	0x01d005d8,
+	0xf104bd00,
+	0xf1d900e7,
+	0xf5134fe3,
+	0xf8022321,
+/* 0x0acd: test_init */
+	0x00e7f100,
+	0x2321f508,
+/* 0x0ad7: idle_recv */
 	0xf800f802,
-/* 0x0a99: test_recv */
-	0xd817f100,
-	0x0011cf05,
-	0xf10110b6,
-	0xd005d807,
-	0x04bd0001,
-	0xd900e7f1,
-	0x134fe3f1,
-	0x022321f5,
-/* 0x0aba: test_init */
-	0xe7f100f8,
-	0x21f50800,
-	0x00f80223,
-/* 0x0ac4: idle_recv */
-/* 0x0ac6: idle */
-	0x31f400f8,
-	0xd417f100,
-	0x0011cf05,
-	0xf10110b6,
-	0xd005d407,
-	0x04bd0001,
-/* 0x0adc: idle_loop */
-	0xf45817f0,
-/* 0x0ae2: idle_proc */
-/* 0x0ae2: idle_proc_exec */
-	0x10f90232,
-	0xf5021eb9,
-	0xfc02fa21,
-	0x0911f410,
-	0xf40231f4,
-/* 0x0af6: idle_proc_next */
-	0x10b6ef0e,
-	0x061fb858,
-	0xf4e61bf4,
-	0x28f4dd02,
-	0xc10ef400,
-	0x00000000,
-	0x00000000,
-	0x00000000,
-	0x00000000,
-	0x00000000,
+/* 0x0ad9: idle */
+	0x0031f400,
+	0x05d417f1,
+	0xb60011cf,
+	0x07f10110,
+	0x01d005d4,
+/* 0x0aef: idle_loop */
+	0xf004bd00,
+	0x32f45817,
+/* 0x0af5: idle_proc */
+/* 0x0af5: idle_proc_exec */
+	0xb910f902,
+	0x21f5021e,
+	0x10fc02fa,
+	0xf40911f4,
+	0x0ef40231,
+/* 0x0b09: idle_proc_next */
+	0x5810b6ef,
+	0xf4061fb8,
+	0x02f4e61b,
+	0x0028f4dd,
+	0x00c10ef4,
 	0x00000000,
 	0x00000000,
 	0x00000000,
diff --git a/drivers/gpu/drm/nouveau/core/subdev/pwr/fuc/os.h b/drivers/gpu/drm/nouveau/core/subdev/pwr/fuc/os.h
index 522e3079f824..c8b06cb77e72 100644
--- a/drivers/gpu/drm/nouveau/core/subdev/pwr/fuc/os.h
+++ b/drivers/gpu/drm/nouveau/core/subdev/pwr/fuc/os.h
@@ -18,6 +18,10 @@
 #define MEMX_MSG_INFO 0
 #define MEMX_MSG_EXEC 1
 
+/* MEMX: info types */
+#define MEMX_INFO_DATA  0
+#define MEMX_INFO_TRAIN 1
+
 /* MEMX: script opcode definitions */
 #define MEMX_ENTER  1
 #define MEMX_LEAVE  2
@@ -25,6 +29,7 @@
 #define MEMX_WAIT   4
 #define MEMX_DELAY  5
 #define MEMX_VBLANK 6
+#define MEMX_TRAIN  7
 
 /* I2C_: message identifiers */
 #define I2C__MSG_RD08 0
diff --git a/drivers/gpu/drm/nouveau/core/subdev/pwr/memx.c b/drivers/gpu/drm/nouveau/core/subdev/pwr/memx.c
index 65eaa2546cad..7a9299d7159f 100644
--- a/drivers/gpu/drm/nouveau/core/subdev/pwr/memx.c
+++ b/drivers/gpu/drm/nouveau/core/subdev/pwr/memx.c
@@ -47,7 +47,8 @@ nouveau_memx_init(struct nouveau_pwr *ppwr, struct nouveau_memx **pmemx)
 	u32 reply[2];
 	int ret;
 
-	ret = ppwr->message(ppwr, reply, PROC_MEMX, MEMX_MSG_INFO, 0, 0);
+	ret = ppwr->message(ppwr, reply, PROC_MEMX, MEMX_MSG_INFO,
+					MEMX_INFO_DATA, 0);
 	if (ret)
 		return ret;
 
@@ -106,7 +107,7 @@ nouveau_memx_wait(struct nouveau_memx *memx,
 {
 	nv_debug(memx->ppwr, "R[%06x] & 0x%08x == 0x%08x, %d us\n",
 				addr, mask, data, nsec);
-	memx_cmd(memx, MEMX_WAIT, 4, (u32[]){ addr, ~mask, data, nsec });
+	memx_cmd(memx, MEMX_WAIT, 4, (u32[]){ addr, mask, data, nsec });
 	memx_out(memx); /* fuc can't handle multiple */
 }
 
@@ -152,6 +153,38 @@ nouveau_memx_wait_vblank(struct nouveau_memx *memx)
 }
 
 void
+nouveau_memx_train(struct nouveau_memx *memx)
+{
+	nv_debug(memx->ppwr, "   MEM TRAIN\n");
+	memx_cmd(memx, MEMX_TRAIN, 0, NULL);
+}
+
+int
+nouveau_memx_train_result(struct nouveau_pwr *ppwr, u32 *res, int rsize)
+{
+	u32 reply[2], base, size, i;
+	int ret;
+
+	ret = ppwr->message(ppwr, reply, PROC_MEMX, MEMX_MSG_INFO,
+					MEMX_INFO_TRAIN, 0);
+	if (ret)
+		return ret;
+
+	base = reply[0];
+	size = reply[1] >> 2;
+	if (size > rsize)
+		return -ENOMEM;
+
+	/* read the packet */
+	nv_wr32(ppwr, 0x10a1c0, 0x02000000 | base);
+
+	for (i = 0; i < size; i++)
+		res[i] = nv_rd32(ppwr, 0x10a1c4);
+
+	return 0;
+}
+
+void
 nouveau_memx_block(struct nouveau_memx *memx)
 {
 	nv_debug(memx->ppwr, "   HOST BLOCKED\n");
diff --git a/drivers/gpu/drm/nouveau/core/subdev/volt/base.c b/drivers/gpu/drm/nouveau/core/subdev/volt/base.c
index 32794a999106..26ccd8df193f 100644
--- a/drivers/gpu/drm/nouveau/core/subdev/volt/base.c
+++ b/drivers/gpu/drm/nouveau/core/subdev/volt/base.c
@@ -101,6 +101,41 @@ nouveau_volt_set_id(struct nouveau_volt *volt, u8 id, int condition)
 	return ret;
 }
 
+static void nouveau_volt_parse_bios(struct nouveau_bios *bios,
+		struct nouveau_volt *volt)
+{
+	struct nvbios_volt_entry ivid;
+	struct nvbios_volt info;
+	u8  ver, hdr, cnt, len;
+	u16 data;
+	int i;
+
+	data = nvbios_volt_parse(bios, &ver, &hdr, &cnt, &len, &info);
+	if (data && info.vidmask && info.base && info.step) {
+		for (i = 0; i < info.vidmask + 1; i++) {
+			if (info.base >= info.min &&
+				info.base <= info.max) {
+				volt->vid[volt->vid_nr].uv = info.base;
+				volt->vid[volt->vid_nr].vid = i;
+				volt->vid_nr++;
+			}
+			info.base += info.step;
+		}
+		volt->vid_mask = info.vidmask;
+	} else if (data && info.vidmask) {
+		for (i = 0; i < cnt; i++) {
+			data = nvbios_volt_entry_parse(bios, i, &ver, &hdr,
+							  &ivid);
+			if (data) {
+				volt->vid[volt->vid_nr].uv = ivid.voltage;
+				volt->vid[volt->vid_nr].vid = ivid.vid;
+				volt->vid_nr++;
+			}
+		}
+		volt->vid_mask = info.vidmask;
+	}
+}
+
 int
 _nouveau_volt_init(struct nouveau_object *object)
 {
@@ -136,10 +171,6 @@ nouveau_volt_create_(struct nouveau_object *parent,
 {
 	struct nouveau_bios *bios = nouveau_bios(parent);
 	struct nouveau_volt *volt;
-	struct nvbios_volt_entry ivid;
-	struct nvbios_volt info;
-	u8  ver, hdr, cnt, len;
-	u16 data;
 	int ret, i;
 
 	ret = nouveau_subdev_create_(parent, engine, oclass, 0, "VOLT",
@@ -152,31 +183,9 @@ nouveau_volt_create_(struct nouveau_object *parent,
 	volt->set = nouveau_volt_set;
 	volt->set_id = nouveau_volt_set_id;
 
-	data = nvbios_volt_parse(bios, &ver, &hdr, &cnt, &len, &info);
-	if (data && info.vidmask && info.base && info.step) {
-		for (i = 0; i < info.vidmask + 1; i++) {
-			if (info.base >= info.min &&
-			    info.base <= info.max) {
-				volt->vid[volt->vid_nr].uv = info.base;
-				volt->vid[volt->vid_nr].vid = i;
-				volt->vid_nr++;
-			}
-			info.base += info.step;
-		}
-		volt->vid_mask = info.vidmask;
-	} else
-	if (data && info.vidmask) {
-		for (i = 0; i < cnt; i++) {
-			data = nvbios_volt_entry_parse(bios, i, &ver, &hdr,
-						      &ivid);
-			if (data) {
-				volt->vid[volt->vid_nr].uv = ivid.voltage;
-				volt->vid[volt->vid_nr].vid = ivid.vid;
-				volt->vid_nr++;
-			}
-		}
-		volt->vid_mask = info.vidmask;
-	}
+	/* Non-BIOS devices are assumed to build the voltage table later */
+	if (bios)
+		nouveau_volt_parse_bios(bios, volt);
 
 	if (volt->vid_nr) {
 		for (i = 0; i < volt->vid_nr; i++) {
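Note on the hunk above: the base/step branch that moved into nouveau_volt_parse_bios() can be sketched outside the kernel roughly as follows. This is a minimal standalone sketch, not the driver code. `struct vid_entry`, `build_linear_vid_table()` and the `cap` bound are hypothetical names added for illustration; only the loop structure is taken from the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for one entry of volt->vid[]. */
struct vid_entry {
	uint32_t uv;	/* voltage in microvolts */
	uint8_t vid;	/* voltage ID associated with that voltage */
};

/*
 * Walk every VID from 0 to vidmask, stepping the voltage up from 'base'
 * by 'step', and keep only the entries that fall inside [min, max].
 * Returns the number of entries written to 'out' (never more than 'cap';
 * the bound is an addition of this sketch, the patch has no such check).
 */
static size_t
build_linear_vid_table(uint32_t base, uint32_t step, uint32_t min,
		       uint32_t max, uint8_t vidmask,
		       struct vid_entry *out, size_t cap)
{
	size_t nr = 0;
	unsigned int i;

	assert(out != NULL);

	for (i = 0; i < (unsigned int)vidmask + 1; i++) {
		if (base >= min && base <= max && nr < cap) {
			out[nr].uv = base;
			out[nr].vid = (uint8_t)i;
			nr++;
		}
		base += step;
	}
	return nr;
}
```

With made-up inputs base = 600000, step = 12500, min = 650000, max = 700000 and vidmask = 0x0f, only VIDs 4 through 8 land inside the window, so the table ends up with five entries.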
diff --git a/drivers/gpu/drm/nouveau/core/subdev/volt/gk20a.c b/drivers/gpu/drm/nouveau/core/subdev/volt/gk20a.c
new file mode 100644
index 000000000000..717368ef31ac
--- /dev/null
+++ b/drivers/gpu/drm/nouveau/core/subdev/volt/gk20a.c
@@ -0,0 +1,199 @@
+/*
+ * Copyright (c) 2014, NVIDIA CORPORATION. All rights reserved.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
+ * DEALINGS IN THE SOFTWARE.
+ */
+
+#ifdef __KERNEL__
+#include <nouveau_platform.h>
+#endif
+#include <subdev/volt.h>
+
+struct cvb_coef {
+	int c0;
+	int c1;
+	int c2;
+	int c3;
+	int c4;
+	int c5;
+};
+
+struct gk20a_volt_priv {
+	struct nouveau_volt base;
+	struct regulator *vdd;
+};
+
+const struct cvb_coef gk20a_cvb_coef[] = {
+	/* MHz,        c0,     c1,   c2,    c3,     c4,   c5 */
+	/*  72 */ { 1209886, -36468,  515,   417, -13123,  203},
+	/* 108 */ { 1130804, -27659,  296,   298, -10834,  221},
+	/* 180 */ { 1162871, -27110,  247,   238, -10681,  268},
+	/* 252 */ { 1220458, -28654,  247,   179, -10376,  298},
+	/* 324 */ { 1280953, -30204,  247,   119,  -9766,  304},
+	/* 396 */ { 1344547, -31777,  247,   119,  -8545,  292},
+	/* 468 */ { 1420168, -34227,  269,    60,  -7172,  256},
+	/* 540 */ { 1490757, -35955,  274,    60,  -5188,  197},
+	/* 612 */ { 1599112, -42583,  398,     0,  -1831,  119},
+	/* 648 */ { 1366986, -16459, -274,     0,  -3204,   72},
+	/* 684 */ { 1391884, -17078, -274,   -60,  -1526,   30},
+	/* 708 */ { 1415522, -17497, -274,   -60,   -458,    0},
+	/* 756 */ { 1464061, -18331, -274,  -119,   1831,  -72},
+	/* 804 */ { 1524225, -20064, -254,  -119,   4272, -155},
+	/* 852 */ { 1608418, -21643, -269,     0,    763,  -48},
+};
+
+/*
+ * cvb_mv = ((c2 * speedo / s_scale + c1) * speedo / s_scale + c0)
+ */
+static inline int
+gk20a_volt_get_cvb_voltage(int speedo, int s_scale,
+		const struct cvb_coef *coef)
+{
+	int mv;
+
+	mv = DIV_ROUND_CLOSEST(coef->c2 * speedo, s_scale);
+	mv = DIV_ROUND_CLOSEST((mv + coef->c1) * speedo, s_scale) + coef->c0;
+	return mv;
+}
+
+/*
+ * cvb_t_mv =
+ * ((c2 * speedo / s_scale + c1) * speedo / s_scale + c0) +
+ * ((c3 * speedo / s_scale + c4 + c5 * T / t_scale) * T / t_scale)
+ */
+static inline int
+gk20a_volt_get_cvb_t_voltage(int speedo, int temp, int s_scale, int t_scale,
+		const struct cvb_coef *coef)
+{
+	int cvb_mv, mv;
+
+	cvb_mv = gk20a_volt_get_cvb_voltage(speedo, s_scale, coef);
+
+	mv = DIV_ROUND_CLOSEST(coef->c3 * speedo, s_scale) + coef->c4 +
+		DIV_ROUND_CLOSEST(coef->c5 * temp, t_scale);
+	mv = DIV_ROUND_CLOSEST(mv * temp, t_scale) + cvb_mv;
+	return mv;
+}
+
+static int
+gk20a_volt_calc_voltage(const struct cvb_coef *coef, int speedo)
+{
+	int mv;
+
+	mv = gk20a_volt_get_cvb_t_voltage(speedo, -10, 100, 10, coef);
+	mv = DIV_ROUND_UP(mv, 1000);
+
+	return mv * 1000;
+}
+
+static int
+gk20a_volt_vid_get(struct nouveau_volt *volt)
+{
+	struct gk20a_volt_priv *priv = (void *)volt;
+	int i, uv;
+
+	uv = regulator_get_voltage(priv->vdd);
+
+	for (i = 0; i < volt->vid_nr; i++)
+		if (volt->vid[i].uv >= uv)
+			return i;
+
+	return -EINVAL;
+}
+
+static int
+gk20a_volt_vid_set(struct nouveau_volt *volt, u8 vid)
+{
+	struct gk20a_volt_priv *priv = (void *)volt;
+
+	nv_debug(volt, "set voltage to %duV\n", volt->vid[vid].uv);
+	return regulator_set_voltage(priv->vdd, volt->vid[vid].uv, 1200000);
+}
+
+static int
+gk20a_volt_set_id(struct nouveau_volt *volt, u8 id, int condition)
+{
+	struct gk20a_volt_priv *priv = (void *)volt;
+	int prev_uv = regulator_get_voltage(priv->vdd);
+	int target_uv = volt->vid[id].uv;
+	int ret;
+
+	nv_debug(volt, "prev=%d, target=%d, condition=%d\n",
+			prev_uv, target_uv, condition);
+	if (!condition ||
+		(condition < 0 && target_uv < prev_uv) ||
+		(condition > 0 && target_uv > prev_uv)) {
+		ret = gk20a_volt_vid_set(volt, volt->vid[id].vid);
+	} else {
+		ret = 0;
+	}
+
+	return ret;
+}
+
+static int
+gk20a_volt_ctor(struct nouveau_object *parent, struct nouveau_object *engine,
+	       struct nouveau_oclass *oclass, void *data, u32 size,
+	       struct nouveau_object **pobject)
+{
+	struct gk20a_volt_priv *priv;
+	struct nouveau_volt *volt;
+	struct nouveau_platform_device *plat;
+	int i, ret, uv;
+
+	ret = nouveau_volt_create(parent, engine, oclass, &priv);
+	*pobject = nv_object(priv);
+	if (ret)
+		return ret;
+
+	volt = &priv->base;
+
+	plat = nv_device_to_platform(nv_device(parent));
+
+	uv = regulator_get_voltage(plat->gpu->vdd);
+	nv_info(priv, "The default voltage is %duV\n", uv);
+
+	priv->vdd = plat->gpu->vdd;
+	priv->base.vid_get = gk20a_volt_vid_get;
+	priv->base.vid_set = gk20a_volt_vid_set;
+	priv->base.set_id = gk20a_volt_set_id;
+
+	volt->vid_nr = ARRAY_SIZE(gk20a_cvb_coef);
+	nv_debug(priv, "%s - vid_nr = %d\n", __func__, volt->vid_nr);
+	for (i = 0; i < volt->vid_nr; i++) {
+		volt->vid[i].vid = i;
+		volt->vid[i].uv = gk20a_volt_calc_voltage(&gk20a_cvb_coef[i],
+					plat->gpu_speedo);
+		nv_debug(priv, "%2d: vid=%d, uv=%d\n", i, volt->vid[i].vid,
+					volt->vid[i].uv);
+	}
+
+	return 0;
+}
+
+struct nouveau_oclass
+gk20a_volt_oclass = {
+	.handle = NV_SUBDEV(VOLT, 0xea),
+	.ofuncs = &(struct nouveau_ofuncs) {
+		.ctor = gk20a_volt_ctor,
+		.dtor = _nouveau_volt_dtor,
+		.init = _nouveau_volt_init,
+		.fini = _nouveau_volt_fini,
+	},
+};
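Note on gk20a.c above: the CVB arithmetic is easier to follow with one row worked through by hand. The sketch below reimplements the three helpers in plain userspace C. It is an illustration under assumptions, not the driver code: `div_round_closest()` stands in for the kernel's DIV_ROUND_CLOSEST() for positive divisors, `coef_72mhz` is the 72 MHz row copied from gk20a_cvb_coef[], and the speedo value used in the walkthrough is invented.

```c
#include <assert.h>

struct cvb_coef {
	int c0, c1, c2, c3, c4, c5;
};

/* 72 MHz row copied from gk20a_cvb_coef[] in the patch. */
static const struct cvb_coef coef_72mhz = {
	1209886, -36468, 515, 417, -13123, 203
};

/* Signed round-to-nearest division; assumes a positive divisor. */
static int div_round_closest(int x, int d)
{
	assert(d > 0);
	return x > 0 ? (x + d / 2) / d : (x - d / 2) / d;
}

/* Speedo term: (c2 * speedo / s_scale + c1) * speedo / s_scale + c0 */
static int cvb_voltage(int speedo, int s_scale, const struct cvb_coef *c)
{
	int v = div_round_closest(c->c2 * speedo, s_scale);

	return div_round_closest((v + c->c1) * speedo, s_scale) + c->c0;
}

/* Speedo term plus (c3 * speedo / s_scale + c4 + c5 * T / t_scale) * T / t_scale */
static int cvb_t_voltage(int speedo, int temp, int s_scale, int t_scale,
			 const struct cvb_coef *c)
{
	int v = div_round_closest(c->c3 * speedo, s_scale) + c->c4 +
		div_round_closest(c->c5 * temp, t_scale);

	return div_round_closest(v * temp, t_scale) +
	       cvb_voltage(speedo, s_scale, c);
}

/*
 * Same constants as gk20a_volt_calc_voltage(): T = -10, s_scale = 100,
 * t_scale = 10, then round up to the next multiple of 1000.
 */
static int calc_voltage(const struct cvb_coef *c, int speedo)
{
	int v = cvb_t_voltage(speedo, -10, 100, 10, c);

	return (v + 999) / 1000 * 1000;
}
```

For an invented speedo of 2000, the 72 MHz row gives 686526 from the speedo term alone, 691512 once the temperature term is added, and 692000 after rounding up, which is the value the constructor would store in vid[0].uv for that input.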
diff --git a/drivers/gpu/drm/nouveau/dispnv04/crtc.c b/drivers/gpu/drm/nouveau/dispnv04/crtc.c
index fca6a1f9c20c..38402ade6835 100644
--- a/drivers/gpu/drm/nouveau/dispnv04/crtc.c
+++ b/drivers/gpu/drm/nouveau/dispnv04/crtc.c
@@ -26,6 +26,7 @@
 
 #include <drm/drmP.h>
 #include <drm/drm_crtc_helper.h>
+#include <drm/drm_plane_helper.h>
 
 #include "nouveau_drm.h"
 #include "nouveau_reg.h"
@@ -613,7 +614,7 @@ nv_crtc_swap_fbs(struct drm_crtc *crtc, struct drm_framebuffer *old_fb)
 	struct nouveau_crtc *nv_crtc = nouveau_crtc(crtc);
 	int ret;
 
-	ret = nouveau_bo_pin(nvfb->nvbo, TTM_PL_FLAG_VRAM);
+	ret = nouveau_bo_pin(nvfb->nvbo, TTM_PL_FLAG_VRAM, false);
 	if (ret == 0) {
 		if (disp->image[nv_crtc->index])
 			nouveau_bo_unpin(disp->image[nv_crtc->index]);
@@ -1129,7 +1130,7 @@ nv04_crtc_create(struct drm_device *dev, int crtc_num)
 	ret = nouveau_bo_new(dev, 64*64*4, 0x100, TTM_PL_FLAG_VRAM,
 			     0, 0x0000, NULL, NULL, &nv_crtc->cursor.nvbo);
 	if (!ret) {
-		ret = nouveau_bo_pin(nv_crtc->cursor.nvbo, TTM_PL_FLAG_VRAM);
+		ret = nouveau_bo_pin(nv_crtc->cursor.nvbo, TTM_PL_FLAG_VRAM, false);
 		if (!ret) {
 			ret = nouveau_bo_map(nv_crtc->cursor.nvbo);
 			if (ret)
diff --git a/drivers/gpu/drm/nouveau/dispnv04/overlay.c b/drivers/gpu/drm/nouveau/dispnv04/overlay.c
index 1e9056a8df94..9f2498571d09 100644
--- a/drivers/gpu/drm/nouveau/dispnv04/overlay.c
+++ b/drivers/gpu/drm/nouveau/dispnv04/overlay.c
@@ -126,7 +126,7 @@ nv10_update_plane(struct drm_plane *plane, struct drm_crtc *crtc,
 			return -ERANGE;
 	}
 
-	ret = nouveau_bo_pin(nv_fb->nvbo, TTM_PL_FLAG_VRAM);
+	ret = nouveau_bo_pin(nv_fb->nvbo, TTM_PL_FLAG_VRAM, false);
 	if (ret)
 		return ret;
 
@@ -373,7 +373,7 @@ nv04_update_plane(struct drm_plane *plane, struct drm_crtc *crtc,
 	if (crtc_w < src_w || crtc_h < src_h)
 		return -ERANGE;
 
-	ret = nouveau_bo_pin(nv_fb->nvbo, TTM_PL_FLAG_VRAM);
+	ret = nouveau_bo_pin(nv_fb->nvbo, TTM_PL_FLAG_VRAM, false);
 	if (ret)
 		return ret;
 
diff --git a/drivers/gpu/drm/nouveau/nouveau_abi16.c b/drivers/gpu/drm/nouveau/nouveau_abi16.c
index a24faa5e2a2a..d39a15000068 100644
--- a/drivers/gpu/drm/nouveau/nouveau_abi16.c
+++ b/drivers/gpu/drm/nouveau/nouveau_abi16.c
@@ -308,7 +308,7 @@ nouveau_abi16_ioctl_channel_alloc(ABI16_IOCTL_ARGS)
 	ret = nouveau_gem_new(dev, PAGE_SIZE, 0, NOUVEAU_GEM_DOMAIN_GART,
 			      0, 0, &chan->ntfy);
 	if (ret == 0)
-		ret = nouveau_bo_pin(chan->ntfy, TTM_PL_FLAG_TT);
+		ret = nouveau_bo_pin(chan->ntfy, TTM_PL_FLAG_TT, false);
 	if (ret)
 		goto done;
 
diff --git a/drivers/gpu/drm/nouveau/nouveau_bios.c b/drivers/gpu/drm/nouveau/nouveau_bios.c
index dae2c96deef8..7df6acc8bb34 100644
--- a/drivers/gpu/drm/nouveau/nouveau_bios.c
+++ b/drivers/gpu/drm/nouveau/nouveau_bios.c
@@ -1258,7 +1258,7 @@ olddcb_table(struct drm_device *dev)
 		return NULL;
 	}
 
-	if (dcb[0] >= 0x41) {
+	if (dcb[0] >= 0x42) {
 		NV_WARN(drm, "DCB version 0x%02x unknown\n", dcb[0]);
 		return NULL;
 	} else
@@ -1481,18 +1481,22 @@ parse_dcb20_entry(struct drm_device *dev, struct dcb_table *dcb,
 			entry->dpconf.link_bw = 540000;
 			break;
 		}
-		switch ((conf & 0x0f000000) >> 24) {
-		case 0xf:
-			entry->dpconf.link_nr = 4;
-			break;
-		case 0x3:
-			entry->dpconf.link_nr = 2;
-			break;
-		default:
-			entry->dpconf.link_nr = 1;
-			break;
+		entry->dpconf.link_nr = (conf & 0x0f000000) >> 24;
+		if (dcb->version < 0x41) {
+			switch (entry->dpconf.link_nr) {
+			case 0xf:
+				entry->dpconf.link_nr = 4;
+				break;
+			case 0x3:
+				entry->dpconf.link_nr = 2;
+				break;
+			default:
+				entry->dpconf.link_nr = 1;
+				break;
+			}
 		}
 		link = entry->dpconf.sor.link;
+		entry->i2c_index += NV_I2C_AUX(0);
 		break;
 	case DCB_OUTPUT_TMDS:
 		if (dcb->version >= 0x40) {
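Note on the parse_dcb20_entry() hunk above: the DP lane-count handling now depends on the DCB version. A minimal standalone sketch of that decision, assuming a hypothetical helper name `dp_link_nr_sketch()` that does not exist in the driver:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Decode the DP link count from bits 27:24 of the DCB 'conf' word.
 * Before DCB 0x41 the field is translated (0xf -> 4, 0x3 -> 2,
 * anything else -> 1); from 0x41 onward the raw field value is kept.
 */
static unsigned int
dp_link_nr_sketch(uint8_t dcb_version, uint32_t conf)
{
	unsigned int nr = (conf & 0x0f000000) >> 24;

	if (dcb_version < 0x41) {
		switch (nr) {
		case 0xf:
			return 4;
		case 0x3:
			return 2;
		default:
			return 1;
		}
	}
	return nr;
}
```

So a version 0x40 entry with the field set to 0xf yields 4, while a version 0x41 entry with the field set to 0x4 is passed through unchanged as 4.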
diff --git a/drivers/gpu/drm/nouveau/nouveau_bo.c b/drivers/gpu/drm/nouveau/nouveau_bo.c
index 3d474ac03f88..21ec561edc99 100644
--- a/drivers/gpu/drm/nouveau/nouveau_bo.c
+++ b/drivers/gpu/drm/nouveau/nouveau_bo.c
@@ -214,6 +214,9 @@ nouveau_bo_new(struct drm_device *dev, int size, int align,
 	nvbo->tile_flags = tile_flags;
 	nvbo->bo.bdev = &drm->ttm.bdev;
 
+	if (!nv_device_is_cpu_coherent(nvkm_device(&drm->device)))
+		nvbo->force_coherent = flags & TTM_PL_FLAG_UNCACHED;
+
 	nvbo->page_shift = 12;
 	if (drm->client.vm) {
 		if (!(flags & TTM_PL_FLAG_TT) && size > 256 * 1024)
@@ -291,8 +294,9 @@ void
 nouveau_bo_placement_set(struct nouveau_bo *nvbo, uint32_t type, uint32_t busy)
 {
 	struct ttm_placement *pl = &nvbo->placement;
-	uint32_t flags = TTM_PL_MASK_CACHING |
-		(nvbo->pin_refcnt ? TTM_PL_FLAG_NO_EVICT : 0);
+	uint32_t flags = (nvbo->force_coherent ? TTM_PL_FLAG_UNCACHED :
+						 TTM_PL_MASK_CACHING) |
+			 (nvbo->pin_refcnt ? TTM_PL_FLAG_NO_EVICT : 0);
 
 	pl->placement = nvbo->placements;
 	set_placement_list(nvbo->placements, &pl->num_placement,
@@ -306,42 +310,75 @@ nouveau_bo_placement_set(struct nouveau_bo *nvbo, uint32_t type, uint32_t busy)
 }
 
 int
-nouveau_bo_pin(struct nouveau_bo *nvbo, uint32_t memtype)
+nouveau_bo_pin(struct nouveau_bo *nvbo, uint32_t memtype, bool contig)
 {
 	struct nouveau_drm *drm = nouveau_bdev(nvbo->bo.bdev);
 	struct ttm_buffer_object *bo = &nvbo->bo;
+	bool force = false, evict = false;
 	int ret;
 
 	ret = ttm_bo_reserve(bo, false, false, false, NULL);
 	if (ret)
-		goto out;
+		return ret;
 
-	if (nvbo->pin_refcnt && !(memtype & (1 << bo->mem.mem_type))) {
-		NV_ERROR(drm, "bo %p pinned elsewhere: 0x%08x vs 0x%08x\n", bo,
-			 1 << bo->mem.mem_type, memtype);
-		ret = -EINVAL;
-		goto out;
+	if (drm->device.info.family >= NV_DEVICE_INFO_V0_TESLA &&
+	    memtype == TTM_PL_FLAG_VRAM && contig) {
+		if (nvbo->tile_flags & NOUVEAU_GEM_TILE_NONCONTIG) {
+			if (bo->mem.mem_type == TTM_PL_VRAM) {
+				struct nouveau_mem *mem = bo->mem.mm_node;
+				if (!list_is_singular(&mem->regions))
+					evict = true;
+			}
+			nvbo->tile_flags &= ~NOUVEAU_GEM_TILE_NONCONTIG;
+			force = true;
+		}
 	}
 
-	if (nvbo->pin_refcnt++)
+	if (nvbo->pin_refcnt) {
+		if (!(memtype & (1 << bo->mem.mem_type)) || evict) {
+			NV_ERROR(drm, "bo %p pinned elsewhere: "
+				      "0x%08x vs 0x%08x\n", bo,
+				 1 << bo->mem.mem_type, memtype);
+			ret = -EBUSY;
+		}
+		nvbo->pin_refcnt++;
 		goto out;
+	}
 
+	if (evict) {
+		nouveau_bo_placement_set(nvbo, TTM_PL_FLAG_TT, 0);
+		ret = nouveau_bo_validate(nvbo, false, false);
+		if (ret)
+			goto out;
+	}
+
+	nvbo->pin_refcnt++;
 	nouveau_bo_placement_set(nvbo, memtype, 0);
 
+	/* drop pin_refcnt temporarily, so we don't trip the assertion
+	 * in nouveau_bo_move() that makes sure we're not trying to
+	 * move a pinned buffer
+	 */
+	nvbo->pin_refcnt--;
 	ret = nouveau_bo_validate(nvbo, false, false);
-	if (ret == 0) {
-		switch (bo->mem.mem_type) {
-		case TTM_PL_VRAM:
-			drm->gem.vram_available -= bo->mem.size;
-			break;
-		case TTM_PL_TT:
-			drm->gem.gart_available -= bo->mem.size;
-			break;
-		default:
-			break;
-		}
+	if (ret)
+		goto out;
+	nvbo->pin_refcnt++;
+
+	switch (bo->mem.mem_type) {
+	case TTM_PL_VRAM:
+		drm->gem.vram_available -= bo->mem.size;
+		break;
+	case TTM_PL_TT:
+		drm->gem.gart_available -= bo->mem.size;
+		break;
+	default:
+		break;
 	}
+
 out:
+	if (force && ret)
+		nvbo->tile_flags |= NOUVEAU_GEM_TILE_NONCONTIG;
 	ttm_bo_unreserve(bo);
 	return ret;
 }
@@ -392,7 +429,14 @@ nouveau_bo_map(struct nouveau_bo *nvbo)
 	if (ret)
 		return ret;
 
-	ret = ttm_bo_kmap(&nvbo->bo, 0, nvbo->bo.mem.num_pages, &nvbo->kmap);
+	/*
+	 * TTM buffers allocated using the DMA API already have a mapping, let's
+	 * use it instead.
+	 */
+	if (!nvbo->force_coherent)
+		ret = ttm_bo_kmap(&nvbo->bo, 0, nvbo->bo.mem.num_pages,
+				  &nvbo->kmap);
+
 	ttm_bo_unreserve(&nvbo->bo);
 	return ret;
 }
@@ -400,10 +444,57 @@ nouveau_bo_map(struct nouveau_bo *nvbo)
 void
 nouveau_bo_unmap(struct nouveau_bo *nvbo)
 {
-	if (nvbo)
+	if (!nvbo)
+		return;
+
+	/*
+	 * TTM buffers allocated using the DMA API already had a coherent
+	 * mapping which we used, no need to unmap.
+	 */
+	if (!nvbo->force_coherent)
 		ttm_bo_kunmap(&nvbo->kmap);
 }
 
+void
+nouveau_bo_sync_for_device(struct nouveau_bo *nvbo)
+{
+	struct nouveau_drm *drm = nouveau_bdev(nvbo->bo.bdev);
+	struct nouveau_device *device = nvkm_device(&drm->device);
+	struct ttm_dma_tt *ttm_dma = (struct ttm_dma_tt *)nvbo->bo.ttm;
+	int i;
+
+	if (!ttm_dma)
+		return;
+
+	/* Don't waste time looping if the object is coherent */
+	if (nvbo->force_coherent)
+		return;
+
+	for (i = 0; i < ttm_dma->ttm.num_pages; i++)
+		dma_sync_single_for_device(nv_device_base(device),
+			ttm_dma->dma_address[i], PAGE_SIZE, DMA_TO_DEVICE);
+}
+
+void
+nouveau_bo_sync_for_cpu(struct nouveau_bo *nvbo)
+{
+	struct nouveau_drm *drm = nouveau_bdev(nvbo->bo.bdev);
+	struct nouveau_device *device = nvkm_device(&drm->device);
+	struct ttm_dma_tt *ttm_dma = (struct ttm_dma_tt *)nvbo->bo.ttm;
+	int i;
+
+	if (!ttm_dma)
+		return;
+
+	/* Don't waste time looping if the object is coherent */
+	if (nvbo->force_coherent)
+		return;
+
+	for (i = 0; i < ttm_dma->ttm.num_pages; i++)
+		dma_sync_single_for_cpu(nv_device_base(device),
+			ttm_dma->dma_address[i], PAGE_SIZE, DMA_FROM_DEVICE);
+}
+
 int
 nouveau_bo_validate(struct nouveau_bo *nvbo, bool interruptible,
 		    bool no_wait_gpu)
@@ -415,15 +506,41 @@ nouveau_bo_validate(struct nouveau_bo *nvbo, bool interruptible,
 	if (ret)
 		return ret;
 
+	nouveau_bo_sync_for_device(nvbo);
+
 	return 0;
 }
 
+static inline void *
+_nouveau_bo_mem_index(struct nouveau_bo *nvbo, unsigned index, void *mem, u8 sz)
+{
+	struct ttm_dma_tt *dma_tt;
+	u8 *m = mem;
+
+	index *= sz;
+
+	if (m) {
+		/* kmap'd address, return the corresponding offset */
+		m += index;
+	} else {
+		/* DMA-API mapping, lookup the right address */
+		dma_tt = (struct ttm_dma_tt *)nvbo->bo.ttm;
+		m = dma_tt->cpu_address[index / PAGE_SIZE];
+		m += index % PAGE_SIZE;
+	}
+
+	return m;
+}
+#define nouveau_bo_mem_index(o, i, m) _nouveau_bo_mem_index(o, i, m, sizeof(*m))
+
 u16
 nouveau_bo_rd16(struct nouveau_bo *nvbo, unsigned index)
 {
 	bool is_iomem;
 	u16 *mem = ttm_kmap_obj_virtual(&nvbo->kmap, &is_iomem);
-	mem = &mem[index];
+
+	mem = nouveau_bo_mem_index(nvbo, index, mem);
+
 	if (is_iomem)
 		return ioread16_native((void __force __iomem *)mem);
 	else
@@ -435,7 +552,9 @@ nouveau_bo_wr16(struct nouveau_bo *nvbo, unsigned index, u16 val)
 {
 	bool is_iomem;
 	u16 *mem = ttm_kmap_obj_virtual(&nvbo->kmap, &is_iomem);
-	mem = &mem[index];
+
+	mem = nouveau_bo_mem_index(nvbo, index, mem);
+
 	if (is_iomem)
 		iowrite16_native(val, (void __force __iomem *)mem);
 	else
@@ -447,7 +566,9 @@ nouveau_bo_rd32(struct nouveau_bo *nvbo, unsigned index)
 {
 	bool is_iomem;
 	u32 *mem = ttm_kmap_obj_virtual(&nvbo->kmap, &is_iomem);
-	mem = &mem[index];
+
+	mem = nouveau_bo_mem_index(nvbo, index, mem);
+
 	if (is_iomem)
 		return ioread32_native((void __force __iomem *)mem);
 	else
@@ -459,7 +580,9 @@ nouveau_bo_wr32(struct nouveau_bo *nvbo, unsigned index, u32 val)
 {
 	bool is_iomem;
 	u32 *mem = ttm_kmap_obj_virtual(&nvbo->kmap, &is_iomem);
-	mem = &mem[index];
+
+	mem = nouveau_bo_mem_index(nvbo, index, mem);
+
 	if (is_iomem)
 		iowrite32_native(val, (void __force __iomem *)mem);
 	else
@@ -1184,6 +1307,9 @@ nouveau_bo_move(struct ttm_buffer_object *bo, bool evict, bool intr,
 	struct nouveau_drm_tile *new_tile = NULL;
 	int ret = 0;
 
+	if (nvbo->pin_refcnt)
+		NV_WARN(drm, "Moving pinned object %p!\n", nvbo);
+
 	if (drm->device.info.family < NV_DEVICE_INFO_V0_TESLA) {
 		ret = nouveau_bo_vm_bind(bo, new_mem, &new_tile);
 		if (ret)
@@ -1376,6 +1502,14 @@ nouveau_ttm_tt_populate(struct ttm_tt *ttm)
 	dev = drm->dev;
 	pdev = nv_device_base(device);
 
+	/*
+	 * Objects matching this condition have been marked as force_coherent,
+	 * so use the DMA API for them.
+	 */
+	if (!nv_device_is_cpu_coherent(device) &&
+	    ttm->caching_state == tt_uncached)
+		return ttm_dma_populate(ttm_dma, dev->dev);
+
 #if __OS_HAS_AGP
 	if (drm->agp.stat == ENABLED) {
 		return ttm_agp_tt_populate(ttm);
@@ -1433,6 +1567,14 @@ nouveau_ttm_tt_unpopulate(struct ttm_tt *ttm)
 	dev = drm->dev;
 	pdev = nv_device_base(device);
 
+	/*
+	 * Objects matching this condition have been marked as force_coherent,
+	 * so use the DMA API for them.
+	 */
+	if (!nv_device_is_cpu_coherent(device) &&
+	    ttm->caching_state == tt_uncached)
+		ttm_dma_unpopulate(ttm_dma, dev->dev);
+
 #if __OS_HAS_AGP
 	if (drm->agp.stat == ENABLED) {
 		ttm_agp_tt_unpopulate(ttm);
diff --git a/drivers/gpu/drm/nouveau/nouveau_bo.h b/drivers/gpu/drm/nouveau/nouveau_bo.h
index 22d2c764d80b..072222efeeb7 100644
--- a/drivers/gpu/drm/nouveau/nouveau_bo.h
+++ b/drivers/gpu/drm/nouveau/nouveau_bo.h
@@ -13,6 +13,7 @@ struct nouveau_bo {
 	u32 valid_domains;
 	struct ttm_place placements[3];
 	struct ttm_place busy_placements[3];
+	bool force_coherent;
 	struct ttm_bo_kmap_obj kmap;
 	struct list_head head;
 
@@ -72,7 +73,7 @@ int  nouveau_bo_new(struct drm_device *, int size, int align, u32 flags,
 		    u32 tile_mode, u32 tile_flags, struct sg_table *sg,
 		    struct reservation_object *robj,
 		    struct nouveau_bo **);
-int  nouveau_bo_pin(struct nouveau_bo *, u32 flags);
+int  nouveau_bo_pin(struct nouveau_bo *, u32 flags, bool contig);
 int  nouveau_bo_unpin(struct nouveau_bo *);
 int  nouveau_bo_map(struct nouveau_bo *);
 void nouveau_bo_unmap(struct nouveau_bo *);
@@ -84,6 +85,8 @@ void nouveau_bo_wr32(struct nouveau_bo *, unsigned index, u32 val);
 void nouveau_bo_fence(struct nouveau_bo *, struct nouveau_fence *, bool exclusive);
 int  nouveau_bo_validate(struct nouveau_bo *, bool interruptible,
 			 bool no_wait_gpu);
+void nouveau_bo_sync_for_device(struct nouveau_bo *nvbo);
+void nouveau_bo_sync_for_cpu(struct nouveau_bo *nvbo);
 
 struct nouveau_vma *
 nouveau_bo_vma_find(struct nouveau_bo *, struct nouveau_vm *);
diff --git a/drivers/gpu/drm/nouveau/nouveau_chan.c b/drivers/gpu/drm/nouveau/nouveau_chan.c
index fd3dbd59d73e..aff9099aae6c 100644
--- a/drivers/gpu/drm/nouveau/nouveau_chan.c
+++ b/drivers/gpu/drm/nouveau/nouveau_chan.c
@@ -102,14 +102,14 @@ nouveau_channel_prep(struct nouveau_drm *drm, struct nvif_device *device,
 	chan->drm = drm;
 
 	/* allocate memory for dma push buffer */
-	target = TTM_PL_FLAG_TT;
+	target = TTM_PL_FLAG_TT | TTM_PL_FLAG_UNCACHED;
 	if (nouveau_vram_pushbuf)
 		target = TTM_PL_FLAG_VRAM;
 
 	ret = nouveau_bo_new(drm->dev, size, 0, target, 0, 0, NULL, NULL,
 			    &chan->push.buffer);
 	if (ret == 0) {
-		ret = nouveau_bo_pin(chan->push.buffer, target);
+		ret = nouveau_bo_pin(chan->push.buffer, target, false);
 		if (ret == 0)
 			ret = nouveau_bo_map(chan->push.buffer);
 	}
@@ -285,7 +285,6 @@ nouveau_channel_init(struct nouveau_channel *chan, u32 vram, u32 gart)
 	struct nouveau_software_chan *swch;
 	struct nv_dma_v0 args = {};
 	int ret, i;
-	bool save;
 
 	nvif_object_map(chan->object);
 
@@ -387,11 +386,7 @@ nouveau_channel_init(struct nouveau_channel *chan, u32 vram, u32 gart)
 	}
 
 	/* initialise synchronisation */
-	save = cli->base.super;
-	cli->base.super = true; /* hack until fencenv50 fixed */
-	ret = nouveau_fence(chan->drm)->context_new(chan);
-	cli->base.super = save;
-	return ret;
+	return nouveau_fence(chan->drm)->context_new(chan);
 }
 
 int
diff --git a/drivers/gpu/drm/nouveau/nouveau_display.c b/drivers/gpu/drm/nouveau/nouveau_display.c
index a88e6927f571..5d93902a91ab 100644
--- a/drivers/gpu/drm/nouveau/nouveau_display.c
+++ b/drivers/gpu/drm/nouveau/nouveau_display.c
@@ -479,6 +479,7 @@ nouveau_display_create(struct drm_device *dev)
 
 	if (nouveau_modeset != 2 && drm->vbios.dcb.entries) {
 		static const u16 oclass[] = {
+			GM204_DISP,
 			GM107_DISP,
 			GK110_DISP,
 			GK104_DISP,
@@ -568,9 +569,10 @@ nouveau_display_suspend(struct drm_device *dev, bool runtime)
 
 	list_for_each_entry(crtc, &dev->mode_config.crtc_list, head) {
 		struct nouveau_crtc *nv_crtc = nouveau_crtc(crtc);
-
-		nouveau_bo_unmap(nv_crtc->cursor.nvbo);
-		nouveau_bo_unpin(nv_crtc->cursor.nvbo);
+		if (nv_crtc->cursor.nvbo) {
+			nouveau_bo_unmap(nv_crtc->cursor.nvbo);
+			nouveau_bo_unpin(nv_crtc->cursor.nvbo);
+		}
 	}
 
 	return 0;
@@ -591,15 +593,17 @@ nouveau_display_resume(struct drm_device *dev, bool runtime)
 		if (!nouveau_fb || !nouveau_fb->nvbo)
 			continue;
 
-		ret = nouveau_bo_pin(nouveau_fb->nvbo, TTM_PL_FLAG_VRAM);
+		ret = nouveau_bo_pin(nouveau_fb->nvbo, TTM_PL_FLAG_VRAM, true);
 		if (ret)
 			NV_ERROR(drm, "Could not pin framebuffer\n");
 	}
 
 	list_for_each_entry(crtc, &dev->mode_config.crtc_list, head) {
 		struct nouveau_crtc *nv_crtc = nouveau_crtc(crtc);
+		if (!nv_crtc->cursor.nvbo)
+			continue;
 
-		ret = nouveau_bo_pin(nv_crtc->cursor.nvbo, TTM_PL_FLAG_VRAM);
+		ret = nouveau_bo_pin(nv_crtc->cursor.nvbo, TTM_PL_FLAG_VRAM, true);
 		if (!ret)
 			ret = nouveau_bo_map(nv_crtc->cursor.nvbo);
 		if (ret)
@@ -630,9 +634,10 @@ nouveau_display_resume(struct drm_device *dev, bool runtime)
 
 	list_for_each_entry(crtc, &dev->mode_config.crtc_list, head) {
 		struct nouveau_crtc *nv_crtc = nouveau_crtc(crtc);
-		u32 offset = nv_crtc->cursor.nvbo->bo.offset;
 
-		nv_crtc->cursor.set_offset(nv_crtc, offset);
+		if (!nv_crtc->cursor.nvbo)
+			continue;
+		nv_crtc->cursor.set_offset(nv_crtc, nv_crtc->cursor.nvbo->bo.offset);
 		nv_crtc->cursor.set_pos(nv_crtc, nv_crtc->cursor_saved_x,
 						 nv_crtc->cursor_saved_y);
 	}
@@ -710,7 +715,7 @@ nouveau_crtc_page_flip(struct drm_crtc *crtc, struct drm_framebuffer *fb,
 		return -ENOMEM;
 
 	if (new_bo != old_bo) {
-		ret = nouveau_bo_pin(new_bo, TTM_PL_FLAG_VRAM);
+		ret = nouveau_bo_pin(new_bo, TTM_PL_FLAG_VRAM, true);
 		if (ret)
 			goto fail_free;
 	}
@@ -871,6 +876,7 @@ nouveau_display_dumb_create(struct drm_file *file_priv, struct drm_device *dev,
 	if (ret)
 		return ret;
 
+	bo->gem.dumb = true;
 	ret = drm_gem_handle_create(file_priv, &bo->gem, &args->handle);
 	drm_gem_object_unreference_unlocked(&bo->gem);
 	return ret;
@@ -886,6 +892,14 @@ nouveau_display_dumb_map_offset(struct drm_file *file_priv,
 	gem = drm_gem_object_lookup(dev, file_priv, handle);
 	if (gem) {
 		struct nouveau_bo *bo = nouveau_gem_object(gem);
+
+		/*
+		 * We don't allow dumb mmaps on objects created using another
+		 * interface.
+		 */
+		WARN_ONCE(!(gem->dumb || gem->import_attach),
+			  "Illegal dumb map of accelerated buffer.\n");
+
 		*poffset = drm_vma_node_offset_addr(&bo->bo.vma_node);
 		drm_gem_object_unreference_unlocked(gem);
 		return 0;
diff --git a/drivers/gpu/drm/nouveau/nouveau_drm.c b/drivers/gpu/drm/nouveau/nouveau_drm.c
index 62b97c4eef8d..65910e3aed0c 100644
--- a/drivers/gpu/drm/nouveau/nouveau_drm.c
+++ b/drivers/gpu/drm/nouveau/nouveau_drm.c
@@ -613,26 +613,6 @@ fail_display:
 	return ret;
 }
 
-int nouveau_pmops_suspend(struct device *dev)
-{
-	struct pci_dev *pdev = to_pci_dev(dev);
-	struct drm_device *drm_dev = pci_get_drvdata(pdev);
-	int ret;
-
-	if (drm_dev->switch_power_state == DRM_SWITCH_POWER_OFF ||
-	    drm_dev->switch_power_state == DRM_SWITCH_POWER_DYNAMIC_OFF)
-		return 0;
-
-	ret = nouveau_do_suspend(drm_dev, false);
-	if (ret)
-		return ret;
-
-	pci_save_state(pdev);
-	pci_disable_device(pdev);
-	pci_set_power_state(pdev, PCI_D3hot);
-	return 0;
-}
-
 static int
 nouveau_do_resume(struct drm_device *dev, bool runtime)
 {
@@ -667,7 +647,29 @@ nouveau_do_resume(struct drm_device *dev, bool runtime)
 	return 0;
 }
 
-int nouveau_pmops_resume(struct device *dev)
+int
+nouveau_pmops_suspend(struct device *dev)
+{
+	struct pci_dev *pdev = to_pci_dev(dev);
+	struct drm_device *drm_dev = pci_get_drvdata(pdev);
+	int ret;
+
+	if (drm_dev->switch_power_state == DRM_SWITCH_POWER_OFF ||
+	    drm_dev->switch_power_state == DRM_SWITCH_POWER_DYNAMIC_OFF)
+		return 0;
+
+	ret = nouveau_do_suspend(drm_dev, false);
+	if (ret)
+		return ret;
+
+	pci_save_state(pdev);
+	pci_disable_device(pdev);
+	pci_set_power_state(pdev, PCI_D3hot);
+	return 0;
+}
+
+int
+nouveau_pmops_resume(struct device *dev)
 {
 	struct pci_dev *pdev = to_pci_dev(dev);
 	struct drm_device *drm_dev = pci_get_drvdata(pdev);
@@ -687,20 +689,122 @@ int nouveau_pmops_resume(struct device *dev)
 	return nouveau_do_resume(drm_dev, false);
 }
 
-static int nouveau_pmops_freeze(struct device *dev)
+static int
+nouveau_pmops_freeze(struct device *dev)
 {
 	struct pci_dev *pdev = to_pci_dev(dev);
 	struct drm_device *drm_dev = pci_get_drvdata(pdev);
 	return nouveau_do_suspend(drm_dev, false);
 }
 
-static int nouveau_pmops_thaw(struct device *dev)
+static int
+nouveau_pmops_thaw(struct device *dev)
 {
 	struct pci_dev *pdev = to_pci_dev(dev);
 	struct drm_device *drm_dev = pci_get_drvdata(pdev);
 	return nouveau_do_resume(drm_dev, false);
 }
 
+static int
+nouveau_pmops_runtime_suspend(struct device *dev)
+{
+	struct pci_dev *pdev = to_pci_dev(dev);
+	struct drm_device *drm_dev = pci_get_drvdata(pdev);
+	int ret;
+
+	if (nouveau_runtime_pm == 0) {
+		pm_runtime_forbid(dev);
+		return -EBUSY;
+	}
+
+	/* are we optimus enabled? */
+	if (nouveau_runtime_pm == -1 && !nouveau_is_optimus() && !nouveau_is_v1_dsm()) {
+		DRM_DEBUG_DRIVER("failing to power off - not optimus\n");
+		pm_runtime_forbid(dev);
+		return -EBUSY;
+	}
+
+	nv_debug_level(SILENT);
+	drm_kms_helper_poll_disable(drm_dev);
+	vga_switcheroo_set_dynamic_switch(pdev, VGA_SWITCHEROO_OFF);
+	nouveau_switcheroo_optimus_dsm();
+	ret = nouveau_do_suspend(drm_dev, true);
+	pci_save_state(pdev);
+	pci_disable_device(pdev);
+	pci_ignore_hotplug(pdev);
+	pci_set_power_state(pdev, PCI_D3cold);
+	drm_dev->switch_power_state = DRM_SWITCH_POWER_DYNAMIC_OFF;
+	return ret;
+}
+
+static int
+nouveau_pmops_runtime_resume(struct device *dev)
+{
+	struct pci_dev *pdev = to_pci_dev(dev);
+	struct drm_device *drm_dev = pci_get_drvdata(pdev);
+	struct nvif_device *device = &nouveau_drm(drm_dev)->device;
+	int ret;
+
+	if (nouveau_runtime_pm == 0)
+		return -EINVAL;
+
+	pci_set_power_state(pdev, PCI_D0);
+	pci_restore_state(pdev);
+	ret = pci_enable_device(pdev);
+	if (ret)
+		return ret;
+	pci_set_master(pdev);
+
+	ret = nouveau_do_resume(drm_dev, true);
+	drm_kms_helper_poll_enable(drm_dev);
+	/* do magic */
+	nvif_mask(device, 0x88488, (1 << 25), (1 << 25));
+	vga_switcheroo_set_dynamic_switch(pdev, VGA_SWITCHEROO_ON);
+	drm_dev->switch_power_state = DRM_SWITCH_POWER_ON;
+	nv_debug_level(NORMAL);
+	return ret;
+}
+
+static int
+nouveau_pmops_runtime_idle(struct device *dev)
+{
+	struct pci_dev *pdev = to_pci_dev(dev);
+	struct drm_device *drm_dev = pci_get_drvdata(pdev);
+	struct nouveau_drm *drm = nouveau_drm(drm_dev);
+	struct drm_crtc *crtc;
+
+	if (nouveau_runtime_pm == 0) {
+		pm_runtime_forbid(dev);
+		return -EBUSY;
+	}
+
+	/* are we optimus enabled? */
+	if (nouveau_runtime_pm == -1 && !nouveau_is_optimus() && !nouveau_is_v1_dsm()) {
+		DRM_DEBUG_DRIVER("failing to power off - not optimus\n");
+		pm_runtime_forbid(dev);
+		return -EBUSY;
+	}
+
+	/* if we have a hdmi audio device - make sure it has a driver loaded */
+	if (drm->hdmi_device) {
+		if (!drm->hdmi_device->driver) {
+			DRM_DEBUG_DRIVER("failing to power off - no HDMI audio driver loaded\n");
+			pm_runtime_mark_last_busy(dev);
+			return -EBUSY;
+		}
+	}
+
+	list_for_each_entry(crtc, &drm->dev->mode_config.crtc_list, head) {
+		if (crtc->enabled) {
+			DRM_DEBUG_DRIVER("failing to power off - crtc active\n");
+			return -EBUSY;
+		}
+	}
+	pm_runtime_mark_last_busy(dev);
+	pm_runtime_autosuspend(dev);
+	/* we don't want the main rpm_idle to call suspend - we want to autosuspend */
+	return 1;
+}
 
 static int
 nouveau_drm_open(struct drm_device *dev, struct drm_file *fpriv)
@@ -907,104 +1011,6 @@ nouveau_drm_pci_table[] = {
 	{}
 };
 
-static int nouveau_pmops_runtime_suspend(struct device *dev)
-{
-	struct pci_dev *pdev = to_pci_dev(dev);
-	struct drm_device *drm_dev = pci_get_drvdata(pdev);
-	int ret;
-
-	if (nouveau_runtime_pm == 0) {
-		pm_runtime_forbid(dev);
-		return -EBUSY;
-	}
-
-	/* are we optimus enabled? */
-	if (nouveau_runtime_pm == -1 && !nouveau_is_optimus() && !nouveau_is_v1_dsm()) {
-		DRM_DEBUG_DRIVER("failing to power off - not optimus\n");
-		pm_runtime_forbid(dev);
-		return -EBUSY;
-	}
-
-	nv_debug_level(SILENT);
-	drm_kms_helper_poll_disable(drm_dev);
-	vga_switcheroo_set_dynamic_switch(pdev, VGA_SWITCHEROO_OFF);
-	nouveau_switcheroo_optimus_dsm();
-	ret = nouveau_do_suspend(drm_dev, true);
-	pci_save_state(pdev);
-	pci_disable_device(pdev);
-	pci_ignore_hotplug(pdev);
-	pci_set_power_state(pdev, PCI_D3cold);
-	drm_dev->switch_power_state = DRM_SWITCH_POWER_DYNAMIC_OFF;
-	return ret;
-}
-
-static int nouveau_pmops_runtime_resume(struct device *dev)
-{
-	struct pci_dev *pdev = to_pci_dev(dev);
-	struct drm_device *drm_dev = pci_get_drvdata(pdev);
-	struct nvif_device *device = &nouveau_drm(drm_dev)->device;
-	int ret;
-
-	if (nouveau_runtime_pm == 0)
-		return -EINVAL;
-
-	pci_set_power_state(pdev, PCI_D0);
-	pci_restore_state(pdev);
-	ret = pci_enable_device(pdev);
-	if (ret)
-		return ret;
-	pci_set_master(pdev);
-
-	ret = nouveau_do_resume(drm_dev, true);
-	drm_kms_helper_poll_enable(drm_dev);
-	/* do magic */
-	nvif_mask(device, 0x88488, (1 << 25), (1 << 25));
-	vga_switcheroo_set_dynamic_switch(pdev, VGA_SWITCHEROO_ON);
-	drm_dev->switch_power_state = DRM_SWITCH_POWER_ON;
-	nv_debug_level(NORMAL);
-	return ret;
-}
-
-static int nouveau_pmops_runtime_idle(struct device *dev)
-{
-	struct pci_dev *pdev = to_pci_dev(dev);
-	struct drm_device *drm_dev = pci_get_drvdata(pdev);
-	struct nouveau_drm *drm = nouveau_drm(drm_dev);
-	struct drm_crtc *crtc;
-
-	if (nouveau_runtime_pm == 0) {
-		pm_runtime_forbid(dev);
-		return -EBUSY;
-	}
-
-	/* are we optimus enabled? */
-	if (nouveau_runtime_pm == -1 && !nouveau_is_optimus() && !nouveau_is_v1_dsm()) {
-		DRM_DEBUG_DRIVER("failing to power off - not optimus\n");
-		pm_runtime_forbid(dev);
-		return -EBUSY;
-	}
-
-	/* if we have a hdmi audio device - make sure it has a driver loaded */
-	if (drm->hdmi_device) {
-		if (!drm->hdmi_device->driver) {
-			DRM_DEBUG_DRIVER("failing to power off - no HDMI audio driver loaded\n");
-			pm_runtime_mark_last_busy(dev);
-			return -EBUSY;
-		}
-	}
-
-	list_for_each_entry(crtc, &drm->dev->mode_config.crtc_list, head) {
-		if (crtc->enabled) {
-			DRM_DEBUG_DRIVER("failing to power off - crtc active\n");
-			return -EBUSY;
-		}
-	}
-	pm_runtime_mark_last_busy(dev);
-	pm_runtime_autosuspend(dev);
-	/* we don't want the main rpm_idle to call suspend - we want to autosuspend */
-	return 1;
-}
-
 static void nouveau_display_options(void)
 {
 	DRM_DEBUG_DRIVER("Loading Nouveau with parameters:\n");
diff --git a/drivers/gpu/drm/nouveau/nouveau_fbcon.c b/drivers/gpu/drm/nouveau/nouveau_fbcon.c
index 593ef8a2a069..3ed12a8cfc91 100644
--- a/drivers/gpu/drm/nouveau/nouveau_fbcon.c
+++ b/drivers/gpu/drm/nouveau/nouveau_fbcon.c
@@ -341,7 +341,7 @@ nouveau_fbcon_create(struct drm_fb_helper *helper,
 		goto out;
 	}
 
-	ret = nouveau_bo_pin(nvbo, TTM_PL_FLAG_VRAM);
+	ret = nouveau_bo_pin(nvbo, TTM_PL_FLAG_VRAM, false);
 	if (ret) {
 		NV_ERROR(drm, "failed to pin fb: %d\n", ret);
 		goto out_unref;
@@ -498,6 +498,23 @@ nouveau_fbcon_set_suspend_work(struct work_struct *work)
 	console_unlock();
 }
 
+void
+nouveau_fbcon_set_suspend(struct drm_device *dev, int state)
+{
+	struct nouveau_drm *drm = nouveau_drm(dev);
+	if (drm->fbcon) {
+		if (state == FBINFO_STATE_RUNNING) {
+			schedule_work(&drm->fbcon->work);
+			return;
+		}
+		flush_work(&drm->fbcon->work);
+		console_lock();
+		fb_set_suspend(drm->fbcon->helper.fbdev, state);
+		nouveau_fbcon_accel_save_disable(dev);
+		console_unlock();
+	}
+}
+
 int
 nouveau_fbcon_init(struct drm_device *dev)
 {
@@ -557,20 +574,3 @@ nouveau_fbcon_fini(struct drm_device *dev)
 	kfree(drm->fbcon);
 	drm->fbcon = NULL;
 }
-
-void
-nouveau_fbcon_set_suspend(struct drm_device *dev, int state)
-{
-	struct nouveau_drm *drm = nouveau_drm(dev);
-	if (drm->fbcon) {
-		if (state == FBINFO_STATE_RUNNING) {
-			schedule_work(&drm->fbcon->work);
-			return;
-		}
-		flush_work(&drm->fbcon->work);
-		console_lock();
-		fb_set_suspend(drm->fbcon->helper.fbdev, state);
-		nouveau_fbcon_accel_save_disable(dev);
-		console_unlock();
-	}
-}
diff --git a/drivers/gpu/drm/nouveau/nouveau_gem.c b/drivers/gpu/drm/nouveau/nouveau_gem.c
index 36951ee4b157..28d51a22a4bf 100644
--- a/drivers/gpu/drm/nouveau/nouveau_gem.c
+++ b/drivers/gpu/drm/nouveau/nouveau_gem.c
@@ -444,6 +444,9 @@ validate_list(struct nouveau_channel *chan, struct nouveau_cli *cli,
 	list_for_each_entry(nvbo, list, entry) {
 		struct drm_nouveau_gem_pushbuf_bo *b = &pbbo[nvbo->pbbo_index];
 
+		WARN_ONCE(nvbo->gem.dumb,
+			  "GPU use of dumb buffer is illegal.\n");
+
 		ret = nouveau_gem_set_domain(&nvbo->gem, b->read_domains,
 					     b->write_domains,
 					     b->valid_domains);
@@ -867,6 +870,7 @@ nouveau_gem_ioctl_cpu_prep(struct drm_device *dev, void *data,
 		else
 			ret = lret;
 	}
+	nouveau_bo_sync_for_cpu(nvbo);
 	drm_gem_object_unreference_unlocked(gem);
 
 	return ret;
@@ -876,6 +880,17 @@ int
 nouveau_gem_ioctl_cpu_fini(struct drm_device *dev, void *data,
 			   struct drm_file *file_priv)
 {
+	struct drm_nouveau_gem_cpu_fini *req = data;
+	struct drm_gem_object *gem;
+	struct nouveau_bo *nvbo;
+
+	gem = drm_gem_object_lookup(dev, file_priv, req->handle);
+	if (!gem)
+		return -ENOENT;
+	nvbo = nouveau_gem_object(gem);
+
+	nouveau_bo_sync_for_device(nvbo);
+	drm_gem_object_unreference_unlocked(gem);
 	return 0;
 }
 
diff --git a/drivers/gpu/drm/nouveau/nouveau_platform.c b/drivers/gpu/drm/nouveau/nouveau_platform.c
index 246a824c16ca..b307bbedd4c4 100644
--- a/drivers/gpu/drm/nouveau/nouveau_platform.c
+++ b/drivers/gpu/drm/nouveau/nouveau_platform.c
@@ -27,6 +27,7 @@
 #include <linux/of.h>
 #include <linux/reset.h>
 #include <linux/regulator/consumer.h>
+#include <soc/tegra/fuse.h>
 #include <soc/tegra/pmc.h>
 
 #include "nouveau_drm.h"
@@ -128,6 +129,7 @@ static int nouveau_platform_probe(struct platform_device *pdev)
 	}
 
 	device->gpu = gpu;
+	device->gpu_speedo = tegra_sku_info.gpu_speedo_value;
 
 	err = drm_dev_register(drm, 0);
 	if (err < 0)
diff --git a/drivers/gpu/drm/nouveau/nouveau_platform.h b/drivers/gpu/drm/nouveau/nouveau_platform.h
index 91f66504900e..58c28b5653d5 100644
--- a/drivers/gpu/drm/nouveau/nouveau_platform.h
+++ b/drivers/gpu/drm/nouveau/nouveau_platform.h
@@ -41,6 +41,8 @@ struct nouveau_platform_device {
 	struct nouveau_device device;
 
 	struct nouveau_platform_gpu *gpu;
+
+	int gpu_speedo;
 };
 
 #define nv_device_to_platform(d)                                               \
diff --git a/drivers/gpu/drm/nouveau/nouveau_prime.c b/drivers/gpu/drm/nouveau/nouveau_prime.c
index 228226ab27fc..dd32ad6db53d 100644
--- a/drivers/gpu/drm/nouveau/nouveau_prime.c
+++ b/drivers/gpu/drm/nouveau/nouveau_prime.c
@@ -93,7 +93,7 @@ int nouveau_gem_prime_pin(struct drm_gem_object *obj)
 	int ret;
 
 	/* pin buffer into GTT */
-	ret = nouveau_bo_pin(nvbo, TTM_PL_FLAG_TT);
+	ret = nouveau_bo_pin(nvbo, TTM_PL_FLAG_TT, false);
 	if (ret)
 		return -EINVAL;
 
diff --git a/drivers/gpu/drm/nouveau/nv17_fence.c b/drivers/gpu/drm/nouveau/nv17_fence.c
index 40b461c7d5c5..57860cfa1de5 100644
--- a/drivers/gpu/drm/nouveau/nv17_fence.c
+++ b/drivers/gpu/drm/nouveau/nv17_fence.c
@@ -131,7 +131,7 @@ nv17_fence_create(struct nouveau_drm *drm)
 	ret = nouveau_bo_new(drm->dev, 4096, 0x1000, TTM_PL_FLAG_VRAM,
 			     0, 0x0000, NULL, NULL, &priv->bo);
 	if (!ret) {
-		ret = nouveau_bo_pin(priv->bo, TTM_PL_FLAG_VRAM);
+		ret = nouveau_bo_pin(priv->bo, TTM_PL_FLAG_VRAM, false);
 		if (!ret) {
 			ret = nouveau_bo_map(priv->bo);
 			if (ret)
diff --git a/drivers/gpu/drm/nouveau/nv50_display.c b/drivers/gpu/drm/nouveau/nv50_display.c
index eb8b36714fa1..490b90866baf 100644
--- a/drivers/gpu/drm/nouveau/nv50_display.c
+++ b/drivers/gpu/drm/nouveau/nv50_display.c
@@ -26,6 +26,7 @@
 
 #include <drm/drmP.h>
 #include <drm/drm_crtc_helper.h>
+#include <drm/drm_plane_helper.h>
 #include <drm/drm_dp_helper.h>
 
 #include <nvif/class.h>
@@ -65,15 +66,29 @@ static int
 nv50_chan_create(struct nvif_object *disp, const u32 *oclass, u8 head,
 		 void *data, u32 size, struct nv50_chan *chan)
 {
+	const u32 handle = (oclass[0] << 16) | head;
+	u32 sclass[8];
+	int ret, i;
+
+	ret = nvif_object_sclass(disp, sclass, ARRAY_SIZE(sclass));
+	WARN_ON(ret > ARRAY_SIZE(sclass));
+	if (ret < 0)
+		return ret;
+
 	while (oclass[0]) {
-		int ret = nvif_object_init(disp, NULL, (oclass[0] << 16) | head,
-					   oclass[0], data, size,
-					  &chan->user);
-		if (oclass++, ret == 0) {
-			nvif_object_map(&chan->user);
-			return ret;
+		for (i = 0; i < ARRAY_SIZE(sclass); i++) {
+			if (sclass[i] == oclass[0]) {
+				ret = nvif_object_init(disp, NULL, handle,
+						       oclass[0], data, size,
+						       &chan->user);
+				if (ret == 0)
+					nvif_object_map(&chan->user);
+				return ret;
+			}
 		}
+		oclass++;
 	}
+
 	return -ENOSYS;
 }
 
@@ -110,6 +125,7 @@ nv50_pioc_create(struct nvif_object *disp, const u32 *oclass, u8 head,
 
 struct nv50_curs {
 	struct nv50_pioc base;
+	struct nouveau_bo *image;
 };
 
 static int
@@ -265,6 +281,7 @@ nv50_core_create(struct nvif_object *disp, u64 syncbuf, struct nv50_mast *core)
 		.pushbuf = 0xb0007d00,
 	};
 	static const u32 oclass[] = {
+		GM204_DISP_CORE_CHANNEL_DMA,
 		GM107_DISP_CORE_CHANNEL_DMA,
 		GK110_DISP_CORE_CHANNEL_DMA,
 		GK104_DISP_CORE_CHANNEL_DMA,
@@ -424,8 +441,21 @@ evo_kick(u32 *push, void *evoc)
 	mutex_unlock(&dmac->lock);
 }
 
+#if 1
 #define evo_mthd(p,m,s) *((p)++) = (((s) << 18) | (m))
 #define evo_data(p,d)   *((p)++) = (d)
+#else
+#define evo_mthd(p,m,s) do {                                                   \
+	const u32 _m = (m), _s = (s);                                          \
+	printk(KERN_ERR "%04x %d %s\n", _m, _s, __func__);                     \
+	*((p)++) = ((_s << 18) | _m);                                          \
+} while(0)
+#define evo_data(p,d) do {                                                     \
+	const u32 _d = (d);                                                    \
+	printk(KERN_ERR "\t%08x\n", _d);                                       \
+	*((p)++) = _d;                                                         \
+} while(0)
+#endif
 
 static bool
 evo_sync_wait(void *data)
@@ -887,23 +917,24 @@ static void
 nv50_crtc_cursor_show(struct nouveau_crtc *nv_crtc)
 {
 	struct nv50_mast *mast = nv50_mast(nv_crtc->base.dev);
+	struct nv50_curs *curs = nv50_curs(&nv_crtc->base);
 	u32 *push = evo_wait(mast, 16);
 	if (push) {
 		if (nv50_vers(mast) < G82_DISP_CORE_CHANNEL_DMA) {
 			evo_mthd(push, 0x0880 + (nv_crtc->index * 0x400), 2);
 			evo_data(push, 0x85000000);
-			evo_data(push, nv_crtc->cursor.nvbo->bo.offset >> 8);
+			evo_data(push, curs->image->bo.offset >> 8);
 		} else
 		if (nv50_vers(mast) < GF110_DISP_CORE_CHANNEL_DMA) {
 			evo_mthd(push, 0x0880 + (nv_crtc->index * 0x400), 2);
 			evo_data(push, 0x85000000);
-			evo_data(push, nv_crtc->cursor.nvbo->bo.offset >> 8);
+			evo_data(push, curs->image->bo.offset >> 8);
 			evo_mthd(push, 0x089c + (nv_crtc->index * 0x400), 1);
 			evo_data(push, mast->base.vram.handle);
 		} else {
 			evo_mthd(push, 0x0480 + (nv_crtc->index * 0x300), 2);
 			evo_data(push, 0x85000000);
-			evo_data(push, nv_crtc->cursor.nvbo->bo.offset >> 8);
+			evo_data(push, curs->image->bo.offset >> 8);
 			evo_mthd(push, 0x048c + (nv_crtc->index * 0x300), 1);
 			evo_data(push, mast->base.vram.handle);
 		}
@@ -940,8 +971,9 @@ static void
 nv50_crtc_cursor_show_hide(struct nouveau_crtc *nv_crtc, bool show, bool update)
 {
 	struct nv50_mast *mast = nv50_mast(nv_crtc->base.dev);
+	struct nv50_curs *curs = nv50_curs(&nv_crtc->base);
 
-	if (show)
+	if (show && curs->image)
 		nv50_crtc_cursor_show(nv_crtc);
 	else
 		nv50_crtc_cursor_hide(nv_crtc);
@@ -1041,7 +1073,7 @@ nv50_crtc_commit(struct drm_crtc *crtc)
 		evo_kick(push, mast);
 	}
 
-	nv50_crtc_cursor_show_hide(nv_crtc, nv_crtc->cursor.visible, true);
+	nv50_crtc_cursor_show_hide(nv_crtc, true, true);
 	nv50_display_flip_next(crtc, crtc->primary->fb, NULL, 1);
 }
 
@@ -1060,7 +1092,7 @@ nv50_crtc_swap_fbs(struct drm_crtc *crtc, struct drm_framebuffer *old_fb)
 	struct nv50_head *head = nv50_head(crtc);
 	int ret;
 
-	ret = nouveau_bo_pin(nvfb->nvbo, TTM_PL_FLAG_VRAM);
+	ret = nouveau_bo_pin(nvfb->nvbo, TTM_PL_FLAG_VRAM, true);
 	if (ret == 0) {
 		if (head->image)
 			nouveau_bo_unpin(head->image);
@@ -1241,13 +1273,13 @@ nv50_crtc_cursor_set(struct drm_crtc *crtc, struct drm_file *file_priv,
 		     uint32_t handle, uint32_t width, uint32_t height)
 {
 	struct nouveau_crtc *nv_crtc = nouveau_crtc(crtc);
+	struct nv50_curs *curs = nv50_curs(crtc);
 	struct drm_device *dev = crtc->dev;
-	struct drm_gem_object *gem;
-	struct nouveau_bo *nvbo;
-	bool visible = (handle != 0);
-	int i, ret = 0;
+	struct drm_gem_object *gem = NULL;
+	struct nouveau_bo *nvbo = NULL;
+	int ret = 0;
 
-	if (visible) {
+	if (handle) {
 		if (width != 64 || height != 64)
 			return -EINVAL;
 
@@ -1256,23 +1288,17 @@ nv50_crtc_cursor_set(struct drm_crtc *crtc, struct drm_file *file_priv,
 			return -ENOENT;
 		nvbo = nouveau_gem_object(gem);
 
-		ret = nouveau_bo_map(nvbo);
-		if (ret == 0) {
-			for (i = 0; i < 64 * 64; i++) {
-				u32 v = nouveau_bo_rd32(nvbo, i);
-				nouveau_bo_wr32(nv_crtc->cursor.nvbo, i, v);
-			}
-			nouveau_bo_unmap(nvbo);
-		}
-
-		drm_gem_object_unreference_unlocked(gem);
+		ret = nouveau_bo_pin(nvbo, TTM_PL_FLAG_VRAM, true);
 	}
 
-	if (visible != nv_crtc->cursor.visible) {
-		nv50_crtc_cursor_show_hide(nv_crtc, visible, true);
-		nv_crtc->cursor.visible = visible;
+	if (ret == 0) {
+		if (curs->image)
+			nouveau_bo_unpin(curs->image);
+		nouveau_bo_ref(nvbo, &curs->image);
 	}
+	drm_gem_object_unreference_unlocked(gem);
 
+	nv50_crtc_cursor_show_hide(nv_crtc, true, true);
 	return ret;
 }
 
@@ -1327,10 +1353,10 @@ nv50_crtc_destroy(struct drm_crtc *crtc)
 		nouveau_bo_unpin(head->image);
 	nouveau_bo_ref(NULL, &head->image);
 
-	nouveau_bo_unmap(nv_crtc->cursor.nvbo);
-	if (nv_crtc->cursor.nvbo)
-		nouveau_bo_unpin(nv_crtc->cursor.nvbo);
-	nouveau_bo_ref(NULL, &nv_crtc->cursor.nvbo);
+	/*XXX: ditto */
+	if (head->curs.image)
+		nouveau_bo_unpin(head->curs.image);
+	nouveau_bo_ref(NULL, &head->curs.image);
 
 	nouveau_bo_unmap(nv_crtc->lut.nvbo);
 	if (nv_crtc->lut.nvbo)
@@ -1362,16 +1388,6 @@ static const struct drm_crtc_funcs nv50_crtc_func = {
 	.page_flip = nouveau_crtc_page_flip,
 };
 
-static void
-nv50_cursor_set_pos(struct nouveau_crtc *nv_crtc, int x, int y)
-{
-}
-
-static void
-nv50_cursor_set_offset(struct nouveau_crtc *nv_crtc, uint32_t offset)
-{
-}
-
 static int
 nv50_crtc_create(struct drm_device *dev, int index)
 {
@@ -1390,8 +1406,6 @@ nv50_crtc_create(struct drm_device *dev, int index)
 	head->base.set_color_vibrance = nv50_crtc_set_color_vibrance;
 	head->base.color_vibrance = 50;
 	head->base.vibrant_hue = 0;
-	head->base.cursor.set_offset = nv50_cursor_set_offset;
-	head->base.cursor.set_pos = nv50_cursor_set_pos;
 	for (i = 0; i < 256; i++) {
 		head->base.lut.r[i] = i << 8;
 		head->base.lut.g[i] = i << 8;
@@ -1406,7 +1420,7 @@ nv50_crtc_create(struct drm_device *dev, int index)
 	ret = nouveau_bo_new(dev, 8192, 0x100, TTM_PL_FLAG_VRAM,
 			     0, 0x0000, NULL, NULL, &head->base.lut.nvbo);
 	if (!ret) {
-		ret = nouveau_bo_pin(head->base.lut.nvbo, TTM_PL_FLAG_VRAM);
+		ret = nouveau_bo_pin(head->base.lut.nvbo, TTM_PL_FLAG_VRAM, true);
 		if (!ret) {
 			ret = nouveau_bo_map(head->base.lut.nvbo);
 			if (ret)
@@ -1426,22 +1440,6 @@ nv50_crtc_create(struct drm_device *dev, int index)
 	if (ret)
 		goto out;
 
-	ret = nouveau_bo_new(dev, 64 * 64 * 4, 0x100, TTM_PL_FLAG_VRAM,
-			     0, 0x0000, NULL, NULL, &head->base.cursor.nvbo);
-	if (!ret) {
-		ret = nouveau_bo_pin(head->base.cursor.nvbo, TTM_PL_FLAG_VRAM);
-		if (!ret) {
-			ret = nouveau_bo_map(head->base.cursor.nvbo);
-			if (ret)
-				nouveau_bo_unpin(head->base.lut.nvbo);
-		}
-		if (ret)
-			nouveau_bo_ref(NULL, &head->base.cursor.nvbo);
-	}
-
-	if (ret)
-		goto out;
-
 	/* allocate page flip / sync resources */
 	ret = nv50_base_create(disp->disp, index, disp->sync->bo.offset,
 			      &head->sync);
@@ -1701,7 +1699,8 @@ nv50_audio_mode_set(struct drm_encoder *encoder, struct drm_display_mode *mode)
 	drm_edid_to_eld(&nv_connector->base, nv_connector->edid);
 	memcpy(args.data, nv_connector->base.eld, sizeof(args.data));
 
-	nvif_mthd(disp->disp, 0, &args, sizeof(args.base) + args.data[2] * 4);
+	nvif_mthd(disp->disp, 0, &args,
+		  sizeof(args.base) + drm_eld_size(args.data));
 }
 
 static void
@@ -2373,11 +2372,6 @@ nv50_fb_ctor(struct drm_framebuffer *fb)
 	u8 kind = nouveau_bo_tile_layout(nvbo) >> 8;
 	u8 tile = nvbo->tile_mode;
 
-	if (nvbo->tile_flags & NOUVEAU_GEM_TILE_NONCONTIG) {
-		NV_ERROR(drm, "framebuffer requires contiguous bo\n");
-		return -EINVAL;
-	}
-
 	if (drm->device.info.chipset >= 0xc0)
 		tile >>= 4; /* yep.. */
 
@@ -2491,7 +2485,7 @@ nv50_display_create(struct drm_device *dev)
 	ret = nouveau_bo_new(dev, 4096, 0x1000, TTM_PL_FLAG_VRAM,
 			     0, 0x0000, NULL, NULL, &disp->sync);
 	if (!ret) {
-		ret = nouveau_bo_pin(disp->sync, TTM_PL_FLAG_VRAM);
+		ret = nouveau_bo_pin(disp->sync, TTM_PL_FLAG_VRAM, true);
 		if (!ret) {
 			ret = nouveau_bo_map(disp->sync);
 			if (ret)
diff --git a/drivers/gpu/drm/nouveau/nv50_fence.c b/drivers/gpu/drm/nouveau/nv50_fence.c
index 22d242b37962..a82d9ea7c6fd 100644
--- a/drivers/gpu/drm/nouveau/nv50_fence.c
+++ b/drivers/gpu/drm/nouveau/nv50_fence.c
@@ -102,7 +102,7 @@ nv50_fence_create(struct nouveau_drm *drm)
 	ret = nouveau_bo_new(drm->dev, 4096, 0x1000, TTM_PL_FLAG_VRAM,
 			     0, 0x0000, NULL, NULL, &priv->bo);
 	if (!ret) {
-		ret = nouveau_bo_pin(priv->bo, TTM_PL_FLAG_VRAM);
+		ret = nouveau_bo_pin(priv->bo, TTM_PL_FLAG_VRAM, false);
 		if (!ret) {
 			ret = nouveau_bo_map(priv->bo);
 			if (ret)
diff --git a/drivers/gpu/drm/nouveau/nv84_fence.c b/drivers/gpu/drm/nouveau/nv84_fence.c
index d6c6c87c3f07..cb5b88938d45 100644
--- a/drivers/gpu/drm/nouveau/nv84_fence.c
+++ b/drivers/gpu/drm/nouveau/nv84_fence.c
@@ -234,7 +234,7 @@ nv84_fence_create(struct nouveau_drm *drm)
 	ret = nouveau_bo_new(drm->dev, 16 * priv->base.contexts, 0,
 			     TTM_PL_FLAG_VRAM, 0, 0, NULL, NULL, &priv->bo);
 	if (ret == 0) {
-		ret = nouveau_bo_pin(priv->bo, TTM_PL_FLAG_VRAM);
+		ret = nouveau_bo_pin(priv->bo, TTM_PL_FLAG_VRAM, false);
 		if (ret == 0) {
 			ret = nouveau_bo_map(priv->bo);
 			if (ret)
@@ -246,10 +246,10 @@ nv84_fence_create(struct nouveau_drm *drm)
 
 	if (ret == 0)
 		ret = nouveau_bo_new(drm->dev, 16 * priv->base.contexts, 0,
-				     TTM_PL_FLAG_TT, 0, 0, NULL, NULL,
-				     &priv->bo_gart);
+				     TTM_PL_FLAG_TT | TTM_PL_FLAG_UNCACHED, 0,
+				     0, NULL, NULL, &priv->bo_gart);
 	if (ret == 0) {
-		ret = nouveau_bo_pin(priv->bo_gart, TTM_PL_FLAG_TT);
+		ret = nouveau_bo_pin(priv->bo_gart, TTM_PL_FLAG_TT, false);
 		if (ret == 0) {
 			ret = nouveau_bo_map(priv->bo_gart);
 			if (ret)
diff --git a/drivers/gpu/drm/nouveau/nvif/class.h b/drivers/gpu/drm/nouveau/nvif/class.h
index e5a27df0672b..4e308eacb27a 100644
--- a/drivers/gpu/drm/nouveau/nvif/class.h
+++ b/drivers/gpu/drm/nouveau/nvif/class.h
@@ -35,6 +35,7 @@
 #define GK104_DISP                                                   0x00009170
 #define GK110_DISP                                                   0x00009270
 #define GM107_DISP                                                   0x00009470
+#define GM204_DISP                                                   0x00009570
 
 #define NV50_DISP_CURSOR                                             0x0000507a
 #define G82_DISP_CURSOR                                              0x0000827a
@@ -65,6 +66,7 @@
 #define GK104_DISP_CORE_CHANNEL_DMA                                  0x0000917d
 #define GK110_DISP_CORE_CHANNEL_DMA                                  0x0000927d
 #define GM107_DISP_CORE_CHANNEL_DMA                                  0x0000947d
+#define GM204_DISP_CORE_CHANNEL_DMA                                  0x0000957d
 
 #define NV50_DISP_OVERLAY_CHANNEL_DMA                                0x0000507e
 #define G82_DISP_OVERLAY_CHANNEL_DMA                                 0x0000827e
@@ -131,6 +133,7 @@ struct nv_device_v0 {
 #define NV_DEVICE_V0_DISABLE_COPY1                        0x0000010000000000ULL
 #define NV_DEVICE_V0_DISABLE_VIC                          0x0000020000000000ULL
 #define NV_DEVICE_V0_DISABLE_VENC                         0x0000040000000000ULL
+#define NV_DEVICE_V0_DISABLE_COPY2                        0x0000080000000000ULL
 	__u64 disable;	/* disable particular subsystems */
 	__u64 debug0;	/* as above, but *internal* ids, and *NOT* ABI */
 };
diff --git a/drivers/gpu/drm/nouveau/nvif/client.c b/drivers/gpu/drm/nouveau/nvif/client.c
index 3c4df1fc26dc..3f7ac5bc8e03 100644
--- a/drivers/gpu/drm/nouveau/nvif/client.c
+++ b/drivers/gpu/drm/nouveau/nvif/client.c
@@ -62,6 +62,7 @@ nvif_drivers[] = {
 #else
 	&nvif_driver_drm,
 	&nvif_driver_lib,
+	&nvif_driver_null,
 #endif
 	NULL
 };
diff --git a/drivers/gpu/drm/nouveau/nvif/driver.h b/drivers/gpu/drm/nouveau/nvif/driver.h
index ac4bdb3ea506..8bd39e69229c 100644
--- a/drivers/gpu/drm/nouveau/nvif/driver.h
+++ b/drivers/gpu/drm/nouveau/nvif/driver.h
@@ -17,5 +17,6 @@ struct nvif_driver {
 extern const struct nvif_driver nvif_driver_nvkm;
 extern const struct nvif_driver nvif_driver_drm;
 extern const struct nvif_driver nvif_driver_lib;
+extern const struct nvif_driver nvif_driver_null;
 
 #endif
diff --git a/drivers/gpu/drm/omapdrm/omap_crtc.c b/drivers/gpu/drm/omapdrm/omap_crtc.c
index 2d28dc337cfb..b0566a1ca28f 100644
--- a/drivers/gpu/drm/omapdrm/omap_crtc.c
+++ b/drivers/gpu/drm/omapdrm/omap_crtc.c
@@ -20,6 +20,7 @@
 #include "omap_drv.h"
 
 #include <drm/drm_mode.h>
+#include <drm/drm_plane_helper.h>
 #include "drm_crtc.h"
 #include "drm_crtc_helper.h"
 
diff --git a/drivers/gpu/drm/omapdrm/omap_gem.c b/drivers/gpu/drm/omapdrm/omap_gem.c
index e4849413ee80..aeb91ed653c9 100644
--- a/drivers/gpu/drm/omapdrm/omap_gem.c
+++ b/drivers/gpu/drm/omapdrm/omap_gem.c
@@ -612,8 +612,7 @@ int omap_gem_dumb_create(struct drm_file *file, struct drm_device *dev,
 {
 	union omap_gem_size gsize;
 
-	/* in case someone tries to feed us a completely bogus stride: */
-	args->pitch = align_pitch(args->pitch, args->width, args->bpp);
+	args->pitch = align_pitch(0, args->width, args->bpp);
 	args->size = PAGE_ALIGN(args->pitch * args->height);
 
 	gsize = (union omap_gem_size){
diff --git a/drivers/gpu/drm/omapdrm/omap_plane.c b/drivers/gpu/drm/omapdrm/omap_plane.c
index 891a4dc608af..ee8e2b3a117e 100644
--- a/drivers/gpu/drm/omapdrm/omap_plane.c
+++ b/drivers/gpu/drm/omapdrm/omap_plane.c
@@ -388,20 +388,15 @@ struct drm_plane *omap_plane_init(struct drm_device *dev,
 	struct drm_plane *plane = NULL;
 	struct omap_plane *omap_plane;
 	struct omap_overlay_info *info;
-	int ret;
 
 	DBG("%s: priv=%d", plane_names[id], private_plane);
 
 	omap_plane = kzalloc(sizeof(*omap_plane), GFP_KERNEL);
 	if (!omap_plane)
-		goto fail;
+		return NULL;
 
-	ret = drm_flip_work_init(&omap_plane->unpin_work, 16,
+	drm_flip_work_init(&omap_plane->unpin_work,
 			"unpin", unpin_worker);
-	if (ret) {
-		dev_err(dev->dev, "could not allocate unpin FIFO\n");
-		goto fail;
-	}
 
 	omap_plane->nformats = omap_framebuffer_get_formats(
 			omap_plane->formats, ARRAY_SIZE(omap_plane->formats),
@@ -443,10 +438,4 @@ struct drm_plane *omap_plane_init(struct drm_device *dev,
 		omap_plane->info.zorder = id;
 
 	return plane;
-
-fail:
-	if (plane)
-		omap_plane_destroy(plane);
-
-	return NULL;
 }
diff --git a/drivers/gpu/drm/panel/Kconfig b/drivers/gpu/drm/panel/Kconfig
index bee9f72b3a93..024e98ef8e4d 100644
--- a/drivers/gpu/drm/panel/Kconfig
+++ b/drivers/gpu/drm/panel/Kconfig
@@ -27,4 +27,17 @@ config DRM_PANEL_S6E8AA0
 	select DRM_MIPI_DSI
 	select VIDEOMODE_HELPERS
 
+config DRM_PANEL_SHARP_LQ101R1SX01
+	tristate "Sharp LQ101R1SX01 panel"
+	depends on OF
+	depends on DRM_MIPI_DSI
+	help
+	  Say Y here if you want to enable support for Sharp LQ101R1SX01
+	  TFT-LCD modules. The panel has a 2560x1600 resolution and uses
+	  24 bit RGB per pixel. It provides a dual MIPI DSI interface to
+	  the host and has a built-in LED backlight.
+
+	  To compile this driver as a module, choose M here: the module
+	  will be called panel-sharp-lq101r1sx01.
+
 endmenu
diff --git a/drivers/gpu/drm/panel/Makefile b/drivers/gpu/drm/panel/Makefile
index 8b929212fad7..4b2a0430804b 100644
--- a/drivers/gpu/drm/panel/Makefile
+++ b/drivers/gpu/drm/panel/Makefile
@@ -1,3 +1,4 @@
 obj-$(CONFIG_DRM_PANEL_SIMPLE) += panel-simple.o
 obj-$(CONFIG_DRM_PANEL_LD9040) += panel-ld9040.o
 obj-$(CONFIG_DRM_PANEL_S6E8AA0) += panel-s6e8aa0.o
+obj-$(CONFIG_DRM_PANEL_SHARP_LQ101R1SX01) += panel-sharp-lq101r1sx01.o
diff --git a/drivers/gpu/drm/panel/panel-ld9040.c b/drivers/gpu/drm/panel/panel-ld9040.c
index 42ac67b21e9f..08cf2c588c3d 100644
--- a/drivers/gpu/drm/panel/panel-ld9040.c
+++ b/drivers/gpu/drm/panel/panel-ld9040.c
@@ -145,7 +145,7 @@ static void ld9040_dcs_write(struct ld9040 *ctx, const u8 *data, size_t len)
 	if (ctx->error < 0 || len == 0)
 		return;
 
-	dev_dbg(ctx->dev, "writing dcs seq: %*ph\n", len, data);
+	dev_dbg(ctx->dev, "writing dcs seq: %*ph\n", (int)len, data);
 	ret = ld9040_spi_write_word(ctx, *data);
 
 	while (!ret && --len) {
@@ -154,8 +154,8 @@ static void ld9040_dcs_write(struct ld9040 *ctx, const u8 *data, size_t len)
 	}
 
 	if (ret) {
-		dev_err(ctx->dev, "error %d writing dcs seq: %*ph\n", ret, len,
-			data);
+		dev_err(ctx->dev, "error %d writing dcs seq: %*ph\n", ret,
+			(int)len, data);
 		ctx->error = ret;
 	}
 
@@ -336,17 +336,12 @@ static int ld9040_probe(struct spi_device *spi)
 	if (ret < 0)
 		return ret;
 
-	ctx->reset_gpio = devm_gpiod_get(dev, "reset");
+	ctx->reset_gpio = devm_gpiod_get(dev, "reset", GPIOD_OUT_HIGH);
 	if (IS_ERR(ctx->reset_gpio)) {
 		dev_err(dev, "cannot get reset-gpios %ld\n",
 			PTR_ERR(ctx->reset_gpio));
 		return PTR_ERR(ctx->reset_gpio);
 	}
-	ret = gpiod_direction_output(ctx->reset_gpio, 1);
-	if (ret < 0) {
-		dev_err(dev, "cannot configure reset-gpios %d\n", ret);
-		return ret;
-	}
 
 	spi->bits_per_word = 9;
 	ret = spi_setup(spi);
diff --git a/drivers/gpu/drm/panel/panel-s6e8aa0.c b/drivers/gpu/drm/panel/panel-s6e8aa0.c
index b5217fe37f02..144b2733e3d7 100644
--- a/drivers/gpu/drm/panel/panel-s6e8aa0.c
+++ b/drivers/gpu/drm/panel/panel-s6e8aa0.c
@@ -141,10 +141,10 @@ static void s6e8aa0_dcs_write(struct s6e8aa0 *ctx, const void *data, size_t len)
 	if (ctx->error < 0)
 		return;
 
-	ret = mipi_dsi_dcs_write(dsi, data, len);
+	ret = mipi_dsi_dcs_write_buffer(dsi, data, len);
 	if (ret < 0) {
-		dev_err(ctx->dev, "error %zd writing dcs seq: %*ph\n", ret, len,
-			data);
+		dev_err(ctx->dev, "error %zd writing dcs seq: %*ph\n", ret,
+			(int)len, data);
 		ctx->error = ret;
 	}
 }
@@ -800,27 +800,15 @@ static void s6e8aa0_panel_init(struct s6e8aa0 *ctx)
 }
 
 static void s6e8aa0_set_maximum_return_packet_size(struct s6e8aa0 *ctx,
-						   int size)
+						   u16 size)
 {
 	struct mipi_dsi_device *dsi = to_mipi_dsi_device(ctx->dev);
-	const struct mipi_dsi_host_ops *ops = dsi->host->ops;
-	u8 buf[] = {size, 0};
-	struct mipi_dsi_msg msg = {
-		.channel = dsi->channel,
-		.type = MIPI_DSI_SET_MAXIMUM_RETURN_PACKET_SIZE,
-		.tx_len = sizeof(buf),
-		.tx_buf = buf
-	};
 	int ret;
 
 	if (ctx->error < 0)
 		return;
 
-	if (!ops || !ops->transfer)
-		ret = -EIO;
-	else
-		ret = ops->transfer(dsi->host, &msg);
-
+	ret = mipi_dsi_set_maximum_return_packet_size(dsi, size);
 	if (ret < 0) {
 		dev_err(ctx->dev,
 			"error %d setting maximum return packet size to %d\n",
@@ -1019,17 +1007,12 @@ static int s6e8aa0_probe(struct mipi_dsi_device *dsi)
 		return ret;
 	}
 
-	ctx->reset_gpio = devm_gpiod_get(dev, "reset");
+	ctx->reset_gpio = devm_gpiod_get(dev, "reset", GPIOD_OUT_HIGH);
 	if (IS_ERR(ctx->reset_gpio)) {
 		dev_err(dev, "cannot get reset-gpios %ld\n",
 			PTR_ERR(ctx->reset_gpio));
 		return PTR_ERR(ctx->reset_gpio);
 	}
-	ret = gpiod_direction_output(ctx->reset_gpio, 1);
-	if (ret < 0) {
-		dev_err(dev, "cannot configure reset-gpios %d\n", ret);
-		return ret;
-	}
 
 	ctx->brightness = GAMMA_LEVEL_NUM - 1;
 
@@ -1069,7 +1052,6 @@ static struct mipi_dsi_driver s6e8aa0_driver = {
 	.remove = s6e8aa0_remove,
 	.driver = {
 		.name = "panel_s6e8aa0",
-		.owner = THIS_MODULE,
 		.of_match_table = s6e8aa0_of_match,
 	},
 };
diff --git a/drivers/gpu/drm/panel/panel-sharp-lq101r1sx01.c b/drivers/gpu/drm/panel/panel-sharp-lq101r1sx01.c
new file mode 100644
index 000000000000..9d81759d82fc
--- /dev/null
+++ b/drivers/gpu/drm/panel/panel-sharp-lq101r1sx01.c
@@ -0,0 +1,464 @@
+/*
+ * Copyright (C) 2014 NVIDIA Corporation
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+
+#include <linux/backlight.h>
+#include <linux/gpio/consumer.h>
+#include <linux/module.h>
+#include <linux/of.h>
+#include <linux/regulator/consumer.h>
+
+#include <drm/drmP.h>
+#include <drm/drm_crtc.h>
+#include <drm/drm_mipi_dsi.h>
+#include <drm/drm_panel.h>
+
+#include <video/mipi_display.h>
+
+#include <linux/host1x.h>
+
+struct sharp_panel {
+	struct drm_panel base;
+	/* the datasheet refers to them as DSI-LINK1 and DSI-LINK2 */
+	struct mipi_dsi_device *link1;
+	struct mipi_dsi_device *link2;
+
+	struct backlight_device *backlight;
+	struct regulator *supply;
+
+	bool prepared;
+	bool enabled;
+
+	const struct drm_display_mode *mode;
+};
+
+static inline struct sharp_panel *to_sharp_panel(struct drm_panel *panel)
+{
+	return container_of(panel, struct sharp_panel, base);
+}
+
+static int sharp_panel_write(struct sharp_panel *sharp, u16 offset, u8 value)
+{
+	u8 payload[3] = { offset >> 8, offset & 0xff, value };
+	struct mipi_dsi_device *dsi = sharp->link1;
+	ssize_t err;
+
+	err = mipi_dsi_generic_write(dsi, payload, sizeof(payload));
+	if (err < 0) {
+		dev_err(&dsi->dev, "failed to write %02x to %04x: %zd\n",
+			value, offset, err);
+		return err;
+	}
+
+	err = mipi_dsi_dcs_nop(dsi);
+	if (err < 0) {
+		dev_err(&dsi->dev, "failed to send DCS nop: %zd\n", err);
+		return err;
+	}
+
+	usleep_range(10, 20);
+
+	return 0;
+}
+
+static __maybe_unused int sharp_panel_read(struct sharp_panel *sharp,
+					   u16 offset, u8 *value)
+{
+	ssize_t err;
+
+	cpu_to_be16s(&offset);
+
+	err = mipi_dsi_generic_read(sharp->link1, &offset, sizeof(offset),
+				    value, sizeof(*value));
+	if (err < 0)
+		dev_err(&sharp->link1->dev, "failed to read from %04x: %zd\n",
+			offset, err);
+
+	return err;
+}
+
+static int sharp_panel_disable(struct drm_panel *panel)
+{
+	struct sharp_panel *sharp = to_sharp_panel(panel);
+
+	if (!sharp->enabled)
+		return 0;
+
+	if (sharp->backlight) {
+		sharp->backlight->props.power = FB_BLANK_POWERDOWN;
+		backlight_update_status(sharp->backlight);
+	}
+
+	sharp->enabled = false;
+
+	return 0;
+}
+
+static int sharp_panel_unprepare(struct drm_panel *panel)
+{
+	struct sharp_panel *sharp = to_sharp_panel(panel);
+	int err;
+
+	if (!sharp->prepared)
+		return 0;
+
+	err = mipi_dsi_dcs_set_display_off(sharp->link1);
+	if (err < 0)
+		dev_err(panel->dev, "failed to set display off: %d\n", err);
+
+	err = mipi_dsi_dcs_enter_sleep_mode(sharp->link1);
+	if (err < 0)
+		dev_err(panel->dev, "failed to enter sleep mode: %d\n", err);
+
+	msleep(120);
+
+	regulator_disable(sharp->supply);
+
+	sharp->prepared = false;
+
+	return 0;
+}
+
+static int sharp_setup_symmetrical_split(struct mipi_dsi_device *left,
+					 struct mipi_dsi_device *right,
+					 const struct drm_display_mode *mode)
+{
+	int err;
+
+	err = mipi_dsi_dcs_set_column_address(left, 0, mode->hdisplay / 2 - 1);
+	if (err < 0) {
+		dev_err(&left->dev, "failed to set column address: %d\n", err);
+		return err;
+	}
+
+	err = mipi_dsi_dcs_set_page_address(left, 0, mode->vdisplay - 1);
+	if (err < 0) {
+		dev_err(&left->dev, "failed to set page address: %d\n", err);
+		return err;
+	}
+
+	err = mipi_dsi_dcs_set_column_address(right, mode->hdisplay / 2,
+					      mode->hdisplay - 1);
+	if (err < 0) {
+		dev_err(&right->dev, "failed to set column address: %d\n", err);
+		return err;
+	}
+
+	err = mipi_dsi_dcs_set_page_address(right, 0, mode->vdisplay - 1);
+	if (err < 0) {
+		dev_err(&right->dev, "failed to set page address: %d\n", err);
+		return err;
+	}
+
+	return 0;
+}
+
+static int sharp_panel_prepare(struct drm_panel *panel)
+{
+	struct sharp_panel *sharp = to_sharp_panel(panel);
+	u8 format = MIPI_DCS_PIXEL_FMT_24BIT;
+	int err;
+
+	if (sharp->prepared)
+		return 0;
+
+	err = regulator_enable(sharp->supply);
+	if (err < 0)
+		return err;
+
+	usleep_range(10000, 20000);
+
+	err = mipi_dsi_dcs_soft_reset(sharp->link1);
+	if (err < 0) {
+		dev_err(panel->dev, "soft reset failed: %d\n", err);
+		goto poweroff;
+	}
+
+	msleep(120);
+
+	err = mipi_dsi_dcs_exit_sleep_mode(sharp->link1);
+	if (err < 0) {
+		dev_err(panel->dev, "failed to exit sleep mode: %d\n", err);
+		goto poweroff;
+	}
+
+	/*
+	 * The MIPI DCS specification mandates this delay only between the
+	 * exit_sleep_mode and enter_sleep_mode commands, so it isn't strictly
+	 * necessary here.
+	 */
+	/*
+	msleep(120);
+	*/
+
+	/* set left-right mode */
+	err = sharp_panel_write(sharp, 0x1000, 0x2a);
+	if (err < 0) {
+		dev_err(panel->dev, "failed to set left-right mode: %d\n", err);
+		goto poweroff;
+	}
+
+	/* enable command mode */
+	err = sharp_panel_write(sharp, 0x1001, 0x01);
+	if (err < 0) {
+		dev_err(panel->dev, "failed to enable command mode: %d\n", err);
+		goto poweroff;
+	}
+
+	err = mipi_dsi_dcs_set_pixel_format(sharp->link1, format);
+	if (err < 0) {
+		dev_err(panel->dev, "failed to set pixel format: %d\n", err);
+		goto poweroff;
+	}
+
+	/*
+	 * TODO: The device supports both left-right and even-odd split
+	 * configurations, but this driver currently supports only the left-
+	 * right split. To support a different mode a mechanism needs to be
+	 * put in place to communicate the configuration back to the DSI host
+	 * controller.
+	 */
+	err = sharp_setup_symmetrical_split(sharp->link1, sharp->link2,
+					    sharp->mode);
+	if (err < 0) {
+		dev_err(panel->dev, "failed to set up symmetrical split: %d\n",
+			err);
+		goto poweroff;
+	}
+
+	err = mipi_dsi_dcs_set_display_on(sharp->link1);
+	if (err < 0) {
+		dev_err(panel->dev, "failed to set display on: %d\n", err);
+		goto poweroff;
+	}
+
+	sharp->prepared = true;
+
+	return 0;
+
+poweroff:
+	regulator_disable(sharp->supply);
+	return err;
+}
+
+static int sharp_panel_enable(struct drm_panel *panel)
+{
+	struct sharp_panel *sharp = to_sharp_panel(panel);
+
+	if (sharp->enabled)
+		return 0;
+
+	if (sharp->backlight) {
+		sharp->backlight->props.power = FB_BLANK_UNBLANK;
+		backlight_update_status(sharp->backlight);
+	}
+
+	sharp->enabled = true;
+
+	return 0;
+}
+
+static const struct drm_display_mode default_mode = {
+	.clock = 278000,
+	.hdisplay = 2560,
+	.hsync_start = 2560 + 128,
+	.hsync_end = 2560 + 128 + 64,
+	.htotal = 2560 + 128 + 64 + 64,
+	.vdisplay = 1600,
+	.vsync_start = 1600 + 4,
+	.vsync_end = 1600 + 4 + 8,
+	.vtotal = 1600 + 4 + 8 + 32,
+	.vrefresh = 60,
+};
+
+static int sharp_panel_get_modes(struct drm_panel *panel)
+{
+	struct drm_display_mode *mode;
+
+	mode = drm_mode_duplicate(panel->drm, &default_mode);
+	if (!mode) {
+		dev_err(panel->drm->dev, "failed to add mode %ux%u@%u\n",
+			default_mode.hdisplay, default_mode.vdisplay,
+			default_mode.vrefresh);
+		return -ENOMEM;
+	}
+
+	drm_mode_set_name(mode);
+
+	drm_mode_probed_add(panel->connector, mode);
+
+	panel->connector->display_info.width_mm = 217;
+	panel->connector->display_info.height_mm = 136;
+
+	return 1;
+}
+
+static const struct drm_panel_funcs sharp_panel_funcs = {
+	.disable = sharp_panel_disable,
+	.unprepare = sharp_panel_unprepare,
+	.prepare = sharp_panel_prepare,
+	.enable = sharp_panel_enable,
+	.get_modes = sharp_panel_get_modes,
+};
+
+static const struct of_device_id sharp_of_match[] = {
+	{ .compatible = "sharp,lq101r1sx01", },
+	{ }
+};
+MODULE_DEVICE_TABLE(of, sharp_of_match);
+
+static int sharp_panel_add(struct sharp_panel *sharp)
+{
+	struct device_node *np;
+	int err;
+
+	sharp->mode = &default_mode;
+
+	sharp->supply = devm_regulator_get(&sharp->link1->dev, "power");
+	if (IS_ERR(sharp->supply))
+		return PTR_ERR(sharp->supply);
+
+	np = of_parse_phandle(sharp->link1->dev.of_node, "backlight", 0);
+	if (np) {
+		sharp->backlight = of_find_backlight_by_node(np);
+		of_node_put(np);
+
+		if (!sharp->backlight)
+			return -EPROBE_DEFER;
+	}
+
+	drm_panel_init(&sharp->base);
+	sharp->base.funcs = &sharp_panel_funcs;
+	sharp->base.dev = &sharp->link1->dev;
+
+	err = drm_panel_add(&sharp->base);
+	if (err < 0)
+		goto put_backlight;
+
+	return 0;
+
+put_backlight:
+	if (sharp->backlight)
+		put_device(&sharp->backlight->dev);
+
+	return err;
+}
+
+static void sharp_panel_del(struct sharp_panel *sharp)
+{
+	if (sharp->base.dev)
+		drm_panel_remove(&sharp->base);
+
+	if (sharp->backlight)
+		put_device(&sharp->backlight->dev);
+
+	if (sharp->link2)
+		put_device(&sharp->link2->dev);
+}
+
+static int sharp_panel_probe(struct mipi_dsi_device *dsi)
+{
+	struct mipi_dsi_device *secondary = NULL;
+	struct sharp_panel *sharp;
+	struct device_node *np;
+	int err;
+
+	dsi->lanes = 4;
+	dsi->format = MIPI_DSI_FMT_RGB888;
+	dsi->mode_flags = MIPI_DSI_MODE_LPM;
+
+	/* Find DSI-LINK2 */
+	np = of_parse_phandle(dsi->dev.of_node, "link2", 0);
+	if (np) {
+		secondary = of_find_mipi_dsi_device_by_node(np);
+		of_node_put(np);
+
+		if (!secondary)
+			return -EPROBE_DEFER;
+	}
+
+	/* register a panel for only the DSI-LINK1 interface */
+	if (secondary) {
+		sharp = devm_kzalloc(&dsi->dev, sizeof(*sharp), GFP_KERNEL);
+		if (!sharp) {
+			put_device(&secondary->dev);
+			return -ENOMEM;
+		}
+
+		mipi_dsi_set_drvdata(dsi, sharp);
+
+		sharp->link2 = secondary;
+		sharp->link1 = dsi;
+
+		err = sharp_panel_add(sharp);
+		if (err < 0) {
+			put_device(&secondary->dev);
+			return err;
+		}
+	}
+
+	err = mipi_dsi_attach(dsi);
+	if (err < 0) {
+		if (secondary)
+			sharp_panel_del(sharp);
+
+		return err;
+	}
+
+	return 0;
+}
+
+static int sharp_panel_remove(struct mipi_dsi_device *dsi)
+{
+	struct sharp_panel *sharp = mipi_dsi_get_drvdata(dsi);
+	int err;
+
+	/* only detach from host for the DSI-LINK2 interface */
+	if (!sharp) {
+		mipi_dsi_detach(dsi);
+		return 0;
+	}
+
+	err = sharp_panel_disable(&sharp->base);
+	if (err < 0)
+		dev_err(&dsi->dev, "failed to disable panel: %d\n", err);
+
+	err = mipi_dsi_detach(dsi);
+	if (err < 0)
+		dev_err(&dsi->dev, "failed to detach from DSI host: %d\n", err);
+
+	drm_panel_detach(&sharp->base);
+	sharp_panel_del(sharp);
+
+	return 0;
+}
+
+static void sharp_panel_shutdown(struct mipi_dsi_device *dsi)
+{
+	struct sharp_panel *sharp = mipi_dsi_get_drvdata(dsi);
+
+	/* nothing to do for DSI-LINK2 */
+	if (!sharp)
+		return;
+
+	sharp_panel_disable(&sharp->base);
+}
+
+static struct mipi_dsi_driver sharp_panel_driver = {
+	.driver = {
+		.name = "panel-sharp-lq101r1sx01",
+		.of_match_table = sharp_of_match,
+	},
+	.probe = sharp_panel_probe,
+	.remove = sharp_panel_remove,
+	.shutdown = sharp_panel_shutdown,
+};
+module_mipi_dsi_driver(sharp_panel_driver);
+
+MODULE_AUTHOR("Thierry Reding <treding@nvidia.com>");
+MODULE_DESCRIPTION("Sharp LQ101R1SX01 panel driver");
+MODULE_LICENSE("GPL v2");
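
Note on the left-right split used by sharp_setup_symmetrical_split() above: each DSI link is given one half of the active columns, with both bounds inclusive. The following is a minimal standalone sketch of that arithmetic only, not part of the patch. The struct column_range type and the left_half()/right_half() helpers are hypothetical names introduced for illustration, and the sketch assumes hdisplay is even and non-zero.

```c
/* Hypothetical helper type: an inclusive range of panel columns. */
struct column_range {
	unsigned int start;
	unsigned int end;
};

/* Left link: columns 0 .. hdisplay / 2 - 1 (assumes hdisplay >= 2). */
struct column_range left_half(unsigned int hdisplay)
{
	struct column_range r = { 0, hdisplay / 2 - 1 };

	return r;
}

/* Right link: columns hdisplay / 2 .. hdisplay - 1. */
struct column_range right_half(unsigned int hdisplay)
{
	struct column_range r = { hdisplay / 2, hdisplay - 1 };

	return r;
}
```

Calling both helpers with the hdisplay value from default_mode (2560) yields two adjacent, non-overlapping ranges that together cover every active column.
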
diff --git a/drivers/gpu/drm/panel/panel-simple.c b/drivers/gpu/drm/panel/panel-simple.c
index 12bc8a0ab1cf..e95385bf8356 100644
--- a/drivers/gpu/drm/panel/panel-simple.c
+++ b/drivers/gpu/drm/panel/panel-simple.c
@@ -247,21 +247,14 @@ static int panel_simple_probe(struct device *dev, const struct panel_desc *desc)
 	if (IS_ERR(panel->supply))
 		return PTR_ERR(panel->supply);
 
-	panel->enable_gpio = devm_gpiod_get_optional(dev, "enable");
+	panel->enable_gpio = devm_gpiod_get_optional(dev, "enable",
+						     GPIOD_OUT_LOW);
 	if (IS_ERR(panel->enable_gpio)) {
 		err = PTR_ERR(panel->enable_gpio);
 		dev_err(dev, "failed to request GPIO: %d\n", err);
 		return err;
 	}
 
-	if (panel->enable_gpio) {
-		err = gpiod_direction_output(panel->enable_gpio, 0);
-		if (err < 0) {
-			dev_err(dev, "failed to setup GPIO: %d\n", err);
-			return err;
-		}
-	}
-
 	backlight = of_parse_phandle(dev->of_node, "backlight", 0);
 	if (backlight) {
 		panel->backlight = of_find_backlight_by_node(backlight);
@@ -376,6 +369,29 @@ static const struct panel_desc auo_b101xtn01 = {
 	},
 };
 
+static const struct drm_display_mode auo_b116xw03_mode = {
+	.clock = 70589,
+	.hdisplay = 1366,
+	.hsync_start = 1366 + 40,
+	.hsync_end = 1366 + 40 + 40,
+	.htotal = 1366 + 40 + 40 + 32,
+	.vdisplay = 768,
+	.vsync_start = 768 + 10,
+	.vsync_end = 768 + 10 + 12,
+	.vtotal = 768 + 10 + 12 + 6,
+	.vrefresh = 60,
+};
+
+static const struct panel_desc auo_b116xw03 = {
+	.modes = &auo_b116xw03_mode,
+	.num_modes = 1,
+	.bpc = 6,
+	.size = {
+		.width = 256,
+		.height = 144,
+	},
+};
+
 static const struct drm_display_mode auo_b133xtn01_mode = {
 	.clock = 69500,
 	.hdisplay = 1366,
@@ -415,6 +431,7 @@ static const struct drm_display_mode auo_b133htn01_mode = {
 static const struct panel_desc auo_b133htn01 = {
 	.modes = &auo_b133htn01_mode,
 	.num_modes = 1,
+	.bpc = 6,
 	.size = {
 		.width = 293,
 		.height = 165,
@@ -536,22 +553,92 @@ static const struct drm_display_mode foxlink_fl500wvr00_a0t_mode = {
 static const struct panel_desc foxlink_fl500wvr00_a0t = {
 	.modes = &foxlink_fl500wvr00_a0t_mode,
 	.num_modes = 1,
+	.bpc = 8,
 	.size = {
 		.width = 108,
 		.height = 65,
 	},
 };
 
-static const struct drm_display_mode innolux_n116bge_mode = {
+static const struct drm_display_mode hannstar_hsd070pww1_mode = {
+	.clock = 71100,
+	.hdisplay = 1280,
+	.hsync_start = 1280 + 1,
+	.hsync_end = 1280 + 1 + 158,
+	.htotal = 1280 + 1 + 158 + 1,
+	.vdisplay = 800,
+	.vsync_start = 800 + 1,
+	.vsync_end = 800 + 1 + 21,
+	.vtotal = 800 + 1 + 21 + 1,
+	.vrefresh = 60,
+};
+
+static const struct panel_desc hannstar_hsd070pww1 = {
+	.modes = &hannstar_hsd070pww1_mode,
+	.num_modes = 1,
+	.bpc = 6,
+	.size = {
+		.width = 151,
+		.height = 94,
+	},
+};
+
+static const struct drm_display_mode hitachi_tx23d38vm0caa_mode = {
+	.clock = 33333,
+	.hdisplay = 800,
+	.hsync_start = 800 + 85,
+	.hsync_end = 800 + 85 + 86,
+	.htotal = 800 + 85 + 86 + 85,
+	.vdisplay = 480,
+	.vsync_start = 480 + 16,
+	.vsync_end = 480 + 16 + 13,
+	.vtotal = 480 + 16 + 13 + 16,
+	.vrefresh = 60,
+};
+
+static const struct panel_desc hitachi_tx23d38vm0caa = {
+	.modes = &hitachi_tx23d38vm0caa_mode,
+	.num_modes = 1,
+	.bpc = 6,
+	.size = {
+		.width = 195,
+		.height = 117,
+	},
+};
+
+static const struct drm_display_mode innolux_g121i1_l01_mode = {
 	.clock = 71000,
+	.hdisplay = 1280,
+	.hsync_start = 1280 + 64,
+	.hsync_end = 1280 + 64 + 32,
+	.htotal = 1280 + 64 + 32 + 64,
+	.vdisplay = 800,
+	.vsync_start = 800 + 9,
+	.vsync_end = 800 + 9 + 6,
+	.vtotal = 800 + 9 + 6 + 9,
+	.vrefresh = 60,
+};
+
+static const struct panel_desc innolux_g121i1_l01 = {
+	.modes = &innolux_g121i1_l01_mode,
+	.num_modes = 1,
+	.bpc = 6,
+	.size = {
+		.width = 261,
+		.height = 163,
+	},
+};
+
+static const struct drm_display_mode innolux_n116bge_mode = {
+	.clock = 76420,
 	.hdisplay = 1366,
-	.hsync_start = 1366 + 64,
-	.hsync_end = 1366 + 64 + 6,
-	.htotal = 1366 + 64 + 6 + 64,
+	.hsync_start = 1366 + 136,
+	.hsync_end = 1366 + 136 + 30,
+	.htotal = 1366 + 136 + 30 + 60,
 	.vdisplay = 768,
 	.vsync_start = 768 + 8,
-	.vsync_end = 768 + 8 + 4,
-	.vtotal = 768 + 8 + 4 + 8,
+	.vsync_end = 768 + 8 + 12,
+	.vtotal = 768 + 8 + 12 + 12,
 	.vrefresh = 60,
 	.flags = DRM_MODE_FLAG_NHSYNC | DRM_MODE_FLAG_NVSYNC,
 };
@@ -643,6 +730,9 @@ static const struct of_device_id platform_of_match[] = {
 		.compatible = "auo,b101xtn01",
 		.data = &auo_b101xtn01,
 	}, {
+		.compatible = "auo,b116xw03",
+		.data = &auo_b116xw03,
+	}, {
 		.compatible = "auo,b133htn01",
 		.data = &auo_b133htn01,
 	}, {
@@ -667,6 +757,15 @@ static const struct of_device_id platform_of_match[] = {
 		.compatible = "foxlink,fl500wvr00-a0t",
 		.data = &foxlink_fl500wvr00_a0t,
 	}, {
+		.compatible = "hannstar,hsd070pww1",
+		.data = &hannstar_hsd070pww1,
+	}, {
+		.compatible = "hit,tx23d38vm0caa",
+		.data = &hitachi_tx23d38vm0caa
+	}, {
+		.compatible = "innolux,g121i1-l01",
+		.data = &innolux_g121i1_l01
+	}, {
 		.compatible = "innolux,n116bge",
 		.data = &innolux_n116bge,
 	}, {
@@ -740,6 +839,7 @@ static const struct panel_desc_dsi lg_ld070wx3_sl01 = {
 	.desc = {
 		.modes = &lg_ld070wx3_sl01_mode,
 		.num_modes = 1,
+		.bpc = 8,
 		.size = {
 			.width = 94,
 			.height = 151,
@@ -767,6 +867,7 @@ static const struct panel_desc_dsi lg_lh500wx1_sd03 = {
 	.desc = {
 		.modes = &lg_lh500wx1_sd03_mode,
 		.num_modes = 1,
+		.bpc = 8,
 		.size = {
 			.width = 62,
 			.height = 110,
@@ -794,6 +895,7 @@ static const struct panel_desc_dsi panasonic_vvx10f004b00 = {
 	.desc = {
 		.modes = &panasonic_vvx10f004b00_mode,
 		.num_modes = 1,
+		.bpc = 8,
 		.size = {
 			.width = 217,
 			.height = 136,
@@ -863,7 +965,6 @@ static void panel_simple_dsi_shutdown(struct mipi_dsi_device *dsi)
 static struct mipi_dsi_driver panel_simple_dsi_driver = {
 	.driver = {
 		.name = "panel-simple-dsi",
-		.owner = THIS_MODULE,
 		.of_match_table = dsi_of_match,
 	},
 	.probe = panel_simple_dsi_probe,
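
Review note on the 1366x768 timing change earlier in this file's diff: the mode fields are written as running sums (active pixels, then front porch, then sync width, then back porch), so the totals can be checked by hand. The sketch below is only an illustration under stated assumptions: `struct mode_timing` and `mode_clock_khz()` are hypothetical stand-ins, not the kernel's `struct drm_display_mode` or any DRM helper, and the panel's real `.clock` value is not visible in this excerpt, so the computed figure is merely what the new totals would imply at exactly 60 Hz.

```c
#include <stdint.h>

/* Hypothetical stand-in for the timing fields of a display mode. */
struct mode_timing {
	uint32_t hdisplay, hsync_start, hsync_end, htotal;
	uint32_t vdisplay, vsync_start, vsync_end, vtotal;
	uint32_t vrefresh; /* Hz */
};

/* Pixel clock in kHz implied by the totals and the refresh rate. */
static uint32_t mode_clock_khz(const struct mode_timing *m)
{
	uint64_t hz = (uint64_t)m->htotal * m->vtotal * m->vrefresh;

	return (uint32_t)(hz / 1000);
}

/* The updated values from the hunk, kept in the same running-sum form. */
static const struct mode_timing updated_1366x768 = {
	.hdisplay = 1366,
	.hsync_start = 1366 + 136,          /* front porch 136 */
	.hsync_end = 1366 + 136 + 30,       /* sync width 30 */
	.htotal = 1366 + 136 + 30 + 60,     /* back porch 60 */
	.vdisplay = 768,
	.vsync_start = 768 + 8,
	.vsync_end = 768 + 8 + 12,
	.vtotal = 768 + 8 + 12 + 12,
	.vrefresh = 60,
};
```

With these values the horizontal total is 1592 and the vertical total is 800, so `mode_clock_khz(&updated_1366x768)` gives 76416 kHz. Comparing that against the mode's `.clock` field in the full source is a quick way to spot a mismatch between the porches and the stated refresh rate.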
diff --git a/drivers/gpu/drm/qxl/qxl_display.c b/drivers/gpu/drm/qxl/qxl_display.c
index 0d1396266857..4a0a8b29b0a1 100644
--- a/drivers/gpu/drm/qxl/qxl_display.c
+++ b/drivers/gpu/drm/qxl/qxl_display.c
@@ -29,6 +29,7 @@
 #include "qxl_drv.h"
 #include "qxl_object.h"
 #include "drm_crtc_helper.h"
+#include <drm/drm_plane_helper.h>
 
 static bool qxl_head_enabled(struct qxl_head *head)
 {
@@ -100,14 +101,37 @@ static int qxl_display_copy_rom_client_monitors_config(struct qxl_device *qdev)
 	return 0;
 }
 
+static void qxl_update_offset_props(struct qxl_device *qdev)
+{
+	struct drm_device *dev = qdev->ddev;
+	struct drm_connector *connector;
+	struct qxl_output *output;
+	struct qxl_head *head;
+
+	list_for_each_entry(connector, &dev->mode_config.connector_list, head) {
+		output = drm_connector_to_qxl_output(connector);
+
+		head = &qdev->client_monitors_config->heads[output->index];
+
+		drm_object_property_set_value(&connector->base,
+			dev->mode_config.suggested_x_property, head->x);
+		drm_object_property_set_value(&connector->base,
+			dev->mode_config.suggested_y_property, head->y);
+	}
+}
+
 void qxl_display_read_client_monitors_config(struct qxl_device *qdev)
 {
 
+	struct drm_device *dev = qdev->ddev;
 	while (qxl_display_copy_rom_client_monitors_config(qdev)) {
 		qxl_io_log(qdev, "failed crc check for client_monitors_config,"
 				 " retrying\n");
 	}
 
+	drm_modeset_lock_all(dev);
+	qxl_update_offset_props(qdev);
+	drm_modeset_unlock_all(dev);
 	if (!drm_helper_hpd_irq_event(qdev->ddev)) {
 		/* notify that the monitor configuration changed, to
 		   adjust at the arbitrary resolution */
@@ -568,7 +592,6 @@ static int qxl_crtc_mode_set(struct drm_crtc *crtc,
 {
 	struct drm_device *dev = crtc->dev;
 	struct qxl_device *qdev = dev->dev_private;
-	struct qxl_mode *m = (void *)mode->private;
 	struct qxl_framebuffer *qfb;
 	struct qxl_bo *bo, *old_bo = NULL;
 	struct qxl_crtc *qcrtc = to_qxl_crtc(crtc);
@@ -586,12 +609,6 @@ static int qxl_crtc_mode_set(struct drm_crtc *crtc,
 	}
 	qfb = to_qxl_framebuffer(crtc->primary->fb);
 	bo = gem_to_qxl_bo(qfb->obj);
-	if (!m)
-		/* and do we care? */
-		DRM_DEBUG("%dx%d: not a native mode\n", x, y);
-	else
-		DRM_DEBUG("%dx%d: qxl id %d\n",
-			  mode->hdisplay, mode->vdisplay, m->id);
 	DRM_DEBUG("+%d+%d (%d,%d) => (%d,%d)\n",
 		  x, y,
 		  mode->hdisplay, mode->vdisplay,
@@ -951,6 +968,10 @@ static int qdev_output_init(struct drm_device *dev, int num_output)
 
 	drm_object_attach_property(&connector->base,
 				   qdev->hotplug_mode_update_property, 0);
+	drm_object_attach_property(&connector->base,
+				   dev->mode_config.suggested_x_property, 0);
+	drm_object_attach_property(&connector->base,
+				   dev->mode_config.suggested_y_property, 0);
 	drm_connector_register(connector);
 	return 0;
 }
@@ -1064,6 +1085,7 @@ int qxl_modeset_init(struct qxl_device *qdev)
 
 	qdev->ddev->mode_config.fb_base = qdev->vram_base;
 
+	drm_mode_create_suggested_offset_properties(qdev->ddev);
 	qxl_mode_create_hotplug_mode_update_property(qdev);
 
 	for (i = 0 ; i < qxl_num_crtc; ++i) {
diff --git a/drivers/gpu/drm/qxl/qxl_release.c b/drivers/gpu/drm/qxl/qxl_release.c
index 446e71ca36cb..d9b25684ac98 100644
--- a/drivers/gpu/drm/qxl/qxl_release.c
+++ b/drivers/gpu/drm/qxl/qxl_release.c
@@ -264,7 +264,8 @@ int qxl_release_reserve_list(struct qxl_release *release, bool no_intr)
 	if (list_is_singular(&release->bos))
 		return 0;
 
-	ret = ttm_eu_reserve_buffers(&release->ticket, &release->bos, !no_intr);
+	ret = ttm_eu_reserve_buffers(&release->ticket, &release->bos,
+				     !no_intr, NULL);
 	if (ret)
 		return ret;
 
diff --git a/drivers/gpu/drm/r128/r128_state.c b/drivers/gpu/drm/r128/r128_state.c
index 575e986f82a7..8fd2d9f58f77 100644
--- a/drivers/gpu/drm/r128/r128_state.c
+++ b/drivers/gpu/drm/r128/r128_state.c
@@ -905,7 +905,7 @@ static int r128_cce_dispatch_write_span(struct drm_device *dev,
 	if (IS_ERR(buffer))
 		return PTR_ERR(buffer);
 
-	mask_size = depth->n * sizeof(u8);
+	mask_size = depth->n;
 	if (depth->mask) {
 		mask = memdup_user(depth->mask, mask_size);
 		if (IS_ERR(mask)) {
@@ -1010,7 +1010,7 @@ static int r128_cce_dispatch_write_pixels(struct drm_device *dev,
 	}
 
 	if (depth->mask) {
-		mask_size = depth->n * sizeof(u8);
+		mask_size = depth->n;
 		mask = memdup_user(depth->mask, mask_size);
 		if (IS_ERR(mask)) {
 			kfree(x);
diff --git a/drivers/gpu/drm/radeon/Makefile b/drivers/gpu/drm/radeon/Makefile
index d01b87991422..12bc21219a0e 100644
--- a/drivers/gpu/drm/radeon/Makefile
+++ b/drivers/gpu/drm/radeon/Makefile
@@ -80,7 +80,8 @@ radeon-y += radeon_device.o radeon_asic.o radeon_kms.o \
 	r600_dpm.o rs780_dpm.o rv6xx_dpm.o rv770_dpm.o rv730_dpm.o rv740_dpm.o \
 	rv770_smc.o cypress_dpm.o btc_dpm.o sumo_dpm.o sumo_smc.o trinity_dpm.o \
 	trinity_smc.o ni_dpm.o si_smc.o si_dpm.o kv_smc.o kv_dpm.o ci_smc.o \
-	ci_dpm.o dce6_afmt.o radeon_vm.o radeon_ucode.o radeon_ib.o radeon_mn.o
+	ci_dpm.o dce6_afmt.o radeon_vm.o radeon_ucode.o radeon_ib.o radeon_mn.o \
+	radeon_sync.o
 
 # add async DMA block
 radeon-y += \
@@ -104,6 +105,7 @@ radeon-y += \
 	radeon_vce.o \
 	vce_v1_0.o \
 	vce_v2_0.o \
+	radeon_kfd.o
 
 radeon-$(CONFIG_COMPAT) += radeon_ioc32.o
 radeon-$(CONFIG_VGA_SWITCHEROO) += radeon_atpx_handler.o
diff --git a/drivers/gpu/drm/radeon/atombios_crtc.c b/drivers/gpu/drm/radeon/atombios_crtc.c
index 30d242b25078..d59ec491dbb9 100644
--- a/drivers/gpu/drm/radeon/atombios_crtc.c
+++ b/drivers/gpu/drm/radeon/atombios_crtc.c
@@ -2039,6 +2039,7 @@ int atombios_crtc_mode_set(struct drm_crtc *crtc,
 	atombios_crtc_set_base(crtc, x, y, old_fb);
 	atombios_overscan_setup(crtc, mode, adjusted_mode);
 	atombios_scaler_setup(crtc);
+	radeon_cursor_reset(crtc);
 	/* update the hw version fpr dpm */
 	radeon_crtc->hw_mode = *adjusted_mode;
 
diff --git a/drivers/gpu/drm/radeon/ci_dpm.c b/drivers/gpu/drm/radeon/ci_dpm.c
index 11a55e9dad7f..f373a81ba3d5 100644
--- a/drivers/gpu/drm/radeon/ci_dpm.c
+++ b/drivers/gpu/drm/radeon/ci_dpm.c
@@ -46,15 +46,15 @@
 static const struct ci_pt_defaults defaults_hawaii_xt =
 {
 	1, 0xF, 0xFD, 0x19, 5, 0x14, 0, 0xB0000,
-	{ 0x84,  0x0,   0x0,   0x7F,  0x0,   0x0,   0x5A,  0x60,  0x51,  0x8E,  0x79,  0x6B,  0x5F,  0x90,  0x79  },
-	{ 0x1EA, 0x1EA, 0x1EA, 0x224, 0x224, 0x224, 0x24F, 0x24F, 0x24F, 0x28E, 0x28E, 0x28E, 0x2BC, 0x2BC, 0x2BC }
+	{ 0x2E,  0x00,  0x00,  0x88,  0x00,  0x00,  0x72,  0x60,  0x51,  0xA7,  0x79,  0x6B,  0x90,  0xBD,  0x79  },
+	{ 0x217, 0x217, 0x217, 0x242, 0x242, 0x242, 0x269, 0x269, 0x269, 0x2A1, 0x2A1, 0x2A1, 0x2C9, 0x2C9, 0x2C9 }
 };
 
 static const struct ci_pt_defaults defaults_hawaii_pro =
 {
 	1, 0xF, 0xFD, 0x19, 5, 0x14, 0, 0x65062,
-	{ 0x93,  0x0,   0x0,   0x97,  0x0,   0x0,   0x6B,  0x60,  0x51,  0x95,  0x79,  0x6B,  0x5F,  0x90,  0x79  },
-	{ 0x1EA, 0x1EA, 0x1EA, 0x224, 0x224, 0x224, 0x24F, 0x24F, 0x24F, 0x28E, 0x28E, 0x28E, 0x2BC, 0x2BC, 0x2BC }
+	{ 0x2E,  0x00,  0x00,  0x88,  0x00,  0x00,  0x72,  0x60,  0x51,  0xA7,  0x79,  0x6B,  0x90,  0xBD,  0x79  },
+	{ 0x217, 0x217, 0x217, 0x242, 0x242, 0x242, 0x269, 0x269, 0x269, 0x2A1, 0x2A1, 0x2A1, 0x2C9, 0x2C9, 0x2C9 }
 };
 
 static const struct ci_pt_defaults defaults_bonaire_xt =
@@ -184,6 +184,9 @@ static int ci_set_overdrive_target_tdp(struct radeon_device *rdev,
 				       u32 target_tdp);
 static int ci_update_uvd_dpm(struct radeon_device *rdev, bool gate);
 
+static PPSMC_Result ci_send_msg_to_smc_with_parameter(struct radeon_device *rdev,
+						      PPSMC_Msg msg, u32 parameter);
+
 static struct ci_power_info *ci_get_pi(struct radeon_device *rdev)
 {
         struct ci_power_info *pi = rdev->pm.dpm.priv;
@@ -249,7 +252,10 @@ static void ci_initialize_powertune_defaults(struct radeon_device *rdev)
 
 	if (pi->caps_power_containment) {
 		pi->caps_cac = true;
-		pi->enable_bapm_feature = true;
+		if (rdev->family == CHIP_HAWAII)
+			pi->enable_bapm_feature = false;
+		else
+			pi->enable_bapm_feature = true;
 		pi->enable_tdc_limit_feature = true;
 		pi->enable_pkg_pwr_tracking_feature = true;
 	}
@@ -352,6 +358,21 @@ static int ci_populate_dw8(struct radeon_device *rdev)
 	return 0;
 }
 
+static int ci_populate_fuzzy_fan(struct radeon_device *rdev)
+{
+	struct ci_power_info *pi = ci_get_pi(rdev);
+
+	if ((rdev->pm.dpm.fan.fan_output_sensitivity & (1 << 15)) ||
+	    (rdev->pm.dpm.fan.fan_output_sensitivity == 0))
+		rdev->pm.dpm.fan.fan_output_sensitivity =
+			rdev->pm.dpm.fan.default_fan_output_sensitivity;
+
+	pi->smc_powertune_table.FuzzyFan_PwmSetDelta =
+		cpu_to_be16(rdev->pm.dpm.fan.fan_output_sensitivity);
+
+	return 0;
+}
+
 static int ci_min_max_v_gnbl_pm_lid_from_bapm_vddc(struct radeon_device *rdev)
 {
 	struct ci_power_info *pi = ci_get_pi(rdev);
@@ -477,6 +498,9 @@ static int ci_populate_pm_base(struct radeon_device *rdev)
 		ret = ci_populate_dw8(rdev);
 		if (ret)
 			return ret;
+		ret = ci_populate_fuzzy_fan(rdev);
+		if (ret)
+			return ret;
 		ret = ci_min_max_v_gnbl_pm_lid_from_bapm_vddc(rdev);
 		if (ret)
 			return ret;
@@ -690,6 +714,25 @@ static int ci_enable_smc_cac(struct radeon_device *rdev, bool enable)
 	return ret;
 }
 
+static int ci_enable_thermal_based_sclk_dpm(struct radeon_device *rdev,
+					    bool enable)
+{
+	struct ci_power_info *pi = ci_get_pi(rdev);
+	PPSMC_Result smc_result = PPSMC_Result_OK;
+
+	if (pi->thermal_sclk_dpm_enabled) {
+		if (enable)
+			smc_result = ci_send_msg_to_smc(rdev, PPSMC_MSG_ENABLE_THERMAL_DPM);
+		else
+			smc_result = ci_send_msg_to_smc(rdev, PPSMC_MSG_DISABLE_THERMAL_DPM);
+	}
+
+	if (smc_result == PPSMC_Result_OK)
+		return 0;
+	else
+		return -EINVAL;
+}
+
 static int ci_power_control_set_level(struct radeon_device *rdev)
 {
 	struct ci_power_info *pi = ci_get_pi(rdev);
@@ -700,13 +743,11 @@ static int ci_power_control_set_level(struct radeon_device *rdev)
 	int ret = 0;
 	bool adjust_polarity = false; /* ??? */
 
-	if (pi->caps_power_containment &&
-	    (pi->power_containment_features & POWERCONTAINMENT_FEATURE_BAPM)) {
+	if (pi->caps_power_containment) {
 		adjust_percent = adjust_polarity ?
 			rdev->pm.dpm.tdp_adjustment : (-1 * rdev->pm.dpm.tdp_adjustment);
 		target_tdp = ((100 + adjust_percent) *
 			      (s32)cac_tdp_table->configurable_tdp) / 100;
-		target_tdp *= 256;
 
 		ret = ci_set_overdrive_target_tdp(rdev, (u32)target_tdp);
 	}
@@ -814,7 +855,7 @@ static void ci_apply_state_adjust_rules(struct radeon_device *rdev,
 	}
 }
 
-static int ci_set_thermal_temperature_range(struct radeon_device *rdev,
+static int ci_thermal_set_temperature_range(struct radeon_device *rdev,
 					    int min_temp, int max_temp)
 {
 	int low_temp = 0 * 1000;
@@ -850,6 +891,350 @@ static int ci_set_thermal_temperature_range(struct radeon_device *rdev,
 	return 0;
 }
 
+static int ci_thermal_enable_alert(struct radeon_device *rdev,
+				   bool enable)
+{
+	u32 thermal_int = RREG32_SMC(CG_THERMAL_INT);
+	PPSMC_Result result;
+
+	if (enable) {
+		thermal_int &= ~(THERM_INT_MASK_HIGH | THERM_INT_MASK_LOW);
+		WREG32_SMC(CG_THERMAL_INT, thermal_int);
+		rdev->irq.dpm_thermal = false;
+		result = ci_send_msg_to_smc(rdev, PPSMC_MSG_Thermal_Cntl_Enable);
+		if (result != PPSMC_Result_OK) {
+			DRM_DEBUG_KMS("Could not enable thermal interrupts.\n");
+			return -EINVAL;
+		}
+	} else {
+		thermal_int |= THERM_INT_MASK_HIGH | THERM_INT_MASK_LOW;
+		WREG32_SMC(CG_THERMAL_INT, thermal_int);
+		rdev->irq.dpm_thermal = true;
+		result = ci_send_msg_to_smc(rdev, PPSMC_MSG_Thermal_Cntl_Disable);
+		if (result != PPSMC_Result_OK) {
+			DRM_DEBUG_KMS("Could not disable thermal interrupts.\n");
+			return -EINVAL;
+		}
+	}
+
+	return 0;
+}
+
+static void ci_fan_ctrl_set_static_mode(struct radeon_device *rdev, u32 mode)
+{
+	struct ci_power_info *pi = ci_get_pi(rdev);
+	u32 tmp;
+
+	if (pi->fan_ctrl_is_in_default_mode) {
+		tmp = (RREG32_SMC(CG_FDO_CTRL2) & FDO_PWM_MODE_MASK) >> FDO_PWM_MODE_SHIFT;
+		pi->fan_ctrl_default_mode = tmp;
+		tmp = (RREG32_SMC(CG_FDO_CTRL2) & TMIN_MASK) >> TMIN_SHIFT;
+		pi->t_min = tmp;
+		pi->fan_ctrl_is_in_default_mode = false;
+	}
+
+	tmp = RREG32_SMC(CG_FDO_CTRL2) & ~TMIN_MASK;
+	tmp |= TMIN(0);
+	WREG32_SMC(CG_FDO_CTRL2, tmp);
+
+	tmp = RREG32_SMC(CG_FDO_CTRL2) & ~FDO_PWM_MODE_MASK;
+	tmp |= FDO_PWM_MODE(mode);
+	WREG32_SMC(CG_FDO_CTRL2, tmp);
+}
+
+static int ci_thermal_setup_fan_table(struct radeon_device *rdev)
+{
+	struct ci_power_info *pi = ci_get_pi(rdev);
+	SMU7_Discrete_FanTable fan_table = { FDO_MODE_HARDWARE };
+	u32 duty100;
+	u32 t_diff1, t_diff2, pwm_diff1, pwm_diff2;
+	u16 fdo_min, slope1, slope2;
+	u32 reference_clock, tmp;
+	int ret;
+	u64 tmp64;
+
+	if (!pi->fan_table_start) {
+		rdev->pm.dpm.fan.ucode_fan_control = false;
+		return 0;
+	}
+
+	duty100 = (RREG32_SMC(CG_FDO_CTRL1) & FMAX_DUTY100_MASK) >> FMAX_DUTY100_SHIFT;
+
+	if (duty100 == 0) {
+		rdev->pm.dpm.fan.ucode_fan_control = false;
+		return 0;
+	}
+
+	tmp64 = (u64)rdev->pm.dpm.fan.pwm_min * duty100;
+	do_div(tmp64, 10000);
+	fdo_min = (u16)tmp64;
+
+	t_diff1 = rdev->pm.dpm.fan.t_med - rdev->pm.dpm.fan.t_min;
+	t_diff2 = rdev->pm.dpm.fan.t_high - rdev->pm.dpm.fan.t_med;
+
+	pwm_diff1 = rdev->pm.dpm.fan.pwm_med - rdev->pm.dpm.fan.pwm_min;
+	pwm_diff2 = rdev->pm.dpm.fan.pwm_high - rdev->pm.dpm.fan.pwm_med;
+
+	slope1 = (u16)((50 + ((16 * duty100 * pwm_diff1) / t_diff1)) / 100);
+	slope2 = (u16)((50 + ((16 * duty100 * pwm_diff2) / t_diff2)) / 100);
+
+	fan_table.TempMin = cpu_to_be16((50 + rdev->pm.dpm.fan.t_min) / 100);
+	fan_table.TempMed = cpu_to_be16((50 + rdev->pm.dpm.fan.t_med) / 100);
+	fan_table.TempMax = cpu_to_be16((50 + rdev->pm.dpm.fan.t_max) / 100);
+
+	fan_table.Slope1 = cpu_to_be16(slope1);
+	fan_table.Slope2 = cpu_to_be16(slope2);
+
+	fan_table.FdoMin = cpu_to_be16(fdo_min);
+
+	fan_table.HystDown = cpu_to_be16(rdev->pm.dpm.fan.t_hyst);
+
+	fan_table.HystUp = cpu_to_be16(1);
+
+	fan_table.HystSlope = cpu_to_be16(1);
+
+	fan_table.TempRespLim = cpu_to_be16(5);
+
+	reference_clock = radeon_get_xclk(rdev);
+
+	fan_table.RefreshPeriod = cpu_to_be32((rdev->pm.dpm.fan.cycle_delay *
+					       reference_clock) / 1600);
+
+	fan_table.FdoMax = cpu_to_be16((u16)duty100);
+
+	tmp = (RREG32_SMC(CG_MULT_THERMAL_CTRL) & TEMP_SEL_MASK) >> TEMP_SEL_SHIFT;
+	fan_table.TempSrc = (uint8_t)tmp;
+
+	ret = ci_copy_bytes_to_smc(rdev,
+				   pi->fan_table_start,
+				   (u8 *)(&fan_table),
+				   sizeof(fan_table),
+				   pi->sram_end);
+
+	if (ret) {
+		DRM_ERROR("Failed to load fan table to the SMC.\n");
+		rdev->pm.dpm.fan.ucode_fan_control = false;
+	}
+
+	return 0;
+}
+
+static int ci_fan_ctrl_start_smc_fan_control(struct radeon_device *rdev)
+{
+	struct ci_power_info *pi = ci_get_pi(rdev);
+	PPSMC_Result ret;
+
+	if (pi->caps_od_fuzzy_fan_control_support) {
+		ret = ci_send_msg_to_smc_with_parameter(rdev,
+							PPSMC_StartFanControl,
+							FAN_CONTROL_FUZZY);
+		if (ret != PPSMC_Result_OK)
+			return -EINVAL;
+		ret = ci_send_msg_to_smc_with_parameter(rdev,
+							PPSMC_MSG_SetFanPwmMax,
+							rdev->pm.dpm.fan.default_max_fan_pwm);
+		if (ret != PPSMC_Result_OK)
+			return -EINVAL;
+	} else {
+		ret = ci_send_msg_to_smc_with_parameter(rdev,
+							PPSMC_StartFanControl,
+							FAN_CONTROL_TABLE);
+		if (ret != PPSMC_Result_OK)
+			return -EINVAL;
+	}
+
+	return 0;
+}
+
+#if 0
+static int ci_fan_ctrl_stop_smc_fan_control(struct radeon_device *rdev)
+{
+	PPSMC_Result ret;
+
+	ret = ci_send_msg_to_smc(rdev, PPSMC_StopFanControl);
+	if (ret == PPSMC_Result_OK)
+		return 0;
+	else
+		return -EINVAL;
+}
+
+static int ci_fan_ctrl_get_fan_speed_percent(struct radeon_device *rdev,
+					     u32 *speed)
+{
+	u32 duty, duty100;
+	u64 tmp64;
+
+	if (rdev->pm.no_fan)
+		return -ENOENT;
+
+	duty100 = (RREG32_SMC(CG_FDO_CTRL1) & FMAX_DUTY100_MASK) >> FMAX_DUTY100_SHIFT;
+	duty = (RREG32_SMC(CG_THERMAL_STATUS) & FDO_PWM_DUTY_MASK) >> FDO_PWM_DUTY_SHIFT;
+
+	if (duty100 == 0)
+		return -EINVAL;
+
+	tmp64 = (u64)duty * 100;
+	do_div(tmp64, duty100);
+	*speed = (u32)tmp64;
+
+	if (*speed > 100)
+		*speed = 100;
+
+	return 0;
+}
+
+static int ci_fan_ctrl_set_fan_speed_percent(struct radeon_device *rdev,
+					     u32 speed)
+{
+	u32 tmp;
+	u32 duty, duty100;
+	u64 tmp64;
+
+	if (rdev->pm.no_fan)
+		return -ENOENT;
+
+	if (speed > 100)
+		return -EINVAL;
+
+	if (rdev->pm.dpm.fan.ucode_fan_control)
+		ci_fan_ctrl_stop_smc_fan_control(rdev);
+
+	duty100 = (RREG32_SMC(CG_FDO_CTRL1) & FMAX_DUTY100_MASK) >> FMAX_DUTY100_SHIFT;
+
+	if (duty100 == 0)
+		return -EINVAL;
+
+	tmp64 = (u64)speed * duty100;
+	do_div(tmp64, 100);
+	duty = (u32)tmp64;
+
+	tmp = RREG32_SMC(CG_FDO_CTRL0) & ~FDO_STATIC_DUTY_MASK;
+	tmp |= FDO_STATIC_DUTY(duty);
+	WREG32_SMC(CG_FDO_CTRL0, tmp);
+
+	ci_fan_ctrl_set_static_mode(rdev, FDO_PWM_MODE_STATIC);
+
+	return 0;
+}
+
+static int ci_fan_ctrl_get_fan_speed_rpm(struct radeon_device *rdev,
+					 u32 *speed)
+{
+	u32 tach_period;
+	u32 xclk = radeon_get_xclk(rdev);
+
+	if (rdev->pm.no_fan)
+		return -ENOENT;
+
+	if (rdev->pm.fan_pulses_per_revolution == 0)
+		return -ENOENT;
+
+	tach_period = (RREG32_SMC(CG_TACH_STATUS) & TACH_PERIOD_MASK) >> TACH_PERIOD_SHIFT;
+	if (tach_period == 0)
+		return -ENOENT;
+
+	*speed = 60 * xclk * 10000 / tach_period;
+
+	return 0;
+}
+
+static int ci_fan_ctrl_set_fan_speed_rpm(struct radeon_device *rdev,
+					 u32 speed)
+{
+	u32 tach_period, tmp;
+	u32 xclk = radeon_get_xclk(rdev);
+
+	if (rdev->pm.no_fan)
+		return -ENOENT;
+
+	if (rdev->pm.fan_pulses_per_revolution == 0)
+		return -ENOENT;
+
+	if ((speed < rdev->pm.fan_min_rpm) ||
+	    (speed > rdev->pm.fan_max_rpm))
+		return -EINVAL;
+
+	if (rdev->pm.dpm.fan.ucode_fan_control)
+		ci_fan_ctrl_stop_smc_fan_control(rdev);
+
+	tach_period = 60 * xclk * 10000 / (8 * speed);
+	tmp = RREG32_SMC(CG_TACH_CTRL) & ~TARGET_PERIOD_MASK;
+	tmp |= TARGET_PERIOD(tach_period);
+	WREG32_SMC(CG_TACH_CTRL, tmp);
+
+	ci_fan_ctrl_set_static_mode(rdev, FDO_PWM_MODE_STATIC_RPM);
+
+	return 0;
+}
+#endif
+
+static void ci_fan_ctrl_set_default_mode(struct radeon_device *rdev)
+{
+	struct ci_power_info *pi = ci_get_pi(rdev);
+	u32 tmp;
+
+	if (!pi->fan_ctrl_is_in_default_mode) {
+		tmp = RREG32_SMC(CG_FDO_CTRL2) & ~FDO_PWM_MODE_MASK;
+		tmp |= FDO_PWM_MODE(pi->fan_ctrl_default_mode);
+		WREG32_SMC(CG_FDO_CTRL2, tmp);
+
+		tmp = RREG32_SMC(CG_FDO_CTRL2) & ~TMIN_MASK;
+		tmp |= TMIN(pi->t_min);
+		WREG32_SMC(CG_FDO_CTRL2, tmp);
+		pi->fan_ctrl_is_in_default_mode = true;
+	}
+}
+
+static void ci_thermal_start_smc_fan_control(struct radeon_device *rdev)
+{
+	if (rdev->pm.dpm.fan.ucode_fan_control) {
+		ci_fan_ctrl_start_smc_fan_control(rdev);
+		ci_fan_ctrl_set_static_mode(rdev, FDO_PWM_MODE_STATIC);
+	}
+}
+
+static void ci_thermal_initialize(struct radeon_device *rdev)
+{
+	u32 tmp;
+
+	if (rdev->pm.fan_pulses_per_revolution) {
+		tmp = RREG32_SMC(CG_TACH_CTRL) & ~EDGE_PER_REV_MASK;
+		tmp |= EDGE_PER_REV(rdev->pm.fan_pulses_per_revolution - 1);
+		WREG32_SMC(CG_TACH_CTRL, tmp);
+	}
+
+	tmp = RREG32_SMC(CG_FDO_CTRL2) & ~TACH_PWM_RESP_RATE_MASK;
+	tmp |= TACH_PWM_RESP_RATE(0x28);
+	WREG32_SMC(CG_FDO_CTRL2, tmp);
+}
+
+static int ci_thermal_start_thermal_controller(struct radeon_device *rdev)
+{
+	int ret;
+
+	ci_thermal_initialize(rdev);
+	ret = ci_thermal_set_temperature_range(rdev, R600_TEMP_RANGE_MIN, R600_TEMP_RANGE_MAX);
+	if (ret)
+		return ret;
+	ret = ci_thermal_enable_alert(rdev, true);
+	if (ret)
+		return ret;
+	if (rdev->pm.dpm.fan.ucode_fan_control) {
+		ret = ci_thermal_setup_fan_table(rdev);
+		if (ret)
+			return ret;
+		ci_thermal_start_smc_fan_control(rdev);
+	}
+
+	return 0;
+}
+
+static void ci_thermal_stop_thermal_controller(struct radeon_device *rdev)
+{
+	if (!rdev->pm.no_fan)
+		ci_fan_ctrl_set_default_mode(rdev);
+}
+
 #if 0
 static int ci_read_smc_soft_register(struct radeon_device *rdev,
 				     u16 reg_offset, u32 *value)
@@ -1253,7 +1638,7 @@ static int ci_dpm_force_state_sclk(struct radeon_device *rdev, u32 n)
 
 	if (!pi->sclk_dpm_key_disabled) {
 		PPSMC_Result smc_result =
-			ci_send_msg_to_smc_with_parameter(rdev, PPSMC_MSG_DPM_ForceState, n);
+			ci_send_msg_to_smc_with_parameter(rdev, PPSMC_MSG_SCLKDPM_SetEnabledMask, 1 << n);
 		if (smc_result != PPSMC_Result_OK)
 			return -EINVAL;
 	}
@@ -1267,7 +1652,7 @@ static int ci_dpm_force_state_mclk(struct radeon_device *rdev, u32 n)
 
 	if (!pi->mclk_dpm_key_disabled) {
 		PPSMC_Result smc_result =
-			ci_send_msg_to_smc_with_parameter(rdev, PPSMC_MSG_MCLKDPM_ForceState, n);
+			ci_send_msg_to_smc_with_parameter(rdev, PPSMC_MSG_MCLKDPM_SetEnabledMask, 1 << n);
 		if (smc_result != PPSMC_Result_OK)
 			return -EINVAL;
 	}
@@ -2042,6 +2427,33 @@ static int ci_force_switch_to_arb_f0(struct radeon_device *rdev)
 	return ni_copy_and_switch_arb_sets(rdev, tmp, MC_CG_ARB_FREQ_F0);
 }
 
+static void ci_register_patching_mc_arb(struct radeon_device *rdev,
+					const u32 engine_clock,
+					const u32 memory_clock,
+					u32 *dram_timing2)
+{
+	bool patch;
+	u32 tmp, tmp2;
+
+	tmp = RREG32(MC_SEQ_MISC0);
+	patch = ((tmp & 0x0000f00) == 0x300) ? true : false;
+
+	if (patch &&
+	    ((rdev->pdev->device == 0x67B0) ||
+	     (rdev->pdev->device == 0x67B1))) {
+		if ((memory_clock > 100000) && (memory_clock <= 125000)) {
+			tmp2 = (((0x31 * engine_clock) / 125000) - 1) & 0xff;
+			*dram_timing2 &= ~0x00ff0000;
+			*dram_timing2 |= tmp2 << 16;
+		} else if ((memory_clock > 125000) && (memory_clock <= 137500)) {
+			tmp2 = (((0x36 * engine_clock) / 137500) - 1) & 0xff;
+			*dram_timing2 &= ~0x00ff0000;
+			*dram_timing2 |= tmp2 << 16;
+		}
+	}
+}
+
+
 static int ci_populate_memory_timing_parameters(struct radeon_device *rdev,
 						u32 sclk,
 						u32 mclk,
@@ -2057,6 +2469,8 @@ static int ci_populate_memory_timing_parameters(struct radeon_device *rdev,
 	dram_timing2 = RREG32(MC_ARB_DRAM_TIMING2);
 	burst_time = RREG32(MC_ARB_BURST_TIME) & STATE0_MASK;
 
+	ci_register_patching_mc_arb(rdev, sclk, mclk, &dram_timing2);
+
 	arb_regs->McArbDramTiming  = cpu_to_be32(dram_timing);
 	arb_regs->McArbDramTiming2 = cpu_to_be32(dram_timing2);
 	arb_regs->McArbBurstTime = (u8)burst_time;
@@ -2351,10 +2765,10 @@ static int ci_calculate_mclk_params(struct radeon_device *rdev,
 		u32 tmp;
 		u32 reference_clock = rdev->clock.mpll.reference_freq;
 
-		if (pi->mem_gddr5)
-			freq_nom = memory_clock * 4;
+		if (mpll_param.qdr == 1)
+			freq_nom = memory_clock * 4 * (1 << mpll_param.post_div);
 		else
-			freq_nom = memory_clock * 2;
+			freq_nom = memory_clock * 2 * (1 << mpll_param.post_div);
 
 		tmp = (freq_nom / reference_clock);
 		tmp = tmp * tmp;
@@ -2434,7 +2848,6 @@ static int ci_populate_single_memory_level(struct radeon_device *rdev,
 						      &memory_level->MinVddcPhases);
 
 	memory_level->EnabledForThrottle = 1;
-	memory_level->EnabledForActivity = 1;
 	memory_level->UpH = 0;
 	memory_level->DownH = 100;
 	memory_level->VoltageDownH = 0;
@@ -2767,7 +3180,6 @@ static int ci_populate_single_graphic_level(struct radeon_device *rdev,
 
 	graphic_level->CcPwrDynRm = 0;
 	graphic_level->CcPwrDynRm1 = 0;
-	graphic_level->EnabledForActivity = 1;
 	graphic_level->EnabledForThrottle = 1;
 	graphic_level->UpH = 0;
 	graphic_level->DownH = 0;
@@ -2816,10 +3228,13 @@ static int ci_populate_all_graphic_levels(struct radeon_device *rdev)
 						       &pi->smc_state_table.GraphicsLevel[i]);
 		if (ret)
 			return ret;
+		if (i > 1)
+			pi->smc_state_table.GraphicsLevel[i].DeepSleepDivId = 0;
 		if (i == (dpm_table->sclk_table.count - 1))
 			pi->smc_state_table.GraphicsLevel[i].DisplayWatermark =
 				PPSMC_DISPLAY_WATERMARK_HIGH;
 	}
+	pi->smc_state_table.GraphicsLevel[0].EnabledForActivity = 1;
 
 	pi->smc_state_table.GraphicsDpmLevelCount = (u8)dpm_table->sclk_table.count;
 	pi->dpm_level_enable_mask.sclk_dpm_enable_mask =
@@ -2863,6 +3278,16 @@ static int ci_populate_all_memory_levels(struct radeon_device *rdev)
 			return ret;
 	}
 
+	pi->smc_state_table.MemoryLevel[0].EnabledForActivity = 1;
+
+	if ((dpm_table->mclk_table.count >= 2) &&
+	    ((rdev->pdev->device == 0x67B0) || (rdev->pdev->device == 0x67B1))) {
+		pi->smc_state_table.MemoryLevel[1].MinVddc =
+			pi->smc_state_table.MemoryLevel[0].MinVddc;
+		pi->smc_state_table.MemoryLevel[1].MinVddcPhases =
+			pi->smc_state_table.MemoryLevel[0].MinVddcPhases;
+	}
+
 	pi->smc_state_table.MemoryLevel[0].ActivityLevel = cpu_to_be16(0x1F);
 
 	pi->smc_state_table.MemoryDpmLevelCount = (u8)dpm_table->mclk_table.count;
@@ -2919,9 +3344,14 @@ static int ci_setup_default_pcie_tables(struct radeon_device *rdev)
 				  &pi->dpm_table.pcie_speed_table,
 				  SMU7_MAX_LEVELS_LINK);
 
-	ci_setup_pcie_table_entry(&pi->dpm_table.pcie_speed_table, 0,
-				  pi->pcie_gen_powersaving.min,
-				  pi->pcie_lane_powersaving.min);
+	if (rdev->family == CHIP_BONAIRE)
+		ci_setup_pcie_table_entry(&pi->dpm_table.pcie_speed_table, 0,
+					  pi->pcie_gen_powersaving.min,
+					  pi->pcie_lane_powersaving.max);
+	else
+		ci_setup_pcie_table_entry(&pi->dpm_table.pcie_speed_table, 0,
+					  pi->pcie_gen_powersaving.min,
+					  pi->pcie_lane_powersaving.min);
 	ci_setup_pcie_table_entry(&pi->dpm_table.pcie_speed_table, 1,
 				  pi->pcie_gen_performance.min,
 				  pi->pcie_lane_performance.min);
@@ -2988,19 +3418,21 @@ static int ci_setup_default_dpm_tables(struct radeon_device *rdev)
 		     allowed_sclk_vddc_table->entries[i].clk)) {
 			pi->dpm_table.sclk_table.dpm_levels[pi->dpm_table.sclk_table.count].value =
 				allowed_sclk_vddc_table->entries[i].clk;
-			pi->dpm_table.sclk_table.dpm_levels[pi->dpm_table.sclk_table.count].enabled = true;
+			pi->dpm_table.sclk_table.dpm_levels[pi->dpm_table.sclk_table.count].enabled =
+				(i == 0) ? true : false;
 			pi->dpm_table.sclk_table.count++;
 		}
 	}
 
 	pi->dpm_table.mclk_table.count = 0;
 	for (i = 0; i < allowed_mclk_table->count; i++) {
-		if ((i==0) ||
+		if ((i == 0) ||
 		    (pi->dpm_table.mclk_table.dpm_levels[pi->dpm_table.mclk_table.count-1].value !=
 		     allowed_mclk_table->entries[i].clk)) {
 			pi->dpm_table.mclk_table.dpm_levels[pi->dpm_table.mclk_table.count].value =
 				allowed_mclk_table->entries[i].clk;
-			pi->dpm_table.mclk_table.dpm_levels[pi->dpm_table.mclk_table.count].enabled = true;
+			pi->dpm_table.mclk_table.dpm_levels[pi->dpm_table.mclk_table.count].enabled =
+				(i == 0) ? true : false;
 			pi->dpm_table.mclk_table.count++;
 		}
 	}
@@ -3166,7 +3598,7 @@ static int ci_init_smc_table(struct radeon_device *rdev)
 	table->VddcVddciDelta = 4000;
 	table->PhaseResponseTime = 0;
 	table->MemoryThermThrottleEnable = 1;
-	table->PCIeBootLinkLevel = 0;
+	table->PCIeBootLinkLevel = pi->dpm_table.pcie_speed_table.count - 1;
 	table->PCIeGenInterval = 1;
 	if (pi->voltage_control == CISLANDS_VOLTAGE_CONTROL_BY_SVID2)
 		table->SVI2Enable  = 1;
@@ -3320,6 +3752,8 @@ static int ci_upload_dpm_level_enable_mask(struct radeon_device *rdev)
 	struct ci_power_info *pi = ci_get_pi(rdev);
 	PPSMC_Result result;
 
+	ci_apply_disp_minimum_voltage_request(rdev);
+
 	if (!pi->sclk_dpm_key_disabled) {
 		if (pi->dpm_level_enable_mask.sclk_dpm_enable_mask) {
 			result = ci_send_msg_to_smc_with_parameter(rdev,
@@ -3339,7 +3773,7 @@ static int ci_upload_dpm_level_enable_mask(struct radeon_device *rdev)
 				return -EINVAL;
 		}
 	}
-
+#if 0
 	if (!pi->pcie_dpm_key_disabled) {
 		if (pi->dpm_level_enable_mask.pcie_dpm_enable_mask) {
 			result = ci_send_msg_to_smc_with_parameter(rdev,
@@ -3349,9 +3783,7 @@ static int ci_upload_dpm_level_enable_mask(struct radeon_device *rdev)
 				return -EINVAL;
 		}
 	}
-
-	ci_apply_disp_minimum_voltage_request(rdev);
-
+#endif
 	return 0;
 }
 
@@ -3377,7 +3809,7 @@ static void ci_find_dpm_states_clocks_in_dpm_table(struct radeon_device *rdev,
 		pi->need_update_smu7_dpm_table |= DPMTABLE_OD_UPDATE_SCLK;
 	} else {
 		/* XXX check display min clock requirements */
-		if (0 != CISLAND_MINIMUM_ENGINE_CLOCK)
+		if (CISLAND_MINIMUM_ENGINE_CLOCK != CISLAND_MINIMUM_ENGINE_CLOCK)
 			pi->need_update_smu7_dpm_table |= DPMTABLE_UPDATE_SCLK;
 	}
 
@@ -3707,62 +4139,61 @@ int ci_dpm_force_performance_level(struct radeon_device *rdev,
 				   enum radeon_dpm_forced_level level)
 {
 	struct ci_power_info *pi = ci_get_pi(rdev);
-	PPSMC_Result smc_result;
 	u32 tmp, levels, i;
 	int ret;
 
 	if (level == RADEON_DPM_FORCED_LEVEL_HIGH) {
-		if ((!pi->sclk_dpm_key_disabled) &&
-		    pi->dpm_level_enable_mask.sclk_dpm_enable_mask) {
+		if ((!pi->pcie_dpm_key_disabled) &&
+		    pi->dpm_level_enable_mask.pcie_dpm_enable_mask) {
 			levels = 0;
-			tmp = pi->dpm_level_enable_mask.sclk_dpm_enable_mask;
+			tmp = pi->dpm_level_enable_mask.pcie_dpm_enable_mask;
 			while (tmp >>= 1)
 				levels++;
 			if (levels) {
-				ret = ci_dpm_force_state_sclk(rdev, levels);
+				ret = ci_dpm_force_state_pcie(rdev, level);
 				if (ret)
 					return ret;
 				for (i = 0; i < rdev->usec_timeout; i++) {
-					tmp = (RREG32_SMC(TARGET_AND_CURRENT_PROFILE_INDEX) &
-					       CURR_SCLK_INDEX_MASK) >> CURR_SCLK_INDEX_SHIFT;
+					tmp = (RREG32_SMC(TARGET_AND_CURRENT_PROFILE_INDEX_1) &
+					       CURR_PCIE_INDEX_MASK) >> CURR_PCIE_INDEX_SHIFT;
 					if (tmp == levels)
 						break;
 					udelay(1);
 				}
 			}
 		}
-		if ((!pi->mclk_dpm_key_disabled) &&
-		    pi->dpm_level_enable_mask.mclk_dpm_enable_mask) {
+		if ((!pi->sclk_dpm_key_disabled) &&
+		    pi->dpm_level_enable_mask.sclk_dpm_enable_mask) {
 			levels = 0;
-			tmp = pi->dpm_level_enable_mask.mclk_dpm_enable_mask;
+			tmp = pi->dpm_level_enable_mask.sclk_dpm_enable_mask;
 			while (tmp >>= 1)
 				levels++;
 			if (levels) {
-				ret = ci_dpm_force_state_mclk(rdev, levels);
+				ret = ci_dpm_force_state_sclk(rdev, levels);
 				if (ret)
 					return ret;
 				for (i = 0; i < rdev->usec_timeout; i++) {
 					tmp = (RREG32_SMC(TARGET_AND_CURRENT_PROFILE_INDEX) &
-					       CURR_MCLK_INDEX_MASK) >> CURR_MCLK_INDEX_SHIFT;
+					       CURR_SCLK_INDEX_MASK) >> CURR_SCLK_INDEX_SHIFT;
 					if (tmp == levels)
 						break;
 					udelay(1);
 				}
 			}
 		}
-		if ((!pi->pcie_dpm_key_disabled) &&
-		    pi->dpm_level_enable_mask.pcie_dpm_enable_mask) {
+		if ((!pi->mclk_dpm_key_disabled) &&
+		    pi->dpm_level_enable_mask.mclk_dpm_enable_mask) {
 			levels = 0;
-			tmp = pi->dpm_level_enable_mask.pcie_dpm_enable_mask;
+			tmp = pi->dpm_level_enable_mask.mclk_dpm_enable_mask;
 			while (tmp >>= 1)
 				levels++;
 			if (levels) {
-				ret = ci_dpm_force_state_pcie(rdev, level);
+				ret = ci_dpm_force_state_mclk(rdev, levels);
 				if (ret)
 					return ret;
 				for (i = 0; i < rdev->usec_timeout; i++) {
-					tmp = (RREG32_SMC(TARGET_AND_CURRENT_PROFILE_INDEX_1) &
-					       CURR_PCIE_INDEX_MASK) >> CURR_PCIE_INDEX_SHIFT;
+					tmp = (RREG32_SMC(TARGET_AND_CURRENT_PROFILE_INDEX) &
+					       CURR_MCLK_INDEX_MASK) >> CURR_MCLK_INDEX_SHIFT;
 					if (tmp == levels)
 						break;
 					udelay(1);
@@ -3816,21 +4247,17 @@ int ci_dpm_force_performance_level(struct radeon_device *rdev,
 			}
 		}
 	} else if (level == RADEON_DPM_FORCED_LEVEL_AUTO) {
-		if (!pi->sclk_dpm_key_disabled) {
-			smc_result = ci_send_msg_to_smc(rdev, PPSMC_MSG_NoForcedLevel);
-			if (smc_result != PPSMC_Result_OK)
-				return -EINVAL;
-		}
-		if (!pi->mclk_dpm_key_disabled) {
-			smc_result = ci_send_msg_to_smc(rdev, PPSMC_MSG_MCLKDPM_NoForcedLevel);
-			if (smc_result != PPSMC_Result_OK)
-				return -EINVAL;
-		}
 		if (!pi->pcie_dpm_key_disabled) {
-			smc_result = ci_send_msg_to_smc(rdev, PPSMC_MSG_PCIeDPM_UnForceLevel);
+			PPSMC_Result smc_result;
+
+			smc_result = ci_send_msg_to_smc(rdev,
+							PPSMC_MSG_PCIeDPM_UnForceLevel);
 			if (smc_result != PPSMC_Result_OK)
 				return -EINVAL;
 		}
+		ret = ci_upload_dpm_level_enable_mask(rdev);
+		if (ret)
+			return ret;
 	}
 
 	rdev->pm.dpm.forced_level = level;
@@ -4036,6 +4463,96 @@ static int ci_copy_vbios_mc_reg_table(const struct atom_mc_reg_table *table,
 	return 0;
 }
 
+static int ci_register_patching_mc_seq(struct radeon_device *rdev,
+				       struct ci_mc_reg_table *table)
+{
+	u8 i, k;
+	u32 tmp;
+	bool patch;
+
+	tmp = RREG32(MC_SEQ_MISC0);
+	patch = ((tmp & 0x0000f00) == 0x300) ? true : false;
+
+	if (patch &&
+	    ((rdev->pdev->device == 0x67B0) ||
+	     (rdev->pdev->device == 0x67B1))) {
+		for (i = 0; i < table->last; i++) {
+			if (table->last >= SMU7_DISCRETE_MC_REGISTER_ARRAY_SIZE)
+				return -EINVAL;
+			switch(table->mc_reg_address[i].s1 >> 2) {
+			case MC_SEQ_MISC1:
+				for (k = 0; k < table->num_entries; k++) {
+					if ((table->mc_reg_table_entry[k].mclk_max == 125000) ||
+					    (table->mc_reg_table_entry[k].mclk_max == 137500))
+						table->mc_reg_table_entry[k].mc_data[i] =
+							(table->mc_reg_table_entry[k].mc_data[i] & 0xFFFFFFF8) |
+							0x00000007;
+				}
+				break;
+			case MC_SEQ_WR_CTL_D0:
+				for (k = 0; k < table->num_entries; k++) {
+					if ((table->mc_reg_table_entry[k].mclk_max == 125000) ||
+					    (table->mc_reg_table_entry[k].mclk_max == 137500))
+						table->mc_reg_table_entry[k].mc_data[i] =
+							(table->mc_reg_table_entry[k].mc_data[i] & 0xFFFF0F00) |
+							0x0000D0DD;
+				}
+				break;
+			case MC_SEQ_WR_CTL_D1:
+				for (k = 0; k < table->num_entries; k++) {
+					if ((table->mc_reg_table_entry[k].mclk_max == 125000) ||
+					    (table->mc_reg_table_entry[k].mclk_max == 137500))
+						table->mc_reg_table_entry[k].mc_data[i] =
+							(table->mc_reg_table_entry[k].mc_data[i] & 0xFFFF0F00) |
+							0x0000D0DD;
+				}
+				break;
+			case MC_SEQ_WR_CTL_2:
+				for (k = 0; k < table->num_entries; k++) {
+					if ((table->mc_reg_table_entry[k].mclk_max == 125000) ||
+					    (table->mc_reg_table_entry[k].mclk_max == 137500))
+						table->mc_reg_table_entry[k].mc_data[i] = 0;
+				}
+				break;
+			case MC_SEQ_CAS_TIMING:
+				for (k = 0; k < table->num_entries; k++) {
+					if (table->mc_reg_table_entry[k].mclk_max == 125000)
+						table->mc_reg_table_entry[k].mc_data[i] =
+							(table->mc_reg_table_entry[k].mc_data[i] & 0xFFE0FE0F) |
+							0x000C0140;
+					else if (table->mc_reg_table_entry[k].mclk_max == 137500)
+						table->mc_reg_table_entry[k].mc_data[i] =
+							(table->mc_reg_table_entry[k].mc_data[i] & 0xFFE0FE0F) |
+							0x000C0150;
+				}
+				break;
+			case MC_SEQ_MISC_TIMING:
+				for (k = 0; k < table->num_entries; k++) {
+					if (table->mc_reg_table_entry[k].mclk_max == 125000)
+						table->mc_reg_table_entry[k].mc_data[i] =
+							(table->mc_reg_table_entry[k].mc_data[i] & 0xFFFFFFE0) |
+							0x00000030;
+					else if (table->mc_reg_table_entry[k].mclk_max == 137500)
+						table->mc_reg_table_entry[k].mc_data[i] =
+							(table->mc_reg_table_entry[k].mc_data[i] & 0xFFFFFFE0) |
+							0x00000035;
+				}
+				break;
+			default:
+				break;
+			}
+		}
+
+		WREG32(MC_SEQ_IO_DEBUG_INDEX, 3);
+		tmp = RREG32(MC_SEQ_IO_DEBUG_DATA);
+		tmp = (tmp & 0xFFF8FFFF) | (1 << 16);
+		WREG32(MC_SEQ_IO_DEBUG_INDEX, 3);
+		WREG32(MC_SEQ_IO_DEBUG_DATA, tmp);
+	}
+
+	return 0;
+}
+
 static int ci_initialize_mc_reg_table(struct radeon_device *rdev)
 {
 	struct ci_power_info *pi = ci_get_pi(rdev);
@@ -4079,6 +4596,10 @@ static int ci_initialize_mc_reg_table(struct radeon_device *rdev)
 
 	ci_set_s0_mc_reg_index(ci_table);
 
+	ret = ci_register_patching_mc_seq(rdev, ci_table);
+	if (ret)
+		goto init_mc_done;
+
 	ret = ci_set_mc_special_registers(rdev, ci_table);
 	if (ret)
 		goto init_mc_done;
@@ -4675,36 +5196,51 @@ int ci_dpm_enable(struct radeon_device *rdev)
 		return ret;
 	}
 
+	ret = ci_power_control_set_level(rdev);
+	if (ret) {
+		DRM_ERROR("ci_power_control_set_level failed\n");
+		return ret;
+	}
+
 	ci_enable_auto_throttle_source(rdev, RADEON_DPM_AUTO_THROTTLE_SRC_THERMAL, true);
 
+	ret = ci_enable_thermal_based_sclk_dpm(rdev, true);
+	if (ret) {
+		DRM_ERROR("ci_enable_thermal_based_sclk_dpm failed\n");
+		return ret;
+	}
+
+	ci_thermal_start_thermal_controller(rdev);
+
 	ci_update_current_ps(rdev, boot_ps);
 
 	return 0;
 }
 
-int ci_dpm_late_enable(struct radeon_device *rdev)
+static int ci_set_temperature_range(struct radeon_device *rdev)
 {
 	int ret;
 
-	if (rdev->irq.installed &&
-	    r600_is_internal_thermal_sensor(rdev->pm.int_thermal_type)) {
-#if 0
-		PPSMC_Result result;
-#endif
-		ret = ci_set_thermal_temperature_range(rdev, R600_TEMP_RANGE_MIN, R600_TEMP_RANGE_MAX);
-		if (ret) {
-			DRM_ERROR("ci_set_thermal_temperature_range failed\n");
-			return ret;
-		}
-		rdev->irq.dpm_thermal = true;
-		radeon_irq_set(rdev);
-#if 0
-		result = ci_send_msg_to_smc(rdev, PPSMC_MSG_EnableThermalInterrupt);
+	ret = ci_thermal_enable_alert(rdev, false);
+	if (ret)
+		return ret;
+	ret = ci_thermal_set_temperature_range(rdev, R600_TEMP_RANGE_MIN, R600_TEMP_RANGE_MAX);
+	if (ret)
+		return ret;
+	ret = ci_thermal_enable_alert(rdev, true);
+	if (ret)
+		return ret;
 
-		if (result != PPSMC_Result_OK)
-			DRM_DEBUG_KMS("Could not enable thermal interrupts.\n");
-#endif
-	}
+	return ret;
+}
+
+int ci_dpm_late_enable(struct radeon_device *rdev)
+{
+	int ret;
+
+	ret = ci_set_temperature_range(rdev);
+	if (ret)
+		return ret;
 
 	ci_dpm_powergate_uvd(rdev, true);
 
@@ -4721,6 +5257,8 @@ void ci_dpm_disable(struct radeon_device *rdev)
 	if (!ci_is_smc_running(rdev))
 		return;
 
+	ci_thermal_stop_thermal_controller(rdev);
+
 	if (pi->thermal_protection)
 		ci_enable_thermal_protection(rdev, false);
 	ci_enable_power_containment(rdev, false);
@@ -4729,12 +5267,13 @@ void ci_dpm_disable(struct radeon_device *rdev)
 	ci_enable_spread_spectrum(rdev, false);
 	ci_enable_auto_throttle_source(rdev, RADEON_DPM_AUTO_THROTTLE_SRC_THERMAL, false);
 	ci_stop_dpm(rdev);
-	ci_enable_ds_master_switch(rdev, true);
+	ci_enable_ds_master_switch(rdev, false);
 	ci_enable_ulv(rdev, false);
 	ci_clear_vc(rdev);
 	ci_reset_to_default(rdev);
 	ci_dpm_stop_smc(rdev);
 	ci_force_switch_to_arb_f0(rdev);
+	ci_enable_thermal_based_sclk_dpm(rdev, false);
 
 	ci_update_current_ps(rdev, boot_ps);
 }
@@ -4804,11 +5343,6 @@ int ci_dpm_set_power_state(struct radeon_device *rdev)
 	return 0;
 }
 
-int ci_dpm_power_control_set_level(struct radeon_device *rdev)
-{
-	return ci_power_control_set_level(rdev);
-}
-
 void ci_dpm_reset_asic(struct radeon_device *rdev)
 {
 	ci_set_boot_state(rdev);
@@ -5068,6 +5602,8 @@ void ci_dpm_fini(struct radeon_device *rdev)
 int ci_dpm_init(struct radeon_device *rdev)
 {
 	int index = GetIndexIntoMasterTable(DATA, ASIC_InternalSS_Info);
+	SMU7_Discrete_DpmTable  *dpm_table;
+	struct radeon_gpio_rec gpio;
 	u16 data_offset, size;
 	u8 frev, crev;
 	struct ci_power_info *pi;
@@ -5137,6 +5673,7 @@ int ci_dpm_init(struct radeon_device *rdev)
 	pi->sclk_dpm_key_disabled = 0;
 	pi->mclk_dpm_key_disabled = 0;
 	pi->pcie_dpm_key_disabled = 0;
+	pi->thermal_sclk_dpm_enabled = 0;
 
 	/* mclk dpm is unstable on some R7 260X cards with the old mc ucode */
 	if ((rdev->pdev->device == 0x6658) &&
@@ -5201,6 +5738,55 @@ int ci_dpm_init(struct radeon_device *rdev)
 
 	pi->uvd_enabled = false;
 
+	dpm_table = &pi->smc_state_table;
+
+	gpio = radeon_atombios_lookup_gpio(rdev, VDDC_VRHOT_GPIO_PINID);
+	if (gpio.valid) {
+		dpm_table->VRHotGpio = gpio.shift;
+		rdev->pm.dpm.platform_caps |= ATOM_PP_PLATFORM_CAP_REGULATOR_HOT;
+	} else {
+		dpm_table->VRHotGpio = CISLANDS_UNUSED_GPIO_PIN;
+		rdev->pm.dpm.platform_caps &= ~ATOM_PP_PLATFORM_CAP_REGULATOR_HOT;
+	}
+
+	gpio = radeon_atombios_lookup_gpio(rdev, PP_AC_DC_SWITCH_GPIO_PINID);
+	if (gpio.valid) {
+		dpm_table->AcDcGpio = gpio.shift;
+		rdev->pm.dpm.platform_caps |= ATOM_PP_PLATFORM_CAP_HARDWAREDC;
+	} else {
+		dpm_table->AcDcGpio = CISLANDS_UNUSED_GPIO_PIN;
+		rdev->pm.dpm.platform_caps &= ~ATOM_PP_PLATFORM_CAP_HARDWAREDC;
+	}
+
+	gpio = radeon_atombios_lookup_gpio(rdev, VDDC_PCC_GPIO_PINID);
+	if (gpio.valid) {
+		u32 tmp = RREG32_SMC(CNB_PWRMGT_CNTL);
+
+		switch (gpio.shift) {
+		case 0:
+			tmp &= ~GNB_SLOW_MODE_MASK;
+			tmp |= GNB_SLOW_MODE(1);
+			break;
+		case 1:
+			tmp &= ~GNB_SLOW_MODE_MASK;
+			tmp |= GNB_SLOW_MODE(2);
+			break;
+		case 2:
+			tmp |= GNB_SLOW;
+			break;
+		case 3:
+			tmp |= FORCE_NB_PS1;
+			break;
+		case 4:
+			tmp |= DPM_ENABLED;
+			break;
+		default:
+			DRM_ERROR("Invalid PCC GPIO: %u!\n", gpio.shift);
+			break;
+		}
+		WREG32_SMC(CNB_PWRMGT_CNTL, tmp);
+	}
+
 	pi->voltage_control = CISLANDS_VOLTAGE_CONTROL_NONE;
 	pi->vddci_control = CISLANDS_VOLTAGE_CONTROL_NONE;
 	pi->mvdd_control = CISLANDS_VOLTAGE_CONTROL_NONE;
@@ -5262,6 +5848,8 @@ int ci_dpm_init(struct radeon_device *rdev)
 		rdev->pm.dpm.dyn_state.max_clock_voltage_on_dc =
 			rdev->pm.dpm.dyn_state.max_clock_voltage_on_ac;
 
+	pi->fan_ctrl_is_in_default_mode = true;
+
 	return 0;
 }
 
diff --git a/drivers/gpu/drm/radeon/ci_dpm.h b/drivers/gpu/drm/radeon/ci_dpm.h
index 93bbed977ffb..84e3d3bcf9f3 100644
--- a/drivers/gpu/drm/radeon/ci_dpm.h
+++ b/drivers/gpu/drm/radeon/ci_dpm.h
@@ -33,6 +33,8 @@
 
 #define CISLANDS_MAX_HARDWARE_POWERLEVELS 2
 
+#define CISLANDS_UNUSED_GPIO_PIN 0x7F
+
 struct ci_pl {
 	u32 mclk;
 	u32 sclk;
@@ -237,6 +239,7 @@ struct ci_power_info {
 	u32 sclk_dpm_key_disabled;
 	u32 mclk_dpm_key_disabled;
 	u32 pcie_dpm_key_disabled;
+	u32 thermal_sclk_dpm_enabled;
 	struct ci_pcie_perf_range pcie_gen_performance;
 	struct ci_pcie_perf_range pcie_lane_performance;
 	struct ci_pcie_perf_range pcie_gen_powersaving;
@@ -264,6 +267,7 @@ struct ci_power_info {
 	bool caps_automatic_dc_transition;
 	bool caps_sclk_throttle_low_notification;
 	bool caps_dynamic_ac_timing;
+	bool caps_od_fuzzy_fan_control_support;
 	/* flags */
 	bool thermal_protection;
 	bool pcie_performance_request;
@@ -285,6 +289,10 @@ struct ci_power_info {
 	struct ci_ps current_ps;
 	struct radeon_ps requested_rps;
 	struct ci_ps requested_ps;
+	/* fan control */
+	bool fan_ctrl_is_in_default_mode;
+	u32 t_min;
+	u32 fan_ctrl_default_mode;
 };
 
 #define CISLANDS_VOLTAGE_CONTROL_NONE                   0x0
diff --git a/drivers/gpu/drm/radeon/ci_smc.c b/drivers/gpu/drm/radeon/ci_smc.c
index b630edc2fd0c..e78bcad7a43e 100644
--- a/drivers/gpu/drm/radeon/ci_smc.c
+++ b/drivers/gpu/drm/radeon/ci_smc.c
@@ -129,7 +129,7 @@ void ci_reset_smc(struct radeon_device *rdev)
 
 int ci_program_jump_on_start(struct radeon_device *rdev)
 {
-	static u8 data[] = { 0xE0, 0x00, 0x80, 0x40 };
+	static const u8 data[] = { 0xE0, 0x00, 0x80, 0x40 };
 
 	return ci_copy_bytes_to_smc(rdev, 0x0, data, 4, sizeof(data)+1);
 }
diff --git a/drivers/gpu/drm/radeon/cik.c b/drivers/gpu/drm/radeon/cik.c
index 89c01fa6dd8e..6dcde3798b45 100644
--- a/drivers/gpu/drm/radeon/cik.c
+++ b/drivers/gpu/drm/radeon/cik.c
@@ -32,6 +32,7 @@
 #include "cik_blit_shaders.h"
 #include "radeon_ucode.h"
 #include "clearstate_ci.h"
+#include "radeon_kfd.h"
 
 MODULE_FIRMWARE("radeon/BONAIRE_pfp.bin");
 MODULE_FIRMWARE("radeon/BONAIRE_me.bin");
@@ -1563,6 +1564,8 @@ static const u32 godavari_golden_registers[] =
 
 static void cik_init_golden_registers(struct radeon_device *rdev)
 {
+	/* Some of the registers might be dependent on GRBM_GFX_INDEX */
+	mutex_lock(&rdev->grbm_idx_mutex);
 	switch (rdev->family) {
 	case CHIP_BONAIRE:
 		radeon_program_register_sequence(rdev,
@@ -1637,6 +1640,7 @@ static void cik_init_golden_registers(struct radeon_device *rdev)
 	default:
 		break;
 	}
+	mutex_unlock(&rdev->grbm_idx_mutex);
 }
 
 /**
@@ -1806,7 +1810,7 @@ int ci_mc_load_microcode(struct radeon_device *rdev)
 {
 	const __be32 *fw_data = NULL;
 	const __le32 *new_fw_data = NULL;
-	u32 running, blackout = 0;
+	u32 running, blackout = 0, tmp;
 	u32 *io_mc_regs = NULL;
 	const __le32 *new_io_mc_regs = NULL;
 	int i, regs_size, ucode_size;
@@ -1866,6 +1870,15 @@ int ci_mc_load_microcode(struct radeon_device *rdev)
 				WREG32(MC_SEQ_IO_DEBUG_DATA, io_mc_regs[(i << 1) + 1]);
 			}
 		}
+
+		tmp = RREG32(MC_SEQ_MISC0);
+		if ((rdev->pdev->device == 0x6649) && ((tmp & 0xff00) == 0x5600)) {
+			WREG32(MC_SEQ_IO_DEBUG_INDEX, 5);
+			WREG32(MC_SEQ_IO_DEBUG_DATA, 0x00000023);
+			WREG32(MC_SEQ_IO_DEBUG_INDEX, 9);
+			WREG32(MC_SEQ_IO_DEBUG_DATA, 0x000001f0);
+		}
+
 		/* load the MC ucode */
 		for (i = 0; i < ucode_size; i++) {
 			if (rdev->new_fw)
@@ -3419,6 +3432,7 @@ static void cik_setup_rb(struct radeon_device *rdev,
 	u32 disabled_rbs = 0;
 	u32 enabled_rbs = 0;
 
+	mutex_lock(&rdev->grbm_idx_mutex);
 	for (i = 0; i < se_num; i++) {
 		for (j = 0; j < sh_per_se; j++) {
 			cik_select_se_sh(rdev, i, j);
@@ -3430,6 +3444,7 @@ static void cik_setup_rb(struct radeon_device *rdev,
 		}
 	}
 	cik_select_se_sh(rdev, 0xffffffff, 0xffffffff);
+	mutex_unlock(&rdev->grbm_idx_mutex);
 
 	mask = 1;
 	for (i = 0; i < max_rb_num_per_se * se_num; i++) {
@@ -3440,6 +3455,7 @@ static void cik_setup_rb(struct radeon_device *rdev,
 
 	rdev->config.cik.backend_enable_mask = enabled_rbs;
 
+	mutex_lock(&rdev->grbm_idx_mutex);
 	for (i = 0; i < se_num; i++) {
 		cik_select_se_sh(rdev, i, 0xffffffff);
 		data = 0;
@@ -3467,6 +3483,7 @@ static void cik_setup_rb(struct radeon_device *rdev,
 		WREG32(PA_SC_RASTER_CONFIG, data);
 	}
 	cik_select_se_sh(rdev, 0xffffffff, 0xffffffff);
+	mutex_unlock(&rdev->grbm_idx_mutex);
 }
 
 /**
@@ -3684,6 +3701,12 @@ static void cik_gpu_init(struct radeon_device *rdev)
 	/* set HW defaults for 3D engine */
 	WREG32(CP_MEQ_THRESHOLDS, MEQ1_START(0x30) | MEQ2_START(0x60));
 
+	mutex_lock(&rdev->grbm_idx_mutex);
+	/*
+	 * making sure that the following register writes will be broadcasted
+	 * to all the shaders
+	 */
+	cik_select_se_sh(rdev, 0xffffffff, 0xffffffff);
 	WREG32(SX_DEBUG_1, 0x20);
 
 	WREG32(TA_CNTL_AUX, 0x00010000);
@@ -3739,6 +3762,7 @@ static void cik_gpu_init(struct radeon_device *rdev)
 
 	WREG32(PA_CL_ENHANCE, CLIP_VTX_REORDER_ENA | NUM_CLIP_SEQ(3));
 	WREG32(PA_SC_ENHANCE, ENABLE_PA_SC_OUT_OF_ORDER);
+	mutex_unlock(&rdev->grbm_idx_mutex);
 
 	udelay(50);
 }
@@ -3970,31 +3994,27 @@ struct radeon_fence *cik_copy_cpdma(struct radeon_device *rdev,
 				    unsigned num_gpu_pages,
 				    struct reservation_object *resv)
 {
-	struct radeon_semaphore *sem = NULL;
 	struct radeon_fence *fence;
+	struct radeon_sync sync;
 	int ring_index = rdev->asic->copy.blit_ring_index;
 	struct radeon_ring *ring = &rdev->ring[ring_index];
 	u32 size_in_bytes, cur_size_in_bytes, control;
 	int i, num_loops;
 	int r = 0;
 
-	r = radeon_semaphore_create(rdev, &sem);
-	if (r) {
-		DRM_ERROR("radeon: moving bo (%d).\n", r);
-		return ERR_PTR(r);
-	}
+	radeon_sync_create(&sync);
 
 	size_in_bytes = (num_gpu_pages << RADEON_GPU_PAGE_SHIFT);
 	num_loops = DIV_ROUND_UP(size_in_bytes, 0x1fffff);
 	r = radeon_ring_lock(rdev, ring, num_loops * 7 + 18);
 	if (r) {
 		DRM_ERROR("radeon: moving bo (%d).\n", r);
-		radeon_semaphore_free(rdev, &sem, NULL);
+		radeon_sync_free(rdev, &sync, NULL);
 		return ERR_PTR(r);
 	}
 
-	radeon_semaphore_sync_resv(rdev, sem, resv, false);
-	radeon_semaphore_sync_rings(rdev, sem, ring->idx);
+	radeon_sync_resv(rdev, &sync, resv, false);
+	radeon_sync_rings(rdev, &sync, ring->idx);
 
 	for (i = 0; i < num_loops; i++) {
 		cur_size_in_bytes = size_in_bytes;
@@ -4018,12 +4038,12 @@ struct radeon_fence *cik_copy_cpdma(struct radeon_device *rdev,
 	r = radeon_fence_emit(rdev, &fence, ring->idx);
 	if (r) {
 		radeon_ring_unlock_undo(rdev, ring);
-		radeon_semaphore_free(rdev, &sem, NULL);
+		radeon_sync_free(rdev, &sync, NULL);
 		return ERR_PTR(r);
 	}
 
 	radeon_ring_unlock_commit(rdev, ring, false);
-	radeon_semaphore_free(rdev, &sem, fence);
+	radeon_sync_free(rdev, &sync, fence);
 
 	return fence;
 }
@@ -4046,6 +4066,7 @@ struct radeon_fence *cik_copy_cpdma(struct radeon_device *rdev,
 void cik_ring_ib_execute(struct radeon_device *rdev, struct radeon_ib *ib)
 {
 	struct radeon_ring *ring = &rdev->ring[ib->ring];
+	unsigned vm_id = ib->vm ? ib->vm->ids[ib->ring].id : 0;
 	u32 header, control = INDIRECT_BUFFER_VALID;
 
 	if (ib->is_const_ib) {
@@ -4074,8 +4095,7 @@ void cik_ring_ib_execute(struct radeon_device *rdev, struct radeon_ib *ib)
 		header = PACKET3(PACKET3_INDIRECT_BUFFER, 2);
 	}
 
-	control |= ib->length_dw |
-		(ib->vm ? (ib->vm->id << 24) : 0);
+	control |= ib->length_dw | (vm_id << 24);
 
 	radeon_ring_write(ring, header);
 	radeon_ring_write(ring,
@@ -4675,12 +4695,11 @@ static int cik_mec_init(struct radeon_device *rdev)
 	/*
 	 * KV:    2 MEC, 4 Pipes/MEC, 8 Queues/Pipe - 64 Queues total
 	 * CI/KB: 1 MEC, 4 Pipes/MEC, 8 Queues/Pipe - 32 Queues total
+	 * Nonetheless, we assign only 1 pipe because all other pipes will
+	 * be handled by KFD
 	 */
-	if (rdev->family == CHIP_KAVERI)
-		rdev->mec.num_mec = 2;
-	else
-		rdev->mec.num_mec = 1;
-	rdev->mec.num_pipe = 4;
+	rdev->mec.num_mec = 1;
+	rdev->mec.num_pipe = 1;
 	rdev->mec.num_queue = rdev->mec.num_mec * rdev->mec.num_pipe * 8;
 
 	if (rdev->mec.hpd_eop_obj == NULL) {
@@ -4822,28 +4841,24 @@ static int cik_cp_compute_resume(struct radeon_device *rdev)
 
 	/* init the pipes */
 	mutex_lock(&rdev->srbm_mutex);
-	for (i = 0; i < (rdev->mec.num_pipe * rdev->mec.num_mec); i++) {
-		int me = (i < 4) ? 1 : 2;
-		int pipe = (i < 4) ? i : (i - 4);
 
-		eop_gpu_addr = rdev->mec.hpd_eop_gpu_addr + (i * MEC_HPD_SIZE * 2);
+	eop_gpu_addr = rdev->mec.hpd_eop_gpu_addr;
 
-		cik_srbm_select(rdev, me, pipe, 0, 0);
+	cik_srbm_select(rdev, 0, 0, 0, 0);
 
-		/* write the EOP addr */
-		WREG32(CP_HPD_EOP_BASE_ADDR, eop_gpu_addr >> 8);
-		WREG32(CP_HPD_EOP_BASE_ADDR_HI, upper_32_bits(eop_gpu_addr) >> 8);
+	/* write the EOP addr */
+	WREG32(CP_HPD_EOP_BASE_ADDR, eop_gpu_addr >> 8);
+	WREG32(CP_HPD_EOP_BASE_ADDR_HI, upper_32_bits(eop_gpu_addr) >> 8);
 
-		/* set the VMID assigned */
-		WREG32(CP_HPD_EOP_VMID, 0);
+	/* set the VMID assigned */
+	WREG32(CP_HPD_EOP_VMID, 0);
+
+	/* set the EOP size, register value is 2^(EOP_SIZE+1) dwords */
+	tmp = RREG32(CP_HPD_EOP_CONTROL);
+	tmp &= ~EOP_SIZE_MASK;
+	tmp |= order_base_2(MEC_HPD_SIZE / 8);
+	WREG32(CP_HPD_EOP_CONTROL, tmp);
 
-		/* set the EOP size, register value is 2^(EOP_SIZE+1) dwords */
-		tmp = RREG32(CP_HPD_EOP_CONTROL);
-		tmp &= ~EOP_SIZE_MASK;
-		tmp |= order_base_2(MEC_HPD_SIZE / 8);
-		WREG32(CP_HPD_EOP_CONTROL, tmp);
-	}
-	cik_srbm_select(rdev, 0, 0, 0, 0);
 	mutex_unlock(&rdev->srbm_mutex);
 
 	/* init the queues.  Just two for now. */
@@ -5897,8 +5912,13 @@ int cik_ib_parse(struct radeon_device *rdev, struct radeon_ib *ib)
  */
 int cik_vm_init(struct radeon_device *rdev)
 {
-	/* number of VMs */
-	rdev->vm_manager.nvm = 16;
+	/*
+	 * number of VMs
+	 * VMID 0 is reserved for System
+	 * radeon graphics/compute will use VMIDs 1-7
+	 * amdkfd will use VMIDs 8-15
+	 */
+	rdev->vm_manager.nvm = RADEON_NUM_OF_VMIDS;
 	/* base offset of vram pages */
 	if (rdev->flags & RADEON_IS_IGP) {
 		u64 tmp = RREG32(MC_VM_FB_OFFSET);
@@ -5958,26 +5978,23 @@ static void cik_vm_decode_fault(struct radeon_device *rdev,
  * Update the page table base and flush the VM TLB
  * using the CP (CIK).
  */
-void cik_vm_flush(struct radeon_device *rdev, int ridx, struct radeon_vm *vm)
+void cik_vm_flush(struct radeon_device *rdev, struct radeon_ring *ring,
+		  unsigned vm_id, uint64_t pd_addr)
 {
-	struct radeon_ring *ring = &rdev->ring[ridx];
-	int usepfp = (ridx == RADEON_RING_TYPE_GFX_INDEX);
-
-	if (vm == NULL)
-		return;
+	int usepfp = (ring->idx == RADEON_RING_TYPE_GFX_INDEX);
 
 	radeon_ring_write(ring, PACKET3(PACKET3_WRITE_DATA, 3));
 	radeon_ring_write(ring, (WRITE_DATA_ENGINE_SEL(usepfp) |
 				 WRITE_DATA_DST_SEL(0)));
-	if (vm->id < 8) {
+	if (vm_id < 8) {
 		radeon_ring_write(ring,
-				  (VM_CONTEXT0_PAGE_TABLE_BASE_ADDR + (vm->id << 2)) >> 2);
+				  (VM_CONTEXT0_PAGE_TABLE_BASE_ADDR + (vm_id << 2)) >> 2);
 	} else {
 		radeon_ring_write(ring,
-				  (VM_CONTEXT8_PAGE_TABLE_BASE_ADDR + ((vm->id - 8) << 2)) >> 2);
+				  (VM_CONTEXT8_PAGE_TABLE_BASE_ADDR + ((vm_id - 8) << 2)) >> 2);
 	}
 	radeon_ring_write(ring, 0);
-	radeon_ring_write(ring, vm->pd_gpu_addr >> 12);
+	radeon_ring_write(ring, pd_addr >> 12);
 
 	/* update SH_MEM_* regs */
 	radeon_ring_write(ring, PACKET3(PACKET3_WRITE_DATA, 3));
@@ -5985,7 +6002,7 @@ void cik_vm_flush(struct radeon_device *rdev, int ridx, struct radeon_vm *vm)
 				 WRITE_DATA_DST_SEL(0)));
 	radeon_ring_write(ring, SRBM_GFX_CNTL >> 2);
 	radeon_ring_write(ring, 0);
-	radeon_ring_write(ring, VMID(vm->id));
+	radeon_ring_write(ring, VMID(vm_id));
 
 	radeon_ring_write(ring, PACKET3(PACKET3_WRITE_DATA, 6));
 	radeon_ring_write(ring, (WRITE_DATA_ENGINE_SEL(usepfp) |
@@ -6006,7 +6023,7 @@ void cik_vm_flush(struct radeon_device *rdev, int ridx, struct radeon_vm *vm)
 	radeon_ring_write(ring, VMID(0));
 
 	/* HDP flush */
-	cik_hdp_flush_cp_ring_emit(rdev, ridx);
+	cik_hdp_flush_cp_ring_emit(rdev, ring->idx);
 
 	/* bits 0-15 are the VM contexts0-15 */
 	radeon_ring_write(ring, PACKET3(PACKET3_WRITE_DATA, 3));
@@ -6014,7 +6031,7 @@ void cik_vm_flush(struct radeon_device *rdev, int ridx, struct radeon_vm *vm)
 				 WRITE_DATA_DST_SEL(0)));
 	radeon_ring_write(ring, VM_INVALIDATE_REQUEST >> 2);
 	radeon_ring_write(ring, 0);
-	radeon_ring_write(ring, 1 << vm->id);
+	radeon_ring_write(ring, 1 << vm_id);
 
 	/* compute doesn't have PFP */
 	if (usepfp) {
@@ -6059,6 +6076,7 @@ static void cik_wait_for_rlc_serdes(struct radeon_device *rdev)
 	u32 i, j, k;
 	u32 mask;
 
+	mutex_lock(&rdev->grbm_idx_mutex);
 	for (i = 0; i < rdev->config.cik.max_shader_engines; i++) {
 		for (j = 0; j < rdev->config.cik.max_sh_per_se; j++) {
 			cik_select_se_sh(rdev, i, j);
@@ -6070,6 +6088,7 @@ static void cik_wait_for_rlc_serdes(struct radeon_device *rdev)
 		}
 	}
 	cik_select_se_sh(rdev, 0xffffffff, 0xffffffff);
+	mutex_unlock(&rdev->grbm_idx_mutex);
 
 	mask = SE_MASTER_BUSY_MASK | GC_MASTER_BUSY | TC0_MASTER_BUSY | TC1_MASTER_BUSY;
 	for (k = 0; k < rdev->usec_timeout; k++) {
@@ -6204,10 +6223,12 @@ static int cik_rlc_resume(struct radeon_device *rdev)
 	WREG32(RLC_LB_CNTR_INIT, 0);
 	WREG32(RLC_LB_CNTR_MAX, 0x00008000);
 
+	mutex_lock(&rdev->grbm_idx_mutex);
 	cik_select_se_sh(rdev, 0xffffffff, 0xffffffff);
 	WREG32(RLC_LB_INIT_CU_MASK, 0xffffffff);
 	WREG32(RLC_LB_PARAMS, 0x00600408);
 	WREG32(RLC_LB_CNTL, 0x80000004);
+	mutex_unlock(&rdev->grbm_idx_mutex);
 
 	WREG32(RLC_MC_CNTL, 0);
 	WREG32(RLC_UCODE_CNTL, 0);
@@ -6274,11 +6295,13 @@ static void cik_enable_cgcg(struct radeon_device *rdev, bool enable)
 
 		tmp = cik_halt_rlc(rdev);
 
+		mutex_lock(&rdev->grbm_idx_mutex);
 		cik_select_se_sh(rdev, 0xffffffff, 0xffffffff);
 		WREG32(RLC_SERDES_WR_CU_MASTER_MASK, 0xffffffff);
 		WREG32(RLC_SERDES_WR_NONCU_MASTER_MASK, 0xffffffff);
 		tmp2 = BPM_ADDR_MASK | CGCG_OVERRIDE_0 | CGLS_ENABLE;
 		WREG32(RLC_SERDES_WR_CTRL, tmp2);
+		mutex_unlock(&rdev->grbm_idx_mutex);
 
 		cik_update_rlc(rdev, tmp);
 
@@ -6314,17 +6337,20 @@ static void cik_enable_mgcg(struct radeon_device *rdev, bool enable)
 		}
 
 		orig = data = RREG32(RLC_CGTT_MGCG_OVERRIDE);
+		data |= 0x00000001;
 		data &= 0xfffffffd;
 		if (orig != data)
 			WREG32(RLC_CGTT_MGCG_OVERRIDE, data);
 
 		tmp = cik_halt_rlc(rdev);
 
+		mutex_lock(&rdev->grbm_idx_mutex);
 		cik_select_se_sh(rdev, 0xffffffff, 0xffffffff);
 		WREG32(RLC_SERDES_WR_CU_MASTER_MASK, 0xffffffff);
 		WREG32(RLC_SERDES_WR_NONCU_MASTER_MASK, 0xffffffff);
 		data = BPM_ADDR_MASK | MGCG_OVERRIDE_0;
 		WREG32(RLC_SERDES_WR_CTRL, data);
+		mutex_unlock(&rdev->grbm_idx_mutex);
 
 		cik_update_rlc(rdev, tmp);
 
@@ -6345,7 +6371,7 @@ static void cik_enable_mgcg(struct radeon_device *rdev, bool enable)
 		}
 	} else {
 		orig = data = RREG32(RLC_CGTT_MGCG_OVERRIDE);
-		data |= 0x00000002;
+		data |= 0x00000003;
 		if (orig != data)
 			WREG32(RLC_CGTT_MGCG_OVERRIDE, data);
 
@@ -6368,11 +6394,13 @@ static void cik_enable_mgcg(struct radeon_device *rdev, bool enable)
 
 		tmp = cik_halt_rlc(rdev);
 
+		mutex_lock(&rdev->grbm_idx_mutex);
 		cik_select_se_sh(rdev, 0xffffffff, 0xffffffff);
 		WREG32(RLC_SERDES_WR_CU_MASTER_MASK, 0xffffffff);
 		WREG32(RLC_SERDES_WR_NONCU_MASTER_MASK, 0xffffffff);
 		data = BPM_ADDR_MASK | MGCG_OVERRIDE_1;
 		WREG32(RLC_SERDES_WR_CTRL, data);
+		mutex_unlock(&rdev->grbm_idx_mutex);
 
 		cik_update_rlc(rdev, tmp);
 	}
@@ -6801,10 +6829,12 @@ static u32 cik_get_cu_active_bitmap(struct radeon_device *rdev, u32 se, u32 sh)
 	u32 mask = 0, tmp, tmp1;
 	int i;
 
+	mutex_lock(&rdev->grbm_idx_mutex);
 	cik_select_se_sh(rdev, se, sh);
 	tmp = RREG32(CC_GC_SHADER_ARRAY_CONFIG);
 	tmp1 = RREG32(GC_USER_SHADER_ARRAY_CONFIG);
 	cik_select_se_sh(rdev, 0xffffffff, 0xffffffff);
+	mutex_unlock(&rdev->grbm_idx_mutex);
 
 	tmp &= 0xffff0000;
 
@@ -7288,8 +7318,7 @@ static int cik_irq_init(struct radeon_device *rdev)
 int cik_irq_set(struct radeon_device *rdev)
 {
 	u32 cp_int_cntl;
-	u32 cp_m1p0, cp_m1p1, cp_m1p2, cp_m1p3;
-	u32 cp_m2p0, cp_m2p1, cp_m2p2, cp_m2p3;
+	u32 cp_m1p0;
 	u32 crtc1 = 0, crtc2 = 0, crtc3 = 0, crtc4 = 0, crtc5 = 0, crtc6 = 0;
 	u32 hpd1, hpd2, hpd3, hpd4, hpd5, hpd6;
 	u32 grbm_int_cntl = 0;
@@ -7323,13 +7352,6 @@ int cik_irq_set(struct radeon_device *rdev)
 	dma_cntl1 = RREG32(SDMA0_CNTL + SDMA1_REGISTER_OFFSET) & ~TRAP_ENABLE;
 
 	cp_m1p0 = RREG32(CP_ME1_PIPE0_INT_CNTL) & ~TIME_STAMP_INT_ENABLE;
-	cp_m1p1 = RREG32(CP_ME1_PIPE1_INT_CNTL) & ~TIME_STAMP_INT_ENABLE;
-	cp_m1p2 = RREG32(CP_ME1_PIPE2_INT_CNTL) & ~TIME_STAMP_INT_ENABLE;
-	cp_m1p3 = RREG32(CP_ME1_PIPE3_INT_CNTL) & ~TIME_STAMP_INT_ENABLE;
-	cp_m2p0 = RREG32(CP_ME2_PIPE0_INT_CNTL) & ~TIME_STAMP_INT_ENABLE;
-	cp_m2p1 = RREG32(CP_ME2_PIPE1_INT_CNTL) & ~TIME_STAMP_INT_ENABLE;
-	cp_m2p2 = RREG32(CP_ME2_PIPE2_INT_CNTL) & ~TIME_STAMP_INT_ENABLE;
-	cp_m2p3 = RREG32(CP_ME2_PIPE3_INT_CNTL) & ~TIME_STAMP_INT_ENABLE;
 
 	if (rdev->flags & RADEON_IS_IGP)
 		thermal_int = RREG32_SMC(CG_THERMAL_INT_CTRL) &
@@ -7351,33 +7373,6 @@ int cik_irq_set(struct radeon_device *rdev)
 			case 0:
 				cp_m1p0 |= TIME_STAMP_INT_ENABLE;
 				break;
-			case 1:
-				cp_m1p1 |= TIME_STAMP_INT_ENABLE;
-				break;
-			case 2:
-				cp_m1p2 |= TIME_STAMP_INT_ENABLE;
-				break;
-			case 3:
-				cp_m1p2 |= TIME_STAMP_INT_ENABLE;
-				break;
-			default:
-				DRM_DEBUG("si_irq_set: sw int cp1 invalid pipe %d\n", ring->pipe);
-				break;
-			}
-		} else if (ring->me == 2) {
-			switch (ring->pipe) {
-			case 0:
-				cp_m2p0 |= TIME_STAMP_INT_ENABLE;
-				break;
-			case 1:
-				cp_m2p1 |= TIME_STAMP_INT_ENABLE;
-				break;
-			case 2:
-				cp_m2p2 |= TIME_STAMP_INT_ENABLE;
-				break;
-			case 3:
-				cp_m2p2 |= TIME_STAMP_INT_ENABLE;
-				break;
 			default:
 				DRM_DEBUG("si_irq_set: sw int cp1 invalid pipe %d\n", ring->pipe);
 				break;
@@ -7394,33 +7389,6 @@ int cik_irq_set(struct radeon_device *rdev)
 			case 0:
 				cp_m1p0 |= TIME_STAMP_INT_ENABLE;
 				break;
-			case 1:
-				cp_m1p1 |= TIME_STAMP_INT_ENABLE;
-				break;
-			case 2:
-				cp_m1p2 |= TIME_STAMP_INT_ENABLE;
-				break;
-			case 3:
-				cp_m1p2 |= TIME_STAMP_INT_ENABLE;
-				break;
-			default:
-				DRM_DEBUG("si_irq_set: sw int cp2 invalid pipe %d\n", ring->pipe);
-				break;
-			}
-		} else if (ring->me == 2) {
-			switch (ring->pipe) {
-			case 0:
-				cp_m2p0 |= TIME_STAMP_INT_ENABLE;
-				break;
-			case 1:
-				cp_m2p1 |= TIME_STAMP_INT_ENABLE;
-				break;
-			case 2:
-				cp_m2p2 |= TIME_STAMP_INT_ENABLE;
-				break;
-			case 3:
-				cp_m2p2 |= TIME_STAMP_INT_ENABLE;
-				break;
 			default:
 				DRM_DEBUG("si_irq_set: sw int cp2 invalid pipe %d\n", ring->pipe);
 				break;
@@ -7509,13 +7477,6 @@ int cik_irq_set(struct radeon_device *rdev)
 	WREG32(SDMA0_CNTL + SDMA1_REGISTER_OFFSET, dma_cntl1);
 
 	WREG32(CP_ME1_PIPE0_INT_CNTL, cp_m1p0);
-	WREG32(CP_ME1_PIPE1_INT_CNTL, cp_m1p1);
-	WREG32(CP_ME1_PIPE2_INT_CNTL, cp_m1p2);
-	WREG32(CP_ME1_PIPE3_INT_CNTL, cp_m1p3);
-	WREG32(CP_ME2_PIPE0_INT_CNTL, cp_m2p0);
-	WREG32(CP_ME2_PIPE1_INT_CNTL, cp_m2p1);
-	WREG32(CP_ME2_PIPE2_INT_CNTL, cp_m2p2);
-	WREG32(CP_ME2_PIPE3_INT_CNTL, cp_m2p3);
 
 	WREG32(GRBM_INT_CNTL, grbm_int_cntl);
 
@@ -7832,6 +7793,10 @@ restart_ih:
 	while (rptr != wptr) {
 		/* wptr/rptr are in bytes! */
 		ring_index = rptr / 4;
+
+		radeon_kfd_interrupt(rdev,
+				(const void *) &rdev->ih.ring[ring_index]);
+
 		src_id =  le32_to_cpu(rdev->ih.ring[ring_index]) & 0xff;
 		src_data = le32_to_cpu(rdev->ih.ring[ring_index + 1]) & 0xfffffff;
 		ring_id = le32_to_cpu(rdev->ih.ring[ring_index + 2]) & 0xff;
@@ -8521,6 +8486,10 @@ static int cik_startup(struct radeon_device *rdev)
 	if (r)
 		return r;
 
+	r = radeon_kfd_resume(rdev);
+	if (r)
+		return r;
+
 	return 0;
 }
 
@@ -8569,6 +8538,7 @@ int cik_resume(struct radeon_device *rdev)
  */
 int cik_suspend(struct radeon_device *rdev)
 {
+	radeon_kfd_suspend(rdev);
 	radeon_pm_suspend(rdev);
 	dce6_audio_fini(rdev);
 	radeon_vm_manager_fini(rdev);
diff --git a/drivers/gpu/drm/radeon/cik_reg.h b/drivers/gpu/drm/radeon/cik_reg.h
index ca1bb6133580..79c45e8a536b 100644
--- a/drivers/gpu/drm/radeon/cik_reg.h
+++ b/drivers/gpu/drm/radeon/cik_reg.h
@@ -147,4 +147,140 @@
 
 #define CIK_LB_DESKTOP_HEIGHT                     0x6b0c
 
+#define CP_HQD_IQ_RPTR					0xC970u
+#define AQL_ENABLE					(1U << 0)
+
+#define IDLE					(1 << 2)
+
+struct cik_mqd {
+	uint32_t header;
+	uint32_t compute_dispatch_initiator;
+	uint32_t compute_dim_x;
+	uint32_t compute_dim_y;
+	uint32_t compute_dim_z;
+	uint32_t compute_start_x;
+	uint32_t compute_start_y;
+	uint32_t compute_start_z;
+	uint32_t compute_num_thread_x;
+	uint32_t compute_num_thread_y;
+	uint32_t compute_num_thread_z;
+	uint32_t compute_pipelinestat_enable;
+	uint32_t compute_perfcount_enable;
+	uint32_t compute_pgm_lo;
+	uint32_t compute_pgm_hi;
+	uint32_t compute_tba_lo;
+	uint32_t compute_tba_hi;
+	uint32_t compute_tma_lo;
+	uint32_t compute_tma_hi;
+	uint32_t compute_pgm_rsrc1;
+	uint32_t compute_pgm_rsrc2;
+	uint32_t compute_vmid;
+	uint32_t compute_resource_limits;
+	uint32_t compute_static_thread_mgmt_se0;
+	uint32_t compute_static_thread_mgmt_se1;
+	uint32_t compute_tmpring_size;
+	uint32_t compute_static_thread_mgmt_se2;
+	uint32_t compute_static_thread_mgmt_se3;
+	uint32_t compute_restart_x;
+	uint32_t compute_restart_y;
+	uint32_t compute_restart_z;
+	uint32_t compute_thread_trace_enable;
+	uint32_t compute_misc_reserved;
+	uint32_t compute_user_data_0;
+	uint32_t compute_user_data_1;
+	uint32_t compute_user_data_2;
+	uint32_t compute_user_data_3;
+	uint32_t compute_user_data_4;
+	uint32_t compute_user_data_5;
+	uint32_t compute_user_data_6;
+	uint32_t compute_user_data_7;
+	uint32_t compute_user_data_8;
+	uint32_t compute_user_data_9;
+	uint32_t compute_user_data_10;
+	uint32_t compute_user_data_11;
+	uint32_t compute_user_data_12;
+	uint32_t compute_user_data_13;
+	uint32_t compute_user_data_14;
+	uint32_t compute_user_data_15;
+	uint32_t cp_compute_csinvoc_count_lo;
+	uint32_t cp_compute_csinvoc_count_hi;
+	uint32_t cp_mqd_base_addr_lo;
+	uint32_t cp_mqd_base_addr_hi;
+	uint32_t cp_hqd_active;
+	uint32_t cp_hqd_vmid;
+	uint32_t cp_hqd_persistent_state;
+	uint32_t cp_hqd_pipe_priority;
+	uint32_t cp_hqd_queue_priority;
+	uint32_t cp_hqd_quantum;
+	uint32_t cp_hqd_pq_base_lo;
+	uint32_t cp_hqd_pq_base_hi;
+	uint32_t cp_hqd_pq_rptr;
+	uint32_t cp_hqd_pq_rptr_report_addr_lo;
+	uint32_t cp_hqd_pq_rptr_report_addr_hi;
+	uint32_t cp_hqd_pq_wptr_poll_addr_lo;
+	uint32_t cp_hqd_pq_wptr_poll_addr_hi;
+	uint32_t cp_hqd_pq_doorbell_control;
+	uint32_t cp_hqd_pq_wptr;
+	uint32_t cp_hqd_pq_control;
+	uint32_t cp_hqd_ib_base_addr_lo;
+	uint32_t cp_hqd_ib_base_addr_hi;
+	uint32_t cp_hqd_ib_rptr;
+	uint32_t cp_hqd_ib_control;
+	uint32_t cp_hqd_iq_timer;
+	uint32_t cp_hqd_iq_rptr;
+	uint32_t cp_hqd_dequeue_request;
+	uint32_t cp_hqd_dma_offload;
+	uint32_t cp_hqd_sema_cmd;
+	uint32_t cp_hqd_msg_type;
+	uint32_t cp_hqd_atomic0_preop_lo;
+	uint32_t cp_hqd_atomic0_preop_hi;
+	uint32_t cp_hqd_atomic1_preop_lo;
+	uint32_t cp_hqd_atomic1_preop_hi;
+	uint32_t cp_hqd_hq_status0;
+	uint32_t cp_hqd_hq_control0;
+	uint32_t cp_mqd_control;
+	uint32_t cp_mqd_query_time_lo;
+	uint32_t cp_mqd_query_time_hi;
+	uint32_t cp_mqd_connect_start_time_lo;
+	uint32_t cp_mqd_connect_start_time_hi;
+	uint32_t cp_mqd_connect_end_time_lo;
+	uint32_t cp_mqd_connect_end_time_hi;
+	uint32_t cp_mqd_connect_end_wf_count;
+	uint32_t cp_mqd_connect_end_pq_rptr;
+	uint32_t cp_mqd_connect_end_pq_wptr;
+	uint32_t cp_mqd_connect_end_ib_rptr;
+	uint32_t reserved_96;
+	uint32_t reserved_97;
+	uint32_t reserved_98;
+	uint32_t reserved_99;
+	uint32_t iqtimer_pkt_header;
+	uint32_t iqtimer_pkt_dw0;
+	uint32_t iqtimer_pkt_dw1;
+	uint32_t iqtimer_pkt_dw2;
+	uint32_t iqtimer_pkt_dw3;
+	uint32_t iqtimer_pkt_dw4;
+	uint32_t iqtimer_pkt_dw5;
+	uint32_t iqtimer_pkt_dw6;
+	uint32_t reserved_108;
+	uint32_t reserved_109;
+	uint32_t reserved_110;
+	uint32_t reserved_111;
+	uint32_t queue_doorbell_id0;
+	uint32_t queue_doorbell_id1;
+	uint32_t queue_doorbell_id2;
+	uint32_t queue_doorbell_id3;
+	uint32_t queue_doorbell_id4;
+	uint32_t queue_doorbell_id5;
+	uint32_t queue_doorbell_id6;
+	uint32_t queue_doorbell_id7;
+	uint32_t queue_doorbell_id8;
+	uint32_t queue_doorbell_id9;
+	uint32_t queue_doorbell_id10;
+	uint32_t queue_doorbell_id11;
+	uint32_t queue_doorbell_id12;
+	uint32_t queue_doorbell_id13;
+	uint32_t queue_doorbell_id14;
+	uint32_t queue_doorbell_id15;
+};
+
 #endif
diff --git a/drivers/gpu/drm/radeon/cik_sdma.c b/drivers/gpu/drm/radeon/cik_sdma.c
index d748963af08b..dde5c7e29eb2 100644
--- a/drivers/gpu/drm/radeon/cik_sdma.c
+++ b/drivers/gpu/drm/radeon/cik_sdma.c
@@ -134,7 +134,7 @@ void cik_sdma_ring_ib_execute(struct radeon_device *rdev,
 			      struct radeon_ib *ib)
 {
 	struct radeon_ring *ring = &rdev->ring[ib->ring];
-	u32 extra_bits = (ib->vm ? ib->vm->id : 0) & 0xf;
+	u32 extra_bits = (ib->vm ? ib->vm->ids[ib->ring].id : 0) & 0xf;
 
 	if (rdev->wb.enabled) {
 		u32 next_rptr = ring->wptr + 5;
@@ -541,31 +541,27 @@ struct radeon_fence *cik_copy_dma(struct radeon_device *rdev,
 				  unsigned num_gpu_pages,
 				  struct reservation_object *resv)
 {
-	struct radeon_semaphore *sem = NULL;
 	struct radeon_fence *fence;
+	struct radeon_sync sync;
 	int ring_index = rdev->asic->copy.dma_ring_index;
 	struct radeon_ring *ring = &rdev->ring[ring_index];
 	u32 size_in_bytes, cur_size_in_bytes;
 	int i, num_loops;
 	int r = 0;
 
-	r = radeon_semaphore_create(rdev, &sem);
-	if (r) {
-		DRM_ERROR("radeon: moving bo (%d).\n", r);
-		return ERR_PTR(r);
-	}
+	radeon_sync_create(&sync);
 
 	size_in_bytes = (num_gpu_pages << RADEON_GPU_PAGE_SHIFT);
 	num_loops = DIV_ROUND_UP(size_in_bytes, 0x1fffff);
 	r = radeon_ring_lock(rdev, ring, num_loops * 7 + 14);
 	if (r) {
 		DRM_ERROR("radeon: moving bo (%d).\n", r);
-		radeon_semaphore_free(rdev, &sem, NULL);
+		radeon_sync_free(rdev, &sync, NULL);
 		return ERR_PTR(r);
 	}
 
-	radeon_semaphore_sync_resv(rdev, sem, resv, false);
-	radeon_semaphore_sync_rings(rdev, sem, ring->idx);
+	radeon_sync_resv(rdev, &sync, resv, false);
+	radeon_sync_rings(rdev, &sync, ring->idx);
 
 	for (i = 0; i < num_loops; i++) {
 		cur_size_in_bytes = size_in_bytes;
@@ -586,12 +582,12 @@ struct radeon_fence *cik_copy_dma(struct radeon_device *rdev,
 	r = radeon_fence_emit(rdev, &fence, ring->idx);
 	if (r) {
 		radeon_ring_unlock_undo(rdev, ring);
-		radeon_semaphore_free(rdev, &sem, NULL);
+		radeon_sync_free(rdev, &sync, NULL);
 		return ERR_PTR(r);
 	}
 
 	radeon_ring_unlock_commit(rdev, ring, false);
-	radeon_semaphore_free(rdev, &sem, fence);
+	radeon_sync_free(rdev, &sync, fence);
 
 	return fence;
 }
@@ -904,25 +900,21 @@ void cik_sdma_vm_pad_ib(struct radeon_ib *ib)
  * Update the page table base and flush the VM TLB
  * using sDMA (CIK).
  */
-void cik_dma_vm_flush(struct radeon_device *rdev, int ridx, struct radeon_vm *vm)
+void cik_dma_vm_flush(struct radeon_device *rdev, struct radeon_ring *ring,
+		      unsigned vm_id, uint64_t pd_addr)
 {
-	struct radeon_ring *ring = &rdev->ring[ridx];
-
-	if (vm == NULL)
-		return;
-
 	radeon_ring_write(ring, SDMA_PACKET(SDMA_OPCODE_SRBM_WRITE, 0, 0xf000));
-	if (vm->id < 8) {
-		radeon_ring_write(ring, (VM_CONTEXT0_PAGE_TABLE_BASE_ADDR + (vm->id << 2)) >> 2);
+	if (vm_id < 8) {
+		radeon_ring_write(ring, (VM_CONTEXT0_PAGE_TABLE_BASE_ADDR + (vm_id << 2)) >> 2);
 	} else {
-		radeon_ring_write(ring, (VM_CONTEXT8_PAGE_TABLE_BASE_ADDR + ((vm->id - 8) << 2)) >> 2);
+		radeon_ring_write(ring, (VM_CONTEXT8_PAGE_TABLE_BASE_ADDR + ((vm_id - 8) << 2)) >> 2);
 	}
-	radeon_ring_write(ring, vm->pd_gpu_addr >> 12);
+	radeon_ring_write(ring, pd_addr >> 12);
 
 	/* update SH_MEM_* regs */
 	radeon_ring_write(ring, SDMA_PACKET(SDMA_OPCODE_SRBM_WRITE, 0, 0xf000));
 	radeon_ring_write(ring, SRBM_GFX_CNTL >> 2);
-	radeon_ring_write(ring, VMID(vm->id));
+	radeon_ring_write(ring, VMID(vm_id));
 
 	radeon_ring_write(ring, SDMA_PACKET(SDMA_OPCODE_SRBM_WRITE, 0, 0xf000));
 	radeon_ring_write(ring, SH_MEM_BASES >> 2);
@@ -945,11 +937,11 @@ void cik_dma_vm_flush(struct radeon_device *rdev, int ridx, struct radeon_vm *vm
 	radeon_ring_write(ring, VMID(0));
 
 	/* flush HDP */
-	cik_sdma_hdp_flush_ring_emit(rdev, ridx);
+	cik_sdma_hdp_flush_ring_emit(rdev, ring->idx);
 
 	/* flush TLB */
 	radeon_ring_write(ring, SDMA_PACKET(SDMA_OPCODE_SRBM_WRITE, 0, 0xf000));
 	radeon_ring_write(ring, VM_INVALIDATE_REQUEST >> 2);
-	radeon_ring_write(ring, 1 << vm->id);
+	radeon_ring_write(ring, 1 << vm_id);
 }
 
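Note on the cik_dma_vm_flush() hunk above: the function now takes the VMID and page directory address directly, and picks one of two register banks depending on whether the VMID is below 8. The following is a minimal standalone sketch of that arithmetic, not the driver code. The two base offsets are placeholder values chosen for the example (the real ones are defined in cikd.h and do not appear in this hunk), and the `ex_` helper names are hypothetical.

```c
#include <stdint.h>

/* Placeholder byte offsets, assumed for illustration only. */
#define EX_VM_CONTEXT0_PT_BASE 0x1000u
#define EX_VM_CONTEXT8_PT_BASE 0x2000u

/* Dword index of the page-table base register for a given VMID.
 * VMIDs 0-7 index from the CONTEXT0 bank, 8-15 from the CONTEXT8 bank. */
static uint32_t ex_vm_pt_base_reg_dword(unsigned vm_id)
{
	if (vm_id < 8)
		return (EX_VM_CONTEXT0_PT_BASE + (vm_id << 2)) >> 2;
	return (EX_VM_CONTEXT8_PT_BASE + ((vm_id - 8) << 2)) >> 2;
}

/* Value written for the page directory: the address shifted right by 12,
 * mirroring "pd_addr >> 12" in the hunk. */
static uint32_t ex_vm_pd_base_value(uint64_t pd_addr)
{
	return (uint32_t)(pd_addr >> 12);
}

/* One-hot mask written to the invalidate request register,
 * mirroring "1 << vm_id" in the hunk. */
static uint32_t ex_vm_invalidate_mask(unsigned vm_id)
{
	return 1u << vm_id;
}
```

Passing plain values instead of a `struct radeon_vm *` means the flush helper no longer needs a NULL check, which is consistent with the check being dropped in the hunk.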
diff --git a/drivers/gpu/drm/radeon/cikd.h b/drivers/gpu/drm/radeon/cikd.h
index 0c6e1b55d968..ba85986febea 100644
--- a/drivers/gpu/drm/radeon/cikd.h
+++ b/drivers/gpu/drm/radeon/cikd.h
@@ -30,6 +30,8 @@
 #define CIK_RB_BITMAP_WIDTH_PER_SH     2
 #define HAWAII_RB_BITMAP_WIDTH_PER_SH  4
 
+#define RADEON_NUM_OF_VMIDS	8
+
 /* DIDT IND registers */
 #define DIDT_SQ_CTRL0                                     0x0
 #       define DIDT_CTRL_EN                               (1 << 0)
@@ -184,7 +186,10 @@
 #define		DIG_THERM_DPM(x)			((x) << 14)
 #define		DIG_THERM_DPM_MASK			0x003FC000
 #define		DIG_THERM_DPM_SHIFT			14
-
+#define	CG_THERMAL_STATUS				0xC0300008
+#define		FDO_PWM_DUTY(x)				((x) << 9)
+#define		FDO_PWM_DUTY_MASK			(0xff << 9)
+#define		FDO_PWM_DUTY_SHIFT			9
 #define	CG_THERMAL_INT					0xC030000C
 #define		CI_DIG_THERM_INTH(x)			((x) << 8)
 #define		CI_DIG_THERM_INTH_MASK			0x0000FF00
@@ -194,7 +199,10 @@
 #define		CI_DIG_THERM_INTL_SHIFT			16
 #define 	THERM_INT_MASK_HIGH			(1 << 24)
 #define 	THERM_INT_MASK_LOW			(1 << 25)
-
+#define	CG_MULT_THERMAL_CTRL				0xC0300010
+#define		TEMP_SEL(x)				((x) << 20)
+#define		TEMP_SEL_MASK				(0xff << 20)
+#define		TEMP_SEL_SHIFT				20
 #define	CG_MULT_THERMAL_STATUS				0xC0300014
 #define		ASIC_MAX_TEMP(x)			((x) << 0)
 #define		ASIC_MAX_TEMP_MASK			0x000001ff
@@ -203,6 +211,36 @@
 #define		CTF_TEMP_MASK				0x0003fe00
 #define		CTF_TEMP_SHIFT				9
 
+#define	CG_FDO_CTRL0					0xC0300064
+#define		FDO_STATIC_DUTY(x)			((x) << 0)
+#define		FDO_STATIC_DUTY_MASK			0x000000FF
+#define		FDO_STATIC_DUTY_SHIFT			0
+#define	CG_FDO_CTRL1					0xC0300068
+#define		FMAX_DUTY100(x)				((x) << 0)
+#define		FMAX_DUTY100_MASK			0x000000FF
+#define		FMAX_DUTY100_SHIFT			0
+#define	CG_FDO_CTRL2					0xC030006C
+#define		TMIN(x)					((x) << 0)
+#define		TMIN_MASK				0x000000FF
+#define		TMIN_SHIFT				0
+#define		FDO_PWM_MODE(x)				((x) << 11)
+#define		FDO_PWM_MODE_MASK			(7 << 11)
+#define		FDO_PWM_MODE_SHIFT			11
+#define		TACH_PWM_RESP_RATE(x)			((x) << 25)
+#define		TACH_PWM_RESP_RATE_MASK			(0x7f << 25)
+#define		TACH_PWM_RESP_RATE_SHIFT		25
+#define CG_TACH_CTRL                                    0xC0300070
+#       define EDGE_PER_REV(x)                          ((x) << 0)
+#       define EDGE_PER_REV_MASK                        (0x7 << 0)
+#       define EDGE_PER_REV_SHIFT                       0
+#       define TARGET_PERIOD(x)                         ((x) << 3)
+#       define TARGET_PERIOD_MASK                       0xfffffff8
+#       define TARGET_PERIOD_SHIFT                      3
+#define CG_TACH_STATUS                                  0xC0300074
+#       define TACH_PERIOD(x)                           ((x) << 0)
+#       define TACH_PERIOD_MASK                         0xffffffff
+#       define TACH_PERIOD_SHIFT                        0
+
 #define CG_ECLK_CNTL                                    0xC05000AC
 #       define ECLK_DIVIDER_MASK                        0x7f
 #       define ECLK_DIR_CNTL_EN                         (1 << 8)
@@ -1137,6 +1175,9 @@
 #define			SH_MEM_ALIGNMENT_MODE_UNALIGNED			3
 #define		DEFAULT_MTYPE(x)				((x) << 4)
 #define		APE1_MTYPE(x)					((x) << 7)
+/* valid for both DEFAULT_MTYPE and APE1_MTYPE */
+#define	MTYPE_CACHED					0
+#define	MTYPE_NONCACHED					3
 
 #define	SX_DEBUG_1					0x9060
 
@@ -1447,6 +1488,16 @@
 #define CP_HQD_ACTIVE                                     0xC91C
 #define CP_HQD_VMID                                       0xC920
 
+#define CP_HQD_PERSISTENT_STATE				0xC924u
+#define	DEFAULT_CP_HQD_PERSISTENT_STATE			(0x33U << 8)
+
+#define CP_HQD_PIPE_PRIORITY				0xC928u
+#define CP_HQD_QUEUE_PRIORITY				0xC92Cu
+#define CP_HQD_QUANTUM					0xC930u
+#define	QUANTUM_EN					1U
+#define	QUANTUM_SCALE_1MS				(1U << 4)
+#define	QUANTUM_DURATION(x)				((x) << 8)
+
 #define CP_HQD_PQ_BASE                                    0xC934
 #define CP_HQD_PQ_BASE_HI                                 0xC938
 #define CP_HQD_PQ_RPTR                                    0xC93C
@@ -1474,12 +1525,32 @@
 #define		PRIV_STATE      			(1 << 30)
 #define		KMD_QUEUE      				(1 << 31)
 
-#define CP_HQD_DEQUEUE_REQUEST                          0xC974
+#define CP_HQD_IB_BASE_ADDR				0xC95Cu
+#define CP_HQD_IB_BASE_ADDR_HI			0xC960u
+#define CP_HQD_IB_RPTR					0xC964u
+#define CP_HQD_IB_CONTROL				0xC968u
+#define	IB_ATC_EN					(1U << 23)
+#define	DEFAULT_MIN_IB_AVAIL_SIZE			(3U << 20)
+
+#define CP_HQD_DEQUEUE_REQUEST			0xC974
+#define	DEQUEUE_REQUEST_DRAIN				1
+#define DEQUEUE_REQUEST_RESET				2
 
 #define CP_MQD_CONTROL                                  0xC99C
 #define		MQD_VMID(x)				((x) << 0)
 #define		MQD_VMID_MASK      			(0xf << 0)
 
+#define CP_HQD_SEMA_CMD					0xC97Cu
+#define CP_HQD_MSG_TYPE					0xC980u
+#define CP_HQD_ATOMIC0_PREOP_LO			0xC984u
+#define CP_HQD_ATOMIC0_PREOP_HI			0xC988u
+#define CP_HQD_ATOMIC1_PREOP_LO			0xC98Cu
+#define CP_HQD_ATOMIC1_PREOP_HI			0xC990u
+#define CP_HQD_HQ_SCHEDULER0			0xC994u
+#define CP_HQD_HQ_SCHEDULER1			0xC998u
+
+#define SH_STATIC_MEM_CONFIG			0x9604u
+
 #define DB_RENDER_CONTROL                               0x28000
 
 #define PA_SC_RASTER_CONFIG                             0x28350
@@ -2069,4 +2140,20 @@
 #define VCE_CMD_IB_AUTO		0x00000005
 #define VCE_CMD_SEMAPHORE	0x00000006
 
+#define ATC_VMID0_PASID_MAPPING					0x339Cu
+#define	ATC_VMID_PASID_MAPPING_UPDATE_STATUS	0x3398u
+#define	ATC_VMID_PASID_MAPPING_VALID				(1U << 31)
+
+#define ATC_VM_APERTURE0_CNTL					0x3310u
+#define	ATS_ACCESS_MODE_NEVER						0
+#define	ATS_ACCESS_MODE_ALWAYS						1
+
+#define ATC_VM_APERTURE0_CNTL2					0x3318u
+#define ATC_VM_APERTURE0_HIGH_ADDR				0x3308u
+#define ATC_VM_APERTURE0_LOW_ADDR				0x3300u
+#define ATC_VM_APERTURE1_CNTL					0x3314u
+#define ATC_VM_APERTURE1_CNTL2					0x331Cu
+#define ATC_VM_APERTURE1_HIGH_ADDR				0x330Cu
+#define ATC_VM_APERTURE1_LOW_ADDR				0x3304u
+
 #endif
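Note on the fan-control defines added to cikd.h above: each field comes as a `NAME(x)`, `NAME_MASK`, `NAME_SHIFT` triple. The sketch below shows one plausible way such triples are used for read-modify-write and field extraction. The mask and shift values are copied from the hunk. The `ex_` helpers, and the idea of expressing the PWM duty as a percentage of the 100% reference, are assumptions made for illustration and are not taken from this diff.

```c
#include <stdint.h>

/* Field layout copied from the defines in the hunk above. */
#define FDO_PWM_DUTY_MASK   (0xffu << 9)
#define FDO_PWM_DUTY_SHIFT  9
#define FMAX_DUTY100_MASK   0x000000FFu
#define FMAX_DUTY100_SHIFT  0

/* Extract a field from a raw register value. */
static uint32_t ex_get_field(uint32_t reg, uint32_t mask, unsigned shift)
{
	return (reg & mask) >> shift;
}

/* Replace one field in a register value and leave the other bits alone. */
static uint32_t ex_set_field(uint32_t reg, uint32_t mask, unsigned shift,
			     uint32_t val)
{
	return (reg & ~mask) | ((val << shift) & mask);
}

/* Hypothetical helper: current duty as a percentage of the 100% reference.
 * Returns -1 when the reference is zero, so callers can treat that as
 * "no usable fan data". The result is clamped to 100. */
static int ex_fan_duty_percent(uint32_t thermal_status, uint32_t fdo_ctrl1)
{
	uint32_t duty = ex_get_field(thermal_status, FDO_PWM_DUTY_MASK,
				     FDO_PWM_DUTY_SHIFT);
	uint32_t duty100 = ex_get_field(fdo_ctrl1, FMAX_DUTY100_MASK,
					FMAX_DUTY100_SHIFT);
	uint32_t pct;

	if (duty100 == 0)
		return -1;
	pct = duty * 100 / duty100;
	return pct > 100 ? 100 : (int)pct;
}
```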
diff --git a/drivers/gpu/drm/radeon/evergreen_cs.c b/drivers/gpu/drm/radeon/evergreen_cs.c
index 5c8b358f9fba..924b1b7ab455 100644
--- a/drivers/gpu/drm/radeon/evergreen_cs.c
+++ b/drivers/gpu/drm/radeon/evergreen_cs.c
@@ -35,7 +35,7 @@
 #define MIN(a,b)                   (((a)<(b))?(a):(b))
 
 int r600_dma_cs_next_reloc(struct radeon_cs_parser *p,
-			   struct radeon_cs_reloc **cs_reloc);
+			   struct radeon_bo_list **cs_reloc);
 struct evergreen_cs_track {
 	u32			group_size;
 	u32			nbanks;
@@ -1094,7 +1094,7 @@ static int evergreen_cs_parse_packet0(struct radeon_cs_parser *p,
 static int evergreen_cs_check_reg(struct radeon_cs_parser *p, u32 reg, u32 idx)
 {
 	struct evergreen_cs_track *track = (struct evergreen_cs_track *)p->track;
-	struct radeon_cs_reloc *reloc;
+	struct radeon_bo_list *reloc;
 	u32 last_reg;
 	u32 m, i, tmp, *ib;
 	int r;
@@ -1792,7 +1792,7 @@ static bool evergreen_is_safe_reg(struct radeon_cs_parser *p, u32 reg, u32 idx)
 static int evergreen_packet3_check(struct radeon_cs_parser *p,
 				   struct radeon_cs_packet *pkt)
 {
-	struct radeon_cs_reloc *reloc;
+	struct radeon_bo_list *reloc;
 	struct evergreen_cs_track *track;
 	volatile u32 *ib;
 	unsigned idx;
@@ -2661,7 +2661,7 @@ int evergreen_cs_parse(struct radeon_cs_parser *p)
 			p->track = NULL;
 			return r;
 		}
-	} while (p->idx < p->chunks[p->chunk_ib_idx].length_dw);
+	} while (p->idx < p->chunk_ib->length_dw);
 #if 0
 	for (r = 0; r < p->ib.length_dw; r++) {
 		printk(KERN_INFO "%05d  0x%08X\n", r, p->ib.ptr[r]);
@@ -2684,8 +2684,8 @@ int evergreen_cs_parse(struct radeon_cs_parser *p)
  **/
 int evergreen_dma_cs_parse(struct radeon_cs_parser *p)
 {
-	struct radeon_cs_chunk *ib_chunk = &p->chunks[p->chunk_ib_idx];
-	struct radeon_cs_reloc *src_reloc, *dst_reloc, *dst2_reloc;
+	struct radeon_cs_chunk *ib_chunk = p->chunk_ib;
+	struct radeon_bo_list *src_reloc, *dst_reloc, *dst2_reloc;
 	u32 header, cmd, count, sub_cmd;
 	volatile u32 *ib = p->ib.ptr;
 	u32 idx;
@@ -3100,7 +3100,7 @@ int evergreen_dma_cs_parse(struct radeon_cs_parser *p)
 			DRM_ERROR("Unknown packet type %d at %d !\n", cmd, idx);
 			return -EINVAL;
 		}
-	} while (p->idx < p->chunks[p->chunk_ib_idx].length_dw);
+	} while (p->idx < p->chunk_ib->length_dw);
 #if 0
 	for (r = 0; r < p->ib->length_dw; r++) {
 		printk(KERN_INFO "%05d  0x%08X\n", r, p->ib.ptr[r]);
diff --git a/drivers/gpu/drm/radeon/evergreen_dma.c b/drivers/gpu/drm/radeon/evergreen_dma.c
index 66bcfadeedd1..96535aa8659c 100644
--- a/drivers/gpu/drm/radeon/evergreen_dma.c
+++ b/drivers/gpu/drm/radeon/evergreen_dma.c
@@ -110,31 +110,27 @@ struct radeon_fence *evergreen_copy_dma(struct radeon_device *rdev,
 					unsigned num_gpu_pages,
 					struct reservation_object *resv)
 {
-	struct radeon_semaphore *sem = NULL;
 	struct radeon_fence *fence;
+	struct radeon_sync sync;
 	int ring_index = rdev->asic->copy.dma_ring_index;
 	struct radeon_ring *ring = &rdev->ring[ring_index];
 	u32 size_in_dw, cur_size_in_dw;
 	int i, num_loops;
 	int r = 0;
 
-	r = radeon_semaphore_create(rdev, &sem);
-	if (r) {
-		DRM_ERROR("radeon: moving bo (%d).\n", r);
-		return ERR_PTR(r);
-	}
+	radeon_sync_create(&sync);
 
 	size_in_dw = (num_gpu_pages << RADEON_GPU_PAGE_SHIFT) / 4;
 	num_loops = DIV_ROUND_UP(size_in_dw, 0xfffff);
 	r = radeon_ring_lock(rdev, ring, num_loops * 5 + 11);
 	if (r) {
 		DRM_ERROR("radeon: moving bo (%d).\n", r);
-		radeon_semaphore_free(rdev, &sem, NULL);
+		radeon_sync_free(rdev, &sync, NULL);
 		return ERR_PTR(r);
 	}
 
-	radeon_semaphore_sync_resv(rdev, sem, resv, false);
-	radeon_semaphore_sync_rings(rdev, sem, ring->idx);
+	radeon_sync_resv(rdev, &sync, resv, false);
+	radeon_sync_rings(rdev, &sync, ring->idx);
 
 	for (i = 0; i < num_loops; i++) {
 		cur_size_in_dw = size_in_dw;
@@ -153,12 +149,12 @@ struct radeon_fence *evergreen_copy_dma(struct radeon_device *rdev,
 	r = radeon_fence_emit(rdev, &fence, ring->idx);
 	if (r) {
 		radeon_ring_unlock_undo(rdev, ring);
-		radeon_semaphore_free(rdev, &sem, NULL);
+		radeon_sync_free(rdev, &sync, NULL);
 		return ERR_PTR(r);
 	}
 
 	radeon_ring_unlock_commit(rdev, ring, false);
-	radeon_semaphore_free(rdev, &sem, fence);
+	radeon_sync_free(rdev, &sync, fence);
 
 	return fence;
 }
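Note on evergreen_copy_dma() above: the ring space reserved by `radeon_ring_lock()` depends on how many DMA packets the copy is split into. The sketch below reproduces only that sizing arithmetic as standalone C. It assumes a GPU page shift of 12 (the value of `RADEON_GPU_PAGE_SHIFT` is not shown in this hunk), and the per-chunk clamp is an assumption about the elided loop body. All `ex_` names are hypothetical.

```c
#include <stdint.h>

#define EX_GPU_PAGE_SHIFT    12        /* assumed: 4 KiB GPU pages */
#define EX_MAX_DW_PER_PACKET 0xfffffu  /* divisor used in the hunk */

static uint32_t ex_div_round_up(uint32_t n, uint32_t d)
{
	return (n + d - 1) / d;
}

/* Number of DMA packets needed to copy num_gpu_pages pages. */
static uint32_t ex_copy_num_loops(uint32_t num_gpu_pages)
{
	uint32_t size_in_dw = (num_gpu_pages << EX_GPU_PAGE_SHIFT) / 4;

	return ex_div_round_up(size_in_dw, EX_MAX_DW_PER_PACKET);
}

/* Ring dwords requested, using the "num_loops * 5 + 11" formula
 * from the hunk. */
static uint32_t ex_copy_ring_dwords(uint32_t num_gpu_pages)
{
	return ex_copy_num_loops(num_gpu_pages) * 5 + 11;
}

/* Assumed per-iteration clamp: copy at most one packet's worth. */
static uint32_t ex_copy_chunk_dw(uint32_t remaining_dw)
{
	return remaining_dw > EX_MAX_DW_PER_PACKET ?
		EX_MAX_DW_PER_PACKET : remaining_dw;
}
```

Because `struct radeon_sync` lives on the stack and `radeon_sync_create()` returns void in the new code, the sizing and lock step is the first point in the function that can fail.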
diff --git a/drivers/gpu/drm/radeon/ni.c b/drivers/gpu/drm/radeon/ni.c
index 3faee58946dd..360de9f1f491 100644
--- a/drivers/gpu/drm/radeon/ni.c
+++ b/drivers/gpu/drm/radeon/ni.c
@@ -1373,6 +1373,7 @@ void cayman_fence_ring_emit(struct radeon_device *rdev,
 void cayman_ring_ib_execute(struct radeon_device *rdev, struct radeon_ib *ib)
 {
 	struct radeon_ring *ring = &rdev->ring[ib->ring];
+	unsigned vm_id = ib->vm ? ib->vm->ids[ib->ring].id : 0;
 	u32 cp_coher_cntl = PACKET3_FULL_CACHE_ENA | PACKET3_TC_ACTION_ENA |
 		PACKET3_SH_ACTION_ENA;
 
@@ -1395,15 +1396,14 @@ void cayman_ring_ib_execute(struct radeon_device *rdev, struct radeon_ib *ib)
 #endif
 			  (ib->gpu_addr & 0xFFFFFFFC));
 	radeon_ring_write(ring, upper_32_bits(ib->gpu_addr) & 0xFF);
-	radeon_ring_write(ring, ib->length_dw | 
-			  (ib->vm ? (ib->vm->id << 24) : 0));
+	radeon_ring_write(ring, ib->length_dw | (vm_id << 24));
 
 	/* flush read cache over gart for this vmid */
 	radeon_ring_write(ring, PACKET3(PACKET3_SURFACE_SYNC, 3));
 	radeon_ring_write(ring, PACKET3_ENGINE_ME | cp_coher_cntl);
 	radeon_ring_write(ring, 0xFFFFFFFF);
 	radeon_ring_write(ring, 0);
-	radeon_ring_write(ring, ((ib->vm ? ib->vm->id : 0) << 24) | 10); /* poll interval */
+	radeon_ring_write(ring, (vm_id << 24) | 10); /* poll interval */
 }
 
 static void cayman_cp_enable(struct radeon_device *rdev, bool enable)
@@ -2502,15 +2502,11 @@ void cayman_vm_decode_fault(struct radeon_device *rdev,
  * Update the page table base and flush the VM TLB
  * using the CP (cayman-si).
  */
-void cayman_vm_flush(struct radeon_device *rdev, int ridx, struct radeon_vm *vm)
+void cayman_vm_flush(struct radeon_device *rdev, struct radeon_ring *ring,
+		     unsigned vm_id, uint64_t pd_addr)
 {
-	struct radeon_ring *ring = &rdev->ring[ridx];
-
-	if (vm == NULL)
-		return;
-
-	radeon_ring_write(ring, PACKET0(VM_CONTEXT0_PAGE_TABLE_BASE_ADDR + (vm->id << 2), 0));
-	radeon_ring_write(ring, vm->pd_gpu_addr >> 12);
+	radeon_ring_write(ring, PACKET0(VM_CONTEXT0_PAGE_TABLE_BASE_ADDR + (vm_id << 2), 0));
+	radeon_ring_write(ring, pd_addr >> 12);
 
 	/* flush hdp cache */
 	radeon_ring_write(ring, PACKET0(HDP_MEM_COHERENCY_FLUSH_CNTL, 0));
@@ -2518,7 +2514,7 @@ void cayman_vm_flush(struct radeon_device *rdev, int ridx, struct radeon_vm *vm)
 
 	/* bits 0-7 are the VM contexts0-7 */
 	radeon_ring_write(ring, PACKET0(VM_INVALIDATE_REQUEST, 0));
-	radeon_ring_write(ring, 1 << vm->id);
+	radeon_ring_write(ring, 1 << vm_id);
 
 	/* sync PFP to ME, otherwise we might get invalid PFP reads */
 	radeon_ring_write(ring, PACKET3(PACKET3_PFP_SYNC_ME, 0));
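Note on the cayman_ring_ib_execute() change above: the VMID is now looked up per ring (`ib->vm->ids[ib->ring].id`) instead of once per VM, and then packed into bits 24 and up of the dword that carries the IB length. The sketch below models that lookup and packing with simplified stand-in structures. The `ex_` types, their fields, and the ring count of 8 are assumptions for illustration, not the real radeon definitions.

```c
#include <stddef.h>
#include <stdint.h>

#define EX_NUM_RINGS 8	/* assumed ring count for the example */

struct ex_vm_id {
	unsigned id;
};

struct ex_vm {
	struct ex_vm_id ids[EX_NUM_RINGS];	/* one VMID slot per ring */
};

struct ex_ib {
	int ring;
	struct ex_vm *vm;	/* NULL for IBs that do not use a VM */
	uint32_t length_dw;
};

/* VMID for this IB on its ring, or 0 when the IB has no VM. */
static unsigned ex_ib_vm_id(const struct ex_ib *ib)
{
	return ib->vm ? ib->vm->ids[ib->ring].id : 0;
}

/* IB size dword: length in the low bits, VMID shifted to bit 24. */
static uint32_t ex_ib_size_dword(const struct ex_ib *ib)
{
	return ib->length_dw | ((uint32_t)ex_ib_vm_id(ib) << 24);
}
```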
diff --git a/drivers/gpu/drm/radeon/ni_dma.c b/drivers/gpu/drm/radeon/ni_dma.c
index f26f0a9fb522..50f88611ff60 100644
--- a/drivers/gpu/drm/radeon/ni_dma.c
+++ b/drivers/gpu/drm/radeon/ni_dma.c
@@ -123,6 +123,7 @@ void cayman_dma_ring_ib_execute(struct radeon_device *rdev,
 				struct radeon_ib *ib)
 {
 	struct radeon_ring *ring = &rdev->ring[ib->ring];
+	unsigned vm_id = ib->vm ? ib->vm->ids[ib->ring].id : 0;
 
 	if (rdev->wb.enabled) {
 		u32 next_rptr = ring->wptr + 4;
@@ -140,7 +141,7 @@ void cayman_dma_ring_ib_execute(struct radeon_device *rdev,
 	 */
 	while ((ring->wptr & 7) != 5)
 		radeon_ring_write(ring, DMA_PACKET(DMA_PACKET_NOP, 0, 0, 0));
-	radeon_ring_write(ring, DMA_IB_PACKET(DMA_PACKET_INDIRECT_BUFFER, ib->vm ? ib->vm->id : 0, 0));
+	radeon_ring_write(ring, DMA_IB_PACKET(DMA_PACKET_INDIRECT_BUFFER, vm_id, 0));
 	radeon_ring_write(ring, (ib->gpu_addr & 0xFFFFFFE0));
 	radeon_ring_write(ring, (ib->length_dw << 12) | (upper_32_bits(ib->gpu_addr) & 0xFF));
 
@@ -446,16 +447,12 @@ void cayman_dma_vm_pad_ib(struct radeon_ib *ib)
 		ib->ptr[ib->length_dw++] = DMA_PACKET(DMA_PACKET_NOP, 0, 0, 0);
 }
 
-void cayman_dma_vm_flush(struct radeon_device *rdev, int ridx, struct radeon_vm *vm)
+void cayman_dma_vm_flush(struct radeon_device *rdev, struct radeon_ring *ring,
+			 unsigned vm_id, uint64_t pd_addr)
 {
-	struct radeon_ring *ring = &rdev->ring[ridx];
-
-	if (vm == NULL)
-		return;
-
 	radeon_ring_write(ring, DMA_PACKET(DMA_PACKET_SRBM_WRITE, 0, 0, 0));
-	radeon_ring_write(ring, (0xf << 16) | ((VM_CONTEXT0_PAGE_TABLE_BASE_ADDR + (vm->id << 2)) >> 2));
-	radeon_ring_write(ring, vm->pd_gpu_addr >> 12);
+	radeon_ring_write(ring, (0xf << 16) | ((VM_CONTEXT0_PAGE_TABLE_BASE_ADDR + (vm_id << 2)) >> 2));
+	radeon_ring_write(ring, pd_addr >> 12);
 
 	/* flush hdp cache */
 	radeon_ring_write(ring, DMA_PACKET(DMA_PACKET_SRBM_WRITE, 0, 0, 0));
@@ -465,6 +462,6 @@ void cayman_dma_vm_flush(struct radeon_device *rdev, int ridx, struct radeon_vm
 	/* bits 0-7 are the VM contexts0-7 */
 	radeon_ring_write(ring, DMA_PACKET(DMA_PACKET_SRBM_WRITE, 0, 0, 0));
 	radeon_ring_write(ring, (0xf << 16) | (VM_INVALIDATE_REQUEST >> 2));
-	radeon_ring_write(ring, 1 << vm->id);
+	radeon_ring_write(ring, 1 << vm_id);
 }
 
diff --git a/drivers/gpu/drm/radeon/ppsmc.h b/drivers/gpu/drm/radeon/ppsmc.h
index 5670b8291285..7e5724a12f8b 100644
--- a/drivers/gpu/drm/radeon/ppsmc.h
+++ b/drivers/gpu/drm/radeon/ppsmc.h
@@ -56,6 +56,14 @@
 #define PPSMC_STATEFLAG_DEEPSLEEP_THROTTLE 0x20
 #define PPSMC_STATEFLAG_DEEPSLEEP_BYPASS   0x40
 
+#define FDO_MODE_HARDWARE 0
+#define FDO_MODE_PIECE_WISE_LINEAR 1
+
+enum FAN_CONTROL {
+	FAN_CONTROL_FUZZY,
+	FAN_CONTROL_TABLE
+};
+
 #define PPSMC_Result_OK             ((uint8_t)0x01)
 #define PPSMC_Result_Failed         ((uint8_t)0xFF)
 
@@ -79,6 +87,8 @@ typedef uint8_t PPSMC_Result;
 #define PPSMC_MSG_DisableCac                ((uint8_t)0x54)
 #define PPSMC_TDPClampingActive             ((uint8_t)0x59)
 #define PPSMC_TDPClampingInactive           ((uint8_t)0x5A)
+#define PPSMC_StartFanControl               ((uint8_t)0x5B)
+#define PPSMC_StopFanControl                ((uint8_t)0x5C)
 #define PPSMC_MSG_NoDisplay                 ((uint8_t)0x5D)
 #define PPSMC_MSG_HasDisplay                ((uint8_t)0x5E)
 #define PPSMC_MSG_UVDPowerOFF               ((uint8_t)0x60)
@@ -106,6 +116,7 @@ typedef uint8_t PPSMC_Result;
 #define PPSMC_MSG_SAMUDPM_SetEnabledMask      ((uint16_t) 0x130)
 #define PPSMC_MSG_MCLKDPM_ForceState          ((uint16_t) 0x131)
 #define PPSMC_MSG_MCLKDPM_NoForcedLevel       ((uint16_t) 0x132)
+#define PPSMC_MSG_Thermal_Cntl_Disable        ((uint16_t) 0x133)
 #define PPSMC_MSG_Voltage_Cntl_Disable        ((uint16_t) 0x135)
 #define PPSMC_MSG_PCIeDPM_Enable              ((uint16_t) 0x136)
 #define PPSMC_MSG_PCIeDPM_Disable             ((uint16_t) 0x13d)
@@ -149,6 +160,10 @@ typedef uint8_t PPSMC_Result;
 #define PPSMC_MSG_MASTER_DeepSleep_ON         ((uint16_t) 0x18F)
 #define PPSMC_MSG_MASTER_DeepSleep_OFF        ((uint16_t) 0x190)
 #define PPSMC_MSG_Remove_DC_Clamp             ((uint16_t) 0x191)
+#define PPSMC_MSG_SetFanPwmMax                ((uint16_t) 0x19A)
+
+#define PPSMC_MSG_ENABLE_THERMAL_DPM          ((uint16_t) 0x19C)
+#define PPSMC_MSG_DISABLE_THERMAL_DPM         ((uint16_t) 0x19D)
 
 #define PPSMC_MSG_API_GetSclkFrequency        ((uint16_t) 0x200)
 #define PPSMC_MSG_API_GetMclkFrequency        ((uint16_t) 0x201)
@@ -157,10 +172,11 @@ typedef uint8_t PPSMC_Result;
 #define PPSMC_MSG_DPM_Config                ((uint32_t) 0x102)
 #define PPSMC_MSG_DPM_ForceState            ((uint32_t) 0x104)
 #define PPSMC_MSG_PG_SIMD_Config            ((uint32_t) 0x108)
-#define PPSMC_MSG_DPM_N_LevelsDisabled      ((uint32_t) 0x112)
+#define PPSMC_MSG_Thermal_Cntl_Enable       ((uint32_t) 0x10a)
 #define PPSMC_MSG_Voltage_Cntl_Enable       ((uint32_t) 0x109)
 #define PPSMC_MSG_VCEPowerOFF               ((uint32_t) 0x10e)
 #define PPSMC_MSG_VCEPowerON                ((uint32_t) 0x10f)
+#define PPSMC_MSG_DPM_N_LevelsDisabled      ((uint32_t) 0x112)
 #define PPSMC_MSG_DCE_RemoveVoltageAdjustment   ((uint32_t) 0x11d)
 #define PPSMC_MSG_DCE_AllowVoltageAdjustment    ((uint32_t) 0x11e)
 #define PPSMC_MSG_EnableBAPM                ((uint32_t) 0x120)
diff --git a/drivers/gpu/drm/radeon/pptable.h b/drivers/gpu/drm/radeon/pptable.h
index 2d532996c697..4c2eec49dadc 100644
--- a/drivers/gpu/drm/radeon/pptable.h
+++ b/drivers/gpu/drm/radeon/pptable.h
@@ -96,6 +96,14 @@ typedef struct _ATOM_PPLIB_FANTABLE2
     USHORT  usTMax;                          // The max temperature
 } ATOM_PPLIB_FANTABLE2;
 
+typedef struct _ATOM_PPLIB_FANTABLE3
+{
+	ATOM_PPLIB_FANTABLE2 basicTable2;
+	UCHAR ucFanControlMode;
+	USHORT usFanPWMMax;
+	USHORT usFanOutputSensitivity;
+} ATOM_PPLIB_FANTABLE3;
+
 typedef struct _ATOM_PPLIB_EXTENDEDHEADER
 {
     USHORT  usSize;
diff --git a/drivers/gpu/drm/radeon/r100.c b/drivers/gpu/drm/radeon/r100.c
index b53b31a7b76f..74f06d540591 100644
--- a/drivers/gpu/drm/radeon/r100.c
+++ b/drivers/gpu/drm/radeon/r100.c
@@ -1254,7 +1254,7 @@ int r100_reloc_pitch_offset(struct radeon_cs_parser *p,
 	int r;
 	u32 tile_flags = 0;
 	u32 tmp;
-	struct radeon_cs_reloc *reloc;
+	struct radeon_bo_list *reloc;
 	u32 value;
 
 	r = radeon_cs_packet_next_reloc(p, &reloc, 0);
@@ -1293,7 +1293,7 @@ int r100_packet3_load_vbpntr(struct radeon_cs_parser *p,
 			     int idx)
 {
 	unsigned c, i;
-	struct radeon_cs_reloc *reloc;
+	struct radeon_bo_list *reloc;
 	struct r100_cs_track *track;
 	int r = 0;
 	volatile uint32_t *ib;
@@ -1542,7 +1542,7 @@ static int r100_packet0_check(struct radeon_cs_parser *p,
 			      struct radeon_cs_packet *pkt,
 			      unsigned idx, unsigned reg)
 {
-	struct radeon_cs_reloc *reloc;
+	struct radeon_bo_list *reloc;
 	struct r100_cs_track *track;
 	volatile uint32_t *ib;
 	uint32_t tmp;
@@ -1901,7 +1901,7 @@ int r100_cs_track_check_pkt3_indx_buffer(struct radeon_cs_parser *p,
 static int r100_packet3_check(struct radeon_cs_parser *p,
 			      struct radeon_cs_packet *pkt)
 {
-	struct radeon_cs_reloc *reloc;
+	struct radeon_bo_list *reloc;
 	struct r100_cs_track *track;
 	unsigned idx;
 	volatile uint32_t *ib;
@@ -2061,7 +2061,7 @@ int r100_cs_parse(struct radeon_cs_parser *p)
 		}
 		if (r)
 			return r;
-	} while (p->idx < p->chunks[p->chunk_ib_idx].length_dw);
+	} while (p->idx < p->chunk_ib->length_dw);
 	return 0;
 }
 
diff --git a/drivers/gpu/drm/radeon/r200.c b/drivers/gpu/drm/radeon/r200.c
index 732d4938aab7..c70e6d5bcd19 100644
--- a/drivers/gpu/drm/radeon/r200.c
+++ b/drivers/gpu/drm/radeon/r200.c
@@ -146,7 +146,7 @@ int r200_packet0_check(struct radeon_cs_parser *p,
 		       struct radeon_cs_packet *pkt,
 		       unsigned idx, unsigned reg)
 {
-	struct radeon_cs_reloc *reloc;
+	struct radeon_bo_list *reloc;
 	struct r100_cs_track *track;
 	volatile uint32_t *ib;
 	uint32_t tmp;
diff --git a/drivers/gpu/drm/radeon/r300.c b/drivers/gpu/drm/radeon/r300.c
index 1bc4704034ce..064ad5569cca 100644
--- a/drivers/gpu/drm/radeon/r300.c
+++ b/drivers/gpu/drm/radeon/r300.c
@@ -598,7 +598,7 @@ static int r300_packet0_check(struct radeon_cs_parser *p,
 		struct radeon_cs_packet *pkt,
 		unsigned idx, unsigned reg)
 {
-	struct radeon_cs_reloc *reloc;
+	struct radeon_bo_list *reloc;
 	struct r100_cs_track *track;
 	volatile uint32_t *ib;
 	uint32_t tmp, tile_flags = 0;
@@ -1142,7 +1142,7 @@ fail:
 static int r300_packet3_check(struct radeon_cs_parser *p,
 			      struct radeon_cs_packet *pkt)
 {
-	struct radeon_cs_reloc *reloc;
+	struct radeon_bo_list *reloc;
 	struct r100_cs_track *track;
 	volatile uint32_t *ib;
 	unsigned idx;
@@ -1283,7 +1283,7 @@ int r300_cs_parse(struct radeon_cs_parser *p)
 		if (r) {
 			return r;
 		}
-	} while (p->idx < p->chunks[p->chunk_ib_idx].length_dw);
+	} while (p->idx < p->chunk_ib->length_dw);
 	return 0;
 }
 
diff --git a/drivers/gpu/drm/radeon/r600.c b/drivers/gpu/drm/radeon/r600.c
index 56b02927cd3d..ef5d6066fa5b 100644
--- a/drivers/gpu/drm/radeon/r600.c
+++ b/drivers/gpu/drm/radeon/r600.c
@@ -2889,31 +2889,27 @@ struct radeon_fence *r600_copy_cpdma(struct radeon_device *rdev,
 				     unsigned num_gpu_pages,
 				     struct reservation_object *resv)
 {
-	struct radeon_semaphore *sem = NULL;
 	struct radeon_fence *fence;
+	struct radeon_sync sync;
 	int ring_index = rdev->asic->copy.blit_ring_index;
 	struct radeon_ring *ring = &rdev->ring[ring_index];
 	u32 size_in_bytes, cur_size_in_bytes, tmp;
 	int i, num_loops;
 	int r = 0;
 
-	r = radeon_semaphore_create(rdev, &sem);
-	if (r) {
-		DRM_ERROR("radeon: moving bo (%d).\n", r);
-		return ERR_PTR(r);
-	}
+	radeon_sync_create(&sync);
 
 	size_in_bytes = (num_gpu_pages << RADEON_GPU_PAGE_SHIFT);
 	num_loops = DIV_ROUND_UP(size_in_bytes, 0x1fffff);
 	r = radeon_ring_lock(rdev, ring, num_loops * 6 + 24);
 	if (r) {
 		DRM_ERROR("radeon: moving bo (%d).\n", r);
-		radeon_semaphore_free(rdev, &sem, NULL);
+		radeon_sync_free(rdev, &sync, NULL);
 		return ERR_PTR(r);
 	}
 
-	radeon_semaphore_sync_resv(rdev, sem, resv, false);
-	radeon_semaphore_sync_rings(rdev, sem, ring->idx);
+	radeon_sync_resv(rdev, &sync, resv, false);
+	radeon_sync_rings(rdev, &sync, ring->idx);
 
 	radeon_ring_write(ring, PACKET3(PACKET3_SET_CONFIG_REG, 1));
 	radeon_ring_write(ring, (WAIT_UNTIL - PACKET3_SET_CONFIG_REG_OFFSET) >> 2);
@@ -2942,12 +2938,12 @@ struct radeon_fence *r600_copy_cpdma(struct radeon_device *rdev,
 	r = radeon_fence_emit(rdev, &fence, ring->idx);
 	if (r) {
 		radeon_ring_unlock_undo(rdev, ring);
-		radeon_semaphore_free(rdev, &sem, NULL);
+		radeon_sync_free(rdev, &sync, NULL);
 		return ERR_PTR(r);
 	}
 
 	radeon_ring_unlock_commit(rdev, ring, false);
-	radeon_semaphore_free(rdev, &sem, fence);
+	radeon_sync_free(rdev, &sync, fence);
 
 	return fence;
 }
diff --git a/drivers/gpu/drm/radeon/r600_cs.c b/drivers/gpu/drm/radeon/r600_cs.c
index c47537a1ddba..acc1f99c84d9 100644
--- a/drivers/gpu/drm/radeon/r600_cs.c
+++ b/drivers/gpu/drm/radeon/r600_cs.c
@@ -969,7 +969,7 @@ static int r600_cs_parse_packet0(struct radeon_cs_parser *p,
 static int r600_cs_check_reg(struct radeon_cs_parser *p, u32 reg, u32 idx)
 {
 	struct r600_cs_track *track = (struct r600_cs_track *)p->track;
-	struct radeon_cs_reloc *reloc;
+	struct radeon_bo_list *reloc;
 	u32 m, i, tmp, *ib;
 	int r;
 
@@ -1626,7 +1626,7 @@ static bool r600_is_safe_reg(struct radeon_cs_parser *p, u32 reg, u32 idx)
 static int r600_packet3_check(struct radeon_cs_parser *p,
 				struct radeon_cs_packet *pkt)
 {
-	struct radeon_cs_reloc *reloc;
+	struct radeon_bo_list *reloc;
 	struct r600_cs_track *track;
 	volatile u32 *ib;
 	unsigned idx;
@@ -2316,7 +2316,7 @@ int r600_cs_parse(struct radeon_cs_parser *p)
 			p->track = NULL;
 			return r;
 		}
-	} while (p->idx < p->chunks[p->chunk_ib_idx].length_dw);
+	} while (p->idx < p->chunk_ib->length_dw);
 #if 0
 	for (r = 0; r < p->ib.length_dw; r++) {
 		printk(KERN_INFO "%05d  0x%08X\n", r, p->ib.ptr[r]);
@@ -2351,10 +2351,10 @@ static void r600_cs_parser_fini(struct radeon_cs_parser *parser, int error)
 
 static int r600_cs_parser_relocs_legacy(struct radeon_cs_parser *p)
 {
-	if (p->chunk_relocs_idx == -1) {
+	if (p->chunk_relocs == NULL) {
 		return 0;
 	}
-	p->relocs = kzalloc(sizeof(struct radeon_cs_reloc), GFP_KERNEL);
+	p->relocs = kzalloc(sizeof(struct radeon_bo_list), GFP_KERNEL);
 	if (p->relocs == NULL) {
 		return -ENOMEM;
 	}
@@ -2398,7 +2398,7 @@ int r600_cs_legacy(struct drm_device *dev, void *data, struct drm_file *filp,
 	/* Copy the packet into the IB, the parser will read from the
 	 * input memory (cached) and write to the IB (which can be
 	 * uncached). */
-	ib_chunk = &parser.chunks[parser.chunk_ib_idx];
+	ib_chunk = parser.chunk_ib;
 	parser.ib.length_dw = ib_chunk->length_dw;
 	*l = parser.ib.length_dw;
 	if (copy_from_user(ib, ib_chunk->user_ptr, ib_chunk->length_dw * 4)) {
@@ -2435,24 +2435,24 @@ void r600_cs_legacy_init(void)
  * GPU offset using the provided start.
  **/
 int r600_dma_cs_next_reloc(struct radeon_cs_parser *p,
-			   struct radeon_cs_reloc **cs_reloc)
+			   struct radeon_bo_list **cs_reloc)
 {
 	struct radeon_cs_chunk *relocs_chunk;
 	unsigned idx;
 
 	*cs_reloc = NULL;
-	if (p->chunk_relocs_idx == -1) {
+	if (p->chunk_relocs == NULL) {
 		DRM_ERROR("No relocation chunk !\n");
 		return -EINVAL;
 	}
-	relocs_chunk = &p->chunks[p->chunk_relocs_idx];
+	relocs_chunk = p->chunk_relocs;
 	idx = p->dma_reloc_idx;
 	if (idx >= p->nrelocs) {
 		DRM_ERROR("Relocs at %d after relocations chunk end %d !\n",
 			  idx, p->nrelocs);
 		return -EINVAL;
 	}
-	*cs_reloc = p->relocs_ptr[idx];
+	*cs_reloc = &p->relocs[idx];
 	p->dma_reloc_idx++;
 	return 0;
 }
@@ -2472,8 +2472,8 @@ int r600_dma_cs_next_reloc(struct radeon_cs_parser *p,
  **/
 int r600_dma_cs_parse(struct radeon_cs_parser *p)
 {
-	struct radeon_cs_chunk *ib_chunk = &p->chunks[p->chunk_ib_idx];
-	struct radeon_cs_reloc *src_reloc, *dst_reloc;
+	struct radeon_cs_chunk *ib_chunk = p->chunk_ib;
+	struct radeon_bo_list *src_reloc, *dst_reloc;
 	u32 header, cmd, count, tiled;
 	volatile u32 *ib = p->ib.ptr;
 	u32 idx, idx_value;
@@ -2619,7 +2619,7 @@ int r600_dma_cs_parse(struct radeon_cs_parser *p)
 			DRM_ERROR("Unknown packet type %d at %d !\n", cmd, idx);
 			return -EINVAL;
 		}
-	} while (p->idx < p->chunks[p->chunk_ib_idx].length_dw);
+	} while (p->idx < p->chunk_ib->length_dw);
 #if 0
 	for (r = 0; r < p->ib->length_dw; r++) {
 		printk(KERN_INFO "%05d  0x%08X\n", r, p->ib.ptr[r]);
diff --git a/drivers/gpu/drm/radeon/r600_dma.c b/drivers/gpu/drm/radeon/r600_dma.c
index cf0df45d455e..d2dd29ab24fa 100644
--- a/drivers/gpu/drm/radeon/r600_dma.c
+++ b/drivers/gpu/drm/radeon/r600_dma.c
@@ -441,31 +441,27 @@ struct radeon_fence *r600_copy_dma(struct radeon_device *rdev,
 				   unsigned num_gpu_pages,
 				   struct reservation_object *resv)
 {
-	struct radeon_semaphore *sem = NULL;
 	struct radeon_fence *fence;
+	struct radeon_sync sync;
 	int ring_index = rdev->asic->copy.dma_ring_index;
 	struct radeon_ring *ring = &rdev->ring[ring_index];
 	u32 size_in_dw, cur_size_in_dw;
 	int i, num_loops;
 	int r = 0;
 
-	r = radeon_semaphore_create(rdev, &sem);
-	if (r) {
-		DRM_ERROR("radeon: moving bo (%d).\n", r);
-		return ERR_PTR(r);
-	}
+	radeon_sync_create(&sync);
 
 	size_in_dw = (num_gpu_pages << RADEON_GPU_PAGE_SHIFT) / 4;
 	num_loops = DIV_ROUND_UP(size_in_dw, 0xFFFE);
 	r = radeon_ring_lock(rdev, ring, num_loops * 4 + 8);
 	if (r) {
 		DRM_ERROR("radeon: moving bo (%d).\n", r);
-		radeon_semaphore_free(rdev, &sem, NULL);
+		radeon_sync_free(rdev, &sync, NULL);
 		return ERR_PTR(r);
 	}
 
-	radeon_semaphore_sync_resv(rdev, sem, resv, false);
-	radeon_semaphore_sync_rings(rdev, sem, ring->idx);
+	radeon_sync_resv(rdev, &sync, resv, false);
+	radeon_sync_rings(rdev, &sync, ring->idx);
 
 	for (i = 0; i < num_loops; i++) {
 		cur_size_in_dw = size_in_dw;
@@ -484,12 +480,12 @@ struct radeon_fence *r600_copy_dma(struct radeon_device *rdev,
 	r = radeon_fence_emit(rdev, &fence, ring->idx);
 	if (r) {
 		radeon_ring_unlock_undo(rdev, ring);
-		radeon_semaphore_free(rdev, &sem, NULL);
+		radeon_sync_free(rdev, &sync, NULL);
 		return ERR_PTR(r);
 	}
 
 	radeon_ring_unlock_commit(rdev, ring, false);
-	radeon_semaphore_free(rdev, &sem, fence);
+	radeon_sync_free(rdev, &sync, fence);
 
 	return fence;
 }
diff --git a/drivers/gpu/drm/radeon/r600_dpm.c b/drivers/gpu/drm/radeon/r600_dpm.c
index b5c73df8e202..843b65f46ece 100644
--- a/drivers/gpu/drm/radeon/r600_dpm.c
+++ b/drivers/gpu/drm/radeon/r600_dpm.c
@@ -811,6 +811,7 @@ union power_info {
 union fan_info {
 	struct _ATOM_PPLIB_FANTABLE fan;
 	struct _ATOM_PPLIB_FANTABLE2 fan2;
+	struct _ATOM_PPLIB_FANTABLE3 fan3;
 };
 
 static int r600_parse_clk_voltage_dep_table(struct radeon_clock_voltage_dependency_table *radeon_table,
@@ -900,6 +901,14 @@ int r600_parse_extended_power_table(struct radeon_device *rdev)
 			else
 				rdev->pm.dpm.fan.t_max = 10900;
 			rdev->pm.dpm.fan.cycle_delay = 100000;
+			if (fan_info->fan.ucFanTableFormat >= 3) {
+				rdev->pm.dpm.fan.control_mode = fan_info->fan3.ucFanControlMode;
+				rdev->pm.dpm.fan.default_max_fan_pwm =
+					le16_to_cpu(fan_info->fan3.usFanPWMMax);
+				rdev->pm.dpm.fan.default_fan_output_sensitivity = 4836;
+				rdev->pm.dpm.fan.fan_output_sensitivity =
+					le16_to_cpu(fan_info->fan3.usFanOutputSensitivity);
+			}
 			rdev->pm.dpm.fan.ucode_fan_control = true;
 		}
 	}
diff --git a/drivers/gpu/drm/radeon/r600_dpm.h b/drivers/gpu/drm/radeon/r600_dpm.h
index 46b9d2a03018..bd499d749bc9 100644
--- a/drivers/gpu/drm/radeon/r600_dpm.h
+++ b/drivers/gpu/drm/radeon/r600_dpm.h
@@ -96,6 +96,9 @@
 #define R600_TEMP_RANGE_MIN (90 * 1000)
 #define R600_TEMP_RANGE_MAX (120 * 1000)
 
+#define FDO_PWM_MODE_STATIC  1
+#define FDO_PWM_MODE_STATIC_RPM 5
+
 enum r600_power_level {
 	R600_POWER_LEVEL_LOW = 0,
 	R600_POWER_LEVEL_MEDIUM = 1,
diff --git a/drivers/gpu/drm/radeon/radeon.h b/drivers/gpu/drm/radeon/radeon.h
index a9717b3fbf1b..54529b837afa 100644
--- a/drivers/gpu/drm/radeon/radeon.h
+++ b/drivers/gpu/drm/radeon/radeon.h
@@ -150,9 +150,6 @@ extern int radeon_backlight;
 /* number of hw syncs before falling back on blocking */
 #define RADEON_NUM_SYNCS			4
 
-/* number of hw syncs before falling back on blocking */
-#define RADEON_NUM_SYNCS			4
-
 /* hardcode those limit for now */
 #define RADEON_VA_IB_OFFSET			(1 << 20)
 #define RADEON_VA_RESERVED_SIZE			(8 << 20)
@@ -363,14 +360,15 @@ struct radeon_fence_driver {
 };
 
 struct radeon_fence {
-	struct fence base;
+	struct fence		base;
 
-	struct radeon_device		*rdev;
-	uint64_t			seq;
+	struct radeon_device	*rdev;
+	uint64_t		seq;
 	/* RB, DMA, etc. */
-	unsigned			ring;
+	unsigned		ring;
+	bool			is_vm_update;
 
-	wait_queue_t			fence_wake;
+	wait_queue_t		fence_wake;
 };
 
 int radeon_fence_driver_start_ring(struct radeon_device *rdev, int ring);
@@ -452,12 +450,22 @@ struct radeon_mman {
 #endif
 };
 
+struct radeon_bo_list {
+	struct radeon_bo		*robj;
+	struct ttm_validate_buffer	tv;
+	uint64_t			gpu_offset;
+	unsigned			prefered_domains;
+	unsigned			allowed_domains;
+	uint32_t			tiling_flags;
+};
+
 /* bo virtual address in a specific vm */
 struct radeon_bo_va {
 	/* protected by bo being reserved */
 	struct list_head		bo_list;
 	uint32_t			flags;
 	uint64_t			addr;
+	struct radeon_fence		*last_pt_update;
 	unsigned			ref_count;
 
 	/* protected by vm mutex */
@@ -474,7 +482,7 @@ struct radeon_bo {
 	struct list_head		list;
 	/* Protected by tbo.reserved */
 	u32				initial_domain;
-	struct ttm_place		placements[3];
+	struct ttm_place		placements[4];
 	struct ttm_placement		placement;
 	struct ttm_buffer_object	tbo;
 	struct ttm_bo_kmap_obj		kmap;
@@ -576,10 +584,9 @@ int radeon_mode_dumb_mmap(struct drm_file *filp,
  * Semaphores.
  */
 struct radeon_semaphore {
-	struct radeon_sa_bo		*sa_bo;
-	signed				waiters;
-	uint64_t			gpu_addr;
-	struct radeon_fence		*sync_to[RADEON_NUM_RINGS];
+	struct radeon_sa_bo	*sa_bo;
+	signed			waiters;
+	uint64_t		gpu_addr;
 };
 
 int radeon_semaphore_create(struct radeon_device *rdev,
@@ -588,20 +595,33 @@ bool radeon_semaphore_emit_signal(struct radeon_device *rdev, int ring,
 				  struct radeon_semaphore *semaphore);
 bool radeon_semaphore_emit_wait(struct radeon_device *rdev, int ring,
 				struct radeon_semaphore *semaphore);
-void radeon_semaphore_sync_fence(struct radeon_semaphore *semaphore,
-				 struct radeon_fence *fence);
-int radeon_semaphore_sync_resv(struct radeon_device *rdev,
-			       struct radeon_semaphore *semaphore,
-			       struct reservation_object *resv,
-			       bool shared);
-int radeon_semaphore_sync_rings(struct radeon_device *rdev,
-				struct radeon_semaphore *semaphore,
-				int waiting_ring);
 void radeon_semaphore_free(struct radeon_device *rdev,
 			   struct radeon_semaphore **semaphore,
 			   struct radeon_fence *fence);
 
 /*
+ * Synchronization
+ */
+struct radeon_sync {
+	struct radeon_semaphore *semaphores[RADEON_NUM_SYNCS];
+	struct radeon_fence	*sync_to[RADEON_NUM_RINGS];
+	struct radeon_fence	*last_vm_update;
+};
+
+void radeon_sync_create(struct radeon_sync *sync);
+void radeon_sync_fence(struct radeon_sync *sync,
+		       struct radeon_fence *fence);
+int radeon_sync_resv(struct radeon_device *rdev,
+		     struct radeon_sync *sync,
+		     struct reservation_object *resv,
+		     bool shared);
+int radeon_sync_rings(struct radeon_device *rdev,
+		      struct radeon_sync *sync,
+		      int waiting_ring);
+void radeon_sync_free(struct radeon_device *rdev, struct radeon_sync *sync,
+		      struct radeon_fence *fence);
+
+/*
  * GART structures, functions & helpers
  */
 struct radeon_mc;
@@ -701,6 +721,10 @@ struct radeon_doorbell {
 
 int radeon_doorbell_get(struct radeon_device *rdev, u32 *page);
 void radeon_doorbell_free(struct radeon_device *rdev, u32 doorbell);
+void radeon_doorbell_get_kfd_info(struct radeon_device *rdev,
+				  phys_addr_t *aperture_base,
+				  size_t *aperture_size,
+				  size_t *start_offset);
 
 /*
  * IRQS.
@@ -814,7 +838,7 @@ struct radeon_ib {
 	struct radeon_fence		*fence;
 	struct radeon_vm		*vm;
 	bool				is_const_ib;
-	struct radeon_semaphore		*semaphore;
+	struct radeon_sync		sync;
 };
 
 struct radeon_ring {
@@ -891,33 +915,40 @@ struct radeon_vm_pt {
 	uint64_t			addr;
 };
 
+struct radeon_vm_id {
+	unsigned		id;
+	uint64_t		pd_gpu_addr;
+	/* last flushed PD/PT update */
+	struct radeon_fence	*flushed_updates;
+	/* last use of vmid */
+	struct radeon_fence	*last_id_use;
+};
+
 struct radeon_vm {
-	struct rb_root			va;
-	unsigned			id;
+	struct mutex		mutex;
+
+	struct rb_root		va;
+
+	/* protecting invalidated and freed */
+	spinlock_t		status_lock;
 
 	/* BOs moved, but not yet updated in the PT */
-	struct list_head		invalidated;
+	struct list_head	invalidated;
 
 	/* BOs freed, but not yet updated in the PT */
-	struct list_head		freed;
+	struct list_head	freed;
 
 	/* contains the page directory */
-	struct radeon_bo		*page_directory;
-	uint64_t			pd_gpu_addr;
-	unsigned			max_pde_used;
+	struct radeon_bo	*page_directory;
+	unsigned		max_pde_used;
 
 	/* array of page tables, one for each page directory entry */
-	struct radeon_vm_pt		*page_tables;
+	struct radeon_vm_pt	*page_tables;
 
-	struct radeon_bo_va		*ib_bo_va;
+	struct radeon_bo_va	*ib_bo_va;
 
-	struct mutex			mutex;
-	/* last fence for cs using this vm */
-	struct radeon_fence		*fence;
-	/* last flush or NULL if we still need to flush */
-	struct radeon_fence		*last_flush;
-	/* last use of vmid */
-	struct radeon_fence		*last_id_use;
+	/* for id and flush management per ring */
+	struct radeon_vm_id	ids[RADEON_NUM_RINGS];
 };
 
 struct radeon_vm_manager {
@@ -1025,19 +1056,7 @@ void cayman_dma_fini(struct radeon_device *rdev);
 /*
  * CS.
  */
-struct radeon_cs_reloc {
-	struct drm_gem_object		*gobj;
-	struct radeon_bo		*robj;
-	struct ttm_validate_buffer	tv;
-	uint64_t			gpu_offset;
-	unsigned			prefered_domains;
-	unsigned			allowed_domains;
-	uint32_t			tiling_flags;
-	uint32_t			handle;
-};
-
 struct radeon_cs_chunk {
-	uint32_t		chunk_id;
 	uint32_t		length_dw;
 	uint32_t		*kdata;
 	void __user		*user_ptr;
@@ -1055,16 +1074,15 @@ struct radeon_cs_parser {
 	unsigned		idx;
 	/* relocations */
 	unsigned		nrelocs;
-	struct radeon_cs_reloc	*relocs;
-	struct radeon_cs_reloc	**relocs_ptr;
-	struct radeon_cs_reloc	*vm_bos;
+	struct radeon_bo_list	*relocs;
+	struct radeon_bo_list	*vm_bos;
 	struct list_head	validated;
 	unsigned		dma_reloc_idx;
 	/* indices of various chunks */
-	int			chunk_ib_idx;
-	int			chunk_relocs_idx;
-	int			chunk_flags_idx;
-	int			chunk_const_ib_idx;
+	struct radeon_cs_chunk  *chunk_ib;
+	struct radeon_cs_chunk  *chunk_relocs;
+	struct radeon_cs_chunk  *chunk_flags;
+	struct radeon_cs_chunk  *chunk_const_ib;
 	struct radeon_ib	ib;
 	struct radeon_ib	const_ib;
 	void			*track;
@@ -1078,7 +1096,7 @@ struct radeon_cs_parser {
 
 static inline u32 radeon_get_ib_value(struct radeon_cs_parser *p, int idx)
 {
-	struct radeon_cs_chunk *ibc = &p->chunks[p->chunk_ib_idx];
+	struct radeon_cs_chunk *ibc = p->chunk_ib;
 
 	if (ibc->kdata)
 		return ibc->kdata[idx];
@@ -1490,6 +1508,10 @@ struct radeon_dpm_fan {
 	u8 t_hyst;
 	u32 cycle_delay;
 	u16 t_max;
+	u8 control_mode;
+	u16 default_max_fan_pwm;
+	u16 default_fan_output_sensitivity;
+	u16 fan_output_sensitivity;
 	bool ucode_fan_control;
 };
 
@@ -1623,6 +1645,11 @@ struct radeon_pm {
 	/* internal thermal controller on rv6xx+ */
 	enum radeon_int_thermal_type int_thermal_type;
 	struct device	        *int_hwmon_dev;
+	/* fan control parameters */
+	bool                    no_fan;
+	u8                      fan_pulses_per_revolution;
+	u8                      fan_min_rpm;
+	u8                      fan_max_rpm;
 	/* dpm */
 	bool                    dpm_enabled;
 	struct radeon_dpm       dpm;
@@ -1785,7 +1812,8 @@ struct radeon_asic_ring {
 	void (*hdp_flush)(struct radeon_device *rdev, struct radeon_ring *ring);
 	bool (*emit_semaphore)(struct radeon_device *rdev, struct radeon_ring *cp,
 			       struct radeon_semaphore *semaphore, bool emit_wait);
-	void (*vm_flush)(struct radeon_device *rdev, int ridx, struct radeon_vm *vm);
+	void (*vm_flush)(struct radeon_device *rdev, struct radeon_ring *ring,
+			 unsigned vm_id, uint64_t pd_addr);
 
 	/* testing functions */
 	int (*ring_test)(struct radeon_device *rdev, struct radeon_ring *cp);
@@ -2388,6 +2416,8 @@ struct radeon_device {
 	struct radeon_atcs		atcs;
 	/* srbm instance registers */
 	struct mutex			srbm_mutex;
+	/* GRBM index mutex. Protects concurrent access to GRBM index */
+	struct mutex			grbm_idx_mutex;
 	/* clock, powergating flags */
 	u32 cg_flags;
 	u32 pg_flags;
@@ -2400,6 +2430,10 @@ struct radeon_device {
 	u64 vram_pin_size;
 	u64 gart_pin_size;
 
+	/* amdkfd interface */
+	struct kfd_dev		*kfd;
+	struct radeon_sa_manager	kfd_bo;
+
 	struct mutex	mn_lock;
 	DECLARE_HASHTABLE(mn_hash, 7);
 };
@@ -2831,7 +2865,7 @@ static inline void radeon_ring_write(struct radeon_ring *ring, uint32_t v)
 #define radeon_ring_ib_execute(rdev, r, ib) (rdev)->asic->ring[(r)]->ib_execute((rdev), (ib))
 #define radeon_ring_ib_parse(rdev, r, ib) (rdev)->asic->ring[(r)]->ib_parse((rdev), (ib))
 #define radeon_ring_is_lockup(rdev, r, cp) (rdev)->asic->ring[(r)]->is_lockup((rdev), (cp))
-#define radeon_ring_vm_flush(rdev, r, vm) (rdev)->asic->ring[(r)]->vm_flush((rdev), (r), (vm))
+#define radeon_ring_vm_flush(rdev, r, vm_id, pd_addr) (rdev)->asic->ring[(r)->idx]->vm_flush((rdev), (r), (vm_id), (pd_addr))
 #define radeon_ring_get_rptr(rdev, r) (rdev)->asic->ring[(r)->idx]->get_rptr((rdev), (r))
 #define radeon_ring_get_wptr(rdev, r) (rdev)->asic->ring[(r)->idx]->get_wptr((rdev), (r))
 #define radeon_ring_set_wptr(rdev, r) (rdev)->asic->ring[(r)->idx]->set_wptr((rdev), (r))
@@ -2940,14 +2974,14 @@ int radeon_vm_manager_init(struct radeon_device *rdev);
 void radeon_vm_manager_fini(struct radeon_device *rdev);
 int radeon_vm_init(struct radeon_device *rdev, struct radeon_vm *vm);
 void radeon_vm_fini(struct radeon_device *rdev, struct radeon_vm *vm);
-struct radeon_cs_reloc *radeon_vm_get_bos(struct radeon_device *rdev,
+struct radeon_bo_list *radeon_vm_get_bos(struct radeon_device *rdev,
 					  struct radeon_vm *vm,
                                           struct list_head *head);
 struct radeon_fence *radeon_vm_grab_id(struct radeon_device *rdev,
 				       struct radeon_vm *vm, int ring);
 void radeon_vm_flush(struct radeon_device *rdev,
                      struct radeon_vm *vm,
-                     int ring);
+		     int ring, struct radeon_fence *fence);
 void radeon_vm_fence(struct radeon_device *rdev,
 		     struct radeon_vm *vm,
 		     struct radeon_fence *fence);
@@ -3054,7 +3088,7 @@ bool radeon_cs_packet_next_is_pkt3_nop(struct radeon_cs_parser *p);
 void radeon_cs_dump_packet(struct radeon_cs_parser *p,
 			   struct radeon_cs_packet *pkt);
 int radeon_cs_packet_next_reloc(struct radeon_cs_parser *p,
-				struct radeon_cs_reloc **cs_reloc,
+				struct radeon_bo_list **cs_reloc,
 				int nomm);
 int r600_cs_common_vline_parse(struct radeon_cs_parser *p,
 			       uint32_t *vline_start_end,
diff --git a/drivers/gpu/drm/radeon/radeon_asic.h b/drivers/gpu/drm/radeon/radeon_asic.h
index d8ace5b28a5b..2a45d548d5ec 100644
--- a/drivers/gpu/drm/radeon/radeon_asic.h
+++ b/drivers/gpu/drm/radeon/radeon_asic.h
@@ -599,7 +599,8 @@ int cayman_asic_reset(struct radeon_device *rdev);
 void cayman_ring_ib_execute(struct radeon_device *rdev, struct radeon_ib *ib);
 int cayman_vm_init(struct radeon_device *rdev);
 void cayman_vm_fini(struct radeon_device *rdev);
-void cayman_vm_flush(struct radeon_device *rdev, int ridx, struct radeon_vm *vm);
+void cayman_vm_flush(struct radeon_device *rdev, struct radeon_ring *ring,
+		     unsigned vm_id, uint64_t pd_addr);
 uint32_t cayman_vm_page_flags(struct radeon_device *rdev, uint32_t flags);
 int evergreen_ib_parse(struct radeon_device *rdev, struct radeon_ib *ib);
 int evergreen_dma_ib_parse(struct radeon_device *rdev, struct radeon_ib *ib);
@@ -624,7 +625,8 @@ void cayman_dma_vm_set_pages(struct radeon_device *rdev,
 			     uint32_t incr, uint32_t flags);
 void cayman_dma_vm_pad_ib(struct radeon_ib *ib);
 
-void cayman_dma_vm_flush(struct radeon_device *rdev, int ridx, struct radeon_vm *vm);
+void cayman_dma_vm_flush(struct radeon_device *rdev, struct radeon_ring *ring,
+			 unsigned vm_id, uint64_t pd_addr);
 
 u32 cayman_gfx_get_rptr(struct radeon_device *rdev,
 			struct radeon_ring *ring);
@@ -699,7 +701,8 @@ int si_irq_set(struct radeon_device *rdev);
 int si_irq_process(struct radeon_device *rdev);
 int si_vm_init(struct radeon_device *rdev);
 void si_vm_fini(struct radeon_device *rdev);
-void si_vm_flush(struct radeon_device *rdev, int ridx, struct radeon_vm *vm);
+void si_vm_flush(struct radeon_device *rdev, struct radeon_ring *ring,
+		 unsigned vm_id, uint64_t pd_addr);
 int si_ib_parse(struct radeon_device *rdev, struct radeon_ib *ib);
 struct radeon_fence *si_copy_dma(struct radeon_device *rdev,
 				 uint64_t src_offset, uint64_t dst_offset,
@@ -721,7 +724,8 @@ void si_dma_vm_set_pages(struct radeon_device *rdev,
 			 uint64_t addr, unsigned count,
 			 uint32_t incr, uint32_t flags);
 
-void si_dma_vm_flush(struct radeon_device *rdev, int ridx, struct radeon_vm *vm);
+void si_dma_vm_flush(struct radeon_device *rdev, struct radeon_ring *ring,
+		     unsigned vm_id, uint64_t pd_addr);
 u32 si_get_xclk(struct radeon_device *rdev);
 uint64_t si_get_gpu_clock_counter(struct radeon_device *rdev);
 int si_set_uvd_clocks(struct radeon_device *rdev, u32 vclk, u32 dclk);
@@ -793,7 +797,8 @@ int cik_irq_set(struct radeon_device *rdev);
 int cik_irq_process(struct radeon_device *rdev);
 int cik_vm_init(struct radeon_device *rdev);
 void cik_vm_fini(struct radeon_device *rdev);
-void cik_vm_flush(struct radeon_device *rdev, int ridx, struct radeon_vm *vm);
+void cik_vm_flush(struct radeon_device *rdev, struct radeon_ring *ring,
+		  unsigned vm_id, uint64_t pd_addr);
 
 void cik_sdma_vm_copy_pages(struct radeon_device *rdev,
 			    struct radeon_ib *ib,
@@ -811,7 +816,8 @@ void cik_sdma_vm_set_pages(struct radeon_device *rdev,
 			   uint32_t incr, uint32_t flags);
 void cik_sdma_vm_pad_ib(struct radeon_ib *ib);
 
-void cik_dma_vm_flush(struct radeon_device *rdev, int ridx, struct radeon_vm *vm);
+void cik_dma_vm_flush(struct radeon_device *rdev, struct radeon_ring *ring,
+		      unsigned vm_id, uint64_t pd_addr);
 int cik_ib_parse(struct radeon_device *rdev, struct radeon_ib *ib);
 u32 cik_gfx_get_rptr(struct radeon_device *rdev,
 		     struct radeon_ring *ring);
diff --git a/drivers/gpu/drm/radeon/radeon_atombios.c b/drivers/gpu/drm/radeon/radeon_atombios.c
index df69b92ba164..dbc94f300297 100644
--- a/drivers/gpu/drm/radeon/radeon_atombios.c
+++ b/drivers/gpu/drm/radeon/radeon_atombios.c
@@ -196,8 +196,8 @@ void radeon_atombios_i2c_init(struct radeon_device *rdev)
 	}
 }
 
-static struct radeon_gpio_rec radeon_lookup_gpio(struct radeon_device *rdev,
-						 u8 id)
+struct radeon_gpio_rec radeon_atombios_lookup_gpio(struct radeon_device *rdev,
+						   u8 id)
 {
 	struct atom_context *ctx = rdev->mode_info.atom_context;
 	struct radeon_gpio_rec gpio;
@@ -221,6 +221,7 @@ static struct radeon_gpio_rec radeon_lookup_gpio(struct radeon_device *rdev,
 			if (id == pin->ucGPIO_ID) {
 				gpio.id = pin->ucGPIO_ID;
 				gpio.reg = le16_to_cpu(pin->usGpioPin_AIndex) * 4;
+				gpio.shift = pin->ucGpioPinBitShift;
 				gpio.mask = (1 << pin->ucGpioPinBitShift);
 				gpio.valid = true;
 				break;
@@ -801,7 +802,7 @@ bool radeon_get_atom_connector_info_from_object_table(struct drm_device *dev)
 								hpd_record =
 									(ATOM_HPD_INT_RECORD *)
 									record;
-								gpio = radeon_lookup_gpio(rdev,
+								gpio = radeon_atombios_lookup_gpio(rdev,
 											  hpd_record->ucHPDIntGPIOID);
 								hpd = radeon_atom_get_hpd_info_from_gpio(rdev, &gpio);
 								hpd.plugged_state = hpd_record->ucPlugged_PinState;
@@ -2128,7 +2129,7 @@ static int radeon_atombios_parse_power_table_1_3(struct radeon_device *rdev)
 				rdev->pm.power_state[state_index].clock_info[0].voltage.type =
 					VOLTAGE_GPIO;
 				rdev->pm.power_state[state_index].clock_info[0].voltage.gpio =
-					radeon_lookup_gpio(rdev,
+					radeon_atombios_lookup_gpio(rdev,
 							   power_info->info.asPowerPlayInfo[i].ucVoltageDropIndex);
 				if (misc & ATOM_PM_MISCINFO_VOLTAGE_DROP_ACTIVE_HIGH)
 					rdev->pm.power_state[state_index].clock_info[0].voltage.active_high =
@@ -2164,7 +2165,7 @@ static int radeon_atombios_parse_power_table_1_3(struct radeon_device *rdev)
 				rdev->pm.power_state[state_index].clock_info[0].voltage.type =
 					VOLTAGE_GPIO;
 				rdev->pm.power_state[state_index].clock_info[0].voltage.gpio =
-					radeon_lookup_gpio(rdev,
+					radeon_atombios_lookup_gpio(rdev,
 							   power_info->info_2.asPowerPlayInfo[i].ucVoltageDropIndex);
 				if (misc & ATOM_PM_MISCINFO_VOLTAGE_DROP_ACTIVE_HIGH)
 					rdev->pm.power_state[state_index].clock_info[0].voltage.active_high =
@@ -2200,7 +2201,7 @@ static int radeon_atombios_parse_power_table_1_3(struct radeon_device *rdev)
 				rdev->pm.power_state[state_index].clock_info[0].voltage.type =
 					VOLTAGE_GPIO;
 				rdev->pm.power_state[state_index].clock_info[0].voltage.gpio =
-					radeon_lookup_gpio(rdev,
+					radeon_atombios_lookup_gpio(rdev,
 							   power_info->info_3.asPowerPlayInfo[i].ucVoltageDropIndex);
 				if (misc & ATOM_PM_MISCINFO_VOLTAGE_DROP_ACTIVE_HIGH)
 					rdev->pm.power_state[state_index].clock_info[0].voltage.active_high =
@@ -2248,6 +2249,14 @@ static void radeon_atombios_add_pplib_thermal_controller(struct radeon_device *r
 
 	/* add the i2c bus for thermal/fan chip */
 	if (controller->ucType > 0) {
+		if (controller->ucFanParameters & ATOM_PP_FANPARAMETERS_NOFAN)
+			rdev->pm.no_fan = true;
+		rdev->pm.fan_pulses_per_revolution =
+			controller->ucFanParameters & ATOM_PP_FANPARAMETERS_TACHOMETER_PULSES_PER_REVOLUTION_MASK;
+		if (rdev->pm.fan_pulses_per_revolution) {
+			rdev->pm.fan_min_rpm = controller->ucFanMinRPM;
+			rdev->pm.fan_max_rpm = controller->ucFanMaxRPM;
+		}
 		if (controller->ucType == ATOM_PP_THERMALCONTROLLER_RV6xx) {
 			DRM_INFO("Internal thermal controller %s fan control\n",
 				 (controller->ucFanParameters &
diff --git a/drivers/gpu/drm/radeon/radeon_cs.c b/drivers/gpu/drm/radeon/radeon_cs.c
index 6f377de099f9..c830863bc98a 100644
--- a/drivers/gpu/drm/radeon/radeon_cs.c
+++ b/drivers/gpu/drm/radeon/radeon_cs.c
@@ -77,22 +77,18 @@ static int radeon_cs_parser_relocs(struct radeon_cs_parser *p)
 	struct drm_device *ddev = p->rdev->ddev;
 	struct radeon_cs_chunk *chunk;
 	struct radeon_cs_buckets buckets;
-	unsigned i, j;
-	bool duplicate, need_mmap_lock = false;
+	unsigned i;
+	bool need_mmap_lock = false;
 	int r;
 
-	if (p->chunk_relocs_idx == -1) {
+	if (p->chunk_relocs == NULL) {
 		return 0;
 	}
-	chunk = &p->chunks[p->chunk_relocs_idx];
+	chunk = p->chunk_relocs;
 	p->dma_reloc_idx = 0;
 	/* FIXME: we assume that each relocs use 4 dwords */
 	p->nrelocs = chunk->length_dw / 4;
-	p->relocs_ptr = kcalloc(p->nrelocs, sizeof(void *), GFP_KERNEL);
-	if (p->relocs_ptr == NULL) {
-		return -ENOMEM;
-	}
-	p->relocs = kcalloc(p->nrelocs, sizeof(struct radeon_cs_reloc), GFP_KERNEL);
+	p->relocs = kcalloc(p->nrelocs, sizeof(struct radeon_bo_list), GFP_KERNEL);
 	if (p->relocs == NULL) {
 		return -ENOMEM;
 	}
@@ -101,31 +97,17 @@ static int radeon_cs_parser_relocs(struct radeon_cs_parser *p)
 
 	for (i = 0; i < p->nrelocs; i++) {
 		struct drm_radeon_cs_reloc *r;
+		struct drm_gem_object *gobj;
 		unsigned priority;
 
-		duplicate = false;
 		r = (struct drm_radeon_cs_reloc *)&chunk->kdata[i*4];
-		for (j = 0; j < i; j++) {
-			if (r->handle == p->relocs[j].handle) {
-				p->relocs_ptr[i] = &p->relocs[j];
-				duplicate = true;
-				break;
-			}
-		}
-		if (duplicate) {
-			p->relocs[i].handle = 0;
-			continue;
-		}
-
-		p->relocs[i].gobj = drm_gem_object_lookup(ddev, p->filp,
-							  r->handle);
-		if (p->relocs[i].gobj == NULL) {
+		gobj = drm_gem_object_lookup(ddev, p->filp, r->handle);
+		if (gobj == NULL) {
 			DRM_ERROR("gem object lookup failed 0x%x\n",
 				  r->handle);
 			return -ENOENT;
 		}
-		p->relocs_ptr[i] = &p->relocs[i];
-		p->relocs[i].robj = gem_to_radeon_bo(p->relocs[i].gobj);
+		p->relocs[i].robj = gem_to_radeon_bo(gobj);
 
 		/* The userspace buffer priorities are from 0 to 15. A higher
 		 * number means the buffer is more important.
@@ -184,7 +166,6 @@ static int radeon_cs_parser_relocs(struct radeon_cs_parser *p)
 
 		p->relocs[i].tv.bo = &p->relocs[i].robj->tbo;
 		p->relocs[i].tv.shared = !r->write_domain;
-		p->relocs[i].handle = r->handle;
 
 		radeon_cs_buckets_add(&buckets, &p->relocs[i].tv.head,
 				      priority);
@@ -251,15 +232,15 @@ static int radeon_cs_get_ring(struct radeon_cs_parser *p, u32 ring, s32 priority
 
 static int radeon_cs_sync_rings(struct radeon_cs_parser *p)
 {
-	struct radeon_cs_reloc *reloc;
+	struct radeon_bo_list *reloc;
 	int r;
 
 	list_for_each_entry(reloc, &p->validated, tv.head) {
 		struct reservation_object *resv;
 
 		resv = reloc->robj->tbo.resv;
-		r = radeon_semaphore_sync_resv(p->rdev, p->ib.semaphore, resv,
-					       reloc->tv.shared);
+		r = radeon_sync_resv(p->rdev, &p->ib.sync, resv,
+				     reloc->tv.shared);
 		if (r)
 			return r;
 	}
@@ -282,13 +263,11 @@ int radeon_cs_parser_init(struct radeon_cs_parser *p, void *data)
 	INIT_LIST_HEAD(&p->validated);
 	p->idx = 0;
 	p->ib.sa_bo = NULL;
-	p->ib.semaphore = NULL;
 	p->const_ib.sa_bo = NULL;
-	p->const_ib.semaphore = NULL;
-	p->chunk_ib_idx = -1;
-	p->chunk_relocs_idx = -1;
-	p->chunk_flags_idx = -1;
-	p->chunk_const_ib_idx = -1;
+	p->chunk_ib = NULL;
+	p->chunk_relocs = NULL;
+	p->chunk_flags = NULL;
+	p->chunk_const_ib = NULL;
 	p->chunks_array = kcalloc(cs->num_chunks, sizeof(uint64_t), GFP_KERNEL);
 	if (p->chunks_array == NULL) {
 		return -ENOMEM;
@@ -315,24 +294,23 @@ int radeon_cs_parser_init(struct radeon_cs_parser *p, void *data)
 			return -EFAULT;
 		}
 		p->chunks[i].length_dw = user_chunk.length_dw;
-		p->chunks[i].chunk_id = user_chunk.chunk_id;
-		if (p->chunks[i].chunk_id == RADEON_CHUNK_ID_RELOCS) {
-			p->chunk_relocs_idx = i;
+		if (user_chunk.chunk_id == RADEON_CHUNK_ID_RELOCS) {
+			p->chunk_relocs = &p->chunks[i];
 		}
-		if (p->chunks[i].chunk_id == RADEON_CHUNK_ID_IB) {
-			p->chunk_ib_idx = i;
+		if (user_chunk.chunk_id == RADEON_CHUNK_ID_IB) {
+			p->chunk_ib = &p->chunks[i];
 			/* zero length IB isn't useful */
 			if (p->chunks[i].length_dw == 0)
 				return -EINVAL;
 		}
-		if (p->chunks[i].chunk_id == RADEON_CHUNK_ID_CONST_IB) {
-			p->chunk_const_ib_idx = i;
+		if (user_chunk.chunk_id == RADEON_CHUNK_ID_CONST_IB) {
+			p->chunk_const_ib = &p->chunks[i];
 			/* zero length CONST IB isn't useful */
 			if (p->chunks[i].length_dw == 0)
 				return -EINVAL;
 		}
-		if (p->chunks[i].chunk_id == RADEON_CHUNK_ID_FLAGS) {
-			p->chunk_flags_idx = i;
+		if (user_chunk.chunk_id == RADEON_CHUNK_ID_FLAGS) {
+			p->chunk_flags = &p->chunks[i];
 			/* zero length flags aren't useful */
 			if (p->chunks[i].length_dw == 0)
 				return -EINVAL;
@@ -341,10 +319,10 @@ int radeon_cs_parser_init(struct radeon_cs_parser *p, void *data)
 		size = p->chunks[i].length_dw;
 		cdata = (void __user *)(unsigned long)user_chunk.chunk_data;
 		p->chunks[i].user_ptr = cdata;
-		if (p->chunks[i].chunk_id == RADEON_CHUNK_ID_CONST_IB)
+		if (user_chunk.chunk_id == RADEON_CHUNK_ID_CONST_IB)
 			continue;
 
-		if (p->chunks[i].chunk_id == RADEON_CHUNK_ID_IB) {
+		if (user_chunk.chunk_id == RADEON_CHUNK_ID_IB) {
 			if (!p->rdev || !(p->rdev->flags & RADEON_IS_AGP))
 				continue;
 		}
@@ -357,7 +335,7 @@ int radeon_cs_parser_init(struct radeon_cs_parser *p, void *data)
 		if (copy_from_user(p->chunks[i].kdata, cdata, size)) {
 			return -EFAULT;
 		}
-		if (p->chunks[i].chunk_id == RADEON_CHUNK_ID_FLAGS) {
+		if (user_chunk.chunk_id == RADEON_CHUNK_ID_FLAGS) {
 			p->cs_flags = p->chunks[i].kdata[0];
 			if (p->chunks[i].length_dw > 1)
 				ring = p->chunks[i].kdata[1];
@@ -398,8 +376,8 @@ int radeon_cs_parser_init(struct radeon_cs_parser *p, void *data)
 static int cmp_size_smaller_first(void *priv, struct list_head *a,
 				  struct list_head *b)
 {
-	struct radeon_cs_reloc *la = list_entry(a, struct radeon_cs_reloc, tv.head);
-	struct radeon_cs_reloc *lb = list_entry(b, struct radeon_cs_reloc, tv.head);
+	struct radeon_bo_list *la = list_entry(a, struct radeon_bo_list, tv.head);
+	struct radeon_bo_list *lb = list_entry(b, struct radeon_bo_list, tv.head);
 
 	/* Sort A before B if A is smaller. */
 	return (int)la->robj->tbo.num_pages - (int)lb->robj->tbo.num_pages;
@@ -440,13 +418,15 @@ static void radeon_cs_parser_fini(struct radeon_cs_parser *parser, int error, bo
 
 	if (parser->relocs != NULL) {
 		for (i = 0; i < parser->nrelocs; i++) {
-			if (parser->relocs[i].gobj)
-				drm_gem_object_unreference_unlocked(parser->relocs[i].gobj);
+			struct radeon_bo *bo = parser->relocs[i].robj;
+			if (bo == NULL)
+				continue;
+
+			drm_gem_object_unreference_unlocked(&bo->gem_base);
 		}
 	}
 	kfree(parser->track);
 	kfree(parser->relocs);
-	kfree(parser->relocs_ptr);
 	drm_free_large(parser->vm_bos);
 	for (i = 0; i < parser->nchunks; i++)
 		drm_free_large(parser->chunks[i].kdata);
@@ -461,7 +441,7 @@ static int radeon_cs_ib_chunk(struct radeon_device *rdev,
 {
 	int r;
 
-	if (parser->chunk_ib_idx == -1)
+	if (parser->chunk_ib == NULL)
 		return 0;
 
 	if (parser->cs_flags & RADEON_CS_USE_VM)
@@ -521,10 +501,6 @@ static int radeon_bo_vm_update_pte(struct radeon_cs_parser *p,
 	for (i = 0; i < p->nrelocs; i++) {
 		struct radeon_bo *bo;
 
-		/* ignore duplicates */
-		if (p->relocs_ptr[i] != &p->relocs[i])
-			continue;
-
 		bo = p->relocs[i].robj;
 		bo_va = radeon_vm_bo_find(vm, bo);
 		if (bo_va == NULL) {
@@ -535,6 +511,8 @@ static int radeon_bo_vm_update_pte(struct radeon_cs_parser *p,
 		r = radeon_vm_bo_update(rdev, bo_va, &bo->tbo.mem);
 		if (r)
 			return r;
+
+		radeon_sync_fence(&p->ib.sync, bo_va->last_pt_update);
 	}
 
 	return radeon_vm_clear_invalids(rdev, vm);
@@ -547,7 +525,7 @@ static int radeon_cs_ib_vm_chunk(struct radeon_device *rdev,
 	struct radeon_vm *vm = &fpriv->vm;
 	int r;
 
-	if (parser->chunk_ib_idx == -1)
+	if (parser->chunk_ib == NULL)
 		return 0;
 	if ((parser->cs_flags & RADEON_CS_USE_VM) == 0)
 		return 0;
@@ -579,10 +557,9 @@ static int radeon_cs_ib_vm_chunk(struct radeon_device *rdev,
 			DRM_ERROR("Failed to sync rings: %i\n", r);
 		goto out;
 	}
-	radeon_semaphore_sync_fence(parser->ib.semaphore, vm->fence);
 
 	if ((rdev->family >= CHIP_TAHITI) &&
-	    (parser->chunk_const_ib_idx != -1)) {
+	    (parser->chunk_const_ib != NULL)) {
 		r = radeon_ib_schedule(rdev, &parser->ib, &parser->const_ib, true);
 	} else {
 		r = radeon_ib_schedule(rdev, &parser->ib, NULL, true);
@@ -609,7 +586,7 @@ static int radeon_cs_ib_fill(struct radeon_device *rdev, struct radeon_cs_parser
 	struct radeon_vm *vm = NULL;
 	int r;
 
-	if (parser->chunk_ib_idx == -1)
+	if (parser->chunk_ib == NULL)
 		return 0;
 
 	if (parser->cs_flags & RADEON_CS_USE_VM) {
@@ -617,8 +594,8 @@ static int radeon_cs_ib_fill(struct radeon_device *rdev, struct radeon_cs_parser
 		vm = &fpriv->vm;
 
 		if ((rdev->family >= CHIP_TAHITI) &&
-		    (parser->chunk_const_ib_idx != -1)) {
-			ib_chunk = &parser->chunks[parser->chunk_const_ib_idx];
+		    (parser->chunk_const_ib != NULL)) {
+			ib_chunk = parser->chunk_const_ib;
 			if (ib_chunk->length_dw > RADEON_IB_VM_MAX_SIZE) {
 				DRM_ERROR("cs IB CONST too big: %d\n", ib_chunk->length_dw);
 				return -EINVAL;
@@ -637,13 +614,13 @@ static int radeon_cs_ib_fill(struct radeon_device *rdev, struct radeon_cs_parser
 				return -EFAULT;
 		}
 
-		ib_chunk = &parser->chunks[parser->chunk_ib_idx];
+		ib_chunk = parser->chunk_ib;
 		if (ib_chunk->length_dw > RADEON_IB_VM_MAX_SIZE) {
 			DRM_ERROR("cs IB too big: %d\n", ib_chunk->length_dw);
 			return -EINVAL;
 		}
 	}
-	ib_chunk = &parser->chunks[parser->chunk_ib_idx];
+	ib_chunk = parser->chunk_ib;
 
 	r =  radeon_ib_get(rdev, parser->ring, &parser->ib,
 			   vm, ib_chunk->length_dw * 4);
@@ -735,7 +712,7 @@ int radeon_cs_packet_parse(struct radeon_cs_parser *p,
 			   struct radeon_cs_packet *pkt,
 			   unsigned idx)
 {
-	struct radeon_cs_chunk *ib_chunk = &p->chunks[p->chunk_ib_idx];
+	struct radeon_cs_chunk *ib_chunk = p->chunk_ib;
 	struct radeon_device *rdev = p->rdev;
 	uint32_t header;
 
@@ -829,7 +806,7 @@ void radeon_cs_dump_packet(struct radeon_cs_parser *p,
  * GPU offset using the provided start.
  **/
 int radeon_cs_packet_next_reloc(struct radeon_cs_parser *p,
-				struct radeon_cs_reloc **cs_reloc,
+				struct radeon_bo_list **cs_reloc,
 				int nomm)
 {
 	struct radeon_cs_chunk *relocs_chunk;
@@ -837,12 +814,12 @@ int radeon_cs_packet_next_reloc(struct radeon_cs_parser *p,
 	unsigned idx;
 	int r;
 
-	if (p->chunk_relocs_idx == -1) {
+	if (p->chunk_relocs == NULL) {
 		DRM_ERROR("No relocation chunk !\n");
 		return -EINVAL;
 	}
 	*cs_reloc = NULL;
-	relocs_chunk = &p->chunks[p->chunk_relocs_idx];
+	relocs_chunk = p->chunk_relocs;
 	r = radeon_cs_packet_parse(p, &p3reloc, p->idx);
 	if (r)
 		return r;
@@ -868,6 +845,6 @@ int radeon_cs_packet_next_reloc(struct radeon_cs_parser *p,
 			(u64)relocs_chunk->kdata[idx + 3] << 32;
 		(*cs_reloc)->gpu_offset |= relocs_chunk->kdata[idx + 0];
 	} else
-		*cs_reloc = p->relocs_ptr[(idx / 4)];
+		*cs_reloc = &p->relocs[(idx / 4)];
 	return 0;
 }
diff --git a/drivers/gpu/drm/radeon/radeon_cursor.c b/drivers/gpu/drm/radeon/radeon_cursor.c
index 9630e8d95fb4..45e54060ee97 100644
--- a/drivers/gpu/drm/radeon/radeon_cursor.c
+++ b/drivers/gpu/drm/radeon/radeon_cursor.c
@@ -117,106 +117,7 @@ static void radeon_show_cursor(struct drm_crtc *crtc)
 	}
 }
 
-static void radeon_set_cursor(struct drm_crtc *crtc, struct drm_gem_object *obj,
-			      uint64_t gpu_addr)
-{
-	struct radeon_crtc *radeon_crtc = to_radeon_crtc(crtc);
-	struct radeon_device *rdev = crtc->dev->dev_private;
-
-	if (ASIC_IS_DCE4(rdev)) {
-		WREG32(EVERGREEN_CUR_SURFACE_ADDRESS_HIGH + radeon_crtc->crtc_offset,
-		       upper_32_bits(gpu_addr));
-		WREG32(EVERGREEN_CUR_SURFACE_ADDRESS + radeon_crtc->crtc_offset,
-		       gpu_addr & 0xffffffff);
-	} else if (ASIC_IS_AVIVO(rdev)) {
-		if (rdev->family >= CHIP_RV770) {
-			if (radeon_crtc->crtc_id)
-				WREG32(R700_D2CUR_SURFACE_ADDRESS_HIGH, upper_32_bits(gpu_addr));
-			else
-				WREG32(R700_D1CUR_SURFACE_ADDRESS_HIGH, upper_32_bits(gpu_addr));
-		}
-		WREG32(AVIVO_D1CUR_SURFACE_ADDRESS + radeon_crtc->crtc_offset,
-		       gpu_addr & 0xffffffff);
-	} else {
-		radeon_crtc->legacy_cursor_offset = gpu_addr - radeon_crtc->legacy_display_base_addr;
-		/* offset is from DISP(2)_BASE_ADDRESS */
-		WREG32(RADEON_CUR_OFFSET + radeon_crtc->crtc_offset, radeon_crtc->legacy_cursor_offset);
-	}
-}
-
-int radeon_crtc_cursor_set(struct drm_crtc *crtc,
-			   struct drm_file *file_priv,
-			   uint32_t handle,
-			   uint32_t width,
-			   uint32_t height)
-{
-	struct radeon_crtc *radeon_crtc = to_radeon_crtc(crtc);
-	struct radeon_device *rdev = crtc->dev->dev_private;
-	struct drm_gem_object *obj;
-	struct radeon_bo *robj;
-	uint64_t gpu_addr;
-	int ret;
-
-	if (!handle) {
-		/* turn off cursor */
-		radeon_hide_cursor(crtc);
-		obj = NULL;
-		goto unpin;
-	}
-
-	if ((width > radeon_crtc->max_cursor_width) ||
-	    (height > radeon_crtc->max_cursor_height)) {
-		DRM_ERROR("bad cursor width or height %d x %d\n", width, height);
-		return -EINVAL;
-	}
-
-	obj = drm_gem_object_lookup(crtc->dev, file_priv, handle);
-	if (!obj) {
-		DRM_ERROR("Cannot find cursor object %x for crtc %d\n", handle, radeon_crtc->crtc_id);
-		return -ENOENT;
-	}
-
-	robj = gem_to_radeon_bo(obj);
-	ret = radeon_bo_reserve(robj, false);
-	if (unlikely(ret != 0))
-		goto fail;
-	/* Only 27 bit offset for legacy cursor */
-	ret = radeon_bo_pin_restricted(robj, RADEON_GEM_DOMAIN_VRAM,
-				       ASIC_IS_AVIVO(rdev) ? 0 : 1 << 27,
-				       &gpu_addr);
-	radeon_bo_unreserve(robj);
-	if (ret)
-		goto fail;
-
-	radeon_crtc->cursor_width = width;
-	radeon_crtc->cursor_height = height;
-
-	radeon_lock_cursor(crtc, true);
-	radeon_set_cursor(crtc, obj, gpu_addr);
-	radeon_show_cursor(crtc);
-	radeon_lock_cursor(crtc, false);
-
-unpin:
-	if (radeon_crtc->cursor_bo) {
-		robj = gem_to_radeon_bo(radeon_crtc->cursor_bo);
-		ret = radeon_bo_reserve(robj, false);
-		if (likely(ret == 0)) {
-			radeon_bo_unpin(robj);
-			radeon_bo_unreserve(robj);
-		}
-		drm_gem_object_unreference_unlocked(radeon_crtc->cursor_bo);
-	}
-
-	radeon_crtc->cursor_bo = obj;
-	return 0;
-fail:
-	drm_gem_object_unreference_unlocked(obj);
-
-	return ret;
-}
-
-int radeon_crtc_cursor_move(struct drm_crtc *crtc,
-			    int x, int y)
+static int radeon_cursor_move_locked(struct drm_crtc *crtc, int x, int y)
 {
 	struct radeon_crtc *radeon_crtc = to_radeon_crtc(crtc);
 	struct radeon_device *rdev = crtc->dev->dev_private;
@@ -281,7 +182,6 @@ int radeon_crtc_cursor_move(struct drm_crtc *crtc,
 		}
 	}
 
-	radeon_lock_cursor(crtc, true);
 	if (ASIC_IS_DCE4(rdev)) {
 		WREG32(EVERGREEN_CUR_POSITION + radeon_crtc->crtc_offset, (x << 16) | y);
 		WREG32(EVERGREEN_CUR_HOT_SPOT + radeon_crtc->crtc_offset, (xorigin << 16) | yorigin);
@@ -308,7 +208,173 @@ int radeon_crtc_cursor_move(struct drm_crtc *crtc,
 		WREG32(RADEON_CUR_OFFSET + radeon_crtc->crtc_offset, (radeon_crtc->legacy_cursor_offset +
 								      (yorigin * 256)));
 	}
+
+	radeon_crtc->cursor_x = x;
+	radeon_crtc->cursor_y = y;
+
+	return 0;
+}
+
+int radeon_crtc_cursor_move(struct drm_crtc *crtc,
+			    int x, int y)
+{
+	int ret;
+
+	radeon_lock_cursor(crtc, true);
+	ret = radeon_cursor_move_locked(crtc, x, y);
 	radeon_lock_cursor(crtc, false);
 
+	return ret;
+}
+
+static int radeon_set_cursor(struct drm_crtc *crtc, struct drm_gem_object *obj)
+{
+	struct radeon_crtc *radeon_crtc = to_radeon_crtc(crtc);
+	struct radeon_device *rdev = crtc->dev->dev_private;
+	struct radeon_bo *robj = gem_to_radeon_bo(obj);
+	uint64_t gpu_addr;
+	int ret;
+
+	ret = radeon_bo_reserve(robj, false);
+	if (unlikely(ret != 0))
+		goto fail;
+	/* Only 27 bit offset for legacy cursor */
+	ret = radeon_bo_pin_restricted(robj, RADEON_GEM_DOMAIN_VRAM,
+				       ASIC_IS_AVIVO(rdev) ? 0 : 1 << 27,
+				       &gpu_addr);
+	radeon_bo_unreserve(robj);
+	if (ret)
+		goto fail;
+
+	if (ASIC_IS_DCE4(rdev)) {
+		WREG32(EVERGREEN_CUR_SURFACE_ADDRESS_HIGH + radeon_crtc->crtc_offset,
+		       upper_32_bits(gpu_addr));
+		WREG32(EVERGREEN_CUR_SURFACE_ADDRESS + radeon_crtc->crtc_offset,
+		       gpu_addr & 0xffffffff);
+	} else if (ASIC_IS_AVIVO(rdev)) {
+		if (rdev->family >= CHIP_RV770) {
+			if (radeon_crtc->crtc_id)
+				WREG32(R700_D2CUR_SURFACE_ADDRESS_HIGH, upper_32_bits(gpu_addr));
+			else
+				WREG32(R700_D1CUR_SURFACE_ADDRESS_HIGH, upper_32_bits(gpu_addr));
+		}
+		WREG32(AVIVO_D1CUR_SURFACE_ADDRESS + radeon_crtc->crtc_offset,
+		       gpu_addr & 0xffffffff);
+	} else {
+		radeon_crtc->legacy_cursor_offset = gpu_addr - radeon_crtc->legacy_display_base_addr;
+		/* offset is from DISP(2)_BASE_ADDRESS */
+		WREG32(RADEON_CUR_OFFSET + radeon_crtc->crtc_offset, radeon_crtc->legacy_cursor_offset);
+	}
+
 	return 0;
+
+fail:
+	drm_gem_object_unreference_unlocked(obj);
+
+	return ret;
+}
+
+int radeon_crtc_cursor_set2(struct drm_crtc *crtc,
+			    struct drm_file *file_priv,
+			    uint32_t handle,
+			    uint32_t width,
+			    uint32_t height,
+			    int32_t hot_x,
+			    int32_t hot_y)
+{
+	struct radeon_crtc *radeon_crtc = to_radeon_crtc(crtc);
+	struct drm_gem_object *obj;
+	int ret;
+
+	if (!handle) {
+		/* turn off cursor */
+		radeon_hide_cursor(crtc);
+		obj = NULL;
+		goto unpin;
+	}
+
+	if ((width > radeon_crtc->max_cursor_width) ||
+	    (height > radeon_crtc->max_cursor_height)) {
+		DRM_ERROR("bad cursor width or height %d x %d\n", width, height);
+		return -EINVAL;
+	}
+
+	obj = drm_gem_object_lookup(crtc->dev, file_priv, handle);
+	if (!obj) {
+		DRM_ERROR("Cannot find cursor object %x for crtc %d\n", handle, radeon_crtc->crtc_id);
+		return -ENOENT;
+	}
+
+	radeon_crtc->cursor_width = width;
+	radeon_crtc->cursor_height = height;
+
+	radeon_lock_cursor(crtc, true);
+
+	if (hot_x != radeon_crtc->cursor_hot_x ||
+	    hot_y != radeon_crtc->cursor_hot_y) {
+		int x, y;
+
+		x = radeon_crtc->cursor_x + radeon_crtc->cursor_hot_x - hot_x;
+		y = radeon_crtc->cursor_y + radeon_crtc->cursor_hot_y - hot_y;
+
+		radeon_cursor_move_locked(crtc, x, y);
+
+		radeon_crtc->cursor_hot_x = hot_x;
+		radeon_crtc->cursor_hot_y = hot_y;
+	}
+
+	ret = radeon_set_cursor(crtc, obj);
+
+	if (ret)
+		DRM_ERROR("radeon_set_cursor returned %d, not changing cursor\n",
+			  ret);
+	else
+		radeon_show_cursor(crtc);
+
+	radeon_lock_cursor(crtc, false);
+
+unpin:
+	if (radeon_crtc->cursor_bo) {
+		struct radeon_bo *robj = gem_to_radeon_bo(radeon_crtc->cursor_bo);
+		ret = radeon_bo_reserve(robj, false);
+		if (likely(ret == 0)) {
+			radeon_bo_unpin(robj);
+			radeon_bo_unreserve(robj);
+		}
+		if (radeon_crtc->cursor_bo != obj)
+			drm_gem_object_unreference_unlocked(radeon_crtc->cursor_bo);
+	}
+
+	radeon_crtc->cursor_bo = obj;
+	return 0;
+}
+
+/**
+ * radeon_cursor_reset - Re-set the current cursor, if any.
+ *
+ * @crtc: drm crtc
+ *
+ * If the CRTC passed in currently has a cursor assigned, this function
+ * makes sure it's visible.
+ */
+void radeon_cursor_reset(struct drm_crtc *crtc)
+{
+	struct radeon_crtc *radeon_crtc = to_radeon_crtc(crtc);
+	int ret;
+
+	if (radeon_crtc->cursor_bo) {
+		radeon_lock_cursor(crtc, true);
+
+		radeon_cursor_move_locked(crtc, radeon_crtc->cursor_x,
+					  radeon_crtc->cursor_y);
+
+		ret = radeon_set_cursor(crtc, radeon_crtc->cursor_bo);
+		if (ret)
+			DRM_ERROR("radeon_set_cursor returned %d, not showing "
+				  "cursor\n", ret);
+		else
+			radeon_show_cursor(crtc);
+
+		radeon_lock_cursor(crtc, false);
+	}
 }
diff --git a/drivers/gpu/drm/radeon/radeon_device.c b/drivers/gpu/drm/radeon/radeon_device.c
index 995a8b1770dd..0ec65168f331 100644
--- a/drivers/gpu/drm/radeon/radeon_device.c
+++ b/drivers/gpu/drm/radeon/radeon_device.c
@@ -377,6 +377,37 @@ void radeon_doorbell_free(struct radeon_device *rdev, u32 doorbell)
 		__clear_bit(doorbell, rdev->doorbell.used);
 }
 
+/**
+ * radeon_doorbell_get_kfd_info - Report doorbell configuration required to
+ *                                set up KFD
+ *
+ * @rdev: radeon_device pointer
+ * @aperture_base: output returning doorbell aperture base physical address
+ * @aperture_size: output returning doorbell aperture size in bytes
+ * @start_offset: output returning # of doorbell bytes reserved for radeon.
+ *
+ * Radeon and the KFD share the doorbell aperture. Radeon sets it up,
+ * takes doorbells required for its own rings and reports the setup to KFD.
+ * Radeon reserved doorbells are at the start of the doorbell aperture.
+ */
+void radeon_doorbell_get_kfd_info(struct radeon_device *rdev,
+				  phys_addr_t *aperture_base,
+				  size_t *aperture_size,
+				  size_t *start_offset)
+{
+	/* The first num_doorbells are used by radeon.
+	 * KFD takes whatever's left in the aperture. */
+	if (rdev->doorbell.size > rdev->doorbell.num_doorbells * sizeof(u32)) {
+		*aperture_base = rdev->doorbell.base;
+		*aperture_size = rdev->doorbell.size;
+		*start_offset = rdev->doorbell.num_doorbells * sizeof(u32);
+	} else {
+		*aperture_base = 0;
+		*aperture_size = 0;
+		*start_offset = 0;
+	}
+}
+
 /*
  * radeon_wb_*()
  * Writeback is the the method by which the the GPU updates special pages
@@ -1273,6 +1304,7 @@ int radeon_device_init(struct radeon_device *rdev,
 	mutex_init(&rdev->pm.mutex);
 	mutex_init(&rdev->gpu_clock_mutex);
 	mutex_init(&rdev->srbm_mutex);
+	mutex_init(&rdev->grbm_idx_mutex);
 	init_rwsem(&rdev->pm.mclk_lock);
 	init_rwsem(&rdev->exclusive_lock);
 	init_waitqueue_head(&rdev->irq.vblank_queue);
diff --git a/drivers/gpu/drm/radeon/radeon_display.c b/drivers/gpu/drm/radeon/radeon_display.c
index 00ead8c2758a..102116902a07 100644
--- a/drivers/gpu/drm/radeon/radeon_display.c
+++ b/drivers/gpu/drm/radeon/radeon_display.c
@@ -32,6 +32,7 @@
 
 #include <linux/pm_runtime.h>
 #include <drm/drm_crtc_helper.h>
+#include <drm/drm_plane_helper.h>
 #include <drm/drm_edid.h>
 
 #include <linux/gcd.h>
@@ -634,7 +635,7 @@ radeon_crtc_set_config(struct drm_mode_set *set)
 	return ret;
 }
 static const struct drm_crtc_funcs radeon_crtc_funcs = {
-	.cursor_set = radeon_crtc_cursor_set,
+	.cursor_set2 = radeon_crtc_cursor_set2,
 	.cursor_move = radeon_crtc_cursor_move,
 	.gamma_set = radeon_crtc_gamma_set,
 	.set_config = radeon_crtc_set_config,
diff --git a/drivers/gpu/drm/radeon/radeon_drv.c b/drivers/gpu/drm/radeon/radeon_drv.c
index dcffa30ee2db..4f50fb0e3d93 100644
--- a/drivers/gpu/drm/radeon/radeon_drv.c
+++ b/drivers/gpu/drm/radeon/radeon_drv.c
@@ -41,6 +41,8 @@
 #include <drm/drm_gem.h>
 
 #include "drm_crtc_helper.h"
+#include "radeon_kfd.h"
+
 /*
  * KMS wrapper.
  * - 2.0.0 - initial interface
@@ -654,12 +656,15 @@ static int __init radeon_init(void)
 #endif
 	}
 
+	radeon_kfd_init();
+
 	/* let modprobe override vga console setting */
 	return drm_pci_init(driver, pdriver);
 }
 
 static void __exit radeon_exit(void)
 {
+	radeon_kfd_fini();
 	drm_pci_exit(driver, pdriver);
 	radeon_unregister_atpx_handler();
 }
diff --git a/drivers/gpu/drm/radeon/radeon_fb.c b/drivers/gpu/drm/radeon/radeon_fb.c
index 0ea1db83d573..29b9220ec399 100644
--- a/drivers/gpu/drm/radeon/radeon_fb.c
+++ b/drivers/gpu/drm/radeon/radeon_fb.c
@@ -48,10 +48,40 @@ struct radeon_fbdev {
 	struct radeon_device *rdev;
 };
 
+/**
+ * radeon_fb_helper_set_par - Hide cursor on CRTCs used by fbdev.
+ *
+ * @info: fbdev info
+ *
+ * This function hides the cursor on all CRTCs used by fbdev.
+ */
+static int radeon_fb_helper_set_par(struct fb_info *info)
+{
+	int ret;
+
+	ret = drm_fb_helper_set_par(info);
+
+	/* XXX: with universal plane support fbdev will automatically disable
+	 * all non-primary planes (including the cursor)
+	 */
+	if (ret == 0) {
+		struct drm_fb_helper *fb_helper = info->par;
+		int i;
+
+		for (i = 0; i < fb_helper->crtc_count; i++) {
+			struct drm_crtc *crtc = fb_helper->crtc_info[i].mode_set.crtc;
+
+			radeon_crtc_cursor_set2(crtc, NULL, 0, 0, 0, 0, 0);
+		}
+	}
+
+	return ret;
+}
+
 static struct fb_ops radeonfb_ops = {
 	.owner = THIS_MODULE,
 	.fb_check_var = drm_fb_helper_check_var,
-	.fb_set_par = drm_fb_helper_set_par,
+	.fb_set_par = radeon_fb_helper_set_par,
 	.fb_fillrect = cfb_fillrect,
 	.fb_copyarea = cfb_copyarea,
 	.fb_imageblit = cfb_imageblit,
diff --git a/drivers/gpu/drm/radeon/radeon_fence.c b/drivers/gpu/drm/radeon/radeon_fence.c
index 995167025282..d13d1b5a859f 100644
--- a/drivers/gpu/drm/radeon/radeon_fence.c
+++ b/drivers/gpu/drm/radeon/radeon_fence.c
@@ -140,6 +140,7 @@ int radeon_fence_emit(struct radeon_device *rdev,
 	(*fence)->rdev = rdev;
 	(*fence)->seq = seq;
 	(*fence)->ring = ring;
+	(*fence)->is_vm_update = false;
 	fence_init(&(*fence)->base, &radeon_fence_ops,
 		   &rdev->fence_queue.lock, rdev->fence_context + ring, seq);
 	radeon_fence_ring_emit(rdev, ring, *fence);
diff --git a/drivers/gpu/drm/radeon/radeon_gem.c b/drivers/gpu/drm/radeon/radeon_gem.c
index c194497aa586..fe48f229043e 100644
--- a/drivers/gpu/drm/radeon/radeon_gem.c
+++ b/drivers/gpu/drm/radeon/radeon_gem.c
@@ -394,9 +394,10 @@ int radeon_gem_set_domain_ioctl(struct drm_device *dev, void *data,
 	return r;
 }
 
-int radeon_mode_dumb_mmap(struct drm_file *filp,
-			  struct drm_device *dev,
-			  uint32_t handle, uint64_t *offset_p)
+static int radeon_mode_mmap(struct drm_file *filp,
+			    struct drm_device *dev,
+			    uint32_t handle, bool dumb,
+			    uint64_t *offset_p)
 {
 	struct drm_gem_object *gobj;
 	struct radeon_bo *robj;
@@ -405,6 +406,14 @@ int radeon_mode_dumb_mmap(struct drm_file *filp,
 	if (gobj == NULL) {
 		return -ENOENT;
 	}
+
+	/*
+	 * We don't allow dumb mmaps on objects created using another
+	 * interface.
+	 */
+	WARN_ONCE(dumb && !(gobj->dumb || gobj->import_attach),
+		"Illegal dumb map of GPU buffer.\n");
+
 	robj = gem_to_radeon_bo(gobj);
 	if (radeon_ttm_tt_has_userptr(robj->tbo.ttm)) {
 		drm_gem_object_unreference_unlocked(gobj);
@@ -415,12 +424,20 @@ int radeon_mode_dumb_mmap(struct drm_file *filp,
 	return 0;
 }
 
+int radeon_mode_dumb_mmap(struct drm_file *filp,
+			  struct drm_device *dev,
+			  uint32_t handle, uint64_t *offset_p)
+{
+	return radeon_mode_mmap(filp, dev, handle, true, offset_p);
+}
+
 int radeon_gem_mmap_ioctl(struct drm_device *dev, void *data,
 			  struct drm_file *filp)
 {
 	struct drm_radeon_gem_mmap *args = data;
 
-	return radeon_mode_dumb_mmap(filp, dev, args->handle, &args->addr_ptr);
+	return radeon_mode_mmap(filp, dev, args->handle, false,
+				&args->addr_ptr);
 }
 
 int radeon_gem_busy_ioctl(struct drm_device *dev, void *data,
@@ -518,6 +535,68 @@ out:
 	return r;
 }
 
+/**
+ * radeon_gem_va_update_vm - update the bo_va in its VM
+ *
+ * @rdev: radeon_device pointer
+ * @bo_va: bo_va to update
+ *
+ * Update the bo_va directly after setting its address. Errors are not
+ * vital here, so they are not reported back to userspace.
+ */
+static void radeon_gem_va_update_vm(struct radeon_device *rdev,
+				    struct radeon_bo_va *bo_va)
+{
+	struct ttm_validate_buffer tv, *entry;
+	struct radeon_bo_list *vm_bos;
+	struct ww_acquire_ctx ticket;
+	struct list_head list;
+	unsigned domain;
+	int r;
+
+	INIT_LIST_HEAD(&list);
+
+	tv.bo = &bo_va->bo->tbo;
+	tv.shared = true;
+	list_add(&tv.head, &list);
+
+	vm_bos = radeon_vm_get_bos(rdev, bo_va->vm, &list);
+	if (!vm_bos)
+		return;
+
+	r = ttm_eu_reserve_buffers(&ticket, &list, true, NULL);
+	if (r)
+		goto error_free;
+
+	list_for_each_entry(entry, &list, head) {
+		domain = radeon_mem_type_to_domain(entry->bo->mem.mem_type);
+		/* if anything is swapped out don't swap it in here,
+		   just abort and wait for the next CS */
+		if (domain == RADEON_GEM_DOMAIN_CPU)
+			goto error_unreserve;
+	}
+
+	mutex_lock(&bo_va->vm->mutex);
+	r = radeon_vm_clear_freed(rdev, bo_va->vm);
+	if (r)
+		goto error_unlock;
+
+	if (bo_va->it.start)
+		r = radeon_vm_bo_update(rdev, bo_va, &bo_va->bo->tbo.mem);
+
+error_unlock:
+	mutex_unlock(&bo_va->vm->mutex);
+
+error_unreserve:
+	ttm_eu_backoff_reservation(&ticket, &list);
+
+error_free:
+	drm_free_large(vm_bos);
+
+	if (r)
+		DRM_ERROR("Couldn't update BO_VA (%d)\n", r);
+}
+
 int radeon_gem_va_ioctl(struct drm_device *dev, void *data,
 			  struct drm_file *filp)
 {
@@ -601,6 +680,7 @@ int radeon_gem_va_ioctl(struct drm_device *dev, void *data,
 		if (bo_va->it.start) {
 			args->operation = RADEON_VA_RESULT_VA_EXIST;
 			args->offset = bo_va->it.start * RADEON_GPU_PAGE_SIZE;
+			radeon_bo_unreserve(rbo);
 			goto out;
 		}
 		r = radeon_vm_bo_set_addr(rdev, bo_va, args->offset, args->flags);
@@ -611,12 +691,13 @@ int radeon_gem_va_ioctl(struct drm_device *dev, void *data,
 	default:
 		break;
 	}
+	if (!r)
+		radeon_gem_va_update_vm(rdev, bo_va);
 	args->operation = RADEON_VA_RESULT_OK;
 	if (r) {
 		args->operation = RADEON_VA_RESULT_ERROR;
 	}
 out:
-	radeon_bo_unreserve(rbo);
 	drm_gem_object_unreference_unlocked(gobj);
 	return r;
 }
@@ -682,6 +763,7 @@ int radeon_mode_dumb_create(struct drm_file *file_priv,
 		return -ENOMEM;
 
 	r = drm_gem_handle_create(file_priv, gobj, &handle);
+	gobj->dumb = true;
 	/* drop reference from allocate - handle holds it now */
 	drm_gem_object_unreference_unlocked(gobj);
 	if (r) {
diff --git a/drivers/gpu/drm/radeon/radeon_ib.c b/drivers/gpu/drm/radeon/radeon_ib.c
index 3f39fcca4d07..c39ce1f05703 100644
--- a/drivers/gpu/drm/radeon/radeon_ib.c
+++ b/drivers/gpu/drm/radeon/radeon_ib.c
@@ -64,10 +64,7 @@ int radeon_ib_get(struct radeon_device *rdev, int ring,
 		return r;
 	}
 
-	r = radeon_semaphore_create(rdev, &ib->semaphore);
-	if (r) {
-		return r;
-	}
+	radeon_sync_create(&ib->sync);
 
 	ib->ring = ring;
 	ib->fence = NULL;
@@ -96,7 +93,7 @@ int radeon_ib_get(struct radeon_device *rdev, int ring,
  */
 void radeon_ib_free(struct radeon_device *rdev, struct radeon_ib *ib)
 {
-	radeon_semaphore_free(rdev, &ib->semaphore, ib->fence);
+	radeon_sync_free(rdev, &ib->sync, ib->fence);
 	radeon_sa_bo_free(rdev, &ib->sa_bo, ib->fence);
 	radeon_fence_unref(&ib->fence);
 }
@@ -145,11 +142,11 @@ int radeon_ib_schedule(struct radeon_device *rdev, struct radeon_ib *ib,
 	if (ib->vm) {
 		struct radeon_fence *vm_id_fence;
 		vm_id_fence = radeon_vm_grab_id(rdev, ib->vm, ib->ring);
-		radeon_semaphore_sync_fence(ib->semaphore, vm_id_fence);
+		radeon_sync_fence(&ib->sync, vm_id_fence);
 	}
 
 	/* sync with other rings */
-	r = radeon_semaphore_sync_rings(rdev, ib->semaphore, ib->ring);
+	r = radeon_sync_rings(rdev, &ib->sync, ib->ring);
 	if (r) {
 		dev_err(rdev->dev, "failed to sync rings (%d)\n", r);
 		radeon_ring_unlock_undo(rdev, ring);
@@ -157,11 +154,12 @@ int radeon_ib_schedule(struct radeon_device *rdev, struct radeon_ib *ib,
 	}
 
 	if (ib->vm)
-		radeon_vm_flush(rdev, ib->vm, ib->ring);
+		radeon_vm_flush(rdev, ib->vm, ib->ring,
+				ib->sync.last_vm_update);
 
 	if (const_ib) {
 		radeon_ring_ib_execute(rdev, const_ib->ring, const_ib);
-		radeon_semaphore_free(rdev, &const_ib->semaphore, NULL);
+		radeon_sync_free(rdev, &const_ib->sync, NULL);
 	}
 	radeon_ring_ib_execute(rdev, ib->ring, ib);
 	r = radeon_fence_emit(rdev, &ib->fence, ib->ring);
diff --git a/drivers/gpu/drm/radeon/radeon_kfd.c b/drivers/gpu/drm/radeon/radeon_kfd.c
new file mode 100644
index 000000000000..065d02068ec3
--- /dev/null
+++ b/drivers/gpu/drm/radeon/radeon_kfd.c
@@ -0,0 +1,563 @@
+/*
+ * Copyright 2014 Advanced Micro Devices, Inc.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR
+ * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
+ * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
+ * OTHER DEALINGS IN THE SOFTWARE.
+ */
+
+#include <linux/module.h>
+#include <linux/fdtable.h>
+#include <linux/uaccess.h>
+#include <drm/drmP.h>
+#include "radeon.h"
+#include "cikd.h"
+#include "cik_reg.h"
+#include "radeon_kfd.h"
+
+#define CIK_PIPE_PER_MEC	(4)
+
+struct kgd_mem {
+	struct radeon_sa_bo *sa_bo;
+	uint64_t gpu_addr;
+	void *ptr;
+};
+
+static int init_sa_manager(struct kgd_dev *kgd, unsigned int size);
+static void fini_sa_manager(struct kgd_dev *kgd);
+
+static int allocate_mem(struct kgd_dev *kgd, size_t size, size_t alignment,
+		enum kgd_memory_pool pool, struct kgd_mem **mem);
+
+static void free_mem(struct kgd_dev *kgd, struct kgd_mem *mem);
+
+static uint64_t get_vmem_size(struct kgd_dev *kgd);
+static uint64_t get_gpu_clock_counter(struct kgd_dev *kgd);
+
+static uint32_t get_max_engine_clock_in_mhz(struct kgd_dev *kgd);
+
+/*
+ * Register access functions
+ */
+
+static void kgd_program_sh_mem_settings(struct kgd_dev *kgd, uint32_t vmid,
+		uint32_t sh_mem_config,	uint32_t sh_mem_ape1_base,
+		uint32_t sh_mem_ape1_limit, uint32_t sh_mem_bases);
+
+static int kgd_set_pasid_vmid_mapping(struct kgd_dev *kgd, unsigned int pasid,
+					unsigned int vmid);
+
+static int kgd_init_memory(struct kgd_dev *kgd);
+
+static int kgd_init_pipeline(struct kgd_dev *kgd, uint32_t pipe_id,
+				uint32_t hpd_size, uint64_t hpd_gpu_addr);
+
+static int kgd_hqd_load(struct kgd_dev *kgd, void *mqd, uint32_t pipe_id,
+			uint32_t queue_id, uint32_t __user *wptr);
+
+static bool kgd_hqd_is_occupies(struct kgd_dev *kgd, uint64_t queue_address,
+				uint32_t pipe_id, uint32_t queue_id);
+
+static int kgd_hqd_destroy(struct kgd_dev *kgd, uint32_t reset_type,
+				unsigned int timeout, uint32_t pipe_id,
+				uint32_t queue_id);
+
+static const struct kfd2kgd_calls kfd2kgd = {
+	.init_sa_manager = init_sa_manager,
+	.fini_sa_manager = fini_sa_manager,
+	.allocate_mem = allocate_mem,
+	.free_mem = free_mem,
+	.get_vmem_size = get_vmem_size,
+	.get_gpu_clock_counter = get_gpu_clock_counter,
+	.get_max_engine_clock_in_mhz = get_max_engine_clock_in_mhz,
+	.program_sh_mem_settings = kgd_program_sh_mem_settings,
+	.set_pasid_vmid_mapping = kgd_set_pasid_vmid_mapping,
+	.init_memory = kgd_init_memory,
+	.init_pipeline = kgd_init_pipeline,
+	.hqd_load = kgd_hqd_load,
+	.hqd_is_occupies = kgd_hqd_is_occupies,
+	.hqd_destroy = kgd_hqd_destroy,
+};
+
+static const struct kgd2kfd_calls *kgd2kfd;
+
+bool radeon_kfd_init(void)
+{
+	bool (*kgd2kfd_init_p)(unsigned, const struct kfd2kgd_calls*,
+				const struct kgd2kfd_calls**);
+
+	kgd2kfd_init_p = symbol_request(kgd2kfd_init);
+
+	if (kgd2kfd_init_p == NULL)
+		return false;
+
+	if (!kgd2kfd_init_p(KFD_INTERFACE_VERSION, &kfd2kgd, &kgd2kfd)) {
+		symbol_put(kgd2kfd_init);
+		kgd2kfd = NULL;
+
+		return false;
+	}
+
+	return true;
+}
+
+void radeon_kfd_fini(void)
+{
+	if (kgd2kfd) {
+		kgd2kfd->exit();
+		symbol_put(kgd2kfd_init);
+	}
+}
+
+void radeon_kfd_device_probe(struct radeon_device *rdev)
+{
+	if (kgd2kfd)
+		rdev->kfd = kgd2kfd->probe((struct kgd_dev *)rdev, rdev->pdev);
+}
+
+void radeon_kfd_device_init(struct radeon_device *rdev)
+{
+	if (rdev->kfd) {
+		struct kgd2kfd_shared_resources gpu_resources = {
+			.compute_vmid_bitmap = 0xFF00,
+
+			.first_compute_pipe = 1,
+			.compute_pipe_count = 8 - 1,
+		};
+
+		radeon_doorbell_get_kfd_info(rdev,
+				&gpu_resources.doorbell_physical_address,
+				&gpu_resources.doorbell_aperture_size,
+				&gpu_resources.doorbell_start_offset);
+
+		kgd2kfd->device_init(rdev->kfd, &gpu_resources);
+	}
+}
+
+void radeon_kfd_device_fini(struct radeon_device *rdev)
+{
+	if (rdev->kfd) {
+		kgd2kfd->device_exit(rdev->kfd);
+		rdev->kfd = NULL;
+	}
+}
+
+void radeon_kfd_interrupt(struct radeon_device *rdev, const void *ih_ring_entry)
+{
+	if (rdev->kfd)
+		kgd2kfd->interrupt(rdev->kfd, ih_ring_entry);
+}
+
+void radeon_kfd_suspend(struct radeon_device *rdev)
+{
+	if (rdev->kfd)
+		kgd2kfd->suspend(rdev->kfd);
+}
+
+int radeon_kfd_resume(struct radeon_device *rdev)
+{
+	int r = 0;
+
+	if (rdev->kfd)
+		r = kgd2kfd->resume(rdev->kfd);
+
+	return r;
+}
+
+static u32 pool_to_domain(enum kgd_memory_pool p)
+{
+	switch (p) {
+	case KGD_POOL_FRAMEBUFFER: return RADEON_GEM_DOMAIN_VRAM;
+	default: return RADEON_GEM_DOMAIN_GTT;
+	}
+}
+
+static int init_sa_manager(struct kgd_dev *kgd, unsigned int size)
+{
+	struct radeon_device *rdev = (struct radeon_device *)kgd;
+	int r;
+
+	BUG_ON(kgd == NULL);
+
+	r = radeon_sa_bo_manager_init(rdev, &rdev->kfd_bo,
+				      size,
+				      RADEON_GPU_PAGE_SIZE,
+				      RADEON_GEM_DOMAIN_GTT,
+				      RADEON_GEM_GTT_WC);
+
+	if (r)
+		return r;
+
+	r = radeon_sa_bo_manager_start(rdev, &rdev->kfd_bo);
+	if (r)
+		radeon_sa_bo_manager_fini(rdev, &rdev->kfd_bo);
+
+	return r;
+}
+
+static void fini_sa_manager(struct kgd_dev *kgd)
+{
+	struct radeon_device *rdev = (struct radeon_device *)kgd;
+
+	BUG_ON(kgd == NULL);
+
+	radeon_sa_bo_manager_suspend(rdev, &rdev->kfd_bo);
+	radeon_sa_bo_manager_fini(rdev, &rdev->kfd_bo);
+}
+
+static int allocate_mem(struct kgd_dev *kgd, size_t size, size_t alignment,
+		enum kgd_memory_pool pool, struct kgd_mem **mem)
+{
+	struct radeon_device *rdev = (struct radeon_device *)kgd;
+	u32 domain;
+	int r;
+
+	BUG_ON(kgd == NULL);
+
+	domain = pool_to_domain(pool);
+	if (domain != RADEON_GEM_DOMAIN_GTT) {
+		dev_err(rdev->dev,
+			"Only allowed to allocate gart memory for kfd\n");
+		return -EINVAL;
+	}
+
+	*mem = kmalloc(sizeof(struct kgd_mem), GFP_KERNEL);
+	if ((*mem) == NULL)
+		return -ENOMEM;
+
+	r = radeon_sa_bo_new(rdev, &rdev->kfd_bo, &(*mem)->sa_bo, size,
+				alignment);
+	if (r) {
+		dev_err(rdev->dev, "failed to get memory for kfd (%d)\n", r);
+		return r;
+	}
+
+	(*mem)->ptr = radeon_sa_bo_cpu_addr((*mem)->sa_bo);
+	(*mem)->gpu_addr = radeon_sa_bo_gpu_addr((*mem)->sa_bo);
+
+	return 0;
+}
+
+static void free_mem(struct kgd_dev *kgd, struct kgd_mem *mem)
+{
+	struct radeon_device *rdev = (struct radeon_device *)kgd;
+
+	BUG_ON(kgd == NULL);
+
+	radeon_sa_bo_free(rdev, &mem->sa_bo, NULL);
+	kfree(mem);
+}
+
+static uint64_t get_vmem_size(struct kgd_dev *kgd)
+{
+	struct radeon_device *rdev = (struct radeon_device *)kgd;
+
+	BUG_ON(kgd == NULL);
+
+	return rdev->mc.real_vram_size;
+}
+
+static uint64_t get_gpu_clock_counter(struct kgd_dev *kgd)
+{
+	struct radeon_device *rdev = (struct radeon_device *)kgd;
+
+	return rdev->asic->get_gpu_clock_counter(rdev);
+}
+
+static uint32_t get_max_engine_clock_in_mhz(struct kgd_dev *kgd)
+{
+	struct radeon_device *rdev = (struct radeon_device *)kgd;
+
+	/* The sclk is in units of 10 kHz */
+	return rdev->pm.dpm.dyn_state.max_clock_voltage_on_ac.sclk / 100;
+}
+
+static inline struct radeon_device *get_radeon_device(struct kgd_dev *kgd)
+{
+	return (struct radeon_device *)kgd;
+}
+
+static void write_register(struct kgd_dev *kgd, uint32_t offset, uint32_t value)
+{
+	struct radeon_device *rdev = get_radeon_device(kgd);
+
+	writel(value, (void __iomem *)(rdev->rmmio + offset));
+}
+
+static uint32_t read_register(struct kgd_dev *kgd, uint32_t offset)
+{
+	struct radeon_device *rdev = get_radeon_device(kgd);
+
+	return readl((void __iomem *)(rdev->rmmio + offset));
+}
+
+static void lock_srbm(struct kgd_dev *kgd, uint32_t mec, uint32_t pipe,
+			uint32_t queue, uint32_t vmid)
+{
+	struct radeon_device *rdev = get_radeon_device(kgd);
+	uint32_t value = PIPEID(pipe) | MEID(mec) | VMID(vmid) | QUEUEID(queue);
+
+	mutex_lock(&rdev->srbm_mutex);
+	write_register(kgd, SRBM_GFX_CNTL, value);
+}
+
+static void unlock_srbm(struct kgd_dev *kgd)
+{
+	struct radeon_device *rdev = get_radeon_device(kgd);
+
+	write_register(kgd, SRBM_GFX_CNTL, 0);
+	mutex_unlock(&rdev->srbm_mutex);
+}
+
+static void acquire_queue(struct kgd_dev *kgd, uint32_t pipe_id,
+				uint32_t queue_id)
+{
+	uint32_t mec = (++pipe_id / CIK_PIPE_PER_MEC) + 1;
+	uint32_t pipe = (pipe_id % CIK_PIPE_PER_MEC);
+
+	lock_srbm(kgd, mec, pipe, queue_id, 0);
+}
+
+static void release_queue(struct kgd_dev *kgd)
+{
+	unlock_srbm(kgd);
+}
+
+static void kgd_program_sh_mem_settings(struct kgd_dev *kgd, uint32_t vmid,
+					uint32_t sh_mem_config,
+					uint32_t sh_mem_ape1_base,
+					uint32_t sh_mem_ape1_limit,
+					uint32_t sh_mem_bases)
+{
+	lock_srbm(kgd, 0, 0, 0, vmid);
+
+	write_register(kgd, SH_MEM_CONFIG, sh_mem_config);
+	write_register(kgd, SH_MEM_APE1_BASE, sh_mem_ape1_base);
+	write_register(kgd, SH_MEM_APE1_LIMIT, sh_mem_ape1_limit);
+	write_register(kgd, SH_MEM_BASES, sh_mem_bases);
+
+	unlock_srbm(kgd);
+}
+
+static int kgd_set_pasid_vmid_mapping(struct kgd_dev *kgd, unsigned int pasid,
+					unsigned int vmid)
+{
+	/*
+	 * We have to assume that there is no outstanding mapping.
+	 * The ATC_VMID_PASID_MAPPING_UPDATE_STATUS bit could be 0
+	 * because a mapping is in progress or because a mapping finished and
+	 * the SW cleared it.
+	 * So the protocol is to always wait & clear.
+	 */
+	uint32_t pasid_mapping = (pasid == 0) ? 0 :
+				(uint32_t)pasid | ATC_VMID_PASID_MAPPING_VALID;
+
+	write_register(kgd, ATC_VMID0_PASID_MAPPING + vmid*sizeof(uint32_t),
+			pasid_mapping);
+
+	while (!(read_register(kgd, ATC_VMID_PASID_MAPPING_UPDATE_STATUS) &
+								(1U << vmid)))
+		cpu_relax();
+	write_register(kgd, ATC_VMID_PASID_MAPPING_UPDATE_STATUS, 1U << vmid);
+
+	return 0;
+}
+
+static int kgd_init_memory(struct kgd_dev *kgd)
+{
+	/*
+	 * Configure apertures:
+	 * LDS:         0x60000000'00000000 - 0x60000001'00000000 (4GB)
+	 * Scratch:     0x60000001'00000000 - 0x60000002'00000000 (4GB)
+	 * GPUVM:       0x60010000'00000000 - 0x60020000'00000000 (1TB)
+	 */
+	int i;
+	uint32_t sh_mem_bases = PRIVATE_BASE(0x6000) | SHARED_BASE(0x6000);
+
+	for (i = 8; i < 16; i++) {
+		uint32_t sh_mem_config;
+
+		lock_srbm(kgd, 0, 0, 0, i);
+
+		sh_mem_config = ALIGNMENT_MODE(SH_MEM_ALIGNMENT_MODE_UNALIGNED);
+		sh_mem_config |= DEFAULT_MTYPE(MTYPE_NONCACHED);
+
+		write_register(kgd, SH_MEM_CONFIG, sh_mem_config);
+
+		write_register(kgd, SH_MEM_BASES, sh_mem_bases);
+
+		/* Scratch aperture is not supported for now. */
+		write_register(kgd, SH_STATIC_MEM_CONFIG, 0);
+
+		/* APE1 disabled for now (base > limit disables it). */
+		write_register(kgd, SH_MEM_APE1_BASE, 1);
+		write_register(kgd, SH_MEM_APE1_LIMIT, 0);
+
+		unlock_srbm(kgd);
+	}
+
+	return 0;
+}
+
+static int kgd_init_pipeline(struct kgd_dev *kgd, uint32_t pipe_id,
+				uint32_t hpd_size, uint64_t hpd_gpu_addr)
+{
+	uint32_t mec = (++pipe_id / CIK_PIPE_PER_MEC) + 1;
+	uint32_t pipe = (pipe_id % CIK_PIPE_PER_MEC);
+
+	lock_srbm(kgd, mec, pipe, 0, 0);
+	write_register(kgd, CP_HPD_EOP_BASE_ADDR,
+			lower_32_bits(hpd_gpu_addr >> 8));
+	write_register(kgd, CP_HPD_EOP_BASE_ADDR_HI,
+			upper_32_bits(hpd_gpu_addr >> 8));
+	write_register(kgd, CP_HPD_EOP_VMID, 0);
+	write_register(kgd, CP_HPD_EOP_CONTROL, hpd_size);
+	unlock_srbm(kgd);
+
+	return 0;
+}
+
+static inline struct cik_mqd *get_mqd(void *mqd)
+{
+	return (struct cik_mqd *)mqd;
+}
+
+static int kgd_hqd_load(struct kgd_dev *kgd, void *mqd, uint32_t pipe_id,
+			uint32_t queue_id, uint32_t __user *wptr)
+{
+	uint32_t wptr_shadow, is_wptr_shadow_valid;
+	struct cik_mqd *m;
+
+	m = get_mqd(mqd);
+
+	is_wptr_shadow_valid = !get_user(wptr_shadow, wptr);
+
+	acquire_queue(kgd, pipe_id, queue_id);
+	write_register(kgd, CP_MQD_BASE_ADDR, m->cp_mqd_base_addr_lo);
+	write_register(kgd, CP_MQD_BASE_ADDR_HI, m->cp_mqd_base_addr_hi);
+	write_register(kgd, CP_MQD_CONTROL, m->cp_mqd_control);
+
+	write_register(kgd, CP_HQD_PQ_BASE, m->cp_hqd_pq_base_lo);
+	write_register(kgd, CP_HQD_PQ_BASE_HI, m->cp_hqd_pq_base_hi);
+	write_register(kgd, CP_HQD_PQ_CONTROL, m->cp_hqd_pq_control);
+
+	write_register(kgd, CP_HQD_IB_CONTROL, m->cp_hqd_ib_control);
+	write_register(kgd, CP_HQD_IB_BASE_ADDR, m->cp_hqd_ib_base_addr_lo);
+	write_register(kgd, CP_HQD_IB_BASE_ADDR_HI, m->cp_hqd_ib_base_addr_hi);
+
+	write_register(kgd, CP_HQD_IB_RPTR, m->cp_hqd_ib_rptr);
+
+	write_register(kgd, CP_HQD_PERSISTENT_STATE,
+			m->cp_hqd_persistent_state);
+	write_register(kgd, CP_HQD_SEMA_CMD, m->cp_hqd_sema_cmd);
+	write_register(kgd, CP_HQD_MSG_TYPE, m->cp_hqd_msg_type);
+
+	write_register(kgd, CP_HQD_ATOMIC0_PREOP_LO,
+			m->cp_hqd_atomic0_preop_lo);
+
+	write_register(kgd, CP_HQD_ATOMIC0_PREOP_HI,
+			m->cp_hqd_atomic0_preop_hi);
+
+	write_register(kgd, CP_HQD_ATOMIC1_PREOP_LO,
+			m->cp_hqd_atomic1_preop_lo);
+
+	write_register(kgd, CP_HQD_ATOMIC1_PREOP_HI,
+			m->cp_hqd_atomic1_preop_hi);
+
+	write_register(kgd, CP_HQD_PQ_RPTR_REPORT_ADDR,
+			m->cp_hqd_pq_rptr_report_addr_lo);
+
+	write_register(kgd, CP_HQD_PQ_RPTR_REPORT_ADDR_HI,
+			m->cp_hqd_pq_rptr_report_addr_hi);
+
+	write_register(kgd, CP_HQD_PQ_RPTR, m->cp_hqd_pq_rptr);
+
+	write_register(kgd, CP_HQD_PQ_WPTR_POLL_ADDR,
+			m->cp_hqd_pq_wptr_poll_addr_lo);
+
+	write_register(kgd, CP_HQD_PQ_WPTR_POLL_ADDR_HI,
+			m->cp_hqd_pq_wptr_poll_addr_hi);
+
+	write_register(kgd, CP_HQD_PQ_DOORBELL_CONTROL,
+			m->cp_hqd_pq_doorbell_control);
+
+	write_register(kgd, CP_HQD_VMID, m->cp_hqd_vmid);
+
+	write_register(kgd, CP_HQD_QUANTUM, m->cp_hqd_quantum);
+
+	write_register(kgd, CP_HQD_PIPE_PRIORITY, m->cp_hqd_pipe_priority);
+	write_register(kgd, CP_HQD_QUEUE_PRIORITY, m->cp_hqd_queue_priority);
+
+	write_register(kgd, CP_HQD_IQ_RPTR, m->cp_hqd_iq_rptr);
+
+	if (is_wptr_shadow_valid)
+		write_register(kgd, CP_HQD_PQ_WPTR, wptr_shadow);
+
+	write_register(kgd, CP_HQD_ACTIVE, m->cp_hqd_active);
+	release_queue(kgd);
+
+	return 0;
+}
+
+static bool kgd_hqd_is_occupies(struct kgd_dev *kgd, uint64_t queue_address,
+				uint32_t pipe_id, uint32_t queue_id)
+{
+	uint32_t act;
+	bool retval = false;
+	uint32_t low, high;
+
+	acquire_queue(kgd, pipe_id, queue_id);
+	act = read_register(kgd, CP_HQD_ACTIVE);
+	if (act) {
+		low = lower_32_bits(queue_address >> 8);
+		high = upper_32_bits(queue_address >> 8);
+
+		if (low == read_register(kgd, CP_HQD_PQ_BASE) &&
+				high == read_register(kgd, CP_HQD_PQ_BASE_HI))
+			retval = true;
+	}
+	release_queue(kgd);
+	return retval;
+}
+
+static int kgd_hqd_destroy(struct kgd_dev *kgd, uint32_t reset_type,
+				unsigned int timeout, uint32_t pipe_id,
+				uint32_t queue_id)
+{
+	uint32_t temp;
+
+	acquire_queue(kgd, pipe_id, queue_id);
+	write_register(kgd, CP_HQD_PQ_DOORBELL_CONTROL, 0);
+
+	write_register(kgd, CP_HQD_DEQUEUE_REQUEST, reset_type);
+
+	while (true) {
+		temp = read_register(kgd, CP_HQD_ACTIVE);
+		if (!(temp & 0x1))	/* queue is no longer active */
+			break;
+		if (timeout == 0) {
+			pr_err("kfd: cp queue preemption time out\n");
+			release_queue(kgd);
+			return -ETIME;
+		}
+		msleep(20);
+		timeout = (timeout > 20) ? timeout - 20 : 0;
+	}
+
+	release_queue(kgd);
+	return 0;
+}
diff --git a/drivers/gpu/drm/radeon/radeon_kfd.h b/drivers/gpu/drm/radeon/radeon_kfd.h
new file mode 100644
index 000000000000..f90e161ca507
--- /dev/null
+++ b/drivers/gpu/drm/radeon/radeon_kfd.h
@@ -0,0 +1,47 @@
+/*
+ * Copyright 2014 Advanced Micro Devices, Inc.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR
+ * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
+ * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
+ * OTHER DEALINGS IN THE SOFTWARE.
+ */
+
+/*
+ * radeon_kfd.h defines the private interface between the
+ * AMD kernel graphics drivers and the AMD KFD.
+ */
+
+#ifndef RADEON_KFD_H_INCLUDED
+#define RADEON_KFD_H_INCLUDED
+
+#include <linux/types.h>
+#include "../amd/include/kgd_kfd_interface.h"
+
+struct radeon_device;
+
+bool radeon_kfd_init(void);
+void radeon_kfd_fini(void);
+
+void radeon_kfd_suspend(struct radeon_device *rdev);
+int radeon_kfd_resume(struct radeon_device *rdev);
+void radeon_kfd_interrupt(struct radeon_device *rdev,
+			const void *ih_ring_entry);
+void radeon_kfd_device_probe(struct radeon_device *rdev);
+void radeon_kfd_device_init(struct radeon_device *rdev);
+void radeon_kfd_device_fini(struct radeon_device *rdev);
+
+#endif /* RADEON_KFD_H_INCLUDED */
diff --git a/drivers/gpu/drm/radeon/radeon_kms.c b/drivers/gpu/drm/radeon/radeon_kms.c
index 03586763ee86..3cf9c1fa6475 100644
--- a/drivers/gpu/drm/radeon/radeon_kms.c
+++ b/drivers/gpu/drm/radeon/radeon_kms.c
@@ -34,6 +34,8 @@
 #include <linux/slab.h>
 #include <linux/pm_runtime.h>
 
+#include "radeon_kfd.h"
+
 #if defined(CONFIG_VGA_SWITCHEROO)
 bool radeon_has_atpx(void);
 #else
@@ -63,6 +65,8 @@ int radeon_driver_unload_kms(struct drm_device *dev)
 
 	pm_runtime_get_sync(dev->dev);
 
+	radeon_kfd_device_fini(rdev);
+
 	radeon_acpi_fini(rdev);
 	
 	radeon_modeset_fini(rdev);
@@ -142,6 +146,9 @@ int radeon_driver_load_kms(struct drm_device *dev, unsigned long flags)
 				"Error during ACPI methods call\n");
 	}
 
+	radeon_kfd_device_probe(rdev);
+	radeon_kfd_device_init(rdev);
+
 	if (radeon_is_px(dev)) {
 		pm_runtime_use_autosuspend(dev->dev);
 		pm_runtime_set_autosuspend_delay(dev->dev, 5000);
@@ -621,8 +628,6 @@ int radeon_driver_open_kms(struct drm_device *dev, struct drm_file *file_priv)
 						  RADEON_VA_IB_OFFSET,
 						  RADEON_VM_PAGE_READABLE |
 						  RADEON_VM_PAGE_SNOOPED);
-
-			radeon_bo_unreserve(rdev->ring_tmp_bo.bo);
 			if (r) {
 				radeon_vm_fini(rdev, vm);
 				kfree(fpriv);
diff --git a/drivers/gpu/drm/radeon/radeon_legacy_crtc.c b/drivers/gpu/drm/radeon/radeon_legacy_crtc.c
index cafb1ccf2ec3..678b4386540d 100644
--- a/drivers/gpu/drm/radeon/radeon_legacy_crtc.c
+++ b/drivers/gpu/drm/radeon/radeon_legacy_crtc.c
@@ -1054,6 +1054,7 @@ static int radeon_crtc_mode_set(struct drm_crtc *crtc,
 			DRM_ERROR("Mode need scaling but only first crtc can do that.\n");
 		}
 	}
+	radeon_cursor_reset(crtc);
 	return 0;
 }
 
diff --git a/drivers/gpu/drm/radeon/radeon_mode.h b/drivers/gpu/drm/radeon/radeon_mode.h
index 04db2fdd8692..390db897f322 100644
--- a/drivers/gpu/drm/radeon/radeon_mode.h
+++ b/drivers/gpu/drm/radeon/radeon_mode.h
@@ -321,6 +321,10 @@ struct radeon_crtc {
 	uint32_t crtc_offset;
 	struct drm_gem_object *cursor_bo;
 	uint64_t cursor_addr;
+	int cursor_x;
+	int cursor_y;
+	int cursor_hot_x;
+	int cursor_hot_y;
 	int cursor_width;
 	int cursor_height;
 	int max_cursor_width;
@@ -462,6 +466,7 @@ struct radeon_gpio_rec {
 	u8 id;
 	u32 reg;
 	u32 mask;
+	u32 shift;
 };
 
 struct radeon_hpd {
@@ -748,6 +753,8 @@ extern bool radeon_atombios_get_ppll_ss_info(struct radeon_device *rdev,
 extern bool radeon_atombios_get_asic_ss_info(struct radeon_device *rdev,
 					     struct radeon_atom_ss *ss,
 					     int id, u32 clock);
+extern struct radeon_gpio_rec radeon_atombios_lookup_gpio(struct radeon_device *rdev,
+							  u8 id);
 
 extern void radeon_compute_pll_legacy(struct radeon_pll *pll,
 				      uint64_t freq,
@@ -802,13 +809,16 @@ extern int radeon_crtc_set_base_atomic(struct drm_crtc *crtc,
 extern int radeon_crtc_do_set_base(struct drm_crtc *crtc,
 				   struct drm_framebuffer *fb,
 				   int x, int y, int atomic);
-extern int radeon_crtc_cursor_set(struct drm_crtc *crtc,
-				  struct drm_file *file_priv,
-				  uint32_t handle,
-				  uint32_t width,
-				  uint32_t height);
+extern int radeon_crtc_cursor_set2(struct drm_crtc *crtc,
+				   struct drm_file *file_priv,
+				   uint32_t handle,
+				   uint32_t width,
+				   uint32_t height,
+				   int32_t hot_x,
+				   int32_t hot_y);
 extern int radeon_crtc_cursor_move(struct drm_crtc *crtc,
 				   int x, int y);
+extern void radeon_cursor_reset(struct drm_crtc *crtc);
 
 extern int radeon_get_crtc_scanoutpos(struct drm_device *dev, int crtc,
 				      unsigned int flags,
diff --git a/drivers/gpu/drm/radeon/radeon_object.c b/drivers/gpu/drm/radeon/radeon_object.c
index 4c0d786d5c7a..7d68223eb469 100644
--- a/drivers/gpu/drm/radeon/radeon_object.c
+++ b/drivers/gpu/drm/radeon/radeon_object.c
@@ -99,22 +99,39 @@ void radeon_ttm_placement_from_domain(struct radeon_bo *rbo, u32 domain)
 
 	rbo->placement.placement = rbo->placements;
 	rbo->placement.busy_placement = rbo->placements;
-	if (domain & RADEON_GEM_DOMAIN_VRAM)
+	if (domain & RADEON_GEM_DOMAIN_VRAM) {
+		/* Try placing BOs which don't need CPU access outside of the
+		 * CPU accessible part of VRAM
+		 */
+		if ((rbo->flags & RADEON_GEM_NO_CPU_ACCESS) &&
+		    rbo->rdev->mc.visible_vram_size < rbo->rdev->mc.real_vram_size) {
+			rbo->placements[c].fpfn =
+				rbo->rdev->mc.visible_vram_size >> PAGE_SHIFT;
+			rbo->placements[c++].flags = TTM_PL_FLAG_WC |
+						     TTM_PL_FLAG_UNCACHED |
+						     TTM_PL_FLAG_VRAM;
+		}
+
+		rbo->placements[c].fpfn = 0;
 		rbo->placements[c++].flags = TTM_PL_FLAG_WC |
 					     TTM_PL_FLAG_UNCACHED |
 					     TTM_PL_FLAG_VRAM;
+	}
 
 	if (domain & RADEON_GEM_DOMAIN_GTT) {
 		if (rbo->flags & RADEON_GEM_GTT_UC) {
+			rbo->placements[c].fpfn = 0;
 			rbo->placements[c++].flags = TTM_PL_FLAG_UNCACHED |
 				TTM_PL_FLAG_TT;
 
 		} else if ((rbo->flags & RADEON_GEM_GTT_WC) ||
 			   (rbo->rdev->flags & RADEON_IS_AGP)) {
+			rbo->placements[c].fpfn = 0;
 			rbo->placements[c++].flags = TTM_PL_FLAG_WC |
 				TTM_PL_FLAG_UNCACHED |
 				TTM_PL_FLAG_TT;
 		} else {
+			rbo->placements[c].fpfn = 0;
 			rbo->placements[c++].flags = TTM_PL_FLAG_CACHED |
 						     TTM_PL_FLAG_TT;
 		}
@@ -122,30 +139,35 @@ void radeon_ttm_placement_from_domain(struct radeon_bo *rbo, u32 domain)
 
 	if (domain & RADEON_GEM_DOMAIN_CPU) {
 		if (rbo->flags & RADEON_GEM_GTT_UC) {
+			rbo->placements[c].fpfn = 0;
 			rbo->placements[c++].flags = TTM_PL_FLAG_UNCACHED |
 				TTM_PL_FLAG_SYSTEM;
 
 		} else if ((rbo->flags & RADEON_GEM_GTT_WC) ||
 		    rbo->rdev->flags & RADEON_IS_AGP) {
+			rbo->placements[c].fpfn = 0;
 			rbo->placements[c++].flags = TTM_PL_FLAG_WC |
 				TTM_PL_FLAG_UNCACHED |
 				TTM_PL_FLAG_SYSTEM;
 		} else {
+			rbo->placements[c].fpfn = 0;
 			rbo->placements[c++].flags = TTM_PL_FLAG_CACHED |
 						     TTM_PL_FLAG_SYSTEM;
 		}
 	}
-	if (!c)
+	if (!c) {
+		rbo->placements[c].fpfn = 0;
 		rbo->placements[c++].flags = TTM_PL_MASK_CACHING |
 					     TTM_PL_FLAG_SYSTEM;
+	}
 
 	rbo->placement.num_placement = c;
 	rbo->placement.num_busy_placement = c;
 
 	for (i = 0; i < c; ++i) {
-		rbo->placements[i].fpfn = 0;
 		if ((rbo->flags & RADEON_GEM_CPU_ACCESS) &&
-		    (rbo->placements[i].flags & TTM_PL_FLAG_VRAM))
+		    (rbo->placements[i].flags & TTM_PL_FLAG_VRAM) &&
+		    !rbo->placements[i].fpfn)
 			rbo->placements[i].lpfn =
 				rbo->rdev->mc.visible_vram_size >> PAGE_SHIFT;
 		else
@@ -157,9 +179,7 @@ void radeon_ttm_placement_from_domain(struct radeon_bo *rbo, u32 domain)
 	 * improve fragmentation quality.
 	 * 512kb was measured as the most optimal number.
 	 */
-	if (!((rbo->flags & RADEON_GEM_CPU_ACCESS) &&
-	      (rbo->placements[i].flags & TTM_PL_FLAG_VRAM)) &&
-	    rbo->tbo.mem.size > 512 * 1024) {
+	if (rbo->tbo.mem.size > 512 * 1024) {
 		for (i = 0; i < c; i++) {
 			rbo->placements[i].flags |= TTM_PL_FLAG_TOPDOWN;
 		}
@@ -489,25 +509,29 @@ int radeon_bo_list_validate(struct radeon_device *rdev,
 			    struct ww_acquire_ctx *ticket,
 			    struct list_head *head, int ring)
 {
-	struct radeon_cs_reloc *lobj;
-	struct radeon_bo *bo;
+	struct radeon_bo_list *lobj;
+	struct list_head duplicates;
 	int r;
 	u64 bytes_moved = 0, initial_bytes_moved;
 	u64 bytes_moved_threshold = radeon_bo_get_threshold_for_moves(rdev);
 
-	r = ttm_eu_reserve_buffers(ticket, head, true);
+	INIT_LIST_HEAD(&duplicates);
+	r = ttm_eu_reserve_buffers(ticket, head, true, &duplicates);
 	if (unlikely(r != 0)) {
 		return r;
 	}
 
 	list_for_each_entry(lobj, head, tv.head) {
-		bo = lobj->robj;
+		struct radeon_bo *bo = lobj->robj;
 		if (!bo->pin_count) {
 			u32 domain = lobj->prefered_domains;
 			u32 allowed = lobj->allowed_domains;
 			u32 current_domain =
 				radeon_mem_type_to_domain(bo->tbo.mem.mem_type);
 
+			WARN_ONCE(bo->gem_base.dumb,
+				  "GPU use of dumb buffer is illegal.\n");
+
 			/* Check if this buffer will be moved and don't move it
 			 * if we have moved too many buffers for this IB already.
 			 *
@@ -546,6 +570,12 @@ int radeon_bo_list_validate(struct radeon_device *rdev,
 		lobj->gpu_offset = radeon_bo_gpu_offset(bo);
 		lobj->tiling_flags = bo->tiling_flags;
 	}
+
+	list_for_each_entry(lobj, &duplicates, tv.head) {
+		lobj->gpu_offset = radeon_bo_gpu_offset(lobj->robj);
+		lobj->tiling_flags = lobj->robj->tiling_flags;
+	}
+
 	return 0;
 }
 
@@ -750,8 +780,8 @@ int radeon_bo_fault_reserve_notify(struct ttm_buffer_object *bo)
 {
 	struct radeon_device *rdev;
 	struct radeon_bo *rbo;
-	unsigned long offset, size;
-	int r;
+	unsigned long offset, size, lpfn;
+	int i, r;
 
 	if (!radeon_ttm_bo_is_radeon_bo(bo))
 		return 0;
@@ -768,7 +798,13 @@ int radeon_bo_fault_reserve_notify(struct ttm_buffer_object *bo)
 
 	/* hurrah the memory is not visible ! */
 	radeon_ttm_placement_from_domain(rbo, RADEON_GEM_DOMAIN_VRAM);
-	rbo->placements[0].lpfn = rdev->mc.visible_vram_size >> PAGE_SHIFT;
+	lpfn = rdev->mc.visible_vram_size >> PAGE_SHIFT;
+	for (i = 0; i < rbo->placement.num_placement; i++) {
+		/* Force into visible VRAM */
+		if ((rbo->placements[i].flags & TTM_PL_FLAG_VRAM) &&
+		    (!rbo->placements[i].lpfn || rbo->placements[i].lpfn > lpfn))
+			rbo->placements[i].lpfn = lpfn;
+	}
 	r = ttm_bo_validate(bo, &rbo->placement, false, false);
 	if (unlikely(r == -ENOMEM)) {
 		radeon_ttm_placement_from_domain(rbo, RADEON_GEM_DOMAIN_GTT);
@@ -799,3 +835,22 @@ int radeon_bo_wait(struct radeon_bo *bo, u32 *mem_type, bool no_wait)
 	ttm_bo_unreserve(&bo->tbo);
 	return r;
 }
+
+/**
+ * radeon_bo_fence - add fence to buffer object
+ *
+ * @bo: buffer object in question
+ * @fence: fence to add
+ * @shared: true if fence should be added shared
+ *
+ */
+void radeon_bo_fence(struct radeon_bo *bo, struct radeon_fence *fence,
+		     bool shared)
+{
+	struct reservation_object *resv = bo->tbo.resv;
+
+	if (shared)
+		reservation_object_add_shared_fence(resv, &fence->base);
+	else
+		reservation_object_add_excl_fence(resv, &fence->base);
+}
diff --git a/drivers/gpu/drm/radeon/radeon_object.h b/drivers/gpu/drm/radeon/radeon_object.h
index 1b8ec7917154..3b0b377f76cb 100644
--- a/drivers/gpu/drm/radeon/radeon_object.h
+++ b/drivers/gpu/drm/radeon/radeon_object.h
@@ -155,6 +155,8 @@ extern void radeon_bo_move_notify(struct ttm_buffer_object *bo,
 				  struct ttm_mem_reg *new_mem);
 extern int radeon_bo_fault_reserve_notify(struct ttm_buffer_object *bo);
 extern int radeon_bo_get_surface_reg(struct radeon_bo *bo);
+extern void radeon_bo_fence(struct radeon_bo *bo, struct radeon_fence *fence,
+			    bool shared);
 
 /*
  * sub allocation
diff --git a/drivers/gpu/drm/radeon/radeon_semaphore.c b/drivers/gpu/drm/radeon/radeon_semaphore.c
index 6deb08f045b7..e6ad54cdfa62 100644
--- a/drivers/gpu/drm/radeon/radeon_semaphore.c
+++ b/drivers/gpu/drm/radeon/radeon_semaphore.c
@@ -34,15 +34,14 @@
 int radeon_semaphore_create(struct radeon_device *rdev,
 			    struct radeon_semaphore **semaphore)
 {
-	uint64_t *cpu_addr;
-	int i, r;
+	int r;
 
 	*semaphore = kmalloc(sizeof(struct radeon_semaphore), GFP_KERNEL);
 	if (*semaphore == NULL) {
 		return -ENOMEM;
 	}
-	r = radeon_sa_bo_new(rdev, &rdev->ring_tmp_bo, &(*semaphore)->sa_bo,
-			     8 * RADEON_NUM_SYNCS, 8);
+	r = radeon_sa_bo_new(rdev, &rdev->ring_tmp_bo,
+			     &(*semaphore)->sa_bo, 8, 8);
 	if (r) {
 		kfree(*semaphore);
 		*semaphore = NULL;
@@ -51,12 +50,7 @@ int radeon_semaphore_create(struct radeon_device *rdev,
 	(*semaphore)->waiters = 0;
 	(*semaphore)->gpu_addr = radeon_sa_bo_gpu_addr((*semaphore)->sa_bo);
 
-	cpu_addr = radeon_sa_bo_cpu_addr((*semaphore)->sa_bo);
-	for (i = 0; i < RADEON_NUM_SYNCS; ++i)
-		cpu_addr[i] = 0;
-
-	for (i = 0; i < RADEON_NUM_RINGS; ++i)
-		(*semaphore)->sync_to[i] = NULL;
+	*((uint64_t *)radeon_sa_bo_cpu_addr((*semaphore)->sa_bo)) = 0;
 
 	return 0;
 }
@@ -95,146 +89,6 @@ bool radeon_semaphore_emit_wait(struct radeon_device *rdev, int ridx,
 	return false;
 }
 
-/**
- * radeon_semaphore_sync_fence - use the semaphore to sync to a fence
- *
- * @semaphore: semaphore object to add fence to
- * @fence: fence to sync to
- *
- * Sync to the fence using this semaphore object
- */
-void radeon_semaphore_sync_fence(struct radeon_semaphore *semaphore,
-				 struct radeon_fence *fence)
-{
-        struct radeon_fence *other;
-
-        if (!fence)
-                return;
-
-        other = semaphore->sync_to[fence->ring];
-        semaphore->sync_to[fence->ring] = radeon_fence_later(fence, other);
-}
-
-/**
- * radeon_semaphore_sync_to - use the semaphore to sync to a reservation object
- *
- * @sema: semaphore object to add fence from reservation object to
- * @resv: reservation object with embedded fence
- * @shared: true if we should onyl sync to the exclusive fence
- *
- * Sync to the fence using this semaphore object
- */
-int radeon_semaphore_sync_resv(struct radeon_device *rdev,
-			       struct radeon_semaphore *sema,
-			       struct reservation_object *resv,
-			       bool shared)
-{
-	struct reservation_object_list *flist;
-	struct fence *f;
-	struct radeon_fence *fence;
-	unsigned i;
-	int r = 0;
-
-	/* always sync to the exclusive fence */
-	f = reservation_object_get_excl(resv);
-	fence = f ? to_radeon_fence(f) : NULL;
-	if (fence && fence->rdev == rdev)
-		radeon_semaphore_sync_fence(sema, fence);
-	else if (f)
-		r = fence_wait(f, true);
-
-	flist = reservation_object_get_list(resv);
-	if (shared || !flist || r)
-		return r;
-
-	for (i = 0; i < flist->shared_count; ++i) {
-		f = rcu_dereference_protected(flist->shared[i],
-					      reservation_object_held(resv));
-		fence = to_radeon_fence(f);
-		if (fence && fence->rdev == rdev)
-			radeon_semaphore_sync_fence(sema, fence);
-		else
-			r = fence_wait(f, true);
-
-		if (r)
-			break;
-	}
-	return r;
-}
-
-/**
- * radeon_semaphore_sync_rings - sync ring to all registered fences
- *
- * @rdev: radeon_device pointer
- * @semaphore: semaphore object to use for sync
- * @ring: ring that needs sync
- *
- * Ensure that all registered fences are signaled before letting
- * the ring continue. The caller must hold the ring lock.
- */
-int radeon_semaphore_sync_rings(struct radeon_device *rdev,
-				struct radeon_semaphore *semaphore,
-				int ring)
-{
-	unsigned count = 0;
-	int i, r;
-
-        for (i = 0; i < RADEON_NUM_RINGS; ++i) {
-		struct radeon_fence *fence = semaphore->sync_to[i];
-
-		/* check if we really need to sync */
-                if (!radeon_fence_need_sync(fence, ring))
-			continue;
-
-		/* prevent GPU deadlocks */
-		if (!rdev->ring[i].ready) {
-			dev_err(rdev->dev, "Syncing to a disabled ring!");
-			return -EINVAL;
-		}
-
-		if (++count > RADEON_NUM_SYNCS) {
-			/* not enough room, wait manually */
-			r = radeon_fence_wait(fence, false);
-			if (r)
-				return r;
-			continue;
-		}
-
-		/* allocate enough space for sync command */
-		r = radeon_ring_alloc(rdev, &rdev->ring[i], 16);
-		if (r) {
-			return r;
-		}
-
-		/* emit the signal semaphore */
-		if (!radeon_semaphore_emit_signal(rdev, i, semaphore)) {
-			/* signaling wasn't successful wait manually */
-			radeon_ring_undo(&rdev->ring[i]);
-			r = radeon_fence_wait(fence, false);
-			if (r)
-				return r;
-			continue;
-		}
-
-		/* we assume caller has already allocated space on waiters ring */
-		if (!radeon_semaphore_emit_wait(rdev, ring, semaphore)) {
-			/* waiting wasn't successful wait manually */
-			radeon_ring_undo(&rdev->ring[i]);
-			r = radeon_fence_wait(fence, false);
-			if (r)
-				return r;
-			continue;
-		}
-
-		radeon_ring_commit(rdev, &rdev->ring[i], false);
-		radeon_fence_note_sync(fence, ring);
-
-		semaphore->gpu_addr += 8;
-	}
-
-	return 0;
-}
-
 void radeon_semaphore_free(struct radeon_device *rdev,
 			   struct radeon_semaphore **semaphore,
 			   struct radeon_fence *fence)
diff --git a/drivers/gpu/drm/radeon/radeon_sync.c b/drivers/gpu/drm/radeon/radeon_sync.c
new file mode 100644
index 000000000000..02ac8a1de4ff
--- /dev/null
+++ b/drivers/gpu/drm/radeon/radeon_sync.c
@@ -0,0 +1,220 @@
+/*
+ * Copyright 2014 Advanced Micro Devices, Inc.
+ * All Rights Reserved.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the
+ * "Software"), to deal in the Software without restriction, including
+ * without limitation the rights to use, copy, modify, merge, publish,
+ * distribute, sub license, and/or sell copies of the Software, and to
+ * permit persons to whom the Software is furnished to do so, subject to
+ * the following conditions:
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NON-INFRINGEMENT. IN NO EVENT SHALL
+ * THE COPYRIGHT HOLDERS, AUTHORS AND/OR ITS SUPPLIERS BE LIABLE FOR ANY CLAIM,
+ * DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR
+ * OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE
+ * USE OR OTHER DEALINGS IN THE SOFTWARE.
+ *
+ * The above copyright notice and this permission notice (including the
+ * next paragraph) shall be included in all copies or substantial portions
+ * of the Software.
+ *
+ */
+/*
+ * Authors:
+ *    Christian König <christian.koenig@amd.com>
+ */
+
+#include <drm/drmP.h>
+#include "radeon.h"
+#include "radeon_trace.h"
+
+/**
+ * radeon_sync_create - zero init sync object
+ *
+ * @sync: sync object to initialize
+ *
+ * Just clear the sync object for now.
+ */
+void radeon_sync_create(struct radeon_sync *sync)
+{
+	unsigned i;
+
+	for (i = 0; i < RADEON_NUM_SYNCS; ++i)
+		sync->semaphores[i] = NULL;
+
+	for (i = 0; i < RADEON_NUM_RINGS; ++i)
+		sync->sync_to[i] = NULL;
+
+	sync->last_vm_update = NULL;
+}
+
+/**
+ * radeon_sync_fence - use the semaphore to sync to a fence
+ *
+ * @sync: sync object to add fence to
+ * @fence: fence to sync to
+ *
+ * Sync to the fence using the semaphore objects
+ */
+void radeon_sync_fence(struct radeon_sync *sync,
+		       struct radeon_fence *fence)
+{
+	struct radeon_fence *other;
+
+	if (!fence)
+		return;
+
+	other = sync->sync_to[fence->ring];
+	sync->sync_to[fence->ring] = radeon_fence_later(fence, other);
+
+	if (fence->is_vm_update) {
+		other = sync->last_vm_update;
+		sync->last_vm_update = radeon_fence_later(fence, other);
+	}
+}
+
+/**
+ * radeon_sync_resv - use the semaphores to sync to a reservation object
+ *
+ * @sync: sync object to add fences from reservation object to
+ * @resv: reservation object with embedded fence
+ * @shared: true if we should only sync to the exclusive fence
+ *
+ * Sync to the fence using the semaphore objects
+ */
+int radeon_sync_resv(struct radeon_device *rdev,
+		     struct radeon_sync *sync,
+		     struct reservation_object *resv,
+		     bool shared)
+{
+	struct reservation_object_list *flist;
+	struct fence *f;
+	struct radeon_fence *fence;
+	unsigned i;
+	int r = 0;
+
+	/* always sync to the exclusive fence */
+	f = reservation_object_get_excl(resv);
+	fence = f ? to_radeon_fence(f) : NULL;
+	if (fence && fence->rdev == rdev)
+		radeon_sync_fence(sync, fence);
+	else if (f)
+		r = fence_wait(f, true);
+
+	flist = reservation_object_get_list(resv);
+	if (shared || !flist || r)
+		return r;
+
+	for (i = 0; i < flist->shared_count; ++i) {
+		f = rcu_dereference_protected(flist->shared[i],
+					      reservation_object_held(resv));
+		fence = to_radeon_fence(f);
+		if (fence && fence->rdev == rdev)
+			radeon_sync_fence(sync, fence);
+		else
+			r = fence_wait(f, true);
+
+		if (r)
+			break;
+	}
+	return r;
+}
+
+/**
+ * radeon_sync_rings - sync ring to all registered fences
+ *
+ * @rdev: radeon_device pointer
+ * @sync: sync object to use
+ * @ring: ring that needs sync
+ *
+ * Ensure that all registered fences are signaled before letting
+ * the ring continue. The caller must hold the ring lock.
+ */
+int radeon_sync_rings(struct radeon_device *rdev,
+		      struct radeon_sync *sync,
+		      int ring)
+{
+	unsigned count = 0;
+	int i, r;
+
+	for (i = 0; i < RADEON_NUM_RINGS; ++i) {
+		struct radeon_fence *fence = sync->sync_to[i];
+		struct radeon_semaphore *semaphore;
+
+		/* check if we really need to sync */
+		if (!radeon_fence_need_sync(fence, ring))
+			continue;
+
+		/* prevent GPU deadlocks */
+		if (!rdev->ring[i].ready) {
+			dev_err(rdev->dev, "Syncing to a disabled ring!\n");
+			return -EINVAL;
+		}
+
+		if (count >= RADEON_NUM_SYNCS) {
+			/* not enough room, wait manually */
+			r = radeon_fence_wait(fence, false);
+			if (r)
+				return r;
+			continue;
+		}
+		r = radeon_semaphore_create(rdev, &semaphore);
+		if (r)
+			return r;
+
+		sync->semaphores[count++] = semaphore;
+
+		/* allocate enough space for sync command */
+		r = radeon_ring_alloc(rdev, &rdev->ring[i], 16);
+		if (r)
+			return r;
+
+		/* emit the signal semaphore */
+		if (!radeon_semaphore_emit_signal(rdev, i, semaphore)) {
+			/* signaling wasn't successful wait manually */
+			radeon_ring_undo(&rdev->ring[i]);
+			r = radeon_fence_wait(fence, false);
+			if (r)
+				return r;
+			continue;
+		}
+
+		/* we assume caller has already allocated space on waiters ring */
+		if (!radeon_semaphore_emit_wait(rdev, ring, semaphore)) {
+			/* waiting wasn't successful wait manually */
+			radeon_ring_undo(&rdev->ring[i]);
+			r = radeon_fence_wait(fence, false);
+			if (r)
+				return r;
+			continue;
+		}
+
+		radeon_ring_commit(rdev, &rdev->ring[i], false);
+		radeon_fence_note_sync(fence, ring);
+	}
+
+	return 0;
+}
+
+/**
+ * radeon_sync_free - free the sync object
+ *
+ * @rdev: radeon_device pointer
+ * @sync: sync object to use
+ * @fence: fence to use for the free
+ *
+ * Free the sync object by freeing all semaphores in it.
+ */
+void radeon_sync_free(struct radeon_device *rdev,
+		      struct radeon_sync *sync,
+		      struct radeon_fence *fence)
+{
+	unsigned i;
+
+	for (i = 0; i < RADEON_NUM_SYNCS; ++i)
+		radeon_semaphore_free(rdev, &sync->semaphores[i], fence);
+}
diff --git a/drivers/gpu/drm/radeon/radeon_trace.h b/drivers/gpu/drm/radeon/radeon_trace.h
index 9db74a96ef61..ce075cb08cb2 100644
--- a/drivers/gpu/drm/radeon/radeon_trace.h
+++ b/drivers/gpu/drm/radeon/radeon_trace.h
@@ -38,7 +38,7 @@ TRACE_EVENT(radeon_cs,
 
 	    TP_fast_assign(
 			   __entry->ring = p->ring;
-			   __entry->dw = p->chunks[p->chunk_ib_idx].length_dw;
+			   __entry->dw = p->chunk_ib->length_dw;
 			   __entry->fences = radeon_fence_count_emitted(
 				p->rdev, p->ring);
 			   ),
diff --git a/drivers/gpu/drm/radeon/radeon_ttm.c b/drivers/gpu/drm/radeon/radeon_ttm.c
index 8624979afb65..d02aa1d0f588 100644
--- a/drivers/gpu/drm/radeon/radeon_ttm.c
+++ b/drivers/gpu/drm/radeon/radeon_ttm.c
@@ -196,9 +196,32 @@ static void radeon_evict_flags(struct ttm_buffer_object *bo,
 	rbo = container_of(bo, struct radeon_bo, tbo);
 	switch (bo->mem.mem_type) {
 	case TTM_PL_VRAM:
-		if (rbo->rdev->ring[RADEON_RING_TYPE_GFX_INDEX].ready == false)
+		if (rbo->rdev->ring[radeon_copy_ring_index(rbo->rdev)].ready == false)
 			radeon_ttm_placement_from_domain(rbo, RADEON_GEM_DOMAIN_CPU);
-		else
+		else if (rbo->rdev->mc.visible_vram_size < rbo->rdev->mc.real_vram_size &&
+			 bo->mem.start < (rbo->rdev->mc.visible_vram_size >> PAGE_SHIFT)) {
+			unsigned fpfn = rbo->rdev->mc.visible_vram_size >> PAGE_SHIFT;
+			int i;
+
+			/* Try evicting to the CPU inaccessible part of VRAM
+			 * first, but only set GTT as busy placement, so this
+			 * BO will be evicted to GTT rather than causing other
+			 * BOs to be evicted from VRAM
+			 */
+			radeon_ttm_placement_from_domain(rbo, RADEON_GEM_DOMAIN_VRAM |
+							 RADEON_GEM_DOMAIN_GTT);
+			rbo->placement.num_busy_placement = 0;
+			for (i = 0; i < rbo->placement.num_placement; i++) {
+				if (rbo->placements[i].flags & TTM_PL_FLAG_VRAM) {
+					if (rbo->placements[i].fpfn < fpfn)
+						rbo->placements[i].fpfn = fpfn;
+				} else {
+					rbo->placement.busy_placement =
+						&rbo->placements[i];
+					rbo->placement.num_busy_placement = 1;
+				}
+			}
+		} else
 			radeon_ttm_placement_from_domain(rbo, RADEON_GEM_DOMAIN_GTT);
 		break;
 	case TTM_PL_TT:
diff --git a/drivers/gpu/drm/radeon/radeon_uvd.c b/drivers/gpu/drm/radeon/radeon_uvd.c
index 11b662469253..c10b2aec6450 100644
--- a/drivers/gpu/drm/radeon/radeon_uvd.c
+++ b/drivers/gpu/drm/radeon/radeon_uvd.c
@@ -488,12 +488,12 @@ static int radeon_uvd_cs_reloc(struct radeon_cs_parser *p,
 			       unsigned buf_sizes[], bool *has_msg_cmd)
 {
 	struct radeon_cs_chunk *relocs_chunk;
-	struct radeon_cs_reloc *reloc;
+	struct radeon_bo_list *reloc;
 	unsigned idx, cmd, offset;
 	uint64_t start, end;
 	int r;
 
-	relocs_chunk = &p->chunks[p->chunk_relocs_idx];
+	relocs_chunk = p->chunk_relocs;
 	offset = radeon_get_ib_value(p, data0);
 	idx = radeon_get_ib_value(p, data1);
 	if (idx >= relocs_chunk->length_dw) {
@@ -502,7 +502,7 @@ static int radeon_uvd_cs_reloc(struct radeon_cs_parser *p,
 		return -EINVAL;
 	}
 
-	reloc = p->relocs_ptr[(idx / 4)];
+	reloc = &p->relocs[(idx / 4)];
 	start = reloc->gpu_offset;
 	end = start + radeon_bo_size(reloc->robj);
 	start += offset;
@@ -610,13 +610,13 @@ int radeon_uvd_cs_parse(struct radeon_cs_parser *p)
 		[0x00000003]	=	2048,
 	};
 
-	if (p->chunks[p->chunk_ib_idx].length_dw % 16) {
+	if (p->chunk_ib->length_dw % 16) {
 		DRM_ERROR("UVD IB length (%d) not 16 dwords aligned!\n",
-			  p->chunks[p->chunk_ib_idx].length_dw);
+			  p->chunk_ib->length_dw);
 		return -EINVAL;
 	}
 
-	if (p->chunk_relocs_idx == -1) {
+	if (p->chunk_relocs == NULL) {
 		DRM_ERROR("No relocation chunk !\n");
 		return -EINVAL;
 	}
@@ -640,7 +640,7 @@ int radeon_uvd_cs_parse(struct radeon_cs_parser *p)
 			DRM_ERROR("Unknown packet type %d !\n", pkt.type);
 			return -EINVAL;
 		}
-	} while (p->idx < p->chunks[p->chunk_ib_idx].length_dw);
+	} while (p->idx < p->chunk_ib->length_dw);
 
 	if (!has_msg_cmd) {
 		DRM_ERROR("UVD-IBs need a msg command!\n");
diff --git a/drivers/gpu/drm/radeon/radeon_vce.c b/drivers/gpu/drm/radeon/radeon_vce.c
index 9e85757d5599..976fe432f4e2 100644
--- a/drivers/gpu/drm/radeon/radeon_vce.c
+++ b/drivers/gpu/drm/radeon/radeon_vce.c
@@ -453,11 +453,11 @@ int radeon_vce_cs_reloc(struct radeon_cs_parser *p, int lo, int hi,
 			unsigned size)
 {
 	struct radeon_cs_chunk *relocs_chunk;
-	struct radeon_cs_reloc *reloc;
+	struct radeon_bo_list *reloc;
 	uint64_t start, end, offset;
 	unsigned idx;
 
-	relocs_chunk = &p->chunks[p->chunk_relocs_idx];
+	relocs_chunk = p->chunk_relocs;
 	offset = radeon_get_ib_value(p, lo);
 	idx = radeon_get_ib_value(p, hi);
 
@@ -467,7 +467,7 @@ int radeon_vce_cs_reloc(struct radeon_cs_parser *p, int lo, int hi,
 		return -EINVAL;
 	}
 
-	reloc = p->relocs_ptr[(idx / 4)];
+	reloc = &p->relocs[(idx / 4)];
 	start = reloc->gpu_offset;
 	end = start + radeon_bo_size(reloc->robj);
 	start += offset;
@@ -534,7 +534,7 @@ int radeon_vce_cs_parse(struct radeon_cs_parser *p)
 	uint32_t *size = &tmp;
 	int i, r;
 
-	while (p->idx < p->chunks[p->chunk_ib_idx].length_dw) {
+	while (p->idx < p->chunk_ib->length_dw) {
 		uint32_t len = radeon_get_ib_value(p, p->idx);
 		uint32_t cmd = radeon_get_ib_value(p, p->idx + 1);
 
diff --git a/drivers/gpu/drm/radeon/radeon_vm.c b/drivers/gpu/drm/radeon/radeon_vm.c
index dfde266529e2..cde48c42b30a 100644
--- a/drivers/gpu/drm/radeon/radeon_vm.c
+++ b/drivers/gpu/drm/radeon/radeon_vm.c
@@ -125,41 +125,37 @@ void radeon_vm_manager_fini(struct radeon_device *rdev)
  * Add the page directory to the list of BOs to
  * validate for command submission (cayman+).
  */
-struct radeon_cs_reloc *radeon_vm_get_bos(struct radeon_device *rdev,
+struct radeon_bo_list *radeon_vm_get_bos(struct radeon_device *rdev,
 					  struct radeon_vm *vm,
 					  struct list_head *head)
 {
-	struct radeon_cs_reloc *list;
+	struct radeon_bo_list *list;
 	unsigned i, idx;
 
 	list = drm_malloc_ab(vm->max_pde_used + 2,
-			     sizeof(struct radeon_cs_reloc));
+			     sizeof(struct radeon_bo_list));
 	if (!list)
 		return NULL;
 
 	/* add the vm page table to the list */
-	list[0].gobj = NULL;
 	list[0].robj = vm->page_directory;
 	list[0].prefered_domains = RADEON_GEM_DOMAIN_VRAM;
 	list[0].allowed_domains = RADEON_GEM_DOMAIN_VRAM;
 	list[0].tv.bo = &vm->page_directory->tbo;
-	list[0].tv.shared = false;
+	list[0].tv.shared = true;
 	list[0].tiling_flags = 0;
-	list[0].handle = 0;
 	list_add(&list[0].tv.head, head);
 
 	for (i = 0, idx = 1; i <= vm->max_pde_used; i++) {
 		if (!vm->page_tables[i].bo)
 			continue;
 
-		list[idx].gobj = NULL;
 		list[idx].robj = vm->page_tables[i].bo;
 		list[idx].prefered_domains = RADEON_GEM_DOMAIN_VRAM;
 		list[idx].allowed_domains = RADEON_GEM_DOMAIN_VRAM;
 		list[idx].tv.bo = &list[idx].robj->tbo;
-		list[idx].tv.shared = false;
+		list[idx].tv.shared = true;
 		list[idx].tiling_flags = 0;
-		list[idx].handle = 0;
 		list_add(&list[idx++].tv.head, head);
 	}
 
@@ -182,15 +178,18 @@ struct radeon_fence *radeon_vm_grab_id(struct radeon_device *rdev,
 				       struct radeon_vm *vm, int ring)
 {
 	struct radeon_fence *best[RADEON_NUM_RINGS] = {};
+	struct radeon_vm_id *vm_id = &vm->ids[ring];
+
 	unsigned choices[2] = {};
 	unsigned i;
 
 	/* check if the id is still valid */
-	if (vm->last_id_use && vm->last_id_use == rdev->vm_manager.active[vm->id])
+	if (vm_id->id && vm_id->last_id_use &&
+	    vm_id->last_id_use == rdev->vm_manager.active[vm_id->id])
 		return NULL;
 
 	/* we definately need to flush */
-	radeon_fence_unref(&vm->last_flush);
+	vm_id->pd_gpu_addr = ~0ll;
 
 	/* skip over VMID 0, since it is the system VM */
 	for (i = 1; i < rdev->vm_manager.nvm; ++i) {
@@ -198,8 +197,8 @@ struct radeon_fence *radeon_vm_grab_id(struct radeon_device *rdev,
 
 		if (fence == NULL) {
 			/* found a free one */
-			vm->id = i;
-			trace_radeon_vm_grab_id(vm->id, ring);
+			vm_id->id = i;
+			trace_radeon_vm_grab_id(i, ring);
 			return NULL;
 		}
 
@@ -211,8 +210,8 @@ struct radeon_fence *radeon_vm_grab_id(struct radeon_device *rdev,
 
 	for (i = 0; i < 2; ++i) {
 		if (choices[i]) {
-			vm->id = choices[i];
-			trace_radeon_vm_grab_id(vm->id, ring);
+			vm_id->id = choices[i];
+			trace_radeon_vm_grab_id(choices[i], ring);
 			return rdev->vm_manager.active[choices[i]];
 		}
 	}
@@ -228,6 +227,7 @@ struct radeon_fence *radeon_vm_grab_id(struct radeon_device *rdev,
  * @rdev: radeon_device pointer
  * @vm: vm we want to flush
  * @ring: ring to use for flush
+ * @updates: last vm update that is waited for
  *
  * Flush the vm (cayman+).
  *
@@ -235,15 +235,21 @@ struct radeon_fence *radeon_vm_grab_id(struct radeon_device *rdev,
  */
 void radeon_vm_flush(struct radeon_device *rdev,
 		     struct radeon_vm *vm,
-		     int ring)
+		     int ring, struct radeon_fence *updates)
 {
 	uint64_t pd_addr = radeon_bo_gpu_offset(vm->page_directory);
+	struct radeon_vm_id *vm_id = &vm->ids[ring];
+
+	if (pd_addr != vm_id->pd_gpu_addr || !vm_id->flushed_updates ||
+	    radeon_fence_is_earlier(vm_id->flushed_updates, updates)) {
+
+		trace_radeon_vm_flush(pd_addr, ring, vm->ids[ring].id);
+		radeon_fence_unref(&vm_id->flushed_updates);
+		vm_id->flushed_updates = radeon_fence_ref(updates);
+		vm_id->pd_gpu_addr = pd_addr;
+		radeon_ring_vm_flush(rdev, &rdev->ring[ring],
+				     vm_id->id, vm_id->pd_gpu_addr);
 
-	/* if we can't remember our last VM flush then flush now! */
-	if (!vm->last_flush || pd_addr != vm->pd_gpu_addr) {
-		trace_radeon_vm_flush(pd_addr, ring, vm->id);
-		vm->pd_gpu_addr = pd_addr;
-		radeon_ring_vm_flush(rdev, ring, vm);
 	}
 }
 
@@ -263,18 +269,13 @@ void radeon_vm_fence(struct radeon_device *rdev,
 		     struct radeon_vm *vm,
 		     struct radeon_fence *fence)
 {
-	radeon_fence_unref(&vm->fence);
-	vm->fence = radeon_fence_ref(fence);
-
-	radeon_fence_unref(&rdev->vm_manager.active[vm->id]);
-	rdev->vm_manager.active[vm->id] = radeon_fence_ref(fence);
+	unsigned vm_id = vm->ids[fence->ring].id;
 
-	radeon_fence_unref(&vm->last_id_use);
-	vm->last_id_use = radeon_fence_ref(fence);
+	radeon_fence_unref(&rdev->vm_manager.active[vm_id]);
+	rdev->vm_manager.active[vm_id] = radeon_fence_ref(fence);
 
-        /* we just flushed the VM, remember that */
-        if (!vm->last_flush)
-                vm->last_flush = radeon_fence_ref(fence);
+	radeon_fence_unref(&vm->ids[fence->ring].last_id_use);
+	vm->ids[fence->ring].last_id_use = radeon_fence_ref(fence);
 }
 
 /**
@@ -387,35 +388,25 @@ static void radeon_vm_set_pages(struct radeon_device *rdev,
 static int radeon_vm_clear_bo(struct radeon_device *rdev,
 			      struct radeon_bo *bo)
 {
-        struct ttm_validate_buffer tv;
-        struct ww_acquire_ctx ticket;
-        struct list_head head;
 	struct radeon_ib ib;
 	unsigned entries;
 	uint64_t addr;
 	int r;
 
-        memset(&tv, 0, sizeof(tv));
-        tv.bo = &bo->tbo;
-	tv.shared = false;
-
-        INIT_LIST_HEAD(&head);
-        list_add(&tv.head, &head);
-
-        r = ttm_eu_reserve_buffers(&ticket, &head, true);
-        if (r)
+	r = radeon_bo_reserve(bo, false);
+	if (r)
 		return r;
 
-        r = ttm_bo_validate(&bo->tbo, &bo->placement, true, false);
-        if (r)
-                goto error;
+	r = ttm_bo_validate(&bo->tbo, &bo->placement, true, false);
+	if (r)
+		goto error_unreserve;
 
 	addr = radeon_bo_gpu_offset(bo);
 	entries = radeon_bo_size(bo) / 8;
 
 	r = radeon_ib_get(rdev, R600_RING_TYPE_DMA_INDEX, &ib, NULL, 256);
 	if (r)
-                goto error;
+		goto error_unreserve;
 
 	ib.length_dw = 0;
 
@@ -425,15 +416,16 @@ static int radeon_vm_clear_bo(struct radeon_device *rdev,
 
 	r = radeon_ib_schedule(rdev, &ib, NULL, false);
 	if (r)
-                goto error;
+		goto error_free;
 
-	ttm_eu_fence_buffer_objects(&ticket, &head, &ib.fence->base);
-	radeon_ib_free(rdev, &ib);
+	ib.fence->is_vm_update = true;
+	radeon_bo_fence(bo, ib.fence, false);
 
-	return 0;
+error_free:
+	radeon_ib_free(rdev, &ib);
 
-error:
-	ttm_eu_backoff_reservation(&ticket, &head);
+error_unreserve:
+	radeon_bo_unreserve(bo);
 	return r;
 }
 
@@ -449,7 +441,7 @@ error:
  * Validate and set the offset requested within the vm address space.
  * Returns 0 for success, error for failure.
  *
- * Object has to be reserved!
+ * Object has to be reserved and gets unreserved by this function!
  */
 int radeon_vm_bo_set_addr(struct radeon_device *rdev,
 			  struct radeon_bo_va *bo_va,
@@ -495,7 +487,9 @@ int radeon_vm_bo_set_addr(struct radeon_device *rdev,
 			tmp->vm = vm;
 			tmp->addr = bo_va->addr;
 			tmp->bo = radeon_bo_ref(bo_va->bo);
+			spin_lock(&vm->status_lock);
 			list_add(&tmp->vm_status, &vm->freed);
+			spin_unlock(&vm->status_lock);
 		}
 
 		interval_tree_remove(&bo_va->it, &vm->va);
@@ -575,7 +569,7 @@ int radeon_vm_bo_set_addr(struct radeon_device *rdev,
 	}
 
 	mutex_unlock(&vm->mutex);
-	return radeon_bo_reserve(bo_va->bo, false);
+	return 0;
 }
 
 /**
@@ -699,17 +693,15 @@ int radeon_vm_update_page_directory(struct radeon_device *rdev,
 	if (ib.length_dw != 0) {
 		radeon_asic_vm_pad_ib(rdev, &ib);
 
-		radeon_semaphore_sync_resv(rdev, ib.semaphore, pd->tbo.resv, false);
-		radeon_semaphore_sync_fence(ib.semaphore, vm->last_id_use);
+		radeon_sync_resv(rdev, &ib.sync, pd->tbo.resv, true);
 		WARN_ON(ib.length_dw > ndw);
 		r = radeon_ib_schedule(rdev, &ib, NULL, false);
 		if (r) {
 			radeon_ib_free(rdev, &ib);
 			return r;
 		}
-		radeon_fence_unref(&vm->fence);
-		vm->fence = radeon_fence_ref(ib.fence);
-		radeon_fence_unref(&vm->last_flush);
+		ib.fence->is_vm_update = true;
+		radeon_bo_fence(pd, ib.fence, false);
 	}
 	radeon_ib_free(rdev, &ib);
 
@@ -808,11 +800,11 @@ static void radeon_vm_frag_ptes(struct radeon_device *rdev,
  *
  * Global and local mutex must be locked!
  */
-static void radeon_vm_update_ptes(struct radeon_device *rdev,
-				  struct radeon_vm *vm,
-				  struct radeon_ib *ib,
-				  uint64_t start, uint64_t end,
-				  uint64_t dst, uint32_t flags)
+static int radeon_vm_update_ptes(struct radeon_device *rdev,
+				 struct radeon_vm *vm,
+				 struct radeon_ib *ib,
+				 uint64_t start, uint64_t end,
+				 uint64_t dst, uint32_t flags)
 {
 	uint64_t mask = RADEON_VM_PTE_COUNT - 1;
 	uint64_t last_pte = ~0, last_dst = ~0;
@@ -825,8 +817,12 @@ static void radeon_vm_update_ptes(struct radeon_device *rdev,
 		struct radeon_bo *pt = vm->page_tables[pt_idx].bo;
 		unsigned nptes;
 		uint64_t pte;
+		int r;
 
-		radeon_semaphore_sync_resv(rdev, ib->semaphore, pt->tbo.resv, false);
+		radeon_sync_resv(rdev, &ib->sync, pt->tbo.resv, true);
+		r = reservation_object_reserve_shared(pt->tbo.resv);
+		if (r)
+			return r;
 
 		if ((addr & ~mask) == (end & ~mask))
 			nptes = end - addr;
@@ -860,6 +856,33 @@ static void radeon_vm_update_ptes(struct radeon_device *rdev,
 				    last_pte + 8 * count,
 				    last_dst, flags);
 	}
+
+	return 0;
+}
+
+/**
+ * radeon_vm_fence_pts - fence page tables after an update
+ *
+ * @vm: requested vm
+ * @start: start of GPU address range
+ * @end: end of GPU address range
+ * @fence: fence to use
+ *
+ * Fence the page tables in the range @start - @end (cayman+).
+ *
+ * Global and local mutex must be locked!
+ */
+static void radeon_vm_fence_pts(struct radeon_vm *vm,
+				uint64_t start, uint64_t end,
+				struct radeon_fence *fence)
+{
+	unsigned i;
+
+	start >>= radeon_vm_block_size;
+	end = (end - 1) >> radeon_vm_block_size;
+
+	for (i = start; i <= end; ++i)
+		radeon_bo_fence(vm->page_tables[i].bo, fence, true);
 }
 
 /**
@@ -892,7 +915,9 @@ int radeon_vm_bo_update(struct radeon_device *rdev,
 		return -EINVAL;
 	}
 
+	spin_lock(&vm->status_lock);
 	list_del_init(&bo_va->vm_status);
+	spin_unlock(&vm->status_lock);
 
 	bo_va->flags &= ~RADEON_VM_PAGE_VALID;
 	bo_va->flags &= ~RADEON_VM_PAGE_SYSTEM;
@@ -961,23 +986,34 @@ int radeon_vm_bo_update(struct radeon_device *rdev,
 		return r;
 	ib.length_dw = 0;
 
-	radeon_vm_update_ptes(rdev, vm, &ib, bo_va->it.start,
-			      bo_va->it.last + 1, addr,
-			      radeon_vm_page_flags(bo_va->flags));
+	if (!(bo_va->flags & RADEON_VM_PAGE_VALID)) {
+		unsigned i;
+
+		for (i = 0; i < RADEON_NUM_RINGS; ++i)
+			radeon_sync_fence(&ib.sync, vm->ids[i].last_id_use);
+	}
+
+	r = radeon_vm_update_ptes(rdev, vm, &ib, bo_va->it.start,
+				  bo_va->it.last + 1, addr,
+				  radeon_vm_page_flags(bo_va->flags));
+	if (r) {
+		radeon_ib_free(rdev, &ib);
+		return r;
+	}
 
 	radeon_asic_vm_pad_ib(rdev, &ib);
 	WARN_ON(ib.length_dw > ndw);
 
-	radeon_semaphore_sync_fence(ib.semaphore, vm->fence);
 	r = radeon_ib_schedule(rdev, &ib, NULL, false);
 	if (r) {
 		radeon_ib_free(rdev, &ib);
 		return r;
 	}
-	radeon_fence_unref(&vm->fence);
-	vm->fence = radeon_fence_ref(ib.fence);
+	ib.fence->is_vm_update = true;
+	radeon_vm_fence_pts(vm, bo_va->it.start, bo_va->it.last + 1, ib.fence);
+	radeon_fence_unref(&bo_va->last_pt_update);
+	bo_va->last_pt_update = radeon_fence_ref(ib.fence);
 	radeon_ib_free(rdev, &ib);
-	radeon_fence_unref(&vm->last_flush);
 
 	return 0;
 }
@@ -996,16 +1032,25 @@ int radeon_vm_bo_update(struct radeon_device *rdev,
 int radeon_vm_clear_freed(struct radeon_device *rdev,
 			  struct radeon_vm *vm)
 {
-	struct radeon_bo_va *bo_va, *tmp;
+	struct radeon_bo_va *bo_va;
 	int r;
 
-	list_for_each_entry_safe(bo_va, tmp, &vm->freed, vm_status) {
+	spin_lock(&vm->status_lock);
+	while (!list_empty(&vm->freed)) {
+		bo_va = list_first_entry(&vm->freed,
+			struct radeon_bo_va, vm_status);
+		spin_unlock(&vm->status_lock);
+
 		r = radeon_vm_bo_update(rdev, bo_va, NULL);
 		radeon_bo_unref(&bo_va->bo);
+		radeon_fence_unref(&bo_va->last_pt_update);
 		kfree(bo_va);
 		if (r)
 			return r;
+
+		spin_lock(&vm->status_lock);
 	}
+	spin_unlock(&vm->status_lock);
 	return 0;
 
 }
@@ -1024,14 +1069,23 @@ int radeon_vm_clear_freed(struct radeon_device *rdev,
 int radeon_vm_clear_invalids(struct radeon_device *rdev,
 			     struct radeon_vm *vm)
 {
-	struct radeon_bo_va *bo_va, *tmp;
+	struct radeon_bo_va *bo_va;
 	int r;
 
-	list_for_each_entry_safe(bo_va, tmp, &vm->invalidated, vm_status) {
+	spin_lock(&vm->status_lock);
+	while (!list_empty(&vm->invalidated)) {
+		bo_va = list_first_entry(&vm->invalidated,
+			struct radeon_bo_va, vm_status);
+		spin_unlock(&vm->status_lock);
+
 		r = radeon_vm_bo_update(rdev, bo_va, NULL);
 		if (r)
 			return r;
+
+		spin_lock(&vm->status_lock);
 	}
+	spin_unlock(&vm->status_lock);
+
 	return 0;
 }
 
@@ -1054,14 +1108,17 @@ void radeon_vm_bo_rmv(struct radeon_device *rdev,
 
 	mutex_lock(&vm->mutex);
 	interval_tree_remove(&bo_va->it, &vm->va);
+	spin_lock(&vm->status_lock);
 	list_del(&bo_va->vm_status);
 
 	if (bo_va->addr) {
 		bo_va->bo = radeon_bo_ref(bo_va->bo);
 		list_add(&bo_va->vm_status, &vm->freed);
 	} else {
+		radeon_fence_unref(&bo_va->last_pt_update);
 		kfree(bo_va);
 	}
+	spin_unlock(&vm->status_lock);
 
 	mutex_unlock(&vm->mutex);
 }
@@ -1082,10 +1139,10 @@ void radeon_vm_bo_invalidate(struct radeon_device *rdev,
 
 	list_for_each_entry(bo_va, &bo->va, bo_list) {
 		if (bo_va->addr) {
-			mutex_lock(&bo_va->vm->mutex);
+			spin_lock(&bo_va->vm->status_lock);
 			list_del(&bo_va->vm_status);
 			list_add(&bo_va->vm_status, &bo_va->vm->invalidated);
-			mutex_unlock(&bo_va->vm->mutex);
+			spin_unlock(&bo_va->vm->status_lock);
 		}
 	}
 }
@@ -1103,15 +1160,17 @@ int radeon_vm_init(struct radeon_device *rdev, struct radeon_vm *vm)
 	const unsigned align = min(RADEON_VM_PTB_ALIGN_SIZE,
 		RADEON_VM_PTE_COUNT * 8);
 	unsigned pd_size, pd_entries, pts_size;
-	int r;
+	int i, r;
 
-	vm->id = 0;
 	vm->ib_bo_va = NULL;
-	vm->fence = NULL;
-	vm->last_flush = NULL;
-	vm->last_id_use = NULL;
+	for (i = 0; i < RADEON_NUM_RINGS; ++i) {
+		vm->ids[i].id = 0;
+		vm->ids[i].flushed_updates = NULL;
+		vm->ids[i].last_id_use = NULL;
+	}
 	mutex_init(&vm->mutex);
 	vm->va = RB_ROOT;
+	spin_lock_init(&vm->status_lock);
 	INIT_LIST_HEAD(&vm->invalidated);
 	INIT_LIST_HEAD(&vm->freed);
 
@@ -1165,11 +1224,13 @@ void radeon_vm_fini(struct radeon_device *rdev, struct radeon_vm *vm)
 		if (!r) {
 			list_del_init(&bo_va->bo_list);
 			radeon_bo_unreserve(bo_va->bo);
+			radeon_fence_unref(&bo_va->last_pt_update);
 			kfree(bo_va);
 		}
 	}
 	list_for_each_entry_safe(bo_va, tmp, &vm->freed, vm_status) {
 		radeon_bo_unref(&bo_va->bo);
+		radeon_fence_unref(&bo_va->last_pt_update);
 		kfree(bo_va);
 	}
 
@@ -1179,9 +1240,10 @@ void radeon_vm_fini(struct radeon_device *rdev, struct radeon_vm *vm)
 
 	radeon_bo_unref(&vm->page_directory);
 
-	radeon_fence_unref(&vm->fence);
-	radeon_fence_unref(&vm->last_flush);
-	radeon_fence_unref(&vm->last_id_use);
+	for (i = 0; i < RADEON_NUM_RINGS; ++i) {
+		radeon_fence_unref(&vm->ids[i].flushed_updates);
+		radeon_fence_unref(&vm->ids[i].last_id_use);
+	}
 
 	mutex_destroy(&vm->mutex);
 }
diff --git a/drivers/gpu/drm/radeon/rv770_dma.c b/drivers/gpu/drm/radeon/rv770_dma.c
index 7f34bad2e724..acff6e09cc40 100644
--- a/drivers/gpu/drm/radeon/rv770_dma.c
+++ b/drivers/gpu/drm/radeon/rv770_dma.c
@@ -44,31 +44,27 @@ struct radeon_fence *rv770_copy_dma(struct radeon_device *rdev,
 				    unsigned num_gpu_pages,
 				    struct reservation_object *resv)
 {
-	struct radeon_semaphore *sem = NULL;
 	struct radeon_fence *fence;
+	struct radeon_sync sync;
 	int ring_index = rdev->asic->copy.dma_ring_index;
 	struct radeon_ring *ring = &rdev->ring[ring_index];
 	u32 size_in_dw, cur_size_in_dw;
 	int i, num_loops;
 	int r = 0;
 
-	r = radeon_semaphore_create(rdev, &sem);
-	if (r) {
-		DRM_ERROR("radeon: moving bo (%d).\n", r);
-		return ERR_PTR(r);
-	}
+	radeon_sync_create(&sync);
 
 	size_in_dw = (num_gpu_pages << RADEON_GPU_PAGE_SHIFT) / 4;
 	num_loops = DIV_ROUND_UP(size_in_dw, 0xFFFF);
 	r = radeon_ring_lock(rdev, ring, num_loops * 5 + 8);
 	if (r) {
 		DRM_ERROR("radeon: moving bo (%d).\n", r);
-		radeon_semaphore_free(rdev, &sem, NULL);
+		radeon_sync_free(rdev, &sync, NULL);
 		return ERR_PTR(r);
 	}
 
-	radeon_semaphore_sync_resv(rdev, sem, resv, false);
-	radeon_semaphore_sync_rings(rdev, sem, ring->idx);
+	radeon_sync_resv(rdev, &sync, resv, false);
+	radeon_sync_rings(rdev, &sync, ring->idx);
 
 	for (i = 0; i < num_loops; i++) {
 		cur_size_in_dw = size_in_dw;
@@ -87,12 +83,12 @@ struct radeon_fence *rv770_copy_dma(struct radeon_device *rdev,
 	r = radeon_fence_emit(rdev, &fence, ring->idx);
 	if (r) {
 		radeon_ring_unlock_undo(rdev, ring);
-		radeon_semaphore_free(rdev, &sem, NULL);
+		radeon_sync_free(rdev, &sync, NULL);
 		return ERR_PTR(r);
 	}
 
 	radeon_ring_unlock_commit(rdev, ring, false);
-	radeon_semaphore_free(rdev, &sem, fence);
+	radeon_sync_free(rdev, &sync, fence);
 
 	return fence;
 }
diff --git a/drivers/gpu/drm/radeon/si.c b/drivers/gpu/drm/radeon/si.c
index 7d5083dc4acb..60df444bd075 100644
--- a/drivers/gpu/drm/radeon/si.c
+++ b/drivers/gpu/drm/radeon/si.c
@@ -3365,6 +3365,7 @@ void si_fence_ring_emit(struct radeon_device *rdev,
 void si_ring_ib_execute(struct radeon_device *rdev, struct radeon_ib *ib)
 {
 	struct radeon_ring *ring = &rdev->ring[ib->ring];
+	unsigned vm_id = ib->vm ? ib->vm->ids[ib->ring].id : 0;
 	u32 header;
 
 	if (ib->is_const_ib) {
@@ -3400,14 +3401,13 @@ void si_ring_ib_execute(struct radeon_device *rdev, struct radeon_ib *ib)
 #endif
 			  (ib->gpu_addr & 0xFFFFFFFC));
 	radeon_ring_write(ring, upper_32_bits(ib->gpu_addr) & 0xFFFF);
-	radeon_ring_write(ring, ib->length_dw |
-			  (ib->vm ? (ib->vm->id << 24) : 0));
+	radeon_ring_write(ring, ib->length_dw | (vm_id << 24));
 
 	if (!ib->is_const_ib) {
 		/* flush read cache over gart for this vmid */
 		radeon_ring_write(ring, PACKET3(PACKET3_SET_CONFIG_REG, 1));
 		radeon_ring_write(ring, (CP_COHER_CNTL2 - PACKET3_SET_CONFIG_REG_START) >> 2);
-		radeon_ring_write(ring, ib->vm ? ib->vm->id : 0);
+		radeon_ring_write(ring, vm_id);
 		radeon_ring_write(ring, PACKET3(PACKET3_SURFACE_SYNC, 3));
 		radeon_ring_write(ring, PACKET3_TCL1_ACTION_ENA |
 				  PACKET3_TC_ACTION_ENA |
@@ -5023,27 +5023,23 @@ static void si_vm_decode_fault(struct radeon_device *rdev,
 	       block, mc_id);
 }
 
-void si_vm_flush(struct radeon_device *rdev, int ridx, struct radeon_vm *vm)
+void si_vm_flush(struct radeon_device *rdev, struct radeon_ring *ring,
+		 unsigned vm_id, uint64_t pd_addr)
 {
-	struct radeon_ring *ring = &rdev->ring[ridx];
-
-	if (vm == NULL)
-		return;
-
 	/* write new base address */
 	radeon_ring_write(ring, PACKET3(PACKET3_WRITE_DATA, 3));
 	radeon_ring_write(ring, (WRITE_DATA_ENGINE_SEL(1) |
 				 WRITE_DATA_DST_SEL(0)));
 
-	if (vm->id < 8) {
+	if (vm_id < 8) {
 		radeon_ring_write(ring,
-				  (VM_CONTEXT0_PAGE_TABLE_BASE_ADDR + (vm->id << 2)) >> 2);
+				  (VM_CONTEXT0_PAGE_TABLE_BASE_ADDR + (vm_id << 2)) >> 2);
 	} else {
 		radeon_ring_write(ring,
-				  (VM_CONTEXT8_PAGE_TABLE_BASE_ADDR + ((vm->id - 8) << 2)) >> 2);
+				  (VM_CONTEXT8_PAGE_TABLE_BASE_ADDR + ((vm_id - 8) << 2)) >> 2);
 	}
 	radeon_ring_write(ring, 0);
-	radeon_ring_write(ring, vm->pd_gpu_addr >> 12);
+	radeon_ring_write(ring, pd_addr >> 12);
 
 	/* flush hdp cache */
 	radeon_ring_write(ring, PACKET3(PACKET3_WRITE_DATA, 3));
@@ -5059,7 +5055,7 @@ void si_vm_flush(struct radeon_device *rdev, int ridx, struct radeon_vm *vm)
 				 WRITE_DATA_DST_SEL(0)));
 	radeon_ring_write(ring, VM_INVALIDATE_REQUEST >> 2);
 	radeon_ring_write(ring, 0);
-	radeon_ring_write(ring, 1 << vm->id);
+	radeon_ring_write(ring, 1 << vm_id);
 
 	/* sync PFP to ME, otherwise we might get invalid PFP reads */
 	radeon_ring_write(ring, PACKET3(PACKET3_PFP_SYNC_ME, 0));
diff --git a/drivers/gpu/drm/radeon/si_dma.c b/drivers/gpu/drm/radeon/si_dma.c
index b58f12b762d7..f5cc777e1c5f 100644
--- a/drivers/gpu/drm/radeon/si_dma.c
+++ b/drivers/gpu/drm/radeon/si_dma.c
@@ -185,20 +185,17 @@ void si_dma_vm_set_pages(struct radeon_device *rdev,
 	}
 }
 
-void si_dma_vm_flush(struct radeon_device *rdev, int ridx, struct radeon_vm *vm)
-{
-	struct radeon_ring *ring = &rdev->ring[ridx];
-
-	if (vm == NULL)
-		return;
+void si_dma_vm_flush(struct radeon_device *rdev, struct radeon_ring *ring,
+		     unsigned vm_id, uint64_t pd_addr)
 
+{
 	radeon_ring_write(ring, DMA_PACKET(DMA_PACKET_SRBM_WRITE, 0, 0, 0, 0));
-	if (vm->id < 8) {
-		radeon_ring_write(ring, (0xf << 16) | ((VM_CONTEXT0_PAGE_TABLE_BASE_ADDR + (vm->id << 2)) >> 2));
+	if (vm_id < 8) {
+		radeon_ring_write(ring, (0xf << 16) | ((VM_CONTEXT0_PAGE_TABLE_BASE_ADDR + (vm_id << 2)) >> 2));
 	} else {
-		radeon_ring_write(ring, (0xf << 16) | ((VM_CONTEXT8_PAGE_TABLE_BASE_ADDR + ((vm->id - 8) << 2)) >> 2));
+		radeon_ring_write(ring, (0xf << 16) | ((VM_CONTEXT8_PAGE_TABLE_BASE_ADDR + ((vm_id - 8) << 2)) >> 2));
 	}
-	radeon_ring_write(ring, vm->pd_gpu_addr >> 12);
+	radeon_ring_write(ring, pd_addr >> 12);
 
 	/* flush hdp cache */
 	radeon_ring_write(ring, DMA_PACKET(DMA_PACKET_SRBM_WRITE, 0, 0, 0, 0));
@@ -208,7 +205,7 @@ void si_dma_vm_flush(struct radeon_device *rdev, int ridx, struct radeon_vm *vm)
 	/* bits 0-7 are the VM contexts0-7 */
 	radeon_ring_write(ring, DMA_PACKET(DMA_PACKET_SRBM_WRITE, 0, 0, 0, 0));
 	radeon_ring_write(ring, (0xf << 16) | (VM_INVALIDATE_REQUEST >> 2));
-	radeon_ring_write(ring, 1 << vm->id);
+	radeon_ring_write(ring, 1 << vm_id);
 }
 
 /**
@@ -229,31 +226,27 @@ struct radeon_fence *si_copy_dma(struct radeon_device *rdev,
 				 unsigned num_gpu_pages,
 				 struct reservation_object *resv)
 {
-	struct radeon_semaphore *sem = NULL;
 	struct radeon_fence *fence;
+	struct radeon_sync sync;
 	int ring_index = rdev->asic->copy.dma_ring_index;
 	struct radeon_ring *ring = &rdev->ring[ring_index];
 	u32 size_in_bytes, cur_size_in_bytes;
 	int i, num_loops;
 	int r = 0;
 
-	r = radeon_semaphore_create(rdev, &sem);
-	if (r) {
-		DRM_ERROR("radeon: moving bo (%d).\n", r);
-		return ERR_PTR(r);
-	}
+	radeon_sync_create(&sync);
 
 	size_in_bytes = (num_gpu_pages << RADEON_GPU_PAGE_SHIFT);
 	num_loops = DIV_ROUND_UP(size_in_bytes, 0xfffff);
 	r = radeon_ring_lock(rdev, ring, num_loops * 5 + 11);
 	if (r) {
 		DRM_ERROR("radeon: moving bo (%d).\n", r);
-		radeon_semaphore_free(rdev, &sem, NULL);
+		radeon_sync_free(rdev, &sync, NULL);
 		return ERR_PTR(r);
 	}
 
-	radeon_semaphore_sync_resv(rdev, sem, resv, false);
-	radeon_semaphore_sync_rings(rdev, sem, ring->idx);
+	radeon_sync_resv(rdev, &sync, resv, false);
+	radeon_sync_rings(rdev, &sync, ring->idx);
 
 	for (i = 0; i < num_loops; i++) {
 		cur_size_in_bytes = size_in_bytes;
@@ -272,12 +265,12 @@ struct radeon_fence *si_copy_dma(struct radeon_device *rdev,
 	r = radeon_fence_emit(rdev, &fence, ring->idx);
 	if (r) {
 		radeon_ring_unlock_undo(rdev, ring);
-		radeon_semaphore_free(rdev, &sem, NULL);
+		radeon_sync_free(rdev, &sync, NULL);
 		return ERR_PTR(r);
 	}
 
 	radeon_ring_unlock_commit(rdev, ring, false);
-	radeon_semaphore_free(rdev, &sem, fence);
+	radeon_sync_free(rdev, &sync, fence);
 
 	return fence;
 }
diff --git a/drivers/gpu/drm/radeon/si_dpm.c b/drivers/gpu/drm/radeon/si_dpm.c
index 676e6c2ba90a..32e354b8b0ab 100644
--- a/drivers/gpu/drm/radeon/si_dpm.c
+++ b/drivers/gpu/drm/radeon/si_dpm.c
@@ -3398,6 +3398,15 @@ static int si_process_firmware_header(struct radeon_device *rdev)
 
 	ret = si_read_smc_sram_dword(rdev,
 				     SISLANDS_SMC_FIRMWARE_HEADER_LOCATION +
+				     SISLANDS_SMC_FIRMWARE_HEADER_fanTable,
+				     &tmp, si_pi->sram_end);
+	if (ret)
+		return ret;
+
+	si_pi->fan_table_start = tmp;
+
+	ret = si_read_smc_sram_dword(rdev,
+				     SISLANDS_SMC_FIRMWARE_HEADER_LOCATION +
 				     SISLANDS_SMC_FIRMWARE_HEADER_mcArbDramAutoRefreshTable,
 				     &tmp, si_pi->sram_end);
 	if (ret)
@@ -5817,8 +5826,33 @@ void si_dpm_setup_asic(struct radeon_device *rdev)
 	si_enable_acpi_power_management(rdev);
 }
 
-static int si_set_thermal_temperature_range(struct radeon_device *rdev,
-					int min_temp, int max_temp)
+static int si_thermal_enable_alert(struct radeon_device *rdev,
+				   bool enable)
+{
+	u32 thermal_int = RREG32(CG_THERMAL_INT);
+
+	if (enable) {
+		PPSMC_Result result;
+
+		thermal_int &= ~(THERM_INT_MASK_HIGH | THERM_INT_MASK_LOW);
+		WREG32(CG_THERMAL_INT, thermal_int);
+		rdev->irq.dpm_thermal = false;
+		result = si_send_msg_to_smc(rdev, PPSMC_MSG_EnableThermalInterrupt);
+		if (result != PPSMC_Result_OK) {
+			DRM_DEBUG_KMS("Could not enable thermal interrupts.\n");
+			return -EINVAL;
+		}
+	} else {
+		thermal_int |= THERM_INT_MASK_HIGH | THERM_INT_MASK_LOW;
+		WREG32(CG_THERMAL_INT, thermal_int);
+		rdev->irq.dpm_thermal = true;
+	}
+
+	return 0;
+}
+
+static int si_thermal_set_temperature_range(struct radeon_device *rdev,
+					    int min_temp, int max_temp)
 {
 	int low_temp = 0 * 1000;
 	int high_temp = 255 * 1000;
@@ -5842,6 +5876,309 @@ static int si_set_thermal_temperature_range(struct radeon_device *rdev,
 	return 0;
 }
 
+static void si_fan_ctrl_set_static_mode(struct radeon_device *rdev, u32 mode)
+{
+	struct si_power_info *si_pi = si_get_pi(rdev);
+	u32 tmp;
+
+	if (si_pi->fan_ctrl_is_in_default_mode) {
+		tmp = (RREG32(CG_FDO_CTRL2) & FDO_PWM_MODE_MASK) >> FDO_PWM_MODE_SHIFT;
+		si_pi->fan_ctrl_default_mode = tmp;
+		tmp = (RREG32(CG_FDO_CTRL2) & TMIN_MASK) >> TMIN_SHIFT;
+		si_pi->t_min = tmp;
+		si_pi->fan_ctrl_is_in_default_mode = false;
+	}
+
+	tmp = RREG32(CG_FDO_CTRL2) & ~TMIN_MASK;
+	tmp |= TMIN(0);
+	WREG32(CG_FDO_CTRL2, tmp);
+
+	tmp = RREG32(CG_FDO_CTRL2) & ~FDO_PWM_MODE_MASK;
+	tmp |= FDO_PWM_MODE(mode);
+	WREG32(CG_FDO_CTRL2, tmp);
+}
+
+static int si_thermal_setup_fan_table(struct radeon_device *rdev)
+{
+	struct si_power_info *si_pi = si_get_pi(rdev);
+	PP_SIslands_FanTable fan_table = { FDO_MODE_HARDWARE };
+	u32 duty100;
+	u32 t_diff1, t_diff2, pwm_diff1, pwm_diff2;
+	u16 fdo_min, slope1, slope2;
+	u32 reference_clock, tmp;
+	int ret;
+	u64 tmp64;
+
+	if (!si_pi->fan_table_start) {
+		rdev->pm.dpm.fan.ucode_fan_control = false;
+		return 0;
+	}
+
+	duty100 = (RREG32(CG_FDO_CTRL1) & FMAX_DUTY100_MASK) >> FMAX_DUTY100_SHIFT;
+
+	if (duty100 == 0) {
+		rdev->pm.dpm.fan.ucode_fan_control = false;
+		return 0;
+	}
+
+	tmp64 = (u64)rdev->pm.dpm.fan.pwm_min * duty100;
+	do_div(tmp64, 10000);
+	fdo_min = (u16)tmp64;
+
+	t_diff1 = rdev->pm.dpm.fan.t_med - rdev->pm.dpm.fan.t_min;
+	t_diff2 = rdev->pm.dpm.fan.t_high - rdev->pm.dpm.fan.t_med;
+
+	pwm_diff1 = rdev->pm.dpm.fan.pwm_med - rdev->pm.dpm.fan.pwm_min;
+	pwm_diff2 = rdev->pm.dpm.fan.pwm_high - rdev->pm.dpm.fan.pwm_med;
+
+	slope1 = (u16)((50 + ((16 * duty100 * pwm_diff1) / t_diff1)) / 100);
+	slope2 = (u16)((50 + ((16 * duty100 * pwm_diff2) / t_diff2)) / 100);
+
+	fan_table.slope1 = cpu_to_be16(slope1);
+	fan_table.slope2 = cpu_to_be16(slope2);
+
+	fan_table.fdo_min = cpu_to_be16(fdo_min);
+
+	fan_table.hys_down = cpu_to_be16(rdev->pm.dpm.fan.t_hyst);
+
+	fan_table.hys_up = cpu_to_be16(1);
+
+	fan_table.hys_slope = cpu_to_be16(1);
+
+	fan_table.temp_resp_lim = cpu_to_be16(5);
+
+	reference_clock = radeon_get_xclk(rdev);
+
+	fan_table.refresh_period = cpu_to_be32((rdev->pm.dpm.fan.cycle_delay *
+						reference_clock) / 1600);
+
+	fan_table.fdo_max = cpu_to_be16((u16)duty100);
+
+	tmp = (RREG32(CG_MULT_THERMAL_CTRL) & TEMP_SEL_MASK) >> TEMP_SEL_SHIFT;
+	fan_table.temp_src = (uint8_t)tmp;
+
+	ret = si_copy_bytes_to_smc(rdev,
+				   si_pi->fan_table_start,
+				   (u8 *)(&fan_table),
+				   sizeof(fan_table),
+				   si_pi->sram_end);
+
+	if (ret) {
+		DRM_ERROR("Failed to load fan table to the SMC.\n");
+		rdev->pm.dpm.fan.ucode_fan_control = false;
+	}
+
+	return 0;
+}
+
+static int si_fan_ctrl_start_smc_fan_control(struct radeon_device *rdev)
+{
+	PPSMC_Result ret;
+
+	ret = si_send_msg_to_smc(rdev, PPSMC_StartFanControl);
+	if (ret == PPSMC_Result_OK)
+		return 0;
+	else
+		return -EINVAL;
+}
+
+static int si_fan_ctrl_stop_smc_fan_control(struct radeon_device *rdev)
+{
+	PPSMC_Result ret;
+
+	ret = si_send_msg_to_smc(rdev, PPSMC_StopFanControl);
+	if (ret == PPSMC_Result_OK)
+		return 0;
+	else
+		return -EINVAL;
+}
+
+#if 0
+static int si_fan_ctrl_get_fan_speed_percent(struct radeon_device *rdev,
+					     u32 *speed)
+{
+	u32 duty, duty100;
+	u64 tmp64;
+
+	if (rdev->pm.no_fan)
+		return -ENOENT;
+
+	duty100 = (RREG32(CG_FDO_CTRL1) & FMAX_DUTY100_MASK) >> FMAX_DUTY100_SHIFT;
+	duty = (RREG32(CG_THERMAL_STATUS) & FDO_PWM_DUTY_MASK) >> FDO_PWM_DUTY_SHIFT;
+
+	if (duty100 == 0)
+		return -EINVAL;
+
+	tmp64 = (u64)duty * 100;
+	do_div(tmp64, duty100);
+	*speed = (u32)tmp64;
+
+	if (*speed > 100)
+		*speed = 100;
+
+	return 0;
+}
+
+static int si_fan_ctrl_set_fan_speed_percent(struct radeon_device *rdev,
+					     u32 speed)
+{
+	u32 tmp;
+	u32 duty, duty100;
+	u64 tmp64;
+
+	if (rdev->pm.no_fan)
+		return -ENOENT;
+
+	if (speed > 100)
+		return -EINVAL;
+
+	if (rdev->pm.dpm.fan.ucode_fan_control)
+		si_fan_ctrl_stop_smc_fan_control(rdev);
+
+	duty100 = (RREG32(CG_FDO_CTRL1) & FMAX_DUTY100_MASK) >> FMAX_DUTY100_SHIFT;
+
+	if (duty100 == 0)
+		return -EINVAL;
+
+	tmp64 = (u64)speed * duty100;
+	do_div(tmp64, 100);
+	duty = (u32)tmp64;
+
+	tmp = RREG32(CG_FDO_CTRL0) & ~FDO_STATIC_DUTY_MASK;
+	tmp |= FDO_STATIC_DUTY(duty);
+	WREG32(CG_FDO_CTRL0, tmp);
+
+	si_fan_ctrl_set_static_mode(rdev, FDO_PWM_MODE_STATIC);
+
+	return 0;
+}
+
+static int si_fan_ctrl_get_fan_speed_rpm(struct radeon_device *rdev,
+					 u32 *speed)
+{
+	u32 tach_period;
+	u32 xclk = radeon_get_xclk(rdev);
+
+	if (rdev->pm.no_fan)
+		return -ENOENT;
+
+	if (rdev->pm.fan_pulses_per_revolution == 0)
+		return -ENOENT;
+
+	tach_period = (RREG32(CG_TACH_STATUS) & TACH_PERIOD_MASK) >> TACH_PERIOD_SHIFT;
+	if (tach_period == 0)
+		return -ENOENT;
+
+	*speed = 60 * xclk * 10000 / tach_period;
+
+	return 0;
+}
+
+static int si_fan_ctrl_set_fan_speed_rpm(struct radeon_device *rdev,
+					 u32 speed)
+{
+	u32 tach_period, tmp;
+	u32 xclk = radeon_get_xclk(rdev);
+
+	if (rdev->pm.no_fan)
+		return -ENOENT;
+
+	if (rdev->pm.fan_pulses_per_revolution == 0)
+		return -ENOENT;
+
+	if ((speed < rdev->pm.fan_min_rpm) ||
+	    (speed > rdev->pm.fan_max_rpm))
+		return -EINVAL;
+
+	if (rdev->pm.dpm.fan.ucode_fan_control)
+		si_fan_ctrl_stop_smc_fan_control(rdev);
+
+	tach_period = 60 * xclk * 10000 / (8 * speed);
+	tmp = RREG32(CG_TACH_CTRL) & ~TARGET_PERIOD_MASK;
+	tmp |= TARGET_PERIOD(tach_period);
+	WREG32(CG_TACH_CTRL, tmp);
+
+	si_fan_ctrl_set_static_mode(rdev, FDO_PWM_MODE_STATIC_RPM);
+
+	return 0;
+}
+#endif
+
+static void si_fan_ctrl_set_default_mode(struct radeon_device *rdev)
+{
+	struct si_power_info *si_pi = si_get_pi(rdev);
+	u32 tmp;
+
+	if (!si_pi->fan_ctrl_is_in_default_mode) {
+		tmp = RREG32(CG_FDO_CTRL2) & ~FDO_PWM_MODE_MASK;
+		tmp |= FDO_PWM_MODE(si_pi->fan_ctrl_default_mode);
+		WREG32(CG_FDO_CTRL2, tmp);
+
+		tmp = RREG32(CG_FDO_CTRL2) & ~TMIN_MASK;
+		tmp |= TMIN(si_pi->t_min);
+		WREG32(CG_FDO_CTRL2, tmp);
+		si_pi->fan_ctrl_is_in_default_mode = true;
+	}
+}
+
+static void si_thermal_start_smc_fan_control(struct radeon_device *rdev)
+{
+	if (rdev->pm.dpm.fan.ucode_fan_control) {
+		si_fan_ctrl_start_smc_fan_control(rdev);
+		si_fan_ctrl_set_static_mode(rdev, FDO_PWM_MODE_STATIC);
+	}
+}
+
+static void si_thermal_initialize(struct radeon_device *rdev)
+{
+	u32 tmp;
+
+	if (rdev->pm.fan_pulses_per_revolution) {
+		tmp = RREG32(CG_TACH_CTRL) & ~EDGE_PER_REV_MASK;
+		tmp |= EDGE_PER_REV(rdev->pm.fan_pulses_per_revolution - 1);
+		WREG32(CG_TACH_CTRL, tmp);
+	}
+
+	tmp = RREG32(CG_FDO_CTRL2) & ~TACH_PWM_RESP_RATE_MASK;
+	tmp |= TACH_PWM_RESP_RATE(0x28);
+	WREG32(CG_FDO_CTRL2, tmp);
+}
+
+static int si_thermal_start_thermal_controller(struct radeon_device *rdev)
+{
+	int ret;
+
+	si_thermal_initialize(rdev);
+	ret = si_thermal_set_temperature_range(rdev, R600_TEMP_RANGE_MIN, R600_TEMP_RANGE_MAX);
+	if (ret)
+		return ret;
+	ret = si_thermal_enable_alert(rdev, true);
+	if (ret)
+		return ret;
+	if (rdev->pm.dpm.fan.ucode_fan_control) {
+		ret = si_halt_smc(rdev);
+		if (ret)
+			return ret;
+		ret = si_thermal_setup_fan_table(rdev);
+		if (ret)
+			return ret;
+		ret = si_resume_smc(rdev);
+		if (ret)
+			return ret;
+		si_thermal_start_smc_fan_control(rdev);
+	}
+
+	return 0;
+}
+
+static void si_thermal_stop_thermal_controller(struct radeon_device *rdev)
+{
+	if (!rdev->pm.no_fan) {
+		si_fan_ctrl_set_default_mode(rdev);
+		si_fan_ctrl_stop_smc_fan_control(rdev);
+	}
+}
+
 int si_dpm_enable(struct radeon_device *rdev)
 {
 	struct rv7xx_power_info *pi = rv770_get_pi(rdev);
@@ -5954,31 +6291,39 @@ int si_dpm_enable(struct radeon_device *rdev)
 
 	si_enable_auto_throttle_source(rdev, RADEON_DPM_AUTO_THROTTLE_SRC_THERMAL, true);
 
+	si_thermal_start_thermal_controller(rdev);
+
 	ni_update_current_ps(rdev, boot_ps);
 
 	return 0;
 }
 
-int si_dpm_late_enable(struct radeon_device *rdev)
+static int si_set_temperature_range(struct radeon_device *rdev)
 {
 	int ret;
 
-	if (rdev->irq.installed &&
-	    r600_is_internal_thermal_sensor(rdev->pm.int_thermal_type)) {
-		PPSMC_Result result;
+	ret = si_thermal_enable_alert(rdev, false);
+	if (ret)
+		return ret;
+	ret = si_thermal_set_temperature_range(rdev, R600_TEMP_RANGE_MIN, R600_TEMP_RANGE_MAX);
+	if (ret)
+		return ret;
+	ret = si_thermal_enable_alert(rdev, true);
+	if (ret)
+		return ret;
 
-		ret = si_set_thermal_temperature_range(rdev, R600_TEMP_RANGE_MIN, R600_TEMP_RANGE_MAX);
-		if (ret)
-			return ret;
-		rdev->irq.dpm_thermal = true;
-		radeon_irq_set(rdev);
-		result = si_send_msg_to_smc(rdev, PPSMC_MSG_EnableThermalInterrupt);
+	return ret;
+}
 
-		if (result != PPSMC_Result_OK)
-			DRM_DEBUG_KMS("Could not enable thermal interrupts.\n");
-	}
+int si_dpm_late_enable(struct radeon_device *rdev)
+{
+	int ret;
 
-	return 0;
+	ret = si_set_temperature_range(rdev);
+	if (ret)
+		return ret;
+
+	return ret;
 }
 
 void si_dpm_disable(struct radeon_device *rdev)
@@ -5988,6 +6333,7 @@ void si_dpm_disable(struct radeon_device *rdev)
 
 	if (!si_is_smc_running(rdev))
 		return;
+	si_thermal_stop_thermal_controller(rdev);
 	si_disable_ulv(rdev);
 	si_clear_vc(rdev);
 	if (pi->thermal_protection)
@@ -6526,6 +6872,9 @@ int si_dpm_init(struct radeon_device *rdev)
 		rdev->pm.dpm.dyn_state.max_clock_voltage_on_dc =
 			rdev->pm.dpm.dyn_state.max_clock_voltage_on_ac;
 
+	si_pi->fan_ctrl_is_in_default_mode = true;
+	rdev->pm.dpm.fan.ucode_fan_control = false;
+
 	return 0;
 }
 
diff --git a/drivers/gpu/drm/radeon/si_dpm.h b/drivers/gpu/drm/radeon/si_dpm.h
index 8b5c06a0832d..d16bb1b5f10f 100644
--- a/drivers/gpu/drm/radeon/si_dpm.h
+++ b/drivers/gpu/drm/radeon/si_dpm.h
@@ -182,6 +182,7 @@ struct si_power_info {
 	u32 dte_table_start;
 	u32 spll_table_start;
 	u32 papm_cfg_table_start;
+	u32 fan_table_start;
 	/* CAC stuff */
 	const struct si_cac_config_reg *cac_weights;
 	const struct si_cac_config_reg *lcac_config;
@@ -197,6 +198,10 @@ struct si_power_info {
 	/* SVI2 */
 	u8 svd_gpio_id;
 	u8 svc_gpio_id;
+	/* fan control */
+	bool fan_ctrl_is_in_default_mode;
+	u32 t_min;
+	u32 fan_ctrl_default_mode;
 };
 
 #define SISLANDS_INITIAL_STATE_ARB_INDEX    0
diff --git a/drivers/gpu/drm/radeon/si_smc.c b/drivers/gpu/drm/radeon/si_smc.c
index 73dbc79c959d..e5bb92f16775 100644
--- a/drivers/gpu/drm/radeon/si_smc.c
+++ b/drivers/gpu/drm/radeon/si_smc.c
@@ -135,7 +135,7 @@ void si_reset_smc(struct radeon_device *rdev)
 
 int si_program_jump_on_start(struct radeon_device *rdev)
 {
-	static u8 data[] = { 0x0E, 0x00, 0x40, 0x40 };
+	static const u8 data[] = { 0x0E, 0x00, 0x40, 0x40 };
 
 	return si_copy_bytes_to_smc(rdev, 0x0, data, 4, sizeof(data)+1);
 }
diff --git a/drivers/gpu/drm/radeon/sid.h b/drivers/gpu/drm/radeon/sid.h
index 6635da9ec986..4069be89e585 100644
--- a/drivers/gpu/drm/radeon/sid.h
+++ b/drivers/gpu/drm/radeon/sid.h
@@ -180,7 +180,10 @@
 #define		DIG_THERM_DPM(x)			((x) << 14)
 #define		DIG_THERM_DPM_MASK			0x003FC000
 #define		DIG_THERM_DPM_SHIFT			14
-
+#define	CG_THERMAL_STATUS				0x704
+#define		FDO_PWM_DUTY(x)				((x) << 9)
+#define		FDO_PWM_DUTY_MASK			(0xff << 9)
+#define		FDO_PWM_DUTY_SHIFT			9
 #define	CG_THERMAL_INT					0x708
 #define		DIG_THERM_INTH(x)			((x) << 8)
 #define		DIG_THERM_INTH_MASK			0x0000FF00
@@ -191,6 +194,10 @@
 #define 	THERM_INT_MASK_HIGH			(1 << 24)
 #define 	THERM_INT_MASK_LOW			(1 << 25)
 
+#define	CG_MULT_THERMAL_CTRL					0x710
+#define		TEMP_SEL(x)					((x) << 20)
+#define		TEMP_SEL_MASK					(0xff << 20)
+#define		TEMP_SEL_SHIFT					20
 #define	CG_MULT_THERMAL_STATUS					0x714
 #define		ASIC_MAX_TEMP(x)				((x) << 0)
 #define		ASIC_MAX_TEMP_MASK				0x000001ff
@@ -199,6 +206,37 @@
 #define		CTF_TEMP_MASK					0x0003fe00
 #define		CTF_TEMP_SHIFT					9
 
+#define	CG_FDO_CTRL0					0x754
+#define		FDO_STATIC_DUTY(x)			((x) << 0)
+#define		FDO_STATIC_DUTY_MASK			0x000000FF
+#define		FDO_STATIC_DUTY_SHIFT			0
+#define	CG_FDO_CTRL1					0x758
+#define		FMAX_DUTY100(x)				((x) << 0)
+#define		FMAX_DUTY100_MASK			0x000000FF
+#define		FMAX_DUTY100_SHIFT			0
+#define	CG_FDO_CTRL2					0x75C
+#define		TMIN(x)					((x) << 0)
+#define		TMIN_MASK				0x000000FF
+#define		TMIN_SHIFT				0
+#define		FDO_PWM_MODE(x)				((x) << 11)
+#define		FDO_PWM_MODE_MASK			(7 << 11)
+#define		FDO_PWM_MODE_SHIFT			11
+#define		TACH_PWM_RESP_RATE(x)			((x) << 25)
+#define		TACH_PWM_RESP_RATE_MASK			(0x7f << 25)
+#define		TACH_PWM_RESP_RATE_SHIFT		25
+
+#define CG_TACH_CTRL                                    0x770
+#       define EDGE_PER_REV(x)                          ((x) << 0)
+#       define EDGE_PER_REV_MASK                        (0x7 << 0)
+#       define EDGE_PER_REV_SHIFT                       0
+#       define TARGET_PERIOD(x)                         ((x) << 3)
+#       define TARGET_PERIOD_MASK                       0xfffffff8
+#       define TARGET_PERIOD_SHIFT                      3
+#define CG_TACH_STATUS                                  0x774
+#       define TACH_PERIOD(x)                           ((x) << 0)
+#       define TACH_PERIOD_MASK                         0xffffffff
+#       define TACH_PERIOD_SHIFT                        0
+
 #define GENERAL_PWRMGT                                  0x780
 #       define GLOBAL_PWRMGT_EN                         (1 << 0)
 #       define STATIC_PM_EN                             (1 << 1)
diff --git a/drivers/gpu/drm/radeon/sislands_smc.h b/drivers/gpu/drm/radeon/sislands_smc.h
index 623a0b1e2d9d..3c779838d9ab 100644
--- a/drivers/gpu/drm/radeon/sislands_smc.h
+++ b/drivers/gpu/drm/radeon/sislands_smc.h
@@ -245,6 +245,31 @@ typedef struct SISLANDS_SMC_STATETABLE SISLANDS_SMC_STATETABLE;
 #define SI_SMC_SOFT_REGISTER_svi_rework_gpio_id_svd   0x11c
 #define SI_SMC_SOFT_REGISTER_svi_rework_gpio_id_svc   0x120
 
+struct PP_SIslands_FanTable
+{
+	uint8_t  fdo_mode;
+	uint8_t  padding;
+	int16_t  temp_min;
+	int16_t  temp_med;
+	int16_t  temp_max;
+	int16_t  slope1;
+	int16_t  slope2;
+	int16_t  fdo_min;
+	int16_t  hys_up;
+	int16_t  hys_down;
+	int16_t  hys_slope;
+	int16_t  temp_resp_lim;
+	int16_t  temp_curr;
+	int16_t  slope_curr;
+	int16_t  pwm_curr;
+	uint32_t refresh_period;
+	int16_t  fdo_max;
+	uint8_t  temp_src;
+	int8_t  padding2;
+};
+
+typedef struct PP_SIslands_FanTable PP_SIslands_FanTable;
+
 #define SMC_SISLANDS_LKGE_LUT_NUM_OF_TEMP_ENTRIES 16
 #define SMC_SISLANDS_LKGE_LUT_NUM_OF_VOLT_ENTRIES 32
 
diff --git a/drivers/gpu/drm/radeon/smu7_discrete.h b/drivers/gpu/drm/radeon/smu7_discrete.h
index 82f70c90a9ee..0b0b404ff091 100644
--- a/drivers/gpu/drm/radeon/smu7_discrete.h
+++ b/drivers/gpu/drm/radeon/smu7_discrete.h
@@ -431,6 +431,31 @@ struct SMU7_Discrete_MCRegisters
 
 typedef struct SMU7_Discrete_MCRegisters SMU7_Discrete_MCRegisters;
 
+struct SMU7_Discrete_FanTable
+{
+	uint16_t FdoMode;
+	int16_t  TempMin;
+	int16_t  TempMed;
+	int16_t  TempMax;
+	int16_t  Slope1;
+	int16_t  Slope2;
+	int16_t  FdoMin;
+	int16_t  HystUp;
+	int16_t  HystDown;
+	int16_t  HystSlope;
+	int16_t  TempRespLim;
+	int16_t  TempCurr;
+	int16_t  SlopeCurr;
+	int16_t  PwmCurr;
+	uint32_t RefreshPeriod;
+	int16_t  FdoMax;
+	uint8_t  TempSrc;
+	int8_t   Padding;
+};
+
+typedef struct SMU7_Discrete_FanTable SMU7_Discrete_FanTable;
+
+
 struct SMU7_Discrete_PmFuses {
   // dw0-dw1
   uint8_t BapmVddCVidHiSidd[8];
@@ -462,7 +487,10 @@ struct SMU7_Discrete_PmFuses {
   uint8_t BapmVddCVidHiSidd2[8];
 
   // dw11-dw12
-  uint32_t Reserved6[2];
+  int16_t FuzzyFan_ErrorSetDelta;
+  int16_t FuzzyFan_ErrorRateSetDelta;
+  int16_t FuzzyFan_PwmSetDelta;
+  uint16_t CalcMeasPowerBlend;
 
   // dw13-dw16
   uint8_t GnbLPML[16];
diff --git a/drivers/gpu/drm/rcar-du/Kconfig b/drivers/gpu/drm/rcar-du/Kconfig
index c96f6089f8bf..2324a526de65 100644
--- a/drivers/gpu/drm/rcar-du/Kconfig
+++ b/drivers/gpu/drm/rcar-du/Kconfig
@@ -11,10 +11,17 @@ config DRM_RCAR_DU
 	  Choose this option if you have an R-Car chipset.
 	  If M is selected the module will be called rcar-du-drm.
 
+config DRM_RCAR_HDMI
+	bool "R-Car DU HDMI Encoder Support"
+	depends on DRM_RCAR_DU
+	depends on OF
+	help
+	  Enable support for external HDMI encoders.
+
 config DRM_RCAR_LVDS
 	bool "R-Car DU LVDS Encoder Support"
 	depends on DRM_RCAR_DU
 	depends on ARCH_R8A7790 || ARCH_R8A7791 || COMPILE_TEST
 	help
-	  Enable support the R-Car Display Unit embedded LVDS encoders
-	  (currently only on R8A7790).
+	  Enable support for the R-Car Display Unit embedded LVDS encoders
+	  (currently only on R8A7790 and R8A7791).
diff --git a/drivers/gpu/drm/rcar-du/Makefile b/drivers/gpu/drm/rcar-du/Makefile
index 12b8d4477835..05de1c4097af 100644
--- a/drivers/gpu/drm/rcar-du/Makefile
+++ b/drivers/gpu/drm/rcar-du/Makefile
@@ -7,6 +7,8 @@ rcar-du-drm-y := rcar_du_crtc.o \
 		 rcar_du_plane.o \
 		 rcar_du_vgacon.o
 
+rcar-du-drm-$(CONFIG_DRM_RCAR_HDMI)	+= rcar_du_hdmicon.o \
+					   rcar_du_hdmienc.o
 rcar-du-drm-$(CONFIG_DRM_RCAR_LVDS)	+= rcar_du_lvdsenc.o
 
 obj-$(CONFIG_DRM_RCAR_DU)		+= rcar-du-drm.o
diff --git a/drivers/gpu/drm/rcar-du/rcar_du_crtc.c b/drivers/gpu/drm/rcar-du/rcar_du_crtc.c
index 148b50589181..23cc910951f4 100644
--- a/drivers/gpu/drm/rcar-du/rcar_du_crtc.c
+++ b/drivers/gpu/drm/rcar-du/rcar_du_crtc.c
@@ -19,6 +19,7 @@
 #include <drm/drm_crtc_helper.h>
 #include <drm/drm_fb_cma_helper.h>
 #include <drm/drm_gem_cma_helper.h>
+#include <drm/drm_plane_helper.h>
 
 #include "rcar_du_crtc.h"
 #include "rcar_du_drv.h"
@@ -585,7 +586,7 @@ int rcar_du_crtc_create(struct rcar_du_group *rgrp, unsigned int index)
 
 	if (irq < 0) {
 		dev_err(rcdu->dev, "no IRQ for CRTC %u\n", index);
-		return ret;
+		return irq;
 	}
 
 	ret = devm_request_irq(rcdu->dev, irq, rcar_du_crtc_irq, irqflags,
diff --git a/drivers/gpu/drm/rcar-du/rcar_du_crtc.h b/drivers/gpu/drm/rcar-du/rcar_du_crtc.h
index e97ae502dec5..984e6083699f 100644
--- a/drivers/gpu/drm/rcar-du/rcar_du_crtc.h
+++ b/drivers/gpu/drm/rcar-du/rcar_du_crtc.h
@@ -15,7 +15,6 @@
 #define __RCAR_DU_CRTC_H__
 
 #include <linux/mutex.h>
-#include <linux/platform_data/rcar-du.h>
 
 #include <drm/drmP.h>
 #include <drm/drm_crtc.h>
@@ -41,6 +40,15 @@ struct rcar_du_crtc {
 
 #define to_rcar_crtc(c)	container_of(c, struct rcar_du_crtc, crtc)
 
+enum rcar_du_output {
+	RCAR_DU_OUTPUT_DPAD0,
+	RCAR_DU_OUTPUT_DPAD1,
+	RCAR_DU_OUTPUT_LVDS0,
+	RCAR_DU_OUTPUT_LVDS1,
+	RCAR_DU_OUTPUT_TCON,
+	RCAR_DU_OUTPUT_MAX,
+};
+
 int rcar_du_crtc_create(struct rcar_du_group *rgrp, unsigned int index);
 void rcar_du_crtc_enable_vblank(struct rcar_du_crtc *rcrtc, bool enable);
 void rcar_du_crtc_cancel_page_flip(struct rcar_du_crtc *rcrtc,
diff --git a/drivers/gpu/drm/rcar-du/rcar_du_drv.c b/drivers/gpu/drm/rcar-du/rcar_du_drv.c
index e419aade2209..7bfa09cf18d5 100644
--- a/drivers/gpu/drm/rcar-du/rcar_du_drv.c
+++ b/drivers/gpu/drm/rcar-du/rcar_du_drv.c
@@ -146,12 +146,11 @@ static int rcar_du_load(struct drm_device *dev, unsigned long flags)
 {
 	struct platform_device *pdev = dev->platformdev;
 	struct device_node *np = pdev->dev.of_node;
-	struct rcar_du_platform_data *pdata = pdev->dev.platform_data;
 	struct rcar_du_device *rcdu;
 	struct resource *mem;
 	int ret;
 
-	if (pdata == NULL && np == NULL) {
+	if (np == NULL) {
 		dev_err(dev->dev, "no platform data\n");
 		return -ENODEV;
 	}
@@ -163,7 +162,6 @@ static int rcar_du_load(struct drm_device *dev, unsigned long flags)
 	}
 
 	rcdu->dev = &pdev->dev;
-	rcdu->pdata = pdata;
 	rcdu->info = np ? of_match_device(rcar_du_of_table, rcdu->dev)->data
 		   : (void *)platform_get_device_id(pdev)->driver_data;
 	rcdu->ddev = dev;
diff --git a/drivers/gpu/drm/rcar-du/rcar_du_drv.h b/drivers/gpu/drm/rcar-du/rcar_du_drv.h
index 8e494633c3b3..0a724669f02d 100644
--- a/drivers/gpu/drm/rcar-du/rcar_du_drv.h
+++ b/drivers/gpu/drm/rcar-du/rcar_du_drv.h
@@ -15,7 +15,6 @@
 #define __RCAR_DU_DRV_H__
 
 #include <linux/kernel.h>
-#include <linux/platform_data/rcar-du.h>
 
 #include "rcar_du_crtc.h"
 #include "rcar_du_group.h"
@@ -67,7 +66,6 @@ struct rcar_du_device_info {
 
 struct rcar_du_device {
 	struct device *dev;
-	const struct rcar_du_platform_data *pdata;
 	const struct rcar_du_device_info *info;
 
 	void __iomem *mmio;
diff --git a/drivers/gpu/drm/rcar-du/rcar_du_encoder.c b/drivers/gpu/drm/rcar-du/rcar_du_encoder.c
index 7c0ec95915ef..34a122a39664 100644
--- a/drivers/gpu/drm/rcar-du/rcar_du_encoder.c
+++ b/drivers/gpu/drm/rcar-du/rcar_du_encoder.c
@@ -19,6 +19,8 @@
 
 #include "rcar_du_drv.h"
 #include "rcar_du_encoder.h"
+#include "rcar_du_hdmicon.h"
+#include "rcar_du_hdmienc.h"
 #include "rcar_du_kms.h"
 #include "rcar_du_lvdscon.h"
 #include "rcar_du_lvdsenc.h"
@@ -33,7 +35,7 @@ rcar_du_connector_best_encoder(struct drm_connector *connector)
 {
 	struct rcar_du_connector *rcon = to_rcar_connector(connector);
 
-	return &rcon->encoder->encoder;
+	return rcar_encoder_to_drm_encoder(rcon->encoder);
 }
 
 /* -----------------------------------------------------------------------------
@@ -142,10 +144,11 @@ static const struct drm_encoder_funcs encoder_funcs = {
 int rcar_du_encoder_init(struct rcar_du_device *rcdu,
 			 enum rcar_du_encoder_type type,
 			 enum rcar_du_output output,
-			 const struct rcar_du_encoder_data *data,
-			 struct device_node *np)
+			 struct device_node *enc_node,
+			 struct device_node *con_node)
 {
 	struct rcar_du_encoder *renc;
+	struct drm_encoder *encoder;
 	unsigned int encoder_type;
 	int ret;
 
@@ -154,6 +157,7 @@ int rcar_du_encoder_init(struct rcar_du_device *rcdu,
 		return -ENOMEM;
 
 	renc->output = output;
+	encoder = rcar_encoder_to_drm_encoder(renc);
 
 	switch (output) {
 	case RCAR_DU_OUTPUT_LVDS0:
@@ -175,6 +179,9 @@ int rcar_du_encoder_init(struct rcar_du_device *rcdu,
 	case RCAR_DU_ENCODER_LVDS:
 		encoder_type = DRM_MODE_ENCODER_LVDS;
 		break;
+	case RCAR_DU_ENCODER_HDMI:
+		encoder_type = DRM_MODE_ENCODER_TMDS;
+		break;
 	case RCAR_DU_ENCODER_NONE:
 	default:
 		/* No external encoder, use the internal encoder type. */
@@ -182,23 +189,35 @@ int rcar_du_encoder_init(struct rcar_du_device *rcdu,
 		break;
 	}
 
-	ret = drm_encoder_init(rcdu->ddev, &renc->encoder, &encoder_funcs,
-			       encoder_type);
-	if (ret < 0)
-		return ret;
+	if (type == RCAR_DU_ENCODER_HDMI) {
+		if (renc->lvds) {
+			dev_err(rcdu->dev,
+				"Chaining LVDS and HDMI encoders not supported\n");
+			return -EINVAL;
+		}
 
-	drm_encoder_helper_add(&renc->encoder, &encoder_helper_funcs);
+		ret = rcar_du_hdmienc_init(rcdu, renc, enc_node);
+		if (ret < 0)
+			return ret;
+	} else {
+		ret = drm_encoder_init(rcdu->ddev, encoder, &encoder_funcs,
+				       encoder_type);
+		if (ret < 0)
+			return ret;
 
-	switch (encoder_type) {
-	case DRM_MODE_ENCODER_LVDS: {
-		const struct rcar_du_panel_data *pdata =
-			data ? &data->connector.lvds.panel : NULL;
-		return rcar_du_lvds_connector_init(rcdu, renc, pdata, np);
+		drm_encoder_helper_add(encoder, &encoder_helper_funcs);
 	}
 
+	switch (encoder_type) {
+	case DRM_MODE_ENCODER_LVDS:
+		return rcar_du_lvds_connector_init(rcdu, renc, con_node);
+
 	case DRM_MODE_ENCODER_DAC:
 		return rcar_du_vga_connector_init(rcdu, renc);
 
+	case DRM_MODE_ENCODER_TMDS:
+		return rcar_du_hdmi_connector_init(rcdu, renc);
+
 	default:
 		return -EINVAL;
 	}
diff --git a/drivers/gpu/drm/rcar-du/rcar_du_encoder.h b/drivers/gpu/drm/rcar-du/rcar_du_encoder.h
index bd624135ef1f..719b6f2a031c 100644
--- a/drivers/gpu/drm/rcar-du/rcar_du_encoder.h
+++ b/drivers/gpu/drm/rcar-du/rcar_du_encoder.h
@@ -14,21 +14,32 @@
 #ifndef __RCAR_DU_ENCODER_H__
 #define __RCAR_DU_ENCODER_H__
 
-#include <linux/platform_data/rcar-du.h>
-
 #include <drm/drm_crtc.h>
+#include <drm/drm_encoder_slave.h>
 
 struct rcar_du_device;
+struct rcar_du_hdmienc;
 struct rcar_du_lvdsenc;
 
+enum rcar_du_encoder_type {
+	RCAR_DU_ENCODER_UNUSED = 0,
+	RCAR_DU_ENCODER_NONE,
+	RCAR_DU_ENCODER_VGA,
+	RCAR_DU_ENCODER_LVDS,
+	RCAR_DU_ENCODER_HDMI,
+};
+
 struct rcar_du_encoder {
-	struct drm_encoder encoder;
+	struct drm_encoder_slave slave;
 	enum rcar_du_output output;
+	struct rcar_du_hdmienc *hdmi;
 	struct rcar_du_lvdsenc *lvds;
 };
 
 #define to_rcar_encoder(e) \
-	container_of(e, struct rcar_du_encoder, encoder)
+	container_of(e, struct rcar_du_encoder, slave.base)
+
+#define rcar_encoder_to_drm_encoder(e)	(&(e)->slave.base)
 
 struct rcar_du_connector {
 	struct drm_connector connector;
@@ -44,7 +55,7 @@ rcar_du_connector_best_encoder(struct drm_connector *connector);
 int rcar_du_encoder_init(struct rcar_du_device *rcdu,
 			 enum rcar_du_encoder_type type,
 			 enum rcar_du_output output,
-			 const struct rcar_du_encoder_data *data,
-			 struct device_node *np);
+			 struct device_node *enc_node,
+			 struct device_node *con_node);
 
 #endif /* __RCAR_DU_ENCODER_H__ */
diff --git a/drivers/gpu/drm/rcar-du/rcar_du_hdmicon.c b/drivers/gpu/drm/rcar-du/rcar_du_hdmicon.c
new file mode 100644
index 000000000000..4d7d4dd46d26
--- /dev/null
+++ b/drivers/gpu/drm/rcar-du/rcar_du_hdmicon.c
@@ -0,0 +1,121 @@
+/*
+ * R-Car Display Unit HDMI Connector
+ *
+ * Copyright (C) 2014 Renesas Electronics Corporation
+ *
+ * Contact: Laurent Pinchart (laurent.pinchart@ideasonboard.com)
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ */
+
+#include <drm/drmP.h>
+#include <drm/drm_crtc.h>
+#include <drm/drm_crtc_helper.h>
+#include <drm/drm_encoder_slave.h>
+
+#include "rcar_du_drv.h"
+#include "rcar_du_encoder.h"
+#include "rcar_du_hdmicon.h"
+#include "rcar_du_kms.h"
+
+#define to_slave_funcs(e)	(to_rcar_encoder(e)->slave.slave_funcs)
+
+static int rcar_du_hdmi_connector_get_modes(struct drm_connector *connector)
+{
+	struct rcar_du_connector *con = to_rcar_connector(connector);
+	struct drm_encoder *encoder = rcar_encoder_to_drm_encoder(con->encoder);
+	struct drm_encoder_slave_funcs *sfuncs = to_slave_funcs(encoder);
+
+	if (sfuncs->get_modes == NULL)
+		return 0;
+
+	return sfuncs->get_modes(encoder, connector);
+}
+
+static int rcar_du_hdmi_connector_mode_valid(struct drm_connector *connector,
+					     struct drm_display_mode *mode)
+{
+	struct rcar_du_connector *con = to_rcar_connector(connector);
+	struct drm_encoder *encoder = rcar_encoder_to_drm_encoder(con->encoder);
+	struct drm_encoder_slave_funcs *sfuncs = to_slave_funcs(encoder);
+
+	if (sfuncs->mode_valid == NULL)
+		return MODE_OK;
+
+	return sfuncs->mode_valid(encoder, mode);
+}
+
+static const struct drm_connector_helper_funcs connector_helper_funcs = {
+	.get_modes = rcar_du_hdmi_connector_get_modes,
+	.mode_valid = rcar_du_hdmi_connector_mode_valid,
+	.best_encoder = rcar_du_connector_best_encoder,
+};
+
+static void rcar_du_hdmi_connector_destroy(struct drm_connector *connector)
+{
+	drm_connector_unregister(connector);
+	drm_connector_cleanup(connector);
+}
+
+static enum drm_connector_status
+rcar_du_hdmi_connector_detect(struct drm_connector *connector, bool force)
+{
+	struct rcar_du_connector *con = to_rcar_connector(connector);
+	struct drm_encoder *encoder = rcar_encoder_to_drm_encoder(con->encoder);
+	struct drm_encoder_slave_funcs *sfuncs = to_slave_funcs(encoder);
+
+	if (sfuncs->detect == NULL)
+		return connector_status_unknown;
+
+	return sfuncs->detect(encoder, connector);
+}
+
+static const struct drm_connector_funcs connector_funcs = {
+	.dpms = drm_helper_connector_dpms,
+	.detect = rcar_du_hdmi_connector_detect,
+	.fill_modes = drm_helper_probe_single_connector_modes,
+	.destroy = rcar_du_hdmi_connector_destroy,
+};
+
+int rcar_du_hdmi_connector_init(struct rcar_du_device *rcdu,
+				struct rcar_du_encoder *renc)
+{
+	struct drm_encoder *encoder = rcar_encoder_to_drm_encoder(renc);
+	struct rcar_du_connector *rcon;
+	struct drm_connector *connector;
+	int ret;
+
+	rcon = devm_kzalloc(rcdu->dev, sizeof(*rcon), GFP_KERNEL);
+	if (rcon == NULL)
+		return -ENOMEM;
+
+	connector = &rcon->connector;
+	connector->display_info.width_mm = 0;
+	connector->display_info.height_mm = 0;
+
+	ret = drm_connector_init(rcdu->ddev, connector, &connector_funcs,
+				 DRM_MODE_CONNECTOR_HDMIA);
+	if (ret < 0)
+		return ret;
+
+	drm_connector_helper_add(connector, &connector_helper_funcs);
+	ret = drm_connector_register(connector);
+	if (ret < 0)
+		return ret;
+
+	drm_helper_connector_dpms(connector, DRM_MODE_DPMS_OFF);
+	drm_object_property_set_value(&connector->base,
+		rcdu->ddev->mode_config.dpms_property, DRM_MODE_DPMS_OFF);
+
+	ret = drm_mode_connector_attach_encoder(connector, encoder);
+	if (ret < 0)
+		return ret;
+
+	connector->encoder = encoder;
+	rcon->encoder = renc;
+
+	return 0;
+}
diff --git a/drivers/gpu/drm/rcar-du/rcar_du_hdmicon.h b/drivers/gpu/drm/rcar-du/rcar_du_hdmicon.h
new file mode 100644
index 000000000000..87daa949227f
--- /dev/null
+++ b/drivers/gpu/drm/rcar-du/rcar_du_hdmicon.h
@@ -0,0 +1,31 @@
+/*
+ * R-Car Display Unit HDMI Connector
+ *
+ * Copyright (C) 2014 Renesas Electronics Corporation
+ *
+ * Contact: Laurent Pinchart (laurent.pinchart@ideasonboard.com)
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ */
+
+#ifndef __RCAR_DU_HDMICON_H__
+#define __RCAR_DU_HDMICON_H__
+
+struct rcar_du_device;
+struct rcar_du_encoder;
+
+#if IS_ENABLED(CONFIG_DRM_RCAR_HDMI)
+int rcar_du_hdmi_connector_init(struct rcar_du_device *rcdu,
+				struct rcar_du_encoder *renc);
+#else
+static inline int rcar_du_hdmi_connector_init(struct rcar_du_device *rcdu,
+					      struct rcar_du_encoder *renc)
+{
+	return -ENOSYS;
+}
+#endif
+
+#endif /* __RCAR_DU_HDMICON_H__ */
diff --git a/drivers/gpu/drm/rcar-du/rcar_du_hdmienc.c b/drivers/gpu/drm/rcar-du/rcar_du_hdmienc.c
new file mode 100644
index 000000000000..359bc999a9c8
--- /dev/null
+++ b/drivers/gpu/drm/rcar-du/rcar_du_hdmienc.c
@@ -0,0 +1,151 @@
+/*
+ * R-Car Display Unit HDMI Encoder
+ *
+ * Copyright (C) 2014 Renesas Electronics Corporation
+ *
+ * Contact: Laurent Pinchart (laurent.pinchart@ideasonboard.com)
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ */
+
+#include <linux/slab.h>
+
+#include <drm/drmP.h>
+#include <drm/drm_crtc.h>
+#include <drm/drm_crtc_helper.h>
+#include <drm/drm_encoder_slave.h>
+
+#include "rcar_du_drv.h"
+#include "rcar_du_encoder.h"
+#include "rcar_du_hdmienc.h"
+
+struct rcar_du_hdmienc {
+	struct rcar_du_encoder *renc;
+	struct device *dev;
+	int dpms;
+};
+
+#define to_rcar_hdmienc(e)	(to_rcar_encoder(e)->hdmi)
+#define to_slave_funcs(e)	(to_rcar_encoder(e)->slave.slave_funcs)
+
+static void rcar_du_hdmienc_dpms(struct drm_encoder *encoder, int mode)
+{
+	struct rcar_du_hdmienc *hdmienc = to_rcar_hdmienc(encoder);
+	struct drm_encoder_slave_funcs *sfuncs = to_slave_funcs(encoder);
+
+	if (hdmienc->dpms == mode)
+		return;
+
+	if (sfuncs->dpms)
+		sfuncs->dpms(encoder, mode);
+
+	hdmienc->dpms = mode;
+}
+
+static bool rcar_du_hdmienc_mode_fixup(struct drm_encoder *encoder,
+				       const struct drm_display_mode *mode,
+				       struct drm_display_mode *adjusted_mode)
+{
+	struct drm_encoder_slave_funcs *sfuncs = to_slave_funcs(encoder);
+
+	if (sfuncs->mode_fixup == NULL)
+		return true;
+
+	return sfuncs->mode_fixup(encoder, mode, adjusted_mode);
+}
+
+static void rcar_du_hdmienc_mode_prepare(struct drm_encoder *encoder)
+{
+	rcar_du_hdmienc_dpms(encoder, DRM_MODE_DPMS_OFF);
+}
+
+static void rcar_du_hdmienc_mode_commit(struct drm_encoder *encoder)
+{
+	rcar_du_hdmienc_dpms(encoder, DRM_MODE_DPMS_ON);
+}
+
+static void rcar_du_hdmienc_mode_set(struct drm_encoder *encoder,
+				     struct drm_display_mode *mode,
+				     struct drm_display_mode *adjusted_mode)
+{
+	struct rcar_du_hdmienc *hdmienc = to_rcar_hdmienc(encoder);
+	struct drm_encoder_slave_funcs *sfuncs = to_slave_funcs(encoder);
+
+	if (sfuncs->mode_set)
+		sfuncs->mode_set(encoder, mode, adjusted_mode);
+
+	rcar_du_crtc_route_output(encoder->crtc, hdmienc->renc->output);
+}
+
+static const struct drm_encoder_helper_funcs encoder_helper_funcs = {
+	.dpms = rcar_du_hdmienc_dpms,
+	.mode_fixup = rcar_du_hdmienc_mode_fixup,
+	.prepare = rcar_du_hdmienc_mode_prepare,
+	.commit = rcar_du_hdmienc_mode_commit,
+	.mode_set = rcar_du_hdmienc_mode_set,
+};
+
+static void rcar_du_hdmienc_cleanup(struct drm_encoder *encoder)
+{
+	struct rcar_du_hdmienc *hdmienc = to_rcar_hdmienc(encoder);
+
+	rcar_du_hdmienc_dpms(encoder, DRM_MODE_DPMS_OFF);
+
+	drm_encoder_cleanup(encoder);
+	put_device(hdmienc->dev);
+}
+
+static const struct drm_encoder_funcs encoder_funcs = {
+	.destroy = rcar_du_hdmienc_cleanup,
+};
+
+int rcar_du_hdmienc_init(struct rcar_du_device *rcdu,
+			 struct rcar_du_encoder *renc, struct device_node *np)
+{
+	struct drm_encoder *encoder = rcar_encoder_to_drm_encoder(renc);
+	struct drm_i2c_encoder_driver *driver;
+	struct i2c_client *i2c_slave;
+	struct rcar_du_hdmienc *hdmienc;
+	int ret;
+
+	hdmienc = devm_kzalloc(rcdu->dev, sizeof(*hdmienc), GFP_KERNEL);
+	if (hdmienc == NULL)
+		return -ENOMEM;
+
+	/* Locate the slave I2C device and driver. */
+	i2c_slave = of_find_i2c_device_by_node(np);
+	if (!i2c_slave || !i2c_get_clientdata(i2c_slave))
+		return -EPROBE_DEFER;
+
+	hdmienc->dev = &i2c_slave->dev;
+
+	if (hdmienc->dev->driver == NULL) {
+		ret = -EPROBE_DEFER;
+		goto error;
+	}
+
+	/* Initialize the slave encoder. */
+	driver = to_drm_i2c_encoder_driver(to_i2c_driver(hdmienc->dev->driver));
+	ret = driver->encoder_init(i2c_slave, rcdu->ddev, &renc->slave);
+	if (ret < 0)
+		goto error;
+
+	ret = drm_encoder_init(rcdu->ddev, encoder, &encoder_funcs,
+			       DRM_MODE_ENCODER_TMDS);
+	if (ret < 0)
+		goto error;
+
+	drm_encoder_helper_add(encoder, &encoder_helper_funcs);
+
+	renc->hdmi = hdmienc;
+	hdmienc->renc = renc;
+
+	return 0;
+
+error:
+	put_device(hdmienc->dev);
+	return ret;
+}
diff --git a/drivers/gpu/drm/rcar-du/rcar_du_hdmienc.h b/drivers/gpu/drm/rcar-du/rcar_du_hdmienc.h
new file mode 100644
index 000000000000..2ff0128ac8e1
--- /dev/null
+++ b/drivers/gpu/drm/rcar-du/rcar_du_hdmienc.h
@@ -0,0 +1,35 @@
+/*
+ * R-Car Display Unit HDMI Encoder
+ *
+ * Copyright (C) 2014 Renesas Electronics Corporation
+ *
+ * Contact: Laurent Pinchart (laurent.pinchart@ideasonboard.com)
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ */
+
+#ifndef __RCAR_DU_HDMIENC_H__
+#define __RCAR_DU_HDMIENC_H__
+
+#include <linux/module.h>
+
+struct device_node;
+struct rcar_du_device;
+struct rcar_du_encoder;
+
+#if IS_ENABLED(CONFIG_DRM_RCAR_HDMI)
+int rcar_du_hdmienc_init(struct rcar_du_device *rcdu,
+			 struct rcar_du_encoder *renc, struct device_node *np);
+#else
+static inline int rcar_du_hdmienc_init(struct rcar_du_device *rcdu,
+				       struct rcar_du_encoder *renc,
+				       struct device_node *np)
+{
+	return -ENOSYS;
+}
+#endif
+
+#endif /* __RCAR_DU_HDMIENC_H__ */
diff --git a/drivers/gpu/drm/rcar-du/rcar_du_kms.c b/drivers/gpu/drm/rcar-du/rcar_du_kms.c
index 6c24ad7d03ef..0c5ee616b5a3 100644
--- a/drivers/gpu/drm/rcar-du/rcar_du_kms.c
+++ b/drivers/gpu/drm/rcar-du/rcar_du_kms.c
@@ -126,9 +126,9 @@ int rcar_du_dumb_create(struct drm_file *file, struct drm_device *dev,
 	else
 		align = 16 * args->bpp / 8;
 
-	args->pitch = roundup(max(args->pitch, min_pitch), align);
+	args->pitch = roundup(min_pitch, align);
 
-	return drm_gem_cma_dumb_create(file, dev, args);
+	return drm_gem_cma_dumb_create_internal(file, dev, args);
 }
 
 static struct drm_framebuffer *
@@ -190,49 +190,16 @@ static const struct drm_mode_config_funcs rcar_du_mode_config_funcs = {
 	.output_poll_changed = rcar_du_output_poll_changed,
 };
 
-static int rcar_du_encoders_init_pdata(struct rcar_du_device *rcdu)
-{
-	unsigned int num_encoders = 0;
-	unsigned int i;
-	int ret;
-
-	for (i = 0; i < rcdu->pdata->num_encoders; ++i) {
-		const struct rcar_du_encoder_data *pdata =
-			&rcdu->pdata->encoders[i];
-		const struct rcar_du_output_routing *route =
-			&rcdu->info->routes[pdata->output];
-
-		if (pdata->type == RCAR_DU_ENCODER_UNUSED)
-			continue;
-
-		if (pdata->output >= RCAR_DU_OUTPUT_MAX ||
-		    route->possible_crtcs == 0) {
-			dev_warn(rcdu->dev,
-				 "encoder %u references unexisting output %u, skipping\n",
-				 i, pdata->output);
-			continue;
-		}
-
-		ret = rcar_du_encoder_init(rcdu, pdata->type, pdata->output,
-					   pdata, NULL);
-		if (ret < 0)
-			return ret;
-
-		num_encoders++;
-	}
-
-	return num_encoders;
-}
-
-static int rcar_du_encoders_init_dt_one(struct rcar_du_device *rcdu,
-					enum rcar_du_output output,
-					struct of_endpoint *ep)
+static int rcar_du_encoders_init_one(struct rcar_du_device *rcdu,
+				     enum rcar_du_output output,
+				     struct of_endpoint *ep)
 {
 	static const struct {
 		const char *compatible;
 		enum rcar_du_encoder_type type;
 	} encoders[] = {
 		{ "adi,adv7123", RCAR_DU_ENCODER_VGA },
+		{ "adi,adv7511w", RCAR_DU_ENCODER_HDMI },
 		{ "thine,thc63lvdm83d", RCAR_DU_ENCODER_LVDS },
 	};
 
@@ -323,14 +290,14 @@ static int rcar_du_encoders_init_dt_one(struct rcar_du_device *rcdu,
 		connector = entity;
 	}
 
-	ret = rcar_du_encoder_init(rcdu, enc_type, output, NULL, connector);
+	ret = rcar_du_encoder_init(rcdu, enc_type, output, encoder, connector);
 	of_node_put(encoder);
 	of_node_put(connector);
 
 	return ret < 0 ? ret : 1;
 }
 
-static int rcar_du_encoders_init_dt(struct rcar_du_device *rcdu)
+static int rcar_du_encoders_init(struct rcar_du_device *rcdu)
 {
 	struct device_node *np = rcdu->dev->of_node;
 	struct device_node *prev = NULL;
@@ -377,7 +344,7 @@ static int rcar_du_encoders_init_dt(struct rcar_du_device *rcdu)
 		}
 
 		/* Process the output pipeline. */
-		ret = rcar_du_encoders_init_dt_one(rcdu, output, &ep);
+		ret = rcar_du_encoders_init_one(rcdu, output, &ep);
 		if (ret < 0) {
 			of_node_put(ep_node);
 			return ret;
@@ -442,11 +409,7 @@ int rcar_du_modeset_init(struct rcar_du_device *rcdu)
 	if (ret < 0)
 		return ret;
 
-	if (rcdu->pdata)
-		ret = rcar_du_encoders_init_pdata(rcdu);
-	else
-		ret = rcar_du_encoders_init_dt(rcdu);
-
+	ret = rcar_du_encoders_init(rcdu);
 	if (ret < 0)
 		return ret;
 
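Review note: in rcar_du_dumb_create() the pitch is now derived from min_pitch alone, so a pitch passed in by userspace is no longer taken into account. The sketch below reproduces the arithmetic in userspace for the else branch visible in the hunk (align = 16 * bpp / 8). It assumes min_pitch is the byte width of one line rounded up, which is not shown in this hunk; div_round_up(), round_up_to() and demo_dumb_pitch() are hypothetical helpers, not kernel API.

```c
#include <stdint.h>

/* Userspace stand-in for the kernel's DIV_ROUND_UP(). */
static uint32_t div_round_up(uint32_t n, uint32_t d)
{
	return (n + d - 1) / d;
}

/* Userspace stand-in for the kernel's roundup(): next multiple of align. */
static uint32_t round_up_to(uint32_t x, uint32_t align)
{
	return div_round_up(x, align) * align;
}

/* Pitch in bytes for a dumb buffer of the given width and bits per pixel. */
static uint32_t demo_dumb_pitch(uint32_t width, uint32_t bpp)
{
	/* Assumption: min_pitch is the line size in bytes, rounded up. */
	uint32_t min_pitch = div_round_up(width * bpp, 8);
	/* Alignment from the else branch in the hunk: 16 pixels' worth of bytes. */
	uint32_t align = 16 * bpp / 8;

	return round_up_to(min_pitch, align);
}
```

For a 1366-pixel-wide, 32 bpp buffer this gives a 5464-byte minimum pitch and a 64-byte alignment, so the resulting pitch is 5504 bytes.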
diff --git a/drivers/gpu/drm/rcar-du/rcar_du_lvdscon.c b/drivers/gpu/drm/rcar-du/rcar_du_lvdscon.c
index 115eed20db12..6d9811c052c4 100644
--- a/drivers/gpu/drm/rcar-du/rcar_du_lvdscon.c
+++ b/drivers/gpu/drm/rcar-du/rcar_du_lvdscon.c
@@ -27,7 +27,11 @@
 struct rcar_du_lvds_connector {
 	struct rcar_du_connector connector;
 
-	struct rcar_du_panel_data panel;
+	struct {
+		unsigned int width_mm;		/* Panel width in mm */
+		unsigned int height_mm;		/* Panel height in mm */
+		struct videomode mode;
+	} panel;
 };
 
 #define to_rcar_lvds_connector(c) \
@@ -78,31 +82,26 @@ static const struct drm_connector_funcs connector_funcs = {
 
 int rcar_du_lvds_connector_init(struct rcar_du_device *rcdu,
 				struct rcar_du_encoder *renc,
-				const struct rcar_du_panel_data *panel,
 				/* TODO const */ struct device_node *np)
 {
+	struct drm_encoder *encoder = rcar_encoder_to_drm_encoder(renc);
 	struct rcar_du_lvds_connector *lvdscon;
 	struct drm_connector *connector;
+	struct display_timing timing;
 	int ret;
 
 	lvdscon = devm_kzalloc(rcdu->dev, sizeof(*lvdscon), GFP_KERNEL);
 	if (lvdscon == NULL)
 		return -ENOMEM;
 
-	if (panel) {
-		lvdscon->panel = *panel;
-	} else {
-		struct display_timing timing;
-
-		ret = of_get_display_timing(np, "panel-timing", &timing);
-		if (ret < 0)
-			return ret;
+	ret = of_get_display_timing(np, "panel-timing", &timing);
+	if (ret < 0)
+		return ret;
 
-		videomode_from_timing(&timing, &lvdscon->panel.mode);
+	videomode_from_timing(&timing, &lvdscon->panel.mode);
 
-		of_property_read_u32(np, "width-mm", &lvdscon->panel.width_mm);
-		of_property_read_u32(np, "height-mm", &lvdscon->panel.height_mm);
-	}
+	of_property_read_u32(np, "width-mm", &lvdscon->panel.width_mm);
+	of_property_read_u32(np, "height-mm", &lvdscon->panel.height_mm);
 
 	connector = &lvdscon->connector.connector;
 	connector->display_info.width_mm = lvdscon->panel.width_mm;
@@ -122,11 +121,11 @@ int rcar_du_lvds_connector_init(struct rcar_du_device *rcdu,
 	drm_object_property_set_value(&connector->base,
 		rcdu->ddev->mode_config.dpms_property, DRM_MODE_DPMS_OFF);
 
-	ret = drm_mode_connector_attach_encoder(connector, &renc->encoder);
+	ret = drm_mode_connector_attach_encoder(connector, encoder);
 	if (ret < 0)
 		return ret;
 
-	connector->encoder = &renc->encoder;
+	connector->encoder = encoder;
 	lvdscon->connector.encoder = renc;
 
 	return 0;
diff --git a/drivers/gpu/drm/rcar-du/rcar_du_lvdscon.h b/drivers/gpu/drm/rcar-du/rcar_du_lvdscon.h
index d11424d537f9..d4881ee0be7e 100644
--- a/drivers/gpu/drm/rcar-du/rcar_du_lvdscon.h
+++ b/drivers/gpu/drm/rcar-du/rcar_du_lvdscon.h
@@ -16,11 +16,9 @@
 
 struct rcar_du_device;
 struct rcar_du_encoder;
-struct rcar_du_panel_data;
 
 int rcar_du_lvds_connector_init(struct rcar_du_device *rcdu,
 				struct rcar_du_encoder *renc,
-				const struct rcar_du_panel_data *panel,
 				struct device_node *np);
 
 #endif /* __RCAR_DU_LVDSCON_H__ */
diff --git a/drivers/gpu/drm/rcar-du/rcar_du_lvdsenc.h b/drivers/gpu/drm/rcar-du/rcar_du_lvdsenc.h
index 3303a55cec79..f65aabda0796 100644
--- a/drivers/gpu/drm/rcar-du/rcar_du_lvdsenc.h
+++ b/drivers/gpu/drm/rcar-du/rcar_du_lvdsenc.h
@@ -16,7 +16,6 @@
 
 #include <linux/io.h>
 #include <linux/module.h>
-#include <linux/platform_data/rcar-du.h>
 
 struct rcar_drm_crtc;
 struct rcar_du_lvdsenc;
diff --git a/drivers/gpu/drm/rcar-du/rcar_du_vgacon.c b/drivers/gpu/drm/rcar-du/rcar_du_vgacon.c
index 564a723ede03..752747a5e920 100644
--- a/drivers/gpu/drm/rcar-du/rcar_du_vgacon.c
+++ b/drivers/gpu/drm/rcar-du/rcar_du_vgacon.c
@@ -52,6 +52,7 @@ static const struct drm_connector_funcs connector_funcs = {
 int rcar_du_vga_connector_init(struct rcar_du_device *rcdu,
 			       struct rcar_du_encoder *renc)
 {
+	struct drm_encoder *encoder = rcar_encoder_to_drm_encoder(renc);
 	struct rcar_du_connector *rcon;
 	struct drm_connector *connector;
 	int ret;
@@ -78,11 +79,11 @@ int rcar_du_vga_connector_init(struct rcar_du_device *rcdu,
 	drm_object_property_set_value(&connector->base,
 		rcdu->ddev->mode_config.dpms_property, DRM_MODE_DPMS_OFF);
 
-	ret = drm_mode_connector_attach_encoder(connector, &renc->encoder);
+	ret = drm_mode_connector_attach_encoder(connector, encoder);
 	if (ret < 0)
 		return ret;
 
-	connector->encoder = &renc->encoder;
+	connector->encoder = encoder;
 	rcon->encoder = renc;
 
 	return 0;
diff --git a/drivers/gpu/drm/rockchip/Kconfig b/drivers/gpu/drm/rockchip/Kconfig
new file mode 100644
index 000000000000..ca9f085efa92
--- /dev/null
+++ b/drivers/gpu/drm/rockchip/Kconfig
@@ -0,0 +1,17 @@
+config DRM_ROCKCHIP
+	tristate "DRM Support for Rockchip"
+	depends on DRM && ROCKCHIP_IOMMU
+	select DRM_KMS_HELPER
+	select DRM_KMS_FB_HELPER
+	select DRM_PANEL
+	select FB_CFB_FILLRECT
+	select FB_CFB_COPYAREA
+	select FB_CFB_IMAGEBLIT
+	select VT_HW_CONSOLE_BINDING if FRAMEBUFFER_CONSOLE
+	select VIDEOMODE_HELPERS
+	help
+	  Choose this option if you have a Rockchip SoC chipset.
+	  This driver provides kernel mode setting and buffer
+	  management to userspace. This driver does not provide
+	  2D or 3D acceleration; acceleration is performed by other
+	  IP found on the SoC.
diff --git a/drivers/gpu/drm/rockchip/Makefile b/drivers/gpu/drm/rockchip/Makefile
new file mode 100644
index 000000000000..2cb0672f57ed
--- /dev/null
+++ b/drivers/gpu/drm/rockchip/Makefile
@@ -0,0 +1,8 @@
+#
+# Makefile for the Rockchip DRM driver.  This driver provides kernel mode
+# setting and buffer management for Rockchip SoCs.
+
+rockchipdrm-y := rockchip_drm_drv.o rockchip_drm_fb.o rockchip_drm_fbdev.o \
+		rockchip_drm_gem.o
+
+obj-$(CONFIG_DRM_ROCKCHIP) += rockchipdrm.o rockchip_drm_vop.o
diff --git a/drivers/gpu/drm/rockchip/rockchip_drm_drv.c b/drivers/gpu/drm/rockchip/rockchip_drm_drv.c
new file mode 100644
index 000000000000..a798c7c71f91
--- /dev/null
+++ b/drivers/gpu/drm/rockchip/rockchip_drm_drv.c
@@ -0,0 +1,551 @@
+/*
+ * Copyright (C) Fuzhou Rockchip Electronics Co.Ltd
+ * Author:Mark Yao <mark.yao@rock-chips.com>
+ *
+ * based on exynos_drm_drv.c
+ *
+ * This software is licensed under the terms of the GNU General Public
+ * License version 2, as published by the Free Software Foundation, and
+ * may be copied, distributed, and modified under those terms.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ */
+
+#include <asm/dma-iommu.h>
+
+#include <drm/drmP.h>
+#include <drm/drm_crtc_helper.h>
+#include <drm/drm_fb_helper.h>
+#include <linux/dma-mapping.h>
+#include <linux/pm_runtime.h>
+#include <linux/of_graph.h>
+#include <linux/component.h>
+
+#include "rockchip_drm_drv.h"
+#include "rockchip_drm_fb.h"
+#include "rockchip_drm_fbdev.h"
+#include "rockchip_drm_gem.h"
+
+#define DRIVER_NAME	"rockchip"
+#define DRIVER_DESC	"RockChip Soc DRM"
+#define DRIVER_DATE	"20140818"
+#define DRIVER_MAJOR	1
+#define DRIVER_MINOR	0
+
+/*
+ * Attach a (component) device to the shared drm dma mapping from master drm
+ * device.  This is used by the VOPs to map GEM buffers to a common DMA
+ * mapping.
+ */
+int rockchip_drm_dma_attach_device(struct drm_device *drm_dev,
+				   struct device *dev)
+{
+	struct dma_iommu_mapping *mapping = drm_dev->dev->archdata.mapping;
+	int ret;
+
+	ret = dma_set_coherent_mask(dev, DMA_BIT_MASK(32));
+	if (ret)
+		return ret;
+
+	dma_set_max_seg_size(dev, DMA_BIT_MASK(32));
+
+	return arm_iommu_attach_device(dev, mapping);
+}
+EXPORT_SYMBOL_GPL(rockchip_drm_dma_attach_device);
+
+void rockchip_drm_dma_detach_device(struct drm_device *drm_dev,
+				    struct device *dev)
+{
+	arm_iommu_detach_device(dev);
+}
+EXPORT_SYMBOL_GPL(rockchip_drm_dma_detach_device);
+
+int rockchip_register_crtc_funcs(struct drm_device *dev,
+				 const struct rockchip_crtc_funcs *crtc_funcs,
+				 int pipe)
+{
+	struct rockchip_drm_private *priv = dev->dev_private;
+
+	if (pipe >= ROCKCHIP_MAX_CRTC)
+		return -EINVAL;
+
+	priv->crtc_funcs[pipe] = crtc_funcs;
+
+	return 0;
+}
+EXPORT_SYMBOL_GPL(rockchip_register_crtc_funcs);
+
+void rockchip_unregister_crtc_funcs(struct drm_device *dev, int pipe)
+{
+	struct rockchip_drm_private *priv = dev->dev_private;
+
+	if (pipe >= ROCKCHIP_MAX_CRTC)
+		return;
+
+	priv->crtc_funcs[pipe] = NULL;
+}
+EXPORT_SYMBOL_GPL(rockchip_unregister_crtc_funcs);
+
+static struct drm_crtc *rockchip_crtc_from_pipe(struct drm_device *drm,
+						int pipe)
+{
+	struct drm_crtc *crtc;
+	int i = 0;
+
+	list_for_each_entry(crtc, &drm->mode_config.crtc_list, head)
+		if (i++ == pipe)
+			return crtc;
+
+	return NULL;
+}
+
+static int rockchip_drm_crtc_enable_vblank(struct drm_device *dev, int pipe)
+{
+	struct rockchip_drm_private *priv = dev->dev_private;
+	struct drm_crtc *crtc = rockchip_crtc_from_pipe(dev, pipe);
+
+	if (crtc && priv->crtc_funcs[pipe] &&
+	    priv->crtc_funcs[pipe]->enable_vblank)
+		return priv->crtc_funcs[pipe]->enable_vblank(crtc);
+
+	return 0;
+}
+
+static void rockchip_drm_crtc_disable_vblank(struct drm_device *dev, int pipe)
+{
+	struct rockchip_drm_private *priv = dev->dev_private;
+	struct drm_crtc *crtc = rockchip_crtc_from_pipe(dev, pipe);
+
+	if (crtc && priv->crtc_funcs[pipe] &&
+	    priv->crtc_funcs[pipe]->disable_vblank)
+		priv->crtc_funcs[pipe]->disable_vblank(crtc);
+}
+
+static int rockchip_drm_load(struct drm_device *drm_dev, unsigned long flags)
+{
+	struct rockchip_drm_private *private;
+	struct dma_iommu_mapping *mapping;
+	struct device *dev = drm_dev->dev;
+	int ret;
+
+	private = devm_kzalloc(drm_dev->dev, sizeof(*private), GFP_KERNEL);
+	if (!private)
+		return -ENOMEM;
+
+	drm_dev->dev_private = private;
+
+	drm_mode_config_init(drm_dev);
+
+	rockchip_drm_mode_config_init(drm_dev);
+
+	dev->dma_parms = devm_kzalloc(dev, sizeof(*dev->dma_parms),
+				      GFP_KERNEL);
+	if (!dev->dma_parms) {
+		ret = -ENOMEM;
+		goto err_config_cleanup;
+	}
+
+	/* TODO(djkurtz): fetch the mapping start/size from somewhere */
+	mapping = arm_iommu_create_mapping(&platform_bus_type, 0x00000000,
+					   SZ_2G);
+	if (IS_ERR(mapping)) {
+		ret = PTR_ERR(mapping);
+		goto err_config_cleanup;
+	}
+
+	ret = dma_set_mask_and_coherent(dev, DMA_BIT_MASK(32));
+	if (ret)
+		goto err_release_mapping;
+
+	dma_set_max_seg_size(dev, DMA_BIT_MASK(32));
+
+	ret = arm_iommu_attach_device(dev, mapping);
+	if (ret)
+		goto err_release_mapping;
+
+	/* Try to bind all sub drivers. */
+	ret = component_bind_all(dev, drm_dev);
+	if (ret)
+		goto err_detach_device;
+
+	/* init kms poll for handling hpd */
+	drm_kms_helper_poll_init(drm_dev);
+
+	/*
+	 * enable drm irq mode.
+	 * - with irq_enabled = true, we can use the vblank feature.
+	 */
+	drm_dev->irq_enabled = true;
+
+	ret = drm_vblank_init(drm_dev, ROCKCHIP_MAX_CRTC);
+	if (ret)
+		goto err_kms_helper_poll_fini;
+
+	/*
+	 * With vblank_disable_allowed = true, the vblank interrupt is disabled
+	 * by the drm timer once the last user gives up its vblank reference
+	 * (that is, after drm_vblank_put() has been called).
+	 */
+	drm_dev->vblank_disable_allowed = true;
+
+	ret = rockchip_drm_fbdev_init(drm_dev);
+	if (ret)
+		goto err_vblank_cleanup;
+
+	return 0;
+err_vblank_cleanup:
+	drm_vblank_cleanup(drm_dev);
+err_kms_helper_poll_fini:
+	drm_kms_helper_poll_fini(drm_dev);
+	component_unbind_all(dev, drm_dev);
+err_detach_device:
+	arm_iommu_detach_device(dev);
+err_release_mapping:
+	arm_iommu_release_mapping(mapping);
+err_config_cleanup:
+	drm_mode_config_cleanup(drm_dev);
+	drm_dev->dev_private = NULL;
+	return ret;
+}
+
+static int rockchip_drm_unload(struct drm_device *drm_dev)
+{
+	struct device *dev = drm_dev->dev;
+
+	rockchip_drm_fbdev_fini(drm_dev);
+	drm_vblank_cleanup(drm_dev);
+	drm_kms_helper_poll_fini(drm_dev);
+	component_unbind_all(dev, drm_dev);
+	arm_iommu_detach_device(dev);
+	arm_iommu_release_mapping(dev->archdata.mapping);
+	drm_mode_config_cleanup(drm_dev);
+	drm_dev->dev_private = NULL;
+
+	return 0;
+}
+
+void rockchip_drm_lastclose(struct drm_device *dev)
+{
+	struct rockchip_drm_private *priv = dev->dev_private;
+
+	drm_fb_helper_restore_fbdev_mode_unlocked(&priv->fbdev_helper);
+}
+
+static const struct file_operations rockchip_drm_driver_fops = {
+	.owner = THIS_MODULE,
+	.open = drm_open,
+	.mmap = rockchip_gem_mmap,
+	.poll = drm_poll,
+	.read = drm_read,
+	.unlocked_ioctl = drm_ioctl,
+#ifdef CONFIG_COMPAT
+	.compat_ioctl = drm_compat_ioctl,
+#endif
+	.release = drm_release,
+};
+
+const struct vm_operations_struct rockchip_drm_vm_ops = {
+	.open = drm_gem_vm_open,
+	.close = drm_gem_vm_close,
+};
+
+static struct drm_driver rockchip_drm_driver = {
+	.driver_features	= DRIVER_MODESET | DRIVER_GEM | DRIVER_PRIME,
+	.load			= rockchip_drm_load,
+	.unload			= rockchip_drm_unload,
+	.lastclose		= rockchip_drm_lastclose,
+	.get_vblank_counter	= drm_vblank_count,
+	.enable_vblank		= rockchip_drm_crtc_enable_vblank,
+	.disable_vblank		= rockchip_drm_crtc_disable_vblank,
+	.gem_vm_ops		= &rockchip_drm_vm_ops,
+	.gem_free_object	= rockchip_gem_free_object,
+	.dumb_create		= rockchip_gem_dumb_create,
+	.dumb_map_offset	= rockchip_gem_dumb_map_offset,
+	.dumb_destroy		= drm_gem_dumb_destroy,
+	.prime_handle_to_fd	= drm_gem_prime_handle_to_fd,
+	.prime_fd_to_handle	= drm_gem_prime_fd_to_handle,
+	.gem_prime_import	= drm_gem_prime_import,
+	.gem_prime_export	= drm_gem_prime_export,
+	.gem_prime_get_sg_table	= rockchip_gem_prime_get_sg_table,
+	.gem_prime_vmap		= rockchip_gem_prime_vmap,
+	.gem_prime_vunmap	= rockchip_gem_prime_vunmap,
+	.gem_prime_mmap		= rockchip_gem_mmap_buf,
+	.fops			= &rockchip_drm_driver_fops,
+	.name	= DRIVER_NAME,
+	.desc	= DRIVER_DESC,
+	.date	= DRIVER_DATE,
+	.major	= DRIVER_MAJOR,
+	.minor	= DRIVER_MINOR,
+};
+
+#ifdef CONFIG_PM_SLEEP
+static int rockchip_drm_sys_suspend(struct device *dev)
+{
+	struct drm_device *drm = dev_get_drvdata(dev);
+	struct drm_connector *connector;
+
+	if (!drm)
+		return 0;
+
+	drm_modeset_lock_all(drm);
+	list_for_each_entry(connector, &drm->mode_config.connector_list, head) {
+		int old_dpms = connector->dpms;
+
+		if (connector->funcs->dpms)
+			connector->funcs->dpms(connector, DRM_MODE_DPMS_OFF);
+
+		/* Set the old mode back to the connector for resume */
+		connector->dpms = old_dpms;
+	}
+	drm_modeset_unlock_all(drm);
+
+	return 0;
+}
+
+static int rockchip_drm_sys_resume(struct device *dev)
+{
+	struct drm_device *drm = dev_get_drvdata(dev);
+	struct drm_connector *connector;
+	enum drm_connector_status status;
+	bool changed = false;
+
+	if (!drm)
+		return 0;
+
+	drm_modeset_lock_all(drm);
+	list_for_each_entry(connector, &drm->mode_config.connector_list, head) {
+		int desired_mode = connector->dpms;
+
+		/*
+		 * The suspend path left the pre-suspend DPMS state in
+		 * connector->dpms, but the hardware is actually off now, so
+		 * reset the state to DRM_MODE_DPMS_OFF before restoring it.
+		 */
+		connector->dpms = DRM_MODE_DPMS_OFF;
+
+		/*
+		 * If the connector has been disconnected during suspend,
+		 * disconnect it from the encoder and leave it off. We'll notify
+		 * userspace at the end.
+		 */
+		if (desired_mode == DRM_MODE_DPMS_ON) {
+			status = connector->funcs->detect(connector, true);
+			if (status == connector_status_disconnected) {
+				connector->encoder = NULL;
+				connector->status = status;
+				changed = true;
+				continue;
+			}
+		}
+		if (connector->funcs->dpms)
+			connector->funcs->dpms(connector, desired_mode);
+	}
+	drm_modeset_unlock_all(drm);
+
+	drm_helper_resume_force_mode(drm);
+
+	if (changed)
+		drm_kms_helper_hotplug_event(drm);
+
+	return 0;
+}
+#endif
+
+static const struct dev_pm_ops rockchip_drm_pm_ops = {
+	SET_SYSTEM_SLEEP_PM_OPS(rockchip_drm_sys_suspend,
+				rockchip_drm_sys_resume)
+};
+
+/*
+ * @node: device tree node containing encoder input ports
+ * @encoder: drm_encoder
+ */
+int rockchip_drm_encoder_get_mux_id(struct device_node *node,
+				    struct drm_encoder *encoder)
+{
+	struct device_node *ep = NULL;
+	struct drm_crtc *crtc = encoder->crtc;
+	struct of_endpoint endpoint;
+	struct device_node *port;
+	int ret;
+
+	if (!node || !crtc)
+		return -EINVAL;
+
+	do {
+		ep = of_graph_get_next_endpoint(node, ep);
+		if (!ep)
+			break;
+
+		port = of_graph_get_remote_port(ep);
+		of_node_put(port);
+		if (port == crtc->port) {
+			ret = of_graph_parse_endpoint(ep, &endpoint);
+			return ret ?: endpoint.id;
+		}
+	} while (ep);
+
+	return -EINVAL;
+}
+
+static int compare_of(struct device *dev, void *data)
+{
+	struct device_node *np = data;
+
+	return dev->of_node == np;
+}
+
+static void rockchip_add_endpoints(struct device *dev,
+				   struct component_match **match,
+				   struct device_node *port)
+{
+	struct device_node *ep, *remote;
+
+	for_each_child_of_node(port, ep) {
+		remote = of_graph_get_remote_port_parent(ep);
+		if (!remote || !of_device_is_available(remote)) {
+			of_node_put(remote);
+			continue;
+		} else if (!of_device_is_available(remote->parent)) {
+			dev_warn(dev, "parent device of %s is not available\n",
+				 remote->full_name);
+			of_node_put(remote);
+			continue;
+		}
+
+		component_match_add(dev, match, compare_of, remote);
+		of_node_put(remote);
+	}
+}
+
+static int rockchip_drm_bind(struct device *dev)
+{
+	struct drm_device *drm;
+	int ret;
+
+	drm = drm_dev_alloc(&rockchip_drm_driver, dev);
+	if (!drm)
+		return -ENOMEM;
+
+	ret = drm_dev_set_unique(drm, "%s", dev_name(dev));
+	if (ret)
+		goto err_free;
+
+	ret = drm_dev_register(drm, 0);
+	if (ret)
+		goto err_free;
+
+	dev_set_drvdata(dev, drm);
+
+	return 0;
+
+err_free:
+	drm_dev_unref(drm);
+	return ret;
+}
+
+static void rockchip_drm_unbind(struct device *dev)
+{
+	struct drm_device *drm = dev_get_drvdata(dev);
+
+	drm_dev_unregister(drm);
+	drm_dev_unref(drm);
+	dev_set_drvdata(dev, NULL);
+}
+
+static const struct component_master_ops rockchip_drm_ops = {
+	.bind = rockchip_drm_bind,
+	.unbind = rockchip_drm_unbind,
+};
+
+static int rockchip_drm_platform_probe(struct platform_device *pdev)
+{
+	struct device *dev = &pdev->dev;
+	struct component_match *match = NULL;
+	struct device_node *np = dev->of_node;
+	struct device_node *port;
+	int i;
+
+	if (!np)
+		return -ENODEV;
+	/*
+	 * Bind the crtc ports first, so that
+	 * drm_of_find_possible_crtcs called from encoder .bind callbacks
+	 * works as expected.
+	 */
+	for (i = 0;; i++) {
+		port = of_parse_phandle(np, "ports", i);
+		if (!port)
+			break;
+
+		if (!of_device_is_available(port->parent)) {
+			of_node_put(port);
+			continue;
+		}
+
+		component_match_add(dev, &match, compare_of, port->parent);
+		of_node_put(port);
+	}
+
+	if (i == 0) {
+		dev_err(dev, "missing 'ports' property\n");
+		return -ENODEV;
+	}
+
+	if (!match) {
+		dev_err(dev, "No available vop found for display-subsystem.\n");
+		return -ENODEV;
+	}
+	/*
+	 * For each bound crtc, bind the encoders attached to its
+	 * remote endpoint.
+	 */
+	for (i = 0;; i++) {
+		port = of_parse_phandle(np, "ports", i);
+		if (!port)
+			break;
+
+		if (!of_device_is_available(port->parent)) {
+			of_node_put(port);
+			continue;
+		}
+
+		rockchip_add_endpoints(dev, &match, port);
+		of_node_put(port);
+	}
+
+	return component_master_add_with_match(dev, &rockchip_drm_ops, match);
+}
+
+static int rockchip_drm_platform_remove(struct platform_device *pdev)
+{
+	component_master_del(&pdev->dev, &rockchip_drm_ops);
+
+	return 0;
+}
+
+static const struct of_device_id rockchip_drm_dt_ids[] = {
+	{ .compatible = "rockchip,display-subsystem", },
+	{ /* sentinel */ },
+};
+MODULE_DEVICE_TABLE(of, rockchip_drm_dt_ids);
+
+static struct platform_driver rockchip_drm_platform_driver = {
+	.probe = rockchip_drm_platform_probe,
+	.remove = rockchip_drm_platform_remove,
+	.driver = {
+		.owner = THIS_MODULE,
+		.name = "rockchip-drm",
+		.of_match_table = rockchip_drm_dt_ids,
+		.pm = &rockchip_drm_pm_ops,
+	},
+};
+
+module_platform_driver(rockchip_drm_platform_driver);
+
+MODULE_AUTHOR("Mark Yao <mark.yao@rock-chips.com>");
+MODULE_DESCRIPTION("ROCKCHIP DRM Driver");
+MODULE_LICENSE("GPL v2");
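Review note: crtc_funcs[] in struct rockchip_drm_private has ROCKCHIP_MAX_CRTC entries, so the valid pipe indices are 0 to ROCKCHIP_MAX_CRTC - 1, and each optional callback should be tested before it is called. The following is a minimal userspace sketch of both checks, under the assumption that a negative pipe should be rejected as well. All demo_* names are hypothetical and the callbacks take no arguments here for brevity.

```c
#include <errno.h>
#include <stddef.h>

#define DEMO_MAX_CRTC 2	/* mirrors ROCKCHIP_MAX_CRTC */

struct demo_crtc_funcs {
	int (*enable_vblank)(void);
	void (*disable_vblank)(void);
};

/* One slot per pipe; valid indices are 0 .. DEMO_MAX_CRTC - 1. */
static const struct demo_crtc_funcs *demo_table[DEMO_MAX_CRTC];

static int demo_register(const struct demo_crtc_funcs *funcs, int pipe)
{
	/* pipe == DEMO_MAX_CRTC is already one past the end of the table. */
	if (pipe < 0 || pipe >= DEMO_MAX_CRTC)
		return -EINVAL;

	demo_table[pipe] = funcs;
	return 0;
}

static void demo_disable_vblank(int pipe)
{
	if (pipe < 0 || pipe >= DEMO_MAX_CRTC)
		return;

	/* Test the pointer that is about to be called, not its sibling. */
	if (demo_table[pipe] && demo_table[pipe]->disable_vblank)
		demo_table[pipe]->disable_vblank();
}

/* Sample callback set that provides only enable_vblank. */
static int demo_enable(void)
{
	return 0;
}

static const struct demo_crtc_funcs demo_enable_only = {
	.enable_vblank = demo_enable,
};
```

Registering demo_enable_only and then calling demo_disable_vblank() for that pipe is a no-op in this sketch, because the disable hook is NULL and is the pointer being checked.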
diff --git a/drivers/gpu/drm/rockchip/rockchip_drm_drv.h b/drivers/gpu/drm/rockchip/rockchip_drm_drv.h
new file mode 100644
index 000000000000..dc4e5f03ac79
--- /dev/null
+++ b/drivers/gpu/drm/rockchip/rockchip_drm_drv.h
@@ -0,0 +1,68 @@
+/*
+ * Copyright (C) Fuzhou Rockchip Electronics Co.Ltd
+ * Author:Mark Yao <mark.yao@rock-chips.com>
+ *
+ * based on exynos_drm_drv.h
+ *
+ * This software is licensed under the terms of the GNU General Public
+ * License version 2, as published by the Free Software Foundation, and
+ * may be copied, distributed, and modified under those terms.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ */
+
+#ifndef _ROCKCHIP_DRM_DRV_H
+#define _ROCKCHIP_DRM_DRV_H
+
+#include <drm/drm_fb_helper.h>
+#include <drm/drm_gem.h>
+
+#include <linux/module.h>
+#include <linux/component.h>
+
+#define ROCKCHIP_MAX_FB_BUFFER	3
+#define ROCKCHIP_MAX_CONNECTOR	2
+#define ROCKCHIP_MAX_CRTC	2
+
+struct drm_device;
+struct drm_connector;
+
+/*
+ * Rockchip drm private crtc funcs.
+ * @enable_vblank: enable crtc vblank irq.
+ * @disable_vblank: disable crtc vblank irq.
+ */
+struct rockchip_crtc_funcs {
+	int (*enable_vblank)(struct drm_crtc *crtc);
+	void (*disable_vblank)(struct drm_crtc *crtc);
+};
+
+/*
+ * Rockchip drm private structure.
+ *
+ * @fbdev_helper, @fbdev_bo: fbdev emulation helper and its backing GEM object.
+ * @crtc_funcs: per-pipe CRTC vblank callbacks, indexed by pipe.
+ */
+struct rockchip_drm_private {
+	struct drm_fb_helper fbdev_helper;
+	struct drm_gem_object *fbdev_bo;
+	const struct rockchip_crtc_funcs *crtc_funcs[ROCKCHIP_MAX_CRTC];
+};
+
+int rockchip_register_crtc_funcs(struct drm_device *dev,
+				 const struct rockchip_crtc_funcs *crtc_funcs,
+				 int pipe);
+void rockchip_unregister_crtc_funcs(struct drm_device *dev, int pipe);
+int rockchip_drm_encoder_get_mux_id(struct device_node *node,
+				    struct drm_encoder *encoder);
+int rockchip_drm_crtc_mode_config(struct drm_crtc *crtc, int connector_type,
+				  int out_mode);
+int rockchip_drm_dma_attach_device(struct drm_device *drm_dev,
+				   struct device *dev);
+void rockchip_drm_dma_detach_device(struct drm_device *drm_dev,
+				    struct device *dev);
+
+#endif /* _ROCKCHIP_DRM_DRV_H */
diff --git a/drivers/gpu/drm/rockchip/rockchip_drm_fb.c b/drivers/gpu/drm/rockchip/rockchip_drm_fb.c
new file mode 100644
index 000000000000..77d52893d40f
--- /dev/null
+++ b/drivers/gpu/drm/rockchip/rockchip_drm_fb.c
@@ -0,0 +1,201 @@
+/*
+ * Copyright (C) Fuzhou Rockchip Electronics Co.Ltd
+ * Author:Mark Yao <mark.yao@rock-chips.com>
+ *
+ * This software is licensed under the terms of the GNU General Public
+ * License version 2, as published by the Free Software Foundation, and
+ * may be copied, distributed, and modified under those terms.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ */
+
+#include <linux/kernel.h>
+#include <drm/drm.h>
+#include <drm/drmP.h>
+#include <drm/drm_fb_helper.h>
+#include <drm/drm_crtc_helper.h>
+
+#include "rockchip_drm_drv.h"
+#include "rockchip_drm_gem.h"
+
+#define to_rockchip_fb(x) container_of(x, struct rockchip_drm_fb, fb)
+
+struct rockchip_drm_fb {
+	struct drm_framebuffer fb;
+	struct drm_gem_object *obj[ROCKCHIP_MAX_FB_BUFFER];
+};
+
+struct drm_gem_object *rockchip_fb_get_gem_obj(struct drm_framebuffer *fb,
+					       unsigned int plane)
+{
+	struct rockchip_drm_fb *rk_fb = to_rockchip_fb(fb);
+
+	if (plane >= ROCKCHIP_MAX_FB_BUFFER)
+		return NULL;
+
+	return rk_fb->obj[plane];
+}
+EXPORT_SYMBOL_GPL(rockchip_fb_get_gem_obj);
+
+static void rockchip_drm_fb_destroy(struct drm_framebuffer *fb)
+{
+	struct rockchip_drm_fb *rockchip_fb = to_rockchip_fb(fb);
+	struct drm_gem_object *obj;
+	int i;
+
+	for (i = 0; i < ROCKCHIP_MAX_FB_BUFFER; i++) {
+		obj = rockchip_fb->obj[i];
+		if (obj)
+			drm_gem_object_unreference_unlocked(obj);
+	}
+
+	drm_framebuffer_cleanup(fb);
+	kfree(rockchip_fb);
+}
+
+static int rockchip_drm_fb_create_handle(struct drm_framebuffer *fb,
+					 struct drm_file *file_priv,
+					 unsigned int *handle)
+{
+	struct rockchip_drm_fb *rockchip_fb = to_rockchip_fb(fb);
+
+	return drm_gem_handle_create(file_priv,
+				     rockchip_fb->obj[0], handle);
+}
+
+static struct drm_framebuffer_funcs rockchip_drm_fb_funcs = {
+	.destroy	= rockchip_drm_fb_destroy,
+	.create_handle	= rockchip_drm_fb_create_handle,
+};
+
+static struct rockchip_drm_fb *
+rockchip_fb_alloc(struct drm_device *dev, struct drm_mode_fb_cmd2 *mode_cmd,
+		  struct drm_gem_object **obj, unsigned int num_planes)
+{
+	struct rockchip_drm_fb *rockchip_fb;
+	int ret;
+	int i;
+
+	rockchip_fb = kzalloc(sizeof(*rockchip_fb), GFP_KERNEL);
+	if (!rockchip_fb)
+		return ERR_PTR(-ENOMEM);
+
+	drm_helper_mode_fill_fb_struct(&rockchip_fb->fb, mode_cmd);
+
+	for (i = 0; i < num_planes; i++)
+		rockchip_fb->obj[i] = obj[i];
+
+	ret = drm_framebuffer_init(dev, &rockchip_fb->fb,
+				   &rockchip_drm_fb_funcs);
+	if (ret) {
+		dev_err(dev->dev, "Failed to initialize framebuffer: %d\n",
+			ret);
+		kfree(rockchip_fb);
+		return ERR_PTR(ret);
+	}
+
+	return rockchip_fb;
+}
+
+static struct drm_framebuffer *
+rockchip_user_fb_create(struct drm_device *dev, struct drm_file *file_priv,
+			struct drm_mode_fb_cmd2 *mode_cmd)
+{
+	struct rockchip_drm_fb *rockchip_fb;
+	struct drm_gem_object *objs[ROCKCHIP_MAX_FB_BUFFER];
+	struct drm_gem_object *obj;
+	unsigned int hsub;
+	unsigned int vsub;
+	int num_planes;
+	int ret;
+	int i;
+
+	hsub = drm_format_horz_chroma_subsampling(mode_cmd->pixel_format);
+	vsub = drm_format_vert_chroma_subsampling(mode_cmd->pixel_format);
+	num_planes = min(drm_format_num_planes(mode_cmd->pixel_format),
+			 ROCKCHIP_MAX_FB_BUFFER);
+
+	for (i = 0; i < num_planes; i++) {
+		unsigned int width = mode_cmd->width / (i ? hsub : 1);
+		unsigned int height = mode_cmd->height / (i ? vsub : 1);
+		unsigned int min_size;
+
+		obj = drm_gem_object_lookup(dev, file_priv,
+					    mode_cmd->handles[i]);
+		if (!obj) {
+			dev_err(dev->dev, "Failed to lookup GEM object\n");
+			ret = -ENXIO;
+			goto err_gem_object_unreference;
+		}
+
+		min_size = (height - 1) * mode_cmd->pitches[i] +
+			mode_cmd->offsets[i] +
+			width * drm_format_plane_cpp(mode_cmd->pixel_format, i);
+
+		if (obj->size < min_size) {
+			drm_gem_object_unreference_unlocked(obj);
+			ret = -EINVAL;
+			goto err_gem_object_unreference;
+		}
+		objs[i] = obj;
+	}
+
+	rockchip_fb = rockchip_fb_alloc(dev, mode_cmd, objs, i);
+	if (IS_ERR(rockchip_fb)) {
+		ret = PTR_ERR(rockchip_fb);
+		goto err_gem_object_unreference;
+	}
+
+	return &rockchip_fb->fb;
+
+err_gem_object_unreference:
+	for (i--; i >= 0; i--)
+		drm_gem_object_unreference_unlocked(objs[i]);
+	return ERR_PTR(ret);
+}
+
+static void rockchip_drm_output_poll_changed(struct drm_device *dev)
+{
+	struct rockchip_drm_private *private = dev->dev_private;
+	struct drm_fb_helper *fb_helper = &private->fbdev_helper;
+
+	drm_fb_helper_hotplug_event(fb_helper);
+}
+
+static const struct drm_mode_config_funcs rockchip_drm_mode_config_funcs = {
+	.fb_create = rockchip_user_fb_create,
+	.output_poll_changed = rockchip_drm_output_poll_changed,
+};
+
+struct drm_framebuffer *
+rockchip_drm_framebuffer_init(struct drm_device *dev,
+			      struct drm_mode_fb_cmd2 *mode_cmd,
+			      struct drm_gem_object *obj)
+{
+	struct rockchip_drm_fb *rockchip_fb;
+
+	rockchip_fb = rockchip_fb_alloc(dev, mode_cmd, &obj, 1);
+	if (IS_ERR(rockchip_fb))
+		return NULL;
+
+	return &rockchip_fb->fb;
+}
+
+void rockchip_drm_mode_config_init(struct drm_device *dev)
+{
+	dev->mode_config.min_width = 0;
+	dev->mode_config.min_height = 0;
+
+	/*
+	 * Set the maximum width and height to the default (4096x4096).
+	 * These limits are used to check the framebuffer size in
+	 * drm_mode_addfb().
+	 */
+	dev->mode_config.max_width = 4096;
+	dev->mode_config.max_height = 4096;
+
+	dev->mode_config.funcs = &rockchip_drm_mode_config_funcs;
+}
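Review note: rockchip_user_fb_create() rejects a plane whose GEM object is smaller than (height - 1) * pitch + offset + width * cpp, that is, every full line but the last, plus the bytes actually used on the last line. The sketch below works through that formula in userspace. It deliberately widens the result to 64 bits, which the driver code does not do, and it assumes height is at least 1. demo_plane_min_size() and demo_plane_fits() are hypothetical names.

```c
#include <stdint.h>

/*
 * Smallest buffer that can hold one plane: all lines but the last occupy a
 * full pitch, the last line only needs width * cpp bytes.
 * Assumption: height >= 1. The result is 64-bit to sidestep overflow.
 */
static uint64_t demo_plane_min_size(uint32_t width, uint32_t height,
				    uint32_t pitch, uint32_t offset,
				    uint32_t cpp)
{
	return (uint64_t)(height - 1) * pitch + offset + (uint64_t)width * cpp;
}

/* Returns 1 if an object of obj_size bytes is large enough for the plane. */
static int demo_plane_fits(uint64_t obj_size, uint32_t width, uint32_t height,
			   uint32_t pitch, uint32_t offset, uint32_t cpp)
{
	return obj_size >= demo_plane_min_size(width, height, pitch, offset, cpp);
}
```

For a 1920x1080 plane with 4 bytes per pixel, a 7680-byte pitch and no offset, the minimum is 1079 * 7680 + 7680 = 8294400 bytes. For a plane subsampled by 2 in both directions (960x540, 2 bytes per sample, 1920-byte pitch) it is 1036800 bytes.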
diff --git a/drivers/gpu/drm/rockchip/rockchip_drm_fb.h b/drivers/gpu/drm/rockchip/rockchip_drm_fb.h
new file mode 100644
index 000000000000..09574d48226f
--- /dev/null
+++ b/drivers/gpu/drm/rockchip/rockchip_drm_fb.h
@@ -0,0 +1,28 @@
+/*
+ * Copyright (C) Fuzhou Rockchip Electronics Co.Ltd
+ * Author:Mark Yao <mark.yao@rock-chips.com>
+ *
+ * This software is licensed under the terms of the GNU General Public
+ * License version 2, as published by the Free Software Foundation, and
+ * may be copied, distributed, and modified under those terms.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ */
+
+#ifndef _ROCKCHIP_DRM_FB_H
+#define _ROCKCHIP_DRM_FB_H
+
+struct drm_framebuffer *
+rockchip_drm_framebuffer_init(struct drm_device *dev,
+			      struct drm_mode_fb_cmd2 *mode_cmd,
+			      struct drm_gem_object *obj);
+void rockchip_drm_framebuffer_fini(struct drm_framebuffer *fb);
+
+void rockchip_drm_mode_config_init(struct drm_device *dev);
+
+struct drm_gem_object *rockchip_fb_get_gem_obj(struct drm_framebuffer *fb,
+					       unsigned int plane);
+#endif /* _ROCKCHIP_DRM_FB_H */
diff --git a/drivers/gpu/drm/rockchip/rockchip_drm_fbdev.c b/drivers/gpu/drm/rockchip/rockchip_drm_fbdev.c
new file mode 100644
index 000000000000..a5d889a8716b
--- /dev/null
+++ b/drivers/gpu/drm/rockchip/rockchip_drm_fbdev.c
@@ -0,0 +1,210 @@
+/*
+ * Copyright (C) Fuzhou Rockchip Electronics Co.Ltd
+ * Author:Mark Yao <mark.yao@rock-chips.com>
+ *
+ * This software is licensed under the terms of the GNU General Public
+ * License version 2, as published by the Free Software Foundation, and
+ * may be copied, distributed, and modified under those terms.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ */
+
+#include <drm/drm.h>
+#include <drm/drmP.h>
+#include <drm/drm_fb_helper.h>
+#include <drm/drm_crtc_helper.h>
+
+#include "rockchip_drm_drv.h"
+#include "rockchip_drm_gem.h"
+#include "rockchip_drm_fb.h"
+
+#define PREFERRED_BPP		32
+#define to_drm_private(x) \
+		container_of(x, struct rockchip_drm_private, fbdev_helper)
+
+static int rockchip_fbdev_mmap(struct fb_info *info,
+			       struct vm_area_struct *vma)
+{
+	struct drm_fb_helper *helper = info->par;
+	struct rockchip_drm_private *private = to_drm_private(helper);
+
+	return rockchip_gem_mmap_buf(private->fbdev_bo, vma);
+}
+
+static struct fb_ops rockchip_drm_fbdev_ops = {
+	.owner		= THIS_MODULE,
+	.fb_mmap	= rockchip_fbdev_mmap,
+	.fb_fillrect	= cfb_fillrect,
+	.fb_copyarea	= cfb_copyarea,
+	.fb_imageblit	= cfb_imageblit,
+	.fb_check_var	= drm_fb_helper_check_var,
+	.fb_set_par	= drm_fb_helper_set_par,
+	.fb_blank	= drm_fb_helper_blank,
+	.fb_pan_display	= drm_fb_helper_pan_display,
+	.fb_setcmap	= drm_fb_helper_setcmap,
+};
+
+static int rockchip_drm_fbdev_create(struct drm_fb_helper *helper,
+				     struct drm_fb_helper_surface_size *sizes)
+{
+	struct rockchip_drm_private *private = to_drm_private(helper);
+	struct drm_mode_fb_cmd2 mode_cmd = { 0 };
+	struct drm_device *dev = helper->dev;
+	struct rockchip_gem_object *rk_obj;
+	struct drm_framebuffer *fb;
+	unsigned int bytes_per_pixel;
+	unsigned long offset;
+	struct fb_info *fbi;
+	size_t size;
+	int ret;
+
+	bytes_per_pixel = DIV_ROUND_UP(sizes->surface_bpp, 8);
+
+	mode_cmd.width = sizes->surface_width;
+	mode_cmd.height = sizes->surface_height;
+	mode_cmd.pitches[0] = sizes->surface_width * bytes_per_pixel;
+	mode_cmd.pixel_format = drm_mode_legacy_fb_format(sizes->surface_bpp,
+		sizes->surface_depth);
+
+	size = mode_cmd.pitches[0] * mode_cmd.height;
+
+	rk_obj = rockchip_gem_create_object(dev, size);
+	if (IS_ERR(rk_obj))
+		return PTR_ERR(rk_obj);
+
+	private->fbdev_bo = &rk_obj->base;
+
+	fbi = framebuffer_alloc(0, dev->dev);
+	if (!fbi) {
+		dev_err(dev->dev, "Failed to allocate framebuffer info.\n");
+		ret = -ENOMEM;
+		goto err_rockchip_gem_free_object;
+	}
+
+	helper->fb = rockchip_drm_framebuffer_init(dev, &mode_cmd,
+						   private->fbdev_bo);
+	if (IS_ERR(helper->fb)) {
+		dev_err(dev->dev, "Failed to allocate DRM framebuffer.\n");
+		ret = PTR_ERR(helper->fb);
+		goto err_framebuffer_release;
+	}
+
+	helper->fbdev = fbi;
+
+	fbi->par = helper;
+	fbi->flags = FBINFO_FLAG_DEFAULT;
+	fbi->fbops = &rockchip_drm_fbdev_ops;
+
+	ret = fb_alloc_cmap(&fbi->cmap, 256, 0);
+	if (ret) {
+		dev_err(dev->dev, "Failed to allocate color map.\n");
+		goto err_drm_framebuffer_unref;
+	}
+
+	fb = helper->fb;
+	drm_fb_helper_fill_fix(fbi, fb->pitches[0], fb->depth);
+	drm_fb_helper_fill_var(fbi, helper, fb->width, fb->height);
+
+	offset = fbi->var.xoffset * bytes_per_pixel;
+	offset += fbi->var.yoffset * fb->pitches[0];
+
+	dev->mode_config.fb_base = 0;
+	fbi->screen_base = rk_obj->kvaddr + offset;
+	fbi->screen_size = rk_obj->base.size;
+	fbi->fix.smem_len = rk_obj->base.size;
+
+	DRM_DEBUG_KMS("FB [%dx%d]-%d kvaddr=%p offset=%lu size=%zu\n",
+		      fb->width, fb->height, fb->depth, rk_obj->kvaddr,
+		      offset, size);
+	return 0;
+
+err_drm_framebuffer_unref:
+	drm_framebuffer_unreference(helper->fb);
+err_framebuffer_release:
+	framebuffer_release(fbi);
+err_rockchip_gem_free_object:
+	rockchip_gem_free_object(&rk_obj->base);
+	return ret;
+}
+
+static const struct drm_fb_helper_funcs rockchip_drm_fb_helper_funcs = {
+	.fb_probe = rockchip_drm_fbdev_create,
+};
+
+int rockchip_drm_fbdev_init(struct drm_device *dev)
+{
+	struct rockchip_drm_private *private = dev->dev_private;
+	struct drm_fb_helper *helper;
+	unsigned int num_crtc;
+	int ret;
+
+	if (!dev->mode_config.num_crtc || !dev->mode_config.num_connector)
+		return -EINVAL;
+
+	num_crtc = dev->mode_config.num_crtc;
+
+	helper = &private->fbdev_helper;
+
+	drm_fb_helper_prepare(dev, helper, &rockchip_drm_fb_helper_funcs);
+
+	ret = drm_fb_helper_init(dev, helper, num_crtc, ROCKCHIP_MAX_CONNECTOR);
+	if (ret < 0) {
+		dev_err(dev->dev, "Failed to initialize drm fb helper - %d.\n",
+			ret);
+		return ret;
+	}
+
+	ret = drm_fb_helper_single_add_all_connectors(helper);
+	if (ret < 0) {
+		dev_err(dev->dev, "Failed to add connectors - %d.\n", ret);
+		goto err_drm_fb_helper_fini;
+	}
+
+	/* disable all the possible outputs/crtcs before entering KMS mode */
+	drm_helper_disable_unused_functions(dev);
+
+	ret = drm_fb_helper_initial_config(helper, PREFERRED_BPP);
+	if (ret < 0) {
+		dev_err(dev->dev, "Failed to set initial hw config - %d.\n",
+			ret);
+		goto err_drm_fb_helper_fini;
+	}
+
+	return 0;
+
+err_drm_fb_helper_fini:
+	drm_fb_helper_fini(helper);
+	return ret;
+}
+
+void rockchip_drm_fbdev_fini(struct drm_device *dev)
+{
+	struct rockchip_drm_private *private = dev->dev_private;
+	struct drm_fb_helper *helper;
+
+	helper = &private->fbdev_helper;
+
+	if (helper->fbdev) {
+		struct fb_info *info;
+		int ret;
+
+		info = helper->fbdev;
+		ret = unregister_framebuffer(info);
+		if (ret < 0)
+			DRM_DEBUG_KMS("failed unregister_framebuffer() - %d\n",
+				      ret);
+
+		if (info->cmap.len)
+			fb_dealloc_cmap(&info->cmap);
+
+		framebuffer_release(info);
+	}
+
+	if (helper->fb)
+		drm_framebuffer_unreference(helper->fb);
+
+	drm_fb_helper_fini(helper);
+}
diff --git a/drivers/gpu/drm/rockchip/rockchip_drm_fbdev.h b/drivers/gpu/drm/rockchip/rockchip_drm_fbdev.h
new file mode 100644
index 000000000000..50432e9b5b37
--- /dev/null
+++ b/drivers/gpu/drm/rockchip/rockchip_drm_fbdev.h
@@ -0,0 +1,21 @@
+/*
+ * Copyright (C) Fuzhou Rockchip Electronics Co.Ltd
+ * Author:Mark Yao <mark.yao@rock-chips.com>
+ *
+ * This software is licensed under the terms of the GNU General Public
+ * License version 2, as published by the Free Software Foundation, and
+ * may be copied, distributed, and modified under those terms.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ */
+
+#ifndef _ROCKCHIP_DRM_FBDEV_H
+#define _ROCKCHIP_DRM_FBDEV_H
+
+int rockchip_drm_fbdev_init(struct drm_device *dev);
+void rockchip_drm_fbdev_fini(struct drm_device *dev);
+
+#endif /* _ROCKCHIP_DRM_FBDEV_H */
diff --git a/drivers/gpu/drm/rockchip/rockchip_drm_gem.c b/drivers/gpu/drm/rockchip/rockchip_drm_gem.c
new file mode 100644
index 000000000000..bc98a227dc76
--- /dev/null
+++ b/drivers/gpu/drm/rockchip/rockchip_drm_gem.c
@@ -0,0 +1,294 @@
+/*
+ * Copyright (C) Fuzhou Rockchip Electronics Co.Ltd
+ * Author:Mark Yao <mark.yao@rock-chips.com>
+ *
+ * This software is licensed under the terms of the GNU General Public
+ * License version 2, as published by the Free Software Foundation, and
+ * may be copied, distributed, and modified under those terms.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ */
+
+#include <drm/drm.h>
+#include <drm/drmP.h>
+#include <drm/drm_gem.h>
+#include <drm/drm_vma_manager.h>
+
+#include <linux/dma-attrs.h>
+
+#include "rockchip_drm_drv.h"
+#include "rockchip_drm_gem.h"
+
+static int rockchip_gem_alloc_buf(struct rockchip_gem_object *rk_obj)
+{
+	struct drm_gem_object *obj = &rk_obj->base;
+	struct drm_device *drm = obj->dev;
+
+	init_dma_attrs(&rk_obj->dma_attrs);
+	dma_set_attr(DMA_ATTR_WRITE_COMBINE, &rk_obj->dma_attrs);
+
+	/* TODO(djkurtz): Use DMA_ATTR_NO_KERNEL_MAPPING except for fbdev */
+	rk_obj->kvaddr = dma_alloc_attrs(drm->dev, obj->size,
+					 &rk_obj->dma_addr, GFP_KERNEL,
+					 &rk_obj->dma_attrs);
+	if (!rk_obj->kvaddr) {
+		/* dma_alloc_attrs() returns NULL on failure, not ERR_PTR() */
+		DRM_ERROR("failed to allocate %zu byte dma buffer\n",
+			  obj->size);
+
+		return -ENOMEM;
+	}
+
+	return 0;
+}
+
+static void rockchip_gem_free_buf(struct rockchip_gem_object *rk_obj)
+{
+	struct drm_gem_object *obj = &rk_obj->base;
+	struct drm_device *drm = obj->dev;
+
+	dma_free_attrs(drm->dev, obj->size, rk_obj->kvaddr, rk_obj->dma_addr,
+		       &rk_obj->dma_attrs);
+}
+
+int rockchip_gem_mmap_buf(struct drm_gem_object *obj,
+			  struct vm_area_struct *vma)
+{
+	struct rockchip_gem_object *rk_obj = to_rockchip_obj(obj);
+	struct drm_device *drm = obj->dev;
+	unsigned long vm_size;
+
+	vma->vm_flags |= VM_IO | VM_DONTEXPAND | VM_DONTDUMP;
+	vm_size = vma->vm_end - vma->vm_start;
+
+	if (vm_size > obj->size)
+		return -EINVAL;
+
+	return dma_mmap_attrs(drm->dev, vma, rk_obj->kvaddr, rk_obj->dma_addr,
+			     obj->size, &rk_obj->dma_attrs);
+}
+
+/* drm driver mmap file operations */
+int rockchip_gem_mmap(struct file *filp, struct vm_area_struct *vma)
+{
+	struct drm_file *priv = filp->private_data;
+	struct drm_device *dev = priv->minor->dev;
+	struct drm_gem_object *obj;
+	struct drm_vma_offset_node *node;
+	int ret;
+
+	if (drm_device_is_unplugged(dev))
+		return -ENODEV;
+
+	mutex_lock(&dev->struct_mutex);
+
+	node = drm_vma_offset_exact_lookup(dev->vma_offset_manager,
+					   vma->vm_pgoff,
+					   vma_pages(vma));
+	if (!node) {
+		mutex_unlock(&dev->struct_mutex);
+		DRM_ERROR("failed to find vma node.\n");
+		return -EINVAL;
+	} else if (!drm_vma_node_is_allowed(node, filp)) {
+		mutex_unlock(&dev->struct_mutex);
+		return -EACCES;
+	}
+
+	obj = container_of(node, struct drm_gem_object, vma_node);
+	ret = rockchip_gem_mmap_buf(obj, vma);
+
+	mutex_unlock(&dev->struct_mutex);
+
+	return ret;
+}
+
+struct rockchip_gem_object *
+	rockchip_gem_create_object(struct drm_device *drm, unsigned int size)
+{
+	struct rockchip_gem_object *rk_obj;
+	struct drm_gem_object *obj;
+	int ret;
+
+	size = round_up(size, PAGE_SIZE);
+
+	rk_obj = kzalloc(sizeof(*rk_obj), GFP_KERNEL);
+	if (!rk_obj)
+		return ERR_PTR(-ENOMEM);
+
+	obj = &rk_obj->base;
+
+	drm_gem_private_object_init(drm, obj, size);
+
+	ret = rockchip_gem_alloc_buf(rk_obj);
+	if (ret)
+		goto err_free_rk_obj;
+
+	return rk_obj;
+
+err_free_rk_obj:
+	kfree(rk_obj);
+	return ERR_PTR(ret);
+}
+
+/*
+ * rockchip_gem_free_object - (struct drm_driver)->gem_free_object callback
+ * function
+ */
+void rockchip_gem_free_object(struct drm_gem_object *obj)
+{
+	struct rockchip_gem_object *rk_obj;
+
+	drm_gem_free_mmap_offset(obj);
+
+	rk_obj = to_rockchip_obj(obj);
+
+	rockchip_gem_free_buf(rk_obj);
+
+	kfree(rk_obj);
+}
+
+/*
+ * rockchip_gem_create_with_handle - allocate an object with the given
+ * size and create a gem handle on it
+ *
+ * returns a struct rockchip_gem_object* on success or ERR_PTR values
+ * on failure.
+ */
+static struct rockchip_gem_object *
+rockchip_gem_create_with_handle(struct drm_file *file_priv,
+				struct drm_device *drm, unsigned int size,
+				unsigned int *handle)
+{
+	struct rockchip_gem_object *rk_obj;
+	struct drm_gem_object *obj;
+	int ret;
+
+	rk_obj = rockchip_gem_create_object(drm, size);
+	if (IS_ERR(rk_obj))
+		return ERR_CAST(rk_obj);
+
+	obj = &rk_obj->base;
+
+	/*
+	 * Allocate an id in the idr table where the object is registered;
+	 * the handle is the id that userspace sees.
+	 */
+	ret = drm_gem_handle_create(file_priv, obj, handle);
+	if (ret)
+		goto err_handle_create;
+
+	/* drop reference from allocate - handle holds it now. */
+	drm_gem_object_unreference_unlocked(obj);
+
+	return rk_obj;
+
+err_handle_create:
+	rockchip_gem_free_object(obj);
+
+	return ERR_PTR(ret);
+}
+
+int rockchip_gem_dumb_map_offset(struct drm_file *file_priv,
+				 struct drm_device *dev, uint32_t handle,
+				 uint64_t *offset)
+{
+	struct drm_gem_object *obj;
+	int ret;
+
+	mutex_lock(&dev->struct_mutex);
+
+	obj = drm_gem_object_lookup(dev, file_priv, handle);
+	if (!obj) {
+		DRM_ERROR("failed to lookup gem object.\n");
+		ret = -EINVAL;
+		goto unlock;
+	}
+
+	ret = drm_gem_create_mmap_offset(obj);
+	if (ret)
+		goto out;
+
+	*offset = drm_vma_node_offset_addr(&obj->vma_node);
+	DRM_DEBUG_KMS("offset = 0x%llx\n", *offset);
+
+out:
+	drm_gem_object_unreference(obj);
+unlock:
+	mutex_unlock(&dev->struct_mutex);
+	return ret;
+}
+
+/*
+ * rockchip_gem_dumb_create - (struct drm_driver)->dumb_create callback
+ * function
+ *
+ * This aligns the pitch and size arguments to the minimum required. Wrap
+ * this into your own function if you need bigger alignment.
+ */
+int rockchip_gem_dumb_create(struct drm_file *file_priv,
+			     struct drm_device *dev,
+			     struct drm_mode_create_dumb *args)
+{
+	struct rockchip_gem_object *rk_obj;
+	int min_pitch = DIV_ROUND_UP(args->width * args->bpp, 8);
+
+	/*
+	 * align to 64 bytes since Mali requires it.
+	 */
+	min_pitch = ALIGN(min_pitch, 64);
+
+	if (args->pitch < min_pitch)
+		args->pitch = min_pitch;
+
+	if (args->size < args->pitch * args->height)
+		args->size = args->pitch * args->height;
+
+	rk_obj = rockchip_gem_create_with_handle(file_priv, dev, args->size,
+						 &args->handle);
+
+	return PTR_ERR_OR_ZERO(rk_obj);
+}
+
+/*
+ * Allocate a sg_table for this GEM object.
+ * Note: Both the table's contents, and the sg_table itself must be freed by
+ *       the caller.
+ * Returns a pointer to the newly allocated sg_table, or an ERR_PTR() error.
+ */
+struct sg_table *rockchip_gem_prime_get_sg_table(struct drm_gem_object *obj)
+{
+	struct rockchip_gem_object *rk_obj = to_rockchip_obj(obj);
+	struct drm_device *drm = obj->dev;
+	struct sg_table *sgt;
+	int ret;
+
+	sgt = kzalloc(sizeof(*sgt), GFP_KERNEL);
+	if (!sgt)
+		return ERR_PTR(-ENOMEM);
+
+	ret = dma_get_sgtable_attrs(drm->dev, sgt, rk_obj->kvaddr,
+				    rk_obj->dma_addr, obj->size,
+				    &rk_obj->dma_attrs);
+	if (ret) {
+		DRM_ERROR("failed to allocate sgt, %d\n", ret);
+		kfree(sgt);
+		return ERR_PTR(ret);
+	}
+
+	return sgt;
+}
+
+void *rockchip_gem_prime_vmap(struct drm_gem_object *obj)
+{
+	struct rockchip_gem_object *rk_obj = to_rockchip_obj(obj);
+
+	return rk_obj->kvaddr;
+}
+
+void rockchip_gem_prime_vunmap(struct drm_gem_object *obj, void *vaddr)
+{
+	/* Nothing to do */
+}
diff --git a/drivers/gpu/drm/rockchip/rockchip_drm_gem.h b/drivers/gpu/drm/rockchip/rockchip_drm_gem.h
new file mode 100644
index 000000000000..67bcebe90003
--- /dev/null
+++ b/drivers/gpu/drm/rockchip/rockchip_drm_gem.h
@@ -0,0 +1,54 @@
+/*
+ * Copyright (C) Fuzhou Rockchip Electronics Co.Ltd
+ * Author:Mark Yao <mark.yao@rock-chips.com>
+ *
+ * This software is licensed under the terms of the GNU General Public
+ * License version 2, as published by the Free Software Foundation, and
+ * may be copied, distributed, and modified under those terms.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ */
+
+#ifndef _ROCKCHIP_DRM_GEM_H
+#define _ROCKCHIP_DRM_GEM_H
+
+#define to_rockchip_obj(x) container_of(x, struct rockchip_gem_object, base)
+
+struct rockchip_gem_object {
+	struct drm_gem_object base;
+	unsigned int flags;
+
+	void *kvaddr;
+	dma_addr_t dma_addr;
+	struct dma_attrs dma_attrs;
+};
+
+struct sg_table *rockchip_gem_prime_get_sg_table(struct drm_gem_object *obj);
+struct drm_gem_object *
+rockchip_gem_prime_import_sg_table(struct drm_device *dev, size_t size,
+				   struct sg_table *sgt);
+void *rockchip_gem_prime_vmap(struct drm_gem_object *obj);
+void rockchip_gem_prime_vunmap(struct drm_gem_object *obj, void *vaddr);
+
+/* drm driver mmap file operations */
+int rockchip_gem_mmap(struct file *filp, struct vm_area_struct *vma);
+
+/* mmap a gem object to userspace. */
+int rockchip_gem_mmap_buf(struct drm_gem_object *obj,
+			  struct vm_area_struct *vma);
+
+struct rockchip_gem_object *
+	rockchip_gem_create_object(struct drm_device *drm, unsigned int size);
+
+void rockchip_gem_free_object(struct drm_gem_object *obj);
+
+int rockchip_gem_dumb_create(struct drm_file *file_priv,
+			     struct drm_device *dev,
+			     struct drm_mode_create_dumb *args);
+int rockchip_gem_dumb_map_offset(struct drm_file *file_priv,
+				 struct drm_device *dev, uint32_t handle,
+				 uint64_t *offset);
+#endif /* _ROCKCHIP_DRM_GEM_H */
diff --git a/drivers/gpu/drm/rockchip/rockchip_drm_vop.c b/drivers/gpu/drm/rockchip/rockchip_drm_vop.c
new file mode 100644
index 000000000000..e7ca25b3fb38
--- /dev/null
+++ b/drivers/gpu/drm/rockchip/rockchip_drm_vop.c
@@ -0,0 +1,1455 @@
+/*
+ * Copyright (C) Fuzhou Rockchip Electronics Co.Ltd
+ * Author:Mark Yao <mark.yao@rock-chips.com>
+ *
+ * This software is licensed under the terms of the GNU General Public
+ * License version 2, as published by the Free Software Foundation, and
+ * may be copied, distributed, and modified under those terms.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ */
+
+#include <drm/drm.h>
+#include <drm/drmP.h>
+#include <drm/drm_crtc.h>
+#include <drm/drm_crtc_helper.h>
+#include <drm/drm_plane_helper.h>
+
+#include <linux/kernel.h>
+#include <linux/platform_device.h>
+#include <linux/clk.h>
+#include <linux/of.h>
+#include <linux/of_device.h>
+#include <linux/pm_runtime.h>
+#include <linux/component.h>
+
+#include <linux/reset.h>
+#include <linux/delay.h>
+
+#include "rockchip_drm_drv.h"
+#include "rockchip_drm_gem.h"
+#include "rockchip_drm_fb.h"
+#include "rockchip_drm_vop.h"
+
+#define VOP_REG(off, _mask, s) \
+		{.offset = off, \
+		 .mask = _mask, \
+		 .shift = s,}
+
+#define __REG_SET_RELAXED(x, off, mask, shift, v) \
+		vop_mask_write_relaxed(x, off, (mask) << shift, (v) << shift)
+#define __REG_SET_NORMAL(x, off, mask, shift, v) \
+		vop_mask_write(x, off, (mask) << shift, (v) << shift)
+
+#define REG_SET(x, base, reg, v, mode) \
+		__REG_SET_##mode(x, base + reg.offset, reg.mask, reg.shift, v)
+
+#define VOP_WIN_SET(x, win, name, v) \
+		REG_SET(x, win->base, win->phy->name, v, RELAXED)
+#define VOP_CTRL_SET(x, name, v) \
+		REG_SET(x, 0, (x)->data->ctrl->name, v, NORMAL)
+
+#define VOP_WIN_GET(x, win, name) \
+		vop_read_reg(x, win->base, &win->phy->name)
+
+#define VOP_WIN_GET_YRGBADDR(vop, win) \
+		vop_readl(vop, win->base + win->phy->yrgb_mst.offset)
+
+#define to_vop(x) container_of(x, struct vop, crtc)
+#define to_vop_win(x) container_of(x, struct vop_win, base)
+
+struct vop_win_state {
+	struct list_head head;
+	struct drm_framebuffer *fb;
+	dma_addr_t yrgb_mst;
+	struct drm_pending_vblank_event *event;
+};
+
+struct vop_win {
+	struct drm_plane base;
+	const struct vop_win_data *data;
+	struct vop *vop;
+
+	struct list_head pending;
+	struct vop_win_state *active;
+};
+
+struct vop {
+	struct drm_crtc crtc;
+	struct device *dev;
+	struct drm_device *drm_dev;
+	unsigned int dpms;
+
+	int connector_type;
+	int connector_out_mode;
+
+	/* protects vsync work state */
+	struct mutex vsync_mutex;
+	bool vsync_work_pending;
+
+	const struct vop_data *data;
+
+	uint32_t *regsbak;
+	void __iomem *regs;
+
+	/* physical map length of vop register */
+	uint32_t len;
+
+	/* only one process may configure the registers at a time */
+	spinlock_t reg_lock;
+	/* lock vop irq reg */
+	spinlock_t irq_lock;
+
+	unsigned int irq;
+
+	/* vop AHB clk */
+	struct clk *hclk;
+	/* vop dclk */
+	struct clk *dclk;
+	/* vop share memory frequency */
+	struct clk *aclk;
+
+	/* vop dclk reset */
+	struct reset_control *dclk_rst;
+
+	int pipe;
+
+	struct vop_win win[];
+};
+
+enum vop_data_format {
+	VOP_FMT_ARGB8888 = 0,
+	VOP_FMT_RGB888,
+	VOP_FMT_RGB565,
+	VOP_FMT_YUV420SP = 4,
+	VOP_FMT_YUV422SP,
+	VOP_FMT_YUV444SP,
+};
+
+struct vop_reg_data {
+	uint32_t offset;
+	uint32_t value;
+};
+
+struct vop_reg {
+	uint32_t offset;
+	uint32_t shift;
+	uint32_t mask;
+};
+
+struct vop_ctrl {
+	struct vop_reg standby;
+	struct vop_reg data_blank;
+	struct vop_reg gate_en;
+	struct vop_reg mmu_en;
+	struct vop_reg rgb_en;
+	struct vop_reg edp_en;
+	struct vop_reg hdmi_en;
+	struct vop_reg mipi_en;
+	struct vop_reg out_mode;
+	struct vop_reg dither_down;
+	struct vop_reg dither_up;
+	struct vop_reg pin_pol;
+
+	struct vop_reg htotal_pw;
+	struct vop_reg hact_st_end;
+	struct vop_reg vtotal_pw;
+	struct vop_reg vact_st_end;
+	struct vop_reg hpost_st_end;
+	struct vop_reg vpost_st_end;
+};
+
+struct vop_win_phy {
+	const uint32_t *data_formats;
+	uint32_t nformats;
+
+	struct vop_reg enable;
+	struct vop_reg format;
+	struct vop_reg act_info;
+	struct vop_reg dsp_info;
+	struct vop_reg dsp_st;
+	struct vop_reg yrgb_mst;
+	struct vop_reg uv_mst;
+	struct vop_reg yrgb_vir;
+	struct vop_reg uv_vir;
+
+	struct vop_reg dst_alpha_ctl;
+	struct vop_reg src_alpha_ctl;
+};
+
+struct vop_win_data {
+	uint32_t base;
+	const struct vop_win_phy *phy;
+	enum drm_plane_type type;
+};
+
+struct vop_data {
+	const struct vop_reg_data *init_table;
+	unsigned int table_size;
+	const struct vop_ctrl *ctrl;
+	const struct vop_win_data *win;
+	unsigned int win_size;
+};
+
+static const uint32_t formats_01[] = {
+	DRM_FORMAT_XRGB8888,
+	DRM_FORMAT_ARGB8888,
+	DRM_FORMAT_RGB888,
+	DRM_FORMAT_RGB565,
+	DRM_FORMAT_NV12,
+	DRM_FORMAT_NV16,
+	DRM_FORMAT_NV24,
+};
+
+static const uint32_t formats_234[] = {
+	DRM_FORMAT_XRGB8888,
+	DRM_FORMAT_ARGB8888,
+	DRM_FORMAT_RGB888,
+	DRM_FORMAT_RGB565,
+};
+
+static const struct vop_win_phy win01_data = {
+	.data_formats = formats_01,
+	.nformats = ARRAY_SIZE(formats_01),
+	.enable = VOP_REG(WIN0_CTRL0, 0x1, 0),
+	.format = VOP_REG(WIN0_CTRL0, 0x7, 1),
+	.act_info = VOP_REG(WIN0_ACT_INFO, 0x1fff1fff, 0),
+	.dsp_info = VOP_REG(WIN0_DSP_INFO, 0x0fff0fff, 0),
+	.dsp_st = VOP_REG(WIN0_DSP_ST, 0x1fff1fff, 0),
+	.yrgb_mst = VOP_REG(WIN0_YRGB_MST, 0xffffffff, 0),
+	.uv_mst = VOP_REG(WIN0_CBR_MST, 0xffffffff, 0),
+	.yrgb_vir = VOP_REG(WIN0_VIR, 0x3fff, 0),
+	.uv_vir = VOP_REG(WIN0_VIR, 0x3fff, 16),
+	.src_alpha_ctl = VOP_REG(WIN0_SRC_ALPHA_CTRL, 0xff, 0),
+	.dst_alpha_ctl = VOP_REG(WIN0_DST_ALPHA_CTRL, 0xff, 0),
+};
+
+static const struct vop_win_phy win23_data = {
+	.data_formats = formats_234,
+	.nformats = ARRAY_SIZE(formats_234),
+	.enable = VOP_REG(WIN2_CTRL0, 0x1, 0),
+	.format = VOP_REG(WIN2_CTRL0, 0x7, 1),
+	.dsp_info = VOP_REG(WIN2_DSP_INFO0, 0x0fff0fff, 0),
+	.dsp_st = VOP_REG(WIN2_DSP_ST0, 0x1fff1fff, 0),
+	.yrgb_mst = VOP_REG(WIN2_MST0, 0xffffffff, 0),
+	.yrgb_vir = VOP_REG(WIN2_VIR0_1, 0x1fff, 0),
+	.src_alpha_ctl = VOP_REG(WIN2_SRC_ALPHA_CTRL, 0xff, 0),
+	.dst_alpha_ctl = VOP_REG(WIN2_DST_ALPHA_CTRL, 0xff, 0),
+};
+
+static const struct vop_win_phy cursor_data = {
+	.data_formats = formats_234,
+	.nformats = ARRAY_SIZE(formats_234),
+	.enable = VOP_REG(HWC_CTRL0, 0x1, 0),
+	.format = VOP_REG(HWC_CTRL0, 0x7, 1),
+	.dsp_st = VOP_REG(HWC_DSP_ST, 0x1fff1fff, 0),
+	.yrgb_mst = VOP_REG(HWC_MST, 0xffffffff, 0),
+};
+
+static const struct vop_ctrl ctrl_data = {
+	.standby = VOP_REG(SYS_CTRL, 0x1, 22),
+	.gate_en = VOP_REG(SYS_CTRL, 0x1, 23),
+	.mmu_en = VOP_REG(SYS_CTRL, 0x1, 20),
+	.rgb_en = VOP_REG(SYS_CTRL, 0x1, 12),
+	.hdmi_en = VOP_REG(SYS_CTRL, 0x1, 13),
+	.edp_en = VOP_REG(SYS_CTRL, 0x1, 14),
+	.mipi_en = VOP_REG(SYS_CTRL, 0x1, 15),
+	.dither_down = VOP_REG(DSP_CTRL1, 0xf, 1),
+	.dither_up = VOP_REG(DSP_CTRL1, 0x1, 6),
+	.data_blank = VOP_REG(DSP_CTRL0, 0x1, 19),
+	.out_mode = VOP_REG(DSP_CTRL0, 0xf, 0),
+	.pin_pol = VOP_REG(DSP_CTRL0, 0xf, 4),
+	.htotal_pw = VOP_REG(DSP_HTOTAL_HS_END, 0x1fff1fff, 0),
+	.hact_st_end = VOP_REG(DSP_HACT_ST_END, 0x1fff1fff, 0),
+	.vtotal_pw = VOP_REG(DSP_VTOTAL_VS_END, 0x1fff1fff, 0),
+	.vact_st_end = VOP_REG(DSP_VACT_ST_END, 0x1fff1fff, 0),
+	.hpost_st_end = VOP_REG(POST_DSP_HACT_INFO, 0x1fff1fff, 0),
+	.vpost_st_end = VOP_REG(POST_DSP_VACT_INFO, 0x1fff1fff, 0),
+};
+
+static const struct vop_reg_data vop_init_reg_table[] = {
+	{SYS_CTRL, 0x00c00000},
+	{DSP_CTRL0, 0x00000000},
+	{WIN0_CTRL0, 0x00000080},
+	{WIN1_CTRL0, 0x00000080},
+};
+
+/*
+ * Note: rk3288 has a dedicated 'cursor' window, however, that window requires
+ * special support to get alpha blending working.  For now, just use overlay
+ * window 1 for the drm cursor.
+ */
+static const struct vop_win_data rk3288_vop_win_data[] = {
+	{ .base = 0x00, .phy = &win01_data, .type = DRM_PLANE_TYPE_PRIMARY },
+	{ .base = 0x40, .phy = &win01_data, .type = DRM_PLANE_TYPE_CURSOR },
+	{ .base = 0x00, .phy = &win23_data, .type = DRM_PLANE_TYPE_OVERLAY },
+	{ .base = 0x50, .phy = &win23_data, .type = DRM_PLANE_TYPE_OVERLAY },
+	{ .base = 0x00, .phy = &cursor_data, .type = DRM_PLANE_TYPE_OVERLAY },
+};
+
+static const struct vop_data rk3288_vop = {
+	.init_table = vop_init_reg_table,
+	.table_size = ARRAY_SIZE(vop_init_reg_table),
+	.ctrl = &ctrl_data,
+	.win = rk3288_vop_win_data,
+	.win_size = ARRAY_SIZE(rk3288_vop_win_data),
+};
+
+static const struct of_device_id vop_driver_dt_match[] = {
+	{ .compatible = "rockchip,rk3288-vop",
+	  .data = &rk3288_vop },
+	{},
+};
+
+static inline void vop_writel(struct vop *vop, uint32_t offset, uint32_t v)
+{
+	writel(v, vop->regs + offset);
+	vop->regsbak[offset >> 2] = v;
+}
+
+static inline uint32_t vop_readl(struct vop *vop, uint32_t offset)
+{
+	return readl(vop->regs + offset);
+}
+
+static inline uint32_t vop_read_reg(struct vop *vop, uint32_t base,
+				    const struct vop_reg *reg)
+{
+	return (vop_readl(vop, base + reg->offset) >> reg->shift) & reg->mask;
+}
+
+static inline void vop_cfg_done(struct vop *vop)
+{
+	writel(0x01, vop->regs + REG_CFG_DONE);
+}
+
+static inline void vop_mask_write(struct vop *vop, uint32_t offset,
+				  uint32_t mask, uint32_t v)
+{
+	if (mask) {
+		uint32_t cached_val = vop->regsbak[offset >> 2];
+
+		cached_val = (cached_val & ~mask) | v;
+		writel(cached_val, vop->regs + offset);
+		vop->regsbak[offset >> 2] = cached_val;
+	}
+}
+
+static inline void vop_mask_write_relaxed(struct vop *vop, uint32_t offset,
+					  uint32_t mask, uint32_t v)
+{
+	if (mask) {
+		uint32_t cached_val = vop->regsbak[offset >> 2];
+
+		cached_val = (cached_val & ~mask) | v;
+		writel_relaxed(cached_val, vop->regs + offset);
+		vop->regsbak[offset >> 2] = cached_val;
+	}
+}
+
+static int vop_convert_format(uint32_t format)
+{
+	switch (format) {
+	case DRM_FORMAT_XRGB8888:
+	case DRM_FORMAT_ARGB8888:
+		return VOP_FMT_ARGB8888;
+	case DRM_FORMAT_RGB888:
+		return VOP_FMT_RGB888;
+	case DRM_FORMAT_RGB565:
+		return VOP_FMT_RGB565;
+	case DRM_FORMAT_NV12:
+		return VOP_FMT_YUV420SP;
+	case DRM_FORMAT_NV16:
+		return VOP_FMT_YUV422SP;
+	case DRM_FORMAT_NV24:
+		return VOP_FMT_YUV444SP;
+	default:
+		DRM_ERROR("unsupported format[%08x]\n", format);
+		return -EINVAL;
+	}
+}
+
+static bool is_alpha_support(uint32_t format)
+{
+	switch (format) {
+	case DRM_FORMAT_ARGB8888:
+		return true;
+	default:
+		return false;
+	}
+}
+
+static void vop_enable(struct drm_crtc *crtc)
+{
+	struct vop *vop = to_vop(crtc);
+	int ret;
+
+	ret = clk_enable(vop->hclk);
+	if (ret < 0) {
+		dev_err(vop->dev, "failed to enable hclk - %d\n", ret);
+		return;
+	}
+
+	ret = clk_enable(vop->dclk);
+	if (ret < 0) {
+		dev_err(vop->dev, "failed to enable dclk - %d\n", ret);
+		goto err_disable_hclk;
+	}
+
+	ret = clk_enable(vop->aclk);
+	if (ret < 0) {
+		dev_err(vop->dev, "failed to enable aclk - %d\n", ret);
+		goto err_disable_dclk;
+	}
+
+	/*
+	 * Slave iommu shares power, irq and clock with vop.  It was associated
+	 * automatically with this master device via common driver code.
+	 * Now that we have enabled the clock we attach it to the shared drm
+	 * mapping.
+	 */
+	ret = rockchip_drm_dma_attach_device(vop->drm_dev, vop->dev);
+	if (ret) {
+		dev_err(vop->dev, "failed to attach dma mapping, %d\n", ret);
+		goto err_disable_aclk;
+	}
+
+	spin_lock(&vop->reg_lock);
+
+	VOP_CTRL_SET(vop, standby, 0);
+
+	spin_unlock(&vop->reg_lock);
+
+	enable_irq(vop->irq);
+
+	drm_vblank_on(vop->drm_dev, vop->pipe);
+
+	return;
+
+err_disable_aclk:
+	clk_disable(vop->aclk);
+err_disable_dclk:
+	clk_disable(vop->dclk);
+err_disable_hclk:
+	clk_disable(vop->hclk);
+}
+
+static void vop_disable(struct drm_crtc *crtc)
+{
+	struct vop *vop = to_vop(crtc);
+
+	drm_vblank_off(crtc->dev, vop->pipe);
+
+	disable_irq(vop->irq);
+
+	/*
+	 * TODO: Since standby doesn't take effect until the next vblank,
+	 * when we turn off dclk below, the vop is probably still active.
+	 */
+	spin_lock(&vop->reg_lock);
+
+	VOP_CTRL_SET(vop, standby, 1);
+
+	spin_unlock(&vop->reg_lock);
+	/*
+	 * Disable dclk to stop frame scan so that we can safely detach iommu.
+	 */
+	clk_disable(vop->dclk);
+
+	rockchip_drm_dma_detach_device(vop->drm_dev, vop->dev);
+
+	clk_disable(vop->aclk);
+	clk_disable(vop->hclk);
+}
+
+/*
+ * Caller must hold vsync_mutex.
+ */
+static struct drm_framebuffer *vop_win_last_pending_fb(struct vop_win *vop_win)
+{
+	struct vop_win_state *last;
+	struct vop_win_state *active = vop_win->active;
+
+	if (list_empty(&vop_win->pending))
+		return active ? active->fb : NULL;
+
+	last = list_last_entry(&vop_win->pending, struct vop_win_state, head);
+	return last ? last->fb : NULL;
+}
+
+/*
+ * Caller must hold vsync_mutex.
+ */
+static int vop_win_queue_fb(struct vop_win *vop_win,
+			    struct drm_framebuffer *fb, dma_addr_t yrgb_mst,
+			    struct drm_pending_vblank_event *event)
+{
+	struct vop_win_state *state;
+
+	state = kzalloc(sizeof(*state), GFP_KERNEL);
+	if (!state)
+		return -ENOMEM;
+
+	state->fb = fb;
+	state->yrgb_mst = yrgb_mst;
+	state->event = event;
+
+	list_add_tail(&state->head, &vop_win->pending);
+
+	return 0;
+}
+
+static int vop_update_plane_event(struct drm_plane *plane,
+				  struct drm_crtc *crtc,
+				  struct drm_framebuffer *fb, int crtc_x,
+				  int crtc_y, unsigned int crtc_w,
+				  unsigned int crtc_h, uint32_t src_x,
+				  uint32_t src_y, uint32_t src_w,
+				  uint32_t src_h,
+				  struct drm_pending_vblank_event *event)
+{
+	struct vop_win *vop_win = to_vop_win(plane);
+	const struct vop_win_data *win = vop_win->data;
+	struct vop *vop = to_vop(crtc);
+	struct drm_gem_object *obj;
+	struct rockchip_gem_object *rk_obj;
+	unsigned long offset;
+	unsigned int actual_w;
+	unsigned int actual_h;
+	unsigned int dsp_stx;
+	unsigned int dsp_sty;
+	unsigned int y_vir_stride;
+	dma_addr_t yrgb_mst;
+	int format; /* signed: vop_convert_format() may return -EINVAL */
+	uint32_t val;
+	bool is_alpha;
+	bool visible;
+	int ret;
+	struct drm_rect dest = {
+		.x1 = crtc_x,
+		.y1 = crtc_y,
+		.x2 = crtc_x + crtc_w,
+		.y2 = crtc_y + crtc_h,
+	};
+	struct drm_rect src = {
+		/* 16.16 fixed point */
+		.x1 = src_x,
+		.y1 = src_y,
+		.x2 = src_x + src_w,
+		.y2 = src_y + src_h,
+	};
+	const struct drm_rect clip = {
+		.x2 = crtc->mode.hdisplay,
+		.y2 = crtc->mode.vdisplay,
+	};
+	bool can_position = plane->type != DRM_PLANE_TYPE_PRIMARY;
+
+	ret = drm_plane_helper_check_update(plane, crtc, fb,
+					    &src, &dest, &clip,
+					    DRM_PLANE_HELPER_NO_SCALING,
+					    DRM_PLANE_HELPER_NO_SCALING,
+					    can_position, false, &visible);
+	if (ret)
+		return ret;
+
+	if (!visible)
+		return 0;
+
+	is_alpha = is_alpha_support(fb->pixel_format);
+	format = vop_convert_format(fb->pixel_format);
+	if (format < 0)
+		return format;
+
+	obj = rockchip_fb_get_gem_obj(fb, 0);
+	if (!obj) {
+		DRM_ERROR("fail to get rockchip gem object from framebuffer\n");
+		return -EINVAL;
+	}
+
+	rk_obj = to_rockchip_obj(obj);
+
+	actual_w = (src.x2 - src.x1) >> 16;
+	actual_h = (src.y2 - src.y1) >> 16;
+	crtc_x = max(0, crtc_x);
+	crtc_y = max(0, crtc_y);
+
+	dsp_stx = crtc_x + crtc->mode.htotal - crtc->mode.hsync_start;
+	dsp_sty = crtc_y + crtc->mode.vtotal - crtc->mode.vsync_start;
+
+	offset = (src.x1 >> 16) * (fb->bits_per_pixel >> 3);
+	offset += (src.y1 >> 16) * fb->pitches[0];
+	yrgb_mst = rk_obj->dma_addr + offset;
+
+	y_vir_stride = fb->pitches[0] / (fb->bits_per_pixel >> 3);
+
+	/*
+	 * If this plane update changes the plane's framebuffer, (or more
+	 * precisely, if this update has a different framebuffer than the last
+	 * update), enqueue it so we can track when it completes.
+	 *
+	 * Only when we discover that this update has completed, can we
+	 * unreference any previous framebuffers.
+	 */
+	mutex_lock(&vop->vsync_mutex);
+	if (fb != vop_win_last_pending_fb(vop_win)) {
+		ret = drm_vblank_get(plane->dev, vop->pipe);
+		if (ret) {
+			DRM_ERROR("failed to get vblank, %d\n", ret);
+			mutex_unlock(&vop->vsync_mutex);
+			return ret;
+		}
+
+		drm_framebuffer_reference(fb);
+
+		ret = vop_win_queue_fb(vop_win, fb, yrgb_mst, event);
+		if (ret) {
+			drm_vblank_put(plane->dev, vop->pipe);
+			mutex_unlock(&vop->vsync_mutex);
+			return ret;
+		}
+
+		vop->vsync_work_pending = true;
+	}
+	mutex_unlock(&vop->vsync_mutex);
+
+	spin_lock(&vop->reg_lock);
+
+	VOP_WIN_SET(vop, win, format, format);
+	VOP_WIN_SET(vop, win, yrgb_vir, y_vir_stride);
+	VOP_WIN_SET(vop, win, yrgb_mst, yrgb_mst);
+	val = (actual_h - 1) << 16;
+	val |= (actual_w - 1) & 0xffff;
+	VOP_WIN_SET(vop, win, act_info, val);
+	VOP_WIN_SET(vop, win, dsp_info, val);
+	val = (dsp_sty - 1) << 16;
+	val |= (dsp_stx - 1) & 0xffff;
+	VOP_WIN_SET(vop, win, dsp_st, val);
+
+	if (is_alpha) {
+		VOP_WIN_SET(vop, win, dst_alpha_ctl,
+			    DST_FACTOR_M0(ALPHA_SRC_INVERSE));
+		val = SRC_ALPHA_EN(1) | SRC_COLOR_M0(ALPHA_SRC_PRE_MUL) |
+			SRC_ALPHA_M0(ALPHA_STRAIGHT) |
+			SRC_BLEND_M0(ALPHA_PER_PIX) |
+			SRC_ALPHA_CAL_M0(ALPHA_NO_SATURATION) |
+			SRC_FACTOR_M0(ALPHA_ONE);
+		VOP_WIN_SET(vop, win, src_alpha_ctl, val);
+	} else {
+		VOP_WIN_SET(vop, win, src_alpha_ctl, SRC_ALPHA_EN(0));
+	}
+
+	VOP_WIN_SET(vop, win, enable, 1);
+
+	vop_cfg_done(vop);
+	spin_unlock(&vop->reg_lock);
+
+	return 0;
+}
+
+static int vop_update_plane(struct drm_plane *plane, struct drm_crtc *crtc,
+			    struct drm_framebuffer *fb, int crtc_x, int crtc_y,
+			    unsigned int crtc_w, unsigned int crtc_h,
+			    uint32_t src_x, uint32_t src_y, uint32_t src_w,
+			    uint32_t src_h)
+{
+	return vop_update_plane_event(plane, crtc, fb, crtc_x, crtc_y, crtc_w,
+				      crtc_h, src_x, src_y, src_w, src_h,
+				      NULL);
+}
+
+static int vop_update_primary_plane(struct drm_crtc *crtc,
+				    struct drm_pending_vblank_event *event)
+{
+	unsigned int crtc_w, crtc_h;
+
+	crtc_w = crtc->primary->fb->width - crtc->x;
+	crtc_h = crtc->primary->fb->height - crtc->y;
+
+	return vop_update_plane_event(crtc->primary, crtc, crtc->primary->fb,
+				      0, 0, crtc_w, crtc_h, crtc->x << 16,
+				      crtc->y << 16, crtc_w << 16,
+				      crtc_h << 16, event);
+}
+
+static int vop_disable_plane(struct drm_plane *plane)
+{
+	struct vop_win *vop_win = to_vop_win(plane);
+	const struct vop_win_data *win = vop_win->data;
+	struct vop *vop;
+	int ret;
+
+	if (!plane->crtc)
+		return 0;
+
+	vop = to_vop(plane->crtc);
+
+	ret = drm_vblank_get(plane->dev, vop->pipe);
+	if (ret) {
+		DRM_ERROR("failed to get vblank, %d\n", ret);
+		return ret;
+	}
+
+	mutex_lock(&vop->vsync_mutex);
+
+	ret = vop_win_queue_fb(vop_win, NULL, 0, NULL);
+	if (ret) {
+		drm_vblank_put(plane->dev, vop->pipe);
+		mutex_unlock(&vop->vsync_mutex);
+		return ret;
+	}
+
+	vop->vsync_work_pending = true;
+	mutex_unlock(&vop->vsync_mutex);
+
+	spin_lock(&vop->reg_lock);
+	VOP_WIN_SET(vop, win, enable, 0);
+	vop_cfg_done(vop);
+	spin_unlock(&vop->reg_lock);
+
+	return 0;
+}
+
+static void vop_plane_destroy(struct drm_plane *plane)
+{
+	vop_disable_plane(plane);
+	drm_plane_cleanup(plane);
+}
+
+static const struct drm_plane_funcs vop_plane_funcs = {
+	.update_plane = vop_update_plane,
+	.disable_plane = vop_disable_plane,
+	.destroy = vop_plane_destroy,
+};
+
+int rockchip_drm_crtc_mode_config(struct drm_crtc *crtc,
+				  int connector_type,
+				  int out_mode)
+{
+	struct vop *vop = to_vop(crtc);
+
+	vop->connector_type = connector_type;
+	vop->connector_out_mode = out_mode;
+
+	return 0;
+}
+
+static int vop_crtc_enable_vblank(struct drm_crtc *crtc)
+{
+	struct vop *vop = to_vop(crtc);
+	unsigned long flags;
+
+	if (vop->dpms != DRM_MODE_DPMS_ON)
+		return -EPERM;
+
+	spin_lock_irqsave(&vop->irq_lock, flags);
+
+	vop_mask_write(vop, INTR_CTRL0, FS_INTR_MASK, FS_INTR_EN(1));
+
+	spin_unlock_irqrestore(&vop->irq_lock, flags);
+
+	return 0;
+}
+
+static void vop_crtc_disable_vblank(struct drm_crtc *crtc)
+{
+	struct vop *vop = to_vop(crtc);
+	unsigned long flags;
+
+	if (vop->dpms != DRM_MODE_DPMS_ON)
+		return;
+	spin_lock_irqsave(&vop->irq_lock, flags);
+	vop_mask_write(vop, INTR_CTRL0, FS_INTR_MASK, FS_INTR_EN(0));
+	spin_unlock_irqrestore(&vop->irq_lock, flags);
+}
+
+static const struct rockchip_crtc_funcs private_crtc_funcs = {
+	.enable_vblank = vop_crtc_enable_vblank,
+	.disable_vblank = vop_crtc_disable_vblank,
+};
+
+static void vop_crtc_dpms(struct drm_crtc *crtc, int mode)
+{
+	struct vop *vop = to_vop(crtc);
+
+	DRM_DEBUG_KMS("crtc[%d] mode[%d]\n", crtc->base.id, mode);
+
+	if (vop->dpms == mode) {
+		DRM_DEBUG_KMS("desired dpms mode is same as previous one.\n");
+		return;
+	}
+
+	switch (mode) {
+	case DRM_MODE_DPMS_ON:
+		vop_enable(crtc);
+		break;
+	case DRM_MODE_DPMS_STANDBY:
+	case DRM_MODE_DPMS_SUSPEND:
+	case DRM_MODE_DPMS_OFF:
+		vop_disable(crtc);
+		break;
+	default:
+		DRM_DEBUG_KMS("unspecified mode %d\n", mode);
+		break;
+	}
+
+	vop->dpms = mode;
+}
+
+static void vop_crtc_prepare(struct drm_crtc *crtc)
+{
+	vop_crtc_dpms(crtc, DRM_MODE_DPMS_ON);
+}
+
+static bool vop_crtc_mode_fixup(struct drm_crtc *crtc,
+				const struct drm_display_mode *mode,
+				struct drm_display_mode *adjusted_mode)
+{
+	if (adjusted_mode->htotal == 0 || adjusted_mode->vtotal == 0)
+		return false;
+
+	return true;
+}
+
+static int vop_crtc_mode_set_base(struct drm_crtc *crtc, int x, int y,
+				  struct drm_framebuffer *old_fb)
+{
+	int ret;
+
+	crtc->x = x;
+	crtc->y = y;
+
+	ret = vop_update_primary_plane(crtc, NULL);
+	if (ret < 0) {
+		DRM_ERROR("fail to update plane\n");
+		return ret;
+	}
+
+	return 0;
+}
+
+static int vop_crtc_mode_set(struct drm_crtc *crtc,
+			     struct drm_display_mode *mode,
+			     struct drm_display_mode *adjusted_mode,
+			     int x, int y, struct drm_framebuffer *fb)
+{
+	struct vop *vop = to_vop(crtc);
+	u16 hsync_len = adjusted_mode->hsync_end - adjusted_mode->hsync_start;
+	u16 hdisplay = adjusted_mode->hdisplay;
+	u16 htotal = adjusted_mode->htotal;
+	u16 hact_st = adjusted_mode->htotal - adjusted_mode->hsync_start;
+	u16 hact_end = hact_st + hdisplay;
+	u16 vdisplay = adjusted_mode->vdisplay;
+	u16 vtotal = adjusted_mode->vtotal;
+	u16 vsync_len = adjusted_mode->vsync_end - adjusted_mode->vsync_start;
+	u16 vact_st = adjusted_mode->vtotal - adjusted_mode->vsync_start;
+	u16 vact_end = vact_st + vdisplay;
+	int ret;
+	uint32_t val;
+
+	/*
+	 * disable dclk to stop frame scan, so that we can safely configure the
+	 * mode and enable the iommu.
+	 */
+	clk_disable(vop->dclk);
+
+	switch (vop->connector_type) {
+	case DRM_MODE_CONNECTOR_LVDS:
+		VOP_CTRL_SET(vop, rgb_en, 1);
+		break;
+	case DRM_MODE_CONNECTOR_eDP:
+		VOP_CTRL_SET(vop, edp_en, 1);
+		break;
+	case DRM_MODE_CONNECTOR_HDMIA:
+		VOP_CTRL_SET(vop, hdmi_en, 1);
+		break;
+	default:
+		DRM_ERROR("unsupported connector_type[%d]\n",
+			  vop->connector_type);
+		return -EINVAL;
+	}
+	VOP_CTRL_SET(vop, out_mode, vop->connector_out_mode);
+
+	val = 0x8;
+	val |= (adjusted_mode->flags & DRM_MODE_FLAG_NHSYNC) ? 1 : 0;
+	val |= (adjusted_mode->flags & DRM_MODE_FLAG_NVSYNC) ? (1 << 1) : 0;
+	VOP_CTRL_SET(vop, pin_pol, val);
+
+	VOP_CTRL_SET(vop, htotal_pw, (htotal << 16) | hsync_len);
+	val = hact_st << 16;
+	val |= hact_end;
+	VOP_CTRL_SET(vop, hact_st_end, val);
+	VOP_CTRL_SET(vop, hpost_st_end, val);
+
+	VOP_CTRL_SET(vop, vtotal_pw, (vtotal << 16) | vsync_len);
+	val = vact_st << 16;
+	val |= vact_end;
+	VOP_CTRL_SET(vop, vact_st_end, val);
+	VOP_CTRL_SET(vop, vpost_st_end, val);
+
+	ret = vop_crtc_mode_set_base(crtc, x, y, fb);
+	if (ret)
+		return ret;
+
+	/*
+	 * reset dclk so that all mode config takes effect and the clk runs in
+	 * the correct frame.
+	 */
+	reset_control_assert(vop->dclk_rst);
+	usleep_range(10, 20);
+	reset_control_deassert(vop->dclk_rst);
+
+	clk_set_rate(vop->dclk, adjusted_mode->clock * 1000);
+	ret = clk_enable(vop->dclk);
+	if (ret < 0) {
+		dev_err(vop->dev, "failed to enable dclk - %d\n", ret);
+		return ret;
+	}
+
+	return 0;
+}
+
+static void vop_crtc_commit(struct drm_crtc *crtc)
+{
+}
+
+static const struct drm_crtc_helper_funcs vop_crtc_helper_funcs = {
+	.dpms = vop_crtc_dpms,
+	.prepare = vop_crtc_prepare,
+	.mode_fixup = vop_crtc_mode_fixup,
+	.mode_set = vop_crtc_mode_set,
+	.mode_set_base = vop_crtc_mode_set_base,
+	.commit = vop_crtc_commit,
+};
+
+static int vop_crtc_page_flip(struct drm_crtc *crtc,
+			      struct drm_framebuffer *fb,
+			      struct drm_pending_vblank_event *event,
+			      uint32_t page_flip_flags)
+{
+	struct vop *vop = to_vop(crtc);
+	struct drm_framebuffer *old_fb = crtc->primary->fb;
+	int ret;
+
+	/* when the page flip is requested, crtc's dpms should be on */
+	if (vop->dpms > DRM_MODE_DPMS_ON) {
+		DRM_DEBUG("failed page flip request at dpms[%d].\n", vop->dpms);
+		return 0;
+	}
+
+	crtc->primary->fb = fb;
+
+	ret = vop_update_primary_plane(crtc, event);
+	if (ret)
+		crtc->primary->fb = old_fb;
+
+	return ret;
+}
+
+static void vop_win_state_complete(struct vop_win *vop_win,
+				   struct vop_win_state *state)
+{
+	struct vop *vop = vop_win->vop;
+	struct drm_crtc *crtc = &vop->crtc;
+	struct drm_device *drm = crtc->dev;
+	unsigned long flags;
+
+	if (state->event) {
+		spin_lock_irqsave(&drm->event_lock, flags);
+		drm_send_vblank_event(drm, -1, state->event);
+		spin_unlock_irqrestore(&drm->event_lock, flags);
+	}
+
+	list_del(&state->head);
+	drm_vblank_put(crtc->dev, vop->pipe);
+}
+
+static void vop_crtc_destroy(struct drm_crtc *crtc)
+{
+	drm_crtc_cleanup(crtc);
+}
+
+static const struct drm_crtc_funcs vop_crtc_funcs = {
+	.set_config = drm_crtc_helper_set_config,
+	.page_flip = vop_crtc_page_flip,
+	.destroy = vop_crtc_destroy,
+};
+
+static bool vop_win_state_is_active(struct vop_win *vop_win,
+				    struct vop_win_state *state)
+{
+	bool active = false;
+
+	if (state->fb) {
+		dma_addr_t yrgb_mst;
+
+		/* check yrgb_mst to tell if pending_fb is now front */
+		yrgb_mst = VOP_WIN_GET_YRGBADDR(vop_win->vop, vop_win->data);
+
+		active = (yrgb_mst == state->yrgb_mst);
+	} else {
+		bool enabled;
+
+		/* if enable bit is clear, plane is now disabled */
+		enabled = VOP_WIN_GET(vop_win->vop, vop_win->data, enable);
+
+		active = (enabled == 0);
+	}
+
+	return active;
+}
+
+static void vop_win_state_destroy(struct vop_win_state *state)
+{
+	struct drm_framebuffer *fb = state->fb;
+
+	if (fb)
+		drm_framebuffer_unreference(fb);
+
+	kfree(state);
+}
+
+static void vop_win_update_state(struct vop_win *vop_win)
+{
+	struct vop_win_state *state, *n, *new_active = NULL;
+
+	/* Check if any pending states are now active */
+	list_for_each_entry(state, &vop_win->pending, head)
+		if (vop_win_state_is_active(vop_win, state)) {
+			new_active = state;
+			break;
+		}
+
+	if (!new_active)
+		return;
+
+	/*
+	 * Destroy any 'skipped' pending states - states that were queued
+	 * before the newly active state.
+	 */
+	list_for_each_entry_safe(state, n, &vop_win->pending, head) {
+		if (state == new_active)
+			break;
+		vop_win_state_complete(vop_win, state);
+		vop_win_state_destroy(state);
+	}
+
+	vop_win_state_complete(vop_win, new_active);
+
+	if (vop_win->active)
+		vop_win_state_destroy(vop_win->active);
+	vop_win->active = new_active;
+}
+
+static bool vop_win_has_pending_state(struct vop_win *vop_win)
+{
+	return !list_empty(&vop_win->pending);
+}
+
+static irqreturn_t vop_isr_thread(int irq, void *data)
+{
+	struct vop *vop = data;
+	const struct vop_data *vop_data = vop->data;
+	unsigned int i;
+
+	mutex_lock(&vop->vsync_mutex);
+
+	if (!vop->vsync_work_pending)
+		goto done;
+
+	vop->vsync_work_pending = false;
+
+	for (i = 0; i < vop_data->win_size; i++) {
+		struct vop_win *vop_win = &vop->win[i];
+
+		vop_win_update_state(vop_win);
+		if (vop_win_has_pending_state(vop_win))
+			vop->vsync_work_pending = true;
+	}
+
+done:
+	mutex_unlock(&vop->vsync_mutex);
+
+	return IRQ_HANDLED;
+}
+
+static irqreturn_t vop_isr(int irq, void *data)
+{
+	struct vop *vop = data;
+	uint32_t intr0_reg, active_irqs;
+	unsigned long flags;
+
+	/*
+	 * INTR_CTRL0 register has interrupt status, enable and clear bits, we
+	 * must hold irq_lock to avoid a race with enable/disable_vblank().
+	 */
+	spin_lock_irqsave(&vop->irq_lock, flags);
+	intr0_reg = vop_readl(vop, INTR_CTRL0);
+	active_irqs = intr0_reg & INTR_MASK;
+	/* Clear all active interrupt sources */
+	if (active_irqs)
+		vop_writel(vop, INTR_CTRL0,
+			   intr0_reg | (active_irqs << INTR_CLR_SHIFT));
+	spin_unlock_irqrestore(&vop->irq_lock, flags);
+
+	/* This is expected for vop iommu irqs, since the irq is shared */
+	if (!active_irqs)
+		return IRQ_NONE;
+
+	/* Only Frame Start Interrupt is enabled; other irqs are spurious. */
+	if (!(active_irqs & FS_INTR)) {
+		DRM_ERROR("Unknown VOP IRQs: %#02x\n", active_irqs);
+		return IRQ_NONE;
+	}
+
+	drm_handle_vblank(vop->drm_dev, vop->pipe);
+
+	return (vop->vsync_work_pending) ? IRQ_WAKE_THREAD : IRQ_HANDLED;
+}
+
+static int vop_create_crtc(struct vop *vop)
+{
+	const struct vop_data *vop_data = vop->data;
+	struct device *dev = vop->dev;
+	struct drm_device *drm_dev = vop->drm_dev;
+	struct drm_plane *primary = NULL, *cursor = NULL, *plane;
+	struct drm_crtc *crtc = &vop->crtc;
+	struct device_node *port;
+	int ret;
+	int i;
+
+	/*
+	 * Create drm_plane for primary and cursor planes first, since we need
+	 * to pass them to drm_crtc_init_with_planes, which sets the
+	 * "possible_crtcs" to the newly initialized crtc.
+	 */
+	for (i = 0; i < vop_data->win_size; i++) {
+		struct vop_win *vop_win = &vop->win[i];
+		const struct vop_win_data *win_data = vop_win->data;
+
+		if (win_data->type != DRM_PLANE_TYPE_PRIMARY &&
+		    win_data->type != DRM_PLANE_TYPE_CURSOR)
+			continue;
+
+		ret = drm_universal_plane_init(vop->drm_dev, &vop_win->base,
+					       0, &vop_plane_funcs,
+					       win_data->phy->data_formats,
+					       win_data->phy->nformats,
+					       win_data->type);
+		if (ret) {
+			DRM_ERROR("failed to initialize plane\n");
+			goto err_cleanup_planes;
+		}
+
+		plane = &vop_win->base;
+		if (plane->type == DRM_PLANE_TYPE_PRIMARY)
+			primary = plane;
+		else if (plane->type == DRM_PLANE_TYPE_CURSOR)
+			cursor = plane;
+	}
+
+	ret = drm_crtc_init_with_planes(drm_dev, crtc, primary, cursor,
+					&vop_crtc_funcs);
+	if (ret)
+		return ret;
+
+	drm_crtc_helper_add(crtc, &vop_crtc_helper_funcs);
+
+	/*
+	 * Create drm_planes for overlay windows with possible_crtcs restricted
+	 * to the newly created crtc.
+	 */
+	for (i = 0; i < vop_data->win_size; i++) {
+		struct vop_win *vop_win = &vop->win[i];
+		const struct vop_win_data *win_data = vop_win->data;
+		unsigned long possible_crtcs = 1 << drm_crtc_index(crtc);
+
+		if (win_data->type != DRM_PLANE_TYPE_OVERLAY)
+			continue;
+
+		ret = drm_universal_plane_init(vop->drm_dev, &vop_win->base,
+					       possible_crtcs,
+					       &vop_plane_funcs,
+					       win_data->phy->data_formats,
+					       win_data->phy->nformats,
+					       win_data->type);
+		if (ret) {
+			DRM_ERROR("failed to initialize overlay plane\n");
+			goto err_cleanup_crtc;
+		}
+	}
+
+	port = of_get_child_by_name(dev->of_node, "port");
+	if (!port) {
+		DRM_ERROR("no port node found in %s\n",
+			  dev->of_node->full_name);
+		goto err_cleanup_crtc;
+	}
+
+	crtc->port = port;
+	vop->pipe = drm_crtc_index(crtc);
+	rockchip_register_crtc_funcs(drm_dev, &private_crtc_funcs, vop->pipe);
+
+	return 0;
+
+err_cleanup_crtc:
+	drm_crtc_cleanup(crtc);
+err_cleanup_planes:
+	list_for_each_entry(plane, &drm_dev->mode_config.plane_list, head)
+		drm_plane_cleanup(plane);
+	return ret;
+}
+
+static void vop_destroy_crtc(struct vop *vop)
+{
+	struct drm_crtc *crtc = &vop->crtc;
+
+	rockchip_unregister_crtc_funcs(vop->drm_dev, vop->pipe);
+	of_node_put(crtc->port);
+	drm_crtc_cleanup(crtc);
+}
+
+static int vop_initial(struct vop *vop)
+{
+	const struct vop_data *vop_data = vop->data;
+	const struct vop_reg_data *init_table = vop_data->init_table;
+	struct reset_control *ahb_rst;
+	int i, ret;
+
+	vop->hclk = devm_clk_get(vop->dev, "hclk_vop");
+	if (IS_ERR(vop->hclk)) {
+		dev_err(vop->dev, "failed to get hclk source\n");
+		return PTR_ERR(vop->hclk);
+	}
+	vop->aclk = devm_clk_get(vop->dev, "aclk_vop");
+	if (IS_ERR(vop->aclk)) {
+		dev_err(vop->dev, "failed to get aclk source\n");
+		return PTR_ERR(vop->aclk);
+	}
+	vop->dclk = devm_clk_get(vop->dev, "dclk_vop");
+	if (IS_ERR(vop->dclk)) {
+		dev_err(vop->dev, "failed to get dclk source\n");
+		return PTR_ERR(vop->dclk);
+	}
+
+	ret = clk_prepare(vop->hclk);
+	if (ret < 0) {
+		dev_err(vop->dev, "failed to prepare hclk\n");
+		return ret;
+	}
+
+	ret = clk_prepare(vop->dclk);
+	if (ret < 0) {
+		dev_err(vop->dev, "failed to prepare dclk\n");
+		goto err_unprepare_hclk;
+	}
+
+	ret = clk_prepare(vop->aclk);
+	if (ret < 0) {
+		dev_err(vop->dev, "failed to prepare aclk\n");
+		goto err_unprepare_dclk;
+	}
+
+	/*
+	 * enable hclk, so that we can config vop register.
+	 */
+	ret = clk_enable(vop->hclk);
+	if (ret < 0) {
+		dev_err(vop->dev, "failed to enable hclk\n");
+		goto err_unprepare_aclk;
+	}
+	/*
+	 * do hclk_reset, reset all vop registers.
+	 */
+	ahb_rst = devm_reset_control_get(vop->dev, "ahb");
+	if (IS_ERR(ahb_rst)) {
+		dev_err(vop->dev, "failed to get ahb reset\n");
+		ret = PTR_ERR(ahb_rst);
+		goto err_disable_hclk;
+	}
+	reset_control_assert(ahb_rst);
+	usleep_range(10, 20);
+	reset_control_deassert(ahb_rst);
+
+	memcpy(vop->regsbak, vop->regs, vop->len);
+
+	for (i = 0; i < vop_data->table_size; i++)
+		vop_writel(vop, init_table[i].offset, init_table[i].value);
+
+	for (i = 0; i < vop_data->win_size; i++) {
+		const struct vop_win_data *win = &vop_data->win[i];
+
+		VOP_WIN_SET(vop, win, enable, 0);
+	}
+
+	vop_cfg_done(vop);
+
+	/*
+	 * do dclk_reset, let all config take effect.
+	 */
+	vop->dclk_rst = devm_reset_control_get(vop->dev, "dclk");
+	if (IS_ERR(vop->dclk_rst)) {
+		dev_err(vop->dev, "failed to get dclk reset\n");
+		ret = PTR_ERR(vop->dclk_rst);
+		goto err_disable_hclk;
+	}
+	reset_control_assert(vop->dclk_rst);
+	usleep_range(10, 20);
+	reset_control_deassert(vop->dclk_rst);
+
+	clk_disable(vop->hclk);
+
+	vop->dpms = DRM_MODE_DPMS_OFF;
+
+	return 0;
+
+err_disable_hclk:
+	clk_disable(vop->hclk);
+err_unprepare_aclk:
+	clk_unprepare(vop->aclk);
+err_unprepare_dclk:
+	clk_unprepare(vop->dclk);
+err_unprepare_hclk:
+	clk_unprepare(vop->hclk);
+	return ret;
+}
+
+/*
+ * Initialize the vop->win array elements.
+ */
+static void vop_win_init(struct vop *vop)
+{
+	const struct vop_data *vop_data = vop->data;
+	unsigned int i;
+
+	for (i = 0; i < vop_data->win_size; i++) {
+		struct vop_win *vop_win = &vop->win[i];
+		const struct vop_win_data *win_data = &vop_data->win[i];
+
+		vop_win->data = win_data;
+		vop_win->vop = vop;
+		INIT_LIST_HEAD(&vop_win->pending);
+	}
+}
+
+static int vop_bind(struct device *dev, struct device *master, void *data)
+{
+	struct platform_device *pdev = to_platform_device(dev);
+	const struct of_device_id *of_id;
+	const struct vop_data *vop_data;
+	struct drm_device *drm_dev = data;
+	struct vop *vop;
+	struct resource *res;
+	size_t alloc_size;
+	int ret;
+
+	of_id = of_match_device(vop_driver_dt_match, dev);
+	vop_data = of_id->data;
+	if (!vop_data)
+		return -ENODEV;
+
+	/* Allocate vop struct and its vop_win array */
+	alloc_size = sizeof(*vop) + sizeof(*vop->win) * vop_data->win_size;
+	vop = devm_kzalloc(dev, alloc_size, GFP_KERNEL);
+	if (!vop)
+		return -ENOMEM;
+
+	vop->dev = dev;
+	vop->data = vop_data;
+	vop->drm_dev = drm_dev;
+	dev_set_drvdata(dev, vop);
+
+	vop_win_init(vop);
+
+	res = platform_get_resource(pdev, IORESOURCE_MEM, 0);
+	vop->len = resource_size(res);
+	vop->regs = devm_ioremap_resource(dev, res);
+	if (IS_ERR(vop->regs))
+		return PTR_ERR(vop->regs);
+
+	vop->regsbak = devm_kzalloc(dev, vop->len, GFP_KERNEL);
+	if (!vop->regsbak)
+		return -ENOMEM;
+
+	ret = vop_initial(vop);
+	if (ret < 0) {
+		dev_err(&pdev->dev, "cannot initial vop dev - err %d\n", ret);
+		return ret;
+	}
+
+	vop->irq = platform_get_irq(pdev, 0);
+	if (vop->irq < 0) {
+		dev_err(dev, "cannot find irq for vop\n");
+		return vop->irq;
+	}
+
+	spin_lock_init(&vop->reg_lock);
+	spin_lock_init(&vop->irq_lock);
+
+	mutex_init(&vop->vsync_mutex);
+
+	ret = devm_request_threaded_irq(dev, vop->irq, vop_isr, vop_isr_thread,
+					IRQF_SHARED, dev_name(dev), vop);
+	if (ret)
+		return ret;
+
+	/* IRQ is initially disabled; it gets enabled in power_on */
+	disable_irq(vop->irq);
+
+	ret = vop_create_crtc(vop);
+	if (ret)
+		return ret;
+
+	pm_runtime_enable(&pdev->dev);
+	return 0;
+}
+
+static void vop_unbind(struct device *dev, struct device *master, void *data)
+{
+	struct vop *vop = dev_get_drvdata(dev);
+
+	pm_runtime_disable(dev);
+	vop_destroy_crtc(vop);
+}
+
+static const struct component_ops vop_component_ops = {
+	.bind = vop_bind,
+	.unbind = vop_unbind,
+};
+
+static int vop_probe(struct platform_device *pdev)
+{
+	struct device *dev = &pdev->dev;
+
+	if (!dev->of_node) {
+		dev_err(dev, "can't find vop devices\n");
+		return -ENODEV;
+	}
+
+	return component_add(dev, &vop_component_ops);
+}
+
+static int vop_remove(struct platform_device *pdev)
+{
+	component_del(&pdev->dev, &vop_component_ops);
+
+	return 0;
+}
+
+struct platform_driver vop_platform_driver = {
+	.probe = vop_probe,
+	.remove = vop_remove,
+	.driver = {
+		.name = "rockchip-vop",
+		.owner = THIS_MODULE,
+		.of_match_table = of_match_ptr(vop_driver_dt_match),
+	},
+};
+
+module_platform_driver(vop_platform_driver);
+
+MODULE_AUTHOR("Mark Yao <mark.yao@rock-chips.com>");
+MODULE_DESCRIPTION("ROCKCHIP VOP Driver");
+MODULE_LICENSE("GPL v2");
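
The plane update path in vop_update_plane_event() above does two small pieces of arithmetic that are easy to misread: it turns a 16.16 fixed-point source origin into a byte offset inside the framebuffer, and it packs (height - 1, width - 1) into one 32-bit register value. The following is a minimal standalone sketch of that arithmetic, not driver code. The function names sketch_fb_offset() and sketch_pack_info() are hypothetical and exist only for illustration.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper (not part of the driver): byte offset of the first
 * scanned-out pixel. src_x1/src_y1 are 16.16 fixed point, so the integer
 * pixel coordinate is the upper 16 bits.
 */
uint32_t sketch_fb_offset(uint32_t src_x1, uint32_t src_y1,
			  uint32_t bits_per_pixel, uint32_t pitch)
{
	/* Assumption of this sketch: whole-byte pixel formats only. */
	assert(bits_per_pixel >= 8 && bits_per_pixel % 8 == 0);

	return (src_x1 >> 16) * (bits_per_pixel >> 3) +
	       (src_y1 >> 16) * pitch;
}

/*
 * Hypothetical helper (not part of the driver): pack (height - 1) into
 * bits 31:16 and (width - 1) into bits 15:0, as the patch does for the
 * act_info and dsp_info window fields.
 */
uint32_t sketch_pack_info(uint32_t width, uint32_t height)
{
	assert(width > 0 && height > 0);

	return ((height - 1) << 16) | ((width - 1) & 0xffff);
}
```

For example, a 32 bpp framebuffer with a 4096-byte pitch and a source origin of (16, 2) gives an offset of 16 * 4 + 2 * 4096 bytes, which the driver then adds to the buffer's DMA address to form yrgb_mst.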
diff --git a/drivers/gpu/drm/rockchip/rockchip_drm_vop.h b/drivers/gpu/drm/rockchip/rockchip_drm_vop.h
new file mode 100644
index 000000000000..63e9b3a084c5
--- /dev/null
+++ b/drivers/gpu/drm/rockchip/rockchip_drm_vop.h
@@ -0,0 +1,201 @@
+/*
+ * Copyright (C) Fuzhou Rockchip Electronics Co.Ltd
+ * Author:Mark Yao <mark.yao@rock-chips.com>
+ *
+ * This software is licensed under the terms of the GNU General Public
+ * License version 2, as published by the Free Software Foundation, and
+ * may be copied, distributed, and modified under those terms.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ */
+
+#ifndef _ROCKCHIP_DRM_VOP_H
+#define _ROCKCHIP_DRM_VOP_H
+
+/* register definition */
+#define REG_CFG_DONE			0x0000
+#define VERSION_INFO			0x0004
+#define SYS_CTRL			0x0008
+#define SYS_CTRL1			0x000c
+#define DSP_CTRL0			0x0010
+#define DSP_CTRL1			0x0014
+#define DSP_BG				0x0018
+#define MCU_CTRL			0x001c
+#define INTR_CTRL0			0x0020
+#define INTR_CTRL1			0x0024
+#define WIN0_CTRL0			0x0030
+#define WIN0_CTRL1			0x0034
+#define WIN0_COLOR_KEY			0x0038
+#define WIN0_VIR			0x003c
+#define WIN0_YRGB_MST			0x0040
+#define WIN0_CBR_MST			0x0044
+#define WIN0_ACT_INFO			0x0048
+#define WIN0_DSP_INFO			0x004c
+#define WIN0_DSP_ST			0x0050
+#define WIN0_SCL_FACTOR_YRGB		0x0054
+#define WIN0_SCL_FACTOR_CBR		0x0058
+#define WIN0_SCL_OFFSET			0x005c
+#define WIN0_SRC_ALPHA_CTRL		0x0060
+#define WIN0_DST_ALPHA_CTRL		0x0064
+#define WIN0_FADING_CTRL		0x0068
+/* win1 register */
+#define WIN1_CTRL0			0x0070
+#define WIN1_CTRL1			0x0074
+#define WIN1_COLOR_KEY			0x0078
+#define WIN1_VIR			0x007c
+#define WIN1_YRGB_MST			0x0080
+#define WIN1_CBR_MST			0x0084
+#define WIN1_ACT_INFO			0x0088
+#define WIN1_DSP_INFO			0x008c
+#define WIN1_DSP_ST			0x0090
+#define WIN1_SCL_FACTOR_YRGB		0x0094
+#define WIN1_SCL_FACTOR_CBR		0x0098
+#define WIN1_SCL_OFFSET			0x009c
+#define WIN1_SRC_ALPHA_CTRL		0x00a0
+#define WIN1_DST_ALPHA_CTRL		0x00a4
+#define WIN1_FADING_CTRL		0x00a8
+/* win2 register */
+#define WIN2_CTRL0			0x00b0
+#define WIN2_CTRL1			0x00b4
+#define WIN2_VIR0_1			0x00b8
+#define WIN2_VIR2_3			0x00bc
+#define WIN2_MST0			0x00c0
+#define WIN2_DSP_INFO0			0x00c4
+#define WIN2_DSP_ST0			0x00c8
+#define WIN2_COLOR_KEY			0x00cc
+#define WIN2_MST1			0x00d0
+#define WIN2_DSP_INFO1			0x00d4
+#define WIN2_DSP_ST1			0x00d8
+#define WIN2_SRC_ALPHA_CTRL		0x00dc
+#define WIN2_MST2			0x00e0
+#define WIN2_DSP_INFO2			0x00e4
+#define WIN2_DSP_ST2			0x00e8
+#define WIN2_DST_ALPHA_CTRL		0x00ec
+#define WIN2_MST3			0x00f0
+#define WIN2_DSP_INFO3			0x00f4
+#define WIN2_DSP_ST3			0x00f8
+#define WIN2_FADING_CTRL		0x00fc
+/* win3 register */
+#define WIN3_CTRL0			0x0100
+#define WIN3_CTRL1			0x0104
+#define WIN3_VIR0_1			0x0108
+#define WIN3_VIR2_3			0x010c
+#define WIN3_MST0			0x0110
+#define WIN3_DSP_INFO0			0x0114
+#define WIN3_DSP_ST0			0x0118
+#define WIN3_COLOR_KEY			0x011c
+#define WIN3_MST1			0x0120
+#define WIN3_DSP_INFO1			0x0124
+#define WIN3_DSP_ST1			0x0128
+#define WIN3_SRC_ALPHA_CTRL		0x012c
+#define WIN3_MST2			0x0130
+#define WIN3_DSP_INFO2			0x0134
+#define WIN3_DSP_ST2			0x0138
+#define WIN3_DST_ALPHA_CTRL		0x013c
+#define WIN3_MST3			0x0140
+#define WIN3_DSP_INFO3			0x0144
+#define WIN3_DSP_ST3			0x0148
+#define WIN3_FADING_CTRL		0x014c
+/* hwc register */
+#define HWC_CTRL0			0x0150
+#define HWC_CTRL1			0x0154
+#define HWC_MST				0x0158
+#define HWC_DSP_ST			0x015c
+#define HWC_SRC_ALPHA_CTRL		0x0160
+#define HWC_DST_ALPHA_CTRL		0x0164
+#define HWC_FADING_CTRL			0x0168
+/* post process register */
+#define POST_DSP_HACT_INFO		0x0170
+#define POST_DSP_VACT_INFO		0x0174
+#define POST_SCL_FACTOR_YRGB		0x0178
+#define POST_SCL_CTRL			0x0180
+#define POST_DSP_VACT_INFO_F1		0x0184
+#define DSP_HTOTAL_HS_END		0x0188
+#define DSP_HACT_ST_END			0x018c
+#define DSP_VTOTAL_VS_END		0x0190
+#define DSP_VACT_ST_END			0x0194
+#define DSP_VS_ST_END_F1		0x0198
+#define DSP_VACT_ST_END_F1		0x019c
+/* register definition end */
+
+/* interrupt define */
+#define DSP_HOLD_VALID_INTR		(1 << 0)
+#define FS_INTR				(1 << 1)
+#define LINE_FLAG_INTR			(1 << 2)
+#define BUS_ERROR_INTR			(1 << 3)
+
+#define INTR_MASK			(DSP_HOLD_VALID_INTR | FS_INTR | \
+					 LINE_FLAG_INTR | BUS_ERROR_INTR)
+
+#define DSP_HOLD_VALID_INTR_EN(x)	((x) << 4)
+#define FS_INTR_EN(x)			((x) << 5)
+#define LINE_FLAG_INTR_EN(x)		((x) << 6)
+#define BUS_ERROR_INTR_EN(x)		((x) << 7)
+#define DSP_HOLD_VALID_INTR_MASK	(1 << 4)
+#define FS_INTR_MASK			(1 << 5)
+#define LINE_FLAG_INTR_MASK		(1 << 6)
+#define BUS_ERROR_INTR_MASK		(1 << 7)
+
+#define INTR_CLR_SHIFT			8
+#define DSP_HOLD_VALID_INTR_CLR		(1 << (INTR_CLR_SHIFT + 0))
+#define FS_INTR_CLR			(1 << (INTR_CLR_SHIFT + 1))
+#define LINE_FLAG_INTR_CLR		(1 << (INTR_CLR_SHIFT + 2))
+#define BUS_ERROR_INTR_CLR		(1 << (INTR_CLR_SHIFT + 3))
+
+#define DSP_LINE_NUM(x)			(((x) & 0x1fff) << 12)
+#define DSP_LINE_NUM_MASK		(0x1fff << 12)
+
+/* src alpha ctrl define */
+#define SRC_FADING_VALUE(x)		(((x) & 0xff) << 24)
+#define SRC_GLOBAL_ALPHA(x)		(((x) & 0xff) << 16)
+#define SRC_FACTOR_M0(x)		(((x) & 0x7) << 6)
+#define SRC_ALPHA_CAL_M0(x)		(((x) & 0x1) << 5)
+#define SRC_BLEND_M0(x)			(((x) & 0x3) << 3)
+#define SRC_ALPHA_M0(x)			(((x) & 0x1) << 2)
+#define SRC_COLOR_M0(x)			(((x) & 0x1) << 1)
+#define SRC_ALPHA_EN(x)			(((x) & 0x1) << 0)
+/* dst alpha ctrl define */
+#define DST_FACTOR_M0(x)		(((x) & 0x7) << 6)
+
+/*
+ * display output interface supported by rockchip lcdc
+ */
+#define ROCKCHIP_OUT_MODE_P888	0
+#define ROCKCHIP_OUT_MODE_P666	1
+#define ROCKCHIP_OUT_MODE_P565	2
+/* for use with a special output interface */
+#define ROCKCHIP_OUT_MODE_AAAA	15
+
+enum alpha_mode {
+	ALPHA_STRAIGHT,
+	ALPHA_INVERSE,
+};
+
+enum global_blend_mode {
+	ALPHA_GLOBAL,
+	ALPHA_PER_PIX,
+	ALPHA_PER_PIX_GLOBAL,
+};
+
+enum alpha_cal_mode {
+	ALPHA_SATURATION,
+	ALPHA_NO_SATURATION,
+};
+
+enum color_mode {
+	ALPHA_SRC_PRE_MUL,
+	ALPHA_SRC_NO_PRE_MUL,
+};
+
+enum factor_mode {
+	ALPHA_ZERO,
+	ALPHA_ONE,
+	ALPHA_SRC,
+	ALPHA_SRC_INVERSE,
+	ALPHA_SRC_GLOBAL,
+};
+
+#endif /* _ROCKCHIP_DRM_VOP_H */
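
The interrupt defines in the header above imply a layout for INTR_CTRL0 in which status bits occupy bits 3:0, enable bits occupy bits 7:4, and clear bits occupy bits 11:8. vop_isr() relies on this when it acknowledges interrupts by shifting the active status bits up by INTR_CLR_SHIFT. The sketch below reproduces that computation in isolation. The macros are copied from the header; the function name sketch_intr_ack_value() is hypothetical and only illustrates the idea.

```c
#include <assert.h>
#include <stdint.h>

/* Copied from rockchip_drm_vop.h in this patch. */
#define DSP_HOLD_VALID_INTR	(1 << 0)
#define FS_INTR			(1 << 1)
#define LINE_FLAG_INTR		(1 << 2)
#define BUS_ERROR_INTR		(1 << 3)
#define INTR_MASK		(DSP_HOLD_VALID_INTR | FS_INTR | \
				 LINE_FLAG_INTR | BUS_ERROR_INTR)
#define FS_INTR_EN(x)		((x) << 5)
#define INTR_CLR_SHIFT		8

/* The clear bits sit exactly INTR_CLR_SHIFT above the status bits. */
static_assert((INTR_MASK << INTR_CLR_SHIFT) == 0xf00,
	      "clear field must be status field shifted by 8");

/*
 * Hypothetical helper (not part of the driver): given the value read from
 * INTR_CTRL0, return the value to write back so that every currently
 * active interrupt source is cleared while the enable bits are preserved.
 */
uint32_t sketch_intr_ack_value(uint32_t intr0_reg)
{
	uint32_t active_irqs = intr0_reg & INTR_MASK;

	return intr0_reg | (active_irqs << INTR_CLR_SHIFT);
}
```

With the frame start interrupt enabled and pending, the register reads FS_INTR_EN(1) | FS_INTR and the write-back value additionally sets the frame start clear bit at position 9. With nothing pending, the value is returned unchanged, which matches the IRQ_NONE path in vop_isr().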
diff --git a/drivers/gpu/drm/shmobile/shmob_drm_crtc.c b/drivers/gpu/drm/shmobile/shmob_drm_crtc.c
index 0ddce4d046d9..859ccb658601 100644
--- a/drivers/gpu/drm/shmobile/shmob_drm_crtc.c
+++ b/drivers/gpu/drm/shmobile/shmob_drm_crtc.c
@@ -19,6 +19,7 @@
 #include <drm/drm_crtc_helper.h>
 #include <drm/drm_fb_cma_helper.h>
 #include <drm/drm_gem_cma_helper.h>
+#include <drm/drm_plane_helper.h>
 
 #include <video/sh_mobile_meram.h>
 
diff --git a/drivers/gpu/drm/sti/Kconfig b/drivers/gpu/drm/sti/Kconfig
index ae8850f3e63b..d6d6b705b8c1 100644
--- a/drivers/gpu/drm/sti/Kconfig
+++ b/drivers/gpu/drm/sti/Kconfig
@@ -5,6 +5,7 @@ config DRM_STI
 	select DRM_KMS_HELPER
 	select DRM_GEM_CMA_HELPER
 	select DRM_KMS_CMA_HELPER
+	select FW_LOADER_USER_HELPER_FALLBACK
 	help
 	  Choose this option to enable DRM on STM stiH41x chipset
 
diff --git a/drivers/gpu/drm/sti/Makefile b/drivers/gpu/drm/sti/Makefile
index 04ac2ceef27f..6ba9d27c1b90 100644
--- a/drivers/gpu/drm/sti/Makefile
+++ b/drivers/gpu/drm/sti/Makefile
@@ -3,6 +3,7 @@ sticompositor-y := \
 	sti_mixer.o \
 	sti_gdp.o \
 	sti_vid.o \
+	sti_cursor.o \
 	sti_compositor.o \
 	sti_drm_crtc.o \
 	sti_drm_plane.o
@@ -18,4 +19,5 @@ obj-$(CONFIG_DRM_STI) = \
 	sti_hda.o \
 	sti_tvout.o \
 	sticompositor.o \
-	sti_drm_drv.o
\ No newline at end of file
+	sti_hqvdp.o \
+	sti_drm_drv.o
diff --git a/drivers/gpu/drm/sti/sti_compositor.c b/drivers/gpu/drm/sti/sti_compositor.c
index 9e31dfe154ed..43215d3020fb 100644
--- a/drivers/gpu/drm/sti/sti_compositor.c
+++ b/drivers/gpu/drm/sti/sti_compositor.c
@@ -24,14 +24,16 @@
  * stiH407 compositor properties
  */
 struct sti_compositor_data stih407_compositor_data = {
-	.nb_subdev = 6,
+	.nb_subdev = 8,
 	.subdev_desc = {
+			{STI_CURSOR_SUBDEV, (int)STI_CURSOR, 0x000},
 			{STI_GPD_SUBDEV, (int)STI_GDP_0, 0x100},
 			{STI_GPD_SUBDEV, (int)STI_GDP_1, 0x200},
 			{STI_GPD_SUBDEV, (int)STI_GDP_2, 0x300},
 			{STI_GPD_SUBDEV, (int)STI_GDP_3, 0x400},
 			{STI_VID_SUBDEV, (int)STI_VID_0, 0x700},
-			{STI_MIXER_MAIN_SUBDEV, STI_MIXER_MAIN, 0xC00}
+			{STI_MIXER_MAIN_SUBDEV, STI_MIXER_MAIN, 0xC00},
+			{STI_MIXER_AUX_SUBDEV, STI_MIXER_AUX, 0xD00},
 	},
 };
 
@@ -67,11 +69,11 @@ static int sti_compositor_init_subdev(struct sti_compositor *compo,
 			break;
 		case STI_GPD_SUBDEV:
 		case STI_VID_SUBDEV:
+		case STI_CURSOR_SUBDEV:
 			compo->layer[layer_id++] =
 			    sti_layer_create(compo->dev, desc[i].id,
 					     compo->regs + desc[i].offset);
 			break;
-			/* case STI_CURSOR_SUBDEV : TODO */
 		default:
 			DRM_ERROR("Unknow subdev compoment type\n");
 			return 1;
@@ -102,33 +104,35 @@ static int sti_compositor_bind(struct device *dev, struct device *master,
 			enum sti_layer_type type = desc & STI_LAYER_TYPE_MASK;
 			enum drm_plane_type plane_type = DRM_PLANE_TYPE_OVERLAY;
 
-			if (compo->mixer[crtc])
+			if (crtc < compo->nb_mixers)
 				plane_type = DRM_PLANE_TYPE_PRIMARY;
 
 			switch (type) {
 			case STI_CUR:
 				cursor = sti_drm_plane_init(drm_dev,
 						compo->layer[i],
-						(1 << crtc) - 1,
-						DRM_PLANE_TYPE_CURSOR);
+						1, DRM_PLANE_TYPE_CURSOR);
 				break;
 			case STI_GDP:
 			case STI_VID:
 				primary = sti_drm_plane_init(drm_dev,
 						compo->layer[i],
-						(1 << crtc) - 1, plane_type);
+						(1 << compo->nb_mixers) - 1,
+						plane_type);
 				plane++;
 				break;
 			case STI_BCK:
+			case STI_VDP:
 				break;
 			}
 
 			/* The first planes are reserved for primary planes*/
-			if (compo->mixer[crtc]) {
+			if (crtc < compo->nb_mixers && primary) {
 				sti_drm_crtc_init(drm_dev, compo->mixer[crtc],
 						primary, cursor);
 				crtc++;
 				cursor = NULL;
+				primary = NULL;
 			}
 		}
 	}
diff --git a/drivers/gpu/drm/sti/sti_compositor.h b/drivers/gpu/drm/sti/sti_compositor.h
index 3ea19db72e0f..019eb44c62cc 100644
--- a/drivers/gpu/drm/sti/sti_compositor.h
+++ b/drivers/gpu/drm/sti/sti_compositor.h
@@ -64,7 +64,6 @@ struct sti_compositor_data {
  * @layer: array of layers
  * @nb_mixers: number of mixers for this compositor
  * @nb_layers: number of layers (GDP,VID,...) for this compositor
- * @enable: true if compositor is enable else false
  * @vtg_vblank_nb: callback for VTG VSYNC notification
  */
 struct sti_compositor {
@@ -83,7 +82,6 @@ struct sti_compositor {
 	struct sti_layer *layer[STI_MAX_LAYER];
 	int nb_mixers;
 	int nb_layers;
-	bool enable;
 	struct notifier_block vtg_vblank_nb;
 };
 
diff --git a/drivers/gpu/drm/sti/sti_cursor.c b/drivers/gpu/drm/sti/sti_cursor.c
new file mode 100644
index 000000000000..010eaee60bf7
--- /dev/null
+++ b/drivers/gpu/drm/sti/sti_cursor.c
@@ -0,0 +1,242 @@
+/*
+ * Copyright (C) STMicroelectronics SA 2014
+ * Authors: Vincent Abriou <vincent.abriou@st.com>
+ *          Fabien Dessenne <fabien.dessenne@st.com>
+ *          for STMicroelectronics.
+ * License terms:  GNU General Public License (GPL), version 2
+ */
+#include <drm/drmP.h>
+
+#include "sti_cursor.h"
+#include "sti_layer.h"
+#include "sti_vtg.h"
+
+/* Registers */
+#define CUR_CTL             0x00
+#define CUR_VPO             0x0C
+#define CUR_PML             0x14
+#define CUR_PMP             0x18
+#define CUR_SIZE            0x1C
+#define CUR_CML             0x20
+#define CUR_AWS             0x28
+#define CUR_AWE             0x2C
+
+#define CUR_CTL_CLUT_UPDATE BIT(1)
+
+#define STI_CURS_MIN_SIZE   1
+#define STI_CURS_MAX_SIZE   128
+
+/*
+ * pixmap dma buffer structure
+ *
+ * @paddr:  physical address
+ * @size:   buffer size
+ * @base:   virtual address
+ */
+struct dma_pixmap {
+	dma_addr_t paddr;
+	size_t size;
+	void *base;
+};
+
+/**
+ * STI Cursor structure
+ *
+ * @layer:      layer structure
+ * @width:      cursor width
+ * @height:     cursor height
+ * @clut:       color look up table
+ * @clut_paddr: color look up table physical address
+ * @pixmap:     pixmap dma buffer (clut8-format cursor)
+ */
+struct sti_cursor {
+	struct sti_layer layer;
+	unsigned int width;
+	unsigned int height;
+	unsigned short *clut;
+	dma_addr_t clut_paddr;
+	struct dma_pixmap pixmap;
+};
+
+static const uint32_t cursor_supported_formats[] = {
+	DRM_FORMAT_ARGB8888,
+};
+
+#define to_sti_cursor(x) container_of(x, struct sti_cursor, layer)
+
+static const uint32_t *sti_cursor_get_formats(struct sti_layer *layer)
+{
+	return cursor_supported_formats;
+}
+
+static unsigned int sti_cursor_get_nb_formats(struct sti_layer *layer)
+{
+	return ARRAY_SIZE(cursor_supported_formats);
+}
+
+static void sti_cursor_argb8888_to_clut8(struct sti_layer *layer)
+{
+	struct sti_cursor *cursor = to_sti_cursor(layer);
+	u32 *src = layer->vaddr;
+	u8  *dst = cursor->pixmap.base;
+	unsigned int i, j;
+	u32 a, r, g, b;
+
+	for (i = 0; i < cursor->height; i++) {
+		for (j = 0; j < cursor->width; j++) {
+			/* Pick the 2 most significant bits of each component */
+			a = (*src >> 30) & 3;
+			r = (*src >> 22) & 3;
+			g = (*src >> 14) & 3;
+			b = (*src >> 6) & 3;
+			*dst = a << 6 | r << 4 | g << 2 | b;
+			src++;
+			dst++;
+		}
+	}
+}
+
+static int sti_cursor_prepare_layer(struct sti_layer *layer, bool first_prepare)
+{
+	struct sti_cursor *cursor = to_sti_cursor(layer);
+	struct drm_display_mode *mode = layer->mode;
+	u32 y, x;
+	u32 val;
+
+	DRM_DEBUG_DRIVER("\n");
+
+	dev_dbg(layer->dev, "%s %s\n", __func__, sti_layer_to_str(layer));
+
+	if (layer->src_w < STI_CURS_MIN_SIZE ||
+	    layer->src_h < STI_CURS_MIN_SIZE ||
+	    layer->src_w > STI_CURS_MAX_SIZE ||
+	    layer->src_h > STI_CURS_MAX_SIZE) {
+		DRM_ERROR("Invalid cursor size (%dx%d)\n",
+				layer->src_w, layer->src_h);
+		return -EINVAL;
+	}
+
+	/* If the cursor size has changed, re-allocate the pixmap */
+	if (!cursor->pixmap.base ||
+	    (cursor->width != layer->src_w) ||
+	    (cursor->height != layer->src_h)) {
+		cursor->width = layer->src_w;
+		cursor->height = layer->src_h;
+
+		if (cursor->pixmap.base)
+			dma_free_writecombine(layer->dev,
+					      cursor->pixmap.size,
+					      cursor->pixmap.base,
+					      cursor->pixmap.paddr);
+
+		cursor->pixmap.size = cursor->width * cursor->height;
+
+		cursor->pixmap.base = dma_alloc_writecombine(layer->dev,
+							cursor->pixmap.size,
+							&cursor->pixmap.paddr,
+							GFP_KERNEL | GFP_DMA);
+		if (!cursor->pixmap.base) {
+			DRM_ERROR("Failed to allocate memory for pixmap\n");
+			return -ENOMEM;
+		}
+	}
+
+	/* Convert ARGB8888 to CLUT8 */
+	sti_cursor_argb8888_to_clut8(layer);
+
+	/* AWS and AWE depend on the mode */
+	y = sti_vtg_get_line_number(*mode, 0);
+	x = sti_vtg_get_pixel_number(*mode, 0);
+	val = y << 16 | x;
+	writel(val, layer->regs + CUR_AWS);
+	y = sti_vtg_get_line_number(*mode, mode->vdisplay - 1);
+	x = sti_vtg_get_pixel_number(*mode, mode->hdisplay - 1);
+	val = y << 16 | x;
+	writel(val, layer->regs + CUR_AWE);
+
+	if (first_prepare) {
+		/* Set and fetch CLUT */
+		writel(cursor->clut_paddr, layer->regs + CUR_CML);
+		writel(CUR_CTL_CLUT_UPDATE, layer->regs + CUR_CTL);
+	}
+
+	return 0;
+}
+
+static int sti_cursor_commit_layer(struct sti_layer *layer)
+{
+	struct sti_cursor *cursor = to_sti_cursor(layer);
+	struct drm_display_mode *mode = layer->mode;
+	u32 ydo, xdo;
+
+	dev_dbg(layer->dev, "%s %s\n", __func__, sti_layer_to_str(layer));
+
+	/* Set memory location, size, and position */
+	writel(cursor->pixmap.paddr, layer->regs + CUR_PML);
+	writel(cursor->width, layer->regs + CUR_PMP);
+	writel(cursor->height << 16 | cursor->width, layer->regs + CUR_SIZE);
+
+	ydo = sti_vtg_get_line_number(*mode, layer->dst_y);
+	xdo = sti_vtg_get_pixel_number(*mode, layer->dst_x);
+	writel((ydo << 16) | xdo, layer->regs + CUR_VPO);
+
+	return 0;
+}
+
+static int sti_cursor_disable_layer(struct sti_layer *layer)
+{
+	return 0;
+}
+
+static void sti_cursor_init(struct sti_layer *layer)
+{
+	struct sti_cursor *cursor = to_sti_cursor(layer);
+	unsigned short *base = cursor->clut;
+	unsigned int a, r, g, b;
+
+	/* Assign CLUT values, ARGB4444 format */
+	for (a = 0; a < 4; a++)
+		for (r = 0; r < 4; r++)
+			for (g = 0; g < 4; g++)
+				for (b = 0; b < 4; b++)
+					*base++ = (a * 5) << 12 |
+						  (r * 5) << 8 |
+						  (g * 5) << 4 |
+						  (b * 5);
+}
+
+static const struct sti_layer_funcs cursor_ops = {
+	.get_formats = sti_cursor_get_formats,
+	.get_nb_formats = sti_cursor_get_nb_formats,
+	.init = sti_cursor_init,
+	.prepare = sti_cursor_prepare_layer,
+	.commit = sti_cursor_commit_layer,
+	.disable = sti_cursor_disable_layer,
+};
+
+struct sti_layer *sti_cursor_create(struct device *dev)
+{
+	struct sti_cursor *cursor;
+
+	cursor = devm_kzalloc(dev, sizeof(*cursor), GFP_KERNEL);
+	if (!cursor) {
+		DRM_ERROR("Failed to allocate memory for cursor\n");
+		return NULL;
+	}
+
+	/* Allocate clut buffer */
+	cursor->clut = dma_alloc_writecombine(dev,
+			0x100 * sizeof(unsigned short),
+			&cursor->clut_paddr,
+			GFP_KERNEL | GFP_DMA);
+
+	if (!cursor->clut) {
+		DRM_ERROR("Failed to allocate memory for cursor clut\n");
+		devm_kfree(dev, cursor);
+		return NULL;
+	}
+
+	cursor->layer.ops = &cursor_ops;
+
+	return (struct sti_layer *)cursor;
+}
diff --git a/drivers/gpu/drm/sti/sti_cursor.h b/drivers/gpu/drm/sti/sti_cursor.h
new file mode 100644
index 000000000000..3c9827404f27
--- /dev/null
+++ b/drivers/gpu/drm/sti/sti_cursor.h
@@ -0,0 +1,12 @@
+/*
+ * Copyright (C) STMicroelectronics SA 2013
+ * Authors: Vincent Abriou <vincent.abriou@st.com> for STMicroelectronics.
+ * License terms:  GNU General Public License (GPL), version 2
+ */
+
+#ifndef _STI_CURSOR_H_
+#define _STI_CURSOR_H_
+
+struct sti_layer *sti_cursor_create(struct device *dev);
+
+#endif
diff --git a/drivers/gpu/drm/sti/sti_drm_crtc.c b/drivers/gpu/drm/sti/sti_drm_crtc.c
index d2ae0c0e13be..4c651c200f20 100644
--- a/drivers/gpu/drm/sti/sti_drm_crtc.c
+++ b/drivers/gpu/drm/sti/sti_drm_crtc.c
@@ -10,6 +10,7 @@
 
 #include <drm/drmP.h>
 #include <drm/drm_crtc_helper.h>
+#include <drm/drm_plane_helper.h>
 
 #include "sti_compositor.h"
 #include "sti_drm_drv.h"
@@ -27,7 +28,7 @@ static void sti_drm_crtc_prepare(struct drm_crtc *crtc)
 	struct device *dev = mixer->dev;
 	struct sti_compositor *compo = dev_get_drvdata(dev);
 
-	compo->enable = true;
+	mixer->enabled = true;
 
 	/* Prepare and enable the compo IP clock */
 	if (mixer->id == STI_MIXER_MAIN) {
@@ -37,6 +38,8 @@ static void sti_drm_crtc_prepare(struct drm_crtc *crtc)
 		if (clk_prepare_enable(compo->clk_compo_aux))
 			DRM_INFO("Failed to prepare/enable compo_aux clk\n");
 	}
+
+	sti_mixer_clear_all_layers(mixer);
 }
 
 static void sti_drm_crtc_commit(struct drm_crtc *crtc)
@@ -61,6 +64,8 @@ static void sti_drm_crtc_commit(struct drm_crtc *crtc)
 	/* Enable layer on mixer */
 	if (sti_mixer_set_layer_status(mixer, layer, true))
 		DRM_ERROR("Can not enable layer at mixer\n");
+
+	drm_crtc_vblank_on(crtc);
 }
 
 static bool sti_drm_crtc_mode_fixup(struct drm_crtc *crtc,
@@ -143,7 +148,8 @@ sti_drm_crtc_mode_set(struct drm_crtc *crtc, struct drm_display_mode *mode,
 	w = crtc->primary->fb->width - x;
 	h = crtc->primary->fb->height - y;
 
-	return sti_layer_prepare(layer, crtc->primary->fb, &crtc->mode,
+	return sti_layer_prepare(layer, crtc,
+			crtc->primary->fb, &crtc->mode,
 			mixer->id, 0, 0, w, h, x, y, w, h);
 }
 
@@ -170,7 +176,8 @@ static int sti_drm_crtc_mode_set_base(struct drm_crtc *crtc, int x, int y,
 	w = crtc->primary->fb->width - crtc->x;
 	h = crtc->primary->fb->height - crtc->y;
 
-	ret = sti_layer_prepare(layer, crtc->primary->fb, &crtc->mode,
+	ret = sti_layer_prepare(layer, crtc,
+				crtc->primary->fb, &crtc->mode,
 				mixer->id, 0, 0, w, h,
 				crtc->x, crtc->y, w, h);
 	if (ret) {
@@ -195,7 +202,7 @@ static void sti_drm_crtc_disable(struct drm_crtc *crtc)
 	struct sti_compositor *compo = dev_get_drvdata(dev);
 	struct sti_layer *layer;
 
-	if (!compo->enable)
+	if (!mixer->enabled)
 		return;
 
 	DRM_DEBUG_KMS("CRTC:%d (%s)\n", crtc->base.id, sti_mixer_to_str(mixer));
@@ -221,7 +228,7 @@ static void sti_drm_crtc_disable(struct drm_crtc *crtc)
 	/* Then disable layer itself */
 	sti_layer_disable(layer);
 
-	drm_vblank_off(crtc->dev, mixer->id);
+	drm_crtc_vblank_off(crtc);
 
 	/* Disable pixel clock and compo IP clocks */
 	if (mixer->id == STI_MIXER_MAIN) {
@@ -232,7 +239,7 @@ static void sti_drm_crtc_disable(struct drm_crtc *crtc)
 		clk_disable_unprepare(compo->clk_compo_aux);
 	}
 
-	compo->enable = false;
+	mixer->enabled = false;
 }
 
 static struct drm_crtc_helper_funcs sti_crtc_helper_funcs = {
@@ -363,7 +370,6 @@ void sti_drm_crtc_disable_vblank(struct drm_device *dev, int crtc)
 	struct sti_drm_private *priv = dev->dev_private;
 	struct sti_compositor *compo = priv->compo;
 	struct notifier_block *vtg_vblank_nb = &compo->vtg_vblank_nb;
-	unsigned long flags;
 
 	DRM_DEBUG_DRIVER("\n");
 
@@ -372,13 +378,10 @@ void sti_drm_crtc_disable_vblank(struct drm_device *dev, int crtc)
 		DRM_DEBUG_DRIVER("Warning: cannot unregister VTG notifier\n");
 
 	/* free the resources of the pending requests */
-	spin_lock_irqsave(&dev->event_lock, flags);
 	if (compo->mixer[crtc]->pending_event) {
 		drm_vblank_put(dev, crtc);
 		compo->mixer[crtc]->pending_event = NULL;
 	}
-	spin_unlock_irqrestore(&dev->event_lock, flags);
-
 }
 EXPORT_SYMBOL(sti_drm_crtc_disable_vblank);
 
@@ -398,6 +401,7 @@ bool sti_drm_crtc_is_main(struct drm_crtc *crtc)
 
 	return false;
 }
+EXPORT_SYMBOL(sti_drm_crtc_is_main);
 
 int sti_drm_crtc_init(struct drm_device *drm_dev, struct sti_mixer *mixer,
 		struct drm_plane *primary, struct drm_plane *cursor)
diff --git a/drivers/gpu/drm/sti/sti_drm_drv.c b/drivers/gpu/drm/sti/sti_drm_drv.c
index 8e64220e8796..5239fa121726 100644
--- a/drivers/gpu/drm/sti/sti_drm_drv.c
+++ b/drivers/gpu/drm/sti/sti_drm_drv.c
@@ -67,8 +67,12 @@ static int sti_drm_load(struct drm_device *dev, unsigned long flags)
 	sti_drm_mode_config_init(dev);
 
 	ret = component_bind_all(dev->dev, dev);
-	if (ret)
+	if (ret) {
+		drm_kms_helper_poll_fini(dev);
+		drm_mode_config_cleanup(dev);
+		kfree(private);
 		return ret;
+	}
 
 	drm_helper_disable_unused_functions(dev);
 
diff --git a/drivers/gpu/drm/sti/sti_drm_plane.c b/drivers/gpu/drm/sti/sti_drm_plane.c
index f4118d4cac22..bb6a29339e10 100644
--- a/drivers/gpu/drm/sti/sti_drm_plane.c
+++ b/drivers/gpu/drm/sti/sti_drm_plane.c
@@ -45,7 +45,8 @@ sti_drm_update_plane(struct drm_plane *plane, struct drm_crtc *crtc,
 	}
 
 	/* src_x are in 16.16 format. */
-	res = sti_layer_prepare(layer, fb, &crtc->mode, mixer->id,
+	res = sti_layer_prepare(layer, crtc, fb,
+			&crtc->mode, mixer->id,
 			crtc_x, crtc_y, crtc_w, crtc_h,
 			src_x >> 16, src_y >> 16,
 			src_w >> 16, src_h >> 16);
@@ -193,3 +194,4 @@ struct drm_plane *sti_drm_plane_init(struct drm_device *dev,
 
 	return &layer->plane;
 }
+EXPORT_SYMBOL(sti_drm_plane_init);
diff --git a/drivers/gpu/drm/sti/sti_gdp.c b/drivers/gpu/drm/sti/sti_gdp.c
index 4e30b74559f5..32448d1d1e8f 100644
--- a/drivers/gpu/drm/sti/sti_gdp.c
+++ b/drivers/gpu/drm/sti/sti_gdp.c
@@ -73,7 +73,9 @@ struct sti_gdp_node {
 
 struct sti_gdp_node_list {
 	struct sti_gdp_node *top_field;
+	dma_addr_t top_field_paddr;
 	struct sti_gdp_node *btm_field;
+	dma_addr_t btm_field_paddr;
 };
 
 /**
@@ -81,6 +83,8 @@ struct sti_gdp_node_list {
  *
  * @layer:		layer structure
  * @clk_pix:            pixel clock for the current gdp
+ * @clk_main_parent:    gdp parent clock if main path used
+ * @clk_aux_parent:     gdp parent clock if aux path used
  * @vtg_field_nb:       callback for VTG FIELD (top or bottom) notification
  * @is_curr_top:        true if the current node processed is the top field
  * @node_list:		array of node list
@@ -88,6 +92,8 @@ struct sti_gdp_node_list {
 struct sti_gdp {
 	struct sti_layer layer;
 	struct clk *clk_pix;
+	struct clk *clk_main_parent;
+	struct clk *clk_aux_parent;
 	struct notifier_block vtg_field_nb;
 	bool is_curr_top;
 	struct sti_gdp_node_list node_list[GDP_NODE_NB_BANK];
@@ -168,7 +174,6 @@ static int sti_gdp_get_alpharange(int format)
 static struct sti_gdp_node_list *sti_gdp_get_free_nodes(struct sti_layer *layer)
 {
 	int hw_nvn;
-	void *virt_nvn;
 	struct sti_gdp *gdp = to_sti_gdp(layer);
 	unsigned int i;
 
@@ -176,11 +181,9 @@ static struct sti_gdp_node_list *sti_gdp_get_free_nodes(struct sti_layer *layer)
 	if (!hw_nvn)
 		goto end;
 
-	virt_nvn = dma_to_virt(layer->dev, (dma_addr_t) hw_nvn);
-
 	for (i = 0; i < GDP_NODE_NB_BANK; i++)
-		if ((virt_nvn != gdp->node_list[i].btm_field) &&
-		    (virt_nvn != gdp->node_list[i].top_field))
+		if ((hw_nvn != gdp->node_list[i].btm_field_paddr) &&
+		    (hw_nvn != gdp->node_list[i].top_field_paddr))
 			return &gdp->node_list[i];
 
 	/* in hazardious cases restart with the first node */
@@ -204,7 +207,6 @@ static
 struct sti_gdp_node_list *sti_gdp_get_current_nodes(struct sti_layer *layer)
 {
 	int hw_nvn;
-	void *virt_nvn;
 	struct sti_gdp *gdp = to_sti_gdp(layer);
 	unsigned int i;
 
@@ -212,11 +214,9 @@ struct sti_gdp_node_list *sti_gdp_get_current_nodes(struct sti_layer *layer)
 	if (!hw_nvn)
 		goto end;
 
-	virt_nvn = dma_to_virt(layer->dev, (dma_addr_t) hw_nvn);
-
 	for (i = 0; i < GDP_NODE_NB_BANK; i++)
-		if ((virt_nvn == gdp->node_list[i].btm_field) ||
-				(virt_nvn == gdp->node_list[i].top_field))
+		if ((hw_nvn == gdp->node_list[i].btm_field_paddr) ||
+				(hw_nvn == gdp->node_list[i].top_field_paddr))
 			return &gdp->node_list[i];
 
 end:
@@ -292,8 +292,8 @@ static int sti_gdp_prepare_layer(struct sti_layer *layer, bool first_prepare)
 
 	/* Same content and chained together */
 	memcpy(btm_field, top_field, sizeof(*btm_field));
-	top_field->gam_gdp_nvn = virt_to_dma(dev, btm_field);
-	btm_field->gam_gdp_nvn = virt_to_dma(dev, top_field);
+	top_field->gam_gdp_nvn = list->btm_field_paddr;
+	btm_field->gam_gdp_nvn = list->top_field_paddr;
 
 	/* Interlaced mode */
 	if (layer->mode->flags & DRM_MODE_FLAG_INTERLACE)
@@ -311,6 +311,17 @@ static int sti_gdp_prepare_layer(struct sti_layer *layer, bool first_prepare)
 
 		/* Set and enable gdp clock */
 		if (gdp->clk_pix) {
+			struct clk *clkp;
+			/* The gdp pixel clock needs a different parent
+			 * clock depending on which mixer is used. */
+			if (layer->mixer_id == STI_MIXER_MAIN)
+				clkp = gdp->clk_main_parent;
+			else
+				clkp = gdp->clk_aux_parent;
+
+			if (!IS_ERR_OR_NULL(clkp))
+				clk_set_parent(gdp->clk_pix, clkp);
+
 			res = clk_set_rate(gdp->clk_pix, rate);
 			if (res < 0) {
 				DRM_ERROR("Cannot set rate (%dHz) for gdp\n",
@@ -349,8 +360,8 @@ static int sti_gdp_commit_layer(struct sti_layer *layer)
 	struct sti_gdp_node *updated_top_node = updated_list->top_field;
 	struct sti_gdp_node *updated_btm_node = updated_list->btm_field;
 	struct sti_gdp *gdp = to_sti_gdp(layer);
-	u32 dma_updated_top = virt_to_dma(layer->dev, updated_top_node);
-	u32 dma_updated_btm = virt_to_dma(layer->dev, updated_btm_node);
+	u32 dma_updated_top = updated_list->top_field_paddr;
+	u32 dma_updated_btm = updated_list->btm_field_paddr;
 	struct sti_gdp_node_list *curr_list = sti_gdp_get_current_nodes(layer);
 
 	dev_dbg(layer->dev, "%s %s top/btm_node:0x%p/0x%p\n", __func__,
@@ -461,16 +472,16 @@ static void sti_gdp_init(struct sti_layer *layer)
 {
 	struct sti_gdp *gdp = to_sti_gdp(layer);
 	struct device_node *np = layer->dev->of_node;
-	dma_addr_t dma;
+	dma_addr_t dma_addr;
 	void *base;
 	unsigned int i, size;
 
 	/* Allocate all the nodes within a single memory page */
 	size = sizeof(struct sti_gdp_node) *
 	    GDP_NODE_PER_FIELD * GDP_NODE_NB_BANK;
-
 	base = dma_alloc_writecombine(layer->dev,
-			size, &dma, GFP_KERNEL | GFP_DMA);
+			size, &dma_addr, GFP_KERNEL | GFP_DMA);
+
 	if (!base) {
 		DRM_ERROR("Failed to allocate memory for GDP node\n");
 		return;
@@ -478,21 +489,26 @@ static void sti_gdp_init(struct sti_layer *layer)
 	memset(base, 0, size);
 
 	for (i = 0; i < GDP_NODE_NB_BANK; i++) {
-		if (virt_to_dma(layer->dev, base) & 0xF) {
+		if (dma_addr & 0xF) {
 			DRM_ERROR("Mem alignment failed\n");
 			return;
 		}
 		gdp->node_list[i].top_field = base;
+		gdp->node_list[i].top_field_paddr = dma_addr;
+
 		DRM_DEBUG_DRIVER("node[%d].top_field=%p\n", i, base);
 		base += sizeof(struct sti_gdp_node);
+		dma_addr += sizeof(struct sti_gdp_node);
 
-		if (virt_to_dma(layer->dev, base) & 0xF) {
+		if (dma_addr & 0xF) {
 			DRM_ERROR("Mem alignment failed\n");
 			return;
 		}
 		gdp->node_list[i].btm_field = base;
+		gdp->node_list[i].btm_field_paddr = dma_addr;
 		DRM_DEBUG_DRIVER("node[%d].btm_field=%p\n", i, base);
 		base += sizeof(struct sti_gdp_node);
+		dma_addr += sizeof(struct sti_gdp_node);
 	}
 
 	if (of_device_is_compatible(np, "st,stih407-compositor")) {
@@ -520,6 +536,14 @@ static void sti_gdp_init(struct sti_layer *layer)
 		gdp->clk_pix = devm_clk_get(layer->dev, clk_name);
 		if (IS_ERR(gdp->clk_pix))
 			DRM_ERROR("Cannot get %s clock\n", clk_name);
+
+		gdp->clk_main_parent = devm_clk_get(layer->dev, "main_parent");
+		if (IS_ERR(gdp->clk_main_parent))
+			DRM_ERROR("Cannot get main_parent clock\n");
+
+		gdp->clk_aux_parent = devm_clk_get(layer->dev, "aux_parent");
+		if (IS_ERR(gdp->clk_aux_parent))
+			DRM_ERROR("Cannot get aux_parent clock\n");
 	}
 }
 
diff --git a/drivers/gpu/drm/sti/sti_hdmi.c b/drivers/gpu/drm/sti/sti_hdmi.c
index b22968c08d1f..d032e024b0b8 100644
--- a/drivers/gpu/drm/sti/sti_hdmi.c
+++ b/drivers/gpu/drm/sti/sti_hdmi.c
@@ -130,8 +130,7 @@ static irqreturn_t hdmi_irq_thread(int irq, void *arg)
 
 	/* Hot plug/unplug IRQ */
 	if (hdmi->irq_status & HDMI_INT_HOT_PLUG) {
-		/* read gpio to get the status */
-		hdmi->hpd = gpio_get_value(hdmi->hpd_gpio);
+		hdmi->hpd = readl(hdmi->regs + HDMI_STA) & HDMI_STA_HOT_PLUG;
 		if (hdmi->drm_dev)
 			drm_helper_hpd_irq_event(hdmi->drm_dev);
 	}
@@ -273,31 +272,32 @@ static int hdmi_avi_infoframe_config(struct sti_hdmi *hdmi)
 	hdmi_write(hdmi, val, HDMI_SW_DI_CFG);
 
 	/* Infoframe header */
-	val = buffer[0x0];
-	val |= buffer[0x1] << 8;
-	val |= buffer[0x2] << 16;
+	val =  buffer[0];
+	val |= buffer[1] << 8;
+	val |= buffer[2] << 16;
 	hdmi_write(hdmi, val, HDMI_SW_DI_N_HEAD_WORD(HDMI_IFRAME_SLOT_AVI));
 
 	/* Infoframe packet bytes */
-	val = frame[0x0];
-	val |= frame[0x1] << 8;
-	val |= frame[0x2] << 16;
-	val |= frame[0x3] << 24;
+	val =  buffer[3];
+	val |= *(frame++) << 8;
+	val |= *(frame++) << 16;
+	val |= *(frame++) << 24;
 	hdmi_write(hdmi, val, HDMI_SW_DI_N_PKT_WORD0(HDMI_IFRAME_SLOT_AVI));
 
-	val = frame[0x4];
-	val |= frame[0x5] << 8;
-	val |= frame[0x6] << 16;
-	val |= frame[0x7] << 24;
+	val =  *(frame++);
+	val |= *(frame++) << 8;
+	val |= *(frame++) << 16;
+	val |= *(frame++) << 24;
 	hdmi_write(hdmi, val, HDMI_SW_DI_N_PKT_WORD1(HDMI_IFRAME_SLOT_AVI));
 
-	val = frame[0x8];
-	val |= frame[0x9] << 8;
-	val |= frame[0xA] << 16;
-	val |= frame[0xB] << 24;
+	val =  *(frame++);
+	val |= *(frame++) << 8;
+	val |= *(frame++) << 16;
+	val |= *(frame++) << 24;
 	hdmi_write(hdmi, val, HDMI_SW_DI_N_PKT_WORD2(HDMI_IFRAME_SLOT_AVI));
 
-	val = frame[0xC];
+	val = *(frame++);
+	val |= *(frame) << 8;
 	hdmi_write(hdmi, val, HDMI_SW_DI_N_PKT_WORD3(HDMI_IFRAME_SLOT_AVI));
 
 	/* Enable transmission slot for AVI infoframe
@@ -480,17 +480,15 @@ static const struct drm_bridge_funcs sti_hdmi_bridge_funcs = {
 
 static int sti_hdmi_connector_get_modes(struct drm_connector *connector)
 {
-	struct i2c_adapter *i2c_adap;
+	struct sti_hdmi_connector *hdmi_connector
+		= to_sti_hdmi_connector(connector);
+	struct sti_hdmi *hdmi = hdmi_connector->hdmi;
 	struct edid *edid;
 	int count;
 
 	DRM_DEBUG_DRIVER("\n");
 
-	i2c_adap = i2c_get_adapter(1);
-	if (!i2c_adap)
-		goto fail;
-
-	edid = drm_get_edid(connector, i2c_adap);
+	edid = drm_get_edid(connector, hdmi->ddc_adapt);
 	if (!edid)
 		goto fail;
 
@@ -603,29 +601,38 @@ static int sti_hdmi_bind(struct device *dev, struct device *master, void *data)
 	struct sti_hdmi_connector *connector;
 	struct drm_connector *drm_connector;
 	struct drm_bridge *bridge;
-	struct i2c_adapter *i2c_adap;
+	struct device_node *ddc;
 	int err;
 
-	i2c_adap = i2c_get_adapter(1);
-	if (!i2c_adap)
-		return -EPROBE_DEFER;
+	ddc = of_parse_phandle(dev->of_node, "ddc", 0);
+	if (ddc) {
+		hdmi->ddc_adapt = of_find_i2c_adapter_by_node(ddc);
+		if (!hdmi->ddc_adapt) {
+			err = -EPROBE_DEFER;
+			of_node_put(ddc);
+			return err;
+		}
+
+		of_node_put(ddc);
+	}
 
 	/* Set the drm device handle */
 	hdmi->drm_dev = drm_dev;
 
 	encoder = sti_hdmi_find_encoder(drm_dev);
 	if (!encoder)
-		return -ENOMEM;
+		goto err_adapt;
 
 	connector = devm_kzalloc(dev, sizeof(*connector), GFP_KERNEL);
 	if (!connector)
-		return -ENOMEM;
+		goto err_adapt;
+
 
 	connector->hdmi = hdmi;
 
 	bridge = devm_kzalloc(dev, sizeof(*bridge), GFP_KERNEL);
 	if (!bridge)
-		return -ENOMEM;
+		goto err_adapt;
 
 	bridge->driver_private = hdmi;
 	drm_bridge_init(drm_dev, bridge, &sti_hdmi_bridge_funcs);
@@ -662,6 +669,8 @@ err_sysfs:
 err_connector:
 	drm_bridge_cleanup(bridge);
 	drm_connector_cleanup(drm_connector);
+err_adapt:
+	put_device(&hdmi->ddc_adapt->dev);
 	return -EINVAL;
 }
 
@@ -757,13 +766,7 @@ static int sti_hdmi_probe(struct platform_device *pdev)
 		return PTR_ERR(hdmi->clk_audio);
 	}
 
-	hdmi->hpd_gpio = of_get_named_gpio(np, "hdmi,hpd-gpio", 0);
-	if (hdmi->hpd_gpio < 0) {
-		DRM_ERROR("Failed to get hdmi hpd-gpio\n");
-		return -EIO;
-	}
-
-	hdmi->hpd = gpio_get_value(hdmi->hpd_gpio);
+	hdmi->hpd = readl(hdmi->regs + HDMI_STA) & HDMI_STA_HOT_PLUG;
 
 	init_waitqueue_head(&hdmi->wait_event);
 
@@ -788,6 +791,11 @@ static int sti_hdmi_probe(struct platform_device *pdev)
 
 static int sti_hdmi_remove(struct platform_device *pdev)
 {
+	struct sti_hdmi *hdmi = dev_get_drvdata(&pdev->dev);
+
+	if (hdmi->ddc_adapt)
+		put_device(&hdmi->ddc_adapt->dev);
+
 	component_del(&pdev->dev, &sti_hdmi_ops);
 	return 0;
 }
diff --git a/drivers/gpu/drm/sti/sti_hdmi.h b/drivers/gpu/drm/sti/sti_hdmi.h
index 61bec6557ceb..3d22390e1f3b 100644
--- a/drivers/gpu/drm/sti/sti_hdmi.h
+++ b/drivers/gpu/drm/sti/sti_hdmi.h
@@ -14,6 +14,9 @@
 #define HDMI_STA           0x0010
 #define HDMI_STA_DLL_LCK   BIT(5)
 
+#define HDMI_STA_HOT_PLUG_SHIFT 4
+#define HDMI_STA_HOT_PLUG	(1 << HDMI_STA_HOT_PLUG_SHIFT)
+
 struct sti_hdmi;
 
 struct hdmi_phy_ops {
@@ -37,7 +40,6 @@ struct hdmi_phy_ops {
  * @irq_status: interrupt status register
  * @phy_ops: phy start/stop operations
  * @enabled: true if hdmi is enabled else false
- * @hpd_gpio: hdmi hot plug detect gpio number
  * @hpd: hot plug detect status
  * @wait_event: wait event
  * @event_received: wait event status
@@ -57,11 +59,11 @@ struct sti_hdmi {
 	u32 irq_status;
 	struct hdmi_phy_ops *phy_ops;
 	bool enabled;
-	int hpd_gpio;
 	bool hpd;
 	wait_queue_head_t wait_event;
 	bool event_received;
 	struct reset_control *reset;
+	struct i2c_adapter *ddc_adapt;
 };
 
 u32 hdmi_read(struct sti_hdmi *hdmi, int offset);
diff --git a/drivers/gpu/drm/sti/sti_hqvdp.c b/drivers/gpu/drm/sti/sti_hqvdp.c
new file mode 100644
index 000000000000..f3db05dab0ab
--- /dev/null
+++ b/drivers/gpu/drm/sti/sti_hqvdp.c
@@ -0,0 +1,1073 @@
+/*
+ * Copyright (C) STMicroelectronics SA 2014
+ * Authors: Fabien Dessenne <fabien.dessenne@st.com> for STMicroelectronics.
+ * License terms:  GNU General Public License (GPL), version 2
+ */
+
+#include <linux/clk.h>
+#include <linux/component.h>
+#include <linux/firmware.h>
+#include <linux/module.h>
+#include <linux/platform_device.h>
+#include <linux/reset.h>
+
+#include <drm/drmP.h>
+
+#include "sti_drm_plane.h"
+#include "sti_hqvdp.h"
+#include "sti_hqvdp_lut.h"
+#include "sti_layer.h"
+#include "sti_vtg.h"
+
+/* Firmware name */
+#define HQVDP_FMW_NAME          "hqvdp-stih407.bin"
+
+/* Regs address */
+#define HQVDP_DMEM              0x00000000               /* 0x00000000 */
+#define HQVDP_PMEM              0x00040000               /* 0x00040000 */
+#define HQVDP_RD_PLUG           0x000E0000               /* 0x000E0000 */
+#define HQVDP_RD_PLUG_CONTROL   (HQVDP_RD_PLUG + 0x1000) /* 0x000E1000 */
+#define HQVDP_RD_PLUG_PAGE_SIZE (HQVDP_RD_PLUG + 0x1004) /* 0x000E1004 */
+#define HQVDP_RD_PLUG_MIN_OPC   (HQVDP_RD_PLUG + 0x1008) /* 0x000E1008 */
+#define HQVDP_RD_PLUG_MAX_OPC   (HQVDP_RD_PLUG + 0x100C) /* 0x000E100C */
+#define HQVDP_RD_PLUG_MAX_CHK   (HQVDP_RD_PLUG + 0x1010) /* 0x000E1010 */
+#define HQVDP_RD_PLUG_MAX_MSG   (HQVDP_RD_PLUG + 0x1014) /* 0x000E1014 */
+#define HQVDP_RD_PLUG_MIN_SPACE (HQVDP_RD_PLUG + 0x1018) /* 0x000E1018 */
+#define HQVDP_WR_PLUG           0x000E2000               /* 0x000E2000 */
+#define HQVDP_WR_PLUG_CONTROL   (HQVDP_WR_PLUG + 0x1000) /* 0x000E3000 */
+#define HQVDP_WR_PLUG_PAGE_SIZE (HQVDP_WR_PLUG + 0x1004) /* 0x000E3004 */
+#define HQVDP_WR_PLUG_MIN_OPC   (HQVDP_WR_PLUG + 0x1008) /* 0x000E3008 */
+#define HQVDP_WR_PLUG_MAX_OPC   (HQVDP_WR_PLUG + 0x100C) /* 0x000E300C */
+#define HQVDP_WR_PLUG_MAX_CHK   (HQVDP_WR_PLUG + 0x1010) /* 0x000E3010 */
+#define HQVDP_WR_PLUG_MAX_MSG   (HQVDP_WR_PLUG + 0x1014) /* 0x000E3014 */
+#define HQVDP_WR_PLUG_MIN_SPACE (HQVDP_WR_PLUG + 0x1018) /* 0x000E3018 */
+#define HQVDP_MBX               0x000E4000               /* 0x000E4000 */
+#define HQVDP_MBX_IRQ_TO_XP70   (HQVDP_MBX + 0x0000)     /* 0x000E4000 */
+#define HQVDP_MBX_INFO_HOST     (HQVDP_MBX + 0x0004)     /* 0x000E4004 */
+#define HQVDP_MBX_IRQ_TO_HOST   (HQVDP_MBX + 0x0008)     /* 0x000E4008 */
+#define HQVDP_MBX_INFO_XP70     (HQVDP_MBX + 0x000C)     /* 0x000E400C */
+#define HQVDP_MBX_SW_RESET_CTRL (HQVDP_MBX + 0x0010)     /* 0x000E4010 */
+#define HQVDP_MBX_STARTUP_CTRL1 (HQVDP_MBX + 0x0014)     /* 0x000E4014 */
+#define HQVDP_MBX_STARTUP_CTRL2 (HQVDP_MBX + 0x0018)     /* 0x000E4018 */
+#define HQVDP_MBX_GP_STATUS     (HQVDP_MBX + 0x001C)     /* 0x000E401C */
+#define HQVDP_MBX_NEXT_CMD      (HQVDP_MBX + 0x0020)     /* 0x000E4020 */
+#define HQVDP_MBX_CURRENT_CMD   (HQVDP_MBX + 0x0024)     /* 0x000E4024 */
+#define HQVDP_MBX_SOFT_VSYNC    (HQVDP_MBX + 0x0028)     /* 0x000E4028 */
+
+/* Plugs config */
+#define PLUG_CONTROL_ENABLE     0x00000001
+#define PLUG_PAGE_SIZE_256      0x00000002
+#define PLUG_MIN_OPC_8          0x00000003
+#define PLUG_MAX_OPC_64         0x00000006
+#define PLUG_MAX_CHK_2X         0x00000001
+#define PLUG_MAX_MSG_1X         0x00000000
+#define PLUG_MIN_SPACE_1        0x00000000
+
+/* SW reset CTRL */
+#define SW_RESET_CTRL_FULL      BIT(0)
+#define SW_RESET_CTRL_CORE      BIT(1)
+
+/* Startup ctrl 1 */
+#define STARTUP_CTRL1_RST_DONE  BIT(0)
+#define STARTUP_CTRL1_AUTH_IDLE BIT(2)
+
+/* Startup ctrl 2 */
+#define STARTUP_CTRL2_FETCH_EN  BIT(1)
+
+/* Info xP70 */
+#define INFO_XP70_FW_READY      BIT(15)
+#define INFO_XP70_FW_PROCESSING BIT(14)
+#define INFO_XP70_FW_INITQUEUES BIT(13)
+
+/* SOFT_VSYNC */
+#define SOFT_VSYNC_HW           0x00000000
+#define SOFT_VSYNC_SW_CMD       0x00000001
+#define SOFT_VSYNC_SW_CTRL_IRQ  0x00000003
+
+/* Reset & boot poll config */
+#define POLL_MAX_ATTEMPT        50
+#define POLL_DELAY_MS           20
+
+#define SCALE_FACTOR            8192
+#define SCALE_MAX_FOR_LEG_LUT_F 4096
+#define SCALE_MAX_FOR_LEG_LUT_E 4915
+#define SCALE_MAX_FOR_LEG_LUT_D 6654
+#define SCALE_MAX_FOR_LEG_LUT_C 8192
+
+enum sti_hvsrc_orient {
+	HVSRC_HORI,
+	HVSRC_VERT
+};
+
+/* Command structures */
+struct sti_hqvdp_top {
+	u32 config;
+	u32 mem_format;
+	u32 current_luma;
+	u32 current_enh_luma;
+	u32 current_right_luma;
+	u32 current_enh_right_luma;
+	u32 current_chroma;
+	u32 current_enh_chroma;
+	u32 current_right_chroma;
+	u32 current_enh_right_chroma;
+	u32 output_luma;
+	u32 output_chroma;
+	u32 luma_src_pitch;
+	u32 luma_enh_src_pitch;
+	u32 luma_right_src_pitch;
+	u32 luma_enh_right_src_pitch;
+	u32 chroma_src_pitch;
+	u32 chroma_enh_src_pitch;
+	u32 chroma_right_src_pitch;
+	u32 chroma_enh_right_src_pitch;
+	u32 luma_processed_pitch;
+	u32 chroma_processed_pitch;
+	u32 input_frame_size;
+	u32 input_viewport_ori;
+	u32 input_viewport_ori_right;
+	u32 input_viewport_size;
+	u32 left_view_border_width;
+	u32 right_view_border_width;
+	u32 left_view_3d_offset_width;
+	u32 right_view_3d_offset_width;
+	u32 side_stripe_color;
+	u32 crc_reset_ctrl;
+};
+
+/* Configs for interlaced : no IT, no pass thru, 3 fields */
+#define TOP_CONFIG_INTER_BTM            0x00000000
+#define TOP_CONFIG_INTER_TOP            0x00000002
+
+/* Config for progressive : no IT, no pass thru, 3 fields */
+#define TOP_CONFIG_PROGRESSIVE          0x00000001
+
+/* Default MemFormat: in=420_raster_dual out=444_raster;opaque Mem2Tv mode */
+#define TOP_MEM_FORMAT_DFLT             0x00018060
+
+/* Min/Max size */
+#define MAX_WIDTH                       0x1FFF
+#define MAX_HEIGHT                      0x0FFF
+#define MIN_WIDTH                       0x0030
+#define MIN_HEIGHT                      0x0010
+
+struct sti_hqvdp_vc1re {
+	u32 ctrl_prv_csdi;
+	u32 ctrl_cur_csdi;
+	u32 ctrl_nxt_csdi;
+	u32 ctrl_cur_fmd;
+	u32 ctrl_nxt_fmd;
+};
+
+struct sti_hqvdp_fmd {
+	u32 config;
+	u32 viewport_ori;
+	u32 viewport_size;
+	u32 next_next_luma;
+	u32 next_next_right_luma;
+	u32 next_next_next_luma;
+	u32 next_next_next_right_luma;
+	u32 threshold_scd;
+	u32 threshold_rfd;
+	u32 threshold_move;
+	u32 threshold_cfd;
+};
+
+struct sti_hqvdp_csdi {
+	u32 config;
+	u32 config2;
+	u32 dcdi_config;
+	u32 prev_luma;
+	u32 prev_enh_luma;
+	u32 prev_right_luma;
+	u32 prev_enh_right_luma;
+	u32 next_luma;
+	u32 next_enh_luma;
+	u32 next_right_luma;
+	u32 next_enh_right_luma;
+	u32 prev_chroma;
+	u32 prev_enh_chroma;
+	u32 prev_right_chroma;
+	u32 prev_enh_right_chroma;
+	u32 next_chroma;
+	u32 next_enh_chroma;
+	u32 next_right_chroma;
+	u32 next_enh_right_chroma;
+	u32 prev_motion;
+	u32 prev_right_motion;
+	u32 cur_motion;
+	u32 cur_right_motion;
+	u32 next_motion;
+	u32 next_right_motion;
+};
+
+/* Config for progressive: by pass */
+#define CSDI_CONFIG_PROG                0x00000000
+/* Config for directional deinterlacing without motion */
+#define CSDI_CONFIG_INTER_DIR           0x00000016
+/* Additional configs for fader, blender, motion,... deinterlace algorithms */
+#define CSDI_CONFIG2_DFLT               0x000001B3
+#define CSDI_DCDI_CONFIG_DFLT           0x00203803
+
+struct sti_hqvdp_hvsrc {
+	u32 hor_panoramic_ctrl;
+	u32 output_picture_size;
+	u32 init_horizontal;
+	u32 init_vertical;
+	u32 param_ctrl;
+	u32 yh_coef[NB_COEF];
+	u32 ch_coef[NB_COEF];
+	u32 yv_coef[NB_COEF];
+	u32 cv_coef[NB_COEF];
+	u32 hori_shift;
+	u32 vert_shift;
+};
+
+/* Default ParamCtrl: all controls enabled */
+#define HVSRC_PARAM_CTRL_DFLT           0xFFFFFFFF
+
+struct sti_hqvdp_iqi {
+	u32 config;
+	u32 demo_wind_size;
+	u32 pk_config;
+	u32 coeff0_coeff1;
+	u32 coeff2_coeff3;
+	u32 coeff4;
+	u32 pk_lut;
+	u32 pk_gain;
+	u32 pk_coring_level;
+	u32 cti_config;
+	u32 le_config;
+	u32 le_lut[64];
+	u32 con_bri;
+	u32 sat_gain;
+	u32 pxf_conf;
+	u32 default_color;
+};
+
+/* Default Config : IQI bypassed */
+#define IQI_CONFIG_DFLT                 0x00000001
+/* Default Contrast & Brightness gain = 256 */
+#define IQI_CON_BRI_DFLT                0x00000100
+/* Default Saturation gain = 256 */
+#define IQI_SAT_GAIN_DFLT               0x00000100
+/* Default PxfConf : P2I bypassed */
+#define IQI_PXF_CONF_DFLT               0x00000001
+
+struct sti_hqvdp_top_status {
+	u32 processing_time;
+	u32 input_y_crc;
+	u32 input_uv_crc;
+};
+
+struct sti_hqvdp_fmd_status {
+	u32 fmd_repeat_move_status;
+	u32 fmd_scene_count_status;
+	u32 cfd_sum;
+	u32 field_sum;
+	u32 next_y_fmd_crc;
+	u32 next_next_y_fmd_crc;
+	u32 next_next_next_y_fmd_crc;
+};
+
+struct sti_hqvdp_csdi_status {
+	u32 prev_y_csdi_crc;
+	u32 cur_y_csdi_crc;
+	u32 next_y_csdi_crc;
+	u32 prev_uv_csdi_crc;
+	u32 cur_uv_csdi_crc;
+	u32 next_uv_csdi_crc;
+	u32 y_csdi_crc;
+	u32 uv_csdi_crc;
+	u32 uv_cup_crc;
+	u32 mot_csdi_crc;
+	u32 mot_cur_csdi_crc;
+	u32 mot_prev_csdi_crc;
+};
+
+struct sti_hqvdp_hvsrc_status {
+	u32 y_hvsrc_crc;
+	u32 u_hvsrc_crc;
+	u32 v_hvsrc_crc;
+};
+
+struct sti_hqvdp_iqi_status {
+	u32 pxf_it_status;
+	u32 y_iqi_crc;
+	u32 u_iqi_crc;
+	u32 v_iqi_crc;
+};
+
+/* Main commands. We use 2 commands: one being processed by the firmware, one
+ * ready to be fetched upon next Vsync */
+#define NB_VDP_CMD	2
+
+struct sti_hqvdp_cmd {
+	struct sti_hqvdp_top top;
+	struct sti_hqvdp_vc1re vc1re;
+	struct sti_hqvdp_fmd fmd;
+	struct sti_hqvdp_csdi csdi;
+	struct sti_hqvdp_hvsrc hvsrc;
+	struct sti_hqvdp_iqi iqi;
+	struct sti_hqvdp_top_status top_status;
+	struct sti_hqvdp_fmd_status fmd_status;
+	struct sti_hqvdp_csdi_status csdi_status;
+	struct sti_hqvdp_hvsrc_status hvsrc_status;
+	struct sti_hqvdp_iqi_status iqi_status;
+};
+
+/*
+ * STI HQVDP structure
+ *
+ * @dev:               driver device
+ * @drm_dev:           the drm device
+ * @regs:              registers
+ * @layer:             layer structure for hqvdp itself
+ * @vid_plane:         VID plug used as link with compositor IP
+ * @clk:               IP clock
+ * @clk_pix_main:      pix main clock
+ * @reset:             reset control
+ * @vtg_nb:            notifier to handle VTG Vsync
+ * @btm_field_pending: is there any bottom field (interlaced frame) to display
+ * @curr_field_count:  number of field updates
+ * @last_field_count:  number of field updates since last fps measure
+ * @hqvdp_cmd:         buffer of commands
+ * @hqvdp_cmd_paddr:   physical address of hqvdp_cmd
+ * @vtg:               vtg for main data path
+ */
+struct sti_hqvdp {
+	struct device *dev;
+	struct drm_device *drm_dev;
+	void __iomem *regs;
+	struct sti_layer layer;
+	struct drm_plane *vid_plane;
+	struct clk *clk;
+	struct clk *clk_pix_main;
+	struct reset_control *reset;
+	struct notifier_block vtg_nb;
+	bool btm_field_pending;
+	unsigned int curr_field_count;
+	unsigned int last_field_count;
+	void *hqvdp_cmd;
+	dma_addr_t hqvdp_cmd_paddr;
+	struct sti_vtg *vtg;
+};
+
+#define to_sti_hqvdp(x) container_of(x, struct sti_hqvdp, layer)
+
+static const uint32_t hqvdp_supported_formats[] = {
+	DRM_FORMAT_NV12,
+};
+
+static const uint32_t *sti_hqvdp_get_formats(struct sti_layer *layer)
+{
+	return hqvdp_supported_formats;
+}
+
+static unsigned int sti_hqvdp_get_nb_formats(struct sti_layer *layer)
+{
+	return ARRAY_SIZE(hqvdp_supported_formats);
+}
+
+/**
+ * sti_hqvdp_get_free_cmd
+ * @hqvdp: hqvdp structure
+ *
+ * Look for a hqvdp_cmd that is not being used (or about to be used) by the FW.
+ *
+ * RETURNS:
+ * the offset of the command to be used.
+ * -1 in error cases
+ */
+static int sti_hqvdp_get_free_cmd(struct sti_hqvdp *hqvdp)
+{
+	int curr_cmd, next_cmd;
+	dma_addr_t cmd = hqvdp->hqvdp_cmd_paddr;
+	int i;
+
+	curr_cmd = readl(hqvdp->regs + HQVDP_MBX_CURRENT_CMD);
+	next_cmd = readl(hqvdp->regs + HQVDP_MBX_NEXT_CMD);
+
+	for (i = 0; i < NB_VDP_CMD; i++) {
+		if ((cmd != curr_cmd) && (cmd != next_cmd))
+			return i * sizeof(struct sti_hqvdp_cmd);
+		cmd += sizeof(struct sti_hqvdp_cmd);
+	}
+
+	return -1;
+}
+
+/**
+ * sti_hqvdp_get_curr_cmd
+ * @hqvdp: hqvdp structure
+ *
+ * Look for the hqvdp_cmd that is being used by the FW.
+ *
+ * RETURNS:
+ * the offset of the command currently in use.
+ * -1 in error cases
+ */
+static int sti_hqvdp_get_curr_cmd(struct sti_hqvdp *hqvdp)
+{
+	int curr_cmd;
+	dma_addr_t cmd = hqvdp->hqvdp_cmd_paddr;
+	unsigned int i;
+
+	curr_cmd = readl(hqvdp->regs + HQVDP_MBX_CURRENT_CMD);
+
+	for (i = 0; i < NB_VDP_CMD; i++) {
+		if (cmd == curr_cmd)
+			return i * sizeof(struct sti_hqvdp_cmd);
+
+		cmd += sizeof(struct sti_hqvdp_cmd);
+	}
+
+	return -1;
+}
+
+/**
+ * sti_hqvdp_update_hvsrc
+ * @orient: horizontal or vertical
+ * @scale:  scaling/zoom factor
+ * @hvsrc:  the structure containing the LUT coef
+ *
+ * Update the Y and C Lut coef, as well as the shift param
+ *
+ * RETURNS:
+ * None.
+ */
+static void sti_hqvdp_update_hvsrc(enum sti_hvsrc_orient orient, int scale,
+		struct sti_hqvdp_hvsrc *hvsrc)
+{
+	const int *coef_c, *coef_y;
+	int shift_c, shift_y;
+
+	/* Get the appropriate coef tables */
+	if (scale < SCALE_MAX_FOR_LEG_LUT_F) {
+		coef_y = coef_lut_f_y_legacy;
+		coef_c = coef_lut_f_c_legacy;
+		shift_y = SHIFT_LUT_F_Y_LEGACY;
+		shift_c = SHIFT_LUT_F_C_LEGACY;
+	} else if (scale < SCALE_MAX_FOR_LEG_LUT_E) {
+		coef_y = coef_lut_e_y_legacy;
+		coef_c = coef_lut_e_c_legacy;
+		shift_y = SHIFT_LUT_E_Y_LEGACY;
+		shift_c = SHIFT_LUT_E_C_LEGACY;
+	} else if (scale < SCALE_MAX_FOR_LEG_LUT_D) {
+		coef_y = coef_lut_d_y_legacy;
+		coef_c = coef_lut_d_c_legacy;
+		shift_y = SHIFT_LUT_D_Y_LEGACY;
+		shift_c = SHIFT_LUT_D_C_LEGACY;
+	} else if (scale < SCALE_MAX_FOR_LEG_LUT_C) {
+		coef_y = coef_lut_c_y_legacy;
+		coef_c = coef_lut_c_c_legacy;
+		shift_y = SHIFT_LUT_C_Y_LEGACY;
+		shift_c = SHIFT_LUT_C_C_LEGACY;
+	} else if (scale == SCALE_MAX_FOR_LEG_LUT_C) {
+		coef_y = coef_c = coef_lut_b;
+		shift_y = shift_c = SHIFT_LUT_B;
+	} else {
+		coef_y = coef_c = coef_lut_a_legacy;
+		shift_y = shift_c = SHIFT_LUT_A_LEGACY;
+	}
+
+	if (orient == HVSRC_HORI) {
+		hvsrc->hori_shift = (shift_c << 16) | shift_y;
+		memcpy(hvsrc->yh_coef, coef_y, sizeof(hvsrc->yh_coef));
+		memcpy(hvsrc->ch_coef, coef_c, sizeof(hvsrc->ch_coef));
+	} else {
+		hvsrc->vert_shift = (shift_c << 16) | shift_y;
+		memcpy(hvsrc->yv_coef, coef_y, sizeof(hvsrc->yv_coef));
+		memcpy(hvsrc->cv_coef, coef_c, sizeof(hvsrc->cv_coef));
+	}
+}
+
+/**
+ * sti_hqvdp_check_hw_scaling
+ * @layer: hqvdp layer
+ *
+ * Check if the HW is able to perform the scaling request
+ * The firmware scaling limitation is "CEIL(1/Zy) <= FLOOR(LFW)" where:
+ *   Zy = OutputHeight / InputHeight
+ *   LFW = (Tx * IPClock) / (MaxNbCycles * Cp)
+ *     Tx : Total video mode horizontal resolution
+ *     IPClock : HQVDP IP clock (MHz)
+ *     MaxNbCycles: max(InputWidth, OutputWidth)
+ *     Cp: Video mode pixel clock (MHz)
+ *
+ * RETURNS:
+ * True if the HW can scale.
+ */
+static bool sti_hqvdp_check_hw_scaling(struct sti_layer *layer)
+{
+	struct sti_hqvdp *hqvdp = to_sti_hqvdp(layer);
+	unsigned long lfw;
+	unsigned int inv_zy;
+
+	lfw = layer->mode->htotal * (clk_get_rate(hqvdp->clk) / 1000000);
+	lfw /= max(layer->src_w, layer->dst_w) * layer->mode->clock / 1000;
+
+	inv_zy = DIV_ROUND_UP(layer->src_h, layer->dst_h);
+
+	return inv_zy <= lfw;
+}
+
+/**
+ * sti_hqvdp_prepare_layer
+ * @layer: hqvdp layer
+ * @first_prepare: true if it is the first time this function is called
+ *
+ * Prepares a command for the firmware
+ *
+ * RETURNS:
+ * 0 on success.
+ */
+static int sti_hqvdp_prepare_layer(struct sti_layer *layer, bool first_prepare)
+{
+	struct sti_hqvdp *hqvdp = to_sti_hqvdp(layer);
+	struct sti_hqvdp_cmd *cmd;
+	int scale_h, scale_v;
+	int cmd_offset;
+
+	dev_dbg(hqvdp->dev, "%s %s\n", __func__, sti_layer_to_str(layer));
+
+	/* prepare and commit VID plane */
+	hqvdp->vid_plane->funcs->update_plane(hqvdp->vid_plane,
+					layer->crtc, layer->fb,
+					layer->dst_x, layer->dst_y,
+					layer->dst_w, layer->dst_h,
+					layer->src_x, layer->src_y,
+					layer->src_w, layer->src_h);
+
+	cmd_offset = sti_hqvdp_get_free_cmd(hqvdp);
+	if (cmd_offset == -1) {
+		DRM_ERROR("No available hqvdp_cmd now\n");
+		return -EBUSY;
+	}
+	cmd = hqvdp->hqvdp_cmd + cmd_offset;
+
+	if (!sti_hqvdp_check_hw_scaling(layer)) {
+		DRM_ERROR("Scaling beyond HW capabilities\n");
+		return -EINVAL;
+	}
+
+	/* Static parameters, defaulting to progressive mode */
+	cmd->top.config = TOP_CONFIG_PROGRESSIVE;
+	cmd->top.mem_format = TOP_MEM_FORMAT_DFLT;
+	cmd->hvsrc.param_ctrl = HVSRC_PARAM_CTRL_DFLT;
+	cmd->csdi.config = CSDI_CONFIG_PROG;
+
+	/* VC1RE, FMD bypassed : keep everything set to 0
+	 * IQI/P2I bypassed */
+	cmd->iqi.config = IQI_CONFIG_DFLT;
+	cmd->iqi.con_bri = IQI_CON_BRI_DFLT;
+	cmd->iqi.sat_gain = IQI_SAT_GAIN_DFLT;
+	cmd->iqi.pxf_conf = IQI_PXF_CONF_DFLT;
+
+	/* Buffer planes address */
+	cmd->top.current_luma = (u32) layer->paddr + layer->offsets[0];
+	cmd->top.current_chroma = (u32) layer->paddr + layer->offsets[1];
+
+	/* Pitches */
+	cmd->top.luma_processed_pitch = cmd->top.luma_src_pitch =
+			layer->pitches[0];
+	cmd->top.chroma_processed_pitch = cmd->top.chroma_src_pitch =
+			layer->pitches[1];
+
+	/* Input / output size
+	 * Align to upper even value */
+	layer->dst_w = ALIGN(layer->dst_w, 2);
+	layer->dst_h = ALIGN(layer->dst_h, 2);
+
+	if ((layer->src_w > MAX_WIDTH) || (layer->src_w < MIN_WIDTH) ||
+	    (layer->src_h > MAX_HEIGHT) || (layer->src_h < MIN_HEIGHT) ||
+	    (layer->dst_w > MAX_WIDTH) || (layer->dst_w < MIN_WIDTH) ||
+	    (layer->dst_h > MAX_HEIGHT) || (layer->dst_h < MIN_HEIGHT)) {
+		DRM_ERROR("Invalid in/out size %dx%d -> %dx%d\n",
+				layer->src_w, layer->src_h,
+				layer->dst_w, layer->dst_h);
+		return -EINVAL;
+	}
+	cmd->top.input_viewport_size = cmd->top.input_frame_size =
+			layer->src_h << 16 | layer->src_w;
+	cmd->hvsrc.output_picture_size = layer->dst_h << 16 | layer->dst_w;
+	cmd->top.input_viewport_ori = layer->src_y << 16 | layer->src_x;
+
+	/* Handle interlaced */
+	if (layer->fb->flags & DRM_MODE_FB_INTERLACED) {
+		/* Top field to display */
+		cmd->top.config = TOP_CONFIG_INTER_TOP;
+
+		/* Update pitches and vert size */
+		cmd->top.input_frame_size = (layer->src_h / 2) << 16 |
+					     layer->src_w;
+		cmd->top.luma_processed_pitch *= 2;
+		cmd->top.luma_src_pitch *= 2;
+		cmd->top.chroma_processed_pitch *= 2;
+		cmd->top.chroma_src_pitch *= 2;
+
+		/* Enable directional deinterlacing processing */
+		cmd->csdi.config = CSDI_CONFIG_INTER_DIR;
+		cmd->csdi.config2 = CSDI_CONFIG2_DFLT;
+		cmd->csdi.dcdi_config = CSDI_DCDI_CONFIG_DFLT;
+	}
+
+	/* Update hvsrc lut coef */
+	scale_h = SCALE_FACTOR * layer->dst_w / layer->src_w;
+	sti_hqvdp_update_hvsrc(HVSRC_HORI, scale_h, &cmd->hvsrc);
+
+	scale_v = SCALE_FACTOR * layer->dst_h / layer->src_h;
+	sti_hqvdp_update_hvsrc(HVSRC_VERT, scale_v, &cmd->hvsrc);
+
+	if (first_prepare) {
+		/* Prevent VTG shutdown */
+		if (clk_prepare_enable(hqvdp->clk_pix_main)) {
+			DRM_ERROR("Failed to prepare/enable pix main clk\n");
+			return -ENXIO;
+		}
+
+		/* Register VTG Vsync callback to handle bottom fields */
+		if ((layer->fb->flags & DRM_MODE_FB_INTERLACED) &&
+				sti_vtg_register_client(hqvdp->vtg,
+					&hqvdp->vtg_nb, layer->mixer_id)) {
+			DRM_ERROR("Cannot register VTG notifier\n");
+			return -ENXIO;
+		}
+	}
+
+	return 0;
+}
+
+static int sti_hqvdp_commit_layer(struct sti_layer *layer)
+{
+	struct sti_hqvdp *hqvdp = to_sti_hqvdp(layer);
+	int cmd_offset;
+
+	dev_dbg(hqvdp->dev, "%s %s\n", __func__, sti_layer_to_str(layer));
+
+	cmd_offset = sti_hqvdp_get_free_cmd(hqvdp);
+	if (cmd_offset == -1) {
+		DRM_ERROR("No available hqvdp_cmd now\n");
+		return -EBUSY;
+	}
+
+	writel(hqvdp->hqvdp_cmd_paddr + cmd_offset,
+			hqvdp->regs + HQVDP_MBX_NEXT_CMD);
+
+	hqvdp->curr_field_count++;
+
+	/* Interlaced : get ready to display the bottom field at next Vsync */
+	if (layer->fb->flags & DRM_MODE_FB_INTERLACED)
+		hqvdp->btm_field_pending = true;
+
+	dev_dbg(hqvdp->dev, "%s Posted command:0x%x\n",
+			__func__, hqvdp->hqvdp_cmd_paddr + cmd_offset);
+
+	return 0;
+}
+
+static int sti_hqvdp_disable_layer(struct sti_layer *layer)
+{
+	struct sti_hqvdp *hqvdp = to_sti_hqvdp(layer);
+	int i;
+
+	DRM_DEBUG_DRIVER("%s\n", sti_layer_to_str(layer));
+
+	/* Unregister VTG Vsync callback */
+	if ((layer->fb->flags & DRM_MODE_FB_INTERLACED) &&
+		sti_vtg_unregister_client(hqvdp->vtg, &hqvdp->vtg_nb))
+		DRM_DEBUG_DRIVER("Warning: cannot unregister VTG notifier\n");
+
+	/* Set next cmd to NULL */
+	writel(0, hqvdp->regs + HQVDP_MBX_NEXT_CMD);
+
+	for (i = 0; i < POLL_MAX_ATTEMPT; i++) {
+		if (readl(hqvdp->regs + HQVDP_MBX_INFO_XP70)
+				& INFO_XP70_FW_READY)
+			break;
+		msleep(POLL_DELAY_MS);
+	}
+
+	/* VTG can stop now */
+	clk_disable_unprepare(hqvdp->clk_pix_main);
+
+	if (i == POLL_MAX_ATTEMPT) {
+		DRM_ERROR("XP70 could not revert to idle\n");
+		return -ENXIO;
+	}
+
+	/* disable VID plane */
+	hqvdp->vid_plane->funcs->disable_plane(hqvdp->vid_plane);
+
+	return 0;
+}
+
+/**
+ * sti_hqvdp_vtg_cb
+ * @nb: notifier block
+ * @evt: event message
+ * @data: private data
+ *
+ * Handle VTG Vsync event, display pending bottom field
+ *
+ * RETURNS:
+ * 0 on success.
+ */
+int sti_hqvdp_vtg_cb(struct notifier_block *nb, unsigned long evt, void *data)
+{
+	struct sti_hqvdp *hqvdp = container_of(nb, struct sti_hqvdp, vtg_nb);
+	int btm_cmd_offset, top_cmd_offset;
+	struct sti_hqvdp_cmd *btm_cmd, *top_cmd;
+
+	if ((evt != VTG_TOP_FIELD_EVENT) && (evt != VTG_BOTTOM_FIELD_EVENT)) {
+		DRM_DEBUG_DRIVER("Unknown event\n");
+		return 0;
+	}
+
+	if (hqvdp->btm_field_pending) {
+		/* Create the btm field command from the current one */
+		btm_cmd_offset = sti_hqvdp_get_free_cmd(hqvdp);
+		top_cmd_offset = sti_hqvdp_get_curr_cmd(hqvdp);
+		if ((btm_cmd_offset == -1) || (top_cmd_offset == -1)) {
+			DRM_ERROR("Cannot get cmds, skip btm field\n");
+			return -EBUSY;
+		}
+
+		btm_cmd = hqvdp->hqvdp_cmd + btm_cmd_offset;
+		top_cmd = hqvdp->hqvdp_cmd + top_cmd_offset;
+
+		memcpy(btm_cmd, top_cmd, sizeof(*btm_cmd));
+
+		btm_cmd->top.config = TOP_CONFIG_INTER_BTM;
+		btm_cmd->top.current_luma +=
+				btm_cmd->top.luma_src_pitch / 2;
+		btm_cmd->top.current_chroma +=
+				btm_cmd->top.chroma_src_pitch / 2;
+
+		/* Post the command to mailbox */
+		writel(hqvdp->hqvdp_cmd_paddr + btm_cmd_offset,
+				hqvdp->regs + HQVDP_MBX_NEXT_CMD);
+
+		hqvdp->curr_field_count++;
+		hqvdp->btm_field_pending = false;
+
+		dev_dbg(hqvdp->dev, "%s Posted command:0x%x\n",
+			__func__, hqvdp->hqvdp_cmd_paddr + btm_cmd_offset);
+	}
+
+	return 0;
+}
+
+static struct drm_plane *sti_hqvdp_find_vid(struct drm_device *dev, int id)
+{
+	struct drm_plane *plane;
+
+	list_for_each_entry(plane, &dev->mode_config.plane_list, head) {
+		struct sti_layer *layer = to_sti_layer(plane);
+
+		if (layer->desc == id)
+			return plane;
+	}
+
+	return NULL;
+}
+
+static void sti_hqvd_init(struct sti_layer *layer)
+{
+	struct sti_hqvdp *hqvdp = to_sti_hqvdp(layer);
+	int size;
+
+	/* find the plane matching with vid 0 */
+	hqvdp->vid_plane = sti_hqvdp_find_vid(hqvdp->drm_dev, STI_VID_0);
+	if (!hqvdp->vid_plane) {
+		DRM_ERROR("Cannot find Main video layer\n");
+		return;
+	}
+
+	hqvdp->vtg_nb.notifier_call = sti_hqvdp_vtg_cb;
+
+	/* Allocate memory for the VDP commands */
+	size = NB_VDP_CMD * sizeof(struct sti_hqvdp_cmd);
+	hqvdp->hqvdp_cmd = dma_alloc_writecombine(hqvdp->dev, size,
+					 &hqvdp->hqvdp_cmd_paddr,
+					 GFP_KERNEL | GFP_DMA);
+	if (!hqvdp->hqvdp_cmd) {
+		DRM_ERROR("Failed to allocate memory for VDP cmd\n");
+		return;
+	}
+
+	memset(hqvdp->hqvdp_cmd, 0, size);
+}
+
+static const struct sti_layer_funcs hqvdp_ops = {
+	.get_formats = sti_hqvdp_get_formats,
+	.get_nb_formats = sti_hqvdp_get_nb_formats,
+	.init = sti_hqvd_init,
+	.prepare = sti_hqvdp_prepare_layer,
+	.commit = sti_hqvdp_commit_layer,
+	.disable = sti_hqvdp_disable_layer,
+};
+
+struct sti_layer *sti_hqvdp_create(struct device *dev)
+{
+	struct sti_hqvdp *hqvdp = dev_get_drvdata(dev);
+
+	hqvdp->layer.ops = &hqvdp_ops;
+
+	return &hqvdp->layer;
+}
+EXPORT_SYMBOL(sti_hqvdp_create);
+
+static void sti_hqvdp_init_plugs(struct sti_hqvdp *hqvdp)
+{
+	/* Configure Plugs (same for RD & WR) */
+	writel(PLUG_PAGE_SIZE_256, hqvdp->regs + HQVDP_RD_PLUG_PAGE_SIZE);
+	writel(PLUG_MIN_OPC_8, hqvdp->regs + HQVDP_RD_PLUG_MIN_OPC);
+	writel(PLUG_MAX_OPC_64, hqvdp->regs + HQVDP_RD_PLUG_MAX_OPC);
+	writel(PLUG_MAX_CHK_2X, hqvdp->regs + HQVDP_RD_PLUG_MAX_CHK);
+	writel(PLUG_MAX_MSG_1X, hqvdp->regs + HQVDP_RD_PLUG_MAX_MSG);
+	writel(PLUG_MIN_SPACE_1, hqvdp->regs + HQVDP_RD_PLUG_MIN_SPACE);
+	writel(PLUG_CONTROL_ENABLE, hqvdp->regs + HQVDP_RD_PLUG_CONTROL);
+
+	writel(PLUG_PAGE_SIZE_256, hqvdp->regs + HQVDP_WR_PLUG_PAGE_SIZE);
+	writel(PLUG_MIN_OPC_8, hqvdp->regs + HQVDP_WR_PLUG_MIN_OPC);
+	writel(PLUG_MAX_OPC_64, hqvdp->regs + HQVDP_WR_PLUG_MAX_OPC);
+	writel(PLUG_MAX_CHK_2X, hqvdp->regs + HQVDP_WR_PLUG_MAX_CHK);
+	writel(PLUG_MAX_MSG_1X, hqvdp->regs + HQVDP_WR_PLUG_MAX_MSG);
+	writel(PLUG_MIN_SPACE_1, hqvdp->regs + HQVDP_WR_PLUG_MIN_SPACE);
+	writel(PLUG_CONTROL_ENABLE, hqvdp->regs + HQVDP_WR_PLUG_CONTROL);
+}
+
+/**
+ * sti_hqvdp_start_xp70
+ * @firmware: firmware found
+ * @ctxt:     hqvdp structure
+ *
+ * Run the xP70 initialization sequence
+ */
+static void sti_hqvdp_start_xp70(const struct firmware *firmware, void *ctxt)
+{
+	struct sti_hqvdp *hqvdp = ctxt;
+	u32 *fw_rd_plug, *fw_wr_plug, *fw_pmem, *fw_dmem;
+	u8 *data;
+	int i;
+	struct fw_header {
+		int rd_size;
+		int wr_size;
+		int pmem_size;
+		int dmem_size;
+	} *header;
+
+	DRM_DEBUG_DRIVER("\n");
+	/* Check firmware parts */
+	if (!firmware) {
+		DRM_ERROR("Firmware not available\n");
+		return;
+	}
+
+	header = (struct fw_header *) firmware->data;
+	if (firmware->size < sizeof(*header)) {
+		DRM_ERROR("Invalid firmware size (%zu)\n", firmware->size);
+		goto out;
+	}
+	if ((sizeof(*header) + header->rd_size + header->wr_size +
+		header->pmem_size + header->dmem_size) != firmware->size) {
+		DRM_ERROR("Invalid fmw structure (%zu+%d+%d+%d+%d != %zu)\n",
+			   sizeof(*header), header->rd_size, header->wr_size,
+			   header->pmem_size, header->dmem_size,
+			   firmware->size);
+		goto out;
+	}
+
+	data = (u8 *) firmware->data;
+	data += sizeof(*header);
+	fw_rd_plug = (void *) data;
+	data += header->rd_size;
+	fw_wr_plug = (void *) data;
+	data += header->wr_size;
+	fw_pmem = (void *) data;
+	data += header->pmem_size;
+	fw_dmem = (void *) data;
+
+	/* Enable clock */
+	if (clk_prepare_enable(hqvdp->clk))
+		DRM_ERROR("Failed to prepare/enable HQVDP clk\n");
+
+	/* Reset */
+	writel(SW_RESET_CTRL_FULL, hqvdp->regs + HQVDP_MBX_SW_RESET_CTRL);
+
+	for (i = 0; i < POLL_MAX_ATTEMPT; i++) {
+		if (readl(hqvdp->regs + HQVDP_MBX_STARTUP_CTRL1)
+				& STARTUP_CTRL1_RST_DONE)
+			break;
+		msleep(POLL_DELAY_MS);
+	}
+	if (i == POLL_MAX_ATTEMPT) {
+		DRM_ERROR("Could not reset\n");
+		goto out;
+	}
+
+	/* Init Read & Write plugs */
+	for (i = 0; i < header->rd_size / 4; i++)
+		writel(fw_rd_plug[i], hqvdp->regs + HQVDP_RD_PLUG + i * 4);
+	for (i = 0; i < header->wr_size / 4; i++)
+		writel(fw_wr_plug[i], hqvdp->regs + HQVDP_WR_PLUG + i * 4);
+
+	sti_hqvdp_init_plugs(hqvdp);
+
+	/* Authorize Idle Mode */
+	writel(STARTUP_CTRL1_AUTH_IDLE, hqvdp->regs + HQVDP_MBX_STARTUP_CTRL1);
+
+	/* Prevent VTG interruption during the boot */
+	writel(SOFT_VSYNC_SW_CTRL_IRQ, hqvdp->regs + HQVDP_MBX_SOFT_VSYNC);
+	writel(0, hqvdp->regs + HQVDP_MBX_NEXT_CMD);
+
+	/* Download PMEM & DMEM */
+	for (i = 0; i < header->pmem_size / 4; i++)
+		writel(fw_pmem[i], hqvdp->regs + HQVDP_PMEM + i * 4);
+	for (i = 0; i < header->dmem_size / 4; i++)
+		writel(fw_dmem[i], hqvdp->regs + HQVDP_DMEM + i * 4);
+
+	/* Enable fetch */
+	writel(STARTUP_CTRL2_FETCH_EN, hqvdp->regs + HQVDP_MBX_STARTUP_CTRL2);
+
+	/* Wait end of boot */
+	for (i = 0; i < POLL_MAX_ATTEMPT; i++) {
+		if (readl(hqvdp->regs + HQVDP_MBX_INFO_XP70)
+				& INFO_XP70_FW_READY)
+			break;
+		msleep(POLL_DELAY_MS);
+	}
+	if (i == POLL_MAX_ATTEMPT) {
+		DRM_ERROR("Could not boot\n");
+		goto out;
+	}
+
+	/* Launch Vsync */
+	writel(SOFT_VSYNC_HW, hqvdp->regs + HQVDP_MBX_SOFT_VSYNC);
+
+	DRM_INFO("HQVDP XP70 started\n");
+out:
+	release_firmware(firmware);
+}
+
+int sti_hqvdp_bind(struct device *dev, struct device *master, void *data)
+{
+	struct sti_hqvdp *hqvdp = dev_get_drvdata(dev);
+	struct drm_device *drm_dev = data;
+	struct sti_layer *layer;
+	int err;
+
+	DRM_DEBUG_DRIVER("\n");
+
+	hqvdp->drm_dev = drm_dev;
+
+	/* Request for firmware */
+	err = request_firmware_nowait(THIS_MODULE, FW_ACTION_HOTPLUG,
+				HQVDP_FMW_NAME, hqvdp->dev,
+				GFP_KERNEL, hqvdp, sti_hqvdp_start_xp70);
+	if (err) {
+		DRM_ERROR("Can't get HQVDP firmware\n");
+		return err;
+	}
+
+	layer = sti_layer_create(hqvdp->dev, STI_HQVDP_0, hqvdp->regs);
+	if (!layer) {
+		DRM_ERROR("Can't create HQVDP plane\n");
+		return -ENOMEM;
+	}
+
+	sti_drm_plane_init(drm_dev, layer, 1, DRM_PLANE_TYPE_OVERLAY);
+
+	return 0;
+}
+
+static void sti_hqvdp_unbind(struct device *dev,
+		struct device *master, void *data)
+{
+	/* do nothing */
+}
+
+static const struct component_ops sti_hqvdp_ops = {
+	.bind = sti_hqvdp_bind,
+	.unbind = sti_hqvdp_unbind,
+};
+
+static int sti_hqvdp_probe(struct platform_device *pdev)
+{
+	struct device *dev = &pdev->dev;
+	struct device_node *vtg_np;
+	struct sti_hqvdp *hqvdp;
+	struct resource *res;
+
+	DRM_DEBUG_DRIVER("\n");
+
+	hqvdp = devm_kzalloc(dev, sizeof(*hqvdp), GFP_KERNEL);
+	if (!hqvdp) {
+		DRM_ERROR("Failed to allocate HQVDP context\n");
+		return -ENOMEM;
+	}
+
+	hqvdp->dev = dev;
+
+	/* Get Memory resources */
+	res = platform_get_resource(pdev, IORESOURCE_MEM, 0);
+	if (res == NULL) {
+		DRM_ERROR("Get memory resource failed\n");
+		return -ENXIO;
+	}
+	hqvdp->regs = devm_ioremap(dev, res->start, resource_size(res));
+	if (hqvdp->regs == NULL) {
+		DRM_ERROR("Register mapping failed\n");
+		return -ENXIO;
+	}
+
+	/* Get clock resources */
+	hqvdp->clk = devm_clk_get(dev, "hqvdp");
+	hqvdp->clk_pix_main = devm_clk_get(dev, "pix_main");
+	if (IS_ERR(hqvdp->clk) || IS_ERR(hqvdp->clk_pix_main)) {
+		DRM_ERROR("Cannot get clocks\n");
+		return -ENXIO;
+	}
+
+	/* Get reset resources */
+	hqvdp->reset = devm_reset_control_get(dev, "hqvdp");
+	if (!IS_ERR(hqvdp->reset))
+		reset_control_deassert(hqvdp->reset);
+
+	vtg_np = of_parse_phandle(pdev->dev.of_node, "st,vtg", 0);
+	if (vtg_np)
+		hqvdp->vtg = of_vtg_find(vtg_np);
+
+	platform_set_drvdata(pdev, hqvdp);
+
+	return component_add(&pdev->dev, &sti_hqvdp_ops);
+}
+
+static int sti_hqvdp_remove(struct platform_device *pdev)
+{
+	component_del(&pdev->dev, &sti_hqvdp_ops);
+	return 0;
+}
+
+static const struct of_device_id hqvdp_of_match[] = {
+	{ .compatible = "st,stih407-hqvdp", },
+	{ /* end node */ }
+};
+MODULE_DEVICE_TABLE(of, hqvdp_of_match);
+
+struct platform_driver sti_hqvdp_driver = {
+	.driver = {
+		.name = "sti-hqvdp",
+		.owner = THIS_MODULE,
+		.of_match_table = hqvdp_of_match,
+	},
+	.probe = sti_hqvdp_probe,
+	.remove = sti_hqvdp_remove,
+};
+
+module_platform_driver(sti_hqvdp_driver);
+
+MODULE_AUTHOR("Benjamin Gaignard <benjamin.gaignard@st.com>");
+MODULE_DESCRIPTION("STMicroelectronics SoC DRM driver");
+MODULE_LICENSE("GPL");
diff --git a/drivers/gpu/drm/sti/sti_hqvdp.h b/drivers/gpu/drm/sti/sti_hqvdp.h
new file mode 100644
index 000000000000..cd5ecd0a6dea
--- /dev/null
+++ b/drivers/gpu/drm/sti/sti_hqvdp.h
@@ -0,0 +1,12 @@
+/*
+ * Copyright (C) STMicroelectronics SA 2014
+ * Authors: Fabien Dessenne <fabien.dessenne@st.com> for STMicroelectronics.
+ * License terms:  GNU General Public License (GPL), version 2
+ */
+
+#ifndef _STI_HQVDP_H_
+#define _STI_HQVDP_H_
+
+struct sti_layer *sti_hqvdp_create(struct device *dev);
+
+#endif
diff --git a/drivers/gpu/drm/sti/sti_hqvdp_lut.h b/drivers/gpu/drm/sti/sti_hqvdp_lut.h
new file mode 100644
index 000000000000..619af7f4384e
--- /dev/null
+++ b/drivers/gpu/drm/sti/sti_hqvdp_lut.h
@@ -0,0 +1,373 @@
+/*
+ * Copyright (C) STMicroelectronics SA 2014
+ * Authors: Fabien Dessenne <fabien.dessenne@st.com> for STMicroelectronics.
+ * License terms:  GNU General Public License (GPL), version 2
+ */
+
+#ifndef _STI_HQVDP_LUT_H_
+#define _STI_HQVDP_LUT_H_
+
+#define NB_COEF                 128
+
+#define SHIFT_LUT_A_LEGACY      8
+#define SHIFT_LUT_B             8
+#define SHIFT_LUT_C_Y_LEGACY    8
+#define SHIFT_LUT_C_C_LEGACY    8
+#define SHIFT_LUT_D_Y_LEGACY    8
+#define SHIFT_LUT_D_C_LEGACY    8
+#define SHIFT_LUT_E_Y_LEGACY    8
+#define SHIFT_LUT_E_C_LEGACY    8
+#define SHIFT_LUT_F_Y_LEGACY    8
+#define SHIFT_LUT_F_C_LEGACY    8
+
+static const u32 coef_lut_a_legacy[NB_COEF] = {
+	0x0000ffff, 0x00010000, 0x000100ff, 0x00000000,
+	0x00000000, 0x00050000, 0xfffc00ff, 0x00000000,
+	0x00000000, 0x00090000, 0xfff900fe, 0x00000000,
+	0x00000000, 0x0010ffff, 0xfff600fb, 0x00000000,
+	0x00000000, 0x0017fffe, 0xfff400f7, 0x00000000,
+	0x00000000, 0x001ffffd, 0xfff200f2, 0x00000000,
+	0x00000000, 0x0027fffc, 0xfff100ec, 0x00000000,
+	0x00000000, 0x0030fffb, 0xfff000e5, 0x00000000,
+	0x00000000, 0x003afffa, 0xffee00de, 0x00000000,
+	0x00000000, 0x0044fff9, 0xffed00d6, 0x00000000,
+	0x00000000, 0x004efff8, 0xffed00cd, 0x00000000,
+	0x00000000, 0x0059fff6, 0xffed00c4, 0x00000000,
+	0x00000000, 0x0064fff5, 0xffed00ba, 0x00000000,
+	0x00000000, 0x006ffff3, 0xffee00b0, 0x00000000,
+	0x00000000, 0x007afff2, 0xffee00a6, 0x00000000,
+	0x00000000, 0x0085fff1, 0xffef009b, 0x00000000,
+	0x00000000, 0x0090fff0, 0xfff00090, 0x00000000,
+	0x00000000, 0x009bffef, 0xfff10085, 0x00000000,
+	0x00000000, 0x00a6ffee, 0xfff2007a, 0x00000000,
+	0x00000000, 0x00b0ffee, 0xfff3006f, 0x00000000,
+	0x00000000, 0x00baffed, 0xfff50064, 0x00000000,
+	0x00000000, 0x00c4ffed, 0xfff60059, 0x00000000,
+	0x00000000, 0x00cdffed, 0xfff8004e, 0x00000000,
+	0x00000000, 0x00d6ffed, 0xfff90044, 0x00000000,
+	0x00000000, 0x00deffee, 0xfffa003a, 0x00000000,
+	0x00000000, 0x00e5fff0, 0xfffb0030, 0x00000000,
+	0x00000000, 0x00ecfff1, 0xfffc0027, 0x00000000,
+	0x00000000, 0x00f2fff2, 0xfffd001f, 0x00000000,
+	0x00000000, 0x00f7fff4, 0xfffe0017, 0x00000000,
+	0x00000000, 0x00fbfff6, 0xffff0010, 0x00000000,
+	0x00000000, 0x00fefff9, 0x00000009, 0x00000000,
+	0x00000000, 0x00fffffc, 0x00000005, 0x00000000
+};
+
+static const u32 coef_lut_b[NB_COEF] = {
+	0x00000000, 0x00000000, 0x00000100, 0x00000000,
+	0x00000000, 0x00000000, 0x00000100, 0x00000000,
+	0x00000000, 0x00000000, 0x00000100, 0x00000000,
+	0x00000000, 0x00000000, 0x00000100, 0x00000000,
+	0x00000000, 0x00000000, 0x00000100, 0x00000000,
+	0x00000000, 0x00000000, 0x00000100, 0x00000000,
+	0x00000000, 0x00000000, 0x00000100, 0x00000000,
+	0x00000000, 0x00000000, 0x00000100, 0x00000000,
+	0x00000000, 0x00000000, 0x00000100, 0x00000000,
+	0x00000000, 0x00000000, 0x00000100, 0x00000000,
+	0x00000000, 0x00000000, 0x00000100, 0x00000000,
+	0x00000000, 0x00000000, 0x00000100, 0x00000000,
+	0x00000000, 0x00000000, 0x00000100, 0x00000000,
+	0x00000000, 0x00000000, 0x00000100, 0x00000000,
+	0x00000000, 0x00000000, 0x00000100, 0x00000000,
+	0x00000000, 0x00000000, 0x00000100, 0x00000000,
+	0x00000000, 0x00000000, 0x00000100, 0x00000000,
+	0x00000000, 0x00000000, 0x00000100, 0x00000000,
+	0x00000000, 0x00000000, 0x00000100, 0x00000000,
+	0x00000000, 0x00000000, 0x00000100, 0x00000000,
+	0x00000000, 0x00000000, 0x00000100, 0x00000000,
+	0x00000000, 0x00000000, 0x00000100, 0x00000000,
+	0x00000000, 0x00000000, 0x00000100, 0x00000000,
+	0x00000000, 0x00000000, 0x00000100, 0x00000000,
+	0x00000000, 0x00000000, 0x00000100, 0x00000000,
+	0x00000000, 0x00000000, 0x00000100, 0x00000000,
+	0x00000000, 0x00000000, 0x00000100, 0x00000000,
+	0x00000000, 0x00000000, 0x00000100, 0x00000000,
+	0x00000000, 0x00000000, 0x00000100, 0x00000000,
+	0x00000000, 0x00000000, 0x00000100, 0x00000000,
+	0x00000000, 0x00000000, 0x00000100, 0x00000000,
+	0x00000000, 0x00000000, 0x00000100, 0x00000000
+};
+
+static const u32 coef_lut_c_y_legacy[NB_COEF] = {
+	0x00060004, 0x0038ffe1, 0x003800be, 0x0006ffe1,
+	0x00050005, 0x0042ffe1, 0x003800b3, 0x0007ffe1,
+	0x00040006, 0x0046ffe1, 0x003300b2, 0x0008ffe2,
+	0x00030007, 0x004cffe1, 0x002e00b1, 0x0008ffe2,
+	0x00020006, 0x0051ffe2, 0x002900b0, 0x0009ffe3,
+	0x00010008, 0x0056ffe2, 0x002400ae, 0x0009ffe4,
+	0xffff0008, 0x005cffe3, 0x001f00ad, 0x000affe4,
+	0xfffe0008, 0x0062ffe4, 0x001a00ab, 0x000affe5,
+	0xfffd000a, 0x0066ffe5, 0x001500a8, 0x000bffe6,
+	0xfffc0009, 0x006bffe7, 0x001100a5, 0x000bffe8,
+	0xfffa000a, 0x0070ffe8, 0x000d00a3, 0x000bffe9,
+	0xfff9000b, 0x0076ffea, 0x0008009f, 0x000bffea,
+	0xfff7000b, 0x007affec, 0x0005009b, 0x000cffec,
+	0xfff6000b, 0x007effef, 0x00010098, 0x000cffed,
+	0xfff4000b, 0x0084fff1, 0xfffd0095, 0x000cffee,
+	0xfff3000b, 0x0088fff4, 0xfffa0090, 0x000cfff0,
+	0xfff1000b, 0x008dfff7, 0xfff7008d, 0x000bfff1,
+	0xfff0000c, 0x0090fffa, 0xfff40088, 0x000bfff3,
+	0xffee000c, 0x0095fffd, 0xfff10084, 0x000bfff4,
+	0xffed000c, 0x00980001, 0xffef007e, 0x000bfff6,
+	0xffec000c, 0x009b0005, 0xffec007a, 0x000bfff7,
+	0xffea000b, 0x009f0008, 0xffea0076, 0x000bfff9,
+	0xffe9000b, 0x00a3000d, 0xffe80070, 0x000afffa,
+	0xffe8000b, 0x00a50011, 0xffe7006b, 0x0009fffc,
+	0xffe6000b, 0x00a80015, 0xffe50066, 0x000afffd,
+	0xffe5000a, 0x00ab001a, 0xffe40062, 0x0008fffe,
+	0xffe4000a, 0x00ad001f, 0xffe3005c, 0x0008ffff,
+	0xffe40009, 0x00ae0024, 0xffe20056, 0x00080001,
+	0xffe30009, 0x00b00029, 0xffe20051, 0x00060002,
+	0xffe20008, 0x00b1002e, 0xffe1004c, 0x00070003,
+	0xffe20008, 0x00b20033, 0xffe10046, 0x00060004,
+	0xffe10007, 0x00b30038, 0xffe10042, 0x00050005
+};
+
+static const u32 coef_lut_c_c_legacy[NB_COEF] = {
+	0x0001fff3, 0x003afffb, 0x003a00a1, 0x0001fffb,
+	0x0001fff5, 0x0041fffb, 0x0038009a, 0x0001fffb,
+	0x0001fff5, 0x0046fffb, 0x00340099, 0x0001fffb,
+	0x0001fff7, 0x0049fffb, 0x00300098, 0x0001fffb,
+	0x0001fff9, 0x004cfffb, 0x002d0096, 0x0001fffb,
+	0x0001fffa, 0x004ffffc, 0x00290095, 0x0001fffb,
+	0x0001fff9, 0x0054fffd, 0x00250093, 0x0001fffc,
+	0x0001fffa, 0x0058fffd, 0x00220092, 0x0000fffc,
+	0x0001fffb, 0x005bfffe, 0x001f0090, 0x0000fffc,
+	0x0001fffd, 0x005effff, 0x001c008c, 0x0000fffd,
+	0x0001fffd, 0x00620000, 0x0019008a, 0x0000fffd,
+	0x0001fffe, 0x00660001, 0x00160088, 0xfffffffd,
+	0x0000fffe, 0x006a0003, 0x00130085, 0xfffffffe,
+	0x0000fffe, 0x006e0004, 0x00100083, 0xfffffffe,
+	0x0000fffe, 0x00710006, 0x000e007f, 0xffffffff,
+	0x0000fffe, 0x00750008, 0x000c007c, 0xfffeffff,
+	0xfffffffe, 0x0079000a, 0x000a0079, 0xfffeffff,
+	0xfffffffe, 0x007c000c, 0x00080075, 0xfffe0000,
+	0xffffffff, 0x007f000e, 0x00060071, 0xfffe0000,
+	0xfffeffff, 0x00830010, 0x0004006e, 0xfffe0000,
+	0xfffeffff, 0x00850013, 0x0003006a, 0xfffe0000,
+	0xfffdffff, 0x00880016, 0x00010066, 0xfffe0001,
+	0xfffd0000, 0x008a0019, 0x00000062, 0xfffd0001,
+	0xfffd0000, 0x008c001c, 0xffff005e, 0xfffd0001,
+	0xfffc0000, 0x0090001f, 0xfffe005b, 0xfffb0001,
+	0xfffc0000, 0x00920022, 0xfffd0058, 0xfffa0001,
+	0xfffc0001, 0x00930025, 0xfffd0054, 0xfff90001,
+	0xfffb0001, 0x00950029, 0xfffc004f, 0xfffa0001,
+	0xfffb0001, 0x0096002d, 0xfffb004c, 0xfff90001,
+	0xfffb0001, 0x00980030, 0xfffb0049, 0xfff70001,
+	0xfffb0001, 0x00990034, 0xfffb0046, 0xfff50001,
+	0xfffb0001, 0x009a0038, 0xfffb0041, 0xfff50001
+};
+
+static const u32 coef_lut_d_y_legacy[NB_COEF] = {
+	0xfff80009, 0x0046ffec, 0x004600a3, 0xfff8ffec,
+	0xfff70009, 0x004effed, 0x0044009d, 0xfff9ffeb,
+	0xfff6000a, 0x0052ffee, 0x003f009d, 0xfffaffea,
+	0xfff50009, 0x0057ffef, 0x003b009d, 0xfffbffe9,
+	0xfff50008, 0x005bfff0, 0x0037009c, 0xfffcffe9,
+	0xfff40008, 0x005ffff2, 0x0033009b, 0xfffcffe9,
+	0xfff30007, 0x0064fff3, 0x002f009b, 0xfffdffe8,
+	0xfff20007, 0x0068fff5, 0x002b0099, 0xfffeffe8,
+	0xfff10008, 0x006bfff7, 0x00270097, 0xffffffe8,
+	0xfff00007, 0x006ffff9, 0x00230097, 0xffffffe8,
+	0xffef0006, 0x0073fffb, 0x00200095, 0x0000ffe8,
+	0xffee0005, 0x0077fffe, 0x001c0093, 0x0000ffe9,
+	0xffee0005, 0x007a0000, 0x00180091, 0x0001ffe9,
+	0xffed0005, 0x007d0003, 0x0015008e, 0x0002ffe9,
+	0xffec0005, 0x00800006, 0x0012008b, 0x0002ffea,
+	0xffeb0004, 0x00840008, 0x000e008a, 0x0003ffea,
+	0xffeb0003, 0x0087000b, 0x000b0087, 0x0003ffeb,
+	0xffea0003, 0x008a000e, 0x00080084, 0x0004ffeb,
+	0xffea0002, 0x008b0012, 0x00060080, 0x0005ffec,
+	0xffe90002, 0x008e0015, 0x0003007d, 0x0005ffed,
+	0xffe90001, 0x00910018, 0x0000007a, 0x0005ffee,
+	0xffe90000, 0x0093001c, 0xfffe0077, 0x0005ffee,
+	0xffe80000, 0x00950020, 0xfffb0073, 0x0006ffef,
+	0xffe8ffff, 0x00970023, 0xfff9006f, 0x0007fff0,
+	0xffe8ffff, 0x00970027, 0xfff7006b, 0x0008fff1,
+	0xffe8fffe, 0x0099002b, 0xfff50068, 0x0007fff2,
+	0xffe8fffd, 0x009b002f, 0xfff30064, 0x0007fff3,
+	0xffe9fffc, 0x009b0033, 0xfff2005f, 0x0008fff4,
+	0xffe9fffc, 0x009c0037, 0xfff0005b, 0x0008fff5,
+	0xffe9fffb, 0x009d003b, 0xffef0057, 0x0009fff5,
+	0xffeafffa, 0x009d003f, 0xffee0052, 0x000afff6,
+	0xffebfff9, 0x009d0044, 0xffed004e, 0x0009fff7
+};
+
+static const u32 coef_lut_d_c_legacy[NB_COEF] = {
+	0xfffeffff, 0x003fffff, 0x003f0089, 0xfffeffff,
+	0xfffe0000, 0x00460000, 0x0042007d, 0xfffffffe,
+	0xfffe0000, 0x00490001, 0x003f007d, 0xfffffffd,
+	0xfffd0001, 0x004b0002, 0x003c007d, 0x0000fffc,
+	0xfffd0001, 0x004e0003, 0x0039007c, 0x0000fffc,
+	0xfffc0001, 0x00510005, 0x0036007c, 0x0000fffb,
+	0xfffc0001, 0x00540006, 0x0033007b, 0x0001fffa,
+	0xfffc0003, 0x00550008, 0x00310078, 0x0001fffa,
+	0xfffb0003, 0x00580009, 0x002e0078, 0x0001fffa,
+	0xfffb0002, 0x005b000b, 0x002b0077, 0x0002fff9,
+	0xfffa0003, 0x005e000d, 0x00280075, 0x0002fff9,
+	0xfffa0002, 0x0060000f, 0x00260074, 0x0002fff9,
+	0xfffa0004, 0x00610011, 0x00230072, 0x0002fff9,
+	0xfffa0004, 0x00640013, 0x00200070, 0x0002fff9,
+	0xfff90004, 0x00660015, 0x001e006e, 0x0003fff9,
+	0xfff90004, 0x00680017, 0x001c006c, 0x0003fff9,
+	0xfff90003, 0x006b0019, 0x0019006b, 0x0003fff9,
+	0xfff90003, 0x006c001c, 0x00170068, 0x0004fff9,
+	0xfff90003, 0x006e001e, 0x00150066, 0x0004fff9,
+	0xfff90002, 0x00700020, 0x00130064, 0x0004fffa,
+	0xfff90002, 0x00720023, 0x00110061, 0x0004fffa,
+	0xfff90002, 0x00740026, 0x000f0060, 0x0002fffa,
+	0xfff90002, 0x00750028, 0x000d005e, 0x0003fffa,
+	0xfff90002, 0x0077002b, 0x000b005b, 0x0002fffb,
+	0xfffa0001, 0x0078002e, 0x00090058, 0x0003fffb,
+	0xfffa0001, 0x00780031, 0x00080055, 0x0003fffc,
+	0xfffa0001, 0x007b0033, 0x00060054, 0x0001fffc,
+	0xfffb0000, 0x007c0036, 0x00050051, 0x0001fffc,
+	0xfffc0000, 0x007c0039, 0x0003004e, 0x0001fffd,
+	0xfffc0000, 0x007d003c, 0x0002004b, 0x0001fffd,
+	0xfffdffff, 0x007d003f, 0x00010049, 0x0000fffe,
+	0xfffeffff, 0x007d0042, 0x00000046, 0x0000fffe
+};
+
+static const u32 coef_lut_e_y_legacy[NB_COEF] = {
+	0xfff10001, 0x00490004, 0x00490083, 0xfff10004,
+	0xfff10000, 0x00500006, 0x004b007b, 0xfff10002,
+	0xfff10000, 0x00530007, 0x0048007b, 0xfff10001,
+	0xfff10000, 0x00550009, 0x0046007a, 0xfff10000,
+	0xfff1fffe, 0x0058000b, 0x0043007b, 0xfff2fffe,
+	0xfff1ffff, 0x005a000d, 0x0040007a, 0xfff2fffd,
+	0xfff1fffd, 0x005d000f, 0x003e007a, 0xfff2fffc,
+	0xfff1fffd, 0x005f0011, 0x003b0079, 0xfff3fffb,
+	0xfff1fffc, 0x00610013, 0x00390079, 0xfff3fffa,
+	0xfff1fffb, 0x00640015, 0x00360079, 0xfff3fff9,
+	0xfff1fffa, 0x00660017, 0x00340078, 0xfff4fff8,
+	0xfff1fffb, 0x00680019, 0x00310077, 0xfff4fff7,
+	0xfff2fff9, 0x006a001b, 0x002f0076, 0xfff5fff6,
+	0xfff2fff9, 0x006c001e, 0x002c0075, 0xfff5fff5,
+	0xfff2fff9, 0x006d0020, 0x002a0073, 0xfff6fff5,
+	0xfff3fff7, 0x00700022, 0x00270073, 0xfff6fff4,
+	0xfff3fff7, 0x00710025, 0x00250071, 0xfff7fff3,
+	0xfff4fff6, 0x00730027, 0x00220070, 0xfff7fff3,
+	0xfff5fff6, 0x0073002a, 0x0020006d, 0xfff9fff2,
+	0xfff5fff5, 0x0075002c, 0x001e006c, 0xfff9fff2,
+	0xfff6fff5, 0x0076002f, 0x001b006a, 0xfff9fff2,
+	0xfff7fff4, 0x00770031, 0x00190068, 0xfffbfff1,
+	0xfff8fff4, 0x00780034, 0x00170066, 0xfffafff1,
+	0xfff9fff3, 0x00790036, 0x00150064, 0xfffbfff1,
+	0xfffafff3, 0x00790039, 0x00130061, 0xfffcfff1,
+	0xfffbfff3, 0x0079003b, 0x0011005f, 0xfffdfff1,
+	0xfffcfff2, 0x007a003e, 0x000f005d, 0xfffdfff1,
+	0xfffdfff2, 0x007a0040, 0x000d005a, 0xfffffff1,
+	0xfffefff2, 0x007b0043, 0x000b0058, 0xfffefff1,
+	0x0000fff1, 0x007a0046, 0x00090055, 0x0000fff1,
+	0x0001fff1, 0x007b0048, 0x00070053, 0x0000fff1,
+	0x0002fff1, 0x007b004b, 0x00060050, 0x0000fff1
+};
+
+static const u32 coef_lut_e_c_legacy[NB_COEF] = {
+	0xfffa0001, 0x003f0010, 0x003f006d, 0xfffa0010,
+	0xfffb0002, 0x00440011, 0x00440062, 0xfffa000e,
+	0xfffb0001, 0x00460013, 0x00420062, 0xfffa000d,
+	0xfffb0000, 0x00480014, 0x00410062, 0xfffa000c,
+	0xfffb0001, 0x00490015, 0x003f0061, 0xfffb000b,
+	0xfffb0000, 0x004b0017, 0x003d0061, 0xfffb000a,
+	0xfffb0000, 0x004d0018, 0x003b0062, 0xfffb0008,
+	0xfffcffff, 0x004f001a, 0x00390061, 0xfffb0007,
+	0xfffc0000, 0x004f001c, 0x00380060, 0xfffb0006,
+	0xfffcffff, 0x0052001d, 0x00360060, 0xfffb0005,
+	0xfffdfffe, 0x0053001f, 0x00340060, 0xfffb0004,
+	0xfffdfffe, 0x00540021, 0x0032005e, 0xfffc0004,
+	0xfffeffff, 0x00550022, 0x0030005d, 0xfffc0003,
+	0xfffeffff, 0x00560024, 0x002f005c, 0xfffc0002,
+	0xfffffffd, 0x00580026, 0x002d005c, 0xfffc0001,
+	0xfffffffd, 0x005a0027, 0x002b005c, 0xfffc0000,
+	0x0000fffd, 0x005a0029, 0x0029005a, 0xfffd0000,
+	0x0000fffc, 0x005c002b, 0x0027005a, 0xfffdffff,
+	0x0001fffc, 0x005c002d, 0x00260058, 0xfffdffff,
+	0x0002fffc, 0x005c002f, 0x00240056, 0xfffffffe,
+	0x0003fffc, 0x005d0030, 0x00220055, 0xfffffffe,
+	0x0004fffc, 0x005e0032, 0x00210054, 0xfffefffd,
+	0x0004fffb, 0x00600034, 0x001f0053, 0xfffefffd,
+	0x0005fffb, 0x00600036, 0x001d0052, 0xfffffffc,
+	0x0006fffb, 0x00600038, 0x001c004f, 0x0000fffc,
+	0x0007fffb, 0x00610039, 0x001a004f, 0xfffffffc,
+	0x0008fffb, 0x0062003b, 0x0018004d, 0x0000fffb,
+	0x000afffb, 0x0061003d, 0x0017004b, 0x0000fffb,
+	0x000bfffb, 0x0061003f, 0x00150049, 0x0001fffb,
+	0x000cfffa, 0x00620041, 0x00140048, 0x0000fffb,
+	0x000dfffa, 0x00620042, 0x00130046, 0x0001fffb,
+	0x000efffa, 0x00620044, 0x00110044, 0x0002fffb
+};
+
+static const u32 coef_lut_f_y_legacy[NB_COEF] = {
+	0xfff6fff0, 0x00490012, 0x0049006e, 0xfff60012,
+	0xfff7fff1, 0x004e0013, 0x00490068, 0xfff60010,
+	0xfff7fff2, 0x004f0015, 0x00470067, 0xfff6000f,
+	0xfff7fff5, 0x004f0017, 0x00450065, 0xfff6000e,
+	0xfff8fff5, 0x00500018, 0x00440065, 0xfff6000c,
+	0xfff8fff6, 0x0051001a, 0x00420064, 0xfff6000b,
+	0xfff8fff6, 0x0052001c, 0x00400064, 0xfff6000a,
+	0xfff9fff6, 0x0054001d, 0x003e0064, 0xfff60008,
+	0xfff9fff8, 0x0054001f, 0x003c0063, 0xfff60007,
+	0xfffafff8, 0x00550021, 0x003a0062, 0xfff60006,
+	0xfffbfff7, 0x00560022, 0x00390062, 0xfff60005,
+	0xfffbfff8, 0x00570024, 0x00370061, 0xfff60004,
+	0xfffcfff8, 0x00580026, 0x00350060, 0xfff60003,
+	0xfffdfff8, 0x00590028, 0x0033005f, 0xfff60002,
+	0xfffdfff7, 0x005b002a, 0x0031005f, 0xfff60001,
+	0xfffefff7, 0x005c002c, 0x002f005e, 0xfff60000,
+	0xfffffff6, 0x005e002d, 0x002d005e, 0xfff6ffff,
+	0x0000fff6, 0x005e002f, 0x002c005c, 0xfff7fffe,
+	0x0001fff6, 0x005f0031, 0x002a005b, 0xfff7fffd,
+	0x0002fff6, 0x005f0033, 0x00280059, 0xfff8fffd,
+	0x0003fff6, 0x00600035, 0x00260058, 0xfff8fffc,
+	0x0004fff6, 0x00610037, 0x00240057, 0xfff8fffb,
+	0x0005fff6, 0x00620039, 0x00220056, 0xfff7fffb,
+	0x0006fff6, 0x0062003a, 0x00210055, 0xfff8fffa,
+	0x0007fff6, 0x0063003c, 0x001f0054, 0xfff8fff9,
+	0x0008fff6, 0x0064003e, 0x001d0054, 0xfff6fff9,
+	0x000afff6, 0x00640040, 0x001c0052, 0xfff6fff8,
+	0x000bfff6, 0x00640042, 0x001a0051, 0xfff6fff8,
+	0x000cfff6, 0x00650044, 0x00180050, 0xfff5fff8,
+	0x000efff6, 0x00650045, 0x0017004f, 0xfff5fff7,
+	0x000ffff6, 0x00670047, 0x0015004f, 0xfff2fff7,
+	0x0010fff6, 0x00680049, 0x0013004e, 0xfff1fff7
+};
+
+static const u32 coef_lut_f_c_legacy[NB_COEF] = {
+	0x0000fffb, 0x003a001a, 0x003a005d, 0x0000001a,
+	0x0001fffb, 0x003f001b, 0x00400051, 0x00000019,
+	0x0001fffc, 0x0040001c, 0x003f0051, 0x00000017,
+	0x0002fffb, 0x0042001d, 0x003e0051, 0xffff0016,
+	0x0002fffb, 0x0043001e, 0x003d0051, 0xffff0015,
+	0x0003fffc, 0x00430020, 0x003b0050, 0xffff0014,
+	0x0003fffb, 0x00450021, 0x003a0051, 0xfffe0013,
+	0x0004fffc, 0x00450022, 0x00390050, 0xfffe0012,
+	0x0005fffc, 0x00460023, 0x0038004f, 0xfffe0011,
+	0x0005fffb, 0x00480025, 0x00360050, 0xfffd0010,
+	0x0006fffc, 0x00480026, 0x0035004f, 0xfffd000f,
+	0x0006fffc, 0x00490027, 0x0034004f, 0xfffd000e,
+	0x0007fffd, 0x00490028, 0x0033004e, 0xfffd000d,
+	0x0008fffc, 0x004a002a, 0x0031004d, 0xfffd000d,
+	0x0009fffd, 0x004a002b, 0x0030004d, 0xfffc000c,
+	0x0009fffc, 0x004c002c, 0x002f004d, 0xfffc000b,
+	0x000afffc, 0x004c002e, 0x002e004c, 0xfffc000a,
+	0x000bfffc, 0x004d002f, 0x002c004c, 0xfffc0009,
+	0x000cfffc, 0x004d0030, 0x002b004a, 0xfffd0009,
+	0x000dfffd, 0x004d0031, 0x002a004a, 0xfffc0008,
+	0x000dfffd, 0x004e0033, 0x00280049, 0xfffd0007,
+	0x000efffd, 0x004f0034, 0x00270049, 0xfffc0006,
+	0x000ffffd, 0x004f0035, 0x00260048, 0xfffc0006,
+	0x0010fffd, 0x00500036, 0x00250048, 0xfffb0005,
+	0x0011fffe, 0x004f0038, 0x00230046, 0xfffc0005,
+	0x0012fffe, 0x00500039, 0x00220045, 0xfffc0004,
+	0x0013fffe, 0x0051003a, 0x00210045, 0xfffb0003,
+	0x0014ffff, 0x0050003b, 0x00200043, 0xfffc0003,
+	0x0015ffff, 0x0051003d, 0x001e0043, 0xfffb0002,
+	0x0016ffff, 0x0051003e, 0x001d0042, 0xfffb0002,
+	0x00170000, 0x0051003f, 0x001c0040, 0xfffc0001,
+	0x00190000, 0x00510040, 0x001b003f, 0xfffb0001
+};
+
+#endif
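Reviewer note on the coefficient tables above: the words look like two signed 16-bit values packed into each u32 (for example 0xffee000c would read as -18 and 12). The diff does not state the packing, so the sketch below is only an assumption about the layout, and `coef_unpack` is a hypothetical helper that is not part of the driver.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper, not part of the driver.
 * Assumption: each LUT word carries two signed 16-bit coefficients,
 * one in the upper half and one in the lower half.
 */
static void coef_unpack(uint32_t word, int16_t *hi, int16_t *lo)
{
	assert(hi && lo);

	/* Reinterpret each 16-bit half as a two's complement value. */
	*hi = (int16_t)(uint16_t)(word >> 16);
	*lo = (int16_t)(uint16_t)(word & 0xffffu);
}
```

Under that assumption, the first word of the last row of the luma table just before `coef_lut_c_c_legacy` (0xffe10007) would decode to -31 and 7.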
diff --git a/drivers/gpu/drm/sti/sti_layer.c b/drivers/gpu/drm/sti/sti_layer.c
index 06a587c4f1bb..899104f9d4bc 100644
--- a/drivers/gpu/drm/sti/sti_layer.c
+++ b/drivers/gpu/drm/sti/sti_layer.c
@@ -11,7 +11,9 @@
 #include <drm/drm_fb_cma_helper.h>
 
 #include "sti_compositor.h"
+#include "sti_cursor.h"
 #include "sti_gdp.h"
+#include "sti_hqvdp.h"
 #include "sti_layer.h"
 #include "sti_vid.h"
 
@@ -32,10 +34,13 @@ const char *sti_layer_to_str(struct sti_layer *layer)
 		return "VID1";
 	case STI_CURSOR:
 		return "CURSOR";
+	case STI_HQVDP_0:
+		return "HQVDP0";
 	default:
 		return "<UNKNOWN LAYER>";
 	}
 }
+EXPORT_SYMBOL(sti_layer_to_str);
 
 struct sti_layer *sti_layer_create(struct device *dev, int desc,
 				   void __iomem *baseaddr)
@@ -50,6 +55,12 @@ struct sti_layer *sti_layer_create(struct device *dev, int desc,
 	case STI_VID:
 		layer = sti_vid_create(dev);
 		break;
+	case STI_CUR:
+		layer = sti_cursor_create(dev);
+		break;
+	case STI_VDP:
+		layer = sti_hqvdp_create(dev);
+		break;
 	}
 
 	if (!layer) {
@@ -67,8 +78,11 @@ struct sti_layer *sti_layer_create(struct device *dev, int desc,
 
 	return layer;
 }
+EXPORT_SYMBOL(sti_layer_create);
 
-int sti_layer_prepare(struct sti_layer *layer, struct drm_framebuffer *fb,
+int sti_layer_prepare(struct sti_layer *layer,
+		      struct drm_crtc *crtc,
+		      struct drm_framebuffer *fb,
 		      struct drm_display_mode *mode, int mixer_id,
 		      int dest_x, int dest_y, int dest_w, int dest_h,
 		      int src_x, int src_y, int src_w, int src_h)
@@ -88,6 +102,7 @@ int sti_layer_prepare(struct sti_layer *layer, struct drm_framebuffer *fb,
 		return 1;
 	}
 
+	layer->crtc = crtc;
 	layer->fb = fb;
 	layer->mode = mode;
 	layer->mixer_id = mixer_id;
@@ -100,6 +115,7 @@ int sti_layer_prepare(struct sti_layer *layer, struct drm_framebuffer *fb,
 	layer->src_w = src_w;
 	layer->src_h = src_h;
 	layer->format = fb->pixel_format;
+	layer->vaddr = cma_obj->vaddr;
 	layer->paddr = cma_obj->paddr;
 	for (i = 0; i < 4; i++) {
 		layer->pitches[i] = fb->pitches[i];
diff --git a/drivers/gpu/drm/sti/sti_layer.h b/drivers/gpu/drm/sti/sti_layer.h
index 198c3774cc12..ceff497f557e 100644
--- a/drivers/gpu/drm/sti/sti_layer.h
+++ b/drivers/gpu/drm/sti/sti_layer.h
@@ -22,7 +22,8 @@ enum sti_layer_type {
 	STI_GDP = 1 << STI_LAYER_TYPE_SHIFT,
 	STI_VID = 2 << STI_LAYER_TYPE_SHIFT,
 	STI_CUR = 3 << STI_LAYER_TYPE_SHIFT,
-	STI_BCK = 4 << STI_LAYER_TYPE_SHIFT
+	STI_BCK = 4 << STI_LAYER_TYPE_SHIFT,
+	STI_VDP = 5 << STI_LAYER_TYPE_SHIFT
 };
 
 enum sti_layer_id_of_type {
@@ -39,6 +40,7 @@ enum sti_layer_desc {
 	STI_GDP_3       = STI_GDP | STI_ID_3,
 	STI_VID_0       = STI_VID | STI_ID_0,
 	STI_VID_1       = STI_VID | STI_ID_1,
+	STI_HQVDP_0     = STI_VDP | STI_ID_0,
 	STI_CURSOR      = STI_CUR,
 	STI_BACK        = STI_BCK
 };
@@ -67,6 +69,7 @@ struct sti_layer_funcs {
  *
  * @plane:              drm plane it is bound to (if any)
  * @fb:                 drm fb it is bound to
+ * @crtc:               crtc it is bound to
  * @mode:               display mode
  * @desc:               layer type & id
  * @device:		driver device
@@ -82,11 +85,13 @@ struct sti_layer_funcs {
  * @format:             format
  * @pitches:            pitch of 'planes' (eg: Y, U, V)
  * @offsets:            offset of 'planes'
+ * @vaddr:              virtual address of the input buffer
  * @paddr:              physical address of the input buffer
  */
 struct sti_layer {
 	struct drm_plane plane;
 	struct drm_framebuffer *fb;
+	struct drm_crtc *crtc;
 	struct drm_display_mode *mode;
 	enum sti_layer_desc desc;
 	struct device *dev;
@@ -102,12 +107,15 @@ struct sti_layer {
 	uint32_t format;
 	unsigned int pitches[4];
 	unsigned int offsets[4];
+	void *vaddr;
 	dma_addr_t paddr;
 };
 
 struct sti_layer *sti_layer_create(struct device *dev, int desc,
 			void __iomem *baseaddr);
-int sti_layer_prepare(struct sti_layer *layer, struct drm_framebuffer *fb,
+int sti_layer_prepare(struct sti_layer *layer,
+			struct drm_crtc *crtc,
+			struct drm_framebuffer *fb,
 			struct drm_display_mode *mode,
 			int mixer_id,
 			int dest_x, int dest_y,
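Reviewer note on the descriptor encoding in sti_layer.h: a layer descriptor combines a type in the upper bits with a per-type index in the lower bits, which is how `STI_HQVDP_0` becomes `STI_VDP | STI_ID_0`. The sketch below mirrors the type values from the enum; the shift of 8, the 0xff index mask and `STI_ID_0` being 0 are assumptions, since those definitions are not part of this diff. All `ex_` and `EX_` names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed values: the diff does not show these two definitions. */
#define EX_LAYER_TYPE_SHIFT 8
#define EX_LAYER_ID_MASK    0xffu

/* Type values mirror enum sti_layer_type after this change. */
enum ex_layer_type {
	EX_GDP = 1 << EX_LAYER_TYPE_SHIFT,
	EX_VID = 2 << EX_LAYER_TYPE_SHIFT,
	EX_CUR = 3 << EX_LAYER_TYPE_SHIFT,
	EX_BCK = 4 << EX_LAYER_TYPE_SHIFT,
	EX_VDP = 5 << EX_LAYER_TYPE_SHIFT
};

/* Build a descriptor from a type and a small per-type index. */
static uint32_t ex_layer_desc(enum ex_layer_type type, uint32_t id)
{
	assert(id <= EX_LAYER_ID_MASK);
	return (uint32_t)type | id;
}

/* Recover the type so a switch can pick the matching constructor. */
static uint32_t ex_layer_type_of(uint32_t desc)
{
	return desc & ~EX_LAYER_ID_MASK;
}
```

With these assumed values, `ex_layer_desc(EX_VDP, 0)` yields 0x500 and `ex_layer_type_of()` maps it back to `EX_VDP`, which is the kind of value the new `case STI_VDP:` branch in `sti_layer_create()` would match.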
diff --git a/drivers/gpu/drm/sti/sti_mixer.c b/drivers/gpu/drm/sti/sti_mixer.c
index 79f369db9fb6..13a4b84deab6 100644
--- a/drivers/gpu/drm/sti/sti_mixer.c
+++ b/drivers/gpu/drm/sti/sti_mixer.c
@@ -45,6 +45,7 @@ static const u32 mixerColorSpaceMatIdentity[] = {
 #define GAM_CTL_GDP1_MASK  BIT(4)
 #define GAM_CTL_GDP2_MASK  BIT(5)
 #define GAM_CTL_GDP3_MASK  BIT(6)
+#define GAM_CTL_CURSOR_MASK BIT(9)
 
 const char *sti_mixer_to_str(struct sti_mixer *mixer)
 {
@@ -122,11 +123,15 @@ int sti_mixer_set_layer_depth(struct sti_mixer *mixer, struct sti_layer *layer)
 		layer_id = GAM_DEPTH_GDP3_ID;
 		break;
 	case STI_VID_0:
+	case STI_HQVDP_0:
 		layer_id = GAM_DEPTH_VID0_ID;
 		break;
 	case STI_VID_1:
 		layer_id = GAM_DEPTH_VID1_ID;
 		break;
+	case STI_CURSOR:
+		/* no need to set depth for cursor */
+		return 0;
 	default:
 		DRM_ERROR("Unknown layer %d\n", layer->desc);
 		return 1;
@@ -185,9 +190,12 @@ static u32 sti_mixer_get_layer_mask(struct sti_layer *layer)
 	case STI_GDP_3:
 		return GAM_CTL_GDP3_MASK;
 	case STI_VID_0:
+	case STI_HQVDP_0:
 		return GAM_CTL_VID0_MASK;
 	case STI_VID_1:
 		return GAM_CTL_VID1_MASK;
+	case STI_CURSOR:
+		return GAM_CTL_CURSOR_MASK;
 	default:
 		return 0;
 	}
@@ -215,6 +223,15 @@ int sti_mixer_set_layer_status(struct sti_mixer *mixer,
 	return 0;
 }
 
+void sti_mixer_clear_all_layers(struct sti_mixer *mixer)
+{
+	u32 val;
+
+	DRM_DEBUG_DRIVER("%s clear all layers\n", sti_mixer_to_str(mixer));
+	val = sti_mixer_reg_read(mixer, GAM_MIXER_CTL) & 0xFFFF0000;
+	sti_mixer_reg_write(mixer, GAM_MIXER_CTL, val);
+}
+
 void sti_mixer_set_matrix(struct sti_mixer *mixer)
 {
 	unsigned int i;
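Reviewer note on `sti_mixer_clear_all_layers()`: the function keeps the upper 16 bits of `GAM_MIXER_CTL` and zeroes the lower 16, which suggests that every per-layer enable bit lives in the lower half. The sketch below models that on a plain integer instead of a hardware register. The two bit positions are taken from the diff; the helper names are hypothetical, and the "lower half holds only enable bits" reading is an assumption.

```c
#include <assert.h>
#include <stdint.h>

#define EX_BIT(n) (1u << (n))

/* Enable bits as defined in the diff for GDP1 and the cursor. */
#define EX_CTL_GDP1_MASK   EX_BIT(4)
#define EX_CTL_CURSOR_MASK EX_BIT(9)

/* Set or clear one layer enable bit in a copy of the control word. */
static uint32_t ex_set_layer(uint32_t ctl, uint32_t mask, int enable)
{
	/* Assumption: layer masks only ever touch the lower 16 bits. */
	assert(mask != 0 && (mask & 0xffff0000u) == 0);
	return enable ? (ctl | mask) : (ctl & ~mask);
}

/* Same masking as the new function: drop the lower 16 bits only. */
static uint32_t ex_clear_all_layers(uint32_t ctl)
{
	return ctl & 0xffff0000u;
}
```

Starting from 0x00030000, enabling GDP1 and the cursor gives 0x00030210, and `ex_clear_all_layers()` returns the word to 0x00030000 without disturbing the upper half.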
diff --git a/drivers/gpu/drm/sti/sti_mixer.h b/drivers/gpu/drm/sti/sti_mixer.h
index 874372102e52..b97282182908 100644
--- a/drivers/gpu/drm/sti/sti_mixer.h
+++ b/drivers/gpu/drm/sti/sti_mixer.h
@@ -23,6 +23,7 @@
  * @id: id of the mixer
  * @drm_crtc: crtc object link to the mixer
  * @pending_event: set if a flip event is pending on crtc
+ * @enabled: to know if the mixer is active or not
  */
 struct sti_mixer {
 	struct device *dev;
@@ -30,6 +31,7 @@ struct sti_mixer {
 	int id;
 	struct drm_crtc	drm_crtc;
 	struct drm_pending_vblank_event *pending_event;
+	bool enabled;
 };
 
 const char *sti_mixer_to_str(struct sti_mixer *mixer);
@@ -39,6 +41,7 @@ struct sti_mixer *sti_mixer_create(struct device *dev, int id,
 
 int sti_mixer_set_layer_status(struct sti_mixer *mixer,
 		struct sti_layer *layer, bool status);
+void sti_mixer_clear_all_layers(struct sti_mixer *mixer);
 int sti_mixer_set_layer_depth(struct sti_mixer *mixer, struct sti_layer *layer);
 int sti_mixer_active_video_area(struct sti_mixer *mixer,
 		struct drm_display_mode *mode);
diff --git a/drivers/gpu/drm/sti/sti_tvout.c b/drivers/gpu/drm/sti/sti_tvout.c
index b8afe490356a..cb924aa2b321 100644
--- a/drivers/gpu/drm/sti/sti_tvout.c
+++ b/drivers/gpu/drm/sti/sti_tvout.c
@@ -16,6 +16,8 @@
 #include <drm/drmP.h>
 #include <drm/drm_crtc_helper.h>
 
+#include "sti_drm_crtc.h"
+
 /* glue registers */
 #define TVO_CSC_MAIN_M0                  0x000
 #define TVO_CSC_MAIN_M1                  0x004
@@ -96,7 +98,7 @@
 
 #define TVO_SYNC_HD_DCS_SHIFT            8
 
-#define ENCODER_MAIN_CRTC_MASK           BIT(0)
+#define ENCODER_CRTC_MASK                (BIT(0) | BIT(1))
 
 /* enum listing the supported output data format */
 enum sti_tvout_video_out_type {
@@ -149,14 +151,15 @@ static void tvout_write(struct sti_tvout *tvout, u32 val, int offset)
  * Set the clipping mode of a VIP
  *
  * @tvout: tvout structure
+ * @reg: register to set
  * @cr_r:
  * @y_g:
  * @cb_b:
  */
-static void tvout_vip_set_color_order(struct sti_tvout *tvout,
+static void tvout_vip_set_color_order(struct sti_tvout *tvout, int reg,
 				      u32 cr_r, u32 y_g, u32 cb_b)
 {
-	u32 val = tvout_read(tvout, TVO_VIP_HDMI);
+	u32 val = tvout_read(tvout, reg);
 
 	val &= ~(TVO_VIP_REORDER_MASK << TVO_VIP_REORDER_R_SHIFT);
 	val &= ~(TVO_VIP_REORDER_MASK << TVO_VIP_REORDER_G_SHIFT);
@@ -165,52 +168,58 @@ static void tvout_vip_set_color_order(struct sti_tvout *tvout,
 	val |= y_g << TVO_VIP_REORDER_G_SHIFT;
 	val |= cb_b << TVO_VIP_REORDER_B_SHIFT;
 
-	tvout_write(tvout, val, TVO_VIP_HDMI);
+	tvout_write(tvout, val, reg);
 }
 
 /**
  * Set the clipping mode of a VIP
  *
  * @tvout: tvout structure
+ * @reg: register to set
  * @range: clipping range
  */
-static void tvout_vip_set_clip_mode(struct sti_tvout *tvout, u32 range)
+static void tvout_vip_set_clip_mode(struct sti_tvout *tvout, int reg, u32 range)
 {
-	u32 val = tvout_read(tvout, TVO_VIP_HDMI);
+	u32 val = tvout_read(tvout, reg);
 
 	val &= ~(TVO_VIP_CLIP_MASK << TVO_VIP_CLIP_SHIFT);
 	val |= range << TVO_VIP_CLIP_SHIFT;
-	tvout_write(tvout, val, TVO_VIP_HDMI);
+	tvout_write(tvout, val, reg);
 }
 
 /**
  * Set the rounded value of a VIP
  *
  * @tvout: tvout structure
+ * @reg: register to set
  * @rnd: rounded val per component
  */
-static void tvout_vip_set_rnd(struct sti_tvout *tvout, u32 rnd)
+static void tvout_vip_set_rnd(struct sti_tvout *tvout, int reg, u32 rnd)
 {
-	u32 val = tvout_read(tvout, TVO_VIP_HDMI);
+	u32 val = tvout_read(tvout, reg);
 
 	val &= ~(TVO_VIP_RND_MASK << TVO_VIP_RND_SHIFT);
 	val |= rnd << TVO_VIP_RND_SHIFT;
-	tvout_write(tvout, val, TVO_VIP_HDMI);
+	tvout_write(tvout, val, reg);
 }
 
 /**
  * Select the VIP input
  *
  * @tvout: tvout structure
+ * @reg: register to set
+ * @main_path: main or auxiliary path
+ * @sel_input_logic_inverted: need to invert the logic
  * @sel_input: selected_input (main/aux + conv)
  */
 static void tvout_vip_set_sel_input(struct sti_tvout *tvout,
+				    int reg,
 				    bool main_path,
 				    bool sel_input_logic_inverted,
 				    enum sti_tvout_video_out_type video_out)
 {
 	u32 sel_input;
-	u32 val = tvout_read(tvout, TVO_VIP_HDMI);
+	u32 val = tvout_read(tvout, reg);
 
 	if (main_path)
 		sel_input = TVO_VIP_SEL_INPUT_MAIN;
@@ -232,22 +241,24 @@ static void tvout_vip_set_sel_input(struct sti_tvout *tvout,
 
 	val &= ~TVO_VIP_SEL_INPUT_MASK;
 	val |= sel_input;
-	tvout_write(tvout, val, TVO_VIP_HDMI);
+	tvout_write(tvout, val, reg);
 }
 
 /**
  * Select the input video signed or unsigned
  *
  * @tvout: tvout structure
+ * @reg: register to set
  * @in_vid_signed: used video input format
  */
-static void tvout_vip_set_in_vid_fmt(struct sti_tvout *tvout, u32 in_vid_fmt)
+static void tvout_vip_set_in_vid_fmt(struct sti_tvout *tvout,
+		int reg, u32 in_vid_fmt)
 {
-	u32 val = tvout_read(tvout, TVO_VIP_HDMI);
+	u32 val = tvout_read(tvout, reg);
 
 	val &= ~TVO_IN_FMT_SIGNED;
 	val |= in_vid_fmt;
-	tvout_write(tvout, val, TVO_MAIN_IN_VID_FORMAT);
+	tvout_write(tvout, val, reg);
 }
 
 /**
@@ -261,6 +272,7 @@ static void tvout_hdmi_start(struct sti_tvout *tvout, bool main_path)
 {
 	struct device_node *node = tvout->dev->of_node;
 	bool sel_input_logic_inverted = false;
+	u32 tvo_in_vid_format;
 
 	dev_dbg(tvout->dev, "%s\n", __func__);
 
@@ -268,33 +280,36 @@ static void tvout_hdmi_start(struct sti_tvout *tvout, bool main_path)
 		DRM_DEBUG_DRIVER("main vip for hdmi\n");
 		/* select the input sync for hdmi = VTG set 1 */
 		tvout_write(tvout, TVO_SYNC_MAIN_VTG_SET_1, TVO_HDMI_SYNC_SEL);
+		tvo_in_vid_format = TVO_MAIN_IN_VID_FORMAT;
 	} else {
 		DRM_DEBUG_DRIVER("aux vip for hdmi\n");
 		/* select the input sync for hdmi = VTG set 1 */
 		tvout_write(tvout, TVO_SYNC_AUX_VTG_SET_1, TVO_HDMI_SYNC_SEL);
+		tvo_in_vid_format = TVO_AUX_IN_VID_FORMAT;
 	}
 
 	/* set color channel order */
-	tvout_vip_set_color_order(tvout,
+	tvout_vip_set_color_order(tvout, TVO_VIP_HDMI,
 				  TVO_VIP_REORDER_CR_R_SEL,
 				  TVO_VIP_REORDER_Y_G_SEL,
 				  TVO_VIP_REORDER_CB_B_SEL);
 
 	/* set clipping mode (Limited range RGB/Y) */
-	tvout_vip_set_clip_mode(tvout, TVO_VIP_CLIP_LIMITED_RANGE_RGB_Y);
+	tvout_vip_set_clip_mode(tvout, TVO_VIP_HDMI,
+			TVO_VIP_CLIP_LIMITED_RANGE_RGB_Y);
 
 	/* set round mode (rounded to 8-bit per component) */
-	tvout_vip_set_rnd(tvout, TVO_VIP_RND_8BIT_ROUNDED);
+	tvout_vip_set_rnd(tvout, TVO_VIP_HDMI, TVO_VIP_RND_8BIT_ROUNDED);
 
 	if (of_device_is_compatible(node, "st,stih407-tvout")) {
 		/* set input video format */
-		tvout_vip_set_in_vid_fmt(tvout->regs + TVO_MAIN_IN_VID_FORMAT,
-					 TVO_IN_FMT_SIGNED);
+		tvout_vip_set_in_vid_fmt(tvout, tvo_in_vid_format,
+					TVO_IN_FMT_SIGNED);
 		sel_input_logic_inverted = true;
 	}
 
 	/* input selection */
-	tvout_vip_set_sel_input(tvout, main_path,
+	tvout_vip_set_sel_input(tvout, TVO_VIP_HDMI, main_path,
 			sel_input_logic_inverted, STI_TVOUT_VIDEO_OUT_RGB);
 }
 
@@ -309,48 +324,47 @@ static void tvout_hda_start(struct sti_tvout *tvout, bool main_path)
 {
 	struct device_node *node = tvout->dev->of_node;
 	bool sel_input_logic_inverted = false;
+	u32 tvo_in_vid_format;
+	int val;
 
 	dev_dbg(tvout->dev, "%s\n", __func__);
 
-	if (!main_path) {
-		DRM_ERROR("HD Analog on aux not implemented\n");
-		return;
+	if (main_path) {
+		val = TVO_SYNC_MAIN_VTG_SET_2 << TVO_SYNC_HD_DCS_SHIFT;
+		val |= TVO_SYNC_MAIN_VTG_SET_3;
+		tvout_write(tvout, val, TVO_HD_SYNC_SEL);
+		tvo_in_vid_format = TVO_MAIN_IN_VID_FORMAT;
+	} else {
+		val = TVO_SYNC_AUX_VTG_SET_2 << TVO_SYNC_HD_DCS_SHIFT;
+		val |= TVO_SYNC_AUX_VTG_SET_3;
+		tvout_write(tvout, val, TVO_HD_SYNC_SEL);
+		tvo_in_vid_format = TVO_AUX_IN_VID_FORMAT;
 	}
 
-	DRM_DEBUG_DRIVER("main vip for HDF\n");
-
 	/* set color channel order */
-	tvout_vip_set_color_order(tvout->regs + TVO_VIP_HDF,
+	tvout_vip_set_color_order(tvout, TVO_VIP_HDF,
 				  TVO_VIP_REORDER_CR_R_SEL,
 				  TVO_VIP_REORDER_Y_G_SEL,
 				  TVO_VIP_REORDER_CB_B_SEL);
 
-	/* set clipping mode (Limited range RGB/Y) */
-	tvout_vip_set_clip_mode(tvout->regs + TVO_VIP_HDF,
-				TVO_VIP_CLIP_LIMITED_RANGE_CB_CR);
+	/* set clipping mode (EAV/SAV clipping) */
+	tvout_vip_set_clip_mode(tvout, TVO_VIP_HDF, TVO_VIP_CLIP_EAV_SAV);
 
 	/* set round mode (rounded to 10-bit per component) */
-	tvout_vip_set_rnd(tvout->regs + TVO_VIP_HDF, TVO_VIP_RND_10BIT_ROUNDED);
+	tvout_vip_set_rnd(tvout, TVO_VIP_HDF, TVO_VIP_RND_10BIT_ROUNDED);
 
 	if (of_device_is_compatible(node, "st,stih407-tvout")) {
 		/* set input video format */
-		tvout_vip_set_in_vid_fmt(tvout, TVO_IN_FMT_SIGNED);
+		tvout_vip_set_in_vid_fmt(tvout,
+			tvo_in_vid_format, TVO_IN_FMT_SIGNED);
 		sel_input_logic_inverted = true;
 	}
 
 	/* Input selection */
-	tvout_vip_set_sel_input(tvout->regs + TVO_VIP_HDF,
-				main_path,
+	tvout_vip_set_sel_input(tvout, TVO_VIP_HDF, main_path,
 				sel_input_logic_inverted,
 				STI_TVOUT_VIDEO_OUT_YUV);
 
-	/* select the input sync for HD analog = VTG set 3
-	 * and HD DCS = VTG set 2 */
-	tvout_write(tvout,
-		(TVO_SYNC_MAIN_VTG_SET_2 << TVO_SYNC_HD_DCS_SHIFT)
-		| TVO_SYNC_MAIN_VTG_SET_3,
-		TVO_HD_SYNC_SEL);
-
 	/* power up HD DAC */
 	tvout_write(tvout, 0, TVO_HD_DAC_CFG_OFF);
 }
@@ -392,7 +406,7 @@ static void sti_hda_encoder_commit(struct drm_encoder *encoder)
 {
 	struct sti_tvout *tvout = to_sti_tvout(encoder);
 
-	tvout_hda_start(tvout, true);
+	tvout_hda_start(tvout, sti_drm_crtc_is_main(encoder->crtc));
 }
 
 static void sti_hda_encoder_disable(struct drm_encoder *encoder)
@@ -429,7 +443,7 @@ static struct drm_encoder *sti_tvout_create_hda_encoder(struct drm_device *dev,
 
 	drm_encoder = (struct drm_encoder *) encoder;
 
-	drm_encoder->possible_crtcs = ENCODER_MAIN_CRTC_MASK;
+	drm_encoder->possible_crtcs = ENCODER_CRTC_MASK;
 	drm_encoder->possible_clones = 1 << 0;
 
 	drm_encoder_init(dev, drm_encoder,
@@ -444,7 +458,7 @@ static void sti_hdmi_encoder_commit(struct drm_encoder *encoder)
 {
 	struct sti_tvout *tvout = to_sti_tvout(encoder);
 
-	tvout_hdmi_start(tvout, true);
+	tvout_hdmi_start(tvout, sti_drm_crtc_is_main(encoder->crtc));
 }
 
 static void sti_hdmi_encoder_disable(struct drm_encoder *encoder)
@@ -478,7 +492,7 @@ static struct drm_encoder *sti_tvout_create_hdmi_encoder(struct drm_device *dev,
 
 	drm_encoder = (struct drm_encoder *) encoder;
 
-	drm_encoder->possible_crtcs = ENCODER_MAIN_CRTC_MASK;
+	drm_encoder->possible_crtcs = ENCODER_CRTC_MASK;
 	drm_encoder->possible_clones = 1 << 1;
 
 	drm_encoder_init(dev, drm_encoder,
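Reviewer note on the sti_tvout.c rework: every `tvout_vip_set_*()` helper now takes the register offset as a parameter and performs the same read, clear field, set field, write sequence on it, which is what lets the HDMI and HDF paths share the code. The sketch below shows that pattern against an in-memory array standing in for the register bank. All names and sizes are hypothetical, and masking `val` before shifting is a defensive addition that the driver code does not make.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define EX_NUM_REGS 4

/* Stand-in for the memory-mapped register bank. */
struct ex_tvout {
	uint32_t regs[EX_NUM_REGS];
};

static uint32_t ex_read(const struct ex_tvout *t, size_t reg)
{
	assert(reg < EX_NUM_REGS);
	return t->regs[reg];
}

static void ex_write(struct ex_tvout *t, uint32_t val, size_t reg)
{
	assert(reg < EX_NUM_REGS);
	t->regs[reg] = val;
}

/* Update one bit field of the chosen register, leaving the rest alone. */
static void ex_set_field(struct ex_tvout *t, size_t reg,
			 uint32_t mask, unsigned int shift, uint32_t val)
{
	uint32_t tmp = ex_read(t, reg);

	tmp &= ~(mask << shift);        /* clear the old field */
	tmp |= (val & mask) << shift;   /* insert the new value */
	ex_write(t, tmp, reg);          /* write back to the same register */
}
```

Reading and writing through the same `reg` argument is the point of the change: before it, `tvout_vip_set_in_vid_fmt()` read `TVO_VIP_HDMI` but wrote `TVO_MAIN_IN_VID_FORMAT`.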
diff --git a/drivers/gpu/drm/sti/sti_vtg.c b/drivers/gpu/drm/sti/sti_vtg.c
index 740d6e347a62..9564f2568e2c 100644
--- a/drivers/gpu/drm/sti/sti_vtg.c
+++ b/drivers/gpu/drm/sti/sti_vtg.c
@@ -51,10 +51,19 @@
 #define VTG_TOP_V_HD_3      0x010C
 #define VTG_BOT_V_HD_3      0x0110
 
+#define VTG_H_HD_4          0x0120
+#define VTG_TOP_V_VD_4      0x0124
+#define VTG_BOT_V_VD_4      0x0128
+#define VTG_TOP_V_HD_4      0x012c
+#define VTG_BOT_V_HD_4      0x0130
+
 #define VTG_IRQ_BOTTOM      BIT(0)
 #define VTG_IRQ_TOP         BIT(1)
 #define VTG_IRQ_MASK        (VTG_IRQ_TOP | VTG_IRQ_BOTTOM)
 
+/* delay introduced by the HDMI in nb of pixels */
+#define HDMI_DELAY          (6)
+
 /* delay introduced by the Arbitrary Waveform Generator in nb of pixels */
 #define AWG_DELAY_HD        (-9)
 #define AWG_DELAY_ED        (-8)
@@ -133,10 +142,10 @@ static void vtg_set_mode(struct sti_vtg *vtg,
 	writel(tmp, vtg->regs + VTG_VID_TFS);
 	writel(tmp, vtg->regs + VTG_VID_BFS);
 
-	/* prepare VTG set 1 and 2 for HDMI and VTG set 3 for HD DAC */
-	tmp = (mode->hsync_end - mode->hsync_start) << 16;
+	/* prepare VTG set 1 for HDMI */
+	tmp = (mode->hsync_end - mode->hsync_start + HDMI_DELAY) << 16;
+	tmp |= HDMI_DELAY;
 	writel(tmp, vtg->regs + VTG_H_HD_1);
-	writel(tmp, vtg->regs + VTG_H_HD_2);
 
 	tmp = (mode->vsync_end - mode->vsync_start + 1) << 16;
 	tmp |= 1;
@@ -146,6 +155,11 @@ static void vtg_set_mode(struct sti_vtg *vtg,
 	writel(0, vtg->regs + VTG_BOT_V_HD_1);
 
 	/* prepare VTG set 2 for for HD DCS */
+	tmp = (mode->hsync_end - mode->hsync_start) << 16;
+	writel(tmp, vtg->regs + VTG_H_HD_2);
+
+	tmp = (mode->vsync_end - mode->vsync_start + 1) << 16;
+	tmp |= 1;
 	writel(tmp, vtg->regs + VTG_TOP_V_VD_2);
 	writel(tmp, vtg->regs + VTG_BOT_V_VD_2);
 	writel(0, vtg->regs + VTG_TOP_V_HD_2);
@@ -166,6 +180,17 @@ static void vtg_set_mode(struct sti_vtg *vtg,
 	writel(tmp, vtg->regs + VTG_TOP_V_HD_3);
 	writel(tmp, vtg->regs + VTG_BOT_V_HD_3);
 
+	/* Prepare VTG set 4 for DVO */
+	tmp = (mode->hsync_end - mode->hsync_start) << 16;
+	writel(tmp, vtg->regs + VTG_H_HD_4);
+
+	tmp = (mode->vsync_end - mode->vsync_start + 1) << 16;
+	tmp |= 1;
+	writel(tmp, vtg->regs + VTG_TOP_V_VD_4);
+	writel(tmp, vtg->regs + VTG_BOT_V_VD_4);
+	writel(0, vtg->regs + VTG_TOP_V_HD_4);
+	writel(0, vtg->regs + VTG_BOT_V_HD_4);
+
 	/* mode */
 	writel(type, vtg->regs + VTG_MODE);
 }
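Reviewer note on the `VTG_H_HD_n` programming in `vtg_set_mode()`: the horizontal sync word is built as `(hsync_end - hsync_start + delay) << 16 | delay`, with `delay` equal to `HDMI_DELAY` (6) for set 1 and 0 for sets 2 and 4. The sketch below reproduces only that arithmetic; `ex_vtg_h_hd` is a hypothetical name, and nothing here touches hardware.

```c
#include <assert.h>
#include <stdint.h>

#define EX_HDMI_DELAY 6	/* matches HDMI_DELAY in the diff */

/*
 * Compute the value written to a VTG_H_HD_n register: sync width plus
 * delay in the upper 16 bits, the delay itself in the lower 16 bits.
 */
static uint32_t ex_vtg_h_hd(int hsync_start, int hsync_end, int delay)
{
	assert(hsync_end >= hsync_start);
	assert(delay >= 0);

	return ((uint32_t)(hsync_end - hsync_start + delay) << 16) |
	       (uint32_t)delay;
}
```

For a mode with `hsync_start` 2008 and `hsync_end` 2052 (a 44-pixel sync pulse), set 1 would get 0x00320006 and sets 2 and 4 would get 0x002c0000.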
diff --git a/drivers/gpu/drm/tegra/Kconfig b/drivers/gpu/drm/tegra/Kconfig
index 354ddb29231f..74d9d621453d 100644
--- a/drivers/gpu/drm/tegra/Kconfig
+++ b/drivers/gpu/drm/tegra/Kconfig
@@ -1,6 +1,7 @@
 config DRM_TEGRA
 	tristate "NVIDIA Tegra DRM"
 	depends on ARCH_TEGRA || (ARM && COMPILE_TEST)
+	depends on COMMON_CLK
 	depends on DRM
 	depends on RESET_CONTROLLER
 	select DRM_KMS_HELPER
diff --git a/drivers/gpu/drm/tegra/dc.c b/drivers/gpu/drm/tegra/dc.c
index 054a79f143ae..3367960286a6 100644
--- a/drivers/gpu/drm/tegra/dc.c
+++ b/drivers/gpu/drm/tegra/dc.c
@@ -9,17 +9,23 @@
 
 #include <linux/clk.h>
 #include <linux/debugfs.h>
+#include <linux/iommu.h>
 #include <linux/reset.h>
 
+#include <soc/tegra/pmc.h>
+
 #include "dc.h"
 #include "drm.h"
 #include "gem.h"
 
+#include <drm/drm_plane_helper.h>
+
 struct tegra_dc_soc_info {
 	bool supports_interlacing;
 	bool supports_cursor;
 	bool supports_block_linear;
 	unsigned int pitch_align;
+	bool has_powergate;
 };
 
 struct tegra_plane {
@@ -32,6 +38,26 @@ static inline struct tegra_plane *to_tegra_plane(struct drm_plane *plane)
 	return container_of(plane, struct tegra_plane, base);
 }
 
+static void tegra_dc_window_commit(struct tegra_dc *dc, unsigned int index)
+{
+	u32 value = WIN_A_ACT_REQ << index;
+
+	tegra_dc_writel(dc, value << 8, DC_CMD_STATE_CONTROL);
+	tegra_dc_writel(dc, value, DC_CMD_STATE_CONTROL);
+}
+
+static void tegra_dc_cursor_commit(struct tegra_dc *dc)
+{
+	tegra_dc_writel(dc, CURSOR_ACT_REQ << 8, DC_CMD_STATE_CONTROL);
+	tegra_dc_writel(dc, CURSOR_ACT_REQ, DC_CMD_STATE_CONTROL);
+}
+
+static void tegra_dc_commit(struct tegra_dc *dc)
+{
+	tegra_dc_writel(dc, GENERAL_ACT_REQ << 8, DC_CMD_STATE_CONTROL);
+	tegra_dc_writel(dc, GENERAL_ACT_REQ, DC_CMD_STATE_CONTROL);
+}
+
 static unsigned int tegra_dc_format(uint32_t format, uint32_t *swap)
 {
 	/* assume no swapping of fetched data */
@@ -303,17 +329,260 @@ static int tegra_dc_setup_window(struct tegra_dc *dc, unsigned int index,
 		break;
 	}
 
-	tegra_dc_writel(dc, WIN_A_UPDATE << index, DC_CMD_STATE_CONTROL);
-	tegra_dc_writel(dc, WIN_A_ACT_REQ << index, DC_CMD_STATE_CONTROL);
+	tegra_dc_window_commit(dc, index);
+
+	return 0;
+}
+
+static int tegra_window_plane_disable(struct drm_plane *plane)
+{
+	struct tegra_dc *dc = to_tegra_dc(plane->crtc);
+	struct tegra_plane *p = to_tegra_plane(plane);
+	u32 value;
+
+	if (!plane->crtc)
+		return 0;
+
+	value = WINDOW_A_SELECT << p->index;
+	tegra_dc_writel(dc, value, DC_CMD_DISPLAY_WINDOW_HEADER);
+
+	value = tegra_dc_readl(dc, DC_WIN_WIN_OPTIONS);
+	value &= ~WIN_ENABLE;
+	tegra_dc_writel(dc, value, DC_WIN_WIN_OPTIONS);
+
+	tegra_dc_window_commit(dc, p->index);
+
+	return 0;
+}
+
+static void tegra_plane_destroy(struct drm_plane *plane)
+{
+	struct tegra_plane *p = to_tegra_plane(plane);
+
+	drm_plane_cleanup(plane);
+	kfree(p);
+}
+
+static const u32 tegra_primary_plane_formats[] = {
+	DRM_FORMAT_XBGR8888,
+	DRM_FORMAT_XRGB8888,
+	DRM_FORMAT_RGB565,
+};
+
+static int tegra_primary_plane_update(struct drm_plane *plane,
+				      struct drm_crtc *crtc,
+				      struct drm_framebuffer *fb, int crtc_x,
+				      int crtc_y, unsigned int crtc_w,
+				      unsigned int crtc_h, uint32_t src_x,
+				      uint32_t src_y, uint32_t src_w,
+				      uint32_t src_h)
+{
+	struct tegra_bo *bo = tegra_fb_get_plane(fb, 0);
+	struct tegra_plane *p = to_tegra_plane(plane);
+	struct tegra_dc *dc = to_tegra_dc(crtc);
+	struct tegra_dc_window window;
+	int err;
+
+	memset(&window, 0, sizeof(window));
+	window.src.x = src_x >> 16;
+	window.src.y = src_y >> 16;
+	window.src.w = src_w >> 16;
+	window.src.h = src_h >> 16;
+	window.dst.x = crtc_x;
+	window.dst.y = crtc_y;
+	window.dst.w = crtc_w;
+	window.dst.h = crtc_h;
+	window.format = tegra_dc_format(fb->pixel_format, &window.swap);
+	window.bits_per_pixel = fb->bits_per_pixel;
+	window.bottom_up = tegra_fb_is_bottom_up(fb);
+
+	err = tegra_fb_get_tiling(fb, &window.tiling);
+	if (err < 0)
+		return err;
+
+	window.base[0] = bo->paddr + fb->offsets[0];
+	window.stride[0] = fb->pitches[0];
+
+	err = tegra_dc_setup_window(dc, p->index, &window);
+	if (err < 0)
+		return err;
+
+	return 0;
+}
+
+static void tegra_primary_plane_destroy(struct drm_plane *plane)
+{
+	tegra_window_plane_disable(plane);
+	tegra_plane_destroy(plane);
+}
+
+static const struct drm_plane_funcs tegra_primary_plane_funcs = {
+	.update_plane = tegra_primary_plane_update,
+	.disable_plane = tegra_window_plane_disable,
+	.destroy = tegra_primary_plane_destroy,
+};
+
+static struct drm_plane *tegra_dc_primary_plane_create(struct drm_device *drm,
+						       struct tegra_dc *dc)
+{
+	struct tegra_plane *plane;
+	unsigned int num_formats;
+	const u32 *formats;
+	int err;
+
+	plane = kzalloc(sizeof(*plane), GFP_KERNEL);
+	if (!plane)
+		return ERR_PTR(-ENOMEM);
+
+	num_formats = ARRAY_SIZE(tegra_primary_plane_formats);
+	formats = tegra_primary_plane_formats;
+
+	err = drm_universal_plane_init(drm, &plane->base, 1 << dc->pipe,
+				       &tegra_primary_plane_funcs, formats,
+				       num_formats, DRM_PLANE_TYPE_PRIMARY);
+	if (err < 0) {
+		kfree(plane);
+		return ERR_PTR(err);
+	}
+
+	return &plane->base;
+}
+
+static const u32 tegra_cursor_plane_formats[] = {
+	DRM_FORMAT_RGBA8888,
+};
+
+static int tegra_cursor_plane_update(struct drm_plane *plane,
+				     struct drm_crtc *crtc,
+				     struct drm_framebuffer *fb, int crtc_x,
+				     int crtc_y, unsigned int crtc_w,
+				     unsigned int crtc_h, uint32_t src_x,
+				     uint32_t src_y, uint32_t src_w,
+				     uint32_t src_h)
+{
+	struct tegra_bo *bo = tegra_fb_get_plane(fb, 0);
+	struct tegra_dc *dc = to_tegra_dc(crtc);
+	u32 value = CURSOR_CLIP_DISPLAY;
+
+	/* scaling not supported for cursor */
+	if ((src_w >> 16 != crtc_w) || (src_h >> 16 != crtc_h))
+		return -EINVAL;
+
+	/* only square cursors supported */
+	if (src_w != src_h)
+		return -EINVAL;
+
+	switch (crtc_w) {
+	case 32:
+		value |= CURSOR_SIZE_32x32;
+		break;
+
+	case 64:
+		value |= CURSOR_SIZE_64x64;
+		break;
+
+	case 128:
+		value |= CURSOR_SIZE_128x128;
+		break;
+
+	case 256:
+		value |= CURSOR_SIZE_256x256;
+		break;
+
+	default:
+		return -EINVAL;
+	}
+
+	value |= (bo->paddr >> 10) & 0x3fffff;
+	tegra_dc_writel(dc, value, DC_DISP_CURSOR_START_ADDR);
+
+#ifdef CONFIG_ARCH_DMA_ADDR_T_64BIT
+	value = (bo->paddr >> 32) & 0x3;
+	tegra_dc_writel(dc, value, DC_DISP_CURSOR_START_ADDR_HI);
+#endif
+
+	/* enable cursor and set blend mode */
+	value = tegra_dc_readl(dc, DC_DISP_DISP_WIN_OPTIONS);
+	value |= CURSOR_ENABLE;
+	tegra_dc_writel(dc, value, DC_DISP_DISP_WIN_OPTIONS);
+
+	value = tegra_dc_readl(dc, DC_DISP_BLEND_CURSOR_CONTROL);
+	value &= ~CURSOR_DST_BLEND_MASK;
+	value &= ~CURSOR_SRC_BLEND_MASK;
+	value |= CURSOR_MODE_NORMAL;
+	value |= CURSOR_DST_BLEND_NEG_K1_TIMES_SRC;
+	value |= CURSOR_SRC_BLEND_K1_TIMES_SRC;
+	value |= CURSOR_ALPHA;
+	tegra_dc_writel(dc, value, DC_DISP_BLEND_CURSOR_CONTROL);
+
+	/* position the cursor */
+	value = (crtc_y & 0x3fff) << 16 | (crtc_x & 0x3fff);
+	tegra_dc_writel(dc, value, DC_DISP_CURSOR_POSITION);
+
+	/* apply changes */
+	tegra_dc_cursor_commit(dc);
+	tegra_dc_commit(dc);
+
+	return 0;
+}
+
+static int tegra_cursor_plane_disable(struct drm_plane *plane)
+{
+	struct tegra_dc *dc = to_tegra_dc(plane->crtc);
+	u32 value;
+
+	if (!plane->crtc)
+		return 0;
+
+	value = tegra_dc_readl(dc, DC_DISP_DISP_WIN_OPTIONS);
+	value &= ~CURSOR_ENABLE;
+	tegra_dc_writel(dc, value, DC_DISP_DISP_WIN_OPTIONS);
+
+	tegra_dc_cursor_commit(dc);
+	tegra_dc_commit(dc);
 
 	return 0;
 }
 
-static int tegra_plane_update(struct drm_plane *plane, struct drm_crtc *crtc,
-			      struct drm_framebuffer *fb, int crtc_x,
-			      int crtc_y, unsigned int crtc_w,
-			      unsigned int crtc_h, uint32_t src_x,
-			      uint32_t src_y, uint32_t src_w, uint32_t src_h)
+static const struct drm_plane_funcs tegra_cursor_plane_funcs = {
+	.update_plane = tegra_cursor_plane_update,
+	.disable_plane = tegra_cursor_plane_disable,
+	.destroy = tegra_plane_destroy,
+};
+
+static struct drm_plane *tegra_dc_cursor_plane_create(struct drm_device *drm,
+						      struct tegra_dc *dc)
+{
+	struct tegra_plane *plane;
+	unsigned int num_formats;
+	const u32 *formats;
+	int err;
+
+	plane = kzalloc(sizeof(*plane), GFP_KERNEL);
+	if (!plane)
+		return ERR_PTR(-ENOMEM);
+
+	num_formats = ARRAY_SIZE(tegra_cursor_plane_formats);
+	formats = tegra_cursor_plane_formats;
+
+	err = drm_universal_plane_init(drm, &plane->base, 1 << dc->pipe,
+				       &tegra_cursor_plane_funcs, formats,
+				       num_formats, DRM_PLANE_TYPE_CURSOR);
+	if (err < 0) {
+		kfree(plane);
+		return ERR_PTR(err);
+	}
+
+	return &plane->base;
+}
+
+static int tegra_overlay_plane_update(struct drm_plane *plane,
+				      struct drm_crtc *crtc,
+				      struct drm_framebuffer *fb, int crtc_x,
+				      int crtc_y, unsigned int crtc_w,
+				      unsigned int crtc_h, uint32_t src_x,
+				      uint32_t src_y, uint32_t src_w,
+				      uint32_t src_h)
 {
 	struct tegra_plane *p = to_tegra_plane(plane);
 	struct tegra_dc *dc = to_tegra_dc(crtc);
@@ -359,44 +628,19 @@ static int tegra_plane_update(struct drm_plane *plane, struct drm_crtc *crtc,
 	return tegra_dc_setup_window(dc, p->index, &window);
 }
 
-static int tegra_plane_disable(struct drm_plane *plane)
+static void tegra_overlay_plane_destroy(struct drm_plane *plane)
 {
-	struct tegra_dc *dc = to_tegra_dc(plane->crtc);
-	struct tegra_plane *p = to_tegra_plane(plane);
-	unsigned long value;
-
-	if (!plane->crtc)
-		return 0;
-
-	value = WINDOW_A_SELECT << p->index;
-	tegra_dc_writel(dc, value, DC_CMD_DISPLAY_WINDOW_HEADER);
-
-	value = tegra_dc_readl(dc, DC_WIN_WIN_OPTIONS);
-	value &= ~WIN_ENABLE;
-	tegra_dc_writel(dc, value, DC_WIN_WIN_OPTIONS);
-
-	tegra_dc_writel(dc, WIN_A_UPDATE << p->index, DC_CMD_STATE_CONTROL);
-	tegra_dc_writel(dc, WIN_A_ACT_REQ << p->index, DC_CMD_STATE_CONTROL);
-
-	return 0;
-}
-
-static void tegra_plane_destroy(struct drm_plane *plane)
-{
-	struct tegra_plane *p = to_tegra_plane(plane);
-
-	tegra_plane_disable(plane);
-	drm_plane_cleanup(plane);
-	kfree(p);
+	tegra_window_plane_disable(plane);
+	tegra_plane_destroy(plane);
 }
 
-static const struct drm_plane_funcs tegra_plane_funcs = {
-	.update_plane = tegra_plane_update,
-	.disable_plane = tegra_plane_disable,
-	.destroy = tegra_plane_destroy,
+static const struct drm_plane_funcs tegra_overlay_plane_funcs = {
+	.update_plane = tegra_overlay_plane_update,
+	.disable_plane = tegra_window_plane_disable,
+	.destroy = tegra_overlay_plane_destroy,
 };
 
-static const uint32_t plane_formats[] = {
+static const uint32_t tegra_overlay_plane_formats[] = {
 	DRM_FORMAT_XBGR8888,
 	DRM_FORMAT_XRGB8888,
 	DRM_FORMAT_RGB565,
@@ -406,27 +650,44 @@ static const uint32_t plane_formats[] = {
 	DRM_FORMAT_YUV422,
 };
 
-static int tegra_dc_add_planes(struct drm_device *drm, struct tegra_dc *dc)
+static struct drm_plane *tegra_dc_overlay_plane_create(struct drm_device *drm,
+						       struct tegra_dc *dc,
+						       unsigned int index)
 {
-	unsigned int i;
-	int err = 0;
+	struct tegra_plane *plane;
+	unsigned int num_formats;
+	const u32 *formats;
+	int err;
 
-	for (i = 0; i < 2; i++) {
-		struct tegra_plane *plane;
+	plane = kzalloc(sizeof(*plane), GFP_KERNEL);
+	if (!plane)
+		return ERR_PTR(-ENOMEM);
 
-		plane = kzalloc(sizeof(*plane), GFP_KERNEL);
-		if (!plane)
-			return -ENOMEM;
+	plane->index = index;
 
-		plane->index = 1 + i;
+	num_formats = ARRAY_SIZE(tegra_overlay_plane_formats);
+	formats = tegra_overlay_plane_formats;
 
-		err = drm_plane_init(drm, &plane->base, 1 << dc->pipe,
-				     &tegra_plane_funcs, plane_formats,
-				     ARRAY_SIZE(plane_formats), false);
-		if (err < 0) {
-			kfree(plane);
-			return err;
-		}
+	err = drm_universal_plane_init(drm, &plane->base, 1 << dc->pipe,
+				       &tegra_overlay_plane_funcs, formats,
+				       num_formats, DRM_PLANE_TYPE_OVERLAY);
+	if (err < 0) {
+		kfree(plane);
+		return ERR_PTR(err);
+	}
+
+	return &plane->base;
+}
+
+static int tegra_dc_add_planes(struct drm_device *drm, struct tegra_dc *dc)
+{
+	struct drm_plane *plane;
+	unsigned int i;
+
+	for (i = 0; i < 2; i++) {
+		plane = tegra_dc_overlay_plane_create(drm, dc, 1 + i);
+		if (IS_ERR(plane))
+			return PTR_ERR(plane);
 	}
 
 	return 0;
@@ -513,10 +774,8 @@ static int tegra_dc_set_base(struct tegra_dc *dc, int x, int y,
 	tegra_dc_writel(dc, h_offset, DC_WINBUF_ADDR_H_OFFSET);
 	tegra_dc_writel(dc, v_offset, DC_WINBUF_ADDR_V_OFFSET);
 
-	value = GENERAL_UPDATE | WIN_A_UPDATE;
-	tegra_dc_writel(dc, value, DC_CMD_STATE_CONTROL);
-
 	value = GENERAL_ACT_REQ | WIN_A_ACT_REQ;
+	tegra_dc_writel(dc, value << 8, DC_CMD_STATE_CONTROL);
 	tegra_dc_writel(dc, value, DC_CMD_STATE_CONTROL);
 
 	return 0;
@@ -548,109 +807,6 @@ void tegra_dc_disable_vblank(struct tegra_dc *dc)
 	spin_unlock_irqrestore(&dc->lock, flags);
 }
 
-static int tegra_dc_cursor_set2(struct drm_crtc *crtc, struct drm_file *file,
-				uint32_t handle, uint32_t width,
-				uint32_t height, int32_t hot_x, int32_t hot_y)
-{
-	unsigned long value = CURSOR_CLIP_DISPLAY;
-	struct tegra_dc *dc = to_tegra_dc(crtc);
-	struct drm_gem_object *gem;
-	struct tegra_bo *bo = NULL;
-
-	if (!dc->soc->supports_cursor)
-		return -ENXIO;
-
-	if (width != height)
-		return -EINVAL;
-
-	switch (width) {
-	case 32:
-		value |= CURSOR_SIZE_32x32;
-		break;
-
-	case 64:
-		value |= CURSOR_SIZE_64x64;
-		break;
-
-	case 128:
-		value |= CURSOR_SIZE_128x128;
-
-	case 256:
-		value |= CURSOR_SIZE_256x256;
-		break;
-
-	default:
-		return -EINVAL;
-	}
-
-	if (handle) {
-		gem = drm_gem_object_lookup(crtc->dev, file, handle);
-		if (!gem)
-			return -ENOENT;
-
-		bo = to_tegra_bo(gem);
-	}
-
-	if (bo) {
-		unsigned long addr = (bo->paddr & 0xfffffc00) >> 10;
-#ifdef CONFIG_ARCH_DMA_ADDR_T_64BIT
-		unsigned long high = (bo->paddr & 0xfffffffc) >> 32;
-#endif
-
-		tegra_dc_writel(dc, value | addr, DC_DISP_CURSOR_START_ADDR);
-
-#ifdef CONFIG_ARCH_DMA_ADDR_T_64BIT
-		tegra_dc_writel(dc, high, DC_DISP_CURSOR_START_ADDR_HI);
-#endif
-
-		value = tegra_dc_readl(dc, DC_DISP_DISP_WIN_OPTIONS);
-		value |= CURSOR_ENABLE;
-		tegra_dc_writel(dc, value, DC_DISP_DISP_WIN_OPTIONS);
-
-		value = tegra_dc_readl(dc, DC_DISP_BLEND_CURSOR_CONTROL);
-		value &= ~CURSOR_DST_BLEND_MASK;
-		value &= ~CURSOR_SRC_BLEND_MASK;
-		value |= CURSOR_MODE_NORMAL;
-		value |= CURSOR_DST_BLEND_NEG_K1_TIMES_SRC;
-		value |= CURSOR_SRC_BLEND_K1_TIMES_SRC;
-		value |= CURSOR_ALPHA;
-		tegra_dc_writel(dc, value, DC_DISP_BLEND_CURSOR_CONTROL);
-	} else {
-		value = tegra_dc_readl(dc, DC_DISP_DISP_WIN_OPTIONS);
-		value &= ~CURSOR_ENABLE;
-		tegra_dc_writel(dc, value, DC_DISP_DISP_WIN_OPTIONS);
-	}
-
-	tegra_dc_writel(dc, CURSOR_ACT_REQ << 8, DC_CMD_STATE_CONTROL);
-	tegra_dc_writel(dc, CURSOR_ACT_REQ, DC_CMD_STATE_CONTROL);
-
-	tegra_dc_writel(dc, GENERAL_ACT_REQ << 8, DC_CMD_STATE_CONTROL);
-	tegra_dc_writel(dc, GENERAL_ACT_REQ, DC_CMD_STATE_CONTROL);
-
-	return 0;
-}
-
-static int tegra_dc_cursor_move(struct drm_crtc *crtc, int x, int y)
-{
-	struct tegra_dc *dc = to_tegra_dc(crtc);
-	unsigned long value;
-
-	if (!dc->soc->supports_cursor)
-		return -ENXIO;
-
-	value = ((y & 0x3fff) << 16) | (x & 0x3fff);
-	tegra_dc_writel(dc, value, DC_DISP_CURSOR_POSITION);
-
-	tegra_dc_writel(dc, CURSOR_ACT_REQ << 8, DC_CMD_STATE_CONTROL);
-	tegra_dc_writel(dc, CURSOR_ACT_REQ, DC_CMD_STATE_CONTROL);
-
-	/* XXX: only required on generations earlier than Tegra124? */
-	tegra_dc_writel(dc, GENERAL_ACT_REQ << 8, DC_CMD_STATE_CONTROL);
-	tegra_dc_writel(dc, GENERAL_ACT_REQ, DC_CMD_STATE_CONTROL);
-
-	return 0;
-}
-
 static void tegra_dc_finish_page_flip(struct tegra_dc *dc)
 {
 	struct drm_device *drm = dc->base.dev;
@@ -727,8 +883,6 @@ static void tegra_dc_destroy(struct drm_crtc *crtc)
 }
 
 static const struct drm_crtc_funcs tegra_crtc_funcs = {
-	.cursor_set2 = tegra_dc_cursor_set2,
-	.cursor_move = tegra_dc_cursor_move,
 	.page_flip = tegra_dc_page_flip,
 	.set_config = drm_crtc_helper_set_config,
 	.destroy = tegra_dc_destroy,
@@ -736,12 +890,13 @@ static const struct drm_crtc_funcs tegra_crtc_funcs = {
 
 static void tegra_crtc_disable(struct drm_crtc *crtc)
 {
+	struct tegra_dc *dc = to_tegra_dc(crtc);
 	struct drm_device *drm = crtc->dev;
 	struct drm_plane *plane;
 
 	drm_for_each_legacy_plane(plane, &drm->mode_config.plane_list) {
 		if (plane->crtc == crtc) {
-			tegra_plane_disable(plane);
+			tegra_window_plane_disable(plane);
 			plane->crtc = NULL;
 
 			if (plane->fb) {
@@ -752,6 +907,7 @@ static void tegra_crtc_disable(struct drm_crtc *crtc)
 	}
 
 	drm_crtc_vblank_off(crtc);
+	tegra_dc_commit(dc);
 }
 
 static bool tegra_crtc_mode_fixup(struct drm_crtc *crtc,
@@ -934,15 +1090,9 @@ static void tegra_crtc_prepare(struct drm_crtc *crtc)
 static void tegra_crtc_commit(struct drm_crtc *crtc)
 {
 	struct tegra_dc *dc = to_tegra_dc(crtc);
-	unsigned long value;
-
-	value = GENERAL_UPDATE | WIN_A_UPDATE;
-	tegra_dc_writel(dc, value, DC_CMD_STATE_CONTROL);
-
-	value = GENERAL_ACT_REQ | WIN_A_ACT_REQ;
-	tegra_dc_writel(dc, value, DC_CMD_STATE_CONTROL);
 
 	drm_crtc_vblank_on(crtc);
+	tegra_dc_commit(dc);
 }
 
 static void tegra_crtc_load_lut(struct drm_crtc *crtc)
@@ -996,7 +1146,7 @@ static int tegra_dc_show_regs(struct seq_file *s, void *data)
 	struct tegra_dc *dc = node->info_ent->data;
 
 #define DUMP_REG(name)						\
-	seq_printf(s, "%-40s %#05x %08lx\n", #name, name,	\
+	seq_printf(s, "%-40s %#05x %08x\n", #name, name,	\
 		   tegra_dc_readl(dc, name))
 
 	DUMP_REG(DC_CMD_GENERAL_INCR_SYNCPT);
@@ -1284,9 +1434,40 @@ static int tegra_dc_init(struct host1x_client *client)
 	struct drm_device *drm = dev_get_drvdata(client->parent);
 	struct tegra_dc *dc = host1x_client_to_dc(client);
 	struct tegra_drm *tegra = drm->dev_private;
+	struct drm_plane *primary = NULL;
+	struct drm_plane *cursor = NULL;
 	int err;
 
-	drm_crtc_init(drm, &dc->base, &tegra_crtc_funcs);
+	if (tegra->domain) {
+		err = iommu_attach_device(tegra->domain, dc->dev);
+		if (err < 0) {
+			dev_err(dc->dev, "failed to attach to domain: %d\n",
+				err);
+			return err;
+		}
+
+		dc->domain = tegra->domain;
+	}
+
+	primary = tegra_dc_primary_plane_create(drm, dc);
+	if (IS_ERR(primary)) {
+		err = PTR_ERR(primary);
+		goto cleanup;
+	}
+
+	if (dc->soc->supports_cursor) {
+		cursor = tegra_dc_cursor_plane_create(drm, dc);
+		if (IS_ERR(cursor)) {
+			err = PTR_ERR(cursor);
+			goto cleanup;
+		}
+	}
+
+	err = drm_crtc_init_with_planes(drm, &dc->base, primary, cursor,
+					&tegra_crtc_funcs);
+	if (err < 0)
+		goto cleanup;
+
 	drm_mode_crtc_set_gamma_size(&dc->base, 256);
 	drm_crtc_helper_add(&dc->base, &tegra_crtc_helper_funcs);
 
@@ -1300,12 +1481,12 @@ static int tegra_dc_init(struct host1x_client *client)
 	err = tegra_dc_rgb_init(drm, dc);
 	if (err < 0 && err != -ENODEV) {
 		dev_err(dc->dev, "failed to initialize RGB output: %d\n", err);
-		return err;
+		goto cleanup;
 	}
 
 	err = tegra_dc_add_planes(drm, dc);
 	if (err < 0)
-		return err;
+		goto cleanup;
 
 	if (IS_ENABLED(CONFIG_DEBUG_FS)) {
 		err = tegra_dc_debugfs_init(dc, drm->primary);
@@ -1318,10 +1499,24 @@ static int tegra_dc_init(struct host1x_client *client)
 	if (err < 0) {
 		dev_err(dc->dev, "failed to request IRQ#%u: %d\n", dc->irq,
 			err);
-		return err;
+		goto cleanup;
 	}
 
 	return 0;
+
+cleanup:
+	if (!IS_ERR_OR_NULL(cursor))
+		drm_plane_cleanup(cursor);
+
+	if (!IS_ERR_OR_NULL(primary))
+		drm_plane_cleanup(primary);
+
+	if (tegra->domain) {
+		iommu_detach_device(tegra->domain, dc->dev);
+		dc->domain = NULL;
+	}
+
+	return err;
 }
 
 static int tegra_dc_exit(struct host1x_client *client)
@@ -1343,6 +1538,11 @@ static int tegra_dc_exit(struct host1x_client *client)
 		return err;
 	}
 
+	if (dc->domain) {
+		iommu_detach_device(dc->domain, dc->dev);
+		dc->domain = NULL;
+	}
+
 	return 0;
 }
 
@@ -1356,6 +1556,7 @@ static const struct tegra_dc_soc_info tegra20_dc_soc_info = {
 	.supports_cursor = false,
 	.supports_block_linear = false,
 	.pitch_align = 8,
+	.has_powergate = false,
 };
 
 static const struct tegra_dc_soc_info tegra30_dc_soc_info = {
@@ -1363,6 +1564,7 @@ static const struct tegra_dc_soc_info tegra30_dc_soc_info = {
 	.supports_cursor = false,
 	.supports_block_linear = false,
 	.pitch_align = 8,
+	.has_powergate = false,
 };
 
 static const struct tegra_dc_soc_info tegra114_dc_soc_info = {
@@ -1370,6 +1572,7 @@ static const struct tegra_dc_soc_info tegra114_dc_soc_info = {
 	.supports_cursor = false,
 	.supports_block_linear = false,
 	.pitch_align = 64,
+	.has_powergate = true,
 };
 
 static const struct tegra_dc_soc_info tegra124_dc_soc_info = {
@@ -1377,6 +1580,7 @@ static const struct tegra_dc_soc_info tegra124_dc_soc_info = {
 	.supports_cursor = true,
 	.supports_block_linear = true,
 	.pitch_align = 64,
+	.has_powergate = true,
 };
 
 static const struct of_device_id tegra_dc_of_match[] = {
@@ -1384,6 +1588,9 @@ static const struct of_device_id tegra_dc_of_match[] = {
 		.compatible = "nvidia,tegra124-dc",
 		.data = &tegra124_dc_soc_info,
 	}, {
+		.compatible = "nvidia,tegra114-dc",
+		.data = &tegra114_dc_soc_info,
+	}, {
 		.compatible = "nvidia,tegra30-dc",
 		.data = &tegra30_dc_soc_info,
 	}, {
@@ -1466,9 +1673,34 @@ static int tegra_dc_probe(struct platform_device *pdev)
 		return PTR_ERR(dc->rst);
 	}
 
-	err = clk_prepare_enable(dc->clk);
-	if (err < 0)
-		return err;
+	if (dc->soc->has_powergate) {
+		if (dc->pipe == 0)
+			dc->powergate = TEGRA_POWERGATE_DIS;
+		else
+			dc->powergate = TEGRA_POWERGATE_DISB;
+
+		err = tegra_powergate_sequence_power_up(dc->powergate, dc->clk,
+							dc->rst);
+		if (err < 0) {
+			dev_err(&pdev->dev, "failed to power partition: %d\n",
+				err);
+			return err;
+		}
+	} else {
+		err = clk_prepare_enable(dc->clk);
+		if (err < 0) {
+			dev_err(&pdev->dev, "failed to enable clock: %d\n",
+				err);
+			return err;
+		}
+
+		err = reset_control_deassert(dc->rst);
+		if (err < 0) {
+			dev_err(&pdev->dev, "failed to deassert reset: %d\n",
+				err);
+			return err;
+		}
+	}
 
 	regs = platform_get_resource(pdev, IORESOURCE_MEM, 0);
 	dc->regs = devm_ioremap_resource(&pdev->dev, regs);
@@ -1522,6 +1754,10 @@ static int tegra_dc_remove(struct platform_device *pdev)
 	}
 
 	reset_control_assert(dc->rst);
+
+	if (dc->soc->has_powergate)
+		tegra_powergate_power_off(dc->powergate);
+
 	clk_disable_unprepare(dc->clk);
 
 	return 0;
diff --git a/drivers/gpu/drm/tegra/drm.c b/drivers/gpu/drm/tegra/drm.c
index 59736bb810cd..e549afeece1f 100644
--- a/drivers/gpu/drm/tegra/drm.c
+++ b/drivers/gpu/drm/tegra/drm.c
@@ -8,6 +8,7 @@
  */
 
 #include <linux/host1x.h>
+#include <linux/iommu.h>
 
 #include "drm.h"
 #include "gem.h"
@@ -33,6 +34,17 @@ static int tegra_drm_load(struct drm_device *drm, unsigned long flags)
 	if (!tegra)
 		return -ENOMEM;
 
+	if (iommu_present(&platform_bus_type)) {
+		tegra->domain = iommu_domain_alloc(&platform_bus_type);
+		if (!tegra->domain) {
+			err = -ENOMEM;
+			goto free;
+		}
+
+		DRM_DEBUG("IOMMU context initialized\n");
+		drm_mm_init(&tegra->mm, 0, SZ_2G);
+	}
+
 	mutex_init(&tegra->clients_lock);
 	INIT_LIST_HEAD(&tegra->clients);
 	drm->dev_private = tegra;
@@ -42,13 +54,13 @@ static int tegra_drm_load(struct drm_device *drm, unsigned long flags)
 
 	err = tegra_drm_fb_prepare(drm);
 	if (err < 0)
-		return err;
+		goto config;
 
 	drm_kms_helper_poll_init(drm);
 
 	err = host1x_device_init(device);
 	if (err < 0)
-		return err;
+		goto fbdev;
 
 	/*
 	 * We don't use the drm_irq_install() helpers provided by the DRM
@@ -59,18 +71,37 @@ static int tegra_drm_load(struct drm_device *drm, unsigned long flags)
 
 	err = drm_vblank_init(drm, drm->mode_config.num_crtc);
 	if (err < 0)
-		return err;
+		goto device;
 
 	err = tegra_drm_fb_init(drm);
 	if (err < 0)
-		return err;
+		goto vblank;
 
 	return 0;
+
+vblank:
+	drm_vblank_cleanup(drm);
+device:
+	host1x_device_exit(device);
+fbdev:
+	drm_kms_helper_poll_fini(drm);
+	tegra_drm_fb_free(drm);
+config:
+	drm_mode_config_cleanup(drm);
+
+	if (tegra->domain) {
+		iommu_domain_free(tegra->domain);
+		drm_mm_takedown(&tegra->mm);
+	}
+free:
+	kfree(tegra);
+	return err;
 }
 
 static int tegra_drm_unload(struct drm_device *drm)
 {
 	struct host1x_device *device = to_host1x_device(drm->dev);
+	struct tegra_drm *tegra = drm->dev_private;
 	int err;
 
 	drm_kms_helper_poll_fini(drm);
@@ -82,6 +113,13 @@ static int tegra_drm_unload(struct drm_device *drm)
 	if (err < 0)
 		return err;
 
+	if (tegra->domain) {
+		iommu_domain_free(tegra->domain);
+		drm_mm_takedown(&tegra->mm);
+	}
+
+	kfree(tegra);
+
 	return 0;
 }
 
diff --git a/drivers/gpu/drm/tegra/drm.h b/drivers/gpu/drm/tegra/drm.h
index e89c70fa82d5..3a3b2e7b5b3f 100644
--- a/drivers/gpu/drm/tegra/drm.h
+++ b/drivers/gpu/drm/tegra/drm.h
@@ -39,6 +39,9 @@ struct tegra_fbdev {
 struct tegra_drm {
 	struct drm_device *drm;
 
+	struct iommu_domain *domain;
+	struct drm_mm mm;
+
 	struct mutex clients_lock;
 	struct list_head clients;
 
@@ -101,6 +104,7 @@ struct tegra_dc {
 	spinlock_t lock;
 
 	struct drm_crtc base;
+	int powergate;
 	int pipe;
 
 	struct clk *clk;
@@ -120,6 +124,8 @@ struct tegra_dc {
 	struct drm_pending_vblank_event *event;
 
 	const struct tegra_dc_soc_info *soc;
+
+	struct iommu_domain *domain;
 };
 
 static inline struct tegra_dc *
@@ -133,16 +139,15 @@ static inline struct tegra_dc *to_tegra_dc(struct drm_crtc *crtc)
 	return crtc ? container_of(crtc, struct tegra_dc, base) : NULL;
 }
 
-static inline void tegra_dc_writel(struct tegra_dc *dc, unsigned long value,
-				   unsigned long reg)
+static inline void tegra_dc_writel(struct tegra_dc *dc, u32 value,
+				   unsigned long offset)
 {
-	writel(value, dc->regs + (reg << 2));
+	writel(value, dc->regs + (offset << 2));
 }
 
-static inline unsigned long tegra_dc_readl(struct tegra_dc *dc,
-					   unsigned long reg)
+static inline u32 tegra_dc_readl(struct tegra_dc *dc, unsigned long offset)
 {
-	return readl(dc->regs + (reg << 2));
+	return readl(dc->regs + (offset << 2));
 }
 
 struct tegra_dc_window {
@@ -287,6 +292,7 @@ bool tegra_fb_is_bottom_up(struct drm_framebuffer *framebuffer);
 int tegra_fb_get_tiling(struct drm_framebuffer *framebuffer,
 			struct tegra_bo_tiling *tiling);
 int tegra_drm_fb_prepare(struct drm_device *drm);
+void tegra_drm_fb_free(struct drm_device *drm);
 int tegra_drm_fb_init(struct drm_device *drm);
 void tegra_drm_fb_exit(struct drm_device *drm);
 #ifdef CONFIG_DRM_TEGRA_FBDEV
diff --git a/drivers/gpu/drm/tegra/dsi.c b/drivers/gpu/drm/tegra/dsi.c
index f7874458926a..33f67fd601c6 100644
--- a/drivers/gpu/drm/tegra/dsi.c
+++ b/drivers/gpu/drm/tegra/dsi.c
@@ -11,6 +11,7 @@
 #include <linux/host1x.h>
 #include <linux/module.h>
 #include <linux/of.h>
+#include <linux/of_platform.h>
 #include <linux/platform_device.h>
 #include <linux/reset.h>
 
@@ -26,9 +27,6 @@
 #include "dsi.h"
 #include "mipi-phy.h"
 
-#define DSI_VIDEO_FIFO_DEPTH (1920 / 4)
-#define DSI_HOST_FIFO_DEPTH 64
-
 struct tegra_dsi {
 	struct host1x_client client;
 	struct tegra_output output;
@@ -54,6 +52,13 @@ struct tegra_dsi {
 
 	struct regulator *vdd;
 	bool enabled;
+
+	unsigned int video_fifo_depth;
+	unsigned int host_fifo_depth;
+
+	/* for ganged-mode support */
+	struct tegra_dsi *master;
+	struct tegra_dsi *slave;
 };
 
 static inline struct tegra_dsi *
@@ -318,6 +323,21 @@ static const u32 pkt_seq_video_non_burst_sync_events[NUM_PKT_SEQ] = {
 	[11] = PKT_ID0(MIPI_DSI_BLANKING_PACKET) | PKT_LEN0(4),
 };
 
+static const u32 pkt_seq_command_mode[NUM_PKT_SEQ] = {
+	[ 0] = 0,
+	[ 1] = 0,
+	[ 2] = 0,
+	[ 3] = 0,
+	[ 4] = 0,
+	[ 5] = 0,
+	[ 6] = PKT_ID0(MIPI_DSI_DCS_LONG_WRITE) | PKT_LEN0(3) | PKT_LP,
+	[ 7] = 0,
+	[ 8] = 0,
+	[ 9] = 0,
+	[10] = PKT_ID0(MIPI_DSI_DCS_LONG_WRITE) | PKT_LEN0(5) | PKT_LP,
+	[11] = 0,
+};
+
 static int tegra_dsi_set_phy_timing(struct tegra_dsi *dsi)
 {
 	struct mipi_dphy_timing timing;
@@ -329,7 +349,7 @@ static int tegra_dsi_set_phy_timing(struct tegra_dsi *dsi)
 	if (rate < 0)
 		return rate;
 
-	period = DIV_ROUND_CLOSEST(1000000000UL, rate * 2);
+	period = DIV_ROUND_CLOSEST(NSEC_PER_SEC, rate * 2);
 
 	err = mipi_dphy_timing_get_default(&timing, period);
 	if (err < 0)
@@ -369,6 +389,9 @@ static int tegra_dsi_set_phy_timing(struct tegra_dsi *dsi)
 		DSI_TIMING_FIELD(timing.tago, period, 1);
 	tegra_dsi_writel(dsi, value, DSI_BTA_TIMING);
 
+	if (dsi->slave)
+		return tegra_dsi_set_phy_timing(dsi->slave);
+
 	return 0;
 }
 
@@ -426,26 +449,59 @@ static int tegra_dsi_get_format(enum mipi_dsi_pixel_format format,
 	return 0;
 }
 
-static int tegra_output_dsi_enable(struct tegra_output *output)
+static void tegra_dsi_ganged_enable(struct tegra_dsi *dsi, unsigned int start,
+				    unsigned int size)
+{
+	u32 value;
+
+	tegra_dsi_writel(dsi, start, DSI_GANGED_MODE_START);
+	tegra_dsi_writel(dsi, size << 16 | size, DSI_GANGED_MODE_SIZE);
+
+	value = DSI_GANGED_MODE_CONTROL_ENABLE;
+	tegra_dsi_writel(dsi, value, DSI_GANGED_MODE_CONTROL);
+}
+
+static void tegra_dsi_enable(struct tegra_dsi *dsi)
+{
+	u32 value;
+
+	value = tegra_dsi_readl(dsi, DSI_POWER_CONTROL);
+	value |= DSI_POWER_CONTROL_ENABLE;
+	tegra_dsi_writel(dsi, value, DSI_POWER_CONTROL);
+
+	if (dsi->slave)
+		tegra_dsi_enable(dsi->slave);
+}
+
+static unsigned int tegra_dsi_get_lanes(struct tegra_dsi *dsi)
+{
+	if (dsi->master)
+		return dsi->master->lanes + dsi->lanes;
+
+	if (dsi->slave)
+		return dsi->lanes + dsi->slave->lanes;
+
+	return dsi->lanes;
+}
+
+static int tegra_dsi_configure(struct tegra_dsi *dsi, unsigned int pipe,
+			       const struct drm_display_mode *mode)
 {
-	struct tegra_dc *dc = to_tegra_dc(output->encoder.crtc);
-	struct drm_display_mode *mode = &dc->base.mode;
 	unsigned int hact, hsw, hbp, hfp, i, mul, div;
-	struct tegra_dsi *dsi = to_dsi(output);
 	enum tegra_dsi_format format;
-	unsigned long value;
 	const u32 *pkt_seq;
+	u32 value;
 	int err;
 
-	if (dsi->enabled)
-		return 0;
-
 	if (dsi->flags & MIPI_DSI_MODE_VIDEO_SYNC_PULSE) {
 		DRM_DEBUG_KMS("Non-burst video mode with sync pulses\n");
 		pkt_seq = pkt_seq_video_non_burst_sync_pulses;
-	} else {
+	} else if (dsi->flags & MIPI_DSI_MODE_VIDEO) {
 		DRM_DEBUG_KMS("Non-burst video mode with sync events\n");
 		pkt_seq = pkt_seq_video_non_burst_sync_events;
+	} else {
+		DRM_DEBUG_KMS("Command mode\n");
+		pkt_seq = pkt_seq_command_mode;
 	}
 
 	err = tegra_dsi_get_muldiv(dsi->format, &mul, &div);
@@ -456,61 +512,136 @@ static int tegra_output_dsi_enable(struct tegra_output *output)
 	if (err < 0)
 		return err;
 
-	err = clk_enable(dsi->clk);
-	if (err < 0)
-		return err;
-
-	reset_control_deassert(dsi->rst);
-
 	value = DSI_CONTROL_CHANNEL(0) | DSI_CONTROL_FORMAT(format) |
 		DSI_CONTROL_LANES(dsi->lanes - 1) |
-		DSI_CONTROL_SOURCE(dc->pipe);
+		DSI_CONTROL_SOURCE(pipe);
 	tegra_dsi_writel(dsi, value, DSI_CONTROL);
 
-	tegra_dsi_writel(dsi, DSI_VIDEO_FIFO_DEPTH, DSI_MAX_THRESHOLD);
+	tegra_dsi_writel(dsi, dsi->video_fifo_depth, DSI_MAX_THRESHOLD);
 
-	value = DSI_HOST_CONTROL_HS | DSI_HOST_CONTROL_CS |
-		DSI_HOST_CONTROL_ECC;
+	value = DSI_HOST_CONTROL_HS;
 	tegra_dsi_writel(dsi, value, DSI_HOST_CONTROL);
 
 	value = tegra_dsi_readl(dsi, DSI_CONTROL);
+
 	if (dsi->flags & MIPI_DSI_CLOCK_NON_CONTINUOUS)
 		value |= DSI_CONTROL_HS_CLK_CTRL;
+
 	value &= ~DSI_CONTROL_TX_TRIG(3);
-	value &= ~DSI_CONTROL_DCS_ENABLE;
+
+	/* enable DCS commands for command mode */
+	if (dsi->flags & MIPI_DSI_MODE_VIDEO)
+		value &= ~DSI_CONTROL_DCS_ENABLE;
+	else
+		value |= DSI_CONTROL_DCS_ENABLE;
+
 	value |= DSI_CONTROL_VIDEO_ENABLE;
 	value &= ~DSI_CONTROL_HOST_ENABLE;
 	tegra_dsi_writel(dsi, value, DSI_CONTROL);
 
-	err = tegra_dsi_set_phy_timing(dsi);
-	if (err < 0)
-		return err;
-
 	for (i = 0; i < NUM_PKT_SEQ; i++)
 		tegra_dsi_writel(dsi, pkt_seq[i], DSI_PKT_SEQ_0_LO + i);
 
-	/* horizontal active pixels */
-	hact = mode->hdisplay * mul / div;
+	if (dsi->flags & MIPI_DSI_MODE_VIDEO) {
+		/* horizontal active pixels */
+		hact = mode->hdisplay * mul / div;
 
-	/* horizontal sync width */
-	hsw = (mode->hsync_end - mode->hsync_start) * mul / div;
-	hsw -= 10;
+		/* horizontal sync width */
+		hsw = (mode->hsync_end - mode->hsync_start) * mul / div;
+		hsw -= 10;
 
-	/* horizontal back porch */
-	hbp = (mode->htotal - mode->hsync_end) * mul / div;
-	hbp -= 14;
+		/* horizontal back porch */
+		hbp = (mode->htotal - mode->hsync_end) * mul / div;
+		hbp -= 14;
 
-	/* horizontal front porch */
-	hfp = (mode->hsync_start  - mode->hdisplay) * mul / div;
-	hfp -= 8;
+		/* horizontal front porch */
+		hfp = (mode->hsync_start - mode->hdisplay) * mul / div;
+		hfp -= 8;
 
-	tegra_dsi_writel(dsi, hsw << 16 | 0, DSI_PKT_LEN_0_1);
-	tegra_dsi_writel(dsi, hact << 16 | hbp, DSI_PKT_LEN_2_3);
-	tegra_dsi_writel(dsi, hfp, DSI_PKT_LEN_4_5);
-	tegra_dsi_writel(dsi, 0x0f0f << 16, DSI_PKT_LEN_6_7);
+		tegra_dsi_writel(dsi, hsw << 16 | 0, DSI_PKT_LEN_0_1);
+		tegra_dsi_writel(dsi, hact << 16 | hbp, DSI_PKT_LEN_2_3);
+		tegra_dsi_writel(dsi, hfp, DSI_PKT_LEN_4_5);
+		tegra_dsi_writel(dsi, 0x0f0f << 16, DSI_PKT_LEN_6_7);
 
-	/* set SOL delay */
-	tegra_dsi_writel(dsi, 8 * mul / div, DSI_SOL_DELAY);
+		/* set SOL delay (for non-burst mode only) */
+		tegra_dsi_writel(dsi, 8 * mul / div, DSI_SOL_DELAY);
+
+		/* TODO: implement ganged mode */
+	} else {
+		u16 bytes;
+
+		if (dsi->master || dsi->slave) {
+			/*
+			 * For ganged mode, assume symmetric left-right mode.
+			 */
+			bytes = 1 + (mode->hdisplay / 2) * mul / div;
+		} else {
+			/* 1 byte (DCS command) + pixel data */
+			bytes = 1 + mode->hdisplay * mul / div;
+		}
+
+		tegra_dsi_writel(dsi, 0, DSI_PKT_LEN_0_1);
+		tegra_dsi_writel(dsi, bytes << 16, DSI_PKT_LEN_2_3);
+		tegra_dsi_writel(dsi, bytes << 16, DSI_PKT_LEN_4_5);
+		tegra_dsi_writel(dsi, 0, DSI_PKT_LEN_6_7);
+
+		value = MIPI_DCS_WRITE_MEMORY_START << 8 |
+			MIPI_DCS_WRITE_MEMORY_CONTINUE;
+		tegra_dsi_writel(dsi, value, DSI_DCS_CMDS);
+
+		/* set SOL delay */
+		if (dsi->master || dsi->slave) {
+			unsigned int lanes = tegra_dsi_get_lanes(dsi);
+			unsigned long delay, bclk, bclk_ganged;
+
+			/* SOL to valid, valid to FIFO and FIFO write delay */
+			delay = 4 + 4 + 2;
+			delay = DIV_ROUND_UP(delay * mul, div * lanes);
+			/* FIFO read delay */
+			delay = delay + 6;
+
+			bclk = DIV_ROUND_UP(mode->htotal * mul, div * lanes);
+			bclk_ganged = DIV_ROUND_UP(bclk * lanes / 2, lanes);
+			value = bclk - bclk_ganged + delay + 20;
+		} else {
+			/* TODO: revisit for non-ganged mode */
+			value = 8 * mul / div;
+		}
+
+		tegra_dsi_writel(dsi, value, DSI_SOL_DELAY);
+	}
+
+	if (dsi->slave) {
+		err = tegra_dsi_configure(dsi->slave, pipe, mode);
+		if (err < 0)
+			return err;
+
+		/*
+		 * TODO: Support modes other than symmetrical left-right
+		 * split.
+		 */
+		tegra_dsi_ganged_enable(dsi, 0, mode->hdisplay / 2);
+		tegra_dsi_ganged_enable(dsi->slave, mode->hdisplay / 2,
+					mode->hdisplay / 2);
+	}
+
+	return 0;
+}
+
+static int tegra_output_dsi_enable(struct tegra_output *output)
+{
+	struct tegra_dc *dc = to_tegra_dc(output->encoder.crtc);
+	const struct drm_display_mode *mode = &dc->base.mode;
+	struct tegra_dsi *dsi = to_dsi(output);
+	u32 value;
+	int err;
+
+	if (dsi->enabled)
+		return 0;
+
+	err = tegra_dsi_configure(dsi, dc->pipe, mode);
+	if (err < 0)
+		return err;
 
 	/* enable display controller */
 	value = tegra_dc_readl(dc, DC_DISP_DISP_WIN_OPTIONS);
@@ -531,28 +662,79 @@ static int tegra_output_dsi_enable(struct tegra_output *output)
 	tegra_dc_writel(dc, GENERAL_ACT_REQ, DC_CMD_STATE_CONTROL);
 
 	/* enable DSI controller */
-	value = tegra_dsi_readl(dsi, DSI_POWER_CONTROL);
-	value |= DSI_POWER_CONTROL_ENABLE;
-	tegra_dsi_writel(dsi, value, DSI_POWER_CONTROL);
+	tegra_dsi_enable(dsi);
 
 	dsi->enabled = true;
 
 	return 0;
 }
 
+static int tegra_dsi_wait_idle(struct tegra_dsi *dsi, unsigned long timeout)
+{
+	u32 value;
+
+	timeout = jiffies + msecs_to_jiffies(timeout);
+
+	while (time_before(jiffies, timeout)) {
+		value = tegra_dsi_readl(dsi, DSI_STATUS);
+		if (value & DSI_STATUS_IDLE)
+			return 0;
+
+		usleep_range(1000, 2000);
+	}
+
+	return -ETIMEDOUT;
+}
+
+static void tegra_dsi_video_disable(struct tegra_dsi *dsi)
+{
+	u32 value;
+
+	value = tegra_dsi_readl(dsi, DSI_CONTROL);
+	value &= ~DSI_CONTROL_VIDEO_ENABLE;
+	tegra_dsi_writel(dsi, value, DSI_CONTROL);
+
+	if (dsi->slave)
+		tegra_dsi_video_disable(dsi->slave);
+}
+
+static void tegra_dsi_ganged_disable(struct tegra_dsi *dsi)
+{
+	tegra_dsi_writel(dsi, 0, DSI_GANGED_MODE_START);
+	tegra_dsi_writel(dsi, 0, DSI_GANGED_MODE_SIZE);
+	tegra_dsi_writel(dsi, 0, DSI_GANGED_MODE_CONTROL);
+}
+
+static void tegra_dsi_disable(struct tegra_dsi *dsi)
+{
+	u32 value;
+
+	if (dsi->slave) {
+		tegra_dsi_ganged_disable(dsi->slave);
+		tegra_dsi_ganged_disable(dsi);
+	}
+
+	value = tegra_dsi_readl(dsi, DSI_POWER_CONTROL);
+	value &= ~DSI_POWER_CONTROL_ENABLE;
+	tegra_dsi_writel(dsi, value, DSI_POWER_CONTROL);
+
+	if (dsi->slave)
+		tegra_dsi_disable(dsi->slave);
+
+	usleep_range(5000, 10000);
+}
+
 static int tegra_output_dsi_disable(struct tegra_output *output)
 {
 	struct tegra_dc *dc = to_tegra_dc(output->encoder.crtc);
 	struct tegra_dsi *dsi = to_dsi(output);
 	unsigned long value;
+	int err;
 
 	if (!dsi->enabled)
 		return 0;
 
-	/* disable DSI controller */
-	value = tegra_dsi_readl(dsi, DSI_POWER_CONTROL);
-	value &= ~DSI_POWER_CONTROL_ENABLE;
-	tegra_dsi_writel(dsi, value, DSI_POWER_CONTROL);
+	tegra_dsi_video_disable(dsi);
 
 	/*
 	 * The following accesses registers of the display controller, so make
@@ -576,39 +758,68 @@ static int tegra_output_dsi_disable(struct tegra_output *output)
 		tegra_dc_writel(dc, GENERAL_ACT_REQ, DC_CMD_STATE_CONTROL);
 	}
 
-	clk_disable(dsi->clk);
+	err = tegra_dsi_wait_idle(dsi, 100);
+	if (err < 0)
+		dev_dbg(dsi->dev, "failed to idle DSI: %d\n", err);
+
+	tegra_dsi_disable(dsi);
 
 	dsi->enabled = false;
 
 	return 0;
 }
 
+static void tegra_dsi_set_timeout(struct tegra_dsi *dsi, unsigned long bclk,
+				  unsigned int vrefresh)
+{
+	unsigned int timeout;
+	u32 value;
+
+	/* one frame high-speed transmission timeout */
+	timeout = (bclk / vrefresh) / 512;
+	value = DSI_TIMEOUT_LRX(0x2000) | DSI_TIMEOUT_HTX(timeout);
+	tegra_dsi_writel(dsi, value, DSI_TIMEOUT_0);
+
+	/* 2 ms peripheral timeout for panel */
+	timeout = 2 * bclk / 512 * 1000;
+	value = DSI_TIMEOUT_PR(timeout) | DSI_TIMEOUT_TA(0x2000);
+	tegra_dsi_writel(dsi, value, DSI_TIMEOUT_1);
+
+	value = DSI_TALLY_TA(0) | DSI_TALLY_LRX(0) | DSI_TALLY_HTX(0);
+	tegra_dsi_writel(dsi, value, DSI_TO_TALLY);
+
+	if (dsi->slave)
+		tegra_dsi_set_timeout(dsi->slave, bclk, vrefresh);
+}
+
 static int tegra_output_dsi_setup_clock(struct tegra_output *output,
 					struct clk *clk, unsigned long pclk,
 					unsigned int *divp)
 {
 	struct tegra_dc *dc = to_tegra_dc(output->encoder.crtc);
 	struct drm_display_mode *mode = &dc->base.mode;
-	unsigned int timeout, mul, div, vrefresh;
 	struct tegra_dsi *dsi = to_dsi(output);
-	unsigned long bclk, plld, value;
+	unsigned int mul, div, vrefresh, lanes;
+	unsigned long bclk, plld;
 	int err;
 
+	lanes = tegra_dsi_get_lanes(dsi);
+
 	err = tegra_dsi_get_muldiv(dsi->format, &mul, &div);
 	if (err < 0)
 		return err;
 
-	DRM_DEBUG_KMS("mul: %u, div: %u, lanes: %u\n", mul, div, dsi->lanes);
+	DRM_DEBUG_KMS("mul: %u, div: %u, lanes: %u\n", mul, div, lanes);
 	vrefresh = drm_mode_vrefresh(mode);
 	DRM_DEBUG_KMS("vrefresh: %u\n", vrefresh);
 
 	/* compute byte clock */
-	bclk = (pclk * mul) / (div * dsi->lanes);
+	bclk = (pclk * mul) / (div * lanes);
 
 	/*
 	 * Compute bit clock and round up to the next MHz.
 	 */
-	plld = DIV_ROUND_UP(bclk * 8, 1000000) * 1000000;
+	plld = DIV_ROUND_UP(bclk * 8, USEC_PER_SEC) * USEC_PER_SEC;
 
 	/*
 	 * We divide the frequency by two here, but we make up for that by
@@ -640,25 +851,17 @@ static int tegra_output_dsi_setup_clock(struct tegra_output *output,
 	 * not working properly otherwise. Perhaps the PLLs cannot generate
 	 * frequencies sufficiently high.
 	 */
-	*divp = ((8 * mul) / (div * dsi->lanes)) - 2;
+	*divp = ((8 * mul) / (div * lanes)) - 2;
 
 	/*
 	 * XXX: Move the below somewhere else so that we don't need to have
 	 * access to the vrefresh in this function?
 	 */
+	tegra_dsi_set_timeout(dsi, bclk, vrefresh);
 
-	/* one frame high-speed transmission timeout */
-	timeout = (bclk / vrefresh) / 512;
-	value = DSI_TIMEOUT_LRX(0x2000) | DSI_TIMEOUT_HTX(timeout);
-	tegra_dsi_writel(dsi, value, DSI_TIMEOUT_0);
-
-	/* 2 ms peripheral timeout for panel */
-	timeout = 2 * bclk / 512 * 1000;
-	value = DSI_TIMEOUT_PR(timeout) | DSI_TIMEOUT_TA(0x2000);
-	tegra_dsi_writel(dsi, value, DSI_TIMEOUT_1);
-
-	value = DSI_TALLY_TA(0) | DSI_TALLY_LRX(0) | DSI_TALLY_HTX(0);
-	tegra_dsi_writel(dsi, value, DSI_TO_TALLY);
+	err = tegra_dsi_set_phy_timing(dsi);
+	if (err < 0)
+		return err;
 
 	return 0;
 }
@@ -695,7 +898,7 @@ static int tegra_dsi_pad_enable(struct tegra_dsi *dsi)
 
 static int tegra_dsi_pad_calibrate(struct tegra_dsi *dsi)
 {
-	unsigned long value;
+	u32 value;
 
 	tegra_dsi_writel(dsi, 0, DSI_PAD_CONTROL_0);
 	tegra_dsi_writel(dsi, 0, DSI_PAD_CONTROL_1);
@@ -720,14 +923,17 @@ static int tegra_dsi_init(struct host1x_client *client)
 	struct tegra_dsi *dsi = host1x_client_to_dsi(client);
 	int err;
 
-	dsi->output.type = TEGRA_OUTPUT_DSI;
-	dsi->output.dev = client->dev;
-	dsi->output.ops = &dsi_ops;
-
-	err = tegra_output_init(drm, &dsi->output);
-	if (err < 0) {
-		dev_err(client->dev, "output setup failed: %d\n", err);
-		return err;
+	/* Gangsters must not register their own outputs. */
+	if (!dsi->master) {
+		dsi->output.type = TEGRA_OUTPUT_DSI;
+		dsi->output.dev = client->dev;
+		dsi->output.ops = &dsi_ops;
+
+		err = tegra_output_init(drm, &dsi->output);
+		if (err < 0) {
+			dev_err(client->dev, "output setup failed: %d\n", err);
+			return err;
+		}
 	}
 
 	if (IS_ENABLED(CONFIG_DEBUG_FS)) {
@@ -736,12 +942,6 @@ static int tegra_dsi_init(struct host1x_client *client)
 			dev_err(dsi->dev, "debugfs setup failed: %d\n", err);
 	}
 
-	err = tegra_dsi_pad_calibrate(dsi);
-	if (err < 0) {
-		dev_err(dsi->dev, "MIPI calibration failed: %d\n", err);
-		return err;
-	}
-
 	return 0;
 }
 
@@ -756,16 +956,20 @@ static int tegra_dsi_exit(struct host1x_client *client)
 			dev_err(dsi->dev, "debugfs cleanup failed: %d\n", err);
 	}
 
-	err = tegra_output_disable(&dsi->output);
-	if (err < 0) {
-		dev_err(client->dev, "output failed to disable: %d\n", err);
-		return err;
-	}
-
-	err = tegra_output_exit(&dsi->output);
-	if (err < 0) {
-		dev_err(client->dev, "output cleanup failed: %d\n", err);
-		return err;
+	if (!dsi->master) {
+		err = tegra_output_disable(&dsi->output);
+		if (err < 0) {
+			dev_err(client->dev, "output failed to disable: %d\n",
+				err);
+			return err;
+		}
+
+		err = tegra_output_exit(&dsi->output);
+		if (err < 0) {
+			dev_err(client->dev, "output cleanup failed: %d\n",
+				err);
+			return err;
+		}
 	}
 
 	return 0;
@@ -792,20 +996,324 @@ static int tegra_dsi_setup_clocks(struct tegra_dsi *dsi)
 	return 0;
 }
 
+static const char * const error_report[16] = {
+	"SoT Error",
+	"SoT Sync Error",
+	"EoT Sync Error",
+	"Escape Mode Entry Command Error",
+	"Low-Power Transmit Sync Error",
+	"Peripheral Timeout Error",
+	"False Control Error",
+	"Contention Detected",
+	"ECC Error, single-bit",
+	"ECC Error, multi-bit",
+	"Checksum Error",
+	"DSI Data Type Not Recognized",
+	"DSI VC ID Invalid",
+	"Invalid Transmission Length",
+	"Reserved",
+	"DSI Protocol Violation",
+};
+
+static ssize_t tegra_dsi_read_response(struct tegra_dsi *dsi,
+				       const struct mipi_dsi_msg *msg,
+				       size_t count)
+{
+	u8 *rx = msg->rx_buf;
+	unsigned int i, j, k;
+	size_t size = 0;
+	u16 errors;
+	u32 value;
+
+	/* read and parse packet header */
+	value = tegra_dsi_readl(dsi, DSI_RD_DATA);
+
+	switch (value & 0x3f) {
+	case MIPI_DSI_RX_ACKNOWLEDGE_AND_ERROR_REPORT:
+		errors = (value >> 8) & 0xffff;
+		dev_dbg(dsi->dev, "Acknowledge and error report: %04x\n",
+			errors);
+		for (i = 0; i < ARRAY_SIZE(error_report); i++)
+			if (errors & BIT(i))
+				dev_dbg(dsi->dev, "  %2u: %s\n", i,
+					error_report[i]);
+		break;
+
+	case MIPI_DSI_RX_DCS_SHORT_READ_RESPONSE_1BYTE:
+		rx[0] = (value >> 8) & 0xff;
+		size = 1;
+		break;
+
+	case MIPI_DSI_RX_DCS_SHORT_READ_RESPONSE_2BYTE:
+		rx[0] = (value >>  8) & 0xff;
+		rx[1] = (value >> 16) & 0xff;
+		size = 2;
+		break;
+
+	case MIPI_DSI_RX_DCS_LONG_READ_RESPONSE:
+		size = ((value >> 8) & 0xff00) | ((value >> 8) & 0xff);
+		break;
+
+	case MIPI_DSI_RX_GENERIC_LONG_READ_RESPONSE:
+		size = ((value >> 8) & 0xff00) | ((value >> 8) & 0xff);
+		break;
+
+	default:
+		dev_err(dsi->dev, "unhandled response type: %02x\n",
+			value & 0x3f);
+		return -EPROTO;
+	}
+
+	size = min(size, msg->rx_len);
+
+	if (msg->rx_buf && size > 0) {
+		for (i = 0, j = 0; i < count - 1; i++, j += 4) {
+			u8 *rx = msg->rx_buf + j;
+
+			value = tegra_dsi_readl(dsi, DSI_RD_DATA);
+
+			for (k = 0; k < 4 && (j + k) < msg->rx_len; k++)
+				rx[k] = (value >> (k << 3)) & 0xff;
+		}
+	}
+
+	return size;
+}
+
+static int tegra_dsi_transmit(struct tegra_dsi *dsi, unsigned long timeout)
+{
+	tegra_dsi_writel(dsi, DSI_TRIGGER_HOST, DSI_TRIGGER);
+
+	timeout = jiffies + msecs_to_jiffies(timeout);
+
+	while (time_before(jiffies, timeout)) {
+		u32 value = tegra_dsi_readl(dsi, DSI_TRIGGER);
+		if ((value & DSI_TRIGGER_HOST) == 0)
+			return 0;
+
+		usleep_range(1000, 2000);
+	}
+
+	DRM_DEBUG_KMS("timeout waiting for transmission to complete\n");
+	return -ETIMEDOUT;
+}
+
+static int tegra_dsi_wait_for_response(struct tegra_dsi *dsi,
+				       unsigned long timeout)
+{
+	timeout = jiffies + msecs_to_jiffies(timeout);
+
+	while (time_before(jiffies, timeout)) {
+		u32 value = tegra_dsi_readl(dsi, DSI_STATUS);
+		u8 count = value & 0x1f;
+
+		if (count > 0)
+			return count;
+
+		usleep_range(1000, 2000);
+	}
+
+	DRM_DEBUG_KMS("peripheral returned no data\n");
+	return -ETIMEDOUT;
+}
+
+static void tegra_dsi_writesl(struct tegra_dsi *dsi, unsigned long offset,
+			      const void *buffer, size_t size)
+{
+	const u8 *buf = buffer;
+	size_t i, j;
+	u32 value;
+
+	for (j = 0; j < size; j += 4) {
+		value = 0;
+
+		for (i = 0; i < 4 && j + i < size; i++)
+			value |= buf[j + i] << (i << 3);
+
+		tegra_dsi_writel(dsi, value, DSI_WR_DATA);
+	}
+}
+
+static ssize_t tegra_dsi_host_transfer(struct mipi_dsi_host *host,
+				       const struct mipi_dsi_msg *msg)
+{
+	struct tegra_dsi *dsi = host_to_tegra(host);
+	struct mipi_dsi_packet packet;
+	const u8 *header;
+	size_t count;
+	ssize_t err;
+	u32 value;
+
+	err = mipi_dsi_create_packet(&packet, msg);
+	if (err < 0)
+		return err;
+
+	header = packet.header;
+
+	/* maximum FIFO depth is 1920 words */
+	if (packet.size > dsi->video_fifo_depth * 4)
+		return -ENOSPC;
+
+	/* reset underflow/overflow flags */
+	value = tegra_dsi_readl(dsi, DSI_STATUS);
+	if (value & (DSI_STATUS_UNDERFLOW | DSI_STATUS_OVERFLOW)) {
+		value = DSI_HOST_CONTROL_FIFO_RESET;
+		tegra_dsi_writel(dsi, value, DSI_HOST_CONTROL);
+		usleep_range(10, 20);
+	}
+
+	value = tegra_dsi_readl(dsi, DSI_POWER_CONTROL);
+	value |= DSI_POWER_CONTROL_ENABLE;
+	tegra_dsi_writel(dsi, value, DSI_POWER_CONTROL);
+
+	usleep_range(5000, 10000);
+
+	value = DSI_HOST_CONTROL_CRC_RESET | DSI_HOST_CONTROL_TX_TRIG_HOST |
+		DSI_HOST_CONTROL_CS | DSI_HOST_CONTROL_ECC;
+
+	if ((msg->flags & MIPI_DSI_MSG_USE_LPM) == 0)
+		value |= DSI_HOST_CONTROL_HS;
+
+	/*
+	 * The host FIFO has a maximum of 64 words, so larger transmissions
+	 * need to use the video FIFO.
+	 */
+	if (packet.size > dsi->host_fifo_depth * 4)
+		value |= DSI_HOST_CONTROL_FIFO_SEL;
+
+	tegra_dsi_writel(dsi, value, DSI_HOST_CONTROL);
+
+	/*
+	 * For reads and messages with explicitly requested ACK, generate a
+	 * BTA sequence after the transmission of the packet.
+	 */
+	if ((msg->flags & MIPI_DSI_MSG_REQ_ACK) ||
+	    (msg->rx_buf && msg->rx_len > 0)) {
+		value = tegra_dsi_readl(dsi, DSI_HOST_CONTROL);
+		value |= DSI_HOST_CONTROL_PKT_BTA;
+		tegra_dsi_writel(dsi, value, DSI_HOST_CONTROL);
+	}
+
+	value = DSI_CONTROL_LANES(0) | DSI_CONTROL_HOST_ENABLE;
+	tegra_dsi_writel(dsi, value, DSI_CONTROL);
+
+	/* write packet header, ECC is generated by hardware */
+	value = header[2] << 16 | header[1] << 8 | header[0];
+	tegra_dsi_writel(dsi, value, DSI_WR_DATA);
+
+	/* write payload (if any) */
+	if (packet.payload_length > 0)
+		tegra_dsi_writesl(dsi, DSI_WR_DATA, packet.payload,
+				  packet.payload_length);
+
+	err = tegra_dsi_transmit(dsi, 250);
+	if (err < 0)
+		return err;
+
+	if ((msg->flags & MIPI_DSI_MSG_REQ_ACK) ||
+	    (msg->rx_buf && msg->rx_len > 0)) {
+		err = tegra_dsi_wait_for_response(dsi, 250);
+		if (err < 0)
+			return err;
+
+		count = err;
+
+		value = tegra_dsi_readl(dsi, DSI_RD_DATA);
+		switch (value) {
+		case 0x84:
+			/*
+			dev_dbg(dsi->dev, "ACK\n");
+			*/
+			break;
+
+		case 0x87:
+			/*
+			dev_dbg(dsi->dev, "ESCAPE\n");
+			*/
+			break;
+
+		default:
+			dev_err(dsi->dev, "unknown status: %08x\n", value);
+			break;
+		}
+
+		if (count > 1) {
+			err = tegra_dsi_read_response(dsi, msg, count);
+			if (err < 0)
+				dev_err(dsi->dev,
+					"failed to parse response: %zd\n",
+					err);
+			else {
+				/*
+				 * For read commands, return the number of
+				 * bytes returned by the peripheral.
+				 */
+				count = err;
+			}
+		}
+	} else {
+		/*
+		 * For write commands, we have transmitted the 4-byte header
+		 * plus the variable-length payload.
+		 */
+		count = 4 + packet.payload_length;
+	}
+
+	return count;
+}
+
+static int tegra_dsi_ganged_setup(struct tegra_dsi *dsi)
+{
+	struct clk *parent;
+	int err;
+
+	/* make sure both DSI controllers share the same PLL */
+	parent = clk_get_parent(dsi->slave->clk);
+	if (!parent)
+		return -EINVAL;
+
+	err = clk_set_parent(parent, dsi->clk_parent);
+	if (err < 0)
+		return err;
+
+	return 0;
+}
+
 static int tegra_dsi_host_attach(struct mipi_dsi_host *host,
 				 struct mipi_dsi_device *device)
 {
 	struct tegra_dsi *dsi = host_to_tegra(host);
-	struct tegra_output *output = &dsi->output;
 
 	dsi->flags = device->mode_flags;
 	dsi->format = device->format;
 	dsi->lanes = device->lanes;
 
-	output->panel = of_drm_find_panel(device->dev.of_node);
-	if (output->panel) {
-		if (output->connector.dev)
+	if (dsi->slave) {
+		int err;
+
+		dev_dbg(dsi->dev, "attaching dual-channel device %s\n",
+			dev_name(&device->dev));
+
+		err = tegra_dsi_ganged_setup(dsi);
+		if (err < 0) {
+			dev_err(dsi->dev, "failed to set up ganged mode: %d\n",
+				err);
+			return err;
+		}
+	}
+
+	/*
+	 * Slaves don't have a panel associated with them, so they provide
+	 * merely the second channel.
+	 */
+	if (!dsi->master) {
+		struct tegra_output *output = &dsi->output;
+
+		output->panel = of_drm_find_panel(device->dev.of_node);
+		if (output->panel && output->connector.dev) {
+			drm_panel_attach(output->panel, &output->connector);
 			drm_helper_hpd_irq_event(output->connector.dev);
+		}
 	}
 
 	return 0;
@@ -818,10 +1326,10 @@ static int tegra_dsi_host_detach(struct mipi_dsi_host *host,
 	struct tegra_output *output = &dsi->output;
 
 	if (output->panel && &device->dev == output->panel->dev) {
+		output->panel = NULL;
+
 		if (output->connector.dev)
 			drm_helper_hpd_irq_event(output->connector.dev);
-
-		output->panel = NULL;
 	}
 
 	return 0;
@@ -830,8 +1338,29 @@ static int tegra_dsi_host_detach(struct mipi_dsi_host *host,
 static const struct mipi_dsi_host_ops tegra_dsi_host_ops = {
 	.attach = tegra_dsi_host_attach,
 	.detach = tegra_dsi_host_detach,
+	.transfer = tegra_dsi_host_transfer,
 };
 
+static int tegra_dsi_ganged_probe(struct tegra_dsi *dsi)
+{
+	struct device_node *np;
+
+	np = of_parse_phandle(dsi->dev->of_node, "nvidia,ganged-mode", 0);
+	if (np) {
+		struct platform_device *gangster = of_find_device_by_node(np);
+
+		of_node_put(np);
+		if (gangster)
+			dsi->slave = platform_get_drvdata(gangster);
+		if (!dsi->slave)
+			return -EPROBE_DEFER;
+
+		dsi->slave->master = dsi;
+	}
+
+	return 0;
+}
+
 static int tegra_dsi_probe(struct platform_device *pdev)
 {
 	struct tegra_dsi *dsi;
@@ -843,11 +1372,19 @@ static int tegra_dsi_probe(struct platform_device *pdev)
 		return -ENOMEM;
 
 	dsi->output.dev = dsi->dev = &pdev->dev;
+	dsi->video_fifo_depth = 1920;
+	dsi->host_fifo_depth = 64;
+
+	err = tegra_dsi_ganged_probe(dsi);
+	if (err < 0)
+		return err;
 
 	err = tegra_output_probe(&dsi->output);
 	if (err < 0)
 		return err;
 
+	dsi->output.connector.polled = DRM_CONNECTOR_POLL_HPD;
+
 	/*
 	 * Assume these values by default. When a DSI peripheral driver
 	 * attaches to the DSI host, the parameters will be taken from
@@ -861,68 +1398,83 @@ static int tegra_dsi_probe(struct platform_device *pdev)
 	if (IS_ERR(dsi->rst))
 		return PTR_ERR(dsi->rst);
 
+	err = reset_control_deassert(dsi->rst);
+	if (err < 0) {
+		dev_err(&pdev->dev, "failed to bring DSI out of reset: %d\n",
+			err);
+		return err;
+	}
+
 	dsi->clk = devm_clk_get(&pdev->dev, NULL);
 	if (IS_ERR(dsi->clk)) {
 		dev_err(&pdev->dev, "cannot get DSI clock\n");
-		return PTR_ERR(dsi->clk);
+		err = PTR_ERR(dsi->clk);
+		goto reset;
 	}
 
 	err = clk_prepare_enable(dsi->clk);
 	if (err < 0) {
 		dev_err(&pdev->dev, "cannot enable DSI clock\n");
-		return err;
+		goto reset;
 	}
 
 	dsi->clk_lp = devm_clk_get(&pdev->dev, "lp");
 	if (IS_ERR(dsi->clk_lp)) {
 		dev_err(&pdev->dev, "cannot get low-power clock\n");
-		return PTR_ERR(dsi->clk_lp);
+		err = PTR_ERR(dsi->clk_lp);
+		goto disable_clk;
 	}
 
 	err = clk_prepare_enable(dsi->clk_lp);
 	if (err < 0) {
 		dev_err(&pdev->dev, "cannot enable low-power clock\n");
-		return err;
+		goto disable_clk;
 	}
 
 	dsi->clk_parent = devm_clk_get(&pdev->dev, "parent");
 	if (IS_ERR(dsi->clk_parent)) {
 		dev_err(&pdev->dev, "cannot get parent clock\n");
-		return PTR_ERR(dsi->clk_parent);
-	}
-
-	err = clk_prepare_enable(dsi->clk_parent);
-	if (err < 0) {
-		dev_err(&pdev->dev, "cannot enable parent clock\n");
-		return err;
+		err = PTR_ERR(dsi->clk_parent);
+		goto disable_clk_lp;
 	}
 
 	dsi->vdd = devm_regulator_get(&pdev->dev, "avdd-dsi-csi");
 	if (IS_ERR(dsi->vdd)) {
 		dev_err(&pdev->dev, "cannot get VDD supply\n");
-		return PTR_ERR(dsi->vdd);
+		err = PTR_ERR(dsi->vdd);
+		goto disable_clk_lp;
 	}
 
 	err = regulator_enable(dsi->vdd);
 	if (err < 0) {
 		dev_err(&pdev->dev, "cannot enable VDD supply\n");
-		return err;
+		goto disable_clk_lp;
 	}
 
 	err = tegra_dsi_setup_clocks(dsi);
 	if (err < 0) {
 		dev_err(&pdev->dev, "cannot setup clocks\n");
-		return err;
+		goto disable_vdd;
 	}
 
 	regs = platform_get_resource(pdev, IORESOURCE_MEM, 0);
 	dsi->regs = devm_ioremap_resource(&pdev->dev, regs);
-	if (IS_ERR(dsi->regs))
-		return PTR_ERR(dsi->regs);
+	if (IS_ERR(dsi->regs)) {
+		err = PTR_ERR(dsi->regs);
+		goto disable_vdd;
+	}
 
 	dsi->mipi = tegra_mipi_request(&pdev->dev);
-	if (IS_ERR(dsi->mipi))
-		return PTR_ERR(dsi->mipi);
+	if (IS_ERR(dsi->mipi)) {
+		err = PTR_ERR(dsi->mipi);
+		goto disable_vdd;
+	}
+
+	err = tegra_dsi_pad_calibrate(dsi);
+	if (err < 0) {
+		dev_err(dsi->dev, "MIPI calibration failed: %d\n", err);
+		goto mipi_free;
+	}
 
 	dsi->host.ops = &tegra_dsi_host_ops;
 	dsi->host.dev = &pdev->dev;
@@ -930,7 +1482,7 @@ static int tegra_dsi_probe(struct platform_device *pdev)
 	err = mipi_dsi_host_register(&dsi->host);
 	if (err < 0) {
 		dev_err(&pdev->dev, "failed to register DSI host: %d\n", err);
-		return err;
+		goto mipi_free;
 	}
 
 	INIT_LIST_HEAD(&dsi->client.list);
@@ -941,12 +1493,26 @@ static int tegra_dsi_probe(struct platform_device *pdev)
 	if (err < 0) {
 		dev_err(&pdev->dev, "failed to register host1x client: %d\n",
 			err);
-		return err;
+		goto unregister;
 	}
 
 	platform_set_drvdata(pdev, dsi);
 
 	return 0;
+
+unregister:
+	mipi_dsi_host_unregister(&dsi->host);
+mipi_free:
+	tegra_mipi_free(dsi->mipi);
+disable_vdd:
+	regulator_disable(dsi->vdd);
+disable_clk_lp:
+	clk_disable_unprepare(dsi->clk_lp);
+disable_clk:
+	clk_disable_unprepare(dsi->clk);
+reset:
+	reset_control_assert(dsi->rst);
+	return err;
 }
 
 static int tegra_dsi_remove(struct platform_device *pdev)
@@ -965,7 +1531,6 @@ static int tegra_dsi_remove(struct platform_device *pdev)
 	tegra_mipi_free(dsi->mipi);
 
 	regulator_disable(dsi->vdd);
-	clk_disable_unprepare(dsi->clk_parent);
 	clk_disable_unprepare(dsi->clk_lp);
 	clk_disable_unprepare(dsi->clk);
 	reset_control_assert(dsi->rst);
diff --git a/drivers/gpu/drm/tegra/dsi.h b/drivers/gpu/drm/tegra/dsi.h
index 5ce610d08d77..bad1006a5150 100644
--- a/drivers/gpu/drm/tegra/dsi.h
+++ b/drivers/gpu/drm/tegra/dsi.h
@@ -21,9 +21,16 @@
 #define DSI_INT_STATUS			0x0d
 #define DSI_INT_MASK			0x0e
 #define DSI_HOST_CONTROL		0x0f
+#define DSI_HOST_CONTROL_FIFO_RESET	(1 << 21)
+#define DSI_HOST_CONTROL_CRC_RESET	(1 << 20)
+#define DSI_HOST_CONTROL_TX_TRIG_SOL	(0 << 12)
+#define DSI_HOST_CONTROL_TX_TRIG_FIFO	(1 << 12)
+#define DSI_HOST_CONTROL_TX_TRIG_HOST	(2 << 12)
 #define DSI_HOST_CONTROL_RAW		(1 << 6)
 #define DSI_HOST_CONTROL_HS		(1 << 5)
-#define DSI_HOST_CONTROL_BTA		(1 << 2)
+#define DSI_HOST_CONTROL_FIFO_SEL	(1 << 4)
+#define DSI_HOST_CONTROL_IMM_BTA	(1 << 3)
+#define DSI_HOST_CONTROL_PKT_BTA	(1 << 2)
 #define DSI_HOST_CONTROL_CS		(1 << 1)
 #define DSI_HOST_CONTROL_ECC		(1 << 0)
 #define DSI_CONTROL			0x10
@@ -39,9 +46,13 @@
 #define DSI_SOL_DELAY			0x11
 #define DSI_MAX_THRESHOLD		0x12
 #define DSI_TRIGGER			0x13
+#define DSI_TRIGGER_HOST		(1 << 1)
+#define DSI_TRIGGER_VIDEO		(1 << 0)
 #define DSI_TX_CRC			0x14
 #define DSI_STATUS			0x15
 #define DSI_STATUS_IDLE			(1 << 10)
+#define DSI_STATUS_UNDERFLOW		(1 <<  9)
+#define DSI_STATUS_OVERFLOW		(1 <<  8)
 #define DSI_INIT_SEQ_CONTROL		0x1a
 #define DSI_INIT_SEQ_DATA_0		0x1b
 #define DSI_INIT_SEQ_DATA_1		0x1c
@@ -104,6 +115,7 @@
 #define DSI_PAD_CONTROL_3		0x51
 #define DSI_PAD_CONTROL_4		0x52
 #define DSI_GANGED_MODE_CONTROL		0x53
+#define DSI_GANGED_MODE_CONTROL_ENABLE	(1 << 0)
 #define DSI_GANGED_MODE_START		0x54
 #define DSI_GANGED_MODE_SIZE		0x55
 #define DSI_RAW_DATA_BYTE_COUNT		0x56
diff --git a/drivers/gpu/drm/tegra/fb.c b/drivers/gpu/drm/tegra/fb.c
index 3513d12d5aa1..e9c715d89261 100644
--- a/drivers/gpu/drm/tegra/fb.c
+++ b/drivers/gpu/drm/tegra/fb.c
@@ -65,8 +65,12 @@ static void tegra_fb_destroy(struct drm_framebuffer *framebuffer)
 	for (i = 0; i < fb->num_planes; i++) {
 		struct tegra_bo *bo = fb->planes[i];
 
-		if (bo)
+		if (bo) {
+			if (bo->pages && bo->vaddr)
+				vunmap(bo->vaddr);
+
 			drm_gem_object_unreference_unlocked(&bo->gem);
+		}
 	}
 
 	drm_framebuffer_cleanup(framebuffer);
@@ -223,14 +227,16 @@ static int tegra_fbdev_probe(struct drm_fb_helper *helper,
 	info = framebuffer_alloc(0, drm->dev);
 	if (!info) {
 		dev_err(drm->dev, "failed to allocate framebuffer info\n");
-		tegra_bo_free_object(&bo->gem);
+		drm_gem_object_unreference_unlocked(&bo->gem);
 		return -ENOMEM;
 	}
 
 	fbdev->fb = tegra_fb_alloc(drm, &cmd, &bo, 1);
 	if (IS_ERR(fbdev->fb)) {
-		dev_err(drm->dev, "failed to allocate DRM framebuffer\n");
 		err = PTR_ERR(fbdev->fb);
+		dev_err(drm->dev, "failed to allocate DRM framebuffer: %d\n",
+			err);
+		drm_gem_object_unreference_unlocked(&bo->gem);
 		goto release;
 	}
 
@@ -254,6 +260,16 @@ static int tegra_fbdev_probe(struct drm_fb_helper *helper,
 	offset = info->var.xoffset * bytes_per_pixel +
 		 info->var.yoffset * fb->pitches[0];
 
+	if (bo->pages) {
+		bo->vaddr = vmap(bo->pages, bo->num_pages, VM_MAP,
+				 pgprot_writecombine(PAGE_KERNEL));
+		if (!bo->vaddr) {
+			dev_err(drm->dev, "failed to vmap() framebuffer\n");
+			err = -ENOMEM;
+			goto destroy;
+		}
+	}
+
 	drm->mode_config.fb_base = (resource_size_t)bo->paddr;
 	info->screen_base = (void __iomem *)bo->vaddr + offset;
 	info->screen_size = size;
@@ -289,6 +305,11 @@ static struct tegra_fbdev *tegra_fbdev_create(struct drm_device *drm)
 	return fbdev;
 }
 
+static void tegra_fbdev_free(struct tegra_fbdev *fbdev)
+{
+	kfree(fbdev);
+}
+
 static int tegra_fbdev_init(struct tegra_fbdev *fbdev,
 			    unsigned int preferred_bpp,
 			    unsigned int num_crtc,
@@ -299,19 +320,21 @@ static int tegra_fbdev_init(struct tegra_fbdev *fbdev,
 
 	err = drm_fb_helper_init(drm, &fbdev->base, num_crtc, max_connectors);
 	if (err < 0) {
-		dev_err(drm->dev, "failed to initialize DRM FB helper\n");
+		dev_err(drm->dev, "failed to initialize DRM FB helper: %d\n",
+			err);
 		return err;
 	}
 
 	err = drm_fb_helper_single_add_all_connectors(&fbdev->base);
 	if (err < 0) {
-		dev_err(drm->dev, "failed to add connectors\n");
+		dev_err(drm->dev, "failed to add connectors: %d\n", err);
 		goto fini;
 	}
 
 	err = drm_fb_helper_initial_config(&fbdev->base, preferred_bpp);
 	if (err < 0) {
-		dev_err(drm->dev, "failed to set initial configuration\n");
+		dev_err(drm->dev, "failed to set initial configuration: %d\n",
+			err);
 		goto fini;
 	}
 
@@ -322,7 +345,7 @@ fini:
 	return err;
 }
 
-static void tegra_fbdev_free(struct tegra_fbdev *fbdev)
+static void tegra_fbdev_exit(struct tegra_fbdev *fbdev)
 {
 	struct fb_info *info = fbdev->base.fbdev;
 
@@ -341,11 +364,11 @@ static void tegra_fbdev_free(struct tegra_fbdev *fbdev)
 
 	if (fbdev->fb) {
 		drm_framebuffer_unregister_private(&fbdev->fb->base);
-		tegra_fb_destroy(&fbdev->fb->base);
+		drm_framebuffer_remove(&fbdev->fb->base);
 	}
 
 	drm_fb_helper_fini(&fbdev->base);
-	kfree(fbdev);
+	tegra_fbdev_free(fbdev);
 }
 
 void tegra_fbdev_restore_mode(struct tegra_fbdev *fbdev)
@@ -393,6 +416,15 @@ int tegra_drm_fb_prepare(struct drm_device *drm)
 	return 0;
 }
 
+void tegra_drm_fb_free(struct drm_device *drm)
+{
+#ifdef CONFIG_DRM_TEGRA_FBDEV
+	struct tegra_drm *tegra = drm->dev_private;
+
+	tegra_fbdev_free(tegra->fbdev);
+#endif
+}
+
 int tegra_drm_fb_init(struct drm_device *drm)
 {
 #ifdef CONFIG_DRM_TEGRA_FBDEV
@@ -413,6 +445,6 @@ void tegra_drm_fb_exit(struct drm_device *drm)
 #ifdef CONFIG_DRM_TEGRA_FBDEV
 	struct tegra_drm *tegra = drm->dev_private;
 
-	tegra_fbdev_free(tegra->fbdev);
+	tegra_fbdev_exit(tegra->fbdev);
 #endif
 }
diff --git a/drivers/gpu/drm/tegra/gem.c b/drivers/gpu/drm/tegra/gem.c
index ce023fa3e8ae..da32086cbeaf 100644
--- a/drivers/gpu/drm/tegra/gem.c
+++ b/drivers/gpu/drm/tegra/gem.c
@@ -14,6 +14,7 @@
  */
 
 #include <linux/dma-buf.h>
+#include <linux/iommu.h>
 #include <drm/tegra_drm.h>
 
 #include "drm.h"
@@ -91,13 +92,90 @@ static const struct host1x_bo_ops tegra_bo_ops = {
 	.kunmap = tegra_bo_kunmap,
 };
 
-static void tegra_bo_destroy(struct drm_device *drm, struct tegra_bo *bo)
+/*
+ * A generic iommu_map_sg() function is being reviewed and will hopefully be
+ * merged soon. At that point this function can be dropped in favour of the
+ * one provided by the IOMMU API.
+ */
+static ssize_t __iommu_map_sg(struct iommu_domain *domain, unsigned long iova,
+			      struct scatterlist *sg, unsigned int nents,
+			      int prot)
 {
-	dma_free_writecombine(drm->dev, bo->gem.size, bo->vaddr, bo->paddr);
+	struct scatterlist *s;
+	size_t offset = 0;
+	unsigned int i;
+	int err;
+
+	for_each_sg(sg, s, nents, i) {
+		phys_addr_t phys = page_to_phys(sg_page(s));
+		size_t length = s->offset + s->length;
+
+		err = iommu_map(domain, iova + offset, phys, length, prot);
+		if (err < 0) {
+			iommu_unmap(domain, iova, offset);
+			return err;
+		}
+
+		offset += length;
+	}
+
+	return offset;
 }
 
-struct tegra_bo *tegra_bo_create(struct drm_device *drm, unsigned int size,
-				 unsigned long flags)
+static int tegra_bo_iommu_map(struct tegra_drm *tegra, struct tegra_bo *bo)
+{
+	int prot = IOMMU_READ | IOMMU_WRITE;
+	ssize_t err;
+
+	if (bo->mm)
+		return -EBUSY;
+
+	bo->mm = kzalloc(sizeof(*bo->mm), GFP_KERNEL);
+	if (!bo->mm)
+		return -ENOMEM;
+
+	err = drm_mm_insert_node_generic(&tegra->mm, bo->mm, bo->gem.size,
+					 PAGE_SIZE, 0, 0, 0);
+	if (err < 0) {
+		dev_err(tegra->drm->dev, "out of I/O virtual memory: %zd\n",
+			err);
+		goto free;
+	}
+
+	bo->paddr = bo->mm->start;
+
+	err = __iommu_map_sg(tegra->domain, bo->paddr, bo->sgt->sgl,
+			     bo->sgt->nents, prot);
+	if (err < 0) {
+		dev_err(tegra->drm->dev, "failed to map buffer: %zd\n", err);
+		goto remove;
+	}
+
+	bo->size = err;
+
+	return 0;
+
+remove:
+	drm_mm_remove_node(bo->mm);
+free:
+	kfree(bo->mm);
+	return err;
+}
+
+static int tegra_bo_iommu_unmap(struct tegra_drm *tegra, struct tegra_bo *bo)
+{
+	if (!bo->mm)
+		return 0;
+
+	iommu_unmap(tegra->domain, bo->paddr, bo->size);
+	drm_mm_remove_node(bo->mm);
+	kfree(bo->mm);
+
+	return 0;
+}
+
+static struct tegra_bo *tegra_bo_alloc_object(struct drm_device *drm,
+					      size_t size)
 {
 	struct tegra_bo *bo;
 	int err;
@@ -109,22 +187,96 @@ struct tegra_bo *tegra_bo_create(struct drm_device *drm, unsigned int size,
 	host1x_bo_init(&bo->base, &tegra_bo_ops);
 	size = round_up(size, PAGE_SIZE);
 
-	bo->vaddr = dma_alloc_writecombine(drm->dev, size, &bo->paddr,
-					   GFP_KERNEL | __GFP_NOWARN);
-	if (!bo->vaddr) {
-		dev_err(drm->dev, "failed to allocate buffer with size %u\n",
-			size);
-		err = -ENOMEM;
-		goto err_dma;
-	}
-
 	err = drm_gem_object_init(drm, &bo->gem, size);
-	if (err)
-		goto err_init;
+	if (err < 0)
+		goto free;
 
 	err = drm_gem_create_mmap_offset(&bo->gem);
-	if (err)
-		goto err_mmap;
+	if (err < 0)
+		goto release;
+
+	return bo;
+
+release:
+	drm_gem_object_release(&bo->gem);
+free:
+	kfree(bo);
+	return ERR_PTR(err);
+}
+
+static void tegra_bo_free(struct drm_device *drm, struct tegra_bo *bo)
+{
+	if (bo->pages) {
+		drm_gem_put_pages(&bo->gem, bo->pages, true, true);
+		sg_free_table(bo->sgt);
+		kfree(bo->sgt);
+	} else if (bo->vaddr) {
+		dma_free_writecombine(drm->dev, bo->gem.size, bo->vaddr,
+				      bo->paddr);
+	}
+}
+
+static int tegra_bo_get_pages(struct drm_device *drm, struct tegra_bo *bo,
+			      size_t size)
+{
+	bo->pages = drm_gem_get_pages(&bo->gem);
+	if (IS_ERR(bo->pages))
+		return PTR_ERR(bo->pages);
+
+	bo->num_pages = size >> PAGE_SHIFT;
+
+	bo->sgt = drm_prime_pages_to_sg(bo->pages, bo->num_pages);
+	if (IS_ERR(bo->sgt)) {
+		drm_gem_put_pages(&bo->gem, bo->pages, false, false);
+		return PTR_ERR(bo->sgt);
+	}
+
+	return 0;
+}
+
+static int tegra_bo_alloc(struct drm_device *drm, struct tegra_bo *bo,
+			  size_t size)
+{
+	struct tegra_drm *tegra = drm->dev_private;
+	int err;
+
+	if (tegra->domain) {
+		err = tegra_bo_get_pages(drm, bo, size);
+		if (err < 0)
+			return err;
+
+		err = tegra_bo_iommu_map(tegra, bo);
+		if (err < 0) {
+			tegra_bo_free(drm, bo);
+			return err;
+		}
+	} else {
+		bo->vaddr = dma_alloc_writecombine(drm->dev, size, &bo->paddr,
+						   GFP_KERNEL | __GFP_NOWARN);
+		if (!bo->vaddr) {
+			dev_err(drm->dev,
+				"failed to allocate buffer of size %zu\n",
+				size);
+			return -ENOMEM;
+		}
+	}
+
+	return 0;
+}
+
+struct tegra_bo *tegra_bo_create(struct drm_device *drm, size_t size,
+				 unsigned long flags)
+{
+	struct tegra_bo *bo;
+	int err;
+
+	bo = tegra_bo_alloc_object(drm, size);
+	if (IS_ERR(bo))
+		return bo;
+
+	err = tegra_bo_alloc(drm, bo, size);
+	if (err < 0)
+		goto release;
 
 	if (flags & DRM_TEGRA_GEM_CREATE_TILED)
 		bo->tiling.mode = TEGRA_BO_TILING_MODE_TILED;
@@ -134,69 +286,52 @@ struct tegra_bo *tegra_bo_create(struct drm_device *drm, unsigned int size,
 
 	return bo;
 
-err_mmap:
+release:
 	drm_gem_object_release(&bo->gem);
-err_init:
-	tegra_bo_destroy(drm, bo);
-err_dma:
 	kfree(bo);
-
 	return ERR_PTR(err);
 }
 
 struct tegra_bo *tegra_bo_create_with_handle(struct drm_file *file,
 					     struct drm_device *drm,
-					     unsigned int size,
+					     size_t size,
 					     unsigned long flags,
-					     unsigned int *handle)
+					     u32 *handle)
 {
 	struct tegra_bo *bo;
-	int ret;
+	int err;
 
 	bo = tegra_bo_create(drm, size, flags);
 	if (IS_ERR(bo))
 		return bo;
 
-	ret = drm_gem_handle_create(file, &bo->gem, handle);
-	if (ret)
-		goto err;
+	err = drm_gem_handle_create(file, &bo->gem, handle);
+	if (err) {
+		tegra_bo_free_object(&bo->gem);
+		return ERR_PTR(err);
+	}
 
 	drm_gem_object_unreference_unlocked(&bo->gem);
 
 	return bo;
-
-err:
-	tegra_bo_free_object(&bo->gem);
-	return ERR_PTR(ret);
 }
 
 static struct tegra_bo *tegra_bo_import(struct drm_device *drm,
 					struct dma_buf *buf)
 {
+	struct tegra_drm *tegra = drm->dev_private;
 	struct dma_buf_attachment *attach;
 	struct tegra_bo *bo;
-	ssize_t size;
 	int err;
 
-	bo = kzalloc(sizeof(*bo), GFP_KERNEL);
-	if (!bo)
-		return ERR_PTR(-ENOMEM);
-
-	host1x_bo_init(&bo->base, &tegra_bo_ops);
-	size = round_up(buf->size, PAGE_SIZE);
-
-	err = drm_gem_object_init(drm, &bo->gem, size);
-	if (err < 0)
-		goto free;
-
-	err = drm_gem_create_mmap_offset(&bo->gem);
-	if (err < 0)
-		goto release;
+	bo = tegra_bo_alloc_object(drm, buf->size);
+	if (IS_ERR(bo))
+		return bo;
 
 	attach = dma_buf_attach(buf, drm->dev);
 	if (IS_ERR(attach)) {
 		err = PTR_ERR(attach);
-		goto free_mmap;
+		goto free;
 	}
 
 	get_dma_buf(buf);
@@ -212,12 +347,19 @@ static struct tegra_bo *tegra_bo_import(struct drm_device *drm,
 		goto detach;
 	}
 
-	if (bo->sgt->nents > 1) {
-		err = -EINVAL;
-		goto detach;
+	if (tegra->domain) {
+		err = tegra_bo_iommu_map(tegra, bo);
+		if (err < 0)
+			goto detach;
+	} else {
+		if (bo->sgt->nents > 1) {
+			err = -EINVAL;
+			goto detach;
+		}
+
+		bo->paddr = sg_dma_address(bo->sgt->sgl);
 	}
 
-	bo->paddr = sg_dma_address(bo->sgt->sgl);
 	bo->gem.import_attach = attach;
 
 	return bo;
@@ -228,47 +370,41 @@ detach:
 
 	dma_buf_detach(buf, attach);
 	dma_buf_put(buf);
-free_mmap:
-	drm_gem_free_mmap_offset(&bo->gem);
-release:
-	drm_gem_object_release(&bo->gem);
 free:
+	drm_gem_object_release(&bo->gem);
 	kfree(bo);
-
 	return ERR_PTR(err);
 }
 
 void tegra_bo_free_object(struct drm_gem_object *gem)
 {
+	struct tegra_drm *tegra = gem->dev->dev_private;
 	struct tegra_bo *bo = to_tegra_bo(gem);
 
+	if (tegra->domain)
+		tegra_bo_iommu_unmap(tegra, bo);
+
 	if (gem->import_attach) {
 		dma_buf_unmap_attachment(gem->import_attach, bo->sgt,
 					 DMA_TO_DEVICE);
 		drm_prime_gem_destroy(gem, NULL);
 	} else {
-		tegra_bo_destroy(gem->dev, bo);
+		tegra_bo_free(gem->dev, bo);
 	}
 
-	drm_gem_free_mmap_offset(gem);
 	drm_gem_object_release(gem);
-
 	kfree(bo);
 }
 
 int tegra_bo_dumb_create(struct drm_file *file, struct drm_device *drm,
 			 struct drm_mode_create_dumb *args)
 {
-	int min_pitch = DIV_ROUND_UP(args->width * args->bpp, 8);
+	unsigned int min_pitch = DIV_ROUND_UP(args->width * args->bpp, 8);
 	struct tegra_drm *tegra = drm->dev_private;
 	struct tegra_bo *bo;
 
-	min_pitch = round_up(min_pitch, tegra->pitch_align);
-	if (args->pitch < min_pitch)
-		args->pitch = min_pitch;
-
-	if (args->size < args->pitch * args->height)
-		args->size = args->pitch * args->height;
+	args->pitch = round_up(min_pitch, tegra->pitch_align);
+	args->size = args->pitch * args->height;
 
 	bo = tegra_bo_create_with_handle(file, drm, args->size, 0,
 					 &args->handle);
@@ -279,7 +415,7 @@ int tegra_bo_dumb_create(struct drm_file *file, struct drm_device *drm,
 }
 
 int tegra_bo_dumb_map_offset(struct drm_file *file, struct drm_device *drm,
-			     uint32_t handle, uint64_t *offset)
+			     u32 handle, u64 *offset)
 {
 	struct drm_gem_object *gem;
 	struct tegra_bo *bo;
@@ -304,7 +440,38 @@ int tegra_bo_dumb_map_offset(struct drm_file *file, struct drm_device *drm,
 	return 0;
 }
 
+static int tegra_bo_fault(struct vm_area_struct *vma, struct vm_fault *vmf)
+{
+	struct drm_gem_object *gem = vma->vm_private_data;
+	struct tegra_bo *bo = to_tegra_bo(gem);
+	struct page *page;
+	pgoff_t offset;
+	int err;
+
+	if (!bo->pages)
+		return VM_FAULT_SIGBUS;
+
+	offset = ((unsigned long)vmf->virtual_address - vma->vm_start) >> PAGE_SHIFT;
+	page = bo->pages[offset];
+
+	err = vm_insert_page(vma, (unsigned long)vmf->virtual_address, page);
+	switch (err) {
+	case -EAGAIN:
+	case 0:
+	case -ERESTARTSYS:
+	case -EINTR:
+	case -EBUSY:
+		return VM_FAULT_NOPAGE;
+
+	case -ENOMEM:
+		return VM_FAULT_OOM;
+	}
+
+	return VM_FAULT_SIGBUS;
+}
+
 const struct vm_operations_struct tegra_bo_vm_ops = {
+	.fault = tegra_bo_fault,
 	.open = drm_gem_vm_open,
 	.close = drm_gem_vm_close,
 };
@@ -322,12 +489,30 @@ int tegra_drm_mmap(struct file *file, struct vm_area_struct *vma)
 	gem = vma->vm_private_data;
 	bo = to_tegra_bo(gem);
 
-	ret = remap_pfn_range(vma, vma->vm_start, bo->paddr >> PAGE_SHIFT,
-			      vma->vm_end - vma->vm_start, vma->vm_page_prot);
-	if (ret)
-		drm_gem_vm_close(vma);
+	if (!bo->pages) {
+		unsigned long vm_pgoff = vma->vm_pgoff;
+
+		vma->vm_flags &= ~VM_PFNMAP;
+		vma->vm_pgoff = 0;
+
+		ret = dma_mmap_writecombine(gem->dev->dev, vma, bo->vaddr,
+					    bo->paddr, gem->size);
+		if (ret) {
+			drm_gem_vm_close(vma);
+			return ret;
+		}
+
+		vma->vm_pgoff = vm_pgoff;
+	} else {
+		pgprot_t prot = vm_get_page_prot(vma->vm_flags);
+
+		vma->vm_flags |= VM_MIXEDMAP;
+		vma->vm_flags &= ~VM_PFNMAP;
 
-	return ret;
+		vma->vm_page_prot = pgprot_writecombine(prot);
+	}
+
+	return 0;
 }
 
 static struct sg_table *
@@ -342,21 +527,44 @@ tegra_gem_prime_map_dma_buf(struct dma_buf_attachment *attach,
 	if (!sgt)
 		return NULL;
 
-	if (sg_alloc_table(sgt, 1, GFP_KERNEL)) {
-		kfree(sgt);
-		return NULL;
-	}
+	if (bo->pages) {
+		struct scatterlist *sg;
+		unsigned int i;
 
-	sg_dma_address(sgt->sgl) = bo->paddr;
-	sg_dma_len(sgt->sgl) = gem->size;
+		if (sg_alloc_table(sgt, bo->num_pages, GFP_KERNEL))
+			goto free;
+
+		for_each_sg(sgt->sgl, sg, bo->num_pages, i)
+			sg_set_page(sg, bo->pages[i], PAGE_SIZE, 0);
+
+		if (dma_map_sg(attach->dev, sgt->sgl, sgt->nents, dir) == 0)
+			goto free;
+	} else {
+		if (sg_alloc_table(sgt, 1, GFP_KERNEL))
+			goto free;
+
+		sg_dma_address(sgt->sgl) = bo->paddr;
+		sg_dma_len(sgt->sgl) = gem->size;
+	}
 
 	return sgt;
+
+free:
+	sg_free_table(sgt);
+	kfree(sgt);
+	return NULL;
 }
 
 static void tegra_gem_prime_unmap_dma_buf(struct dma_buf_attachment *attach,
 					  struct sg_table *sgt,
 					  enum dma_data_direction dir)
 {
+	struct drm_gem_object *gem = attach->dmabuf->priv;
+	struct tegra_bo *bo = to_tegra_bo(gem);
+
+	if (bo->pages)
+		dma_unmap_sg(attach->dev, sgt->sgl, sgt->nents, dir);
+
 	sg_free_table(sgt);
 	kfree(sgt);
 }
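
Note on the `tegra_bo_dumb_create()` hunk above: the new code no longer honours a caller-supplied pitch or size. It always derives both from width, height, bits per pixel and the device's pitch alignment. The following is a minimal userspace sketch of that arithmetic, not the driver code itself. The names `div_round_up`, `round_up_pow2`, `struct dumb_layout` and `dumb_layout` are hypothetical stand-ins for the kernel's `DIV_ROUND_UP()` and `round_up()` macros and for the fields of `struct drm_mode_create_dumb`; the alignment is assumed to be a power of two, as the kernel's `round_up()` requires.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the kernel's DIV_ROUND_UP(). */
static uint32_t div_round_up(uint32_t n, uint32_t d)
{
	return (n + d - 1) / d;
}

/* Hypothetical stand-in for round_up(); align must be a power of two. */
static uint32_t round_up_pow2(uint32_t x, uint32_t align)
{
	assert(align != 0 && (align & (align - 1)) == 0);
	return (x + align - 1) & ~(align - 1);
}

/* Illustrative result type; the driver writes args->pitch and args->size. */
struct dumb_layout {
	uint32_t pitch;	/* bytes per scanline, aligned */
	uint64_t size;	/* total bytes for the buffer */
};

static struct dumb_layout dumb_layout(uint32_t width, uint32_t height,
				      uint32_t bpp, uint32_t pitch_align)
{
	struct dumb_layout l;
	/* Smallest pitch that can hold one scanline. */
	uint32_t min_pitch = div_round_up(width * bpp, 8);

	l.pitch = round_up_pow2(min_pitch, pitch_align);
	l.size = (uint64_t)l.pitch * height;
	return l;
}
```

For example, a 1366x768 buffer at 32 bpp with a 64-byte pitch alignment needs 5464 bytes per line, which rounds up to a pitch of 5504 bytes.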
diff --git a/drivers/gpu/drm/tegra/gem.h b/drivers/gpu/drm/tegra/gem.h
index 6538b56780c2..6c5f12ac0087 100644
--- a/drivers/gpu/drm/tegra/gem.h
+++ b/drivers/gpu/drm/tegra/gem.h
@@ -38,6 +38,12 @@ struct tegra_bo {
 	dma_addr_t paddr;
 	void *vaddr;
 
+	struct drm_mm_node *mm;
+	unsigned long num_pages;
+	struct page **pages;
+	/* size of IOMMU mapping */
+	size_t size;
+
 	struct tegra_bo_tiling tiling;
 };
 
@@ -46,18 +52,18 @@ static inline struct tegra_bo *to_tegra_bo(struct drm_gem_object *gem)
 	return container_of(gem, struct tegra_bo, gem);
 }
 
-struct tegra_bo *tegra_bo_create(struct drm_device *drm, unsigned int size,
+struct tegra_bo *tegra_bo_create(struct drm_device *drm, size_t size,
 				 unsigned long flags);
 struct tegra_bo *tegra_bo_create_with_handle(struct drm_file *file,
 					     struct drm_device *drm,
-					     unsigned int size,
+					     size_t size,
 					     unsigned long flags,
-					     unsigned int *handle);
+					     u32 *handle);
 void tegra_bo_free_object(struct drm_gem_object *gem);
 int tegra_bo_dumb_create(struct drm_file *file, struct drm_device *drm,
 			 struct drm_mode_create_dumb *args);
 int tegra_bo_dumb_map_offset(struct drm_file *file, struct drm_device *drm,
-			     uint32_t handle, uint64_t *offset);
+			     u32 handle, u64 *offset);
 
 int tegra_drm_mmap(struct file *file, struct vm_area_struct *vma);
 
diff --git a/drivers/gpu/drm/tegra/output.c b/drivers/gpu/drm/tegra/output.c
index 0c67d7eebc94..6a5c7b81fbc5 100644
--- a/drivers/gpu/drm/tegra/output.c
+++ b/drivers/gpu/drm/tegra/output.c
@@ -157,22 +157,18 @@ static bool tegra_encoder_mode_fixup(struct drm_encoder *encoder,
 
 static void tegra_encoder_prepare(struct drm_encoder *encoder)
 {
+	tegra_encoder_dpms(encoder, DRM_MODE_DPMS_OFF);
 }
 
 static void tegra_encoder_commit(struct drm_encoder *encoder)
 {
+	tegra_encoder_dpms(encoder, DRM_MODE_DPMS_ON);
 }
 
 static void tegra_encoder_mode_set(struct drm_encoder *encoder,
 				   struct drm_display_mode *mode,
 				   struct drm_display_mode *adjusted)
 {
-	struct tegra_output *output = encoder_to_output(encoder);
-	int err;
-
-	err = tegra_output_enable(output);
-	if (err < 0)
-		dev_err(encoder->dev->dev, "tegra_output_enable(): %d\n", err);
 }
 
 static const struct drm_encoder_helper_funcs encoder_helper_funcs = {
@@ -187,7 +183,8 @@ static irqreturn_t hpd_irq(int irq, void *data)
 {
 	struct tegra_output *output = data;
 
-	drm_helper_hpd_irq_event(output->connector.dev);
+	if (output->connector.dev)
+		drm_helper_hpd_irq_event(output->connector.dev);
 
 	return IRQ_HANDLED;
 }
@@ -259,6 +256,13 @@ int tegra_output_probe(struct tegra_output *output)
 		}
 
 		output->connector.polled = DRM_CONNECTOR_POLL_HPD;
+
+		/*
+		 * Disable the interrupt until the connector has been
+		 * initialized to avoid a race in the hotplug interrupt
+		 * handler.
+		 */
+		disable_irq(output->hpd_irq);
 	}
 
 	return 0;
@@ -324,10 +328,27 @@ int tegra_output_init(struct drm_device *drm, struct tegra_output *output)
 
 	output->encoder.possible_crtcs = 0x3;
 
+	/*
+	 * The connector is now registered and ready to receive hotplug events
+	 * so the hotplug interrupt can be enabled.
+	 */
+	if (gpio_is_valid(output->hpd_gpio))
+		enable_irq(output->hpd_irq);
+
 	return 0;
 }
 
 int tegra_output_exit(struct tegra_output *output)
 {
+	/*
+	 * The connector is going away, so the interrupt must be disabled to
+	 * prevent the hotplug interrupt handler from potentially crashing.
+	 */
+	if (gpio_is_valid(output->hpd_gpio))
+		disable_irq(output->hpd_irq);
+
+	if (output->panel)
+		drm_panel_detach(output->panel);
+
 	return 0;
 }
diff --git a/drivers/gpu/drm/tilcdc/tilcdc_crtc.c b/drivers/gpu/drm/tilcdc/tilcdc_crtc.c
index d642d4a02134..c73588483be0 100644
--- a/drivers/gpu/drm/tilcdc/tilcdc_crtc.c
+++ b/drivers/gpu/drm/tilcdc/tilcdc_crtc.c
@@ -16,6 +16,7 @@
  */
 
 #include "drm_flip_work.h"
+#include <drm/drm_plane_helper.h>
 
 #include "tilcdc_drv.h"
 #include "tilcdc_regs.h"
@@ -664,12 +665,8 @@ struct drm_crtc *tilcdc_crtc_create(struct drm_device *dev)
 	tilcdc_crtc->dpms = DRM_MODE_DPMS_OFF;
 	init_waitqueue_head(&tilcdc_crtc->frame_done_wq);
 
-	ret = drm_flip_work_init(&tilcdc_crtc->unref_work, 16,
+	drm_flip_work_init(&tilcdc_crtc->unref_work,
 			"unref", unref_worker);
-	if (ret) {
-		dev_err(dev->dev, "could not allocate unref FIFO\n");
-		goto fail;
-	}
 
 	ret = drm_crtc_init(dev, crtc, &tilcdc_crtc_funcs);
 	if (ret < 0)
diff --git a/drivers/gpu/drm/tilcdc/tilcdc_drv.c b/drivers/gpu/drm/tilcdc/tilcdc_drv.c
index f8546824d177..095fca91525c 100644
--- a/drivers/gpu/drm/tilcdc/tilcdc_drv.c
+++ b/drivers/gpu/drm/tilcdc/tilcdc_drv.c
@@ -58,8 +58,7 @@ static struct drm_framebuffer *tilcdc_fb_create(struct drm_device *dev,
 static void tilcdc_fb_output_poll_changed(struct drm_device *dev)
 {
 	struct tilcdc_drm_private *priv = dev->dev_private;
-	if (priv->fbdev)
-		drm_fbdev_cma_hotplug_event(priv->fbdev);
+	drm_fbdev_cma_hotplug_event(priv->fbdev);
 }
 
 static const struct drm_mode_config_funcs mode_config_funcs = {
diff --git a/drivers/gpu/drm/ttm/ttm_bo_manager.c b/drivers/gpu/drm/ttm/ttm_bo_manager.c
index 964387fc5c8f..aa0bd054d3e9 100644
--- a/drivers/gpu/drm/ttm/ttm_bo_manager.c
+++ b/drivers/gpu/drm/ttm/ttm_bo_manager.c
@@ -55,6 +55,7 @@ static int ttm_bo_man_get_node(struct ttm_mem_type_manager *man,
 	struct ttm_range_manager *rman = (struct ttm_range_manager *) man->priv;
 	struct drm_mm *mm = &rman->mm;
 	struct drm_mm_node *node = NULL;
+	enum drm_mm_search_flags sflags = DRM_MM_SEARCH_BEST;
 	enum drm_mm_allocator_flags aflags = DRM_MM_CREATE_DEFAULT;
 	unsigned long lpfn;
 	int ret;
@@ -67,15 +68,16 @@ static int ttm_bo_man_get_node(struct ttm_mem_type_manager *man,
 	if (!node)
 		return -ENOMEM;
 
-	if (place->flags & TTM_PL_FLAG_TOPDOWN)
+	if (place->flags & TTM_PL_FLAG_TOPDOWN) {
+		sflags = DRM_MM_SEARCH_BELOW;
 		aflags = DRM_MM_CREATE_TOP;
+	}
 
 	spin_lock(&rman->lock);
 	ret = drm_mm_insert_node_in_range_generic(mm, node, mem->num_pages,
 					  mem->page_alignment, 0,
 					  place->fpfn, lpfn,
-					  DRM_MM_SEARCH_BEST,
-					  aflags);
+					  sflags, aflags);
 	spin_unlock(&rman->lock);
 
 	if (unlikely(ret)) {
diff --git a/drivers/gpu/drm/ttm/ttm_execbuf_util.c b/drivers/gpu/drm/ttm/ttm_execbuf_util.c
index 8ce508e76208..3820ae97a030 100644
--- a/drivers/gpu/drm/ttm/ttm_execbuf_util.c
+++ b/drivers/gpu/drm/ttm/ttm_execbuf_util.c
@@ -93,7 +93,8 @@ EXPORT_SYMBOL(ttm_eu_backoff_reservation);
  */
 
 int ttm_eu_reserve_buffers(struct ww_acquire_ctx *ticket,
-			   struct list_head *list, bool intr)
+			   struct list_head *list, bool intr,
+			   struct list_head *dups)
 {
 	struct ttm_bo_global *glob;
 	struct ttm_validate_buffer *entry;
@@ -117,6 +118,13 @@ int ttm_eu_reserve_buffers(struct ww_acquire_ctx *ticket,
 			__ttm_bo_unreserve(bo);
 
 			ret = -EBUSY;
+
+		} else if (ret == -EALREADY && dups) {
+			struct ttm_validate_buffer *safe = entry;
+			entry = list_prev_entry(entry, head);
+			list_del(&safe->head);
+			list_add(&safe->head, dups);
+			continue;
 		}
 
 		if (!ret) {
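
Note on the `ttm_eu_reserve_buffers()` hunk above: when a reservation reports `-EALREADY` and the caller passed a `dups` list, the entry is moved to that list and the loop continues instead of failing. The sketch below models that idea in plain userspace C, assuming that `-EALREADY` means the buffer was already reserved earlier in the same pass. Arrays replace the kernel's `list_head` lists, and `struct buffer`, `struct entry` and `reserve_split_dups` are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical buffer object: 'reserved' models holding its reservation. */
struct buffer {
	bool reserved;
};

/* Hypothetical validation-list entry pointing at a buffer. */
struct entry {
	struct buffer *bo;
};

/*
 * Reserve the buffer behind every entry in list[0..n). An entry whose
 * buffer is already reserved (the -EALREADY case) is moved to dups[]
 * rather than treated as an error. The remaining entries are compacted
 * in place; the function returns how many stay in list[].
 */
static size_t reserve_split_dups(struct entry *list, size_t n,
				 struct entry *dups, size_t *ndups)
{
	size_t kept = 0;
	size_t i;

	assert(dups != NULL && ndups != NULL);
	*ndups = 0;

	for (i = 0; i < n; i++) {
		if (list[i].bo->reserved) {
			/* Duplicate: hand it to the caller separately. */
			dups[(*ndups)++] = list[i];
			continue;
		}
		list[i].bo->reserved = true;
		list[kept++] = list[i];
	}

	return kept;
}
```

A caller that passes the same buffer twice ends up with one entry in the main list and one in the duplicates list, so each buffer is reserved and later unreserved exactly once.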
diff --git a/drivers/gpu/drm/ttm/ttm_page_alloc.c b/drivers/gpu/drm/ttm/ttm_page_alloc.c
index 09874d695188..025c429050c0 100644
--- a/drivers/gpu/drm/ttm/ttm_page_alloc.c
+++ b/drivers/gpu/drm/ttm/ttm_page_alloc.c
@@ -297,11 +297,12 @@ static void ttm_pool_update_free_locked(struct ttm_page_pool *pool,
  *
  * @pool: to free the pages from
  * @free_all: If set to true will free all pages in pool
- * @gfp: GFP flags.
+ * @use_static: Safe to use static buffer
  **/
 static int ttm_page_pool_free(struct ttm_page_pool *pool, unsigned nr_free,
-			      gfp_t gfp)
+			      bool use_static)
 {
+	static struct page *static_buf[NUM_PAGES_TO_ALLOC];
 	unsigned long irq_flags;
 	struct page *p;
 	struct page **pages_to_free;
@@ -311,7 +312,11 @@ static int ttm_page_pool_free(struct ttm_page_pool *pool, unsigned nr_free,
 	if (NUM_PAGES_TO_ALLOC < nr_free)
 		npages_to_free = NUM_PAGES_TO_ALLOC;
 
-	pages_to_free = kmalloc(npages_to_free * sizeof(struct page *), gfp);
+	if (use_static)
+		pages_to_free = static_buf;
+	else
+		pages_to_free = kmalloc(npages_to_free * sizeof(struct page *),
+					GFP_KERNEL);
 	if (!pages_to_free) {
 		pr_err("Failed to allocate memory for pool free operation\n");
 		return 0;
@@ -374,7 +379,8 @@ restart:
 	if (freed_pages)
 		ttm_pages_put(pages_to_free, freed_pages);
 out:
-	kfree(pages_to_free);
+	if (pages_to_free != static_buf)
+		kfree(pages_to_free);
 	return nr_free;
 }
 
@@ -383,8 +389,6 @@ out:
  *
  * XXX: (dchinner) Deadlock warning!
  *
- * We need to pass sc->gfp_mask to ttm_page_pool_free().
- *
  * This code is crying out for a shrinker per pool....
  */
 static unsigned long
@@ -407,8 +411,8 @@ ttm_pool_shrink_scan(struct shrinker *shrink, struct shrink_control *sc)
 		if (shrink_pages == 0)
 			break;
 		pool = &_manager->pools[(i + pool_offset)%NUM_POOLS];
-		shrink_pages = ttm_page_pool_free(pool, nr_free,
-						  sc->gfp_mask);
+		/* OK to use static buffer since global mutex is held. */
+		shrink_pages = ttm_page_pool_free(pool, nr_free, true);
 		freed += nr_free - shrink_pages;
 	}
 	mutex_unlock(&lock);
@@ -710,7 +714,7 @@ static void ttm_put_pages(struct page **pages, unsigned npages, int flags,
 	}
 	spin_unlock_irqrestore(&pool->lock, irq_flags);
 	if (npages)
-		ttm_page_pool_free(pool, npages, GFP_KERNEL);
+		ttm_page_pool_free(pool, npages, false);
 }
 
 /*
@@ -849,9 +853,9 @@ void ttm_page_alloc_fini(void)
 	pr_info("Finalizing pool allocator\n");
 	ttm_pool_mm_shrink_fini(_manager);
 
+	/* OK to use static buffer since global mutex is no longer used. */
 	for (i = 0; i < NUM_POOLS; ++i)
-		ttm_page_pool_free(&_manager->pools[i], FREE_ALL_PAGES,
-				   GFP_KERNEL);
+		ttm_page_pool_free(&_manager->pools[i], FREE_ALL_PAGES, true);
 
 	kobject_put(&_manager->kobj);
 	_manager = NULL;
diff --git a/drivers/gpu/drm/ttm/ttm_page_alloc_dma.c b/drivers/gpu/drm/ttm/ttm_page_alloc_dma.c
index c96db433f8af..01e1d27eb078 100644
--- a/drivers/gpu/drm/ttm/ttm_page_alloc_dma.c
+++ b/drivers/gpu/drm/ttm/ttm_page_alloc_dma.c
@@ -411,11 +411,12 @@ static void ttm_dma_page_put(struct dma_pool *pool, struct dma_page *d_page)
  *
  * @pool: to free the pages from
  * @nr_free: If set to true will free all pages in pool
- * @gfp: GFP flags.
+ * @use_static: Safe to use static buffer
  **/
 static unsigned ttm_dma_page_pool_free(struct dma_pool *pool, unsigned nr_free,
-				       gfp_t gfp)
+				       bool use_static)
 {
+	static struct page *static_buf[NUM_PAGES_TO_ALLOC];
 	unsigned long irq_flags;
 	struct dma_page *dma_p, *tmp;
 	struct page **pages_to_free;
@@ -432,7 +433,11 @@ static unsigned ttm_dma_page_pool_free(struct dma_pool *pool, unsigned nr_free,
 			 npages_to_free, nr_free);
 	}
 #endif
-	pages_to_free = kmalloc(npages_to_free * sizeof(struct page *), gfp);
+	if (use_static)
+		pages_to_free = static_buf;
+	else
+		pages_to_free = kmalloc(npages_to_free * sizeof(struct page *),
+					GFP_KERNEL);
 
 	if (!pages_to_free) {
 		pr_err("%s: Failed to allocate memory for pool free operation\n",
@@ -502,7 +507,8 @@ restart:
 	if (freed_pages)
 		ttm_dma_pages_put(pool, &d_pages, pages_to_free, freed_pages);
 out:
-	kfree(pages_to_free);
+	if (pages_to_free != static_buf)
+		kfree(pages_to_free);
 	return nr_free;
 }
 
@@ -531,7 +537,8 @@ static void ttm_dma_free_pool(struct device *dev, enum pool_type type)
 		if (pool->type != type)
 			continue;
 		/* Takes a spinlock.. */
-		ttm_dma_page_pool_free(pool, FREE_ALL_PAGES, GFP_KERNEL);
+		/* OK to use static buffer since global mutex is held. */
+		ttm_dma_page_pool_free(pool, FREE_ALL_PAGES, true);
 		WARN_ON(((pool->npages_in_use + pool->npages_free) != 0));
 		/* This code path is called after _all_ references to the
 		 * struct device has been dropped - so nobody should be
@@ -986,7 +993,7 @@ void ttm_dma_unpopulate(struct ttm_dma_tt *ttm_dma, struct device *dev)
 
 	/* shrink pool if necessary (only on !is_cached pools)*/
 	if (npages)
-		ttm_dma_page_pool_free(pool, npages, GFP_KERNEL);
+		ttm_dma_page_pool_free(pool, npages, false);
 	ttm->state = tt_unpopulated;
 }
 EXPORT_SYMBOL_GPL(ttm_dma_unpopulate);
@@ -996,8 +1003,6 @@ EXPORT_SYMBOL_GPL(ttm_dma_unpopulate);
  *
  * XXX: (dchinner) Deadlock warning!
  *
- * We need to pass sc->gfp_mask to ttm_dma_page_pool_free().
- *
  * I'm getting sadder as I hear more pathetical whimpers about needing per-pool
  * shrinkers
  */
@@ -1030,8 +1035,8 @@ ttm_dma_pool_shrink_scan(struct shrinker *shrink, struct shrink_control *sc)
 		if (++idx < pool_offset)
 			continue;
 		nr_free = shrink_pages;
-		shrink_pages = ttm_dma_page_pool_free(p->pool, nr_free,
-						      sc->gfp_mask);
+		/* OK to use static buffer since global mutex is held. */
+		shrink_pages = ttm_dma_page_pool_free(p->pool, nr_free, true);
 		freed += nr_free - shrink_pages;
 
 		pr_debug("%s: (%s:%d) Asked to shrink %d, have %d more to go\n",
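
Note on the `use_static` change in the two TTM pool allocators above: the shrinker path can no longer allocate its scratch array, so it borrows a function-local static buffer while the global mutex serialises callers, and other paths keep allocating. The sketch below shows the same pattern in userspace C under stated assumptions: `malloc()` stands in for `kmalloc()`, `SCRATCH_ENTRIES` stands in for `NUM_PAGES_TO_ALLOC`, and `drain` is a hypothetical function that merely sums the items it stages. The caller is assumed to guarantee exclusion whenever it passes `use_static = true`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

#define SCRATCH_ENTRIES 64	/* stand-in for NUM_PAGES_TO_ALLOC */

/*
 * Stage up to SCRATCH_ENTRIES items from src in a scratch array and add
 * them to *sum. With use_static the shared static array is used, which
 * is only safe if the caller serialises all such calls. Returns the
 * number of items processed, or 0 if the scratch allocation failed.
 */
static size_t drain(const int *src, size_t nr, bool use_static, long *sum)
{
	static int static_buf[SCRATCH_ENTRIES];
	size_t n = nr < SCRATCH_ENTRIES ? nr : SCRATCH_ENTRIES;
	int *scratch;
	size_t i;

	assert(sum != NULL);

	if (use_static)
		scratch = static_buf;
	else
		scratch = malloc(n * sizeof(*scratch));
	if (!scratch)
		return 0;

	for (i = 0; i < n; i++) {
		scratch[i] = src[i];
		*sum += scratch[i];
	}

	/* Only heap memory may be freed; the static buffer is left alone. */
	if (scratch != static_buf)
		free(scratch);

	return n;
}
```

The pointer comparison before `free()` mirrors the `pages_to_free != static_buf` check in the hunks, which keeps the cleanup path identical for both kinds of caller.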
diff --git a/drivers/gpu/drm/udl/Makefile b/drivers/gpu/drm/udl/Makefile
index 05c7481bfd40..195bcac0b6c8 100644
--- a/drivers/gpu/drm/udl/Makefile
+++ b/drivers/gpu/drm/udl/Makefile
@@ -1,6 +1,6 @@
 
 ccflags-y := -Iinclude/drm
 
-udl-y := udl_drv.o udl_modeset.o udl_connector.o udl_encoder.o udl_main.o udl_fb.o udl_transfer.o udl_gem.o
+udl-y := udl_drv.o udl_modeset.o udl_connector.o udl_encoder.o udl_main.o udl_fb.o udl_transfer.o udl_gem.o udl_dmabuf.o
 
 obj-$(CONFIG_DRM_UDL) := udl.o
diff --git a/drivers/gpu/drm/udl/udl_dmabuf.c b/drivers/gpu/drm/udl/udl_dmabuf.c
new file mode 100644
index 000000000000..ac8a66b4dfc2
--- /dev/null
+++ b/drivers/gpu/drm/udl/udl_dmabuf.c
@@ -0,0 +1,276 @@
+/*
+ * udl_dmabuf.c
+ *
+ * Copyright (c) 2014 The Chromium OS Authors
+ *
+ * This program is free software; you can redistribute  it and/or modify it
+ * under  the terms of  the GNU General  Public License as published by the
+ * Free Software Foundation;  either version 2 of the  License, or (at your
+ * option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program. If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include <drm/drmP.h>
+#include "udl_drv.h"
+#include <linux/shmem_fs.h>
+#include <linux/dma-buf.h>
+
+struct udl_drm_dmabuf_attachment {
+	struct sg_table sgt;
+	enum dma_data_direction dir;
+	bool is_mapped;
+};
+
+static int udl_attach_dma_buf(struct dma_buf *dmabuf,
+			      struct device *dev,
+			      struct dma_buf_attachment *attach)
+{
+	struct udl_drm_dmabuf_attachment *udl_attach;
+
+	DRM_DEBUG_PRIME("[DEV:%s] size:%zd\n", dev_name(attach->dev),
+			attach->dmabuf->size);
+
+	udl_attach = kzalloc(sizeof(*udl_attach), GFP_KERNEL);
+	if (!udl_attach)
+		return -ENOMEM;
+
+	udl_attach->dir = DMA_NONE;
+	attach->priv = udl_attach;
+
+	return 0;
+}
+
+static void udl_detach_dma_buf(struct dma_buf *dmabuf,
+			       struct dma_buf_attachment *attach)
+{
+	struct udl_drm_dmabuf_attachment *udl_attach = attach->priv;
+	struct sg_table *sgt;
+
+	if (!udl_attach)
+		return;
+
+	DRM_DEBUG_PRIME("[DEV:%s] size:%zd\n", dev_name(attach->dev),
+			attach->dmabuf->size);
+
+	sgt = &udl_attach->sgt;
+
+	if (udl_attach->dir != DMA_NONE)
+		dma_unmap_sg(attach->dev, sgt->sgl, sgt->nents,
+				udl_attach->dir);
+
+	sg_free_table(sgt);
+	kfree(udl_attach);
+	attach->priv = NULL;
+}
+
+static struct sg_table *udl_map_dma_buf(struct dma_buf_attachment *attach,
+					enum dma_data_direction dir)
+{
+	struct udl_drm_dmabuf_attachment *udl_attach = attach->priv;
+	struct udl_gem_object *obj = to_udl_bo(attach->dmabuf->priv);
+	struct drm_device *dev = obj->base.dev;
+	struct scatterlist *rd, *wr;
+	struct sg_table *sgt = NULL;
+	unsigned int i;
+	int page_count;
+	int nents, ret;
+
+	DRM_DEBUG_PRIME("[DEV:%s] size:%zd dir=%d\n", dev_name(attach->dev),
+			attach->dmabuf->size, dir);
+
+	/* just return current sgt if already requested. */
+	if (udl_attach->dir == dir && udl_attach->is_mapped)
+		return &udl_attach->sgt;
+
+	if (!obj->pages) {
+		ret = udl_gem_get_pages(obj);
+		if (ret) {
+			DRM_ERROR("failed to map pages.\n");
+			return ERR_PTR(ret);
+		}
+	}
+
+	page_count = obj->base.size / PAGE_SIZE;
+	obj->sg = drm_prime_pages_to_sg(obj->pages, page_count);
+	if (IS_ERR(obj->sg)) {
+		DRM_ERROR("failed to allocate sgt.\n");
+		return ERR_CAST(obj->sg);
+	}
+
+	sgt = &udl_attach->sgt;
+
+	ret = sg_alloc_table(sgt, obj->sg->orig_nents, GFP_KERNEL);
+	if (ret) {
+		DRM_ERROR("failed to alloc sgt.\n");
+		return ERR_PTR(-ENOMEM);
+	}
+
+	mutex_lock(&dev->struct_mutex);
+
+	rd = obj->sg->sgl;
+	wr = sgt->sgl;
+	for (i = 0; i < sgt->orig_nents; ++i) {
+		sg_set_page(wr, sg_page(rd), rd->length, rd->offset);
+		rd = sg_next(rd);
+		wr = sg_next(wr);
+	}
+
+	if (dir != DMA_NONE) {
+		nents = dma_map_sg(attach->dev, sgt->sgl, sgt->orig_nents, dir);
+		if (!nents) {
+			DRM_ERROR("failed to map sgl with iommu.\n");
+			sg_free_table(sgt);
+			sgt = ERR_PTR(-EIO);
+			goto err_unlock;
+		}
+	}
+
+	udl_attach->is_mapped = true;
+	udl_attach->dir = dir;
+	attach->priv = udl_attach;
+
+err_unlock:
+	mutex_unlock(&dev->struct_mutex);
+	return sgt;
+}
+
+static void udl_unmap_dma_buf(struct dma_buf_attachment *attach,
+			      struct sg_table *sgt,
+			      enum dma_data_direction dir)
+{
+	/* Nothing to do. */
+	DRM_DEBUG_PRIME("[DEV:%s] size:%zd dir:%d\n", dev_name(attach->dev),
+			attach->dmabuf->size, dir);
+}
+
+static void *udl_dmabuf_kmap(struct dma_buf *dma_buf, unsigned long page_num)
+{
+	/* TODO */
+
+	return NULL;
+}
+
+static void *udl_dmabuf_kmap_atomic(struct dma_buf *dma_buf,
+				    unsigned long page_num)
+{
+	/* TODO */
+
+	return NULL;
+}
+
+static void udl_dmabuf_kunmap(struct dma_buf *dma_buf,
+			      unsigned long page_num, void *addr)
+{
+	/* TODO */
+}
+
+static void udl_dmabuf_kunmap_atomic(struct dma_buf *dma_buf,
+				     unsigned long page_num,
+				     void *addr)
+{
+	/* TODO */
+}
+
+static int udl_dmabuf_mmap(struct dma_buf *dma_buf,
+			   struct vm_area_struct *vma)
+{
+	/* TODO */
+
+	return -EINVAL;
+}
+
+static struct dma_buf_ops udl_dmabuf_ops = {
+	.attach			= udl_attach_dma_buf,
+	.detach			= udl_detach_dma_buf,
+	.map_dma_buf		= udl_map_dma_buf,
+	.unmap_dma_buf		= udl_unmap_dma_buf,
+	.kmap			= udl_dmabuf_kmap,
+	.kmap_atomic		= udl_dmabuf_kmap_atomic,
+	.kunmap			= udl_dmabuf_kunmap,
+	.kunmap_atomic		= udl_dmabuf_kunmap_atomic,
+	.mmap			= udl_dmabuf_mmap,
+	.release		= drm_gem_dmabuf_release,
+};
+
+struct dma_buf *udl_gem_prime_export(struct drm_device *dev,
+				     struct drm_gem_object *obj, int flags)
+{
+	return dma_buf_export(obj, &udl_dmabuf_ops, obj->size, flags, NULL);
+}
+
+static int udl_prime_create(struct drm_device *dev,
+			    size_t size,
+			    struct sg_table *sg,
+			    struct udl_gem_object **obj_p)
+{
+	struct udl_gem_object *obj;
+	int npages;
+
+	npages = size / PAGE_SIZE;
+
+	*obj_p = NULL;
+	obj = udl_gem_alloc_object(dev, npages * PAGE_SIZE);
+	if (!obj)
+		return -ENOMEM;
+
+	obj->sg = sg;
+	obj->pages = drm_malloc_ab(npages, sizeof(struct page *));
+	if (obj->pages == NULL) {
+		DRM_ERROR("obj pages is NULL %d\n", npages);
+		return -ENOMEM;
+	}
+
+	drm_prime_sg_to_page_addr_arrays(sg, obj->pages, NULL, npages);
+
+	*obj_p = obj;
+	return 0;
+}
+
+struct drm_gem_object *udl_gem_prime_import(struct drm_device *dev,
+				struct dma_buf *dma_buf)
+{
+	struct dma_buf_attachment *attach;
+	struct sg_table *sg;
+	struct udl_gem_object *uobj;
+	int ret;
+
+	/* need to attach */
+	get_device(dev->dev);
+	attach = dma_buf_attach(dma_buf, dev->dev);
+	if (IS_ERR(attach)) {
+		put_device(dev->dev);
+		return ERR_CAST(attach);
+	}
+
+	get_dma_buf(dma_buf);
+
+	sg = dma_buf_map_attachment(attach, DMA_BIDIRECTIONAL);
+	if (IS_ERR(sg)) {
+		ret = PTR_ERR(sg);
+		goto fail_detach;
+	}
+
+	ret = udl_prime_create(dev, dma_buf->size, sg, &uobj);
+	if (ret)
+		goto fail_unmap;
+
+	uobj->base.import_attach = attach;
+	uobj->flags = UDL_BO_WC;
+
+	return &uobj->base;
+
+fail_unmap:
+	dma_buf_unmap_attachment(attach, sg, DMA_BIDIRECTIONAL);
+fail_detach:
+	dma_buf_detach(dma_buf, attach);
+	dma_buf_put(dma_buf);
+	put_device(dev->dev);
+	return ERR_PTR(ret);
+}
diff --git a/drivers/gpu/drm/udl/udl_drv.c b/drivers/gpu/drm/udl/udl_drv.c
index 8607e9e513db..d5728ec85254 100644
--- a/drivers/gpu/drm/udl/udl_drv.c
+++ b/drivers/gpu/drm/udl/udl_drv.c
@@ -51,7 +51,9 @@ static struct drm_driver driver = {
 	.dumb_destroy = drm_gem_dumb_destroy,
 	.fops = &udl_driver_fops,
 
+	.prime_handle_to_fd = drm_gem_prime_handle_to_fd,
 	.prime_fd_to_handle = drm_gem_prime_fd_to_handle,
+	.gem_prime_export = udl_gem_prime_export,
 	.gem_prime_import = udl_gem_prime_import,
 
 	.name = DRIVER_NAME,
diff --git a/drivers/gpu/drm/udl/udl_drv.h b/drivers/gpu/drm/udl/udl_drv.h
index c7490a2489a7..80adbac82bde 100644
--- a/drivers/gpu/drm/udl/udl_drv.h
+++ b/drivers/gpu/drm/udl/udl_drv.h
@@ -25,6 +25,9 @@
 #define DRIVER_MINOR		0
 #define DRIVER_PATCHLEVEL	1
 
+#define UDL_BO_CACHEABLE		(1 << 0)
+#define UDL_BO_WC		(1 << 1)
+
 struct udl_device;
 
 struct urb_node {
@@ -69,6 +72,7 @@ struct udl_gem_object {
 	struct page **pages;
 	void *vmapping;
 	struct sg_table *sg;
+	unsigned int flags;
 };
 
 #define to_udl_bo(x) container_of(x, struct udl_gem_object, base)
@@ -120,9 +124,13 @@ int udl_gem_mmap(struct drm_file *file_priv, struct drm_device *dev,
 void udl_gem_free_object(struct drm_gem_object *gem_obj);
 struct udl_gem_object *udl_gem_alloc_object(struct drm_device *dev,
 					    size_t size);
+struct dma_buf *udl_gem_prime_export(struct drm_device *dev,
+				     struct drm_gem_object *obj, int flags);
 struct drm_gem_object *udl_gem_prime_import(struct drm_device *dev,
 				struct dma_buf *dma_buf);
 
+int udl_gem_get_pages(struct udl_gem_object *obj);
+void udl_gem_put_pages(struct udl_gem_object *obj);
 int udl_gem_vmap(struct udl_gem_object *obj);
 void udl_gem_vunmap(struct udl_gem_object *obj);
 int udl_drm_gem_mmap(struct file *filp, struct vm_area_struct *vma);
diff --git a/drivers/gpu/drm/udl/udl_gem.c b/drivers/gpu/drm/udl/udl_gem.c
index 8044f5fb7c49..2a0a784ab6ee 100644
--- a/drivers/gpu/drm/udl/udl_gem.c
+++ b/drivers/gpu/drm/udl/udl_gem.c
@@ -25,6 +25,7 @@ struct udl_gem_object *udl_gem_alloc_object(struct drm_device *dev,
 		return NULL;
 	}
 
+	obj->flags = UDL_BO_CACHEABLE;
 	return obj;
 }
 
@@ -56,6 +57,23 @@ udl_gem_create(struct drm_file *file,
 	return 0;
 }
 
+static void update_vm_cache_attr(struct udl_gem_object *obj,
+				 struct vm_area_struct *vma)
+{
+	DRM_DEBUG_KMS("flags = 0x%x\n", obj->flags);
+
+	/* Falls back to non-cacheable when no caching flag is set. */
+	if (obj->flags & UDL_BO_CACHEABLE) {
+		vma->vm_page_prot = vm_get_page_prot(vma->vm_flags);
+	} else if (obj->flags & UDL_BO_WC) {
+		vma->vm_page_prot =
+			pgprot_writecombine(vm_get_page_prot(vma->vm_flags));
+	} else {
+		vma->vm_page_prot =
+			pgprot_noncached(vm_get_page_prot(vma->vm_flags));
+	}
+}
+
 int udl_dumb_create(struct drm_file *file,
 		    struct drm_device *dev,
 		    struct drm_mode_create_dumb *args)
@@ -77,6 +95,8 @@ int udl_drm_gem_mmap(struct file *filp, struct vm_area_struct *vma)
 	vma->vm_flags &= ~VM_PFNMAP;
 	vma->vm_flags |= VM_MIXEDMAP;
 
+	update_vm_cache_attr(to_udl_bo(vma->vm_private_data), vma);
+
 	return ret;
 }
 
@@ -107,7 +127,7 @@ int udl_gem_fault(struct vm_area_struct *vma, struct vm_fault *vmf)
 	}
 }
 
-static int udl_gem_get_pages(struct udl_gem_object *obj)
+int udl_gem_get_pages(struct udl_gem_object *obj)
 {
 	struct page **pages;
 
@@ -123,7 +143,7 @@ static int udl_gem_get_pages(struct udl_gem_object *obj)
 	return 0;
 }
 
-static void udl_gem_put_pages(struct udl_gem_object *obj)
+void udl_gem_put_pages(struct udl_gem_object *obj)
 {
 	if (obj->base.import_attach) {
 		drm_free_large(obj->pages);
@@ -164,8 +184,7 @@ void udl_gem_vunmap(struct udl_gem_object *obj)
 		return;
 	}
 
-	if (obj->vmapping)
-		vunmap(obj->vmapping);
+	vunmap(obj->vmapping);
 
 	udl_gem_put_pages(obj);
 }
@@ -220,73 +239,3 @@ unlock:
 	mutex_unlock(&dev->struct_mutex);
 	return ret;
 }
-
-static int udl_prime_create(struct drm_device *dev,
-			    size_t size,
-			    struct sg_table *sg,
-			    struct udl_gem_object **obj_p)
-{
-	struct udl_gem_object *obj;
-	int npages;
-
-	npages = size / PAGE_SIZE;
-
-	*obj_p = NULL;
-	obj = udl_gem_alloc_object(dev, npages * PAGE_SIZE);
-	if (!obj)
-		return -ENOMEM;
-
-	obj->sg = sg;
-	obj->pages = drm_malloc_ab(npages, sizeof(struct page *));
-	if (obj->pages == NULL) {
-		DRM_ERROR("obj pages is NULL %d\n", npages);
-		return -ENOMEM;
-	}
-
-	drm_prime_sg_to_page_addr_arrays(sg, obj->pages, NULL, npages);
-
-	*obj_p = obj;
-	return 0;
-}
-
-struct drm_gem_object *udl_gem_prime_import(struct drm_device *dev,
-				struct dma_buf *dma_buf)
-{
-	struct dma_buf_attachment *attach;
-	struct sg_table *sg;
-	struct udl_gem_object *uobj;
-	int ret;
-
-	/* need to attach */
-	get_device(dev->dev);
-	attach = dma_buf_attach(dma_buf, dev->dev);
-	if (IS_ERR(attach)) {
-		put_device(dev->dev);
-		return ERR_CAST(attach);
-	}
-
-	get_dma_buf(dma_buf);
-
-	sg = dma_buf_map_attachment(attach, DMA_BIDIRECTIONAL);
-	if (IS_ERR(sg)) {
-		ret = PTR_ERR(sg);
-		goto fail_detach;
-	}
-
-	ret = udl_prime_create(dev, dma_buf->size, sg, &uobj);
-	if (ret) {
-		goto fail_unmap;
-	}
-
-	uobj->base.import_attach = attach;
-
-	return &uobj->base;
-
-fail_unmap:
-	dma_buf_unmap_attachment(attach, sg, DMA_BIDIRECTIONAL);
-fail_detach:
-	dma_buf_detach(dma_buf, attach);
-	dma_buf_put(dma_buf);
-	put_device(dev->dev);
-	return ERR_PTR(ret);
-}
diff --git a/drivers/gpu/drm/udl/udl_modeset.c b/drivers/gpu/drm/udl/udl_modeset.c
index dc145d320b25..1701f1dfb23f 100644
--- a/drivers/gpu/drm/udl/udl_modeset.c
+++ b/drivers/gpu/drm/udl/udl_modeset.c
@@ -14,6 +14,7 @@
 #include <drm/drmP.h>
 #include <drm/drm_crtc.h>
 #include <drm/drm_crtc_helper.h>
+#include <drm/drm_plane_helper.h>
 #include "udl_drv.h"
 
 /*
diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_drv.c b/drivers/gpu/drm/vmwgfx/vmwgfx_drv.c
index 25f3c250fd98..7b5d22110f25 100644
--- a/drivers/gpu/drm/vmwgfx/vmwgfx_drv.c
+++ b/drivers/gpu/drm/vmwgfx/vmwgfx_drv.c
@@ -889,8 +889,7 @@ static int vmw_driver_unload(struct drm_device *dev)
 
 	if (dev_priv->ctx.res_ht_initialized)
 		drm_ht_remove(&dev_priv->ctx.res_ht);
-	if (dev_priv->ctx.cmd_bounce)
-		vfree(dev_priv->ctx.cmd_bounce);
+	vfree(dev_priv->ctx.cmd_bounce);
 	if (dev_priv->enable_fb) {
 		vmw_fb_close(dev_priv);
 		vmw_kms_restore_vga(dev_priv);
@@ -1063,8 +1062,12 @@ static long vmw_generic_ioctl(struct file *filp, unsigned int cmd,
 
 	vmaster = vmw_master_check(dev, file_priv, flags);
 	if (unlikely(IS_ERR(vmaster))) {
-		DRM_INFO("IOCTL ERROR %d\n", nr);
-		return PTR_ERR(vmaster);
+		ret = PTR_ERR(vmaster);
+
+		if (ret != -ERESTARTSYS)
+			DRM_INFO("IOCTL ERROR Command %d, Error %ld.\n",
+				 nr, ret);
+		return ret;
 	}
 
 	ret = ioctl_func(filp, cmd, arg);
diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_execbuf.c b/drivers/gpu/drm/vmwgfx/vmwgfx_execbuf.c
index 596cd6dafd33..33176d05db35 100644
--- a/drivers/gpu/drm/vmwgfx/vmwgfx_execbuf.c
+++ b/drivers/gpu/drm/vmwgfx/vmwgfx_execbuf.c
@@ -2487,7 +2487,8 @@ int vmw_execbuf_process(struct drm_file *file_priv,
 	if (unlikely(ret != 0))
 		goto out_err_nores;
 
-	ret = ttm_eu_reserve_buffers(&ticket, &sw_context->validate_nodes, true);
+	ret = ttm_eu_reserve_buffers(&ticket, &sw_context->validate_nodes,
+				     true, NULL);
 	if (unlikely(ret != 0))
 		goto out_err;
 
@@ -2677,7 +2678,8 @@ void __vmw_execbuf_release_pinned_bo(struct vmw_private *dev_priv,
 	query_val.shared = false;
 	list_add_tail(&query_val.head, &validate_list);
 
-	ret = ttm_eu_reserve_buffers(&ticket, &validate_list, false);
+	ret = ttm_eu_reserve_buffers(&ticket, &validate_list,
+				     false, NULL);
 	if (unlikely(ret != 0)) {
 		vmw_execbuf_unpin_panic(dev_priv);
 		goto out_no_reserve;
diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_fence.c b/drivers/gpu/drm/vmwgfx/vmwgfx_fence.c
index 197164fd7803..b7594cb758af 100644
--- a/drivers/gpu/drm/vmwgfx/vmwgfx_fence.c
+++ b/drivers/gpu/drm/vmwgfx/vmwgfx_fence.c
@@ -545,35 +545,19 @@ void vmw_fence_obj_flush(struct vmw_fence_obj *fence)
 
 static void vmw_fence_destroy(struct vmw_fence_obj *fence)
 {
-	struct vmw_fence_manager *fman = fman_from_fence(fence);
-
 	fence_free(&fence->base);
-
-	/*
-	 * Free kernel space accounting.
-	 */
-	ttm_mem_global_free(vmw_mem_glob(fman->dev_priv),
-			    fman->fence_size);
 }
 
 int vmw_fence_create(struct vmw_fence_manager *fman,
 		     uint32_t seqno,
 		     struct vmw_fence_obj **p_fence)
 {
-	struct ttm_mem_global *mem_glob = vmw_mem_glob(fman->dev_priv);
 	struct vmw_fence_obj *fence;
 	int ret;
 
-	ret = ttm_mem_global_alloc(mem_glob, fman->fence_size,
-				   false, false);
-	if (unlikely(ret != 0))
-		return ret;
-
 	fence = kzalloc(sizeof(*fence), GFP_KERNEL);
-	if (unlikely(fence == NULL)) {
-		ret = -ENOMEM;
-		goto out_no_object;
-	}
+	if (unlikely(fence == NULL))
+		return -ENOMEM;
 
 	ret = vmw_fence_obj_init(fman, fence, seqno,
 				 vmw_fence_destroy);
@@ -585,8 +569,6 @@ int vmw_fence_create(struct vmw_fence_manager *fman,
 
 out_err_init:
 	kfree(fence);
-out_no_object:
-	ttm_mem_global_free(mem_glob, fman->fence_size);
 	return ret;
 }
 
@@ -1105,6 +1087,8 @@ static int vmw_event_fence_action_create(struct drm_file *file_priv,
 	if (ret != 0)
 		goto out_no_queue;
 
+	return 0;
+
 out_no_queue:
 	event->base.destroy(&event->base);
 out_no_event:
@@ -1180,17 +1164,10 @@ int vmw_fence_event_ioctl(struct drm_device *dev, void *data,
 
 	BUG_ON(fence == NULL);
 
-	if (arg->flags & DRM_VMW_FE_FLAG_REQ_TIME)
-		ret = vmw_event_fence_action_create(file_priv, fence,
-						    arg->flags,
-						    arg->user_data,
-						    true);
-	else
-		ret = vmw_event_fence_action_create(file_priv, fence,
-						    arg->flags,
-						    arg->user_data,
-						    true);
-
+	ret = vmw_event_fence_action_create(file_priv, fence,
+					    arg->flags,
+					    arg->user_data,
+					    true);
 	if (unlikely(ret != 0)) {
 		if (ret != -ERESTARTSYS)
 			DRM_ERROR("Failed to attach event to fence.\n");
diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_kms.c b/drivers/gpu/drm/vmwgfx/vmwgfx_kms.c
index 941a7bc0b791..3725b521d931 100644
--- a/drivers/gpu/drm/vmwgfx/vmwgfx_kms.c
+++ b/drivers/gpu/drm/vmwgfx/vmwgfx_kms.c
@@ -252,7 +252,7 @@ int vmw_du_crtc_cursor_set(struct drm_crtc *crtc, struct drm_file *file_priv,
 	ret = 0;
 out:
 	drm_modeset_unlock_all(dev_priv->dev);
-	drm_modeset_lock_crtc(crtc);
+	drm_modeset_lock_crtc(crtc, crtc->cursor);
 
 	return ret;
 }
@@ -281,7 +281,7 @@ int vmw_du_crtc_cursor_move(struct drm_crtc *crtc, int x, int y)
 				   du->cursor_y + du->hotspot_y);
 
 	drm_modeset_unlock_all(dev_priv->dev);
-	drm_modeset_lock_crtc(crtc);
+	drm_modeset_lock_crtc(crtc, crtc->cursor);
 
 	return 0;
 }
diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_ldu.c b/drivers/gpu/drm/vmwgfx/vmwgfx_ldu.c
index 15e185ae4c99..5c289f748ab4 100644
--- a/drivers/gpu/drm/vmwgfx/vmwgfx_ldu.c
+++ b/drivers/gpu/drm/vmwgfx/vmwgfx_ldu.c
@@ -26,6 +26,7 @@
  **************************************************************************/
 
 #include "vmwgfx_kms.h"
+#include <drm/drm_plane_helper.h>
 
 
 #define vmw_crtc_to_ldu(x) \
diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_resource.c b/drivers/gpu/drm/vmwgfx/vmwgfx_resource.c
index 026de7cea0f6..210ef15b1d09 100644
--- a/drivers/gpu/drm/vmwgfx/vmwgfx_resource.c
+++ b/drivers/gpu/drm/vmwgfx/vmwgfx_resource.c
@@ -1222,7 +1222,7 @@ vmw_resource_check_buffer(struct vmw_resource *res,
 	val_buf->bo = ttm_bo_reference(&res->backup->base);
 	val_buf->shared = false;
 	list_add_tail(&val_buf->head, &val_list);
-	ret = ttm_eu_reserve_buffers(NULL, &val_list, interruptible);
+	ret = ttm_eu_reserve_buffers(NULL, &val_list, interruptible, NULL);
 	if (unlikely(ret != 0))
 		goto out_no_reserve;
 
diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_scrn.c b/drivers/gpu/drm/vmwgfx/vmwgfx_scrn.c
index b295463a60b3..7dc591d04d9a 100644
--- a/drivers/gpu/drm/vmwgfx/vmwgfx_scrn.c
+++ b/drivers/gpu/drm/vmwgfx/vmwgfx_scrn.c
@@ -26,6 +26,7 @@
  **************************************************************************/
 
 #include "vmwgfx_kms.h"
+#include <drm/drm_plane_helper.h>
 
 
 #define vmw_crtc_to_sou(x) \
diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_shader.c b/drivers/gpu/drm/vmwgfx/vmwgfx_shader.c
index 8719fb3cccc9..6a4584a43aa6 100644
--- a/drivers/gpu/drm/vmwgfx/vmwgfx_shader.c
+++ b/drivers/gpu/drm/vmwgfx/vmwgfx_shader.c
@@ -198,7 +198,7 @@ static int vmw_gb_shader_bind(struct vmw_resource *res,
 	cmd->header.size = sizeof(cmd->body);
 	cmd->body.shid = res->id;
 	cmd->body.mobid = bo->mem.start;
-	cmd->body.offsetInBytes = 0;
+	cmd->body.offsetInBytes = res->backup_offset;
 	res->backup_dirty = false;
 	vmw_fifo_commit(dev_priv, sizeof(*cmd));
 
diff --git a/drivers/gpu/host1x/cdma.c b/drivers/gpu/host1x/cdma.c
index 3995255b16c7..5a8c8d55317a 100644
--- a/drivers/gpu/host1x/cdma.c
+++ b/drivers/gpu/host1x/cdma.c
@@ -97,7 +97,7 @@ fail:
 static void host1x_pushbuffer_push(struct push_buffer *pb, u32 op1, u32 op2)
 {
 	u32 pos = pb->pos;
-	u32 *p = (u32 *)((u32)pb->mapped + pos);
+	u32 *p = (u32 *)((void *)pb->mapped + pos);
 	WARN_ON(pos == pb->fence);
 	*(p++) = op1;
 	*(p++) = op2;
diff --git a/drivers/gpu/host1x/cdma.h b/drivers/gpu/host1x/cdma.h
index 313c4b784348..470087af8fe5 100644
--- a/drivers/gpu/host1x/cdma.h
+++ b/drivers/gpu/host1x/cdma.h
@@ -42,7 +42,7 @@ struct host1x_job;
  */
 
 struct push_buffer {
-	u32 *mapped;			/* mapped pushbuffer memory */
+	void *mapped;			/* mapped pushbuffer memory */
 	dma_addr_t phys;		/* physical address of pushbuffer */
 	u32 fence;			/* index we've written */
 	u32 pos;			/* index to write to */
diff --git a/drivers/gpu/host1x/hw/cdma_hw.c b/drivers/gpu/host1x/hw/cdma_hw.c
index 6b09b71940c2..305ea8f3382d 100644
--- a/drivers/gpu/host1x/hw/cdma_hw.c
+++ b/drivers/gpu/host1x/hw/cdma_hw.c
@@ -26,11 +26,11 @@
 #include "../debug.h"
 
 /*
- * Put the restart at the end of pushbuffer memor
+ * Put the restart at the end of pushbuffer memory
  */
 static void push_buffer_init(struct push_buffer *pb)
 {
-	*(pb->mapped + (pb->size_bytes >> 2)) = host1x_opcode_restart(0);
+	*(u32 *)(pb->mapped + pb->size_bytes) = host1x_opcode_restart(0);
 }
 
 /*
@@ -51,11 +51,11 @@ static void cdma_timeout_cpu_incr(struct host1x_cdma *cdma, u32 getptr,
 
 	/* NOP all the PB slots */
 	while (nr_slots--) {
-		u32 *p = (u32 *)((u32)pb->mapped + getptr);
+		u32 *p = (u32 *)(pb->mapped + getptr);
 		*(p++) = HOST1X_OPCODE_NOP;
 		*(p++) = HOST1X_OPCODE_NOP;
-		dev_dbg(host1x->dev, "%s: NOP at %#llx\n", __func__,
-			(u64)pb->phys + getptr);
+		dev_dbg(host1x->dev, "%s: NOP at %pad+%#x\n", __func__,
+			&pb->phys, getptr);
 		getptr = (getptr + 8) & (pb->size_bytes - 1);
 	}
 	wmb();
diff --git a/drivers/gpu/host1x/hw/channel_hw.c b/drivers/gpu/host1x/hw/channel_hw.c
index 4608257ab656..946c332c3906 100644
--- a/drivers/gpu/host1x/hw/channel_hw.c
+++ b/drivers/gpu/host1x/hw/channel_hw.c
@@ -32,6 +32,7 @@
 static void trace_write_gather(struct host1x_cdma *cdma, struct host1x_bo *bo,
 			       u32 offset, u32 words)
 {
+	struct device *dev = cdma_to_channel(cdma)->dev;
 	void *mem = NULL;
 
 	if (host1x_debug_trace_cmdbuf)
@@ -44,11 +45,14 @@ static void trace_write_gather(struct host1x_cdma *cdma, struct host1x_bo *bo,
 		 * of how much you can output to ftrace at once.
 		 */
 		for (i = 0; i < words; i += TRACE_MAX_LENGTH) {
-			trace_host1x_cdma_push_gather(
-				dev_name(cdma_to_channel(cdma)->dev),
-				(u32)bo, min(words - i, TRACE_MAX_LENGTH),
-				offset + i * sizeof(u32), mem);
+			u32 num_words = min(words - i, TRACE_MAX_LENGTH);
+			offset += i * sizeof(u32);
+
+			trace_host1x_cdma_push_gather(dev_name(dev), bo,
+						      num_words, offset,
+						      mem);
 		}
+
 		host1x_bo_munmap(bo, mem);
 	}
 }
diff --git a/drivers/gpu/host1x/hw/debug_hw.c b/drivers/gpu/host1x/hw/debug_hw.c
index f72c873eff81..791de9351eeb 100644
--- a/drivers/gpu/host1x/hw/debug_hw.c
+++ b/drivers/gpu/host1x/hw/debug_hw.c
@@ -163,8 +163,8 @@ static void show_channel_gathers(struct output *o, struct host1x_cdma *cdma)
 				continue;
 			}
 
-			host1x_debug_output(o, "    GATHER at %#llx+%04x, %d words\n",
-					    (u64)g->base, g->offset, g->words);
+			host1x_debug_output(o, "    GATHER at %pad+%#x, %d words\n",
+					    &g->base, g->offset, g->words);
 
 			show_gather(o, g->base + g->offset, g->words, cdma,
 				    g->base, mapped);
diff --git a/drivers/gpu/host1x/job.h b/drivers/gpu/host1x/job.h
index 33a697d6dcef..8b3c15df0660 100644
--- a/drivers/gpu/host1x/job.h
+++ b/drivers/gpu/host1x/job.h
@@ -23,7 +23,7 @@ struct host1x_job_gather {
 	u32 words;
 	dma_addr_t base;
 	struct host1x_bo *bo;
-	int offset;
+	u32 offset;
 	bool handled;
 };
 
diff --git a/drivers/gpu/host1x/mipi.c b/drivers/gpu/host1x/mipi.c
index 9882ea122024..fbc6ee6ca337 100644
--- a/drivers/gpu/host1x/mipi.c
+++ b/drivers/gpu/host1x/mipi.c
@@ -49,35 +49,47 @@
 #define MIPI_CAL_CONFIG_DSIC		0x10
 #define MIPI_CAL_CONFIG_DSID		0x11
 
+#define MIPI_CAL_CONFIG_DSIAB_CLK	0x19
+#define MIPI_CAL_CONFIG_DSICD_CLK	0x1a
+#define MIPI_CAL_CONFIG_CSIAB_CLK	0x1b
+#define MIPI_CAL_CONFIG_CSICD_CLK	0x1c
+#define MIPI_CAL_CONFIG_CSIE_CLK	0x1d
+
+/* for data and clock lanes */
 #define MIPI_CAL_CONFIG_SELECT		(1 << 21)
+
+/* for data lanes */
 #define MIPI_CAL_CONFIG_HSPDOS(x)	(((x) & 0x1f) << 16)
 #define MIPI_CAL_CONFIG_HSPUOS(x)	(((x) & 0x1f) <<  8)
 #define MIPI_CAL_CONFIG_TERMOS(x)	(((x) & 0x1f) <<  0)
 
+/* for clock lanes */
+#define MIPI_CAL_CONFIG_HSCLKPDOSD(x)	(((x) & 0x1f) <<  8)
+#define MIPI_CAL_CONFIG_HSCLKPUOSD(x)	(((x) & 0x1f) <<  0)
+
 #define MIPI_CAL_BIAS_PAD_CFG0		0x16
 #define MIPI_CAL_BIAS_PAD_PDVCLAMP	(1 << 1)
 #define MIPI_CAL_BIAS_PAD_E_VCLAMP_REF	(1 << 0)
 
 #define MIPI_CAL_BIAS_PAD_CFG1		0x17
+#define MIPI_CAL_BIAS_PAD_DRV_DN_REF(x) (((x) & 0x7) << 16)
 
 #define MIPI_CAL_BIAS_PAD_CFG2		0x18
 #define MIPI_CAL_BIAS_PAD_PDVREG	(1 << 1)
 
-static const struct module {
-	unsigned long reg;
-} modules[] = {
-	{ .reg = MIPI_CAL_CONFIG_CSIA },
-	{ .reg = MIPI_CAL_CONFIG_CSIB },
-	{ .reg = MIPI_CAL_CONFIG_CSIC },
-	{ .reg = MIPI_CAL_CONFIG_CSID },
-	{ .reg = MIPI_CAL_CONFIG_CSIE },
-	{ .reg = MIPI_CAL_CONFIG_DSIA },
-	{ .reg = MIPI_CAL_CONFIG_DSIB },
-	{ .reg = MIPI_CAL_CONFIG_DSIC },
-	{ .reg = MIPI_CAL_CONFIG_DSID },
+struct tegra_mipi_pad {
+	unsigned long data;
+	unsigned long clk;
+};
+
+struct tegra_mipi_soc {
+	bool has_clk_lane;
+	const struct tegra_mipi_pad *pads;
+	unsigned int num_pads;
 };
 
 struct tegra_mipi {
+	const struct tegra_mipi_soc *soc;
 	void __iomem *regs;
 	struct mutex lock;
 	struct clk *clk;
@@ -90,16 +102,16 @@ struct tegra_mipi_device {
 	unsigned long pads;
 };
 
-static inline unsigned long tegra_mipi_readl(struct tegra_mipi *mipi,
-					     unsigned long reg)
+static inline u32 tegra_mipi_readl(struct tegra_mipi *mipi,
+				   unsigned long offset)
 {
-	return readl(mipi->regs + (reg << 2));
+	return readl(mipi->regs + (offset << 2));
 }
 
-static inline void tegra_mipi_writel(struct tegra_mipi *mipi,
-				     unsigned long value, unsigned long reg)
+static inline void tegra_mipi_writel(struct tegra_mipi *mipi, u32 value,
+				     unsigned long offset)
 {
-	writel(value, mipi->regs + (reg << 2));
+	writel(value, mipi->regs + (offset << 2));
 }
 
 struct tegra_mipi_device *tegra_mipi_request(struct device *device)
@@ -117,36 +129,35 @@ struct tegra_mipi_device *tegra_mipi_request(struct device *device)
 
 	dev = kzalloc(sizeof(*dev), GFP_KERNEL);
 	if (!dev) {
-		of_node_put(args.np);
 		err = -ENOMEM;
 		goto out;
 	}
 
 	dev->pdev = of_find_device_by_node(args.np);
 	if (!dev->pdev) {
-		of_node_put(args.np);
 		err = -ENODEV;
 		goto free;
 	}
 
-	of_node_put(args.np);
-
 	dev->mipi = platform_get_drvdata(dev->pdev);
 	if (!dev->mipi) {
 		err = -EPROBE_DEFER;
-		goto pdev_put;
+		goto put;
 	}
 
+	of_node_put(args.np);
+
 	dev->pads = args.args[0];
 	dev->device = device;
 
 	return dev;
 
-pdev_put:
+put:
 	platform_device_put(dev->pdev);
 free:
 	kfree(dev);
 out:
+	of_node_put(args.np);
 	return ERR_PTR(err);
 }
 EXPORT_SYMBOL(tegra_mipi_request);
@@ -161,7 +172,7 @@ EXPORT_SYMBOL(tegra_mipi_free);
 static int tegra_mipi_wait(struct tegra_mipi *mipi)
 {
 	unsigned long timeout = jiffies + msecs_to_jiffies(250);
-	unsigned long value;
+	u32 value;
 
 	while (time_before(jiffies, timeout)) {
 		value = tegra_mipi_readl(mipi, MIPI_CAL_STATUS);
@@ -177,8 +188,9 @@ static int tegra_mipi_wait(struct tegra_mipi *mipi)
 
 int tegra_mipi_calibrate(struct tegra_mipi_device *device)
 {
-	unsigned long value;
+	const struct tegra_mipi_soc *soc = device->mipi->soc;
 	unsigned int i;
+	u32 value;
 	int err;
 
 	err = clk_enable(device->mipi->clk);
@@ -192,23 +204,35 @@ int tegra_mipi_calibrate(struct tegra_mipi_device *device)
 	value |= MIPI_CAL_BIAS_PAD_E_VCLAMP_REF;
 	tegra_mipi_writel(device->mipi, value, MIPI_CAL_BIAS_PAD_CFG0);
 
+	tegra_mipi_writel(device->mipi, MIPI_CAL_BIAS_PAD_DRV_DN_REF(2),
+			  MIPI_CAL_BIAS_PAD_CFG1);
+
 	value = tegra_mipi_readl(device->mipi, MIPI_CAL_BIAS_PAD_CFG2);
 	value &= ~MIPI_CAL_BIAS_PAD_PDVREG;
 	tegra_mipi_writel(device->mipi, value, MIPI_CAL_BIAS_PAD_CFG2);
 
-	for (i = 0; i < ARRAY_SIZE(modules); i++) {
-		if (device->pads & BIT(i))
-			value = MIPI_CAL_CONFIG_SELECT |
-				MIPI_CAL_CONFIG_HSPDOS(0) |
-				MIPI_CAL_CONFIG_HSPUOS(4) |
-				MIPI_CAL_CONFIG_TERMOS(5);
-		else
-			value = 0;
+	for (i = 0; i < soc->num_pads; i++) {
+		u32 clk = 0, data = 0;
+
+		if (device->pads & BIT(i)) {
+			data = MIPI_CAL_CONFIG_SELECT |
+			       MIPI_CAL_CONFIG_HSPDOS(0) |
+			       MIPI_CAL_CONFIG_HSPUOS(4) |
+			       MIPI_CAL_CONFIG_TERMOS(5);
+			clk = MIPI_CAL_CONFIG_SELECT |
+			      MIPI_CAL_CONFIG_HSCLKPDOSD(0) |
+			      MIPI_CAL_CONFIG_HSCLKPUOSD(4);
+		}
 
-		tegra_mipi_writel(device->mipi, value, modules[i].reg);
+		tegra_mipi_writel(device->mipi, data, soc->pads[i].data);
+
+		if (soc->has_clk_lane)
+			tegra_mipi_writel(device->mipi, clk, soc->pads[i].clk);
 	}
 
-	tegra_mipi_writel(device->mipi, MIPI_CAL_CTRL_START, MIPI_CAL_CTRL);
+	value = tegra_mipi_readl(device->mipi, MIPI_CAL_CTRL);
+	value |= MIPI_CAL_CTRL_START;
+	tegra_mipi_writel(device->mipi, value, MIPI_CAL_CTRL);
 
 	err = tegra_mipi_wait(device->mipi);
 
@@ -219,16 +243,63 @@ int tegra_mipi_calibrate(struct tegra_mipi_device *device)
 }
 EXPORT_SYMBOL(tegra_mipi_calibrate);
 
+static const struct tegra_mipi_pad tegra114_mipi_pads[] = {
+	{ .data = MIPI_CAL_CONFIG_CSIA },
+	{ .data = MIPI_CAL_CONFIG_CSIB },
+	{ .data = MIPI_CAL_CONFIG_CSIC },
+	{ .data = MIPI_CAL_CONFIG_CSID },
+	{ .data = MIPI_CAL_CONFIG_CSIE },
+	{ .data = MIPI_CAL_CONFIG_DSIA },
+	{ .data = MIPI_CAL_CONFIG_DSIB },
+	{ .data = MIPI_CAL_CONFIG_DSIC },
+	{ .data = MIPI_CAL_CONFIG_DSID },
+};
+
+static const struct tegra_mipi_soc tegra114_mipi_soc = {
+	.has_clk_lane = false,
+	.pads = tegra114_mipi_pads,
+	.num_pads = ARRAY_SIZE(tegra114_mipi_pads),
+};
+
+static const struct tegra_mipi_pad tegra124_mipi_pads[] = {
+	{ .data = MIPI_CAL_CONFIG_CSIA, .clk = MIPI_CAL_CONFIG_CSIAB_CLK },
+	{ .data = MIPI_CAL_CONFIG_CSIB, .clk = MIPI_CAL_CONFIG_CSIAB_CLK },
+	{ .data = MIPI_CAL_CONFIG_CSIC, .clk = MIPI_CAL_CONFIG_CSICD_CLK },
+	{ .data = MIPI_CAL_CONFIG_CSID, .clk = MIPI_CAL_CONFIG_CSICD_CLK },
+	{ .data = MIPI_CAL_CONFIG_CSIE, .clk = MIPI_CAL_CONFIG_CSIE_CLK },
+	{ .data = MIPI_CAL_CONFIG_DSIA, .clk = MIPI_CAL_CONFIG_DSIAB_CLK },
+	{ .data = MIPI_CAL_CONFIG_DSIB, .clk = MIPI_CAL_CONFIG_DSIAB_CLK },
+};
+
+static const struct tegra_mipi_soc tegra124_mipi_soc = {
+	.has_clk_lane = true,
+	.pads = tegra124_mipi_pads,
+	.num_pads = ARRAY_SIZE(tegra124_mipi_pads),
+};
+
+static struct of_device_id tegra_mipi_of_match[] = {
+	{ .compatible = "nvidia,tegra114-mipi", .data = &tegra114_mipi_soc },
+	{ .compatible = "nvidia,tegra124-mipi", .data = &tegra124_mipi_soc },
+	{ },
+};
+
 static int tegra_mipi_probe(struct platform_device *pdev)
 {
+	const struct of_device_id *match;
 	struct tegra_mipi *mipi;
 	struct resource *res;
 	int err;
 
+	match = of_match_node(tegra_mipi_of_match, pdev->dev.of_node);
+	if (!match)
+		return -ENODEV;
+
 	mipi = devm_kzalloc(&pdev->dev, sizeof(*mipi), GFP_KERNEL);
 	if (!mipi)
 		return -ENOMEM;
 
+	mipi->soc = match->data;
+
 	res = platform_get_resource(pdev, IORESOURCE_MEM, 0);
 	mipi->regs = devm_ioremap_resource(&pdev->dev, res);
 	if (IS_ERR(mipi->regs))
@@ -260,11 +331,6 @@ static int tegra_mipi_remove(struct platform_device *pdev)
 	return 0;
 }
 
-static struct of_device_id tegra_mipi_of_match[] = {
-	{ .compatible = "nvidia,tegra114-mipi", },
-	{ },
-};
-
 struct platform_driver tegra_mipi_driver = {
 	.driver = {
 		.name = "tegra-mipi",
diff --git a/drivers/iommu/amd_iommu_v2.c b/drivers/iommu/amd_iommu_v2.c
index bea878f8e7d3..90f70d0e1141 100644
--- a/drivers/iommu/amd_iommu_v2.c
+++ b/drivers/iommu/amd_iommu_v2.c
@@ -92,13 +92,6 @@ static spinlock_t state_lock;
 
 static struct workqueue_struct *iommu_wq;
 
-/*
- * Empty page table - Used between
- * mmu_notifier_invalidate_range_start and
- * mmu_notifier_invalidate_range_end
- */
-static u64 *empty_page_table;
-
 static void free_pasid_states(struct device_state *dev_state);
 
 static u16 device_id(struct pci_dev *pdev)
@@ -414,46 +407,21 @@ static void mn_invalidate_page(struct mmu_notifier *mn,
 	__mn_flush_page(mn, address);
 }
 
-static void mn_invalidate_range_start(struct mmu_notifier *mn,
-				      struct mm_struct *mm,
-				      unsigned long start, unsigned long end)
-{
-	struct pasid_state *pasid_state;
-	struct device_state *dev_state;
-	unsigned long flags;
-
-	pasid_state = mn_to_state(mn);
-	dev_state   = pasid_state->device_state;
-
-	spin_lock_irqsave(&pasid_state->lock, flags);
-	if (pasid_state->mmu_notifier_count == 0) {
-		amd_iommu_domain_set_gcr3(dev_state->domain,
-					  pasid_state->pasid,
-					  __pa(empty_page_table));
-	}
-	pasid_state->mmu_notifier_count += 1;
-	spin_unlock_irqrestore(&pasid_state->lock, flags);
-}
-
-static void mn_invalidate_range_end(struct mmu_notifier *mn,
-				    struct mm_struct *mm,
-				    unsigned long start, unsigned long end)
+static void mn_invalidate_range(struct mmu_notifier *mn,
+				struct mm_struct *mm,
+				unsigned long start, unsigned long end)
 {
 	struct pasid_state *pasid_state;
 	struct device_state *dev_state;
-	unsigned long flags;
 
 	pasid_state = mn_to_state(mn);
 	dev_state   = pasid_state->device_state;
 
-	spin_lock_irqsave(&pasid_state->lock, flags);
-	pasid_state->mmu_notifier_count -= 1;
-	if (pasid_state->mmu_notifier_count == 0) {
-		amd_iommu_domain_set_gcr3(dev_state->domain,
-					  pasid_state->pasid,
-					  __pa(pasid_state->mm->pgd));
-	}
-	spin_unlock_irqrestore(&pasid_state->lock, flags);
+	if ((start ^ (end - 1)) < PAGE_SIZE)
+		amd_iommu_flush_page(dev_state->domain, pasid_state->pasid,
+				     start);
+	else
+		amd_iommu_flush_tlb(dev_state->domain, pasid_state->pasid);
 }
 
 static void mn_release(struct mmu_notifier *mn, struct mm_struct *mm)
@@ -478,8 +446,7 @@ static struct mmu_notifier_ops iommu_mn = {
 	.release		= mn_release,
 	.clear_flush_young      = mn_clear_flush_young,
 	.invalidate_page        = mn_invalidate_page,
-	.invalidate_range_start = mn_invalidate_range_start,
-	.invalidate_range_end   = mn_invalidate_range_end,
+	.invalidate_range       = mn_invalidate_range,
 };
 
 static void set_pri_tag_status(struct pasid_state *pasid_state,
@@ -972,18 +939,10 @@ static int __init amd_iommu_v2_init(void)
 	if (iommu_wq == NULL)
 		goto out;
 
-	ret = -ENOMEM;
-	empty_page_table = (u64 *)get_zeroed_page(GFP_KERNEL);
-	if (empty_page_table == NULL)
-		goto out_destroy_wq;
-
 	amd_iommu_register_ppr_notifier(&ppr_nb);
 
 	return 0;
 
-out_destroy_wq:
-	destroy_workqueue(iommu_wq);
-
 out:
 	return ret;
 }
@@ -1017,8 +976,6 @@ static void __exit amd_iommu_v2_exit(void)
 	}
 
 	destroy_workqueue(iommu_wq);
-
-	free_page((unsigned long)empty_page_table);
 }
 
 module_init(amd_iommu_v2_init);
diff --git a/drivers/soc/tegra/fuse/fuse-tegra.c b/drivers/soc/tegra/fuse/fuse-tegra.c
index 11a5043959dc..011a3363c265 100644
--- a/drivers/soc/tegra/fuse/fuse-tegra.c
+++ b/drivers/soc/tegra/fuse/fuse-tegra.c
@@ -31,6 +31,7 @@
 static u32 (*fuse_readl)(const unsigned int offset);
 static int fuse_size;
 struct tegra_sku_info tegra_sku_info;
+EXPORT_SYMBOL(tegra_sku_info);
 
 static const char *tegra_revision_name[TEGRA_REVISION_MAX] = {
 	[TEGRA_REVISION_UNKNOWN] = "unknown",
diff --git a/drivers/staging/Kconfig b/drivers/staging/Kconfig
index 4690ae9a267f..9425728b7eb5 100644
--- a/drivers/staging/Kconfig
+++ b/drivers/staging/Kconfig
@@ -86,8 +86,6 @@ source "drivers/staging/gdm72xx/Kconfig"
 
 source "drivers/staging/gdm724x/Kconfig"
 
-source "drivers/staging/imx-drm/Kconfig"
-
 source "drivers/staging/fwserial/Kconfig"
 
 source "drivers/staging/goldfish/Kconfig"
diff --git a/drivers/staging/Makefile b/drivers/staging/Makefile
index c780a0e70e15..bc233dd98a95 100644
--- a/drivers/staging/Makefile
+++ b/drivers/staging/Makefile
@@ -36,7 +36,6 @@ obj-$(CONFIG_STAGING_BOARD)	+= board/
 obj-$(CONFIG_USB_WPAN_HCD)	+= ozwpan/
 obj-$(CONFIG_WIMAX_GDM72XX)	+= gdm72xx/
 obj-$(CONFIG_LTE_GDM724X)	+= gdm724x/
-obj-$(CONFIG_DRM_IMX)		+= imx-drm/
 obj-$(CONFIG_FIREWIRE_SERIAL)	+= fwserial/
 obj-$(CONFIG_GOLDFISH)		+= goldfish/
 obj-$(CONFIG_LUSTRE_FS)		+= lustre/
diff --git a/drivers/staging/imx-drm/TODO b/drivers/staging/imx-drm/TODO
deleted file mode 100644
index 29636fb13959..000000000000
--- a/drivers/staging/imx-drm/TODO
+++ /dev/null
@@ -1,17 +0,0 @@
-TODO:
-- get DRM Maintainer review for this code
-- decide where to put the base driver. It is not specific to a subsystem
-  and would be used by DRM/KMS and media/V4L2
-
-Missing features (not necessarily for moving out of staging):
-
-- Add support for IC (Image converter)
-- Add support for CSI (CMOS Sensor interface)
-- Add support for VDIC (Video Deinterlacer)
-
-Many work-in-progress patches for the above features exist. Contact
-Sascha Hauer <kernel@pengutronix.de> if you are interested in working
-on a specific feature.
-
-Please send any patches to Greg Kroah-Hartman <gregkh@linuxfoundation.org> and
-Sascha Hauer <kernel@pengutronix.de>
diff --git a/include/drm/drmP.h b/include/drm/drmP.h
index 53ed87698a74..8ba35c622e22 100644
--- a/include/drm/drmP.h
+++ b/include/drm/drmP.h
@@ -125,8 +125,8 @@ struct dma_buf_attachment;
 extern __printf(2, 3)
 void drm_ut_debug_printk(const char *function_name,
 			 const char *format, ...);
-extern __printf(2, 3)
-void drm_err(const char *func, const char *format, ...);
+extern __printf(1, 2)
+void drm_err(const char *format, ...);
 
 /***********************************************************************/
 /** \name DRM template customization defaults */
@@ -155,7 +155,7 @@ void drm_err(const char *func, const char *format, ...);
  * \param arg arguments
  */
 #define DRM_ERROR(fmt, ...)				\
-	drm_err(__func__, fmt, ##__VA_ARGS__)
+	drm_err(fmt, ##__VA_ARGS__)
 
 /**
  * Rate limited error output.  Like DRM_ERROR() but won't flood the log.
@@ -170,7 +170,7 @@ void drm_err(const char *func, const char *format, ...);
 				      DEFAULT_RATELIMIT_BURST);		\
 									\
 	if (__ratelimit(&_rs))						\
-		drm_err(__func__, fmt, ##__VA_ARGS__);			\
+		drm_err(fmt, ##__VA_ARGS__);				\
 })
 
 #define DRM_INFO(fmt, ...)				\
@@ -809,7 +809,7 @@ struct drm_device {
 	struct drm_local_map *agp_buffer_map;
 	unsigned int agp_buffer_token;
 
-        struct drm_mode_config mode_config;	/**< Current mode config */
+	struct drm_mode_config mode_config;	/**< Current mode config */
 
 	/** \name GEM information */
 	/*@{ */
@@ -986,7 +986,7 @@ extern void drm_gem_dmabuf_release(struct dma_buf *dma_buf);
 
 extern int drm_prime_sg_to_page_addr_arrays(struct sg_table *sgt, struct page **pages,
 					    dma_addr_t *addrs, int max_pages);
-extern struct sg_table *drm_prime_pages_to_sg(struct page **pages, int nr_pages);
+extern struct sg_table *drm_prime_pages_to_sg(struct page **pages, unsigned int nr_pages);
 extern void drm_prime_gem_destroy(struct drm_gem_object *obj, struct sg_table *sg);
 
 
@@ -1028,10 +1028,25 @@ void drm_pci_agp_destroy(struct drm_device *dev);
 
 extern int drm_pci_init(struct drm_driver *driver, struct pci_driver *pdriver);
 extern void drm_pci_exit(struct drm_driver *driver, struct pci_driver *pdriver);
+#ifdef CONFIG_PCI
 extern int drm_get_pci_dev(struct pci_dev *pdev,
 			   const struct pci_device_id *ent,
 			   struct drm_driver *driver);
 extern int drm_pci_set_busid(struct drm_device *dev, struct drm_master *master);
+#else
+static inline int drm_get_pci_dev(struct pci_dev *pdev,
+				  const struct pci_device_id *ent,
+				  struct drm_driver *driver)
+{
+	return -ENOSYS;
+}
+
+static inline int drm_pci_set_busid(struct drm_device *dev,
+				    struct drm_master *master)
+{
+	return -ENOSYS;
+}
+#endif
 
 #define DRM_PCIE_SPEED_25 1
 #define DRM_PCIE_SPEED_50 2
diff --git a/include/drm/drm_atomic.h b/include/drm/drm_atomic.h
new file mode 100644
index 000000000000..ad2229574dd9
--- /dev/null
+++ b/include/drm/drm_atomic.h
@@ -0,0 +1,69 @@
+/*
+ * Copyright (C) 2014 Red Hat
+ * Copyright (C) 2014 Intel Corp.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR
+ * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
+ * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
+ * OTHER DEALINGS IN THE SOFTWARE.
+ *
+ * Authors:
+ * Rob Clark <robdclark@gmail.com>
+ * Daniel Vetter <daniel.vetter@ffwll.ch>
+ */
+
+#ifndef DRM_ATOMIC_H_
+#define DRM_ATOMIC_H_
+
+#include <drm/drm_crtc.h>
+
+struct drm_atomic_state * __must_check
+drm_atomic_state_alloc(struct drm_device *dev);
+void drm_atomic_state_clear(struct drm_atomic_state *state);
+void drm_atomic_state_free(struct drm_atomic_state *state);
+
+struct drm_crtc_state * __must_check
+drm_atomic_get_crtc_state(struct drm_atomic_state *state,
+			  struct drm_crtc *crtc);
+struct drm_plane_state * __must_check
+drm_atomic_get_plane_state(struct drm_atomic_state *state,
+			   struct drm_plane *plane);
+struct drm_connector_state * __must_check
+drm_atomic_get_connector_state(struct drm_atomic_state *state,
+			       struct drm_connector *connector);
+
+int __must_check
+drm_atomic_set_crtc_for_plane(struct drm_atomic_state *state,
+			      struct drm_plane *plane, struct drm_crtc *crtc);
+void drm_atomic_set_fb_for_plane(struct drm_plane_state *plane_state,
+				 struct drm_framebuffer *fb);
+int __must_check
+drm_atomic_set_crtc_for_connector(struct drm_connector_state *conn_state,
+				  struct drm_crtc *crtc);
+int __must_check
+drm_atomic_add_affected_connectors(struct drm_atomic_state *state,
+				   struct drm_crtc *crtc);
+int
+drm_atomic_connectors_for_crtc(struct drm_atomic_state *state,
+			       struct drm_crtc *crtc);
+
+void drm_atomic_legacy_backoff(struct drm_atomic_state *state);
+
+int __must_check drm_atomic_check_only(struct drm_atomic_state *state);
+int __must_check drm_atomic_commit(struct drm_atomic_state *state);
+int __must_check drm_atomic_async_commit(struct drm_atomic_state *state);
+
+#endif /* DRM_ATOMIC_H_ */
diff --git a/include/drm/drm_atomic_helper.h b/include/drm/drm_atomic_helper.h
new file mode 100644
index 000000000000..f956b413311e
--- /dev/null
+++ b/include/drm/drm_atomic_helper.h
@@ -0,0 +1,126 @@
+/*
+ * Copyright (C) 2014 Red Hat
+ * Copyright (C) 2014 Intel Corp.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR
+ * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
+ * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
+ * OTHER DEALINGS IN THE SOFTWARE.
+ *
+ * Authors:
+ * Rob Clark <robdclark@gmail.com>
+ * Daniel Vetter <daniel.vetter@ffwll.ch>
+ */
+
+#ifndef DRM_ATOMIC_HELPER_H_
+#define DRM_ATOMIC_HELPER_H_
+
+#include <drm/drm_crtc.h>
+
+int drm_atomic_helper_check(struct drm_device *dev,
+			    struct drm_atomic_state *state);
+int drm_atomic_helper_commit(struct drm_device *dev,
+			     struct drm_atomic_state *state,
+			     bool async);
+
+void drm_atomic_helper_wait_for_vblanks(struct drm_device *dev,
+					struct drm_atomic_state *old_state);
+
+void drm_atomic_helper_commit_pre_planes(struct drm_device *dev,
+					 struct drm_atomic_state *state);
+void drm_atomic_helper_commit_post_planes(struct drm_device *dev,
+					  struct drm_atomic_state *old_state);
+
+int drm_atomic_helper_prepare_planes(struct drm_device *dev,
+				     struct drm_atomic_state *state);
+void drm_atomic_helper_commit_planes(struct drm_device *dev,
+				     struct drm_atomic_state *state);
+void drm_atomic_helper_cleanup_planes(struct drm_device *dev,
+				      struct drm_atomic_state *old_state);
+
+void drm_atomic_helper_swap_state(struct drm_device *dev,
+				  struct drm_atomic_state *state);
+
+/* implementations for legacy interfaces */
+int drm_atomic_helper_update_plane(struct drm_plane *plane,
+				   struct drm_crtc *crtc,
+				   struct drm_framebuffer *fb,
+				   int crtc_x, int crtc_y,
+				   unsigned int crtc_w, unsigned int crtc_h,
+				   uint32_t src_x, uint32_t src_y,
+				   uint32_t src_w, uint32_t src_h);
+int drm_atomic_helper_disable_plane(struct drm_plane *plane);
+int drm_atomic_helper_set_config(struct drm_mode_set *set);
+
+int drm_atomic_helper_crtc_set_property(struct drm_crtc *crtc,
+					struct drm_property *property,
+					uint64_t val);
+int drm_atomic_helper_plane_set_property(struct drm_plane *plane,
+					struct drm_property *property,
+					uint64_t val);
+int drm_atomic_helper_connector_set_property(struct drm_connector *connector,
+					struct drm_property *property,
+					uint64_t val);
+int drm_atomic_helper_page_flip(struct drm_crtc *crtc,
+				struct drm_framebuffer *fb,
+				struct drm_pending_vblank_event *event,
+				uint32_t flags);
+
+/* default implementations for state handling */
+void drm_atomic_helper_crtc_reset(struct drm_crtc *crtc);
+struct drm_crtc_state *
+drm_atomic_helper_crtc_duplicate_state(struct drm_crtc *crtc);
+void drm_atomic_helper_crtc_destroy_state(struct drm_crtc *crtc,
+					  struct drm_crtc_state *state);
+
+void drm_atomic_helper_plane_reset(struct drm_plane *plane);
+struct drm_plane_state *
+drm_atomic_helper_plane_duplicate_state(struct drm_plane *plane);
+void drm_atomic_helper_plane_destroy_state(struct drm_plane *plane,
+					  struct drm_plane_state *state);
+
+void drm_atomic_helper_connector_reset(struct drm_connector *connector);
+struct drm_connector_state *
+drm_atomic_helper_connector_duplicate_state(struct drm_connector *connector);
+void drm_atomic_helper_connector_destroy_state(struct drm_connector *connector,
+					  struct drm_connector_state *state);
+
+/**
+ * drm_atomic_crtc_for_each_plane - iterate over planes currently attached to CRTC
+ * @plane: the loop cursor
+ * @crtc:  the crtc whose planes are iterated
+ *
+ * This iterates over the current state, useful (for example) when applying
+ * atomic state after it has been checked and swapped.  To iterate over the
+ * planes which *will* be attached (for ->atomic_check()) see
+ * drm_atomic_crtc_state_for_each_plane()
+ */
+#define drm_atomic_crtc_for_each_plane(plane, crtc) \
+	drm_for_each_plane_mask(plane, (crtc)->dev, (crtc)->state->plane_mask)
+
+/**
+ * drm_atomic_crtc_state_for_each_plane - iterate over attached planes in new state
+ * @plane: the loop cursor
+ * @crtc_state: the incoming crtc-state
+ *
+ * Similar to drm_atomic_crtc_for_each_plane(), but iterates the planes that will be
+ * attached if the specified state is applied.  Useful during (for example)
+ * ->atomic_check() operations, to validate the incoming state
+ */
+#define drm_atomic_crtc_state_for_each_plane(plane, crtc_state) \
+	drm_for_each_plane_mask(plane, (crtc_state)->state->dev, (crtc_state)->plane_mask)
+
+#endif /* DRM_ATOMIC_HELPER_H_ */
diff --git a/include/drm/drm_crtc.h b/include/drm/drm_crtc.h
index c40070a92d6b..b86329813ad3 100644
--- a/include/drm/drm_crtc.h
+++ b/include/drm/drm_crtc.h
@@ -42,6 +42,7 @@ struct drm_object_properties;
 struct drm_file;
 struct drm_clip_rect;
 struct device_node;
+struct fence;
 
 #define DRM_MODE_OBJECT_CRTC 0xcccccccc
 #define DRM_MODE_OBJECT_CONNECTOR 0xc0c0c0c0
@@ -136,14 +137,22 @@ struct drm_display_info {
 	u8 cea_rev;
 };
 
+/* data corresponds to displayid vend/prod/serial */
+struct drm_tile_group {
+	struct kref refcount;
+	struct drm_device *dev;
+	int id;
+	u8 group_data[8];
+};
+
 struct drm_framebuffer_funcs {
 	/* note: use drm_framebuffer_remove() */
 	void (*destroy)(struct drm_framebuffer *framebuffer);
 	int (*create_handle)(struct drm_framebuffer *fb,
 			     struct drm_file *file_priv,
 			     unsigned int *handle);
-	/**
-	 * Optinal callback for the dirty fb ioctl.
+	/*
+	 * Optional callback for the dirty fb ioctl.
 	 *
 	 * Userspace can notify the driver via this callback
 	 * that a area of the framebuffer has changed and should
@@ -196,7 +205,7 @@ struct drm_framebuffer {
 struct drm_property_blob {
 	struct drm_mode_object base;
 	struct list_head head;
-	unsigned int length;
+	size_t length;
 	unsigned char data[];
 };
 
@@ -215,7 +224,7 @@ struct drm_property {
 	uint64_t *values;
 	struct drm_device *dev;
 
-	struct list_head enum_blob_list;
+	struct list_head enum_list;
 };
 
 struct drm_crtc;
@@ -224,19 +233,65 @@ struct drm_encoder;
 struct drm_pending_vblank_event;
 struct drm_plane;
 struct drm_bridge;
+struct drm_atomic_state;
 
 /**
- * drm_crtc_funcs - control CRTCs for a given device
+ * struct drm_crtc_state - mutable CRTC state
+ * @enable: whether the CRTC should be enabled, gates all other state
+ * @mode_changed: for use by helpers and drivers when computing state updates
+ * @plane_mask: bitmask of (1 << drm_plane_index(plane)) of attached planes
+ * @last_vblank_count: for helpers and drivers to capture the vblank of the
+ * 	update to ensure framebuffer cleanup isn't done too early
+ * @planes_changed: for use by helpers and drivers when computing state updates
+ * @adjusted_mode: for use by helpers and drivers to compute adjusted mode timings
+ * @mode: current mode timings
+ * @event: optional pointer to a DRM event to signal upon completion of the
+ * 	state update
+ * @state: backpointer to global drm_atomic_state
+ */
+struct drm_crtc_state {
+	bool enable;
+
+	/* computed state bits used by helpers and drivers */
+	bool planes_changed : 1;
+	bool mode_changed : 1;
+
+	/* attached planes bitmask:
+	 * WARNING: transitional helpers do not maintain plane_mask so
+	 * drivers not converted over to atomic helpers should not rely
+	 * on plane_mask being accurate!
+	 */
+	u32 plane_mask;
+
+	/* last_vblank_count: for vblank waits before cleanup */
+	u32 last_vblank_count;
+
+	/* adjusted_mode: for use by helpers and drivers */
+	struct drm_display_mode adjusted_mode;
+
+	struct drm_display_mode mode;
+
+	struct drm_pending_vblank_event *event;
+
+	struct drm_atomic_state *state;
+};
+
+/**
+ * struct drm_crtc_funcs - control CRTCs for a given device
  * @save: save CRTC state
  * @restore: restore CRTC state
  * @reset: reset CRTC after state has been invalidated (e.g. resume)
  * @cursor_set: setup the cursor
+ * @cursor_set2: setup the cursor with hotspot, supersedes @cursor_set if set
  * @cursor_move: move the cursor
  * @gamma_set: specify color ramp for CRTC
  * @destroy: deinit and free object
  * @set_property: called when a property is changed
  * @set_config: apply a new CRTC configuration
  * @page_flip: initiate a page flip
+ * @atomic_duplicate_state: duplicate the atomic state for this CRTC
+ * @atomic_destroy_state: destroy an atomic state for this CRTC
+ * @atomic_set_property: set a property on an atomic state for this CRTC
  *
  * The drm_crtc_funcs structure is the central CRTC management structure
  * in the DRM.  Each CRTC controls one or more connectors (note that the name
@@ -287,16 +342,28 @@ struct drm_crtc_funcs {
 
 	int (*set_property)(struct drm_crtc *crtc,
 			    struct drm_property *property, uint64_t val);
+
+	/* atomic update handling */
+	struct drm_crtc_state *(*atomic_duplicate_state)(struct drm_crtc *crtc);
+	void (*atomic_destroy_state)(struct drm_crtc *crtc,
+				     struct drm_crtc_state *state);
+	int (*atomic_set_property)(struct drm_crtc *crtc,
+				   struct drm_crtc_state *state,
+				   struct drm_property *property,
+				   uint64_t val);
 };
 
 /**
- * drm_crtc - central CRTC control structure
+ * struct drm_crtc - central CRTC control structure
  * @dev: parent DRM device
+ * @port: OF node used by drm_of_find_possible_crtcs()
  * @head: list management
  * @mutex: per-CRTC locking
  * @base: base KMS object for ID tracking etc.
  * @primary: primary plane for this CRTC
  * @cursor: cursor plane for this CRTC
+ * @cursor_x: current x position of the cursor, used for universal cursor planes
+ * @cursor_y: current y position of the cursor, used for universal cursor planes
  * @enabled: is this CRTC enabled?
  * @mode: current mode timings
  * @hwmode: mode timings as programmed to hw regs
@@ -309,10 +376,13 @@ struct drm_crtc_funcs {
  * @gamma_size: size of gamma ramp
  * @gamma_store: gamma ramp values
  * @framedur_ns: precise frame timing
- * @framedur_ns: precise line timing
+ * @linedur_ns: precise line timing
  * @pixeldur_ns: precise pixel timing
  * @helper_private: mid-layer private data
  * @properties: property tracking for this CRTC
+ * @state: current atomic state for this CRTC
+ * @acquire_ctx: per-CRTC implicit acquire context used by atomic drivers for
+ * 	legacy ioctls
  *
  * Each CRTC may have one or more connectors associated with it.  This structure
  * allows the CRTC to be controlled.
@@ -322,7 +392,7 @@ struct drm_crtc {
 	struct device_node *port;
 	struct list_head head;
 
-	/**
+	/*
 	 * crtc mutex
 	 *
 	 * This provides a read lock for the overall crtc state (mode, dpms
@@ -368,6 +438,8 @@ struct drm_crtc {
 
 	struct drm_object_properties properties;
 
+	struct drm_crtc_state *state;
+
 	/*
 	 * For legacy crtc ioctls so that atomic drivers can get at the locking
 	 * acquire context.
@@ -375,9 +447,22 @@ struct drm_crtc {
 	struct drm_modeset_acquire_ctx *acquire_ctx;
 };
 
+/**
+ * struct drm_connector_state - mutable connector state
+ * @crtc: CRTC to connect connector to, NULL if disabled
+ * @best_encoder: can be used by helpers and drivers to select the encoder
+ * @state: backpointer to global drm_atomic_state
+ */
+struct drm_connector_state {
+	struct drm_crtc *crtc;  /* do not write directly, use drm_atomic_set_crtc_for_connector() */
+
+	struct drm_encoder *best_encoder;
+
+	struct drm_atomic_state *state;
+};
 
 /**
- * drm_connector_funcs - control connectors on a given device
+ * struct drm_connector_funcs - control connectors on a given device
  * @dpms: set power state (see drm_crtc_funcs above)
  * @save: save connector state
  * @restore: restore connector state
@@ -387,6 +472,9 @@ struct drm_crtc {
  * @set_property: property for this connector may need an update
  * @destroy: make object go away
  * @force: notify the driver that the connector is forced on
+ * @atomic_duplicate_state: duplicate the atomic state for this connector
+ * @atomic_destroy_state: destroy an atomic state for this connector
+ * @atomic_set_property: set a property on an atomic state for this connector
  *
  * Each CRTC may have one or more connectors attached to it.  The functions
  * below allow the core DRM code to control connectors, enumerate available modes,
@@ -411,10 +499,19 @@ struct drm_connector_funcs {
 			     uint64_t val);
 	void (*destroy)(struct drm_connector *connector);
 	void (*force)(struct drm_connector *connector);
+
+	/* atomic update handling */
+	struct drm_connector_state *(*atomic_duplicate_state)(struct drm_connector *connector);
+	void (*atomic_destroy_state)(struct drm_connector *connector,
+				     struct drm_connector_state *state);
+	int (*atomic_set_property)(struct drm_connector *connector,
+				   struct drm_connector_state *state,
+				   struct drm_property *property,
+				   uint64_t val);
 };
 
 /**
- * drm_encoder_funcs - encoder controls
+ * struct drm_encoder_funcs - encoder controls
  * @reset: reset state (e.g. at init or resume time)
  * @destroy: cleanup and free associated data
  *
@@ -428,7 +525,7 @@ struct drm_encoder_funcs {
 #define DRM_CONNECTOR_MAX_ENCODER 3
 
 /**
- * drm_encoder - central DRM encoder structure
+ * struct drm_encoder - central DRM encoder structure
  * @dev: parent DRM device
  * @head: list management
  * @base: base KMS object
@@ -472,7 +569,7 @@ struct drm_encoder {
 #define MAX_ELD_BYTES	128
 
 /**
- * drm_connector - central DRM connector control structure
+ * struct drm_connector - central DRM connector control structure
  * @dev: parent DRM device
  * @kdev: kernel device for sysfs attributes
  * @attr: sysfs attributes
@@ -483,6 +580,7 @@ struct drm_encoder {
  * @connector_type_id: index into connector type enum
  * @interlace_allowed: can this connector handle interlaced modes?
  * @doublescan_allowed: can this connector handle doublescan?
+ * @stereo_allowed: can this connector handle stereo modes?
  * @modes: modes available on this connector (from fill_modes() + user)
  * @status: one of the drm_connector_status enums (connected, not, or unknown)
  * @probed_modes: list of modes derived directly from the display
@@ -490,10 +588,13 @@ struct drm_encoder {
  * @funcs: connector control functions
  * @edid_blob_ptr: DRM property containing EDID if present
  * @properties: property tracking for this connector
+ * @path_blob_ptr: DRM blob property data for the DP MST path property
  * @polled: a %DRM_CONNECTOR_POLL_<foo> value for core driven polling
  * @dpms: current dpms state
  * @helper_private: mid-layer private data
+ * @cmdline_mode: mode line parsed from the kernel cmdline for this connector
  * @force: a %DRM_FORCE_<foo> state for forced mode sets
+ * @override_edid: has the EDID been overwritten through debugfs for testing?
  * @encoder_ids: valid encoders for this connector
  * @encoder: encoder driving this connector, if any
  * @eld: EDID-like data, if present
@@ -503,6 +604,18 @@ struct drm_encoder {
  * @video_latency: video latency info from ELD, if found
  * @audio_latency: audio latency info from ELD, if found
  * @null_edid_counter: track sinks that give us all zeros for the EDID
+ * @bad_edid_counter: track sinks that give us an EDID with invalid checksum
+ * @debugfs_entry: debugfs directory for this connector
+ * @state: current atomic state for this connector
+ * @has_tile: is this connector connected to a tiled monitor
+ * @tile_group: tile group for the connected monitor
+ * @tile_is_single_monitor: whether the tile is one monitor housing
+ * @num_h_tile: number of horizontal tiles in the tile group
+ * @num_v_tile: number of vertical tiles in the tile group
+ * @tile_h_loc: horizontal location of this tile
+ * @tile_v_loc: vertical location of this tile
+ * @tile_h_size: horizontal size of this tile.
+ * @tile_v_size: vertical size of this tile.
  *
  * Each connector may be connected to one or more CRTCs, or may be clonable by
  * another connector if they can share a CRTC.  Each connector also has a specific
@@ -538,6 +651,8 @@ struct drm_connector {
 
 	struct drm_property_blob *path_blob_ptr;
 
+	struct drm_property_blob *tile_blob_ptr;
+
 	uint8_t polled; /* DRM_CONNECTOR_POLL_* */
 
 	/* requested DPMS state */
@@ -563,14 +678,63 @@ struct drm_connector {
 	unsigned bad_edid_counter;
 
 	struct dentry *debugfs_entry;
+
+	struct drm_connector_state *state;
+
+	/* DisplayID bits */
+	bool has_tile;
+	struct drm_tile_group *tile_group;
+	bool tile_is_single_monitor;
+
+	uint8_t num_h_tile, num_v_tile;
+	uint8_t tile_h_loc, tile_v_loc;
+	uint16_t tile_h_size, tile_v_size;
+};
+
+/**
+ * struct drm_plane_state - mutable plane state
+ * @crtc: currently bound CRTC, NULL if disabled
+ * @fb: currently bound framebuffer
+ * @fence: optional fence to wait for before scanning out @fb
+ * @crtc_x: left position of visible portion of plane on crtc
+ * @crtc_y: upper position of visible portion of plane on crtc
+ * @crtc_w: width of visible portion of plane on crtc
+ * @crtc_h: height of visible portion of plane on crtc
+ * @src_x: left position of visible portion of plane within
+ *	plane (in 16.16)
+ * @src_y: upper position of visible portion of plane within
+ *	plane (in 16.16)
+ * @src_w: width of visible portion of plane (in 16.16)
+ * @src_h: height of visible portion of plane (in 16.16)
+ * @state: backpointer to global drm_atomic_state
+ */
+struct drm_plane_state {
+	struct drm_crtc *crtc;   /* do not write directly, use drm_atomic_set_crtc_for_plane() */
+	struct drm_framebuffer *fb;  /* do not write directly, use drm_atomic_set_fb_for_plane() */
+	struct fence *fence;
+
+	/* Signed dest location allows it to be partially off screen */
+	int32_t crtc_x, crtc_y;
+	uint32_t crtc_w, crtc_h;
+
+	/* Source values are 16.16 fixed point */
+	uint32_t src_x, src_y;
+	uint32_t src_h, src_w;
+
+	struct drm_atomic_state *state;
 };
 
+
 /**
- * drm_plane_funcs - driver plane control functions
+ * struct drm_plane_funcs - driver plane control functions
  * @update_plane: update the plane configuration
  * @disable_plane: shut down the plane
  * @destroy: clean up plane resources
+ * @reset: reset plane after state has been invalidated (e.g. resume)
  * @set_property: called when a property is changed
+ * @atomic_duplicate_state: duplicate the atomic state for this plane
+ * @atomic_destroy_state: destroy an atomic state for this plane
+ * @atomic_set_property: set a property on an atomic state for this plane
  */
 struct drm_plane_funcs {
 	int (*update_plane)(struct drm_plane *plane,
@@ -585,6 +749,15 @@ struct drm_plane_funcs {
 
 	int (*set_property)(struct drm_plane *plane,
 			    struct drm_property *property, uint64_t val);
+
+	/* atomic update handling */
+	struct drm_plane_state *(*atomic_duplicate_state)(struct drm_plane *plane);
+	void (*atomic_destroy_state)(struct drm_plane *plane,
+				     struct drm_plane_state *state);
+	int (*atomic_set_property)(struct drm_plane *plane,
+				   struct drm_plane_state *state,
+				   struct drm_property *property,
+				   uint64_t val);
 };
 
 enum drm_plane_type {
@@ -594,7 +767,7 @@ enum drm_plane_type {
 };
 
 /**
- * drm_plane - central DRM plane control structure
+ * struct drm_plane - central DRM plane control structure
  * @dev: DRM device this plane belongs to
  * @head: for list management
  * @base: base mode object
@@ -603,14 +776,19 @@ enum drm_plane_type {
  * @format_count: number of formats supported
  * @crtc: currently bound CRTC
  * @fb: currently bound fb
+ * @old_fb: Temporary tracking of the old fb while a modeset is ongoing. Used by
+ * 	drm_mode_set_config_internal() to implement correct refcounting.
  * @funcs: helper functions
  * @properties: property tracking for this plane
  * @type: type of plane (overlay, primary, cursor)
+ * @state: current atomic state for this plane
  */
 struct drm_plane {
 	struct drm_device *dev;
 	struct list_head head;
 
+	struct drm_modeset_lock mutex;
+
 	struct drm_mode_object base;
 
 	uint32_t possible_crtcs;
@@ -620,8 +798,6 @@ struct drm_plane {
 	struct drm_crtc *crtc;
 	struct drm_framebuffer *fb;
 
-	/* Temporary tracking of the old fb while a modeset is ongoing. Used
-	 * by drm_mode_set_config_internal to implement correct refcounting. */
 	struct drm_framebuffer *old_fb;
 
 	const struct drm_plane_funcs *funcs;
@@ -629,10 +805,14 @@ struct drm_plane {
 	struct drm_object_properties properties;
 
 	enum drm_plane_type type;
+
+	void *helper_private;
+
+	struct drm_plane_state *state;
 };
 
 /**
- * drm_bridge_funcs - drm_bridge control functions
+ * struct drm_bridge_funcs - drm_bridge control functions
  * @mode_fixup: Try to fixup (or reject entirely) proposed mode for this bridge
  * @disable: Called right before encoder prepare, disables the bridge
  * @post_disable: Called right after encoder prepare, for lockstepped disable
@@ -656,7 +836,7 @@ struct drm_bridge_funcs {
 };
 
 /**
- * drm_bridge - central DRM bridge control structure
+ * struct drm_bridge - central DRM bridge control structure
  * @dev: DRM device this bridge belongs to
  * @head: list management
  * @base: base mode object
@@ -674,8 +854,35 @@ struct drm_bridge {
 };
 
 /**
- * drm_mode_set - new values for a CRTC config change
- * @head: list management
+ * struct drm_atomic_state - the global state object for atomic updates
+ * @dev: parent DRM device
+ * @flags: state flags like async update
+ * @planes: pointer to array of plane pointers
+ * @plane_states: pointer to array of plane states pointers
+ * @crtcs: pointer to array of CRTC pointers
+ * @crtc_states: pointer to array of CRTC states pointers
+ * @num_connector: size of the @connectors and @connector_states arrays
+ * @connectors: pointer to array of connector pointers
+ * @connector_states: pointer to array of connector states pointers
+ * @acquire_ctx: acquire context for this atomic modeset state update
+ */
+struct drm_atomic_state {
+	struct drm_device *dev;
+	uint32_t flags;
+	struct drm_plane **planes;
+	struct drm_plane_state **plane_states;
+	struct drm_crtc **crtcs;
+	struct drm_crtc_state **crtc_states;
+	int num_connector;
+	struct drm_connector **connectors;
+	struct drm_connector_state **connector_states;
+
+	struct drm_modeset_acquire_ctx *acquire_ctx;
+};
+
+
+/**
+ * struct drm_mode_set - new values for a CRTC config change
  * @fb: framebuffer to use for new config
  * @crtc: CRTC whose configuration we're about to change
  * @mode: mode timings to use
@@ -705,6 +912,9 @@ struct drm_mode_set {
  * struct drm_mode_config_funcs - basic driver provided mode setting functions
  * @fb_create: create a new framebuffer object
  * @output_poll_changed: function to handle output configuration changes
+ * @atomic_check: check whether a given atomic state update is possible
+ * @atomic_commit: commit an atomic state update previously verified with
+ * 	atomic_check()
  *
  * Some global (i.e. not per-CRTC, connector, etc) mode setting functions that
  * involve drivers.
@@ -714,13 +924,20 @@ struct drm_mode_config_funcs {
 					     struct drm_file *file_priv,
 					     struct drm_mode_fb_cmd2 *mode_cmd);
 	void (*output_poll_changed)(struct drm_device *dev);
+
+	int (*atomic_check)(struct drm_device *dev,
+			    struct drm_atomic_state *a);
+	int (*atomic_commit)(struct drm_device *dev,
+			     struct drm_atomic_state *a,
+			     bool async);
 };
 
 /**
- * drm_mode_group - group of mode setting resources for potential sub-grouping
+ * struct drm_mode_group - group of mode setting resources for potential sub-grouping
  * @num_crtcs: CRTC count
  * @num_encoders: encoder count
  * @num_connectors: connector count
+ * @num_bridges: bridge count
  * @id_list: list of KMS object IDs in this group
  *
  * Currently this simply tracks the global mode setting state.  But in the
@@ -740,10 +957,14 @@ struct drm_mode_group {
 };
 
 /**
- * drm_mode_config - Mode configuration control structure
+ * struct drm_mode_config - Mode configuration control structure
  * @mutex: mutex protecting KMS related lists and structures
+ * @connection_mutex: ww mutex protecting connector state and routing
+ * @acquire_ctx: global implicit acquire context used by atomic drivers for
+ * 	legacy ioctls
  * @idr_mutex: mutex for KMS ID allocation and management
  * @crtc_idr: main KMS ID tracking object
+ * @fb_lock: mutex to protect fb state and lists
  * @num_fb: number of fbs available
  * @fb_list: list of framebuffers available
  * @num_connector: number of connectors on this device
@@ -752,17 +973,28 @@ struct drm_mode_group {
  * @bridge_list: list of bridge objects
  * @num_encoder: number of encoders on this device
  * @encoder_list: list of encoder objects
+ * @num_overlay_plane: number of overlay planes on this device
+ * @num_total_plane: number of universal (i.e. with primary/cursor) planes on this device
+ * @plane_list: list of plane objects
  * @num_crtc: number of CRTCs on this device
  * @crtc_list: list of CRTC objects
+ * @property_list: list of property objects
  * @min_width: minimum pixel width on this device
  * @min_height: minimum pixel height on this device
  * @max_width: maximum pixel width on this device
  * @max_height: maximum pixel height on this device
  * @funcs: core driver provided mode setting functions
  * @fb_base: base address of the framebuffer
- * @poll_enabled: track polling status for this device
+ * @poll_enabled: track polling support for this device
+ * @poll_running: track polling status for this device
  * @output_poll_work: delayed work for polling in process context
+ * @property_blob_list: list of all the blob property objects
  * @*_property: core property tracking
+ * @preferred_depth: preferred RGB pixel depth, used by fb helpers
+ * @prefer_shadow: hint to userspace to prefer shadow-fb rendering
+ * @async_page_flip: does this device support async flips on the primary plane?
+ * @cursor_width: hint to userspace for max cursor width
+ * @cursor_height: hint to userspace for max cursor height
  *
  * Core mode resource tracking structure.  All CRTC, encoders, and connectors
  * enumerated by the driver are added here, as are global properties.  Some
@@ -774,16 +1006,10 @@ struct drm_mode_config {
 	struct drm_modeset_acquire_ctx *acquire_ctx; /* for legacy _lock_all() / _unlock_all() */
 	struct mutex idr_mutex; /* for IDR management */
 	struct idr crtc_idr; /* use this idr for all IDs, fb, crtc, connector, modes - just makes life easier */
+	struct idr tile_idr; /* use this idr for tile group IDs */
 	/* this is limited to one for now */
 
-
-	/**
-	 * fb_lock - mutex to protect fb state
-	 *
-	 * Besides the global fb list his also protects the fbs list in the
-	 * file_priv
-	 */
-	struct mutex fb_lock;
+	struct mutex fb_lock; /* protects global and per-file fb lists */
 	int num_fb;
 	struct list_head fb_list;
 
@@ -824,6 +1050,7 @@ struct drm_mode_config {
 	struct drm_property *edid_property;
 	struct drm_property *dpms_property;
 	struct drm_property *path_property;
+	struct drm_property *tile_property;
 	struct drm_property *plane_type_property;
 	struct drm_property *rotation_property;
 
@@ -851,6 +1078,10 @@ struct drm_mode_config {
 	struct drm_property *aspect_ratio_property;
 	struct drm_property *dirty_info_property;
 
+	/* properties for virtual machine layout */
+	struct drm_property *suggested_x_property;
+	struct drm_property *suggested_y_property;
+
 	/* dumb ioctl parameters */
 	uint32_t preferred_depth, prefer_shadow;
 
@@ -861,6 +1092,19 @@ struct drm_mode_config {
 	uint32_t cursor_width, cursor_height;
 };
 
+/**
+ * drm_for_each_plane_mask - iterate over planes specified by bitmask
+ * @plane: the loop cursor
+ * @dev: the DRM device
+ * @plane_mask: bitmask of plane indices
+ *
+ * Iterate over all planes specified by bitmask.
+ */
+#define drm_for_each_plane_mask(plane, dev, plane_mask) \
+	list_for_each_entry((plane), &(dev)->mode_config.plane_list, head) \
+		if ((plane_mask) & (1 << drm_plane_index(plane)))
+
+
 #define obj_to_crtc(x) container_of(x, struct drm_crtc, base)
 #define obj_to_connector(x) container_of(x, struct drm_connector, base)
 #define obj_to_encoder(x) container_of(x, struct drm_encoder, base)
@@ -880,9 +1124,6 @@ extern int drm_crtc_init_with_planes(struct drm_device *dev,
 				     struct drm_plane *primary,
 				     struct drm_plane *cursor,
 				     const struct drm_crtc_funcs *funcs);
-extern int drm_crtc_init(struct drm_device *dev,
-			 struct drm_crtc *crtc,
-			 const struct drm_crtc_funcs *funcs);
 extern void drm_crtc_cleanup(struct drm_crtc *crtc);
 extern unsigned int drm_crtc_index(struct drm_crtc *crtc);
 
@@ -978,9 +1219,10 @@ extern void drm_mode_config_reset(struct drm_device *dev);
 extern void drm_mode_config_cleanup(struct drm_device *dev);
 
 extern int drm_mode_connector_set_path_property(struct drm_connector *connector,
-						char *path);
+						const char *path);
+int drm_mode_connector_set_tile_property(struct drm_connector *connector);
 extern int drm_mode_connector_update_edid_property(struct drm_connector *connector,
-						struct edid *edid);
+						   const struct edid *edid);
 
 static inline bool drm_property_type_is(struct drm_property *property,
 		uint32_t type)
@@ -1041,11 +1283,13 @@ extern void drm_property_destroy(struct drm_device *dev, struct drm_property *pr
 extern int drm_property_add_enum(struct drm_property *property, int index,
 				 uint64_t value, const char *name);
 extern int drm_mode_create_dvi_i_properties(struct drm_device *dev);
-extern int drm_mode_create_tv_properties(struct drm_device *dev, int num_formats,
-				     char *formats[]);
+extern int drm_mode_create_tv_properties(struct drm_device *dev,
+					 unsigned int num_modes,
+					 char *modes[]);
 extern int drm_mode_create_scaling_mode_property(struct drm_device *dev);
 extern int drm_mode_create_aspect_ratio_property(struct drm_device *dev);
 extern int drm_mode_create_dirty_info_property(struct drm_device *dev);
+extern int drm_mode_create_suggested_offset_properties(struct drm_device *dev);
 
 extern int drm_mode_connector_attach_encoder(struct drm_connector *connector,
 					     struct drm_encoder *encoder);
@@ -1113,6 +1357,13 @@ extern void drm_set_preferred_mode(struct drm_connector *connector,
 extern int drm_edid_header_is_valid(const u8 *raw_edid);
 extern bool drm_edid_block_valid(u8 *raw_edid, int block, bool print_bad_edid);
 extern bool drm_edid_is_valid(struct edid *edid);
+
+extern struct drm_tile_group *drm_mode_create_tile_group(struct drm_device *dev,
+							 char topology[8]);
+extern struct drm_tile_group *drm_mode_get_tile_group(struct drm_device *dev,
+						      char topology[8]);
+extern void drm_mode_put_tile_group(struct drm_device *dev,
+				    struct drm_tile_group *tg);
 struct drm_display_mode *drm_mode_find_dmt(struct drm_device *dev,
 					   int hsize, int vsize, int fresh,
 					   bool rb);
diff --git a/include/drm/drm_crtc_helper.h b/include/drm/drm_crtc_helper.h
index a3d75fefd010..7adbb65ea8ae 100644
--- a/include/drm/drm_crtc_helper.h
+++ b/include/drm/drm_crtc_helper.h
@@ -68,6 +68,7 @@ struct drm_crtc_helper_funcs {
 	int (*mode_set)(struct drm_crtc *crtc, struct drm_display_mode *mode,
 			struct drm_display_mode *adjusted_mode, int x, int y,
 			struct drm_framebuffer *old_fb);
+	void (*mode_set_nofb)(struct drm_crtc *crtc);
 
 	/* Move the crtc on the current fb to the given position *optional* */
 	int (*mode_set_base)(struct drm_crtc *crtc, int x, int y,
@@ -81,6 +82,12 @@ struct drm_crtc_helper_funcs {
 
 	/* disable crtc when not in use - more explicit than dpms off */
 	void (*disable)(struct drm_crtc *crtc);
+
+	/* atomic helpers */
+	int (*atomic_check)(struct drm_crtc *crtc,
+			    struct drm_crtc_state *state);
+	void (*atomic_begin)(struct drm_crtc *crtc);
+	void (*atomic_flush)(struct drm_crtc *crtc);
 };
 
 /**
@@ -161,6 +168,12 @@ static inline void drm_connector_helper_add(struct drm_connector *connector,
 
 extern void drm_helper_resume_force_mode(struct drm_device *dev);
 
+int drm_helper_crtc_mode_set(struct drm_crtc *crtc, struct drm_display_mode *mode,
+			     struct drm_display_mode *adjusted_mode, int x, int y,
+			     struct drm_framebuffer *old_fb);
+int drm_helper_crtc_mode_set_base(struct drm_crtc *crtc, int x, int y,
+				  struct drm_framebuffer *old_fb);
+
 /* drm_probe_helper.c */
 extern int drm_helper_probe_single_connector_modes(struct drm_connector
 						   *connector, uint32_t maxX,
diff --git a/include/drm/drm_displayid.h b/include/drm/drm_displayid.h
new file mode 100644
index 000000000000..623b4e98e748
--- /dev/null
+++ b/include/drm/drm_displayid.h
@@ -0,0 +1,76 @@
+/*
+ * Copyright © 2014 Red Hat Inc.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR
+ * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
+ * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
+ * OTHER DEALINGS IN THE SOFTWARE.
+ */
+#ifndef DRM_DISPLAYID_H
+#define DRM_DISPLAYID_H
+
+#define DATA_BLOCK_PRODUCT_ID 0x00
+#define DATA_BLOCK_DISPLAY_PARAMETERS 0x01
+#define DATA_BLOCK_COLOR_CHARACTERISTICS 0x02
+#define DATA_BLOCK_TYPE_1_DETAILED_TIMING 0x03
+#define DATA_BLOCK_TYPE_2_DETAILED_TIMING 0x04
+#define DATA_BLOCK_TYPE_3_SHORT_TIMING 0x05
+#define DATA_BLOCK_TYPE_4_DMT_TIMING 0x06
+#define DATA_BLOCK_VESA_TIMING 0x07
+#define DATA_BLOCK_CEA_TIMING 0x08
+#define DATA_BLOCK_VIDEO_TIMING_RANGE 0x09
+#define DATA_BLOCK_PRODUCT_SERIAL_NUMBER 0x0a
+#define DATA_BLOCK_GP_ASCII_STRING 0x0b
+#define DATA_BLOCK_DISPLAY_DEVICE_DATA 0x0c
+#define DATA_BLOCK_INTERFACE_POWER_SEQUENCING 0x0d
+#define DATA_BLOCK_TRANSFER_CHARACTERISTICS 0x0e
+#define DATA_BLOCK_DISPLAY_INTERFACE 0x0f
+#define DATA_BLOCK_STEREO_DISPLAY_INTERFACE 0x10
+#define DATA_BLOCK_TILED_DISPLAY 0x12
+
+#define DATA_BLOCK_VENDOR_SPECIFIC 0x7f
+
+#define PRODUCT_TYPE_EXTENSION 0
+#define PRODUCT_TYPE_TEST 1
+#define PRODUCT_TYPE_PANEL 2
+#define PRODUCT_TYPE_MONITOR 3
+#define PRODUCT_TYPE_TV 4
+#define PRODUCT_TYPE_REPEATER 5
+#define PRODUCT_TYPE_DIRECT_DRIVE 6
+
+struct displayid_hdr {
+	u8 rev;
+	u8 bytes;
+	u8 prod_id;
+	u8 ext_count;
+} __packed;
+
+struct displayid_block {
+	u8 tag;
+	u8 rev;
+	u8 num_bytes;
+} __packed;
+
+struct displayid_tiled_block {
+	struct displayid_block base;
+	u8 tile_cap;
+	u8 topo[3];
+	u8 tile_size[4];
+	u8 tile_pixel_bezel[5];
+	u8 topology_id[8];
+} __packed;
+
+#endif
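Reviewer note on include/drm/drm_displayid.h: the header only declares the layouts, so the following is a minimal userspace sketch of how a DisplayID section could be walked with them. It assumes that `bytes` in the section header counts the data bytes after the 4-byte header, and that each data block is a 3-byte `displayid_block` header followed by `num_bytes` of payload. `example_find_block()` is a hypothetical helper, not part of this patch, and `__attribute__((packed))` stands in for the kernel's `__packed`.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DATA_BLOCK_TILED_DISPLAY 0x12

/* Userspace copies of the structures added by the patch. */
struct displayid_hdr {
	uint8_t rev;
	uint8_t bytes;
	uint8_t prod_id;
	uint8_t ext_count;
} __attribute__((packed));

struct displayid_block {
	uint8_t tag;
	uint8_t rev;
	uint8_t num_bytes;
} __attribute__((packed));

struct displayid_tiled_block {
	struct displayid_block base;
	uint8_t tile_cap;
	uint8_t topo[3];
	uint8_t tile_size[4];
	uint8_t tile_pixel_bezel[5];
	uint8_t topology_id[8];
} __attribute__((packed));

/* The packed layouts must match the on-wire sizes exactly. */
_Static_assert(sizeof(struct displayid_hdr) == 4, "section header is 4 bytes");
_Static_assert(sizeof(struct displayid_block) == 3, "block header is 3 bytes");
_Static_assert(sizeof(struct displayid_tiled_block) == 24, "tiled block is 24 bytes");

/*
 * Hypothetical helper: return the byte offset of the first data block whose
 * tag matches, or -1 if none is found or the section is malformed.
 * A zero-length block is treated here as padding that ends the walk.
 */
static long example_find_block(const uint8_t *buf, size_t len, uint8_t tag)
{
	struct displayid_hdr hdr;
	struct displayid_block blk;
	size_t pos = sizeof(hdr);
	size_t end;

	if (len < sizeof(hdr))
		return -1;
	memcpy(&hdr, buf, sizeof(hdr));

	/* Assumption: hdr.bytes is the amount of block data after the header. */
	end = pos + hdr.bytes;
	if (end > len)
		return -1;

	while (pos + sizeof(blk) <= end) {
		size_t next;

		memcpy(&blk, buf + pos, sizeof(blk));
		next = pos + sizeof(blk) + blk.num_bytes;
		if (blk.num_bytes == 0 || next > end)
			return -1;
		if (blk.tag == tag)
			return (long)pos;
		pos = next;
	}
	return -1;
}
```

A caller would pass the raw section bytes and `DATA_BLOCK_TILED_DISPLAY`, then copy `sizeof(struct displayid_tiled_block)` bytes from the returned offset once it has checked that the block is long enough.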
diff --git a/include/drm/drm_dp_helper.h b/include/drm/drm_dp_helper.h
index 9305c718d789..11f8c84f98ce 100644
--- a/include/drm/drm_dp_helper.h
+++ b/include/drm/drm_dp_helper.h
@@ -303,7 +303,8 @@
 #define DP_TEST_CRC_B_CB		    0x244
 
 #define DP_TEST_SINK_MISC		    0x246
-#define DP_TEST_CRC_SUPPORTED		    (1 << 5)
+# define DP_TEST_CRC_SUPPORTED		    (1 << 5)
+# define DP_TEST_COUNT_MASK		    0x7
 
 #define DP_TEST_RESPONSE		    0x260
 # define DP_TEST_ACK			    (1 << 0)
@@ -313,7 +314,7 @@
 #define DP_TEST_EDID_CHECKSUM		    0x261
 
 #define DP_TEST_SINK			    0x270
-#define DP_TEST_SINK_START	    (1 << 0)
+# define DP_TEST_SINK_START		    (1 << 0)
 
 #define DP_PAYLOAD_TABLE_UPDATE_STATUS      0x2c0   /* 1.2 MST */
 # define DP_PAYLOAD_TABLE_UPDATED           (1 << 0)
@@ -404,26 +405,6 @@
 #define MODE_I2C_READ	4
 #define MODE_I2C_STOP	8
 
-/**
- * struct i2c_algo_dp_aux_data - driver interface structure for i2c over dp
- * 				 aux algorithm
- * @running: set by the algo indicating whether an i2c is ongoing or whether
- * 	     the i2c bus is quiescent
- * @address: i2c target address for the currently ongoing transfer
- * @aux_ch: driver callback to transfer a single byte of the i2c payload
- */
-struct i2c_algo_dp_aux_data {
-	bool running;
-	u16 address;
-	int (*aux_ch) (struct i2c_adapter *adapter,
-		       int mode, uint8_t write_byte,
-		       uint8_t *read_byte);
-};
-
-int
-i2c_dp_aux_add_bus(struct i2c_adapter *adapter);
-
-
 #define DP_LINK_STATUS_SIZE	   6
 bool drm_dp_channel_eq_ok(const u8 link_status[DP_LINK_STATUS_SIZE],
 			  int lane_count);
@@ -550,6 +531,7 @@ struct drm_dp_aux {
 	struct mutex hw_mutex;
 	ssize_t (*transfer)(struct drm_dp_aux *aux,
 			    struct drm_dp_aux_msg *msg);
+	unsigned i2c_nack_count, i2c_defer_count;
 };
 
 ssize_t drm_dp_dpcd_read(struct drm_dp_aux *aux, unsigned int offset,
diff --git a/include/drm/drm_dp_mst_helper.h b/include/drm/drm_dp_mst_helper.h
index 338fc1053835..00c1da927245 100644
--- a/include/drm/drm_dp_mst_helper.h
+++ b/include/drm/drm_dp_mst_helper.h
@@ -28,7 +28,7 @@
 struct drm_dp_mst_branch;
 
 /**
- * struct drm_dp_vcpi - Virtual Channel Payload Identifer
+ * struct drm_dp_vcpi - Virtual Channel Payload Identifier
  * @vcpi: Virtual channel ID.
  * @pbn: Payload Bandwidth Number for this channel
  * @aligned_pbn: PBN aligned with slot size
@@ -92,6 +92,8 @@ struct drm_dp_mst_port {
 	struct drm_dp_vcpi vcpi;
 	struct drm_connector *connector;
 	struct drm_dp_mst_topology_mgr *mgr;
+
+	struct edid *cached_edid; /* for DP logical ports - make tiling work */
 };
 
 /**
@@ -371,7 +373,7 @@ struct drm_dp_sideband_msg_tx {
 struct drm_dp_mst_topology_mgr;
 struct drm_dp_mst_topology_cbs {
 	/* create a connector for a port */
-	struct drm_connector *(*add_connector)(struct drm_dp_mst_topology_mgr *mgr, struct drm_dp_mst_port *port, char *path);
+	struct drm_connector *(*add_connector)(struct drm_dp_mst_topology_mgr *mgr, struct drm_dp_mst_port *port, const char *path);
 	void (*destroy_connector)(struct drm_dp_mst_topology_mgr *mgr,
 				  struct drm_connector *connector);
 	void (*hotplug)(struct drm_dp_mst_topology_mgr *mgr);
@@ -474,7 +476,7 @@ int drm_dp_mst_topology_mgr_set_mst(struct drm_dp_mst_topology_mgr *mgr, bool ms
 int drm_dp_mst_hpd_irq(struct drm_dp_mst_topology_mgr *mgr, u8 *esi, bool *handled);
 
 
-enum drm_connector_status drm_dp_mst_detect_port(struct drm_dp_mst_topology_mgr *mgr, struct drm_dp_mst_port *port);
+enum drm_connector_status drm_dp_mst_detect_port(struct drm_connector *connector, struct drm_dp_mst_topology_mgr *mgr, struct drm_dp_mst_port *port);
 
 struct edid *drm_dp_mst_get_edid(struct drm_connector *connector, struct drm_dp_mst_topology_mgr *mgr, struct drm_dp_mst_port *port);
 
diff --git a/include/drm/drm_edid.h b/include/drm/drm_edid.h
index b96031d947a0..87d85e81d3a7 100644
--- a/include/drm/drm_edid.h
+++ b/include/drm/drm_edid.h
@@ -27,12 +27,14 @@
 
 #define EDID_LENGTH 128
 #define DDC_ADDR 0x50
+#define DDC_ADDR2 0x52 /* E-DDC 1.2 - where DisplayID can hide */
 
 #define CEA_EXT	    0x02
 #define VTB_EXT	    0x10
 #define DI_EXT	    0x40
 #define LS_EXT	    0x50
 #define MI_EXT	    0x60
+#define DISPLAYID_EXT 0x70
 
 struct est_timings {
 	u8 t1;
@@ -207,6 +209,61 @@ struct detailed_timing {
 #define DRM_EDID_HDMI_DC_30               (1 << 4)
 #define DRM_EDID_HDMI_DC_Y444             (1 << 3)
 
+/* ELD Header Block */
+#define DRM_ELD_HEADER_BLOCK_SIZE	4
+
+#define DRM_ELD_VER			0
+# define DRM_ELD_VER_SHIFT		3
+# define DRM_ELD_VER_MASK		(0x1f << 3)
+
+#define DRM_ELD_BASELINE_ELD_LEN	2	/* in dwords! */
+
+/* ELD Baseline Block for ELD_Ver == 2 */
+#define DRM_ELD_CEA_EDID_VER_MNL	4
+# define DRM_ELD_CEA_EDID_VER_SHIFT	5
+# define DRM_ELD_CEA_EDID_VER_MASK	(7 << 5)
+# define DRM_ELD_CEA_EDID_VER_NONE	(0 << 5)
+# define DRM_ELD_CEA_EDID_VER_CEA861	(1 << 5)
+# define DRM_ELD_CEA_EDID_VER_CEA861A	(2 << 5)
+# define DRM_ELD_CEA_EDID_VER_CEA861BCD	(3 << 5)
+# define DRM_ELD_MNL_SHIFT		0
+# define DRM_ELD_MNL_MASK		(0x1f << 0)
+
+#define DRM_ELD_SAD_COUNT_CONN_TYPE	5
+# define DRM_ELD_SAD_COUNT_SHIFT	4
+# define DRM_ELD_SAD_COUNT_MASK		(0xf << 4)
+# define DRM_ELD_CONN_TYPE_SHIFT	2
+# define DRM_ELD_CONN_TYPE_MASK		(3 << 2)
+# define DRM_ELD_CONN_TYPE_HDMI		(0 << 2)
+# define DRM_ELD_CONN_TYPE_DP		(1 << 2)
+# define DRM_ELD_SUPPORTS_AI		(1 << 1)
+# define DRM_ELD_SUPPORTS_HDCP		(1 << 0)
+
+#define DRM_ELD_AUD_SYNCH_DELAY		6	/* in units of 2 ms */
+# define DRM_ELD_AUD_SYNCH_DELAY_MAX	0xfa	/* 500 ms */
+
+#define DRM_ELD_SPEAKER			7
+# define DRM_ELD_SPEAKER_RLRC		(1 << 6)
+# define DRM_ELD_SPEAKER_FLRC		(1 << 5)
+# define DRM_ELD_SPEAKER_RC		(1 << 4)
+# define DRM_ELD_SPEAKER_RLR		(1 << 3)
+# define DRM_ELD_SPEAKER_FC		(1 << 2)
+# define DRM_ELD_SPEAKER_LFE		(1 << 1)
+# define DRM_ELD_SPEAKER_FLR		(1 << 0)
+
+#define DRM_ELD_PORT_ID			8	/* offsets 8..15 inclusive */
+# define DRM_ELD_PORT_ID_LEN		8
+
+#define DRM_ELD_MANUFACTURER_NAME0	16
+#define DRM_ELD_MANUFACTURER_NAME1	17
+
+#define DRM_ELD_PRODUCT_CODE0		18
+#define DRM_ELD_PRODUCT_CODE1		19
+
+#define DRM_ELD_MONITOR_NAME_STRING	20	/* offsets 20..(20+mnl-1) inclusive */
+
+#define DRM_ELD_CEA_SAD(mnl, sad)	(20 + (mnl) + 3 * (sad))
+
 struct edid {
 	u8 header[8];
 	/* Vendor & product info */
@@ -279,4 +336,56 @@ int
 drm_hdmi_vendor_infoframe_from_display_mode(struct hdmi_vendor_infoframe *frame,
 					    const struct drm_display_mode *mode);
 
+/**
+ * drm_eld_mnl - Get ELD monitor name length in bytes.
+ * @eld: pointer to an eld memory structure with mnl set
+ */
+static inline int drm_eld_mnl(const uint8_t *eld)
+{
+	return (eld[DRM_ELD_CEA_EDID_VER_MNL] & DRM_ELD_MNL_MASK) >> DRM_ELD_MNL_SHIFT;
+}
+
+/**
+ * drm_eld_sad_count - Get ELD SAD count.
+ * @eld: pointer to an eld memory structure with sad_count set
+ */
+static inline int drm_eld_sad_count(const uint8_t *eld)
+{
+	return (eld[DRM_ELD_SAD_COUNT_CONN_TYPE] & DRM_ELD_SAD_COUNT_MASK) >>
+		DRM_ELD_SAD_COUNT_SHIFT;
+}
+
+/**
+ * drm_eld_calc_baseline_block_size - Calculate baseline block size in bytes
+ * @eld: pointer to an eld memory structure with mnl and sad_count set
+ *
+ * This is a helper for determining the payload size of the baseline block, in
+ * bytes, for e.g. setting the Baseline_ELD_Len field in the ELD header block.
+ */
+static inline int drm_eld_calc_baseline_block_size(const uint8_t *eld)
+{
+	return DRM_ELD_MONITOR_NAME_STRING - DRM_ELD_HEADER_BLOCK_SIZE +
+		drm_eld_mnl(eld) + drm_eld_sad_count(eld) * 3;
+}
+
+/**
+ * drm_eld_size - Get ELD size in bytes
+ * @eld: pointer to a complete eld memory structure
+ *
+ * The returned value does not include the vendor block. It's vendor specific,
+ * and comprises the remaining bytes in the ELD memory buffer after
+ * drm_eld_size() bytes of header and baseline block.
+ *
+ * The returned value is guaranteed to be a multiple of 4.
+ */
+static inline int drm_eld_size(const uint8_t *eld)
+{
+	return DRM_ELD_HEADER_BLOCK_SIZE + eld[DRM_ELD_BASELINE_ELD_LEN] * 4;
+}
+
+struct edid *drm_do_get_edid(struct drm_connector *connector,
+	int (*get_edid_block)(void *data, u8 *buf, unsigned int block,
+			      size_t len),
+	void *data);
+
 #endif /* __DRM_EDID_H__ */
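Reviewer note on the ELD helpers in include/drm/drm_edid.h: the sketch below copies the four inline helpers and the constants they use into a userspace file to show how the byte count from drm_eld_calc_baseline_block_size() relates to the dword count stored in Baseline_ELD_Len. `example_fill_eld()` is a hypothetical helper, not part of this patch, and the 128-byte buffer size and the 16-character name limit are assumptions made for the example.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Constants copied from the drm_edid.h hunk above. */
#define DRM_ELD_HEADER_BLOCK_SIZE	4
#define DRM_ELD_BASELINE_ELD_LEN	2	/* in dwords */
#define DRM_ELD_CEA_EDID_VER_MNL	4
#define DRM_ELD_MNL_SHIFT		0
#define DRM_ELD_MNL_MASK		(0x1f << 0)
#define DRM_ELD_SAD_COUNT_CONN_TYPE	5
#define DRM_ELD_SAD_COUNT_SHIFT		4
#define DRM_ELD_SAD_COUNT_MASK		(0xf << 4)
#define DRM_ELD_MONITOR_NAME_STRING	20

/* Monitor name length in bytes. */
static int drm_eld_mnl(const uint8_t *eld)
{
	return (eld[DRM_ELD_CEA_EDID_VER_MNL] & DRM_ELD_MNL_MASK) >> DRM_ELD_MNL_SHIFT;
}

/* Number of 3-byte Short Audio Descriptors. */
static int drm_eld_sad_count(const uint8_t *eld)
{
	return (eld[DRM_ELD_SAD_COUNT_CONN_TYPE] & DRM_ELD_SAD_COUNT_MASK) >>
		DRM_ELD_SAD_COUNT_SHIFT;
}

/* Baseline block payload size in bytes (not rounded to dwords). */
static int drm_eld_calc_baseline_block_size(const uint8_t *eld)
{
	return DRM_ELD_MONITOR_NAME_STRING - DRM_ELD_HEADER_BLOCK_SIZE +
		drm_eld_mnl(eld) + drm_eld_sad_count(eld) * 3;
}

/* Header plus baseline block, in bytes; always a multiple of 4. */
static int drm_eld_size(const uint8_t *eld)
{
	return DRM_ELD_HEADER_BLOCK_SIZE + eld[DRM_ELD_BASELINE_ELD_LEN] * 4;
}

/*
 * Hypothetical helper (not in the patch): fill a 128-byte ELD buffer with a
 * monitor name and a SAD count, then store Baseline_ELD_Len in dwords.
 */
static void example_fill_eld(uint8_t eld[128], const char *name, int sad_count)
{
	size_t mnl = strlen(name);

	assert(mnl <= 16 && sad_count >= 0 && sad_count <= 15);

	memset(eld, 0, 128);
	eld[DRM_ELD_CEA_EDID_VER_MNL] = (uint8_t)(mnl << DRM_ELD_MNL_SHIFT);
	eld[DRM_ELD_SAD_COUNT_CONN_TYPE] =
		(uint8_t)(sad_count << DRM_ELD_SAD_COUNT_SHIFT);
	memcpy(&eld[DRM_ELD_MONITOR_NAME_STRING], name, mnl);

	/* The length field is in dwords, so round the byte count up. */
	eld[DRM_ELD_BASELINE_ELD_LEN] =
		(uint8_t)((drm_eld_calc_baseline_block_size(eld) + 3) / 4);
}
```

For a 5-character name and 2 SADs the baseline block is 16 + 5 + 6 = 27 bytes, which rounds up to 7 dwords, so drm_eld_size() reports 4 + 28 = 32 bytes.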
diff --git a/include/drm/drm_fb_helper.h b/include/drm/drm_fb_helper.h
index f4ad254e3488..b597068103aa 100644
--- a/include/drm/drm_fb_helper.h
+++ b/include/drm/drm_fb_helper.h
@@ -34,9 +34,14 @@ struct drm_fb_helper;
 
 #include <linux/kgdb.h>
 
+struct drm_fb_offset {
+	int x, y;
+};
+
 struct drm_fb_helper_crtc {
 	struct drm_mode_set mode_set;
 	struct drm_display_mode *desired_mode;
+	int x, y;
 };
 
 struct drm_fb_helper_surface_size {
@@ -72,6 +77,7 @@ struct drm_fb_helper_funcs {
 	bool (*initial_config)(struct drm_fb_helper *fb_helper,
 			       struct drm_fb_helper_crtc **crtcs,
 			       struct drm_display_mode **modes,
+			       struct drm_fb_offset *offsets,
 			       bool *enabled, int width, int height);
 };
 
diff --git a/include/drm/drm_flip_work.h b/include/drm/drm_flip_work.h
index 9eed34dcd6af..d387cf06ae05 100644
--- a/include/drm/drm_flip_work.h
+++ b/include/drm/drm_flip_work.h
@@ -25,6 +25,7 @@
 #define DRM_FLIP_WORK_H
 
 #include <linux/kfifo.h>
+#include <linux/spinlock.h>
 #include <linux/workqueue.h>
 
 /**
@@ -32,9 +33,9 @@
  *
  * Util to queue up work to run from work-queue context after flip/vblank.
  * Typically this can be used to defer unref of framebuffer's, cursor
- * bo's, etc until after vblank.  The APIs are all safe (and lockless)
- * for up to one producer and once consumer at a time.  The single-consumer
- * aspect is ensured by committing the queued work to a single work-queue.
+ * bo's, etc until after vblank.  The APIs are all thread-safe.
+ * Moreover, drm_flip_work_queue_task and drm_flip_work_queue can be called
+ * in atomic context.
  */
 
 struct drm_flip_work;
@@ -51,26 +52,40 @@ struct drm_flip_work;
 typedef void (*drm_flip_func_t)(struct drm_flip_work *work, void *val);
 
 /**
+ * struct drm_flip_task - flip work task
+ * @node: list entry element
+ * @data: data to pass to work->func
+ */
+struct drm_flip_task {
+	struct list_head node;
+	void *data;
+};
+
+/**
  * struct drm_flip_work - flip work queue
  * @name: debug name
- * @pending: number of queued but not committed items
- * @count: number of committed items
  * @func: callback fxn called for each committed item
  * @worker: worker which calls @func
- * @fifo: queue of committed items
+ * @queued: queued tasks
+ * @commited: committed tasks
+ * @lock: lock to access queued and commited lists
  */
 struct drm_flip_work {
 	const char *name;
-	atomic_t pending, count;
 	drm_flip_func_t func;
 	struct work_struct worker;
-	DECLARE_KFIFO_PTR(fifo, void *);
+	struct list_head queued;
+	struct list_head commited;
+	spinlock_t lock;
 };
 
+struct drm_flip_task *drm_flip_work_allocate_task(void *data, gfp_t flags);
+void drm_flip_work_queue_task(struct drm_flip_work *work,
+			      struct drm_flip_task *task);
 void drm_flip_work_queue(struct drm_flip_work *work, void *val);
 void drm_flip_work_commit(struct drm_flip_work *work,
 		struct workqueue_struct *wq);
-int drm_flip_work_init(struct drm_flip_work *work, int size,
+void drm_flip_work_init(struct drm_flip_work *work,
 		const char *name, drm_flip_func_t func);
 void drm_flip_work_cleanup(struct drm_flip_work *work);
 
diff --git a/include/drm/drm_gem.h b/include/drm/drm_gem.h
index 1e6ae1458f7a..780511a459c0 100644
--- a/include/drm/drm_gem.h
+++ b/include/drm/drm_gem.h
@@ -119,6 +119,13 @@ struct drm_gem_object {
 	 * simply leave it as NULL.
 	 */
 	struct dma_buf_attachment *import_attach;
+
+	/**
+	 * dumb - created as dumb buffer
+	 * Whether the GEM object was created using the dumb buffer interface;
+	 * as such, it may not be used for GPU rendering.
+	 */
+	bool dumb;
 };
 
 void drm_gem_object_release(struct drm_gem_object *obj);
diff --git a/include/drm/drm_gem_cma_helper.h b/include/drm/drm_gem_cma_helper.h
index 2ff35f3de9c5..acd6af8a8e67 100644
--- a/include/drm/drm_gem_cma_helper.h
+++ b/include/drm/drm_gem_cma_helper.h
@@ -4,6 +4,13 @@
 #include <drm/drmP.h>
 #include <drm/drm_gem.h>
 
+/**
+ * struct drm_gem_cma_object - GEM object backed by CMA memory allocations
+ * @base: base GEM object
+ * @paddr: physical address of the backing memory
+ * @sgt: scatter/gather table for imported PRIME buffers
+ * @vaddr: kernel virtual address of the backing memory
+ */
 struct drm_gem_cma_object {
 	struct drm_gem_object base;
 	dma_addr_t paddr;
@@ -19,23 +26,30 @@ to_drm_gem_cma_obj(struct drm_gem_object *gem_obj)
 	return container_of(gem_obj, struct drm_gem_cma_object, base);
 }
 
-/* free gem object. */
+/* free GEM object */
 void drm_gem_cma_free_object(struct drm_gem_object *gem_obj);
 
-/* create memory region for drm framebuffer. */
+/* create memory region for DRM framebuffer */
+int drm_gem_cma_dumb_create_internal(struct drm_file *file_priv,
+				     struct drm_device *drm,
+				     struct drm_mode_create_dumb *args);
+
+/* create memory region for DRM framebuffer */
 int drm_gem_cma_dumb_create(struct drm_file *file_priv,
-		struct drm_device *drm, struct drm_mode_create_dumb *args);
+			    struct drm_device *drm,
+			    struct drm_mode_create_dumb *args);
 
-/* map memory region for drm framebuffer to user space. */
+/* map memory region for DRM framebuffer to user space */
 int drm_gem_cma_dumb_map_offset(struct drm_file *file_priv,
-		struct drm_device *drm, uint32_t handle, uint64_t *offset);
+				struct drm_device *drm, u32 handle,
+				u64 *offset);
 
-/* set vm_flags and we can change the vm attribute to other one at here. */
+/* set vm_flags; the VM attributes can be changed here */
 int drm_gem_cma_mmap(struct file *filp, struct vm_area_struct *vma);
 
-/* allocate physical memory. */
+/* allocate physical memory */
 struct drm_gem_cma_object *drm_gem_cma_create(struct drm_device *drm,
-		unsigned int size);
+					      size_t size);
 
 extern const struct vm_operations_struct drm_gem_cma_vm_ops;
 
diff --git a/include/drm/drm_mipi_dsi.h b/include/drm/drm_mipi_dsi.h
index 8569dc5a1026..f1d8d0dbb4f1 100644
--- a/include/drm/drm_mipi_dsi.h
+++ b/include/drm/drm_mipi_dsi.h
@@ -26,6 +26,7 @@ struct mipi_dsi_device;
  * struct mipi_dsi_msg - read/write DSI buffer
  * @channel: virtual channel id
  * @type: payload data type
+ * @flags: flags controlling this message transmission
  * @tx_len: length of @tx_buf
  * @tx_buf: data to be written
  * @rx_len: length of @rx_buf
@@ -43,12 +44,44 @@ struct mipi_dsi_msg {
 	void *rx_buf;
 };
 
+bool mipi_dsi_packet_format_is_short(u8 type);
+bool mipi_dsi_packet_format_is_long(u8 type);
+
+/**
+ * struct mipi_dsi_packet - represents a MIPI DSI packet in protocol format
+ * @size: size (in bytes) of the packet
+ * @header: the four bytes that make up the header (Data ID, Word Count or
+ *     Packet Data, and ECC)
+ * @payload_length: number of bytes in the payload
+ * @payload: a pointer to a buffer containing the payload, if any
+ */
+struct mipi_dsi_packet {
+	size_t size;
+	u8 header[4];
+	size_t payload_length;
+	const u8 *payload;
+};
+
+int mipi_dsi_create_packet(struct mipi_dsi_packet *packet,
+			   const struct mipi_dsi_msg *msg);
+
 /**
  * struct mipi_dsi_host_ops - DSI bus operations
  * @attach: attach DSI device to DSI host
  * @detach: detach DSI device from DSI host
- * @transfer: send and/or receive DSI packet, return number of received bytes,
- * 	      or error
+ * @transfer: transmit a DSI packet
+ *
+ * DSI packets transmitted by .transfer() are passed in as mipi_dsi_msg
+ * structures. This structure contains information about the type of packet
+ * being transmitted as well as the transmit and receive buffers. When an
+ * error is encountered during transmission, this function will return a
+ * negative error code. On success it shall return the number of bytes
+ * transmitted for write packets or the number of bytes received for read
+ * packets.
+ *
+ * Note that typically DSI packet transmission is atomic, so the .transfer()
+ * function will seldom return anything other than the number of bytes
+ * contained in the transmit buffer on success.
  */
 struct mipi_dsi_host_ops {
 	int (*attach)(struct mipi_dsi_host *host,
@@ -56,7 +89,7 @@ struct mipi_dsi_host_ops {
 	int (*detach)(struct mipi_dsi_host *host,
 		      struct mipi_dsi_device *dsi);
 	ssize_t (*transfer)(struct mipi_dsi_host *host,
-			    struct mipi_dsi_msg *msg);
+			    const struct mipi_dsi_msg *msg);
 };
 
 /**
@@ -130,12 +163,57 @@ static inline struct mipi_dsi_device *to_mipi_dsi_device(struct device *dev)
 	return container_of(dev, struct mipi_dsi_device, dev);
 }
 
+struct mipi_dsi_device *of_find_mipi_dsi_device_by_node(struct device_node *np);
 int mipi_dsi_attach(struct mipi_dsi_device *dsi);
 int mipi_dsi_detach(struct mipi_dsi_device *dsi);
-ssize_t mipi_dsi_dcs_write(struct mipi_dsi_device *dsi, const void *data,
-			    size_t len);
+int mipi_dsi_set_maximum_return_packet_size(struct mipi_dsi_device *dsi,
+					    u16 value);
+
+ssize_t mipi_dsi_generic_write(struct mipi_dsi_device *dsi, const void *payload,
+			       size_t size);
+ssize_t mipi_dsi_generic_read(struct mipi_dsi_device *dsi, const void *params,
+			      size_t num_params, void *data, size_t size);
+
+/**
+ * enum mipi_dsi_dcs_tear_mode - Tearing Effect Output Line mode
+ * @MIPI_DSI_DCS_TEAR_MODE_VBLANK: the TE output line consists of V-Blanking
+ *    information only
+ * @MIPI_DSI_DCS_TEAR_MODE_VHBLANK: the TE output line consists of both
+ *    V-Blanking and H-Blanking information
+ */
+enum mipi_dsi_dcs_tear_mode {
+	MIPI_DSI_DCS_TEAR_MODE_VBLANK,
+	MIPI_DSI_DCS_TEAR_MODE_VHBLANK,
+};
+
+#define MIPI_DSI_DCS_POWER_MODE_DISPLAY (1 << 2)
+#define MIPI_DSI_DCS_POWER_MODE_NORMAL  (1 << 3)
+#define MIPI_DSI_DCS_POWER_MODE_SLEEP   (1 << 4)
+#define MIPI_DSI_DCS_POWER_MODE_PARTIAL (1 << 5)
+#define MIPI_DSI_DCS_POWER_MODE_IDLE    (1 << 6)
+
+ssize_t mipi_dsi_dcs_write_buffer(struct mipi_dsi_device *dsi,
+				  const void *data, size_t len);
+ssize_t mipi_dsi_dcs_write(struct mipi_dsi_device *dsi, u8 cmd,
+			   const void *data, size_t len);
 ssize_t mipi_dsi_dcs_read(struct mipi_dsi_device *dsi, u8 cmd, void *data,
 			  size_t len);
+int mipi_dsi_dcs_nop(struct mipi_dsi_device *dsi);
+int mipi_dsi_dcs_soft_reset(struct mipi_dsi_device *dsi);
+int mipi_dsi_dcs_get_power_mode(struct mipi_dsi_device *dsi, u8 *mode);
+int mipi_dsi_dcs_get_pixel_format(struct mipi_dsi_device *dsi, u8 *format);
+int mipi_dsi_dcs_enter_sleep_mode(struct mipi_dsi_device *dsi);
+int mipi_dsi_dcs_exit_sleep_mode(struct mipi_dsi_device *dsi);
+int mipi_dsi_dcs_set_display_off(struct mipi_dsi_device *dsi);
+int mipi_dsi_dcs_set_display_on(struct mipi_dsi_device *dsi);
+int mipi_dsi_dcs_set_column_address(struct mipi_dsi_device *dsi, u16 start,
+				    u16 end);
+int mipi_dsi_dcs_set_page_address(struct mipi_dsi_device *dsi, u16 start,
+				  u16 end);
+int mipi_dsi_dcs_set_tear_off(struct mipi_dsi_device *dsi);
+int mipi_dsi_dcs_set_tear_on(struct mipi_dsi_device *dsi,
+			     enum mipi_dsi_dcs_tear_mode mode);
+int mipi_dsi_dcs_set_pixel_format(struct mipi_dsi_device *dsi, u8 format);
 
 /**
  * struct mipi_dsi_driver - DSI driver
@@ -167,9 +245,13 @@ static inline void mipi_dsi_set_drvdata(struct mipi_dsi_device *dsi, void *data)
 	dev_set_drvdata(&dsi->dev, data);
 }
 
-int mipi_dsi_driver_register(struct mipi_dsi_driver *driver);
+int mipi_dsi_driver_register_full(struct mipi_dsi_driver *driver,
+				  struct module *owner);
 void mipi_dsi_driver_unregister(struct mipi_dsi_driver *driver);
 
+#define mipi_dsi_driver_register(driver) \
+	mipi_dsi_driver_register_full(driver, THIS_MODULE)
+
 #define module_mipi_dsi_driver(__mipi_dsi_driver) \
 	module_driver(__mipi_dsi_driver, mipi_dsi_driver_register, \
 			mipi_dsi_driver_unregister)
diff --git a/include/drm/drm_modeset_lock.h b/include/drm/drm_modeset_lock.h
index 75a5c45e21c7..70595ff565ba 100644
--- a/include/drm/drm_modeset_lock.h
+++ b/include/drm/drm_modeset_lock.h
@@ -33,6 +33,7 @@ struct drm_modeset_lock;
  * @ww_ctx: base acquire ctx
  * @contended: used internally for -EDEADLK handling
  * @locked: list of held locks
+ * @trylock_only: trylock mode used in atomic contexts/panic notifiers
  *
  * Each thread competing for a set of locks must use one acquire
  * ctx.  And if any lock fxn returns -EDEADLK, it must backoff and
@@ -126,11 +127,13 @@ void drm_modeset_unlock(struct drm_modeset_lock *lock);
 
 struct drm_device;
 struct drm_crtc;
+struct drm_plane;
 
 void drm_modeset_lock_all(struct drm_device *dev);
 int __drm_modeset_lock_all(struct drm_device *dev, bool trylock);
 void drm_modeset_unlock_all(struct drm_device *dev);
-void drm_modeset_lock_crtc(struct drm_crtc *crtc);
+void drm_modeset_lock_crtc(struct drm_crtc *crtc,
+			   struct drm_plane *plane);
 void drm_modeset_unlock_crtc(struct drm_crtc *crtc);
 void drm_warn_on_modeset_not_all_locked(struct drm_device *dev);
 struct drm_modeset_acquire_ctx *
diff --git a/include/drm/drm_plane_helper.h b/include/drm/drm_plane_helper.h
index 52e6870534b2..a185392cafeb 100644
--- a/include/drm/drm_plane_helper.h
+++ b/include/drm/drm_plane_helper.h
@@ -25,6 +25,7 @@
 #define DRM_PLANE_HELPER_H
 
 #include <drm/drm_rect.h>
+#include <drm/drm_crtc.h>
 
 /*
  * Drivers that don't allow primary plane scaling may pass this macro in place
@@ -42,6 +43,37 @@
  * planes.
  */
 
+extern int drm_crtc_init(struct drm_device *dev,
+			 struct drm_crtc *crtc,
+			 const struct drm_crtc_funcs *funcs);
+
+/**
+ * drm_plane_helper_funcs - helper operations for planes
+ * @prepare_fb: prepare a framebuffer for use by the plane
+ * @cleanup_fb: cleanup a framebuffer when it's no longer used by the plane
+ * @atomic_check: check that a given atomic state is valid and can be applied
+ * @atomic_update: apply an atomic state to the plane
+ *
+ * The helper operations are called by the mid-layer CRTC helper.
+ */
+struct drm_plane_helper_funcs {
+	int (*prepare_fb)(struct drm_plane *plane,
+			  struct drm_framebuffer *fb);
+	void (*cleanup_fb)(struct drm_plane *plane,
+			   struct drm_framebuffer *fb);
+
+	int (*atomic_check)(struct drm_plane *plane,
+			    struct drm_plane_state *state);
+	void (*atomic_update)(struct drm_plane *plane,
+			      struct drm_plane_state *old_state);
+};
+
+static inline void drm_plane_helper_add(struct drm_plane *plane,
+					const struct drm_plane_helper_funcs *funcs)
+{
+	plane->helper_private = (void *)funcs;
+}
+
 extern int drm_plane_helper_check_update(struct drm_plane *plane,
 					 struct drm_crtc *crtc,
 					 struct drm_framebuffer *fb,
@@ -68,4 +100,16 @@ extern struct drm_plane *drm_primary_helper_create_plane(struct drm_device *dev,
 							 int num_formats);
 
 
+int drm_plane_helper_update(struct drm_plane *plane, struct drm_crtc *crtc,
+			    struct drm_framebuffer *fb,
+			    int crtc_x, int crtc_y,
+			    unsigned int crtc_w, unsigned int crtc_h,
+			    uint32_t src_x, uint32_t src_y,
+			    uint32_t src_w, uint32_t src_h);
+int drm_plane_helper_disable(struct drm_plane *plane);
+
+/* For use by drm_crtc_helper.c */
+int drm_plane_helper_commit(struct drm_plane *plane,
+			    struct drm_plane_state *plane_state,
+			    struct drm_framebuffer *old_fb);
 #endif
diff --git a/include/drm/i915_pciids.h b/include/drm/i915_pciids.h
index a70d45647898..180ad0e6de21 100644
--- a/include/drm/i915_pciids.h
+++ b/include/drm/i915_pciids.h
@@ -259,4 +259,21 @@
 	INTEL_VGA_DEVICE(0x22b2, info), \
 	INTEL_VGA_DEVICE(0x22b3, info)
 
+#define INTEL_SKL_IDS(info) \
+	INTEL_VGA_DEVICE(0x1916, info), /* ULT GT2 */ \
+	INTEL_VGA_DEVICE(0x1906, info), /* ULT GT1 */ \
+	INTEL_VGA_DEVICE(0x1926, info), /* ULT GT3 */ \
+	INTEL_VGA_DEVICE(0x1921, info), /* ULT GT2F */ \
+	INTEL_VGA_DEVICE(0x190E, info), /* ULX GT1 */ \
+	INTEL_VGA_DEVICE(0x191E, info), /* ULX GT2 */ \
+	INTEL_VGA_DEVICE(0x1912, info), /* DT  GT2 */ \
+	INTEL_VGA_DEVICE(0x1902, info), /* DT  GT1 */ \
+	INTEL_VGA_DEVICE(0x191B, info), /* Halo GT2 */ \
+	INTEL_VGA_DEVICE(0x192B, info), /* Halo GT3 */ \
+	INTEL_VGA_DEVICE(0x190B, info), /* Halo GT1 */ \
+	INTEL_VGA_DEVICE(0x191A, info), /* SRV GT2 */ \
+	INTEL_VGA_DEVICE(0x192A, info), /* SRV GT3 */ \
+	INTEL_VGA_DEVICE(0x190A, info), /* SRV GT1 */ \
+	INTEL_VGA_DEVICE(0x191D, info)  /* WKS GT2 */
+
 #endif /* _I915_PCIIDS_H */
diff --git a/include/drm/ttm/ttm_execbuf_util.h b/include/drm/ttm/ttm_execbuf_util.h
index 460441714413..b620c317c772 100644
--- a/include/drm/ttm/ttm_execbuf_util.h
+++ b/include/drm/ttm/ttm_execbuf_util.h
@@ -68,6 +68,7 @@ extern void ttm_eu_backoff_reservation(struct ww_acquire_ctx *ticket,
  *           non-blocking reserves should be tried.
  * @list:    thread private list of ttm_validate_buffer structs.
  * @intr:    should the wait be interruptible
+ * @dups:    [out] optional list of duplicates.
  *
  * Tries to reserve bos pointed to by the list entries for validation.
  * If the function returns 0, all buffers are marked as "unfenced",
@@ -83,6 +84,11 @@ extern void ttm_eu_backoff_reservation(struct ww_acquire_ctx *ticket,
  * calling process receives a signal while waiting. In that case, no
  * buffers on the list will be reserved upon return.
  *
+ * If dups is non-NULL, all buffers already reserved by the current thread
+ * (e.g. duplicates) are added to this list, otherwise -EALREADY is returned
+ * on the first already reserved buffer and all buffers from the list are
+ * unreserved again.
+ *
  * Buffers reserved by this function should be unreserved by
  * a call to either ttm_eu_backoff_reservation() or
  * ttm_eu_fence_buffer_objects() when command submission is complete or
@@ -90,7 +96,8 @@ extern void ttm_eu_backoff_reservation(struct ww_acquire_ctx *ticket,
  */
 
 extern int ttm_eu_reserve_buffers(struct ww_acquire_ctx *ticket,
-				  struct list_head *list, bool intr);
+				  struct list_head *list, bool intr,
+				  struct list_head *dups);
 
 /**
  * function ttm_eu_fence_buffer_objects.
diff --git a/include/linux/hdmi.h b/include/linux/hdmi.h
index 11c0182a153b..cbb5790a35cd 100644
--- a/include/linux/hdmi.h
+++ b/include/linux/hdmi.h
@@ -1,9 +1,24 @@
 /*
  * Copyright (C) 2012 Avionic Design GmbH
  *
- * This program is free software; you can redistribute it and/or modify
- * it under the terms of the GNU General Public License version 2 as
- * published by the Free Software Foundation.
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sub license,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the
+ * next paragraph) shall be included in all copies or substantial portions
+ * of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NON-INFRINGEMENT. IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
+ * DEALINGS IN THE SOFTWARE.
  */
 
 #ifndef __LINUX_HDMI_H_
diff --git a/include/linux/mmu_notifier.h b/include/linux/mmu_notifier.h
index ab8564b03468..95243d28a0ee 100644
--- a/include/linux/mmu_notifier.h
+++ b/include/linux/mmu_notifier.h
@@ -98,11 +98,11 @@ struct mmu_notifier_ops {
 	/*
 	 * invalidate_range_start() and invalidate_range_end() must be
 	 * paired and are called only when the mmap_sem and/or the
-	 * locks protecting the reverse maps are held. The subsystem
-	 * must guarantee that no additional references are taken to
-	 * the pages in the range established between the call to
-	 * invalidate_range_start() and the matching call to
-	 * invalidate_range_end().
+	 * locks protecting the reverse maps are held. If the subsystem
+	 * can't guarantee that no additional references are taken to
+	 * the pages in the range, it has to implement the
+	 * invalidate_range() notifier to remove any references taken
+	 * after invalidate_range_start().
 	 *
 	 * Invalidation of multiple concurrent ranges may be
 	 * optionally permitted by the driver. Either way the
@@ -144,6 +144,29 @@ struct mmu_notifier_ops {
 	void (*invalidate_range_end)(struct mmu_notifier *mn,
 				     struct mm_struct *mm,
 				     unsigned long start, unsigned long end);
+
+	/*
+	 * invalidate_range() is either called between
+	 * invalidate_range_start() and invalidate_range_end() when the
+	 * VM has to free pages that were unmapped, but before the
+	 * pages are actually freed, or outside of _start()/_end() when
+	 * a (remote) TLB flush is necessary.
+	 *
+	 * If invalidate_range() is used to manage a non-CPU TLB with
+	 * shared page-tables, it is not necessary to implement the
+	 * invalidate_range_start()/end() notifiers, as
+	 * invalidate_range() already catches the points in time when an
+	 * external TLB range needs to be flushed.
+	 *
+	 * The invalidate_range() function is called under the ptl
+	 * spin-lock and is not allowed to sleep.
+	 *
+	 * Note that this function might be called with just a sub-range
+	 * of what was passed to invalidate_range_start()/end(), if
+	 * called between those functions.
+	 */
+	void (*invalidate_range)(struct mmu_notifier *mn, struct mm_struct *mm,
+				 unsigned long start, unsigned long end);
 };
 
 /*
@@ -190,6 +213,8 @@ extern void __mmu_notifier_invalidate_range_start(struct mm_struct *mm,
 				  unsigned long start, unsigned long end);
 extern void __mmu_notifier_invalidate_range_end(struct mm_struct *mm,
 				  unsigned long start, unsigned long end);
+extern void __mmu_notifier_invalidate_range(struct mm_struct *mm,
+				  unsigned long start, unsigned long end);
 
 static inline void mmu_notifier_release(struct mm_struct *mm)
 {
@@ -242,6 +267,13 @@ static inline void mmu_notifier_invalidate_range_end(struct mm_struct *mm,
 		__mmu_notifier_invalidate_range_end(mm, start, end);
 }
 
+static inline void mmu_notifier_invalidate_range(struct mm_struct *mm,
+				  unsigned long start, unsigned long end)
+{
+	if (mm_has_notifiers(mm))
+		__mmu_notifier_invalidate_range(mm, start, end);
+}
+
 static inline void mmu_notifier_mm_init(struct mm_struct *mm)
 {
 	mm->mmu_notifier_mm = NULL;
@@ -279,6 +311,44 @@ static inline void mmu_notifier_mm_destroy(struct mm_struct *mm)
 	__young;							\
 })
 
+#define	ptep_clear_flush_notify(__vma, __address, __ptep)		\
+({									\
+	unsigned long ___addr = __address & PAGE_MASK;			\
+	struct mm_struct *___mm = (__vma)->vm_mm;			\
+	pte_t ___pte;							\
+									\
+	___pte = ptep_clear_flush(__vma, __address, __ptep);		\
+	mmu_notifier_invalidate_range(___mm, ___addr,			\
+					___addr + PAGE_SIZE);		\
+									\
+	___pte;								\
+})
+
+#define pmdp_clear_flush_notify(__vma, __haddr, __pmd)			\
+({									\
+	unsigned long ___haddr = __haddr & HPAGE_PMD_MASK;		\
+	struct mm_struct *___mm = (__vma)->vm_mm;			\
+	pmd_t ___pmd;							\
+									\
+	___pmd = pmdp_clear_flush(__vma, __haddr, __pmd);		\
+	mmu_notifier_invalidate_range(___mm, ___haddr,			\
+				      ___haddr + HPAGE_PMD_SIZE);	\
+									\
+	___pmd;								\
+})
+
+#define pmdp_get_and_clear_notify(__mm, __haddr, __pmd)			\
+({									\
+	unsigned long ___haddr = __haddr & HPAGE_PMD_MASK;		\
+	pmd_t ___pmd;							\
+									\
+	___pmd = pmdp_get_and_clear(__mm, __haddr, __pmd);		\
+	mmu_notifier_invalidate_range(__mm, ___haddr,			\
+				      ___haddr + HPAGE_PMD_SIZE);	\
+									\
+	___pmd;								\
+})
+
 /*
  * set_pte_at_notify() sets the pte _after_ running the notifier.
  * This is safe to start by updating the secondary MMUs, because the primary MMU
@@ -342,6 +412,11 @@ static inline void mmu_notifier_invalidate_range_end(struct mm_struct *mm,
 {
 }
 
+static inline void mmu_notifier_invalidate_range(struct mm_struct *mm,
+				  unsigned long start, unsigned long end)
+{
+}
+
 static inline void mmu_notifier_mm_init(struct mm_struct *mm)
 {
 }
@@ -352,6 +427,9 @@ static inline void mmu_notifier_mm_destroy(struct mm_struct *mm)
 
 #define ptep_clear_flush_young_notify ptep_clear_flush_young
 #define pmdp_clear_flush_young_notify pmdp_clear_flush_young
+#define	ptep_clear_flush_notify ptep_clear_flush
+#define pmdp_clear_flush_notify pmdp_clear_flush
+#define pmdp_get_and_clear_notify pmdp_get_and_clear
 #define set_pte_at_notify set_pte_at
 
 #endif /* CONFIG_MMU_NOTIFIER */
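
The new `ptep_clear_flush_notify()` and `pmdp_clear_flush_notify()` macros above round the address down to the start of its page (or huge page) and report exactly one page-sized range to `mmu_notifier_invalidate_range()`. The following is a minimal userspace sketch of that range arithmetic only, not kernel code. The `model_*` names are hypothetical, and the shift values (4 KiB pages, 2 MiB PMD-sized huge pages) are assumptions that match x86-64; the real `PAGE_MASK` and `HPAGE_PMD_MASK` come from the architecture headers.

```c
#include <assert.h>

/* Assumed sizes for illustration: 4 KiB pages, 2 MiB PMD huge pages. */
#define MODEL_PAGE_SIZE       (1UL << 12)
#define MODEL_PAGE_MASK       (~(MODEL_PAGE_SIZE - 1))
#define MODEL_HPAGE_PMD_SIZE  (1UL << 21)
#define MODEL_HPAGE_PMD_MASK  (~(MODEL_HPAGE_PMD_SIZE - 1))

/* Half-open range [start, end) handed to the notifier. */
struct model_range {
	unsigned long start;
	unsigned long end;
};

/* Range that the PTE-level macro would report for 'address'. */
static struct model_range model_pte_range(unsigned long address)
{
	struct model_range r;

	r.start = address & MODEL_PAGE_MASK;	/* round down to page start */
	r.end = r.start + MODEL_PAGE_SIZE;	/* exactly one page */
	assert(r.start <= address);
	return r;
}

/* Range that the PMD-level macros would report for 'haddr'. */
static struct model_range model_pmd_range(unsigned long haddr)
{
	struct model_range r;

	r.start = haddr & MODEL_HPAGE_PMD_MASK;	/* round down to huge page */
	r.end = r.start + MODEL_HPAGE_PMD_SIZE;	/* exactly one huge page */
	assert(r.start <= haddr);
	return r;
}
```

Under these assumptions an address of `0x12345` maps to the range `0x12000`..`0x13000`, so a secondary TLB only has to drop the single page whose primary mapping was just cleared.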
diff --git a/include/linux/platform_data/rcar-du.h b/include/linux/platform_data/rcar-du.h
deleted file mode 100644
index a5f045e1d8fe..000000000000
--- a/include/linux/platform_data/rcar-du.h
+++ /dev/null
@@ -1,74 +0,0 @@
-/*
- * rcar_du.h  --  R-Car Display Unit DRM driver
- *
- * Copyright (C) 2013 Renesas Corporation
- *
- * Contact: Laurent Pinchart (laurent.pinchart@ideasonboard.com)
- *
- * This program is free software; you can redistribute it and/or modify
- * it under the terms of the GNU General Public License as published by
- * the Free Software Foundation; either version 2 of the License, or
- * (at your option) any later version.
- */
-
-#ifndef __RCAR_DU_H__
-#define __RCAR_DU_H__
-
-#include <video/videomode.h>
-
-enum rcar_du_output {
-	RCAR_DU_OUTPUT_DPAD0,
-	RCAR_DU_OUTPUT_DPAD1,
-	RCAR_DU_OUTPUT_LVDS0,
-	RCAR_DU_OUTPUT_LVDS1,
-	RCAR_DU_OUTPUT_TCON,
-	RCAR_DU_OUTPUT_MAX,
-};
-
-enum rcar_du_encoder_type {
-	RCAR_DU_ENCODER_UNUSED = 0,
-	RCAR_DU_ENCODER_NONE,
-	RCAR_DU_ENCODER_VGA,
-	RCAR_DU_ENCODER_LVDS,
-};
-
-struct rcar_du_panel_data {
-	unsigned int width_mm;		/* Panel width in mm */
-	unsigned int height_mm;		/* Panel height in mm */
-	struct videomode mode;
-};
-
-struct rcar_du_connector_lvds_data {
-	struct rcar_du_panel_data panel;
-};
-
-struct rcar_du_connector_vga_data {
-	/* TODO: Add DDC information for EDID retrieval */
-};
-
-/*
- * struct rcar_du_encoder_data - Encoder platform data
- * @type: the encoder type (RCAR_DU_ENCODER_*)
- * @output: the DU output the connector is connected to (RCAR_DU_OUTPUT_*)
- * @connector.lvds: platform data for LVDS connectors
- * @connector.vga: platform data for VGA connectors
- *
- * Encoder platform data describes an on-board encoder, its associated DU SoC
- * output, and the connector.
- */
-struct rcar_du_encoder_data {
-	enum rcar_du_encoder_type type;
-	enum rcar_du_output output;
-
-	union {
-		struct rcar_du_connector_lvds_data lvds;
-		struct rcar_du_connector_vga_data vga;
-	} connector;
-};
-
-struct rcar_du_platform_data {
-	struct rcar_du_encoder_data *encoders;
-	unsigned int num_encoders;
-};
-
-#endif /* __RCAR_DU_H__ */
diff --git a/include/trace/events/host1x.h b/include/trace/events/host1x.h
index 94db6a2c3540..63116362543c 100644
--- a/include/trace/events/host1x.h
+++ b/include/trace/events/host1x.h
@@ -29,6 +29,8 @@
 #include <linux/ktime.h>
 #include <linux/tracepoint.h>
 
+struct host1x_bo;
+
 DECLARE_EVENT_CLASS(host1x,
 	TP_PROTO(const char *name),
 	TP_ARGS(name),
@@ -79,14 +81,14 @@ TRACE_EVENT(host1x_cdma_push,
 );
 
 TRACE_EVENT(host1x_cdma_push_gather,
-	TP_PROTO(const char *name, u32 mem_id,
+	TP_PROTO(const char *name, struct host1x_bo *bo,
 			u32 words, u32 offset, void *cmdbuf),
 
-	TP_ARGS(name, mem_id, words, offset, cmdbuf),
+	TP_ARGS(name, bo, words, offset, cmdbuf),
 
 	TP_STRUCT__entry(
 		__field(const char *, name)
-		__field(u32, mem_id)
+		__field(struct host1x_bo *, bo)
 		__field(u32, words)
 		__field(u32, offset)
 		__field(bool, cmdbuf)
@@ -100,13 +102,13 @@ TRACE_EVENT(host1x_cdma_push_gather,
 		}
 		__entry->cmdbuf = cmdbuf;
 		__entry->name = name;
-		__entry->mem_id = mem_id;
+		__entry->bo = bo;
 		__entry->words = words;
 		__entry->offset = offset;
 	),
 
-	TP_printk("name=%s, mem_id=%08x, words=%u, offset=%d, contents=[%s]",
-	  __entry->name, __entry->mem_id,
+	TP_printk("name=%s, bo=%p, words=%u, offset=%d, contents=[%s]",
+	  __entry->name, __entry->bo,
 	  __entry->words, __entry->offset,
 	  __print_hex(__get_dynamic_array(cmdbuf),
 		  __entry->cmdbuf ? __entry->words * 4 : 0))
@@ -221,12 +223,13 @@ TRACE_EVENT(host1x_syncpt_load_min,
 );
 
 TRACE_EVENT(host1x_syncpt_wait_check,
-	TP_PROTO(void *mem_id, u32 offset, u32 syncpt_id, u32 thresh, u32 min),
+	TP_PROTO(struct host1x_bo *bo, u32 offset, u32 syncpt_id, u32 thresh,
+		 u32 min),
 
-	TP_ARGS(mem_id, offset, syncpt_id, thresh, min),
+	TP_ARGS(bo, offset, syncpt_id, thresh, min),
 
 	TP_STRUCT__entry(
-		__field(void *, mem_id)
+		__field(struct host1x_bo *, bo)
 		__field(u32, offset)
 		__field(u32, syncpt_id)
 		__field(u32, thresh)
@@ -234,15 +237,15 @@ TRACE_EVENT(host1x_syncpt_wait_check,
 	),
 
 	TP_fast_assign(
-		__entry->mem_id = mem_id;
+		__entry->bo = bo;
 		__entry->offset = offset;
 		__entry->syncpt_id = syncpt_id;
 		__entry->thresh = thresh;
 		__entry->min = min;
 	),
 
-	TP_printk("mem_id=%p, offset=%05x, id=%d, thresh=%d, current=%d",
-		__entry->mem_id, __entry->offset,
+	TP_printk("bo=%p, offset=%05x, id=%d, thresh=%d, current=%d",
+		__entry->bo, __entry->offset,
 		__entry->syncpt_id, __entry->thresh,
 		__entry->min)
 );
diff --git a/include/uapi/drm/drm_mode.h b/include/uapi/drm/drm_mode.h
index a0db2d4aa5f0..86574b0005ff 100644
--- a/include/uapi/drm/drm_mode.h
+++ b/include/uapi/drm/drm_mode.h
@@ -286,6 +286,8 @@ struct drm_mode_get_property {
 	char name[DRM_PROP_NAME_LEN];
 
 	__u32 count_values;
+	/* This is only used to count enum values, not blobs. The _blobs suffix
+	 * is kept purely for historical reasons, i.e. backwards compat. */
 	__u32 count_enum_blobs;
 };
 
diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
index ff57f07c3249..250262265ee3 100644
--- a/include/uapi/drm/i915_drm.h
+++ b/include/uapi/drm/i915_drm.h
@@ -340,6 +340,7 @@ typedef struct drm_i915_irq_wait {
 #define I915_PARAM_HAS_EXEC_HANDLE_LUT   26
 #define I915_PARAM_HAS_WT     	 	 27
 #define I915_PARAM_CMD_PARSER_VERSION	 28
+#define I915_PARAM_HAS_COHERENT_PHYS_GTT 29
 
 typedef struct drm_i915_getparam {
 	int param;
@@ -876,6 +877,12 @@ struct drm_i915_gem_get_tiling {
 	 * mmap mapping.
 	 */
 	__u32 swizzle_mode;
+
+	/**
+	 * Returned address bit 6 swizzling required for CPU access through
+	 * mmap mapping whilst bound.
+	 */
+	__u32 phys_swizzle_mode;
 };
 
 struct drm_i915_gem_get_aperture {
diff --git a/include/uapi/linux/kfd_ioctl.h b/include/uapi/linux/kfd_ioctl.h
new file mode 100644
index 000000000000..7acef41fc209
--- /dev/null
+++ b/include/uapi/linux/kfd_ioctl.h
@@ -0,0 +1,154 @@
+/*
+ * Copyright 2014 Advanced Micro Devices, Inc.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR
+ * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
+ * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
+ * OTHER DEALINGS IN THE SOFTWARE.
+ */
+
+#ifndef KFD_IOCTL_H_INCLUDED
+#define KFD_IOCTL_H_INCLUDED
+
+#include <linux/types.h>
+#include <linux/ioctl.h>
+
+#define KFD_IOCTL_MAJOR_VERSION 1
+#define KFD_IOCTL_MINOR_VERSION 0
+
+struct kfd_ioctl_get_version_args {
+	uint32_t major_version;	/* from KFD */
+	uint32_t minor_version;	/* from KFD */
+};
+
+/* For kfd_ioctl_create_queue_args.queue_type. */
+#define KFD_IOC_QUEUE_TYPE_COMPUTE	0
+#define KFD_IOC_QUEUE_TYPE_SDMA		1
+#define KFD_IOC_QUEUE_TYPE_COMPUTE_AQL	2
+
+#define KFD_MAX_QUEUE_PERCENTAGE	100
+#define KFD_MAX_QUEUE_PRIORITY		15
+
+struct kfd_ioctl_create_queue_args {
+	uint64_t ring_base_address;	/* to KFD */
+	uint64_t write_pointer_address;	/* from KFD */
+	uint64_t read_pointer_address;	/* from KFD */
+	uint64_t doorbell_offset;	/* from KFD */
+
+	uint32_t ring_size;		/* to KFD */
+	uint32_t gpu_id;		/* to KFD */
+	uint32_t queue_type;		/* to KFD */
+	uint32_t queue_percentage;	/* to KFD */
+	uint32_t queue_priority;	/* to KFD */
+	uint32_t queue_id;		/* from KFD */
+
+	uint64_t eop_buffer_address;	/* to KFD */
+	uint64_t eop_buffer_size;	/* to KFD */
+	uint64_t ctx_save_restore_address; /* to KFD */
+	uint64_t ctx_save_restore_size;	/* to KFD */
+};
+
+struct kfd_ioctl_destroy_queue_args {
+	uint32_t queue_id;		/* to KFD */
+	uint32_t pad;
+};
+
+struct kfd_ioctl_update_queue_args {
+	uint64_t ring_base_address;	/* to KFD */
+
+	uint32_t queue_id;		/* to KFD */
+	uint32_t ring_size;		/* to KFD */
+	uint32_t queue_percentage;	/* to KFD */
+	uint32_t queue_priority;	/* to KFD */
+};
+
+/* For kfd_ioctl_set_memory_policy_args.default_policy and alternate_policy */
+#define KFD_IOC_CACHE_POLICY_COHERENT 0
+#define KFD_IOC_CACHE_POLICY_NONCOHERENT 1
+
+struct kfd_ioctl_set_memory_policy_args {
+	uint64_t alternate_aperture_base;	/* to KFD */
+	uint64_t alternate_aperture_size;	/* to KFD */
+
+	uint32_t gpu_id;			/* to KFD */
+	uint32_t default_policy;		/* to KFD */
+	uint32_t alternate_policy;		/* to KFD */
+	uint32_t pad;
+};
+
+/*
+ * All counters are monotonic. They are used for profiling of compute jobs.
+ * The profiling is done by userspace.
+ *
+ * In case of GPU reset, the counter should not be affected.
+ */
+
+struct kfd_ioctl_get_clock_counters_args {
+	uint64_t gpu_clock_counter;	/* from KFD */
+	uint64_t cpu_clock_counter;	/* from KFD */
+	uint64_t system_clock_counter;	/* from KFD */
+	uint64_t system_clock_freq;	/* from KFD */
+
+	uint32_t gpu_id;		/* to KFD */
+	uint32_t pad;
+};
+
+#define NUM_OF_SUPPORTED_GPUS 7
+
+struct kfd_process_device_apertures {
+	uint64_t lds_base;		/* from KFD */
+	uint64_t lds_limit;		/* from KFD */
+	uint64_t scratch_base;		/* from KFD */
+	uint64_t scratch_limit;		/* from KFD */
+	uint64_t gpuvm_base;		/* from KFD */
+	uint64_t gpuvm_limit;		/* from KFD */
+	uint32_t gpu_id;		/* from KFD */
+	uint32_t pad;
+};
+
+struct kfd_ioctl_get_process_apertures_args {
+	struct kfd_process_device_apertures
+			process_apertures[NUM_OF_SUPPORTED_GPUS];/* from KFD */
+
+	/* from KFD, should be in the range [1 - NUM_OF_SUPPORTED_GPUS] */
+	uint32_t num_of_nodes;
+	uint32_t pad;
+};
+
+#define KFD_IOC_MAGIC 'K'
+
+#define KFD_IOC_GET_VERSION \
+		_IOR(KFD_IOC_MAGIC, 1, struct kfd_ioctl_get_version_args)
+
+#define KFD_IOC_CREATE_QUEUE \
+		_IOWR(KFD_IOC_MAGIC, 2, struct kfd_ioctl_create_queue_args)
+
+#define KFD_IOC_DESTROY_QUEUE \
+	_IOWR(KFD_IOC_MAGIC, 3, struct kfd_ioctl_destroy_queue_args)
+
+#define KFD_IOC_SET_MEMORY_POLICY \
+	_IOW(KFD_IOC_MAGIC, 4, struct kfd_ioctl_set_memory_policy_args)
+
+#define KFD_IOC_GET_CLOCK_COUNTERS \
+	_IOWR(KFD_IOC_MAGIC, 5, struct kfd_ioctl_get_clock_counters_args)
+
+#define KFD_IOC_GET_PROCESS_APERTURES \
+	_IOR(KFD_IOC_MAGIC, 6, struct kfd_ioctl_get_process_apertures_args)
+
+#define KFD_IOC_UPDATE_QUEUE \
+	_IOW(KFD_IOC_MAGIC, 7, struct kfd_ioctl_update_queue_args)
+
+#endif
diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
index 995a95f61a19..cb346f26a22d 100644
--- a/kernel/events/uprobes.c
+++ b/kernel/events/uprobes.c
@@ -193,7 +193,7 @@ static int __replace_page(struct vm_area_struct *vma, unsigned long addr,
 	}
 
 	flush_cache_page(vma, addr, pte_pfn(*ptep));
-	ptep_clear_flush(vma, addr, ptep);
+	ptep_clear_flush_notify(vma, addr, ptep);
 	set_pte_at_notify(mm, addr, ptep, mk_pte(kpage, vma->vm_page_prot));
 
 	page_remove_rmap(page);
diff --git a/kernel/time/time.c b/kernel/time/time.c
index 65015ff2f07c..6390517e77d4 100644
--- a/kernel/time/time.c
+++ b/kernel/time/time.c
@@ -741,6 +741,7 @@ u64 nsecs_to_jiffies64(u64 n)
 	return div_u64(n * 9, (9ull * NSEC_PER_SEC + HZ / 2) / HZ);
 #endif
 }
+EXPORT_SYMBOL(nsecs_to_jiffies64);
 
 /**
  * nsecs_to_jiffies - Convert nsecs in u64 to jiffies
diff --git a/mm/fremap.c b/mm/fremap.c
index 11ef7ec40d13..2805d71cf476 100644
--- a/mm/fremap.c
+++ b/mm/fremap.c
@@ -37,7 +37,7 @@ static void zap_pte(struct mm_struct *mm, struct vm_area_struct *vma,
 
 	if (pte_present(pte)) {
 		flush_cache_page(vma, addr, pte_pfn(pte));
-		pte = ptep_clear_flush(vma, addr, ptep);
+		pte = ptep_clear_flush_notify(vma, addr, ptep);
 		page = vm_normal_page(vma, addr, pte);
 		if (page) {
 			if (pte_dirty(pte))
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 46f96c23cc27..817a875f2b8c 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -1035,7 +1035,7 @@ static int do_huge_pmd_wp_page_fallback(struct mm_struct *mm,
 		goto out_free_pages;
 	VM_BUG_ON_PAGE(!PageHead(page), page);
 
-	pmdp_clear_flush(vma, haddr, pmd);
+	pmdp_clear_flush_notify(vma, haddr, pmd);
 	/* leave pmd empty until pte is filled */
 
 	pgtable = pgtable_trans_huge_withdraw(mm, pmd);
@@ -1178,7 +1178,7 @@ alloc:
 		pmd_t entry;
 		entry = mk_huge_pmd(new_page, vma->vm_page_prot);
 		entry = maybe_pmd_mkwrite(pmd_mkdirty(entry), vma);
-		pmdp_clear_flush(vma, haddr, pmd);
+		pmdp_clear_flush_notify(vma, haddr, pmd);
 		page_add_new_anon_rmap(new_page, vma, haddr);
 		mem_cgroup_commit_charge(new_page, memcg, false);
 		lru_cache_add_active_or_unevictable(new_page, vma);
@@ -1512,7 +1512,7 @@ int change_huge_pmd(struct vm_area_struct *vma, pmd_t *pmd,
 		pmd_t entry;
 		ret = 1;
 		if (!prot_numa) {
-			entry = pmdp_get_and_clear(mm, addr, pmd);
+			entry = pmdp_get_and_clear_notify(mm, addr, pmd);
 			if (pmd_numa(entry))
 				entry = pmd_mknonnuma(entry);
 			entry = pmd_modify(entry, newprot);
@@ -1644,6 +1644,7 @@ static int __split_huge_page_splitting(struct page *page,
 		 * serialize against split_huge_page*.
 		 */
 		pmdp_splitting_flush(vma, address, pmd);
+
 		ret = 1;
 		spin_unlock(ptl);
 	}
@@ -2834,7 +2835,7 @@ static void __split_huge_zero_page_pmd(struct vm_area_struct *vma,
 	pmd_t _pmd;
 	int i;
 
-	pmdp_clear_flush(vma, haddr, pmd);
+	pmdp_clear_flush_notify(vma, haddr, pmd);
 	/* leave pmd empty until pte is filled */
 
 	pgtable = pgtable_trans_huge_withdraw(mm, pmd);
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 47f6070d7c46..85032de5e20f 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -2598,8 +2598,11 @@ int copy_hugetlb_page_range(struct mm_struct *dst, struct mm_struct *src,
 			}
 			set_huge_pte_at(dst, addr, dst_pte, entry);
 		} else {
-			if (cow)
+			if (cow) {
 				huge_ptep_set_wrprotect(src, addr, src_pte);
+				mmu_notifier_invalidate_range(src, mmun_start,
+								   mmun_end);
+			}
 			entry = huge_ptep_get(src_pte);
 			ptepage = pte_page(entry);
 			get_page(ptepage);
@@ -2901,6 +2904,7 @@ retry_avoidcopy:
 
 		/* Break COW */
 		huge_ptep_clear_flush(vma, address, ptep);
+		mmu_notifier_invalidate_range(mm, mmun_start, mmun_end);
 		set_huge_pte_at(mm, address, ptep,
 				make_huge_pte(vma, new_page, 1));
 		page_remove_rmap(old_page);
@@ -3376,6 +3380,7 @@ unsigned long hugetlb_change_protection(struct vm_area_struct *vma,
 	 * and that page table be reused and filled with junk.
 	 */
 	flush_tlb_range(vma, start, end);
+	mmu_notifier_invalidate_range(mm, start, end);
 	i_mmap_unlock_write(vma->vm_file->f_mapping);
 	mmu_notifier_invalidate_range_end(mm, start, end);
 
diff --git a/mm/ksm.c b/mm/ksm.c
index 6b2e337bc03c..d247efab5073 100644
--- a/mm/ksm.c
+++ b/mm/ksm.c
@@ -892,7 +892,7 @@ static int write_protect_page(struct vm_area_struct *vma, struct page *page,
 		 * this assure us that no O_DIRECT can happen after the check
 		 * or in the middle of the check.
 		 */
-		entry = ptep_clear_flush(vma, addr, ptep);
+		entry = ptep_clear_flush_notify(vma, addr, ptep);
 		/*
 		 * Check that no O_DIRECT or similar I/O is in progress on the
 		 * page
@@ -960,7 +960,7 @@ static int replace_page(struct vm_area_struct *vma, struct page *page,
 	page_add_anon_rmap(kpage, vma, addr);
 
 	flush_cache_page(vma, addr, pte_pfn(*ptep));
-	ptep_clear_flush(vma, addr, ptep);
+	ptep_clear_flush_notify(vma, addr, ptep);
 	set_pte_at_notify(mm, addr, ptep, mk_pte(kpage, vma->vm_page_prot));
 
 	page_remove_rmap(page);
diff --git a/mm/memory.c b/mm/memory.c
index fbf74112de5b..c3b9097251c5 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -239,6 +239,7 @@ static void tlb_flush_mmu_tlbonly(struct mmu_gather *tlb)
 		return;
 
 	tlb_flush(tlb);
+	mmu_notifier_invalidate_range(tlb->mm, tlb->start, tlb->end);
 #ifdef CONFIG_HAVE_RCU_TABLE_FREE
 	tlb_table_flush(tlb);
 #endif
@@ -2220,7 +2221,7 @@ gotten:
 		 * seen in the presence of one thread doing SMC and another
 		 * thread doing COW.
 		 */
-		ptep_clear_flush(vma, address, page_table);
+		ptep_clear_flush_notify(vma, address, page_table);
 		page_add_new_anon_rmap(new_page, vma, address);
 		mem_cgroup_commit_charge(new_page, memcg, false);
 		lru_cache_add_active_or_unevictable(new_page, vma);
diff --git a/mm/migrate.c b/mm/migrate.c
index 253474c22239..b1d02127e1be 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -1862,7 +1862,7 @@ fail_putback:
 	 */
 	flush_cache_range(vma, mmun_start, mmun_end);
 	page_add_anon_rmap(new_page, vma, mmun_start);
-	pmdp_clear_flush(vma, mmun_start, pmd);
+	pmdp_clear_flush_notify(vma, mmun_start, pmd);
 	set_pmd_at(mm, mmun_start, pmd, entry);
 	flush_tlb_range(vma, mmun_start, mmun_end);
 	update_mmu_cache_pmd(vma, address, &entry);
@@ -1870,6 +1870,7 @@ fail_putback:
 	if (page_count(page) != 2) {
 		set_pmd_at(mm, mmun_start, pmd, orig_entry);
 		flush_tlb_range(vma, mmun_start, mmun_end);
+		mmu_notifier_invalidate_range(mm, mmun_start, mmun_end);
 		update_mmu_cache_pmd(vma, address, &entry);
 		page_remove_rmap(new_page);
 		goto fail_putback;
diff --git a/mm/mmu_notifier.c b/mm/mmu_notifier.c
index 2c8da9825fe3..3b9b3d0741b2 100644
--- a/mm/mmu_notifier.c
+++ b/mm/mmu_notifier.c
@@ -193,6 +193,16 @@ void __mmu_notifier_invalidate_range_end(struct mm_struct *mm,
 
 	id = srcu_read_lock(&srcu);
 	hlist_for_each_entry_rcu(mn, &mm->mmu_notifier_mm->list, hlist) {
+		/*
+		 * Call invalidate_range here too so that a subsystem which
+		 * already implements invalidate_range does not also have to
+		 * register an invalidate_range_end call-back. Usually a
+		 * subsystem registers either invalidate_range_start()/end() or
+		 * invalidate_range(), so this will be no additional overhead
+		 * (besides the pointer check).
+		 */
+		if (mn->ops->invalidate_range)
+			mn->ops->invalidate_range(mn, mm, start, end);
 		if (mn->ops->invalidate_range_end)
 			mn->ops->invalidate_range_end(mn, mm, start, end);
 	}
@@ -200,6 +210,21 @@ void __mmu_notifier_invalidate_range_end(struct mm_struct *mm,
 }
 EXPORT_SYMBOL_GPL(__mmu_notifier_invalidate_range_end);
 
+void __mmu_notifier_invalidate_range(struct mm_struct *mm,
+				  unsigned long start, unsigned long end)
+{
+	struct mmu_notifier *mn;
+	int id;
+
+	id = srcu_read_lock(&srcu);
+	hlist_for_each_entry_rcu(mn, &mm->mmu_notifier_mm->list, hlist) {
+		if (mn->ops->invalidate_range)
+			mn->ops->invalidate_range(mn, mm, start, end);
+	}
+	srcu_read_unlock(&srcu, id);
+}
+EXPORT_SYMBOL_GPL(__mmu_notifier_invalidate_range);
+
 static int do_mmu_notifier_register(struct mmu_notifier *mn,
 				    struct mm_struct *mm,
 				    int take_mmap_sem)
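
The dispatch rule introduced in the two hunks above can be summarised as: `invalidate_range_end` processing first calls each notifier's `invalidate_range` callback (if set) and then its `invalidate_range_end` callback (if set), while the standalone path calls only `invalidate_range`. The following is a simplified userspace model of that rule, offered as a sketch only. All `model_*` names are hypothetical, a plain array stands in for the SRCU-protected hlist, and the `mm` argument is dropped.

```c
#include <assert.h>
#include <stddef.h>

struct model_notifier;

/* Either callback may be NULL, as in struct mmu_notifier_ops. */
struct model_ops {
	void (*invalidate_range)(struct model_notifier *mn,
				 unsigned long start, unsigned long end);
	void (*invalidate_range_end)(struct model_notifier *mn,
				     unsigned long start, unsigned long end);
};

struct model_notifier {
	const struct model_ops *ops;
	unsigned int range_calls;	/* invalidate_range invocations */
	unsigned int end_calls;		/* invalidate_range_end invocations */
};

/* Example callbacks that only count how often they run. */
static void model_count_range(struct model_notifier *mn,
			      unsigned long start, unsigned long end)
{
	(void)start;
	(void)end;
	mn->range_calls++;
}

static void model_count_end(struct model_notifier *mn,
			    unsigned long start, unsigned long end)
{
	(void)start;
	(void)end;
	mn->end_calls++;
}

/* Models __mmu_notifier_invalidate_range(): only invalidate_range runs. */
static void model_invalidate_range(struct model_notifier **list, size_t n,
				   unsigned long start, unsigned long end)
{
	size_t i;

	for (i = 0; i < n; i++)
		if (list[i]->ops->invalidate_range)
			list[i]->ops->invalidate_range(list[i], start, end);
}

/* Models __mmu_notifier_invalidate_range_end(): range first, then end. */
static void model_invalidate_range_end(struct model_notifier **list, size_t n,
				       unsigned long start, unsigned long end)
{
	size_t i;

	for (i = 0; i < n; i++) {
		if (list[i]->ops->invalidate_range)
			list[i]->ops->invalidate_range(list[i], start, end);
		if (list[i]->ops->invalidate_range_end)
			list[i]->ops->invalidate_range_end(list[i], start, end);
	}
}
```

In this model a notifier that implements only `invalidate_range` is still notified at the end of every start/end section, which is the property the in-code comment relies on.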
diff --git a/mm/rmap.c b/mm/rmap.c
index c52f43a69eea..45ba250babd8 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -1380,7 +1380,7 @@ static int try_to_unmap_cluster(unsigned long cursor, unsigned int *mapcount,
 
 		/* Nuke the page table entry. */
 		flush_cache_page(vma, address, pte_pfn(*pte));
-		pteval = ptep_clear_flush(vma, address, pte);
+		pteval = ptep_clear_flush_notify(vma, address, pte);
 
 		/* If nonlinear, store the file page offset in the pte. */
 		if (page->index != linear_page_index(vma, address)) {
author	Linus Torvalds <torvalds@linux-foundation.org>	2014-12-15 15:52:01 -0800
committer	Linus Torvalds <torvalds@linux-foundation.org>	2014-12-15 15:52:01 -0800
commit	988adfdffdd43cfd841df734664727993076d7cb (patch)
tree	6794f7bba8f595500c2b7d33376ad6614adcfaf2
parent	x86: mm: consolidate VM_FAULT_RETRY handling (diff)
parent	drm: sti: fix module compilation issue (diff)