aboutsummaryrefslogtreecommitdiffstats
path: root/drivers/gpu/drm/i915/i915_perf.c (follow)
AgeCommit message (Collapse)AuthorFilesLines
2019-11-18drm/i915/perf: don't forget noa wait after oa configLionel Landwerlin1-2/+7
I'm observing incoherence metric values, changing from run to run. It appears the patches introducing noa wait & reconfiguration from command stream switched places in the series multiple times during the review. This lead to the dependency of one onto the order to go missing... Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Fixes: 15d0ace1f876 ("drm/i915/perf: execute OA configuration from command stream") Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Link: https://patchwork.freedesktop.org/patch/msgid/20191113154639.27144-1-lionel.g.landwerlin@intel.com (cherry picked from commit 93937659dc644f708def8fa58cb63c5c9f499f26) Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
2019-11-12drm/i915/perf: always consider holding preemption a privileged opLionel Landwerlin1-10/+10
The ordering of the checks in the existing code can lead to holding preemption not being considered as privileged op. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Fixes: 9cd20ef7803c ("drm/i915/perf: allow holding preemption on filtered ctx") Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Link: https://patchwork.freedesktop.org/patch/msgid/20191111095308.2550-1-lionel.g.landwerlin@intel.com (cherry picked from commit 0b0120d4c7b013eba59b33254febc0a6e4049e13) Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
2019-10-29drm/i915/tgl: Add perf support on TGLLionel Landwerlin1-34/+317
The design of the OA unit has been split into several units. We now have a global unit (OAG) and a render specific unit (OAR). This leads to some changes on how we program things. Some details : OAR: - has its own set of counter registers, they are per-context saved/restored - counters are not written to the circular OA buffer - a snapshot of the counters can be acquired with MI_RECORD_PERF_COUNT, or a single counter can be read with MI_STORE_REGISTER_MEM. OAG: - has global counters that increment across context switches - counters are written into the circular OA buffer (if requested) v2: Fix checkpatch warnings on code style (Lucas) v3: (Umesh) - Update register from which tail, status and head are read - Update logic to sample context reports - Update whitelist mux and b counter regs v4: Fix a bug when updating context image for new contexts (Umesh) v5: Squash patch enabling save/restore of counters into context image We want this so we can preempt performance queries and keep the system responsive even when long running queries are ongoing. We avoid doing it for all contexts. - use LRI to modify context control (Chris) - use MASKED_FIELD to program just the masked bits (Chris) - disable save/restore of counters on cleanup (Chris) v6: Do not use implicit parameters (Chris) BSpec: 28727, 30021 Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com> Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com> Acked-by: Chris Wilson <chris.p.wilson@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20191025193746.47155-2-umesh.nerlige.ramappa@intel.com
2019-10-29drm/i915/perf: Add helper macros for comparing with whitelisted registersUmesh Nerlige Ramappa1-26/+28
Add helper macros for range and equality comparisons and use them to check with whitelisted registers in oa configurations. Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20191025193746.47155-1-umesh.nerlige.ramappa@intel.com
2019-10-28drm/i915/execlists: Use vfunc to check engine submission modeMichal Wajdeczko1-5/+5
While processing CSB there is no need to look at GuC submission settings, just check if engine is configured for execlists mode. While today GuC submission is disabled it's settings are still based on modparam values that might not correctly reflect actual submission status in case of any fallback. Until that is fully fixed, use alternate method to confirm that engine really runs in execlists mode by comparing set_default_submission vfunc. v2: add other immediate use of new helper Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Janusz Krzysztofik <janusz.krzysztofik@linux.intel.com> Reviewed-by: Janusz Krzysztofik <janusz.krzysztofik@linux.intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Link: https://patchwork.freedesktop.org/patch/msgid/20191028164520.31772-1-michal.wajdeczko@intel.com
2019-10-24drm/i915/gt: Split intel_ring_submissionChris Wilson1-0/+1
Split the legacy submission backend from the common CS ring buffer handling. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20191024100344.5041-1-chris@chris-wilson.co.uk
2019-10-22drm/i915: Drop assertion that ce->pin_mutex guards state updatesChris Wilson1-1/+0
The actual conditions are that we know the GPU is not accessing the context, and we hold a pin on the context image to allow CPU access. We used a fake lock on ce->pin_mutex so that we could try and use lockdep to assert that access is serialised, but the various different hardirq/softirq contexts where we need to *fake* holding the pin_mutex are causing more trouble. Still it would be nice if we did have a way to reassure ourselves that the direct update to the context image is serialised with GPU execution. In the meantime, stop lockdep complaining about false irq inversions. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111923 Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Acked-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20191022122845.25038-1-chris@chris-wilson.co.uk
2019-10-20drm/i915/perf: fix oa config reconfigurationLionel Landwerlin1-5/+6
The current logic just reapplies the same configuration already stored into stream->oa_config instead of the newly selected one. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Fixes: 7831e9a965ea ("drm/i915/perf: Allow dynamic reconfiguration of the OA stream") Cc: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Link: https://patchwork.freedesktop.org/patch/msgid/20191019214647.27866-1-lionel.g.landwerlin@intel.com
2019-10-14drm/i915/perf: allow holding preemption on filtered ctxLionel Landwerlin1-3/+31
We would like to make use of perf in Vulkan. The Vulkan API is much lower level than OpenGL, with applications directly exposed to the concept of command buffers (pretty much equivalent to our batch buffers). In Vulkan, queries are always limited in scope to a command buffer. In OpenGL, the lack of command buffer concept meant that queries' duration could span multiple command buffers. With that restriction gone in Vulkan, we would like to simplify measuring performance just by measuring the deltas between the counter snapshots written by 2 MI_RECORD_PERF_COUNT commands, rather than the more complex scheme we currently have in the GL driver, using 2 MI_RECORD_PERF_COUNT commands and doing some post processing on the stream of OA reports, coming from the global OA buffer, to remove any unrelated deltas in between the 2 MI_RECORD_PERF_COUNT. Disabling preemption only apply to a single context with which want to query performance counters for and is considered a privileged operation, by default protected by CAP_SYS_ADMIN. It is possible to enable it for a normal user by disabling the paranoid stream setting. v2: Store preemption setting in intel_context (Chris) v3: Use priorities to avoid preemption rather than the HW mechanism v4: Just modify the port priority reporting function v5: Add nopreempt flag on gem context and always flag requests appropriately, regarless of OA reconfiguration. Link: https://gitlab.freedesktop.org/mesa/mesa/merge_requests/932 Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Link: https://patchwork.freedesktop.org/patch/msgid/20191014201404.22468-4-chris@chris-wilson.co.uk
2019-10-14drm/i915/perf: Allow dynamic reconfiguration of the OA streamChris Wilson1-1/+45
Introduce a new perf_ioctl command to change the OA configuration of the active stream. This allows the OA stream to be reconfigured between batch buffers, giving greater flexibility in sampling. We inject a request into the OA context to reconfigure the stream asynchronously on the GPU in between and ordered with execbuffer calls. Original patch for dynamic reconfiguration by Lionel Landwerlin. Link: https://gitlab.freedesktop.org/mesa/mesa/merge_requests/932 Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20191014201404.22468-3-chris@chris-wilson.co.uk
2019-10-14drm/i915: add support for perf configuration queriesLionel Landwerlin1-2/+1
Listing configurations at the moment is supported only through sysfs. This might cause issues for applications wanting to list configurations from a container where sysfs isn't available. This change adds a way to query the number of configurations and their content through the i915 query uAPI. v2: Fix sparse warnings (Lionel) Add support to query configuration using uuid (Lionel) v3: Fix some inconsistency in uapi header (Lionel) Fix unlocking when not locked issue (Lionel) Add debug messages (Lionel) v4: Fix missing unlock (Dan) v5: Drop lock when copying config content to userspace (Chris) v6: Drop lock when copying config list to userspace (Chris) Fix deadlock when calling i915_perf_get_oa_config() under perf.metrics_lock (Lionel) Add i915_oa_config_get() (Chris) Link: https://gitlab.freedesktop.org/mesa/mesa/merge_requests/932 Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Link: https://patchwork.freedesktop.org/patch/msgid/20191014201404.22468-2-chris@chris-wilson.co.uk
2019-10-14drm/i915/perf: introduce a versioning of the i915-perf uapiLionel Landwerlin1-0/+10
Reporting this version will help application figure out what level of the support the running kernel provides. v2: Add i915_perf_ioctl_version() (Chris) Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Link: https://patchwork.freedesktop.org/patch/msgid/20191014201404.22468-1-chris@chris-wilson.co.uk
2019-10-13drm/i915/perf: Avoid polluting the i915_oa_config with error pointersChris Wilson1-27/+25
Use a local variable to track the allocation errors to avoid polluting the struct and keep the free simple. Reported-by: kbuild test robot <lkp@intel.com> Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20191013095211.2922-1-chris@chris-wilson.co.uk
2019-10-12drm/i915/perf: Prefer using the pinned_ctx for emitting delays on configChris Wilson1-2/+7
When we are watching a particular context, we want the OA config to be applied inline with that context such that it takes effect before the next submission. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20191012091056.28686-1-chris@chris-wilson.co.uk
2019-10-12drm/i915/perf: execute OA configuration from command streamLionel Landwerlin1-43/+156
We haven't run into issues with programming the global OA/NOA registers configuration from CPU so far, but HW engineers actually recommend doing this from the command streamer. On TGL in particular one of the clock domain in which some of that programming goes might not be powered when we poke things from the CPU. Since we have a command buffer prepared for the execbuffer side of things, we can reuse that approach here too. This also allows us to significantly reduce the amount of time we hold the main lock. v2: Drop the global lock as much as possible v3: Take global lock to pin global v4: Create i915 request in emit_oa_config() to avoid deadlocks (Lionel) v5: Move locking to the stream (Lionel) v6: Move active reconfiguration request into i915_perf_stream (Lionel) v7: Pin VMA outside request creation (Chris) Lock VMA before move to active (Chris) v8: Fix double free on stream->initial_oa_config_bo (Lionel) Don't allow interruption when waiting on active config request (Lionel) Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Link: https://patchwork.freedesktop.org/patch/msgid/20191012072308.30312-3-chris@chris-wilson.co.uk
2019-10-12drm/i915/perf: implement active wait for noa configurationsLionel Landwerlin1-0/+224
NOA configuration take some amount of time to apply. That amount of time depends on the size of the GT. There is no documented time for this. For example, past experimentations with powergating configuration changes seem to indicate a 60~70us delay. We go with 500us as default for now which should be over the required amount of time (according to HW architects). v2: Don't forget to save/restore registers used for the wait (Chris) v3: Name used CS_GPR registers (Chris) Fix compile issue due to rebase (Lionel) v4: Fix save/restore helpers (Umesh) v5: Move noa_wait from drm_i915_private to i915_perf_stream (Lionel) v6: Add missing struct declarations in i915_perf.h Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Link: https://patchwork.freedesktop.org/patch/msgid/20191012072308.30312-2-chris@chris-wilson.co.uk
2019-10-12drm/i915/perf: allow for CS OA configs to be created lazilyLionel Landwerlin1-46/+61
Here we introduce a mechanism by which the execbuf part of the i915 driver will be able to request that a batch buffer containing the programming for a particular OA config be created. We'll execute these OA configuration buffers right before executing a set of userspace commands so that a particular user batchbuffer be executed with a given OA configuration. This mechanism essentially allows the userspace driver to go through several OA configuration without having to open/close the i915/perf stream. v2: No need for locking on object OA config object creation (Chris) Flush cpu mapping of OA config (Chris) v3: Properly deal with the perf_metric lock (Chris/Lionel) v4: Fix oa config unref/put when not found (Lionel) v5: Allocate BOs for configurations on the stream instead of globally (Lionel) v6: Fix 64bit division (Chris) v7: Store allocated config BOs into the stream (Lionel) Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Link: https://patchwork.freedesktop.org/patch/msgid/20191012072308.30312-1-chris@chris-wilson.co.uk
2019-10-12drm/i915/perf: Replace global wakeref tracking with engine-pmChris Wilson1-4/+4
As we now have a specific engine to use OA on, exchange the top-level runtime-pm wakeref with the engine-pm. This still results in the same top-level runtime-pm, but with more nuances to keep the engine and its gt awake. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20191011190325.10979-1-chris@chris-wilson.co.uk
2019-10-10drm/i915/perf: Store shortcut to intel_uncoreChris Wilson1-24/+24
Now that we have the engine stored in i915_perf, we have a means of accessing intel_gt should we require it. However, we are currently only using the intel_gt to find the right intel_uncore, so replace our i915_perf.gt pointer with the more useful i915_perf.uncore. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20191010150520.26488-2-chris@chris-wilson.co.uk
2019-10-10drm/i915/perf: store the associated engine of a streamLionel Landwerlin1-4/+26
We'll use this information later to verify that a client trying to reconfigure the stream does so on the right engine. For now, we want to pull the knowledge of which engine we use into a central property. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Link: https://patchwork.freedesktop.org/patch/msgid/20191010150520.26488-1-chris@chris-wilson.co.uk
2019-10-08drm/i915/perf: drop list of streamsLionel Landwerlin1-15/+1
At some point in time there was the idea that we could have multiple stream from the same piece of HW but that never materialized and given the hard time we already have making everything work with the submission side, there is no real point having this list of 1 element around. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Link: https://patchwork.freedesktop.org/patch/msgid/20191008140111.5437-1-chris@chris-wilson.co.uk
2019-10-08drm/i915/perf: Set the exclusive stream under perf->lockChris Wilson1-11/+1
The BKL struct_mutex is no more, the only serialisation we required for setting the exclusive stream is already managed by ce->pin_mutex in gen8_configure_all_contexts(). As such, we can manipulate i915_perf.exclusive_stream underneath our own (already held) perf->lock. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com> Cc: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20191007140812.10963-2-chris@chris-wilson.co.uk Link: https://patchwork.freedesktop.org/patch/msgid/20191007210942.18145-2-chris@chris-wilson.co.uk
2019-10-08drm/i915/perf: Wean ourselves off dev_privChris Wilson1-367/+354
Use the local uncore accessors for the GT rather than using the [not-so] magic global dev_priv mmio routines. In the process, we also teach the perf stream to use backpointers to the i915_perf rather than digging it out of dev_priv. v2: Rebase onto i915_perf_types.h Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com> Cc: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> #v1 Link: https://patchwork.freedesktop.org/patch/msgid/20191007140812.10963-1-chris@chris-wilson.co.uk Link: https://patchwork.freedesktop.org/patch/msgid/20191007210942.18145-1-chris@chris-wilson.co.uk
2019-10-04drm/i915: Move context management under GEMChris Wilson1-7/+17
Keep track of the GEM contexts underneath i915->gem.contexts and assign them their own lock for the purposes of list management. v2: Focus on lock tracking; ctx->vm is protected by ctx->mutex v3: Correct split with removal of logical HW ID Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20191004134015.13204-15-chris@chris-wilson.co.uk
2019-10-04drm/i915: Remove logical HW IDChris Wilson1-14/+11
With the introduction of ctx->engines[] we allow multiple logical contexts to be used on the same engine (e.g. with virtual engines). According to bspec, aach logical context requires a unique tag in order for context-switching to occur correctly between them. [Simple experiments show that it is not so easy to trick the HW into performing a lite-restore with matching logical IDs, though my memory from early Broadwell experiments do suggest that it should be generating lite-restores.] We only need to keep a unique tag for the active lifetime of the context, and for as long as we need to identify that context. The HW uses the tag to determine if it should use a lite-restore (why not the LRCA?) and passes the tag back for various status identifies. The only status we need to track is for OA, so when using perf, we assign the specific context a unique tag. v2: Calculate required number of tags to fill ELSP. Fixes: 976b55f0e1db ("drm/i915: Allow a context to define its set of engines") Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111895 Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Acked-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20191004134015.13204-14-chris@chris-wilson.co.uk
2019-10-04drm/i915: Pull i915_vma_pin under the vm->mutexChris Wilson1-29/+3
Replace the struct_mutex requirement for pinning the i915_vma with the local vm->mutex instead. Note that the vm->mutex is tainted by the shrinker (we require unbinding from inside fs-reclaim) and so we cannot allocate while holding that mutex. Instead we have to preallocate workers to do allocate and apply the PTE updates after we have we reserved their slot in the drm_mm (using fences to order the PTE writes with the GPU work and with later unbind). In adding the asynchronous vma binding, one subtle requirement is to avoid coupling the binding fence into the backing object->resv. That is the asynchronous binding only applies to the vma timeline itself and not to the pages as that is a more global timeline (the binding of one vma does not need to be ordered with another vma, nor does the implicit GEM fencing depend on a vma, only on writes to the backing store). Keeping the vma binding distinct from the backing store timelines is verified by a number of async gem_exec_fence and gem_exec_schedule tests. The way we do this is quite simple, we keep the fence for the vma binding separate and only wait on it as required, and never add it to the obj->resv itself. Another consequence in reducing the locking around the vma is the destruction of the vma is no longer globally serialised by struct_mutex. A natural solution would be to add a kref to i915_vma, but that requires decoupling the reference cycles, possibly by introducing a new i915_mm_pages object that is own by both obj->mm and vma->pages. However, we have not taken that route due to the overshadowing lmem/ttm discussions, and instead play a series of complicated games with trylocks to (hopefully) ensure that only one destruction path is called! v2: Add some commentary, and some helpers to reduce patch churn. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20191004134015.13204-4-chris@chris-wilson.co.uk
2019-09-24drm/i915/selftests: Verify the LRC register layout between init and HWChris Wilson1-20/+15
Before we submit the first context to HW, we need to construct a valid image of the register state. This layout is defined by the HW and should match the layout generated by HW when it saves the context image. Asserting that this should be equivalent should help avoid any undefined behaviour and verify that we haven't missed anything important! Of course, having insisted that the initial register state within the LRC should match that returned by HW, we need to ensure that it does. v2: Drop the RELATIVE_MMIO flag from gen11, we ignore it for constructing the lrc image. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190924145950.3011-1-chris@chris-wilson.co.uk
2019-08-31drm/i915/perf: Assert locking for i915_init_oa_perf_state()Chris Wilson1-0/+3
We use the context->pin_mutex to serialise updates to the OA config and the registers values written into each new context. Document this relationship and assert we do hold the context->pin_mutex as used by gen8_configure_all_contexts() to serialise updates to the OA config itself. v2: Add a white-lie for when we call intel_gt_resume() from init. v3: Lie while we have the context pinned inside atomic reset. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> #v1 Link: https://patchwork.freedesktop.org/patch/msgid/20190830181929.18663-1-chris@chris-wilson.co.uk
2019-08-27drm/i915/tgl/perf: use the same oa ctx_id format as iclMichel Thierry1-1/+2
Compared to Icelake, Tigerlake's MAX_CONTEXT_HW_ID is smaller by one, but since we just use the upper 32 bits of the lrc_desc, it's guaranteed OA will use the correct one. Cc: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Signed-off-by: Michel Thierry <michel.thierry@intel.com> Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com> Reviewed-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190823082055.5992-19-lucas.demarchi@intel.com
2019-08-09drm/i915: extract i915_perf.h from i915_drv.hJani Nikula1-0/+1
It used to be handy that we only had a couple of headers, but over time i915_drv.h has become unwieldy. Extract declarations to a separate header file corresponding to the implementation module, clarifying the modularity of the driver. Ensure the new header is self-contained, and do so with minimal further includes, using forward declarations as needed. Include the new header only where needed, and sort the modified include directives while at it and as needed. No functional changes. Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Jani Nikula <jani.nikula@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/d7826e365695f691a3ac69a69ff6f2bbdb62700d.1565271681.git.jani.nikula@intel.com
2019-08-07drm/i915/perf: Refactor oa object to better manage resourcesUmesh Nerlige Ramappa1-264/+276
The oa object manages the oa buffer and must be allocated when the user intends to read performance counter snapshots. This can be achieved by making the oa object part of the stream object which is allocated when a stream is opened by the user. Attributes in the oa object that are gen-specific are moved to the perf object so that they can be initialized on driver load. The split provides a better separation of the objects used in perf implementation of i915 driver so that resources are allocated and initialized only when needed. v2: Fix checkpatch warnings v3: Addressed Lionel's review comment v4: Rebase v5: Fix rebase/merge issue with ratelimit_state_init Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Link: https://patchwork.freedesktop.org/patch/msgid/20190806233002.984-1-umesh.nerlige.ramappa@intel.com
2019-08-06drm/i915/gt: Move the [class][inst] lookup for engines onto the GTChris Wilson1-2/+1
To maintain a fast lookup from a GT centric irq handler, we want the engine lookup tables on the intel_gt. To avoid having multiple copies of the same multi-dimension lookup table, move the generic user engine lookup into an rbtree (for fast and flexible indexing). v2: Split uabi_instance cf uabi_class v3: Set uabi_class/uabi_instance after collating all engines to provide a stable uabi across parallel unordered construction. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> #v2 Link: https://patchwork.freedesktop.org/patch/msgid/20190806124300.24945-2-chris@chris-wilson.co.uk
2019-07-31drm/i915: Avoid ce->gem_context->i915Chris Wilson1-1/+1
My plan for the future is to have kernel contexts not to have a GEM context backpointer (as they will not belong to any GEM context). In a few places, we use ce->gem_context to simply obtain the i915 backpointer, for which we can use ce->engine->i915 instead. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190730163441.16477-1-chris@chris-wilson.co.uk
2019-07-29Merge drm/drm-next into drm-intel-next-queuedRodrigo Vivi1-5/+3
Catching up with 5.3-rc* Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2019-07-26drm/i915/perf: Initialise err to 0 before looping over ce->enginesChris Wilson1-26/+35
Smatch warning that the loop may be empty causing us to check err before it had been set. Ensure that it is initialised to 0, just in case. v2: Refactor the inner loop for better scooping and clarity Fixes: a9877da2d629 ("drm/i915/oa: Reconfigure contexts on the fly") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190726131458.8310-1-chris@chris-wilson.co.uk
2019-07-18proc/sysctl: add shared variables for range checkMatteo Croce1-5/+3
In the sysctl code the proc_dointvec_minmax() function is often used to validate the user supplied value between an allowed range. This function uses the extra1 and extra2 members from struct ctl_table as minimum and maximum allowed value. On sysctl handler declaration, in every source file there are some readonly variables containing just an integer which address is assigned to the extra1 and extra2 members, so the sysctl range is enforced. The special values 0, 1 and INT_MAX are very often used as range boundary, leading duplication of variables like zero=0, one=1, int_max=INT_MAX in different source files: $ git grep -E '\.extra[12].*&(zero|one|int_max)' |wc -l 248 Add a const int array containing the most commonly used values, some macros to refer more easily to the correct array member, and use them instead of creating a local one for every object file. This is the bloat-o-meter output comparing the old and new binary compiled with the default Fedora config: # scripts/bloat-o-meter -d vmlinux.o.old vmlinux.o add/remove: 2/2 grow/shrink: 0/2 up/down: 24/-188 (-164) Data old new delta sysctl_vals - 12 +12 __kstrtab_sysctl_vals - 12 +12 max 14 10 -4 int_max 16 - -16 one 68 - -68 zero 128 28 -100 Total: Before=20583249, After=20583085, chg -0.00% [mcroce@redhat.com: tipc: remove two unused variables] Link: http://lkml.kernel.org/r/20190530091952.4108-1-mcroce@redhat.com [akpm@linux-foundation.org: fix net/ipv6/sysctl_net_ipv6.c] [arnd@arndb.de: proc/sysctl: make firmware loader table conditional] Link: http://lkml.kernel.org/r/20190617130014.1713870-1-arnd@arndb.de [akpm@linux-foundation.org: fix fs/eventpoll.c] Link: http://lkml.kernel.org/r/20190430180111.10688-1-mcroce@redhat.com Signed-off-by: Matteo Croce <mcroce@redhat.com> Signed-off-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Kees Cook <keescook@chromium.org> Reviewed-by: Aaron Tomlin <atomlin@redhat.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-07-17drm/i915/oa: Reconfigure contexts on the flyChris Wilson1-58/+185
Avoid a global idle barrier by reconfiguring each context by rewriting them with MI_STORE_DWORD from the kernel context. v2: We only need to determine the desired register values once, they are the same for all contexts. v3: Don't remove the kernel context from the list of known GEM contexts; the world is not ready for that yet. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190716213443.9874-1-chris@chris-wilson.co.uk
2019-07-10drm/i915/perf: add missing delay for OA muxes configurationLionel Landwerlin1-21/+28
This was dropped from the original patch series, we weren't sure whether it was needed at the time. More recent tests show it's definitely needed to have acurate performance data. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Fixes: 19f81df2859eb1 ("drm/i915/perf: Add OA unit support for Gen 8+") Acked-by: Chris Wilson <chris@chris-wilson.co.uk> [ickle: combine duplicate code and comments] Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Link: https://patchwork.freedesktop.org/patch/msgid/20190710105524.23017-1-chris@chris-wilson.co.uk
2019-07-09drm/i915/perf: ensure we keep a reference on the driverLionel Landwerlin1-0/+8
The i915 perf stream has its own file descriptor and is tied to reference of the driver. We haven't taken care of keep the driver alive. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Suggested-by: Chris Wilson <chris@chris-wilson.co.uk> Fixes: eec688e1420da5 ("drm/i915: Add i915 perf infrastructure") Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Link: https://patchwork.freedesktop.org/patch/msgid/20190709123351.5645-2-lionel.g.landwerlin@intel.com
2019-06-26drm/i915: Move OA files to separate folderMichal Wajdeczko1-14/+14
OA files look to be auto-generated so we can keep them all in dedicated subdirectory. Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Cc: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Jani Nikula <jani.nikula@intel.com> Acked-by: Chris Wilson <chris@chris-wilson.co.uk> Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Acked-by: Jani Nikula <jani.nikula@intel.com> Reviewed-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Link: https://patchwork.freedesktop.org/patch/msgid/20190626123826.39760-1-michal.wajdeczko@intel.com
2019-06-25drm/i915/perf: fix ICL perf register offsetsLionel Landwerlin1-3/+7
We got the wrong offsets (could they have changed?). New values were computed off an error state by looking up the register offset in the context image as written by the HW. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Fixes: 1de401c08fa805 ("drm/i915/perf: enable perf support on ICL") Acked-by: Kenneth Graunke <kenneth@whitecape.org> Link: https://patchwork.freedesktop.org/patch/msgid/20190610081914.25428-1-lionel.g.landwerlin@intel.com
2019-06-14drm/i915: update rpm_get/put to use the rpm structureDaniele Ceraolo Spurio1-3/+3
The functions where internally already only using the structure, so we need to just flip the interface. v2: rebase Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Cc: Imre Deak <imre.deak@intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Acked-by: Imre Deak <imre.deak@intel.com> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Link: https://patchwork.freedesktop.org/patch/msgid/20190613232156.34940-7-daniele.ceraolospurio@intel.com
2019-06-10drm/i915/perf: fix whitelist on Gen10+Lionel Landwerlin1-0/+1
Gen10 added an additional NOA_WRITE register (high bits) and we forgot to whitelist it for userspace. Fixes: 95690a02fb5d96 ("drm/i915/perf: enable perf support on CNL") Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Link: https://patchwork.freedesktop.org/patch/msgid/20190601225845.12600-1-lionel.g.landwerlin@intel.com
2019-05-28drm/i915: Move more GEM objects under gem/Chris Wilson1-0/+2
Continuing the theme of separating out the GEM clutter. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190528092956.14910-8-chris@chris-wilson.co.uk
2019-05-28drm/i915: Move shmem object setup to its own fileChris Wilson1-1/+1
Split the plain old shmem object into its own file to start decluttering i915_gem.c v2: Lose the confusing, hysterical raisins, suffix of _gtt. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190528092956.14910-4-chris@chris-wilson.co.uk
2019-04-26drm/i915: Switch back to an array of logical per-engine HW contextsChris Wilson1-34/+46
We switched to a tree of per-engine HW context to accommodate the introduction of virtual engines. However, we plan to also support multiple instances of the same engine within the GEM context, defeating our use of the engine as a key to looking up the HW context. Just allocate a logical per-engine instance and always use an index into the ctx->engines[]. Later on, this ctx->engines[] may be replaced by a user specified map. v2: Add for_each_gem_engine() helper to iterator within the engines lock v3: intel_context_create_request() helper v4: s/unsigned long/unsigned int/ 4 billion engines is quite enough. v5: Push iterator locking to caller Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190426163336.15906-7-chris@chris-wilson.co.uk
2019-04-26drm/i915: Export intel_context_instance()Chris Wilson1-7/+14
We want to pass in a intel_context into intel_context_pin() and that requires us to first be able to lookup the intel_context! Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190426163336.15906-2-chris@chris-wilson.co.uk
2019-04-24drm/i915: Pass intel_context to i915_request_create()Chris Wilson1-1/+1
Start acquiring the logical intel_context and using that as our primary means for request allocation. This is the initial step to allow us to avoid requiring struct_mutex for request allocation along the perma-pinned kernel context, but it also provides a foundation for breaking up the complex request allocation to handle different scenarios inside execbuf. For the purpose of emitting a request from inside retirement (see the next patch for engine power management), we also need to lift control over the timeline mutex to the caller. v2: Note that the request carries the active reference upon construction. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190424200717.1686-4-chris@chris-wilson.co.uk
2019-04-24drm/i915: Move GraphicsTechnology files under gt/Chris Wilson1-1/+2
Start partitioning off the code that talks to the hardware (GT) from the uapi layers and move the device facing code under gt/ One casualty is s/intel_ringbuffer.h/intel_engine.h/ with the plan to subdivide that header and body further (and split out the submission code from the ringbuffer and logical context handling). This patch aims to be simple motion so git can fixup inflight patches with little mess. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Acked-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Acked-by: Jani Nikula <jani.nikula@intel.com> Acked-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190424174839.7141-1-chris@chris-wilson.co.uk
2019-04-24drm/i915: Store the default sseu setup on the engineChris Wilson1-1/+1
As we push for better compartmentalisation, it is more convenient to copy the default sseu configuration from the engine into the derived logical context, than it is to dig it out from i915->runtime_info. v2: Use intel_sseu_from_device_info() to describe the converter Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190424095134.30249-1-chris@chris-wilson.co.uk