aboutsummaryrefslogtreecommitdiffstats
path: root/drivers/gpu/drm/i915/i915_perf.c (follow)
AgeCommit message (Collapse)AuthorFilesLines
2017-03-28drm/i915/perf: remove user triggerable warnMatthew Auld1-2/+6
Don't throw a warning if we are given an invalid property id. While here let's also bring back Robert' original idea of catching unhandled enumeration values at compile time. Fixes: eec688e1420d ("drm/i915: Add i915 perf infrastructure") Signed-off-by: Matthew Auld <matthew.auld@intel.com> Cc: Robert Bragg <robert@sixbynine.org> Reviewed-by: Robert Bragg <robert@sixbynine.org> Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20170327203236.18276-1-matthew.auld@intel.com
2017-03-28drm/i915/perf: destroy stream on sample_flags mismatchMatthew Auld1-1/+2
If we were to ever encounter a sample_flags mismatch we need to ensure we destroy the stream when we bail. Fixes: d79651522e89 ("drm/i915: Enable i915 perf stream for Haswell OA unit") Signed-off-by: Matthew Auld <matthew.auld@intel.com> Cc: Robert Bragg <robert@sixbynine.org> Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20170327203459.18398-1-matthew.auld@intel.com
2017-03-02drm/i915: s/assert_spin_locked/lockdep_assert_held/Chris Wilson1-1/+1
assert_spin_locked() becomes an unconditionally compiled BUG_ON(), adding debug code right into the heart of critical routines like interrupt handlers. text data bss dec hex 1296480 19944 2272 1318696 141f28 before (lockdep disabled) 1295984 19944 2272 1318200 141d38 after 1336261 21139 3208 1360608 14c2e0 before (lockdep enabled) 1339920 21139 3208 1364267 14d12b after Small saving for release; hopefully more instructive in debug. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Cc: Ville Syrjälä <ville.syrjala@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20170302132801.599-1-chris@chris-wilson.co.uk Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
2016-12-18drm/i915: Simplify releasing context referenceChris Wilson1-12/+4
A few users only take the struct_mutex in order to release a reference to a context. We can expose a kref_put_mutex() wrapper in order to simplify these users, and optimise taking of the mutex to the final unref. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20161218153724.8439-4-chris@chris-wilson.co.uk
2016-12-18drm/i915: Unify active context tracking between legacy/execlists/gucChris Wilson1-11/+7
The requests conversion introduced a nasty bug where we could generate a new request in the middle of constructing a request if we needed to idle the system in order to evict space for a context. The request to idle would be executed (and waited upon) before the current one, creating a minor havoc in the seqno accounting, as we will consider the current request to already be completed (prior to deferred seqno assignment) but ring->last_retired_head would have been updated and still could allow us to overwrite the current request before execution. We also employed two different mechanisms to track the active context until it was switched out. The legacy method allowed for waiting upon an active context (it could forcibly evict any vma, including context's), but the execlists method took a step backwards by pinning the vma for the entire active lifespan of the context (the only way to evict was to idle the entire GPU, not individual contexts). However, to circumvent the tricky issue of locking (i.e. we cannot take struct_mutex at the time of i915_gem_request_submit(), where we would want to move the previous context onto the active tracker and unpin it), we take the execlists approach and keep the contexts pinned until retirement. The benefit of the execlists approach, more important for execlists than legacy, was the reduction in work in pinning the context for each request - as the context was kept pinned until idle, it could short circuit the pinning for all active contexts. We introduce new engine vfuncs to pin and unpin the context respectively. The context is pinned at the start of the request, and only unpinned when the following request is retired (this ensures that the context is idle and coherent in main memory before we unpin it). We move the engine->last_context tracking into the retirement itself (rather than during request submission) in order to allow the submission to be reordered or unwound without undue difficultly. And finally an ulterior motive for unifying context handling was to prepare for mock requests. v2: Rename to last_retired_context, split out legacy_context tracking for MI_SET_CONTEXT. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20161218153724.8439-3-chris@chris-wilson.co.uk
2016-12-09drm/i915/perf: More documentation hooked to i915.rstRobert Bragg1-29/+383
This adds a 'Perf' section to i915.rst with the following sub sections: - Overview - Comparison with Core Perf - i915 Driver Entry Points - i915 Perf Stream - i915 Perf Observation Architecture Stream - All i915 Perf Internals v2: section headers in i915.rst (Daniel Vetter) missing symbol docs + other fixups (Matthew Auld) Signed-off-by: Robert Bragg <robert@sixbynine.org> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch> Link: http://patchwork.freedesktop.org/patch/msgid/20161207214033.3581-1-robert@sixbynine.org
2016-12-07drm/i915/perf: use DRM_DEBUG for userspace issuesRobert Bragg1-21/+21
Avoid using DRM_ERROR for conditions userspace can trigger with a bad config when opening a stream or from not reading data in a timely fashion (whereby the OA buffer fills up). These conditions are tested by i-g-t which treats error messages as failures if using the test runner. This wasn't an issue while the i915-perf igt tests were being run in isolation. One message relating to seeing a spurious zeroed report was changed to use DRM_NOTE instead of DRM_ERROR. Ideally this warning shouldn't be seen, but it's not a serious problem if it is. Considering that the tail margin mechanism is only a heuristic it's possible we might see this from time to time. Signed-off-by: Robert Bragg <robert@sixbynine.org: Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch> Link: http://patchwork.freedesktop.org/patch/msgid/20161201172152.10893-1-robert@sixbynine.org
2016-12-01drm/i915: Make GEM object create and create from data take dev_privTvrtko Ursulin1-1/+1
Makes all GEM object constructors consistent. v2: Fix compilation in GVT code. Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> (v1)
2016-11-29drm/i915/perf: Wrap 64bit divides in do_div()Chris Wilson1-6/+7
Just a couple of naked 64bit divides causing link errors on 32bit builds, with: ERROR: "__udivdi3" [drivers/gpu/drm/i915/i915.ko] undefined! v2: do_div() is only u64/u32, we need a u32/u64! v3: div_u64() == u64/u32, div64_u64() == u64/u64 Reported-by: kbuild test robot <fengguang.wu@intel.com> Fixes: d79651522e89 ("drm/i915: Enable i915 perf stream for Haswell OA unit") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Robert Bragg <robert@sixbynine.org> Link: http://patchwork.freedesktop.org/patch/msgid/20161123150714.24449-1-chris@chris-wilson.co.uk Reviewed-by: Robert Bragg <robert@sixbynine.org>
2016-11-22drm/i915: Add a kerneldoc summary for i915_perf.cRobert Bragg1-0/+163
In particular this tries to capture for posterity some of the early challenges we had with using the core perf infrastructure in case we ever want to revisit adapting perf for device metrics. Cc: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Robert Bragg <robert@sixbynine.org> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Reviewed-by: Sourab Gupta <sourab.gupta@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch> Link: http://patchwork.freedesktop.org/patch/msgid/20161107194957.3385-12-robert@sixbynine.org
2016-11-22drm/i915: add dev.i915.oa_max_sample_rate sysctlRobert Bragg1-11/+50
The maximum OA sampling frequency is now configurable via a dev.i915.oa_max_sample_rate sysctl parameter. Following the precedent set by perf's similar kernel.perf_event_max_sample_rate the default maximum rate is 100000Hz Signed-off-by: Robert Bragg <robert@sixbynine.org> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Reviewed-by: Sourab Gupta <sourab.gupta@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch> Link: http://patchwork.freedesktop.org/patch/msgid/20161107194957.3385-10-robert@sixbynine.org
2016-11-22drm/i915: Add dev.i915.perf_stream_paranoid sysctl optionRobert Bragg1-1/+49
Consistent with the kernel.perf_event_paranoid sysctl option that can allow non-root users to access system wide cpu metrics, this can optionally allow non-root users to access system wide OA counter metrics from Gen graphics hardware. Signed-off-by: Robert Bragg <robert@sixbynine.org> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Reviewed-by: Sourab Gupta <sourab.gupta@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch> Link: http://patchwork.freedesktop.org/patch/msgid/20161107194957.3385-9-robert@sixbynine.org
2016-11-22drm/i915: advertise available metrics via sysfsRobert Bragg1-0/+52
Each metric set is given a sysfs entry like: /sys/class/drm/card0/metrics/<guid>/id This allows userspace to enumerate the specific sets that are available for the current system. The 'id' file contains an unsigned integer that can be used to open the associated metric set via DRM_IOCTL_I915_PERF_OPEN. The <guid> is a globally unique ID for a specific OA unit register configuration that can be reliably used by userspace as a key to lookup corresponding counter meta data and normalization equations. The guid registry is currently maintained as part of gputop along with the XML metric set descriptions and code generation scripts, ref: https://github.com/rib/gputop > gputop-data/guids.xml > scripts/update-guids.py > gputop-data/oa-*.xml > scripts/i915-perf-kernelgen.py $ make -C gputop-data -f Makefile.xml SYSFS=1 WHITELIST=RenderBasic Signed-off-by: Robert Bragg <robert@sixbynine.org> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Reviewed-by: Sourab Gupta <sourab.gupta@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch> Link: http://patchwork.freedesktop.org/patch/msgid/20161107194957.3385-8-robert@sixbynine.org
2016-11-22drm/i915: Enable i915 perf stream for Haswell OA unitRobert Bragg1-17/+1019
Gen graphics hardware can be set up to periodically write snapshots of performance counters into a circular buffer via its Observation Architecture and this patch exposes that capability to userspace via the i915 perf interface. v2: Make sure to initialize ->specific_ctx_id when opening, without relying on _pin_notify hook, in case ctx already pinned. v3: Revert back to pinning ctx upfront when opening stream, removing need to hook in to pinning and to update OACONTROL on the fly. Signed-off-by: Robert Bragg <robert@sixbynine.org> Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com> Cc: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Reviewed-by: Sourab Gupta <sourab.gupta@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch> Link: http://patchwork.freedesktop.org/patch/msgid/20161107194957.3385-7-robert@sixbynine.org
2016-11-22drm/i915: Add i915 perf infrastructureRobert Bragg1-0/+449
Adds base i915 perf infrastructure for Gen performance metrics. This adds a DRM_IOCTL_I915_PERF_OPEN ioctl that takes an array of uint64 properties to configure a stream of metrics and returns a new fd usable with standard VFS system calls including read() to read typed and sized records; ioctl() to enable or disable capture and poll() to wait for data. A stream is opened something like: uint64_t properties[] = { /* Single context sampling */ DRM_I915_PERF_PROP_CTX_HANDLE, ctx_handle, /* Include OA reports in samples */ DRM_I915_PERF_PROP_SAMPLE_OA, true, /* OA unit configuration */ DRM_I915_PERF_PROP_OA_METRICS_SET, metrics_set_id, DRM_I915_PERF_PROP_OA_FORMAT, report_format, DRM_I915_PERF_PROP_OA_EXPONENT, period_exponent, }; struct drm_i915_perf_open_param parm = { .flags = I915_PERF_FLAG_FD_CLOEXEC | I915_PERF_FLAG_FD_NONBLOCK | I915_PERF_FLAG_DISABLED, .properties_ptr = (uint64_t)properties, .num_properties = sizeof(properties) / 16, }; int fd = drmIoctl(drm_fd, DRM_IOCTL_I915_PERF_OPEN, &param); Records read all start with a common { type, size } header with DRM_I915_PERF_RECORD_SAMPLE being of most interest. Sample records contain an extensible number of fields and it's the DRM_I915_PERF_PROP_SAMPLE_xyz properties given when opening that determine what's included in every sample. No specific streams are supported yet so any attempt to open a stream will return an error. v2: use i915_gem_context_get() - Chris Wilson v3: update read() interface to avoid passing state struct - Chris Wilson fix some rebase fallout, with i915-perf init/deinit v4: s/DRM_IORW/DRM_IOW/ - Emil Velikov Signed-off-by: Robert Bragg <robert@sixbynine.org> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Reviewed-by: Sourab Gupta <sourab.gupta@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch> Link: http://patchwork.freedesktop.org/patch/msgid/20161107194957.3385-2-robert@sixbynine.org