aboutsummaryrefslogtreecommitdiffstats
path: root/drivers/hwtracing (follow)
AgeCommit message (Collapse)AuthorFilesLines
2022-03-11coresight: Drop unused 'none' enum value for each componentAnshuman Khandual1-3/+0
CORESIGHT_DEV_TYPE_NONE/CORESIGHT_DEV_SUBTYPE_XXXX_NONE values are not used any where. Actual enumeration can start from 0. Just drop these unused enum values. Cc: Mathieu Poirier <mathieu.poirier@linaro.org> Cc: Suzuki K Poulose <suzuki.poulose@arm.com> Cc: Mike Leach <mike.leach@linaro.org> Cc: Leo Yan <leo.yan@linaro.org> Cc: coresight@lists.linaro.org Cc: linux-arm-kernel@lists.infradead.org Cc: linux-kernel@vger.kernel.org Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com> Link: https://lore.kernel.org/r/1645005118-10561-1-git-send-email-anshuman.khandual@arm.com Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org> Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
2022-03-11coresight: etm3x: Don't trace PID for non-root PID namespaceLeo Yan1-0/+4
ETMv3 driver enables PID tracing by directly using perf config from userspace, this means the tracer will capture PID packets from root namespace but the profiling session runs in non-root PID namespace. Finally, the recorded packets can mislead perf reporting with the mismatched PID values. This patch changes to only enable PID tracing for root PID namespace. Note, the hardware supports VMID tracing from ETMv3.5, but the driver never enables VMID trace, this patch doesn't handle VMID trace (bit 30 in ETMCR register) particularly. Signed-off-by: Leo Yan <leo.yan@linaro.org> Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com> Link: https://lore.kernel.org/r/20220204152403.71775-5-leo.yan@linaro.org
2022-03-11coresight: etm4x: Don't trace PID for non-root PID namespaceLeo Yan1-2/+8
When runs in perf mode, the driver always enables the PID tracing. This can lead confusion when the profiling session runs in non-root PID namespace, whereas it records the PIDs from the root PID namespace. To avoid confusion for PID tracing, when runs in perf mode, this patch changes to only enable PID tracing for root PID namespace. As result, after apply this patch, the perf tool reports PID as '-1' for all samples: # unshare --fork --pid perf record -e cs_etm// -m 64K,64K -a \ -o perf_test.data -- uname # perf report -i perf_test.data --itrace=Zi1000i --stdio # Total Lost Samples: 0 # # Samples: 94 of event 'instructions' # Event count (approx.): 94000 # # Overhead Command Shared Object Symbol # ........ ....... ................. .............................. # 68.09% :-1 [kernel.kallsyms] [k] __sched_text_end 3.19% :-1 [kernel.kallsyms] [k] hrtimer_interrupt 2.13% :-1 [kernel.kallsyms] [k] __bitmap_and 2.13% :-1 [kernel.kallsyms] [k] trace_vbprintk 1.06% :-1 [kernel.kallsyms] [k] __fget_files 1.06% :-1 [kernel.kallsyms] [k] __schedule 1.06% :-1 [kernel.kallsyms] [k] __softirqentry_text_start 1.06% :-1 [kernel.kallsyms] [k] __update_load_avg_cfs_rq 1.06% :-1 [kernel.kallsyms] [k] __update_load_avg_se 1.06% :-1 [kernel.kallsyms] [k] arch_counter_get_cntpct 1.06% :-1 [kernel.kallsyms] [k] check_and_switch_context 1.06% :-1 [kernel.kallsyms] [k] format_decode 1.06% :-1 [kernel.kallsyms] [k] handle_percpu_devid_irq 1.06% :-1 [kernel.kallsyms] [k] irq_enter_rcu 1.06% :-1 [kernel.kallsyms] [k] irqtime_account_irq 1.06% :-1 [kernel.kallsyms] [k] ktime_get 1.06% :-1 [kernel.kallsyms] [k] ktime_get_coarse_real_ts64 1.06% :-1 [kernel.kallsyms] [k] memmove 1.06% :-1 [kernel.kallsyms] [k] perf_ioctl 1.06% :-1 [kernel.kallsyms] [k] perf_output_begin 1.06% :-1 [kernel.kallsyms] [k] perf_output_copy 1.06% :-1 [kernel.kallsyms] [k] profile_tick 1.06% :-1 [kernel.kallsyms] [k] sched_clock 1.06% :-1 [kernel.kallsyms] [k] timerqueue_add 1.06% :-1 [kernel.kallsyms] [k] trace_save_cmdline 1.06% :-1 [kernel.kallsyms] [k] update_load_avg 1.06% :-1 [kernel.kallsyms] [k] vbin_printf Signed-off-by: Leo Yan <leo.yan@linaro.org> Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com> Link: https://lore.kernel.org/r/20220204152403.71775-4-leo.yan@linaro.org
2022-03-11coresight: etm4x: Don't use virtual contextID for non-root PID namespaceLeo Yan1-0/+28
As commented in the function ctxid_pid_store(), it can cause the PID values mismatching between context ID tracing and PID allocated in a non-root namespace. For this reason, when a process runs in non-root PID namespace, the driver doesn't allow PID tracing and returns failure when access contextID related sysfs nodes. VMID works for virtual contextID when the kernel runs in EL2 mode with VHE; on the other hand, the driver doesn't prevent users from accessing it when programs run in the non-root namespace. Thus this can lead to same issues with contextID described above. This patch imposes the checking on VMID related sysfs knobs and returns failure if current process runs in non-root PID namespace. Signed-off-by: Leo Yan <leo.yan@linaro.org> Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com> Link: https://lore.kernel.org/r/20220204152403.71775-3-leo.yan@linaro.org
2022-03-11coresight: etm4x: Add lock for reading virtual context ID comparatorLeo Yan1-0/+2
Updates to the values and the index are protected via the spinlock. Ensure we use the same lock to read the value safely. Signed-off-by: Leo Yan <leo.yan@linaro.org> Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com> Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com> Link: https://lore.kernel.org/r/20220204152403.71775-2-leo.yan@linaro.org
2022-03-11coresight: trbe: Move check for kernel page table isolation from EL0 to probeSudeep Holla1-5/+6
Currently with the check present in the module initialisation, it shouts on all the systems irrespective of presence of coresight trace buffer extensions. Similar to Arm SPE perf driver, move the check for kernel page table isolation from EL0 to the device probe stage instead of the module initialisation so that it complains only on the systems that support TRBE. Cc: Mathieu Poirier <mathieu.poirier@linaro.org> Cc: Suzuki K Poulose <suzuki.poulose@arm.com> Cc: Mike Leach <mike.leach@linaro.org> Cc: Leo Yan <leo.yan@linaro.org> Cc: Anshuman Khandual <anshuman.khandual@arm.com> Cc: coresight@lists.linaro.org Signed-off-by: Sudeep Holla <sudeep.holla@arm.com> Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com> Link: https://lore.kernel.org/r/20220203190159.3145272-1-sudeep.holla@arm.com Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
2022-03-11coresight: no-op refactor to make INSTP0 check more idiomaticJames Clark1-1/+1
The spec says this: P0 tracing support field. The permitted values are: 0b00 Tracing of load and store instructions as P0 elements is not supported. 0b11 Tracing of load and store instructions as P0 elements is supported, so TRCCONFIGR.INSTP0 is supported. All other values are reserved. The value we are looking for is 0b11 so simplify this. The double read and && was a bit obfuscated. Suggested-by: Suzuki Poulose <suzuki.poulose@arm.com> Signed-off-by: James Clark <james.clark@arm.com> Link: https://lore.kernel.org/r/20220203115336.119735-2-james.clark@arm.com Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
2022-03-11hwtracing: coresight: Replace acpi_bus_get_device()Rafael J. Wysocki1-4/+4
Replace acpi_bus_get_device() that is going to be dropped with acpi_fetch_acpi_dev(). No intentional functional impact. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Link: https://lore.kernel.org/r/5790600.lOV4Wx5bFT@kreacher Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
2022-03-11coresight: syscfg: Fix memleak on registration failure in cscfg_create_deviceMiaoqian Lin1-1/+1
device_register() calls device_initialize(), according to doc of device_initialize: Use put_device() to give up your reference instead of freeing * @dev directly once you have called this function. To prevent potential memleak, use put_device() for error handling. Signed-off-by: Miaoqian Lin <linmq006@gmail.com> Fixes: 85e2414c518a ("coresight: syscfg: Initial coresight system configuration") Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20220124124121.8888-1-linmq006@gmail.com Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org> Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
2022-03-11coresight: Fix TRCCONFIGR.QE sysfs interfaceJames Clark1-2/+6
It's impossible to program a valid value for TRCCONFIGR.QE when TRCIDR0.QSUPP==0b10. In that case the following is true: Q element support is implemented, and only supports Q elements without instruction counts. TRCCONFIGR.QE can only take the values 0b00 or 0b11. Currently the low bit of QSUPP is checked to see if the low bit of QE can be written to, but as you can see when QSUPP==0b10 the low bit is cleared making it impossible to ever write the only valid value of 0b11 to QE. 0b10 would be written instead, which is a reserved QE value even for all values of QSUPP. The fix is to allow writing the low bit of QE for any non zero value of QSUPP. This change also ensures that the low bit is always set, even when the user attempts to only set the high bit. Signed-off-by: James Clark <james.clark@arm.com> Reviewed-by: Mike Leach <mike.leach@linaro.org> Fixes: d8c66962084f ("coresight-etm4x: Controls pertaining to the reset, mode, pe and events") Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20220120113047.2839622-2-james.clark@arm.com Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org> Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
2022-03-11coresight: trbe: Work around the trace data corruptionAnshuman Khandual1-0/+12
TRBE implementations affected by Arm erratum #1902691 might corrupt trace data or deadlock, when it's being written into the memory. Workaround this problem in the driver, by preventing TRBE initialization on affected cpus. The firmware must have disabled the access to TRBE for the kernel on such implementations. This will cover the kernel for any firmware that doesn't do this already. This just updates the TRBE driver as required. Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will@kernel.org> Cc: Mathieu Poirier <mathieu.poirier@linaro.org> Cc: Suzuki Poulose <suzuki.poulose@arm.com> Cc: coresight@lists.linaro.org Cc: linux-doc@vger.kernel.org Cc: linux-arm-kernel@lists.infradead.org Cc: linux-kernel@vger.kernel.org Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com> Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com> Link: https://lore.kernel.org/r/1643120437-14352-8-git-send-email-anshuman.khandual@arm.com Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org> Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
2022-03-11coresight: trbe: Work around the invalid prohibited statesAnshuman Khandual1-12/+36
TRBE implementations affected by Arm erratum #2038923 might get TRBE into an inconsistent view on whether trace is prohibited within the CPU. As a result, the trace buffer or trace buffer state might be corrupted. This happens after TRBE buffer has been enabled by setting TRBLIMITR_EL1.E, followed by just a single context synchronization event before execution changes from a context, in which trace is prohibited to one where it isn't, or vice versa. In these mentioned conditions, the view of whether trace is prohibited is inconsistent between parts of the CPU, and the trace buffer or the trace buffer state might be corrupted. Work around this problem in the TRBE driver by preventing an inconsistent view of whether the trace is prohibited or not based on TRBLIMITR_EL1.E by immediately following a change to TRBLIMITR_EL1.E with at least one ISB instruction before an ERET, or two ISB instructions if no ERET is to take place. This just updates the TRBE driver as required. Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will@kernel.org> Cc: Mathieu Poirier <mathieu.poirier@linaro.org> Cc: Suzuki Poulose <suzuki.poulose@arm.com> Cc: coresight@lists.linaro.org Cc: linux-doc@vger.kernel.org Cc: linux-arm-kernel@lists.infradead.org Cc: linux-kernel@vger.kernel.org Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com> Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com> Link: https://lore.kernel.org/r/1643120437-14352-7-git-send-email-anshuman.khandual@arm.com Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org> Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
2022-03-11coresight: trbe: Work around the ignored system register writesAnshuman Khandual2-24/+38
TRBE implementations affected by Arm erratum #2064142 might fail to write into certain system registers after the TRBE has been disabled. Under some conditions after TRBE has been disabled, writes into certain TRBE registers TRBLIMITR_EL1, TRBPTR_EL1, TRBBASER_EL1, TRBSR_EL1 and TRBTRG_EL1 will be ignored and not be effected. Work around this problem in the TRBE driver by executing TSB CSYNC and DSB just after the trace collection has stopped and before performing a system register write to one of the affected registers. This just updates the TRBE driver as required. Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will@kernel.org> Cc: Mathieu Poirier <mathieu.poirier@linaro.org> Cc: Suzuki Poulose <suzuki.poulose@arm.com> Cc: coresight@lists.linaro.org Cc: linux-doc@vger.kernel.org Cc: linux-arm-kernel@lists.infradead.org Cc: linux-kernel@vger.kernel.org Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com> Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com> Link: https://lore.kernel.org/r/1643120437-14352-6-git-send-email-anshuman.khandual@arm.com Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org> Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
2022-02-17treewide: Replace zero-length arrays with flexible-array membersGustavo A. R. Silva1-1/+1
There is a regular need in the kernel to provide a way to declare having a dynamically sized set of trailing elements in a structure. Kernel code should always use “flexible array members”[1] for these cases. The older style of one-element or zero-length arrays should no longer be used[2]. This code was transformed with the help of Coccinelle: (next-20220214$ spatch --jobs $(getconf _NPROCESSORS_ONLN) --sp-file script.cocci --include-headers --dir . > output.patch) @@ identifier S, member, array; type T1, T2; @@ struct S { ... T1 member; T2 array[ - 0 ]; }; UAPI and wireless changes were intentionally excluded from this patch and will be sent out separately. [1] https://en.wikipedia.org/wiki/Flexible_array_member [2] https://www.kernel.org/doc/html/v5.16/process/deprecated.html#zero-length-and-one-element-arrays Link: https://github.com/KSPP/linux/issues/78 Reviewed-by: Kees Cook <keescook@chromium.org> Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
2022-02-13intel_th: msu: Use memset_startat() for clearing hw headerKees Cook1-3/+1
In preparation for FORTIFY_SOURCE performing compile-time and run-time field bounds checking for memset(), avoid intentionally writing across neighboring fields. Use memset_startat() so memset() doesn't get confused about writing beyond the destination member that is intended to be the starting point of zeroing through the end of the struct. Acked-by: Alexander Shishkin <alexander.shishkin@linux.intel.com> Link: https://lore.kernel.org/lkml/87sfyzi97l.fsf@ashishki-desk.ger.corp.intel.com Signed-off-by: Kees Cook <keescook@chromium.org>
2021-12-13coresight: core: Fix typo in a commentJason Wang1-1/+1
The double `the' in the comment in line 732 is repeated. Remove one of them from the comment. Signed-off-by: Jason Wang <wangborong@cdjrlc.com> Link: https://lore.kernel.org/r/20211211090221.241529-1-wangborong@cdjrlc.com [Fixed capital letter in title] Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org>
2021-11-26coresight: configfs: Allow configfs to activate configurationMike Leach5-28/+186
Adds configfs attributes to allow a configuration to be enabled for use when sysfs is used to control CoreSight. perf retains independent enabling of configurations. Signed-off-by: Mike Leach <mike.leach@linaro.org> Link: https://lore.kernel.org/r/20211124200038.28662-6-mike.leach@linaro.org Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org>
2021-11-26coresight: syscfg: Update load API for config loadable modulesMike Leach2-1/+39
CoreSight configurations and features can be added as kernel loadable modules. This patch updates the load owner API to ensure that the module cannot be unloaded either: 1) if the config it supplies is in use 2) if the module is not the last in the load order list. Signed-off-by: Mike Leach <mike.leach@linaro.org> Link: https://lore.kernel.org/r/20211124200038.28662-4-mike.leach@linaro.org Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org>
2021-11-26coresight: configuration: Update API to permit dynamic load/unloadMike Leach5-0/+160
Expand the configuration API to allow dynamic runtime load and unload of configurations and features. On load, configurations and features are tagged with a "load owner" that is used to determine sets that were loaded together in a given API call. To unload the API uses the load owner to unload all elements previously loaded by that owner. The API also records the order in which different owners loaded their elements into the system. Later loading configurations can use previously loaded features, creating load dependencies. Therefore unload is enforced strictly in the reverse order to load. A load owner will be an additional loadable module, or a configuration loaded via configfs. Signed-off-by: Mike Leach <mike.leach@linaro.org> Link: https://lore.kernel.org/r/20211124200038.28662-3-mike.leach@linaro.org Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org>
2021-11-26coresight: configuration: Update API to introduce load owner conceptMike Leach4-8/+56
Update the existing load API to introduce a "load owner" concept. This allows the tracking of the loaded configurations and features against the loading owner type, to allow later unload according to owner. A list of loaded configurations by owner is created. The load owner infrastructure will be used in following patches to implement dynanic load and unload, alongside dependency tracking. Signed-off-by: Mike Leach <mike.leach@linaro.org> Link: https://lore.kernel.org/r/20211124200038.28662-2-mike.leach@linaro.org Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org>
2021-11-16coresight: Use devm_bitmap_zalloc when applicableChristophe JAILLET1-7/+3
'drvdata->chs.guaranteed' is a bitmap. So use 'devm_bitmap_kzalloc()' to simplify code, improve the semantic and avoid some open-coded arithmetic in allocator arguments. Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Link: https://lore.kernel.org/r/a4b8454f560b70cedf0e4d06275787f08d576ee5.1635964610.git.christophe.jaillet@wanadoo.fr Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org>
2021-10-27coresight: trbe: Work around write to out of rangeSuzuki K Poulose1-1/+62
TRBE implementations affected by Arm erratum (2253138 or 2224489), could write to the next address after the TRBLIMITR.LIMIT, instead of wrapping to the TRBBASER. This implies that the TRBE could potentially corrupt : - A page used by the rest of the kernel/user (if the LIMIT = end of perf ring buffer) - A page within the ring buffer, but outside the driver's range. [head, head + size]. This may contain some trace data, may be consumed by the userspace. We workaround this erratum by : - Making sure that there is at least an extra PAGE space left in the TRBE's range than we normally assign. This will be additional to other restrictions (e.g, the TRBE alignment for working around TRBE_WORKAROUND_OVERWRITE_IN_FILL_MODE, where there is a minimum of PAGE_SIZE. Thus we would have 2 * PAGE_SIZE) - Adjust the LIMIT to leave the last PAGE_SIZE out of the TRBE's allowed range (i.e, TRBEBASER...TRBLIMITR.LIMIT), by : TRBLIMITR.LIMIT -= PAGE_SIZE Cc: Anshuman Khandual <anshuman.khandual@arm.com> Cc: Mathieu Poirier <mathieu.poirier@linaro.org> Cc: Mike Leach <mike.leach@linaro.org> Cc: Leo Yan <leo.yan@linaro.org> Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com> Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com> Link: https://lore.kernel.org/r/20211019163153.3692640-14-suzuki.poulose@arm.com Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org>
2021-10-27coresight: trbe: Make sure we have enough spaceSuzuki K Poulose1-1/+5
The TRBE driver makes sure that there is enough space for a meaningful run, otherwise pads the given space and restarts the offset calculation once. But there is no guarantee that we may find space or hit "no space". Make sure that we repeat the step until, either : - We have the minimum space OR - There is NO space at all. Cc: Anshuman Khandual <anshuman.khandual@arm.com> Cc: Mathieu Poirier <mathieu.poirier@linaro.org> Cc: Mike Leach <mike.leach@linaro.org> Cc: Leo Yan <leo.yan@linaro.org> Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com> Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com> Link: https://lore.kernel.org/r/20211019163153.3692640-13-suzuki.poulose@arm.com Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org>
2021-10-27coresight: trbe: Add a helper to determine the minimum buffer sizeSuzuki K Poulose1-1/+6
For the TRBE to operate, we need a minimum space available to collect meaningful trace session. This is currently a few bytes, but we may need to extend this for working around errata. So, abstract this into a helper function. Cc: Anshuman Khandual <anshuman.khandual@arm.com> Cc: Mike Leach <mike.leach@linaro.org> Cc: Mathieu Poirier <mathieu.poirier@linaro.org> Cc: Leo Yan <leo.yan@linaro.org> Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com> Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com> Link: https://lore.kernel.org/r/20211019163153.3692640-12-suzuki.poulose@arm.com Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org>
2021-10-27coresight: trbe: Workaround TRBE errata overwrite in FILL modeSuzuki K Poulose1-11/+157
ARM Neoverse-N2 (#2139208) and Cortex-A710(##2119858) suffers from an erratum, which when triggered, might cause the TRBE to overwrite the trace data already collected in FILL mode, in the event of a WRAP. i.e, the TRBE doesn't stop writing the data, instead wraps to the base and could write upto 3 cache line size worth trace. Thus, this could corrupt the trace at the "BASE" pointer. The workaround is to program the write pointer 256bytes from the base, such that if the erratum is triggered, it doesn't overwrite the trace data that was captured. This skipped region could be padded with ignore packets at the end of the session, so that the decoder sees a continuous buffer with some padding at the beginning. The trace data written at the base is considered lost as the limit could have been in the middle of the perf ring buffer, and jumping to the "base" is not acceptable. We set the flags already to indicate that some amount of trace was lost during the FILL event IRQ. So this is fine. One important change with the work around is, we program the TRBBASER_EL1 to current page where we are allowed to write. Otherwise, it could overwrite a region that may be consumed by the perf. Towards this, we always make sure that the "handle->head" and thus the trbe_write is PAGE_SIZE aligned, so that we can set the BASE to the PAGE base and move the TRBPTR to the 256bytes offset. Cc: Mike Leach <mike.leach@linaro.org> Cc: Mathieu Poirier <mathieu.poirier@linaro.org> Cc: Anshuman Khandual <anshuman.khandual@arm.com> Cc: Leo Yan <leo.yan@linaro.org> Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com> Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com> Link: https://lore.kernel.org/r/20211019163153.3692640-11-suzuki.poulose@arm.com Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org>
2021-10-27coresight: trbe: Add infrastructure for Errata handlingSuzuki K Poulose1-1/+68
Add a minimal infrastructure to keep track of the errata affecting the given TRBE instance. Given that we have heterogeneous CPUs, we have to manage the list per-TRBE instance to be able to apply the work around as needed. Thus we will need to check if individual CPUs are affected by the erratum. We rely on the arm64 errata framework for the actual description and the discovery of a given erratum, to keep the Erratum work around at a central place and benefit from the code and the advertisement from the kernel. Though we could reuse the "this_cpu_has_cap()" to apply an erratum work around, it is a bit of a heavy operation, as it must go through the "erratum" detection check on the CPU every time it is called (e.g, scanning through a table of affected MIDRs). Since we need to do this check for every session, may be multiple times (depending on the wrok around), we could save the cycles by caching the affected errata per-CPU instance in the per-CPU struct trbe_cpudata. Since we are only interested in the errata affecting the TRBE driver, we only need to track a very few of them per-CPU. Thus we use a local mapping of the CPUCAP for the erratum to avoid bloating up a bitmap for trbe_cpudata. i.e, each arm64 TRBE erratum bit is assigned a "index" within the driver to track. Each trbe instance updates the list of affected erratum at probe time on the CPU. This makes sure that we can easily access the list of errata on a given TRBE instance without much overhead. Cc: Mathieu Poirier <mathieu.poirier@linaro.org> Cc: Mike Leach <mike.leach@linaro.org> Cc: Leo Yan <leo.yan@linaro.org> Cc: Anshuman Khandual <anshuman.khandual@arm.com> Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com> Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com> Link: https://lore.kernel.org/r/20211019163153.3692640-10-suzuki.poulose@arm.com Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org>
2021-10-27coresight: trbe: Allow driver to choose a different alignmentSuzuki K Poulose1-3/+15
The TRBE hardware mandates a minimum alignment for the TRBPTR_EL1, advertised via the TRBIDR_EL1. This is used by the driver to align the buffer write head. This patch allows the driver to choose a different alignment from that of the hardware, by decoupling the alignment tracking. This will be useful for working around errata. Cc: Mathieu Poirier <mathieu.poirier@linaro.org> Cc: Anshuman Khandual <anshuman.khandual@arm.com> Cc: Mike Leach <mike.leach@linaro.org> Cc: Leo Yan <leo.yan@linaro.org> Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com> Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com> Link: https://lore.kernel.org/r/20211019163153.3692640-9-suzuki.poulose@arm.com Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org>
2021-10-27coresight: trbe: Decouple buffer base from the hardware baseSuzuki K Poulose1-4/+14
We always set the TRBBASER_EL1 to the base of the virtual ring buffer. We are about to change this for working around an erratum. So, in preparation to that, allow the driver to choose a different base for the TRBBASER_EL1 (which is within the buffer range). Cc: Anshuman Khandual <anshuman.khandual@arm.com> Cc: Mike Leach <mike.leach@linaro.org> Cc: Mathieu Poirier <mathieu.poirier@linaro.org> Cc: Leo Yan <leo.yan@linaro.org> Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com> Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com> Link: https://lore.kernel.org/r/20211019163153.3692640-8-suzuki.poulose@arm.com Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org>
2021-10-27coresight: trbe: Add a helper to pad a given buffer areaSuzuki K Poulose1-1/+7
Refactor the helper to pad a given AUX buffer area to allow "filling" ignore packets, without moving any handle pointers. This will be useful in working around errata, where we may have to fill the buffer after a session. Cc: Mathieu Poirier <mathieu.poirier@linaro.org> Cc: Mike Leach <mike.leach@linaro.org> Cc: Leo Yan <leo.yan@linaro.org> Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com> Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com> Link: https://lore.kernel.org/r/20211019163153.3692640-7-suzuki.poulose@arm.com Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org>
2021-10-27coresight: trbe: Add a helper to calculate the trace generatedSuzuki K Poulose1-18/+29
We collect the trace from the TRBE on FILL event from IRQ context and via update_buffer(), when the event is stopped. Let us consolidate how we calculate the trace generated into a helper. Cc: Mathieu Poirier <mathieu.poirier@linaro.org> Cc: Mike Leach <mike.leach@linaro.org> Cc: Leo Yan <leo.yan@linaro.org> Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com> Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com> Link: https://lore.kernel.org/r/20211019163153.3692640-6-suzuki.poulose@arm.com Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org>
2021-10-27coresight: trbe: Defer the probe on offline CPUsSuzuki K Poulose1-1/+7
If a CPU is offline during the driver init, we could end up causing a kernel crash trying to register the coresight device for the TRBE instance. The trbe_cpudata for the TRBE instance is initialized only when it is probed. Otherwise, we could end up dereferencing a NULL cpudata->drvdata. e.g: [ 0.149999] coresight ete0: CPU0: ete v1.1 initialized [ 0.149999] coresight-etm4x ete_1: ETM arch init failed [ 0.149999] coresight-etm4x: probe of ete_1 failed with error -22 [ 0.150085] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000050 [ 0.150085] Mem abort info: [ 0.150085] ESR = 0x96000005 [ 0.150085] EC = 0x25: DABT (current EL), IL = 32 bits [ 0.150085] SET = 0, FnV = 0 [ 0.150085] EA = 0, S1PTW = 0 [ 0.150085] Data abort info: [ 0.150085] ISV = 0, ISS = 0x00000005 [ 0.150085] CM = 0, WnR = 0 [ 0.150085] [0000000000000050] user address but active_mm is swapper [ 0.150085] Internal error: Oops: 96000005 [#1] PREEMPT SMP [ 0.150085] Modules linked in: [ 0.150085] Hardware name: FVP Base RevC (DT) [ 0.150085] pstate: 00800009 (nzcv daif -PAN +UAO -TCO BTYPE=--) [ 0.150155] pc : arm_trbe_register_coresight_cpu+0x74/0x144 [ 0.150155] lr : arm_trbe_register_coresight_cpu+0x48/0x144 ... [ 0.150237] Call trace: [ 0.150237] arm_trbe_register_coresight_cpu+0x74/0x144 [ 0.150237] arm_trbe_device_probe+0x1c0/0x2d8 [ 0.150259] platform_drv_probe+0x94/0xbc [ 0.150259] really_probe+0x1bc/0x4a8 [ 0.150266] driver_probe_device+0x7c/0xb8 [ 0.150266] device_driver_attach+0x6c/0xac [ 0.150266] __driver_attach+0xc4/0x148 [ 0.150266] bus_for_each_dev+0x7c/0xc8 [ 0.150266] driver_attach+0x24/0x30 [ 0.150266] bus_add_driver+0x100/0x1e0 [ 0.150266] driver_register+0x78/0x110 [ 0.150266] __platform_driver_register+0x44/0x50 [ 0.150266] arm_trbe_init+0x28/0x84 [ 0.150266] do_one_initcall+0x94/0x2bc [ 0.150266] do_initcall_level+0xa4/0x158 [ 0.150266] do_initcalls+0x54/0x94 [ 0.150319] do_basic_setup+0x24/0x30 [ 0.150319] kernel_init_freeable+0xe8/0x14c [ 0.150319] kernel_init+0x14/0x18c [ 0.150319] ret_from_fork+0x10/0x30 [ 0.150319] Code: f94012c8 b0004ce2 9134a442 52819801 (f9402917) [ 0.150319] ---[ end trace d23e0cfe5098535e ]--- [ 0.150346] Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b Fix this by skipping the step, if we are unable to probe the CPU. Fixes: 3fbf7f011f24 ("coresight: sink: Add TRBE driver") Reported-by: Bransilav Rankov <branislav.rankov@arm.com> Cc: Anshuman Khandual <anshuman.khandual@arm.com> Cc: Mathieu Poirier <mathieu.poirier@linaro.org> Cc: Mike Leach <mike.leach@linaro.org> Cc: Leo Yan <leo.yan@linaro.org> Cc: stable <stable@vger.kernel.org> Tested-by: Branislav Rankov <branislav.rankov@arm.com> Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com> Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com> Link: https://lore.kernel.org/r/20211014142238.2221248-1-suzuki.poulose@arm.com Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org>
2021-10-27coresight: trbe: Fix incorrect access of the sink specific dataSuzuki K Poulose1-1/+1
The TRBE driver wrongly treats the aux private data as the TRBE driver specific buffer for a given perf handle, while it is the ETM PMU's event specific data. Fix this by correcting the instance to use appropriate helper. Cc: stable <stable@vger.kernel.org> Fixes: 3fbf7f011f24 ("coresight: sink: Add TRBE driver") Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com> Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com> Link: https://lore.kernel.org/r/20210921134121.2423546-2-suzuki.poulose@arm.com [Fixed 13 character SHA down to 12] Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org>
2021-10-27coresight: etm4x: Add ETM PID for Kryo-5XXTao Zhang1-0/+1
Add ETM PID for Kryo-5XX to the list of supported ETMs. Otherwise, Kryo-5XX ETMs will not be initialized successfully. e.g. This change can be verified on qrb5165-rb5 board. ETM4-ETM7 nodes will not be visible without this change. Signed-off-by: Tao Zhang <quic_taozha@quicinc.com> Link: https://lore.kernel.org/r/1632477981-13632-2-git-send-email-quic_taozha@quicinc.com Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com> Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org>
2021-10-27coresight: trbe: Prohibit trace before disabling TRBESuzuki K Poulose2-1/+12
When the TRBE generates an IRQ, we stop the TRBE, collect the trace and then reprogram the TRBE with the updated buffer pointers, whenever possible. We might also leave the TRBE disabled, if there is not enough space left in the buffer. However, we do not touch the ETE at all during all of this. This means the ETE is only disabled when the event is disabled later (via irq_work). This is incorrect, as the ETE trace is still ON without actually being captured and may be routed to the ATB (even if it is for a short duration). So, we move the CPU into trace prohibited state always before disabling the TRBE, upon entering the IRQ handler. The state is restored if the TRBE is enabled back. Otherwise the trace remains prohibited. Since, the ETM/ETE driver now controls the TRFCR_EL1 per session, the tracing can be restored/enabled back when the event is rescheduled in. Cc: Anshuman Khandual <anshuman.khandual@arm.com> Cc: Mathieu Poirier <mathieu.poirier@linaro.org> Cc: Mike Leach <mike.leach@linaro.org> Cc: Leo Yan <leo.yan@linaro.org> Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com> Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com> Link: https://lore.kernel.org/r/20210923143919.2944311-6-suzuki.poulose@arm.com Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org>
2021-10-27coresight: trbe: End the AUX handle on truncationSuzuki K Poulose1-10/+16
When we detect that there isn't enough space left to start a meaningful session, we disable the TRBE, marking the buffer as TRUNCATED. But we delay the notification to the perf layer by perf_aux_output_end() until the event is scheduled out, triggered from the kernel perf layer. This will cause significant black outs in the trace. Now that the CoreSight PMU layer can handle a closed "AUX" handle properly, we can close the handle as soon as we detect the case, allowing the userspace to collect and re-enable the event. Also, while in the IRQ handler, move the irq_work_run() after we have updated the handle, to make sure the "TRUNCATED" flag causes the event to be disabled as soon as possible. Cc: Anshuman Khandual <anshuman.khandual@arm.com> Cc: Mathieu Poirier <mathieu.poirier@linaro.org> Cc: Mike Leach <mike.leach@linaro.org> Cc: Leo Yan <leo.yan@linaro.org> Cc: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Will Deacon <will@kernel.org> Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com> Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com> Link: https://lore.kernel.org/r/20210923143919.2944311-5-suzuki.poulose@arm.com Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org>
2021-10-27coresight: trbe: Do not truncate buffer on IRQSuzuki K Poulose1-6/+21
The TRBE driver marks the AUX buffer as TRUNCATED when we get an IRQ on FILL event. This has rather unwanted side-effect of the event being disabled when there may be more space in the ring buffer. So, instead of TRUNCATE we need a different flag to indicate that the trace may have lost a few bytes (i.e from the point of generating the FILL event until the IRQ is consumed). Anyways, the userspace must use the size from RECORD_AUX headers to restrict the "trace" decoding. Using PARTIAL flag causes the perf tool to generate the following warning: Warning: AUX data had gaps in it XX times out of YY! Are you running a KVM guest in the background? which is pointlessly scary for a user. The other remaining options are : - COLLISION - Use by SPE to indicate samples collided - Add a new flag - Specifically for CoreSight, doesn't sound so good, if we can re-use something. Given that we don't already use the "COLLISION" flag, the above behavior can be notified using this flag for CoreSight. Cc: Mathieu Poirier <mathieu.poirier@linaro.org> Cc: James Clark <james.clark@arm.com> Cc: Mike Leach <mike.leach@linaro.org> Cc: Anshuman Khandual <anshuman.khandual@arm.com> Cc: Leo Yan <leo.yan@linaro.org> Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com> Link: https://lore.kernel.org/r/20210923143919.2944311-4-suzuki.poulose@arm.com Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org>
2021-10-27coresight: trbe: Fix handling of spurious interruptsSuzuki K Poulose1-13/+9
On a spurious IRQ, right now we disable the TRBE and then re-enable it back, resetting the "buffer" pointers(i.e BASE, LIMIT and more importantly WRITE) to the original pointers from the AUX handle. This implies that we overwrite any trace that was written so far, (by overwriting TRBPTR) while we should have ignored the IRQ. On detecting a spurious IRQ after examining the TRBSR we simply re-enable the TRBE without touching the other parameters. Cc: Anshuman Khandual <anshuman.khandual@arm.com> Cc: Mathieu Poirier <mathieu.poirier@linaro.org> Cc: Mike Leach <mike.leach@linaro.org> Cc: Leo Yan <leo.yan@linaro.org> Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com> Link: https://lore.kernel.org/r/20210923143919.2944311-3-suzuki.poulose@arm.com Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org>
2021-10-27coresight: trbe: irq handler: Do not disable TRBE if no action is neededSuzuki K Poulose1-6/+6
The IRQ handler of the TRBE driver could race against the update_buffer() in consuming the IRQ. So, if the update_buffer() gets to processing the TRBE irq, the TRBSR will be cleared. Thus by the time IRQ handler is triggered, there is nothing to do there. Handle these cases and do not disable the TRBE unnecessarily. Since the TRBSR can be read without stopping the TRBE, we can check that before disabling the TRBE. Cc: Mathieu Poirier <mathieu.poirier@linaro.org> Cc: Anshuman Khandual <anshuman.khandual@arm.com> Cc: Leo Yan <leo.yan@linaro.org> Cc: Mike Leach <mike.leach@linaro.org> Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com> Link: https://lore.kernel.org/r/20210923143919.2944311-2-suzuki.poulose@arm.com Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org>
2021-10-27coresight: trbe: Unify the enabling sequenceSuzuki K Poulose1-19/+18
Unify the sequence of enabling the TRBE. We do this from event_start and also from the TRBE IRQ handler. Lets move this to a common helper. The only minor functional change is returning an error when we fail to enable the TRBE. This should be handled already. Since we now have unique entry point to trying to enable TRBE, move the format flag setting to the central place. Cc: Mathieu Poirier <mathieu.poirier@linaro.org> Cc: Mike Leach <mike.leach@linaro.org> Cc: Leo Yan <leo.yan@linaro.org> Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com> Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com> Link: https://lore.kernel.org/r/20210914102641.1852544-9-suzuki.poulose@arm.com Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org>
2021-10-27coresight: trbe: Drop duplicate TRUNCATE flagsSuzuki K Poulose1-6/+6
We mark the buffer as TRUNCATED when there is no space left in the buffer. But we do it at different points. __trbe_normal_offset() and also, at all the callers of the above function via compute_trbe_buffer_limit(), when the limit == base (i.e offset = 0 as returned by the __trbe_normal_offset()). So, given that the callers already mark the buffer as TRUNCATED drop the caller inside the __trbe_normal_offset(). This is in preparation to moving the handling of TRUNCATED into a central place. Cc: Anshuman Khandual <anshuman.khandual@arm.com> Cc: Mathieu Poirier <mathieu.poirier@linaro.org> Cc: Mike Leach <mike.leach@linaro.org> Cc: Leo Yan <leo.yan@linaro.org> Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com> Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com> Link: https://lore.kernel.org/r/20210914102641.1852544-6-suzuki.poulose@arm.com [Moved comment as Anshuman requested] Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org>
2021-10-27coresight: trbe: Ensure the format flag is always setSuzuki K Poulose1-4/+3
When the TRBE is stopped on truncating an event, we may not set the FORMAT flag, even though the size of the record is 0. Let us be consistent and not confuse the user. To ensure that the format flag is always set on all the records generated by TRBE, set the flag when we have a new handle. Rather than deferring to the "end" operation, which makes it clear. So, we can do this from - arm_trbe_enable() -> When a new handle is provided by the CoreSight PMU, triggered via etm_event_start() - trbe_handle_overflow() -> When we begin a new handle after closing the previous on overflow. Cc: Anshuman Khandual <anshuman.khandual@arm.com> Cc: Leo Yan <leo.yan@linaro.org> Cc: Mike Leach <mike.leach@linaro.org> Cc: Mathieu Poirier <mathieu.poirier@linaro.org> Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com> Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com> Link: https://lore.kernel.org/r/20210914102641.1852544-5-suzuki.poulose@arm.com [Fixed inverted words in title] Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org>
2021-10-27coresight: etm-pmu: Ensure the AUX handle is validSuzuki K Poulose1-3/+24
The ETM perf infrastructure closes out a handle during event_stop or on an error in starting the event. In either case, it is possible for a "sink" to update/close the handle, under certain circumstances. (e.g no space in ring buffer.). So, ensure that we handle this gracefully in the PMU driver by verifying the handle is still valid. Cc: Mathieu Poirier <mathieu.poirier@linaro.org> Cc: Mike Leach <mike.leach@linaro.org> Cc: Anshuman Khandual <anshuman.khandual@arm.com> Cc: Leo Yan <leo.yan@linaro.org> Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com> Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com> Link: https://lore.kernel.org/r/20210914102641.1852544-4-suzuki.poulose@arm.com Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org>
2021-10-27coresight: etm4x: Use Trace Filtering controls dynamicallySuzuki K Poulose3-18/+59
The Trace Filtering support (FEAT_TRF) ensures that the ETM can be prohibited from generating any trace for a given EL. This is much stricter knob, than the TRCVICTLR exception level masks, which doesn't prevent the ETM from generating Context packets for an "excluded" EL. At the moment, we do a onetime enable trace at user and kernel and leave it untouched for the kernel life time. This implies that the ETM could potentially generate trace packets containing the kernel addresses, and thus leaking the kernel virtual address in the trace. This patch makes the switch dynamic, by honoring the filters set by the user and enforcing them in the TRFCR controls. We also rename the cpu_enable_tracing() appropriately to cpu_detect_trace_filtering() and the drvdata member trfc => trfcr to indicate the "value" of the TRFCR_EL1. Cc: Mathieu Poirier <mathieu.poirier@linaro.org> Cc: Al Grant <al.grant@arm.com> Cc: Mike Leach <mike.leach@linaro.org> Cc: Leo Yan <leo.yan@linaro.org> Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com> Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com> Link: https://lore.kernel.org/r/20210914102641.1852544-3-suzuki.poulose@arm.com Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org>
2021-10-27coresight: etm4x: Save restore TRFCR_EL1Suzuki K Poulose3-12/+57
When the CPU enters a low power mode, the TRFCR_EL1 contents could be reset. Thus we need to save/restore the TRFCR_EL1 along with the ETM4x registers to allow the tracing. The TRFCR related helpers are in a new header file, as we need to use them for TRBE in the later patches. Cc: Mathieu Poirier <mathieu.poirier@linaro.org> Cc: Anshuman Khandual <anshuman.khandual@arm.com> Cc: Mike Leach <mike.leach@linaro.org> Cc: Leo Yan <leo.yan@linaro.org> Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com> Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com> Link: https://lore.kernel.org/r/20210914102641.1852544-2-suzuki.poulose@arm.com [Fixed cosmetic details] Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org>
2021-10-27coresight: Don't immediately close events that are run on invalid CPU/sink combosJames Clark1-6/+23
When a traced process runs on a CPU that can't reach the selected sink, the event will be stopped with PERF_HES_STOPPED. This means that even if the process migrates to a valid CPU, tracing will not resume. This can be reproduced (on N1SDP) by using taskset to start the process on CPU 0, and then switching it to CPU 2 (ETF 1 is only reachable from CPU 2): taskset --cpu-list 0 ./perf record -e cs_etm/@tmc_etf1/ --per-thread -- taskset --cpu-list 2 ls This produces a single 0 length AUX record, and then no more trace: 0x3c8 [0x30]: PERF_RECORD_AUX offset: 0 size: 0 flags: 0x1 [T] After the fix, the same command produces normal AUX records. The perf self test "89: Check Arm CoreSight trace data recording and synthesized samples" no longer fails intermittently. This was because the taskset in the test is after the fork, so there is a period where the task is scheduled on a random CPU rather than forced to a valid one. Specifically selecting an invalid CPU will still result in a failure to open the event because it will never produce trace: ./perf record -C 2 -e cs_etm/@tmc_etf0/ failed to mmap with 12 (Cannot allocate memory) The only scenario that has changed is if the CPU mask has a valid CPU sink combo in it. Testing ======= * Coresight self test passes consistently: ./perf test Coresight * CPU wide mode still produces trace: ./perf record -e cs_etm// -a * Invalid -C options still fail to open: ./perf record -C 2,3 -e cs_etm/@tmc_etf0/ failed to mmap with 12 (Cannot allocate memory) * Migrating a task to a valid sink/CPU now produces trace: taskset --cpu-list 0 ./perf record -e cs_etm/@tmc_etf1/ --per-thread -- taskset --cpu-list 2 ls * If the task remains on an invalid CPU, no trace is emitted: taskset --cpu-list 0 ./perf record -e cs_etm/@tmc_etf1/ --per-thread -- ls Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com> Signed-off-by: James Clark <james.clark@arm.com> Link: https://lore.kernel.org/r/20210922125144.133872-2-james.clark@arm.com Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com> Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org>
2021-10-27coresight: tmc-etr: Speed up for bounce buffer in flat modeLeo Yan1-4/+22
The AUX bounce buffer is allocated with API dma_alloc_coherent(), in the low level's architecture code, e.g. for Arm64, it maps the memory with the attribution "Normal non-cacheable"; this can be concluded from the definition for pgprot_dmacoherent() in arch/arm64/include/asm/pgtable.h. Later when access the AUX bounce buffer, since the memory mapping is non-cacheable, it's low efficiency due to every load instruction must reach out DRAM. This patch changes to allocate pages with dma_alloc_noncoherent(), the driver can access the memory via cacheable mapping; therefore, load instructions can fetch data from cache lines rather than always read data from DRAM, the driver can boost memory performance. After using the cacheable mapping, the driver uses dma_sync_single_for_cpu() to invalidate cacheline prior to read bounce buffer so can avoid read stale trace data. By measurement the duration for function tmc_update_etr_buffer() with ftrace function_graph tracer, it shows the performance significant improvement for copying 4MiB data from bounce buffer: # echo tmc_etr_get_data_flat_buf > set_graph_notrace // avoid noise # echo tmc_update_etr_buffer > set_graph_function # echo function_graph > current_tracer before: # CPU DURATION FUNCTION CALLS # | | | | | | | 2) | tmc_update_etr_buffer() { ... 2) # 8148.320 us | } after: # CPU DURATION FUNCTION CALLS # | | | | | | | 2) | tmc_update_etr_buffer() { ... 2) # 2525.420 us | } Signed-off-by: Leo Yan <leo.yan@linaro.org> Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com> Link: https://lore.kernel.org/r/20210905032144.966766-1-leo.yan@linaro.org Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org>
2021-10-27coresight: Update comments for removing cs_etm_find_snapshot()Leo Yan3-9/+6
Commit 2f01c200d440 ("perf cs-etm: Remove callback cs_etm_find_snapshot()") has removed the function cs_etm_find_snapshot() from the perf tool in the user space, now CoreSight trace directly uses the perf common function __auxtrace_mmap__read() to calcualte the head and size for AUX trace data in snapshot mode. This patch updates the comments in drivers to make them generic and not stick to any specific function from perf tool. Signed-off-by: Leo Yan <leo.yan@linaro.org> Link: https://lore.kernel.org/r/20210912125748.2816606-3-leo.yan@linaro.org Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com> Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org>
2021-10-27coresight: tmc-etr: Use perf_output_handle::head for AUX ring bufferLeo Yan1-7/+3
When enable the Arm CoreSight PMU event, the context for AUX ring buffer is prepared in the structure perf_output_handle, and its field "head" points the head of the AUX ring buffer and it is updated after filling AUX trace data into buffer. Current code uses an extra field etr_perf_buffer::head to maintain the header for the AUX ring buffer which is not necessary; alternatively, it's better to directly use perf_output_handle::head. This patch removes the field etr_perf_buffer::head and directly uses perf_output_handle::head for the head of AUX ring buffer. Signed-off-by: Leo Yan <leo.yan@linaro.org> Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com> Link: https://lore.kernel.org/r/20210912125748.2816606-2-leo.yan@linaro.org Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com> Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org>
2021-10-27coresight: tmc-etf: Add comment for store orderingLeo Yan1-0/+5
Since the function CS_LOCK() has contained memory barrier mb(), it ensures the visibility of the AUX trace data before updating the aux_head, thus it's needless to add any explicit barrier anymore. Add comment to make clear for the barrier usage for ETF. Signed-off-by: Leo Yan <leo.yan@linaro.org> Link: https://lore.kernel.org/r/20210809111407.596077-4-leo.yan@linaro.org Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com> Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org>
2021-10-27coresight: tmc-etr: Add barrier after updating AUX ring bufferLeo Yan1-0/+8
Since a memory barrier is required between AUX trace data store and aux_head store, and the AUX trace data is filled with memcpy(), it's sufficient to use smp_wmb() so can ensure the trace data is visible prior to updating aux_head. Signed-off-by: Leo Yan <leo.yan@linaro.org> Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com> Link: https://lore.kernel.org/r/20210809111407.596077-3-leo.yan@linaro.org Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com> Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org>