ARM Cache Coherent Network ========================== CCN-504 is a ring-bus interconnect consisting of 11 crosspoints (XPs), with each crosspoint supporting up to two device ports, so nodes (devices) 0 and 1 are connected to crosspoint 0, nodes 2 and 3 to crosspoint 1 etc. PMU (perf) driver ----------------- The CCN driver registers a perf PMU driver, which provides description of available events and configuration options in sysfs, see /sys/bus/event_source/devices/ccn*. The "format" directory describes format of the config, config1 and config2 fields of the perf_event_attr structure. The "events" directory provides configuration templates for all documented events, that can be used with perf tool. For example "xp_valid_flit" is an equivalent of "type=0x8,event=0x4". Other parameters must be explicitly specified. For events originating from device, "node" defines its index. All crosspoint events require "xp" (index), "port" (device port number) and "vc" (virtual channel ID) and "dir" (direction). Watchpoints (special "event" value 0xfe) also require comparator values ("cmp_l" and "cmp_h") and "mask", being index of the comparator mask. Masks are defined separately from the event description (due to limited number of the config values) in the "cmp_mask" directory, with first 8 configurable by user and additional 4 hardcoded for the most frequent use cases. Cycle counter is described by a "type" value 0xff and does not require any other settings. Example of perf tool use: / # perf list | grep ccn ccn/cycles/ [Kernel PMU event] <...> ccn/xp_valid_flit/ [Kernel PMU event] <...> / # perf stat -C 0 -e ccn/cycles/,ccn/xp_valid_flit,xp=1,port=0,vc=1,dir=1/ \ sleep 1 The driver does not support sampling, therefore "perf record" will not work. Also notice that only single cpu is being selected ("-C 0") - this is because perf framework does not support "non-CPU related" counters (yet?) so system-wide session ("-a") would try (and in most cases fail) to set up the same event per each CPU.