diff options
Diffstat (limited to '')
-rw-r--r-- | tools/perf/Documentation/perf-c2c.txt | 53 |
1 files changed, 39 insertions, 14 deletions
diff --git a/tools/perf/Documentation/perf-c2c.txt b/tools/perf/Documentation/perf-c2c.txt index 3b6a2c84ea02..5c5eb2def83e 100644 --- a/tools/perf/Documentation/perf-c2c.txt +++ b/tools/perf/Documentation/perf-c2c.txt @@ -19,9 +19,10 @@ C2C stands for Cache To Cache. The perf c2c tool provides means for Shared Data C2C/HITM analysis. It allows you to track down the cacheline contentions. -On x86, the tool is based on load latency and precise store facility events +On Intel, the tool is based on load latency and precise store facility events provided by Intel CPUs. On PowerPC, the tool uses random instruction sampling -with thresholding feature. +with thresholding feature. On AMD, the tool uses IBS op pmu (due to hardware +limitations, perf c2c is not supported on Zen3 cpus). These events provide: - memory address of the access @@ -49,7 +50,8 @@ RECORD OPTIONS -l:: --ldlat:: - Configure mem-loads latency. (x86 only) + Configure mem-loads latency. Supported on Intel and Arm64 processors + only. Ignored on other archs. -k:: --all-kernel:: @@ -109,7 +111,9 @@ REPORT OPTIONS -d:: --display:: - Switch to HITM type (rmt, lcl) to display and sort on. Total HITMs as default. + Switch to HITM type (rmt, lcl) or peer snooping type (peer) to display + and sort on. Total HITMs (tot) as default, except Arm64 uses peer mode + as default. --stitch-lbr:: Show callgraph with stitched LBRs, which may have more complete @@ -133,11 +137,15 @@ Following perf record options are configured by default: -W,-d,--phys-data,--sample-cpu Unless specified otherwise with '-e' option, following events are monitored by -default on x86: +default on Intel: cpu/mem-loads,ldlat=30/P cpu/mem-stores/P +following on AMD: + + ibs_op// + and following on PowerPC: cpu/mem-loads/ @@ -174,12 +182,18 @@ For each cacheline in the 1) list we display following data: Cacheline - cacheline address (hex number) - Rmt/Lcl Hitm + Rmt/Lcl Hitm (Display with HITM types) - cacheline percentage of all Remote/Local HITM accesses - LLC Load Hitm - Total, LclHitm, RmtHitm + Peer Snoop (Display with peer type) + - cacheline percentage of all peer accesses + + LLC Load Hitm - Total, LclHitm, RmtHitm (For display with HITM types) - count of Total/Local/Remote load HITMs + Load Peer - Total, Local, Remote (For display with peer type) + - count of Total/Local/Remote load from peer cache or DRAM + Total records - sum of all cachelines accesses @@ -189,9 +203,10 @@ For each cacheline in the 1) list we display following data: Total stores - sum of all store accesses - Store Reference - L1Hit, L1Miss + Store Reference - L1Hit, L1Miss, N/A L1Hit - store accesses that hit L1 L1Miss - store accesses that missed L1 + N/A - store accesses with memory level is not available Core Load Hit - FB, L1, L2 - count of load hits in FB (Fill Buffer), L1 and L2 cache @@ -200,18 +215,24 @@ For each cacheline in the 1) list we display following data: - count of LLC load accesses, includes LLC hits and LLC HITMs RMT Load Hit - RmtHit, RmtHitm - - count of remote load accesses, includes remote hits and remote HITMs + - count of remote load accesses, includes remote hits and remote HITMs; + on Arm neoverse cores, RmtHit is used to account remote accesses, + includes remote DRAM or any upward cache level in remote node Load Dram - Lcl, Rmt - count of local and remote DRAM accesses For each offset in the 2) list we display following data: - HITM - Rmt, Lcl + HITM - Rmt, Lcl (Display with HITM types) - % of Remote/Local HITM accesses for given offset within cacheline - Store Refs - L1 Hit, L1 Miss - - % of store accesses that hit/missed L1 for given offset within cacheline + Peer Snoop - Rmt, Lcl (Display with peer type) + - % of Remote/Local peer accesses for given offset within cacheline + + Store Refs - L1 Hit, L1 Miss, N/A + - % of store accesses that hit L1, missed L1 and N/A (no available) memory + level for given offset within cacheline Data address - Offset - offset address @@ -225,9 +246,12 @@ For each offset in the 2) list we display following data: Code address - code address responsible for the accesses - cycles - rmt hitm, lcl hitm, load + cycles - rmt hitm, lcl hitm, load (Display with HITM types) - sum of cycles for given accesses - Remote/Local HITM and generic load + cycles - rmt peer, lcl peer, load (Display with peer type) + - sum of cycles for given accesses - Remote/Local peer load and generic load + cpu cnt - number of cpus that participated on the access @@ -249,7 +273,8 @@ The 'Node' field displays nodes that accesses given cacheline offset. Its output comes in 3 flavors: - node IDs separated by ',' - node IDs with stats for each ID, in following format: - Node{cpus %hitms %stores} + Node{cpus %hitms %stores} (Display with HITM types) + Node{cpus %peers %stores} (Display with peer type) - node IDs with list of affected CPUs in following format: Node{cpu list} |