The most important paper:
A Top-Down method for performance analysis and counters architecture

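This paper introduces the Top-Down analysis model. It lets you treat the system as a black box and tell whether it is CPU bound or memory bound. Every pipeline issue slot is attributed to one of four top-level categories: Retiring, Bad Speculation, Frontend Bound and Backend Bound. The dominant category is then broken down further, level by level.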
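To make the level-1 split concrete, here is a minimal Python sketch of the formulas in the paper. It assumes a 4-wide Intel core (4 issue slots per cycle) and the event names the paper uses for that generation. `Counters` and `topdown_level1` are hypothetical names, and the sample counts are invented. The SMT handling and model-specific corrections that perf and toplev apply are left out.

```python
from dataclasses import dataclass

PIPELINE_WIDTH = 4  # assumption: 4 issue slots per cycle, as on the cores in the paper


@dataclass
class Counters:
    """Raw PMU counts; field names mirror the Intel event names (assumed)."""
    cpu_clk_unhalted_thread: int      # CPU_CLK_UNHALTED.THREAD
    idq_uops_not_delivered_core: int  # IDQ_UOPS_NOT_DELIVERED.CORE
    uops_issued_any: int              # UOPS_ISSUED.ANY
    uops_retired_retire_slots: int    # UOPS_RETIRED.RETIRE_SLOTS
    int_misc_recovery_cycles: int     # INT_MISC.RECOVERY_CYCLES


def topdown_level1(c: Counters) -> dict:
    """Split all issue slots into the four level-1 Top-Down categories (as fractions)."""
    total_slots = PIPELINE_WIDTH * c.cpu_clk_unhalted_thread
    frontend = c.idq_uops_not_delivered_core / total_slots
    bad_spec = (c.uops_issued_any - c.uops_retired_retire_slots
                + PIPELINE_WIDTH * c.int_misc_recovery_cycles) / total_slots
    retiring = c.uops_retired_retire_slots / total_slots
    backend = 1.0 - (frontend + bad_spec + retiring)  # whatever slots are left over
    return {"retiring": retiring, "bad speculation": bad_spec,
            "frontend bound": frontend, "backend bound": backend}


if __name__ == "__main__":
    # Hypothetical counts for illustration only, not a real measurement.
    sample = Counters(cpu_clk_unhalted_thread=1_000_000,
                      idq_uops_not_delivered_core=400_000,
                      uops_issued_any=1_100_000,
                      uops_retired_retire_slots=1_000_000,
                      int_misc_recovery_cycles=25_000)
    for name, frac in topdown_level1(sample).items():
        print(f"{name:>16}: {frac:6.1%}")
```

Backend Bound is computed as the remainder, so the four fractions always sum to 1. With the sample numbers the split is 25% retiring, 5% bad speculation, 10% frontend bound and 60% backend bound.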
Naturally, the perf stat command supports it too, through its --topdown option:
# perf stat --topdown
^C
 Performance counter stats for 'system wide':

                 retiring   bad speculation   frontend bound   backend bound
S0-D0-C0    2       22.9%              2.0%            27.9%           47.2%
S0-D0-C1    2       10.7%              0.0%             0.0%           89.2%
S0-D0-C2    2       10.7%              0.0%             0.0%           89.2%
S0-D0-C3    2       11.6%              3.4%            39.5%           45.5%
S0-D0-C4    2        8.0%              3.1%            32.1%           56.8%
S0-D0-C5    2       12.9%              2.6%            37.8%           46.7%
S0-D0-C6    2       16.1%              2.7%            47.5%           33.6%
S0-D0-C7    2        9.8%              1.3%            32.7%           56.1%
S0-D0-C8    2       10.8%              4.2%            48.0%           37.0%
S0-D0-C9    2        9.1%              1.8%            30.9%           58.2%
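In the run above most cores are backend bound, but C6 and C8 lean frontend bound. A tiny parser can flag the dominant category for each core. This sketch assumes the row layout shown here: core id, CPU count, then four percentages. perf's exact spacing and headers may differ between versions. `dominant_category` is a hypothetical helper, not part of perf.

```python
import re

CATEGORIES = ("retiring", "bad speculation", "frontend bound", "backend bound")

# A per-core row: core id (socket-die-core), number of CPUs aggregated, four percentages.
ROW = re.compile(r"(S\d+-D\d+-C\d+)\s+(\d+)\s+" + r"\s+".join([r"([\d.]+)%"] * 4))


def dominant_category(text: str) -> dict:
    """Map each core to (largest level-1 category, its percentage)."""
    result = {}
    for line in text.splitlines():
        m = ROW.match(line.strip())
        if m is None:
            continue  # skip headers, blank lines and anything else
        values = [float(v) for v in m.groups()[2:]]
        best = max(range(len(CATEGORIES)), key=lambda i: values[i])
        result[m.group(1)] = (CATEGORIES[best], values[best])
    return result


# Three rows copied from the output above.
SAMPLE = """\
                 retiring   bad speculation   frontend bound   backend bound
S0-D0-C0    2       22.9%              2.0%            27.9%           47.2%
S0-D0-C1    2       10.7%              0.0%             0.0%           89.2%
S0-D0-C6    2       16.1%              2.7%            47.5%           33.6%
"""

if __name__ == "__main__":
    for core, (category, pct) in dominant_category(SAMPLE).items():
        print(f"{core}: {category} ({pct}%)")
```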
For more detail, use toplev from pmu-tools:
# python3 ./toplev.py -l 2
Will measure complete system.
4.2-full-perf on Intel(R) Core(TM) i9-10900 CPU @ 2.80GHz [skl/skylake]
C1    BE         Backend_Bound                 % Slots   89.2
C1    BE/Mem     Backend_Bound.Memory_Bound    % Slots   49.2  <==
    This metric represents fraction of slots the Memory
    subsystem within the Backend was a bottleneck...
C1    BE/Core    Backend_Bound.Core_Bound      % Slots   39.9
    This metric represents fraction of slots where Core non-memory issues were of a bottleneck...
      MUX                                      %         14.1
    PerfMon Event Multiplexing accuracy indicator
C2    BE         Backend_Bound                 % Slots   89.1
C2    BE/Mem     Backend_Bound.Memory_Bound    % Slots   49.1  <==
C2    BE/Core    Backend_Bound.Core_Bound      % Slots   40.0
      MUX                                      %         14.1
      MUX                                      %         14.2
      MUX                                      %         14.1
Run toplev --describe Memory_Bound^ to get more information on bottleneck for cpu
Add --nodes '!+Memory_Bound*/3,+MUX' for breakdown.
Idle CPUs 0,3-10,13-19 may have been hidden. Override with --idle-threshold 100
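The core idea of the Top-Down drill-down is to descend into the biggest child at each level. The sketch below is a simplified Python illustration of that idea. The tree holds the numbers for core C1 from above: level 1 from perf stat, level 2 from toplev. `Node` and `drill_down` are hypothetical names, and toplev's real node selection also applies thresholds before marking a bottleneck with "<==".

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    pct: float  # % of pipeline slots
    children: list[Node] = field(default_factory=list)


def drill_down(node: Node) -> list[str]:
    """Descend by always following the child with the largest share of slots."""
    path = [node.name]
    while node.children:
        node = max(node.children, key=lambda child: child.pct)
        path.append(node.name)
    return path


# Core C1, copied from the perf stat (level 1) and toplev (level 2) output above.
c1 = Node("Slots", 100.0, [
    Node("Retiring", 10.7),
    Node("Bad_Speculation", 0.0),
    Node("Frontend_Bound", 0.0),
    Node("Backend_Bound", 89.2, [
        Node("Backend_Bound.Memory_Bound", 49.2),
        Node("Backend_Bound.Core_Bound", 39.9),
    ]),
])

if __name__ == "__main__":
    print(" -> ".join(drill_down(c1)))
```

The path ends at Backend_Bound.Memory_Bound, the same node toplev marks. The two level-2 children add up to 89.1 against 89.2 for their parent. This is because the printed values are rounded and the counters are multiplexed (see the MUX lines), so the totals only match approximately.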
pmu-tools can be downloaded from:
https://github.com/andikleen/pmu-tools
Next, two book recommendations.
The first is by the performance guru Brendan Gregg:

The second is from easyperf.net:

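Each book has its own strengths. The former, whose Chinese edition is titled "性能之巅" (Systems Performance), covers many useful tools and introduces the USE method (Utilization, Saturation, Errors). It is a must-read for anyone doing performance analysis! The cover of the first edition was: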

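The USE method boils down to a checklist. For every resource (CPUs, memory, disks, network interfaces, and so on), you check its Utilization, Saturation and Errors. Below is a minimal Python sketch of that loop, with data collection left out. The `ResourceStats` type, the `use_check` function and the 70% utilization threshold are hypothetical choices for illustration, not taken from the book.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class ResourceStats:
    name: str
    utilization: float  # fraction of the interval the resource was busy (0.0-1.0)
    saturation: float   # extra work queued that it could not service, e.g. run-queue length
    errors: int         # error events counted during the interval


def use_check(resources: list[ResourceStats], util_limit: float = 0.7) -> list[str]:
    """Walk every resource and check Utilization, Saturation and Errors, in that order."""
    findings = []
    for r in resources:
        if r.utilization >= util_limit:  # util_limit is an arbitrary, hypothetical threshold
            findings.append(f"{r.name}: utilization {r.utilization:.0%}")
        if r.saturation > 0:
            findings.append(f"{r.name}: saturation {r.saturation}")
        if r.errors > 0:
            findings.append(f"{r.name}: {r.errors} errors")
    return findings


if __name__ == "__main__":
    # Made-up readings, for illustration only.
    stats = [
        ResourceStats("CPU", 0.95, 3.0, 0),
        ResourceStats("disk sda", 0.20, 0.0, 0),
        ResourceStats("eth0", 0.10, 0.0, 5),
    ]
    for finding in use_check(stats):
        print(finding)
```

The value of the method is the completeness of the loop: every resource gets all three checks, so a problem is not missed just because nobody thought to look at that resource.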
The latter goes into much more detail on processor microarchitecture, and its table of contents will make your mouth water:




We won't cover how to get hold of these books and the paper. Pay for what needs paying for, and don't even think about piracy :-)
The PDF of the easyperf.net book is free; just enter your email here:
https://book.easyperf.net/perf_book
The IEEE version of the Top-Down paper is paywalled, but a free set of slides is available here:
https://pdfs.semanticscholar.org/b5e0/1ab1baa6640a39edfa06d556fabd882cdf64.pdf
The first edition of Brendan Gregg's book has a Chinese translation:
