<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=gb2312">
<title>CTerm Essentials Collection Download</title>
</head>
<body bgcolor="#FFFFFF">
<table border="0" width="100%" cellspacing="0" cellpadding="0" height="577">
<tr><td width="32%" rowspan="3" height="123"><img src="DDl_back.jpg" width="300" height="129" alt="DDl_back.jpg"></td><td width="30%" background="DDl_back2.jpg" height="35"><p align="center"><a href="http://202.112.58.200"><font face="黑体"><big><big>Tsinghua</big></big></font></a></td></tr>
<tr>
<td width="68%" background="DDl_back2.jpg" height="44"><big><big><font face="黑体"><p align="center"> Embedded Systems (BM: turbolinux jacobw) </font></big></big></td></tr>
<tr>
<td width="68%" height="44" bgcolor="#000000"><font face="黑体"><big><big><p align="center"></big></big><a href="http://cterm.163.net"><img src="banner.gif" width="400" height="60" alt="banner.gif" border="0"></a></font></td>
</tr>
<tr><td width="100%" colspan="2" height="100" align="center" valign="top"><br><p align="center">[<a href="嵌入式系统.htm">Back to start</a>][<a href="198.htm">Up one level</a>][<a href="221.htm">Next article</a>]
<hr><p align="left"><small>From: plato (纯真年代), Board: Embedded <br>
Subject: Linux for PPC, chapter 19 <br>
Posted at: Shuimu Tsinghua BBS (水木清华站) (Wed May 30 23:23:31 2001) <br>
<br>
19. Performance <br>
19.1 CPU core <br>
Cache <br>
First, make sure you have both the instruction (I) and data (D) caches enabled! <br>
Also, make sure you have serialization disabled (set ICTRL to 0x7). <br>
To get maximum performance, you need to enable the copyback data cache. It may <br>
be left disabled (so the data cache runs in write-through mode) to make the <br>
standard Linux/PPC libraries work without recompiling. If you build your own <br>
glibc as described under Runtime Library, you can enable copyback. Look for a <br>
"make config" option, or grep for DC_SFWT in <br>
arch/ppc/kernel/head.S <br>
and change the #if 0 to #if 1. <br>
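Why copyback helps can be seen with a toy model. The sketch below is an <br>
illustration only, not a description of the real 8xx cache: it assumes a <br>
direct-mapped cache of 128 lines of 16 bytes and counts the bus write <br>
transactions caused by a sequence of word stores under each write policy. <br>

```python
# Toy model (illustration only, not the real MPC8xx cache): count bus write
# transactions for word stores under write-through vs copyback policies.
LINE_SIZE = 16   # bytes per cache line (assumed for this toy model)
NUM_LINES = 128  # lines in the toy direct-mapped cache (assumed)


def bus_writes(addresses, copyback):
    """Count bus write transactions caused by word stores to `addresses`."""
    tags = [None] * NUM_LINES    # memory line currently held in each slot
    dirty = [False] * NUM_LINES  # copyback only: slot modified since it was filled
    writes = 0
    for addr in addresses:
        line = addr // LINE_SIZE
        slot = line % NUM_LINES
        if tags[slot] != line:          # miss: replace whatever is in the slot
            if copyback and dirty[slot]:
                writes += 1             # write the old dirty line back once
            tags[slot] = line
            dirty[slot] = False
        if copyback:
            dirty[slot] = True          # the store stays in the cache for now
        else:
            writes += 1                 # write-through: every store reaches memory
    if copyback:
        writes += sum(dirty)            # final flush of lines still dirty
    return writes
```

For 1000 stores that keep hitting the same four words, <br>
bus_writes([0, 4, 8, 12] * 250, copyback=False) returns 1000, while copyback=True <br>
returns 1 because the dirty line is written back only once. Alternating stores to <br>
two addresses that share a slot, such as [0, 2048] * 3, cost 6 writes either way. <br>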
BogoMIPS <br>
The BogoMIPS value on 8xx processors should be within 1% or so of the actual <br>
CPU core frequency in MHz, allowing for rounding and minor timing calculation <br>
errors. This makes it a useful sanity check that the internal clock multiplier <br>
is set correctly and that the I-cache is turned on. Note, however, that the <br>
BogoMIPS calculation is still tied to the external clock source and the internal <br>
prescaler settings, so it shouldn't be relied on alone to verify that the core <br>
frequency really is what you think it should be. A simple cross-check is to run <br>
'sleep 10' at the shell prompt and time it with a watch, to check that you're at <br>
least in the ballpark. It's wise to measure your system more accurately than <br>
this with an oscilloscope (CRO) at least once. <br>
Also, beware that the BogoMIPS rating should not be used as a general CPU <br>
performance measure; see: http://linuxdoc.org/HOWTO/mini/BogoMips.html <br>
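The 1% sanity check can be scripted. This is a minimal sketch that assumes <br>
/proc/cpuinfo contains a line such as "bogomips : 49.97" (the exact layout varies <br>
between kernel versions) and that you know the intended core clock in MHz; the <br>
function names are hypothetical. <br>

```python
# Sketch: compare the kernel's BogoMIPS figure with the intended core clock.
# Assumes a cpuinfo line like "bogomips : 49.97"; layout varies by kernel.


def parse_bogomips(cpuinfo_text):
    """Return the first BogoMIPS value in cpuinfo-style text, or None."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip().lower() == "bogomips":
            return float(value.strip())
    return None


def bogomips_plausible(bogomips, core_mhz, tolerance=0.01):
    """True if BogoMIPS is within `tolerance` (default 1%) of the core MHz."""
    return abs(bogomips - core_mhz) <= tolerance * core_mhz
```

Feed it the text of /proc/cpuinfo copied from the target, e.g. <br>
bogomips_plausible(parse_bogomips(text), 50) for a part meant to run at 50 MHz. <br>
A pass only says the multiplier and I-cache look sane; as noted above, it does <br>
not prove the core frequency. <br>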
19.2 Profiling <br>
There are numerous options available for system profiling, depending on what <br>
you wish to measure, and how invasive you are prepared to be. <br>
/proc/profile <br>
/proc/profile is a standard kernel feature which provides simple kernel <br>
profiling based on Instruction Pointer sampling in the periodic timer interrupt <br>
routine. It's simplistic but effective, and has low overhead since the interrupt <br>
is going to happen anyway. The data is processed with readprofile, which looks <br>
up System.map to show which kernel functions are using the most CPU time. It <br>
doesn't work for modules yet, so at present you need to compile them into the <br>
kernel for profiling. <br>
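The lookup readprofile performs can be sketched as follows. This is a minimal <br>
illustration of the technique, not readprofile's source; it assumes System.map <br>
lines of the form "c0000000 T _stext" and that counter i covers the addresses <br>
starting at text_start + (i << shift). The helper names are hypothetical. <br>

```python
# Sketch of attributing profile counters to kernel functions via System.map.
import bisect


def load_text_symbols(system_map_text):
    """Return sorted (address, name) pairs for text symbols (type T or t)."""
    symbols = []
    for line in system_map_text.splitlines():
        parts = line.split()
        if len(parts) == 3 and parts[1] in ("T", "t"):
            symbols.append((int(parts[0], 16), parts[2]))
    symbols.sort()
    return symbols


def ticks_per_function(counters, text_start, shift, symbols):
    """Sum sample counters per function; returns {name: ticks}."""
    addrs = [addr for addr, _ in symbols]
    totals = {}
    for i, ticks in enumerate(counters):
        if ticks == 0:
            continue
        pc = text_start + (i << shift)             # first address of bucket i
        idx = bisect.bisect_right(addrs, pc) - 1   # last symbol at or below pc
        if idx < 0:
            continue                               # before the first text symbol
        name = symbols[idx][1]
        totals[name] = totals.get(name, 0) + ticks
    return totals
```

Sorting the result by tick count in descending order gives the "which functions <br>
use the most CPU time" view described above. <br>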
You need to enable this at boot time by passing profile=2 on the kernel <br>
command line. The number gives the power-of-2 granularity used for the <br>
counters: 2 gives a separate counter for each PowerPC instruction (each 4 <br>
bytes). Higher numbers consume less memory and give less precise results. The <br>
data from /proc/profile will be in target byte order, so if you're <br>
cross-developing you may need to either byte-swap it or compile readprofile to <br>
run on your target. <br>
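To make the byte-order and granularity points concrete, here is a sketch that <br>
decodes a /proc/profile dump on the host. It assumes the layout used by 2.4-era <br>
kernels (check this against your own kernel source): an array of 32-bit words <br>
whose first word is the sampling step (1 << shift), followed by one counter per <br>
step of kernel text, all in target byte order. <br>

```python
# Sketch: decode /proc/profile data copied off a big-endian PowerPC target.
# Assumed layout: 32-bit words, word 0 = sampling step in bytes (1 << shift),
# then one counter per step of kernel text, all in target byte order.


def decode_profile(data, byteorder="big"):
    """Return (step, counters) decoded from raw /proc/profile bytes."""
    if not data or len(data) % 4:
        raise ValueError("expected a non-empty sequence of 32-bit words")
    words = [int.from_bytes(data[i:i + 4], byteorder)
             for i in range(0, len(data), 4)]
    return words[0], words[1:]


def counter_memory_bytes(text_size, shift):
    """Approximate kernel memory used by the counters for profile=shift."""
    return (text_size >> shift) * 4  # one 32-bit counter per 2**shift bytes
```

For example, decode_profile(open("profile.bin", "rb").read()) (the file name is <br>
hypothetical) yields the step and counters in host order. With profile=2, <br>
counter_memory_bytes shows the counters take as much memory as the kernel text <br>
itself; each increment of the shift halves that. <br>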
The PowerPC branch of the Linux kernel has been slow to implement the <br>
Instruction Pointer sampling function needed to generate the /proc/profile <br>
data. If it isn't implemented in your kernel, readprofile will show zero time <br>
for every kernel function. In this case you need to apply the profile.patch <br>
from: http://members.xoom.com/greyhams/linux/patches/ <br>
Linux Trace Toolkit <br>
http://www.opersys.com/LTT <br>
The Linux Trace Toolkit works with an instrumented Linux kernel by saving <br>
time-stamped records of important kernel events to a binary data file. A data <br>
decoder converts the binary data to text and calculates statistical summaries, <br>
such as percent processor utilization by each process. The toolkit also <br>
includes an integrated environment that graphically displays the results and <br>
provides search capability. <br>
A version for embedded PowerPC targets is now available from: <br>
ftp://ftp.mvista.com/pub/LTT <br>
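As an illustration of the kind of summary such a decoder produces, the sketch <br>
below computes per-process CPU utilization from time-stamped scheduling events. <br>
The record format (timestamp in microseconds, PID switched to) is hypothetical and <br>
is not LTT's binary format. <br>

```python
# Sketch: per-process CPU utilization from time-stamped context-switch events.
# Hypothetical record format: (timestamp_us, pid), meaning "switched to pid".


def cpu_utilization(events, end_time):
    """Percent of time each PID ran between the first event and end_time.

    `events` must be a time-ordered list of (timestamp_us, pid) records.
    """
    if not events:
        return {}
    busy = {}
    boundaries = [t for t, _ in events[1:]] + [end_time]  # when each run ends
    for (start, pid), stop in zip(events, boundaries):
        busy[pid] = busy.get(pid, 0) + (stop - start)
    span = end_time - events[0][0]
    return {pid: 100.0 * us / span for pid, us in busy.items()}
```

For example, cpu_utilization([(0, 0), (250, 42), (750, 0)], 1000) attributes half <br>
the interval to PID 42 and half to PID 0 (the idle task in Linux). <br>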
gprof <br>
All the usual Linux user mode profiling tools like gprof are available. <br>
kernprof <br>
http://oss.sgi.com/projects/kernprof <br>
This project aims to make full gprof profiling available for the kernel. <br>
However, it hasn't been ported to the PowerPC architecture yet. <br>
19.3 IDMA <br>
Beware that IDMA on the 860 is not designed for high performance, and the CPU <br>
gets better throughput with explicit cache-bursted programmed I/O. Search <br>
for IDMA for more discussion. <br>
Confusion sometimes arises because DMA transfers in most systems are faster <br>
than CPU transfers, whereas here the reverse is generally true. Furthermore, <br>
IDMA transfers eat into CPM processing time, limiting the throughput of other <br>
communications modules running at the same time. <br>
19.4 Network <br>
To get good TCP/IP performance, you need a fast CPU. Using the FEC, a 50 MHz <br>
860P achieves about 30 Mbit/s of TCP/IP throughput, and a 100 MHz 860P about <br>
60 Mbit/s. The bottleneck is the protocol and application processing in the <br>
PPC core, so TCP/IP throughput scales nearly linearly with processor speed. <br>
If you need to go faster, use the 8260. <br>
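The two figures quoted for the 860P give a simple rule of thumb of roughly <br>
0.6 Mbit/s of TCP/IP throughput per MHz of core clock. The sketch below turns <br>
that into estimates; it assumes the linear scaling holds at the clock you plug in <br>
and applies only to the 8xx figures above, not to the 8260. <br>

```python
# Rough linear model from the quoted MPC860P + FEC figures:
# ~30 Mbit/s at 50 MHz and ~60 Mbit/s at 100 MHz (assumed linear in between).
MBIT_PER_MHZ = 30 / 50  # about 0.6 Mbit/s of TCP/IP throughput per MHz


def estimated_tcp_mbit(core_mhz):
    """Approximate TCP/IP throughput (Mbit/s) for a given core clock (MHz)."""
    return MBIT_PER_MHZ * core_mhz


def core_mhz_needed(target_mbit):
    """Approximate core clock (MHz) needed to reach a target throughput."""
    return target_mbit / MBIT_PER_MHZ


def cycles_per_frame(frame_bytes=1500):
    """Core cycles spent per frame at the quoted rate (bits / bits-per-cycle)."""
    return frame_bytes * 8 / MBIT_PER_MHZ
```

For example, estimated_tcp_mbit(80) gives about 48 Mbit/s, and cycles_per_frame() <br>
comes to roughly 20,000 core cycles per 1500-byte frame, which gives a feel for <br>
how much core work each frame costs. <br>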
19.5 Optimisation <br>
Optimising everything for space with gcc's -Os option is likely to give both <br>
the smallest code size and the best performance, because it inhibits loop <br>
unrolling, an optimisation that tends to hurt embedded processors with <br>
relatively small caches. Furthermore, PowerPC processors can speculatively <br>
execute a branch overlapped with other loop instructions, so the branch <br>
effectively executes in zero cycles and loop unrolling is unnecessary in many <br>
circumstances. <br>
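For readers unfamiliar with the transformation, the sketch below shows the shape <br>
of a loop before and after 4-way unrolling. It is written in Python purely to show <br>
the structure; the cost described above is the extra compiled PowerPC code (four <br>
copies of the body plus a remainder loop) competing for a small I-cache. The <br>
checksum functions are hypothetical examples. <br>

```python
# Shape of loop unrolling, shown in Python only to illustrate the structure.
MASK = 0xFFFFFFFF  # keep the running sum to 32 bits, as C unsigned int would


def checksum_rolled(words):
    total = 0
    for w in words:                    # one copy of the loop body
        total = (total + w) & MASK
    return total


def checksum_unrolled4(words):
    total = 0
    n = len(words) - len(words) % 4    # largest multiple of 4
    for i in range(0, n, 4):           # four copies of the body per iteration
        total = (total + words[i]) & MASK
        total = (total + words[i + 1]) & MASK
        total = (total + words[i + 2]) & MASK
        total = (total + words[i + 3]) & MASK
    for w in words[n:]:                # remainder loop for leftover elements
        total = (total + w) & MASK
    return total
```

Both functions return the same result. To see the size effect on real code, <br>
compile a C equivalent with gcc -Os and with gcc -O2 -funroll-loops and compare <br>
the text sizes reported by size. <br>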
<br>
-- <br>
<br>
※ Source: Shuimu Tsinghua BBS (水木清华站) smth.org [FROM: 166.111.161.8] <br>
</small><hr>
<p align="center"><a href="http://cterm.163.net">Welcome to the CTerm home page</a></p>
</table>
</body>
</html>