Linux for PPC chapter 19 (jacobw)

BBS 水木清华站: Digest Area

From: plato (纯真年代), Board: Embedded
Subject: Linux for PPC chapter 19
Posted at: BBS 水木清华站 (Wed May 30 23:23:31 2001)

19. Performance

19.1 CPU core

Cache

First, make sure both the I and D caches are enabled. Also make sure serialization is disabled (set ICTRL to 0x7).

For maximum performance, enable the copyback data cache. It can be disabled so that the standard Linux/PPC libraries work without recompiling. If you build your own glibc as described under Runtime Library, you can enable copyback. Look for a "make config" option, or grep for DC_SFWT in arch/ppc/kernel/head.S and change the "#if 0" to "#if 1".

BogoMIPS

The BogoMIPS value on 8xx processors should be within 1% or so of the actual CPU core frequency, allowing for rounding and minor timing calculation errors. This makes it a useful sanity check that the internal clock multiplier is set correctly and that the I-cache is turned on. Note, however, that the BogoMIPS calculation is still tied to the external clock source and internal prescaler settings, so it should not be the only evidence that the core frequency really is what you think it is. A simple cross-check is to run 'sleep 10' at the shell prompt and time it with a watch, to confirm that you are at least in the ballpark. It is wise to measure your system more accurately than this with an oscilloscope (CRO) at least once.

Also beware that the BogoMIPS rating should not be used as a general CPU performance measure; see: http://linuxdoc.org/HOWTO/mini/BogoMips.html

19.2 Profiling

There are numerous options for system profiling, depending on what you wish to measure and how invasive you are prepared to be.

/proc/profile

/proc/profile is a standard kernel feature that provides simple kernel profiling based on Instruction Pointer sampling in the periodic timer interrupt routine. It is simplistic but effective, and its overhead is low because the interrupt is going to happen anyway. The data is processed with readprofile, which looks up System.map to show which kernel functions are using the most CPU time. It does not work for modules yet, so at present you need to compile them into the kernel for profiling.

You need to enable this at boot time by passing profile=2 on the kernel command line. The number gives the power-of-2 granularity used for the counters: 2 gives you a separate counter for each PowerPC instruction (4 bytes each). Higher numbers consume less memory and give less precise results. The data from /proc/profile is in target byte order, so if you are cross-developing you may need to either byte swap it or compile readprofile to run on your target.

The PowerPC branch of the Linux kernel has been slow to implement the Instruction Pointer sampling function needed to generate the /proc/profile data. If it is not implemented in your kernel, readprofile always shows zero time for every kernel function. In this case you need to apply the profile.patch from: http://members.xoom.com/greyhams/linux/patches/

Linux Trace Toolkit

http://www.opersys.com/LTT

The Linux Trace Toolkit works with an instrumented Linux kernel by saving time-stamped records of important kernel events to a binary data file. A data decoder converts the binary data to text and calculates statistical summaries, such as percent processor utilization by each process. The toolkit also includes an integrated environment that graphically displays the results and provides search capability.

A version for embedded PowerPC targets is now available from: ftp://ftp.mvista.com/pub/LTT

gprof

All the usual Linux user-mode profiling tools, such as gprof, are available.

kernprof

http://oss.sgi.com/projects/kernprof

This project aims to make full gprof profiling available for the kernel. However, it has not been ported to the PowerPC architecture yet.

19.3 IDMA

Beware that IDMA on the 860 is not designed for high performance, and the CPU gets better throughput with explicit cache-bursted programmed I/O. Search for IDMA for more discussion.

Confusion sometimes arises because DMA transfers are faster than CPU transfers in most systems, whereas here the reverse is generally true. Furthermore, IDMA transfers eat into CPM processing time, limiting throughput on other communications modules at the same time.

19.4 Network

To get good TCP/IP performance, you need a fast CPU. Using the FEC, a 50 MHz 860P will run about 30 Mbits/sec TCP/IP, and a 100 MHz 860P will run about 60 Mbits/sec TCP/IP. The bottleneck is the protocol and application processing in the PPC core. The performance of a TCP/IP connection scales nearly linearly with the processor speed.

If you need to go faster, use the 8260.

19.5 Optimisation

Optimising everything for space with gcc's -Os option is likely to provide both the smallest code size and the best performance, because it inhibits loop unrolling, which tends to hurt embedded processors with relatively small caches. Furthermore, PowerPC processors can speculatively execute branches overlapped with other loop instructions, making the branch effectively execute in zero cycles, so loop unrolling is unnecessary in many circumstances.

--

※ Source: BBS 水木清华站 smth.org [FROM: 166.111.161.8]
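
As a rough illustration of the /proc/profile notes in section 19.2, the sketch below shows the two pieces of arithmetic involved when the raw data is processed on a cross-development host: decoding counters stored in target (big-endian PowerPC) byte order, and mapping a byte offset into kernel text to a counter index for a given profile= value. It assumes the counters are 32-bit unsigned values, which may not hold for every kernel version. The function names decode_counters and bucket_for_offset are hypothetical helpers made up for this example; they are not part of readprofile or the kernel.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Decode one 32-bit big-endian value, independent of host byte order. */
static uint32_t be32_to_host(const unsigned char *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/*
 * Hypothetical helper: convert a raw buffer of big-endian counters
 * (ASSUMED to be 32 bits each) into host-order values.
 * Returns the number of counters written to out.
 */
static size_t decode_counters(const unsigned char *raw, size_t nbytes,
                              uint32_t *out, size_t max_out)
{
    size_t n = nbytes / 4;
    if (n > max_out)
        n = max_out;
    for (size_t i = 0; i < n; i++)
        out[i] = be32_to_host(raw + 4 * i);
    return n;
}

/*
 * Hypothetical helper: index of the counter covering a byte offset into
 * kernel text. With profile=2 each counter covers 1 << 2 = 4 bytes,
 * i.e. one PowerPC instruction.
 */
static size_t bucket_for_offset(size_t text_offset, unsigned shift)
{
    return text_offset >> shift;
}

#ifdef PROFILE_DEMO_MAIN
int main(void)
{
    /* Two made-up big-endian counters, for demonstration only. */
    const unsigned char raw[8] = { 0, 0, 0, 4,  0, 0, 1, 0 };
    uint32_t counts[2];
    size_t n = decode_counters(raw, sizeof raw, counts, 2);

    assert(n == 2);
    for (size_t i = 0; i < n; i++)
        printf("counter %zu = %lu\n", i, (unsigned long)counts[i]);
    printf("offset 0x100 at profile=2 -> counter %zu\n",
           bucket_for_offset(0x100, 2));
    return 0;
}
#endif
```

Building with -DPROFILE_DEMO_MAIN enables the small demo. Decoding byte by byte avoids any dependence on the host's endianness, so the same code works on a little-endian x86 host and on the big-endian target.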
