2.4.0-test10-pre4 was slower than 2.2.14 in all cases tested.

<p>I should show results for pipes as well as socketpairs.

<p>The Linux 2.2.14 /dev/poll driver printed messages to the console
when sockets were closed; this should probably be disabled for production.

<h3>kqueue()</h3>

It looks offhand like kqueue() performs best of all the tested methods.
It's even faster than, and scales better than, /dev/poll, at least in
this microbenchmark.

<h3>/dev/poll vs. poll</h3>

In all cases tested involving sockets, /dev/poll was appreciably faster
than poll().

<p>The 2.2.14 Linux /dev/poll driver was about six times faster than
poll() for 1000 fds, but fell down to only 2.7 times faster at 10000 fds.
The Solaris /dev/poll driver was about seven times faster than poll() at
100 fds, and increased to 40 times faster at 10000 fds.

<h3>Scalability of poll() and /dev/poll</h3>

Under Solaris 7, when the number of idle sockets was increased from 100
to 10000, the time to check for active sockets with poll() and /dev/poll
increased by a factor of only 6.5 (good) and 1.5 (fantastic), respectively.

<p>Under Linux 2.2.14, when the number of idle sockets was increased from
100 to 10000, the time to check for active sockets with poll() and
/dev/poll increased by a factor of 493 and 224, respectively.  This is
terribly, horribly bad scaling behavior.

<p>Under Linux 2.4.0-test10-pre4, when the number of idle sockets was
increased from 100 to 10000, the time to check for active sockets with
poll() increased by a factor of 300.  This is terribly, horribly bad
scaling behavior.

<p>There seems to be a scalability problem in poll() under both Linux
2.2.14 and 2.4.0-test10-pre4, and in /dev/poll under Linux 2.2.14.

<p>poll() is stuck with an interface that dictates O(n) behavior in total
pipes; still, Linux's implementation could be improved.  The design of
the current Linux /dev/poll patch is O(n) in total pipes, in spite of the
fact that its interface allows it to be O(1) in total pipes and O(n) only
in <i>active</i> pipes; the sketches below illustrate these usage patterns.

<p>See also the
<a href="http://boudicca.tux.org/hypermail/linux-kernel/2000week44/index.html#9">recent
discussions on linux-kernel</a>.
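<p>To make that interface argument concrete, here is a minimal sketch of
the two usage patterns.  This is illustration only, not code from the
Poller library; error handling is omitted, the function names are made
up, and struct dvpoll / DP_POLL are as in the Solaris driver (the Linux
patch mimics them, though the header location differs):
<pre>
#include <fcntl.h>
#include <poll.h>
#include <sys/devpoll.h>   /* Solaris; location differs for the Linux patch */
#include <sys/ioctl.h>
#include <unistd.h>

/* poll(): the full array of n pollfds crosses into the kernel on every
 * call, and the kernel must rescan all n fds -- O(n) in total fds by
 * interface, no matter how few are active. */
int wait_with_poll(struct pollfd* fds, int nfds, int timeout_ms)
{
    return poll(fds, nfds, timeout_ms);     /* returns count of active fds */
}

/* /dev/poll: interest is registered once by write()ing pollfds to the
 * device; each later DP_POLL ioctl fills 'ready' with active fds only.
 * Nothing in this interface forces the kernel to touch idle fds, so an
 * implementation could be O(1) in total fds and O(n) in active fds. */
int wait_with_devpoll(struct pollfd* fds, int nfds,
                      struct pollfd* ready, int maxready, int timeout_ms)
{
    static int dpfd = -1;
    if (dpfd == -1) {
        dpfd = open("/dev/poll", O_RDWR);
        write(dpfd, fds, nfds * sizeof(struct pollfd));  /* register once */
    }
    struct dvpoll dvp;
    dvp.dp_fds = ready;                     /* buffer for active fds only */
    dvp.dp_nfds = maxready;
    dvp.dp_timeout = timeout_ms;
    return ioctl(dpfd, DP_POLL, &dvp);      /* returns count of active fds */
}
</pre>
The numbers above suggest the 2.2.14 Linux driver does not cash in on
this: dp_poll() apparently still walks the whole registered set, which
fits both its 224x slowdown from 100 to 10000 idle sockets and the
dp_poll() hotspot in the profiles below.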
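<p>kqueue() gets the same active-fds-only property through the separate
changelist and eventlist arguments of kevent().  A corresponding sketch,
with the same caveats as above (BSD-only API, made-up function name):
<pre>
#include <sys/types.h>
#include <sys/event.h>
#include <sys/time.h>

/* kqueue: register interest once via the changelist side of kevent(),
 * then each call using only the eventlist side returns active fds only. */
int wait_with_kqueue(const int* fds, int nfds,
                     struct kevent* ready, int maxready, int timeout_ms)
{
    static int kq = -1;
    if (kq == -1) {
        kq = kqueue();
        for (int i = 0; i < nfds; i++) {
            struct kevent ev;
            EV_SET(&ev, fds[i], EVFILT_READ, EV_ADD, 0, 0, 0);
            kevent(kq, &ev, 1, 0, 0, 0);    /* register; don't wait */
        }
    }
    struct timespec ts = { timeout_ms / 1000,
                           (timeout_ms % 1000) * 1000000 };
    return kevent(kq, 0, 0, ready, maxready, &ts);  /* active fds only */
}
</pre>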
<h2><a name="profiling">Results - kernel profiling</a></h2>

To look for the scalability problem, I added support to the benchmark to
trigger the Linux kernel profiler.  A few results are shown below.  (No
smoking gun was found, but then, I wouldn't know a smoking gun if it hit
me in the face.  Perhaps real kernel hackers can pick up the hunt from
here.)

<p>If you run the above test on a Linux system booted with 'profile=2',
Poller_bench will output one kernel profiling data file per test
condition.  Poller_bench.sh does a gross analysis using
'readprofile | sort -rn | head > bench%d%c.top' to find the kernel
functions with the highest CPU usage, where %d is the number of
socketpairs, and %c is p for poll, d for /dev/poll, etc.

<p>'more bench10000*.top' shows the results for 10000 socketpairs.
On 2.2.14, it shows:
<pre>
::::::::::::::
bench10000d.dat.top
::::::::::::::
   901 total                                      0.0008
   833 dp_poll                                    1.4875
    27 do_bottom_half                             0.1688
     7 __get_request_wait                         0.0139
     4 startup_32                                 0.0244
     3 unix_poll                                  0.0203
::::::::::::::
bench10000p.dat.top
::::::::::::::
   584 total                                      0.0005
   236 unix_poll                                  1.5946
   162 sock_poll                                  4.5000
   148 do_poll                                    0.6727
    24 sys_poll                                   0.0659
     7 __generic_copy_from_user                   0.1167
</pre>
This seems to indicate that /dev/poll spends nearly all of its time in
dp_poll(), and poll spends a fair bit of time in three routines:
unix_poll, sock_poll, and do_poll.

<p>On 2.4.0-test10-pre4 smp, 'more bench10000*.top' shows:
<pre>
::::::::::::::
2.4/bench10000p.dat.top
::::::::::::::
  1507 total                                      0.0011
   748 default_idle                              14.3846
   253 unix_poll                                  1.9167
   209 fget                                       2.4881
   195 sock_poll                                  5.4167
    29 sys_poll                                   0.0342
    29 fput                                       0.1272
    29 do_pollfd                                  0.1648
</pre>
It seems curious that the idle routine should show up so much, but it's
probably just the second CPU doing nothing.
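<p>How the benchmark triggers the profiler isn't shown above, but a
minimal sketch of the usual mechanism follows.  This is an assumption
about how the triggering might look, not Poller_bench's actual code: on
a kernel booted with 'profile=2', any write to /proc/profile zeroes the
profiling counters (the same trick 'readprofile -r' uses), so clearing
them right before each test condition and snapshotting /proc/profile
right after yields one data file per condition.
<pre>
#include <fcntl.h>
#include <unistd.h>

/* Reset the kernel profiling counters (requires root, and a kernel
 * booted with profile=2).  The byte written is ignored; the write
 * itself clears the counters, exactly as 'readprofile -r' does. */
void reset_kernel_profile()
{
    int fd = open("/proc/profile", O_WRONLY);
    if (fd >= 0) {
        char c = 0;
        write(fd, &c, 1);
        close(fd);
    }
}
</pre>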
<p>Poller_bench.sh will also try to do a fine analysis of dp_poll() using
the 'profile' tool (source included), which is a variant of readprofile
that shows hotspots within kernel functions.  Looking at its output for
the run on 2.2.14, the three four-byte regions that take up the most CPU
time in dp_poll() in the 10000 socketpair case are
<pre>
   c01d9158  39.135654%           326
   c01d9174  11.404561%            95
   c01d91a0  27.250900%           227
</pre>
Looking at the output of 'objdump -d /usr/src/linux/vmlinux', that
region corresponds to the object code:
<pre>
c01d9158:  c7 44 24 14 00 00 00   movl   $0x0,0x14(%esp,1)
c01d915f:  00
c01d9160:  8b 74 24 24            mov    0x24(%esp,1),%esi
c01d9164:  8b 86 8c 04 00 00      mov    0x48c(%esi),%eax
c01d916a:  3b 50 04               cmp    0x4(%eax),%edx
c01d916d:  73 0a                  jae    c01d9179 <dp_poll+0xc5>
c01d916f:  8b 40 10               mov    0x10(%eax),%eax
c01d9172:  8b 14 90               mov    (%eax,%edx,4),%edx
c01d9175:  89 54 24 14            mov    %edx,0x14(%esp,1)
c01d9179:  83 7c 24 14 00         cmpl   $0x0,0x14(%esp,1)
c01d917e:  75 12                  jne    c01d9192 <dp_poll+0xde>
c01d9180:  53                     push   %ebx
c01d9181:  ff 74 24 3c            pushl  0x3c(%esp,1)
c01d9185:  e8 5a fc ff ff         call   c01d8de4 <dp_delete>
c01d918a:  83 c4 08               add    $0x8,%esp
c01d918d:  e9 d1 00 00 00         jmp    c01d9263 <dp_poll+0x1af>
c01d9192:  8b 7c 24 10            mov    0x10(%esp,1),%edi
c01d9196:  0f bf 4f 06            movswl 0x6(%edi),%ecx
c01d919a:  31 c0                  xor    %eax,%eax
c01d919c:  f0 0f b3 43 10         lock btr %eax,0x10(%ebx)
c01d91a1:  19 c0                  sbb    %eax,%eax
</pre>
I'm not yet familiar enough with kernel hacker tools to associate those
with lines of code in /usr/src/linux/drivers/char/devpoll.c, but that
'lock btr' hotspot appears to be the call to test_and_clear_bit().
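<p>For reference, the 'lock btr' / 'sbb' pair at c01d919c is the usual
x86 expansion of the kernel's test_and_clear_bit().  A sketch of that
implementation, reconstructed from memory of the 2.2-era asm/bitops.h
rather than copied from it:
<pre>
/* 'lock btr' atomically copies bit 'nr' of *addr into the carry flag
 * and clears it in memory; 'sbb %0,%0' then turns the carry flag into
 * 0 or -1 (the sbb instruction visible at c01d91a1 above). */
static inline int test_and_clear_bit(int nr, volatile void* addr)
{
    int oldbit;
    __asm__ __volatile__(
        "lock; btrl %2,%1\n\t"
        "sbbl %0,%0"
        : "=r" (oldbit), "+m" (*(volatile long*)addr)
        : "Ir" (nr)
        : "memory", "cc");
    return oldbit;   /* nonzero iff the bit was previously set */
}
</pre>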
<h2><a name="lmbench">lmbench results</a></h2>

lmbench results are presented here to help people trying to compare the
Intel and Sparc parts of the results shown above.

<p>The source used was lmbench-2alpha10 from bitmover.com.
I did not check into why the TCP test failed on the linux box.
<pre>
                 L M B E N C H  1 . 9   S U M M A R Y
                 ------------------------------------
                 (Alpha software, do not distribute)

 Processor, Processes - times in microseconds - smaller is better
----------------------------------------------------------------
Host                 OS  Mhz null null      open selct sig  sig  fork exec sh
                             call  I/O stat clos       inst hndl proc proc proc
--------- ------------- ---- ---- ---- ---- ---- ----- ---- ---- ---- ---- ----
sparc-sun     SunOS 5.7  167  2.9  12.   48   55 0.40K  6.6   81 3.8K  15K  32K
i686-linu Linux 2.2.14d  651  0.5  0.8    4    5 0.03K  1.4    2 0.3K   1K   6K

 Context switching - times in microseconds - smaller is better
-------------------------------------------------------------
Host                 OS 2p/0K 2p/16K 2p/64K 8p/16K 8p/64K 16p/16K 16p/64K
                        ctxsw  ctxsw  ctxsw ctxsw  ctxsw   ctxsw   ctxsw
--------- ------------- ----- ------ ------ ------ ------ ------- -------
sparc-sun     SunOS 5.7   19     69    235   114    349     116     367
i686-linu Linux 2.2.14d    1      5     17     5    129      30     129

 *Local* Communication latencies in microseconds - smaller is better
-------------------------------------------------------------------
Host                 OS 2p/0K  Pipe AF     UDP  RPC/   TCP  RPC/ TCP
                        ctxsw       UNIX         UDP         TCP conn
--------- ------------- ----- ----- ---- ----- ----- ----- ----- ----
sparc-sun     SunOS 5.7    19    60  120   197         215       1148
i686-linu Linux 2.2.14d     1     7   13    31          80

 File & VM system latencies in microseconds - smaller is better
--------------------------------------------------------------
Host                 OS   0K File      10K File      Mmap    Prot    Page
                        Create Delete Create Delete  Latency Fault   Fault
--------- ------------- ------ ------ ------ ------  ------- -----   -----
sparc-sun     SunOS 5.7                                 6605    15    5.2K
i686-linu Linux 2.2.14d     10      0     19      1     5968     1    0.5K

 *Local* Communication bandwidths in MB/s - bigger is better
-----------------------------------------------------------
Host                OS  Pipe AF    TCP  File   Mmap  Bcopy  Bcopy  Mem   Mem
                             UNIX      reread reread (libc) (hand) read write
--------- ------------- ---- ---- ---- ------ ------ ------ ------ ---- -----
sparc-sun     SunOS 5.7   60   55   54     84    122    177     89  122   141
i686-linu Linux 2.2.14d  528  366   -1    357    451    150    138  451   171

 Memory latencies in nanoseconds - smaller is better
    (WARNING - may not be correct, check graphs)
---------------------------------------------------
Host                 OS   Mhz  L1 $   L2 $    Main mem    Guesses
--------- -------------   ---  ----   ----    --------    -------
sparc-sun     SunOS 5.7   167    12     59         273
i686-linu Linux 2.2.14d   651     4     10         131
</pre>
<hr>
<i>Dan Kegel</i><br>
<a href="http://www.kegel.com/">www.kegel.com</a>
</body>
</html>
