<UL>
<LI>Whether and how to issue multiple I/O calls from a single thread
<UL>
<LI>Don't; use blocking/synchronous calls throughout, and possibly use multiple threads or processes to achieve concurrency
<LI>Use nonblocking calls (e.g. write() on a socket set to O_NONBLOCK) to start I/O, and readiness notification (e.g. poll() or /dev/poll) to know when it's OK to start the next I/O on that channel. Generally only usable with network I/O, not disk I/O. (See the short fcntl() sketch after this list.)
<LI>Use asynchronous calls (e.g. aio_write()) to start I/O, and completion notification (e.g. signals or completion ports) to know when the I/O finishes. Good for both network and disk I/O. </LI></UL>
<LI>How to control the code servicing each client
<UL>
<LI>one process for each client (classic Unix approach, used since 1980 or so)
<LI>one OS-level thread handles many clients; each client is controlled by:
<UL>
<LI>a user-level thread (e.g. GNU state threads, classic Java with green threads)
<LI>a state machine (a bit esoteric, but popular in some circles; my favorite)
<LI>a continuation (a bit esoteric, but popular in some circles) </LI></UL>
<LI>one OS-level thread for each client (e.g. classic Java with native threads)
<LI>one OS-level thread for each active client (e.g. Tomcat with apache front end; NT completion ports; thread pools) </LI></UL>
<LI>Whether to use standard O/S services, or put some code into the kernel (e.g. in a custom driver, kernel module, or VxD) </LI></UL>
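<P>To make the "nonblocking calls" option above concrete, here is a minimal sketch of switching a socket into nonblocking mode with fcntl(); the helper name set_nonblocking() is illustrative and error handling is abbreviated. After this call, read() and write() on the socket return -1 with errno set to EWOULDBLOCK/EAGAIN instead of blocking.
<PRE>
/* Sketch only: put an already-created socket into nonblocking mode. */
#include <fcntl.h>

static int set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL, 0);          /* fetch current file status flags */
    if (flags == -1)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK);
}
</PRE>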
<P>The following five combinations seem to be popular:
<OL>
<LI><A href="http://www.kegel.com/c10k.html#nb">Serve many clients with each thread, and use nonblocking I/O and <B>level-triggered</B> readiness notification</A>
<LI><A href="http://www.kegel.com/c10k.html#nb.edge">Serve many clients with each thread, and use nonblocking I/O and readiness <B>change</B> notification</A>
<LI><A href="http://www.kegel.com/c10k.html#aio">Serve many clients with each server thread, and use asynchronous I/O</A>
<LI><A href="http://www.kegel.com/c10k.html#threaded">Serve one client with each server thread, and use blocking I/O</A>
<LI><A href="http://www.kegel.com/c10k.html#kio">Build the server code into the kernel</A> </LI></OL>
<H3><A name=nb>1. Serve many clients with each thread, and use nonblocking I/O and <B>level-triggered</B> readiness notification</A></H3>
<P>... set nonblocking mode on all network handles, and use select() or poll() to tell which network handle has data waiting. This is the traditional favorite. With this scheme, the kernel tells you whether a file descriptor is ready, whether or not you've done anything with that file descriptor since the last time the kernel told you about it. (The name 'level triggered' comes from computer hardware design; it's the opposite of <A href="http://www.kegel.com/c10k.html#nb.edge">'edge triggered'</A>. Jonathan Lemon introduced the terms in his <A href="http://people.freebsd.org/~jlemon/papers/kqueue.pdf">BSDCon 2000 paper on kqueue()</A>.)
<P>Note: it's particularly important to remember that readiness notification from the kernel is only a hint; the file descriptor might not be ready anymore when you try to read from it. That's why it's important to use nonblocking mode when using readiness notification.
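<P>A minimal sketch of such a level-triggered loop, using select() (the FD_SETSIZE limit is discussed under select() below). All sockets are nonblocking, and a read() that comes back with EAGAIN/EWOULDBLOCK is simply retried on a later pass, since the notification was only a hint. The serve() helper and the single-read-per-wakeup policy are illustrative; listening-socket setup and most error handling are omitted.
<PRE>
/* Sketch only: level-triggered event loop with select().
 * Assumes listen_fd is a nonblocking listening socket. */
#include <sys/select.h>
#include <sys/socket.h>
#include <fcntl.h>
#include <unistd.h>
#include <errno.h>

void serve(int listen_fd)
{
    fd_set readable;
    int fds[FD_SETSIZE], nfds = 0;
    char buf[4096];

    for (;;) {
        FD_ZERO(&readable);
        FD_SET(listen_fd, &readable);
        int maxfd = listen_fd;
        for (int i = 0; i < nfds; i++) {
            FD_SET(fds[i], &readable);
            if (fds[i] > maxfd)
                maxfd = fds[i];
        }

        if (select(maxfd + 1, &readable, NULL, NULL, NULL) < 0)
            continue;                         /* EINTR etc.: just wait again */

        if (FD_ISSET(listen_fd, &readable)) {
            int c = accept(listen_fd, NULL, NULL);
            if (c >= 0 && c < FD_SETSIZE) {   /* select() can't track fds >= FD_SETSIZE */
                fcntl(c, F_SETFL, fcntl(c, F_GETFL, 0) | O_NONBLOCK);
                fds[nfds++] = c;
            } else if (c >= 0) {
                close(c);
            }
        }

        for (int i = 0; i < nfds; i++) {
            if (!FD_ISSET(fds[i], &readable))
                continue;
            ssize_t n = read(fds[i], buf, sizeof buf);
            if (n > 0) {
                /* hand buf[0..n) to this client's state machine */
            } else if (n == 0 || (errno != EAGAIN && errno != EWOULDBLOCK)) {
                close(fds[i]);                /* EOF or hard error */
                fds[i--] = fds[--nfds];
            }
            /* EAGAIN/EWOULDBLOCK: readiness was only a hint; try next pass */
        }
    }
}
</PRE>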
<P>An important bottleneck in this method is that read() or sendfile() from disk blocks if the page is not in core at the moment; setting nonblocking mode on a disk file handle has no effect. Same thing goes for memory-mapped disk files. The first time a server needs disk I/O, its process blocks, all clients must wait, and that raw nonthreaded performance goes to waste. <BR>This is what asynchronous I/O is for, but on systems that lack AIO, worker threads or processes that do the disk I/O can also get around this bottleneck. One approach is to use memory-mapped files, and if mincore() indicates I/O is needed, ask a worker to do the I/O, and continue handling network traffic. Jef Poskanzer mentions that Pai, Druschel, and Zwaenepoel's 1999 <A href="http://www.cs.rice.edu/~vivek/flash99/">Flash</A> web server uses this trick; they gave a talk at <A href="http://www.usenix.org/events/usenix99/technical.html">Usenix '99</A> on it. It looks like mincore() is available in BSD-derived Unixes like <A href="http://www.freebsd.org/cgi/man.cgi?query=mincore">FreeBSD</A> and Solaris, but is not part of the <A href="http://www.unix-systems.org/">Single Unix Specification</A>. It's available as part of Linux as of kernel 2.3.51, <A href="http://www.citi.umich.edu/projects/citi-netscape/status/mar-apr2000.html">thanks to Chuck Lever</A>.
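<P>One way to picture the mmap()/mincore() trick: before touching a chunk of a memory-mapped file, ask mincore() whether its pages are resident; if not, queue the request for a worker instead of faulting (and blocking) in the network thread. A rough sketch, using the Linux prototype of mincore() (BSD declares the vector as char *); the helper name pages_resident() is illustrative.
<PRE>
/* Sketch: decide whether reading [off, off+len) of a memory-mapped file
 * would block on disk.  'map' is the address returned by mmap() for the
 * whole file.  Returns 1 if all pages are resident, 0 if a worker should
 * fault them in first, -1 on error. */
#include <sys/mman.h>
#include <unistd.h>
#include <stdlib.h>

int pages_resident(void *map, size_t off, size_t len)
{
    long pagesize = sysconf(_SC_PAGESIZE);
    char *start = (char *)map + (off & ~(size_t)(pagesize - 1)); /* page-align */
    size_t span = len + (off - (size_t)(start - (char *)map));
    size_t npages = (span + pagesize - 1) / pagesize;

    unsigned char *vec = malloc(npages);       /* one status byte per page */
    if (!vec)
        return -1;
    if (mincore(start, span, vec) != 0) {      /* Linux prototype; BSD wants (char *) */
        free(vec);
        return -1;
    }
    int resident = 1;
    for (size_t i = 0; i < npages; i++)
        if (!(vec[i] & 1)) {                   /* low bit set = page in core */
            resident = 0;
            break;
        }
    free(vec);
    return resident;
}
</PRE>
The network thread would then either serve the data directly (pages resident) or hand the request to a worker that touches the pages and re-queues the reply.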
<P>But <A href="http://marc.theaimsgroup.com/?l=freebsd-hackers&m=106718343317930&w=2">in November 2003 on the freebsd-hackers list, Vivek Pai et al reported</A> very good results using system-wide profiling of their Flash web server to attack bottlenecks. One bottleneck they found was mincore (guess that wasn't such a good idea after all). Another was the fact that sendfile blocks on disk access; they improved performance by introducing a modified sendfile() that returns something like EWOULDBLOCK when the disk page it's fetching is not yet in core. (Not sure how you tell the user the page is now resident... seems to me what's really needed here is aio_sendfile().) The end result of their optimizations is a SpecWeb99 score of about 800 on a 1GHz/1GB FreeBSD box, which is better than anything on file at spec.org.
<P>There are several ways for a single thread to tell which of a set of nonblocking sockets are ready for I/O:
<UL>
<LI><A name=nb.select><B>The traditional select()</B></A> <BR>Unfortunately, select() is limited to FD_SETSIZE handles. This limit is compiled into the standard library and user programs. (Some versions of the C library let you raise this limit at user app compile time.)
<P>See <A href="http://www.kegel.com/dkftpbench/doc/Poller_select.html">Poller_select</A> (<A href="http://www.kegel.com/dkftpbench/dkftpbench-0.44/Poller_select.cc">cc</A>, <A href="http://www.kegel.com/dkftpbench/dkftpbench-0.44/Poller_select.h">h</A>) for an example of how to use select() interchangeably with other readiness notification schemes.
<P></P>
<LI><A name=nb.poll><B>The traditional poll()</B></A> <BR>There is no hardcoded limit to the number of file descriptors poll() can handle, but it does get slow above a few thousand, since most of the file descriptors are idle at any one time, and scanning through thousands of file descriptors takes time.
<P>Some OS's (e.g. Solaris 8) speed up poll() et al by use of techniques like poll hinting, which was <A href="http://www.humanfactor.com/cgi-bin/cgi-delegate/apache-ML/nh/1999/May/0415.html">implemented and benchmarked by Niels Provos</A> for Linux in 1999.
<P>See <A href="http://www.kegel.com/dkftpbench/doc/Poller_poll.html">Poller_poll</A> (<A href="http://www.kegel.com/dkftpbench/dkftpbench-0.44/Poller_poll.cc">cc</A>, <A href="http://www.kegel.com/dkftpbench/dkftpbench-0.44/Poller_poll.h">h</A>, <A href="http://www.kegel.com/dkftpbench/Poller_bench.html">benchmarks</A>) for an example of how to use poll() interchangeably with other readiness notification schemes.
<P></P>
<LI><A name=nb./dev/poll><B>/dev/poll</B></A><BR>This is the recommended poll replacement for Solaris.
<P>The idea behind /dev/poll is to take advantage of the fact that often poll() is called many times with the same arguments. With /dev/poll, you get an open handle to /dev/poll, and tell the OS just once what files you're interested in by writing to that handle; from then on, you just read the set of currently ready file descriptors from that handle. (A sketch of this calling sequence appears after this list.)
<P>It appeared quietly in Solaris 7 (<A href="http://sunsolve.sun.com/pub-cgi/retrieve.pl?patchid=106541&collection=fpatches">see patchid 106541</A>) but its first public appearance was in <A href="http://docs.sun.com/ab2/coll.40.6/REFMAN7/@Ab2PageView/55123?Ab2Lang=C&Ab2Enc=iso-8859-1">Solaris 8</A>; <A href="http://www.sun.com/sysadmin/ea/poll.html">according to Sun</A>, at 750 clients, this has 10% of the overhead of poll().
<P>Various implementations of /dev/poll were tried on Linux, but none of them performs as well as epoll, and they were never really completed. /dev/poll use on Linux is not recommended.
<P>See <A href="http://www.kegel.com/dkftpbench/doc/Poller_devpoll.html">Poller_devpoll</A> (<A href="http://www.kegel.com/dkftpbench/dkftpbench-0.44/Poller_devpoll.cc">cc</A>, <A href="http://www.kegel.com/dkftpbench/dkftpbench-0.44/Poller_devpoll.h">h</A>, <A href="http://www.kegel.com/dkftpbench/Poller_bench.html">benchmarks</A>) for an example of how to use /dev/poll interchangeably with many other readiness notification schemes. (Caution - the example is for Linux /dev/poll, and might not work right on Solaris.)
<P></P>
<LI><B>kqueue()</B><BR>This is the recommended poll replacement for FreeBSD (and, soon, NetBSD).
<P><A href="http://www.kegel.com/c10k.html#nb.kqueue">See below.</A> kqueue() can specify either edge triggering or level triggering. </P></LI></UL>
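<P>For the /dev/poll item above, a sketch of the Solaris-style calling sequence: register interest once by writing struct pollfd records to the handle, then fetch ready descriptors with ioctl(DP_POLL). This assumes the sys/devpoll.h definitions (struct dvpoll, DP_POLL) as documented for Solaris; the dp_example() helper and MAX_EVENTS value are illustrative, and this is a sketch rather than a tested implementation.
<PRE>
/* Sketch of the Solaris-style /dev/poll interface. Error handling abbreviated. */
#include <sys/devpoll.h>   /* struct dvpoll, DP_POLL (Solaris) */
#include <stropts.h>       /* ioctl() on Solaris; sys/ioctl.h elsewhere */
#include <fcntl.h>
#include <poll.h>
#include <unistd.h>

#define MAX_EVENTS 1024

int dp_example(int *fds, int nfds)
{
    int dpfd = open("/dev/poll", O_RDWR);
    if (dpfd < 0)
        return -1;

    /* Tell the kernel once which descriptors we care about. */
    for (int i = 0; i < nfds; i++) {
        struct pollfd pfd = { .fd = fds[i], .events = POLLIN, .revents = 0 };
        write(dpfd, &pfd, sizeof pfd);
    }

    /* Each wakeup returns only descriptors that are currently ready. */
    struct pollfd ready[MAX_EVENTS];
    struct dvpoll dvp = { .dp_fds = ready, .dp_nfds = MAX_EVENTS, .dp_timeout = -1 };
    int n = ioctl(dpfd, DP_POLL, &dvp);
    for (int i = 0; i < n; i++) {
        /* ready[i].fd is readable; service it with nonblocking read()s */
    }

    close(dpfd);
    return n;
}
</PRE>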
<H3><A name=nb.edge>2. Serve many clients with each thread, and use nonblocking I/O and readiness <B>change</B> notification</A></H3>Readiness change notification (or edge-triggered readiness notification) means you give the kernel a file descriptor, and later, when that descriptor transitions from <I>not ready</I> to <I>ready</I>, the kernel notifies you somehow. It then assumes you know the file descriptor is ready, and will not send any more readiness notifications of that type for that file descriptor until you do something that causes the file descriptor to no longer be ready (e.g. until you receive the EWOULDBLOCK error on a send, recv, or accept call, or a send or recv transfers less than the requested number of bytes).
<P>When you use readiness change notification, you must be prepared for spurious events, since one common implementation is to signal readiness whenever any packets are received, regardless of whether the file descriptor was already ready.
<P>This is the opposite of "<A href="http://www.kegel.com/c10k.html#nb">level-triggered</A>" readiness notification. It's a bit less forgiving of programming mistakes, since if you miss just one event, the connection that event was for gets stuck forever. Nevertheless, I have found that edge-triggered readiness notification made programming nonblocking clients with OpenSSL easier, so it's worth trying.
<P><A href="http://www.cs.rice.edu/~druschel/usenix99event.ps.gz">[Banga, Mogul, Druschel '99]</A> described this kind of scheme in 1999.
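<P>In practice, "do something that causes the file descriptor to no longer be ready" usually means draining it: on each edge-triggered wakeup, keep calling read() (or accept()) until it returns EAGAIN/EWOULDBLOCK, so a fresh edge can occur later. A minimal sketch of that drain step, independent of the notification API; the drain() helper name and buffer size are illustrative.
<PRE>
/* Sketch: service one edge-triggered readiness event by draining the
 * (nonblocking) descriptor until the kernel says EAGAIN/EWOULDBLOCK.
 * Returns 0 normally, -1 if the connection should be closed. */
#include <unistd.h>
#include <errno.h>

int drain(int fd)
{
    char buf[4096];
    for (;;) {
        ssize_t n = read(fd, buf, sizeof buf);
        if (n > 0) {
            /* feed buf[0..n) to the protocol state machine and keep going */
            continue;
        }
        if (n == 0)
            return -1;                        /* peer closed the connection */
        if (errno == EAGAIN || errno == EWOULDBLOCK)
            return 0;    /* drained; a new edge will fire when more data arrives */
        if (errno == EINTR)
            continue;
        return -1;                            /* real error */
    }
}
</PRE>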
<P>There are several APIs which let the application retrieve 'file descriptor became ready' notifications:
<UL>
<LI><A name=nb.kqueue><B>kqueue()</B></A> This is the recommended edge-triggered poll replacement for FreeBSD (and, soon, NetBSD).
<P>FreeBSD 4.3 and later, and <A href="http://kerneltrap.org/node.php?id=472">NetBSD-current as of Oct 2002</A>, support a generalized alternative to poll() called <A href="http://www.freebsd.org/cgi/man.cgi?query=kqueue&apropos=0&sektion=0&manpath=FreeBSD+5.0-current&format=html">kqueue()/kevent()</A>; it supports both edge-triggering and level-triggering. (See also <A href="http://people.freebsd.org/~jlemon/">Jonathan Lemon's page</A> and his <A href="http://people.freebsd.org/~jlemon/papers/kqueue.pdf">BSDCon 2000 paper on kqueue()</A>.)
<P>Like /dev/poll, you allocate a listening object, but rather than opening the file /dev/poll, you call kqueue() to allocate one. To change the events you are watching for, or to get the list of current events, you call kevent(). (A sketch of this calling sequence follows.)
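<P>A rough sketch of that calling sequence, assuming a nonblocking socket fd; EV_CLEAR requests edge-triggered behaviour, and the kq_example() helper and MAX_EVENTS value are illustrative. Error handling is abbreviated.
<PRE>
/* Sketch: register a socket with kqueue() for edge-triggered read events
 * and wait for activity (FreeBSD/NetBSD). */
#include <sys/types.h>
#include <sys/event.h>
#include <sys/time.h>
#include <unistd.h>

#define MAX_EVENTS 64

int kq_example(int fd)
{
    int kq = kqueue();                     /* allocate the listening object */
    if (kq < 0)
        return -1;

    /* One kevent() call can both change the watched set and collect events;
     * here we just add 'fd'.  EV_CLEAR makes the filter edge-triggered. */
    struct kevent change;
    EV_SET(&change, fd, EVFILT_READ, EV_ADD | EV_CLEAR, 0, 0, NULL);
    if (kevent(kq, &change, 1, NULL, 0, NULL) < 0)
        return -1;

    /* Wait for readiness transitions. */
    struct kevent events[MAX_EVENTS];
    int n = kevent(kq, NULL, 0, events, MAX_EVENTS, NULL);
    for (int i = 0; i < n; i++) {
        int ready_fd = (int)events[i].ident;
        /* drain ready_fd until EWOULDBLOCK, as described above */
        (void)ready_fd;
    }
    close(kq);
    return n;
}
</PRE>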