the status report contains timing indications. For highest
performance, set <code>ExtendedStatus off</code> (which is the
default).</p>
<h3>accept Serialization - multiple sockets</h3>
<div class="warning"><h3>Warning:</h3>
<p>This section has not been fully updated
to take into account changes made in the 2.0 version of the
Apache HTTP Server. Some of the information may still be
relevant, but please use it with care.</p>
</div>
<p>This discusses a shortcoming in the Unix socket API. Suppose
your web server uses multiple <code class="directive"><a href="../mod/mpm_common.html#listen">Listen</a></code> statements to listen on either multiple
ports or multiple addresses. In order to test each socket
to see if a connection is ready Apache uses
<code>select(2)</code>. <code>select(2)</code> indicates that a
socket has <em>zero</em> or <em>at least one</em> connection
waiting on it. Apache's model includes multiple children, and
all the idle ones test for new connections at the same time. A
naive implementation looks something like this (these examples
do not match the code; they're contrived for pedagogical
purposes):</p>
<div class="example"><p><code>
for (;;) {<br />
<span class="indent">
for (;;) {<br />
<span class="indent">
fd_set accept_fds;<br />
<br />
FD_ZERO (&accept_fds);<br />
for (i = first_socket; i <= last_socket; ++i) {<br />
<span class="indent">
FD_SET (i, &accept_fds);<br />
</span>
}<br />
rc = select (last_socket+1, &accept_fds, NULL, NULL, NULL);<br />
if (rc < 1) continue;<br />
new_connection = -1;<br />
for (i = first_socket; i <= last_socket; ++i) {<br />
<span class="indent">
if (FD_ISSET (i, &accept_fds)) {<br />
<span class="indent">
new_connection = accept (i, NULL, NULL);<br />
if (new_connection != -1) break;<br />
</span>
}<br />
</span>
}<br />
if (new_connection != -1) break;<br />
</span>
}<br />
process the new_connection;<br />
</span>
}
</code></p></div>
<p>But this naive implementation has a serious starvation problem.
Recall that multiple children execute this loop at the same
time, and so multiple children will block at
<code>select</code> when they are in between requests. All
those blocked children will awaken and return from
<code>select</code> when a single request appears on any socket
(the number of children which awaken varies depending on the
operating system and timing issues). They will all then fall
down into the loop and try to <code>accept</code> the
connection. But only one will succeed (assuming there's still
only one connection ready), the rest will be <em>blocked</em>
in <code>accept</code>. This effectively locks those children
into serving requests from that one socket and no other
sockets, and they'll be stuck there until enough new requests
appear on that socket to wake them all up. This starvation
problem was first documented in <a href="http://bugs.apache.org/index/full/467">PR#467</a>. There
are at least two solutions.</p>
<p>One solution is to make the sockets non-blocking. In this
case the <code>accept</code> won't block the children, and they
will be allowed to continue immediately. But this wastes CPU
time. Suppose you have ten idle children in
<code>select</code>, and one connection arrives. Then nine of
those children will wake up, try to <code>accept</code> the
connection, fail, and loop back into <code>select</code>,
accomplishing nothing. Meanwhile none of those children are
servicing requests that occurred on other sockets until they
get back up to the <code>select</code> again. Overall this
solution does not seem very fruitful unless you have as many
idle CPUs (in a multiprocessor box) as you have idle children,
not a very likely situation.</p>
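<p>As a rough sketch only (this is not Apache's actual code; the
descriptor name <code>sd</code> and the error handling are simplified
for illustration), each listening socket would be put into
non-blocking mode, and a child that loses the race gets
<code>EWOULDBLOCK</code> from <code>accept</code> instead of
blocking:</p>
<div class="example"><p><code>
#include <fcntl.h><br />
#include <errno.h><br />
<br />
/* at startup: mark each listening socket sd non-blocking */<br />
int flags = fcntl (sd, F_GETFL, 0);<br />
fcntl (sd, F_SETFL, flags | O_NONBLOCK);<br />
<br />
/* in the accept loop: a losing child fails immediately rather than blocking */<br />
new_connection = accept (sd, NULL, NULL);<br />
if (new_connection == -1 && (errno == EWOULDBLOCK || errno == EAGAIN)) {<br />
<span class="indent">
continue;  /* another child took the connection; go back to select */<br />
</span>
}<br />
</code></p></div>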
<p>Another solution, the one used by Apache, is to serialize
entry into the inner loop. The loop looks like this
(differences highlighted):</p>
<div class="example"><p><code>
for (;;) {<br />
<span class="indent">
<strong>accept_mutex_on ();</strong><br />
for (;;) {<br />
<span class="indent">
fd_set accept_fds;<br />
<br />
FD_ZERO (&accept_fds);<br />
for (i = first_socket; i <= last_socket; ++i) {<br />
<span class="indent">
FD_SET (i, &accept_fds);<br />
</span>
}<br />
rc = select (last_socket+1, &accept_fds, NULL, NULL, NULL);<br />
if (rc < 1) continue;<br />
new_connection = -1;<br />
for (i = first_socket; i <= last_socket; ++i) {<br />
<span class="indent">
if (FD_ISSET (i, &accept_fds)) {<br />
<span class="indent">
new_connection = accept (i, NULL, NULL);<br />
if (new_connection != -1) break;<br />
</span>
}<br />
</span>
}<br />
if (new_connection != -1) break;<br />
</span>
}<br />
<strong>accept_mutex_off ();</strong><br />
process the new_connection;<br />
</span>
}
</code></p></div>
<p><a id="serialize" name="serialize">The functions</a>
<code>accept_mutex_on</code> and <code>accept_mutex_off</code>
implement a mutual exclusion semaphore. Only one child can have
the mutex at any time. There are several choices for
implementing these mutexes. The choice is defined in
<code>src/conf.h</code> (pre-1.3) or
<code>src/include/ap_config.h</code> (1.3 or later). Some
architectures do not have any locking choice made; on these
architectures it is unsafe to use multiple
<code class="directive"><a href="../mod/mpm_common.html#listen">Listen</a></code>
directives.</p>
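<p>As an illustration only (the real implementation in the server
source handles errors, signals and child shutdown), a
<code>flock(2)</code>-based pair could be sketched roughly like this,
assuming <code>lock_fd</code> is a descriptor opened on the lock file
before the children are forked:</p>
<div class="example"><p><code>
#include <sys/file.h><br />
#include <errno.h><br />
<br />
static int lock_fd = -1;   /* opened on the lock file at startup */<br />
<br />
void accept_mutex_on (void)<br />
{<br />
<span class="indent">
/* block until this child holds the exclusive lock, retrying on signals */<br />
while (flock (lock_fd, LOCK_EX) < 0 && errno == EINTR)<br />
<span class="indent">
continue;<br />
</span>
</span>
}<br />
<br />
void accept_mutex_off (void)<br />
{<br />
<span class="indent">
flock (lock_fd, LOCK_UN);   /* release the lock so another child may enter */<br />
</span>
}<br />
</code></p></div>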
<p>The directive <code class="directive"><a href="../mod/mpm_common.html#acceptmutex">AcceptMutex</a></code> can be used to
change the selected mutex implementation at run-time.</p>
<dl>
<dt><code>AcceptMutex flock</code></dt>
<dd>
<p>This method uses the <code>flock(2)</code> system call to
lock a lock file (located by the <code class="directive"><a href="../mod/mpm_common.html#lockfile">LockFile</a></code> directive).</p>
</dd>
<dt><code>AcceptMutex fcntl</code></dt>
<dd>
<p>This method uses the <code>fcntl(2)</code> system call to
lock a lock file (located by the <code class="directive"><a href="../mod/mpm_common.html#lockfile">LockFile</a></code> directive).</p>
</dd>
<dt><code>AcceptMutex sysvsem</code></dt>
<dd>
<p>(1.3 or later) This method uses SysV-style semaphores to
implement the mutex. Unfortunately SysV-style semaphores have
some bad side-effects. One is that it's possible Apache will
die without cleaning up the semaphore (see the
<code>ipcs(8)</code> man page). The other is that the
semaphore API allows for a denial of service attack by any
CGIs running under the same uid as the webserver
(<em>i.e.</em>, all CGIs, unless you use something like
<code class="program"><a href="../programs/suexec.html">suexec</a></code> or <code>cgiwrapper</code>). For these
reasons this method is not used on any architecture except
IRIX (where the previous two are prohibitively expensive
on most IRIX boxes).</p>
</dd>
<dt><code>AcceptMutex pthread</code></dt>
<dd>
<p>(1.3 or later) This method uses POSIX mutexes and should
work on any architecture implementing the full POSIX threads
specification; however, it appears to work only on Solaris (2.5
or later), and even then only in certain configurations. If
you experiment with this, you should watch out for your server
hanging and not responding. Servers that serve only static
content may work just fine.</p>
</dd>
<dt><code>AcceptMutex posixsem</code></dt>
<dd>
<p>(2.0 or later) This method uses POSIX semaphores. The
semaphore ownership is not recovered if a thread in the process
holding the mutex segfaults, resulting in a hang of the web
server.</p>
</dd>
</dl>
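<p>For example, to select the <code>flock</code> method explicitly
and keep the lock file on a local disk, the configuration might look
like this (the path shown is only illustrative):</p>
<div class="example"><p><code>
AcceptMutex flock<br />
LockFile /var/run/httpd/accept.lock<br />
</code></p></div>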
<p>If your system has another method of serialization that
isn't in the above list, then it may be worthwhile adding code
for it to APR.</p>
<p>Another solution that has been considered but never
implemented is to partially serialize the loop -- that is, let
in a certain number of processes. This would only be of
interest on multiprocessor boxes where it's possible multiple
children could run simultaneously, and the serialization
actually doesn't take advantage of the full bandwidth. This is
a possible area of future investigation, but priority remains
low because highly parallel web servers are not the norm.</p>
<p>Ideally you should run servers without multiple
<code class="directive"><a href="../mod/mpm_common.html#listen">Listen</a></code>
statements if you want the highest performance.
But read on.</p>
<h3>accept Serialization - single socket</h3>
<p>The above is fine and dandy for multiple socket servers, but
what about single socket servers? In theory they shouldn't
experience any of these same problems because all children can
just block in <code>accept(2)</code> until a connection
arrives, and no starvation results. In practice this hides
almost the same "spinning" behaviour discussed above in the
non-blocking solution. The way that most TCP stacks are
implemented, the kernel actually wakes up all processes blocked
in <code>accept</code> when a single connection arrives. One of
those processes gets the connection and returns to user-space,
the rest spin in the kernel and go back to sleep when they
discover there's no connection for them. This spinning is
hidden from the user-land code, but it's there nonetheless.
This can result in the same load-spiking wasteful behaviour
that a non-blocking solution to the multiple sockets case
can.</p>
<p>For this reason we have found that many architectures behave
more "nicely" if we serialize even the single socket case. So
this is actually the default in almost all cases. Crude
experiments under Linux (2.0.30 on a dual Pentium pro 166
w/128Mb RAM) have shown that the serialization of the single
socket case causes less than a 3% decrease in requests per
second over unserialized single-socket. But unserialized
single-socket showed an extra 100ms latency on each request.
This latency is probably a wash on long haul lines, and only an
issue on LANs. If you want to override the single socket
serialization you can define
<code>SINGLE_LISTEN_UNSERIALIZED_ACCEPT</code> and then
single-socket servers will not serialize at all.</p>
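<p>One way to define this symbol (shown only as an illustration;
adjust it to your own build procedure) is to pass it in through
<code>CFLAGS</code> when configuring the source tree:</p>
<div class="example"><p><code>
CFLAGS="-DSINGLE_LISTEN_UNSERIALIZED_ACCEPT" ./configure --prefix=/usr/local/apache2<br />
</code></p></div>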
<h3>Lingering Close</h3>