the status report contains timing indications. For highest performance, set <code>ExtendedStatus off</code> (which is the default).</p>

<h3>accept Serialization - multiple sockets</h3>

<div class="warning"><h3>Warning:</h3>
<p>This section has not been fully updated to take into account changes made in the 2.x version of the Apache HTTP Server. Some of the information may still be relevant, but please use it with care.</p>
</div>

<p>This section discusses a shortcoming in the Unix socket API. Suppose your web server uses multiple <code class="directive"><a href="../mod/mpm_common.html#listen">Listen</a></code> statements to listen on either multiple ports or multiple addresses. In order to test each socket to see if a connection is ready, Apache uses <code>select(2)</code>. <code>select(2)</code> indicates that a socket has <em>zero</em> or <em>at least one</em> connection waiting on it. Apache's model includes multiple children, and all the idle ones test for new connections at the same time. A naive implementation looks something like this (these examples do not match the code; they are contrived for pedagogical purposes):</p>

<div class="example"><p><code>
for (;;) {<br />
<span class="indent">
for (;;) {<br />
<span class="indent">
fd_set accept_fds;<br />
<br />
FD_ZERO (&accept_fds);<br />
for (i = first_socket; i <= last_socket; ++i) {<br />
<span class="indent">
FD_SET (i, &accept_fds);<br />
</span>
}<br />
rc = select (last_socket+1, &accept_fds, NULL, NULL, NULL);<br />
if (rc < 1) continue;<br />
new_connection = -1;<br />
for (i = first_socket; i <= last_socket; ++i) {<br />
<span class="indent">
if (FD_ISSET (i, &accept_fds)) {<br />
<span class="indent">
new_connection = accept (i, NULL, NULL);<br />
if (new_connection != -1) break;<br />
</span>
}<br />
</span>
}<br />
if (new_connection != -1) break;<br />
</span>
}<br />
process the new_connection;<br />
</span>
}
</code></p></div>

<p>But this naive implementation has a serious starvation problem. Recall that multiple children execute this loop at the same time, and so multiple children will block at <code>select</code> when they are in between requests. All those blocked children will awaken and return from <code>select</code> when a single request appears on any socket (the number of children which awaken varies depending on the operating system and timing issues). They will all then fall down into the loop and try to <code>accept</code> the connection. But only one will succeed (assuming there's still only one connection ready); the rest will be <em>blocked</em> in <code>accept</code>. This effectively locks those children into serving requests from that one socket and no other sockets, and they'll be stuck there until enough new requests appear on that socket to wake them all up. This starvation problem was first documented in <a href="http://bugs.apache.org/index/full/467">PR#467</a>. There are at least two solutions.</p>

<p>One solution is to make the sockets non-blocking. In this case the <code>accept</code> won't block the children, and they will be allowed to continue immediately. But this wastes CPU time. Suppose you have ten idle children in <code>select</code>, and one connection arrives. Then nine of those children will wake up, try to <code>accept</code> the connection, fail, and loop back into <code>select</code>, accomplishing nothing. Meanwhile none of those children are servicing requests that occurred on other sockets until they get back up to the <code>select</code> again. Overall this solution does not seem very fruitful unless you have as many idle CPUs (in a multiprocessor box) as you have idle children, which is not a very likely situation.</p>
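<p>For illustration only, here is a minimal sketch of what the non-blocking variant might look like. It is not taken from the Apache sources, and the helper name <code>make_socket_nonblocking</code> is invented for this example. Each listening socket is switched to non-blocking mode with <code>fcntl(2)</code>, so a child that loses the race for a connection gets an error from <code>accept</code> instead of blocking and simply continues through the loop shown above:</p>

<div class="example"><pre>
/* Sketch only: switch a listening socket to non-blocking mode so that
 * accept() never blocks a child that lost the race for a connection. */
#include <fcntl.h>
#include <errno.h>

static int make_socket_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL, 0);
    if (flags == -1)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK);
}

/* ... inside the child's accept loop from the example above,
 * a lost race is not an error ... */
new_connection = accept(i, NULL, NULL);
if (new_connection == -1 &&
    (errno == EWOULDBLOCK || errno == EAGAIN)) {
    continue;   /* another child got it; go test the next socket */
}
</pre></div>

<p>(On most systems <code>EAGAIN</code> and <code>EWOULDBLOCK</code> are the same value, but portable code checks both.)</p>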
<p>Another solution, the one used by Apache, is to serialize entry into the inner loop. The loop looks like this (differences highlighted):</p>

<div class="example"><p><code>
for (;;) {<br />
<span class="indent">
<strong>accept_mutex_on ();</strong><br />
for (;;) {<br />
<span class="indent">
fd_set accept_fds;<br />
<br />
FD_ZERO (&accept_fds);<br />
for (i = first_socket; i <= last_socket; ++i) {<br />
<span class="indent">
FD_SET (i, &accept_fds);<br />
</span>
}<br />
rc = select (last_socket+1, &accept_fds, NULL, NULL, NULL);<br />
if (rc < 1) continue;<br />
new_connection = -1;<br />
for (i = first_socket; i <= last_socket; ++i) {<br />
<span class="indent">
if (FD_ISSET (i, &accept_fds)) {<br />
<span class="indent">
new_connection = accept (i, NULL, NULL);<br />
if (new_connection != -1) break;<br />
</span>
}<br />
</span>
}<br />
if (new_connection != -1) break;<br />
</span>
}<br />
<strong>accept_mutex_off ();</strong><br />
process the new_connection;<br />
</span>
}
</code></p></div>

<p><a id="serialize" name="serialize">The functions</a> <code>accept_mutex_on</code> and <code>accept_mutex_off</code> implement a mutual exclusion semaphore. Only one child can have the mutex at any time. There are several choices for implementing these mutexes. The choice is defined in <code>src/conf.h</code> (pre-1.3) or <code>src/include/ap_config.h</code> (1.3 or later). Some architectures do not have any locking choice made; on these architectures it is unsafe to use multiple <code class="directive"><a href="../mod/mpm_common.html#listen">Listen</a></code> directives.</p>

<p>The directive <code class="directive"><a href="../mod/mpm_common.html#acceptmutex">AcceptMutex</a></code> can be used to change the selected mutex implementation at run-time.</p>

<dl>
<dt><code>AcceptMutex flock</code></dt>
<dd><p>This method uses the <code>flock(2)</code> system call to lock a lock file (located by the <code class="directive"><a href="../mod/mpm_common.html#lockfile">LockFile</a></code> directive).</p></dd>

<dt><code>AcceptMutex fcntl</code></dt>
<dd><p>This method uses the <code>fcntl(2)</code> system call to lock a lock file (located by the <code class="directive"><a href="../mod/mpm_common.html#lockfile">LockFile</a></code> directive).</p></dd>

<dt><code>AcceptMutex sysvsem</code></dt>
<dd><p>(1.3 or later) This method uses SysV-style semaphores to implement the mutex. Unfortunately SysV-style semaphores have some bad side-effects. One is that it's possible Apache will die without cleaning up the semaphore (see the <code>ipcs(8)</code> man page). The other is that the semaphore API allows for a denial of service attack by any CGIs running under the same uid as the webserver (<em>i.e.</em>, all CGIs, unless you use something like <code class="program"><a href="../programs/suexec.html">suexec</a></code> or <code>cgiwrapper</code>). For these reasons this method is not used on any architecture except IRIX (where the previous two are prohibitively expensive on most IRIX boxes).</p></dd>

<dt><code>AcceptMutex pthread</code></dt>
<dd><p>(1.3 or later) This method uses POSIX mutexes and should work on any architecture implementing the full POSIX threads specification; however, it appears to work only on Solaris (2.5 or later), and even then only in certain configurations. If you experiment with this you should watch out for your server hanging and not responding. Servers that serve only static content may work just fine.</p></dd>

<dt><code>AcceptMutex posixsem</code></dt>
<dd><p>(2.0 or later) This method uses POSIX semaphores. The semaphore ownership is not recovered if a thread in the process holding the mutex segfaults, resulting in a hang of the web server.</p></dd>
</dl>
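<p>To make the mechanism concrete, here is a minimal sketch, assuming a lock file descriptor opened by the parent before forking, of how <code>accept_mutex_on</code> and <code>accept_mutex_off</code> could be built on <code>fcntl(2)</code> record locking (the idea behind <code>AcceptMutex fcntl</code>). It is illustrative only and is not the actual httpd implementation:</p>

<div class="example"><pre>
/* Sketch of a cross-process accept mutex using fcntl(2) record locking.
 * lock_fd refers to the lock file (see the LockFile directive) and is
 * opened by the parent before fork(), so every child inherits it. */
#include <fcntl.h>
#include <errno.h>

static int lock_fd;                  /* opened once, before fork() */

static void accept_mutex_on(void)
{
    struct flock l;
    l.l_type = F_WRLCK;              /* exclusive write lock */
    l.l_whence = SEEK_SET;
    l.l_start = 0;
    l.l_len = 0;                     /* lock the whole file */
    while (fcntl(lock_fd, F_SETLKW, &l) == -1 && errno == EINTR)
        ;                            /* retry if interrupted by a signal */
}

static void accept_mutex_off(void)
{
    struct flock l;
    l.l_type = F_UNLCK;
    l.l_whence = SEEK_SET;
    l.l_start = 0;
    l.l_len = 0;
    fcntl(lock_fd, F_SETLK, &l);
}
</pre></div>

<p>A useful property of <code>fcntl</code> record locks is that the kernel drops them automatically when the process holding them exits, which is part of why the file-locking methods tend to recover more gracefully than SysV or POSIX semaphores when a child dies while holding the mutex.</p>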
<p>If your system has another method of serialization which isn't in the above list then it may be worthwhile adding code for it to APR.</p>

<p>Another solution that has been considered but never implemented is to partially serialize the loop -- that is, let in a certain number of processes. This would only be of interest on multiprocessor boxes where it's possible that multiple children could run simultaneously, and the serialization actually doesn't take advantage of the full bandwidth. This is a possible area of future investigation, but priority remains low because highly parallel web servers are not the norm.</p>

<p>Ideally you should run servers without multiple <code class="directive"><a href="../mod/mpm_common.html#listen">Listen</a></code> statements if you want the highest performance. But read on.</p>

<h3>accept Serialization - single socket</h3>

<p>The above is fine and dandy for multiple socket servers, but what about single socket servers? In theory they shouldn't experience any of these same problems because all children can just block in <code>accept(2)</code> until a connection arrives, and no starvation results. In practice this hides almost the same "spinning" behaviour discussed above in the non-blocking solution. The way that most TCP stacks are implemented, the kernel actually wakes up all processes blocked in <code>accept</code> when a single connection arrives. One of those processes gets the connection and returns to user-space; the rest spin in the kernel and go back to sleep when they discover there's no connection for them. This spinning is hidden from the user-land code, but it's there nonetheless. This can result in the same load-spiking wasteful behaviour that a non-blocking solution to the multiple sockets case can.</p>

<p>For this reason we have found that many architectures behave more "nicely" if we serialize even the single socket case. So this is actually the default in almost all cases. Crude experiments under Linux (2.0.30 on a dual Pentium Pro 166 with 128MB of RAM) have shown that serializing the single socket case causes less than a 3% decrease in requests per second compared to an unserialized single socket. But the unserialized single socket showed an extra 100ms of latency on each request. This latency is probably a wash on long-haul lines, and only an issue on LANs. If you want to override the single socket serialization, you can define <code>SINGLE_LISTEN_UNSERIALIZED_ACCEPT</code> and then single-socket servers will not serialize at all.</p>

<h3>Lingering Close</h3>