<p>As discussed in <a href="http://www.ics.uci.edu/pub/ietf/http/draft-ietf-http-connection-00.txt"> draft-ietf-http-connection-00.txt</a> section 8, in order for an HTTP server to <strong>reliably</strong> implement the protocol it needs to shut down each direction of the communication independently (recall that a TCP connection is bi-directional, and each half is independent of the other). This fact is often overlooked by other servers, but is correctly implemented in Apache as of 1.2.</p>
<p>When this feature was added to Apache it caused a flurry of problems on various versions of Unix because of an oversight. The TCP specification does not state that the <code>FIN_WAIT_2</code> state has a timeout, but it doesn't prohibit it. On systems without the timeout, Apache 1.2 leaves many sockets stuck forever in the <code>FIN_WAIT_2</code> state. In many cases this can be avoided by simply upgrading to the latest TCP/IP patches supplied by the vendor. In cases where the vendor has never released patches (<em>e.g.</em>, SunOS4 -- although folks with a source license can patch it themselves) we have decided to disable this feature.</p>
<p>There are two ways of accomplishing the independent shutdown. One is the socket option <code>SO_LINGER</code>. But as fate would have it, this has never been implemented properly in most TCP/IP stacks. Even on those stacks with a proper implementation (<em>e.g.</em>, Linux 2.0.31) this method proves to be more expensive (in CPU time) than the next solution.</p>
<p>For the most part, Apache implements this in a function called <code>lingering_close</code> (in <code>http_main.c</code>). 
The function looks roughly like this:</p>
<div class="example"><p><code> void lingering_close (int s)<br /> {<br /> <span class="indent"> char junk_buffer[2048];<br /> <br /> /* shutdown the sending side */<br /> shutdown (s, 1);<br /> <br /> signal (SIGALRM, lingering_death);<br /> alarm (30);<br /> <br /> for (;;) {<br /> <span class="indent"> select (s for reading, 2 second timeout);<br /> if (error) break;<br /> if (s is ready for reading) {<br /> <span class="indent"> if (read (s, junk_buffer, sizeof (junk_buffer)) &lt;= 0) {<br /> <span class="indent"> break;<br /> </span> }<br /> /* just toss away whatever is here */<br /> </span> }<br /> </span> }<br /> <br /> close (s);<br /> </span> } </code></p></div>
<p>This naturally adds some expense at the end of a connection, but it is required for a reliable implementation. As HTTP/1.1 becomes more prevalent, and all connections are persistent, this expense will be amortized over more requests. If you want to play with fire and disable this feature you can define <code>NO_LINGCLOSE</code>, but this is not recommended at all. In particular, as HTTP/1.1 pipelined persistent connections come into use <code>lingering_close</code> is an absolute necessity (and <a href="http://www.w3.org/Protocols/HTTP/Performance/Pipeline.html"> pipelined connections are faster</a>, so you want to support them).</p>
<h3>Scoreboard File</h3>
<p>Apache's parent and children communicate with each other through something called the scoreboard. Ideally this should be implemented in shared memory. For those operating systems that we either have access to, or have been given detailed ports for, it typically is implemented using shared memory. The rest default to using an on-disk file. The on-disk file is not only slow but also unreliable (and offers fewer features). Peruse the <code>src/main/conf.h</code> file for your architecture and look for either <code>USE_MMAP_SCOREBOARD</code> or <code>USE_SHMGET_SCOREBOARD</code>. 
Defining one of those two (as well as their companions <code>HAVE_MMAP</code> and <code>HAVE_SHMGET</code> respectively) enables the supplied shared memory code. If your system has another type of shared memory, edit the file <code>src/main/http_main.c</code> and add the hooks necessary to use it in Apache. (Send us back a patch too please.)</p>
<div class="note">Historical note: The Linux port of Apache didn't start to use shared memory until version 1.2 of Apache. This oversight resulted in really poor and unreliable behaviour of earlier versions of Apache on Linux.</div>
<h3>DYNAMIC_MODULE_LIMIT</h3>
<p>If you have no intention of using dynamically loaded modules (you probably don't if you're reading this and tuning your server for every last ounce of performance) then you should add <code>-DDYNAMIC_MODULE_LIMIT=0</code> when building your server. This will save RAM that's allocated only for supporting dynamically loaded modules.</p>
</div><div class="top"><a href="#page-header"><img alt="top" src="../images/up.gif" /></a></div><div class="section"><h2><a name="trace" id="trace">Appendix: Detailed Analysis of a Trace</a></h2>
<p>Here is a system call trace of Apache 2.0.38 with the worker MPM on Solaris 8. This trace was collected using:</p>
<div class="example"><p><code> truss -l -p <var>httpd_child_pid</var> </code></p></div>
<p>The <code>-l</code> option tells truss to log the ID of the LWP (lightweight process--Solaris's form of kernel-level thread) that invokes each system call.</p>
<p>Other systems may have different system call tracing utilities such as <code>strace</code>, <code>ktrace</code>, or <code>par</code>. They all produce similar output.</p>
<p>In this trace, a client has requested a 10KB static file from the httpd. 
Traces of non-static requests or requests with content negotiation look wildly different (and quite ugly in some cases).</p>
<div class="example"><pre>/67: accept(3, 0x00200BEC, 0x00200C0C, 1) (sleeping...)
/67: accept(3, 0x00200BEC, 0x00200C0C, 1) = 9</pre></div>
<p>In this trace, the listener thread is running within LWP #67.</p>
<div class="note">Note the lack of <code>accept(2)</code> serialization. On this particular platform, the worker MPM uses an unserialized accept by default unless it is listening on multiple ports.</div>
<div class="example"><pre>/65: lwp_park(0x00000000, 0) = 0
/67: lwp_unpark(65, 1) = 0</pre></div>
<p>Upon accepting the connection, the listener thread wakes up a worker thread to do the request processing. In this trace, the worker thread that handles the request is mapped to LWP #65.</p>
<div class="example"><pre>/65: getsockname(9, 0x00200BA4, 0x00200BC4, 1) = 0</pre></div>
<p>In order to implement virtual hosts, Apache needs to know the local socket address used to accept the connection. It is possible to eliminate this call in many situations (such as when there are no virtual hosts, or when <code class="directive"><a href="../mod/mpm_common.html#listen">Listen</a></code> directives are used which do not have wildcard addresses). But no effort has yet been made to do these optimizations. </p>
<div class="example"><pre>/65: brk(0x002170E8) = 0
/65: brk(0x002190E8) = 0</pre></div>
<p>The <code>brk(2)</code> calls allocate memory from the heap. It is rare to see these in a system call trace, because the httpd uses custom memory allocators (<code>apr_pool</code> and <code>apr_bucket_alloc</code>) for most request processing. 
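</p>
<p>To see why pool-style allocators keep <code>brk(2)</code> out of the trace, here is a minimal sketch of a bump allocator in C. It is not the APR implementation: the names <code>pool</code>, <code>pool_init</code>, <code>pool_alloc</code>, <code>pool_clear</code> and <code>pool_destroy</code> are hypothetical, and the single fixed-size block is a simplifying assumption.</p>

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical bump allocator for illustration only; NOT the APR API. */
typedef struct pool {
    char  *base;   /* one raw block obtained from malloc() up front */
    size_t size;   /* total bytes in the block */
    size_t used;   /* bytes handed out so far */
} pool;

/* Grab the raw block once. This is the only call that may ask the
 * kernel for memory. Returns 0 on success, -1 on failure. */
int pool_init(pool *p, size_t size)
{
    assert(p != NULL);
    p->base = malloc(size);
    p->size = (p->base != NULL) ? size : 0;
    p->used = 0;
    return (p->base != NULL) ? 0 : -1;
}

/* Carve n bytes out of the block, rounded up to a multiple of 8.
 * Returns NULL when the block is exhausted. No system call is made. */
void *pool_alloc(pool *p, size_t n)
{
    size_t rounded = (n + 7u) & ~(size_t)7u;
    void *out;

    if (rounded < n || rounded > p->size - p->used)
        return NULL;
    out = p->base + p->used;
    p->used += rounded;
    return out;
}

/* Forget every allocation at once, e.g. at the end of a request. */
void pool_clear(pool *p)
{
    p->used = 0;
}

/* Return the raw block to the C library. */
void pool_destroy(pool *p)
{
    free(p->base);
    p->base = NULL;
    p->size = 0;
    p->used = 0;
}
```

<p>In this sketch only <code>pool_init</code> calls <code>malloc(3)</code>, so only it can trigger a request to the kernel for more memory; every later <code>pool_alloc</code> is pointer arithmetic inside the process.</p>
<p>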
In this trace, the httpd has just been started, so it must call <code>malloc(3)</code> to get the blocks of raw memory with which to create the custom memory allocators.</p>
<div class="example"><pre>/65: fcntl(9, F_GETFL, 0x00000000) = 2
/65: fstat64(9, 0xFAF7B818) = 0
/65: getsockopt(9, 65535, 8192, 0xFAF7B918, 0xFAF7B910, 2190656) = 0
/65: fstat64(9, 0xFAF7B818) = 0
/65: getsockopt(9, 65535, 8192, 0xFAF7B918, 0xFAF7B914, 2190656) = 0
/65: setsockopt(9, 65535, 8192, 0xFAF7B918, 4, 2190656) = 0
/65: fcntl(9, F_SETFL, 0x00000082) = 0</pre></div>
<p>Next, the worker thread puts the connection to the client (file descriptor 9) in non-blocking mode. The <code>setsockopt(2)</code> and <code>getsockopt(2)</code> calls are a side-effect of how Solaris's libc handles <code>fcntl(2)</code> on sockets.</p>
<div class="example"><pre>/65: read(9, " G E T / 1 0 k . h t m".., 8000) = 97</pre></div>
<p>The worker thread reads the request from the client.</p>
<div class="example"><pre>/65: stat("/var/httpd/apache/httpd-8999/htdocs/10k.html", 0xFAF7B978) = 0
/65: open("/var/httpd/apache/httpd-8999/htdocs/10k.html", O_RDONLY) = 10</pre></div>
<p>This httpd has been configured with <code>Options FollowSymLinks</code> and <code>AllowOverride None</code>. Thus it doesn't need to <code>lstat(2)</code> each directory in the path leading up to the requested file, nor check for <code>.htaccess</code> files. It simply calls <code>stat(2)</code> to verify that the file: 1) exists, and 2) is a regular file, not a directory.</p>
<div class="example"><pre>/65: sendfilev(0, 9, 0x00200F90, 2, 0xFAF7B53C) = 10269</pre></div>
<p>In this example, the httpd is able to send the HTTP response header and the requested file with a single <code>sendfilev(2)</code> system call. Sendfile semantics vary among operating systems. 
On some other systems, it is necessary to do a <code>write(2)</code> or <code>writev(2)</code> call to send the headers before calling <code>sendfile(2)</code>.</p>
<div class="example"><pre>/65: write(4, " 1 2 7 . 0 . 0 . 1 - ".., 78) = 78</pre></div>
<p>This <code>write(2)</code> call records the request in the access log. Note that one thing missing from this trace is a <code>time(2)</code> call. Unlike Apache 1.3, Apache 2.x uses <code>gettimeofday(3)</code> to look up the time. On some operating systems, like Linux or Solaris, <code>gettimeofday</code> has an optimized implementation that doesn't require as much overhead as a typical system call.</p>
<div class="example"><pre>/65: shutdown(9, 1, 1) = 0
/65: poll(0xFAF7B980, 1, 2000) = 1
/65: read(9, 0xFAF7BC20, 512) = 0
/65: close(9) = 0</pre></div>
<p>The worker thread does a lingering close of the connection.</p>
<div class="example"><pre>/65: close(10) = 0
/65: lwp_park(0x00000000, 0) (sleeping...)</pre></div>
<p>Finally the worker thread closes the file that it has just delivered and blocks until the listener assigns it another connection.</p>
<div class="example"><pre>/67: accept(3, 0x001FEB74, 0x001FEB94, 1) (sleeping...)</pre></div>
<p>Meanwhile, the listener thread is able to accept another connection as soon as it has dispatched this connection to a worker thread (subject to some flow-control logic in the worker MPM that throttles the listener if all the available workers are busy). 
Though it isn't apparent from this trace, the next <code>accept(2)</code> can (and usually does, under high load conditions) occur in parallel with the worker thread's handling of the just-accepted connection.</p> </div></div><div class="bottomlang"><p><span>Available Languages: </span><a href="../en/misc/perf-tuning.html" title="English"> en </a> |<a href="../ko/misc/perf-tuning.html" hreflang="ko" rel="alternate" title="Korean"> ko </a></p></div><div id="footer"><p class="apache">Copyright 2007 The Apache Software Foundation.<br />Licensed under the <a href="http://www.apache.org/licenses/LICENSE-2.0">Apache License, Version 2.0</a>.</p><p class="menu"><a href="../mod/">Modules</a> | <a href="../mod/directives.html">Directives</a> | <a href="../faq/">FAQ</a> | <a href="../glossary.html">Glossary</a> | <a href="../sitemap.html">Sitemap</a></p></div></body></html>