But notice that this performs worse than using an <code class="directive"><a href="../mod/core.html#errordocument">ErrorDocument</a></code> CGI-script!</p> </dd> <dt>Solution:</dt> <dd> <p>The first solution has the best performance but less flexibility, and is less error safe:</p><div class="example"><pre>RewriteEngine on
RewriteCond   /your/docroot/%{REQUEST_FILENAME} <strong>!-f</strong>
RewriteRule   ^(.+)                             http://<strong>webserverB</strong>.dom/$1</pre></div> <p>The problem here is that this will only work for pages inside the <code class="directive"><a href="../mod/core.html#documentroot">DocumentRoot</a></code>. While you can add more Conditions (for instance, to also handle homedirs, etc.), there is a better variant:</p><div class="example"><pre>RewriteEngine on
RewriteCond   %{REQUEST_URI} <strong>!-U</strong>
RewriteRule   ^(.+)          http://<strong>webserverB</strong>.dom/$1</pre></div> <p>This uses the URL look-ahead feature of <code class="module"><a href="../mod/mod_rewrite.html">mod_rewrite</a></code>. As a result it works for all types of URLs and is safe. But it has a performance impact on the webserver, because every request causes one additional internal subrequest. So, if your webserver runs on a powerful CPU, use this variant. If it is a slow machine, use the first approach or, better, an <code class="directive"><a href="../mod/core.html#errordocument">ErrorDocument</a></code> CGI-script.</p> </dd> </dl> </div><div class="top"><a href="#page-header"><img alt="top" src="../images/up.gif" /></a></div><div class="section"><h2>Archive Access Multiplexer</h2> <dl> <dt>Description:</dt> <dd> <p>Do you know the great CPAN (Comprehensive Perl Archive Network) at <a href="http://www.perl.com/CPAN">http://www.perl.com/CPAN</a>? It redirects to one of several FTP servers around the world which carry a CPAN mirror, picking one approximately near the location of the requesting client. Effectively this is an FTP access multiplexing service. 
While CPAN runs via CGI scripts, how can a similar approach be implemented with <code class="module"><a href="../mod/mod_rewrite.html">mod_rewrite</a></code>?</p> </dd> <dt>Solution:</dt> <dd> <p>First we notice that as of version 3.0.0, <code class="module"><a href="../mod/mod_rewrite.html">mod_rewrite</a></code> can also use the "<code>ftp:</code>" scheme on redirects. Second, the location approximation can be done by a <code class="directive"><a href="../mod/mod_rewrite.html#rewritemap">RewriteMap</a></code> over the top-level domain of the client. With a tricky chained ruleset we can use this top-level domain as the key into our multiplexing map.</p><div class="example"><pre>RewriteEngine on
RewriteMap    multiplex                txt:/path/to/map.cxan
RewriteRule   ^/CxAN/(.*)              %{REMOTE_HOST}::$1                 [C]
RewriteRule   ^.+\.<strong>([a-zA-Z]+)</strong>::(.*)$  ${multiplex:<strong>$1</strong>|ftp.default.dom}$2  [R,L]</pre></div><div class="example"><pre>##
##  map.cxan -- Multiplexing Map for CxAN
##

de        ftp://ftp.cxan.de/CxAN/
uk        ftp://ftp.cxan.uk/CxAN/
com       ftp://ftp.cxan.com/CxAN/
 :
##EOF##</pre></div> </dd> </dl> </div><div class="top"><a href="#page-header"><img alt="top" src="../images/up.gif" /></a></div><div class="section"><h2><a name="content" id="content">Content Handling</a></h2> <h3>Browser Dependent Content</h3> <dl> <dt>Description:</dt> <dd> <p>At least for important top-level pages it is sometimes necessary to provide optimal browser-dependent content, i.e. one has to provide a maximum version for the latest Netscape variants, a minimum version for Lynx browsers, and an average feature version for all others.</p> </dd> <dt>Solution:</dt> <dd> <p>We cannot use content negotiation because the browsers do not provide their type in that form. Instead we have to act on the HTTP header "User-Agent". 
The following config does the following: if the HTTP header "User-Agent" begins with "Mozilla/3", the page <code>foo.html</code> is rewritten to <code>foo.NS.html</code> and the rewriting stops. If the browser is "Lynx" or "Mozilla" of version 1 or 2, the URL becomes <code>foo.20.html</code>. All other browsers receive page <code>foo.32.html</code>. This is done by the following ruleset:</p><div class="example"><pre>RewriteCond %{HTTP_USER_AGENT}  ^<strong>Mozilla/3</strong>.*
RewriteRule ^foo\.html$         foo.<strong>NS</strong>.html  [<strong>L</strong>]

RewriteCond %{HTTP_USER_AGENT}  ^<strong>Lynx/</strong>.*  [OR]
RewriteCond %{HTTP_USER_AGENT}  ^<strong>Mozilla/[12]</strong>.*
RewriteRule ^foo\.html$         foo.<strong>20</strong>.html  [<strong>L</strong>]

RewriteRule ^foo\.html$         foo.<strong>32</strong>.html  [<strong>L</strong>]</pre></div> </dd> </dl> <h3>Dynamic Mirror</h3> <dl> <dt>Description:</dt> <dd> <p>Assume there are nice webpages on remote hosts we want to bring into our namespace. For FTP servers we would use the <code>mirror</code> program which maintains an explicit up-to-date copy of the remote data on the local machine. For a webserver we could use the program <code>webcopy</code>, which acts similarly via HTTP. But both techniques have one major drawback: the local copy is only as up-to-date as the last run of the program. It would be much better if the mirror were not a static one we have to establish explicitly. 
Instead we want a dynamic mirror whose data is updated automatically whenever the data on the remote host changes.</p> </dd> <dt>Solution:</dt> <dd> <p>To provide this feature we map the remote webpage or even the complete remote webarea into our namespace by using the <dfn>Proxy Throughput</dfn> feature (flag <code>[P]</code>):</p><div class="example"><pre>RewriteEngine on
RewriteBase   /~quux/
RewriteRule   ^<strong>hotsheet/</strong>(.*)$  <strong>http://www.tstimpreso.com/hotsheet/</strong>$1  [<strong>P</strong>]</pre></div><div class="example"><pre>RewriteEngine on
RewriteBase   /~quux/
RewriteRule   ^<strong>usa-news\.html</strong>$  <strong>http://www.quux-corp.com/news/index.html</strong>  [<strong>P</strong>]</pre></div> </dd> </dl> <h3>Reverse Dynamic Mirror</h3> <dl> <dt>Description:</dt> <dd>...</dd> <dt>Solution:</dt> <dd><div class="example"><pre>RewriteEngine on
RewriteCond   /mirror/of/remotesite/$1 -U
RewriteRule   ^http://www\.remotesite\.com/(.*)$  /mirror/of/remotesite/$1</pre></div> </dd> </dl> <h3>Retrieve Missing Data from Intranet</h3> <dl> <dt>Description:</dt> <dd> <p>This is a tricky way of virtually running a corporate (external) Internet webserver (<code>www.quux-corp.dom</code>), while actually keeping and maintaining its data on an (internal) Intranet webserver (<code>www2.quux-corp.dom</code>) which is protected by a firewall. The trick is that the external webserver retrieves the requested data on-the-fly from the internal one.</p> </dd> <dt>Solution:</dt> <dd> <p>First, we have to make sure that our firewall still protects the internal webserver and that only the external webserver is allowed to retrieve data from it. 
For a packet-filtering firewall we could for instance configure a firewall ruleset like the following:</p><div class="example"><pre><strong>ALLOW</strong> Host www.quux-corp.dom Port >1024 --> Host www2.quux-corp.dom Port <strong>80</strong>
<strong>DENY</strong>  Host *                 Port *     --> Host www2.quux-corp.dom Port <strong>80</strong></pre></div> <p>Just adjust it to your actual configuration syntax. Now we can establish the <code class="module"><a href="../mod/mod_rewrite.html">mod_rewrite</a></code> rules which request the missing data in the background through the proxy throughput feature:</p><div class="example"><pre>RewriteRule ^/~([^/]+)/?(.*)          /home/$1/.www/$2
RewriteCond %{REQUEST_FILENAME}       <strong>!-f</strong>
RewriteCond %{REQUEST_FILENAME}       <strong>!-d</strong>
RewriteRule ^/home/([^/]+)/.www/?(.*) http://<strong>www2</strong>.quux-corp.dom/~$1/pub/$2 [<strong>P</strong>]</pre></div> </dd> </dl> <h3>Load Balancing</h3> <dl> <dt>Description:</dt> <dd> <p>Suppose we want to load balance the traffic to <code>www.foo.com</code> over <code>www[0-5].foo.com</code> (a total of 6 servers). How can this be done?</p> </dd> <dt>Solution:</dt> <dd> <p>There are many possible solutions for this problem. We will first discuss a commonly known DNS-based variant and then a special one using <code class="module"><a href="../mod/mod_rewrite.html">mod_rewrite</a></code>:</p> <ol> <li> <strong>DNS Round-Robin</strong> <p>The simplest method for load-balancing is to use the DNS round-robin feature of <code>BIND</code>. Here you just configure <code>www[0-5].foo.com</code> as usual in your DNS with A (address) records, e.g.</p><div class="example"><pre>www0   IN  A       1.2.3.1
www1   IN  A       1.2.3.2
www2   IN  A       1.2.3.3
www3   IN  A       1.2.3.4
www4   IN  A       1.2.3.5
www5   IN  A       1.2.3.6</pre></div> <p>Then you additionally add the following entry:</p><div class="example"><pre>www    IN  CNAME   www0.foo.com.
       IN  CNAME   www1.foo.com.
       IN  CNAME   www2.foo.com.
       IN  CNAME   www3.foo.com.
       IN  CNAME   www4.foo.com.
       IN  CNAME   www5.foo.com.</pre></div> <p>Notice that this seems wrong, but is actually an intended feature of <code>BIND</code> and can be used this way. However, now when <code>www.foo.com</code> gets resolved, <code>BIND</code> gives out <code>www0-www5</code> - but in a slightly permutated/rotated order every time. This way the clients are spread over the various servers. But notice that this is not a perfect load balancing scheme, because DNS resolution information gets cached by the other nameservers on the net, so once a client has resolved <code>www.foo.com</code> to a particular <code>wwwN.foo.com</code>, all subsequent requests also go to this particular name <code>wwwN.foo.com</code>. But the final result is okay, because the total sum of the requests really is spread over the various webservers.</p>
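The rotation described above can be sketched with a small toy model (this is an illustration of the round-robin idea only, not BIND's actual implementation; the addresses are the example ones from the zone file above):

```python
from collections import deque

# Toy model of DNS round-robin: the nameserver returns the full list of
# addresses, rotated by one position on each successive query.
addresses = deque(f"1.2.3.{i}" for i in range(1, 7))

def resolve():
    """Return the current answer order, then rotate it for the next query."""
    answer = list(addresses)
    addresses.rotate(-1)  # move the first address to the end of the list
    return answer

first = resolve()   # ['1.2.3.1', '1.2.3.2', ..., '1.2.3.6']
second = resolve()  # ['1.2.3.2', '1.2.3.3', ..., '1.2.3.1']
```

Since most clients simply use the first address of the answer, successive resolvers land on different servers - which is exactly the spreading effect described above.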