</pre></div>
<div class="example"><pre>
RewriteEngine on
RewriteBase   /~quux/
RewriteRule   ^<strong>usa-news\.html</strong>$  <strong>http://www.quux-corp.com/news/index.html</strong>  [<strong>P</strong>]
</pre></div>
</dd>
</dl>

<h3>Reverse Dynamic Mirror</h3>
<dl>
<dt>Description:</dt>
<dd>...</dd>
<dt>Solution:</dt>
<dd>
<div class="example"><pre>
RewriteEngine on
RewriteCond   /mirror/of/remotesite/$1 -U
RewriteRule   ^http://www\.remotesite\.com/(.*)$ /mirror/of/remotesite/$1
</pre></div>
</dd>
</dl>

<h3>Retrieve Missing Data from Intranet</h3>
<dl>
<dt>Description:</dt>
<dd>
<p>This is a tricky way of virtually running a corporate (external) Internet webserver
(<code>www.quux-corp.dom</code>), while actually keeping and maintaining its data on an
(internal) Intranet webserver (<code>www2.quux-corp.dom</code>) which is protected by a
firewall. The trick is that the external webserver retrieves the requested data
on-the-fly from the internal one.</p>
</dd>
<dt>Solution:</dt>
<dd>
<p>First, we have to make sure that our firewall still protects the internal webserver
and that only the external webserver is allowed to retrieve data from it. For a
packet-filtering firewall we could, for instance, configure a ruleset like the
following:</p>
<div class="example"><pre>
<strong>ALLOW</strong> Host www.quux-corp.dom Port >1024 --> Host www2.quux-corp.dom Port <strong>80</strong>
<strong>DENY</strong>  Host *                 Port *     --> Host www2.quux-corp.dom Port <strong>80</strong>
</pre></div>
<p>Just adjust it to your actual configuration syntax. Now we can establish the
<code class="module"><a href="../mod/mod_rewrite.html">mod_rewrite</a></code> rules which
request the missing data in the background through the proxy throughput feature:</p>
<div class="example"><pre>
RewriteRule ^/~([^/]+)/?(.*)          /home/$1/.www/$2
RewriteCond %{REQUEST_FILENAME}       <strong>!-f</strong>
RewriteCond %{REQUEST_FILENAME}       <strong>!-d</strong>
RewriteRule ^/home/([^/]+)/.www/?(.*) http://<strong>www2</strong>.quux-corp.dom/~$1/pub/$2 [<strong>P</strong>]
</pre></div>
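<p>Note that the <code>P</code> (proxy) flag hands the request to Apache's proxy
module, so <code class="module"><a href="../mod/mod_proxy.html">mod_proxy</a></code>
has to be available on the external server. For a typical Apache 2.x build with
dynamically loaded modules this could look roughly like the following sketch
(the module file paths are an assumption and depend on your installation):</p>
<div class="example"><pre>
LoadModule proxy_module      modules/mod_proxy.so
LoadModule proxy_http_module modules/mod_proxy_http.so
</pre></div>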
</dd>
</dl>

<h3>Load Balancing</h3>
<dl>
<dt>Description:</dt>
<dd>
<p>Suppose we want to load balance the traffic to <code>www.foo.com</code> over
<code>www[0-5].foo.com</code> (a total of 6 servers). How can this be done?</p>
</dd>
<dt>Solution:</dt>
<dd>
<p>There are a lot of possible solutions for this problem. We will first discuss a
commonly known DNS-based variant and then a special one with
<code class="module"><a href="../mod/mod_rewrite.html">mod_rewrite</a></code>:</p>
<ol>
<li>
<strong>DNS Round-Robin</strong>
<p>The simplest method for load-balancing is to use the DNS round-robin feature of
<code>BIND</code>. Here you just configure <code>www[0-5].foo.com</code> as usual in
your DNS with A (address) records, e.g.</p>
<div class="example"><pre>
www0   IN  A       1.2.3.1
www1   IN  A       1.2.3.2
www2   IN  A       1.2.3.3
www3   IN  A       1.2.3.4
www4   IN  A       1.2.3.5
www5   IN  A       1.2.3.6
</pre></div>
<p>Then you additionally add the following entry:</p>
<div class="example"><pre>
www    IN  CNAME   www0.foo.com.
       IN  CNAME   www1.foo.com.
       IN  CNAME   www2.foo.com.
       IN  CNAME   www3.foo.com.
       IN  CNAME   www4.foo.com.
       IN  CNAME   www5.foo.com.
</pre></div>
<p>Notice that this seems wrong, but is actually an intended feature of
<code>BIND</code> and can be used in this way. However, now when
<code>www.foo.com</code> gets resolved, <code>BIND</code> gives out
<code>www0-www5</code> - but in a slightly permutated/rotated order every time.
This way the clients are spread over the various servers. But notice that this is not
a perfect load balancing scheme, because DNS resolution information gets cached by
other nameservers on the net, so once a client has resolved <code>www.foo.com</code>
to a particular <code>wwwN.foo.com</code>, all its subsequent requests also go to that
particular name <code>wwwN.foo.com</code>. But the final result is ok, because the
requests as a whole are really spread over the various webservers.</p>
</li>
<li>
<strong>DNS Load-Balancing</strong>
<p>A more sophisticated DNS-based method for load-balancing is to use the program
<code>lbnamed</code>, which can be found at
<a href="http://www.stanford.edu/~schemers/docs/lbnamed/lbnamed.html">
http://www.stanford.edu/~schemers/docs/lbnamed/lbnamed.html</a>. It is a Perl 5
program in conjunction with auxiliary tools which provides real load-balancing for
DNS.</p>
</li>
<li>
<strong>Proxy Throughput Round-Robin</strong>
<p>In this variant we use
<code class="module"><a href="../mod/mod_rewrite.html">mod_rewrite</a></code> and its
proxy throughput feature. First we dedicate <code>www0.foo.com</code> to be actually
<code>www.foo.com</code> by using a single</p>
<div class="example"><pre>
www    IN  CNAME   www0.foo.com.
</pre></div>
<p>entry in the DNS. Then we convert <code>www0.foo.com</code> to a proxy-only server,
i.e. we configure this machine so that all arriving URLs are just pushed through its
internal proxy to one of the 5 other servers (<code>www1-www5</code>). To accomplish
this we first establish a ruleset which contacts a load balancing script
<code>lb.pl</code> for all URLs.</p>
<div class="example"><pre>
RewriteEngine on
RewriteMap    lb      prg:/path/to/lb.pl
RewriteRule   ^/(.+)$ ${lb:$1}           [P,L]
</pre></div>
<p>Then we write <code>lb.pl</code>:</p>
<div class="example"><pre>
#!/path/to/perl
##
##  lb.pl -- load balancing script
##

$| = 1;

$name   = "www";     # the hostname base
$first  = 1;         # the first server (not 0 here, because 0 is myself)
$last   = 5;         # the last server in the round-robin
$domain = "foo.dom"; # the domainname

$cnt = 0;
while (<STDIN>) {
    $cnt = (($cnt+1) % ($last+1-$first));
    $server = sprintf("%s%d.%s", $name, $cnt+$first, $domain);
    print "http://$server/$_";
}

##EOF##
</pre></div>
<div class="note">A last notice: Why is this useful? Seems like <code>www0.foo.com</code>
would still be overloaded? The answer is yes, it is overloaded, but with plain proxy
throughput requests only! All SSI, CGI, ePerl, etc. processing is completely done on
the other machines. This is the essential point.</div>
<p>(A simpler variant based on the built-in <code>rnd</code> map type is sketched
after this list.)</p>
</li>
<li>
<strong>Hardware/TCP Round-Robin</strong>
<p>There is a hardware solution available, too. Cisco has a beast called LocalDirector
which does load balancing at the TCP/IP level. Actually this is some sort of
circuit-level gateway in front of a webcluster. If you have enough money and really
need a solution with high performance, use this one.</p>
</li>
</ol>
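<p>As mentioned under <em>Proxy Throughput Round-Robin</em> above, the external
<code>lb.pl</code> program can be replaced by the built-in <code>rnd</code> map type
of <code class="module"><a href="../mod/mod_rewrite.html">mod_rewrite</a></code> if a
purely random (instead of strictly rotating) selection is good enough. A minimal
sketch, assuming a map file at the hypothetical path
<code>/path/to/serverlist.txt</code>:</p>
<div class="example"><pre>
RewriteEngine on
RewriteMap    lb      rnd:/path/to/serverlist.txt
RewriteRule   ^/(.+)$ http://${lb:servers}/$1    [P,L]
</pre></div>
<p>where <code>serverlist.txt</code> contains the single line:</p>
<div class="example"><pre>
servers www1.foo.com|www2.foo.com|www3.foo.com|www4.foo.com|www5.foo.com
</pre></div>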
</dd>
</dl>

<h3>New MIME-type, New Service</h3>
<dl>
<dt>Description:</dt>
<dd>
<p>On the net there are a lot of nifty CGI programs. But their usage is usually
boring, so a lot of webmasters don't use them. Even Apache's Action handler feature
for MIME-types is only appropriate when the CGI programs don't need special URLs
(actually <code>PATH_INFO</code> and <code>QUERY_STRING</code>) as their input.
First, let us configure a new file type with extension <code>.scgi</code> (for
secure CGI) which will be processed by the popular <code>cgiwrap</code> program. The
problem here is that if we use the Homogeneous URL Layout (see above), a file inside
the user home directories has the URL <code>/u/user/foo/bar.scgi</code>, but
<code>cgiwrap</code> needs the URL in the form <code>/~user/foo/bar.scgi/</code>.
The following rule solves the problem:</p>
<div class="example"><pre>
RewriteRule ^/[uge]/<strong>([^/]+)</strong>/\.www/(.+)\.scgi(.*) ...
... /internal/cgi/user/cgiwrap/~<strong>$1</strong>/$2.scgi$3  [NS,<strong>T=application/x-httpd-cgi</strong>]
</pre></div>
<p>Or assume we have some more nifty programs: <code>wwwlog</code> (which displays
the <code>access.log</code> for a URL subtree) and <code>wwwidx</code> (which runs
Glimpse on a URL subtree). We have to provide the URL area to these programs so they
know which area they are acting on. But usually this is ugly, because they still
have to be requested from that area, i.e. typically we would run the
<code>wwwidx</code> program from within <code>/u/user/foo/</code> via a hyperlink to</p>
<div class="example"><pre>
/internal/cgi/user/wwwidx?i=/u/user/foo/
</pre></div>
<p>which is ugly, because we have to hard-code <strong>both</strong> the location of
the area <strong>and</strong> the location of the CGI inside the hyperlink. Whenever
we have to reorganize the area, we spend a lot of time changing the various
hyperlinks.</p>
</dd>
<dt>Solution:</dt>
<dd>
<p>The solution here is to provide a special new URL format which automatically
leads to the proper CGI invocation. We configure the following:</p>
<div class="example"><pre>
RewriteRule ^/([uge])/([^/]+)(/?.*)/\*  /internal/cgi/user/wwwidx?i=/$1/$2$3/
RewriteRule ^/([uge])/([^/]+)(/?.*):log /internal/cgi/user/wwwlog?f=/$1/$2$3
</pre></div>
<p>Now the hyperlink to search at <code>/u/user/foo/</code> reads only</p>
<div class="example"><pre>
HREF="*"
</pre></div>
<p>which internally gets automatically transformed to</p>
<div class="example"><pre>
/internal/cgi/user/wwwidx?i=/u/user/foo/
</pre></div>
<p>The same approach leads to an invocation of the access log CGI program when the
hyperlink <code>:log</code> gets used.</p>
</dd>
</dl>

<h3>From Static to Dynamic</h3>
<dl>
<dt>Description:</dt>
<dd>
<p>How can we transform a static page <code>foo.html</code> into a dynamic variant
<code>foo.cgi</code> in a seamless way, i.e. without the browser/user noticing?</p>
</dd>
<dt>Solution:</dt>
<dd>
<p>We just rewrite the URL to the CGI-script and force the correct MIME-type so it
really gets run as a CGI-script. This way a request to <code>/~quux/foo.html</code>
internally leads to the invocation of <code>/~quux/foo.cgi</code>.</p>
<div class="example"><pre>
RewriteEngine on
RewriteBase   /~quux/
RewriteRule   ^foo\.<strong>html</strong>$  foo.<strong>cgi</strong>  [T=<strong>application/x-httpd-cgi</strong>]
</pre></div>
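<p>What <code>foo.cgi</code> does is entirely up to you; just as an illustration
(the script contents below are hypothetical), a minimal CGI script producing the
formerly static page could look like this:</p>
<div class="example"><pre>
#!/path/to/perl
##
##  foo.cgi -- hypothetical dynamic replacement for foo.html
##

# a CGI script has to emit the HTTP header itself
print "Content-type: text/html\n\n";

# ...followed by the (now dynamically generated) page body
print "<html><body>\n";
print "<h1>Generated at ", scalar(localtime), "</h1>\n";
print "</body></html>\n";
</pre></div>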
</dd>
</dl>

<h3>On-the-fly Content-Regeneration</h3>
<dl>
<dt>Description:</dt>
<dd>
<p>Here comes a really esoteric feature: dynamically generated but statically served
pages, i.e. pages should be delivered as pure static pages (read from the filesystem
and just passed through), but they have to be generated dynamically by the webserver
if they are missing. This way you can have CGI-generated pages which are statically
served unless someone (or a cronjob) removes the static contents. Then the content
gets refreshed.</p>
</dd>
<dt>Solution:</dt>
<dd>
<p>This is done via the following ruleset:</p>
<div class="example"><pre>
RewriteCond %{REQUEST_FILENAME}   <strong>!-s</strong>
RewriteRule ^page\.<strong>html</strong>$          page.<strong>cgi</strong>  [T=application/x-httpd-cgi,L]
</pre></div>
<p>Here a request to <code>page.html</code> leads to an internal run of a
corresponding <code>page.cgi</code> whenever <code>page.html</code> is missing or has
zero size (the <code>!-s</code> test). The trick is that <code>page.cgi</code> is a
usual CGI script which, in addition to sending its output to <code>STDOUT</code>,
also writes it to the file <code>page.html</code>, so subsequent requests are served
statically until the file is removed again.</p>
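<p>A sketch of such a <code>page.cgi</code> (the interpreter path and the location of
<code>page.html</code> are assumptions and have to match your setup) which writes its
result both to the static file and to <code>STDOUT</code>:</p>
<div class="example"><pre>
#!/path/to/perl
##
##  page.cgi -- regenerate page.html and answer the current request
##

$file = "/path/to/docroot/page.html";   # assumed location of the static page

# generate the page contents
$body  = "<html><body>\n";
$body .= "<h1>Generated at " . scalar(localtime) . "</h1>\n";
$body .= "</body></html>\n";

# store it as the static file, so that subsequent requests
# are served directly from the filesystem again
open(FILE, ">$file") || die "cannot write $file: $!";
print FILE $body;
close(FILE);

# ...and deliver it for the current request, too
print "Content-type: text/html\n\n";
print $body;
</pre></div>
</dd>
</dl>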