load-balancing is to use the program
<code>lbnamed</code>, which can be found at <a href="http://www.stanford.edu/~schemers/docs/lbnamed/lbnamed.html">
http://www.stanford.edu/~schemers/docs/lbnamed/lbnamed.html</a>.
It is a Perl 5 program which, in conjunction with auxiliary
tools, provides real load-balancing for
DNS.</p>
</li>
<li>
<strong>Proxy Throughput Round-Robin</strong>
<p>In this variant we use <code class="module"><a href="../mod/mod_rewrite.html">mod_rewrite</a></code>
and its proxy throughput feature. First we dedicate
<code>www0.foo.com</code> to be actually
<code>www.foo.com</code> by using a single</p>
<div class="example"><pre>
www IN CNAME www0.foo.com.
</pre></div>
<p>entry in the DNS. Then we convert
<code>www0.foo.com</code> to a proxy-only server,
i.e. we configure this machine so all arriving URLs
are just pushed through the internal proxy to one of
the 5 other servers (<code>www1-www5</code>). To
accomplish this we first establish a ruleset which
contacts a load balancing script <code>lb.pl</code>
for all URLs.</p>
<div class="example"><pre>
RewriteEngine on
RewriteMap lb prg:/path/to/lb.pl
RewriteRule ^/(.+)$ ${lb:$1} [P,L]
</pre></div>
<p>Then we write <code>lb.pl</code>:</p>
<div class="example"><pre>
#!/path/to/perl
##
## lb.pl -- load balancing script
##
$| = 1;
$name = "www"; # the hostname base
$first = 1; # the first server (not 0 here, because 0 is myself)
$last = 5; # the last server in the round-robin
$domain = "foo.dom"; # the domainname
$cnt = 0;
while (<STDIN>) {
$cnt = (($cnt+1) % ($last+1-$first));
$server = sprintf("%s%d.%s", $name, $cnt+$first, $domain);
print "http://$server/$_";
}
##EOF##
</pre></div>
<div class="note">A final remark: why is this useful? Doesn't
<code>www0.foo.com</code> remain overloaded? Yes, it does, but
with plain proxy throughput requests only! All SSI, CGI, ePerl,
etc. processing is done entirely on the other machines.
This is the essential point.</div>
</li>
<li>
<strong>Hardware/TCP Round-Robin</strong>
<p>There is a hardware solution available, too. Cisco
has a beast called LocalDirector which does load
balancing at the TCP/IP level. Actually it is a
sort of circuit-level gateway in front of a
web cluster. If you have enough money and really need
a high-performance solution, use this one.</p>
</li>
</ol>
</dd>
</dl>
<h3>New MIME-type, New Service</h3>
<dl>
<dt>Description:</dt>
<dd>
<p>On the net there are a lot of nifty CGI programs. But
their usage is usually boring, so a lot of webmasters
don't use them. Even Apache's Action handler feature for
MIME-types is only appropriate when the CGI programs
don't need special URLs (actually <code>PATH_INFO</code>
and <code>QUERY_STRINGS</code>) as their input. First,
let us configure a new file type with extension
<code>.scgi</code> (for secure CGI) which will be processed
by the popular <code>cgiwrap</code> program. The problem
here is that if, for instance, we use a Homogeneous URL Layout
(see above), a file inside the user home directories has the URL
<code>/u/user/foo/bar.scgi</code>, but
<code>cgiwrap</code> needs the URL in the form
<code>/~user/foo/bar.scgi/</code>. The following rule
solves the problem:</p>
<div class="example"><pre>
RewriteRule ^/[uge]/<strong>([^/]+)</strong>/\.www/(.+)\.scgi(.*) ...
... /internal/cgi/user/cgiwrap/~<strong>$1</strong>/$2.scgi$3 [NS,<strong>T=application/x-http-cgi</strong>]
</pre></div>
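<p>The substitution this rule performs can be sketched outside
Apache with an equivalent <code>sed</code> expression (sed syntax,
not Apache syntax; the sample path is hypothetical and assumes the
Homogeneous URL Layout rules have already inserted the physical
<code>.www</code> component):</p>

```shell
# Equivalent of the RewriteRule pattern, expressed as a sed substitution:
echo '/u/user/.www/foo/bar.scgi' |
  sed -E 's|^/[uge]/([^/]+)/\.www/(.+)\.scgi(.*)$|/internal/cgi/user/cgiwrap/~\1/\2.scgi\3|'
# → /internal/cgi/user/cgiwrap/~user/foo/bar.scgi
```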
<p>Or assume we have some more nifty programs:
<code>wwwlog</code> (which displays the
<code>access.log</code> for a URL subtree) and
<code>wwwidx</code> (which runs Glimpse on a URL
subtree). We have to provide the URL area to these
programs so they know which area they have to act on.
But this is usually ugly, because they are still
requested from those areas, i.e. typically we would
run the <code>wwwidx</code> program from within
<code>/u/user/foo/</code> via a hyperlink to</p>
<div class="example"><pre>
/internal/cgi/user/wwwidx?i=/u/user/foo/
</pre></div>
<p>which is ugly, because we have to hard-code
<strong>both</strong> the location of the area
<strong>and</strong> the location of the CGI inside the
hyperlink. Whenever we have to reorganize the area, we spend a
lot of time changing the various hyperlinks.</p>
</dd>
<dt>Solution:</dt>
<dd>
<p>The solution here is to provide a special new URL format
which automatically leads to the proper CGI invocation.
We configure the following:</p>
<div class="example"><pre>
RewriteRule ^/([uge])/([^/]+)(/?.*)/\* /internal/cgi/user/wwwidx?i=/$1/$2$3/
RewriteRule ^/([uge])/([^/]+)(/?.*):log /internal/cgi/user/wwwlog?f=/$1/$2$3
</pre></div>
<p>Now the hyperlink to search at
<code>/u/user/foo/</code> reads only</p>
<div class="example"><pre>
HREF="*"
</pre></div>
<p>which internally gets automatically transformed to</p>
<div class="example"><pre>
/internal/cgi/user/wwwidx?i=/u/user/foo/
</pre></div>
<p>The same approach leads to an invocation for the
access log CGI program when the hyperlink
<code>:log</code> gets used.</p>
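<p>The two rewrites above can likewise be mimicked with
equivalent <code>sed</code> substitutions to see what a given
hyperlink expands to (sed syntax, hypothetical sample paths):</p>

```shell
# The ":log" rule, expressed as a sed substitution:
echo '/u/user/foo:log' |
  sed -E 's|^/([uge])/([^/]+)(/?.*):log$|/internal/cgi/user/wwwlog?f=/\1/\2\3|'
# → /internal/cgi/user/wwwlog?f=/u/user/foo

# The "*" rule works the same way:
echo '/u/user/foo/*' |
  sed -E 's|^/([uge])/([^/]+)(/?.*)/\*$|/internal/cgi/user/wwwidx?i=/\1/\2\3/|'
# → /internal/cgi/user/wwwidx?i=/u/user/foo/
```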
</dd>
</dl>
<h3>On-the-fly Content-Regeneration</h3>
<dl>
<dt>Description:</dt>
<dd>
<p>Here comes a really esoteric feature: dynamically
generated but statically served pages, i.e. pages that should be
delivered as purely static pages (read from the filesystem
and just passed through), but which have to be generated
dynamically by the webserver if they are missing. This way you can
have CGI-generated pages which are statically served unless
someone (or a cronjob) removes the static contents. Then the
contents get refreshed.</p>
</dd>
<dt>Solution:</dt>
<dd>
This is done via the following ruleset:
<div class="example"><pre>
RewriteCond %{REQUEST_FILENAME} <strong>!-s</strong>
RewriteRule ^page\.<strong>html</strong>$ page.<strong>cgi</strong> [T=application/x-httpd-cgi,L]
</pre></div>
<p>Here a request for <code>page.html</code> leads to an
internal run of the corresponding <code>page.cgi</code> if
<code>page.html</code> is missing or has a file size of
zero. The trick here is that <code>page.cgi</code> is a
normal CGI script which (in addition to its <code>STDOUT</code>)
writes its output to the file <code>page.html</code>.
Once it has run, the server sends out the data of
<code>page.html</code>. When the webmaster wants to force
a refresh of the contents, he just removes
<code>page.html</code> (usually done by a cronjob).</p>
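<p>The text above describes <code>page.cgi</code> without showing
it. A minimal sketch of what such a script could look like, here
as a plain shell CGI (file name, output path and page body are
assumptions for illustration):</p>

```shell
#!/bin/sh
# page.cgi -- hypothetical sketch: emit the page on STDOUT for the
# current request AND write the same content to page.html, so that
# later requests are served statically until page.html is removed.
OUT="page.html"                       # static copy the RewriteCond checks with !-s
BODY="<html><body>generated on $(date)</body></html>"
printf '%s\n' "$BODY" > "$OUT"        # refresh the static copy
printf 'Content-type: text/html\n\n'  # CGI header for this request
printf '%s\n' "$BODY"                 # body for this request
```

<p>Any real generator works the same way: produce the page once,
write it to the static filename, and hand the same bytes to the
client that triggered the regeneration.</p>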
</dd>
</dl>
<h3>Document With Autorefresh</h3>
<dl>
<dt>Description:</dt>
<dd>
<p>Wouldn't it be nice, while creating a complex webpage, if
the webbrowser automatically refreshed the page every
time we saved a new version from within our editor?
Impossible?</p>
</dd>
<dt>Solution:</dt>
<dd>
<p>No! We just combine the MIME multipart feature, the
webserver's NPH feature and the URL manipulation power of
<code class="module"><a href="../mod/mod_rewrite.html">mod_rewrite</a></code>. First, we establish a new
URL feature: adding just <code>:refresh</code> to any
URL causes the page to be refreshed every time it gets
updated on the filesystem.</p>
<div class="example"><pre>
RewriteRule ^(/[uge]/[^/]+/?.*):refresh /internal/cgi/apache/nph-refresh?f=$1
</pre></div>
<p>Now when we reference the URL</p>
<div class="example"><pre>
/u/foo/bar/page.html:refresh
</pre></div>
<p>this leads to the internal invocation of the URL</p>
<div class="example"><pre>
/internal/cgi/apache/nph-refresh?f=/u/foo/bar/page.html
</pre></div>
<p>The only missing part is the NPH-CGI script. Although
one would usually say "left as an exercise for the reader"
;-) I will provide this, too.</p>
<div class="example"><pre>
#!/sw/bin/perl
##
## nph-refresh -- NPH/CGI script for auto refreshing pages
## Copyright (c) 1997 Ralf S. Engelschall, All Rights Reserved.
##
$| = 1;
# split the QUERY_STRING variable
@pairs = split(/&/, $ENV{'QUERY_STRING'});
foreach $pair (@pairs) {
($name, $value) = split(/=/, $pair);
$name =~ tr/A-Z/a-z/;
$name = 'QS_' . $name;
$value =~ s/%([a-fA-F0-9][a-fA-F0-9])/pack("C", hex($1))/eg;
eval "\$$name = \"$value\"";
}
$QS_s = 1 if ($QS_s eq '');
$QS_n = 3600 if ($QS_n eq '');
if ($QS_f eq '') {
print "HTTP/1.0 200 OK\n";
print "Content-type: text/html\n\n";
print "&lt;b&gt;ERROR&lt;/b&gt;: No file given\n";
exit(0);
}
if (! -f $QS_f) {
print "HTTP/1.0 200 OK\n";
print "Content-type: text/html\n\n";
print "&lt;b&gt;ERROR&lt;/b&gt;: File $QS_f not found\n";
exit(0);
}
sub print_http_headers_multipart_begin {
print "HTTP/1.0 200 OK\n";
$bound = "ThisRandomString12345";
print "Content-type: multipart/x-mixed-replace;boundary=$bound\n";
&print_http_headers_multipart_next;
}
sub print_http_headers_multipart_next {
print "\n--$bound\n";
}
sub print_http_headers_multipart_end {
print "\n--$bound--\n";
}
sub displayhtml {
local($buffer) = @_;
$len = length($buffer);
print "Content-type: text/html\n";
print "Content-length: $len\n\n";
print $buffer;
}
sub readfile {
local($file) = @_;
local(*FP, $size, $buffer, $bytes);
($x, $x, $x, $x, $x, $x, $x, $size) = stat($file);
$size = sprintf("%d", $size);
open(FP, "&lt;$file");
$bytes = sysread(FP, $buffer, $size);
close(FP);
return $buffer;
}
$mtimeL = 0;
$mtime = 0;
&print_http_headers_multipart_begin;
while (1) {
    $mtime = ((stat($QS_f))[9]);
    if ($mtime ne $mtimeL) {
        $mtimeL = $mtime;
        $buffer = &readfile($QS_f);
        &displayhtml($buffer);
        last if ($QS_s == 1);
        sleep($QS_n);
    }
    sleep(2);
}
&print_http_headers_multipart_end;
exit(0);
##EOF##
</pre></div>
</dd>
</dl>