<code>foo.cgi</code> in a seamless way, i.e. without the
browser/user noticing.</p>
</dd>
<dt>Solution:</dt>
<dd>
<p>We just rewrite the URL to the CGI-script and force the
correct MIME-type so that it really is run as a CGI-script.
This way a request for <code>/~quux/foo.html</code>
internally leads to the invocation of
<code>/~quux/foo.cgi</code>.</p>
<div class="example"><pre>
RewriteEngine on
RewriteBase /~quux/
RewriteRule ^foo\.<strong>html</strong>$ foo.<strong>cgi</strong> [T=<strong>application/x-httpd-cgi</strong>]
</pre></div>
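<p>The same mapping can also be done from the main server
configuration instead of a per-directory <code>.htaccess</code>
file. A minimal sketch (no <code>RewriteBase</code> is needed
there):</p>
<div class="example"><pre>
RewriteEngine on
RewriteRule ^/~quux/foo\.html$ /~quux/foo.cgi [T=application/x-httpd-cgi]
</pre></div>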
</dd>
</dl>
<h3>On-the-fly Content-Regeneration</h3>
<dl>
<dt>Description:</dt>
<dd>
<p>Here comes a really esoteric feature: dynamically
generated but statically served pages, i.e. pages that should be
delivered as pure static pages (read from the filesystem
and just passed through), but which have to be generated
dynamically by the webserver if they are missing. This way you can
have CGI-generated pages which are statically served unless
someone (or a cronjob) removes the static content, at which
point the content gets regenerated.</p>
</dd>
<dt>Solution:</dt>
<dd>
<p>This is done via the following ruleset:</p>
<div class="example"><pre>
RewriteCond %{REQUEST_FILENAME} <strong>!-s</strong>
RewriteRule ^page\.<strong>html</strong>$ page.<strong>cgi</strong> [T=application/x-httpd-cgi,L]
</pre></div>
<p>Here a request for <code>page.html</code> leads to an
internal run of a corresponding <code>page.cgi</code> if
<code>page.html</code> is missing or has a filesize of
zero (which is what the <code>!-s</code> test above checks).
The trick here is that <code>page.cgi</code> is a
usual CGI script which, in addition to writing to its
<code>STDOUT</code>, writes its output to the file
<code>page.html</code>. Once it has run, the server sends
out the data of <code>page.html</code>. When the webmaster
wants to force a refresh of the contents, they just remove
<code>page.html</code> (usually done by a cronjob).</p>
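<p>The <code>page.cgi</code> itself is not part of the
ruleset. A minimal sketch of such a script in Perl (the
generated contents below are just a placeholder; a real
script would build the real page) could look like this:</p>
<div class="example"><pre>
#!/sw/bin/perl
##
##  page.cgi -- sketch only: answer the request and regenerate page.html
##
$| = 1;

# build the page contents (placeholder body for illustration)
$buffer = "&lt;html&gt;&lt;body&gt;Generated at " . scalar(localtime) . "&lt;/body&gt;&lt;/html&gt;\n";

# write the static copy next to the script, so subsequent requests
# are served directly from the filesystem until page.html is removed
open(FP, "&gt;page.html") || die "cannot write page.html: $!";
print FP $buffer;
close(FP);

# and answer the current request as a normal CGI response on STDOUT
print "Content-type: text/html\n\n";
print $buffer;
</pre></div>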
</dd>
</dl>
<h3>Document With Autorefresh</h3>
<dl>
<dt>Description:</dt>
<dd>
<p>Wouldn't it be nice while creating a complex webpage if
the webbrowser would automatically refresh the page every
time we write a new version from within our editor?
Impossible?</p>
</dd>
<dt>Solution:</dt>
<dd>
<p>No! We just combine the MIME multipart feature, the
webserver NPH feature and the URL manipulation power of
<code class="module"><a href="../mod/mod_rewrite.html">mod_rewrite</a></code>. First, we establish a new
URL feature: adding just <code>:refresh</code> to any
URL causes the page to be refreshed in the browser every
time it gets updated on the filesystem.</p>
<div class="example"><pre>
RewriteRule ^(/[uge]/[^/]+/?.*):refresh /internal/cgi/apache/nph-refresh?f=$1
</pre></div>
<p>Now when we reference the URL</p>
<div class="example"><pre>
/u/foo/bar/page.html:refresh
</pre></div>
<p>this leads to the internal invocation of the URL</p>
<div class="example"><pre>
/internal/cgi/apache/nph-refresh?f=/u/foo/bar/page.html
</pre></div>
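<p>The NPH script shown below also understands a polling
interval (query parameter <code>s</code>, in seconds) and a
maximum number of polling rounds (<code>n</code>). A variant
of the rule which passes the interval along (the
<code>:refresh=NNN</code> suffix is an addition for
illustration, not part of the feature described above) could
look like this:</p>
<div class="example"><pre>
RewriteRule ^(/[uge]/[^/]+/?.*):refresh=(\d+)$ /internal/cgi/apache/nph-refresh?f=$1&s=$2
</pre></div>
<p>Such a rule would have to be placed in front of the
general <code>:refresh</code> rule so that the more specific
form is matched first.</p>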
<p>The only missing part is the NPH-CGI script. Although
one would usually say "left as an exercise for the reader"
;-) I will provide this, too.</p>
<div class="example"><pre>
#!/sw/bin/perl
##
## nph-refresh -- NPH/CGI script for auto refreshing pages
## Copyright (c) 1997 Ralf S. Engelschall, All Rights Reserved.
##
$| = 1;
# split the QUERY_STRING variable
@pairs = split(/&/, $ENV{'QUERY_STRING'});
foreach $pair (@pairs) {
($name, $value) = split(/=/, $pair);
$name =~ tr/A-Z/a-z/;
$name = 'QS_' . $name;
$value =~ s/%([a-fA-F0-9][a-fA-F0-9])/pack("C", hex($1))/eg;
eval "\$$name = \"$value\"";
}
$QS_s = 1 if ($QS_s eq '');
$QS_n = 3600 if ($QS_n eq '');
if ($QS_f eq '') {
print "HTTP/1.0 200 OK\n";
print "Content-type: text/html\n\n";
print "&lt;b&gt;ERROR&lt;/b&gt;: No file given\n";
exit(0);
}
if (! -f $QS_f) {
print "HTTP/1.0 200 OK\n";
print "Content-type: text/html\n\n";
print "&lt;b&gt;ERROR&lt;/b&gt;: File $QS_f not found\n";
exit(0);
}
# helper subroutines for the multipart/x-mixed-replace ("server push") response
sub print_http_headers_multipart_begin {
print "HTTP/1.0 200 OK\n";
$bound = "ThisRandomString12345";
print "Content-type: multipart/x-mixed-replace;boundary=$bound\n";
&print_http_headers_multipart_next;
}
sub print_http_headers_multipart_next {
print "\n--$bound\n";
}
sub print_http_headers_multipart_end {
print "\n--$bound--\n";
}
sub displayhtml {
local($buffer) = @_;
$len = length($buffer);
print "Content-type: text/html\n";
print "Content-length: $len\n\n";
print $buffer;
}
sub readfile {
local($file) = @_;
local(*FP, $size, $buffer, $bytes);
($x, $x, $x, $x, $x, $x, $x, $size) = stat($file);
$size = sprintf("%d", $size);
open(FP, "&lt;$file");
$bytes = sysread(FP, $buffer, $size);
close(FP);
return $buffer;
}
# send the initial copy of the page
$buffer = &readfile($QS_f);
&print_http_headers_multipart_begin;
&displayhtml($buffer);
sub mystat {
local($file) = $_[0];
local($time);
($x, $x, $x, $x, $x, $x, $x, $x, $x, $mtime) = stat($file);
return $mtime;
}
# remember the initial modification time of the page
$mtimeL = &mystat($QS_f);
$mtime = $mtimeL;
# poll the file every $QS_s seconds; whenever its modification time
# changes, push a fresh copy to the browser (for at most $QS_n rounds)
for ($n = 0; $n &lt; $QS_n; $n++) {
while (1) {
$mtime = &mystat($QS_f);
if ($mtime ne $mtimeL) {
$mtimeL = $mtime;
sleep(2);
$buffer = &readfile($QS_f);
&print_http_headers_multipart_next;
&displayhtml($buffer);
sleep(5);
$mtimeL = &mystat($QS_f);
last;
}
sleep($QS_s);
}
}
&print_http_headers_multipart_end;
exit(0);
##EOF##
</pre></div>
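<p>For the rule above to work, the URL prefix
<code>/internal/cgi/apache/</code> of course has to be mapped
to a real CGI directory. A minimal sketch (the filesystem
path is an assumption) using the <code>ScriptAlias</code>
directive of mod_alias:</p>
<div class="example"><pre>
# sketch only: expose the NPH script under the internal URL prefix
ScriptAlias /internal/cgi/apache/ /path/to/internal/cgi/apache/
</pre></div>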
</dd>
</dl>
<h3>Mass Virtual Hosting</h3>
<dl>
<dt>Description:</dt>
<dd>
<p>The <code class="directive"><a href="../mod/core.html#virtualhost">&lt;VirtualHost&gt;</a></code> feature of Apache is nice
and works great when you just have a few dozen
virtual hosts. But when you are an ISP and have hundreds of
virtual hosts to provide, it is not the best
choice.</p>
</dd>
<dt>Solution:</dt>
<dd>
<p>To provide this feature we map the requested hostname to
the corresponding <code>DocumentRoot</code> with the help of
a <code>RewriteMap</code>. First we list the available
virtual hosts and their document roots in a map file:</p>
<div class="example"><pre>
##
## vhost.map
##
www.vhost1.dom:80 /path/to/docroot/vhost1
www.vhost2.dom:80 /path/to/docroot/vhost2
:
www.vhostN.dom:80 /path/to/docroot/vhostN
</pre></div>
<div class="example"><pre>
##
## httpd.conf
##
:
# use the canonical hostname on redirects, etc.
UseCanonicalName on
:
# add the virtual host in front of the CLF-format
CustomLog /path/to/access_log "%{VHOST}e %h %l %u %t \"%r\" %>s %b"
:
# enable the rewriting engine in the main server
RewriteEngine on
# define two maps: one for lowercasing the hostname and one which
# defines the available virtual hosts with their corresponding
# DocumentRoot.
RewriteMap lowercase int:tolower
RewriteMap vhost txt:/path/to/vhost.map
# Now do the actual virtual host mapping
# via a huge and complicated single rule:
#
# 1. make sure we don't map for common locations
RewriteCond %{REQUEST_URI} !^/commonurl1/.*
RewriteCond %{REQUEST_URI} !^/commonurl2/.*
:
RewriteCond %{REQUEST_URI} !^/commonurlN/.*
#
# 2. make sure we have a Host header, because
# currently our approach only supports
# virtual hosting through this header
RewriteCond %{HTTP_HOST} !^$
#
# 3. lowercase the hostname
RewriteCond ${lowercase:%{HTTP_HOST}|NONE} ^(.+)$
#
# 4. lookup this hostname in vhost.map and
# remember it only when it is a path
# (and not "NONE" from above)
RewriteCond ${vhost:%1} ^(/.*)$
#
# 5. finally we can map the URL to its docroot location
# and remember the virtual host for logging purposes
RewriteRule ^/(.*)$ %1/$1 [E=VHOST:${lowercase:%{HTTP_HOST}}]
:
</pre></div>
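<p>If every virtual host also needs its own
<code>cgi-bin</code> directory, the same map can be consulted
again. A sketch of such an addition (the <code>cgi-bin</code>
layout below the docroot is an assumption), which would have
to be placed in front of the general mapping rule:</p>
<div class="example"><pre>
# map CGI requests to a per-host cgi-bin below the looked-up docroot
RewriteCond %{HTTP_HOST} !^$
RewriteCond ${lowercase:%{HTTP_HOST}|NONE} ^(.+)$
RewriteCond ${vhost:%1} ^(/.*)$
RewriteRule ^/cgi-bin/(.*)$ %1/cgi-bin/$1 [T=application/x-httpd-cgi]
</pre></div>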
</dd>
</dl>
</div><div class="top"><a href="#page-header"><img alt="top" src="../images/up.gif" /></a></div>
<div class="section">
<h2><a name="access" id="access">Access Restriction</a></h2>
<h3>Blocking of Robots</h3>
<dl>
<dt>Description:</dt>
<dd>
<p>How can we block a really annoying robot from
retrieving pages of a specific webarea? A
<code>/robots.txt</code> file containing entries of the
"Robot Exclusion Protocol" is typically not enough to get
rid of such a robot.</p>
</dd>
<dt>Solution:</dt>
<dd>
<p>We use a ruleset which forbids the URLs of the webarea
<code>/~quux/foo/arc/</code> (perhaps a very deep,
directory-indexed area whose traversal by the robot would
create a big server load). We have to make sure that we
forbid access only to this particular robot, i.e. just
forbidding the host the robot runs on is not enough,
because that would block users from this host, too. We
accomplish this by also matching the <code>User-Agent</code>
HTTP header.</p>
<div class="example"><pre>
RewriteCond %{HTTP_USER_AGENT} ^<strong>NameOfBadRobot</strong>.*
RewriteCond %{REMOTE_ADDR} ^<strong>123\.45\.67\.[8-9]</strong>$
RewriteRule ^<strong>/~quux/foo/arc/</strong>.+ - [<strong>F</strong>]
</pre></div>
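<p>If several robots have to be kept out, the
<code>User-Agent</code> conditions can simply be chained with
the <code>[OR]</code> flag (the robot names below are
placeholders):</p>
<div class="example"><pre>
RewriteCond %{HTTP_USER_AGENT} ^NameOfBadRobot [OR]
RewriteCond %{HTTP_USER_AGENT} ^AnotherBadRobot [OR]
RewriteCond %{HTTP_USER_AGENT} ^YetAnotherBadRobot
RewriteRule ^/~quux/foo/arc/.+ - [F]
</pre></div>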
</dd>
</dl>
<h3>Blocked Inline-Images</h3>
<dl>
<dt>Description:</dt>
<dd>
<p>Assume we have under <code>http://www.quux-corp.de/~quux/</code>
some pages with inlined GIF graphics. These graphics are
nice, so others directly incorporate them via hyperlinks into
their own pages. We don't like this practice because it adds
useless traffic to our server.</p>
</dd>
<dt>Solution:</dt>
<dd>
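<p>A minimal sketch of such a ruleset, assuming we only want
to serve the GIF files when the <code>Referer</code> header
is either empty or points to our own pages (hostname as in
the description above), could look like this:</p>
<div class="example"><pre>
RewriteCond %{HTTP_REFERER} !^$
RewriteCond %{HTTP_REFERER} !^http://www.quux-corp.de/~quux/.*$ [NC]
RewriteRule .*\.gif$ - [F]
</pre></div>
</dd>
</dl>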