internal run of a corresponding <code>page.cgi</code> if <code>page.html</code> is still missing or has a filesize of zero. The trick here is that <code>page.cgi</code> is an ordinary CGI script which, in addition to sending its output to <code>STDOUT</code>, also writes it to the file <code>page.html</code>. Once it has run, the server delivers the contents of <code>page.html</code>. When the webmaster wants to force a refresh of the contents, he just removes <code>page.html</code> (usually done by a cron job).</p>
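<p>A minimal <code>page.cgi</code> might look like the following sketch. This is only an illustration, not part of the original recipe: the filename, the path and the way the content is produced are placeholders, and any CGI program that writes the same bytes both to <code>STDOUT</code> and to <code>page.html</code> will do.</p>
<div class="example"><pre>
#!/sw/bin/perl
##
##  page.cgi -- hypothetical sketch: answer the request and
##              regenerate the static copy page.html
##
$| = 1;

#   produce the page body somehow (a trivial placeholder here)
$content = "&lt;html&gt;&lt;body&gt;generated at " . scalar(localtime) . "&lt;/body&gt;&lt;/html&gt;\n";

#   1. answer the current request via STDOUT ...
print "Content-type: text/html\n\n";
print $content;

#   2. ... and additionally store the result as page.html, so the
#      rewrite ruleset serves the static copy on subsequent requests
#      (adjust the path to wherever page.html actually lives)
open(FP, ">page.html") || exit(0);
print FP $content;
close(FP);

exit(0);
</pre></div>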
</dd> </dl>

<h3>Document With Autorefresh</h3>

<dl>
<dt>Description:</dt>
<dd>
<p>Wouldn't it be nice, while creating a complex webpage, if the web browser automatically refreshed the page every time we save a new version from within our editor? Impossible?</p>
</dd>

<dt>Solution:</dt>
<dd>
<p>No! We just combine the MIME multipart feature, the NPH feature of the webserver and the URL manipulation power of <code class="module"><a href="../mod/mod_rewrite.html">mod_rewrite</a></code>. First, we establish a new URL feature: appending <code>:refresh</code> to any URL causes the page to be refreshed every time the underlying file is updated on the filesystem.</p>
<div class="example"><pre>
RewriteRule   ^(/[uge]/[^/]+/?.*):refresh   /internal/cgi/apache/nph-refresh?f=$1
</pre></div>

<p>Now when we reference the URL</p>
<div class="example"><pre>
/u/foo/bar/page.html:refresh
</pre></div>

<p>this leads to the internal invocation of the URL</p>
<div class="example"><pre>
/internal/cgi/apache/nph-refresh?f=/u/foo/bar/page.html
</pre></div>

<p>The only missing part is the NPH-CGI script. Although one would usually say "left as an exercise for the reader" ;-) I will provide this, too.</p>
<div class="example"><pre>
#!/sw/bin/perl
##
##  nph-refresh -- NPH/CGI script for auto refreshing pages
##  Copyright (c) 1997 Ralf S. Engelschall, All Rights Reserved.
##
$| = 1;

#   split the QUERY_STRING variable
@pairs = split(/&/, $ENV{'QUERY_STRING'});
foreach $pair (@pairs) {
    ($name, $value) = split(/=/, $pair);
    $name =~ tr/A-Z/a-z/;
    $name = 'QS_' . $name;
    $value =~ s/%([a-fA-F0-9][a-fA-F0-9])/pack("C", hex($1))/eg;
    eval "\$$name = \"$value\"";
}
$QS_s = 1 if ($QS_s eq '');
$QS_n = 3600 if ($QS_n eq '');
if ($QS_f eq '') {
    print "HTTP/1.0 200 OK\n";
    print "Content-type: text/html\n\n";
    print "&lt;b&gt;ERROR&lt;/b&gt;: No file given\n";
    exit(0);
}
if (! -f $QS_f) {
    print "HTTP/1.0 200 OK\n";
    print "Content-type: text/html\n\n";
    print "&lt;b&gt;ERROR&lt;/b&gt;: File $QS_f not found\n";
    exit(0);
}

sub print_http_headers_multipart_begin {
    print "HTTP/1.0 200 OK\n";
    $bound = "ThisRandomString12345";
    print "Content-type: multipart/x-mixed-replace;boundary=$bound\n";
    &print_http_headers_multipart_next;
}

sub print_http_headers_multipart_next {
    print "\n--$bound\n";
}

sub print_http_headers_multipart_end {
    print "\n--$bound--\n";
}

sub displayhtml {
    local($buffer) = @_;
    $len = length($buffer);
    print "Content-type: text/html\n";
    print "Content-length: $len\n\n";
    print $buffer;
}

sub readfile {
    local($file) = @_;
    local(*FP, $size, $buffer, $bytes);
    ($x, $x, $x, $x, $x, $x, $x, $size) = stat($file);
    $size = sprintf("%d", $size);
    open(FP, "&lt;$file");
    $bytes = sysread(FP, $buffer, $size);
    close(FP);
    return $buffer;
}

$buffer = &readfile($QS_f);
&print_http_headers_multipart_begin;
&displayhtml($buffer);

sub mystat {
    local($file) = $_[0];
    local($mtime);
    ($x, $x, $x, $x, $x, $x, $x, $x, $x, $mtime) = stat($file);
    return $mtime;
}

$mtimeL = &mystat($QS_f);
$mtime  = $mtimeL;
for ($n = 0; $n &lt; $QS_n; $n++) {
    while (1) {
        $mtime = &mystat($QS_f);
        if ($mtime ne $mtimeL) {
            $mtimeL = $mtime;
            sleep(2);
            $buffer = &readfile($QS_f);
            &print_http_headers_multipart_next;
            &displayhtml($buffer);
            sleep(5);
            $mtimeL = &mystat($QS_f);
            last;
        }
        sleep($QS_s);
    }
}

&print_http_headers_multipart_end;

exit(0);

##EOF##
</pre></div>
</dd>
</dl>

<h3>Mass Virtual Hosting</h3>

<dl>
<dt>Description:</dt>
<dd>
<p>The <code class="directive"><a href="../mod/core.html#virtualhost">&lt;VirtualHost&gt;</a></code> feature of Apache is nice and works great when you just have a few dozen virtual hosts. But when you are an ISP and have hundreds of virtual hosts to provide, it is not the best choice.</p>
</dd>

<dt>Solution:</dt>
<dd>
<p>To provide this feature we map the requested hostname onto the corresponding document root with the help of two rewrite maps: a plain-text map file which lists the available virtual hosts together with their document roots, and the internal <code>tolower</code> map for canonicalizing the hostname:</p>
<div class="example"><pre>
##
##  vhost.map
##
www.vhost1.dom:80  /path/to/docroot/vhost1
www.vhost2.dom:80  /path/to/docroot/vhost2
     :
www.vhostN.dom:80  /path/to/docroot/vhostN
</pre></div>

<div class="example"><pre>
##
##  httpd.conf
##
    :
#   use the canonical hostname on redirects, etc.
UseCanonicalName on

    :
#   add the virtual host in front of the CLF-format
CustomLog  /path/to/access_log  "%{VHOST}e %h %l %u %t \"%r\" %>s %b"
    :

#   enable the rewriting engine in the main server
RewriteEngine on

#   define two maps: one for fixing the URL and one which defines
#   the available virtual hosts with their corresponding
#   DocumentRoot.
RewriteMap    lowercase    int:tolower
RewriteMap    vhost        txt:/path/to/vhost.map

#   Now do the actual virtual host mapping
#   via a huge and complicated single rule:
#
#   1. make sure we don't map for common locations
RewriteCond   %{REQUEST_URI}  !^/commonurl1/.*
RewriteCond   %{REQUEST_URI}  !^/commonurl2/.*
     :
RewriteCond   %{REQUEST_URI}  !^/commonurlN/.*
#
#   2. make sure we have a Host header, because
#      currently our approach only supports
#      virtual hosting through this header
RewriteCond   %{HTTP_HOST}  !^$
#
#   3. lowercase the hostname
RewriteCond   ${lowercase:%{HTTP_HOST}|NONE}  ^(.+)$
#
#   4. lookup this hostname in vhost.map and
#      remember it only when it is a path
#      (and not "NONE" from above)
RewriteCond   ${vhost:%1}  ^(/.*)$
#
#   5. finally we can map the URL to its docroot location
#      and remember the virtual host for logging purposes
RewriteRule   ^/(.*)$   %1/$1   [E=VHOST:${lowercase:%{HTTP_HOST}}]
    :
</pre></div>
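<p>As a worked example (not part of the original recipe): assume a request for <code>/index.html</code> arrives whose <code>Host</code> header, after lowercasing, matches the map key <code>www.vhost1.dom:80</code>, and none of the common-URL exceptions apply. The conditions and the rule above would then resolve it roughly like this:</p>
<div class="example"><pre>
request:    GET /index.html    (Host: www.vhost1.dom:80)
lowercase:  www.vhost1.dom:80
vhost.map:  www.vhost1.dom:80  ->  /path/to/docroot/vhost1
result:     /path/to/docroot/vhost1/index.html   (VHOST=www.vhost1.dom:80)
</pre></div>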
</dd>
</dl>
</div>

<div class="top"><a href="#page-header"><img alt="top" src="../images/up.gif" /></a></div>

<div class="section"><h2><a name="access" id="access">Access Restriction</a></h2>

<h3>Blocking of Robots</h3>

<dl>
<dt>Description:</dt>
<dd>
<p>How can we block a really annoying robot from retrieving pages of a specific webarea? A <code>/robots.txt</code> file containing entries of the "Robot Exclusion Protocol" is typically not enough to get rid of such a robot.</p>
</dd>

<dt>Solution:</dt>
<dd>
<p>We use a ruleset which forbids the URLs of the webarea <code>/~quux/foo/arc/</code> (perhaps a very deep, directory-indexed area where robot traversal would create a big server load). We have to make sure that we forbid access only to the particular robot, i.e. just forbidding the host where the robot runs is not enough: that would block users from this host, too. We accomplish this by also matching the User-Agent HTTP header information.</p>
<div class="example"><pre>
RewriteCond %{HTTP_USER_AGENT}   ^<strong>NameOfBadRobot</strong>.*
RewriteCond %{REMOTE_ADDR}       ^<strong>123\.45\.67\.[8-9]</strong>$
RewriteRule ^<strong>/~quux/foo/arc/</strong>.+   -   [<strong>F</strong>]
</pre></div>
</dd>
</dl>

<h3>Blocked Inline-Images</h3>

<dl>
<dt>Description:</dt>
<dd>
<p>Assume we have under <code>http://www.quux-corp.de/~quux/</code> some pages with inlined GIF graphics. These graphics are nice, so others directly incorporate them via hyperlinks into their own pages. We don't like this practice because it adds useless traffic to our server.</p>
</dd>

<dt>Solution:</dt>
<dd>
<p>While we cannot 100% protect the images from inclusion, we can at least restrict it for the cases where the browser sends an HTTP Referer header.</p>
<div class="example"><pre>
RewriteCond %{HTTP_REFERER}   <strong>!^$</strong>
RewriteCond %{HTTP_REFERER}   !^http://www.quux-corp.de/~quux/.*$ [NC]
RewriteRule <strong>.*\.gif$</strong>   -   [F]
</pre></div>
<div class="example"><pre>
RewriteCond %{HTTP_REFERER}   !^$
RewriteCond %{HTTP_REFERER}   !.*/foo-with-gif\.html$
RewriteRule <strong>^inlined-in-foo\.gif$</strong>   -   [F]
</pre></div>
</dd>
</dl>

<h3>Host Deny</h3>

<dl>
<dt>Description:</dt>
<dd>
<p>How can we forbid a list of externally configured hosts from using our server?</p>
</dd>

<dt>Solution:</dt>
<dd>
<p>For Apache &gt;= 1.3b6:</p>
<div class="example"><pre>
RewriteEngine on
RewriteMap    hosts-deny   txt:/path/to/hosts.deny
RewriteCond   ${hosts-deny:%{REMOTE_HOST}|NOT-FOUND} !=NOT-FOUND [OR]
RewriteCond   ${hosts-deny:%{REMOTE_ADDR}|NOT-FOUND} !=NOT-FOUND
RewriteRule   ^/.*   -   [F]
</pre></div>

<p>For Apache &lt;= 1.3b6:</p>
<div class="example"><pre>
RewriteEngine on
RewriteMap    hosts-deny   txt:/path/to/hosts.deny
RewriteRule   ^/(.*)$           ${hosts-deny:%{REMOTE_HOST}|NOT-FOUND}/$1
RewriteRule   !^NOT-FOUND/.*    -   [F]
RewriteRule   ^NOT-FOUND/(.*)$  ${hosts-deny:%{REMOTE_ADDR}|NOT-FOUND}/$1
RewriteRule   !^NOT-FOUND/.*    -   [F]
RewriteRule   ^NOT-FOUND/(.*)$  /$1
</pre></div>

<div class="example"><pre>
##
##  hosts.deny
##
##  ATTENTION! This is a map, not a list, even when we treat it as such.
##             mod_rewrite parses it for key/value pairs, so at least a
##             dummy value "-" must be present for each entry.
##
193.