        </dd>
      </dl>

    

    <h3>Dynamic Mirror</h3>

      

      <dl>
        <dt>Description:</dt>

        <dd>
          <p>Assume there are nice webpages on remote hosts we want
          to bring into our namespace. For FTP servers we would use
          the <code>mirror</code> program, which maintains an
          explicit up-to-date copy of the remote data on the local
          machine. For a webserver we could use the program
          <code>webcopy</code>, which works similarly via HTTP. But both
          techniques have one major drawback: the local copy is
          only as up-to-date as the last time we ran the program. It
          would be much better if the mirror were not a static one that we
          have to refresh explicitly. Instead we want a dynamic
          mirror whose data gets updated automatically whenever
          the data on the remote host changes.</p>
        </dd>

        <dt>Solution:</dt>

        <dd>
          <p>To provide this feature we map the remote webpage, or even
          the complete remote web area, into our namespace using
          the <dfn>Proxy Throughput</dfn> feature
          (flag <code>[P]</code>):</p>

<div class="example"><pre>
RewriteEngine  on
RewriteBase    /~quux/
RewriteRule    ^<strong>hotsheet/</strong>(.*)$  <strong>http://www.tstimpreso.com/hotsheet/</strong>$1  [<strong>P</strong>]
</pre></div>

<div class="example"><pre>
RewriteEngine  on
RewriteBase    /~quux/
RewriteRule    ^<strong>usa-news\.html</strong>$   <strong>http://www.quux-corp.com/news/index.html</strong>  [<strong>P</strong>]
</pre></div>
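
          <p>For a fixed prefix mapping like the first example, the same
          effect can also be sketched with <code>mod_proxy</code> directives
          alone. This assumes that <code>mod_proxy</code> and
          <code>mod_proxy_http</code> are loaded, that the directives go in
          the server or virtual host configuration rather than a
          per-directory file, and that <code>/~quux/</code> is the URL prefix
          implied by the <code>RewriteBase</code> above.
          <code>ProxyPassReverse</code> additionally rewrites
          <code>Location</code> headers in redirects issued by the remote
          host, which the <code>[P]</code> flag alone does not do:</p>

<div class="example"><pre>
# Sketch only: placement and URL prefix are assumptions, see above
ProxyPass         /~quux/hotsheet/  http://www.tstimpreso.com/hotsheet/
ProxyPassReverse  /~quux/hotsheet/  http://www.tstimpreso.com/hotsheet/
</pre></div>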
        </dd>
      </dl>

    

    <h3>Reverse Dynamic Mirror</h3>

      

      <dl>
        <dt>Description:</dt>

        <dd>...</dd>

        <dt>Solution:</dt>

        <dd>
<div class="example"><pre>
RewriteEngine on
RewriteCond   /mirror/of/remotesite/$1           -U
RewriteRule   ^http://www\.remotesite\.com/(.*)$ /mirror/of/remotesite/$1
</pre></div>
        </dd>
      </dl>

    

    <h3>Retrieve Missing Data from Intranet</h3>

      

      <dl>
        <dt>Description:</dt>

        <dd>
          <p>This is a tricky way of virtually running a corporate
          (external) Internet webserver
          (<code>www.quux-corp.dom</code>), while actually keeping
          and maintaining its data on an (internal) Intranet webserver
          (<code>www2.quux-corp.dom</code>) which is protected by a
          firewall. The trick is that the external webserver
          retrieves the requested data on-the-fly from the internal
          one.</p>
        </dd>

        <dt>Solution:</dt>

        <dd>
          <p>First, we have to make sure that our firewall still
          protects the internal webserver and that only the
          external webserver is allowed to retrieve data from it.
          For a packet-filtering firewall we could for instance
          configure a firewall ruleset like the following:</p>

<div class="example"><pre>
<strong>ALLOW</strong> Host www.quux-corp.dom Port &gt;1024 --&gt; Host www2.quux-corp.dom Port <strong>80</strong>
<strong>DENY</strong>  Host *                 Port *     --&gt; Host www2.quux-corp.dom Port <strong>80</strong>
</pre></div>

          <p>Just adjust it to your actual configuration syntax.
          Now we can establish the <code class="module"><a href="../mod/mod_rewrite.html">mod_rewrite</a></code>
          rules which request the missing data in the background
          through the proxy throughput feature:</p>

<div class="example"><pre>
RewriteRule ^/~([^/]+)/?(.*)          /home/$1/.www/$2
RewriteCond %{REQUEST_FILENAME}       <strong>!-f</strong>
RewriteCond %{REQUEST_FILENAME}       <strong>!-d</strong>
RewriteRule ^/home/([^/]+)/\.www/?(.*) http://<strong>www2</strong>.quux-corp.dom/~$1/pub/$2 [<strong>P</strong>]
</pre></div>
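
          <p>To make the flow concrete, here is the same ruleset annotated
          with a trace of one request. The user name <code>alice</code>
          and the path <code>docs/a.html</code> are hypothetical values
          chosen for illustration only:</p>

<div class="example"><pre>
# Request:  /~alice/docs/a.html
# Step 1:   map the user URL to the local filesystem path
#           result: /home/alice/.www/docs/a.html
RewriteRule ^/~([^/]+)/?(.*)           /home/$1/.www/$2

# Step 2:   continue only if that path is neither a file nor a directory
RewriteCond %{REQUEST_FILENAME}        !-f
RewriteCond %{REQUEST_FILENAME}        !-d

# Step 3:   fetch the missing data from the internal server via the proxy
#           result: http://www2.quux-corp.dom/~alice/pub/docs/a.html
RewriteRule ^/home/([^/]+)/\.www/?(.*) http://www2.quux-corp.dom/~$1/pub/$2 [P]
</pre></div>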
        </dd>
      </dl>

    

    <h3>Load Balancing</h3>

      

      <dl>
        <dt>Description:</dt>

        <dd>
          <p>Suppose we want to load balance the traffic to
          <code>www.foo.com</code> over <code>www[0-5].foo.com</code>
          (a total of 6 servers). How can this be done?</p>
        </dd>

        <dt>Solution:</dt>

        <dd>
          <p>There are a lot of possible solutions for this problem.
          We will first discuss a commonly known DNS-based variant
          and then the special one with <code class="module"><a href="../mod/mod_rewrite.html">mod_rewrite</a></code>:</p>

          <ol>
            <li>
              <strong>DNS Round-Robin</strong>

              <p>The simplest method for load-balancing is to use
              the DNS round-robin feature of <code>BIND</code>.
              Here you just configure <code>www[0-5].foo.com</code>
              as usual in your DNS with A (address) records, e.g.</p>

<div class="example"><pre>
www0   IN  A       1.2.3.1
www1   IN  A       1.2.3.2
www2   IN  A       1.2.3.3
www3   IN  A       1.2.3.4
www4   IN  A       1.2.3.5
www5   IN  A       1.2.3.6
</pre></div>

              <p>Then you additionally add the following entry:</p>

<div class="example"><pre>
www    IN  CNAME   www0.foo.com.
       IN  CNAME   www1.foo.com.
       IN  CNAME   www2.foo.com.
       IN  CNAME   www3.foo.com.
       IN  CNAME   www4.foo.com.
       IN  CNAME   www5.foo.com.
</pre></div>

              <p>Notice that this seems wrong, but is actually an
              intended feature of <code>BIND</code> and can be used
              in this way. Now when <code>www.foo.com</code> gets
              resolved, <code>BIND</code> gives out <code>www0-www5</code>,
              but in a slightly permuted/rotated order every time.
              This way the clients are spread over the various
              servers. Note, however, that this is not a perfect load
              balancing scheme, because DNS resolution results
              get cached by other nameservers on the net, so
              once a client has resolved <code>www.foo.com</code>
              to a particular <code>wwwN.foo.com</code>, all
              its subsequent requests also go to that
              <code>wwwN.foo.com</code>. But the final result is
              acceptable, because the requests in total are still
              spread over the various webservers.</p>
            </li>

            <li>
              <strong>DNS Load-Balancing</strong>

              <p>A sophisticated DNS-based method for
              load-balancing is to use the program
              <code>lbnamed</code> which can be found at <a href="http://www.stanford.edu/~schemers/docs/lbnamed/lbnamed.html">
              http://www.stanford.edu/~schemers/docs/lbnamed/lbnamed.html</a>.
              It is a Perl 5 program which, in conjunction with auxiliary
              tools, provides real load balancing for
              DNS.</p>
            </li>

            <li>
              <strong>Proxy Throughput Round-Robin</strong>

              <p>In this variant we use <code class="module"><a href="../mod/mod_rewrite.html">mod_rewrite</a></code>
              and its proxy throughput feature. First we dedicate
              <code>www0.foo.com</code> to be actually
              <code>www.foo.com</code> by using a single</p>

<div class="example"><pre>
www    IN  CNAME   www0.foo.com.
</pre></div>

              <p>entry in the DNS. Then we convert
              <code>www0.foo.com</code> to a proxy-only server,
              i.e. we configure this machine so all arriving URLs
              are just pushed through the internal proxy to one of
              the 5 other servers (<code>www1-www5</code>). To
              accomplish this we first establish a ruleset which
              contacts a load balancing script <code>lb.pl</code>
              for all URLs.</p>

<div class="example"><pre>
RewriteEngine on
RewriteMap    lb      prg:/path/to/lb.pl
RewriteRule   ^/(.+)$ ${lb:$1}           [P,L]
</pre></div>

              <p>Then we write <code>lb.pl</code>:</p>

<div class="example"><pre>
#!/path/to/perl
##
##  lb.pl -- load balancing script
##

$| = 1;

$name   = "www";     # the hostname base
$first  = 1;         # the first server (not 0 here, because 0 is myself)
$last   = 5;         # the last server in the round-robin
$domain = "foo.com"; # the domainname

$cnt = 0;
while (&lt;STDIN&gt;) {
    $cnt = (($cnt+1) % ($last+1-$first));
    $server = sprintf("%s%d.%s", $name, $cnt+$first, $domain);
    print "http://$server/$_";
}

##EOF##
</pre></div>

              <div class="note">A final note: why is this useful? It seems that
              <code>www0.foo.com</code> is still overloaded. The
              answer is yes, it is, but only with plain proxy
              throughput requests! All SSI, CGI, ePerl, etc.
              processing is done entirely on the other machines.
              This is the essential point.</div>
            </li>

            <li>
              <strong>Hardware/TCP Round-Robin</strong>

              <p>There is a hardware solution available, too. Cisco
              has a beast called LocalDirector which does load
              balancing at the TCP/IP level. Actually this is some
              sort of circuit-level gateway in front of a
              webcluster. If you have enough money and really need
              a solution with high performance, use this one.</p>
            </li>
          </ol>
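
          <p>As a variation on the Proxy Throughput Round-Robin variant,
          <code>mod_rewrite</code> also offers the built-in
          <code>rnd:</code> map type, which picks one of several
          alternatives at random, so a plain text map can stand in for
          the external <code>lb.pl</code> helper. Note that this spreads
          requests randomly rather than in strict rotation. The following
          is a minimal sketch; the map file path
          <code>/path/to/lb-servers.txt</code> and the lookup key
          <code>backend</code> are illustrative names, not part of the
          original setup:</p>

<div class="example"><pre>
# /path/to/lb-servers.txt (hypothetical map file) contains the single line:
#   backend  www1|www2|www3|www4|www5

RewriteEngine on
RewriteMap    lb      rnd:/path/to/lb-servers.txt
RewriteRule   ^/(.+)$ http://${lb:backend}.foo.com/$1  [P,L]
</pre></div>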
        </dd>
      </dl>

    

    <h3>New MIME-type, New Service</h3>

      

      <dl>
        <dt>Description:</dt>

        <dd>
          <p>On the net there are a lot of nifty CGI programs. But
          their usage is usually tedious, so a lot of webmasters
          don't use them. Even Apache's Action handler feature for
          MIME-types is only appropriate when the CGI programs
          don't need special URLs (actually <code>PATH_INFO</code>
          and <code>QUERY_STRING</code>) as their input. First,
          let us configure a new file type with extension
          <code>.scgi</code> (for secure CGI) which will be processed
          by the popular <code>cgiwrap</code> program. The problem
          here is that if, for instance, we use a Homogeneous URL Layout
          (see above), a file inside the user homedirs has the URL
          <code>/u/user/foo/bar.scgi</code>. But
          <code>cgiwrap</code> needs the URL in the form
          <code>/~user/foo/bar.scgi/</code>. The following rule
          solves the problem:</p>

<div class="example"><pre>
RewriteRule ^/[uge]/<strong>([^/]+)</strong>/\.www/(.+)\.scgi(.*) ...
... /internal/cgi/user/cgiwrap/~<strong>$1</strong>/$2.scgi$3  [NS,<strong>T=application/x-http-cgi</strong>]
</pre></div>

          <p>Or assume we have some more nifty programs:
          <code>wwwlog</code> (which displays the
          <code>access.log</code> for a URL subtree) and
          <code>wwwidx</code> (which runs Glimpse on a URL
          subtree). We have to provide the URL area to these
          programs so they know which area they have to act on.
          But usually this is ugly, because they are nearly always
          requested from within those very areas, i.e. typically we would
          run the <code>wwwidx</code> program from within
          <code>/u/user/foo/</code> via a hyperlink to</p>

<div class="example"><pre>
/internal/cgi/user/wwwidx?i=/u/user/foo/
</pre></div>

          <p>which is ugly because we have to hard-code
          <strong>both</strong> the location of the area
          <strong>and</strong> the location of the CGI inside the
          hyperlink. When we have to reorganize the area, we spend a
          lot of time changing the various hyperlinks.</p>
        </dd>

        <dt>Solution:</dt>

        <dd>
          <p>The solution here is to provide a special new URL format
          which automatically leads to the proper CGI invocation.
          We configure the following:</p>

<div class="example"><pre>
RewriteRule   ^/([uge])/([^/]+)(/?.*)/\*  /internal/cgi/user/wwwidx?i=/$1/$2$3/
RewriteRule   ^/([uge])/([^/]+)(/?.*):log /internal/cgi/user/wwwlog?f=/$1/$2$3
</pre></div>

          <p>Now the hyperlink to search at
          <code>/u/user/foo/</code> reads only</p>

<div class="example"><pre>
HREF="*"
</pre></div>

          <p>which internally gets automatically transformed to</p>

<div class="example"><pre>
/internal/cgi/user/wwwidx?i=/u/user/foo/
</pre></div>

          <p>The same approach leads to an invocation for the
          access log CGI program when the hyperlink
          <code>:log</code> gets used.</p>
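
          <p>As an illustration of that second case, the following
          annotated copy of the rule traces one request. It assumes that a
          hyperlink <code>:log</code> on a page under
          <code>/u/user/foo/</code> is resolved by the browser to
          <code>/u/user/foo/:log</code>:</p>

<div class="example"><pre>
# Requested URL:  /u/user/foo/:log
# Captures:       $1 = "u", $2 = "user", $3 = "/foo/"
RewriteRule   ^/([uge])/([^/]+)(/?.*):log /internal/cgi/user/wwwlog?f=/$1/$2$3
# Internal result:
#   /internal/cgi/user/wwwlog?f=/u/user/foo/
</pre></div>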
        </dd>
      </dl>

    

    <h3>From Static to Dynamic</h3>

      

      <dl>
        <dt>Description:</dt>

        <dd>
          <p>How can we transform a static page
          <code>foo.html</code> into a dynamic variant