application would then be responsible for transparently embedding special identifying information (such as a query string) into all the hyperlinks within the requested HTML document and returning the newly created content to the browser.</p>
<p>Let's look at how we're actually going to implement the application. It's only a two-step process. To reiterate, the problem we're trying to solve is to determine which documents a particular user requests and how much time he or she spends viewing them. First, we need to identify the set of documents for which we want to obtain the users' browsing history. Once we do that, we simply move these documents to a specific directory under the web server's document root directory.</p>
<p>Next, we need to configure the web server to execute a CGI application each and every time a user requests a document from this directory. We'll use the Apache web server for this example, but the configuration details are very similar for other web servers as well.</p>
<p>We simply need to insert the following directives into <a name="INDEX-2224" /><a name="INDEX-2225" />Apache's access configuration file, <em class="emphasis">access.conf</em>:</p>
<blockquote><pre class="code">&lt;Directory /usr/local/apache/htdocs/store&gt;
    AddType text/html   .html
    AddType Tracker     .html
    Action  Tracker     /cgi/track.cgi
&lt;/Directory&gt;</pre></blockquote>
<p>The second <tt class="literal">AddType</tt> directive assigns the made-up content type <tt class="literal">Tracker</tt> to <em class="filename">.html</em> files in this directory, and the <tt class="literal">Action</tt> directive tells Apache to run <em class="filename">/cgi/track.cgi</em> whenever a request resolves to a file of that type.</p>
<p>When a user requests a document from the <em class="filename">/usr/local/apache/htdocs/store</em> directory, Apache executes the <em class="emphasis">query_track</em> application (installed in this example as <em class="filename">/cgi/track.cgi</em>), passing to it the relative URL of the requested document as extra path information. Here's an example.
When the user requests a document from the directory for the first time:</p><blockquote class="simplelist"><p><em class="emphasis">http://localhost/store/index.html</em></p></blockquote><p>the web server will execute <em class="emphasis">query_track</em>, like so:</p><blockquote class="simplelist"><p><em class="emphasis">http://localhost/cgi/track.cgi/store/index.html</em></p></blockquote>
<p>The application uses the <a name="INDEX-2226" /><a name="INDEX-2227" />PATH_TRANSLATED environment variable to get the full path of <em class="emphasis">index.html</em>. Then, it opens the file, creates a new identifier for the user, embeds it into each relative URL within the document, and returns the modified HTML stream to the browser. In addition, we log the transaction to a special log file, which you can use to analyze users' browsing habits at a later time.</p>
<p>If you're curious as to what a modified <a name="INDEX-2228" />URL looks like, here's an example:</p><blockquote class="simplelist"><p><em class="emphasis">http://localhost/store/.CC7e2BMb_H6UdK9KfPtR1g/faq.html</em></p></blockquote>
<p>The identifier is a Base64-encoded <a name="INDEX-2229" />MD5 message digest, modified so that it is safe to use in a URL and computed from various pieces of information about the request. The code to generate it looks like this:</p>
<blockquote><pre class="code">use Digest::MD5 qw( md5_base64 );

my $remote = $ENV{REMOTE_ADDR} . $ENV{REMOTE_PORT};

# md5_base64( ) is a plain function that concatenates its arguments
my $id = md5_base64( time, $$, $remote );
$id =~ tr|+/=|-_.|;  # Make non-word chars URL-friendly</pre></blockquote>
<p>This does a good job of generating a unique key for each request. However, it is not intended to create keys that cannot be guessed, because its inputs (the time, the server's process ID, and the client's address and port) are predictable or easy to narrow down.
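</p>
<p>For contrast, here is a minimal sketch of an identifier that does not depend on guessable request data. It is our assumption, not the method used in this chapter: the hypothetical <tt class="literal">random_id</tt> helper reads 16 bytes from <em class="filename">/dev/urandom</em> (so it assumes a Unix-like system that provides that device) and encodes them with the same URL-friendly Base64 mapping shown above:</p>
<blockquote><pre class="code">#!/usr/bin/perl -w
use strict;
use Fcntl qw( O_RDONLY );
use MIME::Base64 qw( encode_base64 );

# Hypothetical helper: build an identifier from 16 random bytes.
# Assumes a Unix-like system that provides /dev/urandom.
sub random_id {
    sysopen( my $rand, "/dev/urandom", O_RDONLY )
        or die "Cannot open /dev/urandom: $!\n";
    my $bytes = "";
    my $got   = sysread( $rand, $bytes, 16 );
    close $rand;
    die "Short read from /dev/urandom\n"
        unless defined $got and $got == 16;

    my $id = encode_base64( $bytes, "" );  # "" suppresses the trailing newline
    $id =~ tr|+/=|-_.|;                    # Same URL-friendly mapping as above
    return $id;
}

my $id = random_id();
print "$id\n";</pre></blockquote>
<p>Each call returns a 24-character string (16 bytes become 22 Base64 characters plus two padding characters, which <tt class="literal">tr</tt> turns into periods), using only characters that the identifier-matching pattern in Example 11-1 accepts.</p>
<p>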
If you are generating session identifiers that provide access to sensitive data, then you should use a more sophisticated method to generate an identifier.</p>
<p>If you use Apache, you do not have to generate a unique identifier yourself if you build <a name="INDEX-2230" /><a name="INDEX-2231" /><a name="INDEX-2232" />Apache with the <em class="emphasis">mod_unique_id</em> module. It creates a unique identifier for each request, which is available to your CGI script as <tt class="literal">$ENV{UNIQUE_ID}</tt>. <em class="emphasis">mod_unique_id</em> is included in the Apache distribution but is not compiled in by default.</p>
<p>Let's look at how we could construct code to <a name="INDEX-2233" /> <a name="INDEX-2,234" /> <a name="INDEX-2,235" />parse HTML documents and insert identifiers. <a href="ch11_01.htm#ch11-50116">Example 11-1</a> shows a Perl module that we use to parse the request URL and HTML output.</p>
<a name="ch11-50116" /><div class="example"><h4 class="objtitle">Example 11-1. CGIBook::UserTracker.pm </h4><a name="INDEX-2236" /><a name="INDEX-2,237" /><a name="INDEX-2,238" /><blockquote><pre class="code">#!/usr/bin/perl -wT

#/----------------------------------------------------------------
# UserTracker Module
# 
# Inherits from HTML::Parser
# 

package CGIBook::UserTracker;
push @ISA, "HTML::Parser";

use strict;
use URI;
use HTML::Parser;

1;

#/----------------------------------------------------------------
# Public methods
# 

sub new {
    my( $class, $path ) = @_;
    my $id;
    
    # Once an identifier is embedded, PATH_INFO looks like
    # /store/.id/faq.html, so strip the /.id/ component wherever it is
    if ( $ENV{PATH_INFO} and
         $ENV{PATH_INFO} =~ s|/\.([a-z0-9_.-]+)/|/|i ) {
        $id = $1;
    }
    else {
        $id ||= unique_id(  );
    }
    
    my $self = $class-&gt;SUPER::new(  );
    $self-&gt;{user_id}    = $id;
    $self-&gt;{base_path}  = defined( $path ) ? $path : "";
    
    return $self;
}

sub base_path {
    my( $self, $path ) = @_;
    $self-&gt;{base_path} = $path if defined $path;
    return $self-&gt;{base_path};
}

sub user_id {
    my $self = shift;
    return $self-&gt;{user_id};
}

#/----------------------------------------------------------------
# Internal (private) subs
# 

sub unique_id {
    # Use Apache's mod_unique_id if available
    return $ENV{UNIQUE_ID} if exists $ENV{UNIQUE_ID};
    
    require Digest::MD5;
    
    my $remote = $ENV{REMOTE_ADDR} . $ENV{REMOTE_PORT};
    
    # Note this is intended to be unique, and not unguessable
    # It should not be used for generating keys to sensitive data
    my $id = Digest::MD5::md5_base64( time, $$, $remote );
    $id =~ tr|+/=|-_.|;  # Make non-word chars URL-friendly
    return $id;
}

sub encode {
    my( $self, $url ) = @_;
    my $uri  = new URI( $url, "http" );
    my $id   = $self-&gt;user_id(  );
    my $base = $self-&gt;base_path;
    
    # Insert /.id after the base path; \Q...\E matches it literally
    my $path = $uri-&gt;path;
    $path =~ s|^\Q$base\E|$base/.$id| or
        die "Invalid base path configured\n";
    $uri-&gt;path( $path );
    
    return $uri-&gt;as_string;
}

#/----------------------------------------------------------------
# Subs to implement HTML::Parser callbacks
# 

sub start {
    my( $self, $tag, $attr, $attrseq, $origtext ) = @_;
    my $new_text = $origtext;
    
    my %relevant_pairs = (
        frame       =&gt; "src",
        a           =&gt; "href",
        area        =&gt; "href",
        form        =&gt; "action",
# Uncomment these lines if you want to track images too
#        img         =&gt; "src",
#        body        =&gt; "background",
    );
    
    while ( my( $rel_tag, $rel_attr ) = each %relevant_pairs ) {
        if ( $tag eq $rel_tag and $attr-&gt;{$rel_attr} ) {
            $attr-&gt;{$rel_attr} = $self-&gt;encode( $attr-&gt;{$rel_attr} );
            my @attribs = map { "$_=\"$attr-&gt;{$_}\"" } @$attrseq;
            $new_text = "&lt;$tag @attribs&gt;";
        }
    }
    
    # Meta refresh tags have a different format, handled separately
    if ( $tag eq "meta" and lc( $attr-&gt;{"http-equiv"} || "" ) eq "refresh" ) {
        my( $delay, $url ) = split /;\s*URL=/i, $attr-&gt;{content} || "", 2;
        if ( defined $url ) {
            $attr-&gt;{content} = "$delay;URL=" . $self-&gt;encode( $url );
            my @attribs = map { "$_=\"$attr-&gt;{$_}\"" } @$attrseq;
            $new_text = "&lt;$tag @attribs&gt;";
        }
    }
    
    print $new_text;
}

sub declaration {
    my( $self, $decl ) = @_;
    # HTML::Parser's version 2 interface strips the "&lt;!" and "&gt;"
    print "&lt;!$decl&gt;";
}

sub text {
    my( $self, $text ) = @_;
    print $text;
}

sub end {
    my( $self, $tag ) = @_;
    print "&lt;/$tag&gt;";
}

sub comment {
    my( $self, $comment ) = @_;
    print "&lt;!--$comment--&gt;";
}</pre></blockquote></div>
<p><a href="ch11_01.htm#ch11-81204">Example 11-2</a> shows the CGI application that we use to process static HTML pages.</p>
<a name="ch11-81204" /><div class="example"><h4 class="objtitle">Example 11-2. query_track.cgi </h4><blockquote><pre class="code">#!/usr/bin/perl -wT

use strict;
use CGIBook::UserTracker;

my $track = new CGIBook::UserTracker;
$track-&gt;base_path( "/store" );

# PATH_TRANSLATED still contains any /.id/ component, which does not
# exist on disk, so remove it to get the real file
( my $requested_doc = $ENV{PATH_TRANSLATED} ) =~ s|/\.[a-z0-9_.-]+/|/|i;

unless ( -e $requested_doc ) {
    print "Location: /errors/not_found.html\n\n";
    exit;
}

open my $fh, "&lt;", $requested_doc or die "Failed to open $requested_doc: $!";
my $doc = do {
    local $/ = undef;
    &lt;$fh&gt;;
};
close $fh;

# This assumes we're only tracking HTML files:
print "Content-type: text/html\n\n";
$track-&gt;parse( $doc );
$track-&gt;eof;  # Flush any text HTML::Parser is still buffering</pre></blockquote></div>
<p>As the parser inserts the identifier into each URL, the modified content is sent to the standard output stream, preceded by the <a name="INDEX-2239" /><a name="INDEX-2240" /><a name="INDEX-2241" />content header.</p>
<p>Now that we've looked at how to maintain state between views of multiple HTML documents, our next step is to discuss persistence when using multiple forms. An online store, for example, is typically broken into multiple pages. We need to be able to identify users as they fill out each page. 
We'll look at techniques for solving such <a name="INDEX-2242" /><a name="INDEX-2243" /><a name="INDEX-2244" />problems in the <a name="INDEX-2245" /><a name="INDEX-2246" />next <a name="INDEX-2247" /><a name="INDEX-2248" /><a name="INDEX-2249" />section.</p></div><hr align="left" width="515" /><div class="navbar"><table border="0" width="515"><tr><td width="172" valign="top" align="left"><a href="ch10_04.htm"><img src="../gifs/txtpreva.gif" alt="Previous" border="0" /></a></td><td width="171" valign="top" align="center"><a href="index.htm"><img src="../gifs/txthome.gif" alt="Home" border="0" /></a></td><td width="172" valign="top" align="right"><a href="ch11_02.htm"><img src="../gifs/txtnexta.gif" alt="Next" border="0" /></a></td></tr><tr><td width="172" valign="top" align="left">10.4. DBI</td><td width="171" valign="top" align="center"><a href="index/index.htm"><img src="../gifs/index.gif" alt="Book Index" border="0" /></a></td><td width="172" valign="top" align="right">11.2. Hidden Fields</td></tr></table></div><hr align="left" width="515" /><img src="../gifs/navbar.gif" alt="Library Navigation Links" usemap="#library-map" border="0" /><p><font size="-1"><a href="copyrght.htm">Copyright &copy; 2001</a> O'Reilly &amp; Associates. All rights reserved.</font></p><map name="library-map"><area href="../index.htm" coords="1,1,83,102" shape="rect" /><area href="../lnut/index.htm" coords="81,0,152,95" shape="rect" /><area href="../run/index.htm" coords="172,2,252,105" shape="rect" /><area href="../apache/index.htm" coords="238,2,334,95" shape="rect" /><area href="../sql/index.htm" coords="336,0,412,104" shape="rect" /><area href="../dbi/index.htm" coords="415,0,507,101" shape="rect" /><area href="../cgi/index.htm" coords="511,0,601,99" shape="rect" /></map></body></html>