<HTML>
<HEAD>
<TITLE>Chapter 14 -- Perl and Tracking</TITLE>
<META>
</HEAD>
<BODY TEXT="#000000" BGCOLOR="#FFFFFF" LINK="#0000EE" VLINK="#551A8B" ALINK="#CE2910">
<H1><FONT SIZE=6 COLOR=#FF0000>Chapter 14</FONT></H1>
<H1><FONT SIZE=6 COLOR=#FF0000>Perl and Tracking</FONT></H1>
<HR>
<P>
<CENTER><B><FONT SIZE=5><A NAME="CONTENTS">CONTENTS</A></FONT></B></CENTER>
<UL>
<LI><A HREF="#Logging">
Logging</A>
<UL>
<LI><A HREF="#TheLogFile">
The Log File</A>
<LI><A HREF="#HTTPStatusCodes">
HTTP Status Codes </A>
</UL>
<LI><A HREF="#TrackingandEnvironmentalVariables">
Tracking and Environmental Variables</A>
<UL>
<LI><A HREF="#Browsers">
Browsers</A>
<LI><A HREF="#IPAddressesDomainNames">
IP Addresses/Domain Names</A>
<LI><A HREF="#TheReferURL">
The Refer URL</A>
</UL>
<LI><A HREF="#TrackingHitswiththeLog">
Tracking Hits with the Log</A>
<LI><A HREF="#CountersRevisited">
Counters Revisited</A>
<UL>
<LI><A HREF="#ManagingDBMFiles">
Managing DBM Files</A>
</UL>
<LI><A HREF="#NTPerlChecklistScript">
NT Perl Checklist Script</A>
<LI><A HREF="#ChapterInReview">
Chapter In Review</A>
</UL>
</UL>
<HR>
<P>
So far this book has said little about logs and other methods
of tracking. Once your Web site is complete, you will want to
find out how users travel through it. One way to do this is to
track usage with logs. To do this accurately, you can place a
Perl script at the top of the Web page to be tracked. To
demonstrate this use of Perl, this chapter adds a tracking element
to the Goo Goo Records Web site.
<H2><A NAME="Logging"><FONT SIZE=5 COLOR=#FF0000>
Logging</FONT></A></H2>
<P>
A log is a list of actions that happen inside a computer, and
there are many kinds. Logs tend to be divided by purpose: the
system log records actions (called events) performed by the NT
system, and the application log records events caused by
applications. These logs can keep track of anything that might
interest a Web Master or Network Administrator. For example,
every time the computer is asked to start an application, a note
of that event is made in the application log. This log, like most
others, can be viewed using Event Viewer.
<P>
Within a Web site, logs can be used to monitor who is visiting
your site, and even whether visitors are trying to get into places
they are not supposed to.
<P>
One of the early problems with tracking and logging on the Web
was unrealistically high hit counts for Web sites. These inflated
numbers were, and still are, caused by simplistic uses of counters
to record hits on a particular page. It is quite common for a Web
page to contain two or three hypertext links, an image link,
and a next-page link. If the user follows all of these links,
a hit count of four or five may be recorded for a single visit,
giving a skewed picture of site usage. This happens only if the
links on the page are used; their presence alone does not skew
the hit count.
<P>
There are several ways to avoid this problem. One is to add a
short Perl script to the top of the larger Perl script that
delivers the Web pages being monitored for user traffic. There
are two ways to do this:
<OL>
<LI>Have a form call a CGI script that loads the page and records
the hit to that page.
<LI>Have one of the links to the HTML documents call a URL such as
http://www.googoo.com/cgi-bin/page.pl?next.htm, taken from the Goo Goo Records site.
</OL>
<P>
This second method calls a Perl script, page.pl, which reads the
name of the HTML document from the query string, in this case
next.htm. The script then delivers that HTML document.
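<P>
Links of this form do not have to be typed by hand. The following
is a minimal sketch, not part of the Goo Goo Records scripts, of a
helper that builds such a link. The name <I>page_link</I> and the
choice of which characters to escape are assumptions made for
illustration; the %XX escapes it produces are the same ones that
page.pl decodes from the query string.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Address of the page delivery script, as used in this chapter.
my $SCRIPT_URL = 'http://www.googoo.com/cgi-bin/page.pl';

# Hypothetical helper: build a link that routes a page request
# through page.pl.
sub page_link {
    my ($page) = @_;
    # Percent-encode everything except letters, digits, and . _ / -
    # so the script can decode the query string unambiguously.
    (my $encoded = $page) =~ s/([^A-Za-z0-9._\/-])/sprintf("%%%02X", ord($1))/eg;
    return "$SCRIPT_URL?$encoded";
}

print page_link('next.htm'), "\n";
print page_link('/newstuff/new/new.html'), "\n";
```

<P>
A page name containing a space, such as "tour dates.htm", would be
written into the link as tour%20dates.htm.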
<P>
The second method is a little more flexible because you need only
one Perl script to deliver any page: the page to deliver changes
with the query string. One drawback is that all of your links
point to a Perl script, which makes the response time longer. Also,
you have to do all of your logging from the Perl script,
because the Web server log records only that each user called
the script <I>x</I> number of times, without recording the
destination. This may be desirable, though, because it lets you
make the Web site's log files as minimal or as detailed as you
like. This method is explored in detail later in this chapter,
as it is the method used by the Goo Goo Records Web Master on
their Web site. This is the script that performs the logging task:
<BLOCKQUOTE>
<PRE>
#!/usr/bin/perl
###################################################
#
# This is the Page delivery script.
#
# This script takes the query string information as the filename and
# delivers the file to the browser. A link to deliver the page new.html would
# look like this:
#
# <A HREF="http://www.googoo.com/cgi-bin/page.pl?new.html">new</a>
#
# Path information is also valid, and necessary to get lower in the directory
# structure:
#
# <A HREF="http://www.googoo.com/cgi-bin/page.pl?/newstuff/new/new.html">new</a>
#
# This will allow more flexible logging of any page that is delivered with this
# script. With a little work, you can even get this script to process server
# side includes, counter, and all that jazz.
# The trouble here is that the server logs will now only show the user hitting
# page.pl, no matter which page they request. This is fine if you are creating
# your own logs, but can be frustrating if you are not. This script generates
# a log similar to the one generated by the EMWAC server.
#####################################################
if ($ENV{'REQUEST_METHOD'} eq 'GET') {
    # The query string names the file to deliver; decode any %XX escapes.
    $file=$ENV{'QUERY_STRING'};
    $file=~s/%([a-fA-F0-9][a-fA-F0-9])/pack("C",hex($1))/eg;
    print "Content-type: text/html\n\n";
    # Backslashes must be doubled inside a double-quoted string.
    $file="c:\\googoo\\$file";
    if (-e $file) {
        # Append one entry to the access log, then send the page.
        open(LOG,">>c:\\logs\\access");
        $t=localtime;
        print LOG "$t $ENV{'SERVER_NAME'} $ENV{'REMOTE_HOST'} ",
                  "$ENV{'REQUEST_METHOD'} $file $ENV{'SERVER_PROTOCOL'}\n";
        close(LOG);
        open(HTML,"$file");
        while ($line=<HTML>) {
            print $line;
        }
        close(HTML);
    }
    else {
        # The @ is escaped so Perl does not treat it as an array name.
        print <<EOF;
<HTML>
<HEAD>
<TITLE>Error! File not found</TITLE>
</HEAD>
<H1>Error! File not found</H1>
<HR><P>
The file you requested was not found. Please contact <address><A
HREF="mailto:webmaster\@googoo.com">webmaster\@googoo.com</a></address>
</HTML>
EOF
    }
}
else {
    # Any request other than GET: send the header, then an error page.
    print "Content-type: text/html\n\n";
    print "<HTML>\n";
    print "<title>Error - Script Error</title>\n";
    print "<h1>Error: Script Error</h1>\n";
    print "<P><hr><P>\n";
    print "There was an error with the Server Script. Please\n";
    print "contact GooGoo Records at <address><a
href=\"mailto:support\@googoo.com\">support\@googoo.com</a></address>\n";
    print "</HTML>\n";
    exit;
}
</PRE>
</BLOCKQUOTE>
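<P>
Because page.pl uses the query string as a filename, a request
containing ".." could name a file outside c:\googoo. The following
is a minimal sketch of one way to check the request before opening
the file. It is not part of the Goo Goo Records script: the helper
name <I>resolve_page</I> and the rule that only .htm and .html
files are served are assumptions made for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Document root, taken from the page delivery script above.
my $DOC_ROOT = 'c:\\googoo';

# Hypothetical helper: return the full path for a requested page,
# or undef if the request should be refused.
sub resolve_page {
    my ($request) = @_;
    return unless defined $request && length $request;

    # Decode %XX escapes, as page.pl does.
    $request =~ s/%([a-fA-F0-9]{2})/pack("C", hex($1))/eg;

    # Treat both kinds of slash the same way.
    $request =~ tr{/}{\\};

    # Refuse drive letters, NUL bytes, and any ".." path component.
    return if $request =~ /[:\0]/;
    return if grep { $_ eq '..' } split /\\/, $request;

    # Serve only HTML pages (an assumption; adjust as needed).
    return unless $request =~ /\.html?$/i;

    $request =~ s/^\\+//;    # drop leading separators
    return "$DOC_ROOT\\$request";
}

for my $req ('next.htm', '/newstuff/new/new.html', '../logs/access.html') {
    my $path = resolve_page($req);
    print "$req => ", (defined $path ? $path : 'refused'), "\n";
}
```

<P>
In page.pl, a check like this would sit between decoding the query
string and the -e test, with a refused request handled the same
way as a missing file.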
<P>
Another method of tracking is to read the information recorded in
a log file and build your tracking data from it.
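<P>
As a rough illustration of this approach, the sketch below splits
log entries into fields and counts only requests for HTML pages,
which avoids the inflated counts caused by image requests. It
assumes the field order that page.pl writes (timestamp, server
name, remote host, request method, file, protocol), which the
EMWAC log excerpt later in this section also follows, and it
assumes no field contains a space. The subroutine names and the
sample entries are made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Split one log line into named fields. The timestamp takes the
# first five space-separated parts; five single fields follow.
sub parse_log_line {
    my ($line) = @_;
    my @f = split ' ', $line;
    return unless @f == 10;    # skip malformed or wrapped lines
    return {
        time     => join(' ', @f[0 .. 4]),
        server   => $f[5],
        host     => $f[6],
        method   => $f[7],
        file     => $f[8],
        protocol => $f[9],
    };
}

# Count hits per HTML page, ignoring images and other files.
sub count_page_hits {
    my (@lines) = @_;
    my %hits;
    for my $line (@lines) {
        my $entry = parse_log_line($line) or next;
        next unless $entry->{file} =~ /\.html?$/i;
        $hits{ lc $entry->{file} }++;
    }
    return %hits;
}

# Made-up sample entries, for illustration only.
my @sample = split /\n/, <<'END';
Sat Jul 06 10:15:02 1996 www.googoo.com 192.0.2.10 GET /next.htm HTTP/1.0
Sat Jul 06 10:15:03 1996 www.googoo.com 192.0.2.10 GET /gif/logo.gif HTTP/1.0
Sat Jul 06 10:17:40 1996 www.googoo.com 192.0.2.22 GET /next.htm HTTP/1.0
END

my %hits = count_page_hits(@sample);
print "$_: $hits{$_}\n" for sort keys %hits;
```

<P>
With the three sample entries, only the two requests for /next.htm
are counted; the image request is ignored.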
<H3><A NAME="TheLogFile">
The Log File</A></H3>
<P>
The file that contains the important information about the Goo
Goo Records site is known as the log file. Because the site uses
the EMWAC HTTP service, a log file is created each day and kept
in the log file directory. On the Goo Goo Records server, that
directory is C:\WINNT35\system32\LogFiles.
Each log file is named for the date it was created, following
the general format HSyymmdd.LOG. For example, the log file
created on July 6, 1996 is named HS960706.LOG. A log file's
contents resemble this excerpt from the log file HS960509, from
a server in Finland:
<BLOCKQUOTE>
<PRE>
Thu May 09 20:09:17 1996 wait.pspt.fi 194.100.26.175 GET /ACEINDEX.HTM HTTP/1.0
Thu May 09 20:09:18 1996 wait.pspt.fi 194.100.26.175 GET /gif/AMKVLOGO.GIF HTTP/1.0
Thu May 09 20:09:19 1996 wait.pspt.fi 194.100.26.175 GET /gif/RNBW.GIF HTTP/1.0
Thu May 09 20:09:19 1996 wait.pspt.fi 194.100.26.175 GET /gif/RNBWBAR.GIF HTTP/1.0
Thu May 09 22:35:09 1996 wait.pspt.fi 194.215.82.227 GET /gif/WLOGO.GIF HTTP/1.0
Thu May 09 22:35:11 1996 wait.pspt.fi 194.215.82.227 GET /gif/BLUEBUL.GIF HTTP/1.0
Thu May 09 22:35:11 1996 wait.pspt.fi 194.215.82.227 GET /cgi-bin/counter.exe?-smittari+-w5+./DEFAULT.HTM HTTP/1.0
Thu May 09 22:35:13 1996 wait.pspt.fi 194.215.82.227 GET /gif/EHI.JPG HTTP/1.0
Thu May 09 22:35:17 1996 wait.pspt.fi 194.215.82.227 GET /gif/NAPPI1.gif HTTP/1.0
Thu May 09 22:35:17 1996 wait.pspt.fi 194.215.82.227 GET /gif/NAPPI2.gif HTTP/1.0
Thu May 09 22:35:19 1996 wait.pspt.fi 194.215.82.227 GET /AVIVF.HTM HTTP/1.0
Thu May 09 22:35:23 1996 wait.pspt.fi 194.215.82.227 GET /gif/virtlogo.gif HTTP/1.0
Thu May 09 22:35:23 1996 wait.pspt.fi 194.215.82.227 GET /gif/NAPPI1.gif HTTP/1.0
Thu May 09 22:35:29 1996 wait.pspt.fi 194.215.82.227 GET /gif/KOULU.GIF HTTP/1.0
Thu May 09 22:35:32 1996 wait.pspt.fi 194.215.82.227 GET /gif/NAPPI2.gif HTTP/1.0
Thu May 09 22:35:45 1996 wait.pspt.fi 194.215.82.227 GET /gif/VF21.GIF HTTP/1.0
Thu May 09 22:36:02 1996 wait.pspt.fi 194.215.82.227 GET /gif/NAPPI3.gif HTTP/1.0
Thu May 09 22:36:14 1996 wait.pspt.fi 194.215.82.227 GET /gif/LETTER.GIF HTTP/1.0
Thu May 09 22:37:46 1996 wait.pspt.fi 194.215.82.227 GET /AVIONGEL.HTM HTTP/1.0
Thu May 09 22:37:52 1996 wait.pspt.fi 194.215.82.227 GET /gif/PIRUNLG.GIF HTTP/1.0
Thu May 09 22:44:43 1996 wait.pspt.fi 194.215.82.227 GET /AVIPELI1.HTM HTTP/1.0
Thu May 09 22:44:45 1996 wait.pspt.fi 194.215.82.227 GET /gif/STRESSLG.GIF HTTP/1.0
Fri May 10 04:29:29 1996 wait.pspt.fi 192.83.26.48 GET /gif/NAPPI3.gif HTTP/1.0
Fri May 10 04:29:30 1996 wait.pspt.fi 192.83.26.48 GET /gif/LETTER.GIF HTTP/1.0
Fri May 10 04:29:31 1996 wait.pspt.fi 192.83.26.48 GET /gif/engflag.jpg HTTP/1.0
Fri May 10 04:30:21 1996 wait.pspt.fi 192.83.26.48 GET /AVIVF.HTM HTTP/1.0
Fri May 10 04:30:26 1996 wait.pspt.fi 192.83.26.48 GET /gif/virtlogo.gif HTTP/1.0
Fri May 10 04:30:27 1996 wait.pspt.fi 192.83.26.48 GET /gif/VF21.GIF HTTP/1.0
Fri May 10 04:30:30 1996 wait.pspt.fi 192.83.26.48 GET /gif/KOULU.GIF HTTP/1.0
Fri May 10 04:31:11 1996 wait.pspt.fi 192.83.26.48 GET /AVIPELI2.HTM HTTP/1.0
Fri May 10 04:31:13 1996 wait.pspt.fi 192.83.26.48 GET /gif/LAITE.GIF HTTP/1.0
Fri May 10 04:31:14 1996 wait.pspt.fi 192.83.26.48 GET /gif/KOKOONP.JPG HTTP/1.0
Fri May 10 04:31:32 1996 wait.pspt.fi 192.83.26.48 GET /AVIPELI3.HTM HTTP/1.0
Fri May 10 04:31:33 1996 wait.pspt.fi 192.83.26.48 GET /gif/TIKI1.GIF HTTP/1.0
Fri May 10 04:31:33 1996 wait.pspt.fi 192.83.26.48 GET /gif/TPIRU1.GIF HTTP/1.0
Fri May 10 04:31:33 1996 wait.pspt.fi 192.83.26.48 GET /gif/TSTRE1.GIF HTTP/1.0
Fri May 10 04:31:46 1996 wait.pspt.fi 192.83.26.48 GET /AVIPELI4.HTM HTTP/1.0
Fri May 10 04:32:03 1996 wait.pspt.fi 192.83.26.48 GET /ACEINDEX.HTM HTTP/1.0
Fri May 10 04:32:19 1996 wait.pspt.fi 192.83.26.48 GET /ACEVF.HTM HTTP/1.0
Fri May 10 04:32:21 1996 wait.pspt.fi 192.83.26.48 GET /gif/ROBOCOP1.GIF HTTP/1.0
Fri May 10 04:33:01 1996 wait.pspt.fi 192.83.26.48 GET /ACEINDEX.HTM HTTP/1.0
Fri May 10 07:54:44 1996 wait.pspt.fi 193.166.48.136 GET /gif/NAPPI1.gif HTTP/1.0
Fri May 10 07:54:45 1996 wait.pspt.fi 193.166.48.136 GET /gif/NAPPI2.gif HTTP/1.0
Fri May 10 07:54:45 1996 wait.pspt.fi 193.166.48.136 GET /gif/NAPPI3.gif HTTP/1.0
Fri May 10 07:54:45 1996 wait.pspt.fi 193.166.48.136 GET /cgi-bin/counter.exe?-smittari+-w5+./DEFAULT.HTM HTTP/1.0
Fri May 10 07:54:45 1996 wait.pspt.fi 193.166.48.136 GET /gif/LETTER.GIF HTTP/1.0
Fri May 10 10:08:25 1996 wait.pspt.fi 192.89.123.26 GET /gif/VFLOGO.GIF HTTP/1.0
Fri May 10 10:08:25 1996 wait.pspt.fi 192.89.123.26 GET /gif/AMKVLOGO.GIF HTTP/1.0
Fri May 10 10:08:37 1996 wait.pspt.fi 192.89.123.26 GET /AVIVF.HTM HTTP/1.0