    ($site, $status) = (parseLogEntry())[0, 9];

    if ($status eq '401') {
        $siteList{$site}++;
    }
}
close(LOGFILE);

@sortedSites = sort(keys(%siteList));

if (scalar(@sortedSites) == 0) {
    print("There were no unauthorized access attempts.\n");
}
else {
    foreach $site (@sortedSites) {
        $count = $siteList{$site};
        write;
    }
}
</PRE>
</BLOCKQUOTE>
<P>
This program displays:
<BLOCKQUOTE>
<PRE>
Unauthorized Access Report Pg 1
Remote Site Name Access Count
--------------------------------------- ------------
ip48-max1-fitch.zipnet.net 1
kairos.algonet.se 4
</PRE>
</BLOCKQUOTE>
<P>
You can expand this program's usefulness by also displaying the
logName and fullName items from the log file.
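<P>
For instance, assuming the same <TT>parseLogEntry()</TT> function shown in
Listing 21.5, the logName and fullName items are the second and third
elements of the returned list. The following sketch uses a made-up log
entry so that it can run stand-alone:

```perl
# A sketch of pulling extra items out of a log entry. It assumes the
# parseLogEntry() function from Listing 21.5, which matches against
# $_ and returns 11 items.
sub parseLogEntry {
    my($w) = "(.+?)";

    m/^$w $w $w \[$w:$w $w\] "$w $w $w" $w $w/;
    return($1, $2, $3, $4, $5, $6, $7, $8, $9, $10, $11);
}

# A made-up log entry for illustration.
$_ = 'kairos.algonet.se - jdoe [08/Nov/1996:10:15:00 -0500] ' .
     '"GET /secret.htm HTTP/1.0" 401 0';

# The logName and fullName items are at indexes 1 and 2 of the
# returned list, alongside the site (0) and status (9) items.
($site, $logName, $fullName, $status) = (parseLogEntry())[0, 1, 2, 9];
print("$site $logName $fullName $status\n");
```

A report that includes these extra columns would tell you not only which
sites made unauthorized requests, but which authenticated user names were
involved.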
<H3><A NAME="ExampleConvertingtheReporttoaWebPage">
Example: Converting the Report to a Web Page</A></H3>
<P>
Creating nice reports for your own use is all well and good. But
suppose your boss wants the statistics updated hourly and available
on demand? Printing the report and faxing it to the head office is
probably a bad idea. One solution is to convert the report into
a Web page. Listing 21.5 contains a program that does just that.
The program creates a Web page that displays the access counts
for documents whose names begin with an 's'. Figure 21.1 shows
the resulting Web page.
<P>
<A HREF="f21-1.gif"><B>Figure 21.1 : </B><I>The Web page that displayed the Access
Counts</I>.</A>
<P>
<IMG SRC="pseudo.gif" BORDER=1 ALIGN=RIGHT><p>
<BLOCKQUOTE>
<I>Turn on the warning option.<BR>
Define the </I><TT><I>parseLogEntry()</I></TT><I>
function.<BR>
Declare a local variable to hold the pattern that matches a single
item.<BR>
Use the matching operator to extract information into pattern
memory.<BR>
Return a list that contains the 11 items extracted from the log
entry.<BR>
Initialize some variables to be used later. The file name of the
accesslog, the web page file name, and the email address of the
web page maintainer.<BR>
Open the logfile.<BR>
Iterate over each line of the logfile.<BR>
Parse the entry to extract the 11 items but only keep the file
specification that was requested.<BR>
Put the filename into pattern memory.<BR>
Store the filename into </I><TT><I>$fileName</I></TT><I>.
<BR>
Test to see if </I><TT><I>$fileName</I></TT><I>
is defined.<BR>
Increment the file specification's value in the </I><TT><I>%docList</I></TT><I>
hash.<BR>
Close the log file.<BR>
Open the output file that will become the web page.<BR>
Output the HTML header.<BR>
Start the body of the HTML page.<BR>
Output the current time.<BR>
Start an unordered list so the subsequent table is indented.<BR>
Start an HTML table.<BR>
Output the heading for the two columns the table will use.<BR>
Iterate over the hash that holds the document list.<BR>
Output a table row for each hash entry.<BR>
End the HTML table.<BR>
End the unordered list.<BR>
Output a message about who to contact if questions arise.<BR>
End the body of the page.<BR>
End the HTML.<BR>
Close the web page file.</I>
</BLOCKQUOTE>
<HR>
<P>
<B>Listing 21.5 21LST05.PL-Creating a Web Page to View
Access Counts<BR>
</B>
<BLOCKQUOTE>
<PRE>#!/usr/bin/perl -w

sub parseLogEntry {
    my($w) = "(.+?)";

    m/^$w $w $w \[$w:$w $w\] "$w $w $w" $w $w/;
    return($1, $2, $3, $4, $5, $6, $7, $8, $9, $10, $11);
}

$LOGFILE  = "access.log";
$webPage  = "acescnt.htm";
$mailAddr = 'medined@planet.net';

open(LOGFILE) or die("Could not open log file.");
foreach (&lt;LOGFILE&gt;) {
    $fileSpec = (parseLogEntry())[7];

    $fileSpec =~ m!.+/(.+)!;
    $fileName = $1;

    # some requests don't specify a filename, just a directory.
    if (defined($fileName)) {
        $docList{$fileSpec}++ if $fileName =~ m/^s/i;
    }
}
close(LOGFILE);

open(WEBPAGE, ">$webPage") or die("Could not open web page file.");
print WEBPAGE ("<HTML>");
print WEBPAGE ("<HEAD><TITLE>Access Counts</TITLE></HEAD>");
print WEBPAGE ("<BODY>");
print WEBPAGE ("<H1>", scalar(localtime), "</H1>");
print WEBPAGE ("<UL>");
print WEBPAGE ("<TABLE BORDER=1 CELLPADDING=10>");
print WEBPAGE ("<TR><TH>Document</TH><TH>Access<BR>Count</TH></TR>");
foreach $document (sort(keys(%docList))) {
    $count = $docList{$document};

    print WEBPAGE ("<TR>");
    print WEBPAGE ("<TD><FONT SIZE=2><TT>$document</TT></FONT></TD>");
    print WEBPAGE ("<TD ALIGN=right>$count</TD>");
    print WEBPAGE ("</TR>");
}
print WEBPAGE ("</TABLE><P>");
print WEBPAGE ("</UL>");
print WEBPAGE ("Have questions? Contact <A HREF=\"mailto:$mailAddr\">$mailAddr</A>");
print WEBPAGE ("</BODY></HTML>");
close(WEBPAGE);
</PRE>
</BLOCKQUOTE>
<HR>
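<P>
Notice that Listing 21.5 extracts the filename with
<TT>m!.+/(.+)!</TT>. Using <TT>!</TT> as the delimiter of the match
operator means the <TT>/</TT> characters in the path don't need to be
escaped. This small stand-alone sketch, with a made-up file
specification, shows the idiom by itself:

```perl
# Extracting the filename portion of a file specification, as in
# Listing 21.5. Using ! as the match delimiter avoids escaping the
# / characters in the path.
$fileSpec = "/docs/samples/search.htm";

# The greedy .+ before the last / consumes the directory portion,
# leaving only the filename in pattern memory.
$fileSpec =~ m!.+/(.+)!;
$fileName = $1;

print("$fileName\n");
```

If the request names only a directory, the match fails and
<TT>$fileName</TT> stays undefined, which is exactly the case the
<TT>defined()</TT> test in Listing 21.5 guards against.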
<H3><A NAME="ExistingLogFileAnalyzingPrograms">
Existing Log File Analyzing Programs</A></H3>
<P>
Now that you've learned some of the basics of log file statistics,
you should check out a program called Statbot, which can be used
to automatically generate statistics and graphs. You can find
it at:
<BLOCKQUOTE>
<PRE>http://www.xmission.com:80/~dtubbs/
</PRE>
</BLOCKQUOTE>
<P>
Statbot is a WWW log analyzer, statistics generator, and database
program. It works by "snooping" on the logfiles generated
by most WWW servers and creating a database that contains information
about the WWW server. This database is then used to create a
statistics page and GIF charts that can be "linked to"
by other WWW resources.
<P>
Because Statbot "snoops" on the server logfiles, it
does not require the use of the server's cgi-bin capability.
It simply runs from the user's own directory, automatically updating
statistics. Statbot uses a text-based configuration file for setup,
so it is very easy to install and operate, even for people with
no programming experience. Most importantly, Statbot is fast.
Once it is up and running, updating the database and creating
the new HTML page can take as little as 10 seconds. Because of
this, many Statbot users run Statbot once every 5-10 minutes,
which provides them with the very latest statistical information
about their site.
<P>
Another fine log analysis program is AccessWatch, written by Dave
Maher. AccessWatch is a World Wide Web utility that provides a
comprehensive view of daily accesses for individual users. It
is equally capable of gathering statistics for an entire server.
It provides a regularly updated summary of WWW server hits and
accesses, and gives a graphical representation of available statistics.
It generates statistics for hourly server load, page demand, accesses
by domain, and accesses by host. AccessWatch parses the WWW server
log and searches for a common set of documents, usually specified
by a user's root directory, such as /~username/ or /users/username.
AccessWatch displays results in a graphical, compact format.
<P>
If you'd like to look at <I>all</I> of the available log file
analyzers, go to Yahoo's Log Analysis Tools page:
<BLOCKQUOTE>
<PRE>http://www.yahoo.com/Computers_and_Internet/Internet/
World_Wide_Web/HTTP/Servers/Log_Analysis_Tools/
</PRE>
</BLOCKQUOTE>
<P>
This page lists all types of log file analyzers, from simple Perl
scripts to full-blown graphical applications.
<H3><A NAME="CreatingYourOwnCGILogFile">
Creating Your Own CGI Log File</A></H3>
<P>
It is generally a good idea to keep track of who executes your
CGI scripts. You've already been introduced to the environment
variables that are available within your CGI script. Using the
information provided by those environment variables, you can create
your own log file.
<P>
<IMG SRC="pseudo.gif" BORDER=1 ALIGN=RIGHT><p>
<BLOCKQUOTE>
<I>Turn on the warning option.<BR>
Define the </I><TT><I>writeCgiEntry()</I></TT><I>
function.<BR>
Initialize the log file name.<BR>
Initialize the name of the current script.<BR>
Create local versions of environment variables.<BR>
Open the log file in append mode.<BR>
Output the variables using ! as a field delimiter.<BR>
Close the log file.<BR>
Call the </I><TT><I>writeCgiEntry()</I></TT><I>
function.<BR>
Create a test HTML page.</I>
</BLOCKQUOTE>
<P>
Listing 21.6 shows how to create your own CGI log file based on
environment variables.
<HR>
<P>
<B>Listing 21.6 21LST06.PL-Creating Your Own CGI Log
File Based on Environment Variables<BR>
</B>
<BLOCKQUOTE>
<PRE>#!/usr/bin/perl -w

sub writeCgiEntry {
    my($logFile) = "cgi.log";
    my($script)  = __FILE__;
    my($name)    = $ENV{'REMOTE_HOST'};
    my($addr)    = $ENV{'REMOTE_ADDR'};
    my($browser) = $ENV{'HTTP_USER_AGENT'};
    my($time)    = time;

    open(LOGFILE, ">>$logFile") or die("Can't open cgi log file.\n");
    print LOGFILE ("$script!$name!$addr!$browser!$time\n");
    close(LOGFILE);
}

writeCgiEntry();

# do some CGI activity here.
print "Content-type: text/html\n\n";
print "<HTML>";
print "<TITLE>CGI Test</TITLE>";
print "<BODY><H1>Testing!</H1></BODY>";
print "</HTML>";
</PRE>
</BLOCKQUOTE>
<HR>
<P>
Every time this script is called, an entry will be made in the
CGI log file. If you place a call to the <TT>writeCgiEntry()</TT>
function in all of your CGI scripts, after a while you will be
able to perform some statistical analysis on who uses your CGI scripts.
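<P>
As one example of such an analysis, the following sketch counts how
many times each browser appears in the log. It assumes the
<TT>!</TT>-delimited entry format written by Listing 21.6; the sample
entries are made up so that the sketch can run stand-alone:

```perl
# A sketch of analyzing the cgi.log file written by Listing 21.6.
# Build a small sample log first so the sketch is self-contained;
# in practice the file would already exist.
open(LOGFILE, ">cgi.log") or die("Can't create cgi log file.\n");
print LOGFILE ("test.pl!host.a!10.0.0.1!Mozilla/3.0!846800000\n");
print LOGFILE ("test.pl!host.b!10.0.0.2!Lynx/2.7!846800100\n");
print LOGFILE ("form.pl!host.a!10.0.0.1!Mozilla/3.0!846800200\n");
close(LOGFILE);

# Split each entry on the ! field delimiter and tally the browsers.
open(LOGFILE, "<cgi.log") or die("Can't open cgi log file.\n");
while (<LOGFILE>) {
    chomp;
    ($script, $name, $addr, $browser, $time) = split(/!/);
    $browserList{$browser}++;
}
close(LOGFILE);

foreach $browser (sort(keys(%browserList))) {
    print("$browser: $browserList{$browser}\n");
}
```

The same loop could just as easily tally the <TT>$script</TT> or
<TT>$addr</TT> fields to see which scripts are most popular or which
hosts call them most often.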
<H2><A NAME="CommunicatingwithUsers"><FONT SIZE=5 COLOR=#FF0000>
Communicating with Users</FONT></A></H2>
<P>
So far in this chapter, we've looked at examining server log
files. Perl is also very useful for creating the Web pages that
users will view.
<H3><A NAME="ExampleGeneratingaWhatsNewPage">
Example: Generating a What's New Page</A></H3>
<P>
One of the most common features of a Web site is a What's New
page. This page typically lists all of the files modified in the
last week or month along with a short description of the document.
<P>
A What's New page is usually automatically generated using a scheduler
program, like <TT>cron</TT>. If you
try to generate the What's New page via a CGI script, your server
will quickly be overrun by the large number of disk accesses that
will be required and your users will be upset that a simple What's
New page takes so long to load.
<P>
Perl is an excellent tool for creating a What's New page. It has
good directory access functions and regular expressions that can
be used to search for titles or descriptions in HTML pages. Listing
21.7 contains a Perl program that will start at a specified base
directory and search for files that have been modified since the
last time that the script was run. When the search is complete,
an HTML page is generated. You can have your home page point to
the automatically generated What's New page.
<P>
This program uses a small data file, called <TT>new.log</TT>, to
keep track of the last time that the program was run. Any files
that have changed since that date are displayed on the HTML page.
<BR>
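<P>
The core test behind a What's New page is simple: compare each file's
modification time against the time stored in <TT>new.log</TT>. Here is
a minimal sketch of just that test; the scratch file and the
one-day-ago timestamp are made up for illustration:

```perl
# $lastRunTime would normally be read from the new.log data file;
# here it is simply set to one day ago.
$lastRunTime = time - (24 * 60 * 60);

# Create a scratch file so the sketch can be run stand-alone.
open(FILE, ">scratch.htm") or die("Can't create scratch file.\n");
print FILE ("<TITLE>Scratch</TITLE>\n");
close(FILE);

# Item 9 of the list returned by stat() is the modification time.
$modTime = (stat("scratch.htm"))[9];

if ($modTime > $lastRunTime) {
    print("scratch.htm is new or was recently changed.\n");
}
```

Listing 21.7 applies this same comparison to every file it finds while
walking the directory tree.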
<p>
<CENTER>
<TABLE BORDERCOLOR=#000000 BORDER=1 WIDTH=80%>
<TR><TD><B>Note</B></TD></TR>
<TR><TD>
<BLOCKQUOTE>
This program contains the first significant use of recursion in this book. Recursion happens when a function calls itself and will be fully explained after the program listing.</BLOCKQUOTE>
</TD></TR>
</TABLE>