option.<BR>
</I>Initialize <TT>$LOGFILE</TT> with
the full path and name of the access log.<BR>
Open the log file.<BR>
Iterate over the lines of the log file. Each line gets placed, in
turn, into <TT>$line</TT>.<BR>
Define a temporary variable to hold a pattern that recognizes a
single item.<BR>
Use the matching operator to store the 11 items into pattern memory.
<BR>
Store the pattern memories into individual variables.<BR>
Close the log file.
</BLOCKQUOTE>
<HR>
<P>
<B>Listing 21.2 21LST02.PL-Using a Regular Expression
to Parse the Log File Entry<BR>
</B>
<BLOCKQUOTE>
<PRE>#!/usr/bin/perl -w
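# The one-argument form of open() takes the file name from $LOGFILE.
$LOGFILE = "access.log";
open(LOGFILE) or die("Could not open log file.");
foreach $line (<LOGFILE>) {
    $w = "(.+?)";
    # The trailing $ makes the last non-greedy item capture the whole
    # length field; without it, only the first character is captured.
    # Entries that do not match are skipped.
    next unless $line =~ m/^$w $w $w \[$w:$w $w\] "$w $w $w" $w $w$/;
    $site     = $1;    # remote site name
    $logName  = $2;
    $fullName = $3;
    $date     = $4;
    $time     = $5;
    $gmt      = $6;    # offset from GMT
    $req      = $7;    # request method, such as GET
    $file     = $8;    # file specification that was requested
    $proto    = $9;    # protocol, such as HTTP/1.0
    $status   = $10;   # server status code
    $length   = $11;   # number of bytes sent

    # do line-by-line processing.
}
close(LOGFILE);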
</PRE>
</BLOCKQUOTE>
<HR>
<P>
The main advantage to using regular expressions to extract information
is the ease with which you can adjust the pattern to account for
different log file formats. If you use a server that delimits
the date/time item with curly brackets, you only need to change
the line with the matching operator to accommodate the different
format.
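<P>
As a minimal sketch of that adjustment, assume a hypothetical server
that wraps the date/time item in curly brackets. The
<TT>parseCurlyEntry()</TT> name and the sample entry are invented for
illustration; only the two delimiter characters in the pattern differ
from the one used in Listing 21.2.
<BLOCKQUOTE>
<PRE>#!/usr/bin/perl -w
use strict;

# Hypothetical variant of the parsing step for a server that wraps
# the date/time item in curly brackets instead of square brackets.
sub parseCurlyEntry {
    my($entry) = @_;
    my($w) = "(.+?)";
    # Only \[ and \] in the original pattern became \{ and \}.
    $entry =~ m/^$w $w $w \{$w:$w $w\} "$w $w $w" $w $w$/ or return();
    return($1, $2, $3, $4, $5, $6, $7, $8, $9, $10, $11);
}

# Invented sample entry.
my $line = 'host.example.com - - {10/Oct/2000:13:55:36 -0700} '
         . '"GET /index.html HTTP/1.0" 200 2326';
my($date, $time, $gmt) = (parseCurlyEntry($line))[3, 4, 5];
print "$date $time $gmt\n";
</PRE>
</BLOCKQUOTE>
<P>
The rest of the program is untouched, because it only looks at the
items that the pattern captures.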
<H3><A NAME="ExampleListingAccessbyDocument">
Example: Listing Access by Document</A></H3>
<P>
One easy and useful analysis that you can do is to find out how
many times each document at your site has been visited. Listing
21.3 contains a program that reports on the access counts of documents
beginning with the letter s.<BR>
<p>
<CENTER>
<TABLE BORDERCOLOR=#000000 BORDER=1 WIDTH=80%>
<TR><TD><B>Note</B></TD></TR>
<TR><TD>
<BLOCKQUOTE>
The <TT>parseLogEntry()</TT> function uses <TT>$_</TT> as the pattern space. This eliminates the need to pass parameters but is generally considered bad programming practice. But this is a small program, so perhaps it's okay.
</BLOCKQUOTE>
</TD></TR>
</TABLE>
</CENTER>
<P>
<P>
<IMG SRC="pseudo.gif" tppabs="http://cheminf.nankai.edu.cn/~eb~/Perl%205%20By%20Example/pseudo.gif" BORDER=1 ALIGN=RIGHT><p>
<BLOCKQUOTE>
<I>Turn on the </I><TT><I>warning</I></TT><I>
option.<BR>
</I>Define a format for the report's detail line.<BR>
Define a format for the report's header line.<BR>
Define the <TT>parseLogEntry()</TT>
function.<BR>
Declare a local variable to hold the pattern that matches a single
item.<BR>
Use the matching operator to extract information into pattern
memory.<BR>
Return a list that contains the 11 items extracted from the log
entry.<BR>
Open the logfile.<BR>
Iterate over each line of the logfile.<BR>
Parse the entry to extract the 11 items but only keep the file
specification that was requested.<BR>
Put the filename into pattern memory.<BR>
Store the filename into <TT>$fileName</TT>.
<BR>
Test to see if <TT>$fileName</TT>
is defined.<BR>
Increment the file specification's value in the <TT>%docList</TT>
hash.<BR>
Close the log file.<BR>
Iterate over the hash that holds the file specifications.<BR>
Write out each hash entry in a report.
</BLOCKQUOTE>
<HR>
<P>
<B>Listing 21.3 21LST03.PL-Creating a Report of the
Access Counts for Documents that Start with the Letter S<BR>
</B>
<BLOCKQUOTE>
<PRE>#!/usr/bin/perl -w
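# Detail line: the document on the left, its access count on the right.
format =
@<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<< @>>>>>>>
$document, $count
.
# Page header; $% holds the current page number.
format STDOUT_TOP =
@|||||||||||||||||||||||||||||||||||| Pg @<
"Access Counts for S* Documents", $%
Document                                Access Count
--------------------------------------- ------------
.
# Split the log entry in $_ into its 11 items.
sub parseLogEntry {
    my($w) = "(.+?)";
    # The trailing $ lets the last item capture the whole length field.
    m/^$w $w $w \[$w:$w $w\] "$w $w $w" $w $w$/ or return();
    return($1, $2, $3, $4, $5, $6, $7, $8, $9, $10, $11);
}
$LOGFILE = "access.log";
open(LOGFILE) or die("Could not open log file.");
foreach (<LOGFILE>) {
    # Item 7 of the returned list is the requested file specification.
    $fileSpec = (parseLogEntry())[7];
    next unless defined($fileSpec);    # skip entries that do not parse
    # .* rather than .+ so that files in the root directory also match.
    $fileSpec =~ m!.*/(.+)!;
    $fileName = $1;
    # some requests don't specify a filename, just a directory.
    if (defined($fileName)) {
        $docList{$fileSpec}++ if $fileName =~ m/^s/i;
    }
}
close(LOGFILE);
foreach $document (sort(keys(%docList))) {
    $count = $docList{$document};
    write;
}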
</PRE>
</BLOCKQUOTE>
<HR>
<P>
This program displays:<BR>
<BLOCKQUOTE>
<PRE>Access Counts for S* Documents Pg 1
Document                                Access Count
--------------------------------------- ------------
/~bamohr/scapenow.gif                          1
/~jltinche/songs/song2.gif                     5
/~mtmortoj/mortoja_html/song.html              1
/~scmccubb/pics/shock.gif                      1
</PRE>
</BLOCKQUOTE>
<P>
This program has a couple of points that deserve a comment or
two. First, notice that the program takes advantage of the fact
that Perl's variables default to a global scope. The main program
assigns each log file entry to <TT>$_</TT>,
and <TT>parseLogEntry()</TT>
also accesses <TT>$_</TT> directly.
This is okay for a small program, but in larger programs you
should pass parameters and use local variables. Second, notice that it takes two
steps to select files that start with a given letter. The filename
needs to be extracted from <TT>$fileSpec</TT>
and then the filename can be filtered inside the <TT>if</TT>
statement. If the file that was requested has no filename, the
server will probably default to <TT>index.html</TT>.
However, this program doesn't take this into account. It simply
ignores the log file entry if no file was explicitly requested.
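<P>
Both points can be addressed in a few lines. The following is only a
sketch: it passes the entry to <TT>parseLogEntry()</TT> as a parameter
instead of relying on <TT>$_</TT>, and it uses a hypothetical helper,
<TT>fileNameOf()</TT>, which assumes that the server serves
<TT>index.html</TT> for directory requests. The sample entry is
invented; check your own server's configuration before relying on
that default.
<BLOCKQUOTE>
<PRE>#!/usr/bin/perl -w
use strict;

# Variant of parseLogEntry() that receives the entry as a parameter
# instead of reading the global $_.
sub parseLogEntry {
    my($entry) = @_;
    my($w) = "(.+?)";
    $entry =~ m/^$w $w $w \[$w:$w $w\] "$w $w $w" $w $w$/ or return();
    return($1, $2, $3, $4, $5, $6, $7, $8, $9, $10, $11);
}

# Hypothetical helper: return the filename part of a file specification.
# Assumes the server serves index.html when only a directory is requested.
sub fileNameOf {
    my($fileSpec) = @_;
    return("index.html") if $fileSpec =~ m{/$};
    my($fileName) = $fileSpec =~ m{([^/]+)$};
    return($fileName);
}

# Invented sample entry that requests a directory.
my $line = 'host.example.com - - [10/Oct/2000:13:55:36 -0700] '
         . '"GET /~user/ HTTP/1.0" 200 2326';
my $fileSpec = (parseLogEntry($line))[7];
print fileNameOf($fileSpec), "\n";
</PRE>
</BLOCKQUOTE>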
<P>
You can use this same counting technique to display the most frequent
remote sites that contact your server. You can also check the
status code to see how many requests have been rejected. The next
section looks at status codes.
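<P>
As a rough sketch of the remote-site count, the following program
tallies the first item of each entry and lists the sites from most to
least frequent. The entries are invented and held in an array so that
the example is self-contained; a real program would read them from the
log file as in Listing 21.3.
<BLOCKQUOTE>
<PRE>#!/usr/bin/perl -w
use strict;

# Invented sample entries standing in for the lines of access.log.
my @entries = (
    'a.example.com - - [10/Oct/2000:13:55:36 -0700] "GET /a.html HTTP/1.0" 200 100',
    'b.example.com - - [10/Oct/2000:13:55:40 -0700] "GET /b.html HTTP/1.0" 200 200',
    'a.example.com - - [10/Oct/2000:13:56:01 -0700] "GET /c.html HTTP/1.0" 404 0',
);

# Count the requests made by each remote site (the first item).
my %siteList;
foreach my $entry (@entries) {
    my($site) = $entry =~ m/^(\S+) /;
    $siteList{$site}++ if defined($site);
}

# Sort by descending count, then by name so that ties print in a
# predictable order.
my @ranked = sort { $siteList{$b} <=> $siteList{$a} or $a cmp $b }
             keys(%siteList);
foreach my $site (@ranked) {
    print "$site $siteList{$site}\n";
}
</PRE>
</BLOCKQUOTE>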
<H3><A NAME="ExampleLookingattheStatusCode">
Example: Looking at the Status Code</A></H3>
<P>
It is important for you to periodically check the server's log
file in order to determine if unauthorized people are trying to
access secured documents. This is done by checking the status
code in the log file entries.
<P>
Every status code is a three-digit number. The first digit defines
the class of the server's response to the request. The last two digits
do not have any categorization role. There are five values for
the first digit:
<UL>
<LI><B>1xx</B>: Informational - Not used, but reserved for future
use.
<LI><B>2xx</B>: Success - The action was successfully received,
understood, and accepted.
<LI><B>3xx</B>: Redirection - Further action must be taken in
order to complete the request.
<LI><B>4xx</B>: Client Error - The request contains bad syntax
or cannot be fulfilled.
<LI><B>5xx</B>: Server Error - The server failed to fulfill an
apparently valid request.
</UL>
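<P>
Because only the first digit carries the category, a program can
classify a status code with a single hash lookup. The sketch below
assumes a hypothetical helper named <TT>statusCategory()</TT>; the
category names come from the list above.
<BLOCKQUOTE>
<PRE>#!/usr/bin/perl -w
use strict;

# Map the first digit of a status code to its category.
my %statusClass = (
    1 => "Informational",
    2 => "Success",
    3 => "Redirection",
    4 => "Client Error",
    5 => "Server Error",
);

# Hypothetical helper: return the category name for a status code, or
# "Unknown" if the code is not a three-digit number starting with 1-5.
sub statusCategory {
    my($status) = @_;
    return("Unknown") unless $status =~ m/^([1-5])\d\d$/;
    return($statusClass{$1});
}

foreach my $status (200, 302, 401, 503) {
    print $status, ": ", statusCategory($status), "\n";
}
</PRE>
</BLOCKQUOTE>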
<P>
Table 21.1 contains a list of the most common status codes that
can appear in your log file. You can find a complete list on the
<B>http://www.w3.org/pub/WWW/Protocols/HTTP/1.0/spec.html</B>
Web page.<BR>
<P>
<CENTER><B>Table 21.1 The Most Common Server Status
Codes</B></CENTER>
<p>
<CENTER>
<TABLE BORDERCOLOR=#000000 BORDER=1 WIDTH=80%>
<TR><TD WIDTH=97><CENTER><I>Status Code</I></CENTER></TD><TD WIDTH=234><I>Description</I>
</TD></TR>
<TR><TD WIDTH=97><CENTER>200</CENTER></TD><TD WIDTH=234>OK</TD>
</TR>
<TR><TD WIDTH=97><CENTER>204</CENTER></TD><TD WIDTH=234>No content
</TD></TR>
<TR><TD WIDTH=97><CENTER>301</CENTER></TD><TD WIDTH=234>Moved permanently
</TD></TR>
<TR><TD WIDTH=97><CENTER>302</CENTER></TD><TD WIDTH=234>Moved temporarily
</TD></TR>
<TR><TD WIDTH=97><CENTER>400</CENTER></TD><TD WIDTH=234>Bad Request
</TD></TR>
<TR><TD WIDTH=97><CENTER>401</CENTER></TD><TD WIDTH=234>Unauthorized
</TD></TR>
<TR><TD WIDTH=97><CENTER>403</CENTER></TD><TD WIDTH=234>Forbidden
</TD></TR>
<TR><TD WIDTH=97><CENTER>404</CENTER></TD><TD WIDTH=234>Not found
</TD></TR>
<TR><TD WIDTH=97><CENTER>500</CENTER></TD><TD WIDTH=234>Internal server error
</TD></TR>
<TR><TD WIDTH=97><CENTER>501</CENTER></TD><TD WIDTH=234>Not implemented
</TD></TR>
<TR><TD WIDTH=97><CENTER>503</CENTER></TD><TD WIDTH=234>Service unavailable
</TD></TR>
</TABLE>
</CENTER>
<P>
<P>
Status code 401 is logged when a user attempts to access a secured
document without valid credentials, for example by entering an
incorrect password. By searching the log
file for this code, you can create a report of the failed attempts
to gain entry into your site. Listing 21.4 shows how the log file
could be searched for a specific error code, in this case 401.
<P>
<IMG SRC="pseudo.gif" tppabs="http://cheminf.nankai.edu.cn/~eb~/Perl%205%20By%20Example/pseudo.gif" BORDER=1 ALIGN=RIGHT><p>
<BLOCKQUOTE>
<I>Turn on the warning option.<BR>
Define a format for the report's detail line.<BR>
Define a format for the report's header line.<BR>
Define the </I><TT><I>parseLogEntry()</I></TT><I>
function.<BR>
Declare a local variable to hold the pattern that matches a single
item.<BR>
Use the matching operator to extract information into pattern
memory.<BR>
Return a list that contains the 11 items extracted from the log
entry.<BR>
Open the logfile.<BR>
Iterate over each line of the logfile.<BR>
Parse the entry to extract the 11 items but keep only the site
information and the status code.<BR>
If the status code is 401, increment the counter
for that site.<BR>
Close the log file.<BR>
Check the hash of site names to see if it has any entries. If not, display
a message that says no unauthorized accesses took place.<BR>
Iterate over the hash that holds the site names.<BR>
Write out each hash entry in a report.</I>
</BLOCKQUOTE>
<HR>
<P>
<B>Listing 21.4 21LST04.PL-Checking for Unauthorized
Access Attempts<BR>
</B>
<BLOCKQUOTE>
<PRE>#!/usr/bin/perl -w
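# Detail line: the remote site on the left, its count on the right.
format =
@<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<< @>>>>>>>
$site, $count
.
# Page header; $% holds the current page number.
format STDOUT_TOP =
@|||||||||||||||||||||||||||||||||||| Pg @<
"Unauthorized Access Report", $%
Remote Site Name                        Access Count
--------------------------------------- ------------
.
# Split the log entry in $_ into its 11 items.
sub parseLogEntry {
    my($w) = "(.+?)";
    # The trailing $ lets the last item capture the whole length field.
    m/^$w $w $w \[$w:$w $w\] "$w $w $w" $w $w$/ or return();
    return($1, $2, $3, $4, $5, $6, $7, $8, $9, $10, $11);
}
$LOGFILE = "access.log";
open(LOGFILE) or die("Could not open log file.");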
foreach (<LOGFILE>) {