option.<BR>

</I>Initialize <TT>$LOGFILE</TT> with

the full path and name of the access log.<BR>

Open the log file.<BR>

Iterate over the lines of the log file. Each line gets placed, in

turn, into <TT>$line</TT>.<BR>

Define a temporary variable to hold a pattern that recognizes a

single item.<BR>

Use the matching operator to store the 11 items into pattern memory.

<BR>

Store the pattern memories into individual variables.<BR>

Close the log file.

</BLOCKQUOTE>

<HR>

<P>

<B>Listing 21.2&nbsp;&nbsp;21LST02.PL-Using a Regular Expression

to Parse the Log File Entry<BR>

</B>

<BLOCKQUOTE>

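<PRE>#!/usr/bin/perl -w

$LOGFILE = &quot;access.log&quot;;
open(LOGFILE, &quot;&lt;&quot;, $LOGFILE) or die(&quot;Could not open log file: $!&quot;);
foreach $line (&lt;LOGFILE&gt;) {
    # Pattern for a single item. The last item uses (\S+) because a
    # trailing (.+?) would capture only one character.
    $w = &quot;(.+?)&quot;;
    # Skip entries that do not match so stale captures are never used.
    next unless $line =~ m/^$w $w $w \[$w:$w $w\] &quot;$w $w $w&quot; $w (\S+)/;

    $site     = $1;
    $logName  = $2;
    $fullName = $3;
    $date     = $4;
    $time     = $5;
    $gmt      = $6;
    $req      = $7;
    $file     = $8;
    $proto    = $9;
    $status   = $10;
    $length   = $11;

    # do line-by-line processing.
}
close(LOGFILE);
</PRE>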

</BLOCKQUOTE>

<HR>

<P>

The main advantage to using regular expressions to extract information

is the ease with which you can adjust the pattern to account for

different log file formats. If you use a server that delimits

the date/time item with curly brackets, you only need to change

the line with the matching operator to accommodate the different

format.
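<P>
As a minimal sketch of that adjustment, assume a hypothetical server that writes the date/time item between curly brackets. The sample entry and the <TT>@items</TT> array below are invented for illustration; only the two delimiters in the pattern differ from the square-bracket version.

```perl
#!/usr/bin/perl -w
use strict;

# Hypothetical log entry whose date/time item is delimited by curly
# brackets instead of square brackets (invented sample data).
my $line = 'host.example.com - - {08/Jun/1996:15:19:01 -0400} '
         . '"GET /index.html HTTP/1.0" 200 2048';

my $w = "(.+?)";

# Same pattern as the square-bracket version, with \[ and \] replaced
# by \{ and \}. The final item uses (\S+) so that the whole byte count
# is captured.
my @items = $line =~ m/^$w $w $w \{$w:$w $w\} "$w $w $w" $w (\S+)/;

print "date=$items[3] time=$items[4] status=$items[9]\n";
```

<P>
The assignments that follow the match stay the same, because the number and order of the items do not change.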

<H3><A NAME="ExampleListingAccessbyDocument">

Example: Listing Access by Document</A></H3>

<P>

One easy and useful analysis that you can do is to find out how

many times each document at your site has been visited. Listing

21.3 contains a program that reports on the access counts of documents

beginning with the letter s.<BR>

<p>

<CENTER>

<TABLE BORDERCOLOR=#000000 BORDER=1 WIDTH=80%>

<TR><TD><B>Note</B></TD></TR>

<TR><TD>

<BLOCKQUOTE>

The <TT>parseLogEntry()</TT> function uses <TT>$_</TT> as the pattern space. This eliminates the need to pass parameters but is generally considered poor programming practice. Because this is a small program, it is acceptable here.

</BLOCKQUOTE>



</TD></TR>

</TABLE>

</CENTER>

<P>

<P>

<IMG SRC="pseudo.gif" tppabs="http://cheminf.nankai.edu.cn/~eb~/Perl%205%20By%20Example/pseudo.gif" BORDER=1 ALIGN=RIGHT><p>

<BLOCKQUOTE>

<I>Turn on the </I><TT><I>warning</I></TT><I>

option.<BR>

</I>Define a format for the report's detail line.<BR>

Define a format for the report's header line.<BR>

Define the <TT>parseLogEntry()</TT>

function.<BR>

Declare a local variable to hold the pattern that matches a single

item.<BR>

Use the matching operator to extract information into pattern

memory.<BR>

Return a list that contains the 11 items extracted from the log

entry.<BR>

Open the log file.<BR>

Iterate over each line of the log file.<BR>

Parse the entry to extract the 11 items but only keep the file

specification that was requested.<BR>

Put the filename into pattern memory.<BR>

Store the filename into <TT>$fileName</TT>.

<BR>

Test to see if <TT>$fileName</TT>

is defined.<BR>

Increment the file specification's value in the <TT>%docList</TT>

hash.<BR>

Close the log file.<BR>

Iterate over the hash that holds the file specifications.<BR>

Write out each hash entry in a report.

</BLOCKQUOTE>

<HR>

<P>

<B>Listing 21.3&nbsp;&nbsp;21LST03.PL-Creating a Report of the

Access Counts for Documents that Start with the Letter S<BR>

</B>

<BLOCKQUOTE>

<PRE>#!/usr/bin/perl -w

format =
  @&lt;&lt;&lt;&lt;&lt;&lt;&lt;&lt;&lt;&lt;&lt;&lt;&lt;&lt;&lt;&lt;&lt;&lt;&lt;&lt;&lt;&lt;&lt;&lt;&lt;&lt;&lt;&lt;&lt;&lt;&lt;&lt;&lt;&lt;&lt;&lt;&lt;&lt; @&gt;&gt;&gt;&gt;&gt;&gt;&gt;
  $document,                              $count
.

format STDOUT_TOP =
  @||||||||||||||||||||||||||||||||||||  Pg @&lt;
  &quot;Access Counts for S* Documents&quot;,        $%

  Document                                Access Count
  --------------------------------------- ------------
.

sub parseLogEntry {
    my($w) = &quot;(.+?)&quot;;
    # The last item uses (\S+); a trailing (.+?) would capture one character.
    m/^$w $w $w \[$w:$w $w\] &quot;$w $w $w&quot; $w (\S+)/;
    return($1, $2, $3, $4, $5, $6, $7, $8, $9, $10, $11);
}

$LOGFILE = &quot;access.log&quot;;
open(LOGFILE, &quot;&lt;&quot;, $LOGFILE) or die(&quot;Could not open log file: $!&quot;);
foreach (&lt;LOGFILE&gt;) {
    $fileSpec = (parseLogEntry())[7];
    $fileSpec =~ m!.+/(.+)!;
    $fileName = $1;
    # some requests don't specify a filename, just a directory.
    if (defined($fileName)) {
        $docList{$fileSpec}++ if $fileName =~ m/^s/i;
    }
}
close(LOGFILE);

foreach $document (sort(keys(%docList))) {
    $count = $docList{$document};
    write;
}
</PRE>

</BLOCKQUOTE>

<HR>

<P>

This program displays:<BR>

<BLOCKQUOTE>

<PRE>Access Counts for S* Documents      Pg 1

  Document                                Access Count
  -------------------------------------- ------------
  /~bamohr/scapenow.gif                          1
  /~jltinche/songs/song2.gif                     5
  /~mtmortoj/mortoja_html/song.html              1
  /~scmccubb/pics/shock.gif                      1
</PRE>

</BLOCKQUOTE>

<P>

This program has a couple of points that deserve a comment or

two. First, notice that the program takes advantage of the fact

that Perl's variables default to a global scope. The main program
sets <TT>$_</TT> to each log file
entry and <TT>parseLogEntry()</TT>
also directly accesses <TT>$_</TT>.
This is okay for a small program, but larger programs should pass
the entry as a parameter and use local variables. Second, notice that it takes two
steps to select files that start with a given letter. The filename

needs to be extracted from <TT>$fileSpec</TT>

and then the filename can be filtered inside the <TT>if</TT>

statement. If the file that was requested has no filename, the

server will probably default to <TT>index.html</TT>.

However, this program doesn't take this into account. It simply

ignores the log file entry if no file was explicitly requested.
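<P>
Both points can be addressed with small changes. The following is only a sketch: <TT>documentFor()</TT> is a hypothetical helper, the sample entries are invented, and it assumes that the server really does serve <TT>index.html</TT> for directory requests, which depends on the server's configuration.

```perl
#!/usr/bin/perl -w
use strict;

# Variant of parseLogEntry() that takes the entry as a parameter
# instead of reading the global $_.
sub parseLogEntry {
    my($entry) = @_;
    my($w) = "(.+?)";
    # In list context a failed match returns an empty list.
    return $entry =~ m/^$w $w $w \[$w:$w $w\] "$w $w $w" $w (\S+)/;
}

# Hypothetical helper: map a directory request to the assumed default
# document (index.html) so that it is counted as well.
sub documentFor {
    my($fileSpec) = @_;
    $fileSpec .= "index.html" if $fileSpec =~ m{/$};
    return $fileSpec;
}

# Invented sample entries standing in for lines of the access log.
my @entries = (
    'a.example.com - - [08/Jun/1996:15:19:01 -0400] "GET /~sam/ HTTP/1.0" 200 512',
    'b.example.com - - [08/Jun/1996:15:20:11 -0400] "GET /~sam/index.html HTTP/1.0" 200 512',
);

my %docList;
foreach my $entry (@entries) {
    my @items = parseLogEntry($entry);
    next unless @items;                  # skip lines that do not parse
    $docList{documentFor($items[7])}++;
}

print "$_: $docList{$_}\n" foreach sort keys %docList;
```

<P>
With this change the directory request and the explicit request are counted under the same document.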

<P>

You can use this same counting technique to display the most frequent

remote sites that contact your server. You can also check the

status code to see how many requests have been rejected. The next

section looks at status codes.
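<P>
A minimal sketch of that idea follows. The <TT>%siteList</TT> and <TT>%statusList</TT> hashes and the sample entries are invented for illustration; a real program would read the entries from the access log as the listings above do.

```perl
#!/usr/bin/perl -w
use strict;

# Invented sample entries standing in for lines of the access log.
my @entries = (
    'a.example.com - - [08/Jun/1996:15:19:01 -0400] "GET /a.html HTTP/1.0" 200 512',
    'b.example.com - - [08/Jun/1996:15:20:11 -0400] "GET /b.html HTTP/1.0" 404 0',
    'a.example.com - - [08/Jun/1996:15:21:42 -0400] "GET /c.html HTTP/1.0" 200 128',
);

my(%siteList, %statusList);
my $w = "(.+?)";
foreach my $entry (@entries) {
    my @items = $entry =~ m/^$w $w $w \[$w:$w $w\] "$w $w $w" $w (\S+)/;
    next unless @items;          # skip lines that do not parse
    $siteList{$items[0]}++;      # the first item is the remote site
    $statusList{$items[9]}++;    # the tenth item is the status code
}

# List the sites by descending count, then by name.
foreach my $site (sort { $siteList{$b} <=> $siteList{$a} || $a cmp $b }
                  keys %siteList) {
    print "$site: $siteList{$site}\n";
}
print "status $_: $statusList{$_}\n" foreach sort keys %statusList;
```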

<H3><A NAME="ExampleLookingattheStatusCode">

Example: Looking at the Status Code</A></H3>

<P>

It is important for you to periodically check the server's log

file in order to determine if unauthorized people are trying to

access secured documents. This is done by checking the status

code in the log file entries.

<P>

Every status code is a three-digit number. The first digit defines

how your server responded to the request. The last two digits

do not have any categorization role. There are five values for

the first digit:

<UL>

<LI><B>1xx</B>: Informational - Not used, but reserved for future
use.
<LI><B>2xx</B>: Success - The action was successfully received,
understood, and accepted.

<LI><B>3xx</B>: Redirection - Further action must be taken in

order to complete the request.

<LI><B>4xx</B>: Client Error - The request contains bad syntax

or cannot be fulfilled.

<LI><B>5xx</B>: Server Error - The server failed to fulfill an

apparently valid request.

</UL>
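<P>
Because only the first digit carries the category, a program can classify a status code by looking at that digit alone. The sketch below assumes a hypothetical helper named <TT>classifyStatus()</TT>; the class names come from the list above.

```perl
#!/usr/bin/perl -w
use strict;

# Class names keyed by the first digit of the status code.
my %statusClass = (
    1 => "Informational",
    2 => "Success",
    3 => "Redirection",
    4 => "Client Error",
    5 => "Server Error",
);

# Hypothetical helper: return the class name for a three-digit status
# code, or "Unknown" for anything else.
sub classifyStatus {
    my($status) = @_;
    return "Unknown"
        unless defined($status) and $status =~ m/^([1-5])\d\d$/;
    return $statusClass{$1};
}

print "$_: ", classifyStatus($_), "\n" foreach (200, 302, 401, 503);
```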

<P>

Table 21.1 contains a list of the most common status codes that

can appear in your log file. You can find a complete list on the

<B>http://www.w3.org/pub/WWW/Protocols/HTTP/1.0/spec.html</B>

Web page.<BR>

<P>

<CENTER><B>Table 21.1&nbsp;&nbsp;The Most Common Server Status

Codes</B></CENTER>

<p>

<CENTER>

<TABLE BORDERCOLOR=#000000 BORDER=1 WIDTH=80%>

<TR><TD WIDTH=97><CENTER><I>Status Code</I></CENTER></TD><TD WIDTH=234><I>Description</I>

</TD></TR>

<TR><TD WIDTH=97><CENTER>200</CENTER></TD><TD WIDTH=234>OK</TD>

</TR>

<TR><TD WIDTH=97><CENTER>204</CENTER></TD><TD WIDTH=234>No content

</TD></TR>

<TR><TD WIDTH=97><CENTER>301</CENTER></TD><TD WIDTH=234>Moved permanently

</TD></TR>

<TR><TD WIDTH=97><CENTER>302</CENTER></TD><TD WIDTH=234>Moved temporarily

</TD></TR>

<TR><TD WIDTH=97><CENTER>400</CENTER></TD><TD WIDTH=234>Bad Request

</TD></TR>

<TR><TD WIDTH=97><CENTER>401</CENTER></TD><TD WIDTH=234>Unauthorized

</TD></TR>

<TR><TD WIDTH=97><CENTER>403</CENTER></TD><TD WIDTH=234>Forbidden

</TD></TR>

<TR><TD WIDTH=97><CENTER>404</CENTER></TD><TD WIDTH=234>Not found

</TD></TR>

<TR><TD WIDTH=97><CENTER>500</CENTER></TD><TD WIDTH=234>Internal server error

</TD></TR>

<TR><TD WIDTH=97><CENTER>501</CENTER></TD><TD WIDTH=234>Not implemented

</TD></TR>

<TR><TD WIDTH=97><CENTER>503</CENTER></TD><TD WIDTH=234>Service unavailable

</TD></TR>

</TABLE>

</CENTER>

<P>

<P>

Status code 401 is logged when a user attempts to access a secured
document without valid credentials, for example by entering an
incorrect password. By searching the log
file for this code, you can create a report of the failed attempts
to gain entry into your site. Listing 21.4 shows how the log file
could be searched for a specific status code, in this case 401.

<P>

<IMG SRC="pseudo.gif" tppabs="http://cheminf.nankai.edu.cn/~eb~/Perl%205%20By%20Example/pseudo.gif" BORDER=1 ALIGN=RIGHT><p>

<BLOCKQUOTE>

<I>Turn on the warning option.<BR>

Define a format for the report's detail line.<BR>

Define a format for the report's header line.<BR>

Define the </I><TT><I>parseLogEntry()</I></TT><I>

function.<BR>

Declare a local variable to hold the pattern that matches a single

item.<BR>

Use the matching operator to extract information into pattern

memory.<BR>

Return a list that contains the 11 items extracted from the log

entry.<BR>

Open the log file.<BR>

Iterate over each line of the log file.<BR>

Parse the entry to extract the 11 items but only keep the site

information and the status code.<BR>

If the status code is 401, increment the counter

for that site.<BR>

Close the log file.<BR>

Check the site name to see if it has any entries. If not, display

a message that says no unauthorized accesses took place.<BR>

Iterate over the hash that holds the site names.<BR>

Write out each hash entry in a report.</I>

</BLOCKQUOTE>

<HR>

<P>

<B>Listing 21.4&nbsp;&nbsp;21LST04.PL-Checking for Unauthorized

Access Attempts<BR>

</B>

<BLOCKQUOTE>

<PRE>#!/usr/bin/perl -w

format =
  @&lt;&lt;&lt;&lt;&lt;&lt;&lt;&lt;&lt;&lt;&lt;&lt;&lt;&lt;&lt;&lt;&lt;&lt;&lt;&lt;&lt;&lt;&lt;&lt;&lt;&lt;&lt;&lt;&lt;&lt;&lt;&lt;&lt;&lt;&lt;&lt;&lt;&lt; @&gt;&gt;&gt;&gt;&gt;&gt;&gt;
  $site,                                  $count
.

format STDOUT_TOP =
  @||||||||||||||||||||||||||||||||||||  Pg @&lt;
  &quot;Unauthorized Access Report&quot;,             $%

  Remote Site Name                        Access Count
  --------------------------------------- ------------
.

sub parseLogEntry {
    my($w) = &quot;(.+?)&quot;;
    # The last item uses (\S+); a trailing (.+?) would capture one character.
    m/^$w $w $w \[$w:$w $w\] &quot;$w $w $w&quot; $w (\S+)/;
    return($1, $2, $3, $4, $5, $6, $7, $8, $9, $10, $11);
}

$LOGFILE = &quot;access.log&quot;;
open(LOGFILE, &quot;&lt;&quot;, $LOGFILE) or die(&quot;Could not open log file: $!&quot;);
foreach (&lt;LOGFILE&gt;) {