<HTML><HEAD><TITLE>Chapter 21 -- Using Perl with Web Servers</TITLE><META></HEAD>
<BODY TEXT="#000000" BGCOLOR="#FFFFFF" LINK="#0000EE" VLINK="#551A8B" ALINK="#CE2910">
<H1><FONT SIZE=6 COLOR=#FF0000>Chapter&nbsp;21</FONT></H1>
<H1><FONT SIZE=6 COLOR=#FF0000>Using Perl with Web Servers</FONT></H1>
<HR>
<P><CENTER><B><FONT SIZE=5>CONTENTS</FONT></B></CENTER>
<UL>
<LI><A HREF="#ServerLogFiles">Server Log Files</A>
<UL>
<LI><A HREF="#ExampleReadingaLogFile">Example: Reading a Log File</A>
<LI><A HREF="#ExampleListingAccessbyDocument">Example: Listing Access by Document</A>
<LI><A HREF="#ExampleLookingattheStatusCode">Example: Looking at the Status Code</A>
<LI><A HREF="#ExampleConvertingtheReporttoaWebPage">Example: Converting the Report to a Web Page</A>
<LI><A HREF="#ExistingLogFileAnalyzingPrograms">Existing Log File Analyzing Programs</A>
<LI><A HREF="#CreatingYourOwnCGILogFile">Creating Your Own CGI Log File</A>
</UL>
<LI><A HREF="#CommunicatingwithUsers">Communicating with Users</A>
<UL>
<LI><A HREF="#ExampleGeneratingaWhatsNewPage">Example: Generating a What's New Page</A>
<LI><A HREF="#ExampleGettingUserFeedback">Example: Getting User Feedback</A>
</UL>
<LI><A HREF="#Summary">Summary</A>
<LI><A HREF="#ReviewExercises">Review Exercises</A>
</UL>
<HR>
<P>Web servers frequently need some type of maintenance in order to operate at peak efficiency. This chapter looks at some maintenance tasks that can be performed by Perl programs. You will see some ways that your server keeps track of who visits and what Web pages are accessed on your site. You will also see some ways to automatically generate a site index, a what's new document, and user feedback about a Web page.
<H2><A NAME="ServerLogFiles"><FONT SIZE=5 COLOR=#FF0000>Server Log Files</FONT></A></H2>
<P>The most useful tool to assist in understanding how and when your Web site pages and applications are being accessed is the log file generated by your Web server.
This log file contains, among other things, which pages are being accessed, by whom, and when.
<P>Each Web server provides some form of log file that records who and what accesses a specific HTML page or graphic. A terrific site for an overall comparison of the major Web servers can be found at <B>http://www.webcompare.com/</B>. From this site you can see which Web servers follow the CERN/NCSA common log format that is detailed below. In addition, you can find out which servers can customize log files or write to multiple log files. You might also be surprised at the number of Web servers on the market.
<P>Understanding the contents of the server log files is a worthwhile endeavor, and in this section you'll see several ways that the information in the log files can be manipulated. However, if you're like most people, you'll use one of the log file analyzers that you'll read about in the section &quot;Existing Log File Analyzing Programs&quot; to do most of your work. After all, you don't want to create a program that others are giving away for free.<BR>
<p><CENTER><TABLE BORDERCOLOR=#000000 BORDER=1 WIDTH=80%><TR><TD><B>Note </B></TD></TR><TR><TD><BLOCKQUOTE>This section about server log files is one that you can read when the need arises. If you are not actively running a Web server now, you won't be able to get full value from the examples. The CD-ROM that accompanies this book has a sample log file for you to experiment on, but it is very limited in size and scope.</BLOCKQUOTE></TD></TR></TABLE></CENTER>
<P>Nearly all of the major Web servers use a common format for their log files. These log files contain information such as the IP address of the remote host, the document that was requested, and a timestamp.
The syntax for each line of a log file is:
<BLOCKQUOTE><PRE>site logName fullName [date:time GMToffset] &quot;req file proto&quot; status length</PRE></BLOCKQUOTE>
<P>Because that line of syntax is relatively meaningless on its own, here is a line from a real log file:
<BLOCKQUOTE><PRE>204.31.113.138 - - [03/Jul/1996:06:56:12 -0800]
    &quot;GET /PowerBuilder/Compny3.htm HTTP/1.0&quot; 200 5593</PRE></BLOCKQUOTE>
<P>Even though I have split the line into two, you need to remember that inside the log file it really is only one line.
<P>Each of the eleven items listed in the above syntax and example is described in the following list.
<UL>
<LI><B>site</B>-either an IP address or the symbolic name of the site making the HTTP request. In the example line the remote host is <TT>204.31.113.138</TT>.
<LI><B>logName</B>-login name of the user who owns the account that is making the HTTP request. Most remote sites don't give out this information for security reasons. If this field is disabled by the host, you see a dash (<TT>-</TT>) instead of the login name.
<LI><B>fullName</B>-full name of the user who owns the account that is making the HTTP request. Most remote sites don't give out this information for security reasons. If this field is disabled by the host, you see a dash (<TT>-</TT>) instead of the full name. If your server requires a user id in order to fulfill an HTTP request, the user id will be placed in this field.
<LI><B>date</B>-date of the HTTP request. In the example line the date is <TT>03/Jul/1996</TT>.
<LI><B>time</B>-time of the HTTP request. The time is presented in 24-hour format. In the example line the time is <TT>06:56:12</TT>.
<LI><B>GMToffset</B>-signed offset from Greenwich Mean Time. GMT is the international time reference. In the example line the offset is <TT>-0800</TT>, eight hours behind GMT.
<LI><B>req</B>-HTTP command. For ordinary Web page requests, this field is normally the GET command. In the example line the request is <TT>GET</TT>.
<LI><B>file</B>-path and filename of the requested file.
In the example line the file is <TT>/PowerBuilder/Compny3.htm</TT>. There are three types of path/filename combinations:
</UL>
<BLOCKQUOTE><B>Implied Path and Filename</B>-accesses a file in a user's home directory. For example, <TT>/~foo/</TT> could be expanded into <TT>/user/foo/homepage.html</TT>. The <TT>/user/foo</TT> directory is the home directory for the user <TT>foo</TT>, and <TT>homepage.html</TT> is the default file name for any user's home page. Implied paths are hard to analyze because you need to know how the server is set up and because the server's setup may change.</BLOCKQUOTE>
<BLOCKQUOTE><B>Relative Path and Filename</B>-accesses a file in a directory that is specified relative to a user's home directory. For example, <TT>/~foo/cooking.html</TT> will be expanded into <TT>/user/foo/cooking.html</TT>.</BLOCKQUOTE>
<BLOCKQUOTE><B>Full Path and Filename</B>-accesses a file by explicitly stating the full directory and filename. For example, <TT>/user/foo/biking/mountain/index.html</TT>.</BLOCKQUOTE>
<UL>
<LI><B>proto</B>-type of protocol used for the request. In the example line, proto <TT>HTTP/1.0</TT> is used.
<LI><B>status</B>-status code generated by the request. In the example line the status is <TT>200</TT>. See the section &quot;Example: Looking at the Status Code&quot; later in the chapter for more information.
<LI><B>length</B>-length of the requested document in bytes. In the example line the length is <TT>5593</TT>.
</UL>
<P>Web servers can have many different types of log files. For example, you might see a proxy access log or an error log. In this chapter, we'll focus on the access log, where the Web server tracks every access to your Web site.
<H3><A NAME="ExampleReadingaLogFile">Example: Reading a Log File</A></H3>
<P>In this section you see a Perl script that can open a log file and iterate over the lines of the log file. It is usually unwise to read entire log files into memory because they can get quite large.
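<P>As a minimal sketch of that advice, a <TT>while</TT> loop fetches one line per iteration, so memory use stays flat no matter how large the log grows. A <TT>foreach</TT> loop over the same filehandle would read every line into a list before the first iteration. The <TT>countLines()</TT> name and the idea of passing in an already opened filehandle are assumptions made for illustration; they are not part of the chapter's listings.
<BLOCKQUOTE><PRE>
```perl
#!/usr/bin/perl -w
use strict;

# Hypothetical helper: count the non-blank entries available on an open
# filehandle while holding only one line in memory at a time.
sub countLines {
    my ($fh) = @_;
    my $count = 0;

    # readline() in scalar context returns one line per call and
    # undef at end of file, so the loop never holds the whole log.
    while (defined(my $line = readline($fh))) {
        chomp($line);          # remove the newline from $line.
        next if $line eq '';   # skip blank lines.
        $count++;
    }
    return $count;
}
```
</PRE></BLOCKQUOTE>
<P>The helper accepts any open filehandle, for example <TT>countLines(\*LOGFILE)</TT> after a successful <TT>open()</TT> call.
<P>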
A friend of mine has a log file that is over 113 megabytes!
<P>Regardless of the way that you'd like to process the data, you must open a log file and read it. You can read the entry into one variable for processing, or you can split the entry into its components. To read each line into a single variable, use the following code sample:
<BLOCKQUOTE><PRE>$LOGFILE = &quot;access.log&quot;;
open(LOGFILE, &quot;&lt;&quot;, $LOGFILE) or die(&quot;Could not open log file: $!&quot;);

# while reads one line per iteration; foreach would load the whole file first.
while ($line = &lt;LOGFILE&gt;) {
    chomp($line);              # remove the newline from $line.
    # do line-by-line processing.
}
close(LOGFILE);</PRE></BLOCKQUOTE>
<p><CENTER><TABLE BORDERCOLOR=#000000 BORDER=1 WIDTH=80%><TR><TD><B>Note</B></TD></TR><TR><TD><BLOCKQUOTE>If you don't have your own server logs, you can use the file <TT>server.log</TT> that is included on the CD-ROM that accompanies this book.</BLOCKQUOTE></TD></TR></TABLE></CENTER>
<P>The code snippet opens the log file for reading and accesses the file one line at a time, loading each line into the <TT>$line</TT> variable. This type of processing is pretty limiting because you need to deal with the entire log entry at once.
<P>A more popular way to read the log file is to split the contents of the entry into different variables. For example, Listing 21.1 uses the <TT>split()</TT> function and some processing to assign values to 11 variables:
<P><IMG SRC="pseudo.gif" BORDER=1 ALIGN=RIGHT><p>
<BLOCKQUOTE><I>Turn on the </I><TT><I>warning</I></TT><I> option.<BR></I>Initialize <TT>$LOGFILE</TT> with the full path and name of the access log.<BR>Open the log file.<BR>Iterate over the lines of the log file.
Each line gets placed, in turn, into <TT>$line</TT>.<BR>Split <TT>$line</TT> using the space character as the delimiter.<BR>Get the time value from the <TT>$date</TT> variable.<BR>Extract the date value from the <TT>$date</TT> variable, skipping the time value and the '[' character.<BR>Remove the '&quot;' character from the beginning of the request value.<BR>Remove the end square bracket from the <TT>gmt</TT> offset value.<BR>Remove the end quote from the protocol value.<BR>Close the log file.</BLOCKQUOTE>
<HR>
<P><B>Listing 21.1&nbsp;&nbsp;21LST01.PL-Read the Access Log and Parse Each Entry<BR></B>
<BLOCKQUOTE><PRE>#!/usr/bin/perl -w

$LOGFILE = &quot;access.log&quot;;
open(LOGFILE, &quot;&lt;&quot;, $LOGFILE) or die(&quot;Could not open log file: $!&quot;);

# while reads one line per iteration instead of loading the whole file.
while ($line = &lt;LOGFILE&gt;) {
    # Splitting on ' ' breaks the entry at runs of whitespace.
    ($site, $logName, $fullName, $date, $gmt,
     $req, $file, $proto, $status, $length) = split(' ', $line);

    $time = substr($date, 13);       # 'hh:mm:ss' follows '[dd/Mon/yyyy:'
    $date = substr($date, 1, 11);    # 'dd/Mon/yyyy' without the '['
    $req  = substr($req, 1);         # drop the leading double quote
    chop($gmt);                      # drop the trailing ']'
    chop($proto);                    # drop the trailing double quote

    # do line-by-line processing.
}
close(LOGFILE);</PRE></BLOCKQUOTE>
<HR>
<P>If you print out the variables, you might get a display like this:<BR>
<BLOCKQUOTE><PRE>$site     = ros.algonet.se
$logName  = -
$fullName = -
$date     = 09/Aug/1996
$time     = 08:30:52
$gmt      = -0500
$req      = GET
$file     = /~jltinche/songs/rib_supp.gif
$proto    = HTTP/1.0
$status   = 200
$length   = 1543</PRE></BLOCKQUOTE>
<P>You can see that after the split is done, further manipulation is needed in order to &quot;clean up&quot; the values inside the variables. At the very least, the square brackets and the double quotes need to be removed.
<P>I prefer to use a regular expression to extract the information from the log file entries.
I feel that this approach is more straightforward than the others, assuming that you are comfortable with regular expressions. Listing 21.2 shows a program that uses a regular expression to determine the 11 items in the log entries.
<P><IMG SRC="pseudo.gif" BORDER=1 ALIGN=RIGHT><p>
<BLOCKQUOTE><I>Turn on the </I><TT><I>warning</I></TT><I> option.<BR></I>Initialize <TT>$LOGFILE</TT> with the full path and name of the access log.<BR>Open the log file.<BR>Iterate over the lines of the log file. Each line gets placed, in turn, into <TT>$line</TT>.<BR>Define a temporary variable to hold a pattern that recognizes a single item.<BR>Use the matching operator to store the 11 items into pattern memory, skipping any line that does not match.<BR>Store the pattern memories into individual variables.<BR>Close the log file.</BLOCKQUOTE>
<HR>
<P><B>Listing 21.2&nbsp;&nbsp;21LST02.PL-Using a Regular Expression to Parse the Log File Entry<BR></B>
<BLOCKQUOTE><PRE>#!/usr/bin/perl -w

$LOGFILE = &quot;access.log&quot;;
open(LOGFILE, &quot;&lt;&quot;, $LOGFILE) or die(&quot;Could not open log file: $!&quot;);

# while reads one line per iteration instead of loading the whole file.
while ($line = &lt;LOGFILE&gt;) {
    $w = &quot;(.+?)&quot;;

    # The trailing $ anchors the match at the end of the line. Without it,
    # the last non-greedy group would capture only a single character.
    next unless $line =~ m/^$w $w $w \[$w:$w $w\] &quot;$w $w $w&quot; $w $w$/;

    $site     = $1;
    $logName  = $2;
    $fullName = $3;
    $date     = $4;
    $time     = $5;
    $gmt      = $6;
    $req      = $7;
    $file     = $8;
    $proto    = $9;
    $status   = $10;
    $length   = $11;

    # do line-by-line processing.
}
close(LOGFILE);</PRE></BLOCKQUOTE>
<HR>
<P>The main advantage of using regular expressions to extract information is the ease with which you can adjust the pattern to account for different log file formats. If you use a server that delimits the date/time item with curly brackets, you only need to change the line with the matching operator to accommodate the different format.
<H3><A NAME="ExampleListingAccessbyDocument">Example: Listing Access by Document</A></H3>
<P>One easy and useful analysis that you can do is to find out how many times each document at your site has been visited.
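<P>The core of such a report is a hash keyed by the requested file. The following is a minimal sketch under stated assumptions: the <TT>countByDocument()</TT> name, the simplified pattern, and the three sample entries are illustrative only and are not taken from the chapter's listings.
<BLOCKQUOTE><PRE>
```perl
#!/usr/bin/perl -w
use strict;

# Hypothetical helper: tally how often each document appears in a list
# of common-log-format entries. Returns a reference to a hash that maps
# the requested file to its access count.
sub countByDocument {
    my @entries = @_;
    my %count;

    foreach my $entry (@entries) {
        # The file is the second word inside the quoted request field.
        # Entries that do not match the pattern are skipped.
        next unless $entry =~ m/"\S+ (\S+) [^"]*"/;
        $count{$1}++;
    }
    return \%count;
}

# Sample entries (made up for illustration).
my @sample = (
    '204.31.113.138 - - [03/Jul/1996:06:56:12 -0800] "GET /a.htm HTTP/1.0" 200 5593',
    '204.31.113.138 - - [03/Jul/1996:06:57:01 -0800] "GET /b.htm HTTP/1.0" 200 120',
    '204.31.113.138 - - [03/Jul/1996:06:58:40 -0800] "GET /a.htm HTTP/1.0" 200 5593',
);

# Print one line per document, sorted by file name.
my $report = countByDocument(@sample);
foreach my $file (sort keys %$report) {
    print "$file: $report->{$file}\n";
}
```
</PRE></BLOCKQUOTE>
<P>A real report would read the entries from the access log one line at a time rather than from an array, but the counting step stays the same.
<P>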
Listing 21.3 contains a program that reports on the access counts of documents beginning with the letter s.<BR>
<p><CENTER><TABLE BORDERCOLOR=#000000 BORDER=1 WIDTH=80%><TR><TD><B>Note</B></TD></TR><TR><TD><BLOCKQUOTE>The <TT>parseLogEntry()</TT> function uses <TT>$_</TT> as the pattern space. This eliminates the need to pass parameters but is generally considered bad programming practice. But this is a small program, so perhaps it's okay.</BLOCKQUOTE></TD></TR></TABLE></CENTER>
<P><IMG SRC="pseudo.gif" BORDER=1 ALIGN=RIGHT><p>
<BLOCKQUOTE><I>Turn on the </I><TT><I>warning</I></TT><I> option.<BR>