Perl CGI Web Pages for WINNT (a Perl tutorial published by Macmillan, USA)
Fri May 10 10:08:44 1996 wait.pspt.fi 192.89.123.26 GET /gif/VF21.GIF HTTP/1.0

Fri May 10 10:08:44 1996 wait.pspt.fi 192.89.123.26 GET /gif/KOULU.GIF HTTP/1.0

Fri May 10 10:11:59 1996 wait.pspt.fi 192.89.123.26 GET /AVITULOS.HTM HTTP/1.0

Fri May 10 10:12:05 1996 wait.pspt.fi 192.89.123.26 GET /gif/VIFA5PAP.GIF HTTP/1.0

Fri May 10 10:12:44 1996 wait.pspt.fi 192.89.123.26 GET /gif/NAPPI2.gif HTTP/1.0

Fri May 10 10:12:47 1996 wait.pspt.fi 192.89.123.26 GET /gif/NAPPI3.gif HTTP/1.0

Fri May 10 10:13:49 1996 wait.pspt.fi 192.89.123.26 GET /AVIONGEL.HTM HTTP/1.0

Fri May 10 10:13:59 1996 wait.pspt.fi 192.89.123.26 GET /gif/PIRUNLG.GIF HTTP/1.0

</PRE>

</BLOCKQUOTE>

<P>

In this log file you can see the calls made to the different Perl
scripts, and the method by which each request is made, either GET
or POST. The log file begins with the first request made that day
and finishes with the last. This example is a very short one, edited
down from the original, so you can imagine that log files on very
active Web servers easily grow to many times this length. Purging
log files is therefore an important practice to integrate into your
Web maintenance routine.

<P>

When you purge your log files, remember that you are erasing
information you may need in the future. If you generate reports
from these logs, delete only the logs for which reports have already
been made. These reports commonly run one or two weeks behind the
current date, so the last one or two weeks of log files must be kept
to generate them successfully.

<P>

Because the EMWAC server creates a new log file for each day, purging
is much easier than with HTTP servers that place all log entries in
one file, like the &quot;access_log&quot; file used with the NCSA
HTTP server. Instead of having to open that file and delete specific
entries, which is an editing hassle, with the EMWAC server you simply
delete the entire log file for each day no longer needed for
generating reports.
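<P>
The retention rule described above can be sketched in a few lines of Perl. This is only an illustration under stated assumptions: the function name purge_candidates, the c:/logs directory, and the 14-day window (chosen to match the two-week report lag mentioned earlier) are hypothetical, and a file's age is judged by its last-modified time rather than by its name. The sketch only lists candidates, so nothing is erased until you have checked the list.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# List the files in a log directory whose last modification is more
# than $keep_days days old. Nothing is deleted here; the caller decides.
sub purge_candidates {
    my ($dir, $keep_days) = @_;
    opendir(my $dh, $dir) or die "Cannot open $dir: $!";
    my @old = grep { -f $_ && (-M $_) > $keep_days }   # -M is the age in days
              map  { "$dir/$_" } readdir($dh);
    closedir($dh);
    return sort @old;
}

# Hypothetical usage: keep two weeks of logs and report the rest.
#   print "$_\n" for purge_candidates('c:/logs', 14);
# Once the reports for those days exist, the same list can be removed:
#   unlink purge_candidates('c:/logs', 14);
```

Reporting first and deleting second keeps the purge reversible up to the last step, which fits the warning about erasing information you may still need.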

<P>

Each log file is kept open until the next day's log records its
first transaction. Once that transaction occurs, the previous day's
log file is closed. Each entry in an EMWAC log file records the
following:

<UL>

<LI>The time and date of the request

<LI>The IP address or domain name of the server

<LI>The IP address or domain name of the client

<LI>The HTTP command

<LI>The URL requested

<LI>The version of the HTTP protocol used (when no version shows
up in the log file, the default version, HTTP 0.9, was used)

</UL>
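<P>
To make the field list concrete, here is a minimal sketch that splits one log entry into named fields. It assumes the layout of the sample entries shown earlier (five tokens of date and time, two host fields, then the command, URL, and optional version) and assumes the two host fields appear in the order the list gives them, server first. The function name parse_log_line and the hash keys are illustrative, not part of the EMWAC software.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Split one log entry into named fields. The date and time take the
# first five whitespace-separated tokens; the HTTP version may be absent.
sub parse_log_line {
    my ($line) = @_;
    my @f = split ' ', $line;
    return undef if @f < 9;                 # not a complete entry
    my %entry = (
        date    => join(' ', @f[0 .. 4]),
        server  => $f[5],                   # assumed order: server, then client
        client  => $f[6],
        method  => $f[7],
        url     => $f[8],
        version => defined $f[9] ? $f[9] : 'HTTP/0.9',   # default when omitted
    );
    return \%entry;
}

my $entry = parse_log_line(
    'Fri May 10 10:11:59 1996 wait.pspt.fi 192.89.123.26 GET /AVITULOS.HTM HTTP/1.0');
print "$entry->{method} $entry->{url} ($entry->{version})\n";
```

Once an entry is a hash, a report script can group or count on any field without repeating the splitting logic.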

<P>

All of this information can be used to provide detailed reports

on Web site traffic. 

<P>

One way to get an accurate count of the hits a site receives is
to use the daily log file. By understanding the format of the
request that is logged when the site's home page is fetched, we
can use a simple script to count actual hits.

<P>

The Goo Goo Records Web site first used the following script, which
calls the grep command from Perl, to figure out how many users
accessed the site. You might recall that grep uses regular
expressions to look for matches to a designated character string
or pattern; with the -c option it reports the number of matching
lines rather than listing them.

<BLOCKQUOTE>

<PRE>

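#!/usr/bin/perl

     # Each backtick command runs the external grep utility and returns
     # the number of matching lines (-c). Backslashes are doubled because
     # backticks interpolate like double-quoted strings.
     print &quot;Content-type: text/html\n\n&quot;;

     $num  = `grep -c &quot;GET / HTTP&quot; /googoo.com/ WINNT35\\system32\\LogFiles`;
     $num += `grep -c &quot;GET /index.sht&quot; /googoo.com/ WINNT35\\system32\\LogFiles`;
     $num += `grep -c &quot;GET /index.htm&quot; /googoo.com/ WINNT35\\system32\\LogFiles`;

     print &quot;$num\n&quot;;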

</PRE>

</BLOCKQUOTE>

<P>

The Web Master abandoned this method of tabulating user hits early
on, for several reasons. First, although this method may be more
accurate, it is very time consuming, because it has to read through
the long daily log files and count every match. Second, each page
to be monitored had to have its own modified version of the script,
because the script names a specific page. Third, the script forces
you to make your index page, index.htm, a Server Side Includes page
for the whole thing to work, which greatly reduces the speed at
which your home page loads. Finally, the site started using the
EMWAC HTTP service, which doesn't support Server Side Includes
(notice the &quot;.sht&quot; file extension used in the script,
the shortened NT version of &quot;.shtml&quot;), making the script
useless. Fortunately for the Web Master, there are several other
ways to count hits on a Web page.
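<P>
As one hedged illustration of an alternative (not the approach the Goo Goo Records site adopted), the log can be read in Perl itself, counting every requested URL in a single pass so that no per-page copy of the script is needed. The function name count_hits is hypothetical, the sample entries are taken from the log excerpt above, and folding URLs to lowercase is an assumption based on NT file names being case-insensitive.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Count GET requests per URL across a list of log entries.
sub count_hits {
    my @lines = @_;
    my %hits;
    for my $line (@lines) {
        # The command and URL follow the date and host fields.
        if ($line =~ /\sGET\s+(\S+)/) {
            $hits{lc $1}++;     # assumption: URLs differing only in case are one page
        }
    }
    return %hits;
}

my @sample = (
    'Fri May 10 10:11:59 1996 wait.pspt.fi 192.89.123.26 GET /AVITULOS.HTM HTTP/1.0',
    'Fri May 10 10:12:44 1996 wait.pspt.fi 192.89.123.26 GET /gif/NAPPI2.gif HTTP/1.0',
    'Fri May 10 10:13:49 1996 wait.pspt.fi 192.89.123.26 GET /AVIONGEL.HTM HTTP/1.0',
);

my %hits = count_hits(@sample);
print "$_ $hits{$_}\n" for sort keys %hits;
```

In practice the list of lines would come from reading a daily log file; the cost is still one pass over the file, but that pass now covers every page at once.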

<H3><A NAME="HTTPStatusCodes">

HTTP Status Codes </A></H3>

<P>

Few people who use the Web have not yet encountered HTTP status
codes. There may be nothing quite as frustrating as receiving,
instead of the HTML document you requested, the message
&quot;Forbidden, access not granted&quot; or a similar one-line
response. These messages correspond to some of the many HTTP status
codes, one of which is issued in response to each request made of
a Web server. Table 14.1 outlines the different types of HTTP status
codes and what they mean.<BR>

<P>

<CENTER><B>Table 14.1 HTTP status codes</B></CENTER>

<P>

<CENTER>

<TABLE BORDERCOLOR=#000000 BORDER=1 WIDTH=80%>

<TR VALIGN=TOP><TD WIDTH=144><CENTER><B>HTTP Status Code</B></CENTER></TD>

<TD WIDTH=144><CENTER><B>Code Type</B></CENTER></TD><TD WIDTH=288><CENTER><B>Meaning </B></CENTER>

</TD></TR>

<TR VALIGN=TOP><TD WIDTH=144><CENTER>200</CENTER></TD><TD WIDTH=144>Successful request

</TD><TD WIDTH=288>OK-The request was satisfied.</TD></TR>

<TR VALIGN=TOP><TD WIDTH=144><CENTER>201</CENTER></TD><TD WIDTH=144>Successful request

</TD><TD WIDTH=288>Created-Following a POST command, the request succeeded and a new resource was created.</TD></TR>

<TR VALIGN=TOP><TD WIDTH=144><CENTER>202</CENTER></TD><TD WIDTH=144>Successful request

</TD><TD WIDTH=288>OK-request accepted for processing, but processing is not complete.

</TD></TR>

<TR VALIGN=TOP><TD WIDTH=144><CENTER>203</CENTER></TD><TD WIDTH=144>Successful request

</TD><TD WIDTH=288>OK-Partial information-the returned information is only partial.

</TD></TR>

<TR VALIGN=TOP><TD WIDTH=144><CENTER>204 </CENTER></TD><TD WIDTH=144>Successful request

</TD><TD WIDTH=288>OK-No Response-request received but no information exists to send back.

</TD></TR>

<TR VALIGN=TOP><TD WIDTH=144><CENTER>301</CENTER></TD><TD WIDTH=144>Redirection

</TD><TD WIDTH=288>Moved-The information requested is in a new location and the change is permanent.

</TD></TR>

<TR VALIGN=TOP><TD WIDTH=144><CENTER>302</CENTER></TD><TD WIDTH=144>Redirection

</TD><TD WIDTH=288>Found-The information requested temporarily has a different URL.

</TD></TR>

<TR VALIGN=TOP><TD WIDTH=144><CENTER>303</CENTER></TD><TD WIDTH=144>Redirection

</TD><TD WIDTH=288>Method-The information is undergoing change; a suggestion for the client to try another location.

</TD></TR>

<TR VALIGN=TOP><TD WIDTH=144><CENTER>304</CENTER></TD><TD WIDTH=144>Redirection

</TD><TD WIDTH=288>Not Modified-The document has not been modified since the date given in a conditional GET request; the client already has all the information in its cache, and the browser just needs to display it again.

</TD></TR>

<TR VALIGN=TOP><TD WIDTH=144><CENTER>400</CENTER></TD><TD WIDTH=144>Error with client

</TD><TD WIDTH=288>Bad Request-A syntax problem with the client's request, or the request could not be satisfied.

</TD></TR>

<TR VALIGN=TOP><TD WIDTH=144><CENTER>401</CENTER></TD><TD WIDTH=144>Error with client

</TD><TD WIDTH=288>Unauthorized-Client does not have authorization to access information requested.

</TD></TR>

<TR VALIGN=TOP><TD WIDTH=144><CENTER>402</CENTER></TD><TD WIDTH=144>Error with client

</TD><TD WIDTH=288>Payment Required-Reserved for use when payment methods are employed by the&nbsp;server; payment must be satisfied before the request is accepted.

</TD></TR>

<TR VALIGN=TOP><TD WIDTH=144><CENTER>403</CENTER></TD><TD WIDTH=144>Error with client

</TD><TD WIDTH=288>Forbidden-No access for client to&nbsp;information, even with proper authorization.

</TD></TR>

<TR VALIGN=TOP><TD WIDTH=144><CENTER>404</CENTER></TD><TD WIDTH=144>Error with client

</TD><TD WIDTH=288>Not Found-Server could not find file to satisfy client request.

</TD></TR>

<TR VALIGN=TOP><TD WIDTH=144><CENTER>405</CENTER></TD><TD WIDTH=144>Error with client

</TD><TD WIDTH=288>Method Not Allowed-The method used in the request line is not allowed for access to the information in the request URL.

</TD></TR>

<TR VALIGN=TOP><TD WIDTH=144><CENTER>406</CENTER></TD><TD WIDTH=144>Error with client

</TD><TD WIDTH=288>None Acceptable-The information requested has been found, but not within the conditions stated in the request, monitored by the Accept and Accept-Encoding request headers.

</TD></TR>

<TR VALIGN=TOP><TD WIDTH=144><CENTER>407</CENTER></TD><TD WIDTH=144>Error with client

</TD><TD WIDTH=288>Proxy Authentication Required-This code is not yet in service because HTTP 1.0 does not yet support proxy authentication. When it does, this code will indicate that the client must authenticate itself properly before continuing.

</TD></TR>

<TR VALIGN=TOP><TD WIDTH=144><CENTER>409</CENTER></TD><TD WIDTH=144>Error with client

</TD><TD WIDTH=288>Conflict-There is a conflict with the information requested in its current state, preventing access.

</TD></TR>

<TR VALIGN=TOP><TD WIDTH=144><CENTER>410</CENTER></TD><TD WIDTH=144>Error with client

</TD><TD WIDTH=288>Gone-Information requested by client is no longer available, with no forwarding URL.

</TD></TR>

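<TR VALIGN=TOP><TD WIDTH=144><CENTER>411</CENTER></TD><TD WIDTH=144>Error with client

</TD><TD WIDTH=288>Authorization Refused-The credentials in the client request are not satisfactory to allow access to requested information. (The published HTTP 1.1 standard reassigns this code to Length Required: the server refuses a request that lacks a Content-Length header.)

</TD></TR>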

<TR VALIGN=TOP><TD WIDTH=144><CENTER>500</CENTER></TD><TD WIDTH=144>Error with server

</TD><TD WIDTH=288>Internal Server Error-An unexpected condition has caused the server to&nbsp;be unable to satisfy the client's request.

</TD></TR>

<TR VALIGN=TOP><TD WIDTH=144><CENTER>501</CENTER></TD><TD WIDTH=144>Error with server

</TD><TD WIDTH=288>Not Implemented-The client's request includes facilities not currently supported by the server.

</TD></TR>

<TR VALIGN=TOP><TD WIDTH=144><CENTER>502</CENTER></TD><TD WIDTH=144>Error with server

</TD><TD WIDTH=288>Bad Gateway-Upstream gateway access necessary for completing request denied or failed.

</TD></TR>

</TABLE></CENTER>
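<P>
The &quot;Code Type&quot; column of Table 14.1 depends only on the first digit of the status code, which a script can exploit when summarizing a log that records status codes. The sketch below is a minimal illustration; the function name status_class is hypothetical, and the class labels are copied from the table.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Map a three-digit HTTP status code to the code type used in Table 14.1.
sub status_class {
    my ($code) = @_;
    return 'Unknown' unless defined $code && $code =~ /^\d{3}$/;
    my %class = (
        2 => 'Successful request',
        3 => 'Redirection',
        4 => 'Error with client',
        5 => 'Error with server',
    );
    my $first = substr($code, 0, 1);        # the leading digit selects the class
    return exists $class{$first} ? $class{$first} : 'Unknown';
}

print "$_: ", status_class($_), "\n" for (200, 302, 404, 500);
```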

<P>

<P>

Understanding these status codes is critical if you want to keep

track of what is happening on your server with your Web sites.

While these status codes are not recorded by the EMWAC HTTP service

log files that the Goo Goo Records site uses, they are in the

log files of other servers. 

<H2><A NAME="TrackingandEnvironmentalVariables"><FONT SIZE=5 COLOR=#FF0000>

Tracking and Environmental Variables</FONT></A></H2>

<P>

In each of the Perl scripts that we have used so far, a standard
bit of code parses the form data, and the script then acts on the
parsed values. Form data, however, is not the only data we can glean
from a user through the server. In any Perl script called from an
HTML document, we can also use the CGI environment variables to
make decisions.
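<P>
As a small illustration of deciding on environment data rather than form data, the sketch below reads three CGI variables through Perl's built-in %ENV hash and builds a one-line summary of the request. The function name describe_request and the wording of the summary are hypothetical; the variable names are standard CGI ones, and the fallback values are assumptions for the case where a variable is not set.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Build a short description of a request from CGI environment values.
# The values are passed in as a hash so the function is easy to test.
sub describe_request {
    my (%env) = @_;
    my $method = $env{REQUEST_METHOD} || 'GET';        # assumed fallback
    my $client = $env{REMOTE_ADDR}    || 'unknown';    # assumed fallback
    my $query  = defined $env{QUERY_STRING} ? $env{QUERY_STRING} : '';
    my $text   = "$method request from $client";
    $text .= " with query '$query'" if length $query;
    return $text;
}

print "Content-type: text/plain\n\n";
print describe_request(%ENV), "\n";
```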

<P>

Environment variables are accessed through the %ENV associative
array, keyed by variable name, and can be readily used. For example,
if you wanted to track how many users from the googoo.com domain
have used your Perl scripts, you could add the following snippet
of code to each Perl script:

<BLOCKQUOTE>

<PRE>

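# Count script accesses that originate inside the googoo.com domain.
if ($ENV{'REMOTE_HOST'} =~ /(^|\.)googoo\.com$/i) {
     # Single quotes keep the backslashes in the NT path literal.
     $trackfile = 'c:\logs\scripts.trk';

     # Read the current count; a missing file counts as zero.
     $line = 0;
     if (open(TRACK, $trackfile)) {
          $line = &lt;TRACK&gt;;
          close(TRACK);
     }
     $line++;

     # Write the updated count back.
     open(TRACK, &quot;&gt;$trackfile&quot;) || die &quot;Cannot write $trackfile: $!&quot;;
     print TRACK $line;
     close(TRACK);
}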

</PRE>

</BLOCKQUOTE>

<P>

This snippet increments the number stored in scripts.trk each time
the script runs, but only when the client is connecting from within
googoo.com. This could be useful for Web sites that deliver certain
pages only to internal users, or for tracking which users are inside
and which are outside your company.
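<P>
One caveat with a read-then-rewrite counter is that two requests arriving at the same moment can both read the same old value, so one increment is lost. The sketch below shows one common way to guard against that with an exclusive file lock. It assumes a Perl build in which flock is available on the platform; the function name bump_counter is hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Fcntl qw(:DEFAULT :flock);

# Increment the count stored in $path and return the new value.
# The exclusive lock makes the read-modify-write step atomic with
# respect to other processes that use the same locking discipline.
sub bump_counter {
    my ($path) = @_;
    sysopen(my $fh, $path, O_RDWR | O_CREAT)
        or die "Cannot open $path: $!";
    flock($fh, LOCK_EX) or die "Cannot lock $path: $!";

    my $text  = <$fh>;                                  # undef for a new file
    my $count = (defined $text && $text =~ /^(\d+)/) ? $1 : 0;
    $count++;

    seek($fh, 0, 0)  or die "Cannot rewind $path: $!";
    truncate($fh, 0) or die "Cannot truncate $path: $!";
    print {$fh} "$count\n";
    close($fh)       or die "Cannot close $path: $!";   # closing releases the lock
    return $count;
}
```

A script would call it as bump_counter('c:\logs\scripts.trk') inside the same REMOTE_HOST test shown above.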

<P>

In addition to the environmental variables already present on

the NT Server, and those which may have been added, the EMWAC

HTTP service uses the environmental variables listed in Table

14.2.<BR>

<P>

<CENTER><B>Table 14.2 Environmental variables</B></CENTER>

<P>

<CENTER>

<TABLE BORDERCOLOR=#000000 BORDER=1 WIDTH=80%>

<TR VALIGN=TOP><TD WIDTH=192><CENTER><B>Environmental Variable</B></CENTER>

</TD><TD WIDTH=384><CENTER><B>Description</B></CENTER></TD></TR>

<TR VALIGN=TOP><TD WIDTH=192>CONTENT_LENGTH</TD><TD WIDTH=384>The length of the content as received from the client. 

</TD></TR>

<TR VALIGN=TOP><TD WIDTH=192>CONTENT_TYPE</TD><TD WIDTH=384>The content type of the data attached to the request, as with &quot;POST&quot; requests.

</TD></TR>

<TR VALIGN=TOP><TD WIDTH=192>GATEWAY_INTERFACE</TD><TD WIDTH=384>The CGI specification revision for the server, in the format CGI/revision.

</TD></TR>

<TR VALIGN=TOP><TD WIDTH=192>HTTP_ACCEPT</TD><TD WIDTH=384>The list of MIME types that the client will accept, as given in the Accept header of its request.

</TD></TR>

<TR VALIGN=TOP><TD WIDTH=192>PATH_INFO</TD><TD WIDTH=384>The path data based on the client's request.

</TD></TR>

<TR VALIGN=TOP><TD WIDTH=192>QUERY_STRING</TD><TD WIDTH=384>All the information that follows the &quot;?&quot; in the URL when the script specified was accessed using &quot;GET.&quot;

</TD></TR>

<TR VALIGN=TOP><TD WIDTH=192>REMOTE_ADDR</TD><TD WIDTH=384>The client's IP address.

</TD></TR>

<TR VALIGN=TOP><TD WIDTH=192>REQUEST_METHOD</TD><TD WIDTH=384>The method of request made by the client, such as &quot;GET,&quot; &quot;POST,&quot; and so forth.

</TD></TR>

<TR VALIGN=TOP><TD WIDTH=192>SCRIPT_NAME</TD><TD WIDTH=384>Path name of the script requested to execute.