Fri May 10 10:08:44 1996 wait.pspt.fi 192.89.123.26 GET /gif/VF21.GIF HTTP/1.0
Fri May 10 10:08:44 1996 wait.pspt.fi 192.89.123.26 GET /gif/KOULU.GIF HTTP/1.0
Fri May 10 10:11:59 1996 wait.pspt.fi 192.89.123.26 GET /AVITULOS.HTM HTTP/1.0
Fri May 10 10:12:05 1996 wait.pspt.fi 192.89.123.26 GET /gif/VIFA5PAP.GIF HTTP/1.0
Fri May 10 10:12:44 1996 wait.pspt.fi 192.89.123.26 GET /gif/NAPPI2.gif HTTP/1.0
Fri May 10 10:12:47 1996 wait.pspt.fi 192.89.123.26 GET /gif/NAPPI3.gif HTTP/1.0
Fri May 10 10:13:49 1996 wait.pspt.fi 192.89.123.26 GET /AVIONGEL.HTM HTTP/1.0
Fri May 10 10:13:59 1996 wait.pspt.fi 192.89.123.26 GET /gif/PIRUNLG.GIF HTTP/1.0
</PRE>
</BLOCKQUOTE>
<P>
In this log file you can see each request made of the server and
the method used to make it, either GET or POST. The log file begins
with the first request made that day and ends with the last. This
example has been cut down from the original, so you can imagine
that the log files on a very active Web server quickly grow far
longer. Purging log files is an important practice to build into
your Web maintenance routine.
<P>
Before you purge your log files, remember that you are erasing
information you may need in the future. If you generate reports
from these logs, delete only the logs for which reports have already
been made. Such reports commonly run one or two weeks behind the
current date, so the last one or two weeks of log files must be
kept in order to generate them.
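<P>
A purge like this is easy to automate. The Perl routine below is a minimal sketch, not part of the EMWAC service: it collects the log files in a directory whose last-modified time is older than a retention period and deletes them. The <TT>old_logs</TT> name, the directory path, the ".log" suffix, and the 14-day retention period are all assumptions for illustration; match them to your own server and reporting schedule.
<BLOCKQUOTE>
<PRE>
use strict;
use warnings;

# Return the log files in $dir last modified more than $keep_days days ago.
# Only names ending in ".log" (any case) are considered; that suffix is an
# assumption, so adjust the pattern to match your server's log names.
sub old_logs {
    my ($dir, $keep_days) = @_;
    opendir(my $dh, $dir) or die "Cannot open $dir: $!";
    my @old;
    foreach my $name (readdir($dh)) {
        next unless $name =~ /\.log$/i;          # only log files
        my $path = "$dir/$name";
        # -M gives the file's age in days, measured from script start-up
        push @old, $path if -f $path and -M $path > $keep_days;
    }
    closedir($dh);
    return @old;
}

# Hypothetical log directory; keep two weeks of logs for reporting.
my $log_dir = 'C:/WINNT35/system32/LogFiles';
if (-d $log_dir) {
    foreach my $file (old_logs($log_dir, 14)) {
        print "Purging $file\n";
        unlink($file) or warn "Could not delete $file: $!";
    }
}
</PRE>
</BLOCKQUOTE>
<P>
Keying the purge on file age rather than on the date in each file name keeps the sketch independent of how the server names its logs.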
<P>
Because the EMWAC server creates a new log file each day, purging
is much easier than on HTTP servers that place all log entries into
one file, such as the "access_log" file used by the NCSA HTTP
server. Instead of editing a single file to delete specific entries,
you simply delete the entire log file for each day that is no longer
needed for generating reports.
<P>
Each log file is kept open until the next day's log records its
first action, or transaction; the previous day's log file is then
closed. Each entry in an EMWAC log file records the following:
<UL>
<LI>The time and date of the request
<LI>The IP address or domain name of the server
<LI>The IP address or domain name of the client
<LI>The HTTP command
<LI>The URL requested
<LI>The version of the HTTP protocol used (if no version appears
in the entry, the request used the default, HTTP 0.9)
</UL>
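<P>
Because every entry lists these fields in the same order, a script can split a line back into its parts. The Perl subroutine below is a minimal sketch; the hypothetical <TT>parse_emwac_line</TT> name and the field order are inferred from the sample log shown earlier, not taken from EMWAC documentation.
<BLOCKQUOTE>
<PRE>
use strict;
use warnings;

# Split one EMWAC log entry into its fields.  The order (date and time,
# server, client, command, URL, optional version) is inferred from the
# sample log, so treat it as an assumption.
sub parse_emwac_line {
    my ($line) = @_;
    chomp $line;
    my ($when, $server, $client, $command, $url, $version) = $line =~
        /^(\w{3} \w{3} +\d+ \d\d:\d\d:\d\d \d{4}) (\S+) (\S+) (\S+) (\S+)(?: (\S+))?$/
        or return;                                  # not a recognizable entry
    return {
        when    => $when,
        server  => $server,
        client  => $client,
        command => $command,
        url     => $url,
        version => defined $version ? $version : 'HTTP/0.9',  # no version: 0.9
    };
}

my $entry = parse_emwac_line(
    'Fri May 10 10:11:59 1996 wait.pspt.fi 192.89.123.26 GET /AVITULOS.HTM HTTP/1.0');
print "$entry->{command} $entry->{url} from $entry->{client}\n";
</PRE>
</BLOCKQUOTE>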
<P>
All of this information can be used to provide detailed reports
on Web site traffic.
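<P>
For example, a simple report might count how many times each URL was requested and how many different client addresses made requests. The sketch below assumes the whitespace-separated layout of the sample log (five date and time fields, then server, client, command, and URL); the <TT>traffic_report</TT> name and the sample lines are for illustration only.
<BLOCKQUOTE>
<PRE>
use strict;
use warnings;

# Count requests per URL and per client address.  Field positions assume
# the sample layout: five date/time fields, server, client, command, URL.
sub traffic_report {
    my (%by_url, %by_client);
    foreach my $line (@_) {
        my @f = split ' ', $line;
        next unless @f >= 9;            # skip blank or truncated lines
        $by_client{ $f[6] }++;
        $by_url{ $f[8] }++;
    }
    return (\%by_url, \%by_client);
}

my @log = (
    'Fri May 10 10:12:44 1996 wait.pspt.fi 192.89.123.26 GET /gif/NAPPI2.gif HTTP/1.0',
    'Fri May 10 10:12:47 1996 wait.pspt.fi 192.89.123.26 GET /gif/NAPPI3.gif HTTP/1.0',
    'Fri May 10 10:13:49 1996 wait.pspt.fi 192.89.123.26 GET /AVIONGEL.HTM HTTP/1.0',
);
my ($urls, $clients) = traffic_report(@log);
foreach my $url (sort keys %$urls) {
    print "$urls->{$url}\t$url\n";
}
print scalar(keys %$clients), " distinct client address(es)\n";
</PRE>
</BLOCKQUOTE>
<P>
In practice the lines would be read from the day's log file rather than listed in the script. Because one pass tallies every URL, a single script can report on all pages at once.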
<P>
One way to find the accurate number of hits a site is receiving
is to count them in the daily log file. Once you know the format
of the request line a browser sends for the site's home page, a
simple script can count the actual hits.
<P>
The Goo Goo Records Web site first used the following script, built
around grep, to figure out how many users accessed the site. You
might recall that grep uses regular expressions to look for matches
and compiles a list of everything that matches the designated
character string or regular expression; used in scalar context,
Perl's grep returns the number of matches instead.
<BLOCKQUOTE>
<PRE>
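#!/usr/bin/perl
# Count home-page requests in a daily log file.  The log file name below is
# only an example; use the day's log in WINNT35\system32\LogFiles.
print "Content-type: text/html\n\n";
$log = 'C:/WINNT35/system32/LogFiles/today.log';
open(LOG, $log) or die "Cannot open $log: $!";
@lines = &lt;LOG&gt;;
close(LOG);
# Match "GET / HTTP", "GET /index.sht HTTP", and "GET /index.htm HTTP".
$num = grep(m{GET /(index\.(sht|htm))? HTTP}, @lines);
print "$num\n";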
</PRE>
</BLOCKQUOTE>
<P>
The Web Master abandoned this method of tabulating hits early on,
for several reasons. First, although the method may be more accurate,
it is very time consuming, because the script has to read through
the long daily log file and count every match each time it runs.
Second, each page to be monitored needed its own modified version
of the script, because the script names a specific page. Third,
the script only works if your home page is a Server Side Includes
page rather than a plain index.htm, which greatly reduces the speed
at which the home page is delivered. Finally, the site started using
the EMWAC HTTP service, which doesn't support Server Side Includes
(notice the ".sht" file extension used in the script, the shortened
NT version of ".shtml"), making the script useless. Fortunately
for the Web Master, there are several other ways to count hits
on a Web page.
<H3><A NAME="HTTPStatusCodes">
HTTP Status Codes </A></H3>
<P>
Very few Web users have not yet encountered HTTP status codes.
There may be nothing quite as frustrating as requesting an HTML
document and receiving, instead, the message "Forbidden, access
not granted" or a similar one-line response. Such responses are
among the many HTTP status codes issued with every request made
of a Web server. Table 14.1 outlines the different HTTP status codes
and what they mean.<BR>
<P>
<CENTER><B>Table 14.1 HTTP status codes</B></CENTER>
<P>
<CENTER>
<TABLE BORDERCOLOR=#000000 BORDER=1 WIDTH=80%>
<TR VALIGN=TOP><TD WIDTH=144><CENTER><B>HTTP Status Code</B></CENTER></TD>
<TD WIDTH=144><CENTER><B>Code Type</B></CENTER></TD><TD WIDTH=288><CENTER><B>Meaning </B></CENTER>
</TD></TR>
<TR VALIGN=TOP><TD WIDTH=144><CENTER>200</CENTER></TD><TD WIDTH=144>Successful request
</TD><TD WIDTH=288>OK-The request was satisfied.</TD></TR>
<TR VALIGN=TOP><TD WIDTH=144><CENTER>201</CENTER></TD><TD WIDTH=144>Successful request
</TD><TD WIDTH=288>Created-Following a POST command, the request resulted in new information being created.</TD></TR>
<TR VALIGN=TOP><TD WIDTH=144><CENTER>202</CENTER></TD><TD WIDTH=144>Successful request
</TD><TD WIDTH=288>OK-request accepted for processing, but processing is not complete.
</TD></TR>
<TR VALIGN=TOP><TD WIDTH=144><CENTER>203</CENTER></TD><TD WIDTH=144>Successful request
</TD><TD WIDTH=288>OK-Partial information-the returned information is only partial.
</TD></TR>
<TR VALIGN=TOP><TD WIDTH=144><CENTER>204 </CENTER></TD><TD WIDTH=144>Successful request
</TD><TD WIDTH=288>OK-No Response-request received but no information exists to send back.
</TD></TR>
<TR VALIGN=TOP><TD WIDTH=144><CENTER>300</CENTER></TD><TD WIDTH=144>Redirection
</TD><TD WIDTH=288>Multiple Choices-The information requested is available at one or more locations.
</TD></TR>
<TR VALIGN=TOP><TD WIDTH=144><CENTER>301</CENTER></TD><TD WIDTH=144>Redirection
</TD><TD WIDTH=288>Moved-The information requested is in a new location and the change is permanent.
</TD></TR>
<TR VALIGN=TOP><TD WIDTH=144><CENTER>302</CENTER></TD><TD WIDTH=144>Redirection
</TD><TD WIDTH=288>Found-The information requested temporarily has a different URL.
</TD></TR>
<TR VALIGN=TOP><TD WIDTH=144><CENTER>303</CENTER></TD><TD WIDTH=144>Redirection
</TD><TD WIDTH=288>Method-A suggestion for the client to try another location, possibly using a different method.
</TD></TR>
<TR VALIGN=TOP><TD WIDTH=144><CENTER>304</CENTER></TD><TD WIDTH=144>Redirection
</TD><TD WIDTH=288>Not Modified-The document has not changed since the date given in the client's conditional GET request, so the browser just needs to display the copy it already has.
</TD></TR>
<TR VALIGN=TOP><TD WIDTH=144><CENTER>400</CENTER></TD><TD WIDTH=144>Error with client
</TD><TD WIDTH=288>Bad Request-A syntax problem with the client's request, or the request could not be satisfied.
</TD></TR>
<TR VALIGN=TOP><TD WIDTH=144><CENTER>401</CENTER></TD><TD WIDTH=144>Error with client
</TD><TD WIDTH=288>Unauthorized-Client does not have authorization to access information requested.
</TD></TR>
<TR VALIGN=TOP><TD WIDTH=144><CENTER>402</CENTER></TD><TD WIDTH=144>Error with client
</TD><TD WIDTH=288>Payment Required-Reserved for use when the server requires payment before granting access.
</TD></TR>
<TR VALIGN=TOP><TD WIDTH=144><CENTER>403</CENTER></TD><TD WIDTH=144>Error with client
</TD><TD WIDTH=288>Forbidden-No access for client to information, even with proper authorization.
</TD></TR>
<TR VALIGN=TOP><TD WIDTH=144><CENTER>404</CENTER></TD><TD WIDTH=144>Error with client
</TD><TD WIDTH=288>Not Found-Server could not find file to satisfy client request.
</TD></TR>
<TR VALIGN=TOP><TD WIDTH=144><CENTER>405</CENTER></TD><TD WIDTH=144>Error with client
</TD><TD WIDTH=288>Method Not Allowed-The method used in the request line is not allowed for access to the information in the request URL.
</TD></TR>
<TR VALIGN=TOP><TD WIDTH=144><CENTER>406</CENTER></TD><TD WIDTH=144>Error with client
</TD><TD WIDTH=288>None Acceptable-The information requested has been found, but not within the conditions stated in the request, monitored by the Accept and Accept-Encoding request headers.
</TD></TR>
<TR VALIGN=TOP><TD WIDTH=144><CENTER>407</CENTER></TD><TD WIDTH=144>Error with client
</TD><TD WIDTH=288>Proxy Authentication Required-Not part of HTTP 1.0; defined in the draft HTTP 1.1 specification to indicate that the client must first authenticate itself with the proxy before continuing.
</TD></TR>
<TR VALIGN=TOP><TD WIDTH=144><CENTER>409</CENTER></TD><TD WIDTH=144>Error with client
</TD><TD WIDTH=288>Conflict-There is a conflict with the information requested in its current state, preventing access.
</TD></TR>
<TR VALIGN=TOP><TD WIDTH=144><CENTER>410</CENTER></TD><TD WIDTH=144>Error with client
</TD><TD WIDTH=288>Gone-Information requested by client is no longer available, with no forwarding URL.
</TD></TR>
<TR VALIGN=TOP><TD WIDTH=144><CENTER>411</CENTER></TD><TD WIDTH=144>Error with client
</TD><TD WIDTH=288>Authorization Refused-The credentials in the client request are not satisfactory to allow access to requested information.
</TD></TR>
<TR VALIGN=TOP><TD WIDTH=144><CENTER>500</CENTER></TD><TD WIDTH=144>Error with server
</TD><TD WIDTH=288>Internal Server Error-An unexpected condition has caused the server to be unable to satisfy the client's request.
</TD></TR>
<TR VALIGN=TOP><TD WIDTH=144><CENTER>501</CENTER></TD><TD WIDTH=144>Error with server
</TD><TD WIDTH=288>Not Implemented-The client's request includes facilities not currently supported by the server.
</TD></TR>
<TR VALIGN=TOP><TD WIDTH=144><CENTER>502</CENTER></TD><TD WIDTH=144>Error with server
</TD><TD WIDTH=288>Bad Gateway-Upstream gateway access necessary for completing request denied or failed.
</TD></TR>
</TABLE></CENTER>
<P>
<P>
Understanding these status codes is critical if you want to keep
track of what is happening with the Web sites on your server.
Although the EMWAC HTTP service log files that the Goo Goo Records
site uses do not record these status codes, the log files of other
servers, such as the NCSA HTTP server's access_log, do.
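<P>
On servers whose logs do record status codes, a short script can show how often each code, and each class of code, occurs. This sketch assumes the NCSA common log format, in which the three-digit status code follows the quoted request line; the <TT>tally_status</TT> name and the sample entries are made up for illustration.
<BLOCKQUOTE>
<PRE>
use strict;
use warnings;

# Tally status codes, and their classes (2xx, 3xx, 4xx, 5xx), in
# common-log-format lines, where the code follows the quoted request.
sub tally_status {
    my (%by_code, %by_class);
    foreach my $line (@_) {
        next unless $line =~ /"[^"]*" (\d{3}) /;
        $by_code{$1}++;
        $by_class{ substr($1, 0, 1) . 'xx' }++;
    }
    return (\%by_code, \%by_class);
}

my @log = (
    'client.example.com - - [10/May/1996:10:08:44 +0200] "GET / HTTP/1.0" 200 1534',
    'client.example.com - - [10/May/1996:10:08:45 +0200] "GET /missing.htm HTTP/1.0" 404 0',
    'client.example.com - - [10/May/1996:10:09:02 +0200] "GET /private/ HTTP/1.0" 403 0',
);
my ($codes, $classes) = tally_status(@log);
print "$_: $codes->{$_}\n"   foreach sort keys %$codes;
print "$_: $classes->{$_}\n" foreach sort keys %$classes;
</PRE>
</BLOCKQUOTE>
<P>
A rising count of 4xx codes, for instance, often points to broken links or missing files worth checking against Table 14.1.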
<H2><A NAME="TrackingandEnvironmentalVariables"><FONT SIZE=5 COLOR=#FF0000>
Tracking and Environmental Variables</FONT></A></H2>
<P>
Each of the Perl scripts we have used so far includes a standard
bit of code to parse the form data, and the script then acts on
decisions made from that parsed data. Form data, however, is not
the only information we can glean about a user through the server.
In any Perl script called by an HTML document, we can also base
decisions on the special environment variables the server provides.
<P>
Environment variables are available through the %ENV hash and
are easy to use. For example, if you wanted to track how many
users from the googoo.com domain have used your Perl scripts,
you could add the following snippet of code to each Perl script:
<BLOCKQUOTE>
<PRE>
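if ($ENV{'REMOTE_HOST'} =~ /googoo\.com$/i) {
    # Forward slashes avoid "\l" and "\s" being read as escapes
    $file = 'c:/logs/scripts.trk';
    if (open(TRACK, $file)) {        # the file won't exist the first time
        $line = &lt;TRACK&gt;;
        close(TRACK);
    }
    $line++;
    open(TRACK, ">$file") or die "Cannot write $file: $!";
    print TRACK $line;
    close(TRACK);
}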
</PRE>
</BLOCKQUOTE>
<P>
This code increments the number stored in scripts.trk each time
the script runs, but only if the client is connecting from within
googoo.com. This could be useful for Web sites that deliver certain
pages only to internal users, or for tracking which users are inside
your company and which are outside it.
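<P>
One weakness of a counter like this is that two copies of a script running at the same moment can both read the same number, so one hit is lost. The sketch below, which assumes a Perl build whose <TT>flock</TT> works on your system, holds an exclusive lock while it reads and rewrites the counter file. The <TT>bump_counter</TT> name and the counter path are hypothetical.
<BLOCKQUOTE>
<PRE>
use strict;
use warnings;
use Fcntl qw(:DEFAULT :flock);

# Add one to the number stored in $file and return the new value.  The
# exclusive lock keeps two scripts running at once from both reading the
# old value and losing a hit.
sub bump_counter {
    my ($file) = @_;
    sysopen(my $fh, $file, O_RDWR | O_CREAT) or die "Cannot open $file: $!";
    flock($fh, LOCK_EX)                      or die "Cannot lock $file: $!";
    my $line  = readline($fh);               # undef if the file is new
    my $count = (defined $line and $line =~ /^(\d+)/) ? $1 : 0;
    $count++;
    seek($fh, 0, 0)  or die "Cannot rewind $file: $!";
    truncate($fh, 0) or die "Cannot truncate $file: $!";
    print $fh $count;
    close($fh)       or die "Cannot close $file: $!";   # also releases the lock
    return $count;
}

# Count only visitors whose host name is googoo.com or ends in .googoo.com.
if (defined $ENV{'REMOTE_HOST'} and $ENV{'REMOTE_HOST'} =~ /(^|\.)googoo\.com$/i) {
    bump_counter('c:/logs/scripts.trk');
}
</PRE>
</BLOCKQUOTE>
<P>
Anchoring the pattern at the end of the host name means a visitor from a host such as notgoogoo.com is not counted as internal.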
<P>
In addition to the environmental variables already present on
the NT Server, and any you may have added, the EMWAC HTTP service
provides the environmental variables listed in Table 14.2.<BR>
<P>
<CENTER><B>Table 14.2 Environmental variables</B></CENTER>
<P>
<CENTER>
<TABLE BORDERCOLOR=#000000 BORDER=1 WIDTH=80%>
<TR VALIGN=TOP><TD WIDTH=192><CENTER><B>Environmental Variable</B></CENTER>
</TD><TD WIDTH=384><CENTER><B>Description</B></CENTER></TD></TR>
<TR VALIGN=TOP><TD WIDTH=192>CONTENT_LENGTH</TD><TD WIDTH=384>The length of the content as received from the client.
</TD></TR>
<TR VALIGN=TOP><TD WIDTH=192>CONTENT_TYPE</TD><TD WIDTH=384>The content type of data attached to the request, as with "POST" requests.
</TD></TR>
<TR VALIGN=TOP><TD WIDTH=192>GATEWAY_INTERFACE</TD><TD WIDTH=384>The CGI specification revision for the server, in the format of CGI/revision.
</TD></TR>
<TR VALIGN=TOP><TD WIDTH=192>HTTP_ACCEPT</TD><TD WIDTH=384>The list of MIME types the client will accept, as sent in the Accept header of its request.
</TD></TR>
<TR VALIGN=TOP><TD WIDTH=192>PATH_INFO</TD><TD WIDTH=384>Extra path information that follows the script name in the URL the client requested.
</TD></TR>
<TR VALIGN=TOP><TD WIDTH=192>QUERY_STRING</TD><TD WIDTH=384>All the information that follows the "?" in the URL when the script specified was accessed using "GET."
</TD></TR>
<TR VALIGN=TOP><TD WIDTH=192>REMOTE_ADDR</TD><TD WIDTH=384>The client's IP address.
</TD></TR>
<TR VALIGN=TOP><TD WIDTH=192>REQUEST_METHOD</TD><TD WIDTH=384>The method of request made by the client, such as "GET," "POST," and so forth.
</TD></TR>
<TR VALIGN=TOP><TD WIDTH=192>SCRIPT_NAME</TD><TD WIDTH=384>Path name of the script requested to execute.