statistics in real time every time the user accesses the on-line
summary page is infeasible on most computer systems. The best
solution is to have these summaries created in the background
on the Web server on a regular basis, so users always get a reasonably
current set of information and don't have to wait several
minutes while the program processes the access log file. There's a UNIX
program called <I>crontab</I> that lets you schedule events
(such as the execution of your program) to run in the background. Here's
how it works. First, you need to ensure that you (and not the
Web server process) have access to crontab; contact your UNIX administrator
to let him or her know of your requirement.
<P>
<CENTER><TABLE BORDERCOLOR=#000000 BORDER=1 WIDTH=80%>
<TR><TD><B>Caution</B></TD></TR>
<TR><TD>
<BLOCKQUOTE>
In general, the Web server process should have access to exactly what it needs access to, nothing more and nothing less. Remember that if a rogue user gains control of the Web server process (via a false crontab file or some other means), then he or she
would be able to execute privileged commands with total anonymity, which is never a good situation on a computer system.</BLOCKQUOTE>
</TD></TR>
</TABLE></CENTER>
<P>
<P>
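After you've set up your crontab access, you should edit your
crontab file and add a line similar to the following:
<BLOCKQUOTE>
<TT><FONT FACE="Courier">0 06 * * * /usr/home/big/anadas/cgiunleashed/auto-make</FONT></TT>
</BLOCKQUOTE>
<P>
The first five fields give the minute, hour, day of the month, month,
and day of the week, so this entry runs the program once a day at
6:00 a.m. (With an asterisk in the minute field, it would instead run
every minute from 6:00 to 6:59.) You should read your system's man page
for crontab to ensure that you have your crontab file set up correctly.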
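<P>
The first two fields of a crontab entry are the minute and the hour,
and the choice of minute field decides how often the job starts. The
following is a minimal sketch of that matching rule, not cron's actual
implementation: the function names <TT><FONT FACE="Courier">field_matches</FONT></TT>
and <TT><FONT FACE="Courier">runs_per_day</FONT></TT> are made up for
illustration, and only the two simplest field forms (an asterisk or a
single number) are handled.

```c
#include <stdlib.h>
#include <string.h>

/* Returns 1 if a crontab field matches a value. This sketch handles
   only "*" (any value) and a single number; real cron also accepts
   lists, ranges, and step values. */
static int field_matches(const char *field, int value)
{
    if (strcmp(field, "*") == 0)
        return 1;
    return atoi(field) == value;
}

/* Counts how many minutes of one day match the given minute and hour
   fields, which is how many times cron would start the job that day. */
int runs_per_day(const char *minute_field, const char *hour_field)
{
    int hour, minute, runs = 0;

    for (hour = 0; hour < 24; hour++)
        for (minute = 0; minute < 60; minute++)
            if (field_matches(minute_field, minute) &&
                field_matches(hour_field, hour))
                runs++;
    return runs;
}
```

<P>
Under these assumptions, <TT><FONT FACE="Courier">runs_per_day("0", "06")</FONT></TT>
counts a single run at 6:00, whereas <TT><FONT FACE="Courier">runs_per_day("*", "06")</FONT></TT>
counts one run for every minute of the 6 o'clock hour.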
<P>
Now that you've got crontab set up, you'll need to have an access
log summary program that produces a Web-viewable summary.
<H2><A NAME="EnvironmentVariables"><FONT SIZE=5 COLOR=#FF0000>Environment
Variables</FONT></A></H2>
<P>
The Web server's access log feature functions by recording information
about the user who is visiting your server, which is sent from
the user's own browser. While the information the access log records
is very useful, it is by no means an exhaustive account of everything
the browser "tells" the Web server about itself and
the user.
<P>
Let's take a look at the output of the environment variables program
first used in <A HREF="ch12.htm" >Chapter 12</A>, "Imagemaps"
(the program is available on-line at <TT><FONT FACE="Courier"><A HREF="http://www.anadas.com/cgiunleashed/imagemaps/exe/showenv.cgi">http://www.anadas.com/cgiunleashed/imagemaps/exe/showenv.cgi</A></FONT></TT>):
<BLOCKQUOTE>
<TT><FONT FACE="Courier">SERVER_SOFTWARE=NCSA/1.5<BR>
GATEWAY_INTERFACE=CGI/1.1<BR>
DOCUMENT_ROOT=/usr/home/big/anadas<BR>
REMOTE_ADDR=199.45.70.220<BR>
SERVER_PROTOCOL=HTTP/1.0<BR>
REQUEST_METHOD=GET<BR>
REMOTE_HOST=tc220.wwdc.com<BR>
QUERY_STRING=<BR>
HTTP_USER_AGENT=Mozilla/3.0b5a (Win95; I)<BR>
PATH=/sbin:/usr/sbin:/bin:/usr/bin:/usr/local/bin:/usr/contrib/bin:/usr/X11/bin
<BR>
HTTP_CONNECTION=Keep-Alive<BR>
HTTP_ACCEPT=image/gif, image/x-xbitmap, image/jpeg, image/pjpeg,
*/*<BR>
SCRIPT_NAME=/cgiunleashed/imagemaps/exe/showenv.cgi<BR>
SERVER_NAME=www.anadas.com<BR>
SERVER_PORT=80<BR>
HTTP_HOST=www.anadas.com<BR>
SERVER_ADMIN=shuman@anadas.com</FONT></TT>
</BLOCKQUOTE>
<P>
This is the complete set of environment variables that the Web
server on this particular server passed to the script when a particular
user accessed it. Many of these variables
are derived from the request the browser sent, and the Web server
hands them to the script through the CGI interface.
Note, however, that some of the variables are set entirely on
the Web server's end, for the benefit of CGI programs that need
additional information about their environment. So what
do these environment variables mean?
<BLOCKQUOTE>
<TT><FONT FACE="Courier">SERVER_SOFTWARE</FONT></TT>: This indicates
the actual Web server software, which in this case is NCSA httpd
version 1.5.
</BLOCKQUOTE>
<BLOCKQUOTE>
<TT><FONT FACE="Courier">GATEWAY_INTERFACE</FONT></TT>: This is
the level of CGI compatibility supported by the server, which
in this case is 1.1.
</BLOCKQUOTE>
<BLOCKQUOTE>
<TT><FONT FACE="Courier">DOCUMENT_ROOT</FONT></TT>: This is also
a server-set environment variable. It indicates the directory on the
server's file system that holds the documents at the root of the Web site (<TT><FONT FACE="Courier"><A HREF="http://www.anadas.com">http://www.anadas.com</A></FONT></TT>).
</BLOCKQUOTE>
<BLOCKQUOTE>
<TT><FONT FACE="Courier">REMOTE_ADDR</FONT></TT>: This environment
variable is set by the Web server from the incoming network connection
and indicates the IP address of the machine making the request.
</BLOCKQUOTE>
<BLOCKQUOTE>
<TT><FONT FACE="Courier">SERVER_PROTOCOL</FONT></TT>: This environment
variable is set by the Web server and indicates the name and version
of the protocol the request arrived with, in this case HTTP/1.0.
</BLOCKQUOTE>
<BLOCKQUOTE>
<TT><FONT FACE="Courier">REQUEST_METHOD</FONT></TT>: This environment
variable reflects the method the browser used in the request
it sent to the Web server. Normal document and file retrievals
are classified as <TT><FONT FACE="Courier">GET</FONT></TT> queries.
</BLOCKQUOTE>
<BLOCKQUOTE>
<TT><FONT FACE="Courier">REMOTE_HOST</FONT></TT>: This environment
variable is set by the Web server and indicates the hostname associated
with the client's IP address, if the server was able to look it up.
</BLOCKQUOTE>
<BLOCKQUOTE>
<TT><FONT FACE="Courier">QUERY_STRING</FONT></TT>: This environment
variable is set according to the information that is passed by
the query. In the case of a <TT><FONT FACE="Courier">GET</FONT></TT>
query, the query string consists of whatever information is after
the question mark (<TT><FONT FACE="Courier">?</FONT></TT>) in
the URL.
</BLOCKQUOTE>
<BLOCKQUOTE>
<TT><FONT FACE="Courier">HTTP_USER_AGENT</FONT></TT>: This environment
variable allows the browser to tell the server what its product
name and version number are.
</BLOCKQUOTE>
<BLOCKQUOTE>
<TT><FONT FACE="Courier">PATH</FONT></TT>: Every UNIX user has
a path associated with his or her login, and the Web server process
is no exception.
</BLOCKQUOTE>
<BLOCKQUOTE>
<TT><FONT FACE="Courier">HTTP_CONNECTION</FONT></TT>: This environment
variable is set by the Web browser to tell the server whether
or not it supports a keep-alive connection.
</BLOCKQUOTE>
<BLOCKQUOTE>
<TT><FONT FACE="Courier">HTTP_ACCEPT</FONT></TT>: This environment
variable allows the Web browser to tell the Web server the different
data formats it accepts in-line (plug-ins not included).
</BLOCKQUOTE>
<BLOCKQUOTE>
<TT><FONT FACE="Courier">SCRIPT_NAME</FONT></TT>: This environment
variable is set by the Web server and identifies the script that
is being run.
</BLOCKQUOTE>
<BLOCKQUOTE>
<TT><FONT FACE="Courier">SERVER_NAME</FONT></TT>: This environment
variable is set by the Web server and identifies the Web server's
hostname.
</BLOCKQUOTE>
<BLOCKQUOTE>
<TT><FONT FACE="Courier">SERVER_PORT</FONT></TT>: This environment
variable is set by the Web server and identifies the port number
on which the server is listening for connections.
</BLOCKQUOTE>
<BLOCKQUOTE>
<TT><FONT FACE="Courier">HTTP_HOST</FONT></TT>: This environment
variable holds the hostname the browser asked for in its request,
which is normally the Web server's hostname.
</BLOCKQUOTE>
<BLOCKQUOTE>
<TT><FONT FACE="Courier">SERVER_ADMIN</FONT></TT>: This environment
variable, set by the Web server, indicates the e-mail address
of the Web server administrator.
</BLOCKQUOTE>
<BLOCKQUOTE>
<TT><FONT FACE="Courier">AUTH_TYPE</FONT></TT>: If the server
supports user authentication, and the script is protected, this
is the protocol-specific authentication method used to validate
the user.
</BLOCKQUOTE>
<BLOCKQUOTE>
<TT><FONT FACE="Courier">REMOTE_USER</FONT></TT>: If the server
supports user authentication, and the script is protected, this
is the username the user has authenticated as.
</BLOCKQUOTE>
<BLOCKQUOTE>
<TT><FONT FACE="Courier">REMOTE_IDENT</FONT></TT>: If the HTTP
server supports RFC 931 identification, this variable will be
set to the remote username retrieved from the identification server
on the remote host.
</BLOCKQUOTE>
<BLOCKQUOTE>
<TT><FONT FACE="Courier">DOCUMENT_NAME</FONT></TT>: The current
filename.
</BLOCKQUOTE>
<BLOCKQUOTE>
<TT><FONT FACE="Courier">DOCUMENT_URI</FONT></TT>: The virtual
path to the document.
</BLOCKQUOTE>
<BLOCKQUOTE>
<TT><FONT FACE="Courier">QUERY_STRING_UNESCAPED</FONT></TT>: The
unescaped version of any search query the client sent, with all
shell-special characters escaped with <TT><FONT FACE="Courier">\</FONT></TT>.
</BLOCKQUOTE>
<BLOCKQUOTE>
<TT><FONT FACE="Courier">DATE_LOCAL</FONT></TT>: The current date
and time in the local time zone. Subject to the <TT><FONT FACE="Courier">timefmt</FONT></TT>
parameter to the server-side include <TT><FONT FACE="Courier">config</FONT></TT> command.
</BLOCKQUOTE>
<BLOCKQUOTE>
<TT><FONT FACE="Courier">DATE_GMT</FONT></TT>: Same as <TT><FONT FACE="Courier">DATE_LOCAL</FONT></TT>
but in Greenwich Mean Time.
</BLOCKQUOTE>
<BLOCKQUOTE>
<TT><FONT FACE="Courier">LAST_MODIFIED</FONT></TT>: The last modification
date of the current document. Subject to <TT><FONT FACE="Courier">timefmt</FONT></TT>
like the others.
</BLOCKQUOTE>
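<P>
In a C program, each of the variables described above is read with the
standard library function <TT><FONT FACE="Courier">getenv()</FONT></TT>,
which returns <TT><FONT FACE="Courier">NULL</FONT></TT> when a variable
is not set. The following is a minimal sketch of reading a few of them
defensively; the helper names <TT><FONT FACE="Courier">value_or</FONT></TT>,
<TT><FONT FACE="Courier">client_name</FONT></TT>, and
<TT><FONT FACE="Courier">print_request_summary</FONT></TT> are
illustrative assumptions and do not come from the book's listings.

```c
#include <stdio.h>
#include <stdlib.h>

/* Returns value if it is a non-empty string, otherwise fallback. */
const char *value_or(const char *value, const char *fallback)
{
    return (value != NULL && value[0] != '\0') ? value : fallback;
}

/* Prefers the resolved hostname, then the IP address, then "unknown".
   Either argument may be NULL, as returned by getenv(). */
const char *client_name(const char *remote_host, const char *remote_addr)
{
    return value_or(remote_host, value_or(remote_addr, "unknown"));
}

/* Prints one line describing the current request, using the variables
   the Web server set for this CGI process. */
void print_request_summary(FILE *out)
{
    fprintf(out, "client=%s method=%s query=%s\n",
            client_name(getenv("REMOTE_HOST"), getenv("REMOTE_ADDR")),
            value_or(getenv("REQUEST_METHOD"), "(not set)"),
            value_or(getenv("QUERY_STRING"), "(empty)"));
}
```

<P>
Falling back from <TT><FONT FACE="Courier">REMOTE_HOST</FONT></TT> to
<TT><FONT FACE="Courier">REMOTE_ADDR</FONT></TT> is a design choice for
this sketch: the hostname is easier to read in a report, but it is only
available when a lookup succeeded.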
<P>
Note that not all of these variables appear in the sample output.
This is because different server and browser combinations create
different environment variables. Netscape Navigator, Microsoft
Internet Explorer, and many other Web browsers each put their
own spin on environment variables, and either provide more environment
variables or send richer information in the aforementioned variables.
For example, Internet Explorer sends the current screen resolution
in the browser-type environment variable. This allows dynamically
generated Web pages to optimize their appearance for a particular
screen size.
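<P>
Because browsers differ in what they send, a program that wants to act
on the browser type has to pick the value apart itself. The following is
a rough sketch that splits the leading product/version token from a
string such as the <TT><FONT FACE="Courier">HTTP_USER_AGENT</FONT></TT>
value in the sample output. The function name
<TT><FONT FACE="Courier">parse_user_agent</FONT></TT> is hypothetical,
and the sketch assumes the value begins with a token of the form
<TT><FONT FACE="Courier">name/version</FONT></TT>; anything after the
first space is ignored.

```c
#include <stddef.h>
#include <string.h>

/* Splits the leading "product/version" token of a User-Agent string
   into two caller-supplied buffers. Returns 1 on success, or 0 if the
   string has no such token or a buffer is too small. */
int parse_user_agent(const char *agent,
                     char *product, size_t product_size,
                     char *version, size_t version_size)
{
    size_t product_len, version_len;
    const char *version_start;

    if (agent == NULL)
        return 0;

    /* The product name runs up to the first slash; reaching a space or
       the end of the string first means there is no version token. */
    product_len = strcspn(agent, "/ ");
    if (product_len == 0 || agent[product_len] != '/')
        return 0;

    /* The version runs from just after the slash to the next space. */
    version_start = agent + product_len + 1;
    version_len = strcspn(version_start, " ");

    if (product_len >= product_size || version_len >= version_size)
        return 0;   /* the caller's buffers are too small */

    memcpy(product, agent, product_len);
    product[product_len] = '\0';
    memcpy(version, version_start, version_len);
    version[version_len] = '\0';
    return 1;
}
```

<P>
A CGI program would typically pass
<TT><FONT FACE="Courier">getenv("HTTP_USER_AGENT")</FONT></TT> as the
first argument and treat a return value of 0 as "browser unknown".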
<P>
<CENTER><TABLE BORDERCOLOR=#000000 BORDER=1 WIDTH=80%>
<TR><TD><B>Can I Get E-Mail Addresses?</B></TD></TR>
<TR><TD>
<BLOCKQUOTE>
One of the questions most often puzzled over by CGI programmers is whether or not they can obtain a user's e-mail address. Creators of browser software are very sensitive to this issue, and the answer is, in most cases, no. A few browsers, however, do
pass along this information, at least to some extent.</BLOCKQUOTE>
<BLOCKQUOTE>
Some browsers that return full e-mail address information are:</BLOCKQUOTE>
<UL>
<LI><FONT COLOR=#000000>NCSA Mosaic for Macintosh 2.0a17</FONT>
<LI><FONT COLOR=#000000>NCSA Mosaic for Macintosh 2.0a8</FONT>
<LI><FONT COLOR=#000000>MCom Netscape 0.9 beta (X, Mac, Windows)</FONT>
</UL>
<BLOCKQUOTE>
A browser that returns the username is:</BLOCKQUOTE>
<UL>
<LI><FONT COLOR=#000000>MCom Netscape 0.9 beta (X only)</FONT>
</UL>
</TD></TR>
</TABLE></CENTER>
<P>
<P>
The method by which environment variables are extracted in C is
presented in Listing 21.4, which is essentially the C version
of the showenv.cgi program.
<HR>
<BLOCKQUOTE>
<B>Listing 21.4. Source code for the Web server environment variable
printer.<BR>
</B>
</BLOCKQUOTE>
<BLOCKQUOTE>
<TT><FONT FACE="Courier">// getenv.cpp -- Web Server Environment
Variable Printer<BR>
// Available on-line at http://www.anadas.com/cgiunleashed/trackuser/
<BR>
//<BR>
// This program displays all of the environment variables available
to the<BR>
// web server when a user accesses this program via the CGI interface
<BR>
//<BR>
// By Shuman Ghosemajumder, Anadas Software Development<BR>
<BR>
#include <stdio.h><BR>
<BR>
int main(int argc, char *argv[], char *env[]);<BR>
<BR>
int main(int argc, char *argv[], char *env[])<BR>
{<BR>
int count;<BR>
<BR>
printf("Content-type: text/html\n\n");
<BR>
<BR>
printf("<HTML><TITLE>Environment
Variables</TITLE><BODY>\n");<BR>
<BR>
printf("<H1>Web Server Environment
Variables</H1><ul>\n");<BR>
<BR>
// step through the NULL-terminated environment array<BR>
for(count=0;env[count];count++)<BR>
{ printf("<B>Var
%d.</B> %s<BR>\n", count, env[count] );<BR>
}<BR>
<BR>
printf("</ul></BODY></HTML>\n");
<BR>
<BR>
return(0); // exit gracefully
<BR>
}</FONT></TT>
</BLOCKQUOTE>
<HR>
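<P>
Several of the values printed by a program like this come from the
browser's request (the query string, for example) and can contain
characters such as <TT><FONT FACE="Courier">&lt;</FONT></TT> or
<TT><FONT FACE="Courier">&amp;</FONT></TT> that are special in HTML.
The following is a sketch of one way to escape a value before printing
it; it is an addition to the listing, not part of it, and the function
name <TT><FONT FACE="Courier">html_escape</FONT></TT> is an assumption
made for this example.

```c
#include <stddef.h>
#include <string.h>

/* Copies src into dst, replacing the characters &, <, > and " with
   HTML entities. Output that would not fit is dropped at a character
   boundary. Returns the number of characters written, not counting
   the terminating NUL. */
size_t html_escape(char *dst, size_t dst_size, const char *src)
{
    size_t used = 0;

    if (dst_size == 0)
        return 0;

    for (; *src != '\0'; src++) {
        char single[2];
        const char *piece;
        size_t len;

        switch (*src) {
        case '&': piece = "&amp;";  break;
        case '<': piece = "&lt;";   break;
        case '>': piece = "&gt;";   break;
        case '"': piece = "&quot;"; break;
        default:
            single[0] = *src;      /* ordinary character: copy as-is */
            single[1] = '\0';
            piece = single;
            break;
        }

        len = strlen(piece);
        if (used + len >= dst_size)
            break;                 /* stop before overflowing dst */
        memcpy(dst + used, piece, len);
        used += len;
    }
    dst[used] = '\0';
    return used;
}
```

<P>
In the listing's loop, each <TT><FONT FACE="Courier">env[count]</FONT></TT>
string could be passed through such a function into a local buffer, and
the buffer printed in its place.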
<H2><FONT SIZE=5 COLOR=#FF0000><A Name="CreatingaPseudoAccessLogFile">Creating a Pseudo Access Log File </A></FONT>
</H2>
<P>
Having the ability to parse ready-made server access logs is wonderful,
but what if you don't have access to those logs? As long as you
can execute CGI scripts, you can create your own logs dynamically.
Listing 21.5 is an example of a program that generates a "Pseudo
Access Log File" every time it is loaded. This program creates
a log file similar to the server log files, but with richer information.
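<P>
Before looking at the full program, the core idea can be sketched in a
few lines: build one line of text from the environment variables and
append it to a file. The sketch below is not Listing 21.5. The
line layout, the function names
<TT><FONT FACE="Courier">format_log_line</FONT></TT> and
<TT><FONT FACE="Courier">append_log_line</FONT></TT>, and the use of a
dash for a missing field are all assumptions made for illustration.

```c
#include <stdio.h>

/* Returns the field, or "-" when it is missing or empty. */
static const char *dash_if_missing(const char *field)
{
    return (field != NULL && field[0] != '\0') ? field : "-";
}

/* Builds one log line of the form
       host [timestamp] "METHOD script" "agent"
   Returns 1 if the whole line fit in dst, 0 if it was truncated. */
int format_log_line(char *dst, size_t dst_size,
                    const char *host, const char *timestamp,
                    const char *method, const char *script,
                    const char *agent)
{
    int n = snprintf(dst, dst_size, "%s [%s] \"%s %s\" \"%s\"",
                     dash_if_missing(host), dash_if_missing(timestamp),
                     dash_if_missing(method), dash_if_missing(script),
                     dash_if_missing(agent));
    return n >= 0 && (size_t)n < dst_size;
}

/* Appends one line to the log file, creating the file if necessary.
   Returns 0 on success and -1 on failure. */
int append_log_line(const char *path, const char *line)
{
    FILE *log = fopen(path, "a");

    if (log == NULL)
        return -1;
    fprintf(log, "%s\n", line);
    return fclose(log) == 0 ? 0 : -1;
}
```

<P>
A CGI program would fill in the arguments from
<TT><FONT FACE="Courier">getenv()</FONT></TT> calls such as
<TT><FONT FACE="Courier">getenv("REMOTE_HOST")</FONT></TT> and
<TT><FONT FACE="Courier">getenv("HTTP_USER_AGENT")</FONT></TT>. The
sketch does no file locking, so lines written by two requests at the
same moment could interleave.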
<HR>
<BLOCKQUOTE>
<B>Listing 21.5. Source code for the make log program.<BR>
</B>
</BLOCKQUOTE>
<BLOCKQUOTE>
<TT><FONT FACE="Courier">// makelog.cpp -- MAKE LOG PROGRAM<BR>
// Available on-line at http://www.anadas.com/cgiunleashed/trackuser/