<H4>Decoding <TT><FONT FACE="Courier">+</FONT></TT> and <TT><FONT FACE="Courier">%<I>hh</I></FONT></TT>
"URL-Encoding"</H4>
<P>
To allow arbitrary characters with special meanings like spaces,
ampersands, and equal signs to be passed as form data, any characters
that are likely to cause trouble are translated by the Web browser
to a safe alternative. Space characters (<TT><FONT FACE="Courier">
</FONT></TT>) are converted to plus signs (<TT><FONT FACE="Courier">+</FONT></TT>),
and other special characters such as ampersands (<TT><FONT FACE="Courier">&</FONT></TT>)
or percent characters (<TT><FONT FACE="Courier">%</FONT></TT>)
are replaced by a sequence
<BLOCKQUOTE>
<TT><FONT FACE="Courier">%hh</FONT></TT>
</BLOCKQUOTE>
<P>
where <TT><I><FONT FACE="Courier">hh</FONT></I></TT> is the hexadecimal
representation of the numeric code for the character replaced.
This encoding is the same as that used for URLs during HTTP transactions
and is hence referred to as "application/x-www-form-urlencoded."
Before the form data is passed to other procedures, this encoding
must be reversed. The Perl procedure in Listing 8.10 is an improvement
on the procedure in Listing 8.9, with code added to reverse the
URL-encoding.
<HR>
<BLOCKQUOTE>
<B>Listing 8.10. Separating and decoding Form data in Perl.<BR>
</B>
</BLOCKQUOTE>
<BLOCKQUOTE>
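<TT><FONT FACE="Courier">foreach (split("&amp;", $encodedQuery)) {<BR>
($name,$contents) = split("=", $_, 2); # Split at the first "=" only<BR>
$form{$name}=$contents;<BR>
$form{$name}=~s/\+/ /g; # Plus signs back to spaces<BR>
$form{$name}=~s/%([0-9A-Fa-f]{2})/pack("C",hex($1))/ge; # %hh back to the original character<BR>
}</FONT></TT>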
</BLOCKQUOTE>
<HR>
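<P>
As a rough illustration of the decoding step, the following sketch runs
the same two substitutions over a made-up query string. The sample input
and its field names are assumptions chosen for the example; a real handler
would take the encoded data from the Web server instead.

```perl
use strict;
use warnings;

# Hypothetical sample input; a real handler would read this from
# QUERY_STRING or from standard input.
my $encodedQuery = "name=Ann+Lee&note=50%25+off+%26+more";

my %form;
foreach my $pair (split(/&/, $encodedQuery)) {
    my ($name, $contents) = split(/=/, $pair, 2);
    $contents = "" unless defined $contents;
    $contents =~ s/\+/ /g;                                   # plus signs back to spaces
    $contents =~ s/%([0-9A-Fa-f]{2})/pack("C", hex($1))/ge;  # %hh back to one character
    $form{$name} = $contents;
}

print "$_: $form{$_}\n" foreach sort keys %form;
# name: Ann Lee
# note: 50% off & more
```

<P>
The plus signs are converted before the <TT><FONT FACE="Courier">%<I>hh</I></FONT></TT>
sequences so that a literal plus sign sent as <TT><FONT FACE="Courier">%2B</FONT></TT>
is not mistaken for a space.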
<H4>Basic Data Validation</H4>
<P>
It makes sense to check that the data passed from the form is
suitable for the intended purpose before continuing. The CGI program
could, for instance, output another HTML form re-prompting the
user for the information with an explanation of why the first
submission was not acceptable. Basic data checks could ensure
that a text box intended for a positive whole number contained
only digits, that a text box prompting for an Internet e-mail
address contained an at sign (<TT><FONT FACE="Courier">@</FONT></TT>)
and no spaces, or that a <TT><FONT FACE="Courier">SELECT</FONT></TT>
form component returned at least one option from the genuine list
of choices.
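<P>
The three checks just described might be sketched in Perl as follows.
The subroutine names, the sample field names, and the list of permitted
options are all assumptions made for the example, and the e-mail test is
only the crude check described above, not a full address validator.

```perl
use strict;
use warnings;

# Hypothetical helpers; adapt the rules to the form being handled.
sub is_whole_number {
    my ($text) = @_;
    return $text =~ /\A[0-9]+\z/ ? 1 : 0;      # one or more digits, nothing else
}

sub looks_like_email {
    my ($text) = @_;
    return ($text =~ /@/ && $text !~ /\s/) ? 1 : 0;   # an at sign and no spaces
}

sub is_listed_option {
    my ($value, @allowed) = @_;
    return scalar(grep { $_ eq $value } @allowed) ? 1 : 0;
}

# Hypothetical decoded form data.
my %form = (pages => "12", email => 'reader@example.com', opinion => "good");

my @problems;
push @problems, "pages must contain only digits"
    unless is_whole_number($form{pages});
push @problems, "email must contain an at sign and no spaces"
    unless looks_like_email($form{email});
push @problems, "opinion is not one of the listed choices"
    unless is_listed_option($form{opinion}, "good", "fair", "poor");

print @problems ? join("\n", @problems) . "\n" : "All fields look acceptable\n";
```

<P>
Collecting the problems in a list, rather than stopping at the first one,
makes it easier to re-prompt the user with a complete explanation.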
<P>
If the form data is to be passed to another application or to
library routines, the data should be stripped of any characters
that aren't strictly necessary. In particular, any characters
that have special meanings to interpreters should be carefully
handled. For instance, in a UNIX Bourne shell script, it is difficult
to manipulate an environment variable or pass a variable to another
program without re-evaluating the contents of the environment
variable. In an <TT><FONT FACE="Courier">&lt;ISINDEX&gt;</FONT></TT>
handler script, the command
<BLOCKQUOTE>
<TT><FONT FACE="Courier">QUERY_STRING=`/bin/env | /bin/grep '^QUERY_STRING=' | \<BR>
/bin/sed -e 's/QUERY_STRING=//' -e 's/[^A-Za-z ]//g'`</FONT></TT>
</BLOCKQUOTE>
<P>
will remove all characters other than letters and spaces from
the user input before it is reinterpreted.
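<P>
For comparison, a minimal Perl sketch of the same filtering is given
here. The sample input string is an assumption standing in for the
contents of <TT><FONT FACE="Courier">QUERY_STRING</FONT></TT>.

```perl
use strict;
use warnings;

# Hypothetical hostile input standing in for $ENV{"QUERY_STRING"}.
my $raw = 'cats; rm -rf / `id` $HOME';

# Copy the input, then delete everything except letters and spaces.
(my $safe = $raw) =~ s/[^A-Za-z ]//g;

print "$safe\n";
```

<P>
Because the substitution names the characters to keep rather than the
characters to reject, shell metacharacters that the programmer did not
think of are removed as well.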
<P>
Not only will this kind of "paranoia" remove a potential
logic flaw in the CGI form handler, it will also increase the
security of the Web server. For more discussion of these concerns,
please read <A HREF="ch9.htm" >Chapter 9</A>, "Security."
<H3><A NAME="ChoosingtheProgrammingLanguage">Choosing the Programming
Language</A></H3>
<P>
CGI is a platform-independent interface definition. The actual
choice of which programming language to use is left to the programmer.
Any programming language available on the Web server platform
that includes access to environment variables can be used for
writing CGI form handlers.
<H4>Pros and Cons</H4>
<P>
The class of programming tools characterized as high-level, interpreted,
"scripting" languages includes UNIX command shell languages,
DOS batch command files, Perl scripts, and Visual Basic programs.
These typically have the following benefits:
<UL>
<LI><FONT COLOR=#000000>Rapid prototyping and development</FONT>
<LI><FONT COLOR=#000000>Ease of maintenance</FONT>
<LI><FONT COLOR=#000000>Convenient complex data representation</FONT>
</UL>
<P>
but can introduce the following costs:
<UL>
<LI><FONT COLOR=#000000>Slow response</FONT>
<LI><FONT COLOR=#000000>Large memory requirements</FONT>
<LI><FONT COLOR=#000000>High code visibility</FONT>
</UL>
<P>
The last of these can be a problem if the CGI program is to be
widely distributed, or if the Web server can be fooled into delivering
the text of the CGI program to a Web browser, as any security
holes are more easily discovered by system crackers.
<P>
Programming tools that compile lower-level source code such as
Pascal, C, and C++ reverse the pattern of pros and cons. These
tend to provide
<UL>
<LI><FONT COLOR=#000000>Fast response</FONT>
<LI><FONT COLOR=#000000>Low memory impact</FONT>
<LI><FONT COLOR=#000000>Obscured implementation</FONT>
</UL>
<P>
but may entail
<UL>
<LI><FONT COLOR=#000000>Lengthy design, development, and testing</FONT>
<LI><FONT COLOR=#000000>Need for expert maintenance</FONT>
<LI><FONT COLOR=#000000>Adaptation of data representations to
primitive data storage types</FONT>
</UL>
<P>
It follows that a scripting language might be appropriate for a
short-term solution while a tool restricted to a single organization
is under development, whereas the investment of time in a compiled
language might be better justified for a simple but frequently used
general-purpose Web form handler.
<H3><A NAME="ASampleCGIFormHandlerProgram">A Sample CGI Form Handler
Program</A></H3>
<P>
Let's employ the techniques learned to write a CGI form handler
program, as shown in Listing 8.11, for the sample form given in
Listing 8.5. We will accept and acknowledge <TT><FONT FACE="Courier">POST</FONT></TT>ed
form data (which we assume is coming from the sample form) and
write it to a text file: <TT><FONT FACE="Courier">/var/adm/www/comments.log</FONT></TT>.
<HR>
<BLOCKQUOTE>
<B>Listing 8.11. A form handler for Visitor's book/Comments form:
comments.cgi.<BR>
</B>
</BLOCKQUOTE>
<BLOCKQUOTE>
<TT><FONT FACE="Courier">#!/usr/local/bin/perl<BR>
# Handle comment form submissions<BR>
# Form fields: surname,forename,title,opinion,pages,feedback<BR>
<BR>
$encodedQuery=""; # The encoded form data will be read into this string<BR>
$charsRemaining=102400; # This CGI program will truncate the encoded form data after 100Kbytes<BR>
# unless the Content-length: is specified.<BR>
$charsRemaining=$ENV{"CONTENT_LENGTH"} if $ENV{"CONTENT_LENGTH"};<BR>
read(STDIN, $encodedQuery, $charsRemaining); # Stops early if the input ends first<BR>
<BR>
foreach (split("&amp;", $encodedQuery)) {<BR>
($name,$contents) = split("=", $_, 2);<BR>
$form{$name}=$contents;<BR>
$form{$name}=~s/\+/ /g;<BR>
$form{$name}=~s/%([0-9A-Fa-f]{2})/pack("C",hex($1))/ge;<BR>
}<BR>
<BR>
print "Content-type: text/html\n\n"; # Generate HTTP header<BR>
print "&lt;HTML&gt;&lt;HEAD&gt;&lt;TITLE&gt;Thank-you&lt;/TITLE&gt;&lt;/HEAD&gt;\n";<BR>
print "&lt;BODY&gt;&lt;H1&gt;Thank-you&lt;/H1&gt;\n";<BR>
$safename=$form{"title"}." ".$form{"surname"};<BR>
$safename=~s/[^\w ]/ /g; # Excise any HTML special characters<BR>
$safepages=$form{"pages"};<BR>
$safepages=~s/[^\w ]/ /g; # Excise any HTML special characters<BR>
print "&lt;P&gt;Thank-you for submitting your comments on ".$safepages.", ".$safename."&lt;P&gt;\n";<BR>
print "&lt;HR&gt;\n";<BR>
print '&lt;P&gt;&lt;A HREF="/"&gt;Return to home page&lt;/A&gt;&lt;/P&gt;';<BR>
print "\n&lt;/BODY&gt;&lt;/HTML&gt;";<BR>
<BR>
if (open(LOGFILE, "&gt;&gt;/var/adm/www/comments.log")) {<BR>
foreach (keys %form) {<BR>
print LOGFILE $_.":\n".$form{$_}."\n";<BR>
}<BR>
close(LOGFILE);<BR>
}</FONT></TT>
</BLOCKQUOTE>
<HR>
<H3><A NAME="FormsBasedIntranetInternetClientSer">Forms-Based
Intranet/Internet Client/Server Applications</A></H3>
<P>
If your organization plans to use Web forms for major applications, whether
publicly available or restricted to a LAN, it would be well worth
developing and standardizing on a library of CGI and Web form
routines. Standard procedures can be designed not only to decode
form submissions, but also to generate forms "on-the-fly."
<H4>What Forms Can and Can't Do</H4>
<P>
Web forms do not provide the rich set of user interface objects
available in system-specific GUI toolkits. They do not provide
instant feedback or a high level of control over allowable input.
<P>
Web forms do, however, allow platform-independent development
of generic input clients for network applications. Web browsers
that support Web forms are available for the most popular client
platforms. In fact, your intended user probably already has a
forms-capable client on his or her desktop.
<H4>Automatically Generated Forms</H4>
<P>
Rather than designing a different Web form for every possible
situation, a programmer can design a CGI application to automatically
generate HTML forms by describing the data types to be prompted
for in a machine-readable representation, and choosing a template
HTML form tag that is appropriate to each data type.
<P>
Numbers can be prompted for using the <TT><FONT FACE="Courier">INPUT
TYPE=TEXT</FONT></TT> tag, Boolean choices using radio buttons
or <TT><FONT FACE="Courier">SELECT</FONT></TT> tags, and textual
data using the <TT><FONT FACE="Courier">TEXTAREA</FONT></TT> tag.
These can be automatically sized as needed using their respective
tag attributes. A truly object-oriented design would implement
a "Web forms" interface to all objects. This interface
would include methods to generate HTML form tags that prompt for
the contents of an object, methods to validate the contents of
the form submission, and methods to help and re-prompt the user
when the form data submitted is not suitable.
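<P>
A minimal sketch of this idea follows. The field descriptions, the
<TT><FONT FACE="Courier">form_tag</FONT></TT> subroutine, and the
<TT><FONT FACE="Courier">ACTION</FONT></TT> URL are all hypothetical;
the sketch shows only one way of mapping a data type to a template tag
and is not a complete forms library.

```perl
use strict;
use warnings;

# Hypothetical machine-readable description of the data to prompt for.
my @fields = (
    { name => "pages",    type => "number",  size => 4 },
    { name => "agree",    type => "boolean" },
    { name => "feedback", type => "text",    rows => 5, cols => 40 },
);

# Choose a template form tag that is appropriate to the field's data type.
sub form_tag {
    my ($field) = @_;
    my $name = $field->{name};
    my $type = $field->{type};
    if ($type eq "number") {
        return '<INPUT TYPE=TEXT NAME="' . $name . '" SIZE=' . $field->{size} . '>';
    }
    if ($type eq "boolean") {
        return '<INPUT TYPE=RADIO NAME="' . $name . '" VALUE="yes"> Yes '
             . '<INPUT TYPE=RADIO NAME="' . $name . '" VALUE="no"> No';
    }
    if ($type eq "text") {
        return '<TEXTAREA NAME="' . $name . '" ROWS=' . $field->{rows}
             . ' COLS=' . $field->{cols} . '></TEXTAREA>';
    }
    die "No template for data type '$type'\n";
}

# Assemble the whole form; the ACTION URL is a placeholder.
my $html = qq{<FORM ACTION="/cgi-bin/example.cgi" METHOD=POST>\n};
foreach my $field (@fields) {
    $html .= "<P>" . $field->{name} . ": " . form_tag($field) . "\n";
}
$html .= "<INPUT TYPE=SUBMIT>\n</FORM>\n";

print $html;
```

<P>
Adding a new data type then means adding one more branch to the
subroutine, rather than editing every form that uses it.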
<H4>Partially Prefilled Forms</H4>
<P>
Library routines that generate Web form tags should include the
capability to supply a default value to the user. The contents
of <TT><FONT FACE="Courier">TEXTAREA</FONT></TT> tags and the
<TT><FONT FACE="Courier">VALUE</FONT></TT>, <TT><FONT FACE="Courier">CHECKED</FONT></TT>,
and <TT><FONT FACE="Courier">SELECTED</FONT></TT> attributes described
previously provide several ways to supply default input. This
capability is not only a way to suggest appropriate responses,
it can also be used when the user is re-editing or changing existing
data.
<P>
<CENTER><TABLE BORDERCOLOR=#000000 BORDER=1 WIDTH=80%>
<TR><TD><B>Tip</B></TD></TR>
<TR><TD>
<BLOCKQUOTE>
Not all existing data can be offered as a default entry in all form tags. The <TT><FONT FACE="Courier">INPUT TYPE=TEXT</FONT></TT> tag will accommodate only default <TT><FONT FACE="Courier">VALUE</FONT></TT>s that can be expressed as an HTML attribute.
Quote characters (<TT><FONT FACE="Courier">"</FONT></TT>), greater-than characters (<TT><FONT FACE="Courier">></FONT></TT>), and line-breaks all cause problems if they are used as default text in <TT><FONT FACE="Courier">INPUT TYPE=TEXT</FONT></TT>
tags. Often, the <TT><FONT FACE="Courier">TEXTAREA</FONT></TT> tag comes to the rescue with its "container" syntax described previously. Also, if it is possible that existing data does not conform to the current set of options in a <TT><FONT
FACE="Courier">SELECT</FONT></TT> or similar form tag, an "Other" option accompanied by a text field can save the day.
</BLOCKQUOTE>
</TD></TR>
</TABLE></CENTER>
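<P>
One common way to reduce the quoting problems mentioned in the tip,
which the text above does not itself prescribe, is to replace the
troublesome characters with HTML character entities before writing the
default value into the tag. The following sketch assumes that approach;
the subroutine names and sample values are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: replace characters that would end an attribute
# value or a tag early with HTML character entities.
sub html_escape {
    my ($text) = @_;
    $text =~ s/&/&amp;/g;     # must come first, or later entities are re-escaped
    $text =~ s/"/&quot;/g;
    $text =~ s/</&lt;/g;
    $text =~ s/>/&gt;/g;
    return $text;
}

# A single-line text field with a default value.
sub text_input {
    my ($name, $default) = @_;
    $default =~ s/[\r\n]+/ /g;   # a one-line field cannot hold line breaks
    return '<INPUT TYPE=TEXT NAME="' . $name
         . '" VALUE="' . html_escape($default) . '">';
}

# A multi-line field; line breaks in the default text are preserved.
sub textarea {
    my ($name, $default) = @_;
    return '<TEXTAREA NAME="' . $name . '" ROWS=5 COLS=40>'
         . html_escape($default) . '</TEXTAREA>';
}

print text_input("title", 'The "Web" > print'), "\n";
print textarea("feedback", "Line one\nLine two"), "\n";
```

<P>
Line breaks still cannot survive in a single-line text field, so the
sketch replaces them with spaces there and leaves them intact in the
<TT><FONT FACE="Courier">TEXTAREA</FONT></TT> version.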
<P>
<H2><A NAME="FormsReadyReference"><FONT SIZE=5 COLOR=#FF0000>Forms
Ready Reference</FONT></A></H2>
<P>
The following is a summary of Web forms for reference:
<BLOCKQUOTE>
<TT><FONT FACE="Courier"><FORM ACTION="url" METHOD=reqtype
><BR>
<INP