📄 ch02_01.htm

📁 A good book on writing CGI in Perl. The book begins by explaining how CGI and the underlying HTTP protocol work
absolute URL is <em class="emphasis">http://localhost/cgi/script.cgi</em>.</p></dd>
<dt><b>Relative URL</b></dt>
<dd><p>URLs without a scheme, host, or port are called <a name="INDEX-221" /><a name="INDEX-222" /><a name="INDEX-223" />relative URLs. These can be further broken down into full and relative paths:</p>
<dl><dt><b>Full paths</b></dt>
<dd><p>Relative URLs with an absolute <a name="INDEX-224" />path are sometimes referred to as <em class="firstterm">full paths</em> (even though they can also include a query string and fragment identifier). Full paths can be distinguished from URLs with relative paths because they always start with a forward slash. Note that in all these cases, the paths are virtual paths and do not necessarily correspond to a path on the web server's filesystem. An example of a full path is <em class="emphasis">/index.html</em>.</p></dd>
<dt><b>Relative paths</b></dt>
<dd><p>Relative URLs that begin with a character other than a forward slash are <em class="firstterm">relative paths</em>. Examples of relative paths include <em class="emphasis">script.cgi</em> and <em class="emphasis">../images/photo.jpg</em>.</p></dd></dl></dd></dl></div>
<a name="ch02-80730" /><div class="sect2"><h3 class="sect2">2.1.3. URL Encoding</h3>
<p>Many characters must be <a name="INDEX-225" /><a name="INDEX-226" /><a name="INDEX-227" />encoded within a URL for a variety of reasons. For example, certain characters such as <tt class="literal">?</tt>, <tt class="literal">#</tt>, and <tt class="literal">/</tt> have special meaning within URLs and will be misinterpreted unless encoded. It is possible to name a file <em class="filename">doc#2.html</em> on some systems, but the URL <em class="filename">http://localhost/doc#2.html</em> would not point to this document. It points to the fragment <em class="filename">2.html</em> in a (possibly nonexistent) file named <em class="filename">doc</em>. 
We must encode the <tt class="literal">#</tt> character so the web browser and server recognize that it is part of the resource name instead.</p>
<p>Characters are encoded by representing them with a <a name="INDEX-228" /><a name="INDEX-229" /><a name="INDEX-230" /><a name="INDEX-231" />percent sign (<tt class="literal">%</tt>) followed by the two-digit hexadecimal value for that character based upon the ISO Latin 1 character set or ASCII character set (these character sets are identical for the seven-bit range, <tt class="literal">0x00</tt> through <tt class="literal">0x7F</tt>). For example, the <tt class="literal">#</tt> symbol has a hexadecimal value of <tt class="literal">0x23</tt>, so it is encoded as <tt class="literal">%23</tt>.</p>
<p>The following characters must be encoded:</p>
<ul><li><p>Control characters: ASCII <tt class="literal">0x00</tt> through <tt class="literal">0x1F</tt> plus <tt class="literal">0x7F</tt></p></li>
<li><p>Eight-bit characters: <tt class="literal">0x80</tt> through <tt class="literal">0xFF</tt></p></li>
<li><p>Characters given special importance within URLs: <tt class="literal">; / ? : @ &amp; = + $ ,</tt></p></li>
<li><p>Characters often used to delimit (quote) URLs: <tt class="literal">&lt; &gt; # % "</tt></p></li>
<li><p>Characters considered unsafe because they may have special meaning for other protocols used to transmit URLs (e.g., SMTP): <tt class="literal">{ } | \ ^ [ ] `</tt></p></li></ul>
<p>Additionally, <a name="INDEX-232" /><a name="INDEX-233" /><a name="INDEX-234" />spaces should be encoded as <tt class="literal">+</tt> although <tt class="literal">%20</tt> is also allowed. As you can see, most characters must be encoded; the list of <a name="INDEX-235" />allowed characters is actually much shorter:</p>
<ul><li><p>Letters: <tt class="literal">a-z</tt> and <tt class="literal">A-Z</tt></p></li>
<li><p>Digits: <tt class="literal">0-9</tt></p></li>
<li><p>The following characters: <tt class="literal">- _ . ! 
~ * ' ( )</tt></p></li></ul>
<p>It is actually permissible and not uncommon for any of the allowed characters to also be encoded by some software. Thus, any application that decodes a URL must decode every occurrence of a percentage sign followed by any two hexadecimal digits.</p>
<p>The following <a name="INDEX-236" /><a name="INDEX-237" />code encodes text for URLs:</p>
<blockquote><pre class="code">sub url_encode {
    my $text = shift;
    # Encode everything outside the allowed set; /g replaces every match
    $text =~ s/([^a-z0-9_.!~*'() -])/sprintf "%%%02X", ord($1)/egi;
    $text =~ tr/ /+/;    # spaces were left alone above; turn them into +
    return $text;
}</pre></blockquote>
<p>Any character not in the allowed set is replaced by a percentage sign and its two-digit hexadecimal equivalent. The three percentage signs are necessary because percentage signs indicate format codes for <tt class="function">sprintf</tt>, and literal percentage signs must be indicated by two percentage signs. Our format code thus includes a percentage sign, <tt class="literal">%%</tt>, plus the format code for two hexadecimal digits, <tt class="literal">%02X</tt>.</p>
<p>Code to decode URL encoded text looks like this:</p>
<blockquote><pre class="code">sub url_decode {
    my $text = shift;
    $text =~ tr/\+/ /;
    # /g decodes every %XX sequence, not just the first
    $text =~ s/%([a-f0-9][a-f0-9])/chr( hex( $1 ) )/egi;
    return $text;
}</pre></blockquote>
<p>Here we first translate any plus signs to spaces. Then we scan for a percentage sign followed by two hexadecimal digits and use <a name="INDEX-238" /><a name="INDEX-239" />Perl's <tt class="function">chr</tt> function to convert the hexadecimal value into a character.</p>
<p>Neither the <a name="INDEX-240" />encoding nor the decoding operations can be safely repeated on the same text. Text encoded twice differs from text encoded once because the percentage signs introduced in the first step would themselves be encoded in the second. Likewise, you cannot encode or decode entire URLs. 
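</p>
<p>A minimal sketch of both pitfalls may help here. It repeats the two subroutines from this section so that it runs on its own, and it assumes a <tt class="literal">g</tt> modifier on each substitution so that every match is replaced. The filename and URL are the ones used in the example earlier in this section; the variable names are only illustrative.</p>
<blockquote><pre class="code">use strict;
use warnings;

# Same approach as url_encode in this section (assumption: /g added)
sub url_encode {
    my $text = shift;
    $text =~ s/([^a-z0-9_.!~*'() -])/sprintf "%%%02X", ord($1)/egi;
    $text =~ tr/ /+/;
    return $text;
}

# Same approach as url_decode in this section (assumption: /g added)
sub url_decode {
    my $text = shift;
    $text =~ tr/+/ /;
    $text =~ s/%([a-f0-9][a-f0-9])/chr( hex( $1 ) )/egi;
    return $text;
}

my $name  = "doc#2.html";
my $once  = url_encode( $name );    # doc%232.html
my $twice = url_encode( $once );    # doc%25232.html

print "once:  $once\n";
print "twice: $twice\n";

# Decoding a whole URL brings back the bare # as a fragment delimiter
my $url = "http://localhost/$once";
print "whole: ", url_decode( $url ), "\n";    # http://localhost/doc#2.html</pre></blockquote>
<p>The second line of output contains <tt class="literal">%25</tt> where the percent sign from the first pass was itself encoded, and the last line is once again a URL that points to a fragment rather than to the file.</p>
<p>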
If you were to decode a URL, you could no longer reliably parse it, for you may have introduced characters that would be misinterpreted such as <tt class="literal">/</tt> or <tt class="literal">?</tt>. You should always parse a URL to get the components you want before decoding them; likewise, encode components before building them into a full URL.</p>
<p>Note that it's good to understand how a wheel works but reinventing it would be pointless. Even though you have just seen how to encode and decode text for URLs, you shouldn't do so yourself. The <a name="INDEX-241" /><a name="INDEX-242" /><a name="INDEX-243" /><a name="INDEX-244" />URI::URL module (actually it is a collection of modules), available on CPAN (see <a href="appb_01.htm">Appendix B, "Perl Modules"</a>), provides many URL-related modules and functions. One of the included modules, URI::Escape, provides the <tt class="function">uri_escape</tt><a name="INDEX-245" /><a name="INDEX-246" /><a name="INDEX-247" /> and <tt class="function">uri_unescape</tt> functions. Use them. The subroutines in these modules have been rigorously tested, and future versions will reflect any changes to HTTP as it evolves.<a href="#FOOTNOTE-2">[2]</a> Using standard subroutines will also make your code much clearer to those who may have to maintain your code later (this includes you).</p>
<blockquote><a name="FOOTNOTE-2" /><p>[2]Don't think this could happen? What if we told you the tilde character (<tt class="literal">~</tt>) was not always allowed in URLs? This restriction was removed after it became common practice for some web servers to accept a tilde plus username in the path to indicate a user's personal web directory.</p></blockquote>
<p>If, despite these warnings, you still insist on writing your own decoding code, at least place it in appropriately named subroutines. 
Granted, some of these actions take only a line or two of code, but the code is quite cryptic, and these operations <a name="INDEX-248" /><a name="INDEX-249" /><a name="INDEX-250" />should be <a name="INDEX-251" /><a name="INDEX-252" />clearly labeled.</p>
</div></div><hr align="left" width="515" /><div class="navbar"><table border="0" width="515"><tr><td width="172" valign="top" align="left"><a href="ch01_04.htm"><img src="../gifs/txtpreva.gif" alt="Previous" border="0" /></a></td><td width="171" valign="top" align="center"><a href="index.htm"><img src="../gifs/txthome.gif" alt="Home" border="0" /></a></td><td width="172" valign="top" align="right"><a href="ch02_02.htm"><img src="../gifs/txtnexta.gif" alt="Next" border="0" /></a></td></tr><tr><td width="172" valign="top" align="left">1.4. Web Server Configuration</td><td width="171" valign="top" align="center"><a href="index/index.htm"><img src="../gifs/index.gif" alt="Book Index" border="0" /></a></td><td width="172" valign="top" align="right">2.2. HTTP</td></tr></table></div><hr align="left" width="515" /><img src="../gifs/navbar.gif" alt="Library Navigation Links" usemap="#library-map" border="0" /><p><font size="-1"><a href="copyrght.htm">Copyright &copy; 2001</a> O'Reilly &amp; Associates. All rights reserved.</font></p><map name="library-map"><area href="../index.htm" coords="1,1,83,102" shape="rect" /><area href="../lnut/index.htm" coords="81,0,152,95" shape="rect" /><area href="../run/index.htm" coords="172,2,252,105" shape="rect" /><area href="../apache/index.htm" coords="238,2,334,95" shape="rect" /><area href="../sql/index.htm" coords="336,0,412,104" shape="rect" /><area href="../dbi/index.htm" coords="415,0,507,101" shape="rect" /><area href="../cgi/index.htm" coords="511,0,601,99" shape="rect" /></map></body></html>