<?label 2. The Hypertext Transport Protocol ?><html><head><title>The Hypertext Transport Protocol  (CGI Programming with Perl)</title><link href="../style/style1.css" type="text/css" rel="stylesheet" /><meta name="DC.Creator" content="Scott Guelich, Gunther Birznieks and Shishir Gundavaram" /><meta scheme="MIME" content="text/xml" name="DC.Format" /><meta content="en-US" name="DC.Language" /><meta content="O'Reilly & Associates, Inc." name="DC.Publisher" /><meta scheme="ISBN" name="DC.Source" content="1565924193L" /><meta name="DC.Subject.Keyword" content="stuff" /><meta name="DC.Title" content="CGI Programming with Perl" /><meta content="Text.Monograph" name="DC.Type" /></head><body bgcolor="#ffffff"><img src="gifs/smbanner.gif" alt="Book Home" usemap="#banner-map" border="0" /><map name="banner-map"><area alt="CGI Programming with Perl" href="index.htm" coords="0,0,466,65" shape="rect" /><area alt="Search this book" href="jobjects/fsearch.htm" coords="467,0,514,18" shape="rect" /></map><div class="navbar"><table border="0" width="515"><tr><td width="172" valign="top" align="left"><a href="ch01_04.htm"><img src="../gifs/txtpreva.gif" alt="Previous" border="0" /></a></td><td width="171" valign="top" align="center"><a href="index.htm">CGI Programming with Perl</a></td><td width="172" valign="top" align="right"><a href="ch02_02.htm"><img src="../gifs/txtnexta.gif" alt="Next" border="0" /></a></td></tr></table></div><hr align="left" width="515" /><h1 class="chapter">Chapter 2. 
The Hypertext Transfer Protocol </h1><div class="htmltoc"><h4 class="tochead">Contents:</h4><p><a href="ch02_01.htm">URLs</a><br><a href="ch02_02.htm">HTTP</a><br><a href="ch02_03.htm">Browser Requests</a><br><a href="ch02_04.htm">Server Responses</a><br><a href="ch02_05.htm">Proxies</a><br><a href="ch02_06.htm">Content Negotiation</a><br><a href="ch02_07.htm">Summary</a><br></p></div><p>The <a name="INDEX-169" /> <a name="INDEX-170" /> <a name="INDEX-171" />Hypertext Transfer Protocol (HTTP) is the common language that web browsers and web servers use to communicate with each other on the Internet. CGI is built on top of HTTP, so to understand CGI fully, it certainly helps to understand HTTP. One of the reasons CGI is so powerful is that it allows you to manipulate the metadata exchanged between the web browser and server and thus perform many useful tricks, including:</p><ul><li><p>Serve content of varying type, language, or other encoding according to the client's needs.</p></li><li><p>Check the user's previous location.</p></li><li><p>Check the browser type and version and adapt your response to it.</p></li><li><p>Specify how long the client can cache a page before it is considered outdated and should be reloaded.</p></li></ul><p>We won't cover all of the details of HTTP, just what is important for our understanding of CGI. Specifically, we'll focus on the request and response process: how browsers ask for and receive web pages.</p><p>If you are interested in understanding more about <a name="INDEX-172" /> <a name="INDEX-173" /> <a name="INDEX-174" />HTTP than we provide here, visit the World Wide Web Consortium's web site at <a href="http://www.w3.org/Protocols/">http://www.w3.org/Protocols/</a>. On the other hand, if you are eager to get started writing CGI scripts, you may be tempted to skip this chapter. We encourage you not to. 
Although you can certainly learn to write CGI scripts without learning HTTP, without the bigger picture you may end up memorizing what to do instead of understanding why. This is certainly the most challenging chapter, however, because we cover a lot of material without many examples. So if you find it a little dry and want to peek ahead to the fun stuff, we'll forgive you. Just be sure to return here later.</p><div class="sect1"><a name="ch02-81296" /><h2 class="sect1">2.1. URLs</h2><p>During our discussion of HTTP and CGI, we will often be referring to <em class="firstterm">URLs</em><a name="INDEX-175" /> <a name="INDEX-176" />, or <em class="firstterm">Uniform Resource Locators</em>. If you have used the Web at all, then you are probably familiar with URLs. In web terms, a <em class="firstterm">resource</em><a name="INDEX-177" /> represents anything available on the Web, whether it be an HTML page, an image, a CGI script, etc. URLs provide a standard way to locate these resources on the Web.</p><p>Note that URLs are not actually specific to HTTP; they can refer to resources in many protocols. Our discussion here will focus strictly on HTTP URLs.</p><div class="sidebar"><h4 class="objtitle">What About URIs?</h4><p>You may have also encountered the term URI and wondered about the difference between a URI and a URL. Actually, the terms are often interchangeable because all URLs are URIs. <a name="INDEX-178" />Uniform Resource Identifiers (URIs) are a more generalized class which includes URLs as well as <a name="INDEX-179" />Uniform Resource Names (URNs). A URN provides a name that sticks to an object even though the location of the object may move around. You can think of it this way: your name is similar to a URN, while your address is similar to a URL. 
Both serve to identify you in some way, and in this manner both are URIs.</p><p>Because URNs are just a concept and are not used on the Web today, you can safely think of URIs and URLs as interchangeable terms and not let the terminology throw you. Since we are not interested in other forms of URIs, we will try to avoid confusion altogether by just using the term URL in the text.</p></div><a name="ch02-1-fm2xml" /><div class="sect2"><h3 class="sect2">2.1.1. Elements of a URL</h3><p><a name="INDEX-180" />HTTP URLs consist of a scheme, a host name, a port number, a path, a query string, and a fragment identifier, any of which may be omitted under certain circumstances (see <a href="ch02_01.htm#ch02-40406">Figure 2-1</a>).</p><a name="ch02-40406" /><div class="figure"><img width="446" src="figs/cgi2.0201.gif" height="42" alt="Figure 2-1" /></div><h4 class="objtitle">Figure 2-1. Components of a URL</h4><p>HTTP URLs contain the following elements:</p><dl><dt><b>Scheme</b></dt><dd><p>The <a name="INDEX-181" /><a name="INDEX-182" />scheme represents the protocol, and for our purposes will be either <tt class="literal">http</tt> or <tt class="literal">https</tt>. <tt class="literal">https</tt><a name="INDEX-183" /> represents a connection to a secure web server. Refer to <a href="ch02_02.htm#ch02-52204">the sidebar "The Secure Sockets Layer"</a> later in this chapter.</p></dd><dt><b>Host</b></dt><dd><p>The <a name="INDEX-184" /><a name="INDEX-185" /><a name="INDEX-186" />host identifies the machine running a web server. It can be a <a name="INDEX-187" />domain name or an <a name="INDEX-188" /> <a name="INDEX-189" />IP address, although using IP addresses in URLs is a bad idea and is strongly discouraged. The problem is that IP addresses often change for any number of reasons: a web site may move from one machine to another, or it may relocate to another network. 
Domain names can remain constant in these cases, allowing these changes to remain hidden from the user.</p></dd><dt><b>Port number</b></dt><dd><p>The <a name="INDEX-190" /><a name="INDEX-191" />port number is optional and may appear in URLs only if the host is also included. The host and port are separated by a colon. If the port is not specified, port 80 is used for <tt class="literal">http</tt> URLs and port 443 is used for <tt class="literal">https</tt> URLs.</p><p>It is possible to configure a web server to answer on other ports. This is often done if two different web servers need to operate on the same machine, or if a web server is operated by someone who does not have sufficient rights on the machine to start a server on these ports (e.g., only <em class="emphasis">root</em> may bind to ports below 1024 on Unix machines). However, servers using ports other than the standard 80 and 443 may be inaccessible to users behind firewalls. Some firewalls are configured to restrict access to all but a narrow set of ports representing the defaults for certain allowed protocols.</p></dd><dt><b>Path information</b></dt><dd><p><a name="INDEX-192" /><a name="INDEX-193" /> <a name="INDEX-194" />Path information represents the location of the resource being requested, such as an HTML file or a CGI script. Depending on how your web server is configured, it may or may not map to some actual file path on your system. As we mentioned in the last chapter, the <a name="INDEX-195" /><a name="INDEX-196" /><a name="INDEX-197" />URL path for CGI scripts generally begins with <em class="filename">/cgi/</em> or <em class="filename">/cgi-bin/</em> and these paths are mapped to a similarly-named directory on the web server, such as <em class="filename">/usr/local/apache/cgi-bin</em>.</p><p>Note that the URL for a script may include path information beyond the location of the script itself. 
For example, say you have a CGI script at:</p><p><em class="emphasis">http://localhost/cgi/browse_docs.cgi</em></p><p>You can pass extra path information to the script by appending it to the end, for example:</p><p><em class="emphasis">http://localhost/cgi/browse_docs.cgi/docs/product/description.text</em></p><p>Here the path <em class="filename">/docs/product/description.text</em> is passed to the script. We explain how to access and use this additional path information in more detail in the next chapter.</p></dd><dt><b>Query string</b></dt><dd><p>A <a name="INDEX-198" /><a name="INDEX-199" />query string passes additional parameters to scripts. It is sometimes referred to as a <a name="INDEX-200" /><a name="INDEX-201" /><a name="INDEX-202" /><a name="INDEX-203" />search string or an index. It may contain <a name="INDEX-204" />name and value pairs, in which each pair is separated from the next pair by an <a name="INDEX-205" />ampersand (<tt class="literal">&amp;</tt>), and the name and value are separated from each other by an <a name="INDEX-206" />equals sign (<tt class="literal">=</tt>). We discuss how to parse and use this information in your scripts in the next chapter.</p><p>Query strings can also include data that is not formatted as name-value pairs. If a query string does not contain an equals sign, it is often referred to as an index. Each argument should be separated from the next by an <a name="INDEX-207" />encoded space (encoded either as <a name="INDEX-208" /> <a name="INDEX-209" /><tt class="literal">+</tt> or <tt class="literal">%20</tt>; see <a href="ch02_01.htm#ch02-80730">Section 2.1.3, "URL Encoding"</a> below). CGI scripts handle indexes a little differently, as we will see in the next chapter.</p></dd><dt><b>Fragment identifier</b></dt><dd><p><a name="INDEX-210" /><a name="INDEX-211" />Fragment identifiers refer to a specific section in a resource. Fragment identifiers are not sent to web servers, so you cannot access this component of a URL in your CGI scripts. 
Instead, the browser fetches a resource and then applies the fragment identifier to locate the appropriate section in the resource. For <a name="INDEX-212" /> <a name="INDEX-213" />HTML documents, fragment identifiers refer to anchor tags within the document:</p><blockquote><pre class="code">&lt;a name="anchor"&gt;Here is the content you're after...&lt;/a&gt;</pre></blockquote><p>The following URL would request the full document and then scroll to the section marked by the anchor tag:</p><p><em class="emphasis">http://localhost/document.html#anchor</em></p><p>Web browsers generally stay at the top of the document if no anchor for the fragment identifier is found.</p></dd></dl></div><a name="ch02-2-fm2xml" /><div class="sect2"><h3 class="sect2">2.1.2. Absolute and Relative URLs</h3><p>Many of the elements within a <a name="INDEX-214" /> <a name="INDEX-215" /> <a name="INDEX-216" />URL are optional. You may omit the <a name="INDEX-217" /> <a name="INDEX-218" />scheme, host, and port number in a URL if the URL is used in a context where these elements can be assumed. For example, if you include a URL in a link on an HTML page and leave out these elements, the browser will assume the link applies to a resource on the same machine as the link. There are two classes of URLs:</p><dl><dt><b>Absolute URL</b></dt><dd><p>URLs that include the hostname are called <a name="INDEX-219" /><a name="INDEX-220" />absolute URLs. An example of an
