<?label 11. Maintaining State?><html><head><title>Maintaining State (CGI Programming with Perl)</title><link href="../style/style1.css" type="text/css" rel="stylesheet" /><meta name="DC.Creator" content="Scott Guelich, Gunther Birznieks and Shishir Gundavaram" /><meta scheme="MIME" content="text/xml" name="DC.Format" /><meta content="en-US" name="DC.Language" /><meta content="O'Reilly & Associates, Inc." name="DC.Publisher" /><meta scheme="ISBN" name="DC.Source" content="1565924193L" /><meta name="DC.Subject.Keyword" content="stuff" /><meta name="DC.Title" content="CGI Programming with Perl" /><meta content="Text.Monograph" name="DC.Type" /></head><body bgcolor="#ffffff"><img src="gifs/smbanner.gif" alt="Book Home" usemap="#banner-map" border="0" /><map name="banner-map"><area alt="CGI Programming with Perl" href="index.htm" coords="0,0,466,65" shape="rect" /><area alt="Search this book" href="jobjects/fsearch.htm" coords="467,0,514,18" shape="rect" /></map><div class="navbar"><table border="0" width="515"><tr><td width="172" valign="top" align="left"><a href="ch10_04.htm"><img src="../gifs/txtpreva.gif" alt="Previous" border="0" /></a></td><td width="171" valign="top" align="center"><a href="index.htm">CGI Programming with Perl</a></td><td width="172" valign="top" align="right"><a href="ch11_02.htm"><img src="../gifs/txtnexta.gif" alt="Next" border="0" /></a></td></tr></table></div><hr align="left" width="515" /><h1 class="chapter">Chapter 11. Maintaining State</h1><div class="htmltoc"><h4 class="tochead">Contents:</h4><p><a href="ch11_01.htm">Query Strings and Extra Path Information</a><br><a href="ch11_02.htm">Hidden Fields</a><br><a href="ch11_03.htm">Client-Side Cookies</a><br></p></div><p>HTTP is a stateless protocol. As we discussed in <a href="ch02_01.htm">Chapter 2, "The Hypertext Transport Protocol"</a>, the <a name="INDEX-2179" />HTTP protocol defines how web clients and servers communicate with each other to provide documents and resources to the user.
Unfortunately, as we noted in our discussion of HTTP (see <a href="ch02_05.htm#ch02-99950">Section 2.5.1, "Identifying Clients"</a>), HTTP does not provide a direct way of identifying clients in order to keep track of them across multiple page requests. There are ways to track users through indirect methods, however, and we'll explore these methods in this chapter. Web developers refer to the practice of tracking users as <em class="firstterm">maintaining state</em><a name="INDEX-2180" /><a name="INDEX-2181" /> <a name="INDEX-2,182" /><a name="INDEX-2183" />. The series of interactions that a particular user has with our site is a <em class="firstterm">session</em><a name="INDEX-2184" /> <a name="INDEX-2,185" />. The information that we collect for a user is <em class="firstterm">session information</em>.</p><p>Why would we want to maintain state? If you value privacy, the idea of tracking users may raise concerns. It is true that tracking users can be used for questionable purposes. However, there are legitimate instances when you must track users. Take an online store: in order to allow a customer to browse products, add some to a shopping cart, and then check out by purchasing the selected items, the server must maintain a separate shopping cart for each user. In this case, collecting selected items in a user's session information is not only acceptable, but expected.</p><p>Before we discuss methods for maintaining state, let's briefly review what we learned earlier about the <a name="INDEX-2186" />HTTP transaction model. This will provide a context to understand the options we present later. Each and every HTTP transaction follows the same general format: a <a name="INDEX-2187" /><a name="INDEX-2188" /><a name="INDEX-2189" />request from a client followed by a response from the server. Each of these is divided into a request/response line, header lines, and possibly some message content.
For example, suppose you open your favorite browser and type in the URL:</p><blockquote class="simplelist"><p><em class="emphasis">http://www.oreilly.com/catalog/cgi2/index.html</em></p></blockquote><p>Your browser then connects to <em class="emphasis">www.oreilly.com</em> on port 80 (the default port for HTTP) and issues a request for <em class="emphasis">/catalog/cgi2/index.html</em>. On the server side, because the web server is bound to port 80, it answers any requests that are issued through that port. Here is how the request would look from a browser supporting HTTP 1.0:</p><blockquote><pre class="code">GET /catalog/cgi2/index.html HTTP/1.0
Accept: image/gif, image/x-xbitmap, image/jpeg, image/png, */*
Accept-Language: en
Accept-Charset: iso-8859-1,*,utf-8
User-Agent: Mozilla/4.5 (Macintosh; I; PPC)</pre></blockquote><p>The browser uses the <a name="INDEX-2190" />GET request method to ask for the document, specifies the version of HTTP to use, and supplies a number of headers to pass information about itself and the format of the content it will accept. Because the request is sent via GET and not POST, the browser is not passing any content to the server.</p><p>Here is how the <a name="INDEX-2191" />server would respond to the request:</p><blockquote><pre class="code">HTTP/1.0 200 OK
Date: Sat, 18 Mar 2000 20:35:35 GMT
Server: Apache/1.3.9 (Unix)
Last-Modified: Wed, 20 May 1998 14:59:42 GMT
Content-Length: 141
Content-Type: text/html

(content)
...</pre></blockquote><p>In Version 1.0 of HTTP, the server returns the requested document and then closes the <a name="INDEX-2192" />connection. Yes, that's right: the server doesn't keep the connection open between itself and the browser. So, if you were to click on a link on the returned page, the browser then issues another request to the server, and so on. As a result, the server has no way of knowing that it's you who is requesting each successive document.
This is what we mean by <em class="emphasis">stateless</em><a name="INDEX-2193" /><a name="INDEX-2194" />, or nonpersistent; the server doesn't maintain or store any request-related information from one transaction to the next. You do know the network address of the client that is connecting to you, but as you'll recall from our earlier discussion of <a name="INDEX-2195" />proxies (see <a href="ch02_05.htm#ch02-54689">Section 2.5, "Proxies"</a>), multiple users may be making connections via the same proxy.</p><p>You may be waiting to hear what's changed in Version 1.1 of HTTP. In HTTP 1.1, a <a name="INDEX-2196" />connection may remain open across multiple requests, although the request and response cycle is the same as above. However, you cannot rely on the network connection remaining open since the connection can be closed or lost for any number of reasons, and in any event CGI has not been modified to allow you to access any information that would associate requests made across the same connection. So in HTTP 1.1 as in HTTP 1.0, the job of maintaining state falls to us.</p><p>Consider our <a name="INDEX-2197" />shopping cart example: it should allow consumers to navigate through many pages and selectively place items in their carts. A consumer typically places an item in a cart by selecting a product, entering the desired quantity, and submitting the form. This action sends the data to the web server, which, in turn, invokes the requested CGI application. To the server, it's simply another request. So, it's up to the application not only to keep track of the data between multiple invocations, but also to identify the data as belonging to a particular consumer.</p><p>In order to maintain state, we must get the client to pass us some <a name="INDEX-2198" /><a name="INDEX-2199" />unique identifier with each request.
As you can see from the HTTP request example earlier, there are only three different ways the client can pass information to us: via the request line, via a header line, or via the content (in the case of a POST request). Thus, in order to maintain state, we can have the client pass a unique identifier to us via any of these methods. In fact, the techniques we'll explore cover all three of these ways:</p><dl><dt><b><a name="INDEX-2200" /><a name="INDEX-2201" /><a name="INDEX-2202" />Query strings and extra path information</b></dt><dd><p>It's possible to embed an identifier in the query string or as extra path information within a document's URL. As users traverse a site, a CGI application generates documents on the fly, passing the identifier from document to document. This allows us to keep track of all the documents requested by each user, and the order in which they were requested. The browser sends this information to us via the request line.</p></dd><dt><b><a name="INDEX-2203" />Hidden fields</b></dt><dd><p>Hidden form fields allow us to embed "invisible" name-value information within forms that the user cannot see without viewing the source of the HTML page. Like typical form fields and values, this information is sent to the CGI application when the user presses the submit button. We generally use this technique to maintain the user's selections and preferences when multiple forms are involved. We'll also look at how CGI.pm can do much of this work for us. The browser sends this information to us via the request line or via the message content depending on whether the request was GET or POST, respectively.</p></dd><dt><b><a name="INDEX-2204" /><a name="INDEX-2205" /><a name="INDEX-2206" />Client-side cookies</b></dt><dd><p>All modern browsers support client-side cookies, which allow us to store information on the client machine and have the browser pass it back to us with each request.
We can use this to store semi-permanent data on the client side, which will be available to us whenever the user requests future resources from the server. Cookies are sent back to us by the client in the <em class="emphasis">Cookie</em> HTTP header line.</p></dd></dl><p>The advantages and disadvantages of each technique are summarized in <a href="ch11_01.htm#ch11-81511">Table 11-1</a>. We will review each technique separately, so if some of the points in the table are unclear you may want to refer back to this table after reading the sections below. In general, though, you should note that client-side cookies are the most powerful option for maintaining state, but they require something from the client. The other options work regardless of the client, but both limit the number of pages across which we can track the user.</p><a name="ch11-81511" /><h4 class="objtitle">Table 11-1. Summary of the Techniques for Maintaining State</h4><table border="1"><tr><th><p>Technique</p></th><th><p>Scope</p></th><th><p>Reliability and Performance</p></th><th><p>Client Requirements</p></th></tr><tr><td><p>Query strings and extra path information</p></td><td><p>Can be configured to apply to a particular group of pages or an entire web site, but state information is lost if the user leaves the web site and later returns</p></td><td><p>Difficult to reliably parse all links in a document;</p><p>significant performance cost to pass static content through CGI scripts</p></td><td><p>Does not require any special behavior from the client</p></td></tr><tr><td><p>Hidden fields</p></td><td><p>Only works across a series of form submissions</p></td><td><p>Easy to implement; does not affect performance</p></td><td><p>Does not require any special behavior from the client</p></td></tr><tr><td><p>Cookies</p></td><td><p>Works everywhere, even if the user visits another site and later returns</p></td><td><p>Easy to implement; does not affect performance</p></td><td><p>Requires that the client supports 
(and accepts) cookies</p></td></tr></table><div class="sect1"><a name="ch11-36070" /><h2 class="sect1">11.1. Query Strings and Extra Path Information</h2><p>We've passed <a name="INDEX-2209" /> <a name="INDEX-2,210" /> <a name="INDEX-2,211" />query information to <a name="INDEX-2212" /> <a name="INDEX-2,213" />CGI applications many times throughout this book. In this section, we'll use queries in a slightly less obvious manner, namely to track a user's browsing trail while traversing from one document to the next on the server.</p><p>In order to do this, we'll have a <a name="INDEX-2214" /> <a name="INDEX-2,215" />CGI script handle every request for a static HTML page. The CGI script will check whether the request URL contains an identifier matching our format. If it doesn't, the script assumes that this is a new user and generates a new identifier. The script then parses the requested HTML document by looking for links to other URLs within our web site and appending the identifier to each URL. Thus, the identifier will be passed on with future requests and propagated from document to document. Of course, if we want to track users across CGI applications then we'll also need to parse the output of these CGI scripts. The simplest way to <a name="INDEX-2216" />accomplish both goals is to create a general module that handles reading the identifier and parsing the output. This way, we need to write our code only once and can share it between the script that serves our HTML pages and all our other CGI scripts.</p><p>As you may have guessed, this is not a very efficient process, since a request for each and every HTML document triggers a CGI application to be executed.
Tools such as <em class="emphasis">mod_perl</em> and FastCGI, discussed in <a href="ch17_01.htm">Chapter 17, "Efficiency and Optimization"</a>, help because both avoid the cost of launching a new Perl interpreter for every request.</p><p>Another strategy to help improve performance is to perform some processing in advance. If you are willing to preprocess your documents, you can reduce the amount of work that happens when the customer accesses the document. The majority of the work involved in parsing a document and replacing <a name="INDEX-2217" />links is identifying the links. <a name="INDEX-2218" /><a name="INDEX-2219" /><a name="INDEX-2220" /> <a name="INDEX-2,221" />HTML::Parser is a good module, but the work it does is rather complex. If you parse the links in advance and insert a special placeholder keyword instead of a particular user's identifier, then at request time you only need to look for this keyword and do not have to worry about recognizing links. For example, you could parse <a name="INDEX-2222" />URLs and add <tt class="literal">#USERID#</tt> as the identifier for each document. The resulting code becomes much simpler. You can effectively handle documents this way:</p><blockquote><pre class="code">sub parse {
    my( $filename, $id ) = @_;

    # Open the preprocessed document for reading
    open my $fh, '&lt;', $filename or die "Cannot open file: $!";

    # Replace every placeholder with this user's identifier and print the line
    while (&lt;$fh&gt;) {
        s/#USERID#/$id/g;
        print;
    }
    close $fh;
}</pre></blockquote><p>However, when a user traverses through a set of static HTML documents, CGI applications are typically not involved. If that's the case, how do we pass session information from one HTML document to the next, and be able to keep track of it on the server?</p><p>The answer to our problem is to configure the <a name="INDEX-2223" />server such that when the user requests an HTML document, the server executes a CGI application. The