interface for handling file uploads.</p>

<p>You can access the <a name="INDEX-1025" />name of an uploaded file with the <tt class="function">param</tt><a name="INDEX-1026" /> method, just like the value of any other form element. For example, if your CGI script were receiving input from the following HTML form:</p>

<blockquote><pre class="code">&lt;FORM ACTION="/cgi/upload.cgi" METHOD="POST" ENCTYPE="multipart/form-data"&gt;
  &lt;P&gt;Please choose a file to upload:
  &lt;INPUT TYPE="FILE" NAME="file"&gt;
  &lt;INPUT TYPE="SUBMIT"&gt;
&lt;/FORM&gt;</pre></blockquote>

<p>then you could get the name of the uploaded file this way, by referring to the name of the file input element (the &lt;INPUT TYPE="FILE"&gt; tag), in this case "file":</p>

<blockquote><pre class="code">my $file = $q->param( "file" );</pre></blockquote>

<p>The name you receive from this parameter is the name of the file as it appeared on the user's machine when they uploaded it. CGI.pm stores the file as a temporary file on your system, but the name of this temporary file does not correspond to the name you get from this parameter. We will see how to access the temporary file in a moment.</p>

<p>The name supplied by this parameter varies according to <a name="INDEX-1027" /> <a name="INDEX-1,028" />platform and browser. Some systems supply just the name of the uploaded file; others supply the entire path of the file on the user's machine. Because <a name="INDEX-1029" />path delimiters also vary between systems, it can be a challenge to determine the name of the file. The following command appears to work for Windows, Macintosh, and Unix-compatible systems:</p>

<blockquote><pre class="code">my( $file ) = $q->param( "file" ) =~ m|([^/:\\]+)$|;</pre></blockquote>

<p>However, it may strip parts of filenames, since "report11/3/99" is a valid filename on Macintosh systems and the above command would in this case set <tt class="literal">$file</tt> to "99".
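</p>

<p>To see that behavior concretely, here is a minimal sketch that applies the same pattern to a few sample values. The <tt class="function">base_name</tt> helper and the sample paths are made up for illustration; they are not part of CGI.pm, and real browsers may send other forms.</p>

<blockquote><pre class="code">#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: return whatever follows the last "/", ":",
# or "\" in a client-supplied filename.
sub base_name {
    my $path = shift;
    my( $name ) = $path =~ m|([^/:\\]+)$|;
    return $name;
}

# Illustrative values only; the last one is the problem case
for my $path ( 'C:\Windows\Favorites\report.doc',
               '/home/user/report.doc',
               'Macintosh HD:Documents:report.doc',
               'report11/3/99' ) {
    printf "%-36s => %s\n", $path, base_name( $path );
}</pre></blockquote>

<p>The first three values all reduce to "report.doc", while the last one is cut down to "99".</p>

<p>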
Another solution is to replace any characters other than letters, digits, underscores, dashes, and periods with <a name="INDEX-1030" />underscores and prevent any files from beginning with <a name="INDEX-1031" /><a name="INDEX-1032" />periods or dashes:</p>

<blockquote><pre class="code">my $file = $q->param( "file" );
$file =~ s/([^\w.-])/_/g;
$file =~ s/^[-.]+//;</pre></blockquote>

<p>The problem with this is that <a name="INDEX-1033" /><a name="INDEX-1034" /><a name="INDEX-1035" />Netscape's browsers on Windows send the full path to the file as the filename. Thus, <tt class="literal">$file</tt> may be set to something long and ugly like "C_ _ _Windows_Favorites_report.doc".</p>

<p>You could try to sort out the behaviors of the different operating systems and browsers, check for the user's browser and operating system, and then treat the filename appropriately, but that would be a very poor solution. You are bound to miss some combinations, you would constantly need to update it, and one of the greatest advantages of the Web is that it works across platforms; you should not build any limitations into your solutions.</p>

<p>So the simple, obvious solution is actually nontechnical. If you do need to know the name of the uploaded file, just add another text field to the form allowing the user to enter the name of the file they are uploading. This has the added advantage of allowing a user to provide a different name than the file has, if appropriate.
The <a name="INDEX-1036" />HTML form looks like this:</p>

<blockquote><pre class="code">&lt;FORM ACTION="/cgi/upload.cgi" METHOD="POST" ENCTYPE="multipart/form-data"&gt;
  &lt;P&gt;Please choose a file to upload:
  &lt;INPUT TYPE="FILE" NAME="file"&gt;
  &lt;P&gt;Please enter the name of this file:
  &lt;INPUT TYPE="TEXT" NAME="filename"&gt;
&lt;/FORM&gt;</pre></blockquote>

<p>You can then get the name from the text field, remembering to strip out any odd characters:</p>

<blockquote><pre class="code">my $filename = $q->param( "filename" );
$filename =~ s/([^\w.-])/_/g;
$filename =~ s/^[-.]+//;</pre></blockquote>

<p>So now that we know how to get the name of the <a name="INDEX-1037" />file uploaded, let's look at how we get at the content. CGI.pm creates a temporary file to store the contents of the upload; you can get a file handle for this file by passing the name of the file input element (here, "file") to the <tt class="function">upload</tt><a name="INDEX-1038" /> <a name="INDEX-1,039" />method as follows:</p>

<blockquote><pre class="code">my $file = $q->param( "file" );
my $fh   = $q->upload( "file" );</pre></blockquote>

<p>The <tt class="function">upload</tt> method was added to CGI.pm in Version 2.47. Prior to this you could use the value returned by <tt class="function">param</tt><a name="INDEX-1040" /> (in this case <tt class="literal">$file</tt>) as a file handle in order to read from the file; if you use it as a string it returns the name of the file. This actually still works, but there are conflicts with strict mode and other problems, so <tt class="function">upload</tt> is the preferred way to get a file handle now.
Be sure that you pass <em class="emphasis">upload</em> the name of the form field, the same name you pass to <em class="emphasis">param</em>, and not a filename (e.g., the name the browser sent, the name the user supplied, or a name with nonalphanumeric characters replaced with underscores). If the field does not hold a valid upload, <em class="emphasis">upload</em> returns an undefined value.</p>

<p>Note that <a name="INDEX-1041" /><a name="INDEX-1042" /><a name="INDEX-1043" />transfer errors are much more common with file uploads than with other forms of input. If the user presses the Stop button in the browser as the file is uploading, for example, CGI.pm will receive only a portion of the uploaded file. Because of the format of <em class="emphasis">multipart/form-data</em><a name="INDEX-1044" /> requests, CGI.pm will recognize that the transfer is incomplete. You can check for errors such as this by using the <tt class="function">cgi_error</tt><a name="INDEX-1045" /><a name="INDEX-1046" /><a name="INDEX-1047" /> method after creating a CGI.pm object. It returns the <a name="INDEX-1048" /> <a name="INDEX-1,049" />HTTP status code and message corresponding to the error, if applicable, or an empty string if no error has occurred. For instance, if the <em class="emphasis">Content-length</em> of a POST request exceeds <tt class="literal">$CGI::POST_MAX</tt>, then <tt class="function">cgi_error</tt> will return "413 Request entity too large". As a general rule, you should always check for an error when you are recording input on the server. This includes file uploads and other POST requests. It doesn't hurt to check for an error with GET requests either.</p>

<p><a href="ch05_02.htm#ch05-48711">Example 5-2</a> provides the <a name="INDEX-1050" />complete code, with error checking, to receive a file upload via our previous HTML form.</p>

<a name="ch05-48711" /><div class="example"><h4 class="objtitle">Example 5-2. 
upload.cgi </h4><blockquote><pre class="code">#!/usr/bin/perl -wT

use strict;
use CGI;
use Fcntl qw( :DEFAULT :flock );

use constant UPLOAD_DIR     => "/usr/local/apache/data/uploads";
use constant BUFFER_SIZE    => 16_384;
use constant MAX_FILE_SIZE  => 1_048_576;        # Limit each upload to 1 MB
use constant MAX_DIR_SIZE   => 100 * 1_048_576;  # Limit total uploads to 100 MB
use constant MAX_OPEN_TRIES => 100;

$CGI::DISABLE_UPLOADS = 0;
$CGI::POST_MAX        = MAX_FILE_SIZE;

my $q = new CGI;
$q->cgi_error and error( $q, "Error transferring file: " . $q->cgi_error );

my $file     = $q->param( "file" )     || error( $q, "No file received." );
my $filename = $q->param( "filename" ) || error( $q, "No filename entered." );
my $fh       = $q->upload( "file" );   # upload takes the field name, not the filename
my $buffer   = "";

if ( dir_size( UPLOAD_DIR ) + $ENV{CONTENT_LENGTH} > MAX_DIR_SIZE ) {
    error( $q, "Upload directory is full." );
}

# Allow letters, digits, periods, underscores, dashes
# Convert anything else to an underscore
$filename =~ s/[^\w.-]/_/g;
if ( $filename =~ /^(\w[\w.-]*)/ ) {
    $filename = $1;
}
else {
    error( $q, "Invalid file name; files must start with a letter or number." );
}

# Open output file for writing, making sure the name is unique
my $tries = 0;
until ( sysopen OUTPUT, UPLOAD_DIR . "/$filename", O_WRONLY | O_CREAT | O_EXCL ) {
    ++$tries >= MAX_OPEN_TRIES and error( $q, "Unable to save your file." );
    # Bump the number in front of the extension (or at the end of the name)
    $filename =~ s/(\d*)(\.\w+)?$/ ( $1 || 0 ) + 1 . ( defined $2 ? $2 : "" ) /e;
}

# This is necessary for non-Unix systems; does nothing on Unix
binmode $fh;
binmode OUTPUT;

# Write contents to output file
while ( read( $fh, $buffer, BUFFER_SIZE ) ) {
    print OUTPUT $buffer;
}

close OUTPUT;

# Send a minimal response so the browser gets a valid reply
print $q->header( "text/plain" ), "File received.";


sub dir_size {
    my $dir = shift;
    my $dir_size = 0;

    # Loop through files and sum the sizes; doesn't descend down subdirs
    opendir DIR, $dir or die "Unable to open $dir: $!";
    while ( defined( my $entry = readdir DIR ) ) {
        $dir_size += ( -s "$dir/$entry" ) || 0;
    }
    closedir DIR;

    return $dir_size;
}


sub error {
    my( $q, $reason ) = @_;

    print $q->header( "text/html" ),
          $q->start_html( "Error" ),
          $q->h1( "Error" ),
          $q->p( "Your upload was not processed because the following error ",
                 "occurred: " ),
          $q->p( $q->i( $reason ) ),
          $q->end_html;
    exit;
}</pre></blockquote></div>

<p>We start by creating several constants to configure this script. <tt class="literal">UPLOAD_DIR</tt><a name="INDEX-1051" /> is the path to the directory where we will store uploaded files. <tt class="literal">BUFFER_SIZE</tt><a name="INDEX-1052" /> is the amount of data to read into memory while transferring from the temporary file to the output file. <tt class="literal">MAX_FILE_SIZE</tt> is the maximum file size we will accept; this is important because we want to prevent users from uploading gigabyte-sized files and filling up all of the server's disk space. <tt class="literal">MAX_DIR_SIZE</tt><a name="INDEX-1053" /> is the maximum size that we will allow our upload directory to grow to. This restriction is as important as the last because users can fill up our disks by posting lots of small files just as easily as posting large files. Finally, <tt class="literal">MAX_OPEN_TRIES</tt> is the number of times we try to generate a unique filename and open that file before we give up; we'll see why this step is necessary in a moment.</p>

<p>First, we enable file uploads, then we set <tt class="literal">$CGI::POST_MAX</tt><a name="INDEX-1054" /><a name="INDEX-1055" /> to <tt class="literal">MAX_FILE_SIZE</tt>.
Note that <tt class="literal">$CGI::POST_MAX</tt> limits the size of the entire content of the request, which includes the data for other form fields as well as overhead for the <em class="emphasis">multipart/form-data</em> encoding, so the largest file the script will actually accept is a little smaller than this value. For this form, the difference is minor, but if you add a file upload field to a complex form with multiple text fields, then you should keep this distinction in mind.</p>

<p>We then create a CGI object and check for errors. As we said earlier, errors with file uploads are much more common than with other forms of CGI input. Next we get the file's upload name and the filename the user provided, reporting errors if either of these is missing. Note that a user may be rather upset to get a message saying that the filename is missing after uploading a large file via a modem. There is no way to interrupt that transfer, but in a production application, it might be more user-friendly to save the unnamed file temporarily, prompt the user for a filename, and then rename the file. Of course, you would then need to periodically clean up temporary files that were abandoned.</p>

<p>We get a <a name="INDEX-1056" />file handle, <tt class="literal">$fh</tt>, to the temporary file where CGI.pm has stored the input. We check whether our upload directory is full and report an error if this is the case. Again, this message is likely to create some unhappy users. In a production application you should add code to notify an administrator who can see why the upload directory is full and resolve the problem. See <a href="ch09_01.htm">Chapter 9, "Sending Email"</a>.</p>

<p>Next, we replace with an underscore any characters in the user-supplied filename that may cause problems, and make sure the name doesn't start with a <a name="INDEX-1057" /><a name="INDEX-1058" />period or a dash.
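</p>

<p>Pulled out of the script, that cleanup step might be sketched as a small subroutine. The name <tt class="function">clean_filename</tt> and the sample inputs are hypothetical, chosen only for this illustration; the sketch runs without taint mode, so it shows the pattern rather than proving the untainting.</p>

<blockquote><pre class="code">#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: returns a cleaned name, or undef if the name
# does not start with a letter, digit, or underscore.
sub clean_filename {
    my $filename = shift;
    return undef unless defined $filename;

    # Convert anything other than letters, digits, periods,
    # underscores, and dashes to an underscore
    $filename =~ s/[^\w.-]/_/g;

    # Reassigning from a capture is what untaints the value under -T
    if ( $filename =~ /^(\w[\w.-]*)/ ) {
        $filename = $1;
        return $filename;
    }
    return undef;
}

for my $name ( "my report (final).doc", ".htaccess" ) {
    my $clean = clean_filename( $name );
    print "$name => ", ( defined $clean ? $clean : "(rejected)" ), "\n";
}</pre></blockquote>

<p>Here the first name becomes "my_report__final_.doc" and the second is rejected because it starts with a period.</p>

<p>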
The odd construct that reassigns the result of the <a name="INDEX-1059" />regular expression to <tt class="literal">$filename</tt> untaints that variable. We'll discuss tainting and why this is important in <a href="ch08_01.htm">Chapter 8, "Security"</a>. We confirm again that <tt class="literal">$filename</tt> is not empty (which would happen if it had consisted of nothing but periods and/or dashes) and generate an error if this is the case.</p>

<p>We try to open a file with this name in our upload directory. If we fail, then we add a digit to <tt class="literal">$filename</tt> and try again. The regular expression allows us to keep the file extension the same: if there is already a <em class="filename">report.txt</em> file, then the next upload with that name will be named <em class="filename">report1.txt</em>, the next one <em class="filename">report2.txt</em>, etc. This continues until we exceed <tt class="literal">MAX_OPEN_TRIES</tt><a name="INDEX-1060" /> <a name="INDEX-1,061" />. It is important that we create a limit to this loop because there may be a reason other than a non-unique name that prevents us from saving the file. If the disk is full or the system has too many open files, for example, we do not want to start looping endlessly. This error should also notify an administrator that something is wrong.</p>

<p>This script is written to handle any type of file upload, including binary files such as images or audio. By default, whenever Perl accesses a file handle on non-Unix systems (more specifically, systems that do not use <tt class="literal">\n</tt> as their end of line character), Perl translates the native <a name="INDEX-1062" /> <a name="INDEX-1,063" /> <a name="INDEX-1,064" />operating system's end of line characters, such as <tt class="literal">\r\n</tt> for Windows or <tt class="literal">\r</tt> for MacOS, to <tt class="literal">\n</tt> on input and back to the native characters on output. This works great for text files, but it can corrupt binary files.
Thus, we enable <a name="INDEX-1065" />binary mode with the <tt class="function">binmode</tt> function in order to disable this translation. On systems, like Unix, where no end of line translation occurs, <tt class="function">binmode</tt> has no effect.</p>

<p>Finally, we read from our temporary file handle and write to our output file and exit. We use the <tt class="function">read</tt><a name="INDEX-1066" /> function to read a chunk of data at a time and then write that chunk to the output file. The size of this chunk is defined by our <tt class="literal">BUFFER_SIZE</tt> constant. In case you are wondering, CGI.pm will remove its temporary file automatically when our script exits (technically, when <tt class="literal">$q</tt> goes out of scope).</p>

<p>There is another way we could have moved the file to our <em class="emphasis">uploads</em> directory. We could use CGI.pm's undocumented <tt class="function">tmpFileName</tt><a name="INDEX-1067" /><a name="INDEX-1068" /><a name="INDEX-1069" /> method to get the name of the temporary file containing the upload and then use <a name="INDEX-1070" />Perl's <tt class="function">rename</tt> function to move the file. However, relying on undocumented code is dangerous, because it may not be compatible with future versions of CGI.pm. Thus, in our example we stick to the public API instead.</p>

<p>The <tt class="function">dir_size</tt><a name="INDEX-1071" /><a name="INDEX-1072" /> subroutine calculates the size of a directory by summing the size of each of its files. The <tt class="function">error</tt> subroutine prints a message telling the user why the transfer failed.
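</p>

<p>For <tt class="function">dir_size</tt>, a slightly more defensive variation might look like the sketch below. It is an assumption-laden illustration rather than the listing from the example: it uses a lexical directory handle, counts plain files only, and tries itself out on a scratch directory created with the standard File::Temp module.</p>

<blockquote><pre class="code">#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw( tempdir );

# Sum the sizes of the plain files directly inside $dir;
# subdirectories (including "." and "..") are skipped.
sub dir_size {
    my $dir = shift;
    my $dir_size = 0;

    opendir my $dh, $dir or die "Unable to open $dir: $!";
    while ( defined( my $entry = readdir $dh ) ) {
        next unless -f "$dir/$entry";
        $dir_size += ( -s _ ) || 0;    # "_" reuses the stat from -f
    }
    closedir $dh;

    return $dir_size;
}

# Demonstration on a throwaway directory
my $dir = tempdir( CLEANUP => 1 );
for my $name ( "a.txt", "b.txt" ) {
    open my $out, ">", "$dir/$name" or die "Unable to write $dir/$name: $!";
    print $out "x" x 10;
    close $out;
}
mkdir "$dir/subdir" or die "Unable to create $dir/subdir: $!";

print dir_size( $dir ), "\n";    # 20: two 10-byte files, subdir ignored</pre></blockquote>

<p>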
In a production application, you probably want to provide links for the user to get help <a name="INDEX-1073" />or to notify <a name="INDEX-1074" /> <a name="INDEX-1,075" />someone <a name="INDEX-1,076" /> <a name="INDEX-1,077" /> <a name="INDEX-1,078" />about problems.</p></div>
<hr align="left" width="515" />
<p><font size="-1"><a href="copyrght.htm">Copyright © 2001</a> O'Reilly &amp; Associates. All rights reserved.</font></p></body></html>