valid email addresses, you should accept quoted elements. Only elements on the left side of the "@" may be quoted, but any ASCII character is allowed within quotes (some have to be escaped with a backslash). This is why any check in our code for "invalid characters" in an email address would be flawed, and this is why it is very dangerous to pass email addresses through a shell as an argument to a command.</p>
<blockquote><a name="FOOTNOTE-18" /><p>[18] RFC 822 more technically refers to this as an "atom."</p></blockquote>
<p>The second email address also includes <a name="INDEX-1844" />spaces. Spaces (and tabs) are legal between any two elements and at the beginning and end of the email address. Removing them does not change the meaning, however, and that is exactly what mailers generally do when you send a message to an email address containing spaces. Note that you cannot simply remove every space in an email address, since spaces appearing within quotes do carry meaning and must be left intact. Only those appearing outside of quotes can be removed. We will strip them in our example. We probably don't have to; it is not unreasonable to expect your users to enter the email address without extra spaces.</p>
<p>The last example contains <a name="INDEX-1845" />comments. It is perfectly legal to include comments, which are enclosed within parentheses, anywhere spaces are allowed. Comments are only intended to pass additional information to humans, and machines can ignore them. Thus, it is rather silly to enter them into an automated web form. We will simplify our code by not accepting comments in the email addresses we are checking.</p>
<p>So here is the code that we will use to validate email addresses. It is considerably shorter than the example given by Mr. Friedl, but it is not nearly so flexible. It does not support comments, it removes spaces before validating, and it limits <a name="INDEX-1846" />hosts to modern domain names and IP addresses.
Nonetheless, it is quite complicated, and the regular expression to perform the check would be too difficult to type out in one piece. Instead, we build it through a number of intermediate variables. The process of doing this is too involved to explain here. If you want to understand how to build complex regular expressions like this, we highly recommend <em class="citetitle">Mastering Regular Expressions</em>.</p>
<p>One note, however: the variable <tt class="literal">$top_level</tt> contains the expression that matches valid <a name="INDEX-1847" />top-level domains. Our current top-level domains have two (e.g., <tt class="literal">.us</tt>, <tt class="literal">.uk</tt>, <tt class="literal">.au</tt>) or three letters (e.g., <tt class="literal">.com</tt>, <tt class="literal">.org</tt>, <tt class="literal">.net</tt>). The number of top-level domains will certainly increase. Some of the proposed new names, such as <tt class="literal">.firm</tt>, have more than three characters. Thus, the regular expression below will allow anywhere from two to four characters:</p>
<blockquote><pre class="code">my $top_level = qq{ (?: $atom_char ){2,4} };</pre></blockquote>
<p>If you want to be more restrictive today, you can lower the upper limit to three. Likewise, if top-level domains with more than four characters are someday allowed, you would need to raise it.</p>
<p>Finally, here's the <a name="INDEX-1848" /><a name="INDEX-1849" /><a name="INDEX-1850" />code:</p>
<blockquote><pre class="code">sub validate_email_address {
    my $addr_to_check = shift;

    # Strip spaces and tabs that appear outside of quoted strings
    $addr_to_check =~ s/("(?:[^"\\]|\\.)*"|[^\t "]*)[ \t]*/$1/g;

    # Character ranges used as building blocks
    my $esc         = '\\\\';
    my $space       = '\040';
    my $ctrl        = '\000-\037';
    my $dot         = '\.';
    my $nonASCII    = '\x80-\xff';
    my $CRlist      = '\012\015';
    my $letter      = 'a-zA-Z';
    my $digit       = '\d';

    # Atoms, and a single byte (0-255) of an IP address
    my $atom_char   = qq{ [^$space<>\@,;:".\\[\\]$esc$ctrl$nonASCII] };
    my $atom        = qq{ $atom_char+ };
    my $byte        = qq{ (?: 1?$digit?$digit |
                              2[0-4]$digit    |
                              25[0-5]         ) };

    # Quoted strings, allowed only in the local part
    my $qtext       = qq{ [^$esc$nonASCII$CRlist"] };
    my $quoted_pair = qq{ $esc [^$nonASCII] };
    my $quoted_str  = qq{ " (?: $qtext | $quoted_pair )* " };

    # The host is either a domain name or a bracketed IP address
    my $word        = qq{ (?: $atom | $quoted_str ) };
    my $ip_address  = qq{ \\[ $byte (?: $dot $byte ){3} \\] };
    my $sub_domain  = qq{ [$letter$digit]
                          [$letter$digit-]{0,61} [$letter$digit]};
    my $top_level   = qq{ (?: $atom_char ){2,4} };
    my $domain_name = qq{ (?: $sub_domain $dot )+ $top_level };
    my $domain      = qq{ (?: $domain_name | $ip_address ) };
    my $local_part  = qq{ $word (?: $dot $word )* };
    my $address     = qq{ $local_part \@ $domain };

    # Return the stripped address if it matches, or a false value otherwise
    return $addr_to_check =~ /^$address$/ox ? $addr_to_check : "";
}</pre></blockquote>
<p>If you supply an email address to <tt class="function">validate_email_address</tt>, it will strip out any spaces or tabs that are not within quotes. We're being a little lenient here, since spaces within elements (as opposed to spaces <em class="emphasis">around</em> elements) are actually illegal, but we'll just strip them in this step along with the legal spaces. We then check the address against our <a name="INDEX-1851" />regular expression. If it matches, the email address is valid and is returned (without spaces). Otherwise, an empty string is returned, which evaluates to false in Perl. You can use the subroutine like so:</p>
<blockquote><pre class="code">use strict;
use CGI;
use CGIBook::Error;

my $q = CGI->new;
my $email = validate_email_address( scalar $q->param( "email" ) );

unless ( $email ) {
    error( $q, "The email address you entered is invalid. " .
               "Please use your browser's Back button to " .
               "return to the form and try again." );
}
.
.</pre></blockquote>
<p>If you were planning to check multiple email addresses or intended to use this in an environment where your Perl code is precompiled (like <em class="emphasis">mod_perl</em> or FastCGI), then you could optimize this code by building the regular expression once and caching this expression.
However, this example is intended more to demonstrate why validating email addresses is a challenge than to be used in production (it does not resolve the issue that an email address can be syntactically valid yet still be bad).</p></div>
<hr align="left" width="515" />
<p><font size="-1"><a href="copyrght.htm">Copyright © 2001</a> O'Reilly & Associates. All rights reserved.</font></p>
</body></html>