=head1 NAME

perlfaq6 - Regular Expressions ($Revision: 10126 $)

=head1 DESCRIPTION

This section is surprisingly small because the rest of the FAQ is
littered with answers involving regular expressions.  For example,
decoding a URL and checking whether something is a number are handled
with regular expressions, but those answers are found elsewhere in
this document (in L<perlfaq9>: "How do I decode or create those %-encodings
on the web" and L<perlfaq4>: "How do I determine whether a scalar is
a number/whole/integer/float", to be precise).

=head2 How can I hope to use regular expressions without creating illegible and unmaintainable code?
X<regex, legibility> X<regexp, legibility>
X<regular expression, legibility> X</x>

Three techniques can make regular expressions maintainable and
understandable.

=over 4

=item Comments Outside the Regex

Describe what you're doing and how you're doing it, using normal Perl
comments.

	# turn the line into the first word, a colon, and the
	# number of characters on the rest of the line
	s/^(\w+)(.*)/ lc($1) . ":" . length($2) /meg;

=item Comments Inside the Regex

The C</x> modifier causes whitespace to be ignored in a regex pattern
(except in a character class), and also allows you to use normal
comments there, too.  As you can imagine, whitespace and comments help
a lot.

C</x> lets you turn this:

	s{<(?:[^>'"]*|".*?"|'.*?')+>}{}gs;

into this:

	s{ <                    # opening angle bracket
		(?:                 # Non-backreffing grouping paren
			[^>'"] *        # 0 or more things that are neither > nor ' nor "
				|           #    or else
			".*?"           # a section between double quotes (stingy match)
				|           #    or else
			'.*?'           # a section between single quotes (stingy match)
		) +                 #   all occurring one or more times
		>                   # closing angle bracket
	}{}gsx;                 # replace with nothing, i.e. delete

It's still not quite so clear as prose, but it is very useful for
describing the meaning of each part of the pattern.

=item Different Delimiters

While we normally think of patterns as being delimited with C</>
characters, they can be delimited by almost any character.  L<perlre>
describes this.  For example, the C<s///> above uses braces as
delimiters.  Selecting another delimiter can avoid quoting the
delimiter within the pattern:

	s/\/usr\/local/\/usr\/share/g;	# bad delimiter choice
	s#/usr/local#/usr/share#g;		# better

=back

=head2 I'm having trouble matching over more than one line.  What's wrong?
X<regex, multiline> X<regexp, multiline> X<regular expression, multiline>

Either you don't have more than one line in the string you're looking
at (probably), or else you aren't using the correct modifier(s) on
your pattern (possibly).

There are many ways to get multiline data into a string.  If you want
it to happen automatically while reading input, you'll want to set $/
(probably to '' for paragraphs or C<undef> for the whole file) to
allow you to read more than one line at a time.

Read L<perlre> to help you decide which of C</s> and C</m> (or both)
you might want to use: C</s> allows dot to include newline, and C</m>
allows caret and dollar to match next to a newline, not just at the
end of the string.  You do need to make sure that you've actually
got a multiline string in there.

For example, this program detects duplicate words, even when they span
line breaks (but not paragraph ones).  For this example, we don't need
C</s> because we aren't using dot in a regular expression that we want
to cross line boundaries.  Neither do we need C</m> because we aren't
wanting caret or dollar to match at any point inside the record next
to newlines.  But it's imperative that $/ be set to something other
than the default, or else we won't actually ever have a multiline
record read in.

	$/ = '';  		# read in a whole paragraph, not just one line
	while ( <> ) {
		while ( /\b([\w'-]+)(\s+\1)+\b/gi ) {  	# word starts alpha
			print "Duplicate $1 at paragraph $.\n";
		}
	}

Here's code that finds lines that begin with "From " (which would
be mangled by many mailers):

	$/ = '';  		# read in a whole paragraph, not just one line
	while ( <> ) {
		while ( /^From /gm ) { # /m makes ^ match next to \n
			print "leading from in paragraph $.\n";
		}
	}

Here's code that finds everything between START and END in a paragraph:

	undef $/;  		# read in whole file, not just one line or paragraph
	while ( <> ) {
		while ( /START(.*?)END/sgm ) { # /s makes . cross line boundaries
			print "$1\n";
		}
	}

=head2 How can I pull out lines between two patterns that are themselves on different lines?
X<..>

You can use Perl's somewhat exotic C<..> operator (documented in
L<perlop>):

	perl -ne 'print if /START/ .. /END/' file1 file2 ...

If you wanted text and not lines, you would use

	perl -0777 -ne 'print "$1\n" while /START(.*?)END/gs' file1 file2 ...

But if you want nested occurrences of C<START> through C<END>, you'll
run up against the problem described in the question in this section
on matching balanced text.

Here's another example of using C<..>:

	while (<>) {
		$in_header =   1  .. /^$/;
		$in_body   = /^$/ .. eof;
	# now choose between them
	} continue {
		$. = 0 if eof;	# fix $.
	}

=head2 I put a regular expression into $/ but it didn't work. What's wrong?
X<$/, regexes in> X<$INPUT_RECORD_SEPARATOR, regexes in>
X<$RS, regexes in>

$/ has to be a string.  You can use these examples if you really need to
do this.

If you have File::Stream, this is easy.

	use File::Stream;
	my $stream = File::Stream->new(
		$filehandle,
		separator => qr/\s*,\s*/,
		);

	print "$_\n" while <$stream>;

If you don't have File::Stream, you have to do a little more work.

You can use the four argument form of sysread to continually add to
a buffer.
After you add to the buffer, you check if you have a
complete line (using your regular expression).

	local $_ = "";
	while( sysread FH, $_, 8192, length ) {
		# remove each complete record from the front of the buffer
		while( s/^((?s).*?)your_pattern// ) {
			my $record = $1;
			# do stuff here.
		}
	}

You can do the same thing with foreach and a match using the
c flag and the \G anchor, if you do not mind your entire file
being in memory at the end.

	local $_ = "";
	while( sysread FH, $_, 8192, length ) {
		foreach my $record ( m/\G((?s).*?)your_pattern/gc ) {
			# do stuff here.
		}
		substr( $_, 0, pos ) = "" if pos;
	}

=head2 How do I substitute case insensitively on the LHS while preserving case on the RHS?
X<replace, case preserving> X<substitute, case preserving>
X<substitution, case preserving> X<s, case preserving>

Here's a lovely Perlish solution by Larry Rosler.  It exploits
properties of bitwise xor on ASCII strings.

	$_= "this is a TEsT case";

	$old = 'test';
	$new = 'success';

	s{(\Q$old\E)}
	{ uc $new | (uc $1 ^ $1) .
		(uc(substr $1, -1) ^ substr $1, -1) x
		(length($new) - length $1)
	}egi;

	print;

And here it is as a subroutine, modeled after the above:

	sub preserve_case($$) {
		my ($old, $new) = @_;
		my $mask = uc $old ^ $old;

		uc $new | $mask .
			substr($mask, -1) x (length($new) - length($old))
	}

	$a = "this is a TEsT case";
	$a =~ s/(test)/preserve_case($1, "success")/egi;
	print "$a\n";

This prints:

	this is a SUcCESS case

As an alternative, to keep the case of the replacement word if it is
longer than the original, you can use this code, by Jeff Pinyan:

	sub preserve_case {
		my ($from, $to) = @_;
		my ($lf, $lt) = map length, @_;

		if ($lt < $lf) { $from = substr $from, 0, $lt }
		else { $from .= substr $to, $lf }

		return uc $to | ($from ^ uc $from);
	}

This changes the sentence to "this is a SUcCess case."

Just to show that C programmers can write C in any programming language,
if you prefer a more C-like solution, the following script makes the
substitution have the same case, letter by letter, as the original.
(It also happens to run about 240% slower than the Perlish solution runs.)
If the substitution has more characters than the string being substituted,
the case of the last character is used for the rest of the substitution.

	# Original by Nathan Torkington, massaged by Jeffrey Friedl
	#
	sub preserve_case($$)
	{
		my ($old, $new) = @_;
		my ($state) = 0; # 0 = no change; 1 = lc; 2 = uc
		my ($i, $oldlen, $newlen, $c) = (0, length($old), length($new));
		my ($len) = $oldlen < $newlen ? $oldlen : $newlen;

		for ($i = 0; $i < $len; $i++) {
			if ($c = substr($old, $i, 1), $c =~ /[\W\d_]/) {
				$state = 0;
			} elsif (lc $c eq $c) {
				substr($new, $i, 1) = lc(substr($new, $i, 1));
				$state = 1;
			} else {
				substr($new, $i, 1) = uc(substr($new, $i, 1));
				$state = 2;
			}
		}
		# finish up with any remaining new (for when new is longer than old)
		if ($newlen > $oldlen) {
			if ($state == 1) {
				substr($new, $oldlen) = lc(substr($new, $oldlen));
			} elsif ($state == 2) {
				substr($new, $oldlen) = uc(substr($new, $oldlen));
			}
		}
		return $new;
	}

=head2 How can I make C<\w> match national character sets?
X<\w>

Put C<use locale;> in your script.
The C<\w> character class is taken
from the current locale.

See L<perllocale> for details.

=head2 How can I match a locale-smart version of C</[a-zA-Z]/>?
X<alpha>

You can use the POSIX character class syntax C</[[:alpha:]]/>
documented in L<perlre>.

No matter which locale you are in, the alphabetic characters are
the characters in C<\w> without the digits and the underscore.
As a regex, that looks like C</[^\W\d_]/>.  Its complement,
the non-alphabetics, is then everything in C<\W> along with
the digits and the underscore, or C</[\W\d_]/>.

=head2 How can I quote a variable to use in a regex?
X<regex, escaping> X<regexp, escaping> X<regular expression, escaping>

The Perl parser will expand $variable and @variable references in
regular expressions unless the delimiter is a single quote.  Remember,
too, that the right-hand side of a C<s///> substitution is considered
a double-quoted string (see L<perlop> for more details).  Remember
also that any regex special characters will be acted on unless you
precede the substitution with C<\Q>.  Here's an example:

	$string = "Placido P. Octopus";
	$regex  = "P.";

	$string =~ s/$regex/Polyp/;
	# $string is now "Polypacido P. Octopus"

Because C<.> is special in regular expressions, and can match any
single character, the regex C<P.> here has matched the C<Pl> in the
original string.

To escape the special meaning of C<.>, we use C<\Q>:

	$string = "Placido P. Octopus";
	$regex  = "P.";

	$string =~ s/\Q$regex/Polyp/;
	# $string is now "Placido Polyp Octopus"
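
As a small supplementary sketch (not part of the original answer), the
following script runs the two substitutions above side by side and adds a
third variant that quotes the string with the built-in C<quotemeta>
function first, which should give the same result as C<\Q>.  The variable
names C<$unquoted>, C<$quoted>, C<$safe> and C<$via_quotemeta> are made up
for this illustration.

```perl
use strict;
use warnings;

my $regex = "P.";

# Without quoting, the dot matches any character, so "P." matches "Pl".
(my $unquoted = "Placido P. Octopus") =~ s/$regex/Polyp/;

# \Q...\E quotes the metacharacters in the interpolated text.
(my $quoted = "Placido P. Octopus") =~ s/\Q$regex\E/Polyp/;

# quotemeta() builds the quoted string ahead of time.
my $safe = quotemeta $regex;
(my $via_quotemeta = "Placido P. Octopus") =~ s/$safe/Polyp/;

print "$unquoted\n";        # Polypacido P. Octopus
print "$quoted\n";          # Placido Polyp Octopus
print "$via_quotemeta\n";   # Placido Polyp Octopus
```

Quoting ahead of time with C<quotemeta> may be convenient when the same
literal string is reused in several patterns.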
