Perl is just as Y2K compliant as your pencil--no more, and no less.
Can you use your pencil to write a non-Y2K-compliant memo?  Of course
you can.  Is that the pencil's fault?  Of course it isn't.

The date and time functions supplied with Perl (gmtime and localtime)
supply adequate information to determine the year well beyond 2000
(2038 is when trouble strikes for 32-bit machines).  The year returned
by these functions when used in a list context is the year minus 1900.
For years between 1910 and 1999 this I<happens> to be a 2-digit decimal
number.  To avoid the year 2000 problem, simply do not treat the year as
a 2-digit number.  It isn't.

When gmtime() and localtime() are used in scalar context they return
a timestamp string that contains a fully expanded year.  For example,
C<$timestamp = gmtime(1005613200)> sets $timestamp to "Tue Nov 13 01:00:00
2001".  There's no year 2000 problem here.

That doesn't mean that Perl can't be used to create non-Y2K-compliant
programs.  It can.  But so can your pencil.  It's the fault of the user,
not the language.  At the risk of inflaming the NRA: ``Perl doesn't
break Y2K, people do.''  See http://language.perl.com/news/y2k.html for
a longer exposition.

=head1 Data: Strings

=head2 How do I validate input?

The answer to this question is usually a regular expression, perhaps
with auxiliary logic.  See the more specific questions (numbers, mail
addresses, etc.) for details.

=head2 How do I unescape a string?

It depends on just what you mean by ``escape''.  URL escapes are dealt
with in L<perlfaq9>.  Shell escapes with the backslash (C<\>)
character are removed with

    s/\\(.)/$1/g;

This won't expand C<"\n"> or C<"\t"> or any other special escapes.

=head2 How do I remove consecutive pairs of characters?

To turn C<"abbcccd"> into C<"abccd">:

    s/(.)\1/$1/g;    # add /s to include newlines

Here's a solution that turns "abbcccd" to "abcd":

    y///cs;    # y == tr, but shorter :-)

=head2 How do I expand function calls in a string?

This is documented in L<perlref>.
In general, this is fraught with
quoting and readability problems, but it is possible.  To interpolate
a subroutine call (in list context) into a string:

    print "My sub returned @{[mysub(1,2,3)]} that time.\n";

If you prefer scalar context, similar chicanery is also useful for
arbitrary expressions:

    print "That yields ${\($n + 5)} widgets\n";

Version 5.004 of Perl had a bug that gave list context to the
expression in C<${...}>, but this is fixed in version 5.005.

See also ``How can I expand variables in text strings?'' in this
section of the FAQ.

=head2 How do I find matching/nesting anything?

This isn't something that can be done in one regular expression, no
matter how complicated.  To find something between two single
characters, a pattern like C</x([^x]*)x/> will get the intervening
bits in $1.  For delimiters longer than one character, something more
like C</alpha(.*?)omega/> is needed.  But none of these deals with
nested patterns, nor can they.  For that you'll have to write a
parser.

If you are serious about writing a parser, there are a number of
modules or oddities that will make your life a lot easier.  There are
the CPAN modules Parse::RecDescent, Parse::Yapp, and Text::Balanced;
and the byacc program.

One simple destructive, inside-out approach that you might try is to
pull out the smallest nesting parts one at a time:

    while (s/BEGIN((?:(?!BEGIN)(?!END).)*)END//gs) {
        # do something with $1
    }

A more complicated and sneaky approach is to make Perl's regular
expression engine do it for you.  This is courtesy Dean Inada, and
rather has the nature of an Obfuscated Perl Contest entry, but it
really does work:

    # $_ contains the string to parse
    # BEGIN and END are the opening and closing markers for the
    # nested text.

    @( = ('(','');
    @) = (')','');
    ($re=$_)=~s/((BEGIN)|(END)|.)/$)[!$3]\Q$1\E$([!$2]/gs;
    @$ = (eval{/$re/},$@!~/unmatched/);
    print join("\n",@$[0..$#$]) if( $$[-1] );

=head2 How do I reverse a string?

Use reverse() in scalar context, as documented in
L<perlfunc/reverse>.
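
Scalar context matters because reverse() in list context reverses the
order of a list, not the characters of a string; a one-element list
comes back unchanged, so C<print reverse $string> prints the string
as-is.  The following sketch (the sample word is arbitrary) contrasts
the two contexts and uses scalar() to force the string-reversing
behavior:

    use strict;
    use warnings;

    my $string = "stressed";

    # List context: reverse() reverses a one-element list,
    # which leaves the string untouched.
    my @list = reverse $string;

    # Scalar context: reverse() concatenates its arguments and
    # reverses the characters of the result.
    my $rev = scalar reverse $string;

    print "list:   @list\n";    # list:   stressed
    print "scalar: $rev\n";     # scalar: desserts

In everyday code, assigning the result to a scalar already supplies
scalar context, which is all the one-line form needs:
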

    $reversed = reverse $string;

=head2 How do I expand tabs in a string?

You can do it yourself:

    1 while $string =~ s/\t+/' ' x (length($&) * 8 - length($`) % 8)/e;

Or you can just use the Text::Tabs module (part of the standard Perl
distribution).

    use Text::Tabs;
    @expanded_lines = expand(@lines_with_tabs);

=head2 How do I reformat a paragraph?

Use Text::Wrap (part of the standard Perl distribution):

    use Text::Wrap;
    print wrap("\t", ' ', @paragraphs);

The paragraphs you give to Text::Wrap should not contain embedded
newlines.  Text::Wrap doesn't justify the lines (flush-right).

=head2 How can I access/change the first N letters of a string?

There are many ways.  If you just want to grab a copy, use
substr():

    $first_byte = substr($a, 0, 1);

If you want to modify part of a string, the simplest way is often to
use substr() as an lvalue:

    substr($a, 0, 3) = "Tom";

Although those with a pattern matching kind of thought process will
likely prefer

    $a =~ s/^.../Tom/;

=head2 How do I change the Nth occurrence of something?

You have to keep track of N yourself.  For example, let's say you want
to change the fifth occurrence of C<"whoever"> or C<"whomever"> into
C<"whosoever"> or C<"whomsoever">, case insensitively.  These
all assume that $_ contains the string to be altered.

    $count = 0;
    s{((whom?)ever)}{
        ++$count == 5           # is it the 5th?
            ? "${2}soever"      # yes, swap
            : $1                # renege and leave it there
    }ige;

In the more general case, you can use the C</g> modifier in a C<while>
loop, keeping count of matches.

    $WANT = 3;
    $count = 0;
    $_ = "One fish two fish red fish blue fish";
    while (/(\w+)\s+fish\b/gi) {
        if (++$count == $WANT) {
            print "The third fish is a $1 one.\n";
        }
    }

That prints out: C<"The third fish is a red one.">  You can also use a
repetition count and repeated pattern like this:

    /(?:\w+\s+fish\s+){2}(\w+)\s+fish/i;

=head2 How can I count the number of occurrences of a substring within a string?

There are a number of ways, with varying efficiency.
If you want a
count of a certain single character (X) within a string, you can use the
C<tr///> function like so:

    $string = "ThisXlineXhasXsomeXx'sXinXit";
    $count = ($string =~ tr/X//);
    print "There are $count X characters in the string";

This is fine if you are just looking for a single character.  However,
if you are trying to count multi-character substrings within a
larger string, C<tr///> won't work.  What you can do is wrap a while()
loop around a global pattern match.  For example, let's count negative
integers:

    $string = "-9 55 48 -2 23 -76 4 14 -44";
    $count = 0;
    while ($string =~ /-\d+/g) { $count++ }
    print "There are $count negative numbers in the string";

=head2 How do I capitalize all the words on one line?

To make the first letter of each word upper case:

    $line =~ s/\b(\w)/\U$1/g;

This has the strange effect of turning "C<don't do it>" into "C<Don'T
Do It>".  Sometimes you might want this.  Other times you might need a
more thorough solution (Suggested by brian d. foy):

    $string =~ s/ (
                 (^\w)    #at the beginning of the line
                   |      # or
                 (\s\w)   #preceded by whitespace
                   )
                /\U$1/xg;
    $string =~ s/([\w']+)/\u\L$1/g;

To make the whole line upper case:

    $line = uc($line);

To force each word to be lower case, with the first letter upper case:

    $line =~ s/(\w+)/\u\L$1/g;

You can (and probably should) enable locale awareness of those
characters by placing a C<use locale> pragma in your program.
See L<perllocale> for endless details on locales.

This is sometimes referred to as putting something into "title
case", but that's not quite accurate.  Consider the proper
capitalization of the movie I<Dr. Strangelove or: How I Learned to
Stop Worrying and Love the Bomb>, for example.

=head2 How can I split a [character] delimited string except when inside [character]? (Comma-separated files)

Take the example case of trying to split a string that is comma-separated
into its different fields.  (We'll pretend you said comma-separated, not
comma-delimited, which is different and almost never what you mean.)
You
can't use C<split(/,/)> because you shouldn't split if the comma is inside
quotes.  For example, take a data line like this:

    SAR001,"","Cimetrix, Inc","Bob Smith","CAM",N,8,1,0,7,"Error, Core Dumped"

Due to the restriction of the quotes, this is a fairly complex
problem.  Thankfully, we have Jeffrey Friedl, author of a highly
recommended book on regular expressions, to handle these for us.  He
suggests (assuming your string is contained in $text):

     @new = ();
     push(@new, $+) while $text =~ m{
         "([^\"\\]*(?:\\.[^\"\\]*)*)",?  # groups the phrase inside the quotes
       | ([^,]+),?
       | ,
     }gx;
     push(@new, undef) if substr($text,-1,1) eq ',';

If you want to represent quotation marks inside a
quotation-mark-delimited field, escape them with backslashes (e.g.,
C<"like \"this\"">).  Unescaping them is a task addressed earlier in
this section.

Alternatively, the Text::ParseWords module (part of the standard Perl
distribution) lets you say:

    use Text::ParseWords;
    @new = quotewords(",", 0, $text);

There's also a Text::CSV (Comma-Separated Values) module on CPAN.

=head2 How do I strip blank space from the beginning/end of a string?

Although the simplest approach would seem to be

    $string =~ s/^\s*(.*?)\s*$/$1/;

not only is this unnecessarily slow and destructive, it also fails with
embedded newlines.  It is much faster to do this operation in two steps:

    $string =~ s/^\s+//;
    $string =~ s/\s+$//;

Or more nicely written as:

    for ($string) {
        s/^\s+//;
        s/\s+$//;
    }

This idiom takes advantage of the C<foreach> loop's aliasing
behavior to factor out common code.  You can do this
on several strings at once, or arrays, or even the
values of a hash if you use a slice:

    # trim whitespace in the scalar, the array,
    # and all the values in the hash
    foreach ($scalar, @array, @hash{keys %hash}) {
        s/^\s+//;
        s/\s+$//;
    }

=head2 How do I pad a string with blanks or pad a number with zeroes?

(This answer contributed by Uri Guttman, with kibitzing from
Bart Lateur.)
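
As a preview of the sprintf() technique described next, here is a
sketch that passes the width through sprintf()'s C<*> field width
instead of interpolating it into the format string.  The subroutine
names C<left_pad>, C<right_pad>, and C<zero_pad> are hypothetical
helpers for illustration, not part of any module:

    use strict;
    use warnings;

    # Hypothetical helpers: the '*' in each format makes sprintf()
    # read the field width from the argument list.
    sub left_pad  { my ($text, $len) = @_; return sprintf("%*s",  $len, $text) }
    sub right_pad { my ($text, $len) = @_; return sprintf("%-*s", $len, $text) }
    sub zero_pad  { my ($num,  $len) = @_; return sprintf("%0*d", $len, $num)  }

    print "[", left_pad("abc", 6),  "]\n";   # [   abc]
    print "[", right_pad("abc", 6), "]\n";   # [abc   ]
    print "[", zero_pad(42, 5),     "]\n";   # [00042]

Like the interpolated formats, these helpers never truncate: a string
already longer than the requested width comes back unchanged.
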

In the following examples, C<$pad_len> is the length to which you wish
to pad the string, C<$text> or C<$num> contains the string to be padded,
and C<$pad_char> contains the padding character.  You can use a single
character string constant instead of the C<$pad_char> variable if you
know what it is in advance.  And in the same way you can use an integer in
place of C<$pad_len> if you know the pad length in advance.

The simplest method uses the C<sprintf> function.  It can pad on the left
or right with blanks and on the left with zeroes and it will not
truncate the result.  The C<pack> function can only pad strings on the
right with blanks and it will truncate the result to a maximum length of
C<$pad_len>.

    # Left padding a string with blanks (no truncation):
    $padded = sprintf("%${pad_len}s", $text);

    # Right padding a string with blanks (no truncation):
    $padded = sprintf("%-${pad_len}s", $text);

    # Left padding a number with 0 (no truncation):
    $padded = sprintf("%0${pad_len}d", $num);

    # Right padding a string with blanks using pack (will truncate):
    $padded = pack("A$pad_len",$text);

If you need to pad with a character other than blank or zero you can use
one of the following methods.  They all generate a pad string with the
C<x> operator and combine that with C<$text>.  These methods do
not truncate C<$text>.

Left and right padding with any character, creating a new string:

    $padded = $pad_char x ( $pad_len - length( $text ) ) . $text;
    $padded = $text . $pad_char x ( $pad_len - length( $text ) );

Left and right padding with any character, modifying C<$text> directly:

    substr( $text, 0, 0 ) = $pad_char x ( $pad_len - length( $text ) );
    $text .= $pad_char x ( $pad_len - length( $text ) );

=head2 How do I extract selected columns from a string?

Use substr() or unpack(), both documented in L<perlfunc>.
If you prefer thinking in terms of columns instead of widths,
you can use this kind of thing:

    # determine the unpack format needed to split Linux ps output
    # arguments are cut columns
    my $fmt = cut2fmt(8, 14, 20, 26, 30, 34, 41, 47, 59, 63, 67, 72);

    sub cut2fmt {
        my(@positions) = @_;
        my $template  = '';