=head1 NAME
perlfaq4 - Data Manipulation ($Revision: 1.26 $, $Date: 1998/08/05 12:04:00 $)
=head1 DESCRIPTION
This section of the FAQ answers questions related to the manipulation
of data as numbers, dates, strings, arrays, hashes, and miscellaneous
data issues.
=head1 Data: Numbers
=head2 Why am I getting long decimals (eg, 19.9499999999999) instead of the numbers I should be getting (eg, 19.95)?
The infinite set that a mathematician thinks of as the real numbers can
only be approximated on a computer, since the computer only has a finite
number of bits to store an infinite number of, um, numbers.
Internally, your computer represents floating-point numbers in binary.
Floating-point numbers read in from a file or appearing as literals
in your program are converted from their decimal floating-point
representation (eg, 19.95) to the internal binary representation.
However, 19.95 can't be precisely represented as a binary
floating-point number, just like 1/3 can't be exactly represented as a
decimal floating-point number. The computer's binary representation
of 19.95, therefore, isn't exactly 19.95.
When a floating-point number gets printed, the binary floating-point
representation is converted back to decimal. These decimal numbers
are displayed in either the format you specify with printf(), or the
current output format for numbers (see L<perlvar/"$#">) if you use
print. C<$#> has a different default value in Perl5 than it did in
Perl4. Changing C<$#> yourself is deprecated.
This affects B<all> computer languages that represent decimal
floating-point numbers in binary, not just Perl. Perl provides
arbitrary-precision decimal numbers with the Math::BigFloat module
(part of the standard Perl distribution), but mathematical operations
are consequently slower.
To get rid of the superfluous digits, just use a format (eg,
C<printf("%.2f", 19.95)>) to get the required precision.
See L<perlop/"Floating-point Arithmetic">.
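As a rough illustration of the points above, the following sketch prints 19.95 with far more digits than it deserves, then formats it back to two places. The variable names and the tolerance of 1e-9 are choices made for this example only; pick a tolerance that suits your own data.

```perl
use strict;
use warnings;

my $price = 19.95;

# Asking for 20 decimal places exposes the binary approximation.
my $raw = sprintf("%.20f", $price);

# Formatting to the precision you actually want hides it again.
my $shown = sprintf("%.2f", $price);

# Compare floating-point values with a tolerance rather than with ==.
my $sum        = 0.1 + 0.2;
my $same_exact = ($sum == 0.3) ? 1 : 0;
my $same_close = (abs($sum - 0.3) < 1e-9) ? 1 : 0;

print "raw: $raw\nshown: $shown\n";
print "exact: $same_exact, close: $same_close\n";
```

The exact digits printed for C<$raw> depend on your platform's floating-point format, which is rather the point.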
=head2 Why isn't my octal data interpreted correctly?
Perl only understands octal and hex numbers as such when they occur
as literals in your program. If they are read in from somewhere and
assigned, no automatic conversion takes place. You must explicitly
use oct() or hex() if you want the values converted. oct() interprets
both hex ("0x350") numbers and octal ones ("0350" or even without the
leading "0", like "377"), while hex() only converts hexadecimal ones,
with or without a leading "0x", like "0x255", "3A", "ff", or "deadbeef".
This problem shows up most often when people try using chmod(), mkdir(),
umask(), or sysopen(), which all want permissions in octal.
chmod(644, $file); # WRONG -- perl -w catches this
chmod(0644, $file); # right
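Here is a small sketch of the conversions described above, applied to strings of the kind you might read from a file or from user input. The variable names are made up for the example.

```perl
use strict;
use warnings;

my $from_octal = oct("0350");     # octal digits with a leading 0
my $from_bare  = oct("377");      # oct() assumes octal without a prefix
my $from_hex   = oct("0x350");    # oct() also honors a leading 0x
my $hex_only   = hex("ff");       # hex() always reads hexadecimal

# A permission string read at run time must go through oct()
# before being handed to chmod() and friends.
my $mode_string = "644";
my $mode        = oct($mode_string);

printf "%d %d %d %d %o\n",
    $from_octal, $from_bare, $from_hex, $hex_only, $mode;
```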
=head2 Does perl have a round function? What about ceil() and floor()? Trig functions?
Remember that int() merely truncates toward 0. For rounding to a
certain number of digits, sprintf() or printf() is usually the easiest
route.
printf("%.3f", 3.1415926535); # prints 3.142
The POSIX module (part of the standard perl distribution) implements
ceil(), floor(), and a number of other mathematical and trigonometric
functions.
use POSIX;
$ceil = ceil(3.5); # 4
$floor = floor(3.5); # 3
In 5.000 to 5.003 Perls, trigonometry was done in the Math::Complex
module. With 5.004, the Math::Trig module (part of the standard perl
distribution) implements the trigonometric functions. Internally it
uses the Math::Complex module and some functions can break out from
the real axis into the complex plane, for example the inverse sine of
2.
Rounding in financial applications can have serious implications, and
the rounding method used should be specified precisely. In these
cases, it probably pays not to trust whichever system rounding is
being used by Perl, but to instead implement the rounding function you
need yourself.
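If you do write your own rounding function, a sketch like the following may be a starting point. It contrasts int() with POSIX floor() and ceil() on negative numbers, and defines a hypothetical C<round_half_up> helper (not part of Perl or POSIX) that sends exact halves toward positive infinity. Whether that is the rule your application needs is for you to decide.

```perl
use strict;
use warnings;
use POSIX qw(floor ceil);

# int() truncates toward zero; floor() and ceil() do not.
my @truncated = (int(3.7),   int(-3.7));     # 3, -3
my @floored   = (floor(3.7), floor(-3.7));   # 3, -4
my @ceiled    = (ceil(3.2),  ceil(-3.2));    # 4, -3

# Hypothetical helper: round to the nearest integer, with exact
# halves going toward positive infinity.
sub round_half_up {
    my ($x) = @_;
    return floor($x + 0.5);
}

print join(" ", map { round_half_up($_) } 2.4, 2.5, -2.5), "\n";
```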
=head2 How do I convert bits into ints?
To turn a string of 1s and 0s like C<10110110> into a scalar containing
its numeric value, use the pack() function (documented in
L<perlfunc/"pack">) to build the byte and ord() to read it as a number:
$decimal = ord(pack('B8', '10110110'));
Here's an example of going the other way:
$binary_string = join('', unpack('B*', "\x29"));
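The round trip can be sketched as follows. The oct() line is an alternative not mentioned above; it assumes Perl 5.6 or later, where oct() understands a leading C<0b>.

```perl
use strict;
use warnings;

my $bits = '10110110';

# pack() builds a one-byte string; ord() reads that byte as a number.
my $via_pack = ord(pack('B8', $bits));

# Alternative: oct() understands a leading "0b" (Perl 5.6 and later).
my $via_oct = oct("0b$bits");

# And back again: unpack() turns the byte into a string of bits.
my $round_trip = unpack('B8', chr($via_pack));

print "$via_pack $via_oct $round_trip\n";
```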
=head2 How do I multiply matrices?
Use the Math::Matrix or Math::MatrixReal modules (available from CPAN)
or the PDL extension (also available from CPAN).
=head2 How do I perform an operation on a series of integers?
To call a function on each element in an array, and collect the
results, use:
@results = map { my_func($_) } @array;
For example:
@triple = map { 3 * $_ } @single;
To call a function on each element of an array, but ignore the
results:
foreach $iterator (@array) {
&my_func($iterator);
}
To call a function on each integer in a (small) range, you B<can> use:
@results = map { &my_func($_) } (5 .. 25);
but you should be aware that the C<..> operator creates an array of
all integers in the range. This can take a lot of memory for large
ranges. Instead use:
@results = ();
for ($i=5; $i < 500_005; $i++) {
push(@results, &my_func($i));
}
=head2 How can I output Roman numerals?
Get the http://www.perl.com/CPAN/modules/by-module/Roman module.
=head2 Why aren't my random numbers random?
The short explanation is that you're getting pseudorandom numbers, not
random ones, because computers are good at being predictable and bad
at being random (despite appearances caused by bugs in your programs
:-). A longer explanation is available on
http://www.perl.com/CPAN/doc/FMTEYEWTK/random, courtesy of Tom
Phoenix. John von Neumann said, ``Anyone who attempts to generate
random numbers by deterministic means is, of course, living in a state
of sin.''
You should also check out the Math::TrulyRandom module from CPAN. It
uses the imperfections in your system's timer to generate random
numbers, but this takes quite a while. If you want a better
pseudorandom generator than comes with your operating system, look at
``Numerical Recipes in C'' at http://nr.harvard.edu/nr/bookc.html .
=head1 Data: Dates
=head2 How do I find the week-of-the-year/day-of-the-year?
The day of the year is in the array returned by localtime() (see
L<perlfunc/"localtime">):
$day_of_year = (localtime(time()))[7];
or more legibly (in 5.004 or higher):
use Time::localtime;
$day_of_year = localtime(time())->yday;
You can find the week of the year by dividing this by 7:
$week_of_year = int($day_of_year / 7);
Of course, this assumes that weeks are numbered from zero. The Date::Calc
module from CPAN has a lot of date calculation functions, including
day of the year, week of the year, and so on. Note that not
all businesses consider ``week 1'' to be the same; for example,
American businesses often consider the first week with a Monday
in it to be Work Week #1, despite ISO 8601, which considers
WW1 to be the first week with a Thursday in it.
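For a reproducible sketch of the simple scheme, the example below uses a fixed timestamp and gmtime() instead of the current local time. The strftime() call from the standard POSIX module is an addition for comparison; its C<%j> counts days from 001, whereas the list returned by gmtime() counts from 0.

```perl
use strict;
use warnings;
use POSIX qw(strftime);

# A fixed timestamp (13 Nov 2001, 01:00:00 UTC) keeps the example
# reproducible; gmtime() avoids any dependence on the local time zone.
my @t = gmtime(1005613200);

my $day_of_year  = $t[7];                   # counts from 0
my $week_of_year = int($day_of_year / 7);   # the simple scheme from the text

# strftime()'s %j is the day of the year counted from 001.
my $julian = strftime("%j", @t);

print "$day_of_year $week_of_year $julian\n";
```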
=head2 How can I compare two dates and find the difference?
If you're storing your dates as epoch seconds then simply subtract one
from the other. If you've got a structured date (distinct year, day,
month, hour, minute, seconds values) then use either the Date::Manip
or the Date::Calc module from CPAN.
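The epoch-seconds case might look like this. The sketch uses timegm() from the standard Time::Local module so the result does not depend on your time zone; with local times, remember that a day is not always 86400 seconds long when daylight saving time changes.

```perl
use strict;
use warnings;
use Time::Local qw(timegm);

# timegm() takes (sec, min, hour, mday, mon, year); mon counts from 0.
my $start = timegm(0, 0, 0, 1, 0, 2000);    # 1 Jan 2000, 00:00:00 UTC
my $end   = timegm(0, 0, 0, 1, 0, 2001);    # 1 Jan 2001, 00:00:00 UTC

my $seconds = $end - $start;
my $days    = $seconds / (24 * 60 * 60);

print "$seconds seconds, or $days days\n";
```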
=head2 How can I take a string and turn it into epoch seconds?
If it's a regular enough string that it always has the same format,
you can split it up and pass the parts to C<timelocal> in the standard
Time::Local module. Otherwise, you should look into the Date::Calc
and Date::Manip modules from CPAN.
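As a minimal sketch of the fixed-format case, the code below assumes timestamps of the form C<YYYY-MM-DD hh:mm:ss> in UTC and therefore uses timegm(), the UTC sibling of C<timelocal> in the same module. Swap in timelocal() if your strings are in local time.

```perl
use strict;
use warnings;
use Time::Local qw(timegm);

my $stamp = "2001-11-13 01:00:00";   # assumed format: YYYY-MM-DD hh:mm:ss

my ($year, $mon, $mday, $hour, $min, $sec) =
    $stamp =~ /^(\d{4})-(\d\d)-(\d\d) (\d\d):(\d\d):(\d\d)$/
    or die "unrecognized date: $stamp";

# Months count from 0 in timegm() and timelocal().
my $epoch = timegm($sec, $min, $hour, $mday, $mon - 1, $year);

print "$epoch\n";
```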
=head2 How can I find the Julian Day?
Neither Date::Manip nor Date::Calc deals with Julian days. Instead,
there is an example of Julian date calculation that should help you in
http://www.perl.com/CPAN/authors/David_Muir_Sharnoff/modules/Time/JulianDay.pm.gz
.
=head2 Does Perl have a year 2000 problem? Is Perl Y2K compliant?
Short answer: No, Perl does not have a Year 2000 problem. Yes,
Perl is Y2K compliant. The programmers you've hired to use it,
however, probably are not.
Long answer: Perl is just as Y2K compliant as your pencil--no more,
and no less. The date and time functions supplied with perl (gmtime
and localtime) supply adequate information to determine the year well
beyond 2000 (2038 is when trouble strikes for 32-bit machines). The
year returned by these functions when used in an array context is the
year minus 1900. For years between 1910 and 1999 this I<happens> to
be a 2-digit decimal number. To avoid the year 2000 problem simply do
not treat the year as a 2-digit number. It isn't.
When gmtime() and localtime() are used in scalar context they return
a timestamp string that contains a fully-expanded year. For example,
C<$timestamp = gmtime(1005613200)> sets $timestamp to "Tue Nov 13 01:00:00
2001". There's no year 2000 problem here.
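To make the difference concrete, here is a short sketch using the same timestamp as above. The C<$broken> variable shows the classic mistake, not a recommendation.

```perl
use strict;
use warnings;

my @t = gmtime(1005613200);

my $raw_year  = $t[5];           # years since 1900
my $full_year = $t[5] + 1900;    # the actual year

# WRONG: gluing "19" onto the front only works before 2000.
my $broken = "19$raw_year";

print "$raw_year $full_year $broken\n";
```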
That doesn't mean that Perl can't be used to create non-Y2K compliant
programs. It can. But so can your pencil. It's the fault of the user,
not the language. At the risk of inflaming the NRA: ``Perl doesn't
break Y2K, people do.'' See http://language.perl.com/news/y2k.html for
a longer exposition.
=head1 Data: Strings
=head2 How do I validate input?
The answer to this question is usually a regular expression, perhaps
with auxiliary logic. See the more specific questions (numbers, mail
addresses, etc.) for details.
=head2 How do I unescape a string?
It depends just what you mean by ``escape''. URL escapes are dealt
with in L<perlfaq9>. Shell escapes with the backslash (C<\>)
character are removed with:
s/\\(.)/$1/g;
This won't expand C<"\n"> or C<"\t"> or any other special escapes.
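For instance, with a made-up input string:

```perl
use strict;
use warnings;

# Single quotes keep the backslashes in this literal as they are.
my $escaped = 'foo\ bar\$baz';

# Copy the string, then strip one level of backslash escaping.
(my $plain = $escaped) =~ s/\\(.)/$1/g;

print "$plain\n";
```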
=head2 How do I remove consecutive pairs of characters?
To turn C<"abbcccd"> into C<"abccd">:
s/(.)\1/$1/g;
=head2 How do I expand function calls in a string?
This is documented in L<perlref>. In general, this is fraught with
quoting and readability problems, but it is possible. To interpolate
a subroutine call (in list context) into a string:
print "My sub returned @{[mysub(1,2,3)]} that time.\n";
If you prefer scalar context, similar chicanery is also useful for
arbitrary expressions:
print "That yields ${\($n + 5)} widgets\n";
Version 5.004 of Perl had a bug that gave list context to the
expression in C<${...}>, but this is fixed in version 5.005.
See also ``How can I expand variables in text strings?'' in this
section of the FAQ.
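Both tricks can be tried out with a sketch like this one, where C<mysub> is a stand-in defined purely for the example:

```perl
use strict;
use warnings;

# Hypothetical sub for the example: doubles each argument.
sub mysub { return map { $_ * 2 } @_ }

my $n = 7;

# List context: the anonymous array is built, then dereferenced.
my $list_ctx = "My sub returned @{[ mysub(1, 2, 3) ]} that time.";

# Scalar expression: a reference to the result, then dereferenced.
my $scalar_ctx = "That yields ${\($n + 5)} widgets";

print "$list_ctx\n$scalar_ctx\n";
```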
=head2 How do I find matching/nesting anything?
This isn't something that can be done in one regular expression, no
matter how complicated. To find something between two single
characters, a pattern like C</x([^x]*)x/> will get the intervening
bits in $1. For multi-character delimiters, something more like
C</alpha(.*?)omega/> would be needed. But none of these deals with
nested patterns, nor can they. For that you'll have to write a
parser.
If you are serious about writing a parser, there are a number of
modules or oddities that will make your life a lot easier. There is
the CPAN module Parse::RecDescent, the standard module Text::Balanced,
the byacc program, and Mark-Jason Dominus's excellent I<py> tool at
http://www.plover.com/~mjd/perl/py/ .
One simple destructive, inside-out approach that you might try is to
pull out the smallest nesting parts one at a time:
while (s/BEGIN((?:(?!BEGIN)(?!END).)*)END//gs) {
# do something with $1
}
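A runnable sketch of the inside-out idea follows. It assumes the substitution is written as C<s/PATTERN//> with an empty replacement, and it drops the C</g> so that each pass removes exactly one innermost block. The sample text and the C<@found> array are made up for the example.

```perl
use strict;
use warnings;

my $text = "BEGIN outer BEGIN inner END tail END";
my @found;

# Each pass removes one block that contains no nested BEGIN or END,
# so inner blocks are seen before the blocks that enclose them.
while ($text =~ s/BEGIN((?:(?!BEGIN)(?!END).)*)END//s) {
    push @found, $1;
}

print scalar(@found), " blocks found; innermost was [$found[0]]\n";
```

Because the approach is destructive, work on a copy if you still need the original string afterwards.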
=head2 How do I reverse a string?
Use reverse() in scalar context, as documented in
L<perlfunc/reverse>.
$reversed = reverse $string;
=head2 How do I expand tabs in a string?
You can do it yourself:
1 while $string =~ s/\t+/' ' x (length($&) * 8 - length($`) % 8)/e;
Or you can just use the Text::Tabs module (part of the standard perl
distribution).
use Text::Tabs;
@expanded_lines = expand(@lines_with_tabs);
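The two approaches should agree. Here is a quick sketch comparing them on a made-up line, assuming the default tab stop of 8 columns:

```perl
use strict;
use warnings;
use Text::Tabs;    # exports expand(); tab stops default to 8 columns

my $line = "a\tbc\td";

my $by_module = expand($line);

# The hand-rolled substitution, applied to a copy of the line.
my $by_regex = $line;
1 while $by_regex =~ s/\t+/' ' x (length($&) * 8 - length($`) % 8)/e;

print "same\n" if $by_module eq $by_regex;
print length($by_module), " columns\n";
```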
=head2 How do I reformat a paragraph?
Use Text::Wrap (part of the standard perl distribution):
use Text::Wrap;
print wrap("\t", ' ', @paragraphs);
The paragraphs you give to Text::Wrap should not contain embedded
newlines. Text::Wrap doesn't justify the lines (flush-right).
=head2 How can I access/change the first N letters of a string?
There are many ways. If you just want to grab a copy, use
substr():
$first_byte = substr($a, 0, 1);
If you want to modify part of a string, the simplest way is often to
use substr() as an lvalue:
substr($a, 0, 3) = "Tom";
Although those with a pattern matching kind of thought process will
likely prefer:
$a =~ s/^.../Tom/;
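A short sketch tying these together. The four-argument form of substr() shown last is an addition not covered above; it assumes Perl 5.005 or later. The sample name is invented.

```perl
use strict;
use warnings;

my $name = "Bob Smith";

my $first_three = substr($name, 0, 3);    # a copy; $name is unchanged

substr($name, 0, 3) = "Tom";              # lvalue form edits in place

# Four-argument form: replaces in place and returns what was there.
my $old = substr($name, 4, 5, "Jones");

print "$first_three / $name / $old\n";
```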
=head2 How do I change the Nth occurrence of something?
You have to keep track of N yourself. For example, let's say you want
to change the fifth occurrence of C<"whoever"> or C<"whomever"> into
C<"whosoever"> or C<"whomsoever">, case insensitively.
$count = 0;
s{((whom?)ever)}{
++$count == 5 # is it the 5th?
? "${2}soever" # yes, swap
: $1 # renege and leave it there
}igex;
In the more general case, you can use the C</g> modifier in a C<while>
loop, keeping count of matches.
$WANT = 3;
$count = 0;
while (/(\w+)\s+fish\b/gi) {
if (++$count == $WANT) {
print "The third fish is a $1 one.\n";
# Warning: don't `last' out of this loop
}
}
That prints out: C<"The third fish is a red one."> You can also use a
repetition count and repeated pattern like this:
/(?:\w+\s+fish\s+){2}(\w+)\s+fish/i;
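Both techniques can be checked against a sample string, here the well-known fish line (the input is an assumption of this example):

```perl
use strict;
use warnings;

my $text = "One fish two fish red fish blue fish";
my $want = 3;

# Walk the matches with /g, counting as we go.
my $count = 0;
my $nth;
while ($text =~ /(\w+)\s+fish\b/gi) {
    $nth = $1 if ++$count == $want;
}

# Or skip the first two matches inside the pattern itself.
my ($direct) = $text =~ /(?:\w+\s+fish\s+){2}(\w+)\s+fish/i;

print "The third fish is a $nth one ($direct).\n";
```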
=head2 How can I count the number of occurrences of a substring within a string?
There are a number of ways, with varying efficiency. If you want a
count of a certain single character (X) within a string, you can use the
C<tr///> function like so:
$string = "ThisXlineXhasXsomeXx'sXinXit";
$count = ($string =~ tr/X//);
print "There are $count X characters in the string";
This is fine if you are just looking for a single character. However,
if you are trying to count multiple character substrings within a
larger string, C<tr///> won't work. What you can do is wrap a while()
loop around a global pattern match. For example, let's count negative
integers:
$string = "-9 55 48 -2 23 -76 4 14 -44";
$count = 0;
while ($string =~ /-\d+/g) { $count++ }
print "There are $count negative numbers in the string";
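Here is a self-contained sketch of both counts. The last form, which assigns the matches to an empty list and then to a scalar, is a common idiom that is not mentioned above.

```perl
use strict;
use warnings;

my $letters = "ThisXlineXhasXsomeXx'sXinXit";
my $x_count = ($letters =~ tr/X//);

my $numbers = "-9 55 48 -2 23 -76 4 14 -44";

# Loop form, with the counter explicitly initialized first.
my $loop_count = 0;
$loop_count++ while $numbers =~ /-\d+/g;

# List-assignment form: the empty list forces list context, and the
# scalar assignment then sees how many matches there were.
my $list_count = () = $numbers =~ /-\d+/g;

print "$x_count $loop_count $list_count\n";
```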
=head2 How do I capitalize all the words on one line?
To make the first letter of each word upper case:
$line =~ s/\b(\w)/\U$1/g;
This has the strange effect of turning "C<don't do it>" into "C<Don'T
Do It>". Sometimes you might want this, instead (Suggested by Brian
Foy):
$string =~ s/ (
(^\w) #at the beginning of the line
| # or
(\s\w) #preceded by whitespace
)
/\U$1/xg;
$string =~ s/([\w']+)/\u\L$1/g;
To make the whole line upper case:
$line = uc($line);
To force each word to be lower case, with the first letter upper case:
$line =~ s/(\w+)/\u\L$1/g;
You can (and probably should) enable locale awareness of those
characters by placing a C<use locale> pragma in your program.
See L<perllocale> for endless details on locales.
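To see how the apostrophe changes the result, compare the two substitutions on the same line:

```perl
use strict;
use warnings;

my $line = "don't do it";

# \w+ stops at the apostrophe, so "t" counts as a word of its own.
(my $by_word = $line) =~ s/(\w+)/\u\L$1/g;

# Allowing the apostrophe inside a word keeps "don't" together.
(my $by_token = $line) =~ s/([\w']+)/\u\L$1/g;

print "$by_word\n$by_token\n";
```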
=head2 How can I split a [character] delimited string except when inside
[character]? (Comma-separated files)
Take the example case of trying to split a string that is comma-separated
into its different fields. (We'll pretend you said comma-separated, not
comma-delimited, which is different and almost never what you mean.) You
can't use C<split(/,/)> because you shouldn't split if the comma is inside
quotes. For example, take a data line like this:
SAR001,"","Cimetrix, Inc","Bob Smith","CAM",N,8,1,0,7,"Error, Core Dumped"
Due to the restriction of the quotes, this is a fairly complex
problem. Thankfully, we have Jeffrey Friedl, author of a highly
recommended book on regular expressions, to handle these for us. He
suggests (assuming your string is contained in $text):
@new = ();
push(@new, $+) while $text =~ m{
"([^\"\\]*(?:\\.[^\"\\]*)*)",? # groups the phrase inside the quotes
| ([^,]+),?
| ,
}gx;
push(@new, undef) if substr($text,-1,1) eq ',';
If you want to represent quotation marks inside a
quotation-mark-delimited field, escape them with backslashes (eg,
C<"like \"this\"">). Unescaping them is a task addressed earlier in
this section.
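Run against the sample data line, the pattern can be exercised like this. The comments on the second and third alternatives are this example's own reading of the pattern.

```perl
use strict;
use warnings;

my $text = 'SAR001,"","Cimetrix, Inc","Bob Smith","CAM",N,8,1,0,7,"Error, Core Dumped"';

my @new = ();
push(@new, $+) while $text =~ m{
    "([^\"\\]*(?:\\.[^\"\\]*)*)",?   # quoted field, backslash escapes allowed
  | ([^,]+),?                        # unquoted, non-empty field
  | ,                                # empty field
}gx;
push(@new, undef) if substr($text, -1, 1) eq ',';

printf "%d fields; third is [%s], last is [%s]\n",
    scalar(@new), $new[2], $new[-1];
```

Note that the commas inside quoted fields survive, which is exactly what a plain C<split(/,/)> would get wrong.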