=head1 NAME

perlpacktut - tutorial on C<pack> and C<unpack>

=head1 DESCRIPTION

C<pack> and C<unpack> are two functions for transforming data according
to a user-defined template, between the guarded way Perl stores values
and some well-defined representation as might be required in the
environment of a Perl program. Unfortunately, they're also two of the
most misunderstood and most often overlooked functions that Perl
provides. This tutorial will demystify them for you.

=head1 The Basic Principle

Most programming languages don't shelter the memory where variables are
stored. In C, for instance, you can take the address of some variable,
and the C<sizeof> operator tells you how many bytes are allocated to
the variable. Using the address and the size, you may access the storage
to your heart's content.

In Perl, you just can't access memory at random, but the structural and
representational conversion provided by C<pack> and C<unpack> is an
excellent alternative. The C<pack> function converts values to a byte
sequence containing representations according to a given specification,
the so-called "template" argument. C<unpack> is the reverse process,
deriving some values from the contents of a string of bytes. (Be cautioned,
however, that not all that has been packed together can be neatly
unpacked - a very common experience, as seasoned travellers are likely
to confirm.)

Why, you may ask, would you need a chunk of memory containing some values
in binary representation? One good reason is input and output accessing
some file, a device, or a network connection, whereby this binary
representation is either forced on you or will give you some benefit
in processing. Another cause is passing data to some system call that
is not available as a Perl function: C<syscall> requires you to provide
parameters laid out the way a C program would store them.
Even text processing (as shown in the next section) may be simplified
with judicious usage of these two functions.

To see how (un)packing works, we'll start with a simple template
code where the conversion is in low gear: between the contents of a byte
sequence and a string of hexadecimal digits. Let's use C<unpack>, since
this is likely to remind you of a dump program, or some desperate last
message unfortunate programs are wont to throw at you before they expire
into the wild blue yonder. Assuming that the variable C<$mem> holds a
sequence of bytes that we'd like to inspect without assuming anything
about its meaning, we can write

   my( $hex ) = unpack( 'H*', $mem );
   print "$hex\n";

whereupon we might see something like this, with each pair of hex digits
corresponding to a byte:

   41204d414e204120504c414e20412043414e414c2050414e414d41

What was in this chunk of memory? Numbers, characters, or a mixture of
both? Assuming that we're on a computer where ASCII (or some similar)
encoding is used, hexadecimal values in the range C<0x41> - C<0x5A>
indicate an uppercase letter, and C<0x20> encodes a space. So we might
assume it is a piece of text, which some are able to read like a tabloid;
but others will have to get hold of an ASCII table and relive that
first-grader feeling. Not caring too much about which way to read this,
we note that C<unpack> with the template code C<H> converts the contents
of a sequence of bytes into the customary hexadecimal notation. Since
"a sequence of" is a pretty vague indication of quantity, C<H> has been
defined to convert just a single hexadecimal digit unless it is followed
by a repeat count. An asterisk for the repeat count means to use whatever
remains.

The inverse operation - packing byte contents from a string of hexadecimal
digits - is just as easily written. For instance:

   my $s = pack( 'H2' x 10, map { "3$_" } ( 0..9 ) );
   print "$s\n";

Since we feed a list of ten 2-digit hexadecimal strings to C<pack>, the
pack template should contain ten pack codes.
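
Both directions can be tried at once in a small self-contained script. It is only a sketch. It assumes an ASCII machine and takes as C<$mem> the text that the hex dump above spells out, "A MAN A PLAN A CANAL PANAMA". The other variable names are made up for illustration.

```perl
    use strict;
    use warnings;

    # Assumption: ASCII encoding; this is the text behind the hex dump above.
    my $mem = 'A MAN A PLAN A CANAL PANAMA';

    # 'H*' turns every byte into two lowercase hex digits, high nybble first.
    my ($hex) = unpack( 'H*', $mem );

    # Without a repeat count, 'H' yields a single digit: the high nybble
    # of the first byte ('A' is 0x41, so this is "4").
    my ($first) = unpack( 'H', $mem );

    # pack with 'H*' reverses the conversion ...
    my $back = pack( 'H*', $hex );

    # ... and each of the ten 'H2' codes consumes one two-digit string.
    my $digits = pack( 'H2' x 10, map { "3$_" } ( 0 .. 9 ) );

    print "$hex\n$first\n$back\n$digits\n";
```

Packing the dump back with C<H*> gives the original bytes. This is the "neat" kind of round trip, because C<H*> loses no information.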

If the C<'H2' x 10> example is run on a computer with ASCII character
coding, it prints C<0123456789>.

=head1 Packing Text

Let's suppose you've got to read in a data file like this:

    Date      |Description                | Income|Expenditure
    01/24/2001 Ahmed's Camel Emporium                  1147.99
    01/28/2001 Flea spray                                24.99
    01/29/2001 Camel rides to tourists     1235.00

How do we do it? You might think first to use C<split>; however, since
C<split> collapses blank fields, you'll never know whether a record was
income or expenditure. Oops. Well, you could always use C<substr>:

    while (<>) { 
        my $date   = substr($_,  0, 11);
        my $desc   = substr($_, 12, 27);
        my $income = substr($_, 40,  7);
        my $expend = substr($_, 52,  7);
        ...
    }

It's not really a barrel of laughs, is it? In fact, it's worse than it
may seem; the eagle-eyed may notice that the first field should only be
10 characters wide, and the error has propagated right through the other
numbers - which we've had to count by hand. So it's error-prone as well
as horribly unfriendly.

Or maybe we could use regular expressions:

    while (<>) { 
        my($date, $desc, $income, $expend) = 
            m|(\d\d/\d\d/\d{4}) (.{27}) (.{7})(.*)|;
        ...
    }

Urgh. Well, it's a bit better, but - well, would you want to maintain
that?

Hey, isn't Perl supposed to make this sort of thing easy? Well, it does,
if you use the right tools. C<pack> and C<unpack> are designed to help
you out when dealing with fixed-width data like the above. Let's have a
look at a solution with C<unpack>:

    while (<>) { 
        my($date, $desc, $income, $expend) = unpack("A10xA27xA7xA*", $_);
        ...
    }

That looks a bit nicer; but we've got to take apart that weird template.
Where did I pull that out of?

OK, let's have a look at some of our data again; in fact, we'll include
the headers, and a handy ruler so we can keep track of where we are.

             1         2         3         4         5
    1234567890123456789012345678901234567890123456789012345678
    Date      |Description                | Income|Expenditure
    01/28/2001 Flea spray                                24.99
    01/29/2001 Camel rides to tourists     1235.00

From this, we can see that the date column stretches from column 1 to
column 10 - ten characters wide. The C<pack>-ese for "character" is
C<A>, and ten of them are C<A10>. So if we just wanted to extract the
dates, we could say this:

    my($date) = unpack("A10", $_);

OK, what's next? Between the date and the description is a blank column;
we want to skip over that. The C<x> template means "skip forward", so we
want one of those. Next, we have another batch of characters, from 12 to
38. That's 27 more characters, hence C<A27>. (Don't make the fencepost
error - there are 27 characters from 12 to 38 inclusive, not 26. Count
'em!)

Now we skip another character and pick up the next 7 characters:

    my($date,$description,$income) = unpack("A10xA27xA7", $_);

Now comes the clever bit. Lines in our ledger which are just income and
not expenditure might end at column 46. Hence, we don't want to tell our
C<unpack> pattern that we B<need> to find another 12 characters; we'll
just say "if there's anything left, take it". As you might guess from
regular expressions, that's what the C<*> means: "use everything
remaining".

=over 3

=item *

Be warned, though, that unlike regular expressions, C<unpack> does not
quietly fail when the template doesn't fit the incoming data: asking
C<x> to skip past the end of the string, for instance, makes Perl scream
and die.

=back

Hence, putting it all together:

    my($date,$description,$income,$expend) = unpack("A10xA27xA7xA*", $_);

Now, that's our data parsed.
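
To see exactly what that template hands back for each kind of line, the sketch below feeds it three hypothetical lines laid out in the ledger's columns. Each line keeps the trailing newline that reading with C<< <> >> would leave on it. The script prints every field in brackets, so the stripping done by C<A> is visible. It also checks what happens to the income-only line once its newline is gone.

```perl
    use strict;
    use warnings;

    # Hypothetical sample lines in the ledger's columns: date in 1-10,
    # description in 12-38, income in 40-46, expenditure from 48 on.
    # Each keeps its newline, as a line read with <> would.
    my @ledger = (
        "01/24/2001 Ahmed's Camel Emporium                  1147.99\n",
        "01/28/2001 Flea spray                                24.99\n",
        "01/29/2001 Camel rides to tourists     1235.00\n",
    );

    my @records;
    for my $line (@ledger) {
        # 'A' strips trailing blanks (the newline included) from a field,
        # so an empty column comes back as the empty string.
        my @fields = unpack( "A10xA27xA7xA*", $line );
        push @records, \@fields;
        print join( '|', map { "[$_]" } @fields ), "\n";
    }

    # The final 'x' needs a 47th byte. On the income-only line that byte
    # is the newline, so the same template dies on a chomped copy.
    my $chomped = $ledger[2];
    chomp $chomped;
    my $ok = eval { my @f = unpack( "A10xA27xA7xA*", $chomped ); 1 };
    print $ok ? "no error\n" : "unpack died: $@";
```

The output shows that C<A> leaves the leading blanks in front of the expenditure amounts. Perl ignores those blanks when the field is used as a number.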

I suppose what we might want to do now is total up our income and
expenditure, and add another line to the end of our ledger - in the same
format - saying how much we've brought in and how much we've spent:

    use POSIX ();    # for POSIX::strftime

    while (<>) {
        my($date, $desc, $income, $expend) = unpack("A10xA27xA7xA*", $_);
        $tot_income += $income;
        $tot_expend += $expend;
    }

    $tot_income = sprintf("%.2f", $tot_income); # Get them into 
    $tot_expend = sprintf("%.2f", $tot_expend); # "financial" format

    $date = POSIX::strftime("%m/%d/%Y", localtime); 

    # OK, let's go:

    print pack("A10xA27xA7xA*", $date, "Totals", $tot_income, $tot_expend);

Oh, hmm. That didn't quite work. Let's see what happened:

    01/24/2001 Ahmed's Camel Emporium                   1147.99
    01/28/2001 Flea spray                                 24.99
    01/29/2001 Camel rides to tourists     1235.00
    03/23/2001Totals                     1235.001172.98

OK, it's a start, but what happened to the spaces? We put C<x>, didn't
we? Shouldn't it skip forward? Let's look at what L<perlfunc/pack> says:

    x   A null byte.

Urgh. No wonder. There's a big difference between "a null byte",
character zero, and "a space", character 32. Perl's put something
between the date and the description - but unfortunately, we can't see
it!

What we actually need to do is expand the width of the fields. The C<A>
format pads a value that is shorter than its field with spaces, so we
can use the additional spaces to line up our fields, like this:

    print pack("A11 A28 A8 A*", $date, "Totals", $tot_income, $tot_expend);

(Note that you can put spaces in the template to make it more readable,
but they don't translate to spaces in the output.)
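
The difference between C<x> and C<A> padding is easy to check directly. The sketch below uses the date and the two totals from the output above as plain literals, an assumption standing in for C<strftime> and a real ledger. It rewrites the invisible null byte as a visible C<\0> before printing.

```perl
    use strict;
    use warnings;

    # Literal stand-ins (assumed) for $date and the two totals.
    my ( $date, $income, $expend ) = ( '03/23/2001', '1235.00', '1172.98' );

    # 'x' emits a null byte, character 0, between the two fields ...
    my $with_x = pack( "A10xA6", $date, "Totals" );

    # ... whereas 'A11' pads the ten-character date with one space.
    my $with_a = pack( "A11 A6", $date, "Totals" );

    # Spell each null byte as a visible \0 before printing.
    ( my $shown = $with_x ) =~ s/\0/\\0/g;

    # The widened template from the text (spaces in it are ignored).
    my $row = pack( "A11 A28 A8 A*", $date, "Totals", $income, $expend );

    print "$shown\n$with_a\n$row\n";
```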

Here's what we got this time:

    01/24/2001 Ahmed's Camel Emporium                   1147.99
    01/28/2001 Flea spray                                 24.99
    01/29/2001 Camel rides to tourists     1235.00
    03/23/2001 Totals                      1235.00 1172.98

That's a bit better, but we still have that last column which needs to
be moved further over. There's an easy way to fix this up. Unfortunately,
we can't get C<pack> to right-justify our fields, but we can get
C<sprintf> to do it:

    $tot_income = sprintf("%.2f", $tot_income); 
    $tot_expend = sprintf("%12.2f", $tot_expend);
    $date = POSIX::strftime("%m/%d/%Y", localtime); 
    print pack("A11 A28 A8 A*", $date, "Totals", $tot_income, $tot_expend);

This time we get the right answer:

    01/28/2001 Flea spray                                 24.99
    01/29/2001 Camel rides to tourists     1235.00
    03/23/2001 Totals                      1235.00      1172.98

So that's how we consume and produce fixed-width data. Let's recap what
we've seen of C<pack> and C<unpack> so far:

=over 3

=item *

Use C<pack> to go from several pieces of data to one fixed-width
version; use C<unpack> to turn a fixed-width-format string into several
pieces of data.

=item *

The pack format C<A> means "any character"; if you're C<pack>ing and
you've run out of things to pack, C<pack> will fill the rest up with
spaces.

=item *

C<x> means "skip a byte" when C<unpack>ing; when C<pack>ing, it means
"introduce a null byte" - that's probably not what you mean if you're
dealing with plain text.

=item *

You can follow the formats with numbers to say how many characters
should be affected by that format: C<A12> means "take 12 characters";
C<x6> means "skip 6 bytes" or "character 0, 6 times".

=item *

Instead of a number, you can use C<*> to mean "consume everything else
left". B<Warning>: when packing multiple pieces of data, C<*> only means
"consume all of the current piece of data".
That is to say

    pack("A*A*", $one, $two)

packs all of C<$one> into the first C<A*> and then all of C<$two> into
the second. This is a general principle: each format character that
takes a value corresponds to one piece of data to be C<pack>ed. (C<x>,
for instance, takes none.)

=back

=head1 Packing Numbers

So much for textual data. Let's get onto the meaty stuff that C<pack>
and C<unpack> are best at: handling binary formats for numbers. There is,
of course, not just one binary format - life would be too simple - but
Perl will do all the finicky labor for you.

=head2 Integers
