perlpacktut.pod

Packing and unpacking numbers implies conversion to and from some
I<specific> binary representation. Leaving floating point numbers
aside for the moment, the salient properties of any such representation
are:

=over 4

=item *

the number of bytes used for storing the integer,

=item *

whether the contents are interpreted as a signed or unsigned number,

=item *

the byte ordering: whether the first byte is the least or most
significant byte (or: little-endian or big-endian, respectively).

=back

So, for instance, to pack 20302 to a signed 16 bit integer in your
computer's representation you write

   my $ps = pack( 's', 20302 );

Again, the result is a string, now containing 2 bytes. If you print
this string (which is, generally, not recommended) you might see
C<ON> or C<NO> (depending on your system's byte ordering) - or something
entirely different if your computer doesn't use ASCII character encoding.
Unpacking C<$ps> with the same template returns the original integer value:

   my( $s ) = unpack( 's', $ps );

This is true for all numeric template codes. But don't expect miracles:
if the packed value exceeds the allotted byte capacity, high order bits
are silently discarded, and unpack certainly won't be able to pull them
back out of some magic hat. And, when you pack using a signed template
code such as C<s>, an excess value may result in the sign bit
getting set, and unpacking this will smartly return a negative value.

16 bits won't get you too far with integers, but there is C<l> and C<L>
for signed and unsigned 32-bit integers. And if this is not enough and
your system supports 64 bit integers you can push the limits much closer
to infinity with pack codes C<q> and C<Q>.
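
The round trip and the silent truncation described above can be sketched
as follows. This is only an illustration: the values 70000 and 40000 are
arbitrary choices that do not appear in the original text, and the results
in the comments assume a two's complement machine.

```perl
use strict;
use warnings;

# Round trip: a value that fits into 16 bits comes back unchanged.
my $ps = pack( 's', 20302 );
my( $s ) = unpack( 's', $ps );
print length( $ps ), " bytes, value $s\n";

# Too large for an unsigned 16-bit integer: only the low 16 bits
# survive (70000 - 65536 = 4464).
my( $u ) = unpack( 'S', pack( 'S', 70000 ) );
print "$u\n";

# Fits into 16 bits, but sets the sign bit of a signed short
# (40000 - 65536 = -25536).
my( $neg ) = unpack( 's', pack( 's', 40000 ) );
print "$neg\n";
```
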
The pack codes C<i> and C<I>, for signed and unsigned integers of the
"local custom" variety, are a notable exception to these fixed widths:
such an integer will take up as many bytes as a local C compiler returns
for C<sizeof(int)>, but it'll use I<at least> 32 bits.

Each of the integer pack codes C<sSlLqQ> results in a fixed number of bytes,
no matter where you execute your program. This may be useful for some
applications, but it does not provide for a portable way to pass data
structures between Perl and C programs (bound to happen when you call
XS extensions or the Perl function C<syscall>), or when you read or
write binary files. What you'll need in this case are template codes that
depend on what your local C compiler compiles when you code C<short> or
C<unsigned long>, for instance. These codes and their corresponding
byte lengths are shown in the table below. Since the C standard leaves
much leeway with respect to the relative sizes of these data types, actual
values may vary, and that's why the values are given as expressions in
C and Perl. (If you'd like to use values from C<%Config> in your program
you have to import it with C<use Config>.)

   signed unsigned  byte length in C   byte length in Perl
     s!     S!      sizeof(short)      $Config{shortsize}
     i!     I!      sizeof(int)        $Config{intsize}
     l!     L!      sizeof(long)       $Config{longsize}
     q!     Q!      sizeof(long long)  $Config{longlongsize}

The C<i!> and C<I!> codes aren't different from C<i> and C<I>; they are
tolerated for completeness' sake.

=head2 Unpacking a Stack Frame

Requesting a particular byte ordering may be necessary when you work with
binary data coming from some specific architecture whereas your program could
run on a totally different system.
As an example, assume you have 24 bytes
containing a stack frame as it happens on an Intel 8086:

      +---------+        +----+----+               +---------+
 TOS: |   IP    |  TOS+4:| FL | FH | FLAGS  TOS+14:|   SI    |
      +---------+        +----+----+               +---------+
      |   CS    |        | AL | AH | AX            |   DI    |
      +---------+        +----+----+               +---------+
                         | BL | BH | BX            |   BP    |
                         +----+----+               +---------+
                         | CL | CH | CX            |   DS    |
                         +----+----+               +---------+
                         | DL | DH | DX            |   ES    |
                         +----+----+               +---------+

First, we note that this time-honored 16-bit CPU uses little-endian order,
and that's why the low order byte is stored at the lower address. To
unpack such an (unsigned) short we'll have to use code C<v>. A repeat
count unpacks all 12 shorts:

   my( $ip, $cs, $flags, $ax, $bx, $cx, $dx, $si, $di, $bp, $ds, $es ) =
     unpack( 'v12', $frame );

Alternatively, we could have used C<C> to unpack the individually
accessible byte registers FL, FH, AL, AH, etc.:

   my( $fl, $fh, $al, $ah, $bl, $bh, $cl, $ch, $dl, $dh ) =
     unpack( 'C10', substr( $frame, 4, 10 ) );

It would be nice if we could do this in one fell swoop: unpack a short,
back up a little, and then unpack 2 bytes. Since Perl I<is> nice, it
proffers the template code C<X> to back up one byte. Putting this all
together, we may now write:

   my( $ip, $cs,
       $flags,$fl,$fh,
       $ax,$al,$ah, $bx,$bl,$bh, $cx,$cl,$ch, $dx,$dl,$dh,
       $si, $di, $bp, $ds, $es ) =
   unpack( 'v2' . ('vXXCC' x 5) . 'v5', $frame );

(The clumsy construction of the template can be avoided - just read on!)

We've taken some pains to construct the template so that it matches
the contents of our frame buffer.
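
To try the combined template without a real 8086 at hand, one can build a
frame with C<pack> first. The following is a sketch only: the register
values are invented for illustration and are not taken from the original
text.

```perl
use strict;
use warnings;

# Hypothetical register contents, packed as 12 little-endian shorts.
my $frame = pack( 'v12',
    0x0100, 0x2000,                  # IP, CS
    0x0246,                          # FLAGS
    0x1234, 0x5678, 0x9ABC, 0xDEF0,  # AX, BX, CX, DX
    1, 2, 3, 4, 5 );                 # SI, DI, BP, DS, ES

# Each 'vXXCC' reads a short, backs up two bytes, and reads the
# same two bytes again as low and high byte.
my( $ip, $cs,
    $flags,$fl,$fh,
    $ax,$al,$ah, $bx,$bl,$bh, $cx,$cl,$ch, $dx,$dl,$dh,
    $si, $di, $bp, $ds, $es ) =
  unpack( 'v2' . ('vXXCC' x 5) . 'v5', $frame );

printf "frame is %d bytes\n", length( $frame );
printf "AX=%04X AL=%02X AH=%02X\n", $ax, $al, $ah;   # AX=1234 AL=34 AH=12
```

The low byte comes first in each pair, which is why C<$al> receives 0x34
and C<$ah> receives 0x12.
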
Had the template not matched, we'd either get undefined values,
or C<unpack> could not unpack everything. If C<pack> runs out of items, it will
supply null strings (which are coerced into zeroes whenever the pack code
says so).

=head2 How to Eat an Egg on a Net

The pack code for big-endian (high order byte at the lowest address) is
C<n> for 16 bit and C<N> for 32 bit integers. You use these codes
if you know that your data comes from a compliant architecture, but,
surprisingly enough, you should also use these pack codes if you
exchange binary data, across the network, with some system that you
know next to nothing about. The simple reason is that this
order has been chosen as the I<network order>, and all standard-fearing
programs ought to follow this convention. (This is, of course, a stern
backing for one of the Lilliputian parties and may well influence the
political development there.) So, if the protocol expects you to send
a message by sending the length first, followed by just so many bytes,
you could write:

   my $buf = pack( 'N', length( $msg ) ) . $msg;

or even:

   my $buf = pack( 'NA*', length( $msg ), $msg );

and pass C<$buf> to your send routine. Some protocols demand that the
count should include the length of the count itself: then just add 4
to the data length. (But make sure to read L<"Lengths and Widths"> before
you really code this!)

=head2 Byte-order modifiers

In the previous sections we've learned how to use C<n>, C<N>, C<v> and
C<V> to pack and unpack integers with big- or little-endian byte-order.
While this is nice, it's still rather limited because it leaves out all
kinds of signed integers as well as 64-bit integers. For example, if you
wanted to unpack a sequence of signed big-endian 16-bit integers in a
platform-independent way, you would have to write:

   my @data = unpack 's*', pack 'S*', unpack 'n*', $buf;

This is ugly.
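
To see what this three-step conversion does, here is a small sketch. The
buffer contents are made up for illustration, and the result in the comment
assumes a two's complement machine.

```perl
use strict;
use warnings;

# Three big-endian 16-bit values as they might arrive in a buffer.
my $buf = pack( 'n*', 1, 0xFFFF, 0x8000 );

# 'n*' yields unsigned values; repacking them as native unsigned
# shorts and unpacking as native signed shorts reinterprets the bits.
my @data = unpack 's*', pack 'S*', unpack 'n*', $buf;
print "@data\n";    # 1 -1 -32768
```
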
As of Perl 5.9.2, there's a much nicer way to express your
desire for a certain byte-order: the C<E<gt>> and C<E<lt>> modifiers.
C<E<gt>> is the big-endian modifier, while C<E<lt>> is the little-endian
modifier. Using them, we could rewrite the above code as:

   my @data = unpack 's>*', $buf;

As you can see, the "big end" of the arrow touches the C<s>, which is a
nice way to remember that C<E<gt>> is the big-endian modifier. The same
obviously works for C<E<lt>>, where the "little end" touches the code.

You will probably find these modifiers even more useful if you have
to deal with big- or little-endian C structures. Be sure to read
L<"Packing and Unpacking C Structures"> for more on that.

=head2 Floating point Numbers

For packing floating point numbers you have the choice between the
pack codes C<f>, C<d>, C<F> and C<D>. C<f> and C<d> pack into (or unpack
from) single-precision or double-precision representation as it is provided
by your system. If your system supports it, C<D> can be used to pack and
unpack extended-precision floating point values (C<long double>), which
can offer even more resolution than C<f> or C<d>. C<F> packs an C<NV>,
which is the floating point type used by Perl internally. (There
is no such thing as a network representation for reals, so if you want
to send your real numbers across computer boundaries, you'd better stick
to ASCII representation, unless you're absolutely sure what's on the other
end of the line. For the even more adventuresome, you can use the byte-order
modifiers from the previous section also on floating point codes.)

=head1 Exotic Templates

=head2 Bit Strings

Bits are the atoms in the memory world. Access to individual bits may
have to be used either as a last resort or because it is the most
convenient way to handle your data. Bit string (un)packing converts
between strings containing a series of C<0> and C<1> characters and
a sequence of bytes each containing a group of 8 bits.
This is almost
as simple as it sounds, except that there are two ways the contents of
a byte may be written as a bit string. Let's have a look at an annotated
byte:

     7 6 5 4 3 2 1 0
   +-----------------+
   | 1 0 0 0 1 1 0 0 |
   +-----------------+
    MSB           LSB

It's egg-eating all over again: Some think that as a bit string this should
be written "10001100" i.e. beginning with the most significant bit, others
insist on "00110001". Well, Perl isn't biased, so that's why we have two bit
string codes:

   $byte = pack( 'B8', '10001100' ); # start with MSB
   $byte = pack( 'b8', '00110001' ); # start with LSB

It is not possible to pack or unpack bit fields - just integral bytes.
C<pack> always starts at the next byte boundary and "rounds up" to the
next multiple of 8 by adding zero bits as required. (If you do want bit
fields, there is L<perlfunc/vec>. Or you could implement bit field
handling at the character string level, using split, substr, and
concatenation on unpacked bit strings.)

To illustrate unpacking for bit strings, we'll decompose a simple
status register (a "-" stands for a "reserved" bit):

   +-----------------+-----------------+
   | S Z - A - P - C | - - - - O D I T |
   +-----------------+-----------------+
    MSB           LSB MSB           LSB

Converting these two bytes to a string can be done with the unpack
template C<'b16'>. To obtain the individual bit values from the bit
string we use C<split> with the "empty" separator pattern which dissects
into individual characters. Bit values from the "reserved" positions are
simply assigned to C<undef>, a convenient notation for "I don't care where
this goes".

   ($carry, undef, $parity, undef, $auxcarry, undef, $zero, $sign,
    $trace, $interrupt, $direction, $overflow) =
      split( //, unpack( 'b16', $status ) );

We could have used an unpack template C<'b12'> just as well, since the
last 4 bits can be ignored anyway.

=head2 Uuencoding

Another odd-man-out in the template alphabet is C<u>, which packs a
"uuencoded string".
("uu" is short for Unix-to-Unix.) Chances are that
you won't ever need this encoding technique which was invented to overcome
the shortcomings of old-fashioned transmission media that do not support
other than simple ASCII data. The essential recipe is simple: Take three
bytes, or 24 bits. Split them into 4 six-packs, adding a space (0x20) to
each. Repeat until all of the data is blended. Fold groups of 4 bytes into
lines no longer than 60 and garnish them in front with the original byte
count (incremented by 0x20) and a C<"\n"> at the end. - The C<pack> chef will
prepare this for you, a la minute, when you select pack code C<u> on the menu:

   my $uubuf = pack( 'u', $bindat );

A repeat count after C<u> sets the number of bytes to put into a
uuencoded line, which is the maximum of 45 by default, but could be
set to some (smaller) integer multiple of three. C<unpack> simply ignores
the repeat count.

=head2 Doing Sums

An even stranger template code is C<%>E<lt>I<number>E<gt>. First, because
it's used as a prefix to some other template code. Second, because it
cannot be used in C<pack> at all, and third, in C<unpack>, it doesn't return the
data as defined by the template code it precedes. Instead it'll give you an
integer of I<number> bits that is computed from the data value by
doing sums. For numeric unpack codes, no big feat is achieved:

    my $buf = pack( 'iii', 100, 20, 3 );
    print unpack( '%32i3', $buf ), "\n";  # prints 123

For string values, C<%> returns the sum of the byte values saving
you the trouble of a sum loop with C<substr> and C<ord>:

    print unpack( '%32A*', "\x01\x10" ), "\n";  # prints 17

Although the C<%> code is documented as returning a "checksum":
don't put your trust in such values!
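
One way to see why such a sum is a weak check is to swap two bytes. The
strings below are arbitrary examples chosen for this sketch, and the
template C<%32C*> is used here as one of several codes that sum byte values.

```perl
use strict;
use warnings;

# Two different strings whose bytes merely change places ...
my $sum_ab = unpack( '%32C*', 'ab' );
my $sum_ba = unpack( '%32C*', 'ba' );

# ... yield the same sum (97 + 98), so the swap goes undetected.
print "$sum_ab $sum_ba\n";    # 195 195
```
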
Even when applied to a small number
of bytes, they won't guarantee a noticeable Hamming distance.

In connection with C<b> or C<B>, C<%> simply adds bits, and this can be put
to good use to count set bits efficiently:

    my $bitcount = unpack( '%32b*', $mask );

And an even parity bit can be determined like this:
