.Ve
.PP
\&\s-1OK\s0, it's a start, but what happened to the spaces? We put \f(CW\*(C`x\*(C'\fR, didn't
we? Shouldn't it skip forward? Let's look at what \*(L"pack\*(R" in perlfunc says:
.PP
.Vb 1
\&    x   A null byte.
.Ve
.PP
Urgh. No wonder. There's a big difference between \*(L"a null byte\*(R",
character zero, and \*(L"a space\*(R", character 32. Perl's put something
between the date and the description \- but unfortunately, we can't see
it!
.PP
What we actually need to do is expand the width of the fields. The \f(CW\*(C`A\*(C'\fR
format pads any non-existent characters with spaces, so we can use the
additional spaces to line up our fields, like this:
.PP
.Vb 1
\&    print pack("A11 A28 A8 A*", $date, "Totals", $tot_income, $tot_expend);
.Ve
.PP
(Note that you can put spaces in the template to make it more readable,
but they don't translate to spaces in the output.) Here's what we got
this time:
.PP
.Vb 4
\& 01/24/2001 Ahmed\*(Aqs Camel Emporium                   1147.99
\& 01/28/2001 Flea spray                                 24.99
\& 01/29/2001 Camel rides to tourists     1235.00
\& 03/23/2001 Totals                      1235.00 1172.98
.Ve
.PP
That's a bit better, but we still have that last column which needs to
be moved further over. There's an easy way to fix this up:
unfortunately, we can't get \f(CW\*(C`pack\*(C'\fR to right-justify our fields, but we
can get \f(CW\*(C`sprintf\*(C'\fR to do it:
.PP
.Vb 4
\&    $tot_income = sprintf("%.2f", $tot_income);
\&    $tot_expend = sprintf("%12.2f", $tot_expend);
\&    $date = POSIX::strftime("%m/%d/%Y", localtime);
\&    print pack("A11 A28 A8 A*", $date, "Totals", $tot_income, $tot_expend);
.Ve
.PP
This time we get the right answer:
.PP
.Vb 3
\& 01/28/2001 Flea spray                                 24.99
\& 01/29/2001 Camel rides to tourists     1235.00
\& 03/23/2001 Totals                      1235.00      1172.98
.Ve
.PP
So that's how we consume and produce fixed-width data.
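If you'd like to try the totals line on its own, here is a minimal sketch. The date and the two totals are hypothetical stand-ins for values the tutorial computes from its input file; the template and the sprintf formats are the ones used above.

```perl
use strict;
use warnings;

# Hypothetical values; in the tutorial they come from summing the input file.
my ($date, $tot_income, $tot_expend) = ('03/23/2001', 1235, 1172.98);

# A left-justifies each field and pads it with spaces; sprintf supplies
# the right-justification that pack cannot do.
my $line = pack("A11 A28 A8 A*",
                $date, "Totals",
                sprintf("%.2f",   $tot_income),
                sprintf("%12.2f", $tot_expend));
print "$line\n";

# Unpacking with the same widths recovers the fields. A strips trailing
# spaces but keeps leading ones, so the last field stays right-justified.
my ($d, $desc, $inc, $exp) = unpack("A11 A28 A8 A*", $line);
print "[$d] [$desc] [$inc] [$exp]\n";
```

The packed line is 11 + 28 + 8 + 12 characters wide, which makes it easy to check that no field has spilled into its neighbour.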
Let's recap what
we've seen of \f(CW\*(C`pack\*(C'\fR and \f(CW\*(C`unpack\*(C'\fR so far:
.IP "\(bu" 3
Use \f(CW\*(C`pack\*(C'\fR to go from several pieces of data to one fixed-width
version; use \f(CW\*(C`unpack\*(C'\fR to turn a fixed-width-format string into several
pieces of data.
.IP "\(bu" 3
The pack format \f(CW\*(C`A\*(C'\fR means \*(L"any character\*(R"; if you're \f(CW\*(C`pack\*(C'\fRing and
you've run out of things to pack, \f(CW\*(C`pack\*(C'\fR will fill the rest up with
spaces.
.IP "\(bu" 3
\&\f(CW\*(C`x\*(C'\fR means \*(L"skip a byte\*(R" when \f(CW\*(C`unpack\*(C'\fRing; when \f(CW\*(C`pack\*(C'\fRing, it means
\&\*(L"introduce a null byte\*(R" \- that's probably not what you mean if you're
dealing with plain text.
.IP "\(bu" 3
You can follow the formats with numbers to say how many characters
should be affected by that format: \f(CW\*(C`A12\*(C'\fR means \*(L"take 12 characters\*(R";
\&\f(CW\*(C`x6\*(C'\fR means \*(L"skip 6 bytes\*(R" or \*(L"character 0, 6 times\*(R".
.IP "\(bu" 3
Instead of a number, you can use \f(CW\*(C`*\*(C'\fR to mean \*(L"consume everything else
left\*(R".
.Sp
\&\fBWarning\fR: when packing multiple pieces of data, \f(CW\*(C`*\*(C'\fR only means
\&\*(L"consume all of the current piece of data\*(R". That's to say
.Sp
.Vb 1
\&   pack("A*A*", $one, $two)
.Ve
.Sp
packs all of \f(CW$one\fR into the first \f(CW\*(C`A*\*(C'\fR and then all of \f(CW$two\fR into
the second. This is a general principle: each format character
corresponds to one piece of data to be \f(CW\*(C`pack\*(C'\fRed.
.SH "Packing Numbers"
.IX Header "Packing Numbers"
So much for textual data. Let's get onto the meaty stuff that \f(CW\*(C`pack\*(C'\fR
and \f(CW\*(C`unpack\*(C'\fR are best at: handling binary formats for numbers. There is,
of course, not just one binary format \- life would be too simple \- but
Perl will do all the finicky labor for you.
.Sh "Integers"
.IX Subsection "Integers"
Packing and unpacking numbers implies conversion to and from some
\&\fIspecific\fR binary representation.
Leaving floating point numbers
aside for the moment, the salient properties of any such representation
are:
.IP "\(bu" 4
the number of bytes used for storing the integer,
.IP "\(bu" 4
whether the contents are interpreted as a signed or unsigned number,
.IP "\(bu" 4
the byte ordering: whether the first byte is the least or most
significant byte (or: little-endian or big-endian, respectively).
.PP
So, for instance, to pack 20302 to a signed 16 bit integer in your
computer's representation you write
.PP
.Vb 1
\&   my $ps = pack( \*(Aqs\*(Aq, 20302 );
.Ve
.PP
Again, the result is a string, now containing 2 bytes. If you print
this string (which is, generally, not recommended) you might see
\&\f(CW\*(C`ON\*(C'\fR or \f(CW\*(C`NO\*(C'\fR (depending on your system's byte ordering) \- or something
entirely different if your computer doesn't use \s-1ASCII\s0 character encoding.
Unpacking \f(CW$ps\fR with the same template returns the original integer value:
.PP
.Vb 1
\&   my( $s ) = unpack( \*(Aqs\*(Aq, $ps );
.Ve
.PP
This is true for all numeric template codes. But don't expect miracles:
if the packed value exceeds the allotted byte capacity, high order bits
are silently discarded, and unpack certainly won't be able to pull them
back out of some magic hat. And, when you pack using a signed template
code such as \f(CW\*(C`s\*(C'\fR, an excess value may result in the sign bit
getting set, and unpacking this will smartly return a negative value.
.PP
16 bits won't get you too far with integers, but there are \f(CW\*(C`l\*(C'\fR and \f(CW\*(C`L\*(C'\fR
for signed and unsigned 32\-bit integers. And if this is not enough and
your system supports 64 bit integers you can push the limits much closer
to infinity with pack codes \f(CW\*(C`q\*(C'\fR and \f(CW\*(C`Q\*(C'\fR.
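The round trip and the two failure modes described above can be seen in a few lines. This is only a sketch: the values 70000 and 40000 are arbitrary picks that happen to overflow an unsigned and a signed 16 bit integer respectively.

```perl
use strict;
use warnings;

# Round trip through a signed 16 bit integer in native byte order.
my $ps  = pack('s', 20302);
my ($s) = unpack('s', $ps);
printf "%d bytes, unpacked value %d\n", length($ps), $s;

# 70000 needs 17 bits; packed as an unsigned short only the low 16 survive.
my ($wrapped) = unpack('S', pack('S', 70000));
printf "70000 packed as S comes back as %d\n", $wrapped;

# 40000 fits in 16 bits, but only by occupying the sign bit of a signed short.
my ($negative) = unpack('s', pack('s', 40000));
printf "40000 packed as s comes back as %d\n", $negative;
```

Both wrapped values differ from the originals by exactly 65536, which is what losing the bits above the sixteenth amounts to.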
A notable exception is provided
by pack codes \f(CW\*(C`i\*(C'\fR and \f(CW\*(C`I\*(C'\fR for signed and unsigned integers of the
\&\*(L"local custom\*(R" variety: Such an integer will take up as many bytes as
a local C compiler returns for \f(CW\*(C`sizeof(int)\*(C'\fR, but it'll use \fIat least\fR
32 bits.
.PP
Each of the integer pack codes \f(CW\*(C`sSlLqQ\*(C'\fR results in a fixed number of bytes,
no matter where you execute your program. This may be useful for some
applications, but it does not provide for a portable way to pass data
structures between Perl and C programs (bound to happen when you call
\&\s-1XS\s0 extensions or the Perl function \f(CW\*(C`syscall\*(C'\fR), or when you read or
write binary files. What you'll need in this case are template codes that
depend on what your local C compiler compiles when you code \f(CW\*(C`short\*(C'\fR or
\&\f(CW\*(C`unsigned long\*(C'\fR, for instance. These codes and their corresponding
byte lengths are shown in the table below. Since the C standard leaves
much leeway with respect to the relative sizes of these data types, actual
values may vary, and that's why the values are given as expressions in
C and Perl. (If you'd like to use values from \f(CW%Config\fR in your program
you have to import it with \f(CW\*(C`use Config\*(C'\fR.)
.PP
.Vb 5
\&   signed unsigned  byte length in C   byte length in Perl
\&     s!     S!      sizeof(short)      $Config{shortsize}
\&     i!     I!      sizeof(int)        $Config{intsize}
\&     l!     L!      sizeof(long)       $Config{longsize}
\&     q!     Q!      sizeof(long long)  $Config{longlongsize}
.Ve
.PP
The \f(CW\*(C`i!\*(C'\fR and \f(CW\*(C`I!\*(C'\fR codes aren't different from \f(CW\*(C`i\*(C'\fR and \f(CW\*(C`I\*(C'\fR; they are
tolerated for completeness' sake.
.Sh "Unpacking a Stack Frame"
.IX Subsection "Unpacking a Stack Frame"
Requesting a particular byte ordering may be necessary when you work with
binary data coming from some specific architecture whereas your program could
run on a totally different system.
As an example, assume you have 24 bytes
containing a stack frame as it happens on an Intel 8086:
.PP
.Vb 11
\&      +\-\-\-\-\-\-\-\-\-+        +\-\-\-\-+\-\-\-\-+               +\-\-\-\-\-\-\-\-\-+
\& TOS: |   IP    |  TOS+4:| FL | FH | FLAGS  TOS+14:|   SI    |
\&      +\-\-\-\-\-\-\-\-\-+        +\-\-\-\-+\-\-\-\-+               +\-\-\-\-\-\-\-\-\-+
\&      |   CS    |        | AL | AH | AX            |   DI    |
\&      +\-\-\-\-\-\-\-\-\-+        +\-\-\-\-+\-\-\-\-+               +\-\-\-\-\-\-\-\-\-+
\&                         | BL | BH | BX            |   BP    |
\&                         +\-\-\-\-+\-\-\-\-+               +\-\-\-\-\-\-\-\-\-+
\&                         | CL | CH | CX            |   DS    |
\&                         +\-\-\-\-+\-\-\-\-+               +\-\-\-\-\-\-\-\-\-+
\&                         | DL | DH | DX            |   ES    |
\&                         +\-\-\-\-+\-\-\-\-+               +\-\-\-\-\-\-\-\-\-+
.Ve
.PP
First, we note that this time-honored 16\-bit \s-1CPU\s0 uses little-endian order,
and that's why the low order byte is stored at the lower address. To
unpack such an (unsigned) short we'll have to use code \f(CW\*(C`v\*(C'\fR. A repeat
count unpacks all 12 shorts:
.PP
.Vb 2
\&   my( $ip, $cs, $flags, $ax, $bx, $cx, $dx, $si, $di, $bp, $ds, $es ) =
\&     unpack( \*(Aqv12\*(Aq, $frame );
.Ve
.PP
Alternatively, we could have used \f(CW\*(C`C\*(C'\fR to unpack the individually
accessible byte registers \s-1FL\s0, \s-1FH\s0, \s-1AL\s0, \s-1AH\s0, etc.:
.PP
.Vb 2
\&   my( $fl, $fh, $al, $ah, $bl, $bh, $cl, $ch, $dl, $dh ) =
\&     unpack( \*(AqC10\*(Aq, substr( $frame, 4, 10 ) );
.Ve
.PP
It would be nice if we could do this in one fell swoop: unpack a short,
back up a little, and then unpack 2 bytes. Since Perl \fIis\fR nice, it
proffers the template code \f(CW\*(C`X\*(C'\fR to back up one byte. Putting this all
together, we may now write:
.PP
.Vb 5
\&   my( $ip, $cs,
\&       $flags,$fl,$fh,
\&       $ax,$al,$ah, $bx,$bl,$bh, $cx,$cl,$ch, $dx,$dl,$dh,
\&       $si, $di, $bp, $ds, $es ) =
\&   unpack( \*(Aqv2\*(Aq . (\*(AqvXXCC\*(Aq x 5) . \*(Aqv5\*(Aq, $frame );
.Ve
.PP
(The clumsy construction of the template can be avoided \- just read on!)
.PP
We've taken some pains to construct the template so that it matches
the contents of our frame buffer. Otherwise we'd either get undefined values,
or \f(CW\*(C`unpack\*(C'\fR could not unpack all.
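To watch the combined template at work you need a frame to feed it. The sketch below fabricates one with pack; the register contents are made-up values chosen only so that the high and low bytes are easy to tell apart.

```perl
use strict;
use warnings;

# Hypothetical register contents, stored as 12 little-endian 16 bit words:
# IP, CS, FLAGS, AX, BX, CX, DX, SI, DI, BP, DS, ES.
my @regs  = (0x0100, 0x2000, 0x0246, 0x1234, 0x5678, 0x9ABC, 0xDEF0,
             0x0001, 0x0002, 0x0003, 0x0004, 0x0005);
my $frame = pack('v12', @regs);

# vXXCC reads a 16 bit word, backs up two bytes, then reads the same
# two bytes again as separate unsigned chars (low byte first).
my( $ip, $cs,
    $flags,$fl,$fh,
    $ax,$al,$ah, $bx,$bl,$bh, $cx,$cl,$ch, $dx,$dl,$dh,
    $si, $di, $bp, $ds, $es ) =
  unpack( 'v2' . ('vXXCC' x 5) . 'v5', $frame );

printf "AX=%04X AL=%02X AH=%02X\n", $ax, $al, $ah;
printf "frame is %d bytes, ES=%d\n", length($frame), $es;
```

Because the 8086 is little-endian, the low byte (AL) is the first of the pair in memory and therefore the first of the two C values.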
If \f(CW\*(C`pack\*(C'\fR runs out of items, it will
supply null strings (which are coerced into zeroes whenever the pack code
says so).
.Sh "How to Eat an Egg on a Net"
.IX Subsection "How to Eat an Egg on a Net"
The pack code for big-endian (high order byte at the lowest address) is
\&\f(CW\*(C`n\*(C'\fR for 16 bit and \f(CW\*(C`N\*(C'\fR for 32 bit integers. You use these codes
if you know that your data comes from a compliant architecture, but,
surprisingly enough, you should also use these pack codes if you
exchange binary data, across the network, with some system that you
know next to nothing about. The simple reason is that this
order has been chosen as the \fInetwork order\fR, and all standard-fearing
programs ought to follow this convention. (This is, of course, a stern
backing for one of the Lilliputian parties and may well influence the
political development there.) So, if the protocol expects you to send
a message by sending the length first, followed by just so many bytes,
you could write:
.PP
.Vb 1
\&   my $buf = pack( \*(AqN\*(Aq, length( $msg ) ) . $msg;
.Ve
.PP
or even:
.PP
.Vb 1
\&   my $buf = pack( \*(AqNA*\*(Aq, length( $msg ), $msg );
.Ve
.PP
and pass \f(CW$buf\fR to your send routine. Some protocols demand that the
count should include the length of the count itself: then just add 4
to the data length. (But make sure to read \*(L"Lengths and Widths\*(R" before
you really code this!)
.Sh "Byte-order modifiers"
.IX Subsection "Byte-order modifiers"
In the previous sections we've learned how to use \f(CW\*(C`n\*(C'\fR, \f(CW\*(C`N\*(C'\fR, \f(CW\*(C`v\*(C'\fR and
\&\f(CW\*(C`V\*(C'\fR to pack and unpack integers with big\- or little-endian byte-order.
While this is nice, it's still rather limited because it leaves out all
kinds of signed integers as well as 64\-bit integers.
For example, if you
wanted to unpack a sequence of signed big-endian 16\-bit integers in a
platform-independent way, you would have to write:
.PP
.Vb 1
\&   my @data = unpack \*(Aqs*\*(Aq, pack \*(AqS*\*(Aq, unpack \*(Aqn*\*(Aq, $buf;
.Ve
.PP
This is ugly. As of Perl 5.9.2, there's a much nicer way to express your
desire for a certain byte-order: the \f(CW\*(C`>\*(C'\fR and \f(CW\*(C`<\*(C'\fR modifiers.
\&\f(CW\*(C`>\*(C'\fR is the big-endian modifier, while \f(CW\*(C`<\*(C'\fR is the little-endian
modifier. Using them, we could rewrite the above code as:
.PP
.Vb 1
\&   my @data = unpack \*(Aqs>*\*(Aq, $buf;
.Ve
.PP
As you can see, the \*(L"big end\*(R" of the arrow touches the \f(CW\*(C`s\*(C'\fR, which is a
nice way to remember that \f(CW\*(C`>\*(C'\fR is the big-endian modifier. The same
obviously works for \f(CW\*(C`<\*(C'\fR, where the \*(L"little end\*(R" touches the code.
.PP
You will probably find these modifiers even more useful if you have
to deal with big\- or little-endian C structures. Be sure to read
\&\*(L"Packing and Unpacking C Structures\*(R" for more on that.
.Sh "Floating point Numbers"
.IX Subsection "Floating point Numbers"
For packing floating point numbers you have the choice between the
pack codes \f(CW\*(C`f\*(C'\fR, \f(CW\*(C`d\*(C'\fR, \f(CW\*(C`F\*(C'\fR and \f(CW\*(C`D\*(C'\fR. \f(CW\*(C`f\*(C'\fR and \f(CW\*(C`d\*(C'\fR pack into (or unpack
from) single-precision or double-precision representation as it is provided
by your system. If your system supports it, \f(CW\*(C`D\*(C'\fR can be used to pack and
unpack extended-precision floating point values (\f(CW\*(C`long double\*(C'\fR), which
can offer even more resolution than \f(CW\*(C`f\*(C'\fR or \f(CW\*(C`d\*(C'\fR. \f(CW\*(C`F\*(C'\fR packs an \f(CW\*(C`NV\*(C'\fR,
which is the floating point type used by Perl internally.
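A quick way to see what the different floating point codes cost you is to push one value through each of them and compare. The sketch below uses 0.1 as an arbitrary test value because it has no exact binary representation; the byte counts it prints depend on your system, so none are promised here.

```perl
use strict;
use warnings;

my $x = 0.1;    # arbitrary value with no exact binary representation

my ($f) = unpack('f', pack('f', $x));    # via single precision
my ($d) = unpack('d', pack('d', $x));    # via double precision
my ($n) = unpack('F', pack('F', $x));    # via Perl's own NV

printf "f: %d bytes, %.17g\n", length(pack('f', $x)), $f;
printf "d: %d bytes, %.17g\n", length(pack('d', $x)), $d;
printf "F: %d bytes, %.17g\n", length(pack('F', $x)), $n;
```

Only the F round trip is guaranteed to hand back exactly the value Perl started with; the single-precision copy differs from it after about seven significant digits.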
(There
is no such thing as a network representation for reals, so if you want
to send your real numbers across computer boundaries, you'd better stick
to \s-1ASCII\s0 representation, unless you're absolutely sure what's on the other
end of the line. For the even more adventuresome, you can use the byte-order
modifiers from the previous section also on floating point codes.)
.SH "Exotic Templates"
.IX Header "Exotic Templates"
.Sh "Bit Strings"
.IX Subsection "Bit Strings"
Bits are the atoms in the memory world. Access to individual bits may
have to be used either as a last resort or because it is the most
convenient way to handle your data. Bit string (un)packing converts
between strings containing a series of \f(CW0\fR and \f(CW1\fR characters and
a sequence of bytes each containing a group of 8 bits. This is almost
as simple as it sounds, except that there are two ways the contents of
a byte may be written as a bit string. Let's have a look at an annotated
byte:
.PP
.Vb 5
\&     7 6 5 4 3 2 1 0
\&   +\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-+
\&   | 1 0 0 0 1 1 0 0 |
\&   +\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-+
\&    MSB           LSB
.Ve
.PP
It's egg-eating all over again: Some think that as a bit string this should
be written \*(L"10001100\*(R" i.e. beginning with the most significant bit, others
insist on \*(L"00110001\*(R". Well, Perl isn't biased, so that's why we have two bit
string codes:
.PP
.Vb 2
\&   $byte = pack( \*(AqB8\*(Aq, \*(Aq10001100\*(Aq ); # start with MSB
\&   $byte = pack( \*(Aqb8\*(Aq, \*(Aq00110001\*(Aq ); # start with LSB
.Ve
.PP
It is not possible to pack or unpack bit fields \- just integral bytes.
\&\f(CW\*(C`pack\*(C'\fR always starts at the next byte boundary and \*(L"rounds up\*(R" to the
next multiple of 8 by adding zero bits as required. (If you do want bit
fields, there is \*(L"vec\*(R" in perlfunc.
Or you could implement bit field
handling at the character string level, using split, substr, and
concatenation on unpacked bit strings.)
.PP
To illustrate unpacking for bit strings, we'll decompose a simple
status register (a \*(L"\-\*(R" stands for a \*(L"reserved\*(R" bit):
.PP
.Vb 4
\&   +\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-+\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-+
\&   | S Z \- A \- P \- C | \- \- \- \- O D I T |
\&   +\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-+\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-+
\&    MSB           LSB MSB           LSB
.Ve
.PP
Converting these two bytes to a string can be done with the unpack
template \f(CW\*(Aqb16\*(Aq\fR. To obtain the individual bit values from the bit
string we use \f(CW\*(C`split\*(C'\fR with the \*(L"empty\*(R" separator pattern which dissects
into individual characters. Bit values from the \*(L"reserved\*(R" positions are
simply assigned to \f(CW\*(C`undef\*(C'\fR, a convenient notation for \*(L"I don't care where
this goes\*(R".
.PP
.Vb 3
\&   ($carry, undef, $parity, undef, $auxcarry, undef, $zero, $sign,
\&    $trace, $interrupt, $direction, $overflow) =
\&      split( //, unpack( \*(Aqb16\*(Aq, $status ) );
.Ve
.PP
We could have used an unpack template \f(CW\*(Aqb12\*(Aq\fR just as well, since the
last 4 bits can be ignored anyway.
.Sh "Uuencoding"
.IX Subsection "Uuencoding"