By Tom Christiansen and Nathan Torkington. ISBN 1-56592-243-3. First Edition, published August 1998.
<HTML><HEAD><TITLE>Recipe 8.11. Processing Binary Files (Perl Cookbook)</TITLE>
<META NAME="DC.title" CONTENT="Perl Cookbook">
<META NAME="DC.creator" CONTENT="Tom Christiansen &amp; Nathan Torkington">
<META NAME="DC.publisher" CONTENT="O'Reilly &amp; Associates, Inc.">
<META NAME="DC.date" CONTENT="1999-07-02T01:38:46Z">
<META NAME="DC.type" CONTENT="Text.Monograph">
<META NAME="DC.format" CONTENT="text/html" SCHEME="MIME">
<META NAME="DC.source" CONTENT="1-56592-243-3" SCHEME="ISBN">
<META NAME="DC.language" CONTENT="en-US">
<META NAME="generator" CONTENT="Jade 1.1/O'Reilly DocBook 3.0 to HTML 4.0">
<LINK REV="made" HREF="mailto:online-books@oreilly.com" TITLE="Online Books Comments">
<LINK REL="up" HREF="ch08_01.htm" TITLE="8. File Contents">
<LINK REL="prev" HREF="ch08_11.htm" TITLE="8.10. Removing the Last Line of a File">
<LINK REL="next" HREF="ch08_13.htm" TITLE="8.12. Using Random-Access I/O"></HEAD>
<BODY BGCOLOR="#FFFFFF">
<DIV CLASS="sect1"><H2 CLASS="sect1"><A CLASS="title" NAME="ch08-19069">8.11. 
Processing Binary Files</A></H2>
<DIV CLASS="sect2"><H3 CLASS="sect2"><A CLASS="title" NAME="ch08-pgfId-1050">Problem<A CLASS="indexterm" NAME="ch08-idx-1000004672-0"></A><A CLASS="indexterm" NAME="ch08-idx-1000004672-1"></A></A></H3>
<P CLASS="para">Your system distinguishes between text and binary files. How do you tell Perl to treat a file as binary?</P></DIV>
<DIV CLASS="sect2"><H3 CLASS="sect2"><A CLASS="title" NAME="ch08-pgfId-1056">Solution</A></H3>
<P CLASS="para">Use the <CODE CLASS="literal">binmode</CODE><A CLASS="indexterm" NAME="ch08-idx-1000004678-0"></A> function on the filehandle:</P>
<PRE CLASS="programlisting">binmode(HANDLE);</PRE></DIV>
<DIV CLASS="sect2"><H3 CLASS="sect2"><A CLASS="title" NAME="ch08-pgfId-1000005428">Discussion</A></H3>
<P CLASS="para">Not everyone agrees on what constitutes a line in a text file, because one person's textual character set is another's binary gibberish. Even when everyone is using ASCII instead of EBCDIC, Rad50, or Unicode, discrepancies arise.</P>
<P CLASS="para"><A CLASS="indexterm" NAME="ch08-idx-1000004679-0"></A><A CLASS="indexterm" NAME="ch08-idx-1000004679-1"></A>As mentioned in the Introduction, there is no such thing as a newline character. It is purely virtual, a figment of the operating system, standard libraries, device drivers, and Perl.</P>
<P CLASS="para">Under Unix or Plan 9, a <CODE CLASS="literal">&quot;\n&quot;</CODE> represents the physical sequence <CODE CLASS="literal">&quot;\cJ&quot;</CODE> (the Perl double-quote escape for Ctrl-J), a linefeed. However, on a terminal that's not in raw mode, the Enter key generates an incoming <CODE CLASS="literal">&quot;\cM&quot;</CODE> (a carriage return), which turns into <CODE CLASS="literal">&quot;\cJ&quot;</CODE>, whereas an outgoing <CODE CLASS="literal">&quot;\cJ&quot;</CODE> turns into <CODE CLASS="literal">&quot;\cM\cJ&quot;</CODE>. 
This strangeness doesn't happen with normal files, just terminal devices, and it is handled strictly by the device driver.</P>
<P CLASS="para">On a Mac (classic Mac OS, before Mac OS X), a <CODE CLASS="literal">&quot;\n&quot;</CODE> is usually represented by <CODE CLASS="literal">&quot;\cM&quot;</CODE>; just to make life interesting (and because the standard requires that <CODE CLASS="literal">&quot;\n&quot;</CODE> and <CODE CLASS="literal">&quot;\r&quot;</CODE> be different), a <CODE CLASS="literal">&quot;\r&quot;</CODE> represents a <CODE CLASS="literal">&quot;\cJ&quot;</CODE>. This is exactly the opposite of the way that Unix, Plan 9, VMS, CP/M, or nearly anyone else does it. So Mac programmers writing files for other systems or talking over a network have to be careful. If you send out <CODE CLASS="literal">&quot;\n&quot;</CODE>, you'll deliver a <CODE CLASS="literal">&quot;\cM&quot;</CODE>, and no <CODE CLASS="literal">&quot;\cJ&quot;</CODE> will be seen. Most network services prefer to receive and send <CODE CLASS="literal">&quot;\cM\cJ&quot;</CODE> as a line terminator, but most accept merely a <CODE CLASS="literal">&quot;\cJ&quot;</CODE>.</P>
<P CLASS="para">Under VMS, DOS, or their derivatives, a <CODE CLASS="literal">&quot;\n&quot;</CODE> represents <CODE CLASS="literal">&quot;\cJ&quot;</CODE>, as it does under Unix and Plan 9. From the perspective of a tty, Unix and DOS behave identically: a user who hits Enter generates a <CODE CLASS="literal">&quot;\cM&quot;</CODE>, but this arrives at the program as a <CODE CLASS="literal">&quot;\n&quot;</CODE>, which is <CODE CLASS="literal">&quot;\cJ&quot;</CODE>. A <CODE CLASS="literal">&quot;\n&quot;</CODE> (that's a <CODE CLASS="literal">&quot;\cJ&quot;</CODE>, remember) sent to a terminal shows up as a <CODE CLASS="literal">&quot;\cM\cJ&quot;</CODE>.</P>
<P CLASS="para">These strange conversions happen to Windows files as well. A DOS text file physically contains two characters at the end of every line, <CODE CLASS="literal">&quot;\cM\cJ&quot;</CODE>. 
The last block in the file may also contain a <CODE CLASS="literal">&quot;\cZ&quot;</CODE> to indicate where the text stops. When you write a line like <CODE CLASS="literal">&quot;bad</CODE> <CODE CLASS="literal">news\n&quot;</CODE> on those systems, the file contains <CODE CLASS="literal">&quot;bad</CODE> <CODE CLASS="literal">news\cM\cJ&quot;</CODE>, just as if it were a terminal.</P>
<P CLASS="para">When you read a line on such systems, it's even stranger. The file itself contains <CODE CLASS="literal">&quot;bad</CODE> <CODE CLASS="literal">news\cM\cJ&quot;</CODE>, a 10-byte string. When you read it in, your program gets nothing but <CODE CLASS="literal">&quot;bad</CODE> <CODE CLASS="literal">news\n&quot;</CODE>, where that <CODE CLASS="literal">&quot;\n&quot;</CODE> is the virtual newline character, that is, a linefeed (<CODE CLASS="literal">&quot;\cJ&quot;</CODE>). That means a single <CODE CLASS="literal">chop</CODE> or <CODE CLASS="literal">chomp</CODE> is enough to get rid of it. But your poor program has been tricked into thinking it has read only nine bytes from the file. If you were to read 10 such lines, you would appear to have read just 90 bytes into the file, but in fact you would be at position 100. That's why the <CODE CLASS="literal">tell</CODE> function must always be used to determine your location. You can't infer your position just by counting what you've read.</P>
<P CLASS="para">This legacy of the old CP/M filesystem, whose equivalent of a Unix inode stored only block counts and not file sizes, has frustrated programmers for decades, and no end is in sight. Because DOS is compatible with CP/M file formats, Windows with DOS, and NT with Windows, the sins of the fathers have truly been visited unto the children of the fourth generation.</P>
<P CLASS="para">You can circumvent the single <CODE CLASS="literal">&quot;\n&quot;</CODE> terminator by telling Perl (and the operating system) that you're working with binary data. 
The <CODE CLASS="literal">binmode</CODE> function indicates that data read or written through the given filehandle should not be mangled the way a text file would likely be on those systems.</P>
<PRE CLASS="programlisting">$gifname = &quot;picture.gif&quot;;
open(GIF, $gifname)         or die &quot;can't open $gifname: $!&quot;;

binmode(GIF);               # now DOS won't mangle binary input from GIF
binmode(STDOUT);            # now DOS won't mangle binary output to STDOUT

# copy the file to STDOUT in 8 KB chunks
while (read(GIF, $buff, 8 * 2**10)) {
    print STDOUT $buff;
}</PRE>
<P CLASS="para">Calling <CODE CLASS="literal">binmode</CODE> on systems that don't make this distinction (including Unix, the Mac, and Plan 9) is harmless. Calling it inappropriately (such as on a text file) on systems that do make the distinction (including MVS, VMS, and DOS, regardless of its GUI) can mangle your files.</P>
<P CLASS="para">If you're <EM CLASS="emphasis">not</EM> using <CODE CLASS="literal">binmode</CODE>, the data you read using stdio (&lt;&gt;) will automatically have the native system's line terminator changed to <CODE CLASS="literal">&quot;\n&quot;</CODE>, even if you change <CODE CLASS="literal">$/</CODE>. Similarly, any <CODE CLASS="literal">&quot;\n&quot;</CODE> you <CODE CLASS="literal">print</CODE> to the filehandle will be turned into the native line terminator. See this chapter's Introduction for more details.</P>
<P CLASS="para">If you want to get what was on the disk, byte for byte, you should call <CODE CLASS="literal">binmode</CODE> if you're on one of the odd systems listed above. Then, of course, you also have to set <CODE CLASS="literal">$/</CODE> to the real record separator if you want to use &lt;&gt; on it. 
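</P>
<P CLASS="para">A minimal sketch of that last point, under stated assumptions: the scratch file name <CODE CLASS="literal">binmode_demo.tmp</CODE> and the variable names are hypothetical, chosen only for illustration, and the sample lines are written with explicit <CODE CLASS="literal">&quot;\cM\cJ&quot;</CODE> terminators so that the file looks like a DOS text file on any platform. It reads the file back in binary mode with <CODE CLASS="literal">$/</CODE> set to the on-disk terminator and records what <CODE CLASS="literal">tell</CODE> reports after each record.</P>

```perl
use strict;
use warnings;

# Hypothetical scratch file name, used only for this sketch.
my $path = "binmode_demo.tmp";

# Write two DOS-style lines byte for byte; binmode prevents any translation.
open(my $out, ">", $path) or die "can't write $path: $!";
binmode($out);
print $out "bad news\cM\cJ", "good news\cM\cJ";
close($out) or die "can't close $path: $!";

# Read the raw bytes back, using the real on-disk terminator as $/.
open(my $in, "<", $path) or die "can't read $path: $!";
binmode($in);
my (@records, @offsets);
{
    local $/ = "\cM\cJ";
    while (my $line = <$in>) {
        chomp $line;                 # strips the whole "\cM\cJ" pair
        push @records, $line;
        push @offsets, tell($in);    # true byte position after each record
    }
}
close($in);
unlink($path);

print "$records[$_] ends at byte $offsets[$_]\n" for 0 .. $#records;
```

<P CLASS="para">Because both handles are in binary mode, the result should be the same on Unix and on DOS-derived systems: the two records end at byte offsets 10 and 21, which is what <CODE CLASS="literal">tell</CODE> reports, and a single <CODE CLASS="literal">chomp</CODE> removes both terminator bytes because <CODE CLASS="literal">$/</CODE> holds the full pair.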
<A CLASS="indexterm" NAME="ch08-idx-1000004674-0"></A><A CLASS="indexterm" NAME="ch08-idx-1000004674-1"></A><A CLASS="indexterm" NAME="ch08-idx-1000004674-2"></A></P></DIV>
<DIV CLASS="sect2"><H3 CLASS="sect2"><A CLASS="title" NAME="ch08-pgfId-1110">See Also</A></H3>
<P CLASS="para">The <CODE CLASS="literal">open</CODE> and <CODE CLASS="literal">binmode</CODE> functions in <I CLASS="filename">perlfunc</I>(1) and in <A CLASS="olink" HREF="../prog/ch03_01.htm">Chapter 3</A> of <A CLASS="citetitle" HREF="../prog/index.htm" TITLE="Programming Perl"><CITE CLASS="citetitle">Programming Perl</CITE></A>; your system's <I CLASS="filename">open</I>(2) and <I CLASS="filename">fopen</I>(3) manpages.</P></DIV></DIV>
<DIV CLASS="htmlnav"><P><FONT SIZE="-1"><a href="copyrght.htm">Copyright &copy; 2002</a> O'Reilly &amp; Associates. All rights reserved.</FONT></P></DIV>
</BODY></HTML>
