<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd"><!--NewPage--><HTML><HEAD><!-- Generated by javadoc (build 1.5.0_06) on Wed Sep 27 16:03:13 PDT 2006 --><TITLE>ARCWriter (Heritrix 1.10.1)</TITLE><META NAME="keywords" CONTENT="org.archive.io.arc.ARCWriter class"><LINK REL ="stylesheet" TYPE="text/css" HREF="../../../../stylesheet.css" TITLE="Style"><SCRIPT type="text/javascript">function windowTitle(){    parent.document.title="ARCWriter (Heritrix 1.10.1)";}</SCRIPT><NOSCRIPT></NOSCRIPT></HEAD><BODY BGCOLOR="white" onload="windowTitle();"><!-- ========= START OF TOP NAVBAR ======= --><A NAME="navbar_top"><!-- --></A><A HREF="#skip-navbar_top" title="Skip navigation links"></A><TABLE BORDER="0" WIDTH="100%" CELLPADDING="1" CELLSPACING="0" SUMMARY=""><TR><TD COLSPAN=2 BGCOLOR="#EEEEFF" CLASS="NavBarCell1"><A NAME="navbar_top_firstrow"><!-- --></A><TABLE BORDER="0" CELLPADDING="0" CELLSPACING="3" SUMMARY="">  <TR ALIGN="center" VALIGN="top">  <TD BGCOLOR="#EEEEFF" CLASS="NavBarCell1">    <A HREF="../../../../overview-summary.html"><FONT CLASS="NavBarFont1"><B>Overview</B></FONT></A>&nbsp;</TD>  <TD BGCOLOR="#EEEEFF" CLASS="NavBarCell1">    <A HREF="package-summary.html"><FONT CLASS="NavBarFont1"><B>Package</B></FONT></A>&nbsp;</TD>  <TD BGCOLOR="#FFFFFF" CLASS="NavBarCell1Rev"> &nbsp;<FONT CLASS="NavBarFont1Rev"><B>Class</B></FONT>&nbsp;</TD>  <TD BGCOLOR="#EEEEFF" CLASS="NavBarCell1">    <A HREF="class-use/ARCWriter.html"><FONT CLASS="NavBarFont1"><B>Use</B></FONT></A>&nbsp;</TD>  <TD BGCOLOR="#EEEEFF" CLASS="NavBarCell1">    <A HREF="package-tree.html"><FONT CLASS="NavBarFont1"><B>Tree</B></FONT></A>&nbsp;</TD>  <TD BGCOLOR="#EEEEFF" CLASS="NavBarCell1">    <A HREF="../../../../deprecated-list.html"><FONT CLASS="NavBarFont1"><B>Deprecated</B></FONT></A>&nbsp;</TD>  <TD BGCOLOR="#EEEEFF" CLASS="NavBarCell1">    <A HREF="../../../../index-all.html"><FONT CLASS="NavBarFont1"><B>Index</B></FONT></A>&nbsp;</TD>  <TD 
BGCOLOR="#EEEEFF" CLASS="NavBarCell1">    <A HREF="../../../../help-doc.html"><FONT CLASS="NavBarFont1"><B>Help</B></FONT></A>&nbsp;</TD>  </TR></TABLE></TD><TD ALIGN="right" VALIGN="top" ROWSPAN=3><EM></EM></TD></TR><TR><TD BGCOLOR="white" CLASS="NavBarCell2"><FONT SIZE="-2">&nbsp;<A HREF="../../../../org/archive/io/arc/ARCUtils.html" title="class in org.archive.io.arc"><B>PREV CLASS</B></A>&nbsp;&nbsp;<A HREF="../../../../org/archive/io/arc/ARCWriterPool.html" title="class in org.archive.io.arc"><B>NEXT CLASS</B></A></FONT></TD><TD BGCOLOR="white" CLASS="NavBarCell2"><FONT SIZE="-2">  <A HREF="../../../../index.html?org/archive/io/arc/ARCWriter.html" target="_top"><B>FRAMES</B></A>  &nbsp;&nbsp;<A HREF="ARCWriter.html" target="_top"><B>NO FRAMES</B></A>  &nbsp;&nbsp;<SCRIPT type="text/javascript">  <!--  if(window==top) {    document.writeln('<A HREF="../../../../allclasses-noframe.html"><B>All Classes</B></A>');  }  //--></SCRIPT><NOSCRIPT>  <A HREF="../../../../allclasses-noframe.html"><B>All Classes</B></A></NOSCRIPT></FONT></TD></TR><TR><TD VALIGN="top" CLASS="NavBarCell3"><FONT SIZE="-2">  SUMMARY:&nbsp;NESTED&nbsp;|&nbsp;<A HREF="#fields_inherited_from_class_org.archive.io.WriterPoolMember">FIELD</A>&nbsp;|&nbsp;<A HREF="#constructor_summary">CONSTR</A>&nbsp;|&nbsp;<A HREF="#method_summary">METHOD</A></FONT></TD><TD VALIGN="top" CLASS="NavBarCell3"><FONT SIZE="-2">DETAIL:&nbsp;FIELD&nbsp;|&nbsp;<A HREF="#constructor_detail">CONSTR</A>&nbsp;|&nbsp;<A HREF="#method_detail">METHOD</A></FONT></TD></TR></TABLE><A NAME="skip-navbar_top"></A><!-- ========= END OF TOP NAVBAR ========= --><HR><!-- ======== START OF CLASS DATA ======== --><H2><FONT SIZE="-1">org.archive.io.arc</FONT><BR>Class ARCWriter</H2><PRE>java.lang.Object  <IMG SRC="../../../../resources/inherit.gif" ALT="extended by "><A HREF="../../../../org/archive/io/WriterPoolMember.html" title="class in org.archive.io">org.archive.io.WriterPoolMember</A>      <IMG SRC="../../../../resources/inherit.gif" 
ALT="extended by "><B>org.archive.io.arc.ARCWriter</B></PRE><DL><DT><B>All Implemented Interfaces:</B> <DD><A HREF="../../../../org/archive/io/arc/ARCConstants.html" title="interface in org.archive.io.arc">ARCConstants</A>, <A HREF="../../../../org/archive/io/ArchiveFileConstants.html" title="interface in org.archive.io">ArchiveFileConstants</A></DD></DL><HR><DL><DT><PRE>public class <B>ARCWriter</B><DT>extends <A HREF="../../../../org/archive/io/WriterPoolMember.html" title="class in org.archive.io">WriterPoolMember</A><DT>implements <A HREF="../../../../org/archive/io/arc/ARCConstants.html" title="interface in org.archive.io.arc">ARCConstants</A></DL></PRE><P>Writes ARC files. This class assumes that the caller manages access to this ARCWriter, ensuring that only one thread of control accesses this ARC file instance at any one time. <p>ARC files are described here: <a href="http://www.archive.org/web/researcher/ArcFileFormat.php">Arc File Format</a>.  This class writes version 1 of the ARC file format.  It also writes version 1.1, which is version 1 with data stuffed into the body of the first ARC record in the file, the ARC file meta record itself. <p>An ARC file consists of three lines of metadata, followed by an optional 'body', then a couple of '\n' characters, and then: record, '\n', record, '\n', record, etc. When writing compressed ARC files, each ARC record is gzipped individually and the results are concatenated to make up a single ARC file. In GZIP terms, each ARC record is a GZIP <i>member</i> of one overall gzipped file. <p>The GZIPping of the ARC file metadata is an exception.  It is GZIPped with an extended GZIP header that carries a special Internet Archive (IA) extra header field (i.e. FEXTRA is set in the GZIP header FLG field and an extra field is appended to the GZIP header).  The extra field holds little data, but its presence marks this GZIP as an Internet Archive gzipped ARC.  See RFC 1952 for the GZIP header structure. 
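<p>The one-member-per-record layout can be sketched with nothing but <code>java.util.zip</code>. This is a minimal, hypothetical illustration, not ARCWriter's code: the class <code>GzipMemberSketch</code>, its <code>NonClosingOutputStream</code> wrapper, and the in-memory "file" are assumptions made for the example. Each record gets a fresh <code>GZIPOutputStream</code> whose <code>close()</code> is kept from reaching the shared stream, and the start offset of every member is recorded so that a single record can later be inflated on its own.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.FilterOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class GzipMemberSketch {

    /** Hypothetical wrapper: keeps the shared stream open when a per-record GZIPOutputStream closes. */
    static final class NonClosingOutputStream extends FilterOutputStream {
        NonClosingOutputStream(OutputStream out) { super(out); }

        @Override
        public void write(byte[] b, int off, int len) throws IOException {
            out.write(b, off, len); // bulk write instead of FilterOutputStream's byte-at-a-time default
        }

        @Override
        public void close() throws IOException {
            flush(); // deliberately does NOT close the underlying stream
        }
    }

    /** Writes each record as its own GZIP member and returns each member's start offset. */
    public static long[] writeMembers(ByteArrayOutputStream file, String[] records) throws IOException {
        long[] offsets = new long[records.length];
        int i = 0;
        for (String record : records) {
            offsets[i++] = file.size();
            // A fresh GZIPOutputStream per record: close() writes this member's trailer
            // (CRC32 + ISIZE), but through the wrapper it leaves 'file' open.
            try (GZIPOutputStream gz = new GZIPOutputStream(new NonClosingOutputStream(file))) {
                gz.write(record.getBytes(StandardCharsets.UTF_8));
            }
        }
        return offsets;
    }

    /** Reads a stream to the end and decodes it as UTF-8. */
    static String readAll(InputStream in) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        byte[] chunk = new byte[4096];
        int n;
        while ((n = in.read(chunk)) != -1) {
            buf.write(chunk, 0, n);
        }
        return new String(buf.toByteArray(), StandardCharsets.UTF_8);
    }

    /** Inflates only the member occupying bytes [start, end) of the file. */
    public static String readMember(byte[] file, long start, long end) throws IOException {
        ByteArrayInputStream slice = new ByteArrayInputStream(file, (int) start, (int) (end - start));
        try (GZIPInputStream gz = new GZIPInputStream(slice)) {
            return readAll(gz);
        }
    }

    /** Inflates the whole file; GZIPInputStream continues across concatenated members. */
    public static String readWhole(byte[] file) throws IOException {
        try (GZIPInputStream gz = new GZIPInputStream(new ByteArrayInputStream(file))) {
            return readAll(gz);
        }
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream file = new ByteArrayOutputStream();
        String[] records = {"record one\n", "record two\n", "record three\n"};
        long[] offsets = writeMembers(file, records);
        byte[] bytes = file.toByteArray();
        System.out.print(readWhole(bytes));
        System.out.print(readMember(bytes, offsets[1], offsets[2]));
    }
}
```

<p>Keeping the offsets is what makes per-record members useful: a reader can seek to one member and inflate it without touching the rest of the file. On current JDKs, <code>GZIPInputStream</code> also reads straight through concatenated members; that behaviour is worth confirming on whichever JDK you target.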
<p>This class then does its GZIPping as follows.  Each GZIP member is written with a new instance of GZIPOutputStream (actually ARCWriterGZIPOutputStream, so that we can get access to the underlying stream). The underlying stream stays open across GZIPOutputStream instantiations. For the 'special' GZIPping of the ARC file metadata, we cheat by catching the GZIPOutputStream output in a byte array and manipulating it to add the IA GZIP header before writing it to the stream. <p>I tried writing a resettable GZIPOutputStream and could make it work with the Sun JDK, but the IBM JDK threw an NPE inside deflate.reset (its native zlib call doesn't seem to like the notion of resetting), so I gave up on it. <p>Because of the above and troubles with GZIPInputStream, we should write our own GZIP*Streams, ones that are resettable and aware of gzip members. <p>This class writes until the current file reaches or exceeds maxSize.  The check is done at record boundaries, so records never span ARC files.  When the limit is reached, the current file is closed, another is opened, and writing continues. <p><b>TESTING: </b>Here is how to test that the produced ARC files are good using the <a href="http://www.archive.org/web/researcher/tool_documentation.php">alexa ARC c-tools</a>: <pre>
 % av_procarc hx20040109230030-0.arc.gz | av_ziparc > \
     /tmp/hx20040109230030-0.dat.gz
 % av_ripdat /tmp/hx20040109230030-0.dat.gz > /tmp/hx20040109230030-0.cdx
</pre> Examine the produced cdx file to make sure it makes sense.  Search for 'no-type 0'.  If found, we are opening a gzip record without any data to write, which is bad. <p>You can also run <code>gzip -t FILENAME</code>; it will tell you whether the ARC is valid GZIP.  
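<p>The "catch the output in a byte array and add the IA header" step can be sketched as follows. This is a hypothetical illustration under stated assumptions, not the Heritrix implementation: the class and method names are made up, and the extra-field bytes used in <code>main</code> (one RFC 1952 subfield with ID <code>'L','X'</code> and four zero bytes) are placeholders, because the actual value of <code>ARC_GZIP_EXTRA_FIELD</code> is not shown on this page. The sketch sets FEXTRA in FLG, writes XLEN (little-endian) right after the 10-byte fixed header as RFC 1952 specifies, and leaves the deflate data and the CRC32/ISIZE trailer untouched.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class GzipExtraFieldSketch {

    static final int FLG_OFFSET = 3;        // RFC 1952 header: ID1 ID2 CM FLG ...
    static final int FEXTRA = 0x04;         // FLG bit 2: an extra field follows the fixed header
    static final int FIXED_HEADER_LEN = 10; // ID1 ID2 CM FLG MTIME(4) XFL OS

    /** Gzips 'data' into memory, then splices XLEN + 'extra' into the header and sets FEXTRA. */
    public static byte[] gzipWithExtra(byte[] data, byte[] extra) throws IOException {
        if (extra.length > 0xffff) {
            throw new IllegalArgumentException("XLEN is a 16-bit field");
        }
        // 1. Catch the ordinary GZIPOutputStream output in a byte array.
        ByteArrayOutputStream plain = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(plain)) {
            gz.write(data);
        }
        byte[] member = plain.toByteArray();
        if ((member[FLG_OFFSET] & FEXTRA) != 0) {
            throw new IllegalStateException("header already carries an extra field");
        }
        // 2. Rebuild the member: fixed header, XLEN (low byte first), extra bytes, then the
        //    unchanged deflate data and trailer. CRC32/ISIZE cover only the uncompressed
        //    data, so editing the header does not invalidate them.
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        out.write(member, 0, FIXED_HEADER_LEN);
        out.write(extra.length & 0xff);
        out.write((extra.length >>> 8) & 0xff);
        out.write(extra, 0, extra.length);
        out.write(member, FIXED_HEADER_LEN, member.length - FIXED_HEADER_LEN);
        byte[] result = out.toByteArray();
        // 3. Announce the extra field in FLG.
        result[FLG_OFFSET] |= FEXTRA;
        return result;
    }

    /** Returns the extra-field bytes of a member whose FLG has FEXTRA set. */
    public static byte[] extraField(byte[] member) {
        int xlen = (member[FIXED_HEADER_LEN] & 0xff) | ((member[FIXED_HEADER_LEN + 1] & 0xff) << 8);
        return Arrays.copyOfRange(member, FIXED_HEADER_LEN + 2, FIXED_HEADER_LEN + 2 + xlen);
    }

    /** Inflates a member; java.util.zip.GZIPInputStream skips the extra field itself. */
    public static byte[] gunzip(byte[] member) throws IOException {
        try (InputStream in = new GZIPInputStream(new ByteArrayInputStream(member))) {
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            byte[] chunk = new byte[4096];
            int n;
            while ((n = in.read(chunk)) != -1) {
                buf.write(chunk, 0, n);
            }
            return buf.toByteArray();
        }
    }

    public static void main(String[] args) throws IOException {
        // Placeholder extra field (hypothetical): subfield ID 'L','X', LEN = 4, four zero bytes.
        byte[] extra = {'L', 'X', 4, 0, 0, 0, 0, 0};
        byte[] meta = "hypothetical ARC file meta record\n".getBytes(StandardCharsets.UTF_8);
        byte[] member = gzipWithExtra(meta, extra);
        System.out.println(Arrays.equals(extraField(member), extra));
        System.out.print(new String(gunzip(member), StandardCharsets.UTF_8));
    }
}
```

<p>Splicing after the fact works because no checksum covers the header unless FHCRC is set, so a standard reader such as <code>GZIPInputStream</code>, which skips the extra field when FEXTRA is set, still inflates the result.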
<p>While being written, ARCs have a '.open' suffix appended.<P><P><DL><DT><B>Author:</B></DT>  <DD>stack</DD></DL><HR><P><!-- =========== FIELD SUMMARY =========== --><A NAME="field_summary"><!-- --></A><TABLE BORDER="1" WIDTH="100%" CELLPADDING="3" CELLSPACING="0" SUMMARY=""><TR BGCOLOR="#CCCCFF" CLASS="TableHeadingColor"><TH ALIGN="left" COLSPAN="2"><FONT SIZE="+2"><B>Field Summary</B></FONT></TH></TR></TABLE>&nbsp;<A NAME="fields_inherited_from_class_org.archive.io.WriterPoolMember"><!-- --></A><TABLE BORDER="1" WIDTH="100%" CELLPADDING="3" CELLSPACING="0" SUMMARY=""><TR BGCOLOR="#EEEEFF" CLASS="TableSubHeadingColor"><TH ALIGN="left"><B>Fields inherited from class org.archive.io.<A HREF="../../../../org/archive/io/WriterPoolMember.html" title="class in org.archive.io">WriterPoolMember</A></B></TH></TR><TR BGCOLOR="white" CLASS="TableRowColor"><TD><CODE><A HREF="../../../../org/archive/io/WriterPoolMember.html#DEFAULT_PREFIX">DEFAULT_PREFIX</A>, <A HREF="../../../../org/archive/io/WriterPoolMember.html#DEFAULT_SUFFIX">DEFAULT_SUFFIX</A>, <A HREF="../../../../org/archive/io/WriterPoolMember.html#HOSTNAME_VARIABLE">HOSTNAME_VARIABLE</A>, <A HREF="../../../../org/archive/io/WriterPoolMember.html#UTF8">UTF8</A></CODE></TD></TR></TABLE>&nbsp;<A NAME="fields_inherited_from_class_org.archive.io.arc.ARCConstants"><!-- --></A><TABLE BORDER="1" WIDTH="100%" CELLPADDING="3" CELLSPACING="0" SUMMARY=""><TR BGCOLOR="#EEEEFF" CLASS="TableSubHeadingColor"><TH ALIGN="left"><B>Fields inherited from interface org.archive.io.arc.<A HREF="../../../../org/archive/io/arc/ARCConstants.html" title="interface in org.archive.io.arc">ARCConstants</A></B></TH></TR><TR BGCOLOR="white" CLASS="TableRowColor"><TD><CODE><A HREF="../../../../org/archive/io/arc/ARCConstants.html#ARC_FILE_EXTENSION">ARC_FILE_EXTENSION</A>, <A HREF="../../../../org/archive/io/arc/ARCConstants.html#ARC_GZIP_EXTRA_FIELD">ARC_GZIP_EXTRA_FIELD</A>, <A 
HREF="../../../../org/archive/io/arc/ARCConstants.html#ARC_MAGIC_NUMBER">ARC_MAGIC_NUMBER</A>, <A HREF="../../../../org/archive/io/arc/ARCConstants.html#CHECKSUM_FIELD_KEY">CHECKSUM_FIELD_KEY</A>, <A HREF="../../../../org/archive/io/arc/ARCConstants.html#CHECKSUM_HEADER_FIELD_KEY">CHECKSUM_HEADER_FIELD_KEY</A>, <A HREF="../../../../org/archive/io/arc/ARCConstants.html#CODE_HEADER_FIELD_KEY">CODE_HEADER_FIELD_KEY</A>, <A HREF="../../../../org/archive/io/arc/ARCConstants.html#COMPRESSED_ARC_FILE_EXTENSION">COMPRESSED_ARC_FILE_EXTENSION</A>, <A HREF="../../../../org/archive/io/arc/ARCConstants.html#DEFAULT_ENCODING">DEFAULT_ENCODING</A>, <A HREF="../../../../org/archive/io/arc/ARCConstants.html#DEFAULT_GZIP_HEADER_LENGTH">DEFAULT_GZIP_HEADER_LENGTH</A>, <A HREF="../../../../org/archive/io/arc/ARCConstants.html#DEFAULT_MAX_ARC_FILE_SIZE">DEFAULT_MAX_ARC_FILE_SIZE</A>, <A HREF="../../../../org/archive/io/arc/ARCConstants.html#DOT_ARC_FILE_EXTENSION">DOT_ARC_FILE_EXTENSION</A>, <A HREF="../../../../org/archive/io/arc/ARCConstants.html#DOT_COMPRESSED_ARC_FILE_EXTENSION">DOT_COMPRESSED_ARC_FILE_EXTENSION</A>, <A HREF="../../../../org/archive/io/arc/ARCConstants.html#DOT_COMPRESSED_FILE_EXTENSION">DOT_COMPRESSED_FILE_EXTENSION</A>, <A HREF="../../../../org/archive/io/arc/ARCConstants.html#FILENAME_FIELD_KEY">FILENAME_FIELD_KEY</A>, <A HREF="../../../../org/archive/io/arc/ARCConstants.html#FILENAME_HEADER_FIELD_KEY">FILENAME_HEADER_FIELD_KEY</A>, <A HREF="../../../../org/archive/io/arc/ARCConstants.html#GZIP_HEADER_BEGIN">GZIP_HEADER_BEGIN</A>, <A HREF="../../../../org/archive/io/arc/ARCConstants.html#HEADER_FIELD_SEPARATOR">HEADER_FIELD_SEPARATOR</A>, <A HREF="../../../../org/archive/io/arc/ARCConstants.html#IP_HEADER_FIELD_KEY">IP_HEADER_FIELD_KEY</A>, <A HREF="../../../../org/archive/io/arc/ARCConstants.html#LINE_SEPARATOR">LINE_SEPARATOR</A>, <A HREF="../../../../org/archive/io/arc/ARCConstants.html#LOCATION_HEADER_FIELD_KEY">LOCATION_HEADER_FIELD_KEY</A>, <A 
HREF="../../../../org/archive/io/arc/ARCConstants.html#MAX_METADATA_LINE_LENGTH">MAX_METADATA_LINE_LENGTH</A>, <A HREF="../../../../org/archive/io/arc/ARCConstants.html#MINIMUM_RECORD_LENGTH">MINIMUM_RECORD_LENGTH</A>, <A HREF="../../../../org/archive/io/arc/ARCConstants.html#OFFSET_FIELD_KEY">OFFSET_FIELD_KEY</A>, <A HREF="../../../../org/archive/io/arc/ARCConstants.html#OFFSET_HEADER_FIELD_KEY">OFFSET_HEADER_FIELD_KEY</A>, <A HREF="../../../../org/archive/io/arc/ARCConstants.html#REQUIRED_VERSION_1_HEADER_FIELDS">REQUIRED_VERSION_1_HEADER_FIELDS</A>, <A HREF="../../../../org/archive/io/arc/ARCConstants.html#STATUSCODE_FIELD_KEY">STATUSCODE_FIELD_KEY</A>, <A HREF="../../../../org/archive/io/arc/ARCConstants.html#TOKENIZED_PREFIX">TOKENIZED_PREFIX</A></CODE></TD></TR></TABLE>&nbsp;<A NAME="fields_inherited_from_class_org.archive.io.ArchiveFileConstants"><!-- --></A><TABLE BORDER="1" WIDTH="100%" CELLPADDING="3" CELLSPACING="0" SUMMARY=""><TR BGCOLOR="#EEEEFF" CLASS="TableSubHeadingColor"><TH ALIGN="left"><B>Fields inherited from interface org.archive.io.<A HREF="../../../../org/archive/io/ArchiveFileConstants.html" title="interface in org.archive.io">ArchiveFileConstants</A></B></TH></TR><TR BGCOLOR="white" CLASS="TableRowColor"><TD><CODE><A HREF="../../../../org/archive/io/ArchiveFileConstants.html#ABSOLUTE_OFFSET_KEY">ABSOLUTE_OFFSET_KEY</A>, <A HREF="../../../../org/archive/io/ArchiveFileConstants.html#CDX">CDX</A>, <A HREF="../../../../org/archive/io/ArchiveFileConstants.html#CDX_FILE">CDX_FILE</A>, <A HREF="../../../../org/archive/io/ArchiveFileConstants.html#CDX_LINE_BUFFER_SIZE">CDX_LINE_BUFFER_SIZE</A>, <A HREF="../../../../org/archive/io/ArchiveFileConstants.html#COMPRESSED_FILE_EXTENSION">COMPRESSED_FILE_EXTENSION</A>, <A HREF="../../../../org/archive/io/ArchiveFileConstants.html#CRLF">CRLF</A>, <A HREF="../../../../org/archive/io/ArchiveFileConstants.html#DATE_FIELD_KEY">DATE_FIELD_KEY</A>, <A 
HREF="../../../../org/archive/io/ArchiveFileConstants.html#DEFAULT_DIGEST_METHOD">DEFAULT_DIGEST_METHOD</A>, <A HREF="../../../../org/archive/io/ArchiveFileConstants.html#DUMP">DUMP</A>, <A HREF="../../../../org/archive/io/ArchiveFileConstants.html#GZIP_DUMP">GZIP_DUMP</A>, <A HREF="../../../../org/archive/io/ArchiveFileConstants.html#INVALID_SUFFIX">INVALID_SUFFIX</A>, <A HREF="../../../../org/archive/io/ArchiveFileConstants.html#LENGTH_FIELD_KEY">LENGTH_FIELD_KEY</A>, <A HREF="../../../../org/archive/io/ArchiveFileConstants.html#MIMETYPE_FIELD_KEY">MIMETYPE_FIELD_KEY</A>, <A HREF="../../../../org/archive/io/ArchiveFileConstants.html#NOHEAD">NOHEAD</A>, <A HREF="../../../../org/archive/io/ArchiveFileConstants.html#OCCUPIED_SUFFIX">OCCUPIED_SUFFIX</A>, <A HREF="../../../../org/archive/io/ArchiveFileConstants.html#READER_IDENTIFIER_FIELD_KEY">READER_IDENTIFIER_FIELD_KEY</A>, <A HREF="../../../../org/archive/io/ArchiveFileConstants.html#RECORD_IDENTIFIER_FIELD_KEY">RECORD_IDENTIFIER_FIELD_KEY</A>, <A HREF="../../../../org/archive/io/ArchiveFileConstants.html#SINGLE_SPACE">SINGLE_SPACE</A>, <A HREF="../../../../org/archive/io/ArchiveFileConstants.html#TYPE_FIELD_KEY">TYPE_FIELD_KEY</A>, <A HREF="../../../../org/archive/io/ArchiveFileConstants.html#URL_FIELD_KEY">URL_FIELD_KEY</A>, <A HREF="../../../../org/archive/io/ArchiveFileConstants.html#VERSION_FIELD_KEY">VERSION_FIELD_KEY</A></CODE></TD></TR></TABLE>&nbsp;<!-- ======== CONSTRUCTOR SUMMARY ======== --><A NAME="constructor_summary"><!-- --></A><TABLE BORDER="1" WIDTH="100%" CELLPADDING="3" CELLSPACING="0" SUMMARY=""><TR BGCOLOR="#CCCCFF" CLASS="TableHeadingColor"><TH ALIGN="left" COLSPAN="2"><FONT SIZE="+2"><B>Constructor Summary</B></FONT></TH></TR><TR BGCOLOR="white" CLASS="TableRowColor"><TD><CODE><B><A HREF="../../../../org/archive/io/arc/ARCWriter.html#ARCWriter(java.util.concurrent.atomic.AtomicInteger, java.util.List, java.lang.String, boolean, 
int)">ARCWriter</A></B>(java.util.concurrent.atomic.AtomicInteger&nbsp;serialNo,          java.util.List&lt;java.io.File&gt;&nbsp;dirs,          java.lang.String&nbsp;prefix,          boolean&nbsp;cmprs,
