<TD><CODE><B><A HREF="../../../../org/archive/io/arc/ARCReader.html#gotoEOR(org.archive.io.ArchiveRecord)">gotoEOR</A></B>(<A HREF="../../../../org/archive/io/ArchiveRecord.html" title="class in org.archive.io">ArchiveRecord</A> record)</CODE><BR> Skip over any trailing new lines at end of the record so we're lined up ready to read the next.</TD></TR><TR BGCOLOR="white" CLASS="TableRowColor"><TD ALIGN="right" VALIGN="top" WIDTH="1%"><FONT SIZE="-1"><CODE>protected boolean</CODE></FONT></TD><TD><CODE><B><A HREF="../../../../org/archive/io/arc/ARCReader.html#isAlignedOnFirstRecord()">isAlignedOnFirstRecord</A></B>()</CODE><BR> </TD></TR><TR BGCOLOR="white" CLASS="TableRowColor"><TD ALIGN="right" VALIGN="top" WIDTH="1%"><FONT SIZE="-1"><CODE> boolean</CODE></FONT></TD><TD><CODE><B><A HREF="../../../../org/archive/io/arc/ARCReader.html#isParseHttpHeaders()">isParseHttpHeaders</A></B>()</CODE><BR> </TD></TR><TR BGCOLOR="white" CLASS="TableRowColor"><TD ALIGN="right" VALIGN="top" WIDTH="1%"><FONT SIZE="-1"><CODE>static void</CODE></FONT></TD><TD><CODE><B><A HREF="../../../../org/archive/io/arc/ARCReader.html#main(java.lang.String[])">main</A></B>(java.lang.String[] args)</CODE><BR> Command-line interface to ARCReader.</TD></TR><TR BGCOLOR="white" CLASS="TableRowColor"><TD ALIGN="right" VALIGN="top" WIDTH="1%"><FONT SIZE="-1"><CODE>protected static void</CODE></FONT></TD><TD><CODE><B><A HREF="../../../../org/archive/io/arc/ARCReader.html#output(org.archive.io.arc.ARCReader, java.lang.String)">output</A></B>(<A HREF="../../../../org/archive/io/arc/ARCReader.html" title="class in org.archive.io.arc">ARCReader</A> reader, java.lang.String format)</CODE><BR> Write out the arcfile.</TD></TR><TR BGCOLOR="white" CLASS="TableRowColor"><TD ALIGN="right" VALIGN="top" WIDTH="1%"><FONT SIZE="-1"><CODE>protected boolean</CODE></FONT></TD><TD><CODE><B><A HREF="../../../../org/archive/io/arc/ARCReader.html#output(java.lang.String)">output</A></B>(java.lang.String format)</CODE><BR> 
</TD></TR><TR BGCOLOR="white" CLASS="TableRowColor"><TD ALIGN="right" VALIGN="top" WIDTH="1%"><FONT SIZE="-1"><CODE>protected static void</CODE></FONT></TD><TD><CODE><B><A HREF="../../../../org/archive/io/arc/ARCReader.html#outputRecord(org.archive.io.arc.ARCReader, java.lang.String)">outputRecord</A></B>(<A HREF="../../../../org/archive/io/arc/ARCReader.html" title="class in org.archive.io.arc">ARCReader</A> r, java.lang.String format)</CODE><BR> Output passed record using passed format specifier.</TD></TR><TR BGCOLOR="white" CLASS="TableRowColor"><TD ALIGN="right" VALIGN="top" WIDTH="1%"><FONT SIZE="-1"><CODE>protected boolean</CODE></FONT></TD><TD><CODE><B><A HREF="../../../../org/archive/io/arc/ARCReader.html#outputRecord(java.lang.String)">outputRecord</A></B>(java.lang.String format)</CODE><BR> Output passed record using passed format specifier.</TD></TR><TR BGCOLOR="white" CLASS="TableRowColor"><TD ALIGN="right" VALIGN="top" WIDTH="1%"><FONT SIZE="-1"><CODE>protected void</CODE></FONT></TD><TD><CODE><B><A HREF="../../../../org/archive/io/arc/ARCReader.html#setAlignedOnFirstRecord(boolean)">setAlignedOnFirstRecord</A></B>(boolean alignedOnFirstRecord)</CODE><BR> </TD></TR><TR BGCOLOR="white" CLASS="TableRowColor"><TD ALIGN="right" VALIGN="top" WIDTH="1%"><FONT SIZE="-1"><CODE> void</CODE></FONT></TD><TD><CODE><B><A HREF="../../../../org/archive/io/arc/ARCReader.html#setParseHttpHeaders(boolean)">setParseHttpHeaders</A></B>(boolean parse)</CODE><BR> </TD></TR></TABLE> <A NAME="methods_inherited_from_class_org.archive.io.ArchiveReader"><!-- --></A><TABLE BORDER="1" WIDTH="100%" CELLPADDING="3" CELLSPACING="0" SUMMARY=""><TR BGCOLOR="#EEEEFF" CLASS="TableSubHeadingColor"><TH ALIGN="left"><B>Methods inherited from class org.archive.io.<A HREF="../../../../org/archive/io/ArchiveReader.html" title="class in org.archive.io">ArchiveReader</A></B></TH></TR><TR BGCOLOR="white" CLASS="TableRowColor"><TD><CODE><A 
HREF="../../../../org/archive/io/ArchiveReader.html#cdxOutput(boolean)">cdxOutput</A>, <A HREF="../../../../org/archive/io/ArchiveReader.html#cleanupCurrentRecord()">cleanupCurrentRecord</A>, <A HREF="../../../../org/archive/io/ArchiveReader.html#close()">close</A>, <A HREF="../../../../org/archive/io/ArchiveReader.html#currentRecord(org.archive.io.ArchiveRecord)">currentRecord</A>, <A HREF="../../../../org/archive/io/ArchiveReader.html#get()">get</A>, <A HREF="../../../../org/archive/io/ArchiveReader.html#get(long)">get</A>, <A HREF="../../../../org/archive/io/ArchiveReader.html#getCurrentRecord()">getCurrentRecord</A>, <A HREF="../../../../org/archive/io/ArchiveReader.html#getIn()">getIn</A>, <A HREF="../../../../org/archive/io/ArchiveReader.html#getInputStream()">getInputStream</A>, <A HREF="../../../../org/archive/io/ArchiveReader.html#getInputStream(java.io.File, long)">getInputStream</A>, <A HREF="../../../../org/archive/io/ArchiveReader.html#getLogger()">getLogger</A>, <A HREF="../../../../org/archive/io/ArchiveReader.html#getReaderIdentifier()">getReaderIdentifier</A>, <A HREF="../../../../org/archive/io/ArchiveReader.html#getStrippedFileName()">getStrippedFileName</A>, <A HREF="../../../../org/archive/io/ArchiveReader.html#getStrippedFileName(java.lang.String, java.lang.String)">getStrippedFileName</A>, <A HREF="../../../../org/archive/io/ArchiveReader.html#getTrueOrFalse(java.lang.String)">getTrueOrFalse</A>, <A HREF="../../../../org/archive/io/ArchiveReader.html#initialize(java.lang.String)">initialize</A>, <A HREF="../../../../org/archive/io/ArchiveReader.html#isCompressed()">isCompressed</A>, <A HREF="../../../../org/archive/io/ArchiveReader.html#isDigest()">isDigest</A>, <A HREF="../../../../org/archive/io/ArchiveReader.html#isStrict()">isStrict</A>, <A HREF="../../../../org/archive/io/ArchiveReader.html#isValid()">isValid</A>, <A HREF="../../../../org/archive/io/ArchiveReader.html#iterator()">iterator</A>, <A 
HREF="../../../../org/archive/io/ArchiveReader.html#logStdErr(java.util.logging.Level, java.lang.String)">logStdErr</A>, <A HREF="../../../../org/archive/io/ArchiveReader.html#rewind()">rewind</A>, <A HREF="../../../../org/archive/io/ArchiveReader.html#setCompressed(boolean)">setCompressed</A>, <A HREF="../../../../org/archive/io/ArchiveReader.html#setDigest(boolean)">setDigest</A>, <A HREF="../../../../org/archive/io/ArchiveReader.html#setIn(java.io.InputStream)">setIn</A>, <A HREF="../../../../org/archive/io/ArchiveReader.html#setReaderIdentifier(java.lang.String)">setReaderIdentifier</A>, <A HREF="../../../../org/archive/io/ArchiveReader.html#setStrict(boolean)">setStrict</A>, <A HREF="../../../../org/archive/io/ArchiveReader.html#setVersion(java.lang.String)">setVersion</A>, <A HREF="../../../../org/archive/io/ArchiveReader.html#stripExtension(java.lang.String, java.lang.String)">stripExtension</A>, <A HREF="../../../../org/archive/io/ArchiveReader.html#validate()">validate</A>, <A HREF="../../../../org/archive/io/ArchiveReader.html#validate(int)">validate</A></CODE></TD></TR></TABLE> <A NAME="methods_inherited_from_class_java.lang.Object"><!-- --></A><TABLE BORDER="1" WIDTH="100%" CELLPADDING="3" CELLSPACING="0" SUMMARY=""><TR BGCOLOR="#EEEEFF" CLASS="TableSubHeadingColor"><TH ALIGN="left"><B>Methods inherited from class java.lang.Object</B></TH></TR><TR BGCOLOR="white" CLASS="TableRowColor"><TD><CODE>clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait</CODE></TD></TR></TABLE> <P><!-- ============ FIELD DETAIL =========== --><A NAME="field_detail"><!-- --></A><TABLE BORDER="1" WIDTH="100%" CELLPADDING="3" CELLSPACING="0" SUMMARY=""><TR BGCOLOR="#CCCCFF" CLASS="TableHeadingColor"><TH ALIGN="left" COLSPAN="1"><FONT SIZE="+2"><B>Field Detail</B></FONT></TH></TR></TABLE><A NAME="logger"><!-- --></A><H3>logger</H3><PRE>java.util.logging.Logger <B>logger</B></PRE><DL><DL></DL></DL><!-- ========= CONSTRUCTOR DETAIL ======== 
--><A NAME="constructor_detail"><!-- --></A><TABLE BORDER="1" WIDTH="100%" CELLPADDING="3" CELLSPACING="0" SUMMARY=""><TR BGCOLOR="#CCCCFF" CLASS="TableHeadingColor"><TH ALIGN="left" COLSPAN="1"><FONT SIZE="+2"><B>Constructor Detail</B></FONT></TH></TR></TABLE><A NAME="ARCReader()"><!-- --></A><H3>ARCReader</H3><PRE><B>ARCReader</B>()</PRE><DL></DL><!-- ============ METHOD DETAIL ========== --><A NAME="method_detail"><!-- --></A><TABLE BORDER="1" WIDTH="100%" CELLPADDING="3" CELLSPACING="0" SUMMARY=""><TR BGCOLOR="#CCCCFF" CLASS="TableHeadingColor"><TH ALIGN="left" COLSPAN="1"><FONT SIZE="+2"><B>Method Detail</B></FONT></TH></TR></TABLE><A NAME="gotoEOR(org.archive.io.ArchiveRecord)"><!-- --></A><H3>gotoEOR</H3><PRE>protected void <B>gotoEOR</B>(<A HREF="../../../../org/archive/io/ArchiveRecord.html" title="class in org.archive.io">ArchiveRecord</A> record) throws java.io.IOException</PRE><DL><DD>Skip over any trailing new lines at end of the record so we're lined up ready to read the next.<P><DD><DL><DT><B>Specified by:</B><DD><CODE><A HREF="../../../../org/archive/io/ArchiveReader.html#gotoEOR(org.archive.io.ArchiveRecord)">gotoEOR</A></CODE> in class <CODE><A HREF="../../../../org/archive/io/ArchiveReader.html" title="class in org.archive.io">ArchiveReader</A></CODE></DL></DD><DD><DL><DT><B>Parameters:</B><DD><CODE>record</CODE> - <DT><B>Throws:</B><DD><CODE>java.io.IOException</CODE></DL></DD></DL><HR><A NAME="createArchiveRecord(java.io.InputStream, long)"><!-- --></A><H3>createArchiveRecord</H3><PRE>protected <A HREF="../../../../org/archive/io/arc/ARCRecord.html" title="class in org.archive.io.arc">ARCRecord</A> <B>createArchiveRecord</B>(java.io.InputStream is, long offset) throws java.io.IOException</PRE><DL><DD>Create a new arc record. Encapsulates the housekeeping involved in creating a new record. <p>Call this method at the end of the constructor to read in the arcfile header. 
Problems will arise reading subsequent arc records if you don't, since the arcfile header has the list of metadata fields for all records that follow. <p>When parsing through ARCs to write out CDX info, we spend about 38% of CPU in here -- about 30% of which is in getTokenizedHeaderLine -- of which 16% is reading.<P><DD><DL><DT><B>Specified by:</B><DD><CODE><A HREF="../../../../org/archive/io/ArchiveReader.html#createArchiveRecord(java.io.InputStream, long)">createArchiveRecord</A></CODE> in class <CODE><A HREF="../../../../org/archive/io/ArchiveReader.html" title="class in org.archive.io">ArchiveReader</A></CODE></DL></DD><DD><DL><DT><B>Parameters:</B><DD><CODE>is</CODE> - InputStream to use.<DD><CODE>offset</CODE> - Absolute offset into arc file.<DT><B>Returns:</B><DD>An arc record.<DT><B>Throws:</B><DD><CODE>java.io.IOException</CODE></DL></DD></DL><HR><A NAME="getVersion()"><!-- --></A><H3>getVersion</H3><PRE>public java.lang.String <B>getVersion</B>()</PRE><DL><DD>Returns the version of this ARC file. The version is usually read from the first record of the ARC. If we're reading without having first read the first record -- e.g. random access into the middle of an ARC -- then the version will not have been set. For now, we return a default, version 1.1. Later, if there is more than one version of ARC, we could look at something such as the meta line to see what version of ARC this is.<P><DD><DL><DT><B>Overrides:</B><DD><CODE><A HREF="../../../../org/archive/io/ArchiveReader.html#getVersion()">getVersion</A></CODE> in class <CODE><A HREF="../../../../org/archive/io/ArchiveReader.html" title="class in org.archive.io">ArchiveReader</A></CODE></DL></DD><DD><DL><DT><B>Returns:</B><DD>Version of this ARC file.</DL></DD></DL><HR><A NAME="fixSpaceInMetadataLine(java.util.List, int)"><!-- --></A><H3>fixSpaceInMetadataLine</H3><PRE>protected java.util.List&lt;java.lang.String&gt; <B>fixSpaceInMetadataLine</B>(java.util.List&lt;java.lang.String&gt; values, int requiredSize)</PRE><DL><DD>Fix space in URLs. 
The ARCWriter used to write URLs with spaces in them into the ARC. See <a href="https://sourceforge.net/tracker/?group_id=73833&atid=539099&func=detail&aid=1010966">[ 1010966 ] crawl.log has URIs with spaces in them</a>. This method fixes up such headers, converting all spaces found to '%20'.<P><DD><DL></DL></DD><DD><DL><DT><B>Parameters:</B><DD><CODE>values</CODE> - List of metadata values.<DD><CODE>requiredSize</CODE> - Expected size of resultant values list.<DT><B>Returns:</B><DD>New list if we successfully fixed up values or original if fixup failed.</DL></DD></DL><HR><A NAME="isAlignedOnFirstRecord()"><!-- --></A><H3>isAlignedOnFirstRecord</H3><PRE>protected boolean <B>isAlignedOnFirstRecord</B>()</PRE><DL><DD><DL></DL></DD><DD><DL></DL></DD></DL><HR><A NAME="setAlignedOnFirstRecord(boolean)"><!-- --></A><H3>setAlignedOnFirstRecord</H3><PRE>protected void <B>setAlignedOnFirstRecord</B>(boolean alignedOnFirstRecord)</PRE><DL><DD><DL></DL></DD><DD><DL></DL></DD></DL><HR><A NAME="isParseHttpHeaders()"><!-- --></A><H3>isParseHttpHeaders</H3><PRE>public boolean <B>isParseHttpHeaders</B>()</PRE><DL><DD><DL>