Existing tasks can be divided into a few groups:
regular index/search work tasks, report tasks, and control tasks.
</p>

<ol>

 <li>
 <b>Report tasks</b>: There are a few Report commands for generating reports.
 Only task runs that were completed are reported.
 (The 'Report tasks' themselves are not measured and not reported.)
 <ul>
             <li>
            <font color="#FF0066">RepAll</font> - all (completed) task runs.
            </li>
            <li>
            <font color="#FF0066">RepSumByName</font> - all statistics, aggregated by name. So, if AddDoc was executed 2000 times,
            only 1 report line would be created for it, aggregating all those 2000 statistic records.
            </li>
            <li>
            <font color="#FF0066">RepSelectByPref &nbsp; prefixWord</font> - all records for tasks whose name starts with <font color="#FF0066">prefixWord</font>.
            </li>
            <li>
            <font color="#FF0066">RepSumByPref &nbsp; prefixWord</font> - all records for tasks whose name starts with <font color="#FF0066">prefixWord</font>,
            aggregated by their full task name.
            </li>
            <li>
            <font color="#FF0066">RepSumByNameRound</font> - all statistics, aggregated by name and by <font color="#FF0066">Round</font>.
            So, if AddDoc was executed 2000 times in each of 3 <font color="#FF0066">rounds</font>, 3 report lines would be created for it,
            aggregating all those 2000 statistic records in each round. See more about rounds in the <font color="#FF0066">NewRound</font> command description below.
            </li>
            <li>
            <font color="#FF0066">RepSumByPrefRound &nbsp; prefixWord</font> - similar to <font color="#FF0066">RepSumByNameRound</font>,
            except that only tasks whose name starts with <font color="#FF0066">prefixWord</font> are included.
            </li>
 </ul>
 If needed, additional reports can be added by extending the abstract class ReportTask, and by
 manipulating the statistics data in Points and TaskStats.
 </li>

 <li><b>Control tasks</b>: A few tasks control the benchmark algorithm as a whole:
 <ul>
     <li>
     <font color="#FF0066">ClearStats</font> - clears all statistics.
     Subsequent reports include only task runs that start after this call.
     </li>
     <li>
     <font color="#FF0066">NewRound</font> - virtually starts a new round of the performance test.
     Although this command can be placed anywhere, it mostly makes sense at the end of an outermost sequence.
     <br>This increments a global "round counter". All task runs that start after this point
     record the updated round counter as their round number, which appears in reports.
     In particular, see <font color="#FF0066">RepSumByNameRound</font> above.
     <br>An additional effect of NewRound is that numeric and boolean properties defined (at the head
     of the .alg file) as a sequence of values, e.g. <font color="#FF0066">merge.factor=mrg:10:100:10:100</font>,
     advance (cyclically) to the next value.
     Note: this is also reflected in the reports, in this case under a column named "mrg".
     </li>
     <li>
     <font color="#FF0066">ResetInputs</font> - DocMaker and the various QueryMakers
     reset their counters to the start.
     The way these Maker interfaces work, each call to makeDocument()
     or makeQuery() creates the next document or query
     that it "knows" to create.
     If that pool is "exhausted", the "maker" starts over again. The ResetInputs command
     therefore makes the rounds comparable, because each round can start from the same inputs.
     It is useful to invoke ResetInputs together with NewRound.
     </li>
     <li>
     <font color="#FF0066">ResetSystemErase</font> - resets all index and input data and calls gc.
     Does NOT reset statistics. This includes ResetInputs.
     All writers/readers are nullified, deleted, closed.
     The index is erased.
     The directory is erased.
     You have to call CreateIndex again after this task has run.
     </li>
     <li>
     <font color="#FF0066">ResetSystemSoft</font> - resets all index and input data and calls gc.
     Does NOT reset statistics. This includes ResetInputs.
     All writers/readers are nullified, closed.
     The index is NOT erased.
     The directory is NOT erased.
     This is useful for testing performance on an existing index, for instance if the construction of a large index
     took a very long time and now you would like to test its search or update performance.
     </li>
 </ul>
 </li>

 <li>
 <b>Other tasks</b>: The remaining tasks are quite straightforward and are described only briefly here.
 <ul>
     <li>
     <font color="#FF0066">CreateIndex</font> and <font color="#FF0066">OpenIndex</font> both leave the index open for later update operations.
     <font color="#FF0066">CloseIndex</font> would close it.
     </li>
     <li>
     <font color="#FF0066">OpenReader</font>, similarly, leaves an index reader open for later search operations.
     But this has further semantics.
     If a Read operation is performed and an open reader exists, that reader is used.
     Otherwise, the read operation opens its own reader and closes it when the read operation is done.
     This allows testing various scenarios - sharing a reader, searching with "cold" reader, with "warmed" reader, etc.
     The read operations affected by this are: <font color="#FF0066">Warm</font>,
     <font color="#FF0066">Search</font>, <font color="#FF0066">SearchTrav</font> (search and traverse),
     and <font color="#FF0066">SearchTravRet</font> (search and traverse and retrieve).
     Notice that each of the 3 search task types maintains its own queryMaker instance.
     </li>
 </ul>
 </li>
 </ol>
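<p>
The difference between <font color="#FF0066">RepSumByName</font> and <font color="#FF0066">RepSumByNameRound</font>
can be illustrated with a minimal Java sketch. The classes <code>RunRecord</code> and <code>ReportSketch</code>
are hypothetical names introduced only for this illustration; they are not the benchmark's own
ReportTask, Points or TaskStats classes, and the real reports carry more columns than the two aggregates shown here.
</p>

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for one completed task run (not the benchmark's TaskStats).
class RunRecord {
    final String name;
    final int round;
    final long elapsedMillis;

    RunRecord(String name, int round, long elapsedMillis) {
        this.name = name;
        this.round = round;
        this.elapsedMillis = elapsedMillis;
    }
}

class ReportSketch {
    // Groups runs by name, or by name and round, and returns
    // {runCount, totalElapsedMillis} for each group in first-seen order.
    static Map<String, long[]> summarize(List<RunRecord> runs, boolean byRound) {
        Map<String, long[]> summary = new LinkedHashMap<>();
        for (RunRecord r : runs) {
            String key = byRound ? r.name + "/round" + r.round : r.name;
            long[] agg = summary.computeIfAbsent(key, k -> new long[2]);
            agg[0] += 1;               // number of runs in this group
            agg[1] += r.elapsedMillis; // total elapsed time in this group
        }
        return summary;
    }

    public static void main(String[] args) {
        // AddDoc executed 2000 times in each of 3 rounds, as in the text above.
        List<RunRecord> runs = new ArrayList<>();
        for (int round = 0; round < 3; round++) {
            for (int i = 0; i < 2000; i++) {
                runs.add(new RunRecord("AddDoc", round, 1));
            }
        }
        System.out.println("by name: " + summarize(runs, false).size() + " line(s)");
        System.out.println("by name and round: " + summarize(runs, true).size() + " line(s)");
    }
}
```

<p>
With 2000 AddDoc runs in each of 3 rounds, grouping by name yields a single line,
while grouping by name and round yields one line per round.
</p>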

<a name="properties"></a>
<h2>Benchmark properties</h2>

<p>
Properties are read from the header of the .alg file and
define several parameters of the performance test.
As mentioned above for the <font color="#FF0066">NewRound</font> task,
numeric and boolean properties that are defined as a sequence
of values, e.g. <font color="#FF0066">merge.factor=mrg:10:100:10:100</font>,
advance (cyclically) to the next value when NewRound is called, and also
appear as a named column in the reports (the column name would be "mrg" in this example).
</p>

<p>
Some of the currently defined properties are:
</p>

<ol>
    <li>
    <font color="#FF0066">analyzer</font> - full class name for the analyzer to use.
    The same analyzer is used for the entire test.
    </li>

    <li>
    <font color="#FF0066">directory</font> - valid values are FSDirectory and RAMDirectory.
    This tells which directory to use for the performance test.
    </li>

    <li>
    <b>Index work parameters</b>:
    Multi-valued int/boolean properties are iterated with calls to NewRound.
    They are also added as columns in the reports; the first string in the
    sequence is the column name.
    (Make sure it is no shorter than any value in the sequence.)
    <ul>
        <li><font color="#FF0066">max.buffered</font>
        <br>Example: max.buffered=buf:10:10:100:100 -
        this would define using maxBufferedDocs of 10 in iterations 0 and 1,
        and 100 in iterations 2 and 3.
        </li>
        <li>
        <font color="#FF0066">merge.factor</font> - which
        merge factor to use.
        </li>
        <li>
        <font color="#FF0066">compound</font> - whether the index is
        using the compound format or not. Valid values are "true" and "false".
        </li>
    </ul>
    </li>
</ol>
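<p>
How a multi-valued property selects its value in each round can be sketched as follows.
This is a minimal illustration, assuming that "cyclic" means wrapping around with a modulo
once the sequence is used up; <code>RoundValueSketch</code> and its methods are hypothetical names,
not part of the benchmark code.
</p>

```java
class RoundValueSketch {
    // Returns the report column name of a multi-valued property spec,
    // e.g. "buf" for "buf:10:10:100:100"; null for a single-valued spec.
    static String columnName(String spec) {
        String[] parts = spec.split(":");
        return parts.length < 2 ? null : parts[0];
    }

    // Returns the value the property takes in the given round.
    // Assumption: rounds past the end of the sequence wrap around (modulo).
    static String valueForRound(String spec, int round) {
        String[] parts = spec.split(":");
        if (parts.length < 2) {
            return spec; // single value, identical in every round
        }
        int valueCount = parts.length - 1; // parts[0] is the column name
        return parts[1 + (round % valueCount)];
    }

    public static void main(String[] args) {
        String spec = "buf:10:10:100:100";
        for (int round = 0; round < 5; round++) {
            System.out.println(columnName(spec) + " in round " + round
                    + " = " + valueForRound(spec, round));
        }
    }
}
```

<p>
For <code>max.buffered=buf:10:10:100:100</code> this gives 10 in rounds 0 and 1 and 100 in rounds 2 and 3,
matching the example above; under the wrap-around assumption, round 4 would use 10 again.
</p>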

<p>
For additional defined properties see the *.alg files under conf.
</p>

<a name="example"></a>
<h2>Example input algorithm and the result benchmark report</h2>
<p>
The following example is in conf/sample.alg:
<pre>
<font color="#003333"># --------------------------------------------------------
#
# Sample: what is the effect of doc size on indexing time?
#
# There are two parts in this test:
# - PopulateShort adds 2N documents of length  L
# - PopulateLong  adds  N documents of length 2L
# Which one would be faster?
# The comparison is done twice.
#
# --------------------------------------------------------</font>

<font color="#990066"># -------------------------------------------------------------------------------------
# multi val params are iterated by NewRound's, added to reports, start with column name.
merge.factor=mrg:10:20
max.buffered=buf:100:1000
compound=true

analyzer=org.apache.lucene.analysis.standard.StandardAnalyzer
directory=FSDirectory

doc.stored=true
doc.tokenized=true
doc.term.vector=false
doc.add.log.step=500

docs.dir=reuters-out

doc.maker=org.apache.lucene.benchmark.byTask.feeds.SimpleDocMaker

query.maker=org.apache.lucene.benchmark.byTask.feeds.SimpleQueryMaker

# task at this depth or less would print when they start
task.max.depth.log=2

log.queries=false
# -------------------------------------------------------------------------------------</font>
<font color="#3300FF">{

    { "PopulateShort"
        CreateIndex
        { AddDoc(4000) > : 20000
        Optimize
        CloseIndex
    >

    ResetSystemErase

    { "PopulateLong"
        CreateIndex
        { AddDoc(8000) > : 10000
        Optimize
        CloseIndex
    >

    ResetSystemErase

    NewRound

} : 2

RepSumByName
RepSelectByPref Populate
</font>
</pre>
</p>

<p>
The command line for running this sample:
<br><code>ant run-task -Dtask.alg=conf/sample.alg</code>
</p>

<p>
The output report from running this test contains the following:
<pre>
Operation     round mrg  buf   runCnt   recsPerRun        rec/s  elapsedSec    avgUsedMem    avgTotalMem
PopulateShort     0  10  100        1        20003        119.6      167.26    12,959,120     14,241,792
PopulateLong -  - 0  10  100 -  -   1 -  -   10003 -  -  - 74.3 -  - 134.57 -  17,085,208 -   20,635,648
PopulateShort     1  20 1000        1        20003        143.5      139.39    63,982,040     94,756,864
PopulateLong -  - 1  20 1000 -  -   1 -  -   10003 -  -  - 77.0 -  - 129.92 -  87,309,608 -  100,831,232
</pre>
</p>
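<p>
The <code>rec/s</code> column in this sample is consistent with dividing <code>recsPerRun</code> by
<code>elapsedSec</code> for a single run (<code>runCnt</code> = 1). The following Java sketch checks that reading
against the four report lines above; it is an inference from the printed numbers, not a
description of how the benchmark computes the column, and <code>ThroughputSketch</code> is a hypothetical name.
</p>

```java
class ThroughputSketch {
    // Records per second, rounded to one decimal place as printed in the report.
    static double recsPerSec(long recsPerRun, double elapsedSec) {
        return Math.round(recsPerRun / elapsedSec * 10.0) / 10.0;
    }

    public static void main(String[] args) {
        // recsPerRun and elapsedSec copied from the sample report above.
        System.out.println("PopulateShort round 0: " + recsPerSec(20003, 167.26));
        System.out.println("PopulateLong  round 0: " + recsPerSec(10003, 134.57));
        System.out.println("PopulateShort round 1: " + recsPerSec(20003, 139.39));
        System.out.println("PopulateLong  round 1: " + recsPerSec(10003, 129.92));
    }
}
```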
</DIV>
<HR>
Copyright &copy; 2000-2007 Apache Software Foundation.  All Rights Reserved.
</BODY>
</HTML>
