Existing tasks can be divided into a few groups:
regular index/search work tasks, report tasks, and control tasks.
</p>
<ol>
<li>
<b>Report tasks</b>: Several Report commands are available for generating reports.
Only completed task runs are reported.
(The report tasks themselves are neither measured nor reported.)
<ul>
<li>
<font color="#FF0066">RepAll</font> - all (completed) task runs.
</li>
<li>
<font color="#FF0066">RepSumByName</font> - all statistics, aggregated by name. For example, if AddDoc was executed 2000 times,
only one report line is created for it, aggregating all 2000 statistics records.
</li>
<li>
<font color="#FF0066">RepSelectByPref prefixWord</font> - all records for tasks whose names start with <font color="#FF0066">prefixWord</font>.
</li>
<li>
<font color="#FF0066">RepSumByPref prefixWord</font> - all records for tasks whose names start with <font color="#FF0066">prefixWord</font>,
aggregated by their full task name.
</li>
<li>
<font color="#FF0066">RepSumByNameRound</font> - all statistics, aggregated by name and by <font color="#FF0066">Round</font>.
For example, if AddDoc was executed 2000 times in each of 3 <font color="#FF0066">rounds</font>, 3 report lines are created for it,
each aggregating the 2000 statistics records of one round. See the <font color="#FF0066">NewRound</font> command description below for more about rounds.
</li>
<li>
<font color="#FF0066">RepSumByPrefRound prefixWord</font> - similar to <font color="#FF0066">RepSumByNameRound</font>,
except that only tasks whose names start with <font color="#FF0066">prefixWord</font> are included.
</li>
</ul>
If needed, additional reports can be added by extending the abstract class ReportTask, and by
manipulating the statistics data in Points and TaskStats.
</li>
<li><b>Control tasks</b>: A few tasks control the benchmark algorithm as a whole:
<ul>
<li>
<font color="#FF0066">ClearStats</font> - clears all statistics.
Subsequent reports include only task runs that start after this call.
</li>
<li>
<font color="#FF0066">NewRound</font> - virtually starts a new round of the performance test.
Although this command can be placed anywhere, it mostly makes sense at the end of an outermost sequence.
<br>This increments a global "round counter". All task runs that start after this call
record the updated round counter as their round number, and this number appears in the reports.
In particular, see <font color="#FF0066">RepSumByNameRound</font> above.
<br>An additional effect of NewRound is that numeric and boolean properties defined (at the head
of the .alg file) as a sequence of values, e.g. <font color="#FF0066">merge.factor=mrg:10:100:10:100</font>,
advance (cyclically) to their next value.
Note: this is also reflected in the reports, in this case under a column named "mrg".
</li>
<li>
<font color="#FF0066">ResetInputs</font> - DocMaker and the various QueryMakers
reset their counters to the start.
The way these Maker interfaces work, each call to makeDocument()
or makeQuery() creates the next document or query
that the maker "knows" to create.
If that pool is "exhausted", the maker starts over again. The ResetInputs command
thus makes the rounds comparable, which is why it is useful to invoke ResetInputs together with NewRound.
</li>
<li>
<font color="#FF0066">ResetSystemErase</font> - resets all index and input data and calls gc.
It does NOT reset statistics. It includes ResetInputs.
All writers/readers are closed, deleted and nullified.
The index is erased.
The directory is erased.
CreateIndex has to be called again after this task.
</li>
<li>
<font color="#FF0066">ResetSystemSoft</font> - resets all index and input data and calls gc.
It does NOT reset statistics. It includes ResetInputs.
All writers/readers are closed and nullified.
The index is NOT erased.
The directory is NOT erased.
This is useful for testing performance on an existing index, for instance if building a large index
took a very long time and you now want to test its search or update performance.
</li>
</ul>
</li>
<li>
The other existing tasks are quite straightforward and are only briefly described here.
<ul>
<li>
<font color="#FF0066">CreateIndex</font> and <font color="#FF0066">OpenIndex</font> both leave the index open for later update operations.
<font color="#FF0066">CloseIndex</font> closes it.
</li>
<li>
<font color="#FF0066">OpenReader</font>, similarly, leaves an index reader open for later search operations.
However, it has additional semantics.
If a read operation is performed and an open reader exists, that reader is used.
Otherwise, the read operation opens its own reader and closes it when the operation is done.
This allows testing various scenarios: sharing a reader, searching with a "cold" reader, with a "warmed" reader, etc.
The read operations affected by this are: <font color="#FF0066">Warm</font>,
<font color="#FF0066">Search</font>, <font color="#FF0066">SearchTrav</font> (search and traverse),
and <font color="#FF0066">SearchTravRet</font> (search, traverse and retrieve).
Note that each of the 3 search task types maintains its own queryMaker instance.
</li>
</ul>
</li>
</ol>
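<p>
To make the difference between RepSumByName and RepSumByNameRound concrete, here is a minimal Java sketch of the aggregation they describe.
It does not use the real ReportTask, Points or TaskStats classes; the TaskRun, NameRound and Summary types and the
sumByName/sumByNameAndRound methods are hypothetical stand-ins, assuming each completed run carries a task name, a round number,
a record count and an elapsed time.
</p>

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of RepSumByName vs. RepSumByNameRound style aggregation.
// TaskRun, NameRound and Summary are illustrative types, NOT Lucene benchmark classes.
class ReportAggregationSketch {

    /** One completed task run: name, round number, records counted, elapsed time. */
    record TaskRun(String name, int round, int records, long elapsedMillis) {}

    /** Grouping key used when aggregating by name and by round. */
    record NameRound(String name, int round) {}

    /** Totals that make up one report line. */
    record Summary(int runCount, long records, long elapsedMillis) {
        static Summary of(TaskRun r) {
            return new Summary(1, r.records(), r.elapsedMillis());
        }

        Summary plus(Summary other) {
            return new Summary(runCount + other.runCount,
                    records + other.records,
                    elapsedMillis + other.elapsedMillis);
        }
    }

    /** RepSumByName style: one line per task name, across all rounds. */
    static Map<String, Summary> sumByName(List<TaskRun> runs) {
        Map<String, Summary> lines = new LinkedHashMap<>();
        for (TaskRun r : runs) {
            lines.merge(r.name(), Summary.of(r), Summary::plus);
        }
        return lines;
    }

    /** RepSumByNameRound style: one line per (task name, round) pair. */
    static Map<NameRound, Summary> sumByNameAndRound(List<TaskRun> runs) {
        Map<NameRound, Summary> lines = new LinkedHashMap<>();
        for (TaskRun r : runs) {
            lines.merge(new NameRound(r.name(), r.round()), Summary.of(r), Summary::plus);
        }
        return lines;
    }

    public static void main(String[] args) {
        // AddDoc executed 2000 times in each of 3 rounds (the timings are made up).
        List<TaskRun> runs = new ArrayList<>();
        for (int round = 0; round < 3; round++) {
            for (int i = 0; i < 2000; i++) {
                runs.add(new TaskRun("AddDoc", round, 1, 2));
            }
        }
        System.out.println(sumByName(runs).size());         // 1 report line
        System.out.println(sumByNameAndRound(runs).size()); // 3 report lines
    }
}
```

<p>
The prefix-based reports (RepSelectByPref, RepSumByPref, RepSumByPrefRound) would, under the same assumptions, simply filter
the runs by name prefix before grouping.
</p>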
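<p>
ResetInputs matters because a maker walks through a finite pool and wraps around. The following Java sketch illustrates that behaviour,
assuming a maker that hands out items from a fixed list in order. CyclingMaker, make() and reset() are hypothetical names and
are not the DocMaker or QueryMaker interfaces.
</p>

```java
import java.util.List;

// Hypothetical sketch: CyclingMaker is NOT Lucene's DocMaker or QueryMaker.
class CyclingMakerSketch {

    /** Hands out items in order and starts over when the pool is exhausted. */
    static final class CyclingMaker<T> {
        private final List<T> pool;
        private int next = 0;

        CyclingMaker(List<T> pool) {
            if (pool.isEmpty()) {
                throw new IllegalArgumentException("pool must not be empty");
            }
            this.pool = List.copyOf(pool);
        }

        /** Plays the role of makeDocument()/makeQuery(): returns the next item the maker "knows". */
        T make() {
            T item = pool.get(next);
            next = (next + 1) % pool.size(); // wrap around when the pool is exhausted
            return item;
        }

        /** What ResetInputs does conceptually: go back to the first item. */
        void reset() {
            next = 0;
        }
    }

    public static void main(String[] args) {
        CyclingMaker<String> queries = new CyclingMaker<>(List.of("q1", "q2", "q3"));
        // Round 0 consumes two queries.
        System.out.println(queries.make() + " " + queries.make()); // q1 q2
        // Without a reset, the next round would start at q3; after reset() it sees the same inputs.
        queries.reset();
        System.out.println(queries.make() + " " + queries.make()); // q1 q2
    }
}
```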
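<p>
The OpenReader rule for read tasks (use the open reader if there is one, otherwise open a private reader and close it afterwards)
can be sketched in Java as below. SimulatedReader, RunData, newReader() and runReadTask() are hypothetical; the real tasks work
with a Lucene index reader, whose API is not reproduced here.
</p>

```java
import java.util.function.Function;

// Hypothetical sketch of the OpenReader semantics; none of these types are Lucene classes.
class ReaderSharingSketch {

    /** Stand-in for an index reader; only tracks whether it was closed. */
    static final class SimulatedReader implements AutoCloseable {
        boolean closed = false;

        @Override
        public void close() {
            closed = true;
        }
    }

    /** Shared benchmark state: sharedReader is non-null only after an OpenReader-like step. */
    static final class RunData {
        SimulatedReader sharedReader;
        int readersOpened = 0; // for illustration: how many readers were opened

        SimulatedReader newReader() {
            readersOpened++;
            return new SimulatedReader();
        }
    }

    /** A read task (Warm, Search, ...): reuse the shared reader, or open and close its own. */
    static <R> R runReadTask(RunData data, Function<SimulatedReader, R> work) {
        if (data.sharedReader != null) {
            return work.apply(data.sharedReader); // shared reader stays open
        }
        try (SimulatedReader own = data.newReader()) {
            return work.apply(own); // private reader is closed when the task is done
        }
    }

    public static void main(String[] args) {
        RunData data = new RunData();
        // No shared reader: each search opens (and closes) a fresh, "cold" reader.
        runReadTask(data, r -> "hit");
        runReadTask(data, r -> "hit");
        System.out.println(data.readersOpened); // 2
        // After an OpenReader-like step, searches share one reader.
        data.sharedReader = data.newReader();
        runReadTask(data, r -> "hit");
        runReadTask(data, r -> "hit");
        System.out.println(data.readersOpened); // 3
    }
}
```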
<a name="properties"></a>
<h2>Benchmark properties</h2>
<p>
Properties are read from the header of the .alg file and
define several parameters of the performance test.
As mentioned above for the <font color="#FF0066">NewRound</font> task,
numeric and boolean properties that are defined as a sequence
of values, e.g. <font color="#FF0066">merge.factor=mrg:10:100:10:100</font>,
advance (cyclically) to the next value when NewRound is called, and also
appear as a named column in the reports (the column name is "mrg" in this example).
</p>
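<p>
The parsing and cycling rule for a multi-valued property such as merge.factor=mrg:10:100:10:100 can be sketched in Java as follows,
assuming the value is split on ':' into a column name followed by the values, and that round n uses value n modulo the number of
values. RoundProperty, parse() and valueForRound() are hypothetical names, not the benchmark's own configuration class, and the
values are kept as strings for simplicity.
</p>

```java
import java.util.List;

// Hypothetical sketch of multi-valued, per-round properties; not the benchmark's config code.
class RoundPropertySketch {

    /** A multi-valued property: report column name plus the values to cycle through. */
    record RoundProperty(String column, List<String> values) {

        /** Parses e.g. "mrg:10:100:10:100" into column "mrg" and values [10, 100, 10, 100]. */
        static RoundProperty parse(String raw) {
            String[] parts = raw.split(":");
            if (parts.length < 2) {
                throw new IllegalArgumentException("expected column:value[:value...] but got " + raw);
            }
            return new RoundProperty(parts[0], List.copyOf(List.of(parts).subList(1, parts.length)));
        }

        /** Value in effect after `round` calls to NewRound, wrapping around cyclically. */
        String valueForRound(int round) {
            return values.get(round % values.size());
        }
    }

    public static void main(String[] args) {
        RoundProperty mergeFactor = RoundProperty.parse("mrg:10:100:10:100");
        for (int round = 0; round < 5; round++) {
            // prints mrg=10, mrg=100, mrg=10, mrg=100, then mrg=10 again (wrap-around)
            System.out.println(mergeFactor.column() + "=" + mergeFactor.valueForRound(round));
        }
    }
}
```

<p>
Applied to max.buffered=buf:10:10:100:100, the same rule yields 10 for rounds 0 and 1 and 100 for rounds 2 and 3,
which matches the description of max.buffered in the property list.
</p>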
<p>
Some of the currently defined properties are:
</p>
<ol>
<li>
<font color="#FF0066">analyzer</font> - fully qualified class name of the analyzer to use.
The same analyzer is used for the entire test.
</li>
<li>
<font color="#FF0066">directory</font> - valid values are FSDirectory and RAMDirectory.
This selects the directory implementation used for the performance test.
</li>
<li>
<b>Index work parameters</b>:
Multi-valued int/boolean properties are iterated by calls to NewRound.
They are also added as columns in the reports; the first string in the
sequence is the column name
(make sure it is no shorter than any value in the sequence).
<ul>
<li><font color="#FF0066">max.buffered</font>
<br>Example: max.buffered=buf:10:10:100:100 -
this sets maxBufferedDocs to 10 in rounds 0 and 1,
and to 100 in rounds 2 and 3.
</li>
<li>
<font color="#FF0066">merge.factor</font> - which
merge factor to use.
</li>
<li>
<font color="#FF0066">compound</font> - whether the index is
using the compound format or not. Valid values are "true" and "false".
</li>
</ul>
</ol>
<p>
For additional properties, see the *.alg files under conf.
</p>
<a name="example"></a>
<h2>Example input algorithm and the resulting benchmark report</h2>
<p>
The following example is in conf/sample.alg:
<pre>
<font color="#003333"># --------------------------------------------------------
#
# Sample: what is the effect of doc size on indexing time?
#
# There are two parts in this test:
# - PopulateShort adds 2N documents of length L
# - PopulateLong adds N documents of length 2L
# Which one would be faster?
# The comparison is done twice.
#
# --------------------------------------------------------</font>
<font color="#990066"># -------------------------------------------------------------------------------------
# multi val params are iterated by NewRound's, added to reports, start with column name.
merge.factor=mrg:10:20
max.buffered=buf:100:1000
compound=true
analyzer=org.apache.lucene.analysis.standard.StandardAnalyzer
directory=FSDirectory
doc.stored=true
doc.tokenized=true
doc.term.vector=false
doc.add.log.step=500
docs.dir=reuters-out
doc.maker=org.apache.lucene.benchmark.byTask.feeds.SimpleDocMaker
query.maker=org.apache.lucene.benchmark.byTask.feeds.SimpleQueryMaker
# task at this depth or less would print when they start
task.max.depth.log=2
log.queries=false
# -------------------------------------------------------------------------------------</font>
<font color="#3300FF">{
{ "PopulateShort"
CreateIndex
{ AddDoc(4000) > : 20000
Optimize
CloseIndex
>
ResetSystemErase
{ "PopulateLong"
CreateIndex
{ AddDoc(8000) > : 10000
Optimize
CloseIndex
>
ResetSystemErase
NewRound
} : 2
RepSumByName
RepSelectByPref Populate
</font>
</pre>
</p>
<p>
The command line for running this sample:
<br><code>ant run-task -Dtask.alg=conf/sample.alg</code>
</p>
<p>
The output report from running this test contains the following:
<pre>
Operation     round mrg  buf   runCnt   recsPerRun        rec/s  elapsedSec    avgUsedMem    avgTotalMem
PopulateShort     0  10  100        1        20003        119.6      167.26    12,959,120     14,241,792
PopulateLong  - - 0  10  100 -  -   1 -  -   10003 -  -  - 74.3 -  - 134.57 -  17,085,208 -   20,635,648
PopulateShort     1  20 1000        1        20003        143.5      139.39    63,982,040     94,756,864
PopulateLong  - - 1  20 1000 -  -   1 -  -   10003 -  -  - 77.0 -  - 129.92 -  87,309,608 -  100,831,232
</pre>
</p>
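<p>
The report columns can be cross-checked against the algorithm. Assuming every task contributes one record (so a PopulateShort run
counts CreateIndex, 20000 AddDoc calls, Optimize and CloseIndex), recsPerRun comes out to 20003 and 10003, and rec/s appears to be
recsPerRun divided by elapsedSec. The "-" characters on alternate report lines appear to be padding rather than data. A small Java
check using the round 0 numbers from the report above; recsPerRun() and recsPerSecond() are hypothetical helpers:
</p>

```java
import java.util.Locale;

// Hypothetical cross-check of the sample report; assumes one record per executed task.
class ReportArithmeticSketch {

    /** Records per run: CreateIndex + addDocCount AddDoc calls + Optimize + CloseIndex. */
    static int recsPerRun(int addDocCount) {
        int createIndex = 1, optimize = 1, closeIndex = 1;
        return createIndex + addDocCount + optimize + closeIndex;
    }

    /** Throughput as it appears to be computed in the report: records / elapsed seconds. */
    static double recsPerSecond(int records, double elapsedSec) {
        return records / elapsedSec;
    }

    public static void main(String[] args) {
        int shortRecs = recsPerRun(20000); // PopulateShort: { AddDoc(4000) > : 20000
        int longRecs = recsPerRun(10000);  // PopulateLong:  { AddDoc(8000) > : 10000
        // Locale.ROOT keeps '.' as the decimal separator regardless of the JVM locale.
        System.out.printf(Locale.ROOT, "%d %.1f%n", shortRecs, recsPerSecond(shortRecs, 167.26)); // 20003 119.6
        System.out.printf(Locale.ROOT, "%d %.1f%n", longRecs, recsPerSecond(longRecs, 134.57));   // 10003 74.3
    }
}
```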
</DIV>
<DIV> </DIV>
<P>
<P>
<HR>
<!-- ======= START OF BOTTOM NAVBAR ====== -->
<A NAME="navbar_bottom"><!-- --></A><A HREF="#skip-navbar_bottom" title="Skip navigation links"></A><TABLE BORDER="0" WIDTH="100%" CELLPADDING="1" CELLSPACING="0" SUMMARY="">
<TR>
<TD COLSPAN=3 BGCOLOR="#EEEEFF" CLASS="NavBarCell1">
<A NAME="navbar_bottom_firstrow"><!-- --></A><TABLE BORDER="0" CELLPADDING="0" CELLSPACING="3" SUMMARY="">
<TR ALIGN="center" VALIGN="top">
<TD BGCOLOR="#EEEEFF" CLASS="NavBarCell1"> <A HREF="../../../../../overview-summary.html"><FONT CLASS="NavBarFont1"><B>Overview</B></FONT></A> </TD>
<TD BGCOLOR="#FFFFFF" CLASS="NavBarCell1Rev"> <FONT CLASS="NavBarFont1Rev"><B>Package</B></FONT> </TD>
<TD BGCOLOR="#EEEEFF" CLASS="NavBarCell1"> <FONT CLASS="NavBarFont1">Class</FONT> </TD>
<TD BGCOLOR="#EEEEFF" CLASS="NavBarCell1"> <A HREF="package-use.html"><FONT CLASS="NavBarFont1"><B>Use</B></FONT></A> </TD>
<TD BGCOLOR="#EEEEFF" CLASS="NavBarCell1"> <A HREF="package-tree.html"><FONT CLASS="NavBarFont1"><B>Tree</B></FONT></A> </TD>
<TD BGCOLOR="#EEEEFF" CLASS="NavBarCell1"> <A HREF="../../../../../deprecated-list.html"><FONT CLASS="NavBarFont1"><B>Deprecated</B></FONT></A> </TD>
<TD BGCOLOR="#EEEEFF" CLASS="NavBarCell1"> <A HREF="../../../../../index-all.html"><FONT CLASS="NavBarFont1"><B>Index</B></FONT></A> </TD>
<TD BGCOLOR="#EEEEFF" CLASS="NavBarCell1"> <A HREF="../../../../../help-doc.html"><FONT CLASS="NavBarFont1"><B>Help</B></FONT></A> </TD>
</TR>
</TABLE>
</TD>
<TD ALIGN="right" VALIGN="top" ROWSPAN=3><EM>
</EM>
</TD>
</TR>
<TR>
<TD BGCOLOR="white" CLASS="NavBarCell2"><FONT SIZE="-2">
<A HREF="../../../../../org/apache/lucene/benchmark/package-summary.html"><B>PREV PACKAGE</B></A>
<A HREF="../../../../../org/apache/lucene/benchmark/byTask/feeds/package-summary.html"><B>NEXT PACKAGE</B></A></FONT></TD>
<TD BGCOLOR="white" CLASS="NavBarCell2"><FONT SIZE="-2">
<A HREF="../../../../../index.html" target="_top"><B>FRAMES</B></A>
<A HREF="package-summary.html" target="_top"><B>NO FRAMES</B></A>
<SCRIPT type="text/javascript">
<!--
if(window==top) {
document.writeln('<A HREF="../../../../../allclasses-noframe.html"><B>All Classes</B></A>');
}
//-->
</SCRIPT>
<NOSCRIPT>
<A HREF="../../../../../allclasses-noframe.html"><B>All Classes</B></A>
</NOSCRIPT>
</FONT></TD>
</TR>
</TABLE>
<A NAME="skip-navbar_bottom"></A><!-- ======== END OF BOTTOM NAVBAR ======= -->
<HR>
Copyright © 2000-2007 Apache Software Foundation. All Rights Reserved.
</BODY>
</HTML>