📄 indexwriter.java

📁 lucene2.2.0 version
💻 JAVA
📖 Page 1 of 5
package org.apache.lucene.index;

/**
 * Licensed to the Apache Software Foundation (ASF) under one or more
 * contributor license agreements.  See the NOTICE file distributed with
 * this work for additional information regarding copyright ownership.
 * The ASF licenses this file to You under the Apache License, Version 2.0
 * (the "License"); you may not use this file except in compliance with
 * the License.  You may obtain a copy of the License at
 *
 *     http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.search.Similarity;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.store.Lock;
import org.apache.lucene.store.LockObtainFailedException;
import org.apache.lucene.store.AlreadyClosedException;
import org.apache.lucene.store.RAMDirectory;

import java.io.File;
import java.io.IOException;
import java.io.PrintStream;
import java.util.ArrayList;
import java.util.List;
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map.Entry;

/**
  An <code>IndexWriter</code> creates and maintains an index.

  <p>The <code>create</code> argument to the
  <a href="#IndexWriter(org.apache.lucene.store.Directory, org.apache.lucene.analysis.Analyzer, boolean)"><b>constructor</b></a>
  determines whether a new index is created, or whether an existing index is
  opened.  Note that you
  can open an index with <code>create=true</code> even while readers are
  using the index.  The old readers will continue to search
  the "point in time" snapshot they had opened, and won't
  see the newly created index until they re-open.
  There are also <a href="#IndexWriter(org.apache.lucene.store.Directory, org.apache.lucene.analysis.Analyzer)"><b>constructors</b></a>
  with no <code>create</code> argument which
  will create a new index if there is not already an index at the
  provided path and otherwise open the existing index.</p>

  <p>In either case, documents are added with <a
  href="#addDocument(org.apache.lucene.document.Document)"><b>addDocument</b></a>
  and removed with <a
  href="#deleteDocuments(org.apache.lucene.index.Term)"><b>deleteDocuments</b></a>.
  A document can be updated with <a href="#updateDocument(org.apache.lucene.index.Term, org.apache.lucene.document.Document)"><b>updateDocument</b></a>
  (which just deletes and then adds the entire document).
  When finished adding, deleting and updating documents, <a href="#close()"><b>close</b></a> should be called.</p>

  <p>These changes are buffered in memory and periodically
  flushed to the {@link Directory} (during the above method calls).  A flush is triggered when there are
  enough buffered deletes (see {@link
  #setMaxBufferedDeleteTerms}) or enough added documents
  (see {@link #setMaxBufferedDocs}) since the last flush,
  whichever is sooner.  You can also force a flush by
  calling {@link #flush}.  When a flush occurs, both pending
  deletes and added documents are flushed to the index.  A
  flush may also trigger one or more segment merges.</p>

  <a name="autoCommit"></a>
  <p>The optional <code>autoCommit</code> argument to the
  <a href="#IndexWriter(org.apache.lucene.store.Directory, boolean, org.apache.lucene.analysis.Analyzer)"><b>constructors</b></a>
  controls visibility of the changes to {@link IndexReader} instances reading the same index.
  When this is <code>false</code>, changes are not
  visible until {@link #close()} is called.
  Note that changes will still be flushed to the
  {@link org.apache.lucene.store.Directory} as new files,
  but are not committed (no new <code>segments_N</code> file
  is written referencing the new files) until {@link #close} is
  called.  If something goes terribly wrong (for example the
  JVM crashes) before {@link #close()}, then
  the index will reflect none of the changes made (it will
  remain in its starting state).
  You can also call {@link #abort()}, which closes the writer without committing any
  changes, and removes any index
  files that had been flushed but are now unreferenced.
  This mode is useful for preventing readers from refreshing
  at a bad time (for example after you've done all your
  deletes but before you've done your adds).
  It can also be used to implement simple single-writer
  transactional semantics ("all or none").</p>

  <p>When <code>autoCommit</code> is <code>true</code> then
  every flush is also a commit ({@link IndexReader}
  instances will see each flush as changes to the index).
  This is the default, to match the behavior before 2.2.
  When running in this mode, be careful not to refresh your
  readers while optimize or segment merges are taking place
  as this can tie up substantial disk space.</p>

  <p>Regardless of <code>autoCommit</code>, an {@link
  IndexReader} or {@link org.apache.lucene.search.IndexSearcher} will only see the
  index as of the "point in time" that it was opened.  Any
  changes committed to the index after the reader was opened
  are not visible until the reader is re-opened.</p>

  <p>If an index will not have more documents added for a while and optimal search
  performance is desired, then the <a href="#optimize()"><b>optimize</b></a>
  method should be called before the index is closed.</p>

  <p>Opening an <code>IndexWriter</code> creates a lock file for the directory in use. Trying to open
  another <code>IndexWriter</code> on the same directory will lead to a
  {@link LockObtainFailedException}.
  The {@link LockObtainFailedException}
  is also thrown if an IndexReader on the same directory is used to delete documents
  from the index.</p>

  <a name="deletionPolicy"></a>
  <p>Expert: <code>IndexWriter</code> allows an optional
  {@link IndexDeletionPolicy} implementation to be
  specified.  You can use this to control when prior commits
  are deleted from the index.  The default policy is {@link
  KeepOnlyLastCommitDeletionPolicy} which removes all prior
  commits as soon as a new commit is done (this matches
  behavior before 2.2).  Creating your own policy can allow
  you to explicitly keep previous "point in time" commits
  alive in the index for some time, to allow readers to
  refresh to the new commit without having the old commit
  deleted out from under them.  This is necessary on
  filesystems like NFS that do not support "delete on last
  close" semantics, which Lucene's "point in time" search
  normally relies on. </p>
  */

/*
 * Clarification: Check Points (and commits)
 * Being able to set autoCommit=false allows IndexWriter to flush and
 * write new index files to the directory without writing a new segments_N
 * file which references these new files. It also means that the state of
 * the in memory SegmentInfos object is different than the most recent
 * segments_N file written to the directory.
 *
 * Each time the SegmentInfos is changed, and matches the (possibly
 * modified) directory files, we have a new "check point".
 * If the modified/new SegmentInfos is written to disk - as a new
 * (generation of) segments_N file - this check point is also an
 * IndexCommitPoint.
 *
 * With autoCommit=true, every checkPoint is also a CommitPoint.
 * With autoCommit=false, some checkPoints may not be commits.
 *
 * A new checkpoint always replaces the previous checkpoint and
 * becomes the new "front" of the index. This allows the IndexFileDeleter
 * to delete files that are referenced only by stale checkpoints.
 * (files that were created since the last commit, but are no longer
 * referenced by the "front" of the index). For this, IndexFileDeleter
 * keeps track of the last non commit checkpoint.
 */
public class IndexWriter {

  /**
   * Default value for the write lock timeout (1,000).
   * @see #setDefaultWriteLockTimeout
   */
  public static long WRITE_LOCK_TIMEOUT = 1000;

  private long writeLockTimeout = WRITE_LOCK_TIMEOUT;

  /**
   * Name of the write lock in the index.
   */
  public static final String WRITE_LOCK_NAME = "write.lock";

  /**
   * Default value is 10. Change using {@link #setMergeFactor(int)}.
   */
  public final static int DEFAULT_MERGE_FACTOR = 10;

  /**
   * Default value is 10. Change using {@link #setMaxBufferedDocs(int)}.
   */
  public final static int DEFAULT_MAX_BUFFERED_DOCS = 10;

  /**
   * Default value is 1000. Change using {@link #setMaxBufferedDeleteTerms(int)}.
   */
  public final static int DEFAULT_MAX_BUFFERED_DELETE_TERMS = 1000;

  /**
   * Default value is {@link Integer#MAX_VALUE}. Change using {@link #setMaxMergeDocs(int)}.
   */
  public final static int DEFAULT_MAX_MERGE_DOCS = Integer.MAX_VALUE;

  /**
   * Default value is 10,000. Change using {@link #setMaxFieldLength(int)}.
   */
  public final static int DEFAULT_MAX_FIELD_LENGTH = 10000;

  /**
   * Default value is 128. Change using {@link #setTermIndexInterval(int)}.
   */
  public final static int DEFAULT_TERM_INDEX_INTERVAL = 128;

  // The normal read buffer size defaults to 1024, but
  // increasing this during merging seems to yield
  // performance gains.  However we don't want to increase
  // it too much because there are quite a few
  // BufferedIndexInputs created during merging.  See
  // LUCENE-888 for details.
  private final static int MERGE_READ_BUFFER_SIZE = 4096;

  private Directory directory;  // where this index resides
  private Analyzer analyzer;    // how to analyze text

  private Similarity similarity = Similarity.getDefault(); // how to normalize

  private boolean commitPending; // true if segmentInfos has changes not yet committed
  private SegmentInfos rollbackSegmentInfos;      // segmentInfos we will fallback to if the commit fails

  private SegmentInfos localRollbackSegmentInfos;      // segmentInfos we will fallback to if the commit fails
  private boolean localAutoCommit;                // saved autoCommit during local transaction
  private boolean autoCommit = true;              // false if we should commit only on close

  SegmentInfos segmentInfos = new SegmentInfos();       // the segments
  SegmentInfos ramSegmentInfos = new SegmentInfos();    // the segments in ramDirectory
  private final RAMDirectory ramDirectory = new RAMDirectory(); // for temp segs
  private IndexFileDeleter deleter;

  private Lock writeLock;

  private int termIndexInterval = DEFAULT_TERM_INDEX_INTERVAL;

  // The max number of delete terms that can be buffered before
  // they must be flushed to disk.
  private int maxBufferedDeleteTerms = DEFAULT_MAX_BUFFERED_DELETE_TERMS;

  // This Hashmap buffers delete terms in ram before they are applied.
  // The key is delete term; the value is number of ram
  // segments the term applies to.
  private HashMap bufferedDeleteTerms = new HashMap();
  private int numBufferedDeleteTerms = 0;

  /** Use compound file setting. Defaults to true, minimizing the number of
   * files used.  Setting this to false may improve indexing performance, but
   * may also cause file handle problems.
   */
  private boolean useCompoundFile = true;

  private boolean closeDir;
  private boolean closed;

  /**
   * Used internally to throw an {@link
   * AlreadyClosedException} if this IndexWriter has been
   * closed.
   * @throws AlreadyClosedException if this IndexWriter is closed
   */
  protected final void ensureOpen() throws AlreadyClosedException {
    if (closed) {
      throw new AlreadyClosedException("this IndexWriter is closed");
    }
  }

  /** Get the current setting of whether to use the compound file format.
   *  Note that this just returns the value you set with setUseCompoundFile(boolean)
   *  or the default. You cannot use this to query the status of an existing index.
   *  @see #setUseCompoundFile(boolean)
   */
  public boolean getUseCompoundFile() {
    ensureOpen();
    return useCompoundFile;
  }

  /** Setting to turn on usage of a compound file. When on, multiple files
   *  for each segment are merged into a single file once the segment creation
   *  is finished. This is done regardless of what directory is in use.
   */
  public void setUseCompoundFile(boolean value) {
    ensureOpen();
    useCompoundFile = value;
  }

  /** Expert: Set the Similarity implementation used by this IndexWriter.
   *
   * @see Similarity#setDefault(Similarity)
   */
  public void setSimilarity(Similarity similarity) {
    ensureOpen();
    this.similarity = similarity;
  }

  /** Expert: Return the Similarity implementation used by this IndexWriter.
   *
   * <p>This defaults to the current value of {@link Similarity#getDefault()}.
   */
  public Similarity getSimilarity() {
    ensureOpen();
    return this.similarity;
  }

  /** Expert: Set the interval between indexed terms.  Large values cause less
   * memory to be used by IndexReader, but slow random-access to terms.  Small
   * values cause more memory to be used by an IndexReader, and speed
   * random-access to terms.
   *
   * This parameter determines the amount of computation required per query
   * term, regardless of the number of documents that contain that term.  In
   * particular, it is the maximum number of other terms that must be
   * scanned before a term is located and its frequency and position information
   * may be processed.
   * In a large index with user-entered query terms, query
   * processing time is likely to be dominated not by term lookup but rather
   * by the processing of frequency and positional data.  In a small index
   * or when many uncommon query terms are generated (e.g., by wildcard
   * queries) term lookup may become a dominant cost.
   *
   * In particular, <code>numUniqueTerms/interval</code> terms are read into
   * memory by an IndexReader, and, on average, <code>interval/2</code> terms
   * must be scanned for each random term access.
   *
   * @see #DEFAULT_TERM_INDEX_INTERVAL
   */
  public void setTermIndexInterval(int interval) {
    ensureOpen();
    this.termIndexInterval = interval;
  }

  /** Expert: Return the interval between indexed terms.
   *
   * @see #setTermIndexInterval(int)
   */
  public int getTermIndexInterval() {
    ensureOpen();
    return termIndexInterval;
  }

  /**
   * Constructs an IndexWriter for the index in <code>path</code>.
   * Text will be analyzed with <code>a</code>.  If <code>create</code>
   * is true, then a new, empty index will be created in
   * <code>path</code>, replacing the index already there, if any.
   *
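
Note on this page of the listing: the memory/lookup trade-off documented for setTermIndexInterval can be made concrete with a little arithmetic. What follows is a minimal standalone sketch and is not part of Lucene. The class TermIndexIntervalSketch, its two helper methods, and the figure of one million unique terms are hypothetical; the helpers only restate the numUniqueTerms/interval and interval/2 estimates given in the Javadoc.

```java
// Hypothetical illustration only; none of these names exist in Lucene.
public class TermIndexIntervalSketch {

    // Rough count of terms an IndexReader would hold in memory,
    // using the numUniqueTerms/interval estimate from the Javadoc.
    static long termsHeldInMemory(long numUniqueTerms, int interval) {
        return numUniqueTerms / interval;
    }

    // Average number of terms scanned per random term access,
    // using the interval/2 estimate from the Javadoc.
    static double averageTermsScanned(int interval) {
        return interval / 2.0;
    }

    public static void main(String[] args) {
        long numUniqueTerms = 1000000L;   // assumed vocabulary size
        int[] intervals = {32, 128, 512}; // 128 is DEFAULT_TERM_INDEX_INTERVAL
        for (int interval : intervals) {
            System.out.println("interval=" + interval
                + " termsInMemory=" + termsHeldInMemory(numUniqueTerms, interval)
                + " avgScanned=" + averageTermsScanned(interval));
        }
    }
}
```

Under these assumptions, raising the interval shrinks the in-memory term index in proportion while lengthening the average scan per lookup, which is the trade-off the Javadoc describes.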