⭐ 欢迎来到虫虫下载站! | 📦 资源下载 📁 资源专辑 ℹ️ 关于我们
⭐ 虫虫下载站

📄 mutablestring.java

📁 MG4J (Managing Gigabytes for Java) is a free full-text search engine for large document collections
💻 JAVA
📖 第 1 页 / 共 5 页
字号:
package it.unimi.dsi.mg4j.util;/*		  * MG4J: Managing Gigabytes for Java * * Copyright (C) 2002-2007 Paolo Boldi and Sebastiano Vigna  * *  This library is free software; you can redistribute it and/or modify it *  under the terms of the GNU Lesser General Public License as published by the Free *  Software Foundation; either version 2.1 of the License, or (at your option) *  any later version. * *  This library is distributed in the hope that it will be useful, but *  WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY *  or FITfNESS FOR A PARTICULAR PURPOSE.  See the GNU Lesser General Public License *  for more details. * *  You should have received a copy of the GNU Lesser General Public License *  along with this program; if not, write to the Free Software *  Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA. * */import it.unimi.dsi.fastutil.chars.Char2CharMap;import it.unimi.dsi.fastutil.chars.CharArrays;import it.unimi.dsi.fastutil.chars.CharList;import it.unimi.dsi.fastutil.chars.CharSet;import it.unimi.dsi.fastutil.objects.ObjectArrays;import java.io.DataInput;import java.io.DataOutput;import java.io.EOFException;import java.io.IOException;import java.io.InputStream;import java.io.ObjectInputStream;import java.io.ObjectOutputStream;import java.io.OutputStream;import java.io.PrintStream;import java.io.PrintWriter;import java.io.Reader;import java.io.Serializable;import java.io.UTFDataFormatException;import java.io.Writer;/** Fast, compact, optimised &amp; versatile mutable strings. * * <h3>Motivation</h3> * * <P>The classical Java string classes, {@link java.lang.String} and {@link * java.lang.StringBuffer}, lie at the extreme of a spectrum (immutable and * mutable). * * <P>However, large-scale text indexing requires some features that are not * provided by these classes: in particular, the possibility of using a mutable * string, once frozen, in the same optimised way of an immutable * string. Moreover, usually we do not need synchronisation (which makes * {@link java.lang.StringBuffer} slow). *  * <P>In a typical scenario you are dividing text into words (so you use a * <em>mutable</em> string to accumulate characters). Once you've got your * word, you would like to check whether this word is in a dictionary * <em>without creating a new object</em>. However, equality of * <code>StringBuffer</code>s is not defined on their content, and storing * words after a conversion to <code>String</code> will not help either, as * then you would need to convert the current mutable string into an immutable * one (thus creating a new object) <em>before deciding whether you need to * store it</em>. *  * <P>This class tries to make the best of both worlds, and thus aims at being * a Better Mousetrap&trade;.  * * <P>You can read more details about the design of <code>MutableString</code> * in Paolo Boldi and Sebastiano Vigna,  * &ldquo;<a href="http://vigna.dsi.unimi.it/papers.php#BoVMSJ">Mutable strings * in Java: Design, implementation and lightweight text-search * algorithms</a>&rdquo;, <i>Sci. Comput. Programming</i>, 54(1):3-23, 2005. * * <h3>Features</h3> * * Mutable strings come in two flavours: <em>compact</em> and * <em>loose</em>. A mutable string created by the empty constructor or * the constructor specifying a capacity is loose. All other * constructors create compact mutable strings. In most cases, you can completely * forget whether your mutable strings are loose or compact and get * good performance. * * <P><ul>  * * <li>Mutable strings occupy little space&mdash; their only attributes are a * backing character array and an integer (they are smaller of both * <code>String</code>s and <code>StringBuffer</code>s); * * <li>they are not synchronised;  * * <li>their methods try to be as efficient as possible:for instance, if some * limitation on a parameter is implied by limitation on array access, we do * not check it explicitly, and Bloom filters are used to speed up {@link * #replace(char[],String[]) multi-character substitutions}; * * <li>they let you access directly the backing array * (at your own risk); * * <li>they implement {@link CharSequence}, so, for instance, you can match or split a * mutable string against a regular expression using the {@linkplain java.util.regex.Pattern standard Java API}; * * <li>they implement {@link Appendable}, so they can be used with {@link java.util.Formatter} * and similar classes; * * <li><code>null</code>is not accepted as a string argument;  * * <li>compact mutable strings have a slow growth; loose mutable strings have a fast growth; * * <li>hash codes of compact mutable strings are cached (for faster * equality checks); * * <li>typical conversions such as trimming, upper/lower casing and * replacements are made in place, with minimal reallocations; * * <li>all methods try, whenever it is possible, to return <code>this</code>, so * you can chain methods as in <code>s.length(0).append("foo").append("bar")</code>; * * <li>you can write or print a mutable string without creating a * <code>String</code> by using {@link #write(Writer)}, {@link * #print(PrintWriter)} and {@link #println(PrintWriter)}; you can read it back * using {@link #read(Reader,int)}. * * <li>you can write <em>any</em> mutable string in (length-prefixed) UTF-8 * format by using {@link #writeSelfDelimUTF8(DataOutput)}&mdash;you are not * limited to strings whose UTF-8 encoded length fits 16 bits; * * <li>you can {@link #wrap(char[]) wrap} any character array into a mutable string; * * <li>this class is not final: thus, you can add your own methods to * specialised versions. * * </ul> * * <P>Committing to use this class for such an ubiquitous data structure as * strings may seem dangerous, as standard string classes are by now tested and * stable. However, this class has been heavily regression (and torture) tested * on all methods, and we believe it is very reliable. * * <P>To simplify the transition to mutable strings, we have tried to make * mixing string classes simpler by providing polymorphic versions of all * methods accepting one or more strings&mdash;whenever you must specify a * string you can usually provide a <code>MutableString</code>, a * <code>String</code>, or a generic <code>CharSequence</code>. * * <P>Note that usually we provide a specific method for * <code>String</code>. This duplication may seem useless, as * <code>String</code> implements <code>CharSequence</code>. However, invoking * methods on an interface is slower than invoking methods on a class, and we * expect constant strings to appear often in such methods. * * <h3>The Reallocation Heuristic</h3> * * <P>Backing array reallocations use a heuristic based on looseness.  Whenever * an operation changes the length, compact strings are resized to fit * <em>exactly</em> the new content, whereas the capacity of a loose string is * never shortened, and enlargements maximise the new length required with the * double of the current capacity. * * <P>The effect of this policy is that loose strings will get large buffers * quickly, but compact strings will occupy little space and perform very well * in data structures using hash codes. * * <P>For instance, you can easily reuse a loose mutable string calling {@link * #length(int) length(0)} (which does <em>not</em> reallocate the backing * array). * * <P>In any case, you can call {@link #compact()} and {@link #loose()} to force  * the respective condition. * * <h3>Disadvantages</h3> * * <P>The main disadvantage of mutable strings is that their substrings * cannot share their backing arrays, so if you need to generate many * substrings you may want to use <code>String</code>. However, {@link * #subSequence(int,int) subSequence()} returns a {@link CharSequence} that * shares the backing array. * * <h3>Warnings</h3> * * There are a few differences with standard string classes you should be aware * of. * * <ol> * * <li><STRONG>This class is not synchronised</STRONG>.  If multiple threads * access an object of this class concurrently, and at least one of the threads * modifies it, it must be synchronised externally. * * <li>This class implements polymorphic versions of the {@link #equals(Object) * equals} method that compare the <em>content</em> of <code>String</code>s and * <code>CharSequence</code>s, so that you can easily do checks like * * <PRE> *         mutableString.equals( "Hello" )  * </PRE>  * * Thus, you must <em>not</em> mix mutable strings with * <code>CharSequence</code>s in collections as equality between objects of * those types is not symmetric. *  * <li>When the length of a string or char array argument is zero, * some methods may just do nothing even if other parameters are out of * bounds. * * <li>The output of {@link #writeSelfDelimUTF8(DataOutput) writeSelfDelimUTF8()} is * <em>not</em> compatible with the usual Java {@link * DataOutput#writeUTF(String) writeUTF()}. * * <li>Even if this class is not final, most <em>methods</em> are declared * final for efficiency, so you cannot override them (why should you ever want * to override {@link #array()}?). * * </ol> * * <strong>Warning</strong>: in MG4J 1.2, the <code>compareTo(Object)</code> method * has been removed as it was incompatible with generics. * * <h3>Benchmarking</h3> * * <P>Benchmarking a string class is an almost impossible task, as patterns of * usage may vary wildly from application to application. Here we give some * simple data about text scanning: we want to count the number of occurrences * of each word (a maximal subsequence of characters satisfying {@link * Character#isLetterOrDigit(char) isLetterOrDigit()}) in a file of about 200 Mbytes. * <pre> * $ java it.unimi.dsi.mg4j.test.StringSpeedTest &lt;/tmp/test * Read 30090870 words in 33328 ms (902843.4696510546 words/s) * $ java it.unimi.dsi.mg4j.test.MutableStringSpeedTest &lt;/tmp/test * Read 30090870 words in 12745 ms (2360994.1153393486 words/s) * </pre> * <P>Note that 30% of the running time is spent in I/O latency and in {@link * Character#isLetterOrDigit(char) isLetterOrDigit()}, so the actual speedup is * even greater (of course, to get this performance you need a map from <a * href="http://fastutil.dsi.unimi.it/">fastutil</A>).   * * @author Sebastiano Vigna * @author Paolo Boldi * @since 0.3 * @deprecated Moved to <code>dsiutils</code>. */@Deprecatedpublic class MutableString implements Serializable, CharSequence, Appendable, Comparable<MutableString>, Cloneable {	/** A mutable string containing <samp>null</samp>, used for implementing {@link Appendable}'s semantics. */	private final static MutableString NULL = new MutableString( "null" );		/** The backing array. */	protected transient char[] array;	/** This mutable string is compact iff this attribute is negative.	 * It the string is compact, the attribute is its hash code (-1 denotes the invalid	 * hash code). If the string is loose, the attribute is the number of 	 * characters actually stored in the backing array. */	protected transient int hashLength;	public static final long serialVersionUID = -518929984008928417L;    	/** Creates a new loose empty mutable string with capacity 2. */	public MutableString() {		this( 2 );	}	/** Creates a new loose empty mutable string with given capacity.	 *	 * @param capacity the required capacity.	 */	public MutableString( final int capacity ) {		array = capacity != 0 ? new char[ capacity ] : CharArrays.EMPTY_ARRAY;	}	/** Creates a new compact mutable string with given length.	 *	 * @param length the desired length of the new string.	 */	private void makeCompactMutableString( final int length ) {		array = length != 0 ? new char[ length ] : CharArrays.EMPTY_ARRAY;		hashLength = -1;	}	/** Creates a new compact mutable string copying a given mutable string.	 *	 * @param s the initial contents of the string.	 */	public MutableString( final MutableString s ) {		makeCompactMutableString( s.length() );		System.arraycopy( s.array, 0, array, 0, array.length );    }    /** Creates a new compact mutable string copying a given <code>String</code>.     *     * @param s the initial contents of the string.     */    public MutableString( final String s ) {		makeCompactMutableString( s.length() );		s.getChars( 0, array.length, array, 0 );    }    /** Creates a new compact mutable string copying a given <code>CharSequence</code>.     *     * @param s the initial contents of the string.     */    public MutableString( final CharSequence s ) {		makeCompactMutableString( s.length() );		getChars( s, 0, array.length, array, 0 );    }    /** Creates a new compact mutable string copying a given character array.     *     * @param a the initial contents of the string.     */    public MutableString( final char[] a ) {		makeCompactMutableString( a.length );		System.arraycopy( a, 0, array, 0, array.length );    }    /** Creates a new compact mutable string copying a part of a given character array.     *     * @param a a character array.     * @param offset an offset into the array.     * @param len how many characters to copy.     */    public MutableString( final char[] a, final int offset, final int len ) {		makeCompactMutableString( len );		System.arraycopy( a, offset, array, 0, len );    }    /** Creates a new compact mutable string by copying this one.	 *     * @return a compact copy of this mutable string.     */    public MutableString copy() {		return new MutableString( this );    }    /** Creates a new compact mutable string by copying this one.	 *	 * <P>This method is identical to {@link #copy}, but the latter returns	 * a more specific type.	 *     * @return a compact copy of this mutable string.     */    public Object clone() {		return new MutableString( this );    }    /** Commodity static method implementing {@link     *  java.lang.String#getChars(int,int,char[],int)} for a <code>CharSequence</code>s.      *     * @param s a <code>CharSequence</code>.     * @param start copy start from this index (inclusive).     * @param end copy ends at this index (exclusive).

⌨️ 快捷键说明

复制代码 Ctrl + C
搜索代码 Ctrl + F
全屏模式 F11
切换主题 Ctrl + Shift + D
显示快捷键 ?
增大字号 Ctrl + =
减小字号 Ctrl + -