// SvdMatrix.java
 * moves from its initial size to one tenth (<code>1/10</code>) of
 * its initial size after <code>9 * annealingRate</code> epochs,
 * and one hundredth of its initial size after
 * <code>99 * annealingRate</code> epochs.
 *
 * <h3>Parameter Choices</h3>
 *
 * <p>The previous discussion has introduced a daunting list of
 * parameters required for gradient descent for singular value
 * decomposition. Unfortunately, the stochastic gradient descent
 * solver requires a fair amount of tuning to recover a low mean
 * square error factorization. The optimal settings also depend
 * on the input matrix; for example, very sparse partial matrices are
 * much easier to fit than dense matrices.
 *
 * <h4>Maximum Order</h4>
 *
 * <p>Determining the order of the decomposition is mostly a matter
 * of determining how many orders are needed for the amount of
 * smoothing required. Low order reconstructions are sufficient for
 * most applications. One way to determine the maximum order is to
 * evaluate on held out data for an application. Another is to look
 * for the point at which the singular values become (relatively)
 * insignificant.
 *
 * <h4>Maximum Epochs and Early Stopping</h4>
 *
 * <p>For low mean square error factorizations, many epochs may be
 * necessary. Lowering the maximum number of epochs leads to what is
 * known as early stopping. Sometimes early stopping leads to
 * more accurate held-out predictions, but it will always raise
 * the factorization error (training data predictions). In general,
 * the maximum number of epochs needs to be set empirically.
 * Initially, try fewer epochs and gradually raise the number of
 * epochs until the desired accuracy is achieved.
 *
 * <h4>Minimum Improvement</h4>
 *
 * <p>By setting the minimum improvement to 0.0, the algorithm is
 * forced to use the maximum number of epochs. By setting it above
 * this level, a form of early stopping is achieved when the benefit
 * of another epoch of refitting falls below a given improvement
 * threshold.
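 * As a sketch (with illustrative variable names, not the solver's
 * actual internals), a relative minimum-improvement test amounts to:
 *
 * <blockquote><pre>
 * // relative change in mean square error between epochs
 * double relativeImprovement
 *     = (lastMse - mse) / (Math.abs(lastMse) + Math.abs(mse));
 * boolean converged
 *     = epoch &gt;= minEpochs
 *     &amp;&amp; relativeImprovement &lt; minImprovement;</pre></blockquote>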
 * This value may also be set on an application basis
 * using held out data, or it may be fit to achieve a given level of
 * mean square error on the training (input) data. The minimum
 * improvement needs to be set fairly low (less than 0.000001) to
 * achieve reasonably precise factorizations. Note that minimum
 * improvement is defined relatively, as noted above.
 *
 * <h4>Initial Parameter Values</h4>
 *
 * <p>Initial values of the singular vectors are not particularly
 * sensitive, because we are using multiplicative updates. A good
 * rule of thumb for starting values is the inverse square root of
 * the maximum order:
 *
 * <blockquote><pre>
 * initVal ~ 1 / sqrt(maxOrder)</pre></blockquote>
 *
 * <h4>Learning Rate, Maximum Epochs and Annealing</h4>
 *
 * <p>A good starting point for the learning rate is 0.01. The
 * annealing parameter should be set so that the learning rate is cut
 * by at least a factor of 10 in the final rounds. This calls for a
 * value that's roughly one tenth of the maximum number of epochs. If
 * the initial learning rate is set higher, then the annealing
 * schedule should be more aggressive so that it spends a fair amount
 * of time in the 0.001 to 0.0001 learning rate range.
 *
 * <h3>References</h3>
 *
 * <p>There are thorough Wikipedia articles on singular value
 * decomposition and gradient descent, although the SVD entry focuses
 * entirely on complete (non-partial) matrices:
 *
 * <ul>
 * <li><a href="http://en.wikipedia.org/wiki/Singular_value_decomposition">Wikipedia: Singular Value Decomposition</a></li>
 * <li><a href="http://en.wikipedia.org/wiki/Gradient_descent">Wikipedia: Gradient Descent</a></li>
 * </ul>
 *
 * <p>Both of the standard machine learning textbooks have good
 * theoretical explanations of least squares, regularization,
 * gradient descent, and singular value decomposition, but not
 * all together:
 *
 * <ul>
 * <li>Chris Bishop. 2007.
 * <i>Pattern Recognition and Machine Learning</i>. Springer.</li>
 * <li>Trevor Hastie, Robert Tibshirani, and Jerome Friedman. 2001.
 * <i>The Elements of Statistical Learning</i>. Springer.</li>
 * </ul>
 *
 * <p>The following is a particularly clear explanation of many of
 * the issues involved in the context of neural networks:
 *
 * <ul>
 * <li>Genevieve Orr, Nici Schraudolph, and Fred Cummins. 1999.
 * <a href="http://www.willamette.edu/~gorr/classes/cs449/intro.html">CS-449: Neural Networks</a>. Willamette University course notes.</li>
 * </ul>
 *
 * <p>Our partial SVD solver is based on C code from Timely Development
 * (see license below). Timely Development based their code on Simon
 * Funk's blog entry. Simon Funk's approach was itself based on
 * Genevieve Gorrell's 2006 EACL paper.
 *
 * <ul>
 * <li><a href="http://www.timelydevelopment.com/Demos/NetflixPrize.htm">Timely Development's Netflix Prize Page</a>.</li>
 * <li>Simon Funk (pseudonym for Brandyn Webb). 2007.
 * <a href="http://sifter.org/~simon/journal/20061211.html">Gradient Descent SVD Algorithm</a>. <i>The Evolution of Cybernetics</i> blog.</li>
 * <li>Genevieve Gorrell. 2006.
 * <a href="http://acl.ldc.upenn.edu/E/E06/E06-1013.pdf">Generalized Hebbian Algorithm for Incremental Singular Value Decomposition in Natural Language Processing</a>. EACL 2006.</li>
 * <li>Genevieve Gorrell. 2006.
 * <a href="http://www.dcs.shef.ac.uk/~genevieve/gorrell_thesis.pdf">Generalized Hebbian Algorithm for Dimensionality Reduction in Natural Language Processing</a>. Ph.D. Thesis.
 * Linköping University. Sweden.</li>
 * </ul>
 *
 * <h3>Acknowledgements</h3>
 *
 * <p>The singular value decomposition code is rather loosely based on
 * a C program developed by <a
 * href="http://www.timelydevelopment.com/">Timely Development</a>.
 * Here is the copyright notice for the original code:
 *
 * <blockquote><pre style="font-size:80%">
 * // SVD Sample Code
 * //
 * // Copyright (C) 2007 Timely Development (www.timelydevelopment.com)
 * //
 * // Special thanks to Simon Funk and others from the Netflix Prize contest
 * // for providing pseudo-code and tuning hints.
 * //
 * // Feel free to use this code as you wish as long as you include
 * // these notices and attribution.
 * //
 * // Also, if you have alternative types of algorithms for accomplishing
 * // the same goal and would like to contribute, please share them as well :)
 * //
 * // STANDARD DISCLAIMER:
 * //
 * // - THIS CODE AND INFORMATION IS PROVIDED "AS IS" WITHOUT WARRANTY
 * // - OF ANY KIND, EITHER EXPRESSED OR IMPLIED, INCLUDING BUT NOT
 * // - LIMITED TO THE IMPLIED WARRANTIES OF MERCHANTABILITY AND/OR
 * // - FITNESS FOR A PARTICULAR PURPOSE.
 * </pre></blockquote>
 *
 * @author Bob Carpenter
 * @version 3.2
 * @since LingPipe3.2
 */
public class SvdMatrix extends AbstractMatrix {

    private final double[][] mRowVectors;
    private final double[][] mColumnVectors;
    private final int mOrder;

    /**
     * Construct a factored matrix using the specified row and column
     * vectors at the specified order. Each vector in the arrays of
     * row and column vectors must be of the same dimension as the
     * order.
     *
     * <p>See the class documentation above for more information
     * on singular value decomposition.
     *
     * @param rowVectors Row vectors, indexed by row.
     * @param columnVectors Column vectors, indexed by column.
     * @param order Dimensionality of the row and column vectors.
     */
    public SvdMatrix(double[][] rowVectors,
                     double[][] columnVectors,
                     int order) {
        verifyDimensions("row",order,rowVectors);
        verifyDimensions("column",order,columnVectors);
        mRowVectors = rowVectors;
        mColumnVectors = columnVectors;
        mOrder = order;
    }

    /**
     * Construct an SVD matrix using the specified singular vectors
     * and singular values. The order of the factorization is equal to
     * the length of the singular values.
     * Each singular vector must
     * be of the same dimensionality as the array of singular values.
     *
     * <p>See the class documentation above for more information
     * on singular value decomposition.
     *
     * @param rowSingularVectors Row singular vectors, indexed by row.
     * @param columnSingularVectors Column singular vectors, indexed by column.
     * @param singularValues Array of singular values, in decreasing order.
     */
    public SvdMatrix(double[][] rowSingularVectors,
                     double[][] columnSingularVectors,
                     double[] singularValues) {
        mOrder = singularValues.length;
        verifyDimensions("row",mOrder,rowSingularVectors);
        verifyDimensions("column",mOrder,columnSingularVectors);
        mRowVectors = new double[rowSingularVectors.length][mOrder];
        mColumnVectors = new double[columnSingularVectors.length][mOrder];
        double[] sqrtSingularValues = new double[singularValues.length];
        for (int i = 0; i < sqrtSingularValues.length; ++i)
            sqrtSingularValues[i] = Math.sqrt(singularValues[i]);
        scale(mRowVectors,rowSingularVectors,sqrtSingularValues);
        scale(mColumnVectors,columnSingularVectors,sqrtSingularValues);
    }

    /**
     * Returns the number of rows in this matrix.
     *
     * @return The number of rows in this matrix.
     */
    public int numRows() {
        return mRowVectors.length;
    }

    /**
     * Returns the number of columns in this matrix.
     *
     * @return The number of columns in this matrix.
     */
    public int numColumns() {
        return mColumnVectors.length;
    }

    /**
     * Returns the order of this factorization. The order
     * is the number of dimensions in the singular vectors
     * and the singular values.
     *
     * @return The order of this decomposition.
     */
    public int order() {
        return mRowVectors[0].length;
    }

    /**
     * Returns the value of this matrix at the specified row and column.
     *
     * @param row Row index.
     * @param column Column index.
     * @return The value of this matrix at the specified row and
     * column.
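     *
     * <p>For example, for a hypothetical order-1 factorization (the
     * vectors below are purely illustrative, not drawn from any real
     * decomposition), each value is the dot product of the
     * corresponding row and column vectors:
     *
     * <blockquote><pre>
     * double[][] rowVecs = { { 1.0 }, { 2.0 } };
     * double[][] colVecs = { { 3.0 }, { 4.0 } };
     * SvdMatrix m = new SvdMatrix(rowVecs, colVecs, 1);
     * m.value(1, 0);  // 2.0 * 3.0 = 6.0</pre></blockquote>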
     */
    public double value(int row, int column) {
        double[] rowVec = mRowVectors[row];
        double[] colVec = mColumnVectors[column];
        double result = 0.0;
        for (int i = 0; i < rowVec.length; ++i)
            result += rowVec[i] * colVec[i];
        return result;
    }

    /**
     * Returns the array of singular values for this decomposition.
     *
     * @return The singular values for this decomposition.
     */
    public double[] singularValues() {
        double[] singularValues = new double[mOrder];
        for (int i = 0; i < singularValues.length; ++i)
            singularValues[i] = singularValue(i);
        return singularValues;
    }

    /**
     * Returns the singular value for the specified order.
     *
     * @param order Dimension of the singular value.
     * @return The singular value at the specified order.
     */
    public double singularValue(int order) {
        if (order >= mOrder) {
            String msg = "Maximum order=" + (mOrder-1)
                + " found order=" + order;
            throw new IllegalArgumentException(msg);
        }
        return columnLength(mRowVectors,order)
            * columnLength(mColumnVectors,order);
    }

    /**
     * Returns a matrix in which the left singular vectors make up the
     * columns. The first index is for the row of the original matrix
     * and the second index is for the order of the singular vector.
     * Thus the returned matrix is <code>m×k</code>, if the
     * original input was an <code>m×n</code> matrix and SVD was
     * performed at order <code>k</code>.
     *
     * @return The left singular vectors of this matrix.
     */
    public double[][] leftSingularVectors() {
        return normalizeColumns(mRowVectors);
    }

    /**
     * Returns a matrix in which the right singular vectors make up
     * the columns. The first index is for the column of the original
     * matrix and the second index is for the order of the singular
     * vector. Thus the returned matrix is <code>n×k</code>, if
     * the original input was an <code>m×n</code> matrix and SVD
     * was performed at order <code>k</code>.
     *
     * @return The right singular vectors.
     */
    public double[][] rightSingularVectors() {
        return normalizeColumns(mColumnVectors);
    }

    /**
     * Returns the singular value decomposition of the specified
     * complete matrix of values.
     *
     * <p>For a full description of the arguments and values, see
     * the method documentation for
     * {@link #partialSvd(int[][],double[][],int,double,double,double,double,double,int,int,PrintWriter)}
     * and the class documentation above.
     *
     * <p>The two-dimensional array of values must be an
     * <code>m × n</code> matrix. In particular, each row
     * must be of the same length. If this is not the case, an illegal
     * argument exception will be raised.
     *
     * @param values Array of values.
     * @param maxOrder Maximum order of the decomposition.
     * @param featureInit Initial value for singular vectors.
     * @param initialLearningRate Incremental multiplier of error determining how
     * fast learning occurs.
     * @param annealingRate Rate at which annealing occurs; higher values
     * provide more gradual annealing.
     * @param regularization A regularization constant to damp learning.
     * @param minImprovement Minimum relative improvement in mean square error required
     * to finish an epoch.
     * @param minEpochs Minimum number of epochs for training.
     * @param maxEpochs Maximum number of epochs for training.
     * @param writer Print writer to which output is directed during
     * training, or null if no output is desired.
     * @return Singular value decomposition for the specified complete matrix
     * at the specified order.
     * @throws IllegalArgumentException Under conditions listed in the
     * method documentation above.
     */
    public static SvdMatrix svd(double[][] values,
                                int maxOrder,
                                double featureInit,
                                double initialLearningRate,
                                double annealingRate,
                                double regularization,
                                double minImprovement,
                                int minEpochs,
                                int maxEpochs,
                                PrintWriter writer) {
        int m = values.length;
        int n = values[0].length;