.\" $Header: /usr/src/local/conn/cluster/RCS/cluster.man,v 1.15 1993/02/03 07:43:07 stolcke Exp $
.TH CLUSTER L "$Date: 1993/02/03 07:43:07 $"
.SH NAME
cluster, pca \- Hierarchical Cluster Analysis and Principal Component Analysis
.SH SYNOPSIS
.B cluster
.RI [ options ]
.RI [ vectorfile
.RI [ namesfile ]]
.LP
.B pca
.RI [ options ]
.RI [ vectorfile
.RI [ namesfile ]]
.SH DESCRIPTION
.B Cluster
performs Hierarchical Cluster Analysis (HCA) on a set of vectors and
outputs the result in a variety of formats on standard output.
.PP
.B Pca
performs Principal Component Analysis (PCA) on a set of vectors and
prints the transformed set of vectors on standard output.
.PP
If
.I vectorfile
is given, it is read as the file containing the vector data,
one vector per line, components separated by whitespace.
An optional
.I namesfile
can be given to assign names (arbitrary strings) to these vectors.
Names must be specified one per line, matching the number of vectors in
.IR vectorfile .
Names are either contiguous non-whitespace characters or arbitrary strings
delimited by an initial double quote `"' and the end of line.
.PP
Vector names may also be given in
.I vectorfile
itself, following the vector components on each line.
If no names
are provided, vectors in the output are identified by their input sequence
number instead.
.PP
Either of these files may be given as
.RB ` - ',
indicating that the
corresponding information should be read from standard input.
If no arguments are given, standard input is read,
allowing
.B cluster
to be used as a filter.
.PP
.B Cluster
and
.B pca
also provide a simple scaling facility.
If the first line of the input is terminated by the keyword
.RB ` _SCALE_ '
it is interpreted as a vector of scaling factors.
The following lines are then
read as data as usual, except that vector components are multiplied by
their corresponding scaling factors.
To specify scaling factors on the command line use
.br
        (echo
.I factor1
.I factor2
\&... _SCALE_ ; \e
.br
        cat
.I vectorfile
) | cluster - [
.I namesfile
]
.PP
Yet another potentially useful feature is that vector components may be
specified as
.RB ` D/C '
(don't care), meaning that the component will always
contribute zero in computing distances to other vectors.
In PCA mode, each D/C value is replaced by the mean of all non-D/C values
along its dimension.
.SH OPTIONS
.TP
.B -p
Force PCA mode, even when the program is called as
.BR cluster .
.RB ( cluster
and
.B pca
are different incarnations of the same program, depending on the
zeroth argument.)
.TP
.B -s
Suppress scaling.
Vector components are not scaled, even if a
.B _SCALE_
line was found.
This is useful to produce both scaled and unscaled analyses from the
same input file.
.TP
.B -v
Verbose output. Reports the number and dimension of vectors read
and precedes each output section with an explanatory message.
For
.BR pca ,
execution of the computational steps involved is reported.
.SS "Cluster only"
.TP
.B -d
Output all pairs of clusters formed, along with their respective
inter-cluster distances.
Clusters are given as lists of vectors.
.TP
.B -t
Represent the hierarchical clusters as a tree lying on its side.
The leaves of the tree are formed by vector names, and the
horizontal spacing between nodes is proportional to the distances
between clusters.
The output uses only ASCII characters, resulting in a rough approximation
of the true proportions.
.TP
.B -T
Same as
.B -t
but the cluster tree is displayed in a
.BR curses (3)
pad.
The terminal screen can be scrolled around the tree representation.
Also, VT100 graphics characters are used for line drawing if available.
While displaying the tree, the following one-key commands can be used:
.RS
.PD 0
.TP 10
Home, H
Scroll to upper-left corner of window.
.TP
h, j, k, l, arrow keys
Scroll left, down, up, right by one position.
.TP
Tab, BackTab
Scroll right, left by 8 positions.
.TP
n, p
Scroll down, up by one page.
.TP
R
Redraw screen.
.TP
q
Quit the display.
.PD
.RE
.TP
.BI -w width
Set the width of the tree representation used by
.B -t
and
.B -T
to
.I width
characters.
The default width is 80 or the terminal width as determined by
.BR curses (3).
Wider trees are more difficult to view but give a more accurate picture
of relative distances.
.TP
.B -g
Same as
.BR -t ,
but the graphical output is specified in a format suitable for the
UNIX
.BR graph (1G)
utility, which allows further formatting such as bounding box,
axis labels, rotation, and scaling.
.BR Graph (1G)
in turn produces plotting instructions according to the
.BR plot (5)
format, for which a variety of output filters exist.
The following are typical command lines.
.br
.sp 1
Previewing on a standard terminal:
.br
        cluster -g | graph -g1 | plot -Tcrt
.br
Previewing under X windows:
.br
        cluster -g | graph -g1 | xplot
.br
or
.br
        cluster -g | xgraph
.br
If neither xplot nor xgraph is available, run an
.BR xterm (1)
switched to Tektronix mode and use
.br
        cluster -g | graph -g1 | plot -Ttek
.br
Converting to PostScript:
.br
        cluster -g | graph -g1 | psplot
.br
Printing on a printer supporting
.BR plot (5)
format:
.br
        cluster -g | graph -g1 | lpr -g
.br
.TP
.B -b
Same as
.BR -g ,
except that double drawing of lines is avoided, thus saving space and time.
This requires, however, that
.B graph
be called with the
.B -b
option to correctly assemble the tree from pieces:
.br
        cluster -b | graph -b
.TP
.B -B
The input vectors are output as bit vectors induced by the cluster tree.
The cluster tree is interpreted as a code tree, i.e., for each left or right
branch a `0' or `1' bit, respectively, is printed.
An `x' is used to pad vectors to the depth of the tree.
.TP
.BI -n p
Norm to be used as distance metric between vectors.
A positive integer
.I p
specifies a metric based on the L\c
.IR p -norm.
The value
.B 0
selects the maximum norm.
The default is
.B 2
(Euclidean distance).
.PP
For compatibility with an earlier version of the program, the default
behavior of
.B cluster
corresponds to the combination of options
.BR -dtv .
.SS "Pca only"
.TP
.BI -e eigenbase
Use
.I eigenbase
as a file with precomputed eigenvectors.
If the file exists, it is read and the relatively costly eigenvalue
computation is avoided.
This also allows transforming a data set according to principal components
determined from a different data set.
If the file does not exist, an eigenbase is computed from the current
input and saved in the file.
.TP
.BI -c pc1,pc2,...
Select a subset of the principal components for output, as typically
used for dimensionality reduction of vector sets.
Components of the transformed vectors are listed in the order
specified by the comma-separated list of numbers
.IR pc1 , pc2 ,...
For example,
.B "-c4,2"
prints the fourth and second principal components (in that order).
.TP
.B -E
Output the eigenvalues instead of the transformed input vectors.
Eigenvalues are printed in descending order or as specified by the
.B -c
option.
This option forces recomputation of the eigenbase even if an existing
file is specified with the
.B -e
option.
.SH BUGS
Halfhearted error handling.
If vectors and names are given in the same
file, the name at the end of the first line must be a non-numerical string,
or it will be mistaken for a vector component.
.PP
The vector names at the leaves of the cluster tree tend to
stretch beyond the bounding box of the plot.
This is a feature since
.B cluster
leaves the graphing process entirely to
.BR graph (1G),
which doesn't care about the length of strings.
This can be corrected by explicitly specifying an upper limit for the x
coordinate.
.PP
The clustering algorithm could be optimized further.
.SH "SEE ALSO"
graph(1G), plot(5), plot(1G), xplot(1), xgraph(1), xterm(1),
psplot(1), curses(3), lpr(1).
.SH AUTHORS
Original version by Yoshiro Miyata (miyata@boulder.colorado.edu).
.br
Minor fixes, various options,
.BR curses (3)
support,
.BR graph (1G)
output and PCA addition by Andreas Stolcke (stolcke@icsi.berkeley.edu).
.br
Scaling and algorithm improvements suggested by Steve Omohundro
(om@icsi.berkeley.edu).
.br
Don't care values suggested by Kim Daugherty (kimd@gizmo.usc.edu).
.br
Bit vector output suggested by Joseph Devlin (jdevlin@maestro.usc.edu).
.br
The algorithms for eigenvalue computation and Gaussian elimination
were adapted from
.I "Numerical Recipes in C"
by Press, Flannery, Teukolsky & Vetterling.
.br
Finally, this program is freely distributable, but nobody should try to make
money off of it, and it would be nice if researchers using it acknowledged
the people mentioned above.
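.SH EXAMPLES
The distance rules described for the
.B -n
option and for D/C components can be illustrated with a short Python sketch.
It is not taken from the program source: the names parse_vector, distance
and DONT_CARE are hypothetical, the input lines are assumed to carry no
trailing vector names, and taking the p-th root of the summed powers is an
assumption about what a metric based on the Lp-norm means here.
.nf

```python
# Illustrative sketch only; not the actual implementation of cluster.
DONT_CARE = None  # hypothetical stand-in for a D/C component


def parse_vector(line):
    """Parse one whitespace-separated vector line (assumed to have no name)."""
    return [DONT_CARE if tok == "D/C" else float(tok) for tok in line.split()]


def distance(u, v, p=2):
    """Distance between u and v; p = 0 selects the maximum norm."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same dimension")
    # A D/C component on either side contributes zero to the distance.
    diffs = [
        0.0 if a is DONT_CARE or b is DONT_CARE else abs(a - b)
        for a, b in zip(u, v)
    ]
    if p == 0:
        return max(diffs, default=0.0)
    # Assumption: the usual p-th root of the sum of p-th powers.
    return sum(d ** p for d in diffs) ** (1.0 / p)
```

.fi
With the default p = 2, distance(parse_vector("0 0"), parse_vector("3 4"))
evaluates to 5.0, the Euclidean distance.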