<html>
<head><title>Clustering a dissimilarity matrix</title></head>
<body bgcolor="#ffffff">
<h3>Clustering a dissimilarity matrix</h3>
<font color=blue><tt>cluster <font color=red>-##</font>
[-= -X </tt><em>ffile</em><tt>] </tt><em>file</em></font>
<blockquote>
<br> <font color=red><tt> -# </tt></font>number of clusters
<br> <font color=blue><tt> -= </tt></font>if set, bias towards similar size clusters
<br> <font color=blue><tt> -X </tt><em>ffile</em></font> list of indices with fixed cluster assignments
<br> <font color=blue><tt> -o </tt></font><a href="../general.html#outfile" tppabs="http://www.mpipks-dresden.mpg.de/~tisean/TISEAN_2.0/docs/general.html#outfile">output file name</a>, just <font color=blue><tt> -o </tt></font>means <font color=blue><em>file</em><tt>_clust</tt></font>
<br> <font color=blue><tt> -V </tt></font><a href="../general.html#verbosity" tppabs="http://www.mpipks-dresden.mpg.de/~tisean/TISEAN_2.0/docs/general.html#verbosity">verbosity level</a> (0 = only fatal errors)
<br> <font color=blue><tt> -h </tt></font>show this message
<p>
<a href="../general.html#verbosity" tppabs="http://www.mpipks-dresden.mpg.de/~tisean/TISEAN_2.0/docs/general.html#verbosity">verbosity level</a> (add what you want):
<p>
<font color=blue> 1</font> = input/output
<br> <font color=blue> 2</font> = state of clustering
<br> <font color=blue> 8</font> = temperature / cost at cooling
</blockquote>
The program reads a dissimilarity matrix of the form <font color=blue>i, j,
d<sub>i,j</sub></font> (columns 1,2,3 of the input file). Any missing values
are filled in by the mean of the given values. Now <font color=red><tt> -#
</tt></font> clusters are formed by minimising the average dissimilarity of
each entity to all the entities within each cluster. The method is described in
<a href="../chaospaper/citation.html#cluster" tppabs="http://www.mpipks-dresden.mpg.de/~tisean/TISEAN_2.0/docs/chaospaper/citation.html#cluster">Schreiber and Schmitz</a>.
Certain indices may be assigned to a cluster by listing the index and the
cluster number in <font color=blue><em>ffile</em></font> (the argument of the
<font color=blue><tt> -X </tt></font> option).
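<p>
The input handling described above can be illustrated with a minimal Python
sketch. It is not the code of <font color=blue><tt>cluster</tt></font>: the
function names <tt>read_dissimilarities</tt> and <tt>fill_missing</tt> are
hypothetical, and it assumes that the mean is taken over every value supplied
in the file and that no symmetrisation is applied.
<pre>
```python
def read_dissimilarities(lines):
    """Parse 'i j d' triplets (1-based indices) into a dict keyed by (i, j)."""
    given = {}
    for line in lines:
        fields = line.split()
        if len(fields) != 3:
            continue  # skip blank or malformed lines
        i, j, d = int(fields[0]), int(fields[1]), float(fields[2])
        given[(i, j)] = d
    return given


def fill_missing(given, n):
    """Return an n-by-n nested list; absent pairs get the mean of the given values.

    Assumption: the mean runs over all supplied entries, diagonal included.
    """
    mean = sum(given.values()) / len(given)
    return [[given.get((i, j), mean) for j in range(1, n + 1)]
            for i in range(1, n + 1)]


# Three items, but only five of the nine entries are supplied.
triplets = ["1 1 0", "1 2 0.2", "2 1 0.2", "2 2 0", "1 3 1.0"]
given = read_dissimilarities(triplets)
matrix = fill_missing(given, 3)
for row in matrix:
    print(row)
```
</pre>
Entries such as (2,3) and (3,3), which are absent from the triplets, receive
the mean of the five supplied values.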
<p>
Progress is monitored by a string printed at brief intervals. In this string,
clusters are labelled A, B, ... On output, the clustering is described by
giving, for each index, the cluster number and the average dissimilarities of
that item to each cluster.
<p>
As an example, consider four time series 1,2,3,4 where 1 and 2 are very
similar, 3 and 4 as well, but the two groups are quite dissimilar. This may be
reflected in the dissimilarity matrix
<blockquote>
<table noborder cellpadding=0 cellspacing=0>
<tr><th>i</th><th>j</th><th>d<sub>i,j</sub></th></tr>
<tr><td colspan=3><hr width=100%></td></tr>
<tr><td> 1 </td><td>1 </td><td>0 </td></tr>
<tr><td> 1 </td><td>2 </td><td>0 </td></tr>
<tr><td> 1 </td><td>3 </td><td>1 </td></tr>
<tr><td> 1 </td><td>4 </td><td>1 </td></tr>
<tr><td> 2 </td><td>1 </td><td>0 </td></tr>
<tr><td> 2 </td><td>2 </td><td>0 </td></tr>
<tr><td> 2 </td><td>3 </td><td>1 </td></tr>
<tr><td> 2 </td><td>4 </td><td>1 </td></tr>
<tr><td> 3 </td><td>1 </td><td>1 </td></tr>
<tr><td> 3 </td><td>2 </td><td>1 </td></tr>
<tr><td> 3 </td><td>3 </td><td>0 </td></tr>
<tr><td> 3 </td><td>4 </td><td>0 </td></tr>
<tr><td> 4 </td><td>1 </td><td>1 </td></tr>
<tr><td> 4 </td><td>2 </td><td>1 </td></tr>
<tr><td> 4 </td><td>3 </td><td>0 </td></tr>
<tr><td> 4 </td><td>4 </td><td>0 </td></tr>
</table>
</blockquote>
Running
<blockquote>
<font color=blue><tt>> cluster -#2</tt></font>
</blockquote>
on this data will yield as output
<pre>
1 0. 1.
1 0. 1.
2 1. 0.
2 1. 0.
</pre>
This means that, as expected, series 1 and 2 are in cluster 1, and series 3
and 4 are in cluster 2. Each series has average dissimilarity 0. to its own
cluster and 1. to the other cluster.
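<p>
The two columns of averages in this example can be checked by hand or with a
short Python sketch. The function <tt>average_dissimilarities</tt> is a
hypothetical helper, not part of TISEAN. It assumes that the average runs over
all members of a cluster, including the item itself, and that no cluster is
empty; for this matrix the choice of including the item itself makes no
difference.
<pre>
```python
def average_dissimilarities(d, labels, n_clusters):
    """For each item, return (label, [mean dissimilarity to every cluster])."""
    rows = []
    for i, label in enumerate(labels):
        averages = []
        for c in range(1, n_clusters + 1):
            # Positions of all items currently assigned to cluster c.
            members = [j for j, lab in enumerate(labels) if lab == c]
            averages.append(sum(d[i][j] for j in members) / len(members))
        rows.append((label, averages))
    return rows


# Dissimilarity matrix of the four-series example (rows, columns = series 1..4).
d = [[0, 0, 1, 1],
     [0, 0, 1, 1],
     [1, 1, 0, 0],
     [1, 1, 0, 0]]
labels = [1, 1, 2, 2]  # cluster assignment taken from the output shown above

rows = average_dissimilarities(d, labels, 2)
for label, averages in rows:
    print(label, *averages)
```
</pre>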
<p>
Dissimilarity matrices for time series can be produced either using <a
href="../dresden/nstat_z.htm">nstat_z</a> or by computing any other
dissimilarity measure (<a
href="../dresden/xc2.html">xc2</a>, <a
href="../dresden/xzero.html">xzero</a>, <a
href="../dresden/xcor.html">xcor</a>, with appropriate settings)
in a loop.
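<p>
Such a loop might look like the following Python sketch, which writes one
<font color=blue>i, j, d<sub>i,j</sub></font> line per ordered pair of series.
The measure <tt>rms_difference</tt> is only a hypothetical stand-in chosen to
keep the example self-contained; it is not what nstat_z, xc2, xzero or xcor
compute, and in practice it would be replaced by the result of one of those
programs.
<pre>
```python
import math


def rms_difference(a, b):
    """Hypothetical stand-in measure: root-mean-square difference of two series."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))


def dissimilarity_lines(series, measure):
    """Yield 'i j d' lines (1-based indices) for every ordered pair of series."""
    for i, a in enumerate(series, start=1):
        for j, b in enumerate(series, start=1):
            yield f"{i} {j} {measure(a, b)}"


# Three short series of equal length; the first two are identical.
series = [[0.0, 1.0, 0.0, 1.0],
          [0.0, 1.0, 0.0, 1.0],
          [1.0, 0.0, 1.0, 0.0]]
lines = list(dissimilarity_lines(series, rms_difference))
print("\n".join(lines))
```
</pre>
The printed lines have the three-column layout that
<font color=blue><tt>cluster</tt></font> reads.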
<p>
Here is one more example, for UNIX users, using pipelines:
<blockquote>
<font color=blue><tt>> ( henon -l 1000 ; ikeda -l 1000 ) | nstat_z -#10 | cluster -#2</tt></font>
</blockquote>
A time series of 2000 points is produced, the first 1000 from the Hénon
map, the second from the Ikeda map. Splitting it into 10 segments, <a
href="../dresden/nstat_z.htm" tppabs="http://www.mpipks-dresden.mpg.de/~tisean/TISEAN_2.0/docs/dresden/nstat_z.htm">nstat_z</a> produces a 10 by 10 matrix which is
then used to form 2 clusters:
<pre> 1 0.510285676 1.51835787
1 0.515677989 1.47538877
1 0.505068302 1.50731277
1 0.526149631 1.50477791
1 0.54192245 1.5086087
2 1.49558449 0.900290847
2 1.49753571 0.912206411
2 1.50899839 0.89825052
2 1.49374235 0.903614342
2 1.51858497 0.915635824
</pre>
Indeed, the first 5 segments form one cluster and segments 6-10 the other.
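<p>
Reading the grouping off a longer output by eye is tedious, so a small Python
sketch for collecting the indices per cluster may help. The helper
<tt>group_by_cluster</tt> is hypothetical and assumes, as in the listing
above, that the k-th non-empty output line belongs to index k and that its
first column is the cluster number.
<pre>
```python
from collections import defaultdict


def group_by_cluster(output_lines):
    """Map each cluster number to the list of 1-based indices assigned to it."""
    groups = defaultdict(list)
    rows = [line.split() for line in output_lines if line.strip()]
    for index, fields in enumerate(rows, start=1):
        groups[int(fields[0])].append(index)
    return dict(groups)


# Output of the pipeline example, copied from the listing above.
output = """
1 0.510285676 1.51835787
1 0.515677989 1.47538877
1 0.505068302 1.50731277
1 0.526149631 1.50477791
1 0.54192245 1.5086087
2 1.49558449 0.900290847
2 1.49753571 0.912206411
2 1.50899839 0.89825052
2 1.49374235 0.903614342
2 1.51858497 0.915635824
""".splitlines()

groups = group_by_cluster(output)
for cluster, members in sorted(groups.items()):
    print(cluster, members)
```
</pre>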
<p>
<a href="../contents.html" tppabs="http://www.mpipks-dresden.mpg.de/~tisean/TISEAN_2.0/docs/contents.html">Table of Contents</a> * <a href="../../index.html" tppabs="http://www.mpipks-dresden.mpg.de/~tisean/TISEAN_2.0/index.html" target="_top">TISEAN home</a>
</body></html>