<CENTER>
<TABLE cellSpacing=1 bgColor=#ffffff border=1>
<TBODY>
<TR>
<TD><STRONG><EM>k</EM>-means Algorithm:<BR></STRONG>
<OL>
<LI>Arbitrarily select <EM>k</EM> initial clusters (e.g., <EM>k</EM> seed objects).
<LI>Initialize the cluster centers from those <EM>k</EM> clusters.
<LI>Do loop<BR> a) Partition by assigning or reassigning all
data objects to their closest cluster center.<BR> b) Compute
new cluster centers as the mean value of the objects in each
cluster.<BR>Until there is no change in the cluster centers
</LI></OL></TD></TR></TBODY></TABLE>Figure 1: <EM>k</EM>-means Algorithm
</CENTER>
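<P>The loop in Figure 1 translates almost directly into code. The following Python sketch is ours, not from the original page; it uses the Manhattan distance and breaks ties toward the lower-numbered cluster:</P>

```python
def manhattan(p, q):
    # Manhattan distance: |x2 - x1| + |y2 - y1|
    return abs(p[0] - q[0]) + abs(p[1] - q[1])

def k_means(points, seeds):
    """Simple k-means with the seed objects as the initial cluster centers."""
    centers = list(seeds)
    while True:
        # a) Partition: assign every object to its closest center
        #    (ties go to the first, i.e. lowest-indexed, center).
        clusters = [[] for _ in centers]
        for p in points:
            i = min(range(len(centers)),
                    key=lambda j: manhattan(p, centers[j]))
            clusters[i].append(p)
        # b) Recompute each center as the mean of its cluster
        #    (an empty cluster keeps its old center).
        new_centers = [
            (sum(p[0] for p in c) / len(c), sum(p[1] for p in c) / len(c))
            if c else old
            for c, old in zip(clusters, centers)
        ]
        if new_centers == centers:   # no change in the centers: done
            return clusters, centers
        centers = new_centers
```

The stopping test compares the freshly computed means with the previous ones, which is exactly the "until no change in cluster centers" condition of Figure 1.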
<CENTER><BR><BR><IMG alt="" src="Clustering - 咨询之路 - 博客园.files/ClusterEg.gif"
border=1> <BR>Figure 2: Sample Data </CENTER>
<BLOCKQUOTE><STRONG>Example:</STRONG> Suppose we are given the data in Figure
2 as input and we choose <EM>k</EM>=2 and the Manhattan distance function
<EM>dis</EM> =
|<EM>x</EM><SUB>2</SUB>-<EM>x</EM><SUB>1</SUB>|+|<EM>y</EM><SUB>2</SUB>-<EM>y</EM><SUB>1</SUB>|.
The detailed computation is as follows: <BR><BR>Step 1. Initialize <EM>k</EM>
partitions. An initial partition can be formed by first specifying a set of
<EM>k</EM> seed points. The seed points could be the first <EM>k</EM> objects
or <EM>k</EM> objects chosen randomly from the input objects. We choose the
first two objects as seed points and initialize the clusters as
<EM>C</EM><SUB>1</SUB>={(0,0)} and <EM>C</EM><SUB>2</SUB>={(0,1)}. <BR>Step 2.
Since there is only one point in each cluster, that point is the cluster
center. <BR>Step 3a. Calculate the distance between each object and each
cluster center, assigning the object to the closest
cluster.<BR> For example, for the third
object:<BR> <EM>dis</EM>(1,3) =
|1-0| + |1-0| = 2
and<BR> <EM>dis</EM>(2,3) =
|1-0| + |1-1| = 1,<BR> so this object is assigned to
<EM>C</EM><SUB>2</SUB>.<BR> The fifth object is
equidistant from both clusters, so we arbitrarily assign it to
<EM>C</EM><SUB>1</SUB>.<BR> After calculating the
distance for all points, the clusters contain the following
objects:<BR> <EM>C</EM><SUB>1</SUB>
= {(0,0),(1,0),(0.5,0.5)}
and<BR> <EM>C</EM><SUB>2</SUB>
= {(0,1),(1,1),(5,5),(5,6),(6,6),(6,5),(5.5,5.5)}. <BR>Step 3b. Compute new
cluster center for each cluster.<BR> New center for
<EM>C</EM><SUB>1</SUB> =
(0.5,0.17)<BR> (0+1+0.5)/3 =
0.5<BR> (0+0+0.5)/3 ≈
0.17<BR> New center for <EM>C</EM><SUB>2</SUB> =
(4.1,4.2)<BR> (0+1+5+5+6+6+5.5)/7
≈ 4.1<BR> (1+1+5+5+6+6+5.5)/7 ≈
4.2 <BR>Step 3a′. New centers <EM>C</EM><SUB>1</SUB> = (0.5,0.17) and
<EM>C</EM><SUB>2</SUB> = (4.1,4.2) differ from old centers
<EM>C</EM><SUB>1</SUB>={(0,0)} and <EM>C</EM><SUB>2</SUB>={(0,1)}, so the loop
is repeated. <BR> Reassign the ten objects to the
closest cluster center, resulting
in:<BR> <EM>C</EM><SUB>1</SUB>
=
{(0,0),(0,1),(1,1),(1,0),(0.5,0.5)}<BR> <EM>C</EM><SUB>2</SUB>
= {(5,5),(5,6),(6,6),(6,5),(5.5,5.5)} <BR>Step 3b′. Compute new cluster center
for each cluster<BR> New center for
<EM>C</EM><SUB>1</SUB> = (0.5,0.5)<BR> New
center for <EM>C</EM><SUB>2</SUB> = (5.5,5.5) <BR>Step 3a′′. New centers
<EM>C</EM><SUB>1</SUB> = (0.5,0.5) and <EM>C</EM><SUB>2</SUB> = (5.5,5.5)
differ from old centers <EM>C</EM><SUB>1</SUB> = (0.5,0.17) and
<EM>C</EM><SUB>2</SUB> = (4.1,4.2), so the loop is repeated.
<BR> Reassign the ten objects to the closest cluster
center. The result is the same as in Step 3a′. <BR>Step 3b′′. Compute new cluster
centers. <BR>The centers are the same as in Step 3b′, so there is no change and the
algorithm terminates with the clusters of Step 3a′.
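<BR><BR>The arithmetic above can be checked mechanically. This Python sketch (ours) runs the same rounds from the same seeds; the ten input objects are the ones listed in Steps 3a and 3a′:

```python
from statistics import mean

def dis(p, q):
    # The Manhattan distance used throughout the example
    return abs(p[0] - q[0]) + abs(p[1] - q[1])

# The ten input objects from Figure 2, with the two seed objects first
points = [(0, 0), (0, 1), (1, 1), (1, 0), (0.5, 0.5),
          (5, 5), (5, 6), (6, 6), (6, 5), (5.5, 5.5)]
centers = [(0, 0), (0, 1)]          # Step 1: first two objects as seeds

for _ in range(3):                  # three rounds suffice for this data
    # Step 3a: assign each object to the closer center (ties go to C1)
    c1 = [p for p in points if dis(p, centers[0]) <= dis(p, centers[1])]
    c2 = [p for p in points if dis(p, centers[0]) > dis(p, centers[1])]
    # Step 3b: new centers are the coordinate-wise means
    centers = [(mean(p[0] for p in c1), mean(p[1] for p in c1)),
               (mean(p[0] for p in c2), mean(p[1] for p in c2))]

# After the second round the centers settle at (0.5, 0.5) and (5.5, 5.5)
```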
</BLOCKQUOTE><BR><FONT size=+2><STRONG>Agglomerative Hierarchical
Algorithm</STRONG></FONT>
<P>Hierarchical algorithms can be either agglomerative or divisive, that is,
bottom-up or top-down, respectively. All <STRONG><EM>agglomerative hierarchical
clustering algorithms</EM></STRONG> begin with each object as a separate group.
These groups are successively combined based on similarity until only one group
remains or a specified termination condition is satisfied. For <EM>n</EM>
objects, <EM>n</EM>-1 merges are done. Hierarchical algorithms are rigid in
that once a merge has been done, it cannot be undone. Although this keeps the
computational cost low, it can also cause problems if an erroneous merge is
made, so merge points must be chosen carefully. Here we describe a simple
agglomerative clustering algorithm. More complex algorithms, such as BIRCH and
CURE, have been developed in an attempt to improve the clustering quality of
hierarchical algorithms. </P>
<CENTER><BR><IMG alt="" src="Clustering - 咨询之路 - 博客园.files/Hierarch.gif"
border=1> <BR>Figure 3: Sample Dendrogram <BR><BR></CENTER>
<P>In the context of hierarchical clustering, the hierarchy graph is called a
<STRONG><EM>dendrogram</EM></STRONG>. Figure 3 shows a sample dendrogram that
could be produced by a hierarchical clustering algorithm. Unlike with the
<EM>k</EM>-means algorithm, the number of clusters (<EM>k</EM>) is not specified
in hierarchical clustering. After the hierarchy is built, the user can specify
the number of clusters required, from 1 to <EM>n</EM>. The top level of the
hierarchy represents one cluster, or <EM>k</EM>=1. To examine more clusters, we
simply traverse down the hierarchy. <BR><BR></P>
<CENTER>
<TABLE cellSpacing=0 bgColor=#ffffff border=1>
<TBODY>
<TR>
<TD width=400><STRONG>Agglomerative Hierarchical Algorithm:</STRONG>
<BR><BR><STRONG>Given:</STRONG><BR> A set
<EM>X</EM> of objects
{<EM>x</EM><SUB>1</SUB>,...,<EM>x<SUB>n</SUB></EM>}<BR>
A distance function
<EM>dis</EM>(<EM>c</EM><SUB>1</SUB>,<EM>c</EM><SUB>2</SUB>)<BR>
<OL>
<LI><STRONG>for</STRONG> <EM>i</EM> = 1 to
<EM>n</EM><BR> <EM>c<SUB>i</SUB></EM> =
{<EM>x<SUB>i</SUB></EM>}<BR><STRONG>end for</STRONG><BR>
<LI><EM>C</EM> = {<EM>c</EM><SUB>1</SUB>,...,<EM>c<SUB>n</SUB></EM>}<BR>
<LI><EM>l</EM> = <EM>n</EM>+1<BR>
<LI><STRONG>while</STRONG> <EM>C</EM>.size > 1
<STRONG>do</STRONG><BR> a)
(<EM>c<SUB>min1</SUB></EM>,<EM>c<SUB>min2</SUB></EM>) = minimum
<EM>dis</EM>(<EM>c<SUB>i</SUB></EM>,<EM>c<SUB>j</SUB></EM>) for all
<EM>c<SUB>i</SUB></EM>,<EM>c<SUB>j</SUB></EM> in
<EM>C</EM><BR> b) remove <EM>c<SUB>min1</SUB></EM> and
<EM>c<SUB>min2</SUB></EM> from <EM>C</EM><BR> c)
<EM>c<SUB>l</SUB></EM> = {<EM>c<SUB>min1</SUB></EM>,<EM>c<SUB>min2</SUB></EM>};
add <EM>c<SUB>l</SUB></EM> to
<EM>C</EM><BR> d) <EM>l</EM> = <EM>l</EM> + 1<BR><STRONG>end
while</STRONG>
</LI></OL></TD></TR></TBODY></TABLE>Figure 4: Agglomerative Hierarchical
Algorithm </CENTER>
<P>Figure 4 shows a simple hierarchical algorithm. The distance function in this
algorithm can determine similarity of clusters through many methods, including
single link and group-average. <STRONG><EM>Single link</EM></STRONG> calculates
the distance between two clusters as the shortest distance between any two
objects contained in those clusters. <STRONG><EM>Group-average</EM></STRONG>
first finds the average values for all objects in the group (i.e., cluster) and
then calculates the distance between clusters as the distance between the
average values. </P>
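<P>The two criteria differ only in how they reduce point-to-point distances to a single cluster distance. A minimal Python sketch (ours; the function names are our own), with clusters as plain lists of points:</P>

```python
def manhattan(p, q):
    return abs(p[0] - q[0]) + abs(p[1] - q[1])

def single_link(c1, c2):
    """Shortest distance between any object of c1 and any object of c2."""
    return min(manhattan(p, q) for p in c1 for q in c2)

def group_average(c1, c2):
    """Distance between the coordinate-wise average objects of the clusters."""
    avg = lambda c: tuple(sum(v) / len(c) for v in zip(*c))
    return manhattan(avg(c1), avg(c2))
```

<P>For example, with c1 = [(0,0), (1,0)] and c2 = [(5,5), (6,6)], single link gives 9 (from (1,0) to (5,5)) while group-average gives 10.5 (from the averages (0.5,0) and (5.5,5.5)).</P>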
<P>Each object in <EM>X</EM> is initially used to create a cluster containing a
single object. These clusters are successively merged into new clusters, which
are added to the set of clusters, <EM>C</EM>. When a pair of clusters is merged,
the original clusters are removed from <EM>C</EM>. Thus, the number of clusters
in <EM>C</EM> decreases until there is only one cluster remaining, containing
all the objects from <EM>X</EM>. The hierarchy of clusters is implicitly
represented in the nested sets of <EM>C</EM>. </P>
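<P>The procedure just described can be sketched in Python (ours; representing a cluster as either an object label or a pair of clusters is our implementation choice, chosen so that the returned root encodes the whole hierarchy just like the nested sets in <EM>C</EM>):</P>

```python
from itertools import combinations

def leaves(c):
    """All object labels contained in a (possibly nested) cluster."""
    return [c] if isinstance(c, str) else [x for sub in c for x in leaves(sub)]

def agglomerate(labels, dis):
    """Figure 4 as code: merge the closest pair until one cluster remains."""
    C = list(labels)                    # step 1: every object is its own cluster
    while len(C) > 1:                   # step 4: while C.size > 1
        # a) the pair (c_min1, c_min2) at minimum distance;
        #    min() breaks ties by taking the first pair, as in the example
        c_min1, c_min2 = min(combinations(C, 2), key=lambda pr: dis(*pr))
        C.remove(c_min1)                # b) remove c_min1 and c_min2 from C
        C.remove(c_min2)
        C.append((c_min1, c_min2))      # c) add {c_min1, c_min2} to C
    return C[0]                         # root of the hierarchy
```

<P>The caller supplies the distance function, e.g. single link over a table of coordinates, so the same loop works for any linkage criterion.</P>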
<BLOCKQUOTE><STRONG>Example:</STRONG> Suppose the input to the simple
agglomerative algorithm described above is the set <EM>X</EM>, shown in Figure
5 represented in matrix and graph form. We will use the Manhattan distance
function and the single link method for calculating distance between clusters.
The set <EM>X</EM> contains <EM>n</EM>=10 elements, <EM>x</EM><SUB>1</SUB> to
<EM>x</EM><SUB>10</SUB>, where <EM>x</EM><SUB>1</SUB>=(0,0). <BR>
<CENTER><IMG alt="" src="Clustering - 咨询之路 - 博客园.files/Grapheg.gif" border=1>
<BR>Figure 5: Sample Data </CENTER><BR><BR><BR>Step 1. Initially, each element
<EM>x<SUB>i</SUB></EM> of <EM>X</EM> is placed in a cluster
<EM>c<SUB>i</SUB></EM>, where <EM>c<SUB>i</SUB></EM> is a member of the set of
clusters <EM>C</EM>. <BR> <EM>C</EM> =
{{<EM>x</EM><SUB>1</SUB>},{<EM>x</EM><SUB>2</SUB>},{<EM>x</EM><SUB>3</SUB>},
{<EM>x</EM><SUB>4</SUB>},{<EM>x</EM><SUB>5</SUB>},{<EM>x</EM><SUB>6</SUB>},{<EM>x</EM><SUB>7</SUB>},
{<EM>x</EM><SUB>8</SUB>},{<EM>x</EM><SUB>9</SUB>},{<EM>x</EM><SUB>10</SUB>}}
<BR><BR>Step 2. Set <EM>l</EM> = 11. <BR><BR>Step 3. (First iteration of while
loop) <EM>C</EM>.size = 10
<UL>
<LI>The minimum single link distance between two clusters is 1. This occurs
in two places, between <EM>c</EM><SUB>2</SUB> and <EM>c</EM><SUB>10</SUB>
and between <EM>c</EM><SUB>3</SUB> and <EM>c</EM><SUB>10</SUB>.
<BR>Depending on how our minimum function works we can choose either pair of
clusters. Arbitrarily we choose the first. <BR>
(<EM>c<SUB>min1</SUB></EM>,<EM>c<SUB>min2</SUB></EM>) =
(<EM>c</EM><SUB>2</SUB>,<EM>c</EM><SUB>10</SUB>)
<LI>Since <EM>l</EM> = 11, <EM>c</EM><SUB>11</SUB> = <EM>c</EM><SUB>2</SUB>
<FONT face=arial>U</FONT> <EM>c</EM><SUB>10</SUB> =
{{<EM>x</EM><SUB>2</SUB>},{<EM>x</EM><SUB>10</SUB>}}
<LI>Remove <EM>c</EM><SUB>2</SUB> and <EM>c</EM><SUB>10</SUB> from
<EM>C</EM>.
<LI>Add <EM>c</EM><SUB>11</SUB> to <EM>C</EM>. <BR>
<EM>C</EM> = {{<EM>x</EM><SUB>1</SUB>},{<EM>x</EM><SUB>3</SUB>},
{<EM>x</EM><SUB>4</SUB>},{<EM>x</EM><SUB>5</SUB>},{<EM>x</EM><SUB>6</SUB>},{<EM>x</EM><SUB>7</SUB>},
{<EM>x</EM><SUB>8</SUB>},{<EM>x</EM><SUB>9</SUB>},{{<EM>x</EM><SUB>2</SUB>},
{<EM>x</EM><SUB>10</SUB>}}}
<LI>Set <EM>l</EM> = <EM>l</EM> + 1 = 12 </LI></UL>Step 3. (Second iteration)
<EM>C</EM>.size = 9
<UL>
<LI>The minimum single link distance between two clusters is 1. This occurs
between <EM>c</EM><SUB>3</SUB> and <EM>c</EM><SUB>11</SUB> because
the distance between <EM>x</EM><SUB>3</SUB> and <EM>x</EM><SUB>10</SUB> is
1, where <EM>x</EM><SUB>10</SUB> is in <EM>c</EM><SUB>11</SUB>.
<BR> (<EM>c<SUB>min1</SUB></EM>,<EM>c<SUB>min2</SUB></EM>) =
(<EM>c</EM><SUB>3</SUB>,<EM>c</EM><SUB>11</SUB>)
<LI><EM>c</EM><SUB>12</SUB> = <EM>c</EM><SUB>3</SUB> <FONT
face=arial>U</FONT> <EM>c</EM><SUB>11</SUB> =
{{{<EM>x</EM><SUB>2</SUB>},{<EM>x</EM><SUB>10</SUB>}},{<EM>x</EM><SUB>3</SUB>}}
<LI>Remove <EM>c</EM><SUB>3</SUB> and <EM>c</EM><SUB>11</SUB> from
<EM>C</EM>.
<LI>Add <EM>c</EM><SUB>12</SUB> to <EM>C</EM>. <BR>
<EM>C</EM> = {{<EM>x</EM><SUB>1</SUB>},
{<EM>x</EM><SUB>4</SUB>},{<EM>x</EM><SUB>5</SUB>},{<EM>x</EM><SUB>6</SUB>},{<EM>x</EM><SUB>7</SUB>},
{<EM>x</EM><SUB>8</SUB>},{<EM>x</EM><SUB>9</SUB>},{{{<EM>x</EM><SUB>2</SUB>},
{<EM>x</EM><SUB>10</SUB>}}, {<EM>x</EM><SUB>3</SUB>}}}
<LI>Set <EM>l</EM> = 13 </LI></UL>Step 3. (Third iteration) <EM>C</EM>.size =
8
<UL>
<LI>(<EM>c<SUB>min1</SUB></EM>,<EM>c<SUB>min2</SUB></EM>) =
(<EM>c</EM><SUB>1</SUB>,<EM>c</EM><SUB>12</SUB>)
<LI><EM>C</EM> =
{{<EM>x</EM><SUB>4</SUB>},{<EM>x</EM><SUB>5</SUB>},{<EM>x</EM><SUB>6</SUB>},{<EM>x</EM><SUB>7</SUB>},
{<EM>x</EM><SUB>8</SUB>},{<EM>x</EM><SUB>9</SUB>},{{{{<EM>x</EM><SUB>2</SUB>},
{<EM>x</EM><SUB>10</SUB>}},
{<EM>x</EM><SUB>3</SUB>}},{<EM>x</EM><SUB>1</SUB>}}} </LI></UL>Step 3. (Fourth
iteration) <EM>C</EM>.size = 7
<UL>
<LI>(<EM>c<SUB>min1</SUB></EM>,<EM>c<SUB>min2</SUB></EM>) =
(<EM>c</EM><SUB>4</SUB>,<EM>c</EM><SUB>8</SUB>)
<LI><EM>C</EM> =
{{<EM>x</EM><SUB>5</SUB>},{<EM>x</EM><SUB>6</SUB>},{<EM>x</EM><SUB>7</SUB>},
{<EM>x</EM><SUB>9</SUB>},{{{{<EM>x</EM><SUB>2</SUB>},
{<EM>x</EM><SUB>10</SUB>}},
{<EM>x</EM><SUB>3</SUB>}},{<EM>x</EM><SUB>1</SUB>}},{{<EM>x</EM><SUB>4</SUB>},{<EM>x</EM><SUB>8</SUB>}}}
</LI></UL>Step 3. (Fifth iteration) <EM>C</EM>.size = 6
<UL>
<LI>(<EM>c<SUB>min1</SUB></EM>,<EM>c<SUB>min2</SUB></EM>) =
(<EM>c</EM><SUB>5</SUB>,<EM>c</EM><SUB>7</SUB>)
<LI><EM>C</EM> = {{<EM>x</EM><SUB>6</SUB>},
{<EM>x</EM><SUB>9</SUB>},{{{{<EM>x</EM><SUB>2</SUB>},
{<EM>x</EM><SUB>10</SUB>}},
{<EM>x</EM><SUB>3</SUB>}},{<EM>x</EM><SUB>1</SUB>}},{{<EM>x</EM><SUB>4</SUB>},{<EM>x</EM><SUB>8</SUB>}},
{{<EM>x</EM><SUB>5</SUB>},{<EM>x</EM><SUB>7</SUB>}}} </LI></UL>Step 3. (Sixth
iteration) <EM>C</EM>.size = 5
<UL>
<LI>(<EM>c<SUB>min1</SUB></EM>,<EM>c<SUB>min2</SUB></EM>) =
(<EM>c</EM><SUB>9</SUB>,<EM>c</EM><SUB>13</SUB>)
<LI><EM>C</EM> = {{<EM>x</EM><SUB>6</SUB>},
{{<EM>x</EM><SUB>4</SUB>},{<EM>x</EM><SUB>8</SUB>}},
{{<EM>x</EM><SUB>5</SUB>},{<EM>x</EM><SUB>7</SUB>}},{{{{{<EM>x</EM><SUB>2</SUB>},
{<EM>x</EM><SUB>10</SUB>}},
{<EM>x</EM><SUB>3</SUB>}},{<EM>x</EM><SUB>1</SUB>}},{<EM>x</EM><SUB>9</SUB>}}}
</LI></UL>Step 3. (Seventh iteration) <EM>C</EM>.size = 4
<UL>
<LI>(<EM>c<SUB>min1</SUB></EM>,<EM>c<SUB>min2</SUB></EM>) =
(<EM>c</EM><SUB>6</SUB>,<EM>c</EM><SUB>15</SUB>)
<LI><EM>C</EM> = {{{<EM>x</EM><SUB>4</SUB>},{<EM>x</EM><SUB>8</SUB>}},
{{{{{<EM>x</EM><SUB>2</SUB>}, {<EM>x</EM><SUB>10</SUB>}},
{<EM>x</EM><SUB>3</SUB>}},{<EM>x</EM><SUB>1</SUB>}},{<EM>x</EM><SUB>9</SUB>}},{{<EM>x</EM><SUB>6</SUB>},
{{<EM>x</EM><SUB>5</SUB>},{<EM>x</EM><SUB>7</SUB>}}}} </LI></UL>Step 3.
(Eighth iteration) <EM>C</EM>.size = 3
<UL>
<LI>(<EM>c<SUB>min1</SUB></EM>,<EM>c<SUB>min2</SUB></EM>) =
(<EM>c</EM><SUB>14</SUB>,<EM>c</EM><SUB>16</SUB>)
<LI><EM>C</EM> = { {{<EM>x</EM><SUB>6</SUB>},
{{<EM>x</EM><SUB>5</SUB>},{<EM>x</EM><SUB>7</SUB>}}},
{{{<EM>x</EM><SUB>4</SUB>},{<EM>x</EM><SUB>8</SUB>}},
{{{{{<EM>x</EM><SUB>2</SUB>}, {<EM>x</EM><SUB>10</SUB>}},
{<EM>x</EM><SUB>3</SUB>}},{<EM>x</EM><SUB>1</SUB>}},{<EM>x</EM><SUB>9</SUB>}}}}
</LI></UL>Step 3. (Ninth iteration) <EM>C</EM>.size = 2
<UL>
<LI>(<EM>c<SUB>min1</SUB></EM>,<EM>c<SUB>min2</SUB></EM>) =
(<EM>c</EM><SUB>17</SUB>,<EM>c</EM><SUB>18</SUB>)
<LI><EM>C</EM> = {{{{<EM>x</EM><SUB>4</SUB>},{<EM>x</EM><SUB>8</SUB>}},
{{{{{<EM>x</EM><SUB>2</SUB>}, {<EM>x</EM><SUB>10</SUB>}},
{<EM>x</EM><SUB>3</SUB>}},{<EM>x</EM><SUB>1</SUB>}},{<EM>x</EM><SUB>9</SUB>}}},
{{<EM>x</EM><SUB>6</SUB>},
{{<EM>x</EM><SUB>5</SUB>},{<EM>x</EM><SUB>7</SUB>}}}} </LI></UL>Step 3. (Tenth
iteration) <EM>C</EM>.size = 1. Algorithm done. <BR>
<P>The clusters created by this algorithm can be seen in Figure 6. The
corresponding dendrogram formed from the hierarchy in <EM>C</EM> is shown in
Figure 7. The points that appear closest together on the graph of input data
in Figure 5 are grouped together most closely in the hierarchy.
<BR><BR></P>
<CENTER><IMG alt="" src="Clustering - 咨询之路 - 博客园.files/Clusters.gif" border=1>
<BR>Figure 6: Graph with Clusters </CENTER><BR>
<CENTER><IMG alt="" src="Clustering - 咨询之路 - 博客园.files/Hiereg.gif" border=1>
<BR>Figure 7: Sample Dendrogram </CENTER>