<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<!-- saved from url=(0063)http://www.cnblogs.com/calmwater/archive/2006/05/20/405345.html -->
<HTML><HEAD id=Head><TITLE>Clustering - 咨询之路 - 博客园</TITLE>
<META http-equiv=Content-Type content="text/html; charset=utf-8">
<META id=metaKeywords content=Clustering name=keywords><LINK id=MainCss
href="Clustering - 咨询之路 - 博客园.files/style.css" type=text/css
rel=stylesheet><LINK id=RSSLink title=RSS
href="http://www.cnblogs.com/calmwater/rss.aspx" type=application/rss+xml
rel=alternate>
<META content="MSHTML 6.00.2900.3059" name=GENERATOR></HEAD>
<BODY>
<FORM id=Form1 name=Form1 onsubmit="javascript:return WebForm_OnSubmit();"
action=405345.html method=post>
<DIV><INPUT id=__EVENTTARGET type=hidden name=__EVENTTARGET> <INPUT
id=__EVENTARGUMENT type=hidden name=__EVENTARGUMENT> <INPUT
id=" __VIEWSTATE" type=hidden name=__VIEWSTATE> </DIV>
<SCRIPT type=text/javascript>
<!--
var theForm = document.forms['Form1'];
if (!theForm) {
theForm = document.Form1;
}
function __doPostBack(eventTarget, eventArgument) {
if (!theForm.onsubmit || (theForm.onsubmit() != false)) {
theForm.__EVENTTARGET.value = eventTarget;
theForm.__EVENTARGUMENT.value = eventArgument;
theForm.submit();
}
}
// -->
</SCRIPT>
<SCRIPT src="Clustering - 咨询之路 - 博客园.files/WebResource.axd"
type=text/javascript></SCRIPT>
<SCRIPT language=JavaScript>
function ctlent(evt,id)
{
if(evt.ctrlKey && evt.keyCode == 13)
{
try
{
TempSave(id);
}
catch(ex)
{
}
finally
{
__doPostBack('AjaxHolder$PostComment$btnSubmit','')
}
}
}</SCRIPT>
<SCRIPT language=JavaScript>function SetReplyAuhor(author){document.getElementById('AjaxHolder_PostComment_tbComment').value+="@"+author+"\n";document.getElementById('AjaxHolder_PostComment_tbComment').focus();return false}</SCRIPT>
<SCRIPT
src="F:\study\ClusteringToolbox\Clustering - 咨询之路 - 博客园.files\WebResource(1).axd"
type=text/javascript></SCRIPT>
<SCRIPT src="Clustering - 咨询之路 - 博客园.files/ScriptResource.axd"
type=text/javascript></SCRIPT>
<SCRIPT
src="F:\study\ClusteringToolbox\Clustering - 咨询之路 - 博客园.files\ScriptResource(1).axd"
type=text/javascript></SCRIPT>
<SCRIPT type=text/javascript>
<!--
function WebForm_OnSubmit() {
if (typeof(ValidatorOnSubmit) == "function" && ValidatorOnSubmit() == false) return false;
return true;
}
// -->
</SCRIPT>
<DIV id=banner>
<DIV class=header>
<DIV><A class=headermaintitle id=Header1_HeaderTitle
href="http://www.cnblogs.com/calmwater/">咨询之路</A> </DIV>
<DIV>Heading for McKinsey </DIV></DIV></DIV>
<DIV id=centercontent>
<DIV class=singlepost>
<DIV class=posttitle><A class=singleposttitle id=viewpost1_TitleUrl
href="http://www.cnblogs.com/calmwater/archive/2006/05/20/405345.html">Clustering</A>
</DIV>
<P><STRONG><FONT size=5>Introduction</FONT></STRONG>
<P><STRONG><EM>Cluster analysis</EM></STRONG> is the process of grouping objects into
subsets that have meaning in the context of a particular problem. The objects
are thereby organized into an efficient representation that characterizes the
population being sampled. Unlike classification, clustering does not rely on
predefined classes. Clustering is referred to as an <STRONG><EM>unsupervised
learning method</EM></STRONG> because no information is provided about the "right answer"
for any of the objects. It can uncover previously undetected relationships in a
complex data set. Many applications for cluster analysis exist. For example, in
a business application, cluster analysis can be used to discover and
characterize customer groups for marketing purposes.</P>
<P>Two types of clustering algorithms are <STRONG><EM>nonhierarchical</EM></STRONG> and
<STRONG><EM>hierarchical</EM></STRONG>. In nonhierarchical clustering, such as the
<STRONG><EM>k</EM>-means</STRONG> algorithm, the relationship between clusters is
undetermined. Hierarchical clustering repeatedly links pairs of clusters until
every data object is included in the hierarchy. With both of these approaches,
an important issue is how to determine the similarity between two objects, so
that clusters can be formed from objects with a high similarity to each other.
Commonly, <STRONG><EM>distance functions</EM></STRONG>, such as the
<STRONG><EM>Manhattan</EM></STRONG> and <STRONG><EM>Euclidean</EM></STRONG> distance functions,
are used to determine similarity. A distance function yields a higher value for
pairs of objects that are less similar to one another. Sometimes a
<STRONG><EM>similarity function</EM></STRONG> is used instead, which yields higher values
for pairs that are more similar. <BR><BR><BR><FONT size=+2><STRONG>Distance
Functions</STRONG></FONT></P>
<P>Given two <EM>p</EM>-dimensional data objects <EM>i</EM> =
(<EM>x</EM><SUB><EM>i</EM>1</SUB>, <EM>x</EM><SUB><EM>i</EM>2</SUB>, ..., <EM>x</EM><SUB><EM>ip</EM></SUB>) and <EM>j</EM> =
(<EM>x</EM><SUB><EM>j</EM>1</SUB>, <EM>x</EM><SUB><EM>j</EM>2</SUB>, ..., <EM>x</EM><SUB><EM>jp</EM></SUB>), the following
common distance functions can be defined:</P>
<BLOCKQUOTE><STRONG>Euclidean Distance Function:</STRONG> <BR><IMG
alt="d(i, j) = sqrt((x_i1 - x_j1)^2 + (x_i2 - x_j2)^2 + ... + (x_ip - x_jp)^2)"
src="Clustering - 咨询之路 - 博客园.files/Euclid.gif" border=1>
<BR><BR><STRONG>Manhattan Distance Function:</STRONG> <BR><IMG
alt="d(i, j) = |x_i1 - x_j1| + |x_i2 - x_j2| + ... + |x_ip - x_jp|"
src="Clustering - 咨询之路 - 博客园.files/Man.gif" border=1> </BLOCKQUOTE>
<P>When using the Euclidean distance function to compare distances, it is not
necessary to calculate the square root. Squared distances are never negative
and the square root is an increasing function, so for two squared distances
<EM>d</EM><SUB>1</SUB> and <EM>d</EM><SUB>2</SUB>,
&radic;<EM>d</EM><SUB>1</SUB> &gt; &radic;<EM>d</EM><SUB>2</SUB> &hArr;
<EM>d</EM><SUB>1</SUB> &gt; <EM>d</EM><SUB>2</SUB>. If some of an object's
attributes are measured along different scales, then attributes with larger
scales of measurement may overwhelm attributes measured on a smaller scale when
the Euclidean distance function is used. To prevent this problem, the attribute
values are often normalized to lie between 0 and 1.</P>
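<P>As a rough illustration of the two distance functions, and of the points
about skipping the square root and normalizing attributes, here is a minimal
Python sketch. The function names and the sample data are made up for this
example and do not come from the original article. Min-max scaling is assumed
as the normalization; it is one common way to map values into the range 0 to 1.</P>
<PRE>
```python
import math


def manhattan(a, b):
    """Manhattan distance: sum of absolute coordinate differences."""
    return sum(abs(x - y) for x, y in zip(a, b))


def squared_euclidean(a, b):
    """Squared Euclidean distance: sum of squared differences, no square root."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def euclidean(a, b):
    """Euclidean distance: square root of the squared distance."""
    return math.sqrt(squared_euclidean(a, b))


def min_max_normalize(rows):
    """Rescale every attribute (column) to the range 0..1.

    Assumption for this sketch: a constant column is mapped to 0.0.
    """
    columns = list(zip(*rows))
    lows = [min(col) for col in columns]
    spans = [max(col) - min(col) for col in columns]
    normalized = []
    for row in rows:
        normalized.append([
            (value - low) / span if span != 0 else 0.0
            for value, low, span in zip(row, lows, spans)
        ])
    return normalized


# Hypothetical sample objects.
p, q, r = (0, 0), (3, 4), (6, 0)
print(manhattan(p, q), euclidean(p, q), squared_euclidean(p, q))  # prints: 7 5.0 25

# The nearest neighbour of p is the same with or without the square root.
by_squared = min([q, r], key=lambda o: squared_euclidean(p, o))
by_true = min([q, r], key=lambda o: euclidean(p, o))
print(by_squared == by_true)  # prints: True

# The second attribute has a much larger scale until it is normalized.
print(min_max_normalize([[1, 1000], [2, 3000], [3, 2000]]))
```
</PRE>
<P>In the last call the second attribute initially spans thousands while the
first spans single digits. After scaling, both attributes contribute on the
same 0 to 1 range.</P>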
<P>Other distance functions may be more appropriate for some data.
<BR><BR><BR><FONT size=+2><STRONG><EM>k</EM>-means Algorithm</STRONG></FONT>
</P>
<P>The <EM>k</EM>-means algorithm is one of a group of algorithms called
<STRONG><EM>partitioning methods</EM></STRONG>. The problem of partitional
clustering can be formally stated as follows: Given <EM>n</EM> objects in a
<EM>d</EM>-dimensional metric space, determine a partition of the objects into
<EM>k</EM> groups, or clusters, such that the objects in a cluster are more
similar to each other than to objects in different clusters. Recall that a
partition divides a set into disjoint parts that together include all members of
the set. The value of <EM>k</EM> may or may not be specified, and a clustering
criterion, typically the <STRONG><EM>squared-error criterion</EM></STRONG>, must be
adopted.</P>
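<P>The squared-error criterion is usually taken to mean the sum, over all
clusters, of the squared Euclidean distances between each object and the mean
of its cluster. The article does not spell the formula out, so that reading is
an assumption here. Under it, a small Python sketch (the function names and the
sample partitions are hypothetical) could look like this:</P>
<PRE>
```python
def cluster_mean(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return [sum(col) / n for col in zip(*points)]


def squared_error(clusters):
    """Sum of squared distances from every point to the mean of its cluster.

    `clusters` is a list of non-empty lists of points (an assumed layout).
    """
    total = 0.0
    for points in clusters:
        mean = cluster_mean(points)
        for point in points:
            total += sum((x - m) ** 2 for x, m in zip(point, mean))
    return total


# Two candidate partitions of the same three hypothetical objects.
tight = [[(0, 0), (2, 0)], [(10, 10)]]
loose = [[(0, 0)], [(2, 0), (10, 10)]]
print(squared_error(tight))  # prints: 2.0
print(squared_error(loose))  # prints: 82.0
```
</PRE>
<P>The partition that groups the two nearby objects together has the lower
squared error, which is the sense in which it is the better clustering under
this criterion.</P>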
<P>The basic approach to this problem is straightforward. Select a clustering
criterion, then for each data object select the cluster that optimizes the
criterion. The <EM>k</EM>-means algorithm initializes <EM>k</EM> clusters by
arbitrarily selecting one object to represent each cluster. Each of the
remaining objects is assigned to a cluster, and the clustering criterion is used
to calculate the cluster mean. These means are used as the new cluster points,
and each object is reassigned to the cluster that it is most similar to. This
continues until there is no longer a change when the clusters are recalculated.
The algorithm is shown in Figure 1. <BR><BR></P>
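<P>Independently of the figure referenced above, the loop just described can be
sketched in Python as follows. This is a minimal sketch under stated
assumptions, not the article's own listing: squared Euclidean distance is used
as the similarity measure, the initial representatives are drawn with a seeded
random sample, an empty cluster keeps its previous center, and the names
<EM>k_means</EM>, <EM>nearest</EM>, <EM>seed</EM> and <EM>max_iterations</EM>
are made up for this example.</P>
<PRE>
```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(point, centers):
    """Index of the center closest to `point`."""
    return min(range(len(centers)),
               key=lambda c: squared_distance(point, centers[c]))


def k_means(points, k, seed=0, max_iterations=100):
    """Return (labels, centers) for a list of points.

    `seed` and `max_iterations` are illustrative parameters, not part of
    the description in the text.
    """
    rng = random.Random(seed)
    # Arbitrarily pick k objects to represent the k clusters.
    centers = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(max_iterations):
        # Assign every object to the cluster it is most similar to.
        new_labels = [nearest(p, centers) for p in points]
        if new_labels == labels:
            break  # no change, so the clustering has settled
        labels = new_labels
        # Recompute each cluster mean and use it as the new cluster point.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:  # assumption: an empty cluster keeps its old center
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centers


# Hypothetical data: two visibly separated groups of three objects each.
data = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.0),
        (8.0, 8.0), (9.0, 9.5), (8.5, 9.0)]
labels, centers = k_means(data, k=2)
print(labels)
print(centers)
```
</PRE>
<P>Which group receives label 0 depends on the random starting objects, so only
the grouping itself is meaningful, not the label numbers.</P>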