<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<!-- saved from url=(0063)http://www.cnblogs.com/calmwater/archive/2006/05/20/405345.html -->
<HTML><HEAD id=Head><TITLE>Clustering - 咨询之路 - 博客园</TITLE>
<META http-equiv=Content-Type content="text/html; charset=utf-8">
<META id=metaKeywords content=Clustering name=keywords><LINK id=MainCss 
href="Clustering - 咨询之路 - 博客园.files/style.css" type=text/css 
rel=stylesheet><LINK id=RSSLink title=RSS 
href="http://www.cnblogs.com/calmwater/rss.aspx" type=application/rss+xml 
rel=alternate>
<META content="MSHTML 6.00.2900.3059" name=GENERATOR></HEAD>
<BODY>
<FORM id=Form1 name=Form1 onsubmit="javascript:return WebForm_OnSubmit();" 
action=405345.html method=post>
<DIV><INPUT id=__EVENTTARGET type=hidden name=__EVENTTARGET> <INPUT 
id=__EVENTARGUMENT type=hidden name=__EVENTARGUMENT> <INPUT 
id="&#13;&#10;__VIEWSTATE" type=hidden name=__VIEWSTATE> </DIV>
<SCRIPT type=text/javascript>
<!--
var theForm = document.forms['Form1'];
if (!theForm) {
    theForm = document.Form1;
}
function __doPostBack(eventTarget, eventArgument) {
    if (!theForm.onsubmit || (theForm.onsubmit() != false)) {
        theForm.__EVENTTARGET.value = eventTarget;
        theForm.__EVENTARGUMENT.value = eventArgument;
        theForm.submit();
    }
}
// -->
</SCRIPT>

<SCRIPT src="Clustering - 咨询之路 - 博客园.files/WebResource.axd" 
type=text/javascript></SCRIPT>

<SCRIPT language=JavaScript>
									function ctlent(evt,id)
											{
												if(evt.ctrlKey && evt.keyCode == 13)
												{	
													try
													{
														TempSave(id);
													}
													catch(ex)
													{
													}
													finally
													{
													    __doPostBack('AjaxHolder$PostComment$btnSubmit','')
													}
												}
		
												}</SCRIPT>

<SCRIPT language=JavaScript>function SetReplyAuhor(author){document.getElementById('AjaxHolder_PostComment_tbComment').value+="@"+author+"\n";document.getElementById('AjaxHolder_PostComment_tbComment').focus();return false}</SCRIPT>

<SCRIPT 
src="F:\study\ClusteringToolbox\Clustering - 咨询之路 - 博客园.files\WebResource(1).axd" 
type=text/javascript></SCRIPT>

<SCRIPT src="Clustering - 咨询之路 - 博客园.files/ScriptResource.axd" 
type=text/javascript></SCRIPT>

<SCRIPT 
src="F:\study\ClusteringToolbox\Clustering - 咨询之路 - 博客园.files\ScriptResource(1).axd" 
type=text/javascript></SCRIPT>

<SCRIPT type=text/javascript>
<!--
function WebForm_OnSubmit() {
if (typeof(ValidatorOnSubmit) == "function" && ValidatorOnSubmit() == false) return false;
return true;
}
// -->
</SCRIPT>

<DIV id=banner>
<DIV class=header>
<DIV><A class=headermaintitle id=Header1_HeaderTitle 
href="http://www.cnblogs.com/calmwater/">咨询之路</A> </DIV>
<DIV>Heading for McKinsey </DIV></DIV></DIV>
<DIV id=centercontent>
<DIV class=singlepost>
<DIV class=posttitle><A class=singleposttitle id=viewpost1_TitleUrl 
href="http://www.cnblogs.com/calmwater/archive/2006/05/20/405345.html">Clustering</A> 
</DIV>
<P><STRONG><FONT size=5>Introduction</FONT></STRONG> 
<P><STRONG><EM>Cluster analysis</EM></STRONG> is the process of grouping objects into 
subsets that have meaning in the context of a particular problem. The objects 
are thereby organized into an efficient representation that characterizes the 
population being sampled. Unlike classification, clustering does not rely on 
predefined classes. Clustering is referred to as an <STRONG><EM>unsupervised 
learning method</EM></STRONG> because no information is provided about the "right answer" 
for any of the objects. It can uncover previously undetected relationships in a 
complex data set. Many applications for cluster analysis exist. For example, in 
a business application, cluster analysis can be used to discover and 
characterize customer groups for marketing purposes.</P>
<P>Two types of clustering algorithms are <STRONG><EM>nonhierarchical</EM></STRONG> and 
<STRONG><EM>hierarchical</EM></STRONG>. In nonhierarchical clustering, such as the 
<STRONG><EM>k</EM>-means</STRONG> algorithm, the relationship between clusters is 
undetermined. Hierarchical clustering repeatedly links pairs of clusters until 
every data object is included in the hierarchy. With both of these approaches, 
an important issue is how to determine the similarity between two objects, so 
that clusters can be formed from objects with a high similarity to each other. 
Commonly, <STRONG><EM>distance functions</EM></STRONG>, such as the 
<STRONG><EM>Manhattan</EM></STRONG> and <STRONG><EM>Euclidean</EM></STRONG> distance functions, 
are used to determine similarity. A distance function yields a higher value for 
pairs of objects that are less similar to one another. Sometimes a 
<STRONG><EM>similarity function</EM></STRONG> is used instead, which yields higher values 
for pairs that are more similar. <BR><BR><BR><FONT size=+2><STRONG>Distance 
Functions</STRONG></FONT> 
</P>
<P>Given two <EM>p</EM>-dimensional data objects <EM>i</EM> = 
(<EM>x</EM><SUB>i1</SUB>, <EM>x</EM><SUB>i2</SUB>, ..., <EM>x</EM><SUB>ip</SUB>) and <EM>j</EM> = 
(<EM>x</EM><SUB>j1</SUB>, <EM>x</EM><SUB>j2</SUB>, ..., <EM>x</EM><SUB>jp</SUB>), the following 
common distance functions can be defined:</P>
<BLOCKQUOTE><STRONG>Euclidean Distance Function:</STRONG> <BR><IMG 
  alt="d(i, j) = sqrt((x_i1 - x_j1)^2 + (x_i2 - x_j2)^2 + ... + (x_ip - x_jp)^2)" 
  src="Clustering - 咨询之路 - 博客园.files/Euclid.gif" border=1> 
  <BR><BR><STRONG>Manhattan Distance Function:</STRONG> <BR><IMG 
  alt="d(i, j) = |x_i1 - x_j1| + |x_i2 - x_j2| + ... + |x_ip - x_jp|" 
  src="Clustering - 咨询之路 - 博客园.files/Man.gif" border=1> </BLOCKQUOTE>
<P>When the Euclidean distance function is used only to compare distances, it is not 
necessary to calculate the square root. The square root is monotonically 
increasing on non-negative numbers, so for two squared distances 
<EM>d</EM><SUB>1</SUB> and <EM>d</EM><SUB>2</SUB>, 
&radic;<EM>d</EM><SUB>1</SUB> &gt; &radic;<EM>d</EM><SUB>2</SUB> &hArr; 
<EM>d</EM><SUB>1</SUB> &gt; <EM>d</EM><SUB>2</SUB>. If an object's 
attributes are measured on different scales, attributes with larger scales of 
measurement may overwhelm attributes measured on a smaller scale when the 
Euclidean distance function is used. To prevent this problem, the attribute 
values are often normalized to lie between 0 and 1.</P>
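<P>The two distance functions and the 0-to-1 normalization described above can 
be sketched in a few lines of Python. This is only an illustration: the 
function names are hypothetical, objects are assumed to be equal-length tuples 
of numbers, and mapping a constant attribute to 0.0 is an assumption the text 
does not cover.</P>

```python
import math


def manhattan(a, b):
    """Sum of absolute differences over all attributes."""
    return sum(abs(x - y) for x, y in zip(a, b))


def squared_euclidean(a, b):
    """Sum of squared differences; enough when distances are only compared."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def euclidean(a, b):
    """Square root of the squared Euclidean distance."""
    return math.sqrt(squared_euclidean(a, b))


def min_max_normalize(rows):
    """Rescale every attribute (column) to lie between 0 and 1.

    Assumption: an attribute with the same value in every row maps to 0.0.
    """
    columns = list(zip(*rows))
    lows = [min(col) for col in columns]
    spans = [max(col) - min(col) for col in columns]
    return [
        tuple((v - lo) / span if span else 0.0
              for v, lo, span in zip(row, lows, spans))
        for row in rows
    ]


if __name__ == "__main__":
    print(manhattan((0, 0), (3, 4)))          # 7
    print(squared_euclidean((0, 0), (3, 4)))  # 25
    print(euclidean((0, 0), (3, 4)))          # 5.0
    print(min_max_normalize([(1, 100), (2, 300), (3, 200)]))
```

<P>Ranking candidate neighbours with <CODE>squared_euclidean</CODE> gives the 
same order as ranking with <CODE>euclidean</CODE>, which is why the square 
root can be skipped when distances are only compared.</P>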
<P>Other distance functions may be more appropriate for some data. 
<BR><BR><BR><FONT size=+2><STRONG><EM>k</EM>-means Algorithm</STRONG></FONT> 
</P>
<P>The <EM>k</EM>-means algorithm is one of a group of algorithms called 
<STRONG><EM>partitioning methods</EM></STRONG>. The problem of partitional 
clustering can be formally stated as follows: Given <EM>n</EM> objects in a 
<EM>d</EM>-dimensional metric space, determine a partition of the objects into 
<EM>k</EM> groups, or clusters, such that the objects in a cluster are more 
similar to each other than to objects in different clusters. Recall that a 
partition divides a set into disjoint parts that together include all members of 
the set. The value of <EM>k</EM> may or may not be specified, and a clustering 
criterion, typically the <STRONG><EM>squared-error criterion</EM></STRONG>, must be 
adopted.</P>
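<P>The text names the squared-error criterion without defining it. A common 
reading, assumed here, is the sum over all clusters of the squared Euclidean 
distance from each object to the mean of its cluster. The sketch below uses 
hypothetical function names and expects each cluster to be a non-empty list 
of equal-length tuples.</P>

```python
def cluster_mean(points):
    """Attribute-wise mean of the objects in one cluster."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def squared_error(clusters):
    """Total squared distance from every object to its own cluster mean."""
    total = 0.0
    for points in clusters:
        center = cluster_mean(points)
        for p in points:
            total += sum((x - c) ** 2 for x, c in zip(p, center))
    return total


if __name__ == "__main__":
    good = [[(0, 0), (2, 0)], [(10, 10)]]
    bad = [[(0, 0)], [(2, 0), (10, 10)]]
    print(squared_error(good))  # 2.0
    print(squared_error(bad))   # 82.0
```

<P>A lower value means that objects sit closer to their cluster means, so 
under this criterion the first partition in the example is preferred over the 
second.</P>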
<P>A straightforward approach to this problem is to select a clustering 
criterion and then, for each data object, select the cluster that optimizes the 
criterion. The <EM>k</EM>-means algorithm initializes <EM>k</EM> clusters by 
arbitrarily selecting one object to represent each cluster. Each of the 
remaining objects is assigned to the cluster it is most similar to, and the 
mean of each cluster is calculated. These means are used as the new cluster 
points, and each object is reassigned to the cluster that it is most similar to. 
This continues until the assignments no longer change when the clusters are 
recalculated. The algorithm is shown in Figure 1. <BR><BR></P>
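<P>The loop described above can be sketched as follows. This is a minimal 
illustration, not the code from Figure 1. Several choices are assumptions: the 
first <EM>k</EM> objects stand in for the "arbitrary" initial representatives, 
similarity is measured with the squared Euclidean distance, a cluster that 
loses all its objects keeps its previous point, and <CODE>max_iter</CODE> is 
a hypothetical safety limit.</P>

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(points):
    """Attribute-wise mean of a non-empty list of tuples."""
    return tuple(sum(coords) / len(points) for coords in zip(*points))


def k_means(points, k, max_iter=100):
    """Return (labels, centers) for a simple k-means run."""
    # Assumption: the first k objects serve as the arbitrary initial choice.
    centers = [tuple(p) for p in points[:k]]
    labels = None
    for _ in range(max_iter):
        # Assign every object to the nearest current cluster point.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centers[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # no assignment changed, so the clusters are stable
        labels = new_labels
        # Recompute each cluster point as the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # assumption: an empty cluster keeps its old point
                centers[c] = mean(members)
    return labels, centers


if __name__ == "__main__":
    data = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
    labels, centers = k_means(data, 2)
    print(labels)   # [0, 0, 0, 1, 1, 1]
    print(centers)
```

<P>On this small data set both initial representatives come from the same 
group, yet the reassignment step separates the two groups after two updates. 
In general the result depends on the initial choice, so different starting 
objects may lead to different partitions.</P>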
