http:^^www.cs.cornell.edu^info^people^vitrano^cs537projprop.html

MIME-Version: 1.0
Server: CERN/3.0
Date: Sunday, 01-Dec-96 19:31:05 GMT
Content-Type: text/html
Content-Length: 3351
Last-Modified: Monday, 21-Oct-96 21:38:43 GMT

<html>
<head><title>Data Mining</title></head>
<body BACKGROUND="stucco.jpg">
<h1 align=center>EMV - CS537 Project Proposal</h1>
<hr><br>
<center><h2>Classification in Data Mining</h2><br><h4>Eric Vitrano</h4></center>
<img align=center src="http://www.cs.cornell.edu/Info/People/vitrano/colorbar.gif"><br>

<h3><u>Common Level</u></h3>
Class table
<ul>
  Methods:
  <ul>
    open(filename)<br>
    close(filename)<br>
    write_tuple(char *) /* takes the whole record as a single ASCII char string */<br>
    read_tuple(tuple number) /* reads the tuple and returns a tuple instance */<br>
    get_scheme /* returns a char string listing the scheme */<br>
    set_scheme /* takes a char string and sets the table's scheme to it */<br>
  </ul>
</ul>
Class tuple
<ul>
  Method:
  <ul>
    get_attribute(attribute number, location to copy the attribute value to)
  </ul>
</ul>
<hr>

<h3><u>Data Mining Classifiers</u></h3>
<p><b>Stage 1</b><br>
Once the above groundwork is complete, I will implement a version of an elementary data mining classification algorithm. This algorithm will be based on the ID3 decision tree model, with limited pruning. A summary of the algorithm in pseudocode form is as follows:</p>
<ul>
  Tree Building<br>
  <ul>
    MakeTree (Training Data)<br>
    {
    <ul>
      Partition (Training Data);
    </ul>
    }<br>
    Partition (Data)<br>
    {
    <ul>
      If all tuples in Data are in the same class, this branch of the tree is done.<br>
      Else, find the best split over all attributes (Split (Data)) and partition Data on it.<br>
      Partition (each of the resulting partitions).
    </ul>
    }<br>
  </ul>
  Tree Pruning<br>
  <ul>
    RemoveNode (Node)<br>
    {
    <ul>
      For each child node of Node:<br>
      if the child has the same class value as its parent, remove it.
    </ul>
    }<br>
  </ul>
  Split Evaluation<br>
  <ul>
    Split (Data)<br>
    {
    <ul>
      For each attribute, calculate the goodness of splitting on that attribute.<br>
      Return the split with the highest goodness.
    </ul>
    }<br>
    Split_Partition (Data)<br>
    {
    <ul>
      Partition Data into two sets based on the best split found by Split.
    </ul>
    }<br>
  </ul>
</ul>
<p>The above algorithm will be implemented in Visual C++, with the goal of building a decision tree that classifies tuples into predefined classes. The tree must be trained on a training set in which the class of each tuple is known, and then tested on data to check whether the classes it returns are correct. The results can then be used to direct queries on incoming data, as well as to classify existing data.</p>
<p><b>Stage 2</b><br>
Once the above algorithm is implemented, a second algorithm will be implemented. This next algorithm will either be related to SLIQ, or will be derived from observations made while developing the general-case algorithm. Possible areas of improvement include pruning on the fly, limiting both the searches over the data and the amount of data that must be kept in memory, and presorting or partially classifying the data.</p>
<hr>

<h3><u>Time Estimates</u></h3>
<p>I expect progress to follow approximately this schedule:</p>
<ul>
  <li>October 21 - Completion of the groundwork steps.</li>
  <li>November 4 - Completion of the general algorithm.</li>
  <li>November 25 - Completion of the Stage 2 algorithm.</li>
  <li>December 2 - Evaluation and further consideration of data mining classification.</li>
</ul>
<br><br>
<img align=center src="http://www.cs.cornell.edu/Info/People/vitrano/colorbar.gif"><br>
<ul><li><h5><a href="http://www.cs.cornell.edu/Info/People/vitrano/vitrano.html">EMV Home Page</a></h5></li></ul>
</body>
</html>
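The Stage 1 pseudocode in the proposal (MakeTree/Partition, RemoveNode, Split, Split_Partition) can be sketched in C++ roughly as follows. This is a minimal illustration under stated assumptions, not the author's Visual C++ implementation: it assumes categorical attributes encoded as integers, uses information gain (entropy reduction) as the "goodness" measure that the proposal leaves unspecified, and makes every split binary (attribute == value versus everything else) to match Split_Partition's "two sets". The names Tuple, Node, partition, prune and classify are hypothetical, and prune is one possible reading of RemoveNode: collapse a test node whose two leaf children predict the same class.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <map>
#include <memory>
#include <set>
#include <utility>
#include <vector>

// Hypothetical stand-in for "Class tuple": integer-coded attributes + known class.
struct Tuple {
    std::vector<int> attrs;
    int label;
    int get_attribute(std::size_t i) const { return attrs[i]; }
};
// Leaf with a class, or binary test "attrs[attr] == value" (true -> left).
struct Node {
    bool leaf = true;
    int label = -1;  // majority class of the tuples that reached this node
    std::size_t attr = 0;
    int value = 0;
    std::unique_ptr<Node> left, right;
};

std::map<int, int> class_counts(const std::vector<Tuple>& data) {
    std::map<int, int> counts;
    for (const auto& t : data) ++counts[t.label];
    return counts;
}
// Shannon entropy, in bits, of the class distribution of data.
double entropy(const std::vector<Tuple>& data) {
    double h = 0.0;
    for (const auto& kv : class_counts(data)) {
        double p = static_cast<double>(kv.second) / data.size();
        h -= p * std::log2(p);
    }
    return h;
}
// Most frequent class; ties go to the smallest label.
int majority(const std::vector<Tuple>& data) {
    int best = -1, best_n = 0;
    for (const auto& kv : class_counts(data))
        if (kv.second > best_n) { best = kv.first; best_n = kv.second; }
    return best;
}

// Partition(Data): a pure node is a leaf; otherwise Split tries every binary test,
// keeps the highest information gain, and Split_Partition recurses on both halves.
std::unique_ptr<Node> partition(const std::vector<Tuple>& data) {
    assert(!data.empty());
    auto node = std::make_unique<Node>();
    node->label = majority(data);
    const double base = entropy(data);
    if (base == 0.0) return node;  // all tuples share one class
    const double n = static_cast<double>(data.size());
    double best_gain = 1e-12;  // ignore splits that gain (almost) nothing
    std::vector<Tuple> best_in, best_out;
    for (std::size_t a = 0; a < data.front().attrs.size(); ++a) {
        std::set<int> values;
        for (const auto& t : data) values.insert(t.get_attribute(a));
        for (int v : values) {
            std::vector<Tuple> in, out;
            for (const auto& t : data) (t.get_attribute(a) == v ? in : out).push_back(t);
            if (in.empty() || out.empty()) continue;
            double gain = base - (in.size() / n) * entropy(in) - (out.size() / n) * entropy(out);
            if (gain > best_gain) {
                best_gain = gain; node->attr = a; node->value = v;
                best_in = std::move(in); best_out = std::move(out);
            }
        }
    }
    if (best_in.empty()) return node;  // no useful split: majority-class leaf
    node->leaf = false;
    node->left = partition(best_in);
    node->right = partition(best_out);
    return node;
}

// RemoveNode, read here as: collapse a test whose two leaf children share a class.
void prune(Node& node) {
    if (node.leaf) return;
    prune(*node.left);
    prune(*node.right);
    if (node.left->leaf && node.right->leaf && node.left->label == node.right->label) {
        node.label = node.left->label;
        node.leaf = true;
        node.left.reset(); node.right.reset();
    }
}

// Follow the tests from the root to a leaf and return that leaf's class.
int classify(const Node& root, const Tuple& t) {
    const Node* cur = &root;
    while (!cur->leaf)
        cur = (t.get_attribute(cur->attr) == cur->value) ? cur->left.get() : cur->right.get();
    return cur->label;
}
```

For example, training on the four tuples {0,0}, {0,1}, {1,0}, {1,1}, each labelled with its second attribute, yields a single test on that attribute, so classify on attributes {0, 1} returns class 1. Growing the full tree and pruning it in a separate pass follows Stage 1; the Stage 2 idea of pruning on the fly would instead move the prune check into partition.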

This page shows the source file http:^^www.cs.cornell.edu^info^people^vitrano^cs537projprop.html from the dataset "This data set contains WWW-pages collected from computer science departments of various universities". The file is written in HTML and is 133 lines long.
