http:^^www.cs.cornell.edu^info^people^vitrano^cs537projprop.html

MIME-Version: 1.0
Server: CERN/3.0
Date: Sunday, 01-Dec-96 19:31:05 GMT
Content-Type: text/html
Content-Length: 3351
Last-Modified: Monday, 21-Oct-96 21:38:43 GMT

<html>
<head><title>Data Mining</title></head>
<body BACKGROUND="stucco.jpg">
<h1 align=center>EMV - CS537 Project Proposal</h1>
<hr>
<br>
<center>
<h2>Classification in Data Mining</h2>
<br>
<h4>Eric Vitrano</h4>
</center>
<img align=center src="http://www.cs.cornell.edu/Info/People/vitrano/colorbar.gif">
<br>
<h3><u>Common Level</u></h3>
Class table
<ul>
methods:
<ul>
open(filename)<br>
close(filename)<br>
write tuple(char *) /* takes a char string holding the whole record in ASCII */<br>
read tuple(tuple number) /* reads the tuple and returns a tuple instance */<br>
get_scheme /* returns a char string that lists the scheme */<br>
set_scheme /* takes a char string and sets the scheme of the table to that scheme */<br>
</ul>
</ul>
Class tuple
<ul>
method:
<ul>
get_attribute(attribute number, location to copy the attribute value to)
</ul>
</ul>
<hr>
<h3><u>Data Mining Classifiers</u></h3>
<p><b>Stage 1</b><br>
Once the above groundwork is complete, I will implement a version of an elementary data mining classification algorithm. This algorithm will be based on the ID-3 decision tree model, with limited pruning. A summary of the algorithm in pseudocode form is as follows:</p>
<ul>
Tree Building<br>
<ul>
MakeTree (Training Data)<br>
{<br>
<ul>Partition (Training Data);</ul>
}<br>
Partition (S)<br>
{<br>
<ul>
If all (s in S) are in the same class, this branch of the tree is done.<br>
Else, for each attribute, find the best split (Split (S)), and partition S on it (Split_Partition (S)).<br>
Partition (each partition from above).
</ul>
}<br>
</ul>
Tree Pruning<br>
<ul>
RemoveNode (Node)<br>
{<br>
<ul>
For each (node in Node):<br>
<ul>If node has the same class value as its parent, remove node.</ul>
</ul>
}<br>
</ul>
Split Evaluation<br>
<ul>
Split (Data)<br>
{<br>
<ul>
For each attribute, calculate the goodness of splitting on that attribute.<br>
Return the highest goodness.
</ul>
}<br>
Split_Partition (Data)<br>
{<br>
<ul>Partition Data into two sets based on the goodness from Split.</ul>
}<br>
</ul>
</ul>
<br>
<p>The above algorithm will be implemented in Visual C++, with the intention of building a decision tree that classifies tuples into defined classes. The tree must be trained on a training set in which the classes of the tuples are known, and then tested on data to see whether the returned classes are correct. The results can then be used for directing queries on incoming data, as well as for classifying existing data.</p>
<p><b>Stage 2</b><br>
Once the above algorithm is implemented, I will implement a second algorithm. This next algorithm will either be related to SLIQ or be derived from observing the development and behavior of the general case. Possible areas of improvement include pruning on the fly, limiting the searches over the data and the amount of data that must be kept in memory, and presorting or partially classifying the data.<br><br></p>
<hr>
<h3><u>Time Estimates</u></h3>
<p>I expect progress to follow approximately this schedule:<br></p>
<h4><ul>
October 21 - Completion of groundwork steps.<br>
November 4 - Completion of the general algorithm.<br>
November 25 - Completion of Stage 2 algorithm.<br>
December 2 - Evaluation and further consideration of data mining classification.
</ul></h4>
<br><br>
<img align=center src="http://www.cs.cornell.edu/Info/People/vitrano/colorbar.gif">
<br>
<ul><li><h5><a href="http://www.cs.cornell.edu/Info/People/vitrano/vitrano.html">EMV Home Page</a></h5></li></ul>
<br>
</body>
</html>
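The proposal does not define how the "goodness" of an attribute is calculated in the Split step. As a minimal sketch only, the C++ below assumes information gain, the measure used by classic ID3, and assumes categorical attributes. The `Tuple` alias and the function names `entropy`, `goodness`, and `bestAttribute` are hypothetical and do not come from the proposal. The sketch only scores attributes; it does not build or prune the tree.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical representation: each tuple is a row of attribute values,
// and the last value in the row is the class label.
using Tuple = std::vector<std::string>;

// Shannon entropy (in bits) of the class labels in `rows`.
double entropy(const std::vector<Tuple>& rows) {
    if (rows.empty()) return 0.0;
    std::map<std::string, int> counts;
    for (const Tuple& row : rows) ++counts[row.back()];
    double h = 0.0;
    for (const auto& kv : counts) {
        double p = static_cast<double>(kv.second) / rows.size();
        h -= p * std::log2(p);
    }
    return h;
}

// "Goodness" of splitting on attribute `attr`, ASSUMED here to be
// information gain: parent entropy minus the weighted entropy of the
// groups formed by each distinct value of the attribute.
double goodness(const std::vector<Tuple>& rows, std::size_t attr) {
    std::map<std::string, std::vector<Tuple>> groups;
    for (const Tuple& row : rows) groups[row.at(attr)].push_back(row);
    double remainder = 0.0;
    for (const auto& kv : groups) {
        double weight = static_cast<double>(kv.second.size()) / rows.size();
        remainder += weight * entropy(kv.second);
    }
    return entropy(rows) - remainder;
}

// Rough counterpart of Split (Data): index of the attribute with the
// highest goodness. The last column (the class label) is skipped.
std::size_t bestAttribute(const std::vector<Tuple>& rows) {
    assert(!rows.empty() && rows.front().size() >= 2);
    std::size_t best = 0;
    double bestScore = -1.0;
    for (std::size_t a = 0; a + 1 < rows.front().size(); ++a) {
        double score = goodness(rows, a);
        if (score > bestScore) {
            bestScore = score;
            best = a;
        }
    }
    return best;
}
```

Given four tuples in which the first attribute fully determines the class and the second carries no information, `bestAttribute` returns index 0. A binary split, as Split_Partition describes, would need an extra step that groups attribute values into two sets; that step is left out here.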
