<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
<head>
<title>Project Notes</title>
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
<style type="text/css">
body {
	margin: 50px;
	font-family: Arial, Helvetica, sans-serif;
	font-size: 10pt;
	line-height: 25px;
}
.titletext {
	font-family: Arial, Helvetica, sans-serif;
	font-size: 18pt;
}
.section {
	font-family: Arial, Helvetica, sans-serif;
	font-size: 12pt;
	font-weight: bold;
}
</style>
</head>
<body>
<div align="center" class="titletext">
  <p>Weka K-Nearest-Neighbor Development</p>
</div>
<hr align="center" width="300" size="1">
<p><br>
  <br>
  <font class="section">Plan</font> <br>
  <br>
  The original plan was a vague &quot;implementing k-nearest-neighbor classifier
  in the weka environment.&quot; Since then the plan for this project has become
  more specific. The second plan was to create a k-nearest-neighbor
  classifier that can be used in Weka Explorer. However, complications explained
  later have made that goal impractical. Even so, I had intended to implement a
  GUI-based program that runs the k-nearest-neighbor classifier with plenty
  of options, and that can still be done. </p>
<p><br>
  <font class="section">Process</font> </p>
<p>I started by learning how to develop applications in Weka from the book Data
  Mining: Practical Machine Learning Tools and Techniques (Second Edition) by
  Ian H. Witten and Eibe Frank, which I borrowed. I later discovered that a tutorial
  PDF file included in the Weka distribution contained a slightly older version
  of the relevant chapters on Weka development, which I used after I returned
  the book.<br>
  <br>
  Using the book as a guide, I wrote a KNN class compliant with Weka specifications.
  Most of the class was written using Vim, which is a Vi-like editor.
  Later, I discovered that I could not compile the code against the Weka
  library on the FSU servers, because I lacked the storage space there for the weka.jar
  file. As a result, I wrote the rest of the class in NetBeans IDE 4.1 on my
  own computer, where it compiled successfully.<br></p>
<font class="section">Weka Software Development</font>
<p>  All classifiers in Weka extend weka.classifiers.Classifier. There are some important
  methods to override when developing a new classifier. The relevant methods used
  in my KNN implementation are as follows:</p>
<p><strong>buildClassifier()</strong></p>
<p>This method takes an Instances object (the training set), which is the Weka representation
  of a data set, and builds the classifier. The k-nearest-neighbor classifier, in its simplest
  form, does not need to be built before execution, so this method simply checks
  for invalid data and stores the Instances as member data to be used when the
  k-nearest-neighbor search is performed on test samples.</p>
<p><strong>classifyInstance()</strong></p>
<p>This method classifies a single instance (data point). This is where
  the bulk of the work takes place. The distance is calculated between the given
  instance and every instance in the training set (previously stored by buildClassifier()).
  A running list of nearest neighbors is kept in the process. A LinkedList is
  used for this because a new entry can be added easily. The list has a maximum
  size of k, so that only the k nearest neighbors are kept. A tricky part is later counting
  the number of occurrences of each class in a flexible way, unbounded by the number
  of classes, and taking into consideration non-integer class labels, which are
  often the case in Weka. This was achieved with a Hashtable, where a Double object
  (class) acted as the key and an Integer (count) as the value. The values in the
  Hashtable are incremented as the list of k nearest neighbors is read.
  The likely class of the test sample is whichever class has the highest count.</p>
<p>I later hit some snags that sank my attempt to make the KNN classifier
  work with Weka Explorer or other Weka GUIs. First of all, making the
  KNN classifier available in a Weka GUI involves going into the
  weka.jar file and modifying GenericPropertiesCreator.props to include the KNN's
  package. As it turns out, the tools I have lack the ability to rebuild a .jar
  file in such a way that it actually works afterwards. I could go find a better
  jarring tool that might give me better results, but I realized that it's
  not really worth it. Even if I could make the KNN executable
  by Weka Explorer, it would only be able to classify using a default k, because
  there is no provision that I can find within Weka Explorer that allows a user
  to specify a classifier-specific argument. And although one could specify such
  arguments when using Weka's Simple CLI, which is a command-line interface, that's
  not very impressive.</p>
<p>So instead of fixing something that would get me meager results, I decided
  to devote the rest of my time to making my own GUI-based program to run the
  KNN with. </p>
<p><font class="section">Homemade GUI</font> </p>
<p>&quot;Seemed like a good idea at the time&quot;</p>
<p>I proceeded to create a GUI. Not remembering how to do that, I figured out
  how to make NetBeans do it for me. Each button and field allows users to input
  parameters for classification.</p>
<p>The interface is simple: two tabs, one for loading datasets and the other for classifier
  settings, plus a text box for the results of the classifier and a status window
  at the bottom so the user knows how the program is doing.</p>
<p>I got the basic KNN classifier to work perfectly. The partial distance one,
  not so much, and I can't find the problem.</p>
<p>Here's a problem I found later. I can't actually execute the program outside
  of NetBeans.
  Usually I would have an executable jar file, but in this case the
  jar refuses to execute. It is quite a problem.</p>
<p><font class="section">Theoretical Improvements</font> </p>
<p>&quot;that'll be the day&quot;</p>
<p>Some things I wanted to do on this project but didn't get to, which would
  have really made this application better:</p>
<ul>
  <li>Working partial distance functionality.</li>
  <li>A search tree method, which is complicated because there are many parameters
    one could have, such as the number of branches per node (perhaps a different number
    for each level), how many levels the tree should have, or how many data points
    go in each leaf node.</li>
  <li>A multi-threaded k-nearest-neighbor, which could divide the classification of data
    points into subsets executing concurrently. Threads are too complicated for
    the time constraints I have.</li>
  <li>Listing the attributes to omit in the partial distance technique by
    their attribute names and making them selectable, as opposed to the text field currently
    in the program.</li>
  <li>...</li>
</ul>
</body>
</html>
