=====================================================================

                              ======
                              README
                              ======

                        Weka-Parallel 3.2.3
                            12 Sep 2002

                 Java Programs for Machine Learning

          Copyright (C) 2002 Sebastian Celis, David Musicant

          email: celiss@carleton.edu  dmusican@carleton.edu

          webpage: http://www.mathcs.carleton.edu/weka/

=====================================================================

Contents:
---------

1. About
2. Getting started
3. Source code
4. Memory Issues
5. Credits
6. Submission of code and bug reports
7. Copyright

----------------------------------------------------------------------

1. About:
---------

This version of Weka was created with the intention of being able to run
the cross-validation portion of any given classifier very quickly. This
speed increase is accomplished by simultaneously performing the necessary
calculations on many different machines.

----------------------------------------------------------------------

2. Getting started:
-------------------

=============
INITIAL SETUP
=============

First, install Weka-Parallel on every machine that is to take part in the
distributed calculations. Do this exactly the same way you would install
Weka. These installation instructions can be found in the file README
included with both Weka and Weka-Parallel.

A Weka-Parallel session is run on a single client machine and a number of
distributed servers. Each server runs software in the background that
listens for incoming requests placed by the client and then fulfills them.
For each computer that is to be used as a distributed server, launch the
software by entering the following line at a command line prompt on that
computer:

  java weka.core.DistributedServer <port number>

Every computer with the DistributedServer program running can act as a
server and will listen on a specified port for all incoming work requests.
Each request that the server receives is given its own thread, thus
allowing many computers to connect to this one server at the same time.
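A minimal sketch of the thread-per-request idea described above, assuming
a modern JDK (8 or later). This is not the actual weka.core.DistributedServer
source: the class ThreadPerRequestServer, its serve and handle methods, and
the one-line request/reply protocol are hypothetical, chosen only to show
how each accepted connection is handed to its own thread.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.PrintWriter;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketException;

// Hypothetical sketch: NOT the real weka.core.DistributedServer source.
// The one-line "request in, reply out" protocol is an assumption for illustration.
public class ThreadPerRequestServer {

    // Serves a single client connection; runs on its own thread.
    static void handle(Socket client) {
        try (Socket s = client;
             BufferedReader in = new BufferedReader(new InputStreamReader(s.getInputStream()));
             PrintWriter out = new PrintWriter(s.getOutputStream(), true)) {
            String request = in.readLine();
            // A real server would evaluate the requested cross-validation work here.
            out.println("done: " + request);
        } catch (IOException e) {
            System.err.println("connection failed: " + e.getMessage());
        }
    }

    // Accepts connections until the server socket is closed, one thread per request.
    static void serve(ServerSocket server) throws IOException {
        while (true) {
            Socket client;
            try {
                client = server.accept();
            } catch (SocketException closed) {
                return; // the server socket was closed: stop listening
            }
            // Log each connection to standard output (redirectable to a file).
            System.out.println("connection from " + client.getInetAddress());
            new Thread(() -> handle(client)).start();
        }
    }

    public static void main(String[] args) throws IOException {
        int port = Integer.parseInt(args[0]); // e.g. java ThreadPerRequestServer 5000
        try (ServerSocket server = new ServerSocket(port)) {
            serve(server);
        }
    }
}
```

Started as java ThreadPerRequestServer <port number>, it accepts connections
until it is killed. Because each connection runs on a new Thread, a slow
request does not block other clients, and on a multi-processor machine the
threads can run in parallel. The sketch creates an unbounded number of
threads for simplicity; a hardened server would probably cap them, for
example with a java.util.concurrent.ExecutorService.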
The server's threading also takes advantage of multi-processor machines.
If a server has two processors, tell Weka-Parallel to connect to that
computer more than once by entering its address twice in the
configuration file described below.

The server program can be launched manually on each computer or placed in
a startup script that runs when each computer boots. A log of all
connections processed by the server software is automatically printed to
the screen, but it can be redirected to a text file if the user so
chooses.

==================
CONFIGURATION FILE
==================

In order for Weka to know which computers to distribute work to, a config
file needs to be present at ~/.weka-parallel on UNIX systems and at
C:\WINDOWS\.weka-parallel on Windows systems. The first line of the config
file should look like:

  PORT=XXXX

where XXXX is the port number on which each instance of DistributedServer
is listening. The remainder of the config file should consist of the
addresses of each computer running an instance of DistributedServer, with
one address per line.

The configuration file can be created manually or through Weka-Parallel's
GUI, in the cross-validation options under the Classifiers pane.

========================
RUNNING WEKA IN PARALLEL
========================

When using a command line interface
-----------------------------------

To do cross-validation within Weka in parallel, simply add the -a option
to a standard Weka command line. For example:

  java weka.classifiers.j48.J48 -t weather.arff -a

When using the simple GUI
-------------------------

After selecting a dataset and classifier, first press the button next to
cross-validation to change the cross-validation settings. Then check the
box marked "Run in parallel" and click "OK". Finally, run the classifier,
and the cross-validation will be performed in parallel.

----------------------------------------------------------------------

3. Source code:
---------------

The source code for Weka-Parallel is in weka-src.jar. To expand it, use
the jar utility that is included in every Java distribution:

  jar xf weka-src.jar

----------------------------------------------------------------------

4. Memory Issues:
-----------------

Most Java virtual machines limit the maximum amount of memory available to
a Java program. Because of this, it is not difficult to exceed the
available memory when using a large dataset. On the client this is
typically not a serious problem, as Java will report an out-of-memory
error to the user. The user can then rerun the program with a larger
amount of memory.

The problem is bigger in Weka-Parallel when one of the remote servers runs
out of memory. The Java program running on the server will crash, with no
warning sent back to the client. The user must then manually restart the
DistributedServer program on the server computer.

Remote servers can also run out of memory if multiple copies of
Weka-Parallel connect to the same remote machine at the same time. Because
the server is threaded, it will serve both copies of Weka-Parallel
simultaneously and thus load a copy of each dataset. If the datasets
together exceed the available memory, DistributedServer will crash.

Thus, if you expect ever to use large datasets, or expect that multiple
people will try to use Weka-Parallel on the same remote machine at the
same time, start the remote machine's server program with flags that
increase the amount of memory Java allocates. For example, with Sun's JDK,

  java -mx250000000 -oss250000000 weka.core.DistributedServer <port>

will set the maximum Java heap size and stack size of the remote server to
250,000,000 bytes.

Occasionally, you might see the following message when running
Weka-Parallel:

  do_ypcall: clnt_call: RPC: Unable to send; errno = No buffer space available

This occurs when the client runs out of memory while sending data to each
of the servers.
While not guaranteed, the program should continue to run and output the
correct results. If you do see this error message, we recommend increasing
the amount of memory allocated to Weka-Parallel.

----------------------------------------------------------------------

5. Credits:
-----------

David Musicant  - Project advisor and coordinator
Sebastian Celis - weka.core.DistributedServer,
                  weka.classifiers.EvaluationClient,
                  as well as changes in:
                  weka.classifiers.Evaluation,
                  weka.gui.explorer.ClassifierPanel

Special thanks to the original Weka team, who created an amazing piece of
software.

----------------------------------------------------------------------

6. Submission of code and bug reports:
--------------------------------------

If you have written any code that benefits Weka's new parallelization
features, and you think it should be included in this distribution, send
the code to dmusican@carleton.edu.

If you find any bugs, send a fix to the above address. If that's too hard,
send a bug report instead.

----------------------------------------------------------------------

7. Copyright:
-------------

Weka and Weka-Parallel are distributed under the GNU General Public
License. Please read the file COPYING that is included with both Weka and
Weka-Parallel.

----------------------------------------------------------------------