--------------------------------
Different MPI implementations have different ways of starting up programs.
Here we will describe how to do it under MPICH, the implementation from
Argonne National Lab, using the ch_p4 device. First create a p4 procgroup
file containing the machines you want to use and the programs to run on them.
For example, if a Beowulf cluster consists of machines ppc00, ppc01,...ppc32,
and you want to run the monitor and 5 workers, you could create the following
procgroup file:

   ppc00 0
   ppc00 1 /home/jsmith/fastDNAml_1.2.2p/src/mpi_foreman
   ppc00 1 /home/jsmith/fastDNAml_1.2.2p/src/mpi_fastDNAml
   ppc06 1 /home/jsmith/fastDNAml_1.2.2p/src/mpi_worker
   ppc07 1 /home/jsmith/fastDNAml_1.2.2p/src/mpi_worker
   ppc08 1 /home/jsmith/fastDNAml_1.2.2p/src/mpi_worker
   ppc09 1 /home/jsmith/fastDNAml_1.2.2p/src/mpi_worker
   ppc10 1 /home/jsmith/fastDNAml_1.2.2p/src/mpi_worker

Procgroups are described in the MPICH documentation from Argonne. In this
example we are assuming you start mpi_dnaml_mon by hand on ppc00. Each line
says "run 1 instance of the given program on the specified machine". The
first line is there to indicate mpi_dnaml_mon was started by hand. Since
mpi_dnaml_mon, mpi_fastDNAml, and mpi_foreman accumulate very little user
time relative to the workers, it is usually possible to run them all on the
same node (in this case ppc00) without hurting performance. You can list the
programs in any order. For example, if you do not want to run mpi_dnaml_mon
and you want to run the mpi_foreman on ppc08, the procgroup file could be

   ppc00 0
   ppc00 1 /home/jsmith/fastDNAml_1.2.2p/src/mpi_worker
   ppc06 1 /home/jsmith/fastDNAml_1.2.2p/src/mpi_worker
   ppc07 1 /home/jsmith/fastDNAml_1.2.2p/src/mpi_worker
   ppc08 1 /home/jsmith/fastDNAml_1.2.2p/src/mpi_foreman
   ppc09 1 /home/jsmith/fastDNAml_1.2.2p/src/mpi_worker
   ppc10 1 /home/jsmith/fastDNAml_1.2.2p/src/mpi_worker

In this case, mpi_fastDNAml (started by hand) and one worker would run on
ppc00. Let's say this procgroup file is mpi56.pg. To run the program on the
sequence data file test56.phy, and assuming testdata is the current working
directory, you would enter the command

   mpirun -p4pg mpi56.pg mpi_fastDNAml -n mpi56 -s test56.phy

Here,

   infolevel     = 0 (because we are not running dnaml_mon)
   run_name      = mpi56
   sequence_file = test56.phy
   work_dir      = current working directory

and we have requested mpi_fastDNAml to ship the sequence data to all workers.
An example set of output files for a run similar to this is given in the
testdata directory. The command used for that run was

   mpirun -p4pg mpi56.pg mpi_dnaml_mon -d4 -nmpi56 -s test56.phy

The output files the run produced are

   mpi56.bst -- the best-tree file
   mpi56.cpt -- the last checkpoint file, in Newick format
   mpi56.log -- the logfile output by mpi_dnaml_mon
   mpi56.trf -- the final best tree, also in Newick format
   mpi56.mon -- stderr captured from dnaml_mon

PBS Batch Jobs of the MPI App on Beowulf Machines
--------------------------------------------------
Since fastDNAml consists of 4 different programs, it is a little tricky to run
it under PBS. The problem is that PBS creates a PBS_NODEFILE containing the
machines it has chosen, and you have to convert this into a p4 procgroup file
like the ones shown above. A perl script called mkp4pg is included in the
testdata directory for doing this. There is also an example PBS script,
mpi_test.pbs. Study that script, and you should be able to figure out how to
make it work on your system. MPICH/PBS is the only environment we have for
batch MPI programs on our Beowulf cluster at IU. If you have some other MPI
implementation and/or batch queueing system, please tell us how you got it to
work and we will include it in this documentation.
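To give a feel for what mkp4pg and mpi_test.pbs accomplish, here is a rough
sketch of a PBS script that builds a procgroup from $PBS_NODEFILE and launches
an unmonitored run. It is NOT the script or perl tool shipped in testdata,
just an illustration of the idea; the node count, install path, run name, and
sequence file below are placeholders you would change for your own site.

   #!/bin/sh
   #PBS -l nodes=6
   cd $PBS_O_WORKDIR
   BIN=$HOME/fastDNAml_1.2.2p/src        # adjust to your installation
   FIRST=`head -1 $PBS_NODEFILE`
   # mpirun starts mpi_fastDNAml "by hand" on the first node (the 0 line);
   # the foreman shares that node, and every allocated node gets one worker.
   echo "$FIRST 0"                  >  job.pg
   echo "$FIRST 1 $BIN/mpi_foreman" >> job.pg
   awk -v b=$BIN '{print $1, 1, b "/mpi_worker"}' $PBS_NODEFILE >> job.pg
   mpirun -p4pg job.pg $BIN/mpi_fastDNAml -n mpi56 -s test56.phy

The same layout as the hand-written procgroups above falls out of this: one
"started by hand" line for the first node, one line per program everywhere
else.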
LoadLeveler Batch Jobs of the MPI App on the IBM SP
----------------------------------------------------
MPI programs on the SP are run under control of POE (the Parallel Operating
Environment). There is an example LoadLeveler batch script, css0_ip_02x04.ll,
in the testdata directory which runs the MPI application on two Winterhawk2
nodes, using all 4 processors of each node. It uses internet protocol over
the SP switch. Since mpi_fastDNAml is an MPMD (Multi-Program, Multi-Data) type
of parallel program, you have to specify the "-pgmmodel mpmd" option to POE and
provide a file (in this case SP_poe_08.cmd) listing the programs to run. Once
you get past the hurdle of learning LoadLeveler and a little POE stuff, the
rest of the story is the same as already described.
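The SP_poe_08.cmd file shipped in testdata is the example to follow. Purely to
suggest the shape of such a command file, an 8-task unmonitored layout would
list one program per task, something like the following; the paths are
placeholders, the choice of which program goes on which line is illustrative,
and the master's command-line arguments (omitted here) are handled as in the
shipped script.

   /home/jsmith/fastDNAml_1.2.2p/src/mpi_fastDNAml
   /home/jsmith/fastDNAml_1.2.2p/src/mpi_foreman
   /home/jsmith/fastDNAml_1.2.2p/src/mpi_worker
   /home/jsmith/fastDNAml_1.2.2p/src/mpi_worker
   /home/jsmith/fastDNAml_1.2.2p/src/mpi_worker
   /home/jsmith/fastDNAml_1.2.2p/src/mpi_worker
   /home/jsmith/fastDNAml_1.2.2p/src/mpi_worker
   /home/jsmith/fastDNAml_1.2.2p/src/mpi_worker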
Running the PVM Application
--------------------------------
PVM is a more natural fit to the master/worker structure of this code than
MPI. MPI is good for tightly coupled parallel applications where relatively
large messages containing highly structured data are passed frequently among
processes. But fastDNAml_1.2.2p does relatively infrequent communication of
small to medium size messages, and hence has a very low
communication/computation ratio. A tree of n taxa can be converted to a
Newick string of around n*42 bytes, which is only 4200 bytes for even 100
taxa. A message of this size could be sent over a 10 Mbit/sec ethernet
network in less than 10 msec. If a worker process spends 30 seconds finding
the optimum branch lengths and evaluating likelihoods, and it takes another
10 msec to send the result back, the comm/comp ratio would be less than
6.7E-4, a very favorable number. This is why fastDNAml is a good parallel
application for networks of workstations. The communication/computation ratio
is still good even for wide area networks of workstations and/or
multiprocessors, as long as the network is not too congested. PVM was
designed to ease the construction and execution of parallel applications on
such heterogeneous clusters.

We will not describe the PVM system here. The PVM developers have set up a
good web site at http://www.epm.ornl.gov/pvm/pvm_home.html, which has, among
other information, an HTML version of their book, "PVM: Parallel Virtual
Machine: A Users' Guide and Tutorial for Networked Parallel Computing". The
book is also available from MIT Press.

Assuming you have a basic understanding of how PVM works, here is how to run
pvm_fastDNAml. Start the PVM console, specifying a hostfile if you wish, or
adding the hosts after starting it. Then either "quit" the console or go to
another window and start up pvm_dnaml_mon (if you want to monitor the run) or
pvm_fastDNAml (if you do not). The command line arguments are almost the same
as for the MPI version. The one difference is that you specify the number of
workers to spawn with a -p option, instead of specifying the total number of
processes to run as in a p4 procgroup. For instance, to do an unmonitored run
with 8 workers on the hosts in hostfile, first start up the virtual machine by
typing

   pvm hostfile

then either "quit" PVM or go to another window and type

   $HOME/pvm3/bin/$PVM_ARCH/pvm_fastDNAml -p8 -ntest56 \
       -w$HOME/fastDNAml_1.2.2/testdata -s test56.phy

If you do want to monitor the run, say at infolevel 3, then type

   $HOME/pvm3/bin/$PVM_ARCH/pvm_dnaml_mon -d3 -p8 -ntest56 \
       -w$HOME/fastDNAml_1.2.2p/testdata -s test56.phy

You must make sure the executables are where PVM can find them. You can
specify this with "ep" lines in the hostfile, or use the default location,
$HOME/pvm3/bin/$PVM_ARCH. Also, we have used the -s option here to make the
master ship the input file test56.phy to all the workers. pvm_dnaml_mon will
spawn the master. If the master starts up successfully and can read the input
file, it will spawn the foreman, telling it to spawn the 8 workers. This is a
different startup method from the MPI version. Since MPI does not have any
dynamic process management functions, all of the MPI processes start
simultaneously (at least from the programmer's and user's point of view) and
they rendezvous to see who is alive and how to reach each other.

If the program crashes, look in /tmp/pvml.<uid> on each machine. A number of
errors will be logged there. The most common are that an executable or the
input data file was not found.

PBS Batch Jobs of the PVM App
--------------------------------------------------
As with the MPI application, running pvm_fastDNAml as a PBS batch job is a
little tricky. Some batch queuing systems, such as IBM's LoadLeveler, are able
to set up and manage a parallel virtual machine for batch runs, much as the
PVM console does for interactive runs. PBS does not have this capability.
Rather, you have to start up and shut down the virtual machine yourself from
within your batch script. PBS publishes the nodes it allocates to your job in
a file named by the environment variable PBS_NODEFILE. In your PBS script you
then feed this file to the PVM master daemon, pvmd3, which will start up a PVM
daemon on each node. Then you run the program the same way as described above.
After your run is finished, you must shut down the parallel virtual machine.
The easiest way is to send a SIGTERM to the master pvmd3. Before it
terminates, it sends the signal to all the other pvmd3s, thus shutting down
the whole virtual machine. There is an example PBS script called pvm_test.pbs
in the testdata directory. We used this on our Beowulf cluster at Indiana
University.
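The pvm_test.pbs script in testdata is the working example. As a rough sketch
of the approach only (with a hypothetical node count and sleep interval, and
assuming pvmd3 is on your PATH and PVM_ROOT/PVM_ARCH are set in the batch
environment), a script along these lines starts the virtual machine, runs the
program, and tears the machine down:

   #!/bin/sh
   #PBS -l nodes=8
   cd $PBS_O_WORKDIR
   # Hand the PBS node list to the master pvmd as a hostfile.  (If PBS lists
   # a node once per CPU, you may want to filter it through "sort -u" first.)
   pvmd3 $PBS_NODEFILE &
   PVMD_PID=$!
   sleep 10                     # give the slave pvmds time to start
   $HOME/pvm3/bin/$PVM_ARCH/pvm_fastDNAml -p8 -ntest56 \
       -w$HOME/fastDNAml_1.2.2p/testdata -s test56.phy
   # Shut down the whole virtual machine: the master pvmd3 passes the
   # SIGTERM on to the other pvmd3s before it exits.
   kill -TERM $PVMD_PID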
How to use the cmon monitor
--------------------------------------------------
The section "Program arguments, options, input and output" described how to
start up cmon. Here I will explain the (very few) commands you can give it
interactively.

   add <host>   -- Add a new host to the virtual machine. This is exactly
                   like the "add" command in the PVM console.

   spawn <host> -- Spawn a worker on the specified host.

   kill <tid>   -- cmon maintains a list of workers on the curses screen,
                   listing their PVM task IDs, the hosts they are running on,
                   and their CPU usage and wallclock runtimes. To kill a
                   process, use the "kill" command and specify the task ID.
                   The process will not be killed until it returns the tree
                   it is currently working on, so no trees are lost by killing
                   processes. You can even kill all the workers. The master
                   and foreman will just wait until you spawn at least one new
                   worker, and then calculations will resume.

   quit         -- Quit the monitor. BUT, do not quit before the program is
                   finished.

In the future, I hope to make cmon attachable to and detachable from a running
PVM application, but that is harder than it sounds. So for now, once you start
cmon, pvm_fastDNAml, pvm_foreman, and the pvm_workers, leave cmon running
until the whole program is finished.

Well, that's it. cmon is not a finished product, but rather a proof of concept
for dynamic process management of parallel fastDNAml. It offers enough
features that users can play with it and perhaps suggest new features that
would be useful. For one, it would be nice to incorporate all the features of
the standard PVM console into cmon, so you would not have to start up the
console AND cmon. You could just start up cmon, "add" a bunch of machines to
the virtual machine, start up fastDNAml, dynamically adjust the infolevel (the
-d option), and so on. These, and probably many other things, are all
feasible, but it's a lot of work, and I just wanted to get something running
to play with.

Known Problems
--------------------------------------------------
 1. In cmon, you can't backspace if you make a typo while entering a command.
    The solution is to just enter the botched command, let cmon tell you it
    doesn't understand it, and reenter it correctly.

 2. In the MPI version, if you make a mistake in specifying the sequence data
    file, mpi_master aborts the whole application. This is what I designed it
    to do, but in our MPICH implementation of MPI on our Beowulf cluster, not
    all the MPI processes get killed. So if this happens, you should check
    each of the machines and kill any remaining processes (a sketch of one way
    to sweep the nodes is given after this list).

 3. In spite of all my efforts to provide useful error messages, sometimes
    they just don't come out right. Gary Olsen was very careful to cover every
    possible error that could happen in his serial code, and all his error
    handling is still intact in the parallel version. But mine, for handling
    errors about the working directory path, sequence file name, command line
    errors, and others related to the parallel code, is still a bit clunky.

 4. The foreman has to decide when a worker has gone AWOL. Currently, this is
    done by setting a timeout clock every time a worker is sent a tree to work
    on. If the worker does not send back a result within the timeout period,
    the foreman marks it AWOL and gives its tree to another worker. The
    timeout period is hard-coded to 120 seconds. If a tree actually takes
    longer than that to evaluate, the foreman will go into an infinite loop,
    trying over and over again to get some worker to do the work in 120
    seconds. If you think this is happening, edit foreman.c, search for "120",
    change it to some larger time, and recompile.

 5. If you are running on a massively parallel machine which has only one /tmp
    directory shared by all the nodes, do not use the -s option. The way a
    worker deals with a shipped sequence data file is to write the message out
    to /tmp and then open that file and read it normally. If a bunch of
    workers are all writing large files to /tmp, that could be a problem.
    (The C library function tmpfile is used to create and open the file, so
    the file is automatically deleted when closed.)
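As mentioned in Known Problem 2, a botched MPI run can leave processes behind
on some nodes. One way to sweep the nodes is sketched below, using the host
names from the procgroup examples above and assuming rsh is available
(substitute ssh if that is what your cluster uses):

   for h in ppc00 ppc06 ppc07 ppc08 ppc09 ppc10
   do
       echo "=== $h ==="
       rsh $h 'ps x | grep mpi_ | grep -v grep'
   done

Any leftover mpi_fastDNAml, mpi_foreman, or mpi_worker processes the loop
turns up can then be killed by hand with "rsh <host> kill <pid>".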