                          fastDNAml_1.2.2p

                             Don Berry
             University Information Technology Services
               Indiana University, Bloomington, IN
                       dkberry@indiana.edu

-----------------------------------------------------------------------------

FastDNAml_1.2.2p is a parallel version of Gary Olsen's fastDNAml_1.2.2. It
consists of a master program, which constructs the trees to be evaluated;
a set of worker programs, which do the actual evaluation and branch length
optimization; and a "foreman" program, which manages all the workers. This
structure is based on the P4 parallel code, fastDNAml_1.0.6, that Olsen,
Hideo Matsuda, Ray Hagstrom and Ross Overbeek released in 1992. In their
code they used the Argonne P4 message passing library to pass messages among
all the processes. In fastDNAml_1.2.2p, this library has been replaced by a
set of special fastDNAml communication functions, which in turn invoke
functions from either the PVM or MPI message passing libraries. This isolates
the main code from any specific message passing library, so that ports to new
libraries will be easier. The fastDNAml communication functions for MPI are
contained in the file comm_mpi.c, and those for PVM are in comm_pvm.c. There
is also a comm_seq.c, which contains essentially stubs, so that one can even
compile a sequential code. This sequential version could then replace Gary's
original v1.2.2 code. P4 has rather fallen out of favor for parallel
programming (although it provides the basic "abstract device interface" for
the Argonne MPICH implementation of MPI), but if a P4 version is ever needed,
a comm_p4.c file can easily be constructed without touching the rest of the
code.
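To make the layering concrete, here is a minimal sketch of one such
communication function as it might be split across the comm_*.c files. The
function name comm_send_bytes and its signature are invented for this
illustration; only the MPI and PVM library calls themselves are real:

   /* As it might appear in comm_mpi.c (MPI-1); dest is an MPI rank. */
   #include <mpi.h>

   void comm_send_bytes(void *buf, int len, int dest, int tag)
   {
       MPI_Send(buf, len, MPI_BYTE, dest, tag, MPI_COMM_WORLD);
   }

   /* The same function as it might appear in comm_pvm.c (PVM 3);
      here dest is a PVM task id. */
   #include <pvm3.h>

   void comm_send_bytes(void *buf, int len, int dest, int tag)
   {
       pvm_initsend(PvmDataDefault);      /* start a new message buffer */
       pvm_pkbyte((char *)buf, len, 1);   /* pack the raw bytes         */
       pvm_send(dest, tag);               /* ship them to the task      */
   }

   /* In comm_seq.c the same function would be an empty stub, so the
      main code can be linked into a purely sequential program. */

Since the main code calls only functions like this one, switching message
passing libraries means swapping in a different comm_*.c file at link time.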
Another major upgrade to the P4 code is the installation of some fault
tolerance for running on networks of computers, where nodes may drop out or
become unreachable. In fact, the PVM code has been run on a world-wide
network consisting of several nodes of an IBM SP at Indiana University,
several more SP nodes at the National University of Singapore, and several
DEC Alpha machines at the Australian National University in Canberra. It
recovered and continued the calculation just fine when various worker
processes were killed. The MPI version is based on the MPI-1 standard, which
assumes a static set of processes. MPI was originally targeted at massively
parallel machines, where one could assume that either all processes would
stay up for the duration of the run or they would all go down together.
Consequently, MPI assumes tighter coupling among processes. In some
implementations, if one process dies the whole application may abort,
although it will probably keep running if a process just becomes temporarily
unreachable. So the fault tolerance built into fastDNAml_1.2.2p is partially
defeated by some finicky MPI implementations.

In contrast, an advantage of PVM is that it was designed to be used in
dynamic environments such as local area networks of workstations. PVM
includes functions for dynamically adding and deleting hosts to/from the
virtual machine, and for starting and stopping processes. To make use of
these features, fastDNAml_1.2.2p includes a special user interface program
called "cmon". With it, a user can add hosts and spawn and kill worker
processes. (Cmon currently has just the most basic functions. As we see
other functions that might be useful, they will be added to future
releases.) Cmon is discussed more at the end of this document.

The programs comprising the parallel application
------------------------------------------------------

Before using the parallel program, you should read Gary's documentation in
file fastDNAml_doc_1.2.txt. It explains his original v1.2.2 serial code.
This file gives an overview of how the parallel program works, how to
compile it, and how to run it. It also lists some differences regarding
input and output. The serial application is called "fastDNAml", just as in
Gary's code, while the master processes of the MPI and PVM applications are
"mpi_fastDNAml" and "pvm_fastDNAml", respectively.

The MPI and PVM applications actually consist of four programs:

   mpi_fastDNAml
   pvm_fastDNAml

      This is the "top-level" program of the four; the user runs either
      mpi_fastDNAml or pvm_fastDNAml to start up the parallel application,
      just as one would run fastDNAml to start up the serial application.
      But in the parallel case, this program just figures out which trees
      to evaluate and sends them off to the foreman for eventual
      evaluation. Because of this, *_fastDNAml is often called the
      "master" program. (In fact the source code is master.c.)

   mpi_foreman
   pvm_foreman

      Receives trees from *_fastDNAml and controls their evaluation by a
      set of worker processes. This process combines the functions of the
      dispatch and merge programs in the old Argonne P4 version of
      fastDNAml, and includes some fault recovery features to deal with
      workers which may die or become unreachable. This is necessary for
      running on fault-prone clusters of workstations.

   mpi_worker
   pvm_worker

      The program that does all the work of optimizing branch lengths and
      evaluating tree likelihoods. You would run several instances of this
      program. (A sketch of its message loop appears after this list.)

   mpi_dnaml_mon
   pvm_dnaml_mon

      Monitors the parallel program and prints progress reports, or
      "telemetry", to stderr. The user can specify several levels of
      telemetry. Under some implementations of MPI and PVM, this process
      may require its own processor. If you do not care about getting
      telemetry during the run, you can omit the monitor and recover use
      of its processor for an extra worker. The run can still be monitored
      by looking at the .bst file, which is output by the master process.
      If you *do* want to run the monitor, you must name it on the command
      line rather than mpi_fastDNAml or pvm_fastDNAml. Also, *_dnaml_mon
      is used to monitor only the parallel application. The serial code
      monitors itself if you specify the -d option as described below.

   cmon

      Mpi_dnaml_mon and pvm_dnaml_mon print out all the telemetry on a
      scrolling screen, as though it were a stream of printer paper. The
      cmon monitor uses the curses library to update a single screen with
      the telemetry as the run progresses. (cmon is my unimaginative
      acronym for "curses monitor".) This monitor is only available for
      the PVM version. In addition to providing a more useful display of
      the telemetry, cmon also allows the user to interactively add new
      hosts to the virtual machine and to start and stop worker processes.
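As promised under mpi_worker/pvm_worker above, here is a minimal sketch of
the worker's receive-evaluate-return loop, written against plain MPI-1. The
message tags, the buffer size, and evaluate_tree() are invented placeholders
for this sketch, not the actual worker.c code:

   #include <mpi.h>

   #define TAG_TREE 1            /* hypothetical tag: tree to evaluate */
   #define TAG_DONE 2            /* hypothetical tag: shut down        */

   /* Stand-in for the real branch length optimization and likelihood
      evaluation. */
   static double evaluate_tree(const char *tree)
   {
       (void)tree;
       return -1234.5;           /* dummy log likelihood */
   }

   static void worker_loop(int foreman_rank)
   {
       char tree[4096];          /* serialized tree description */
       double likelihood;
       MPI_Status status;

       for (;;) {
           /* Block until the foreman sends the next tree, or a stop
              message. */
           MPI_Recv(tree, sizeof tree, MPI_CHAR, foreman_rank,
                    MPI_ANY_TAG, MPI_COMM_WORLD, &status);
           if (status.MPI_TAG == TAG_DONE)
               break;

           likelihood = evaluate_tree(tree);

           /* Return the result so the foreman can dispatch another
              tree to this worker. */
           MPI_Send(&likelihood, 1, MPI_DOUBLE, foreman_rank,
                    TAG_TREE, MPI_COMM_WORLD);
       }
   }

   int main(int argc, char **argv)
   {
       MPI_Init(&argc, &argv);
       worker_loop(0);           /* assume the foreman is rank 0 */
       MPI_Finalize();
       return 0;
   }

Because every tree evaluation is independent, the foreman is free to resend
a tree when a worker dies or becomes unreachable, which is what makes the
fault recovery described above possible.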
Compiling
--------------------------------

There is a makefile for the IBM SP and one for LINUX clusters. To make the
serial version for a LINUX Beowulf cluster, just type

   make -f Makefile.LINUX serial

To make the MPI and PVM parallel versions, you will first have to set
several MPI and PVM macros at the beginning of the makefile to match the
locations of MPI and PVM on your machine. Then make the MPI or PVM versions
by typing either

   make -f Makefile.LINUX mpi

or

   make -f Makefile.LINUX pvm

You can also make each program separately, for example,

   make -f Makefile.LINUX mpi_fastDNAml

You have to make cmon separately:

   make -f Makefile.LINUX cmon
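For illustration, the macro settings at the top of Makefile.LINUX might look
something like the lines below. The macro names and paths here are made up;
use the names actually present in your copy of the makefile (PVM_ROOT and
PVM_ARCH are the conventional PVM variable names):

   # where your MPI library and headers are installed
   MPI_DIR  = /usr/local/mpich

   # root of your PVM 3 installation, and PVM's name for your architecture
   PVM_ROOT = /usr/share/pvm3
   PVM_ARCH = LINUX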
Program arguments, options, input and output
---------------------------------------------

The input sequence data files for fastDNAml_1.2.2p are the same as Gary
describes for his fastDNAml_1.2.2 (see fastDNAml_doc_1.2.txt). But
fastDNAml_1.2.2p does not read them from stdin. Rather, you give the file
name on the command line and it reads the file directly. This may be an
inconvenience if you have been taking advantage of shell scripts to massage
input files and feed them to fastDNAml via its stdin, but that method ran
into some complications in the parallel code. I hope to iron these problems
out in a future release. Besides the required input data file name, there
are four optional command line arguments, and the PVM version requires
specification of the number of workers desired. Suppose you want to run the
PVM version, and you want to monitor it. The command line would take the
following form:

   pvm_dnaml_mon [-d infolevel] -p #_workers [-n run_name] [-w work_dir] \
        [-s] sequence_file

or

   cmon [-d infolevel] -p #_workers [-n run_name] [-w work_dir] [-s] \
        sequence_file

The square brackets indicate optional arguments. Meanings are as follows:

   -d infolevel  This causes *_dnaml_mon to output telemetry about the run
                 to its stderr. There are levels 0-4, each of which
                 includes the telemetry of the levels below it:

                   0 = (default) Currently, this does the same as level 1.
                   1 = measure and print out at the end of the run the
                       wallclock time, user CPU time, and system CPU time
                       for each process or thread
                   2 = print a "step time" after every fifth taxon has
                       been added to the tree
                   3 = print the number of every taxon added to the tree
                   4 = print a "+" sign for every tree the master sends
                       to the foreman for evaluation, and a "-" sign for
                       every evaluated tree the foreman receives back from
                       a worker

                 These levels are also pertinent to the cmon monitor, but
                 it updates tree count fields and CPU and wallclock time
                 fields on the terminal screen for each of the workers,
                 rather than printing + and - signs. To get the most
                 telemetry when using cmon, use -d4.

   -p #_workers  The number of workers you want. This is only for the PVM
                 program. For the MPI program, you specify the programs to
                 be run in some sort of command or process group file (see
                 "Running the MPI Application" below). See the
                 documentation for the MPI installed on your machine.

   -n run_name   A name for the output files. If you omit this, it defaults
                 to the name of the sequence data file. FastDNAml_1.2.2p
                 writes four files, each named with the run name and a
                 3-character extension:

      run_name.log  -  Written by dnaml_mon. A line is written to this file
                       at each step time, the same information the monitor
                       writes to its stderr for infolevel 2. This file is
                       written independently of the infolevel option.

      run_name.cpt  -  A checkpoint file, written by the master after each
                       taxon is added (and the best tree found) and after
                       each round of branch swapping. If the run is
                       interrupted, this file can be attached to the end of
                       the sequence data file and the run can be restarted
                       from that point using the R option.

      run_name.bst  -  The "best tree" file, output by the master as the
                       run progresses. This file contains details about the
                       order of addition of the taxa, how many trees were
                       tested, what the best likelihoods were, and an ASCII
                       drawing of the final tree with branch lengths and
                       confidence intervals.

      run_name.trf  -  The final best tree.

   -w work_dir   This option allows you to set the working directory at run
                 time, a feature which can be useful in certain
                 circumstances. It is passed to the monitor, master,
                 foreman, and all workers. The workers need a working
                 directory only if they are going to read the sequence data
                 file themselves (see the -s option). This is the only file
                 I/O they do. Work_dir can be either a relative or absolute
                 path. The default is the current working directory.

   -s            Instructs only the master to read the sequence data file
                 and ship it to the workers in a message. Without this
                 option, each worker must be able to read the sequence data
                 file itself. This could be a problem on some systems if
                 you want to run hundreds of workers, or on heterogeneous
                 clusters where the machines do not have a common file
                 system. To avoid the headache of staging the data on every
                 file system, you can select the -s option. On the other
                 hand, if the sequence data file is large, and you are
                 running the program on a slow or wide-area network, it
                 could take some time for the file to be shipped out to all
                 the workers. If you want to do a lot of runs with the same
                 sequence data, the effort of copying the file by hand to
                 every computer might be amortized over the many runs.

Running the MPI Application
---------------------------
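As a hedged illustration of the process group file mentioned under the -p
option (the exact file format and launch procedure depend on the MPI
installed on your machine, so consult its documentation; the hostnames and
paths below are made up), such a file for MPICH's ch_p4 device might look
like:

   local 0
   node1.cluster.edu  1  /home/me/fastDNAml/mpi_foreman
   node2.cluster.edu  2  /home/me/fastDNAml/mpi_worker
   node3.cluster.edu  2  /home/me/fastDNAml/mpi_worker

The first line is the process you start yourself (the master, or the
monitor if you want telemetry), and each remaining line names a host, the
number of processes to start there, and the full path of the executable to
run on it.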
