README for MPD 1.0.0 - May, 2005

General
-------

MPD is a process management system for starting parallel jobs,
especially MPICH jobs.  Before running a job (with mpiexec), the
mpd daemons must be running on each host and connected into a ring.
This README explains how to do that, and also how to test and manage
the daemons after they have been started.

You need to have Python version 2.2 or later installed to run the mpd.
You can type

    which python

to make sure you have it installed, and

    python

to find out what your version is.  The current version of Python can be
obtained from www.python.org.

Type

    mpdhelp

for a list of mpd-related commands.  Each command can be run with the
--help argument for usage information.

How to use MPD
--------------

(Another version of these startup instructions can be found in the
mpich2/README at the top level of the MPICH2 distribution.)

You can start one mpd on the current host by running

    mpd &

This starts a ring of one mpd.  Other mpd's join the ring by being run
with host and port arguments for the first mpd (a sketch of doing this
by hand appears at the end of this section).  You can automate this
process by using mpdboot.

Make a file with machine names in it.  This file may or may not include
the local machine.  It will be handy to use the default, which is
./mpd.hosts:

    donner% cat ./mpd.hosts
    donner.mcs.anl.gov
    foo.mcs.anl.gov
    shakey.mcs.anl.gov
    terra.mcs.anl.gov
    donner%

After mpich is built, the mpd commands are in mpich2/bin, or the bin
subdirectory of the install directory if you have done an install.
You should put this (bin) directory in your PATH in your .cshrc or
.bashrc so that it will be picked up by the mpd's that are started
remotely:

    Put in .cshrc:   setenv PATH /home/you/mpich2/bin:$PATH
    Put in .bashrc:  export PATH=/home/you/mpich2/bin:$PATH

To start some mpds, use mpdboot.  It uses the mpd.hosts file:

    donner% mpdboot -n 4
    donner%

This command starts a total of 4 daemons, one on the local machine and
the rest on machines in the mpd.hosts file.  You can specify another
file (-f) or another mpd command (-m).  The mpdboot command uses ssh to
start the mpd on each machine in the mpd.hosts file.

You can use mpdtrace to see where your mpd's are running:

    donner% mpdtrace
    donner
    foo
    shakey
    donner%

You can run something with mpiexec:

    donner% mpiexec -np 2 hostname
    donner.mcs.anl.gov
    foo.mcs.anl.gov
    donner%

You can run an mpich2 job:

    donner% mpiexec -np 10 /home/lusk/hellow
    Hello world from process 0 of 10
    Hello world from process 1 of 10
    Hello world from process 2 of 10
    Hello world from process 3 of 10
    Hello world from process 4 of 10
    Hello world from process 5 of 10
    Hello world from process 6 of 10
    Hello world from process 7 of 10
    Hello world from process 9 of 10
    Hello world from process 8 of 10
    donner%

You can take down the daemons:

    donner% mpdallexit
    donner%

If things go bad and daemons seem to be in a bad state, you can remove
the Unix sockets on all the machines in mpd.hosts by doing a cleanup:

    donner% mpdcleanup
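
If you prefer not to use mpdboot, you can grow the ring by hand with
the host and port arguments mentioned above.  The following is only a
sketch: the -h and -p arguments name the host and listening port of an
mpd that is already in the ring (mpdtrace -l reports the host and port
of each mpd), <port> is a placeholder for whatever port number it
shows, and the hostnames are the illustrative ones used elsewhere in
this README.  Check mpd --help for the exact option spelling in your
installation.

    donner% mpd &
    donner% mpdtrace -l        (note the port of the mpd on donner)
    foo%    mpd -h donner.mcs.anl.gov -p <port> &

After this, mpdtrace run on either machine should list both donner
and foo.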

Parallel Debugging
------------------

There are at least 3 methods by which you may do parallel debugging
via mpd.

1. MPD provides support for the totalview debugger via the -tv option
   to mpiexec.  Of course you must have a licensed copy of totalview
   to use this option.  Also, you must have configured with
   --enable-totalview.  (A sketch of such a launch appears after the
   gdb transcript below.)

2. You can also start GUI debuggers such as ddd on each process that
   you execute, e.g.:

       mpiexec -n 2 ddd mypgm

   This option is really only useful for small numbers of processes,
   since you will create a window per process.

3. This release also contains some specific support for using gdb to
   debug parallel programs.  It is an interactive option of mpiexec
   that starts each application process under the control of gdb and
   also (initially) broadcasts each gdb command to all processes,
   merging identical output from multiple processes into a single,
   labelled line.  It also adds one extra command, the "z" command,
   which can be used to select a process or range of processes for
   input to be directed to.  With no arguments it reverts to
   broadcasting subsequent commands to all processes.

This capability can be seen in the following transcript of a session
with mpiexec debugging 8 processes.  After the breakpoint at line 36
and two single steps, process 0 is selected ("z 0") and stepped once
separately, to synchronize it with the other processes just before the
MPI_Bcast.  Then all processes are selected again ("z") and the next
"n" steps them all through the broadcast together.

Example of running mpiexec with -gdb:

    (magpie:52) % ./mpiexec.py -gdb -n 8 ~/mpich2/examples/cpi
    0-7: (gdb) l 36
    0-7: 31
    0-7: 32          fprintf(stdout,"Process %d of %d is on %s\n",
    0-7: 33                  myid, numprocs, processor_name);
    0-7: 34          fflush(stdout);
    0-7: 35
    0-7: 36          n = 10000;              /* default # of rectangles */
    0-7: 37          if (myid == 0)
    0-7: 38              startwtime = MPI_Wtime();
    0-7: 39
    0-7: 40          MPI_Bcast(&n, 1, MPI_INT, 0, MPI_COMM_WORLD);
    0-7: (gdb) b 36
    0-7: Breakpoint 2 at 0x80493ac: file cpi.c, line 36.
    0-7: (gdb) r
    0-7: Continuing.
    0: Process 0 of 8 is on magpie
    1: Process 1 of 8 is on magpie
    2: Process 2 of 8 is on magpie
    3: Process 3 of 8 is on magpie
    4: Process 4 of 8 is on magpie
    5: Process 5 of 8 is on magpie
    6: Process 6 of 8 is on magpie
    7: Process 7 of 8 is on magpie
    0-7: 
    1: Breakpoint 2, main (argc=1, argv=0xbfffe914) at cpi.c:36
    2: Breakpoint 2, main (argc=1, argv=0xbfffe894) at cpi.c:36
    3: Breakpoint 2, main (argc=1, argv=0xbfffe814) at cpi.c:36
    4: Breakpoint 2, main (argc=1, argv=0xbfffe794) at cpi.c:36
    5: Breakpoint 2, main (argc=1, argv=0xbfffe714) at cpi.c:36
    6: Breakpoint 2, main (argc=1, argv=0xbfffe694) at cpi.c:36
    7: Breakpoint 2, main (argc=1, argv=0xbfffe614) at cpi.c:36
    0: Breakpoint 2, main (argc=1, argv=0xbfffe994) at cpi.c:36
    0-7: 36          n = 10000;              /* default # of rectangles */
    0-7: (gdb) n
    0-7: 37          if (myid == 0)
    0-7: (gdb) n
    0: 38              startwtime = MPI_Wtime();
    1-7: 40          MPI_Bcast(&n, 1, MPI_INT, 0, MPI_COMM_WORLD);
    0-7: (gdb) z 0
    0: (gdb) n
    0: 40          MPI_Bcast(&n, 1, MPI_INT, 0, MPI_COMM_WORLD);
    0: (gdb) z
    0-7: (gdb) n
    0-7: 42          h   = 1.0 / (double) n;
    0-7: (gdb) n
    0-7: 43          sum = 0.0;
    0-7: (gdb) n
    0-7: 45          for (i = myid + 1; i <= n; i += numprocs)
    0-7: (gdb) n
    0-7: 47              x = h * ((double)i - 0.5);
    0-7: (gdb) n
    0-7: 48              sum += f(x);
    0-7: (gdb) n
    0-7: 45          for (i = myid + 1; i <= n; i += numprocs)
    0-7: (gdb) n
    0-7: 47              x = h * ((double)i - 0.5);
    0-7: (gdb) n
    0-7: 48              sum += f(x);
    0-7: (gdb) n
    0-7: 45          for (i = myid + 1; i <= n; i += numprocs)
    0-7: (gdb) n
    0-7: 47              x = h * ((double)i - 0.5);
    0-7: (gdb) n
    0-7: 48              sum += f(x);
    0-7: (gdb) n
    0-7: 45          for (i = myid + 1; i <= n; i += numprocs)
    0-7: (gdb) n
    0-7: 47              x = h * ((double)i - 0.5);
    0-7: (gdb) n
    0-7: 48              sum += f(x);
    0-7: (gdb) p sum
    0: $1 = 11.999986210031736
    1: $1 = 11.999984050040775
    2: $1 = 11.999981650051732
    3: $1 = 11.999979010064893
    4: $1 = 11.999976130080572
    5: $1 = 11.999973010099122
    6: $1 = 11.999969650120912
    7: $1 = 11.999966050146345
    0-7: (gdb) c
    0-7: Continuing.
    0: pi is approximately 3.1415926544231247, Error is 0.0000000008333316
    1-7: 
    1-7: Program exited normally.
    1-7: (gdb) 
    0: wall clock time = 50.726732
    0: 
    0: Program exited normally.
    0: (gdb) q
    0-7: MPIGDB ENDING
    (magpie:53) %
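
For comparison with the gdb session above, a totalview launch (method 1
above) might look like the following.  This is only a sketch: it
assumes an MPICH2 build configured with --enable-totalview, a licensed
copy of totalview available in your PATH, and the cpi example program;
the prompt, process count, and path are illustrative.

    (magpie:54) % mpiexec -tv -n 4 ~/mpich2/examples/cpi

This should bring up the totalview GUI attached to the parallel job,
after which the processes are controlled from totalview's windows
rather than from the terminal.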