  std::srand(time(0) + world.rank());
  int my_number = std::rand();

  if (world.rank() == 0) {
    std::vector<int> all_numbers;
    gather(world, my_number, all_numbers, 0);
    for (int proc = 0; proc < world.size(); ++proc)
      std::cout << "Process #" << proc << " thought of "
                << all_numbers[proc] << std::endl;
  } else {
    gather(world, my_number, 0);
  }

  return 0;
}

Executing this program with seven processes will result in output such
as the following. Although the random values will change from one run
to the next, the order of the processes in the output will remain the
same, because only process 0 writes to `std::cout`.

[pre
Process #0 thought of 332199874
Process #1 thought of 20145617
Process #2 thought of 1862420122
Process #3 thought of 480422940
Process #4 thought of 1253380219
Process #5 thought of 949458815
Process #6 thought of 650073868
]

The `gather` operation collects values from every process into a
vector at one process. If instead the values from every process need
to be collected into identical vectors on every process, use the
[funcref boost::mpi::all_gather `all_gather`] algorithm, which is
semantically equivalent to calling `gather` followed by a `broadcast`
of the resulting vector.
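For instance, a minimal sketch of `all_gather` (assuming the same
`world` communicator and `my_number` value as in the example above)
might look like the following; note that no root argument is given,
because every process receives the gathered vector:

  // Sketch: gather every process's value into a vector that is
  // replicated on every process (assumes `world` and `my_number`
  // are set up as in the gather example above).
  std::vector<int> all_numbers;
  all_gather(world, my_number, all_numbers);
  std::cout << "Process #" << world.rank() << " sees "
            << all_numbers.size() << " values" << std::endl;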
[endsect]

[section:reduce Reduce]

The [funcref boost::mpi::reduce `reduce`] collective summarizes the
values from each process into a single value at the user-specified
"root" process. The Boost.MPI `reduce` operation is similar in spirit
to the STL _accumulate_ operation, because it takes a sequence of
values (one per process) and combines them via a function object. For
instance, we can randomly generate values in each process and then
compute the minimum value over all processes via a call to
[funcref boost::mpi::reduce `reduce`] (`random_min.cpp`):

  #include <boost/mpi.hpp>
  #include <iostream>
  #include <cstdlib>
  #include <ctime>
  namespace mpi = boost::mpi;

  int main(int argc, char* argv[])
  {
    mpi::environment env(argc, argv);
    mpi::communicator world;

    std::srand(time(0) + world.rank());
    int my_number = std::rand();

    if (world.rank() == 0) {
      int minimum;
      reduce(world, my_number, minimum, mpi::minimum<int>(), 0);
      std::cout << "The minimum value is " << minimum << std::endl;
    } else {
      reduce(world, my_number, mpi::minimum<int>(), 0);
    }

    return 0;
  }

The use of `mpi::minimum<int>` indicates that the minimum value
should be computed. `mpi::minimum<int>` is a binary function object
that compares its two parameters via `<` and returns the smaller
value. Any associative binary function or function object will
work. For instance, to concatenate strings with `reduce` one could use
the function object `std::plus<std::string>` (`string_cat.cpp`):

  #include <boost/mpi.hpp>
  #include <iostream>
  #include <string>
  #include <boost/serialization/string.hpp>
  namespace mpi = boost::mpi;

  int main(int argc, char* argv[])
  {
    mpi::environment env(argc, argv);
    mpi::communicator world;

    std::string names[10] = { "zero ", "one ", "two ", "three ",
                              "four ", "five ", "six ", "seven ",
                              "eight ", "nine " };

    std::string result;
    reduce(world,
           world.rank() < 10? names[world.rank()]
                            : std::string("many "),
           result, std::plus<std::string>(), 0);

    if (world.rank() == 0)
      std::cout << "The result is " << result << std::endl;

    return 0;
  }

In this example, we compute a string for each process and then perform
a reduction that concatenates all of the strings together into one
long string. Executing this program with seven processes yields the
following output:

[pre
The result is zero one two three four five six
]

Any kind of binary function object can be used with `reduce`. There
are many such function objects in the C++ standard `<functional>`
header and the Boost.MPI header `<boost/mpi/operations.hpp>`, or you
can create your own function object. Function objects used with
`reduce` must be associative, i.e., `f(x, f(y, z))` must be equivalent
to `f(f(x, y), z)`. If they are also commutative (i.e., `f(x, y) ==
f(y, x)`), Boost.MPI can use a more efficient implementation of
`reduce`. To state that a function object is commutative, you will
need to specialize the class
[classref boost::mpi::is_commutative `is_commutative`]. For instance,
we could modify the previous example by telling Boost.MPI that string
concatenation is commutative:

  namespace boost { namespace mpi {

    template<>
    struct is_commutative<std::plus<std::string>, std::string>
      : mpl::true_ { };

  } } // end namespace boost::mpi

By adding this code prior to `main()`, Boost.MPI will assume that
string concatenation is commutative and employ a different parallel
algorithm for the `reduce` operation. Using this algorithm, the
program outputs the following when run with seven processes:

[pre
The result is zero one four five six two three
]

Note how the numbers in the resulting string are in a different order:
this is a direct result of Boost.MPI reordering operations. The result
in this case differed from the non-commutative result because string
concatenation is not commutative: `f("x", "y")` is not the same as
`f("y", "x")`, because argument order matters. For truly commutative
operations (e.g., integer addition), the more efficient commutative
algorithm will produce the same result as the non-commutative
algorithm. Boost.MPI also performs direct mappings from function
objects in `<functional>` to `MPI_Op` values predefined by MPI (e.g.,
`MPI_SUM`, `MPI_MAX`); if you have your own function objects that can
take advantage of this mapping, see the class template
[classref boost::mpi::is_mpi_op `is_mpi_op`].

Like [link mpi.gather `gather`], `reduce` has an "all" variant called
[funcref boost::mpi::all_reduce `all_reduce`] that performs the
reduction operation and broadcasts the result to all processes. This
variant is useful, for instance, in establishing global minimum or
maximum values.
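For instance, a minimal sketch (assuming the same `world` and
`my_number` setup as in `random_min.cpp`) in which every process
obtains the global minimum might look like this:

  // Sketch: compute the global minimum and deliver it to every
  // process; no root argument is needed with all_reduce.
  int global_min;
  all_reduce(world, my_number, global_min, mpi::minimum<int>());
  // Every process can now act on global_min, e.g., to decide
  // whether its own value was the smallest.
  std::cout << "Process #" << world.rank() << " sees minimum "
            << global_min << std::endl;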
[endsect]

[endsect]

[section:communicators Managing communicators]

Communication with Boost.MPI always occurs over a communicator. A
communicator contains a set of processes that can send messages among
themselves and perform collective operations. There can be many
communicators within a single program, each of which contains its own
isolated communication space that acts independently of the other
communicators.

When the MPI environment is initialized, only the "world" communicator
(called `MPI_COMM_WORLD` in the MPI C and Fortran bindings) is
available. The "world" communicator, accessed by default-constructing
a [classref boost::mpi::communicator mpi::communicator] object,
contains all of the MPI processes present when the program begins
execution. Other communicators can then be constructed by duplicating
or building subsets of the "world" communicator. For instance, in the
following program we split the processes into two groups: one for
processes generating data and the other for processes that will
collect the data (`generate_collect.cpp`):

  #include <boost/mpi.hpp>
  #include <iostream>
  #include <cstdlib>
  #include <boost/serialization/vector.hpp>
  namespace mpi = boost::mpi;

  enum message_tags { msg_data_packet, msg_broadcast_data, msg_finished };

  void generate_data(mpi::communicator local, mpi::communicator world);
  void collect_data(mpi::communicator local, mpi::communicator world);

  int main(int argc, char* argv[])
  {
    mpi::environment env(argc, argv);
    mpi::communicator world;

    bool is_generator = world.rank() < 2 * world.size() / 3;
    mpi::communicator local = world.split(is_generator? 0 : 1);
    if (is_generator) generate_data(local, world);
    else collect_data(local, world);

    return 0;
  }

When communicators are split in this way, their processes retain
membership in both the original communicator (which is not altered by
the split) and the new communicator. However, the ranks of the
processes may be different from one communicator to the next, because
the rank values within a communicator are always contiguous values
starting at zero. In the example above, the first two thirds of the
processes become "generators" and the remaining processes become
"collectors". The ranks of the "collectors" in the `world`
communicator will be 2/3 `world.size()` and greater, whereas the ranks
of the same collector processes in the `local` communicator will start
at zero. The following excerpt from `collect_data()` (in
`generate_collect.cpp`) illustrates how to manage multiple
communicators:

  mpi::status msg = world.probe();
  if (msg.tag() == msg_data_packet) {
    // Receive the packet of data
    std::vector<int> data;
    world.recv(msg.source(), msg.tag(), data);

    // Tell each of the collectors that we'll be broadcasting some data
    for (int dest = 1; dest < local.size(); ++dest)
      local.send(dest, msg_broadcast_data, msg.source());

    // Broadcast the actual data.
    broadcast(local, data, 0);
  }

The code in this excerpt is executed by the "master" collector, e.g.,
the node with rank 2/3 `world.size()` in the `world` communicator and
rank 0 in the `local` (collector) communicator. It receives a message
from a generator via the `world` communicator, then broadcasts the
message to each of the collectors via the `local` communicator.

For more control in the creation of communicators for subgroups of
processes, the Boost.MPI [classref boost::mpi::group `group`] provides
facilities to compute the union (`|`), intersection (`&`), and
difference (`-`) of two groups, generate arbitrary subgroups, etc.
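For instance, a sketch of building a communicator over an arbitrary
subgroup (the choice of even ranks here is purely illustrative, not
part of `generate_collect.cpp`) might look like this:

  // Sketch: build a communicator containing only the even-ranked
  // processes of `world`, via its group.
  mpi::group world_group = world.group();
  std::vector<int> even_ranks;
  for (int r = 0; r < world.size(); r += 2)
    even_ranks.push_back(r);
  mpi::group even_group = world_group.include(even_ranks.begin(),
                                              even_ranks.end());
  // On processes outside the subgroup, `evens` is a null communicator.
  mpi::communicator evens(world, even_group);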
[endsect]

[section:skeleton_and_content Separating structure from content]

When communicating data types over MPI that are not fundamental to MPI
(such as strings, lists, and user-defined data types), Boost.MPI must
first serialize these data types into a buffer and then communicate
them; the receiver then copies the results into a buffer before
deserializing into an object on the other end. For some data types,
this overhead can be eliminated by using
[classref boost::mpi::is_mpi_datatype `is_mpi_datatype`]. However,
variable-length data types such as strings and lists cannot be MPI
data types.

Boost.MPI supports a second technique for improving performance by
separating the structure of these variable-length data structures from
the content stored in the data structures. This feature is only
beneficial when the shape of the data structure remains the same but
the content of the data structure will need to be communicated several
times. For instance, in a finite element analysis the structure of the
mesh may be fixed at the beginning of computation, but the various
variables on the cells of the mesh (temperature, stress, etc.) will be
communicated many times within the iterative analysis process. In this
case, Boost.MPI allows one to first send the "skeleton" of the mesh
once, then transmit the "content" multiple times. Since the content
need not contain any information about the structure of the data type,
it can be transmitted without creating separate communication buffers.

To illustrate the use of skeletons and content, we will take a
somewhat more limited example wherein a master process generates
random number sequences into a list and transmits them to several
slave processes. The length of the list will be fixed at program
startup, so the content of the list (i.e., the current sequence of
numbers) can be transmitted efficiently. The complete example is
available in `example/random_content.cpp`. We begin with the master
process (rank 0), which builds a list, communicates its structure via
a [funcref boost::mpi::skeleton `skeleton`], then repeatedly generates
random number sequences to be broadcast to the slave processes via
[classref boost::mpi::content `content`]:

  // Generate the list and broadcast its structure
  std::list<int> l(list_len);
  broadcast(world, mpi::skeleton(l), 0);

  // Generate content several times and broadcast out that content
  mpi::content c = mpi::get_content(l);
  for (int i = 0; i < iterations; ++i) {
    // Generate new random values
    std::generate(l.begin(), l.end(), &random);

    // Broadcast the new content of l
    broadcast(world, c, 0);
  }

  // Notify the slaves that we're done by sending all zeroes
  std::fill(l.begin(), l.end(), 0);
  broadcast(world, c, 0);

The slave processes have a very similar structure to the master. They
receive (via the [funcref boost::mpi::broadcast `broadcast()`] call)
the skeleton of the data structure, then use it to build their own
lists of integers. In each iteration, they receive via another
`broadcast()` the new content in the data structure and compute some
property of the data:

  // Receive the content and build up our own list
  std::list<int> l;
  broadcast(world, mpi::skeleton(l), 0);

  mpi::content c = mpi::get_content(l);
  int i = 0;
  do {
    // Receive the new content of l
    broadcast(world, c, 0);
    // An all-zero list is the master's "finished" signal (a sketch of
    // the termination test; see example/random_content.cpp)
    if (std::count(l.begin(), l.end(), 0) == (long)l.size()) break;
    ++i;  // compute some property of the data here
  } while (true);
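Skeletons and content are not limited to collective operations. As a
sketch (assuming `<boost/serialization/list.hpp>` is included, as in
`random_content.cpp`, and using arbitrarily chosen message tags), the
same proxies can also be transmitted with point-to-point operations:

  // Sketch: send the structure of the list once, then its content,
  // using point-to-point operations. The tags (0 and 1) are arbitrary.
  std::list<int> l(10, 1);
  if (world.rank() == 0) {
    world.send(1, 0, mpi::skeleton(l));
    world.send(1, 1, mpi::get_content(l));
  } else if (world.rank() == 1) {
    world.recv(0, 0, mpi::skeleton(l));    // rebuilds the structure of l
    world.recv(0, 1, mpi::get_content(l)); // fills in the values in place
  }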