Algorithms for collective communication are often of the store-and-forward or store-and-scatter kind (forwarding to one or more processes); collective computation (such as reduce or scan) adds a processing step before forwarding. This section describes the routines needed for these operations. They rely on \texttt{MPID\_Segment} to provide an interface between the general MPI datatypes and the contiguous byte-array form in which most store-and-forward operations are described.

Because the collective routines are all blocking, these routines can use a simplified strategy for initiating the ``next'' block in a store-and-forward pipeline. Another difference between these and the point-to-point routines is that we may need more information than just the ``tag'' that point-to-point provides. Thus, at the beginning of a store-and-forward operation, a small amount of additional data may be sent.

Streams are actually delivered in blocks; as each block is delivered, the application has the option to process the block, forward it to another process (or processes), or both. The block size is determined by the device. The \texttt{MPID\_Stream\_iforward} routine allows code to receive the data into the local destination buffer according to the specified (possibly noncontiguous) datatype while forwarding the block. This avoids the unpack/pack cycle that is required when only send/receive routines are used.

Notes on xfer versus stream. The original stream interface allowed the programmer to explicitly describe the steps used to process each section (or segment) of the stream. This gave the programmer a great deal of control and flexibility over the handling of each part of a stream. However, it also made it very difficult for the device to handle a stream transfer efficiently, particularly for non-polling devices. The xfer approach essentially builds a simple data transfer program that is then executed by the device.
This is not as flexible as the stream interface, but the device may be able to implement xfer more efficiently.

\subsection{Note to Implementors}

In determining the block size, you cannot look at the datatype, since the datatypes used by the sender and the receiver may be different. For example, even though the sender has a contiguous buffer and thus could send a large block without allocating memory or copying data, the receiver may have specified a noncontiguous data buffer (in the segment definition), thus limiting the size of the block for receiving.

An implementation should also consider at least double buffering the communication of a stream. In other words, once one block is delivered, allowing \texttt{MPID\_Stream\_wait} to return with that block, begin delivering the next block into a separate buffer. Where it makes sense, if there is storage for the entire message, an implementation may choose to deliver the entire message as quickly as possible, updating \texttt{stream->cur\_length} as data is delivered. This illustrates that, although the stream routines describe motion in blocks, there is no particular limit on the number of blocks that are delivered at one time.

\subsection{Questions}

\begin{enumerate}
\item Should an \texttt{MPID\_Stream} be an \texttt{MPID\_Request}? That would allow us to use the same completion routines, and to mix stream and non-stream communication. The current choice was made to keep the stream module independent of the other modules, and it exploits the fact that these routines are intended for implementing the collective operations, all of which are blocking (except for the two-phase collective in MPI-IO). Note that if we do make these requests, then the description of \texttt{MPID\_Waitsome} etc.\ will become more complex; we may need to provide a routine to be called by \texttt{MPID\_Waitsome} in that case. Viewed in that light, streams are similar to persistent, user-defined MPI requests.

\item Should there be a test as well as a wait on a stream? If a stream is a kind of request, then we get this automatically.

\item The examples outlined here rely on explicit advancement of the stream by the process that initiates it. Another approach is to allow the communication agent to send each piece of the stream as soon as it becomes possible.

\item Should there be a one-sided stream operation? E.g., just as we have streams as a generalization of message passing, do we want the same thing for RMA operations?
\end{enumerate}
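
The block-wise delivery and double buffering described in the Note to Implementors can be illustrated with a small, purely local simulation. This is only a sketch under stated assumptions: \texttt{demo\_stream}, \texttt{demo\_stream\_wait}, \texttt{demo\_store\_and\_forward}, and the fixed \texttt{BLOCK\_SIZE} are hypothetical names invented for this illustration and are not part of the MPID interface; ``delivery'' is simulated with \texttt{memcpy} rather than real communication, and ``forwarding'' is a copy into a second array.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical block size; in the text, the block size is chosen by the device. */
#define BLOCK_SIZE 8

/* Hypothetical stand-in for a stream; NOT the real MPID_Stream structure. */
typedef struct {
    const unsigned char *src;          /* simulated incoming message        */
    size_t total_length;               /* total bytes in the message        */
    size_t cur_length;                 /* bytes delivered so far            */
    unsigned char buf[2][BLOCK_SIZE];  /* two staging buffers               */
    int next_buf;                      /* buffer that gets the next block   */
} demo_stream;

void demo_stream_init(demo_stream *s, const unsigned char *src, size_t n)
{
    assert(s != NULL && src != NULL);
    s->src = src;
    s->total_length = n;
    s->cur_length = 0;
    s->next_buf = 0;
}

/* Simulate delivery of the next block. Returns a pointer to the block and
 * stores its length in *len; returns NULL once the whole message has been
 * delivered. The returned block remains valid during the next call, because
 * that call fills the other buffer. */
const unsigned char *demo_stream_wait(demo_stream *s, size_t *len)
{
    size_t remaining = s->total_length - s->cur_length;
    size_t n = remaining < BLOCK_SIZE ? remaining : BLOCK_SIZE;
    unsigned char *block;

    if (remaining == 0) {
        *len = 0;
        return NULL;
    }
    block = s->buf[s->next_buf];
    memcpy(block, s->src + s->cur_length, n);
    s->cur_length += n;     /* progress is tracked as data is delivered */
    s->next_buf ^= 1;       /* alternate between the two buffers        */
    *len = n;
    return block;
}

/* Store and forward: each delivered block is stored in the local buffer
 * and copied to a simulated forwarding target. Returns the block count. */
size_t demo_store_and_forward(const unsigned char *msg, size_t n,
                              unsigned char *local, unsigned char *forwarded)
{
    demo_stream s;
    const unsigned char *block;
    size_t len, offset = 0, blocks = 0;

    demo_stream_init(&s, msg, n);
    while ((block = demo_stream_wait(&s, &len)) != NULL) {
        memcpy(local + offset, block, len);      /* store   */
        memcpy(forwarded + offset, block, len);  /* forward */
        offset += len;
        blocks++;
    }
    return blocks;
}
```

With a 20-byte message and the assumed block size of 8, \texttt{demo\_store\_and\_forward} handles three blocks of 8, 8, and 4 bytes. A real device would overlap delivery of the next block with processing of the current one; the sequential simulation only shows how the two buffers alternate.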