% \documentstyle[11pt,psfig]{article}
\documentstyle[11pt]{article}
\hoffset=-.7in
\voffset=-.6in
\textwidth=6.5in
\textheight=8.5in

\begin{document}
\vspace*{-1in}
\thispagestyle{empty}
\begin{center}
ARGONNE NATIONAL LABORATORY \\
9700 South Cass Avenue \\
Argonne, IL 60439
\end{center}
\vskip .5 in

\begin{center}
\rule{1.75in}{.01in} \\
\vspace{.1in}
ANL/MCS-TM-234 \\
\rule{1.75in}{.01in} \\
\vskip 1.3 in
{\Large\bf Users Guide for ROMIO: A High-Performance, \\ [1ex]
Portable MPI-IO Implementation} \\ [4ex]
by \\ [2ex]
{\large\it Rajeev Thakur, Robert Ross, Ewing Lusk, and William Gropp}
\vspace{1in}

Mathematics and Computer Science Division

\bigskip

Technical Memorandum No.\ 234

\vspace{1.4in}
Revised May 2004

\end{center}

\vfill

{\small
\noindent
This work was supported by the Mathematical, Information, and
Computational Sciences Division subprogram of the Office of Advanced
Scientific Computing Research, U.S. Department of Energy, under
Contract W-31-109-Eng-38; and by the Scalable I/O Initiative, a
multiagency project funded by the Defense Advanced Research Projects
Agency (Contract DABT63-94-C-0049), the Department of Energy, the
National Aeronautics and Space Administration, and the National
Science Foundation.}

\newpage

%
%  Line Spacing (e.g., \ls{1} for single, \ls{2} for double, even \ls{1.5})
%

\newcommand{\ls}[1]
   {\dimen0=\fontdimen6\the\font
    \lineskip=#1\dimen0
    \advance\lineskip.5\fontdimen5\the\font
    \advance\lineskip-\dimen0
    \lineskiplimit=.9\lineskip
    \baselineskip=\lineskip
    \advance\baselineskip\dimen0
    \normallineskip\lineskip
    \normallineskiplimit\lineskiplimit
    \normalbaselineskip\baselineskip
    \ignorespaces
   }
\renewcommand{\baselinestretch}{1}

\newcommand {\ix} {\hspace*{2em}}
\newcommand {\mc} {\multicolumn}

\tableofcontents
\thispagestyle{empty}
\newpage

\pagenumbering{arabic}
\setcounter{page}{1}
\begin{center}
{\bf Users Guide for ROMIO:  A High-Performance,\\[1ex]
Portable MPI-IO Implementation} \\ [2ex]
by \\ [2ex]
{\it Rajeev Thakur, Robert Ross, Ewing Lusk, and
William Gropp}
\end{center}

\addcontentsline{toc}{section}{Abstract}
\begin{abstract}
\noindent
ROMIO is a high-performance, portable implementation of MPI-IO (the
I/O chapter in \mbox{MPI-2}). This document describes how to install and use
ROMIO version~1.2.4 on various machines.
\end{abstract}

\section{Introduction}

ROMIO\footnote{\tt http://www.mcs.anl.gov/romio} is a
high-performance, portable implementation of MPI-IO (the I/O chapter in
MPI-2~\cite{mpi97a}). This document describes how to install and use
ROMIO version~1.2.4 on various machines.

%
% MAJOR CHANGES IN THIS VERSION
%
\section{Major Changes in This Version}
\begin{itemize}
\item Added section describing ROMIO \texttt{MPI\_FILE\_SYNC} and
      \texttt{MPI\_FILE\_CLOSE} behavior to User's Guide
\item Bug removed from PVFS ADIO implementation regarding resize operations
\item Added support for PVFS listio operations (see Section~\ref{sec:hints})
\item Added the following working hints:
      \texttt{romio\_pvfs\_listio\_read}, \texttt{romio\_pvfs\_listio\_write}
\end{itemize}

%
% GENERAL INFORMATION
%
\section{General Information}

This version of ROMIO includes everything defined in the MPI-2 I/O
chapter except support for file interoperability (\S~9.5 of MPI-2) and
user-defined error handlers for files (\S~4.13.3).  The subarray and
distributed array datatype constructor functions from Chapter 4
(\S~4.14.4 \& \S~4.14.5) have been implemented. They are useful for
accessing arrays stored in files. The functions {\tt MPI\_File\_f2c}
and {\tt MPI\_File\_c2f} (\S~4.12.4) are also implemented.
C, Fortran, and profiling interfaces are provided for all functions that
have been implemented.

This version of ROMIO runs on at least the following machines: IBM SP; Intel
Paragon; HP Exemplar; SGI Origin2000; Cray T3E; NEC SX-4; other
symmetric multiprocessors from HP, SGI, DEC, Sun, and IBM; and networks of
workstations (Sun, SGI, HP, IBM, DEC, Linux, and FreeBSD).
Supported file systems are IBM PIOFS, Intel PFS, HP/Convex
HFS, SGI XFS, NEC SFS, PVFS, NFS, NTFS, and any Unix file system (UFS).

This version of ROMIO is included in MPICH 1.2.4; an earlier version
is included in at least the following MPI implementations: LAM, HP
MPI, SGI MPI, and NEC MPI.

Note that proper I/O error codes and classes are returned and the status
variable is filled only when used with MPICH revision 1.2.1 or later.

You can open files on multiple file systems in the same program. The
only restriction is that the directory where the file is to be opened
must be accessible from the process opening the file. For example, a
process running on one workstation may not be able to access a
directory on the local disk of another workstation, and therefore
ROMIO will not be able to open a file in such a directory. NFS-mounted
files can be accessed.

An MPI-IO file created by ROMIO is no different from any other file
created by the underlying file system. Therefore, you may use any of
the commands provided by the file system to access the file, for example,
{\tt ls}, {\tt mv}, {\tt cp}, {\tt rm}, {\tt ftp}.

Please read the limitations of this version of ROMIO that are listed
in Section~\ref{sec:limit} of this document (e.g., restriction to homogeneous
environments).

\subsection{ROMIO Optimizations}
\label{sec:opt}

ROMIO implements two I/O optimization techniques that in general
result in improved performance for applications.  The first of these
is \emph{data sieving}~\cite{choudhary:passion}.
Data sieving is a
technique for efficiently accessing noncontiguous regions of data in files
when noncontiguous accesses are not provided as a file system primitive.
The naive approach to accessing noncontiguous regions is to use a separate
I/O call for each contiguous region in the file.  This results in a large
number of I/O operations, each of which is often for a very small amount
of data.  The added network cost of performing an I/O operation across the
network, as in parallel I/O systems, is often high because of latency.
Thus, this naive approach typically performs very poorly because of
the overhead of multiple operations.
%
In the data sieving technique, a number of noncontiguous regions are
accessed by reading a block of data containing all of the regions,
including the unwanted data between them (called ``holes'').  The regions
of interest are then extracted from this large block by the client.
This technique has the advantage of a single I/O call, but additional
data is read from the disk and passed across the network.

There are four hints that can be used to control the application of
data sieving in ROMIO: \texttt{ind\_rd\_buffer\_size},
\texttt{ind\_wr\_buffer\_size}, \texttt{romio\_ds\_read},
and \texttt{romio\_ds\_write}.  These are discussed in
Section~\ref{sec:hints}.

The second optimization is \emph{two-phase
I/O}~\cite{bordawekar:primitives}.  Two-phase I/O, also called collective
buffering, is an optimization that only applies to collective I/O
operations.  In two-phase I/O, the collection of independent I/O operations
that make up the collective operation is analyzed to determine what
data regions must be transferred (read or written).  These regions are
then split up amongst a set of aggregator processes that will actually
interact with the file system.
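
As a rough illustration of this splitting step, the following C sketch
divides one aggregate byte range among a given number of aggregators.
The even split, the structure \texttt{file\_domain}, and the function
\texttt{split\_file\_domains} are assumptions made for this example only;
the text above does not specify how ROMIO actually divides the regions.
\begin{verbatim}
/* Byte range [start, end) of the file assigned to one aggregator.
 * Hypothetical type, introduced only for this sketch. */
struct file_domain {
    long long start;
    long long end;
};

/* Split the aggregate range [min_off, max_off) into naggs contiguous,
 * nearly equal domains, one per aggregator.  An even split is an
 * assumption of this sketch, not a statement about ROMIO. */
void split_file_domains(long long min_off, long long max_off,
                        int naggs, struct file_domain *fd)
{
    long long total = max_off - min_off;
    long long base  = total / naggs;
    long long extra = total % naggs; /* first `extra` domains get 1 more byte */
    long long pos   = min_off;

    for (int i = 0; i < naggs; i++) {
        long long len = base + (i < extra ? 1 : 0);
        fd[i].start = pos;
        fd[i].end   = pos + len;
        pos += len;
    }
}
\end{verbatim}
With three aggregators and the range $[100, 110)$, this sketch assigns
$[100, 104)$, $[104, 107)$, and $[107, 110)$.
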
In the case of a read, these aggregators
first read their regions from disk and redistribute the data to the
final locations, while in the case of a write, data is first collected
from the processes before being written to disk by the aggregators.

There are five hints that can be used to control the application
of two-phase I/O: \texttt{cb\_config\_list}, \texttt{cb\_nodes},
\texttt{cb\_buffer\_size}, \texttt{romio\_cb\_read},
and \texttt{romio\_cb\_write}.  These are discussed in
Subsection~\ref{sec:hints}.

\subsection{Hints}
\label{sec:hints}

The following hints control the data sieving optimization and are
applicable to all file system types:
\begin{itemize}
\item \texttt{ind\_rd\_buffer\_size} -- Controls the size (in bytes) of the
intermediate buffer used by ROMIO when performing data sieving during
read operations.  Default is \texttt{4194304} (4~Mbytes).
\item \texttt{ind\_wr\_buffer\_size} -- Controls the size (in bytes) of the
intermediate buffer used by ROMIO when performing data sieving during
write operations.  Default is \texttt{524288} (512~Kbytes).
\item \texttt{romio\_ds\_read} --
Determines when ROMIO will choose to perform data sieving.
Valid values are \texttt{enable}, \texttt{disable}, or \texttt{automatic}.
Default value is \texttt{automatic}.  In \texttt{automatic} mode ROMIO
may choose to enable or disable data sieving based on heuristics.
\item \texttt{romio\_ds\_write} -- Same as above, only for writes.
\end{itemize}

The following hints control the two-phase (collective buffering)
optimization and are applicable to all file system types:
\begin{itemize}
\item \texttt{cb\_buffer\_size} -- Controls the size (in bytes) of the
intermediate buffer used in two-phase collective I/O.  If the amount
of data that an aggregator will transfer is larger than this value,
then multiple operations are used.  The default is \texttt{4194304} (4~Mbytes).
\item \texttt{cb\_nodes} -- Controls the maximum number of aggregators
to be used.
By default this is set to the number of unique hosts in the
communicator used when opening the file.
\item \texttt{romio\_cb\_read} -- Controls when collective buffering is
applied to collective read operations.  Valid values are
\texttt{enable}, \texttt{disable}, and \texttt{automatic}.  Default is
\texttt{automatic}.  When enabled, all collective reads will use
collective buffering.  When disabled, all collective reads will be
serviced with individual operations by each process.  When set to
\texttt{automatic}, ROMIO will use heuristics to determine when to
enable the optimization.
\item \texttt{romio\_cb\_write} -- Controls when collective buffering is
applied to collective write operations.  Valid values are
\texttt{enable}, \texttt{disable}, and \texttt{automatic}.  Default is
\texttt{automatic}.  See the description of \texttt{romio\_cb\_read} for
an explanation of the values.
\item \texttt{romio\_no\_indep\_rw} -- This hint controls when ``deferred
open'' is used.  When set to \texttt{true}, ROMIO will make an effort to avoid
performing any file operation on non-aggregator nodes.  The application is
expected to use only collective operations.  This is discussed in further
detail below.
\item \texttt{cb\_config\_list} -- Provides explicit control over aggregators.
This is discussed in further detail below.
\end{itemize}

For some system configurations, more control is needed to specify which
hardware resources (processors or nodes in an SMP) are preferred for
collective I/O, either for performance reasons or because only certain
resources have access to storage.
The additional MPI\_Info key name
\texttt{cb\_config\_list} specifies a comma-separated list of strings,
each string specifying a particular node and an optional limit on the
number of processes to be used for collective buffering on this node.
This refers to the same processes that \texttt{cb\_nodes} refers to,
but specifies the available nodes more precisely.

The format of the value of \texttt{cb\_config\_list} is given by the
following BNF:
\begin{verbatim}
cb_config_list => hostspec [ ',' cb_config_list ]
hostspec => hostname [ ':' maxprocesses ]
hostname => <alphanumeric string>
         |  '*'
maxprocesses => <digits>
         |  '*'
\end{verbatim}
The value \texttt{hostname} identifies a processor. This name must match
the name returned by \texttt{MPI\_Get\_processor\_name}~\footnote{The
MPI standard requires that the output from this routine identify a
particular piece of hardware; some MPI implementations may not conform
to this requirement. MPICH does conform to the MPI standard.}
%
for the specified hardware. The value \texttt{*} as a hostname matches all
processors. The value of \texttt{maxprocesses} may be any nonnegative integer
(zero is allowed).

The value \texttt{maxprocesses} specifies the maximum number of
processes that may be used for collective buffering on the specified
host. If no value is specified, the value one is assumed. If \texttt{*}
is specified for the number of processes, then all MPI processes with
this same hostname will be used.

Leftmost components of the info value take precedence.

Note: Matching of processor names to \texttt{cb\_config\_list} entries
is performed with string matching functions and is independent of the
listing of machines that the user provides to mpirun/mpiexec.
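
The grammar and the name-matching rule can be sketched in C as follows.
This is a simplified illustration, not ROMIO's code: the names
\texttt{hostspec}, \texttt{parse\_cb\_config\_list}, and
\texttt{select\_aggregators} are hypothetical, processor names are passed
in as plain strings rather than obtained from
\texttt{MPI\_Get\_processor\_name}, and the rank ordering of the chosen
aggregators is not modeled.
\begin{verbatim}
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

#define MAXPROCS_ALL (-1)  /* stands for '*' as maxprocesses */

/* One parsed hostspec: hostname [ ':' maxprocesses ] */
struct hostspec {
    char hostname[64];     /* "*" matches any processor name */
    int  maxprocs;         /* MAXPROCS_ALL: every process on the host */
};

/* Parse a cb_config_list value into at most max_specs entries.
 * Returns the number of entries, or -1 on a syntax error. */
int parse_cb_config_list(const char *value, struct hostspec *specs,
                         int max_specs)
{
    int n = 0;
    const char *p = value;

    while (*p != '\0') {
        size_t len = strcspn(p, ":,");
        if (n == max_specs || len == 0 ||
            len >= sizeof specs[n].hostname)
            return -1;
        memcpy(specs[n].hostname, p, len);
        specs[n].hostname[len] = '\0';
        specs[n].maxprocs = 1;      /* default when no limit is given */
        p += len;
        if (*p == ':') {
            p++;
            if (*p == '*') {
                specs[n].maxprocs = MAXPROCS_ALL;
                p++;
            } else if (isdigit((unsigned char)*p)) {
                char *end;
                specs[n].maxprocs = (int)strtol(p, &end, 10);
                p = end;
            } else {
                return -1;          /* ':' must be followed by a limit */
            }
        }
        n++;
        if (*p == ',') {
            p++;
            if (*p == '\0')
                return -1;          /* trailing comma */
        } else if (*p != '\0') {
            return -1;
        }
    }
    return n > 0 ? n : -1;
}

/* The leftmost spec whose hostname matches `name` governs that host. */
static const struct hostspec *governing_spec(const struct hostspec *specs,
                                             int nspecs, const char *name)
{
    for (int s = 0; s < nspecs; s++)
        if (strcmp(specs[s].hostname, "*") == 0 ||
            strcmp(specs[s].hostname, name) == 0)
            return &specs[s];
    return NULL;
}

/* Mark is_agg[i] = 1 for each process chosen as an aggregator, given
 * the processor name of every process.  Returns the number chosen. */
int select_aggregators(const struct hostspec *specs, int nspecs,
                       const char *const *names, int nprocs, int *is_agg)
{
    int total = 0;

    for (int i = 0; i < nprocs; i++) {
        const struct hostspec *sp = governing_spec(specs, nspecs, names[i]);
        int used = 0;   /* aggregators already chosen with this name */
        for (int j = 0; j < i; j++)
            if (is_agg[j] && strcmp(names[j], names[i]) == 0)
                used++;
        is_agg[i] = sp != NULL &&
                    (sp->maxprocs == MAXPROCS_ALL || used < sp->maxprocs);
        total += is_agg[i];
    }
    return total;
}
\end{verbatim}
Because the limit is applied per processor name, four processes that all
report the same name receive a single aggregator under \texttt{*:1} in
this sketch.
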
In other
words, listing the same machine multiple times (say, four) in the list of
hosts to run on will not cause a \texttt{*:1} to assign the same host four
aggregators, because the matching code will see that the processor name
is the same for all four and will assign exactly one aggregator to the
processor.

The value of this info key must be the same for all processes (i.e., the
call is collective and each process must receive the same hint value for
these collective buffering hints).  Further, in the ROMIO implementation
the hint is only recognized at \texttt{MPI\_File\_open} time.

The set of hints used with a file is available through the routine
\texttt{MPI\_File\_get\_info}, as documented in the MPI-2 standard.
As an additional feature in the ROMIO implementation, wildcards will
be expanded to indicate the precise configuration used with the file,
with the hostnames in the rank order used for the collective buffering
algorithm (\emph{this is not implemented at this time}).

Here are some examples of how this hint might be used:
\begin{itemize}
\item \texttt{*:1} One process per hostname (i.e., one process per node)
\item \texttt{box12:30,*:0} Thirty processes on one machine, namely
      \texttt{box12}, and none anywhere else.
\item \texttt{n01,n11,n21,n31,n41} One process on each of these specific
      nodes only.
\end{itemize}

When the values specified by \texttt{cb\_config\_list} conflict with
other hints (e.g., the number of collective buffering nodes specified by
\texttt{cb\_nodes}), the implementation is encouraged to take the minimum
