\title{Guide to RSF format}
\email{sergey.fomel@beg.utexas.edu}
\author{Sergey Fomel}

\lefthead{Fomel}
\righthead{RSF format}

\maketitle

\begin{abstract}
  This guide explains the RSF file format.
\end{abstract}

\inputdir{format}

\section{Principles}

The main design principle behind the RSF file format is KISS (``Keep It
Simple, Stupid!''). The RSF format is borrowed from the SEPlib data format
originally designed at the Stanford Exploration Project
\cite[]{Claerbout.sep.70.413}. The format is made as simple as possible for
maximum convenience, transparency and flexibility.

According to the Unix tradition, common file formats should be in a readable
textual form so that they can be easily examined and processed with universal
tools. \cite{taoup} writes:
\begin{quote}
  To design a perfect anti-Unix, make all file formats binary and opaque, and
  require heavyweight tools to read and edit them.
\end{quote}
\begin{quote}
  If you feel an urge to design a complex binary file format, or a complex
  binary application protocol, it is generally wise to lie down until the
  feeling passes.
\end{quote}

Storing large-scale datasets in a text format may not be economical. RSF
chooses the next best thing: it allows data values to be stored in a binary
format but puts all data attributes in text files that can be read by humans
and processed with universal text-processing utilities.

\subsection{Example}

Let us first create some synthetic RSF data.
\begin{verbatim}
bash$ sfmath n1=1000 output='sin(0.5*x1)' > sin.rsf
\end{verbatim}
Open and read the file \texttt{sin.rsf}.
\begin{verbatim}
bash$ cat sin.rsf
sfmath  rsf/rsf/rsftour:  fomels@egl  Sun Jul 31 07:18:48 2005

        o1=0
        data_format="native_float"
        esize=4
        in="/tmp/sin.rsf@"
        x1=0
        d1=1
        n1=1000
\end{verbatim}
The file contains nine lines with simple readable text. The first line
shows the name of the program, the working directory, the user and
computer that created the file and the time it was created (that
information is recorded for accounting purposes).
Other lines contain
parameter-value pairs separated by the ``='' sign. The ``in''
parameter points to the location of the binary data. Before we discuss
the meaning of parameters in more detail, let us plot the data.
\begin{verbatim}
bash$ < sin.rsf sfwiggle title='One Trace' | xtpen
\end{verbatim}
On your screen, you should see a plot similar to Figure~\ref{fig:sin1}.

\plot{sin1}{width=5in}{An example sinusoid plot.}

Suppose you want to reformat the data so that instead of one trace of a
thousand samples, it contains twenty traces with fifty samples each. Try
running
\begin{verbatim}
bash$ < sin.rsf sed 's/n1=1000/n1=50 n2=20/' > sin10.rsf
bash$ < sin10.rsf sfwiggle title=Traces | xtpen
\end{verbatim}
or (using pipes)
\begin{verbatim}
bash$ < sin.rsf sed 's/n1=1000/n1=50 n2=20/' | sfwiggle title=Traces | xtpen
\end{verbatim}
On your screen, you should see a plot similar to Figure~\ref{fig:sin2}.

\plot{sin2}{width=5in}{An example sinusoid plot, with data reformatted to
  twenty traces.}

What happened? We used \texttt{sed}, a standard Unix stream-editing utility,
to change the parameters describing the data dimensions. Because of the
simplicity of this operation, there is no need to create specialized data
formatting tools or to make the \texttt{sfwiggle} program accept additional
formatting parameters. Other general-purpose Unix tools that can be applied to
RSF files include \texttt{cat}, \texttt{echo}, \texttt{grep}, etc. An
alternative way to obtain the previous result is to run
\begin{verbatim}
bash$ ( cat sin.rsf; echo n1=50 n2=20 ) > sin10.rsf
bash$ < sin10.rsf sfwiggle title=Traces | xtpen
\end{verbatim}
In this case, the \texttt{cat} utility simply copies the contents of the
previous file, and the \texttt{echo} utility appends a new line
``\texttt{n1=50 n2=20}''. The new value of the \texttt{n1} parameter overrides
the old value of \texttt{n1=1000}, and we achieve the same result as before.

Of course, one could also edit the file by hand with one of the
general-purpose text editors.
For recording the history of data processing, it is
usually preferable to be able to process files with non-interactive tools.

\section{Header and Data files}

A simple way to check the layout of an RSF file is with the \texttt{sfin}
program.
\begin{verbatim}
bash$ sfin sin10.rsf
sin10.rsf:
    in="/tmp/sin.rsf@"
    esize=4 type=float form=native
    n1=50          d1=1           o1=0
    n2=20          d2=?           o2=?
        1000 elements 4000 bytes
\end{verbatim}
The program reports the following information: the location of the data file
(\texttt{/tmp/sin.rsf@}), the element size (4 bytes), the element
type (floating point), the element form (native), the hypercube dimensions
($50 \times 20$), axis scaling (1 and unspecified), and axis origin (0 and
unspecified). It also checks the total number of elements and bytes in the
data file.

Let us examine this information in detail. First, we can verify that the data
file exists and contains the specified number of bytes:
\begin{verbatim}
bash$ ls -l /tmp/sin.rsf@
-rw-r--r--  1 sergey users 4000 2004-10-04 00:35 /tmp/sin.rsf@
\end{verbatim}
The 4000 bytes in this file are exactly what is required to store
$50 \times 20$ four-byte floating-point numbers in binary form. Thus, the data
file contains nothing but the raw data in a contiguous binary form.

\subsection{Datapath}

How did the RSF program (\texttt{sfmath}) decide where to put the data file?
In order of priority, the rules for selecting the data file name and the
data file directory are as follows:
\begin{enumerate}
\item Check the \texttt{out=} parameter on the command line. The parameter
  specifies the output data file location explicitly.
\item Otherwise, select the path and the file name separately.
  \begin{itemize}
  \item The rules for the path selection are:
    \begin{enumerate}
    \item Check the \texttt{datapath=} parameter on the command line. The
      parameter specifies a string to prepend to the file name. The string
      may contain the file directory.
    \item Check the \texttt{DATAPATH} environment variable. It has the same
      meaning as the parameter specified with \texttt{datapath=}.
    \item Check for a \texttt{.datapath} file in the current directory. The
      file may contain a line
\begin{verbatim}
datapath=/path/to_file/
\end{verbatim}
      or
\begin{verbatim}
machine_name datapath=/path/to_file/
\end{verbatim}
      if you intend to use different paths on different platforms.
    \item Check for a \texttt{.datapath} file in the user's home directory.
    \item Put the data file in the current directory (similar to
      \texttt{datapath=./}).
    \end{enumerate}
  \item The rules for the filename selection are:
    \begin{enumerate}
    \item If the output RSF file is in the current directory, the name of
      the data file is made by appending \texttt{@} to the RSF file name.
    \item If the output file is not in the current directory or if it is
      created temporarily by a program, the name is made by appending random
      characters to the name of the program and selected to be unique.
    \end{enumerate}
  \end{itemize}
\end{enumerate}

Examples:
\begin{itemize}
\item \ \\
\begin{verbatim}
bash$ sfspike n1=10 out=test1 > spike.rsf
bash$ grep in spike.rsf
        in="test1"
\end{verbatim}
\item \ \\
\begin{verbatim}
bash$ sfspike n1=10 datapath=/tmp/ > spike.rsf
bash$ grep in spike.rsf
        in="/tmp/spike.rsf@"
\end{verbatim}
\item \ \\
\begin{verbatim}
bash$ DATAPATH=/tmp/ sfspike n1=10 > spike.rsf
bash$ grep in spike.rsf
        in="/tmp/spike.rsf@"
\end{verbatim}
\item \ \\
\begin{verbatim}
bash$ sfspike n1=10 datapath=/tmp/ > /tmp/spike.rsf
bash$ grep in /tmp/spike.rsf
        in="/tmp/sfspikejcARVf"
\end{verbatim}
\end{itemize}

\subsubsection{Packing header and data together}

While the header and data files are separated by default, it is also possible
to pack them together into one file. To do that, specify the program's
``\texttt{out}'' parameter as \texttt{out=stdout}.
Example:
\begin{verbatim}
bash$ sfspike n1=10 out=stdout > spike.rsf
bash$ grep in spike.rsf
Binary file spike.rsf matches
bash$ sfin spike.rsf
spike.rsf:
    in="stdin"
    esize=4 type=float form=native
    n1=10          d1=0.004       o1=0          label1="Time" unit1="s"
        10 elements 40 bytes
bash$ ls -l spike.rsf
-rw-r--r--  1 sergey users 196 2004-11-10 21:39 spike.rsf
\end{verbatim}
If you examine the contents of \texttt{spike.rsf}, you will find that it
starts with the text header information, followed by special
symbols, followed by binary data.

Packing headers and data together may not be a good idea for data processing,
but it works well for storing data: it is easier to move the packed file
around than to move two different files (header and binary) together while
remembering to preserve their connection. Packing header and data together is
also the current mechanism used to push RSF files through Unix pipes.

\subsection{Type}

The data stored with RSF can have different types: character, unsigned
character, integer, floating point, or complex. By default, single precision
is used for numbers (\texttt{int} and \texttt{float} data types in the C
programming language). The number of bytes required to represent these
numbers may depend on the platform.

\subsection{Form}

The data stored with RSF can also be in different forms: ASCII, native
binary, and XDR binary. Native binary is often used by default. It is the
binary format employed by the machine that is running the application. On a
PC running Linux, the native binary format will typically correspond to the
so-called little-endian byte ordering. On some other platforms, it might be
big-endian ordering. XDR is a binary format designed by Sun for exchanging