\chapter{Data formats}

The information in this chapter is very provisional.


The policy for representing data in YARP is to follow the XDR
(External Data Representation) standard~\cite{srinivasan95xdr} closely,
though currently without zero padding.  This chapter describes the
following formats:

\begin{itemize}
\item yarp.format.matrix for sending matrices (and vectors).
\item yarp.format.bottle for sending an unstructured bunch of stuff.
\end{itemize}




\section{Vector/Matrix/Image data format: yarp.format.matrix}

This data format consists of a header and a body.  We will speak of it
as representing (2D) matrices but it is also the recommended format for
vectors and (2D) images.  The header consists of the following.

\begin{tabular}{|l|l|l|}
\hline
{\bf type} & {\bf label} & {\bf comment} \\ \hline \hline
NetInt32 & length & total number of bytes in the body \\
NetInt32 & width & a matrix width (number of columns) \\
NetInt32 & id & an identifier for the type of data in the matrix cells \\
NetInt32 & height & a matrix height (number of rows) \\
NetInt32 & cellSize & number of bytes in each cell \\
NetInt32 & unused & should be set to 1 \\
NetInt32 & unused & should be set to 1 \\ \hline
\end{tabular}

``NetInt32'' is an integer represented as a
sequence of 4 bytes in big-endian order (following the XDR specification).
The somewhat strange ordering of numbers within the
header is for backwards-compatibility reasons.
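The header layout can be sketched in a few lines of Python.  This is
only an illustration under stated assumptions, not the YARP
implementation: the names \verb|pack_header| and \verb|unpack_header|
are hypothetical, and each NetInt32 is treated as a signed integer, as
XDR integers are.

```python
import struct

# Seven NetInt32 fields, big-endian, in the order given in the header table.
# Assumption: the fields are signed 32-bit integers, as in XDR.
HEADER = struct.Struct(">7i")

def pack_header(length, width, type_id, height, cell_size):
    """Build the 28-byte header; both unused fields are set to 1."""
    return HEADER.pack(length, width, type_id, height, cell_size, 1, 1)

def unpack_header(data):
    """Return (length, width, id, height, cellSize) from the first 28 bytes."""
    length, width, type_id, height, cell_size, _, _ = HEADER.unpack_from(data)
    return length, width, type_id, height, cell_size

# Header for a 4-column, 3-row matrix of 1-byte "mono" cells (12 body bytes).
header = pack_header(length=12, width=4, type_id=1, height=3, cell_size=1)
print(len(header), unpack_header(header))
```

Note that the fields are packed in the table's order (length, width,
id, height, cellSize), not in a more natural width/height pairing.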

The header is followed immediately by the body, which consists
of {\em length} bytes. This contains all the cells of the matrix,
in row order (first going left to right on the top row, then 
going left to right on the next row, etc).  The size of each 
cell should be {\em cellSize} bytes, and the cells are interpreted based
on the {\em id} given in the header.
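As a small illustration of the row ordering, the byte offset of a cell
within the body can be computed as follows.  The function names are
hypothetical, and the sketch assumes rows and columns are counted from
zero with no padding between rows.

```python
def cell_offset(row, col, width, cell_size):
    """Byte offset of cell (row, col) in the body; rows are stored top to bottom."""
    return (row * width + col) * cell_size

def get_cell(body, row, col, width, cell_size):
    """Return the raw bytes of one cell."""
    start = cell_offset(row, col, width, cell_size)
    return body[start:start + cell_size]

# A matrix with 2 rows and 3 columns of 1-byte cells.
body = bytes([10, 11, 12,
              20, 21, 22])
print(get_cell(body, 1, 0, width=3, cell_size=1))
```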

The following IDs are defined:

\begin{tabular}{|l|l|l|l|}
\hline
{\bf id} & {\bf label} & {\bf cellSize} & {\bf comment} \\ \hline \hline
0 & ``invalid'' & 0 & empty pixel type \\
1 & ``mono'' & 1 & a single unsigned byte (monochrome pixel) \\
2 & ``rgb'' & 3 & three unsigned bytes (a color pixel in r g b order) \\
3 & ``hsv'' & 3 & three unsigned bytes (a color pixel as hue saturation value) \\
4 & ``bgr'' & 3 & three unsigned bytes (a color pixel in b g r order) \\
5 & ``mono-signed'' & 1 & a single signed byte \\
6 & ``rgb-signed'' & 3 & three signed bytes (color differences in r g b order) \\
7 & ``mono-float'' & 4 & a single 32-bit floating point number \\
8 & ``rgb-float'' & 12 & three floating point numbers (color in r g b order) \\
9 & ``hsv-float'' & 12 & three floating point numbers (color as hue saturation value) \\
10 & ``int'' & 4 & a single 32-bit integer \\ \hline
\end{tabular}
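The ID table can be used to check a message before sending it.  The
sketch below copies the table into a dictionary and assembles a
complete message (header plus body).  The names \verb|CELL_TYPES| and
\verb|encode_matrix| are hypothetical, and the check assumes that
{\em length} equals width times height times {\em cellSize}, which the
text implies but does not state outright.

```python
import struct

# id -> (label, cellSize), copied from the table of defined IDs.
CELL_TYPES = {
    0: ("invalid", 0),
    1: ("mono", 1),
    2: ("rgb", 3),
    3: ("hsv", 3),
    4: ("bgr", 3),
    5: ("mono-signed", 1),
    6: ("rgb-signed", 3),
    7: ("mono-float", 4),
    8: ("rgb-float", 12),
    9: ("hsv-float", 12),
    10: ("int", 4),
}

def encode_matrix(type_id, width, height, body):
    """Return header + body, checking the body size against the cell type."""
    label, cell_size = CELL_TYPES[type_id]
    expected = width * height * cell_size  # assumption: no padding in the body
    if len(body) != expected:
        raise ValueError(f"{label}: expected {expected} body bytes, got {len(body)}")
    # Field order follows the header table; the two unused fields are set to 1.
    header = struct.pack(">7i", len(body), width, type_id, height, cell_size, 1, 1)
    return header + body

# A 2-pixel-wide, 1-pixel-high "rgb" image: one red pixel, one blue pixel.
message = encode_matrix(2, width=2, height=1, body=bytes([255, 0, 0, 0, 0, 255]))
print(len(message))
```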



\section{YARP Bottles: yarp.format.bottle}

The ``Bottle'' datatype
is a simple, fairly open data format for transporting a couple of
values across the network in a portable way.  It is handy to use until
you feel the need to make your own custom maximally-efficient formats for
transmission. The name comes from the idea of throwing a ``message in a
bottle'' into the network and hoping it will eventually wash ashore
somewhere else. In the very early days of YARP, that is what
communication felt like.

The Bottle data format has two encodings, one binary, one text-based.

Here are examples of bottles in the text-based format:

\begin{quote}

  999 ``hello world''

  1.2 -15.0 88.111 ``foo'' 1 0 1 1

\end{quote}

Here is the structure of the corresponding bottles in the binary format.
Numbers in parentheses are encoded as 4-byte big-endian integers.
Numbers in square brackets are encoded as 4-byte
(check this) floating point numbers.
Strings are encoded as null-terminated sequences of bytes.

\begin{quote}

  (6) ``YARP2'' (28) (1) (999) (5) (12) ``hello world''

  (6) ``YARP2'' (68) (2) [1.2] (2) [-15.0] (2) [88.111] (5) (4) ``foo'' (1) (1) (1) (0) (1) (1) (1) (1)

\end{quote}
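The lengths in these examples can be checked with a short encoder.
This is a sketch under assumptions, not the YARP implementation: the
function names are hypothetical, floating point numbers are taken to be
4-byte big-endian IEEE 754 values (the only size consistent with the
payload length of 68 above, but still to be confirmed), and strings are
assumed to be ASCII with the null terminator counted in their length.

```python
import struct

def encode_item(value):
    """Encode one value as an identifier followed by its content."""
    if isinstance(value, int):
        return struct.pack(">ii", 1, value)
    if isinstance(value, float):
        # Assumption: 4-byte IEEE 754 float, big-endian.
        return struct.pack(">if", 2, value)
    if isinstance(value, str):
        raw = value.encode("ascii") + b"\x00"  # length includes the null byte
        return struct.pack(">ii", 5, len(raw)) + raw
    raise TypeError(f"unsupported value: {value!r}")

def encode_bottle(items, tag="YARP2"):
    """Encode the tag string, then the payload, each preceded by its length."""
    head = tag.encode("ascii") + b"\x00"
    payload = b"".join(encode_item(v) for v in items)
    return (struct.pack(">i", len(head)) + head
            + struct.pack(">i", len(payload)) + payload)

first = encode_bottle([999, "hello world"])
second = encode_bottle([1.2, -15.0, 88.111, "foo", 1, 0, 1, 1])

# The payload length is the 4-byte integer that follows the 10-byte tag block.
print(struct.unpack_from(">i", first, 10)[0], struct.unpack_from(">i", second, 10)[0])
```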


Why two formats?  They suit different situations.  The binary format
is much easier to parse and can be read in a single block.  It is also
very extensible.  The text-based format is a bit more awkward to parse,
but easier for a human to read and write.


\subsection{Binary format}

The binary format begins with a 4-byte big-endian integer encoding a
length, followed by a sequence of bytes of that length.  The sequence
of bytes forms a null terminated string.  Any string is acceptable.
Older versions of the YARP protocol have variability in this string.
The current C++ implementation sets this string to ``YARP2''.

This is followed by another 4-byte big-endian integer encoding
a length, followed by a sequence of bytes of that length.  This
sequence of bytes is the ``payload''.  It consists of a
sequence of units of the following form:

\begin{itemize} \pflist
  \item A 4-byte big-endian integer encoding an identifier.
  \item Some sequence of bytes, specified and interpreted
    in a way that is particular to that identifier.
\end{itemize}

\noindent
Here is a partial list of identifiers and the sequence of bytes
that follows them:

\begin{figure}[h]
\begin{tabular}{|l|p{5in}|}
\hline
{\bf code} & {\bf content} \\ \hline\hline
1 & Followed by an integer.  Specifically, a 32-bit big-endian integer. \\
2 & Followed by a floating point number.  Specifically, a 32-bit? double (what standard?) \\
5 & Followed by a string, represented as a length (32-bit big-endian integer) and
    sequence of bytes (null-terminated) \\
\hline
\end{tabular}
\end{figure}
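A decoder for the three identifiers listed above might look like the
following sketch.  The function names are hypothetical.  As the table
notes, the floating point representation is still undecided; the code
assumes a 4-byte big-endian IEEE 754 value.  Strings are assumed to be
ASCII, and any other identifier is rejected.

```python
import struct

def read_int(data, pos):
    """Read a 4-byte big-endian integer; return (value, next position)."""
    (value,) = struct.unpack_from(">i", data, pos)
    return value, pos + 4

def decode_bottle(data):
    """Return (tag, items) for a binary bottle using identifiers 1, 2 and 5."""
    n, pos = read_int(data, 0)
    tag = data[pos:pos + n].rstrip(b"\x00").decode("ascii")
    pos += n
    size, pos = read_int(data, pos)
    end = pos + size
    items = []
    while pos < end:
        code, pos = read_int(data, pos)
        if code == 1:
            value, pos = read_int(data, pos)
        elif code == 2:
            # Assumption: 4-byte IEEE 754 float, big-endian.
            (value,) = struct.unpack_from(">f", data, pos)
            pos += 4
        elif code == 5:
            n, pos = read_int(data, pos)
            value = data[pos:pos + n].rstrip(b"\x00").decode("ascii")
            pos += n
        else:
            raise ValueError(f"unknown identifier {code}")
        items.append(value)
    return tag, items

# Hand-built bytes for the bottle: 999 "hello world"
sample = (struct.pack(">i", 6) + b"YARP2\x00"
          + struct.pack(">i", 28)
          + struct.pack(">ii", 1, 999)
          + struct.pack(">ii", 5, 12) + b"hello world\x00")
tag, items = decode_bottle(sample)
print(tag, items)
```

Because every unit starts with its identifier, a decoder can stop
cleanly at the first identifier it does not recognise, which is one
reason the binary format is easy to extend.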


\subsection{Text format}

Bottles can also be read and written in a text format, a sequence of
bytes (not including carriage returns or line feeds) terminated by a
carriage return or line feed.  Implementations should use UNIX-style
line breaks where possible, but accept as broad a range of styles as
possible.

The text is a sequence of space-separated items of the following form:

\begin{figure}[h]
\begin{tabular}{|l|p{5in}|}
\hline
{\bf pattern} & {\bf content} \\ \hline\hline
\verb|[+-]?[0-9]+| & an integer \\
\verb|[+-]?[0-9]*\.[0-9]*| & a floating point number \\
\verb|".*"| & a string -- needs appropriate internal quoting (what standard?) \\
\hline
\end{tabular}
\end{figure}
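The patterns above translate fairly directly into a small parser.  The
sketch below is only one possible reading of the text format: the
function name is hypothetical, the decimal point is matched literally,
a lone ``.'' is rejected, and since the quoting standard is still open,
strings are assumed to contain no embedded double quotes or escapes.

```python
import re

INTEGER = re.compile(r"[+-]?[0-9]+")
FLOAT = re.compile(r"[+-]?[0-9]*\.[0-9]*")
# A token is either a double-quoted string or a run of non-space characters.
# Assumption: strings contain no embedded quotes or escape sequences.
TOKEN = re.compile(r'"[^"]*"|\S+')

def parse_text_bottle(line):
    """Parse one line of the text format into a list of Python values."""
    items = []
    for token in TOKEN.findall(line.rstrip("\r\n")):
        if len(token) >= 2 and token.startswith('"') and token.endswith('"'):
            items.append(token[1:-1])
        elif INTEGER.fullmatch(token):
            items.append(int(token))
        elif FLOAT.fullmatch(token) and any(c.isdigit() for c in token):
            items.append(float(token))
        else:
            raise ValueError(f"unrecognised item: {token!r}")
    return items

print(parse_text_bottle('999 "hello world"\n'))
print(parse_text_bottle('1.2 -15.0 88.111 "foo" 1 0 1 1\r\n'))
```

Trailing carriage returns and line feeds are stripped before parsing,
so both UNIX-style and other line endings are accepted.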


\section{Missing stuff}

Extending, vectors, etc.