.TH SIMD-VITERBI 3
.SH NAME
create_viterbi27, init_viterbi27, update_viterbi27, chainback_viterbi27,
delete_viterbi27, create_viterbi29, init_viterbi29, update_viterbi29,
chainback_viterbi29, delete_viterbi29 \- IA32 SIMD-assisted Viterbi decoders
.SH SYNOPSIS
.nf
.ft B
#include "viterbi27.h"
struct v27 *create_viterbi27(int blocklen);
int init_viterbi27(struct v27 *vp,int starting_state);
int update_viterbi27(struct v27 *vp,unsigned char sym1,unsigned char sym2);
int chainback_viterbi27(struct v27 *vp, unsigned char *data,unsigned int nbits,unsigned int endstate);
void delete_viterbi27(struct v27 *vp);
.ft R
.fi
.sp
.nf
.ft B
#include "viterbi29.h"
struct v29 *create_viterbi29(int blocklen);
int init_viterbi29(struct v29 *vp,int starting_state);
int update_viterbi29(struct v29 *vp,unsigned char sym1,unsigned char sym2);
int chainback_viterbi29(struct v29 *vp, unsigned char *data,unsigned int nbits,unsigned int endstate);
void delete_viterbi29(struct v29 *vp);
.ft R
.fi
.SH DESCRIPTION
These functions implement high performance Viterbi decoders for two
convolutional codes: a rate 1/2 constraint length 7 (k=7) code
("viterbi27") and a rate 1/2 k=9 code ("viterbi29"). The decoders use
the Intel IA32 SIMD instruction sets, if available, to improve
performance.
.PP
There are three different IA32 SIMD instruction sets. The most common
is MMX, first implemented on later Intel Pentiums and then on the
Intel Pentium II and most Intel clones (AMD K6, Transmeta Crusoe,
etc). SSE was introduced on the Pentium III and later implemented in
the AMD Athlon 4 (AMD calls it "3D Now! Professional"). Most recently,
SSE2 was introduced in the Intel Pentium 4. As of mid 2001, there are
no other known implementations of SSE2.
.PP
Four separate libraries implement the decoders for four different
instruction sets. \fBlibviterbi.a\fR uses no SIMD instructions; it is
provided for source-code compatibility with non-IA32 machines.
\fBlibviterbimmx.a\fR is for IA32 machines that support the MMX
instructions; \fBlibviterbisse.a\fR is for machines with the SSE
instructions, and \fBlibviterbisse2.a\fR is for machines with SSE2
support. The function names and calling conventions are the same for
all four versions, although the sizes of certain internal data
structures differ.
.SH USAGE
Two versions of each function are provided, one for the k=7 code and
another for the k=9 code. In the following discussion the k=7 code
will be assumed. To use the k=9 code, simply change all references to
"viterbi27" to "viterbi29".
.PP
Before Viterbi decoding can begin, an instance must first be created with
\fBcreate_viterbi27\fR. This function creates an instance of
\fBstruct v27\fR that contains the path metrics and the branch
decisions. \fBcreate_viterbi27\fR takes one argument that gives the
length of the data block in bits. You \fImust not\fR attempt to
decode a block longer than the length given to \fBcreate_viterbi27\fR.
.PP
After a decoder instance is created, and before decoding a new frame,
\fBinit_viterbi27\fR must be called to reset the decoder state.
It accepts the instance pointer returned by
\fBcreate_viterbi27\fR and the initial starting state of the
convolutional encoder (usually 0). If the initial starting state is unknown or
incorrect, the decoder will still function but the decoded data may be
incorrect at the start of the block.
.PP
Each pair of received symbols is processed with a call to
\fBupdate_viterbi27\fR. Each symbol is expected to range from 0
through 15, with 0 corresponding to a "strong 0" and 15 corresponding
to a "strong 1". The caller is responsible for determining the proper
pairing of input symbols (commonly known as decoder symbol phasing).
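.PP
How received samples are scaled into the 0 through 15 range is left to
the caller. The following is a minimal sketch, not part of the library,
assuming samples normalized so that -1.0 is a transmitted 0 and +1.0 is
a transmitted 1. The helper name quantize_symbol is hypothetical.
.PP
.nf
```c
/* Hypothetical helper, not part of the library: quantize one received
   sample to a soft symbol in 0..15. Assumes samples are normalized so
   that -1.0 is a transmitted 0 and +1.0 is a transmitted 1. */
static unsigned char quantize_symbol(double sample)
{
    double scaled = (sample + 1.0) * 7.5;   /* map [-1,+1] onto [0,15] */

    if (scaled < 0.0)
        scaled = 0.0;                       /* clip to strong 0 */
    if (scaled > 15.0)
        scaled = 15.0;                      /* clip to strong 1 */
    return (unsigned char)(scaled + 0.5);   /* round to nearest level */
}
```
.fi
.PP
Each pair of quantized values would then be passed as sym1 and sym2 to
\fBupdate_viterbi27\fR.
.PP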
At the end of the block, the data is recovered with a call to
\fBchainback_viterbi27\fR. The arguments are the pointer to the
decoder instance, a pointer to a user-supplied buffer into which the
decoded data is to be written, the number of data bits (not bytes)
that are to be decoded, and the terminal state of the convolutional
encoder at the end of the frame (usually 0). If the terminal state is
incorrect or unknown, the decoded data bits at the end of the frame
may be unreliable. The decoded data is written in big-endian order,
i.e., the first bit in the frame is written into the high order bit of
the first byte in the buffer. If the frame is not an integral number
of bytes long, the low order bits of the last byte in the frame will
be unused.
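.PP
The bit packing described above can be sketched with two small helpers.
These are illustrative only and not part of the library; the names
decoded_bytes and get_bit are hypothetical.
.PP
.nf
```c
#include <stddef.h>

/* Bytes the caller should allocate to hold nbits decoded bits.
   The low-order bits of the last byte are unused when nbits is
   not a multiple of 8. */
static size_t decoded_bytes(unsigned int nbits)
{
    return ((size_t)nbits + 7) / 8;
}

/* Return bit number index (0 = first bit of the frame) from a buffer
   packed with the first bit in the high-order bit of the first byte. */
static int get_bit(const unsigned char *data, unsigned int index)
{
    return (data[index / 8] >> (7 - index % 8)) & 1;
}
```
.fi
.PP
For a 1000-bit frame, decoded_bytes gives 125, which would be the
minimum size of the buffer handed to \fBchainback_viterbi27\fR.
.PP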
Note that the decoders assume the use of a tail, i.e., the encoding
and transmission of a sufficient number of padding bits beyond the end
of the user data to force the convolutional encoder into the known
terminal state given to \fBchainback_viterbi27\fR. The k=7 code
uses 6 tail bits (12 tail symbols) and the k=9 code uses 8 tail
bits (16 tail symbols).
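.PP
To illustrate how a tail forces the encoder into state 0, here is a
minimal sketch of a rate 1/2 k=7 encoder that appends 6 zero tail bits.
It is not taken from the library. The generator polynomials 171 and 133
(octal) are a widely used k=7 pair and are assumed here for
illustration only; this page does not state which polynomials, bit
order, or symbol polarity the library expects, so check the library
headers before pairing such an encoder with the decoder.
.PP
.nf
```c
#include <stddef.h>

/* Assumed parameters, for illustration only. */
enum { K = 7, POLY_A = 0171, POLY_B = 0133 };

/* Parity (XOR of all bits) of x. */
static unsigned char parity(unsigned int x)
{
    unsigned char p = 0;

    while (x) {
        p ^= (unsigned char)(x & 1u);
        x >>= 1;
    }
    return p;
}

/* Encode nbits user bits (one bit per array element, 0 or 1) followed
   by K-1 zero tail bits. Writes 2*(nbits+K-1) hard symbols (0 or 1)
   and returns the final encoder state. */
static unsigned int encode_with_tail(const unsigned char *bits,
                                     size_t nbits,
                                     unsigned char *symbols)
{
    unsigned int sr = 0;    /* shift register; starting state 0 */
    size_t i;

    for (i = 0; i < nbits + (K - 1); i++) {
        /* after the user data, feed zeros as the tail */
        unsigned int bit = (i < nbits) ? (bits[i] & 1u) : 0u;

        sr = ((sr << 1) | bit) & ((1u << K) - 1);
        symbols[2 * i] = parity(sr & POLY_A);
        symbols[2 * i + 1] = parity(sr & POLY_B);
    }
    /* the state is the most recent K-1 input bits */
    return sr & ((1u << (K - 1)) - 1);
}
```
.fi
.PP
Because the last 6 input bits are zero, the returned state is always 0,
which is the terminal state a caller would then pass to
\fBchainback_viterbi27\fR.
.PP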
The tail bits are not included in the length arguments to
\fBcreate_viterbi27\fR and \fBchainback_viterbi27\fR. For example, if
the block contains 1000 user bits, then this would be the length
parameter given to \fBcreate_viterbi27\fR and
\fBchainback_viterbi27\fR, and \fBupdate_viterbi27\fR would be called
a total of 1006 times, the last 6 with the 12 encoded symbols
representing the tail bits.
.PP
After the call to \fBchainback_viterbi27\fR, the decoder may be reset
with a call to \fBinit_viterbi27\fR and another block can be decoded.
Alternatively, \fBdelete_viterbi27\fR can be called to free all resources
used by the Viterbi decoder.
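.PP
The complete sequence described in this section can be sketched as
follows. This is an illustration under stated assumptions, not code
from the library: decode_frame, v27_symbols_needed and the
HAVE_VITERBI27 build flag are hypothetical names, and treating a NULL
return from \fBcreate_viterbi27\fR as a failure is an assumption. The
library calls are compiled only when HAVE_VITERBI27 is defined, so the
frame arithmetic can be checked without the library installed.
.PP
.nf
```c
#include <stddef.h>

enum { V27_TAIL_BITS = 6 };     /* tail bits for the k=7 code */

/* Number of symbols the caller must supply for nbits user bits. */
static size_t v27_symbols_needed(unsigned int nbits)
{
    return 2 * ((size_t)nbits + V27_TAIL_BITS);
}

#ifdef HAVE_VITERBI27           /* hypothetical flag: library available */
#include "viterbi27.h"

/* Decode one frame. symbols holds v27_symbols_needed(nbits) values in
   the range 0..15; data must hold at least (nbits + 7) / 8 bytes.
   Returns 0 on success, -1 if no decoder could be created. */
static int decode_frame(const unsigned char *symbols, unsigned int nbits,
                        unsigned char *data)
{
    struct v27 *vp = create_viterbi27((int)nbits);
    size_t pairs = v27_symbols_needed(nbits) / 2;
    size_t i;

    if (vp == NULL)             /* assumption: NULL signals failure */
        return -1;
    init_viterbi27(vp, 0);      /* encoder assumed to start in state 0 */
    for (i = 0; i < pairs; i++)
        update_viterbi27(vp, symbols[2 * i], symbols[2 * i + 1]);
    chainback_viterbi27(vp, data, nbits, 0);    /* tail ends in state 0 */
    delete_viterbi27(vp);
    return 0;
}
#endif
```
.fi
.PP
A program that decodes many frames of the same length would more likely
create the decoder once and call \fBinit_viterbi27\fR before each frame,
rather than creating and deleting an instance per frame as this sketch
does.
.PP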
When the MMX or SSE version of the decoder is used in a program
that uses floating point, you \fImust\fR insert \fBemms\fR
instructions as needed to avoid interference with your floating point
computations. The MMX registers are only used by
\fBupdate_viterbi27\fR, so if you perform floating point between each
call to this function you should insert the \fBemms\fR instruction
immediately after the call to \fBupdate_viterbi27\fR. If you perform
floating point only after the end of the frame, you may defer the
instruction until after \fBchainback_viterbi27\fR has been called.
The \fBemms\fR instruction can be inserted with the statement \fBasm("emms");\fR
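.PP
One way to keep the instruction out of builds for other processors is
to wrap it in a macro. The following is a sketch only: the macro name
CLEAR_MMX_STATE and the function scale_metric are hypothetical, and the
guard assumes a GCC-compatible compiler that defines __i386__ or
__x86_64__.
.PP
.nf
```c
/* Hypothetical wrapper: emit emms on x86 with a GCC-compatible
   compiler, and expand to nothing elsewhere. */
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
#define CLEAR_MMX_STATE() __asm__ volatile ("emms")
#else
#define CLEAR_MMX_STATE() ((void)0)
#endif

/* Example of floating point that follows a decoder call: clear the
   MMX state first, then do the arithmetic. */
static double scale_metric(int metric)
{
    CLEAR_MMX_STATE();
    return metric / 16.0;
}
```
.fi
.PP
In a real program the macro would be placed after the call to
\fBupdate_viterbi27\fR or \fBchainback_viterbi27\fR, as described above.
.PP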
The SSE2 version uses the XMM registers. These do not interfere with the
x87 floating point stack, so the \fBemms\fR instructions are not
necessary with this version.
.SH RETURN VALUES
\fBcreate_viterbi27\fR returns a pointer to the structure containing
the decoder state. \fBupdate_viterbi27\fR returns the amount by which
the decoder path metrics were normalized in the current step. Only the
SSE and SSE2 versions perform normalization, as the C and MMX versions
use modulo arithmetic.
.SH AUTHOR
Phil Karn, KA9Q (karn@ka9q.net)