.TH SIMD-VITERBI 3
.SH NAME
create_viterbi27, set_viterbi27_polynomial, init_viterbi27, update_viterbi27_blk,
chainback_viterbi27, delete_viterbi27,
create_viterbi29, set_viterbi29_polynomial, init_viterbi29, update_viterbi29_blk,
chainback_viterbi29, delete_viterbi29,
create_viterbi39, set_viterbi39_polynomial, init_viterbi39, update_viterbi39_blk,
chainback_viterbi39, delete_viterbi39,
create_viterbi615, set_viterbi615_polynomial, init_viterbi615, update_viterbi615_blk,
chainback_viterbi615, delete_viterbi615 \- IA32 SIMD-assisted Viterbi decoders
.SH SYNOPSIS
.nf
.ft B
#include "fec.h"
void *create_viterbi27(int blocklen);
void set_viterbi27_polynomial(int polys[2]);
int init_viterbi27(void *vp, int starting_state);
int update_viterbi27_blk(void *vp, unsigned char syms[], int nbits);
int chainback_viterbi27(void *vp, unsigned char *data, unsigned int nbits, unsigned int endstate);
void delete_viterbi27(void *vp);
.ft R
.fi
.sp
.nf
.ft B
void *create_viterbi29(int blocklen);
void set_viterbi29_polynomial(int polys[2]);
int init_viterbi29(void *vp, int starting_state);
int update_viterbi29_blk(void *vp, unsigned char syms[], int nbits);
int chainback_viterbi29(void *vp, unsigned char *data, unsigned int nbits, unsigned int endstate);
void delete_viterbi29(void *vp);
.ft R
.fi
.sp
.nf
.ft B
void *create_viterbi39(int blocklen);
void set_viterbi39_polynomial(int polys[3]);
int init_viterbi39(void *vp, int starting_state);
int update_viterbi39_blk(void *vp, unsigned char syms[], int nbits);
int chainback_viterbi39(void *vp, unsigned char *data, unsigned int nbits, unsigned int endstate);
void delete_viterbi39(void *vp);
.ft R
.fi
.sp
.nf
.ft B
void *create_viterbi615(int blocklen);
void set_viterbi615_polynomial(int polys[6]);
int init_viterbi615(void *vp, int starting_state);
int update_viterbi615_blk(void *vp, unsigned char syms[], int nbits);
int chainback_viterbi615(void *vp, unsigned char *data, unsigned int nbits, unsigned int endstate);
void delete_viterbi615(void *vp);
.ft R
.fi
.SH DESCRIPTION
These functions implement high performance Viterbi
decoders for four
convolutional codes: a rate 1/2 constraint length 7 (k=7) code
("viterbi27"), a rate 1/2 k=9 code ("viterbi29"),
a rate 1/3 k=9 code ("viterbi39") and a rate 1/6 k=15 code ("viterbi615").
The decoders use the Intel IA32 or PowerPC SIMD instruction sets, if available,
to improve decoding speed.
.sp
On the IA32 there are three different SIMD instruction sets. The first
and most common is MMX, introduced on later Intel Pentiums and then on
the Intel Pentium II and most Intel clones (AMD K6, Transmeta Crusoe,
etc). SSE was introduced on the Pentium III and later implemented in
the AMD Athlon 4 (AMD calls it "3D Now! Professional"). Most
recently, SSE2 was introduced in the Intel Pentium 4, and has been
adopted by more recent AMD CPUs. The presence of SSE2 implies the
existence of SSE, which in turn implies MMX.
.sp
Altivec is the PowerPC SIMD instruction set. It is roughly comparable
to SSE2. Altivec was introduced to the general public in the Apple
Macintosh G4; it is also present in the G5. Altivec is actually a
Motorola trademark; Apple calls it "Velocity Engine" and IBM calls it
"VMX". All refer to the same thing.
.sp
When built for the IA32 or PPC architectures, the functions
automatically use the most powerful SIMD instruction set available. If
no SIMD instructions are available, or if the library is built for a
non-IA32, non-PPC machine, a portable C version is executed
instead.
.SH USAGE
Four versions of each function are provided, one for each code.
In the following discussion, change "viterbi" to "viterbi27", "viterbi29", "viterbi39"
or "viterbi615" as desired.
.sp
Before Viterbi decoding can begin, an instance must first be created with
\fBcreate_viterbi()\fR. This function creates and returns a pointer to
an internal control structure
containing the path metrics and the branch
decisions. \fBcreate_viterbi()\fR takes one argument that gives the
length of the data block in bits.
You \fImust not\fR attempt to
decode a block longer than the length given to \fBcreate_viterbi()\fR.
.sp
Before decoding a new frame,
\fBinit_viterbi()\fR must be called to reset the decoder state.
It accepts the instance pointer returned by
\fBcreate_viterbi()\fR and the initial starting state of the
convolutional encoder (usually 0). If the initial starting state is unknown or
incorrect, the decoder will still function but the decoded data may be
incorrect at the start of the block.
.sp
Blocks of received symbols are processed with calls to
\fBupdate_viterbi_blk()\fR. The \fBnbits\fR parameter specifies the
number of \fIdata bits\fR (not channel symbols) represented by the
\fBsyms\fR buffer. (For rate 1/2 codes, the number of symbols in
\fBsyms\fR is twice \fBnbits\fR, and so on.)
Each symbol is expected to range
from 0 through 255, with 0 corresponding to a "strong 0" and 255
corresponding to a "strong 1". The caller is responsible for
determining the proper pairing of input symbols (commonly known as
decoder symbol phasing).
.sp
At the end of the block, the data is recovered with a call to
\fBchainback_viterbi()\fR. The arguments are the pointer to the
decoder instance, a pointer to a user-supplied buffer into which the
decoded data is to be written, the number of data bits (not bytes)
that are to be decoded, and the terminal state of the convolutional
encoder at the end of the frame (usually 0). If the terminal state is
incorrect or unknown, the decoded data bits at the end of the frame
may be unreliable. The decoded data is written in big-endian order,
i.e., the first bit in the frame is written into the high order bit of
the first byte in the buffer.
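.sp
The bit order just described can be illustrated with a minimal sketch.
It is not part of the library: frame_bytes() and get_bit() are
hypothetical helper names, and the code only assumes that the buffer was
filled most significant bit first, as stated above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: number of bytes needed to hold nbits decoded bits,
 * i.e. the minimum size of the buffer handed to chainback_viterbi(). */
unsigned int frame_bytes(unsigned int nbits)
{
    return (nbits + 7) / 8;
}

/* Hypothetical helper: return bit i of the frame (i = 0 is the first bit
 * decoded), reading each byte from its high order bit downwards. */
int get_bit(const unsigned char *data, unsigned int i)
{
    assert(data != NULL);
    return (data[i >> 3] >> (7 - (i & 7))) & 1;
}
```

For a 12-bit frame, frame_bytes(12) is 2, and bits 8 through 11 are read
from the top four bits of the second byte.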
If the frame is not an integral number
of bytes long, the low order bits of the last byte in the frame will
be unused.
.sp
Note that the decoders assume the use of a tail, i.e., the encoding
and transmission of a sufficient number of padding bits beyond the end
of the user data to force the convolutional encoder into the known
terminal state given to \fBchainback_viterbi()\fR. The tail is
always one bit less than the constraint length of the code, so the k=7
code uses 6 tail bits (12 tail symbols), the k=9 codes use 8 tail bits
(16 tail symbols at rate 1/2, 24 at rate 1/3) and the k=15 code uses
14 tail bits (84 tail symbols).
.sp
The tail bits are not included in the length arguments to
\fBcreate_viterbi()\fR and \fBchainback_viterbi()\fR. For example, if
the block contains 1000 user bits, then 1000 is the length
parameter given to \fBcreate_viterbi27()\fR and
\fBchainback_viterbi27()\fR, and \fBupdate_viterbi27_blk()\fR is called
with a total of 2012 symbols (\fBnbits\fR totalling 1006), the last 12
encoded symbols representing the tail bits.
.sp
After the call to \fBchainback_viterbi()\fR, the decoder may be reset
with a call to \fBinit_viterbi()\fR and another block can be decoded.
Alternatively, \fBdelete_viterbi()\fR can be called to free all resources
used by the Viterbi decoder.
.sp
The \fBset_viterbi_polynomial()\fR function allows the use of code
generator polynomials other than the defaults.
Although only one set of polynomials is generally
used with each code, there are different conventions as to their order and
symbol polarity, and these functions simplify their use.
.sp
The default polynomials for the viterbi27 routines
are those of the NASA-JPL convention \fIwithout\fR symbol inversion.
The NASA-JPL convention normally inverts the first symbol.
The CCSDS/NASA-GSFC convention swaps the two symbols and inverts the second.
.sp
To set the NASA-JPL convention with symbol inversion:
.sp
.nf
.ft B
int polys[2] = { -V27POLYA, V27POLYB };
set_viterbi27_polynomial(polys);
.ft R
.fi
.sp
and to set the CCSDS convention with symbol inversion:
.sp
.nf
.ft B
int polys[2] = { V27POLYB, -V27POLYA };
set_viterbi27_polynomial(polys);
.ft R
.fi
.sp
The default polynomials for the viterbi615 routines
are those used by the Cassini spacecraft \fIwithout\fR
symbol inversion. Mars Pathfinder (MPF) and STEREO
swap the third and fourth polynomials.
Both conventions invert the
first, third and fifth symbols. Refer to fec.h for the polynomial constant definitions.
.sp
To set the Cassini convention with symbol inversion, do the following:
.sp
.nf
.ft B
int polys[6] = { -V615POLYA, V615POLYB, -V615POLYC, V615POLYD, -V615POLYE, V615POLYF };
set_viterbi615_polynomial(polys);
.ft R
.fi
.sp
and to set the MPF/STEREO convention with symbol inversion:
.sp
.nf
.ft B
int polys[6] = { -V615POLYA, V615POLYB, -V615POLYD, V615POLYC, -V615POLYE, V615POLYF };
set_viterbi615_polynomial(polys);
.ft R
.fi
.sp
For performance reasons, calling this function changes the code
generator polynomials for \fIall\fR instances of the corresponding Viterbi decoder,
including those already created.
.SH ERROR PERFORMANCE
These decoders have all been extensively tested and found to provide
performance consistent with that expected for soft-decision Viterbi
decoding with 8-bit symbols.
.sp
Due to internal differences, the implementations
vary slightly in error performance.
In
general, the portable C versions exhibit the best error performance
because they use full-sized branch metrics, and the MMX versions
exhibit the worst because they use 8-bit branch metrics with modulo
comparisons. The SSE, SSE2 and Altivec implementations of the r=1/2 k=7 and
r=1/2 k=9 codes use unsigned
8-bit branch metrics, and are almost as good as the C versions. The
r=1/3 k=9 and r=1/6 k=15 codes are implemented with 16-bit path metrics in all SIMD
versions.
.SH DIRECT ACCESS TO SPECIFIC FUNCTION VERSIONS
Calling the functions listed above automatically calls the appropriate
version of the function depending on the CPU type and available SIMD
instructions. A particular version can also be called directly by
appending the appropriate suffix to the function name. The available
suffixes are "_mmx", "_sse", "_sse2", "_av" and "_port", for the MMX,
SSE, SSE2, Altivec and portable versions, respectively. For example,
the SSE2 version of the update_viterbi27_blk() function can be invoked
as update_viterbi27_blk_sse2().
.sp
Naturally, the _av functions are only available on the PowerPC and the
_mmx, _sse and _sse2 versions are only available on IA32. Calling
a SIMD-enabled function on a CPU that doesn't support the appropriate
set of instructions will result in an illegal instruction exception.
.SH RETURN VALUES
\fBcreate_viterbi\fR returns a pointer to the structure containing
the decoder state. The \fBinit\fR, \fBupdate\fR and \fBchainback\fR
functions return -1 on error, 0 otherwise. The \fBset_viterbi_polynomial\fR
and \fBdelete_viterbi\fR functions return no value.
.SH AUTHOR & COPYRIGHT
Phil Karn, KA9Q (karn@ka9q.net)
.SH LICENSE
This software may be used under the terms of the GNU Lesser General Public License (LGPL).
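.SH EXAMPLE
The tail arithmetic and the signed-polynomial convention described under
USAGE can be sketched as follows. This is an illustration under stated
assumptions, not code from the library: frame_symbols(), parity() and
encode27() are hypothetical names, the values given for POLYA and POLYB are
assumed to match V27POLYA and V27POLYB (check fec.h), and the
shift-register bit order is an assumption that should be verified against
the library before the output is fed to \fBupdate_viterbi27_blk()\fR.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Assumed values of V27POLYA and V27POLYB; verify against fec.h. */
#define POLYA 0x6d
#define POLYB 0x4f

/* Channel symbols in one frame: (user bits + tail bits) * symbols per bit.
 * The tail is k-1 bits and a rate 1/n code emits n symbols per bit. */
size_t frame_symbols(size_t user_bits, int k, int n)
{
    return (user_bits + (size_t)(k - 1)) * (size_t)n;
}

/* XOR of all bits in x. */
int parity(unsigned int x)
{
    int p = 0;
    while (x) {
        p ^= 1;
        x &= x - 1;   /* clear the lowest set bit */
    }
    return p;
}

/* Hypothetical test encoder for the k=7 rate 1/2 code.  Reads user_bits
 * bits from data (high order bit first), appends 6 zero tail bits, and
 * writes hard-decision symbols: 0 for a 0, 255 for a 1.  A negative entry
 * in polys inverts that symbol, mirroring the convention used with
 * set_viterbi27_polynomial().  syms must hold at least
 * frame_symbols(user_bits, 7, 2) bytes.  Returns the symbol count. */
size_t encode27(const unsigned char *data, size_t user_bits,
                const int polys[2], unsigned char *syms)
{
    unsigned int sr = 0;   /* encoder shift register, starting state 0 */
    size_t i, n = 0;
    int j;

    assert(syms != NULL);
    for (i = 0; i < user_bits + 6; i++) {
        int bit = 0;       /* tail bits are zeros */
        if (i < user_bits)
            bit = (data[i >> 3] >> (7 - (i & 7))) & 1;
        sr = ((sr << 1) | (unsigned int)bit) & 0x7f;
        for (j = 0; j < 2; j++) {
            int s = parity(sr & (unsigned int)abs(polys[j])) ^ (polys[j] < 0);
            syms[n++] = s ? 255 : 0;
        }
    }
    return n;
}
```

For a frame of 1000 user bits, frame_symbols(1000, 7, 2) gives the 2012
symbols mentioned under USAGE. A real receiver would supply soft symbols
spread across the 0 through 255 range rather than the two extremes used here.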