📄 ctc_trellis_encoder.asm

📁 CTC encoding and decoding module for data blocks, used in mobile-communications PHY design
💻 ASM
📖 Page 1 of 2
/*****************************************************************************
Copyright (c) 2005 Analog Devices.  All Rights Reserved.
Developed by Analog Devices Australia - Unit 3, 97 Lewis Road,
Wantirna, Victoria, Australia, 3152.  Email: ada.info@analog.com

THIS SOFTWARE IS PROPRIETARY & CONFIDENTIAL.  By using this module you
agree to the terms of the associated Analog Devices License Agreement.
******************************************************************************
$Revision: 2438 $
$Date: 2005-09-13 15:51:40 +1000 (Tue, 13 Sep 2005) $

Project:        IEEE 802.16 Library
Title:          CTC trellis encoder
Author(s):      Michael Lopez (michael.lopez@analog.com)
Revised by:

Description:
                CTC encoder as described in section 8.4.9.2.3 of
                [1].  This function provides only the constituent
                encoder along with circulation state functionality.

References:
                [1] IEEE P802.16-2004, October 2004
******************************************************************************
Target Processor:           ADSP-TS201
Target Tools Revision:      easmts 1.6.0.11
*****************************************************************************/

/*
 * void ctc_trellis_encoder(ctc_encoder_params_t *params,
 *                          unsigned num_input_bytes)
 */

/*
 * Algorithm description:
 *
 * The specification says to do the following:
 *  1) Encode inputs A,B starting from state 0
 *  2) Look at output state to get circulation state
 *  3) Encode inputs A,B starting from circulation state
 *
 * Instead, we implement as:
 *  1) Encode input a starting from state 0 (compute block X)
 *  2) Encode input b starting from state 0 (compute block Y)
 *  3) XOR outputs and final states of (1) and (2) together
 *  4) Find circulation state using final state in (3)
 *  5) Encode input 0 starting from circulation state (using
 *     lookup table) and XOR with output of (3)
 *
 * The main loop processes (1) in compute block X and (2) in compute
 * block Y.  It first implements the denominator and then the numerators.
 * The most difficult part is implementing the denominators.
 *
 * The transfer functions all have denominator H(D) = 1+D+D^3.
 * Instead of implementing 1/H(D), we instead implement the equivalent
 * expanded polynomial H^31(D)/H^32(D).  This allows us to encode
 * 32 bits at once.
 *
 * 1/H(D) = H^31(D) / H^32(D)
 *
 * H^32(D) = 1 + D^32 + D^96
 *
 * H^31(D) =   1   + D    + D^2  +      + D^4  +
 *            D^7  + D^8  + D^9  +      + D^11 +
 *            D^14 + D^15 + D^16 +      + D^18 +
 *            D^21 + D^22 + D^23 +      + D^25 +
 *            D^28 + D^29 + D^30 +
 *            D^33 + D^34 + D^35 +      + D^37 +
 *            D^40 + D^41 + D^42 +      + D^44 +
 *            D^47 + D^48 + D^49 +      + D^51 +
 *            D^54 + D^55 + D^56 +      + D^58 +
 *            D^61 + D^62 + D^63 +      + D^65 +
 *            D^68 + D^69 + D^70 +      + D^72 +
 *            D^75 + D^76 + D^77 +      + D^79 +
 *            D^82 + D^83 + D^84 +      + D^86 +
 *            D^89 + D^90 + D^91 +      + D^93
 *
 * Since the largest delay is 93 and we apply to 32 bits at a time, this
 * expanded transfer function requires almost a full quadword (128 bits) of
 * state variables.  However, the TigerSHARC shifter usually only works on 64
 * bits at a time.  Therefore, H^31(D) is applied in three pieces:
 *   - Delays 0 to 63 operating as left shifts within the first longword
 *   - Delays 65 to 93 operating as left shifts from the second longword
 *   - Delays 33 to 63 operating as right shifts from the second longword
 *
 * After applying the denominator, we apply the numerators to get the outputs
 * and states. The required numerators for A are:
 *   Y:  1 + D^2 + D^3
 *   W:  1 + D^3
 *   s1: 1
 *   s2: D
 *   s3: D^2
 * The required numerators for B are:
 *   Y:  1 + D + D^2 + D^3
 *   W:  1 + D^2
 *   s1: 1 + D + D^2
 *   s2: 1 + D^2
 *   s3: 1
 *
 * The main part of this code gets run twice: One for the noninterleaved
 * data (a1, b1) and one for the interleaved data (a2, b2).
 */

.global _ctc_trellis_encoder;

/*
 * Arguments:
 *           j4: ctc_encoder_params_t  *params
 *           j5: unsigned               num_input_bytes (includes a1 and b1)
 *
 *     [j4 + 0]: bit32x1  *a1,
 *     [j4 + 1]: bit32x1  *b1,
 *     [j4 + 2]: bit32x1  *a2,
 *     [j4 + 3]: bit32x1  *b2,
 *     [j4 + 4]: bit32x1  *y1,
 *     [j4 + 5]: bit32x1  *w1,
 *     [j4 + 6]: bit32x1  *y2,
 *     [j4 + 7]: bit32x1  *w2
 */

#include "circ_state_tables.h"

.section program;
.align_code 4;

_ctc_trellis_encoder:

//------------------PROLOGUE-------------------------------//
    j26 = j27 - 64;             k26 = k27 - 64;;
    [j27 += -28] = cjmp;        k27 = k27 - 20;;
    q[j27 + 24] = xR27:24;      q[k27 + 16] = yR27:24;;
    q[j27 + 20] = xR31:28;      q[k27 + 12] = yR31:28;;
//---------------------------------------------------------//

/////////////////////////////////////////////////////////////////
//
// Preliminaries
//
// 1) In xr31:30, compute number of main loop iterations
//
//    The J5 input is the total number of input bytes.
//    Each iteration processes 64 bits (32 bits of A, 32 bits of B).
//    One additional iteration is required at the end (due to software
//    pipelining).
//    Therefore, the number of iterations is
//          ceil(num_input_bytes * 8 / 64) + 1
//        = floor((num_input_bytes + 15)/8)
//    Result is kept in xr31 and lc0
//    xr30 remains num_input_bytes
//
// 2) In yr28:26, compute the amount of shift that will be needed for the
//    circulation state computation.  This shift is needed because the
//    main loop processes 32 bits each of A and B per iteration, but the
//    input data may end on any multiple of 8.
//
//    For A, simply take the last 3 bits of the state.
//    If the last 3 bits of num_input_bytes are x, we need to shift the state by
//       -29,     if  x = 0
//       -(4x-3), otherwise
//    and then mask out the last 3 bits.
//    Result is kept in yxr28.
//
// 3) In xr23:20, compute (N mod 7) used for offset into circulation
//    state table
//    Result is kept in xr29
//
// 4) Miscellaneous
//     - Initializing k4 to zero makes the first main loop output dummy
//     - Initialize output pointers k6 (for y) and k7 (for w)
//     - Initialize r0, r1, r2, r5, r7, r11, r13, r15, r17, r19 to zero for main loop
//     - k5 will be -1 for noninterleaved data, 0 for interleaved data
//
/////////////////////////////////////////////////////////////////

// N mod 7                   // circ state shift         // # iterations          // misc
r30 = j5;                    yr27 = 7;;
j11:8 = Q[J4 += 4];          r19:18 = r19:18 - r19:18;   xr31 = 15;               sr17:16 = lshift r17:16 by 16;;
k11:8 = Q[J4 += 4];          yr27 = r30 and r27;                                  k5 = k31-1;;
xr29 = lshift r30 by 2 (NF); yr28 = 29;                  xr23 = 0x12492492;;
if nyale;                    do, yr28 = lshift r27 by 2; yr26 = 3;                r3:0   = r19:16;;
xr21:20 = r29 * r23;         xr22 = 7;                   xr31 = r30 + r31 (NF);   r7:4   = r19:16;;
if nyale;                    do, yr28 = r28 - r26;       r15:12 = r19:16;         xr3 = [j8 += 1];;
xr21 = r21 * r22 (I);        k7 = k9+k31;                xr31 = lshift r31 by -3; yr3 = [j9 += 1];;
k4 = k31+k31;                yr28 = -r28;                lc0 = xr31;              sr11  = lshift r11 by 16;;
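
The algorithm description in the header comment relies on the identity 1/H(D) = H^31(D)/H^32(D) over GF(2), with H(D) = 1+D+D^3. The Python sketch below is not part of the Analog Devices source; it is a small independent check of that identity. It stores each polynomial as an integer bitmask (bit i is the coefficient of D^i), and all function names (`clmul`, `gf2_pow`, `exponents`, `recursive_filter`, `expanded_filter`) are illustrative assumptions. It also assumes a zero initial state and compares the bit-serial recursion r[n] = u[n] + r[n-1] + r[n-3] with the expanded form, whose feedback taps sit only at delays 32 and 96.

```python
import random

def clmul(a, b):
    """Carry-less (GF(2)) product of two polynomials stored as bitmasks."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        b >>= 1
    return result

def gf2_pow(p, n):
    """p raised to the n-th power over GF(2), by repeated multiplication."""
    out = 1
    for _ in range(n):
        out = clmul(out, p)
    return out

def exponents(p):
    """Exponents of the nonzero terms of p, in ascending order."""
    return [i for i in range(p.bit_length()) if (p >> i) & 1]

H = 0b1011                # H(D) = 1 + D + D^3
H31 = gf2_pow(H, 31)
H32 = clmul(H31, H)

# Exponents of H^31(D), transcribed from the table in the header comment.
LISTED_H31 = [
    0, 1, 2, 4,      7, 8, 9, 11,     14, 15, 16, 18,  21, 22, 23, 25,
    28, 29, 30,
    33, 34, 35, 37,  40, 41, 42, 44,  47, 48, 49, 51,  54, 55, 56, 58,
    61, 62, 63, 65,  68, 69, 70, 72,  75, 76, 77, 79,  82, 83, 84, 86,
    89, 90, 91, 93,
]

def recursive_filter(u):
    """Bit-serial 1/H(D): r[n] = u[n] ^ r[n-1] ^ r[n-3], zero initial state."""
    r = []
    for n, bit in enumerate(u):
        v = bit
        if n >= 1:
            v ^= r[n - 1]
        if n >= 3:
            v ^= r[n - 3]
        r.append(v)
    return r

def expanded_filter(u):
    """H^31(D)/H^32(D): feed-forward taps from H^31, feedback at 32 and 96."""
    taps = exponents(H31)
    r = []
    for n in range(len(u)):
        v = 0
        for t in taps:
            if n >= t:
                v ^= u[n - t]
        if n >= 32:
            v ^= r[n - 32]
        if n >= 96:
            v ^= r[n - 96]
        r.append(v)
    return r

rng = random.Random(0)    # fixed seed so the check is repeatable
u_bits = [rng.getrandbits(1) for _ in range(256)]

print("H^32 exponents:", exponents(H32))
print("H^31 matches the listed table:", exponents(H31) == LISTED_H31)
print("filters agree:", recursive_filter(u_bits) == expanded_filter(u_bits))
```

Because the feedback in the expanded form reaches back at least 32 bits, each output bit in a 32-bit word depends only on input bits and on earlier output words. This is consistent with the comment's statement that the expansion allows 32 bits to be encoded at once.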
💿 File size: 6 K
👤 Uploaded by: dedien
📂 Category: DSP programming