📄 ctc_trellis_encoder.asm
/*****************************************************************************
Copyright (c) 2005 Analog Devices.  All Rights Reserved.
Developed by Analog Devices Australia - Unit 3, 97 Lewis Road,
Wantirna, Victoria, Australia, 3152.  Email: ada.info@analog.com

THIS SOFTWARE IS PROPRIETARY & CONFIDENTIAL.  By using this module you
agree to the terms of the associated Analog Devices License Agreement.
******************************************************************************
$Revision: 2438 $
$Date: 2005-09-13 15:51:40 +1000 (Tue, 13 Sep 2005) $

Project:        IEEE 802.16 Library
Title:          CTC trellis encoder
Author(s):      Michael Lopez (michael.lopez@analog.com)
Revised by:

Description:    CTC encoder as described in section 8.4.9.2.3 of [1].
                This function provides only the constituent encoder
                along with circulation state functionality.

References:     [1] IEEE P802.16-2004, October 2004
******************************************************************************
Target Processor:       ADSP-TS201
Target Tools Revision:  easmts 1.6.0.11
*****************************************************************************/

/*
 * void ctc_trellis_encoder(ctc_encoder_params_t *params,
 *                          unsigned num_input_bytes)
 */

/*
 * Algorithm description:
 *
 * The specification says to do the following:
 *   1) Encode inputs A,B starting from state 0
 *   2) Look at output state to get circulation state
 *   3) Encode inputs A,B starting from circulation state
 *
 * Instead, we implement it as:
 *   1) Encode input a starting from state 0 (compute block X)
 *   2) Encode input b starting from state 0 (compute block Y)
 *   3) XOR outputs and final states of (1) and (2) together
 *   4) Find circulation state using final state in (3)
 *   5) Encode input 0 starting from circulation state (using
 *      lookup table) and XOR with output of (3)
 *
 * The main loop processes (1) in compute block X and (2) in compute
 * block Y.  It first implements the denominator and then the numerators.
 * The most difficult part is implementing the denominators.
 *
 * The transfer functions all have denominator H(D) = 1+D+D^3.
 * Rather than implementing 1/H(D) directly, we implement the equivalent
 * expanded polynomial H^31(D)/H^32(D).  This allows us to encode
 * 32 bits at once.
 *
 *   1/H(D) = H^31(D) / H^32(D)
 *
 *   H^32(D) = 1 + D^32 + D^96
 *
 *   H^31(D) = 1    + D    + D^2  + D^4  +
 *             D^7  + D^8  + D^9  + D^11 +
 *             D^14 + D^15 + D^16 + D^18 +
 *             D^21 + D^22 + D^23 + D^25 +
 *             D^28 + D^29 + D^30 +
 *             D^33 + D^34 + D^35 + D^37 +
 *             D^40 + D^41 + D^42 + D^44 +
 *             D^47 + D^48 + D^49 + D^51 +
 *             D^54 + D^55 + D^56 + D^58 +
 *             D^61 + D^62 + D^63 + D^65 +
 *             D^68 + D^69 + D^70 + D^72 +
 *             D^75 + D^76 + D^77 + D^79 +
 *             D^82 + D^83 + D^84 + D^86 +
 *             D^89 + D^90 + D^91 + D^93
 *
 * Since the largest delay is 93 and we apply to 32 bits at a time, this
 * expanded transfer function requires almost a full quadword (128 bits) of
 * state variables.  However, the TigerSHARC shifter usually only works on 64
 * bits at a time.  Therefore, H^31(D) is applied in three pieces:
 *   - Delays 0 to 63 operating as left shifts within the first longword
 *   - Delays 65 to 93 operating as left shifts from the second longword
 *   - Delays 33 to 63 operating as right shifts from the second longword
 *
 * After applying the denominator, we apply the numerators to get the outputs
 * and states.  The required numerators for A are:
 *   Y:  1 + D^2 + D^3
 *   W:  1 + D^3
 *   s1: 1
 *   s2: D
 *   s3: D^2
 * The required numerators for B are:
 *   Y:  1 + D + D^2 + D^3
 *   W:  1 + D^2
 *   s1: 1 + D + D^2
 *   s2: 1 + D^2
 *   s3: 1
 *
 * The main part of this code gets run twice: once for the noninterleaved
 * data (a1, b1) and once for the interleaved data (a2, b2).
 */

.global _ctc_trellis_encoder;

/*
 * Arguments:
 *   j4: ctc_encoder_params_t *params
 *   j5: unsigned num_input_bytes (includes a1 and b1)
 *
 *   [j4 + 0]: bit32x1 *a1,
 *   [j4 + 1]: bit32x1 *b1,
 *   [j4 + 2]: bit32x1 *a2,
 *   [j4 + 3]: bit32x1 *b2,
 *   [j4 + 4]: bit32x1 *y1,
 *   [j4 + 5]: bit32x1 *w1,
 *   [j4 + 6]: bit32x1 *y2,
 *   [j4 + 7]: bit32x1 *w2
 */

#include "circ_state_tables.h"

.section program;
.align_code 4;

_ctc_trellis_encoder:

//------------------PROLOGUE-------------------------------//
    j26 = j27 - 64; k26 = k27 - 64;;
    [j27 += -28] = cjmp; k27 = k27 - 20;;
    q[j27 + 24] = xR27:24; q[k27 + 16] = yR27:24;;
    q[j27 + 20] = xR31:28; q[k27 + 12] = yR31:28;;
//---------------------------------------------------------//

/////////////////////////////////////////////////////////////////
// Preliminaries
//
// 1) In xr31:30, compute number of main loop iterations
//
//    The j5 input is the total number of input bytes.
//    Each iteration processes 64 bits (32 bits of A, 32 bits of B).
//    One additional iteration is required at the end (due to software
//    pipelining).
//    Therefore, the number of iterations is
//       ceil(num_input_bytes * 8 / 64) + 1
//       = floor((num_input_bytes + 15)/8)
//    Result is kept in xr31 and lc0
//    xr30 remains num_input_bytes
//
// 2) In yr28:26, compute the amount of shift that will be needed for the
//    circulation state computation.  This shift is needed because the
//    main loop processes 32 bits each of A and B per iteration, but the
//    input data may end on any multiple of 8.
//
//    For A, simply take the last 3 bits of the state.
//    If the last 3 bits of num_input_bytes are x, we need to shift the state by
//       -29, if x = 0
//       -(4x-3), otherwise
//    and then mask out the last 3 bits.
//    Result is kept in yxr28.
//
// 3) In xr23:20, compute (N mod 7) used for offset into circulation
//    state table
//    Result is kept in xr29
//
// 4) Miscellaneous
//    - Initializing k4 to zero makes the first main loop output dummy
//    - Initialize output pointers k6 (for y) and k7 (for w)
//    - Initialize r0, r1, r2, r5, r7, r11, r13, r15, r17, r19 to zero for main loop
//    - k5 will be -1 for noninterleaved data, 0 for interleaved data
/////////////////////////////////////////////////////////////////

// N mod 7 // circ state shift // # iterations // misc
    r30 = j5; yr27 = 7;;
    j11:8 = Q[J4 += 4]; r19:18 = r19:18 - r19:18; xr31 = 15; sr17:16 = lshift r17:16 by 16;;
    k11:8 = Q[J4 += 4]; yr27 = r30 and r27; k5 = k31-1;;
    xr29 = lshift r30 by 2 (NF); yr28 = 29; xr23 = 0x12492492;;
    if nyale; do, yr28 = lshift r27 by 2; yr26 = 3; r3:0 = r19:16;;
    xr21:20 = r29 * r23; xr22 = 7; xr31 = r30 + r31 (NF); r7:4 = r19:16;;
    if nyale; do, yr28 = r28 - r26; r15:12 = r19:16; xr3 = [j8 += 1];;
    xr21 = r21 * r22 (I); k7 = k9+k31; xr31 = lshift r31 by -3; yr3 = [j9 += 1];;
    k4 = k31+k31; yr28 = -r28; lc0 = xr31; sr11 = lshift r11 by 16;;
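
Note on the expanded denominator: the algorithm description in the file header relies on the identity 1/H(D) = H^31(D)/H^32(D) over GF(2), with H^32(D) = 1 + D^32 + D^96. The Python sketch below is not part of the Analog Devices source; it is a bit-serial check of that identity, offered as an illustration only. All helper names (poly_mul, exponents, fir, recursive) are hypothetical and introduced here, and the random 256-bit test input is an arbitrary choice.

```python
import random

def poly_mul(a, b):
    """Multiply two GF(2) polynomials held as ints (bit i = coefficient of D^i)."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        b >>= 1
    return result

def exponents(p):
    """Return the delays i whose coefficient in p is 1."""
    return [i for i in range(p.bit_length()) if (p >> i) & 1]

def fir(bits, delays):
    """Feed-forward polynomial: out[n] = XOR of bits[n - d] over all delays d."""
    return [
        sum(bits[n - d] for d in delays if n >= d) & 1
        for n in range(len(bits))
    ]

def recursive(bits, feedback_delays):
    """Apply 1/P(D) from an all-zero state, where P(D) = 1 + sum of D^d."""
    out = []
    for n, u in enumerate(bits):
        for d in feedback_delays:
            if n >= d:
                u ^= out[n - d]
        out.append(u)
    return out

H = 0b1011                      # H(D) = 1 + D + D^3
H31 = 1
for _ in range(31):
    H31 = poly_mul(H31, H)      # H^31(D)
H32 = poly_mul(H31, H)          # H^32(D)

random.seed(0)                  # arbitrary test data, fixed for repeatability
data = [random.randint(0, 1) for _ in range(256)]
direct = recursive(data, [1, 3])                           # 1/H(D), bit by bit
expanded = recursive(fir(data, exponents(H31)), [32, 96])  # H^31(D)/H^32(D)

print(exponents(H32))           # [0, 32, 96]
print(exponents(H31))           # delays listed for H^31(D) in the header
print(direct == expanded)       # True
```

Because H^32(D) only has feedback taps at delays 32 and 96, each new 32-bit output word depends on whole earlier words rather than on its own bits, which is what lets the assembly process 32 bits per iteration.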