/*******************************************************************************
Copyright(c) 2000 - 2002 Analog Devices. All Rights Reserved.
Developed by Joint Development Software Application Team, IPDC, Bangalore, India
for Blackfin DSPs ( Micro Signal Architecture 1.0 specification).
By using this module you agree to the terms of the Analog Devices License
Agreement for DSP Software.
********************************************************************************
Module Name : fir_decima_spl.asm
Label name : __fir_decima_spl
Version : 1.3
Change History :
    Version  Date        Author       Comments
    1.3      11/18/2002  Swarnalatha  Tested with VDSP++ 3.0 compiler 6.2.2 on
                                      ADSP-21535 Rev. 0.2
    1.2      11/13/2002  Swarnalatha  Tested with VDSP++ 3.0 on ADSP-21535
                                      Rev. 0.2
    1.1      03/27/2002  Nishanth     Modified to match silicon cycle count
    1.0      06/02/2001  Nishanth     Original
Description : This function implements an FIR-based decimation filter. It
produces the filtered, decimated output for a given block of
input data. The characteristics of the filter depend on the
coefficient values, the number of taps (L) and the decimation
index (M) supplied by the calling program.
The coefficients stored in vector `h[]` are applied to the
elements of vector `x[]`. A 40-bit accumulator is used for
filtering. The most significant 16 bits of each result are
stored in the output vector `y[]`, which is computed according
to the decimation index `M`.
The program demonstrates the implementation of a zero-phase
decimator.
The implementation below stops using the delay line as soon as
an output no longer requires samples older than x(0). This
avoids the overhead of unnecessarily duplicating input data.
The equation for decimation by M can be expressed as:
    y(n) = h(0) * x(n*M) + h(1) * x(n*M-1) + ... + h(L-1) * x(n*M-L+1)
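As a cross-check for the decimation equation, a minimal C reference model
might look like the sketch below. It is not the code in this module: the name
fir_decima_ref, the helper sample_at, the hist[] layout (hist[j] is assumed to
hold x(-1-j)) and the truncating, clamped 1.15 arithmetic are assumptions made
for illustration only and have not been checked against the hardware modes.

```c
#include <assert.h>
#include <stdint.h>

typedef int16_t fract16;   // assumed 1.15 fixed-point type

// Hypothetical helper: x(i) for i >= 0 comes from x[], older samples come
// from hist[], where hist[j] is assumed to hold x(-1-j).
static fract16 sample_at(const fract16 *x, const fract16 *hist, int i)
{
    return (i >= 0) ? x[i] : hist[-1 - i];
}

// Reference model of y(n) = sum over k of h(k) * x(n*M - k), k = 0..L-1.
void fir_decima_ref(const fract16 *x, const fract16 *hist, fract16 *y,
                    int Nout, const fract16 *h, int L, int M)
{
    assert(L > 0 && M > 0 && Nout >= 0);
    for (int n = 0; n < Nout; n++) {
        int64_t acc = 0;                    // stands in for the 40-bit accumulator
        for (int k = 0; k < L; k++) {
            int32_t p = (int32_t)h[k] * sample_at(x, hist, n * M - k);
            acc += 2 * (int64_t)p;          // 1.15 times 1.15 gives 1.31
        }
        int64_t q = acc / 65536;            // keep the upper 16 bits ...
        if (acc % 65536 < 0) q--;           // ... rounding toward minus infinity
        if (q > INT16_MAX) q = INT16_MAX;   // clamp to the fract16 range
        if (q < INT16_MIN) q = INT16_MIN;   // (assumed behaviour)
        y[n] = (fract16)q;
    }
}
```

With h = {0.5, 0.25} in 1.15 format and M = 2, each output is half of the
current decimated sample plus a quarter of the sample just before it.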
This implementation is divided into three stages, followed by a
delay line update.
a) The first stage computes only y(0), since all of its
samples come from the delay line (x(0) is first copied to the
delay line).
    y(0) = h(0) * x(0) + h(1) * x(-1) + ... + h(L-1) * x(-L+1)
b) The second stage computes the remaining output samples that
require the delay line, i.e. the next Ceil(L/M)-1 output
samples:
    y(1) = h(0) * x(M) + h(1) * x(M-1) + ... + h(L-1) * x(M-L+1)
    ...
    y(f) = h(0) * x(f*M) + h(1) * x(f*M-1) + ... + h(L-1) * x(f*M-L+1)
where f = Ceil(L/M) - 1.
This stage is kept separate because it uses the delay line.
It has two inner loops: one sums the terms whose inputs are in
the input buffer, the other the terms whose inputs are in the
delay line.
c) The third stage computes all the remaining output samples,
i.e. y(Ceil(L/M)) to y(Nout - 1).
d) After filtering, the delay line is updated with the last
L-1 input samples.
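The stage split and the delay line update described in a) to d) can be
sketched in C as follows. The function names and the delay line layout
(d[j] is assumed to hold x(-1-j) for the next call, newest sample first) are
hypothetical and chosen for readability; the assembly below keeps its own
layout, including the extra location for x(0).

```c
#include <assert.h>
#include <stdint.h>

typedef int16_t fract16;   // assumed 1.15 fixed-point type

// Ceil(L/M): the number of leading outputs (stages 1 and 2 together) that
// depend on the delay line according to the stage description.
int outputs_using_delay_line(int L, int M)
{
    assert(L > 0 && M > 0);
    return (L + M - 1) / M;
}

// Step d): keep the last L-1 input samples for the next call.
// Assumed layout for this sketch only: d[j] holds x(-1-j) of the next block.
void update_delay_line(fract16 *d, const fract16 *x, int Ni, int L)
{
    assert(Ni >= L - 1);
    for (int j = 0; j < L - 1; j++)
        d[j] = x[Ni - 1 - j];
}
```

For the configuration quoted under Performance (L=16, M=2), this gives 8
outputs for stages 1 and 2, leaving Nout - 8 outputs for stage 3.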
Assumptions : 1. The number of filter coefficients (L) is even, since each
output sample is filtered by two MACs simultaneously.
2. The decimation factor (M) is even, so that each sample to
be filtered is at an offset of 2 bytes from a 4-byte
boundary.
3. L > M. If L <= M, Ceil(L/M)-1 = 0, i.e. stage 2 need not
be done, but the loop using LC0 executes at least once.
4. The number of input samples is an integral multiple of the
decimation factor. This is required for the delay line to
be updated correctly.
5. There is one extra location at the end of the delay line,
to which x[0] is copied.
6. The input buffer is aligned to a 32-bit boundary and its
first 2 bytes hold a value other than the actual input.
7. The filter length L divided by 2 must be at least 3, i.e.
L/2 >= 3.
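A caller may want to verify the numeric assumptions before invoking the
routine. The check below is a hypothetical sketch (the names
fir_decima_args_ok and fir_decima_check_args are not part of this module) and
covers only assumptions 1 to 4 and 7. The buffer layout and alignment
assumptions (5 and 6) are not checked.

```c
#include <assert.h>
#include <stdbool.h>

// Hypothetical pre-call check mirroring assumptions 1-4 and 7.
bool fir_decima_args_ok(int Ni, int L, int M)
{
    if (Ni <= 0 || L <= 0 || M <= 0) return false;
    if (L % 2 != 0) return false;     // 1. even number of coefficients
    if (M % 2 != 0) return false;     // 2. even decimation factor
    if (L <= M) return false;         // 3. L > M
    if (Ni % M != 0) return false;    // 4. Ni is a multiple of M
    if (L / 2 < 3) return false;      // 7. L/2 >= 3
    return true;
}

// Debug-build guard a caller could place in front of the assembly routine.
void fir_decima_check_args(int Ni, int L, int M)
{
    assert(fir_decima_args_ok(Ni, L, M));
}
```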
Prototype : void fir_decima(const fract16 x[], fract16 y[], int Nout,
fract16 h[], int L, int M, int LBYM, fract16 d[]);
x[] - input array
y[] - output array
Nout - Number of output samples
h[] - Filter coefficient array
L - No. of coefficients
M - Decimation Factor
LBYM - Ceil(L/M)
d[] - Delay line buffer
Registers used : A0, A1, R0-R3, R5-R7, I0-I3, B2, M0, M1, L0-L3, P0-P2, P4, P5,
LC0, LC1.
Performance :
Code size : 260 bytes
Cycle Count : 1530 Cycles (For Ni=256, L=16 and M=2)
*******************************************************************************/
.section L1_code;
.global __fir_decima_spl;
.align 8;
__fir_decima_spl:
[--SP]=(R7:5,P5:4); // Push R7-R5 and P5-P4
P4 = [SP+32]; // Address of filter coefficients
P1 = [SP+36]; // Number of Coefficients (L)
P0 = [SP+44]; // Ceil(L/M) (Stage 1 + 2 counter)
R7 = [SP+48]; // Address of delay line buffer
P2 = R2; // Number of output samples
I0 = R0; // Address of input buffer
L0 = 0; // Input buffer is a linear Buffer
R2 = P1; // R2 = L
R2 <<= 1; // R2 = 2 * L (length(L2) for circular buffer)
R7 = R7 + R2(S) || R3=[SP+40];
// R7 = Delay line address + 2*L, R3 = Decimation
// Factor (M)
I1 = R7; // Location after end of delay line
L1 = 0; // Delay line buffer is a linear buffer
P2 -= P0; // Stage3 counter
P0 += -1; // Stage2 counter
P5 = P1; // Stage2b counter = L
B2 = P4; // Address of coefficients
I2 = P4;
L2 = R2; // Circular buffering of coefficients(length = 2*L)
P4 = R3; // P4 = M
I3 = R1; // Address of output buffer
L3 = 0; // Linear Buffer
M0 = R2; // M0 = 2*L
R3 = R3 + R3(S) || R7.L = W[I0++] || R7.H = W[I1--];
// R3=2*M as input data is of type fract16
// Fetch x(0) from the input buffer, modify delay
// line pointer.
W[I1--] = R7.l; // Store x(0) to location after delay line
R6 = R3; // R6 = 2*M
R6 += 8; // R6 = 2*M + 8
M1 = R6; // M1 = 2*M + 8
P5 -= P4; // Stage2b counter = L-M
// Start of stage 1
A1=A0=0 || R0 = [I1--] || R1 = [I2++];
// Fetch x(0) and x(-1) to R0, fetch h(0) and h(1)
// to R1
LSETUP (FIR_DEC_STG1,FIR_DEC_STG1) LC0 = P1 >> 1;
// Loop for finding y(0), count value = L/2
FIR_DEC_STG1:
A1+=R0.H*R1.L, A0+=R0.L*R1.H || R0 = [I1--]|| R1 = [I2++];
// A1+=x(0)*h(0), A0+=x(-1)*h(1)
// Fetch x(-2) and x(-3) into R0.H and R0.L
// Fetch h2 and h3 into R1.L and R1.H (first time)
R7.L = (A0+=A1) || I1 += M0;
// Add the two MACs results, Modify Delay line
// pointer
I1 += 4; // Modify delay line pointer
// End of stage 1
// Start of stage 2
LSETUP (FIR_DEC_STG2_ST,FIR_DEC_STG2_END) LC0 = P0;
// Stage 2 loop (Ceil(L/M) - 1)
P0 = P4;
FIR_DEC_STG2_ST:
A1=A0=0 || R0 = [I0--];
// Fetch samples from input buffer to R0
R6 = R6 + R3(S) || W[I3++] = R7.L || R5 = [I1--];
// Adjust modifier for input buffer
// Store previous result, fetch samples from delay
// line to R5
LSETUP(FIR_DEC_STG2A,FIR_DEC_STG2A) LC1 = P4 >> 1;
// Loop for terms containing samples from input
// buffer
P4 = P4 + P0; // Increment counter for input buffer loop
FIR_DEC_STG2A:
A1+=R0.H*R1.L, A0+=R0.L*R1.H || R0 = [I0--] || R1 = [I2++];
// Find sum of terms containing samples from input
// buffer
R2 = R2 - R3(S) || I0 += M1;
M1 = R6;
LSETUP(FIR_DEC_STG2B,FIR_DEC_STG2B) LC1 = P5>>1;
// Loop for terms containing samples from delay
// line buffer
FIR_DEC_STG2B:
A1+=R5.H*R1.L, A0+=R5.L*R1.H || R5 = [I1--] || R1 = [I2++];
// Find sum of terms containing samples from delay
// line buffer
P5 -= P0; // Decrement counter for delay line loop
// Adjust modifier, Modify delay line pointer
R7.L=(A0+=A1)||R0 = [I1++M0] ;
// Add the two MACs results, Modify Input buffer
// pointer Increment modifier by 2*M
FIR_DEC_STG2_END:
M0 = R2; // Decrement modifier by 2*M
// End of stage 2
// Start of stage 3
P1+=-4;
MNOP;
NOP;
LSETUP (FIR_DEC_STG3_ST,FIR_DEC_STG3_END) LC0 = P2;
// Loop for Nout - Ceil(L/M)
FIR_DEC_STG3_ST:
A1=A0=0 || R0 = [I0--] || W[I3++] = R7.L;
// Fetch input into R0 and store output present in
// R7.L
A1+=R0.H*R1.L, A0+=R0.L*R1.H || R0 = [I0--] || R1 = [I2++];
A1+=R0.H*R1.L, A0+=R0.L*R1.H || R0 = [I0--] || R1 = [I2++];
LSETUP (FIR_DEC_STG3A,FIR_DEC_STG3A) LC1 = P1 >> 1;
// LC1 is the no. of coefficients(L)/2 - 2
FIR_DEC_STG3A:
A1+=R0.H*R1.L, A0+=R0.L*R1.H || R0 = [I0--] || R1 = [I2++];
// A1+=x(kM)*h(0), A0+=x(kM-1)*h(1)
// Fetch x(M-2) and x(M-3) into R0.H and R0.L
// Fetch h2 and h3 into R1.L and R1.H (first time)
FIR_DEC_STG3_END:
R7.L=(A0+=A1) || I0 += M1;
// Get y to R7.L and modify I0
// End of stage 3
P1 += 3; // Loop counter = L-1
W[I3++] = R7.L || R0.L = W[I0--];
// Fetch last input sample and Store final output
// sample
LSETUP( FIR_DEC_DELUPDATE,FIR_DEC_DELUPDATE) LC0 = P1;
FIR_DEC_DELUPDATE:
R0.L = W[I0--] || W[I1--] = R0.L;
// Update delay line buffer with last input samples
(R7:5,P5:4)=[SP++]; // Pop R7 and P5-P4
RTS;
NOP; //to avoid one stall if LINK or UNLINK happens to be
//the next instruction after RTS in the memory.
__fir_decima_spl.end: