📄 voicin.f
字号:
*************************************************************************** ** VOICIN Version 52****************************************************************************** Voicing Detection (VOICIN) makes voicing decisions for each half* frame of input speech. Tentative voicing decisions are made two frames* in the future (2F) for each half frame. These decisions are carried* through one frame in the future (1F) to the present (P) frame where* they are examined and smoothed, resulting in the final voicing* decisions for each half frame. * The voicing parameter (signal measurement) column vector (VALUE)* is based on a rectangular window of speech samples determined by the* window placement algorithm. The voicing parameter vector contains the* AMDF windowed maximum-to-minimum ratio, the zero crossing rate, energy* measures, reflection coefficients, and prediction gains. The voicing* window is placed to avoid contamination of the voicing parameter vector* with speech onsets. * The input signal is then classified as unvoiced (including* silence) or voiced. This decision is made by a linear discriminant* function consisting of a dot product of the voicing decision* coefficient (VDC) row vector with the measurement column vector* (VALUE). The VDC vector is 2-dimensional, each row vector is optimized* for a particular signal-to-noise ratio (SNR). So, before the dot* product is performed, the SNR is estimated to select the appropriate* VDC vector. * The smoothing algorithm is a modified median smoother. The* voicing discriminant function is used by the smoother to determine how* strongly voiced or unvoiced a signal is. The smoothing is further* modified if a speech onset and a voicing decision transition occur* within one half frame. In this case, the voicing decision transition* is extended to the speech onset. For transmission purposes, there are* constraints on the duration and transition of voicing decisions. The* smoother takes these constraints into account. * Finally, the energy estimates are updated along with the dither* threshold used to calculate the zero crossing rate (ZC).** Inputs:* VWIN - Voicing window limits* INBUF - Input speech buffer* LPBUF - Low-pass filtered speech buffer* BUFLIM - INBUF and LPBUF limits* HALF - Present analysis half frame number* MINAMD - Minimum value of the AMDF* MAXAMD - Maximum value of the AMDF* MINTAU - Pointer to the lag of the minimum AMDF value* IVRC(2) - Inverse filter's RC's* OBOUND - Onset boundary descriptions* AF - The analysis frame number* Output:* VOIBUF(2,0:AF) - Buffer of voicing decisions* Internal:* QS - Ratio of preemphasized to full-band energies* RC1 - First reflection coefficient* AR_B - Product of the causal forward and reverse pitch prediction gains* AR_F - Product of the noncausal forward and rev. pitch prediction gains* ZC - Zero crossing rate* DITHER - Zero crossing threshold level* MAXMIN - AMDF's 1 octave windowed maximum-to-minimum ratio* MINPTR - Location of minimum AMDF value* NVDC - Number of elements in each VDC vector* NVDCL - Number of VDC vectors* VDCL - SNR values corresponding to the set of VDC's* VDC - 2-D voicing decision coefficient vector* VALUE(9) - Voicing Parameters* VOICE(2,3)- History of LDA results* LBE - Ratio of low-band instantaneous to average energies* FBE - Ratio of full-band instantaneous to average energies* LBVE - Low band voiced energy* LBUE - Low band unvoiced energy* FBVE - Full band voiced energy* FBUE - Full band unvoiced energy* OFBUE - Previous full-band unvoiced energy* OLBUE - Previous low-band unvoiced energy* REF - Reference energy for initialization and DITHER threshold* SNR - Estimate of signal-to-noise ratio* SNR2 - Estimate of low-band signal-to-noise ratio* SNRL - SNR level number* OT - Onset transition present* VSTATE - Decimal interpretation of binary voicing classifications* FIRST - First call flag SUBROUTINE VOICIN( VWIN, INBUF, LPBUF, BUFLIM, HALF, 1 MINAMD, MAXAMD, MINTAU, IVRC, OBOUND, VOIBUF, AF )* Global Variables: INCLUDE 'contrl.fh' INCLUDE 'vcomm.fh' REAL MINAMD, MAXAMD INTEGER VWIN(2), BUFLIM(4), HALF, MINTAU INTEGER AF, OBOUND(AF) REAL INBUF(BUFLIM(1):BUFLIM(2)) REAL LPBUF(BUFLIM(3):BUFLIM(4)) INTEGER VOIBUF(2,0:AF) INTEGER ZC, LBE, FBE INTEGER I, SNRL, VSTATE REAL DITHER, SNR, SNR2, IVRC(2) REAL MAXMIN, QS, RC1, AR_B REAL AR_F REAL VOICE(2,3) REAL VALUE(9) LOGICAL OT* Declare and initialize filters: INTEGER LBVE, LBUE, FBVE, FBUE, OFBUE, OLBUE, SFBUE, SLBUE INTEGER REF PARAMETER (REF = 3000) LOGICAL FIRST DATA FIRST /.TRUE./, DITHER/20/, NVDC/10/, NVDCL/5/ DATA VDCL/600,450,300,200,6*0/* Voicing Decision Parameter vector (* denotes zero coefficient):** * MAXMIN* LBE/LBVE* ZC* RC1* QS* IVRC2* aR_B* aR_F* * LOG(LBE/LBVE)* Define 2-D voicing decision coefficient vector according to the voicing* parameter order above. Each row (VDC vector) is optimized for a specific* SNR. The last element of the vector is the constant.* E ZC RC1 Qs IVRC2 aRb aRf c DATA VDC/ 1 0, 1714, -110, 334, -4096, -654, 3752, 3769, 0, 1181, 1 0, 874, -97, 300, -4096, -1021, 2451, 2527, 0, -500, 1 0, 510, -70, 250, -4096, -1270, 2194, 2491, 0, -1500, 1 0, 500, -10, 200, -4096, -1300, 2000, 2000, 0, -2000, 1 0, 500, 0, 0, -4096, -1300, 2000, 2000, 0, -2500, 1 50*0/ IF (FIRST) THEN LBVE = REF FBVE = REF FBUE = REF/16 OFBUE = REF/16 LBUE = REF/32 OLBUE = REF/32 SNR = 64*(FBVE/FBUE) FIRST = .FALSE. END IF* The VOICE array contains the result of the linear discriminant function * (analog values). The VOIBUF array contains the hard-limited binary * voicing decisions. The VOICE and VOIBUF arrays, according to FORTRAN * memory allocation, are addressed as:** (half-frame number, future-frame number)** | Past | Present | Future1 | Future2 |* | 1,0 | 2,0 | 1,1 | 2,1 | 1,2 | 2,2 | 1,3 | 2,3 | ---> time** Update linear discriminant function history each frame: IF (HALF .EQ. 1) THEN VOICE(1,1)=VOICE(1,2) VOICE(2,1)=VOICE(2,2) VOICE(1,2)=VOICE(1,3) VOICE(2,2)=VOICE(2,3) MAXMIN = MAXAMD/MAX(MINAMD,1.) END IF* Calculate voicing parameters twice per frame: CALL VPARMS( VWIN, INBUF, LPBUF, BUFLIM, HALF, DITHER, MINTAU, 1 ZC, LBE, FBE, QS, RC1, AR_B, AR_F )* Estimate signal-to-noise ratio to select the appropriate VDC vector.* The SNR is estimated as the running average of the ratio of the* running average full-band voiced energy to the running average* full-band unvoiced energy. SNR filter has gain of 63. SNR = NINT( 63*( SNR + FBVE/FLOAT(MAX(FBUE,1)) )/64.) SNR2 = (SNR*FBUE)/MAX(LBUE,1)* Quantize SNR to SNRL according to VDCL thresholds. SNRL = 1 DO SNRL = 1, NVDCL-1 IF (SNR2 .GT. VDCL(SNRL)) GOTO 69 END DO* (Note: SNRL = NVDCL here)69 CONTINUE* Linear discriminant voicing parameters: VALUE(1) = MAXMIN VALUE(2) = FLOAT(LBE)/MAX(LBVE,1) VALUE(3) = ZC VALUE(4) = RC1 VALUE(5) = QS VALUE(6) = IVRC(2) VALUE(7) = AR_B VALUE(8) = AR_F* Evaluation of linear discriminant function: VOICE(HALF,3) = VDC(10,SNRL) DO I = 1, 9 VOICE(HALF,3) = VOICE(HALF,3) + VDC(I,SNRL)*VALUE(I) END DO* Classify as voiced if discriminant > 0, otherwise unvoiced* Voicing decision for current half-frame: 1 = Voiced; 0 = Unvoiced IF (VOICE(HALF,3) .GT. 0.0) THEN VOIBUF(HALF,3)=1 ELSE VOIBUF(HALF,3)=0 END IF* Skip voicing decision smoothing in first half-frame: IF (HALF .EQ. 1) GOTO 99* Voicing decision smoothing rules (override of linear combination):** Unvoiced half-frames: At least two in a row.* --------------------** Voiced half-frames: At least two in a row in one frame.* ------------------- Otherwise at least three in a row.* (Due to the way transition frames are encoded)** In many cases, the discriminant function determines how to smooth.* In the following chart, the decisions marked with a * may be overridden.** Voicing override of transitions at onsets:* If a V/UV or UV/V voicing decision transition occurs within one-half* frame of an onset bounding a voicing window, then the transition is* moved to occur at the onset.** P 1F* ----- -----* 0 0 0 0* 0 0 0* 1 (If there is an onset there)* 0 0 1* 0* (Based on 2F and discriminant distance)* 0 0 1 1* 0 1* 0 0 (Always)* 0 1* 0* 1 (Based on discriminant distance)* 0* 1 1 0* (Based on past, 2F, and discriminant distance)* 0 1* 1 1 (If there is an onset there)* 1 0* 0 0 (If there is an onset there)* 1 0 0 1* 1 0* 1* 0 (Based on discriminant distance)* 1 0* 1 1 (Always)* 1 1 0 0* 1 1 0* 1* (Based on 2F and discriminant distance)* 1 1 1* 0 (If there is an onset there)* 1 1 1 1** Determine if there is an onset transition between P and 1F.* OT (Onset Transition) is true if there is an onset between * P and 1F but not after 1F. OT = (AND(OBOUND(1), 2) .NE. 0 .OR. OBOUND(2) .EQ. 1) 1 .AND. AND(OBOUND(3), 1) .EQ. 0* Multi-way dispatch on voicing decision history: VSTATE = VOIBUF(1,1)*8 + VOIBUF(2,1)*4 + VOIBUF(1,2)*2 + VOIBUF(2,2) GOTO (99,1,2,99,4,5,6,7,8,99,10,11,99,13,14,99) VSTATE+11 IF (OT .AND. VOIBUF(1,3) .EQ. 1) VOIBUF(1,2) = 1 GOTO 992 IF (VOIBUF(1,3) .EQ. 0 .OR. 1 VOICE(1,2) .LT. -VOICE(2,2)) THEN VOIBUF(1,2) = 0 ELSE VOIBUF(2,2) = 1 END IF GOTO 994 VOIBUF(2,1) = 0 GOTO 995 IF (VOICE(2,1) .LT. -VOICE(1,2)) THEN VOIBUF(2,1) = 0 ELSE VOIBUF(1,2) = 1 END IF GOTO 99* VOIBUF(2,0) must be 06 IF (VOIBUF(1,0) .EQ. 1 .OR. 1 VOIBUF(1,3) .EQ. 1 .OR. 1 VOICE(2,2) .GT. VOICE(1,1)) THEN VOIBUF(2,2) = 1 ELSE VOIBUF(1,1) = 1 END IF GOTO 997 IF (OT) VOIBUF(2,1) = 0 GOTO 998 IF (OT) VOIBUF(2,1) = 1 GOTO 9910 IF (VOICE(1,2) .LT. -VOICE(2,1)) THEN VOIBUF(1,2) = 0 ELSE VOIBUF(2,1) = 1 END IF GOTO 9911 VOIBUF(2,1) = 1 GOTO 9913 IF (VOIBUF(1,3) .EQ. 0 .AND. VOICE(2,2) .LT. -VOICE(1,2)) THEN VOIBUF(2,2) = 0 ELSE VOIBUF(1,2) = 1 END IF GOTO 9914 IF (OT .AND. VOIBUF(1,3) .EQ. 0) VOIBUF(1,2) = 0* GOTO 9999 CONTINUE* Now update parameters:* ----------------------** During unvoiced half-frames, update the low band and full band unvoiced* energy estimates (LBUE and FBUE) and also the zero crossing* threshold (DITHER). (The input to the unvoiced energy filters is* restricted to be less than 10dB above the previous inputs of the* filters.)* During voiced half-frames, update the low-pass (LBVE) and all-pass * (FBVE) voiced energy estimates. IF (VOIBUF(HALF,3) .EQ. 0) THEN SFBUE = NINT(( 63*SFBUE + 8*MIN(FBE,3*OFBUE) )/64.) FBUE = SFBUE/8 OFBUE = FBE SLBUE = NINT(( 63*SLBUE + 8*MIN(LBE,3*OLBUE) )/64.) LBUE = SLBUE/8 OLBUE = LBE ELSE LBVE = NINT(( 63*LBVE + LBE )/64.) FBVE = NINT(( 63*FBVE + FBE )/64.) END IF* Set dither threshold to yield proper zero crossing rates in the* presence of low frequency noise and low level signal input.* NOTE: The divisor is a function of REF, the expected energies. DITHER = MIN(MAX( 64*SQRT(FLOAT(LBUE*LBVE)) / REF,1.),20.)* Print Test Data IF( LISTL.GE.3 ) THEN IF(HALF.EQ.1) WRITE(FDEBUG,930) VWIN,MINAMD,MAXAMD,MINTAU,IVRC930 FORMAT(' Voicing:VWIN MINA MAXA MINTAU IVRC1 IVRC2'/ 1 5X,2I4,2F9.1,I8,2F7.3/ 1 ' HALF DISCR MAX/MIN LE/LVE ZC RC1 QS IVRC2' 1 ' aR_B aR_F : DITH LBE FBE LBVE FBVE LBUE FBUE', 1 ' SNR SNRL VS OT') WRITE(FDEBUG,940) HALF, VOICE(HALF,3), (VALUE(I),I=1,8), DITHER, 1 LBE, FBE, LBVE, FBVE, LBUE, FBUE, 1 SNR, SNRL, VSTATE, OT940 FORMAT(1X,I3,':',F8.0,F9.1,F7.3,F7.2,5F7.3,F5.1,6I6,F5.1,2I3,L3) END IF* Voicing decisions are returned in VOIBUF. RETURN END
⌨️ 快捷键说明
复制代码
Ctrl + C
搜索代码
Ctrl + F
全屏模式
F11
切换主题
Ctrl + Shift + D
显示快捷键
?
增大字号
Ctrl + =
减小字号
Ctrl + -