📄 voicin.f,v
字号:
head 1.10;access;symbols;locks; strict;comment @* @;1.10date 96.03.29.17.59.14; author jaf; state Exp;branches;next 1.9;1.9date 96.03.29.17.54.46; author jaf; state Exp;branches;next 1.8;1.8date 96.03.27.18.19.54; author jaf; state Exp;branches;next 1.7;1.7date 96.03.26.20.00.06; author jaf; state Exp;branches;next 1.6;1.6date 96.03.26.19.38.09; author jaf; state Exp;branches;next 1.5;1.5date 96.03.19.20.43.45; author jaf; state Exp;branches;next 1.4;1.4date 96.03.19.15.00.58; author jaf; state Exp;branches;next 1.3;1.3date 96.03.19.00.10.49; author jaf; state Exp;branches;next 1.2;1.2date 96.03.13.16.09.28; author jaf; state Exp;branches;next 1.1;1.1date 96.02.07.14.50.28; author jaf; state Exp;branches;next ;desc@@1.10log@Avoided using VALUE(9), although it shouldn't affect the function ofthe code at all, because it was always multiplied by VDC(9,SNRL),which is 0 for all values of SNRL. Still, if VALUE(9) had an initialvalue of IEEE NaN, it might cause trouble (I don't know how IEEEdefines Nan * 0. It should either be NaN or 0.)@text@*************************************************************************** ** VOICIN Version 52** $Log: voicin.f,v $* Revision 1.9 1996/03/29 17:54:46 jaf* Added a few comments about the accesses made to argument array VOIBUF* and the local saved array VOICE.** Revision 1.8 1996/03/27 18:19:54 jaf* Added an assignment to VSTATE that does not affect the function of the* program at all. The only reason I put it in was so that the tracing* statements at the end, when enabled, will print a consistent value for* VSTATE when HALF .EQ. 1, rather than a garbage value that could change* from one call to the next.** Revision 1.7 1996/03/26 20:00:06 jaf* Removed the inclusion of the file "vcomm.fh", and put its contents* into this file. It was included nowhere else but here.** Revision 1.6 1996/03/26 19:38:09 jaf* Commented out trace statements.** Revision 1.5 1996/03/19 20:43:45 jaf* Added comments about which indices of OBOUND and VOIBUF can be* accessed, and whether they are read or written. VOIBUF is fairly* messy.** Revision 1.4 1996/03/19 15:00:58 jaf* Moved the DATA statements for the *VDC* variables later, as it is* apparently illegal to have DATA statements before local variable* declarations.** Revision 1.3 1996/03/19 00:10:49 jaf* Heavily commented the local variables that are saved from one* invocation to the next, and how the local variable FIRST is used to* avoid the need to assign most of them initial values with DATA* statements.** A few should be initialized, but aren't. I've guessed initial values* for two of these, SFBUE and SLBUE, and I've convinced myself that for* VOICE, the effects of uninitialized values will die out after 2 or 3* frame times. It would still be good to choose initial values for* these, but I don't know what reasonable values would be (0 comes to* mind).** Revision 1.2 1996/03/13 16:09:28 jaf* Comments added explaining which of the local variables of this* subroutine need to be saved from one invocation to the next, and which* do not.** WARNING! Some of them that should are never given initial values in* this code. Hopefully, Fortran 77 defines initial values for them, but* even so, giving them explicit initial values is preferable.** WARNING! VALUE(9) is used, but never assigned a value. It should* probably be eliminated from the code.** Revision 1.1 1996/02/07 14:50:28 jaf* Initial revision******************************************************************************* Voicing Detection (VOICIN) makes voicing decisions for each half* frame of input speech. Tentative voicing decisions are made two frames* in the future (2F) for each half frame. These decisions are carried* through one frame in the future (1F) to the present (P) frame where* they are examined and smoothed, resulting in the final voicing* decisions for each half frame. * The voicing parameter (signal measurement) column vector (VALUE)* is based on a rectangular window of speech samples determined by the* window placement algorithm. The voicing parameter vector contains the* AMDF windowed maximum-to-minimum ratio, the zero crossing rate, energy* measures, reflection coefficients, and prediction gains. The voicing* window is placed to avoid contamination of the voicing parameter vector* with speech onsets. * The input signal is then classified as unvoiced (including* silence) or voiced. This decision is made by a linear discriminant* function consisting of a dot product of the voicing decision* coefficient (VDC) row vector with the measurement column vector* (VALUE). The VDC vector is 2-dimensional, each row vector is optimized* for a particular signal-to-noise ratio (SNR). So, before the dot* product is performed, the SNR is estimated to select the appropriate* VDC vector. * The smoothing algorithm is a modified median smoother. The* voicing discriminant function is used by the smoother to determine how* strongly voiced or unvoiced a signal is. The smoothing is further* modified if a speech onset and a voicing decision transition occur* within one half frame. In this case, the voicing decision transition* is extended to the speech onset. For transmission purposes, there are* constraints on the duration and transition of voicing decisions. The* smoother takes these constraints into account. * Finally, the energy estimates are updated along with the dither* threshold used to calculate the zero crossing rate (ZC).** Inputs:* VWIN - Voicing window limits* The indices read of arrays VWIN, INBUF, LPBUF, and BUFLIM* are the same as those read by subroutine VPARMS.* INBUF - Input speech buffer* LPBUF - Low-pass filtered speech buffer* BUFLIM - INBUF and LPBUF limits* HALF - Present analysis half frame number* MINAMD - Minimum value of the AMDF* MAXAMD - Maximum value of the AMDF* MINTAU - Pointer to the lag of the minimum AMDF value* IVRC(2) - Inverse filter's RC's* Only index 2 of array IVRC read under normal operation.* (Index 1 is also read when debugging is turned on.)* OBOUND - Onset boundary descriptions* Indices 1 through 3 read if (HALF .NE. 1), otherwise untouched.* AF - The analysis frame number* Output:* VOIBUF(2,0:AF) - Buffer of voicing decisions* Index (HALF,3) written.* If (HALF .EQ. 1), skip down to "Read (HALF,3)" below.* Indices (1,2), (2,1), (1,2), and (2,2) read.* One of the following is then done:* read (1,3) and possibly write (1,2)* read (1,3) and write (1,2) or (2,2)* write (2,1)* write (2,1) or (1,2)* read (1,0) and (1,3) and then write (2,2) or (1,1)* no reads or writes on VOIBUF* Finally, read (HALF,3)* Internal:* QS - Ratio of preemphasized to full-band energies* RC1 - First reflection coefficient* AR_B - Product of the causal forward and reverse pitch prediction gains* AR_F - Product of the noncausal forward and rev. pitch prediction gains* ZC - Zero crossing rate* DITHER - Zero crossing threshold level* MAXMIN - AMDF's 1 octave windowed maximum-to-minimum ratio* MINPTR - Location of minimum AMDF value* NVDC - Number of elements in each VDC vector* NVDCL - Number of VDC vectors* VDCL - SNR values corresponding to the set of VDC's* VDC - 2-D voicing decision coefficient vector* VALUE(9) - Voicing Parameters* VOICE(2,3)- History of LDA results* On every call when (HALF .EQ. 1), VOICE(*,I+1) is* shifted back to VOICE(*,I), for I=1,2.* VOICE(HALF,3) is written on every call.* Depending on several conditions, one or more of* (1,1), (1,2), (2,1), and (2,2) might then be read.* LBE - Ratio of low-band instantaneous to average energies* FBE - Ratio of full-band instantaneous to average energies* LBVE - Low band voiced energy* LBUE - Low band unvoiced energy* FBVE - Full band voiced energy* FBUE - Full band unvoiced energy* OFBUE - Previous full-band unvoiced energy* OLBUE - Previous low-band unvoiced energy* REF - Reference energy for initialization and DITHER threshold* SNR - Estimate of signal-to-noise ratio* SNR2 - Estimate of low-band signal-to-noise ratio* SNRL - SNR level number* OT - Onset transition present* VSTATE - Decimal interpretation of binary voicing classifications* FIRST - First call flag* * This subroutine maintains local state from one call to the next. If* you want to switch to using a new audio stream for this filter, or* reinitialize its state for any other reason, call the ENTRY* INITVOICIN.* SUBROUTINE VOICIN( VWIN, INBUF, LPBUF, BUFLIM, HALF, 1 MINAMD, MAXAMD, MINTAU, IVRC, OBOUND, VOIBUF, AF )* Global Variables: INCLUDE 'contrl.fh'* Arguments INTEGER VWIN(2), BUFLIM(4) REAL INBUF(BUFLIM(1):BUFLIM(2)) REAL LPBUF(BUFLIM(3):BUFLIM(4)) INTEGER HALF REAL MINAMD, MAXAMD INTEGER MINTAU REAL IVRC(2) INTEGER AF, OBOUND(AF), VOIBUF(2,0:AF)* Parameters/constants INTEGER REF PARAMETER (REF = 3000)* Voicing coefficient and Linear Discriminant Analysis variables:* Max number of VDC's and VDC levels INTEGER MAXVDC, MXVDCL PARAMETER (MAXVDC = 10, MXVDCL = 10)* The following are not Fortran PARAMETER's, but they are* initialized with DATA statements, and never modified.* Actual number of VDC's and levels INTEGER NVDC, NVDCL REAL VDC(MAXVDC, MXVDCL), VDCL(MXVDCL)* Local variables that need not be saved* Note:* * VALUE(1) through VALUE(8) are assigned values, but VALUE(9)* never is. Yet VALUE(9) is read in the loop that begins "DO I =* 1, 9" below. I believe that this doesn't cause any problems in* this subroutine, because all VDC(9,*) array elements are 0, and* this is what is multiplied by VALUE(9) in all cases. Still, it* would save a multiplication to change the loop to "DO I = 1, 8". INTEGER ZC, LBE, FBE INTEGER I, SNRL, VSTATE REAL SNR2 REAL QS, RC1, AR_B, AR_F REAL VALUE(9) LOGICAL OT* Local state* WARNING!* * VOICE, SFBUE, and SLBUE should be saved from one invocation to* the next, but they are never given an initial value.* * Does Fortran 77 specify some default initial value, like 0, or* is it undefined? If it is undefined, then this code should be* corrected to specify an initial value.* * For VOICE, note that it is "shifted" in the statement that* begins "IF (HALF .EQ. 1) THEN" below. Also, uninitialized* values in the VOICE array can only affect entries in the VOIBUF* array that are for the same frame, or for an older frame. Thus* the effects of uninitialized values in VOICE cannot linger on* for more than 2 or 3 frame times.* * For SFBUE and SLBUE, the effects of uninitialized values can* linger on for many frame times, because their previous values* are exponentially decayed. Thus it is more important to choose* initial values for these variables. I would guess that a* reasonable initial value for SFBUE is REF/16, the same as used* for FBUE and OFBUE. Similarly, SLBUE can be initialized to* REF/32, the same as for LBUE and OLBUE.* * These guessed initial values should be validated by re-running* the modified program on some audio samples.* REAL DITHER, SNR REAL MAXMIN REAL VOICE(2,3)* Declare and initialize filters: INTEGER LBVE, LBUE, FBVE, FBUE, OFBUE, OLBUE, SFBUE, SLBUE LOGICAL FIRST DATA FIRST /.TRUE./, DITHER/20/* * The following variables are saved from one invocation to the* next, but are not initialized with DATA statements. This is* acceptable, because FIRST is initialized ot .TRUE., and the* first time that this subroutine is then called, they are all* given initial values.* * SNR* LBVE, LBUE, FBVE, FBUE, OFBUE, OLBUE* * MAXMIN is initialized on the first call, assuming that HALF* .EQ. 1 on first call. This is how ANALYS calls this subroutine.* SAVE DITHER, SNR SAVE MAXMIN SAVE VOICE SAVE LBVE, LBUE, FBVE, FBUE, OFBUE, OLBUE, SFBUE, SLBUE SAVE FIRST* Voicing Decision Parameter vector (* denotes zero coefficient):** * MAXMIN* LBE/LBVE* ZC* RC1* QS* IVRC2* aR_B* aR_F* * LOG(LBE/LBVE)* Define 2-D voicing decision coefficient vector according to the voicing* parameter order above. Each row (VDC vector) is optimized for a specific* SNR. The last element of the vector is the constant.* E ZC RC1 Qs IVRC2 aRb aRf c DATA VDC/ 1 0, 1714, -110, 334, -4096, -654, 3752, 3769, 0, 1181, 1 0, 874, -97, 300, -4096, -1021, 2451, 2527, 0, -500, 1 0, 510, -70, 250, -4096, -1270, 2194, 2491, 0, -1500, 1 0, 500, -10, 200, -4096, -1300, 2000, 2000, 0, -2000, 1 0, 500, 0, 0, -4096, -1300, 2000, 2000, 0, -2500, 1 50*0/ DATA NVDC/10/, NVDCL/5/ DATA VDCL/600,450,300,200,6*0/ IF (FIRST) THEN LBVE = REF FBVE = REF FBUE = REF/16 OFBUE = REF/16 SFBUE = REF/16 LBUE = REF/32 OLBUE = REF/32 SLBUE = REF/32 SNR = 64*(FBVE/FBUE) FIRST = .FALSE. END IF* The VOICE array contains the result of the linear discriminant function * (analog values). The VOIBUF array contains the hard-limited binary * voicing decisions. The VOICE and VOIBUF arrays, according to FORTRAN * memory allocation, are addressed as:** (half-frame number, future-frame number)** | Past | Present | Future1 | Future2 |* | 1,0 | 2,0 | 1,1 | 2,1 | 1,2 | 2,2 | 1,3 | 2,3 | ---> time** Update linear discriminant function history each frame: IF (HALF .EQ. 1) THEN VOICE(1,1)=VOICE(1,2) VOICE(2,1)=VOICE(2,2) VOICE(1,2)=VOICE(1,3) VOICE(2,2)=VOICE(2,3) MAXMIN = MAXAMD/MAX(MINAMD,1.) END IF* Calculate voicing parameters twice per frame: CALL VPARMS( VWIN, INBUF, LPBUF, BUFLIM, HALF, DITHER, MINTAU, 1 ZC, LBE, FBE, QS, RC1, AR_B, AR_F )* Estimate signal-to-noise ratio to select the appropriate VDC vector.* The SNR is estimated as the running average of the ratio of the* running average full-band voiced energy to the running average* full-band unvoiced energy. SNR filter has gain of 63. SNR = NINT( 63*( SNR + FBVE/FLOAT(MAX(FBUE,1)) )/64.) SNR2 = (SNR*FBUE)/MAX(LBUE,1)* Quantize SNR to SNRL according to VDCL thresholds. SNRL = 1 DO SNRL = 1, NVDCL-1 IF (SNR2 .GT. VDCL(SNRL)) GOTO 69 END DO* (Note: SNRL = NVDCL here)69 CONTINUE* Linear discriminant voicing parameters: VALUE(1) = MAXMIN VALUE(2) = FLOAT(LBE)/MAX(LBVE,1) VALUE(3) = ZC VALUE(4) = RC1 VALUE(5) = QS VALUE(6) = IVRC(2) VALUE(7) = AR_B VALUE(8) = AR_F* Evaluation of linear discriminant function: VOICE(HALF,3) = VDC(10,SNRL) DO I = 1, 8 VOICE(HALF,3) = VOICE(HALF,3) + VDC(I,SNRL)*VALUE(I) END DO* Classify as voiced if discriminant > 0, otherwise unvoiced* Voicing decision for current half-frame: 1 = Voiced; 0 = Unvoiced IF (VOICE(HALF,3) .GT. 0.0) THEN VOIBUF(HALF,3)=1 ELSE VOIBUF(HALF,3)=0 END IF* Skip voicing decision smoothing in first half-frame:* Give a value to VSTATE, so that trace statements below will print* a consistent value from one call to the next when HALF .EQ. 1.* The value of VSTATE is not used for any other purpose when this is* true. VSTATE = -1 IF (HALF .EQ. 1) GOTO 99* Voicing decision smoothing rules (override of linear combination):** Unvoiced half-frames: At least two in a row.* --------------------** Voiced half-frames: At least two in a row in one frame.* ------------------- Otherwise at least three in a row.* (Due to the way transition frames are encoded)** In many cases, the discriminant function determines how to smooth.
⌨️ 快捷键说明
复制代码
Ctrl + C
搜索代码
Ctrl + F
全屏模式
F11
切换主题
Ctrl + Shift + D
显示快捷键
?
增大字号
Ctrl + =
减小字号
Ctrl + -