📄 rasta.c
/************************************************************************
 *                                                                      *
 *               ROUTINES IN THIS FILE:                                 *
 *                                                                      *
 *                      main(): main calling routine                    *
 *                                                                      *
 ************************************************************************/

/********************************************************************

        Scope of Algorithm

The actual processing is split up into a few pieces:

        1) power spectral analysis
        2) auditory spectrum computation
        3) compression (possibly adaptive)
        4) temporal filtering of each channel trajectory
        5) expansion (also possibly adaptive)
        6) postprocessing (e.g., preemphasis, loudness compression)
        7) autoregressive all-pole modeling (cepstral coefficients)

and i/o can either be ascii, binary (shorts on the input and floats
on the output), or standard ESPS files.

Since the lowest frequency and highest frequency bands extend
into forbidden territory (negative or greater-than-Nyquist
frequencies), they are ignored for most of the analysis. This
is done by computing a variable called first_good (which for
1 bark spacing is 1) and generally ignoring the first and last
``first_good'' channels. Just prior to the all-pole modeling,
the values from the good channels are copied over to the questionable
ones. (Note that ``first_good'' is available to most routines
via a pointer to a general parameter structure that has such
useful things as the number of filters, the analysis stepsize
in msec, the sampling rate, etc. See rasta.h for
a definition of this structure).

This program (Rasta 2.0) implements the May 1994 version of RASTA-PLP
analysis. It incorporates several primary pieces:

1) PLP Analysis - as described in Hermansky's 1990 JASA paper,
the basic form of spectral analysis is Perceptual Linear Prediction,
or PLP.
This computes the cepstral parameters of an all-pole model
of an auditory spectrum, which is a power spectrum that
has been frequency warped to the bark scale, smoothed by
an asymmetric critical-band trapezoid function,
down-sampled into approximately 1 Bark intervals,
cube root compressed for an intensity-loudness transformation,
and weighted by a fixed equal loudness curve.

2) RASTA filtering - as described in several Hermansky and Morgan
papers, the basic idea here is to bandpass filter the trajectories
of the spectral parameters. In the case of RASTA-PLP, this filtering
is done on a nonlinear transformation of an auditory-like spectrum,
prior to the autoregressive modeling in PLP.

3) J-processing - For RASTA, the bandpass filtering
is done in the log domain. An alternative is to use the J-family
of log-like curves

        y = log(1 + Jx)

where J is a constant that appears to be optimally set when
it is inversely proportional to the noise power (currently typically
1/3 of the inverse noise), x is the critical band value, and y is the
non-linearly transformed critical band value.
Rather than do the true inverse, which would be

        x = (exp(y) - 1)/J

and could get negative values, we use

        x' = (exp(y))/J

This prevents negative values, and in doing so effectively adds a noise
floor by adding 1/J to the true inverse.

One way of doing J-processing is to pick one constant J value and enter
this value at the command line. This J value should be dependent on the
SNR of the speech. We also may want to estimate the noise level for
adaptive settings of the J-parameter during the utterance.
Both methods of picking J should be handled with care.

For the first case, see the README file for a discussion of the perils
of using even a default J if it is too far from what you really need;
if the application situation is relatively fixed, you are better off making
an off-line noise measurement to get a good J value; in any event some
experimentation will soon show the proper constants involved for a problem.

For the second case, the noise level is estimated for adaptive settings of
the J-parameter during the utterance. This should be done with care as
well, as the use of a time-varying J brings in a new complication that you
must consider in training and recognition: changing J over a time
series introduces a new source of variability in the analysis that must
be accounted for. The different J values, as required by differing noise
conditions, generate different spectral shapes and dynamics of the spectra.
This means that the training system must contend with a new source of
variability due to the change in processing strategy that is adaptively
determined from the data. One approach to handling this variability is
spectral mapping. In the current version, spectral mapping is
performed whenever J-Rasta processing is used with adaptive Js.

Spectral mapping - transform the spectrum processed with a J value
corresponding to noisy speech to a spectrum processed with a J value
for clean speech. In other words, we find a mapping between
log(1+xJ) and log(1+xJref), where Jref is the reference J, i.e. the
J value for clean speech. For this approach, we have used a multiple
regression within each critical band. In principle, this solution
reduces the variability due to the choice of J, and so minimizes the
effect on the training process.

How this works is:

1) Training of the recognizer:
   -- Train the recognizer with clean speech processed with J = 1.0e-6,
      a suitable J value for clean speech.

2) Finding the relationship between the spectrum corresponding to
   different Jah values and the spectrum corresponding to J = 1.0e-6:
   -- For each of the Jahs in the set {3.16e-7, 1.0e-7, 3.16e-8, 1.0e-8,
      3.16e-9, 1.0e-9}, find a mapping relationship of the corresponding
      bandpass filtered spectrum to the spectrum corresponding to
      J = 1.0e-6. In other words, find a set of mapping coefficients
      for each Jah to 1.0e-6. The mapping method will be discussed later.

3) Extracting the speech features for the testing speech data:
   -- obtain the critical band values as in PLP
   -- estimate the noise energy and thus the Jah value. Call this
      Jah value J(orig).
   -- pick a Jah from the set {3.16e-7, 1.0e-7, 3.16e-8, 1.0e-8,
      3.16e-9, 1.0e-9} that is closest to J(orig) and call this J(quant).
   -- perform the non-linear transformation of the spectrum using
      log(1 + J(quant) * X).
   -- bandpass filter the transformed critical band values.
   -- use the set of mapping coefficients for J(quant) to do the
      spectral mapping or spectral transformation.
   -- preemphasize via the equal-loudness curve and then perform
      amplitude compression using the intensity-loudness power law
      as in PLP
   -- take the inverse of the non-linear function.
   -- compute the cepstral parameters for the AR model.

How regression coefficients are computed in our experiment:

In order to map critical band values (after bandpass filtering)
processed with different J values to those processed with J = 1.0e-6,
J-Rasta filtered critical band outputs from 10 speakers (excluded from
the training and testing sets) are used to train multiple regression
models. For example, for mapping from J = J(quant) to J = 1.0e-6,
the regression equation can be written as:

        Yi = B2i* X2 + B3i* X3 + ... + B16i * X16 + B17i        (**)

where   Yi = i-th bandpass filtered critical band processed with
             J = 1.0e-6, i = 2, ..., 16

        X2, X3, ...,
        X16 = 2nd to 16th bandpass filtered critical band values
              processed with J = J(quant), where J(quant) is in the set
              {3.16e-7, 1.0e-7, 3.16e-8, 1.0e-8, 3.16e-9, 1.0e-9}

        B2i, B3i, ..., B17i are the 16 mapping coefficients

For equation (**), we have made the assumption that the sampling
frequency is 8 kHz and the number of critical bands is seventeen.
The first and the last bands extend into forbidden territory --
negative or greater-than-Nyquist frequencies. Thus these two bands are
ignored for most of the analysis. Their values are made equal to the
adjacent band's just before the autoregressive all-pole modeling.
This is why we only make Yi dependent on the bandpass filtered critical
bands X2, X3, ..., X16, altogether fifteen critical bands.

The default mapping coefficient set is stored in map_weights.dat.
It is suitable for a sampling frequency of 8 kHz and 17 critical bands.
Users who have a different setup may want to find their own mapping
coefficient set. This can be done with the command options -R and -f.
Option -R gives you bandpass filtered critical band values as output
instead of cepstral coefficients. These outputs can be used as
regression data. A simple multiple regression routine can be used to
generate the mapping coefficients from these regression data. These
mapping coefficients can be stored in a file. Option -f lets you use
this file in place of the default file map_weights.dat.

The format of the mapping coefficients file is:

beginning of file
<Total number of Jahs, for the example shown above, it is [7]>
<# of critical bands, for the setup for 8kHz, this is [15]>
<# of mapping coefficients/band, for the setup for 8kHz, this is [16]>

<The J for clean speech, [1.0e-6]>
<mapping coefficients for Y2, [B22, B32, B42,...]>
<mapping coefficients for Y3, [B23, B33, B43 ...]>
        |
        |
        |
        V
<mapping coefficients for Y16>

<The second largest Jah, after 1e-6, [3.16e-7]>
        |
        | mapping coefficients
        |
        V

<The third largest Jah, after 1e-6, [1.0e-7]>
        |
        | mapping coefficients
        |
        V
        .
        .
        .
        .
end of file

*********************************************************************/

#include <stdio.h>
#include "others/mymath.h"
#include "rasta.h"
#include "functions.h"

/******************************************************/

void param_init(struct param* pptr)
{
    pptr->winsize = TYP_WIN_SIZE;
    pptr->stepsize = TYP_STEP_SIZE;
    pptr->sampfreq = PHONE_SAMP_FREQ;
    pptr->polepos = POLE_DEFAULT;
    pptr->order = TYP_MODEL_ORDER;
    pptr->lift = TYP_ENHANCE;
    pptr->winco = HAMMING_COEF;
    pptr->rfrac = ONE;
    pptr->jah = JAH_DEFAULT;
    pptr->gainflag = TRUE;
    pptr->lrasta = FALSE;
    pptr->jrasta = FALSE;
    pptr->cJah = FALSE;
    pptr->mapcoef_fname = "map_weights.dat";
    pptr->crbout = FALSE;
    pptr->comcrbout = FALSE;
    pptr->infname = "-";        /* used for stdin */
    pptr->outfname = "-";       /* used for stdout */
    pptr->num_fname = NULL;     /* file with RASTA polynomial numerator */
    pptr->denom_fname = NULL;   /* file with RASTA polynomial denominator */
    pptr->ascin = FALSE;
    pptr->ascout = FALSE;
    pptr->debug = FALSE;
    pptr->smallmask = FALSE;
    pptr->espsin = FALSE;
    pptr->espsout = FALSE;
    pptr->matin = FALSE;
    pptr->matout = FALSE;
    pptr->swapbytes = FALSE;
    pptr->nfilts = NOT_SET;
    pptr->nout = NOT_SET;
    pptr->online = FALSE;       /* If set, do frame-by-frame analysis
                                   rather than reading in whole file first */
    pptr->HPfilter = FALSE;
    pptr->history = FALSE;
    pptr->hist_fname = "history.out";
}

void init_param(struct fvec *sptr, struct param *pptr)
{
    int overlap, usable_length;
    float tmp;