rasta.c

From "ears-0.32, a useful speech signal processing toolkit for Linux" · C source code · 559 lines · page 1 of 2
/************************************************************************
 *                                                                       *
 *               ROUTINES IN THIS FILE:                                  *
 *                                                                       *
 *                      main(): main calling routine                     *
 *                                                                       *
 ************************************************************************/

/********************************************************************

        Scope of Algorithm

The actual processing is split up into a few pieces:
        1) power spectral analysis
        2) auditory spectrum computation
        3) compression (possibly adaptive)
        4) temporal filtering of each channel trajectory
        5) expansion (also possibly adaptive)
        6) postprocessing (e.g., preemphasis, loudness compression)
        7) autoregressive all-pole modeling (cepstral coefficients)

and i/o can either be ascii, binary (shorts on the input and floats
on the output), or standard ESPS files.

Since the lowest frequency and highest frequency bands extend
into forbidden territory (negative or greater-than-Nyquist
frequencies), they are ignored for most of the analysis. This
is done by computing a variable called first_good (which for
1 bark spacing is 1) and generally ignoring the first and last
``first_good'' channels. Just prior to the all-pole modeling,
the values from the good channels are copied over to the questionable
ones. (Note that ``first_good'' is available to most routines
via a pointer to a general parameter structure that has such
useful things as the number of filters, the analysis stepsize
in msec, the sampling rate, etc. See rasta.h for
a definition of this structure).

This program (Rasta 2.0) implements the May 1994 version of RASTA-PLP
analysis.
It incorporates several primary pieces:

1) PLP Analysis - as described in Hermansky's 1990 JASA paper,
the basic form of spectral analysis is Perceptual Linear Prediction,
or PLP. This computes the cepstral parameters of an all-pole model
of an auditory spectrum, which is a power spectrum that
has been frequency warped to the Bark scale, smoothed by
an asymmetric critical-band trapezoid function,
down-sampled into approximately 1 Bark intervals,
cube root compressed for an intensity-loudness transformation,
and weighted by a fixed equal loudness curve.

2) RASTA filtering - as described in several Hermansky and Morgan
papers, the basic idea here is to bandpass filter the trajectories
of the spectral parameters. In the case of RASTA-PLP, this filtering
is done on a nonlinear transformation of an auditory-like spectrum,
prior to the autoregressive modeling in PLP.

3) J-processing - For RASTA, the bandpass filtering
is done in the log domain. An alternative is to use the J-family
of log-like curves

        y = log(1 + Jx)

where J is a constant that appears to be optimally set when
it is inversely proportional to the noise power (currently typically
1/3 of the inverse noise power), x is the
critical band value, and y is the non-linearly transformed
critical band value.
Rather than do the true inverse, which would be

        x = (exp(y) - 1)/J

and could get negative values, we use

        x' = (exp(y))/J

This prevents negative values, and in doing so effectively adds a noise
floor by adding 1/J to the true inverse.

One way of doing J-processing is to pick one constant J value and enter
this value at the command line. This J value should be dependent on the
SNR of the speech. We also may want to estimate the noise level for
adaptive settings of the J-parameter during the utterance.

Both methods of picking J should be handled with care. For the first
case, see the README file for a discussion of the perils of using even
a default J if it is too far from what you really need;
if the application situation is relatively fixed, you are better off making
an off-line noise measurement to get a good J value; in any event some
experimentation will soon show the proper constants involved for a problem.

For the second case, noise level is estimated for adaptive settings of the
J-parameter during the utterance. This should be done with care as well,
as the use of a time-varying J brings in a new complication that you must
consider in the training and recognition, since changing J's over a time
series introduces a new source of variability in the analysis that must
be accounted for. The different J values, as required by differing noise
conditions, generate different spectral shapes and dynamics of the spectra.
This means that the training system must contend with a new source
of variability due to the change in processing strategy that is adaptively
determined from the data. One approach to handle this variability is by
doing spectral mapping. In the current version, spectral mapping is
performed whenever J-Rasta processing is used with adaptive Js.

Spectral mapping - transform the spectrum processed with a J value
                   corresponding to noisy speech to a spectrum processed
                   with a J value for clean speech. In other words, we
                   find a mapping between log(1+xJ) and log(1+xJref)
                   where Jref is the reference J, i.e. the J value for
                   clean speech.
                   For this approach, we have used a multiple regression
                   within each critical band. In principle, this solution
                   reduces the variability due to the choice of J, and so
                   minimizes the effect on the training process.

How this works is:

1) Training of the recognizer:
   -- Train the recognizer with clean speech processed with J = 1.0e-6, a
      suitable J value for clean speech.

2) Finding the relationship between the spectrum corresponding to different
   Jah values and the spectrum corresponding to J = 1.0e-6
   -- For each of the Jahs in the set {3.16e-7, 1.0e-7, 3.16e-8, 1.0e-8,
      3.16e-9, 1.0e-9}, find a mapping relationship of the corresponding
      bandpass filtered spectrum to the spectrum corresponding to J =
      1.0e-6. In other words, find a set of mapping coefficients for each
      Jah to 1.0e-6. The mapping method is discussed below.

3) Extracting the speech features for the testing speech data
   -- obtain the critical band values as in PLP
   -- estimate the noise energy and thus the Jah value. Call this Jah value
      J(orig).
   -- Pick a Jah from the set {3.16e-7, 1.0e-7, 3.16e-8, 1.0e-8, 3.16e-9, 1.0e-9}
      that is closest to J(orig) and call this J(quant).
   -- perform the non-linear transformation of the spectrum using
      log(1 + J(quant) * X).
   -- bandpass filter the transformed critical band values.
   -- use the set of mapping coefficients for J(quant) to do the spectral
      mapping or spectral transformation.
   -- preemphasize via the equal-loudness curve and then perform amplitude
      compression using the intensity-loudness power law as in PLP
   -- take the inverse of the non-linear function.
   -- compute the cepstral parameters for the AR model.

How regression coefficients are computed in our experiment:

    In order to map critical band values (after bandpass filtering) processed
    with different J values to those processed with J = 1.0e-6, J-Rasta
    filtered critical band outputs from 10 speakers (excluded from the training
    and testing sets) are used to train multiple regression models.

    For example, for mapping from J = J(quant) to J = 1.0e-6, the regression
    equation can be written as:

    Yi = B2i * X2 + B3i * X3 + ... + B16i * X16 + B17i       (**)

    where Yi = i-th bandpass filtered critical band processed with J = 1.0e-6,
               i = 2, ..., 16

          X2, X3, ... X16    2nd to 16th bandpass filtered critical band values
                             processed with J = J(quant), where
                             J(quant) is in the set
                             {3.16e-7, 1.0e-7, 3.16e-8, 1.0e-8, 3.16e-9, 1.0e-9}

          B2i, B3i ... B17i  are the 16 mapping coefficients

    For equation (**), we have made the assumption that the sampling frequency
    is 8kHz and the number of critical bands is seventeen. The first and
    the last bands extend into forbidden territory -- negative or greater
    than Nyquist frequencies. Thus these two bands are ignored for most
    of the analysis. Their values are made equal to the adjacent band's just
    before the autoregressive all-pole modeling. This is why we only make
    Yi dependent on bandpass filtered critical bands X2, X3, ... X16, altogether
    fifteen critical bands.

    The default set of mapping coefficients is stored in map_weights.dat.
    This is suitable for a sampling frequency of 8kHz and 17 critical bands.
    Users who have a different setup may want to find their own set of
    mapping coefficients. This can be done by using the command options
    -R and -f. Option -R allows you to get bandpass filtered critical band
    values as output instead of cepstral coefficients.
    These outputs can be used as regression data. A simple multiple
    regression routine can be used to generate the mapping coefficients
    from these regression data. These mapping coefficients can be stored
    in a file. Option -f allows you to use this file to replace the
    default file map_weights.dat.

The format of the mapping coefficient file is:

     beginning of file
     <Total number of Jahs, for the example shown above, it is [7]>
     <# of critical bands, for the setup for 8kHz, this is [15]>
     <# of mapping coefficients/band, for the setup for 8kHz, this is [16]>

     <The J for clean speech, [1.0e-6]>

     <mapping coefficients for Y2, [B22, B32, B42,...]>
     <mapping coefficients for Y3, [B23, B33, B43 ...]>
        |
        |
        |
        V
     <mapping coefficients for Y16>

     <The second largest Jah besides 1e-6, [3.16e-7]>
        |
        |      mapping coefficients
        |
        |
        V
     <The third largest Jah besides 1e-6, [1.0e-7]>
        |
        |      mapping coefficients
        |
        |
        V
        .
        .
        .
        .
     end of file

*********************************************************************/

#include <stdio.h>
#include "others/mymath.h"
#include "rasta.h"
#include "functions.h"

/******************************************************/

void param_init(struct param* pptr)
{
  pptr->winsize  = TYP_WIN_SIZE;
  pptr->stepsize = TYP_STEP_SIZE;
  pptr->sampfreq = PHONE_SAMP_FREQ;
  pptr->polepos = POLE_DEFAULT;
  pptr->order = TYP_MODEL_ORDER;
  pptr->lift = TYP_ENHANCE;
  pptr->winco = HAMMING_COEF;
  pptr->rfrac = ONE;
  pptr->jah = JAH_DEFAULT;
  pptr->gainflag = TRUE;
  pptr->lrasta = FALSE;
  pptr->jrasta = FALSE;
  pptr->cJah = FALSE;
  pptr->mapcoef_fname = "map_weights.dat";
  pptr->crbout = FALSE;
  pptr->comcrbout = FALSE;
  pptr->infname = "-";          /* used for stdin */
  pptr->outfname = "-";         /* used for stdout */
  pptr->num_fname = NULL;       /* file with RASTA polynomial numerator */
  pptr->denom_fname = NULL;     /* file with RASTA polynomial denominator */
  pptr->ascin = FALSE;
  pptr->ascout = FALSE;
  pptr->debug = FALSE;
  pptr->smallmask = FALSE;
  pptr->espsin = FALSE;
  pptr->espsout = FALSE;
  pptr->matin = FALSE;
  pptr->matout = FALSE;
  pptr->swapbytes = FALSE;
  pptr->nfilts = NOT_SET;
  pptr->nout = NOT_SET;
  pptr->online = FALSE;         /* If set, do frame-by-frame analysis
                                   rather than reading in whole file first */
  pptr->HPfilter = FALSE;
  pptr->history = FALSE;
  pptr->hist_fname = "history.out";
}

void init_param(struct fvec *sptr, struct param *pptr)
{
  int overlap, usable_length;
  float tmp;