.\" Copyright (c) 1996 Entropic Research Laboratory, Inc.; All rights reserved.
.\" %W% %G% ERL
.ds ]W (c) 1993 - 1996 Entropic Research Laboratory, Inc.
.TH GET_F0 1\-ESPS 9/5/96
.SH NAME
.nf
get_f0 \- robust analysis of speech fundamental frequency (pitch tracking)
.fi
.SH SYNOPSIS
.B get_f0
[
.BI \-P " param_file"
] [
.BI \-{pr} " range"
] [
.BI \-s " range"
] [
.BI \-S " frame_step"
] [
.BI \-i " frame_step"
] [
.BI \-x " debug_level"
]
.I in_file
.I out_file
.PP
.SH DESCRIPTION
\fIGet_f0\fR implements a fundamental frequency (F0) estimation
algorithm using the normalized cross-correlation function and dynamic
programming.  The algorithm implemented here is described exactly in
the reference cited below.
.PP
The input file \fIin_file\fR is a standard FEA_SD sampled-data file.
The output file \fIout_file\fR is a FEA file containing four
fields: \fIF0\fR for the fundamental frequency estimate, \fIprob_voice\fR
for the "probability of voicing", \fIrms\fR for local root mean square
measurements, and \fIac_peak\fR for the peak normalized
cross-correlation value that was found to determine the output F0.
The RMS value of each record is computed over a 30 msec Hanning
window whose left edge is placed 5 msec before the beginning of the
frame.  In unvoiced regions, \fIac_peak\fR is the largest
cross-correlation value found at any lag.  The \fIprob_voice\fR element
takes on only two values, 0 and 1, whereas in the older \fIformant\fR
program it was a graded (though error-prone) measure.
.PP
Note that the analysis frame onset is well defined as the point where
the reference window used for cross-correlation begins.  However, the
effective frame size is really a function of the local F0, except for
the RMS measurement as stated above.
.PP
If \fIin_file\fR is replaced by "-", the standard input is read.
If \fIout_file\fR is replaced by "-", the standard output is
written.  The processing is truly stream oriented.
There is no limit
on the length of the input sequence.
.PP
\fIGet_f0\fR does not remove the DC component from the input signal.
Large DC offsets will impair the voiced/unvoiced decision and lead to
misleading RMS measurements, especially in low-amplitude regions.  It
is recommended that the DC component be removed by a program such as
\fIrem_dc\fR(1\-ESPS) before using \fIget_f0\fR.
.PP
Note that \fIget_f0\fR is designed to replace the pitch tracking
function of the older \fIformant\fR program.  \fIGet_f0\fR is both
faster and more accurate than \fIformant\fR, and does not have the
batch-processing limitations of the latter.
.PP
.SH OPTIONS
.PP
The following options are supported:
.TP
.BI \-P " param_file \fR[params]\fP"
Uses the parameter file \fIparam_file\fR rather than the default, which is \fIparams\fR.
.TP
.BI \-r " first:last"
.TP
.BI \-r " first:+incr"
Determines the range of points to process from the input file.  In the
first form, a pair of unsigned integers gives the first and last points
of the range.  If \fIfirst\fR is omitted, 1 is used.  If \fIlast\fR is
omitted, the last point in the file is used.  The second form is
equivalent to the first with \fIlast = first + incr\fR.  If no range
is specified, the whole input file is processed.
.TP
.BI \-p " range"
Same as the \fB\-r\fR option.  (Note that this is a change from version
5.0, where \fB\-p\fR was used for the frame interval option.)
.TP
.BI \-s " first:last"
.TP
.BI \-s " first:+incr"
Same function as the \fB\-r\fR option, but specifies the range of input
data in seconds.
.TP
.BI \-i " frame_step \fR[0.01]\fP"
Specifies the frame step in seconds.  Valid values lie between
1/sampling rate and 0.1.
.TP
.BI \-S " frame_step \fR[0.01 * sampling frequency]\fP"
Same as the \fB\-i\fR option, but specifies the frame step in samples.
.TP
.BI \-x " debug_level \fR[0]\fP"
If
.I debug_level
is positive,
.I get_f0
prints debugging messages and other information on the standard error
output.  The messages proliferate as the
.I debug_level
increases.  If \fIdebug_level\fP is 0 (the default), no messages are
printed.
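.PP
The range and frame-step rules of the \fB\-r\fR/\fB\-p\fR and
\fB\-i\fR/\fB\-S\fR options reduce to simple arithmetic.  The Python
sketch below restates them for illustration only; it is not code from
\fIget_f0\fR, and the helper names \fIparse_range\fR and
\fIframe_step_samples\fR are hypothetical.  It assumes 1-based, inclusive
sample indices, requires the colon in every range, raises an error for
ranges that fall outside the file, and rounds the \fB\-i\fR to \fB\-S\fR
conversion to the nearest sample; \fIget_f0\fR itself may handle these
cases differently.
.nf
```python
# Hypothetical helpers restating the -r/-p and -i/-S rules of get_f0;
# this is an illustrative sketch, not code taken from ESPS.


def parse_range(spec, n_samples):
    """Resolve "first:last" or "first:+incr" to 1-based (first, last).

    An omitted first defaults to 1 and an omitted last defaults to the
    last point in the file; "first:+incr" means last = first + incr.
    Requiring the colon and rejecting out-of-file ranges are
    assumptions of this sketch.
    """
    first_text, sep, last_text = spec.partition(":")
    if not sep:
        raise ValueError("range must have the form first:last or first:+incr")
    first = int(first_text) if first_text else 1
    if last_text.startswith("+"):
        last = first + int(last_text[1:])  # second form: first:+incr
    elif last_text:
        last = int(last_text)              # first form: first:last
    else:
        last = n_samples                   # omitted last: end of file
    if not 1 <= first <= last <= n_samples:
        raise ValueError("range lies outside the input file")
    return first, last


def frame_step_samples(frame_step_sec, sample_rate):
    """Convert an -i frame step in seconds to the equivalent -S value.

    The valid interval [1/sample_rate, 0.1] s comes from the -i option;
    rounding to the nearest sample is an assumption of this sketch.
    """
    if not 1.0 / sample_rate <= frame_step_sec <= 0.1:
        raise ValueError("frame_step must lie in [1/sample_rate, 0.1] s")
    return int(round(frame_step_sec * sample_rate))


if __name__ == "__main__":
    n = 48000  # e.g. 3 s of speech sampled at 16 kHz
    print(parse_range("1000:2000", n))
    print(parse_range("1000:+500", n))
    print(parse_range(":", n))
    print(frame_step_samples(0.01, 16000))
```
.fi
.PP
With a 16 kHz input, for example, the default frame step of 0.01 s
corresponds to a \fB\-S\fR value of 160 samples.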
.SH ESPS PARAMETERS
\fIGet_f0\fR is designed for use "as is" with little or no need to
change its parameters under most circumstances.  Exceptions include
the frame rate and the minimum and maximum F0 to track.  If the F0
estimates from \fIget_f0\fR do not appear reasonable, check your
signal or signal conditioning before adjusting any parameters.
Common causes of difficulty include a strong periodic component in
the background, which keeps the voicing on, or a significant DC
offset, which causes poor RMS estimates.  The following parameter
file entries are supported.
.TP
.I "start - integer"
.IP
The first point in the input sampled data file that is processed.  A
value of 1 denotes the first sample in the file.  This is only read
if the \fB\-p\fP option is not used.  If it is not in the parameter
file, the default value of 1 is used.
.TP
.I "nan - integer"
.IP
The total number of data points to process.  If
.I nan
is 0, the whole file is processed.
.I Nan
is read only if the \fB\-p\fP option is not used.  (See the discussion
under \fB\-r\fP.)
.TP
.I "frame_step - float"
Analysis frame step interval, in seconds.  Computation increases as
1/\fIframe_step\fR.  Valid values lie in [1/sampling rate, 0.1].
.TP
.I "min_f0 - float"
Minimum F0 to search for.  Note that computational cost grows as
1/\fImin_f0\fR.  Valid values are greater than or equal to (Fs/10000) Hz,
where Fs is the sample rate of the input speech signal.  Default is
50.0.  \fImin_f0\fR and \fImax_f0\fR determine the number of
cross-correlation lags to compute for each frame.
.TP
.I "max_f0 - float"
Maximum F0 to search for.  Valid values are greater than \fImin_f0\fR
and smaller than one half the sampling rate of the input file.  Default
is 550.0.  \fImin_f0\fR and \fImax_f0\fR determine the number of
cross-correlation lags to compute for each frame.
.PP
The default settings of the following parameters were determined by
an exhaustive search of the parameter space using hand-verified
data as the reference.
Twiddling with the parameters is not recommended.
.TP
.I "cand_thresh - float"
Determines the cross-correlation peak height required for a peak to be
considered a pitch-peak candidate.  Valid values lie in [.01, .99].
Default is 0.3.
.TP
.I "lag_weight - float"
Amount of weight given to the shortness of the proposed pitch interval.
Higher numbers make high F0 estimates more likely.  Valid values lie in
[0, 1].  Default is 0.3.
.TP
.I "freq_weight - float"
Strength of F0 continuity.  Higher numbers impose smoother contours.
Valid values lie in [0, 1].  Default is 0.02.
.TP
.I "trans_cost - float"
Fixed cost of making a voicing-state transition.  Higher numbers
discourage state changes.  Valid values lie in [0, 1].  Default is 0.005.
.TP
.I "trans_amp - float"
Voicing-state transition cost modulated by the local rate of amplitude
change.  Higher numbers discourage transitions EXCEPT when the rate of
amplitude change is great.  Valid values lie in [0, 100].  Default is 0.5.
.TP
.I "trans_spec - float"
Voicing-state transition cost modulated by the local rate of spectral
change.  Higher numbers discourage transitions EXCEPT when the rate of
spectral change is great.  Valid values lie in [0, 100].  Default is 0.5.
.TP
.I "voice_bias - float"
Determines a fixed preference for the voiced or unvoiced state.  Positive
numbers encourage the voiced hypothesis, negative numbers the
unvoiced.  Valid values lie in [-1, 1].  Default is 0.0.
.TP
.I "double_cost - float"
The cost of a rapid one-octave (up or down) F0 change.  High numbers
discourage any jumps; low numbers permit octave jumps.  Valid values lie
in [0, 10].  Default is 0.35.
.TP
.I "wind_dur - float"
Duration of the correlation window, in seconds.  Computation increases
directly as \fIwind_dur\fR.  Valid values lie in [10/sampling rate, .1].
Default is 0.0075.
.TP
.I "n_cands - integer"
The maximum number of correlation peaks considered as possible
F0-peak candidates in any frame.  At most, the top \fIn_cands\fR
candidates are considered in each frame.
The computational cost grows approximately
as \fIn_cands\fR SQUARED.  Valid values lie in [3, 100].  Default is 20.
.PP
.SH ESPS COMMON
No ESPS common parameter processing is supported.
.PP
.SH ESPS HEADERS
The usual \fIrecord_freq\fR and \fIstart_time\fR header items, as well
as all supported parameters, are stored as generic header items.  In
addition, the \fIrecord_freq\fR header item of the \fIin_file\fR input
file is saved as the \fIsrc_sf\fR header item.
.PP
.SH FUTURE CHANGES
In a future release, DC will be removed prior to RMS computation.  Also,
an optional element may be added to the output vector to include RMS
computed on the preemphasized speech.
.PP
.SH EXAMPLES
.PP
.SH ERRORS AND DIAGNOSTICS
.PP
.SH BUGS
.PP
None known.
.SH REFERENCES
Talkin, D. (1995). A Robust Algorithm for Pitch Tracking (RAPT). In
Kleijn, W. B. and Paliwal, K. K. (Eds.), \fISpeech Coding and
Synthesis\fR. New York: Elsevier.
.PP
.SH "SEE ALSO"
FEA(5\-ESPS), epochs(1\-ESPS), formant(1\-ESPS), rem_dc(1\-ESPS)
.PP
.SH AUTHORS
David Talkin, Derek Lin
