.\" @(#)formant.1	1.20 20 Sep 1997  ESI
.TH FORMANT 1\-ESPS 20 Sep 1997
.ds ]W "\fI\s+4\ze\h'0.05'e\s-4\v'-0.4m'\fP\(*p\v'0.4m'\ ERL
.if t .ds - \(em\h'-0.5m'\(em
.if n .ds - ---
.SH "NAME"
formant - speech formant and fundamental frequency (pitch) analysis
.SH "SYNOPSIS"
.B formant
[
.BI \-p " preemphasis"
] [
.BI \-n " num_formants"
] [
.BI \-o " lpc_order"
] [
.BI \-i " frame_step"
] [
.BI \-w " window_duration"
] [
.BI \-W " window_type"
] [
.BI \-t " lpc_type"
] [
.B \-F
] [
.BI \-O " output_path"
] [
.BI \-r " range"
] [
.B \-S
] [
.BI \-f " ds_freq"
] [
.BI \-y " f0_max"
] [
.BI \-z " f0_min"
] [
.BI \-N " nom_f1_freq"
] [
.BI \-B " max_buff_bytes"
] [
.BI \-R " maxrms_duration"
] [
.BI \-M " maxrms_value"
] [
.BI \-x " debug_level"
]
.I " infile"
.SH "DESCRIPTION"
.PP
\fIFormant\fR estimates speech formant trajectories, fundamental
frequency (F0) and related information.  In particular, for each
frame of sampled data, \fIformant\fP estimates formant frequencies,
formant bandwidths, pole frequencies corresponding to linear
predictor coefficients, and voicing information (fundamental
frequency, voiced/unvoiced decision, rms energy, first normalized
autocorrelation, and the first reflection coefficient).
.PP
If only F0 analysis is desired, the newer F0 estimation
program \fIget_f0\fP(1\-ESPS) should be used, since it is faster, more
accurate and more convenient to use.  \fIGet_f0\fR processes data in
stream mode, and so has no constraints on the length of the input data
sequence (or file).
.PP
Dynamic programming is used to optimize F0 and formant trajectory estimates
by imposing frequency continuity constraints.  The formant frequencies
are selected from candidates proposed by solving for the roots of the
linear predictor polynomial computed periodically from the speech
waveform.
The local costs of all possible mappings of the complex
roots to formant frequencies are computed at each frame based on the
frequencies and bandwidths of the component formants for each mapping.
The cost of connecting each of these mappings with each of the
mappings in the previous frame is then minimized using a modified
Viterbi algorithm.
.PP
The input file
.I infile
is a sampled-data file\*-typically an ESPS FEA_SD file,
though other formats are accepted as well (see
.IR get_feasd_recs (3\-ESPS)).
.I Formant
produces various output files with the same file name body as
\fIinfile\fP (the name body results from removing the last of any
extensions\*-e.g., the name body of "foo.sd" is "foo"), but with
different extensions.  Voicing information is stored in a FEA file
with extension ".f0", formants and bandwidths are stored in a FEA file
with extension ".fb", and pole frequencies are stored in an ASCII file
with extension ".pole".  For details on these and related files that
are relevant for use with \fIxwaves\fP, see "FILE FORMATS", below.
The \fB\-O\fP option permits specification of an alternate output path
in cases where it is undesirable or impossible to write in the input
directory.
.PP
If the sampling frequency of the input speech file is greater than 10
kHz (or the \fB\-f\fP specified value), \fIformant\fP downsamples the
input file to the appropriate sampling rate and saves the results in
an ESPS FEA_SD file with a ".ds" extension.  The input signal, possibly
downsampled, is then high-pass filtered to remove low-frequency rumble
(cut-off at approximately 80 Hz), with the result stored in an ESPS FEA_SD
file with a ".hp" extension.  The ".hp" file is then used for the
F0 and formant frequency estimates in the manner described above.
.PP
Preemphasis is applied prior to the linear prediction analysis in
order to compensate partially for voice source and radiation
characteristics.
The high-pass filtering is intended to provide a
zero-mean signal for linear prediction analysis when there is a
possibility of residual DC because the real zero of the preemphasis
filter is within the unit circle.  For reliable F0 estimation based on
cross correlations, it is essential that the DC and rumble be removed,
which is another reason for the high-pass filtering.
.PP
If ".ds" or ".hp" files already exist in the current directory when
\fIformant\fP is run, they are used directly and not recomputed.
This shortcut is intended to save time when analysis conditions are
varied.  It is, however, somewhat error-prone, so check which of these
files are present before running the program.
.SH OPTIONS
.PP
.TP
.BI \-x " debug_level" " \fR[0]\fP"
Values greater than 0 cause messages to print to stderr.
.TP
.BI \-p " preemphasis" " \fR[.7]\fP"
Specifies the preemphasis constant to use before linear predictor
analysis.  Possible values range from 0 to 1.
.TP
.BI \-n " num_formants" " \fR[4]\fP"
Specifies the number of formants to attempt to track.  For adult speakers, this
number is normally less than or equal to the sampling frequency in kHz
(after downsampling) divided by 2, and less than or equal to
(\fIlpc_order\fP \- 4) / 2.
Currently, a maximum of seven formants is supported.
.TP
.BI \-w " window_duration" " \fR[.049]\fP"
Specifies the length of the data window in seconds over which the
linear predictor analysis takes place.  Note that this default window
size is intended for use with a cos**4 window.  The equivalent length
for a Hanning window would be about 25 ms.
.TP
.BI \-W " window_type" " \fR[cos**4]\fP"
Specifies the type of data window to apply to the data prior to
linear predictor analysis.  Possible values are 0 (rectangular),
1 (Hamming), 2 (cos**4) and 3 (Hanning).
.TP
.BI \-i " frame_step" " \fR[.01]\fP"
Specifies the step size in seconds between frames.
This determines
the amount by which the onset of the data window is moved between
analysis frames.
.TP
.BI \-o " lpc_order" " \fR[12]\fP"
Specifies the order of the linear predictor analysis done in each
frame.  Maximum order is 30.
.TP
.BI \-f " ds_freq" " \fR[10000]\fP"
Specifies the sampling frequency of the data to be used in the voicing
and formant frequency analysis.  If \fIds_freq\fP is lower than the
input file's sample frequency, the data is downsampled.  Otherwise,
the input sample rate prevails.
.TP
.BI \-N " nom_f1_freq" " \fR[500]\fP"
Specifies the nominal value of the first formant frequency.  This
value is used by the program to adjust the nominal values of all other
formants and of the ranges over which the formants are permitted to
exist.  The default value of 500 Hz assumes that the vocal tract length
is 17 cm and that the speed of sound is 34000 cm/sec.  Nominal F1
values scale directly with sound velocity and inversely with
vocal-tract length.
.TP
.BI \-t " lpc_type" " \fR[autocorrelation]\fP"
Specifies the linear predictor analysis method.  Possible values
include 0 (autocorrelation) and 1 (stabilized covariance).  If the
stabilized covariance method is chosen, however, \fIwindow_duration\fP
(\fB\-w\fP) is set to .025, \fIpreemphasis\fR (\fB\-p\fP) is set to
exp(\-1800*pi/samp_freq), and \fIwindow_type\fR (\fB\-W\fP) is set to
rectangular.
.TP
.BI \-y " f0_max" " \fR[maximum F0 value]\fP"
Specifies the maximum F0 value to search for.  Default is 500 Hz.
.TP
.BI \-z " f0_min" " \fR[minimum F0 value]\fP"
Specifies the minimum F0 value to search for.  Default is 60 Hz.
.TP
.B \-S
Enable the creation of \fISIGnal\fP files for the F0 estimates in
addition to the normal ESPS files.  In previous versions of
\fIformant\fP this was the default, since \fIxwaves\fP required files
with \fISIGnal\fP headers to correctly invoke the special F0 display.
\fIXwaves\fP now develops this display from the ESPS files.
Those who
still desire the old behavior may reenable it with this option.
.TP
.BI \-r " range"
Select a subrange of points to be processed, using the format
.IR start\-end ,
.I start:end
or
.IR start:+count .
Either the start or the end may be omitted; the beginning or the end
of the file is used if no alternative is specified.  If no range is
specified, the entire input file will be processed.
.IP
If multiple files were specified, the same range from each file is processed.
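.IP
As an illustration only, the three range forms accepted by \fB\-r\fP could
be resolved as in the Python sketch below.  It is not taken from the
\fIformant\fP source: the function name parse_range, its arguments
(the first and last valid points of the file), and the reading of
+count as an offset added to the start point are all assumptions.
.nf
```python
def parse_range(spec, first, last):
    """Resolve a range spec to an inclusive (start, end) pair.

    spec  -- "start-end", "start:end" or "start:+count"; either side
             of the separator may be empty
    first -- first valid point of the file (hypothetical argument)
    last  -- last valid point of the file (hypothetical argument)
    """
    if not spec:
        # No range given: the entire file is processed.
        return first, last
    if ":" in spec:
        start_text, end_text = spec.split(":", 1)
    elif "-" in spec:
        start_text, end_text = spec.split("-", 1)
    else:
        raise ValueError("range needs a ':' or '-' separator: " + spec)

    # An omitted start defaults to the beginning of the file.
    start = int(start_text) if start_text else first

    if end_text.startswith("+"):
        # Assumption: "+count" is an offset from the start point.
        end = start + int(end_text[1:])
    elif end_text:
        end = int(end_text)
    else:
        # An omitted end defaults to the end of the file.
        end = last

    if start > end:
        raise ValueError("range start exceeds range end: " + spec)
    return start, end
```
.fi
.IP
Under these assumptions, parse_range("100:+50", 1, 1000) yields
(100, 150), and parse_range(":200", 1, 1000) yields (1, 200).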
