.\" Copyright (c) 1996 Entropic Research Laboratory, Inc.; All rights reserved
.\" @(#)epochs.1	1.15 4/2/97 ERL
.ds ]W (c) 1996 Entropic Research Laboratory, Inc.
.TH  EPOCHS 1\-ESPS 4/2/97
.SH NAME
.nf
epochs \- glottal-pulse analysis using dynamic programming
.fi
.SH SYNOPSIS
.B
epochs
[
.BI -P " param_file"
][
.BI -p
][
.BI -n
][
.BI -b " in_labelfile"
][
.BI -f " in_f0file"
][
.BI -o " out_labelfile"
][
.BI -x " debug_level"
]
.I in_file
.I out_file
.PP
.SH DESCRIPTION
\fIEpochs\fR uses dynamic programming optimization to determine both
the polarity and locations of the pitch epoch pulses (times of vocal fold
closure) during voiced speech.  The algorithm will work for any
quasi-periodic input signal, but is currently tuned for operation on
integrated LPC speech residual signals.
.PP
The input file \fIin_file\fR is typically a 16-bit PCM
inverse-filtered speech file of FEA_SD type processed to approximate
the derivative of the glottal volume velocity.  A speech residual
signal produced by \fIget_resid\fR(1-ESPS) would be appropriate.  The
output file \fIout_file\fR is a FEA_SD file with impulses at the epoch
locations and zeros elsewhere.  The polarity of the output pulses
reflects the polarity of the peaks chosen in the input signal.
.PP
Unvoiced speech in \fIin_file\fR often produces false pitch epoch
estimates.  Consider using \fImask\fR(1-ESPS) to gate
out unwanted pitch epochs by masking them against a better
voiced/unvoiced estimator, such as that produced by
\fIget_f0\fR(1-ESPS).
.PP
.SH OPTIONS
.PP
The following options are supported:
.TP
.BI -P " param_file \fR[params]\fP"
Uses the parameter file \fIparam_file\fR
rather than the default, which is \fI$ESPS_BASE/lib/params/Pepochs\fP.
.TP
.BI -p
.TP
.BI -n
Specifies that only positive (\fI-p\fR) or negative (\fI-n\fR) local
extrema in \fIin_file\fR need be considered as potential epochs.
If neither option is specified, all local extrema are considered.
The processing time can be cut in half or more by specifying one of
these options, if the polarity of \fIin_file\fR is known in advance.
\fIIn_file\fR may be examined with \fIwaves+\fR to determine the
polarity of any given data set.  NOTE: if the pulse polarity is incorrectly
specified, analysis results will range from tolerable to horrible.
.TP
.BI -b " in_labelfile"
\fIIn_labelfile\fR is the name of a label file (as created by \fIwaves+\fR;
"V" label for voiced, "O" label for unvoiced and others) used to gate
on the epoch finder only in selected regions.  Best epoch-location results
(especially near voice onset/offset) are obtained if neither the \fI-b\fR
nor the \fI-f\fR option is used.
.TP
.BI -f " in_f0file"
\fIIn_f0file\fR is the name of a FEA file containing the
\fIprob_voice\fR field (as created by \fIget_f0\fR(1-ESPS)) used to
gate on the epoch finder only in selected regions.  Best
epoch-location results (especially near voice onset/offset) are
obtained if neither the \fI-b\fR nor the \fI-f\fR option is used.
.TP
.BI -o " out_labelfile"
\fIOut_labelfile\fR is the name of a file to contain epoch marks in
\fIwaves+\fR "label" format (see \fIxlabel\fR(1-ESPS)).
.TP
.BI -x " debug_level \fR[0]\fP"
If \fIdebug_level\fR is positive, \fIepochs\fR prints debugging
messages and other information on the standard error
output.  The messages proliferate as the \fIdebug_level\fR
increases.  If \fIdebug_level\fP is 0 (the default), no messages are
printed.
.SH ESPS PARAMETERS
The following parameters are supported.  Except for the \fIpolarity\fR parameter,
all parameters have default settings determined by
exhaustive search of the parameter space using electro-glottographic
data as the reference.  Changing the parameters is NOT recommended.
A possible
exception is \fIclip_level\fR, which can be increased to
reduce the number of peaks considered by the dynamic programming.  This
will speed up operation, but can result in loss of peaks for some
voicing conditions (e.g. extreme breathy voice).
.TP
.I polarity - string
Similar to the \fI-p\fR and \fI-n\fR command-line options:
a value of "+" considers only peaks with positive polarity as
potential epochs; a value of "-" considers only peaks with negative
polarity; and a value of
"NONE" considers all peaks as potential epoch locations.
.PP
All parameters below have values between 0.0 and 1.0.
.TP
.I clip_level - float
The clipping level that determines what fraction of the local RMS a peak
must exceed to be considered as an epoch candidate.  Default is 0.5.
.TP
.I peak_quality_wt - float
Weight given to peak quality.  Default is 1.0.
.TP
.I period_dissim_cost - float
Cost of dissimilarities in the shape of consecutive peaks.  Default is 0.4.
.TP
.I peak_qual_dissim_cost - float
Cost for peak quality dissimilarity.  Default is 0.05.
.TP
.I shape_to_peak - float
Relative contribution of shape to peak quality.  Default is 0.35.
.TP
.I freq_dh_cost - float
Cost of each frequency doubling/halving.  Sometimes the F0 really does
jump by octaves.  Low values make octave jumps more likely.  Default is
0.7.
.TP
.I peak_award - float
Award for selecting a peak.  Determines average peak density.  Default is
0.4.
.TP
.I v_uv_cost - float
Cost for V-UV transition.  High values discourage state transitions.  Default
is 0.2.
.TP
.I uv_v_cost - float
Cost for UV-V transition.  High values discourage state transitions.  Default
is 0.2.
.TP
.I rms_onoff_cost - float
Cost for RMS rise/fall appropriateness.  This assumes RMS rises at
voice onset and falls at voice offset.  The extent to which voicing
transition costs are relaxed at these points is adjusted by this
factor.  Default is 0.3.
.TP
.I uv_cost - float
Cost of unvoiced classification.
Default is 0.7.
.TP
.I jitter - float
Reasonable inter-period variation expressed as a fraction of a period.  Default
is 0.1.
.PP
.SH ESPS COMMON
No ESPS common parameter processing is supported.
.PP
.SH ESPS HEADERS
The generic header items saved are the standard header items, \fIstart_time\fR
and \fIrecord_freq\fR, and any parameter under \fBESPS PARAMETERS\fR if its
value is obtained from a parameter file.
.PP
.SH FUTURE CHANGES
.PP
.SH EXAMPLES
.PP
The following fragment of a Bourne shell script demonstrates how
epochs might be located in an ESPS speech file named "spch.sd".  The
resulting "spch.lab" file contains epoch marks expressed as times in a
form compatible with the \fIxlabel\fR program.  The "spch.pe" file
represents the epoch locations as a series of pulses in a file directly
viewable using \fIxwaves\fR.  See also the example in \fIlp_syn\fR(1-ESPS).
.PP
.nf
.na
.ne 10
# Determine the sample rate of the original speech file.
sf=`hditem -i record_freq spch.sd`
#
# Establish the window size and frame step for periodic analyses.
size=.02
step=.005
#
# Get analysis step size and window length in samples.
ssize=`echo $sf $size \\* p q | dc`
sstep=`echo $sf $step \\* p q | dc`
#
# Standard rule-of-thumb computation for LPC order.
order=`echo $sf 1000 / 2 + p q | dc`
#
# Compute reflection coefficients using a standard set of parameters
refcof -z -r1:1000000 -e.97 -x0 -wHANNING -l$ssize -S$sstep \\
       -o$order spch.sd spch.rc
#
# Get a high-resolution estimate of F0 and a reasonably accurate
#  voicing-state estimate.
get_f0 -i $step spch.sd spch.f0
#
# Compute the LPC residual (approximates the glottal flow derivative).
get_resid -a 1 -i 0.0 spch.sd spch.rc spch.res
#
# Blank out the residual signal in the unvoiced regions.
mask spch.f0 spch.res spch.resm
#
# Find the points of glottal closure in the voiced regions.
epochs  -o spch.lab spch.resm spch.pe
.fi
.ad
.PP
.SH ERRORS AND DIAGNOSTICS
.PP
.SH BUGS
.PP
None known.
.SH REFERENCES
Talkin, D., "Voicing epoch determination with dynamic programming,"
\fIJ. Acoust.
Soc. Amer.\fP, 85, Supplement 1, 1989.
.sp
Talkin, D. and Rowley, J., "Pitch-Synchronous analysis and synthesis
for TTS systems," \fIProceedings of the ESCA Workshop on Speech
Synthesis\fP, C. Benoit, Ed., Imprimerie des Ecureuils, Gieres, France,
1990.
.PP
.SH "SEE ALSO"
.nf
\fIrefcof\fP(1-ESPS), \fIget_resid\fP(1-ESPS), \fImask\fP(1-ESPS),
\fIget_f0\fP(1-ESPS), \fIps_ana\fP(1-ESPS)
.fi
.PP
.SH AUTHORS
David Talkin, Derek Lin