<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<!-- saved from url=(0053)http://www.ee.ic.ac.uk/hp/staff/dmb/voicebox/lpc.html -->
<HTML><HEAD><TITLE>Matlab routines for Linear Predictive Coding (LPC)</TITLE>
<META content="text/html; charset=iso-8859-1" http-equiv=Content-Type>
<META content="D:\Program Files\Microsoft Office\Office\html.dot" name=Template>
<META content="MSHTML 5.00.2614.3500" name=GENERATOR></HEAD>
<BODY link=#0000ff vLink=#800080>
<H1>Matlab routines for Linear Predictive Coding (LPC)</H1>
<HR>
<P>Return to <A
href="http://www.ee.ic.ac.uk/hp/staff/dmb/voicebox/voicebox.html">voicebox home
page</A></P>
<HR>
<H2>Data Format</H2>
<P>All the LPC routines described in this section can process several frames
together. Each frame corresponds to a single row of the data matrix; if there is
only one frame then the matrix must be a row vector rather than a column
vector.</P>
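<P>As a minimal sketch of this layout, the fragment below builds a hypothetical
coefficient matrix with one frame per row and shows how a single frame held as a
column vector can be turned into the required row vector. The variable names and
numerical values are illustrative only and are not taken from Voicebox.</P>
<PRE>
ar3 = [1 -0.9 0.2; 1 -1.2 0.5; 1 -0.5 0.1];  % three frames, one per row (made-up values)
col = [1; -0.9; 0.2];                        % a single frame stored as a column vector
row = col.';                                 % transpose it to a row vector before use
nf  = size(ar3,1);                           % number of frames = number of rows
</PRE>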
<H2>LPC Analysis</H2>
<P>Two routines are provided: <A
href="http://www.ee.ic.ac.uk/hp/staff/dmb/voicebox/txt/lpcauto.txt">lpcauto</A>
for autocorrelation analysis and <A
href="http://www.ee.ic.ac.uk/hp/staff/dmb/voicebox/txt/lpccovar.txt">lpccovar</A>
for covariance analysis.</P>
<P>The analysis order, <I>p</I>, denotes the number of poles in the resultant
autoregressive filter. The appropriate value for <I>p</I> is typically
2+<I>f<SUB>s</SUB></I>/1kHz where <I>f<SUB>s</SUB></I> is the sample frequency.
This expression assumes that sound takes about &frac12; ms to travel the length of the
vocal tract.</P>
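<P>The rule of thumb above can be evaluated for a few common sample frequencies as
follows. The sample frequencies are example values, and rounding
<I>f<SUB>s</SUB></I>/1kHz to the nearest integer is an assumption made here so
that <I>p</I> is a whole number.</P>
<PRE>
fs = [8000 11025 16000];     % example sample frequencies in Hz
p  = 2 + round(fs/1000);     % typical analysis order, 2 + fs/1kHz
% p is [10 13 18]
</PRE>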
<H3>Fancy versions of LPC</H3>
<P>Although the analysis order must be the same for all frames, the individual
frame lengths can vary; this allows <I>pitch-synchronous</I> analysis. It is
also possible to restrict the analysis interval to particular segments of each
frame to allow <I>closed-phase</I> analysis. For high pitched voices, the closed
phases may be very short: it is possible in this case to combine the data from
two or more consecutive cycles to give <I>multi-cycle closed-phase</I> analysis.
To obtain reliable estimates, the AR coefficients must be based on at least 2
ms of data.</P>
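<P>As a rough illustration of the 2 ms requirement, the fragment below works out
the minimum number of samples and, for a hypothetical closed-phase duration, how
many consecutive cycles would need to be combined. The sample frequency and the
closed-phase duration are assumed values chosen for the example.</P>
<PRE>
fs        = 16000;                    % sample frequency in Hz (assumed)
tmin_ms   = 2;                        % minimum data duration from the text, in ms
closed_ms = 0.8;                      % hypothetical closed-phase duration per cycle, in ms
nmin = ceil(tmin_ms*fs/1000);         % minimum number of samples: 32
ncyc = ceil(tmin_ms/closed_ms);       % cycles to combine to reach 2 ms: 3
</PRE>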
<H2>LPC Coefficient Representations</H2>
<P>The coefficients generated by LPC analysis can be represented in many
equivalent forms. Voicebox recognizes the coefficient sets listed below and
denotes each with a two-letter mnemonic. The number of coefficients varies: for
an analysis of order <I>p</I> there can be <I>p</I>, <I>p</I>+1, or <I>p</I>+2
coefficients. This is indicated in the table. The meaning of the coefficient
sets is explained below with reference to the lossless tube model of speech
production. Routines are provided to convert each representation to the other
forms indicated: the routine that converts from representation <EM>xx</EM> to
representation <EM>yy</EM> is called lpc<EM>xx</EM>2<EM>yy</EM>.</P>
<P>The routine <A
href="http://www.ee.ic.ac.uk/hp/staff/dmb/voicebox/txt/lpcconv.txt">lpcconv</A>
can be used to determine the sequence of calls needed to convert between
any pair of these representations.</P>
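<P>The naming convention can be sketched in a few lines: the routine name is
assembled from the two mnemonics and is called only if such a file is found on
the Matlab path. The coefficient values are made up, and the single-argument
call is an assumption about the conversion routine's interface; check the
routine's own help text before relying on it.</P>
<PRE>
from  = 'ar';                     % source representation (mnemonic from the table)
to    = 'rf';                     % target representation
fname = ['lpc' from '2' to];      % gives 'lpcar2rf'
ar    = [1 -0.9 0.2];             % one frame as a row vector, order p = 2 (made-up values)
if exist(fname,'file')            % call the routine only when Voicebox is on the path
    rf = feval(fname, ar);        % assumed interface: one input, one output
end
</PRE>
<P>When no direct routine exists, the conversion can go through an intermediate
form. For example, the table lists aa to rf and rf to ar, so one possible route
from aa to ar is lpcaa2rf followed by lpcrf2ar.</P>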
<TABLE border=0 cellPadding=6 cellSpacing=0 width="100%">
<TBODY>
<TR>
<TD vAlign=top width=50><B>Code</B></TD>
<TD vAlign=top width=70><B>Size</B></TD>
<TD vAlign=top width=120><B>Convert from</B></TD>
<TD vAlign=top width=120><B>Convert to</B></TD>
<TD vAlign=top><B>Description</B></TD></TR>
<TR>
<TD vAlign=top width=50>aa</TD>
<TD vAlign=top width=70><I>p</I>+2</TD>
<TD vAlign=top width=120><A
href="http://www.ee.ic.ac.uk/hp/staff/dmb/voicebox/txt/lpcdl2aa.txt">dl</A>,
<A
href="http://www.ee.ic.ac.uk/hp/staff/dmb/voicebox/txt/lpcrf2aa.txt">rf</A></TD>
<TD vAlign=top width=120><A
href="http://www.ee.ic.ac.uk/hp/staff/dmb/voicebox/txt/lpcaa2ao.txt">ao</A>,
<A
href="http://www.ee.ic.ac.uk/hp/staff/dmb/voicebox/txt/lpcaa2dl.txt">dl</A>,
<A
href="http://www.ee.ic.ac.uk/hp/staff/dmb/voicebox/txt/lpcaa2rf.txt">rf</A></TD>
<TD vAlign=top>The <I>area coefficients</I> represent the cross-sectional
areas of the vocal tract segments. The areas are normalised so that
aa(<I>p</I>+2), the effective area of the free space beyond the lips, is
equal to 1. aa(1) is the area at the glottis and is usually near 0.</TD></TR>
<TR>
<TD vAlign=top width=50>am</TD>
<TD vAlign=top width=70>(<EM>p</EM>+1)<SUP>2</SUP></TD>
<TD vAlign=top width=120><A
href="http://www.ee.ic.ac.uk/hp/staff/dmb/voicebox/txt/lpcar2am.txt">ar</A>,
<A
href="http://www.ee.ic.ac.uk/hp/staff/dmb/voicebox/txt/lpcrr2am.txt">rr</A></TD>
<TD vAlign=top width=120></TD>
<TD vAlign=top>An upper unit-triangular matrix containing the AR
coefficients for all orders 0,...,<EM>p</EM>. This matrix is a diagonal
multiple of the Hermitian square root of the symmetric Toeplitz matrix
toeplitz(rr).</TD></TR>
<TR>
<TD vAlign=top width=50>ao</TD>
<TD vAlign=top width=70><I>p</I>+1</TD>
<TD vAlign=top width=120><A
href="http://www.ee.ic.ac.uk/hp/staff/dmb/voicebox/txt/lpcaa2ao.txt">aa</A>,
<A
href="http://www.ee.ic.ac.uk/hp/staff/dmb/voicebox/txt/lpcrf2ao.txt">rf</A></TD>
<TD vAlign=top width=120><A
href="http://www.ee.ic.ac.uk/hp/staff/dmb/voicebox/txt/lpcao2rf.txt">rf</A></TD>
<TD vAlign=top>The <I>area ratios</I> give the ratio of the area of one tube
segment to that of the following segment.</TD></TR>
<TR>
<TD vAlign=top width=50>ar</TD>
<TD vAlign=top width=70><I>p</I>+1</TD>
<TD vAlign=top width=120><A
href="http://www.ee.ic.ac.uk/hp/staff/dmb/voicebox/txt/lpccc2ar.txt">cc</A>,
<A
href="http://www.ee.ic.ac.uk/hp/staff/dmb/voicebox/txt/lpcim2ar.txt">im</A>,
<A
href="http://www.ee.ic.ac.uk/hp/staff/dmb/voicebox/txt/lpcls2ar.txt">ls</A>,
<A
href="http://www.ee.ic.ac.uk/hp/staff/dmb/voicebox/txt/lpcrf2ar.txt">rf</A>,
<A
href="http://www.ee.ic.ac.uk/hp/staff/dmb/voicebox/txt/lpcrr2ar.txt">rr</A>,
<A
href="http://www.ee.ic.ac.uk/hp/staff/dmb/voicebox/txt/lpczz2ar.txt">zz</A></TD>
<TD vAlign=top width=120><A
href="http://www.ee.ic.ac.uk/hp/staff/dmb/voicebox/txt/lpcar2am.txt">am</A>,
<A
href="http://www.ee.ic.ac.uk/hp/staff/dmb/voicebox/txt/lpcar2cc.txt">cc</A>,
<A
href="http://www.ee.ic.ac.uk/hp/staff/dmb/voicebox/txt/lpcar2db.txt">db</A>,
<A
href="http://www.ee.ic.ac.uk/hp/staff/dmb/voicebox/txt/lpcar2ff.txt">ff</A>,
<A
href="http://www.ee.ic.ac.uk/hp/staff/dmb/voicebox/txt/lpcar2im.txt">im</A>,
<A
href="http://www.ee.ic.ac.uk/hp/staff/dmb/voicebox/txt/lpcar2ls.txt">ls</A>,
<A
href="http://www.ee.ic.ac.uk/hp/staff/dmb/voicebox/txt/lpcar2pf.txt">pf</A>,
<A
href="http://www.ee.ic.ac.uk/hp/staff/dmb/voicebox/txt/lpcar2pp.txt">pp</A>,
<A
href="http://www.ee.ic.ac.uk/hp/staff/dmb/voicebox/txt/lpcar2ra.txt">ra</A>,
<A
href="http://www.ee.ic.ac.uk/hp/staff/dmb/voicebox/txt/lpcar2rf.txt">rf</A>,
<A
href="http://www.ee.ic.ac.uk/hp/staff/dmb/voicebox/txt/lpcar2rr.txt">rr</A>,
<A
href="http://www.ee.ic.ac.uk/hp/staff/dmb/voicebox/txt/lpcar2zz.txt">zz</A></TD>
<TD vAlign=top>The <I>autoregressive coefficients</I> or <I>AR
coefficients</I> represent the transfer function from the output flow of
the vocal tract to the input flow. The coefficients are usually normalised
so that ar(1)=1.</TD></TR>
<TR>
<TD vAlign=top width=50>cc</TD>
<TD vAlign=top width=70><I>p</I></TD>
<TD vAlign=top width=120><A
href="http://www.ee.ic.ac.uk/hp/staff/dmb/voicebox/txt/lpcar2cc.txt">ar</A>,
<A
href="http://www.ee.ic.ac.uk/hp/staff/dmb/voicebox/txt/lpcpf2cc.txt">pf</A>,
<A
href="http://www.ee.ic.ac.uk/hp/staff/dmb/voicebox/txt/lpczz2cc.txt">zz</A></TD>
<TD height=145 vAlign=top width=120><A
href="http://www.ee.ic.ac.uk/hp/staff/dmb/voicebox/txt/lpccc2ar.txt">ar</A></TD>
<TD vAlign=top>The <I>complex cepstrum coefficients</I> are actually real
despite their name. They equal the inverse Fourier transform of the log
frequency response of the autoregressive filter. These coefficients do not
include cc(0) which is the DC component of the log frequency
response.</TD></TR>
<TR>
<TD vAlign=top width=50>cw</TD>
<TD vAlign=top width=70><I>p</I></TD>
<TD vAlign=top width=120><A
href="http://www.ee.ic.ac.uk/hp/staff/dmb/voicebox/txt/lpcpp2cw.txt">pp</A></TD>
<TD vAlign=top width=120><A
href="http://www.ee.ic.ac.uk/hp/staff/dmb/voicebox/txt/lpccw2zz.txt">zz</A></TD>
<TD vAlign=top>The roots of the power spectrum polynomial <I>pp</I>. These
are the values of cos(w), normally complex, that make the power spectrum
of the inverse filter equal to zero.</TD></TR>
<TR>
<TD vAlign=top width=50>db</TD>
<TD vAlign=top width=70><I>p</I>+2</TD>
<TD vAlign=top width=120><A
href="http://www.ee.ic.ac.uk/hp/staff/dmb/voicebox/txt/lpcar2db.txt">ar</A></TD>
<TD vAlign=top width=120><A
href="http://www.ee.ic.ac.uk/hp/staff/dmb/voicebox/txt/lpcdb2pf.txt">pf</A></TD>
<TD vAlign=top>The <I>power spectrum</I> of the AR filter expressed in
decibels. The first and last elements of db() are respectively the DC and
Nyquist terms.</TD></TR>
<TR>
<TD vAlign=top width=50>dl</TD>
<TD vAlign=top width=70><I>p</I></TD>
<TD vAlign=top width=120><A
href="http://www.ee.ic.ac.uk/hp/staff/dmb/voicebox/txt/lpcaa2dl.txt">aa</A></TD>
<TD vAlign=top width=120><A