<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<!-- saved from url=(0058)http://www.ee.ic.ac.uk/hp/staff/dmb/voicebox/voicebox.html -->
<HTML><HEAD><TITLE>VOICEBOX: Speech Processing Toolbox for MATLAB</TITLE>
<META content="text/html; charset=gb2312" http-equiv=Content-Type>
<META content="MSHTML 5.00.2614.3500" name=GENERATOR>
<META content="E:\Program Files\Microsoft Office\Office\html.dot"
name=Template></HEAD>
<BODY link=#0000ff vLink=#800080>
<H1>VOICEBOX: Speech Processing Toolbox for MATLAB</H1>
<H2>Introduction</H2>
<P>VOICEBOX is a speech processing toolbox consisting of MATLAB routines that are
maintained by and mostly written by <A
href="http://www.ee.ic.ac.uk/hp/staff/dmb/dmb.html">Mike Brookes</A>, <A
href="http://www.ee.ic.ac.uk/">Department of Electrical &amp; Electronic
Engineering</A>, <A href="http://www.ic.ac.uk/">Imperial College</A>, Exhibition
Road, London SW7 2BT, UK. Several of the routines require MATLAB V5.</P>
<P>The routines are available as a <A
href="http://www.ee.ic.ac.uk/hp/staff/dmb/voicebox/voicebox.tar.Z">compressed
tar file</A> or as a <A
href="http://www.ee.ic.ac.uk/hp/staff/dmb/voicebox/voicebox.zip">zip archive</A>
and are made available under the terms of the <A
href="http://www.ee.ic.ac.uk/hp/staff/dmb/voicebox/copying.txt">GNU General Public
License</A>. </P>
<P>Please send any comments, suggestions, bug reports, etc. to <A
href="mailto:mike.brookes@ic.ac.uk">mike.brookes@ic.ac.uk</A>. </P>
<HR>
<H2>Contents</H2>
<HR>
<DL>
<DT><A
href="http://www.ee.ic.ac.uk/hp/staff/dmb/voicebox/voicebox.html#file">Audio
File Input/Output </A>
<DD>Read and write WAV and other speech file formats
<DT><A
href="http://www.ee.ic.ac.uk/hp/staff/dmb/voicebox/voicebox.html#frequency">Frequency
Scales </A>
<DD>Convert between Hz, mel, ERB and MIDI frequency scales
<DT><A
href="http://www.ee.ic.ac.uk/hp/staff/dmb/voicebox/voicebox.html#fourier">Fourier/DCT/Hartley
Transforms</A>
<DD>Various related transforms
<DT><A
href="http://www.ee.ic.ac.uk/hp/staff/dmb/voicebox/voicebox.html#random">Random
Number Generation</A>
<DD>Generate random vectors and noise signals
<DT><A
href="http://www.ee.ic.ac.uk/hp/staff/dmb/voicebox/voicebox.html#distance">Vector
Distances</A>
<DD>Calculate distances between vector lists
<DT><A
href="http://www.ee.ic.ac.uk/hp/staff/dmb/voicebox/voicebox.html#analysis">Speech
Analysis</A>
<DD>Active level estimation, Spectrograms
<DT><A
href="http://www.ee.ic.ac.uk/hp/staff/dmb/voicebox/voicebox.html#lpc">LPC
Analysis of Speech</A>
<DD>Linear Predictive Coding routines
<DT><A
href="http://www.ee.ic.ac.uk/hp/staff/dmb/voicebox/voicebox.html#synthesis">Speech
Synthesis</A>
<DD>Glottal waveform models
<DT><A
href="http://www.ee.ic.ac.uk/hp/staff/dmb/voicebox/voicebox.html#enhance">Speech
Enhancement</A>
<DD>Spectral noise subtraction
<DT><A
href="http://www.ee.ic.ac.uk/hp/staff/dmb/voicebox/voicebox.html#coding">Speech
Coding</A>
<DD>PCM coding, Vector quantisation
<DT><A
href="http://www.ee.ic.ac.uk/hp/staff/dmb/voicebox/voicebox.html#recog">Speech
Recognition</A>
<DD>Front-end processing for recognition
<DT><A
href="http://www.ee.ic.ac.uk/hp/staff/dmb/voicebox/voicebox.html#utility">Utility
Functions</A>
<DD>Miscellaneous utility functions </DD></DL>
<HR>
<HR>
<H2><A name=file>Audio File Input/Output</A></H2>
<BLOCKQUOTE>
<P>Routines are available to read and, in some cases, write a variety of file
formats:</P>
<TABLE border=0 cellPadding=2 width="100%">
<TBODY>
<TR>
<TD width=50><B>Read</B></TD>
<TD width=50><B>Write</B></TD>
<TD width=30><B>Suffix</B></TD>
<TD> </TD></TR>
<TR>
<TD width=50><A
href="http://www.ee.ic.ac.uk/hp/staff/dmb/voicebox/txt/readwav.txt">readwav</A></TD>
<TD width=50><A
href="http://www.ee.ic.ac.uk/hp/staff/dmb/voicebox/txt/writewav.txt">writewav</A></TD>
<TD width=30>.wav</TD>
<TD>These routines allow an arbitrary number of channels and can deal
with linear PCM (any precision up to 32 bits), A-law PCM and Mu-law PCM.
Large files can be read and written in small chunks.</TD></TR>
<TR>
<TD width=50><A
href="http://www.ee.ic.ac.uk/hp/staff/dmb/voicebox/txt/readhtk.txt">readhtk</A></TD>
<TD width=50><A
href="http://www.ee.ic.ac.uk/hp/staff/dmb/voicebox/txt/writehtk.txt">writehtk</A></TD>
<TD width=30>.htk</TD>
<TD>Read and write waveform files used by Entropic's Hidden Markov Model
Toolkit (HTK).</TD></TR>
<TR>
<TD width=50><A
href="http://www.ee.ic.ac.uk/hp/staff/dmb/voicebox/txt/readsfs.txt">readsfs</A></TD>
<TD width=50> </TD>
<TD width=30>.sfs</TD>
<TD>Speech Filing System (SFS) files from Mark Huckvale at UCL.</TD></TR>
<TR>
<TD width=50><A
href="http://www.ee.ic.ac.uk/hp/staff/dmb/voicebox/txt/readsph.txt">readsph</A></TD>
<TD width=50> </TD>
<TD width=30>.sph</TD>
<TD>NIST Sphere format files (including TIMIT).</TD></TR>
<TR>
<TD width=50><A
href="http://www.ee.ic.ac.uk/hp/staff/dmb/voicebox/txt/readaif.txt">readaif</A></TD>
<TD width=50> </TD>
<TD width=30>.aif</TD>
<TD>Audio Interchange File Format used by Mac
users.</TD></TR></TBODY></TABLE></BLOCKQUOTE>
<HR>
<H2><A name=frequency>Frequency Scale Conversion</A></H2>
<UL>
<LI>The <I>mel scale</I> is based on the human perception of sinewave pitch.
The routines <A
href="http://www.ee.ic.ac.uk/hp/staff/dmb/voicebox/txt/mel2frq.txt">mel2frq</A>
and <A
href="http://www.ee.ic.ac.uk/hp/staff/dmb/voicebox/txt/frq2mel.txt">frq2mel</A>
convert between this scale and frequency in Hz.
<LI>The <I>ERB</I> scale is based on the equivalent rectangular bandwidths of
the human ear. The routines <A
href="http://www.ee.ic.ac.uk/hp/staff/dmb/voicebox/txt/erb2frq.txt">erb2frq</A>
and <A
href="http://www.ee.ic.ac.uk/hp/staff/dmb/voicebox/txt/frq2erb.txt">frq2erb</A>
convert between the ERB-rate scale and frequency in Hz.
<LI>The <I>MIDI standard</I> specifies a numbering of <I>semitones</I> with
middle C being 60. The routines <A
href="http://www.ee.ic.ac.uk/hp/staff/dmb/voicebox/txt/frq2midi.txt">frq2midi</A>
and <A
href="http://www.ee.ic.ac.uk/hp/staff/dmb/voicebox/txt/midi2frq.txt">midi2frq</A>
convert between this musical frequency scale and Hz. <A
href="http://www.ee.ic.ac.uk/hp/staff/dmb/voicebox/txt/frq2midi.txt">frq2midi</A>
will in addition output note names in a character format. <A
href="http://www.ee.ic.ac.uk/hp/staff/dmb/voicebox/txt/midi2frq.txt">midi2frq</A>
can use either the normal equal-tempered scale or the Pythagorean scale of just
intonation. </LI></UL>
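<P>The conversions described above can be illustrated with a minimal sketch.
It is written in Python rather than MATLAB and is not the VOICEBOX code: the
function names are hypothetical, the mel formula is one commonly used
definition (the constants inside frq2mel and mel2frq may differ), and the MIDI
mapping assumes equal temperament with A above middle C (note 69) tuned to
440 Hz. The ERB conversion is omitted because its constants vary between
published definitions.</P>

```python
import math


def hz_to_mel(f_hz):
    """Hz -> mel using a common formula (assumed, not taken from VOICEBOX)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def hz_to_midi(f_hz, a4_hz=440.0):
    """Hz -> fractional MIDI note number on the equal-tempered scale."""
    return 69.0 + 12.0 * math.log2(f_hz / a4_hz)


def midi_to_hz(note, a4_hz=440.0):
    """MIDI note number -> Hz on the equal-tempered scale (middle C is 60)."""
    return a4_hz * 2.0 ** ((note - 69.0) / 12.0)


if __name__ == "__main__":
    print("1000 Hz in mel:", hz_to_mel(1000.0))
    print("Middle C (note 60) in Hz:", midi_to_hz(60))
    print("440 Hz as a MIDI note:", hz_to_midi(440.0))
```

<P>Each pair of functions is an exact inverse, so converting a frequency to
either scale and back returns the original value up to rounding error.</P>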
<HR>
<H2><A name=fourier>Fourier, DCT and Hartley Transforms</A></H2>
<UL>
<LI>The routines <A
href="http://www.ee.ic.ac.uk/hp/staff/dmb/voicebox/txt/rfft.txt">rfft</A> and
<A href="http://www.ee.ic.ac.uk/hp/staff/dmb/voicebox/txt/irfft.txt">irfft</A>
perform forward and inverse Fourier transforms on real data. Only half of the
conjugate-symmetric transform is generated by the forward routine RFFT. For
even-length data, the inverse routine, IRFFT, is asymptotically twice as fast
as the built-in inverse FFT routine IFFT. The routine <A
href="http://www.ee.ic.ac.uk/hp/staff/dmb/voicebox/txt/rsfft.txt">rsfft</A>
performs the forward transform on real symmetric data.
<LI>The routines <A
href="http://www.ee.ic.ac.uk/hp/staff/dmb/voicebox/txt/rdct.txt">rdct</A> and
<A
href="http://www.ee.ic.ac.uk/hp/staff/dmb/voicebox/txt/irdct.txt">irdct</A>
perform forward and inverse discrete cosine transforms on real data. The
routines are asymptotically twice as fast as the complex-data routines in the
image-processing and signal-processing toolboxes.
<LI>The routine <A
href="http://www.ee.ic.ac.uk/hp/staff/dmb/voicebox/txt/rhartley.txt">rhartley