in table~\ref{tab:rice}.
\begin{table}[hbtp]
\begin{center}
\begin{tabular}{|c|c|c|c|c|} \hline
       & sign & lower & number  & full \\
Number & bit  & bits  & of `0's & code \\ \hline
 0     & 0    & 00    & 1       & 0001 \\
13     & 0    & 01    & 3       & 0010001 \\
-7     & 1    & 11    & 2       & 111001 \\ \hline
\end{tabular}
\end{center}
\caption[nop]{Examples of Huffman codes for $n = 2$}
\label{tab:rice}
\end{table}
As with all Huffman codes, a whole number of bits is used per sample,
allowing instantaneous decoding at the expense of introducing
quantisation error into the modelled p.d.f.  This is illustrated by the
points marked `$+$' in figure~\ref{fig:logpdf}.  In the example, $n = 2$,
giving a minimum code length of 4.  The error introduced by coding
according to the Laplacian p.d.f.\ instead of the true p.d.f.\ is only
0.004 bits per sample, and the error introduced by using Huffman codes
is only 0.12 bits per sample.  These are small compared to a typical
code length of 7 for 16~kHz speech corpora.

This Huffman code is also simple in that it may be encoded and decoded
with a few logical operations.  Thus the implementation need not employ
a tree search for decoding, reducing the computational and storage
overheads associated with transmitting a more general p.d.f.

The optimal number of low order bits to be transmitted directly is
determined by the variance of the signal.  The Laplacian is defined as:
\begin{eqnarray}
p(x) & = & {1 \over \sqrt{2}\sigma} e^{-{\sqrt{2} \over \sigma} |x|}
\end{eqnarray}
where $|x|$ is the absolute value of $x$ and $\sigma^2$ is the variance
of the distribution.
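The encoding and decoding with a few logical operations can be sketched as
follows.  This is an illustrative Python sketch of the code structure
described above, not shorten's C implementation; in particular the exact bit
order and the convention for negative values are assumptions here and may
differ in shorten itself.

```python
def rice_encode(x, n):
    """Encode a signed sample as: sign bit, the n low-order bits of
    the magnitude verbatim, then the high-order part in unary (a run
    of `0's terminated by a `1')."""
    sign = 1 if x < 0 else 0
    mag = abs(x)
    low = mag & ((1 << n) - 1)               # low-order bits, sent directly
    high = mag >> n                          # high-order part, sent in unary
    bits = [sign]
    bits += [(low >> i) & 1 for i in reversed(range(n))]
    bits += [0] * high + [1]                 # unary run and stop bit
    return bits

def rice_decode(bits, n):
    """Invert rice_encode using only shifts, masks and a zero count."""
    sign = bits[0]
    low = 0
    for b in bits[1:1 + n]:
        low = (low << 1) | b
    high = bits[1 + n:].index(1)             # zeros before the stop bit
    mag = (high << n) | low
    return -mag if sign else mag
```

With $n = 2$ this encodes 0 as `0001` and 13 as `0010001`, matching
table~\ref{tab:rice}; the handling of negatives shown is one possible
convention and yields a slightly shorter code for $-7$ than the table.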
Taking the expectation of $|x|$ gives:
\begin{eqnarray}
E(|x|) & = & \int_{-\infty}^\infty |x| p(x) \, dx \\
 & = & \int_0^\infty x {\sqrt{2} \over \sigma} e^{-{\sqrt{2} \over \sigma} x} \, dx \\
 & = & \int_0^\infty e^{-{\sqrt{2} \over \sigma} x} \, dx
       - \left[ x e^{-{\sqrt{2} \over \sigma} x} \right]_0^\infty \\
 & = & {\sigma \over \sqrt{2}}
\end{eqnarray}
For optimal Huffman coding we need to find the number of low order bits,
$n$, such that half the samples lie in the range $\pm 2^n$.  This
ensures that the Huffman code is $n + 1$ bits long with probability
0.5 and $n + k + 1$ bits long with probability $2^{-(n + k)}$, which is
optimal.
\begin{eqnarray}
{1 \over 2} & = & \int_{-2^n}^{2^n} p(x) \, dx \\
 & = & \int_{-2^n}^{2^n} {1 \over \sqrt{2}\sigma}
       e^{-{\sqrt{2} \over \sigma} |x|} \, dx \\
 & = & 1 - e^{-{\sqrt{2} \over \sigma} 2^n}
\end{eqnarray}
Solving for $n$ gives:
\begin{eqnarray}
n & = & \log_2 \left( \ln(2) {\sigma \over \sqrt{2}} \right)
\label{eq:n4lpc} \\
 & = & \log_2 \left( \ln(2) E(|x|) \right) \label{eq:n4poly}
\end{eqnarray}
When polynomial filters are used, $n$ is obtained from $E(|x|)$ using
equation~\ref{eq:n4poly}.  In the LPC implementation $n$ is derived
from $\sigma$, which is obtained directly from the calculations for the
predictor coefficients using the autocorrelation method.

\section{Lossy coding}

The previous sections have outlined the complete waveform compression
algorithm for lossless coding.  There is a wide range of applications
in which some loss of waveform accuracy is an acceptable tradeoff in
return for better compression.  A reasonably clean way to implement this
is to change the quantisation level dynamically on a segment-by-segment
basis.  Not only does this preserve the waveform shape, but the
resulting distortion can be easily understood.
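A minimal Python sketch of such segment-wise requantisation, assuming a
power-of-two interval obtained by dropping low-order bits; the function name
and the round-to-nearest convention are illustrative assumptions, not
shorten's exact code:

```python
def requantise(samples, bits_dropped):
    """Coarsen the quantisation by dropping `bits_dropped` low-order
    bits, rounding each sample to the nearest coarse level.  The
    waveform shape is preserved; at most half the new interval of
    error is added to any sample."""
    q = 1 << bits_dropped                     # new quantisation interval
    out = []
    for s in samples:
        level = (s + q // 2) >> bits_dropped  # round to nearest level
        out.append(level << bits_dropped)     # back on the original scale
    return out

segment = [3, 250, -1001, 7, 96, -5]
coarse = requantise(segment, 3)               # interval of 8
assert all(abs(a - b) <= 4 for a, b in zip(segment, coarse))
```

Because the coarse samples stay on the original scale, the error per sample
is bounded by half the new interval, which is what makes the distortion easy
to reason about.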
Assuming that the samples are uniformly distributed within the new
quantisation interval of $n$, the probability of any one value in this
range is $1/n$ and the noise power introduced is $i^2$ for the lower
values, which are rounded down, and $(n - i)^2$ for those values that
are rounded up.  Hence the total noise power introduced by the increased
quantisation is:
\begin{eqnarray}
{1 \over n} \left( \sum_{i=0}^{n/2 - 1} i^2 + \sum_{i=n/2}^{n-1}
(n - i)^2 \right) & = & {1 \over 12} (n^2 + 2) \label{eq:quantnoise}
\end{eqnarray}
It may also be assumed that the signal was uniformly distributed in
the original quantisation interval before digitisation, i.e.\ a
quantisation error of $\int_{-1/2}^{1/2} x^2 \, {\rm d}x = 1/12$.

Shorten supports two main types of lossy coding: the case where every
segment is coded at the same rate, and the case where the bit rate is
dynamically adapted to maintain a specified segmental signal to noise
ratio.  In the first mode, the variance of the prediction residual of
the original waveform is estimated and then the appropriate quantisation
is performed to limit the bit rate.  In areas of the waveform where there
are strong sample to sample correlations this results in a relatively
high signal to noise ratio, and in areas with little correlation the
signal to noise ratio approaches that of the signal power divided by the
quantisation noise of equation~\ref{eq:quantnoise}.  In the second mode,
this equation is used to estimate the greatest additional quantisation
that can be performed whilst maintaining a specified segmental signal to
noise ratio.  In both cases the new quantisation interval, $n$, is
restricted to be a power of two for computational efficiency.

\section{Compression Performance \label{ss:perf}}

The previous sections have demonstrated that low order linear prediction
followed by Huffman coding matched to the Laplace distribution results
in an efficient lossless waveform coder.
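Before turning to the measured figures, the complete scheme can be put
together in a toy end-to-end sketch: first-order prediction, a choice of $n$
from the mean absolute residual via equation~\ref{eq:n4poly}, then
Rice-style coding.  The synthetic sine signal, helper names, and the
rounding of $n$ to the nearest integer are illustrative assumptions, and the
resulting bit counts are not benchmark results:

```python
import math

def code_length(x, n):
    """Length in bits of a Rice-style code for sample x with n
    verbatim low-order bits: sign + n low bits + unary high part."""
    return 1 + n + (abs(x) >> n) + 1

# A smooth synthetic "waveform": a 16-bit amplitude sine wave.
signal = [int(20000 * math.sin(2 * math.pi * t / 64)) for t in range(1024)]

# First-order prediction: transmit differences between samples.
residual = [signal[0]] + [b - a for a, b in zip(signal, signal[1:])]

# n = log2(ln(2) E|x|), rounded to the nearest integer here.
mean_abs = sum(abs(r) for r in residual) / len(residual)
n = max(0, round(math.log2(math.log(2) * mean_abs)))

bits = sum(code_length(r, n) for r in residual)
print(f"{bits / len(signal):.1f} bits/sample against 16 raw")
```

Even this crude first-difference predictor codes the smooth signal in fewer
bits than the raw 16 per sample; the predictors of the earlier sections do
considerably better on real speech.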
Table~\ref{tab:comp} compares this technique to popular general purpose
compression utilities.  The table shows that the speech specific
compression utility achieves considerably better compression than the
more general tools.  The compression and decompression speeds are given
as factors faster than real time when executed on a standard
SparcStation~I, except for the g722 ADPCM result, which was obtained on
an SGI Indigo R4400 workstation using the supplied
aifccompress/aifcdecompress utilities.  The SGI timings were scaled by a
factor of 3.9, determined by the relative execution times of shorten
decompression on the two platforms.
\begin{table}[htbp]
\begin{center}
\begin{tabular}{|l|r|r|r|} \hline
program & \% size & compress & decompress \\
        &         & speed    & speed \\ \hline
UNIX compress          & 74.0  & 5.1  & 15.0 \\
UNIX pack              & 69.8  & 16.1 & 8.0 \\
GNU gzip               & 66.0  & 2.2  & 17.2 \\
shorten default (fast) & 42.6  & 13.4 & 16.1 \\
shorten LPC (slow)     & 41.7  & 5.6  & 8.0 \\
aifc[de]compress       & lossy & 2.3  & 2.2 \\ \hline
\end{tabular}
\end{center}
\caption{Compression rates and speeds}
\label{tab:comp}
\end{table}

To investigate the effects of lossy coding on speech recognition
performance, the test portion of the TIMIT database was coded at four
bits per sample and the resulting speech was recognised with a
state-of-the-art phone recognition system.
Both shorten and the g722 ADPCM standard gave negligible additional
errors (about 70 more errors over the baseline of 15934 errors),
although it was necessary to scale the waveform by a factor of four for
use with the g722 ADPCM algorithm.  g722 ADPCM without scaling and the
telephony quality g721 ADPCM algorithm (designed for 8~kHz sampling but
operated at 16~kHz) both produced significantly more errors
(approximately 500 in 15934 errors).  Coding this database at four bits
per sample results in approximately another factor of two compression
over lossless coding.

Decompression and playback of 16 bit, 44.1~kHz stereo audio takes
approximately 45\% of the available processing power of a 486DX2/66
based machine and 25\% of a 60~MHz Pentium.  Disk access accounted for
20\% of the time on the slower machine.  Compressing to three bits per
sample gives another factor of three compression, reducing the disk
access time proportionally and providing 20\% faster execution with no
perceptual degradation (to the author's ears).  Thus real time
decompression of high quality audio is possible on a wide range of
personal computers.

\section{Conclusion}

This report has described a simple waveform coder designed for use with
stored waveform files.  The use of a simple linear predictor followed by
Huffman coding according to the Laplacian distribution has been found to
be appropriate for the examples studied.  Various techniques have been
adopted to improve efficiency, resulting in real time operation on
many platforms.  Lossy compression is supported, both to a specified bit
rate and to a specified signal to noise ratio.  Most simple sample file
formats are accepted, resulting in a flexible tool for the workstation
environment.

\begin{thebibliography}{1}

\bibitem{GarofoloRobinsonFiscus94}
John Garofolo, Tony Robinson, and Jonathan Fiscus.
\newblock The development of file formats for very large speech corpora:
  Sphere and shorten.
\newblock In {\em Proc.
ICASSP}, volume~I, pages 113--116, 1994.

\bibitem{JayantNoll84}
N.~S. Jayant and P.~Noll.
\newblock {\em Digital Coding of Waveforms}.
\newblock Prentice Hall, Englewood Cliffs, NJ, 1984.
\newblock ISBN 0-13-211913-7 01.

\bibitem{Rice71}
R.~F. Rice and J.~R. Plaunt.
\newblock Adaptive variable-length coding for efficient compression of
  spacecraft television data.
\newblock {\em IEEE Transactions on Communication Technology},
  19(6):889--897, 1971.

\bibitem{YehRiceMiller91}
Pen-Shu Yeh, Robert Rice, and Warner Miller.
\newblock On the optimality of code options for a universal noiseless coder.
\newblock JPL Publication 91-2, Jet Propulsion Laboratory, February 1991.

\bibitem{Rice91}
Robert~F. Rice.
\newblock Some practical noiseless coding techniques, {Part II, Module
  PSI14,K+}.
\newblock JPL Publication 91-3, Jet Propulsion Laboratory, November 1991.

\end{thebibliography}

\newpage
\section*{Appendix: The shorten man page (version 1.22)}
\begin{verbatim}
SHORTEN(1)               USER COMMANDS               SHORTEN(1)

NAME
     shorten - fast compression for waveform files

SYNOPSIS
     shorten [-hl] [-a #bytes] [-b #samples] [-c #channels]
          [-d #bytes] [-m #blocks] [-n #dB] [-p #order]
          [-q #bits] [-r #bits] [-t filetype] [-v #version]
          [waveform-file [shortened-file]]

     shorten -x [-hl] [-a #bytes] [-d #bytes]
          [shortened-file [waveform-file]]

DESCRIPTION
     shorten reduces the size of waveform files (such as audio)
     using Huffman coding of prediction residuals and optional
     additional quantisation.  In lossless mode the amount of
     compression obtained depends on the nature of the waveform.
     Those composed of low frequencies and low amplitudes give
     the best compression, which may be 2:1 or better.  Lossy
     compression operates by specifying a minimum acceptable
     segmental signal to noise ratio or a maximum bit rate, and
     works by zeroing the lower order bits of the waveform, so
     retaining the waveform shape.

     If both file names are specified then these are used as the
     input and output files.
     The first file name can be replaced by "-" to read from
     standard input and likewise the second file name can be
     replaced by "-" to write to standard output.  Under UNIX,
     if only one file name is specified, then that name is used
     for input and the output file name is generated