
📄 manual.lyx

#LyX 1.4.4 created this file. For more info see http://www.lyx.org/
\lyxformat 245
\begin_document
\begin_header
\textclass scrbook
\language english
\inputencoding auto
\fontscheme pslatex
\graphics default
\paperfontsize 10
\spacing onehalf
\papersize letterpaper
\use_geometry true
\use_amsmath 2
\cite_engine basic
\use_bibtopic false
\paperorientation portrait
\leftmargin 2cm
\topmargin 2cm
\rightmargin 2cm
\bottommargin 2cm
\secnumdepth 3
\tocdepth 3
\paragraph_separation indent
\defskip medskip
\quotes_language english
\papercolumns 1
\papersides 1
\paperpagestyle headings
\tracking_changes false
\output_changes true
\end_header

\begin_body

\begin_layout Title
The Speex Codec Manual
\newline
(version 1.2-beta2)
\end_layout

\begin_layout Author
Jean-Marc Valin
\end_layout

\begin_layout Standard

\newpage
Copyright (c) 2002-2006 Jean-Marc Valin/Xiph.org Foundation
\end_layout

\begin_layout Standard
Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.1 or any later version published by the Free Software Foundation; with no Invariant Section, with no Front-Cover Texts, and with no Back-Cover.
 A copy of the license is included in the section entitled "GNU Free Documentation License".
\end_layout

\begin_layout Standard

\newpage

\begin_inset LatexCommand \tableofcontents{}

\end_inset


\newpage

\end_layout

\begin_layout Standard
\begin_inset FloatList table

\end_inset


\newpage

\end_layout

\begin_layout Chapter
Introduction to Speex
\end_layout

\begin_layout Standard
The Speex project (
\family typewriter
http://www.speex.org/
\family default
) was started because there was a need for a speech codec that is open-source and free from software patents.
 These are essential conditions for use in any open-source software.
 Vorbis already handles general audio, but it is not really suitable for speech.
Also, unlike many other speech codecs, Speex is not targeted at cell phones but rather at voice over IP (VoIP) and file-based compression.
\end_layout

\begin_layout Standard
As design goals, we wanted a codec that would allow both very good quality speech and low bit-rate (unfortunately not at the same time!), which led us to develop a codec with multiple bit-rates.
 Of course, very good quality also meant we had to do wideband (16 kHz sampling rate) in addition to narrowband (telephone quality, 8 kHz sampling rate).
\end_layout

\begin_layout Standard
Designing for VoIP instead of cell phone use means that Speex must be robust to lost packets, but not to corrupted ones, since packets either arrive unaltered or don't arrive at all.
 Also, the idea was to have a reasonable complexity and memory requirement without compromising too much on the efficiency of the codec.
\end_layout

\begin_layout Standard
All this led us to the choice of CELP
\begin_inset LatexCommand \index{CELP}

\end_inset

 as the encoding technique to use for Speex.
 One of the main reasons is that CELP has long proved that it could do the job and scale well to both low bit-rates (think DoD CELP @ 4.8 kbps) and high bit-rates (think G.728 @ 16 kbps).
\end_layout

\begin_layout Standard
This document is divided in the following way.
 Section 
\begin_inset LatexCommand \ref{sec:Feature-description}

\end_inset

 describes the different Speex features and defines some terms that will be used in later sections.
 Section 
\begin_inset LatexCommand \ref{sec:Command-line-encoder/decoder}

\end_inset

 provides information about the standard command-line tools, while Section 
\begin_inset LatexCommand \ref{sec:Programming-with-Speex}

\end_inset

 contains information about programming using the Speex API.
 Section 
\begin_inset LatexCommand \ref{sec:Formats-and-standards}

\end_inset

 has some information related to Speex and standards.
 The last three sections describe the internals of the codec and require some signal processing knowledge.
Section 
\begin_inset LatexCommand \ref{sec:Introduction-to-CELP}

\end_inset

 explains the general idea behind CELP, while sections 
\begin_inset LatexCommand \ref{sec:Speex-narrowband-mode}

\end_inset

 and 
\begin_inset LatexCommand \ref{sec:Speex-wideband-mode}

\end_inset

 are specific to Speex.
 Note that if you are only interested in using Speex, those last three sections are not required.
\end_layout

\begin_layout Standard

\newpage

\end_layout

\begin_layout Chapter
Codec description
\begin_inset LatexCommand \label{sec:Feature-description}

\end_inset


\end_layout

\begin_layout Standard
This section describes the main features provided by Speex.
\end_layout

\begin_layout Section
Concepts
\end_layout

\begin_layout Standard
Before introducing all the Speex features, here are some speech coding concepts that help in understanding the rest of the manual.
 Emphasis is placed on Speex.
\end_layout

\begin_layout Subsection*
Sampling rate
\begin_inset LatexCommand \index{sampling rate}

\end_inset


\end_layout

\begin_layout Standard
Speex is mainly designed for three different sampling rates: 8 kHz, 16 kHz, and 32 kHz.
 These are respectively referred to as narrowband
\begin_inset LatexCommand \index{narrowband}

\end_inset

, wideband
\begin_inset LatexCommand \index{wideband}

\end_inset

 and ultra-wideband
\begin_inset LatexCommand \index{ultra-wideband}

\end_inset

.
 For a sampling rate of 
\begin_inset Formula $F_{s}$
\end_inset

 kHz, the highest frequency that can be represented is equal to 
\begin_inset Formula $F_{s}/2$
\end_inset

 kHz.
 This is a consequence of Nyquist's sampling theorem (and 
\begin_inset Formula $F_{s}/2$
\end_inset

 is known as the Nyquist frequency).
\end_layout

\begin_layout Subsection*
Quality
\begin_inset LatexCommand \index{quality}

\end_inset


\end_layout

\begin_layout Standard
Speex encoding is controlled most of the time by a quality parameter that ranges from 0 to 10.
In constant bit-rate
\begin_inset LatexCommand \index{constant bit-rate}

\end_inset

 (CBR) operation, the quality parameter is an integer, while for variable bit-rate (VBR), the parameter is a float.
\end_layout

\begin_layout Subsection*
Complexity
\begin_inset LatexCommand \index{complexity}

\end_inset

 (variable)
\end_layout

\begin_layout Standard
With Speex, it is possible to vary the complexity allowed for the encoder.
 This is done by controlling how the search is performed with an integer ranging from 1 to 10, in a way that's similar to the -1 to -9 options of the 
\emph on
gzip
\emph default
 and 
\emph on
bzip2
\emph default
 compression utilities.
 For normal use, the noise level at complexity 1 is between 1 and 2 dB higher than at complexity 10, but the CPU requirements for complexity 10 are about 5 times higher than for complexity 1.
 In practice, the best trade-off is between complexity 2 and 4, though higher settings are often useful when encoding non-speech sounds like DTMF
\begin_inset LatexCommand \index{DTMF}

\end_inset

 tones.
\end_layout

\begin_layout Subsection*
Variable Bit-Rate
\begin_inset LatexCommand \index{variable bit-rate}

\end_inset

 (VBR)
\end_layout

\begin_layout Standard
Variable bit-rate (VBR) allows a codec to change its bit-rate dynamically to adapt to the 
\begin_inset Quotes eld
\end_inset

difficulty
\begin_inset Quotes erd
\end_inset

 of the audio being encoded.
 In the example of Speex, sounds like vowels and high-energy transients require a higher bit-rate to achieve good quality, while fricatives (e.g. s and f sounds) can be coded adequately with fewer bits.
 For this reason, VBR can achieve a lower bit-rate for the same quality, or a better quality for a certain bit-rate.
 Despite its advantages, VBR has two main drawbacks: first, by only specifying quality, there's no guarantee about the final average bit-rate.
Second, for some real-time applications like voice over IP (VoIP), what counts is the maximum bit-rate, which must be low enough for the communication channel.
\end_layout

\begin_layout Subsection*
Average Bit-Rate
\begin_inset LatexCommand \index{average bit-rate}

\end_inset

 (ABR)
\end_layout

\begin_layout Standard
Average bit-rate solves one of the problems of VBR, as it dynamically adjusts VBR quality in order to meet a specific target bit-rate.
 Because the quality/bit-rate is adjusted in real-time (open-loop), the global quality will be slightly lower than that obtained by encoding in VBR with exactly the right quality setting to meet the target average bit-rate.
\end_layout

\begin_layout Subsection*
Voice Activity Detection
\begin_inset LatexCommand \index{voice activity detection}

\end_inset

 (VAD)
\end_layout

\begin_layout Standard
When enabled, voice activity detection detects whether the audio being encoded is speech or silence/background noise.
 VAD is always implicitly activated when encoding in VBR, so the option is only useful in non-VBR operation.
 In this case, Speex detects non-speech periods and encodes them with just enough bits to reproduce the background noise.
 This is called 
\begin_inset Quotes eld
\end_inset

comfort noise generation
\begin_inset Quotes erd
\end_inset

 (CNG).
\end_layout

\begin_layout Subsection*
Discontinuous Transmission
\begin_inset LatexCommand \index{discontinuous transmission}

\end_inset

 (DTX)
\end_layout

\begin_layout Standard
Discontinuous transmission is an addition to VAD/VBR operation that allows transmission to stop completely when the background noise is stationary.
In file-based operation, since we cannot just stop writing to the file, only 5 bits are used for such frames (corresponding to 250 bps).
\end_layout

\begin_layout Subsection*
Perceptual enhancement
\begin_inset LatexCommand \index{perceptual enhancement}

\end_inset


\end_layout

\begin_layout Standard
Perceptual enhancement is a part of the decoder which, when turned on, tries to reduce (the perception of) the noise produced by the coding/decoding process.
 In most cases, perceptual enhancement makes the sound further from the original 
\emph on
objectively
\emph default
 (if you use SNR), but in the end it still 
\emph on
sounds
\emph default
 better (subjective improvement).
\end_layout

\begin_layout Subsection*
Algorithmic delay
\begin_inset LatexCommand \index{algorithmic delay}

\end_inset


\end_layout

\begin_layout Standard
Every speech codec introduces a delay in the transmission.
 For Speex, this delay is equal to the frame size, plus some amount of 
\begin_inset Quotes eld
\end_inset

look-ahead
\begin_inset Quotes erd
\end_inset

 required to process each frame.
 In narrowband operation (8 kHz), the delay is 30 ms, while for wideband (16 kHz), the delay is 34 ms.
These values don't account for the CPU time it takes to encode or decode the frames.
\end_layout

\begin_layout Section
Codec
\end_layout

\begin_layout Standard
The main characteristics of Speex can be summarized as follows:
\end_layout

\begin_layout Itemize
Free software/open-source
\begin_inset LatexCommand \index{open-source}

\end_inset

, patent
\begin_inset LatexCommand \index{patent}

\end_inset

 and royalty-free
\end_layout

\begin_layout Itemize
Integration of narrowband
\begin_inset LatexCommand \index{narrowband}

\end_inset

 and wideband
\begin_inset LatexCommand \index{wideband}

\end_inset

 using an embedded bit-stream
\end_layout

\begin_layout Itemize
Wide range of bit-rates available (from 2.15 kbps to 44 kbps)
\end_layout

\begin_layout Itemize
Dynamic bit-rate switching (AMR) and Variable Bit-Rate
\begin_inset LatexCommand \index{variable bit-rate}

\end_inset

 (VBR) operation
\end_layout

\begin_layout Itemize
Voice Activity Detection
\begin_inset LatexCommand \index{voice activity detection}

\end_inset

 (VAD, integrated with VBR) and discontinuous transmission (DTX)
\end_layout

\begin_layout Itemize
Variable complexity
\begin_inset LatexCommand \index{complexity}

\end_inset


\end_layout

\begin_layout Itemize
Embedded wideband structure (scalable sampling rate)
\end_layout

\begin_layout Itemize
Ultra-wideband mode at 32 kHz
\end_layout

\begin_layout Itemize
Intensity stereo encoding option
\end_layout

\begin_layout Itemize
Fixed-point implementation (work in progress)
\end_layout

\begin_layout Section
Preprocessor
\end_layout

\begin_layout Standard
This part refers to the preprocessor module introduced in the 1.1.x branch.
 The preprocessor is designed to be used on the audio 
\emph on
before
\emph default
 running the encoder.
The preprocessor provides three main functionalities:
\end_layout

\begin_layout Itemize
noise suppression
\end_layout

\begin_layout Itemize
automatic gain control (AGC)
\end_layout

\begin_layout Itemize
voice activity detection (VAD)
\end_layout

\begin_layout Standard
The denoiser can be used to reduce the amount of background noise present in the input signal.
 This provides higher quality speech whether or not the denoised signal is encoded with Speex (or at all).
 However, when using the denoised signal with the codec, there is an additional benefit.
 Speech codecs in general (Speex included) tend to perform poorly on noisy input, which tends to amplify the noise.
 The denoiser greatly reduces this effect.
\end_layout

\begin_layout Standard
Automatic gain control (AGC) is a feature that deals with the fact that the recording volume may vary by a large amount between different setups.
 The AGC provides a way to adjust a signal to a reference volume.
 This is useful for voice over IP because it removes the need for manual adjustment of the microphone gain.
 A secondary advantage is that by setting the microphone gain to a conservative (low) level, it is easier to avoid clipping.
\end_layout

\begin_layout Standard
The voice activity detector (VAD) provided by the preprocessor is more advanced than the one directly provided in the codec.
\end_layout

\begin_layout Section
Adaptive Jitter Buffer
\end_layout

\begin_layout Standard
When transmitting voice (or any content for that matter) over UDP or RTP, packets may be lost, arrive with different delays, or even arrive out of order.
 The purpose of a jitter buffer is to reorder packets and buffer them long enough (but no longer than necessary) so they can be sent to be decoded.
\end_layout

\begin_layout Section
Acoustic Echo Canceller
\end_layout

\begin_layout Standard
