<p>You can have as many <code>audio_ostream</code>s as you like all working in parallel.</p>

<p>To handle COM issues, I used the wonderful COMSTL library which
takes care of all the delicate and brittle COMplications, such as
(un-)initialization, resource (de-)allocation, reference counting etc.</p>

<p><code>boost::iostreams</code> is used to provide full <code>std::ostream</code> support with very little boilerplate code.</p>

<p>Since both <code>boost::iostreams</code> and COMSTL are header-only
libraries, I decided to make my class header-only too. The minor price
of this decision is that the SAPI headers will be included into any
file that uses <code>audio_ostream</code>.</p>

<h2>Using the code</h2>

<p>Using the code could hardly be easier:</p>

<pre lang="c++"><span class="code-preprocessor">#include</span><span class="code-preprocessor"> <span class="code-string">"</span><span class="code-string">audiostream.hpp"</span>
</span>
<span class="code-keyword">using</span> <span class="code-keyword">namespace</span> std;
<span class="code-keyword">using</span> <span class="code-keyword">namespace</span> audiostream;
<span class="code-keyword">int</span> main()
{
   audio_ostream aout;
   aout <span class="code-keyword">&lt;</span><span class="code-keyword">&lt;</span> <span class="code-string">"</span><span class="code-string">Hello World!"</span>  <span class="code-keyword">&lt;</span><span class="code-keyword">&lt;</span> endl;
   <span class="code-comment">//</span><span class="code-comment"> some more code...
</span>
   <span class="code-keyword">return</span> <span class="code-digit">0</span>;
}</pre>

<p>This little program will, obviously, say "Hello World!".</p>

<p>The audio stream is asynchronous, so the program continues running while the text is being spoken (that is why the line <code><span class="code-comment">//</span><span class="code-comment"> some more code...</span></code> is there: it gives the speech time to finish). This is conceptually similar to how a <code>std::ostream</code> buffers its output and only displays the text once the internal buffer is full or flushed.</p>

<p>To use the class:</p>

<ol>
<li><code><span class="code-preprocessor">#include</span></code> the <code>audiostream.hpp</code> header file. </li>

<li>Create an instance of <code>audio_ostream</code> (or <code>waudio_ostream</code>) </li>

<li>Use the stream as you would any <code>std::ostream</code>. </li>
</ol>

<p>That's really all you need to do to start using the class.</p>

<h2>Pre-Requisites</h2>

<p>For the code to compile and run you will need 3 libraries:</p>

<ol>
  <li>For the TTS engine, you will need to install the <a href="http://www.microsoft.com/downloads/details.aspx?FamilyID=5e86ec97-40a7-453f-b0ee-6583171b4530&amp;DisplayLang=en">Microsoft 
    Speech SDK</a> (I used ver. 5.1). </li>
  <li>For COMSTL you will need the <a href="http://synesis.com.au/software/stlsoft/">STLSoft 
    libraries</a> (you'll need STLSoft version 1.9.1 beta 44, or later). </li>
  <li>The <a href="http://boost.org/">Boost</a> Iostreams library. You can download 
    Boost <a href="http://sourceforge.net/project/showfiles.php?group_id=7586">here</a>. 
  </li>
</ol>

<p>Set your compiler and linker paths accordingly (Boost and STLSoft are header only).</p>

<h2>Advanced Usage</h2>

<p>It's possible to change the voice gender, speed, language and many
more parameters of the voice using the SAPI text-to-speech (TTS) XML
tags.</p>

<p>Just insert the relevant XML tags into the stream to effect the change. The complete list of possible XML tags can be found <a href="http://msdn.microsoft.com/library/default.asp?url=/library/en-us/SAPI51sr/Whitepapers/WP_XML_TTS_Tutorial.asp">here</a>.</p>

<p>For example:</p>

<pre lang="c++">audio_ostream aout;
<span class="code-comment">//</span><span class="code-comment"> Select a male voice.</span>
aout <span class="code-keyword">&lt;</span><span class="code-keyword">&lt;</span> <span class="code-string">"</span><span class="code-string">&lt;voice required='Gender=Male'&gt;Hello World!"</span> <span class="code-keyword">&lt;</span><span class="code-keyword">&lt;</span> endl;
aout <span class="code-keyword">&lt;</span><span class="code-keyword">&lt;</span> <span class="code-string">"</span><span class="code-string">Five hundred milliseconds of silence"</span> <span class="code-keyword">&lt;</span><span class="code-keyword">&lt;</span> flush <span class="code-keyword">&lt;</span><span class="code-keyword">&lt;</span>
    <span class="code-string">"</span><span class="code-string">&lt;silence msec='500'/&gt; just occurred."</span> <span class="code-keyword">&lt;</span><span class="code-keyword">&lt;</span> endl;
</pre>

<p>For some reason, the XML tags must be the first items in the SAPI spoken string, without any preceding text. <code>flush</code>ing the stream before the tag, as in the example, facilitates this.</p>
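<p>To see why flushing helps, here is a minimal sketch that uses only the standard library. The <code>chunk_recorder</code> class and the <code>record_silence_example()</code> function are hypothetical stand-ins and are not part of <code>audiostream.hpp</code>. Instead of speaking, the buffer records each chunk of text it receives when the stream is flushed, which roughly mirrors how each flush hands one string to the sink, assuming the real stream behaves like an ordinary buffered <code>std::ostream</code>.</p>

```cpp
#include <cassert>
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical stand-in for the audio sink: instead of speaking, it records
// every chunk of text that the stream hands over when it is flushed.
class chunk_recorder : public std::stringbuf
{
public:
    const std::vector<std::string>& chunks() const { return chunks_; }

protected:
    // Invoked through std::flush and std::endl.
    int sync() override
    {
        const std::string pending = str();
        if (!pending.empty())
        {
            chunks_.push_back(pending); // one "spoken string" per flush
            str(std::string());         // empty the buffer for the next chunk
        }
        return 0;
    }

private:
    std::vector<std::string> chunks_;
};

// Replays the silence example and returns the chunks a sink would receive.
std::vector<std::string> record_silence_example()
{
    chunk_recorder buf;
    std::ostream out(&buf);
    out << "Five hundred milliseconds of silence" << std::flush
        << "<silence msec='500'/> just occurred." << std::endl;
    return buf.chunks();
}
```

<p>With this stand-in, the example yields two chunks, and the second one begins with the <code>&lt;silence&gt;</code> tag, which is the position SAPI appears to require.</p>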

<p>You can also call <code>setRate()</code> with a value in the range [-10, 10] to control the speed of the speech.</p>
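<p>If the rate comes from user input, it may be worth clamping it into that range first. The helper below is a small sketch; <code>clamp_rate()</code> is a hypothetical name and is not part of <code>audiostream.hpp</code>.</p>

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical helper: clamps a requested speech rate into the [-10, 10]
// range mentioned above before it is passed on to the voice.
inline long clamp_rate(long requested)
{
    const long min_rate = -10;
    const long max_rate = 10;
    return std::max(min_rate, std::min(requested, max_rate));
}
```

<p>Assuming <code>aout</code> is an <code>audio_ostream</code>, a call could then look like <code>aout.setRate(clamp_rate(25))</code>, which would pass 10 to the voice.</p>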

<h2>The Magic</h2>

<h3>The Core Class</h3>

<p>The heart of the code is the <code>audio_sink</code> class:</p>

<pre lang="c++"><span class="code-keyword">template</span> <span class="code-keyword">&lt;</span> <span class="code-keyword">class</span> SinkType <span class="code-keyword">&gt;</span>
<span class="code-keyword">class</span> audio_sink: <span class="code-keyword">public</span> SinkType
{
<span class="code-keyword">public</span>:
   audio_sink()
   {      
      <span class="code-comment">//</span><span class="code-comment"> Initialize the COM libraries
</span>
      <span class="code-keyword">static</span> comstl::com_initializer coinit;                         
      <span class="code-comment">//</span><span class="code-comment"> Get SAPI Speech COM object
</span>
      HRESULT hr;
      <span class="code-keyword">if</span>(FAILED(hr = comstl::co_create_instance(CLSID_SpVoice, _pVoice))) 
          <span class="code-keyword">throw</span> comstl::com_exception(
              <span class="code-string">"</span><span class="code-string">Failed to create SpVoice COM instance"</span>,hr); 
   } 
   
   <span class="code-comment">//</span><span class="code-comment"> speak a character string
</span>
   std::streamsize write(<span class="code-keyword">const</span> <span class="code-keyword">char</span>* s, std::streamsize n)
   {
      <span class="code-comment">//</span><span class="code-comment"> make a null terminated string.
</span>
      std::<span class="code-SDKkeyword">string</span> str(s,n);                        
      <span class="code-comment">//</span><span class="code-comment"> convert to wide character and call the actual speak method.
</span>
      <span class="code-keyword">return</span> write(winstl::a2w(str), str.size());  
   }
   
   <span class="code-comment">//</span><span class="code-comment"> speak a wide character string
</span>
   std::streamsize write(<span class="code-keyword">const</span> <span class="code-keyword">wchar_t</span>* s, std::streamsize n)
   {
      <span class="code-comment">//</span><span class="code-comment"> make a null terminated wstring.
</span>
      std::wstring str(s,n);                       
      <span class="code-comment">//</span><span class="code-comment"> The actual COM call to Speak.
</span>
      _pVoice-<span class="code-keyword">&gt;</span>Speak(str.c_str(), SPF_ASYNC, <span class="code-digit">0</span>);   
      <span class="code-keyword">return</span> n;
   }
   
   <span class="code-comment">//</span><span class="code-comment"> Set the speech speed.
</span>
   <span class="code-keyword">void</span> setRate(<span class="code-keyword">long</span> n) { _pVoice-<span class="code-keyword">&gt;</span>SetRate(n); }   

<span class="code-keyword">private</span>:      
   <span class="code-comment">//</span><span class="code-comment"> COM object smart pointer.
</span>
   stlsoft::ref_ptr<span class="code-keyword">&lt;</span> ISpVoice <span class="code-keyword">&gt;</span> _pVoice;             
};</pre>

<p>There's a lot going on in this little class. Let's tease apart the pieces one-by-one.</p>

<h3>COMSTL, stlsoft::ref_ptr&lt;&gt; and ISpVoice</h3>

<p>The only member of the class is <code>stlsoft::ref_ptr<span class="code-keyword">&lt;</span> ISpVoice <span class="code-keyword">&gt;</span> _pVoice</code>.</p>

<p>This is the smart pointer that will handle all the COM stuff for us. The STLSoft class <a href="http://www.synesis.com.au/software/stlsoft/doc-1.9/classstlsoft_1_1ref__ptr.html">stlsoft::ref_ptr&lt;&gt;</a> provides RAII-safe handling of reference-counted interfaces (RCIs). Specifically, it is ideal for handling COM objects.</p>

<p>We are using it with the <code>ISpVoice</code> interface. From Microsoft's <a href="http://msdn.microsoft.com/library/default.asp?url=/library/en-us/SAPI51sr/html/ISpVoice.asp">site</a>:</p>
<p><em>The <code>ISpVoice</code> interface enables an application to
perform text synthesis operations. Applications can speak text strings
and text files, or play audio files through this interface. All of
these can be done synchronously or asynchronously.</em></p>
<p>In the constructor, we first initialize COM usage via the <code>comstl::com_initializer</code>. This happens only once (since it is a static object) and need not trouble us again. To initialize <code>_pVoice</code> we call <code>comstl::co_create_instance()</code> with the <code>CLSID_SpVoice</code> ID. If all goes well, we are now holding an <code>ISpVoice</code> object handle. All reference counting issues will be handled by <code>stlsoft::ref_ptr<span class="code-keyword">&lt;</span><span class="code-keyword">&gt;</span></code>. If the call fails, a <code>comstl::com_exception</code> exception is thrown and the class instance is not created.</p>
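<p>To make the reference counting idea concrete, here is a minimal sketch written in the spirit of <code>stlsoft::ref_ptr&lt;&gt;</code>. It is not the STLSoft implementation: <code>fake_voice</code>, <code>counted_ptr</code> and <code>refs_after_scopes()</code> are hypothetical names, and the fake object only counts references instead of talking to COM.</p>

```cpp
#include <cassert>

// Hypothetical stand-in for a COM interface: it only counts references.
struct fake_voice
{
    int refs = 1;               // a freshly created object holds one reference
    void AddRef()  { ++refs; }
    void Release() { --refs; }  // a real COM object would destroy itself at zero
};

// Minimal RAII wrapper for a reference-counted interface (illustration only).
template <class T>
class counted_ptr
{
public:
    // Takes over the reference the caller already owns.
    explicit counted_ptr(T* p) : p_(p) {}

    // Copying shares the object and adds a reference.
    counted_ptr(const counted_ptr& other) : p_(other.p_)
    {
        if (p_) p_->AddRef();
    }

    // Assignment is left out to keep the sketch short.
    counted_ptr& operator=(const counted_ptr&) = delete;

    // Leaving scope gives the reference back.
    ~counted_ptr()
    {
        if (p_) p_->Release();
    }

    T* operator->() const { return p_; }

private:
    T* p_;
};

// Walks through two nested scopes and returns the final reference count.
int refs_after_scopes()
{
    fake_voice voice;                       // refs == 1
    {
        counted_ptr<fake_voice> a(&voice);  // adopts the initial reference
        {
            counted_ptr<fake_voice> b(a);   // the copy adds a reference
            assert(voice.refs == 2);
        }                                   // b releases its reference
        assert(voice.refs == 1);
    }                                       // a releases the last reference
    return voice.refs;
}
```

<p>No explicit <code>Release()</code> call appears in the calling code; the destructors balance every reference, which is the property the class relies on for <code>_pVoice</code>.</p>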

<p>To speak some text we just need to call <code>_pVoice-<span class="code-keyword">&gt;</span>Speak()</code> with a wide character string.</p>

<p>However, we would like to support other character types like <code><span class="code-keyword">char</span>*</code>, <code>std::<span class="code-SDKkeyword">string</span></code> and more. In fact, we want to support any type that can be converted to a string or wide-string via an <code><span class="code-keyword">operator</span><span class="code-keyword">&lt;</span><span class="code-keyword">&lt;</span>()</code>.</p>
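<p>As a sketch of what that buys us, the user-defined type below gains stream support through a single <code>operator&lt;&lt;</code> overload. The <code>temperature</code> type and the <code>forecast()</code> function are hypothetical, and <code>std::ostringstream</code> stands in for the audio stream so that the result can be inspected as text.</p>

```cpp
#include <cassert>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical user-defined type, for illustration only.
struct temperature
{
    int degrees;
};

// Teaches every std::ostream how to render a temperature as text.
std::ostream& operator<<(std::ostream& os, const temperature& t)
{
    return os << t.degrees << " degrees";
}

// Builds a sentence the same way it would be written to a speaking stream.
std::string forecast(const temperature& t)
{
    std::ostringstream out;
    out << "Today's high is " << t << ".";
    return out.str();
}
```

<p>Because <code>audio_ostream</code> is meant to be used like any <code>std::ostream</code>, the same overload should let you write <code>aout &lt;&lt; temperature{21}</code> and have the formatted text spoken.</p>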

<h3>Boost Iostreams </h3>

<p><a href="http://www.boost.org/libs/iostreams/doc/index.html">boost::iostreams</a> makes it easy to create standard C++ streams and stream buffers for accessing new Sources and Sinks. To rephrase from the <a href="http://www.boost.org/libs/iostreams/doc/index.html">site</a>:</p>

<p><em>A Sink provides write-access to a sequence of characters of a
given type. A Sink may expose this sequence by defining a member
function <code>write</code>, invoked indirectly by the Iostreams library through the function <code>boost::iostreams::write</code>.</em></p>

<p>There are two pre-defined sinks, <code>boost::iostreams::sink</code> and <code>boost::iostreams::wsink</code>, for writing narrow and wide characters respectively.</p>

<p>To make our class a Sink and get all its functionality, all we have
to do is derive our class from one of these classes (depending on whether
we want narrow or wide character output). Thus, <code>audio_sink</code> is a template class that derives from its template parameter.</p>

<p>To use our sink and create a concrete <code>ostream</code>, we need to use the <code>boost::iostreams::stream</code> class.</p>

<p>The supporting class is <code>audio_ostream_t</code>: </p>

<pre lang="c++"><span class="code-keyword">template</span> <span class="code-keyword">&lt;</span> <span class="code-keyword">class</span> SinkType <span class="code-keyword">&gt;</span>
<span class="code-keyword">class</span> audio_ostream_t: <span class="code-keyword">public</span> boost::iostreams::stream<span class="code-keyword">&lt;</span> SinkType <span class="code-keyword">&gt;</span>, 
<span class="code-keyword">public</span> SinkType
{
<span class="code-keyword">public</span>:
   audio_ostream_t()
   {
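      <span class="code-comment">//</span><span class="code-comment"> Connect to Sink; this-&gt; is needed because open() lives in a dependent base class
</span>
      <span class="code-keyword">this</span>-<span class="code-keyword">&gt;</span>open(*<span class="code-keyword">this</span>);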
   }
};
<span class="code-keyword">typedef</span> audio_ostream_t<span class="code-keyword">&lt;</span> audio_sink<span class="code-keyword">&lt;</span> boost::iostreams::sink  <span class="code-keyword">&gt;</span> <span class="code-keyword">&gt;</span>  
    audio_ostream ;
<span class="code-keyword">typedef</span> audio_ostream_t<span class="code-keyword">&lt;</span> audio_sink<span class="code-keyword">&lt;</span> boost::iostreams::wsink <span class="code-keyword">&gt;</span> <span class="code-keyword">&gt;</span> 
    waudio_ostream;</pre>

<p>This class allows us to combine both the sink and stream objects into a single entity.</p>
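<p>For readers curious about what the library is doing on our behalf, here is a rough standard-library-only sketch of the idea: a stream buffer that forwards everything to a sink's <code>write()</code> member. This is not how <code>boost::iostreams::stream</code> is implemented (for one thing, this sketch is unbuffered), and <code>string_sink</code>, <code>sink_buf</code> and <code>write_through_sink()</code> are hypothetical names.</p>

```cpp
#include <cassert>
#include <ostream>
#include <streambuf>
#include <string>

// Hypothetical sink with the same write() shape as audio_sink; it appends
// to a string instead of speaking.
struct string_sink
{
    std::string received;

    std::streamsize write(const char* s, std::streamsize n)
    {
        received.append(s, static_cast<std::string::size_type>(n));
        return n;
    }
};

// Unbuffered stream buffer that forwards every character to Sink::write().
template <class Sink>
class sink_buf : public std::streambuf
{
public:
    explicit sink_buf(Sink& sink) : sink_(sink) {}

protected:
    // Single characters arrive here because no put area is set up.
    int_type overflow(int_type ch) override
    {
        if (traits_type::eq_int_type(ch, traits_type::eof()))
            return traits_type::not_eof(ch);
        const char c = traits_type::to_char_type(ch);
        return sink_.write(&c, 1) == 1 ? ch : traits_type::eof();
    }

    // Character sequences are passed on in one call.
    std::streamsize xsputn(const char* s, std::streamsize n) override
    {
        return sink_.write(s, n);
    }

private:
    Sink& sink_;
};

// Formats text and a number through the sink and returns what it received.
std::string write_through_sink()
{
    string_sink sink;
    sink_buf<string_sink> buf(sink);
    std::ostream out(&buf);
    out << "Answer: " << 42 << '\n';
    return sink.received;
}
```

<p>The sink never needs to know about formatting: the <code>std::ostream</code> layer turns the number into characters, and the sink only ever sees character sequences.</p>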