codeproject audio_ostream - a text-to-speech ostream_ free source code and programming help.htm

        <LI>A simple example of how to use <A 
        href="http://www.boost.org/libs/iostreams/doc/index.html">boost::iostreams</A> 
        </LI></UL>
      <H2>Background</H2>
      <P>I recently had to add audio output to a program running on 
      Windows.</P>
      <P><A 
      href="http://www.microsoft.com/downloads/details.aspx?FamilyID=5e86ec97-40a7-453f-b0ee-6583171b4530&amp;DisplayLang=en">Microsoft's 
      SAPI SDK</A> provides a COM interface through which wide character strings 
      can be spoken via SAPI's TTS engine. The Code Project has many articles 
      explaining how to use SAPI at varying levels of complexity. So why 
      another?</P>
      <P>Well, I wanted some additional features that those articles did not 
      provide:</P>
      <OL>
        <LI>Little or no COM hassle. Ideally, it should work within the 
        simplest console application. 
        <LI>Full (transparent) support for types other than wide-char, e.g. 
        <CODE><SPAN class=code-keyword>char</SPAN>*</CODE>, <CODE>std::<SPAN 
        class=code-SDKkeyword>string</SPAN></CODE>s and even <CODE><SPAN 
        class=code-keyword>int</SPAN></CODE>s, <CODE><SPAN 
        class=code-keyword>float</SPAN></CODE>s, etc. 
        <LI>Intuitive (or at least familiar) syntax </LI></OL>
      <P>To achieve these goals I developed <CODE>audio_ostream</CODE>.</P>
      <P><CODE>audio_ostream</CODE> is a full-fledged <CODE>std::ostream</CODE> 
      which supports any type that has an <CODE><SPAN 
      class=code-keyword>operator</SPAN><SPAN 
      class=code-keyword>&lt;</SPAN><SPAN 
      class=code-keyword>&lt;</SPAN>()</CODE>.</P>
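      <P>Because <CODE>audio_ostream</CODE> is a <CODE>std::ostream</CODE>, an 
      inserter written once against <CODE>std::ostream</CODE> should need no 
      changes to be spoken. The following is a minimal sketch. The 
      <CODE>temperature</CODE> type is hypothetical, and 
      <CODE>std::ostringstream</CODE> stands in for <CODE>audio_ostream</CODE> 
      so that the code runs without SAPI:</P><PRE lang=c++>#include &lt;ostream&gt;
#include &lt;sstream&gt;

// Hypothetical user-defined type (not part of audio_ostream).
struct temperature {
    double celsius;
};

// Written once against std::ostream, so it works with any stream derived
// from it, such as std::cout or std::ostringstream.
std::ostream&amp; operator&lt;&lt;(std::ostream&amp; os, const temperature&amp; t) {
    return os &lt;&lt; t.celsius &lt;&lt; " degrees Celsius";
}

// Formats a sentence on whatever stream it is given.
void announce(std::ostream&amp; os, const temperature&amp; t) {
    os &lt;&lt; "The temperature is " &lt;&lt; t &lt;&lt; '.';
}
</PRE>
      <P>Given a <CODE>std::ostringstream out</CODE>, the call 
      <CODE>announce(out, temperature{21.5})</CODE> leaves "The temperature is 
      21.5 degrees Celsius." in <CODE>out.str()</CODE>. Passing an 
      <CODE>audio_ostream</CODE> instead should speak the same sentence.</P>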
      <P>You can have as many <CODE>audio_ostream</CODE>s as you like, all 
      working in parallel.</P>
      <P>To handle COM issues, I used the wonderful COMSTL library, which takes 
      care of all the delicate and brittle COMplications, such as 
      (un-)initialization, resource (de-)allocation, reference counting, etc.</P>
      <P><CODE>boost::iostreams</CODE> is used to provide full 
      <CODE>std::ostream</CODE> support with very little boilerplate 
      code.</P>
      <P>Since COMSTL is a header-only library, and the parts of 
      <CODE>boost::iostreams</CODE> used here are header-only as well, I decided 
      to make my class header-only too. The minor price of this decision is that 
      the SAPI headers will be included in every file that uses 
      <CODE>audio_ostream</CODE>.</P>
      <H2>Using the code</H2>
      <P>Using the code cannot be easier:</P><PRE lang=c++><SPAN class=code-preprocessor>#include</SPAN><SPAN class=code-preprocessor> <SPAN class=code-string>"</SPAN><SPAN class=code-string>audiostream.hpp"</SPAN>
</SPAN><SPAN class=code-keyword>using</SPAN> <SPAN class=code-keyword>namespace</SPAN> std;
<SPAN class=code-keyword>using</SPAN> <SPAN class=code-keyword>namespace</SPAN> audiostream;
<SPAN class=code-keyword>int</SPAN> main()
{
   audio_ostream aout;
   aout <SPAN class=code-keyword>&lt;</SPAN><SPAN class=code-keyword>&lt;</SPAN> <SPAN class=code-string>"</SPAN><SPAN class=code-string>Hello World!"</SPAN>  <SPAN class=code-keyword>&lt;</SPAN><SPAN class=code-keyword>&lt;</SPAN> endl;
   <SPAN class=code-comment>//</SPAN><SPAN class=code-comment> some more code...
</SPAN>   <SPAN class=code-keyword>return</SPAN> <SPAN class=code-digit>0</SPAN>;
}</PRE>
      <P>This little program will, obviously, say "Hello World!".</P>
      <P>The audio stream is asynchronous, so the program keeps running 
      while the text is being spoken. That is why the line <CODE><SPAN 
      class=code-comment>//</SPAN><SPAN class=code-comment> some more 
      code...</SPAN></CODE> is there: the program has to keep running long 
      enough for the speech to finish. This is conceptually similar to how a 
      <CODE>std::ostream</CODE> buffers output and only displays the text once 
      its internal buffer is full or flushed.</P>
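      <P>The same effect can be reproduced without SAPI. In the sketch below, 
      the hypothetical <CODE>async_voice</CODE> class returns from 
      <CODE>speak()</CODE> immediately and does the "speaking" on another 
      thread. The caller therefore has to wait before it exits. In real SAPI 
      code, <CODE>ISpVoice::WaitUntilDone()</CODE> fills the role that 
      <CODE>wait_until_done()</CODE> fills here. Note that the 
      <CODE>audio_sink</CODE> class shown later does not expose that 
      method.</P><PRE lang=c++>#include &lt;chrono&gt;
#include &lt;future&gt;
#include &lt;string&gt;
#include &lt;thread&gt;
#include &lt;vector&gt;

// Hypothetical stand-in for an asynchronous TTS voice (not SAPI).
class async_voice {
public:
    // Returns immediately; the text is "spoken" on a background thread.
    // Unlike SAPI, several utterances may be processed concurrently.
    void speak(std::string text) {
        pending_.push_back(std::async(std::launch::async, [text]() -&gt; std::string {
            // Simulate the time it takes to say the text.
            std::this_thread::sleep_for(std::chrono::milliseconds(10));
            return text;
        }));
    }

    // Blocks until every queued utterance has finished and returns them
    // in the order they were queued.
    std::vector&lt;std::string&gt; wait_until_done() {
        std::vector&lt;std::string&gt; spoken;
        for (auto&amp; f : pending_) spoken.push_back(f.get());
        pending_.clear();
        return spoken;
    }

private:
    std::vector&lt;std::future&lt;std::string&gt;&gt; pending_;
};
</PRE>
      <P>Calling <CODE>wait_until_done()</CODE> just before <CODE>main()</CODE> 
      returns does the job of the placeholder line in the example above.</P>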
      <P>To use the class:</P>
      <OL>
        <LI><CODE><SPAN class=code-preprocessor>#include</SPAN><SPAN 
        class=code-preprocessor></SPAN></CODE> the <CODE>audiostream.hpp</CODE> 
        header file. 
        <LI>Create an instance of <CODE>audio_ostream</CODE> (or 
        <CODE>waudio_ostream</CODE>) 
        <LI>Use the stream as you would any <CODE>std::ostream</CODE>. </LI></OL>
      <P>That's really all you need to do to start using the class.</P>
      <H2>Prerequisites</H2>
      <P>For the code to compile and run you will need three libraries:</P>
      <OL>
        <LI>For the TTS engine, you will need to install the <A 
        href="http://www.microsoft.com/downloads/details.aspx?FamilyID=5e86ec97-40a7-453f-b0ee-6583171b4530&amp;DisplayLang=en">Microsoft 
        Speech SDK</A> (I used ver. 5.1). 
        <LI>For COMSTL you will need the <A 
        href="http://synesis.com.au/software/stlsoft/">STLSoft libraries</A> 
        (you'll need STLSoft version 1.9.1 beta 44, or later). 
        <LI>The <A href="http://boost.org/">Boost</A> Iostreams library. You can 
        download Boost <A 
        href="http://sourceforge.net/project/showfiles.php?group_id=7586">here</A>. 
        </LI></OL>
      <P>Set your compiler and linker paths accordingly (STLSoft is 
      header-only, as are the parts of Boost used here).</P>
      <H2>Advanced Usage</H2>
      <P>It's possible to change the voice gender, speed, language and many other 
      voice parameters using the SAPI text-to-speech (TTS) XML tags.</P>
      <P>Just insert the relevant XML tags into the stream to effect the change. The 
      complete list of possible XML tags can be found <A 
      href="http://msdn.microsoft.com/library/default.asp?url=/library/en-us/SAPI51sr/Whitepapers/WP_XML_TTS_Tutorial.asp">here</A>.</P>
      <P>For example:</P><PRE lang=xml>audio_ostream aout;
// Select a male voice.
aout <SPAN class=code-keyword>&lt;</SPAN><SPAN class=code-leadattribute>&lt;</SPAN> <SPAN class=code-attribute>"&lt;voice</SPAN> <SPAN class=code-attribute>required</SPAN><SPAN class=code-keyword>='</SPAN><SPAN class=code-keyword>Gender=Male'</SPAN><SPAN class=code-keyword>&gt;</SPAN>Hello World!" <SPAN class=code-keyword>&lt;</SPAN><SPAN class=code-leadattribute>&lt;</SPAN> <SPAN class=code-attribute>endl;</SPAN> 
<SPAN class=code-attribute>aout</SPAN> <SPAN class=code-attribute>&lt;&lt;</SPAN> <SPAN class=code-attribute>"Five</SPAN> <SPAN class=code-attribute>hundred</SPAN> <SPAN class=code-attribute>milliseconds</SPAN> <SPAN class=code-attribute>of</SPAN> <SPAN class=code-attribute>silence"</SPAN> <SPAN class=code-attribute>&lt;&lt;</SPAN> <SPAN class=code-attribute>flush</SPAN> <SPAN class=code-attribute>&lt;&lt;</SPAN> 
    <SPAN class=code-attribute>"&lt;silence</SPAN> <SPAN class=code-attribute>msec</SPAN><SPAN class=code-keyword>='</SPAN><SPAN class=code-keyword>500'</SPAN><SPAN class=code-keyword>/</SPAN><SPAN class=code-keyword>&gt;</SPAN> just occurred." <SPAN class=code-keyword>&lt;</SPAN><SPAN class=code-leadattribute>&lt;</SPAN> <SPAN class=code-attribute>endl;</SPAN>
</PRE>
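      <P>The XML tags must be the first items in the string passed to SAPI, 
      without any preceding text. Most likely this is because, when only 
      <CODE>SPF_ASYNC</CODE> is set, SAPI decides whether a string contains XML 
      by looking at its first character. <CODE>flush</CODE>ing the stream before 
      the tag, as in the example, ensures this, because each flush hands the 
      pending text to SAPI as a separate string.</P>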
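      <P>The effect of <CODE>flush</CODE> can be shown without SAPI. This 
      sketch assumes, as is the case with <CODE>boost::iostreams::stream</CODE>, 
      that pending text is passed on whenever the stream is flushed. The 
      hypothetical <CODE>chunk_recorder</CODE> stream buffer records each 
      flushed chunk at the point where <CODE>audio_sink</CODE> would call 
      <CODE>Speak()</CODE>:</P><PRE lang=c++>#include &lt;ostream&gt;
#include &lt;streambuf&gt;
#include &lt;string&gt;
#include &lt;vector&gt;

// Hypothetical stream buffer: collects characters and, on every flush,
// records the pending text as one chunk (one simulated Speak() call).
class chunk_recorder : public std::streambuf {
public:
    std::vector&lt;std::string&gt; chunks;

protected:
    // No put area is set up, so every character arrives here.
    int_type overflow(int_type ch) override {
        if (!traits_type::eq_int_type(ch, traits_type::eof()))
            pending_ += traits_type::to_char_type(ch);
        return traits_type::not_eof(ch);
    }

    // Called by flush (and therefore also by endl).
    int sync() override {
        if (!pending_.empty()) {
            chunks.push_back(pending_);
            pending_.clear();
        }
        return 0;
    }

private:
    std::string pending_;
};
</PRE>
      <P>Suppose the silence example is written to a <CODE>std::ostream</CODE> 
      that is constructed over a <CODE>chunk_recorder</CODE>. The buffer ends 
      up holding two chunks, and only the second one begins with the 
      <CODE>&lt;silence&gt;</CODE> tag.</P>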
      <P>You can also call <CODE>setRate()</CODE>, which forwards to 
      <CODE>ISpVoice::SetRate()</CODE>, with values in the range [-10, 10] to 
      control the speed of the speech.</P>
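      <P>If the rate comes from user input, it may be worth keeping it within 
      that range before you call <CODE>setRate()</CODE>. Here is a minimal 
      sketch using a hypothetical <CODE>clamp_rate()</CODE> helper, which is not 
      part of <CODE>audio_ostream</CODE>:</P><PRE lang=c++>#include &lt;algorithm&gt;

// Hypothetical helper (not part of audio_ostream): limits a requested
// speech rate to the range [-10, 10] that setRate() expects.
long clamp_rate(long requested) {
    return std::max(-10L, std::min(requested, 10L));
}
</PRE>
      <P>It would then be used as 
      <CODE>aout.setRate(clamp_rate(requested));</CODE>. This works because 
      <CODE>audio_ostream</CODE> publicly inherits the members of 
      <CODE>audio_sink</CODE>.</P>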
      <H2>The Magic</H2>
      <H3>The Core Class</H3>
      <P>The heart of the code is the <CODE>audio_sink</CODE> class:</P><PRE lang=c++><SPAN class=code-keyword>template</SPAN> <SPAN class=code-keyword>&lt;</SPAN> <SPAN class=code-keyword>class</SPAN> SinkType <SPAN class=code-keyword>&gt;</SPAN>
<SPAN class=code-keyword>class</SPAN> audio_sink: <SPAN class=code-keyword>public</SPAN> SinkType
{
<SPAN class=code-keyword>public</SPAN>:
   audio_sink()
   {      
      <SPAN class=code-comment>//</SPAN><SPAN class=code-comment> Initialize the COM libraries
</SPAN>      <SPAN class=code-keyword>static</SPAN> comstl::com_initializer coinit;                         
      <SPAN class=code-comment>//</SPAN><SPAN class=code-comment> Get SAPI Speech COM object
</SPAN>      HRESULT hr;
      <SPAN class=code-keyword>if</SPAN>(FAILED(hr = comstl::co_create_instance(CLSID_SpVoice, _pVoice))) 
          <SPAN class=code-keyword>throw</SPAN> comstl::com_exception(
              <SPAN class=code-string>"</SPAN><SPAN class=code-string>Failed to create SpVoice COM instance"</SPAN>,hr); 
   } 
   
   <SPAN class=code-comment>//</SPAN><SPAN class=code-comment> speak a character string
</SPAN>   std::streamsize write(<SPAN class=code-keyword>const</SPAN> <SPAN class=code-keyword>char</SPAN>* s, std::streamsize n)
   {
      <SPAN class=code-comment>//</SPAN><SPAN class=code-comment> make a null terminated string.
</SPAN>      std::<SPAN class=code-SDKkeyword>string</SPAN> str(s,n);                        
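      <SPAN class=code-comment>//</SPAN><SPAN class=code-comment> convert to wide characters; for multibyte code pages the wide
</SPAN>      <SPAN class=code-comment>//</SPAN><SPAN class=code-comment> length can differ from n, so use the converted string's size.
</SPAN>      std::wstring wide = <SPAN class=code-keyword>static_cast</SPAN><SPAN class=code-keyword>&lt;</SPAN><SPAN class=code-keyword>const</SPAN> <SPAN class=code-keyword>wchar_t</SPAN>*<SPAN class=code-keyword>&gt;</SPAN>(winstl::a2w(str));
      write(wide.c_str(), <SPAN class=code-keyword>static_cast</SPAN><SPAN class=code-keyword>&lt;</SPAN>std::streamsize<SPAN class=code-keyword>&gt;</SPAN>(wide.size()));
      <SPAN class=code-comment>//</SPAN><SPAN class=code-comment> all n narrow characters have been consumed.
</SPAN>      <SPAN class=code-keyword>return</SPAN> n;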
   }
   
   <SPAN class=code-comment>//</SPAN><SPAN class=code-comment> speak a wide character string
</SPAN>   std::streamsize write(<SPAN class=code-keyword>const</SPAN> <SPAN class=code-keyword>wchar_t</SPAN>* s, std::streamsize n)
   {
      <SPAN class=code-comment>//</SPAN><SPAN class=code-comment> make a null terminated wstring.
</SPAN>      std::wstring str(s,n);                       
      <SPAN class=code-comment>//</SPAN><SPAN class=code-comment> The actual COM call to Speak.
</SPAN>      _pVoice-<SPAN class=code-keyword>&gt;</SPAN>Speak(str.c_str(), SPF_ASYNC, <SPAN class=code-digit>0</SPAN>);   
      <SPAN class=code-keyword>return</SPAN> n;
   }
   
   <SPAN class=code-comment>//</SPAN><SPAN class=code-comment> Set the speech speed.
</SPAN>   <SPAN class=code-keyword>void</SPAN> setRate(<SPAN class=code-keyword>long</SPAN> n) { _pVoice-<SPAN class=code-keyword>&gt;</SPAN>SetRate(n); }   

<SPAN class=code-keyword>private</SPAN>:      
   <SPAN class=code-comment>//</SPAN><SPAN class=code-comment> COM object smart pointer.
</SPAN>   stlsoft::ref_ptr<SPAN class=code-keyword>&lt;</SPAN> ISpVoice <SPAN class=code-keyword>&gt;</SPAN> _pVoice;             
};</PRE>
      <P>There's a lot going on in this little class. Let's tease apart the 
      pieces one by one.</P>
      <H3>COMSTL, stlsoft::ref_ptr&lt;&gt; and ISpVoice</H3>
      <P>The only member of the class is <CODE>stlsoft::ref_ptr<SPAN 
      class=code-keyword>&lt;</SPAN> ISpVoice <SPAN 
      class=code-keyword>&gt;</SPAN> _pVoice</CODE>.</P>
      <P>This is the smart pointer that handles all the COM details for us. 
      The STLSoft class <A 
      href="http://www.synesis.com.au/software/stlsoft/doc-1.9/classstlsoft_1_1ref__ptr.html">stlsoft::ref_ptr&lt;&gt;</A> 
      provides RAII-safe handling of reference-counted interfaces (RCIs), which 
      makes it ideal for handling COM objects.</P>
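      <P>The idea behind such a smart pointer can be sketched in a few lines. 
      This is not STLSoft's implementation. <CODE>counted_object</CODE> is a 
      hypothetical object with COM-style 
      <CODE>AddRef()</CODE>/<CODE>Release()</CODE>, and 
      <CODE>simple_ref_ptr</CODE> is a hypothetical pointer that adds a 
      reference when it is copied and releases it in its destructor. Sharing on 
      copy matters here because <CODE>boost::iostreams::stream</CODE> normally 
      stores a copy of the device it opens, so both copies of the sink should 
      end up using the same voice object.</P><PRE lang=c++>#include &lt;utility&gt;

// Hypothetical reference-counted object with COM-style AddRef/Release.
class counted_object {
public:
    unsigned long AddRef() { return ++refs_; }
    unsigned long Release() {
        unsigned long remaining = --refs_;
        if (remaining == 0) {
            ++destroyed;  // record the destruction for the example
            delete this;
        }
        return remaining;
    }
    static int destroyed;

private:
    unsigned long refs_ = 1;  // the creator owns the first reference
};
int counted_object::destroyed = 0;

// Minimal RAII pointer for reference-counted interfaces.
template &lt;class T&gt;
class simple_ref_ptr {
public:
    // Adopts one existing reference (no extra AddRef).
    explicit simple_ref_ptr(T* p = nullptr) : p_(p) {}

    // Copying shares the object, so it adds a reference.
    simple_ref_ptr(const simple_ref_ptr&amp; other) : p_(other.p_) {
        if (p_) p_-&gt;AddRef();
    }

    // Copy-and-swap: the old reference is released when 'other' dies.
    simple_ref_ptr&amp; operator=(simple_ref_ptr other) {
        std::swap(p_, other.p_);
        return *this;
    }

    // RAII: give the reference back when the pointer goes out of scope.
    ~simple_ref_ptr() {
        if (p_) p_-&gt;Release();
    }

    T* operator-&gt;() const { return p_; }

private:
    T* p_;
};
</PRE>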
      <P>We are using it with the <CODE>ISpVoice</CODE> interface. From 
      Microsoft's <A 
      href="http://msdn.microsoft.com/library/default.asp?url=/library/en-us/SAPI51sr/html/ISpVoice.asp">site</A>:</P><EM>The 
      <CODE>ISpVoice</CODE> interface enables an application to perform text 
      synthesis operations. Applications can speak text strings and text files, 
      or play audio files through this interface. All of these can be done 
      synchronously or asynchronously.</EM> 
      <P>In the constructor, we first initialize COM via 
      <CODE>comstl::com_initializer</CODE>. Because <CODE>coinit</CODE> is a 
      static object, this happens only the first time the constructor runs, so 
      we need not worry about it again. To initialize 
      <CODE>_pVoice</CODE> we call <CODE>comstl::co_create_instance()</CODE> 
      with the <CODE>CLSID_SpVoice</CODE> ID. If all goes well, we now hold an 
      <CODE>ISpVoice</CODE> interface pointer, and all reference counting is 
      handled by <CODE>stlsoft::ref_ptr<SPAN 
      class=code-keyword>&lt;</SPAN><SPAN class=code-keyword>&gt;</SPAN></CODE>. 
      If the call fails, a <CODE>comstl::com_exception</CODE> exception is 
      thrown and the object is never constructed.</P>
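      <P>Both behaviors can be tried out in portable C++. In the sketch below, 
      <CODE>library_initializer</CODE> and <CODE>voice_handle</CODE> are 
      hypothetical stand-ins for <CODE>comstl::com_initializer</CODE> and 
      <CODE>audio_sink</CODE>. The function-local static is constructed only 
      the first time the constructor runs, and a constructor that throws 
      leaves no object behind.</P><PRE lang=c++>#include &lt;stdexcept&gt;

// Hypothetical stand-in for comstl::com_initializer: counts initializations.
struct library_initializer {
    static int init_count;
    library_initializer() { ++init_count; }
};
int library_initializer::init_count = 0;

// Hypothetical stand-in for audio_sink's constructor logic.
class voice_handle {
public:
    explicit voice_handle(bool creation_succeeds) {
        // Constructed the first time control reaches this line, never again
        // (and thread-safe since C++11).
        static library_initializer init;
        // Throwing from a constructor means no object is created, much as
        // audio_sink throws comstl::com_exception.
        if (!creation_succeeds)
            throw std::runtime_error("Failed to create voice instance");
    }
};
</PRE>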
      <P>To speak some text we just need to call <CODE>_pVoice-<SPAN 
      class=code-keyword>&gt;</SPAN>Speak()</CODE> with a wide character 
      string.</P>
      <P>However, we would also like to support other string types, such as 
      <CODE><SPAN class=code-keyword>char</SPAN>*</CODE>, <CODE>std::<SPAN 
      class=code-SDKkeyword>string</SPAN></CODE> and more. In fact, we want to 
      support any type that can be converted to a string or wide-string via an 
      <CODE><SPAN class=code-keyword>operator</SPAN><SPAN 
      class=code-keyword>&lt;</SPAN><SPAN 
      class=code-keyword>&lt;</SPAN>()</CODE>.</P>
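      <P><CODE>audio_sink</CODE> meets this goal for narrow strings by widening 
      them and forwarding the result to the wide <CODE>write()</CODE>. The 
      sketch below mirrors that structure with a hypothetical 
      <CODE>recording_sink</CODE> that stores the text instead of speaking it. 
      To stay portable, it assumes plain ASCII input and widens through the 
      classic "C" locale, whereas <CODE>audio_sink</CODE> uses 
      <CODE>winstl::a2w</CODE>.</P><PRE lang=c++>#include &lt;cstddef&gt;
#include &lt;ios&gt;
#include &lt;locale&gt;
#include &lt;string&gt;

// Hypothetical stand-in for audio_sink that records text instead of
// speaking it, with the same pair of write() overloads.
class recording_sink {
public:
    std::wstring spoken;  // what would have been passed to Speak()

    // Wide overload: the only one that "speaks".
    std::streamsize write(const wchar_t* s, std::streamsize n) {
        spoken.append(s, static_cast&lt;std::size_t&gt;(n));
        return n;
    }

    // Narrow overload: widen, forward to the wide overload, and report all
    // n narrow characters as consumed. Widening through the classic "C"
    // locale is correct for ASCII only; audio_sink uses winstl::a2w instead.
    std::streamsize write(const char* s, std::streamsize n) {
        const auto&amp; ct =
            std::use_facet&lt;std::ctype&lt;wchar_t&gt;&gt;(std::locale::classic());
        std::wstring wide(static_cast&lt;std::size_t&gt;(n), L'\0');
        if (n &gt; 0) ct.widen(s, s + n, &amp;wide[0]);
        write(wide.data(), static_cast&lt;std::streamsize&gt;(wide.size()));
        return n;
    }
};
</PRE>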
      <H3>Boost Iostreams </H3>
      <P><A 
      href="http://www.boost.org/libs/iostreams/doc/index.html">boost::iostreams</A> 
      makes it easy to create standard C++ streams and stream buffers for 
      accessing new Sources and Sinks. To quote the <A 
      href="http://www.boost.org/libs/iostreams/doc/index.html">site</A>:</P>
      <P><EM>A Sink provides write-access to a sequence of characters of a given 
      type. A Sink may expose this sequence by defining a member function 
      <CODE>write</CODE>, invoked indirectly by the Iostreams library through 
      the function <CODE>boost::iostreams::write</CODE>.</EM></P>
      <P>There are two predefined convenience base classes, 
      <CODE>boost::iostreams::sink</CODE> and 
      <CODE>boost::iostreams::wsink</CODE>, for writing narrow and wide 
      characters respectively.</P>
      <P>To make our class a Sink and get all its functionality, all we have to 
      do is derive it from one of these classes (depending on whether we want 
      narrow or wide character output) and provide the <CODE>write()</CODE> 
      member. Thus, <CODE>audio_sink</CODE> is a template class that derives 
      from its template parameter.</P>
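      <P>The technique of deriving from the template parameter does not depend 
      on Boost. In this sketch, <CODE>narrow_tag_base</CODE> and 
      <CODE>wide_tag_base</CODE> are hypothetical stand-ins for 
      <CODE>boost::iostreams::sink</CODE> and 
      <CODE>boost::iostreams::wsink</CODE>, and they only supply a 
      <CODE>char_type</CODE>. The base you choose decides which 
      <CODE>write()</CODE> signature the sink offers. (<CODE>audio_sink</CODE> 
      itself simply defines both overloads.)</P><PRE lang=c++>#include &lt;cstddef&gt;
#include &lt;ios&gt;
#include &lt;string&gt;

// Hypothetical tag bases standing in for boost::iostreams::sink and
// boost::iostreams::wsink; here they only supply char_type.
struct narrow_tag_base { typedef char char_type; };
struct wide_tag_base { typedef wchar_t char_type; };

// Like audio_sink, the class derives from its template parameter, so the
// base chosen by the user decides which character type it accepts.
template &lt;class Base&gt;
class text_sink : public Base {
public:
    typedef typename Base::char_type char_type;

    std::streamsize write(const char_type* s, std::streamsize n) {
        received.append(s, static_cast&lt;std::size_t&gt;(n));
        return n;  // report everything as consumed
    }

    std::basic_string&lt;char_type&gt; received;
};

typedef text_sink&lt;narrow_tag_base&gt; narrow_text_sink;
typedef text_sink&lt;wide_tag_base&gt; wide_text_sink;
</PRE>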
      <P>To use our sink and create a concrete <CODE>ostream</CODE>, we need to 
      use the <CODE>boost::iostreams::stream</CODE> class.</P>
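      <P>Roughly speaking, <CODE>boost::iostreams::stream</CODE> saves us from 
      writing a stream buffer that forwards to the sink. The adapter below 
      shows the kind of boilerplate involved. It is hypothetical and much 
      simplified, not Boost's implementation, and unlike Boost it does no 
      buffering. <CODE>string_sink</CODE> is a hypothetical Sink used only to 
      try it out.</P><PRE lang=c++>#include &lt;cstddef&gt;
#include &lt;ios&gt;
#include &lt;ostream&gt;
#include &lt;streambuf&gt;
#include &lt;string&gt;

// Simplified stream buffer that forwards output to Sink::write().
template &lt;class Sink&gt;
class sink_streambuf : public std::streambuf {
public:
    explicit sink_streambuf(Sink&amp; sink) : sink_(sink) {}

protected:
    // Single characters (no put area is set, so all of them come here).
    int_type overflow(int_type ch) override {
        if (traits_type::eq_int_type(ch, traits_type::eof()))
            return traits_type::not_eof(ch);
        char c = traits_type::to_char_type(ch);
        return sink_.write(&amp;c, 1) == 1 ? ch : traits_type::eof();
    }

    // Whole runs of characters are forwarded in one call.
    std::streamsize xsputn(const char* s, std::streamsize n) override {
        return sink_.write(s, n);
    }

private:
    Sink&amp; sink_;
};

// A std::ostream connected to a Sink through sink_streambuf.
template &lt;class Sink&gt;
class sink_ostream : public std::ostream {
public:
    // The base is built before buf_, so the buffer is attached afterwards.
    explicit sink_ostream(Sink&amp; sink) : std::ostream(nullptr), buf_(sink) {
        rdbuf(&amp;buf_);
    }

private:
    sink_streambuf&lt;Sink&gt; buf_;
};

// Hypothetical Sink for trying the adapter: it just remembers the text.
struct string_sink {
    std::string text;
    std::streamsize write(const char* s, std::streamsize n) {
        text.append(s, static_cast&lt;std::size_t&gt;(n));
        return n;
    }
};
</PRE>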
      <P>The supporting class is <CODE>audio_ostream_t</CODE>: </P><PRE lang=c++><SPAN class=code-keyword>template</SPAN> <SPAN class=code-keyword>&lt;</SPAN> <SPAN class=code-keyword>class</SPAN> SinkType <SPAN class=code-keyword>&gt;</SPAN>
<SPAN class=code-keyword>class</SPAN> audio_ostream_t: <SPAN class=code-keyword>public</SPAN> boost::iostreams::stream<SPAN class=code-keyword>&lt;</SPAN> SinkType <SPAN class=code-keyword>&gt;</SPAN>, 
<SPAN class=code-keyword>public</SPAN> SinkType
{
<SPAN class=code-keyword>public</SPAN>:
   audio_ostream_t()
   {
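      <SPAN class=code-comment>//</SPAN><SPAN class=code-comment> Connect to Sink (this-&gt; is needed on conforming compilers
</SPAN>      <SPAN class=code-comment>//</SPAN><SPAN class=code-comment> because open() is inherited from a dependent base class)
</SPAN>      <SPAN class=code-keyword>this</SPAN>-<SPAN class=code-keyword>&gt;</SPAN>open(*<SPAN class=code-keyword>this</SPAN>);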
   }
};
<SPAN class=code-keyword>typedef</SPAN> audio_ostream_t<SPAN class=code-keyword>&lt;</SPAN> audio_sink<SPAN class=code-keyword>&lt;</SPAN> boost::iostreams::sink  <SPAN class=code-keyword>&gt;</SPAN> <SPAN class=cod