📄 codeproject audio_ostream - a text-to-speech ostream_ free source code and programming help.htm
        <LI>A simple example of how to use <A 
        href="http://www.boost.org/libs/iostreams/doc/index.html">boost::iostreams</A> 
        </LI></UL>
      <H2>Background</H2>
      <P>I recently had to add audio output to a program (running on 
      Windows).</P>
      <P><A 
      href="http://www.microsoft.com/downloads/details.aspx?FamilyID=5e86ec97-40a7-453f-b0ee-6583171b4530&amp;DisplayLang=en">Microsoft's 
      SAPI SDK</A> provides a COM interface through which wide character strings 
      can be spoken via SAPI's TTS engine. The Code Project has many articles 
      explaining how to use SAPI to varying degrees of complexity. So why 
      another?</P>
      <P>Well, there were some additional features that I wanted that did not 
      exist in those articles.</P>
      <OL>
        <LI>Little or no COM hassle. Ideally, it should work within the 
        simplest console application. 
        <LI>Full (transparent) support for types other than wide-char, e.g. 
        <CODE><SPAN class=code-keyword>char</SPAN>*</CODE>, <CODE>std::<SPAN 
        class=code-SDKkeyword>string</SPAN></CODE>s and even <CODE><SPAN 
        class=code-keyword>int</SPAN></CODE>s, <CODE><SPAN 
        class=code-keyword>float</SPAN></CODE>s, etc. 
        <LI>Intuitive (or at least familiar) syntax </LI></OL>
      <P>To achieve these goals I developed <CODE>audio_ostream</CODE>.</P>
      <P><CODE>audio_ostream</CODE> is a full-fledged <CODE>std::ostream</CODE> 
      which supports any type that has an <CODE><SPAN 
      class=code-keyword>operator</SPAN><SPAN 
      class=code-keyword>&lt;</SPAN><SPAN 
      class=code-keyword>&lt;</SPAN>()</CODE>.</P>
      <P>You can have as many <CODE>audio_ostream</CODE>s as you like all 
      working in parallel.</P>
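      <P>Because <CODE>audio_ostream</CODE> is a <CODE>std::ostream</CODE>, 
      code written against <CODE>std::ostream&amp;</CODE> should accept it 
      unchanged. The sketch below illustrates the idea. The 
      <CODE>Temperature</CODE> type and the helper functions are hypothetical, 
      and a <CODE>std::ostringstream</CODE> stands in for 
      <CODE>audio_ostream</CODE> so that the sketch compiles without SAPI.</P>

```cpp
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical user-defined type, used only for illustration.
struct Temperature {
    int degrees;
};

// Any type with an operator<< can be written to any std::ostream.
std::ostream& operator<<(std::ostream& os, const Temperature& t)
{
    return os << t.degrees << " degrees";
}

// Written against std::ostream&, so it accepts std::cout, a file stream,
// or (by the article's design) an audio_ostream.
void report(std::ostream& out, const Temperature& t)
{
    out << "It is " << t << " outside, humidity " << 0.5 << '.';
}

// Stand-in for audio_ostream: collect the text instead of speaking it.
std::string report_as_text(const Temperature& t)
{
    std::ostringstream out;
    report(out, t);
    return out.str();
}
```

      <P>Passing an <CODE>audio_ostream</CODE> to <CODE>report()</CODE> 
      instead of the string stream would, under the same assumptions, speak 
      the sentence rather than store it.</P>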
      <P>To handle COM issues, I used the wonderful COMSTL library which takes 
      care of all the delicate and brittle COMplications, such as 
      (un-)initialization, resource (de-)allocation, reference counting etc.</P>
      <P><CODE>boost::iostreams</CODE> is used to provide the full 
      <CODE>std::ostream</CODE> support with very little effort writing 
      boilerplate code.</P>
      <P>Since both <CODE>boost::iostreams</CODE> and COMSTL are header-only 
      libraries I decided to make my class header-only too. The minor price of 
      this decision is that the SAPI headers will be included into any file that 
      uses <CODE>audio_ostream</CODE>.</P>
      <H2>Using the code</H2>
      <P>Using the code could not be easier:</P><PRE lang=c++><SPAN class=code-preprocessor>#include</SPAN><SPAN class=code-preprocessor> <SPAN class=code-string>"</SPAN><SPAN class=code-string>audiostream.hpp"</SPAN>
</SPAN><SPAN class=code-keyword>using</SPAN> <SPAN class=code-keyword>namespace</SPAN> std;
<SPAN class=code-keyword>using</SPAN> <SPAN class=code-keyword>namespace</SPAN> audiostream;
<SPAN class=code-keyword>int</SPAN> main()
{
   audio_ostream aout;
   aout <SPAN class=code-keyword>&lt;</SPAN><SPAN class=code-keyword>&lt;</SPAN> <SPAN class=code-string>"</SPAN><SPAN class=code-string>Hello World!"</SPAN>  <SPAN class=code-keyword>&lt;</SPAN><SPAN class=code-keyword>&lt;</SPAN> endl;
   <SPAN class=code-comment>//</SPAN><SPAN class=code-comment> some more code...
</SPAN>   <SPAN class=code-keyword>return</SPAN> <SPAN class=code-digit>0</SPAN>;
}</PRE>
      <P>This little program will, obviously, say "Hello World!".</P>
      <P>The audio stream is asynchronous, so the program continues running 
      while the text is being spoken. That is why the line <CODE><SPAN 
      class=code-comment>//</SPAN><SPAN class=code-comment> some more 
      code...</SPAN></CODE> is there: it gives the speech time to finish 
      before the program exits. This is conceptually similar to how 
      <CODE>std::ostream</CODE>s buffer their output and display the text 
      only once the internal buffer is full.</P>
      <P>To use the class:</P>
      <OL>
        <LI><CODE><SPAN class=code-preprocessor>#include</SPAN><SPAN 
        class=code-preprocessor></SPAN></CODE> the <CODE>audiostream.hpp</CODE> 
        header file. 
        <LI>Create an instance of <CODE>audio_ostream</CODE> (or 
        <CODE>waudio_ostream</CODE>) 
        <LI>Use the stream as you would any <CODE>std::ostream</CODE>. </LI></OL>
      <P>That's really all you need to do to start using the class.</P>
      <H2>Pre-Requisites</H2>
      <P>For the code to compile and run you will need 3 libraries:</P>
      <OL>
        <LI>For the TTS engine, you will need to install the <A 
        href="http://www.microsoft.com/downloads/details.aspx?FamilyID=5e86ec97-40a7-453f-b0ee-6583171b4530&amp;DisplayLang=en">Microsoft 
        Speech SDK</A> (I used ver. 5.1). 
        <LI>For COMSTL you will need the <A 
        href="http://synesis.com.au/software/stlsoft/">STLSoft libraries</A> 
        (you'll need STLSoft version 1.9.1 beta 44, or later). 
        <LI>The <A href="http://boost.org/">Boost</A> Iostreams library. You can 
        download Boost <A 
        href="http://sourceforge.net/project/showfiles.php?group_id=7586">here</A>. 
        </LI></OL>
      <P>Set your compiler and linker paths accordingly (Boost and STLSoft are 
      header only).</P>
      <H2>Advanced Usage</H2>
      <P>It's possible to change the voice gender, speed, language and many more 
      parameters of the voice using the SAPI text-to-speech (TTS) XML tags.</P>
      <P>Just insert the relevant XML tags into the stream to effect the change. The 
      complete list of possible XML tags can be found <A 
      href="http://msdn.microsoft.com/library/default.asp?url=/library/en-us/SAPI51sr/Whitepapers/WP_XML_TTS_Tutorial.asp">here</A>.</P>
      <P>For example:</P><PRE lang=c++>audio_ostream aout;
<SPAN class=code-comment>//</SPAN><SPAN class=code-comment> Select a male voice.
</SPAN>aout <SPAN class=code-keyword>&lt;</SPAN><SPAN class=code-keyword>&lt;</SPAN> <SPAN class=code-string>"</SPAN><SPAN class=code-string>&lt;voice required='Gender=Male'&gt;Hello World!"</SPAN> <SPAN class=code-keyword>&lt;</SPAN><SPAN class=code-keyword>&lt;</SPAN> endl; 
aout <SPAN class=code-keyword>&lt;</SPAN><SPAN class=code-keyword>&lt;</SPAN> <SPAN class=code-string>"</SPAN><SPAN class=code-string>Five hundred milliseconds of silence"</SPAN> <SPAN class=code-keyword>&lt;</SPAN><SPAN class=code-keyword>&lt;</SPAN> flush <SPAN class=code-keyword>&lt;</SPAN><SPAN class=code-keyword>&lt;</SPAN> 
    <SPAN class=code-string>"</SPAN><SPAN class=code-string>&lt;silence msec='500'/&gt; just occurred."</SPAN> <SPAN class=code-keyword>&lt;</SPAN><SPAN class=code-keyword>&lt;</SPAN> endl;
</PRE>
      <P>For some reason, the XML tags must be the first items in the SAPI 
      spoken string, without any preceding text. <CODE>flush</CODE>ing the 
      stream before the tag, as in the example, facilitates this.</P>
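      <P>The effect of <CODE>flush</CODE> can be sketched with the standard 
      library alone. In the sketch below, <CODE>chunk_buf</CODE> and 
      <CODE>spoken_chunks()</CODE> are hypothetical names, and the buffer is 
      only a stand-in for the audio sink: it records each flushed piece of 
      text as a separate chunk, on the assumption that every flush results in 
      one spoken string.</P>

```cpp
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

// Stand-in for the audio sink: each flush hands the buffered text over as
// one chunk, much as each flush is assumed to produce one spoken string.
class chunk_buf : public std::stringbuf {
public:
    const std::vector<std::string>& chunks() const { return chunks_; }

protected:
    // Called when the stream is flushed (std::flush, std::endl).
    int sync() override
    {
        const std::string text = str();
        if (!text.empty()) {
            chunks_.push_back(text);
            str(std::string());  // reset the buffer for the next chunk
        }
        return 0;
    }

private:
    std::vector<std::string> chunks_;
};

// Replays the article's example and returns the chunks that were flushed.
std::vector<std::string> spoken_chunks()
{
    chunk_buf buf;
    std::ostream out(&buf);
    out << "Five hundred milliseconds of silence" << std::flush
        << "<silence msec='500'/> just occurred." << std::endl;
    return buf.chunks();
}
```

      <P>The call yields two chunks, and the second one starts with the 
      <CODE>&lt;silence&gt;</CODE> tag, which is the position the tag needs to 
      be in.</P>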
      <P>You can also call <CODE>setRate()</CODE> with values in the range 
      [-10, 10] to control the speed of the speech.</P>
      <H2>The Magic</H2>
      <H3>The Core Class</H3>
      <P>The heart of the code is the <CODE>audio_sink</CODE> class:</P><PRE lang=c++><SPAN class=code-keyword>template</SPAN> <SPAN class=code-keyword>&lt;</SPAN> <SPAN class=code-keyword>class</SPAN> SinkType <SPAN class=code-keyword>&gt;</SPAN>
<SPAN class=code-keyword>class</SPAN> audio_sink: <SPAN class=code-keyword>public</SPAN> SinkType
{
<SPAN class=code-keyword>public</SPAN>:
   audio_sink()
   {      
      <SPAN class=code-comment>//</SPAN><SPAN class=code-comment> Initialize the COM libraries
</SPAN>      <SPAN class=code-keyword>static</SPAN> comstl::com_initializer coinit;                         
      <SPAN class=code-comment>//</SPAN><SPAN class=code-comment> Get SAPI Speech COM object
</SPAN>      HRESULT hr;
      <SPAN class=code-keyword>if</SPAN>(FAILED(hr = comstl::co_create_instance(CLSID_SpVoice, _pVoice))) 
          <SPAN class=code-keyword>throw</SPAN> comstl::com_exception(
              <SPAN class=code-string>"</SPAN><SPAN class=code-string>Failed to create SpVoice COM instance"</SPAN>,hr); 
   } 
   
   <SPAN class=code-comment>//</SPAN><SPAN class=code-comment> speak a character string
</SPAN>   std::streamsize write(<SPAN class=code-keyword>const</SPAN> <SPAN class=code-keyword>char</SPAN>* s, std::streamsize n)
   {
      <SPAN class=code-comment>//</SPAN><SPAN class=code-comment> make a null terminated string.
</SPAN>      std::<SPAN class=code-SDKkeyword>string</SPAN> str(s,n);                        
      <SPAN class=code-comment>//</SPAN><SPAN class=code-comment> convert to wide character and call the actual speak method.
</SPAN>      <SPAN class=code-keyword>return</SPAN> write(winstl::a2w(str), str.size());  
   }
   
   <SPAN class=code-comment>//</SPAN><SPAN class=code-comment> speak a wide character string
</SPAN>   std::streamsize write(<SPAN class=code-keyword>const</SPAN> <SPAN class=code-keyword>wchar_t</SPAN>* s, std::streamsize n)
   {
      <SPAN class=code-comment>//</SPAN><SPAN class=code-comment> make a null terminated wstring.
</SPAN>      std::wstring str(s,n);                       
      <SPAN class=code-comment>//</SPAN><SPAN class=code-comment> The actual COM call to Speak.
</SPAN>      _pVoice-<SPAN class=code-keyword>&gt;</SPAN>Speak(str.c_str(), SPF_ASYNC, <SPAN class=code-digit>0</SPAN>);   
      <SPAN class=code-keyword>return</SPAN> n;
   }
   
   <SPAN class=code-comment>//</SPAN><SPAN class=code-comment> Set the speech speed.
</SPAN>   <SPAN class=code-keyword>void</SPAN> setRate(<SPAN class=code-keyword>long</SPAN> n) { _pVoice-<SPAN class=code-keyword>&gt;</SPAN>SetRate(n); }   

<SPAN class=code-keyword>private</SPAN>:      
   <SPAN class=code-comment>//</SPAN><SPAN class=code-comment> COM object smart pointer.
</SPAN>   stlsoft::ref_ptr<SPAN class=code-keyword>&lt;</SPAN> ISpVoice <SPAN class=code-keyword>&gt;</SPAN> _pVoice;             
};</PRE>
      <P>There's a lot going on in this little class. Let's tease apart the 
      pieces one-by-one.</P>
      <H3>COMSTL, stlsoft::ref_ptr&lt;&gt; and ISpVoice</H3>
      <P>The only member of the class is <CODE>stlsoft::ref_ptr<SPAN 
      class=code-keyword>&lt;</SPAN> ISpVoice <SPAN 
      class=code-keyword>&gt;</SPAN> _pVoice</CODE>.</P>
      <P>This is the smart pointer that will handle all the COM stuff for us. 
      The STLSoft class <A 
      href="http://www.synesis.com.au/software/stlsoft/doc-1.9/classstlsoft_1_1ref__ptr.html">stlsoft::ref_ptr&lt;&gt;</A> 
      provides RAII-safe handling of reference-counted interfaces (RCIs). 
      Specifically, it is ideal for handling COM objects.</P>
      <P>We are using it with the <CODE>ISpVoice</CODE> interface. From 
      Microsoft's <A 
      href="http://msdn.microsoft.com/library/default.asp?url=/library/en-us/SAPI51sr/html/ISpVoice.asp">site</A>:</P><EM>The 
      <CODE>ISpVoice</CODE> interface enables an application to perform text 
      synthesis operations. Applications can speak text strings and text files, 
      or play audio files through this interface. All of these can be done 
      synchronously or asynchronously.</EM> 
      <P>In the constructor, we first initialize COM usage via the 
      <CODE>comstl::com_initializer</CODE>. This only happens once (since it is 
      a static object), and should not trouble us anymore. To initialize 
      <CODE>_pVoice</CODE> we call <CODE>comstl::co_create_instance()</CODE> 
      with the <CODE>CLSID_SpVoice</CODE> ID. If all goes well, we are now 
      holding an <CODE>ISpVoice</CODE> object handle. All reference counting 
      issues will be handled by <CODE>stlsoft::ref_ptr<SPAN 
      class=code-keyword>&lt;</SPAN><SPAN class=code-keyword>&gt;</SPAN></CODE>. 
      If the call fails, a <CODE>comstl::com_exception</CODE> exception is 
      thrown and the class instance will not be created.</P>
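      <P>To make the reference counting concrete, here is a minimal sketch of 
      the RAII idea. It is not STLSoft's implementation: 
      <CODE>counted_object</CODE> is a hypothetical stand-in for a COM 
      interface, and <CODE>simple_ref_ptr</CODE> is a deliberately reduced 
      holder that adds a reference on copy and releases one on 
      destruction.</P>

```cpp
#include <utility>

// Hypothetical stand-in for a COM interface such as ISpVoice.
struct counted_object {
    int refs = 1;  // the creator holds the first reference
    void AddRef() { ++refs; }
    void Release() { --refs; }  // a real COM object destroys itself at zero
};

// Minimal RAII holder: adopts one existing reference, adds one per copy
// and releases one per destruction.
template <class T>
class simple_ref_ptr {
public:
    explicit simple_ref_ptr(T* p = nullptr) : p_(p) {}  // adopts p's reference
    simple_ref_ptr(const simple_ref_ptr& other) : p_(other.p_)
    {
        if (p_) p_->AddRef();
    }
    simple_ref_ptr& operator=(simple_ref_ptr other)  // copy-and-swap
    {
        std::swap(p_, other.p_);
        return *this;
    }
    ~simple_ref_ptr()
    {
        if (p_) p_->Release();
    }
    T* operator->() const { return p_; }

private:
    T* p_;
};

// Returns the reference count observed while two holders are alive.
int refs_while_shared(counted_object& obj)
{
    simple_ref_ptr<counted_object> first(&obj);
    simple_ref_ptr<counted_object> second(first);
    return obj.refs;
}
```

      <P>No explicit <CODE>Release()</CODE> call appears in 
      <CODE>refs_while_shared()</CODE>. Both references are dropped when the 
      holders go out of scope, including when an exception unwinds the 
      stack.</P>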
      <P>To speak some text we just need to call <CODE>_pVoice-<SPAN 
      class=code-keyword>&gt;</SPAN>Speak()</CODE> with a wide character 
      string.</P>
      <P>However, we would like to support other character types like 
      <CODE><SPAN class=code-keyword>char</SPAN>*</CODE>, <CODE>std::<SPAN 
      class=code-SDKkeyword>string</SPAN></CODE> and more. In fact, we want to 
      support any type that can be converted to a string or wide-string via an 
      <CODE><SPAN class=code-keyword>operator</SPAN><SPAN 
      class=code-keyword>&lt;</SPAN><SPAN 
      class=code-keyword>&lt;</SPAN>()</CODE>.</P>
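      <P>The two conversions involved can be sketched with the standard 
      library. Both helper names below are hypothetical. 
      <CODE>widen_ascii()</CODE> is only a stand-in for 
      <CODE>winstl::a2w</CODE> and is correct for plain ASCII text only, while 
      <CODE>to_wide_text()</CODE> shows how any type with a wide 
      <CODE>operator&lt;&lt;()</CODE> becomes a wide string.</P>

```cpp
#include <sstream>
#include <string>

// Stand-in for winstl::a2w: widens each char individually, which is only
// correct for plain ASCII text.
std::wstring widen_ascii(const std::string& s)
{
    return std::wstring(s.begin(), s.end());
}

// Formats any type that has a wide operator<< into a wide string.
template <class T>
std::wstring to_wide_text(const T& value)
{
    std::wostringstream out;
    out << value;
    return out.str();
}
```

      <P>Either result could then be handed to a function that expects a wide 
      character string, such as <CODE>Speak()</CODE>.</P>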
      <H3>Boost Iostreams </H3>
      <P><A 
      href="http://www.boost.org/libs/iostreams/doc/index.html">boost::iostreams</A> 
      makes it easy to create standard C++ streams and stream buffers for 
      accessing new Sources and Sinks. To rephrase from the <A 
      href="http://www.boost.org/libs/iostreams/doc/index.html">site</A>:</P>
      <P><EM>A Sink provides write-access to a sequence of characters of a given 
      type. A Sink may expose this sequence by defining a member function 
      <CODE>write</CODE>, invoked indirectly by the Iostreams library through 
      the function <CODE>boost::iostreams::write</CODE>.</EM></P>
      <P>There are two pre-defined sinks, <CODE>boost::iostreams::sink</CODE> and 
      <CODE>boost::iostreams::wsink</CODE>, for writing narrow and wide strings 
      respectively.</P>
      <P>To make our class a Sink and get all its functionality, all we have to 
      do is to derive our class from either of these classes (depending on 
      whether we want narrow or wide character output). Thus, 
      <CODE>audio_sink</CODE> is a template class that derives from its 
      template parameter.</P>
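      <P>For comparison, here is roughly the kind of plumbing that 
      <CODE>boost::iostreams</CODE> saves us from writing. This is a sketch 
      using only the standard library, not Boost's implementation. 
      <CODE>recording_sink</CODE> and <CODE>sink_buf</CODE> are hypothetical 
      names; the sink has the same <CODE>write()</CODE> shape as 
      <CODE>audio_sink</CODE> but records the text instead of speaking 
      it.</P>

```cpp
#include <cstddef>
#include <ios>
#include <ostream>
#include <streambuf>
#include <string>

// Hypothetical sink with the same write() shape as audio_sink; it records
// the text instead of speaking it.
struct recording_sink {
    std::string received;
    std::streamsize write(const char* s, std::streamsize n)
    {
        received.append(s, static_cast<std::size_t>(n));
        return n;
    }
};

// Minimal stream buffer that collects characters and passes them to the
// sink's write() when the stream is flushed.
template <class Sink>
class sink_buf : public std::streambuf {
public:
    explicit sink_buf(Sink& sink) : sink_(sink) {}

protected:
    // No put area is set, so every character arrives here.
    int_type overflow(int_type ch) override
    {
        if (!traits_type::eq_int_type(ch, traits_type::eof()))
            pending_.push_back(traits_type::to_char_type(ch));
        return traits_type::not_eof(ch);
    }
    int sync() override
    {
        if (!pending_.empty()) {
            sink_.write(pending_.data(),
                        static_cast<std::streamsize>(pending_.size()));
            pending_.clear();
        }
        return 0;
    }

private:
    Sink& sink_;
    std::string pending_;
};

// Formats mixed types through an ordinary std::ostream into the sink.
std::string text_received_by_sink()
{
    recording_sink sink;
    sink_buf<recording_sink> buf(sink);
    std::ostream out(&buf);
    out << "Answer: " << 42 << std::flush;
    return sink.received;
}
```

      <P>The sink only ever sees characters. All formatting of 
      <CODE>int</CODE>s, <CODE>float</CODE>s and other types is done by the 
      stream before <CODE>write()</CODE> is called.</P>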
      <P>To use our sink and create a concrete <CODE>ostream</CODE>, we need to 
      use the <CODE>boost::iostreams::stream</CODE> class.</P>
      <P>The supporting class is <CODE>audio_ostream_t</CODE>: </P><PRE lang=c++><SPAN class=code-keyword>template</SPAN> <SPAN class=code-keyword>&lt;</SPAN> <SPAN class=code-keyword>class</SPAN> SinkType <SPAN class=code-keyword>&gt;</SPAN>
<SPAN class=code-keyword>class</SPAN> audio_ostream_t: <SPAN class=code-keyword>public</SPAN> boost::iostreams::stream<SPAN class=code-keyword>&lt;</SPAN> SinkType <SPAN class=code-keyword>&gt;</SPAN>, 
<SPAN class=code-keyword>public</SPAN> SinkType
{
<SPAN class=code-keyword>public</SPAN>:
   audio_ostream_t()
   {
      <SPAN class=code-comment>//</SPAN><SPAN class=code-comment> Connect to Sink
</SPAN>      open(*<SPAN class=code-keyword>this</SPAN>);
   }
};
<SPAN class=code-keyword>typedef</SPAN> audio_ostream_t<SPAN class=code-keyword>&lt;</SPAN> audio_sink<SPAN class=code-keyword>&lt;</SPAN> boost::iostreams::sink  <SPAN class=code-keyword>&gt;</SPAN> <SPAN class=cod
