📄 speech synthesis & speech recognition using sapi 5_1.htm

  I: Integer;
  SOToken: ISpeechObjectToken;
  SOTokens: ISpeechObjectTokens;
<B>begin</B>
  SendMessage(lstProgress.Handle, LB_SETHORIZONTALEXTENT, Width, 0);
  <FONT color=#003399><I>//Ensure all events fire</I></FONT>
  SpVoice.EventInterests := SVEAllEvents;
  Log(<I>'About to enumerate voices'</I>);
  SOTokens := SpVoice.GetVoices(<I>''</I>, <I>''</I>);
  <B>for</B> I := 0 <B>to</B> SOTokens.Count - 1 <B>do</B>
  <B>begin</B>
    <FONT color=#003399><I>//For each voice, store the descriptor in the TStrings list</I></FONT>
    SOToken := SOTokens.Item(I);
    cbVoices.Items.AddObject(SOToken.GetDescription(0), TObject(Pointer(SOToken)));
    <FONT color=#003399><I>//Increment descriptor reference count to ensure it's not destroyed</I></FONT>
    SOToken._AddRef;
  <B>end</B>;
  <B>if</B> cbVoices.Items.Count &gt; 0 <B>then</B>
  <B>begin</B>
    cbVoices.ItemIndex := 0; <FONT color=#003399><I>//Select 1st voice</I></FONT>
    cbVoices.OnChange(cbVoices); <FONT color=#003399><I>//&amp; ensure OnChange triggers</I></FONT>
  <B>end</B>;
  Log(<I>'Enumerated voices'</I>);
  Log(<I>'About to check attributes'</I>);
  tbRate.Position := SpVoice.Rate;
  lblRate.Caption := IntToStr(tbRate.Position);
  tbVolume.Position := SpVoice.Volume;
  lblVolume.Caption := IntToStr(tbVolume.Position);
  Log(<I>'Checked attributes'</I>);
<B>end</B>;
</FONT></CODE></PRE></TD></TR></TBODY></TABLE>
<P>The SpVoice object's <FONT face="Courier New, Courier, mono">GetVoices</FONT> 
method returns a collection object that gives access to each voice as an <FONT 
face="Courier New, Courier, mono">ISpeechObjectToken</FONT>. In this code, both 
parameters are passed as empty strings, but the first can specify attributes 
that every returned voice must have, and the second can specify optional 
attributes (voices that match them are placed at the front of the collection). 
So a call to <FONT 
face="Courier New, Courier, mono">GetVoices('Gender = male', '')</FONT> would 
return only male voices.</P>
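<P>The attribute filtering described above can be tried outside the demo form. 
The following is a minimal console sketch, not code from the demo project: it 
assumes SAPI 5.1 is installed, uses late binding through <FONT 
face="Courier New, Courier, mono">CreateOleObject</FONT> rather than the 
imported type library, and the attribute values chosen are only illustrative.</P>
<TABLE bgColor=white border=1>
  <TBODY>
  <TR>
    <TD><PRE><CODE><FONT color=black size=2>
program ListVoices;
{$APPTYPE CONSOLE}
uses
  ActiveX, ComObj, Variants;
var
  Voice, Tokens: OleVariant;
  Desc: string;
  I: Integer;
begin
  //A console program has to initialise COM itself
  CoInitialize(nil);
  try
    Voice := CreateOleObject('SAPI.SpVoice');
    //Required: only female voices. Optional: adult voices are listed first
    Tokens := Voice.GetVoices('Gender=Female', 'Age=Adult');
    for I := 0 to Tokens.Count - 1 do
    begin
      Desc := Tokens.Item(I).GetDescription(0);
      Writeln(Desc);
    end;
    //Release the COM objects before COM is shut down
    Tokens := Unassigned;
    Voice := Unassigned;
  finally
    CoUninitialize;
  end;
end.
</FONT></CODE></PRE></TD></TR></TBODY></TABLE>
<P>If no installed voice has the required attributes the collection is simply 
empty, so the loop body never runs.</P>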
<P>In order to keep track of the voices, these <FONT 
face="Courier New, Courier, mono">ISpeechObjectToken</FONT> interfaces are 
added, along with a description, to the combobox's <FONT 
face="Courier New, Courier, mono">Items</FONT> (the description in the <FONT 
face="Courier New, Courier, mono">Strings</FONT> array and the interfaces in the 
<FONT face="Courier New, Courier, mono">Objects</FONT> array).</P>
<P>Storing an interface reference in an object reference is possible as long as 
we remember exactly what we stored, and we don't make the mistake of accessing 
it as an object reference. Also, because the interface reference is stored as a 
plain object reference, the compiler does not count that copy, so it is 
important to increment the reference count manually. Otherwise the token would 
be destroyed when the RTL code decrements the reference count of the local 
interface variable at the end of the method.</P>
<P>The <FONT face="Courier New, Courier, mono">OnDestroy</FONT> event handler 
tidies up these descriptor objects by decrementing their reference counts, 
thereby allowing them to be destroyed.</P>
<TABLE bgColor=white border=1>
  <TBODY>
  <TR>
    <TD><PRE><CODE><FONT color=black size=2>
<B>procedure</B> TfrmTextToSpeech.FormDestroy(Sender: TObject);
<B>var</B>
  I: Integer;
<B>begin</B>
  <FONT color=#003399><I>//Release all the voice descriptors</I></FONT>
  <B>for</B> I := 0 <B>to</B> cbVoices.Items.Count - 1 <B>do</B>
    ISpeechObjectToken(Pointer(cbVoices.Items.Objects[I]))._Release;
<B>end</B>;
</FONT></CODE></PRE></TD></TR></TBODY></TABLE>
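<P>The manual reference counting used by these two handlers can be tried without 
SAPI. The following is a minimal console sketch, not code from the demo project: 
<FONT face="Courier New, Courier, mono">IToken</FONT>, <FONT 
face="Courier New, Courier, mono">TToken</FONT> and <FONT 
face="Courier New, Courier, mono">Fill</FONT> are hypothetical stand-ins for the 
voice token interface and the form's creation code.</P>
<TABLE bgColor=white border=1>
  <TBODY>
  <TR>
    <TD><PRE><CODE><FONT color=black size=2>
program RefCountDemo;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}
uses
  Classes;

type
  //Hypothetical stand-in for ISpeechObjectToken
  IToken = interface
    function Describe: string;
  end;

  TToken = class(TInterfacedObject, IToken)
  private
    FName: string;
  public
    constructor Create(const AName: string);
    destructor Destroy; override;
    function Describe: string;
  end;

constructor TToken.Create(const AName: string);
begin
  inherited Create;
  FName := AName;
end;

destructor TToken.Destroy;
begin
  Writeln('Destroying ', FName);
  inherited;
end;

function TToken.Describe: string;
begin
  Result := FName;
end;

procedure Fill(List: TStrings);
var
  Token: IToken;
begin
  Token := TToken.Create('Voice A');
  //The list only holds a raw pointer, which the compiler does not count
  List.AddObject(Token.Describe, TObject(Pointer(Token)));
  //So add a reference on the list's behalf
  Token._AddRef;
end; //Token goes out of scope here: the count drops from 2 to 1

var
  List: TStringList;
  I: Integer;
begin
  List := TStringList.Create;
  try
    Fill(List);
    Writeln('Still alive: ', IToken(Pointer(List.Objects[0])).Describe);
    //Give back the references taken in Fill, allowing destruction
    for I := 0 to List.Count - 1 do
      IToken(Pointer(List.Objects[I]))._Release;
  finally
    List.Free;
  end;
end.
</FONT></CODE></PRE></TD></TR></TBODY></TABLE>
<P>Removing the <FONT face="Courier New, Courier, mono">_AddRef</FONT> call 
would leave the list holding a dangling pointer once <FONT 
face="Courier New, Courier, mono">Fill</FONT> returns, which is the failure the 
form code guards against.</P>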
<P>When the user selects a different voice from the combobox, the <FONT 
face="Courier New, Courier, mono">OnChange</FONT> event handler selects the new 
voice and displays the voice attributes (including the path in the Windows 
registry where the voice attributes are stored).</P>
<TABLE bgColor=white border=1>
  <TBODY>
  <TR>
    <TD><PRE><CODE><FONT color=black size=2>
<B>procedure</B> TfrmTextToSpeech.cbVoicesChange(Sender: TObject);
<B>var</B>
  SOToken: ISpeechObjectToken;
<B>begin</B>
  <B>with</B> lstEngineInfo.Items <B>do</B>
  <B>begin</B>
    Clear;
    SOToken := ISpeechObjectToken(Pointer(
      cbVoices.Items.Objects[cbVoices.ItemIndex]));
    SpVoice.Voice := SOToken;
    Add(Format(<I>'Name: %s'</I>, [SOToken.GetAttribute(<I>'Name'</I>)]));
    Add(Format(<I>'Vendor: %s'</I>, [SOToken.GetAttribute(<I>'Vendor'</I>)]));
    Add(Format(<I>'Age: %s'</I>, [SOToken.GetAttribute(<I>'Age'</I>)]));
    Add(Format(<I>'Gender: %s'</I>, [SOToken.GetAttribute(<I>'Gender'</I>)]));
    Add(Format(<I>'Language: %s'</I>, [SOToken.GetAttribute(<I>'Language'</I>)]));
    Add(Format(<I>'Reg key: %s'</I>, [SOToken.Id]));
  <B>end</B>
<B>end</B>;
</FONT></CODE></PRE></TD></TR></TBODY></TABLE>
<H3><A name=Speech>Making Your Computer Talk</A></H3>
<P>There are different calls to start speech and to continue paused speech, so a 
helper flag is employed to record whether pause has been pressed. This allows 
the play button to start a fresh speech stream as well as continue a paused 
speech stream. The text to speak is taken from a richedit control and is spoken 
asynchronously thanks to the <FONT 
face="Courier New, Courier, mono">SVSFlagsAsync</FONT> flag being used.</P>
<TABLE bgColor=white border=1>
  <TBODY>
  <TR>
    <TD><PRE><CODE><FONT color=black size=2>
<B>procedure</B> TfrmTextToSpeech.btnPlayClick(Sender: TObject);
<B>begin</B>
  <B>if</B> <B>not</B> BeenPaused <B>then</B>
    SpVoice.Speak(reText.Text, SVSFlagsAsync)
  <B>else</B>
  <B>begin</B>
    SpVoice.Resume;
    BeenPaused := False
  <B>end</B>
<B>end</B>;

<B>procedure</B> TfrmTextToSpeech.btnPauseClick(Sender: TObject);
<B>begin</B>
  SpVoice.Pause;
  BeenPaused := True
<B>end</B>;

<B>procedure</B> TfrmTextToSpeech.btnStopClick(Sender: TObject);
<B>begin</B>
  SpVoice.Skip(<I>'Sentence'</I>, MaxInt)
<B>end</B>;
</FONT></CODE></PRE></TD></TR></TBODY></TABLE>
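<P>The difference between starting, pausing and resuming asynchronous speech can 
also be seen in a console program. This is a minimal sketch under assumptions, 
not code from the demo project: it uses late binding, declares the value of 
<FONT face="Courier New, Courier, mono">SVSFlagsAsync</FONT> locally instead of 
importing the type library, and uses the voice object's <FONT 
face="Courier New, Courier, mono">WaitUntilDone</FONT> method to keep the 
program alive until the queued text has been spoken.</P>
<TABLE bgColor=white border=1>
  <TBODY>
  <TR>
    <TD><PRE><CODE><FONT color=black size=2>
program SpeakAsyncDemo;
{$APPTYPE CONSOLE}
uses
  ActiveX, ComObj, Variants;
const
  //Value of SVSFlagsAsync in the SAPI type library
  SVSFlagsAsync = 1;
var
  Voice: OleVariant;
begin
  //A console program has to initialise COM itself
  CoInitialize(nil);
  try
    Voice := CreateOleObject('SAPI.SpVoice');
    //Speak returns immediately; the text is queued and spoken in the background
    Voice.Speak('This sentence is spoken asynchronously.', SVSFlagsAsync);
    Voice.Pause;
    Writeln('Paused. Press Enter to resume.');
    Readln;
    //Resume continues the paused stream; calling Speak again would queue more text
    Voice.Resume;
    //Block until everything queued has been spoken (-1 means no timeout)
    Voice.WaitUntilDone(-1);
    Voice := Unassigned;
  finally
    CoUninitialize;
  end;
end.
</FONT></CODE></PRE></TD></TR></TBODY></TABLE>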
<P>There is another <A 
href="http://www.blong.com/Conferences/DCon2002/Speech/SAPI51/SAPI51.zip">speech 
demo</A> in the same directory in the project TextToSpeechReadWordDoc.dpr. As 
the name suggests, this sample reads out loud from a Word document. It uses 
Automation to control Microsoft Word (as well as the SAPI voice object).</P>
<TABLE bgColor=white border=1>
  <TBODY>
  <TR>
    <TD><PRE><CODE><FONT color=black size=2>
<B>type</B>
  TfrmTextToSpeechReadWordDoc = <B>class</B>(TForm)
  ...
  <B>private</B>
    MSWord: Variant;
  <B>end</B>;
...
<B>procedure</B> TfrmTextToSpeechReadWordDoc.FormCreate(Sender: TObject);
<B>begin</B>
  MSWord := CreateOleObject(<I>'Word.Application'</I>);
<B>end</B>;

<B>procedure</B> TfrmTextToSpeechReadWordDoc.btnReadDocClick(Sender: TObject);
<B>const</B>
<FONT color=#003399><I>// Constants for enum WdUnits</I></FONT>
  wdCharacter = $00000001;
  wdParagraph = $00000004;
<FONT color=#003399><I>// Constants for enum WdMovementType</I></FONT>
  wdExtend = $00000001;
<B>var</B>
  Moved: Integer;
  Txt: <B>String</B>;
<B>begin</B>
  (Sender <B>as</B> TButton).Enabled := False;
  Stopped := False;
  <B>try</B>
    <B>if</B> dlgOpenDoc.Execute <B>then</B>
    <B>begin</B>
      MSWord.Documents.Open(FileName := dlgOpenDoc.FileName);
      <B>try</B>
        Moved := 2;
        <B>while</B> (Moved &gt; 1) <B>and</B> <B>not</B> Stopped <B>do</B>
        <B>begin</B>
          <FONT color=#003399><I>//Select next paragraph</I></FONT>
          Moved := MSWord.Selection.EndOf(<B>Unit</B>:=wdParagraph, Extend:=wdExtend);
          <B>if</B> Moved &gt; 1 <B>then</B>
          <B>begin</B>
            Txt := Trim(MSWord.Selection.Text);
            <B>if</B> Length(Txt) &gt; 0 <B>then</B>
              SpVoice.Speak(Txt, SVSFlagsAsync);
            Application.ProcessMessages;
            <FONT color=#003399><I>//Move to start of next paragraph</I></FONT>
            MSWord.Selection.MoveRight(<B>Unit</B> := wdCharacter);
          <B>end</B>
        <B>end</B>;
      <B>finally</B>
        <FONT color=#003399><I>//Only close a document that was actually opened</I></FONT>
        MSWord.ActiveDocument.Close;
      <B>end</B>;
    <B>end</B>;
  <B>finally</B>
    <FONT color=#003399><I>//Re-enable the button even if Word raised an error</I></FONT>
    (Sender <B>as</B> TButton).Enabled := True;
  <B>end</B>;
<B>end</B>;

<B>procedure</B> TfrmTextToSpeechReadWordDoc.btnStopClick(Sender: TObject);
<B>begin</B>
  SpVoice.Skip(<I>'Sentence'</I>, Maxint);
  Stopped := True;
<B>end</B>;

<B>procedure</B> TfrmTextToSpeechReadWordDoc.FormDestroy(Sender: TObject);
<B>begin</B>
  btnStop.Click;
  MSWord.Quit;
  MSWord := Unassigned;
<B>end</B>;
</FONT></CODE></PRE></TD></TR></TBODY></TABLE>
<H3><A name=Events>Voice Events</A></H3>
<P>The <FONT face="Courier New, Courier, mono">SpVoice</FONT> object has a 
variety of events that fire during speech. Each block of speech starts with an 
<FONT face="Courier New, Courier, mono">OnStartStream</FONT> event and ends with 
<FONT face="Courier New, Courier, mono">OnEndStream</FONT>. <FONT 
face="Courier New, Courier, mono">OnStartStream</FONT> identifies the speech 
stream, and all the other events pass the stream number to which they pertain. 
As each sentence is started an <FONT 
face="Courier New, Courier, mono">OnSentence</FONT> event fires and there is 
also an <FONT face="Courier New, Courier, mono">OnWord</FONT> event that 
triggers at the start of each spoken word.</P>
<P>Additionally (among others) an <FONT 
face="Courier New, Courier, mono">OnAudioLevel</FONT> event allows a progress 
bar to be used as a VU meter for the spoken text. However, some events fire 
only if the <FONT 
face="Courier New, Courier, mono">EventInterests</FONT> property asks for them; to 
receive the <FONT face="Courier New, Courier, mono">OnAudioLevel</FONT> event 
you should set <FONT face="Courier New, Courier, mono">EventInterests</FONT> to 
<FONT face="Courier New, Courier, mono">SVEAudioLevel</FONT> or <FONT 
face="Courier New, Courier, mono">SVEAllEvents</FONT>.</P>
<TABLE bgColor=white border=1>
  <TBODY>
  <TR>
    <TD><PRE><CODE><FONT color=black size=2>
<B>const</B>
  Phonemes: <B>array</B>[1..49] <B>of</B> <B>String</B> = (
    <I>'-'</I>, <I>'!'</I>, <I>'&amp;'</I>, <I>','</I>, <I>'.'</I>, <I>'?'</I>, <I>'_'</I>,
    <I>'1'</I>, <I>'2'</I>, <I>'aa'</I>, <I>'ae'</I>, <I>'ah'</I>, <I>'ao'</I>, <I>'aw'</I>,
    <I>'ax'</I>, <I>'ay'</I>, <I>'b'</I>, <I>'ch'</I>, <I>'d'</I>, <I>'dh'</I>, <I>'eh'</I>,
    <I>'er'</I>, <I>'ey'</I>, <I>'f'</I>, <I>'g'</I>, <I>'h'</I>, <I>'ih'</I>, <I>'iy'</I>,
    <I>'jh'</I>, <I>'k'</I>, <I>'l'</I>, <I>'m'</I>, <I>'n'</I>, <I>'ng'</I>, <I>'ow'</I>,
    <I>'oy'</I>, <I>'p'</I>, <I>'r'</I>, <I>'s'</I>, <I>'sh'</I>, <I>'t'</I>, <I>'th'</I>,
    <I>'uh'</I>, <I>'uw'</I>, <I>'v'</I>, <I>'w'</I>, <I>'y'</I>, <I>'z'</I>, <I>'zh'</I>
  );

<B>procedure</B> TfrmTextToSpeech.SpVoicePhoneme(Sender: TObject;
  StreamNumber: Integer; StreamPosition: OleVariant; Duration: Integer;
  NextPhoneId: Smallint; Feature: TOleEnum; CurrentPhoneId: Smallint);
<B>begin</B>
  <B>if</B> CurrentPhoneId &lt;&gt; 7 <B>then</B> <FONT color=#003399><I>//Display phonemes, except silence</I></FONT>
    memEnginePhonemes.Text :=
      memEnginePhonemes.Text + Phonemes[CurrentPhoneId] +<I>'-'</I>
<B>end</B>;
</FONT></CODE></PRE></TD></TR></TBODY></TABLE>
<H3><A name=Animation>Animating Speech</A></H3>
<P>An <FONT face="Courier New, Courier, mono">OnViseme</FONT> event is triggered 
for each recognised viseme (a portion of speech requiring the mouth to move into 
a visibly different position). There are 22 different visemes generated by 
English speech, and these are based on the 13 Disney visemes (cartoons need less 
granularity, and Disney animators discovered many years ago that only 13 cartoon 
mouth shapes are required to represent all English phonemes).</P>
<P>If you have some artistic flair and can draw a mouth in each position 
represented by the visemes you could use this event to provide a simple animated 
representation of speech.</P>
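<P>One simple way to organise such drawings is one image per viseme, chosen by 
the viseme ID that the event reports. The following is a minimal sketch of that 
lookup only, not code from the demo project: the function <FONT 
face="Courier New, Courier, mono">VisemeImageName</FONT> and the file naming 
scheme are hypothetical, and the ID is assumed to be an integer in the range 0 
to 21, matching SAPI's 22 visemes.</P>
<TABLE bgColor=white border=1>
  <TBODY>
  <TR>
    <TD><PRE><CODE><FONT color=black size=2>
program VisemeFrames;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}
uses
  SysUtils;

const
  FirstViseme = 0;  //SAPI viseme 0 is silence
  LastViseme = 21;  //22 visemes in total

//Hypothetical helper: maps a viseme ID to the name of a mouth drawing
function VisemeImageName(VisemeId: Integer): string;
begin
  //Fall back to the silence (closed mouth) image for unexpected IDs
  if (VisemeId &lt; FirstViseme) or (VisemeId &gt; LastViseme) then
    VisemeId := FirstViseme;
  //%.2d pads the number to two digits, e.g. viseme03.bmp
  Result := Format('viseme%.2d.bmp', [VisemeId]);
end;

var
  I: Integer;
begin
  for I := FirstViseme to LastViseme do
    Writeln(I, ': ', VisemeImageName(I));
end.
</FONT></CODE></PRE></TD></TR></TBODY></TABLE>
<P>In a form, the viseme event handler would load or select the image named by 
this function each time it fires.</P>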