  I: Integer;
  SOToken: ISpeechObjectToken;
  SOTokens: ISpeechObjectTokens;
<B>begin</B>
  SendMessage(lstProgress.Handle, LB_SETHORIZONTALEXTENT, Width, 0);
  <FONT color=#003399><I>//Ensure all events fire</I></FONT>
  SpVoice.EventInterests := SVEAllEvents;
  Log(<I>'About to enumerate voices'</I>);
  SOTokens := SpVoice.GetVoices(<I>''</I>, <I>''</I>);
  <B>for</B> I := 0 <B>to</B> SOTokens.Count - 1 <B>do</B>
  <B>begin</B>
    <FONT color=#003399><I>//For each voice, store the descriptor in the TStrings list</I></FONT>
    SOToken := SOTokens.Item(I);
    <FONT color=#003399><I>//Cast via Pointer so the raw interface pointer is stored unchanged</I></FONT>
    cbVoices.Items.AddObject(SOToken.GetDescription(0), TObject(Pointer(SOToken)));
    <FONT color=#003399><I>//Increment descriptor reference count to ensure it's not destroyed</I></FONT>
    SOToken._AddRef;
  <B>end</B>;
  <B>if</B> cbVoices.Items.Count > 0 <B>then</B>
  <B>begin</B>
    cbVoices.ItemIndex := 0; <FONT color=#003399><I>//Select 1st voice</I></FONT>
    cbVoices.OnChange(cbVoices); <FONT color=#003399><I>//and ensure OnChange triggers</I></FONT>
  <B>end</B>;
  Log(<I>'Enumerated voices'</I>);
  Log(<I>'About to check attributes'</I>);
  tbRate.Position := SpVoice.Rate;
  lblRate.Caption := IntToStr(tbRate.Position);
  tbVolume.Position := SpVoice.Volume;
  lblVolume.Caption := IntToStr(tbVolume.Position);
  Log(<I>'Checked attributes'</I>);
<B>end</B>;
</FONT></CODE></PRE></TD></TR></TBODY></TABLE>
<P>The SpVoice object's <FONT face="Courier New, Courier, mono">GetVoices</FONT>
method returns a collection object that allows access to each voice as an <FONT
face="Courier New, Courier, mono">ISpeechObjectToken</FONT>. In this code, both
parameters are passed as empty strings, but the first can be used to specify
attributes that the returned voices must have and the second to specify
optional attributes. So a call to <FONT
face="Courier New, Courier, mono">GetVoices('Gender = male', '')</FONT> would
return only male voices.</P>
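<P>As a minimal sketch of attribute filtering, the fragment below lists only
the male voices. The method name <FONT
face="Courier New, Courier, mono">ListMaleVoices</FONT> is hypothetical and not
part of the demo, while <FONT face="Courier New, Courier, mono">SpVoice</FONT>
and <FONT face="Courier New, Courier, mono">Log</FONT> are the component and
helper used in the listing above. It assumes the usual SAPI attribute syntax of
<FONT face="Courier New, Courier, mono">Name=Value</FONT> pairs separated by
semicolons, and that <FONT face="Courier New, Courier, mono">409</FONT> is the
language attribute of US English voices.</P>
<TABLE bgColor=white border=1>
<TBODY>
<TR>
<TD><PRE><CODE><FONT color=black size=2>
<FONT color=#003399><I>//Hypothetical helper, not part of the demo project</I></FONT>
<B>procedure</B> TfrmTextToSpeech.ListMaleVoices;
<B>var</B>
  I: Integer;
  Tokens: ISpeechObjectTokens;
<B>begin</B>
  <FONT color=#003399><I>//Required: male voices only. Optional: a preference for US English</I></FONT>
  Tokens := SpVoice.GetVoices(<I>'Gender=Male'</I>, <I>'Language=409'</I>);
  <B>for</B> I := 0 <B>to</B> Tokens.Count - 1 <B>do</B>
    Log(Tokens.Item(I).GetDescription(0));
<B>end</B>;
</FONT></CODE></PRE></TD></TR></TBODY></TABLE>
<P>Voices that fail a required attribute are left out of the collection
altogether. As far as the SAPI documentation describes it, optional attributes
only influence ordering, placing matching voices at the front of the
collection.</P>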
<P>In order to keep track of the voices, these <FONT
face="Courier New, Courier, mono">ISpeechObjectToken</FONT> interfaces are
added, along with a description, to the combobox's <FONT
face="Courier New, Courier, mono">Items</FONT> (the description in the <FONT
face="Courier New, Courier, mono">Strings</FONT> array and the interfaces in the
<FONT face="Courier New, Courier, mono">Objects</FONT> array).</P>
<P>Storing an interface reference in an object reference is possible as long as
we remember exactly what we stored, and we don't make the mistake of accessing
it as an object reference. Also, since the interface reference is stored using
an inappropriate type, it is important to manually increment its reference count
to stop it being destroyed when the RTL code decrements the reference count at
the end of the method.</P>
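<P>The reference counting issue can be seen in isolation with a small console
program. This is only a sketch: <FONT
face="Courier New, Courier, mono">IGreeter</FONT>, <FONT
face="Courier New, Courier, mono">TGreeter</FONT> and <FONT
face="Courier New, Courier, mono">Stash</FONT> are made-up names standing in
for the voice token and the form's code, and the interface is cast through
<FONT face="Courier New, Courier, mono">Pointer</FONT> in both directions so
that the raw interface pointer is stored and retrieved unchanged.</P>
<TABLE bgColor=white border=1>
<TBODY>
<TR>
<TD><PRE><CODE><FONT color=black size=2>
<B>program</B> StashInterface;

{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

<B>uses</B>
  Classes;

<B>type</B>
  <FONT color=#003399><I>//Hypothetical interface standing in for ISpeechObjectToken</I></FONT>
  IGreeter = <B>interface</B>
    [<I>'{5C0E2B7A-3F41-4D8E-9B1A-7E6D2C4F8A10}'</I>]
    <B>function</B> Greeting: <B>String</B>;
  <B>end</B>;

  TGreeter = <B>class</B>(TInterfacedObject, IGreeter)
  <B>public</B>
    <B>function</B> Greeting: <B>String</B>;
    <B>destructor</B> Destroy; <B>override</B>;
  <B>end</B>;

<B>function</B> TGreeter.Greeting: <B>String</B>;
<B>begin</B>
  Result := <I>'hello'</I>;
<B>end</B>;

<B>destructor</B> TGreeter.Destroy;
<B>begin</B>
  <FONT color=#003399><I>//Report when the object is really freed</I></FONT>
  Writeln(<I>'greeter destroyed'</I>);
  <B>inherited</B>;
<B>end</B>;

<B>procedure</B> Stash(List: TStrings);
<B>var</B>
  G: IGreeter;
<B>begin</B>
  G := TGreeter.Create;
  <FONT color=#003399><I>//Store the raw interface pointer in the Objects slot</I></FONT>
  List.AddObject(<I>'greeter'</I>, TObject(Pointer(G)));
  <FONT color=#003399><I>//Without this the object would be freed when G goes out of scope</I></FONT>
  G._AddRef;
<B>end</B>;

<B>var</B>
  List: TStringList;
<B>begin</B>
  List := TStringList.Create;
  <B>try</B>
    Stash(List);
    <FONT color=#003399><I>//Cast back to the interface type; never use the slot as a TObject</I></FONT>
    Writeln(IGreeter(Pointer(List.Objects[0])).Greeting);
    <FONT color=#003399><I>//Balance the manual _AddRef so the object can be freed</I></FONT>
    IGreeter(Pointer(List.Objects[0]))._Release;
  <B>finally</B>
    List.Free;
  <B>end</B>;
<B>end</B>.
</FONT></CODE></PRE></TD></TR></TBODY></TABLE>
<P>Commenting out the <FONT face="Courier New, Courier, mono">_AddRef</FONT>
call makes the destructor message appear as soon as <FONT
face="Courier New, Courier, mono">Stash</FONT> returns, leaving the list
holding a dangling pointer. That is the situation the manual increment in the
form's code avoids.</P>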
<P>The <FONT face="Courier New, Courier, mono">OnDestroy</FONT> event handler
tidies up these descriptor objects by decrementing their reference counts,
thereby allowing them to be destroyed.</P>
<TABLE bgColor=white border=1>
<TBODY>
<TR>
<TD><PRE><CODE><FONT color=black size=2>
<B>procedure</B> TfrmTextToSpeech.FormDestroy(Sender: TObject);
<B>var</B>
  I: Integer;
<B>begin</B>
  <FONT color=#003399><I>//Release all the voice descriptors</I></FONT>
  <B>for</B> I := 0 <B>to</B> cbVoices.Items.Count - 1 <B>do</B>
    ISpeechObjectToken(Pointer(cbVoices.Items.Objects[I]))._Release;
<B>end</B>;
</FONT></CODE></PRE></TD></TR></TBODY></TABLE>
<P>When the user selects a different voice from the combobox, the <FONT
face="Courier New, Courier, mono">OnChange</FONT> event handler selects the new
voice and displays the voice attributes (including the path in the Windows
registry where the voice attributes are stored).</P>
<TABLE bgColor=white border=1>
<TBODY>
<TR>
<TD><PRE><CODE><FONT color=black size=2>
<B>procedure</B> TfrmTextToSpeech.cbVoicesChange(Sender: TObject);
<B>var</B>
  SOToken: ISpeechObjectToken;
<B>begin</B>
  <B>with</B> lstEngineInfo.Items <B>do</B>
  <B>begin</B>
    Clear;
    SOToken := ISpeechObjectToken(Pointer(
      cbVoices.Items.Objects[cbVoices.ItemIndex]));
    SpVoice.Voice := SOToken;
    Add(Format(<I>'Name: %s'</I>, [SOToken.GetAttribute(<I>'Name'</I>)]));
    Add(Format(<I>'Vendor: %s'</I>, [SOToken.GetAttribute(<I>'Vendor'</I>)]));
    Add(Format(<I>'Age: %s'</I>, [SOToken.GetAttribute(<I>'Age'</I>)]));
    Add(Format(<I>'Gender: %s'</I>, [SOToken.GetAttribute(<I>'Gender'</I>)]));
    Add(Format(<I>'Language: %s'</I>, [SOToken.GetAttribute(<I>'Language'</I>)]));
    Add(Format(<I>'Reg key: %s'</I>, [SOToken.Id]));
  <B>end</B>
<B>end</B>;
</FONT></CODE></PRE></TD></TR></TBODY></TABLE>
<H3><A name=Speech>Making Your Computer Talk</A></H3>
<P>There are different calls to start speech and to continue paused speech, so a
helper flag is employed to record whether pause has been pressed. This allows
the play button to start a fresh speech stream as well as continue a paused
speech stream. The text to speak is taken from a richedit control and is spoken
asynchronously thanks to the <FONT
face="Courier New, Courier, mono">SVSFlagsAsync</FONT> flag being used.</P>
<TABLE bgColor=white border=1>
<TBODY>
<TR>
<TD><PRE><CODE><FONT color=black size=2>
<B>procedure</B> TfrmTextToSpeech.btnPlayClick(Sender: TObject);
<B>begin</B>
  <B>if</B> <B>not</B> BeenPaused <B>then</B>
    SpVoice.Speak(reText.Text, SVSFlagsAsync)
  <B>else</B>
  <B>begin</B>
    SpVoice.Resume;
    BeenPaused := False
  <B>end</B>
<B>end</B>;

<B>procedure</B> TfrmTextToSpeech.btnPauseClick(Sender: TObject);
<B>begin</B>
  SpVoice.Pause;
  BeenPaused := True
<B>end</B>;

<B>procedure</B> TfrmTextToSpeech.btnStopClick(Sender: TObject);
<B>begin</B>
  SpVoice.Skip(<I>'Sentence'</I>, MaxInt)
<B>end</B>;
</FONT></CODE></PRE></TD></TR></TBODY></TABLE>
<P>There is another <A
href="http://www.blong.com/Conferences/DCon2002/Speech/SAPI51/SAPI51.zip">speech
demo</A> in the same directory in the project TextToSpeechReadWordDoc.dpr. As
the name suggests, this sample reads out loud from a Word document. It uses
Automation to control Microsoft Word (as well as the SAPI voice object).</P>
<TABLE bgColor=white border=1>
<TBODY>
<TR>
<TD><PRE><CODE><FONT color=black size=2>
<B>type</B>
  TfrmTextToSpeechReadWordDoc = <B>class</B>(TForm)
    ...
  <B>private</B>
    MSWord: Variant;
    Stopped: Boolean; <FONT color=#003399><I>//Set by btnStopClick to end the reading loop</I></FONT>
  <B>end</B>;
...
<B>procedure</B> TfrmTextToSpeechReadWordDoc.FormCreate(Sender: TObject);
<B>begin</B>
  MSWord := CreateOleObject(<I>'Word.Application'</I>);
<B>end</B>;

<B>procedure</B> TfrmTextToSpeechReadWordDoc.btnReadDocClick(Sender: TObject);
<B>const</B>
  <FONT color=#003399><I>// Constants for enum WdUnits</I></FONT>
  wdCharacter = $00000001;
  wdParagraph = $00000004;
  <FONT color=#003399><I>// Constants for enum WdMovementType</I></FONT>
  wdExtend = $00000001;
<B>var</B>
  Moved: Integer;
  Txt: <B>String</B>;
<B>begin</B>
  (Sender <B>as</B> TButton).Enabled := False;
  Stopped := False;
  <B>if</B> dlgOpenDoc.Execute <B>then</B>
  <B>begin</B>
    MSWord.Documents.Open(FileName := dlgOpenDoc.FileName);
    Moved := 2;
    <B>while</B> (Moved > 1) <B>and</B> <B>not</B> Stopped <B>do</B>
    <B>begin</B>
      <FONT color=#003399><I>//Select next paragraph</I></FONT>
      Moved := MSWord.Selection.EndOf(<B>Unit</B>:=wdParagraph, Extend:=wdExtend);
      <B>if</B> Moved > 1 <B>then</B>
      <B>begin</B>
        Txt := Trim(MSWord.Selection.Text);
        <B>if</B> Length(Txt) > 0 <B>then</B>
          SpVoice.Speak(Txt, SVSFlagsAsync);
        Application.ProcessMessages;
        <FONT color=#003399><I>//Move to start of next paragraph</I></FONT>
        MSWord.Selection.MoveRight(<B>Unit</B> := wdCharacter);
      <B>end</B>
    <B>end</B>;
    <FONT color=#003399><I>//Only close the document if one was actually opened</I></FONT>
    MSWord.ActiveDocument.Close;
  <B>end</B>;
  TButton(Sender).Enabled := True;
<B>end</B>;

<B>procedure</B> TfrmTextToSpeechReadWordDoc.btnStopClick(Sender: TObject);
<B>begin</B>
  SpVoice.Skip(<I>'Sentence'</I>, MaxInt);
  Stopped := True;
<B>end</B>;

<B>procedure</B> TfrmTextToSpeechReadWordDoc.FormDestroy(Sender: TObject);
<B>begin</B>
  btnStop.Click;
  MSWord.Quit;
  MSWord := Unassigned;
<B>end</B>;
</FONT></CODE></PRE></TD></TR></TBODY></TABLE>
<H3><A name=Events>Voice Events</A></H3>
<P>The <FONT face="Courier New, Courier, mono">SpVoice</FONT> object has a
variety of events that fire during speech. Each block of speech starts with an
<FONT face="Courier New, Courier, mono">OnStartStream</FONT> event and ends with
<FONT face="Courier New, Courier, mono">OnEndStream</FONT>. <FONT
face="Courier New, Courier, mono">OnStartStream</FONT> identifies the speech
stream, and all the other events pass the stream number to which they pertain.
As each sentence is started an <FONT
face="Courier New, Courier, mono">OnSentence</FONT> event fires and there is
also an <FONT face="Courier New, Courier, mono">OnWord</FONT> event that
triggers at the start of each spoken word.</P>
<P>Additionally (among others), an <FONT
face="Courier New, Courier, mono">OnAudioLevel</FONT> event allows a progress
bar to be used as a VU meter for the spoken text. However, some events only
fire if the <FONT
face="Courier New, Courier, mono">EventInterests</FONT> property is set accordingly; to
receive the <FONT face="Courier New, Courier, mono">OnAudioLevel</FONT> event
you should set <FONT face="Courier New, Courier, mono">EventInterests</FONT> to
<FONT face="Courier New, Courier, mono">SVEAudioLevel</FONT> or <FONT
face="Courier New, Courier, mono">SVEAllEvents</FONT>.</P>
<TABLE bgColor=white border=1>
<TBODY>
<TR>
<TD><PRE><CODE><FONT color=black size=2>
<B>const</B>
  Phonemes: <B>array</B>[1..49] <B>of</B> <B>String</B> = (
    <I>'-'</I>, <I>'!'</I>, <I>'&'</I>, <I>','</I>, <I>'.'</I>, <I>'?'</I>, <I>'_'</I>,
    <I>'1'</I>, <I>'2'</I>, <I>'aa'</I>, <I>'ae'</I>, <I>'ah'</I>, <I>'ao'</I>, <I>'aw'</I>,
    <I>'ax'</I>, <I>'ay'</I>, <I>'b'</I>, <I>'ch'</I>, <I>'d'</I>, <I>'dh'</I>, <I>'eh'</I>,
    <I>'er'</I>, <I>'ey'</I>, <I>'f'</I>, <I>'g'</I>, <I>'h'</I>, <I>'ih'</I>, <I>'iy'</I>,
    <I>'jh'</I>, <I>'k'</I>, <I>'l'</I>, <I>'m'</I>, <I>'n'</I>, <I>'ng'</I>, <I>'ow'</I>,
    <I>'oy'</I>, <I>'p'</I>, <I>'r'</I>, <I>'s'</I>, <I>'sh'</I>, <I>'t'</I>, <I>'th'</I>,
    <I>'uh'</I>, <I>'uw'</I>, <I>'v'</I>, <I>'w'</I>, <I>'y'</I>, <I>'z'</I>, <I>'zh'</I>
  );

<B>procedure</B> TfrmTextToSpeech.SpVoicePhoneme(Sender: TObject;
  StreamNumber: Integer; StreamPosition: OleVariant; Duration: Integer;
  NextPhoneId: Smallint; Feature: TOleEnum; CurrentPhoneId: Smallint);
<B>begin</B>
  <B>if</B> CurrentPhoneId <> 7 <B>then</B> <FONT color=#003399><I>//Display phonemes, except silence</I></FONT>
    memEnginePhonemes.Text :=
      memEnginePhonemes.Text + Phonemes[CurrentPhoneId] + <I>'-'</I>
<B>end</B>;
</FONT></CODE></PRE></TD></TR></TBODY></TABLE>
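<P>The VU meter idea mentioned earlier might be sketched as follows. The
progress bar name <FONT face="Courier New, Courier, mono">pbAudioLevel</FONT>
is an assumption, as is the handler signature, which is modelled on the phoneme
handler above and on the parameters SAPI documents for its <FONT
face="Courier New, Courier, mono">AudioLevel</FONT> event. SAPI describes the
level as running from 0 to 100, so the progress bar is assumed to use the same
range.</P>
<TABLE bgColor=white border=1>
<TBODY>
<TR>
<TD><PRE><CODE><FONT color=black size=2>
<FONT color=#003399><I>//Sketch only: pbAudioLevel is a hypothetical TProgressBar (Min = 0, Max = 100)</I></FONT>
<B>procedure</B> TfrmTextToSpeech.SpVoiceAudioLevel(Sender: TObject;
  StreamNumber: Integer; StreamPosition: OleVariant; AudioLevel: Integer);
<B>begin</B>
  <FONT color=#003399><I>//Show the current output level of the voice</I></FONT>
  pbAudioLevel.Position := AudioLevel
<B>end</B>;
</FONT></CODE></PRE></TD></TR></TBODY></TABLE>
<P>The handler is never called unless <FONT
face="Courier New, Courier, mono">EventInterests</FONT> includes <FONT
face="Courier New, Courier, mono">SVEAudioLevel</FONT>.</P>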
<H3><A name=Animation>Animating Speech</A></H3>
<P>An <FONT face="Courier New, Courier, mono">OnViseme</FONT> event is triggered
for each recognised viseme (a portion of speech requiring the mouth to move into
a visibly different position). English speech generates 22 different visemes,
which are based on the Disney 13 visemes: cartoons have less granularity, and
Disney animators discovered many years ago that only 13 cartoon mouth shapes are
required to represent all English phonemes.</P>
<P>If you have some artistic flair and can draw a mouth in each position
represented by the visemes you could use this event to provide a simple animated
representation of speech.</P>
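<P>One possible shape for such a handler is sketched below. It assumes a
hypothetical <FONT face="Courier New, Courier, mono">TImageList</FONT> called
<FONT face="Courier New, Courier, mono">ilMouths</FONT> holding 22 mouth
drawings, indexed 0 to 21 in viseme order, and a hypothetical <FONT
face="Courier New, Courier, mono">TImage</FONT> called <FONT
face="Courier New, Courier, mono">imgMouth</FONT> to display them. The handler
signature is also an assumption, modelled on the phoneme handler shown earlier
with the enumerated parameters imported as <FONT
face="Courier New, Courier, mono">TOleEnum</FONT>.</P>
<TABLE bgColor=white border=1>
<TBODY>
<TR>
<TD><PRE><CODE><FONT color=black size=2>
<FONT color=#003399><I>//Sketch only: ilMouths and imgMouth are hypothetical components</I></FONT>
<B>procedure</B> TfrmTextToSpeech.SpVoiceViseme(Sender: TObject;
  StreamNumber: Integer; StreamPosition: OleVariant; Duration: Integer;
  NextVisemeId: TOleEnum; Feature: TOleEnum; CurrentVisemeId: TOleEnum);
<B>begin</B>
  <FONT color=#003399><I>//Start from an empty bitmap so the previous mouth does not show through</I></FONT>
  imgMouth.Picture.Bitmap := <B>nil</B>;
  <FONT color=#003399><I>//GetBitmap returns False if there is no drawing for this viseme id</I></FONT>
  <B>if</B> ilMouths.GetBitmap(Integer(CurrentVisemeId), imgMouth.Picture.Bitmap) <B>then</B>
    imgMouth.Refresh
<B>end</B>;
</FONT></CODE></PRE></TD></TR></TBODY></TABLE>
<P>The <FONT face="Courier New, Courier, mono">Duration</FONT> and <FONT
face="Courier New, Courier, mono">NextVisemeId</FONT> parameters are ignored
here, but a smoother animation could use them to blend towards the next mouth
shape.</P>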