<B>begin</B>
  InvokeUI(SPDUI_AudioVolume, <I>'Audio Volume'</I>)
<B>end</B>;
<B>procedure</B> TfrmContinuousDictation.InvokeUI(<B>const</B> TypeOfUI, Caption: WideString);
<B>var</B>
  U: OleVariant;
<B>begin</B>
  U := Unassigned; <FONT color=#003399><I>//no engine-specific extra data is passed</I></FONT>
  <FONT color=#003399><I>//Only display the dialog if the current SR engine supports this UI type</I></FONT>
  <B>if</B> SpSharedRecoContext.Recognizer.IsUISupported(TypeOfUI, U) <B>then</B>
    SpSharedRecoContext.Recognizer.DisplayUI(Handle, Caption, TypeOfUI, U)
<B>end</B>;
</FONT></CODE></PRE></TD></TR></TBODY></TABLE>
<H3><A name=CnC>Command and Control Recognition</A></H3>
<P>For command and control (C and C) recognition we need a grammar that gives
the SR engine the rules by which to recognise the commands. The grammar below is
used by a sample project, CommandAndControl.dpr, in <A
href="http://www.blong.com/Conferences/DCon2002/Speech/SAPI51/SAPI51.zip">the
files that accompany this article</A>.</P>
<TABLE bgColor=white border=1>
<TBODY>
<TR>
<TD><PRE><CODE><FONT color=black size=2>
<GRAMMAR LANGID="809">
<!-- "Constant" definitions -->
<DEFINE>
<ID NAME="RID_start" VAL="1"/>
<ID NAME="PID_chosencolour" VAL="2"/>
<ID NAME="PID_colourvalue" VAL="3"/>
</DEFINE>
<!-- Rule definitions -->
<RULE NAME="start" ID="RID_start" TOPLEVEL="ACTIVE">
<O>colour</O>
<RULEREF NAME="colour" PROPNAME="chosencolour" PROPID="PID_chosencolour" />
<O>please</O>
</RULE>
<RULE NAME="colour">
<L PROPNAME="colourvalue" PROPID="PID_colourvalue">
<P VAL="1">red</P>
<P VAL="2">blue</P>
<P VAL="3">green</P>
</L>
</RULE>
</GRAMMAR>
</FONT></CODE></PRE></TD></TR></TBODY></TABLE>
<P>After the constant definitions come the rules. The top level rule
(<I>start</I>, which is just an arbitrarily chosen name) is defined as the
optional word <I>colour</I>, a value from another rule (also called
<I>colour</I>) and the optional word <I>please</I>. Because the reference to
the colour rule is given a property name (<I>chosencolour</I>), the value it
produces can be identified programmatically rather than by scanning the
recognised text.</P>
<P>The colour rule lists the three colours that can be spoken, each of which
has a value defined for it by its <I>VAL</I> attribute. Again, this value is
accessible from code because the list is defined as a property
(<I>colourvalue</I>).</P>
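<P>To make the effect of the optional (<I>O</I>) elements concrete, here is a small stand-alone console program that enumerates every word sequence the <I>start</I> rule should accept, along with the <I>colourvalue</I> each one would carry. This is only an illustration. It does not use SAPI at all, and the program and its identifiers are hypothetical.</P>

```pascal
program ColourGrammarPhrases;

{$APPTYPE CONSOLE}

uses
  SysUtils;

const
  // The three P elements of the colour rule, indexed by their VAL attribute
  ColourWords: array[1..3] of string = ('red', 'blue', 'green');

var
  SayColour, SayPlease: Boolean;
  ColourValue, PhraseCount: Integer;
  Phrase: string;
begin
  PhraseCount := 0;
  // Each optional (O) element may be present or absent, so try both states
  for SayColour := False to True do
    for ColourValue := Low(ColourWords) to High(ColourWords) do
      for SayPlease := False to True do
      begin
        Phrase := ColourWords[ColourValue];
        if SayColour then
          Phrase := 'colour ' + Phrase;
        if SayPlease then
          Phrase := Phrase + ' please';
        Inc(PhraseCount);
        WriteLn(Format('%-20s colourvalue = %d', [Phrase, ColourValue]))
      end;
  WriteLn(PhraseCount, ' phrases match the start rule')
end.
```

<P>There are two choices for the leading word, three colours and two choices for the trailing word, so the program lists twelve phrases, from <I>red</I> through to <I>colour green please</I>.</P>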
<P>The grammar is stored in an XML file (<FONT
face="Courier New, Courier, mono">C and C Grammar.xml</FONT>) and loaded in the
form's <FONT face="Courier New, Courier, mono">OnCreate</FONT> event handler.</P>
<TABLE bgColor=white border=1>
<TBODY>
<TR>
<TD><PRE><CODE><FONT color=black size=2>
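<B>procedure</B> TfrmCommandAndControl.FormCreate(Sender: TObject);
<B>begin</B>
  <FONT color=#003399><I>//OnAudioLevel event is not fired by default - this changes that</I></FONT>
  SpSharedRecoContext.EventInterests := SREAllEvents;
  <FONT color=#003399><I>//Create a grammar object and load the command and control rules into it;</I></FONT>
  <FONT color=#003399><I>//the file is looked for next to the executable, not in the current directory</I></FONT>
  SRGrammar := SpSharedRecoContext.CreateGrammar(0);
  SRGrammar.CmdLoadFromFile(
    ExtractFilePath(Application.ExeName) + <I>'C and C Grammar.xml'</I>, SLODynamic);
  <FONT color=#003399><I>//Rule ID 0 activates all top level rules</I></FONT>
  SRGrammar.CmdSetRuleIdState(0, SGDSActive)
<B>end</B>;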
</FONT></CODE></PRE></TD></TR></TBODY></TABLE>
<P>Notice that two different <FONT
face="Courier New, Courier, mono">ISpeechRecoGrammar</FONT> methods are used to
start command and control recognition. <FONT
face="Courier New, Courier, mono">CmdLoadFromFile</FONT> loads a grammar from an
XML file, and <FONT face="Courier New, Courier, mono">CmdSetRuleIdState</FONT>
activates all top level rules when its first parameter is zero (you can activate
individual rules by passing their rule ID instead).</P>
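<P>The following is a minimal sketch of activating an individual rule. It switches only the <I>start</I> rule on or off by its ID. It assumes the imported type library unit is called <FONT face="Courier New, Courier, mono">SpeechLib_TLB</FONT> and that the constant mirrors <I>RID_start</I> from the grammar's DEFINE section. The helper's name is hypothetical.</P>

```pascal
uses
  SpeechLib_TLB; // assumed name of the imported SAPI 5.1 type library unit

const
  // Must match the VAL given to RID_start in the grammar's DEFINE section
  RID_start = 1;

// Hypothetical helper: activates or deactivates only the top level "start"
// rule, leaving any other top level rules in the grammar as they are
procedure SetStartRuleActive(const Grammar: ISpeechRecoGrammar; Active: Boolean);
begin
  if Active then
    Grammar.CmdSetRuleIdState(RID_start, SGDSActive)
  else
    Grammar.CmdSetRuleIdState(RID_start, SGDSInactive)
end;
```

<P>For example, calling <FONT face="Courier New, Courier, mono">SetStartRuleActive(SRGrammar, False)</FONT> would stop the colour commands being recognised without unloading the grammar. <FONT face="Courier New, Courier, mono">ISpeechRecoGrammar</FONT> also has a <FONT face="Courier New, Courier, mono">CmdSetRuleState</FONT> method that identifies the rule by name rather than by ID.</P>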
<P>The <FONT face="Courier New, Courier, mono">OnRecognition</FONT> event
handler locates the <I>chosencolour</I> property and then the
<I>colourvalue</I> property nested within it. That value is used to change the
form's colour at the user's request, in response to phrases such as:</P>
<UL>
<LI>red please
<LI>colour green
<LI>colour blue please
<LI>red </LI></UL>
<TABLE bgColor=white border=1>
<TBODY>
<TR>
<TD><PRE><CODE><FONT color=black size=2>
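<B>procedure</B> TfrmCommandAndControl.SpSharedRecoContextRecognition(
  ASender: TObject; StreamNumber: Integer; StreamPosition: OleVariant;
  RecognitionType: TOleEnum; <B>const</B> Result: ISpeechRecoResult);
<B>begin</B>
  <B>with</B> Result.PhraseInfo <B>do</B>
  <B>begin</B>
    <FONT color=#003399><I>//Log the whole recognised phrase (all elements, with replacements)</I></FONT>
    Log(<I>'OnRecognition: %s'</I>, [GetText(0, -1, True)]);
    <FONT color=#003399><I>//Follow chosencolour, then the nested colourvalue; the Integer cast gives</I></FONT>
    <FONT color=#003399><I>//case an ordinal selector (a missing property yields Unassigned, i.e. 0)</I></FONT>
    <B>case</B> Integer(GetPropValue(Result, [<I>'chosencolour'</I>, <I>'colourvalue'</I>])) <B>of</B>
      1: Color := clRed;
      2: Color := clBlue;
      3: Color := clGreen;
    <B>end</B>
  <B>end</B>
<B>end</B>;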
</FONT></CODE></PRE></TD></TR></TBODY></TABLE>
<P>This code uses a helper routine, <FONT
face="Courier New, Courier, mono">GetPropValue</FONT>, whose task is to locate
the appropriate property in the result object by following the property path
specified in the string array parameter. The code for <FONT
face="Courier New, Courier, mono">GetPropValue</FONT> and its own helper
routine, <FONT face="Courier New, Courier, mono">GetProp</FONT>, looks like
this:</P>
<TABLE bgColor=white border=1>
<TBODY>
<TR>
<TD><PRE><CODE><FONT color=black size=2>
<B>function</B> GetProp(Props: ISpeechPhraseProperties;
  <B>const</B> Name: <B>String</B>): ISpeechPhraseProperty; overload;
<B>var</B>
  I: Integer;
  Prop: ISpeechPhraseProperty;
<B>begin</B>
  Result := <B>nil</B>;
  <FONT color=#003399><I>//A phrase or property with no (child) properties may give a nil collection</I></FONT>
  <B>if</B> <B>not</B> Assigned(Props) <B>then</B>
    Exit;
  <B>for</B> I := 0 <B>to</B> Props.Count - 1 <B>do</B>
  <B>begin</B>
    Prop := Props.Item(I);
    <FONT color=#003399><I>//Property names are matched case-insensitively</I></FONT>
    <B>if</B> CompareText(Prop.Name, Name) = 0 <B>then</B>
    <B>begin</B>
      Result := Prop;
      Break
    <B>end</B>
  <B>end</B>
<B>end</B>;

<B>function</B> GetPropValue(SRResult: ISpeechRecoResult;
  <B>const</B> Path: <B>array</B> <B>of</B> <B>String</B>): OleVariant;
<B>var</B>
  Prop: ISpeechPhraseProperty;
  PathLoop: Integer;
<B>begin</B>
  Result := Unassigned;
  Prop := <B>nil</B>;
  <B>for</B> PathLoop := Low(Path) <B>to</B> High(Path) <B>do</B>
  <B>begin</B>
    <B>if</B> PathLoop = Low(Path) <B>then</B> <FONT color=#003399><I>//top level property</I></FONT>
      Prop := GetProp(SRResult.PhraseInfo.Properties, Path[PathLoop])
    <B>else</B> <FONT color=#003399><I>//nested property</I></FONT>
      Prop := GetProp(Prop.Children, Path[PathLoop]);
    <B>if</B> <B>not</B> Assigned(Prop) <B>then</B>
      Exit <FONT color=#003399><I>//path not found: the result stays Unassigned</I></FONT>
  <B>end</B>;
  <FONT color=#003399><I>//An empty path leaves Prop nil, so guard before reading the value</I></FONT>
  <B>if</B> Assigned(Prop) <B>then</B>
    Result := Prop.Value
<B>end</B>;
</FONT></CODE></PRE></TD></TR></TBODY></TABLE>
<P>This is what the application looks like when running.</P>
<P align=center><IMG
src="Speech Synthesis & Speech Recognition Using SAPI 5_1.files/CommandAndControl.png"></P>
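<P><FONT face="Courier New, Courier, mono">GetPropValue</FONT> returns <FONT face="Courier New, Courier, mono">Unassigned</FONT> when any step of the property path is missing, so callers can tell a missing property apart from a real value. The hypothetical helper below wraps the colour lookup so that a recognition handler changes colour only for a valid result. It is a sketch that assumes the helper lives in the same unit as <FONT face="Courier New, Courier, mono">GetPropValue</FONT> and that the type library unit is <FONT face="Courier New, Courier, mono">SpeechLib_TLB</FONT>.</P>

```pascal
uses
  Graphics, Variants, SpeechLib_TLB; // SpeechLib_TLB: assumed type library unit name

// Hypothetical helper; assumes GetPropValue (listed above) is declared earlier
// in the same unit. Returns False if the property path is missing or the
// value is not one of the three VALs defined in the grammar.
function TryGetChosenColour(const SRResult: ISpeechRecoResult;
  out AColour: TColor): Boolean;
var
  V: OleVariant;
begin
  Result := False;
  AColour := clNone;
  V := GetPropValue(SRResult, ['chosencolour', 'colourvalue']);
  // GetPropValue returns Unassigned when any step of the path is not found
  if VarIsEmpty(V) then
    Exit;
  case Integer(V) of
    1: AColour := clRed;
    2: AColour := clBlue;
    3: AColour := clGreen;
  else
    Exit // unexpected value: leave Result as False
  end;
  Result := True
end;
```

<P>In the <FONT face="Courier New, Courier, mono">OnRecognition</FONT> handler this could be used as <FONT face="Courier New, Courier, mono">if TryGetChosenColour(Result, NewColour) then Color := NewColour;</FONT>, where <FONT face="Courier New, Courier, mono">NewColour</FONT> is a local <FONT face="Courier New, Courier, mono">TColor</FONT> variable.</P>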
<H3><A name=Troubleshooting>Speech Recognition Troubleshooting</A></H3>
<P>If SR stops (or fails to start) unexpectedly, or you see other odd SR
behaviour, check that your recording settings have the microphone enabled.</P>
<UL>
<LI>Double-click the Volume icon in your Task Bar's System Tray. If no Volume
icon is present, choose <FONT face="Courier New, Courier, mono">Start |
Programs | Accessories | Entertainment | Volume Control</FONT>.
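<LI>If you see a <FONT face="Courier New, Courier, mono">Microphone</FONT>
column, ensure its <FONT face="Courier New, Courier, mono">Mute</FONT>
checkbox is checked (this only stops the microphone being played back through
your speakers; it does not affect recording).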
<LI>Choose <FONT face="Courier New, Courier, mono">Options |
Properties</FONT>, click <FONT
face="Courier New, Courier, mono">Recording</FONT>, ensure the <FONT
face="Courier New, Courier, mono">Microphone</FONT> option is checked and
press OK.
<LI>Now ensure the <FONT face="Courier New, Courier, mono">Microphone</FONT>
column has its <FONT face="Courier New, Courier, mono">Select</FONT> checkbox
checked if it has one, or its <FONT
face="Courier New, Courier, mono">Mute</FONT> checkbox unchecked if it has one
of those instead. </LI></UL>
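<P>If the mixer settings look correct but recognition is still unreliable, the SR engine may offer its own microphone wizard through the same <FONT face="Courier New, Courier, mono">IsUISupported</FONT>/<FONT face="Courier New, Courier, mono">DisplayUI</FONT> mechanism used by <FONT face="Courier New, Courier, mono">InvokeUI</FONT> earlier. The sketch below assumes the type library unit is <FONT face="Courier New, Courier, mono">SpeechLib_TLB</FONT>. It declares its own hypothetical <FONT face="Courier New, Courier, mono">SPDUI_MicTraining</FONT> constant, using the string value that the SAPI header sapi.h gives it. Whether the wizard exists depends on the installed engine, which is why the support check comes first.</P>

```pascal
uses
  Windows, Variants, SpeechLib_TLB; // SpeechLib_TLB: assumed type library unit name

const
  // Hypothetical local constant; 'MicTraining' is the SPDUI_MicTraining string in sapi.h
  SPDUI_MicTraining = 'MicTraining';

// Hypothetical helper: shows the SR engine's microphone wizard if the engine
// supports it, returning False when it does not
function ShowMicTraining(const Recognizer: ISpeechRecognizer;
  ParentWnd: HWND): Boolean;
var
  ExtraData: OleVariant;
begin
  ExtraData := Unassigned; // no engine-specific data is passed
  Result := Recognizer.IsUISupported(SPDUI_MicTraining, ExtraData);
  if Result then
    Recognizer.DisplayUI(ParentWnd, 'Microphone Training',
      SPDUI_MicTraining, ExtraData)
end;
```

<P>From a form method this might be called as <FONT face="Courier New, Courier, mono">ShowMicTraining(SpSharedRecoContext.Recognizer, Handle)</FONT>.</P>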
<H2><A name=Deployment>SAPI 5.1 Deployment</A></H2>
<P>When distributing SAPI 5.1 applications you will need to get hold of the
redistributable components package, SpeechSDK51MSM.exe, available from <A
href="http://www.microsoft.com/speech/download/SDK51"
target=_blank>http://www.microsoft.com/speech/download/SDK51</A>. This colossal
file (weighing in at 132 MB) contains Windows Installer merge modules for all
the SAPI 5.1 components (the main DLLs, the TTS and SR engines and the Control
Panel applet), and the SDK documentation includes a white paper on how to use
these components from within a Windows Installer compatible installation
building tool.</P>
<P align=center><IMG
src="Speech Synthesis & Speech Recognition Using SAPI 5_1.files/SAPI5CPL.png"></P>
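<P>A missing or unregistered SAPI runtime on the target machine shows up as a COM error when the first speech object is created, so it can be worth checking for it at start-up. This is a minimal sketch. It assumes the <FONT face="Courier New, Courier, mono">SpeechLib_TLB</FONT> unit is available and that COM has already been initialised (as it is in a VCL application once <FONT face="Courier New, Courier, mono">Application.Initialize</FONT> has run). The function name is hypothetical. It only confirms that the SpVoice object can be created, not that a particular TTS voice or SR engine is installed.</P>

```pascal
uses
  ComObj, SpeechLib_TLB; // SpeechLib_TLB: assumed type library unit name

// Hypothetical start-up check: tries to create a SAPI voice object and
// reports False if COM cannot create it (for example, SAPI 5.1 not installed)
function SapiAvailable: Boolean;
var
  Voice: ISpeechVoice;
begin
  try
    Voice := CoSpVoice.Create;
    Result := Assigned(Voice)
  except
    on EOleSysError do
      Result := False
  end
end;
```

<P>In the project file this could be called after <FONT face="Courier New, Courier, mono">Application.Initialize</FONT> and before the speech-enabled forms are created. If it returns False, the application could show an explanatory message and exit.</P>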
<H2><A name=Summary>Summary</A></H2>
<P>Adding various speech capabilities to a Delphi application does not take a
great deal of work, particularly once you have done the background work of
understanding the SAPI concepts.</P>
<P>There is much in the Speech API that we have not looked at in this paper,
but hopefully the areas covered will be enough to whet your appetite and get
you exploring further on your own.</P>
<H2><A name=References></A>References/Further Reading</H2>
<P>The following is a list of useful articles and papers that I found on SAPI
5.1 development during my research on this subject.</P>
<OL>
<LI><A name=Ref1></A><I><A
href="http://www.delphi3000.com/articles/article_2581.asp"
target=_blank>Speech Part 1 - How to Add "Text to Speech" (Speech Synthesis)
to your Delphi Apps</A> </I>by Alec Bergamini, <A
href="http://www.delphi3000.com/" target=_blank>Delphi 3000</A>.<BR>This
discusses installing the SAPI 5.1 SDK and getting simple speech.
<LI><A name=Ref2></A><I><A
href="http://www.delphi3000.com/articles/article_2629.asp"
target=_blank>Speech Part 2 - How to Add Simple Dictation speech recognition to your Delphi
Apps</A> </I>by Alec Bergamini, <A href="http://www.delphi3000.com/"
target=_blank>Delphi 3000</A>.<BR>This looks at simple dictation SR. </LI></OL>
<H2><A name=AboutBrian>About Brian Long</A> </H2>
<P><A href="mailto:brian@blong.com">Brian Long</A> used to work at <A
href="http://www.borland.com/">Borland</A> UK, performing a number of duties
including Technical Support on all the programming tools. Since leaving in 1995,
Brian has been providing training and consultancy services to the Delphi and
C++Builder communities, and the newly forming Kylix community.
<P>If you need training in these products, or need solutions to problems you
have with them, please <A href="mailto:brian@blong.com">get in touch</A>, or
visit <A href="http://www.blong.com/">Brian's Web site</A>.
<P>Besides authoring a <A
href="http://www.amazon.com/exec/obidos/ASIN/0201593831/qid=905701291/sr=1-1/002-9464178-4139807">Borland
Pascal problem-solving book</A> published in 1994, Brian is a regular columnist
in <A href="http://www.thedelphimagazine.com/">The Delphi Magazine</A> and has
had numerous articles published in Developer's Review, <A
href="http://www.computingnet.co.uk/">Computing</A>, Delphi Developer's Journal
and EXE Magazine. He was nominated for the <A
href="http://www.borland.com/delphi/vote">Spirit of Delphi 2000</A> award.</P>
<P>In his spare time (and waiting for his C++ programs to compile) Brian has
learnt the art of <A href="http://www.juggling.org/">juggling</A> and making
inflatable <A href="http://www.paperfolding.com/">origami</A> paper frogs.</P>
<HR>
<P><A href="http://www.blong.com/Conferences/DCon2002/Speech/Speech.htm">Go to
the speech capabilities overview </A><BR>
<P><A
href="http://www.blong.com/Conferences/DCon2002/Speech/SAPI4HighLevel/SAPI4.htm">Go
to the SAPI 4 High Level Interfaces article</A><BR>
<P><A
href="http://www.blong.com/Conferences/DCon2002/Speech/SAPI4LowLevel/SAPI4.htm">Go
to the SAPI 4 Low Level Interfaces article</A><BR>
<P><A
href="http://www.blong.com/Conferences/DCon2002/Speech/SAPI51/SAPI51.htm#Top">Go
back to the top of this SAPI 5.1 article</A><BR></FONT></P></BODY></HTML>