<P>Part of the process of speech recognition involves deciding what words have
actually been spoken. Recognisers use a grammar to decide what has been said,
where possible. SAPI 5.x represents grammars in XML.</P>
<P>In the case of dictation, a grammar can be used to indicate some words that
are likely to be spoken. It is not feasible to represent the entire spoken
English language as a grammar, so the recogniser relies on its own rules and
context analysis, along with any help from a grammar you might supply.</P>
<P>With Command and Control, the words that are understood are limited to the
commands defined in the grammar. The grammar defines rules that describe what
can be said, which makes the recogniser's job much easier: rather than trying
to understand anything that might be spoken, it only needs to recognise speech
that follows the supplied rules. A simple grammar that recognises three
colours might look like this:</P>
<TABLE bgColor=white border=1>
<TBODY>
<TR>
<TD><PRE><CODE><FONT color=black size=2>
<GRAMMAR LANGID="809">
  <!-- "Constant" definitions -->
  <DEFINE>
    <ID NAME="RID_start" VAL="1"/>
  </DEFINE>
  <!-- Rule definitions -->
  <RULE NAME="start" ID="RID_start" TOPLEVEL="ACTIVE">
    <L>
      <P>red</P>
      <P>blue</P>
      <P>green</P>
    </L>
  </RULE>
</GRAMMAR>
</FONT></CODE></PRE></TD></TR></TBODY></TABLE>
<P>The GRAMMAR root node defines the language as British English (LANGID 809
in hexadecimal; American English is 409). Note that the <I>start</I> rule is a
top level rule and has been marked as active by default, meaning it will be
active whenever speech recognition is enabled for this context.</P>
<P>Grammars support lists, to make implementing many similar commands easy,
and also support optional sections. For example, the grammar below will
recognise any of the following:</P>
<UL>
<LI>colour red
<LI>colour red please
<LI>colour blue
<LI>colour blue please
<LI>colour green
<LI>colour green please </LI></UL>
<TABLE bgColor=white border=1>
<TBODY>
<TR>
<TD><PRE><CODE><FONT color=black size=2>
<GRAMMAR LANGID="809">
  <DEFINE>
    <ID NAME="RID_start" VAL="1"/>
  </DEFINE>
  <RULE NAME="start" ID="RID_start" TOPLEVEL="ACTIVE">
    <P>colour</P>
    <RULEREF NAME="colour" />
    <O>please</O>
  </RULE>
  <RULE NAME="colour">
    <L>
      <P>red</P>
      <P>blue</P>
      <P>green</P>
    </L>
  </RULE>
</GRAMMAR>
</FONT></CODE></PRE></TD></TR></TBODY></TABLE>
<P>You can find more details about the supported grammar syntax in the SAPI
documentation.</P>
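<P>As a rough sketch of how a grammar like the one above might be loaded and
activated from Delphi, the following code loads the XML from a file and
activates its top level rule. The file name <FONT
face="Courier New, Courier, mono">colours.xml</FONT> and the form class are
made up for illustration; <FONT
face="Courier New, Courier, mono">SpSharedRecoContext</FONT> is the shared
recognition context component introduced in the next section.</P>
<TABLE bgColor=white border=1>
<TBODY>
<TR>
<TD><PRE><CODE><FONT color=black size=2>
CmdGrammar: ISpeechRecoGrammar;
...

<B>procedure</B> TfrmColours.FormCreate(Sender: TObject);
<B>begin</B>
  <FONT color=#003399><I>//Create a grammar on the shared recognition context and load the XML rules</I></FONT>
  CmdGrammar := SpSharedRecoContext.CreateGrammar(0);
  CmdGrammar.CmdLoadFromFile(<I>'colours.xml'</I>, SLODynamic);
  <FONT color=#003399><I>//Activate the top level rule so the engine listens for the defined phrases</I></FONT>
  CmdGrammar.CmdSetRuleState(<I>'start'</I>, SGDSActive)
<B>end</B>;
</FONT></CODE></PRE></TD></TR></TBODY></TABLE>
<P>Recognised command phrases are then reported through the recognition
context's <FONT face="Courier New, Courier, mono">OnRecognition</FONT> event,
just as for dictation below.</P>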
<H3><A name=DSR>Continuous Dictation Recognition</A></H3>
<P>Thankfully this is quite straightforward to use. We need to set up a
recognition context object for the shared recogniser, so drop a <FONT
face="Courier New, Courier, mono">TSpSharedRecoContext</FONT> component on the
form.</P>
<P><U><B>Note:</B></U> the recogniser will be set up implicitly if we do not
create one specifically. This means you do not need to drop a <FONT
face="Courier New, Courier, mono">TSpSharedRecognizer</FONT> or a <FONT
face="Courier New, Courier, mono">TSpInprocRecognizer</FONT> on the form unless
you need to use them directly.</P>
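<P>If you do later need the recogniser that was set up implicitly, the
recognition context exposes it through its <FONT
face="Courier New, Courier, mono">Recognizer</FONT> property. A minimal sketch
(the <FONT face="Courier New, Courier, mono">btnShowState</FONT> button is made
up for illustration):</P>
<TABLE bgColor=white border=1>
<TBODY>
<TR>
<TD><PRE><CODE><FONT color=black size=2>
<B>procedure</B> TfrmContinuousDictation.btnShowStateClick(Sender: TObject);
<B>var</B>
  Recognizer: ISpeechRecognizer;
<B>begin</B>
  <FONT color=#003399><I>//The shared recogniser was created behind the scenes; the context gives access to it</I></FONT>
  Recognizer := SpSharedRecoContext.Recognizer;
  <B>if</B> Recognizer.State = SRSActive <B>then</B>
    ShowMessage(<I>'Recogniser is active'</I>)
  <B>else</B>
    ShowMessage(<I>'Recogniser is inactive'</I>)
<B>end</B>;
</FONT></CODE></PRE></TD></TR></TBODY></TABLE>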
<P>The code below shows how you create a simple grammar that will satisfy the SR
engine for continuous dictation. The grammar is represented by an <FONT
face="Courier New, Courier, mono">ISpeechRecoGrammar</FONT> interface and is
used to start the dictation session. The code comes from the
ContinuousDictation.dpr <A
href="http://www.blong.com/Conferences/DCon2002/Speech/SAPI51/SAPI51.zip">sample
project</A>.</P>
<TABLE bgColor=white border=1>
<TBODY>
<TR>
<TD><PRE><CODE><FONT color=black size=2>
SRGrammar: ISpeechRecoGrammar;
...

<B>procedure</B> TfrmContinuousDictation.FormCreate(Sender: TObject);
<B>begin</B>
  <FONT color=#003399><I>//OnAudioLevel event is not fired by default - this changes that</I></FONT>
  SpSharedRecoContext.EventInterests := SREAllEvents;
  SRGrammar := SpSharedRecoContext.CreateGrammar(0);
  SRGrammar.DictationSetState(SGDSActive)
<B>end</B>;
</FONT></CODE></PRE></TD></TR></TBODY></TABLE>
<H4><A name=GramNotify>Grammar Notifications</A></H4>
<P>As the SR engine does its work it calls notification methods when certain
things happen, such as a phrase having been finished and recognised. These
notifications are available as standard Delphi events in the Delphi Automation
object component wrappers. This greatly simplifies the job of responding to the
notifications.</P>
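<P>For example, because the FormCreate code above sets <FONT
face="Courier New, Courier, mono">EventInterests</FONT> to <FONT
face="Courier New, Courier, mono">SREAllEvents</FONT>, the <FONT
face="Courier New, Courier, mono">OnAudioLevel</FONT> notification fires while
you speak. A sketch of a handler (the <FONT
face="Courier New, Courier, mono">pbLevel</FONT> progress bar is made up for
illustration, and the parameter list follows the Delphi 7 import) might look
like this:</P>
<TABLE bgColor=white border=1>
<TBODY>
<TR>
<TD><PRE><CODE><FONT color=black size=2>
<B>procedure</B> TfrmContinuousDictation.SpSharedRecoContextAudioLevel(
  ASender: TObject; StreamNumber: Integer; StreamPosition: OleVariant;
  AudioLevel: Integer);
<B>begin</B>
  <FONT color=#003399><I>//Show the audio level reported by SAPI (roughly 0 to 100)</I></FONT>
  pbLevel.Position := AudioLevel
<B>end</B>;
</FONT></CODE></PRE></TD></TR></TBODY></TABLE>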
<P>The main event is <FONT
face="Courier New, Courier, mono">OnRecognition</FONT>, which is fired when the
SR engine has decided what has been spoken. Whilst working that out, it will
typically fire the <FONT face="Courier New, Courier, mono">OnHypothesis</FONT>
event several times. In the sample, finished phrases are added to a memo on the
form and, whilst a phrase is being worked out, the hypotheses are added to a
list box so you can see how the SR engine reached its decision. Each time a new
phrase is started, the hypothesis list is cleared, as shown in the sketch
below.</P>
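<P>One way to clear the list is in a handler for the <FONT
face="Courier New, Courier, mono">OnPhraseStart</FONT> event, which fires when
the engine detects the start of a new phrase (again a sketch, with the
parameter list as imported in Delphi 7):</P>
<TABLE bgColor=white border=1>
<TBODY>
<TR>
<TD><PRE><CODE><FONT color=black size=2>
<B>procedure</B> TfrmContinuousDictation.SpSharedRecoContextPhraseStart(
  ASender: TObject; StreamNumber: Integer; StreamPosition: OleVariant);
<B>begin</B>
  <FONT color=#003399><I>//A new phrase is starting, so discard the hypotheses from the previous one</I></FONT>
  lstHypotheses.Items.Clear
<B>end</B>;
</FONT></CODE></PRE></TD></TR></TBODY></TABLE>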
<P>You can see the list of hypotheses building up in this screenshot of the
program running.</P>
<P align=center><I>[Screenshot: the ContinuousDictation sample showing recognised text and the hypothesis list]</I></P>
<P>Both <FONT face="Courier New, Courier, mono">OnRecognition</FONT> and <FONT
face="Courier New, Courier, mono">OnHypothesis</FONT> are passed a <FONT
face="Courier New, Courier, mono">Result</FONT> parameter; this is an <FONT
face="Courier New, Courier, mono">ISpeechRecoResult</FONT> results object. In
Delphi 7 this is declared using the correct <FONT
face="Courier New, Courier, mono">ISpeechRecoResult</FONT> interface type, but
in earlier versions this was just declared as an <FONT
face="Courier New, Courier, mono">OleVariant</FONT> (which contained the <FONT
face="Courier New, Courier, mono">ISpeechRecoResult</FONT> interface).</P>
<P>This code can be used in Delphi 6 and earlier to access the text that was
recognised:</P>
<TABLE bgColor=white border=1>
<TBODY>
<TR>
<TD><PRE><CODE><FONT color=black size=2>
<B>procedure</B> TfrmContinuousDictation.SpSharedRecoContextHypothesis(
  Sender: TObject; StreamNumber: Integer; StreamPosition: OleVariant;
  <B>var</B> Result: OleVariant);
<B>begin</B>
  lstHypotheses.Items.Add(Result.PhraseInfo.GetText);
  lstHypotheses.ItemIndex := lstHypotheses.Items.Count - 1
<B>end</B>;

<B>procedure</B> TfrmContinuousDictation.SpSharedRecoContextRecognition(
  Sender: TObject; StreamNumber: Integer; StreamPosition: OleVariant;
  RecognitionType: TOleEnum; <B>var</B> Result: OleVariant);
<B>begin</B>
  memText.SelText := Result.PhraseInfo.GetText + #32
<B>end</B>;
</FONT></CODE></PRE></TD></TR></TBODY></TABLE>
<P><U><B>Note:</B></U> this code uses late bound Automation on the results
object (so no Code Completion or Code Parameters), but you could use early bound
Automation with:</P>
<TABLE bgColor=white border=1>
<TBODY>
<TR>
<TD><PRE><CODE><FONT color=black size=2>
<B>procedure</B> TfrmContinuousDictation.SpSharedRecoContextHypothesis(
  Sender: TObject; StreamNumber: Integer; StreamPosition: OleVariant;
  <B>var</B> Result: OleVariant);
<B>var</B>
  SRResult: ISpeechRecoResult;
<B>begin</B>
  SRResult := IDispatch(Result) <B>as</B> ISpeechRecoResult;
  lstHypotheses.Items.Add(SRResult.PhraseInfo.GetText(0, -1, True));
  lstHypotheses.ItemIndex := lstHypotheses.Items.Count - 1
<B>end</B>;

<B>procedure</B> TfrmContinuousDictation.SpSharedRecoContextRecognition(
  Sender: TObject; StreamNumber: Integer; StreamPosition: OleVariant;
  RecognitionType: TOleEnum; <B>var</B> Result: OleVariant);
<B>var</B>
  SRResult: ISpeechRecoResult;
<B>begin</B>
  SRResult := IDispatch(Result) <B>as</B> ISpeechRecoResult;
  memText.SelText := SRResult.PhraseInfo.GetText(0, -1, True) + #32
<B>end</B>;
</FONT></CODE></PRE></TD></TR></TBODY></TABLE>
<P><B><U>Note:</U></B> the code here does not check whether the Variant
actually contains a valid <FONT
face="Courier New, Courier, mono">IDispatch</FONT> reference before casting it,
but probably should.</P>
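<P>A minimal sketch of such a check (using the <FONT
face="Courier New, Courier, mono">Supports</FONT> routine from SysUtils) might
look like this, skipping the update if the Variant does not hold a suitable
interface:</P>
<TABLE bgColor=white border=1>
<TBODY>
<TR>
<TD><PRE><CODE><FONT color=black size=2>
<B>procedure</B> TfrmContinuousDictation.SpSharedRecoContextHypothesis(
  Sender: TObject; StreamNumber: Integer; StreamPosition: OleVariant;
  <B>var</B> Result: OleVariant);
<B>var</B>
  SRResult: ISpeechRecoResult;
<B>begin</B>
  <FONT color=#003399><I>//Only proceed if the Variant holds a dispatch interface that supports ISpeechRecoResult</I></FONT>
  <B>if</B> (VarType(Result) = varDispatch) <B>and</B>
     Supports(IDispatch(Result), ISpeechRecoResult, SRResult) <B>then</B>
  <B>begin</B>
    lstHypotheses.Items.Add(SRResult.PhraseInfo.GetText(0, -1, True));
    lstHypotheses.ItemIndex := lstHypotheses.Items.Count - 1
  <B>end</B>
<B>end</B>;
</FONT></CODE></PRE></TD></TR></TBODY></TABLE>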
<P>In Delphi 7 the code should look like this:</P>
<TABLE bgColor=white border=1>
<TBODY>
<TR>
<TD><PRE><CODE><FONT color=black size=2>
<B>procedure</B> TfrmContinuousDictation.SpSharedRecoContextHypothesis(
  ASender: TObject; StreamNumber: Integer; StreamPosition: OleVariant;
  <B>const</B> Result: ISpeechRecoResult);
<B>begin</B>
  lstHypotheses.Items.Add(Result.PhraseInfo.GetText(0, -1, True));
  lstHypotheses.ItemIndex := lstHypotheses.Items.Count - 1
<B>end</B>;

<B>procedure</B> TfrmContinuousDictation.SpSharedRecoContextRecognition(
  ASender: TObject; StreamNumber: Integer; StreamPosition: OleVariant;
  RecognitionType: TOleEnum; <B>const</B> Result: ISpeechRecoResult);
<B>begin</B>
  memText.SelText := Result.PhraseInfo.GetText(0, -1, True) + #32
<B>end</B>;
</FONT></CODE></PRE></TD></TR></TBODY></TABLE>
<H4><A name=EngineDialogs>Engine Dialogs</A></H4>
<P>The buttons on the form allow various engine dialogs to be invoked (where
the engine supports them). This support comes from a couple of methods of the
recogniser object, wrapped in a small <FONT
face="Courier New, Courier, mono">InvokeUI</FONT> helper routine in the
sample.</P>
<TABLE bgColor=white border=1>
<TBODY>
<TR>
<TD><PRE><CODE><FONT color=black size=2>
<B>const</B>
  SPDUI_EngineProperties = <I>'EngineProperties'</I>;
  SPDUI_AddRemoveWord = <I>'AddRemoveWord'</I>;
  SPDUI_UserTraining = <I>'UserTraining'</I>;
  SPDUI_MicTraining = <I>'MicTraining'</I>;
  SPDUI_RecoProfileProperties = <I>'RecoProfileProperties'</I>;
  SPDUI_AudioProperties = <I>'AudioProperties'</I>;
  SPDUI_AudioVolume = <I>'AudioVolume'</I>;

<B>procedure</B> TfrmContinuousDictation.btnEnginePropsClick(Sender: TObject);
<B>begin</B>
  InvokeUI(SPDUI_EngineProperties, <I>'Engine Properties'</I>)
<B>end</B>;

<B>procedure</B> TfrmContinuousDictation.btnUserSettingsClick(Sender: TObject);
<B>begin</B>
  InvokeUI(SPDUI_RecoProfileProperties, <I>'User Settings'</I>)
<B>end</B>;

<B>procedure</B> TfrmContinuousDictation.btnLexiconClick(Sender: TObject);
<B>begin</B>
  InvokeUI(SPDUI_AddRemoveWord, <I>'Add/Remove Word'</I>)
<B>end</B>;

<B>procedure</B> TfrmContinuousDictation.btnTrainGeneralClick(Sender: TObject);
<B>begin</B>
  InvokeUI(SPDUI_UserTraining, <I>'Speaker Training'</I>)
<B>end</B>;

<B>procedure</B> TfrmContinuousDictation.btnTrainMicClick(Sender: TObject);
<B>begin</B>
  InvokeUI(SPDUI_MicTraining, <I>'Microphone Setup'</I>)
<B>end</B>;

<B>procedure</B> TfrmContinuousDictation.btnAudioPropsClick(Sender: TObject);
<B>begin</B>
  InvokeUI(SPDUI_AudioProperties, <I>'Audio Properties'</I>)
<B>end</B>;

<B>procedure</B> TfrmContinuousDictation.btnAudioVolClick(Sender: TObject);
<B>begin</B>
  InvokeUI(SPDUI_AudioVolume, <I>'Audio Volume'</I>)
<B>end</B>;
</FONT></CODE></PRE></TD></TR></TBODY></TABLE>
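<P>The <FONT face="Courier New, Courier, mono">InvokeUI</FONT> helper lives in
the sample project; a minimal sketch of what it might look like is shown below
(the real implementation in ContinuousDictation.dpr may differ). It checks the
recogniser's <FONT face="Courier New, Courier, mono">IsUISupported</FONT>
method and then calls <FONT
face="Courier New, Courier, mono">DisplayUI</FONT>, reaching the recogniser
through the recognition context's <FONT
face="Courier New, Courier, mono">Recognizer</FONT> property.</P>
<TABLE bgColor=white border=1>
<TBODY>
<TR>
<TD><PRE><CODE><FONT color=black size=2>
<B>procedure</B> TfrmContinuousDictation.InvokeUI(<B>const</B> TypeOfUI, Caption: <B>string</B>);
<B>var</B>
  ExtraData: OleVariant;
<B>begin</B>
  <FONT color=#003399><I>//None of these standard dialogs needs any extra data</I></FONT>
  ExtraData := Unassigned;
  <B>if</B> SpSharedRecoContext.Recognizer.IsUISupported(TypeOfUI, ExtraData) <B>then</B>
    <FONT color=#003399><I>//Show the requested engine dialog as a child of this form</I></FONT>
    SpSharedRecoContext.Recognizer.DisplayUI(Handle, Caption, TypeOfUI, ExtraData)
  <B>else</B>
    ShowMessage(Format(<I>'The %s dialog is not supported by this engine'</I>, [Caption]))
<B>end</B>;
</FONT></CODE></PRE></TD></TR></TBODY></TABLE>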