<P>Part of the process of speech recognition involves deciding what words have
actually been spoken. Recognisers use a grammar to decide what has been said,
where possible. SAPI 5.x represents grammars in XML.</P>
<P>In the case of dictation, a grammar can be used to indicate some words that
are likely to be spoken. It is not feasible to represent the entire spoken
English language as a grammar, so the recogniser relies on its own rules and
context analysis, along with any help from a grammar you might supply.</P>
<P>With Command and Control, the words that are understood are limited to the
commands defined in the grammar. The grammar defines rules that describe what
can be said, which makes the recogniser's job much easier: rather than trying
to understand anything that might be spoken, it only needs to recognise speech
that follows the supplied rules. A simple grammar that recognises three
colours might look like this:</P>
<TABLE bgColor=white border=1>
<TBODY>
<TR>
<TD><PRE><CODE><FONT color=black size=2>
<GRAMMAR LANGID="809">
  <!-- "Constant" definitions -->
  <DEFINE>
    <ID NAME="RID_start" VAL="1"/>
  </DEFINE>
  <!-- Rule definitions -->
  <RULE NAME="start" ID="RID_start" TOPLEVEL="ACTIVE">
    <L>
      <P>red</P>
      <P>blue</P>
      <P>green</P>
    </L>
  </RULE>
</GRAMMAR>
</FONT></CODE></PRE></TD></TR></TBODY></TABLE>
<P>The GRAMMAR root node defines the language as British English (LANGID 809
in hexadecimal; American English is 409). Note that the <I>start</I> rule is a
top level rule and has been marked as active by default, meaning it will be
active whenever speech recognition is enabled for this context.</P>
<P>Grammars support lists, to make implementing many similar commands easy,
and also support optional sections. For example, the grammar below will
recognise any of the following:</P>
<UL>
<LI>colour red
<LI>colour red please
<LI>colour blue
<LI>colour blue please
<LI>colour green
<LI>colour green please </LI></UL>
<TABLE bgColor=white border=1>
<TBODY>
<TR>
<TD><PRE><CODE><FONT color=black size=2>
<GRAMMAR LANGID="809">
  <DEFINE>
    <ID NAME="RID_start" VAL="1"/>
  </DEFINE>
  <RULE NAME="start" ID="RID_start" TOPLEVEL="ACTIVE">
    <P>colour</P>
    <RULEREF NAME="colour" />
    <O>please</O>
  </RULE>
  <RULE NAME="colour">
    <L>
      <P>red</P>
      <P>blue</P>
      <P>green</P>
    </L>
  </RULE>
</GRAMMAR>
</FONT></CODE></PRE></TD></TR></TBODY></TABLE>
<P>You can find more details about the supported grammar syntax in the SAPI
documentation.</P>
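<P>As a rough sketch of how a grammar like the one above might be loaded and
activated from Delphi, the following code loads the XML from a file and
activates its top level rule. The file name <FONT
face="Courier New, Courier, mono">colours.xml</FONT> and the form class are
made up for illustration; <FONT
face="Courier New, Courier, mono">SpSharedRecoContext</FONT> is the shared
recognition context component introduced in the next section.</P>
<TABLE bgColor=white border=1>
<TBODY>
<TR>
<TD><PRE><CODE><FONT color=black size=2>
CmdGrammar: ISpeechRecoGrammar;
...

<B>procedure</B> TfrmColours.FormCreate(Sender: TObject);
<B>begin</B>
  <FONT color=#003399><I>//Create a grammar on the shared recognition context and load the XML rules</I></FONT>
  CmdGrammar := SpSharedRecoContext.CreateGrammar(0);
  CmdGrammar.CmdLoadFromFile(<I>'colours.xml'</I>, SLODynamic);
  <FONT color=#003399><I>//Activate the top level rule so the engine listens for the defined phrases</I></FONT>
  CmdGrammar.CmdSetRuleState(<I>'start'</I>, SGDSActive)
<B>end</B>;
</FONT></CODE></PRE></TD></TR></TBODY></TABLE>
<P>Recognised command phrases are then reported through the recognition
context's <FONT face="Courier New, Courier, mono">OnRecognition</FONT> event,
just as for dictation below.</P>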
<H3><A name=DSR>Continuous Dictation Recognition</A></H3>
<P>Thankfully this is quite straightforward to use. We need to set up a
recognition context object for the shared recogniser, so drop a <FONT
face="Courier New, Courier, mono">TSpSharedRecoContext</FONT> component on the
form.</P>
<P><U><B>Note:</B></U> the recogniser will be set up implicitly if we do not
create one specifically. This means you do not need to drop a <FONT
face="Courier New, Courier, mono">TSpSharedRecognizer</FONT> or a <FONT
face="Courier New, Courier, mono">TSpInprocRecognizer</FONT> on the form unless
you need to use them directly.</P>
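<P>If you do later need the recogniser that was set up implicitly, the
recognition context exposes it through its <FONT
face="Courier New, Courier, mono">Recognizer</FONT> property. A minimal sketch
(the <FONT face="Courier New, Courier, mono">btnShowState</FONT> button is made
up for illustration):</P>
<TABLE bgColor=white border=1>
<TBODY>
<TR>
<TD><PRE><CODE><FONT color=black size=2>
<B>procedure</B> TfrmContinuousDictation.btnShowStateClick(Sender: TObject);
<B>var</B>
  Recognizer: ISpeechRecognizer;
<B>begin</B>
  <FONT color=#003399><I>//The shared recogniser was created behind the scenes; the context gives access to it</I></FONT>
  Recognizer := SpSharedRecoContext.Recognizer;
  <B>if</B> Recognizer.State = SRSActive <B>then</B>
    ShowMessage(<I>'Recogniser is active'</I>)
  <B>else</B>
    ShowMessage(<I>'Recogniser is inactive'</I>)
<B>end</B>;
</FONT></CODE></PRE></TD></TR></TBODY></TABLE>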
<P>The code below shows how you create a simple grammar that will satisfy the SR
engine for continuous dictation. The grammar is represented by an <FONT
face="Courier New, Courier, mono">ISpeechRecoGrammar</FONT> interface and is
used to start the dictation session. The code comes from the
ContinuousDictation.dpr <A
href="http://www.blong.com/Conferences/DCon2002/Speech/SAPI51/SAPI51.zip">sample
project</A>.</P>
<TABLE bgColor=white border=1>
<TBODY>
<TR>
<TD><PRE><CODE><FONT color=black size=2>
SRGrammar: ISpeechRecoGrammar;
...

<B>procedure</B> TfrmContinuousDictation.FormCreate(Sender: TObject);
<B>begin</B>
  <FONT color=#003399><I>//OnAudioLevel event is not fired by default - this changes that</I></FONT>
  SpSharedRecoContext.EventInterests := SREAllEvents;
  SRGrammar := SpSharedRecoContext.CreateGrammar(0);
  SRGrammar.DictationSetState(SGDSActive)
<B>end</B>;
</FONT></CODE></PRE></TD></TR></TBODY></TABLE>
<H4><A name=GramNotify>Grammar Notifications</A></H4>
<P>As the SR engine does its work it calls notification methods when certain
things happen, such as a phrase having been finished and recognised. These
notifications are available as standard Delphi events in the Delphi Automation
object component wrappers. This greatly simplifies the job of responding to the
notifications.</P>
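<P>For example, because the FormCreate code above sets <FONT
face="Courier New, Courier, mono">EventInterests</FONT> to <FONT
face="Courier New, Courier, mono">SREAllEvents</FONT>, the <FONT
face="Courier New, Courier, mono">OnAudioLevel</FONT> notification fires while
you speak. A sketch of a handler (the <FONT
face="Courier New, Courier, mono">pbLevel</FONT> progress bar is made up for
illustration, and the parameter list follows the Delphi 7 import) might look
like this:</P>
<TABLE bgColor=white border=1>
<TBODY>
<TR>
<TD><PRE><CODE><FONT color=black size=2>
<B>procedure</B> TfrmContinuousDictation.SpSharedRecoContextAudioLevel(
  ASender: TObject; StreamNumber: Integer; StreamPosition: OleVariant;
  AudioLevel: Integer);
<B>begin</B>
  <FONT color=#003399><I>//Show the audio level reported by SAPI (roughly 0 to 100)</I></FONT>
  pbLevel.Position := AudioLevel
<B>end</B>;
</FONT></CODE></PRE></TD></TR></TBODY></TABLE>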
<P>The main event is <FONT
face="Courier New, Courier, mono">OnRecognition</FONT>, which is fired when the
SR engine has decided what has been spoken. Whilst working that out, it will
typically fire the <FONT face="Courier New, Courier, mono">OnHypothesis</FONT>
event several times. In the sample, finished phrases are added to a memo on the
form and, whilst a phrase is being worked out, the hypotheses are added to a
list box so you can see how the SR engine reached its decision. Each time a new
phrase is started, the hypothesis list is cleared, as shown in the sketch
below.</P>
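<P>One way to clear the list is in a handler for the <FONT
face="Courier New, Courier, mono">OnPhraseStart</FONT> event, which fires when
the engine detects the start of a new phrase (again a sketch, with the
parameter list as imported in Delphi 7):</P>
<TABLE bgColor=white border=1>
<TBODY>
<TR>
<TD><PRE><CODE><FONT color=black size=2>
<B>procedure</B> TfrmContinuousDictation.SpSharedRecoContextPhraseStart(
  ASender: TObject; StreamNumber: Integer; StreamPosition: OleVariant);
<B>begin</B>
  <FONT color=#003399><I>//A new phrase is starting, so discard the hypotheses from the previous one</I></FONT>
  lstHypotheses.Items.Clear
<B>end</B>;
</FONT></CODE></PRE></TD></TR></TBODY></TABLE>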
<P>You can see the list of hypotheses building up in this screenshot of the
program running.</P>
<P align=center><I>[Screenshot: the ContinuousDictation sample showing recognised text and the hypothesis list]</I></P>
<P>Both <FONT face="Courier New, Courier, mono">OnRecognition</FONT> and <FONT
face="Courier New, Courier, mono">OnHypothesis</FONT> are passed a <FONT
face="Courier New, Courier, mono">Result</FONT> parameter; this is an <FONT
face="Courier New, Courier, mono">ISpeechRecoResult</FONT> results object. In
Delphi 7 this is declared using the correct <FONT
face="Courier New, Courier, mono">ISpeechRecoResult</FONT> interface type, but
in earlier versions this was just declared as an <FONT
face="Courier New, Courier, mono">OleVariant</FONT> (which contained the <FONT
face="Courier New, Courier, mono">ISpeechRecoResult</FONT> interface).</P>
<P>This code can be used in Delphi 6 and earlier to access the text that was
recognised:</P>
<TABLE bgColor=white border=1>
<TBODY>
<TR>
<TD><PRE><CODE><FONT color=black size=2>
<B>procedure</B> TfrmContinuousDictation.SpSharedRecoContextHypothesis(
  Sender: TObject; StreamNumber: Integer; StreamPosition: OleVariant;
  <B>var</B> Result: OleVariant);
<B>begin</B>
  lstHypotheses.Items.Add(Result.PhraseInfo.GetText);
  lstHypotheses.ItemIndex := lstHypotheses.Items.Count - 1
<B>end</B>;

<B>procedure</B> TfrmContinuousDictation.SpSharedRecoContextRecognition(
  Sender: TObject; StreamNumber: Integer; StreamPosition: OleVariant;
  RecognitionType: TOleEnum; <B>var</B> Result: OleVariant);
<B>begin</B>
  memText.SelText := Result.PhraseInfo.GetText + #32
<B>end</B>;
</FONT></CODE></PRE></TD></TR></TBODY></TABLE>
<P><U><B>Note:</B></U> this code uses late bound Automation on the results
object (so no Code Completion or Code Parameters), but you could use early bound
Automation with:</P>
<TABLE bgColor=white border=1>
<TBODY>
<TR>
<TD><PRE><CODE><FONT color=black size=2>
<B>procedure</B> TfrmContinuousDictation.SpSharedRecoContextHypothesis(
  Sender: TObject; StreamNumber: Integer; StreamPosition: OleVariant;
  <B>var</B> Result: OleVariant);
<B>var</B>
  SRResult: ISpeechRecoResult;
<B>begin</B>
  SRResult := IDispatch(Result) <B>as</B> ISpeechRecoResult;
  lstHypotheses.Items.Add(SRResult.PhraseInfo.GetText(0, -1, True));
  lstHypotheses.ItemIndex := lstHypotheses.Items.Count - 1
<B>end</B>;

<B>procedure</B> TfrmContinuousDictation.SpSharedRecoContextRecognition(
  Sender: TObject; StreamNumber: Integer; StreamPosition: OleVariant;
  RecognitionType: TOleEnum; <B>var</B> Result: OleVariant);
<B>var</B>
  SRResult: ISpeechRecoResult;
<B>begin</B>
  SRResult := IDispatch(Result) <B>as</B> ISpeechRecoResult;
  memText.SelText := SRResult.PhraseInfo.GetText(0, -1, True) + #32
<B>end</B>;
</FONT></CODE></PRE></TD></TR></TBODY></TABLE>
<P><B><U>Note:</U></B> the code here does not check whether the Variant
actually contains a valid <FONT
face="Courier New, Courier, mono">IDispatch</FONT> reference before casting it,
but probably should.</P>
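<P>A minimal sketch of such a check (using the <FONT
face="Courier New, Courier, mono">Supports</FONT> routine from SysUtils) might
look like this, skipping the update if the Variant does not hold a suitable
interface:</P>
<TABLE bgColor=white border=1>
<TBODY>
<TR>
<TD><PRE><CODE><FONT color=black size=2>
<B>procedure</B> TfrmContinuousDictation.SpSharedRecoContextHypothesis(
  Sender: TObject; StreamNumber: Integer; StreamPosition: OleVariant;
  <B>var</B> Result: OleVariant);
<B>var</B>
  SRResult: ISpeechRecoResult;
<B>begin</B>
  <FONT color=#003399><I>//Only proceed if the Variant holds a dispatch interface that supports ISpeechRecoResult</I></FONT>
  <B>if</B> (VarType(Result) = varDispatch) <B>and</B>
     Supports(IDispatch(Result), ISpeechRecoResult, SRResult) <B>then</B>
  <B>begin</B>
    lstHypotheses.Items.Add(SRResult.PhraseInfo.GetText(0, -1, True));
    lstHypotheses.ItemIndex := lstHypotheses.Items.Count - 1
  <B>end</B>
<B>end</B>;
</FONT></CODE></PRE></TD></TR></TBODY></TABLE>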
<P>In Delphi 7 the code should look like this:</P>
<TABLE bgColor=white border=1>
<TBODY>
<TR>
<TD><PRE><CODE><FONT color=black size=2>
<B>procedure</B> TfrmContinuousDictation.SpSharedRecoContextHypothesis(
  ASender: TObject; StreamNumber: Integer; StreamPosition: OleVariant;
  <B>const</B> Result: ISpeechRecoResult);
<B>begin</B>
  lstHypotheses.Items.Add(Result.PhraseInfo.GetText(0, -1, True));
  lstHypotheses.ItemIndex := lstHypotheses.Items.Count - 1
<B>end</B>;

<B>procedure</B> TfrmContinuousDictation.SpSharedRecoContextRecognition(
  ASender: TObject; StreamNumber: Integer; StreamPosition: OleVariant;
  RecognitionType: TOleEnum; <B>const</B> Result: ISpeechRecoResult);
<B>begin</B>
  memText.SelText := Result.PhraseInfo.GetText(0, -1, True) + #32
<B>end</B>;
</FONT></CODE></PRE></TD></TR></TBODY></TABLE>
<H4><A name=EngineDialogs>Engine Dialogs</A></H4>
<P>The buttons on the form allow various engine dialogs to be invoked (where
the engine supports them). This support comes from a couple of methods of the
recogniser object, wrapped in a small <FONT
face="Courier New, Courier, mono">InvokeUI</FONT> helper routine in the
sample.</P>
<TABLE bgColor=white border=1>
<TBODY>
<TR>
<TD><PRE><CODE><FONT color=black size=2>
<B>const</B>
  SPDUI_EngineProperties = <I>'EngineProperties'</I>;
  SPDUI_AddRemoveWord = <I>'AddRemoveWord'</I>;
  SPDUI_UserTraining = <I>'UserTraining'</I>;
  SPDUI_MicTraining = <I>'MicTraining'</I>;
  SPDUI_RecoProfileProperties = <I>'RecoProfileProperties'</I>;
  SPDUI_AudioProperties = <I>'AudioProperties'</I>;
  SPDUI_AudioVolume = <I>'AudioVolume'</I>;

<B>procedure</B> TfrmContinuousDictation.btnEnginePropsClick(Sender: TObject);
<B>begin</B>
  InvokeUI(SPDUI_EngineProperties, <I>'Engine Properties'</I>)
<B>end</B>;

<B>procedure</B> TfrmContinuousDictation.btnUserSettingsClick(Sender: TObject);
<B>begin</B>
  InvokeUI(SPDUI_RecoProfileProperties, <I>'User Settings'</I>)
<B>end</B>;

<B>procedure</B> TfrmContinuousDictation.btnLexiconClick(Sender: TObject);
<B>begin</B>
  InvokeUI(SPDUI_AddRemoveWord, <I>'Add/Remove Word'</I>)
<B>end</B>;

<B>procedure</B> TfrmContinuousDictation.btnTrainGeneralClick(Sender: TObject);
<B>begin</B>
  InvokeUI(SPDUI_UserTraining, <I>'Speaker Training'</I>)
<B>end</B>;

<B>procedure</B> TfrmContinuousDictation.btnTrainMicClick(Sender: TObject);
<B>begin</B>
  InvokeUI(SPDUI_MicTraining, <I>'Microphone Setup'</I>)
<B>end</B>;

<B>procedure</B> TfrmContinuousDictation.btnAudioPropsClick(Sender: TObject);
<B>begin</B>
  InvokeUI(SPDUI_AudioProperties, <I>'Audio Properties'</I>)
<B>end</B>;

<B>procedure</B> TfrmContinuousDictation.btnAudioVolClick(Sender: TObject);
<B>begin</B>
  InvokeUI(SPDUI_AudioVolume, <I>'Audio Volume'</I>)
<B>end</B>;
</FONT></CODE></PRE></TD></TR></TBODY></TABLE>
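<P>The <FONT face="Courier New, Courier, mono">InvokeUI</FONT> helper lives in
the sample project; a minimal sketch of what it might look like is shown below
(the real implementation in ContinuousDictation.dpr may differ). It checks the
recogniser's <FONT face="Courier New, Courier, mono">IsUISupported</FONT>
method and then calls <FONT
face="Courier New, Courier, mono">DisplayUI</FONT>, reaching the recogniser
through the recognition context's <FONT
face="Courier New, Courier, mono">Recognizer</FONT> property.</P>
<TABLE bgColor=white border=1>
<TBODY>
<TR>
<TD><PRE><CODE><FONT color=black size=2>
<B>procedure</B> TfrmContinuousDictation.InvokeUI(<B>const</B> TypeOfUI, Caption: <B>string</B>);
<B>var</B>
  ExtraData: OleVariant;
<B>begin</B>
  <FONT color=#003399><I>//None of these standard dialogs needs any extra data</I></FONT>
  ExtraData := Unassigned;
  <B>if</B> SpSharedRecoContext.Recognizer.IsUISupported(TypeOfUI, ExtraData) <B>then</B>
    <FONT color=#003399><I>//Show the requested engine dialog as a child of this form</I></FONT>
    SpSharedRecoContext.Recognizer.DisplayUI(Handle, Caption, TypeOfUI, ExtraData)
  <B>else</B>
    ShowMessage(Format(<I>'The %s dialog is not supported by this engine'</I>, [Caption]))
<B>end</B>;
</FONT></CODE></PRE></TD></TR></TBODY></TABLE>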