<table BORDERCOLOR="#000000" BORDER="1" WIDTH="80%">
<tr>
<td><i>Dialog Name</i></td>
<td WIDTH="462"><i>Description</i> </td>
</tr>
<tr>
<td WIDTH="129">About Box</td>
<td WIDTH="462">Used to display the dialog box that identifies the TTS engine and shows
its copyright information. </td>
</tr>
<tr>
<td WIDTH="129">Lexicon Dialog</td>
<td WIDTH="462">Can be used to offer the speaker the opportunity to alter the
pronunciation lexicon, including altering the phonetic spelling of troublesome words, or
adding or deleting personal vocabulary files. </td>
</tr>
<tr>
<td WIDTH="129">General Dialog</td>
<td WIDTH="462">Can be used to display general information about the TTS engine and to set
user preferences. Examples include the speed at which the text will be read, the character
of the voice that will be used for playback, and other preferences supported by the TTS engine.
</td>
</tr>
<tr>
<td WIDTH="129">Translate Dialog</td>
<td WIDTH="462">Can be used to offer the user the ability to alter the pronunciation of
key words in the lexicon. For example, the TTS engine that ships with Microsoft Voice has
a special entry that forces the speech engine to express all occurrences of
"TTS" as "text to speech," instead of just reciting the letters
"T-T-S." </td>
</tr>
</table>
</center></div>
<h2><a NAME="LowLevelSAPI"><font SIZE="5" COLOR="#FF0000">Low-Level SAPI</font></a></h2>
<p>The low-level SAPI services give an application much finer control over
Windows speech recognition and text-to-speech services. This level is best for
implementing advanced SR and TTS services, including the creation of dictation systems. </p>
<p>Just as there are two basic service types for high-level SAPI, there are two primary
COM interfaces defined for low-level SAPI: one for speech recognition and one for
text-to-speech services. The rest of this chapter outlines each of the objects and their
interfaces. </p>
<div align="center"><center>
<table BORDERCOLOR="#000000" BORDER="1" WIDTH="80%">
<tr>
<td><b>Note</b></td>
</tr>
<tr>
<td><blockquote>
<p>This section of the chapter covers the low-level SAPI services. These services are
available only from C or C++ programs, not from Visual Basic. However, even if you do not
program in C, you can still learn a lot from this section of the chapter. The material in
this section can give you a good understanding of the details behind the SAPI OLE
automation objects, and may also give you some ideas on how you can use the VB-level SAPI
services in your programs.</p>
</blockquote>
</td>
</tr>
</table>
</center></div>
<h3><a NAME="SpeechRecognition">Speech Recognition</a></h3>
<p>The <tt><font FACE="Courier">Speech Recognition</font></tt> object has several child
objects and collections. There are two top-level objects in the SR system: the SR <tt><font
FACE="Courier">Engine Enumerator</font></tt> object and the SR <tt><font FACE="Courier">Sharing</font></tt>
object. These two objects are created using their unique CLSID (class ID) values. The
purpose of both objects is to give an application information about the available speech
recognition engines and allow the application to register with the appropriate engine.
Once the engine is selected, one or more grammar objects can be created, and an SR
<tt><font FACE="Courier">Results</font></tt> object is created for each phrase that is
heard. This temporary object contains details about the phrase that
was captured by the speech recognition engine. Figure 15.2 shows how the different objects
relate to each other, and how they are created. </p>
<p><a HREF="f15-2.gif"><b>Figure 15.2 : </b><i>Mapping the low-level SAPI objects.</i></a>
</p>
<p>When an SR engine is created, a link to a valid audio input device is also created.
While it is possible to create a custom audio input device, it is not required. By
default, the audio input device is an attached microphone, but the engine can also be
pointed at a telephone device. </p>
<p>The rest of this section details the low-level SAPI SR objects and their interfaces. </p>
<h4>The SR <tt><font FACE="Courier">Enumerator</font></tt> and <tt><font FACE="Courier">Engine
Enumerator</font></tt> Objects </h4>
<p>The role of the SR <tt><font FACE="Courier">Enumerator</font></tt> and <tt><font
FACE="Courier">Engine Enumerator</font></tt> objects is to locate and select an
appropriate SR engine for the requesting application. The <tt><font FACE="Courier">Enumerator</font></tt>
object lists all available speech recognition modes and their associated installed
engines. This information is supplied by the child object of the <tt><font FACE="Courier">Enumerator</font></tt>
object: the <tt><font FACE="Courier">Engine Enumerator</font></tt> object. The result of
this search is a pointer to the SR engine interface that best meets the service request. </p>
<p>The <tt><font FACE="Courier">Enumerator</font></tt> and <tt><font FACE="Courier">Engine
Enumerator</font></tt> objects support only two interfaces:
<ul>
<li><font COLOR="#000000">The </font><tt><font FACE="Courier">ISREnum</font></tt> interface
is used to get a list of all available engines. </li>
<li><font COLOR="#000000">The </font><tt><font FACE="Courier">ISRFind</font></tt> interface
is used to select the desired engine. </li>
</ul>
<div align="center"><center>
<table BORDERCOLOR="#000000" BORDER="1" WIDTH="80%">
<tr>
<td><b>Note</b></td>
</tr>
<tr>
<td><blockquote>
<p>The SR <tt><font FACE="Courier">Enumerator</font></tt> and <tt><font FACE="Courier">Engine
Enumerator</font></tt> objects are used only to locate and select an engine object. Once
that is done, these two objects can be discarded. </p>
</blockquote>
</td>
</tr>
</table>
</center></div>
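<p>The enumerate-then-select pattern described above can be sketched without the SAPI
headers. The following is a minimal illustration only: <tt><font FACE="Courier">ModeInfo</font></tt>,
<tt><font FACE="Courier">ModeRequest</font></tt>, and <tt><font FACE="Courier">findMode</font></tt>
are hypothetical names invented for this example, and the fields are assumptions rather
than the actual SAPI structures. </p>

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for the mode details an enumerator would report.
// The field names are illustrative and do not come from the SAPI headers.
struct ModeInfo {
    std::string engineName;  // engine that provides this mode
    std::string language;    // language the mode recognizes
    bool dictation;          // true if the mode supports dictation
};

// Hypothetical description of what the requesting application needs.
struct ModeRequest {
    std::string language;
    bool needsDictation;
};

// Walks the list of available modes (the "enumerate" step) and returns the
// index of the first mode that satisfies the request (the "select" step),
// or -1 if no installed mode qualifies.
int findMode(const std::vector<ModeInfo>& modes, const ModeRequest& request) {
    for (std::size_t i = 0; i < modes.size(); ++i) {
        const ModeInfo& mode = modes[i];
        if (mode.language != request.language) continue;
        if (request.needsDictation && !mode.dictation) continue;
        return static_cast<int>(i);
    }
    return -1;
}
```

<p>In this sketch the list of modes can be thrown away once an index has been chosen,
which mirrors the point that the enumerator objects are needed only until an engine is
selected. </p>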
<h4>The SR <tt><font FACE="Courier">Sharing</font></tt> Object </h4>
<p>The SR <tt><font FACE="Courier">Sharing</font></tt> object is a possible replacement
for the SR <tt><font FACE="Courier">Enumerator</font></tt> and <tt><font FACE="Courier">Engine
Enumerator</font></tt> objects. The SR <tt><font FACE="Courier">Sharing</font></tt> object
uses only one interface, the <tt><font FACE="Courier">ISRSharing</font></tt> interface, to
locate and select an engine object that will be shared with other applications on the PC.
In essence, this registers the requesting application with an SR server object that runs
out of process. While often slower than creating an instance of a
private SR object, using the <tt><font FACE="Courier">Sharing</font></tt> object can
reduce strain on memory resources. </p>
<p>The SR <tt><font FACE="Courier">Sharing</font></tt> interface is an optional feature of
speech engines and may not be available depending on the design of the engine itself. </p>
<h4>The SR <tt><font FACE="Courier">Engine</font></tt> Object </h4>
<p>The SR <tt><font FACE="Courier">Engine</font></tt> object is the heart of the speech
recognition system. This object represents the actual speech engine and it supports
several interfaces for the monitoring of speech activity. The SR <tt><font FACE="Courier">Engine</font></tt>
is created using the <tt><font FACE="Courier">Select</font></tt> method of the <tt><font
FACE="Courier">ISREnum</font></tt> interface of the SR <tt><font FACE="Courier">Enumerator</font></tt>
object described earlier. Table 15.3 lists the interfaces supported by the SR <tt><font
FACE="Courier">Engine</font></tt> object along with a short description of their uses.<br>
</p>
<p align="center"><b>Table 15.3. The interfaces of the SR <tt><font SIZE="1"
FACE="Courier">Engine</font></tt> object.</b> </p>
<div align="center"><center>
<table BORDERCOLOR="#000000" BORDER="1" WIDTH="80%">
<tr>
<td><i>Interface Name</i></td>
<td WIDTH="424"><i>Description</i> </td>
</tr>
<tr>
<td WIDTH="166"><tt><font FACE="Courier">ISRCentral</font></tt> </td>
<td WIDTH="424">The main interface for the SR <tt><font FACE="Courier">Engine</font></tt>
object. Allows the loading and unloading of grammars, checks information status of the
engine, starts and stops the engine, and registers and releases the engine notification
callback. </td>
</tr>
<tr>
<td WIDTH="166"><tt><font FACE="Courier">ISRDialogs</font></tt> </td>
<td WIDTH="424">Used to display a series of dialog boxes that allow users to set
parameters of the engine and engage in training to improve the SR performance. </td>
</tr>
<tr>
<td WIDTH="166"><tt><font FACE="Courier">ISRAttributes</font></tt> </td>
<td WIDTH="424">Used to set and get basic attributes of the engine, including input device
name and type, volume controls, and other information. </td>
</tr>
<tr>
<td WIDTH="166"><tt><font FACE="Courier">ISRSpeaker</font></tt> </td>
<td WIDTH="424">Allows users to manage a list of speakers that use the engine. This is
especially valuable when more than one person uses the same device. This is an optional
interface. </td>
</tr>
<tr>
<td WIDTH="166"><tt><font FACE="Courier">ISRLexPronounce</font></tt> </td>
<td WIDTH="424">Lets users modify the pronunciation
or playback of certain words in the lexicon. This is an optional interface. </td>
</tr>
</table>
</center></div>
<p>The SR <tt><font FACE="Courier">Engine</font></tt> object also provides a notification
callback interface (<tt><font FACE="Courier">ISRNotifySink</font></tt>) to capture
messages sent by the engine. These messages can be used to check on the performance status
of the engine, and can provide feedback to the application (or speaker) that can be used
to improve performance. </p>
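<p>The notification callback idea can be illustrated with a small, self-contained sketch.
Nothing here is the real <tt><font FACE="Courier">ISRNotifySink</font></tt> definition:
<tt><font FACE="Courier">NotifySink</font></tt>, <tt><font FACE="Courier">MockEngine</font></tt>,
the method names, and the 0.2 level threshold are all assumptions made up for the example.
It shows only the general shape: the application registers a sink, the engine calls back
into it, and the application can use the messages as feedback. </p>

```cpp
#include <string>
#include <vector>

// Hypothetical callback interface standing in for a notification sink.
class NotifySink {
public:
    virtual ~NotifySink() = default;
    virtual void onAudioTooQuiet() = 0;
    virtual void onPhraseHeard(const std::string& phrase) = 0;
};

// Hypothetical engine stub that reports events to a registered sink.
class MockEngine {
public:
    void registerSink(NotifySink* sink) { sink_ = sink; }
    void releaseSink() { sink_ = nullptr; }

    // Pretend a phrase arrived at the given input level (0.0 to 1.0).
    // With no sink registered, the event is silently dropped.
    void simulatePhrase(const std::string& phrase, double level) {
        if (sink_ == nullptr) return;
        const double minLevel = 0.2;  // assumed threshold, for illustration
        if (level < minLevel) {
            sink_->onAudioTooQuiet();
        } else {
            sink_->onPhraseHeard(phrase);
        }
    }

private:
    NotifySink* sink_ = nullptr;
};

// Example sink that records each message so the application can react,
// for instance by asking the speaker to talk louder.
class LoggingSink : public NotifySink {
public:
    void onAudioTooQuiet() override { log.push_back("too quiet"); }
    void onPhraseHeard(const std::string& phrase) override {
        log.push_back("heard: " + phrase);
    }
    std::vector<std::string> log;
};
```

<p>Registering and releasing the sink through the engine object mirrors the description of
<tt><font FACE="Courier">ISRCentral</font></tt> in Table 15.3, which registers and releases
the engine notification callback. </p>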
<h4>The <tt><font FACE="Courier">Grammar</font></tt> Object</h4>
<p>The <tt><font FACE="Courier">Grammar</font></tt> object is a child object of the SR <tt><font
FACE="Courier">Engine</font></tt> object. It is used to load parsing grammars for use by
the speech engine in analyzing audio input. The <tt><font FACE="Courier">Grammar</font></tt>
object contains all the rules, words, lists, and other parameters that control how the SR
engine interprets human speech. Each phrase detected by the SR engine is processed using
the loaded grammars. </p>
<p>The <tt><font FACE="Courier">Grammar</font></tt> object supports three interfaces:
<ul>
<li><tt><font FACE="Courier">ISRGramCFG</font></tt>: This interface is used to handle grammar
functions specific to context-free grammars, including the management of lists and rules. </li>
<li><tt><font FACE="Courier">ISRGramDictation</font></tt>: This interface is used to handle
grammar functions specific to dictation grammars, including words, word groups, and sample
text. </li>
<li><tt><font FACE="Courier">ISRGramCommon</font></tt>: This interface is used to handle tasks
common to both dictation and context-free grammars. This includes loading and unloading
grammars, activating or deactivating a loaded grammar, training the engine, and possibly
storing SR results objects. </li>
</ul>
<p>The <tt><font FACE="Courier">Grammar</font></tt> object also supports a notification
callback to handle messages regarding grammar events. Optionally, the grammar object can
create an SR <tt><font FACE="Courier">Results</font></tt> object. This object is discussed
fully in the next section. </p>
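<p>The load, activate, and match cycle of a grammar can be modeled in a few lines. This is a
deliberately simplified sketch under stated assumptions: <tt><font FACE="Courier">PhraseGrammar</font></tt>
and <tt><font FACE="Courier">LoadedGrammars</font></tt> are hypothetical names, and a grammar
is reduced to a flat list of phrases, whereas a real grammar holds rules, lists, and other
parameters. </p>

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, simplified grammar: a named list of phrases plus an
// "active" flag. Real grammars carry rules and lists, not just phrases.
struct PhraseGrammar {
    std::string name;
    std::vector<std::string> phrases;
    bool active;
};

// Hypothetical holder for the grammars an engine has loaded.
class LoadedGrammars {
public:
    // Load a grammar (initially inactive) and return its handle.
    std::size_t load(std::string name, std::vector<std::string> phrases) {
        grammars_.push_back(
            PhraseGrammar{std::move(name), std::move(phrases), false});
        return grammars_.size() - 1;
    }

    // Activate or deactivate a loaded grammar.
    // vector::at throws std::out_of_range if the handle is invalid.
    void setActive(std::size_t handle, bool active) {
        grammars_.at(handle).active = active;
    }

    // Return the name of the first active grammar that contains the phrase,
    // or an empty string if no active grammar matches.
    std::string match(const std::string& heard) const {
        for (const PhraseGrammar& grammar : grammars_) {
            if (!grammar.active) continue;
            for (const std::string& phrase : grammar.phrases) {
                if (phrase == heard) return grammar.name;
            }
        }
        return std::string();
    }

private:
    std::vector<PhraseGrammar> grammars_;
};
```

<p>Keeping loading separate from activation lets an application load several grammars up
front and switch between them as its context changes, without reloading each time. </p>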
<h4>The SR <tt><font FACE="Courier">Results</font></tt> Object </h4>
<p>The SR <tt><font FACE="Courier">Results</font></tt> object contains detailed
information about the most recent speech recognition event. This could include a recorded
representation of the speech, the interpreted phrase constructed by the engine, the name
of the speaker, performance statistics, and so on. </p>
<div align="center"><center>
<table BORDERCOLOR="#000000" BORDER="1" WIDTH="80%">
<tr>
<td><b>Note</b></td>
</tr>
<tr>
<td><blockquote>
<p>The SR <tt><font FACE="Courier">Results</font></tt> object is optional and is not
supported by all engines. </p>
</blockquote>
</td>
</tr>
</table>
</center></div>
<p>Table 15.4 shows the interfaces defined for the SR <tt><font FACE="Courier">Results</font></tt>
object, along with descriptions of their use. Only the first interface in the table is
required (the <tt><font FACE="Courier">ISRResBasic</font></tt> interface).<br>
</p>
<p align="center"><b>Table 15.4. The defined interfaces for the SR <tt><font
FACE="Courier">Results</font></tt> object.</b> </p>
<div align="center"><center>
<table BORDERCOLOR="#000000" BORDER="1" WIDTH="80%">
<tr>
<td><i>Interface Name</i></td>
<td WIDTH="414"><i>Description</i> </td>
</tr>
<tr>
<td WIDTH="176"><tt><font FACE="Courier">ISRResBasic</font></tt> </td>
<td WIDTH="414">Used to provide basic information about the results object, including an
audio representation of the phrase, the selected interpretation of the audio, the grammar
used to analyze the input, and the start and stop time of the recognition event. </td>
</tr>
<tr>
<td WIDTH="176"><tt><font FACE="Courier">ISRResAudio</font></tt> </td>
<td WIDTH="414">Used to retrieve an audio representation of the recognized phrase. This
audio can be played back to the speaker or saved as a WAV file for later
review. </td>
</tr>
<tr>
<td WIDTH="176"><tt><font FACE="Courier">ISRResGraph</font></tt> </td>
<td WIDTH="414">Used to produce a graphic representation of the recognition event. This
graph could show the phonemes used to construct the phrase, show the engine's
"score" for accurately detecting the phrase, and so on. </td>
</tr>
<tr>
<td WIDTH="176"><tt><font FACE="Courier">ISRResCorrection</font></tt> </td>
<td WIDTH="414">Used to provide an opportunity to confirm that the interpretation was
accurate, possibly allowing for a correction in the analysis. </td>
</tr>