<table BORDERCOLOR="#000000" BORDER="1" WIDTH="80%">
  <tr>
    <td><i>Dialog Name</i></td>
    <td WIDTH="462"><i>Description</i> </td>
  </tr>
  <tr>
    <td WIDTH="129">About Box</td>
    <td WIDTH="462">Used to display the dialog box that identifies the TTS engine and shows 
    its copyright information. </td>
  </tr>
  <tr>
    <td WIDTH="129">Lexicon Dialog</td>
    <td WIDTH="462">Can be used to offer the speaker the opportunity to alter the 
    pronunciation lexicon, including altering the phonetic spelling of troublesome words, or 
    adding or deleting personal vocabulary files. </td>
  </tr>
  <tr>
    <td WIDTH="129">General Dialog</td>
    <td WIDTH="462">Can be used to display general information about the TTS engine. Examples 
    might be controlling the speed at which the text will be read, the character of the voice 
    that will be used for playback, and other user preferences as supported by the TTS engine. 
    </td>
  </tr>
  <tr>
    <td WIDTH="129">Translate Dialog</td>
    <td WIDTH="462">Can be used to offer the user the ability to alter the pronunciation of 
    key words in the lexicon. For example, the TTS engine that ships with Microsoft Voice has 
    a special entry that forces the speech engine to express all occurrences of 
    &quot;TTS&quot; as &quot;text to speech,&quot; instead of just reciting the letters 
    &quot;T-T-S.&quot; </td>
  </tr>
</table>
</center></div>

<h2><a NAME="LowLevelSAPI"><font SIZE="5" COLOR="#FF0000">Low-Level SAPI</font></a></h2>

<p>The low-level SAPI services give applications much finer control over Windows speech 
recognition and text-to-speech services. This level is best for implementing advanced SR 
and TTS services, including the creation of dictation systems. </p>

<p>Just as there are two basic service types for high-level SAPI, there are two primary 
COM interfaces defined for low-level SAPI: one for speech recognition and one for 
text-to-speech services. The rest of this chapter outlines each of the objects and their 
interfaces. </p>
<div align="center"><center>

<table BORDERCOLOR="#000000" BORDER="1" WIDTH="80%">
  <tr>
    <td><b>Note</b></td>
  </tr>
  <tr>
    <td><blockquote>
      <p>This section of the chapter covers the low-level SAPI services. These services are 
      available only from C or C++ programs, not from Visual Basic. However, even if you do not 
      program in C, you can still learn a lot from this section of the chapter. The material in 
      this section can give you a good understanding of the details behind the SAPI OLE 
      automation objects, and may also give you some ideas on how you can use the VB-level SAPI 
      services in your programs.</p>
    </blockquote>
    </td>
  </tr>
</table>
</center></div>

<h3><a NAME="SpeechRecognition">Speech Recognition</a></h3>

<p>The <tt><font FACE="Courier">Speech Recognition</font></tt> object has several child 
objects and collections. There are two top-level objects in the SR system: the SR <tt><font
FACE="Courier">Engine Enumerator</font></tt> object and the SR <tt><font FACE="Courier">Sharing</font></tt> 
object. These two objects are created using their unique CLSID (class ID) values. The 
purpose of both objects is to give an application information about the available speech 
recognition engines and allow the application to register with the appropriate engine. 
Once the engine is selected, one or more grammar objects can be created. As each phrase 
is heard, the engine creates an SR <tt><font FACE="Courier">Results</font></tt> object, a 
temporary object that contains details about the phrase that was captured by the speech 
recognition engine. Figure 15.2 shows how the different objects relate to each other and 
how they are created. </p>

<p><a HREF="f15-2.gif"><b>Figure 15.2 : </b><i>Mapping the low-level SAPI objects.</i></a> 
</p>

<p>When an SR engine is created, a link to a valid audio input device is also created. 
While it is possible to create a custom audio input device, it is not required. By 
default, the audio input device is an attached microphone, but the engine can also be set 
to use a telephone device. </p>

<p>The rest of this section details the low-level SAPI SR objects and their interfaces. </p>

<h4>The SR <tt><font FACE="Courier">Enumerator</font></tt> and <tt><font FACE="Courier">Engine 
Enumerator</font></tt> Objects </h4>

<p>The role of the SR <tt><font FACE="Courier">Enumerator</font></tt> and <tt><font
FACE="Courier">Engine Enumerator</font></tt> objects is to locate and select an 
appropriate SR engine for the requesting application. The <tt><font FACE="Courier">Enumerator</font></tt> 
object lists all available speech recognition modes and their associated installed 
engines. This information is supplied by the child object of the <tt><font FACE="Courier">Enumerator</font></tt> 
object: the <tt><font FACE="Courier">Engine Enumerator</font></tt> object. The result of 
this search is a pointer to the SR engine interface that best meets the service request. </p>

<p>The <tt><font FACE="Courier">Enumerator</font></tt> and <tt><font FACE="Courier">Engine 
Enumerator</font></tt> objects support only two interfaces: 

<ul>
  <li><font COLOR="#000000">The </font><tt><font FACE="Courier">ISREnum</font></tt> interface 
    is used to get a list of all available engines. </li>
  <li><font COLOR="#000000">The </font><tt><font FACE="Courier">ISRFind</font></tt> interface 
    is used to select the desired engine. </li>
</ul>
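<p>The division of labor between listing engines and selecting one can be sketched in 
plain C++. This is a minimal model only, not the SAPI API: the <tt><font FACE="Courier">ModeInfo</font></tt> 
structure, the <tt><font FACE="Courier">enumerateModes</font></tt> and <tt><font FACE="Courier">findBestMode</font></tt> 
functions, and the sample data are all hypothetical names invented for illustration. A 
real program would obtain these services through the COM interfaces listed above. </p>

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for the mode information an engine publishes.
// The field names are illustrative and are not the SAPI structures.
struct ModeInfo {
    std::string modeName;
    std::string language;
    bool supportsDictation;
};

// Role of the enumerator: list every installed mode.
// The entries here are made-up sample data.
std::vector<ModeInfo> enumerateModes() {
    return {
        {"Command Mode A", "en-US", false},
        {"Dictation Mode B", "en-US", true},
        {"Command Mode C", "fr-FR", false},
    };
}

// Role of the find step: score each mode against the request and return
// the index of the best match, or -1 if no mode supports the language.
int findBestMode(const std::vector<ModeInfo>& modes,
                 const std::string& language, bool wantDictation) {
    int best = -1;
    int bestScore = -1;
    for (std::size_t i = 0; i < modes.size(); ++i) {
        if (modes[i].language != language) continue;  // language is mandatory
        int score = (modes[i].supportsDictation == wantDictation) ? 1 : 0;
        if (score > bestScore) {
            bestScore = score;
            best = static_cast<int>(i);
        }
    }
    return best;
}
```

<p>In this sketch the language is treated as a hard requirement and dictation support as 
a preference, so a request can still succeed with an imperfect match. That scoring rule 
is an assumption of the example, not a description of how any particular engine ranks its 
modes. </p>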
<div align="center"><center>

<table BORDERCOLOR="#000000" BORDER="1" WIDTH="80%">
  <tr>
    <td><b>Note</b></td>
  </tr>
  <tr>
    <td><blockquote>
      <p>The SR <tt><font FACE="Courier">Enumerator</font></tt> and <tt><font FACE="Courier">Engine 
      Enumerator</font></tt> objects are used only to locate and select an engine object. Once 
      that is done, these two objects can be discarded. </p>
    </blockquote>
    </td>
  </tr>
</table>
</center></div>

<h4>The SR <tt><font FACE="Courier">Sharing</font></tt> Object </h4>

<p>The SR <tt><font FACE="Courier">Sharing</font></tt> object is a possible replacement 
for the SR <tt><font FACE="Courier">Enumerator</font></tt> and <tt><font FACE="Courier">Engine 
Enumerator</font></tt> objects. The SR <tt><font FACE="Courier">Sharing</font></tt> object 
uses only one interface, the <tt><font FACE="Courier">ISRSharing</font></tt> interface, to 
locate and select an engine object that will be shared with other applications on the PC. 
In essence, this registers the requesting application with an out-of-process SR server 
object. While often slower than creating an instance of a private SR object, using the 
<tt><font FACE="Courier">Sharing</font></tt> object can reduce strain on memory resources. </p>

<p>The SR <tt><font FACE="Courier">Sharing</font></tt> interface is an optional feature of 
speech engines and may not be available depending on the design of the engine itself. </p>

<h4>The SR <tt><font FACE="Courier">Engine</font></tt> Object </h4>

<p>The SR <tt><font FACE="Courier">Engine</font></tt> object is the heart of the speech 
recognition system. This object represents the actual speech engine and it supports 
several interfaces for the monitoring of speech activity. The SR <tt><font FACE="Courier">Engine</font></tt> 
is created using the <tt><font FACE="Courier">Select</font></tt> method of the <tt><font
FACE="Courier">ISREnum</font></tt> interface of the SR <tt><font FACE="Courier">Enumerator</font></tt> 
object described earlier. Table 15.3 lists the interfaces supported by the SR <tt><font
FACE="Courier">Engine</font></tt> object along with a short description of their uses.<br>
</p>

<p align="center"><b>Table 15.3. The interfaces of the SR <tt><font SIZE="1"
FACE="Courier">Engine</font></tt> object.</b> </p>
<div align="center"><center>

<table BORDERCOLOR="#000000" BORDER="1" WIDTH="80%">
  <tr>
    <td><i>Interface Name</i></td>
    <td WIDTH="424"><i>Description</i> </td>
  </tr>
  <tr>
    <td WIDTH="166"><tt><font FACE="Courier">ISRCentral</font></tt> </td>
    <td WIDTH="424">The main interface for the SR <tt><font FACE="Courier">Engine</font></tt> 
    object. Allows the loading and unloading of grammars, checks information status of the 
    engine, starts and stops the engine, and registers and releases the engine notification 
    callback. </td>
  </tr>
  <tr>
    <td WIDTH="166"><tt><font FACE="Courier">ISRDialogs</font></tt> </td>
    <td WIDTH="424">Used to display a series of dialog boxes that allow users to set 
    parameters of the engine and engage in training to improve the SR performance. </td>
  </tr>
  <tr>
    <td WIDTH="166"><tt><font FACE="Courier">ISRAttributes</font></tt> </td>
    <td WIDTH="424">Used to set and get basic attributes of the engine, including input device 
    name and type, volume controls, and other information. </td>
  </tr>
  <tr>
    <td WIDTH="166"><tt><font FACE="Courier">ISRSpeaker</font></tt> </td>
    <td WIDTH="424">Allows users to manage a list of speakers that use the engine. This is 
    especially valuable when more than one person uses the same device. This is an optional 
    interface. </td>
  </tr>
  <tr>
    <td WIDTH="166"><tt><font FACE="Courier">ISRLexPronounce</font></tt> </td>
    <td WIDTH="424">Lets users modify the pronunciation or playback of certain words in the 
    lexicon. This is an optional interface. </td>
  </tr>
</table>
</center></div>
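<p>Because some of the interfaces in Table 15.3 are optional, an application should check 
whether an engine supports one before using it. The following sketch models that check in 
portable C++. It is an illustration under stated assumptions: <tt><font FACE="Courier">CentralLike</font></tt>, 
<tt><font FACE="Courier">SpeakerLike</font></tt>, and the two engine classes are hypothetical, 
and <tt><font FACE="Courier">dynamic_cast</font></tt> stands in for the COM <tt><font FACE="Courier">QueryInterface</font></tt> 
call that a real SAPI program would make. </p>

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-ins for a required and an optional engine interface.
struct CentralLike {
    virtual ~CentralLike() = default;
    virtual std::string engineName() const = 0;
};
struct SpeakerLike {
    virtual ~SpeakerLike() = default;
    virtual std::vector<std::string> speakers() const = 0;
};

// An engine that implements only the required interface.
struct BasicEngine : CentralLike {
    std::string engineName() const override { return "basic"; }
};

// An engine that also implements the optional speaker interface.
struct MultiUserEngine : CentralLike, SpeakerLike {
    std::string engineName() const override { return "multi-user"; }
    std::vector<std::string> speakers() const override {
        return {"alice", "bob"};
    }
};

// Probe for the optional interface. dynamic_cast plays the role that
// QueryInterface plays in COM. Returns an empty list when unsupported.
std::vector<std::string> listSpeakers(const CentralLike& engine) {
    if (const auto* sp = dynamic_cast<const SpeakerLike*>(&engine)) {
        return sp->speakers();
    }
    return {};
}
```

<p>The point of the pattern is that the caller degrades gracefully: an engine without the 
optional interface yields an empty speaker list rather than a failure. </p>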

<p>The SR <tt><font FACE="Courier">Engine</font></tt> object also provides a notification 
callback interface (<tt><font FACE="Courier">ISRNotifySink</font></tt>) to capture 
messages sent by the engine. These messages can be used to check on the performance status 
of the engine, and can provide feedback to the application (or speaker) that can be used 
to improve performance. </p>
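<p>The register, notify, and release cycle of a callback interface can be sketched as 
follows. This is a toy model, not the SAPI interface: <tt><font FACE="Courier">NotifySinkLike</font></tt>, 
<tt><font FACE="Courier">ToyEngine</font></tt>, <tt><font FACE="Courier">LoggingSink</font></tt>, 
and every method name are hypothetical, and the simulated event stands in for whatever 
messages a real engine would send. </p>

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical callback interface. The method names are illustrative
// and are not the members of ISRNotifySink.
struct NotifySinkLike {
    virtual ~NotifySinkLike() = default;
    virtual void onUtteranceBegin() = 0;
    virtual void onInterference(const std::string& reason) = 0;
};

// Toy engine that hands out a key per registered sink so the
// application can release the callback later.
class ToyEngine {
public:
    int registerSink(NotifySinkLike* sink) {
        int key = nextKey_++;
        sinks_[key] = sink;
        return key;
    }

    // Returns true if the key was registered and has now been released.
    bool unregisterSink(int key) { return sinks_.erase(key) == 1; }

    // Pretend the engine heard speech over background noise and
    // tell every registered sink about it.
    void simulateNoisyUtterance() {
        for (auto& entry : sinks_) {
            entry.second->onUtteranceBegin();
            entry.second->onInterference("background noise");
        }
    }

private:
    std::map<int, NotifySinkLike*> sinks_;
    int nextKey_ = 1;
};

// Application-side sink that records what it was told, so the
// application can give feedback to the speaker.
struct LoggingSink : NotifySinkLike {
    std::vector<std::string> log;
    void onUtteranceBegin() override { log.push_back("utterance begin"); }
    void onInterference(const std::string& reason) override {
        log.push_back("interference: " + reason);
    }
};
```

<p>Returning a key from registration, as this sketch does, lets the application release 
exactly the callback it installed without the engine having to compare sink pointers. </p>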

<h4>The <tt><font FACE="Courier">Grammar</font></tt> Object</h4>

<p>The <tt><font FACE="Courier">Grammar</font></tt> object is a child object of the SR <tt><font
FACE="Courier">Engine</font></tt> object. It is used to load parsing grammars for use by 
the speech engine in analyzing audio input. The <tt><font FACE="Courier">Grammar</font></tt> 
object contains all the rules, words, lists, and other parameters that control how the SR 
engine interprets human speech. Each phrase detected by the SR engine is processed using 
the loaded grammars. </p>

<p>The <tt><font FACE="Courier">Grammar</font></tt> object supports three interfaces: 

<ul>
  <li><tt><font FACE="Courier">ISRGramCFG</font></tt>: This interface is used to handle grammar 
    functions specific to context-free grammars, including the management of lists and rules. </li>
  <li><tt><font FACE="Courier">ISRGramDictation</font></tt>: This interface is used to handle 
    grammar functions specific to dictation grammars, including words, word groups, and sample 
    text. </li>
  <li><tt><font FACE="Courier">ISRGramCommon</font></tt>: This interface is used to handle tasks 
    common to both dictation and context-free grammars. This includes loading and unloading 
    grammars, activating or deactivating a loaded grammar, training the engine, and possibly 
    storing SR results objects. </li>
</ul>

<p>The <tt><font FACE="Courier">Grammar</font></tt> object also supports a notification 
callback to handle messages regarding grammar events. Optionally, the grammar object can 
create an SR <tt><font FACE="Courier">Results</font></tt> object. This object is discussed 
fully in the next section. </p>
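<p>To make the idea of rules, lists, and activation more concrete, here is a deliberately 
tiny model of a context-free grammar with a single rule: a fixed command word followed by 
one word from a replaceable list. The <tt><font FACE="Courier">ToyGrammar</font></tt> class 
and its methods are hypothetical and work on text rather than audio. They are meant only 
to show how a loaded grammar constrains what counts as a valid phrase. </p>

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Toy grammar with one rule: <command> <item-from-list>.
// Hypothetical model, not a SAPI grammar object.
class ToyGrammar {
public:
    ToyGrammar(std::string command, std::vector<std::string> list)
        : command_(std::move(command)), list_(std::move(list)) {}

    // Replace the list the rule draws from without touching the rule.
    void setList(std::vector<std::string> list) { list_ = std::move(list); }

    void activate() { active_ = true; }
    void deactivate() { active_ = false; }

    // Returns true and fills 'item' when the grammar is active and the
    // phrase is exactly two words that fit the rule.
    bool recognize(const std::string& phrase, std::string& item) const {
        if (!active_) return false;
        std::istringstream words(phrase);
        std::string first, second, extra;
        if (!(words >> first >> second) || (words >> extra)) return false;
        if (first != command_) return false;
        for (const auto& candidate : list_) {
            if (candidate == second) {
                item = second;
                return true;
            }
        }
        return false;
    }

private:
    std::string command_;
    std::vector<std::string> list_;
    bool active_ = false;  // a loaded grammar starts out inactive here
};
```

<p>In this model, a grammar built with the command "open" and the list "mail", "calendar" 
accepts "open mail" once activated, and rejects "open notes" until the list is replaced 
with one that contains "notes". </p>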

<h4>The SR <tt><font FACE="Courier">Results</font></tt> Object </h4>

<p>The SR <tt><font FACE="Courier">Results</font></tt> object contains detailed 
information about the most recent speech recognition event. This could include a recorded 
representation of the speech, the interpreted phrase constructed by the engine, the name 
of the speaker, performance statistics, and so on. </p>
<div align="center"><center>

<table BORDERCOLOR="#000000" BORDER="1" WIDTH="80%">
  <tr>
    <td><b>Note</b></td>
  </tr>
  <tr>
    <td><blockquote>
      <p>The SR <tt><font FACE="Courier">Results</font></tt> object is optional and is not 
      supported by all engines. </p>
    </blockquote>
    </td>
  </tr>
</table>
</center></div>

<p>Table 15.4 shows the interfaces defined for the SR <tt><font FACE="Courier">Results</font></tt> 
object, along with descriptions of their use. Only the first interface in the table is 
required (the <tt><font FACE="Courier">ISRResBasic</font></tt> interface).<br>
</p>

<p align="center"><b>Table 15.4. The defined interfaces for the SR <tt><font
FACE="Courier">Results</font></tt> object.</b> </p>
<div align="center"><center>

<table BORDERCOLOR="#000000" BORDER="1" WIDTH="80%">
  <tr>
    <td><i>Interface Name</i></td>
    <td WIDTH="414"><i>Description</i> </td>
  </tr>
  <tr>
    <td WIDTH="176"><tt><font FACE="Courier">ISRResBasic</font></tt> </td>
    <td WIDTH="414">Used to provide basic information about the results object, including an 
    audio representation of the phrase, the selected interpretation of the audio, the grammar 
    used to analyze the input, and the start and stop time of the recognition event. </td>
  </tr>
  <tr>
    <td WIDTH="176"><tt><font FACE="Courier">ISRResAudio</font></tt> </td>
    <td WIDTH="414">Used to retrieve an audio representation of the recognized phrase. This 
    audio file can be played back to the speaker or saved as a WAV format file for later 
    review. </td>
  </tr>
  <tr>
    <td WIDTH="176"><tt><font FACE="Courier">ISRResGraph</font></tt> </td>
    <td WIDTH="414">Used to produce a graphic representation of the recognition event. This 
    graph could show the phonemes used to construct the phrase, show the engine's 
    &quot;score&quot; for accurately detecting the phrase, and so on. </td>
  </tr>
  <tr>
    <td WIDTH="176"><tt><font FACE="Courier">ISRResCorrection</font></tt> </td>
    <td WIDTH="414">Used to provide an opportunity to confirm that the interpretation was 
    accurate, possibly allowing for a correction in the analysis. </td>
  </tr>