<html>

<head>
<title>Chapter 15 -- SAPI Architecture</title>
<meta NAME="GENERATOR" CONTENT="Microsoft FrontPage 3.0">
</head>

<body TEXT="#000000" BGCOLOR="#FFFFFF" LINK="#0000EE" VLINK="#551A8B" ALINK="#CE2910">

<h1><font COLOR="#FF0000">Chapter 15</font></h1>

<h1><b><font SIZE="5" COLOR="#FF0000">SAPI Architecture</font></b> </h1>

<hr WIDTH="100%">

<h3 ALIGN="CENTER"><font SIZE="+2" COLOR="#000000">CONTENTS<a NAME="CONTENTS"></a> </font></h3>

<ul>
  <li><a HREF="#Introduction">Introduction</a> </li>
  <li><a HREF="#HighLevelSAPI">High-Level SAPI</a> <ul>
      <li><a HREF="#VoiceCommand">Voice Command</a> </li>
      <li><a HREF="#VoiceText">Voice Text</a> </li>
    </ul>
  </li>
  <li><a HREF="#LowLevelSAPI">Low-Level SAPI</a> <ul>
      <li><a HREF="#SpeechRecognition">Speech Recognition</a> </li>
      <li><a HREF="#TexttoSpeech">Text-to-Speech</a> </li>
    </ul>
  </li>
  <li><a HREF="#SpeechObjectsandOLEAutomation">Speech Objects and OLE Automation</a> <ul>
      <li><a HREF="#OLEAutomationSpeechRecognitionServic">OLE Automation Speech Recognition 
        Services</a> </li>
      <li><a HREF="#OLEAutomationTexttoSpeechServices">OLE Automation Text-to-Speech Services</a> </li>
    </ul>
  </li>
  <li><a HREF="#Summary">Summary</a> </li>
</ul>

<hr>

<h2><a NAME="Introduction"><font SIZE="5" COLOR="#FF0000">Introduction</font></a> </h2>

<p>The Speech API is implemented as a series of Component Object Model (COM) interfaces. 
This chapter identifies the top-level objects, their child objects, and their methods. </p>

<p>The SAPI model is divided into two distinct levels: </p>

<ul>
  <li><i>High-level SAPI</i>: This level provides basic speech services in the form of 
    command-and-control speech recognition and simple text-to-speech output. </li>
  <li><i>Low-level SAPI</i>: This level provides detailed access to all speech services, 
    including direct interfaces to control dialogs and manipulation of both speech recognition 
    (SR) and text-to-speech (TTS) behavior attributes. </li>
</ul>

<p>Each of the two levels of SAPI services has its own set of objects and methods. </p>

<p>Along with the two sets of COM interfaces, Microsoft has also published an OLE 
Automation type library for the high-level SAPI objects. This set of OLE objects is 
discussed at the end of the chapter. </p>

<p>When you complete this chapter, you'll understand the basic architecture of the SAPI 
model, including all the SAPI objects and their uses. Detailed information about each 
object's methods and parameters is covered in the next chapter, &quot;SAPI 
Basics.&quot; </p>
<div align="center"><center>

<table BORDERCOLOR="#000000" BORDER="1" WIDTH="80%">
  <tr>
    <td><b>Note</b></td>
  </tr>
  <tr>
    <td><blockquote>
      <p>Most of the Microsoft Speech API is accessible only through C++ code. For this reason, 
      many of the examples shown in this chapter are expressed in Microsoft Visual C++ code. You 
      do not need to be able to code in C++ in order to understand the information discussed 
      here. At the end of this chapter, the OLE Automation objects available through Visual 
      Basic are also discussed.</p>
    </blockquote>
    </td>
  </tr>
</table>
</center></div>

<h2><a NAME="HighLevelSAPI"><font SIZE="5" COLOR="#FF0000">High-Level SAPI</font></a></h2>

<p>The high-level SAPI services provide access to basic forms of speech recognition and 
text-to-speech services. This is ideal for providing voice-activated menus, command 
buttons, and so on. It is also sufficient for basic rendering of text into speech. </p>

<p>The high-level SAPI interface has two top-level objects: one for voice command services 
(speech recognition), and one for voice text services (text-to-speech). The following two 
sections describe each of these top-level objects, their child objects, and the interfaces 
available through each object. </p>

<h3><a NAME="VoiceCommand">Voice Command</a></h3>

<p>The <tt><font FACE="Courier">Voice Command</font></tt> object is used to provide speech 
recognition services. It is useful for providing simple command-and-control speech 
services such as implementing menu options, activating command buttons, and issuing other 
simple operating system commands. </p>

<p>The <tt><font FACE="Courier">Voice Command</font></tt> object has one child object and 
one collection object. The child object is the <tt><font FACE="Courier">Voice Menu</font></tt> 
object and the collection object is a collection of enumerated menu objects (see Figure 
15.1). </p>

<p><a HREF="f15-1.gif"><b>Figure 15.1 : </b><i>The Voice Command object.</i></a> </p>

<h4><tt><font FACE="Courier">Voice Command</font></tt> Object </h4>

<p>The <tt><font FACE="Courier">Voice Command</font></tt> object supports three 
interfaces: 

<ul>
  <li><font COLOR="#000000">The </font><tt><font FACE="Courier">Voice Command</font></tt> 
    interface </li>
  <li><font COLOR="#000000">The </font><tt><font FACE="Courier">Attributes</font></tt> 
    interface </li>
  <li><font COLOR="#000000">The </font><tt><font FACE="Courier">Dialogs</font></tt> interface </li>
</ul>

<p>The <tt><font FACE="Courier">Voice Command</font></tt> interface is used to enumerate, 
create, and delete voice menu objects. This interface is also used to register an 
application to use the SR engine. An application must successfully complete the 
registration before the SR engine can be used. An additional method defined for the <tt><font
FACE="Courier">Voice Command</font></tt> interface is the <tt><font FACE="Courier">Mimic</font></tt> 
method. This is used to play back a voice command to the engine; it can be used to 
&quot;speak&quot; voice commands directly to the SR engine. This is similar to playing 
keystroke or mouse-action macros back to the operating system. </p>
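<p>The responsibilities described above (register first, then create, enumerate, and delete 
menus, and replay a phrase with <tt><font FACE="Courier">Mimic</font></tt>) can be sketched 
with a small stand-in class. This is only an illustration of the call sequence: <tt><font 
FACE="Courier">MockVoiceCommand</font></tt> and all of its method names are hypothetical, it 
does not use the Speech SDK headers, and the real COM interface will differ in names, 
parameters, and error handling. </p>

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for the Voice Command object. These names are
// illustrative only and are not taken from the Speech SDK headers.
class MockVoiceCommand {
public:
    // Registration must succeed before menus can be created.
    bool Register(const std::string& appName) {
        if (appName.empty()) return false;
        registered_ = true;
        return true;
    }

    // Create an empty voice menu. Fails if the application is not
    // registered or a menu with this name already exists.
    bool MenuCreate(const std::string& menuName) {
        if (!registered_) return false;
        return menus_.emplace(menuName, std::vector<std::string>{}).second;
    }

    // Add a command phrase to an existing menu. (In the chapter's model
    // this job belongs to the Voice Menu child object.)
    bool AddCommand(const std::string& menuName, const std::string& phrase) {
        auto it = menus_.find(menuName);
        if (it == menus_.end()) return false;
        it->second.push_back(phrase);
        return true;
    }

    // Delete a menu by name; returns false if it does not exist.
    bool MenuDelete(const std::string& menuName) {
        return menus_.erase(menuName) == 1;
    }

    // Enumerate the names of all menus, in sorted order.
    std::vector<std::string> MenuEnum() const {
        std::vector<std::string> names;
        for (const auto& entry : menus_) names.push_back(entry.first);
        return names;
    }

    // Mimic: hand a phrase to the recognizer as if it had been spoken.
    // Returns the name of the menu that owns the phrase, or an empty
    // string when no menu contains it.
    std::string Mimic(const std::string& phrase) const {
        for (const auto& entry : menus_) {
            for (const auto& command : entry.second) {
                if (command == phrase) return entry.first;
            }
        }
        return std::string();
    }

private:
    bool registered_ = false;
    std::map<std::string, std::vector<std::string>> menus_;
};
```

<p>In this sketch a call to <tt><font FACE="Courier">MenuCreate</font></tt> made before <tt><font 
FACE="Courier">Register</font></tt> simply returns <tt><font FACE="Courier">false</font></tt>, 
which models the rule that registration must come first. </p>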

<p>The <tt><font FACE="Courier">Attributes</font></tt> interface is used to set and 
retrieve a number of basic parameters that control the behavior of the voice command 
system. You can enable or disable voice commands, adjust input gain, establish the SR 
mode, and control the input device (microphone or telephone). </p>

<p>The <tt><font FACE="Courier">Dialogs</font></tt> interface gives you access to a series 
of dialog boxes that can be used as a standard set of input screens for setting and 
displaying SR engine information. The SAPI model identifies five different dialog boxes 
that should be available through the <tt><font FACE="Courier">Dialogs</font></tt> 
interface. The exact layout and content of these dialog boxes is not dictated by 
Microsoft, but is determined by the developer of the speech recognition engine. However, 
Microsoft has established general guidelines for the contents of the SR engine dialog 
boxes. Table 15.1 lists each of the five defined dialog boxes along with short 
descriptions of their suggested contents.<br>
</p>

<p align="center"><b>Table 15.1. The Voice Command dialog boxes.</b> </p>
<div align="center"><center>

<table BORDERCOLOR="#000000" BORDER="1" WIDTH="80%">
  <tr>
    <td><i>Dialog Box Name </i></td>
    <td WIDTH="421"><i>Description</i> </td>
  </tr>
  <tr>
    <td WIDTH="170">About Box</td>
    <td WIDTH="421">Used to display the dialog box that identifies the SR engine and shows its 
    copyright information. </td>
  </tr>
  <tr>
    <td WIDTH="170">Command Verification</td>
    <td WIDTH="421">Can be used as a verification pop-up window during a speech recognition 
    session. When the engine identifies a word or phrase, this box can appear requesting the 
    user to confirm that the engine has correctly understood the spoken command. </td>
  </tr>
  <tr>
    <td WIDTH="170">General Dialog</td>
    <td WIDTH="421">Can be used to provide general access to the SR engine settings, such as 
    identifying the speaker, controlling recognition parameters, and setting the amount of disk 
    space allotted to the SR engine. </td>
  </tr>
  <tr>
    <td WIDTH="170">Lexicon Dialog</td>
    <td WIDTH="421">Can be used to offer the speaker the opportunity to alter the 
    pronunciation lexicon, including altering the phonetic spelling of troublesome words, or 
    adding or deleting personal vocabulary files. </td>
  </tr>
</table>
</center></div>

<h4>The <tt><font FACE="Courier">Voice Menu</font></tt> Object and the Menu Object 
Collection</h4>

<p>The <tt><font FACE="Courier">Voice Menu</font></tt> object is the only child object of 
the <tt><font FACE="Courier">Voice Command</font></tt> object. It is used to allow 
applications to define, add, and delete voice commands in a menu. You can also use the <tt><font
FACE="Courier">Voice Menu</font></tt> object to activate and deactivate menus and, 
optionally, to provide a training dialog box for the menu. </p>

<p>The voice menu collection object contains a set of all menu objects defined in the 
voice command database. Microsoft SAPI defines functions to select and copy menu 
collections for use by the voice command speech engine. </p>

<h4>The Voice Command Notification Callback</h4>

<p>In the process of registering the application to use a voice command object, a 
notification <i>callback</i> (or <i>sink</i>) is established. This callback receives 
messages regarding the SR engine activity. Typical messages sent out by the SR engine can 
include notifications that the engine has detected commands being spoken, that some 
attribute of the engine has been changed, or that spoken commands have been heard but not 
recognized. </p>
<div align="center"><center>

<table BORDERCOLOR="#000000" BORDER="1" WIDTH="80%">
  <tr>
    <td><b>Note</b></td>
  </tr>
  <tr>
    <td><blockquote>
      <p>Notification callbacks require a pointer to the function that will receive all related 
      messages. Callbacks cannot be registered using Visual Basic; you need C or C++. However, 
      the voice command OLE Automation type library that ships with the Speech SDK has a 
      notification callback built into it.</p>
    </blockquote>
    </td>
  </tr>
</table>
</center></div>
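
<p>The callback arrangement described in this section can be pictured as an abstract sink 
class that the application implements and hands to the engine at registration time. The 
sketch below is a plain C++ model, not COM code: <tt><font FACE="Courier">CommandNotifySink</font></tt>, 
<tt><font FACE="Courier">LoggingSink</font></tt>, and <tt><font FACE="Courier">MockEngine</font></tt> 
are hypothetical names chosen for illustration, and the three messages are the ones named in 
the paragraph above. </p>

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical sink interface: one virtual method per kind of message.
struct CommandNotifySink {
    virtual ~CommandNotifySink() = default;
    virtual void OnCommandRecognized(const std::string& phrase) = 0;
    virtual void OnCommandNotRecognized() = 0;
    virtual void OnAttributeChanged(const std::string& name) = 0;
};

// An application-side sink that records every message it receives.
struct LoggingSink : CommandNotifySink {
    std::vector<std::string> log;

    void OnCommandRecognized(const std::string& phrase) override {
        log.push_back("recognized: " + phrase);
    }
    void OnCommandNotRecognized() override {
        log.push_back("not recognized");
    }
    void OnAttributeChanged(const std::string& name) override {
        log.push_back("attribute changed: " + name);
    }
};

// Toy engine: keeps the sink pointer supplied at registration and
// dispatches messages to it. The engine does not own the sink.
class MockEngine {
public:
    void Register(CommandNotifySink* sink) { sink_ = sink; }

    void AddPhrase(const std::string& phrase) { vocabulary_.push_back(phrase); }

    // Simulate hearing an utterance and notify the sink of the outcome.
    void Hear(const std::string& utterance) {
        if (sink_ == nullptr) return;
        bool known = std::find(vocabulary_.begin(), vocabulary_.end(),
                               utterance) != vocabulary_.end();
        if (known) {
            sink_->OnCommandRecognized(utterance);
        } else {
            sink_->OnCommandNotRecognized();
        }
    }

    // Simulate a change to an engine attribute.
    void SetAttribute(const std::string& name) {
        if (sink_ != nullptr) sink_->OnAttributeChanged(name);
    }

private:
    CommandNotifySink* sink_ = nullptr;
    std::vector<std::string> vocabulary_;
};
```

<p>The sink must outlive the engine's use of it, because the engine in this sketch stores 
only a raw pointer. </p>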

<h3><a NAME="VoiceText">Voice Text</a></h3>

<p>The SAPI model defines a basic text-to-speech service called <i>voice text</i>. This 
service has only one object: the <tt><font FACE="Courier">Voice Text</font></tt> object. 
The <tt><font FACE="Courier">Voice Text</font></tt> object supports three interfaces: 

<ul>
  <li><font COLOR="#000000">The </font><tt><font FACE="Courier">Voice Text</font></tt> 
    interface </li>
  <li><font COLOR="#000000">The </font><tt><font FACE="Courier">Attributes</font></tt> 
    interface </li>
  <li><font COLOR="#000000">The </font><tt><font FACE="Courier">Dialogs</font></tt> interface </li>
</ul>

<p>The <tt><font FACE="Courier">Voice Text</font></tt> interface is the primary interface 
of the TTS portion of the high-level SAPI model. The <tt><font FACE="Courier">Voice Text</font></tt> 
interface provides a set of methods to start, pause, resume, fast forward, rewind, and stop 
the TTS engine while it is speaking text. This mirrors the VCR-type controls commonly 
employed for PC video and audio playback. </p>
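
<p>The VCR-style controls can be modeled as a small state machine over a queue of words. 
The following sketch is a simulation only: <tt><font FACE="Courier">MockVoiceText</font></tt>, 
<tt><font FACE="Courier">PlayState</font></tt>, and the <tt><font FACE="Courier">Tick</font></tt> 
method (which stands in for the engine speaking one word) are hypothetical and do not 
correspond to Speech SDK declarations. </p>

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

enum class PlayState { Stopped, Speaking, Paused };

// Hypothetical model of the Voice Text playback controls.
class MockVoiceText {
public:
    // Queue text (split on whitespace) and start speaking from the first word.
    void Speak(const std::string& text) {
        words_.clear();
        std::istringstream in(text);
        std::string word;
        while (in >> word) words_.push_back(word);
        pos_ = 0;
        state_ = words_.empty() ? PlayState::Stopped : PlayState::Speaking;
    }

    // Pause is valid only while speaking.
    bool Pause() {
        if (state_ != PlayState::Speaking) return false;
        state_ = PlayState::Paused;
        return true;
    }

    // Resume is valid only while paused.
    bool Resume() {
        if (state_ != PlayState::Paused) return false;
        state_ = PlayState::Speaking;
        return true;
    }

    // Stop discards the queued text.
    void Stop() {
        words_.clear();
        pos_ = 0;
        state_ = PlayState::Stopped;
    }

    // Skip ahead by n words, clamped to the end of the queue.
    void FastForward(std::size_t n) {
        pos_ = std::min(pos_ + n, words_.size());
    }

    // Move back by n words, clamped to the start of the queue.
    void Rewind(std::size_t n) {
        pos_ = (n > pos_) ? 0 : pos_ - n;
    }

    // Simulate the engine speaking one word. Returns the word, or an
    // empty string when nothing is spoken (paused, stopped, or finished).
    std::string Tick() {
        if (state_ != PlayState::Speaking) return std::string();
        if (pos_ >= words_.size()) {
            state_ = PlayState::Stopped;
            return std::string();
        }
        return words_[pos_++];
    }

    PlayState State() const { return state_; }

private:
    std::vector<std::string> words_;
    std::size_t pos_ = 0;
    PlayState state_ = PlayState::Stopped;
};
```

<p>Treating pause and resume as guarded transitions keeps invalid sequences, such as pausing 
twice in a row, from changing the playback state. </p>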

<p>The <tt><font FACE="Courier">Voice Text</font></tt> interface is also used to register 
the application that will request TTS services. An application must successfully complete 
the registration before the TTS engine can be used. This registration function can 
optionally pass a pointer to a callback function to be used to capture voice text 
messages. This establishes a notification callback with several methods, which are 
triggered by messages sent from the underlying TTS engine. </p>
<div align="center"><center>

<table BORDERCOLOR="#000000" BORDER="1" WIDTH="80%">
  <tr>
    <td><b>Note</b></td>
  </tr>
  <tr>
    <td><blockquote>
      <p>Notification callbacks require a pointer to the function that will receive all related 
      messages. Callbacks cannot be registered using Visual Basic; you need C or C++. However, 
      the voice text OLE Automation type library that ships with the Speech SDK has a 
      notification callback built into it.</p>
    </blockquote>
    </td>
  </tr>
</table>
</center></div>

<p>The <tt><font FACE="Courier">Attributes</font></tt> interface provides access to 
settings that control the basic behavior of the TTS engine. For example, you can use the <tt><font
FACE="Courier">Attributes</font></tt> interface to set the audio device to be used, set 
the playback speed (in words per minute), and turn the speech services on and off. If the 
TTS engine supports it, you can also use the <tt><font FACE="Courier">Attributes</font></tt> 
interface to select the TTS speaking mode. The TTS speaking mode usually refers to a 
predefined set of voices, each having its own character or style (for example, male, 
female, child, adult, and so on). </p>

<p>The <tt><font FACE="Courier">Dialogs</font></tt> interface lets users 
set and retrieve information regarding the TTS engine. The exact contents 
and layout of the dialog boxes are not determined by Microsoft but by the TTS engine 
developer. Microsoft does, however, suggest the possible contents of each dialog box. 
Table 15.2 shows the four voice text dialogs defined by the SAPI model, along with short 
descriptions of their suggested contents.<br>
</p>

<p align="center"><b>Table 15.2. The Voice Text dialog boxes.</b> </p>
<div align="center"><center>
