<html>
<head>
<title>Chapter 15 -- SAPI Architecture</title>
<meta NAME="GENERATOR" CONTENT="Microsoft FrontPage 3.0">
</head>
<body TEXT="#000000" BGCOLOR="#FFFFFF" LINK="#0000EE" VLINK="#551A8B" ALINK="#CE2910">
<h1><font COLOR="#FF0000">Chapter 15</font></h1>
<h1><b><font SIZE="5" COLOR="#FF0000">SAPI Architecture</font></b> </h1>
<hr WIDTH="100%">
<h3 ALIGN="CENTER"><font SIZE="+2" COLOR="#000000">CONTENTS<a NAME="CONTENTS"></a> </font></h3>
<ul>
<li><a HREF="#Introduction">Introduction</a> </li>
<li><a HREF="#HighLevelSAPI">High-Level SAPI</a> <ul>
<li><a HREF="#VoiceCommand">Voice Command</a> </li>
<li><a HREF="#VoiceText">Voice Text</a> </li>
</ul>
</li>
<li><a HREF="#LowLevelSAPI">Low-Level SAPI</a> <ul>
<li><a HREF="#SpeechRecognition">Speech Recognition</a> </li>
<li><a HREF="#TexttoSpeech">Text-to-Speech</a> </li>
</ul>
</li>
<li><a HREF="#SpeechObjectsandOLEAutomation">Speech Objects and OLE Automation</a> <ul>
<li><a HREF="#OLEAutomationSpeechRecognitionServic">OLE Automation Speech Recognition
Services</a> </li>
<li><a HREF="#OLEAutomationTexttoSpeechServices">OLE Automation Text-to-Speech Services</a> </li>
</ul>
</li>
<li><a HREF="#Summary">Summary</a> </li>
</ul>
<hr>
<h2><a NAME="Introduction"><font SIZE="5" COLOR="#FF0000">Introduction</font></a> </h2>
<p>The Speech API is implemented as a series of Component Object Model (COM) interfaces.
This chapter identifies the top-level objects, their child objects, and their methods. </p>
<p>The SAPI model is divided into two distinct levels:
<ul>
<li><i>High-level SAPI</i>: This level provides basic speech services in the form of
command-and-control speech recognition and simple text-to-speech output. </li>
<li><i>Low-level SAPI</i>: This level provides detailed access to all speech services,
including direct interfaces to control dialogs and manipulation of both speech recognition
(SR) and text-to-speech (TTS) behavior attributes. </li>
</ul>
<p>Each of the two levels of SAPI services has its own set of objects and methods. </p>
<p>Along with the two sets of COM interfaces, Microsoft has also published an OLE
Automation type library for the high-level SAPI objects. This set of OLE objects is
discussed at the end of the chapter. </p>
<p>When you complete this chapter you'll understand the basic architecture of the SAPI
model, including all the SAPI objects and their uses. Detailed information about the
objects' methods and parameters is covered in the next chapter, "SAPI Basics." </p>
<div align="center"><center>
<table BORDERCOLOR="#000000" BORDER="1" WIDTH="80%">
<tr>
<td><b>Note</b></td>
</tr>
<tr>
<td><blockquote>
<p>Most of the Microsoft Speech API is accessible only through C++ code. For this reason,
many of the examples shown in this chapter are expressed in Microsoft Visual C++ code. You
do not need to be able to code in C++ in order to understand the information discussed
here. At the end of this chapter, the OLE Automation objects available through Visual
Basic are also discussed.</p>
</blockquote>
</td>
</tr>
</table>
</center></div>
<h2><a NAME="HighLevelSAPI"><font SIZE="5" COLOR="#FF0000">High-Level SAPI</font></a></h2>
<p>The high-level SAPI services provide access to basic forms of speech recognition and
text-to-speech services. This is ideal for providing voice-activated menus, command
buttons, and so on. It is also sufficient for basic rendering of text into speech. </p>
<p>The high-level SAPI interface has two top-level objects: one for voice command services
(speech recognition), and one for voice text services (text-to-speech). The following two
sections describe each of these top-level objects, their child objects, and the interfaces
available through each object. </p>
<h3><a NAME="VoiceCommand">Voice Command</a></h3>
<p>The <tt><font FACE="Courier">Voice Command</font></tt> object is used to provide speech
recognition services. It is useful for providing simple command-and-control speech
services such as implementing menu options, activating command buttons, and issuing other
simple operating system commands. </p>
<p>The <tt><font FACE="Courier">Voice Command</font></tt> object has one child object and
one collection object. The child object is the <tt><font FACE="Courier">Voice Menu</font></tt>
object and the collection object is a collection of enumerated menu objects (see Figure
15.1). </p>
<p><a HREF="f15-1.gif"><b>Figure 15.1 : </b><i>The Voice Command object.</i></a> </p>
<h4><tt><font FACE="Courier">Voice Command</font></tt> Object </h4>
<p>The <tt><font FACE="Courier">Voice Command</font></tt> object supports three
interfaces:
<ul>
<li><font COLOR="#000000">The </font><tt><font FACE="Courier">Voice Command</font></tt>
interface </li>
<li><font COLOR="#000000">The </font><tt><font FACE="Courier">Attributes</font></tt>
interface </li>
<li><font COLOR="#000000">The </font><tt><font FACE="Courier">Dialogs</font></tt> interface </li>
</ul>
<p>The <tt><font FACE="Courier">Voice Command</font></tt> interface is used to enumerate,
create, and delete voice menu objects. This interface is also used to register an
application to use the SR engine. An application must successfully complete the
registration before the SR engine can be used. The <tt><font FACE="Courier">Voice Command</font></tt>
interface also defines the <tt><font FACE="Courier">Mimic</font></tt> method, which plays a
voice command back to the SR engine as if the user had spoken it. This is similar to playing
keystroke or mouse-action macros back to the operating system. </p>
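<p>The responsibilities described above can be sketched in plain C++. This is a minimal model
for illustration only: the class <tt><font FACE="Courier">VoiceCommandModel</font></tt> and all
of its method names are hypothetical stand-ins, not the COM interface declared in the Speech
SDK headers. It shows registration gating every other call, menu creation, deletion, and
enumeration, and a mimic call that queues a phrase as if it had been spoken.</p>

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <set>
#include <string>
#include <vector>

// Hypothetical model of the Voice Command interface's duties.
// None of these names come from the Speech SDK headers.
class VoiceCommandModel {
public:
    // Registration must succeed before any other call is honored.
    bool registerApp(const std::string& appName) {
        if (appName.empty()) return false;
        appName_ = appName;
        registered_ = true;
        return true;
    }

    // Create a named voice menu; fails if unregistered or if the name exists.
    bool createMenu(const std::string& name) {
        return registered_ && menus_.insert(name).second;
    }

    // Delete a voice menu; fails if unregistered or if the name is unknown.
    bool deleteMenu(const std::string& name) {
        return registered_ && menus_.erase(name) == 1;
    }

    // Enumerate the menu names in sorted order.
    std::vector<std::string> enumerateMenus() const {
        return std::vector<std::string>(menus_.begin(), menus_.end());
    }

    // Mimic: queue a phrase for the recognizer as if it had been spoken.
    bool mimic(const std::string& phrase) {
        if (!registered_ || phrase.empty()) return false;
        pending_.push_back(phrase);
        return true;
    }

    std::size_t pendingCount() const { return pending_.size(); }

private:
    bool registered_ = false;
    std::string appName_;
    std::set<std::string> menus_;
    std::deque<std::string> pending_;
};

// Usage: the asserts document the behavior this sketch expects.
inline void voiceCommandExample() {
    VoiceCommandModel vc;
    assert(!vc.createMenu("File"));   // rejected: not registered yet
    assert(vc.registerApp("DemoApp"));
    assert(vc.createMenu("File"));
    assert(vc.mimic("open file"));    // queued as if spoken
    assert(vc.deleteMenu("File"));
}
```

<p>Returning <tt><font FACE="Courier">false</font></tt> from every call made before
registration is one way to model the rule that the SR engine cannot be used until registration
completes; a real COM interface would report the failure through its own result codes.</p>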
<p>The <tt><font FACE="Courier">Attributes</font></tt> interface is used to set and
retrieve a number of basic parameters that control the behavior of the voice command
system. You can enable or disable voice commands, adjust input gain, establish the SR
mode, and control the input device (microphone or telephone). </p>
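<p>As a rough illustration, the four settings named above can be grouped into one small C++
class. Everything here is an assumption made for the sketch: the class name, the method names,
the 0 to 100 gain range, and the use of a plain string for the SR mode are not taken from the
Speech SDK.</p>

```cpp
#include <cassert>
#include <string>

enum class InputDevice { Microphone, Telephone };

// Hypothetical settings block for voice command attributes.
class CommandAttributesModel {
public:
    // Enable or disable voice commands as a whole.
    void setEnabled(bool on) { enabled_ = on; }
    bool enabled() const { return enabled_; }

    // Assumed gain range of 0-100; out-of-range values are rejected.
    bool setInputGain(int percent) {
        if (percent < 0 || percent > 100) return false;
        gain_ = percent;
        return true;
    }
    int inputGain() const { return gain_; }

    // The SR mode is modeled as an opaque, non-empty mode name.
    bool setMode(const std::string& modeName) {
        if (modeName.empty()) return false;
        mode_ = modeName;
        return true;
    }
    const std::string& mode() const { return mode_; }

    // Choose between the two input devices the text mentions.
    void setDevice(InputDevice device) { device_ = device; }
    InputDevice device() const { return device_; }

private:
    bool enabled_ = true;
    int gain_ = 50;
    std::string mode_ = "default";
    InputDevice device_ = InputDevice::Microphone;
};

// Usage: switch to telephone input and raise the gain.
inline void commandAttributesExample() {
    CommandAttributesModel attrs;
    attrs.setDevice(InputDevice::Telephone);
    assert(attrs.setInputGain(80));
    assert(!attrs.setInputGain(150));  // rejected, previous value kept
    assert(attrs.inputGain() == 80);
}
```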
<p>The <tt><font FACE="Courier">Dialogs</font></tt> interface gives you access to a series
of dialog boxes that can be used as a standard set of input screens for setting and
displaying SR engine information. The SAPI model identifies five different dialog boxes
that should be available through the <tt><font FACE="Courier">Dialogs</font></tt>
interface. The exact layout and content of these dialog boxes is not dictated by
Microsoft, but is determined by the developer of the speech recognition engine. However,
Microsoft has established general guidelines for the contents of the SR engine dialog
boxes. Table 15.1 lists each of the five defined dialog boxes along with short
descriptions of their suggested contents.<br>
</p>
<p align="center"><b>Table 15.1. The Voice Command dialog boxes.</b> </p>
<div align="center"><center>
<table BORDERCOLOR="#000000" BORDER="1" WIDTH="80%">
<tr>
<td><i>Dialog Box Name </i></td>
<td WIDTH="421"><i>Description</i> </td>
</tr>
<tr>
<td WIDTH="170">About Box</td>
<td WIDTH="421">Used to display the dialog box that identifies the SR engine and shows its
copyright information. </td>
</tr>
<tr>
<td WIDTH="170">Command Verification</td>
<td WIDTH="421">Can be used as a verification pop-up window during a speech recognition
session. When the engine identifies a word or phrase, this box can appear requesting the
user to confirm that the engine has correctly understood the spoken command. </td>
</tr>
<tr>
<td WIDTH="170">General Dialog</td>
<td WIDTH="421">Can be used to provide general access to the SR engine settings such as
identifying the speaker, controlling recognition parameters, and the amount of disk space
allotted to the SR engine. </td>
</tr>
<tr>
<td WIDTH="170">Lexicon Dialog</td>
<td WIDTH="421">Can be used to offer the speaker the opportunity to alter the
pronunciation lexicon, including altering the phonetic spelling of troublesome words, or
adding or deleting personal vocabulary files. </td>
</tr>
</table>
</center></div>
<h4>The <tt><font FACE="Courier">Voice Menu</font></tt> Object and the Menu Object
Collection</h4>
<p>The <tt><font FACE="Courier">Voice Menu</font></tt> object is the only child object of
the <tt><font FACE="Courier">Voice Command</font></tt> object. It is used to allow
applications to define, add, and delete voice commands in a menu. You can also use the <tt><font
FACE="Courier">Voice Menu</font></tt> object to activate and deactivate menus and,
optionally, to provide a training dialog box for the menu. </p>
<p>The voice menu collection object contains a set of all menu objects defined in the
voice command database. Microsoft SAPI defines functions to select and copy menu
collections for use by the voice command speech engine. </p>
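<p>A voice menu can be pictured as a list of command phrases plus an active flag. The sketch
below assumes that model; <tt><font FACE="Courier">VoiceMenuModel</font></tt> and its methods
are hypothetical names chosen for illustration, and the training dialog box is left out.</p>

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical model of a voice menu: command phrases plus an active flag.
class VoiceMenuModel {
public:
    // Add a command phrase; empty phrases and duplicates are rejected.
    bool addCommand(const std::string& phrase) {
        if (phrase.empty() || contains(phrase)) return false;
        commands_.push_back(phrase);
        return true;
    }

    // Delete a command phrase; fails if the phrase is not in the menu.
    bool removeCommand(const std::string& phrase) {
        auto it = std::find(commands_.begin(), commands_.end(), phrase);
        if (it == commands_.end()) return false;
        commands_.erase(it);
        return true;
    }

    void activate() { active_ = true; }
    void deactivate() { active_ = false; }

    // A phrase is recognized only while the menu is active.
    bool recognizes(const std::string& phrase) const {
        return active_ && contains(phrase);
    }

    std::size_t size() const { return commands_.size(); }

private:
    bool contains(const std::string& phrase) const {
        return std::find(commands_.begin(), commands_.end(), phrase) != commands_.end();
    }

    std::vector<std::string> commands_;
    bool active_ = false;  // menus start inactive in this sketch
};

// Usage: a command is ignored until its menu is activated.
inline void voiceMenuExample() {
    VoiceMenuModel menu;
    assert(menu.addCommand("open file"));
    assert(!menu.recognizes("open file"));  // menu not active yet
    menu.activate();
    assert(menu.recognizes("open file"));
    menu.deactivate();
    assert(!menu.recognizes("open file"));
}
```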
<h4>The Voice Command Notification Callback</h4>
<p>In the process of registering the application to use a voice command object, a
notification <i>callback</i> (or <i>sink</i>) is established. This callback receives
messages regarding the SR engine activity. Typical messages sent out by the SR engine can
include notifications that the engine has detected commands being spoken, that some
attribute of the engine has been changed, or that spoken commands have been heard but not
recognized. </p>
<div align="center"><center>
<table BORDERCOLOR="#000000" BORDER="1" WIDTH="80%">
<tr>
<td><b>Note</b></td>
</tr>
<tr>
<td><blockquote>
<p>Notification callbacks require a pointer to the function that will receive all related
messages. Callbacks cannot be registered using Visual Basic; you need C or C++. However,
the voice command OLE Automation type library that ships with the Speech SDK has a
notification callback built into it.</p>
</blockquote>
</td>
</tr>
</table>
</center></div>
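<p>The sink pattern itself is easy to show in standard C++. In the sketch below, an abstract
class plays the role of the notification sink and a toy engine calls back into it. The three
callbacks mirror the message types mentioned above (command recognized, attribute changed,
speech heard but not recognized), but the class and method names are invented for this example
and do not match the Speech SDK's notification interface.</p>

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical notification sink; the engine calls these methods.
class CommandSink {
public:
    virtual ~CommandSink() = default;
    virtual void onCommandRecognized(const std::string& phrase) = 0;
    virtual void onCommandNotRecognized() = 0;
    virtual void onAttributeChanged(const std::string& name) = 0;
};

// A sink that records every message it receives.
class LoggingSink : public CommandSink {
public:
    std::vector<std::string> log;

    void onCommandRecognized(const std::string& phrase) override {
        log.push_back("recognized: " + phrase);
    }
    void onCommandNotRecognized() override {
        log.push_back("not recognized");
    }
    void onAttributeChanged(const std::string& name) override {
        log.push_back("attribute: " + name);
    }
};

// Toy engine: matches heard phrases against a fixed vocabulary.
class EngineModel {
public:
    explicit EngineModel(std::vector<std::string> vocabulary)
        : vocabulary_(std::move(vocabulary)) {}

    // Registration hands the engine a pointer to the sink.
    void registerSink(CommandSink* sink) { sink_ = sink; }

    // Simulate the engine hearing a phrase; nothing is sent without a sink.
    void hear(const std::string& phrase) {
        if (sink_ == nullptr) return;
        for (const auto& known : vocabulary_) {
            if (known == phrase) {
                sink_->onCommandRecognized(phrase);
                return;
            }
        }
        sink_->onCommandNotRecognized();
    }

    // Simulate an attribute change and notify the sink.
    void changeAttribute(const std::string& name) {
        if (sink_ != nullptr) sink_->onAttributeChanged(name);
    }

private:
    std::vector<std::string> vocabulary_;
    CommandSink* sink_ = nullptr;
};

// Usage: register a sink, then let the engine "hear" two phrases.
inline void notificationExample() {
    LoggingSink sink;
    EngineModel engine(std::vector<std::string>{"open file", "close file"});
    engine.registerSink(&sink);
    engine.hear("open file");
    engine.hear("mumble");
    assert(sink.log.size() == 2);
    assert(sink.log[0] == "recognized: open file");
    assert(sink.log[1] == "not recognized");
}
```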
<h3><a NAME="VoiceText">Voice Text</a></h3>
<p>The SAPI model defines a basic text-to-speech service called <i>voice text</i>. This
service has only one object, the <tt><font FACE="Courier">Voice Text</font></tt> object.
The <tt><font FACE="Courier">Voice Text</font></tt> object supports three interfaces:
<ul>
<li><font COLOR="#000000">The </font><tt><font FACE="Courier">Voice Text</font></tt>
interface </li>
<li><font COLOR="#000000">The </font><tt><font FACE="Courier">Attributes</font></tt>
interface </li>
<li><font COLOR="#000000">The </font><tt><font FACE="Courier">Dialogs</font></tt> interface </li>
</ul>
<p>The <tt><font FACE="Courier">Voice Text</font></tt> interface is the primary interface
of the TTS portion of the high-level SAPI model. The <tt><font FACE="Courier">Voice Text</font></tt>
interface provides a set of methods to start, pause, resume, fast forward, rewind, and stop
the TTS engine while it is speaking text. This mirrors the VCR-type controls commonly
employed for PC video and audio playback. </p>
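<p>The VCR-style controls amount to a small state machine. The following sketch models one
under stated assumptions: playback position is counted in whole words, and the class
<tt><font FACE="Courier">VoiceTextPlaybackModel</font></tt> with its method names is
hypothetical rather than taken from the Speech SDK. No audio is produced; only the state and
position are tracked.</p>

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

enum class PlaybackState { Idle, Speaking, Paused };

// Hypothetical model of TTS playback control, with position counted in words.
class VoiceTextPlaybackModel {
public:
    // Start speaking: split the text into words and reset the position.
    void speak(const std::string& text) {
        words_.clear();
        std::istringstream in(text);
        for (std::string word; in >> word;) words_.push_back(word);
        position_ = 0;
        state_ = words_.empty() ? PlaybackState::Idle : PlaybackState::Speaking;
    }

    // Pause is valid only while speaking.
    bool pause() {
        if (state_ != PlaybackState::Speaking) return false;
        state_ = PlaybackState::Paused;
        return true;
    }

    // Resume is valid only while paused.
    bool resume() {
        if (state_ != PlaybackState::Paused) return false;
        state_ = PlaybackState::Speaking;
        return true;
    }

    // Stop discards the text and returns to the idle state.
    void stop() {
        words_.clear();
        position_ = 0;
        state_ = PlaybackState::Idle;
    }

    // Skip ahead by `count` words, clamped to the end of the text.
    void fastForward(std::size_t count) {
        const std::size_t remaining = words_.size() - position_;
        position_ = count > remaining ? words_.size() : position_ + count;
    }

    // Skip back by `count` words, clamped to the start of the text.
    void rewind(std::size_t count) {
        position_ = count > position_ ? 0 : position_ - count;
    }

    PlaybackState state() const { return state_; }
    std::size_t position() const { return position_; }

private:
    std::vector<std::string> words_;
    std::size_t position_ = 0;
    PlaybackState state_ = PlaybackState::Idle;
};

// Usage: speak, skip around, pause, and resume.
inline void playbackExample() {
    VoiceTextPlaybackModel tts;
    tts.speak("one two three four five");
    tts.fastForward(3);
    assert(tts.position() == 3);
    tts.rewind(10);                 // clamped to the start
    assert(tts.position() == 0);
    assert(tts.pause());
    assert(tts.resume());
}
```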
<p>The <tt><font FACE="Courier">Voice Text</font></tt> interface is also used to register
the application that will request TTS services. An application must successfully complete
the registration before the TTS engine can be used. This registration function can
optionally pass a pointer to a callback function to be used to capture voice text
messages. This establishes a notification callback with several methods, which are
triggered by messages sent from the underlying TTS engine. </p>
<div align="center"><center>
<table BORDERCOLOR="#000000" BORDER="1" WIDTH="80%">
<tr>
<td><b>Note</b></td>
</tr>
<tr>
<td><blockquote>
<p>Notification callbacks require a pointer to the function that will receive all related
messages. Callbacks cannot be registered using Visual Basic; you need C or C++. However,
the voice text OLE Automation type library that ships with the Speech SDK has a
notification callback built into it.</p>
</blockquote>
</td>
</tr>
</table>
</center></div>
<p>The <tt><font FACE="Courier">Attributes</font></tt> interface provides access to
settings that control the basic behavior of the TTS engine. For example, you can use the <tt><font
FACE="Courier">Attributes</font></tt> interface to set the audio device to be used, set
the playback speed (in words per minute), and turn the speech services on and off. If the
TTS engine supports it, you can also use the <tt><font FACE="Courier">Attributes</font></tt>
interface to select the TTS speaking mode. The TTS speaking mode usually refers to a
predefined set of voices, each having its own character or style (for example, male,
female, child, adult, and so on). </p>
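<p>To make the "if the TTS engine supports it" condition concrete, the sketch below models the
TTS attributes with an engine-supplied list of speaking modes that may be empty. The class
name, the method names, and the 50 to 400 words-per-minute range are assumptions made for this
example, not values from the Speech SDK.</p>

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical model of the voice text attributes.
class VoiceTextAttributesModel {
public:
    // `modes` lists the speaking modes the engine offers; it may be empty.
    explicit VoiceTextAttributesModel(std::vector<std::string> modes)
        : modes_(std::move(modes)) {}

    // Turn the speech services on or off.
    void setEnabled(bool on) { enabled_ = on; }
    bool enabled() const { return enabled_; }

    // Playback speed in words per minute; the 50-400 range is an assumption.
    bool setSpeed(int wordsPerMinute) {
        if (wordsPerMinute < 50 || wordsPerMinute > 400) return false;
        speed_ = wordsPerMinute;
        return true;
    }
    int speed() const { return speed_; }

    // True when the engine exposes at least one speaking mode.
    bool supportsModes() const { return !modes_.empty(); }

    // Select a speaking mode by name; fails if the engine does not offer it.
    bool selectMode(const std::string& name) {
        if (std::find(modes_.begin(), modes_.end(), name) == modes_.end()) return false;
        currentMode_ = name;
        return true;
    }
    const std::string& currentMode() const { return currentMode_; }

private:
    std::vector<std::string> modes_;
    std::string currentMode_;  // empty until a mode is selected
    int speed_ = 150;
    bool enabled_ = true;
};

// Usage: pick a voice the engine offers and slow the speech down.
inline void voiceTextAttributesExample() {
    VoiceTextAttributesModel attrs(std::vector<std::string>{"male", "female", "child"});
    assert(attrs.supportsModes());
    assert(attrs.selectMode("female"));
    assert(!attrs.selectMode("robot"));   // not offered by this engine
    assert(attrs.currentMode() == "female");
    assert(attrs.setSpeed(120));
}
```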
<p>The <tt><font FACE="Courier">Dialogs</font></tt> interface lets users set and retrieve
information regarding the TTS engine. The exact contents
and layout of the dialog boxes are not determined by Microsoft but by the TTS engine
developer. Microsoft does, however, suggest the possible contents of each dialog box.
Table 15.2 shows the four voice text dialogs defined by the SAPI model, along with short
descriptions of their suggested contents.<br>
</p>
<p align="center"><b>Table 15.2. The Voice Text dialog boxes.</b> </p>
<div align="center"><center>