<html>
<head>
<title>Chapter 16 -- SAPI Basics</title>
<meta NAME="GENERATOR" CONTENT="Microsoft FrontPage 3.0">
</head>
<body TEXT="#000000" BGCOLOR="#FFFFFF" LINK="#0000EE" VLINK="#551A8B" ALINK="#CE2910">
<h1><font COLOR="#FF0000">Chapter 16</font></h1>
<h1><b><font SIZE="5" COLOR="#FF0000">SAPI Basics</font></b> </h1>
<hr WIDTH="100%">
<h3 ALIGN="CENTER"><font SIZE="+2" COLOR="#000000">CONTENTS<a NAME="CONTENTS"></a> </font></h3>
<ul>
<li><a HREF="#SAPIHardware">SAPI Hardware</a> <ul>
<li><a HREF="#GeneralHardwareRequirements">General Hardware Requirements</a> </li>
<li><a HREF="#SoftwareRequirementsOperatingSystems">Software Requirements: Operating Systems
and Speech Engines</a> </li>
<li><a HREF="#SpecialHardwareRequirementsSoundCard">Special Hardware Requirements: Sound
Cards, Microphones, and Speakers</a> </li>
</ul>
</li>
<li><a HREF="#TechnologyIssues">Technology Issues</a> <ul>
<li><a HREF="#SRTechniques">SR Techniques</a> </li>
<li><a HREF="#SRLimits">SR Limits</a> </li>
<li><a HREF="#TTSTechniques">TTS Techniques</a> </li>
<li><a HREF="#TTSLimits">TTS Limits</a> </li>
</ul>
</li>
<li><a HREF="#GeneralSRDesignIssues">General SR Design Issues</a> </li>
<li><a HREF="#VoiceCommandMenuDesign">Voice Command Menu Design</a> </li>
<li><a HREF="#TTSDesignIssues">TTS Design Issues</a> </li>
<li><a HREF="#Summary">Summary</a> </li>
</ul>
<hr>
<p><font COLOR="#000000">This chapter covers a handful of </font>issues that must be
addressed when designing and installing SR/TTS applications, including hardware
requirements and the current state and limits of SR/TTS technology. The chapter also
includes some tips for designing your SR/TTS applications. </p>
<p>SR/TTS applications can be resource hogs. The section on hardware shows you the
minimal, recommended, and preferred processor and RAM requirements for the most common
SR/TTS applications. Of course, speech applications also need special hardware, including
audio cards, microphones, and speakers. In this chapter, you'll find a general list of
compatible devices, along with tips on what other options you have and how to use them. </p>
<p>You'll also learn about the general state of SR/TTS technology and its limits. This
will help you design applications that do not place unrealistic demands on the software or
raise users' expectations beyond the capabilities of your application. </p>
<p>Finally, this chapter contains a set of tips and suggestions for designing and
implementing SR/TTS services. You'll learn how to design SR and TTS interfaces that reduce
the chance of engine errors, and increase the usability of your programs. </p>
<p>When you complete this chapter, you'll know what hardware speech systems need and how
to design programs whose SR/TTS services really work. </p>
<h2><a NAME="SAPIHardware"><font SIZE="5" COLOR="#FF0000">SAPI Hardware</font></a> </h2>
<p>Speech systems can be resource intensive. It is especially important that SR engines
have enough RAM and disk space to respond quickly to user requests. When the engine is
slow to respond, users tend to repeat their commands, and each repeated command adds to
the work the system must process. The result is a spiraling degradation in performance:
the slower the system responds, the more commands it receives, and the slower it gets.
It doesn't take too much of this before users decide your software is more trouble
than it's worth! </p>
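<p>The spiral described above can be illustrated with a toy simulation. This is only a
sketch under stated assumptions: the function name <tt>simulate_backlog</tt>, its
parameters, and the rule that an impatient user repeats each unanswered command exactly
once are all hypothetical and are not taken from any SR engine. It records how many
spoken commands are waiting for the engine at each simulated second. </p>

```python
from collections import deque

def simulate_backlog(service_time, interval, patience, duration):
    """Toy model (hypothetical): returns the queue length at each second.

    service_time: seconds the engine needs per command
    interval:     seconds between new commands from the user
    patience:     seconds the user waits before repeating a command
    duration:     length of the simulation in seconds
    """
    queue = deque()   # waiting commands as (time spoken, is this a repeat?)
    busy_until = 0    # time at which the engine finishes its current command
    backlog = []
    for t in range(duration):
        if t % interval == 0:
            queue.append((t, False))   # the user speaks a new command
        # A command still waiting after `patience` seconds is spoken again, once.
        repeats = [(t, True) for spoken_at, is_repeat in queue
                   if not is_repeat and t - spoken_at == patience]
        queue.extend(repeats)
        if t >= busy_until and queue:
            queue.popleft()            # the engine starts the next command
            busy_until = t + service_time
        backlog.append(len(queue))
    return backlog

fast = simulate_backlog(service_time=1, interval=5, patience=3, duration=60)
slow = simulate_backlog(service_time=8, interval=5, patience=3, duration=60)
patient = simulate_backlog(service_time=8, interval=5, patience=10**9, duration=60)
print(max(fast), slow[-1], patient[-1])
```

<p>In this model the fast engine never builds a backlog. The slow engine falls behind
even with a patient user, and it falls further behind when the user repeats unanswered
commands. </p>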
<p>Text-to-speech engines can also tax the system. While TTS engines do not always require
a great deal of memory to operate, insufficient processor speed can result in halting or
unintelligible playback of text. </p>
<p>For these reasons, it is important to establish clear hardware and software
requirements when designing and implementing your speech-aware and speech-enabled
applications. Not all PCs will have the memory, disk space, and hardware needed to
properly implement SR and TTS services. There are three general categories of workstation
resources that should be reviewed:
<ul>
<li><i>General hardware</i>, including processor speed and RAM </li>
<li><i>Software</i>, including operating system and SR/TTS engines </li>
<li><i>Special hardware</i>, including sound cards, microphones, speakers, and headphones </li>
</ul>
<p>The following three sections provide some general guidelines to follow when
establishing minimal resource requirements for your applications. </p>
<h3><a NAME="GeneralHardwareRequirements">General Hardware Requirements</a> </h3>
<p>Speech systems can tax processor and RAM resources. SR services require varying levels
of resources depending on the type of SR engine installed and the level of services
implemented. TTS engine requirements are rather stable, but also depend on the TTS engine
installed. </p>
<p>The SR and TTS engines currently available for SAPI systems can usually run on as
little as a 486/33 processor and an additional 1MB of RAM. However, overall PC
performance with this configuration is poor, and it is not recommended. A better baseline
is a Pentium processor (P60 or better) with at least 16MB of total RAM. Systems that
support dictation SR services require the most computational power; it is not
unreasonable for such a workstation to need 32MB of RAM and a P100 or faster processor.
Obviously, the more resources, the better the performance. </p>
<h4>SR Processor and Memory Requirements</h4>
<p>In general, SR systems that implement command and control services need only an
additional 1MB of RAM (not counting the application's RAM requirement). Dictation services
should get at least another 8MB of RAM, preferably more. The type of speech sampling and
analysis and the size of the recognition vocabulary all affect the minimal resource
requirements. Table 16.1 shows published minimal processor and RAM requirements of speech
recognition services.<br>
</p>
<p align="center"><b>Table 16.1. Published minimal processor and RAM requirements of SR
services.</b> </p>
<div align="center"><center>
<table BORDERCOLOR="#000000" BORDER="1" WIDTH="80%">
<tr>
<td><i>Levels of Speech-Recognition Services</i> </td>
<td WIDTH="142"><p align="center"><i>Minimal Processor</i></td>
<td WIDTH="189"><p align="center"><i>Minimal Additional RAM</i></td>
</tr>
<tr>
<td WIDTH="259">Discrete, speaker-dependent, whole word, small vocabulary </td>
<td WIDTH="142"><p align="center">386/16</td>
<td WIDTH="189"><p align="center">64K </td>
</tr>
<tr>
<td WIDTH="259">Discrete, speaker-independent, whole word, small vocabulary </td>
<td WIDTH="142"><p align="center">386/33</td>
<td WIDTH="189"><p align="center">256K </td>
</tr>
<tr>
<td WIDTH="259">Continuous, speaker-independent, sub-word, small vocabulary </td>
<td WIDTH="142"><p align="center">486/33</td>
<td WIDTH="189"><p align="center">1MB </td>
</tr>
<tr>
<td WIDTH="259">Discrete, speaker-dependent, whole word, large vocabulary </td>
<td WIDTH="142"><p align="center">Pentium</td>
<td WIDTH="189"><p align="center">8MB </td>
</tr>
<tr>
<td WIDTH="259">Continuous, speaker-independent, sub-word, large vocabulary </td>
<td WIDTH="142"><p align="center">RISC processor</td>
<td WIDTH="189"><p align="center">8MB </td>
</tr>
</table>
</center></div>
<p>These memory requirements are in addition to the requirements of the operating system
and any loaded applications. A Windows 95 system should have at least 12MB of RAM; 16MB
is recommended and 24MB is preferred. A Windows NT system should have at least 16MB, with
24MB recommended and 32MB preferred. </p>
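<p>The arithmetic behind these figures can be sketched in a few lines of Python. The
numbers are copied from Table 16.1 and the paragraph above. The names
<tt>SR_EXTRA_RAM_KB</tt>, <tt>OS_RAM_MB</tt>, and <tt>total_ram_mb</tt> are hypothetical,
and the sketch assumes that the operating-system figures are baselines to which the SR
figure and the application's own RAM are added. </p>

```python
# Additional RAM per SR service level, in kilobytes (from Table 16.1).
SR_EXTRA_RAM_KB = {
    "discrete, speaker-dependent, whole word, small vocabulary": 64,
    "discrete, speaker-independent, whole word, small vocabulary": 256,
    "continuous, speaker-independent, sub-word, small vocabulary": 1024,
    "discrete, speaker-dependent, whole word, large vocabulary": 8192,
    "continuous, speaker-independent, sub-word, large vocabulary": 8192,
}

# Operating-system RAM figures, in megabytes (from the paragraph above).
OS_RAM_MB = {
    "Windows 95": {"minimal": 12, "recommended": 16, "preferred": 24},
    "Windows NT": {"minimal": 16, "recommended": 24, "preferred": 32},
}

def total_ram_mb(os_name, os_level, sr_service, app_ram_mb=0.0):
    """Estimate total workstation RAM in MB (assumption: the figures simply add)."""
    sr_mb = SR_EXTRA_RAM_KB[sr_service] / 1024
    return OS_RAM_MB[os_name][os_level] + sr_mb + app_ram_mb

dictation = "continuous, speaker-independent, sub-word, large vocabulary"
print(total_ram_mb("Windows 95", "minimal", dictation))
```

<p>Under that assumption, a minimal Windows 95 workstation running large-vocabulary
continuous recognition needs about 20MB before the application's own memory is counted. </p>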
<h4>TTS Processor and Memory Requirements</h4>
<p>TTS engines do not place as much of a demand on workstation resources as SR engines.
Usually TTS services require only a 486/33 processor and 1MB of additional RAM. TTS
programs themselves are rather small, about 150K. However, the grammar and prosody rules
can demand as much as another 1MB, depending on the complexity of the language being
spoken. English is probably the most complex and demanding language for TTS processing,
primarily because of its irregular spelling patterns. </p>
<p>Most TTS engines use speech synthesis to produce the audio output. However, advanced
systems can use diphone concatenation. Since diphone-based systems rely on a set of actual
voice samples for reproducing written text, these systems can require an additional 1MB of
RAM. To be safe, it is a good idea to suggest a requirement of 2MB of additional RAM, with
a recommendation of 4MB for advanced TTS systems. </p>
<h3><a NAME="SoftwareRequirementsOperatingSystems">Software Requirements: Operating Systems
and Speech Engines</a></h3>
<p>The general software requirements are rather simple. The Microsoft Speech API can only
be implemented on Windows 32-bit operating systems. This means you'll need Windows 95 or
Windows NT 3.5 or greater on the workstation. </p>
<div align="center"><center>
<table BORDERCOLOR="#000000" BORDER="1" WIDTH="80%">
<tr>
<td><b>Note</b></td>
</tr>
<tr>
<td><blockquote>
<p>All the testing and programming examples covered in this book have been performed using
Windows 95. It is assumed that Windows NT systems will not require any additional
modifications.</p>
</blockquote>
</td>
</tr>
</table>
</center></div>
<p>The most important software requirements for implementing speech services are the SR
and TTS engines. An SR/TTS engine is the back-end processing module in the SAPI model.
Your application is the front end, and the <tt><font FACE="Courier">SPEECH.DLL</font></tt>
acts as the broker between the two processes. </p>
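<p>The front end, broker, and back end roles can be pictured with a small Python sketch.
This is not the Speech API: <tt>SpeechBroker</tt>, <tt>DemoTTSEngine</tt>,
<tt>DemoSREngine</tt>, and their methods are hypothetical names invented for
illustration, and the "engines" only manipulate strings. The point is that the
application talks to the broker alone, never to an engine directly. </p>

```python
class DemoTTSEngine:
    """Hypothetical back-end engine: turns text into a fake 'audio' string."""
    def handle(self, text):
        return "audio(" + text + ")"

class DemoSREngine:
    """Hypothetical back-end engine: turns spoken words into a list of tokens."""
    def handle(self, spoken):
        return spoken.lower().split()

class SpeechBroker:
    """Plays the role the text assigns to SPEECH.DLL: routes requests to engines."""
    def __init__(self):
        self._engines = {}

    def register(self, service, engine):
        # Installing an engine makes a service available to applications.
        self._engines[service] = engine

    def request(self, service, payload):
        # The application names a service; the broker picks the engine.
        if service not in self._engines:
            raise LookupError("no engine installed for " + service)
        return self._engines[service].handle(payload)

# The application (front end) sees only the broker.
broker = SpeechBroker()
broker.register("tts", DemoTTSEngine())
broker.register("sr", DemoSREngine())
print(broker.request("tts", "hello"))
print(broker.request("sr", "Open File"))
```

<p>Because the application depends only on the broker, one engine can be swapped for
another without changing application code, which is the benefit of a brokered design. </p>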
<p>The new wave of multimedia PCs usually includes SR/TTS engines as part of the initial
software package. For existing PCs, most sound cards now ship with SR/TTS engines. </p>
<p>Microsoft's Speech SDK does not include a set of SR/TTS engines. However, Microsoft
does have an engine on the market. Their Microsoft Phone software system (available as
part of modem/sound card packages) includes the Microsoft Voice SR/TTS engine. You can
also purchase engines directly from third-party vendors. </p>
<div align="center"><center>
<table BORDERCOLOR="#000000" BORDER="1" WIDTH="80%">
<tr>
<td><b>Note</b></td>
</tr>
<tr>
<td><blockquote>
<p>Refer to Appendix B, "SAPI Resources," for a list of vendors that support the
Speech API. You can also check the CD-ROM that ships with this book for the most recent
list of SAPI vendors. Finally, the Microsoft Speech SDK contains a list of SAPI engine
providers in the <tt><font FACE="Courier">ENGINE.DOC</font></tt> file. </p>
</blockquote>
</td>
</tr>
</table>
</center></div>
<h3><a NAME="SpecialHardwareRequirementsSoundCard">Special Hardware Requirements: Sound
Cards, Microphones, and Speakers</a></h3>
<p>Complete speech-capable workstations need three additional pieces of hardware:
<ul>
<li><font COLOR="#000000">A </font><i>sound card</i> for audio reproduction </li>
<li><i>Speakers</i> for audio playback </li>
<li><font COLOR="#000000">A </font><i>microphone</i> for audio input </li>
</ul>
<p>Just about any sound card can support SR/TTS engines. Any of the major vendors' cards
are acceptable, including Sound Blaster and its compatibles, Media Vision, ESS Technology,
and others. Any card that is compatible with Microsoft's Windows Sound System is also
acceptable. </p>