<p>When you add speech services to your programs, it is important to make sure you give
users realistic expectations regarding the capabilities of the installation. This is best
done through user documentation. You needn't go into great detail, but you should give
users general information about the state of SR technology, and make sure users do not
expect to carry on extensive conversations with their new "talking electronic
pal." </p>
<p>Along with indications that speech services are active, it is a good idea to provide
users with a single speech command that displays a list of recognized speech inputs, and
some general online help regarding the use and capabilities of the SR services of your
program. Since the total number of commands might be quite large, you may want to provide
a type of voice-activated help system that allows users to query the current command set
and then ask additional questions to learn more about the various speech commands they can
use. </p>
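<p>A minimal sketch of such a query command appears below. The command table and the <tt><font FACE="Courier">OnWhatCanISay</font></tt> handler are hypothetical application code rather than part of the Speech SDK; the point is simply to keep one list of the currently active commands and their descriptions so a single spoken phrase can display it. </p>
<pre>
// Hypothetical application-side command table; not part of the Speech SDK.
#include &lt;windows.h&gt;
#include &lt;string&gt;
#include &lt;vector&gt;

struct VoiceCommand {
    std::wstring phrase;       // what the user says, for example L"Check E-Mail"
    std::wstring description;  // one-line help text shown in the command list
};

// One table per application state keeps the active set small and easy to list.
std::vector&lt;VoiceCommand&gt; g_activeCommands = {
    { L"What Can I Say", L"Display this list of voice commands" },
    { L"Check E-Mail",   L"Check the server for new mail"       },
    { L"Display Status", L"Show the current status log"         },
};

// Handler for the spoken command "What Can I Say".
void OnWhatCanISay(HWND hwndOwner)
{
    std::wstring help = L"You can say:\r\n\r\n";
    for (const VoiceCommand &amp;cmd : g_activeCommands)
        help += cmd.phrase + L"  -  " + cmd.description + L"\r\n";

    // A plain message box is enough here; a richer help dialog could let the
    // user ask follow-up questions about individual commands.
    MessageBoxW(hwndOwner, help.c_str(), L"Voice Commands",
                MB_OK | MB_ICONINFORMATION);
}
</pre>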
<p>It is also a good idea to add confirmations to especially dangerous or ambiguous speech
commands. For example, if you have a voice command for "Delete," you should ask
the user to confirm the operation before continuing. This is especially important if you
have other commands that sound similar: if both "Delete" and
"Repeat" are in the command list, you will want to make sure the system has correctly
identified which command was requested. </p>
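<p>A confirmation step for a destructive command can be as simple as the sketch below. The <tt><font FACE="Courier">DeleteSelectedItem</font></tt> routine is a hypothetical application function; only the confirmation pattern matters here. </p>
<pre>
// Ask for confirmation before acting on a destructive voice command.
#include &lt;windows.h&gt;

void DeleteSelectedItem();   // hypothetical application routine

void OnVoiceDelete(HWND hwndOwner)
{
    int answer = MessageBoxW(hwndOwner,
        L"Did you say \"Delete\"? The selected item will be removed.",
        L"Confirm Voice Command",
        MB_YESNO | MB_ICONQUESTION | MB_DEFBUTTON2);   // default to No

    if (answer == IDYES)
        DeleteSelectedItem();
    // On No the command is simply ignored; the user may have said "Repeat".
}
</pre>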
<p>In general, it is a good idea to display the status of all speech processing. If the
system does not understand a command, it is important to tell users rather than making
them sit idle while your program waits for understandable input. If the system cannot
identify a command, display a message telling the user to repeat the command, or bring up
a dialog box that lists likely possibilities from which the user can select the requested
command. </p>
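<p>The sketch below shows one way to report an unrecognized phrase rather than waiting silently. The list of likely candidates would normally come from the recognizer; here it is simply a hypothetical parameter. </p>
<pre>
// Tell the user the phrase was not understood and offer the closest matches.
#include &lt;windows.h&gt;
#include &lt;string&gt;
#include &lt;vector&gt;

void OnRecognitionFailed(HWND hwndOwner,
                         const std::vector&lt;std::wstring&gt; &amp;candidates)
{
    if (candidates.empty()) {
        MessageBoxW(hwndOwner, L"Command not understood. Please repeat the command.",
                    L"Speech", MB_OK | MB_ICONINFORMATION);
        return;
    }

    // List the most likely commands so the user can pick one by hand.
    std::wstring msg = L"Command not understood. Did you mean one of these?\r\n\r\n";
    for (const std::wstring &amp;c : candidates)
        msg += L"  " + c + L"\r\n";
    MessageBoxW(hwndOwner, msg.c_str(), L"Speech", MB_OK | MB_ICONINFORMATION);
}
</pre>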
<p>In some situations, background noise can hamper the performance of the SR engine. It is
advisable to allow users to turn off speech services and only turn them back on when they
are needed. This can be handled through a single button press or menu selection. In this
way, stray noise will not be misinterpreted as speech input. </p>
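<p>A single toggle handler, sketched below, is usually all that is needed. The <tt><font FACE="Courier">SuspendListening</font></tt> and <tt><font FACE="Courier">ResumeListening</font></tt> calls are hypothetical wrappers around whatever activate and deactivate calls your SR engine provides. </p>
<pre>
// Single menu command or toolbar button that toggles speech input.
#include &lt;windows.h&gt;

void SuspendListening();   // hypothetical: stop passing audio to the SR engine
void ResumeListening();    // hypothetical: start listening again

static bool g_speechEnabled = true;

void OnToggleSpeech(HWND hwndStatusBar)
{
    g_speechEnabled = !g_speechEnabled;
    if (g_speechEnabled) {
        ResumeListening();
        SetWindowTextW(hwndStatusBar, L"Speech: listening");
    } else {
        SuspendListening();   // stray noise can no longer trigger commands
        SetWindowTextW(hwndStatusBar, L"Speech: off");
    }
}
</pre>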
<p>There are a few things to avoid when adding voice commands to an application. SR
systems are not very successful when processing long series of numbers or single letters.
"M" and "N" sound quite alike, and long lists of digits can confuse
most SR systems. Also, although SR systems are capable of handling requests such as
"move mouse left," "move mouse right," and so on, this is not a good
use of voice technology. Using voice commands to handle a pointer device is a bit like
using the keyboard to play musical notes. It is possible, but not desirable. </p>
<h2><a NAME="VoiceCommandMenuDesign"><font SIZE="5" COLOR="#FF0000">Voice Command Menu
Design</font></a></h2>
<p>The key to designing good command menus is to make sure they are complete and consistent
and that each command within the set is unique. Good command menus also contain more
than just the list of items displayed on the physical menu. It is a good idea to think of
voice commands as you would keyboard shortcuts. </p>
<p>Useful voice command menus will provide access to all the common operations that might
be performed by the user. For example, the standard menu might offer a top-level menu
option of <tt><font FACE="Courier">Help</font></tt>. Under the <tt><font FACE="Courier">Help</font></tt>
menu might be an <tt><font FACE="Courier">About</font></tt> item to display the basic
information about the loaded application. It makes sense to add a voice command that
provides direct access to the About box with a <tt><font FACE="Courier">Help About</font></tt>
command. </p>
<p>These shortcut commands may span several menu levels or even stand independent of any
existing menu. For example, in an application that is used to monitor the status of
manufacturing operations within a plant, you might add a command such as <tt><font
FACE="Courier">Display Statistics</font></tt> that would gather data from several
locations and present a graph onscreen. </p>
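<p>One way to implement such shortcuts is to map each spoken phrase onto the same command identifier the menu already uses, as sketched below. The <tt><font FACE="Courier">IDM_</font></tt> values and the dispatch table are hypothetical application identifiers, not part of the Speech SDK. </p>
<pre>
// Map spoken shortcuts onto existing menu command IDs so a phrase such as
// "Help About" behaves exactly like choosing Help, then About, with the mouse.
#include &lt;windows.h&gt;
#include &lt;map&gt;
#include &lt;string&gt;

const UINT IDM_HELP_ABOUT    = 40001;   // hypothetical menu command IDs
const UINT IDM_FILE_SAVE     = 40002;
const UINT IDM_DISPLAY_STATS = 40003;   // shortcut with no menu equivalent

std::map&lt;std::wstring, UINT&gt; g_voiceShortcuts = {
    { L"Help About",         IDM_HELP_ABOUT    },
    { L"Save File",          IDM_FILE_SAVE     },
    { L"Display Statistics", IDM_DISPLAY_STATS },
};

// Called with the phrase the SR engine reports; reuses normal WM_COMMAND handling.
void DispatchVoiceCommand(HWND hwndMain, const std::wstring &amp;phrase)
{
    auto it = g_voiceShortcuts.find(phrase);
    if (it != g_voiceShortcuts.end())
        PostMessageW(hwndMain, WM_COMMAND, MAKEWPARAM(it-&gt;second, 0), 0);
}
</pre>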
<p>When designing menus, be sure to include commands for all dialog boxes. It is not a
good idea to provide voice commands for only some dialog boxes and not for others. </p>
<div align="center"><center>
<table BORDERCOLOR="#000000" BORDER="1" WIDTH="80%">
<tr>
<td><b>Tip</b></td>
</tr>
<tr>
<td><blockquote>
<p>You do not have to create menu commands for Windows-supplied dialog boxes (the Common
Dialogs, the Message Box, and so on). Windows automatically supplies voice commands for
these dialogs.</p>
</blockquote>
</td>
</tr>
</table>
</center></div>
<p>Be sure to include voice commands for the list and combo boxes within a dialog box, as
well as the command buttons, check boxes, and option buttons. </p>
<p>In addition to creating menus for all the dialog boxes of your applications, you should
consider creating a "global" menu that is active as long as the application is
running. This would allow users to execute common operations such as <i>Get New Mail</i>
or <i>Display Status Log</i> without having to first bring the application into the
foreground. </p>
<div align="center"><center>
<table BORDERCOLOR="#000000" BORDER="1" WIDTH="80%">
<tr>
<td><b>Tip</b></td>
</tr>
<tr>
<td><blockquote>
<p>It is advisable to limit this use of speech services to only a few vital and unique
commands since any other applications that have speech services may also activate global
commands.</p>
</blockquote>
</td>
</tr>
</table>
</center></div>
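<p>The sketch below shows one way to organize this: a very small global set stays active for the whole session, while the larger window-specific sets are swapped in and out as dialogs open and close. The <tt><font FACE="Courier">ActivateCommandSet</font></tt> and <tt><font FACE="Courier">DeactivateCommandSet</font></tt> calls are hypothetical wrappers around whatever menu-activation calls your SR engine provides. </p>
<pre>
// Keep a few vital, unique global commands active for the whole session and
// swap the larger, window-specific sets in and out as focus changes.
#include &lt;string&gt;
#include &lt;vector&gt;

void ActivateCommandSet(const std::vector&lt;std::wstring&gt; &amp;phrases);    // hypothetical
void DeactivateCommandSet(const std::vector&lt;std::wstring&gt; &amp;phrases);  // hypothetical

// Small global set, active even when the application is in the background.
const std::vector&lt;std::wstring&gt; g_globalCommands = {
    L"Get New Mail",
    L"Display Status Log",
};

void OnApplicationStart()
{
    ActivateCommandSet(g_globalCommands);
}

void OnDialogOpened(const std::vector&lt;std::wstring&gt; &amp;dialogCommands)
{
    ActivateCommandSet(dialogCommands);     // buttons, list boxes, check boxes...
}

void OnDialogClosed(const std::vector&lt;std::wstring&gt; &amp;dialogCommands)
{
    DeactivateCommandSet(dialogCommands);   // keep the active list short
}
</pre>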
<p>It is also important to include common alternate wordings for frequently used operations,
such as <i>Get New Mail</i> and <i>Check for New Mail</i>. Although you may not
be able to include all possible alternatives, adding a few will greatly improve the
accessibility of your speech interface. </p>
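<p>A small synonym table, as sketched below, handles this: several natural phrasings map onto one internal command, so users do not have to remember an exact wording. The command names are hypothetical. </p>
<pre>
// Map several natural phrasings onto one internal command.
#include &lt;map&gt;
#include &lt;string&gt;

enum class AppCommand { GetNewMail, DisplayStatusLog };

const std::map&lt;std::wstring, AppCommand&gt; g_phrases = {
    { L"Get New Mail",       AppCommand::GetNewMail       },
    { L"Check For New Mail", AppCommand::GetNewMail       },
    { L"Check Mail",         AppCommand::GetNewMail       },
    { L"Display Status Log", AppCommand::DisplayStatusLog },
    { L"Show Status Log",    AppCommand::DisplayStatusLog },
};
</pre>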
<p>Use consistent word order in your menu design. For example, for action commands you
should use the verb-noun construct, as in <i>Save File</i> or <i>Check E-Mail</i>. For
questions, use a consistent preface such as <i>How do I…</i> or <i>Help Me…</i>,
as in <i>How do I check e-mail?</i> or <i>Help me change font</i>. It is also important to
be consistent with the use of singular and plural. In the above example, you must be sure
to use <i>Font</i> or <i>Fonts</i> throughout the application. </p>
<p>Since the effectiveness of the SR engine is determined by its ability to identify your
voice input against a list of valid words, you can increase the accuracy of the SR engine
by keeping the command lists relatively short. When a command is spoken, the engine will
scan the list of valid inputs in this state and select the most likely candidate. The more
words on the list, the greater the chance the engine will select the wrong command. By
limiting the list, you can increase the odds of a correct "hit." </p>
<p>Finally, you can greatly increase the accuracy of the SR engine by avoiding
similar-sounding words in commands. For example, <i>repeat</i> and <i>delete</i> are
dangerously similar. Other words that are easily confused are <i>go</i> and <i>no</i>, and
even <i>on</i> and <i>off</i>. You can still use these words in your application if you
use them in separate states. In other words, do not use <i>repeat</i> in the same set of
menu options as <i>delete</i>. </p>
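<p>Keeping a separate, short command list for each application state accomplishes both goals, as sketched below: each list stays small, and confusable words such as <i>Delete</i> and <i>Repeat</i> are never active at the same time. The state names and phrases are hypothetical examples. </p>
<pre>
// Keep confusable words in different states so they never appear in the same
// active command list.
#include &lt;map&gt;
#include &lt;string&gt;
#include &lt;vector&gt;

enum class AppState { Editing, Playback };

const std::map&lt;AppState, std::vector&lt;std::wstring&gt; &gt; g_commandsByState = {
    // "Delete" is only valid while editing...
    { AppState::Editing,  { L"Delete", L"Save File", L"Check E-Mail" } },
    // ...and "Repeat" only during playback, so the engine never has to choose
    // between the two similar-sounding words.
    { AppState::Playback, { L"Repeat", L"Stop", L"Next Message" } },
};

const std::vector&lt;std::wstring&gt; &amp;ActiveCommands(AppState state)
{
    return g_commandsByState.at(state);
}
</pre>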
<h2><a NAME="TTSDesignIssues"><font SIZE="5" COLOR="#FF0000">TTS Design Issues</font></a></h2>
<p>There are a few things to keep in mind when adding text-to-speech services to your
applications. First, make sure you design your application to offer TTS as an option, not
as a required service. Your application may be installed on a workstation that does not
have the required resources, or the user may decide to turn off TTS services to improve
overall performance. For this reason, it is also important to provide visual as well as
aural feedback for all major operations. For example, when processing is complete, it is a
good idea to inform the user with a dialog box as well as a spoken message. </p>
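<p>A single notification helper, sketched below, makes this easy to enforce: the dialog box is always shown, and the message is spoken only when the user has TTS available and turned on. <tt><font FACE="Courier">SpeakText</font></tt> is a hypothetical wrapper around whatever text-to-speech call your engine provides. </p>
<pre>
// Always give visual feedback; speak it only when TTS is enabled.
#include &lt;windows.h&gt;
#include &lt;string&gt;

void SpeakText(const std::wstring &amp;text);   // hypothetical TTS wrapper

bool g_ttsEnabled = true;   // user option; false if no TTS engine is installed

void NotifyUser(HWND hwndOwner, const std::wstring &amp;message)
{
    if (g_ttsEnabled)
        SpeakText(message);                  // short spoken confirmation
    MessageBoxW(hwndOwner, message.c_str(),  // visual feedback is always shown
                L"Status", MB_OK | MB_ICONINFORMATION);
}

// Usage: NotifyUser(hwndMain, L"Status report complete");
</pre>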
<p>Because TTS engines typically produce a voice that is less than human-like, extended
sessions of listening to TTS output can be tiring to users. It is a good idea to limit TTS
output to short phrases. For example, if your application gathers status data on several
production operations on the shop floor, it is better to have the program announce the
completion of the process (for example, <i>Status report complete</i>) instead of
announcing the details of the findings. Alternatively, your TTS application could announce
a short summary of the data (for example, <i>All operations on time and within
specifications</i>). </p>
<p>If your application must provide extended TTS sessions, you should consider using
pre-recorded WAV files for output. For example, if your application gives users
aural access to company regulations or documentation, it is better to record a person
reading the documents, and then play back these recordings to users upon request. Also, if
your application provides a limited set of vocal responses to the user, it is advisable to
use WAV recordings instead of TTS output. A good example of this would be telephony
applications that ask users questions and respond with fixed answers. </p>
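<p>Playing a pre-recorded prompt is straightforward with the standard Windows multimedia <tt><font FACE="Courier">PlaySound</font></tt> call, as sketched below; the file name shown is just a hypothetical example. </p>
<pre>
// Play a pre-recorded WAV prompt instead of synthesizing it with TTS.
#include &lt;windows.h&gt;
#include &lt;mmsystem.h&gt;
#pragma comment(lib, "winmm.lib")   // PlaySound lives in winmm

void PlayRegulationsIntro()
{
    // SND_FILENAME: the string is a .wav file path; SND_ASYNC: return at once.
    PlaySoundW(L"regulations_intro.wav", NULL, SND_FILENAME | SND_ASYNC);
}
</pre>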
<p>Finally, it is not advisable to mix WAV output and TTS output in the same session. This
highlights the differences between the quality of recorded voice and computer-generated
speech. Switching between WAV and TTS can also make it harder for users to understand the
TTS voice since they may be expecting a familiar recorded voice and hear
computer-generated TTS instead. </p>
<h2><a NAME="Summary"><font SIZE="5" COLOR="#FF0000">Summary</font></a> </h2>
<p>This chapter covered three main topics:
<ul>
<li><font COLOR="#000000">Hardware and software requirements for SAPI applications</font> </li>
<li><font COLOR="#000000">The state of the art and limits of SR/TTS technology</font> </li>
<li><font COLOR="#000000">Design tips for adding speech services to Windows applications</font>
</li>
</ul>
<p>The Microsoft Speech SDK only works on 32-bit operating systems. This means you will
need Windows 95 or Windows NT version 3.5 or greater in order to run SAPI applications. </p>
<p>The minimum, recommended, and preferred processor and RAM requirements for SAPI
applications vary depending on the level of services your application provides. The
minimum SAPI-enabled system may need as little as 1MB of additional RAM and be able to run
on a 486/33 processor. However, it is a good idea to require at least a Pentium 60 processor
and an additional 8MB RAM. This will give your applications the additional computational
power needed for the most typical SAPI implementations. </p>
<p>SAPI systems can use just about any sound card on the market today. Any
card that is compatible with the Windows Sound System or with Sound Blaster systems will
work fine. You should use a close-talk, unidirectional microphone, and use either external
speakers or headphones for monitoring audio output. </p>
<p>You learned that SR technology uses three basic processes for interpreting audio input:
<ul>
<li><font COLOR="#000000">Word selection</font> </li>
<li><font COLOR="#000000">Word analysis</font> </li>
<li><font COLOR="#000000">Speaker dependence</font> </li>
</ul>
<p>You also learned that SR systems have their limits. SR engines cannot automatically
distinguish between multiple speakers, learn new words, guess at spelling, or
handle wide variations in word pronunciation (for example, <i>to-may-toe</i> or <i>to-mah-toe</i>).
</p>
<p>TTS engine technology is based on two different types of implementation. Synthesis
systems create audio output by generating audio tones using algorithms. This results in
unmistakably computer-like speech. Diphone concatenation is an alternate method for
generating speech. Diphones are a set of phoneme pairs collected from actual human speech
samples. The TTS engine is able to convert text into phoneme pairs and match them to
diphones in the TTS engine database. TTS engines are not able to mimic human speech
patterns and rhythms (called <i>prosody</i>) and are not very good at communicating
emotions. Also, most TTS engines experience difficulty with unusual words. This can result
in odd-sounding phrases. </p>
<p>Finally, you learned some tips on designing and implementing speech services. Some of
the tips covered here were:
<ul>
<li><font COLOR="#000000">Make SR and TTS services optional whenever possible.</font> </li>
<li><font COLOR="#000000">Design voice command menus to provide easy access to all major
operations.</font> </li>
<li><font COLOR="#000000">Avoid similar-sounding words and inconsistent word order and keep
command lists short.</font> </li>
<li><font COLOR="#000000">Limit TTS use to short playback, use WAV recordings for long
playback sessions.</font> </li>
<li><font COLOR="#000000">Don't mix TTS and WAV playback in the same session.</font> </li>
</ul>
<p>In the next chapter, you'll use the information you learned here to start creating
SAPI-enabled applications. </p>
<hr WIDTH="100%">
<p align="center"><a HREF="ch15.htm"><img SRC="pc.gif" BORDER="0" HEIGHT="88" WIDTH="140"></a><a
HREF="#CONTENTS"><img SRC="cc.gif" BORDER="0" HEIGHT="88" WIDTH="140"></a><a
HREF="index.htm"><img SRC="hb.gif" BORDER="0" HEIGHT="88" WIDTH="140"></a> <a
HREF="ch17.htm"><img SRC="nc.gif" BORDER="0" HEIGHT="88" WIDTH="140"></a></p>
<hr WIDTH="100%">
</body>
</html>