
<p>When you add speech services to your programs, it is important to give users realistic 
expectations about what the installation can do. This is best done through user 
documentation. You need not go into great detail, but you should give users general 
information about the state of SR technology, and make sure they do not expect to carry on 
extensive conversations with their new &quot;talking electronic pal.&quot; </p>

<p>Along with indications that speech services are active, it is a good idea to provide 
users with a single speech command that displays a list of recognized speech inputs, and 
some general online help regarding the use and capabilities of the SR services of your 
program. Since the total number of commands might be quite large, you may want to provide 
a type of voice-activated help system that allows users to query the current command set 
and then ask additional questions to learn more about the various speech commands they can 
use. </p>

<p>It is also a good idea to add confirmations to especially dangerous or ambiguous speech 
commands. For example, if you have a voice command for &quot;Delete,&quot; you should ask 
the user to confirm this option before continuing. This is especially important if you 
have other commands that may sound similar: if you have both &quot;Delete&quot; and 
&quot;Repeat&quot; in the command list, you will want to make sure the system has correctly 
identified which command was requested. </p>

<p>In general, it is a good idea to display the status of all speech processing. If the 
system does not understand a command, it is important to tell users rather than making 
them sit idle while your program waits for understandable input. If the system cannot 
identify a command, display a message telling the user to repeat the command, or bring up 
a dialog box that lists likely possibilities from which the user can select the requested 
command. </p>
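<p>The fallback described above can be sketched as follows, assuming (hypothetically) that the recognizer hands back a list of candidate phrases with confidence scores between 0.0 and 1.0. The threshold of 0.6 is illustrative only, not a value taken from any SDK.</p>

```python
# Hypothetical recognizer output: a list of (phrase, score) pairs, score in 0.0-1.0.
THRESHOLD = 0.6  # illustrative cutoff, tune it for your engine


def respond(candidates, threshold=THRESHOLD, max_choices=3):
    """Decide what to do with a set of recognition candidates."""
    if not candidates:
        # Nothing usable was heard: tell the user instead of waiting silently.
        return ("status", "Command not recognized. Please repeat the command.")
    ranked = sorted(candidates, key=lambda c: c[1], reverse=True)
    best_phrase, best_score = ranked[0]
    if best_score >= threshold:
        return ("execute", best_phrase)
    # Not confident enough: offer the likeliest choices in a dialog box.
    return ("choose", [phrase for phrase, _ in ranked[:max_choices]])


print(respond([("Open File", 0.4), ("Save File", 0.5), ("Close File", 0.3)]))
```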

<p>In some situations, background noise can hamper the performance of the SR engine. It is 
advisable to allow users to turn off speech services and only turn them back on when they 
are needed. This can be handled through a single button press or menu selection. In this 
way, stray noise will not be misinterpreted as speech input. </p>
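<p>A minimal sketch of the on/off switch, using a hypothetical <tt><font FACE="Courier">SpeechInput</font></tt> class rather than any real SAPI object: while listening is off, recognition results are simply discarded.</p>

```python
class SpeechInput:
    """Hypothetical on/off switch for speech input (not a SAPI interface)."""

    def __init__(self):
        self.listening = True

    def toggle(self):
        # Wire this to a single toolbar button or menu item.
        self.listening = not self.listening
        return self.listening

    def on_recognition(self, phrase):
        if not self.listening:
            return None  # ignore anything heard while speech input is off
        return phrase


speech = SpeechInput()
speech.toggle()                         # user switches speech input off
print(speech.on_recognition("Delete"))  # stray noise is ignored: None
```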

<p>There are a few things to avoid when adding voice commands to an application. SR 
systems are not very successful when processing long series of numbers or single letters. 
&quot;M&quot; and &quot;N&quot; sound quite alike, and long lists of digits can confuse 
most SR systems. Also, although SR systems are capable of handling requests such as 
&quot;move mouse left,&quot; &quot;move mouse right,&quot; and so on, this is not a good 
use of voice technology. Using voice commands to control a pointer device is a bit like 
using the keyboard to play musical notes. It is possible, but not desirable. </p>

<h2><a NAME="VoiceCommandMenuDesign"><font SIZE="5" COLOR="#FF0000">Voice Command Menu 
Design</font></a></h2>

<p>The key to designing good command menus is to make sure they are complete and 
consistent, and that each command is unique within the set. Good command menus also 
contain more than just the list of items displayed on the physical menu. It is a good idea 
to think of voice commands as you would keyboard shortcuts. </p>

<p>Useful voice command menus will provide access to all the common operations that might 
be performed by the user. For example, the standard menu might offer a top-level menu 
option of <tt><font FACE="Courier">Help</font></tt>. Under the <tt><font FACE="Courier">Help</font></tt> 
menu might be an <tt><font FACE="Courier">About</font></tt> item to display the basic 
information about the loaded application. It makes sense to add a voice command that 
provides direct access to the About box with a <tt><font FACE="Courier">Help About</font></tt> 
command. </p>

<p>These shortcut commands may span several menu levels or even stand independent of any 
existing menu. For example, in an application that is used to monitor the status of 
manufacturing operations within a plant, you might add a command such as <tt><font
FACE="Courier">Display Statistics</font></tt> that would gather data from several 
locations and present a graph onscreen. </p>
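<p>One way to picture such shortcut commands is as a table that maps spoken phrases directly to handlers, independent of the visible menu tree. The Python sketch below is illustrative only; the handler names are hypothetical and the phrases are the examples from the text.</p>

```python
# Hypothetical handlers; a real program would open dialogs or draw graphs.
def show_about():
    return "About box displayed"


def display_statistics():
    # In the plant-monitoring example this would gather data from several
    # locations and draw a graph; here it only reports what it would do.
    return "Statistics graph displayed"


# Spoken phrase -> handler. "Help About" skips the Help menu level, and
# "Display Statistics" has no counterpart on the physical menu at all.
VOICE_COMMANDS = {
    "help about": show_about,
    "display statistics": display_statistics,
}


def dispatch(phrase):
    handler = VOICE_COMMANDS.get(phrase.strip().lower())
    return handler() if handler else None


print(dispatch("Help About"))
```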

<p>When designing menus, be sure to include commands for all dialog boxes. It is not a 
good idea to provide voice commands for only some dialog boxes and not for others. </p>
<div align="center"><center>

<table BORDERCOLOR="#000000" BORDER="1" WIDTH="80%">
  <tr>
    <td><b>Tip</b></td>
  </tr>
  <tr>
    <td><blockquote>
      <p>You do not have to create menu commands for Windows-supplied dialog boxes (the Common 
      Dialogs, the Message Box, and so on). Windows automatically supplies voice commands for 
      these dialogs.</p>
    </blockquote>
    </td>
  </tr>
</table>
</center></div>

<p>Be sure to include voice commands for the list and combo boxes within a dialog box, as 
well as the command buttons, check boxes, and option buttons. </p>

<p>In addition to creating menus for all the dialog boxes of your applications, you should 
consider creating a &quot;global&quot; menu that is active as long as the application is 
running. This would allow users to execute common operations such as <i>Get New Mail</i> 
or <i>Display Status Log</i> without having to first bring the application into the 
foreground. </p>
<div align="center"><center>

<table BORDERCOLOR="#000000" BORDER="1" WIDTH="80%">
  <tr>
    <td><b>Tip</b></td>
  </tr>
  <tr>
    <td><blockquote>
      <p>It is advisable to limit global commands to a few vital and distinctive phrases, 
      since other applications that use speech services may also activate global commands 
      of their own.</p>
    </blockquote>
    </td>
  </tr>
</table>
</center></div>
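<p>The relationship between global and dialog-specific commands can be sketched as a union of two command sets. This is a hypothetical Python model, not the SAPI voice menu API: the global set is always active while the application runs, and a window's own set is added only while that window has the focus.</p>

```python
# Hypothetical command sets; phrases follow the examples in the text.
GLOBAL_COMMANDS = {"get new mail", "display status log"}

DIALOG_COMMANDS = {
    "main window": {"save file", "check e-mail", "help about"},
    "options dialog": {"ok", "cancel", "apply"},
}


def active_commands(foreground, app_has_focus):
    """Return the phrases the recognizer should listen for right now."""
    commands = set(GLOBAL_COMMANDS)  # available as long as the app is running
    if app_has_focus:
        commands |= DIALOG_COMMANDS.get(foreground, set())
    return commands


print(sorted(active_commands("main window", app_has_focus=False)))
```

<p>Keeping the global set small, as the tip above suggests, also keeps the combined list short when the application is in the foreground.</p>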

<p>It is also important to include alternate wordings for frequently used operations, 
such as <i>Get New Mail</i> and <i>Check for New Mail</i>. Although you may not be able 
to include every possible alternative, adding a few will greatly improve the accessibility 
of your speech interface. </p>
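<p>A simple way to support alternate wordings is to map several spoken phrases to one canonical action. The Python sketch below uses hypothetical action names; the engine would be given every key of the table as valid input, while the program only has to handle the canonical actions.</p>

```python
# Several phrasings map to one canonical action (names are hypothetical).
ALIASES = {
    "get new mail": "get_new_mail",
    "check for new mail": "get_new_mail",
    "check mail": "get_new_mail",
    "save file": "save_file",
    "save": "save_file",
}


def canonical(phrase):
    """Return the action for a spoken phrase, or None if it is not a command."""
    return ALIASES.get(phrase.strip().lower())


print(canonical("Check for New Mail"))
```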

<p>Use consistent word order in your menu design. For example, for action commands you 
should use the verb-noun construct, as in <i>Save File</i> or <i>Check E-Mail</i>. For 
questions, use a consistent preface such as <i>How do I…</i> or <i>Help Me…</i>, 
as in <i>How do I check e-mail?</i> or <i>Help me change font</i>. It is also important to 
be consistent with the use of singular and plural. In the above example, you must be sure 
to use <i>Font</i> or <i>Fonts</i> throughout the application. </p>
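<p>These consistency rules can be checked mechanically before a command set ships. The Python sketch below is a crude, hypothetical lint: it treats only a trailing &quot;s&quot; as a plural marker and relies on a hand-made verb list, so it catches the <i>Font</i>/<i>Fonts</i> case from the text but not irregular plurals.</p>

```python
# Hypothetical verb list; extend it with the verbs your application uses.
VERBS = {"save", "check", "open", "close", "print", "change", "list"}


def plural_conflicts(commands):
    """Words that appear in both singular and plural form (trailing 's' only)."""
    words = {w for cmd in commands for w in cmd.lower().split()}
    return sorted(w for w in words if w + "s" in words)


def not_verb_noun(commands):
    """Commands whose first word is not a known verb."""
    return [c for c in commands if c.split()[0].lower() not in VERBS]


menu = ["Change Font", "List Fonts", "Save File", "File Open"]
print(plural_conflicts(menu))  # flags "font"
print(not_verb_noun(menu))     # flags "File Open"
```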

<p>Since the effectiveness of the SR engine is determined by its ability to match your 
voice input against a list of valid words, you can increase the accuracy of the SR engine 
by keeping the command lists relatively short. When a command is spoken, the engine will 
scan the list of valid inputs for the current state and select the most likely candidate. 
The more words on the list, the greater the chance the engine will select the wrong 
command. By limiting the list, you can increase the odds of a correct &quot;hit.&quot; </p>
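<p>The Python sketch below models the idea of per-state command lists. It is only an analogy: it uses text similarity from the standard <tt><font FACE="Courier">difflib</font></tt> module as a stand-in for the engine's acoustic scoring, and the state names are hypothetical. The point it shows is that the best match is chosen only among the few commands valid in the current state.</p>

```python
from difflib import SequenceMatcher

# Hypothetical program states, each with its own short list of valid commands.
STATES = {
    "main": ["open file", "save file", "help about"],
    "reader": ["next message", "previous message", "reply"],
}


def best_match(heard, state):
    """Pick the most likely command, comparing only against the current state.

    Text similarity is a stand-in for the SR engine's acoustic scoring.
    """
    candidates = STATES[state]
    return max(candidates, key=lambda c: SequenceMatcher(None, heard, c).ratio())


print(best_match("next mesage", "reader"))
```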

<p>Finally, you can greatly increase the accuracy of the SR engine by avoiding 
similar-sounding words in commands. For example, <i>repeat</i> and <i>delete</i> are 
dangerously similar. Other words that are easily confused are <i>go</i> and <i>no</i>, and 
even <i>on</i> and <i>off</i>. You can still use these words in your application if you 
use them in separate states. In other words, do not use <i>repeat</i> in the same set of 
menu options as <i>delete</i>. </p>
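<p>Because this advice is about avoiding confusable words <i>within one state</i>, it can be enforced with a small check over each state's command list. The Python sketch below uses a hand-maintained list of pairs (the ones named in the text) rather than any phonetic analysis; the function name is hypothetical.</p>

```python
# Pairs named in the text; add others you discover while testing.
CONFUSABLE = [("repeat", "delete"), ("go", "no"), ("on", "off")]


def confusable_in_state(commands):
    """Return the confusable pairs that both occur in one state's commands."""
    words = {w for cmd in commands for w in cmd.lower().split()}
    return [pair for pair in CONFUSABLE if pair[0] in words and pair[1] in words]


print(confusable_in_state(["Repeat Message", "Delete Message", "Next Message"]))
```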

<h2><a NAME="TTSDesignIssues"><font SIZE="5" COLOR="#FF0000">TTS Design Issues</font></a></h2>

<p>There are a few things to keep in mind when adding text-to-speech services to your 
applications. First, make sure you design your application to offer TTS as an option, not 
as a required service. Your application may be installed on a workstation that does not 
have the required resources, or the user may decide to turn off TTS services to improve 
overall performance. For this reason, it is also important to provide visual as well as 
aural feedback for all major operations. For example, when processing is complete, it is a 
good idea to inform the user with a dialog box as well as a spoken message. </p>
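<p>A minimal sketch of feedback that never depends on TTS alone. The hypothetical <tt><font FACE="Courier">Notifier</font></tt> class takes two callables supplied by the host program: <tt><font FACE="Courier">show</font></tt> (for example, a dialog box) and an optional <tt><font FACE="Courier">speak</font></tt>, which is left out when no TTS engine is installed.</p>

```python
class Notifier:
    """Hypothetical feedback helper: visual always, spoken only when available."""

    def __init__(self, show, speak=None, tts_enabled=True):
        self.show = show            # e.g. display a dialog box
        self.speak = speak          # None when no TTS engine is installed
        self.tts_enabled = tts_enabled

    def notify(self, message):
        self.show(message)          # visual feedback for every major operation
        if self.speak is not None and self.tts_enabled:
            self.speak(message)     # spoken feedback is an optional extra


shown, spoken = [], []
notifier = Notifier(shown.append, spoken.append)
notifier.notify("Processing complete")
notifier.tts_enabled = False        # user turned TTS off
notifier.notify("Report saved")
print(shown, spoken)
```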

<p>Because TTS engines typically produce a voice that is less than human-like, extended 
sessions of listening to TTS output can be tiring to users. It is a good idea to limit TTS 
output to short phrases. For example, if your application gathers status data on several 
production operations on the shop floor, it is better to have the program announce the 
completion of the process (for example, <i>Status report complete</i>) instead of 
announcing the details of the findings. Alternatively, your TTS application could announce 
a short summary of the data (for example, <i>All operations on time and within 
specifications</i>). </p>
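<p>The short-summary approach can be sketched as a function that reduces detailed status data to one sentence before it is sent to the TTS engine. The data layout here (a dictionary of operations, each with <tt><font FACE="Courier">on_time</font></tt> and <tt><font FACE="Courier">in_spec</font></tt> flags) is a hypothetical assumption.</p>

```python
def status_announcement(operations):
    """Reduce detailed shop-floor status to one short phrase for TTS.

    `operations` maps an operation name to a dict with boolean 'on_time'
    and 'in_spec' flags (a hypothetical layout).
    """
    problems = [name for name, s in operations.items()
                if not (s["on_time"] and s["in_spec"])]
    if not problems:
        return "All operations on time and within specifications."
    if len(problems) == 1:
        return f"One operation needs attention: {problems[0]}."
    # Many problems: give the count only; details belong on the screen.
    return f"{len(problems)} operations need attention."


print(status_announcement({
    "press": {"on_time": True, "in_spec": True},
    "lathe": {"on_time": False, "in_spec": True},
}))
```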

<p>If your application must provide extended spoken output, you should consider using 
pre-recorded WAV files instead of TTS. For example, if your application gives users spoken 
access to company regulations or documentation, it is better to record a person reading 
the documents, and then play back these recordings to users upon request. Also, if 
your application provides a limited set of vocal responses to the user, it is advisable to 
use WAV recordings instead of TTS output. A good example of this would be telephony 
applications that ask users questions and respond with fixed answers. </p>

<p>Finally, it is not advisable to mix WAV output and TTS output in the same session. This 
highlights the differences between the quality of recorded voice and computer-generated 
speech. Switching between WAV and TTS can also make it harder for users to understand the 
TTS voice since they may be expecting a familiar recorded voice and hear 
computer-generated TTS instead. </p>
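<p>The two WAV guidelines can be combined into one rule: decide the output mode once per session. In the hypothetical Python sketch below, a session uses WAV recordings only if every prompt it may play has been recorded; otherwise the whole session uses TTS, so the two are never mixed. The prompt IDs and file names are illustrative.</p>

```python
def choose_session_mode(prompts, recordings):
    """Pick one output mode for the whole session: 'wav' or 'tts'."""
    if all(p in recordings for p in prompts):
        return "wav"
    return "tts"


class Session:
    """Hypothetical session that never mixes WAV and TTS output."""

    def __init__(self, prompts, recordings, texts):
        self.mode = choose_session_mode(prompts, recordings)
        self.recordings = recordings  # prompt ID -> WAV file path
        self.texts = texts            # prompt ID -> text for the TTS engine

    def output_for(self, prompt_id):
        # Every prompt in the session uses the mode chosen at the start.
        if self.mode == "wav":
            return ("play_wav", self.recordings[prompt_id])
        return ("speak", self.texts[prompt_id])


recordings = {"greet": "greet.wav", "menu": "menu.wav"}
texts = {"greet": "Welcome.", "menu": "Press 1 for sales.", "bye": "Goodbye."}
print(Session(["greet", "menu"], recordings, texts).output_for("greet"))
print(Session(["greet", "menu", "bye"], recordings, texts).output_for("greet"))
```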

<h2><a NAME="Summary"><font SIZE="5" COLOR="#FF0000">Summary</font></a> </h2>

<p>This chapter covered three main topics: 

<ul>
  <li><font COLOR="#000000">Hardware and software requirements for SAPI applications</font> </li>
  <li><font COLOR="#000000">The state of the art and limits of SR/TTS technology</font> </li>
  <li><font COLOR="#000000">Design tips for adding speech services to Windows applications</font> 
  </li>
</ul>

<p>The Microsoft Speech SDK runs only on 32-bit operating systems. This means you will 
need Windows 95 or Windows NT version 3.5 or greater in order to run SAPI applications. </p>

<p>The minimum, recommended, and preferred processor and RAM requirements for SAPI 
applications vary depending on the level of services your application provides. The 
minimum SAPI-enabled system may need as little as 1MB of additional RAM and be able to run 
on a 486/33 processor. However, it is a good idea to require at least a Pentium 60 processor 
and an additional 8MB RAM. This will give your applications the additional computational 
power needed for the most typical SAPI implementations. </p>

<p>SAPI systems can use just about any sound card on the market today. Any 
card that is compatible with the Windows Sound System or with Sound Blaster systems will 
work fine. You should use a close-talk, unidirectional microphone, and use either external 
speakers or headphones for monitoring audio output. </p>

<p>You learned that SR technology uses three basic processes for interpreting audio input: 

<ul>
  <li><font COLOR="#000000">Word selection</font> </li>
  <li><font COLOR="#000000">Word analysis</font> </li>
  <li><font COLOR="#000000">Speaker dependence</font> </li>
</ul>

<p>You also learned that SR systems have their limits. SR engines cannot automatically 
distinguish between multiple speakers, learn new words, guess at spellings, or handle wide 
variations in word pronunciation (for example, <i>to-may-toe</i> or <i>to-mah-toe</i>). 
</p>

<p>TTS engine technology is based on two different types of implementation. Synthesis 
systems create audio output by generating audio tones using algorithms. This results in 
unmistakably computer-like speech. Diphone concatenation is an alternate method for 
generating speech. Diphones are a set of phoneme pairs collected from actual human speech 
samples. The TTS engine is able to convert text into phoneme pairs and match them to 
diphones in the TTS engine database. TTS engines are not able to mimic human speech 
patterns and rhythms (called <i>prosody</i>) and are not very good at communicating 
emotions. Also, most TTS engines experience difficulty with unusual words. This can result 
in odd-sounding phrases. </p>
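<p>The diphone concatenation idea can be sketched in a few lines of Python. This is a toy model under stated assumptions: the phoneme symbols, the silence marker, and the database of &quot;recorded&quot; samples are all hypothetical, and real engines also smooth the joins and adjust pitch and timing, which this sketch skips entirely.</p>

```python
SIL = "sil"  # hypothetical silence marker at the start and end of an utterance


def to_diphones(phonemes):
    """Split a phoneme sequence into overlapping adjacent pairs (diphones)."""
    padded = [SIL] + list(phonemes) + [SIL]
    return list(zip(padded, padded[1:]))


def concatenate(phonemes, database):
    """Join the recorded sample for each diphone, in order.

    `database` maps (phoneme, phoneme) pairs to lists of audio samples.
    A pair with no recording raises KeyError in this sketch.
    """
    samples = []
    for pair in to_diphones(phonemes):
        if pair not in database:
            raise KeyError(f"no recording for diphone {pair}")
        samples.extend(database[pair])
    return samples


# Toy database: each "recording" is just a short list of numbers.
db = {("sil", "h"): [1], ("h", "ay"): [2, 3], ("ay", "sil"): [4]}
print(to_diphones(["h", "ay"]))
print(concatenate(["h", "ay"], db))
```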

<p>Finally, you learned some tips on designing and implementing speech services. Some of 
the tips covered here were: 

<ul>
  <li><font COLOR="#000000">Make SR and TTS services optional whenever possible.</font> </li>
  <li><font COLOR="#000000">Design voice command menus to provide easy access to all major 
    operations.</font> </li>
  <li><font COLOR="#000000">Avoid similar-sounding words and inconsistent word order, and keep 
    command lists short.</font> </li>
  <li><font COLOR="#000000">Limit TTS use to short playback; use WAV recordings for long 
    playback sessions.</font> </li>
  <li><font COLOR="#000000">Don't mix TTS and WAV playback in the same session.</font> </li>
</ul>

<p>In the next chapter, you'll use the information you learned here to start creating 
SAPI-enabled applications. </p>

<hr WIDTH="100%">

<p align="center"><a HREF="ch15.htm"><img SRC="pc.gif" BORDER="0" HEIGHT="88" WIDTH="140"></a><a
HREF="#CONTENTS"><img SRC="cc.gif" BORDER="0" HEIGHT="88" WIDTH="140"></a><a
HREF="index.htm"><img SRC="hb.gif" BORDER="0" HEIGHT="88" WIDTH="140"></a> <a
HREF="ch17.htm"><img SRC="nc.gif" BORDER="0" HEIGHT="88" WIDTH="140"></a></p>

<hr WIDTH="100%">
</body>
</html>