<html>
<head>
<title>Chapter 18 -- SAPI Behind the Scenes</title>
<meta NAME="GENERATOR" CONTENT="Microsoft FrontPage 3.0">
</head>
<body TEXT="#000000" BGCOLOR="#FFFFFF" LINK="#0000EE" VLINK="#551A8B" ALINK="#CE2910">
<h1><font COLOR="#FF0000">Chapter 18</font></h1>
<h1><b><font SIZE="5" COLOR="#FF0000">SAPI Behind the Scenes</font></b> </h1>
<hr WIDTH="100%">
<h3 ALIGN="CENTER"><font SIZE="+2" COLOR="#000000">CONTENTS<a NAME="CONTENTS"></a> </font></h3>
<ul>
<li><a HREF="#ControlTags">Control Tags</a> <ul>
<li><a HREF="#TheVoiceCharacterControlTags">The Voice Character Control Tags</a> </li>
<li><a HREF="#ThePhraseModificationControlTags">The Phrase Modification Control Tags</a> </li>
<li><a HREF="#TheLowLevelTTSControlTags">The Low-Level TTS Control Tags</a> </li>
</ul>
</li>
<li><a HREF="#GrammarRules">Grammar Rules</a> <ul>
<li><a HREF="#GeneralRulesfortheSAPIContextFree">General Rules for the SAPI Context-Free
Grammar</a> </li>
<li><a HREF="#CreatingandCompilingaSAPIContextFr">Creating and Compiling a SAPI Context-Free
Grammar</a> </li>
<li><a HREF="#LoadingandTestingSAPIContextFreeGr">Loading and Testing SAPI Context-Free
Grammars</a> </li>
</ul>
</li>
<li><a HREF="#InternationalPhoneticAlphabet">International Phonetic Alphabet</a> </li>
<li><a HREF="#Summary">Summary</a> </li>
</ul>
<hr>
<p>In this chapter, you'll learn about three aspects of the SAPI system that are not often
used in the course of normal speech services operations:
<ul>
<li><i>Control tags</i> are used to modify the audio output of TTS engines. </li>
<li><i>Grammar rules</i> are used to tell SR engines how to analyze audio input. </li>
<li><font COLOR="#000000">The </font><i>International Phonetic Alphabet</i> is used by
Unicode systems to gain additional control over both TTS output and SR input. </li>
</ul>
<p>You'll learn how to add control tags to your TTS text input in order to change the
speed, pitch, volume, mood, gender, and other characteristics of TTS audio output. You'll
learn how to use the 15 control tags to improve the sound of your TTS engine. </p>
<p>You'll also learn how grammar rules are used by the SR engine to analyze spoken input.
You'll learn how to design your own grammars for specialized uses. You'll also learn how
to code and compile your own grammars using tools from the Microsoft Speech SDK. Finally,
you'll load your custom grammar into a test program and test the results of your newly
designed grammar rules. </p>
<p>The last topic in this chapter is the International Phonetic Alphabet (<i>IPA</i>). The
IPA is a standard system for documenting the various sounds of human speech. The IPA is an
implementation option for SAPI speech services under Unicode. For this reason, the IPA can
be implemented only on WinNT systems. In this chapter, you'll learn how the IPA can be
used to improve both TTS playback and SR performance. </p>
<h2><a NAME="ControlTags"><font SIZE="5" COLOR="#FF0000">Control Tags</font></a> </h2>
<p>One of the most difficult tasks for a TTS system is the rendering of complete
sentences. Most TTS systems do quite well when converting a single word into speech.
However, when TTS systems begin to string words together into sentences, they do not
perform as well because human speech has a set of inflections, pitches, and rhythms. These
characteristics of human speech are called <i>prosody</i>. </p>
<p>There are several reasons that TTS engines are unsuccessful in matching the prosody of
human speech. First, very little of it is written down in the text. Punctuation marks can
be used to estimate some prosody information, but not all. Much of the inflection of a
sentence is tied to subtle differences in the speech of individuals when they speak to
each other: interjections, racing to complete a thought, putting in a little added <i>emphasis
to make a point</i>. These are all aspects of human prosody that are rarely found in
written text. </p>
<p>When you consider the complexity involved in rendering a complete thought or sentence,
the current level of technology in TTS engines is quite remarkable. Although the average
output of TTS engines still sounds mechanical, like a poor imitation of Darth Vader, it is
intelligible and surprisingly close to human speech. </p>
<p>One of the ways that the SAPI model attempts to provide added control to TTS engines is
the inclusion of what are called <i>control tags</i> in text that is to be spoken. These
tags can be used to adjust the speed, pitch, and character of the voice used to render the
text. By using control tags, you can greatly improve the perceived performance of the TTS
engine. </p>
<p>The SAPI model defines 15 different control tags that can be used to modify the output
of TTS engines. Microsoft defined these tags but does not determine how the TTS engine
will respond to them. It is acceptable for TTS engines that comply with the SAPI model to
ignore any and all control tags they do not understand. It is possible that the TTS engine
you install on your system will not respond to some or all of these tags. It is also
possible that the TTS engine will attempt to interpret them as part of the text instead of
ignoring them. You will need to experiment with your TTS engine to determine its level of
compliance with the SAPI model. </p>
<div align="center"><center>
<table BORDERCOLOR="#000000" BORDER="1" WIDTH="80%">
<tr>
<td><b>Note</b></td>
</tr>
<tr>
<td><blockquote>
<p>All of the examples in this section were created using the Microsoft Voice TTS engine
that ships with Microsoft Phone.</p>
</blockquote>
</td>
</tr>
</table>
</center></div>
<p>The SAPI control tags fall into three general categories:
<ul>
<li><i>Voice character</i> control tags </li>
<li><i>Phrase modification</i> control tags </li>
<li><i>Low-level TTS</i> control tags </li>
</ul>
<p>The <i>voice character</i> tags can be used to set high-level general characteristics
of the voice. The SAPI model allows users to select gender, dialect, accent, message
context types, speaker's age, even the general mood of the speaker. </p>
<p>The <i>phrase modification</i> tags can be used to adjust the pronunciation at a
word-by-word or phrase-by-phrase level. Users can control the word emphasis, pauses,
pitch, speed, and volume of the playback. </p>
<p>The <i>low-level TTS</i> tags deal with attributes of the TTS engine itself. Users can
add comments to the text, control the pronunciation of a word, turn prosody rules on and
off, reset the engine to default settings, or even call a control tag based on its own
GUID (globally unique identifier). </p>
<p>You add control tags to the text sent to the TTS engine by enclosing each tag in
backslash (\) characters. For example, to change the speed of the text playback from 150 to
200 words per minute, you would use the <tt><font FACE="Courier">\Spd=\</font></tt>
control tag with the new speed as its value. The text below shows how this looks: </p>
<blockquote>
<p><tt><font FACE="Courier">\Spd=150\This sentence is normal. \Spd=200\This sentence is
faster.</font></tt></p>
</blockquote>
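<p>If you build tagged text in code rather than typing it by hand, a small helper can keep
the tag syntax consistent. The following Python lines are only a minimal sketch: the
function names <tt><font FACE="Courier">spd_tag</font></tt> and <tt><font FACE="Courier">speak_at</font></tt>
are hypothetical and are not part of the Speech SDK. The sketch simply assembles the same
string shown above; it does not call a TTS engine.</p>

```python
# Hypothetical helpers (not part of the Speech SDK) that build tagged text.

def spd_tag(words_per_minute):
    # Build a speed tag with no spaces inside the backslashes.
    return "\\Spd={}\\".format(int(words_per_minute))


def speak_at(words_per_minute, sentence):
    # Prefix a sentence with the speed tag that should apply to it.
    return spd_tag(words_per_minute) + sentence


text = " ".join([
    speak_at(150, "This sentence is normal."),
    speak_at(200, "This sentence is faster."),
])
print(text)
```

<p>The resulting string can be pasted into any program that accepts tagged TTS text.</p>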
<p>Control tags are not case sensitive. For example, <tt><font FACE="Courier">\spd=200\</font></tt>
is the same as <tt><font FACE="Courier">\Spd=200\</font></tt> or <tt><font FACE="Courier">\SPD=200\</font></tt>.
However, control tags are white-space sensitive. <tt><font FACE="Courier">\Spd=200\</font></tt>
is <i>not</i> the same as <tt><font FACE="Courier">\ Spd=200 \</font></tt>. As mentioned
above, a TTS engine is allowed to ignore any control tag it does not recognize, although
some engines may try to speak the tag as text instead. The next three
sections of this chapter go into the details of each control tag and show you how to use
them. </p>
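<p>The case and white-space rules can be illustrated with a short Python sketch that scans
text for control tags. This is an illustration only, not how any TTS engine actually parses
its input. The pattern and the <tt><font FACE="Courier">find_tags</font></tt> function are
assumptions made for this example; in particular, the sketch assumes that a tag value never
contains white space.</p>

```python
import re

# Opening backslash, tag name, optional "=value", closing backslash.
# White space is excluded everywhere, so "\ Spd=200 \" is not seen as a tag.
TAG_PATTERN = re.compile(r"\\([A-Za-z]+)(?:=([^\\\s]*))?\\")


def find_tags(text):
    # Return (name, value) pairs; names are lower-cased because tags are
    # not case sensitive. The value is None when the tag has no "=value".
    return [(m.group(1).lower(), m.group(2)) for m in TAG_PATTERN.finditer(text)]


same = find_tags("\\spd=200\\one \\Spd=200\\two \\SPD=200\\three")
spaced = find_tags("\\ Spd=200 \\")
print(same)
print(spaced)
```

<p>All three spellings of the speed tag reduce to the same name and value, while the
version with spaces is not recognized as a tag at all.</p>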
<div align="center"><center>
<table BORDERCOLOR="#000000" BORDER="1" WIDTH="80%">
<tr>
<td><b>Note</b></td>
</tr>
<tr>
<td><blockquote>
<p>The following examples all use the <tt><font FACE="Courier">TTSTEST.EXE</font></tt>
program that is installed in the <tt><font FACE="Courier">BIN\ANSI.API</font></tt> or the <tt><font
FACE="Courier">BIN\UNICODE.API</font></tt> folder of the <tt><font FACE="Courier">SPEEchSDK</font></tt>
folder. These folders were created when you installed the Microsoft Speech SDK. </p>
</blockquote>
</td>
</tr>
</table>
</center></div>
<p>Before continuing, load the <tt><font FACE="Courier">TTSTEST.EXE</font></tt> program
from the <tt><font FACE="Courier">SPEEchSDK\BIN\ANSI.API</font></tt> directory (Win95) or
the <tt><font FACE="Courier">SPEEchSDK\BIN\UNICODE.API</font></tt> directory (WinNT). This
program will be used to illustrate examples throughout the chapter. After loading your
program, press the <tt><font FACE="Courier">Register</font></tt> button to start the TTS
engine on your workstation. Then press the <tt><font FACE="Courier">Add Mode</font></tt>
button to select a voice for playback. Finally, make sure <tt><font FACE="Courier">TTSDATAFLAG_TAGGED</font></tt>
is checked. This informs the application that you will be sending control tags with your
text. Your screen should now look something like the one in Figure 18.1. </p>
<p><a HREF="f18-1.gif"><b>Figure 18.1 : </b><i>Starting the TTSTEST.EXE application.</i></a>
</p>
<div align="center"><center>
<table BORDERCOLOR="#000000" BORDER="1" WIDTH="80%">
<tr>
<td><b>Note</b></td>
</tr>
<tr>
<td><blockquote>
<p>Even if you do not have a copy of the software, you can still learn a lot by reviewing
the material covered in this section.</p>
</blockquote>
</td>
</tr>
</table>
</center></div>
<h3><a NAME="TheVoiceCharacterControlTags">The Voice Character Control Tags</a></h3>
<p>There are three control tags that allow you to alter the general character of the
speaking voice. Microsoft has identified several characteristics of playback voices that
can be altered using control tags. However, your TTS engine may not recognize all of them.
The three control tags in this group are
<ul>
<li><tt><font FACE="Courier">Chr</font></tt>: Used to set the character of the voice. </li>
<li><tt><font FACE="Courier">Ctx</font></tt>: Used to set the context of the spoken text. </li>
<li><tt><font FACE="Courier">Vce</font></tt>: Used to set additional characteristics of the
voice, including language, accent, dialect, gender, name, and age. </li>
</ul>
<h4>Using the <tt><font FACE="Courier">Chr</font></tt> Tag to Set the Voice Character</h4>
<p>The <tt><font FACE="Courier">Chr</font></tt> tag allows you to set the general
character of the voice. The syntax for the <tt><font FACE="Courier">Chr</font></tt> tag is
</p>
<blockquote>
<p><tt><font FACE="Courier">\Chr=string[[,string...]]\</font></tt></p>
</blockquote>
<p>More than one characteristic can be applied at the same time. The default value is <tt><font
FACE="Courier">Normal</font></tt>. Others that are recognized by the Microsoft Voice TTS
engine are <tt><font FACE="Courier">Monotone</font></tt> and <tt><font FACE="Courier">Whisper</font></tt>.
Additional characteristics suggested by Microsoft are </p>
<div align="center"><center>
<table BORDERCOLOR="#000000" BORDER="1" WIDTH="80%">
<tr>
<td><tt><font FACE="Courier">Angry</font></tt></td>
<td WIDTH="102"><font FACE="Courier">Business</font></td>
<td WIDTH="80"><font FACE="Courier">Calm</font> </td>
<td WIDTH="112"><font FACE="Courier">Depressed</font></td>
<td WIDTH="78"><font FACE="Courier">Excited</font></td>
</tr>
<tr>
<td WIDTH="103"><tt><font FACE="Courier">Falsetto</font></tt> </td>
<td WIDTH="102"><font FACE="Courier">Happy</font></td>
<td WIDTH="80"><font FACE="Courier">Loud</font> </td>
<td WIDTH="112"><font FACE="Courier">Perky</font></td>
<td WIDTH="78"><font FACE="Courier">Quiet</font> </td>
</tr>
<tr>
<td WIDTH="103"><tt><font FACE="Courier">Sarcastic</font></tt> </td>
<td WIDTH="102"><font FACE="Courier">Scared</font></td>
<td WIDTH="80"><font FACE="Courier">Shout</font> </td>
<td WIDTH="112"><font FACE="Courier">Tense</font></td>
<td WIDTH="78"><font FACE="Courier"> </font> </td>
</tr>
</table>
</center></div>
<p>To test the <tt><font FACE="Courier">Chr</font></tt> tag, enter the text shown in
Listing 18.1 into the input box of <tt><font FACE="Courier">TTSTEST.EXE</font></tt>. </p>
<hr>
<blockquote>
<p><b>Listing 18.1. Testing the <tt><font FACE="Courier">Chr</font></tt> control tag.</b></p>
</blockquote>
<blockquote>
<p><tt><font FACE="Courier">\chr="monotone"\<br>
How are you today?<br>
\chr="whisper"\<br>
I am fine.<br>
\chr="normal"\<br>
Good to hear.</font></tt></p>
</blockquote>
<hr>
<p>Each sentence will be spoken using a different characteristic. After entering the text,
press the <tt><font FACE="Courier">TextData</font></tt> button to hear the results. </p>
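<p>Test text like Listing 18.1 can also be generated from a list of (characteristic,
sentence) pairs. The Python sketch below is one possible way to do that. The names
<tt><font FACE="Courier">chr_tag</font></tt> and <tt><font FACE="Courier">build_script</font></tt>
are hypothetical. Quoting each string separately when several characteristics are combined
is an assumption based on the syntax line above, so check the result against your own TTS
engine.</p>

```python
# Hypothetical helpers (not part of the Speech SDK) for building Chr test text.

def chr_tag(*characteristics):
    # Fall back to the default voice character when none is given.
    if not characteristics:
        characteristics = ("normal",)
    # Assumption: each characteristic is quoted and separated by commas.
    quoted = ",".join('"{}"'.format(name) for name in characteristics)
    return "\\chr={}\\".format(quoted)


def build_script(lines):
    # Emit one tag line followed by one sentence line, as in Listing 18.1.
    parts = []
    for character, sentence in lines:
        parts.append(chr_tag(character))
        parts.append(sentence)
    return "\n".join(parts)


listing = build_script([
    ("monotone", "How are you today?"),
    ("whisper", "I am fine."),
    ("normal", "Good to hear."),
])
print(listing)
```

<p>Printing <tt><font FACE="Courier">listing</font></tt> reproduces the six lines of
Listing 18.1, which you can paste into the input box of <tt><font FACE="Courier">TTSTEST.EXE</font></tt>.</p>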
<h4>Using the <tt><font FACE="Courier">Ctx</font></tt> Tag to Set the Message Context</h4>
<p>Another valuable control tag is <tt><font FACE="Courier">Ctx</font></tt>, the context
tag. You can use this tag to tell the TTS engine the context of the message you are asking
it to render. Like the <tt><font FACE="Courier">Chr</font></tt> tag, the <tt><font
FACE="Courier">Ctx</font></tt> tag takes <tt><font FACE="Courier">string</font></tt> as a
parameter. Microsoft has defined the strings in Table 18.1 for the context tag. <br>
</p>
<p align="center"><b>Table 18.1. The context tag parameters.</b> </p>
<div align="center"><center>
<table BORDERCOLOR="#000000" BORDER="1" WIDTH="80%">
<tr>
<td><i>Context Tag Parameter</i></td>
<td WIDTH="317"><i>Description</i> </td>
</tr>
<tr>
<td WIDTH="170">Address</td>
<td WIDTH="317">Addresses and/or phone numbers. </td>
</tr>
<tr>
<td WIDTH="170">C</td>
<td WIDTH="317">Code in the C or C++ programming language. </td>
</tr>
<tr>
<td WIDTH="170">Document</td>
<td WIDTH="317">Text document.</td>
</tr>
<tr>
<td WIDTH="170">E-Mail</td>
<td WIDTH="317">Electronic mail.</td>