<html>

<head>
<title>Chapter 18 -- SAPI Behind the Scenes</title>
<meta NAME="GENERATOR" CONTENT="Microsoft FrontPage 3.0">
</head>

<body TEXT="#000000" BGCOLOR="#FFFFFF" LINK="#0000EE" VLINK="#551A8B" ALINK="#CE2910">

<h1><font COLOR="#FF0000">Chapter 18</font></h1>

<h1><b><font SIZE="5" COLOR="#FF0000">SAPI Behind the Scenes</font></b> </h1>

<hr WIDTH="100%">

<h3 ALIGN="CENTER"><font SIZE="+2" COLOR="#000000">CONTENTS<a NAME="CONTENTS"></a> </font></h3>

<ul>
  <li><a HREF="#ControlTags">Control Tags</a> <ul>
      <li><a HREF="#TheVoiceCharacterControlTags">The Voice Character Control Tags</a> </li>
      <li><a HREF="#ThePhraseModificationControlTags">The Phrase Modification Control Tags</a> </li>
      <li><a HREF="#TheLowLevelTTSControlTags">The Low-Level TTS Control Tags</a> </li>
    </ul>
  </li>
  <li><a HREF="#GrammarRules">Grammar Rules</a> <ul>
      <li><a HREF="#GeneralRulesfortheSAPIContextFree">General Rules for the SAPI Context-Free 
        Grammar</a> </li>
      <li><a HREF="#CreatingandCompilingaSAPIContextFr">Creating and Compiling a SAPI Context-Free 
        Grammar</a> </li>
      <li><a HREF="#LoadingandTestingSAPIContextFreeGr">Loading and Testing SAPI Context-Free 
        Grammars</a> </li>
    </ul>
  </li>
  <li><a HREF="#InternationalPhoneticAlphabet">International Phonetic Alphabet</a> </li>
  <li><a HREF="#Summary">Summary</a> </li>
</ul>

<hr>

<p>In this chapter, you'll learn about three aspects of the SAPI system that are not often 
used in the course of normal speech services operations: 

<ul>
  <li><i>Control tags</i> are used to modify the audio output of TTS engines. </li>
  <li><i>Grammar rules</i> are used to tell SR engines how to analyze audio input. </li>
  <li><font COLOR="#000000">The </font><i>International Phonetic Alphabet</i> is used by 
    Unicode systems to gain additional control over both TTS output and SR input. </li>
</ul>

<p>You'll learn how to add control tags to your TTS text input in order to change the 
speed, pitch, volume, mood, gender, and other characteristics of TTS audio output. You'll 
learn how to use the 15 control tags to improve the sound of your TTS engine. </p>

<p>You'll also learn how grammar rules are used by the SR engine to analyze spoken input. 
You'll learn how to design your own grammars for specialized uses. You'll also learn how 
to code and compile your own grammars using tools from the Microsoft Speech SDK. Finally, 
you'll load your custom grammar into a test program and test the results of your newly 
designed grammar rules. </p>

<p>The last topic in this chapter is the International Phonetic Alphabet (<i>IPA</i>). The 
IPA is a standard system for documenting the various sounds of human speech. The IPA is an 
implementation option for SAPI speech services under Unicode. For this reason, the IPA can 
be implemented only on WinNT systems. In this chapter, you'll learn how the IPA can be 
used to improve both TTS playback and SR performance. </p>

<h2><a NAME="ControlTags"><font SIZE="5" COLOR="#FF0000">Control Tags</font></a> </h2>

<p>One of the most difficult tasks for a TTS system is the rendering of complete 
sentences. Most TTS systems do quite well when converting a single word into speech. 
However, when TTS systems begin to string words together into sentences, they do not 
perform as well because human speech has a set of inflections, pitches, and rhythms. These 
characteristics of human speech are called <i>prosody</i>. </p>

<p>There are several reasons that TTS engines are unsuccessful in matching the prosody of 
human speech. The main one is that very little prosody is written down in the text. Punctuation 
marks can be used to estimate some prosody information, but not all. Much of the inflection of a 
sentence is tied to subtle differences in the speech of individuals when they speak to 
each other: interjections, racing to complete a thought, putting in a little added <i>emphasis 
to make a point</i>. These are all aspects of human prosody that are rarely found in 
written text. </p>

<p>When you consider the complexity involved in rendering a complete thought or sentence, 
the current level of technology in TTS engines is quite remarkable. Although the average 
output of TTS engines still sounds like a poor imitation of Darth Vader, it is amazingly 
close to human speech. </p>

<p>One of the ways that the SAPI model attempts to provide added control to TTS engines is 
the inclusion of what are called <i>control tags</i> in text that is to be spoken. These 
tags can be used to adjust the speed, pitch, and character of the voice used to render the 
text. By using control tags, you can greatly improve the perceived performance of the TTS 
engine. </p>

<p>The SAPI model defines 15 different control tags that can be used to modify the output 
of TTS engines. Microsoft defined these tags but does not determine how a TTS engine 
will respond to them. It is acceptable for TTS engines that comply with the SAPI model to 
ignore any control tags they do not understand. It is possible that the TTS engine 
you install on your system will not respond to some or all of these tags. It is also 
possible that the TTS engine will attempt to interpret them as part of the text instead of 
ignoring them. You will need to experiment with your TTS engine to determine its level of 
compliance with the SAPI model. </p>
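<p>If an engine speaks the tags aloud instead of ignoring them, one workaround is to remove the tags before the text is sent. The following Python sketch is not part of the Speech SDK. It assumes that a control tag is a backslash, a name made of letters, an optional <tt><font FACE="Courier">=value</font></tt> part, and a closing backslash; the names <tt><font FACE="Courier">TAG_PATTERN</font></tt> and <tt><font FACE="Courier">strip_control_tags</font></tt> are hypothetical. </p>

```python
import re

# Assumption: a control tag looks like \Name\ or \Name=value\, where the
# value contains no backslash. Tags with a different shape are not handled.
TAG_PATTERN = re.compile(r"\\[A-Za-z]+(?:=[^\\]*)?\\")


def strip_control_tags(text: str) -> str:
    """Remove anything shaped like a control tag and collapse extra spaces."""
    without_tags = TAG_PATTERN.sub(" ", text)
    return " ".join(without_tags.split())


tagged = "\\Spd=150\\This sentence is normal. \\Spd=200\\This sentence is faster."
print(strip_control_tags(tagged))
# prints: This sentence is normal. This sentence is faster.
```

<p>Text that legitimately contains backslash-delimited words, such as a file path, would also be altered by this pattern, so treat it as a starting point only. </p>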
<div align="center"><center>

<table BORDERCOLOR="#000000" BORDER="1" WIDTH="80%">
  <tr>
    <td><b>Note</b></td>
  </tr>
  <tr>
    <td><blockquote>
      <p>All of the examples in this section were created using the Microsoft Voice TTS engine 
      that ships with Microsoft Phone.</p>
    </blockquote>
    </td>
  </tr>
</table>
</center></div>

<p>The SAPI control tags fall into three general categories: 

<ul>
  <li><i>Voice character</i> control tags </li>
  <li><i>Phrase modification</i> control tags </li>
  <li><i>Low-level TTS</i> control tags </li>
</ul>

<p>The <i>voice character</i> tags can be used to set high-level general characteristics 
of the voice. The SAPI model allows users to select gender, dialect, accent, message 
context types, speaker's age, even the general mood of the speaker. </p>

<p>The <i>phrase modification</i> tags can be used to adjust the pronunciation at a 
word-by-word or phrase-by-phrase level. Users can control the word emphasis, pauses, 
pitch, speed, and volume of the playback. </p>

<p>The <i>low-level TTS</i> tags deal with attributes of the TTS engine itself. Users can 
add comments to the text, control the pronunciation of a word, turn prosody rules on and 
off, reset the engine to default settings, or even call a control tag based on its own 
GUID (globally unique identifier). </p>

<p>You add control tags to the text sent to the TTS engine by surrounding them with the 
backslash (\) character. For example, to adjust the speed of the text playback from 150 to 
200 words per minute, you would enter the <tt><font FACE="Courier">\Spd=\</font></tt> 
control tag. The text below shows how this looks: </p>

<blockquote>
  <tt><font FACE="Courier"><p>\Spd=150\This sentence is normal. \Spd=200\This sentence is 
  faster.</font></tt> </p>
</blockquote>
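<p>When the tagged text is assembled by a program rather than typed by hand, a small helper keeps the backslashes consistent. The Python sketch below only builds the string shown above; it does not call the TTS engine, and the function names <tt><font FACE="Courier">spd_tag</font></tt> and <tt><font FACE="Courier">with_speed</font></tt> are hypothetical. </p>

```python
def spd_tag(words_per_minute: int) -> str:
    """Return a speed control tag such as \\Spd=150\\ (shown here escaped)."""
    return f"\\Spd={words_per_minute}\\"


def with_speed(segments) -> str:
    """Join (words_per_minute, sentence) pairs into one tagged string.

    Each sentence is prefixed by its own speed tag, with no space between
    the closing backslash and the first word.
    """
    return " ".join(spd_tag(wpm) + sentence for wpm, sentence in segments)


tagged = with_speed([
    (150, "This sentence is normal."),
    (200, "This sentence is faster."),
])
print(tagged)
# prints: \Spd=150\This sentence is normal. \Spd=200\This sentence is faster.
```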

<p>Control tags are not case sensitive. For example, <tt><font FACE="Courier">\spd=200\</font></tt> 
is the same as <tt><font FACE="Courier">\Spd=200\</font></tt> or <tt><font FACE="Courier">\SPD=200\</font></tt>. 
However, control tags are white-space sensitive. <tt><font FACE="Courier">\Spd=200\</font></tt> 
is <i>not</i> the same as <tt><font FACE="Courier">\ Spd=200 \</font></tt>. As mentioned 
above, a compliant TTS engine is expected to ignore any control tag it does not recognize, 
although some engines may speak the tag as text instead. The next three 
sections of this chapter go into the details of each control tag and show you how to use 
them. </p>
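<p>The case and white-space rules can be checked before text is sent to the engine. The Python sketch below is an illustration under the same assumed tag shape (backslash, letters, optional <tt><font FACE="Courier">=value</font></tt>, backslash); it is not how any particular engine parses tags, and <tt><font FACE="Courier">parse_tag</font></tt> is a hypothetical name. </p>

```python
import re

# Assumption: no white space is allowed between the backslashes and the
# tag name, and the value may not start or end with white space.
_TAG = re.compile(r"\\([A-Za-z]+)(?:=([^\\]*))?\\")


def parse_tag(candidate: str):
    """Return (lowercased name, value) for a well-formed tag, else None."""
    match = _TAG.fullmatch(candidate)
    if match is None:
        return None
    name, value = match.groups()
    if value is not None and value != value.strip():
        return None  # padded values such as "200 " are rejected
    return name.lower(), value


for candidate in ("\\spd=200\\", "\\Spd=200\\", "\\SPD=200\\", "\\ Spd=200 \\"):
    print(repr(candidate), parse_tag(candidate))
```

<p>The first three candidates yield the same lowercased name and value, while the padded form yields <tt><font FACE="Courier">None</font></tt>. </p>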
<div align="center"><center>

<table BORDERCOLOR="#000000" BORDER="1" WIDTH="80%">
  <tr>
    <td><b>Note</b></td>
  </tr>
  <tr>
    <td><blockquote>
      <p>The following examples all use the <tt><font FACE="Courier">TTSTEST.EXE</font></tt> 
      program that is installed in the <tt><font FACE="Courier">BIN\ANSI.API</font></tt> or the <tt><font
      FACE="Courier">BIN\UNICODE.API</font></tt> folder of the <tt><font FACE="Courier">SPEEchSDK</font></tt> 
      folder. These folders were created when you installed the Microsoft Speech SDK. </p>
    </blockquote>
    </td>
  </tr>
</table>
</center></div>

<p>Before continuing, load the <tt><font FACE="Courier">TTSTEST.EXE</font></tt> program 
from the <tt><font FACE="Courier">SPEEchSDK\BIN\ANSI.API</font></tt> directory (Win95) or 
the <tt><font FACE="Courier">SPEEchSDK\BIN\UNICODE.API</font></tt> directory (WinNT). This 
program will be used to illustrate examples throughout the chapter. After loading your 
program, press the <tt><font FACE="Courier">Register</font></tt> button to start the TTS 
engine on your workstation. Then press the <tt><font FACE="Courier">Add Mode</font></tt> 
button to select a voice for playback. Finally, make sure <tt><font FACE="Courier">TTSDATAFLAG_TAGGED</font></tt> 
is checked. This informs the application that you will be sending control tags with your 
text. Your screen should now look something like the one in Figure 18.1. </p>

<p><a HREF="f18-1.gif"><b>Figure 18.1 : </b><i>Starting the TTSTEST.EXE application.</i></a> 
</p>
<div align="center"><center>

<table BORDERCOLOR="#000000" BORDER="1" WIDTH="80%">
  <tr>
    <td><b>Note</b></td>
  </tr>
  <tr>
    <td><blockquote>
      <p>Even if you do not have a copy of the software, you can still learn a lot by reviewing 
      the material covered in this section.</p>
    </blockquote>
    </td>
  </tr>
</table>
</center></div>

<h3><a NAME="TheVoiceCharacterControlTags">The Voice Character Control Tags</a></h3>

<p>There are three control tags that allow you to alter the general character of the 
speaking voice. Microsoft has identified several characteristics of playback voices that 
can be altered using control tags. However, your TTS engine may not recognize all of them. 
The three control tags in this group are 

<ul>
  <li><tt><font FACE="Courier">Chr</font></tt>: Used to set the character of the voice. </li>
  <li><tt><font FACE="Courier">Ctx</font></tt>: Used to set the context of the spoken text. </li>
  <li><tt><font FACE="Courier">Vce</font></tt>: Used to set additional characteristics of the 
    voice, including language, accent, dialect, gender, name, and age. </li>
</ul>

<h4>Using the <tt><font FACE="Courier">Chr</font></tt> Tag to Set the Voice Character</h4>

<p>The <tt><font FACE="Courier">Chr</font></tt> tag allows you to set the general 
character of the voice. The syntax for the <tt><font FACE="Courier">Chr</font></tt> tag is 
</p>

<blockquote>
  <tt><font FACE="Courier"><p>\Chr=string[[,string...]]\</font></tt> </p>
</blockquote>

<p>More than one characteristic can be applied at the same time. The default value is <tt><font
FACE="Courier">Normal</font></tt>. Others that are recognized by the Microsoft Voice TTS 
engine are <tt><font FACE="Courier">Monotone</font></tt> and <tt><font FACE="Courier">Whisper</font></tt>. 
Additional characteristics suggested by Microsoft are </p>
<div align="center"><center>

<table BORDERCOLOR="#000000" BORDER="1" WIDTH="80%">
  <tr>
    <td><tt><font FACE="Courier">Angry</font></tt></td>
    <td WIDTH="102"><font FACE="Courier">Business</font></td>
    <td WIDTH="80"><font FACE="Courier">Calm</font> </td>
    <td WIDTH="112"><font FACE="Courier">Depressed</font></td>
    <td WIDTH="78"><font FACE="Courier">Excited</font></td>
  </tr>
  <tr>
    <td WIDTH="103"><tt><font FACE="Courier">Falsetto</font></tt> </td>
    <td WIDTH="102"><font FACE="Courier">Happy</font></td>
    <td WIDTH="80"><font FACE="Courier">Loud</font> </td>
    <td WIDTH="112"><font FACE="Courier">Perky</font></td>
    <td WIDTH="78"><font FACE="Courier">Quiet</font> </td>
  </tr>
  <tr>
    <td WIDTH="103"><tt><font FACE="Courier">Sarcastic</font></tt> </td>
    <td WIDTH="102"><font FACE="Courier">Scared</font></td>
    <td WIDTH="80"><font FACE="Courier">Shout</font> </td>
    <td WIDTH="112"><font FACE="Courier">Tense</font></td>
    <td WIDTH="78"><font FACE="Courier">&nbsp;</font> </td>
  </tr>
</table>
</center></div>

<p>To test the <tt><font FACE="Courier">Chr</font></tt> tag, enter the text shown in 
Listing 18.1 into the input box of <tt><font FACE="Courier">TTSTEST.EXE</font></tt>. </p>

<hr>

<blockquote>
  <b><p>Listing 18.1. Testing the <tt><font FACE="Courier">Chr</font></tt> control tag.<br>
  </b></p>
</blockquote>

<blockquote>
  <tt><font FACE="Courier"><p>\chr=&quot;monotone&quot;\<br>
  How are you today?<br>
  \chr=&quot;whisper&quot;\<br>
  I am fine.<br>
  \chr=&quot;normal&quot;\<br>
  Good to hear.</font></tt> </p>
</blockquote>

<hr>

<p>Each sentence will be spoken using a different characteristic. After entering the text, 
press the <tt><font FACE="Courier">TextData</font></tt> button to hear the results. </p>
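<p>The text of Listing 18.1 can also be generated from data. The Python sketch below only builds the string; it does not talk to <tt><font FACE="Courier">TTSTEST.EXE</font></tt> or the engine. It assumes that each characteristic is quoted individually, as in Listing 18.1, and that multiple characteristics are separated by commas, as the syntax line suggests. The names <tt><font FACE="Courier">chr_tag</font></tt> and <tt><font FACE="Courier">build_script</font></tt> are hypothetical. </p>

```python
def chr_tag(*characteristics: str) -> str:
    """Build a voice character tag from one or more characteristic names.

    Assumption: each name is wrapped in double quotes and names are
    comma-separated; check this against your own TTS engine.
    """
    if not characteristics:
        raise ValueError("at least one characteristic is required")
    quoted = ",".join(f'"{name}"' for name in characteristics)
    return f"\\chr={quoted}\\"


def build_script(lines) -> str:
    """Turn (characteristic, sentence) pairs into tag and sentence lines."""
    output = []
    for characteristic, sentence in lines:
        output.append(chr_tag(characteristic))
        output.append(sentence)
    return "\n".join(output)


listing = build_script([
    ("monotone", "How are you today?"),
    ("whisper", "I am fine."),
    ("normal", "Good to hear."),
])
print(listing)
```

<p>The printed result has six lines, alternating a tag line and a sentence, which can be pasted into the input box. </p>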

<h4>Using the <tt><font FACE="Courier">Ctx</font></tt> Tag to Set the Message Context</h4>

<p>Another valuable control tag is <tt><font FACE="Courier">Ctx</font></tt>, the context 
tag. You can use this tag to tell the TTS engine the context of the message you are asking 
it to render. Like the <tt><font FACE="Courier">Chr</font></tt> tag, the <tt><font
FACE="Courier">Ctx</font></tt> tag takes a <tt><font FACE="Courier">string</font></tt> as its 
parameter. Microsoft has defined the strings in Table 18.1 for the context tag. </p>

<p align="center"><b>Table 18.1. The context tag parameters.</b> </p>
<div align="center"><center>

<table BORDERCOLOR="#000000" BORDER="1" WIDTH="80%">
  <tr>
    <td><i>Context Tag Parameter</i></td>
    <td WIDTH="317"><i>Description</i> </td>
  </tr>
  <tr>
    <td WIDTH="170">Address</td>
    <td WIDTH="317">Addresses and/or phone numbers. </td>
  </tr>
  <tr>
    <td WIDTH="170">C</td>
    <td WIDTH="317">Code in the C or C++ programming language. </td>
  </tr>
  <tr>
    <td WIDTH="170">Document</td>
    <td WIDTH="317">Text document.</td>
  </tr>
  <tr>
    <td WIDTH="170">E-Mail</td>
    <td WIDTH="317">Electronic mail.</td>
