accurate as the diphone method. However, if the TTS uses the synthesis method for
generating output, it is very easy to modify a few parameters and then create a new
"voice." </p>
<p>Synthesis-based TTS engines require fewer computational resources and less storage
capacity. Synthesis-based systems are a bit more difficult to understand at first, but
they usually let users adjust the tone, speed, and inflection of the voice rather
easily. </p>
<h3><a NAME="TTSDiphoneConcatenation"><b>TTS Diphone Concatenation</b></a> </h3>
<p>The diphone concatenation method of generating speech uses pairs of phonemes (<i>di</i>
meaning two) to produce each sound. These diphones represent the start and end of each
individual speech part. For example, the word <i>pig</i> contains the diphones <i>silence-p,
p-i, i-g, </i>and <i>g-silence.</i> Diphone TTS systems scan the word and then piece
together the correct phoneme pairs to pronounce the word. </p>
<p>These phoneme pairs are produced not by computer synthesis, but from actual recordings
of human voices that have been broken down to their smallest elements and categorized into
the various diphone pairs. Since TTS systems that use diphones are using elements of
actual human speech, they can produce much more human-like output. However, since diphone
pairs are very language-specific, diphone TTS systems are usually dedicated to producing a
single language. Because of this, diphone systems do not do well in environments where
numerous foreign words may be present, or where the TTS might be required to produce
output in more than one language. </p>
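<p>The decomposition described above is mechanical: pad the word's phoneme sequence with silence on both ends, then take each adjacent pair. A minimal sketch in Python (the phoneme spellings follow the <i>pig</i> example; a real TTS engine would first look the word up in a pronunciation dictionary):</p>

```python
def word_to_diphones(phonemes):
    """Split a word's phoneme sequence into diphone pairs,
    padding the sequence with silence at the start and end."""
    padded = ["silence"] + list(phonemes) + ["silence"]
    # Each diphone spans the end of one phoneme and the start of the next.
    return [f"{a}-{b}" for a, b in zip(padded, padded[1:])]

# The word "pig" is the phoneme sequence p, i, g.
print(word_to_diphones(["p", "i", "g"]))
# → ['silence-p', 'p-i', 'i-g', 'g-silence']
```

<p>A concatenative engine would then fetch the recorded waveform for each of these pairs and splice them together.</p>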
<h2><a NAME="GrammarRules"><b><font SIZE="5" COLOR="#FF0000">Grammar Rules</font></b></a></h2>
<p>The final elements of a speech engine are the grammar rules. Grammar rules are used by
speech recognition (SR) software to analyze human speech input and, in the process,
attempt to understand what a person is saying. Most of us suffered through a series of
lessons in grade school where our teachers attempted to show us just how grammar rules
affect our everyday speech patterns. And most of us probably don't remember a great deal
from those lessons, but we all use grammar rules every day without thinking about them, to
express ourselves and make sense of what others say to us. Without an understanding of and
appreciation for the importance of grammars, computer speech recognition systems would not
be possible. </p>
<p>There can be any number of grammars, each composed of a set of rules of speech. Just as
humans must learn to share a common grammar in order to be understood, computers must also
share a common grammar with the speaker in order to convert audio information into text. </p>
<p>Grammars can be divided into three types, each with its own strengths and weaknesses.
The types are:
<ul>
<li><font COLOR="#000000">Context-free grammars</font> </li>
<li><font COLOR="#000000">Dictation grammars</font> </li>
<li><font COLOR="#000000">Limited domain grammars</font> </li>
</ul>
<p>Context-free grammars offer the greatest degree of flexibility when interpreting human
speech. Dictation grammars offer the greatest degree of accuracy when converting spoken
words into printed text. Limited domain grammars offer a compromise between the highly
flexible context-free grammar and the restrictive dictation grammar. </p>
<p>The following sections discuss each grammar type in more detail. </p>
<h3><a NAME="ContextFreeGrammars"><b>Context-Free Grammars</b></a> </h3>
<p>Context-free grammars work on the principle of following established rules to determine
the most likely candidates for the next word in a sentence. Context-free grammars <i>do
not</i> work on the idea that each word should be understood within a context. Rather,
they evaluate the relationship of each word and word phrase to a known set of rules about
what words are possible at any given moment. </p>
<p>The main elements of a context-free grammar are:
<ul>
<li><i>Words</i>-A list of valid words to be spoken </li>
<li><i>Rules</i>-A set of speech structures in which words are used </li>
<li><i>Lists</i>-One or more word sets to be used within rules </li>
</ul>
<p>Context-free grammars are good for systems that have to deal with a wide variety of
input. Context-free systems are also able to handle variable vocabularies. This is because
most of the rule-building done for context-free grammars revolves around declaring lists
and groups of words that fit into common patterns or rules. Once the SR engine understands
the rules, it is very easy to expand the vocabulary by expanding the lists of possible
members of a group. </p>
<p>For example, rules in a context-free grammar might look something like this: </p>
<blockquote>
<tt><font FACE="Courier"><p>&lt;NameRule&gt;=ALT("Mike","Curt","Sharon","Angelique")
<br>
<br>
&lt;SendMailRule&gt;=("Send Email to", &lt;NameRule&gt;) <br>
</font></tt>In the example above, two rules have been established. The first rule, <tt><font
FACE="Courier">&lt;NameRule&gt;</font></tt>, creates a list of possible names. The second
rule, <tt><font FACE="Courier">&lt;SendMailRule&gt;</font></tt>, depends on <tt><font
FACE="Courier">&lt;NameRule&gt;</font></tt>. In this way,
context-free grammars allow you to build your own grammatical rules as a predictor of how
humans will interact with the system. </p>
</blockquote>
<p>Even more importantly, context-free grammars allow for easy expansion at run-time.
Since much of the way context-free grammars operate focuses on lists, it is easy to allow
users to add list members and, therefore, to improve the value of the SR system quickly.
This makes it easy to install a system with only basic components. The basic system can be
expanded to meet the needs of various users. In this way, context-free grammars offer a
high degree of flexibility with very little development cost or complication. </p>
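<p>The run-time expansion described above can be sketched in a few lines. The rule and list names below mirror the NameRule/SendMailRule example earlier in this section; the matching logic is a simplified illustration, not an actual SR engine API:</p>

```python
# The ALT(...) list from NameRule, held as an ordinary Python list.
names = ["Mike", "Curt", "Sharon", "Angelique"]

def matches_send_mail(phrase):
    """True if the phrase fits the rule: "Send Email to" + a name from the list."""
    prefix = "send email to "
    if not phrase.lower().startswith(prefix):
        return False
    # The remainder of the phrase must be a member of the name list.
    return phrase[len(prefix):] in names

print(matches_send_mail("Send Email to Sharon"))  # → True
names.append("Tomas")  # expand the vocabulary at run-time
print(matches_send_mail("Send Email to Tomas"))   # → True
```

<p>Appending to the list immediately widens what the rule accepts; no rule rewriting is required, which is the property that makes context-free grammars cheap to extend in the field.</p>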
<p>The construction of quality context-free grammars can be a challenge, however. Systems
that only need to do a few things (such as load and run programs, execute simple
directives, and so on) are easily expressed using context-free grammars. However, in order
to perform more complex tasks or a wider range of chores, additional rules are needed. As
the number of rules and the length of lists increases, the computational load rises
dramatically. Also, since context-free grammars base their predictions on predefined
rules, they are not good for tasks like dictation, where a large vocabulary is most
important. </p>
<h3><a NAME="DictationGrammars"><b>Dictation Grammars</b></a> </h3>
<p>Unlike context-free grammars that operate using rules, dictation grammars base their
evaluations on vocabulary. The primary function of a dictation grammar is to convert human
speech into text as accurately as possible. In order to do this, dictation grammars need
not only a rich vocabulary to work from, but also a sample output to use as a model when
analyzing speech input. Rules of speech are not important to a system that must simply
convert human input into printed text. </p>
<p>The elements of a dictation grammar are:
<ul>
<li><i>Topic</i>-Identifies the dictation topic (for example, medical or legal). </li>
<li><i>Common</i>-A set of words commonly used in the dictation. Usually the common group
contains technical or specialized words that are expected to appear during dictation, but
are not usually found in regular conversation. </li>
<li><i>Group</i>-A related set of words that can be expected, but that are not directly
related to the dictation topic. The group usually has a set of words that are expected to
occur frequently during dictation. The grammar model can contain more than one group. </li>
<li><i>Sample</i>-A sample of text that shows the writing style of the speaker or general
format of the dictation. This text is used to aid the SR engine in analyzing speech input.
</li>
</ul>
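<p>The four elements listed above can be pictured as a simple record. The field names below are illustrative only (they follow this chapter's terminology, not any real SR engine's API):</p>

```python
from dataclasses import dataclass, field

@dataclass
class DictationGrammar:
    """Hypothetical container for the elements of a dictation grammar."""
    topic: str                                   # e.g. "medical" or "legal"
    common: list                                 # specialized words expected in dictation
    groups: list = field(default_factory=list)   # related word sets, one list per group
    sample: str = ""                             # sample text modeling the writing style

grammar = DictationGrammar(
    topic="legal",
    common=["plaintiff", "defendant", "tort"],
    groups=[["motion", "brief", "filing"]],
    sample="The plaintiff filed a motion to dismiss the complaint.",
)
print(grammar.topic)  # → legal
```

<p>An engine would weight words from <tt><font FACE="Courier">common</font></tt> and <tt><font FACE="Courier">groups</font></tt> more heavily during recognition, and use <tt><font FACE="Courier">sample</font></tt> to model word order.</p>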
<p>The success of a dictation grammar depends on the quality of the vocabulary. The more
items on the list, the greater the chance of the SR engine mistaking one item for another.
However, the more limited the vocabulary, the greater the number of "unknown"
words that will occur during the course of the dictation. The most successful dictation
systems balance vocabulary depth and the uniqueness of the words in the database. For this
reason, dictation systems are usually tuned for one topic, such as legal or medical
dictation. By limiting the vocabulary to the words most likely to occur in the course of
dictation, translation accuracy is increased. </p>
<h3><a NAME="LimitedDomainGrammars"><b>Limited Domain Grammars</b></a> </h3>
<p>Limited domain grammars offer a compromise between the flexibility of context-free
grammars and the accuracy of dictation grammars. Limited domain grammars have the
following elements:
<ul>
<li><i>Words</i>-This is the list of specialized words that are likely to occur during a
session. </li>
<li><i>Group</i>-This is a set of related words that could occur during the session. The
grammar can contain multiple word groups. A single phrase would be expected to include one
of the words in the group. </li>
<li><i>Sample</i>-A sample of text that shows the writing style of the speaker or general
format of the dictation. This text is used to aid the SR engine in analyzing the speech
input. </li>
</ul>
<p>Limited domain grammars are useful in situations where the vocabulary of the system
need not be very large. Examples include systems that use natural language to accept
command statements, such as "How can I set the margins?" or "Replace all
instances of 'New York' with 'Los Angeles.'" Limited domain grammars also work well
for filling in forms or for simple text entry. </p>
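<p>One way to picture how a limited domain grammar narrows the search: score a candidate phrase by how many of its words appear in the grammar's word list and groups. This is a rough sketch under assumed data (the word sets below are invented for the example); a real SR engine's scoring is far more sophisticated:</p>

```python
# Specialized words likely to occur during the session.
WORDS = {"margins", "replace", "set"}
# One group of related words; a phrase is expected to include one of them.
GROUPS = [{"how", "what", "where"}]

def plausibility(phrase):
    """Count how many grammar elements the phrase touches."""
    tokens = set(phrase.lower().strip("?.").split())
    score = len(tokens & WORDS)                   # hits in the word list
    score += sum(1 for g in GROUPS if tokens & g) # one point per group matched
    return score

print(plausibility("How can I set the margins?"))  # → 3
print(plausibility("Play some music"))             # → 0
```

<p>A phrase that scores zero falls outside the domain, so the engine can reject it cheaply instead of attempting a full-vocabulary match.</p>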
<h2><a NAME="Summary"><b><font SIZE="5" COLOR="#FF0000">Summary</font></b></a> </h2>
<p>In this chapter you learned about the key factors behind creating and implementing a
complete speech system for PCs. You learned about the three major parts of speech systems:
<ul>
<li><i>Speech recognition</i>-Converts audio input into printed text or directly into
computer commands. </li>
<li><i>Text-to-speech</i>-Converts printed text into audible speech. </li>
<li><i>Grammar rules</i>-Used by speech recognition systems to analyze audio input and
convert it into commands or text. </li>
</ul>
<p>In the next chapter, you'll learn the specifics behind the Microsoft speech recognition
engine. </p>
<hr WIDTH="100%">
<p align="center"><a HREF="ch13.htm"><img SRC="pc.gif" BORDER="0" HEIGHT="88" WIDTH="140"></a><a
HREF="#CONTENTS"><img SRC="cc.gif" BORDER="0" HEIGHT="88" WIDTH="140"></a><a
HREF="index.htm"><img SRC="hb.gif" BORDER="0" HEIGHT="88" WIDTH="140"></a> <a
HREF="ch15.htm"><img SRC="nc.gif" BORDER="0" HEIGHT="88" WIDTH="140"></a></p>
<hr WIDTH="100%">
</body>
</html>