Internet Engineering Task Force Saravanan Shanmugham
Internet-Draft Cisco Systems Inc.
draft-ietf-speechsc-mrcpv2-05 October 18, 2004
Expires: April 18, 2005
Media Resource Control Protocol Version 2 (MRCPv2)
Status of this Memo
By submitting this Internet-Draft, we certify that any applicable
patent or other IPR claims of which we are aware have been
disclosed, and any of which we become aware will be disclosed, in
accordance with RFC 3668.
Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF), its areas, and its working groups. Note that
other groups may also distribute working documents as Internet-
Drafts.
Internet-Drafts are draft documents valid for a maximum of six
months and may be updated, replaced, or obsoleted by other documents
at any time. It is inappropriate to use Internet-Drafts as
reference material or to cite them other than as "work in progress".
The list of current Internet-Drafts can be accessed at
http://www.ietf.org/ietf/1id-abstracts.txt .
The list of Internet-Draft Shadow Directories can be accessed at
http://www.ietf.org/shadow.html .
This Internet-Draft will expire on April 18, 2005.
Copyright Notice
Copyright (C) The Internet Society (2004). All Rights Reserved.
Abstract
This document describes a proposal for a Media Resource Control
Protocol Version 2 (MRCPv2) and aims to meet the requirements
specified in the SPEECHSC working group requirements document. It is
based on the Media Resource Control Protocol (MRCP), also called
S. Shanmugham, et. al. Page 1
MRCPv2 Protocol October, 2004
MRCPv1 developed jointly by Cisco Systems, Inc., Nuance
Communications, and Speechworks Inc.
The MRCPv2 protocol will control media service resources such as speech
synthesizers, recognizers, signal generators, signal detectors and fax
servers over a network. This protocol depends on a session
management protocol such as the Session Initiation Protocol (SIP) to
establish a separate MRCPv2 control session between the client and
the server. It also depends on SIP to establish the media pipe and
associated parameters between the media source or sink and the media
server. Once this is done, the MRCPv2 protocol exchange can happen
over the control session established above, allowing the client to
command and control the media processing resources that may exist on
the media server.
Table of Contents
Status of this Memo..............................................1
Copyright Notice.................................................1
Abstract.........................................................1
Table of Contents................................................2
1. Introduction:...............................................4
2. Notational Convention.......................................5
3. Architecture:...............................................5
3.1. MRCPv2 Media Resources:....................................7
3.2. Server and Resource Addressing.............................8
4. MRCPv2 Protocol Basics......................................8
4.1. Connecting to the Server...................................8
4.2. Managing Resource Control Channels.........................8
4.3. Media Streams and RTP Ports...............................15
4.4. MRCPv2 Message Transport..................................16
4.5. Resource Types............................................17
5. MRCPv2 Specification.......................................17
5.1. Request...................................................18
5.2. Response..................................................19
5.3. Event.....................................................20
6. MRCP Generic Features......................................21
6.1. Generic Message Headers...................................21
6.2. SET-PARAMS................................................30
6.3. GET-PARAMS................................................30
7. Resource Discovery.........................................31
8. Speech Synthesizer Resource................................32
8.1. Synthesizer State Machine.................................33
8.2. Synthesizer Methods.......................................33
8.3. Synthesizer Events........................................34
8.4. Synthesizer Header Fields.................................34
8.5. Synthesizer Message Body..................................40
8.6. SPEAK.....................................................43
8.7. STOP......................................................44
8.8. BARGE-IN-OCCURRED.........................................45
8.9. PAUSE.....................................................47
8.10. RESUME....................................................48
8.11. CONTROL...................................................49
8.12. SPEAK-COMPLETE............................................50
8.13. SPEECH-MARKER.............................................51
8.14. DEFINE-LEXICON............................................52
9. Speech Recognizer Resource.................................53
9.1. Recognizer State Machine..................................54
9.2. Recognizer Methods........................................54
9.3. Recognizer Events.........................................55
9.4. Recognizer Header Fields..................................55
9.5. Recognizer Message Body...................................69
9.6. DEFINE-GRAMMAR............................................83
9.7. RECOGNIZE.................................................87
9.8. STOP......................................................89
9.9. GET-RESULT................................................90
9.10. START-OF-SPEECH...........................................91
9.11. START-INPUT-TIMERS........................................92
9.12. RECOGNITION-COMPLETE......................................92
9.13. START-PHRASE-ENROLLMENT...................................94
9.14. ENROLLMENT-ROLLBACK.......................................95
9.15. END-PHRASE-ENROLLMENT.....................................96
9.16. MODIFY-PHRASE.............................................96
9.17. DELETE-PHRASE.............................................97
9.18. INTERPRET.................................................97
9.19. INTERPRETATION-COMPLETE...................................98
9.20. DTMF Detection...........................................100
10. Recorder Resource.........................................100
10.1. Recorder State Machine...................................100
10.2. Recorder Methods.........................................100
10.3. Recorder Events..........................................100
10.4. Recorder Header Fields...................................101
10.5. Recorder Message Body....................................105
10.6. RECORD...................................................105
10.7. STOP.....................................................106
10.8. RECORD-COMPLETE..........................................107
10.9. START-INPUT-TIMERS.......................................107
11. Speaker Verification and Identification...................109
11.1. Speaker Verification State Machine.......................110
11.2. Speaker Verification Methods.............................110
11.3. Verification Events......................................111
11.4. Verification Header Fields...............................111
11.5. Verification Result Elements.............................119
11.6. START-SESSION............................................123
11.7. END-SESSION..............................................124
11.8. QUERY-VOICEPRINT.........................................124
11.9. DELETE-VOICEPRINT........................................125
11.10. VERIFY..................................................126
11.11. VERIFY-FROM-BUFFER......................................126
11.12. VERIFY-ROLLBACK.........................................129
11.13. STOP....................................................130
11.14. START-INPUT-TIMERS......................................131
11.15. VERIFICATION-COMPLETE...................................131
11.16. START-OF-SPEECH.........................................132
11.17. CLEAR-BUFFER............................................132
11.18. GET-INTERMEDIATE-RESULT.................................132
12. Security Considerations...................................133
13. Examples:.................................................133
14. Reference Documents.......................................145
15. Appendix..................................................146
15.1. ABNF Message Definitions.................................146
15.2. XML Schema and DTD.......................................161
Full Copyright Statement.......................................168
Intellectual Property..........................................169
Contributors...................................................169
Acknowledgements...............................................170
Editors' Addresses.............................................170
1. Introduction:
The MRCPv2 protocol is designed to allow a client device to control
media processing resources on the network, so that it can generate
and process audio/video streams. Examples of such media processing
resources are speech recognition engines, speech synthesis engines,
and speaker verification or speaker identification engines. This
allows a vendor to implement distributed Interactive Voice Response
platforms such as VoiceXML [7] browsers.
The SPEECHSC requirements state that the protocol must be capable of
reaching a media processing server, setting up communication channels
to the media resources, and sending and receiving control messages and
media streams to and from the server. The Session Initiation Protocol
(SIP) described in [4] meets these requirements and is used to set up
and tear down media and control pipes to the server. In addition, a
SIP re-INVITE can be used to change the characteristics of these media
and control pipes mid-session. MRCPv2 is therefore designed to
leverage and build upon session management protocols such as SIP and
the Session Description Protocol (SDP). SDP is used to describe the
parameters of the media pipe associated with the session. It is
mandatory to support SIP as the session-level protocol to ensure
interoperability. Other protocols can be used at the session level by
prior agreement.
The MRCPv2 protocol depends on SIP and SDP to create the session and
set up the media channels to the server. It also depends on SIP and
SDP to establish MRCPv2 control channels between the client and the
server, one for each media processing resource required for that
session. The MRCPv2 protocol exchange between the client and the
media resource can then happen on that control channel. The MRCPv2
protocol exchange on this control channel does not change the state
of the SIP session, the media, or other parameters of the SIP-initiated
session. It merely controls and affects the state of the media
processing resource associated with that MRCPv2 channel.
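To make the one-control-channel-per-resource setup concrete, here is a minimal Python sketch that builds an SDP offer. It illustrates the idea under stated assumptions and is not the format defined by this draft revision. The `TCP/MRCPv2` protocol token and the `setup`, `connection`, `resource`, `cmid` and `mid` attributes follow the syntax later standardized in RFC 6787. The function `build_offer`, the addresses and the port numbers are hypothetical.

```python
def build_offer(resources: list[str], audio_port: int, media_id: str = "1") -> str:
    """Build a hypothetical SDP offer: one MRCPv2 control channel per
    resource, plus one RTP audio stream that all of them share.

    Attribute names are assumed from RFC 6787, not taken from this draft."""
    lines = [
        "v=0",
        "o=client 2890844526 2890844526 IN IP4 192.0.2.10",  # documentation address
        "s=-",
        "c=IN IP4 192.0.2.10",
        "t=0 0",
    ]
    for resource in resources:
        lines += [
            "m=application 9 TCP/MRCPv2 1",  # port 9: client opens the TCP connection
            "a=setup:active",
            "a=connection:new",
            f"a=resource:{resource}",
            f"a=cmid:{media_id}",  # binds this control channel to the audio stream
        ]
    lines += [
        f"m=audio {audio_port} RTP/AVP 0",
        "a=rtpmap:0 PCMU/8000",
        "a=sendrecv",
        f"a=mid:{media_id}",
    ]
    return "\r\n".join(lines) + "\r\n"


print(build_offer(["speechsynth", "speechrecog"], 49170))
```

The session is set up once with SIP/SDP, and each requested resource gets its own `m=application` line. The MRCPv2 messages that later flow on those channels leave this SDP description untouched.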
The MRCPv2 protocol defines the messages to control the different
media processing resources and the state machines required to guide
their operation. It also describes how these messages are carried
over a transport layer such as TCP, SCTP or TLS.
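MRCPv2 messages travel over a byte stream such as TCP, so a receiver needs a length to find where each message ends. The sketch below assumes the request start line later standardized in RFC 6787, `MRCP/2.0 message-length method-name request-id`. In that syntax, message-length counts every octet of the message, including the start line itself. That syntax, the function `frame_request` and the sample header are assumptions and are not defined in this section.

```python
def frame_request(method: str, request_id: int,
                  headers: list[tuple[str, str]], body: bytes = b"") -> bytes:
    """Serialize a hypothetical MRCPv2 request (RFC 6787-style start line assumed)."""
    # Header fields, then the empty line that separates headers from the body.
    header_block = "".join(f"{name}:{value}\r\n" for name, value in headers)
    rest = (header_block + "\r\n").encode("utf-8") + body
    length = 0
    while True:
        start_line = f"MRCP/2.0 {length} {method} {request_id}\r\n".encode("ascii")
        total = len(start_line) + len(rest)
        if total == length:  # the length field now describes the whole message
            return start_line + rest
        length = total  # retry with the updated length field


msg = frame_request("SPEAK", 543257,
                    [("Channel-Identifier", "32AECB23433802@speechsynth")])
```

The loop is needed because the length field is part of what it measures. Adding a digit to it makes the message one octet longer. The computed length can only grow, so the loop terminates after a few passes.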
2. Notational Convention
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY" and "OPTIONAL" in this
document are to be interpreted as described in RFC 2119 [9].
Since many of the definitions and syntax are identical to HTTP/1.1,
this specification only points to the sections where they are defined
rather than copying them. For brevity, [HX.Y] refers to Section X.Y
of the current HTTP/1.1 specification (RFC 2616 [1]).
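Header fields are one example of a definition borrowed from HTTP/1.1, and they follow [H4.2]:

- Field names are case-insensitive.
- A line beginning with a space or horizontal tab continues the previous field.
- Repeated fields can be combined into one comma-separated value.

A minimal Python parser for such a header block might look as follows. The function name `parse_headers` and the sample header values are hypothetical.

```python
def parse_headers(block: str) -> dict[str, str]:
    """Parse a CRLF-separated header block in the style of RFC 2616, Section 4.2."""
    fields: dict[str, str] = {}
    last = None  # lower-cased name of the most recently seen field
    for line in block.split("\r\n"):
        if not line:
            continue  # skip the empty line that terminates the block
        if line[0] in " \t":
            # Folded line: it continues the value of the previous field.
            if last is None:
                raise ValueError("continuation line before any header field")
            fields[last] += " " + line.strip()
            continue
        name, sep, value = line.partition(":")
        if not sep:
            raise ValueError(f"malformed header line: {line!r}")
        last = name.strip().lower()  # field names are case-insensitive
        value = value.strip()
        # A repeated field is appended to the earlier value, comma-separated.
        fields[last] = f"{fields[last]},{value}" if last in fields else value
    return fields
```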
All the mechanisms specified in this document are described in both
prose and an augmented Backus-Naur form (ABNF), which is defined in
detail in RFC 2234 [3].
The complete message format in ABNF is provided in Appendix
Section 15.1 and is the normative format definition.
Media Resource
An entity on the MRCP Server that can be controlled through the
MRCP protocol.
MRCP Server
Aggregate of one or more "Media Resource" entities on a Server,
exposed through the MRCP protocol ("Server" for short).
MRCP Client
An entity controlling one or more Media Resources through the
MRCP protocol ("Client" for short).
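The three definitions can be read as a small data model. A Server aggregates one or more Media Resources, and a Client controls one or more of them. The Python sketch below only illustrates those relationships. The class names, the `control` method and the resource labels are hypothetical and are not protocol elements.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class MediaResource:
    """An entity on the MRCP Server that can be controlled through MRCP."""
    name: str  # hypothetical label, e.g. "speechsynth"


@dataclass
class MrcpServer:
    """Aggregate of one or more Media Resources ("Server" for short)."""
    resources: list[MediaResource]

    def __post_init__(self) -> None:
        if not self.resources:
            raise ValueError("an MRCP Server exposes at least one Media Resource")


@dataclass
class MrcpClient:
    """Controls Media Resources; it acts as a Client once it controls one or more."""
    controlled: list[MediaResource] = field(default_factory=list)

    def control(self, server: MrcpServer, name: str) -> MediaResource:
        # Find the named resource on the server and record that we control it.
        for resource in server.resources:
            if resource.name == name:
                self.controlled.append(resource)
                return resource
        raise LookupError(f"{name!r} is not exposed by this server")
```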
3. Architecture:
The system consists of a client that requires media streams to be
generated or processed, and a media resource server that has the
resources or engines to generate or process these streams. The client
establishes a session using
SIP and SDP with the server to use its media processing resources. A