draft-ietf-speechsc-mrcpv2-05.txt
SIP URI refers to the MRCPv2 server.
S Shanmugham IETF-Draft Page 5
MRCPv2 Protocol October, 2004
The session management protocol (SIP) will use SDP with the
offer/answer model described in RFC 3264 to describe and set up the
MRCPv2 control channels. Separate MRCPv2 control channels are needed
for controlling the different media processing resources associated
with that session. Within a SIP session, the individual resource
control channels for the different resources are added or removed
through the SDP offer/answer model and the SIP re-INVITE dialog.
The server, through the SDP exchange, provides the client with a
unique channel identifier and a port number (TCP or SCTP). The client
MAY then open a new TCP connection with the server using this port
number. Multiple MRCPv2 channels can share a TCP connection between
the client and the server. All MRCPv2 messages exchanged between the
client and the server will also carry the specified channel
identifier that MUST be unique among all MRCPv2 control channels
that are active on that server. The client can use this channel to
control the media processing resource associated with that channel.
The session management protocol (SIP) will also establish media
pipes between the client (or source/sink of media) and the MRCP
server using SDP m-lines. A media pipe may be shared by one or more
media processing resources under that SIP session or each media
processing resource may have its own media pipe.
    MRCPv2 client                   MRCPv2 Media Resource Server
|--------------------|             |-----------------------------|
||------------------||             ||---------------------------||
|| Application Layer||             || TTS  | ASR  | SV   | SI   ||
||------------------||             ||Engine|Engine|Engine|Engine||
||Media Resource API||             ||---------------------------||
||------------------||             || Media Resource Management ||
|| SIP  |  MRCPv2   ||             ||---------------------------||
||Stack |           ||             ||   SIP  |    MRCPv2        ||
||      |           ||             ||  Stack |                  ||
||------------------||             ||---------------------------||
||   TCP/IP Stack   ||----MRCPv2---||       TCP/IP Stack        ||
||                  ||             ||                           ||
||------------------||-----SIP-----||---------------------------||
|--------------------|             |-----------------------------|
         |                             /
        SIP                           /
         |                           /
|-------------------|              RTP
|                   |              /
| Media Source/Sink |-------------/
|                   |
|-------------------|
Fig 1: Architectural Diagram
MRCPv2 Media Resource Types:
The MRCP server may offer one or more of the following media
processing resources to its clients.
Basic Synthesizer
A speech synthesizer resource with very limited capabilities, which
can be implemented by playing out concatenated audio file clips. The
speech data is described as SSML data but with limited support for
its elements. It MUST support the <speak>, <audio>, <say-as> and
<mark> tags in SSML.
Speech Synthesizer
A full-capability speech synthesizer capable of rendering regular
speech. It SHOULD have full SSML support.
Recorder
A resource capable of recording audio and saving it to a URI. It
also has some end-pointing capabilities for detecting the beginning
of speech and silence at the end of a recording.
DTMF Recognizer
A limited, DTMF-only recognizer that is able to recognize DTMF digits
in the input stream and match them against a supplied digit grammar.
It could also perform semantic interpretation based on semantic tags
in the grammar.
Speech Recognizer
A full speech recognizer that is capable of receiving audio and
interpreting it into recognition results. It also has a natural
language semantic interpreter to post-process the recognized data
according to the semantic data in the grammar and provide semantic
results along with the recognized input. The recognizer may also
support enrolled grammars, where the client can enroll and create
new personal grammars for use in future recognition operations.
Speaker Verification
A resource capable of verifying the authenticity of a person by
matching the person's voice to a saved voice-print. This may also
involve matching the caller's voice against more than one
voice-print, also called multi-verification or speaker
identification.
3.1. Server and Resource Addressing
The MRCPv2 server as a whole is a generic SIP server and is
addressed by a specific SIP URI registered by the server.
Example:
sip:mrcpv2@mediaserver.com
4. MRCPv2 Protocol Basics
MRCPv2 requires the use of a connection-oriented transport layer
protocol such as TCP or SCTP to guarantee reliable sequencing and
delivery of MRCPv2 control messages between the client and the
server. If security is needed, a TLS connection is used to carry
MRCPv2 messages. One or more TCP, SCTP or TLS connections between
the client and the server can be shared between different MRCPv2
channels to the server. The individual messages carry the channel
identifier to differentiate messages on different channels. The
message format for MRCPv2 is text based, with mechanisms to carry
embedded binary data. This allows data such as recognition grammars,
recognition results and synthesizer speech markup to be carried in
the MRCPv2 message between the client and the server resource. The
protocol does not address session and media establishment and
management, and relies on SIP and SDP to do this.
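The way messages on a shared connection are told apart by their
channel identifier can be illustrated with a small dispatcher. This
is only a sketch: the ChannelRouter class and its method names are
hypothetical, and it assumes each incoming message has already been
parsed far enough to extract its channel identifier, since the
message syntax is not covered in this part of the document.

```python
class ChannelRouter:
    """Hypothetical demultiplexer for one shared transport connection."""

    def __init__(self):
        # Maps a channel identifier string to the callable handling it.
        self._handlers = {}

    def register(self, channel_identifier, handler):
        if channel_identifier in self._handlers:
            raise ValueError(f"channel already registered: {channel_identifier}")
        self._handlers[channel_identifier] = handler

    def unregister(self, channel_identifier):
        self._handlers.pop(channel_identifier, None)

    def dispatch(self, channel_identifier, message):
        # Every message carries its channel identifier, so one lookup
        # is enough to find the resource it belongs to.
        handler = self._handlers.get(channel_identifier)
        if handler is None:
            raise KeyError(f"no such channel: {channel_identifier}")
        return handler(message)


if __name__ == "__main__":
    synth_log, recog_log = [], []
    router = ChannelRouter()
    router.register("32AECB234338@speechsynth", synth_log.append)
    router.register("32AECB234339@speechrecog", recog_log.append)
    router.dispatch("32AECB234338@speechsynth", "message for the synthesizer")
    print(synth_log, recog_log)
```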
4.1. Connecting to the Server
The MRCPv2 protocol depends on a session establishment and
management protocol such as SIP in conjunction with SDP. The client
finds and reaches an MRCPv2 server across the SIP network using the
INVITE and other SIP dialog exchanges. The SDP offer/answer exchange
model over SIP is used to establish resource control channels for
each resource. The SDP offer/answer exchange is also used to
establish media pipes between the source or sink of audio and the
server.
4.2. Managing Resource Control Channels
The client needs a separate MRCPv2 resource control channel to
control each media processing resource under the SIP session. A
unique channel identifier string identifies these resource control
channels. The channel identifier string consists of a hexadecimal
number specifying the channel ID, followed by an "@" and a string
token specifying the type of resource. The server
generates the hexadecimal channel ID and MUST make sure it does not
clash with any other MRCP channel allocated to that server. MRCPv2
defines the following types of media processing resources.
Resource Type      Resource Description
speechrecog        Speech Recognition
dtmfrecog          DTMF Recognition
speechsynth        Speech Synthesis
basicsynth         Poorman's Speech Synthesizer
speakverify        Speaker Verification
recorder           Speech Recording
Additional resource types, along with their associated methods,
events and state machines, can be added by future specifications
that extend the capabilities of MRCPv2.
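The channel identifier format (a hexadecimal channel ID, an "@", and
a resource type token) can be illustrated with a short sketch. The
function and class names below are hypothetical, and the length of
the generated ID and the use of a random generator are assumptions.
The text only requires that the ID be hexadecimal and not clash with
any other channel on the server.

```python
import re
import secrets

# Resource type tokens taken from the table above.
RESOURCE_TYPES = {
    "speechrecog", "dtmfrecog", "speechsynth",
    "basicsynth", "speakverify", "recorder",
}

_CHANNEL_RE = re.compile(r"([0-9A-Fa-f]+)@([a-z]+)")


def parse_channel_identifier(value):
    """Split '<hex-id>@<resource-type>' into (hex_id, resource_type)."""
    match = _CHANNEL_RE.fullmatch(value)
    if match is None:
        raise ValueError(f"malformed channel identifier: {value!r}")
    hex_id, resource_type = match.groups()
    if resource_type not in RESOURCE_TYPES:
        raise ValueError(f"unknown resource type: {resource_type!r}")
    return hex_id, resource_type


class ChannelAllocator:
    """Hypothetical server-side helper that hands out unique IDs."""

    def __init__(self):
        self._active = set()  # hex IDs currently in use on this server

    def allocate(self, resource_type):
        if resource_type not in RESOURCE_TYPES:
            raise ValueError(f"unknown resource type: {resource_type!r}")
        while True:
            # Assumption: 12 hex digits; retry on the unlikely collision.
            hex_id = secrets.token_hex(6).upper()
            if hex_id not in self._active:
                self._active.add(hex_id)
                return f"{hex_id}@{resource_type}"

    def release(self, channel_identifier):
        hex_id, _ = parse_channel_identifier(channel_identifier)
        self._active.discard(hex_id)


if __name__ == "__main__":
    allocator = ChannelAllocator()
    channel = allocator.allocate("speechsynth")
    print(channel, parse_channel_identifier(channel))
```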
The SIP INVITE or re-INVITE dialog exchange, and the SDP offer/answer
exchange it carries, will contain m-lines describing the resource
control channels the client wants to allocate. There MUST be one SDP
m-line for each MRCPv2 resource that needs to be controlled. This
m-line will have a media type field of "control" and a transport type
field of "TCP", "SCTP" or "TCP/TLS". The port number field of the
m-line MUST contain the discard port of the transport protocol (say
port 9 for TCP) in the SDP offer from the client and MUST contain the
TCP listen port on the server in the SDP answer. The client may then
set up a TCP or TLS connection to that server port or share an
already established connection to that port. The format field of the
m-line MUST contain "application/mrcpv2". The client must specify
the resource type identifier in the resource attribute associated
with the control m-line of the SDP offer. The server MUST respond
with the full Channel-Identifier (which includes the resource type
identifier and a unique hexadecimal identifier) in the "channel"
attribute associated with the control m-line of the SDP answer.
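The field rules above can be summarized in a sketch that assembles
the control m-line for an offer and for an answer. The function names
are hypothetical. The attribute lines use the generic SDP
"a=<name>:<value>" form with the attribute names "resource" and
"channel" given in the text; their exact syntax is an assumption
here, and the port 32416 in the usage example is made up.

```python
DISCARD_PORT = 9  # placeholder port the client puts in its offer
TRANSPORTS = ("TCP", "SCTP", "TCP/TLS")
FORMAT = "application/mrcpv2"


def control_offer(resource_type, transport="TCP"):
    """SDP lines a client would offer for one resource control channel."""
    if transport not in TRANSPORTS:
        raise ValueError(f"unsupported transport: {transport!r}")
    return [
        f"m=control {DISCARD_PORT} {transport} {FORMAT}",
        f"a=resource:{resource_type}",
    ]


def control_answer(listen_port, channel_identifier, transport="TCP"):
    """SDP lines a server would answer with, carrying its listen port."""
    if transport not in TRANSPORTS:
        raise ValueError(f"unsupported transport: {transport!r}")
    return [
        f"m=control {listen_port} {transport} {FORMAT}",
        f"a=channel:{channel_identifier}",
    ]


if __name__ == "__main__":
    print("\n".join(control_offer("speechsynth")))
    print("\n".join(control_answer(32416, "32AECB234338@speechsynth")))
```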
All servers MUST support TLS, SHOULD support TCP and MAY support
SCTP. It is up to the client to choose which mode of transport it
wants to use for an MRCPv2 session. When using TCP, SCTP or TLS, the
m-lines MUST conform to IETF draft [20], which describes the usage of
SDP for connection-oriented transport. When using TLS, the SDP m-line
for the control pipe MUST conform to IETF draft [21] in addition to
IETF draft [20]. IETF draft [21] specifies the usage of SDP for
establishing a secure connection-oriented transport over TLS.
When the client wants to add a media processing resource to the
session, it MUST initiate a re-INVITE dialog. The SDP offer/answer
exchange contained in this SIP dialog will contain an additional
control m-line for the new resource that needs to be allocated. The
server, on seeing the new m-line, will allocate the resource and
respond with a corresponding control m-line in the SDP answer
response.
The a=setup attribute as described in [20] MUST be "active" for the
offer from the client and MUST be "passive" for the answer from the
MRCP server. The a=connection attribute MUST have a value of "new"
on the very first control m-line offer from the client to an MRCP
server. Subsequent control m-line offers from the client to the
MRCP server MAY contain "new" or "existing", depending on whether
the client wants to share an existing connection-oriented pipe. The
value of "existing" tells the server that the client wants to reuse
an existing transport connection between the client and the server.
The server can respond with a value of "existing" if it wants to
allow sharing of existing pipes, or can reply with a value of "new",
in which case the client MUST initiate a new connection-oriented
pipe.
Note: Only SDP m-lines having a common SDP format field of
"application/mrcpv2" can share connection-oriented pipes between
them. Such a pipe is reserved exclusively for MRCPv2 communication
and cannot be shared with any other protocol.
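The a=setup and a=connection rules can be sketched as three small
functions. All names are hypothetical. The sketch assumes the server
answers "existing" only when the client offered "existing"; the text
does not say what the server may answer to an offer of "new".

```python
def offer_attributes(is_first_control_line, want_reuse=False):
    """Attribute values for a client's control m-line offer."""
    # The very first control m-line offer must always ask for "new".
    reuse = want_reuse and not is_first_control_line
    return {"setup": "active", "connection": "existing" if reuse else "new"}


def answer_attributes(offered_connection, server_allows_sharing):
    """Attribute values for the server's control m-line answer."""
    if offered_connection not in ("new", "existing"):
        raise ValueError(f"bad a=connection value: {offered_connection!r}")
    share = offered_connection == "existing" and server_allows_sharing
    return {"setup": "passive", "connection": "existing" if share else "new"}


def client_must_open_connection(answer):
    """True when the answer obliges the client to open a new pipe."""
    return answer["connection"] == "new"


if __name__ == "__main__":
    offer = offer_attributes(is_first_control_line=False, want_reuse=True)
    answer = answer_attributes(offer["connection"], server_allows_sharing=False)
    print(offer, answer, client_must_open_connection(answer))
```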
When the client wants to de-allocate the resource from this session,
it MUST initiate a SIP re-INVITE dialog with the server and MUST
offer the control m-line with a port of 0. The server MUST then
answer the control m-line with a port of 0. This de-allocates the
associated MRCP identifier and resource. It does not necessarily
close the TCP, SCTP or TLS connection, however, since that connection
may currently be shared among multiple MRCP channels. When all MRCP
channels that may be sharing the connection are released and the
associated SIP connections are closed, the client or server
disconnects the shared connection-oriented pipe.
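The de-allocation behaviour can be sketched as a port-0 m-line plus a
record of which channels still use a shared connection. The names
below are hypothetical. As a simplification, the sketch closes the
connection as soon as the last channel is released and does not model
the closing of the associated SIP connections.

```python
def dealloc_control_line(transport="TCP"):
    """Control m-line with port 0, as offered or answered on release."""
    return f"m=control 0 {transport} application/mrcpv2"


class SharedConnection:
    """Hypothetical record of one transport connection and its users."""

    def __init__(self):
        self.channels = set()  # channel identifiers using this pipe
        self.is_open = True

    def attach(self, channel_identifier):
        if not self.is_open:
            raise RuntimeError("connection already closed")
        self.channels.add(channel_identifier)

    def release(self, channel_identifier):
        """Drop one channel and close the pipe once none remain.

        Returns True while the connection stays open.
        """
        self.channels.discard(channel_identifier)
        if not self.channels:
            self.is_open = False
        return self.is_open


if __name__ == "__main__":
    pipe = SharedConnection()
    pipe.attach("32AECB234338@speechsynth")
    pipe.attach("32AECB234339@speechrecog")
    print(dealloc_control_line())
    print(pipe.release("32AECB234338@speechsynth"))
    print(pipe.release("32AECB234339@speechrecog"))
```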
Example 1:
This exchange adds a resource control channel for a synthesizer.
Since a synthesizer would be generating an audio stream, this
interaction also creates a receive-only audio stream for the server
to send audio to.
C->S:
INVITE sip:mresources@mediaserver.com SIP/2.0
Via: SIP/2.0/TCP client.atlanta.example.com:5060;
branch=z9hG4bK74bf9
Max-Forwards: 6
To: MediaServer <sip:mresources@mediaserver.com>
From: sarvi <sip:sarvi@cisco.com>;tag=1928301774
Call-ID: a84b4c76e66710