draft-ietf-speechsc-mrcpv2-05.txt

    SIP URI refers to the MRCPv2 server.  
  
 S Shanmugham                  IETF-Draft                        Page 5 

                            MRCPv2 Protocol              October, 2004 

     
    The session management protocol (SIP) will use SDP with the 
    offer/answer model described in RFC 3264 to describe and set up the 
    MRCPv2 control channels. Separate MRCPv2 control channels are needed 
    for controlling the different media processing resources associated 
    with that session. Within a SIP session, the individual resource 
    control channels for the different resources are added or removed 
    through the SDP offer/answer model and the SIP re-INVITE dialog. 
     
    The server, through the SDP exchange, provides the client with a 
    unique channel identifier and a port number (TCP or SCTP). The client 
    MAY then open a new TCP connection with the server using this port 
    number. Multiple MRCPv2 channels can share a TCP connection between 
    the client and the server. All MRCPv2 messages exchanged between the 
    client and the server will also carry the specified channel 
    identifier that MUST be unique among all MRCPv2 control channels 
    that are active on that server. The client can use this channel to 
    control the media processing resource associated with that channel. 
     
    The session management protocol (SIP) will also establish media 
    pipes between the client (or source/sink of media) and the MRCP 
    server using SDP m-lines. A media pipe may be shared by one or more 
    media processing resources under that SIP session, or each media 
    processing resource may have its own media pipe.  
     
         MRCPv2 client                  MRCPv2 Media Resource Server 
      |--------------------|             |-----------------------------| 
      ||------------------||             ||---------------------------|| 
      || Application Layer||             || TTS  | ASR  | SV   | SI   ||  
      ||------------------||             ||Engine|Engine|Engine|Engine|| 
      ||Media Resource API||             ||---------------------------|| 
      ||------------------||             || Media Resource Management || 
      || SIP  |  MRCPv2   ||             ||---------------------------|| 
      ||Stack |           ||             ||   SIP  |    MRCPv2        || 
      ||      |           ||             ||  Stack |                  || 
      ||------------------||             ||---------------------------|| 
      ||   TCP/IP Stack   ||----MRCPv2---||       TCP/IP Stack        || 
      ||                  ||             ||                           || 
      ||------------------||-----SIP-----||---------------------------|| 
      |--------------------|             |-----------------------------|               
               |                             / 
              SIP                           / 
               |                           /            
      |-------------------|              RTP 
      |                   |              / 
      | Media Source/Sink |-------------/ 
      |                   | 
      |-------------------| 
  
                     Fig 1: Architectural Diagram 
     
   MRCPv2 Media Resource Types: 
     
    The MRCP server may offer one or more of the following media 
    processing resources to its clients. 
     
    Basic Synthesizer 
     
    A speech synthesizer resource with very limited capabilities, which 
    can be achieved by playing out concatenated audio file clips. The 
    speech data is described as SSML data but with limited support for 
    its elements. It MUST support the <speak>, <audio>, <say-as> and 
    <mark> tags in SSML. 
     
     
    Speech Synthesizer 
     
    A full-capability speech synthesizer that is capable of rendering 
    regular speech and SHOULD have full SSML support.  
     
     
    Recorder 
     
    A resource capable of recording audio and saving it to a URI. It 
    also has some end-pointing capabilities for detecting the beginning 
    of speech and silence at the end of a recording. 
     
     
    DTMF Recognizer 
     
    A limited DTMF-only recognizer that is able to recognize DTMF digits 
    in the input stream and match them against a supplied digit grammar. 
    It could also do a semantic interpretation based on semantic tags in 
    the grammar. 
     
     
    Speech Recognizer 
     
    A full speech recognizer that is capable of receiving audio and 
    interpreting it into recognition results. It also has a natural 
    language semantic interpreter to post-process the recognized data 
    according to the semantic data in the grammar and provide semantic 
    results along with the recognized input. The recognizer may also 
    support enrolled grammars, where the client can enroll and create 
    new personal grammars for use in future recognition operations. 
     
     
    Speaker Verification    
     
    A resource capable of verifying the authenticity of a person by 
    matching his or her voice to a saved voice-print. This may also 
    involve matching the caller's voice against more than one 
    voice-print, also called multi-verification or speaker 
    identification. 
     
 3.1. Server and Resource Addressing 
     
    The MRCPv2 server as a whole is a generic SIP server and is 
    addressed by a specific SIP URI registered by the server.  
     
    Example: 
     
      sip:mrcpv2@mediaserver.com 
  
     
 4.   MRCPv2 Protocol Basics 
     
    MRCPv2 requires the use of a connection-oriented transport layer 
    protocol such as TCP or SCTP to guarantee reliable sequencing and 
    delivery of MRCPv2 control messages between the client and the 
    server. If security is needed, a TLS connection is used to carry 
    MRCPv2 messages. One or more TCP, SCTP or TLS connections between 
    the client and the server can be shared between different MRCPv2 
    channels to the server. The individual messages carry the channel 
    identifier to differentiate messages on different channels. The 
    message format for MRCPv2 is text based with mechanisms to carry 
    embedded binary data. This allows data like recognition grammars, 
    recognition results, synthesizer speech markup etc. to be carried in 
    the MRCPv2 message between the client and the server resource. The 
    protocol does not address session and media establishment and 
    management and relies on SIP and SDP to do this.  
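
    The sharing of one connection by several channels can be sketched 
    as a small demultiplexer. This is only an illustration: the class 
    name, the message strings and the channel identifier values are 
    hypothetical, and the MRCPv2 wire format is not modelled. Each 
    message is assumed to have already been split into its channel 
    identifier and its remaining content. 

```python
from collections import defaultdict


class ChannelDemux:
    """Route messages read from one shared connection to per-channel queues.

    Hypothetical helper for illustration; it is not part of MRCPv2.
    """

    def __init__(self):
        # Maps a channel identifier string to the messages received for it.
        self._queues = defaultdict(list)

    def deliver(self, channel_id, message):
        """Store a message under the channel identifier it carried."""
        self._queues[channel_id].append(message)

    def pending(self, channel_id):
        """Return a copy of the messages queued for one channel."""
        return list(self._queues.get(channel_id, []))


# Two channels (made-up identifiers) multiplexed over the same connection.
demux = ChannelDemux()
demux.deliver("32AECB23433801@speechsynth", "message-1")
demux.deliver("32AECB23433802@speechrecog", "message-2")
demux.deliver("32AECB23433801@speechsynth", "message-3")
```

    Keying the queues on the full channel identifier string keeps the 
    routing independent of which transport connection delivered the 
    message. 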
     
 4.1. Connecting to the Server 
     
    The MRCPv2 protocol depends on a session establishment and 
    management protocol such as SIP in conjunction with SDP. The client 
    finds and reaches an MRCPv2 server across the SIP network using the 
    INVITE and other SIP dialog exchanges. The SDP offer/answer exchange 
    model over SIP is used to establish resource control channels for 
    each resource. The SDP offer/answer exchange is also used to 
    establish media pipes between the source or sink of audio and the 
    server.  
     
      
 4.2. Managing Resource Control Channels 
     
    The client needs a separate MRCPv2 resource control channel to 
    control each media processing resource under the SIP session. A 
    unique channel identifier string identifies these resource control 
    channels. The channel identifier string consists of a hexadecimal 
    number specifying the channel ID, followed by an "@" and a string 
    token specifying the type of resource. The server generates the 
    hexadecimal channel ID and MUST make sure it does not clash with 
    any other MRCP channel allocated to that server. MRCPv2 
    defines the following types of media processing resources. 
    Additional resource types, their associated methods/events and state 
    machines can be added by future specifications that extend the 
    capabilities of MRCPv2. 
     
            Resource Type       Resource Description 
            speechrecog         Speech Recognition 
            dtmfrecog           DTMF Recognition 
            speechsynth         Speech Synthesis 
            basicsynth          Poor Man's Speech Synthesizer 
            speakverify         Speaker Verification 
            recorder            Speech Recording 
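
    As an illustration of the identifier format described above, the 
    following sketch splits a channel identifier into its hexadecimal 
    channel ID and resource type. The function name and the sample 
    value are hypothetical. Rejecting unknown resource types is an 
    assumption of this sketch; an implementation that supports 
    extension resource types would need a wider set. 

```python
import re

# Resource type tokens listed in the table above.
RESOURCE_TYPES = {
    "speechrecog",
    "dtmfrecog",
    "speechsynth",
    "basicsynth",
    "speakverify",
    "recorder",
}

# <hexadecimal channel ID> "@" <resource type token>
_CHANNEL_RE = re.compile(r"([0-9A-Fa-f]+)@([A-Za-z]+)")


def parse_channel_identifier(value):
    """Split 'HEXID@type' into (hex_id, resource_type) or raise ValueError."""
    match = _CHANNEL_RE.fullmatch(value)
    if match is None:
        raise ValueError(f"malformed channel identifier: {value!r}")
    hex_id, resource_type = match.groups()
    if resource_type not in RESOURCE_TYPES:
        raise ValueError(f"unknown resource type: {resource_type!r}")
    return hex_id, resource_type


# Made-up sample value for illustration.
parsed = parse_channel_identifier("32AECB23433801@speechsynth")
```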
     
    The SIP INVITE or re-INVITE dialog exchange, and the SDP offer/answer 
    exchange it carries, will contain m-lines describing the resource 
    control channels the client wants to allocate. There MUST be one SDP 
    m-line for each MRCPv2 resource that needs to be controlled. This 
    m-line will have a media type field of "control" and a transport type 
    field of "TCP", "SCTP" or "TCP/TLS". The port number field of the 
    m-line MUST contain the discard port of the transport protocol (say 
    port 9 for TCP) in the SDP offer from the client and MUST contain the 
    TCP listen port on the server in the SDP answer. The client may then 
    set up a TCP or TLS connection to that server port or share an 
    already established connection to that port. The format field of the 
    m-line MUST contain "application/mrcpv2". The client must specify 
    the resource type identifier in the "resource" attribute associated 
    with the control m-line of the SDP offer. The server MUST respond 
    with the full Channel-Identifier (which includes the resource type 
    identifier and a unique hexadecimal identifier) in the "channel" 
    attribute associated with the control m-line of the SDP answer. 
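
    The rules in this paragraph can be sketched as two small helpers, 
    one that builds the control m-line for an offer and one that reads 
    the server port and channel identifier back out of an answer. The 
    function names, the port 32416 and the channel identifier value 
    are hypothetical. The sketch handles a single control m-line and 
    emits only the lines this paragraph describes, so it is not a 
    complete SDP body. 

```python
DISCARD_PORT = 9  # discard port placed in the client's offer
TRANSPORTS = {"TCP", "SCTP", "TCP/TLS"}


def build_control_offer(resource_type, transport="TCP"):
    """Return the SDP lines offering one MRCPv2 control channel."""
    if transport not in TRANSPORTS:
        raise ValueError(f"unsupported transport: {transport!r}")
    return [
        f"m=control {DISCARD_PORT} {transport} application/mrcpv2",
        f"a=resource:{resource_type}",
    ]


def read_control_answer(sdp_lines):
    """Return (server_port, channel_identifier) from an answer.

    Assumes the answer contains exactly one control m-line.
    """
    port = None
    channel = None
    for line in sdp_lines:
        if line.startswith("m=control "):
            # m=<media> <port> <transport> <format>
            port = int(line.split()[1])
        elif line.startswith("a=channel:"):
            channel = line[len("a=channel:"):]
    if port is None or channel is None:
        raise ValueError("answer lacks a control m-line or channel attribute")
    return port, channel


offer = build_control_offer("speechsynth")
# Made-up answer values for illustration.
answer = read_control_answer([
    "m=control 32416 TCP application/mrcpv2",
    "a=channel:32AECB23433801@speechsynth",
])
```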
     
    All servers MUST support TLS, SHOULD support TCP and MAY support 
    SCTP. It is up to the client to choose which mode of transport it 
    wants to use for an MRCPv2 session. When using TCP, SCTP or TLS, the 
    m-lines MUST conform to IETF draft[20], which describes the usage of 
    SDP for connection-oriented transport. When using TLS, the SDP m-line 
    for the control pipe MUST conform to IETF draft[21] in addition to 
    IETF draft[20]. IETF draft[21] specifies the usage of SDP for 
    establishing a secure connection-oriented transport over TLS. 
     
    When the client wants to add a media processing resource to the 
    session, it MUST initiate a re-INVITE dialog. The SDP offer/answer 
    exchange contained in this SIP dialog will contain an additional 
    control m-line for the new resource that needs to be allocated. The 
    server, on seeing the new m-line, will allocate the resource and 
    respond with a corresponding control m-line in the SDP answer.  
     
    The a=setup attribute as described in [20] MUST be "active" for the 
    offer from the client and MUST be "passive" for the answer from the 
    MRCP server. The a=connection attribute MUST have a value of "new" 
    on the very first control m-line offer from the client to an MRCP 
    server. Subsequent control m-line offers from the client to the 
    MRCP server MAY contain "new" or "existing", depending on whether 
    the client wants to share an existing connection-oriented pipe. The 
    value of "existing" tells the server that the client wants to reuse 
    an existing transport connection between the client and the server. 
    The server can respond with a value of "existing" if it wants to 
    allow sharing of existing pipes, or can reply with a value of "new", 
    in which case the client MUST initiate a new connection-oriented 
    pipe.   
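
    The a=connection negotiation above can be sketched as two decision 
    functions. The function names and the returned action strings are 
    hypothetical. The text does not say what happens when the server 
    answers "existing" to an offer of "new", so this sketch treats 
    that case as an error rather than guessing. 

```python
def client_connection_offer(is_first_control_line, wants_to_share):
    """Pick the a=connection value the client places in its offer."""
    if is_first_control_line:
        # The very first control m-line offer must use "new".
        return "new"
    return "existing" if wants_to_share else "new"


def client_action(offered, answered):
    """Decide what the client does once the server has answered."""
    if answered == "new":
        # The server declined sharing, or none was requested.
        return "open-new-connection"
    if answered == "existing" and offered == "existing":
        return "reuse-connection"
    # Any other combination is not covered by the text above.
    raise ValueError(f"unhandled combination: {offered!r} / {answered!r}")


first_offer = client_connection_offer(True, True)
later_offer = client_connection_offer(False, True)
```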
     
    Note: Only SDP m-lines having a common SDP format field of 
    "application/mrcpv2" can share connection-oriented pipes between 
    them. Such a pipe is reserved exclusively for MRCPv2 communication 
    and cannot be shared with any other protocol.  
     
    When the client wants to de-allocate the resource from this session, 
    it MUST initiate a SIP re-INVITE dialog with the server and MUST 
    offer the control m-line with a port of 0. The server MUST then 
    answer the control m-line with a port of 0. This de-allocates the 
    associated MRCP identifier and resource, but may not close the TCP, 
    SCTP or TLS connection if it is currently being shared among 
    multiple MRCP channels. When all MRCP channels that may be sharing 
    the connection are released and the associated SIP connections are 
    closed, the client or server disconnects the shared 
    connection-oriented pipe. 
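
    The lifetime of a shared pipe can be sketched by tracking the 
    channels that use it. The class and the channel identifier values 
    are hypothetical. As a simplification, the sketch tracks only the 
    MRCP channels and does not model the state of the associated SIP 
    connections. 

```python
class SharedConnection:
    """Track which MRCP channels are using one transport connection."""

    def __init__(self):
        self.channels = set()
        self.open = True

    def allocate(self, channel_id):
        """Record a channel allocated on this connection."""
        self.channels.add(channel_id)

    def release(self, channel_id):
        """Release a channel; close the pipe once no channel uses it.

        Returns True while the connection remains open.
        """
        self.channels.discard(channel_id)
        if not self.channels:
            self.open = False
        return self.open


# Made-up identifiers for two channels sharing one connection.
conn = SharedConnection()
conn.allocate("32AECB23433801@speechsynth")
conn.allocate("32AECB23433802@speechrecog")
still_open = conn.release("32AECB23433801@speechsynth")
closed_after_last = not conn.release("32AECB23433802@speechrecog")
```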
     
    Example 1:  
    This exchange adds a resource control channel for a synthesizer. 
    Since a synthesizer generates an audio stream, this interaction 
    also creates a receive-only audio stream to which the server sends 
    audio. 
      
    C->S:  
           INVITE sip:mresources@mediaserver.com SIP/2.0  
           Via: SIP/2.0/TCP client.atlanta.example.com:5060;  
                branch=z9hG4bK74bf9  
           Max-Forwards: 6  
           To: MediaServer <sip:mresources@mediaserver.com>  
           From: sarvi <sip:sarvi@cisco.com>;tag=1928301774  
           Call-ID: a84b4c76e66710