Network Working Group                                     H. Schulzrinne
Request for Comments: 2833                           Columbia University
Category: Standards Track                                     S. Petrack
                                                                 MetaTel
                                                                May 2000

   RTP Payload for DTMF Digits, Telephony Tones and Telephony Signals

Status of this Memo

   This document specifies an Internet standards track protocol for the
   Internet community, and requests discussion and suggestions for
   improvements. Please refer to the current edition of the "Internet
   Official Protocol Standards" (STD 1) for the standardization state
   and status of this protocol. Distribution of this memo is unlimited.

Copyright Notice

   Copyright (C) The Internet Society (2000). All Rights Reserved.

Abstract

   This memo describes how to carry dual-tone multifrequency (DTMF)
   signaling, other tone signals and telephony events in RTP packets.

1 Introduction

   This memo defines two payload formats, one for carrying dual-tone
   multifrequency (DTMF) digits and other line and trunk signals
   (Section 3), and a second one for general multi-frequency tones in
   RTP [1] packets (Section 4). Separate RTP payload formats are
   desirable since low-rate voice codecs cannot be guaranteed to
   reproduce these tone signals accurately enough for automatic
   recognition. Defining separate payload formats also permits higher
   redundancy while maintaining a low bit rate.

   The payload formats described here may be useful in at least three
   applications: DTMF handling for gateways and end systems, as well as
   "RTP trunks". In the first application, the Internet telephony
   gateway detects DTMF on the incoming circuits and sends the RTP
   payload described here instead of regular audio packets. The gateway
   likely has the necessary digital signal processors and algorithms, as
   it often needs to detect DTMF, e.g., for two-stage dialing. Having
   the gateway detect tones relieves the receiving Internet end system
   from having to do this work and also prevents low bit-rate codecs
   like G.723.1 from rendering DTMF tones unintelligible.
   Secondly, an Internet end system such as an "Internet phone" can
   emulate DTMF functionality without concerning itself with generating
   precise tone pairs and without imposing the burden of tone
   recognition on the receiver.

Schulzrinne & Petrack       Standards Track                     [Page 1]

RFC 2833                         Tones                          May 2000

   In the "RTP trunk" application, RTP is used to replace a normal
   circuit-switched trunk between two nodes. This is particularly of
   interest in a telephone network that is still mostly
   circuit-switched. In this case, each end of the RTP trunk encodes
   audio channels into the appropriate encoding, such as G.723.1 or
   G.729. However, this encoding process destroys in-band signaling
   information which is carried using the least-significant bit ("robbed
   bit signaling") and may also interfere with in-band signaling tones,
   such as the MF digit tones. In addition, tone properties such as the
   phase reversals in the ANSam tone will not survive speech coding.
   Thus, the gateway needs to remove the in-band signaling information
   from the bit stream. It can now either carry it out-of-band in a
   signaling transport mechanism yet to be defined, or it can use the
   mechanism described in this memorandum. (If the two trunk end points
   are within reach of the same media gateway controller, the media
   gateway controller can also handle the signaling.) Carrying it
   in-band may simplify the time synchronization between audio packets
   and the tone or signal information. This is particularly relevant
   where duration and timing matter, as in the carriage of DTMF signals.

1.1 Terminology

   In this document, the key words "MUST", "MUST NOT", "REQUIRED",
   "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY",
   and "OPTIONAL" are to be interpreted as described in RFC 2119 [2] and
   indicate requirement levels for compliant implementations.

2 Events vs. Tones

   A gateway has two options for handling DTMF digits and events.
   First, it can simply measure the frequency components of the voice
   band signals and transmit this information to the RTP receiver
   (Section 4). In this mode, the gateway makes no attempt to discern
   the meaning of the tones, but simply distinguishes tones from speech
   signals.

   All tone signals in use in the PSTN and meant for human consumption
   are sequences of simple combinations of sine waves, either added or
   modulated. (There is at least one tone, the ANSam tone [3], used for
   indicating data transmission over voice lines, that makes use of
   periodic phase reversals.)

   As a second option, a gateway can recognize the tones and translate
   them into a name, such as ringing or busy tone. The receiver then
   produces a tone signal or other indication appropriate to the signal.
   Generally, since the recognition of signals often depends on their
   on/off pattern or the sequence of several tones, this recognition can
   take several seconds. On the other hand, the gateway may have access
   to the actual signaling information that generates the tones and thus
   can generate the RTP packet immediately, without the detour through
   acoustic signals.

   In the phone network, tones are generated at different places,
   depending on the switching technology and the nature of the tone.
   This determines, for example, whether a person making a call to a
   foreign country hears the local tones she is familiar with or the
   tones used in the country called.

   For analog lines, dial tone is always generated by the local switch.
   ISDN terminals may generate dial tone locally and then send a Q.931
   SETUP message containing the dialed digits. If the terminal just
   sends a SETUP message without any Called Party digits, then the
   switch does digit collection, provided by the terminal as KEYPAD
   messages, and provides dial tone over the B-channel.
   The terminal can either use the audio signal on the B-channel or can
   use the Q.931 messages to trigger locally generated dial tone.

   Ringing tone (also called ringback tone) is generated by the local
   switch at the callee, with a one-way voice path opened up as soon as
   the callee's phone rings. (This reduces the chance of clipping the
   called party's response just after answer. It also permits pre-answer
   announcements or in-band call-progress indications to reach the
   caller before or in lieu of a ringing tone.) Congestion tone and
   special information tones can be generated by any of the switches
   along the way, and may be generated by the caller's switch based on
   ISUP messages received. Busy tone is generated by the caller's
   switch, triggered by the appropriate ISUP message, for analog
   instruments, or the ISDN terminal.

   Gateways which send signaling events via RTP MAY send both named
   signals (Section 3) and the tone representation (Section 4) as a
   single RTP session, using the redundancy mechanism defined in Section
   3.7 to interleave the two representations. It is generally a good
   idea to send both, since it allows the receiver to choose the
   appropriate rendering.

   If a gateway cannot present a tone representation, it SHOULD send the
   audio tones as regular RTP audio packets (e.g., as payload format
   PCMU), in addition to the named signals.

3 RTP Payload Format for Named Telephone Events

3.1 Introduction

   The payload format for named telephone events described below is
   suitable for both gateway and end-to-end scenarios. In the gateway
   scenario, an Internet telephony gateway connecting a packet voice
   network to the PSTN recreates the DTMF tones or other telephony
   events and injects them into the PSTN. Since, for example, DTMF digit
   recognition takes several tens of milliseconds, the first few
   milliseconds of a digit will arrive as regular audio packets.
   Thus, careful time and power (volume) alignment between the audio
   samples and the events is needed to avoid generating spurious digits
   at the receiver.

   DTMF digits and named telephone events are carried as part of the
   audio stream, and MUST use the same sequence number and time-stamp
   base as the regular audio channel to simplify the generation of audio
   waveforms at a gateway. The default clock frequency is 8,000 Hz, but
   the clock frequency can be redefined when assigning the dynamic
   payload type.

   The payload format described here achieves a higher redundancy even
   in the case of sustained packet loss than the method proposed for the
   Voice over Frame Relay Implementation Agreement [4].

   If an end system is directly connected to the Internet and does not
   need to generate tone signals again, time alignment and power levels
   are not relevant. These systems rely on PSTN gateways or Internet end
   systems to generate DTMF events and do not perform their own audio
   waveform analysis. An example of such a system is an Internet
   interactive voice-response (IVR) system.

   In circumstances where exact timing alignment between the audio
   stream and the DTMF digits or other events is not important and data
   is sent unicast, such as the IVR example mentioned earlier, it may be
   preferable to use a reliable control protocol rather than RTP
   packets. In those circumstances, this payload format would not be
   used.

3.2 Simultaneous Generation of Audio and Events

   A source MAY send events and coded audio packets for the same time
   instants, using events as the redundant encoding for the audio
   stream, or it MAY block outgoing audio while event tones are active
   and only send named events as both the primary and redundant
   encodings.

   Note that a period covered by an encoded tone may overlap in time
   with a period of audio encoded by other means.
   This is likely to occur at the onset of a tone and is necessary to
   avoid possible errors in the interpretation of the reproduced tone at
   the remote end. Implementations supporting this payload format must
   be prepared to handle the overlap. It is RECOMMENDED that gateways
   only render the encoded tone since the audio may contain spurious
   tones introduced by the audio compression algorithm. However, it is
   anticipated that these extra tones in general should not interfere
   with recognition at the far end.

3.3 Event Types

   This payload format is used for five different types of signals:

   o  DTMF tones (Section 3.10);

   o  fax-related tones (Section 3.11);

   o  standard subscriber line tones (Section 3.12);

   o  country-specific subscriber line tones (Section 3.13); and

   o  trunk events (Section 3.14).

   A compliant implementation MUST support the events listed in Table 1
   with the exception of "flash". If it uses some other, out-of-band
   mechanism for signaling line conditions, it does not have to
   implement the other events.

   In some cases, an implementation may simply ignore certain events,
   such as fax tones, that do not make sense in a particular
   environment. Section 3.9 specifies how an implementation can use the
   SDP "fmtp" parameter within an SDP description to indicate its
   inability to understand a particular event or range of events.

   Depending on the available user interfaces, an implementation MAY
   render all tones in Table 5 the same or, preferably, use the tones
   conveyed by the concurrent "tone" payload or other RTP audio payload.
   Alternatively, it could provide a textual representation.

   Note that end systems that emulate telephones only need to support
   the events described in Sections 3.10 and 3.12, while systems that
   receive trunk signaling need to implement those in Sections 3.10,
   3.11, 3.12 and 3.14, since MF trunks also carry most of the "line"
   signals.
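
   As an illustration of the capability signaling mentioned above, the
   following Python sketch turns an event list into a set and checks
   whether a given event number is covered. Section 3.9 lies outside
   this excerpt, so the list syntax used here is an assumption: a
   comma-separated list whose elements are either a single decimal
   number or two numbers joined by a hyphen, such as "0-15,66,70". The
   function names are hypothetical and are not part of this memo.

```python
from typing import Set


def parse_event_list(value: str) -> Set[int]:
    """Expand a list such as "0-15,66,70" into a set of event numbers.

    Assumed syntax: comma-separated elements, each either a single
    decimal number or "low-high" with no embedded whitespace.
    """
    events: Set[int] = set()
    for element in value.split(","):
        first, sep, last = element.partition("-")
        low = int(first)
        high = int(last) if sep else low
        if high < low:
            raise ValueError("descending range: %r" % element)
        events.update(range(low, high + 1))
    return events


def is_listed(event: int, listed: Set[int]) -> bool:
    """Return True if the event number appears in the parsed list."""
    return event in listed


if __name__ == "__main__":
    listed = parse_event_list("0-15,66,70")
    for number in (5, 16, 66):
        print(number, is_listed(number, listed))
```

   A sender could consult such a set before emitting a named event and
   fall back to regular audio packets for anything the peer has not
   listed. That policy is one possible design, not a requirement stated
   here.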
   Systems that do not support fax or modem functionality do not need to
   render fax-related events described in Section 3.11.

   The RTP payload format is designated as "telephone-event", the MIME
   type as "audio/telephone-event". The default timestamp rate is 8000
   Hz, but other rates may be defined. In accordance with current
   practice, this payload format does not have a static payload type
   number, but uses an RTP payload type number established dynamically
   and out-of-band.

3.4 Use of RTP Header Fields

   Timestamp: The RTP timestamp reflects the measurement point for the
        current packet. The event duration described in Section 3.5
        extends forwards from that time. The receiver calculates jitter
        for RTCP receiver reports based on all packets with a given
        timestamp. Note: The jitter value should primarily be used as a
        means for comparing the reception quality between two users or
        two time-periods, not as an absolute measure.

   Marker bit: The RTP marker bit indicates the beginning of a new
        event.

3.5 Payload Format

   The payload format is shown in Fig. 1.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+