The payload format is shown in Fig. 1.
 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|     event     |E|R| volume    |          duration             |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
Figure 1: Payload Format for Named Events
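The layout in Figure 1 can be sketched as a pair of encode/decode helpers. This is a minimal illustration, not part of the specification: the names NamedEvent, pack_event and unpack_event are hypothetical, and the field is assumed to be carried in network byte order as in other RTP payloads. The 16-bit duration field tops out at 65535 timestamp units, which is where the roughly 8 second limit at 8000 Hz comes from.

```python
import struct
from typing import NamedTuple


class NamedEvent(NamedTuple):
    """Hypothetical container for the fields of Figure 1."""
    event: int      # 8 bits
    end: bool       # E bit
    volume: int     # 6 bits, power level in dBm0 with the sign dropped
    duration: int   # 16 bits, in timestamp units


def pack_event(ev: NamedEvent) -> bytes:
    """Encode the four-byte payload in network byte order."""
    if not (0 <= ev.event <= 255 and 0 <= ev.volume <= 63
            and 0 <= ev.duration <= 0xFFFF):
        raise ValueError("field out of range")
    # Second byte: E in the top bit, R left at zero, volume in the low 6 bits.
    flags = (int(ev.end) << 7) | ev.volume
    return struct.pack("!BBH", ev.event, flags, ev.duration)


def unpack_event(payload: bytes) -> NamedEvent:
    """Decode the first four bytes of a payload; the R bit is ignored."""
    event, flags, duration = struct.unpack("!BBH", payload[:4])
    return NamedEvent(event, bool(flags & 0x80), flags & 0x3F, duration)


wire = pack_event(NamedEvent(event=1, end=False, volume=20, duration=400))
print(wire.hex())            # 01140190
print(unpack_event(wire))
print(0xFFFF / 8000)         # 8.191875 (seconds at 8000 Hz)
```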
event: The events are encoded as shown in Sections 3.10 through
3.14.
volume: For DTMF digits and other events representable as tones,
this field describes the power level of the tone, expressed
in dBm0 after dropping the sign. Power levels range from 0 to
-63 dBm0. The range of valid DTMF is from 0 to -36 dBm0 (must
accept); lower than -55 dBm0 must be rejected (TR-TSY-000181,
ITU-T Q.24A). Thus, larger values denote lower volume. This
value is defined only for DTMF digits. For other events, it
is set to zero by the sender and is ignored by the receiver.
Schulzrinne & Petrack Standards Track [Page 6]
RFC 2833 Tones May 2000
duration: Duration of this digit, in timestamp units. Thus, the
event began at the instant identified by the RTP timestamp
and has so far lasted as long as indicated by this parameter.
The event may or may not have ended.
For a sampling rate of 8000 Hz, this field is sufficient to
express event durations of up to approximately 8 seconds.
E: If set to a value of one, the "end" bit indicates that this
packet contains the end of the event. Thus, the duration
parameter above measures the complete duration of the event.
A sender MAY delay setting the end bit until retransmitting
the last packet for a tone, rather than on its first
transmission. This avoids having to wait to detect whether
the tone has indeed ended.
Receiver implementations MAY use different algorithms to
create tones, including the two described here. In the first,
the receiver simply places a tone of the given duration in
the audio playout buffer at the location indicated by the
timestamp. As additional packets are received that extend the
same tone, the waveform in the playout buffer is extended
accordingly. (Care has to be taken if audio is mixed, i.e.,
summed, in the playout buffer rather than simply copied.)
Thus, if a packet in a tone lasting longer than the packet
interarrival time gets lost and the playout delay is short, a
gap in the tone may occur. Alternatively, the receiver can
start a tone and play it until it receives a packet with the
"E" bit set, receives the next tone (distinguished by a different
timestamp value), or a given time period elapses. This is more
robust against packet loss, but may extend the tone if all
retransmissions of the last packet in an event are lost.
Limiting how long the tone can be extended is necessary
to keep a tone from getting "stuck". Regardless of the
algorithm used, the tone SHOULD NOT be extended by more than
three packet interarrival times. A slight extension of tone
durations and shortening of pauses is generally harmless.
R: This field is reserved for future use. The sender MUST set it
to zero; the receiver MUST ignore it.
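The second receiver algorithm described under the E bit (play until the "E" bit, the next tone, or a timeout) can be sketched as a small state update. This is an illustration under stated assumptions: ToneState and update are hypothetical names, arrival times are passed in explicitly in milliseconds, and reordered or duplicated packets are not handled. The timeout uses the limit of three packet interarrival times given in the text.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

MAX_EXTENSION_INTERVALS = 3   # tone SHOULD NOT be extended beyond this


@dataclass
class ToneState:
    timestamp: int          # RTP timestamp identifying the current event
    last_packet_ms: float   # arrival time of the latest packet for it


def update(state: Optional[ToneState], now_ms: float, interval_ms: float,
           packet: Optional[Tuple[int, bool]] = None) -> Optional[ToneState]:
    """Return the tone state after one step; None means no tone is playing.

    packet is (timestamp, end_bit) for an arriving event packet, or None
    when the function is called from a periodic timer.
    """
    if packet is not None:
        timestamp, end_bit = packet
        if end_bit:
            return None                  # "E" bit seen: stop the tone
        # The same timestamp extends the tone; a new one starts the next tone.
        return ToneState(timestamp, now_ms)
    limit_ms = MAX_EXTENSION_INTERVALS * interval_ms
    if state is not None and now_ms - state.last_packet_ms > limit_ms:
        return None                      # timeout: avoid a stuck tone
    return state


s = update(None, 0, 50, (11200, False))   # tone starts
s = update(s, 50, 50, (11200, False))     # tone is extended
s = update(s, 250, 50)                    # 200 ms without packets > 150 ms
print(s)                                  # None
```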
3.6 Sending Event Packets
An audio source SHOULD start transmitting event packets as soon as it
recognizes an event, and thereafter every 50 ms or, if known, at the
packet interval of the audio codec used for this session. (The sender does
not need to maintain precise time intervals between event packets in
order to maintain precise inter-event times, since the timing
information is contained in the timestamp.)
Q.24 [5], Table A-1, indicates that all administrations surveyed
use a minimum signal duration of 40 ms, with signaling velocity
(tone and pause) of no less than 93 ms.
If an event continues for more than one period, the source generating
the events should send a new event packet with the RTP timestamp
value corresponding to the beginning of the event and the duration of
the event increased correspondingly. (The RTP sequence number is
incremented by one for each packet.) If there has been no new event
in the last interval, the event SHOULD be retransmitted three times
or until the next event is recognized. This ensures that the duration
of the event can be recognized correctly even if the last packet for
an event is lost.
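The sending rule above can be sketched as a generator that yields the fields of each packet for one event. This is a simplified illustration: event_packets is a hypothetical helper, updates are assumed to be sent at fixed 50 ms steps (400 timestamp units at 8000 Hz), the final packet is assumed to be sent three times in total, and the E bit is set on every copy of the final packet although the text permits delaying it.

```python
from typing import Iterator, Tuple

Packet = Tuple[int, int, int, bool]   # (sequence number, timestamp, duration, E)


def event_packets(start_ts: int, first_seq: int, total_units: int,
                  interval_units: int = 400,
                  final_copies: int = 3) -> Iterator[Packet]:
    """Yield the packets for one event of total_units timestamp units."""
    seq = first_seq
    duration = 0
    # Incremental updates: same timestamp, growing duration, new sequence number.
    while duration + interval_units < total_units:
        duration += interval_units
        yield seq, start_ts, duration, False
        seq += 1
    # Final packet carries the complete duration and is repeated.
    for _ in range(final_copies):
        yield seq, start_ts, total_units, True
        seq += 1


# A 250 ms digit (2000 units) that began at timestamp 6400.
for pkt in event_packets(start_ts=6400, first_seq=10, total_units=2000):
    print(pkt)
```

Every packet keeps the timestamp of the start of the event, so a receiver that sees only one of the final copies can still recover the full duration.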
DTMF digits and events are sent incrementally to avoid having the
receiver wait for the completion of the event. Since some tones
are two seconds long, this would incur a substantial delay. The
transmitter does not know if event length is important and thus
needs to transmit immediately and incrementally. If the receiver
application does not care about event length, the incremental
transmission mechanism avoids delay. Some applications, such as
gateways into the PSTN, care about both delays and event duration.
3.7 Reliability
During an event, the RTP event payload format provides incremental
updates on the event. The error resiliency depends on the playout
delay at the receiver. For example, for a playout delay of 120 ms and
a packet gap of 50 ms, two packets in a row can get lost without
causing a gap in the tones generated at the receiver.
The audio redundancy mechanism described in RFC 2198 [6] MAY be used
to recover from packet loss across events. The effective data rate is
r times 64 bits (32 bits for the redundancy header and 32 bits for
the telephone-event payload) every 50 ms or r times 1280 bits/second,
where r is the number of redundant events carried in each packet. The
value of r is an implementation trade-off, with a value of 5
suggested.
The timestamp offset in this redundancy scheme has 14 bits, so
that it allows a single packet to "cover" 2.048 seconds of
telephone events at a sampling rate of 8000 Hz. Including the
starting time of previous events allows precise reconstruction of
the tone sequence at a gateway. The scheme is resilient to
consecutive packet losses spanning this interval of 2.048 seconds
or r digits, whichever is less. Note that for previous digits,
only an average loudness can be represented.
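The figures quoted for the redundancy scheme follow from simple arithmetic, reproduced here as a check. The constant and function names are illustrative only; the numbers (64 bits per redundant event, 50 ms packet interval, 14-bit timestamp offset, 8000 Hz) are those given in the text.

```python
BITS_PER_REDUNDANT_EVENT = 32 + 32   # redundancy header + telephone-event payload
PACKET_INTERVAL_MS = 50
SAMPLING_RATE_HZ = 8000
OFFSET_BITS = 14


def redundancy_rate_bps(r: int) -> int:
    """Data rate in bits/second for r redundant events per packet."""
    packets_per_second = 1000 // PACKET_INTERVAL_MS
    return r * BITS_PER_REDUNDANT_EVENT * packets_per_second


def offset_window_s() -> float:
    """Time span that a 14-bit timestamp offset can cover."""
    return 2 ** OFFSET_BITS / SAMPLING_RATE_HZ


print(redundancy_rate_bps(1))   # 1280
print(redundancy_rate_bps(5))   # 6400
print(offset_window_s())        # 2.048
```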
An encoder MAY treat the event payload as a highly-compressed version
of the current audio frame. In that mode, each RTP packet during an
event would contain the current audio codec rendition (say, G.723.1
or G.729) of this digit as well as the representation described in
Section 3.5, plus any previous events seen earlier.
This approach allows dumb gateways that do not understand this
format to function. See also the discussion in Section 1.
3.8 Example
Figure 2 shows a typical RTP packet, sent while the user is dialing the
last digit of the DTMF sequence "911". The first digit was 200 ms long (1600
timestamp units) and started at time 0, the second digit lasted 250
ms (2000 timestamp units) and started at time 800 ms (6400 timestamp
units), the third digit was pressed at time 1.4 s (11,200 timestamp
units) and the packet shown was sent at 1.45 s (11,600 timestamp
units). The frame duration is 50 ms. To make the parts recognizable,
the figure below ignores byte alignment. Timestamp and sequence
number are assumed to have been zero at the beginning of the first
digit. In this example, the dynamic payload types 96 and 97 have been
assigned for the redundancy mechanism and the telephone event
payload, respectively.
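The packet described above can be assembled byte by byte as a cross-check of the field values. This is a sketch under assumptions: the helper names are hypothetical, the redundancy headers follow the RFC 2198 layout (F bit, 7-bit block PT, 14-bit timestamp offset, 10-bit block length, with a one-byte header for the primary block), and the SSRC value is the one used in Figure 2.

```python
import struct


def rtp_header(pt: int, seq: int, ts: int, ssrc: int, marker: bool = False) -> bytes:
    """Fixed 12-byte RTP header with V=2, P=0, X=0, CC=0."""
    byte0 = 2 << 6
    byte1 = (int(marker) << 7) | pt
    return struct.pack("!BBHII", byte0, byte1, seq, ts, ssrc)


def red_header(block_pt: int, ts_offset: int, length: int) -> bytes:
    """Four-byte redundancy header for a non-primary block (F=1)."""
    word = (1 << 31) | (block_pt << 24) | (ts_offset << 10) | length
    return struct.pack("!I", word)


def event(code: int, end: bool, volume: int, duration: int) -> bytes:
    """Four-byte telephone-event payload."""
    return struct.pack("!BBH", code, (int(end) << 7) | volume, duration)


packet = (
    rtp_header(pt=96, seq=28, ts=11200, ssrc=0x5234A8)
    + red_header(97, 11200, 4)           # "9", started at timestamp 0
    + red_header(97, 11200 - 6400, 4)    # first "1", started at 6400
    + bytes([97])                        # primary block header, F=0
    + event(9, True, 7, 1600)
    + event(1, True, 10, 2000)
    + event(1, False, 20, 400)           # digit still in progress
)
print(len(packet))    # 33
print(packet.hex())
```

The one-byte primary header is why the real packet is not 32-bit aligned, which the figure glosses over for readability.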
 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|V=2|P|X|  CC   |M|     PT      |       sequence number         |
| 2 |0|0|   0   |0|     96      |              28               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                           timestamp                           |
|                             11200                             |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|           synchronization source (SSRC) identifier            |
|                           0x5234a8                            |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|F|   block PT  |     timestamp offset      |   block length    |
|1|     97      |            11200          |         4         |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|F|   block PT  |     timestamp offset      |   block length    |
|1|     97      |   11200 - 6400 = 4800     |         4         |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|F|   Block PT  |
|0|     97      |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|     digit     |E R| volume    |          duration             |
|       9       |1 0|     7     |             1600              |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|     digit     |E R| volume    |          duration             |
|       1       |1 0|    10     |             2000              |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|     digit     |E R| volume    |          duration             |
|       1       |0 0|    20     |              400              |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
Figure 2: Example RTP packet after dialing "911"
3.9 Indication of Receiver Capabilities using SDP
Receivers MAY indicate which named events they can handle, for
example, by using the Session Description Protocol (RFC 2327 [7]).
The payload formats use the following fmtp format to list the event
values that they can receive:
a=fmtp:<format> <list of values>
The list of values consists of comma-separated elements, which can be
either a single decimal number or two decimal numbers separated by a
hyphen (dash), where the second number is larger than the first. No
whitespace is allowed between numbers or hyphens. The list does not
have to be sorted.
For example, if the payload format uses the payload type number 100,
and the implementation can handle the DTMF tones (events 0 through
15) and the dial and ringing tones, it would include the following
description in its SDP message:
a=fmtp:100 0-15,66,70
Since all implementations MUST be able to receive events 0 through
15, listing these events in the a=fmtp line is OPTIONAL.
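The list syntax above is simple enough to parse with one regular expression per element. The following is a minimal sketch: parse_event_list is a hypothetical helper, and adding events 0 through 15 to the result unconditionally is this sketch's way of reflecting that listing them is optional.

```python
import re

_ELEMENT = re.compile(r"(\d+)(?:-(\d+))?", re.ASCII)


def parse_event_list(value: str) -> set:
    """Parse an fmtp value such as "0-15,66,70" into a set of event codes."""
    events = set(range(16))          # events 0-15 must always be receivable
    for element in value.split(","):
        match = _ELEMENT.fullmatch(element)
        if match is None:            # also rejects whitespace and empty elements
            raise ValueError(f"malformed element: {element!r}")
        first = int(match.group(1))
        last = int(match.group(2)) if match.group(2) is not None else first
        if match.group(2) is not None and last <= first:
            raise ValueError(f"second number must be larger: {element!r}")
        events.update(range(first, last + 1))
    return events


print(sorted(parse_event_list("0-15,66,70"))[-3:])   # [15, 66, 70]
```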
The corresponding MIME parameter is "events", so that the following
sample media type definition corresponds to the SDP example above:
audio/telephone-event;events="0-15,66,70";rate="8000"
3.10 DTMF Events
Table 1 summarizes the DTMF-related named events within the
telephone-event payload format.
Event    encoding (decimal)
___________________________
0--9     0--9
*        10
#        11
A--D     12--15
Flash    16
Table 1: DTMF named events
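Table 1 maps directly onto a lookup table. The sketch below uses hypothetical names (DTMF_EVENTS, FLASH_EVENT, digits_to_events) and accepts lower-case a through d as a convenience that the table itself does not mention.

```python
# Event codes 0-15 in the order given by Table 1: digits, "*", "#", A-D.
DTMF_EVENTS = {ch: code for code, ch in enumerate("0123456789*#ABCD")}
FLASH_EVENT = 16


def digits_to_events(digits: str) -> list:
    """Translate a dialed string into a list of event codes."""
    return [DTMF_EVENTS[d] for d in digits.upper()]


print(digits_to_events("911"))   # [9, 1, 1]
print(digits_to_events("*#d"))   # [10, 11, 15]
```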
3.11 Data Modem and Fax Events
Table 2 summarizes the events and tones that can appear on a
subscriber line serving a fax machine or modem. The tones are
described below, with additional detail in Table 7.
ANS: This 2100 +/- 15 Hz tone is used to disable echo
suppression for data transmission [8,9]. For fax machines,
Recommendation T.30 [9] refers to this tone as called
terminal identification (CED) answer tone.
/ANS: This is the same signal as ANS, except that it reverses
phase at an interval of 450 +/- 25 ms. It disables both
echo cancellers and echo suppressors. (In the ITU
Recommendation V.25 [8], this signal is rendered as ANS
with a bar on top.)