1. Abstract
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in
this document are to be interpreted as described in RFC 2119 [1].
3. Introduction
Lakaniemi/Koskelainen [page 1]
RTP Payload Format for AMR March 10, 2000
(ETSI). The AMR codec is standardized for GSM, and it is also chosen
by the Third Generation Partnership Project (3GPP) as the mandatory
speech codec for third generation systems. AMR provides high
speech quality under a wide range of transmission conditions and is
also well suited to non-mobile applications.
AMR includes eight speech coding modes, with bit rates ranging
from 4.75 to 12.2 kbit/s. The sampling rate is 8000 Hz and
processing is performed on 20 ms frames. Some of the AMR speech
coding modes are identical to speech codecs specified in other
standards: the 6.7 kbit/s mode is the ACELP codec specified in
section 5.4 of [4] (PDC-EFR), the 7.4 kbit/s mode is the IS-641
codec in TDMA [5], and the 12.2 kbit/s mode is GSM EFR [6].
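As a quick check on these figures, the number of speech bits in one
20 ms frame follows directly from the bit rate. The sketch below is
illustrative only. The full list of eight rates is an assumption taken
from the AMR codec specification (this document names only some of
them), and the constant and function names are hypothetical.

```python
# Illustrative only: speech bits per 20 ms frame for each AMR mode.
# Assumption: the eight mode rates below come from the AMR codec
# specification; this document names only a subset of them.
AMR_MODE_RATES_BPS = [4750, 5150, 5900, 6700, 7400, 7950, 10200, 12200]
FRAME_MS = 20  # frame length stated in this section


def speech_bits_per_frame(rate_bps: int) -> int:
    """Number of speech bits the encoder produces for one 20 ms frame."""
    return rate_bps * FRAME_MS // 1000


if __name__ == "__main__":
    for rate in AMR_MODE_RATES_BPS:
        bits = speech_bits_per_frame(rate)
        print(f"{rate / 1000:5.2f} kbit/s -> {bits} bits per frame")
```

The 5.15 and 4.75 kbit/s modes give 103 and 95 bits, which agrees
with the frame sizes shown in Figure 4.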
The decoder may want to receive a particular AMR mode, e.g. for
capacity or quality reasons. This can be signaled to the other
end-point by including a mode request in a transmitted packet.
4. Payload format
The RTP payload format for the AMR codec consists of a
variable-length payload header followed by one or more AMR payload
frames. In most cases the payload data does not end on an octet
boundary. In these cases the unused bits in the last octet of the
payload are padded with bits of value 0.
The AMR payload header is either 1 or 5 bits long, and the header
bits are defined as follows:
+-+
|R|
+-+
Figure 1: Payload header with R=0
+-+-+-+-+-+
|R| MR |
+-+-+-+-+-+
Figure 2: Payload header with R=1. Bits are stored in the MR field
from LSB to MSB.
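The two header layouts in Figures 1 and 2 can be sketched as a small
helper that returns the header bits in the order they are written to
the payload. This is a minimal sketch, not a normative definition: the
function name is hypothetical, and the mode request is assumed to be a
plain 4-bit integer whose mapping to AMR modes is not covered here.

```python
from typing import List, Optional


def payload_header_bits(mode_request: Optional[int] = None) -> List[int]:
    """Return the payload header bits in storage order.

    Assumption: mode_request is a 4-bit integer (0..15). Its mapping
    to AMR modes is not defined by this sketch.
    """
    if mode_request is None:
        return [0]  # R=0: the header is the single R bit (Figure 1)
    if not 0 <= mode_request <= 15:
        raise ValueError("mode request must fit in 4 bits")
    # MR bits are stored from LSB to MSB (Figure 2).
    mr_bits = [(mode_request >> i) & 1 for i in range(4)]
    return [1] + mr_bits  # R=1 followed by the four MR bits
```

For example, a mode request value of 2 yields the bits 1, 0, 1, 0, 0:
the R bit first, then the MR value least significant bit first.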
An AMR payload frame has variable size and consists of a 4-bit frame
type field followed by the AMR speech or CN bits. Note that the AMR
payload frame format is exactly the same as the AMR Interface Format
2 (AMR IF2) defined in Annex A of [9]. The AMR payload frame is
defined as follows:
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|R|  MR   |  FT1  |                                             |
+-+-+-+-+-+-+-+-+-+                                             +
|                        SP1 (103 bits)                         |
+                                                               +
|                                                               |
+                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                               |  FT2  |                       |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+                       +
|                                                               |
+                         SP2 (95 bits)                         +
|                                                               |
+                                     +-+-+-+-+-+-+-+-+-+-+-+-+-+
|                                     |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
Figure 4: An RTP payload for AMR with mode request (R=1) and two AMR
payload frames (a 5.15 kbit/s frame followed by a 4.75 kbit/s frame).
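The size of the payload in Figure 4 can be worked out from the field
widths given in the figure. The helper below is a sketch with a
hypothetical name. It sums the field widths, rounds up to whole
octets, and reports how many zero padding bits are needed.

```python
from typing import Iterable, Tuple


def payload_size(field_widths: Iterable[int]) -> Tuple[int, int, int]:
    """Return (total bits, octets, padding bits) for the given fields."""
    total_bits = sum(field_widths)
    octets = (total_bits + 7) // 8       # round up to whole octets
    padding = octets * 8 - total_bits    # unused bits in the last octet
    return total_bits, octets, padding


# Field widths from Figure 4: R, MR, FT1, SP1, FT2, SP2.
FIGURE_4_FIELDS = [1, 4, 4, 103, 4, 95]

if __name__ == "__main__":
    print(payload_size(FIGURE_4_FIELDS))
```

For Figure 4 this gives 211 payload bits, which occupy 27 octets with
5 padding bits in the last octet.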
The AMR payload is stored into octets starting from the LSB of the
first octet and filling each octet from LSB to MSB. Any unused most
significant bits of the last octet of the payload are set to 0.
The octet structure is constructed as defined by the C-like pseudo
code below. Note that in this formula LSB is bit 0 and MSB is bit 7.
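c = 0;                          /* running bit position in the payload */
for (j = 0; j < Nf; j++)        /* each field, in storage order */
{
    for (i = 0; i < N(j); i++)  /* each bit of field j */
    {
        n = c / 8;              /* octet index */
        k = c % 8;              /* bit index within octet n, 0 = LSB */
        b(n,k) = f(j,i);        /* bit k of octet n gets bit i of field j */
        c++;
    }
}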
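The pseudo code above can be turned into a small runnable sketch. The
function name and the representation of fields as lists of 0/1 values
(each list ordered as the bits are to be stored) are assumptions made
for illustration. The variables c, n and k follow the pseudo code.

```python
from typing import Sequence


def pack_fields(fields: Sequence[Sequence[int]]) -> bytes:
    """Pack bit fields into octets, filling each octet from LSB to MSB.

    Assumption: every field is a sequence of 0/1 values, listed in the
    order in which the bits are stored into the payload.
    """
    total_bits = sum(len(field) for field in fields)
    # The buffer starts zero-filled, so unused bits in the last octet
    # are left at value 0.
    octets = bytearray((total_bits + 7) // 8)
    c = 0  # running bit position, as in the pseudo code
    for field in fields:
        for bit in field:
            n, k = divmod(c, 8)  # octet index and bit index (0 = LSB)
            octets[n] |= (bit & 1) << k
            c += 1
    return bytes(octets)
```

For example, a 1-bit field [1] followed by a 4-bit field [0, 1, 0, 0]
sets bits 0 and 2 of the first octet, giving the single octet 0x05.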
The timestamp of the RTP header must indicate the sampling time of
the first sample of the first frame in the packet. The time is
expressed in samples: with a frame length of 20 ms and a sampling
rate of 8 kHz, the timestamp advances by 160 (samples) for each
frame. All frames in a packet must be successive 20 ms frames,
stored in the order they are generated by the encoder.
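The timestamp arithmetic can be sketched as follows. The function name
is hypothetical. The sketch assumes packets are sent back to back with
no gap in the frame sequence, and relies on the RTP timestamp being a
32-bit field that wraps around.

```python
SAMPLE_RATE_HZ = 8000
FRAME_MS = 20
SAMPLES_PER_FRAME = SAMPLE_RATE_HZ * FRAME_MS // 1000  # 160 samples


def next_packet_timestamp(timestamp: int, frames_in_packet: int) -> int:
    """Timestamp of the packet that follows one carrying the given frames.

    Assumption: frames are contiguous, so the next packet starts right
    after the last frame of this one. The result wraps at 32 bits.
    """
    advance = frames_in_packet * SAMPLES_PER_FRAME
    return (timestamp + advance) % 2**32
```

A packet carrying three frames therefore advances the timestamp by
480 before the next packet is sent.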
The encoder shall set the marker bit (M) of the RTP header to value
1 for packets containing the first active speech frame after a
non-speech period. For all other packets the marker bit is set to
0.
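The marker bit rule can be illustrated with a short sketch. The
function name and the input representation (one list of booleans per
packet, True for an active speech frame) are hypothetical. The sketch
also assumes that the stream starts in a non-speech period, which this
document does not state.

```python
from typing import Iterable, List


def marker_bits(packets: Iterable[List[bool]]) -> List[int]:
    """Return the marker bit for each packet in sending order.

    packets: for each packet, one flag per frame, True if the frame is
    active speech. Assumption: the stream begins in a non-speech period.
    """
    markers = []
    in_speech = False
    for frames in packets:
        marker = 0
        for is_speech in frames:
            if is_speech and not in_speech:
                marker = 1  # first active speech frame after non-speech
            in_speech = is_speech
        markers.append(marker)
    return markers
```

With one frame per packet, the sequence speech, speech, non-speech,
speech yields marker bits 1, 0, 0, 1.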
7. References
[7] GSM 06.92: Comfort noise aspects for Adaptive Multi-Rate (AMR)
speech traffic channels
[8] GSM 06.62: Comfort noise aspects for Enhanced Full Rate (EFR)
speech traffic channels
8. Authors' Addresses
Ari Lakaniemi
Nokia Research Center
P.O.Box 407
FIN-00045 Nokia Group
Finland
Email: ari.lakaniemi@nokia.com
Petri Koskelainen
Nokia Research Center
P.O.Box 100
FIN-33721 Tampere
Finland
Email: petri.koskelainen@nokia.com