
Converting Voice to Packets and Bandwidth Calculation

CVOICE
Jim Nechleba

Nyquist theorem
The Bell Systems Corporation was trying to find a way to deploy more voice circuits with less wire, because analog voice technology required one pair of wires for each voice line. For organizations requiring many voice circuits, this meant running bundles of cable. Long ago, Dr. Harry Nyquist (and many others) created a process that allows equipment to convert analog signals (flowing waveforms) into digital format (1s and 0s). Nyquist found that he could accurately reconstruct audio streams by taking samples at a rate equal to twice the highest frequency used in the audio.

The average human ear is able to hear frequencies from 20-20,000 Hz. Human speech uses frequencies from 200-9,000 Hz. Telephone channels typically transmit frequencies from 300-3,400 Hz. The Nyquist theorem is able to reproduce frequencies from 300-4,000 Hz.

Studies have found that telephone equipment can accurately transmit understandable human conversation by sending only a limited range of frequencies. The telephone channel frequency range (300-3,400 Hz) gives you enough sound quality to identify the remote caller and sense their mood. Nyquist believed that you could accurately reproduce an audio signal by sampling at twice the highest frequency. Because he was after audio frequencies from 300-4,000 Hz, it would mean sampling 8000 times (2 * 4000) every second. A sample is a numeric value. More specifically, in the voice realm, a sample is a numeric value that consumes a single byte of information.
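The sampling arithmetic is simple enough to sanity-check in a couple of lines. The Python sketch below is not part of the original material; it just restates the math, assuming the 4,000 Hz ceiling of the telephone channel:

# Nyquist: sample at twice the highest frequency you want to reproduce.
highest_voice_freq_hz = 4000               # top of the 300-4,000 Hz telephone band
samples_per_second = 2 * highest_voice_freq_hz
print(samples_per_second)                  # 8000 samples every second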

Converting Analog Voice Signals to Digital


The sampling device puts an analog waveform against a Y-axis lined with numeric values. This process of converting the analog wave into digital, numeric values is known as quantization. Because 1 byte of information is only able to represent values 0-255, the quantization of the voice scale is limited to values measuring a maximum peak of +127 and a maximum low of -127. Positive and negative values are not evenly spaced. This is by design. To achieve a more accurate numeric value (and thus, a more accurate reconstructed signal at the other end), the frequencies more common to voice are tightly packed with numeric values, whereas the fringe frequencies on the high and low end of the spectrum are more spaced apart.
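The uneven spacing comes from companding. The formula below is not given in this material; it is the standard G.711 mu-law companding curve, shown here only as a supplementary sketch of how small signal values get far more output resolution than large ones:

import math

MU = 255  # compression parameter used by G.711 mu-law

def mu_law_compress(x):
    # Map a linear sample in [-1.0, 1.0] onto the non-uniform scale.
    sign = 1 if x >= 0 else -1
    return sign * math.log(1 + MU * abs(x)) / math.log(1 + MU)

# The same input step moves the output much further near zero than near
# full scale, which is exactly the "tightly packed" spacing described above:
print(mu_law_compress(0.1) - mu_law_compress(0.0))   # ~0.59
print(mu_law_compress(1.0) - mu_law_compress(0.9))   # ~0.02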

The quantization method


The sampling device breaks the 8 binary bits in each byte into two components: a positive/negative indicator and the numeric representation. The first bit indicates positive or negative, and the remaining seven bits represent the actual numeric value.
For example, take the sample 10110100. Because the first bit is a 1, you read the number as positive. The remaining seven bits (0110100) represent the number 52. This would be the digital value used for one voice sample. The Nyquist theorem dictates that you take 8000 of those samples every single second. Doing the math, 8000 samples a second times the 8 bits in each sample gives you 64,000 bits per second (64 kbps).
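A minimal sketch of that sign-bit layout, using the positive-bit convention described above (real G.711 coding layers segment logic on top of this, so treat it as a simplification):

def decode_sample(byte):
    # First bit: 1 = positive, 0 = negative; remaining seven bits: magnitude.
    polarity = (byte >> 7) & 0b1
    magnitude = byte & 0b0111_1111
    return magnitude if polarity else -magnitude

print(decode_sample(0b10110100))   # first bit 1, remaining bits 0110100 -> 52

# Nyquist math: 8000 samples per second * 8 bits per sample
print(8000 * 8)                    # 64000 bits per second (64 kbps)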

G.711 mu-law, G.711 a-law, G.729 codec


There are two forms of the G.711 codec: mu-law (used primarily in the United States and Japan) and a-law (used everywhere else). The quantization method described in the preceding paragraph represents G.711 a-law. G.711 mu-law codes in exactly the opposite way: if you were to take all the 1 bits and make them 0s and take all the 0 bits and make them 1s, you would have the G.711 mu-law equivalent.

Advanced codecs, such as G.729, allow you to compress the number of samples sent and thus use less bandwidth. This is possible because sampling human voice 8000 times a second produces many samples that are very similar or identical. The process G.729 (and most other compressed codecs) uses to compress this audio is to send a sound sample once and simply tell the remote device to continue playing that sound for a certain time interval. This is often described as building a codebook of the human voice traveling between the two endpoints. Using this process, G.729 is able to reduce bandwidth down to 8 kbps for each call, a fairly massive reduction. Unfortunately, chopping the bandwidth down comes with a price: quality is usually impacted by the compression process.
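G.729's actual algorithm (CELP codebook prediction) is far beyond a few lines of code, but the bandwidth intuition behind "send it once, tell the far end to keep playing it" can be illustrated with a loose run-length analogy. This is purely illustrative and is not how G.729 really encodes audio:

def compress_samples(samples):
    # Collapse runs of identical samples into [value, repeat_count] pairs.
    runs = []
    for s in samples:
        if runs and runs[-1][0] == s:
            runs[-1][1] += 1
        else:
            runs.append([s, 1])
    return runs

# Sustained sounds produce long runs of near-identical samples:
print(compress_samples([52, 52, 52, 52, 17, 17, 80]))
# [[52, 4], [17, 2], [80, 1]] -- 7 samples described by 3 entries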

Choosing a Voice Codec


When selecting a voice codec for your network, you should ask the following questions regarding the codec:
How many digital signal processor (DSP) resources does it take to code audio using the codec?
How much bandwidth does the codec consume?
How does the codec handle packet loss?
Does the codec support multiple sample sizes? What are the ramifications of using them?

Calculating Codec Bandwidth Requirements


Step 1. Determine the audio bandwidth required for the audio codec itself.
Step 2. Determine data link, network, and transport layer overhead.
Step 3. Add any additional overhead amounts.
Step 4. Add it all together.
Step 5. Subtract bandwidth savings measures.



Step 1: Determine the Audio Bandwidth Required for the Audio Codec Itself

To find the amount of bandwidth required for the audio codec, you need to determine the size (in bytes) of audio contained in each packet. This size is directly impacted by the audio sample size contained in each packet. The sample size is a specific time interval of audio; for most audio codecs, the sample size is 20 milliseconds (ms) by default. Increasing the sample size gives you a bandwidth savings because the router sends fewer packets overall (and fewer packets mean less header information). The drawback to increasing the sample size is that the overall delay in building the packet increases. If the two devices communicating already have significant delay between them (due to distance, traffic sharing the link, and so on), the additional coding delay could cause quality of service (QoS) issues. You can use the following formula to determine the voice payload size:

Bytes_Per_Packet = (Sample_Size * Codec_Bandwidth) / 8


The Sample_Size variable in the formula uses a unit value of seconds, and the Codec_Bandwidth variable uses a unit value of bits per second (bps). So, if you had a G.729 call using a 20-ms sample size, the formula would calculate like this:

Bytes_Per_Packet = (.02 * 8000) / 8
Bytes_Per_Packet = 160 / 8
Bytes_Per_Packet = 20
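The same formula, expressed as a small Python helper (the function and variable names are mine):

def bytes_per_packet(sample_size_sec, codec_bandwidth_bps):
    # Bytes_Per_Packet = (Sample_Size * Codec_Bandwidth) / 8
    return (sample_size_sec * codec_bandwidth_bps) / 8

print(bytes_per_packet(0.020, 8000))    # G.729, 20-ms sample -> 20.0 bytes
print(bytes_per_packet(0.020, 64000))   # G.711, 20-ms sample -> 160.0 bytes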



Step 2: Determine Data Link, Network, and Transport Layer Overhead

After you've found the amount of voice contained in each packet, you then need to calculate the amount of overhead the headers add to each packet. The following values represent the amount of overhead for common data link layer network technologies:

Ethernet: 20 bytes
Frame Relay: 4-6 bytes
Point-to-Point Protocol (PPP): 6 bytes

At the network and transport layers of the OSI model, the values are fixed amounts:

IP: 20 bytes
UDP: 8 bytes
Real-time Transport Protocol (RTP): 12 bytes

Step 3: Add Any Additional Overhead Amounts


Additional overhead gets added into the equation primarily if you are using VoIP over a VPN connection. The following are common overhead values based on the type of VPN used:

GRE/L2TP: 24 bytes
MPLS: 4 bytes
IPsec: 50-57 bytes
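The fixed header sizes from Steps 2 and 3 lend themselves to simple lookup tables. A sketch (the names are mine; for Frame Relay and IPsec the upper end of the stated range is used):

DATA_LINK = {"ethernet": 20, "frame_relay": 6, "ppp": 6}    # Frame Relay: 4-6 bytes
NETWORK_TRANSPORT = {"ip": 20, "udp": 8, "rtp": 12}
VPN = {"gre_l2tp": 24, "mpls": 4, "ipsec": 57}              # IPsec: 50-57 bytes

def header_overhead(data_link, vpn=None):
    # Per-packet overhead: data link + IP/UDP/RTP, plus optional VPN overhead.
    total = DATA_LINK[data_link] + sum(NETWORK_TRANSPORT.values())
    return total + (VPN[vpn] if vpn else 0)

print(header_overhead("ethernet"))           # 60 bytes
print(header_overhead("ppp", vpn="ipsec"))   # 103 bytes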



Step 4: Add It All Together

When you have all the values from the first three steps, you can add them together in a final equation:

Total_Bandwidth = Packet_Size * Packets_Per_Second

Now remember, you're after the total bandwidth per call. So, first you need to add together the values from Steps 1-3 to form the packet size. For example, if you were using the G.729 codec with a 20-ms sample size over an Ethernet network, the packet size would be as follows:

  20 bytes (voice payload)
+ 20 bytes (IP header)
+  8 bytes (UDP header)
+ 12 bytes (RTP header)
+ 20 bytes (Ethernet header)
---------------------------
  80 bytes per packet

That gives you one piece of the equation: the packet size with overhead. To find the number of packets per second, some simple reasoning comes into play. Remember, each packet contains a 20-ms sample, and 1 second is 1000 milliseconds. So 1000 ms / 20 ms = 50, which tells you that it takes 50 packets per second to deliver a full second of audio. This now gives you all the pieces you need to find the final amount of bandwidth per call:

Total_Bandwidth = Packet_Size * Packets_Per_Second
Total_Bandwidth = 80 bytes * 50 packets per second
Total_Bandwidth = 4000 bytes per second

Because network engineers do not usually assess network speed in bytes per second, you might want to multiply the final answer by 8 to find the bits per second (because there are 8 bits in a byte):

4000 * 8 = 32,000 bits per second (more commonly written 32 kbps) for one G.729 call.
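Putting Steps 1 through 4 together in one function makes the whole calculation repeatable for any codec and link type. This is a sketch under the same assumptions as above; the G.711 comparison line is mine:

def total_bandwidth_bps(codec_bps, sample_size_sec, overhead_bytes):
    payload = (sample_size_sec * codec_bps) / 8      # Step 1: voice bytes per packet
    packet_size = payload + overhead_bytes           # Steps 2-3: add header overhead
    packets_per_second = 1 / sample_size_sec         # e.g. 1 / 0.020 = 50
    return packet_size * packets_per_second * 8      # Step 4: total, bytes -> bits

print(total_bandwidth_bps(8000, 0.020, 60))    # G.729 over Ethernet: 32000.0 (32 kbps)
print(total_bandwidth_bps(64000, 0.020, 60))   # G.711 over Ethernet: 88000.0 (88 kbps)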
