Collins
Standard MIDI Files · Perceptual Audio Coding · MPEG-1 Layers 1, 2 & 3 · MPEG-4
Audio coding has actually been around for hundreds of years. Traditionally, composers record their music by writing out the notes in a standard notation.
A piano roll can be efficiently encoded digitally by recording the time when each note begins and ends. This is what a Standard MIDI File does. The MIDI standard (Musical Instrument Digital Interface) is an internationally agreed language. Standard MIDI Files encode:

- MIDI events/messages, e.g. note-on, note-off, etc.
- The time delay between each event
- Up to 16 different instruments to be played at once
- Parameters such as key velocity, volume, modulation, etc.
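As a sketch (not a complete Standard MIDI File, which also needs header and track chunks), the raw MIDI messages for a single note can be built like this; the tick count and note values are invented for illustration:

```python
# Sketch: encoding a one-note "score" as raw MIDI-style events.
# Status bytes follow the MIDI 1.0 spec: 0x9n = note-on,
# 0x8n = note-off, where n is the channel number 0-15.

def note_on(channel: int, note: int, velocity: int) -> bytes:
    return bytes([0x90 | channel, note, velocity])

def note_off(channel: int, note: int, velocity: int = 0) -> bytes:
    return bytes([0x80 | channel, note, velocity])

# Middle C (note 60) on channel 0, struck with velocity 100:
events = [
    (0,   note_on(0, 60, 100)),   # (delta ticks, message)
    (480, note_off(0, 60)),       # released 480 ticks later
]

# Only these few bytes are stored; the synthesiser supplies the sound.
total = sum(len(msg) for _, msg in events)
print(total)  # 6 bytes for a whole note
```

This is the core of why MIDI files are so small: the audio itself is never stored, only the playing instructions.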
In a MIDI file, it is the instructions to play the notes that are stored, not the audio itself. The quality of the reproduction therefore depends on the synthesiser used for playback.
- MIDI file: only synthesised instruments can be used
- Original recording: any sounds (including speech and singing) can be recorded
Sampling
Digital audio represents the continuous analogue audio waveform by a series of discrete samples. The sample rate must be at least double the bandwidth of the audio signal (the Nyquist criterion). Typical hi-fi sample rates are 44.1 kHz (CD audio) and 48 kHz (DAT tape and DAB radio).
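A quick way to see why the factor of two matters: a tone above half the sample rate folds back (aliases) to a lower frequency. The folding arithmetic can be sketched as:

```python
# Sketch: why the sample rate must be at least twice the bandwidth.
# A tone above Fs/2 aliases: it is indistinguishable from a tone at
# f_alias = |f - Fs * round(f / Fs)|, which lies in [0, Fs/2].

def alias_frequency(f: float, fs: float) -> float:
    return abs(f - fs * round(f / fs))

fs = 44_100.0                          # CD audio sample rate
print(alias_frequency(10_000.0, fs))   # below Fs/2: unchanged -> 10000.0
print(alias_frequency(30_000.0, fs))   # above Fs/2: folds -> 14100.0
```

Anything above 22.05 kHz must therefore be filtered out before sampling CD audio, or it will reappear as an audible error.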
[Figure: signal spectrum versus frequency, marking the sample rate Fs and the Nyquist limit Fs/2]
Quantisation levels
Each sample is quantised so that it can be represented by a binary integer. The number of bits used to represent each sample sets the number of quantisation levels. The error between the quantised signal and the original audio is the quantisation noise. The peak signal-to-quantisation-noise ratio using n bits per sample can be estimated as:

SNR ≈ 6n dB

CD audio uses 16-bit resolution, giving a dynamic range of ~96 dB. To hear the quantisation noise, the signal level would have to be close to the threshold of pain!
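The ~6 dB-per-bit rule can be checked numerically by quantising a full-scale sine wave and measuring the resulting noise (for a sine wave the measured figure comes out slightly above 6n dB):

```python
import math

# Sketch: checking the ~6 dB per bit rule against a quantised sine wave.
def measured_snr_db(n_bits: int, num_samples: int = 100_000) -> float:
    levels = 2 ** n_bits
    sig_pow = noise_pow = 0.0
    for i in range(num_samples):
        x = math.sin(2 * math.pi * 440 * i / 48_000)   # full-scale 440 Hz sine
        # map [-1, 1] onto the integer levels and back:
        q = round((x + 1) / 2 * (levels - 1)) / (levels - 1) * 2 - 1
        sig_pow += x * x
        noise_pow += (x - q) ** 2
    return 10 * math.log10(sig_pow / noise_pow)

for n in (8, 16):
    print(n, "bits:", round(measured_snr_db(n), 1),
          "dB measured vs", 6 * n, "dB rule of thumb")
```

The 16-bit case lands near the ~96 dB dynamic range quoted above for CD audio.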
Sub-band Coding
Like the eye, the ear is more sensitive to some frequencies than others. Many audio coding algorithms exploit this using a form of sub-band coding.
[Figure: sub-band coder pipeline — digital audio in → filters → downsample → quantise → multiplex]

Bit rates through a 3-band example:

- Input: 16 × 48000 = 768 kbps
- After filtering into 3 bands: 16 × 3 × 48000 = 2304 kbps
- After downsampling each band: 16 × 3 × 16000 = 768 kbps
- After quantising to 4 bits: 4 × 3 × 16000 = 192 kbps
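The bit-rate arithmetic above can be reproduced directly (the 3-band split and 4-bit quantisation are just the figures from this example):

```python
# Sketch: bit rates at each stage of the 3-band sub-band coder example.
bits_in, fs_in, bands = 16, 48_000, 3
fs_band = fs_in // bands            # each band downsampled to 16 kHz
bits_band = 4                       # coarser quantisation per band

print(bits_in * fs_in)              # input:             768000 bps
print(bits_in * bands * fs_in)      # after filtering:  2304000 bps
print(bits_in * bands * fs_band)    # after downsample:  768000 bps
print(bits_band * bands * fs_band)  # after quantising:  192000 bps
```

Note that filtering alone triples the data rate; only downsampling (allowed because each band has reduced bandwidth) and coarser quantisation deliver the compression.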
Perceptual Coding
Remember that the quantisation process will introduce noise, and that we want the noise to be imperceptible. We want the noise to be just below the threshold of hearing (also known as the Minimum Audible Field, MAF). So, the question should be: how few bits can we use in each sub-band while keeping the quantisation noise below the threshold of hearing?
Quantisation Implications
[Figure: sound pressure level (dB-SPL) versus frequency (Hz), showing the peak signal level against the threshold of hearing. Uniform quantisation needs 12-16 bits across the whole band, whereas allocating bits per sub-band relative to the threshold of hearing needs only 9-12 bits per band.]
Psychoacoustics
Substantial improvements to our sub-band coder are possible using psychoacoustics. Psychoacoustics is the study of how sound is perceived by the ear-brain combination. Of interest to us is the fact that the threshold of hearing is not constant: it changes continually due to masking.
Masking
[Audio demo: signal alone, signal + noise (SNR = 24 dB), and noise alone]
In the presence of the signal, the noise sounds much quieter (almost undetectable). Due to the anatomy of the ear, loud sounds mask quieter sounds at nearby frequencies. Effectively, the threshold of hearing is raised to the masking threshold. The masking threshold can be estimated using a psychoacoustic model and exploited by the coder.
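One way to see how masking buys bits: with roughly 6 dB of SNR per bit, the bits needed in a band are about the signal-to-mask ratio divided by 6. A toy allocation with invented level figures:

```python
import math

# Toy bit allocation: with ~6 dB of SNR per bit, the bits needed in a
# band are roughly the signal-to-mask ratio (SMR) divided by 6.
# The dB-SPL figures below are invented for illustration.

def bits_needed(signal_db: float, mask_db: float) -> int:
    smr = signal_db - mask_db            # signal-to-mask ratio
    return max(0, math.ceil(smr / 6))    # push noise just below the mask

print(bits_needed(70, 40))   # 5 bits: strong signal, low masking threshold
print(bits_needed(50, 45))   # 1 bit:  mask nearly covers the band
print(bits_needed(30, 40))   # 0 bits: band fully masked, send nothing
```

Raising the effective threshold from the fixed threshold of hearing to the (higher) masking threshold shrinks the SMR, and with it the bit count, in every band.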
Applying Masking
[Figure: sound pressure level (dB-SPL) versus frequency for one frame of David Bowie's "Space Oddity". Quantising each sub-band relative to the masking threshold needs only 2-5 bits per band, compared with 9-12 bits relative to the fixed threshold of hearing.]
The audio signal is processed in discrete blocks of samples known as frames. Each frame of each sub-band is:

- Scaled to normalise the peak signal level
- Quantised at a level appropriate for the current signal-to-mask ratio

The receiver needs to know the scale factor and quantisation levels used, so this information must be embedded along with the samples. The resulting overhead is very small compared with the compression gains.
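A minimal sketch of this per-frame scale-and-quantise step (the frame values and the 4-bit depth are invented; a real coder picks the depth from the psychoacoustic model):

```python
# Sketch: per-frame, per-sub-band coding. Normalise the frame by a
# scale factor, then quantise with the bit depth chosen from the
# signal-to-mask ratio. Both scale and codes must reach the receiver.

def code_frame(samples, n_bits):
    scale = max(abs(s) for s in samples) or 1.0    # scale factor (side info)
    levels = 2 ** n_bits
    codes = [round((s / scale + 1) / 2 * (levels - 1)) for s in samples]
    return scale, codes

def decode_frame(scale, codes, n_bits):
    levels = 2 ** n_bits
    return [(c / (levels - 1) * 2 - 1) * scale for c in codes]

frame = [0.02, -0.05, 0.04, -0.01]       # one sub-band, one (tiny) frame
scale, codes = code_frame(frame, 4)
print(decode_frame(scale, codes, 4))     # close to the original samples
```

The scale factor is the "overhead" mentioned above: one extra value per band per frame, against a large saving in bits per sample.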
Block Diagrams
Encoder: digital audio in → sub-band filter bank → scale & quantise → multiplex → coded audio out. In parallel, an FFT of the input feeds the psychoacoustic model, which supplies the masking thresholds used to choose the quantisation levels; the side information is multiplexed into the bitstream.

Decoder: coded audio in → demultiplex → decode side info → descale & dequantise → inverse filter bank → digital audio out.
Three perceptual coders are available in the MPEG-1 specification, known as Layers 1, 2 & 3.

Layer 1 (.mp1):

- Similar to the simple coder just described
- 32 sub-bands are used
- Each frame contains 384 samples (32 × 12)
- A version of Layer 1 was used in the Digital Compact Cassette (DCC)

Layer 2 (.mp2):

- Slightly more complex, but better quality than Layer 1
- Frame length increased to 1152 samples (32 × 36)
- Data formatting of samples and side information is slightly more efficient
- Used in Digital Audio Broadcasting (DAB)

Layer 3 (.mp3):

- Significantly more complex than Layers 1 or 2
- Capable of reasonable quality even at very low data rates
- A combination of sub-band coding and transform coding is used to give up to 576 frequency bands (compared to 32 for Layers 1 & 2)
- Huffman encoding is applied to the samples
- MP3 files are now hugely popular for internet and mobile users
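Layer 3's Huffman stage assigns shorter codes to more frequent quantised values. A small sketch, using an invented symbol stream and computing only the code lengths (not the actual bit patterns):

```python
import heapq
from collections import Counter

# Sketch: Huffman coding as a tree of merges. Repeatedly combine the two
# least frequent groups; every merge adds one bit to each member's code.

def huffman_code_lengths(symbols):
    freq = Counter(symbols)
    # heap entries: (weight, unique tiebreak, {symbol: code_length})
    heap = [(w, i, {s: 0}) for i, (s, w) in enumerate(freq.items())]
    heapq.heapify(heap)
    i = len(heap)
    while len(heap) > 1:
        w1, _, a = heapq.heappop(heap)
        w2, _, b = heapq.heappop(heap)
        merged = {s: length + 1 for s, length in {**a, **b}.items()}
        heapq.heappush(heap, (w1 + w2, i, merged))
        i += 1
    return heap[0][2]

# Quantised sample values: small magnitudes dominate, as in real audio.
samples = [0] * 50 + [1] * 25 + [-1] * 15 + [2] * 10
lengths = huffman_code_lengths(samples)
print(lengths)   # commonest value (0) gets the shortest code
```

This is the statistical-redundancy removal that distinguishes Layer 3 from Layers 1 and 2, on top of the shared perceptual machinery.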
The same principles are applied in subtly different ways in most general-purpose audio coders, for example in MPEG-4.
MPEG-4
- General audio coders: similar to MPEG-1 but including multichannel support
- Parametric coder: HILN (Harmonics, Individual Lines and Noise) for very low bit rates
- Speech coders: HVXC and CELP speech coders
- Structured Audio: similar to MIDI but including instrument models; used for synthetic audio
- Synthesised speech: allows speech to be coded as text and resynthesised at the decoder
Summary
- Standard MIDI Files: work by encoding the structure of the music
- MPEG-1 Layers 1 & 2: work by removing the perceptual redundancy from digitised audio
- MPEG-1 Layer 3: removes perceptual redundancy and statistical redundancy (by entropy coding)
- MPEG-4: the coding method can be chosen to suit the signal source, so perceptual, statistical and structural redundancy can all be exploited