Fraunhofer IIS
Outline
MPEG Audio
Nikolaus Rettelbach, Manfred Lutzky
MPEG-Audio: Mature Standards
MPEG-1/2: Layer I, II, III (ISO/IEC 11172-3)
MPEG-2: NBC (non-backward-compatible) = AAC (ISO/IEC 13818-7)
MPEG-4: Audio (AAC-LC/LD/SCAL, TwinVQ, BSAC) (ISO/IEC 14496-3)
MPEG-4 HE-AAC, HE-AACv2
MPEG-D: MPEG Surround (ISO/IEC 23003-1)
MPEG-4: AAC-ELD
[Figure: timeline 1992 to 2012 of the MPEG audio standards (Layer-I,II,III; AAC-LC; MPEG-4 Audio including AAC-(E)LD and HE-AAC/HE-AACv2; MPEG-D MPEG Surround) with their ISO/IEC numbers and applications]
[Figure: perceptual encoder and decoder. Encoder: Time/Frequency (MDCT) -> Quantization (Scaling/Noise Shaping) under Psychoacoustic Control -> Noiseless Coding -> Bit Mux, with the scale-factors transmitted. Decoder: Noiseless Decoding (Huffman) -> Inv. Quantization/Rescaling with the scale-factors -> Frequency/Time (IMDCT) -> Audio Output.]
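The transform blocks in the diagram (MDCT in the encoder, IMDCT in the decoder) can be sketched in a few lines of Python. This is the textbook direct-form MDCT with a sine window and 50% overlap-add, written for clarity rather than speed; the normalization and the function names are our own choices, and AAC's normative scaling, window shapes and block switching are not reproduced. The round trip illustrates time-domain aliasing cancellation: each IMDCT frame is aliased, but overlap-adding neighboring frames restores the input.

```python
import math


def sine_window(length):
    """Sine window of length 2N; satisfies w[n]**2 + w[n + N]**2 == 1."""
    return [math.sin(math.pi * (n + 0.5) / length) for n in range(length)]


def mdct(frame):
    """Direct O(N^2) MDCT: 2N windowed time samples -> N coefficients."""
    half = len(frame) // 2
    n0 = 0.5 + half / 2.0  # phase offset of the standard MDCT kernel
    return [
        sum(frame[n] * math.cos(math.pi / half * (n + n0) * (k + 0.5))
            for n in range(len(frame)))
        for k in range(half)
    ]


def imdct(coeffs):
    """Inverse MDCT: N coefficients -> 2N time-aliased samples (scaled by 2/N)."""
    half = len(coeffs)
    n0 = 0.5 + half / 2.0
    return [
        2.0 / half * sum(coeffs[k] * math.cos(math.pi / half * (n + n0) * (k + 0.5))
                         for k in range(half))
        for n in range(2 * half)
    ]


def mdct_round_trip(signal, half):
    """Window -> MDCT -> IMDCT -> window -> overlap-add with hop size N."""
    if len(signal) % half:
        raise ValueError("signal length must be a multiple of N")
    w = sine_window(2 * half)
    padded = [0.0] * half + list(signal) + [0.0] * half
    out = [0.0] * len(padded)
    for start in range(0, len(padded) - 2 * half + 1, half):
        frame = [padded[start + n] * w[n] for n in range(2 * half)]
        aliased = imdct(mdct(frame))  # each frame alone contains time-domain aliasing
        for n in range(2 * half):
            out[start + n] += aliased[n] * w[n]  # aliasing cancels in the overlap-add
    return out[half:half + len(signal)]
```

For example, `mdct_round_trip(x, 16)` on a 64-sample signal returns the input up to floating-point rounding. In a codec, quantization sits between `mdct` and `imdct`, so its noise is spread over the 2N-sample window.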
MPEG-4 HE-AACv2
[Figure: amplitude over frequency in the encoder and decoder, with the crossover (xover) frequency marked]
MPEG Surround
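[Figure: MPEG Surround system. The MPS Encoder takes 5.1 PCM and produces an automatic stereo downmix plus an MPS bitstream; the HE-AAC Encoder codes the downmix into a stereo bitstream, which is sent together with the MPS bitstream. An HE-AAC Decoder alone plays back the stereo downmix; with the MPS Decoder, the same stream yields 5.1 playback or binaural playback.]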
MPEG Surround
High-Quality Surround Sound at Stereo Bit-Rates
MPEG Surround enables efficient, backward-compatible compression of high-quality surround sound.
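A minimal sketch of the parametric idea behind MPEG Surround, assuming a fixed 5.1-to-stereo downmix with -3 dB center and surround gains (a common convention, not necessarily the MPS encoder's automatic downmix) and broadband per-frame parameters. Real MPEG Surround extracts parameters such as channel level differences (CLD) and inter-channel coherence (ICC) per time/frequency tile; the function names here are hypothetical.

```python
import math


def downmix_51_to_stereo(lf, rf, c, ls, rs):
    """Static 5.1 -> stereo downmix; center and surrounds at -3 dB, LFE omitted."""
    g = 1.0 / math.sqrt(2.0)
    left = [a + g * b + g * d for a, b, d in zip(lf, c, ls)]
    right = [a + g * b + g * d for a, b, d in zip(rf, c, rs)]
    return left, right


def cld_db(x1, x2, eps=1e-12):
    """Channel level difference: energy ratio of two channels in dB."""
    e1 = sum(v * v for v in x1)
    e2 = sum(v * v for v in x2)
    return 10.0 * math.log10((e1 + eps) / (e2 + eps))


def icc(x1, x2, eps=1e-12):
    """Inter-channel coherence: normalized cross-correlation at lag 0."""
    num = sum(a * b for a, b in zip(x1, x2))
    den = math.sqrt(sum(a * a for a in x1) * sum(b * b for b in x2))
    return num / (den + eps)
```

An encoder built this way would send the stereo downmix through HE-AAC and only a few CLD/ICC values per tile alongside it, which is why a legacy stereo decoder can simply ignore the extra data.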
MPEG-Audio: Recent and Future Standards
MPEG xHE-AAC (Extended HE-AAC)
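[Figure: xHE-AAC (USAC) decoder. Bitstream De-Multiplex feeds Arithm. Dec., Inv. Quant. and Scaling (using Scalefactors, or LPC Dec. and LPC to Freq. Dom.), followed by IMDCT; in parallel, ACELP with the LPC Synth Filter; then FAC; Windowing, Overlap-Add; Bass Postfilter; Bandwidth Extension; Stereo Processing.]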
Technical Highlights: Frequency Domain Noise Shaping
[Figure: the same decoder block diagram, with the LPC-based scaling path (LPC Dec., LPC to Freq. Dom., Scaling)]
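Frequency Domain Noise Shaping applies an LPC envelope directly to the MDCT spectrum. The sketch below is a simplified illustration, assuming the envelope is sampled as 1/|A(e^jw)| at each MDCT bin center; the normative USAC LPC-to-frequency-domain conversion works with band-wise gains and differs in detail, and all names are hypothetical.

```python
import cmath
import math


def lpc_to_bin_gains(a, num_bins):
    """Sample the LPC envelope 1/|A(e^jw)| at each MDCT bin center.

    a: coefficients [1, a1, ..., ap] of A(z) = sum_i a[i] * z**-i
    (assumed stable, so A has no zeros on the unit circle).
    """
    gains = []
    for k in range(num_bins):
        w = math.pi * (k + 0.5) / num_bins  # bin-center frequency in rad/sample
        a_w = sum(c * cmath.exp(-1j * w * i) for i, c in enumerate(a))
        gains.append(1.0 / abs(a_w))
    return gains


def fdns_encode(spectrum, a):
    """Encoder side: flatten the spectrum by the envelope before quantization."""
    g = lpc_to_bin_gains(a, len(spectrum))
    return [x / gk for x, gk in zip(spectrum, g)]


def fdns_decode(flat, a):
    """Decoder side: reapply the envelope to the dequantized spectrum."""
    g = lpc_to_bin_gains(a, len(flat))
    return [x * gk for x, gk in zip(flat, g)]
```

Because quantization happens between `fdns_encode` and `fdns_decode`, white quantization noise in the flattened domain comes out shaped like the LPC envelope.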
[Figure: stereo listening test, MUSHRA score (0 to 100) over bit rate [kbps] (8 to 96) for USAC, HE-AAC and AMR]
MPEG SAOC and Dialogue Enhancement
Dialogue Enhancement
Personalized User Experience
User benefit
Enables users to change the balance between dialogue and
background according to individual preferences
Dialogue enhancement for better intelligibility
Adaptation to the listening environment
Broadcaster benefit
Same audio mix for all listening environments
No need to send different audio versions
Backward compatible with existing devices
Cost-efficient audio service for the hearing impaired
Today's Challenge
Finding the Right Mix
An audio mix with a single balance between dialogue and background is always a compromise
Hearing-impaired people need the dialogue to be louder
Non-native speakers need about 3 dB higher S/N
The listening environment influences the preferred setting of the mix
The right balance depends on the content, e.g.
Sport events: commentary vs. stadium atmosphere
Movies: music & effects vs. dialogue level
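To put the 3 dB figure in perspective, a worked conversion using only the standard decibel definitions (nothing codec-specific):

```latex
\Delta = 3\,\mathrm{dB}
\;\Rightarrow\;
\frac{P_2}{P_1} = 10^{3/10} \approx 2.0,
\qquad
\frac{A_2}{A_1} = 10^{3/20} \approx 1.41
```

So non-native speakers need roughly twice the dialogue-to-background power ratio, i.e. about 41% more dialogue amplitude at a fixed background level.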
[Figure: SAOC Encoder and SAOC Decoder]
Dialogue Enhancement
Object based with parameterized objects
Each audio element is treated as an object.
The objects are parameterized so that they can be manipulated at the receiver. The parameters are sent with the mix.
Backward compatible transmission
Audio mix is not changed
Parameters embedded into bitstream
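A simplified sketch of object-based dialogue enhancement, assuming a mono mix split into frequency bands and two objects (dialogue and background). The encoder sends per-band object levels relative to the louder object (loosely modeled on SAOC's object level differences); the receiver estimates the dialogue share of each band with a Wiener-style gain and rescales it. This is not the normative SAOC processing, and the function names are hypothetical.

```python
def encode_object_levels(dialog_bands, background_bands, eps=1e-12):
    """Encoder side: per band, energy of each object relative to the louder one."""
    params = []
    for d, b in zip(dialog_bands, background_bands):
        e_d = sum(x * x for x in d)
        e_b = sum(x * x for x in b)
        ref = max(e_d, e_b) + eps  # eps avoids division by zero in silent bands
        params.append((e_d / ref, e_b / ref))
    return params


def enhance_dialog(mix_bands, params, dialog_gain_db):
    """Receiver side: change the dialogue level in each band of the unchanged mix."""
    g = 10.0 ** (dialog_gain_db / 20.0)
    out = []
    for band, (old_d, old_b) in zip(mix_bands, params):
        total = old_d + old_b
        w = old_d / total if total > 0 else 0.0  # Wiener-style dialogue share
        # mix * w estimates the dialogue, mix * (1 - w) the background
        scale = (1.0 - w) + g * w
        out.append([x * scale for x in band])
    return out
```

A +6 dB setting roughly doubles the dialogue amplitude in bands dominated by speech while leaving background-only bands untouched; 0 dB reproduces the transmitted mix, which is what a legacy receiver plays anyway.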
Dialogue Enhancement
Signal Flow Overview
Dialogue Enhancement
Scenarios
Stereo
Mono Dialog or Stereo Dialog
Stereo Background
5.1 Multi-channel
Mono Dialog: center channel only
Stereo Dialog: all three front channels (Left, Center, Right) contain parts of the dialogue signal
5.1 Background
Dialogue Enhancement:
Workflow Integration
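Separate objects (sources) are available at the encoder: Dialog and Background
The mix is created in the SAOC encoder
[Figure: Dialog and Background enter the SAOC Encoder, which outputs the Mix and the Parameters; the AAC Encoder codes the Mix, and the Parameters travel with it in the Bitstream]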
MPEG: Current Standardization
MPEG-H 3D-AUDIO
MPEG-H 3D-Audio: Idea
MPEG-H 3D-Audio: Format
MPEG-H 3D-Audio: Flowchart Channel+Object
[Figure: a compressed bitstream (256 to 1200 kbps) enters the Channel+Object Decoder and the Object Renderer; the output goes directly to loudspeakers (22.2), through Format Conversion to a reduced number of loudspeakers (e.g. 8.1, 5.1), or through headphone processing]
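The slide does not say how the Object Renderer maps an object to loudspeakers. As an illustration only, here is 2-D pairwise amplitude panning in the style of vector base amplitude panning (VBAP), assuming horizontal loudspeakers given by azimuth in degrees; it is not taken from the MPEG-H specification, and `pan_pair` is a hypothetical name.

```python
import math


def pan_pair(source_az_deg, spk1_az_deg, spk2_az_deg):
    """Power-normalized gains (g1, g2) placing a source between two loudspeakers.

    Negative gains mean the source lies outside this pair; a full renderer
    would then pick a different loudspeaker pair.
    """
    def unit(az_deg):
        r = math.radians(az_deg)
        return math.cos(r), math.sin(r)

    px, py = unit(source_az_deg)
    l1x, l1y = unit(spk1_az_deg)
    l2x, l2y = unit(spk2_az_deg)
    det = l1x * l2y - l2x * l1y
    if abs(det) < 1e-12:
        raise ValueError("loudspeaker directions must not be collinear")
    # Solve p = g1 * l1 + g2 * l2 for (g1, g2) with Cramer's rule
    g1 = (px * l2y - l2x * py) / det
    g2 = (l1x * py - px * l1y) / det
    norm = math.hypot(g1, g2)  # constant total power: g1**2 + g2**2 == 1
    return g1 / norm, g2 / norm
```

For a source straight ahead between loudspeakers at +30 and -30 degrees, both gains are 1/sqrt(2); a source at +30 degrees plays only from the first loudspeaker.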
MPEG-H 3D-Audio: Timeline
MPEG-Audio: Communication Codecs
MPEG-4 AAC-LD
MPEG-4 AAC-ELD
MPEG-4 AAC-ELDv2
[Figure: algorithmic delay [ms] (0 to 240) over bit rate per channel [kbps] (10 to 128), built up over several slides: AAC-LC (1999), HE-AAC (2003), HE-AAC v2 (2004), MPEG Surround (2006), AAC-LD (2000), AAC-ELD (2008), AAC-ELD v2 (2011) and 3GPP EVS (2014 sched.)]
ISO/MPEG AAC-ELD
[Figure: evolution from AAC-LD to AAC-ELD (2008, + SBR) and AAC-ELD v2 (2011, + MPS)]
Status:
AAC-ELD: International MPEG Standard, Q4/2007
AAC-ELD v2: International MPEG Standard, part of MPEG SAOC
Innovations of AAC-ELD:
Low delay Spectral Band Replication (SBR)
Delay-optimized filterbank/window
Innovations of AAC-ELD v2:
Low Delay MPEG Surround (MPS)
Delay-optimized codec structure
ISO/MPEG AAC-ELD
Main features:
                           AAC-LD                      AAC-ELD                     AAC-ELD v2
Bandwidth
Algorithmic delay          20-40 ms                    15-32 ms                    21-39 ms (stereo mode)
Typical bitrate [kbit/s]   32 (mono) to 128 (stereo)   24 (mono) to 128 (stereo)   24-48 (stereo)
Frequency domain mixing
Josef-von-Fraunhofer
Prize 2011
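The delay figures in the table follow from frame length and sampling rate. A back-of-the-envelope sketch for AAC-LD, assuming the algorithmic delay equals two frames (framing plus the overlap of the low-delay MDCT window, with no look-ahead); the 480-sample frame length is an assumption for the example, and the AAC-ELD and AAC-ELD v2 values depend on their own filterbank design and are not modeled here.

```python
def aac_ld_delay_ms(frame_length, sample_rate_hz):
    """Algorithmic delay in ms, assuming it equals two frames (2N samples)."""
    return 2.0 * frame_length / sample_rate_hz * 1000.0
```

With 480-sample frames this gives 20 ms at 48 kHz and 40 ms at 24 kHz, the two ends of the 20-40 ms range quoted for AAC-LD.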
ISO/MPEG AAC-ELD
Licensing
Patent pool for all relevant AAC family members, run by Via Licensing
AAC-LD, AAC-ELD v2, AAC-LC and HE-AACv2 are covered by a unified license
Licensors: AT&T, Dolby, Fraunhofer, Philips, LG, Microsoft, NEC, Nokia, NTT, Orange, Panasonic, Sony, Ericsson
http://www.vialicensing.com/licensing/aac-overview.aspx
ISO/MPEG AAC-ELD
stereo quality
ISO/MPEG AAC-ELD
AAC-ELD: delay-optimized time-to-frequency (t->f) transforms
[Figure: delay saving of the delay-optimized transforms]
ISO/MPEG AAC-ELD
Low delay Spectral Band Replication (SBR)
[Figure: AAC-LD evolves into AAC-ELD + SBR (2008). Left: amplitude over frequency of the AAC spectrum compared with the original spectrum. Right: a warped copy of the AAC spectrum supplies the added spectral components above the AAC band.]
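The principle in the figure can be sketched on a magnitude spectrum: the decoder copies the AAC-coded low band upward and scales each high band to an energy transmitted as side information. This is a simplified illustration under our own assumptions (plain copy-up instead of the warped copy, fixed band width, no noise or tone components); real SBR operates on a QMF filterbank, and the function name is hypothetical.

```python
import math


def sbr_like_reconstruct(low_band, total_bins, target_energies, band_width):
    """Extend a low-band spectrum to total_bins by copy-up plus envelope adjustment."""
    crossover = len(low_band)  # first bin above the AAC-coded band
    spec = list(low_band) + [0.0] * (total_bins - crossover)
    # 1) Patching: copy low-band bins upward, repeating the source range if needed
    for k in range(crossover, total_bins):
        spec[k] = low_band[(k - crossover) % crossover]
    # 2) Envelope adjustment: scale each high band to its transmitted energy
    for b, target in enumerate(target_energies):
        start = crossover + b * band_width
        stop = min(start + band_width, total_bins)
        energy = sum(v * v for v in spec[start:stop])
        gain = math.sqrt(target / energy) if energy > 0 else 0.0
        for k in range(start, stop):
            spec[k] *= gain
    return spec
```

Only a handful of energies per frame has to be transmitted for the high band, which is where the bitrate saving of SBR comes from.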
ISO/MPEG AAC-ELD
Low Delay MPEG Surround (MPS)
[Figure: AAC-ELD evolves into AAC-ELD v2 + MPS (2011)]
ISO/MPEG AAC-ELD
Dt. Telekom listening test, June 2010 (1)
Excellent quality (MUSHRA score > 80) can be achieved with several codecs, at different bitrates
AAC-ELD at 32 kbit/s
AAC-ELD performs best, offering excellent quality at bitrates beginning at 48 kb/s
Source: ftp://ftp.3gpp.org/tsg_sa/WG4_CODEC/TSGS4_59/Docs/S4-100479.zip
ISO/MPEG AAC-ELD
Dt. Telekom listening test, June 2010 (2)
Mono bitrates in kbit/s for excellent quality
Codec      Arbeit  Club  Fiedel  Jazzpiano  Rea  Speech  Average
AAC-ELD    32      24    48      32         32   32      32
AAC-LD     48      48    32      48         48   48      48
CELT       64      48    64      48         48   48      54
G.719      32      32    48      48         32   48      40
Only some items have entries for G.718 (48, 32), G722.1-C (48, 32, 24, 32) and SILK (40, 40, 40); G722.2, G.722 and Speex have no entries.
Source: ftp://ftp.3gpp.org/tsg_sa/WG4_CODEC/TSGS4_59/Docs/S4-100479.zip
ISO/MPEG AAC-ELD
Deployment
iOS: natively in the OS since iOS 5.0, used for FaceTime
Android: natively in the OS since Jelly Bean (4.1)
OS X: natively in the OS since Lion
Videoconferencing: de facto standard for high-quality audio, TIP
EBU/ACIP broadcast contribution: recommended codecs AAC-LC/LD
3GPP/SA4
Enhanced Voice Service (EVS) - Objectives
Next-generation speech and audio codec for NGN services
5 Objectives:
1. Enhanced quality and coding efficiency for narrowband (NB) and
wideband (WB) speech services
2. Enhanced quality by the introduction of super-wideband (SWB)
speech
3. Enhanced quality for mixed content and music in conversational
applications (for example, in-call music)
4. Robustness to packet loss and delay jitter
5. Backward interoperability to the 3GPP AMR-WB codec
3GPP/SA4
Enhanced Voice Service (EVS) - Design constraints
Sampling rates: 8-48 kHz
Channels: mono, stereo
Bitrates: 7.2-128 kbps CBR; 5.9 kbps VBR
SWB: 13.2 kbps
Delay: 32 ms
Complexity: 88 wMOPS (= 2x AMR-WB)
Features: JBM, rate switching, PLC, VAD/DTX/CNG
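To relate the bitrate constraints to the payload per frame, a small calculation, assuming 20 ms frames (the frame length is not stated on the slide): a rate in kbit/s multiplied by a duration in ms gives bits.

```python
FRAME_MS = 20.0  # assumption: EVS frame length in milliseconds


def bits_per_frame(bitrate_kbps, frame_ms=FRAME_MS):
    """Bits available per frame: kbit/s x ms = bits."""
    return bitrate_kbps * frame_ms
```

Under that assumption, 7.2 kbps leaves 144 bits per frame, 13.2 kbps (the SWB rate) 264 bits, and 128 kbps 2560 bits.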
3GPP/SA4
Enhanced Voice Service (EVS)
Time schedule:
Submission of qualification executable 11/2012
Qualification 03/2013
Submission of selection executable 11/2013
Selection 04/2014
SA4 finalization of characterization TR 8/2014
SA approval of Characterization TR 9/2014
3GPP/SA4
Enhanced Voice Service (EVS) - Fraunhofer candidate
[Figure: encoder and decoder of the Fraunhofer candidate]
3GPP/SA4
Enhanced Voice Service (EVS) qualification rules
[Figure: table of the WID objectives with their test sets and weights. Objectives: enhanced quality and coding efficiency for NB and WB speech services; enhanced quality by the introduction of SWB speech; enhanced quality on mixed content and music in conversational applications; robustness to packet loss and delay jitter (quality requirements related to robustness to packet losses and delay jitter); backward interoperability to AMR-WB. Weights shown: 20%, 10%, 30%, 5%, 2.5%, 7.5%, 5%, 0%, 10%, 10%, and a 50% weight for the SWB FoM. Four of the items count together in Rule 2a and Rule 2b.]
Source: http://ftp.3gpp.org/tsg_sa/WG4_CODEC/TSGS4_70/Docs/S4-121249.zip
3GPP/SA4
Enhanced Voice Service (EVS) - Qualification test results
[Figure: FoM#1, FoM#2a and FoM#2b (92% to 100%) per candidate, by original/blinded PC label: e-NTT, a-Fra, i-QCI, b-Hua, j-Sam, f-DOC, k-Eri, c-Mot, m-ZTE, u, z, n, q, d-Nok, h-Pan, g-FTO, l-Voi; the candidates qualified for selection are indicated]
Source: http://ftp.3gpp.org/tsg_sa/WG4_CODEC/TSGS4_72bis/Docs/S4-130292.zip
Further Information I
MPEG Home Page: http://mpeg.chiariglione.org/
MPEG-AAC: Bosi et al., "ISO/IEC MPEG-2 Advanced Audio Coding", JAES Volume 45, Issue 10, pp. 789-814, October 1997
HE-AACv2: Stefan Meltzer and Gerald Moser, "MPEG-4 HE-AACv2 audio coding for today's digital media world", EBU Technical Review, January 2006, http://tech.ebu.ch/docs/techreview/trev_305-moser.pdf
MPEG Surround: http://www.mpegsurround.com/index.html
xHE-AAC: Neuendorf et al., "MPEG Unified Speech and Audio Coding - The ISO/MPEG Standard for High-Efficiency Audio Coding of All Content Types", AES Convention 132 (April 2012), Paper Number 8654
Further Information II
Dialogue Enhancement: http://tech.ebu.ch/docs/techreview/trev_2012-Q2_DialogueEnhancement_Fuchs.pdf, http://www.iis.fraunhofer.de/de/bf/amm/forschundentw/forschaudiomulti/dialogenhanc.html
MPEG-H 3D Audio: http://mpeg.chiariglione.org/standards/mpeg-h/3d-audio
AAC-ELDv2: http://www.full-hd-voice.com
EVS project plan: http://ftp.3gpp.org/tsg_sa/WG4_CODEC/TSGS4_73/Docs/S4-130521.zip
EVS design constraints: http://ftp.3gpp.org/tsg_sa/WG4_CODEC/TSGS4_65/Docs/S4-110710.zip