Professional Documents
Culture Documents
Overview
Motivation Comparing DSP and Microcontroller architectures FIR and IIR filter benchmarks Real-world examples
2.1 channel speaker crossover Automotive audio system
Conclusion
In some cases they are integrated into a single SOC (System on a Chip) Can the audio processing tasks traditionally handled by a dedicated DSP be migrated to a microcontroller or an application processor?
Microcontrollers
Small register file Cached architecture Low power sleep modes Low cost Large number of peripherals Many variants Integrated flash memory Good driver support
USB/Ethernet/CAN/Flash/etc
Operating systems Many features fully programmable from C Low interrupt latency
October 21, 2011 Copyright 2011 DSP Concepts LLC 6
Application Processors
High-end microcontrollers Clock speeds up to 2 GHz High level of integration
Multiple processor cores Graphics coprocessor Networking USB Security features
Examples
ARM Cortex-A processor family Intel Atom
Used in
Smart phones iPad/tablets Set top boxes Automotive head units
Collision Course
DSPs are adding peripherals and increasing software support
Analog Devices Blackfin TI C5000
Operating systems
11
FIR Filter
Commonly used in
Audio processing Video processing Data smoothing Communications Control
xn z 1 h0
z 1
z 1 h2
z 1 h3 h4
h1
yn
12
Circular Addressing
Using a FIFO on a sample-by-sample basis is very inefficient Avoid data movement and use a circular buffer instead
x[n]
x[n-3]
x[n-2]
x[n-1]
x[n]
x[n-6]
x[n-5]
x[n-4]
stateIndex
coeffIndex
h[6]
h[5]
h[4]
h[3]
h[2]
h[1]
h[0]
sum = 0.0f; for(i=0;i<N;i++) Inner loop is over N filter coefficients { sum += state[stateIndex++] * coeffs[N-i]; if (stateIndex >= N) stateIndex = 0; } outPtr[sample] = sum; }
14
Two data fetches Multiplication Addition Circular addressing Pointer updates Looping
Copyright 2011 DSP Concepts LLC 15
Work arounds
Cache state variables and coefficients and compute 4 outputs in parallel Manually unroll the inner loop by a factor of 4 Use a FIFO but shift in data one block at a time (Similar techniques apply to the Cortex-A8)
16
DSP and Cortex-A8 rely on SIMD Cortex-M4 FIR library available from ARM (CMSIS DSP Library) Cortex-A8 FIR library will be available from DSP Concepts
17
Standard C Start with the textbook implementation of an algorithm and allow the C compiler to optimize as best as it can. Tuned C Hand optimize the code as best as possible while remaining in C. This involves loop unrolling, caching of variables, and using intrinsic functions. Assembly Get the absolute best performance possible using assembly coding.
Biquad Filter
Commonly used in audio: Tone controls Graphic EQ Loudness compensation Crossover filters Etc. Different structures have advantages Direct Form 1 better fixedpoint behavior Direct Form 2 less memory
x[n] z -1 x[n-1] z -1 x[n-2]
Direct Form 1
y[n] b0 z -1 b1 -a1 y[n-1] z -1 b2 -a2 y[n-2]
Direct Form 2
x[n] y[n] b0 z -1 -a1 z -1 -a2 b2
18
b1
// Persist state variables for next call state[0] = wNm1; state[1] = wNm2;
19
Ideal! Inner loop requires 5 instructions With SIMD can compute two filters in parallel
_sampleLoopEnd: //r14 = b1 * w[n-1], (new value for next loop iteration) f14=f3*f4,f12=f8+f15;
20
Work arounds
Manually unroll the inner loop by a factor of 4 (Similar techniques apply to the Cortex-A8)
21
Standard C Start with the textbook implementation of an algorithm and allow the C compiler to optimize as best as it can. Tuned C Hand optimize the code as best as possible while remaining in C. This involves loop unrolling, caching of variables, and using intrinsic functions. Assembly Get the absolute best performance possible using assembly coding.
22
Processors compared
Cortex-M4F. NXP LPC 43xx. 180 MHz Cortex-A9. TI OMAP 4430. 1 GHz 32-bit floating-point DSP. 400 MHz
23
24
Signal Flow
Top-level system
Bass subsystem
Tweeter subsystem
26
MIPS 0.28 0.59 0.59 0.68 2.75 0.72 1.54 0.49 3.03 0.3 0.52 1.65 0.45 6.87 0.52 0.62 0.29 21.89
SIMD No Yes Yes N/A Yes Yes No No N/A N/A N/A Yes Yes N/A N/A N/A No
Cortex-M4 MIPs SIMD 1.25 N/A 2.72 N/A 2.09 N/A 2.79 N/A 10.41 N/A 2.02 5.78 1.35 12.79 1.41 0.74 11.87 2.14 27.42 2.28 1.52 3.45 92.03
27
N/A N/A N/A N/A N/A N/A N/A N/A N/A N/A N/A N/A
28
29
30
Conclusion
DSPs and microcontrollers are on a collision course Through careful programming techniques you can significantly increase the processing throughput of microcontrollers
Digital signal controllers (e.g., Cortex-M4) are capable of entry-level 2 channel audio processing High-end application processors (e.g., Cortex-A8 / A9) are capable of multichannel premium audio processing Microcontrollers will continue to add specialized audio peripherals in order to gain a foothold in the market DSPs will be pushed out of high volume consumer sockets and be reserved for specialized applications
Prediction
31
Thank You!