
Aim: To implement Huffman coding using MATLAB

Experimental requirements: PC loaded with MATLAB software


Theory:
Source Coding
Source encoding (or source coding) is the process by which the data generated by a discrete source is efficiently represented. The device that performs the representation is called a source encoder.
For a source encoder to be efficient, we require knowledge of the statistics of the source. In particular, if some source symbols are known to be more probable than others, then we may exploit this feature in the generation of a source code by assigning short code words to frequent source symbols and long code words to rare source symbols. We refer to such a source code as a variable-length code.
An efficient source encoder should satisfy the following two functional requirements:
1. The code words produced by the encoder are in binary form.
2. The source code is uniquely decodable, so that the original source sequence can be reconstructed perfectly from the encoded binary sequence.
Consider a discrete memoryless source whose output s_k is converted by the source encoder into a block of 0s and 1s, denoted by b_k. We assume that the source has an alphabet with K different symbols, and that the kth symbol s_k occurs with probability p_k, k = 0, 1, ..., K-1. Let the binary code word assigned to s_k by the encoder have a length l_k, measured in bits. We define the average code word length L of the source encoder as

L = Σ_{k=0}^{K-1} p_k l_k

L represents the average number of bits per source symbol used in the encoding process.
Let L_min denote the minimum possible value of L. We then define the coding efficiency of the source encoder as

η = L_min / L

With L ≥ L_min, we clearly have η ≤ 1. The source encoder is said to be efficient when η approaches unity.
Source Coding Theorem
2i"en a discrete memorless source of entrop H(S)$ the a"erage code0word length L for an
distortionless source encoding is bounded as

L H(S)
This is also &nown as Noiseless coding theorem or Shannons first theorem!
Accordingl$ the entrop H(S represents a fundamental limit on the a"erage number of bits
per source smbol necessar to represent a discrete memorless source! 'n other words$ L
cannot be made smaller than H(S)!
Thus with L
min
. H(S)$ we ma rewrite the efficienc of a source encoder in terms of the
entrop H(S) as
. H(S) 1L
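As a quick illustration of these quantities (the probabilities and code-word lengths below are assumed values, not taken from the experiment), they can be evaluated in MATLAB as follows:
p = [0.5 0.25 0.125 0.125]   % assumed symbol probabilities
l = [1 2 3 3]                % assumed code-word lengths in bits
L = sum(p .* l)              % average code-word length
H = -sum(p .* log2(p))       % source entropy H(S)
eta = H / L                  % coding efficiency (here eta = 1, since L = H = 1.75 bits)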
Huffman Coding
A Huffman code is a variable-length code. The basic principle of Huffman coding is to assign short codes to symbols with high probability and long codes to symbols with low probability. The purpose is to construct a code whose average code-word length L approaches the minimum possible value, i.e., the entropy of the source H(S).
The following are the encoding steps (a small illustrative sketch appears after the list):
1. The source symbols are listed in order of decreasing probability.
2. The two symbols with the lowest probability are assigned a 0 and a 1.
3. The probabilities of the last two symbols are added.
4. The list of symbols is re-sorted with the last two symbols combined.
5. Repeat steps 2, 3 and 4 until there are only two symbols left.
6. Assign the final two combined symbols a 0 and a 1.
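The sketch below applies these steps to a small assumed five-symbol source; it is only illustrative and does not use the built-in huffmandict function described later. Bits are prepended at each merge, so the bits assigned at the final merge become the leading bits of each code word.
% Minimal sketch of the listed steps for an assumed five-symbol source
p      = [0.4 0.2 0.2 0.1 0.1];           % assumed source statistics
codes  = repmat({''}, 1, numel(p));       % one (initially empty) code word per symbol
groups = num2cell(1:numel(p));            % each node starts as a single source symbol
prob   = p;
while numel(prob) > 1
    [prob, order] = sort(prob, 'descend');                     % steps 1 and 4: keep the list sorted
    groups = groups(order);
    for s = groups{end},   codes{s} = ['0' codes{s}]; end      % step 2: bit 0 to the least probable node
    for s = groups{end-1}, codes{s} = ['1' codes{s}]; end      % step 2: bit 1 to the next one
    prob   = [prob(1:end-2), prob(end-1) + prob(end)];         % step 3: add the last two probabilities
    groups = [groups(1:end-2), {[groups{end-1} groups{end}]}]; % combine the two nodes into one
end
codes   % resulting prefix-free code words (exact bit patterns depend on how ties are broken)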
EXAMPLE
S#$% &' (enerating the message sequence
Let us generate a message which consists of one sentence of text: "information theory is interesting". The message will be stored in a variable called msg. In the MATLAB command window, type
msg = 'information theory is interesting'
Our message contains a total of 33 characters, including spaces.
STEP 2: Estimating the source
Before a Huffman coder can properly encode the given message, it requires information about the source alphabet and the source statistics. The source alphabet is the list of distinct symbols in the message sequence. There are altogether 14 distinct symbols in our message:
symb = {'i' 'n' 'f' 'o' 'r' 'm' 'a' 't' ' ' 'h' 'e' 'y' 's' 'g'}
The source statistics, on the other hand, is a list of probabilities corresponding to each symbol in the source alphabet. Since we have 14 distinct symbols, our source statistics should have 14 probabilities:
p = [5/33 4/33 1/33 3/33 3/33 1/33 1/33 4/33 3/33 1/33 3/33 1/33 2/33 1/33]
9bser"e that we ha"e enclosed the elements of symb with curl braces { } instead of the
usual s%uare brac&ets [ ]! :oing so indicates that symb is a cell array$ instead of a
character array! The "ariable symb will become an input to a later function called
huffmandict$ which re%uires that the source alphabet be of cell data tpe!
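Instead of typing the alphabet and the probabilities by hand, they can also be derived from msg directly. The short sketch below is one possible way (note that unique returns the symbols in sorted order rather than in order of first appearance):
[letters, ~, idx] = unique(msg);   % distinct characters of the message, in sorted order
counts = accumarray(idx(:), 1);    % number of occurrences of each character
symb   = num2cell(letters);        % source alphabet as a cell array, as huffmandict expects
p      = counts.' / numel(msg);    % source statistics (relative frequencies)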
STEP 3: Constructing the Huffman codebook
Based on the supplied source alphabet and source statistics, the MATLAB function huffmandict generates the Huffman codebook:
dict = huffmandict(symb, p)
The entire codebook construction process (including the Huffman tree) is done in the background and is invisible to the user. The resulting codebook is stored in the variable dict.
Li&e the "ariable symb pre"iousl$ the "ariable dict is also a cell arra! 'n order to access the
elements in dict$ we again use curl braces { }! To see the Huffman code for the first smbol$ tpe*
dict{1!#}
To "iew the Huffman code for the rest of the smbols$ simpl increment the subscript "alue*
dict{2!#}
dict{3!#}
$
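The huffmandict function can also return the average code-word length of the generated code as a second output (see the Additional Commands section below). One way to check the coding efficiency of the codebook is:
[dict, avglen] = huffmandict(symb, p);   % avglen is the average code-word length L
H   = -sum(p .* log2(p))                 % entropy H(S) of the source
eta = H / avglen                         % coding efficiency, at most 1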
STEP 4: Encoding the message
Now that the Huffman codebook is ready, we may proceed to convert the message sequence into a binary stream:
binstream = huffmanenco(msg, dict)
The encoding process simply consists of replacing each symbol in the message with the corresponding binary word in the codebook.
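Depending on the MATLAB release, huffmanenco may require the message as a cell array rather than a character vector; in that case huffmanenco(num2cell(msg), dict) can be used instead. To see the compression achieved, the length of the encoded bit stream can be compared with the number of bits needed to store the message as plain 8-bit characters:
numel(binstream)   % number of bits in the Huffman-encoded stream
numel(msg) * 8     % bits required with a fixed 8-bit code per character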
STEP 5: Decoding the binary stream
The binary stream will be transmitted over a communications channel to the receiver. At the receiving end, it will be decoded in order to recover the original message sequence. Message decoding is generally performed with the help of a code tree. In MATLAB, we simply type:
msgdeco = huffmandeco(binstream, dict)
A Huffman code is a type of prefix code, so no ambiguity will arise in the decoding process. If our transmission is error-free, the decoded message should match the original message emitted by the source. Also note that the same dictionary (or codebook) is used for both encoding and decoding.
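Because the dictionary symbols here are characters, huffmandeco returns the decoded message as a cell array (see the HUFFMANDECO description below). A simple way to confirm that the round trip is lossless is:
isequal(cell2mat(msgdeco), msg)   % returns 1 (true) if the decoded message matches the original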
Additional Commands
A) HUFFMANDICT
DICT = HUFFMANDICT(SYM, PROB)
Generates a binary Huffman code dictionary using the maximum variance algorithm for the distinct symbols given by the SYM vector. The symbols can be represented as a numeric vector or a single-dimensional alphanumeric cell array. The second input, PROB, represents the probability of occurrence for each of these symbols. SYM and PROB must be of the same length.
DICT = HUFFMANDICT(SYM, PROB, N)
DICT = HUFFMANDICT(SYM, PROB, N, VARIANCE)
[DICT, AVGLEN] = HUFFMANDICT(...)
B) HUFFMANENCO
ENCO = HUFFMANENCO(SIG, DICT)
Encodes the input signal, SIG, based on the code dictionary, DICT. The code dictionary is generated using the HUFFMANDICT function. Each of the symbols appearing in SIG must be present in the code dictionary, DICT. The SIG input can be a numeric vector or a single-dimensional cell array containing alphanumeric values.
C) HUFFMANDECO
HUFFMANDECO(COMP, DICT)
Decodes the numeric Huffman code vector COMP using the code dictionary, DICT. The encoded signal is generated by the HUFFMANENCO function. The code dictionary can be generated using the HUFFMANDICT function. The decoded signal will be a numeric vector if the original signal values are all numeric. If any signal value in DICT is alphanumeric, then the decoded signal will be represented as a single-dimensional cell array.
Procedure:
1. Run MATLAB.
2. Open a new script file.
3. Write the code for the Huffman coding technique.
4. Run the code and obtain the necessary results.
MATLAB script:
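One possible script combining Steps 1 to 5 is sketched below. It assumes the Communications Toolbox functions huffmandict, huffmanenco and huffmandeco are available; the message is wrapped with num2cell so that each character matches the cell-based dictionary:
% Huffman coding of a text message (sketch)
msg  = 'information theory is interesting';
[letters, ~, idx] = unique(msg);              % distinct symbols in the message
symb = num2cell(letters);                     % source alphabet (cell array)
p    = accumarray(idx(:), 1).' / numel(msg);  % source statistics
[dict, avglen] = huffmandict(symb, p);        % Huffman codebook and average code-word length
binstream = huffmanenco(num2cell(msg), dict); % encode the message into a bit stream
msgdeco   = huffmandeco(binstream, dict);     % decode the bit stream
recovered = cell2mat(msgdeco);                % cell array of characters -> character vector
fprintf('Average code-word length : %.4f bits/symbol\n', avglen);
fprintf('Source entropy H(S)      : %.4f bits/symbol\n', -sum(p .* log2(p)));
fprintf('Lossless recovery        : %d\n', isequal(recovered, msg));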
Results:
