The document describes implementing Huffman coding in MATLAB. It explains the theory behind source coding and Huffman coding. The steps include: (1) defining the source alphabet and statistics, (2) using huffmandict to construct a codebook, (3) encoding a message string into a binary stream using the codebook, and (4) decoding the binary stream back into the original message string. The goal is to assign shorter codes to more frequent symbols and achieve an average code length close to the source entropy.
Experimental requirements: PC loaded with MATLAB software
Theory:

Source Coding

Source encoding (or source coding) is a process by the help of which the data generated by a discrete source is efficiently represented. The device that performs the representation is called a source encoder. For a source encoder to be efficient, we require knowledge of the statistics of the source. In particular, if some source symbols are known to be more probable than the others, then we may exploit this feature in the generation of a source code by assigning short code words to frequent source symbols, and long code words to rare source symbols. We refer to such a source code as a variable-length code.

An efficient source encoder should satisfy the following two functional requirements:

1. The code words produced by the encoder are in binary form.
2. The source code is uniquely decodable, so that the original source sequence can be reconstructed perfectly from the encoded binary sequence.

Consider a discrete memoryless source whose output $s_k$ is converted by the source encoder into a block of 0s and 1s, denoted by $b_k$. We assume that the source has an alphabet with $K$ different symbols, and that the $k$th symbol $s_k$ occurs with probability $p_k$, $k = 0, 1, \ldots, K-1$. Let the binary code word assigned to $s_k$ by the encoder have a length $l_k$, measured in bits. We define the average code word length of the source encoder as

$$\bar{L} = \sum_{k=0}^{K-1} p_k l_k$$

$\bar{L}$ represents the average number of bits per source symbol used in the encoding process. Let $L_{\min}$ denote the minimum possible value of $\bar{L}$. We then define the coding efficiency of the source encoder as

$$\eta = \frac{L_{\min}}{\bar{L}}$$

With $\bar{L} \geq L_{\min}$, we clearly have $\eta \leq 1$. The source encoder is said to be efficient when $\eta$ approaches unity.

Source Coding Theorem

Given a discrete memoryless source of entropy $H(S)$, the average code-word length $\bar{L}$ for any distortionless source encoding is bounded as
$$\bar{L} \geq H(S)$$

This is also known as the noiseless coding theorem or Shannon's first theorem. Accordingly, the entropy $H(S)$ represents a fundamental limit on the average number of bits per source symbol necessary to represent a discrete memoryless source. In other words, $\bar{L}$ cannot be made smaller than $H(S)$. Thus with $L_{\min} = H(S)$, we may rewrite the efficiency of a source encoder in terms of the entropy $H(S)$ as

$$\eta = \frac{H(S)}{\bar{L}}$$

Huffman Coding

Huffman code is a variable-length code. The basic principle of Huffman coding is to assign short codes to symbols with high probability, and long codes to symbols with low probability. The purpose is to construct a code whose average code-word length $\bar{L}$ approaches the minimum, i.e. the entropy of the source $H(S)$. The encoding steps are as follows:

1. The source symbols are listed in order of decreasing probability.
2. The two symbols with the lowest probability are assigned a 0 and a 1.
3. The probabilities of the last two symbols are added.
4. The list of symbols is re-sorted with the last two symbols combined.
5. Repeat steps 2, 3, 4 until there are only two symbols left.
6. Assign the final two combined symbols a 0 and a 1.

EXAMPLE

STEP 1: Generating the message sequence

Let us generate a message which consists of one sentence of text: "information theory is interesting". The message will be stored in a variable called msg. In the MATLAB command window, type

    msg = 'information theory is interesting'

Our message contains a total of 33 characters, including spaces.

STEP 2: Estimating the source

Before a Huffman coder can properly encode the given message, it requires information about the source alphabet and the source statistics. The source alphabet is a list of distinct symbols in the message sequence. There are altogether 14 distinct symbols in our message:

    symb = {'i' 'n' 'f' 'o' 'r' 'm' 'a' 't' ' ' 'h' 'e' 'y' 's' 'g'}

The source statistics, on the other hand, is a list of probabilities corresponding to each symbol in the source alphabet.
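The alphabet and the symbol counts can also be tallied from the message itself rather than by hand. The following is a minimal sketch using base MATLAB only; the variable names alphabet and counts are illustrative and not part of the original procedure, and the probabilities are simply taken to be relative frequencies in this one message.

```matlab
% Sketch: estimate the source alphabet and statistics from the message.
% The names alphabet and counts are illustrative assumptions.
msg = 'information theory is interesting';

% Distinct characters, kept in order of first appearance.
alphabet = unique(msg, 'stable');

% Number of occurrences of each distinct character in the message.
counts = arrayfun(@(c) sum(msg == c), alphabet);

% Relative frequencies, used here as the source probabilities.
p = counts / numel(msg);

% One single-character cell per symbol (cell array form of the alphabet).
symb = num2cell(alphabet);

% Print each symbol with its count for inspection.
for k = 1:numel(alphabet)
    fprintf('''%s''  %d/%d\n', alphabet(k), counts(k), numel(msg));
end
```

Because the counts are derived from msg, they always sum to the message length, so p sums to 1 up to rounding.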
Since we have 14 distinct symbols, our source statistics should have 14 probabilities. Each entry is the number of times the symbol occurs in the message divided by the 33 characters in total, listed in the same order as symb:

    p = [5/33 4/33 1/33 3/33 3/33 1/33 1/33 4/33 3/33 1/33 3/33 1/33 2/33 1/33]

Observe that we have enclosed the elements of symb with curly braces { } instead of the usual square brackets [ ]. Doing so indicates that symb is a cell array, instead of a character array. The variable symb will become an input to a later function called huffmandict, which expects a source alphabet of characters to be supplied as a cell array.

STEP 3: Constructing the Huffman codebook

Based on the supplied source alphabet and source statistics, the MATLAB function huffmandict generates the Huffman codebook:

    dict = huffmandict(symb, p)

The entire codebook construction process (including the Huffman tree) is done in the background, and is invisible to the user. The resulting codebook is stored in the variable dict. Like the variable symb previously, the variable dict is also a cell array. In order to access the elements in dict, we again use curly braces { }. To see the Huffman code for the first symbol, type:

    dict{1,:}

To view the Huffman code for the rest of the symbols, simply increment the subscript value:

    dict{2,:}
    dict{3,:}
    ...

STEP 4: Encoding the message

Now that the Huffman codebook is ready, we may proceed to convert the message sequence into a binary stream:

    binstream = huffmanenco(msg, dict)

The encoding process simply consists of replacing each symbol in the message with the corresponding binary word in the codebook.

STEP 5: Decoding the binary stream

The binary stream will be transmitted over a communications channel to the receiver. At the receiving end, it will be decoded in order to recover the original message sequence. Message decoding is generally performed with the help of a code tree. In MATLAB, we simply type:

    msgdeco = huffmandeco(binstream, dict)

Huffman code is a type of prefix code, so no ambiguity will arise in the decoding process.
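Steps 1 to 5 can be collected into one short script that also compares the average code-word length with the source entropy. This is a sketch under stated assumptions, not the prescribed lab solution: it assumes the Communications Toolbox is installed, takes each probability to be the symbol's occurrence count in msg divided by 33, and passes the message as a cell array of single characters (via num2cell) as a cautious choice. The names sig, recovered, H and eta are illustrative.

```matlab
% Sketch: Huffman round trip for the example message.
% Assumes the Communications Toolbox (huffmandict, huffmanenco, huffmandeco).
msg  = 'information theory is interesting';
symb = {'i' 'n' 'f' 'o' 'r' 'm' 'a' 't' ' ' 'h' 'e' 'y' 's' 'g'};
p    = [5 4 1 3 3 1 1 4 3 1 3 1 2 1] / 33;   % occurrence counts / message length

% Build the codebook; avglen is the average code-word length (bits/symbol).
[dict, avglen] = huffmandict(symb, p);

% Encode: pass the message as a cell array of single characters.
sig       = num2cell(msg);
binstream = huffmanenco(sig, dict);

% Decode: the result is a cell array of characters, joined back into a string.
msgdeco   = huffmandeco(binstream, dict);
recovered = [msgdeco{:}];

% Source entropy and coding efficiency eta = H(S) / average length.
H   = -sum(p .* log2(p));
eta = H / avglen;

fprintf('Encoded length : %d bits\n', numel(binstream));
fprintf('Average length : %.4f bits/symbol\n', avglen);
fprintf('Entropy        : %.4f bits/symbol\n', H);
fprintf('Efficiency     : %.4f\n', eta);
fprintf('Round trip OK  : %d\n', isequal(recovered, msg));
```

With these probabilities any Huffman code has an average length of 119/33, about 3.61 bits per symbol, against an entropy of about 3.57 bits per symbol, so the efficiency should come out near 0.99.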
If our transmission is error-free, the decoded message should match the original message emitted by the source. Also note that the same dictionary (or codebook) is used for both encoding and decoding.

Additional Commands

A) HUFFMANDICT

DICT = HUFFMANDICT(SYM, PROB)

Generates a binary Huffman code dictionary using the maximum variance algorithm for the distinct symbols given by the SYM vector. The symbols can be represented as a numeric vector or single-dimensional alphanumeric cell array. The second input, PROB, represents the probability of occurrence for each of these symbols. SYM and PROB must be of same length.

DICT = HUFFMANDICT(SYM, PROB, N)
DICT = HUFFMANDICT(SYM, PROB, N, VARIANCE)
[DICT, AVGLEN] = HUFFMANDICT(...)

B) HUFFMANENCO

ENCO = HUFFMANENCO(SIG, DICT)

Encodes the input signal, SIG, based on the code dictionary, DICT. The code dictionary is generated using the HUFFMANDICT function. Each of the symbols appearing in SIG must be present in the code dictionary, DICT. The SIG input can be a numeric vector or a single-dimensional cell array containing alphanumeric values.

C) HUFFMANDECO

HUFFMANDECO(COMP, DICT)

Decodes the numeric Huffman code vector COMP using the code dictionary, DICT. The encoded signal is generated by the HUFFMANENCO function. The code dictionary can be generated using the HUFFMANDICT function. The decoded signal will be a numeric vector if the original signals are only numeric. If any signal value in DICT is alphanumeric, then the decoded signal will be represented as a single-dimensional cell array.

Procedure:
1. Run MATLAB.
2. Open a new script file.
3. Write the code for the Huffman coding technique.
4. Run the code for execution and obtain the necessary results.

MATLAB script:

Results: