
BACHELOR OF ENGINEERING PROJECT ON

INTEGRATED DATA TRANSFORMATION UNIT

Submitted by

ADISH GULECHHA
NEHA RASKAR
SHAILESH TENDULKAR

IN PARTIAL FULFILLMENT OF THE REQUIREMENT FOR THE DEGREE OF BACHELOR OF ENGINEERING IN ELECTRONICS

UNDER THE GUIDANCE OF PROF. GIRISH GIDAYE

Department of Electronics Engineering

Vidyalankar Institute of Technology Wadala (E) Mumbai 400 037.

University of Mumbai 2011-2012

CERTIFICATE

This is to certify that

ADISH GULECHHA
NEHA RASKAR
SHAILESH TENDULKAR

have successfully completed the project titled

INTEGRATED DATA TRANSFORMATION UNIT

IN PARTIAL FULFILLMENT OF THE REQUIREMENT FOR THE DEGREE OF BACHELOR OF ENGINEERING IN ELECTRONICS

Leading to the Bachelor's Degree in Engineering, 2011-2012


UNDER THE GUIDANCE OF PROF. GIRISH GIDAYE

Signature of Guide

Head of Department

Examiner 1

Examiner 2

Principal

College Seal

ACKNOWLEDGEMENT
First and foremost, we would like to extend our deepest gratitude to our project guide, Professor Girish Gidaye, for giving us the opportunity to work on new areas of digital system design. Without his continued support and interest, this project would not have been the same as presented here.

Our sincerest appreciation goes out to all those who have contributed directly or indirectly to the completion of this project. Particular mention goes to Professor Shrikant Velankar for his guidance, advice, and motivation. His constant encouragement, criticism, and guidance were key to bringing this project to a fruitful completion.

Our sincere appreciation also extends to all our colleagues and others who provided assistance on various occasions. Their views and tips were truly useful. At the same time, the constant encouragement and camaraderie shared among all our friends during our studies has been an enriching experience.

ABSTRACT
This project presents the design of an Integrated Data Transformation Unit on an FPGA to increase the effective bandwidth of a channel. Data is received from multiple data channels from host processors (PCs). The data streams are first multiplexed to form a single data stream, which then undergoes data compression by run-length encoding. Finally, the compressed data stream is encrypted using DES. This data stream is communicated to another FPGA, where the reverse process of decrypting, decompressing, and de-multiplexing is carried out to retrieve the original data channels. For these operations, data is received from / sent to the host processor (PC) on 4, 8, or 16 channels, and the multiplexer/compression/encryption as well as the de-multiplexer/decompression/decryption logic is implemented on two FPGAs. Hence, the concept of a secure, high-bandwidth channel is realized.

CONTENTS

Sr. No.  Title                                        Page No.
1.       List of Figures                              II
         List of Tables                               III
2.       Introduction                                 1
3.       Review of Literature                         3
         3.1 Verilog                                  3
         3.2 FPGA                                     5
         3.3 Multiplexer and De-multiplexer Unit      7
         3.4 Compression and Decompression Unit       9
         3.5 Encryption and Decryption Unit           11
4.       Design Hierarchy                             22
5.       Plan of Work                                 24
6.       Testing and Results                          25
7.       Discussion of Results                        29
8.       Conclusion                                   31
9.       Appendix                                     32
10.      References                                   48

List of Figures
Figure 1   Stages in Verilog                              5
Figure 2   Symbol of 4:1 Multiplexer                      7
Figure 3   Symbol of 1:4 De-multiplexer                   8
Figure 4   Data Compression Model                         9
Figure 5   DES Algorithm Overview                         14
Figure 6   Key Scheduling                                 16
Figure 7   Calculation of f(R,K)                          19
Figure 8   Transmission System Hierarchical Flow          22
Figure 9   Receiver System Hierarchical Flow              23
Figure 10  RTL Schematic of the Transmitter System        26
Figure 11  Waveform for Transmitter System                26
Figure 12  RTL Schematic of the Receiver System           28
Figure 13  Waveform for Receiver System                   28
Figure 14  Huffman Coder Block                            32
Figure 15  Structure of LZSS Algorithm Used               33
Figure 16  RTL Schematic of DES Encryption                36
Figure 17  RTL Schematic of DES Decryption                36
Figure 18  RTL Schematic of Complete Trx System           38
Figure 19  RTL Schematic of Complete Rx System            38


List of Tables
Table 1   Function Table of MUX                           7
Table 2   Function Table of 1:4 DEMUX                     8
Table 3   PC-1 Permuted Choice 1                          15
Table 4   PC-2 Permuted Choice 2                          17
Table 5   IP Initial Permutation Matrix                   18
Table 6   Inverse Initial Permutation Matrix              21
Table 7   Device Utilization Summary of Trx System        25
Table 8   Device Utilization Summary of Rx System         27
Table 9   Timing Report of Transmitter Core               29
Table 10  Timing Report of System Core                    30


2. INTRODUCTION
This project implements the register-transfer-level design of a proprietary high-speed data transformation processor core using the Verilog Hardware Description Language. In addition, it offers enhancements aimed at improving the design's portability to any hardware implementation technology. The main aim of this project has been to develop a core that processes data in a fast and secure manner entirely in hardware. We have made use of the Data Encryption Standard (DES) and standard compression techniques such as LZ77/LZSS, which operate on the input data stream. All of this takes place entirely in hardware, which also increases the security of the system. We present the design of complete transmitter and receiver systems that can be ported to Xilinx Spartan-3/3E family FPGA boards.

The growing possibilities of modern communications call for special means of security, especially on computer networks. Network security is becoming more important as the amount of data exchanged on the Internet increases. Security is required both at the final-user level and at the enterprise level, especially given the massive utilization of personal computers, networks, and the Internet with its global availability. Over time, computational security needs have focused on different features: secrecy or confidentiality, identification, verification, non-repudiation, integrity control, and availability. This has resulted in explosive growth of the field of information hiding. In addition, the rapid growth of publishing and broadcasting technology also calls for alternative solutions for hiding information.


The rapid growth of networking is driving high-bandwidth data transfers all over the world. Today, financial transactions, video surveillance, and e-commerce are performed online. All data transfers are carried over networks such as LANs, WANs, and ATM networks, which are interconnected with routers, switches, bridges, and other network equipment. The growth of virtual private networks (VPNs) and IP security (IPSec) solutions has heightened the demand for secure, high-performance data transfers.


3. REVIEW OF LITERATURE
3.1 VERILOG

Verilog hardware description language is an IEEE standard (IEEE Std. 1364-1995) language used for describing the behaviour and functionality of digital circuits.

In the semiconductor and electronic design industry, Verilog is a hardware description language (HDL) used to model electronic systems. Verilog HDL, not to be confused with VHDL (a competing language), is most commonly used in the design, verification, and implementation of digital logic chips at the register-transfer level of abstraction. It is also used in the verification of analog and mixed-signal circuits.

Hardware description languages such as Verilog differ from software programming languages because they include ways of describing the propagation of time and signal dependencies (sensitivity). At the time of its introduction (1984), Verilog represented a tremendous productivity improvement for circuit designers who were already using graphical schematic capture software and specially written software programs to document and simulate electronic circuits.

Entry of large digital designs at the schematic level is very time consuming and can be exceedingly tedious for circuits with wide data paths that must be repeated for each bit of the data path. Hardware description languages (HDLs) provide a more compact textual description of a design. Verilog is a powerful language and offers several different levels of descriptions. The lowest level is the gate level, in which statements are used to define individual gates.


3.1.1 Structural v/s Behavioral Verilog

Behavioral modeling describes what a design must do, but does not have an obvious mapping to hardware. Behavioral Verilog describes designs at a high level of abstraction; it is the most abstract level of description, resembling C with function calls (called tasks), for and while loops, and so on. To design a processor at the gate level, and to quantify the complexity and timing requirements of the design, structural Verilog is used instead; the project therefore relies mainly on structural Verilog.

At the register-transfer level, more abstract assign statements and always blocks are used. These constructs are more powerful and can describe a design with fewer lines of code, while still providing a clearly defined relationship to actual hardware.

Verilog libraries containing modules that serve as the basic building blocks are used for design in structural Verilog. These library parts include simple logic gates, registers, and memory modules, for example. While the library parts are designed behaviorally, they incorporate timing information that is used in simulations. Using these libraries ensures a uniform timing standard across the design.

Structural Verilog allows designers to describe a digital system as a hierarchical interconnection of modules.

The Verilog code for the project consists mainly of module definitions and their instances, with some behavioral Verilog used for debugging purposes.
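As a brief illustration (our own sketch, not taken from the project code), the same 2:1 multiplexer can be written behaviorally with an always block or structurally as an interconnection of gate primitives:

module mux2to1_beh (input a, b, sel, output reg y);
    // Behavioral description: states what the circuit does.
    always @(*) begin
        if (sel) y = b;      // select input b when sel = 1
        else     y = a;      // otherwise pass input a
    end
endmodule

module mux2to1_str (input a, b, sel, output y);
    // Structural description: instantiates and wires gate primitives.
    wire sel_n, w0, w1;
    not g0 (sel_n, sel);     // invert the select line
    and g1 (w0, a, sel_n);   // a passes when sel = 0
    and g2 (w1, b, sel);     // b passes when sel = 1
    or  g3 (y, w0, w1);      // combine the two gated paths
endmodule

Both modules synthesize to the same logic; the structural version makes the gate-level netlist explicit, which is the style used for most of the project.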


3.2 FPGA

A synthesis tool is used to translate the Verilog into actual hardware, such as logic gates on a custom Application Specific Integrated Circuit (ASIC) or configurable logic blocks (CLBs) on a Field Programmable Gate Array (FPGA).

The various stages of the ASIC/FPGA design flow are shown in Figure 1.

Figure 1 Stages in Verilog

A field-programmable gate array (FPGA) is an integrated circuit designed to be configured by the customer or designer after manufacturing, hence "field-programmable". The FPGA configuration is generally specified using a hardware description language (HDL), similar to that used for an application-specific integrated circuit (ASIC) (circuit diagrams were previously used to specify the configuration, as they were for ASICs, but this is increasingly rare). FPGAs can be used to implement any logical function that an ASIC could perform. The ability to update the functionality after shipping, partial reconfiguration of a portion of the design, and the low non-recurring engineering costs relative to an ASIC design (notwithstanding the generally higher unit cost) offer advantages for many applications.

FPGAs contain programmable logic components called "logic blocks", and a hierarchy of reconfigurable interconnects that allow the blocks to be "wired together", somewhat like many (changeable) logic gates that can be inter-wired in (many) different configurations. Logic blocks can be configured to perform complex combinational functions, or merely simple logic gates like AND and XOR. In most FPGAs, the logic blocks also include memory elements, which may be simple flip-flops or more complete blocks of memory.


3.3 MULTIPLEXER AND DEMULTIPLEXER UNIT

3.3.1 MULTIPLEXER
A multiplexer is a combinatorial circuit that is given a certain number (usually a power of two) of data inputs, say 2^n, and n address inputs used as a binary number to select one of the data inputs. The multiplexer has a single output, which has the same value as the selected data input.

Depending upon the digital code applied at the select inputs, one of the 2^n data inputs is selected and transmitted to a single output channel.

At face value a multiplexer is a logic circuit whose function is to select one data line from among many. For this reason, many people refer to multiplexers as data selectors.

Figure 2 Symbol of 4:1 Multiplexer

       Input         Output
    S1      S0          Y
    0       0           D0
    0       1           D1
    1       0           D2
    1       1           D3

Table 1 Function Table of MUX
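A minimal Verilog sketch corresponding to Table 1 (our illustration; the port names follow Table 1 and the WIDTH parameter is our addition, while the project's own mux4to1 module in Appendix B applies the same idea to 64-bit channel data):

module mux4to1 #(parameter WIDTH = 1) (
    input  [WIDTH-1:0]     D0, D1, D2, D3,   // data inputs of Table 1
    input  [1:0]           Sel,              // select lines {S1, S0}
    output reg [WIDTH-1:0] Y                 // selected output
);
    always @(*) begin
        case (Sel)
            2'b00: Y = D0;
            2'b01: Y = D1;
            2'b10: Y = D2;
            2'b11: Y = D3;
        endcase
    end
endmodule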


3.3.2 DEMULTIPLEXER
The de-multiplexer is the inverse of the multiplexer, in that it takes a single data input and n address inputs. It has 2^n outputs. The address inputs determine which output is going to have the same value as the data input. The other outputs will have the value 0.

Figure 3 Symbol of 1:4 De-multiplexer

          Input                    Output
    E     S0     S1        D0     D1     D2     D3
    E     0      0         E      0      0      0
    E     1      0         0      E      0      0
    E     0      1         0      0      E      0
    E     1      1         0      0      0      E

Table 2 Function Table of 1:4 DEMUX
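A corresponding sketch of the 1:4 de-multiplexer of Table 2 (again our illustration, not the project's source code):

module demux1to4 (
    input        E,              // data / enable input
    input  [1:0] Sel,            // select lines {S1, S0}
    output reg   D0, D1, D2, D3  // only the selected output follows E
);
    always @(*) begin
        {D3, D2, D1, D0} = 4'b0000;   // unselected outputs stay 0, as in Table 2
        case (Sel)
            2'b00: D0 = E;
            2'b01: D1 = E;
            2'b10: D2 = E;
            2'b11: D3 = E;
        endcase
    end
endmodule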


3.4 COMPRESSION AND DECOMPRESSION UNIT

Data compression is a technique for reducing redundancy in data representation in order to decrease data storage requirements and hence communication costs. Reducing the storage requirement is equivalent to increasing the capacity of the storage medium and hence the communication bandwidth. Thus, the development of efficient compression techniques will continue to be a design challenge for future communication systems and advanced multimedia applications.

Data is represented as a combination of information and redundancy. Information is the portion of data that must be preserved permanently in its original form in order to correctly interpret the meaning or purpose of the data. Redundancy is the portion of data that can be removed when it is not needed and reinserted to interpret the data when needed. Most often, the redundancy is reinserted in order to regenerate the data in its original form. A technique to reduce the redundancy of data is defined as data compression. The redundancy in data representation is reduced in such a way that it can subsequently be reinserted to recover the original data, a process called decompression of the data.

Figure 4 Data Compression Model

When we speak of a compression technique or a compression algorithm, we are actually referring to two algorithms: the first takes an input X and generates a representation Xc that requires fewer bits; the second is a reconstruction algorithm that operates on the compressed representation Xc to generate the reconstruction Y.

3.4.1 Types of Data Compression Models
There are two types of data compression models: lossy and lossless. Lossy data compression works on the assumption that the data need not be stored perfectly. Text files (especially files containing computer programs) are stored using lossless techniques, since losing even a single character can, in the worst case, make the text dangerously misleading. Lossless compression ensures that the original information can be exactly reproduced from the compressed data.

3.4.2 Advantages of Data Compression
- It reduces data storage requirements.
- The audience can experience rich-quality signals for audio-visual data representation.
- Data security can be greatly enhanced by encrypting the decoding parameters and transmitting them separately from the compressed database files to restrict access to proprietary information.
- The rate of input-output operations in a computing device can be greatly increased due to the shorter representation of data.
- Data compression reduces the cost of backup and recovery of data in computer systems by storing the backup of large database files in compressed form.

The technique used in the design of high-speed data compression and decompression processor cores is based on a combination of the LZSS compression algorithm and Huffman coding. The source data to be compressed is first processed by the LZSS compression technique, since the algorithm is not restricted in what type of data it can process and requires no a priori knowledge of the source. An LZSS codeword is then generated whenever a match between the source data and the dictionary elements is detected, with the encoded data represented as a position-length pair codeword.
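As a small illustration of the position-length idea (our example, not one taken from the report): in the input stream ABCABCABCD, once the first three symbols ABC have been shifted into the dictionary, the next six symbols ABCABC match a phrase starting three positions back and can be replaced by a single codeword of the form (position = 3, length = 6); the final D, having no match, is emitted as an uncoded literal symbol.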

3.5 ENCRYPTION AND DECRYPTION UNIT

Fast computers and advances in telecommunications have made high-speed, global, widespread computer networks possible, in particular the Internet, which is an open network. This has increased access to databases, such as the open World Wide Web. To decrease communication cost and to be user friendly, private databases containing medical records, proprietary information, tax information, etc., are often accessible via the Internet using a low-security password scheme.

The privacy of data is obviously vulnerable during communication, and data in transit can be modified, particularly in open networks. Because of the lack of secure computers, such concerns extend to stored data. Data communicated and/or accessible over such networks includes bank and other financial transactions, love letters, medical records, proprietary information, etc., whose privacy must be protected. The authenticity of (the data in) contracts, databases, electronic commerce, etc. must be protected against modification by outsiders or by one of the parties involved in the transaction.

Modern cryptography provides the means to address these issues. Cryptography includes two basic components: encryption algorithms and keys. If the sender and recipient use the same key, the scheme is known as symmetric or private-key cryptography, which is well suited to long data streams. Such a system is difficult to use in practice because both the sender and receiver must know the key, and the key must be sent over a secure channel from sender to recipient. This raises the question: if a secure channel already exists, why not transmit the data over that same channel?

On the other hand, if different keys are used by the sender and recipient, the scheme is known as asymmetric or public-key cryptography. The key used for encryption is called the public key and the key used for decryption is called the private key. This technique is used for short data streams and requires more time to encrypt the data.

3.5.1 Techniques of Cryptography There are two techniques used for data encryption and decryption, which are:

A] Symmetric Cryptography
If the sender and recipient use the same key, the scheme is known as symmetric or private-key cryptography. It is well suited to long data streams. Such a system is difficult to use in practice because both the sender and receiver must know the key, and the key must be sent over a secure channel from sender to recipient.

There are two methods that are used in symmetric key cryptography: block and stream. The block method divides a large data set into blocks (based on predefined size or the key size), encrypts each block separately and finally combines blocks to produce encrypted data. The stream method encrypts the data as a stream of bits without separating the data into blocks. The stream of bits from the data is encrypted sequentially using some of the results from the previous bit until all the bits in the data are encrypted as a whole.

B] Asymmetric Cryptography
If the sender and recipient use different keys, the scheme is known as asymmetric or public-key cryptography. The key used for encryption is called the public key and the key used for decryption is called the private key. This technique is used for short data streams and requires more time to encrypt the data. Asymmetric encryption techniques are almost 1000 times slower than symmetric techniques because they require more computational processing power. To get the benefits of both methods, a hybrid technique is usually used: asymmetric encryption is used to exchange the secret key, and symmetric encryption is then used to transfer data between sender and receiver.


3.5.2 DES ALGORITHM
The Data Encryption Standard (DES) is a cryptographic standard that was developed in the early 1970s for protecting sensitive information and was adopted as an American federal standard by the National Bureau of Standards (NBS) in 1977. DES is a block cipher, which means that during the encryption process the plaintext is broken into fixed-length blocks and each block is encrypted one at a time. Basically, it takes a 64-bit plaintext input and a 64-bit key (only 56 bits are used for the transformation; the remaining 8 bits are used for parity checking) and produces a 64-bit ciphertext, which can be decrypted again to recover the message using the same key.

Additionally, we must highlight that there are four standardized modes of operation of DES:
- ECB (Electronic Codebook mode)
- CBC (Cipher Block Chaining mode)
- CFB (Cipher Feedback mode)
- OFB (Output Feedback mode)

The general depiction of the DES encryption algorithm, shown in Figure 5, consists of an initial permutation of the 64-bit plaintext, followed by 16 rounds, where each round consists of permutation and substitution operations on the text bits and the input key bits, and finally an inverse initial permutation to obtain the 64-bit ciphertext.


Figure 5 DES Algorithm Overview


3.5.3 Steps for Algorithm

Step 1: Create 16 sub-keys, each of which is 48-bits long

The 64-bit key is permuted according to the following table, PC-1. Since the first entry in the table is "57", this means that the 57th bit of the original key K becomes the first bit of the permuted key K+. The 49th bit of the original key becomes the second bit of the permuted key. The 4th bit of the original key is the last bit of the permuted key. Note only 56 bits of the original key appear in the permuted key.

Table 3 PC-1 Permuted Choice 1

Next, split this key into left and right halves, C0 and D0, where each half has 28 bits. From the permuted key K+, we get

C0 = 0011001111000011001100111100
D0 = 0011001111000011001100110011


With C0 and D0 defined, we now create sixteen blocks Cn and Dn, 1 <= n <= 16. Each pair of blocks Cn and Dn is formed from the previous pair Cn-1 and Dn-1, respectively, for n = 1, 2, ..., 16, using the schedule of "left shifts" of the previous block. To do a left shift, move each bit one place to the left, except for the first bit, which is cycled to the end of the block.

This means, for example, C3 and D3 are obtained from C2 and D2, respectively, by two left shifts, and C16 and D16 are obtained from C15 and D15, respectively, by one left shift. In all cases, by a single left shift is meant a rotation of the bits one place to the left, so that after one left shift the bits in the 28 positions are the bits that were previously in positions 2, 3,..., 28, 1.

Figure 6 Key Scheduling

We now form the keys Kn, for 1<=n<=16, by applying the following permutation table to each of the concatenated pairs Cn Dn. Each pair has 56 bits, but PC-2 only uses 48 of these.

Table 4 PC-2 Permuted Choice 2

Therefore, the first bit of Kn is the 14th bit of CnDn, the second bit the 17th, and so on, ending with the 48th bit of Kn being the 32nd bit of CnDn.
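A minimal Verilog sketch of the per-round rotation in this key schedule (our illustration; the module name is our own, the PC-1/PC-2 wirings and the 16-round iteration are omitted, and the shift amounts follow the standard DES schedule of Figure 6, in which rounds 1, 2, 9 and 16 use a single left shift and all other rounds use two):

module key_rotate (
    input  [3:0]  round,         // round number, 1..16
    input  [27:0] C_in, D_in,    // C(n-1) and D(n-1) halves
    output [27:0] C_out, D_out   // C(n) and D(n) after rotation
);
    // Rounds 1, 2, 9 and 16 rotate by one position, all others by two.
    wire one_shift = (round == 4'd1) || (round == 4'd2) ||
                     (round == 4'd9) || (round == 4'd16);

    // Rotate left, wrapping the most significant bit(s) to the right end.
    assign C_out = one_shift ? {C_in[26:0], C_in[27]}
                             : {C_in[25:0], C_in[27:26]};
    assign D_out = one_shift ? {D_in[26:0], D_in[27]}
                             : {D_in[25:0], D_in[27:26]};
endmodule

Each subkey Kn is then obtained by applying the PC-2 selection of Table 4 to the concatenated pair CnDn.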


Step 2: Encode each 64-bit block of data There is an initial permutation IP of the 64 bits of the message data M. This rearranges the bits according to the following table, where the entries in the table show the new arrangement of the bits from their initial order.

Table 5 IP Initial Permutation Matrix

Here the 58th bit of M is "1", which becomes the first bit of IP. The 50th bit of M is "1", which becomes the second bit of IP. The 7th bit of M is "0", which becomes the last bit of IP.

Next divide the permuted block IP into a left half L0 of 32 bits, and a right half R0 of 32 bits.

We now proceed through 16 iterations, for 1 <= n <= 16, using a function f which operates on two blocks, a data block of 32 bits and a key Kn of 48 bits, to produce a block of 32 bits. Let + denote XOR addition (bit-by-bit addition modulo 2).

Then for n going from 1 to 16 we calculate

Ln = Rn-1
Rn = Ln-1 + f(Rn-1, Kn)

This results in a final block, for n = 16, of L16 R16. That is, in each iteration, we take the right 32 bits of the previous result and make them the left 32 bits of the current step.

For the right 32 bits in the current step, we XOR the left 32 bits of the previous step with the calculation f; for example,

R1 = L0 + f(R0, K1)

To calculate f, we first expand each block Rn-1 from 32 bits to 48 bits. This is done by using a selection table that repeats some of the bits in Rn-1. We will call the use of this selection table the function E. Thus E(Rn-1) takes a 32-bit input block and produces a 48-bit output block. The first three bits of E(Rn-1) are the bits in positions 32, 1 and 2 of Rn-1, while the last 2 bits of E(Rn-1) are the bits in positions 32 and 1. (Note that each block of 4 original bits has been expanded to a block of 6 output bits.)
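Because E is a fixed bit selection, it reduces to pure wiring in hardware. The sketch below (our illustration, using the standard DES E bit-selection table and DES's left-to-right bit numbering) expresses it as a single concatenation:

module expansion_E (
    input  [1:32] R,    // right half-block R(n-1), bit 1 leftmost as in the text
    output [1:48] E     // expanded 48-bit block E(R(n-1))
);
    // Each 4-bit group of R is widened to 6 bits by borrowing the
    // neighbouring edge bits; the first and last groups wrap around.
    assign E = { R[32], R[1],  R[2],  R[3],  R[4],  R[5],
                 R[4],  R[5],  R[6],  R[7],  R[8],  R[9],
                 R[8],  R[9],  R[10], R[11], R[12], R[13],
                 R[12], R[13], R[14], R[15], R[16], R[17],
                 R[16], R[17], R[18], R[19], R[20], R[21],
                 R[20], R[21], R[22], R[23], R[24], R[25],
                 R[24], R[25], R[26], R[27], R[28], R[29],
                 R[28], R[29], R[30], R[31], R[32], R[1] };
endmodule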

Next in the f calculation, we XOR the output E(Rn-1) with the key Kn: Kn + E(Rn-1).

Figure 7 Calculation of f(R,K)

To this point we have expanded Rn-1 from 32 bits to 48 bits, using the selection table, and XORed the result with the key Kn . We now have 48 bits, or eight groups of six bits. We now do something strange with each group of six bits: we use them as addresses in tables called "S boxes". Each group of six bits will give us an address in a different S box. Located at that address will be a 4 bit number. This 4 bit number will replace the original 6 bits.

The net result is that the eight groups of 6 bits are transformed into eight groups of 4 bits (the 4-bit outputs from the S boxes) for 32 bits total.

Write the previous result, which is 48 bits, in the form

Kn + E(Rn-1) = B1B2B3B4B5B6B7B8

where each Bi is a group of six bits. We now calculate

S1(B1)S2(B2)S3(B3)S4(B4)S5(B5)S6(B6)S7(B7)S8(B8)

where Si(Bi) refers to the output of the i-th S-box.

To repeat, each of the functions S1, S2,..., S8, takes a 6-bit block as input and yields a 4-bit block as output.

The final stage in the calculation of f is to do a permutation P of the S-box output to obtain the final value of f: f = P(S1(B1)S2(B2)...S8(B8))

P yields a 32-bit output from a 32-bit input by permuting the bits of the input block.

We then calculate R2 = L1 + f(R1, K2), and so on for 16 rounds. At the end of the sixteenth round we have the blocks L16 and R16. We then reverse the order of the two blocks into the 64-bit block R16L16 and apply a final permutation IP^-1 as defined by the following table:


Table 6 Inverse Initial Permutation Matrix

Decryption is simply the inverse of encryption, following the same steps as above, but reversing the order in which the sub-keys are applied.
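In hardware, each round of this structure amounts to a small amount of combinational logic. A minimal sketch of one round follows (our illustration; f_function here is an assumed submodule implementing the E-expansion, key XOR, S-box substitution, and permutation P described above, not the project's actual module name):

module des_round (
    input  [1:32] L_in, R_in,    // L(n-1) and R(n-1)
    input  [1:48] K,             // round subkey K(n)
    output [1:32] L_out, R_out   // L(n) and R(n)
);
    wire [1:32] f_out;

    // f(R(n-1), K(n)) = P(S-boxes(K(n) + E(R(n-1)))), as described in the text.
    f_function f (.R(R_in), .K(K), .F(f_out));

    assign L_out = R_in;           // Ln = Rn-1
    assign R_out = L_in ^ f_out;   // Rn = Ln-1 + f(Rn-1, Kn)
endmodule

Chaining sixteen such rounds, with the subkeys supplied in reverse order for decryption, and adding the IP and IP^-1 wirings gives the overall structure of Figure 5.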


4. DESIGN HIERARCHY
4.1 Transmission System

Figure 8 Transmission System Hierarchical Flow


4.2 Receiver System

Figure 9 Receiver System Hierarchical Flow


5. PLAN OF WORK

August:     Formation of final block diagram
September:  Study and selection of algorithms for compression core
October:    Study of algorithms for encryption core
November:   Selection and study of hardware description language
January:    Coding of MUX and DEMUX units in Verilog; study of DES encryption algorithm
February:   Coding and implementation of DES encryption and decryption unit
March:      Decision not to implement the compression core because of the increase in complexity; final system connections and structuring
April:      Implementation of the final system


6. TESTING AND RESULTS


6.1 Transmission System

6.1.1 Device Utilization Summary

DEVICE UTILIZATION SUMMARY

Logic Utilization               Used    Available    Utilization
Number of slices                 559         3584            15%
Number of slice Flip Flops       487         7168             6%
Number of 4 input LUTs           989         7168            13%
Number of bonded IOBs             17          141            12%
Number of BRAMs                    4           16            25%
Number of GCLKs                    1            8            12%

Table 7 Device Utilization Summary of Trx System


Figure 10 RTL Schematic of the Transmitter System

Figure 11 Waveform for Transmitter System

6.2 Receiver System

6.2.1 Device Utilization Summary

DEVICE UTILIZATION SUMMARY

Logic Utilization               Used    Available    Utilization
Number of slices                 772         3584            21%
Number of slice Flip Flops       743         7168            10%
Number of 4 input LUTs          1185         7168            16%
Number of bonded IOBs             20          141            14%
Number of BRAMs                    4           16            25%
Number of GCLKs                    5            8            62%

Table 8 Device Utilization Summary of Rx System


Figure 12 RTL Schematic of the Receiver System

Figure 13 Waveform for Receiver System

7. DISCUSSION OF RESULTS
7.1 Timing Report of Transmitter Core

Delay:        9.534ns (Levels of Logic = 4)
Source:       Sel<0> (PAD)
Destination:  sample<58> (PAD)
Data Path:    Sel<0> to sample<58>

Cell: in->out     Fanout    Gate Delay    Net Delay
IBUF:I->O            128         0.715        2.338
LUT3:I0->O             1         0.479        0.000
MUXF5:I1->O            4         0.314        0.779
OBUF:I->O                        4.909

Total             9.534ns (6.417ns logic, 3.117ns route)
                  (67.3% logic, 32.7% route)

Table 9 Timing Report of Transmitter Core


7.2 Timing Report of System core

Offset:        10.138ns (Levels of Logic = 4)
Source:        T/bitcounter_5 (FF)
Destination:   SERIAL_CIPHER_TEXT (PAD)
Source Clock:  CLK rising
Data Path:     T/bitcounter_5 to SERIAL_CIPHER_TEXT

Cell: in->out     Fanout    Gate Delay    Net Delay
FDRE:C->Q              5         0.626        1.078
LUT4:I0->O             1         0.479        0.704
LUT4:I3->O             1         0.479        0.704
LUT4:I3->O             1         0.479        0.681
OBUF:I->O                        4.909

Total            10.138ns (6.972ns logic, 3.166ns route)
                  (68.8% logic, 31.2% route)

Table 10 Timing Report of System Core


8. CONCLUSION
A proprietary high-speed encryption and decryption core design has been analyzed. It is observed from the timing reports that the computations of the system core occur at a very high speed compared with existing software prototypes.

Since the data is processed and transferred entirely on the FPGAs, it travels over a hardware channel and is therefore secured. Hence, owing to the hardware implementation of such a system, data transfer is both secure and fast.

The first limitation is that the compression core was not implemented in hardware, owing to the complexity of implementing the algorithm in an HDL. The second limitation is that data is sent serially to and from a PC over a UART, which slows the system down.

A complete system core and its associated test firmware are also developed that form the hardware evaluation platform. Using this evaluation platform, functionality of the design running on real hardware is proven.


9. Appendix A

CORE DESIGN
Design of Compression Unit

The main hardware module of the compression unit consists of three hierarchical blocks, which are the LZSS coder, fixed Huffman coder and data packer. All modules are synchronously clocked. The LZSS coder performs the LZSS encoding of the source data symbol, while the fixed Huffman coder reencodes the length of LZSS codeword to achieve better compression ratio. Finally, the data packer packs the unary codes from the fixed Huffman coder into a fixed-length output packet and sends it to the interfacing block.

Figure 14 Huffman Coder Block

This suggests that Huffman coding be employed to further encode the length portion of the LZSS codeword in order to achieve higher compression savings. On the decompression side, the whole process is performed in reverse order.


1. LZSS CODER

The LZSS algorithm, however, involves a computationally intensive matching process during the compression stage because each input phrase has to be compared with every possible phrase in the dictionary. Furthermore, the dictionary updating process involves variable-length shifting of the input source into the dictionary, since the length of the longest matched phrase changes with time. If this operation is done using a variable-length shifter, a considerable amount of hardware resources will be consumed, which can lead to higher implementation cost because a bigger (and correspondingly more expensive) programmable logic device or ASIC silicon is needed. The design tackles these problems through a systolic array architecture for the LZSS compression dictionary, in which each input datum is compared with every dictionary element simultaneously, while shifting of the input data is done one symbol at a time through the use of a fixed-length shifter.

Figure 15 Structure of LZSS Algorithm Used

In order to achieve sufficiently high processing speed to obtain data-independent throughput, and to use a fixed-length shifter to reduce hardware resource utilization, the LZSS coder design employs a systolic array architecture. The hardware architecture consists of four main components, namely the dictionary, reduction tree, delay tree, and codeword generator sub-modules.

2. HUFFMAN CODER

The Huffman coding technique also presents certain design challenges. Conventional Huffman coding requires a priori knowledge of the source data distribution characteristics in order to construct an optimal encoding table for better performance.

However, in many real-life applications, it is difficult to determine the characteristics of source data because its probability distribution normally changes with time. Even when the source distribution statistics are available, different sources have different distribution characteristics. The encoding table must then be generated for each type of source data. Furthermore, the generated table must be transmitted along with the encoded data so that decompression can be performed correctly. This would both reduce the compression saving and increase the processing time of the hardware. The design tackles these problems by employing a predefined Huffman encoding table for both compression and decompression cores.

The reason for this is two-fold: the first is to simplify generation of the encoding table, since adaptively building the table for different source data is no longer required; the second is to eliminate the need to transmit the encoding table to the decompression side, so that the inefficient resource utilization and degradation of compression savings caused by transmitting the table can be avoided.


DES ENCRYPTION AND DECRYPTION SYSTEM CORE


Figure 16 RTL Schematic of DES - ENCRYPTION

Figure 17 RTL Schematic of DES - DECRYPTION


COMPLETE SYSTEM CORE


Figure 18 RTL Schematic of Complete Trx System

Figure 19 RTL Schematic of Complete Rx System


APPENDIX B
TRANSMITTER SYSTEM CORE VERILOG CODE

This appendix presents the Verilog source code of the transmitter system core and all its sub-modules. The design hierarchy is presented in Section 4 (Design Hierarchy). The Verilog source code, starting from the top-level module, is presented here; however, the complete code is not given in the report.

Module Name: Transmitter_System_Top

module Transmitter_System_Top (CLK, RST, CHIP_SELECT_BAR, ADDRESS,
                               SERIAL_CIPHER_TEXT, Sel, transmit, waddress,
                               we, cs_ram_rec, cs_ram_tx, ENA, DIN, RD, DR);

// Input signals
input ENA, DIN, RD;
input CLK;
input RST;
input cs_ram_rec, cs_ram_tx;
input CHIP_SELECT_BAR;
input ADDRESS;
input transmit, we;
input [3:0] waddress;
input [1:0] Sel;

// Output signals
output DR;
output SERIAL_CIPHER_TEXT;

// Internal wires
wire CLK;
wire RST;
wire CHIP_SELECT_BAR;
wire ADDRESS;
wire [64:1] CIPHER_TEXT_RAM;
wire [1:0]  Sel;
wire [64:1] I3, I2, I1, I0;
wire [64:1] O;
wire [64:1] inter_mux;
wire [64:1] to_tx;
wire [64:1] to_ram;

// Receiver module
RX Receiver (
    .CLK(CLK), .RST(RST), .DIN(DIN), .ENA(ENA),
    .RD(RD), .DR(DR), .DOUT(to_ram)
);

// Receiver RAM module
ram1 RAM_REC (
    .CLK(CLK), .waddress(waddress), .data_in(to_ram),
    .we(we), .cs(cs_ram_rec), .data_out(inter_mux)
);

// MUX module
mux4to1 MUX (
    .I0(inter_mux), .I1(inter_mux), .I2(inter_mux), .I3(inter_mux),
    .Sel(Sel), .Y(O)
);

// Encryption module
Des_Top ENCRYPT (
    .CLK(CLK), .RST(RST), .CHIP_SELECT_BAR(CHIP_SELECT_BAR),
    .ADDRESS(ADDRESS), .PLAIN_TEXT(O), .CIPHER_TEXT(CIPHER_TEXT_RAM)
);

// Transmitter RAM module
ram1 RAM_TX (
    .CLK(CLK), .waddress(waddress), .data_in(CIPHER_TEXT_RAM),
    .we(we), .cs(cs_ram_tx), .data_out(to_tx)
);

// Transmitter module
Transmitter T (
    .CLK(CLK), .RST(RST), .transmit(transmit),
    .data(to_tx), .TxD(SERIAL_CIPHER_TEXT)
);

endmodule


APPENDIX C

DATASHEETS


10. REFERENCES
[1] J. Gailly, GZIP: The Data Compression Program, 1993. ftp://ftp.gnu.org/gnu/GZIP/GZIP-1.2.4.tar.gz
[2] T. A. Welch, "A Technique for High-Performance Data Compression," IEEE Computer, vol. 17, pp. 8-19, 1984.
[3] J. Ziv and A. Lempel, "Compression of Individual Sequences via Variable-Rate Coding," IEEE Transactions on Information Theory, 1978.
[4] S. Leinen, Long-Term Traffic Statistics, 2001. http://www.cs.columbia.edu/hgs/internet/traffic.html
[5] L. Deutsch, DEFLATE Compressed Data Format Specification Version 1.3, 1996. ftp://ftp.uu.net/pub/archiving/zip/doc/
[6] D. Huffman, "A Method for the Construction of Minimum-Redundancy Codes," Proceedings of the Institute of Radio Engineers, vol. 40, pp. 1098-1101, September 1952.
[7] J. Ziv and A. Lempel, "A Universal Algorithm for Sequential Data Compression," IEEE Transactions on Information Theory, vol. 23, no. 3, pp. 337-343, 1977.
[8] Z. Li and S. Hauck, "Configuration Compression for Virtex FPGAs," Field-Programmable Custom Computing Machines, pp. 147-159, 2001.
[9] J. Storer and T. Szymanski, "Data Compression via Textual Substitution," Journal of the ACM, vol. 29, no. 4, pp. 928-951, 1982.
[10] N. Larsson, "Extended Application of Suffix Trees to Data Compression," Proceedings of the Conference on Data Compression, p. 190, 1996.
[11] T. C. Bell and D. Kulp, "Longest-Match String Searching for Ziv-Lempel Compression," Software - Practice and Experience, vol. 23, no. 7, pp. 757-771, 1993.
[12] S. Rigler, FPGA-Based Lossless Data Compression Using GNU Zip.

