N. Kumaresan
Department of ECE, Anna University, Regional Campus Coimbatore, Tamil Nadu, India
kumaresanauc@gmail.com

S. Kodeeswaran
Department of ECE, Anna University, Regional Campus Coimbatore, Tamil Nadu, India
skodeeswaranme@gmail.com
Abstract: Traditional cryptographic algorithms are developed on software platforms to provide information security. Some processors
implement one class of cryptographic algorithm (over either a prime field or a binary extension field) at chip level with optimal performance.
The objective of this work is to design and implement both symmetric-key and public-key cryptographic algorithms at chip level, with an
improved architecture and good performance. The Crypto-processor is designed with unified field instructions, which gives it a different
processor architecture and improves system performance. The proposed high-speed Montgomery modular multiplication and high-radix
Montgomery multiplication algorithms for pairing computation support the public-key algorithm. The design has been developed in Verilog HDL,
verified using ModelSim-Altera 6.4a, and synthesized with the Xilinx 9.1 Integrated Synthesis Environment (ISE) tool.
__________________________________________________*****_________________________________________________
IJRITCC | October 2017, Available @ http://www.ijritcc.org
_______________________________________________________________________________________
International Journal on Recent and Innovation Trends in Computing and Communication ISSN: 2321-8169
Volume: 5 Issue: 10 157 – 161
_______________________________________________________________________________________________
The Crypto-processor is driven by a clock and a reset. The clock defines the operating speed of the processor, and the reset signal returns the system to its initial state at any time. The processor has a set of opcodes, each of which specifies the function the processor is to perform. When an opcode is set to one of the defined instructions, the instruction decoder fetches the source address and destination address for that operation. This information is passed to the data memory and the ALU. The data memory supplies the operands to the ALU according to the source address; the ALU performs the corresponding operation, and the result is stored in the temporary memory as well as in the data memory. Once the sequence of operations is complete, the output is available to the user in the memory.

The ALU supports general-purpose instructions and secure instructions. The general-purpose instructions are the usual operations of a typical processor. The secure instructions are dual-field (unified field) operations, which cover both prime field and binary extension field arithmetic. The instruction set is listed in Table I. It consists of prime field instructions for modular addition, subtraction, and high-radix modular multiplication; a binary extension field instruction for Galois field multiplication; and general-purpose instructions such as arithmetic, logic, shift, and rotate.

TABLE I. INSTRUCTION SET

Sl. No.  Instruction                              Operation
1.       Addition                                 A + B mod P (prime field)
2.       Subtraction                              A - B mod P (prime field)
3.       Multiplication                           A * B mod P (prime field)
4.       High Radix Montgomery Multiplication     (AB + CD) R^-1 (prime field)
5.       Galois field Multiplication              A * B (binary extension field GF(2^m))
6.       Arithmetic and logical, shift & rotate,  +, -, *, /, &, |, ^, >>, <<
         LOAD, STORE (general instruction set)

[...] mathematical approach of the algorithm, and straightforward designs such as the memory and decoder are developed using a Hardware Description Language (HDL). Once the design is written in HDL, it can be verified using simulation tools. The processor has been developed in Verilog HDL. Synthesis also produces a device utilization summary and a timing summary. Finally, the design can be implemented on any FPGA.

The proposed design is explained with the following flow diagrams. Each algorithm consists of a sequence of steps; Figure 2 shows the flow diagram of the high-radix Montgomery multiplication algorithm.

Step 1: Choose A, B, C, D and P1, P2; these values must be large prime numbers. N represents the input bit width.
Step 2: The number of iterations needed to compute the intermediate result depends on the input width.
Step 3: The intermediate results require a modulo operation and simple arithmetic operations such as addition and multiplication.
Step 4: The final result is the sum of the intermediate remainder and a simple expression computed using addition and multiplication.

START
INPUT: A, B, C, D, P1, P2, R
OUTPUT: (AB + CD) R^-1
Count <= 6
Q[count] = S[count] mod 2^R
S[count] = (S[count] >> R) + (Q[count] * P2) + (A[count] * B) + (C[count] * D)
S[8] = S[7] + (Q[6] * P1)
Out = S[8]
STOP
Figure 2. Flow diagram of high radix Montgomery multiplication.
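As a software cross-check for the (AB + CD) R^-1 operation, the following is a minimal Python sketch of a generic digit-serial (radix 2^w) Montgomery multiply-accumulate. It is a textbook formulation and not the datapath of Figure 2: it uses a single odd modulus p rather than P1 and P2, and the function name `montgomery_mul_add`, the digit width `w`, and the digit count `n` are assumptions made for illustration.

```python
def montgomery_mul_add(a, b, c, d, p, w=16, n=4):
    """Return (a*b + c*d) * R^-1 mod p, where R = 2**(w*n).

    Generic radix-2**w Montgomery sketch (an assumption, not the paper's
    exact flow). p must be odd; a and c must fit in n digits of w bits.
    """
    mask = (1 << w) - 1
    # Precompute -p^-1 mod 2^w (pow with exponent -1 needs Python 3.8+).
    p_neg_inv = (-pow(p, -1, 1 << w)) & mask
    s = 0
    for i in range(n):
        a_i = (a >> (w * i)) & mask          # i-th radix-2^w digit of a
        c_i = (c >> (w * i)) & mask          # i-th radix-2^w digit of c
        s += a_i * b + c_i * d               # accumulate partial products
        q = ((s & mask) * p_neg_inv) & mask  # quotient digit: makes s + q*p divisible by 2^w
        s = (s + q * p) >> w                 # exact shift by one digit
    return s % p                             # final reduction into [0, p)


if __name__ == "__main__":
    p = (1 << 61) - 1                        # example prime modulus (assumption)
    a, b, c, d = 123456789, 987654321, 555555555, 222222222
    got = montgomery_mul_add(a, b, c, d, p)
    want = (a * b + c * d) * pow(1 << 64, -1, p) % p
    print(got == want)
```

Comparing the result against the direct expression `(a*b + c*d) * R^-1 mod p`, as the usage lines do, is one way to generate expected values for a hardware test bench.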
START
INPUT: A, B, P
OUTPUT: A + B mod P
C1 = A mod P, C2 = B mod P
C = C1 + C2
STOP
Figure 3. Modular addition.

The following flow diagram describes the operation of modular subtraction.
Step 1: Choose large prime numbers for A, B and P.
Step 2: Separately calculate the remainder of each input, A and B.
Step 3: Finally, subtract those remainders to obtain the modular subtraction result.

START
INPUT: A, B, P
OUTPUT: A - B mod P

START
INPUT: A, B
OUTPUT: C
C = LUT[A][B]
STOP
Figure 5. Galois multiplication GF(2^m).

III. RESULTS AND DISCUSSION
The simulation results describe how the design responds to different input streams at various points in time. Inputs are applied to the system by writing a test bench for the designed system. The simulation window shows in detail how the modules respond to the input given to each of them.
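A software reference model is one way to obtain expected values when checking the prime field and binary field instructions in simulation. The sketch below follows the flows of Figures 3 and 5 under stated assumptions: the final correction step in `mod_add` and `mod_sub` is an addition that the flow diagrams do not show, and the field GF(2^4) with reduction polynomial x^4 + x + 1 is chosen only to keep the lookup table small. All function names are hypothetical.

```python
def mod_add(a, b, p):
    """Modular addition as in Figure 3, plus a final correction step."""
    c1, c2 = a % p, b % p            # C1 = A mod P, C2 = B mod P
    c = c1 + c2                      # C = C1 + C2
    return c - p if c >= p else c    # assumed step: bring C back into [0, p)


def mod_sub(a, b, p):
    """Modular subtraction: reduce both inputs, subtract, then correct."""
    c1, c2 = a % p, b % p
    c = c1 - c2
    return c + p if c < 0 else c     # assumed step: bring C back into [0, p)


def gf_mul(a, b, m, poly):
    """Shift-and-XOR multiplication in GF(2^m); poly includes the x^m term."""
    result = 0
    while b:
        if b & 1:
            result ^= a              # add (XOR) the current multiple of a
        b >>= 1
        a <<= 1
        if a & (1 << m):
            a ^= poly                # reduce modulo the field polynomial
    return result


def build_gf_lut(m, poly):
    """Precompute LUT[A][B] = A * B in GF(2^m), as used in Figure 5."""
    size = 1 << m
    return [[gf_mul(a, b, m, poly) for b in range(size)] for a in range(size)]


if __name__ == "__main__":
    lut = build_gf_lut(4, 0b10011)   # GF(2^4), x^4 + x + 1 (assumption)
    print(mod_add(7, 9, 11), mod_sub(3, 9, 11), lut[2][8])
```

A full table for GF(2^m) holds 2^(2m) entries, so a direct LUT like this is practical only for small m.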
This result shows the top module of the Crypto-processor, with all of its inputs and outputs. Once the HDL code has been converted to RTL, as shown in Figure 7, it can be implemented on any FPGA. The internal schematic shows how data are connected between modules. The Crypto-processor consists of data memory, instruction memory, an instruction decoder, an ALU, a program counter, and temporary memory. These details are shown in Figure 8.

After synthesis completes, we obtain the design summary, the timing summary, and details of the combinational and sequential logic used by the design. The area result for the Crypto-processor was obtained with a Virtex-5 as the target device. According to the calculation, the Crypto-processor occupies 56% of the area.

Figure 10. Timing summary of a Crypto processor.

The HDL synthesis report details how much combinational and sequential logic the Crypto-processor uses. Combinational logic includes adders/subtractors, multipliers, multiplexers, and logic gates. Similarly, sequential logic includes counters, registers, and latches.
IV. CONCLUSION
A Crypto-processor prototype has been designed and synthesized. It can execute unified field instructions. To achieve creditable performance with both instruction sets in a single processor architecture, both prime field and binary field algorithms are incorporated into a single processor. The design also uses high-radix Montgomery multiplication to make pairing computation as fast as possible. Consequently, the design is well suited to analysis with simulation and synthesis tools; it occupies 56% of the area, and its timing is comparable to that of known architectures.

In future, a few more instructions, such as modular exponentiation and modular inversion, could be added. The number of pipeline stages could also be increased to improve performance.

V. REFERENCES

[1] V. Bunimov and M. Schimmler, “Area and time efficient modular multiplication of large integers”, in Proc. IEEE Int. Conf. Application-specific systems, architectures, and processors, ASAP 2003, The Hague, Netherlands, 2003, pp. 400-409.
[2] R. C. C. Cheung, Sylvain Duquesne, Junfeng Fan, Nicolas Guillermin, Ingrid Verbauwhede, and Gavin Xiaoxu Yao, “FPGA implementation of pairings using residue number system and lazy reduction”, in Proc. 13th Int. Conf. on Cryptographic hardware and embedded systems, Nara, Japan, 2011, pp. 421-441.
[3] J. Fan, F. Vercauteren, and I. Verbauwhede, “Efficient hardware implementation of Fp-Arithmetic for pairing-friendly curves”, IEEE Transactions on Computers, vol. 61, no. 5, pp. 676-685, 2012.
[4] S. Ghosh, D. Mukhopadhyay, and D. Roy Chowdhury, “High speed flexible pairing cryptoprocessor on FPGA platform”, in Proc. 4th Int. Conf. on Pairing-based cryptography, Pairing 2010, Japan, 2010, pp. 450-466.
[5] H. Wu, “Bit-parallel finite field multiplier and squarer using polynomial basis”, IEEE Transactions on Computers, vol. 51, no. 7, pp. 750-758, 2002.
[6] J. Han, Y. Li, Z. Yu, and X. Zeng, “A 65nm cryptographic processor for high speed pairing computation”, IEEE Transactions on VLSI systems, vol. 23, no. 4, pp. 692-701, 2014.
[7] D. Kammler, D. Zhang, P. Schwabe, H. Scharwaechter, M. Langenberg, D. Auras, G. Ascheid, and R. Mathar, “Designing an ASIP for cryptographic pairing over Barreto-Naehrig curves”, in Proc. 11th Int. workshop on Cryptographic hardware and embedded systems, Switzerland, 2009, pp. 254-271.
[8] Y. Li, J. Han, S. Wang, and D. Fang, “An 800MHz cryptographic pairing processor in 65nm CMOS”, in Proc. IEEE Asian Conf. on Solid State Circuits, Kobe, 2012, pp. 217-220.
[9] O. Nibouche, A. Bouridane, and M. Nibouche, “Architectures for Montgomery's multiplication”, in Proc. IEE Computers and digital techniques, vol. 150, no. 6, 2003, pp. 361-368.
[10] J. Fan, F. Vercauteren, and I. Verbauwhede, “Faster Fp-arithmetic for cryptographic pairing on Barreto-Naehrig curves”, in Proc. 11th Int. workshop on Cryptographic hardware and embedded systems, Switzerland, 2009, pp. 240-253.