
High Performance Linux Clusters

For Breaking RSA

Dr. Athar Mahboob


Associate Professor, National University of Sciences & Technology, Pakistan Navy Engineering College, Karachi, Pakistan
President, Ibn Khaldun Systems, Karachi, Pakistan, http://www.ibnkhaldun.com.pk
The Problem
● Factor a composite number n into its prime factors, which are unique according to the Fundamental Theorem of Arithmetic:

$n = \prod_{i} p_i^{e_i}, \quad p_i \in \mathbb{P}, \quad e_i \in \mathbb{Z}^{+}$

● Requires computational scalability as n becomes large
● Factoring was long thought to be a simple problem and was ignored for centuries
● An answer is easy to verify
● With the advent of public key cryptography based on RSA, there is a great bounty in being able to factor large integers
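Checking a claimed answer is cheap: multiply the prime powers back together and test each factor for primality. The following Python sketch assumes the factorization is given as a mapping from each prime p_i to its exponent e_i; the function names are hypothetical, and the primality check is the probabilistic Miller-Rabin test, so "prime" here means "prime with overwhelming probability".

```python
from __future__ import annotations

import random


def is_probable_prime(m: int, rounds: int = 20) -> bool:
    """Miller-Rabin probabilistic primality test."""
    if m < 2:
        return False
    for p in (2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37):
        if m % p == 0:
            return m == p
    # Write m - 1 = d * 2**s with d odd.
    d, s = m - 1, 0
    while d % 2 == 0:
        d //= 2
        s += 1
    for _ in range(rounds):
        a = random.randrange(2, m - 1)
        x = pow(a, d, m)
        if x in (1, m - 1):
            continue
        for _ in range(s - 1):
            x = pow(x, 2, m)
            if x == m - 1:
                break
        else:
            return False  # a is a witness: m is certainly composite
    return True


def verify_factorization(n: int, factors: dict[int, int]) -> bool:
    """True if n == prod(p**e) and every p is (probably) prime."""
    product = 1
    for p, e in factors.items():
        if e < 1 or not is_probable_prime(p):
            return False
        product *= p ** e
    return product == n
```

For example, verify_factorization(8051, {83: 1, 97: 1}) returns True. The same check could be applied to the large factorization reported later in this deck.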

[Figure: Notices of the AMS, December 1996]
The Problem is Relevant

[Figure: news item reporting that 768-bit RSA has been cracked]
Source: http://www.h-online.com/security/news/item/768-bit-RSA-cracked-898986.html
Motivation and Credits
● Learn about the integer factorization problem
● Learn about Linux clustering techniques and solutions
● In addition to the software packages mentioned in this presentation, some of the results presented come from joint work with Dr. Junaid Ahmed Zubairi and Dr. Nassar Ikram, published as a book chapter:

Athar Mahboob, Junaid Zubairi and Nassar Ikram, “High Performance Linux Clusters For Breaking RSA”, book chapter in Bantham USA, eBook on Applications of Modern High Performance Networks, eISBN: 978-1-60805-077-2, 2009.
The RSA Public Key Cryptosystem
Key Generation
Select p, q: p and q both prime
Calculate n: $n = p \times q$
Calculate φ(n): $\phi(n) = (p - 1)(q - 1)$
Select integer d: $\gcd(\phi(n), d) = 1$; $1 < d < \phi(n)$
Calculate e: $e = d^{-1} \bmod \phi(n)$
Public Key: $KU = \{e, n\}$
Private Key: $KR = \{d, n\}$
Encryption
Plaintext: $M < n$
Ciphertext: $C = M^e \bmod n$
Decryption
Ciphertext: $C$
Plaintext: $M = C^d \bmod n$
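A toy walk-through of the key generation, encryption and decryption steps above, and of why factoring n breaks the scheme. The numbers (p = 61, q = 53, d = 2753, M = 65) are illustrative assumptions, far too small for real use, and the helper name is hypothetical. It follows the slide's order of choosing d first and deriving e; pow(x, -1, m) needs Python 3.8 or later.

```python
from math import gcd

# Key generation (toy primes; real RSA primes have hundreds of digits)
p, q = 61, 53
n = p * q                    # 3233
phi = (p - 1) * (q - 1)      # Euler's totient phi(n) = 3120
d = 2753                     # private exponent: gcd(phi, d) == 1, 1 < d < phi
assert gcd(phi, d) == 1 and 1 < d < phi
e = pow(d, -1, phi)          # e = d^-1 mod phi(n), here 17

# Encryption and decryption
M = 65                       # plaintext, M < n
C = pow(M, e, n)             # C = M^e mod n, here 2790
assert pow(C, d, n) == M     # M = C^d mod n


def recover_private_exponent(n: int, e: int, factor: int) -> int:
    """Anyone who finds one prime factor of n can rebuild phi(n) and hence d."""
    cofactor = n // factor
    return pow(e, -1, (factor - 1) * (cofactor - 1))


assert recover_private_exponent(n, e, 61) == d
```

The last line is the whole attack: the public key {e, n} plus one factor of n is enough to compute the private exponent d.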

Clusters
● An interconnection of computer nodes and suitable software that makes them behave (or appear) like one computer
● Two principal drivers:
– Business driver: availability
– Research and scientific driver: scalability
[Figure: layers of a cluster node. Software: end user processes, cluster middleware, system libraries, operating system. Hardware: memory, CPU, system bus, network interface.]
High Availability Clusters
[Figure: high availability cluster with clients 1 to n, a client access network, cluster virtual IP address(es), primary and backup LVS directors linked by a heartbeat, servers 1 to n, a high speed cluster interconnect, and shared storage giving a consistent view of application state]

Typical use cases:
● Enterprise information services (email, database): high availability
● Web application server farms: high availability with load balancing

Examples:
● Linux HA (Heartbeat + LVS)
● Oracle Clusterware
● Red Hat Cluster Suite
HPC Clusters
● Computational scalability
● For scientific problem solving
● Two approaches:
– MPI: Message Passing Interface, an API for cooperating processes running across multiple nodes
– SSI: Single System Image, a distributed operating system approach
Single System Image Clusters
● Create the illusion of one big SMP machine
● With a single process space: ps and top see processes on all nodes
● Make clustering transparent to processes
– No need for MPI
● Still need a multi-process application
● Many attempts in Linux:
– Mosix (then openMosix, then nothing)
– OpenSSI (big bang, then nothing)
– Kerrighed (the slow and steady SSI turtle!)
Linux, Open Source and Some Misconceptions
● Business model is based on services alone:
– Implementation
– Customizations
– Training
– Documentation
– Support
● A fair and consumer-friendly business model for software, because:
– Software is incrementally developed
– Software is infinitely replicable
● Open source is free software!
● Software is free, people are not!
● Free as in “freedom”, not necessarily as in “free beer”
● Open source is a viable business model
● Open source is a better software engineering methodology
Linux and Open Source - Some Disruptions
● Oracle is banking on Linux as the OS to run Oracle (Oracle Enterprise Linux, OCFS; is the Sun setting on Solaris?)
● Open source ERP (Adempiere)
● Open source in health informatics (VA VistA)
● Google Android uses Linux as its OS
● Most toolchain vendors are moving to Eclipse as their IDE
Top500 HPC Clusters
✔ Linux has been gaining ground in the HPC space
✔ Linux is the dominant UNIX which will survive
✔ Linux is the universal operating system
✔ From cell phones to supercomputers, Linux is portable

Data from http://www.top500.org
Kerrighed and the Linux Kernel
● Kerrighed 2.4 is based on the Linux 2.6.20 kernel
● Uses the configfs pseudo filesystem for cluster configuration, especially the scheduler policy
● Supports an NFSROOT-based root filesystem for cluster nodes
● The project has existed for more than a decade
● The project is active
● Well documented
Kerrighed Features
● Global Process Management
– Cluster wide PIDs
– Process migration with open files, pipes, sockets, shared memory segments, etc.
– Mosix-like global process scheduler
– Full cluster wide UNIX process management interface (ps, top, kill, etc.)
– Customizable distributed scheduler
● Global Memory Management
– Support for distributed System V memory segments
– Memory injection (EXPERIMENTAL)
● Checkpoint / restart
– Checkpoint/restart of single processes
– Checkpoint/restart of applications (EXPERIMENTAL)
● Architecture
– Support of SMP / multi-core machines
– Support for x86-64 architecture (i386 / x86-32 / IA32 is not supported anymore)
Kerrighed Capabilities
● To allow process migration between nodes in the cluster, type the following:
● $ sudo krgcapset -d +CAN_MIGRATE
● Also remember to define a scheduling policy
– By default, a script is provided which implements a Mosix-like scheduler
● Other Kerrighed capabilities:
– REMOTE_FORK
The Cluster Interconnect
✔ Plays a significant role in cluster performance
✔ Significance depends on the nature of cooperating processes
✔ Gigabit Ethernet is gaining ground as an off-the-shelf interconnect
✔ Gigabit Ethernet performs well most of the time

Data from http://www.top500.org
The Clustering Recipe
● Build a “root filesystem” for cluster nodes: debootstrap
● Set up network booting: DHCP, TFTP, NFS, NTP
● Build the Kerrighed Linux kernel
● Build the factoring software toolkit
● Boot and enjoy
[Figure: cluster boot server stack: root filesystem and kernel for cluster nodes; DNS, NTP, NFS, TFTP and DHCP servers; Linux operating system]
Network Booting Cluster Nodes
● Makes it easy to add nodes to the cluster
● Uses standard protocols and mechanisms:
– PXE
– DHCP
– TFTP
– NFS
– NTP
[Figure: nodes 1 to n connected through the cluster interconnect to the cluster boot server]
Boot sequence between a cluster node (with a system BIOS capable of PXE boot) and the cluster boot server:
1. DHCP request by PXE boot firmware
2. DHCP response containing IP address, boot server and bootfile name
3. TFTP request by PXE boot firmware
4. TFTP response containing boot kernel
5. NFS mount request by node
6. Root filesystem for node
7. NTP time synchronization request
8. NTP response
Kerrighed in Action
✔ Building the Linux kernel is a processing-intensive task
✔ Use “make -j n” to start n parallel build processes
Stability Issues in Kerrighed

The Factoring Recipe
● Msieve: generates the relations used in factoring the integer. Msieve is a highly optimized implementation of the Multiple Polynomial Quadratic Sieve (MPQS) integer factorization algorithm.
● Efficient large integer library: Msieve now uses GMP, the GNU Multiple Precision Arithmetic Library, to perform the arithmetic steps in generating relations.
● Linux kernel with Kerrighed: performs process migration to level the CPU load on cluster nodes.
● Cluster node (CPU, RAM, network interface) and interconnect: Intel Pentium 4, 2.4 GHz, 512 MB RAM, diskless, 100 Mbps switched Fast Ethernet.
Sieving
A sieve, or sifter, separates wanted elements from unwanted material using a woven screen such as a mesh or
net. However, in cooking, especially with flour, a sifter is used to aerate the substance, among other things. A
strainer is a type of sieve typically used to separate a solid from a liquid. The word "sift" derives from sieve.

From: http://en.wikipedia.org/wiki/Sieve

● We sift out the possible factors from 2 to sqrt(n).
● Factors occur in pairs: once you find one member of a pair, the other follows by dividing n by it.
● Divide by each successive prime from 2 to sqrt(n)
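The procedure above is trial division. A minimal Python sketch, assuming the primes up to sqrt(n) are produced with the classic Sieve of Eratosthenes (not the quadratic sieve discussed later); the function names are hypothetical. It returns the factorization as a mapping from each prime p_i to its exponent e_i.

```python
from __future__ import annotations

from math import isqrt


def primes_up_to(limit: int) -> list[int]:
    """Sieve of Eratosthenes: all primes <= limit."""
    if limit < 2:
        return []
    is_prime = bytearray([1]) * (limit + 1)
    is_prime[0] = is_prime[1] = 0
    for i in range(2, isqrt(limit) + 1):
        if is_prime[i]:
            # Cross out the multiples of i, starting at i*i.
            is_prime[i * i :: i] = bytes(len(range(i * i, limit + 1, i)))
    return [i for i, flag in enumerate(is_prime) if flag]


def trial_division(n: int) -> dict[int, int]:
    """Factor n by dividing by each successive prime up to sqrt(n)."""
    factors: dict[int, int] = {}
    for p in primes_up_to(isqrt(n)):
        while n % p == 0:
            factors[p] = factors.get(p, 0) + 1
            n //= p
        if p * p > n:  # no prime factor of what remains can be <= p
            break
    if n > 1:  # the remaining cofactor must itself be prime
        factors[n] = factors.get(n, 0) + 1
    return factors
```

For example, trial_division(8051) gives {83: 1, 97: 1}. The number of trial divisions grows roughly like sqrt(n), which is why this approach is hopeless for RSA-sized moduli.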

Fermat's Improvement
● Factor 8051
● Factor 3599
● Factor 2496
● Hint: It's easy to factor these numbers if you
recognize them to be a difference of two squares

$a^2 - b^2 = (a - b)(a + b)$
● $8051 = 8100 - 49 = 90^2 - 7^2 = (90 - 7)(90 + 7) = 83 \times 97$
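Fermat's method turns the hint into a search: start at a = ceil(sqrt(n)) and step a upward until a^2 - n is a perfect square b^2, then n = (a - b)(a + b). A minimal Python sketch (the function name is hypothetical); it is fast only when the two factors are close to sqrt(n), as in the three examples on this slide.

```python
from __future__ import annotations

from math import isqrt


def fermat_factor(n: int) -> tuple[int, int]:
    """Find a, b with a^2 - b^2 = n and return (a - b, a + b)."""
    if n < 1 or n % 4 == 2:
        # Numbers congruent to 2 mod 4 are never a difference of two squares.
        raise ValueError("n must be positive and not congruent to 2 mod 4")
    a = isqrt(n)
    if a * a < n:
        a += 1  # start at ceil(sqrt(n))
    while True:
        b2 = a * a - n
        b = isqrt(b2)
        if b * b == b2:  # a^2 - n is a perfect square
            return a - b, a + b
        a += 1
```

All three exercises are solved on the first try: 8051 gives (83, 97), 3599 = 60^2 - 1^2 gives (59, 61), and 2496 = 50^2 - 2^2 gives (48, 52).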
Quadratic Sieve
The quadratic sieve has two main parts:
1. Finding relations
– The quadratic sieve is a fast way of finding 'relations'. A relation is a congruence $x^2 \equiv y \pmod{n}$ about the number n you want to factor, in which y factors completely over a small set of primes (the factor base).
– For a really big number, you'll need millions of these relations. The process of finding relations is called 'sieving'.
– Once you have 'enough' of these relations, you can combine the millions of relations into a few dozen super-relations that *are* capable of finishing the factorization with high probability.
2. Solving a matrix that stores the relations
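The two stages can be illustrated with a toy, Dixon-style sketch in Python. It keeps values x for which Q(x) = x^2 - n is smooth over a small factor base, finds subsets whose exponent vectors are all even by Gaussian elimination over GF(2), and takes gcd(X - Y, n) from the resulting congruence X^2 ≡ Y^2 (mod n). Assumptions: smoothness is tested by plain trial division rather than a real sieve, the factor-base bound and the number of extra relations are arbitrary toy choices, and all names are hypothetical; this is not msieve's MPQS.

```python
from __future__ import annotations

from math import gcd, isqrt
from typing import Optional


def factor_base(n: int, bound: int) -> list[int]:
    """Primes p <= bound for which n is a square modulo p (p = 2 always kept)."""
    base = []
    for p in range(2, bound + 1):
        is_prime = all(p % k for k in range(2, isqrt(p) + 1))
        # Euler's criterion: n is a nonzero quadratic residue mod odd p
        if is_prime and (p == 2 or pow(n, (p - 1) // 2, p) == 1):
            base.append(p)
    return base


def exponent_vector(value: int, base: list[int]) -> Optional[list[int]]:
    """Exponents of value over base, or None if value is not smooth."""
    exps = []
    for p in base:
        e = 0
        while value % p == 0:
            value //= p
            e += 1
        exps.append(e)
    return exps if value == 1 else None


def find_dependencies(vectors: list[list[int]]) -> list[int]:
    """Gaussian elimination over GF(2). Each returned bitmask selects a subset
    of relations whose exponent vectors add up to all-even exponents."""
    pivots = []  # (pivot bit, parity row, combination of original rows)
    deps = []
    for i, vec in enumerate(vectors):
        row = sum(1 << j for j, e in enumerate(vec) if e % 2)
        combo = 1 << i
        for bit, prow, pcombo in pivots:
            if row & bit:
                row ^= prow
                combo ^= pcombo
        if row == 0:
            deps.append(combo)
        else:
            pivots.append((row & -row, row, combo))
    return deps


def toy_quadratic_sieve(n: int, bound: int = 50, extra: int = 20) -> Optional[int]:
    """Return a nontrivial factor of an odd composite n, or None."""
    root = isqrt(n)
    if root * root == n:
        return root
    base = factor_base(n, bound)
    xs, vecs = [], []
    x = root + 1  # so that x*x - n > 0
    # Stage 1: collect relations x^2 = Q(x) (mod n) with Q(x) = x^2 - n smooth
    while len(vecs) < len(base) + extra:
        vec = exponent_vector(x * x - n, base)
        if vec is not None:
            xs.append(x)
            vecs.append(vec)
        x += 1
    # Stage 2: combine relations into X^2 = Y^2 (mod n) and take a gcd
    for combo in find_dependencies(vecs):
        chosen = [i for i in range(len(xs)) if combo >> i & 1]
        X = 1
        totals = [0] * len(base)
        for i in chosen:
            X = X * xs[i] % n
            totals = [t + e for t, e in zip(totals, vecs[i])]
        Y = 1
        for p, t in zip(base, totals):
            Y = Y * pow(p, t // 2, n) % n
        f = gcd(X - Y, n)
        if 1 < f < n:
            return f
    return None
```

Collecting more relations than there are primes in the factor base guarantees dependencies exist; each one yields a nontrivial factor with probability of roughly one half for an n with two prime factors, which is why a few dozen "super-relations" are usually enough.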
Time Complexity of Various Integer Factorization Algorithms
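For reference, the heuristic running times usually quoted for these algorithms can be written in L-notation (standard textbook estimates). Smaller α means asymptotically faster: trial division is fully exponential, $L_n[1, \tfrac{1}{2}] = \sqrt{n}$, while the quadratic sieve has α = 1/2 and the general number field sieve α = 1/3.

```latex
\begin{align*}
L_n[\alpha, c] &= \exp\!\Big( (c + o(1)) \, (\ln n)^{\alpha} (\ln \ln n)^{1 - \alpha} \Big) \\
\text{Trial division:} &\quad O\big(\sqrt{n}\big) \\
\text{Quadratic sieve:} &\quad L_n\big[\tfrac{1}{2},\, 1\big] \\
\text{General number field sieve:} &\quad L_n\big[\tfrac{1}{3},\, \sqrt[3]{64/9}\,\big]
\end{align*}
```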

Factorization Trend
Year | Largest integer that could be factored (digits)
1970 | 20
1980 | 50
1990 | 116
1994 | 129
1996 | 130
2003 | 174
2010 | 232
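Records are often quoted in bits rather than decimal digits. A small Python sketch (the function name is hypothetical) that computes the possible bit lengths of a d-digit number exactly with integer arithmetic; for the 2010 entry it shows that 232 digits corresponds to about 768 bits, the size of the RSA modulus in the news item cited earlier in this deck.

```python
from __future__ import annotations


def bit_length_range(digits: int) -> tuple[int, int]:
    """Smallest and largest bit length of a positive integer with `digits` decimal digits."""
    smallest = 10 ** (digits - 1)   # e.g. 100 for 3 digits
    largest = 10 ** digits - 1      # e.g. 999 for 3 digits
    return smallest.bit_length(), largest.bit_length()


for year, digits in [(1970, 20), (1980, 50), (1990, 116), (1994, 129),
                     (1996, 130), (2003, 174), (2010, 232)]:
    low, high = bit_length_range(digits)
    print(f"{year}: {digits} digits = {low}-{high} bits")
```

A 232-digit number has between 768 and 771 bits, and a 129-digit number between 426 and 429 bits.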
Parallelization and Sieving
● Finding relations
– A large number of relations is needed
– Finding relations is the time-consuming part
– It can be done in parallel
● Solving the matrix that stores the relations
– Is not done in parallel here
– Is not the time-consuming part
– Can be done by a single node once a sufficient number of relations is available
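Because each value of x can be tested independently, the relation search splits naturally into chunks of the x range. A minimal Python sketch of that split; the names, chunk sizes and toy factor base are assumptions, and smoothness is tested by trial division rather than sieving. Passing a concurrent.futures.ProcessPoolExecutor runs the chunks as separate processes, which is the kind of multi-process workload Kerrighed can spread across nodes.

```python
from __future__ import annotations

from concurrent.futures import Executor
from math import isqrt
from typing import Optional


def smooth_exponents(value: int, base: list[int]) -> Optional[list[int]]:
    """Exponent vector of value over base, or None if value is not smooth."""
    exps = []
    for p in base:
        e = 0
        while value % p == 0:
            value //= p
            e += 1
        exps.append(e)
    return exps if value == 1 else None


def find_relations(n: int, base: list[int], start: int, stop: int) -> list[tuple[int, list[int]]]:
    """Worker: scan x in [start, stop) and keep x whose x*x - n is smooth over base."""
    relations = []
    for x in range(start, stop):
        vec = smooth_exponents(x * x - n, base)
        if vec is not None:
            relations.append((x, vec))
    return relations


def parallel_relations(n: int, base: list[int], width: int, chunk: int,
                       executor: Executor) -> list[tuple[int, list[int]]]:
    """Split [isqrt(n) + 1, isqrt(n) + 1 + width) into chunks, one task per chunk."""
    start = isqrt(n) + 1
    stop = start + width
    futures = [
        executor.submit(find_relations, n, base, lo, min(lo + chunk, stop))
        for lo in range(start, stop, chunk)
    ]
    # Concatenate in submission order, so the result matches a serial scan.
    return [rel for fut in futures for rel in fut.result()]
```

With a ProcessPoolExecutor, the call should sit under an `if __name__ == "__main__":` guard so worker processes can import the module safely. The combined list is then handed to the matrix step on a single node.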

Msieve on Kerrighed in Action

The Bounty: Factoring a 102-Digit Number
279227912220351686582028455516412641158896790659830288566001946957408123456789012133589058792410781116
= 2 × 2 × 109 × 173 × 367 × 431 × 7573 × 504353 × 1997715487 × 22918009218061 × 133835033152975400607354515726890753138186867985471862847417
Performance of QS algorithm on a 4-node (8-CPU) cluster, Pentium 4, 2.4 GHz
Nodes | CPUs | Time (seconds) | Speedup
1 | 1 | 124 | 1X
4 | 8 | 20 | 6X
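The speedup column is the ratio of serial to parallel run time; dividing by the number of CPUs gives the parallel efficiency. A short Python sketch using the table's figures (the function names are illustrative):

```python
def speedup(t_serial: float, t_parallel: float) -> float:
    """S = T_1 / T_p."""
    return t_serial / t_parallel


def efficiency(t_serial: float, t_parallel: float, cpus: int) -> float:
    """E = S / p; 1.0 would mean perfectly linear scaling."""
    return speedup(t_serial, t_parallel) / cpus


s = speedup(124, 20)           # 6.2, shown as "6X" in the table
e = efficiency(124, 20, 8)     # 0.775: about 78% of ideal on 8 CPUs
```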

Future Directions
● Kerrighed 3.0
– More stable, more features
– Runs inside Linux Containers (light-weight
virtual machines)
● Oracle Cluster File System 1.6 – for NFS
scalability and reliability during relations
accumulation
● Python factorization script with automatic
recovery (factmsieve.py) by Brian Gladman
● Try the discrete logarithm problem
● And the elliptic curve discrete logarithm problem
Kerrighed and Linux Containers
Kerrighed provides SSI features using a Linux container (lxc). In a few
words, a container is basically a light-weight virtual machine sharing its
kernel with the host OS. Depending on the needs, it may share or isolate
some resources with its host, such as PIDs, IPCs, Net, file systems, etc.,
and provides resource control groups (memory usage allowed, etc.).

On a Kerrighed kernel, the host system doesn't provide Kerrighed features. Those features are only available inside a special Linux container called the Kerrighed container. A process running on the host system will behave as on a non-patched kernel. Processes running in the Kerrighed container will have the ability to migrate from one node to another, checkpoint and restart, use distant memory, etc.

By default on the Kerrighed system, the host system shares most of its resources with the Kerrighed container (network addresses, physical devices, filesystem, system users, etc.). When the Kerrighed service boots the container, it additionally executes a configurable set of commands. By default, an ssh server listening on port 2222 is launched on each node. Once connected, you are in the SSI cluster!
From http://www.kerrighed.org
Some Links/References
● Kerrighed: Single System Image Linux Clusters, http://www.kerrighed.org
● Msieve: A Library for Factoring Large Integers, Jason Papadopoulos, hosted at SourceForge
Questions

Thanks for your patience
