
Informatics

Lecture 4: Information Storage


Introduction
Within the developed world, data is being
created at a phenomenal rate.
Photos and videos are being uploaded 24/7,
and people expect them to be available on
demand forever.
The developing world is catching up very
quickly: the "other 3 billion" (O3b).
The need to store data indefinitely is now
being met by remote storage within the cloud.

A reminder of some numbers


One minute of audio = 2MB
One minute of HD video = 300MB
1000 pages of text = 3MB
Photograph = 2MB
100 hours of video are uploaded to YouTube
every minute
Daily internet traffic = 2EB
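As a rough illustration of what these numbers imply, the sketch below combines two of the figures above. It assumes (purely for the sake of the arithmetic) that every uploaded hour of video is stored at the HD rate of 300MB per minute and that 1 TB = 1,000,000 MB; the function and constant names are made up for this example.

```python
# Figures quoted on the slide; decimal units are assumed (1 TB = 1,000,000 MB).
HD_VIDEO_MB_PER_MINUTE = 300
HOURS_UPLOADED_PER_MINUTE = 100


def upload_volume_mb_per_minute():
    """Storage needed for one minute's worth of uploads, in MB."""
    minutes_of_video = HOURS_UPLOADED_PER_MINUTE * 60
    return minutes_of_video * HD_VIDEO_MB_PER_MINUTE


per_minute_tb = upload_volume_mb_per_minute() / 1_000_000
per_day_pb = per_minute_tb * 60 * 24 / 1_000

print(f"{per_minute_tb:.1f} TB per minute, {per_day_pb:.2f} PB per day")
```

Under these assumptions the uploads alone come to about 1.8 TB every minute, or roughly 2.6 PB per day.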

Storage Devices
Typically data was stored on hard discs (slow)
or in RAM (fast), and a typical computer might
have about 100 times as much hard disc space as
RAM.
Of course capacities have increased
relentlessly and solid state storage is
becoming more common (Solid State Discs).
In parallel with this, local storage is now less
important.

Storage Devices
[Figure: transfer speeds of different storage devices (0.01 GB/s, 0.1 GB/s, 0.5 GB/s, 1.5 GB/s, 3 GB/s), compared with a network speed of 0.1GB/s?]

Storage interfaces
While most devices have impressive storage, the
focus is moving towards fast data interfaces:
USB3
Thunderbolt
4G and 5G
Fast WiFi
And yet most corporate networks will be at
100MB/s for some time to come
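To see why the interface matters more than the capacity, the sketch below estimates how long it takes to move a given amount of data at the speeds quoted in this lecture. The 1000 GB payload and the function name are assumptions chosen for illustration; the speeds are simply the figures quoted above, without attributing them to particular devices.

```python
def transfer_seconds(size_gb, speed_gb_per_s):
    """Ideal transfer time, ignoring protocol overhead and latency."""
    return size_gb / speed_gb_per_s


PAYLOAD_GB = 1000  # hypothetical 1 TB backup

# Speeds in GB/s taken from the lecture; 0.1 GB/s is the corporate network.
for speed in (0.01, 0.1, 0.5, 1.5, 3):
    hours = transfer_seconds(PAYLOAD_GB, speed) / 3600
    print(f"{speed:>5} GB/s -> {hours:6.2f} hours")
```

At the 0.1 GB/s network speed the hypothetical backup takes close to three hours, however fast the discs at either end may be.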

Data storage schemes


The actual way in which data is stored on a
physical medium has many variations and can
depend on the operating system being used or
can be proprietary
NTFS
FAT
Mac OS
Many others

Data storage schemes


Data is stored, usually in sectors and tracks, and
these sectors don't have to be consecutive to
store a single file but can be distributed around
the physical medium.
Low-level formatting establishes
the layout and high-level
formatting sets up the allocation
of the sectors.
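The idea of a file scattered over non-consecutive sectors can be sketched with a toy allocation table, loosely in the spirit of FAT. Everything here (the sector numbers, the `END_OF_FILE` marker, the dictionary layout) is a simplified assumption for illustration and not the on-disc format of any real file system.

```python
END_OF_FILE = -1  # hypothetical end-of-chain marker

# Toy disc: sector number -> the bytes held in that sector.
sectors = {2: b"Hel", 5: b"wor", 7: b"lo ", 9: b"ld"}

# Allocation table: for each sector, which sector holds the next piece.
allocation_table = {2: 7, 7: 5, 5: 9, 9: END_OF_FILE}

# Directory: file name -> first sector of the file.
directory = {"hello.txt": 2}


def read_file(name):
    """Follow the chain of sectors and join the pieces in order."""
    data = b""
    sector = directory[name]
    while sector != END_OF_FILE:
        data += sectors[sector]
        sector = allocation_table[sector]
    return data


print(read_file("hello.txt"))  # b'Hello world'
```

The file is read in the order 2, 7, 5, 9, so the physical position of a sector says nothing about its position within the file.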

Data storage schemes


While other forms of storage don't have tracks
and sectors, they will still be formatted in
compliance with one of the standard schemes,
e.g. FAT32.
With the right hardware it is possible to read the
bits directly off a hard disc, ignoring the file
system, and so perhaps recover lost or deleted
data.
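One way such recovery can work is to scan the raw bytes for known file signatures instead of consulting the file system. The sketch below looks for the JPEG start marker (bytes FF D8 FF) and end marker (FF D9) in a block of raw data. The `raw_image` bytes are fabricated for the example, and real recovery tools are considerably more careful (for instance, the end marker can also occur inside a real photo).

```python
JPEG_START = b"\xff\xd8\xff"
JPEG_END = b"\xff\xd9"


def carve_jpegs(raw):
    """Return every byte run that starts and ends with the JPEG markers."""
    found = []
    pos = 0
    while True:
        start = raw.find(JPEG_START, pos)
        if start == -1:
            break
        end = raw.find(JPEG_END, start + len(JPEG_START))
        if end == -1:
            break
        found.append(raw[start:end + len(JPEG_END)])
        pos = end + len(JPEG_END)
    return found


# Fabricated "disc image": padding, one fake photo, then unrelated bytes.
fake_photo = JPEG_START + b"\xe0pixel-data" + JPEG_END
raw_image = b"\x00" * 16 + fake_photo + b"unrelated bytes" + b"\x00" * 8

recovered = carve_jpegs(raw_image)
print(len(recovered), recovered[0] == fake_photo)
```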

Error correction schemes


Most data that is stored and/or transmitted
can suffer from errors that can alter or
invalidate the data.
A number of error detection and correction
schemes can be employed (parity checking,
CRC), and these add to the data overhead.
Checks can also be included to ensure that the
data has not been tampered with.
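The two detection schemes named above can be sketched in a few lines. The example computes an even parity bit for a single byte and a CRC-32 checksum (via Python's standard `zlib.crc32`) for a whole message, then flips one bit to simulate an error. The message text and the choice of which bit to flip are arbitrary assumptions; both schemes shown here only detect the error and do not correct it.

```python
import zlib


def even_parity_bit(byte):
    """Extra bit that makes the total number of 1 bits even."""
    return bin(byte).count("1") % 2


data = b"Information storage"
checksum = zlib.crc32(data)  # 32 bits of overhead for the whole message

# Simulate a storage or transmission error by flipping a single bit.
corrupted = bytearray(data)
corrupted[0] ^= 0b00000100
corrupted = bytes(corrupted)

parity_detects = even_parity_bit(data[0]) != even_parity_bit(corrupted[0])
crc_detects = zlib.crc32(corrupted) != checksum

print(parity_detects, crc_detects)  # True True
```

A single parity bit misses any error that flips an even number of bits in the same byte, which is one reason longer checks such as CRC are used despite the extra overhead.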

Data compression
When storage was limited it was quite
common to compress files that were only
used infrequently; a number of schemes are
available.
Compression of audio and video files is now
common:
Audio - mp3
Photo - jpg
Video - mpeg2 (DVD) and m4v

Data compression
Note that the media compression schemes
can be optimised for the data type; this is
more difficult if the data format is unknown.
Needless to say, there is a time penalty in
accessing compressed data.
Of course data can be compressed and
encrypted at the same time; this is covered
later in this lecture.
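How much the data type matters can be seen with a general-purpose lossless compressor. The sketch below uses Python's standard `zlib` module (not one of the media formats named above, which are lossy and specialised) on two inputs of equal size: highly repetitive text and random bytes. The sample sentence is an arbitrary choice.

```python
import os
import zlib

text = b"the quick brown fox jumps over the lazy dog. " * 200
noise = os.urandom(len(text))  # random bytes have no pattern to exploit

packed_text = zlib.compress(text)
packed_noise = zlib.compress(noise)

# Lossless: decompression restores the original exactly.
assert zlib.decompress(packed_text) == text

print("original size:   ", len(text))
print("repetitive text: ", len(packed_text))
print("random bytes:    ", len(packed_noise))
```

The repetitive text shrinks to a small fraction of its size, while the random bytes barely change, and every later access pays the cost of decompression.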

Databases
We accept that we will be generating huge
amounts of data and have the technology to
move it and store it.
However, there is no point in any of this if we
can't find the data when we need it.
We now need to consider how to store the data
in an optimum way for later retrieval and
analysis.

What is a database?
A database is an organised collection of data.
The organisation has to be appropriate to the
type of data to be stored and the processing
to be carried out (Tesco data and Flickr?).
The interaction and interrogation is carried
out with a Database Management System
(DBMS); common examples are MySQL (free)
and Oracle (expensive).

What is a database?
Whenever data has to be stored and
interrogated then a DB is usually present; a
lot of smartphones use SQLite.
Accessing a database is not the same as simply
retrieving a file from a folder; think of the
thousands of photos that you have.
A standardised query language, SQL (Structured
Query Language), allows different products to
inter-operate and modify and find data in the database.
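The difference between browsing a folder and querying a database can be sketched with the photo example. The code below uses Python's built-in `sqlite3` module as a stand-in DBMS (an assumption made so the example is self-contained; the lecture names MySQL and Oracle). The table layout, file names and places are all invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database
conn.execute("CREATE TABLE photos (filename TEXT, taken TEXT, place TEXT)")
conn.executemany(
    "INSERT INTO photos VALUES (?, ?, ?)",
    [
        ("img_0001.jpg", "2012-07-14", "Paris"),
        ("img_0002.jpg", "2013-01-02", "Leeds"),
        ("img_0003.jpg", "2013-08-20", "Paris"),
    ],
)

# Ask a question about the data instead of searching folders by hand.
rows = conn.execute(
    "SELECT filename FROM photos WHERE place = ? AND taken >= ? ORDER BY taken",
    ("Paris", "2013-01-01"),
).fetchall()

print(rows)  # [('img_0003.jpg',)]
```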

DBMS Functions

Data definition - data structures
Edit data - add, delete, insert, update
Retrieve - obtain relevant data for a query
Administration - security, integrity, recovery

SQL can allow for questions and hypotheses to
be phrased using a standard language as well as
standard auditing calculations
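The first three functions, and a simple auditing calculation, can be sketched with standard SQL statements. As an assumption for the sake of a runnable example, the code uses Python's built-in `sqlite3` module in place of a full DBMS; the `sales` table and its contents are hypothetical, and amounts are held in pence to keep the arithmetic exact.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Data definition: describe the structure of the data.
conn.execute("CREATE TABLE sales (store TEXT, item TEXT, amount_pence INTEGER)")

# Edit data: insert, update and delete rows.
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [("Leeds", "milk", 120), ("Leeds", "bread", 95), ("York", "milk", 125)],
)
conn.execute("UPDATE sales SET amount_pence = 100 WHERE item = 'bread'")
conn.execute("DELETE FROM sales WHERE store = 'York'")

# Retrieve: an auditing-style calculation phrased as a query.
totals = conn.execute(
    "SELECT store, COUNT(*), SUM(amount_pence) FROM sales GROUP BY store"
).fetchall()

print(totals)  # [('Leeds', 2, 220)]
```

The administration functions (security, integrity, recovery) are handled by the DBMS itself and are not shown here.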

Database design
Obviously a lot of care and thought has to go
into the original database design so that it
operates efficiently and returns results as
quickly as possible.
This is becoming a sophisticated problem with
data being increasingly dispersed within the
cloud: what to store where, for example?

Cloud storage
Most of us are now familiar with using cloud
storage through services such as Dropbox and
Copy (as well as social media).
Data is now stored on remote servers,
although for large companies some may still
be local.
The process can be completely transparent to
the user.

Dropbox is storing in excess of 10 PB(?) of data
(2012) on Amazon servers around the world.

Cloud storage
Advantages
Pay only for what you need
Good for risk management - a fire at HQ
No upgrades or updates required
No physical maintenance or infrastructure

Cloud storage
Disadvantages
Attack surface area has increased
Trust the supplier - corporate takeover
Disputes - who owns the data?
Need a network to access - some on-site
Ethical - EU data must be stored in the EU

Data encryption
There are many situations when we wish to
store confidential information (passwords) or
transmit data securely across public networks.
The same ideas can be used as digital signatures
or to ensure that data has not been tampered
with.
Of course codes and encryption go back
thousands of years.

Data encryption
The basic idea is to apply some process to the
message which can be reversed to recover the
original message.
Text can be processed with ciphers or look-up
tables (single use for better security).
Digital text can be processed mathematically
using one or more keys since the letters of the
alphabet can be regarded as numbers.
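Treating letters as numbers can be illustrated with the oldest trick of all, a shift cipher: each letter becomes a number from 0 to 25, the key is added, and the result wraps around the alphabet. This is only a sketch of a reversible process with a key; the function name and sample message are invented, and a cipher this simple offers no real security.

```python
import string

ALPHABET = string.ascii_uppercase  # 'A'..'Z' stand for the numbers 0..25


def shift_cipher(message, key):
    """Add `key` to every letter's number, wrapping around after Z."""
    out = []
    for ch in message:
        if ch in ALPHABET:
            number = ALPHABET.index(ch)
            out.append(ALPHABET[(number + key) % 26])
        else:
            out.append(ch)  # leave spaces and punctuation untouched
    return "".join(out)


secret = shift_cipher("ATTACK AT DAWN", 3)
plain = shift_cipher(secret, -3)  # the same process with the key reversed

print(secret)  # DWWDFN DW GDZQ
print(plain)   # ATTACK AT DAWN
```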

Secure hash algorithm - SHA


Begin with a message of variable length (e.g. a
password)
Process it to create a data packet of fixed
length
Store this hashed message
Verify any future entries of the password
The process is very difficult to reverse to guess
the password
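These steps can be sketched with Python's standard `hashlib` module. SHA-256 (a member of the SHA-2 family) is chosen here as an assumption, and the helper names and sample passwords are hypothetical. Real password stores usually add a random salt and a deliberately slow hashing scheme as well, which this minimal sketch leaves out.

```python
import hashlib


def hash_password(password):
    """Turn a message of any length into a fixed-length digest."""
    return hashlib.sha256(password.encode("utf-8")).hexdigest()


def verify(attempt, stored_hash):
    """Hash the new entry and compare it with the stored hash."""
    return hash_password(attempt) == stored_hash


stored = hash_password("correct horse")  # only the hash is kept

print(len(hash_password("a")), len(hash_password("a" * 1000)))  # 64 64
print(verify("correct horse", stored))  # True
print(verify("wrong guess", stored))    # False
```

The digest is always 64 hexadecimal characters (256 bits) whatever the length of the input, and the password itself is never stored.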

Secure hash algorithm - SHA


SHA-1 = 160 bits
SHA-2 = 224 to 512 bits (e.g. SHA-256, SHA-512)
Applied repeatedly

(Figure source: Wikipedia)

Public key encryption


Used to exchange information without the
need to also exchange passwords or keys
Used extensively as part of the even more
secure Pretty Good Privacy (PGP) method - see
later
Relies upon sophisticated mathematical
procedures and very large numbers

Public key encryption


Each person has a private key, and a public key
which can be published.
The keys work together and must function so
that it doesn't matter which one is used first.
The result of the process must have so many
ways of occurring that someone can't guess
the private key part.

Process
Alice and Bob have both public and private
keys
Alice takes a message and applies Bob's public
key and her private key
The message is sent to Bob
Bob uses his private key and Alice's public key
to recover the message
Interception is no use because a private key is
still needed

Process
This only works with certain mathematical
processes and of course, as mentioned, the
numbers are huge to prevent brute force
guessing
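The Alice and Bob exchange can be sketched with textbook RSA, which is one mathematical process with the required properties (the lecture does not name a specific algorithm, so this choice is an assumption). The primes below are tiny so the arithmetic is easy to follow; real keys use numbers hundreds of digits long, and practical systems add padding and other safeguards not shown here. All function and variable names are invented for the example.

```python
def make_keypair(p, q, e):
    """Build a toy RSA key pair from two small primes (not secure)."""
    n = p * q
    phi = (p - 1) * (q - 1)
    d = pow(e, -1, phi)  # modular inverse of e; needs Python 3.8+
    return (e, n), (d, n)  # (public key, private key)


def apply_key(value, key):
    """Applying a key means raising to its exponent modulo n."""
    exponent, modulus = key
    return pow(value, exponent, modulus)


alice_public, alice_private = make_keypair(61, 53, 17)  # n = 3233
bob_public, bob_private = make_keypair(89, 97, 5)       # n = 8633

message = 1234  # a number smaller than Alice's n

# It doesn't matter which key of a pair is used first.
assert apply_key(apply_key(message, alice_public), alice_private) == message
assert apply_key(apply_key(message, alice_private), alice_public) == message

# Alice applies her private key, then Bob's public key, and sends the result.
sent = apply_key(apply_key(message, alice_private), bob_public)

# Bob applies his private key, then Alice's public key.
received = apply_key(apply_key(sent, bob_private), alice_public)

print(sent != message, received == message)
```

In this toy setup Alice's modulus is deliberately smaller than Bob's so that her intermediate result always fits within his key; with numbers this small an attacker could factor n and recover the private keys almost instantly.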

Pretty Good Privacy - PGP


This uses a sequence of hashing, compression,
plain cryptography and public key
cryptography
It is extremely robust and gives military-grade
security to any person
Of course this is of concern to many
Governments and there have been demands
for master keys
