RAID is an acronym first defined by David A. Patterson, Garth A. Gibson and Randy Katz at the University of California, Berkeley in 1987 to describe a Redundant Array of Inexpensive Disks,[1] a technology that allowed computer users to achieve high levels of storage reliability from low-cost and less reliable PC-class disk-drive components, via the technique of arranging the devices into arrays for redundancy.

More recently, marketers representing industry RAID manufacturers reinvented the term to
describe a redundant array of independent disks as a means of disassociating a "low cost"
expectation from RAID technology.[2]

"RAID" is now used as an umbrella term for computer data storage schemes that can divide and
replicate data among multiple hard disk drives. The different schemes/architectures are named by
the word RAID followed by a number, as in RAID 0, RAID 1, etc. RAID's various designs balance two key goals: increasing data reliability and increasing input/output performance.
When multiple physical disks are set up to use RAID technology, they are said to be in a RAID
array. This array distributes data across multiple disks, but the array is seen by the computer user
and operating system as one single disk. RAID can be set up to serve several different purposes.

Contents

 1 Purpose and basics
 2 Principles
 3 Standard levels
 4 Nested (hybrid) RAID
 5 Non-standard levels
 6 Implementations
   6.1 Operating system based ("software RAID")
   6.2 Hardware-based
   6.3 Firmware/driver-based RAID
   6.4 Network-attached storage
   6.5 Hot spares
 7 Reliability terms
 8 Problems with RAID
   8.1 Correlated failures
   8.2 Atomicity
   8.3 Write cache reliability
   8.4 Equipment compatibility
   8.5 Data recovery in the event of a failed array
   8.6 Drive error recovery algorithms
   8.7 Other Problems and Viruses
 9 History
 10 See also
   10.1 Developers of RAID Hardware
 11 References
 12 Further reading
 13 External links

Purpose and basics


Redundancy is achieved by either writing the same data to multiple drives (known as mirroring),
or writing extra data (known as parity data) across the array, calculated such that the failure of
one disk (or possibly more, depending on the type of RAID) in the array will not result in loss
of data. A failed disk may be replaced by a new one, and the lost data reconstructed from the
remaining data and the parity data. Organizing disks into a redundant array decreases the usable
storage capacity. For instance, a 2-disk RAID 1 array loses half of the total capacity that would
have otherwise been available using both disks independently, and a RAID 5 array with several
disks loses the capacity of one disk. Other types of RAID arrays are arranged, for example, so
that they are faster to write to and read from than a single disk.
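To make the parity idea concrete, here is a minimal sketch (hypothetical Python, added for illustration; it is not from the original article) of how a lost block can be rebuilt: the parity block is the XOR of the data blocks, so XORing the survivors with the parity recovers the missing block.

```python
from functools import reduce

def xor_blocks(*blocks: bytes) -> bytes:
    """Byte-wise XOR of equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

# Three data blocks striped across three disks, plus one parity block.
d0, d1, d2 = b"AAAA", b"BBBB", b"CCCC"
parity = xor_blocks(d0, d1, d2)

# If the disk holding d1 fails, its contents can be rebuilt from the rest.
rebuilt = xor_blocks(d0, d2, parity)
assert rebuilt == d1
```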

There are various combinations of these approaches giving different trade-offs of protection
against data loss, capacity, and speed. RAID levels 0, 1, and 5 are the most commonly found,
and cover most requirements.

• RAID 0 (striped disks) distributes data across several disks in a way that gives improved speed at any given instant. If one disk fails, however, all of the data on the array will be lost, as there is neither parity nor mirroring.
• RAID 1 mirrors the contents of the disks, making a form of 1:1 ratio real-time backup. The contents of each disk in the array are identical to those of every other disk in the array.
• RAID 5 (striped disks with parity) combines three or more disks in a way that protects data against loss of any one disk. The storage capacity of the array is reduced by one disk.
• RAID 6 (striped disks with dual parity) combines four or more disks in a way that protects data against loss of any two disks.
• RAID 10 (or 1+0) uses both striping and mirroring. "01" or "0+1" is sometimes distinguished from "10" or "1+0": a striped set of mirrored subsets and a mirrored set of striped subsets are both valid, but distinct, configurations.

RAID can involve significant computation when reading and writing information. With
traditional "real" RAID hardware, a separate controller does this computation. In other cases the
operating system or simpler and less expensive controllers require the host computer's processor
to do the computing, which reduces the computer's performance on processor-intensive tasks
(see "Software RAID" and "Fake RAID" below). Simpler RAID controllers may provide only
levels 0 and 1, which require less processing.

RAID systems with redundancy continue working without interruption when one disk (or possibly more, depending on the type of RAID) of the array fails, although they are then vulnerable
to further failures. When the bad disk is replaced by a new one the array is rebuilt while the
system continues to operate normally. Some systems have to be powered down when removing
or adding a drive; others support hot swapping, allowing drives to be replaced without powering
down. RAID with hot-swapping is often used in high availability systems, where it is important
that the system remains running as much of the time as possible.

RAID is not a good alternative to backing up data. Data may become damaged or destroyed
without harm to the drive(s) on which they are stored. For example, some of the data may be
overwritten by a system malfunction; a file may be damaged or deleted by user error or malice
and not noticed for days or weeks; and, of course, the entire array is at risk of physical damage.

Note that a RAID controller itself can become the single point of failure within a system.

Principles

RAID combines two or more physical hard disks into a single logical unit by using either special hardware or software. Hardware solutions are often designed to present themselves to the attached system as a single hard drive, so that the operating system is unaware of the technical workings. For example, if a 1 TB RAID 5 array is configured using three 500 GB hard drives in hardware RAID, the operating system is simply presented with a "single" 1 TB volume. Software solutions are typically implemented in the operating system and present the RAID drive as a single volume to applications running on the operating system.

There are three key concepts in RAID: mirroring, the copying of data to more than one disk;
striping, the splitting of data across more than one disk; and error correction, where redundant
data is stored to allow problems to be detected and possibly fixed (known as fault tolerance).
Different RAID levels use one or more of these techniques, depending on the system
requirements. RAID's main aim can be either to improve reliability and availability of data,
ensuring that important data is available more often than not (e.g. a database of customer orders),
or merely to improve the access speed to files (e.g. for a system that delivers video on demand
TV programs to many viewers).

The configuration affects reliability and performance in different ways. The problem with using
more disks is that it is more likely that one will fail, but by using error checking the total system
can be made more reliable by being able to survive and repair the failure. Basic mirroring can
speed up reading data as a system can read different data from both the disks, but it may be slow
for writing if the configuration requires that both disks must confirm that the data is correctly
written. Striping is often used for performance, where it allows sequences of data to be read from
multiple disks at the same time. Error checking typically will slow the system down as data
needs to be read from several places and compared. The design of RAID systems is therefore a
compromise and understanding the requirements of a system is important. Modern disk arrays
typically provide the facility to select the appropriate RAID configuration.
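As a rough sketch of the first two techniques (hypothetical Python, illustrative only; real implementations work at the block-device level), striping spreads consecutive chunks across the disks round-robin, while mirroring writes every chunk to each disk.

```python
def stripe(data: bytes, disks: int, chunk: int = 4):
    """Round-robin striping: chunk i goes to disk i % disks (RAID 0 style)."""
    layout = [[] for _ in range(disks)]
    for i in range(0, len(data), chunk):
        layout[(i // chunk) % disks].append(data[i:i + chunk])
    return layout

def mirror(data: bytes, disks: int):
    """Mirroring: every disk holds a full copy of the data (RAID 1 style)."""
    return [data for _ in range(disks)]

print(stripe(b"ABCDEFGHIJKLMNOP", 2))  # disk 0: ABCD, IJKL; disk 1: EFGH, MNOP
print(mirror(b"ABCDEFGH", 2))          # two identical copies
```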

Standard levels


Main article: Standard RAID levels

A number of standard schemes have evolved which are referred to as levels. There were five
RAID levels originally conceived, but many more variations have evolved, notably several
nested levels and many non-standard levels (mostly proprietary).

Following is a brief summary of the most commonly used RAID levels.[3] Space efficiency is given as the amount of storage space available in an array of n disks, in multiples of the capacity of a single drive. For example, if an array holds n = 5 drives of 250 GB each and the efficiency is n-1, then the available space is 4 × 250 GB, or roughly 1 TB.
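Applying these space-efficiency figures is simple arithmetic; the short sketch below (hypothetical Python, not part of the original article) computes usable capacity for the common levels summarized afterwards.

```python
# Space efficiency for an array of n identical disks, in multiples of one disk.
EFFICIENCY = {
    "RAID 0": lambda n: n,
    "RAID 1": lambda n: 1,
    "RAID 5": lambda n: n - 1,
    "RAID 6": lambda n: n - 2,
}

def usable_capacity(level: str, disks: int, disk_gb: float) -> float:
    """Usable capacity in GB for `disks` drives of `disk_gb` GB each."""
    return EFFICIENCY[level](disks) * disk_gb

# The example from the text: five 250 GB drives at efficiency n-1.
print(usable_capacity("RAID 5", 5, 250))  # 1000 GB, roughly 1 TB
```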

RAID 0 (minimum 2 disks; space efficiency: n): "Striped set without parity" or "Striping". Provides improved performance and additional storage but no redundancy or fault tolerance. Any disk failure destroys the array, which has greater consequences with more disks in the array (at a minimum, catastrophic data loss is twice as severe compared to single drives without RAID). A single disk failure destroys the entire array because when data is written to a RAID 0 array, the data is broken into fragments. The number of fragments is dictated by the number of disks in the array. The fragments are written to their respective disks simultaneously on the same sector. This allows smaller sections of the entire chunk of data to be read off the drives in parallel, increasing bandwidth. RAID 0 does not implement error checking, so any error is unrecoverable. More disks in the array means higher bandwidth, but greater risk of data loss.

RAID 1 (minimum 2 disks; space efficiency: 1, the size of the smallest disk): "Mirrored set without parity" or "Mirroring". Provides fault tolerance from disk errors and failure of all but one of the drives. Increased read performance occurs when using a multi-threaded operating system that supports split seeks, as well as a very small performance reduction when writing. The array continues to operate so long as at least one drive is functioning. Using RAID 1 with a separate controller for each disk is sometimes called duplexing.

RAID 2 (minimum 3 disks): Hamming code parity. Disks are synchronized and striped in very small stripes, often in single bytes/words. Hamming-code error correction is calculated across corresponding bits on the disks, and is stored on multiple parity disks.

RAID 3 (minimum 3 disks; space efficiency: n-1): Striped set with dedicated parity, also called bit-interleaved parity or byte-level parity. This mechanism provides fault tolerance similar to RAID 5. However, because the stripe across the disks is a lot smaller than a filesystem block, reads and writes to the array perform like a single drive with a high linear write performance. For this to work properly, the drives must have synchronised rotation. If one drive fails, the performance doesn't change.

RAID 4 (minimum 3 disks; space efficiency: n-1): Block-level parity. Identical to RAID 3, but does block-level striping instead of byte-level striping. In this setup, files can be distributed between multiple disks. Each disk operates independently, which allows I/O requests to be performed in parallel, though data transfer speeds can suffer due to the type of parity. The error detection is achieved through dedicated parity and is stored on a separate, single disk unit.

RAID 5 (minimum 3 disks; space efficiency: n-1): Striped set with distributed parity, also called interleave parity. Distributed parity requires all drives but one to be present to operate; drive failure requires replacement, but the array is not destroyed by a single drive failure. Upon drive failure, any subsequent reads can be calculated from the distributed parity such that the drive failure is masked from the end user. The array will have data loss in the event of a second drive failure and is vulnerable until the data that was on the failed drive is rebuilt onto a replacement drive. A single drive failure in the set will result in reduced performance of the entire set until the failed drive has been replaced and rebuilt.

RAID 6 (minimum 4 disks; space efficiency: n-2): Striped set with dual distributed parity. Provides fault tolerance from two drive failures; the array continues to operate with up to two failed drives. This makes larger RAID groups more practical, especially for high-availability systems. This becomes increasingly important because large-capacity drives lengthen the time needed to recover from the failure of a single drive. Single-parity RAID levels are vulnerable to data loss until the failed drive is rebuilt: the larger the drive, the longer the rebuild will take. Dual parity gives time to rebuild the array without the data being at risk if a (single) additional drive fails before the rebuild is complete.

Nested (hybrid) RAID


Main article: Nested RAID levels

In what was originally termed hybrid RAID,[4] many storage controllers allow RAID levels to be
nested. The elements of a RAID may be either individual disks or RAIDs themselves. Nesting
more than two deep is unusual.

As there is no basic RAID level numbered larger than 9, nested RAIDs are usually
unambiguously described by concatenating the numbers indicating the RAID levels, sometimes
with a "+" in between. For example, RAID 10 (or RAID 1+0) consists of several level 1 arrays of
physical drives, each of which is one of the "drives" of a level 0 array striped over the level 1
arrays. It is not called RAID 01, to avoid confusion with RAID 0+1. When the top array is a RAID 0 (such as in RAID 10 and RAID 50), most vendors omit the "+", though RAID 5+0 is clearer.

• RAID 0+1: striped sets in a mirrored set (minimum four disks; even number of disks) provides fault tolerance and improved performance but increases complexity. The key difference from RAID 1+0 is that RAID 0+1 creates a second striped set to mirror a primary striped set. The array continues to operate with one or more drives failed in the same mirror set, but if drives fail on both sides of the mirror the data on the RAID system is lost.

• RAID 1+0: mirrored sets in a striped set (minimum four disks; even number of disks) provides fault tolerance and improved performance but increases complexity. The key difference from RAID 0+1 is that RAID 1+0 creates a striped set from a series of mirrored drives. In a failed-disk situation, RAID 1+0 performs better because all the remaining disks continue to be used. The array can sustain multiple drive losses so long as no mirror loses all its drives.

• RAID 5+0: stripe across distributed parity RAID systems.

• RAID 5+1: mirror striped set with distributed parity (some manufacturers label this as RAID 53).

Non-standard levels
Main article: Non-standard RAID levels

Many configurations other than the basic numbered RAID levels are possible, and many
companies, organizations, and groups have created their own non-standard configurations, in
many cases designed to meet the specialised needs of a small niche group. Most of these non-
standard RAID levels are proprietary.

Some of the more prominent modifications are:

• Storage Computer Corporation used to call a cached version of RAID 3 and 4 "RAID 7". Storage Computer Corporation is now defunct.
• EMC Corporation used to offer RAID S as an alternative to RAID 5 on their Symmetrix systems. Their latest generation of Symmetrix, the DMX series, does not support RAID S.
• The ZFS filesystem, available in Solaris, OpenSolaris, FreeBSD and Mac OS X, offers RAID-Z, which solves RAID 5's write hole problem.
• Hewlett-Packard's Advanced Data Guarding (ADG) is a form of RAID 6.
• NetApp's Data ONTAP uses RAID-DP (also referred to as "double", "dual", or "diagonal" parity), which is a form of RAID 6, but unlike many RAID 6 implementations does not use distributed parity as in RAID 5. Instead, two unique parity disks with separate parity calculations are used. This is a modification of RAID 4 with an extra parity disk.
• Accusys Triple Parity (RAID TP) implements three independent parities by extending RAID 6 algorithms on its FC-SATA and SCSI-SATA RAID controllers to tolerate three-disk failure.
• Linux MD RAID10 (RAID10) implements a general RAID driver that defaults to a standard RAID 1+0 with four drives, but can have any number of drives. MD RAID10 can run striped and mirrored with only two drives with the f2 layout (mirroring with striped reads; normal Linux software RAID 1 does not stripe reads, but can read in parallel).[5]
• Infrant (now part of Netgear) X-RAID offers dynamic expansion of a RAID 5 volume without having to back up or restore the existing content: just add larger drives one at a time, let each resync, then add the next drive until all drives are installed. The resulting volume capacity is increased without user downtime. (This is also possible in Linux using the mdadm utility, and has been possible in the EMC Clariion and HP MSA arrays for several years.)
• BeyondRAID, created by Data Robotics and used in the Drobo series of products, implements both mirroring and striping simultaneously or individually, dependent on disk and data context. It offers expandability without reconfiguration, the ability to mix and match drive sizes, and the ability to reorder disks. It supports NTFS, HFS+, FAT32, and EXT3 file systems.[6] It also uses thin provisioning to allow single volumes up to 16 TB, depending on host operating system support.
• Hewlett-Packard's EVA series arrays implement vRAID: vRAID-0, vRAID-1, vRAID-5, and vRAID-6. The EVA allows drives to be placed in groups (called Disk Groups) that form a pool of data blocks on top of which the RAID level is implemented. Any Disk Group may have "virtual disks" or LUNs of any vRAID type, including mixing vRAID types in the same Disk Group - a unique feature. vRAID levels are more closely aligned to nested RAID levels: vRAID-1 is actually a RAID 1+0 (or RAID 10), vRAID-5 is actually a RAID 5+0 (or RAID 50), etc. Also, drives may be added on the fly to an existing Disk Group, and the existing virtual disks' data is redistributed evenly over all the drives, allowing dynamic performance and capacity growth.
• IBM (among others) has implemented RAID 1E (Level 1 Enhanced). With an even number of disks it is similar to a RAID 10 array but, unlike a RAID 10 array, it can also be implemented with an odd number of drives. In either case, the total available disk space is n/2. It requires a minimum of three drives.

Implementations


The distribution of data across multiple drives can be managed either by dedicated hardware or
by software. When done in software the software may be part of the operating system or it may
be part of the firmware and drivers supplied with the card.

Operating system based ("software RAID")

Software implementations are now provided by many operating systems. A software layer sits
above the (generally block-based) disk device drivers and provides an abstraction layer between
the logical drives (RAIDs) and physical drives. The most common levels are RAID 0 (striping across multiple drives for increased space and performance) and RAID 1 (mirroring two drives), followed by RAID 1+0, RAID 0+1, and RAID 5 (data striping with parity).

• Apple's Mac OS X Server supports RAID 0, RAID 1, RAID 5 and RAID 1+0.[7]

• FreeBSD supports RAID 0, RAID 1, RAID 3, and RAID 5, and all layerings of the above, via GEOM modules[8][9] and ccd,[10] as well as supporting RAID 0, RAID 1, RAID-Z, and RAID-Z2 (similar to RAID 5 and RAID 6 respectively), plus nested combinations of those, via ZFS.

• Linux supports RAID 0, RAID 1, RAID 4, RAID 5, RAID 6 and all layerings of the above.[11][12]

• Microsoft's server operating systems support three RAID levels: RAID 0, RAID 1, and RAID 5. Some Microsoft desktop operating systems also support RAID; for example, Windows XP Professional supports RAID level 0, in addition to spanning multiple disks, but only when using dynamic disks and volumes. Windows XP supports RAID 0, 1, and 5 with a simple file patch.[13] RAID functionality in Windows is slower than hardware RAID, but allows a RAID array to be moved to another machine with no compatibility issues.

• NetBSD supports RAID 0, RAID 1, RAID 4 and RAID 5 (and any nested combination of those, like 1+0) via its software implementation, named RAIDframe.

• OpenBSD aims to support RAID 0, RAID 1, RAID 4 and RAID 5 via its software implementation, softraid.

• OpenSolaris and Solaris 10 support RAID 0, RAID 1, RAID 5 (or the similar "RAID Z" found only on ZFS), and RAID 6 (and any nested combination of those, like 1+0) via ZFS, and now have the ability to boot from a ZFS volume on both x86 and UltraSPARC. Through SVM, Solaris 10 and earlier versions support RAID 0, RAID 1, and RAID 5 on both system and data drives.

Software RAID has advantages and disadvantages compared to hardware RAID. The software must run on a host server attached to the storage, and the server's processor must dedicate processing time to run the RAID software. This overhead is negligible for RAID 0 and RAID 1, but may become significant when using parity-based arrays and either accessing several arrays at the same time or running many disks. Furthermore, all the buses between the processor and the disk controller must carry the extra data required by RAID, which may cause congestion.

Another concern with operating system-based RAID is the boot process. It can be difficult or
impossible to set up the boot process such that it can fail over to another drive if the usual boot
drive fails. Such systems can require manual intervention to make the machine bootable again
after a failure. There are exceptions to this: the LILO bootloader for Linux, the FreeBSD loader,[14] and some configurations of the GRUB bootloader natively understand RAID 1 and can load a kernel from it. If the BIOS recognizes a broken first disk and refers bootstrapping to the next disk, such a system will come up without intervention, but the BIOS might or might not do that as intended. A hardware RAID controller typically has explicit programming to decide that a disk is broken and fall through to the next disk.

Hardware RAID controllers can also carry battery-backed cache memory. For data safety in modern systems the user of software RAID might need to turn off the write-back cache on each disk (though some drives have their own battery or capacitors on the write-back cache, a UPS, and/or implement atomicity in various ways). Turning off the write cache has a performance penalty that can, depending on the workload and how well command queuing is supported in the disk system, be significant. The battery-backed cache on a RAID controller is one way to obtain a safe write-back cache.

Finally, operating system-based RAID usually uses formats specific to the operating system in question, so it cannot generally be used for partitions that are shared between operating systems as part of a multi-boot setup. However, such RAID disks can be moved from one computer to another computer with an operating system or file system of the same type, which can be more difficult with hardware RAID. For example, when one computer uses a hardware RAID controller from one manufacturer and another computer uses a controller from a different manufacturer, drives typically cannot be interchanged; and if the hardware controller dies before the disks do, data may become unrecoverable unless a hardware controller of the same type is obtained, unlike with firmware-based or software-based RAID.

Most operating system-based implementations allow RAIDs to be created from partitions rather
than entire physical drives. For instance, an administrator could divide an odd number of disks
into two partitions per disk, mirror partitions across disks and stripe a volume across the mirrored
partitions to emulate IBM's RAID 1E configuration. Using partitions in this way also allows
mixing reliability levels on the same set of disks. For example, one could have a very robust
RAID 1 partition for important files, and a less robust RAID 5 or RAID 0 partition for less
important data. (Some BIOS-based controllers offer similar features, e.g. Intel Matrix RAID.)
Using two partitions on the same drive in the same RAID is, however, dangerous. For example, having all partitions of a RAID 1 on the same drive will make all the data inaccessible if that single drive fails. Likewise, in a RAID 5 array composed of four drives of 250 + 250 + 250 + 500 GB, with the 500 GB drive split into two 250 GB partitions, a failure of this drive will remove two partitions from the array, causing all of the data held on it to be lost.

Hardware-based

Hardware RAID controllers use different, proprietary disk layouts, so it is not usually possible to
span controllers from different manufacturers. They do not require processor resources, the BIOS
can boot from them, and tighter integration with the device driver may offer better error
handling.

A hardware implementation of RAID requires at least a special-purpose RAID controller. On a desktop system this may be a PCI or PCI-e expansion card, or may be built into the
motherboard. Controllers supporting most types of drive may be used – IDE/ATA, SATA, SCSI,
SSA, Fibre Channel, sometimes even a combination. The controller and disks may be in a stand-
alone disk enclosure, rather than inside a computer. The enclosure may be directly attached to a
computer, or connected via SAN. The controller hardware handles the management of the drives,
and performs any parity calculations required by the chosen RAID level.

Most hardware implementations provide a read/write cache, which, depending on the I/O
workload, will improve performance. In most systems the write cache is non-volatile (i.e.
battery-protected), so pending writes are not lost on a power failure.

Hardware implementations provide guaranteed performance, add no overhead to the local CPU
complex and can support many operating systems, as the controller simply presents a logical disk
to the operating system.

Hardware implementations also typically support hot swapping, allowing failed drives to be
replaced while the system is running.

Firmware/driver-based RAID

Operating system-based RAID doesn't always protect the boot process and is generally
impractical on desktop versions of Windows (as described above). Hardware RAID controllers
are expensive and proprietary. To fill this gap, cheap "RAID controllers" were introduced that do
not contain a RAID controller chip, but simply a standard disk controller chip with special
firmware and drivers. During early stage bootup the RAID is implemented by the firmware;
when a protected-mode operating system kernel such as Linux or a modern version of Microsoft
Windows is loaded the drivers take over.

These controllers are described by their manufacturers as RAID controllers, and it is rarely made
clear to purchasers that the burden of RAID processing is borne by the host computer's central
processing unit, not the RAID controller itself, thus introducing the aforementioned CPU
overhead which hardware controllers don't suffer from. Firmware controllers often can only use
certain types of hard drives in their RAID arrays (e.g. SATA for Intel Matrix RAID, as there is
neither SCSI nor PATA support in modern Intel ICH southbridges; however, motherboard
makers implement RAID controllers outside of the southbridge on some motherboards). Before
their introduction, a "RAID controller" implied that the controller did the processing, and the
new type has become known in technically knowledgeable circles as "fake RAID" even though
the RAID itself is implemented correctly. Adaptec calls them "HostRAID".

Network-attached storage


Main article: Network-attached storage

While not directly associated with RAID, Network-attached storage (NAS) is an enclosure
containing disk drives and the equipment necessary to make them available over a computer
network, usually Ethernet. The enclosure is basically a dedicated computer in its own right,
designed to operate over the network without screen or keyboard. It contains one or more disk
drives; multiple drives may be configured as a RAID array.

Hot spares

Both hardware and software RAIDs with redundancy may support the use of hot spare drives: a drive physically installed in the array that remains inactive until an active drive fails, at which point the system automatically replaces the failed drive with the spare and rebuilds the array with the spare drive included. This reduces the mean time to recovery (MTTR), though it doesn't eliminate it completely. Subsequent additional failure(s) in the same RAID redundancy group before the array is fully rebuilt can result in loss of the data; rebuilding can take several hours, especially on busy systems.

Rapid replacement of failed drives is important as the drives of an array will all have had the
same amount of use, and may tend to fail at about the same time rather than randomly. RAID 6
without a spare uses the same number of drives as RAID 5 with a hot spare and protects data
against simultaneous failure of up to two drives, but requires a more advanced RAID controller.
Further, a hot spare can be shared by multiple RAID sets.

Reliability terms


Failure rate

Failure rate is only meaningful if failure is defined. If a failure is defined as the loss of a single
drive (logistical failure rate), the failure rate will be the sum of individual drives' failure rates. In
this case the failure rate of the RAID will be larger than the failure rate of its constituent drives.
On the other hand, if failure is defined as loss of data (system failure rate), then the failure rate
of the RAID will be less than that of the constituent drives. How much less depends on the type
of RAID.
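Stated as formulas (added here for illustration, assuming independent drives with identical, exponentially distributed lifetimes, an assumption not made explicit in the text):

```latex
\lambda_{\text{logistical}} = \sum_{i=1}^{n} \lambda_i = n\,\lambda_{\text{disk}}
\quad\Longrightarrow\quad
\mathrm{MTTF}_{\text{array}} = \frac{\mathrm{MTTF}_{\text{disk}}}{n}
```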

Mean time to data loss (MTTDL)


In this context, the average time before a loss of data in a given array.[15] Mean time to data loss
of a given RAID may be higher or lower than that of its constituent hard drives, depending upon
what type of RAID is employed. The referenced report assumes times to data loss are
exponentially distributed. This means 63.2% of all data loss will occur between time 0 and the
MTTDL.
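The 63.2% figure is a direct consequence of the exponential assumption (worked out here as a check, not stated in the source):

```latex
P(T \le \mathrm{MTTDL}) = 1 - e^{-\mathrm{MTTDL}/\mathrm{MTTDL}} = 1 - e^{-1} \approx 0.632
```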

Mean time to recovery (MTTR)

In arrays that include redundancy for reliability, this is the time following a failure to restore an
array to its normal failure-tolerant mode of operation. This includes time to replace a failed disk
mechanism as well as time to re-build the array (i.e. to replicate data for redundancy).

Unrecoverable bit error rate (UBE)

This is the rate at which a disk drive will be unable to recover data after application of cyclic
redundancy check (CRC) codes and multiple retries.

Write cache reliability

Some RAID systems use RAM write cache to increase performance. A power failure can result in
data loss unless this sort of disk buffer is supplemented with a battery to ensure that the buffer
has enough time to write from RAM back to disk.

Atomic write failure

Also known by various terms such as torn writes, torn pages, incomplete writes, interrupted
writes, non-transactional, etc.

Problems with RAID


Correlated failures

The theory behind the error correction in RAID assumes that failures of drives are independent.
Given these assumptions it is possible to calculate how often they can fail and to arrange the
array to make data loss arbitrarily improbable.

In practice, the drives are often the same ages, with similar wear. Since many drive failures are
due to mechanical issues which are more likely on older drives, this violates those assumptions
and failures are in fact statistically correlated. In practice, then, the chance of a second failure before the first has been recovered is not nearly as small as might be supposed, and data loss can occur at significant rates.[16]

Most hard drives have a quoted service life of five years. However, users should be aware that
drives are built to different levels of robustness, depending on their intended application.
Enterprise-class fibre-channel and SAS drives are generally designed to withstand the heavy use
in an array, but desktop-class drives are less robust, and using them in an array could shorten their life significantly.[citation needed]

Atomicity

This is a little-understood and rarely mentioned failure mode for redundant storage systems that do not utilize transactional features. Database researcher Jim Gray wrote "Update in Place is a
Poison Apple"[17] during the early days of relational database commercialization. However, this
warning largely went unheeded and fell by the wayside upon the advent of RAID, which many
software engineers mistook as solving all data storage integrity and reliability problems. Many
software programs update a storage object "in-place"; that is, they write a new version of the
object on to the same disk addresses as the old version of the object. While the software may also
log some delta information elsewhere, it expects the storage to present "atomic write semantics,"
meaning that the write of the data either occurred in its entirety or did not occur at all.

However, very few storage systems provide support for atomic writes, and even fewer specify
their rate of failure in providing this semantic. Note that during the act of writing an object, a
RAID storage device will usually be writing all redundant copies of the object in parallel,
although overlapped or staggered writes are more common when a single RAID processor is
responsible for multiple drives. Hence an error that occurs during the process of writing may
leave the redundant copies in different states, and furthermore may leave the copies in neither the
old nor the new state. The little known failure mode is that delta logging relies on the original
data being either in the old or the new state so as to enable backing out the logical change, yet
few storage systems provide an atomic write semantic on a RAID disk.

While the battery-backed write cache may partially solve the problem, it is applicable only to a
power failure scenario.

Since transactional support is not universally present in hardware RAID, many operating systems
include transactional support to protect against data loss during an interrupted write. Novell
Netware, starting with version 3.x, included a transaction tracking system. Microsoft introduced
transaction tracking via the journaling feature in NTFS. Ext4 has journaling with checksums;
ext3 has journaling without checksums but an "append-only" option, or ext3COW (Copy on
Write). If the journal itself in a filesystem is corrupted though, this can be problematic. The
journaling in NetApp WAFL file system gives atomicity by never updating the data in place, as
does ZFS. An alternative method to journaling is soft updates, which are used in some BSD-derived systems' implementations of UFS.

An unrecoverable bit error can present as a sector read failure. Some RAID implementations protect against this failure mode by remapping the bad sector, using the redundant data to retrieve a good copy of the data, and rewriting that good data to the newly mapped replacement sector. The UBE (unrecoverable bit error) rate is typically specified at 1 bit in 10^15 for enterprise-class disk drives (SCSI, FC, SAS) and 1 bit in 10^14 for desktop-class disk drives (IDE/ATA/PATA, SATA). Increasing disk capacities and large RAID 5 redundancy groups have led to an increasing inability to successfully rebuild a RAID group after a disk failure, because an unrecoverable sector is found on the remaining drives. Double protection schemes such as RAID 6 attempt to address this issue, but suffer from a very high write penalty.
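To see why this matters, the sketch below (hypothetical Python; the drive sizes are illustrative assumptions, not from the article) estimates the probability of hitting at least one unrecoverable read error while rebuilding a RAID 5 group, since a rebuild must read every bit of every surviving drive.

```python
def rebuild_ure_probability(surviving_drives: int, drive_tb: float,
                            ube_rate: float = 1e-14) -> float:
    """Probability of at least one unrecoverable bit error during a rebuild.

    A RAID 5 rebuild reads all surviving drives in full; each bit is assumed
    to fail independently with probability `ube_rate`.
    """
    bits_read = surviving_drives * drive_tb * 1e12 * 8
    return 1 - (1 - ube_rate) ** bits_read

# Illustrative: rebuilding a 6-drive RAID 5 of 2 TB desktop drives (UBE 1e-14)
print(rebuild_ure_probability(5, 2.0))            # roughly 0.55
# The same geometry with enterprise-class drives (UBE 1e-15)
print(rebuild_ure_probability(5, 2.0, 1e-15))     # roughly 0.08
```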

Write cache reliability

The disk system can acknowledge the write operation as soon as the data is in the cache, without waiting for the data to be physically written. This typically occurs in old, non-journaled systems such as FAT32, or if the Linux/Unix "writeback" option is chosen without protections like "soft updates" (to promote I/O speed whilst trading away data reliability). A power outage or system hang, such as a BSOD, can mean a significant loss of any data queued in such a cache.

Often a battery protects the write cache, mostly solving the problem. If a write fails because of a power failure, the controller may complete the pending writes as soon as it is restarted. This solution still has potential failure cases: the battery may have worn out, the power may be off for too long, the disks could be moved to another controller, or the controller itself could fail. Some disk systems provide the capability of testing the battery periodically; however, this leaves the system without a fully charged battery for several hours.

An additional concern about write cache reliability exists, specifically regarding devices
equipped with a write-back cache—a caching system which reports the data as written as soon as
it is written to cache, as opposed to the non-volatile medium.[18] The safer cache technique is
write-through, which reports transactions as written when they are written to the non-volatile
medium.

Equipment compatibility

The disk formats on different RAID controllers are not necessarily compatible, so that it may not
be possible to read a RAID on different hardware. Consequently a non-disk hardware failure
may require using identical hardware, or a data backup, to recover the data. Software RAID, however, such as that implemented in the Linux kernel, alleviates this concern, as the setup is not
hardware dependent, but runs on ordinary disk controllers. Additionally, Software RAID1 disks
(and some hardware RAID1 disks, for example Silicon Image 5744) can be read like normal
disks, so no RAID system is required to retrieve the data. Data recovery firms typically have a
very hard time recovering data from RAID drives, with the exception of RAID1 drives with
conventional data structure.

Data recovery in the event of a failed array

With larger disk capacities the odds of a disk failure during rebuild are not negligible. In that event
the difficulty of extracting data from a failed array must be considered. Only RAID 1 stores all
data on each disk. Although it may depend on the controller, some RAID 1 disks can be read as a
single conventional disk. This means a dropped RAID 1 disk, although damaged, can often be
reasonably easily recovered using a software recovery program or CHKDSK. If the damage is
more severe, data can often be recovered by professional drive specialists. RAID5 and other
striped or distributed arrays present much more formidable obstacles to data recovery in the
event the array goes down.

Drive error recovery algorithms

Many modern drives have internal error recovery algorithms that can take upwards of a minute
to recover and re-map data that the drive fails to easily read. Many RAID controllers will drop a
non-responsive drive in 8 seconds or so. This can cause the array to drop a good drive because it
has not been given enough time to complete its internal error recovery procedure, leaving the rest
of the array vulnerable. So-called enterprise class drives limit the error recovery time and prevent
this problem, but desktop drives can be quite risky for this reason. A fix is known for Western
Digital drives. A utility called WDTLER.exe can limit the error recovery time of a Western
Digital desktop drive so that it will not be dropped from the array for this reason. The utility
enables TLER (time limited error recovery) which limits the error recovery time to 7 seconds.
Western Digital enterprise class drives are shipped from the factory with TLER enabled to
prevent being dropped from RAID arrays. Similar technologies are used by Seagate, Samsung,
and Hitachi (reference http://en.wikipedia.org/wiki/TLER).

Other Problems and Viruses

While RAID may protect against drive failure, the data is still exposed to operator, software,
hardware and virus destruction. Most well-designed systems include separate backup systems
that hold copies of the data, but don't allow much interaction with it. Most copy the data and
remove it from the computer for safe storage.

History
Norman Ken Ouchi at IBM was awarded a 1978 U.S. patent 4,092,732[19] titled "System for
recovering data stored in failed memory unit." The claims for this patent describe what would
later be termed RAID 5 with full stripe writes. This 1978 patent also mentions that disk
mirroring or duplexing (what would later be termed RAID 1) and protection with dedicated
parity (that would later be termed RAID 4) were prior art at that time.

The term RAID was first defined by David A. Patterson, Garth A. Gibson and Randy Katz at the
University of California, Berkeley in 1987. They studied the possibility of using two or more
drives to appear as a single device to the host system and published a paper: "A Case for
Redundant Arrays of Inexpensive Disks (RAID)" in June 1988 at the SIGMOD conference.[1]

This specification suggested a number of prototype RAID levels, or combinations of drives. Each
had theoretical advantages and disadvantages. Over the years, different implementations of the
RAID concept have appeared. Most differ substantially from the original idealized RAID levels,
but the numbered names have remained. This can be confusing, since one implementation of
RAID 5, for example, can differ substantially from another. RAID 3 and RAID 4 are often
confused and even used interchangeably.

Standard RAID levels

The standard RAID levels are a basic set of RAID configurations that employ striping, mirroring, or parity. The standard RAID levels can be nested for other benefits (see Nested RAID levels for modes like 1+0 or 0+1).

Contents

 1 Concatenation (SPAN)
 2 RAID 0
   2.1 RAID 0 failure rate
   2.2 RAID 0 performance
 3 RAID 1
   3.1 RAID 1 failure rate
   3.2 RAID 1 performance
 4 RAID 2
 5 RAID 3
 6 RAID 4
 7 RAID 5
   7.1 RAID 5 parity handling
   7.2 RAID 5 disk failure rate
   7.3 RAID 5 performance
   7.4 RAID 5 usable size
   7.5 ZFS RAID 5
 8 RAID 6
   8.1 Redundancy and data loss recovery capability
   8.2 Performance (speed)
   8.3 Efficiency (potential waste of storage)
   8.4 Implementation
 9 Non-standard RAID levels
 10 Alternatives
   10.1 SLED
 11 See also
 12 References
 13 External links

Concatenation (SPAN)


Diagram of a JBOD setup.

The controller treats each drive as a stand-alone disk, therefore each drive is an independent
logical drive. Concatenation does not provide data redundancy.

Concatenation or spanning of disks is not one of the numbered RAID levels, but it is a popular
method for combining multiple physical disk drives into a single virtual disk. It provides no data
redundancy. As the name implies, disks are merely concatenated together, end to beginning, so
they appear to be a single large disk.

Concatenation may be thought of as the inverse of partitioning. Whereas partitioning takes one
physical drive and creates two or more logical drives, concatenation uses two or more physical
drives to create one logical drive.

In that it consists of an array of independent disks, it can be thought of as a distant relative of RAID. Concatenation is sometimes used to turn several odd-sized drives into one larger useful drive, which cannot be done with RAID 0. For example, JBOD ("just a bunch of disks") could combine 3 GB, 15 GB, 5.5 GB, and 12 GB drives into a 35.5 GB logical drive, which is often more useful than the individual drives separately.
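A small sketch (hypothetical Python, using the drive sizes from the example above) makes the capacity difference between concatenation and RAID 0 striping explicit.

```python
def jbod_capacity(drives):
    """Concatenation (JBOD): drive capacities simply add up."""
    return sum(drives)

def raid0_capacity(drives):
    """RAID 0: every drive contributes only as much as the smallest one."""
    return len(drives) * min(drives)

drives_gb = [3, 15, 5.5, 12]
print(jbod_capacity(drives_gb))   # 35.5 GB
print(raid0_capacity(drives_gb))  # 12.0 GB (4 x 3 GB)
```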

In the diagram to the right, data are concatenated from the end of disk 0 (block A63) to the
beginning of disk 1 (block A64); end of disk 1 (block A91) to the beginning of disk 2 (block
A92). If RAID 0 were used, then disk 0 and disk 2 would be truncated to 28 blocks, the size of
the smallest disk in the array (disk 1) for a total size of 84 blocks.

Some RAID controllers use JBOD to refer to configuring drives without RAID features. Each
drive shows up separately in the OS. This JBOD is not the same as concatenation.

Many Linux distributions use the terms "linear mode" or "append mode". The Mac OS X 10.4
implementation – called a "Concatenated Disk Set" – does not leave the user with any usable
data on the remaining drives if one drive fails in a concatenated disk set, although the disks
otherwise operate as described above.
Concatenation is one of the uses of the Logical Volume Manager in Linux, which can be used to
create virtual drives spanning multiple physical drives and/or partitions.

Microsoft's Windows Home Server employs drive extender technology, whereby an array of
independent disks (JBOD) are combined by the OS to form a single pool of available storage.
This storage is presented to the user as a single set of network shares. Drive extender technology
expands on the normal features of concatenation by providing data redundancy through software
– a shared folder can be marked for duplication, which signals to the OS that a copy of the data
should be kept on multiple physical disks, whilst the user will only ever see a single instance of
their data.[1]

The ZFS combined filesystem and RAID software does not support this mode for pool configuration: when disks are added to a storage pool (even if they are of differing sizes), they are always arranged in a (dynamic) stripe. When used in the context of ZFS, the term JBOD refers to seeing the drives/LUNs without a hardware (or other software) RAID.

RAID 0

Diagram of a RAID 0 setup.

A RAID 0 (also known as a stripe set or striped volume) splits data evenly across two or more
disks (striped) with no parity information for redundancy. It is important to note that RAID 0
was not one of the original RAID levels and provides no data redundancy. RAID 0 is normally
used to increase performance, although it can also be used as a way to create a small number of
large virtual disks out of a large number of small physical ones.

A RAID 0 can be created with disks of differing sizes, but the storage space added to the array
by each disk is limited to the size of the smallest disk. For example, if a 120 GB disk is striped
together with a 100 GB disk, the size of the array will be 200 GB.
RAID 0 failure rate

Although RAID 0 was not specified in the original RAID paper, an idealized implementation of
RAID 0 would split I/O operations into equal-sized blocks and spread them evenly across two
disks. RAID 0 implementations with more than two disks are also possible, though the group
reliability decreases with member size.

Reliability of a given RAID 0 set is equal to the average reliability of each disk divided by the number of disks in the set:

MTTF(array) ≈ MTTF(disk) / n

That is, reliability (as measured by mean time to failure (MTTF) or mean time between failures (MTBF)) is roughly inversely proportional to the number of members, so a set of two disks is roughly half as reliable as a single disk. If there were a probability of 5% that a disk would fail within three years, then in a two-disk array that probability would rise to 1 - (1 - 0.05)^2 = 9.75%.
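The same arithmetic in a short sketch (hypothetical Python, added for illustration), generalized to any number of member disks:

```python
def raid0_failure_probability(p_disk: float, disks: int) -> float:
    """Probability that a RAID 0 set loses data: any single disk failure kills it."""
    return 1 - (1 - p_disk) ** disks

print(raid0_failure_probability(0.05, 2))  # 0.0975 -> 9.75% over three years
print(raid0_failure_probability(0.05, 4))  # about 18.5%
```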

The reason for this is that the file system is distributed across all disks. When a drive fails the file
system cannot cope with such a large loss of data and coherency since the data is "striped" across
all drives (the data cannot be recovered without the missing disk). Data can be recovered using
special tools, however, this data will be incomplete and most likely corrupt, and data recovery is
typically very costly and not guaranteed.

RAID 0 performance

While the block size can technically be as small as a byte, it is almost always a multiple of the
hard disk sector size of 512 bytes. This lets each drive seek independently when randomly
reading or writing data on the disk. How much the drives act independently depends on the
access pattern from the file system level. For reads and writes that are larger than the stripe size,
such as copying files or video playback, the disks will be seeking to the same position on each
disk, so the seek time of the array will be the same as that of a single drive. For reads and writes
that are smaller than the stripe size, such as database access, the drives will be able to seek
independently. If the sectors accessed are spread evenly between the two drives, the apparent
seek time of the array will be half that of a single drive (assuming the disks in the array have
identical access time characteristics). The transfer speed of the array will be the transfer speed of
all the disks added together, limited only by the speed of the RAID controller. Note that these
performance scenarios are in the best case with optimal access patterns.
RAID 0 is useful for setups such as large read-only NFS servers where mounting many disks is
time-consuming or impossible and redundancy is irrelevant.

RAID 0 is also used in some gaming systems where performance is desired and data integrity is
not very important. However, real-world tests with games have shown that RAID-0 performance
gains are minimal, although some desktop applications will benefit.[2][3] Another article examined
these claims and concludes: "Striping does not always increase performance (in certain situations
it will actually be slower than a non-RAID setup), but in most situations it will yield a significant
improvement in performance." [4]

RAID 1

Diagram of a RAID 1 setup

A RAID 1 creates an exact copy (or mirror) of a set of data on two or more disks. This is useful
when read performance or reliability are more important than data storage capacity. Such an
array can only be as big as the smallest member disk. A classic RAID 1 mirrored pair contains
two disks (see diagram), which increases reliability geometrically over a single disk. Since each
member contains a complete copy of the data, and can be addressed independently, ordinary
wear-and-tear reliability is raised by the power of the number of self-contained copies.

RAID 1 failure rate

As a trivial example, consider a RAID 1 with two identical models of a disk drive, each with a 5% probability of failing within three years. Provided that the failures are statistically independent, the probability of both disks failing during the three-year lifetime is

0.05 × 0.05 = 0.0025 = 0.25%.

Thus, the probability of losing all data is 0.25% if the first failed disk is never replaced. If only one of the disks fails, no data would be lost, assuming the failed disk is replaced before the second disk fails.

However, since two identical disks are used and since their usage patterns are also identical, their
failures can not be assumed to be independent. Thus, the probability of losing all data, if the first
failed disk is not replaced, is considerably higher than 0.25% but still below 5%.

RAID 1 performance

Since all the data exists in two or more copies, each with its own hardware, the read performance
can go up roughly as a linear multiple of the number of copies. That is, a RAID 1 array of two
drives can be reading in two different places at the same time, though not all implementations of
RAID 1 do this.[5] To maximize performance benefits of RAID 1, independent disk controllers
are recommended, one for each disk. Some refer to this practice as splitting or duplexing. When
reading, both disks can be accessed independently and requested sectors can be split evenly
between the disks. For the usual mirror of two disks, this would, in theory, double the transfer
rate when reading. The apparent access time of the array would be half that of a single drive.
Unlike RAID 0, this would be for all access patterns, as all the data are present on all the disks.
In reality, the need to move the drive heads to the next block (to skip blocks already read by the
other drives) can effectively mitigate speed advantages for sequential access. Read performance
can be further improved by adding drives to the mirror. Many older IDE RAID 1 controllers read
only from one disk in the pair, so their read performance is always that of a single disk. Some
older RAID 1 implementations would also read both disks simultaneously and compare the data
to detect errors. The error detection and correction on modern disks makes this less useful in
environments requiring normal availability. When writing, the array performs like a single disk,
as all mirrors must be written with the data. Note that these performance scenarios are in the best
case with optimal access patterns.

RAID 1 has many administrative advantages. For instance, in some environments, it is possible
to "split the mirror": declare one disk as inactive, do a backup of that disk, and then "rebuild" the
mirror. This is useful in situations where the file system must be constantly available. This
requires that the application supports recovery from the image of data on the disk at the point of
the mirror split. This procedure is less critical in the presence of the "snapshot" feature of some
file systems, in which some space is reserved for changes, presenting a static point-in-time view
of the file system. Alternatively, a new disk can be substituted so that the inactive disk can be
kept in much the same way as traditional backup. To keep redundancy during the backup
process, some controllers support adding a third disk to an active pair. After a rebuild to the third
disk completes, it is made inactive and backed up as described above.

RAID 2
A RAID 2 stripes data at the bit (rather than block) level, and uses a Hamming code for error
correction. The disks are synchronized by the controller to spin in perfect tandem. Extremely
high data transfer rates are possible. This is the only original level of RAID that is not currently
used.
The use of the Hamming(7,4) code (four data bits plus three parity bits) also permits using 7
disks in RAID 2, with 4 being used for data storage and 3 being used for error correction.
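For illustration, a brief sketch of the Hamming(7,4) code itself (hypothetical Python, not from the article; RAID 2 computes such code bits in the controller and spreads them across disks rather than in software like this):

```python
def hamming74_encode(d1, d2, d3, d4):
    """Encode 4 data bits into a 7-bit Hamming codeword (p1, p2, d1, p3, d2, d3, d4)."""
    p1 = d1 ^ d2 ^ d4
    p2 = d1 ^ d3 ^ d4
    p3 = d2 ^ d3 ^ d4
    return [p1, p2, d1, p3, d2, d3, d4]

def hamming74_correct(code):
    """Locate and flip a single corrupted bit using the parity-check syndrome."""
    c = list(code)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]   # checks codeword positions 1, 3, 5, 7
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]   # checks codeword positions 2, 3, 6, 7
    s3 = c[3] ^ c[4] ^ c[5] ^ c[6]   # checks codeword positions 4, 5, 6, 7
    syndrome = s1 + 2 * s2 + 4 * s3  # 1-based index of the bad bit, 0 if none
    if syndrome:
        c[syndrome - 1] ^= 1
    return c

word = hamming74_encode(1, 0, 1, 1)
word[4] ^= 1                          # corrupt one bit in storage
assert hamming74_correct(word) == hamming74_encode(1, 0, 1, 1)
```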

RAID 2 is the only standard RAID level, other than some implementations of RAID 6, which
can automatically recover accurate data from single-bit corruption in data. Other RAID levels
can detect single-bit corruption in data, or can sometimes reconstruct missing data, but cannot
reliably resolve contradictions between parity bits and data bits without human intervention.

(Multiple-bit corruption is possible though extremely rare. RAID 2 can detect but not repair
double-bit corruption.)

Hard disks soon afterwards implemented internal error correction that also used Hamming codes, so RAID 2's own error correction became redundant and added unnecessary complexity. Like RAID 3, this level quickly fell out of use and is now obsolete. There are no commercial applications of RAID 2.[6][7]

RAID 3

Diagram of a RAID 3 setup of 6-byte blocks and two parity bytes, shown are two blocks of data (orange
and green)

A RAID 3 uses byte-level striping with a dedicated parity disk. RAID 3 is very rare in practice.
One of the side effects of RAID 3 is that it generally cannot service multiple requests
simultaneously. This comes about because any single block of data will, by definition, be spread
across all members of the set and will reside in the same location. So, any I/O operation requires
activity on every disk and usually requires synchronized spindles.

In our example, a request for block "A" consisting of bytes A1-A6 would require all three data
disks to seek to the beginning (A1) and reply with their contents. A simultaneous request for
block B would have to wait.
However, the performance characteristic of RAID 3 is very consistent. Unlike higher RAID levels, the size of a stripe is less than the size of a sector or OS block, so that, for both reading and writing, the entire stripe is accessed every time. The performance of the array is therefore identical to the performance of one disk in the array, except for the transfer rate, which is multiplied by the number of data drives (i.e., excluding the parity drive).

This makes it best for applications that demand the highest transfer rates in long sequential reads
and writes, for example uncompressed video editing. Applications that make small reads and
writes from random places over the disk will get the worst performance out of this level.[7]

The requirement that all disks spin in lockstep (with synchronized spindles) added design considerations to a level that did not give significant advantages over other RAID levels, so it quickly fell out of use and is now largely obsolete.[6] Both RAID 3 and RAID 4 were quickly replaced by RAID 5.[8] However, this level does have commercial vendors making implementations of it. It is usually implemented in hardware, and the performance issues are addressed by using large disk caches.[7]

RAID 4

Diagram of a RAID 4 setup with dedicated parity disk with each color representing the group of blocks in
the respective parity block (a stripe)

A RAID 4 uses block-level striping with a dedicated parity disk. This allows each member of the
set to act independently when only a single block is requested. If the disk controller allows it, a
RAID 4 set can service multiple read requests simultaneously. RAID 4 looks similar to RAID 5
except that it does not use distributed parity, and similar to RAID 3 except that it stripes at the
block level, rather than the byte level. Generally, RAID 4 is implemented with hardware support
for parity calculations, and a minimum of 3 disks is required for a complete RAID 4
configuration.
In the example on the right, a read request for block A1 would be serviced by disk 0. A
simultaneous read request for block B1 would have to wait, but a read request for B2 could be
serviced concurrently by disk 1.

For writes, however, the parity disk becomes a bottleneck: simultaneous writes to A1 and B2
would, in addition to the writes to their respective data drives, both need to write to the parity
drive. RAID 4 therefore places a very high load on the parity drive, as illustrated in the sketch below.
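
The bottleneck can be sketched in a few lines of Python (the disk numbering is illustrative, not tied
to any particular controller): two writes aimed at different data disks still collide on the single
parity drive.

    DATA_DISKS = 3
    PARITY_DISK = DATA_DISKS            # the dedicated parity drive

    def disks_touched_by_write(stripe, offset):
        # Block-level striping: the offset within the stripe selects the data disk,
        # and every write must also update the dedicated parity disk.
        data_disk = offset % DATA_DISKS           # e.g. A1 -> disk 0, B2 -> disk 1
        return {data_disk, PARITY_DISK}

    w1 = disks_touched_by_write(stripe=0, offset=0)   # write to A1
    w2 = disks_touched_by_write(stripe=1, offset=1)   # write to B2
    print(w1 & w2)                                    # {3}: both queue on the parity disk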

The performance of RAID 4 in this configuration can be very poor, but unlike RAID 3 it does
not need synchronized spindles. However, if RAID 4 is implemented on synchronized drives and
the stripe size is reduced below the OS block size, a RAID 4 array then has the same performance
pattern as a RAID 3 array. Today only one notable enterprise-class implementation of RAID 4
remains, in the network storage systems of NetApp, where the performance problems described
above are addressed by a special cached "full stripe recording" mode in its WAFL (Write Anywhere
File Layout) filesystem.

Both RAID 3 and RAID 4 were quickly replaced by RAID 5.[8]

[edit] RAID 5

Diagram of a RAID 5 setup with distributed parity with each color representing the group of blocks in the
respective parity block (a stripe). The diagram shows the left-asymmetric algorithm.

A RAID 5 uses block-level striping with parity data distributed across all member disks. RAID 5
has achieved popularity because of its low cost of redundancy. This can be seen by comparing
the number of drives needed to achieve a given capacity. RAID 1 or RAID 1+0, which yield
redundancy, give only s / 2 storage capacity, where s is the sum of the capacities of n drives
used. In RAID 5, the yield is s × (n − 1) / n. As an example, four 1 TB drives can be made into a
2 TB redundant array under RAID 1 or RAID 1+0, but the same four drives can be used to build
a 3 TB array under RAID 5. Although RAID 5 is commonly implemented in a disk controller,
some with hardware support for parity calculations (hardware RAID cards) and some using the
main system processor (motherboard based RAID controllers), it can also be done at the
operating system level, e.g., using Windows Dynamic Disks or with mdadm in Linux. A
minimum of three disks is required for a complete RAID 5 configuration. In some
implementations a degraded RAID 5 disk set can be made (a three-disk set of which only two are
online), while mdadm supports a fully functional (non-degraded) RAID 5 setup with two disks,
which functions as a slow RAID 1 but can be expanded with further drives.

In the example, a read request for block A1 would be serviced by disk 0. A simultaneous read
request for block B1 would have to wait, but a read request for B2 could be serviced
concurrently by disk 1.

[edit] RAID 5 parity handling

A concurrent series of blocks (one on each of the disks in an array) is collectively called a stripe.
If another block, or some portion thereof, is written on that same stripe, the parity block, or some
portion thereof, is recalculated and rewritten. For small writes, this requires the following steps
(a small XOR sketch follows the list):

 Read the old data block
 Read the old parity block
 Compare the old data block with the write request: for each bit that has flipped (changed from
0 to 1, or from 1 to 0) in the data block, flip the corresponding bit in the parity block
 Write the new data block
 Write the new parity block
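
A minimal Python sketch of these steps, assuming equal-sized blocks and ignoring caching and
ordering concerns, shows why only two reads and two writes are needed: the new parity is the old
parity with exactly the changed data bits flipped.

    def raid5_small_write(old_data: bytes, old_parity: bytes, new_data: bytes):
        # Steps 1-2: the old data and old parity blocks have been read (arguments here).
        # Step 3: flip in the parity every bit that changed in the data block.
        new_parity = bytes(p ^ od ^ nd
                           for p, od, nd in zip(old_parity, old_data, new_data))
        # Steps 4-5: the new data and new parity blocks are written (returned here).
        return new_data, new_parity

    d_old, p_old = b"\x0f\x0f\x0f\x0f", b"\xaa\xaa\xaa\xaa"
    d_new = b"\x0f\x0f\xff\x0f"
    print(raid5_small_write(d_old, p_old, d_new))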

The disk used for the parity block is staggered from one stripe to the next, hence the term
distributed parity blocks. RAID 5 writes are expensive in terms of disk operations and traffic
between the disks and the controller.

The parity blocks are not read on data reads, since this would be unnecessary overhead and
would diminish performance. The parity blocks are read, however, when a read of a block in the
stripe results in a CRC error: the corresponding sectors of the remaining data blocks and of the
parity block in the stripe are used to reconstruct the errant sector, so the CRC error is hidden
from the main computer. Likewise, should a disk fail in the array, the parity
blocks from the surviving disks are combined mathematically with the data blocks from the
surviving disks to reconstruct the data on the failed drive on-the-fly.

This is sometimes called Interim Data Recovery Mode. The computer knows that a disk drive
has failed, but this is only so that the operating system can notify the administrator that a drive
needs replacement; applications running on the computer are unaware of the failure. Reading and
writing to the drive array continues seamlessly, though with some performance degradation.
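
A minimal sketch of the reconstruction itself, assuming the simple XOR parity described above: the
missing block is recovered as the XOR of the surviving blocks (data and parity) of the same stripe.

    from functools import reduce

    def rebuild_missing_block(surviving_blocks):
        # XOR the surviving blocks of the stripe together, byte by byte.
        return bytes(reduce(lambda a, b: a ^ b, column)
                     for column in zip(*surviving_blocks))

    a1, a2, a3 = b"\x01\x02", b"\x10\x20", b"\x55\xaa"
    parity = bytes(x ^ y ^ z for x, y, z in zip(a1, a2, a3))
    # The disk holding a2 fails; its contents come back from the rest of the stripe:
    assert rebuild_missing_block([a1, a3, parity]) == a2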

[edit] RAID 5 disk failure rate

The maximum number of drives in a RAID 5 redundancy group is theoretically unlimited, but it
is common practice to limit the number of drives. The tradeoffs of larger redundancy groups are
greater probability of a simultaneous double disk failure, the increased time to rebuild a
redundancy group, and the greater probability of encountering an unrecoverable sector during
RAID reconstruction. As the number of disks in a RAID 5 group increases, the mean time
between failures (MTBF, the reciprocal of the failure rate) can become lower than that of a
single disk. This happens when the likelihood of a second disk failing among the N − 1 remaining
disks, within the time it takes to detect, replace, and rebuild the first failed disk, becomes larger
than the likelihood of a single disk failing.

Worsening this issue has been a relatively stagnant unrecoverable read-error rate of disks for the
last few years, which is typically on the order of one error in 10^14 bits read for SATA drives.[9] As disk
densities have gone up drastically (> 1 TB) in recent years, it actually becomes probable with a
~10 TB array that an unrecoverable read error will occur during a RAID-5 rebuild.[9] Some of
these potential errors can be avoided in RAID systems that automatically and periodically test
their disks at times of low demand.[10] Expensive enterprise-class disks with lower densities and
better error rates of about 1 in 10^15 bits can improve the odds slightly as well.[citation needed] But the
general problem remains that, for modern drives that regularly use most of their capacity, the
disk capacity (in bits) is now of the same order of magnitude as the reciprocal of the
unrecoverable read-error rate, unlike decades earlier when the two were a safer two or more
orders of magnitude apart.[citation needed] Furthermore, a RAID rebuild reads the full contents of
every surviving disk at maximum throughput, making it quite likely that such an error will be
encountered in the short time the rebuild runs (see the sketch below).[citation needed] Even
enterprise-class RAID 5 setups will suffer unrecoverable errors in coming years, unless
manufacturers are able to establish a new level of mass-storage reliability through lower failure
rates or improved error recovery.[citation needed]
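
The scale of the problem can be sketched with a back-of-the-envelope calculation, assuming one
unrecoverable read error per 10^14 bits read and independent errors (a simplification, not a drive
specification):

    import math

    def p_ure_during_rebuild(bytes_read, bits_per_error=1e14):
        # Poisson approximation: probability of at least one unrecoverable
        # read error while re-reading `bytes_read` bytes of surviving data.
        return 1 - math.exp(-(bytes_read * 8) / bits_per_error)

    print(f"{p_ure_during_rebuild(10e12):.0%}")          # ~10 TB rebuild read -> about 55%
    print(f"{p_ure_during_rebuild(10e12, 1e15):.0%}")    # 10^15-class enterprise drives -> about 8%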

Nevertheless, there are some short-term strategies for reducing the possibility of failures during
recovery. RAID 6 (below) provides dual-parity protection, allowing the RAID system to
maintain single-failure tolerance until the failed disk has been replaced and the second parity
stripe rebuilt. Some RAID implementations include a hot-spare disk to speed up replacement.
Also, drive failures do not occur randomly, but follow the "bathtub curve". Most failures occur
early and late in the life of the device, and are often connected to production in a way that skews
the failures toward specific manufacturing lots. RAID vendors can try to avoid these lot-based
problems by ensuring that all the disks in a redundancy group are from different lots.[citation needed]

Solid-state drives (SSDs) may present a revolutionary instead of evolutionary way of dealing
with increasing RAID-5 rebuild limitations. With encouragement from many flash-SSD
manufacturers, JEDEC is preparing to set standards in 2009 for measuring UBER (uncorrectable
bit error rates) and "raw" bit error rates (error rates before ECC, error correction code).[11] But
even the economy-class Intel X25-M SSD claims an unrecoverable error rate of 1 sector in 10^15
bits read and an MTBF of two million hours.[12] Ironically, the much faster throughput of SSDs
(STEC claims its enterprise-class Zeus SSDs exceed 200 times the transactional performance of
today's 15k-RPM, enterprise-class HDDs)[13] suggests that a similar error rate (1 in 10^15) will
result in a two-order-of-magnitude shortening of MTBF.[citation needed]

[edit] RAID 5 performance

RAID 5 implementations suffer from poor performance when faced with a workload which
includes many writes which are smaller than the capacity of a single stripe.[citation needed] This is
because parity must be updated on each write, requiring read-modify-write sequences for both
the data block and the parity block. More complex implementations may include a non-volatile
write back cache to reduce the performance impact of incremental parity updates.

Random write performance is poor, especially at high concurrency levels common in large multi-
user databases. The read-modify-write cycle requirement of RAID 5's parity implementation
penalizes random writes by as much as an order of magnitude compared to RAID 0.[14]

Performance problems can be so severe that some database experts have formed a group called
BAARF — the Battle Against Any Raid Five.[15]

The read performance of RAID 5 is almost as good as RAID 0 for the same number of disks.
Except for the parity blocks, the distribution of data over the drives follows the same pattern as
RAID 0. The reason RAID 5 is slightly slower is that the disks must skip over the parity blocks.

In the event of a system failure while there are active writes, the parity of a stripe may become
inconsistent with the data. If this is not detected and repaired before a disk or block fails, data
loss may ensue as incorrect parity will be used to reconstruct the missing block in that stripe.
This potential vulnerability is sometimes known as the write hole. Battery-backed cache and
similar techniques are commonly used to reduce the window of opportunity for this to occur. The
same issue occurs for RAID-6.

[edit] RAID 5 usable size

Parity data uses up the capacity of one drive in the array (this can be seen by comparing it with
RAID 4: RAID 5 distributes the parity data across the disks, while RAID 4 centralizes it on one
disk, but the amount of parity data is the same). If the drives vary in capacity, the smallest of
them sets the limit. Therefore, the usable capacity of a RAID 5 array is (N − 1) × Smin, where
N is the total number of drives in the array and Smin is the capacity of the smallest drive in the
array.

The number of hard disks that can belong to a single array is theoretically unlimited.
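
The capacity formulas used in this article (RAID 5 here, RAID 6 below) can be collapsed into one
small helper; the parameter names are only for this sketch.

    def usable_capacity(n_drives, s_min, parity_drives=1):
        # (N - 1) x Smin for RAID 5, (N - 2) x Smin for RAID 6.
        return (n_drives - parity_drives) * s_min

    print(usable_capacity(4, 1.0))                    # RAID 5, four 1 TB drives -> 3.0 TB
    print(usable_capacity(4, 1.0, parity_drives=2))   # RAID 6, four 1 TB drives -> 2.0 TB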

[edit] ZFS RAID 5

ZFS's RAID-Z is based on the ideas behind RAID 5. It is similar to RAID 5 but uses a variable stripe
width to eliminate the RAID 5 write hole (stripe corruption caused by loss of power between
data and parity updates).[16]

[edit] RAID 6
Diagram of a RAID 6 setup, which is identical to RAID 5 other than the addition of a second parity block

[edit] Redundancy and data loss recovery capability

RAID 6 extends RAID 5 by adding an additional parity block; thus it uses block-level striping
with two parity blocks distributed across all member disks.[citation needed] It was not one of the
original RAID levels.

RAID 5 can be seen as a special case of a Reed-Solomon code.[17] RAID 5, being a degenerate
case, requires only addition in the Galois field. Since the operations are on bits, the field used is the
binary Galois field GF(2). In cyclic representations of binary Galois fields, addition is computed
by a simple XOR.[citation needed]

After understanding RAID 5 as a special case of a Reed-Solomon code, it is easy to see that it is
possible to extend the approach to produce redundancy simply by producing another syndrome;
typically a polynomial in GF(28) (8 means we are operating on bytes). By adding additional
syndromes it is possible to achieve any number of redundant disks, and recover from the failure
of that many drives anywhere in the array, but RAID 6 refers to the specific case of two
syndromes.[citation needed]
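
The following Python sketch computes the two RAID 6 syndromes in the way commonly described for
Reed-Solomon-based implementations, using GF(2^8) with the generator polynomial
x^8 + x^4 + x^3 + x^2 + 1 and g = 2; it illustrates the construction and is not any particular
vendor's code.

    def gf_mul2(x):
        # Multiply by the generator element 2 in GF(2^8), reducing modulo 0x11D.
        x <<= 1
        return (x ^ 0x11D) & 0xFF if x & 0x100 else x

    def raid6_syndromes(data_blocks):
        p = bytearray(len(data_blocks[0]))    # P: plain XOR parity
        q = bytearray(len(data_blocks[0]))    # Q: sum of g^i * D_i over GF(2^8)
        for block in reversed(data_blocks):   # Horner's rule, highest-numbered block first
            for i, byte in enumerate(block):
                p[i] ^= byte
                q[i] = gf_mul2(q[i]) ^ byte
        return bytes(p), bytes(q)

    p, q = raid6_syndromes([b"\x01\x02", b"\x10\x20", b"\xff\x00"])
    print(p.hex(), q.hex())

Losing any two drives leaves a pair of independent equations over GF(2^8) that can be solved for the
two missing blocks, which is what distinguishes RAID 6 from RAID 5's single XOR equation.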

[edit] Performance (speed)

RAID 6 does not have a performance penalty for read operations, but it does have a performance
penalty on write operations because of the overhead associated with parity calculations.
Performance varies greatly depending on how RAID 6 is implemented in the manufacturer's
storage architecture – in software, firmware or by using firmware and specialized ASICs for
intensive parity calculations. It can be as fast as a RAID-5 system with one less drive (same
number of data drives).[18]

[edit] Efficiency (potential waste of storage)

RAID 6 is no more space inefficient than RAID 5 with a hot spare drive when used with a small
number of drives, but as arrays become bigger and have more drives the loss in storage capacity
becomes less important and the probability of data loss is greater. RAID 6 provides protection
against data loss during an array rebuild, when a second drive is lost, a bad block read is
encountered, or a human operator accidentally removes and replaces the wrong disk drive
when attempting to replace a failed drive.

The usable capacity of a RAID 6 array is (N − 2) × Smin, where N is the total number of
drives in the array and Smin is the capacity of the smallest drive in the array.

[edit] Implementation

According to the Storage Networking Industry Association (SNIA), the definition of RAID 6 is:
"Any form of RAID that can continue to execute read and write requests to all of a RAID array's
virtual disks in the presence of any two concurrent disk failures. Several methods, including dual
check data computations (parity and Reed-Solomon), orthogonal dual parity check data and
diagonal parity, have been used to implement RAID Level 6."[19]

Network-attached storage (NAS) is file-level computer data storage connected to a computer
network providing data access to heterogeneous network clients.

Contents
[hide]

 1 Description
 2 History
 3 Benefits
 4 Drawbacks
 5 Uses
 6 Operating systems for personal computers
 7 List of Open Source implementations
 8 See also
 9 References
 10 External links

[edit] Description
Visual differentiation of NAS vs. SAN use in network architecture.

A NAS unit is essentially a self-contained computer connected to a network, with the sole
purpose of supplying file-based data storage services to other devices on the network. The
operating system and other software on the NAS unit provide the functionality of data storage,
file systems, and access to files, and the management of these functionalities. The unit is not
designed to carry out general-purpose computing tasks, although it may technically be possible
to run other software on it. NAS units usually do not have a keyboard or display, and are
controlled and configured over the network, often by connecting a browser to their network
address. The alternative to NAS storage on a network is to use a computer as a file server. In its
most basic form a dedicated file server is no more than a NAS unit with keyboard and display
and an operating system which, while optimised for providing storage services, can run other
tasks; however, file servers are increasingly used to supply other functionality, such as supplying
database services, email services, and so on.

A general-purpose operating system is not needed on a NAS device, and often minimal-
functionality or stripped-down operating systems are used. For example, FreeNAS, which is
free/open source NAS software designed for use on standard computer hardware, is simply a
version of FreeBSD with all functionality not related to data storage stripped out. NASLite is a
highly optimized Linux distribution running from a floppy disk for the sole purpose of serving as a NAS.
Likewise, NexentaStor is based upon the core of the NexentaOS, a Free / open source hybrid
operating system with an OpenSolaris core and a GNU user environment.

NAS systems contain one or more hard disks, often arranged into logical, redundant storage
containers or RAID arrays (redundant arrays of inexpensive/independent disks). NAS removes
the responsibility of file serving from other servers on the network.

NAS uses file-based protocols such as NFS (popular on UNIX systems), SMB/CIFS (Server
Message Block/Common Internet File System) (used with MS Windows systems), or AFP (used
with Apple Macintosh computers). NAS units rarely limit clients to a single protocol.

NAS provides both storage and filesystem. This is often contrasted with SAN (Storage Area
Network), which provides only block-based storage and leaves filesystem concerns on the
"client" side. SAN protocols are SCSI, Fibre Channel, iSCSI, ATA over Ethernet (AoE), or
HyperSCSI.

Despite their differences, SAN and NAS are not mutually exclusive and may be combined in one
SAN-NAS hybrid solution.

The boundaries between NAS and SAN systems are starting to overlap, with some products
making the obvious next evolution and offering both file level protocols (NAS) and block level
protocols (SAN) from the same system. An example of this is Openfiler, a free software product
running on Linux.

[edit] History
Network-attached storage was introduced with Novell's early file-sharing NetWare server
operating system and NCP protocol in 1983. In the UNIX world, Sun Microsystems' 1984
release of NFS allowed network servers to share their storage space with networked clients.
3Com and Microsoft would develop the LAN Manager software and protocol to further this new
market. 3Com's 3Server and 3+Share software were the first purpose-built servers (including
proprietary hardware, software, and multiple disks) for open systems servers. Inspired by the
success of file servers from Novell, IBM, and Sun, several firms developed dedicated file
servers. While 3Com was among the first firms to build a dedicated NAS for desktop operating
systems, Auspex Systems was one of the first to develop a dedicated NFS server for use in the
UNIX market. A group of Auspex engineers split away in the early 1990s to create the integrated
NetApp filer, which supported both Windows' CIFS and UNIX's NFS, and had superior
scalability and ease of deployment. This started the market for proprietary NAS devices now led
by NetApp and EMC Celerra.

Starting in the early 2000s, a series of startups emerged offering alternative solutions to single
filer solutions in the form of clustered NAS – Spinnaker Networks (now acquired by NetApp),
Exanet, IBRIX, Isilon, PolyServe (acquired by Hewlett-Packard in 2007), and Panasas, to name a
few.

In 2009, the parallel NFS (pNFS) extension to the NFSv4 standard was ratified by the IETF
working group.

[edit] Benefits
Availability of data might potentially be increased with NAS if it provides built-in RAID and
clustering.

Performance can be increased by NAS because the file serving is done by the NAS and not done
by a server responsible for also doing other processing. The performance of NAS devices,
though, depends heavily on the speed of and traffic on the network and on the amount of cache
memory (RAM) on the NAS computers or devices.

Note that a NAS is effectively a server in itself, with all the major components of a
typical PC (a CPU, motherboard, RAM, etc.), and its reliability is a function of how well it is
designed internally. A NAS without redundant data access paths, redundant controllers, or
redundant power supplies is probably less reliable than Direct Attached Storage (DAS)
connected to a server which does have redundancy for its major components.

[edit] Drawbacks
Because of its multiprotocol design and its reduced CPU and OS layer, a NAS has limitations
compared with DAS/SAN systems. If a NAS is loaded with too many users, too many I/O
operations, or processing demands that exceed its CPU, it reaches its limits. A server system can
be upgraded by adding one or more servers into a cluster, so CPU power can be increased, while
a NAS is limited to its own hardware, which in most cases is not upgradeable.

Certain NAS devices fail to expose well-known services that are typical of a file server, or
enable them in a way that is not efficient. Examples are: ability to compute disk usage of
separate directories, ability to index files rapidly (locate), ability to mirror efficiently with rsync.
One may still use rsync, but through an NFS or CIFS client; that method fails to enumerate huge
file hierarchies at the nominal speed of local drives and induces considerable network traffic.

The key difference between DAS and NAS is that DAS is simply an extension to an existing
server and is not networked, while NAS sits on a network as its own entity; it is easier to share
files with NAS. NAS typically has less CPU and I/O power compared to DAS.

[edit] Uses
NAS is useful for more than just general centralized storage provided to client computers in
environments with large amounts of data. NAS can enable simpler and lower cost systems such
as load-balancing and fault-tolerant email and web server systems by providing storage services.
The potential emerging market for NAS is the consumer market where there is a large amount of
multi-media data. Such consumer market appliances are now commonly available. Unlike their
rackmounted counterparts, they are generally packaged in smaller form factors. The price of
NAS appliances has plummeted in recent years, offering flexible network-based storage to the
home consumer market for little more than the cost of a regular USB or FireWire external hard
disk. Many of these home consumer devices are built around ARM, PowerPC or MIPS
processors running an embedded Linux operating system.

[edit] Operating systems for personal computers


Open source NAS-oriented distributions of Linux and FreeBSD are available, including
FreeNAS, NASLite and Openfiler. They are easy to configure via a web-based interface and run
on low-end conventional computers. They can run from a Live CD, a bootable USB flash drive
(Live USB), or from one of the mounted hard drives. They run Samba, an NFS daemon, and FTP
daemons, all of which are freely available for those operating systems.

NexentaStor, built on the NexentaCore Platform, is similar in that it is built on open source
foundations; however, NexentaStor requires more memory than consumer-oriented open source
NAS solutions and also contains most of the features of enterprise-class NAS solutions, such as
snapshots, management utilities, tiering services, mirroring, and end-to-end checksumming due,
in part, to the use of ZFS.

[edit] List of Open Source implementations


This is a list of open source implementations which allow a PC to be set up very quickly as a
NAS server:

 FreeNAS
 Openfiler
 NASLite
 Sun Open Storage

[edit] See also


Wikimedia Commons has media related to: Network-attached storage
 List of NAS manufacturers
 Clustered NAS
 Parallel NFS
 File Area Networking
 Storage area network
 Shared disk file system
 Disk enclosure
 Network architecture
 Connection-oriented protocol
 Connectionless protocol
 Congestion collapse
 Network protocols used by NAS devices:
o Server Message Block (SMB) (Mostly superseded by CIFS, also see Samba)
o NFS
o FTP
o HTTP
o UPnP
o AFP
o rsync
o SSH
o Unison
o AFS
o iSCSI
 Network File Control

Cloud computing is a style of computing in which dynamically scalable and often virtualized
resources are provided as a service over the Internet.[1][2] Users need not have knowledge of,
expertise in, or control over the technology infrastructure in the "cloud" that supports them.[3]

The concept generally incorporates combinations of the following:

 infrastructure as a service (IaaS)


 platform as a service (PaaS)
 software as a service (SaaS)
 Other recent (ca. 2007–09)[4][5] technologies that rely on the Internet to satisfy the computing
needs of users. Cloud computing services often provide common business applications online
that are accessed from a web browser, while the software and data are stored on the servers.

The term cloud is used as a metaphor for the Internet, based on how the Internet is depicted in
computer network diagrams and is an abstraction for the complex infrastructure it conceals.[6]

The first academic use of this term appears to be by Prof. Ramnath K. Chellappa (currently at
Goizueta Business School, Emory University) who originally defined it as a computing
paradigm where the boundaries of computing will be determined by economic rationale rather
than technical limits.[7]
Contents
[hide]

 1 Brief
o 1.1 Comparisons
o 1.2 Characteristics
o 1.3 Economics
o 1.4 Companies
o 1.5 Architecture
 2 History
 3 Criticism and disadvantages
 4 Political issues
 5 Legal issues
 6 Key characteristics
 7 Components
o 7.1 Client
o 7.2 Service
o 7.3 Application
o 7.4 Platform
o 7.5 Infrastructure
 8 Architecture
 9 Types
o 9.1 Public cloud
o 9.2 Hybrid cloud
o 9.3 Private cloud
 10 Roles
o 10.1 Provider
o 10.2 User
o 10.3 Vendor
 11 Standards
 12 See also
 13 References
 14 External links

[edit] Brief
[edit] Comparisons

Cloud computing can be confused with:


1. grid computing—"a form of distributed computing whereby a 'super and virtual computer' is
composed of a cluster of networked, loosely coupled computers, acting in concert to perform
very large tasks";
2. utility computing—the "packaging of computing resources, such as computation and storage, as
a metered service similar to a traditional public utility such as electricity";[8] and
3. autonomic computing—"computer systems capable of self-management".[9]

Indeed, many cloud computing deployments as of 2009 depend on grids, have autonomic
characteristics, and bill like utilities—but cloud computing tends to expand what is provided by
grids and utilities.[10] Some successful cloud architectures have little or no centralized
infrastructure or billing systems whatsoever, including peer-to-peer networks such as BitTorrent
and Skype, and volunteer computing such as SETI@home.[11][12]

Furthermore, many analysts are keen to stress the evolutionary, incremental pathway between
grid technology and cloud computing, tracing roots back to Application Service Providers
(ASPs) in the 1990s and the parallels to SaaS, often referred to as applications on the cloud.[13]
Some believe the true difference between these terms is marketing and branding; that the
technology evolution was incremental and the marketing evolution discrete.[14]

[edit] Characteristics

Cloud computing customers do not generally own the physical infrastructure serving as host to
the software platform in question. Instead, they avoid capital expenditure by renting usage from a
third-party provider. They consume resources as a service and pay only for resources that they
use. Many cloud-computing offerings employ the utility computing model, which is analogous to
how traditional utility services (such as electricity) are consumed, while others bill on a
subscription basis. Sharing "perishable and intangible" computing power among multiple tenants
can improve utilization rates, as servers are not unnecessarily left idle (which can reduce costs
significantly while increasing the speed of application development). A side effect of this
approach is that overall computer usage rises dramatically, as customers do not have to engineer
for peak load limits.[15] Additionally, "increased high-speed bandwidth" makes it possible to
receive the same response times from centralized infrastructure at other sites.

[edit] Economics
Diagram showing economics of cloud computing versus traditional IT, including capital expenditure
(CapEx) and operational expenditure (OpEx)

Cloud computing users can avoid capital expenditure (CapEx) on hardware, software, and
services when they pay a provider only for what they use. Consumption is usually billed on a
utility (e.g. resources consumed, like electricity) or subscription (e.g. time based, like a
newspaper) basis with little or no upfront cost. A few cloud providers are now beginning to offer
the service for a flat monthly fee as opposed to on a utility billing basis. Other benefits of this
time sharing style approach are low barriers to entry, shared infrastructure and costs, low
management overhead, and immediate access to a broad range of applications. Users can
generally terminate the contract at any time (thereby avoiding return on investment risk and
uncertainty) and the services are often covered by service level agreements (SLAs) with financial
penalties.[16][17]

According to Nicholas Carr, the strategic importance of information technology is diminishing as
it becomes standardized and less expensive. He argues that the cloud computing paradigm shift
is similar to the displacement of electricity generators by electricity grids early in the 20th
century.[18]

Although companies might be able to save on upfront capital expenditures, they might not save
much and might actually pay more for operating expenses. In situations where the capital
expense would be relatively small, or where the organization has more flexibility in their capital
budget than their operating budget, the cloud model might not make great fiscal sense. Other
factors impacting the scale of any potential cost savings include the efficiency of a company’s
data center as compared to the cloud vendor’s, the company’s existing operating costs, the level
of adoption of cloud computing, and the type of functionality being hosted in the cloud. [19][20]

[edit] Companies

The "big four" of cloud computing services are said to be Amazon, Google, Microsoft and
Salesforce.com.[21][22] Cloud computing is also being adopted by users ranging from individuals to large
enterprise customers including General Electric, Procter & Gamble and Valeo.[23][24]

[edit] Architecture

The majority of cloud computing infrastructure, as of 2009, consists of reliable services
delivered through data centers and built on servers with different levels of virtualization
technologies. The services are accessible anywhere that provides access to networking
infrastructure. Clouds often appear as single points of access for all consumers' computing needs.
Commercial offerings are generally expected to meet quality of service (QoS) requirements of
customers and typically offer SLAs.[25] Open standards are critical to the growth of cloud
computing, and open source software has provided the foundation for many cloud computing
implementations.[26]

[edit] History
The Cloud is a term that borrows from telephony. Up to the 1990s, data circuits (including those
that carried Internet traffic) were hard-wired between destinations. Subsequently, long-haul
telephone companies began offering Virtual Private Network (VPN) service for data
communications. Telephone companies were able to offer VPN based services with the same
guaranteed bandwidth as fixed circuits at a lower cost because they could switch traffic to
balance utilization as they saw fit, thus utilizing their overall network bandwidth more
effectively. As a result of this arrangement, it was impossible to determine in advance precisely
which paths the traffic would be routed over. The term "telecom cloud" was used to describe this
type of networking, and cloud computing is conceptually somewhat similar.
Cloud computing relies heavily on virtual machines (VMs), which are spawned on demand to
meet user needs. A common depiction in network diagrams is a cloud outline.[6]

The underlying concept of cloud computing dates back to 1960, when John McCarthy opined
that "computation may someday be organized as a public utility"; indeed it shares characteristics
with service bureaus that date back to the 1960s. The term cloud had already come into
commercial use in the early 1990s to refer to large Asynchronous Transfer Mode (ATM)
networks. [27] Ill-fated startup General Magic launched a short-lived cloud computing product in
1995 in partnership with several telecommunications company partners such as AT&T, just
before the consumer-oriented Internet became popular. By the turn of the 21st century, the term
"cloud computing" began to appear more widely,[28] although most of the focus at that time was
limited to SaaS.

In 1999, Salesforce.com was established by Marc Benioff, Parker Harris, and their associates.
They applied many technologies developed by companies such as Google and Yahoo! to
business applications. They also provided the concept of "On demand" and SaaS with their real
business and successful customers. The key for SaaS is that it is customizable by customers with
limited technical support required. Business users have enthusiastically welcomed the resulting
flexibility and speed.

In the early 2000s, Microsoft extended the concept of SaaS through the development of web
services. IBM detailed these concepts in 2001 in the Autonomic Computing Manifesto, which
described advanced automation techniques such as self-monitoring, self-healing, self-
configuring, and self-optimizing in the management of complex IT systems with heterogeneous
storage, servers, applications, networks, security mechanisms, and other system elements that
can be virtualized across an enterprise.

Amazon played a key role in the development of cloud computing by modernizing their data
centers after the dot-com bubble which, like most computer networks, were using as little as 10%
of their capacity at any one time just to leave room for occasional spikes. Having found that the
new cloud architecture resulted in significant internal efficiency improvements whereby small,
fast-moving "two-pizza teams" could add new features faster and more easily, Amazon started
providing access to their systems through Amazon Web Services on a utility computing basis in
2005.[29]

In 2007, Google, IBM, and a number of universities embarked on a large-scale cloud computing
research project,[30] around the time the term was becoming a hot topic. By mid-2008, cloud
computing gained popularity in the mainstream press, and numerous related events took place.[31]
One prediction held that cloud computing "[...] will result in dramatic growth in IT products in some
areas and in significant reductions in other areas."[32]

As of 2009, the cloud computing offerings of Google, Amazon, Microsoft, and IBM were the most
popular among users, with Sun and Ubuntu following them in the cloud.[33]

[edit] Criticism and disadvantages


Because cloud computing does not allow users to physically possess the storage of their data (the
exception being the possibility that data can be backed up to a user-owned storage device, such
as a USB flash drive or hard disk), it does leave responsibility of data storage and control in the
hands of the provider. Responsibility for backup data, disaster recovery and other static
"snapshots" has been a long-standing concern for both outsourced as well as resident IT systems.
Additional issues are raised around process (methods, functions, transactions, etc.) visibility and
transportability given the more complex nature of cloud and web service systems. Organizations
that rely upon these systems and services now have the additional responsibility of understanding
the services being offered (transforms) in order to react to changes in contracted services, assess
compatibility with competing services, and fulfil their fiduciary responsibility for business
continuity (disaster recovery, interruption of service) and business agility (the ability to engage
competitive services with the least impact to operations).[1] QoS
(Quality of Service), SLAs (Service Level Agreements) and other parametric behaviors need to
be specified as well as monitored for compliance. Although this is a new area, tested patterns for
service delivery can be adapted to allow for monitoring and quality control.[citation needed]

Cloud computing has been criticized for limiting the freedom of users and making them
dependent on the cloud computing provider, and some critics have alleged that it is only possible
to use applications or services that the provider is willing to offer. Writing in The London
Times, Jonathan Weber compares cloud computing to centralized systems of the 1950s and 60s,
by which users connected through "dumb" terminals to mainframe computers. Typically, users
had no freedom to install new applications and needed approval from administrators to achieve
certain tasks. Overall, it limited both freedom and creativity. The Times article argues that cloud
computing is a regression to that time.[34]

Similarly, Richard Stallman, founder of the Free Software Foundation, believes that cloud
computing endangers liberties because users sacrifice their privacy and personal data to a third
party. He stated that cloud computing is "simply a trap aimed at forcing more people to buy into
locked, proprietary systems that would cost them more and more over time."[35]

Even if data is securely stored in a cloud, many factors can temporarily disrupt access to the data,
such as network outages, denial of service attacks against the service provider, and a major
failure of the service provider infrastructure.

It may be a challenge to host and maintain intranets and access-restricted sites (government,
defense, institutional).

Commercial sites using tools such as web analytics may not be able to capture the data required
for business planning by their customers.[citation needed]
[edit] Political issues
The Cloud spans many borders and "may be the ultimate form of globalization."[36] As such, it
becomes subject to complex geopolitical issues, and providers are pressed to satisfy myriad
regulatory environments in order to deliver service to a global market. This dates back to the
early days of the Internet, when libertarian thinkers felt that "cyberspace was a distinct place
calling for laws and legal institutions of its own"[36].

Despite efforts (such as US-EU Safe Harbor) to harmonize the legal environment, as of 2009,
providers such as Amazon Web Services cater to major markets (typically the United States and
the European Union) by deploying local infrastructure and allowing customers to select
"availability zones."[37] Nonetheless, concerns persist about security and privacy from individual
through governmental levels (e.g., the USA PATRIOT Act, the use of national security letters,
and the Electronic Communications Privacy Act's Stored Communications Act.

[edit] Legal issues


In March 2007, Dell applied to trademark the term "cloud computing" (U.S. Trademark
77,139,082) in the United States. The "Notice of Allowance" the company received in July 2008
was cancelled in August, resulting in a formal rejection of the trademark application less than a
week later.

In September 2008, the United States Patent and Trademark Office (USPTO) issued a "Notice of
Allowance" to CGactive LLC (U.S. Trademark 77,355,287) for "CloudOS". As defined under
this notice, a cloud operating system is a generic operating system that "manage[s] the
relationship between software inside the computer and on the Web", such as Microsoft Azure[38].

In November 2007, the Free Software Foundation released the Affero General Public License, a
version of GPLv3 intended to close a perceived legal loophole associated with Free software
designed to be run over a network, particularly SaaS. An application service provider is required
to release any changes they make to Affero GPL open source code.[citation needed]

[edit] Key characteristics


 Agility improves with users able to rapidly and inexpensively re-provision technological
infrastructure resources. The cost of overall computing is unchanged, however, and the
providers will merely absorb up-front costs and spread costs over a longer period. [39].
 Cost is claimed to be greatly reduced and capital expenditure is converted to operational
expenditure[40]. This ostensibly lowers barriers to entry, as infrastructure is typically provided by
a third-party and does not need to be purchased for one-time or infrequent intensive computing
tasks. Pricing on a utility computing basis is fine-grained with usage-based options and fewer IT
skills are required for implementation (in-house).[41] Some would argue that, given the low cost of
computing resources, the IT burden merely shifts the cost from in-house to outsourced
providers. Furthermore, any cost reduction benefit must be weighed against a corresponding
loss of control, access, and security risks.
 Device and location independence[42] enable users to access systems using a web browser
regardless of their location or what device they are using (e.g., PC, mobile). As infrastructure is
off-site (typically provided by a third-party) and accessed via the Internet, users can connect
from anywhere.[41]
 Multi-tenancy enables sharing of resources and costs across a large pool of users thus allowing
for:
o Centralization of infrastructure in locations with lower costs (such as real estate,
electricity, etc.)
o Peak-load capacity increases (users need not engineer for highest possible load-levels)
o Utilization and efficiency improvements for systems that are often only 10–20% utilized.
[29]

 Reliability improves through the use of multiple redundant sites, which makes cloud computing
suitable for business continuity and disaster recovery.[43] Nonetheless, many major cloud
computing services have suffered outages, and IT and business managers can at times do little
when they are affected.[44][45]
 Scalability via dynamic ("on-demand") provisioning of resources on a fine-grained, self-service
basis near real-time, without users having to engineer for peak loads. Performance is monitored,
and consistent and loosely-coupled architectures are constructed using web services as the
system interface.[41]
 Security typically improves due to centralization of data [46], increased security-focused
resources, etc., but concerns can persist about loss of control over certain sensitive data, and
the lack of security for stored kernels [47]. Security is often as good as or better than under
traditional systems, in part because providers are able to devote resources to solving security
issues that many customers cannot afford[48]. Providers typically log accesses, but accessing the
audit logs themselves can be difficult or impossible. Ownership, control and access to data
controlled by "cloud" providers may be made more difficult,just as it is sometimes difficult to
gain access to "live" support with current utilities. Under the cloud paradigm, management of
sensitive data is placed in the hands of cloud providers and third parties. Currently, many
developers are implementing OAuth (open protocol for secure API authorization), as it allows
more granularity of data controls across cloud applications. OAuth is an open protocol, initiated
by Blain Cook and Chris Messina, to allow secure API authorization in a standard method for
desktop, mobile, and web applications.
 Sustainability comes about through improved resource utilization, more efficient systems, and
carbon neutrality.[49][50] Nonetheless, computers and associated infrastructure are major
consumers of energy, and a given (server-based) computing task will use a certain amount of
energy whether it is performed on-site or off-site.[51]

[edit] Components
Six layers components of cloud computing

[edit] Client

See also category: Cloud clients


A cloud client consists of computer hardware and/or computer software which relies on cloud
computing for application delivery, or which is specifically designed for delivery of cloud
services and which, in either case, is essentially useless without it.[52] For example:

 Mobile (Android, iPhone, Windows Mobile)[53][54][55]

 Thin client (CherryPal, Zonbu, gOS-based systems)[56][57][58]

 Thick client / Web browser (Microsoft Internet Explorer, Mozilla Firefox)

[edit] Service

See also category: Cloud services

A cloud service includes "products, services and solutions that are delivered and consumed in
real-time over the Internet"[41]. For example, Web Services ("software system[s] designed to
support interoperable machine-to-machine interaction over a network")[59] which may be
accessed by other cloud computing components, software, e.g., Software plus services, or end
users directly.[60] Specific examples include:

 Identity (OAuth, OpenID)


 Integration (Amazon Simple Queue Service)
 Payments (Amazon Flexible Payments Service, Google Checkout, PayPal)
 Mapping (Google Maps, Yahoo! Maps, MapQuest)
 Search (Alexa, Google Custom Search, Yahoo! BOSS)
 Video Games (OnLive)
 Live chat (LivePerson)
 Others (Amazon Mechanical Turk)

[edit] Application

See also category: Cloud applications

A cloud application leverages the Cloud in software architecture, often eliminating the need to
install and run the application on the customer's own computer, thus alleviating the burden of
software maintenance, ongoing operation, and support. For example:

 Peer-to-peer / volunteer computing (Bittorrent, BOINC Projects, Skype)


 Web application (Twitter)
 Software as a service (Google Apps, SAP and Salesforce)
 Software plus services (Microsoft Online Services)

[edit] Platform

See also category: Cloud platforms


A cloud platform, such as platform as a service (the delivery of a computing platform and/or
solution stack as a service), facilitates deployment of applications without the cost and complexity
of buying and managing the underlying hardware and software layers.[61] For example:

 Code Based Web Application Frameworks


o Java Google Web Toolkit (Google App Engine)
o Python Django (Google App Engine)
o Ruby on Rails (Heroku)
o .NET (Azure Services Platform)
 Non-Code Based Web Application Framework
o WorkXpress
o Wolf Frameworks
 Cloud Hosting (Rackspace Cloud Sites)
 Proprietary (Force.com)

[edit] Infrastructure

See also category: Cloud infrastructure

Cloud infrastructure, such as Infrastructure as a service, is the delivery of computer
infrastructure, typically a platform virtualization environment, as a service.[62] For example:

 Full virtualization (GoGrid, Skytap, iland)


 Management (RightScale)
 Compute (Amazon EC2, Rackspace Cloud Servers, Savvis)
 Platform (Force.com)
 Storage (Amazon S3, Nirvanix, Rackspace Cloud Files, Savvis)

[edit] Architecture
Cloud computing sample architecture

Cloud architecture,[63] the systems architecture of the software systems involved in the delivery
of cloud computing, comprises hardware and software designed by a cloud architect who
typically works for a cloud integrator. It typically involves multiple cloud components
communicating with each other over application programming interfaces, usually web services.
[64]

This closely resembles the Unix philosophy of having multiple programs each doing one thing
well and working together over universal interfaces. Complexity is controlled and the resulting
systems are more manageable than their monolithic counterparts.

Cloud architecture extends to the client, where web browsers and/or software applications access
cloud applications.
Cloud storage architecture is loosely coupled, where metadata operations are centralized
enabling the data nodes to scale into the hundreds, each independently delivering data to
applications or users.

[edit] Types
Cloud computing types

[edit] Public cloud

Public cloud or external cloud describes cloud computing in the traditional mainstream sense,
whereby resources are dynamically provisioned on a fine-grained, self-service basis over the
Internet, via web applications/web services, from an off-site third-party provider who shares
resources and bills on a fine-grained utility computing basis.[41]

[edit] Hybrid cloud

A hybrid cloud environment consisting of multiple internal and/or external providers[65] "will be
typical for most enterprises".[66]

[edit] Private cloud

Private cloud and internal cloud are neologisms that some vendors have recently used to
describe offerings that emulate cloud computing on private networks. These (typically
virtualisation automation) products claim to "deliver some benefits of cloud computing without
the pitfalls", capitalising on data security, corporate governance, and reliability concerns. They
have been criticized on the basis that users "still have to buy, build, and manage them" and as
such do not benefit from lower up-front capital costs and less hands-on management[66],
essentially "[lacking] the economic model that makes cloud computing such an intriguing
concept".[67][68]

While an analyst predicted in 2008 that private cloud networks would be the future of corporate
IT,[69] there is some uncertainty whether they are a reality even within the same firm.[70] Analysts
also claim that within five years a "huge percentage" of small and medium enterprises will get
most of their computing resources from external cloud computing providers as they "will not
have economies of scale to make it worth staying in the IT business" or be able to afford private
clouds.[71] Analysts have reported on Platform's view that private clouds are a stepping stone to
external clouds, particularly for financial services, and that future datacenters will look like
internal clouds.[72]

The term has also been used in the logical rather than physical sense, for example in reference to
platform as a service offerings[73], though such offerings including Microsoft's Azure Services
Platform are not available for on-premises deployment.[74]

Fibre Channel over Ethernet (FCoE) is an encapsulation of Fibre Channel packets over
Ethernet networks. This allows Fibre Channel to leverage 10 Gigabit Ethernet networks while
preserving the Fibre Channel protocol[1]. The specification, supported by a large number of
network and storage vendors, was developed by the FC-BB-5 working group of T11. On June 4,
2009, T11 approved the FC-BB-5 Draft Standard and forwarded it to INCITS for the publication
process as an ANSI standard.

Contents
[hide]

 1 Functionality
 2 Application
 3 Frame Format
 4 Timeline
 5 See also
 6 References
 7 External links

[edit] Functionality
FCoE maps Fibre Channel natively over Ethernet while being independent of the Ethernet
forwarding scheme. The FCoE protocol specification replaces the FC0 and FC1 layers of the
Fibre Channel stack with Ethernet. By retaining the native Fibre Channel constructs, FCoE
allows a seamless integration with existing Fibre Channel networks and management software.

Many data centers use Ethernet for TCP/IP networks and Fibre Channel for storage area
networks (SANs). With FCoE, Fibre Channel becomes another network protocol running on
Ethernet, alongside traditional Internet Protocol (IP) traffic. FCoE operates directly above
Ethernet in the network protocol stack, in contrast to iSCSI which runs on top of TCP and IP. As
a consequence, FCoE is not routable at the IP layer, and will not work across routed IP networks.

Since classical Ethernet, unlike Fibre Channel, has no flow control, FCoE requires enhancements
to the Ethernet standard to support a flow control mechanism (this prevents congestion and
ensuing frame loss). The IEEE standards body is working on this in the Data Center Bridging
Task Group.

Fibre Channel required three primary extensions to deliver the capabilities of Fibre Channel over
Ethernet networks:

 Encapsulation of native Fibre Channel frames into Ethernet frames
 Extensions to the Ethernet protocol itself to enable an Ethernet fabric in which frames are
not routinely lost during periods of congestion
 Mapping between Fibre Channel N_port IDs (aka FCIDs) and Ethernet MAC addresses

Computers connect to FCoE with Converged Network Adapters (CNAs), which contain both
Fibre Channel Host Bus Adapter (HBA) and Ethernet Network Interface Card (NIC)
functionality on the same adapter card. CNAs have one or more physical Ethernet ports. FCoE
encapsulation can be done in software with a conventional Ethernet network interface card,
however FCoE CNAs offload (from the CPU) the low level frame processing and SCSI protocol
functions traditionally performed by Fibre Channel host bus adapters.

[edit] Application
The main application of FCoE is in data center storage area networks (SANs). FCoE has
particular application in data centers due to the cabling reduction it makes possible, as well as in
server virtualization applications, which often require many physical I/O connections per server.

With FCoE, network (IP) and storage (SAN) data traffic can be consolidated using a single
network switch. This consolidation can:

 reduce the number of network interface cards required to connect to disparate storage and
IP networks
 reduce the number of cables and switches
 reduce power and cooling costs

[edit] Frame Format

FCoE Frame Format

FCoE is encapsulated over Ethernet with the use of a dedicated Ethertype, 0x8906. A single 4-bit
field (version) satisfies the IEEE sub-type requirements. The other fields in the frame (specifically
the source MAC address, destination MAC address, VLAN tags, SOF and EOF) are encoded
as specified in RFC 3643. Reserved bits are present to guarantee that the FCoE frame meets the
minimum length requirement of Ethernet. Inside the encapsulated Fibre Channel frame, the
frame header is retained so as to allow connecting to a storage network by passing on the Fibre
Channel frame directly after de-encapsulation.
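
A hedged Python sketch of the encapsulation described above: an Ethernet frame carrying
Ethertype 0x8906, a version nibble, reserved padding, the untouched Fibre Channel frame, and
padding up to Ethernet's minimum size. The width of the reserved area and the example MAC
addresses are assumptions for illustration, not a byte-accurate rendering of the FC-BB-5 header.

    import struct

    FCOE_ETHERTYPE = 0x8906

    def encapsulate_fc_frame(dst_mac, src_mac, fc_frame, version=0):
        eth_header = dst_mac + src_mac + struct.pack("!H", FCOE_ETHERTYPE)
        # Version nibble plus reserved bytes; the reserved width here is illustrative.
        fcoe_header = bytes([version << 4]) + b"\x00" * 13
        frame = eth_header + fcoe_header + fc_frame
        if len(frame) < 60:                 # pad to the Ethernet minimum (before the FCS)
            frame += b"\x00" * (60 - len(frame))
        return frame

    frame = encapsulate_fc_frame(b"\x0e\xfc\x00\x00\x00\x01",   # made-up MAC addresses
                                 b"\x02\x00\x00\x00\x00\x01",
                                 fc_frame=b"\x22" * 28)
    print(len(frame), hex(int.from_bytes(frame[12:14], "big")))  # 60 0x8906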

The FIP (FCoE Initialization Protocol) is an integral part of FCoE. Its main goal is to discover
and initialize FCoE capable entities connected to an Ethernet cloud. FIP uses a dedicated
Ethertype, 0x8914.
Fibre Channel network protocols

Communication between devices in a fibre channel network uses different elements of the Fibre
Channel standards. The following sections introduce the main concepts and show how a
combination of primitives and frames is required.

Contents
[hide]

 1 Transmission words and ordered sets


 2 AL_PAs
 3 Meta-data
 4 Primitives
 5 Frames

[edit] Transmission words and ordered sets


All Fibre Channel communication is done in units of four 10-bit codes. This group of 4 codes is
called a transmission word.

An ordered set is a transmission word that includes some combination of control (K) codes and
data (D) codes.

[edit] AL_PAs
Each device has an Arbitrated Loop Physical Address (AL_PA). These addresses are defined by
an 8-bit field but must have neutral disparity as defined in the 8B/10B coding scheme. That
reduces the number of possible values from 256 to 134. The 134 possible values have been
divided between the fabric, FC_AL ports, and other special purposes as follows:

AL_PA   Quantity   Purpose
00      1          FL (fabric) port
01-7E   126        NL (normal) ports
F0      1          Used during LIP and ARB
F7      1          Used during LIP
F8      1          Used during LIP
F9-FE   3          Reserved
FF      1          Used for broadcasts

[edit] Meta-data
In addition to the transfer of data, it is necessary for Fibre Channel communication to include
some meta-data. This allows for the setting up of links, sequence management, and other control
functions. The meta-data falls into two types: primitives, which consist of a single four-character
transmission word, and non-data frames, which are more complex structures. Both are described
in the following sections.

[edit] Primitives
All primitives are four characters in length. They begin with the control character K28.5,
followed by three data characters. In some primitives the three data characters are fixed, in others
they can be varied to change the meaning or to act as parameters for the primitive. In some cases
the last two parameter characters are identical.

Parameters are shown in the table below in the form of their hexadecimal 8-bit values. This is
clearer than their full 10-bit (Dxx.x) form as shown in the Fibre Channel standards:

Mnemonic   Meaning                  Parameters   Comments

ARB        Arbitrate                94F0F0       Request fairness
                                    94FFFF       Fill word
                                    94yyyy       Request arbitration for AL_PA=yy

CLS        Close                    85B5B5       Ends communication, cancelling previous OPN commands

DHD        Dynamic Half-Duplex      8AB5B5

EOF        End of frame                          See note 1

IDLE       Idle                     95B5B5

LIP        Loop Initialization      15F7F7       Request AL_PA
                                    15F7xx       Reinitialise AL_PA=xx
                                    15F8F7       Loop failure at unknown AL_PA
                                    15F8xx       Loop failure at AL_PA=xx
                                    15FFxx       Reset all, originating AL_PA=xx
                                    15yyxx       Reset AL_PA=yy, originating AL_PA=xx

LPB        Loop Port Bypass         09yyxx       Bypass AL_PA=yy, originating AL_PA=xx
                                    09FFxx       Bypass all, originating AL_PA=xx

LPE        Loop Port Enable         05yyxx       Enable AL_PA=yy, originating AL_PA=xx
                                    05FFxx       Enable all, originating AL_PA=xx

LR         Link Reset               49BF49

LRR        Link Reset Response      35BF49

MRK        Mark                     5Fxxxx       Vendor unique - clock sync, spindle sync etc.

NOS        Not Operational          55BF45       Link has failed

OLS        Offline                  358A55       Going offline (due to received NOS or other event)

OPN        Open                     91FFFF       Open broadcast replicate (see note 2)
                                    91yyFF       Open selective replicate (see note 2)
                                    91yyxx       Open full duplex between AL_PA=xx and AL_PA=yy
                                    91yyyy       Open half duplex to AL_PA=yy

R_RDY      Receiver_Ready           954949

SOF        Start of frame           B5cccc       See note 3

SYN        Synchronise              7Fxxxx       Clock Synchronization word X
                                    BFyyyy       Clock Synchronization word Y
                                    DFzzzz       Clock Synchronization word Z

VC_RDY     Virtual Circuit Ready    F5vvvv       Where vv is the virtual circuit ID

Note 1: The first parameter byte of the EOF primitive can have one of four different values (8A,
95, AA, or B5). This is done so that the EOF primitive can rebalance the disparity of the whole
frame. The remaining two parameter bytes define whether the frame is ending normally,
terminating the transfer, or is to be aborted due to an error.

Note 2: The Open selective replicate variant can be repeated a number of times in order to
communicate with more than one destination port simultaneously. The Open broadcast
replicate variant will allow communication with all ports simultaneously.

Note 3: The SOF primitive contains a pair of control bytes (shown as cccc in the table) to
designate the type of frame.
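
Based on the description above (the control character K28.5 followed by three parameter characters), the sketch below builds the parameter bytes of two primitives. It is symbolic only: real primitives are transmitted as 8b/10b-encoded ordered sets, and the function names are invented for illustration.

    K28_5 = "K28.5"  # control character that begins every primitive (symbolic placeholder)

    def opn_full_duplex(yy: int, xx: int):
        """OPN 91yyxx: open full duplex between AL_PA=xx and AL_PA=yy (see table above)."""
        return (K28_5, 0x91, yy, xx)

    def arb(yy: int):
        """ARB 94yyyy: request arbitration for AL_PA=yy (0xF0 requests fairness)."""
        return (K28_5, 0x94, yy, yy)

    print(opn_full_duplex(0x01, 0x02))  # ('K28.5', 0x91, 0x01, 0x02)
    print(arb(0xF0))                    # ('K28.5', 0x94, 0xF0, 0xF0): request fairness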

[edit] Frames
The Fibre Channel protocol transmits data in frames each of which can contain up to 2112 bytes
of payload data. The structure of a frame is shown in this table:

Field                              Length (bytes)

SOF - Start Of Frame               4
Extended header(s)                 0 or more
Routing Control                    1
Destination ID                     3
Class-Specific Control / Priority  1
Source ID                          3
Data Structure Type                1
Frame Control                      3
Sequence ID                        1
Data Field Control                 1
Sequence Count                     2
Originator Exchange ID             2
Responder Exchange ID              2
Parameter                          4
Data field                         0 to 2112
CRC - Cyclic Redundancy Check      4
EOF - End of Frame                 4
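
As a sketch of how the 24-byte frame header listed above (the fields between the extended headers and the data field) might be unpacked, the following Python fragment uses the field lengths from the table; the function and dictionary keys are chosen here for illustration only.

    import struct

    # 24-byte Fibre Channel frame header, big-endian, using the field lengths above.
    FC_HEADER = struct.Struct(">B3sB3sB3sBBHHHI")

    def parse_fc_header(header: bytes) -> dict:
        """Unpack the fixed 24-byte frame header into named fields (illustrative only)."""
        (r_ctl, d_id, cs_ctl, s_id, dtype, f_ctl,
         seq_id, df_ctl, seq_cnt, ox_id, rx_id, parameter) = FC_HEADER.unpack(header[:24])
        return {
            "Routing Control": r_ctl,
            "Destination ID": d_id.hex(),
            "Class-Specific Control / Priority": cs_ctl,
            "Source ID": s_id.hex(),
            "Data Structure Type": dtype,
            "Frame Control": f_ctl.hex(),
            "Sequence ID": seq_id,
            "Data Field Control": df_ctl,
            "Sequence Count": seq_cnt,
            "Originator Exchange ID": ox_id,
            "Responder Exchange ID": rx_id,
            "Parameter": parameter,
        }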

In addition to data frames, there are non-data frames that are used for setup and messaging
purposes. These fall into three categories, link control frames, link service frames, and extended
link service frames. The following table lists the most common ones:

Mnemonic  Frame type             Meaning

ABTS      Link service           Abort Sequence
ACK       Link control           Acknowledge data frame (success)
BA_ACC    Link service           Basic accept
BA_RJT    Link service           Basic reject
F_BSY     Link control           Fabric busy
F_RJT     Link control           Fabric frame reject
FLOGI     Extended link service  Fabric login
NOP       Link service           No Operation
P_BSY     Link control           Port busy
P_RJT     Link control           Port frame reject
PLOGI     Extended link service  Port login
PRLI      Extended link service  Process login
PRLO      Extended link service  Process logout
PRMT      Link service           Dedicated connection preempted
RMC       Link service           Remove connection
RSI       Extended link service  Request sequence initiative

Fibre Channel 8b/10b encoding

The Fibre Channel FC1 data link layer implements the 8b/10b encoding and decoding of signals.
The Fibre Channel 8B/10B coding scheme is also used in other telecommunications systems.
Data is expanded using an algorithm that creates one of two possible 10-bit output values for
each 8-bit input value. Which of the two 10-bit values is used depends on the running disparity
of the bits already sent. This mapping is usually done at the time when parallel input data is
converted into a serial output stream for transmission over a Fibre Channel link. The selection
is made in such a way that a long-term zero disparity between ones and zeroes is maintained.
This is often called "DC balancing".
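
The running-disparity selection described above can be sketched in a few lines of Python. This illustrates only the balancing rule, not a real 8b/10b encoder, and the function names are invented for the example.

    def disparity(code10: int) -> int:
        """Disparity of a 10-bit code word: (number of ones) minus (number of zeroes)."""
        ones = bin(code10 & 0x3FF).count("1")
        return ones - (10 - ones)

    def choose_variant(variant_a: int, variant_b: int, running_disparity: int) -> int:
        """Pick whichever 10-bit variant drives the running disparity back toward zero."""
        if running_disparity > 0:
            # Too many ones have been sent so far: prefer the variant with fewer ones.
            return min(variant_a, variant_b, key=disparity)
        return max(variant_a, variant_b, key=disparity)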

The 8-bit to 10-bit conversion scheme uses only 512 of the possible 1024 output values. Of the
remaining 512 unused output values, most contain either too many ones or too many zeroes so
are not allowed. However this still leaves enough spare 10-bit odd+even coding pairs to allow for
12 special non-data characters.

The codes that represent the 256 data values are called the data (D) codes. The codes that
represent the 12 special non-data characters are called the control (K) codes.

Each code is named by the decimal value of its low five input bits followed by the decimal value
of its top three input bits, using the naming convention "Dxx.x" or "Kxx.x".

Example:

Input data bits:   ABCDEFGH
Data is split:     ABC DEFGH
Data is shuffled:  DEFGH ABC

These bit groups are then converted to decimal:

Input data:  C3 (hex) = 11000011
Split:       110 00011
Shuffled:    00011 110
Decimal:     3 . 6

8B/10B code name: D03.6
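
The naming convention worked through above can be expressed as a short Python function (the function name is illustrative):

    def code_name(value: int, control: bool = False) -> str:
        """Return the 8B/10B code name (e.g. 'D03.6') for an 8-bit input value."""
        low_five = value & 0x1F          # bits DEFGH in the example above
        top_three = (value >> 5) & 0x7   # bits ABC in the example above
        prefix = "K" if control else "D"
        return f"{prefix}{low_five:02d}.{top_three}"

    print(code_name(0xC3))  # -> D03.6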


Fibre Channel switch


A QLogic SAN switch with optical Fibre Channel connectors installed.

In the computer storage field, a Fibre Channel switch is a network switch compatible with the
Fibre Channel (FC) protocol. It allows the creation of a Fibre Channel fabric, that is currently the
core component of most storage area networks. The fabric is a network of Fibre Channel devices
which allows many-to-many communication, device name lookup, security, and redundancy. FC
switches implement zoning, a mechanism that disables unwanted traffic between certain fabric
nodes.

A Fibre Channel director is, by current convention, a switch with at least 128 ports. It does not
differ from a switch in core FC protocol functionality. The term itself was carried over from
older ESCON technology.

Fibre Channel switches may be deployed one at a time or in larger multi-switch configurations.
SAN administrators typically add new switches as their server and storage needs grow,
connecting switches together via fiber optic cable using the standard device ports. Some switch
vendors now offer dedicated high-speed stacking ports to handle inter-switch connections
(similar to existing stackable Ethernet switches), allowing high-performance multi-switch
configurations to be created using fewer switches overall.

Major manufacturers of Fibre Channel switches are: Brocade, Cisco Systems, and QLogic.

N_Port ID Virtualization or NPIV is a Fibre Channel facility allowing multiple N_Port IDs to
share a single physical N_Port. This allows multiple Fibre Channel initiators to occupy a single
physical port, easing hardware requirements in Storage Area Network design, especially where
virtual SANs are called for. NPIV is defined by the Technical Committee T11 in the Fibre
Channel - Link Services (FC-LS) specification.

Normally N_Port initialization proceeds like this:

 N_Port sends FLOGI to address 0xFFFFFE to obtain a valid address
 N_Port sends PLOGI to address 0xFFFFFC to register this address with the name server
 N_Port sends SCR to address 0xFFFFFD to register for state change notifications

However with NPIV it may continue like this:

 N_Port sends FDISC to address 0xFFFFFE to obtain an additional address
 N_Port sends PLOGI to address 0xFFFFFC to register this additional address with the name server
 N_Port sends SCR to address 0xFFFFFD to register for state change notifications
 ... (repeat FDISC/PLOGI/SCR for the next address)

FDISC is an abbreviation of "Discover Fabric Service Parameters", which is a misleading name in
this context; it works just like FLOGI.
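
The login flows above can be summarized in a purely illustrative Python sketch. The well-known addresses come from the lists above; the constant and function names are invented for the example.

    FABRIC_LOGIN_SERVER = 0xFFFFFE   # FLOGI / FDISC
    NAME_SERVER = 0xFFFFFC           # PLOGI (name registration)
    FABRIC_CONTROLLER = 0xFFFFFD     # SCR (state change registration)

    def npiv_login_sequence(extra_ids: int):
        """Yield (command, destination) pairs for one physical N_Port plus extra virtual IDs."""
        yield ("FLOGI", FABRIC_LOGIN_SERVER)
        yield ("PLOGI", NAME_SERVER)
        yield ("SCR", FABRIC_CONTROLLER)
        for _ in range(extra_ids):
            yield ("FDISC", FABRIC_LOGIN_SERVER)   # obtain an additional N_Port ID
            yield ("PLOGI", NAME_SERVER)
            yield ("SCR", FABRIC_CONTROLLER)

    for command, destination in npiv_login_sequence(extra_ids=2):
        print(f"{command} -> 0x{destination:06X}")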

In storage networking, Fibre Channel zoning is the partitioning of a Fibre Channel fabric into
smaller subsets to restrict interference, add security, and to simplify management. If a SAN
contains several storage devices, each system connected to the SAN should not be allowed to
interact with all of them. Zoning applies only to the switched fabric topology (FC-SW); it does
not exist in simpler Fibre Channel topologies.

Zoning is sometimes confused with LUN masking, because it serves the same goals. LUN
masking, however, works on Fibre Channel level 4 (i.e. on SCSI level), while zoning works on
level 2. This allows zoning to be implemented on switches, whereas LUN masking is performed
on endpoint devices - host adapters or disk array controllers.

Zoning is also different from VSANs, in that each port can be a member of multiple zones, but
only one VSAN. VSAN (similarly to VLAN) is in fact a separate network (separate sub-fabric),
with its own fabric services (including its own separate zoning).

There are two main methods of zoning, hard and soft, that combine with two sets of attributes,
name and port.

Soft zoning restricts only the fabric name services, to show the device only an allowed subset of
devices. Therefore, when a server looks at the content of the fabric, it will only see the devices it
is allowed to see. However, any server can still attempt to contact any device on the network by
address. In this way, soft zoning is similar to the computing concept of security through
obscurity.

In contrast, hard zoning restricts actual communication across a fabric. This requires efficient
hardware implementation (frame filtering) in the fabric switches, but is much more secure.

Zoning can also be applied to either switch ports or end-station names. Port zoning restricts
specific switch ports from seeing unauthorized ports. WWN zoning (also called name zoning)
restricts access by a device's World Wide Name (WWN). With port zoning, even when a device is
unplugged from a switch port and a different one is plugged in, the new device still has access to
the zone; the fact that the device's WWN changed is ignored. With WWN zoning, when a device
is unplugged from a switch port and plugged into a different port (perhaps on a different switch)
it still has access to the zone, because the switches check only the device's WWN; the specific
port that the device connects to is ignored. This is more flexible, but WWNs can be easily
spoofed, reducing security.

Currently, the combination of hard and WWN zoning is the most popular. Because port zoning is
non-standard, it usually requires a homogeneous SAN (all switches from one vendor).

In order to bring the created zones together for ease of deployment and management, a zoneset
(also called a zoning config) is employed. A zoneset is merely a logical container for the
individual zones that are designed to be in force at the same time. A zoneset can contain WWN
zones, port zones, or a combination of both (hybrid zones). The zoneset must be "activated"
within the fabric (i.e. distributed to all the switches and then simultaneously enforced). Switches
may contain more than one zoneset, but only one zoneset can be active in the entire fabric.
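
A minimal sketch of the zone/zoneset model described above, assuming WWN zoning; the zone names and WWNs are made up for the example.

    # Two WWNs may communicate only if they share at least one zone in the active zoneset.
    active_zoneset = {
        "zone_db_servers": {"10:00:00:05:1e:aa:bb:cc", "50:06:0b:00:00:12:34:56"},
        "zone_backup":     {"10:00:00:05:1e:aa:bb:cc", "20:00:00:17:a4:00:00:01"},
    }

    def may_communicate(wwn_a: str, wwn_b: str, zoneset: dict) -> bool:
        """True if the two WWNs share at least one zone in the given zoneset."""
        return any(wwn_a in members and wwn_b in members for members in zoneset.values())

    print(may_communicate("50:06:0b:00:00:12:34:56",
                          "20:00:00:17:a4:00:00:01",
                          active_zoneset))   # False: they share no zone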

World Wide Name


World Wide Name (WWN) or World Wide Identifier (WWID) is a unique identifier which
identifies a particular Fibre Channel, Advanced Technology Attachment (ATA) or Serial
Attached SCSI (SAS) target. Each WWN is an 8-byte value derived from an IEEE OUI and
vendor-supplied information.

There are two formats of WWN defined by the IEEE (a parsing sketch follows the list):

 Original format: addresses are assigned to manufacturers by the IEEE standards committee, and
are built into the device at build time, similar to an Ethernet MAC address. First 2 bytes are
either hex 10:00 or 2x:xx (where the x's are vendor-specified) followed by the 3-byte vendor
identifier and 3 bytes for a vendor-specified serial number
 New addressing schema: first nibble is either hex 5 or 6 followed by a 3-byte vendor identifier
and 36 bits for a vendor-specified serial number
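
A hedged sketch of how the two formats above might be parsed to recover the 24-bit vendor identifier (OUI); the field widths follow the bullet points, and the function itself is purely illustrative.

    def wwn_oui(wwn: str) -> str:
        """Extract the 24-bit vendor identifier (OUI) from an 8-byte WWN string."""
        value = int(wwn.replace(":", "").replace(".", ""), 16)
        naa = value >> 60                      # the first nibble selects the format
        if naa in (1, 2):
            # Original format: 2 bytes (10:00 or 2x:xx) + 3-byte OUI + 3-byte serial number
            oui = (value >> 24) & 0xFFFFFF
        elif naa in (5, 6):
            # New schema: 4-bit NAA + 3-byte OUI + 36-bit serial number
            oui = (value >> 36) & 0xFFFFFF
        else:
            raise ValueError("unrecognised WWN format")
        return f"{oui:06x}"

    print(wwn_oui("10:00:00:60:69:12:34:56"))  # -> 006069 (a Brocade OUI from the list below)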

[edit] List of a few WWN company identifiers


 00:50:76 IBM
 00:A0:98 NetApp
 00:60:69 Brocade Communications Systems
 00:05:1E Brocade Communications Systems, formerly owned by Rhapsody Networks
 00:60:DF Brocade Communications Systems, formerly CNT Technologies Corporation
 00:E0:8B QLogic HBAs, original identifier space
 00:1B:32 QLogic HBAs, new identifier space starting to be used in 2007
 00:C0:DD QLogic FC switches
 00:90:66 QLogic, formerly Troika Networks
 00:11:75 QLogic, formerly PathScale, Inc
 08:00:88 Brocade Communications Systems, formerly McDATA Corporation. WWIDs begin with
1000.080

 00:60:B0 Hewlett-Packard - Integrity and HP9000 servers. WWIDs begin with 5006.0b0
 00:11:0A Hewlett-Packard - ProLiant servers. Formerly Compaq. WWIDs begin with 5001.10a
 00:01:FE Hewlett-Packard - EVA disk arrays. Formerly Digital Equipment Corporation. WWIDs
begin with 5000.1fe1
 00:17:A4 Hewlett-Packard - MSL tape libraries. Formerly Global Data Services. WWIDs begin
with 200x.0017.a4
 00:60:48 EMC Corporation, for Symmetrix
 00:60:16 EMC Corporation, for CLARiiON
 00:10:86 ATTO Technology

Switched fabric in Fibre Channel


Example topology of a Fibre Channel switched fabric network

In the Fibre Channel switched fabric topology (called FC-SW), devices are connected to each
other through one or more Fibre Channel switches. This topology allows the connection of up to
the theoretical maximum of 16 million devices, limited only by the available address space (2^24).
Multiple switches in a fabric usually form a mesh network, with devices being on the "edges"
("leaves") of the mesh. While this topology has the best scalability properties of the three FC
topologies, it is also the most expensive, being the only one that requires a costly Fibre Channel
switch.

Visibility among nodes in a fabric is typically controlled with zoning.

Most Fibre Channel network designs employ two separate fabrics for redundancy. The two
fabrics share the edge nodes (devices), but are otherwise unconnected. One of the advantages of
this topology is its failover capability: if one link breaks or a switch fails, traffic can use the
second fabric.

Arbitrated loop
Arbitrated loop, also known as FC-AL, is a Fibre Channel topology in which devices are
connected in a one-way loop fashion in a ring topology. Historically it was a lower-cost
alternative to a fabric topology, allowing many servers and computer storage devices to be
connected without using the then very costly Fibre Channel switches. By 2007 the cost of the
switches had dropped considerably, so FC-AL is rarely used for server-to-storage
communication. It is, however, still commonly used on the back end of some disk array
controllers.

 It is a serial architecture that is compatible with SCSI, handling up to 127 ports (devices). One
port may optionally connect the loop to a fabric switch port.
 The bandwidth on the loop is shared among all ports.
 Only two ports on the loop can communicate at the same time (there is no concept of a
"token"; access is gained through arbitration).
 An arbitrated loop with only 2 ports is valid, and while having the same physical topology as
point-to-point it still acts as a loop protocol-wise.
 Arbitrated loop can be physically cabled in a ring fashion or using a hub. The physical ring
ceases to work if one of the devices in the chain fails. The hub, on the other hand, while
maintaining a logical ring, allows a star topology on the cable level. Each receive port on the hub
is simply passed to the next active transmit port, bypassing any inactive or failed ports.
 Fibre Channel ports capable of arbitrated loop communication are NL_port (node loop port)
and FL_port (fabric loop port), collectively referred to as the L_ports. Physical connectors on the
hub are not considered ports in terms of the protocol.
 An arbitrated loop with no fabric port (with only NL_ports) is a private loop.
 An arbitrated loop connected to a fabric through an FL_port is a public loop.
 An NL_Port must provide fabric logon (FLOGI) and name registration facilities to initiate
communication with other nodes through the fabric (to be an initiator).

Serial ATA

SATA (pronounced SAT-ah)

Serial ATA

First-generation (1.5 Gbit/s) SATA ports on a motherboard

Year created:  2003
Supersedes:    Parallel ATA (PATA)
Speed:         1.5, 3.0, 6.0 Gbit/s
Style:         Serial
Hotplugging?   Yes[1]
External?      Yes (eSATA)

The Serial ATA (SATA) computer bus is a storage interface for connecting host bus adapters
to mass storage devices such as hard disk drives and optical drives. The SATA host adapter is
integrated into almost all modern consumer laptop computers and desktop motherboards.

Serial ATA was designed to replace the older ATA (AT Attachment) standard (also known as
EIDE). It is able to use the same low level commands, but serial ATA host-adapters and devices
communicate via a high-speed serial cable over two pairs of conductors. In contrast, the parallel
ATA (the redesignation for the legacy ATA specifications) used 16 data conductors each
operating at a much lower speed.

SATA offers several compelling advantages over the older parallel ATA (PATA) interface:
reduced cable-bulk and cost (reduced from eighty wires to seven), faster and more efficient data
transfer, full duplex (the ability to transmit and receive at the same time), and hot swapping (the
ability to remove or add devices while operating).

As of 2009, SATA has all but replaced parallel ATA in all shipping consumer PCs. PATA
remains in industrial and embedded applications dependent on CompactFlash storage although
the new CFast storage standard will be based on SATA.[2][3]

Contents
[hide]

 1 SATA specification bodies


 2 Features
o 2.1 Hotplug
 3 Advanced Host Controller Interface
o 3.1 Throughput
 3.1.1 SATA 1.5 Gbit/s (First generation)
 3.1.2 SATA 3 Gbit/s (Second generation)
 3.1.2.1 SATA II committee renamed SATA-IO
 3.1.2.2 SATA II product marketing
 3.1.3 SATA 6 Gbit/s (Third generation)
o 3.2 Cables and connectors
 3.2.1 Data
 3.2.2 Power supply
 3.2.2.1 Standard connector
 3.2.2.2 Slimline connector
 3.2.2.3 Micro connector
o 3.3 Topology
o 3.4 Encoding
 4 External SATA
 5 Backward and forward compatibility
o 5.1 SATA and PATA
o 5.2 SATA 1.5 Gbit/s and SATA 3 Gbit/s
 6 Comparisons with other interfaces
o 6.1 SATA and SCSI
o 6.2 SATA in comparison to other buses
 7 See also
 8 Notes and references
 9 External links

[edit] SATA specification bodies


There are at least four bodies with possible responsibility for providing SATA specifications: the
trade organisation, SATA-IO; the INCITS T10 subcommittee (SCSI); a subgroup of T10
responsible for SAS; and the INCITS T13 subcommittee (ATA). This has caused confusion as
the ATA/ATAPI-7 specification from T13 incorporated an early, incomplete SATA rev. 1
specification from SATA-IO.[4] The remainder of this article will try to use the terminology and
specifications of SATA-IO.

[edit] Features
[edit] Hotplug

All SATA devices support hotplugging. However, proper hotplug support requires the device to
be running in its native command mode rather than via IDE emulation, which requires AHCI
(Advanced Host Controller Interface). Some of the earliest SATA host adapters were not capable
of this, and furthermore some popular OSes, such as Windows XP, still do not support AHCI.

[edit] Advanced Host Controller Interface


As their standard interface, SATA controllers use the AHCI (Advanced Host Controller
Interface), allowing advanced features of SATA such as hotplug and native command queuing
(NCQ). If AHCI is not enabled by the motherboard and chipset, SATA controllers typically
operate in "IDE emulation" mode which does not allow features of devices to be accessed if the
ATA/IDE standard does not support them.

Windows device drivers that are labeled as SATA are usually running in IDE emulation mode
unless they explicitly state that they are AHCI. While the drivers included with Windows XP do
not support AHCI, AHCI has been implemented by proprietary device drivers.[5] Windows
Vista,[6] FreeBSD, Linux with kernel version 2.6.19 onward,[7] as well as Solaris and OpenSolaris
have native support for AHCI.

[edit] Throughput

The current SATA specifications detail data transfer rates as high as 6.0 Gbit/s per device. SATA
uses only 4 signal lines; cables are more compact and cheaper than PATA. SATA supports hot-
swapping and NCQ.

[edit] SATA 1.5 Gbit/s (First generation)

First-generation SATA interfaces, now known as SATA 1.5 Gbit/s, communicate at a rate of
1.5 Gbit/s. Taking 8b/10b encoding overhead into account, they have an actual uncoded transfer
rate of 1.2 Gbit/s. The theoretical burst throughput of SATA 1.5 Gbit/s is similar to that of
PATA/133, but newer SATA devices offer enhancements such as NCQ which improve
performance in a multitasking environment.
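
The 8b/10b overhead arithmetic mentioned above is easy to reproduce; the snippet below is illustrative only.

    # Effective SATA payload rate after 8b/10b line coding.
    for line_rate_gbit in (1.5, 3.0, 6.0):
        payload_gbit = line_rate_gbit * 8 / 10     # 8 data bits for every 10 bits on the wire
        payload_mbyte = payload_gbit * 1000 / 8    # Gbit/s to MB/s (decimal units)
        print(f"{line_rate_gbit} Gbit/s line rate -> {payload_gbit} Gbit/s -> {payload_mbyte:.0f} MB/s")
    # 1.5 Gbit/s -> 1.2 Gbit/s -> 150 MB/s; 3 Gbit/s -> 300 MB/s; 6 Gbit/s -> 600 MB/s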

As of April 2009 mechanical hard disk drives can transfer data at up to 131 MB/s,[8] which is
within the capabilities of the older PATA/133 specification. However, high-performance flash
drives can transfer data at up to 201 MB/s.[9] SATA 1.5 Gbit/s does not provide sufficient
throughput for these drives.

During the initial period after SATA 1.5 Gbit/s finalization, adapter and drive manufacturers
used a "bridge chip" to convert existing PATA designs for use with the SATA interface.[citation
needed]
Bridged drives have a SATA connector, may include either or both kinds of power
connectors, and generally perform identically to their PATA equivalents. Most lack support for
some SATA-specific features such as NCQ. Bridged products gradually gave way to native
SATA products.[citation needed]

[edit] SATA 3 Gbit/s (Second generation)

Soon after the introduction of SATA 1.5 Gbit/s, a number of shortcomings emerged. At the
application level SATA could handle only one pending transaction at a time—like PATA. The
SCSI interface has long been able to accept multiple outstanding requests and service them in the
order which minimizes response time. This feature, native command queuing (NCQ), was
adopted as an optional supported feature for SATA 1.5 Gbit/s and SATA 3 Gbit/s devices.

First-generation SATA devices operated at best a little faster than parallel ATA/133 devices.
Subsequently, a 3 Gbit/s signaling rate was added to the physical layer (PHY layer), effectively
doubling maximum data throughput from 150 MB/s to 300 MB/s.

For mechanical hard drives, SATA 3 Gbit/s transfer rate is expected to satisfy drive throughput
requirements for some time, as the fastest mechanical drives barely saturate a SATA 1.5 Gbit/s
link. A SATA data cable rated for 1.5 Gbit/s will handle current mechanical drives without any
loss of sustained and burst data transfer performance. However, high-performance flash drives
are approaching SATA 3 Gbit/s transfer rate.
Given the importance of backward compatibility between SATA 1.5 Gbit/s controllers and
SATA 3 Gbit/s devices, SATA 3 Gbit/s autonegotiation sequence is designed to fall back to
SATA 1.5 Gbit/s speed when in communication with such devices. In practice, some older
SATA controllers do not properly implement SATA speed negotiation. Affected systems require
the user to set the SATA 3 Gbit/s peripherals to 1.5 Gbit/s mode, generally through the use of a
jumper; however, some drives lack this jumper. Chipsets known to have this fault include the
VIA VT8237 and VT8237R southbridges, and the VIA VT6420, VT6421A and VT6421L
standalone SATA controllers.[10] SiS's 760 and 964 chipsets also initially exhibited this problem,
though it can be rectified with an updated SATA controller ROM.[citation needed]

[edit] SATA II committee renamed SATA-IO

Popular usage refers to the SATA 3 Gbit/s specification as Serial ATA II (SATA II or SATA2),
contrary to the wishes of the Serial ATA International Organization (SATA-IO) which defines
the standard. SATA II was originally the name of a committee defining updated SATA
standards, of which the 3 Gbit/s standard was just one. However, since it was among the most
prominent features defined by the former SATA II committee, and since the term "II" is
commonly used for successors, the name SATA II became synonymous with the 3 Gbit/s
standard. The group has since changed its name to the Serial ATA International Organization, or
SATA-IO, to avoid further confusion.

[edit] SATA II product marketing

As of 2009, "SATA II" and "SATA 2" are the most common marketing terms for any "second-
generation" SATA drives, controllers or related accessories. Unfortunately, these terms have no
specific meaning, since they are not the proper official nomenclature. Also, the second-
generation SATA standards only define a set of optional features (3 Gb/s, NCQ — Native
Command Queuing, staggered spin-up and hot-plugging) improving on the first generation
technology, but don't require including those features. Almost any SATA product with any set of
features could legitimately be described as "compatible" with these standards. Only careful
research can determine which features may be included in any particular "SATA II" product. [11]
[12]

[edit] SATA 6 Gbit/s (Third generation)

Serial ATA International Organization presented the draft specification of SATA 6 Gbit/s
physical layer in July 2008,[13] and ratified its physical layer specification on August 18, 2008.[14]
The full 3.0 standard was released on May 27, 2009.[15] While even the fastest conventional hard
disk drives can barely saturate the original SATA 1.5 Gbit/s bandwidth, Solid State Disk drives
are close to saturating the SATA 3 Gbit/s limit at 250 MB/s net read speed. Ten channels of fast
flash can actually reach well over 500 MB/s with new ONFI drives, so a move from SATA 3
Gbit/s to SATA 6 Gbit/s would benefit the flash read speeds. As for the standard hard disks, the
reads from their built-in DRAM cache will end up faster across the new interface.[16]

The new specification contains the following changes:


 A new Native Command Queuing (NCQ) streaming command to enable Isochronous data
transfers for bandwidth-hungry audio and video applications.
 An NCQ Management feature that helps optimize performance by enabling host processing and
management of outstanding NCQ commands.
 Improved power management capabilities.
 A small Low Insertion Force (LIF) connector for more compact 1.8-inch storage devices.
 A connector designed to accommodate 7 mm optical disk drives for thinner and lighter
notebooks.
 Alignment with the INCITS ATA8-ACS standard.

The enhancements are generally aimed at improving quality of service for video streaming and
high priority interrupts. In addition, the standard continues to support distances up to a meter.
The new speeds may require higher power consumption for supporting chips, factors that new
process technologies and power management techniques are expected to mitigate. The new
specification can use existing SATA cables and connectors, although some OEMs are expected
to upgrade host connectors for the higher speeds.[17] Also, the new standard is backwards
compatible with SATA 3 Gbit/s.[18]

In order to avoid parallels to the common SATA II misnomer, the SATA-IO has compiled a set of
marketing guidelines for the new specification. The specification should be called Serial ATA
International Organization: Serial ATA Revision 3.0, and the technology itself is to be referred to
as SATA 6 Gbit/s. A product using this standard should be called the SATA 6 Gbit/s [product
name]. The terms SATA III or SATA 3.0, which are considered to cause confusion among
consumers, must not be used.[18]

[edit] Cables and connectors

Connectors and cables present the most visible differences between SATA and parallel ATA
drives. Unlike PATA, the same connectors are used on 3.5 in SATA hard disks for desktop and
server computers and 2.5 in disks for portable or small computers; this allows 2.5 in drives to be
used in desktop computers with only a mounting bracket and no wiring adapter.

There is a special connector (eSATA) specified for external devices, and an optionally
implemented provision for clips to hold internal connectors firmly in place. SATA drives may be
plugged into SAS controllers and communicate on the same physical cable as native SAS disks,
but SATA controllers cannot handle SAS disks.

[edit] Data
Pin #  Function

1      Ground
2      A+ (Transmit)
3      A− (Transmit)
4      Ground
5      B− (Receive)
6      B+ (Receive)
7      Ground
—      Coding notch

A 7-pin Serial ATA right-angle data cable.

The SATA standard defines a data cable with seven conductors (3 grounds and 4 active data
lines in two pairs) and 8 mm wide wafer connectors on each end. SATA cables can have lengths
up to 1 metre (3.3 ft), and connect one motherboard socket to one hard drive. PATA ribbon
cables, in comparison, connect one motherboard socket to up to two hard drives, carry either 40
or 80 wires, and are limited to 45 centimetres (18 in) in length by the PATA specification
(however, cables up to 90 centimetres (35 in) are readily available). Thus, SATA connectors and
cables are easier to fit in closed spaces and reduce obstructions to air cooling. They are more
susceptible to accidental unplugging and breakage than PATA, but cables can be purchased that
have a locking feature, whereby a small (usually metal) spring holds the plug in the socket.

One of the problems associated with the transmission of data at high speed over electrical
connections is loosely described as noise. Despite attempts to avoid it, some electrical coupling
will exist both between data circuits and between them and other circuits. As a result, the data
circuits can both affect other circuits, whether they are within the same piece of equipment or
not, and can be affected by them. Designers use a number of techniques to reduce the undesirable
effects of such unintentional coupling. One such technique used in SATA links is differential
signalling. This is an enhancement over PATA, which uses single-ended signaling. Twisted pair
cabling also gives superior performance in this regard.
[edit] Power supply

[edit] Standard connector


Pin #  Mating  Function

—              Coding notch
1      3rd     3.3 V
2      3rd     3.3 V
3      2nd     3.3 V
4      1st     Ground
5      2nd     Ground
6      2nd     Ground
7      2nd     5 V
8      3rd     5 V
9      3rd     5 V
10     2nd     Ground
11     3rd     Staggered spinup/activity (in supporting drives)
12     1st     Ground
13     2nd     12 V
14     3rd     12 V
15     3rd     12 V

A 15-pin Serial ATA power receptacle. This connector does not provide the extended pins 4 and
12 needed for hot-plugging.

The SATA standard specifies a different power connector than the decades-old four-pin Molex
connector found on pre-SATA devices. Like the data cable, it is wafer-based, but its wider 15-
pin shape prevents accidental mis-identification and forced insertion of the wrong connector
type. Native SATA devices favor the SATA power-connector, although some early SATA drives
retained older 4-pin Molex in addition to the SATA power connector.

SATA features more pins than the traditional connector for several reasons:

 A third voltage is supplied, 3.3 V, in addition to the traditional 5 V and 12 V.
 Each voltage transmits through three pins ganged together, because the small contacts by
themselves cannot supply sufficient current for some devices. (Each pin should be able to
provide 1.5 A.)
 Five pins ganged together provide ground.
 For each of the three voltages, one of the three pins serves for hotplugging. The ground pins and
power pins 3, 7, and 13 are longer on the plug (located on the SATA device) so they will connect
first. A special hot-plug receptacle (on the cable or a backplane) can connect ground pins 4 and
12 first.
 Pin 11 can function for staggered spinup, activity indication, or nothing. Staggered spinup is used
to prevent many drives from spinning up simultaneously, as this may draw too much power.
Activity is an indication of whether the drive is busy, and is intended to give feedback to the user
through a LED.

Adapters exist which can convert a 4-pin Molex connector to a SATA power connector.
However, because the 4-pin Molex connectors do not provide 3.3 V power, these adapters
provide only 5 V and 12 V power and leave the 3.3 V lines unconnected. This precludes the use
of such adapters with drives that require 3.3 V power. Understanding this, drive manufacturers
have largely left the 3.3 V power lines unused.

[edit] Slimline connector

SATA 2.6 first defined the slimline connector, intended for smaller form-factors; e.g., notebook
optical drives.
Pin #  Function

1      Device Present
2–3    5 V
4      Manufacturing Diagnostic
5–6    Ground

A 6-pin Slimline Serial ATA power connector. Note that pin 1 (device present) is shorter than the others.

[edit] Micro connector

The micro connector originated with SATA 2.6. It is intended for 1.8-inch hard drives. There is
also a micro data connector, which is similar to the standard data connector but slightly thinner.

Pin #  Function

1–2    3.3 V
3–4    Ground
5–6    5 V
7      Reserved
8–9    Vendor Specific


[edit] Topology

SATA topology: host – expander – device

SATA uses a point-to-point architecture. The connection between the controller and the storage
device is direct.

Modern PC systems usually have a SATA controller on the motherboard, or installed in a PCI or
PCI Express slot. Most SATA controllers have multiple SATA ports and can be connected to
multiple storage devices. There are also port expanders or multipliers which allow multiple
storage devices to be connected to a single SATA controller port.

[edit] Encoding

These high-speed transmission protocols use a logic encoding known as 8b/10b encoding. The
signal uses non-return to zero (NRZ) encoding with LVDS.

In the 8b/10b encoding the data sequence includes the synchronizing signal. This technique is
known as clock data recovery, because it does not use a separate synchronizing signal. Instead, it
uses the serial signal's 0 to 1 transitions to recover the clock signal.

[edit] External SATA



The official eSATA logo

eSATA, standardized in 2004, provides a variant of SATA meant for external connectivity. It
has revised electrical requirements in addition to incompatible cables and connectors:

 Minimum transmit potential increased: Range is 500–600 mV instead of 400–600 mV.


 Minimum receive potential decreased: Range is 240–600 mV instead of 325–600 mV.
 Identical protocol and logical signaling (link/transport-layer and above), allowing native SATA
devices to be deployed in external enclosures with minimal modification
 Maximum cable length of 2 metres (6.6 ft) (USB and FireWire allow longer distances.)
 The external cable connector equates to a shielded version of the connector specified in SATA
1.0a with these basic differences:
o The external connector has no "L" shaped key, and the guide features are vertically
offset and reduced in size. This prevents the use of unshielded internal cables in external
applications and vice-versa.
o To prevent ESD damage, the design increased insertion depth from 5 mm to 6.6 mm and
the contacts are mounted farther back in both the receptacle and plug.
o To provide EMI protection and meet FCC and CE emission requirements, the cable has
an extra layer of shielding, and the connectors have metal contact-points.
o The connector shield has springs as retention features built in on both the top and
bottom surfaces.
o The external connector and cable have a design-life of over five thousand insertions and
removals, while the internal connector is only specified to withstand fifty.

SATA (left) and eSATA (right) connectors

Aimed at the consumer market, eSATA enters an external storage market already served by the
USB and FireWire interfaces. Most external hard-disk-drive cases with FireWire or USB
interfaces use either PATA or SATA drives and "bridges" to translate between the drives'
interfaces and the enclosures' external ports, and this bridging incurs some inefficiency. Some
single disks can transfer 131 MB/s during real use,[8] more than twice the maximum transfer rate
of USB 2.0 or FireWire 400 (IEEE 1394a) and well in excess of the maximum transfer rate of
FireWire 800, though the S3200 FireWire 1394b spec reaches ~400 MB/s (3.2 Gbit/s). Finally,
some low-level drive features, such as S.M.A.R.T., may not operate through USB or FireWire
bridging. eSATA does not suffer from these issues. USB 3.0's 5.0 Gbit/s and FireWire's future
6.4 Gbit/s will be faster than the original eSATA, but the eSATA version of SATA 6 Gbit/s will
operate at 6.0 Gbit/s, making the differences between them negligible.[19]
HDMI, Ethernet, and eSATA ports on a Sky+ HD Digibox

eSATA can be differentiated from USB 2.0 and FireWire external storage for several reasons. As
of early 2008, the vast majority of mass-market computers have USB ports and many computers
and consumer electronic appliances have FireWire ports, but few devices have external SATA
connectors. For small form-factor devices (such as external 2.5-inch disks), a PC-hosted USB or
FireWire link supplies sufficient power to operate the device. Where a PC-hosted port is
concerned, eSATA connectors cannot supply power, and would therefore be more cumbersome
to use[20].

Owners of desktop computers that lack a built-in eSATA interface can upgrade them with the
installation of an eSATA host bus adapter (HBA), while notebooks can be upgraded with
Cardbus[21] or ExpressCard[22] versions of an eSATA HBA. With passive adapters the maximum
cable length is reduced to 1 metre (3.3 ft) due to the absence of compliant eSATA signal-levels.
Full SATA speed for external disks (115 MB/s) has been measured with external RAID
enclosures.[citation needed]

eSATA may[original research?] attract the enterprise and server market, which has already standardized
on the Serial Attached SCSI (SAS) interface, because of its hotplug capability and low price.

Prior to the final eSATA specification, a number of products existed designed for external
connections of SATA drives. Some of these use the internal SATA connector or even connectors
designed for other interface specifications, such as FireWire. These products are not eSATA
compliant. The final eSATA specification features a specific connector designed for rough
handling, similar to the regular SATA connector, but with reinforcements in both the male and
female sides, inspired by the USB connector. eSATA resists inadvertent unplugging, and can
withstand yanking or wiggling which would break a male SATA connector (the hard-drive or
host adapter, usually fitted inside the computer). With an eSATA connector, considerably more
force is needed to damage the connector, and if it does break it is likely to be the female side, on
the cable itself, which is relatively easy to replace.[citation needed]

[edit] Backward and forward compatibility


[edit] SATA and PATA

At the device level, SATA and PATA (Parallel Advanced Technology Attachment) devices
remain completely incompatible—they cannot be interconnected. At the application level, SATA
devices can be specified to look and act like PATA devices.[23] Many motherboards offer a
"legacy mode" option which makes SATA drives appear to the OS like PATA drives on a
standard controller. This eases OS installation by not requiring a specific driver to be loaded
during setup but sacrifices support for some features of SATA and generally disables some of the
boards' PATA or SATA ports since the standard PATA controller interface only supports 4
drives. (Often which ports are disabled is configurable.)

The common heritage of the ATA command set has enabled the proliferation of low-cost PATA
to SATA bridge-chips. Bridge-chips were widely used on PATA drives (before the completion
of native SATA drives) as well as standalone "dongles." When attached to a PATA drive, a
device-side dongle allows the PATA drive to function as a SATA drive. Host-side dongles allow
a motherboard PATA port to function as a SATA host port.

The market has produced powered enclosures for both PATA and SATA drives which interface
to the PC through USB, Firewire or eSATA, with the restrictions noted above. PCI cards with a
SATA connector exist that allow SATA drives to connect to legacy systems without SATA
connectors.

[edit] SATA 1.5 Gbit/s and SATA 3 Gbit/s

The designers of SATA aimed for backward and forward compatibility with future revisions of
the SATA standard.[24]

According to the hard drive manufacturer Maxtor, motherboard host controllers using the VIA
and SIS chipsets VT8237, VT8237R, VT6420, VT6421L, SIS760, SIS964 found on the ECS
755-A2 manufactured in 2003, do not support SATA 3 Gbit/s drives. Additionally, these host
controllers do not support SATA 3 Gbit/s optical disc drives. To address interoperability
problems, the largest hard drive manufacturer, Seagate/Maxtor, has added a user-accessible
jumper-switch known as the Force 150, to switch between 150 MB/s and 300 MB/s operation.[25]
Users with a SATA 1.5 Gbit/s motherboard with one of the listed chipsets should either buy an
ordinary SATA 1.5 Gbit/s hard disk, buy a SATA 3 Gbit/s hard disk with the user-accessible
jumper, or buy a PCI or PCI-E card to add full SATA 3 Gbit/s capability and compatibility.
Western Digital uses a jumper setting called OPT1 Enabled to force 150 MB/s data transfer
speed. OPT1 is used by putting the jumper on pins 5 & 6.[26]

[edit] Comparisons with other interfaces


[edit] SATA and SCSI

SCSI currently offers transfer rates higher than SATA, but it uses a more complex bus, usually
resulting in higher manufacturing costs. SCSI buses also allow connection of several drives
(using multiple channels, 7 or 15 on each channel), whereas SATA allows one drive per channel,
unless using a port multiplier.

SATA 3 Gbit/s offers a maximum bandwidth of 300 MB/s per device compared to SCSI with a
maximum of 320 MB/s. Also, SCSI drives provide greater sustained throughput than SATA
drives because of disconnect-reconnect and aggregating performance. SATA devices generally
link compatibly to SAS enclosures and adapters, while SCSI devices cannot be directly
connected to a SATA bus.

SCSI, SAS and fibre-channel (FC) drives are typically more expensive so they are traditionally
used in servers and disk arrays where the added cost is justifiable. Inexpensive ATA and SATA
drives evolved in the home-computer market, hence there is a view that they are less reliable. As
those two worlds overlapped, the subject of reliability became somewhat controversial. Note
that, generally, the failure rate of a disk drive is related to the quality of its heads, platters and
supporting manufacturing processes, not to its interface.

[edit] SATA in comparison to other buses

Name | Raw bandwidth (Mbit/s) | Transfer speed (MB/s) | Max. cable length (m) | Power provided | Devices per channel
eSATA | 3,000 | 300 | 2 with eSATA HBA (1 with passive adapter) | No[27] | 1 (15 with port multiplier)
SATA 300 | 3,000 | 300 | 1 | No | 1 (15 with port multiplier)
SATA 150 | 1,500 | 150 | 1 | No | 1 per line
PATA 133 | 1,064 | 133 | 0.46 (18 in) | No | 2
SAS 300 | 3,000 | 300 | 8 | No | 1 (16k with expanders)
SAS 150 | 1,500 | 150 | 8 | No | 1 (16k with expanders)
FireWire 3200 | 3,144 | 393 | 100; alternate cables available for >100 m | 15 W, 12–25 V | 63 (with hub)
FireWire 800 | 786 | 98.25 | 100[28] | 15 W, 12–25 V | 63 (with hub)
FireWire 400 | 393 | 49.13 | 4.5[28][29] | 15 W, 12–25 V | 63 (with hub)
USB 3.0* | 5,000 | 625 | 3[30] | 4.5 W, 5 V | 127 (with hub)[30]
USB 2.0 | 480 | 60 | 5[31] | 2.5 W, 5 V | 127 (with hub)
Ultra-320 SCSI | 2,560 | 320 | 12 | No | 15 (plus the HBA)
Fibre Channel over optic fiber | 10,520 | 2,000 | 2–50,000 | No | 126 (16,777,216 with switches)
Fibre Channel over copper cable | 4,000 | 400 | 12 | No | 126 (16,777,216 with switches)
InfiniBand 12× Quad-rate | 120,000 | 12,000 | 5 (copper)[32][33]; <10,000 (fiber) | No | 1 (point to point); many with switched fabric

* USB 3.0 specification released to hardware vendors 17 November 2008.

Unlike PATA, both SATA and eSATA support hot-swapping by design. However, this feature
requires proper support at the host, device (drive), and operating-system level. In general, all
SATA devices (drives) support hot-swapping (due to the requirements on the device-side), but
requisite support is less common on SATA host adapters.[1]

SCSI-3 devices with SCA-2 connectors are designed for hot-swapping. Many server and RAID
systems provide hardware support for transparent hot-swapping. The designers of the SCSI
standard prior to SCA-2 connectors did not target hot-swapping, but, in practice, most RAID
implementations support hot-swapping of hard disks.

Serial Attached SCSI (SAS) is designed for hot-swapping.

Parallel ATA

Parallel ATA

ATA connector on the left, with two motherboard ATA connectors on the right.

Type            Internal storage device connector

Production history

Designer        Western Digital, subsequently amended by many others
Designed        1986
Superseded by   Serial ATA (2003)

Specifications

Hot pluggable   No
External        No
Width           16 bits
Bandwidth       16 MB/s originally, later 33, 66, 100 and 133 MB/s
Max. devices    2 (master/slave)
Protocol        Parallel
Cable           40- or 80-wire ribbon cable
Pins            40

Pin out

Pin 1 Reset

Pin 2 Ground

Pin 3 Data 7

Pin 4 Data 8

Pin 5 Data 6

Pin 6 Data 9

Pin 7 Data 5

Pin 8 Data 10

Pin 9 Data 4

Pin 10 Data 11

Pin 11 Data 3
Pin 12 Data 12

Pin 13 Data 2

Pin 14 Data 13

Pin 15 Data 1

Pin 16 Data 14

Pin 17 Data 0

Pin 18 Data 15

Pin 19 Ground

Pin 20 Key or VCC_in

Pin 21 DDRQ

Pin 22 Ground

Pin 23 I/O write

Pin 24 Ground

Pin 25 I/O read

Pin 26 Ground

Pin 27 IOCHRDY

Pin 28 Cable select

Pin 29 DDACK

Pin 30 Ground

Pin 31 IRQ

Pin 32 No connect

Pin 33 Addr 1
Pin 34 GPIO_DMA66_Detect

Pin 35 Addr 0

Pin 36 Addr 2

Pin 37 Chip select 1P

Pin 38 Chip select 3P

Pin 39 Activity

Pin 40 Ground

Parallel ATA (PATA) is an interface standard for the connection of storage devices such as
hard disks, solid-state drives, and CD-ROM drives in computers. The standard is maintained by
the X3/INCITS committee.[1] It uses the underlying AT Attachment and AT Attachment Packet
Interface (ATA/ATAPI) standards.

The current Parallel ATA standard is the result of a long history of incremental technical
development. ATA/ATAPI is an evolution of the AT Attachment Interface, which was itself
evolved in several stages from Western Digital's original Integrated Drive Electronics (IDE)
interface. As a result, many near-synonyms for ATA/ATAPI and its previous incarnations exist,
including abbreviations such as IDE which are still in common informal use. After the market
introduction of Serial ATA in 2003, the original ATA was retroactively renamed Parallel ATA.

Parallel ATA only allows cable lengths up to 18 in (460 mm). Because of this length limit the
technology normally appears as an internal computer storage interface. For many years ATA
provided the most common and the least expensive interface for this application. By the
beginning of 2007, it had largely been replaced by Serial ATA (SATA) in new systems.

Contents
[hide]

 1 History and terminology


o 1.1 IDE and ATA-1
o 1.2 The second ATA interface
o 1.3 EIDE and ATA-2
o 1.4 ATAPI
o 1.5 Current terminology
o 1.6 Drive size limitations
 2 Parallel ATA interface
o 2.1 Pin 20
o 2.2 Pin 28
o 2.3 Pin 34
o 2.4 Differences between connectors on 80 conductor cables
o 2.5 Multiple devices on a cable
o 2.6 Cable select
 2.6.1 Master and slave clarification
o 2.7 Serialized, overlapped, and queued operations
o 2.8 Two devices on one cable — speed impact
 2.8.1 "Lowest speed"
 2.8.2 "One operation at a time"
o 2.9 HDD Passwords and Security
o 2.10 External Parallel ATA devices
 3 ATA standards versions, transfer rates, and features
 4 Related standards, features, and proposals
o 4.1 ATAPI Removable Media Device (ARMD)
o 4.2 ATA over Ethernet
 5 See also
 6 References

[edit] History and terminology


The name of the standard was originally conceived as "PC/AT Attachment" as its primary
feature was a direct connection to the 16-bit ISA bus introduced with the IBM PC/AT. The name
was shortened to "AT Attachment" to avoid possible trademark issues. It is not spelled out as
"Advanced Technology" anywhere in current or recent versions of the specification; it is simply
"AT Attachment".

[edit] IDE and ATA-1

The first version of what is now called the ATA/ATAPI interface was developed by Western
Digital under the name Integrated Drive Electronics (IDE). Together with Control Data
Corporation (who manufactured the hard drive part) and Compaq Computer (into whose systems
these drives would initially go), they developed the connector, the signalling protocols, and so on
with the goal of remaining software compatible with the existing ST-506 hard drive interface.[2]
The first such drives appeared in Compaq PCs in 1986.[3] [4]

The term Integrated Drive Electronics (IDE) refers not just to the connector and interface
definition, but also to the fact that the drive controller is integrated into the drive, as opposed to a
separate controller on or connected to the motherboard.[5] The integrated controller presented the
drive to the host computer as an array of 512-byte blocks with a relatively simple command
interface. This relieved the software in the host computer of the chores of stepping the disk head
arm, moving the head arm in and out, and so on, as had to be done with earlier ST-506 and ESDI
hard drives. All of these low-level details of the mechanical operation of the drive were now
handled by the controller on the drive itself. This also eliminated the need to design a single
controller that could handle many different types of drives, since the controller could be unique
for the drive. The host need only ask for a particular sector, or block, to be read or written, and
either accept the data from the drive or send the data to it.

The interface used by these IDE drives was standardized in 1994 as ANSI standard X3.221-
1994, AT Attachment Interface for Disk Drives. After later versions of the standard were
developed, this became known as "ATA-1".[6][7]

[edit] The second ATA interface

Early PCs originally had only one ATA controller, which could support up to two hard drives. At
the time, in combination with the floppy drive, this was sufficient for most people, and
eventually it became common to have two hard drives installed. When the CDROM was
developed, many computers were unable to accept one because they already had two hard drives
installed; adding the CDROM would have required removal of one of the drives.

SCSI was available as a CDROM expansion option at the time, but devices with SCSI were more
expensive than ATA devices due to the need for a smart controller that is capable of bus
arbitration. SCSI typically added US$ 100-300 to the cost of a storage device, in addition to the
cost of a SCSI controller.

The less-expensive solution was the addition of the second ATA interface, typically included as
an expansion option on a sound card. It was included on the sound card because early business
PCs did not include support for more than simple beeps from the internal speaker, and tuneful
sound playback was considered unnecessary for early business software. When the CDROM was
introduced, it was logical to also add digital audio to the computer at the same time. An older
business PC could be upgraded in this manner to meet the Multimedia PC standard for early
software packages that used sound and colorful video animation.

The second drive interface initially was not well-defined. It was first introduced with modified
controller interfaces specific to certain CDROM drives such as Mitsumi, Sony or Panasonic[8],
and it was common to find early sound cards with two or three separate connectors each
designed to match a certain brand of CDROM drive. This evolved into the standard ATA
interface for ease of cross-compatibility, though the sound card ATA interface still usually
supported only a single CDROM and not hard drives.

This second ATA interface on the sound card eventually evolved into the second motherboard
ATA interface which was long included as a standard component in all PCs. For a long period of
time, ATA ruled as the primary storage device interface and in some systems a third and fourth
motherboard interface was provided (Promise Ultra-100), for up to eight ATA devices attached
to the motherboard.

After the introduction of SATA or Serial ATA, use of ATA declined and motherboards began to
be shipped with only a single interface, for up to two ATA optical drives, along with two or more
SATA connectors for hard drives. Optical drives are now available with SATA, so the ATA
interface often goes unused.
[edit] EIDE and ATA-2

In 1994, about the same time that the ATA-1 standard was adopted, Western Digital introduced
drives under a slightly new name, Enhanced IDE (EIDE). These included most of the features
of the forthcoming ATA-2 specification and several additional enhancements. Other
manufacturers introduced their own variations of ATA-1 such as "Fast ATA" and "Fast ATA-2".

The new version of the ANSI standard, AT Attachment Interface with Extensions ATA-2
(X3.279-1996), was approved in 1996. It included most of the features of the manufacturer-
specific variants.[9][10]

ATA-2 also was the first to note that devices other than hard drives could be attached to the
interface:

3.1.7 Device: Device is a storage peripheral. Traditionally, a device on the ATA interface has been a
hard disk drive, but any form of storage device may be placed on the ATA interface provided it adheres to
this standard.
—from [10], page 2

[edit] ATAPI

As mentioned in the previous sections ATA was originally designed for and worked only with
hard disks and devices that could emulate them. The introduction of ATAPI (ATA Packet
Interface) by a group called the Small Form Factor committee allowed ATA to be used for a
variety of other devices that require functions beyond those necessary for hard disks. For
example, any removable media device needs a "media eject" command, and a way for the host to
determine whether the media is present, and these were not provided in the ATA protocol.

The Small Form Factor committee approached this problem by defining ATAPI, the "ATA
Packet Interface". ATAPI is actually a protocol allowing the ATA interface to carry SCSI
commands and responses; therefore all ATAPI devices are actually "speaking SCSI" other than
at the electrical interface. In fact, some early ATAPI devices were simply SCSI devices with an
ATA/ATAPI to SCSI protocol converter added on. The SCSI commands and responses are
embedded in "packets" (hence "ATA Packet Interface") for transmission on the ATA cable. This
allows any device class for which a SCSI command set has been defined to be interfaced via
ATA/ATAPI.

ATAPI devices are also "speaking ATA", as the ATA physical interface and protocol are still
being used to send the packets. On the other hand, ATA hard drives and solid state drives do not
use ATAPI.

ATAPI devices include CD-ROM and DVD-ROM drives, tape drives, and large-capacity floppy
drives such as the Zip drive and SuperDisk drive.

The SCSI commands and responses used by each class of ATAPI device (CD-ROM, tape, etc.)
are described in other documents or specifications specific to those device classes and are not
within ATA/ATAPI or the T13 committee's purview.
ATAPI was adopted as part of ATA in INCITS 317-1998, AT Attachment with Packet Interface
Extension (ATA/ATAPI-4).[11][12][13]

[edit] Current terminology

The terms "integrated drive electronics" (IDE), "enhanced IDE" and "EIDE" have come to be
used interchangeably with ATA (now Parallel ATA). However the terms "IDE" and "EIDE" are
at best imprecise. Every ATA drive is an "integrated drive electronics" drive, but SCSI drives
could also legitimately be described as having "integrated drive electronics". However, the
abbreviation IDE is rarely if ever used for SCSI drives.

In addition there have been several generations of "EIDE" drives marketed, compliant with
various versions of the ATA specification. An early "EIDE" drive might be compatible with
ATA-2, while a later one might comply with ATA-6.

Nevertheless a request for an "IDE" or "EIDE" drive from a computer parts vendor will almost
always yield a drive that will work with modern systems' ATA interfaces.

Another common usage is to refer to the specification version by the fastest mode supported. For
example, ATA-4 supported Ultra DMA modes 0 through 2, the latter providing a maximum
transfer rate of 33 megabytes per second. ATA-4 drives are thus sometimes called "UDMA-33"
drives. Similarly, ATA-6 introduced a maximum transfer speed of 100 megabytes per second,
and some drives complying with this version of the standard are marketed as "PATA/100" drives.

[edit] Drive size limitations

The original ATA specification used a 28-bit addressing mode, allowing for the addressing of 2^28
(268,435,456) sectors (blocks) of 512 bytes each, resulting in a maximum capacity of about
137 gigabytes[14]. (This is displayed by Windows operating systems as "128 GB".) The BIOS in
early PCs imposed smaller limits such as 8.46 GB, with a maximum of 1024 cylinders, 256
heads and 63 sectors, but this was not a limit imposed by the ATA interface.

ATA-6 introduced 48-bit addressing, increasing the limit to 144 petabytes. As a consequence,
any ATA drive of capacity larger than 137 gigabytes must be an ATA-6 or later drive.
Connecting such a drive to a host with an ATA-5 or earlier interface will limit the usable
capacity to the maximum of the controller.

Some OSs, including Windows 2000 pre-SP4, disable 48-bit LBA by default, requiring the user
to take extra steps to use the entire capacity of an ATA drive larger than 137 gigabytes. [15]
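
The two capacity limits follow directly from the sector count; a minimal Python sketch of the arithmetic (assuming the 512-byte sectors mentioned above):

# A quick arithmetic check of the limits above, assuming 512-byte sectors.
SECTOR = 512

cap_28 = 2**28 * SECTOR   # 28-bit LBA (ATA-1 through ATA-5)
cap_48 = 2**48 * SECTOR   # 48-bit LBA (ATA-6 and later)

print(f"28-bit LBA: {cap_28:,} bytes = {cap_28 / 10**9:.1f} GB = {cap_28 / 2**30:.0f} GiB")
print(f"48-bit LBA: {cap_48:,} bytes = {cap_48 / 10**15:.1f} PB")
# 28-bit LBA: 137,438,953,472 bytes = 137.4 GB = 128 GiB  (reported by Windows as "128 GB")
# 48-bit LBA: 144,115,188,075,855,872 bytes = 144.1 PB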

[edit] Parallel ATA interface


Until the introduction of Serial ATA, 40-pin connectors generally attached drives to a ribbon
cable. Each cable has two or three connectors, one of which plugs into an adapter interfacing
with the rest of the computer system. The remaining connector(s) plug into drives. Parallel ATA
cables transfer data 16 bits at a time.

ATA's ribbon cables had 40 wires for most of the interface's history (44 conductors for the smaller
form-factor version used for 2.5" drives), but an 80-wire version appeared with the introduction
of the Ultra DMA/33 (UDMA) mode. All of the additional wires in the new cable are ground
wires, interleaved with the previously defined wires to reduce the effects of capacitive coupling
between neighboring signal wires, thereby reducing crosstalk. Capacitive coupling is more of a problem
at higher transfer rates, and this change was necessary to enable the 66 megabytes per second
(MB/s) transfer rate of UDMA4 to work reliably. The faster UDMA5 and UDMA6 modes also
require 80-conductor cables.

ATA cables:
40 wire ribbon cable (top)
80 wire ribbon cable (bottom)

Though the number of wires doubled, the number of connector pins and the pinout remain the
same as 40-conductor cables, and the external appearance of the connectors is identical.
Internally the connectors are different; the connectors for the 80-wire cable connect a larger
number of ground wires to a smaller number of ground pins, while the connectors for the 40-wire
cable connect ground wires to ground pins one-for-one. 80-wire cables usually come with three
differently colored connectors (blue — controller, gray — slave drive, and black — master
drive) as opposed to the uniformly colored connectors of 40-wire cables (all black). The gray
connector on 80-conductor cables has pin 28 (CSEL) not connected, making it the slave position
for drives configured for cable select.
[edit] Pin 20

In the ATA standard pin 20 is defined as (mechanical) key and is not used; i.e., this socket on the
female connector is often obstructed, and a cable or drive connector with a pin in this position
cannot be connected, making it impossible to plug in a connector the wrong way round.
However, some flash memory drives can use pin 20 as VCC_in to power the drive without
requiring a special power cable[16].

[edit] Pin 28

Pin 28 of the gray (slave/middle) connector of an 80 conductor cable is not attached to any
conductor of the cable. It is attached normally on the black (master drive end) and blue
(motherboard end) connectors.

[edit] Pin 34

Pin 34 is connected to ground inside the blue connector of an 80 conductor cable but not attached
to any conductor of the cable. It is attached normally on the gray and black connectors. See page
315 of [1].

[edit] Differences between connectors on 80 conductor cables


The image shows PATA connectors after removal of strain relief, cover, and cable. Pin one is at
bottom left of the connectors, pin 2 is top left, etc., except that the lower image of the blue
connector shows the view from the opposite side, and pin one is at top right.

Each contact comprises a pair of points which together pierce the insulation of the ribbon cable
with such precision that they make a connection to the desired conductor without harming the
insulation on the neighboring wires. The center row of contacts are all connected to the common
ground bus and attached to the odd numbered conductors of the cable. The top row of contacts
are the even-numbered sockets of the connector (mating with the even-numbered pins of the
receptacle) and attach to every other even-numbered conductor of the cable. The bottom row of
contacts are the odd-numbered sockets of the connector (mating with the odd-numbered pins of
the receptacle) and attach to the remaining even-numbered conductors of the cable.

Note the connections to the common ground bus from sockets 2 (top left), 19 (center bottom
row), 22, 24, 26, 30, and 40 on all connectors. Also note (enlarged detail, bottom, looking from
the opposite side of the connector) that socket 34 of the blue connector does not contact any
conductor but unlike socket 34 of the other two connectors, it does connect to the common
ground bus. On the gray connector, note that socket 28 is completely missing, so that pin 28 of
the drive attached to the gray connector will be open. On the black connector, sockets 28 and 34
are completely normal, so that pins 28 and 34 of the drive attached to the black connector will be
connected to the cable. Pin 28 of the black drive reaches pin 28 of the host receptacle but not pin
28 of the gray drive, while pin 34 of the black drive reaches pin 34 of the gray drive but not pin
34 of the host. Instead, pin 34 of the host is grounded.

The standard dictates color-coded connectors for easy identification by both installer and cable
maker. All three connectors are different from one another. The blue (host) connector has the
socket for pin 34 connected to ground inside the connector but not attached to any conductor of
the cable. Since the old 40 conductor cables do not ground pin 34, the presence of a ground
connection indicates that an 80 conductor cable is installed. The wire for pin 34 is attached
normally on the other types and is not grounded. Installing the cable backwards (with the black
connector on the system board, the blue connector on the remote device and the gray connector
on the center device) will ground pin 34 of the remote device and connect host pin 34 through to
pin 34 of the center device. The gray center connector omits the connection to pin 28 but
connects pin 34 normally, while the black end connector connects both pins 28 and 34 normally.

[edit] Multiple devices on a cable

If two devices attach to a single cable, one must be designated as device 0 (commonly referred to
as master) and the other as device 1 (slave). This distinction is necessary to allow both drives to
share the cable without conflict. The master drive is the drive that usually appears "first" to the
computer's BIOS and/or operating system. On old BIOSes (Intel 486 era and older), the drives
are often referred to by the BIOS as "C" for the master and "D" for the slave following the way
DOS would refer to the active primary partitions on each.

The mode that a drive must use is often set by a jumper setting on the drive itself, which must be
manually set to master or slave. If there is a single device on a cable, it should be configured as
master. However, some hard drives have a special setting called single for this configuration
(Western Digital, in particular). Also, depending on the hardware and software available, a single
drive on a cable can work reliably even though configured as the slave drive (this configuration
is most often seen when a CD ROM has a channel to itself).

[edit] Cable select

A drive mode called cable select was described as optional in ATA-1 and has come into fairly
widespread use with ATA-5 and later. A drive set to "cable select" automatically configures
itself as master or slave, according to its position on the cable. Cable select is controlled by pin
28. The host adapter grounds this pin; if a device sees that the pin is grounded, it becomes the
master device; if it sees that pin 28 is open, the device becomes the slave device.
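
Restated as a minimal sketch (illustrative only, not drive firmware), the cable-select rule is just a function of the pin 28 state:

# A trivial restatement of the cable-select decision described above.
def cable_select_role(pin28_grounded: bool) -> str:
    """Return the role a cable-select drive assumes from the state of pin 28."""
    return "master (device 0)" if pin28_grounded else "slave (device 1)"

print(cable_select_role(True))    # master (device 0)
print(cable_select_role(False))   # slave (device 1)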

This setting is usually chosen by a jumper setting on the drive called "cable select", usually
marked CS, which is separate from the "master" or "slave" setting.

Note that if two drives are configured as master and slave manually, this configuration does not
need to correspond to their position on the cable. Pin 28 is only used to let the drives know their
position on the cable; it is not used by the host when communicating with the drives.

With the 40-wire cable it was very common to implement cable select by simply cutting the pin
28 wire between the two device connectors; putting the slave device at the end of the cable, and
the master on the middle connector. This arrangement eventually was standardized in later
versions. If there is just one device on the cable, this results in an unused stub of cable, which is
undesirable for physical convenience and electrical reasons. The stub causes signal reflections,
particularly at higher transfer rates.

Starting with the 80-wire cable defined for use in ATAPI5/UDMA4, the master device goes at
the end of the 18-inch (460 mm) cable--the black connector--and the slave device goes on the
middle connector--the gray one--and the blue connector goes onto the motherboard. So, if there
is only one (master) device on the cable, there is no cable stub to cause reflections. Also, cable
select is now implemented in the slave device connector, usually simply by omitting the contact
from the connector body.

[edit] Master and slave clarification

Although they are in common use, the terms master and slave do not actually appear in current
versions of the ATA specifications. The two devices are correctly referred to as device 0 (master)
and device 1 (slave), respectively. It is a common myth that the controller on the master drive
assumes control over the slave drive, or that the master drive may claim priority of
communication over the other device on the channel. In fact, the drivers in the host operating
system perform the necessary arbitration and serialization, and each drive's controller operates
independently.
The terms "master" and "slave" have not been without controversy. In 2003, the County of Los
Angeles, California, US demanded that suppliers stop using the terms because the county found
them unacceptable in light of its "cultural diversity and sensitivity." [2]

[edit] Serialized, overlapped, and queued operations

The parallel ATA protocols up through ATA-3 require that once a command has been given on
an ATA interface, it must complete before any subsequent command may be given. Operations
on the devices must be serialized—with only one operation in progress at a time—with respect to
the ATA host interface. A useful mental model is that the host ATA interface is busy with the
first request for its entire duration, and therefore can not be told about another request until the
first one is complete. The function of serializing requests to the interface is usually performed by
a device driver in the host operating system.

The ATA-4 and subsequent versions of the specification have included an "overlapped feature
set" and a "queued feature set" as optional features. However, support for these is extremely rare
in actual parallel ATA products and device drivers. By contrast, overlapped and queued
operations have been common in other storage buses. In particular, tagged command queuing is
characteristic of SCSI; this has long been seen as a major advantage of SCSI.

The Serial ATA standard has supported native command queueing since its first release, but it is
an optional feature for both host-adapters and target-devices. Many less expensive PC
motherboards do not support NCQ. Nearly all SATA/II hard drives sold today support NCQ,
while very few removable (CD/DVD) drives do.

[edit] Two devices on one cable — speed impact

There are many debates about how much a slow device can impact the performance of a faster
device on the same cable. There is an effect, but the debate is confused by the blurring of two
quite different causes, called here "Lowest speed" and "One operation at a time".

[edit] "Lowest speed"

It is a common misconception that, if two devices of different speed capabilities are on the same
cable, both devices' data transfers will be constrained to the speed of the slower device.

For all modern ATA host adapters this is not true, as modern ATA host adapters support
independent device timing. This allows each device on the cable to transfer data at its own best
speed. Even with older adapters without independent timing, this effect only applies to the data
transfer phase of a read or write operation. This is usually the shortest part of a complete read or
write operation. [17]

[edit] "One operation at a time"

This is caused by the omission of both overlapped and queued feature sets from most parallel
ATA products. Only one device on a cable can perform a read or write operation at one time,
therefore a fast device on the same cable as a slow device under heavy use will find it has to
wait for the slow device to complete its task first.

However, most modern devices will report write operations as complete once the data is stored in
their onboard cache memory, before the data is written to the (slow) magnetic storage. This allows
commands to be sent to the other device on the cable, reducing the impact of the "one operation
at a time" limit.

The impact of this on a system's performance depends on the application. For example, when
copying data from an optical drive to a hard drive (such as during software installation), this
effect probably doesn't matter: Such jobs are necessarily limited by the speed of the optical drive
no matter where it is. But if the hard drive in question is also expected to provide good
throughput for other tasks at the same time, it probably should not be on the same cable as the
optical drive.

[edit] HDD Passwords and Security

The disk lock is a built-in security feature in the disk. It is part of the ATA specification, and
thus not specific to any brand or device.

A disk always has two passwords: a User password and a Master password. Most disks support
a Master Password Revision Code. Reportedly some disks can tell you if the Master password
has been changed, or if it is still the factory default. The revision code is word 92 in the IDENTIFY
response. Reportedly, on some disks a value of 0xFFFE means the Master password is
unchanged. The standard does not distinguish this value.

A disk can be locked in two modes: High security mode or Maximum security mode. Bit 8 in
word 128 of the IDENTIFY response tells you which mode your disk is in: 0 = High, 1 =
Maximum.

In High security mode, you can unlock the disk with either the User or Master password, using
the "SECURITY UNLOCK DEVICE" ATA command. There is an attempt limit, normally set to
5, after which you must power cycle or hard-reset the disk before you can attempt again. Also in
High security mode the SECURITY ERASE UNIT command can be used with either the User or
Master password.

In Maximum security mode, you cannot unlock the disk without the User password - the only
way to get the disk back to a usable state is to issue the SECURITY ERASE PREPARE
command, immediately followed by SECURITY ERASE UNIT. In Maximum security mode the
SECURITY ERASE UNIT command requires the User password and will completely erase all
data on the disk. The operation is rather slow, expect half an hour or more for big disks. (Word
89 in the IDENTIFY response indicates how long the operation will take.) [18][19]
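
A hedged sketch of how host software might interpret the IDENTIFY words cited above (the word numbers come from the text; the helper and its input list are illustrative, not a real driver API):

# Decode only the fields described above: word 128 bit 8 (security level),
# word 92 (Master Password Revision Code) and word 89 (erase duration).
# `identify` is assumed to be the 256-word IDENTIFY response already read from the drive.
def security_summary(identify):
    return {
        "mode": "Maximum" if identify[128] & (1 << 8) else "High",  # word 128, bit 8
        "master_password_revision": identify[92],  # 0xFFFE reportedly means "unchanged"
        "erase_time_raw": identify[89],  # encoding of the duration is defined by the ATA spec
    }

words = [0] * 256
words[128] = 1 << 8          # pretend the drive reports Maximum security mode
print(security_summary(words))
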
[edit] External Parallel ATA devices

It is extremely uncommon to find external PATA devices that directly use the interface for
connection to a computer. PATA is primarily restricted to devices installed internally, due to the
short data cable specification. A device connected externally needs additional cable length to
form a U-shaped bend so that the external device may be placed alongside, or on top of the
computer case, and the standard cable length is too short to permit this.

For ease of reach from motherboard to device, the connectors tend to be positioned towards the
front edge of motherboards, for connection to devices protruding from the front of the computer
case. This front-edge position makes extension out the back to an external device even more
difficult. Ribbon cables are poorly shielded, and the standard relies upon the cabling to be
installed inside a shielded computer case to meet RF emissions limits.

All external PATA devices, such as external hard drives, use some other interface technology to
bridge the distance between the external device and the computer. USB is the most common
external interface, followed by FireWire. A bridge chip inside the external device converts from
the USB interface to PATA, and typically only supports a single external device without cable
select or master/slave.

A side effect of the PATA bridge chip is that most bridges do not properly support PATA device
idle and power save, which causes external hard drives to spin continuously even when the
connected computer is in standby or turned off. This can result in shortened hard drive lifespan
due to continuous operation at all times when the external device is powered.

[edit] ATA standards versions, transfer rates, and features


The following table shows the names of the versions of the ATA standards and the transfer
modes and rates supported by each. Note that the transfer rate for each mode (for example,
66.7 MB/s for UDMA4, commonly called "Ultra-DMA 66") gives its maximum theoretical
transfer rate on the cable. This is simply two bytes multiplied by the effective clock rate, and
presumes that every clock cycle is used to transfer end-user data. In practice, of course, protocol
overhead reduces this value.
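
A short sketch of that calculation: two bytes per transfer multiplied by the effective clock rate (the clock values here are inferred from the quoted MB/s figures, not taken from the standard text):

def burst_rate_mb_per_s(effective_clock_hz):
    # Two bytes move per effective clock cycle on the parallel ATA cable.
    return 2 * effective_clock_hz / 1e6

for name, clock_hz in [("UDMA2 (ATA/33)", 16.67e6),
                       ("UDMA4 (ATA/66)", 33.33e6),
                       ("UDMA6 (ATA/133)", 66.67e6)]:
    print(f"{name}: {burst_rate_mb_per_s(clock_hz):.1f} MB/s")
# UDMA2 (ATA/33): 33.3 MB/s, UDMA4 (ATA/66): 66.7 MB/s, UDMA6 (ATA/133): 133.3 MB/s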

Congestion on the host bus to which the ATA adapter is attached may also limit the maximum
burst transfer rate. For example, the maximum data transfer rate for conventional PCI bus is
133 MB/s, and this is shared among all active devices on the bus.

In addition, no ATA hard drives existed in 2005 that were capable of measured sustained transfer
rates of above 80 MB/s. Furthermore, sustained transfer rate tests do not give realistic throughput
expectations for most workloads: They use I/O loads specifically designed to encounter almost
no delays from seek time or rotational latency. Hard drive performance under most workloads is
limited first and second by those two factors; the transfer rate on the bus is a distant third in
importance. Therefore, transfer speed limits above 66 MB/s really affect performance only when
the hard drive can satisfy all I/O requests by reading from its internal cache — a very unusual
situation, especially considering that such data is usually already buffered by the operating
system.

As of April 2009 mechanical hard disk drives can transfer data at up to 131 MB/s,[20] which is
within the capabilities of the older PATA/133 specification. However, high-performance flash
drives can transfer data at up to 201 MB/s.[21]

Only the Ultra DMA modes use CRC to detect errors in data transfer between the controller and
drive. This is a 16-bit CRC, and it is used for data blocks only. Transmission of command and
status blocks does not use the fast signaling methods that would necessitate CRC. For comparison,
in Serial ATA, a 32-bit CRC is used for both commands and data. [22]
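
To illustrate the kind of check involved, here is a generic CRC-16 sketch using the polynomial x^16 + x^12 + x^5 + 1 (0x1021) and seed 0x4ABA commonly cited for Ultra DMA; the exact word and bit ordering used on the cable is defined by the ATA standard and is not reproduced here, so treat this as illustrative only:

def crc16_ccitt(data, crc=0x4ABA):
    # Bitwise CRC-16 over a byte string, MSB first.
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ 0x1021) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc

print(hex(crc16_ccitt(b"\x00\x01\x02\x03")))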

Standard: pre-ATA
  Other names: IDE
  Transfer modes (MB/s): PIO 0
  Maximum disk size: 2.1 GB
  Other new features: 22-bit logical block addressing (LBA)
  ANSI reference: -

Standard: ATA-1
  Other names: ATA, IDE
  Transfer modes (MB/s): PIO 0, 1, 2 (3.3, 5.2, 8.3); Single-word DMA 0, 1, 2 (2.1, 4.2, 8.3); Multi-word DMA 0 (4.2)
  Maximum disk size: 137 GB
  Other new features: 28-bit logical block addressing (LBA)
  ANSI reference: X3.221-1994 (obsolete since 1999)

Standard: ATA-2
  Other names: EIDE, Fast ATA, Fast IDE, Ultra ATA
  Transfer modes (MB/s): PIO 3, 4 (11.1, 16.6); Multi-word DMA 1, 2 (13.3, 16.7)
  Maximum disk size: -
  Other new features: PCMCIA connector
  ANSI reference: X3.279-1996 (obsolete since 2001)

Standard: ATA-3
  Other names: EIDE
  Transfer modes (MB/s): Single-word DMA modes dropped [3]
  Maximum disk size: -
  Other new features: S.M.A.R.T., Security, 44-pin connector for 2.5" drives
  ANSI reference: X3.298-1997 (obsolete since 2002)

Standard: ATA/ATAPI-4
  Other names: ATA-4, Ultra ATA/33 aka UDMA/33
  Transfer modes (MB/s): Ultra DMA 0, 1, 2 (16.7, 25.0, 33.3)
  Maximum disk size: -
  Other new features: AT Attachment Packet Interface (ATAPI) (support for CD-ROM, tape drives etc.), optional overlapped and queued command set features, Host Protected Area (HPA), CompactFlash Association (CFA) feature set for solid state drives
  ANSI reference: NCITS 317-1998

Standard: ATA/ATAPI-5
  Other names: ATA-5, Ultra ATA/66 aka UDMA/66
  Transfer modes (MB/s): Ultra DMA 3, 4 (44.4, 66.7)
  Maximum disk size: -
  Other new features: 80-wire cables; CompactFlash connector
  ANSI reference: NCITS 340-2000

Standard: ATA/ATAPI-6
  Other names: ATA-6, Ultra ATA/100 aka UDMA/100
  Transfer modes (MB/s): Ultra DMA 5 (100)
  Maximum disk size: 144 PB
  Other new features: 48-bit LBA, Device Configuration Overlay (DCO), Automatic Acoustic Management
  ANSI reference: NCITS 361-2002

Standard: ATA/ATAPI-7
  Other names: ATA-7, Ultra ATA/133 aka UDMA/133, SATA/150
  Transfer modes (MB/s): Ultra DMA 6 (133)
  Maximum disk size: -
  Other new features: SATA 1.0, Streaming feature set, long logical/physical sector feature set for non-packet devices
  ANSI reference: NCITS 397-2005 (vol 1), NCITS 397-2005 (vol 2), NCITS 397-2005 (vol 3)

Standard: ATA/ATAPI-8
  Other names: ATA-8
  Transfer modes (MB/s): -
  Maximum disk size: -
  Other new features: Hybrid drive featuring non-volatile cache to speed up critical OS files
  ANSI reference: In progress

[edit] Related standards, features, and proposals


[edit] ATAPI Removable Media Device (ARMD)

ATAPI devices with removable media, other than CD and DVD drives, are classified as ARMD
(ATAPI Removable Media Device) and can appear as either a super-floppy (non-partitioned
media) or a hard drive (partitioned media) to the operating system. These can be supported as
bootable devices by a BIOS complying with the ATAPI Removable Media Device BIOS
Specification[23], originally developed by Compaq Computer Corporation and Phoenix
Technologies. It specifies provisions in the BIOS of a personal computer to allow the computer
to be bootstrapped from devices such as Zip drives, Jaz drives, SuperDisk (LS-120) drives, and
similar devices.

These devices have removable media like floppy disk drives, but capacities more commensurate
with hard drives, and programming requirements unlike either. Due to limitations in the floppy
controller interface most of these devices were ATAPI devices, connected to one of the host
computer's ATA interfaces, similarly to a hard drive or CD-ROM device. However, existing
BIOS standards did not support these devices. An ARMD-compliant BIOS allows these devices
to be booted from and used under the operating system without requiring device-specific code in the
OS.

A BIOS implementing ARMD allows the user to include ARMD devices in the boot search
order. Usually an ARMD device is configured earlier in the boot order than the hard drive.
Similarly to a floppy drive, if bootable media is present in the ARMD drive, the BIOS will boot
from it; if not, the BIOS will continue in the search order, usually with the hard drive last.

There are two variants of ARMD, ARMD-FDD and ARMD-HDD. Originally ARMD caused the
devices to appear as a sort of very large floppy drive, either the primary floppy drive device 00h
or the secondary device 01h. Some operating systems required code changes to support floppy
disks with capacities far larger than any standard floppy disk drive. Also, standard-floppy disk
drive emulation proved to be unsuitable for certain high-capacity floppy disk drives such as
Iomega Zip drives. Later the ARMD-HDD, ARMD-"Hard disk device", variant was developed
to address these issues. Under ARMD-HDD, an ARMD device appears to the BIOS and the
operating system as a hard drive.

[edit] ATA over Ethernet

In August 2004, Sam Hopkins and Brantley Coile of Coraid specified a lightweight ATA-over-
Ethernet protocol to carry ATA commands over Ethernet, rather than attaching the drives directly
to a PATA host adapter. This permitted the established block protocol to be reused in storage area
network applications.

iSCSI

In computing, iSCSI (pronounced /aɪsˈkʌzi/, "eye-scuzzy") is an abbreviation of Internet Small
Computer System Interface, an Internet Protocol (IP)-based storage networking standard for
linking data storage facilities. By carrying SCSI commands over IP networks, iSCSI is used to
facilitate data transfers over intranets and to manage storage over long distances. iSCSI can be
used to transmit data over local area networks (LANs), wide area networks (WANs), or the
Internet and can enable location-independent data storage and retrieval. The protocol allows
clients (called initiators) to send SCSI commands (CDBs) to SCSI storage devices (targets) on
remote servers. It is a popular storage area network (SAN) protocol, allowing organizations to
consolidate storage into data center storage arrays while providing hosts (such as database and
web servers) with the illusion of locally-attached disks. Unlike traditional Fibre Channel, which
requires special-purpose cabling, iSCSI can be run over long distances using existing network
infrastructure.

Contents
[hide]

 1 Functionality
 2 Concepts
o 2.1 Initiator
 2.1.1 Host Bus Adapter
 2.1.2 TCP Offload Engine
o 2.2 Target
o 2.3 Logical Unit Number
o 2.4 Addressing
o 2.5 iSNS
 3 Security
o 3.1 Authentication
o 3.2 Authorization
o 3.3 Confidentiality and integrity
 4 Industry support
o 4.1 Operating-system support
o 4.2 Targets
o 4.3 Converters and bridges
 5 See also
 6 References
 7 External links
o 7.1 RFCs

[edit] Functionality

iSCSI uses TCP/IP (typically TCP ports 860 and 3260). In essence, iSCSI simply allows two hosts to
negotiate and then exchange SCSI commands using IP networks. By doing this iSCSI takes a
popular high-performance local storage bus and emulates it over wide-area networks, creating a
storage area network (SAN). Unlike some SAN protocols, iSCSI requires no dedicated cabling;
it can be run over existing switching and IP infrastructure. As a result, iSCSI is often seen as a
low-cost alternative to Fibre Channel, which requires dedicated infrastructure.
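
Because iSCSI rides on ordinary TCP/IP, reaching a target portal is just a TCP connection to port 3260. The sketch below (hostname is a placeholder) only checks reachability; it does not speak the iSCSI login protocol:

import socket

def portal_reachable(host, port=3260, timeout=3.0):
    """Return True if a TCP connection to the iSCSI portal can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

print(portal_reachable("iscsi.example.com"))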

Although iSCSI can communicate with arbitrary types of SCSI devices, system administrators
almost always use it to allow server computers (such as database servers) to access disk volumes
on storage arrays. iSCSI SANs often have one of two objectives:

Storage consolidation
Organizations move disparate storage resources from servers around their network to central
locations, often in data centers; this allows for more efficiency in the allocation of storage. In a
SAN environment, a server can be allocated a new disk volume without any change to hardware
or cabling.

Disaster recovery

Organizations mirror storage resources from one data center to a remote data center, which can
serve as a hot standby in the event of a prolonged outage. In particular, iSCSI SANs allow entire
disk arrays to be migrated across a WAN with minimal configuration changes, in effect making
storage "routable" in the same manner as network traffic.

[edit] Concepts
[edit] Initiator
Further information: SCSI initiator

An initiator functions as an iSCSI client. An initiator typically serves the same purpose to a
computer as a SCSI bus adapter would, except that instead of physically cabling SCSI devices
(like hard drives and tape changers), an iSCSI initiator sends SCSI commands over an IP
network. Initiators fall into two broad types:

Software initiator

A software initiator uses code to implement iSCSI. Typically, this happens in a kernel-resident
device driver that uses the existing network card (NIC) and network stack to emulate SCSI
devices for a computer by speaking the iSCSI protocol. Software initiators are available for most
mainstream operating systems, and this type is the most common mode of deploying iSCSI on
computers.

Hardware initiator

A hardware initiator uses dedicated hardware, typically in combination with software (firmware)
running on that hardware, to implement iSCSI. A hardware initiator mitigates the overhead of
iSCSI and TCP processing and Ethernet interrupts, and therefore may improve the performance
of servers that use iSCSI.

[edit] Host Bus Adapter

An iSCSI host bus adapter (more commonly, HBA) implements a hardware initiator. A typical
HBA is packaged as a combination of a Gigabit (or 10 Gigabit) Ethernet NIC, some kind of
TCP/IP offload engine (TOE) technology and a SCSI bus adapter, which is how it appears to the
operating system.

An iSCSI HBA can include PCI option ROM to allow booting from an iSCSI target.
[edit] TCP Offload Engine
Main article: TCP Offload Engine

A TCP Offload Engine, or "TOE Card", offers an alternative to a full iSCSI HBA. A TOE
"offloads" the TCP/IP operations for this particular network interface from the host processor,
freeing up CPU cycles for the main host applications. When a TOE is used rather than an HBA,
the host processor still has to perform the processing of the iSCSI protocol layer itself, but the
CPU overhead for that task is low.

iSCSI HBAs or TOEs are used when the additional performance enhancement justifies the
additional expense of using an HBA for iSCSI, rather than using a software-based iSCSI client
(initiator).

[edit] Target
Further information: SCSI target

An iSCSI target is a storage resource located on an iSCSI server (more generally, one of
potentially many instances of iSCSI storage nodes running on that server). An iSCSI target
usually represents hard disk storage that is accessed over an IP or Ethernet network.

As with initiators, software to provide an iSCSI target is available for most mainstream
operating systems. Common deployment scenarios for an iSCSI target include:

Storage array

In a data center or enterprise environment, an iSCSI target often resides in a large storage array,
such as a NetApp filer or an EMC Corporation NS-series computer appliance. A storage array
usually provides distinct iSCSI targets for numerous clients. [1]

Software target

In a smaller or more specialized setting, mainstream server operating systems (like Linux, Solaris
or Windows Server 2008) and some special-purpose operating systems (like NexentaStor,
FreeNAS, OpenFiler, StarWind Software or FreeSiOS) can provide iSCSI target functionality.

"iSCSI Target" should not be confused with the term "iSCSI" as the latter is a protocol and not a
storage server instance.

[edit] Logical Unit Number


Main article: Logical Unit Number

In SCSI terminology, LUN stands for logical unit number. A LUN represents an individually
addressable (logical) SCSI device that is part of a physical SCSI device (target). In an iSCSI
environment, LUNs are essentially numbered disk drives. An initiator negotiates with a target to
establish connectivity to a LUN; the result is an iSCSI connection that emulates a connection to a
SCSI hard disk. Initiators treat iSCSI LUNs the same way as they would a raw SCSI or IDE hard
drive; for instance, rather than mounting remote directories as would be done in NFS or CIFS
environments, iSCSI systems format and directly manage filesystems on iSCSI LUNs.

In enterprise deployments, LUNs usually represent slices of large RAID disk arrays, often
allocated one per client. iSCSI imposes no rules or restrictions on multiple computers sharing
individual LUNs; it leaves shared access to a single underlying filesystem as a task for the
operating system.

[edit] Addressing

Special names refer to both iSCSI initiators and targets. iSCSI provides three name-formats:

iSCSI Qualified Name (IQN)

Format: iqn.yyyy-mm.{reversed domain name} (e.g. iqn.2001-04.com.acme:storage.tape.sys1.xyz)
(Note: there is an optional colon with arbitrary text afterwards. This text is there to help better
organize or label resources.)

Extended Unique Identifier (EUI)

Format: eui.{EUI-64 bit address} (e.g. eui.02004567A425678D)

T11 Network Address Authority (NAA)

Format: naa.{NAA 64 or 128 bit identifier} (e.g. naa.52004567BA64678D)

IQN format addresses occur most commonly. They are qualified by a date (yyyy-mm) because
domain names can expire or be acquired by another entity.

The IEEE Registration Authority provides EUI names in accordance with the EUI-64 standard.
The NAA format incorporates an OUI, which is also provided by the IEEE Registration Authority.
NAA name formats were added to iSCSI in RFC 3980, to provide compatibility with naming
conventions used in Fibre Channel and Serial Attached SCSI (SAS) storage technologies.

Usually an iSCSI participant can be defined by three or four fields:

1. Hostname or IP Address (e.g., "iscsi.example.com")
2. Port Number (e.g., 3260)
3. iSCSI Name (e.g., the IQN "iqn.2003-01.com.ibm:00.fcd0ab21.shark128")
4. An optional CHAP Secret (e.g., "secretsarefun")
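
A hedged sketch that checks an IQN-format name and assembles the fields listed above into a simple record; the regular expression is a loose approximation, not the authoritative grammar from the RFCs:

import re

# Loose IQN check: "iqn." + yyyy-mm date + reversed domain name + optional ":label".
IQN_RE = re.compile(r"^iqn\.\d{4}-\d{2}\.[a-z0-9][a-z0-9.-]*(:.+)?$")

def participant(host, port, name, chap_secret=None):
    if not IQN_RE.match(name):
        raise ValueError(f"not an IQN-format name: {name!r}")
    return {"portal": f"{host}:{port}", "name": name, "uses_chap": chap_secret is not None}

print(participant("iscsi.example.com", 3260,
                  "iqn.2003-01.com.ibm:00.fcd0ab21.shark128"))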

[edit] iSNS
Main article: Internet Storage Name Service
iSCSI initiators can locate appropriate storage resources using the Internet Storage Name Service
(iSNS) protocol. In theory, iSNS provides iSCSI SANs with the same management model as
dedicated Fibre Channel SANs. In practice, administrators can satisfy many deployment goals
for iSCSI without using iSNS.

[edit] Security
[edit] Authentication

iSCSI initiators and targets prove their identity to each other using the CHAP protocol, which
includes a mechanism to prevent cleartext passwords from appearing on the wire. By itself, the
CHAP protocol is vulnerable to dictionary attacks, spoofing, or reflection attacks. If followed
carefully, the rules for using CHAP within iSCSI prevent most of these attacks.[2]
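
The mechanism CHAP uses to keep the secret off the wire is a challenge-response hash (CHAP with MD5, per RFC 1994): the response is MD5(identifier || secret || challenge). A minimal sketch with placeholder values:

import hashlib

def chap_response(identifier, secret, challenge):
    """Compute the CHAP/MD5 response; the secret itself never crosses the wire."""
    return hashlib.md5(bytes([identifier]) + secret + challenge).digest()

resp = chap_response(0x01, b"secretsarefun", bytes.fromhex("1a2b3c4d"))
print(resp.hex())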

Additionally, as with all IP-based protocols, IPsec can operate at the network layer. The iSCSI
negotiation protocol is designed to accommodate other authentication schemes, though
interoperability issues limit their deployment.

To ensure that only valid initiators connect to storage arrays, administrators most commonly run
iSCSI only over logically-isolated backchannel networks. In this deployment architecture, only
the management ports of storage arrays are exposed to the general-purpose internal network, and
the iSCSI protocol itself is run over dedicated network segments or virtual LANs (VLAN). This
mitigates authentication concerns; unauthorized users aren't physically provisioned for iSCSI,
and thus can't talk to storage arrays. However, it also creates a transitive trust problem, in that a
single compromised host with an iSCSI disk can be used to attack storage resources for other
hosts.

[edit] Authorization

Because iSCSI aims to consolidate storage for many servers into a single storage array, iSCSI
deployments require strategies to prevent unrelated initiators from accessing storage resources.
As a pathological example, a single enterprise storage array could hold data for servers variously
regulated by the Sarbanes-Oxley Act for corporate accounting, HIPAA for health benefits
information, and PCI DSS for credit card processing. During an audit, storage systems must
demonstrate controls to ensure that a server under one regime cannot access the storage assets of
a server under another.

Typically, iSCSI storage arrays explicitly map initiators to specific target LUNs; an initiator
authenticates not to the storage array, but to the specific storage asset it intends to use. However,
because the target LUNs for SCSI commands are expressed both in the iSCSI negotiation
protocol and in the underlying SCSI protocol, care must be taken to ensure that access control is
provided consistently.
[edit] Confidentiality and integrity

For the most part, iSCSI operates as a cleartext protocol that provides no cryptographic
protection for data in motion during SCSI transactions. As a result, an attacker who can listen in
on iSCSI Ethernet traffic can:

 reconstruct and copy the files and filesystems being transferred on the wire

 alter the contents of files by injecting fake iSCSI frames

 corrupt filesystems being accessed by initiators, exposing servers to software flaws in poorly-
tested filesystem code.

These problems do not occur only with iSCSI, but rather apply to any IP-based SAN protocol
without cryptographic security. Adoption and deployment of IPsec, frequently cited as a solution
to the IP SAN security problem, has been hampered by performance and compatibility issues.

Fibre Channel

Fibre Channel, or FC, is a gigabit-speed network technology primarily used for storage
networking. Fibre Channel is standardized in the T11 Technical Committee of the InterNational
Committee for Information Technology Standards (INCITS), an American National Standards
Institute (ANSI)–accredited standards committee. It started use primarily in the supercomputer
field, but has become the standard connection type for storage area networks (SAN) in enterprise
storage. Despite its name, Fibre Channel signaling can run on both twisted pair copper wire and
fiber-optic cables.

Fibre Channel Protocol (FCP) is a transport protocol (similar to TCP used in IP networks)
which predominantly transports SCSI commands over Fibre Channel networks.

Contents
[hide]

 1 History
 2 Fibre Channel topologies
 3 Fibre Channel layers
 4 Ports
 5 Optical carrier medium variants
 6 Fibre Channel infrastructure
 7 Fibre Channel Host Bus Adapters
 8 See also
 9 References
 10 External links

[edit] History
Fibre Channel started in 1988, with ANSI standard approval in 1994, as a way to simplify the
HIPPI system then in use for similar roles. HIPPI used a massive 50-pair cable with bulky
connectors, and had limited cable lengths. When Fibre Channel started to compete for the mass
storage market its primary competitor was IBM's proprietary Serial Storage Architecture (SSA)
interface. Eventually the market chose Fibre Channel over SSA, even though SSA was arguably a
better interconnect technology, rather than give IBM control over the next generation of mid-to-high-end
storage technology. Fibre Channel was primarily concerned with simplifying the connections and
increasing distances, as opposed to increasing speeds. Later, designers added the goals of
connecting SCSI disk storage, providing higher speeds and far greater numbers of connected
devices.

Fibre Channel also added support for any number of "upper layer" protocols, including SCSI, ATM, and IP,
with SCSI being the predominant usage.

The following table shows Fibre Channel speed variants:[1]

Fibre Channel Variants

Name            Line rate (GBaud)   Throughput (MBps)*   Availability
1GFC            1.0625              200                  1997
2GFC            2.125               400                  2001
4GFC            4.25                800                  2005
8GFC            8.5                 1600                 2008
10GFC Serial    10.52               2400                 2004
10GFC Parallel  12.75               -                    -
20GFC           21.04               4800                 2008

* Throughput for duplex connections
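
A rough sketch of where the throughput column comes from: the line rate, minus encoding overhead (8b/10b up to 8GFC, 64b/66b for 10/20GFC, as noted in the layers section below), doubled for full duplex. The table uses rounded nominal figures, so these results only approximate it:

def duplex_throughput_mb_s(gbaud, encoding=(8, 10)):
    # Payload bits per second after encoding overhead, converted to bytes, doubled for duplex.
    payload_bits = gbaud * 1e9 * encoding[0] / encoding[1]
    return 2 * payload_bits / 8 / 1e6

print(duplex_throughput_mb_s(1.0625))            # ~212 (table: 200) for 1GFC
print(duplex_throughput_mb_s(8.5))               # ~1700 (table: 1600) for 8GFC
print(duplex_throughput_mb_s(10.52, (64, 66)))   # ~2550 (table: 2400) for 10GFC Serial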

[edit] Fibre Channel topologies


There are three major Fibre Channel topologies, describing how a number of ports are connected
together. A port in Fibre Channel terminology is any entity that actively communicates over the
network, not necessarily a hardware port. This port is usually implemented in a device such as
disk storage, an HBA on a server or a Fibre Channel switch.
 Point-to-Point (FC-P2P). Two devices are connected back to back. This is the simplest
topology, with limited connectivity.

 Arbitrated loop (FC-AL). In this design, all devices are in a loop or ring, similar to token
ring networking. Adding or removing a device from the loop causes all activity on the
loop to be interrupted. The failure of one device causes a break in the ring. Fibre Channel
hubs exist to connect multiple devices together and may bypass failed ports. A loop may
also be made by cabling each port to the next in a ring.
o A minimal loop containing only two ports, while appearing to be similar to FC-
P2P, differs considerably in terms of the protocol.
o Multiple pairs of ports may communicate simultaneously in a loop.

 Switched fabric (FC-SW). All devices or loops of devices are connected to Fibre
Channel switches, similar conceptually to modern Ethernet implementations. Advantages
of this topology over FC-P2P or FC-AL include:
o The switches manage the state of the fabric, providing optimized
interconnections.
o The traffic between two ports flows through the switches only; it is not
transmitted to any other port.
o Failure of a port is isolated and should not affect operation of other ports.

Attribute                     Point-to-Point   Arbitrated loop                    Switched fabric
Max ports                     2                127                                ~16777216 (2^24)
Address size                  N/A              8-bit ALPA                         24-bit port ID
Side effect of port failure   N/A              Loop fails (until port bypassed)   N/A
Mixing different link rates   N/A              No                                 Yes
Frame delivery                In order         In order                           Not guaranteed
Access to medium              Dedicated        Arbitrated                         Dedicated


[edit] Fibre Channel layers
Fibre Channel is a layered protocol. It consists of 5 layers, namely:

 FC0 The physical layer, which includes cables, fiber optics, connectors, pinouts etc.
 FC1 The data link layer, which implements the 8b/10b encoding and decoding of signals.
 FC2 The network layer, defined by the FC-PI-2 standard, consists of the core of Fibre
Channel, and defines the main protocols.
 FC3 The common services layer, a thin layer that could eventually implement functions
like encryption or RAID.
 FC4 The Protocol Mapping layer. Layer in which other protocols, such as SCSI, are
encapsulated into an information unit for delivery to FC2.

FC0, FC1, and FC2 are also known as FC-PH, the physical layers of fibre channel.

Fibre Channel routers operate up to FC4 level (i.e. they may operate as SCSI routers), switches
up to FC2, and hubs on FC0 only.

Fibre Channel products are available at 1 Gbit/s, 2 Gbit/s, 4 Gbit/s, 8 Gbit/s, 10 Gbit/s and 20
Gbit/s. Products based on the 1, 2, 4 and 8 Gbit/s standards should be interoperable, and
backward compatible. The 10 Gbit/s standard (and 20 Gbit/s derivative), however, is not
backward compatible with any of the slower speed devices, as it differs considerably on FC1
level (64b/66b encoding instead of 8b/10b encoding). 10Gb and 20Gb Fibre Channel is primarily
deployed as a high-speed "stacking" interconnect to link multiple switches.

[edit] Ports

FC topologies and port types

The following types of ports are defined by Fibre Channel:

 node ports
o N_port is a port on the node (e.g. host or storage device) used with both FC-P2P
or FC-SW topologies. Also known as Node port.
o NL_port is a port on the node used with an FC-AL topology. Also known as
Node Loop port.
o F_port is a port on the switch that connects to a node point-to-point (i.e. connects
to an N_port). Also known as Fabric port. An F_port is not loop capable.
o FL_port is a port on the switch that connects to a FC-AL loop (i.e. to NL_ports).
Also known as Fabric Loop port.
o E_port is the connection between two fibre channel switches. Also known as an
Expansion port. When E_ports between two switches form a link, that link is
referred to as an inter-switch link (ISL).
o EX_port is the connection between a fibre channel router and a fibre channel
switch. On the side of the switch it looks like a normal E_port, but on the side of
the router it is an EX_port.
o TE_port * a Cisco addition to Fibre Channel, now adopted as a standard. It is an
extended ISL or EISL. The TE_port provides not only standard E_port functions
but allows for routing of multiple VSANs (Virtual SANs). This is accomplished
by modifying the standard Fibre Channel frame (vsan tagging) upon
ingress/egress of the VSAN environment. Also known as Trunking E_port.

 general (catch-all) types


o Auto or auto-sensing port found in Cisco switches, can automatically become an
E_, TE_, F_, or FL_port as needed.
o Fx_port a generic port that can become a F_port (when connected to a N_port) or
a FL_port (when connected to a NL_port). Found only on Cisco devices where
oversubscription is a factor.
o G_port or generic port on a switch can operate as an E_port or F_port. Found on
Brocade and McData switches.
o L_port is the loose term used for any arbitrated loop port, NL_port or FL_port.
Also known as Loop port.
o U_port is the loose term used for any arbitrated port. Also known as Universal
port. Found only on Brocade switches.

(*Note: The term "trunking" is not a standard Fibre Channel term and is used by vendors
interchangeably. For example: A trunk (an aggregation of ISLs) in a Brocade device is referred
to as a Port Channel by Cisco. Whereas Cisco refers to trunking as an EISL.)

[edit] Optical carrier medium variants


Typical Fibre Channel connectors - modern LC on the left and older SC (typical for 100 Mbyte/s
speeds) on the right

Speed (MByte/s)   Media type                Transmitter               Variant         Distance
400               Single-mode fiber         1300 nm longwave laser    400-SM-LL-I     2 m - 2 km
200               Single-mode fiber         1550 nm longwave laser    200-SM-LL-V     2 m - >50 km
200               Single-mode fiber         1300 nm longwave laser    200-SM-LL-I     2 m - 2 km
100               Single-mode fiber         1550 nm longwave laser    100-SM-LL-V     2 m - >50 km
100               Single-mode fiber         1300 nm longwave laser    100-SM-LL-L     2 m - 10 km
100               Single-mode fiber         1300 nm longwave laser    100-SM-LL-I     2 m - 2 km
400               Multimode fiber (50 µm)   850 nm shortwave laser    400-M5/6-SN-I   0.5 m - 150 m
200               Multimode fiber (50 µm)   850 nm shortwave laser    200-M5/6-SN-I   0.5 m - 300 m
100               Multimode fiber (50 µm)   850 nm shortwave laser    100-M5/6-SN-I   0.5 m - 500 m
100               Multimode fiber           850 nm shortwave laser    100-M6-SL-I     2 m - 175 m

Modern Fibre Channel devices support SFP transceivers.

[edit] Fibre Channel infrastructure

SAN-switch with optical FC connectors installed.

Fibre Channel switches can be divided into two classes. These classes are not part of the
standard, and the classification of every switch is a marketing decision of the manufacturer.

 Enterprise Directors offer a high port-count in a modular (slot-based) chassis with no
single point of failure (high availability).

 Departmental Switches are typically smaller, fixed-configuration (sometimes semi-modular),
less redundant devices.

A fabric consisting entirely of one vendor is considered to be homogeneous. This is often
referred to as operating in its "native mode" and allows the vendor to add proprietary features
which may not be compliant with the Fibre Channel standard.

If multiple switch vendors are used within the same fabric it is heterogeneous; the switches may
only achieve adjacency if all switches are placed into their interoperability modes. This is called
the "open fabric" mode, as each vendor's switch may have to disable its proprietary features to
comply with the Fibre Channel standard.

Some switch manufacturers offer a variety of interoperability modes above and beyond the
"native" and "open fabric" states. These "native interoperability" modes allow switches to
operate in the native mode of another vendor and still maintain some of the proprietary behaviors
of both. However, running in native interoperability mode may still disable some proprietary
features and can produce fabrics of questionable stability.

[edit] Fibre Channel Host Bus Adapters


Fibre Channel HBAs are available for all major open systems, computer architectures, and buses,
including PCI and SBus. Some are OS dependent. Each HBA has a unique World Wide Name
(WWN), which is similar to an Ethernet MAC address in that it uses an Organizationally Unique
Identifier (OUI) assigned by the IEEE. However, WWNs are longer (8 bytes). There are two
types of WWNs on a HBA; a node WWN (WWNN), which is shared by all ports on a host bus
adapter, and a port WWN (WWPN), which is unique to each port.
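
A small sketch extracting the IEEE OUI from an 8-byte WWN as described above. The byte positions follow my reading of the common NAA layouts (type 5 carries the OUI right after the NAA nibble; types 1 and 2 embed a MAC whose first three bytes are the OUI), so verify against the standard before relying on it:

def wwn_oui(wwn):
    """Return the OUI embedded in an 8-byte WWN, formatted as xx:xx:xx."""
    hexstr = wwn.replace(":", "").lower()
    assert len(hexstr) == 16, "expected an 8-byte WWN"
    naa = hexstr[0]
    if naa == "5":                 # IEEE Registered format: OUI follows the NAA nibble
        oui = hexstr[1:7]
    elif naa in ("1", "2"):        # IEEE 48-bit / extended: OUI starts at nibble 4
        oui = hexstr[4:10]
    else:
        raise ValueError(f"unhandled NAA type {naa}")
    return ":".join(oui[i:i + 2] for i in range(0, 6, 2))

print(wwn_oui("50:06:01:60:90:20:1e:7a"))   # -> 00:60:16 (placeholder WWN)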

Storage area network

A storage area network (SAN) is an architecture to attach remote computer storage devices
(such as disk arrays, tape libraries, and optical jukeboxes) to servers in such a way that the
devices appear as locally attached to the operating system. Although the cost and complexity of
SANs are dropping, they are uncommon outside larger enterprises.

Network attached storage (NAS), in contrast to SAN, uses file-based protocols such as NFS or
SMB/CIFS where it is clear that the storage is remote, and computers request a portion of an
abstract file rather than a disk block.

Contents
[hide]

 1 Network types
 2 Storage sharing
 3 SAN-NAS hybrid
 4 Benefits
 5 SAN infrastructure
 6 Compatibility
 7 SANs at home
 8 SANs in media and entertainment
 9 Storage virtualization and SANs
 10 See also
 11 References
 12 External links
o 12.1 SAN software articles and white papers

[edit] Network types


Most storage networks use the SCSI protocol for communication between servers and disk drive
devices. They do not use the SCSI low-level physical interface, however; newer storage networks
often use iSCSI instead. A mapping layer to other low-level protocols is used to form a network:

 ATA over Ethernet (AoE), mapping of ATA over Ethernet
 Fibre Channel Protocol (FCP), the most prominent one, is mapping of SCSI over Fibre Channel (FC)
 Fibre Channel over Ethernet (FCoE)
 mapping of FICON over FC, used by mainframe computers
 HyperSCSI, mapping of SCSI over Ethernet
 iFCP[1] or SANoIP[2] mapping of FCP over IP
 iSCSI, mapping of SCSI over TCP/IP
 iSCSI Extensions for RDMA (iSER), mapping of iSCSI over InfiniBand (IB)

[edit] Storage sharing


Historically, data centers first created "islands" of SCSI disk arrays. Each island was dedicated to
an application, and visible as a number of "virtual hard drives" (i.e. LUNs). Essentially, a SAN
connects storage islands together using a high-speed network, thus allowing all applications to
access all disks.

Operating systems still view a SAN as a collection of LUNs, and usually maintain their own file
systems on them. These local file systems, which cannot be shared among multiple operating
systems/hosts, are the most reliable and most widely used. If two independent local file systems
resided on a shared LUN, they would be unaware of this fact, would have no means of cache
synchronization and eventually would corrupt each other. Thus, sharing data between computers
through a SAN requires advanced solutions, such as SAN file systems or clustered computing.
Despite such issues, SANs help to increase storage capacity utilization, since multiple servers
share the storage space on the disk arrays. The common application of a SAN is for the use of
transactionally accessed data that require high-speed block-level access to the hard drives such as
email servers, databases, and high usage file servers.

In contrast, NAS allows many computers to access the same file system over the network and
synchronizes their accesses. Lately, the introduction of NAS heads allowed easy conversion of
SAN storage to NAS.

DAS vs NAS vs SAN organization

[edit] SAN-NAS hybrid


Despite the differences between NAS and SAN, it is possible to create solutions that include both
technologies, as shown in the diagram.
Hybrid using DAS, NAS and SAN technologies.

[edit] Benefits
Sharing storage usually simplifies storage administration and adds flexibility since cables and
storage devices do not have to be physically moved to shift storage from one server to another.

Other benefits include the ability to allow servers to boot from the SAN itself. This allows for a
quick and easy replacement of faulty servers since the SAN can be reconfigured so that a
replacement server can use the LUN of the faulty server. This process can take as little as half an
hour and is a relatively new idea being pioneered in newer data centers. There are a number of
emerging products designed to facilitate and speed this up still further. Brocade, for example,
offers an Application Resource Manager product which automatically provisions servers to boot
off a SAN, with typical-case load times measured in minutes. While this area of technology is
still new, many view it as being the future of the enterprise datacenter.

SANs also tend to enable more effective disaster recovery processes. A SAN could span a distant
location containing a secondary storage array. This enables storage replication either
implemented by disk array controllers, by server software, or by specialized SAN devices. Since
IP WANs are often the least costly method of long-distance transport, the Fibre Channel over IP
(FCIP) and iSCSI protocols have been developed to allow SAN extension over IP networks. The
traditional physical SCSI layer could only support a few meters of distance - not nearly enough
to ensure business continuance in a disaster. Demand for this SAN application has increased
dramatically after the September 11th attacks in the United States, and increased regulatory
requirements associated with Sarbanes-Oxley and similar legislation[citation needed].

The economic consolidation of disk arrays has accelerated the advancement of several features
including I/O caching, snapshotting, and volume cloning (Business Continuance Volumes or
BCVs).

[edit] SAN infrastructure


Qlogic SAN-switch with optical Fibre Channel connectors installed.

SANs often utilize a Fibre Channel fabric topology - an infrastructure specially designed to
handle storage communications. It provides faster and more reliable access than higher-level
protocols used in NAS. A fabric is similar in concept to a network segment in a local area
network. A typical Fibre Channel SAN fabric is made up of a number of Fibre Channel switches.

Today, all major SAN equipment vendors also offer some form of Fibre Channel routing
solution, and these bring substantial scalability benefits to the SAN architecture by allowing data
to cross between different fabrics without merging them. These offerings use proprietary
protocol elements, and the top-level architectures being promoted are radically different. They
often enable mapping Fibre Channel traffic over IP or over SONET/SDH.

[edit] Compatibility
One of the early problems with Fibre Channel SANs was that the switches and other hardware
from different manufacturers were not entirely compatible. Although the basic storage protocol,
FCP, was always quite standard, some of the higher-level functions did not interoperate well.
Similarly, many host operating systems would react badly to other operating systems sharing the
same fabric. Many solutions were pushed to the market before standards were finalized, and
vendors innovated around the standards.

[edit] SANs at home


SANs are primarily used in large scale, high performance enterprise storage operations. It would
be unusual to find a single disk drive connected directly to a SAN. Instead, SANs are normally
networks of large disk arrays. SAN equipment is relatively expensive and as such, fibre channel
host bus adapters are rare in desktop computers. The iSCSI SAN technology is expected to
eventually produce cheap SANs, but it is unlikely that this technology will be used outside the
enterprise data center environment. Desktop clients are expected to continue using NAS
protocols such as SMB and NFS. The exception to this may be remote storage replication.

[edit] SANs in media and entertainment


Video editing workgroups require very high data transfer rates. Outside of the enterprise market,
this is one area that greatly benefits from SANs.
Per-node bandwidth usage control, sometimes referred to as Quality of Service (QoS), is
especially important in video workgroups as it ensures fair and prioritized bandwidth usage
across the network if there is insufficient open bandwidth available. Avid Unity, Apple's Xsan
and Tiger Technology MetaSAN are specifically designed for video networks and offer this
functionality.

[edit] Storage virtualization and SANs


Storage virtualization refers to the process of completely abstracting logical storage from
physical storage. The physical storage resources are aggregated into storage pools, from which
the logical storage is created. It presents to the user a logical space for data storage and
transparently handles the process of mapping it to the actual physical location. This is currently
implemented inside each modern disk array, using a vendor's proprietary solution. However, the
goal is to virtualize multiple disk arrays, made by different vendors, scattered over the network,
into a single monolithic storage device, which can be managed uniformly.

ATA over Ethernet

[edit] ATA Encapsulation

SATA (and older PATA) hard drives use the Advanced Technology Attachment (ATA) protocol
to issue commands, such as read, write, and status. AoE encapsulates those commands inside
Ethernet frames and lets them travel over an Ethernet network instead of a SATA or 40-pin
ribbon cable. By using an AoE driver, the host operating system is able to access a remote disk
as if it were directly attached.

The encapsulation of ATA provided by AoE is simple and low-level, allowing the translation to
happen either at high performance or inside a small, embedded device, or both.
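
A hedged sketch of the encapsulation idea: an AoE frame uses EtherType 0x88A2 and addresses a drive by major and minor number, with the ATA register block and data following the short AoE header. The field layout below follows my reading of the public AoE protocol description and should be treated as illustrative, not a reference encoder:

import struct

AOE_ETHERTYPE = 0x88A2

def aoe_header(major, minor, command, tag, version=1, flags=0, error=0):
    """Pack the AoE header: ver/flags, error, major (2 bytes), minor, command, tag (4 bytes)."""
    ver_flags = ((version & 0xF) << 4) | (flags & 0xF)
    return struct.pack("!BBHBBI", ver_flags, error, major, minor, command, tag)

# Command 0 is assumed here to mean "issue ATA command"; the ATA registers and data follow.
frame_payload = aoe_header(major=0, minor=1, command=0, tag=0x1234)
print(frame_payload.hex())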

[edit] Routability

AoE runs directly on top of Ethernet rather than on an intermediate protocol such as TCP/IP,
which avoids the significant CPU overhead of TCP/IP processing. However, it also means that
routers cannot be used to carry AoE packets across disparate networks (such as the Internet).
Instead, AoE packets can travel only within a single local Ethernet storage area network (such as
one created by a switch or VLAN).
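
Because AoE frames are ordinary Ethernet frames carrying their own EtherType (0x88A2), an
initiator can be sketched on Linux as a raw AF_PACKET socket bound to that EtherType; no IP layer
is involved, which is exactly why such traffic never crosses a router. This is a minimal sketch:
the interface name is an assumption for the example, and the program must run with root
privileges.

    # Minimal Linux-only sketch: receive AoE frames directly from the local
    # Ethernet segment. There is no IP layer, so these frames are not routable.
    import socket

    AOE_ETHERTYPE = 0x88A2

    sock = socket.socket(socket.AF_PACKET, socket.SOCK_RAW, socket.htons(AOE_ETHERTYPE))
    sock.bind(("eth0", 0))              # interface name assumed for this example

    frame, _ = sock.recvfrom(1518)      # one full Ethernet frame, header included
    src_mac = frame[6:12]
    print("AoE frame from", src_mac.hex(":"), "-", len(frame), "bytes")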

[edit] Security

The non-routability of AoE is a source of inherent security: an intruder cannot connect through
a router, but must instead plug into the local Ethernet switch. However, there are no AoE-
specific mechanisms for password verification or encryption. Additional security may be
implemented at the file-system level. Certain AoE targets, such as Coraid Storage appliances and
GGAOED [13], support access lists ("masks") that allow connections only from specific MAC
addresses.
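
Such a MAC-based access list amounts to the target comparing each frame's source address against
a configured set before processing it. The minimal sketch below shows only that check; the
addresses are made up for the example.

    # Toy illustration of a MAC "mask" list: the target simply ignores frames
    # whose source MAC is not on the allow list. Addresses are fictitious.
    ALLOWED_MACS = {"02:00:00:00:00:01", "02:00:00:00:00:02"}

    def accept_frame(frame: bytes) -> bool:
        src_mac = frame[6:12].hex(":")      # source MAC sits at bytes 6-11
        return src_mac in ALLOWED_MACS
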
[edit] Config String

The AoE protocol provides a mechanism for host-based cooperative locking. When more than
one AoE initiator is using an AoE target, the hosts must coordinate so that they do not interfere
with one another as they use and modify the data on the shared AoE device.

One option provided by AoE is to use the storage device itself to arbitrate access among hosts.
The AoE protocol includes a "config string" feature. The config string can record who is using
the device (it can also record any other information). If more than one host tries to set the
config string simultaneously, only one succeeds; the others are informed of the conflict.
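
In outline, each host reads the config string before using the target and attempts to set it only
if the string is empty; because the target serializes set requests, at most one simultaneous
claim succeeds. The sketch below captures that decision logic only; read_config and
set_config_if_empty are hypothetical placeholders for the protocol's config-query operations,
not a real library API.

    # Sketch of cooperative locking via the AoE config string. The two helper
    # functions passed in are placeholders for "read the config string" and
    # "set the config string only if it is currently empty".
    def try_claim(target, host_id, read_config, set_config_if_empty):
        """Return True if this host now owns the target, False otherwise."""
        current = read_config(target)
        if current == b"":
            # Ask the target to record our claim; the target applies these
            # requests one at a time, so exactly one simultaneous claimant wins.
            if set_config_if_empty(target, host_id):
                return True
            current = read_config(target)   # somebody else won the race
        return current == host_id           # True if the claim was already ours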

[edit] Related Concepts


Although AoE is a simple network protocol, it opens up a complex realm of storage possibilities.
To understand and evaluate these storage scenarios, it helps to be familiar with a few concepts.

[edit] Storage Area Networks

The purpose of Storage Area Networks (SANs) is often not to make files and directories
available to multiple users, as Network Attached Storage (NAS) does. Instead, a SAN allows the
physical hard drive to be removed from the server that uses it and placed on the network. A SAN
interface is thus similar in principle to non-networked interfaces such as SATA or SCSI. Most
users will not use a SAN interface directly; instead, they will connect to a server that uses a
SAN disk rather than a local disk. Direct connection, however, can also be used.

When using a SAN to access storage, there are several potential advantages over a local disk:

 It is easier to add storage capacity and the amount of storage is practically unlimited.
 It is easier to reallocate storage capacity.
 Data may be shared.
 Additionally, compared to other forms of networked storage, SANs are low-level and high-
performance.

[edit] Utilizing Storage Area Networks

To utilize a SAN disk, the host must format it with a filesystem. However, unlike a SATA or
SCSI disk, a SAN hard drive may be accessed by multiple machines. This is a source of both
danger and opportunity.

Traditional filesystems (such as FAT or ext3) are designed to be accessed by a single host, and
will cause unpredictable behavior if accessed by multiple machines. Such filesystems may be
used, and AoE provides mechanisms whereby an AoE target can be guarded against
simultaneous access (see: Config String).

Shared disk filesystems allow multiple machines to use a single hard disk safely by coordinating
simultaneous access to individual files. They are superficially similar to network filesystems. These
filesystems can be used to allow multiple machines access to the same AoE target without an
intermediate server or filesystem (and at higher performance). Examples of shared disk
filesystems are GFS, GPFS, MetaSAN, and OCFS2.

Direct-attached storage (DAS) refers to a digital storage system directly attached to a server or
workstation, without a storage network in between. It is a retronym, mainly used to differentiate
non-networked storage from SAN and NAS.

Contents
[hide]

 1 Features
 2 Storage features common to DAS, SAN and NAS
 3 Disadvantages
 4 References
 5 See also
 6 External Links and Whitepapers

[edit] Features
A typical DAS system is made of a data storage device (for example, an enclosure holding a
number of hard disk drives) connected directly to a computer through a host bus adapter (HBA).
Between those two points there is no network device (such as a hub, switch, or router); this is
the main characteristic of DAS.

The main protocols used for DAS connections are ATA, SATA, SCSI, SAS, and Fibre Channel.

[edit] Storage features common to DAS, SAN and NAS


Most functions found in modern storage do not depend on whether the storage is attached directly
to servers (DAS) or via a network (SAN or NAS).

A DAS device can be shared between multiple computers, provided it offers multiple interfaces
(ports) that allow concurrent and direct access. This makes it usable for computer clusters.
In fact, most SAN-attachable storage devices or NAS devices can be easily used as DAS devices
– all that is needed is to disconnect their ports from the data network and connect one or more
ports directly to a computer (with a plain one-to-one cable).

More advanced DAS devices, like SAN and NAS devices, can offer fault-tolerant design in many
areas: controller redundancy, cooling redundancy, and the storage fault-tolerance patterns known
as RAID. Some DAS systems provide embedded disk array controllers to offload RAID processing
from the server's HBA. Basic DAS devices do not have such features.

Like a SAN or NAS, a DAS can enable storage capacity to be extended while maintaining high data
bandwidth and access rates.

[edit] Disadvantages
DAS has been referred to as "Islands of Information". Disadvantages of DAS include the inability
to share data or unused resources with other servers. Both NAS (network-attached storage) and
SAN (storage area network) architectures attempt to address this, but introduce new issues of
their own, such as higher initial cost[1], manageability, security, and contention for resources.
