Nominet Technology Group

Comparing ASM with ZFS

1. INTRODUCTION
2. HISTORY OF ASM
3. TRADITIONAL FILESYSTEMS
4. THE MODERN APPROACH
5. ASM COMPONENTS
6. EXTENTS
7. REBALANCING
8. METADATA
9. SOME MYTHS OF ASM
10. HISTORY OF ZFS
11. BUILDING BLOCKS OF ZFS
12. LAYERS OF ZFS
13. IT'S A TREE!
14. CREATING A POOL
15. SNAPSHOTS & CLONES
16. COMPARING THE TWO
17. REFERENCES


1. INTRODUCTION

This document is meant to accompany my presentation, Comparing ASM with ZFS.


This presentation describes Oracle's ASM and Sun's ZFS file systems. I will tell a little
of their history and explain how they actually work.
I will also compare and contrast the two file systems, giving an understanding of the
benefits of each.
The idea for the presentation came about while I was watching one of the chief
designers of ZFS, Sun's Bill Moore, give a talk on ZFS. I was of course impressed with
the functionality of the file system, though I had heard quite a lot about it prior to this.
What I found unexpectedly intriguing was that the language Bill was using, and some of
the concepts expounded in the presentation, would be familiar to a DBA audience.
I was also struck by some of the similarities between ASM and ZFS: they have some
unique features in common. What I mean by that is that a software RAID solution
(which both of them are) has certain advantages over hardware RAID.
I had been running ASM in production for around two years by this time (December
2007) and, I suspect like a lot of DBAs, had in some ways treated ASM like a black box.
I knew enough to install and operate it, but knew very little about how it actually
worked. In some ways I think Oracle are greatly responsible for this state of affairs, as
the stunning lack of documentation available regarding ASM has only bred a lack of
understanding.
To be fair, I think Oracle have partly addressed this issue with the 11g documentation
set, which now includes a Storage Administrator's Guide. However, they have only
partly addressed it, in that this guide still does not give very many details on how ASM
actually works. There is, though, an ASM book, Oracle Automatic Storage Management
by Nitin Vengurlekar, Murali Vallath, and Rich Long, which covers the gap in the
explanation of how ASM actually works.
The boundary of responsibility for storage administration has become increasingly
blurred within organisations with the adoption of ASM. I think this means DBAs more
than ever (though, you could argue it should always have been the case) need to
understand storage concepts to be fully in a position to extract the maximum benefit
from their storage.
Here I will present some of the ideas behind both ASM and ZFS, giving some insight
into the benefits of both storage solutions, some of the features they have in common,
and where they differ.


2. HISTORY OF ASM

ASM has an interesting history that really gives an insight into the development
timeframe of a large corporation like Oracle. The idea for ASM came from Bill Bridge, a
long-time Oracle employee. The original idea goes way back to 1996, and it took three
years for the project to be given management approval.
ASM was released with 10gR1, a full seven years after the original idea, which I think is
a long time in technology terms, but a large corporation probably has difficulty meeting
quicker turnaround times (cf. Microsoft Vista).
Right from the start, one of the initial design goals was to not have a directory tree, and
that files would not have names nor be accessible via the standard OS utilities. It now
makes sense as to why ASMCMD feels like a bolted-on afterthought that is somewhat
lacking in functionality.
I do wonder if this has hurt the take-up rate of ASM, though I believe a large proportion
of new RAC installs are using ASM. Support for clustering was built in from the
beginning; indeed, Oracle's big push on RAC may have been the killer application that
ASM needed to go from a proposal to a fully realised product.

3. TRADITIONAL FILESYSTEMS

File systems have been around for almost 40 years. UFS, for example, was introduced in
the early '80s and has thus evolved over two decades.
ASM and ZFS are both volume managers and file systems. ASM has been designed
with the specific aim of storing Oracle database files, while ZFS is a general-purpose
file system.
Historically, each file system managed a single disk. This has some clear drawbacks in
terms of size, reliability, and speed, and this is the niche that volume managers filled.
A volume manager is software that sits between the disk and the file system, enabling
things like mirroring and striping of disks in a way that is completely transparent to the
file system itself.
4. THE MODERN APPROACH

Both ASM and ZFS combine the roles of volume manager and file system into one. This
can provide an administrative benefit as well as other advantages.


5. ASM COMPONENTS

When managing your storage via ASM, you are required to run an ASM instance on
your database server in addition to the normal RDBMS instance. The ASM instance
allows the user to allocate disks to disk groups and perform the required storage-related
tasks.
The ASM instance is managed in a similar way to a normal RDBMS instance: it has an
SGA and uses an spfile to configure its various parameters. When it starts up, a set of
Oracle processes runs to manage the instance. As the ASM instance performs less work
than an RDBMS instance, it requires far fewer resources. ASM instances mount disk
groups to make files stored in ASM available to RDBMS instances; ASM instances do
not mount databases.
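As a minimal sketch (the parameter values here are illustrative, not prescriptive), an
ASM instance needs only a handful of initialisation parameters:

INSTANCE_TYPE   = ASM            # marks this as an ASM rather than an RDBMS instance
ASM_DISKGROUPS  = 'DATA', 'FRA'  # disk groups to mount automatically at startup
ASM_DISKSTRING  = '/dev/rdsk/*'  # where to discover candidate disks
ASM_POWER_LIMIT = 1              # default rebalance power (see section 7)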
ASM instances are started and stopped in a similar way to RDBMS instances, using
sqlplus, srvctl or even Enterprise Manager.
ASM instances can be clustered: in a RAC environment there is one ASM instance per
node of the cluster, and only ever one per node.
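For example (the node name is hypothetical), in a clustered setup the instance can be
controlled via srvctl, or interactively via sqlplus using the 11g SYSASM role:

srvctl start asm -n node1    # start the ASM instance on node1
srvctl stop asm -n node1     # stop it again
sqlplus / as sysasm          # or connect directly and use STARTUP/SHUTDOWN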

On failure of an ASM instance on a node, all databases that are using ASM on that node
will also fail.
A disk group is the fundamental object that ASM manages and exposes to the user. A
disk group can consist of one or more disks. The datafiles belonging to an RDBMS
instance are stored in disk groups. Each individual database file is completely contained
within one disk group, but a disk group can contain files for one or more databases, and
a single database may store files in multiple diskgroups.
It is at the disk group level that the mirroring and striping capabilities of ASM can be
utilised. ASM will automatically stripe data files across all disks in a disk group; the
idea is that by doing this, the I/O will be evenly distributed across all the disks. The size
of the disks is taken into account when striping the data, as all disks in the disk group
should be filled to the same percentage of their capacity, i.e. a larger disk should receive
more data than a smaller one.
The various levels of redundancy can be specified at the disk group level:

- External redundancy - let the storage array take care of it
- Normal redundancy - mirrored pair
- High redundancy - triple mirroring
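As a rough sketch (the disk group name and device paths are hypothetical), creating a
disk group with normal redundancy from the ASM instance looks like this:

SQL> CREATE DISKGROUP data NORMAL REDUNDANCY
  2  DISK '/dev/rdsk/c1t0d0s4', '/dev/rdsk/c2t0d0s4';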


6. EXTENTS

Every ASM disk is divided into allocation units (aus). Data files are stored as extents,
and an extent consists of one or more allocation units. When you create a disk group in
11g you can specify the allocation unit size as any power of two from 1MB to 64MB,
that is, one of 1, 2, 4, 8, 16, 32, or 64MB.
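A minimal sketch (names are illustrative) of setting the au size at disk group creation
in 11g:

SQL> CREATE DISKGROUP data EXTERNAL REDUNDANCY
  2  DISK '/dev/rdsk/c3t0d0s4'
  3  ATTRIBUTE 'au_size' = '4M';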
Clearly, the larger the au size chosen, the fewer extents it will take to map a file of a
given size, so larger aus are beneficial for large data files. Each individual extent resides
on a single disk. Each extent consists of one or more aus, with the concept of variable
extent sizes being introduced in 11g to better accommodate larger data files.
Extents can be 1, 8, or 64 aus in size. The number of aus a given extent uses depends on
how many extents the file has already allocated: the extent size increases to 8 aus at a
threshold of 20,000 extents, and then to 64 aus at 40,000 extents.
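To make that concrete, here is a back-of-the-envelope calculation using the thresholds
above: a 100GB (102,400MB) datafile in a disk group with a 1MB au maps its first
20,000 extents at 1 au each (20,000MB), leaving 82,400MB to be mapped in 8-au
extents, which takes a further 10,300 extents. The whole file is thus described by
roughly 30,300 extents rather than the 102,400 that fixed 1-au extents would have
required.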


7. REBALANCING

One of the major advantages of ASM is the ease with which the storage configuration
can be changed while the database relying on ASM remains online. This is thanks to
the ability of ASM to automatically rebalance the distribution of data among the disks in
a disk group whenever a reconfiguration of the disk group occurs.
It is the RBAL background process that manages the rebalancing, with the actual work
of moving the data extents being performed by the ARBn processes.
A rebalance operation essentially shifts extents around a disk group with the goal of
ensuring each disk in the disk group is filled to the same percentage of its capacity. This
is beneficial when a new drive has been added to increase capacity, as a rebalance
ensures that all drives fully participate in servicing I/O requests.
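For example (the disk group name and device path are hypothetical), adding a disk
triggers a rebalance automatically; the power of the operation can be set explicitly and
its progress watched:

SQL> ALTER DISKGROUP data ADD DISK '/dev/rdsk/c5t0d0s4' REBALANCE POWER 4;
SQL> SELECT operation, state, power, est_minutes FROM v$asm_operation;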


8. METADATA

ASM uses metadata to control disk groups and the allocation of space on the disks
within the disk group to ASM files (i.e. datafiles etc. that are under the control of ASM).
All of the metadata associated with a disk group resides within the disk group itself;
that is to say, a disk group is self-describing.
ASM does not use a database to store the metadata: the ASM instance is not opened and
it does not mount a database.
The ASM metadata is stored either at fixed physical locations on the disk or in special
ASM files that are not exposed to the end user, e.g. you can't see them with ASMCMD.
User-created files under ASM have file numbers that count upwards from 256, while the
metadata files count down from 255, though not all numbers are utilised yet.
You can see the metadata files via the X$KFFXP fixed view:

SQL> select NUMBER_KFFXP file#, GROUP_KFFXP dg#, count(XNUM_KFFXP) au_count
  2  from x$kffxp
  3  where NUMBER_KFFXP < 256
  4  group by NUMBER_KFFXP, GROUP_KFFXP;

     FILE#        DG#   AU_COUNT
---------- ---------- ----------
         1          1          2
         1          2          2
         1          3          2
         1          4          2
         2          1          1
         2          2          1
         2          3          1
         2          4          1
         3          1         42
         3          2         42
         3          3         42
         3          4         42
         4          1          2
         4          2          2
         4          3          2
         4          4          2
         5          1          1
         5          2          1
         5          3          1
         5          4          1
         6          1          1
         6          2          1
         6          3          1
         6          4          1


9. SOME MYTHS OF ASM

With the lack of clarity and comprehensiveness in the Oracle documentation, several
myths surrounding ASM have gained currency within the wider community.
The most popular myth I have encountered is the claim that ASM is somehow able to
move extents (and hence RDBMS data files) based on I/O levels or even hot spots on
disks.
This never happens: the only goal of ASM rebalancing is to ensure each file is evenly
distributed amongst all disks in a disk group. If a file is evenly distributed then the
chances are that I/O to this file will be evenly distributed too, but ASM makes no use of
any I/O metrics.
Another pervasive myth is that an RDBMS instance sends its I/O via the ASM instance.
This is completely wrong: each RDBMS instance uses the extent maps it has received
from the ASM instance to read and write directly to the ASM disks.


10. HISTORY OF ZFS

1996 was obviously a popular year for the invention of new file systems, as the first
ideas for ZFS occurred to Sun engineer Jeff Bonwick way back then. However, like
ASM it would be several years before development proper took place. Development
really started in 2001 and ZFS was announced in 2004, but it took a further two years to
be released, with Solaris 10 6/06.
It is the world's first 128-bit file system and as such has a huge capacity. Unlike ASM,
ZFS is a general-purpose file system; it has not been explicitly designed for storing
database files.
There were several goals kept in mind when designing ZFS:

- Ease of administration - it takes two commands to create your storage pool and
  mount a file system
- Data integrity - all data is protected with a 256-bit checksum
- Scalability - the maximum size of each file is 2^64 bytes


11. BUILDING BLOCKS OF ZFS

There are four basic building blocks to ZFS:

- Pooled storage
- Copy on write
- Transactions
- Checksums

Pooled storage takes the concept of virtual memory and applies it to disks. Just as
adding more memory makes the additional memory available straight away, adding
more disks to a system should simply make the additional storage available straight
away.
ZFS maintains its records as a tree of blocks, and every block is reachable from a single
root block called the uberblock. When you change an existing block, instead of it being
overwritten, a copy of the data is made and then modified before being written to disk.
This is copy on write, and it ensures ZFS never overwrites live data. This guarantees the
integrity of the file system, as a system crash still leaves the on-disk data in a completely
consistent state. There is no need for fsck. Ever.
Every block in the ZFS tree of blocks has a checksum. The checksum is stored in the
parent block.
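One way to see the checksum machinery in action (the pool name is hypothetical) is to
ask ZFS to verify every block in a pool against its checksum and then inspect the result:

zpool scrub db        # walk the block tree, verifying every block against its checksum
zpool status -v db    # report any checksum errors the scrub found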
Operations that modify the file system are bunched together into transactions before
being committed to disk asynchronously. Related changes are put together into a
transaction, and either the whole transaction completes or it fails. Individual operations
within a transaction can be reordered to optimise performance, as long as data integrity
is not affected.


12. LAYERS OF ZFS

ZFS is composed of several layers. It is implemented in around one seventh of the lines
of code of UFS plus Solaris Volume Manager, yet provides more functionality than that
combination.

Starting from the bottom up:

- VDEV: virtual devices - the method of accessing and arranging devices. Each vdev is
  responsible for representing the available space, as well as laying out blocks on the
  physical disk.
- ZIO: ZFS I/O pipeline - all data to or from the disk goes through this.
- SPA: Storage Pool Allocator - includes routines to create and destroy pools, as well
  as to sync the data out to the vdevs.
- ARC: Adaptive Replacement Cache - the file system cache.


- DMU: Data Management Unit - presents transactional objects built upon the address
  space provided by the vdevs. The DMU is responsible for maintaining data
  consistency.
- ZIL: ZFS Intent Log - not all writes reach their final on-disk location straight away;
  synchronous writes are recorded in the intent log so they can be acknowledged
  immediately.
- ZAP: ZFS Attribute Processor - most commonly used to implement directories in
  the ZPL.
- DSL: Dataset and Snapshot Layer - responsible for implementing snapshots and
  clones.
- ZVOL: ZFS Emulated Volume - the ability to present raw devices backed by a ZFS
  pool.
- ZPL: ZFS POSIX Layer - the primary interface for interacting with ZFS as a file
  system.
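As a small illustration of the ZVOL layer (the pool and volume names are hypothetical),
a raw volume can be carved out of a pool and used like any other block device:

zfs create -V 10g db/vol1      # create a 10GB emulated volume in the db pool
ls -l /dev/zvol/dsk/db/vol1    # on Solaris, the volume appears as a device node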


13. IT'S A TREE!

The block structure of ZFS can be thought of as a tree. The leaf nodes are effectively the
data blocks on disk, while the higher-level blocks are called indirect blocks. The top
block is called the uberblock. You can think of all but the leaf blocks as metadata, and
the metadata is allocated dynamically.


14. CREATING A POOL

The real ease of administration in ZFS shows when you create a file system. When you
are using whole disks there is no need for the device to be specially formatted, as ZFS
formats the disk itself; there is also no need to issue an mkfs or newfs command. The
following:
zpool create db c1t0d0
will create a file system mounted automatically on /db, using as much space on the
c1t0d0 device as it requires. There is no need to edit /etc/vfstab.
It is with the zpool command that you can also define a redundant pool, for example:
zpool create db mirror c1t0d0 c2t0d0
will create a mirrored pool between these two devices.
Within a pool you can create multiple file systems using the zfs command:
zfs create db/oradata
You can dynamically add space to a storage pool with the following:
zpool add db mirror c3t0d0 c4t0d0
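After these commands you can confirm the layout (treat the exact output as
release-dependent):

zpool status db    # shows the devices and any mirror pairs in the pool
zpool list         # pool-level capacity and usage
zfs list           # file systems in the pool and their mount points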


15. SNAPSHOTS & CLONES

A snapshot is a read-only copy of a file system at a particular point in time. Thanks to
the copy-on-write nature of ZFS, snapshots can be created quickly and cheaply. A
snapshot initially consumes very little space, but as the active dataset changes the
snapshot begins to consume more and more space by keeping references to the old data.
Snapshots are very straightforward to create and initially occupy no storage:
zfs snapshot db@now
This creates a snapshot of the db filesystem labelled now. Snapshots consume storage
from the same pool as the file system from which they were created.
As the file system of which you have taken the snapshot undergoes changes, the
snapshot grows in size, as it effectively records and stores the original entries. The
copy-on-write process makes taking snapshots easy: it is just a case of keeping the
pointers to the old structure.
The snapshot data is accessible via the .zfs directory within the root of the file system
that has undergone a snapshot. This makes it possible to recover individual files.
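For instance (the paths and file name are illustrative), an accidentally deleted file can be
pulled straight back out of the snapshot directory:

ls /db/.zfs/snapshot/now/                  # browse the file system as it was at snapshot time
cp /db/.zfs/snapshot/now/config.ora /db/   # restore a single file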
Rolling back to a snapshot is also a fairly trivial command:
zfs rollback db@now
In contrast to snapshots, a clone is a writable volume whose initial contents are the same
as those of the dataset from which it was created. Clones are created from snapshots.
zfs clone db@now db/test
This creates a new clone db/test from the snapshot db@now.
This seems to me like a great way of providing developers with a full copy of a database
to work on without having to consume vast quantities of storage space.
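A sketch of that workflow (the dataset names are hypothetical): snapshot the production
datafiles once, then hand each developer a clone; each clone only consumes extra space
for the blocks that developer actually changes:

zfs snapshot db/oradata@gold          # one read-only baseline of the datafiles
zfs clone db/oradata@gold db/dev1     # writable copy for developer 1
zfs clone db/oradata@gold db/dev2     # writable copy for developer 2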


16. COMPARING THE TWO

ASM and ZFS are both modern file systems that take the similar approach of combining
a volume manager and a file system. There are advantages to this approach.
ASM has obviously been written with the sole aim of storing Oracle data files, and is
thus optimised for this. ZFS meanwhile is a general-purpose file system and has not
been optimised for database usage at all.
Clearly, the fact that ASM can be used in a clustered environment whereas ZFS is not
cluster aware means ZFS is unable to participate in that segment of the market.
There may also be questions over whether copy on write has serious performance
drawbacks when used in conjunction with an OLTP database; copy on write may well
have an impact on sequential I/O against a table that undergoes many updates.
On the other hand, ZFS provides a far richer set of features than ASM, has a far
friendlier interface, and is, I believe, easier to manage.
Another advantage of ASM, though, is the ability to rebalance online where the extents
are held, so that I/O can be optimally distributed.
I think both file systems have their own advantages: there may be certain cases where
ZFS, though perhaps less performant, has more functionality, and that is a trade-off to
be made.
In a RAC situation there is no choice: it would be ASM all the way.


17. REFERENCES

Some sources I used when compiling this document:

- Oracle Storage Administrator's Guide:
  http://download.oracle.com/docs/cd/B28359_01/server.111/b31107/toc.htm
- Oracle Automatic Storage Management by Nitin Vengurlekar, Murali Vallath, and
  Rich Long:
  http://www.amazon.co.uk/Oracle-Automatic-Storage-Management-Under/dp/0071496076
- Description of the ZFS layers:
  http://opensolaris.org/os/community/zfs/source/
- Sun information on ZFS:
  http://www.sun.com/software/solaris/zfs_learning_center.jsp
