1. INTRODUCTION
2. HISTORY OF ASM
3. TRADITIONAL FILESYSTEMS
4. THE MODERN APPROACH
5. ASM COMPONENTS
6. EXTENTS
7. REBALANCING
8. METADATA
9.
10. HISTORY OF ZFS
11.
12. LAYERS OF ZFS
13. IT'S A TREE!
14. CREATING A POOL
15.
16.
17. REFERENCES
1. INTRODUCTION
2. HISTORY OF ASM
ASM has an interesting history that gives real insight into the development timeframe of a large corporation like Oracle. The idea for ASM came from Bill Bridge, a long-time Oracle employee. The original idea goes back to 1996, and it took three years for the project to be given management approval.
ASM was released with 10gR1, a full seven years after the original idea. That is a long time in technology terms, but a large corporation probably has difficulty achieving quicker turnaround times (cf. Microsoft Vista).
Right from the start, one of the design goals was to have no directory tree: files would have no names and would not be accessible via the standard OS utilities. It now makes sense why ASMCMD feels like a bolted-on afterthought that is somewhat lacking in functionality.
I do wonder if this has hurt the take-up rate of ASM, though I believe a large proportion of new RAC installs use ASM. Support for clustering was built in from the beginning; indeed, Oracle's big push on RAC may have been the killer application that ASM needed to go from a proposal to a fully realised product.
3. TRADITIONAL FILESYSTEMS
File systems have been around for almost 40 years. UFS, for example, was introduced in the early 80s and has thus evolved over two decades.
ASM and ZFS are both volume managers and file systems. ASM has been designed with the specific aim of storing Oracle database files, while ZFS is a general-purpose file system.
Historically, each file system managed a single disk, which has clear drawbacks in terms of size, reliability, and speed. This is the niche that volume managers filled. A volume manager is software that sits between the disk and the file system, enabling features such as mirroring and striping that are completely transparent to the file system itself.
4. THE MODERN APPROACH
Both ASM and ZFS combine the roles of volume manager and file system into one. This provides an administrative benefit as well as other advantages.
5. ASM COMPONENTS
When managing your storage via ASM, you are required to run an ASM instance on
your database server in addition to the normal RDBMS instance. The ASM instance
allows the user to allocate disks to disk groups and perform the required storage related
tasks.
The ASM instance is managed in a similar way to a normal RDBMS instance and has an
SGA, and uses an spfile to configure the various parameters associated with it. When it
starts up a set of Oracle processes run that manage the instance. As the ASM instance is
performing less work than an RDBMS instance it requires far fewer resources. ASM
instances mount disk groups to make files stored in ASM available to RDBMS
instances. ASM instances do not mount databases.
ASM instances are started/stopped in a similar way to RDBMS instances using sqlplus,
srvctl or even Enterprise Manager.
ASM instances can be clustered; in a RAC environment there is exactly one ASM instance per node of the cluster.
Version 11.0 DRAFT 14/05/2008 07:46:00 PM
On failure of an ASM instance on a node, all databases that are using ASM on that node
will also fail.
A disk group is the fundamental object that ASM manages and exposes to the user. A
disk group can consist of one or more disks. The datafiles belonging to an RDBMS
instance are stored in disk groups. Each individual database file is completely contained
within one disk group, but a disk group can contain files for one or more databases, and
a single database may store files in multiple diskgroups.
It is at the disk group level that the mirroring and striping capabilities of ASM can be
utilised. ASM will automatically stripe data files across all disks in a disk group. The
idea is that by doing this, the I/O will be evenly distributed across all the disks. The size
of the disks should be taken into account when striping the data as all disks in the disk
group should be filled to the same capacity, i.e. a larger disk should receive more data
than a smaller one.
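As a rough sketch of this proportional-fill idea, the Python below spreads allocation units in proportion to disk size so that every disk ends at the same fill percentage. The `stripe` helper and its remainder handling are illustrative assumptions, not ASM's actual placement code.

```python
# Toy sketch of proportional striping: distribute N allocation units
# across disks in proportion to each disk's size, so all disks end up
# at roughly the same fill percentage. Illustration only, not ASM code.

def stripe(total_aus, disk_sizes):
    """Return the number of AUs placed on each disk, proportional to size."""
    total_size = sum(disk_sizes)
    placed = [total_aus * s // total_size for s in disk_sizes]
    # Hand out any rounding remainder one AU at a time, largest disks first.
    leftover = total_aus - sum(placed)
    for i in sorted(range(len(disk_sizes)), key=lambda i: -disk_sizes[i])[:leftover]:
        placed[i] += 1
    return placed

# A 100GB disk receives twice as many AUs as each 50GB disk:
print(stripe(400, [100, 50, 50]))  # -> [200, 100, 100]
```

The larger disk takes twice the data of each smaller one, so all three finish equally full, which is the stated goal of ASM's striping.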
The level of redundancy is also specified at the disk group level: external redundancy (no ASM mirroring, relying on the storage array), normal redundancy (two-way mirroring), or high redundancy (three-way mirroring).
6. EXTENTS
Every ASM disk is divided into allocation units (AUs). Data files are stored as extents, and an extent consists of one or more allocation units. When you create a disk group in 11g you can specify the allocation unit size, doubling from 1MB up to 64MB; that is, the AU size for a disk group can be one of 1, 2, 4, 8, 16, 32, or 64MB.
Clearly, the larger the AU size chosen, the fewer extents it takes to map a file of a given size, so larger AUs are beneficial for large data files. Each individual extent resides on a single disk and consists of one or more AUs, with variable extent sizes being introduced in 11g to better accommodate larger data files.
An extent can be 1, 8, or 64 AUs in size. The number of AUs a given extent uses depends on how many extents the file has already been allocated: the extent size increases at a threshold of 20,000 extents (to 8 AUs) and again at 40,000 extents (to 64 AUs).
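These thresholds can be turned into a small back-of-the-envelope calculator. The `extents_for_file` helper below is a sketch of the rule as described, not Oracle's allocator.

```python
# A sketch of 11g variable extent sizing as described above: the first
# 20,000 extents of a file are 1 AU each, the next 20,000 are 8 AUs,
# and any further extents are 64 AUs each. The helper name and its
# rounding behaviour are illustrative assumptions, not Oracle code.

def extents_for_file(file_size_mb, au_mb=1):
    """Number of extents needed to map a file of the given size."""
    aus = -(-file_size_mb // au_mb)           # AUs needed (ceiling division)
    tier1 = min(aus, 20_000)                  # 1-AU extents
    aus -= tier1
    tier2 = min(-(-aus // 8), 20_000)         # 8-AU extents
    aus -= tier2 * 8
    tier3 = -(-aus // 64) if aus > 0 else 0   # 64-AU extents
    return tier1 + tier2 + tier3

print(extents_for_file(10 * 1024))            # 10GB file, 1MB AU  -> 10240
print(extents_for_file(100 * 1024))           # 100GB file, 1MB AU -> 30300
print(extents_for_file(100 * 1024, au_mb=4))  # 100GB file, 4MB AU -> 20700
```

With a 1MB AU a 100GB file needs 30,300 extents, while a 4MB AU cuts that to 20,700, which illustrates why larger AUs suit large data files.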
7. REBALANCING
One of the major advantages of ASM is the ease with which the storage configuration
can be changed while the database relying on ASM remains online. This is thanks to
the ability of ASM to automatically rebalance the distribution of data among the disks in
a disk group whenever a reconfiguration of the disk group occurs.
It is the RBAL background process that manages the rebalancing process with the actual
rebalancing work of moving the data extents being performed by the ARBn processes.
A rebalance operation will essentially shift the extents around a disk group with the goal
of ensuring each disk in a disk group is filled up to the same capacity. This can be
beneficial if a new drive has been added to increase capacity as a rebalance will ensure
that all drives are fully participating in servicing I/O requests.
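The goal of a rebalance can be modelled in a few lines; the function below is only an illustration of the even-fill target, since the real work is done on extent maps by the RBAL/ARBn processes as described above.

```python
# Toy model of a rebalance: after a disk is added, extents are moved so
# every disk ends up filled to (roughly) the same percentage of its
# capacity. Integer arithmetic may leave a few AUs over; a sketch only.

def rebalance_moves(used, capacity):
    """AUs to add to (+) or remove from (-) each disk for an even fill."""
    total_used, total_cap = sum(used), sum(capacity)
    target = [cap * total_used // total_cap for cap in capacity]
    return [t - u for t, u in zip(target, used)]

# Three disks at 90% fill plus a newly added empty disk: the rebalance
# drains the full disks and fills the new one to the common level.
print(rebalance_moves([90, 90, 90, 0], [100, 100, 100, 100]))
# -> [-23, -23, -23, 67]  (each disk ends near 67% full)
```

After the moves, every disk participates equally in servicing I/O, which is exactly the benefit claimed for adding a drive and rebalancing.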
8. METADATA
ASM uses metadata to control disk groups and the allocation of space on the disks
within the disk group to ASM files (i.e. datafiles etc that are under the control of ASM).
All of the metadata associated with a disk group resides within the disk group itself; that is to say, a disk group is self-describing.
ASM does not use a database to store this metadata: the ASM instance is never opened and does not mount a database.
The ASM metadata is either stored at fixed physical locations on the disk or in special ASM files that are not exposed to the end user, e.g. you can't see them with ASMCMD. User-created files under ASM have file numbers that count upwards from 256, while the metadata files count down from 255, though not all of those numbers are in use yet.
You can see the metadata files via the following X$KFFXP view:
SQL> select NUMBER_KFFXP file#, GROUP_KFFXP dg#, count(XNUM_KFFXP) au_count
     from x$kffxp
     where NUMBER_KFFXP < 256
     group by NUMBER_KFFXP, GROUP_KFFXP;

     FILE#  DG#   AU_COUNT
---------- ---- ----------
         1    1          2
         1    2          2
         1    3          2
         1    4          2
         2    1          1
         2    2          1
         2    3          1
         2    4          1
         3    1         42
         3    2         42
         3    3         42
         3    4         42
         4    1          2
         4    2          2
         4    3          2
         4    4          2
         5    1          1
         5    2          1
         5    3          1
         5    4          1
         6    1          1
         6    2          1
         6    3          1
         6    4          1
With the lack of clarity and comprehensiveness in the Oracle documentation, several myths surrounding ASM have gained currency in the wider community.
The most popular myth I have encountered is the claim that ASM can somehow move extents (and hence RDBMS data files) based on I/O levels or even hot spots on disks.
This never happens; the only goal of ASM rebalancing is to ensure each file is evenly distributed amongst all the disks in a disk group. If a file is evenly distributed then the chances are that I/O to it will be evenly distributed too, but ASM makes no use of any I/O metrics.
Another pervasive myth is that an RDBMS instance sends its I/O via the ASM instance. This is completely wrong: each RDBMS instance uses the extent maps it has received from the ASM instance to read and write directly to the ASM disks.
10. HISTORY OF ZFS
1996 was obviously a popular year for the invention of new file systems, as the first ideas for ZFS occurred to Sun engineer Jeff Bonwick back then. However, like ASM, it would be several years before development proper took place: development really started in 2001, ZFS was announced in 2004, and it took a further two years to be released with Solaris 10 6/06.
It is the world's first 128-bit file system and as such has a huge capacity. Unlike ASM, ZFS is a general-purpose file system; it has not been explicitly designed for storing database files.
There were several goals kept in mind when designing ZFS:
Pooled storage
Copy on write
Transactions
Checksums
Pooled storage takes the concept of virtual memory and applies it to disks: just as adding more memory simply makes more memory available, adding more disks to a system should make the additional storage available straight away.
ZFS maintains its records as a tree of blocks, with every block reachable from a single root block called the uberblock. When an existing block is changed, it is not overwritten in place: a copy of the data is made, modified, and then written to a new location on disk. This is copy on write, and it ensures ZFS never overwrites live data, which guarantees the integrity of the file system: a system crash still leaves the on-disk data in a completely consistent state. There is no need for fsck. Ever.
Every block in the ZFS tree of blocks has a checksum. The checksum is stored in the
parent block.
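The copy-on-write and parent-checksum ideas can be illustrated with a toy block tree in Python. This is a minimal sketch under those two rules only, not ZFS's actual on-disk format.

```python
# Toy copy-on-write tree in the spirit of the description above: every
# parent stores its children's checksums, and an update never touches
# live blocks -- it writes new copies up the path to a new "uberblock".
import zlib

def checksum(data):
    return zlib.crc32(repr(data).encode())

def make_node(children):
    # A node records each child together with that child's checksum.
    return tuple((checksum(c), c) for c in children)

def update_leaf(node, path, new_data):
    """Return a NEW root with the leaf at `path` replaced; old root untouched."""
    if not path:
        return new_data
    i = path[0]
    children = [c for _, c in node]
    children[i] = update_leaf(children[i], path[1:], new_data)
    return make_node(children)  # new copies all the way up to the root

old_root = make_node([make_node(["A", "B"]), make_node(["C", "D"])])
new_root = update_leaf(old_root, [1, 0], "C2")
assert old_root[1][1][0][1] == "C"    # the old tree is still fully intact
assert new_root[1][1][0][1] == "C2"   # the new uberblock sees the change
```

The old uberblock still describes a complete, consistent tree after the update, which is exactly why a crash mid-write cannot leave the file system corrupt.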
Operations that modify the file system are bunched together into transactions before being committed to disk asynchronously. Related changes are grouped into a single transaction, and the whole transaction either completes or fails. Individual operations within a transaction can be reordered to optimise performance as long as data integrity is not affected.
12. LAYERS OF ZFS
ZFS is composed of several layers. It is implemented in around 1/7th of the lines of code of UFS plus Solaris Volume Manager, yet provides more functionality than that combination.
DMU: Data Management Unit - presents transactional objects built upon the address space provided by the vdevs. The DMU is responsible for maintaining data consistency.
ZIL: ZFS Intent Log - not all writes go to their final location straight away; synchronous writes are recorded in the intent log.
ZAP: ZFS Attribute Processor - most commonly used to implement directories in the ZPL.
DSL: Dataset and Snapshot Layer - responsible for implementing snapshots and clones.
ZVOL: ZFS Emulated Volume - the ability to present raw devices backed by a ZFS pool.
ZPL: ZFS POSIX Layer - the primary interface for interacting with ZFS as a file system.
13. IT'S A TREE!
The block structure of ZFS can be thought of as a tree. The leaf nodes are effectively the data blocks on disk, while the higher-level blocks are called indirect blocks, and the top block is called the uberblock. All but the leaf blocks can be thought of as metadata, and that metadata is allocated dynamically.
14. CREATING A POOL
The real ease of administration in ZFS shows when you create a file system. When you use whole disks there is no need for the device to be specially formatted, as ZFS formats the disk itself; nor is there any need to issue the mkfs or newfs command. The following:
zpool create db c1t0d0
will create a file system mounted automatically on /db, using as much space on the c1t0d0 device as it requires. There is no need to edit /etc/vfstab.
It is with the zpool command that you can also define a redundant pool, for example:
zpool create db mirror c1t0d0 c2t0d0
will create a mirrored pool between these two devices.
Within a pool you can create multiple file systems using the zfs command:
zfs create db/oradata
You can dynamically add space to a storage pool with the following:
zpool add db mirror c3t0d0 c4t0d0
A snapshot is a read-only copy of a file system at a particular point in time. Thanks to the copy-on-write nature of ZFS, snapshots can be created quickly and cheaply. To begin with a snapshot consumes very little space, but as the active dataset changes the snapshot consumes more and more space by keeping references to the old data.
Snapshots are very straightforward to create and initially occupy no storage:
zfs snapshot db@now
This creates a snapshot of the db filesystem with a label called now. Snapshots consume
storage from the same pool as the file system from which they were created.
As the file system that you have taken the snapshot of undergoes changes, the snapshot increases in size, as it effectively records and stores the original entries. Copy on write makes taking snapshots easy: it is just a case of keeping the pointers to the old structure.
The snapshot data is accessible via the .zfs directory within the root of the file system
that has undergone a snapshot. This makes it possible to recover individual files.
Rolling back to a snapshot is also a fairly trivial command:
zfs rollback db@now
In contrast to snapshots, a clone is a writable volume whose initial contents are the same as the dataset from which it was created.
Clones are created from snapshots.
zfs clone db@now db/test
This creates a new clone db/test from the snapshot db@now.
This seems to me like a great way of providing developers with a full copy of a database
to work on without having to consume vast quantities of storage space.
ASM and ZFS are both modern file systems that take the similar approach of combining a volume manager and a file system. There are advantages to this approach. ASM has obviously been written with the sole aim of storing Oracle data files, and is thus optimised for this; ZFS, meanwhile, is a general-purpose file system and has not been optimised for database usage at all.
Clearly, because ASM can be used in a clustered environment whereas ZFS is not cluster aware, ZFS is unable to participate in that segment of the market.
There may also be questions over whether copy on write has serious performance drawbacks when used in conjunction with an OLTP database; copy on write may well have an impact on sequential I/O against a table that undergoes many updates.
On the other hand, ZFS provides a far richer set of features than ASM and also has a far friendlier interface; I believe it is also easier to manage. Another advantage of ASM, though, is its ability to rebalance online where the extents are held, so that I/O can be optimally distributed.
I think both file systems have their own advantages, and there may be certain cases where ZFS, though perhaps less performant, offers more functionality, making that a trade-off worth considering.
In a RAC situation there is no choice, it would be ASM all the way.
17. REFERENCES