Professional Documents
Culture Documents
DataStage Architecture
What is the architecture of data stage?
Architecture of DS is client/server architecture.
We have different types of client /server architecture for DataStage starting from the different
versions.The latest version is DataStage 8.7
http://www-01.ibm.com/support/docview.wss?uid=swg27008803
1. Datastage 7.5 (7.5.1 or 7.5.2) version-standalone
DataStage 7.5 version was a standalone version where DataStage engine, service and repository
(metadata) was all installed in once server and client was installed in local PC and access the servers
using the ds-client. Here the users are created in Unix/windows DataStage server and was added to the
dstage group (dsadm is the owner of the DataStage and dstage is the group of that.)To give access to the
new user just create new Unix/windows user in the DS-server and add them to dstage group. The will
have access to the DataStage server from the client.
Client components & server components
Client components are 4 types they are
1. Data stage designer
2. Data stage administrator
3. Data stage director
4. Data stage manager
Data stage designer is user for to design the jobs. All the DataStage development activities are done
here. For a DataStage developer he should know this part very well.
Data stage manager is used for to import & export the project to view & edit the contents of the
repository. This is handled by DataStage operator/administrator
Data stage administrator is used for creating the project, deleting the project & setting the environment
variables. This is handled by DataStage administrator
Data stage director is use for to run the jobs, validate the jobs, scheduling the jobs. This is handled by
DataStage developer/operator
Server components
DS server: runs executable server jobs, under the control of the DS director, that extract,transform, and
load data into a DWH.
DS Package installer: A user interface used to install packaged DS jobs and plug-in;
Repository or project: a central store that contains all the information required to build DWH or data
mart.
From the above we can find most of the objects are common But the main this is users. This where the
problem exists. Lets talk more on user and schema in detail.
Oracle Database Architecture
Oracle database contain 2 main components.
1. Instance: Memory structures and background processes constitute an instance
2. Database: disk resources
Instance
As we cover above, the memory structures and background processes constitute an instance. The
memory structure itself consists of System Global Area (SGA), Program Global Area (PGA), and an
optional area Software Area Code. In the other hand, the mandatory background processes are
Database Writer (DBWn), Log Writer (LGWR), Checkpoint (CKPT), System Monitor (SMON), and
Process Monitor (PMON). And another optional background processes are Archiver (ARCn), Recoverer
(RECO), etc. Figure 2 will illustrate the relationship for those components on an instance.
SGA is the primary memory structures. When Oracle DBAs talk about memory, they usually mean the
SGA. This area is broken into a few of part memory Buffer Cache, Shared Pool, Redo Log Buffer,
Large Pool, and Java Pool.
Buffer Cache
Buffer cache is used to stores the copies of data block that retrieved from datafiles. That is, when user
retrieves data from database, the data will be stored in buffer cache. Its size can be manipulated via
DB_CACHE_SIZE parameter in init.ora initialization parameter file.
Shared Pool
Shared pool is broken into two small part memories Library Cache and Dictionary Cache. The library
cache is used to stores information about the commonly used SQL and PL/SQL statements; and is
managed by a Least Recently Used (LRU) algorithm. It is also enables the sharing those statemens
among users. In the other hand, dictionary cache is used to stores information about object definitions in
the database, such as columns, tables, indexes, users, privileges, etc.
The shared pool size can be set via SHARED_POOL_SIZE parameter in init.ora initialization parameter
file.
Each DML statement (insert, update, and delete) executed by users will generates the redo entry. What is
a redo entry? It is an information about all data changes made by users. That redo entry is stored in redo
log buffer before it is written into the redo log files. To manipulate the size of redo log buffer, you can use
the LOG_BUFFER parameter in init.ora initialization parameter file.
Large Pool
Large pool is an optional area of memory in the SGA. It is used to relieves the burden place on the shared
pool. It is also used for I/O processes. The large pool size can be set by LARGE_POOL_SIZE parameter
in init.ora initialization parameter file.
Java Pool
As its name, Java pool is used to services parsing of the Java commands. Its size can be set by
JAVA_POOL_SIZE parameter in init.ora initialization parameter file.
Although the result of SQL statemen parsing is stored in library cache, but the value of binding variable
will be stored in PGA. Why? Because it must be private or not be shared among users. The PGA is also
used for sort area.
Software area code is a location in memory where the Oracle application software resides.
Oracle background processes is the processes behind the scene that work together with the memories.
DBWn
Database writer (DBWn) process is used to write data from buffer cache into the datafiles. Historically, the
database writer is named DBWR. But since some of Oracle version allows us to have more than one
database writer, the name is changed to DBWn, where n value is a number 0 to 9.
LGWR
Log writer (LGWR) process is similar to DBWn. It writes the redo entries from redo log buffer into the redo
log files.
CKPT
Checkpoint (CKPT) is a process to give a signal to DBWn to writes data in the buffer cache into datafiles.
It will also updates datafiles and control files header when log file switch occurs.
SMON
System Monitor (SMON) process is used to recover the system crach or instance failure by applying the
entries in the redo log files to the datafiles.
PMON
Process Monitor (PMON) process is used to clean up work after failed processes by rolling back the
transactions and releasing other resources.
Database
The database refers to disk resources, and is broken into two main structures Logical structures and
Physical structures.
Logical Structures
Oracle database is divided into smaller logical units to manage, store, and retrieve data efficiently. The
logical units are tablespace, segment, extent, and data block. Figure 3 will illustrate the relationships
between those units.
Tablespace
A Tablespace is a grouping logical database objects. A database must have one or more tablespaces. In
the Figure 3, we have three tablespaces SYSTEM tablespace, Tablespace 1, and Tablespace 2.
Tablespace is composed by one or more datafiles.
Segment
A Tablespace is further broken into segments. A segment is used to stores same type of objects. That is,
every table in the database will store into a specific segment (named Data Segment) and every index in
the database will also store in its own segment (named Index Segment). The other segment types are
Temporary Segment and Rollback Segment.
Extent
A segment is further broken into extents. An extent consists of one or more data block. When the
database object is enlarged, an extent will be allocated. Unlike a tablespace or a segment, an extent
cannot be named.
Data Block
A data block is the smallest unit of storage in the Oracle database. The data block size is a specific
number of bytes within tablespace and it has the same number of bytes.
Physical Structures
The physical structures are structures of an Oracle database (in this case the disk files) that are not
directly manipulated by users. The physical structure consists of datafiles, redo log files, and control files.
Datafiles
A datafile is a file that correspondens with a tablespace. One datafile can be used by one tablespace, but
one tablespace can has more than one datafiles.
Redo log files are the files that store the redo entries generated by DML statements. It can be used for
recovery processes.
Control Files
Control files are used to store information about physical structure of database, such as datafiles size and
location, redo log files location, etc.
http://ugweb.cs.ualberta.ca/~c391/manual/chapt2.html
http://docs.oracle.com/cd/B10500_01/server.920/a96524/c08memor.htm
Database Administrator server (DAS): which should be up and running always. It is only one user for
the whole server. If it is down mans the DB2 server will be down.
Instance (db2inst1): This is the instance which will have multiple number of database inside it.we can
have more than one instance for one DB2 server.
Fence user(db2fence):This is used to run the functions and procedures in the Database.
Database: The database refers to disk resources, and is broken into two main structures Logical
structures and Physical structures