Transform -- the process of converting the extracted data from its previous form into the required form.
Load -- the process of writing the data into the target database.
ETL is used to migrate data from one database to another, to build data marts and data
warehouses, and also to convert databases from one format to another.
It is used to retrieve data from various operational databases, transform it into useful
information, and finally load it into the data warehousing system.
ETL tools:
1. Informatica
2. Ab Initio
3. DataStage
4. BODI
5. Oracle Warehouse Builder
Report generation
In report generation, OLAP (online analytical processing) is used. It is a set of
specifications which allows client applications to retrieve data for analytical processing.
It is a specialized tool that sits between a database and the user in order to provide various analyses
of the data stored in the database.
An OLAP tool is a reporting tool which generates reports that are useful for decision support for
top-level management.
1. Business Objects
2. Cognos
3. MicroStrategy
4. Hyperion
5. Oracle Express
6. Microsoft Analysis Services
Q. What are the types of data warehousing?
EDW (Enterprise data warehouse)
It provides a central database for decision support throughout the enterprise.
It is a collection of data marts.
DATAMART
It is a subset of the data warehouse.
It is a subject-oriented database which supports the needs of individual departments in an organization.
It is called a high-performance query structure.
It supports a particular line of business, like sales, marketing, etc.
ODS (Operational data store)
It is defined as an integrated view of operational databases designed to support operational monitoring.
It is a collection of operational data sources designed to support transaction processing.
Data is refreshed near real-time and used for business activity.
It is an intermediate layer between OLTP and OLAP which helps to create instant reports.
A physical data model includes all required tables, columns, relationships, and database
properties for the physical implementation of databases. Database performance, indexing
strategy, physical storage and denormalization are important parameters of a physical model.
Logical vs. Physical Data Modeling
Entity --> Table
Attribute --> Column
Primary Key --> Primary Key Constraint
Alternate Key --> Unique Constraint
Rule --> Check Constraint / Default Value
Relationship --> Foreign Key
Definition --> Comment
Semi Additive - Measures that can be summed up across some dimensions but not others.
Ex: Current Balance
Non Additive - Measures that cannot be summed up across any of the dimensions.
Ex: Student attendance
Surrogate Key
Joins between fact and dimension tables should be based on surrogate keys
Users should not obtain any information by looking at these keys
These keys should be simple integers
Q. How can you define a transformation? What are different types of transformations
available in Informatica?
A. A transformation is a repository object that generates, modifies, or passes data. The Designer
provides a set of transformations that perform specific functions. For example, an Aggregator
transformation performs calculations on groups of data. Below are the various transformations
available in Informatica:
Aggregator
Custom
Expression
External Procedure
Filter
Input
Joiner
Lookup
Normalizer
Rank
Router
Sequence Generator
Sorter
Source Qualifier
Stored Procedure
Transaction Control
Union
Update Strategy
XML Generator
XML Parser
XML Source Qualifier
Q. What is a source qualifier? What is meant by Query Override?
A. Source Qualifier represents the rows that the PowerCenter Server reads from a relational or
flat file source when it runs a session. When a relational or a flat file source definition is added to
a mapping, it is connected to a Source Qualifier transformation.
The PowerCenter Server generates a query for each Source Qualifier transformation whenever it
runs the session. The default query is a SELECT statement containing all the source columns.
The Source Qualifier has the capability to override this default query by changing the default settings of
the transformation properties. The list of selected ports, or the order in which they appear in the default
query, should not be changed in the overridden query.
Q. What is aggregator transformation?
A. The Aggregator transformation allows performing aggregate calculations, such as averages
and sums. Unlike the Expression transformation, the Aggregator transformation can only be used to
perform calculations on groups. The Expression transformation permits calculations on a row-by-row basis only.
Aggregator Transformation contains group by ports that indicate how to group the data. While
grouping the data, the aggregator transformation outputs the last row of each group unless
otherwise specified in the transformation properties.
Various group by functions available in Informatica are : AVG, COUNT, FIRST, LAST, MAX,
MEDIAN, MIN, PERCENTILE, STDDEV, SUM, VARIANCE.
Q. What is Incremental Aggregation?
A. Whenever a session is created for a mapping that contains an Aggregator transformation, the session option for
Incremental Aggregation can be enabled. When PowerCenter performs incremental aggregation,
it passes new source data through the mapping and uses historical cache data to perform
the aggregation calculations incrementally.
Q. How Union Transformation is used?
A. The Union transformation is a multiple input group transformation that can be used to merge
data from various sources (or pipelines). This transformation works just like the UNION ALL
statement in SQL, which is used to combine the result sets of two SELECT statements.
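For illustration, the SQL counterpart of what the Union transformation does (table and column names here are assumed):
SELECT cust_id, cust_name FROM customers_us
UNION ALL
SELECT cust_id, cust_name FROM customers_eu;
-- UNION ALL keeps duplicate rows, just as the Union transformation does.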
Q. Can two flat files be joined with Joiner Transformation?
A. Yes, joiner transformation can be used to join data from two flat file sources.
Q. What is a look up transformation?
A. This transformation is used to lookup data in a flat file or a relational table, view or synonym.
It compares lookup transformation ports (input ports) to the source column values based on the
lookup condition. Later returned values can be passed to other transformations.
Q. Can a lookup be done on Flat Files?
A. Yes.
Q. What is a mapplet?
A. A mapplet is a reusable object that is created using the Mapplet Designer. The mapplet contains a set
of transformations and allows us to reuse that transformation logic in multiple mappings.
Q. What does reusable transformation mean?
A. Reusable transformations can be used multiple times in a mapping. The reusable
transformation is stored as metadata separate from any mapping that uses the
transformation. Whenever any changes are made to a reusable transformation, all the mappings
where the transformation is used will be invalidated.
Q. What is update strategy and what are the options for update strategy?
A. Informatica processes the source data row by row. By default every row is marked to be
inserted into the target table. If a row has to be updated or inserted based on some logic, the Update
Strategy transformation is used. The condition can be specified in the Update Strategy to mark the
processed row for update or insert.
Following options are available for update strategy:
DD_INSERT: If this is used the Update Strategy flags the row for insertion. Equivalent
numeric value of DD_INSERT is 0.
DD_UPDATE: If this is used the Update Strategy flags the row for update. Equivalent numeric
value of DD_UPDATE is 1.
DD_DELETE: If this is used the Update Strategy flags the row for deletion. Equivalent
numeric value of DD_DELETE is 2.
DD_REJECT: If this is used the Update Strategy flags the row for rejection. Equivalent
numeric value of DD_REJECT is 3.
Q. What are the types of loading in Informatica?
There are two types of loading, 1. Normal loading and 2. Bulk loading.
In normal loading, it loads record by record and writes log for that. It takes comparatively a
longer time to load data to the target.
In bulk loading, it loads number of records at a time to target database. It takes less time to load
data to target.
Q. What is aggregate cache in aggregator transformation?
The aggregator stores data in the aggregate cache until it completes aggregate calculations. When
you run a session that uses an aggregator transformation, the informatica server creates index and
data caches in memory to process the transformation. If the informatica server requires more
space, it stores overflow values in cache files.
Q. What type of repositories can be created using Informatica Repository Manager?
A. Informatica PowerCenter includes following type of repositories:
Standalone Repository: A repository that functions individually and is unrelated to any
other repositories.
Global Repository: This is a centralized repository in a domain. This repository can
contain shared objects across the repositories in a domain. The objects are shared through global
shortcuts.
Local Repository: A local repository is within a domain but is not the global repository.
A local repository can connect to a global repository using global shortcuts and can use objects in
its shared folders.
Versioned Repository: This can either be local or global repository but it allows version
control for the repository. A versioned repository can store multiple copies, or versions of an
object. This feature allows efficiently developing, testing and deploying metadata in the
production environment.
Q. What is a code page?
A. A code page contains encoding to specify characters in a set of one or more languages. The
code page is selected based on source of the data. For example if source contains Japanese text
then the code page should be selected to support Japanese text.
When a code page is chosen, the program or application for which the code page is set, refers to
a specific set of data that describes the characters the application recognizes. This influences the
way that application stores, receives, and sends character data.
Q. Which all databases PowerCenter Server on Windows can connect to?
A. PowerCenter Server on Windows can connect to following databases:
IBM DB2
Informix
Microsoft Access
Microsoft Excel
Microsoft SQL Server
Oracle
Sybase
Teradata
Q. Which all databases PowerCenter Server on UNIX can connect to?
A. PowerCenter Server on UNIX can connect to following databases:
IBM DB2
Informix
Oracle
Sybase
Teradata
Q. How to execute PL/SQL script from Informatica mapping?
A. Stored Procedure (SP) transformation can be used to execute PL/SQL Scripts. In SP
Transformation PL/SQL procedure name can be specified. Whenever the session is executed, the
session will call the pl/sql procedure.
Q. What is Data Driven?
The informatica server follows instructions coded into update strategy transformations within the
session mapping which determine how to flag records for insert, update, delete or reject. If we do
not choose data driven option setting, the informatica server ignores all update strategy
transformations in the mapping.
Q. What are the types of mapping wizards that are provided in Informatica?
The Designer provides two mapping wizards.
1. Getting Started Wizard - Creates mapping to load static facts and dimension tables as well as
slowly growing dimension tables.
2. Slowly Changing Dimensions Wizard - Creates mappings to load slowly changing
dimension tables based on the amount of historical dimension data we want to keep and the
method we choose to handle historical dimension data.
Q. What is Load Manager?
A. While running a Workflow, the PowerCenter Server uses the Load Manager
process and the Data Transformation Manager Process (DTM) to run the workflow and carry
out workflow tasks. When the PowerCenter Server runs a workflow, the Load Manager
performs the following tasks:
1. Locks the workflow and reads workflow properties.
2. Reads the parameter file and expands workflow variables.
3. Creates the workflow log file.
4. Runs workflow tasks.
5. Distributes sessions to worker servers.
6. Starts the DTM to run sessions.
You can use either the Server Manager or the command line program pmcmd to start
or stop the session.
Batches - Provide a way to group sessions for either serial or parallel execution by the
Informatica Server. There are two types of batches:
1. Sequential - runs sessions one after the other.
2. Concurrent - runs sessions at the same time.
Q. How many ways you can update a relational source definition and what
are they?
A. Two ways
1. Edit the definition
2. Reimport the definition
Q. What is a transformation?
A. It is a repository object that generates, modifies or passes data.
Q. What are the designer tools for creating transformations?
A. Mapping designer
Transformation developer
Mapplet designer
Q. In how many ways can you create ports?
A. Two ways
1. Drag the port from another transformation
2. Click the add button on the ports tab.
Q. What are reusable transformations?
A. A transformation that can be reused is called a reusable transformation
They can be created using two methods:
1. Using transformation developer
2. Create normal one and promote it to reusable
Q. What is the aggregate cache in the Aggregator transformation?
A. The aggregator stores data in the aggregate cache until it completes aggregate calculations.
When you run a session that uses an Aggregator transformation, the Informatica server creates index
and data caches in memory to process the transformation. If the Informatica server requires more
space, it stores overflow values in cache files.
Q. What are the settings used to configure the Joiner transformation?
A. Master and detail source
Type of join
Condition of the join
Q. What are the join types in joiner transformation?
A. Normal (Default) -- only matching rows from both master and detail
Master outer -- all detail rows and only matching rows from master
Detail outer -- all master rows and only matching rows from detail
Full outer -- all rows from both master and detail (matching or non matching)
Q. What are the joiner caches?
A. When a Joiner transformation occurs in a session, the Informatica Server reads all the records
from the master source and builds index and data caches based on the master rows. After
building the caches, the Joiner transformation reads records
from the detail source and performs joins.
Q. What are the types of lookup caches?
Static cache: You can configure a static, or read-only, cache for any lookup table. By default the
Informatica server creates a static cache. It caches the lookup table and lookup values in the
cache for each row that comes into the transformation. When the lookup condition is true, the
Informatica server does not update the cache while it processes the lookup transformation.
Dynamic cache: If you want to cache the target table and insert new rows into cache and the
target, you can create a look up transformation to use dynamic cache. The Informatica server
dynamically inserts data to the target table.
Persistent cache: You can save the lookup cache files and reuse them the next time the
Informatica server processes a lookup transformation configured to use the cache.
Recache from database: If the persistent cache is not synchronized with the lookup table, you
can configure the lookup transformation to rebuild the lookup cache.
Shared cache: You can share the lookup cache between multiple transformations. You can share
an unnamed cache between transformations in the same mapping.
Q. What is Transformation?
A: A transformation is a repository object that generates, modifies, or passes data. A
transformation performs a specific function. There are two types of transformations:
1. Active
Can change the number of rows that pass through it. Eg: Aggregator, Filter, Joiner, Normalizer,
Rank, Router, Source Qualifier, Update Strategy, ERP Source Qualifier, Advanced External Procedure.
2. Passive
Does not change the number of rows that pass through it. Eg: Expression, External Procedure,
Input, Lookup, Stored Procedure, Output, Sequence Generator, XML Source Qualifier.
Q. What are Options/Type to run a Stored Procedure?
A: Normal: During a session, the stored procedure runs where the
transformation exists in the mapping on a row-by-row basis. This is useful for calling the stored
procedure for each row of data that passes through the mapping, such as running a calculation
against an input port. Connected stored procedures run only in normal mode.
Pre-load of the Source. Before the session retrieves data from the source, the stored procedure
runs. This is useful for verifying the existence of tables or performing joins of data in a
temporary table.
Post-load of the Source. After the session retrieves data from the source, the stored procedure
runs. This is useful for removing temporary tables.
Pre-load of the Target. Before the session sends data to the target, the stored procedure runs.
This is useful for verifying target tables or disk space on the target system.
Post-load of the Target. After the session sends data to the target, the stored procedure runs.
This is useful for re-creating indexes on the database. It must contain at least one Input and one
Output port.
Q. What kinds of sources and of targets can be used in Informatica?
Sources may be Flat file, relational db or XML.
Target may be relational tables, XML or flat files.
Q: What is Session Process?
A: The Load Manager process starts the session, creates the DTM process, and
sends post-session email when the session completes.
Q. What is DTM process?
A: The DTM process creates threads to initialize the session, read, write, transform
data and handle pre and post-session operations.
Q. What is the different type of tracing levels?
Tracing level represents the amount of information that Informatica Server writes in a log
file. Tracing levels store information about mapping and transformations. There are 4 types of
tracing levels supported
1. Normal: Logs initialization and status information, summarization of the successful rows and
target rows, and information about rows skipped due to transformation errors.
2. Terse: Logs initialization information, error messages, and notification of rejected data (less
detail than Normal).
3. Verbose Initialization: In addition to Normal tracing, logs the location of the data cache and
index cache files that are created, and detailed transformation statistics for each transformation
within the mapping.
4. Verbose Data: In addition to Verbose Initialization tracing, logs each row processed by the
Informatica server.
Q. TYPES OF DIMENSIONS?
A dimension table consists of the attributes about the facts. Dimensions store the textual
descriptions of the business.
Conformed Dimension:
Conformed dimensions mean the exact same thing with every possible fact table to which they
are joined.
Eg: The date dimension table connected to the sales facts is identical to the date dimension
connected to the inventory facts.
Junk Dimension:
A junk dimension is a collection of random transactional codes, flags, and/or text attributes that
are unrelated to any particular dimension. The junk dimension is simply a structure that provides
a convenient place to store the junk attributes.
Eg: Assume that we have a gender dimension and marital status dimension. In the fact table we
need to maintain two keys referring to these dimensions. Instead of that create a junk dimension
which has all the combinations of gender and marital status (cross join gender and marital status
table and create a junk table). Now we can maintain only one key in the fact table.
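A rough SQL sketch of building such a junk dimension (the table, column, and sequence names here are illustrative assumptions):
-- Cross join the small code tables to get every combination of values,
-- and give each combination its own surrogate key.
INSERT INTO junk_dim (junk_key, gender, marital_status)
SELECT junk_seq.NEXTVAL, g.gender, m.marital_status
FROM gender_dim g CROSS JOIN marital_status_dim m;
-- The fact table then carries only junk_key instead of two separate dimension keys.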
Degenerated Dimension:
A degenerate dimension is a dimension which is derived from the fact table and doesn't have its
own dimension table.
Eg: A transactional code in a fact table.
Slowly changing dimension:
Slowly changing dimensions are dimension tables that have slowly increasing
data as well as updates to existing data.
Q. What are the output files that the Informatica server creates during the
session running?
Informatica server log: Informatica server (on UNIX) creates a log for all status and
error messages (default name: pm.server.log). It also creates an error log for error
messages. These files will be created in Informatica home directory
Session log file: Informatica server creates session log file for each session. It writes
information about session into log files such as initialization process, creation of sql
commands for reader and writer threads, errors encountered and load summary. The
amount of detail in session log file depends on the tracing level that you set.
Session detail file: This file contains load statistics for each target in mapping.
Session detail includes information such as table name, number of rows written or
rejected. You can view this file by double clicking on the session in monitor window.
Performance detail file: This file contains information known as session performance
details, which helps you identify where performance can be improved. To generate this file,
select the performance detail option in the session property sheet.
Reject file: This file contains the rows of data that the writer does not write to
targets.
Control file: Informatica server creates control file and a target file when you run a
session that uses the external loader. The control file contains the information about
the target flat file such as data format and loading instructions for the external
loader.
Post session email: Post session email allows you to automatically communicate
information about a session run to designated recipients. You can create two
different messages. One if the session completed successfully the other if the session
fails.
Indicator file: If you use the flat file as a target, you can configure the Informatica
server to create indicator file. For each target row, the indicator file contains a
number to indicate whether the row was marked for insert, update, delete or reject.
Output file: If session writes to a target file, the Informatica server creates the
target file based on file properties entered in the session property sheet.
Cache files: When the Informatica server creates memory cache it also creates cache
files.
For the following circumstances Informatica server creates index and data cache
files:
Aggregator transformation
Joiner transformation
Rank transformation
Lookup transformation
Q. What is meant by lookup caches?
A. The Informatica server builds a cache in memory when it processes the first row
of data in a cached lookup transformation. It allocates memory for the cache
based on the amount you configure in the transformation or session properties. The
Informatica server stores condition values in the index cache and output values in
the data cache.
Q. How do you identify existing rows of data in the target table using lookup
transformation?
A. There are two ways to look up the target table to verify whether a row exists or not:
1. Use a connected dynamic cache lookup and then check the value of the NewLookupRow
output port to decide whether the incoming record already exists in the table / cache
or not.
2. Use an unconnected lookup, call it from an Expression transformation, and check
the lookup condition port value (Null / Not Null) to decide whether the incoming
record already exists in the table or not.
Q. What are Aggregate tables?
An aggregate table contains a summary of existing warehouse data, grouped to certain
levels of dimensions. Retrieving the required data from the actual table, which may have millions of
records, takes more time and also affects server performance. To avoid this we can
aggregate the table to the required level and use it. These tables reduce the load on the
database server, increase query performance, and return results very fast.
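For example, a monthly summary table can be built with a simple GROUP BY (table and column names here are illustrative only):
INSERT INTO sales_monthly_agg (month_key, product_key, total_qty, total_amount)
SELECT month_key, product_key, SUM(qty), SUM(amount)
FROM sales_fact
GROUP BY month_key, product_key;
-- Reports needing monthly totals read this small table instead of scanning sales_fact.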
Q. What is a level of Granularity of a fact table?
The level of granularity is the level of detail that you put into the fact table in a data warehouse. For
example, based on the design you can decide to store the sales data for each transaction. The level of
granularity then defines how much detail you keep for each transactional fact: product sales recorded
per minute, or aggregated up to the minute, hour, or day before being stored.
Q. What is session?
A session is a set of instructions to move data from sources to targets.
Q. What is worklet?
Worklets are objects that represent a set of workflow tasks and allow you to reuse a set of workflow
logic in several workflows.
Use of a worklet: you can bind many tasks in one place so that they can easily be
identified and serve a specific purpose.
Q. What is workflow?
A workflow is a set of instructions that tells the Informatica server how to execute the tasks.
Q. Why cannot we use sorted input option for incremental aggregation?
In incremental aggregation, the aggregate calculations are stored in historical cache on the server.
In this historical cache the data need not be in sorted order. If you give sorted input, the records
come as presorted for that particular run but in the historical cache the data may not be in the
sorted order. That is why this option is not allowed.
Q. What is target load order plan?
You specify the target load order based on the source qualifiers in a mapping. If you have
multiple source qualifiers connected to multiple targets, you can designate the order in which
the Informatica server loads data into the targets.
The target load plan defines the order in which data is extracted from the source qualifier transformations.
Set it under the Mappings (tab) > Target Load Plan.
Q. What is constraint based loading?
Constraint based load ordering defines the order of loading the data into multiple targets based
on primary and foreign key constraints.
To set the option: double-click the session > Config Object (tab) > check Constraint Based Load Ordering.
Q. What is the status code in stored procedure transformation?
The status code provides error handling for the Informatica server during the session. The stored
procedure issues a status code that notifies whether or not the stored procedure completed
successfully. This value cannot be seen by the user; it is used only by the Informatica server to
determine whether to continue running the session or to stop.
Q. Define Informatica Repository?
The Informatica repository is a relational database that stores information, or metadata, used by
the Informatica Server and Client tools. Metadata can include information such as mappings
describing how to transform source data, sessions indicating when you want the Informatica
Server to perform the transformations, and connect strings for sources and targets.
The repository also stores administrative information such as usernames and passwords,
permissions and privileges, and product version.
Use the Repository Manager to create the repository. The Repository Manager connects to the
repository database and runs the code needed to create the repository tables. These tables store
metadata in the specific format that the Informatica server and client tools use.
Q. What is a metadata?
Designing a data mart involves writing and storing a complex set of instructions. You need to
know where to get data (sources), how to change it, and where to write the information (targets).
PowerMart and PowerCenter call this set of instructions metadata. Each piece of metadata (for
example, the description of a source table in an operational database) can contain comments
about it.
In summary, Metadata can include information such as mappings describing how to transform
source data, sessions indicating when you want the Informatica Server to perform the
transformations, and connect strings for sources and targets.
Q. What is metadata reporter?
It is a web-based application that enables you to run reports against repository metadata. With the
metadata reporter you can access information about your repository without knowledge
of SQL, the transformation language, or the underlying tables in the repository.
The file must be in a directory local to the Informatica Server; the server waits for the indicator file to appear before
running the session.
Q. What is audit table? and What are the columns in it?
The audit table is the table which contains information about your workflow and session
names. It records workflow and session status and their details, typically with columns such as:
WKFL_RUN_ID
WKFL_NME
START_TMST
END_TMST
ROW_INSERT_CNT
ROW_UPDATE_CNT
ROW_DELETE_CNT
ROW_REJECT_CNT
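A minimal DDL sketch of such an audit table, using the columns listed above (the data types are assumptions, not a fixed standard):
CREATE TABLE wkfl_audit (
  wkfl_run_id    NUMBER,         -- unique id of the workflow run
  wkfl_nme       VARCHAR2(100),  -- workflow name
  start_tmst     DATE,           -- run start timestamp
  end_tmst       DATE,           -- run end timestamp
  row_insert_cnt NUMBER,
  row_update_cnt NUMBER,
  row_delete_cnt NUMBER,
  row_reject_cnt NUMBER
);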
Q. If session fails after loading 10000 records in the target, how can we load 10001th record
when we run the session in the next time?
Select the recovery strategy in the session properties as 'Resume from last checkpoint'.
Note: set this property before running the session.
Q. Informatica Reject File How to identify rejection reason
D - Valid data or Good Data. Writer passes it to the target database. The target accepts it unless
a database error occurs, such as finding a duplicate key while inserting.
O - Overflowed Numeric Data. Numeric data exceeded the specified precision or scale for the
column. Bad data, if you configured the mapping target to reject overflow or truncated data.
N - Null Value. The column contains a null value. Good data. Writer passes it to the target,
which rejects it if the target database does not accept null values.
T - Truncated String Data. String data exceeded a specified precision for the column, so the
Integration Service truncated it. Bad data, if you configured the mapping target to reject overflow
or truncated data.
Also to be noted that the second column contains column indicator flag value D which signifies
that the Row Indicator is valid.
Now let us see how Data in a Bad File looks like:
0,D,7,D,John,D,5000.375,O,,N,BrickLand Road Singapore,T
Q. What is Insert Else Update and Update Else Insert?
These options are used when dynamic cache is enabled.
Insert Else Update option applies to rows entering the lookup transformation with the row
type of insert. When this option is enabled the integration service inserts new rows in the cache
and updates existing rows. When disabled, the Integration Service does not update existing rows.
Update Else Insert option applies to rows entering the lookup transformation with the row
type of update. When this option is enabled, the Integration Service updates existing rows, and
inserts a new row if it is new. When disabled, the Integration Service does not insert new rows.
Q. What is a factless fact table? For what purpose do we use it in our DWH projects?
It is a fact table which does not contain any measurable data.
EX: Student attendance fact (it contains only Boolean values, whether student attended class or
not ? Yes or No.)
A factless fact table contains only keys and no measures; in other words, it contains no facts.
Generally it is used to integrate fact tables.
A factless fact table contains only foreign keys. We can apply two kinds of aggregate functions
to a factless fact: count and distinct count.
Two purposes of a factless fact:
1. Coverage: to indicate what did NOT happen.
Example: which product did not sell well in a particular region?
2. Event tracking: to know whether an event took place or not.
Example: a fact for tracking student attendance will not contain any measures.
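For example, a count straight off an attendance factless fact table (table and column names are assumed for illustration):
-- How many classes did each student attend?
SELECT student_key, COUNT(*) AS classes_attended
FROM attendance_fact
GROUP BY student_key;
-- A distinct count works the same way, e.g. COUNT(DISTINCT class_key).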
Q. What is staging area?
The staging area is where we apply our logic to extract data from the source, cleanse it, and
shape it into meaningful, summarized data for the data warehouse.
Q. What is constraint based loading
Constraint based load order defines the order of loading the data into the multiple targets based
on primary and foreign keys constraints.
Q. Why union transformation is active transformation?
The only condition for a transformation to become active is that the row number changes.
A row number can change under two conditions:
1. The number of rows coming in and going out is different.
Eg: in the case of a filter we have data like
id name dept row_num
1 aa 4 1
2 bb 3 2
3 cc 4 3
and a filter condition like dept=4, so the output would be
id name dept row_num
1 aa 4 1
3 cc 4 2
When we use the Joiner transformation, the Integration Service maintains the cache; all the master
records are stored in the joiner cache. Joiner caches have two types: 1. Index cache 2. Data cache.
The index cache stores all the port values that participate in the join condition, and the data cache
stores all ports that do not participate in the join condition.
Q. What is the location of parameter file in Informatica?
$PMBWPARAM
Q. How can you display only hidden files in UNIX?
$ ls -la
total 16
8 drwxrwxrwx 2 zzz yyy 4096 Apr 26 12:00 ./
8 drwxrwxrwx 9 zzz yyy 4096 Jul 31 16:59 ../
The correct answer is:
$ ls -a | grep "^\."
Q. How to delete the data in the target table after loaded.
SQ ---> Properties tab --> Post SQL:
delete from target_tablename
These SQL statements are executed using the source database connection after a pipeline is run.
Alternatively, write a Post SQL on the target as 'truncate table <name>', or use the truncate
target table option in the session properties.
Q. What is polling in informatica?
It displays the updated information about the session in the monitor window. The monitor
window displays the status of each session when you poll the Informatica server.
Q. How will I stop my workflow after 10 errors?
Use the session-level error handling property: Config Object (tab) > Error Handling > Stop on errors = 10.
Q. How can we calculate fact table size?
The fact table size is a multiple of the combined dimension cardinalities. For example, to estimate
the fact table size for 3 years of history with 200 products and 200 stores:
3 * 365 * 200 * 200 = 43,800,000 rows (one row per store, per product, per day).
Q. Without using emailtask how will send a mail from informatica?
By using the 'mailx' command in a UNIX shell script.
Q. How will compare two mappings in two different repositories?
A Joiner transformation is used to join data from heterogeneous sources (e.g. a SQL database and a flat
file), whereas a Union transformation is used to join data from
the same relational sources (e.g. an Oracle table and another Oracle table).
The Joiner transformation combines data records horizontally based on the join condition,
and can combine data from two different sources having different metadata.
The Joiner transformation supports heterogeneous and homogeneous data sources.
The Union transformation combines data records vertically from multiple sources having the same
metadata.
The Union transformation also supports heterogeneous data sources.
The Union transformation functions as the UNION ALL set operator.
Q. What is constraint based loading exactly? And how to do this? I think it is when we
have primary key-foreign key relationship. Is it correct?
Constraint based load ordering loads the data into multiple targets depending on the primary
key - foreign key relationship.
To set the option: double-click the session > Config Object (tab) > check Constraint Based Load Ordering.
Q. Difference between top down(w.h inmon)and bottom up(ralph kimball)approach?
Top-down approach: as per W. H. Inmon, first we build the data warehouse and after that we build the
data marts, but this is somewhat difficult to maintain.
Bottom-up approach: as per Ralph Kimball, first we build the data marts and then we build the
data warehouse.
This approach is the most useful in real time while creating the data warehouse.
Static cache
Dynamic cache
Shared cache
Persistent cache

Static cache
A static cache is the same as a cached lookup, in which once a cache is created the
Integration Service always queries the cache instead of the lookup table.
With a static cache, when the lookup condition is true it returns the value from the lookup
table, else it returns Null or the default value. The important thing about a static cache is that
you cannot insert or update the cache.
Dynamic cache
With a dynamic cache we can insert or update rows in the cache as we pass the
rows. The Integration Service dynamically inserts or updates data in the lookup
cache and passes the data to the target. The dynamic cache is synchronized with
the target.
Shared cache
When we use a shared cache, the Informatica server creates the cache memory for
multiple lookup transformations in the mapping; once the lookup is done for
the first lookup, that memory is reused by the other
lookup transformations.
We can share the lookup cache between multiple transformations. Unnamed cache
is shared between transformations in the same mapping and named cache
between transformations in the same or different mappings.
Persistent cache
If we use Persistent cache Informatica server processes a lookup transformation
and saves the lookup cache files and reuses them the next time. The Integration
Service saves or deletes lookup cache files after a successful session run based on
whether the Lookup cache is checked as persistent or not
In order to make a Lookup Cache as Persistent cache you need to make the
following changes
Cache File Name Prefix: Enter the Named Persistent cache file name
Q. Explain about Lookup transformation?
Lookup t/r is used in a mapping to look up data in a relational table, flat file, view or synonym.
The Informatica server queries the look up source based on the look up ports in the
transformation. It compares look up t/r port values to look up source column values based on the
look up condition.
Look up t/r is used to perform the below mentioned tasks:
1) To get a related value.
2) To perform a calculation.
3) To update SCD tables.
Q. How to identify this row for insert and this row for update in dynamic lookup cache?
Based on NEW LOOKUP ROW.. Informatica server indicates which one is insert and which one
is update.
NewLookupRow = 0 ... no change
NewLookupRow = 1 ... insert
NewLookupRow = 2 ... update
Q. How many ways can we implement SCD2?
1) Date range (Startdate and Effective date)
2) Flag
3) Versioning
Q. How will you check the bottle necks in Informatica? From where do you start checking?
You start as per this order
1. Target
2. Source
3. Mapping
4. Session
5. System
Q. What is incremental aggregation?
When the Aggregator transformation executes, all the output data gets stored in a temporary
location called the aggregator cache. The next time the mapping runs, the Aggregator
transformation runs only for the new records loaded after the first run, and these output values get
incremented with the values in the aggregator cache. This is called incremental aggregation. By
this we can improve performance.
Incremental aggregation means applying only the captured changes in the source to aggregate
calculations in a session.
When the source changes only incrementally and if we can capture those changes, then we can
configure the session to process only those changes. This allows Informatica server to update
target table incrementally, rather than forcing it to process the entire source and recalculate the
same calculations each time you run the session. By doing this obviously the session
performance increases.
Q. How can i explain my project architecture in interview..? Tell me your project flow from
source to target..?
Project architecture is like
1. Source Systems: Like Mainframe,Oracle,People soft,DB2.
2. Landing tables: These are tables act like source. Used for easy to access, for backup purpose,
as reusable for other mappings.
3. Staging tables: From landing tables we extract the data into staging tables after all validations
done on the data.
4. Dimension/Facts: These are the tables those are used for analysis and make decisions by
analyzing the data.
5. Aggregation tables: These tables have summarized data useful for managers who wants to
view monthly wise sales, year wise sales etc.
6. Reporting layer: 4 and 5 phases are useful for reporting developers to generate reports. I hope
this answer helps you.
Q. What type of transformation is not supported by mapplets?
Normalizer transformation
COBOL sources, joiner
XML source qualifier transformation
XML sources
Target definitions
Pre & Post Session stored procedures
Other mapplets
Except for the Source Qualifier transformation, all transformations support the reusable property.
A reusable transformation can be developed in two ways:
1. In the mapping, select the transformation you want to reuse and double-click it; there you get
the option to make it a reusable transformation. Check that option to convert the non-reusable
transformation to reusable (this works for everything except the Source Qualifier transformation).
2. By using the Transformation Developer.
Q. What is Pre Sql and Post Sql?
Pre SQL means that the integration service runs SQL commands against the source database
before it reads the data from source.
Post SQL means integration service runs SQL commands against target database after it writes
to the target.
In the Source Qualifier you can filter the records based on hire date, and you can also parameterize the hire
date, which controls from which date onwards data is loaded into the target.
This is the concept of incremental loading.
Q. What is target update override?
By Default the integration service updates the target based on key columns. But we might want
to update non-key columns also, at that point of time we can override the UPDATE statement for
each target in the mapping. The target override affects only when the source rows are marked as
update by an update strategy in the mapping.
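For illustration, the override is written as an ordinary UPDATE statement in the target properties, with :TU.<port> referring to the value coming from the mapping; the table and port names below are only examples:
UPDATE t_employee
SET emp_name = :TU.emp_name,
    salary   = :TU.salary
WHERE emp_id = :TU.emp_id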
Rank:
select rank() over (order by sal) from emp
gives 1, 2, 2, 4, 5, 6
Dense rank:
select dense_rank() over (order by sal) from emp
gives 1, 2, 2, 3, 4, 5
Q. What is the incremental aggregation?
The first time you run an upgraded session using incremental aggregation, the Integration
Service upgrades the index and data cache files. If you want to partition a session using a
mapping with incremental aggregation, the Integration Service realigns the index and data cache
files.
Q. What is session parameter?
A parameter file is a text file where we can define the values for parameters. Session parameters
are used to assign the database connection values.
Q. What is mapping parameter?
A mapping parameter represents a constant value that can be defined before a mapping runs. A
mapping parameter is defined in a parameter file, which is saved with the extension .prm; this
lets a mapping be reused with various constant values.
A parameter file can be a text file. Parameter file is to define the values for parameters and
variables used in a session. A parameter file is a file created by text editor such as word pad or
notepad. You can define the following values in parameter file
Mapping parameters
Mapping variables
Session parameters
Q. What is session override?
Session override is an option in Informatica at the session level. Here we can manually give a SQL
query which is issued to the database when the session runs. It is nothing but overriding the
default SQL generated by a particular transformation at the mapping level.
Q. What are the diff. b/w informatica versions 8.1.1 and 8.6.1?
There is a small change in the Administrator Console. In 8.1.1 we do all the creation of the Integration
Service, Repository Service, Web Services, domain, node, and grid (if we have the licensed version).
In 8.6.1 the Informatica Admin Console manages both a Domain page and a Security page. The Domain
page covers all of the above (creation of the Integration Service, Repository Service, Web Services,
domain, node, grid, etc.). The Security page covers creation of users, privileges, LDAP
configuration, export/import of users and privileges, etc.
Q. What are the uses of a Parameter file?
A parameter file is one which contains the values of mapping variables.
Type this in Notepad and save it:
[foldername.sessionname]
$$inputvalue1=
Parameter files are created with an extension of .prm.
They are created to pass values that can be changed for mapping parameters and session
parameters during a mapping run.
Mapping Parameters:
A parameter is defined in a parameter file for which a parameter has already been created in the
mapping with a data type, precision, and scale.
The mapping parameter file syntax (xxxx.prm):
[FolderName.WF:WorkFlowName.ST:SessionName]
$$ParameterName1=Value
$$ParameterName2=Value
After that we have to select the properties Tab of Session and Set Parameter file name including
physical path of this xxxx.prm file.
Session Parameters:
The Session Parameter files syntax (yyyy.prm).
[FolderName.SessionName]
$InputFileValue1=Path of the source Flat file
After that we have to select the properties Tab of Session and Set Parameter file name including
physical path of this yyyy.prm file.
Then do the following changes on the Mapping tab, in the Source Qualifier's Properties section:
Attribute --------> Value
Source File Type --------> Direct
Source File Directory --------> (empty)
Source File Name --------> $InputFileValue1
Q. What is the default data driven operation in informatica?
This is the default option for the Update Strategy transformation.
The Integration Service follows the instructions coded in the Update Strategy transformations within the
session mapping to determine how to flag records for insert, delete, update, or reject. If you do not
choose the data driven option, the Integration Service ignores the Update Strategy transformations
in the mapping.
Q. What is threshold error in informatica?
When the Update Strategy flags rows with DD_REJECT or DD_UPDATE and a limited error count is
set, the session ends with a failed status once the number of rejected records exceeds that count.
This error is called a threshold error.
Q. SO many times i saw "$PM parser error ". What is meant by PM?
PM: POWER MART
1) Parsing error will come for the input parameter to the lookup.
2) Informatica is not able to resolve the input parameter CLASS for your lookup.
3) Check the Port CLASS exists as either input port or a variable port in your expression.
4) Check data type of CLASS and the data type of input parameter for your lookup.
Q. What is a candidate key?
A candidate key is a combination of attributes that can uniquely identify a database
record without any extraneous data. Each table may have one or more candidate keys.
One of these candidate keys is selected as the table's primary key; the others are called alternate keys.
Q. Can anyone tell me the difference between persistence and dynamic caches? On which
conditions we are using these caches?
Dynamic:
1) When you use a dynamic cache, the Informatica Server updates the lookup cache as it passes
rows to the target.
2) With a dynamic cache, we can update the cache with new data as well.
3) A dynamic cache is not reusable.
(When we need updated cache data, we need a dynamic cache.)
Persistent:
1) A Lookup transformation can use a non-persistent or persistent cache. The PowerCenter Server
saves or deletes lookup cache files after a successful session based on the Lookup Cache
Persistent property.
2) With a persistent cache, we are not able to update the cache with new data.
3) A persistent cache is reusable.
(When we need the previous cache data, we need a persistent cache.)
A few more additions to the above answer:
1. A dynamic lookup allows modifying the cache, whereas a persistent lookup does not allow us to
modify the cache.
2. A dynamic lookup uses 'NewLookupRow', a default port in the cache, but a persistent cache does not
use any default ports.
3. As the session completes, the dynamic cache is removed, but the persistent cache is saved on the
Informatica PowerCenter server.
Q. How to obtain performance data for individual transformations?
There is a property at session level Collect Performance Data, you can select that property. It
gives you performance details for all the transformations.
If a session is running for a long time, you may want to find out the bottlenecks that exist. It may be a
bottleneck of type target, source, mapping, etc.
The basic idea behind a test load is to see the behavior of the Informatica Server with your session.
Q. What is ODS (Operational Data Store)?
A collection of operation or bases data that is extracted from operation databases and
standardized, cleansed, consolidated, transformed, and loaded into enterprise data architecture.
An ODS is used to support data mining of operational data, or as the store for base data that is
summarized for a data warehouse.
The ODS may also be used to audit the data warehouse to ensure summarized and derived data is
calculated properly. The ODS may further become the enterprise's shared operational database,
allowing operational systems that are being reengineered to use the ODS as their operational
database.
Domains
Nodes
Services
Q. WHAT IS VERSIONING?
It is used to keep a history of changes done to the mappings and workflows.
1. Check in: You check in when you are done with your changes so that everyone can see those
changes.
2. Check out: You check out from the main stream when you want to make any change to the
mapping/workflow.
3. Version history: It will show you all the changes made and who made it.
T - Truncated.
When the data contains nulls or overflows, it is rejected instead of being written to the target.
The rejected data is stored in reject files. You can check the data and reload it into the target
using the reject reload utility.
Q. Difference between STOP and ABORT?
Stop - If the Integration Service is executing a Session task when you issue the stop command,
the Integration Service stops reading data. It continues processing and writing data and
committing data to targets. If the Integration Service cannot finish processing and committing
data, you can issue the abort command.
Abort - The Integration Service handles the abort command for the Session task like the stop
command, except it has a timeout period of 60 seconds. If the Integration Service cannot finish
processing and committing data within the timeout period, it kills the DTM process and
terminates the session.
Q. WHAT IS INLINE VIEW?
An inline view is term given to sub query in FROM clause of query which can be used as table.
Inline view effectively is a named sub query
Ex : Select Tab1.col1,Tab1.col.2,Inview.col1,Inview.Col2
From Tab1, (Select statement) Inview
Where Tab1.col1=Inview.col1
SELECT D.DNAME, E.ENAME, E.SAL FROM EMP E,
(SELECT DNAME, DEPTNO FROM DEPT) D
WHERE E.DEPTNO = D.DEPTNO
In the above query, (SELECT DNAME, DEPTNO FROM DEPT) D is the inline view.
Inline views are resolved at runtime and, in contrast to normal views, they are not stored in the
data dictionary.
The disadvantages of using a normal view are:
1. A separate view needs to be created, which is an overhead.
2. Extra time is taken in parsing the view.
These problems are avoided by an inline view, which uses a SELECT statement in the sub query
and treats it as a table.
The Integration Service increments the generated key (GK) sequence number each time it
processes a source row. When the source row contains a multiple-occurring column or a
multiple-occurring group of columns, the Normalizer transformation returns a row for each occurrence.
Each row contains the same generated key value.
The Normalizer transformation has a generated column ID (GCID) port for each multiple-occurring
column. The GCID is an index for the instance of the multiple-occurring data. For
example, if a column occurs 3 times in a source record, the Normalizer returns a value of 1, 2 or 3
in the generated column ID.
Q. WHAT IS DIFFERENCE BETWEEN SUBSTR AND INSTR?
The INSTR function searches a string for a sub-string and returns an integer indicating the position
within the string of the first character of that occurrence.
The SUBSTR function returns a portion of a string, beginning at a character position and
substring_length characters long. SUBSTR calculates lengths using characters as defined by the input
character set.
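A quick Oracle example of both functions:
SELECT INSTR('informatica', 'mat') FROM dual;  -- returns 6: 'mat' starts at position 6
SELECT SUBSTR('informatica', 3, 5) FROM dual;  -- returns 'forma': 5 characters from position 3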
Q. WHAT ARE DIFFERENT ORACLE DATABASE OBJECTS?
TABLES
VIEWS
INDEXES
SYNONYMS
SEQUENCES
TABLESPACES
Q. WHAT IS @@ERROR?
The @@ERROR automatic variable returns the error code of the last Transact-SQL statement. If
there was no error, @@ERROR returns zero. Because @@ERROR is reset after each Transact-SQL
statement, it must be saved to a variable if it is needed for further processing after checking it.
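A short T-SQL illustration of saving @@ERROR immediately after the statement being checked (the emp table is just an example):
DECLARE @err INT;
UPDATE emp SET sal = sal * 1.1 WHERE deptno = 10;
SET @err = @@ERROR;   -- capture right away; the next statement would reset it
IF @err <> 0
    PRINT 'Update failed with error ' + CAST(@err AS VARCHAR(10));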
Q. WHAT IS DIFFERENCE BETWEEN CO-RELATED SUB QUERY AND NESTED
SUB QUERY?
Correlated subquery runs once for each row selected by the outer query. It contains a reference
to a value from the row selected by the outer query.
Nested subquery runs only once for the entire nesting (outer) query. It does not contain any
reference to the outer query row.
For example,
Correlated Subquery:
Select e1.empname, e1.basicsal, e1.deptno from emp e1 where e1.basicsal = (select
max(basicsal) from emp e2 where e2.deptno = e1.deptno)
Nested Subquery:
Select empname, basicsal, deptno from emp where (deptno, basicsal) in (select deptno,
max(basicsal) from emp group by deptno)
Surrogate key:
1. Query processing is fast.
2. It is only numeric.
3. The developer generates the surrogate key using a Sequence Generator transformation.
4. Eg: 12453
Primary key:
1. Query processing is slow.
2. Can be alphanumeric.
3. The source system supplies the primary key.
4. Eg: C10999
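In the database the same idea can be sketched with a sequence (inside a mapping, the Sequence Generator transformation plays this role); the names below are illustrative:
CREATE SEQUENCE cust_sk_seq START WITH 1 INCREMENT BY 1;

INSERT INTO customer_dim (cust_sk, source_cust_id, cust_name)
VALUES (cust_sk_seq.NEXTVAL, 'C10999', 'John');
-- cust_sk is the numeric surrogate key; source_cust_id preserves the source system's primary key.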
Q. HOW DOES ONE ELIMINATE DUPLICATE ROWS IN AN ORACLE TABLE?
Method 1:
DELETE from table_name A
where rowid > (select min(rowid) from table_name B where A.key_values = B.key_values);
Method 2:
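One common form of the second method, sketched here as an assumption rather than the original text, keeps the row with the highest rowid per key using GROUP BY:
DELETE from table_name
where rowid not in (select max(rowid) from table_name group by key_values);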
Q. How does the server recognize the source and target databases?
If it is relational - by using an ODBC connection.
If it is a flat file - by using an FTP connection.
Q. WHAT ARE THE DIFFERENT TYPES OF INDEXES SUPPORTED BY ORACLE?
1. B-tree index
2. B-tree cluster index
3. Hash cluster index
4. Reverse key index
5. Bitmap index
6. Function-based index
ls -ltr (sort by modification time, reverse order)
ls -lS (sort by size of the file)
Q. How do identify the empty line in a flat file in UNIX? How to remove it?
grep -v "^$" filename
Q. How do send the session report (.txt) to manager after session is completed?
Use the email variables: %a <filename> attaches the named file, and %g attaches the session log file.
Q. How to check all the running processes in UNIX?
$ ps -ef
Q. How can I display only the hidden files in the current directory?
ls -a|grep "^\."
Q. How to display the first 10 lines of a file?
# head -10 logfile
Q. How to display the last 10 lines of a file?
# tail -10 logfile
Q. WHAT ARE THE DIFFERENCE BETWEEN DDL, DML AND DCL COMMANDS?
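A. DDL (Data Definition Language) commands define or change the structure of database objects, DML (Data Manipulation Language) commands work on the data held in those objects, and DCL (Data Control Language) commands manage permissions. For example:
CREATE TABLE emp (empno NUMBER, ename VARCHAR2(30));           -- DDL
ALTER TABLE emp ADD (sal NUMBER);                               -- DDL
INSERT INTO emp (empno, ename, sal) VALUES (1, 'John', 5000);   -- DML
UPDATE emp SET sal = 6000 WHERE empno = 1;                      -- DML
GRANT SELECT ON emp TO scott;                                   -- DCL
REVOKE SELECT ON emp FROM scott;                                -- DCL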
Q. What is a View?
A view is a virtual table; the rows accessed through a view are not permanently stored in the database. The
data accessed through a view is actually constructed using a standard T-SQL SELECT command and can
come from one to many different base tables or even other views.
Q. What is Index?
An index is a physical structure containing pointers to the data. Indices are created in an existing
table to locate rows more quickly and efficiently. It is possible to create an index on one or more
columns of a table, and each index is given a name. The users cannot see the indexes; they are
just used to speed up queries. Effective indexes are one of the best ways to improve performance
in a database application. A table scan happens when there is no index available to help a query.
In a table scan SQL Server examines every row in the table to satisfy the query results. Table
scans are sometimes unavoidable, but on large tables, scans have a terrific impact on
performance. Clustered indexes define the physical sorting of a database table's rows in the
storage media. For this reason, each database table may
have only one clustered index. Non-clustered indexes are created outside of the database table
and contain a sorted list of references to the table itself.
Q. What is the difference between clustered and a non-clustered index?
A clustered index is a special type of index that reorders the way records in the table are
physically stored. Therefore table can have only one clustered index. The leaf nodes of a
clustered index contain the data pages. A nonclustered index is a special type of index in which
the logical order of the index does not match the physical stored order of the rows on disk. The
leaf node of a nonclustered index does not consist of the data pages. Instead, the leaf nodes
contain index rows.
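In T-SQL the two kinds are created as follows (table and column names are examples only):
CREATE CLUSTERED INDEX ix_emp_empno ON emp (empno);     -- physically orders the table's rows
CREATE NONCLUSTERED INDEX ix_emp_ename ON emp (ename);  -- separate structure pointing back to the rows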
Q. What is Cursor?
Cursor is a database object used by applications to manipulate data in a set on a row-by row
basis, instead of the typical SQL commands that operate on all the rows in the set at one time.
In order to work with a cursor we need to perform some steps in the following order:
Declare cursor
Open cursor
Fetch row from the cursor
Process fetched row
Close cursor
Deallocate cursor
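A small T-SQL sketch walking through those steps (emp is an example table):
DECLARE @empno INT, @ename VARCHAR(30);
DECLARE emp_cur CURSOR FOR SELECT empno, ename FROM emp;  -- declare cursor
OPEN emp_cur;                                              -- open cursor
FETCH NEXT FROM emp_cur INTO @empno, @ename;               -- fetch first row
WHILE @@FETCH_STATUS = 0
BEGIN
    PRINT @ename;                                          -- process fetched row
    FETCH NEXT FROM emp_cur INTO @empno, @ename;           -- fetch next row
END
CLOSE emp_cur;                                             -- close cursor
DEALLOCATE emp_cur;                                        -- deallocate cursor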
Q. What is the difference between a HAVING CLAUSE and a WHERE CLAUSE?
1. HAVING specifies a search condition for a group or an aggregate. HAVING can be used only with
the SELECT statement.
2. HAVING is typically used with a GROUP BY clause. When GROUP BY is not used, HAVING
behaves like a WHERE clause.
3. The HAVING clause is used only together with the GROUP BY clause in a query, whereas the
WHERE clause is applied to each row before the rows become part of the GROUP BY processing.
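For example (the emp table is assumed), WHERE filters individual rows before grouping and HAVING filters the groups produced by GROUP BY:
SELECT deptno, SUM(sal) AS total_sal
FROM   emp
WHERE  job <> 'CLERK'        -- applied to each row before grouping
GROUP  BY deptno
HAVING SUM(sal) > 10000;     -- applied to each group after aggregation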
RANK CACHE
Sample Rank Mapping
When the Power Center Server runs a session with a Rank transformation, it compares an input
row with rows in the data cache. If the input row out-ranks a stored row, the Power Center
Server replaces the stored row with the input row.
Example: Power Center caches the first 5 rows if we are finding the top 5 salaried employees.
When the 6th row is read, it is compared with the 5 rows in the cache and placed in the cache if needed.
1) RANK INDEX CACHE:
The index cache holds group information from the group by ports. If we are Using Group By on
DEPTNO, then this cache stores values 10, 20, 30 etc.
All Group By Columns are in RANK INDEX CACHE. Ex. DEPTNO
2) RANK DATA CACHE:
It holds row data until the Power Center Server completes the ranking and is generally larger
than the index cache. To reduce the data cache size, connect only the necessary input/output ports
to subsequent transformations.
All variable ports (if any), the rank port, and all ports going out from the Rank transformation are
stored in the RANK DATA CACHE.
Example: all ports except DEPTNO in our mapping example.
Aggregator Caches
1. The Power Center Server stores data in the aggregate cache until it completes Aggregate
calculations.
2. It stores group values in an index cache and row data in the data cache. If the Power Center
Server requires more space, it stores overflow values in cache files.
Note: The Power Center Server uses memory to process an Aggregator transformation with
sorted ports. It does not use cache memory. We do not need to configure cache memory for
Aggregator transformations that use sorted ports.
1) Aggregator Index Cache:
The index cache holds group information from the group by ports. If we are using Group By on
DEPTNO, then this cache stores values 10, 20, 30 etc.
JOINER CACHES
Joiner always caches the MASTER table. We cannot disable caching. It builds Index cache and
Data Cache based on MASTER table.
1) Joiner Index Cache:
All Columns of MASTER table used in Join condition are in JOINER INDEX CACHE.
Example: DEPTNO in our mapping.
2) Joiner Data Cache:
Master table columns that are not part of the join condition but are used as output to other
transformations or the target table are stored in the Data Cache.
Example: DNAME and LOC in our mapping example.
Connected Lookup vs. Unconnected Lookup
Connected Lookup: The cache includes all lookup columns used in the mapping. If there is no
match for the lookup condition, the Power Center Server returns the default value for all output
ports.
Unconnected Lookup: The cache includes all lookup/output ports in the lookup condition and the
lookup/return port. If there is no match for the lookup condition, the Power Center Server returns
NULL for the return port.
Cache Comparison
Persistent and Dynamic Caches
Dynamic
1) When you use a dynamic cache, the Informatica Server updates the lookup cache as it passes
rows to the target.
2) With a dynamic cache, we can also update the cache with new data.
3) A dynamic cache is not reusable.
(Use a dynamic cache when we need the cache data to be updated during the session.)
Persistent
1) A Lookup transformation can use a non-persistent or persistent cache. The PowerCenter Server
saves or deletes lookup cache files after a successful session based on the Lookup Cache
Persistent property.
2) With a persistent cache, we are not able to update the cache with new data.
3) A persistent cache is reusable.
(Use a persistent cache when we need the previous cache data in later sessions.)
View And Materialized View
Star Schema And Snow Flake Schema
Informatica - Transformations
In Informatica, transformations help to transform the source data according to the requirements
of the target system and ensure the quality of the data being loaded into the target.
Transformations are of two types: Active and Passive.
Active Transformation
An active transformation can change the number of rows that pass through it from source to
target, i.e. it can eliminate rows that do not meet the condition in the transformation.
Passive Transformation
A passive transformation does not change the number of rows that pass through it, i.e. it passes
all rows through the transformation.
Transformations can be Connected or Unconnected.
Connected Transformation
Connected transformation is connected to other transformations or directly to target table in the
mapping.
Unconnected Transformation
An unconnected transformation is not connected to other transformations in the mapping. It is
called within another transformation, and returns a value to that transformation.
Following are the list of Transformations available in Informatica:
Aggregator Transformation
Expression Transformation
Filter Transformation
Joiner Transformation
Lookup Transformation
Normalizer Transformation
Rank Transformation
Router Transformation
Sequence Generator Transformation
Stored Procedure Transformation
Sorter Transformation
Update Strategy Transformation
XML Source Qualifier Transformation
In the following pages, we will explain all the above Informatica Transformations and their
significances in the ETL process in detail.
==============================================================================
Aggregator Transformation
Aggregator transformation is an Active and Connected transformation.
This transformation is useful to perform calculations such as averages and sums (mainly to
perform calculations on multiple rows or groups).
For example, to calculate total of daily sales or to calculate average of monthly or yearly sales.
Aggregate functions such as AVG, FIRST, COUNT, PERCENTILE, MAX, SUM etc. can be
used in aggregate transformation.
==============================================================================
Expression Transformation
Expression transformation is a Passive and Connected transformation.
This can be used to calculate values in a single row before writing to the target.
For example, to calculate discount of each product
or to concatenate first and last names
or to convert date to a string field.
==============================================================================
Filter Transformation
Filter transformation is an Active and Connected transformation.
This can be used to filter rows in a mapping that do not meet the condition.
For example,
To know all the employees who are working in Department 10, or
To find out the products that fall within the price range of $500 to $1000.
==============================================================================
Joiner Transformation
Joiner Transformation is an Active and Connected transformation. This can be used to join two
sources coming from two different locations or from the same location. For example, to join a flat
file and a relational source, or to join two flat files, or to join a relational source and an XML
source.
In order to join two sources, there must be at least one matching port. While joining two sources,
you must specify one source as master and the other as detail.
The Joiner transformation supports the following types of joins:
1) Normal
2) Master Outer
3) Detail Outer
4) Full Outer
Normal join discards all the rows of data from the master and detail source that do not match,
based on the condition.
Master outer join discards all the unmatched rows from the master source and keeps all the rows
from the detail source and the matching rows from the master source.
Detail outer join keeps all rows of data from the master source and the matching rows from the
detail source. It discards the unmatched rows from the detail source.
Full outer join keeps all rows of data from both the master and detail sources.
==============================================================================
Lookup transformation
Lookup transformation is Passive and it can be both Connected and Unconnected. It is
used to look up data in a relational table, view, or synonym. The lookup definition can be imported
either from the source or from the target tables.
For example, if we want to retrieve all the sales of a product with an ID 10 and assume that the
sales data resides in another table. Here instead of using the sales table as one more source, use
Lookup transformation to lookup the data for the product, with ID 10 in sales table.
A connected lookup receives input values directly from the mapping pipeline, whereas an
unconnected lookup receives values from a :LKP expression in another transformation.
A connected lookup can return multiple columns from the same row, whereas an
unconnected lookup has one return port and returns one column from each row.
Constraint-Based Loading
In the Workflow Manager, you can specify constraint-based loading for a session. When you
select this option, the Integration Service orders the target load on a row-by-row basis. For every
row generated by an active source, the Integration Service loads the corresponding transformed
row first to the primary key table, then to any foreign key tables. Constraint-based loading
depends on the following requirements:
Active source: Related target tables must have the same active source.
Key relationships: Target tables must have key relationships.
Target connection groups: Targets must be in one target connection group.
Treat rows as insert. Use this option when you insert into the target. You cannot use updates with
constraint based loading.
Active Source:
When target tables receive rows from different active sources, the Integration Service reverts to
normal loading for those tables, but loads all other targets in the session using constraint-based
loading when possible. For example, a mapping contains three distinct pipelines. The first two
contain a source, source qualifier, and target. Since these two targets receive data from different
active sources, the Integration Service reverts to normal loading for both targets. The third
pipeline contains a source, Normalizer, and two targets. Since these two targets share a single
active source (the Normalizer), the Integration Service performs constraint-based loading:
loading the primary key table first, then the foreign key table.
Key Relationships:
When target tables have no key relationships, the Integration Service does not perform
constraint-based loading.
Similarly, when target tables have circular key relationships, the Integration Service reverts to a
normal load. For example, you have one target containing a primary key and a foreign key
related to the primary key in a second target. The second target also contains a foreign key that
references the primary key in the first target. The Integration Service cannot enforce constraint-based loading for these tables. It reverts to a normal load.
Target Connection Groups:
The Integration Service enforces constraint-based loading for targets in the same target
connection group. If you want to specify constraint-based loading for multiple targets that
receive data from the same active source, you must verify the tables are in the same target
connection group. If the tables with the primary key-foreign key relationship are in different
target connection groups, the Integration Service cannot enforce constraint-based loading when
you run the workflow. To verify that all targets are in the same target connection group, complete
the following tasks:
Verify all targets are in the same target load order group and receive data from the same active
source.
Use the default partition properties and do not add partitions or partition points.
Define the same target type for all targets in the session properties.
Define the same database connection name for all targets in the session properties.
Choose normal mode for the target load type for all targets in the session properties.
Treat Rows as Insert:
Use constraint-based loading when the session option Treat Source Rows As is set to insert. You
might get inconsistent data if you select a different Treat Source Rows As option and you
configure the session for constraint-based loading.
When the mapping contains Update Strategy transformations and you need to load data to a
primary key table first, split the mapping using one of the following options:
Load primary key table in one mapping and dependent tables in another mapping. Use
constraint-based loading to load the primary table.
Perform inserts in one mapping and updates in another mapping.
Constraint-based loading does not affect the target load ordering of the mapping. Target load
ordering defines the order the Integration Service reads the sources in each target load order
group in the mapping. A target load order group is a collection of source qualifiers,
transformations, and targets linked together in a mapping. Constraint based loading establishes
the order in which the Integration Service loads individual targets within a set of targets
receiving data from a single source qualifier.
Example
The following mapping is configured to perform constraint-based loading:
In the first pipeline, target T_1 has a primary key, and T_2 and T_3 contain foreign keys referencing
the T_1 primary key. T_3 has a primary key that T_4 references as a foreign key.
Since these tables receive records from a single active source, SQ_A, the Integration Service
loads rows to the target in the following order:
1. T_1
2. T_2 and T_3 (in no particular order)
3. T_4
The Integration Service loads T_1 first because it has no foreign key dependencies and contains a
primary key referenced by T_2 and T_3. The Integration Service then loads T_2 and T_3, but
since T_2 and T_3 have no dependencies, they are not loaded in any particular order. The
Integration Service loads T_4 last, because it has a foreign key that references a primary key in
T_3. After loading the first set of targets, the Integration Service begins reading source B. If there
are no key relationships between T_5 and T_6, the Integration Service reverts to a normal load
for both targets.
If T_6 has a foreign key that references a primary key in T_5, since T_5 and T_6 receive data
from a single active source, the Aggregator AGGTRANS, the Integration Service loads rows to
the tables in the following order:
T_5
T_6
T_1, T_2, T_3, and T_4 are in one target connection group if you use the same database
connection for each target, and you use the default partition properties. T_5 and T_6 are in
another target connection group together if you use the same database connection for each target
and you use the default partition properties. The Integration Service includes T_5 and T_6 in a
different target connection group because they are in a different target load order group from the
first four targets.
Enabling Constraint-Based Loading:
When you enable constraint-based loading, the Integration Service orders the target load on a
row-by-row basis. To enable constraint-based loading:
1. In the General Options settings of the Properties tab, choose Insert for the Treat Source Rows As
property.
2. Click the Config Object tab. In the Advanced settings, select Constraint Based Load Ordering.
3. Click OK.
If the parameter file does not contain a value, the repository does not hold a saved value, and there is no
initial value for the parameter or variable, the Integration Service uses a default value based on
the data type of the parameter or variable:
Data type -> Default Value
Numeric -> 0
String -> Empty string
Datetime -> 1/1/1
Variable Values: Start value and current value of a mapping variable
Start Value:
The start value is the value of the variable at the start of the session. The Integration Service
looks for the start value in the following order:
Value in parameter file
Value saved in the repository
Initial value
Default value
Current Value:
The current value is the value of the variable as the session progresses. When a session starts, the
current value of a variable is the same as the start value. The final current value for a variable is
saved to the repository at the end of a successful session. When a session fails to complete, the
Integration Service does not update the value of the variable in the repository.
Note: If a variable function is not used to calculate the current value of a mapping variable, the
start value of the variable is saved to the repository.
Variable Data Type and Aggregation Type
When we declare a mapping variable in a mapping, we need to configure the data type and
aggregation type for the variable. The Integration Service uses the aggregation type of a
mapping variable to determine the final current value of the mapping variable.
Aggregation types are:
Count: Only integer and small integer data types are valid.
Max: All transformation data types except the binary data type are valid.
Min: All transformation data types except the binary data type are valid.
Variable Functions
Variable functions determine how the Integration Service calculates the current value of a
mapping variable in a pipeline.
SetMaxVariable: Sets the variable to the maximum value of a group of values. It ignores rows
marked for update, delete, or reject. Aggregation type set to Max.
SetMinVariable: Sets the variable to the minimum value of a group of values. It ignores rows
marked for update, delete, or reject. Aggregation type set to Min.
SetCountVariable: Increments the variable value by one. It adds one to the variable value when
a row is marked for insertion, and subtracts one when the row is Marked for deletion. It ignores
rows marked for update or reject. Aggregation type set to Count.
SetVariable: Sets the variable to the configured value. At the end of a session, it compares the
final current value of the variable to the start value of the variable. Based on the aggregate type
of the variable, it saves a final value to the repository.
In the Mapping Designer, click Mappings > Parameters and Variables. -or- In the Mapplet
Designer, click Mapplet > Parameters and Variables.
Select Type and Data type. Select Aggregation type for mapping variables.
PARAMETER FILE
A parameter file is a list of parameters and associated values for a workflow, worklet, or session.
Parameter files provide flexibility to change these variables each time we run a workflow or
session.
We can create multiple parameter files and change the file we use for a session or workflow. We
can create a parameter file using a text editor such as WordPad or Notepad.
Enter the parameter file name and directory in the workflow or session properties.
A parameter file contains the following types of parameters and variables:
Workflow variable: References values and records information in a workflow.
Worklet variable: References values and records information in a worklet. Use predefined
worklet variables in a parent workflow, but we cannot use workflow variables from the parent
workflow in a worklet.
Session parameter: Defines a value that can change from session to session, such as a database
connection or file name.
Mapping parameter and Mapping variable
USING A PARAMETER FILE
Parameter files contain several sections preceded by a heading. The heading identifies the
Integration Service, Integration Service process, workflow, worklet, or session to which we want
to assign parameters or variables.
Make session and workflow.
Give connection information for source and target table.
Run workflow and see result.
Sample Parameter File for Our example:
In the parameter file, folder and session names are case sensitive.
Create a text file in notepad with name Para_File.txt
[Practice.ST:s_m_MP_MV_Example]
$$Bonus=1000
$$var_max=500
$$var_min=1200
$$var_count=0
CONFIGURING PARAMETER FILE
We can specify the parameter file name and directory in the workflow or session properties.
To enter a parameter file in the workflow properties:
1. Open a Workflow in the Workflow Manager.
2. Click Workflows > Edit.
3. Click the Properties tab.
4. Enter the parameter directory and name in the Parameter Filename field.
5. Click OK.
To enter a parameter file in the session properties:
1. Open a session in the Workflow Manager.
2. Click the Properties tab and open the General Options settings.
3. Enter the parameter directory and name in the Parameter Filename field.
4. Example: D:\Files\Para_File.txt or $PMSourceFileDir\Para_File.txt
5. Click OK.
Mapplet
A mapplet is a reusable object that we create in the Mapplet Designer.
It contains a set of transformations and lets us reuse that transformation logic in multiple
mappings.
Created in Mapplet Designer in Designer Tool.
Suppose we need to use the same set of 5 transformations in, say, 10 mappings. Instead of building the 5
transformations in each of the 10 mappings, we create a mapplet of these 5 transformations and
use this mapplet in all 10 mappings. Example: to create a surrogate key in the target, we create a
mapplet using a stored procedure to create the primary key for the target table. We give the target table
name and key column name as input to the mapplet and get the surrogate key as output.
Mapplets help simplify mappings in the following ways:
Include source definitions: Use multiple source definitions and source qualifiers to provide
source data for a mapping.
Accept data from sources in a mapping
Include multiple transformations: As many transformations as we need.
Pass data to multiple transformations: We can create a mapplet to feed data to multiple
transformations. Each Output transformation in a mapplet represents one output group in a
mapplet.
Contain unused ports: We do not have to connect all mapplet input and output ports in a
mapping.
Mapplet Input:
Mapplet input can originate from a source definition and/or from an Input transformation in the
mapplet. We can create multiple pipelines in a mapplet.
We use Mapplet Input transformation to give input to mapplet.
Use of Mapplet Input transformation is optional.
Mapplet Output:
The output of a mapplet is not connected to any target table.
We must use Mapplet Output transformation to store mapplet output.
A mapplet must contain at least one Output transformation with at least one connected port in the
mapplet.
Example1: We will join EMP and DEPT table. Then calculate total salary. Give the output to
mapplet out transformation.
EMP and DEPT will be source tables.
Output will be given to transformation Mapplet_Out.
Steps:
Open folder where we want to create the mapping.
Click Tools -> Mapplet Designer.
Click Mapplets-> Create-> Give name. Ex: mplt_example1
Drag EMP and DEPT table.
Use Joiner transformation as described earlier to join them.
Transformation -> Create -> Select Expression for list -> Create -> Done
Pass all ports from joiner to expression and then calculate total salary as described in
expression transformation.
Now Transformation -> Create -> Select Mapplet Out from list > Create -> Give name and
then done.
Pass all ports from expression to Mapplet output.
Mapplet -> Validate
Repository -> Save
Use of mapplet in mapping:
We can use a mapplet in a mapping by just dragging the mapplet from the mapplet folder on the left pane,
the same way we drag source and target tables.
When we use the mapplet in a mapping, the mapplet object displays only the ports from the Input
and Output transformations. These are referred to as the mapplet input and mapplet output ports.
Make sure to give correct connection information in session.
Making a mapping: We will use mplt_example1, and then create a filter
transformation to filter records whose Total Salary is >= 1500.
mplt_example1 will be source.
Create a target table with the same structure as the Mapplet_Out transformation (as in the picture above).
Creating Mapping:
Open folder where we want to create the mapping.
Click Tools -> Mapping Designer.
Click Mapping-> Create-> Give name. Ex: m_mplt_example1
Drag mplt_Example1 and target table.
Transformation -> Create -> Select Filter for list -> Create -> Done.
Drag all ports from mplt_example1 to filter and give filter condition.
Connect all ports from filter to target. We can add more transformations after filter if needed.
Validate mapping and Save it.
Make session and workflow.
Give connection information for mapplet source tables.
Give connection information for target table.
Run workflow and see result.
Sample:
D:\EMP1.txt
E:\EMP2.txt
E:\FILES\DWH\EMP3.txt and so on
3. Now make a session and in Source file name and Source File Directory location fields, give
the name and location of above created file.
4. In Source file type field, select Indirect.
5. Click Apply.
6. Validate Session
7. Make Workflow. Save it to repository and run.
Incremental Aggregation
When we enable the session option Incremental Aggregation, the Integration Service
performs incremental aggregation: it passes source data through the mapping and uses historical
cache data to perform the aggregation calculations incrementally.
When using incremental aggregation, you apply captured changes in the source to aggregate
calculations in a session. If the source changes incrementally and you can capture changes, you
can configure the session to process those changes. This allows the Integration Service to update
the target incrementally, rather than forcing it to process the entire source and recalculate the
same data each time you run the session.
For example, you might have a session using a source that receives new data every day. You can
capture those incremental changes because you have added a filter condition to the mapping that
removes pre-existing data from the flow of data. You then enable incremental aggregation.
When the session runs with incremental aggregation enabled for the first time on March 1, you
use the entire source. This allows the Integration Service to read and store the necessary
aggregate data. On March 2, when you run the session again, you filter out all the records except
those time-stamped March 2. The Integration Service then processes the new data and updates
the target accordingly. Consider using incremental aggregation in the following circumstances:
You can capture new source data. Use incremental aggregation when you can capture new source
data each time you run the session. Use a Stored Procedure or Filter transformation to process
new data.
Incremental changes do not significantly change the target. Use incremental aggregation when
the changes do not significantly change the target. If processing the incrementally changed
source alters more than half the existing target, the session may not benefit from using
incremental aggregation. In this case, drop the table and recreate the target with complete source
data.
Note: Do not use incremental aggregation if the mapping contains percentile or median
functions. The Integration Service uses system memory to process these functions in addition to
the cache memory you configure in the session properties. As a result, the Integration Service
does not store incremental aggregation values for percentile and median functions in disk caches.
Integration Service Processing for Incremental Aggregation
(i) The first time you run an incremental aggregation session, the Integration Service processes
the entire source. At the end of the session, the Integration Service stores aggregate data from
that session run in two files, the index file and the data file. The Integration Service creates the
files in the cache directory specified in the Aggregator transformation properties.
(ii) Each subsequent time you run the session with incremental aggregation, you use the
incremental source changes in the session. For each input record, the Integration Service checks
historical information in the index file for a corresponding group. If it finds a corresponding
group, the Integration Service performs the aggregate operation incrementally, using the
aggregate data for that group, and saves the incremental change. If it does not find a
corresponding group, the Integration Service creates a new group and saves the record data.
(iii) When writing to the target, the Integration Service applies the changes to the existing target.
It saves modified aggregate data in the index and data files to be used as historical data the next
time you run the session.
(iv) If the source changes significantly and you want the Integration Service to continue saving
aggregate data for future incremental changes, configure the Integration Service to overwrite
existing aggregate data with new aggregate data.
Each subsequent time you run a session with incremental aggregation, the Integration Service
creates a backup of the incremental aggregation files. The cache directory for the Aggregator
transformation must contain enough disk space for two sets of the files.
(v) When you partition a session that uses incremental aggregation, the Integration Service
creates one set of cache files for each partition.
The Integration Service creates new aggregate data, instead of using historical data, when you
perform one of the following tasks:
Save a new version of the mapping.
Configure the session to reinitialize the aggregate cache.
Move the aggregate files without correcting the configured path or directory for the files in the
session properties.
Change the configured path or directory for the aggregate files without moving the files to the
new location.
Delete cache files.
Decrease the number of partitions.
When the Integration Service rebuilds incremental aggregation files, the data in the previous
files is lost.
Note: To protect the incremental aggregation files from file corruption or disk failure,
periodically back up the files.
Preparing for Incremental Aggregation:
When you use incremental aggregation, you need to configure both mapping and session
properties:
Implement mapping logic or filter to remove pre-existing data.
Configure the session for incremental aggregation and verify that the file directory has enough
disk space for the aggregate files.
Configuring the Mapping
Before enabling incremental aggregation, you must capture changes in source data. You can use
a Filter or Stored Procedure transformation in the mapping to remove pre-existing source data
during a session.
Configuring the Session
Use the following guidelines when you configure the session for incremental aggregation:
(i) Verify the location where you want to store the aggregate files.
The index and data files grow in proportion to the source data. Be sure the cache directory has
enough disk space to store historical data for the session.
When you run multiple sessions with incremental aggregation, decide where you want the files
stored. Then, enter the appropriate directory for the process variable, $PMCacheDir, in the
Workflow Manager. You can enter session-specific directories for the index and data files.
However, by using the process variable for all sessions using incremental aggregation, you can
easily change the cache directory when necessary by changing $PMCacheDir.
Changing the cache directory without moving the files causes the Integration Service to
reinitialize the aggregate cache and gather new aggregate data.
In a grid, Integration Services rebuild incremental aggregation files they cannot find. When an
Integration Service rebuilds incremental aggregation files, it loses aggregate history.
(ii) Verify the incremental aggregation settings in the session properties.
You can configure the session for incremental aggregation in the Performance settings on the
Properties tab.
You can also configure the session to reinitialize the aggregate cache. If you choose to reinitialize
the cache, the Workflow Manager displays a warning indicating the Integration Service
overwrites the existing cache and a reminder to clear this option after running the session.
The following statement adds a new set of cities ('KOCHI', 'MANGALORE') to an existing
partition list.
ALTER TABLE customers
MODIFY PARTITION south_india
ADD VALUES ('KOCHI', 'MANGALORE');
The statement below drops a set of cities ('KOCHI' and 'MANGALORE') from an existing
partition value list.
ALTER TABLE customers
MODIFY PARTITION south_india
DROP VALUES ('KOCHI', 'MANGALORE');
SPLITTING PARTITIONS
You can split a single partition into two partitions. For example to split the partition p5 of sales
table into two partitions give the following command.
Alter table sales split partition p5 into
(Partition p6 values less than (1996),
Partition p7 values less than (MAXVALUE));
TRUNCATING PARTITION
Truncating a partition will delete all rows from the partition.
To truncate a partition give the following statement
Alter table sales truncate partition p5;
LISTING INFORMATION ABOUT PARTITION TABLES
To see how many partitioned tables are there in your schema give the following statement
Select * from user_part_tables;
To see on partition level partitioning information
Select * from user_tab_partitions;
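For reference, a list-partitioned table like the customers table used above could have been created as follows (the column list and partition values are assumptions chosen only for illustration):
CREATE TABLE customers (
    cust_id   NUMBER,
    cust_name VARCHAR2(50),
    city      VARCHAR2(30)
)
PARTITION BY LIST (city)
(
    PARTITION south_india VALUES ('CHENNAI', 'BANGALORE', 'HYDERABAD'),
    PARTITION north_india VALUES ('DELHI', 'CHANDIGARH')
);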
TASKS
The Workflow Manager contains many types of tasks to help you build workflows and worklets.
We can create reusable tasks in the Task Developer.
Types of tasks:
Task Type      Tool where task can be created                          Reusable or not
Session        Task Developer, Workflow Designer, Worklet Designer     Yes
Email          Task Developer, Workflow Designer, Worklet Designer     Yes
Command        Task Developer, Workflow Designer, Worklet Designer     Yes
Event-Raise    Workflow Designer, Worklet Designer                     No
Event-Wait     Workflow Designer, Worklet Designer                     No
Timer          Workflow Designer, Worklet Designer                     No
Decision       Workflow Designer, Worklet Designer                     No
Assignment     Workflow Designer, Worklet Designer                     No
Control        Workflow Designer, Worklet Designer                     No
SESSION TASK
A session is a set of instructions that tells the Power Center Server how and when to move data
from sources to targets.
To run a session, we must first create a workflow to contain the Session task.
We can run as many sessions in a workflow as we need. We can run the Session tasks
sequentially or concurrently, depending on our needs.
The Power Center Server creates several files and in-memory caches depending on the
transformations and options used in the session.
EMAIL TASK
The Workflow Manager provides an Email task that allows us to send email during a workflow.
Created by Administrator usually and we just drag and use it in our mapping.
Steps:
1. In the Task Developer or Workflow Designer, choose Tasks-Create.
2. Select an Email task and enter a name for the task. Click Create.
3. Click Done.
4. Double-click the Email task in the workspace. The Edit Tasks dialog box appears.
5. Click the Properties tab.
6. Enter the fully qualified email address of the mail recipient in the Email User Name field.
7. Enter the subject of the email in the Email Subject field. Or, you can leave this field blank.
8. Click the Open button in the Email Text field to open the Email Editor.
9. Click OK twice to save your changes.
Example: To send an email when a session completes:
Steps:
1. Create a workflow wf_sample_email
2. Drag any session task to workspace.
6. $S_M_FILTER_EXAMPLE.Status=SUCCEEDED
7. Workflow-> Validate
8. Repository > Save
WORKING WITH EVENT TASKS
We can define events in the workflow to specify the sequence of task execution.
Types of Events:
Pre-defined event: A pre-defined event is a file-watch event. This event Waits for a specified
file to arrive at a given location.
User-defined event: A user-defined event is a sequence of tasks in the Workflow. We create
events and then raise them as per need.
Steps for creating User Defined Event:
1. Open any workflow where we want to create an event.
2. Click Workflow-> Edit -> Events tab.
3. Click the Add button to add events and give the names as per need.
4. Click Apply -> Ok. Validate the workflow and Save it.
Types of Events Tasks:
EVENT RAISE: Event-Raise task represents a user-defined event. We use this task to raise a
user defined event.
EVENT WAIT: Event-Wait task waits for a file watcher event or user defined event to occur
before executing the next session in the workflow.
Example1: Use an event wait task and make sure that session s_filter_example runs when abc.txt
file is present in D:\FILES folder.
Steps for creating workflow:
1. Workflow -> Create -> Give name wf_event_wait_file_watch -> Click ok.
2. Task -> Create -> Select Event Wait. Give name. Click create and done.
3. Link Start to Event Wait task.
4. Drag s_filter_example to workspace and link it to event wait task.
5. Right click on event wait task and click EDIT -> EVENTS tab.
6. Select the Pre Defined option there. In the blank space, give the directory and filename to watch.
Example: D:\FILES\abc.txt
7. Workflow validate and Repository Save.
Example 2: Raise a user defined event when session s_m_filter_example succeeds. Capture this
event in event wait task and run session S_M_TOTAL_SAL_EXAMPLE
Steps for creating workflow:
1. Workflow -> Create -> Give name wf_event_wait_event_raise -> Click ok.
2. Workflow -> Edit -> Events Tab and add events EVENT1 there.
3. Drag s_m_filter_example and link it to START task.
4. Click Tasks -> Create -> Select EVENT RAISE from list. Give name ER_Example. Click Create and then done.
5. Link ER_Example to s_m_filter_example.
6. Right click ER_Example -> EDIT -> Properties Tab -> Open Value for User Defined Event and
Select EVENT1 from the list displayed. Apply -> OK.
7. Click link between ER_Example and s_m_filter_example and give the condition
$S_M_FILTER_EXAMPLE.Status=SUCCEEDED
8. Click Tasks -> Create -> Select EVENT WAIT from list. Give name EW_WAIT. Click Create
and then done.
9. Link EW_WAIT to START task.
10. Right click EW_WAIT -> EDIT -> EVENTS tab.
11. Select User Defined there. Select the Event1 by clicking the Browse Events button.
12. Apply -> OK.
13. Drag S_M_TOTAL_SAL_EXAMPLE and link it to EW_WAIT.
14. Mapping -> Validate
15. Repository -> Save.
Run workflow and see.
TIMER TASK
The Timer task allows us to specify the period of time to wait before the Power Center Server
runs the next task in the workflow. The Timer task has two types of settings:
Absolute time: We specify the exact date and time or we can choose a user-defined workflow
variable to specify the exact time. The next task in workflow will run as per the date and time
specified.
Relative time: We instruct the Power Center Server to wait for a specified period of time after
the Timer task, the parent workflow, or the top-level workflow starts.
Example: Run session s_m_filter_example 1 minute after the Timer task starts.
Steps for creating workflow:
1. Workflow -> Create -> Give name wf_timer_task_example -> Click ok.
2. Click Tasks -> Create -> Select TIMER from list. Give name TIMER_Example. Click Create and
then done.
3. Link TIMER_Example to START task.
4. Right click TIMER_Example-> EDIT -> TIMER tab.
5. Select Relative Time Option and Give 1 min and Select From start time of this task Option.
6. Apply -> OK.
7. Drag s_m_filter_example and link it to TIMER_Example.
8. Workflow-> Validate and Repository -> Save.
DECISION TASK
The Decision task allows us to enter a condition that determines the execution of the workflow,
similar to a link condition.
The Decision task has a pre-defined variable called $Decision_task_name.condition that
represents the result of the decision condition.
The Power Center Server evaluates the condition in the Decision task and sets the pre-defined
condition variable to True (1) or False (0).
We can specify one decision condition per Decision task.
Example: Command Task should run only if either s_m_filter_example or
S_M_TOTAL_SAL_EXAMPLE succeeds. If any of s_m_filter_example or
1. Workflow -> Create -> Give name wf_control_task_example -> Click ok.
2. Drag any 3 sessions to workspace and link all of them to START task.
3. Click Tasks -> Create -> Select CONTROL from list. Give name cntr_task.
4. Click Create and then done.
5. Link all sessions to the control task cntr_task.
6. Double click link between cntr_task and any session say s_m_filter_example and give the
condition: $S_M_FILTER_EXAMPLE.Status = SUCCEEDED.
7. Repeat above step for remaining 2 sessions also.
8. Right click cntr_task-> EDIT -> GENERAL tab. Set Treat Input Links As to OR. Default is
AND.
9. Go to PROPERTIES tab of cntr_task and select the value Fail Top-Level Workflow for the Control Option. Click Apply and OK.
10. Workflow Validate and Repository Save.
11. Run workflow and see the result.
ASSIGNMENT TASK
The Assignment task allows us to assign a value to a user-defined workflow variable.
See Workflow variable topic to add user defined variables.
To use an Assignment task in the workflow, first create and add the
Assignment task to the workflow. Then configure the Assignment task to assign values or
expressions to user-defined variables.
Scheduler
We can schedule a workflow to run continuously, repeat at a given time or interval, or we can
manually start a workflow. The Integration Service runs a scheduled workflow as configured.
By default, the workflow runs on demand. We can change the schedule settings by editing the
scheduler. If we change schedule settings, the Integration Service reschedules the workflow
according to the new settings.
The Workflow Manager marks a workflow invalid if we delete the scheduler associated
with the workflow.
If we choose a different Integration Service for the workflow or restart the Integration
Service, it reschedules all workflows.
If we delete a folder, the Integration Service removes workflows from the schedule.
For each folder, the Workflow Manager lets us create reusable schedulers so we can reuse
the same set of scheduling settings for workflows in the folder.
Use a reusable scheduler so we do not need to configure the same set of scheduling
settings in each workflow.
When we delete a reusable scheduler, all workflows that use the deleted scheduler
become invalid. To make the workflows valid, we must edit them and replace the missing
scheduler.
Steps:
1. Open the folder where we want to create the scheduler.
Schedule types:
1. Run on Demand
2. Run Continuously
3. Run on Server Initialization
1. Run on Demand:
Integration Service runs the workflow when we start the workflow manually.
2. Run Continuously:
Integration Service runs the workflow as soon as the service initializes. The Integration Service
then starts the next run of the workflow as soon as it finishes the previous run.
3. Run on Server initialization
Integration Service runs the workflow as soon as the service is initialized. The Integration
Service then starts the next run of the workflow according to settings in Schedule Options.
Schedule options for Run on Server initialization:
Customized Repeat: Integration Service runs the workflow on the dates and times
specified in the Repeat dialog box.
Start options for Run on Server initialization:
Start Date
Start Time
End options for Run on Server initialization:
End After: IS stops scheduling the workflow after the set number of
Workflow runs.
Forever: IS schedules the workflow as long as the workflow does not fail.
Creating a Non-Reusable Scheduler
In the Scheduler tab, choose Non-reusable. Select Reusable if we want to select an
existing reusable scheduler for the workflow.
Click the right side of the Scheduler field to edit scheduling settings for the non-reusable
scheduler.
Click Ok.
Points to Ponder:
To remove a workflow from its schedule, right-click the workflow in the Navigator
window and choose Unscheduled Workflow.
When you configure a session for full optimization, the Integration Service analyzes the mapping
from the source to the target or until it reaches a downstream transformation it cannot push to the
target database. If the Integration Service cannot push all transformation logic to the target
database, it tries to push all transformation logic to the source database. If it cannot push all
transformation logic to the source or target, the Integration Service pushes as much
transformation logic as possible to the source database, processes the intermediate transformations that it cannot
push to any database, and then pushes the remaining transformation logic to the target database.
The Integration Service generates and executes an INSERT SELECT, DELETE, or UPDATE
statement for each database to which it pushes transformation logic.
For example, a mapping contains the following transformations:
The Rank transformation cannot be pushed to the source or target database. If you configure the
session for full pushdown optimization, the Integration Service pushes the Source Qualifier
transformation and the Aggregator transformation to the source, processes the Rank
transformation, and pushes the Expression transformation and target to the target database. The
Integration Service does not fail the session if it can push only part of the transformation logic to
the database.
datatypes. Transformation datatypes use a default numeric precision that can vary from the
native datatypes. For example, a transformation Decimal datatype has a precision of 1-28. The
corresponding Teradata Decimal datatype has a precision of 1-18. The results can vary if the
database uses a different precision than the Integration Service.
Using ODBC Drivers
When you use native drivers for all databases, except Netezza, the Integration Service generates
SQL statements using native database SQL. When you use ODBC drivers, the Integration
Service usually cannot detect the database type. As a result, it generates SQL statements using
ANSI SQL. The Integration Service can generate SQL for more functions when it uses the native
database language than when it uses ANSI SQL.
Note: Although the Integration Service uses an ODBC driver for the Netezza database, the
Integration Service detects that the database is Netezza and generates native database SQL when
pushing the transformation logic to the Netezza database.
In some cases, ANSI SQL is not compatible with the database syntax. The following sections
describe problems that you can encounter when you use ODBC drivers. When possible, use
native drivers to prevent these problems.
millisecond. If the lookup result is 8:20:35.123456, a database returns 8:20:35.123456, but the
Integration Service returns 8:20:35.123.
SYSDATE built-in variable. When you use the SYSDATE built-in variable, the Integration
Service returns the current date and time for the node running the service process. However,
when you push the transformation logic to the database, the SYSDATE variable returns the
current date and time for the machine hosting the database. If the time zone of the machine
hosting the database is not the same as the time zone of the machine running the Integration
Service process, the results