
Mapping parameters and variables represent values in mappings and mapplets.

When we use a mapping parameter or variable in a mapping, we first declare it in the mapping or
mapplet. Then we define a value for the mapping parameter or variable before we run the session.
MAPPING PARAMETERS

A mapping parameter represents a constant value that we can define before running a
session.

A mapping parameter retains the same value throughout the entire session.

Example: When we want to extract records of a particular month during the ETL process, we
create a mapping parameter of date/time data type and use it in the SQL override query to
compare it with the timestamp field.
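As an illustration only (the parameter names $$Month_Start and $$Month_End, the table, and the column are assumed here, not taken from the original), such a SQL override might compare the timestamp field against the parameters, which the Integration Service expands as plain text before running the query:

SELECT * FROM TRANSACTIONS
WHERE TRANSACTION_DATE >= TO_DATE('$$Month_Start', 'MM/DD/YYYY')
AND TRANSACTION_DATE < TO_DATE('$$Month_End', 'MM/DD/YYYY')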

After we create a parameter, it appears in the Expression Editor.

We can then use the parameter in any expression in the mapplet or mapping.

We can also use parameters in a source qualifier filter, user-defined join, or extract
override, and in the Expression Editor of reusable transformations.

MAPPING VARIABLES

Unlike mapping parameters, mapping variables are values that can change between
sessions.

The Integration Service saves the latest value of a mapping variable to the repository
at the end of each successful session.

We can override a saved value with the parameter file.

We can also clear all saved values for the session in the Workflow Manager.

We might use a mapping variable to perform an incremental read of the source. For example,
we have a source table containing time-stamped transactions and we want to evaluate the
transactions on a daily basis. Instead of manually entering a session override to filter source
data each time we run the session, we can create a mapping variable, $$IncludeDateTime. In
the source qualifier, create a filter to read only rows whose transaction date equals
$$IncludeDateTime, such as:

TIMESTAMP = $$IncludeDateTime
In the mapping, use a variable function to set the variable value to increment one day each
time the session runs. If we set the initial value of $$IncludeDateTime to 8/1/2004, the first
time the Integration Service runs the session, it reads only rows dated 8/1/2004. During the
session, the Integration Service sets $$IncludeDateTime to 8/2/2004. It saves 8/2/2004 to the
repository at the end of the session. The next time it runs the session, it reads only rows from
August 2, 2004.
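A minimal sketch of the increment itself, written as an Expression transformation port (the port name v_SET_DATE is assumed for illustration; SETVARIABLE and ADD_TO_DATE are the standard transformation-language functions used here):

v_SET_DATE = SETVARIABLE($$IncludeDateTime, ADD_TO_DATE($$IncludeDateTime, 'DD', 1))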
Mapping variables can be used in the following transformations:

Expression

Filter

Router

Update Strategy

Initial and Default Value:


When we declare a mapping parameter or variable in a mapping or a mapplet, we can enter
an initial value. When the Integration Service needs an initial value, and we did not declare an
initial value for the parameter or variable, the Integration Service uses a default value based
on the data type of the parameter or variable.
Data Type -> Default Value
Numeric -> 0
String -> Empty string
Date/Time -> 1/1/1
Variable Values: Start value and current value of a mapping variable
Start Value:
The start value is the value of the variable at the start of the session. The Integration Service
looks for the start value in the following order:
1. Value in parameter file
2. Value saved in the repository
3. Initial value
4. Default value

Current Value:
The current value is the value of the variable as the session progresses. When a session starts,
the current value of a variable is the same as the start value. The final current value for a
variable is saved to the repository at the end of a successful session. When a session fails to
complete, the Integration Service does not update the value of the variable in the repository.
Note: If a variable function is not used to calculate the current value of a mapping variable,
the start value of the variable is saved to the repository.
Variable Data Type and Aggregation Type

When we declare a mapping variable in a mapping, we need to configure the data type and
aggregation type for the variable. The Integration Service uses the aggregation type of a
mapping variable to determine the final current value of the mapping variable.
Aggregation types are:

Count: Only integer and small integer data types are valid.

Max: All transformation data types except binary are valid.

Min: All transformation data types except binary are valid.

Variable Functions
Variable functions determine how the Integration Service calculates the current value of a
mapping variable in a pipeline.
SetMaxVariable: Sets the variable to the maximum value of a group of values. It ignores
rows marked for update, delete, or reject. Aggregation type set to Max.
SetMinVariable: Sets the variable to the minimum value of a group of values. It ignores rows
marked for update, delete, or reject. Aggregation type set to Min.
SetCountVariable: Increments the variable value by one. It adds one to the variable value
when a row is marked for insertion, and subtracts one when a row is marked for deletion. It
ignores rows marked for update or reject. Aggregation type set to Count.

SetVariable: Sets the variable to the configured value. At the end of a session, it compares
the final current value of the variable to the start value of the variable. Based on the
aggregate type of the variable, it saves a final value to the repository.
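For example, a common incremental-load pattern (the variable name $$LastLoadDate and the column LAST_UPDATED are assumed here, not from the original) tracks the latest timestamp read in each run by calling SetMaxVariable in an Expression port:

out_LAST_UPDT = SETMAXVARIABLE($$LastLoadDate, LAST_UPDATED)

On the next run, $$LastLoadDate can then be compared against the source timestamp column in the source qualifier filter.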
Creating Mapping Parameters and Variables
1. Open the folder where we want to create the parameter or variable.
2. In the Mapping Designer, click Mappings > Parameters and Variables. -or- In the Mapplet Designer, click Mapplet > Parameters and Variables.
3. Click the Add button.
4. Enter a name. Do not remove the $$ from the name.
5. Select Type and Data type. Select Aggregation type for mapping variables.
6. Give an initial value. Click OK.

Example: Use of Mapping Parameters and Variables

EMP will be the source table.

Create a target table MP_MV_EXAMPLE having columns: EMPNO, ENAME, DEPTNO,
TOTAL_SAL, MAX_VAR, MIN_VAR, COUNT_VAR and SET_VAR.

TOTAL_SAL = SAL + COMM + $$BONUS ($$BONUS is a mapping parameter that changes every month)

SET_VAR: We will add one month to the HIREDATE of every employee.

Create shortcuts as necessary.

Creating Mapping
1. Open the folder where we want to create the mapping.
2. Click Tools -> Mapping Designer.
3. Click Mapping -> Create -> Give name. Ex: m_mp_mv_example
4. Drag EMP and the target table.
5. Transformation -> Create -> Select Expression from list -> Create -> Done.
6. Drag EMPNO, ENAME, HIREDATE, SAL, COMM and DEPTNO to the Expression.
7. Create parameter $$Bonus and give initial value as 200.
8. Create variable $$var_max of MAX aggregation type and initial value 1500.
9. Create variable $$var_min of MIN aggregation type and initial value 1500.
10. Create variable $$var_count of COUNT aggregation type and initial value 0. COUNT is visible only when the data type is INT or SMALLINT.
11. Create variable $$var_set of MAX aggregation type.
12. Create 5 output ports: out_TOTAL_SAL, out_MAX_VAR, out_MIN_VAR, out_COUNT_VAR and out_SET_VAR.
13. Open the expression editor for out_TOTAL_SAL. Do the same as we did earlier for SAL + COMM. To add $$BONUS to it, select the Variable tab and select the parameter from Mapping Parameters: SAL + COMM + $$Bonus
14. Open the expression editor for out_MAX_VAR.
15. Select the variable function SETMAXVARIABLE from the left pane. Select $$var_max from the Variable tab and SAL from the Ports tab: SETMAXVARIABLE($$var_max,SAL)
16. Open the expression editor for out_MIN_VAR and write the following expression: SETMINVARIABLE($$var_min,SAL). Validate the expression.
17. Open the expression editor for out_COUNT_VAR and write the following expression: SETCOUNTVARIABLE($$var_count). Validate the expression.
18. Open the expression editor for out_SET_VAR and write the following expression: SETVARIABLE($$var_set,ADD_TO_DATE(HIREDATE,'MM',1)). Validate.
19. Click OK.
20. Link all ports from the Expression transformation to the target. Validate the mapping and save it.

PARAMETER FILE

A parameter file is a list of parameters and associated values for a workflow, worklet,
or session.

Parameter files provide flexibility to change these variables each time we run a
workflow or session.

We can create multiple parameter files and change the file we use for a session or
workflow. We can create a parameter file using a text editor such as WordPad or
Notepad.

Enter the parameter file name and directory in the workflow or session properties.

A parameter file contains the following types of parameters and variables:

Workflow variable: References values and records information in a workflow.

Worklet variable: References values and records information in a worklet. Use
predefined worklet variables in a parent workflow, but we cannot use workflow
variables from the parent workflow in a worklet.

Session parameter: Defines a value that can change from session to session, such
as a database connection or file name.

Mapping parameter and Mapping variable

USING A PARAMETER FILE


Parameter files contain several sections preceded by a heading. The heading identifies the
Integration Service, Integration Service process, workflow, worklet, or session to which we
want to assign parameters or variables.
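As a rough sketch of these heading forms (the folder, workflow, worklet, and session names below are placeholders), the sections of a parameter file look like:

[Global]
[Service:IntegrationService_Name]
[FolderName.WF:wf_Workflow_Name]
[FolderName.WF:wf_Workflow_Name.WT:wklt_Worklet_Name]
[FolderName.WF:wf_Workflow_Name.ST:s_Session_Name]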

Make session and workflow.

Give connection information for source and target table.

Run workflow and see result.

Sample parameter file for our example:

In the parameter file, folder and session names are case sensitive.
Create a text file in Notepad with the name Para_File.txt:
[Practice.ST:s_m_MP_MV_Example]
$$Bonus=1000
$$var_max=500
$$var_min=1200
$$var_count=0
CONFIGURING PARAMETER FILE

We can specify the parameter file name and directory in the workflow or session properties.
To enter a parameter file in the workflow properties:
1. Open a Workflow in the Workflow Manager.
2. Click Workflows > Edit.
3. Click the Properties tab.
4. Enter the parameter directory and name in the Parameter Filename field.
5. Click OK.
To enter a parameter file in the session properties:
1. Open a session in the Workflow Manager.
2. Click the Properties tab and open the General Options settings.
3. Enter the parameter directory and name in the Parameter Filename field.
4. Example: D:\Files\Para_File.txt or $PMSourceFileDir\Para_File.txt
5. Click OK.


Unit testing can be broadly classified into 2 categories.


Quantitative Testing
Validate your Source and Target
a) Ensure that your connectors are configured properly.
b) If you are using flat file make sure have enough read/write permission on the file share.
c) You need to document all the connector information.
Analyze the Load Time
a) Execute the session and review the session statistics.
b) Check the Read and Write counters and how long it takes to perform the load.

c) Use the session and workflow logs to capture the load statistics.
d) You need to document all the load timing information.
Analyze the success rows and rejections.
a) Have customized SQL queries to check the sources and targets; here we perform record count
verification (see the sketch after this list).
b) Analyze the rejections and build a process to handle those rejections. This requires a clear
requirement from the business on how to handle the data rejections: do we need to reload, or
reject and inform, etc.? Discussions are required and an appropriate process must be
developed.
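A minimal sketch of such a record-count verification (the table names SRC_CUSTOMER and TGT_CUSTOMER are assumed for illustration):

SELECT COUNT(*) FROM SRC_CUSTOMER;  -- run against the source database
SELECT COUNT(*) FROM TGT_CUSTOMER;  -- run against the target database

The two counts, plus the rejected-row count from the session log, should reconcile.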
Performance Improvement
a) Network Performance
b) Session Performance
c) Database Performance
d) Analyze and if required define the Informatica and DB partitioning requirements.
Qualitative Testing
Analyze and validate your transformation business rules. This is more of a functional test.
a) You need to review field by field from source to target and ensure that the required
transformation logic is applied.
b) If you are making changes to existing mappings, make use of the data lineage feature
available with Informatica Power Center. This will help you find the consequences of altering
or deleting a port from an existing mapping.
c) Ensure that appropriate dimension lookups have been used and your development is in
sync with your business requirements.
INFORMATICA TESTING:
Debugger: A very useful tool for debugging a valid mapping to gain troubleshooting information
about data and error conditions. Refer to the Informatica documentation to learn more about the
Debugger tool.
Test Load Options - Relational Targets.
Running the Integration Service in Safe Mode

Test a development environment. Run the Integration Service in safe mode to test
a development environment before migrating to production.

Troubleshoot the Integration Service. Configure the Integration Service to fail
over in safe mode and troubleshoot errors when you migrate or test a production
environment configured for high availability. After the Integration Service fails over in
safe mode, you can correct the error that caused the failover.

Syntax Testing: Test your customized queries using your source qualifier before executing
the session.
Performance Testing: identify the following bottlenecks:

Target
Source
Mapping
Session
System

Use the following methods to identify performance bottlenecks:

Run test sessions. You can configure a test session to read from a flat file source or
to write to a flat file target to identify source and target bottlenecks.

Analyze performance details. Analyze performance details, such as performance
counters, to determine where session performance decreases.

Analyze thread statistics. Analyze thread statistics to determine the optimal number
of partition points.

Monitor system performance. You can use system monitoring tools to view the
percentage of CPU use, I/O waits, and paging to identify system bottlenecks. You can
also use the Workflow Monitor to view system resource usage. Use Power Center
conditional filter in the Source Qualifier to improve performance.

Share metadata. You can share metadata with a third party. For example, you want
to send a mapping to someone else for testing or analysis, but you do not want to
disclose repository connection information for security reasons. You can export the
mapping to an XML file and edit the repository connection information before sending
the XML file. The third party can import the mapping from the XML file and analyze the
metadata.

UAT:
In this phase you will involve the user to test the end results and ensure that the business is
satisfied with the quality of the data.
Any changes to the business requirement will follow the change management process and
eventually those changes have to follow the SDLC process.
Optimize Development, Testing, and Training Systems

Dramatically accelerate development and test cycles and reduce storage costs by
creating fully functional, smaller targeted data subsets for development, testing, and
training systems, while maintaining full data integrity.

Quickly build and update nonproduction systems with a small subset of production
data and replicate current subsets of nonproduction copies faster.

Simplify test data management and shrink the footprint of nonproduction systems to
significantly reduce IT infrastructure and maintenance costs.

Reduce application and upgrade deployment risks by properly testing configuration
updates with up-to-date, realistic data before introducing them into production.

Easily customize provisioning rules to meet each organization's changing business
requirements.

Lower training costs by standardizing on one approach and one infrastructure.

Train employees effectively using reliable, production-like data in training systems.

Support Corporate Divestitures and Reorganizations

Untangle complex operational systems and separate data along business lines to
quickly build the divested organization's system.

Accelerate the provisioning of new systems by using only data that is relevant to the
divested organization.

Decrease the cost and time of data divestiture with no reimplementation costs.

Reduce the Total Cost of Storage Ownership

Dramatically increase an IT team's productivity by reusing a comprehensive list of data
objects for data selection and updating processes across multiple projects, instead of
coding by hand, which is expensive, resource intensive, and time consuming.

Accelerate application delivery by decreasing R&D cycle time and streamlining test
data management.

Improve the reliability of application delivery by ensuring IT teams have ready access
to updated, quality production data.

Lower administration costs by centrally managing data growth solutions across all
packaged and custom applications.

Substantially accelerate time to value for subsets of packaged applications.

Decrease maintenance costs by eliminating custom code and scripting.

Constraint-Based Loading:

In the Workflow Manager, you can specify constraint-based loading for a session. When you
select this option, the Integration Service orders the target load on a row-by-row basis. For
every row generated by an active source, the Integration Service loads the corresponding
transformed row first to the primary key table, then to any foreign key tables. Constraint-based
loading depends on the following requirements:

Active source. Related target tables must have the same active source.

Key relationships. Target tables must have key relationships.

Target connection groups. Targets must be in one target connection group.

Treat rows as insert. Use this option when you insert into the target. You cannot use
updates with constraint-based loading.

Active Source:
When target tables receive rows from different active sources, the Integration Service reverts
to normal loading for those tables, but loads all other targets in the session using constraint-based
loading when possible. For example, a mapping contains three distinct pipelines. The
first two contain a source, source qualifier, and target. Since these two targets receive data
from different active sources, the Integration Service reverts to normal loading for both
targets. The third pipeline contains a source, Normalizer, and two targets. Since these two
targets share a single active source (the Normalizer), the Integration Service performs
constraint-based loading: loading the primary key table first, then the foreign key table.
Key Relationships:
When target tables have no key relationships, the Integration Service does not perform
constraint-based loading.
Similarly, when target tables have circular key relationships, the Integration Service reverts to
a normal load. For example, you have one target containing a primary key and a foreign key
related to the primary key in a second target. The second target also contains a foreign key
that references the primary key in the first target. The Integration Service cannot enforce
constraint-based loading for these tables. It reverts to a normal load.
Target Connection Groups:
The Integration Service enforces constraint-based loading for targets in the same target
connection group. If you want to specify constraint-based loading for multiple targets that
receive data from the same active source, you must verify the tables are in the same target
connection group. If the tables with the primary key-foreign key relationship are in different
target connection groups, the Integration Service cannot enforce constraint-based loading
when you run the workflow. To verify that all targets are in the same target connection group,
complete the following tasks:

Verify all targets are in the same target load order group and receive data from the
same active source.

Use the default partition properties and do not add partitions or partition points.

Define the same target type for all targets in the session properties.

Define the same database connection name for all targets in the session properties.

Choose normal mode for the target load type for all targets in the session properties.

Treat Rows as Insert:


Use constraint-based loading when the session option Treat Source Rows As is set to insert.
You might get inconsistent data if you select a different Treat Source Rows As option and you
configure the session for constraint-based loading.
When the mapping contains Update Strategy transformations and you need to load data to a
primary key table first, split the mapping using one of the following options:

Load primary key table in one mapping and dependent tables in another mapping. Use
constraint-based loading to load the primary table.

Perform inserts in one mapping and updates in another mapping.

Constraint-based loading does not affect the target load ordering of the mapping. Target load
ordering defines the order in which the Integration Service reads the sources in each target load
order group in the mapping. A target load order group is a collection of source qualifiers,
transformations, and targets linked together in a mapping. Constraint-based loading
establishes the order in which the Integration Service loads individual targets within a set of
targets receiving data from a single source qualifier.
Example
The following mapping is configured to perform constraint-based loading:

In the first pipeline, target T_1 has a primary key, and T_2 and T_3 contain foreign keys
referencing the T_1 primary key. T_3 has a primary key that T_4 references as a foreign key.
Since these tables receive records from a single active source, SQ_A, the Integration Service
loads rows to the target in the following order:
1. T_1
2. T_2 and T_3 (in no particular order)
3. T_4
The Integration Service loads T_1 first because it has no foreign key dependencies and
contains a primary key referenced by T_2 and T_3. The Integration Service then loads T_2
and T_3, but since T_2 and T_3 have no dependencies, they are not loaded in any particular
order. The Integration Service loads T_4 last, because it has a foreign key that references a
primary key in T_3. After loading the first set of targets, the Integration Service begins reading
source B. If there are no key relationships between T_5 and T_6, the Integration Service
reverts to a normal load for both targets.
If T_6 has a foreign key that references a primary key in T_5, since T_5 and T_6 receive data
from a single active source, the Aggregator AGGTRANS, the Integration Service loads rows to
the tables in the following order:

T_5

T_6

T_1, T_2, T_3, and T_4 are in one target connection group if you use the same database
connection for each target, and you use the default partition properties. T_5 and T_6 are in
another target connection group together if you use the same database connection for each
target and you use the default partition properties. The Integration Service includes T_5 and
T_6 in a different target connection group because they are in a different target load order
group from the first four targets.
Enabling Constraint-Based Loading:
When you enable constraint-based loading, the Integration Service orders the target load on a
row-by-row basis. To enable constraint-based loading:
1. In the General Options settings of the Properties tab, choose Insert for the Treat Source Rows As property.
2. Click the Config Object tab. In the Advanced settings, select Constraint Based Load Ordering.
3. Click OK.

Informatica Experienced Interview Questions part 1

1. Difference between Informatica 7x and 8x?
2. Difference between connected and unconnected lookup transformation in Informatica?
3. Difference between stop and abort in Informatica?
4. Difference between static and dynamic caches?
5. What is a persistent lookup cache? What is its significance?
6. Difference between a reusable transformation and a mapplet?
7. How does the Informatica server sort string values in a Rank transformation?
8. Is Sorter an active or a passive transformation? When do we consider it to be active and when passive?
9. Explain the Informatica server architecture.
10. In Update Strategy, a relational table or a flat file: which gives us more performance? Why?
11. What are the output files that the Informatica server creates during a session run?
12. Can you explain what error tables in Informatica are and how we do error handling
in Informatica?
in Informatica?
13. Difference between constraint base loading and target load plan?
14. Difference between IIF and DECODE function?
15. How to import oracle sequence into Informatica?
16. What is parameter file?
17. Difference between Normal load and Bulk load?
18. How will you create a header and footer in the target using Informatica?
19. What are the session parameters?
20. Where does Informatica store rejected data? How do we view them?
21. What is difference between partitioning of relational target and file targets?
22. What are mapping parameters and variables? In which situations can we use them?
23. What do you mean by direct loading and Indirect loading in session properties?
24. How do we implement recovery strategy while running concurrent batches?
25. Explain the versioning concept in Informatica?
26. What is data driven?
27. What is a batch? Explain the types of batches.
28. What are the types of metadata that the repository stores?
29. Can you use the mapping parameters or variables created in one mapping in another mapping?
30. Why did we use stored procedures in our ETL application?
31. When we can join tables at the Source Qualifier itself, why do we go for a Joiner transformation?
32. What is the default join operation performed by the Lookup transformation?
33. What is a hash table in Informatica?

34. In a Joiner transformation, you should specify the table with fewer rows as the master table. Why?
35. Difference between cached lookup and uncached lookup?
36. Explain what the DTM does when you start a workflow.
37. Explain what the Load Manager does when you start a workflow.
38. In a sequential batch, how do I stop one particular session from running?
39. What are the types of aggregations available in Informatica?
40. How do I create indexes after the load process is done?
41. How do we improve the performance of the Aggregator transformation?
42. What are the different types of caches available in Informatica? Explain in detail.
43. What is polling?
44. What are the limitations of the Joiner transformation?
45. What is a mapplet?
46. What are active and passive transformations?
47. What are the options in the target session of an Update Strategy transformation?
48. What is a code page? Explain the types of code pages.
49. What do you mean by rank cache?
50. How can you delete duplicate rows without using a dynamic lookup? Tell me any other ways of deleting duplicate rows using a lookup.
51. Can you copy a session into a different folder or repository?
52. What is tracing level and what are its types?
53. What is the command used to run a batch?
54. What are the unsupported repository objects for a mapplet?
55. If your workflow is running slow, what is your approach towards performance tuning?
56. What are the types of mapping wizards available in Informatica?
57. After dragging the ports of three sources (SQL Server, Oracle, Informix) to a single source qualifier, can we map these three ports directly to the target?
58. Why do we use the Stored Procedure transformation?
59. Which object is required by the debugger to create a valid debug session?

60. Can we use an active transformation after an Update Strategy transformation?


61. Explain how we set the Update Strategy transformation at the mapping level and at the session level.
62. What is the exact use of the 'Online' and 'Offline' server connect options while defining a workflow in the Workflow Monitor? The system hangs when the 'Online' server connect option is used. Informatica is installed on a personal laptop.
63. What is change data capture?
64. Write a session parameter file which will change the source and targets for every session, i.e. a different source and target for each session run.
65. What are partition points?
66. What are the different threads in the DTM process?
67. Can we do ranking on two ports? If yes, explain how.
68. What is a transformation?
69. What does the Stored Procedure transformation do in special as compared to other transformations?
70. How do you recognize whether the newly added rows got inserted or updated?
71. What is data cleansing?
72. My flat file's size is 400 MB and I want to see the data inside the flat file without opening it. How do I do that?
73. Difference between Filter and Router?
74. How do you handle the decimal places when you are importing a flat file?
75. What is the difference between $ and $$ in a mapping or parameter file? In which cases are they generally used?
76. While importing the relational source definition from the database, what metadata of the source do you import?
77.Difference between Power mart & Power Center?
78.What kinds of sources and of targets can be used in Informatica?
79.If a sequence generator (with increment of 1) is connected to (say) 3 targets and each
target uses the NEXTVAL port, what value will each target get?
80.What do you mean by SQL override?
81.What is a shortcut in Informatica?

82.How does Informatica do variable initialization? Number/String/Date


83.How many different locks are available for repository objects
84.What are the transformations that use cache for performance?
85.What is the use of Forward/Reject rows in Mapping?
86.How many ways you can filter the records?
87.How to delete duplicate records from source database/Flat Files? Can we use post sql to
delete these records. In case of flat file, how can you delete duplicates before it starts loading?
88.You are required to perform bulk loading using Informatica on Oracle, what action would
perform at Informatica + Oracle level for a successful load?
89.What precautions do you need take when you use reusable Sequence generator
transformation for concurrent sessions?
90.Is it possible negative increment in Sequence Generator? If yes, how would you
accomplish it?
91.Which directory Informatica looks for parameter file and what happens if it is missing when
start the session? Does session stop after it starts?
92.Informatica is complaining about the server could not be reached? What steps would you
take?
93.You have more five mappings use the same lookup. How can you manage the lookup?
94.What will happen if you copy the mapping from one repository to another repository and if
there is no identical source?
95.How can you limit number of running sessions in a workflow?
96.An Aggregate transformation has 4 ports (l sum (col 1), group by col 2, col3), which port
should be the output?
97.What is a dynamic lookup and what is the significance of NewLookupRow? How will use
them for rejecting duplicate records?
98.If you have more than one pipeline in your mapping how will change the order of load?
99.When you export a workflow from Repository Manager, what does this xml contain?
Workflow only?
100. Your session failed and when you try to open a log file, it complains that the session
details are not available. How would do trace the error? What log file would you seek for?
101.You want to attach a file as an email attachment from a particular directory using email
task in Informatica, How will you do it?

102. You have a requirement to alert you of any long running sessions in your workflow. How
can you create a workflow that will send you email for sessions running more than 30 minutes.
You can use any method, shell script, procedure or Informatica mapping or workflow control?

Data warehousing Concepts Based Interview Questions

1. What is a data-warehouse?
2. What are Data Marts?
3. What is ER Diagram?
4. What is a Star Schema?
5. What is Dimensional Modelling?
6. What is a Snow Flake Schema?
7. What are the Different methods of loading Dimension tables?
8. What are Aggregate tables?
9. What is the Difference between OLTP and OLAP?
10. What is ETL?
11. What are the various ETL tools in the Market?
12. What are the various Reporting tools in the Market?
13. What is Fact table?
14. What is a dimension table?
15. What is a lookup table?
16. What is a general purpose scheduling tool? Name some of them?
17. What are modeling tools available in the Market? Name some of them?
18. What is real time data-warehousing?
19. What is data mining?

20. What is Normalization? First Normal Form, Second Normal Form, Third Normal Form?
21. What is ODS?
22. What type of Indexing mechanism do we need to use for a typical
Data warehouse?
23. Which columns go to the fact table and which columns go to the dimension table? (My user
needs to see <data element> <data element> broken by <data element> <data element>.)
All elements before "broken by" = fact measures
All elements after "broken by" = dimension elements
24. What is the level of granularity of a fact table? What does this signify? (With weekly-level
summarization there is no need to have Invoice Number in the fact table anymore.)
25. How are dimension tables designed? De-normalized, wide, short, using surrogate
keys, containing additional date fields and flags.
26. What are slowly changing dimensions?
27. What are non-additive facts? (Inventory, account balances in a bank.)
28. What are conformed dimensions?
29. What is a VLDB? (If a database is too large to back up in the available time frame, it's a VLDB.)
30. What are SCD1, SCD2 and SCD3?

Target Load Order

Target Load Plan

When you use a mapplet in a mapping, the Mapping Designer lets you set the target load plan
for sources within the mapplet.
Setting the Target Load Order

You can configure the target load order for a mapping containing any type of target definition.
In the Designer, you can set the order in which the Integration Service sends rows to targets
in different target load order groups in a mapping. A target load order group is the collection
of source qualifiers, transformations, and targets linked together in a mapping. You can set the
target load order if you want to maintain referential integrity when inserting, deleting, or
updating tables that have the primary key and foreign key constraints.
The Integration Service reads sources in a target load order group concurrently, and it
processes target load order groups sequentially.
To specify the order in which the Integration Service sends data to targets, create one source
qualifier for each target within a mapping. To set the target load order, you then determine in
which order the Integration Service reads each source in the mapping.
The following figure shows two target load order groups in one mapping:

In this mapping, the first target load order group includes ITEMS, SQ_ITEMS, and T_ITEMS.
The second target load order group includes all other objects in the mapping, including the
TOTAL_ORDERS target. The Integration Service processes the first target load order group,
and then the second target load order group.
When it processes the second target load order group, it reads data from both sources at the
same time.
To set the target load order:

1. Create a mapping that contains multiple target load order groups.
2. Click Mappings > Target Load Plan.
3. The Target Load Plan dialog box lists all Source Qualifier transformations in the mapping and the targets that receive data from each source qualifier.
4. Select a source qualifier from the list.
5. Click the Up and Down buttons to move the source qualifier within the load order.
6. Repeat steps 4 to 5 for other source qualifiers you want to reorder. Click OK.

MAPPLETS

A mapplet is a reusable object that we create in the Mapplet Designer.

It contains a set of transformations and lets us reuse that transformation logic in
multiple mappings.

It is created in the Mapplet Designer in the Designer tool.

Suppose we need to use the same set of 5 transformations in, say, 10 mappings. Instead of
building these 5 transformations in each of the 10 mappings, we create a mapplet of these 5
transformations and use it in all 10 mappings. Example: To create a surrogate key in a target, we
create a mapplet that uses a stored procedure to create the primary key for the target table. We give
the target table name and key column name as input to the mapplet and get the surrogate key as
output.
Mapplets help simplify mappings in the following ways:

Include source definitions: Use multiple source definitions and source qualifiers to
provide source data for a mapping.

Accept data from sources in a mapping.

Include multiple transformations: As many transformations as we need.

Pass data to multiple transformations: We can create a mapplet to feed data to
multiple transformations. Each Output transformation in a mapplet represents one
output group in a mapplet.

Contain unused ports: We do not have to connect all mapplet input and output ports in
a mapping.

Mapplet Input:
Mapplet input can originate from a source definition and/or from an Input transformation in
the mapplet. We can create multiple pipelines in a mapplet.

We use Mapplet Input transformation to give input to mapplet.

Use of Mapplet Input transformation is optional.

Mapplet Output:
The output of a mapplet is not connected to any target table.

We must use Mapplet Output transformation to store mapplet output.

A mapplet must contain at least one Output transformation with at least one
connected port in the mapplet.

Example 1: We will join the EMP and DEPT tables, then calculate total salary and give the
output to a Mapplet Output transformation.
EMP and DEPT will be source tables.
Output will be given to transformation Mapplet_Out.
Steps:
1. Open the folder where we want to create the mapplet.
2. Click Tools -> Mapplet Designer.
3. Click Mapplets -> Create -> Give name. Ex: mplt_example1
4. Drag the EMP and DEPT tables.
5. Use a Joiner transformation as described earlier to join them.
6. Transformation -> Create -> Select Expression from list -> Create -> Done.
7. Pass all ports from the Joiner to the Expression and then calculate total salary as described in the Expression transformation.
8. Now Transformation -> Create -> Select Mapplet Out from list -> Create -> Give name and then Done.
9. Pass all ports from the Expression to the Mapplet Output.
10. Mapplet -> Validate
11. Repository -> Save
Use of a mapplet in a mapping:

We can use a mapplet in a mapping by just dragging the mapplet from the mapplet folder in the
left pane, the same way we drag source and target tables.

When we use the mapplet in a mapping, the mapplet object displays only the ports
from the Input and Output transformations. These are referred to as the mapplet input
and mapplet output ports.

Make sure to give correct connection information in the session.

Making a mapping: We will use mplt_example1, and then create a Filter transformation to
filter records whose total salary is >= 1500.
mplt_example1 will be the source.
Create a target table the same as the Mapplet_out transformation, as in the picture above.

Creating Mapping
1. Open the folder where we want to create the mapping.
2. Click Tools -> Mapping Designer.
3. Click Mapping -> Create -> Give name. Ex: m_mplt_example1
4. Drag mplt_Example1 and the target table.
5. Transformation -> Create -> Select Filter from list -> Create -> Done.
6. Drag all ports from mplt_example1 to the Filter and give the filter condition.
7. Connect all ports from the Filter to the target. We can add more transformations after the Filter if needed.
8. Validate the mapping and save it.

Make session and workflow.
Give connection information for the mapplet source tables.
Give connection information for the target table.
Run the workflow and see the result.

PARTITIONING

A pipeline consists of a source qualifier and all the transformations and targets that
receive data from that source qualifier.

When the Integration Service runs the session, it can achieve higher performance by
partitioning the pipeline and performing the extract, transformation, and load for each
partition in parallel.

A partition is a pipeline stage that executes in a single reader, transformation, or writer
thread. The number of partitions in any pipeline stage equals the number of threads in the
stage. By default, the Integration Service creates one partition in every pipeline stage.
PARTITIONING ATTRIBUTES
1. Partition points

By default, the Integration Service sets partition points at various transformations in the pipeline.

Partition points mark thread boundaries and divide the pipeline into stages.

A stage is a section of a pipeline between any two partition points.

2. Number of Partitions

We can define up to 64 partitions at any partition point in a pipeline.

When we increase or decrease the number of partitions at any partition point, the
Workflow Manager increases or decreases the number of partitions at all partition
points in the pipeline.

Increasing the number of partitions or partition points increases the number of threads.

The number of partitions we create equals the number of connections to the source or
target. For one partition, one database connection will be used.

3. Partition types

The Integration Service creates a default partition type at each partition point.

If we have the Partitioning option, we can change the partition type. This option is
purchased separately.

The partition type controls how the Integration Service distributes data among
partitions at partition points.

PARTITIONING TYPES
1. Round Robin Partition Type

In round-robin partitioning, the Integration Service distributes rows of data evenly to
all partitions.

Each partition processes approximately the same number of rows.

Use round-robin partitioning when we need to distribute rows evenly and do not need
to group data among partitions.

2. Pass-Through Partition Type

In pass-through partitioning, the Integration Service processes data without
redistributing rows among partitions.

All rows in a single partition stay in that partition after crossing a pass-through
partition point.

Use pass-through partitioning when we want to increase data throughput, but we do
not want to increase the number of partitions.

3. Database Partitioning Partition Type

Use database partitioning for Oracle and IBM DB2 sources and IBM DB2 targets only.

Use any number of pipeline partitions and any number of database partitions.

We can improve performance when the number of pipeline partitions equals the
number of database partitions.

Database Partitioning with One Source

When we use database partitioning with a source qualifier with one source, the Integration
Service generates SQL queries for each database partition and distributes the data from the
database partitions among the session partitions equally.
For example, when a session has three partitions and the database has five partitions, the 1st and
2nd session partitions each receive data from 2 database partitions, so four database partitions are
used. The 3rd session partition receives data from the remaining 1 database partition.

Partitioning a Source Qualifier with Multiple Source Tables

The Integration Service creates SQL queries for database partitions based on the number of
partitions in the database table with the most partitions.
If the session has three partitions and the database table has two partitions, one of the
session partitions receives no data.
4. Hash Auto-Keys Partition Type

The Integration Service uses all grouped or sorted ports as a compound partition key.

Use hash auto-keys partitioning at or before Rank, Sorter, Joiner, and unsorted
Aggregator transformations to ensure that rows are grouped properly before they
enter these transformations.

5. Hash User-Keys Partition Type

The Integration Service uses a hash function to group rows of data among partitions.

We define the number of ports to generate the partition key.

We choose the ports that define the partition key.

6. Key Range Partition Type

We specify one or more ports to form a compound partition key.

The Integration Service passes data to each partition depending on the ranges we
specify for each port.

Use key range partitioning where the sources or targets in the pipeline are partitioned
by key range.

Example: Customers 1-100 in one partition, 101-200 in another, and so on. We define
the range for each partition.

WORKING WITH LINKS

Use links to connect each workflow task.

We can specify conditions with links to create branches in the workflow.

The Workflow Manager does not allow us to use links to create loops in the workflow.
Each link in the workflow can run only once.

Valid workflow:

Example of a loop:

Specifying Link Conditions:

Once we create links between tasks, we can specify conditions for each link to
determine the order of execution in the workflow.

If we do not specify conditions for each link, the Integration Service runs the next task
in the workflow by default.

Use predefined or user-defined workflow variables in the link condition.
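For example, a link condition that runs the next task only when the previous session succeeds (using the session name from the examples later in this document) could be:

$S_M_FILTER_EXAMPLE.Status = SUCCEEDED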

Steps:
1. In the Workflow Designer workspace, double-click the link you want to specify.
2. The Expression Editor appears.
3. In the Expression Editor, enter the link condition. The Expression Editor provides predefined workflow variables, user-defined workflow variables, variable functions, and Boolean and arithmetic operators.
4. Validate the expression using the Validate button.

Using the Expression Editor:


The Workflow Manager provides an Expression Editor for any expressions in the workflow. We
can enter expressions using the Expression Editor for the following:

Link conditions

Decision task

Assignment task

The Workflow Manager contains many types of tasks to help you build workflows and worklets.
We can create reusable tasks in the Task Developer.
Types of tasks:

Task Type      Tool where task can be created                        Reusable or not
Session        Task Developer, Workflow Designer, Worklet Designer   Yes
Email          Task Developer, Workflow Designer, Worklet Designer   Yes
Command        Task Developer, Workflow Designer, Worklet Designer   Yes
Event-Raise    Workflow Designer, Worklet Designer                   No
Event-Wait     Workflow Designer, Worklet Designer                   No
Timer          Workflow Designer, Worklet Designer                   No
Decision       Workflow Designer, Worklet Designer                   No
Assignment     Workflow Designer, Worklet Designer                   No
Control        Workflow Designer, Worklet Designer                   No

SESSION TASK

A session is a set of instructions that tells the Power Center Server how and when to
move data from sources to targets.
To run a session, we must first create a workflow to contain the Session task.
We can run as many sessions in a workflow as we need. We can run the Session tasks
sequentially or concurrently, depending on our needs.
The Power Center Server creates several files and in-memory caches depending on the
transformations and options used in the session.

EMAIL TASK

The Workflow Manager provides an Email task that allows us to send email during a workflow.
It is usually created by the Administrator, and we just drag and use it in our workflow.

Steps:
1. In the Task Developer or Workflow Designer, choose Tasks-Create.
2. Select an Email task and enter a name for the task. Click Create.
3. Click Done.
4. Double-click the Email task in the workspace. The Edit Tasks dialog box appears.
5. Click the Properties tab.
6. Enter the fully qualified email address of the mail recipient in the Email User Name field.
7. Enter the subject of the email in the Email Subject field. Or, you can leave this field blank.
8. Click the Open button in the Email Text field to open the Email Editor.
9. Click OK twice to save your changes.

Example: To send an email when a session completes:

Steps:
1. Create a workflow wf_sample_email.
2. Drag any session task to the workspace.
3. Edit the Session task and go to the Components tab.
4. See the On Success Email option there and configure it.
5. In Type, select Reusable or Non-reusable.
6. In Value, select the email task to be used.
7. Click Apply -> OK.
8. Validate the workflow and Repository -> Save.

We can also drag the email task and use it as per need.
We can set the option to send email on success or failure in the Components tab of a
session task.

COMMAND TASK
The Command task allows us to specify one or more shell commands in UNIX or DOS
commands in Windows to run during the workflow.
For example, we can specify shell commands in the Command task to delete reject files, copy
a file, or archive target files.
Ways of using the Command task:
1. Standalone Command task: We can use a Command task anywhere in the workflow or
worklet to run shell commands.
2. Pre- and post-session shell command: We can call a Command task as the pre- or
post-session shell command for a Session task. This is done on the Components tab of a session. We
can run it as a Pre-Session Command, Post-Session Success Command, or Post-Session Failure
Command. Select the Value and Type options as we did in the Email task.

Example: to copy a file sample.txt from the D drive to the E drive.

Command: COPY D:\sample.txt E:\ (in Windows)
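For comparison, a roughly equivalent shell command on UNIX (the paths here are assumed for illustration) would be:

cp /data/sample.txt /backup/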
Steps for creating a Command task:
1. In the Task Developer or Workflow Designer, choose Tasks-Create.
2. Select Command Task for the task type.
3. Enter a name for the Command task. Click Create. Then click Done.
4. Double-click the Command task. Go to the Commands tab.
5. In the Commands tab, click the Add button to add a command.
6. In the Name field, enter a name for the new command.
7. In the Command field, click the Edit button to open the Command Editor.
8. Enter only one command in the Command Editor.
9. Click OK to close the Command Editor.
10. Repeat steps 5-9 to add more commands in the task.
11. Click OK.

Steps to create the workflow using the Command task:
1. Create a task using the above steps to copy a file in the Task Developer.
2. Open the Workflow Designer. Workflow -> Create -> Give name and click OK.
3. Start is displayed. Drag a session, say s_m_Filter_example, and the Command task.
4. Link Start to the Session task and the Session to the Command task.
5. Double-click the link between the Session and the Command task and give the condition in the editor as: $S_M_FILTER_EXAMPLE.Status=SUCCEEDED
6. Workflow -> Validate
7. Repository -> Save

WORKING WITH EVENT TASKS


We can define events in the workflow to specify the sequence of task execution.
Types of Events:

Pre-defined event: A pre-defined event is a file-watch event. This event waits for a
specified file to arrive at a given location.

User-defined event: A user-defined event is a sequence of tasks in the workflow. We
create events and then raise them as per need.

Steps for creating a User-Defined Event:
1. Open any workflow where we want to create an event.
2. Click Workflow -> Edit -> Events tab.
3. Click the Add button to add events and give names as per need.
4. Click Apply -> OK. Validate the workflow and save it.

Types of Event Tasks:

EVENT RAISE: The Event-Raise task represents a user-defined event. We use this task to
raise a user-defined event.

EVENT WAIT: The Event-Wait task waits for a file-watcher event or a user-defined event to
occur before executing the next session in the workflow.

Example 1: Use an Event-Wait task and make sure that session s_filter_example runs when
the abc.txt file is present in the D:\FILES folder.
Steps for creating the workflow:
1. Workflow -> Create -> Give name wf_event_wait_file_watch -> Click OK.
2. Task -> Create -> Select Event Wait. Give name. Click Create and Done.
3. Link Start to the Event-Wait task.
4. Drag s_filter_example to the workspace and link it to the Event-Wait task.
5. Right-click the Event-Wait task and click EDIT -> EVENTS tab.
6. Select the Pre Defined option there. In the blank space, give the directory and filename to watch. Example: D:\FILES\abc.txt
7. Workflow validate and Repository save.

Example 2: Raise a user-defined event when session s_m_filter_example succeeds. Capture
this event in an Event-Wait task and run session S_M_TOTAL_SAL_EXAMPLE.
Steps for creating the workflow:
1. Workflow -> Create -> Give name wf_event_wait_event_raise -> Click OK.
2. Workflow -> Edit -> Events tab and add event EVENT1 there.
3. Drag s_m_filter_example and link it to the START task.
4. Click Tasks -> Create -> Select EVENT RAISE from the list. Give name ER_Example. Click Create and then Done. Link ER_Example to s_m_filter_example.
5. Right-click ER_Example -> EDIT -> Properties tab -> open Value for User Defined Event and select EVENT1 from the list displayed. Apply -> OK.
6. Click the link between ER_Example and s_m_filter_example and give the condition $S_M_FILTER_EXAMPLE.Status=SUCCEEDED
7. Click Tasks -> Create -> Select EVENT WAIT from the list. Give name EW_WAIT. Click Create and then Done.
8. Link EW_WAIT to the START task.
9. Right-click EW_WAIT -> EDIT -> EVENTS tab.
10. Select User Defined there. Select EVENT1 by clicking the Browse Events button.
11. Apply -> OK.
12. Drag S_M_TOTAL_SAL_EXAMPLE and link it to EW_WAIT.
13. Workflow -> Validate.
14. Repository -> Save.
15. Run the workflow and see the result.

TIMER TASK
The Timer task allows us to specify the period of time to wait before the Power Center Server
runs the next task in the workflow. The Timer task has two types of settings:

Absolute time: We specify the exact date and time, or we can choose a user-defined
workflow variable to specify the exact time. The next task in the workflow will run as per
the date and time specified.

Relative time: We instruct the Power Center Server to wait for a specified period of
time after the Timer task, the parent workflow, or the top-level workflow starts.

Example: Run session s_m_filter_example 1 minute after the Timer task starts.
Steps for creating the workflow:
1. Workflow -> Create -> Give name wf_timer_task_example -> Click OK.
2. Click Tasks -> Create -> Select TIMER from the list. Give name TIMER_Example. Click Create and then Done.
3. Link TIMER_Example to the START task.
4. Right-click TIMER_Example -> EDIT -> TIMER tab.
5. Select the Relative Time option, give 1 min, and select the "From start time of this task" option.
6. Apply -> OK.
7. Drag s_m_filter_example and link it to TIMER_Example.
8. Workflow -> Validate and Repository -> Save.

DECISION TASK

The Decision task allows us to enter a condition that determines the execution of the
workflow, similar to a link condition.

The Decision task has a pre-defined variable called $Decision_task_name.condition
that represents the result of the decision condition.

The Power Center Server evaluates the condition in the Decision task and sets the
pre-defined condition variable to True (1) or False (0).

We can specify one decision condition per Decision task.

Example: The Command task should run only if either s_m_filter_example or
S_M_TOTAL_SAL_EXAMPLE succeeds. If either s_m_filter_example or
S_M_TOTAL_SAL_EXAMPLE fails, then S_m_sample_mapping_EMP should run.
Steps for creating the workflow:
1. Workflow -> Create -> Give name wf_decision_task_example -> Click OK.
2. Drag s_m_filter_example and S_M_TOTAL_SAL_EXAMPLE to the workspace and link both of them to the START task.
3. Click Tasks -> Create -> Select DECISION from the list. Give name DECISION_Example. Click Create and then Done. Link DECISION_Example to both s_m_filter_example and S_M_TOTAL_SAL_EXAMPLE.
4. Right-click DECISION_Example -> EDIT -> GENERAL tab.
5. Set Treat Input Links As to OR. The default is AND. Apply and click OK.
6. Now edit the Decision task again and go to the PROPERTIES tab. Open the Expression Editor by clicking the VALUE section of the Decision Name attribute and enter the following condition: $S_M_FILTER_EXAMPLE.Status = SUCCEEDED OR $S_M_TOTAL_SAL_EXAMPLE.Status = SUCCEEDED
7. Validate the condition -> Click Apply -> OK.
8. Drag the Command task and the S_m_sample_mapping_EMP session to the workspace and link them to the DECISION_Example task.
9. Double-click the link between S_m_sample_mapping_EMP and DECISION_Example and give the condition: $DECISION_Example.Condition = 0. Validate and click OK.
10. Double-click the link between the Command task and DECISION_Example and give the condition: $DECISION_Example.Condition = 1. Validate and click OK.
11. Workflow validate and Repository save.
12. Run the workflow and see the result.

CONTROL TASK

We can use the Control task to stop, abort, or fail the top-level workflow or the parent
workflow based on an input link condition.
A parent workflow or worklet is the workflow or worklet that contains the Control task.
We give the condition to the link connected to Control Task.

Control Option      Description
Fail Me             Fails the Control task.
Fail Parent         Marks the status of the workflow or worklet that contains the Control task as failed.
Stop Parent         Stops the workflow or worklet that contains the Control task.
Abort Parent        Aborts the workflow or worklet that contains the Control task.
Fail Top-Level WF   Fails the workflow that is running.
Stop Top-Level WF   Stops the workflow that is running.
Abort Top-Level WF  Aborts the workflow that is running.

Example: Drag any 3 sessions, and if any one fails, then abort the top-level workflow.
Steps for creating the workflow:
1. Workflow -> Create -> Give name wf_control_task_example -> Click OK.
2. Drag any 3 sessions to the workspace and link all of them to the START task.
3. Click Tasks -> Create -> Select CONTROL from the list. Give name cntr_task.
4. Click Create and then Done.
5. Link all sessions to the Control task cntr_task.
6. Double-click the link between cntr_task and any session, say s_m_filter_example, and give the condition: $S_M_FILTER_EXAMPLE.Status = SUCCEEDED.
7. Repeat the above step for the remaining 2 sessions also.
8. Right-click cntr_task -> EDIT -> GENERAL tab. Set Treat Input Links As to OR. The default is AND.
9. Go to the PROPERTIES tab of cntr_task and select the value "Fail Top-Level Workflow" for the Control Option.
10. Click Apply and OK.
11. Workflow validate and Repository save.
12. Run the workflow and see the result.

ASSIGNMENT TASK

The Assignment task allows us to assign a value to a user-defined workflow variable.
See the Workflow Variables topic to add user-defined variables.
To use an Assignment task in the workflow, first create and add the Assignment task to the workflow. Then configure the Assignment task to assign values or expressions to user-defined variables.
We cannot assign values to pre-defined workflow variables.

Steps to create Assignment Task:
1. Open any workflow where we want to use the Assignment task.
2. Edit the workflow and add user-defined variables.
3. Choose Tasks -> Create. Select Assignment Task for the task type.
4. Enter a name for the Assignment task. Click Create. Then click Done.
5. Double-click the Assignment task to open the Edit Task dialog box.
6. On the Expressions tab, click Add to add an assignment.
7. Click the Open button in the User Defined Variables field.
8. Select the variable for which you want to assign a value. Click OK.
9. Click the Edit button in the Expression field to open the Expression Editor.
10. Enter the value or expression you want to assign.
11. Repeat steps 7-10 to add more variable assignments as necessary.
12. Click OK.
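As a small illustration (the $$RunCount and $$LastRunTime variables here are hypothetical and not part of any workflow described above), an Assignment task could maintain a persistent run counter and record the last run time by pairing each user-defined variable with an expression such as:

$$RunCount = $$RunCount + 1
$$LastRunTime = SYSDATE

If $$RunCount is declared as persistent, the incremented value is saved to the repository at the end of a successful run and can be referenced in link conditions elsewhere in the workflow.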

SCD Type 1

Slowly Changing Dimensions (SCDs) are dimensions whose data changes slowly, rather than on a regular, time-based schedule.
For example, you may have a dimension in your database that tracks the sales records of
your company's salespeople. Creating sales reports seems simple enough, until a salesperson
is transferred from one regional office to another. How do you record such a change in your
sales dimension?
You could sum or average the sales by salesperson, but if you use that to compare the
performance of salespeople, it might give misleading information. If the salesperson who was
transferred used to work in a hot market where sales were easy, and now works in a market
where sales are infrequent, her totals will look much stronger than those of the other salespeople in
her new region, even if they are just as good. Or you could create a second salesperson record
and treat the transferred person as a new salesperson, but that creates problems also.
Dealing with these issues involves SCD management methodologies:
Type 1:
The Type 1 methodology overwrites old data with new data, and therefore does not track
historical data at all. This is most appropriate when correcting certain types of data errors,
such as the spelling of a name. (Assuming you won't ever need to know how it used to be
misspelled in the past.)
Here is an example of a database table that keeps supplier information:

Supplier_Key | Supplier_Code | Supplier_Name | Supplier_State
123 | ABC | Acme Supply Co | CA

In this example, Supplier_Code is the natural key and Supplier_Key is a surrogate key.
Technically, the surrogate key is not necessary, since the table will be unique by the natural
key (Supplier_Code). However, the joins will perform better on an integer than on a character
string.
Now imagine that this supplier moves their headquarters to Illinois. The updated table would
simply overwrite this record:

Supplier_Key | Supplier_Code | Supplier_Name | Supplier_State
123 | ABC | Acme Supply Co | IL

The obvious disadvantage to this method of managing SCDs is that there is no historical
record kept in the data warehouse. You can't tell if your suppliers are tending to move to the
Midwest, for example. But an advantage to Type 1 SCDs is that they are very easy to
maintain.
Explanation with an Example:
Source Table: (01-01-11)
Emp no | Ename | Sal
101 | | 1000
102 | | 2000
103 | | 3000

Target Table: (01-01-11)
Emp no | Ename | Sal
101 | | 1000
102 | | 2000
103 | | 3000

The necessity of the Lookup transformation is illustrated using the above source and target tables.
Source Table: (01-02-11)
Emp no | Ename | Sal
101 | | 1000
102 | | 2500
103 | | 3000
104 | | 4000

Target Table: (01-02-11)
Emp no | Ename | Sal
101 | | 1000
102 | | 2500
103 | | 3000
104 | | 4000

In the second month, one more employee (Ename D) is added to the table, and the salary of employee 102 is changed from 2000 to 2500.

Step 1: Import the source table and target table.

Create a table by name emp_source with three columns as shown above in Oracle.

Import the source from the Source Analyzer.

In the same way as above, create two target tables with the names emp_target1 and emp_target2.

Go to the Targets menu and click on Generate and Execute to confirm the creation of the target tables.

The snapshot of the connections using the different kinds of transformations is shown below.

Step 2: Design the mapping and apply the necessary transformations.

Here in this mapping we are about to use four kinds of transformations, namely the Lookup transformation, Expression transformation, Filter transformation, and Update Strategy transformation. The necessity and usage of each transformation is discussed in detail below.

Lookup Transformation: The purpose of this transformation is to determine whether to insert, update, delete, or reject the rows going into the target table.

The first thing that we are going to do is to create a Lookup transformation and connect the Empno from the Source Qualifier to the transformation.

The snapshot of choosing the Target table is shown below.

What the Lookup transformation does in our mapping is look into the target table (emp_table), compare it with the Source Qualifier, and determine whether to insert, update, delete, or reject rows.

In the Ports tab we should add a new column and name it empno1; this is the column to which we are going to connect from the Source Qualifier.

The Input port for the first column should be unchecked, whereas the other boxes such as Output and Lookup should be checked. For the newly created column, only the Input and Output boxes should be checked.

In the Properties tab: (i) Lookup table name -> Emp_Target. (ii) Lookup policy on multiple match -> Use First Value. (iii) Connection Information -> Oracle.

In the Conditions tab: (i) Click on Add a new condition. (ii) The Lookup Table Column should be Empno, the Transformation Port should be Empno1, and the Operator should be =.

Expression Transformation: After we are done with the Lookup transformation, we use an Expression transformation to check whether we need to insert the records as new records or update the existing records. The steps to create an Expression transformation are shown below.

Drag all the columns from both the source and the Lookup transformation and drop them all onto the Expression transformation.

Now double click on the transformation, go to the Ports tab, and create two new columns named insert and update. Both these columns are going to be our output data, so we need a check mark only in front of the Output check box.

The snapshot of the Edit Transformation window is shown below.

The conditions that we want to parse through our output data are listed below.

Insert: IsNull(EMPNO1)
Update: iif(Not isnull(EMPNO1) and Decode(SAL,SAL1,1,0)=0,1,0)

We are all done here. Click on Apply and then OK.

Filter Transformation: We are going to have two Filter transformations, one to insert and the other to update.

Connect the insert column from the Expression transformation to the insert column in the first Filter transformation, and in the same way connect the update column in the Expression transformation to the update column in the second Filter transformation.

Now connect the Empno, Ename, and Sal from the Expression transformation to both Filter transformations.

If there is no change in the input data, then Filter transformation 1 forwards the complete input to Update Strategy transformation 1 and the same output appears in the target table.

If there is any change in the input data, then Filter transformation 2 forwards the complete input to Update Strategy transformation 2, which then forwards the updated input to the target table.

Go to the Properties tab on the Edit Transformation window:
(i) The value for the filter condition 1 is Insert.
(ii) The value for the filter condition 2 is Update.

The closer view of the filter connections is shown below.

Update Strategy Transformation: Determines whether to insert, delete, update, or reject the rows.

Drag the respective Empno, Ename, and Sal from the Filter transformations and drop them on the respective Update Strategy transformations.

Now go to the Properties tab; the value for the update strategy expression is 0 (on the 1st Update Strategy transformation).

Now go to the Properties tab; the value for the update strategy expression is 1 (on the 2nd Update Strategy transformation).
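For readability, the same settings can also be written with the built-in update strategy constants instead of the numeric codes (DD_INSERT equals 0 and DD_UPDATE equals 1):

DD_INSERT   -- update strategy expression on the 1st Update Strategy transformation
DD_UPDATE   -- update strategy expression on the 2nd Update Strategy transformation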

We are all set here; finally connect the outputs of the Update Strategy transformations to the target table.

Step 3: Create the task and run the workflow.

Don't check the truncate table option.

Change the target load type from Bulk to Normal.

Run the workflow from the task.

Step 4: Preview the output in the target table.

WORKFLOW VARIABLES

We can create and use variables in a workflow to reference values and record information.
Types of workflow variables:

Predefined workflow variables

User-defined workflow variables

Predefined workflow variables:


The Workflow Manager provides predefined workflow variables for tasks within a workflow.
Types of Predefined workflow variables are:
System variables:
Use the SYSDATE and WORKFLOWSTARTTIME system variables within a workflow.

Task-specific variables:
The Workflow Manager provides a set of task-specific variables for each task in the workflow.
The Workflow Manager lists task-specific variables under the task name in the Expression
Editor.

Task-Specific Variable | Description | Task Type
Condition | Result of the decision condition expression. NULL if the task fails. | Decision task
EndTime | Date and time when a task ended. | All tasks
ErrorCode | Last error code for the associated task. 0 if there is no error. | All tasks
ErrorMsg | Last error message for the associated task. Empty string if there is no error. | All tasks
FirstErrorCode | Error code for the first error message in the session. 0 if there is no error. | Session
FirstErrorMsg | First error message in the session. Empty string if there is no error. | Session
PrevTaskStatus | Status of the previous task in the workflow that the Integration Service ran. Can be ABORTED, FAILED, STOPPED, SUCCEEDED. | All tasks
SrcFailedRows | Total number of rows the Integration Service failed to read from the source. | Session
SrcSuccessRows | Total number of rows successfully read from the sources. | Session
StartTime | Date and time when the task started. | All tasks
Status | Status of the previous task in the workflow. Can be ABORTED, DISABLED, FAILED, NOTSTARTED, STARTED, STOPPED, SUCCEEDED. | All tasks
TgtFailedRows | Total number of rows the Integration Service failed to write to the target. | Session
TgtSuccessRows | Total number of rows successfully written to the target. | Session
TotalTransErrors | Total number of transformation errors. | Session
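For instance (re-using the session name from the earlier examples purely as an illustration), a link condition leaving that session could combine task-specific variables like this:

$S_M_FILTER_EXAMPLE.Status = SUCCEEDED AND $S_M_FILTER_EXAMPLE.TgtSuccessRows > 0

The link would then be followed only if the session succeeded and wrote at least one row to the target.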

User-Defined Workflow Variables


We can create variables within a workflow. When we create a variable in a workflow, it is valid
only in that workflow. Use the variable in tasks within that workflow. We can edit and delete
user-defined workflow variables.
Integration Service holds two different values for a workflow variable during a workflow run:

Start value of a workflow variable

Current value of a workflow variable

The Integration Service looks for the start value of a variable in the following order:
1. Value in parameter file
2. Value saved in the repository (if the variable is persistent)
3. User-specified default value
4. Data type default value

Persistent means the value is saved to the repository.


To create a workflow variable:
1. In the Workflow Designer, create a new workflow or edit an existing one.
2. Select the Variables tab.
3. Click Add and enter a name for the variable.
4. In the Data type field, select the data type for the new variable.
5. Enable the Persistent option if we want the value of the variable retained from one execution of the workflow to the next.
6. Enter the default value for the variable in the Default field.
7. To validate the default value of the new workflow variable, click the Validate button.
8. Click Apply to save the new workflow variable.
9. Click OK to close the workflow properties.

IDENTIFICATION OF BOTTLENECKS
The performance of Informatica is dependent on the performance of its several components such as the database, network, transformations, mappings, and sessions. To tune the performance of Informatica, we have to identify the bottleneck first.
The bottleneck may be present in the source, target, transformations, mapping, session, database, or network. It is best to identify performance issues in components in the order source, target, transformations, mapping, and session. After identifying the bottleneck, apply the tuning mechanisms in whichever way they are applicable to the project.
Identify bottleneck in Source
If the source is a relational table, put a Filter transformation in the mapping just after the Source Qualifier and set the filter condition to FALSE, so that all records are filtered off and none proceed to the other parts of the mapping. In the original case, without the test filter, the total time taken is:
Total Time = time taken by (source + transformations + target load)
Now, because of the filter:
Total Time = time taken by source
So if the source is fine, the session with the test filter should take less time. If the session still takes nearly the same time as before, then there is a source bottleneck.
Identify bottleneck in Target
If the target is a relational table, then substitute it with a flat file and run the session. If the time taken now is much less than the time taken for the session to load to the table, then the target table is the bottleneck.
Identify bottleneck in Transformation
Remove the transformation from the mapping and run it. Note the time taken. Then put the transformation back and run the mapping again. If the time taken now is significantly more than the previous time, then the transformation is the bottleneck.
But removal of a transformation for testing can be a pain for the developer, since that might require further changes for the session to get back into working order.
So we can put a filter with the condition FALSE just after the transformation and run the session. If the session run takes about the same time with and without this test filter, then the transformation is the bottleneck.
Identify bottleneck in Sessions
We can use the session log to identify whether the source, target, or transformations are the performance bottleneck. Session logs contain thread summary records like the following:

MASTER> PETL_24018 Thread [READER_1_1_1] created for the read stage of partition point [SQ_test_all_text_data] has completed: Total Run Time = [11.703201] secs, Total Idle Time = [9.560945] secs, Busy Percentage = [18.304876].
MASTER> PETL_24019 Thread [TRANSF_1_1_1_1] created for the transformation stage of partition point [SQ_test_all_text_data] has completed: Total Run Time = [11.764368] secs, Total Idle Time = [0.000000] secs, Busy Percentage = [100.000000].

If the busy percentage of a thread is close to 100, then that part of the pipeline is the bottleneck.
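As a quick check on how these figures fit together, the busy percentage is simply (Total Run Time - Total Idle Time) / Total Run Time * 100. For the reader thread in the log excerpt above:

Busy Percentage = (11.703201 - 9.560945) / 11.703201 * 100 ≈ 18.30

which matches the reported value of 18.304876. The transformation thread has zero idle time, so it is 100% busy and is the likely bottleneck in this example.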
Basically we have to rely on thread statistics to identify the cause of performance issues. Once the Collect Performance Data option (in the session Properties tab) is enabled, all the performance-related information appears in the log created by the session.

PUSHDOWN OPTIMIZATION


You can push transformation logic to the source or target database using pushdown
optimization. When you run a session configured for pushdown optimization, the Integration
Service translates the transformation logic into SQL queries and sends the SQL queries to the
database. The source or target database executes the SQL queries to process the
transformations.
The amount of transformation logic you can push to the database depends on the database,
transformation logic, and mapping and session configuration. The Integration Service
processes all transformation logic that it cannot push to a database.
Use the Pushdown Optimization Viewer to preview the SQL statements and mapping logic that
the Integration Service can push to the source or target database. You can also use the
Pushdown Optimization Viewer to view the messages related to pushdown optimization.
The following figure shows a mapping containing transformation logic that can be pushed to
the source database:

This mapping contains an Expression transformation that creates an item ID based on the
store number 5419 and the item ID from the source. To push the transformation logic to the
database, the Integration Service generates the following SQL statement:
INSERT INTO T_ITEMS(ITEM_ID, ITEM_NAME, ITEM_DESC) SELECT CAST((CASE WHEN 5419
IS NULL THEN '' ELSE 5419 END) + '_' + (CASE WHEN ITEMS.ITEM_ID IS NULL THEN '' ELSE
ITEMS.ITEM_ID END) AS INTEGER), ITEMS.ITEM_NAME, ITEMS.ITEM_DESC FROM ITEMS2
ITEMS
The Integration Service generates an INSERT SELECT statement to retrieve the ID, name, and
description values from the source table, create new item IDs, and insert the values into the
ITEM_ID, ITEM_NAME, and ITEM_DESC columns in the target table. It concatenates the store
number 5419, an underscore, and the original ITEM ID to get the new item ID.
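The mapping itself is not reproduced here, but for orientation, an output-port expression of roughly the following shape in the Expression transformation could produce SQL like the statement above (the port name ITEM_ID_OUT is illustrative, not taken from the document):

ITEM_ID_OUT = TO_CHAR(5419) || '_' || TO_CHAR(ITEM_ID)

The CAST ... AS INTEGER in the generated statement presumably reflects the datatype of the target ITEM_ID column rather than anything written in the expression itself.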
Pushdown Optimization Types
You can configure the following types of pushdown optimization:

Source-side pushdown optimization. The Integration Service pushes as much transformation logic as possible to the source database.

Target-side pushdown optimization. The Integration Service pushes as much transformation logic as possible to the target database.

Full pushdown optimization. The Integration Service attempts to push all transformation logic to the target database. If the Integration Service cannot push all transformation logic to the database, it performs both source-side and target-side pushdown optimization.
Running Source-Side Pushdown Optimization Sessions

When you run a session configured for source-side pushdown optimization, the Integration
Service analyzes the mapping from the source to the target or until it reaches a downstream
transformation it cannot push to the source database.
The Integration Service generates and executes a SELECT statement based on the
transformation logic for each transformation it can push to the database. Then, it reads the
results of this SQL query and processes the remaining transformations.
Running Target-Side Pushdown Optimization Sessions
When you run a session configured for target-side pushdown optimization, the Integration
Service analyzes the mapping from the target to the source or until it reaches an upstream
transformation it cannot push to the target database. It generates an INSERT, DELETE, or
UPDATE statement based on the transformation logic for each transformation it can push to
the target database. The Integration Service processes the transformation logic up to the point
that it can push the transformation logic to the database. Then, it executes the generated SQL
on the Target database.
Running Full Pushdown Optimization Sessions
To use full pushdown optimization, the source and target databases must be in the same
relational database management system. When you run a session configured for full pushdown
optimization, the Integration Service analyzes the mapping from the source to the target or
until it reaches a downstream transformation it cannot push to the target database. It
generates and executes SQL statements against the source or target based on the
transformation logic it can push to the database.
When you run a session with large quantities of data and full pushdown optimization, the
database server must run a long transaction. Consider the following database performance
issues when you generate a long transaction:

A long transaction uses more database resources.

A long transaction locks the database for longer periods of time. This reduces database
concurrency and increases the likelihood of deadlock.

A long transaction increases the likelihood of an unexpected event. To minimize


database performance issues for long transactions, consider using source-side or
target-side pushdown optimization.

Rules and Guidelines for Functions in Pushdown Optimization


Use the following rules and guidelines when pushing functions to a database:

If you use ADD_TO_DATE in transformation logic to change days, hours, minutes, or


seconds, you cannot push the function to a Teradata database.

When you push LAST_DAY() to Oracle, Oracle returns the date up to the second. If the input date contains subseconds, Oracle trims the date to the second.

When you push LTRIM, RTRIM, or SOUNDEX to a database, the database treats the
argument (' ') as NULL, but the Integration Service treats the argument (' ') as spaces.

An IBM DB2 database and the Integration Service produce different results for
STDDEV and VARIANCE. IBM DB2 uses a different algorithm than other databases to
calculate STDDEV and VARIANCE.

When you push SYSDATE or SYSTIMESTAMP to the database, the database server
returns the timestamp in the time zone of the database server, not the Integration
Service.

If you push SYSTIMESTAMP to an IBM DB2 or a Sybase database, and you specify the
format for SYSTIMESTAMP, the database ignores the format and returns the complete
time stamp.

You can push SYSTIMESTAMP(SS) to a Netezza database, but not SYSTIMESTAMP(MS) or SYSTIMESTAMP(US).

When you push TO_CHAR(DATE) or TO_DATE() to Netezza, dates with subsecond precision must be in the YYYY-MM-DD HH24:MI:SS.US format. If the format is different, the Integration Service does not push the function to Netezza.

SCD 2 (Complete):
Let us drive the point home using a simple scenario. For example, in the current month, i.e. (01-01-2010), we are provided with a source table with three columns and three rows in it (Empno, Ename, Sal). There is a new employee added and one change in the records in the month (01-02-2010). We are going to use the SCD-2 style to extract and load the records into the target table.

The thing to be noticed here is that if there is any update in the salary of any employee, then the history of that employee is displayed with the current date as the start date and the previous date as the end date.

Source Table: (01-01-11)
Emp no | Ename | Sal
101 | | 1000
102 | | 2000
103 | | 3000

Target Table: (01-01-11)
Skey | Emp no | Ename | Sal | S-date | E-date | Ver | Flag
100 | 101 | | 1000 | 01-01-10 | Null | |
200 | 102 | | 2000 | 01-01-10 | Null | |
300 | 103 | | 3000 | 01-01-10 | Null | |

Source Table: (01-02-11)
Emp no | Ename | Sal
101 | | 1000
102 | | 2500
103 | | 3000
104 | | 4000

Target Table: (01-02-11)
Skey | Emp no | Ename | Sal | S-date | E-date | Ver | Flag
100 | 101 | | 1000 | 01-02-10 | Null | |
200 | 102 | | 2000 | 01-02-10 | Null | |
300 | 103 | | 3000 | 01-02-10 | Null | |
201 | 102 | | 2500 | 01-02-10 | 01-01-10 | |
400 | 104 | | 4000 | 01-02-10 | Null | |

In the second month, one more employee (Ename D) is added to the table, and the salary of employee 102 is changed from 2000 to 2500.
Step 1: Import the source table and target table.

Create a table by name emp_source with three columns as shown above in Oracle.

Import the source from the Source Analyzer.

Drag the target table twice onto the mapping designer to facilitate the insert and update processes.

Go to the Targets menu and click on Generate and Execute to confirm the creation of the target tables.

The snapshot of the connections using the different kinds of transformations is shown below.

In the target table we are going to add five columns (Skey, Version, Flag, S_date, E_date).

Step 2: Design the mapping and apply the necessary transformations.

Here in this mapping we are about to use four kinds of transformations, namely Lookup transformation (1), Expression transformation (3), Filter transformation (2), and Sequence Generator. The necessity and usage of all the transformations will be discussed in detail below.

Lookup Transformation: The purpose of this transformation is to look up the target table and compare it with the source using the lookup condition.

The first thing that we are going to do is to create a Lookup transformation and connect the Empno from the Source Qualifier to the transformation.

The snapshot of choosing the Target table is shown below.

Drag the Empno column from the Source Qualifier to the Lookup transformation.

The Input port for only Empno1 should be checked.

In the Properties tab: (i) Lookup table name -> Emp_Target. (ii) Lookup policy on multiple match -> Use Last Value. (iii) Connection Information -> Oracle.

In the Conditions tab: (i) Click on Add a new condition. (ii) The Lookup Table Column should be Empno, the Transformation Port should be Empno1, and the Operator should be =.
Expression Transformation: After we are done with the Lookup transformation, we use an Expression transformation to find out whether the data in the source table matches the target table. We specify the condition here for whether to insert or to update the table. The steps to create an Expression transformation are shown below.

Drag all the columns from both the source and the Lookup transformation and drop them all onto the Expression transformation.

Now double click on the transformation, go to the Ports tab, and create two new columns named insert and update. Both these columns are going to be our output data, so the Input check box should be unchecked for them.

The snapshot of the Edit Transformation window is shown below.

The conditions that we want to parse through our output data are listed below.

Insert: IsNull(EmpNO1)
Update: iif(Not isnull(Skey) and Decode(SAL,SAL1,1,0)=0,1,0)

We are all done here. Click on Apply and then OK.

Filter Transformation: We need two Filter transformations; the purpose of the first filter is to filter out the records which we are going to insert, and the second is vice versa.

If there is no change in the input data, then Filter transformation 1 forwards the complete input to Exp 1 and the same output appears in the target table.

If there is any change in the input data, then Filter transformation 2 forwards the complete input to Exp 2, which then forwards the updated input to the target table.

Go to the Properties tab on the Edit Transformation window:
(i) The value for the filter condition 1 is Insert.
(ii) The value for the filter condition 2 is Update.

The closer view of the connections from the expression to the filters is shown below.

Sequence Generator: We use this to generate an incremental cycle of a sequential range of numbers. The purpose of this in our mapping is to increment the Skey in a bandwidth of 100.

The Sequence Generator increments the values of the Skey in multiples of 100 (bandwidth of 100).

Connect the output of the Sequence Generator transformation to Exp 1.

Expression Transformation:
Exp 1: It updates the target table with the Skey values. The point to be noticed here is that the Skey gets multiplied by 100 and a new row is generated if there is any new employee added to the list; otherwise no modification is done on the target table.

Drag all the columns from Filter 1 to Exp 1.

Now add a new column N_skey; the expression for it is going to be Nextval1*100.

We are going to make S_date an output port, and the expression for it is sysdate.

Flag is also made an output port, and the expression parsed through it is 1.

Version is also made an output port, and the expression parsed through it is 1.

Exp 2: If the same employee is found with any updates in his records, then the Skey gets incremented by 1 and the version changes to the next higher number.

Drag all the columns from Filter 2 to Exp 2.

Now add a new column N_skey; the expression for it is going to be Skey+1.

Both S_date and E_date are going to be sysdate.

Exp 3: If any record in the source table gets updated, then we make the ports output only.

If a change is found, then we update the E_date to the S_date.
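Pulling the scattered expressions above together, the output ports would look roughly like the sketch below. This is only an illustrative consolidation of the steps already described, assuming the port names (N_SKEY, NEXTVAL1, SKEY, S_DATE, E_DATE) match the ones created in the mapping:

-- Exp 1 (new employees, fed by Filter 1)
N_SKEY  = NEXTVAL1 * 100
S_DATE  = SYSDATE
FLAG    = 1
VERSION = 1

-- Exp 2 (changed employees, fed by Filter 2)
N_SKEY  = SKEY + 1
S_DATE  = SYSDATE
E_DATE  = SYSDATE

-- Exp 3 (used to close out the old row)
E_DATE  = S_DATE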

Update Strategy: This is the place from where the update instruction is set on the target table.

The update strategy expression is set to 1 (DD_UPDATE).

Step 3: Create the task and run the workflow.

Don't check the truncate table option.

Change the target load type from Bulk to Normal.

Run the workflow from the task.

SCD Type 3


This method has limited history preservation, and we are going to use Skey as the primary key here.

Source Table: (01-01-2011)
Empno | Ename | Sal
101 | | 1000
102 | | 2000
103 | | 3000

Target Table: (01-01-2011)
Empno | Ename | C-sal | P-sal
101 | | 1000 |
102 | | 2000 |
103 | | 3000 |

Source Table: (01-02-2011)
Empno | Ename | Sal
101 | | 1000
102 | | 4566
103 | | 3000

Target Table: (01-02-2011)
Empno | Ename | C-sal | P-sal
101 | | 1000 |
102 | | 4566 | Null
103 | | 3000 |
102 | | 4544 | 4566

Hopefully the above tables make clear what we are trying to achieve.

Step 1: Initially, in the Mapping Designer, I am going to create a mapping as below. In this mapping I am using Lookup, Expression, Filter, and Update Strategy transformations to drive the purpose. An explanation of each and every transformation is given below.

Step 2: Here we are going to see the purpose and usage of all the transformations that we have used in the above mapping.
Lookup Transformation: The Lookup transformation looks up the target table and compares it with the source table. Based on the lookup condition it decides whether we need to update, insert, or delete the data before it is loaded into the target table.

As usual, we are going to connect the Empno column from the Source Qualifier to the Lookup transformation. Prior to this, the Lookup transformation has to look at the target table.

Next we specify the lookup condition empno = empno1.

Finally, specify the Connection Information (Oracle) and the lookup policy on multiple match (Use Last Value) in the Properties tab.

Expression Transformation:
We are using the Expression transformation to separate out the insert records and the update records logically.

Drag all the ports from the Source Qualifier and the Lookup into the Expression.

Add two ports and rename them Insert and Update.

These two ports are going to be output-only ports. Specify the conditions below in the Expression Editor for the ports respectively.

Insert: isnull(ENO1)
Update: iif(not isnull(ENO1) and decode(SAL,Curr_Sal,1,0)=0,1,0)

Filter Transformation: We are going to use two Filter transformations to filter out the data physically into two separate sections, one for the insert and the other for the update process to happen.
Filter 1:

Drag the Insert port and the other three ports (which came from the Source Qualifier via the Expression) into the first filter.

In the Properties tab specify the filter condition as Insert.

Filter 2:

Drag the Update port and the other four ports (which came from the Lookup via the Expression) into the second filter.

In the Properties tab specify the filter condition as Update.

Update Strategy: Finally we need the Update Strategy to insert or to update into the target table.
Update Strategy 1: This is intended to insert into the target table.

Drag all the ports except Insert from the first filter into this.

In the Properties tab specify the condition as 0 or dd_insert.

Update Strategy 2: This is intended to update the target table.

Drag all the ports except Update from the second filter into this.

In the Properties tab specify the condition as 1 or dd_update.

Finally, connect both Update Strategy transformations to the two instances of the target.
Step 3: Create a session for this mapping and run the workflow.
Step 4: Preview the output in the target table; it should be the same as the second target table shown above.
