
http://groups.yahoo.com/group/Informatica_World

Tuning Mappings for Better Performance


In general, mapping-level optimization takes time to implement but can
significantly boost performance. Sometimes the mapping is the biggest
bottleneck in the load process because business rules determine the number
and complexity of transformations in a mapping.

Before deciding on the best route to optimize the mapping architecture, you
need to resolve some basic issues. Tuning mappings is a tiered process. The
first tier can be of assistance almost universally, bringing about a performance
increase in all scenarios. The second tier of tuning processes may yield only a
small performance increase, or may be of significant value, depending on the
situation.

Some factors to consider when choosing tuning processes at the mapping
level include the specific environment, software/hardware limitations, and the
number of records going through a mapping. This Best Practice offers some
guidelines for tuning mappings.

Description
Analyze mappings for tuning only after you have tuned the system, source,
and target for peak performance. To optimize mappings, you generally reduce
the number of transformations in the mapping and delete unnecessary links
between transformations.

For transformations that use data cache (such as Aggregator, Joiner, Rank,
and Lookup transformations), limit connected input/output or output ports.
Doing so can reduce the amount of data the transformations store in the data
cache. Too many Lookups and Aggregators encumber performance because
each requires index cache and data cache. Since both are fighting for memory
space, decreasing the number of these transformations in a mapping can help
improve speed. Splitting them up into different mappings is another option.

Limit the number of Aggregators in a mapping. A high number of Aggregators
can increase I/O activity on the cache directory. Unless the seek/access time
is fast on the directory itself, having too many Aggregators can cause a
bottleneck. Similarly, too many Lookups in a mapping cause contention for
disk and memory, which can lead to thrashing, leaving insufficient memory to
run a mapping efficiently.

Consider Single-Pass Reading

If several mappings use the same data source, consider a single-pass
reading. Consolidate separate mappings into one mapping with either a single
Source Qualifier Transformation or one set of Source Qualifier Transformations
as the data source for the separate data flows.

Similarly, if a function is used in several mappings, a single-pass reading will
reduce the number of times that function will be called in the session.

Optimize SQL Overrides


When SQL overrides are required in a Source Qualifier, Lookup
Transformation, or in the update override of a target object, be sure the SQL
statement is tuned. The extent to which and how SQL can be tuned depends
on the underlying source or target database system.
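
As a minimal sketch (the table and column names here are hypothetical), a
tuned Source Qualifier override selects only the ports the mapping actually
uses and pushes row filtering down to the database:

SELECT CUST_ID, CUST_NAME, CUST_STATUS
FROM CUSTOMERS
WHERE CUST_STATUS = 'ACTIVE'

Avoiding SELECT * and unneeded joins keeps the result set, and therefore the
data moved into the pipeline, as small as possible.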

Scrutinize Datatype Conversions

The PowerCenter Server automatically makes conversions between compatible
datatypes. When these conversions are performed unnecessarily, performance
slows. For example, if a mapping moves data from an Integer port to a
Decimal port, then back to an Integer port, the conversion may be
unnecessary.

In some instances, however, datatype conversions can help improve
performance. This is especially true when integer values are used in place of
other datatypes for performing comparisons using Lookup and Filter
transformations.

Eliminate Transformation Errors

Large numbers of evaluation errors significantly slow performance of the
PowerCenter Server. During transformation errors, the PowerCenter Server
engine pauses to determine the cause of the error, removes the row causing
the error from the data flow, and logs the error in the session log.

Transformation errors can be caused by many things, including conversion
errors, conflicting mapping logic, any condition that is specifically set up as an
error, and so on. The session log can help point out the cause of these errors.
If errors recur consistently for certain transformations, re-evaluate the
constraints for these transformations. Any source of errors should be traced
and eliminated.

Optimize Lookup Transformations

There are a number of ways to optimize Lookup transformations that are
set up in a mapping.

When to Cache Lookups


When caching is enabled, the PowerCenter Server caches the lookup table
and queries the lookup cache during the session. When this option is not
enabled, the PowerCenter Server queries the lookup table on a row-by-row
basis.

NOTE: All the tuning options mentioned in this Best Practice assume that
memory and cache sizing for lookups are sufficient to ensure that caches will
not page to disk. Practices regarding memory and cache sizing for Lookup
transformations are covered in Best Practice: Tuning Sessions for Better
Performance.

In general, if the lookup table needs less than 300MB of memory, lookup
caching should be enabled.


A better rule of thumb than memory size is to determine the size of the
potential lookup cache with regard to the number of rows expected to be
processed. Consider the following example.

In Mapping X, the source and lookup tables contain the following numbers of
records:

ITEMS (source): 5,000 records
MANUFACTURER: 200 records
DIM_ITEMS: 100,000 records

Number of Disk Reads

                               Cached Lookup    Un-cached Lookup
LKP_Manufacturer
  Build Cache                          200                    0
  Read Source Records                5,000                5,000
  Execute Lookup                         0                5,000
  Total # of Disk Reads              5,200               10,000

LKP_DIM_ITEMS
  Build Cache                      100,000                    0
  Read Source Records                5,000                5,000
  Execute Lookup                         0                5,000
  Total # of Disk Reads            105,000               10,000

Consider the case where MANUFACTURER is the lookup table. If the lookup
table is cached, it will take a total of 5,200 disk reads to build the cache and
execute the lookup. If the lookup table is not cached, it will take a total of
10,000 disk reads to execute the lookup. In this case, the number of records
in the lookup table is small in comparison with the number of times the
lookup is executed, so this lookup should be cached. This is the more likely
scenario.

Consider the case where DIM_ITEMS is the lookup table. If the lookup table is
cached, it will result in 105,000 total disk reads to build and execute the
lookup. If the lookup table is not cached, the disk reads would total 10,000.
In this case, the number of records in the lookup table is not small in
comparison with the number of times the lookup will be executed. Thus, the
lookup should not be cached.

Use the following eight-step method to determine whether a lookup should be
cached:

1. Code the lookup into the mapping.

2. Select a standard set of data from the source. For example, add a
where clause on a relational source to load a sample 10,000 rows.

3. Run the mapping with caching turned off and save the log.

4. Run the mapping with caching turned on and save the log to a
different name than the log created in step 3.

5. Look in the cached lookup log and determine how long it takes to
cache the lookup object. Note this time in seconds: LOOKUP TIME IN
SECONDS = LS.

6. In the non-cached log, take the time from the last lookup cache to the
end of the load in seconds and divide it into the number of rows being
processed: NON-CACHED ROWS PER SECOND = NRS.

7. In the cached log, take the time from the last lookup cache to the end
of the load in seconds and divide it into the number of rows being
processed: CACHED ROWS PER SECOND = CRS.

8. Use the following formula to find the breakeven row point:

(LS*NRS*CRS)/(CRS-NRS) = X

Where X is the breakeven point. If the expected number of source records is
less than X, it is better not to cache the lookup. If the expected number of
source records is more than X, it is better to cache the lookup.

For example:

Assume the lookup takes 166 seconds to cache (LS = 166).
Assume with a cached lookup the load is 232 rows per second (CRS = 232).
Assume with a non-cached lookup the load is 147 rows per second (NRS = 147).

The formula would result in: (166*147*232)/(232-147) = 66,603.

Thus, if the source has less than 66,603 records, the lookup should not
be cached. If it has more than 66,603 records, then the lookup should
be cached.


Sharing Lookup Caches

There are a number of methods for sharing lookup caches.

• Within a specific session run for a mapping, if the same lookup is
used multiple times, the PowerCenter Server will re-use the cache for
the multiple instances of the lookup. Using the same lookup multiple
times in the mapping will be more resource intensive with each
successive instance. If multiple cached lookups are from the same
table but are expected to return different columns of data, it may be
better to set up the multiple lookups to bring back the same columns
even though not all return ports are used in all lookups. Bringing back
a common set of columns may reduce the number of disk reads.

• Across sessions of the same mapping, the use of an unnamed
persistent cache allows multiple runs to use an existing cache file
stored on the PowerCenter Server. If the option of creating a persistent
cache is set in the lookup properties, the memory cache created for
the lookup during the initial run is saved to the PowerCenter Server.
This can improve performance because the Server builds the memory
cache from cache files instead of from the database. This feature
should only be used when the lookup table is not expected to change
between session runs.

• Across different mappings and sessions, the use of a named
persistent cache allows sharing of an existing cache file.

Reducing the Number of Cached Rows

There is an option to use a SQL override in the creation of a lookup cache.
Options can be added to the WHERE clause to reduce the set of records
included in the resulting cache.

NOTE: If you use a SQL override in a lookup, the lookup must be cached.
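
As a sketch (the table and column names are hypothetical), a lookup SQL
override along these lines caches only the current rows rather than the entire
table:

SELECT PROD_ID, PROD_NAME
FROM PRODUCTS
WHERE CURRENT_FLAG = 'Y'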

Optimizing the Lookup Condition

When a lookup uses more than one lookup condition, place the conditions
that use an equal sign first in order to optimize lookup performance.

Indexing the Lookup Table

The PowerCenter Server must query, sort and compare values in the lookup
condition columns. As a result, indexes on the database table should include
every column used in a lookup condition. This can improve performance for
both cached and un-cached lookups.

• In the case of a cached lookup, an ORDER BY condition is issued in the SQL
statement used to create the cache. Columns used in the ORDER BY condition
should be indexed. The session log will contain the ORDER BY statement.


• In the case of an un-cached lookup, since a SQL statement is created for
each row passing into the Lookup transformation, performance can be helped
by indexing columns in the lookup condition.
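
For example (the index, table, and column names are hypothetical), if a
lookup condition matches on CUST_ID and REGION_CODE against a
CUSTOMERS table, an index covering both columns supports both the
row-by-row lookup query and the ORDER BY issued when building the cache:

CREATE INDEX IDX_CUST_LOOKUP ON CUSTOMERS (CUST_ID, REGION_CODE);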

Optimize Filter and Router Transformations


Filtering data as early as possible in the data flow improves the
efficiency of a mapping. Instead of using a Filter Transformation to remove a
sizeable number of rows in the middle or end of a mapping, use a filter on the
Source Qualifier or a Filter Transformation immediately after the Source
Qualifier to improve performance.
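
For instance, rather than filtering rows mid-mapping, a condition can be
entered in the Source Qualifier's Source Filter property so the database never
returns the unwanted rows (the table and column names are hypothetical):

ORDERS.ORDER_STATUS = 'OPEN'

The PowerCenter Server appends this condition to the WHERE clause of the
query it generates against the source.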

Avoid complex expressions when creating the filter condition. Filter
transformations are most effective when a simple integer or TRUE/FALSE
expression is used in the filter condition.

Filters or Routers should also be used to drop rejected rows from an
Update Strategy transformation if rejected rows do not need to be saved.

Replace multiple Filter transformations with a Router transformation.
This reduces the number of transformations in the mapping and makes the
mapping easier to follow.

Optimize Aggregator Transformations


Aggregator Transformations often slow performance because they must group
data before processing it.

Use simple columns in the group by condition to make the Aggregator
Transformation more efficient. When possible, use numbers instead of strings
or dates in the GROUP BY columns. Also avoid complex expressions in the
Aggregator expressions, especially in GROUP BY ports.

Use the Sorted Input option in the Aggregator. This option requires that
data sent to the Aggregator be sorted in the order in which the ports are used
in the Aggregator's group by. The Sorted Input option decreases the use of
aggregate caches. When it is used, the PowerCenter Server assumes all data
is sorted by group and, as a group is passed through an Aggregator,
calculations can be performed and information passed on to the next
transformation. Without sorted input, the Server must wait for all rows of
data before processing aggregate calculations. Use of the Sorted Input
option is usually accompanied by a Source Qualifier which uses the Number of
Sorted Ports option.
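
As a sketch (the table and column names are hypothetical), if the Aggregator
groups by CUST_ID and then ORDER_DATE, the Source Qualifier override
would sort on the same columns in the same order:

SELECT CUST_ID, ORDER_DATE, ORDER_AMT
FROM ORDERS
ORDER BY CUST_ID, ORDER_DATE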

Use an Expression and Update Strategy instead of an Aggregator
Transformation. This technique can only be used if the source data can be
sorted. Further, using this option assumes that a mapping is using an
Aggregator with the Sorted Input option. In the Expression Transformation,
the use of variable ports is required to hold data from the previous row
processed. The premise is to use the previous row of data to determine
whether the current row is part of the current group or is the beginning of a
new group. Thus, if the row is part of the current group, its data would be
used to continue calculating the current group function. An Update
Strategy Transformation would follow the Expression Transformation and set
the first row of a new group to insert and the following rows to update.
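
A minimal sketch of the Expression Transformation logic, using hypothetical
port names and a running total as the group function (first-row initialization
is omitted; PowerCenter evaluates variable ports top to bottom, so
v_PREV_CUST_ID still holds the previous row's key when v_NEW_GROUP is
evaluated):

v_NEW_GROUP = IIF(CUST_ID != v_PREV_CUST_ID, 1, 0)
v_RUNNING_TOTAL = IIF(v_NEW_GROUP = 1, ORDER_AMT, v_RUNNING_TOTAL + ORDER_AMT)
v_PREV_CUST_ID = CUST_ID
o_NEW_GROUP (output port) = v_NEW_GROUP
o_RUNNING_TOTAL (output port) = v_RUNNING_TOTAL

The downstream Update Strategy expression would then be
IIF(o_NEW_GROUP = 1, DD_INSERT, DD_UPDATE).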

Optimize Joiner Transformations


Joiner transformations can slow performance because they need additional
space in memory at run time to hold intermediate results.

Define the rows from the smaller set of data in the joiner as the
Master rows. The Master rows are cached to memory and the detail records
are then compared to rows in the cache of the Master rows. In order to
minimize memory requirements, the smaller set of data should be cached and
thus set as Master.

Use Normal joins whenever possible. Normal joins are faster than outer
joins and the resulting set of data is also smaller.

Use the database to do the join when sourcing data from the same
database schema. Database systems usually can perform the join more
quickly than the Informatica Server, so a SQL override or a join condition
should be used when joining multiple tables from the same database schema.
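
As a sketch (the schema and names are hypothetical), a single Source
Qualifier with a SQL override can push the join into the database:

SELECT O.ORDER_ID, O.ORDER_AMT, C.CUST_NAME
FROM ORDERS O, CUSTOMERS C
WHERE O.CUST_ID = C.CUST_ID

The same effect can be achieved by entering the join condition in the Source
Qualifier's User Defined Join property.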

Optimize Sequence Generator Transformations

Sequence Generator transformations need to determine the next available
sequence number; thus, increasing the Number of Cached Values property can
increase performance. This property determines the number of values the
Informatica Server caches at one time. If it is set to cache no values, the
Informatica Server must query the Informatica repository each time to
determine the next available number. Consider configuring the Number of
Cached Values to a value greater than 1,000. Note that any cached values not
used in the course of a session are lost, since the sequence generator value
in the repository is set so that the next call returns the next set of cached
values.

Avoid External Procedure Transformations

For the most part, making calls to external procedures slows down a session.
If possible, avoid the use of these Transformations, which include Stored
Procedures, External Procedures and Advanced External Procedures.

Field Level Transformation Optimization

As a final step in the tuning process, expressions used in transformations can
be tuned. When examining expressions, focus on complex expressions for
possible simplification.

To help isolate slow expressions, do the following:

1. Time the session with the original expression.

2. Copy the mapping and replace half the complex expressions with a
constant.

3. Run and time the edited session.


4. Make another copy of the mapping and replace the other half of the
complex expressions with a constant.

5. Run and time the edited session.

Processing field-level transformations takes time. If the transformation
expressions are complex, processing will be slower. It's often possible to
get a 10-20% performance improvement by optimizing complex field-level
transformations. Use the target table mapping reports or the Metadata
Reporter to examine the transformations. Likely candidates for optimization
are the fields with the most complex expressions. Keep in mind that there
may be more than one field causing performance problems.

Factoring out Common Logic

This can reduce the number of times a mapping performs the same logic. If a
mapping performs the same logic multiple times, moving the task upstream in
the mapping may allow the logic to be done just once. For example, suppose
a mapping has five target tables, and each target requires a Social Security
Number lookup. Instead of performing the lookup right before each target,
move the lookup to a position before the data flow splits.

Minimize Function Calls

Any time a function is called, it takes resources to process. There are several
common examples where function calls can be reduced or eliminated.

Aggregate function calls can sometimes be reduced. In the case of each
aggregate function call, the Informatica Server must search and group the
data.

Thus the following expression:

SUM(Column A) + SUM(Column B)

Can be optimized to:

SUM(Column A + Column B)

In general, operators are faster than functions, so operators should be
used whenever possible.

For example, if you have an expression which involves a CONCAT function
such as:

CONCAT(CONCAT(FIRST_NAME, ' '), LAST_NAME)

It can be optimized to:

FIRST_NAME || ' ' || LAST_NAME

Remember that IIF() is a function that returns a value, not just a
logical test. This allows many logical statements to be written in a more
compact fashion.

For example:


IIF(FLG_A='Y' and FLG_B='Y' and FLG_C='Y', VAL_A+VAL_B+VAL_C,
IIF(FLG_A='Y' and FLG_B='Y' and FLG_C='N', VAL_A+VAL_B,
IIF(FLG_A='Y' and FLG_B='N' and FLG_C='Y', VAL_A+VAL_C,
IIF(FLG_A='Y' and FLG_B='N' and FLG_C='N', VAL_A,
IIF(FLG_A='N' and FLG_B='Y' and FLG_C='Y', VAL_B+VAL_C,
IIF(FLG_A='N' and FLG_B='Y' and FLG_C='N', VAL_B,
IIF(FLG_A='N' and FLG_B='N' and FLG_C='Y', VAL_C,
IIF(FLG_A='N' and FLG_B='N' and FLG_C='N', 0.0))))))))

Can be optimized to:

IIF(FLG_A='Y', VAL_A, 0.0) + IIF(FLG_B='Y', VAL_B, 0.0) +
IIF(FLG_C='Y', VAL_C, 0.0)

The original expression had 8 IIFs, 16 ANDs, and 24 comparisons. The
optimized expression results in 3 IIFs, 3 comparisons, and 2 additions.

Be creative in making expressions more efficient. The following is an example
of reworking an expression to reduce three comparisons to one:

For example:

IIF(X=1 OR X=5 OR X=9, 'yes', 'no')

Can be optimized to:

IIF(MOD(X, 4) = 1, 'yes', 'no')

(Note that the two expressions agree only while X stays within a range such
as 0 through 12; larger values like 13 also satisfy MOD(X, 4) = 1, so confirm
the domain of X before using this rework.)

Calculate Once, Use Many Times

Avoid calculating or testing the same value multiple times. If the same sub-
expression is used several times in a transformation, consider making the
sub-expression a local variable. The local variable can be used only within the
transformation, but calculating it only once can speed performance.
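
For example (the port names are hypothetical), instead of evaluating
SUBSTR(ACCT_NO, 1, 2) separately in several output ports, assign it once to
a variable port:

v_ACCT_PREFIX = SUBSTR(ACCT_NO, 1, 2)

and have each output port reference v_ACCT_PREFIX instead of repeating the
SUBSTR call.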

Choose Numeric versus String Operations

The Informatica Server processes numeric operations faster than string
operations. For example, if a lookup is done on a large amount of data on two
columns, EMPLOYEE_NAME and EMPLOYEE_ID, configuring the lookup around
EMPLOYEE_ID improves performance.

Optimizing Char-Char and Char-Varchar Comparisons

When the Informatica Server performs comparisons between CHAR and
VARCHAR columns, it slows each time it finds trailing blank spaces in the
row. The Treat CHAR as CHAR On Read option can be set in the Informatica
Server setup so that the Informatica Server does not trim trailing spaces from
the end of CHAR source fields.


Use DECODE instead of LOOKUP

When a LOOKUP function is used, the Informatica Server must look up a table
in the database. When a DECODE function is used, the lookup values are
incorporated into the expression itself, so the Informatica Server does not
need to look up a separate table. Thus, when looking up a small set of
unchanging values, using DECODE may improve performance.
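
For example, a small, static code set can be resolved inline (the values shown
are hypothetical):

DECODE(STATE_CODE,
       'CA', 'California',
       'NY', 'New York',
       'TX', 'Texas',
       'Unknown')

The final argument acts as the default returned when no search value
matches.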

Reduce the Number of Transformations in a Mapping

Whenever possible, the number of transformations should be reduced, as
there is always overhead involved in moving data between transformations.
Along the same lines, unnecessary links between transformations should be
removed to minimize the amount of data moved. This is especially important
with data being pulled from the Source Qualifier Transformation.

Use Pre- and Post-Session SQL Commands

You can specify pre- and post-session SQL commands in the Properties tab of
the Source Qualifier transformation and in the Properties tab of the target
instance in a mapping. To increase the load speed, use these commands to
drop indexes on the target before the session runs, then recreate them when
the session completes.
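
For example (the index and table names are hypothetical, and the exact DDL
varies by database; Oracle-style syntax is shown), the target's pre-session
SQL could contain:

DROP INDEX IDX_SALES_FACT;

and its post-session SQL could rebuild the index after the load:

CREATE INDEX IDX_SALES_FACT ON SALES_FACT (SALE_DATE);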

Apply the following guidelines when using the SQL statements:

• You can use any command that is valid for the database type.
However, the PowerCenter Server does not allow nested comments,
even though the database might.

• You can use mapping parameters and variables in SQL executed
against the source, but not against the target.

• Use a semi-colon (;) to separate multiple statements.

• The PowerCenter Server ignores semi-colons within single quotes,
double quotes, or within /* ... */ comments.

• If you need to use a semi-colon outside of quotes or comments, you
can escape it with a backslash (\).

• The Workflow Manager does not validate the SQL.

Use Environmental SQL

For relational databases, you can execute SQL commands in the database
environment when connecting to the database. You can use this for source,
target, lookup, and stored procedure connection. For instance, you can set
isolation levels on the source and target systems to avoid deadlocks. Follow
the guidelines mentioned above for using the SQL statements.
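
For instance (Oracle-style syntax is shown as an assumption; the exact
statement depends on the database), the connection's environment SQL might
contain:

ALTER SESSION SET ISOLATION_LEVEL = READ COMMITTED;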


http://groups.yahoo.com/group/Informatica_World/

Materials and FAQs devoted purely to data warehousing can be found at:

http://groups.yahoo.com/group/DataWarehouse_World/

http://groups.yahoo.com/group/DWH_World/

We wish you a happy career in Informatica and data warehousing.

Thanks and Regards

Moderator Team.

