Before deciding on the best route to optimize the mapping architecture, you
need to resolve some basic issues. Tuning mappings is a tiered process. The
first tier can be of assistance almost universally, bringing about a performance
increase in all scenarios. The second tier of tuning processes may yield only
a small performance increase, or may be of significant value, depending on the
situation.
Description
Analyze mappings for tuning only after you have tuned the system, source,
and target for peak performance. To optimize mappings, you generally reduce
the number of transformations in the mapping and delete unnecessary links
between transformations.
For transformations that use data cache (such as Aggregator, Joiner, Rank,
and Lookup transformations), limit connected input/output or output ports.
Doing so can reduce the amount of data the transformations store in the data
cache. Too many Lookup and Aggregator transformations encumber performance
because each requires an index cache and a data cache. Since they all compete
for memory, decreasing the number of these transformations in a mapping can
help improve speed. Splitting them across different mappings is another option.
http://groups.yahoo.com/group/Informatica_World
NOTE: All the tuning options mentioned in this Best Practice assume that
memory and cache sizing for lookups are sufficient to ensure that caches will
not page to disks. Practices regarding memory and cache sizing for Lookup
transformations are covered in Best Practice: Tuning Sessions for Better
Performance.
In general, if the lookup table needs less than 300MB of memory, lookup
caching should be enabled.
A better rule of thumb than memory size is to determine the size of the
potential lookup cache with regard to the number of rows expected to be
processed. Consider the following example.
In Mapping X, the source and lookup tables contain the following numbers of
records:

ITEMS (source): 5,000 records
MANUFACTURER (via LKP_Manufacturer): 200 records
DIM_ITEMS (via LKP_DIM_ITEMS): 100,000 records
Consider the case where MANUFACTURER is the lookup table. If the lookup
table is cached, it takes a total of 5,200 disk reads to build the cache and
execute the lookup (200 rows to build the cache, plus 5,000 source rows). If
the lookup table is not cached, it takes a total of 10,000 disk reads (one
read and one lookup query per source row). In this case, the number of
records in the lookup table is small in comparison with the number of times
the lookup is executed, so this lookup should be cached. This is the more
likely scenario.
Consider the case where DIM_ITEMS is the lookup table. If the lookup table is
cached, it results in 105,000 total disk reads to build and execute the
lookup (100,000 rows to build the cache, plus 5,000 source rows). If the
lookup table is not cached, the disk reads total only 10,000. In this case,
the number of records in the lookup table is not small in comparison with the
number of times the lookup is executed, so this lookup should not be cached.
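The arithmetic behind these two cases can be sketched with a simple cost
model. This is a sketch only; it assumes one disk read per lookup-table row
to build the cache and, when uncached, one query per source row:

```python
def disk_reads(source_rows, lookup_rows, cached):
    """Rough disk-read cost of a lookup over one session run."""
    if cached:
        # Read every lookup-table row once to build the cache,
        # plus one read per source row.
        return lookup_rows + source_rows
    # Uncached: one read per source row, plus one lookup query
    # issued against the table for each source row.
    return 2 * source_rows

print(disk_reads(5000, 200, cached=True))      # 5200   (MANUFACTURER, cached)
print(disk_reads(5000, 200, cached=False))     # 10000  (MANUFACTURER, uncached)
print(disk_reads(5000, 100_000, cached=True))  # 105000 (DIM_ITEMS, cached)
```

Under this model, caching wins whenever the lookup table is smaller than the
number of source rows, as in the MANUFACTURER case.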
Use the following steps to determine whether a given lookup should be cached:
1. Create a test mapping that contains the lookup in question.
2. Select a standard set of data from the source. For example, add a
WHERE clause on a relational source to load a sample of 10,000 rows.
3. Run the mapping with caching turned off and save the log.
4. Run the mapping with caching turned on and save the log to a
different name than the log created in step 3.
5. Look in the cached lookup log and determine how long it takes to
cache the lookup object. Note this time in seconds: LOOKUP TIME IN
SECONDS = LS.
6. In the non-cached log, take the time from the last lookup cache to the
end of the load, in seconds, and divide it into the number of rows being
processed: NON-CACHED ROWS PER SECOND = NRS.
7. In the cached log, take the time from the last lookup cache to the end
of the load, in seconds, and divide it into the number of rows being
processed: CACHED ROWS PER SECOND = CRS.
Use these measurements to compute the break-even number of source rows:

(LS * NRS * CRS) / (CRS - NRS) = X

For example, with LS = 166 seconds, NRS = 147 rows per second, and CRS = 232
rows per second (illustrative values that yield the result quoted here):

(166 * 147 * 232) / (232 - 147) = 66,603 (approximately)

Thus, if the source has fewer than 66,603 records, the lookup should not
be cached. If it has more than 66,603 records, then the lookup should
be cached.
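As a quick sanity check, the break-even formula can be evaluated directly.
The timing values below are the same illustrative ones used above, not
measurements:

```python
def breakeven_rows(ls, nrs, crs):
    # X = (LS * NRS * CRS) / (CRS - NRS): the source row count at which
    # the per-row speedup from caching repays the cache-build time LS.
    return (ls * nrs * crs) / (crs - nrs)

# Illustrative inputs: LS = 166 s, NRS = 147 rows/s, CRS = 232 rows/s.
x = breakeven_rows(166, 147, 232)
print(round(x))  # 66603
```

Sources smaller than X favor the un-cached lookup; larger sources favor
caching.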
NOTE: If you use a SQL override in a lookup, the lookup must be cached.
In the case where a lookup uses more than one lookup condition, set the
conditions with an equal sign first in order to optimize lookup performance.
The PowerCenter Server must query, sort and compare values in the lookup
condition columns. As a result, indexes on the database table should include
every column used in a lookup condition. This can improve performance for
both cached and un-cached lookups.
In the case of an un-cached lookup, since a SQL statement is created for each
row passing into the Lookup transformation, performance can be helped by
indexing the columns in the lookup condition.
Use the Sorted Input option in the Aggregator. This option requires that
data sent to the Aggregator be sorted in the order in which the ports are used
in the Aggregator's group by. The Sorted Input option decreases the use of
aggregate caches. When it is used, the PowerCenter Server assumes all data
is sorted by group and, as each group passes through the Aggregator,
calculations can be performed and the results passed on to the next
transformation. Without sorted input, the Server must wait for all rows of
data before processing aggregate calculations. Use of the Sorted Input
option is usually accompanied by a Source Qualifier that uses the Number of
Sorted Ports option.
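The benefit of sorted input can be sketched in a few lines: when rows arrive
already grouped by key, each aggregate can be emitted as soon as its group
ends, instead of buffering the entire input. This is a conceptual sketch,
not PowerCenter internals:

```python
from itertools import groupby

def sorted_aggregate(rows):
    # rows must already be sorted by the group key (first field).
    # Each group's sum is emitted as soon as the key changes, so
    # only one group is held in memory at a time.
    for key, group in groupby(rows, key=lambda r: r[0]):
        yield key, sum(amount for _, amount in group)

rows = [('A', 10), ('A', 5), ('B', 7)]   # pre-sorted by group key
print(list(sorted_aggregate(rows)))      # [('A', 15), ('B', 7)]
```

Without the sort guarantee, an aggregator cannot emit anything until it has
seen the last row, which is why unsorted aggregation needs a full data cache.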
Define the rows from the smaller set of data in the joiner as the
Master rows. The Master rows are cached to memory and the detail records
are then compared to rows in the cache of the Master rows. In order to
minimize memory requirements, the smaller set of data should be cached and
thus set as Master.
Use Normal joins whenever possible. Normal joins are faster than outer
joins and the resulting set of data is also smaller.
Use the database to do the join when sourcing data from the same
database schema. Database systems usually can perform the join more
quickly than the Informatica Server, so a SQL override or a join condition
should be used when joining multiple tables from the same database schema.
For the most part, making calls to external procedures slows down a session.
If possible, avoid the use of these Transformations, which include Stored
Procedures, External Procedures and Advanced External Procedures.
To isolate a slow expression, compare timed runs of the session:
1. Run and time the session with the original mapping.
2. Copy the mapping and replace half the complex expressions with a
constant.
3. Run and time the edited session.
4. Make another copy of the mapping and replace the other half of the
complex expressions with a constant.
5. Run and time this second edited session.
Factor out common logic. This can reduce the number of times a mapping
performs the same work. If a mapping performs the same logic multiple times,
moving the task upstream in the mapping may allow the logic to be done just
once. For example, suppose a mapping has five target tables, and each target
requires a Social Security Number lookup. Instead of performing the lookup
right before each target, move the lookup to a position before the data flow
splits.
Minimize aggregate function calls where possible. For example, the
expression:

SUM(Column A) + SUM(Column B)

requires two aggregate passes, whereas the equivalent:

SUM(Column A + Column B)

requires only one.
Use operators instead of functions where possible. For example, if you have
an expression that involves a CONCAT function such as:

CONCAT(CONCAT(FIRST_NAME, ' '), LAST_NAME)

it can be rewritten using the || operator:

FIRST_NAME || ' ' || LAST_NAME
Avoid calculating or testing the same value multiple times. If the same sub-
expression is used several times in a transformation, consider making the
sub-expression a local variable. The local variable can be used only within
the transformation, but calculating its value only once can improve
performance.
When a LOOKUP function is used, the Informatica Server must look up a table
in the database. When a DECODE function is used, the lookup values are
incorporated into the expression itself, so the Informatica Server does not
need to query a separate table. Thus, when looking up a small set of
unchanging values, using DECODE may improve performance.
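The idea can be modeled outside PowerCenter: a DECODE over hard-coded pairs
behaves like an in-memory mapping evaluated per row, with no database round
trip. The state codes below are hypothetical, for illustration only:

```python
# In-memory equivalent of DECODE(STATE, 'CA', 'California', ...):
# the value pairs live in the expression itself, so resolving a code
# costs a dictionary probe instead of a per-row database query.
state_names = {'CA': 'California', 'NY': 'New York'}

def decode_state(code, default='Unknown'):
    return state_names.get(code, default)

print(decode_state('CA'))  # California
print(decode_state('TX'))  # Unknown
```

As with DECODE, this only pays off when the value set is small and stable;
a large or changing set belongs in a Lookup.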
You can specify pre- and post-session SQL commands in the Properties tab of
the Source Qualifier transformation and in the Properties tab of the target
instance in a mapping. To increase the load speed, use these commands to
drop indexes on the target before the session runs, then recreate them when
the session completes.
• You can use any command that is valid for the database type.
However, the PowerCenter Server does not allow nested comments,
even though the database might.
For relational databases, you can execute SQL commands in the database
environment when connecting to the database. You can use this for source,
target, lookup, and stored procedure connections. For instance, you can set
isolation levels on the source and target systems to avoid deadlocks. Follow
the guidelines mentioned above for using the SQL statements.
http://groups.yahoo.com/group/Informatica_World/
http://groups.yahoo.com/group/DataWarehouse_World/
http://groups.yahoo.com/group/DWH_World/
Moderator Team.