
Tuning Mappings for Better Performance

Challenge
In general, mapping-level optimization takes time to implement, but can significantly boost performance. Sometimes the mapping is the biggest bottleneck in the load process because business rules determine the number and complexity of transformations in a mapping. Before deciding on the best route to optimize the mapping architecture, you need to resolve some basic issues.

Tuning mappings is a tiered process. The first tier can be of assistance almost universally, bringing about a performance increase in all scenarios. The second tier of tuning processes may yield only a small performance increase, or can be of significant value, depending on the situation. Some factors to consider when choosing tuning processes at the mapping level include the specific environment, software/hardware limitations, and the number of records going through a mapping. This Best Practice offers some guidelines for tuning mappings.

Description
Analyze mappings for tuning only after you have tuned the system, source, and target for peak performance. To optimize mappings, you generally reduce the number of transformations in the mapping and delete unnecessary links between transformations.

For transformations that use data cache (such as Aggregator, Joiner, Rank, and Lookup transformations), limit connected input/output or output ports. Doing so can reduce the amount of data the transformations store in the data cache.

Too many Lookups and Aggregators encumber performance because each requires index cache and data cache. Since both are fighting for memory space, decreasing the number of these transformations in a mapping can help improve speed. Splitting them up into different mappings is another option.

Limit the number of Aggregators in a mapping. A high number of Aggregators can increase I/O activity on the cache directory. Unless the seek/access time is fast on the directory itself, having too many Aggregators can cause a bottleneck. Similarly, too many Lookups in a mapping cause contention for disk and memory, which can lead to thrashing, leaving insufficient memory to run the mapping efficiently.

Consider Single-Pass Reading

If several mappings use the same data source, consider a single-pass reading. Consolidate the separate mappings into one mapping with either a single Source Qualifier transformation or one set of Source Qualifier transformations as the data source for the separate data flows. Similarly, if a function is used in several mappings, a single-pass reading will reduce the number of times that function is called in the session.

Optimize SQL Overrides

When SQL overrides are required in a Source Qualifier, a Lookup transformation, or the update override of a target object, be sure the SQL statement is tuned. The extent to which and how SQL can be tuned depends on the underlying source or target database system. See the section Tuning SQL Overrides and Environment for Better Performance for more information.
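What a tuned statement looks like depends on the database, but as one hedged illustration (using a hypothetical ORDERS table and Oracle-style date functions, neither of which comes from this Best Practice), rewriting a predicate so the database can use an index on the filtered column is a typical improvement:

    -- Less efficient: the function wrapped around ORDER_DATE can prevent index use
    SELECT ORDER_ID, CUSTOMER_ID, ORDER_AMOUNT
    FROM   ORDERS
    WHERE  TO_CHAR(ORDER_DATE, 'YYYY') = '2002'

    -- Usually better: a range predicate on the bare column is index-friendly
    SELECT ORDER_ID, CUSTOMER_ID, ORDER_AMOUNT
    FROM   ORDERS
    WHERE  ORDER_DATE >= TO_DATE('01-01-2002', 'DD-MM-YYYY')
      AND  ORDER_DATE <  TO_DATE('01-01-2003', 'DD-MM-YYYY')

The same idea applies to Lookup and target update overrides: select only the columns the mapping actually needs and let the database do as much of the filtering as it can.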

Scrutinize Datatype Conversions

The PowerCenter Server automatically makes conversions between compatible datatypes. When these conversions are performed unnecessarily, performance slows. For example, if a mapping moves data from an Integer port to a Decimal port, then back to an Integer port, the conversion may be unnecessary.

In some instances, however, datatype conversions can help improve performance. This is especially true when integer values are used in place of other datatypes for performing comparisons using Lookup and Filter transformations.

Eliminate Transformation Errors

Large numbers of evaluation errors significantly slow performance of the PowerCenter Server. During transformation errors, the PowerCenter Server engine pauses to determine the cause of the error, removes the row causing the error from the data flow, and logs the error in the session log.

Transformation errors can be caused by many things, including conversion errors, conflicting mapping logic, and any condition that is specifically set up as an error. The session log can help point out the cause of these errors. If errors recur consistently for certain transformations, re-evaluate the constraints for those transformations. Any source of errors should be traced and eliminated.

Optimize Lookup Transformations

There are a number of ways to optimize Lookup transformations that are set up in a mapping.

When to Cache Lookups

When caching is enabled, the PowerCenter Server caches the lookup table and queries the lookup cache during the session. When this option is not enabled, the PowerCenter Server queries the lookup table on a row-by-row basis.

NOTE: All the tuning options mentioned in this Best Practice assume that memory and cache sizing for lookups are sufficient to ensure that caches will not page to disk. Practices regarding memory and cache sizing for Lookup transformations are covered in the Best Practice: Tuning Sessions for Better Performance.

In general, if the lookup table needs less than 300MB of memory, lookup caching should be enabled. A better rule of thumb than memory size is to determine the size of the potential lookup cache with regard to the number of rows expected to be processed. For example, in Mapping X the source and lookup tables contain the following number of records:
ITEMS (source):      5,000 records
MANUFACTURER:          200 records
DIM_ITEMS:         100,000 records

Number of Disk Reads

                           Cached Lookup    Un-cached Lookup
LKP_Manufacturer
  Build Cache                        200                   0
  Read Source Records              5,000               5,000
  Execute Lookup                       0               5,000
  Total # of Disk Reads            5,200              10,000
LKP_DIM_ITEMS
  Build Cache                    100,000                   0
  Read Source Records              5,000               5,000
  Execute Lookup                       0               5,000
  Total # of Disk Reads          105,000              10,000

Consider the case where MANUFACTURER is the lookup table. If the lookup table is cached, it will take a total of 5,200 disk reads to build the cache and execute the lookup. If the lookup table is not cached, it will take a total of 10,000 disk reads to execute the lookup. In this case, the number of records in the lookup table is small in comparison with the number of times the lookup is executed, so this lookup should be cached. This is the more likely scenario.

Consider the case where DIM_ITEMS is the lookup table. If the lookup table is cached, it will result in 105,000 total disk reads to build and execute the lookup. If the lookup table is not cached, the disk reads would total 10,000. In this case the number of records in the lookup table is not small in comparison with the number of times the lookup will be executed, so the lookup should not be cached.

Use the following eight-step method to determine if a lookup should be cached:

1. Code the lookup into the mapping.
2. Select a standard set of data from the source. For example, add a WHERE clause on a relational source to load a sample of 10,000 rows.
3. Run the mapping with caching turned off and save the log.
4. Run the mapping with caching turned on and save the log to a different name than the log created in step 3.
5. Look in the cached lookup log and determine how long it takes to cache the lookup object. Note this time in seconds: LOOKUP TIME IN SECONDS = LS.
6. In the non-cached log, take the time from the last lookup cache to the end of the load in seconds and divide it into the number of rows being processed: NON-CACHED ROWS PER SECOND = NRS.
7. In the cached log, take the time from the last lookup cache to the end of the load in seconds and divide it into the number of rows being processed: CACHED ROWS PER SECOND = CRS.
8. Use the following formula to find the break-even row point:

   (LS * NRS * CRS) / (CRS - NRS) = X

   where X is the break-even point. If the number of expected source records is less than X, it is better not to cache the lookup. If the number of expected source records is more than X, it is better to cache the lookup.

For example, assume the lookup takes 166 seconds to cache (LS = 166), the load with a cached lookup is 232 rows per second (CRS = 232), and the load with a non-cached lookup is 147 rows per second (NRS = 147). The formula gives (166 * 147 * 232) / (232 - 147) = 66,603. Thus, if the source has fewer than 66,603 records, the lookup should not be cached. If it has more than 66,603 records, the lookup should be cached.
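One way to see where this formula comes from: with the cache, processing R source rows takes roughly LS + R/CRS seconds; without it, the same rows take R/NRS seconds. Setting LS + R/CRS = R/NRS and solving for R gives R = (LS * NRS * CRS) / (CRS - NRS), which is the break-even point X used above.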
Sharing Lookup Caches

There are a number of methods for sharing lookup caches.

- Within a specific session run for a mapping, if the same lookup is used multiple times in the mapping, the PowerCenter Server will re-use the cache for the multiple instances of the lookup. Using the same lookup multiple times in the mapping will be more resource-intensive with each successive instance. If multiple cached lookups are from the same table but are expected to return different columns of data, it may be better to set up the multiple lookups to bring back the same columns, even though not all return ports are used in all lookups. Bringing back a common set of columns may reduce the number of disk reads.

- Across sessions of the same mapping, the use of an unnamed persistent cache allows multiple runs to use an existing cache file stored on the PowerCenter Server. If the option of creating a persistent cache is set in the lookup properties, the memory cache created for the lookup during the initial run is saved to the PowerCenter Server. This can improve performance because the Server builds the memory cache from the cache files instead of from the database. This feature should only be used when the lookup table is not expected to change between session runs.

- Across different mappings and sessions, the use of a named persistent cache allows sharing of an existing cache file.

Reducing the Number of Cached Rows

There is an option to use a SQL override in the creation of a lookup cache. Options can be added to the WHERE clause to reduce the set of records included in the resulting cache.

NOTE: If you use a SQL override in a lookup, the lookup must be cached.
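For example, if a lookup only ever needs to match current, active rows, an override along these lines keeps the cache small (the CUSTOMER_DIM table and its columns are hypothetical, used here only for illustration):

    SELECT CUSTOMER_ID,
           CUSTOMER_NAME
    FROM   CUSTOMER_DIM
    WHERE  CUSTOMER_STATUS = 'ACTIVE'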
Optimizing the Lookup Condition

In the case where a lookup uses more than one lookup condition, set the conditions with an equal sign first in order to optimize lookup performance.
Indexing the Lookup Table

The "o er.enter Server must +uery, sort and compare values in the lookup condition columns. #s a result, indexes on the database table should include every column used in a lookup condition. This can improve performance for both cached and un-cached lookups. F In the case of a cached lookup, an ,'*2' BG condition is issued in the S/( statement used to create the cache. .olumns used in the ,'*2' BG condition should be indexed. The session log ill contain the ,'*2' BG statement. F In the case of an un-cached lookup, since a S/( statement created for each ro passing into the lookup transformation, performance can be helped by indexing columns in the lookup condition. Optimize -ilter and Router Transformations -iltering data as earl! as possi&le in the data flo. improves the efficiency of a mapping. Instead of using a $ilter Transformation to remove a sizeable number of ro s in the middle or end of a mapping, use a filter on the Source /ualifier or a $ilter Transformation immediately after the source +ualifier to improve performance.

Optimize Filter and Router Transformations

Filtering data as early as possible in the data flow improves the efficiency of a mapping. Instead of using a Filter transformation to remove a sizeable number of rows in the middle or end of a mapping, use a filter on the Source Qualifier or a Filter transformation immediately after the Source Qualifier to improve performance.

Avoid complex expressions when creating the filter condition. Filter transformations are most effective when a simple integer or TRUE/FALSE expression is used in the filter condition.

Filters or routers should also be used to drop rejected rows from an Update Strategy transformation if rejected rows do not need to be saved.

Replace multiple Filter transformations with a Router transformation. This reduces the number of transformations in the mapping and makes the mapping easier to follow.

Optimize Aggregator Transformations

Aggregator transformations often slow performance because they must group data before processing it.

Use simple columns in the group by condition to make the Aggregator transformation more efficient. When possible, use numbers instead of strings or dates in the GROUP BY columns. Also avoid complex expressions in the Aggregator expressions, especially in GROUP BY ports.

Use the Sorted Input option in the Aggregator. This option requires that data sent to the Aggregator be sorted in the order in which the ports are used in the Aggregator's group by. The Sorted Input option decreases the use of aggregate caches. When it is used, the PowerCenter Server assumes all data is sorted by group and, as a group is passed through an Aggregator, calculations can be performed and information passed on to the next transformation. Without sorted input, the Server must wait for all rows of data before processing aggregate calculations. Use of the Sorted Input option is usually accompanied by a Source Qualifier which uses the Number of Sorted Ports option.

Use an Expression and an Update Strategy instead of an Aggregator transformation. This technique can only be used if the source data can be sorted, and it assumes that the mapping would otherwise use an Aggregator with the Sorted Input option. In the Expression transformation, variable ports are required to hold data from the previous row processed. The premise is to use the previous row of data to determine whether the current row is part of the current group or is the beginning of a new group: if the row is part of the current group, its data is used to continue calculating the current group function. An Update Strategy transformation then follows the Expression transformation and sets the first row of a new group to insert and the following rows to update.

Optimize Joiner Transformations

Joiner transformations can slow performance because they need additional space in memory at run time to hold intermediate results.

Define the rows from the smaller set of data in the Joiner as the master rows. The master rows are cached to memory and the detail records are then compared to rows in the cache of the master rows. In order to minimize memory requirements, the smaller set of data should be cached and thus set as master.

Use Normal joins whenever possible. Normal joins are faster than outer joins and the resulting set of data is also smaller.

Use the database to do the join when sourcing data from the same database schema. Database systems usually can perform the join more quickly than the Informatica Server, so a SQL override or a join condition should be used when joining multiple tables from the same database schema.
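For instance, rather than joining two sources from the same schema with a Joiner transformation, the join can be pushed into a single Source Qualifier override along these lines (ORDERS and CUSTOMERS are hypothetical tables used only to illustrate the technique):

    SELECT O.ORDER_ID,
           O.ORDER_AMOUNT,
           C.CUSTOMER_NAME
    FROM   ORDERS O,
           CUSTOMERS C
    WHERE  C.CUSTOMER_ID = O.CUSTOMER_ID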

Optimize Sequence Generator Transformations

Sequence Generator transformations need to determine the next available sequence number; thus, increasing the Number of Cached Values property can increase performance. This property determines the number of values the Informatica Server caches at one time. If it is set to cache no values, the Informatica Server must query the Informatica repository each time to determine the next available number. Consider configuring the Number of Cached Values to a value greater than 1,000. Note that any cached values not used in the course of a session are lost, since the sequence generator value in the repository is set, when it is next called, to give out the next set of cached values.

Avoid External Procedure Transformations

For the most part, making calls to external procedures slows down a session. If possible, avoid the use of these transformations, which include Stored Procedure, External Procedure, and Advanced External Procedure transformations.

Field Level Transformation Optimization

As a final step in the tuning process, expressions used in transformations can be tuned. When examining expressions, focus on complex expressions for possible simplification. To help isolate slow expressions, do the following:

1. Time the session with the original expression.
2. Copy the mapping and replace half the complex expressions with a constant.
3. Run and time the edited session.
4. Make another copy of the mapping and replace the other half of the complex expressions with a constant.
5. Run and time the edited session.

Processing field-level transformations takes time. If the transformation expressions are complex, processing will be slower. It is often possible to get a 10-20% performance improvement by optimizing complex field-level transformations.

Use the target table mapping reports or the Metadata Reporter to examine the transformations. Likely candidates for optimization are the fields with the most complex expressions. Keep in mind that there may be more than one field causing performance problems.

Factoring Out Common Logic

Factoring out common logic can reduce the number of times a mapping performs the same logic. If a mapping performs the same logic multiple times, moving the task upstream in the mapping may allow the logic to be done just once. For example, a mapping has five target tables, and each target requires a Social Security Number lookup. Instead of performing the lookup right before each target, move the lookup to a position before the data flow splits.

Minimize Function Calls

Anytime a function is called it takes resources to process. There are several common examples where function calls can be reduced or eliminated.

Aggregate function calls can sometimes be reduced. In the case of each aggregate function call, the Informatica Server must search and group the data. Thus, the following expression:

SUM(Column A) + SUM(Column B)

can be optimized to:

SUM(Column A + Column B)

In general, operators are faster than functions, so operators should be used whenever possible. For example, an expression that involves a CONCAT function such as:

CONCAT(CONCAT(FIRST_NAME, ' '), LAST_NAME)

can be optimized to:

FIRST_NAME || ' ' || LAST_NAME

Remember that IIF() is a function that returns a value, not just a logical test. This allows many logical statements to be written in a more compact fashion. For example:

IIF(FLG_A = 'Y' and FLG_B = 'Y' and FLG_C = 'Y', VAL_A + VAL_B + VAL_C,
 IIF(FLG_A = 'Y' and FLG_B = 'Y' and FLG_C = 'N', VAL_A + VAL_B,
  IIF(FLG_A = 'Y' and FLG_B = 'N' and FLG_C = 'Y', VAL_A + VAL_C,
   IIF(FLG_A = 'Y' and FLG_B = 'N' and FLG_C = 'N', VAL_A,
    IIF(FLG_A = 'N' and FLG_B = 'Y' and FLG_C = 'Y', VAL_B + VAL_C,
     IIF(FLG_A = 'N' and FLG_B = 'Y' and FLG_C = 'N', VAL_B,
      IIF(FLG_A = 'N' and FLG_B = 'N' and FLG_C = 'Y', VAL_C,
       IIF(FLG_A = 'N' and FLG_B = 'N' and FLG_C = 'N', 0.0))))))))

can be optimized to:

IIF(FLG_A = 'Y', VAL_A, 0.0) + IIF(FLG_B = 'Y', VAL_B, 0.0) + IIF(FLG_C = 'Y', VAL_C, 0.0)

The original expression has 8 IIFs, 16 ANDs, and 24 comparisons. The optimized expression results in 3 IIFs, 3 comparisons, and two additions.

Be creative in making expressions more efficient. The following is an example of reworking an expression to reduce three comparisons to one:

IIF(X=1 OR X=5 OR X=9, 'yes', 'no')

can be optimized to:

IIF(MOD(X, 4) = 1, 'yes', 'no')

Calculate Once, Use Many Times

Avoid calculating or testing the same value multiple times. If the same sub-expression is used several times in a transformation, consider making the sub-expression a local variable. The local variable can be used only within the transformation, but by calculating the variable only once you can speed performance.

Choose Numeric Versus String Operations

The Informatica Server processes numeric operations faster than string operations. For example, if a lookup is done on a large amount of data on two columns, EMPLOYEE_NAME and EMPLOYEE_ID, configuring the lookup around EMPLOYEE_ID improves performance.

Optimizing Char-Char and Char-Varchar Comparisons

When the Informatica Server performs comparisons between CHAR and VARCHAR columns, it slows each time it finds trailing blank spaces in the row. The Treat CHAR as CHAR On Read option can be set in the Informatica Server setup so that the Informatica Server does not trim trailing spaces from the end of CHAR source fields.

Use DECODE Instead of LOOKUP

When a LOOKUP function is used, the Informatica Server must look up a table in the database. When a DECODE function is used, the lookup values are incorporated into the expression itself, so the Informatica Server does not need to look up a separate table. Thus, when looking up a small set of unchanging values, using DECODE may improve performance.

Reduce the Number of Transformations in a Mapping

Whenever possible, the number of transformations should be reduced, as there is always overhead involved in moving data between transformations. Along the same lines, unnecessary links between transformations should be removed to minimize the amount of data moved. This is especially important with data being pulled from the Source Qualifier transformation.

Use Pre- and Post-Session SQL Commands

You can specify pre- and post-session SQL commands in the Properties tab of the Source Qualifier transformation and in the Properties tab of the target instance in a mapping. To increase the load speed, use these commands to drop indexes on the target before the session runs, then recreate them when the session completes (a sketch of this approach follows the guidelines below).

Apply the following guidelines when using the SQL statements:

- You can use any command that is valid for the database type. However, the PowerCenter Server does not allow nested comments, even though the database might.
- You can use mapping parameters and variables in SQL executed against the source, but not against the target.
- Use a semi-colon (;) to separate multiple statements.

The "o er.enter Server ignores semi-colons ithin single +uotes, double +uotes, or ithin !E ...E!. If you need to use a semi-colon outside of +uotes or comments, you can escape it ith a back slash %P). The 0orkflo 6anager does not validate the S/(.

Use Environmental SQL

For relational databases, you can execute SQL commands in the database environment when connecting to the database. You can use this for source, target, lookup, and stored procedure connections. For instance, you can set isolation levels on the source and target systems to avoid deadlocks. Follow the guidelines mentioned above for using the SQL statements.
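For example, a connection's environment SQL might relax the isolation level for a source that tolerates dirty reads (SQL Server-style syntax shown as an assumption; the equivalent statement differs on other databases and should only be used where reading uncommitted data is acceptable):

    SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED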
