
PERFORMANCE TUNING OF LOOKUP TRANSFORMATIONS:

Lookup transformations are used to look up a set of values in another table. Lookups slow down performance.

1. To improve performance, cache the lookup tables. Informatica can cache all the lookup and reference tables, which makes operations run much faster. (Caching is explained in point 2 of this section, and the procedure for determining the optimum cache size is given at the end of this document.)

2. Even after caching, performance can be improved further by minimizing the size of the lookup cache. Reduce the number of cached rows by using a SQL override with a restriction.

   Cache: A cache stores data in memory so that Informatica does not have to read the table each time it is referenced. This reduces processing time to a large extent. Informatica generates the cache automatically, either from the marked lookup ports or from a user-defined SQL query.

   Example of caching with a user-defined query: Suppose we need to look up records where employee_id = eno. employee_id comes from the lookup table, EMPLOYEE_TABLE, and eno is the input that comes from the source table, SUPPORT_TABLE. We put the following SQL override in the Lookup transformation:

       select employee_id from EMPLOYEE_TABLE

   If there are 50,000 employee_id values, the lookup cache will hold 50,000 rows. Instead of the above query, we use the following:

       select e.employee_id from EMPLOYEE_TABLE e, SUPPORT_TABLE s where e.employee_id = s.eno

   If there are 1,000 eno values, the lookup cache will hold only 1,000 rows. This gain is realized only if the number of records in SUPPORT_TABLE is not huge. The goal is to keep the cache as small as possible.

3. In lookup tables, delete all unused columns and keep only the fields that are used in the mapping.

4. If possible, replace lookups with a Joiner transformation or a single Source Qualifier. A Joiner transformation takes more time than a Source Qualifier transformation.

5. If the Lookup transformation specifies several conditions, place the conditions that use the equality operator (=) first in the Conditions tab.

6. The SQL override query of the lookup table contains an ORDER BY clause. Remove it if it is not needed, or list fewer columns in the ORDER BY clause.

7. Do not use caching in the following cases:
   - The source is small and the lookup table is large.
   - The lookup is done on the primary key of the lookup table.

8. Always cache the lookup table columns in the following case:
   - The lookup table is small and the source is large.

9. If the lookup data is static, use a persistent cache. Persistent caches save cache files so that they can be reused. If several sessions in the same job use the same lookup table, a persistent cache lets those sessions reuse the cache files. For static lookups, the memory cache is built from the saved cache files instead of from the database, which improves performance.

10. If both the source and the lookup table are huge, also use a persistent cache.

11. If the target table is the lookup table, use a dynamic cache. The Informatica server updates the lookup cache as it passes rows to the target.

12. Use only the lookups that the mapping needs. Too many lookups inside a mapping slow down the session.

13. If the lookup table holds a lot of data, it takes too long to cache or does not fit in memory. In that case, move those fields to the Source Qualifier and join them with the main table there.

14. If several lookups use the same data set, share their caches.

15. If the lookup needs to return only one value (a single return port), use an unconnected lookup.

16. All data is read into the cache in the order in which the fields are listed in the lookup ports. If an index exists that is even partially in this order, loading these lookups can be sped up.

17. If the table used for the lookup has an index (or if we have the privilege to add an index to the table in the database, we should do so), performance improves for both cached and uncached lookups.
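The effect of the restricted SQL override from point 2 can be illustrated outside Informatica. The following is a minimal sketch, assuming an in-memory SQLite database as a stand-in for the lookup source and hypothetical row counts that match the example (50,000 employees, 1,000 source records). The DISTINCT keyword is an addition not present in the original query; it only guards against repeated eno values. The sketch counts how many rows each query would hand to the cache.

```python
import sqlite3

# In-memory SQLite database as a stand-in for the lookup source database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE EMPLOYEE_TABLE (employee_id INTEGER, name TEXT)")
conn.execute("CREATE TABLE SUPPORT_TABLE (eno INTEGER)")

# Hypothetical data: 50,000 employees, 1,000 of which occur in the source.
conn.executemany(
    "INSERT INTO EMPLOYEE_TABLE VALUES (?, ?)",
    [(i, f"emp_{i}") for i in range(1, 50001)],
)
conn.executemany(
    "INSERT INTO SUPPORT_TABLE VALUES (?)",
    [(i,) for i in range(1, 1001)],
)

# Unrestricted override: every employee_id would be cached.
full_rows = conn.execute("SELECT employee_id FROM EMPLOYEE_TABLE").fetchall()

# Restricted override: only employee_id values that occur in the source.
# DISTINCT is an assumption added here in case eno values repeat.
restricted_rows = conn.execute(
    "SELECT DISTINCT e.employee_id "
    "FROM EMPLOYEE_TABLE e, SUPPORT_TABLE s "
    "WHERE e.employee_id = s.eno"
).fetchall()

print(len(full_rows), len(restricted_rows))
```

The join itself costs time on the database side, which is why the gain holds only while SUPPORT_TABLE stays reasonably small.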

Disabling Lookup Cache for very large Lookups


Informatica uses the lookup cache to store the lookup data on the ETL tier in flat files (dat and idx). The Integration Service builds the cache in memory when it processes the first row of data in the cached Lookup transformation. If the lookup data is small, it can be held in memory and the transformation processes rows very fast. But if the lookup data is very large (typically over 20M), it cannot fit into the allocated memory and has to be paged in and out many times during a single session. As a result, such Lookup transformations adversely affect the overall mapping performance. Additionally, Informatica takes more time to build such large lookups.

If constraining a large lookup is not possible, consider disabling the lookup cache. Connect to Informatica Workflow Manager, open the session properties, and find the desired transformation in the Transformations folder on the Mapping tab. Then uncheck the Lookup Cache Enabled property and save the session. Disabling the lookup cache for heavy lookups helps to avoid excessive paging on the ETL tier.

When the lookup cache is disabled, the Integration Service issues a select statement against the lookup source database to retrieve the lookup values for each row from the Reader Thread. It does not store any data in its flat files on the ETL tier. The issued lookup query uses bind variables, so it is parsed only once in the lookup source database.

Disabling the lookup cache may work faster for very large lookups under the following conditions:

1) The lookup query must use an index access path; otherwise data retrieval would be very expensive on the source lookup database tier. Remember that Informatica fires the lookup query for every record from its Reader Thread.

2) Consider creating an index on all columns that are used in the lookup query. The Oracle Optimizer can then choose INDEX FAST FULL SCAN to retrieve the lookup values from the index blocks rather than scanning the whole table.

3) Check the explain plan for the lookup query to ensure an index access path.

Make sure you test the modified mapping with the selected lookups disabled in a test environment, and benchmark its performance before implementing the change in the production system.
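The three conditions above can be rehearsed on a small scale. The following is a minimal sketch, assuming SQLite as a stand-in for the lookup source database, so the plan text differs from Oracle's and INDEX FAST FULL SCAN will not appear. The table contents, the index name idx_emp_lookup, and the helper functions are hypothetical. It creates an index on all columns used by the lookup query, checks the query plan for index access, and then issues the same parameterized query once per input row, as an uncached lookup would.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE EMPLOYEE_TABLE (employee_id INTEGER, name TEXT)")
conn.executemany(
    "INSERT INTO EMPLOYEE_TABLE VALUES (?, ?)",
    [(i, f"emp_{i}") for i in range(1, 10001)],
)

# Index on all columns used by the lookup query (condition 2).
conn.execute(
    "CREATE INDEX idx_emp_lookup ON EMPLOYEE_TABLE (employee_id, name)"
)

# Parameterized lookup query; the '?' placeholder plays the role of a bind variable.
LOOKUP_SQL = "SELECT name FROM EMPLOYEE_TABLE WHERE employee_id = ?"


def plan_details(connection, sql, params):
    """Return the human-readable detail column of SQLite's query plan."""
    rows = connection.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return [row[-1] for row in rows]


def lookup(connection, eno):
    """Uncached lookup: one query per input row, None when no match exists."""
    row = connection.execute(LOOKUP_SQL, (eno,)).fetchone()
    return row[0] if row else None


# Condition 3: confirm that the plan uses the index before relying on it.
plan = plan_details(conn, LOOKUP_SQL, (42,))
uses_index = any("INDEX" in detail for detail in plan)

# Condition 1 in action: the query fires once for every source record.
results = [lookup(conn, eno) for eno in (1, 42, 99999)]
print(uses_index, results)
```

On a real Oracle source, the equivalent check would be made with the database's own explain plan facility against the query that Informatica actually issues.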

Criteria for setting the cache size in a lookup


You can set the lookup index cache and the lookup data cache using the following formulas:

Lookup Index Cache (Minimum) = 200 * ((sum of column sizes) + 16)
where the columns are those used in the lookup condition.

Lookup Index Cache (Maximum) = (number of rows in lookup) * ((sum of column sizes) + 16) * 2
where the columns are those used in the lookup condition.

Lookup Data Cache = (number of rows in lookup) * ((sum of column sizes) + 8)
where the columns are the connected output ports that are not in the lookup condition for connected lookups, and the return port for unconnected lookups.

Based on these formulas, you can set the data and index cache sizes for lookups, which will improve performance.
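The formulas can be turned into a small calculator. The following is a minimal sketch with hypothetical function names and hypothetical example sizes; it assumes the column sizes are given in bytes, which the text above does not state explicitly.

```python
def lookup_index_cache_min(condition_column_sizes):
    """Minimum index cache: 200 * ((sum of column sizes) + 16)."""
    return 200 * (sum(condition_column_sizes) + 16)


def lookup_index_cache_max(num_rows, condition_column_sizes):
    """Maximum index cache: rows * ((sum of column sizes) + 16) * 2."""
    return num_rows * (sum(condition_column_sizes) + 16) * 2


def lookup_data_cache(num_rows, output_column_sizes):
    """Data cache: rows * ((sum of column sizes) + 8)."""
    return num_rows * (sum(output_column_sizes) + 8)


# Hypothetical lookup: 50,000 rows, one 16-byte condition column,
# and two output columns of 32 and 8 bytes that are not in the condition.
rows = 50_000
condition_sizes = [16]
output_sizes = [32, 8]

index_min = lookup_index_cache_min(condition_sizes)
index_max = lookup_index_cache_max(rows, condition_sizes)
data_cache = lookup_data_cache(rows, output_sizes)

print(index_min, index_max, data_cache)
# → 6400 3200000 2400000
```

For an unconnected lookup, output_sizes would contain only the size of the return port.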
