Informatica

PowerCenter® 7
Architecture and
Performance Tuning
Erwin Dral
Sales Consultant
1
Agenda
° PowerCenter Architecture
° Performance tuning step-by-step
° Eliminating Common bottlenecks
2
PowerCenter Architecture:
Engine-based & Metadata-driven
[Diagram: PowerCenter architecture. Key: data, metadata]
° Client tools (Windows): Designer, Workflow Manager, Workflow Monitor, Repository Manager, Metadata Reporter
° Metadata Exchange: Erwin, Designer 2000, Power Designer, CWM
° Repository Server and Repository Agent, connected to the Repository (GDR) natively or through ODBC
° PowerCenter Server Engine (UNIX, Windows): Reader, DTM buffers, Writer
° Heterogeneous sources: Oracle, MS SQL Server, Sybase, Informix, DB2 UDB, ODBC, flat file, XML, mainframe (VSAM/COBOL copybook), ERP, SAS, real-time, remote files, PowerConnect
° Heterogeneous targets: Oracle (API, SQL*Loader), MS SQL Server (BCP), Sybase (IQ Load), Informix, DB2 UDB (Autoloader), Teradata (fload, tpump), ODBC, flat file, XML, mainframe, ERP, SAS, real-time, remote files, PowerConnect
° Connections: native, ODBC, JDBC, TCP/IP
3
Introducing PowerExchange
On-Demand Data Access through Changed Data Capture
[Diagram: data sources (Mainframe; AS/400, HP3000; Relational; File Formats, EAI) with delivery modes Real-time, Change, Batch, and Bulk]
4
PowerCenter Environment
[Diagram: DBMS, OS, and PowerCenter components, each with attached disks, connected over a LAN/WAN]
° This is a multi-vendor, multi-system environment

° There are many components involved
− Operating systems, databases, networks, I/O, PowerCenter
° Performance is determined by
THE SLOWEST COMPONENT (the bottleneck)
− Usually need to monitor performance in several places
− Usually need to monitor outside PowerCenter
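A minimal sketch of the bottleneck principle: the end-to-end rate is treated as the minimum of the per-component rates. The component names and rows-per-second figures below are made-up values for illustration, not measurements.

```python
def find_bottleneck(throughput_rows_per_sec):
    """Return (component, rate) for the slowest component.

    The end-to-end rate of a pipeline cannot exceed the rate of its
    slowest stage, so that stage is the bottleneck.
    """
    component = min(throughput_rows_per_sec, key=throughput_rows_per_sec.get)
    return component, throughput_rows_per_sec[component]


# Hypothetical figures for illustration only.
example = {
    "source DBMS": 40000,
    "network": 25000,
    "PowerCenter": 60000,
    "target DBMS": 8000,
}
bottleneck, rate = find_bottleneck(example)
print(bottleneck, rate)
```

Speeding up any component other than the one returned would leave the overall rate unchanged in this model.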
5
Server Architecture - Memory
° The PowerCenter Server utilizes two main processes
− Load Manager process (pmserver)
− Session process (DTM)
° The Load Manager process is a continuous listener process
designed to handle tasks such as session start, scheduling,
error reporting, email, etc.
− Configured using the Load Manager Shared Memory
parameter
− Set the value to approximately 200K bytes per session multiplied by
the maximum number of concurrent sessions
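The sizing rule above can be sketched as a one-line calculation. This assumes "200K bytes" means 200,000 bytes; the function name and the session count are illustrative only.

```python
BYTES_PER_SESSION = 200_000  # assumption: "200K" read as 200,000 bytes


def load_manager_shared_memory(max_concurrent_sessions,
                               bytes_per_session=BYTES_PER_SESSION):
    """Estimate the Load Manager Shared Memory setting in bytes."""
    if max_concurrent_sessions < 1:
        raise ValueError("max_concurrent_sessions must be at least 1")
    return max_concurrent_sessions * bytes_per_session


# Hypothetical server that runs at most 10 sessions at once.
print(load_manager_shared_memory(10))
```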
6
° The DTM process uses shared memory to handle tasks such as
reading, data transformation and writing
° Two session parameters control the DTM memory allocation
− DTM Buffer Pool Size
− Buffer Block Size
° DTM pipeline threads overlap when possible
[Diagram: Reader → Transformation Engine → Writer]
7
Server memory runtime
° Example
8
° DTM Buffer Pool Size controls the total amount of memory used
to buffer rows internally by the reader and writer
− This sets the total number of blocks available
− The optimal value is about 25MB
− If the block size is 64K, then you get 25M/64K = 390 blocks
° Buffer Block Size controls the size of the blocks that move in
the pipeline
− Optimum size depends on the row size being processed
− 64KB ≈ 64 rows of 1KB
− 128KB ≈ 128 rows of 1KB
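The block arithmetic above, as a small sketch. It assumes decimal units (1K = 1,000 bytes, 1M = 1,000,000 bytes), which is what reproduces the 390 blocks quoted on the slide; the function names are illustrative.

```python
def buffer_blocks(pool_size_bytes, block_size_bytes):
    """Number of whole blocks that fit in the DTM buffer pool."""
    return pool_size_bytes // block_size_bytes


def rows_per_block(block_size_bytes, row_size_bytes):
    """Approximate number of rows one block can hold."""
    return block_size_bytes // row_size_bytes


# Decimal units (1K = 1,000 bytes) match the slide's figures.
print(buffer_blocks(25_000_000, 64_000))
print(rows_per_block(64_000, 1_000))
print(rows_per_block(128_000, 1_000))
```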
9
Server Architecture – DTM Parameters
The Session Task parameters control the processing pipeline and
are found on the Properties and Config Object tabs
10
Server Architecture - Threads
Assume a mapping with an Aggregator, a Rank, and other
transformations in a session with two partitions. Pre- and
post-session commands would add one thread each.
[Diagram: Load Manager and DTM processes; within the DTM, a master thread, a mapping thread, and reader, transformation (including Rank and Aggregator), and writer threads. Key: process, memory, threads]
11
Performance tuning step-by-step
1. Determine batch window
2. Measure
3. Determine bottleneck
4. Make ONE change
5. Run sessions
Repeat steps 2 to 5 until elapsed time < batch window
HINTS:
• Write down a log of every step
• If all resources are used 100%, buy more
• If the change doesn’t help, UNDO
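The cycle can be sketched as a loop that applies one change at a time, keeps it only if the run gets faster, and stops once the batch window is met. This is a minimal sketch: `run_session`, the change names, and their effects are hypothetical, and the order of the candidate changes is assumed to come from the bottleneck analysis in step 3.

```python
def tune(run_session, changes, batch_window_sec):
    """Apply one change at a time until the run fits the batch window.

    run_session(applied) -> elapsed seconds for the given list of changes.
    changes: candidate changes to try, in order.
    Returns (kept_changes, final_elapsed, log).
    """
    kept = []
    log = []
    elapsed = run_session(kept)                # step 2: measure the baseline
    log.append(("baseline", elapsed))
    for change in changes:
        if elapsed < batch_window_sec:         # stop once the window is met
            break
        trial = run_session(kept + [change])   # steps 4 and 5: one change, rerun
        if trial < elapsed:
            kept.append(change)                # the change helped: keep it
            elapsed = trial
            log.append((f"keep {change}", trial))
        else:
            log.append((f"undo {change}", trial))  # no gain: undo
    return kept, elapsed, log


# Hypothetical session: each change has a made-up effect on elapsed seconds.
EFFECT = {"drop indexes": -900, "bigger block size": 50, "sorted input": -400}


def fake_run(applied):
    return 3600 + sum(EFFECT[c] for c in applied)


kept, elapsed, log = tune(fake_run, list(EFFECT), batch_window_sec=2500)
print(kept, elapsed)
```

The returned log plays the role of the written record recommended in the hints.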
12
2. Measuring Performance Internal to
Informatica
13
Measuring Performance - Internal
° Several types of bottlenecks can affect session performance
− Network
− System
− Database
− Informatica Mapping and Session
° There are several ways of measuring performance, such as total amount of
data (volume) per unit of time
− Volume can be measured as:
° Number of bytes
° Number of rows
− Time can be measured as:
° CPU or process time
° Elapsed time
14
° For the purpose of identifying bottlenecks use:
− Elapsed time as a relative measurement of time
− Number of rows loaded over the period of time (rows per second)
° Rows per second allows performance measurement of a
session over a period of time and with changes in the
environment
° Rows per second can have a very wide range depending on the size
of the row (number of bytes), the type of source/target (flat file
or relational) and the underlying hardware
15
° Establishing the baseline using the Workflow Manager

− Run the session task to be measured
− View the session task Transformation Statistics detail window at
the end of the session and record the number of rows loaded
− View the Session Task Properties window and record the start
and end times of the session
− Subtract the start time from the end time of the session, convert
to seconds to get the total session time
− Divide the number of rows loaded by the number of seconds of
run time for the session
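The baseline calculation can be sketched as follows, using the figures from the s_m_RDB_TO_FF_TEST example in this deck. The timestamp format string is an assumption about how the times are recorded.

```python
from datetime import datetime

TIME_FORMAT = "%m/%d/%Y %I:%M:%S %p"  # e.g. 10/18/2002 11:00:58 AM


def rows_per_second(rows_loaded, start, end):
    """Divide rows loaded by the session's elapsed time in seconds."""
    elapsed = (datetime.strptime(end, TIME_FORMAT)
               - datetime.strptime(start, TIME_FORMAT)).total_seconds()
    if elapsed <= 0:
        raise ValueError("end time must be after start time")
    return rows_loaded / elapsed


rate = rows_per_second(249995,
                       "10/18/2002 11:00:58 AM",
                       "10/18/2002 11:01:17 AM")
print(round(rate))
```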
16
Example
[Screenshot: session task windows showing Session Name, Start/End Times, and Applied Rows]
17
Tips:
° Calculated rows per second are not the same as “Write
Throughput”
° For multiple targets use sum of rows loaded for targets which
are similar in row size
° For multiple partitions use the sum of rows loaded for all
partitions
° Monitor background processes external to Informatica that will
have an effect between test runs
18
Establishing Baselines Internal to
Informatica
19
Establishing Baselines - Internal
° Each component in a production environment contributes to the
overall session performance
° Performance is limited by the slowest component
° Knowing the physical data limits establishes the maximum data
throughput
° Baseline measurement can be used for future comparisons
[Diagram: LAN/WAN, DBMS, OS, PowerCenter]
20
Establishing Baselines – Read Throughput Mapping
° Read Throughput Mapping – Use a database table to
flat file mapping to establish a typical read rate

Session Name        Rows Loaded  Rows Failed  Start Time              End Time                Elapsed Time (s)  Rows Per Sec
s_m_RDB_TO_FF_TEST  249995       0            10/18/2002 11:00:58 AM  10/18/2002 11:01:17 AM  19                13158
21
Establishing Baselines - Historical
° Each Informatica Repository contains a history of each session
run
° Use the MX view REP_SESS_LOG to extract session information
− SUBJECT_AREA (Folder)
− SESSION_NAME (Session)
− SUCCESSFUL_ROWS (Rows Loaded)
− FAILED_ROWS (Rows Not Loaded)
− ACTUAL_START (Start Time)
− SESSION_TIMESTAMP (End Time)
Note: simple query – select * from rep_sess_log
22
2. Measure Performance
° Use repository views to establish performance
− Session elapsed time (in seconds), Oracle date arithmetic:
(REP_SESS_LOG.SESSION_TIMESTAMP -
REP_SESS_LOG.ACTUAL_START) * 86400
− Session elapsed time (in seconds), DB2:
TIMESTAMPDIFF(2, CHAR(REP_SESS_LOG.SESSION_TIMESTAMP -
REP_SESS_LOG.ACTUAL_START))
− Target rows per second =
SUCCESSFUL_ROWS / Session elapsed time
° OR: Use the Metadata Reporter!
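A runnable sketch of the same calculation. It does not connect to a real repository: an in-memory SQLite table stands in for the REP_SESS_LOG view, and SQLite's `julianday` difference (in days) times 86400 mirrors the date arithmetic above. The sample row reuses the s_m_RDB_TO_FF_TEST figures from this deck.

```python
import sqlite3

# Stand-in for the repository: an in-memory SQLite table that mimics the
# REP_SESS_LOG columns used here. A real repository is queried through
# its own database, with that database's date arithmetic.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE rep_sess_log ("
    "session_name TEXT, successful_rows INTEGER, "
    "actual_start TEXT, session_timestamp TEXT)"
)
conn.execute(
    "INSERT INTO rep_sess_log VALUES (?, ?, ?, ?)",
    ("s_m_RDB_TO_FF_TEST", 249995,
     "2002-10-18 11:00:58", "2002-10-18 11:01:17"),
)

QUERY = """
SELECT session_name,
       ROUND((julianday(session_timestamp) - julianday(actual_start)) * 86400)
           AS elapsed_sec,
       successful_rows
FROM rep_sess_log
"""

results = []
for name, elapsed_sec, rows in conn.execute(QUERY):
    # Guard against zero-length runs before dividing.
    rate = rows / elapsed_sec if elapsed_sec else None
    results.append((name, elapsed_sec, rate))
    print(name, elapsed_sec, rate)
```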
23
3. Determine bottleneck
° Identifying Target Bottlenecks

° Identifying Source Bottlenecks
° Identifying Mapping Bottlenecks
− session parameters
− system resource allocation
− mapping/transformation design
24
3. Determine Target Bottlenecks
° Writing to a flat file usually does not cause a
bottleneck
° Configure a session task to write to
a flat file target (/dev/null)
− If write throughput increases significantly,
then you have a target database bottleneck.
25
3. Determine Source or Mapping Bottlenecks
° Add a FILTER after each source qualifier and
set the filter condition to false
[Diagram: original mapping and modified mapping with the added filter]
° No faster → Source bottleneck
° Faster → Mapping bottleneck
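The two tests (flat file target, then FALSE filter) can be combined into one decision sketch based on elapsed times. The `speedup` threshold for "significantly faster" is an assumption; the slides give no number.

```python
def diagnose(baseline_sec, flat_file_target_sec, filter_false_sec, speedup=0.8):
    """Classify the bottleneck from three elapsed-time measurements.

    baseline_sec: the unmodified session.
    flat_file_target_sec: same session writing to a flat file target.
    filter_false_sec: same session with a FALSE filter after each
        source qualifier.
    speedup: assumed fraction of the baseline time that counts as
        "significantly faster" (0.8 means at least 20% less time).
    """
    if flat_file_target_sec <= baseline_sec * speedup:
        return "target"
    if filter_false_sec <= baseline_sec * speedup:
        return "mapping"
    return "source"


# Hypothetical timings in seconds.
print(diagnose(600, 300, 580))
print(diagnose(600, 590, 200))
print(diagnose(600, 590, 580))
```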
26
4. Make ONE change
° Very case-specific; here are some common bottlenecks
− Target
− Source
− Mapping
− Session
− System
° Only keep the changes that improve performance
(maintaining changes is confusing and costly)
27
4. Eliminate Target Bottlenecks
° Database indexes and constraints
− Disable indexes and constraints before the load, and enable them
afterward (connection/target pre- & post-SQL)
− Check the database space allocation for indexes
° Indexes should be on a different disk if possible
° Use a loader connection
° Check the commit interval
− Very small commit intervals cause excessive overhead
− Make sure you have allocated plenty of rollback space
(PC6: connection Rollback segment)
− A good commit interval is 50,000
4. Eliminate Target Bottlenecks
° PowerCenter updates and deletes
− Updates and deletes can be extremely slow
without an index or key
− Bitmap indexes on columns you are updating cause very slow
performance (usually less than 100 rows/sec)
− Do NOT use an Update Strategy transformation if all rows are
treated the same (DD_INSERT, DD_UPDATE).
With an Update Strategy, the writer cannot do block inserts or block updates
29
4. Eliminate Source Bottlenecks
° Discuss with your DBA how to optimize
your Source Qualifier SQL (in the session log file)
− standard DBMS tuning:
explain plan, add indexes, estimate statistics (regularly),
alter database parameters, etc.
° Optimize the query to begin returning rows early
− the total query time may be longer, but PowerCenter
processing can overlap with the query execution
30
4. Eliminate Mapping Bottlenecks
° Reduce I/O times
− Cache in memory
− Use fast disks for Cache, BadFiles, SessionLogs etc.
− Check your Sequence Generator
° Reduce amount of data to transform
− Filter early
° Aggregator or Joiner: prefix with a Sorter
31
4. Optimize expression performance
° Use numeric ports instead of string ports
° Reduce (hidden) data type conversions
° Simplify expressions
− Factor out common logic to transformation variables
or even mapping variables or parameters
° Simplify nested IIFs when possible
or use DECODE statements
32
4. Optimize Lookup Performance
° Reduce the number of lookup rows
− ‘where’ clause in lookup SQL
° Use persistent lookup caches
− When a nightly batch has several sessions that use the
same lookup
− Build the persistent cache file in a separate session
° Lookup with date-range: lookup/filter combo
° Lookup against large dimension with few changes:
− PowerExchange Changed Data Capture
− checksum AEP plus lookup (devnet.informatica.com)
° Remove the lookup, use ‘update else insert’
33
4. Session Optimizing
° Set the DTM Buffer Pool Size and Buffer Block Size
− Large row sizes may require a larger buffer block size
− Default buffer pool is 12,000,000 bytes = 12 MB,
recommended is 24 MB
° Buffer Block Size controls the size of the blocks that
move in the pipeline
− Buffer Block Size should hold about 100 rows
− 64K (64,000) ≈ 64 rows of 1 KB
− 128K (128,000) ≈ 128 rows of 1 KB
° An extremely large DTM buffer pool may SLOW DOWN the session!
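The "about 100 rows per block" guideline can be turned into a quick sizing sketch. It assumes decimal units as on the slide (1 KB = 1,000 bytes); the function names and the 1 KB row size are illustrative.

```python
def suggested_block_size(row_size_bytes, rows_per_block=100):
    """Block size in bytes that holds about rows_per_block rows."""
    return row_size_bytes * rows_per_block


def blocks_in_pool(pool_size_bytes, block_size_bytes):
    """Number of whole blocks available in the DTM buffer pool."""
    return pool_size_bytes // block_size_bytes


# Hypothetical 1 KB rows with the recommended 24 MB buffer pool.
block = suggested_block_size(1_000)
print(block, blocks_in_pool(24_000_000, block))
```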
34
4. Session Memory Settings
° Set cache memory larger than the size of the cachefile on disk
° Set the server variable directories
(Badfiles, Cache, SessLogs, etc.)
to point to high performance disk arrays
° Reduce transformation errors (& error logging)
35
For those that are still on PowerCenter 5 …
PowerCenter 6 Performance highlights
° More efficient server

° New Sorter transformation
° ‘Sorted Input’ switch for aggregator & joiner

° More bulk loaders
° Pipeline Partitioning Upgrade!
(PowerCenter only)
36
For those that are still on PowerCenter 6 …
PowerCenter 7 Performance highlights
° Block DTM
− Enables moving/transforming a block of rows at a time
at each transformation
− Accelerates ALL sessions with:
° Mapping bottleneck AND
° (Lots of transformations OR Lots of string ports)
° Superior XML reading and writing Upgrade!

° Easy GUI for partitioning
° Max 64 partitions per partition point
° 64-bit version
° Server Grid (workflow load balancing across several servers)
° Change Data Capture (MVS, Oracle 9i and MS SQL server)
37
Performance tuning step-by-step
1. Determine batch window
2. Measure
3. Determine bottleneck
4. Make ONE change
5. Run sessions
Repeat steps 2 to 5 until elapsed time < batch window
HINTS:
• Write down a log of every step
• If all resources are used 100%, buy more
• If the change doesn’t help, UNDO
38
