
Data Integration and

Capacity Planning

Peter Zabaldo, Sr. Consultant


Informatica Professional Services
Florida IUG - October 8, 2009

1
Data Integration Server Sizing:
Frequently asked questions:

• Could I move my data faster with an expanded environment?


• Is my Informatica Server ‘big’ enough?
• How much more data could I move with my existing system?
• How much faster could I execute my existing loads?
• If I had to add 1 more project – do I have sufficient capacity?
• How about x more projects?

2
Capacity Planning/Server Sizing Goals

• Meet performance requirements


• Satisfy load window requirements
• Minimize contention due to lack of resources
• Lower maintenance cost and cost of ownership

3
Agenda

• Key Architecture Points


• Environment Sizing vs. Capacity Planning
• Tools
• Real world example
• Pitfalls and common mistakes

4
Key Architecture Points

5
Data Integration Environment:
Key Architecture Points
[Diagram: Sources (source file server and RDBMS, each with its own CPU/RAM) connect over the network to the PowerCenter Server (server CPU/RAM), which connects over the network to Targets (target file server and RDBMS, each with its own CPU/RAM).]
6
Informatica Real Time Data Integration

[Diagram: PowerCenter orchestration engine at the center, integrating a transactional relational source (Oracle, DB2, etc.), a mainframe system, an acquired mainframe system, an acquired mid-range AS/400 system, an integrated customer portal database, and an exception management database; customer portal, administration portal, and data steward portal front ends are served through a web application server.]
7
PowerCenter System Characteristics

• Block processing
• Parallelism – Multiple threads, partitioning
• 64-bit Option
• Database and File processing: Random vs. Sequential Reads
• String Manipulation and Unicode
• Pushdown Option
• Checkpoint Recovery
• Web Services

8
Environment Sizing vs
Capacity Planning

9
Environment Sizing vs Capacity Planning
• Environment Sizing
• New software implementation/install
• Rudimentary models predict estimated need
• Rarely perfect
• Limited focus

• Capacity Planning
• Accuracy of the exercise is based on statistics from the existing environment
• Upgrade / migration
• New projects added on existing environment
• Ensure existing projects are not affected
• Load window
• Load times
• Performance

10
Environment Sizing

11
Environment Sizing

• New Environment
• Estimation process because there is no existing environment
• Architectural considerations that affect sizing:
• GRID/HA
• Windows vs. UNIX/LINUX
• Shared PC Environment vs. Dedicated PC Environment
• Virtualization

• Hardware sizing considerations


• CPU as basis
• Memory
• I/O Capacity related to Disk Space
• Network Bandwidth
• Repository Database

12
Environment Sizing Inputs

• Data volumes
• Mapping complexity
• Number of mappings
• Concurrent work load
• Peak work load
• Expected growth

13
Environment Sizing Methodology

• Gather performance requirements (volume, load window, etc.)
• Document assumptions
• planned architecture
• usage period
• geographical distribution of data & users

• Evaluate hardware alternatives – Commodity vs. High-End SMP
• Target 75% CPU utilization max
• Utilize CPU estimation factors

14
Environment Sizing Methodology – cont.

• Consider future growth


• Use benchmark testing to validate
• Cross check with other implementations
• Size appropriately in lifecycle

15
Capacity Planning

16
Capacity Planning

• Existing Environment
• Measure actual performance in YOUR environment
• Use real world performance information to understand
current unused capacity
• Use linear scalability to predict future needs
• Key review points:
• Current performance tuning
• Data growth projections
• Future integration needs
• Consider Impacts of any technology shift/change
• Web Services
• Grid/HA
• XML Processing etc

17
Capacity Planning Methodology
• Gather performance information
• Volume (data/records)
• CPU Usage
• Memory Usage
• Network Usage
• File System Usage
• System Characteristics (CPU speed, etc.)
• Document future assumptions, e.g. planned architecture, usage period, geographical distribution of data & users
• Review data growth projections
• Review future growth needs
• Plan for 75% CPU utilization or less
• Determine required capacity
• Update/expand environment as needed
• Use continuous benchmark testing

18
Tools

19
Tools - Unix
vmstat - (virtual memory statistics)
• Reports information about processes, memory, paging, block I/O, and CPU
• vmstat 5 10 – run with a 5-second delay, 10 times
• Processes in the run queue (procs r)
• procs r consistently greater than the number of CPUs indicates a CPU bottleneck
• Idle time (cpu id)
• cpu id consistently at 0 indicates a CPU issue
• Scan rate (sr)
• sr consistently over 200 pages per second indicates a memory shortage
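A quick sketch of watching the run queue against the CPU count (assumes a 4-CPU host; vmstat column layout differs between Linux, Solaris, and HP-UX, so adjust the field number if r is not the first column):

vmstat 5 10 | awk 'NR > 2 && $1 > 4 {print "run queue", $1, "exceeds 4 CPUs"}'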

20
Tools - Unix
iostat – (input/output statistics)
• Reports CPU and input/output statistics for devices and partitions
• iostat 5 10 – run with a 5-second delay, 10 times
• Reads/writes per second (r/s, w/s)
• Consistently high reads/writes indicate disk issues
• Percentage busy (%b)
• %b > 5 may point to an I/O bottleneck
• Service time (svc_t)
• svc_t > 30 milliseconds indicates a need for a faster disk/controller
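A minimal sketch for collecting extended per-device statistics (the -x flag exists on Solaris and Linux; HP-UX reports comparable figures through sar -d):

iostat -x 5 10    # extended statistics, 10 samples at 5-second intervals
                  # watch r/s and w/s, %b sustained above ~5, and svc_t above ~30 ms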

21
Tools - Unix
sar – (system activity reporter)
• Exists on many UNIX platforms
• Examine live statistics
• sar [options…] t n
• t is number of seconds per sample
• n is number of samples

• Save sar data for later analysis
• sar -o filename t n
• Recall CPU usage: sar -u -f filename
• Recall disk usage: sar -d -f filename
• You can also specify time windows (-s, -e) and an alternate interval with -i
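For example, to capture a full night of statistics and then review only the load window (the filename, interval, and times below are placeholders):

sar -o /tmp/sar_tonight.out 60 480                      # sample every 60 sec for 8 hours, saved to file
sar -u -f /tmp/sar_tonight.out -s 01:00 -e 03:00        # CPU usage during the 1am-3am load window
sar -d -f /tmp/sar_tonight.out -s 01:00 -e 03:00 -i 300 # disk usage, re-sampled at 5-minute intervals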

22
Tools - Unix

sar – Disk Utilization


• sar -d t n
• Average I/O size in bytes = (blks/s * 512 bytes) / (r+w/s)
• %busy is a good indicator of a disk bottleneck
• Shows disk devices – it can be tough to trace them back to a specific logical volume

vega7077-root-># sar -d 60 1

HP-UX vega7077 B.11.23 U ia64 10/25/07

10:25:24 device %busy avque r+w/s blks/s avwait avserv


10:26:24 c2t6d0 0.65 0.50 1 23 0.00 9.14
c76t4d3 0.02 0.50 0 0 0.01 10.03
c140t2d0 3.13 0.50 2 180 0.00 18.21
c142t2d0 3.88 0.50 2 180 0.00 22.38
c148t2d0 0.28 0.50 2 180 0.00 1.69
c150t2d0 0.42 0.50 2 176 0.00 2.37
c108t2d0 3.03 0.50 2 179 0.00 17.67
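Applying the average I/O size formula to the c140t2d0 row above: (180 blks/s x 512 bytes) / 2 r+w/s ≈ 46,000 bytes, or roughly 45KB per I/O.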

23
Tools - Unix
sar – CPU utilization
• sar -u t n
• %sys is system/kernel time
• %usr is user space time
• %wio is the percent of time spent “waiting on I/O”
• %wio is the best indicator of whether I/O is a bottleneck
• Directly reflects how much performance is lost waiting on I/O operations

vega7077-root-># sar -u 60 1

HP-UX vega7077 B.11.23 U ia64 10/25/07

10:49:31 %usr %sys %wio %idle


10:50:31 1 5 6 87

24
Tools - Unix

top
• Provides a dynamic, real-time view of a running system
• Displays system summary information as well as a list of the tasks currently being managed
• Useful in shared environments for identifying each application's processes and their CPU/memory consumption

25
Tools
Windows perfmon
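One way to capture comparable statistics on Windows is a perfmon data collector scripted with logman (the collector name, counter set, and output path below are illustrative):

logman create counter PCCapacity -c "\Processor(_Total)\% Processor Time" "\Memory\Available MBytes" "\PhysicalDisk(_Total)\Avg. Disk sec/Transfer" -si 00:01:00 -o C:\perflogs\PCCapacity
logman start PCCapacity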

26
Example

27
Example Capacity Planning Exercise
• Peak Load Time – 1am to 1:35 am
• Number of Sessions – 45
• Most concurrent sessions – 4
• Total Data Processed – 10 GB
• Primarily DBMS to DBMS data moves
• Server is 4 CPU with 16gb of RAM
• Most sessions include lookups, but with fairly reasonable cache sizes (i.e., no 8gb customer master)
• Total Load Window requirement is 2 hrs (done by 3am)

28
Example Capacity Planning Exercise

Time  CPU 1  CPU 2  CPU 3  CPU 4  Avg   RAM   I/O
1:01  95%    90%    85%    25%    74%   90%   Ok
1:11  90%    90%    65%    3%     62%   35%   Good
1:21  90%    50%    10%    3%     38%   50%   Good
1:31  75%    25%    3%     3%     25%   25%   Good
Avg   87%    64%    41%    9%     50%   50%   Good

Data   Seconds   Data/Sec   Data/Sec/CPU   Max Expected
10GB   2,100     4.8mb      1.2mb          2.4mb/CPU
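These figures follow directly from the load statistics (rounded): 10GB in 35 minutes ≈ 10,240mb / 2,100 sec ≈ 4.8mb/sec total; spread across 4 CPUs that is ≈ 1.2mb/sec per CPU; and since average CPU utilization was only about 50%, the expected maximum is roughly double, ≈ 2.4mb/sec per CPU.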

29
Questions for the Example :
• Do you need more CPU?
• Do you need more RAM?
• How much more expected capacity do you have without
extending the current load window?
• How much more capacity do you have until you no longer
meet load window?
• What could you do to ‘free up’ more capacity?

30
Questions for the Example :
• How much more expected capacity do you have without extending the current load
window?
• How much more capacity do you have until you no longer meet load window?
• What could you do to ‘free up’ more capacity?

• How much more capacity:
• 1.2mb/CPU/sec currently – could comfortably go to 75% CPU, so likely 1.8mb/CPU/sec, leaving spare capacity of 0.6mb/CPU/sec (0.6mb x 4 CPUs x 2,100 sec ≈ 5GB more)

• How much more capacity until the 2-hr window is no longer met? 15gb/hr x 2 hrs = 30gb total capacity – current 10gb = 20gb more data

• Tuning, spreading the current load out across the window, pushdown optimization, Change Data Capture, etc.

31
Useful “Rules of Thumb”:
Processor & RAM
• Count each virtual CPU (core) as 0.75 of a physical CPU. For example, 4 CPUs with 4 cores each (16 cores) can be counted as 12 CPUs for sizing.
• Allow 20 to 30MB of memory for the Integration Service for session coordination without aggregations, lookups, or heterogeneous data joins.
• Note: 32-bit systems have an operating system limitation of 2GB per session.

32
Useful “Rules of Thumb”:
• Caches for aggregation, lookups or joins use additional
memory:
• Lookup tables are cached in full; the memory consumed depends
on the size of the tables and selected data ports.
• Aggregate caches store the individual groups; more memory is
used if there are more groups. Sorting input greatly reduces the
need for memory.
• Joins cache the master table in a join; memory consumed
depends on the size of the master.
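As a rough, hypothetical illustration of lookup cache sizing (actual figures depend on your data and selected ports): a lookup over 2 million rows returning about 100 bytes of connected ports needs on the order of 2,000,000 x 100 bytes ≈ 200MB of data cache, before index cache overhead.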

• Full Pushdown Optimization requires fewer resources on the PowerCenter server than partial (source- or target-side) pushdown optimization.

33
Pitfalls and Common Mistakes

34
Pitfalls and Common Mistakes
• Apples to Apples
• “I talked to <customer> at the user group and they are moving 1,000 rows a second – why
aren’t I experiencing the same?”
• “I read an Informatica benchmark and they moved a terabyte in 38 min, which showed 4mb a
second per processor – mine should be the same performance right?”

• Growth Projections
• “Every day we process 100,000 records that equal 5mb of data thus our warehouse is
increasing by 5mb a day. “
• “Every year our warehouse grows by 25% so our daily capacity must be growing by 25%. “

• Adding Horsepower
• “If I add more CPU and RAM my loads will be faster.”
• “My hardware vendor promised their new CPUs are 2x faster, so my load should finish in ½ the time.”

• Root Cause
• “My performance is poor, it must be the Informatica Platform.”
• “I’m seeing very low rows per second processed, I must have a slow server.”

35
Additional Resources

Velocity Best Practices


• Platform Sizing
• Environment & PowerCenter Sizing

My.informatica.com
https://my-prod.informatica.com/velocity/

36
Questions?

37