Capacity Planning
Data Integration Server Sizing: Frequently Asked Questions
Capacity Planning/Server Sizing Goals
Agenda
Key Architecture Points
Data Integration Environment: Key Architecture Points
[Diagram: source file server and RDBMS on one side, target file server and RDBMS on the other, connected through the PowerCenter server; CPU/RAM and network capacity matter at every tier.]
Informatica Real-Time Data Integration
[Diagram: the PowerCenter orchestration engine connects a transactional system, a relational source (Oracle, DB2, etc.), a mainframe system, acquired mainframe and mid-range (AS/400) systems, an administration portal, and an exception management database.]
PowerCenter System Characteristics
• Block processing
• Parallelism – Multiple threads, partitioning
• 64-bit Option
• Database and File processing: Random vs. Sequential Reads
• String Manipulation and Unicode
• Pushdown Option
• Checkpoint Recovery
• Web Services
Environment Sizing vs. Capacity Planning
Environment Sizing vs. Capacity Planning
• Environment Sizing
• New software implementation/install
• Rudimentary models predict estimated need
• Rarely perfect
• Limited focus
• Capacity Planning
• Accuracy is based on statistics from the existing environment
• Upgrade/migration
• New projects added to an existing environment
• Ensure existing projects are not affected
• Load window
• Load times
• Performance
Environment Sizing
Environment Sizing
• New environment
• An estimation exercise, since there is no existing environment to measure
• Architectural considerations that affect sizing:
• GRID/HA
• Windows vs. UNIX/Linux
• Shared PowerCenter environment vs. dedicated PowerCenter environment
• Virtualization
Environment Sizing Inputs
• Data volumes
• Mapping complexity
• Number of mappings
• Concurrent workload
• Peak workload
• Expected growth
Environment Sizing Methodology
Capacity Planning
Capacity Planning
• Existing environment
• Measure actual performance in YOUR environment
• Use real-world performance information to understand current unused capacity
• Use linear scalability to predict future needs
• Key review points:
• Current performance tuning
• Data growth projections
• Future integration needs
• Consider the impact of any technology shift/change:
• Web Services
• Grid/HA
• XML processing, etc.
Capacity Planning Methodology
• Gather performance information
• Volume (data/records)
• CPU usage
• Memory usage
• Network usage
• File system usage
• System characteristics (CPU speed, etc.)
• Document future assumptions, e.g. planned architecture, usage period, geographical distribution of data and users
• Review data growth projections
• Review future growth needs
• Plan for 75% CPU utilization or less
• Determine required capacity
• Update/expand the environment as needed
• Use continuous benchmark testing
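The 75% utilization target and the linear-scalability assumption can be turned into a quick back-of-the-envelope calculation. This is a minimal sketch; the figures in the example are hypothetical, not taken from the deck:

```python
# Project CPU needs under the linear-scalability assumption,
# keeping utilization at or below the 75% ceiling.
import math

def required_cpus(current_cpus, current_utilization, growth_factor, ceiling=0.75):
    """CPUs needed so utilization stays under the ceiling after growth,
    assuming load scales linearly with data volume."""
    projected_load = current_cpus * current_utilization * growth_factor
    return math.ceil(projected_load / ceiling)

# Hypothetical example: 4 CPUs at 50% busy today, data volume expected to double.
print(required_cpus(4, 0.50, 2.0))  # -> 6
```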
Tools
Tools - Unix
vmstat (virtual memory statistics)
• Reports information about processes, memory, paging, block I/O, and CPU
• vmstat 5 10 – run with a 5-second interval, 10 times
• Processes in the run queue (procs r)
• r consistently greater than the number of CPUs indicates a CPU bottleneck
• Idle time (cpu id)
• id consistently at 0 indicates a CPU issue
• Scan rate (sr)
• sr continuously over 200 pages per second indicates a memory shortage
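These thresholds can be applied mechanically to captured output. A sketch, assuming the field positions of typical Linux vmstat output (the sample line is made up; column order varies by platform, so verify against yours):

```python
# Flag a vmstat reading against the rules of thumb above:
# run queue (r) greater than the CPU count, and idle (id) at 0.
# Field positions follow typical Linux `vmstat`; adjust per platform.

def check_vmstat(line, ncpus):
    fields = line.split()
    r = int(fields[0])      # processes in the run queue
    idle = int(fields[14])  # cpu id column on typical Linux vmstat
    warnings = []
    if r > ncpus:
        warnings.append("run queue exceeds CPU count: possible CPU bottleneck")
    if idle == 0:
        warnings.append("0% idle: CPU saturated")
    return warnings

# Hypothetical sample line from `vmstat 5 10`
sample = " 6  0      0 803896 215624 754408    0    0     1     7    9   12  5  2 93  0"
print(check_vmstat(sample, ncpus=4))
```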
iostat (input/output statistics)
• Reports CPU and input/output statistics for devices and partitions
• iostat 5 10 – run with a 5-second interval, 10 times
• Reads/writes per second (r/s, w/s)
• Consistently high reads/writes indicate disk issues
• Percentage busy (%b)
• %b > 5 may point to an I/O bottleneck
• Service time (svc_t)
• svc_t > 30 milliseconds suggests a faster disk/controller is needed
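The %b and svc_t rules above can be encoded as a simple per-device check. The device names and statistics below are hypothetical sample values, not real iostat output:

```python
# Apply the iostat rules of thumb above to per-device measurements:
# %b > 5 may point to an I/O bottleneck; svc_t > 30 ms wants faster disk.

def check_device(name, pct_busy, svc_t_ms):
    issues = []
    if pct_busy > 5:
        issues.append(f"{name}: %b={pct_busy} -> possible I/O bottleneck")
    if svc_t_ms > 30:
        issues.append(f"{name}: svc_t={svc_t_ms}ms -> faster disk/controller needed")
    return issues

# Hypothetical devices: sd0 is healthy, sd1 trips both thresholds.
for dev, busy, svc in [("sd0", 2, 12.0), ("sd1", 18, 47.5)]:
    for issue in check_device(dev, busy, svc):
        print(issue)
```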
sar (system activity reporter)
• Exists on many UNIX platforms
• Examines live statistics
• sar [options…] t n
• t is the number of seconds per sample
• n is the number of samples
• Example (disk activity, one 60-second sample):
vega7077-root-># sar -d 60 1
sar – CPU utilization
• sar -u t n
• %sys is system/kernel time
• %usr is user-space time
• %wio is the percent of time “waiting on I/O”
• wio is the best indicator of whether I/O is a bottleneck
• Directly reflects how much performance is lost waiting on I/O operations
vega7077-root-># sar -u 60 1
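Reading %wio off the sar summary can be scripted. A sketch, assuming the %usr/%sys/%wio/%idle column order of a Solaris-style `sar -u` "Average" row (the sample row is hypothetical; column names differ on Linux sysstat, where %wio appears as %iowait):

```python
# Parse a `sar -u` average row and inspect %wio, the deck's
# best indicator of whether I/O is a bottleneck.

def parse_sar_u_average(line):
    # e.g. "Average:     12      3      9     76" -> %usr %sys %wio %idle
    parts = line.split()
    return {"usr": int(parts[1]), "sys": int(parts[2]),
            "wio": int(parts[3]), "idle": int(parts[4])}

stats = parse_sar_u_average("Average:     12      3      9     76")
if stats["wio"] > 20:  # illustrative threshold, not from the deck
    print("significant time lost waiting on I/O")
print(stats["wio"])  # -> 9
```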
top
• Provides a dynamic, real-time view of a running system
• Displays system summary information as well as a list of the tasks currently being managed
• Useful in shared environments for identifying each application's processes and their CPU/memory consumption
Tools - Windows
perfmon (Performance Monitor)
• Graphs and logs CPU, memory, disk, and network counters over time
Example
Example Capacity Planning Exercise
• Peak load time – 1:00 am to 1:35 am
• Number of sessions – 45
• Maximum concurrent sessions – 4
• Total data processed – 10 GB
• Primarily DBMS-to-DBMS data moves
• Server is 4 CPUs with 16 GB of RAM
• Most sessions include lookups, but with fairly reasonable cache sizes (i.e., no 8 GB customer master)
• Total load window requirement is 2 hrs (done by 3:00 am)
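The figures above support a rough headroom calculation. This is an illustrative sketch only: it assumes throughput stays roughly constant as the window fills (the linear-scalability assumption), which ignores concurrency limits and contention:

```python
# Rough headroom math for the example: 10 GB moved in a 35-minute
# peak run, against a 2-hour total load window.

data_gb = 10
window_used_min = 35
window_total_min = 120

throughput_gb_per_min = data_gb / window_used_min        # ~0.29 GB/min
capacity_gb = throughput_gb_per_min * window_total_min   # what fits in 2 hrs
headroom_gb = capacity_gb - data_gb

print(round(capacity_gb, 1))  # -> 34.3
print(round(headroom_gb, 1))  # -> 24.3
```

By this crude estimate the environment could absorb roughly 24 GB of additional daily volume before the 3:00 am deadline is at risk, which frames the "how much more capacity" questions that follow.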
Questions for the Example:
• Do you need more CPU?
• Do you need more RAM?
• How much more expected capacity do you have without extending the current load window?
• How much more capacity do you have until you no longer meet the load window?
• What could you do to ‘free up’ more capacity?
Useful “Rules of Thumb”: Processor & RAM
• A virtual CPU can be counted as 0.75 CPU. For example, 4 CPUs with 4 cores each can be considered as 12 virtual CPUs.
• Allow 20 to 30 MB of memory for the Integration Service for session coordination without aggregations, lookups, or heterogeneous data joins.
• Note: 32-bit systems have an operating-system limitation of 2 GB per session.
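The two rules above reduce to simple arithmetic. A sketch, using the deck's own vCPU example and the high end (30 MB) of the coordination-memory range:

```python
# Apply the rules of thumb: count each core as 0.75 virtual CPU,
# and budget 20-30 MB per session for coordination memory.

def virtual_cpus(physical_cpus, cores_per_cpu, factor=0.75):
    return physical_cpus * cores_per_cpu * factor

def coordination_memory_mb(sessions, per_session_mb=30):  # high end of 20-30 MB
    return sessions * per_session_mb

print(virtual_cpus(4, 4))          # -> 12.0 (the deck's example)
print(coordination_memory_mb(45))  # -> 1350
```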
• Caches for aggregations, lookups, or joins use additional memory:
• Lookup tables are cached in full; the memory consumed depends on the size of the table and the selected data ports.
• Aggregate caches store the individual groups; more memory is used when there are more groups. Sorting the input greatly reduces the need for memory.
• A Joiner caches the master table of the join; memory consumed depends on the size of the master.
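A rough floor for lookup-cache memory follows from the points above (rows cached in full, sized by the selected ports). The row count and width here are hypothetical, and real caches also carry index overhead, so treat this as a lower bound rather than PowerCenter's actual sizing formula:

```python
# Rough floor for lookup cache memory: every row cached, sized by
# the selected data ports only. Figures are hypothetical.

def lookup_cache_mb(row_count, bytes_per_cached_row):
    return row_count * bytes_per_cached_row / (1024 * 1024)

# e.g. 2 million rows, 120 bytes of selected ports per row
print(round(lookup_cache_mb(2_000_000, 120)))  # -> 229
```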
Pitfalls and Common Mistakes
Pitfalls and Common Mistakes
• Apples to Apples
• “I talked to <customer> at the user group and they are moving 1,000 rows a second – why aren’t I experiencing the same?”
• “I read an Informatica benchmark where they moved a terabyte in 38 min, which works out to 4 MB a second per processor – mine should perform the same, right?”
• Growth Projections
• “Every day we process 100,000 records that equal 5 MB of data, thus our warehouse is growing by 5 MB a day.”
• “Every year our warehouse grows by 25%, so our daily capacity need must be growing by 25%.”
• Adding Horsepower
• “If I add more CPU and RAM, my loads will be faster.”
• “My hardware vendor promised their new CPUs are 2x faster, so my load should finish in half the time.”
• Root Cause
• “My performance is poor; it must be the Informatica platform.”
• “I’m seeing very low rows per second processed; I must have a slow server.”
Additional Resources
My.informatica.com
https://my-prod.informatica.com/velocity/
Questions?