Using Incremental Aggregation

By PenchalaRaju.Yanamala

This chapter includes the following topics:

Using Incremental Aggregation Overview
Integration Service Processing for Incremental Aggregation
Reinitializing the Aggregate Files
Moving or Deleting the Aggregate Files
Partitioning Guidelines with Incremental Aggregation
Preparing for Incremental Aggregation

Using Incremental Aggregation Overview

When using incremental aggregation, you apply captured changes in the source
to aggregate calculations in a session. If the source changes incrementally and
you can capture changes, you can configure the session to process those
changes. This allows the Integration Service to update the target incrementally,
rather than forcing it to process the entire source and recalculate the same data
each time you run the session.

For example, you might have a session using a source that receives new data
every day. You can capture those incremental changes because you have added
a filter condition to the mapping that removes pre-existing data from the flow of
data. You then enable incremental aggregation.

When the session runs with incremental aggregation enabled for the first time on
March 1, you use the entire source. This allows the Integration Service to read
and store the necessary aggregate data. On March 2, when you run the session
again, you filter out all the records except those time-stamped March 2. The
Integration Service then processes the new data and updates the target
accordingly.
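
The March 1 and March 2 runs can be sketched in Python. This only illustrates the filter idea and is not PowerCenter syntax: the row layout, the stamp field, the year, and the new_records helper are all assumptions made for the example.

```python
from datetime import date

def new_records(rows, run_date):
    """Keep only rows time-stamped with the run date (hypothetical filter)."""
    return [row for row in rows if row["stamp"] == run_date]

# Hypothetical source contents; the year is arbitrary.
march_1_source = [
    {"stamp": date(2024, 3, 1), "store": "A", "amount": 100},
    {"stamp": date(2024, 3, 1), "store": "B", "amount": 50},
]
march_2_source = march_1_source + [
    {"stamp": date(2024, 3, 2), "store": "A", "amount": 25},
]

# First run (March 1): use the entire source.
first_run_rows = list(march_1_source)

# Second run (March 2): filter out everything except the new day's rows.
second_run_rows = new_records(march_2_source, date(2024, 3, 2))
```

In this sketch the second run hands a single row to the aggregation step instead of all three.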

Consider using incremental aggregation in the following circumstances:

You can capture new source data. Use incremental aggregation when you
can capture new source data each time you run the session. Use a Stored
Procedure or Filter transformation to process new data.
Incremental changes do not significantly change the target. Use
incremental aggregation when the changes do not significantly change the
target. If processing the incrementally changed source alters more than half the
existing target, the session may not benefit from using incremental aggregation.
In this case, drop the table and recreate the target with complete source data.

Note: Do not use incremental aggregation if the mapping contains percentile or
median functions. The Integration Service uses system memory to process these
functions in addition to the cache memory you configure in the session
properties. As a result, the Integration Service does not store incremental
aggregation values for percentile and median functions in disk caches.

Integration Service Processing for Incremental Aggregation


The first time you run an incremental aggregation session, the Integration
Service processes the entire source. At the end of the session, the Integration
Service stores aggregate data from that session run in two files, the index file
and the data file. The Integration Service creates the files in the cache directory
specified in the Aggregator transformation properties.

Each subsequent time you run the session with incremental aggregation, you use
the incremental source changes in the session. For each input record, the
Integration Service checks historical information in the index file for a
corresponding group. If it finds a corresponding group, the Integration Service
performs the aggregate operation incrementally, using the aggregate data for
that group, and saves the incremental change. If it does not find a corresponding
group, the Integration Service creates a new group and saves the record data.

When writing to the target, the Integration Service applies the changes to the
existing target. It saves modified aggregate data in the index and data files to be
used as historical data the next time you run the session.
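
The group lookup described above can be sketched as follows. This is a minimal illustration, not the Integration Service implementation: a Python dictionary stands in for the index and data files, the aggregates are limited to SUM and COUNT, and the field names and the apply_increment helper are hypothetical.

```python
def apply_increment(cache, rows):
    """Update per-group SUM and COUNT in place, reading only the new rows."""
    for row in rows:
        key = row["store"]
        group = cache.get(key)
        if group is None:
            # No corresponding group found: create one from the record.
            cache[key] = {"total": row["amount"], "count": 1}
        else:
            # Group found: apply the change to the stored aggregate values.
            group["total"] += row["amount"]
            group["count"] += 1
    return cache

# The dictionary plays the role of the historical aggregate data.
cache = {}

# First run: entire source.
apply_increment(cache, [{"store": "A", "amount": 100}, {"store": "B", "amount": 50}])

# Later run: incremental changes only.
apply_increment(cache, [{"store": "A", "amount": 25}, {"store": "C", "amount": 10}])
```

After the second call, group A reflects both runs, group B is unchanged, and group C is new. SUM and COUNT can be updated this way because the stored value and the new rows are enough to produce the new result.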

If the source changes significantly and you want the Integration Service to
continue saving aggregate data for future incremental changes, configure the
Integration Service to overwrite existing aggregate data with new aggregate data.

Each subsequent time you run a session with incremental aggregation, the
Integration Service creates a backup of the incremental aggregation files. The
cache directory for the Aggregator transformation must contain enough disk
space for two sets of the files.
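
One rough way to check the disk space requirement before a run is sketched below. It assumes the backup set is about the same size as the current set and that the aggregate file names start with PMAGG; the helper names are hypothetical, and the demonstration uses a temporary directory with made-up files.

```python
import os
import shutil
import tempfile

def aggregate_file_bytes(cache_dir, prefix="PMAGG"):
    """Total size of the files in cache_dir whose names start with prefix."""
    return sum(
        entry.stat().st_size
        for entry in os.scandir(cache_dir)
        if entry.is_file() and entry.name.startswith(prefix)
    )

def has_room_for_backup(cache_dir):
    """True if free space could hold a second copy of the current files."""
    return shutil.disk_usage(cache_dir).free >= aggregate_file_bytes(cache_dir)

# Demonstration against a throwaway directory with made-up file names.
with tempfile.TemporaryDirectory() as cache_dir:
    for name, size in (("PMAGG1_1_1.idx", 10), ("PMAGG1_1_1.dat", 30), ("other.tmp", 5)):
        with open(os.path.join(cache_dir, name), "wb") as handle:
            handle.write(b"\0" * size)
    current_bytes = aggregate_file_bytes(cache_dir)
    room = has_room_for_backup(cache_dir)
```

Because the files grow as the source grows, treat the result as a lower bound rather than a guarantee.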

When you partition a session that uses incremental aggregation, the Integration
Service creates one set of cache files for each partition.

The Integration Service creates new aggregate data, instead of using historical
data, when you perform one of the following tasks:

Save a new version of the mapping.
Configure the session to reinitialize the aggregate cache.
Move the aggregate files without correcting the configured path or directory for
the files in the session properties.
Change the configured path or directory for the aggregate files without moving
the files to the new location.
Delete cache files.
Decrease the number of partitions.

Reinitializing the Aggregate Files

If the source tables change significantly, you might want the Integration Service
to create new aggregate data, instead of using historical data. To have the
Integration Service create new aggregate data, configure the session to
reinitialize the aggregate cache.

For example, you can reinitialize the aggregate cache if the source for a session
changes incrementally every day and completely changes once a month. When
you receive the new source data for the month, you might configure the session
to reinitialize the aggregate cache, truncate the existing target, and use the new
source table during the session.

After you run a session that reinitializes the aggregate cache, edit the session
properties to disable the Reinitialize Aggregate Cache option. If you do not clear
Reinitialize Aggregate Cache, the Integration Service overwrites the aggregate
cache each time you run the session.

Note: When you move from Windows to UNIX, you must reinitialize the cache.
Therefore, you cannot change from a Latin1 code page to an MSLatin1 code
page, even though these code pages are compatible.

Moving or Deleting the Aggregate Files

After you run an incremental aggregation session, avoid moving or modifying the
index and data files that store historical aggregate information.

If you move the files into a different directory, and you want the Integration
Service to use the aggregate files, you must also change the path to those files in
the session properties. Likewise, if you change the path to the files but do not
move the files, the Integration Service rebuilds the files the next time you run the
session.

If you change certain session or Integration Service properties, the Integration
Service cannot use the incremental aggregation files, and it fails the session. To
avoid session failure, delete existing incremental aggregation files when you
perform any of the following tasks:

Change the Integration Service data movement mode from ASCII to Unicode or
from Unicode to ASCII.
Change the Integration Service code page to an incompatible code page.
Change the session sort order when the Integration Service runs in Unicode
mode.
Change the Enable High Precision session option.

Finding Index and Data Files

By default, the Integration Service stores the index and data files in the directory
entered in the process variable, $PMCacheDir, in the Workflow Manager. The
Integration Service names the index file PMAGG*.idx*. The Integration Service
names the data file PMAGG*.dat*.
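
If you know the cache directory, you can list candidate files by name pattern. The following is a minimal sketch using the PMAGG*.idx* and PMAGG*.dat* patterns given above; the find_aggregate_files helper is hypothetical, and the demonstration creates empty placeholder files in a temporary directory rather than reading a real cache directory.

```python
import glob
import os
import tempfile

def find_aggregate_files(cache_dir):
    """Return (index_files, data_files) matching the PMAGG naming patterns."""
    base = glob.escape(cache_dir)  # guard against wildcard characters in the path
    index_files = sorted(glob.glob(os.path.join(base, "PMAGG*.idx*")))
    data_files = sorted(glob.glob(os.path.join(base, "PMAGG*.dat*")))
    return index_files, data_files

# Demonstration with empty placeholder files.
with tempfile.TemporaryDirectory() as cache_dir:
    for name in ("PMAGG8_4_2.idx2", "PMAGG8_4_2.dat2", "session.log"):
        open(os.path.join(cache_dir, name), "w").close()
    index_files, data_files = find_aggregate_files(cache_dir)
    index_names = [os.path.basename(path) for path in index_files]
    data_names = [os.path.basename(path) for path in data_files]
```

A pattern match only shows which files exist. It does not tell you which session wrote them, so the session log remains the reliable source.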

When you run the session, the Integration Service writes the file names in the
session log. To locate the files, look in the previous session log for the SM_7034
and SM_7035 messages that indicate the cache file name and location. The
following messages show sample entries in the session log:

MAPPING> SM_7034 Aggregate Information: Index file is [C:\Informatica\PowerCenter8.0\server\infa_shared\Cache\PMAGG8_4_2.idx2]
MAPPING> SM_7035 Aggregate Information: Data file is [C:\Informatica\PowerCenter8.0\server\infa_shared\Cache\PMAGG8_4_2.dat2]
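
To pull the file locations out of a saved session log, you could search for the two message codes. The sketch below assumes the messages have the shape of the sample entries, with the path in square brackets after the message text; the regular expression and the cache_files_from_log helper are illustrative and are not part of any Informatica tool.

```python
import re

# Matches the sample entries; \s* tolerates a line break before the bracket.
CACHE_FILE_MESSAGE = re.compile(
    r"(SM_7034|SM_7035) Aggregate Information: (Index|Data) file is\s*\[([^\]]+)\]"
)

def cache_files_from_log(log_text):
    """Map 'index' and 'data' to the paths reported in the log text."""
    return {
        kind.lower(): path
        for _code, kind, path in CACHE_FILE_MESSAGE.findall(log_text)
    }

sample_log = (
    "MAPPING> SM_7034 Aggregate Information: Index file is\n"
    r"[C:\Informatica\PowerCenter8.0\server\infa_shared\Cache\PMAGG8_4_2.idx2]"
    "\n"
    "MAPPING> SM_7035 Aggregate Information: Data file is\n"
    r"[C:\Informatica\PowerCenter8.0\server\infa_shared\Cache\PMAGG8_4_2.dat2]"
    "\n"
)

files = cache_files_from_log(sample_log)
```

For a partitioned session the log may contain several such entries. This sketch keeps only the last path of each kind, so collect the matches into lists if you need every partition.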

Partitioning Guidelines with Incremental Aggregation

When you use incremental aggregation in a session with multiple partitions, the
Integration Service creates one set of cache files for each partition.

Use the following guidelines when you change the number of partitions or the
cache directory:

Change the cache directory for a partition. If you change the directory for a
partition and you want the Integration Service to reuse the cache files, you must
move the cache files for the partition associated with the changed directory.
- If you change the directory for the first partition, and you do not move the
cache files, the Integration Service rebuilds the cache files for all partitions.
- If you change the directory for partitions 2-n, and you do not move the cache
files, the Integration Service rebuilds the cache files that it cannot locate.
Decrease the number of partitions. If you delete a partition and you want the
Integration Service to reuse the cache files, you must move the cache files for
the deleted partition to the directory configured for the first partition. If you do not
move the files to the directory of the first partition, the Integration Service
rebuilds the cache files that it cannot locate.
Note: If you increase the number of partitions, the Integration Service realigns
the index and data cache files the next time you run a session. It does not need
to rebuild the files.
Move cache files. If you move cache files for a partition and you want the
Integration Service to reuse the files, you must also change the partition
directory. If you do not change the directory, the Integration Service rebuilds the
files the next time you run a session.
Delete cache files. If you delete cache files, the Integration Service rebuilds
them the next time you run a session.

If you change the number of partitions and the cache directory, you may need to
move cache files for both. For example, if you change the cache directory for the
first partition and you decrease the number of partitions, you need to move the
cache files for the deleted partition and the cache files for the partition associated
with the changed directory.

Preparing for Incremental Aggregation

When you use incremental aggregation, you need to configure both mapping and
session properties:

Implement mapping logic or filter to remove pre-existing data.
Configure the session for incremental aggregation and verify that the file
directory has enough disk space for the aggregate files.

Configuring the Mapping

Before enabling incremental aggregation, you must capture changes in source
data. You can use a Filter or Stored Procedure transformation in the mapping to
remove pre-existing source data during a session.

Configuring the Session

Use the following guidelines when you configure the session for incremental
aggregation:

Verify the location where you want to store the aggregate files. The index
and data files grow in proportion to the source data. Be sure the cache directory
has enough disk space to store historical data for the session.
When you run multiple sessions with incremental aggregation, decide where you
want the files stored. Then, enter the appropriate directory for the process
variable, $PMCacheDir, in the Workflow Manager. You can enter session-
specific directories for the index and data files. However, by using the process
variable for all sessions using incremental aggregation, you can easily change
the cache directory when necessary by changing $PMCacheDir.
Changing the cache directory without moving the files causes the Integration
Service to reinitialize the aggregate cache and gather new aggregate data.
In a grid, Integration Services rebuild incremental aggregation files they cannot
find. When an Integration Service rebuilds incremental aggregation files, it loses
aggregate history.
Verify the incremental aggregation settings in the session properties. You
can configure the session for incremental aggregation in the Performance
settings on the Properties tab.
You can also configure the session to reinitialize the aggregate cache. If you
choose to reinitialize the cache, the Workflow Manager displays a warning
indicating the Integration Service overwrites the existing cache and a reminder to
clear this option after running the session.
