By PenchalaRaju.Yanamala
When using incremental aggregation, you apply captured changes in the source
to aggregate calculations in a session. If the source changes incrementally and
you can capture changes, you can configure the session to process those
changes. This allows the Integration Service to update the target incrementally,
rather than forcing it to process the entire source and recalculate the same data
each time you run the session.
For example, you might have a session using a source that receives new data
every day. You can capture those incremental changes because you have added
a filter condition to the mapping that removes pre-existing data from the flow of
data. You then enable incremental aggregation.
When the session runs with incremental aggregation enabled for the first time on
March 1, you use the entire source. This allows the Integration Service to read
and store the necessary aggregate data. On March 2, when you run the session
again, you filter out all the records except those time-stamped March 2. The
Integration Service then processes the new data and updates the target
accordingly.
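The March 1 and March 2 runs above can be sketched in Python. This is an illustration only, not PowerCenter code: the row layout, the `sold_on` field, the year 2024, and the `capture_new_rows` helper are all hypothetical stand-ins for the filter condition in the mapping.

```python
from datetime import date

# Hypothetical source rows; the year is an assumption made for the example.
SOURCE = [
    {"store": "A", "amount": 10.0, "sold_on": date(2024, 3, 1)},
    {"store": "B", "amount": 5.0, "sold_on": date(2024, 3, 1)},
    {"store": "A", "amount": 7.5, "sold_on": date(2024, 3, 2)},
]


def capture_new_rows(rows, run_date, first_run=False):
    """Return the rows a session run should process.

    The first run uses the entire source. Later runs keep only the rows
    time-stamped with the run date, which mimics the filter condition.
    """
    if first_run:
        return list(rows)
    return [row for row in rows if row["sold_on"] == run_date]


# March 1: only the first two rows exist yet, and all of them are read.
march_1 = capture_new_rows(SOURCE[:2], date(2024, 3, 1), first_run=True)
# March 2: the source holds three rows, but only the new one passes the filter.
march_2 = capture_new_rows(SOURCE, date(2024, 3, 2))
```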
Consider using incremental aggregation in the following circumstances:
You can capture new source data. Use incremental aggregation when you
can capture new source data each time you run the session. Use a Stored
Procedure or Filter transformation to process only the new data.
Incremental changes do not significantly change the target. Use
incremental aggregation when the changes do not significantly change the
target. If processing the incrementally changed source alters more than half the
existing target, the session may not benefit from using incremental aggregation.
In this case, drop the table and recreate the target with complete source data.
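The "more than half the existing target" rule of thumb can be written as a small check. This is a sketch under assumptions: `should_rebuild_target` is a hypothetical helper, the counts are assumed to be numbers of target rows (one per aggregate group), and treating an empty target as a rebuild is a choice made for this example.

```python
def should_rebuild_target(changed_rows, existing_rows):
    """Return True when an incremental run would alter more than half of the
    existing target rows, the point at which a full reload may be the better
    choice."""
    if existing_rows == 0:
        # Assumption: with no existing target there is nothing to update
        # incrementally, so load from the complete source.
        return True
    return changed_rows / existing_rows > 0.5
```

For example, `should_rebuild_target(60, 100)` suggests a rebuild, while `should_rebuild_target(50, 100)` does not, because exactly half is not "more than half".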
Each subsequent time you run the session with incremental aggregation, you use
the incremental source changes in the session. For each input record, the
Integration Service checks historical information in the index file for a
corresponding group. If it finds a corresponding group, the Integration Service
performs the aggregate operation incrementally, using the aggregate data for
that group, and saves the incremental change. If it does not find a corresponding
group, the Integration Service creates a new group and saves the record data.
When writing to the target, the Integration Service applies the changes to the
existing target. It saves modified aggregate data in the index and data files to be
used as historical data the next time you run the session.
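The lookup-and-update cycle described above can be sketched with a dictionary standing in for the index and data files. This is a simplified illustration for a SUM aggregate only, not the Integration Service's implementation; `apply_incremental_sum` and the `(group_key, value)` row shape are hypothetical.

```python
def apply_incremental_sum(cache, new_rows):
    """Fold new rows into per-group running totals.

    `cache` maps a group key to its running sum and plays the role of the
    historical aggregate data. Returns the groups touched by this run, with
    their updated totals, which are the changes to apply to the target.
    """
    changed = {}
    for key, value in new_rows:
        if key in cache:
            cache[key] += value  # existing group: update incrementally
        else:
            cache[key] = value   # no matching group: create a new one
        changed[key] = cache[key]
    return changed


cache = {}
# First run: the entire source is read and the aggregate data is stored.
apply_incremental_sum(cache, [("A", 10.0), ("B", 5.0)])
# Later run: only new rows arrive; group A is updated and group C is created.
changes = apply_incremental_sum(cache, [("A", 7.5), ("C", 2.0)])
```

Group B is untouched by the second run, so it stays in the cache but does not appear in `changes`. Aggregates other than SUM would need different per-group state, for example a sum and a count for an average.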
If the source changes significantly and you want the Integration Service to
continue saving aggregate data for future incremental changes, configure the
Integration Service to overwrite existing aggregate data with new aggregate data.
Each subsequent time you run a session with incremental aggregation, the
Integration Service creates a backup of the incremental aggregation files. The
cache directory for the Aggregator transformation must contain enough disk
space for two sets of the files.
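A rough pre-run check of the two-sets requirement might look like the following. It is a sketch only: the helper names are hypothetical, the `PMAGG` prefix is taken from the default file names, and the check ignores growth of the files during the run.

```python
import os
import shutil


def aggregate_file_bytes(cache_dir, prefix="PMAGG"):
    """Total size in bytes of the aggregate files currently in cache_dir."""
    with os.scandir(cache_dir) as entries:
        return sum(
            entry.stat().st_size
            for entry in entries
            if entry.is_file() and entry.name.startswith(prefix)
        )


def has_room_for_backup(cache_dir, prefix="PMAGG"):
    """True if the free space on the cache directory's disk could hold a
    second copy (the backup) of the existing aggregate files."""
    free_bytes = shutil.disk_usage(cache_dir).free
    return free_bytes >= aggregate_file_bytes(cache_dir, prefix)
```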
When you partition a session that uses incremental aggregation, the Integration
Service creates one set of cache files for each partition.
The Integration Service creates new aggregate data, instead of using historical
data, when you perform one of the following tasks:
If the source tables change significantly, you might want the Integration Service
to create new aggregate data, instead of using historical data. To have the
Integration Service create new aggregate data, configure the session to
reinitialize the aggregate cache.
For example, you can reinitialize the aggregate cache if the source for a session
changes incrementally every day and completely changes once a month. When
you receive the new source data for the month, you might configure the session
to reinitialize the aggregate cache, truncate the existing target, and use the new
source table during the session.
After you run a session that reinitializes the aggregate cache, edit the session
properties to disable the Reinitialize Aggregate Cache option. If you do not clear
Reinitialize Aggregate Cache, the Integration Service overwrites the aggregate
cache each time you run the session.
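The effect of leaving the option enabled can be seen in a toy model. This sketch is not PowerCenter code: `run_session` is a hypothetical function, and a dictionary of running sums stands in for the aggregate cache.

```python
def run_session(cache, rows, reinitialize=False):
    """Toy session run: optionally discard history, then fold rows into sums."""
    if reinitialize:
        cache.clear()  # the existing aggregate cache is overwritten
    for key, value in rows:
        cache[key] = cache.get(key, 0) + value
    return cache


cache = {}
run_session(cache, [("A", 10), ("B", 5)])            # daily incremental run
run_session(cache, [("A", 100)], reinitialize=True)  # monthly full refresh
run_session(cache, [("A", 1)])                       # option cleared: history kept
kept = dict(cache)
run_session(cache, [("A", 1)], reinitialize=True)    # option left on: history lost
lost = dict(cache)
```

With the option cleared, the run after the refresh builds on the refreshed total (`kept` is `{"A": 101}`). With the option left on, the same input wipes that history first (`lost` is `{"A": 1}`).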
Note: When you move from Windows to UNIX, you must reinitialize the cache.
Therefore, you cannot change from a Latin1 code page to an MSLatin1 code
page, even though these code pages are compatible.
After you run an incremental aggregation session, avoid moving or modifying the
index and data files that store historical aggregate information.
If you move the files into a different directory and you want the Integration
Service to use the aggregate files, you must also change the path to those files in
the session properties. Likewise, if you change the path to the files but do not
move the files, the Integration Service rebuilds the files the next time you run the
session.
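The relationship between the configured path and the file location can be modeled as follows. This is a simplified sketch: `cache_action` is a hypothetical helper, and requiring both an index file and a data file before reusing history is an assumption made for the example.

```python
from pathlib import Path


def cache_action(configured_dir):
    """Return 'reuse' if aggregate files are found in the configured
    directory, otherwise 'rebuild' (history is lost and gathered again)."""
    directory = Path(configured_dir)
    if not directory.is_dir():
        return "rebuild"
    has_index = any(directory.glob("PMAGG*.idx*"))
    has_data = any(directory.glob("PMAGG*.dat*"))
    return "reuse" if has_index and has_data else "rebuild"
```

Pointing the session at a new directory without moving the files corresponds to calling `cache_action` on an empty directory, which yields `'rebuild'`.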
Change the Integration Service data movement mode from ASCII to Unicode or
from Unicode to ASCII.
Change the Integration Service code page to an incompatible code page.
Change the session sort order when the Integration Service runs in Unicode
mode.
Change the Enable High Precision session option.
By default, the Integration Service stores the index and data files in the directory
entered in the process variable, $PMCacheDir, in the Workflow Manager. The
Integration Service names the index file PMAGG*.idx*. The Integration Service
names the data file PMAGG*.dat*.
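Given those naming patterns, the files in a cache directory can be listed with a short script. This is a sketch: `find_aggregate_files` is a hypothetical helper, and it expects the directory path that $PMCacheDir resolves to, since the script does not expand the process variable itself.

```python
import glob
import os


def find_aggregate_files(cache_dir):
    """Return (index_files, data_files) matching the default
    PMAGG*.idx* and PMAGG*.dat* naming patterns in cache_dir."""
    index_files = sorted(glob.glob(os.path.join(cache_dir, "PMAGG*.idx*")))
    data_files = sorted(glob.glob(os.path.join(cache_dir, "PMAGG*.dat*")))
    return index_files, data_files
```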
When you run the session, the Integration Service writes the file names in the
session log. To locate the files, look in the previous session log for the SM_7034
and SM_7035 messages that indicate the cache file name and location. The
following messages show sample entries in the session log:
When you use incremental aggregation in a session with multiple partitions, the
Integration Service creates one set of cache files for each partition.
Use the following guidelines when you change the number of partitions or the
cache directory:
If you change the number of partitions and the cache directory, you may need to
move cache files for both. For example, if you change the cache directory for the
first partition and you decrease the number of partitions, you need to move the
cache files for the deleted partition and the cache files for the partition associated
with the changed directory.
When you use incremental aggregation, you need to configure both mapping and
session properties:
Use the following guidelines when you configure the session for incremental
aggregation:
Verify the location where you want to store the aggregate files. The index
and data files grow in proportion to the source data. Be sure the cache directory
has enough disk space to store historical data for the session.
When you run multiple sessions with incremental aggregation, decide where you
want the files stored. Then, enter the appropriate directory for the process
variable, $PMCacheDir, in the Workflow Manager. You can enter
session-specific directories for the index and data files. However, by using the
process variable for all sessions using incremental aggregation, you can easily
change the cache directory when necessary by changing $PMCacheDir.
Changing the cache directory without moving the files causes the Integration
Service to reinitialize the aggregate cache and gather new aggregate data.
In a grid, Integration Services rebuild incremental aggregation files they cannot
find. When an Integration Service rebuilds incremental aggregation files, it loses
aggregate history.
Verify the incremental aggregation settings in the session properties. You
can configure the session for incremental aggregation in the Performance
settings on the Properties tab.
You can also configure the session to reinitialize the aggregate cache. If you
choose to reinitialize the cache, the Workflow Manager displays a warning
indicating the Integration Service overwrites the existing cache and a reminder to
clear this option after running the session.