Figure 3: Changes processed by the second subscriber followed by a purge of the consumed
records.
In the above example, if changes had occurred before 8:00pm but after GL_INTEGRATION last
processed the changes (i.e. 7:15pm), these changes would not be purged until
GL_INTEGRATION has processed them all (i.e. 8:15pm).
To process the changes, ODI applies a logical lock on the records that it is about to process. Then
the records are processed, and the unlock step determines whether the records have to be purged,
based on the other subscribers' consumption of the changes.
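The purge rule described above can be sketched in a few lines of Python. This is a simplified illustration, not the code that ODI generates: it assumes that each subscriber's progress can be summarized by the time of its last run, and the names purgeable, SECOND_SUBSCRIBER and the calendar date are hypothetical. Only GL_INTEGRATION and the times of day come from the example.

```python
from datetime import datetime

def purgeable(changes, last_processed):
    """Return the changes that every subscriber has already consumed."""
    # A change can only be removed once the slowest subscriber is past it.
    cutoff = min(last_processed.values())
    return [c for c in changes if c["changed_at"] <= cutoff]

# Hypothetical data: the date is arbitrary, the times follow the example.
changes = [
    {"pk": 1, "changed_at": datetime(2024, 1, 1, 19, 0)},
    {"pk": 2, "changed_at": datetime(2024, 1, 1, 19, 30)},
    {"pk": 3, "changed_at": datetime(2024, 1, 1, 19, 50)},
]
last_processed = {
    "GL_INTEGRATION": datetime(2024, 1, 1, 19, 15),
    "SECOND_SUBSCRIBER": datetime(2024, 1, 1, 20, 0),  # hypothetical name
}

to_purge = purgeable(changes, last_processed)
for change in to_purge:
    print("purge", change["pk"])
```

With these values only the change made before 7:15pm is purged. The two later changes stay in place until GL_INTEGRATION runs again.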
Two views are created: the JV$ view and the JV$D view.
The JV$ view is used in the mappings where you select the option Journalized data only. Figure 4
shows where to find this option in the Physical tab of the mappings:
Figure 5: Selecting the subscriber name in the mapping options to consume changes.
The subscriber name does not have to be hard-coded: you can use an ODI variable to store this
name and use the variable in the filter.
The JV$D view is used to show the list of changes available in the J$ table when you select the
menu Journal Data from the CDC menu under the models and datastores. Figure 6 shows how to
access this menu:
Because of these limitations though, the most recent and most efficient JKMs provided out of the box
with ODI are all Consistent set JKMs. One important caveat with simple CDC JKMs is that they
create one entry per subscriber in the J$ table for every single changed row. If you have two
subscribers, each change generates two records in the J$ table. Having three subscribers means
three entries in the J$ table for each change. You can immediately see that this implementation
works for basic cases, but it is very limited when you want to expand your infrastructure.
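To make the growth of the J$ table concrete, here is a minimal Python sketch of the behavior described above. It is an illustration under assumptions, not the trigger code generated by a JKM: the function journal_change, the subscriber names and the flag values are hypothetical, and only the column names are taken from the J$ table structure.

```python
def journal_change(j_table, subscribers, pk, flag):
    """Record one source change: one J$ row per subscriber."""
    for subscriber in subscribers:
        j_table.append({
            "JRN_SUBSCRIBER": subscriber,
            "JRN_CONSUMED": "0",   # assumed marker for "not consumed yet"
            "JRN_FLAG": flag,      # assumed: "I" for insert/update, "D" for delete
            "PK": pk,
        })

j_table = []
subscribers = ["SUB_A", "SUB_B", "SUB_C"]  # hypothetical subscriber names

journal_change(j_table, subscribers, pk=101, flag="I")
journal_change(j_table, subscribers, pk=102, flag="D")

print(len(j_table), "rows for 2 changes and", len(subscribers), "subscribers")
```

Two changes with three subscribers already produce six rows, which is why the cost of this approach grows with every subscriber that is added.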
When using Simple CDC JKMs, the lock, unlock and purge operations are performed in the IKM:
each IKM has the necessary steps for these operations, and these steps are only executed if:
The JKM used for journalizing in the model that contains the source table is a Simple CDC
JKM
Parent records must be processed first, or child records cannot be inserted (they would be
referencing invalid foreign keys).
ODI needs to mark the records that are about to be processed (this is the logical lock
mentioned earlier), and then process them. But while we are processing the parent records,
changes to additional parent and child records can be written to the CDC tables. The challenge is
that by the time we lock the child records in order to process them, the parent records for the most
recently arrived changes have not been processed yet. Figure 7 below illustrates this: if ODI starts
processing the changes in the Orders table at 12:00:00, and then starts processing the changes
in the Order Lines table at 12:00:02, the parent record for order lines 4 and 5 is missing in the
target environment: order #3 had not arrived yet when the Orders changes were processed.
Figure 7: Parent and children records arriving during the processing of changes
When you define the parameters for consistent set CDC, you have to define the parent-child
relationship between the tables. To do so, you have to edit the Model that contains these tables and
select the Journalized Tables tab. You can either use the Reorganize button to have ODI compute
the dependencies for you based on the foreign keys available in the model, or you can manually set
the order. Parent tables should be at the top; child tables (the ones referencing the parents)
should be at the bottom.
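Ordering tables so that parents come before children is a topological sort of the foreign key graph. The Python sketch below shows the idea with the standard graphlib module (Python 3.9 or later). It is not the algorithm behind the Reorganize button, which is not documented here; the function name journalizing_order and the foreign key map are assumptions made for the illustration.

```python
from graphlib import TopologicalSorter

def journalizing_order(foreign_keys):
    """Order tables so that every parent comes before its children.

    foreign_keys maps each table to the set of parent tables it references.
    """
    return list(TopologicalSorter(foreign_keys).static_order())

# Hypothetical foreign key map built from table names used in this document.
foreign_keys = {
    "ORDERS": set(),
    "ORDER_LINES": {"ORDERS"},
    "SUPPLIERS": set(),
    "PRODUCT_RATINGS": {"SUPPLIERS"},
}

order = journalizing_order(foreign_keys)
print(order)
```

The exact position of unrelated tables may vary, but ORDERS always precedes ORDER_LINES and SUPPLIERS always precedes PRODUCT_RATINGS. A circular reference raises graphlib.CycleError, which is a case where the order would have to be set manually.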
In Figure 8 we see a Diagram that was created under the model that hosts the journalized tables to
represent the relationships between the tables. To reproduce this, create a Diagram under your
Model, then drag and drop the selected tables in that diagram: the foreign keys will automatically be
represented as arrows by the ODI Studio.
Figure 8: ODI Diagram that represents the parent-child relationship in a set of tables.
In the illustration shown in Figure 9 we would have to move PRODUCT_RATINGS down the list
because of its reference to the SUPPLIERS table.
Make sure that all records have a window_id, then identify the highest available window_ids
(this is the Extend Window operation)
Define the array of window_ids to be processed by the subscribers (this is the Lock
Subscriber operation).
These operations are performed in the packages before processing the interfaces where CDC data
is processed as shown in Figure 10. After the data has been processed, the subscribers must be
unlocked and the J$ table can be purged of the consumed records.
Extend window
Either the window_id column of the J$ table is updated by the detection mechanism (as is the case
with GoldenGate JKMs) or it is not (as is the case with trigger based JKMs). In all cases, the
SNP_CDC_SET table is first updated with the new computed window_id for the CDC Set that is
being processed. The window_id is computed from the checkpoint table for GoldenGate JKMs or is
based on an increment of the last used value (found in the SNP_CDC_SET table) for other JKMs.
For non-GoldenGate JKMs, all records of the J$ table that do not have a window_id yet (the value
is null) are updated with this new window_id value so that the records can be processed:
these are records that were written to the J$ table after the last processing of changes and were
never assigned a window_id.
Again, GoldenGate writes this window_id as it inserts records into the J$ table.
Lock subscriber
For all JKMs, the subscribers have to be locked: their processing windows are set to range between
the last processed window_id (which is the minimum window_id) and the newly computed
window_id (which is the maximum window_id).
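The Extend Window and Lock Subscriber operations can be sketched as follows for a non-GoldenGate JKM. This is a simplified model in Python, not the SQL that the JKM executes. The column names come from the CDC infrastructure tables, while the function names, the increment of 1, the exclusive lower bound of the window and the set name ORDERS_SET are assumptions.

```python
def extend_window(cdc_set, j_rows):
    """Compute a new window_id and stamp the rows that have none yet."""
    new_id = cdc_set["CUR_WINDOW_ID"] + 1  # assumed increment of 1
    cdc_set["CUR_WINDOW_ID"] = new_id
    for row in j_rows:
        if row["WINDOW_ID"] is None:
            row["WINDOW_ID"] = new_id
    return new_id

def lock_subscriber(subscriber, cdc_set):
    """Open the subscriber's window up to the newly computed window_id."""
    subscriber["MAX_WINDOW_ID"] = cdc_set["CUR_WINDOW_ID"]

def rows_in_window(subscriber, j_rows):
    """Rows the subscriber may consume (bounds are an assumed convention)."""
    low, high = subscriber["MIN_WINDOW_ID"], subscriber["MAX_WINDOW_ID"]
    return [r for r in j_rows
            if r["WINDOW_ID"] is not None and low < r["WINDOW_ID"] <= high]

cdc_set = {"CDC_SET_NAME": "ORDERS_SET", "CUR_WINDOW_ID": 7}  # hypothetical set
subscriber = {"MIN_WINDOW_ID": 7, "MAX_WINDOW_ID": 7}
j_rows = [
    {"PK": 1, "WINDOW_ID": 7},     # already consumed in the previous run
    {"PK": 2, "WINDOW_ID": None},  # written since the previous run
    {"PK": 3, "WINDOW_ID": None},
]

extend_window(cdc_set, j_rows)
lock_subscriber(subscriber, cdc_set)
locked = rows_in_window(subscriber, j_rows)

# A change that arrives after the lock has no window_id and stays outside.
j_rows.append({"PK": 4, "WINDOW_ID": None})
print([r["PK"] for r in rows_in_window(subscriber, j_rows)])
```

Because the window is fixed once for the whole set, parent and child tables are read with the same bounds, and a record that arrives during processing waits for the next run.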
JRN_TNAME (PK)
JRN_SUBSCRIBER (PK)
JRN_REFDATE
JRN_ROW_COUNT
JRN_DATA_CMD
JRN_COUNT_CMD
JRN_SUBSCRIBER
JRN_CONSUMED
JRN_FLAG
JRN_DATE
PK_x
This shows that ODI does not replicate the transactions; it does an integration of the data as they
are at the time the integration process runs. Oracle GoldenGate replicates the transactions as they
occur on the source system.
If the same PK appears multiple times, only the last entry for that PK (based on the
JRN_DATE) is taken into account. Again the logic here is that we want to replicate values as they
are currently in the source database. We are not interested in the history of intermediate values
that could have existed.
An additional filter is added in the mappings at design time so that only the records for the selected
subscriber are consumed from the J$ table, as we saw in Figure 5.
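The effect of the view and of the subscriber filter can be illustrated with a short Python sketch. It is not the definition of the JV$ view: the function latest_changes, the sample rows and the flag values are hypothetical, and only the column names are taken from the J$ table.

```python
from datetime import datetime

def latest_changes(j_rows, subscriber):
    """Keep, for one subscriber, only the most recent J$ entry per PK."""
    latest = {}
    for row in j_rows:
        if row["JRN_SUBSCRIBER"] != subscriber:
            continue  # this is the subscriber filter added in the mapping
        kept = latest.get(row["PK"])
        if kept is None or row["JRN_DATE"] >= kept["JRN_DATE"]:
            latest[row["PK"]] = row
    return list(latest.values())

# Hypothetical journal content; flag values "I" and "D" are assumed.
j_rows = [
    {"JRN_SUBSCRIBER": "GL_INTEGRATION", "PK": 10, "JRN_FLAG": "I",
     "JRN_DATE": datetime(2024, 1, 1, 9, 0)},
    {"JRN_SUBSCRIBER": "GL_INTEGRATION", "PK": 10, "JRN_FLAG": "D",
     "JRN_DATE": datetime(2024, 1, 1, 9, 5)},
    {"JRN_SUBSCRIBER": "GL_INTEGRATION", "PK": 11, "JRN_FLAG": "I",
     "JRN_DATE": datetime(2024, 1, 1, 9, 2)},
    {"JRN_SUBSCRIBER": "OTHER_SUBSCRIBER", "PK": 12, "JRN_FLAG": "I",
     "JRN_DATE": datetime(2024, 1, 1, 9, 3)},
]

result = latest_changes(j_rows, "GL_INTEGRATION")
for row in result:
    print(row["PK"], row["JRN_FLAG"])
```

Row 10 was inserted and then deleted, so only the delete is kept: the intermediate state is never sent to the target.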
SNP_CDC_SET
CDC_SET_NAME (PK)
CUR_WINDOW_ID
CUR_WINDOW_ID_DEL
CUR_WINDOW_ID_INS
CDC_REFDATE
MIN_WINDOW_ID
MAX_WINDOW_ID
MAX_WINDOW_ID_DEL
CDC_SET_NAME
FULL_TABLE_NAME (PK)
FULL_DATA_VIEW
SNP_CDC_OBJECTS
FULL_TABLE_NAME (PK)
CDC_OBJECT_TYPE (PK)
FULL_OBJECT_NAME
DB_OBJECT_TYPE
This table is leveraged to make sure that ODI does not attempt to recreate an object that has
already been created (see section 4.1 Only creating the J$ tables and views if they do not exist).
WINDOW_ID
PK_x
Records where the window_id is between the minimum and maximum window_id for the
subscribers;
If the same PK appears multiple times, only the last entry for that PK is taken into account.
The logic here is that we want to replicate values as they currently are in the source database:
we are not interested in the history of intermediate values that could have existed.
A filter created in the mappings allows the developers to select the subscriber for which the changes
are consumed, as we saw in Figure 5.
The JV$D view uses the same approach to remove duplicate entries, but it shows all entries
available to all subscribers, including the ones that have not been assigned a window_id yet.
Low impact on the source system: GoldenGate reads the changes directly from the database
logs, and as such does not require any additional database activity.
Performance of the end-to-end solution: even though the large majority of ODI customers
run their ODI processes as batch jobs, some customers are continuously reducing their processing
windows. Using GoldenGate for CDC allows for unique end-to-end performance, with
customers achieving end-to-end latency of under 10 seconds across heterogeneous systems: this
includes GoldenGate detection of the changes, replication of the changes, transformations by
ODI and commit in the target system.
Heterogeneous capabilities: both ODI and GoldenGate can operate on many databases
available on the market, allowing for more flexibility in the data integration infrastructure.
The ODI JKMs generate the necessary files for GoldenGate to replicate the data and
update the ODI J$ tables (oby and prm files for the capture, pump and apply processes),
including the window_id.
These files instruct GoldenGate to write the PK of the changed records and to update the
window_id for that change. The window_id is computed by concatenating the sequence number
and the RBA from the GoldenGate checkpoint file with this expression:
@STRCAT(@GETENV("RECORD", "FILESEQNO"), @STRNUM(@GETENV("RECORD", "FILERBA"), RIGHTZERO, 10))
If you are using OGG Online JKMs, ODI can issue the commands using the GoldenGate
JAgent and execute these commands directly. If not, ODI generates a readme file along with the
oby and prm files. This file provides all the necessary instructions to configure and start the
GoldenGate replication using the generated files.
If you already have a GoldenGate replication in place, you can read the prm files generated
by ODI to see what needs to be changed in your configuration so that you update the J$ tables
(or read the next section for an explanation of how this works).
The second one makes sure that the J$ table is updated at the same time as the staging table.
GoldenGate in this case has two targets when it replicates the changes.
map <Source_table_name>, target <J$_Table_name>, KEYCOLS (PK1, PK2, ..., PKn,
WINDOW_ID), INSERTALLRECORDS, OVERRIDEDUPS,
COLMAP (
PK1 = PK1,
PK2 = PK2,
...
PKn = PKn,
WINDOW_ID = @STRCAT(@GETENV("RECORD", "FILESEQNO"), @STRNUM(@GETENV("RECORD",
"FILERBA"), RIGHTZERO, 10))
);
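The WINDOW_ID expression in the map statement above concatenates the trail file sequence number with the RBA, right-justified and zero-filled to 10 characters. The Python sketch below reproduces that arithmetic so the result can be checked by hand. The function name window_id is hypothetical, and returning the value as an integer is an assumption about how the column is stored.

```python
def window_id(file_seqno: int, file_rba: int) -> int:
    """Concatenate the file sequence number with the RBA padded to 10 digits."""
    # Mirrors @STRCAT(FILESEQNO, @STRNUM(FILERBA, RIGHTZERO, 10)).
    # Assumes the RBA fits in 10 digits, as the expression itself does.
    return int(f"{file_seqno}{file_rba:010d}")

example = window_id(12, 3456)
print(example)  # 120000003456

# Padding keeps the ids in order when GoldenGate moves to the next trail file.
assert window_id(12, 9_999_999_999) < window_id(13, 0)
```

The fixed-width padding is what makes the concatenation sortable: without it, a large RBA in one file could compare higher than a small RBA in the next file.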
If you already have GoldenGate in place to replicate data from the source tables into a staging area,
you may not be interested in using the files generated by ODI. You have already configured and
fine-tuned your environment, and you do not want to override your configuration. All you need to do
in that case is to add the additional maps for GoldenGate to update the ODI J$ tables.
In ODI 11g the source table for an initial load was different from the source table used with
GoldenGate for CDC: the GoldenGate replicat table had to be used explicitly as a source table in
CDC configurations. With the 12c implementation of the GoldenGate JKMs, the same original
source table is used in the mappings for both initial loads and incremental loads using
GoldenGate. For CDC, the GoldenGate source becomes the source table in the mappings for
CDC. The GoldenGate replicat is considered as a staging table and as such is not represented in
the ODI mappings anymore. David Allan has a very good pictorial representation of the new
paradigm available here:
https://blogs.oracle.com/dataintegration/resource/odi_12c/odi_12c_ogg_configuration.jpg.
The new JKMs allow for online or offline use of GoldenGate: in online mode, ODI
communicates directly with the GoldenGate JAgent to distribute the configuration parameters.
The offline mode is similar to what was available in ODI 11g.
Figure 11: Repeating the code for all tables of the CDC set in the appropriate order
Note that since GoldenGate updates the window_ids directly for ODI, the matching step does not
exist in the GoldenGate JKMs. But the same technique of processing tables of the set in the
appropriate order is leveraged when creating or dropping the infrastructure (look at the Create J$
and Drop J$ tasks for instance in the GoldenGate JKMs).