PROCESSING STAGES:
Aggregator Stage:
Aggregator stage is an Active stage. It accepts one input link and one output link. It is used for aggregating data based on a key column: we group the data on a column and perform calculations on each group.
This stage takes data from its input and performs aggregations on each group. We can do calculations such as Sum, Average, Min, Max, Percentage, and Count of the values, based on the keys from the input.
For better performance, the key column should be hash partitioned and sorted before aggregation.
Properties:
-- Group key: The key on which we group the data. Grouping arranges similar records together, which makes any subsequent calculation easier.
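As a rough illustration (plain Python, not DataStage syntax), grouping on a key and computing the aggregates named above might look like the sketch below; the store/sales columns are made-up examples:

```python
# Sketch of Aggregator behavior: group rows on a key column, then
# compute Sum / Min / Max / Count / Average per group.
from itertools import groupby
from operator import itemgetter

rows = [
    {"store": "NY", "sales": 100},
    {"store": "NY", "sales": 300},
    {"store": "LA", "sales": 200},
]

# Sort on the group key first, mirroring the recommendation that the
# key column be sorted (and hash partitioned) before aggregation.
rows.sort(key=itemgetter("store"))

result = {}
for store, group in groupby(rows, key=itemgetter("store")):
    values = [r["sales"] for r in group]
    result[store] = {
        "sum": sum(values),
        "min": min(values),
        "max": max(values),
        "count": len(values),
        "avg": sum(values) / len(values),
    }

print(result["NY"]["sum"])  # 400
```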
COPY stage:
Copy stage is an Active stage. It can have a single input link and any number of output links. It is used to copy a single input data set to multiple output data sets; each record from the input is copied to every output data set. This stage is used to take a backup of a data set to another location on disk, or when we want multiple copies of the data for different processes in other jobs. The records themselves are not altered in any way.
Properties:
-- Force = True / False
When the job has a single input and a single output, set Force to True; when there is a single input and multiple outputs, set it to False. By default it is set to False.
DATASTAGE PX
Processing stages
27/02/2009
Funnel:
Funnel stage is an Active stage. It is used to copy multiple input data sets to a single output data set. It accepts any number of input links and only one output link. All input data sets must have the same metadata for this stage.
Properties:
Funnel type: Continuous Funnel / Sort Funnel / Sequential funnel
Continuous Funnel: It picks input records in no particular order, taking one record from each input link in turn. If data is not available on one input link, the stage skips to the next link rather than waiting for the data.
Sort Funnel: It combines the input records in the order defined by the key columns, and the order of the output records is determined by these sorting keys. All input data sets for a sort funnel must be hash partitioned before they are sorted. This ensures that all records with the same key column value are located in the same partition and are processed on the same node.
Sequential funnel: It copies all records from the first input data set to the output data set, then
all the records from the second input data set, and so on.
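The three funnel types can be contrasted with a small plain-Python sketch (not DataStage syntax); the two input links and the id key column are made-up examples:

```python
# Sketch of the three funnel types combining two inputs with the same
# "metadata" into one output.
import heapq

link1 = [{"id": 3}, {"id": 1}]
link2 = [{"id": 2}, {"id": 4}]

# Sequential funnel: all of link1, then all of link2.
sequence_out = link1 + link2

# Sort funnel: merge inputs that are already sorted on the key column.
sort_out = list(heapq.merge(sorted(link1, key=lambda r: r["id"]),
                            sorted(link2, key=lambda r: r["id"]),
                            key=lambda r: r["id"]))

# Continuous funnel: take one record from each link in turn,
# skipping a link once it has no more data.
continuous_out = []
links = [iter(link1), iter(link2)]
while links:
    for it in links[:]:
        try:
            continuous_out.append(next(it))
        except StopIteration:
            links.remove(it)

print([r["id"] for r in sort_out])  # [1, 2, 3, 4]
```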
Filter stage:
Filter stage is an Active stage. It can have one input link, any number of output links, and a single reject link.
This stage routes the input records according to the given conditions and filters out the remaining records that do not satisfy any condition. The filtered-out records can also be sent to the reject link.
Properties:
Where clause: Here we provide a condition such as Store_Loc='New York'. Only the records that satisfy this condition are populated to the output. We can give multiple conditions on different columns from the input link.
Output rejects: True / False
If this option is set to True, records that do not match any condition are sent to the reject link; if set to False, the unmatched records are ignored.
Output records only once: True / False
When we give multiple conditions, a single record may sometimes satisfy more than one of them. Setting this option to False allows such a valid record to be propagated to every output whose condition it satisfies. Setting it to True prevents a record from going to multiple outputs even when it satisfies multiple conditions.
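The routing behavior of the Where clause, Output rejects, and Output records only once properties can be sketched in plain Python (not DataStage syntax); the store/sales columns and conditions are made-up examples:

```python
# Sketch of Filter routing: each Where clause feeds one output link;
# rows matching no condition go to the reject link.
rows = [
    {"store": "New York", "sales": 500},
    {"store": "Boston",   "sales": 900},
    {"store": "Chicago",  "sales": 100},
]

out_ny, out_big, rejects = [], [], []
conditions = [
    (lambda r: r["store"] == "New York", out_ny),  # Where clause 1
    (lambda r: r["sales"] > 400, out_big),         # Where clause 2
]

output_records_only_once = False  # False: a row may go to several links
for row in rows:
    matched = False
    for cond, link in conditions:
        if cond(row):
            link.append(row)
            matched = True
            if output_records_only_once:
                break
    if not matched:
        rejects.append(row)  # Output rejects = True
```

With Output records only once set to True, the New York row would reach only the first matching link instead of both.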
Sort stage:
Sort stage is an Active stage. This stage has one input link, which carries the data to be sorted, and a single output link, carrying the sorted data. We specify sorting keys as the criteria on which to perform the sort, and we can specify more than one key: the first is called the primary key and the others are secondary keys. If multiple records have the same value for the primary key column, the stage uses the secondary key columns to sort those records. The stage uses temporary disk space when performing the sort operation.
Properties:
Sort key: The column on which the data is to be sorted. For example, if we sort on City_Code, the data is ordered on that key and populated to the output.
Allow Duplicates: True / False
If multiple records have identical sort key values, only one record is retained when this is set to False; when set to True (the default), duplicate records are also populated to the output.
If Stable Sort is True, then the first record is the one retained. This property is not available for the UNIX sort type.
Sort Utility: DataStage / UNIX
--- DataStage: The default. This uses the built-in DataStage sorter; you do not require any additional software to use this option.
--- UNIX: This specifies that the UNIX sort command is used to perform the sort.
Stable Sort: Applicable when the Sort Utility is DataStage. True guarantees that the sort operation will not rearrange records that are already in a properly sorted data set. If set to False, no prior ordering of records is guaranteed to be preserved by the sorting operation.
Output Statistics: True / False
If set to True, the sort operation outputs statistics. This property is not available for the UNIX sort type. It is set to False by default.
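The interaction of primary/secondary keys, Stable Sort, and Allow Duplicates can be sketched in plain Python (not DataStage syntax); the city/emp columns are made-up, and the seq field only records input order:

```python
# Sketch of Sort stage behavior: primary + secondary keys, stable
# sorting, and Allow Duplicates = False keeping one row per key.
rows = [
    {"city": "NY", "emp": 2, "seq": 0},
    {"city": "LA", "emp": 1, "seq": 1},
    {"city": "NY", "emp": 2, "seq": 2},
    {"city": "NY", "emp": 1, "seq": 3},
]

# Primary key: city; secondary key: emp. Python's sort is stable, so
# rows with identical keys keep their input order (Stable Sort = True).
rows_sorted = sorted(rows, key=lambda r: (r["city"], r["emp"]))

# Allow Duplicates = False: keep only the first record per key value.
seen, deduped = set(), []
for r in rows_sorted:
    key = (r["city"], r["emp"])
    if key not in seen:
        seen.add(key)
        deduped.append(r)
```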
Change Capture stage:
Properties:
Change key: Specifies the name of a difference key input column; we can specify multiple difference key input columns here. This is the key on which we identify a record as a new, updated, or deleted record.
--- Sort order: Ascending / Descending
We can sort the key values in ascending or descending order for better performance.
Change Value: The name of an input value column, which is also used to identify changes in the input records. We can select a column from the drop-down list.
Change Mode: Explicit Keys & Values / All keys Explicit values / Explicit Keys, All Values
This mode determines how keys and values are specified. Choose Explicit Keys & Values to
specify the keys and values yourself. Choose All keys Explicit values to specify that value
columns must be defined, but all other columns are key columns unless excluded. Choose
Explicit Keys, All Values to specify that key columns must be defined but all other columns are
value columns unless they are excluded.
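The change detection these key and value properties drive can be sketched in plain Python (not DataStage syntax). This is illustrative only: the id/qty columns are made up, and the real stage emits a change code column rather than separate lists:

```python
# Sketch of change detection: compare a "before" and an "after" data
# set on the change key (id), using the change value column (qty) to
# spot updated records.
before = {r["id"]: r for r in [{"id": 1, "qty": 5}, {"id": 2, "qty": 7}]}
after  = {r["id"]: r for r in [{"id": 2, "qty": 9}, {"id": 3, "qty": 1}]}

# Keys only in "after" are new records.
inserts = [after[k] for k in after.keys() - before.keys()]
# Keys only in "before" are deleted records.
deletes = [before[k] for k in before.keys() - after.keys()]
# Shared keys whose value column differs are updated records.
updates = [after[k] for k in after.keys() & before.keys()
           if after[k]["qty"] != before[k]["qty"]]
```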
Modify Stage:
Modify stage is a processing stage. It can have one input link and a single output link.
The Modify stage alters the record schema of its input data set; the modified data set is then output. We can change data types, rename columns, drop columns, keep columns, and handle nulls.
For example, you can use the conversion hours_from_time to convert a time to an int8, or to an int16, int32, and so on. We have to provide the specification for the destination column in the properties window.
Syntax for conversion:
new_columnname:new_type = conversion_function (old_columnname)
Column_Name=Handle_Null('Column_Name',Value)
Properties:
Options:
Specification: Here we need to specify the conversion for the output column.
Ex: HIREDATE = date_from_timestamp (HIREDATE)
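The kinds of per-record changes described above (a type conversion, a rename, null handling) can be sketched in plain Python rather than Modify syntax; the EMPNAME and BONUS columns are made-up examples:

```python
# Sketch of Modify-style record alterations applied to one record.
from datetime import datetime

def modify(record):
    out = dict(record)
    # Like HIREDATE = date_from_timestamp(HIREDATE): timestamp -> date.
    out["HIREDATE"] = datetime.fromisoformat(out["HIREDATE"]).date().isoformat()
    # Rename: NAME = EMPNAME (new_columnname = old_columnname).
    out["NAME"] = out.pop("EMPNAME")
    # Like Handle_Null('BONUS', 0): replace nulls with a default value.
    if out["BONUS"] is None:
        out["BONUS"] = 0
    return out

row = modify({"HIREDATE": "2009-02-27 10:30:00",
              "EMPNAME": "Ann", "BONUS": None})
print(row["HIREDATE"])  # 2009-02-27
```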
Switch Stage:
Switch stage is a processing stage. It can have one input link, up to 128 output links, and a single reject link. The Switch stage takes a single data set as input and assigns each input row to an output data set based on the value of a selector column. This stage performs an operation similar to a C switch statement. Rows that satisfy none of the cases are populated to the reject link.
Properties:
Selector: Specifies the input column that the switch applies to. Unlike the Filter stage, we can specify multiple conditions on a single column here.
Options Category:
If not found: Fail / Drop / Output
Specifies the action to take if a row fails to match any of the case statements. It is not visible if you choose a Selector Mode of Hash. We can choose between the following options:
Fail: Causes the job to fail.
Drop: Drops the record.
Output: Record will be sent to the Reject link.
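The case routing and the If not found action can be sketched in plain Python (not DataStage syntax); the region column and its case values are made-up examples:

```python
# Sketch of Switch routing: the selector column value picks an output
# link, like a C switch; unmatched rows follow the "If not found"
# action (here: Output, i.e. the reject link).
rows = [{"region": "N"}, {"region": "S"}, {"region": "X"}]

outputs = {"N": [], "S": []}   # case value -> output link
rejects = []

for row in rows:
    link = outputs.get(row["region"])   # selector column: region
    if link is not None:
        link.append(row)
    else:
        rejects.append(row)             # If not found = Output
```

With If not found = Fail the unmatched row would abort the job instead; with Drop it would simply be discarded.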
Pivot Stage:
Pivot stage is an Active stage. It accepts one input link and a single output link. It converts multiple columns into rows. The data types of the input columns being pivoted must be the same, and the output column is created with that same data type.
Scenario: Let us assume that Mark-1 and Mark-2 are two columns in the input data set, and we need to convert these two columns into one column named "Marks". We have to provide the derivation in the Derivation field of the output column Marks; thus a new column "Marks" is derived from the input columns Mark-1 and Mark-2.
We need to provide the following derivation for the Marks column:
Marks = Mark-1, Mark-2
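The effect of that derivation can be sketched in plain Python (not DataStage syntax); the student column is a made-up example:

```python
# Sketch of a horizontal pivot: the Mark-1 and Mark-2 columns of each
# input row become two output rows with a single Marks column.
rows = [{"student": "A", "Mark-1": 60, "Mark-2": 75}]

pivoted = []
for r in rows:
    for col in ("Mark-1", "Mark-2"):   # derivation: Mark-1, Mark-2
        pivoted.append({"student": r["student"], "Marks": r[col]})
```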
JOIN Stage:
Join stage is a processing stage. It has any number of input links (at least two) and a single output link; it does not allow a reject link. It performs join operations on the data sets input to the stage and then outputs the resulting data set. The input data sets are called the left set, the right set, and (when there are more than two) intermediate sets; you can specify which is which.
The Join stage can perform four join operations: Inner Join, Left Outer Join, Right Outer Join, and Full Outer Join. The default is Inner Join.
The data sets input to the Join stage must be key partitioned and sorted. This ensures that rows with the same key column values are located in the same partition and are processed by the same node. Choosing the Auto partitioning method will ensure that partitioning and sorting are done. If sorting and partitioning are carried out in a separate stage before the Join stage, DataStage in Auto mode will detect this and won't repartition the data again.
Properties:
Join Key: This is the Column name on which the input tables are joined together and matched
data is sent to the output data set. We can select multiple keys for joining tables.
Join type: Inner / Left Outer / Right Outer / Full Outer
Inner Join: Transfers records whose key columns contain equal values from the input data sets to the output data set. Records whose key columns do not contain equal values are dropped.
Left Outer Join: Transfers all values from the left data set but transfers values from the
right data set and intermediate data sets only where key columns match. The
stage drops the key column from the right and intermediate data sets.
Right Outer Join: Transfers all values from the right data set and transfers values from
the left data set and intermediate data sets only where key columns match. The
stage drops the key column from the left and intermediate data sets.
Full Outer Join: Transfers records in which the contents of the key columns are equal
from the left and right input data sets to the output data set. It also transfers
records whose key columns contain unequal values from both input data sets to
the output data set. Full outer joins do not support more than two input
links.
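Inner and left outer behavior on two inputs can be sketched in plain Python (not DataStage syntax); the id/name/city columns are made-up examples:

```python
# Sketch of inner and left outer joins on the key column id.
left  = [{"id": 1, "name": "Ann"}, {"id": 2, "name": "Bob"}]
right = [{"id": 2, "city": "NY"},  {"id": 3, "city": "LA"}]

right_by_key = {r["id"]: r for r in right}

# Inner join: only rows whose key appears in both inputs.
inner = [{**l, **right_by_key[l["id"]]}
         for l in left if l["id"] in right_by_key]

# Left outer join: every left row, right columns null when unmatched.
left_outer = [{**l, **right_by_key.get(l["id"], {"city": None})}
              for l in left]
```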
Merge Stage:
Merge stage is a processing stage. It can have any number of input links, a single output
link, and the same number of reject links as there are update input links.
Merge Stage combines a sorted master data set with one or more update data sets. The
columns from the records in the master and update data sets are merged so that the output record
contains all the columns from the master record plus any additional columns from each update
record. A master record and an update record are merged only if both of them have the same
values for the merge key column(s) that you specify. Merge key columns are one or more
columns that exist in both the master and update records.
Unlike Join stage and Lookup stage, the Merge stage allows you to specify several reject
links. You must have the same number of reject links as you have update links. You can also
specify whether to drop unmatched master rows, or output them on the output data link.
The data sets input to the Merge stage must be key partitioned and sorted. This ensures
that rows with the same key column values are located in the same partition and will be
processed by the same node.
Properties:
Merge Key: The key column that exists in both the master and update records. We can select the common key from the drop-down list, and we can give multiple key columns on which to merge the tables.
Sort order: Ascending / Descending
Options:
Unmatched master mode = Drop / Keep
If set to Keep, unmatched rows from the master link are output to the merged data set. If set to Drop, the unmatched master records are dropped. It is set to Keep by default.
Warn on reject updates = True / False
Warn on Unmatched Masters = True / False
Warn On Unmatched Masters: This warns you when records from the master link are not matched. Set it to False to receive no warnings. It is set to True by default.
Warn On Reject Updates: This warns you when records from any update link are rejected. Set it to False to receive no warnings. It is set to True by default.
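The master/update matching described above can be sketched in plain Python (not DataStage syntax); the id/name/bonus columns are made-up examples:

```python
# Sketch of Merge: a sorted master plus one update set, matched on the
# merge key; unmatched update rows go to that update link's reject
# link, and unmatched masters are kept or dropped per
# Unmatched Master Mode.
master  = [{"id": 1, "name": "Ann"}, {"id": 2, "name": "Bob"}]
updates = [{"id": 2, "bonus": 50},   {"id": 9, "bonus": 10}]

updates_by_key = {u["id"]: u for u in updates}
merged = []
unmatched_master_mode = "Keep"

for m in master:
    u = updates_by_key.pop(m["id"], None)
    if u is not None:
        merged.append({**m, **u})          # master columns + update columns
    elif unmatched_master_mode == "Keep":
        merged.append(m)

update_rejects = list(updates_by_key.values())  # updates with no master
```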
Lookup stage:
Lookup stage is a processing stage. It can have a single input link, a single output link, a single reject link, and one or more reference links, depending upon the type and setting of the stage(s) providing the lookup information.
The Lookup stage performs lookup operations on a data set read into memory from any other parallel job stage that can output data. It can also perform lookups directly in a DB2 or Oracle database, or in a lookup table contained in a Lookup File Set stage. Lookups can also be used to validate a row: if there is no corresponding entry in the lookup table for the key values, the row is rejected.
Two types of lookup are available with this stage: normal lookup and sparse lookup. A normal lookup is used when the reference data is small compared to the master data set; when the reference data is huge, we need to set the lookup type to sparse. In that case, however, it is usually better to use a Join stage instead.
We can set the lookup operations based on the following properties available in the stage tab.
Condition Not Met and Lookup Failure:
Choose an action from the Condition Not Met drop-down list. Possible actions are:
Continue: It continues processing any further lookups before sending the row to the
output link.
Drop: Drops the row and continues with the next lookup.
Fail: Causes the job to issue a fatal error in the log file and stop.
Reject: Sends the row to the reject link.
To specify the action taken if a lookup on a link fails, choose an action from the Lookup Failure
drop-down list. Possible actions are:
Continue: Continues processing any further lookups before sending the row to the output
link.
Drop: Drops the row and continues with the next lookup.
Fail: Causes the job to issue a fatal error and stop.
Reject: Sends the row to the reject link.
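A normal lookup with these failure actions can be sketched in plain Python (not DataStage syntax); the cust/name columns are made-up examples:

```python
# Sketch of a normal lookup: the reference data is held in memory as a
# dict, and the Lookup Failure action decides what happens when a key
# is missing (here: Reject).
stream = [{"cust": 1}, {"cust": 2}]
reference = {1: {"name": "Ann"}}       # small in-memory reference set

output, rejects = [], []
lookup_failure = "Reject"              # Continue / Drop / Fail / Reject

for row in stream:
    ref = reference.get(row["cust"])
    if ref is not None:
        output.append({**row, **ref})
    elif lookup_failure == "Reject":
        rejects.append(row)
    elif lookup_failure == "Fail":
        raise RuntimeError("lookup failed for %r" % row)
    elif lookup_failure == "Continue":
        output.append(row)             # pass through without ref columns
    # "Drop": do nothing, the row is discarded
```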
Dataset stage:
Dataset stage is a file stage. It allows you to read data from or write data to a data set. The stage can have one input link or a single output link; it won't allow both input and output links at the same time.
The data in a data set is stored in an internal format. Using data sets wisely can be key to good performance in a set of linked jobs.
A Dataset consists of two parts:
1. Descriptor file: Contains metadata and data location.
2. Data file: Contains the data.
Data sets are operating system files, each referred to by a control file, which has the suffix .ds. Parallel jobs use data sets to manage data within a job. They allow you to store data in a persistent form, which can then be used by other jobs. We can also manage data sets independently of a job using the Data Set Management utility, available from the DataStage Director and Manager.
A data set is stored across nodes using the selected partitioning method, so it is always faster when used as a source or target. The stage can be configured to execute in parallel or sequential mode. If the Dataset stage is operating in sequential mode, it will first collect the data before writing it to the file, using the default Auto collection method. By default the stage partitions in Auto mode.
A sequential file used as a source or target, by contrast, needs to be repartitioned, as it is (as the name suggests) a single sequential stream of data.
Properties:
Source category:
File: The name of the control file for the data set. We can browse for the file or enter it via a job parameter. By convention this file has the suffix .ds.
Update Policy: Specifies what action will be taken if the data set you are writing to
already exists.
Append: Append any new data to the existing data.
Create (Error if exists): DataStage reports an error if the data set already exists.
Overwrite: Overwrites any existing data with new data.
Sequential File stage:
Properties:
Source category:
File: The name of the source flat file. We can browse for the file or enter a job parameter.
Read method: specific files / File pattern
Specific files: specify the pathname of the file being read from (repeat this for
reading multiple files).
File pattern: specify the pattern of the file to read from.
Options:
First line is column name: True / False
If set to True, the stage treats the first line of the file as column names rather than data; if set to False, the first line is read as data.
Keep file partitions: True / False
If set to True, the stage keeps the partitioning of the source and does not repartition the data.
Missing file Mode: Depends / Error / OK
Depends: the default is Error unless the file name has a node name prefix of *:, in which case it is OK.
Error: stop the job if one of the files mentioned does not exist.
OK: skip the missing file.
Reject mode: Continue / Fail / Output
Continue: the stage discards any rejected records.
Fail: the job stops if any record is rejected.
Output: rejected records are sent to a reject link.
Report Progress: Yes / No
Enables or disables logging of a progress report at intervals.
Options:
Column method: Explicit / Schema file
Explicit means you specify the metadata for the columns you want to generate on the Output Page Columns tab. If you use the Explicit method, you also need to specify which of the output link columns you are generating; you can repeat this property to specify multiple columns. If you use the Schema File method, you specify a schema file instead.
A schema file is a plain text file in which the metadata for a stage is specified. A schema consists of a record definition. The following is an example record schema:
record(
name: string;
address: nullable string;
date: date;
)
Properties:
Options:
Number of Records: The number of records you want the generated data set to contain. The default number is 10.
Schema File: (optional) By default the stage bases the mock data set on the metadata defined on the input link, but we can specify the column definitions in a schema file if required. We can browse for the schema file or specify a job parameter.
Peek stage:
Peek stage is a development/debug stage. It has one input link and any number of output links.
The Peek stage lets you print record column values either to the job log or to a separate
output link as the stage copies records from its input data set to one or more output data sets.
This can be helpful for monitoring the progress of your application or to diagnose a bug in your
application.
Properties:
Rows Category:
Transformer stage:
Transformer stage is a processing stage. It can have one input link and any number of output links. It can also have a reject link that takes any rows which have not been written to any of the output links because of a write failure or an expression evaluation failure. Transformer stages do not extract data or write data to a target database; they are used to handle extracted data, perform any conversions required, and pass the data to another Transformer stage or to a stage that writes data to the target.
In the Transformer Editor window, we can create new columns, delete columns from a link, move columns within a link, edit column metadata, define output column derivations, define link constraints, specify the order in which the links are to be processed, and define local stage variables. We can simply drag and drop the metadata from the input link to the output link. Derivations for columns are specified in the output pane, where we can use system variables, functions, job parameters, DataStage macros, and DataStage routines.
Stage constraints: A constraint is an expression that specifies criteria that data must meet before
it is passed to the output link. If the constraint expression evaluates to TRUE for an input row,
the data row is output on that link. Rows that are not output on any of the links can be output on
the otherwise link. Constraint expressions on different links are independent.
Stage variables: This provides a method of defining expressions which can be reused in the
output column derivations. These values are not passed to the output.
Stage variables in the Transformer must be kept in the required order if they have dependencies on one another. For example, with three stage variables A, B, and C, if B depends on A, then A must be maintained before B; otherwise you will get very strange, wrong results.
If the result of a stage variable is a single character, set the length of the variable to VarChar(1); otherwise you will not get the proper result.
The following is the order of execution at the time of processing records:
Stage variables → Constraints → Column derivations
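That execution order can be sketched in plain Python (not Transformer syntax); the qty/price columns and the two stage variables are made-up examples:

```python
# Sketch of Transformer execution order per record: stage variables
# first (in dependency order), then the link constraint, then the
# output column derivations.
rows_out = []
for row in [{"qty": 5, "price": 4}, {"qty": 0, "price": 9}]:
    # Stage variables, evaluated in order: A before B, since B uses A.
    sv_total = row["qty"] * row["price"]      # stage variable A
    sv_big = sv_total > 10                    # stage variable B (depends on A)
    # Constraint: only rows meeting it reach this output link.
    if sv_big:
        # Column derivations may reuse the stage variables.
        rows_out.append({"qty": row["qty"], "total": sv_total})
```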
Oracle Enterprise stage:
Properties:
Table: Specifies the name of the table to write to. We can specify a job parameter if required. It appears only when Write Method = Load.
Write Method: Load / Delete Rows / Upsert
Load: To load the data to the target table.
Delete Rows: Allows you to specify, in the SQL property, how the delete statement is to be derived, using the Auto-generated Delete or User-defined Delete actions.
Upsert: Allows you to provide the insert and update SQL statements for writing records. We can restrict the Oracle stage to update only, or to update and insert; this is achieved with the User-defined Update or Auto-generated Update actions available in the stage.
Write Mode: Append / Create / Replace / Truncate
It appears only when Write Method = Load.
Append: New records are appended to an existing table. This is the default option.
Create: It creates a new table. If the Oracle table already exists, an error occurs and the job
terminates. You must specify this mode if the Oracle table does not exist.
Replace: The existing table is first dropped and an entirely new table is created in its place.
Oracle uses the default partitioning method for the new table.
Truncate: The existing table attributes (including schema) and the Oracle partitioning keys
are retained, but any existing records are discarded. New records are then appended to the
table.
Connection Category:
DB Options: Specify a user name and password for connecting to the database.
DB Options Mode: Auto-generate / User-defined
Here we provide the user name and password for connecting to the remote server. If you select User-defined, you have to edit the database options yourself.
Options Category:
Disable Constraints: Set to True to disable all enabled constraints on the table when loading, then attempt to re-enable them at the end of the load.
Silently Drop Columns Not in Table: This only appears for the Load write method. It is False by default. Set it to True to silently drop all input columns that do not correspond to columns in the existing Oracle table; otherwise the stage reports an error and terminates the job.
Truncate Column Names: This only appears for the Load write method. Set this property to True to truncate column names to 30 characters.