When we use a mapping parameter or variable in a mapping, we first declare it in each
mapplet or mapping where we want to use it. Then we define a value for the mapping
parameter or variable before we run the session.
MAPPING PARAMETERS
A mapping parameter represents a constant value that we can define before running a
session.
A mapping parameter retains the same value throughout the entire session.
Example: When we want to extract records of a particular month during the ETL process, we
will create a mapping parameter of date/time data type and use it in the query to compare it
with the timestamp field in the SQL override.
We can then use the parameter in any expression in the mapplet or mapping.
We can also use parameters in a source qualifier filter, user-defined join, or extract
override, and in the Expression Editor of reusable transformations.
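For instance, a sketch of such a filter in a source qualifier SQL override (the parameter name
$$LOAD_MONTH and the table and column names are hypothetical):

SELECT ORDER_ID, ORDER_DATE, AMOUNT
FROM ORDERS
WHERE TO_CHAR(ORDER_DATE, 'YYYY-MM') = '$$LOAD_MONTH'

Before sending the query to the database, the Integration Service expands $$LOAD_MONTH
textually (for example, to 2004-08) using the value from the parameter file or, failing that,
the parameter's initial value.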
MAPPING VARIABLES
Unlike mapping parameters, mapping variables are values that can change between
sessions.
The Integration Service saves the latest value of a mapping variable to the repository
at the end of each successful session.
We can also clear all saved values for the session in the Workflow Manager.
We might use a mapping variable to perform an incremental read of the source. For example,
we have a source table containing time stamped transactions and we want to evaluate the
transactions on a daily basis. Instead of manually entering a session override to filter source
data each time we run the session, we can create a mapping variable, $$IncludeDateTime. In
the source qualifier, create a filter to read only rows whose transaction date equals
$$IncludeDateTime, such as:
TIMESTAMP = $$IncludeDateTime
In the mapping, use a variable function to set the variable value to increment one day each
time the session runs. If we set the initial value of $$IncludeDateTime to 8/1/2004, the first
time the Integration Service runs the session, it reads only rows dated 8/1/2004. During the
session, the Integration Service sets $$IncludeDateTime to 8/2/2004. It saves 8/2/2004 to the
repository at the end of the session. The next time it runs the session, it reads only rows from
August 2, 2004.
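A sketch of the variable function that would implement this increment, placed in a variable or
output port of an Expression transformation (the port name v_SET_DATE is hypothetical):

v_SET_DATE = SETVARIABLE($$IncludeDateTime, ADD_TO_DATE($$IncludeDateTime, 'DD', 1))

SETVARIABLE evaluates for every row, but because the expression is based on the start value,
every row sets the same new value, which is saved to the repository when the session
completes successfully.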
Used in the following transformations:
Expression
Filter
Router
Update Strategy
Start Value:
The start value is the value of the variable at the beginning of the session. The Integration
Service looks for the start value in the following order:
1. Value in parameter file
2. Value saved in the repository
3. Initial value
4. Default value
Current Value:
The current value is the value of the variable as the session progresses. When a session starts,
the current value of a variable is the same as the start value. The final current value for a
variable is saved to the repository at the end of a successful session. When a session fails to
complete, the Integration Service does not update the value of the variable in the repository.
Note: If a variable function is not used to calculate the current value of a mapping variable,
the start value of the variable is saved to the repository.
Variable Data Type and Aggregation Type
When we declare a mapping variable in a mapping, we need to configure the data type and
aggregation type for the variable. The Integration Service uses the aggregation type of a
mapping variable to determine the final current value of the mapping variable.
Aggregation types are:
Count: Only integer and small integer data types are valid.
Max: All transformation data types except binary data type are valid.
Min: All transformation data types except binary data type are valid.
Variable Functions
Variable functions determine how the Integration Service calculates the current value of a
mapping variable in a pipeline.
SetMaxVariable: Sets the variable to the maximum value of a group of values. It ignores
rows marked for update, delete, or reject. Aggregation type set to Max.
SetMinVariable: Sets the variable to the minimum value of a group of values. It ignores rows
marked for update, delete, or reject. Aggregation type set to Min.
SetCountVariable: Increments the variable value by one. It adds one to the variable value
when a row is marked for insertion, and subtracts one when the row is marked for deletion. It
ignores rows marked for update or reject. Aggregation type set to Count.
SetVariable: Sets the variable to the configured value. At the end of a session, it compares
the final current value of the variable to the start value of the variable. Based on the
aggregate type of the variable, it saves a final value to the repository.
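For example, a common pattern is to capture the session start time so that the next run can
filter the source incrementally (the variable name $$LastRunTime is hypothetical;
SESSSTARTTIME is a built-in variable):

SETVARIABLE($$LastRunTime, SESSSTARTTIME)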
Creating Mapping Parameters and Variables
1. In the Mapping Designer, click Mappings > Parameters and Variables. -or- In the
Mapplet Designer, click Mapplet > Parameters and Variables.
2. Click the Add button.
3. Enter the name of the parameter or variable (names begin with $$).
4. Select Type and Data type. Select Aggregation type for mapping variables.
5. Give an Initial Value. Click OK.
6.
Creating Mapping
5. Transformation -> Create -> Select Expression from list -> Create -> Done.
8. Create variable $$var_max of MAX aggregation type and initial value 1500.
9. Create variable $$var_min of MIN aggregation type and initial value 1500.
10. Create variable $$var_count of COUNT aggregation type and initial value 0. COUNT is
visible when datatype is INT or SMALLINT.
11. Create variable $$var_set of MAX aggregation type.
17. Open Expression editor for out_min_var and write the following expression:
SETMINVARIABLE($$var_min,SAL). Validate the expression.
18. Open Expression editor for out_count_var and write the following expression:
SETCOUNTVARIABLE($$var_count). Validate the expression.
19. Open Expression editor for out_set_var and write the following expression:
SETVARIABLE($$var_set,ADD_TO_DATE(HIREDATE,'MM',1)). Validate.
20. Click OK.
21. Link all ports from the expression to the target, then validate the mapping and save it.
PARAMETER FILE
A parameter file is a list of parameters and associated values for a workflow, worklet,
or session.
Parameter files provide the flexibility to change parameter and variable values each time we
run a workflow or session.
We can create multiple parameter files and change the file we use for a session or
workflow. We can create a parameter file using a text editor such as WordPad or
Notepad.
Enter the parameter file name and directory in the workflow or session properties.
Session parameter: Defines a value that can change from session to session, such
as a database connection or file name.
To enter a parameter file in the workflow properties:
1. Open a Workflow in the Workflow Manager.
2. Click Workflows > Edit.
3. Click the Properties tab.
4. Enter the parameter directory and name in the Parameter Filename field.
5. Click OK.
To enter a parameter file in the session properties:
1. Open a session in the Workflow Manager.
2. Click the Properties tab and open the General Options settings.
3. Enter the parameter directory and name in the Parameter Filename field.
4. Example: D:\Files\Para_File.txt or $PMSourceFileDir\Para_File.txt
5. Click OK.
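A parameter file is plain text, organized into sections headed by the scope they apply to. A
minimal sketch of what Para_File.txt might contain (the folder, workflow, and session names
here are hypothetical):

[MyFolder.WF:wf_daily_load.ST:s_m_load_orders]
$$IncludeDateTime=8/1/2004
$DBConnection_Source=ORA_SRC
$InputFile1=D:\Files\orders.txt

Mapping parameters and variables use the $$ prefix, while session parameters such as
$DBConnection and $InputFile names use a single $.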
c) Use the session and workflow logs to capture the load statistics.
d) Document all the load timing information.
Analyze the success rows and rejections:
a) Use customized SQL queries to check the sources/targets; here we will perform the record
count verification.
b) Analyze the rejections and build a process to handle those rejections. This requires a clear
business requirement from the business on how to handle the data rejections: do we need to
reload, or reject and inform? Discussions are required and an appropriate process must be
developed.
Performance Improvement
a) Network Performance
b) Session Performance
c) Database Performance
d) Analyze and if required define the Informatica and DB partitioning requirements.
Qualitative Testing
Analyze & validate your transformation business rules. This is more of functional testing.
e) You need to review field by field from source to target and ensure that the required
transformation logic is applied.
f) If you are making changes to existing mappings, make use of the data lineage feature
available with Informatica PowerCenter. This will help you find the consequences of altering
or deleting a port from an existing mapping.
g) Ensure that appropriate dimension lookups have been used and your development is in
sync with your business requirements.
INFORMATICA TESTING:
Debugger: A very useful tool for debugging a valid mapping to gain troubleshooting information
about data and error conditions. Refer to the Informatica documentation to learn more about
the Debugger tool.
Test Load Options - Relational Targets.
Running the Integration Service in Safe Mode
Test a development environment. Run the Integration Service in safe mode to test a
development environment, or an environment configured for high availability. After the
Integration Service fails over in safe mode, you can correct the error that caused the failover.
Syntax Testing: Test your customized queries using your source qualifier before executing
the session. Performance Testing for identifying the following bottlenecks:
Target
Source
Mapping
Session
System
Run test sessions. You can configure a test session to read from a flat file source or to
write to a flat file target to identify source and target bottlenecks.
Monitor system performance. You can use system monitoring tools to view the
percentage of CPU use, I/O waits, and paging to identify system bottlenecks. You can
also use the Workflow Monitor to view system resource usage.
UAT:
In this phase you will involve the user to test the end results and ensure that business is
satisfied with the quality of the data.
Any changes to the business requirement will follow the change management process and
eventually those changes have to follow the SDLC process.
Optimize Development, Testing, and Training Systems
Dramatically accelerate development and test cycles and reduce storage costs by
creating fully functional, smaller targeted data subsets for development, testing, and
training systems.
Validate updates with up-to-date, realistic data before introducing them into production.
Easily customize provisioning rules to meet each organization's changing business
requirements.
Lower training costs by standardizing on one approach and one infrastructure.
Train employees effectively using reliable, production-like data in training systems.
Untangle complex operational systems and separate data along business lines to
support a divested organization.
Decrease the cost and time of data divestiture with no reimplementation costs.
Improve the reliability of application delivery by ensuring IT teams have ready access
to realistic, production-like data.
Constraint-Based Loading:
In the Workflow Manager, you can specify constraint-based loading for a session. When you
select this option, the Integration Service orders the target load on a row-by-row basis. For
every row generated by an active source, the Integration Service loads the corresponding
transformed row first to the primary key table, then to any foreign key tables. Constraint-based
loading depends on the following requirements:
Active source. Related target tables must have the same active source.
Treat rows as insert. Use this option when you insert into the target. You cannot use
updates with constraint-based loading.
Active Source:
When target tables receive rows from different active sources, the Integration Service reverts
to normal loading for those tables, but loads all other targets in the session using
constraint-based loading when possible. For example, a mapping contains three distinct pipelines. The
first two contain a source, source qualifier, and target. Since these two targets receive data
from different active sources, the Integration Service reverts to normal loading for both
targets. The third pipeline contains a source, Normalizer, and two targets. Since these two
targets share a single active source (the Normalizer), the Integration Service performs
constraint-based loading: loading the primary key table first, then the foreign key table.
Key Relationships:
When target tables have no key relationships, the Integration Service does not perform
constraint-based loading.
Similarly, when target tables have circular key relationships, the Integration Service reverts to
a normal load. For example, you have one target containing a primary key and a foreign key
related to the primary key in a second target. The second target also contains a foreign key
that references the primary key in the first target. The Integration Service cannot enforce
constraint-based loading for these tables. It reverts to a normal load.
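A minimal SQL sketch of such a circular relationship (table and column names are
hypothetical); since neither table can be fully loaded before the other, the Integration Service
reverts to a normal load:

CREATE TABLE T_FIRST (first_id INTEGER PRIMARY KEY, second_ref INTEGER);
CREATE TABLE T_SECOND (second_id INTEGER PRIMARY KEY, first_ref INTEGER);
-- Each table references the other, so no valid load order exists:
ALTER TABLE T_FIRST ADD FOREIGN KEY (second_ref) REFERENCES T_SECOND (second_id);
ALTER TABLE T_SECOND ADD FOREIGN KEY (first_ref) REFERENCES T_FIRST (first_id);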
Target Connection Groups:
The Integration Service enforces constraint-based loading for targets in the same target
connection group. If you want to specify constraint-based loading for multiple targets that
receive data from the same active source, you must verify the tables are in the same target
connection group. If the tables with the primary key-foreign key relationship are in different
target connection groups, the Integration Service cannot enforce constraint-based loading
when you run the workflow. To verify that all targets are in the same target connection group,
complete the following tasks:
Verify all targets are in the same target load order group and receive data from the
same active source.
Use the default partition properties and do not add partitions or partition points.
Define the same target type for all targets in the session properties.
Define the same database connection name for all targets in the session properties.
Choose normal mode for the target load type for all targets in the session properties.
Alternatively, load the primary key table in one mapping and the dependent tables in another
mapping, and use constraint-based loading in the session that loads the primary table.
Constraint-based loading does not affect the target load ordering of the mapping. Target load
ordering defines the order the Integration Service reads the sources in each target load order
group in the mapping. A target load order group is a collection of source qualifiers,
transformations, and targets linked together in a mapping. Constraint-based loading
establishes the order in which the Integration Service loads individual targets within a set of
targets receiving data from a single source qualifier.
Example
The following mapping is configured to perform constraint-based loading:
In the first pipeline, target T_1 has a primary key, and T_2 and T_3 contain foreign keys
referencing the T_1 primary key. T_3 has a primary key that T_4 references as a foreign key.
Since these tables receive records from a single active source, SQ_A, the Integration Service
loads rows to the target in the following order:
1. T_1
2. T_2 and T_3 (in no particular order)
3. T_4
The Integration Service loads T_1 first because it has no foreign key dependencies and
contains a primary key referenced by T_2 and T_3. The Integration Service then loads T_2
and T_3, but since T_2 and T_3 have no dependencies, they are not loaded in any particular
order. The Integration Service loads T_4 last, because it has a foreign key that references a
primary key in T_3. After loading the first set of targets, the Integration Service begins reading
source B. If there are no key relationships between T_5 and T_6, the Integration Service
reverts to a normal load for both targets.
If T_6 has a foreign key that references a primary key in T_5, since T_5 and T_6 receive data
from a single active source, the Aggregator AGGTRANS, the Integration Service loads rows to
the tables in the following order:
T_5
T_6
T_1, T_2, T_3, and T_4 are in one target connection group if you use the same database
connection for each target, and you use the default partition properties. T_5 and T_6 are in
another target connection group together if you use the same database connection for each
target and you use the default partition properties. The Integration Service includes T_5 and
T_6 in a different target connection group because they are in a different target load order
group from the first four targets.
Enabling Constraint-Based Loading:
When you enable constraint-based loading, the Integration Service orders the target load on a
row-by-row basis. To enable constraint-based loading:
1. In the General Options settings of the Properties tab, choose Insert for the Treat
Source Rows As property.
2. Click the Config Object tab. In the Advanced settings, select Constraint Based Load
Ordering.
3. Click OK.
INFORMATICA INTERVIEW QUESTIONS
7. How does the Informatica server sort string values in a Rank transformation?
10. In an update strategy, which gives more performance: a relational table or a flat file? Why?
11. What are the output files that the Informatica server creates while running a session?
12. Can you explain what error tables in Informatica are and how we do error handling
in Informatica?
13. Difference between constraint-based loading and target load plan?
14. Difference between IIF and DECODE function?
15. How to import oracle sequence into Informatica?
16. What is parameter file?
17. Difference between Normal load and Bulk load?
18. How will you create a header and footer in the target using Informatica?
19. What are the session parameters?
20. Where does Informatica store rejected data? How do we view them?
21. What is difference between partitioning of relational target and file targets?
22. What are mapping parameters and variables, and in which situations can we use them?
23. What do you mean by direct loading and Indirect loading in session properties?
24. How do we implement recovery strategy while running concurrent batches?
25. Explain the versioning concept in Informatica?
26. What is Data Driven?
27. What is a batch? Explain the types of batches.
28. What types of metadata does the repository store?
29. Can you use the mapping parameters or variables created in one mapping in another
mapping?
30. Why do we use stored procedures in our ETL application?
31. When we can join tables at the Source Qualifier itself, why do we go for the Joiner
transformation?
32. What is the default join operation performed by the Lookup transformation?
33. What is a hash table in Informatica?
34. In a Joiner transformation, you should specify the table with fewer rows as the master
table. Why?
35. Difference between cached lookup and un-cached lookup?
36. Explain what the DTM does when you start a workflow.
37. Explain what the Load Manager does when you start a workflow.
38. In a sequential batch, how do I stop one particular session from running?
39. What are the types of aggregations available in Informatica?
40. How do I create indexes after the load process is done?
41. How do we improve the performance of the Aggregator transformation?
42. What are the different types of caches available in Informatica? Explain in detail.
43. What is polling?
44. What are the limitations of the Joiner transformation?
45. What is a Mapplet?
46. What are active and passive transformations?
47. What are the options in the target session of an Update Strategy transformation?
48. What is a code page? Explain the types of code pages.
49. What do you mean by rank cache?
50. How can you delete duplicate rows without using a Dynamic Lookup? Tell me any other
ways of deleting duplicate rows using a lookup.
51. Can you copy a session to a different folder or repository?
52. What is tracing level and what are its types?
53. What is the command used to run a batch?
54. What are the unsupported repository objects for a mapplet?
55. If your workflow is running slow, what is your approach towards performance tuning?
56. What are the types of mapping wizards available in Informatica?
57. After dragging the ports of three sources (SQL Server, Oracle, Informix) to a single Source
Qualifier, can we map these three ports directly to the target?
58. Why do we use the Stored Procedure transformation?
59. Which object is required by the Debugger to create a valid debug session?
102. You have a requirement to be alerted of any long-running sessions in your workflow. How
can you create a workflow that will send you an email for sessions running more than 30
minutes? You can use any method: a shell script, a procedure, or an Informatica mapping or
workflow control.
DATA WAREHOUSING INTERVIEW QUESTIONS
1. What is a data-warehouse?
2. What are Data Marts?
3. What is ER Diagram?
4. What is a Star Schema?
5. What is Dimensional Modelling?
6. What is a Snowflake Schema?
7. What are the Different methods of loading Dimension tables?
8. What are Aggregate tables?
9. What is the Difference between OLTP and OLAP?
10. What is ETL?
11. What are the various ETL tools in the Market?
12. What are the various Reporting tools in the Market?
13. What is Fact table?
14. What is a dimension table?
15. What is a lookup table?
16. What is a general purpose scheduling tool? Name some of them?
17. What are modeling tools available in the Market? Name some of them?
18. What is real time data-warehousing?
19. What is data mining?
20. What is Normalization? First Normal Form, Second Normal Form, Third Normal Form?
21. What is ODS?
22. What type of indexing mechanism do we need to use for a typical data warehouse?
23. Which columns go to the fact table and which columns go to the dimension table? (My user
needs to see <data element> broken by <data element>.)
All elements before "broken by" = fact measures.
All elements after "broken by" = dimension elements.
24. What is the level of granularity of a fact table? What does this signify? (With weekly-level
summarization, there is no need to have Invoice Number in the fact table anymore.)
25. How are the Dimension tables designed? (De-normalized, wide, short, use surrogate keys,
contain additional date fields and flags.)
26. What are slowly changing dimensions?
27. What are non-additive facts? (Inventory, account balances in a bank.)
28. What are conformed dimensions?
29. What is a VLDB? (If a database is too large to back up in the available time frame, it's a VLDB.)
30. What are SCD1, SCD2 and SCD3?
Target Load Plan
When you use a mapplet in a mapping, the Mapping Designer lets you set the target load plan
for sources within the mapplet.
Setting the Target Load Order
You can configure the target load order for a mapping containing any type of target definition.
In the Designer, you can set the order in which the Integration Service sends rows to targets
in different target load order groups in a mapping. A target load order group is the collection
of source qualifiers, transformations, and targets linked together in a mapping. You can set the
target load order if you want to maintain referential integrity when inserting, deleting, or
updating tables that have the primary key and foreign key constraints.
The Integration Service reads sources in a target load order group concurrently, and it
processes target load order groups sequentially.
To specify the order in which the Integration Service sends data to targets, create one source
qualifier for each target within a mapping. To set the target load order, you then determine in
which order the Integration Service reads each source in the mapping.
The following figure shows two target load order groups in one mapping:
In this mapping, the first target load order group includes ITEMS, SQ_ITEMS, and T_ITEMS.
The second target load order group includes all other objects in the mapping, including the
TOTAL_ORDERS target. The Integration Service processes the first target load order group,
and then the second target load order group.
When it processes the second target load order group, it reads data from both sources at the
same time.
To set the target load order:
1. Create a mapping that contains multiple target load order groups.
2. Click Mappings > Target Load Plan.
3. The Target Load Plan dialog box lists all Source Qualifier transformations in the
mapping and the targets that receive data from each source qualifier.
4. Select a source qualifier from the list.
5. Click the Up and Down buttons to move the source qualifier within the load order.
6. Repeat steps 4 and 5 for other source qualifiers you want to reorder. Click OK.
MAPPLETS
A mapplet is a reusable object that we create in the Mapplet Designer.
Include source definitions: Use multiple source definitions and source qualifiers to
provide source data for a mapping.
Contain unused ports: We do not have to connect all mapplet input and output ports in
a mapping.
Mapplet Input:
Mapplet input can originate from a source definition and/or from an Input transformation in
the mapplet. We can create multiple pipelines in a mapplet.
Mapplet Output:
The output of a mapplet is not connected to any target table.
A mapplet must contain at least one Output transformation with at least one
connected port in the mapplet.
Example1: We will join the EMP and DEPT tables, then calculate the total salary and give the
output to a Mapplet Output transformation.
EMP and DEPT will be source tables.
Output will be given to transformation Mapplet_Out.
Steps:
6. Transformation -> Create -> Select Expression from list -> Create -> Done.
7. Pass all ports from the Joiner to the Expression and then calculate the total salary as
described in the Expression transformation.
8. Now Transformation -> Create -> Select Mapplet Out from list -> Create -> Give
name and then Done.
9. We can use the mapplet in a mapping by just dragging the mapplet from the mapplet
folder on the left pane, as we drag source and target tables.
When we use the mapplet in a mapping, the mapplet object displays only the ports
from the Input and Output transformations. These are referred to as the mapplet input
and mapplet output ports.
Example2: Use the mapplet mplt_example1 in a mapping, followed by a Filter transformation,
as shown in the steps below.
5. Transformation -> Create -> Select Filter from list -> Create -> Done.
6. Drag all ports from mplt_example1 to the filter and give the filter condition.
7. Connect all ports from the filter to the target. We can add more transformations after
the filter if needed.
8. Validate the mapping and save it.
PARTITIONING
A pipeline consists of a source qualifier and all the transformations and targets that
receive data from that source qualifier.
When the Integration Service runs the session, it can achieve higher performance by
partitioning the pipeline and performing the extract, transformation, and load for each
partition in parallel.
1. Partition Points
Partition points mark thread boundaries and divide the pipeline into stages.
2. Number of Partitions
When we increase or decrease the number of partitions at any partition point, the
Workflow Manager increases or decreases the number of partitions at all Partition
points in the pipeline.
The number of partitions we create equals the number of connections to the source or
target. For one partition, one database connection will be used.
3. Partition types
The Integration Service creates a default partition type at each partition point.
If we have the Partitioning option, we can change the partition type. This option is
purchased separately.
The partition type controls how the Integration Service distributes data among
partitions at partition points.
PARTITIONING TYPES
1. Round Robin Partition Type
Use round-robin partitioning when we need to distribute rows evenly and do not need
to group data among partitions.
2. Pass-Through Partition Type
In pass-through partitioning, the Integration Service processes data without
redistributing rows among partitions.
All rows in a single partition stay in that partition after crossing a pass-through
partition point.
3. Database Partitioning Partition Type
Use database partitioning for Oracle and IBM DB2 sources and IBM DB2 targets only.
Use any number of pipeline partitions and any number of database partitions.
We can improve performance when the number of pipeline partitions equals the
number of database partitions.
When we use database partitioning with a source qualifier with one source, the Integration
Service generates SQL queries for each database partition and distributes the data from the
database partitions among the session partitions equally.
For example, when a session has three partitions and the database has five partitions, the 1st
and 2nd session partitions will receive data from 2 database partitions each, so four DB
partitions are used. The 3rd session partition will receive data from the remaining DB partition.
Partitioning a Source Qualifier with Multiple Source Tables
The Integration Service creates SQL queries for database partitions based on the number of
partitions in the database table with the most partitions.
If the session has three partitions and the database table has two partitions, one of the
session partitions receives no data.
4. Hash Auto-Keys Partition Type
The Integration Service uses all grouped or sorted ports as a compound partition key.
Use hash auto-keys partitioning at or before Rank, Sorter, Joiner, and unsorted
Aggregator transformations to ensure that rows are grouped properly before they
enter these transformations.
The Integration Service uses a hash function to group rows of data among partitions.
5. Key Range Partition Type
The Integration Service passes data to each partition depending on the ranges we
specify for each port.
Use key range partitioning where the sources or targets in the pipeline are partitioned
by key range.
Example: Customers 1-100 in one partition, 101-200 in another, and so on. We define
the range for each partition.
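A sketch of how such ranges might be entered for a CUSTOMER_ID partition key (the boundary
values are hypothetical; typically the start of each range is inclusive and the end exclusive,
which is why the boundaries repeat):

Partition 1: Start Range = 1,   End Range = 101
Partition 2: Start Range = 101, End Range = 201
Partition 3: Start Range = 201, End Range = (blank, no upper bound)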
WORKING WITH LINKS
The Workflow Manager does not allow us to use links to create loops in the workflow.
Each link in the workflow can run only once.
Valid Workflow:
Example of loop:
Once we create links between tasks, we can specify conditions for each link to
determine the order of execution in the workflow.
If we do not specify conditions for each link, the Integration Service runs the next task
in the workflow by default.
Steps:
1. In the Workflow Designer workspace, double-click the link you want to specify.
2. The Expression Editor appears.
3. In the Expression Editor, enter the link condition. The Expression Editor provides
predefined workflow variables, user-defined workflow variables, variable functions, and
Boolean and arithmetic operators.
4. Validate the expression using the Validate button.
We can use workflow variables in link conditions, Decision tasks, and Assignment tasks.
WORKFLOW TASKS
The Workflow Manager contains many types of tasks to help you build workflows and worklets.
We can create reusable tasks in the Task Developer.
Types of tasks:
Task Type      Tool where created               Reusable or not
Session        Task Developer                   Yes
Command        Task Developer                   Yes
Email          Task Developer                   Yes
Event-Raise    Workflow or Worklet Designer     No
Event-Wait     Workflow or Worklet Designer     No
Timer          Workflow or Worklet Designer     No
Decision       Workflow or Worklet Designer     No
Assignment     Workflow or Worklet Designer     No
Control        Workflow or Worklet Designer     No
SESSION TASK
A session is a set of instructions that tells the Power Center Server how and when to
move data from sources to targets.
To run a session, we must first create a workflow to contain the Session task.
We can run as many sessions in a workflow as we need. We can run the Session tasks
sequentially or concurrently, depending on our needs.
The Power Center Server creates several files and in-memory caches depending on the
transformations and options used in the session.
EMAIL TASK
The Workflow Manager provides an Email task that allows us to send email during a
workflow.
Created by Administrator usually and we just drag and use it in our mapping.
Steps:
1. In the Task Developer, click Tasks > Create.
2. Select an Email task and enter a name for the task. Click Create.
3. Click Done.
4. Double-click the Email task in the workspace. The Edit Tasks dialog box appears.
5. Click the Properties tab.
6. Enter the fully qualified email address of the mail recipient in the Email User Name
field.
7. Enter the subject of the email in the Email Subject field. Or, you can leave this field
blank.
8. Click the Open button in the Email Text field to open the Email Editor.
9. Enter the text of the email message, then click OK twice to save your changes.
We can also drag the email task and use as per need.
We can set the option to send email on success or failure in components tab of a
session task.
COMMAND TASK
The Command task allows us to specify one or more shell commands in UNIX or DOS
commands in Windows to run during the workflow.
For example, we can specify shell commands in the Command task to delete reject files, copy
a file, or archive target files.
Ways of using command task:
1. Standalone Command task: We can use a Command task anywhere in the workflow or
worklet to run shell commands.
2. Pre- and post-session shell command: We can call a Command task as the pre- or
post-session shell command for a Session task. This is done in the COMPONENTS tab of a
session. We can run it as a Pre-Session Command, a Post-Session Success Command, or a
Post-Session Failure Command. Select the Value and Type options as we did in the Email task.
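For instance, a post-session success command might archive the day's target file. A minimal
UNIX sketch (the file and directory names are hypothetical; $PMTargetFileDir is a built-in
server variable):

cp $PMTargetFileDir/orders.out /archive/orders_`date +%Y%m%d`.out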
Steps for creating a Command task:
1. In the Task Developer, click Tasks > Create.
2. Select Command task for the task type.
3. Enter a name for the Command task. Click Create. Then click Done.
4. Double-click the Command task in the workspace. In the Commands tab, click the
Add button to add a command, and give it a name.
5. In the Command field, click the Edit button to open the Command Editor.
6. Enter the command you want to run. Click OK.
Example: Run a Command task after a session succeeds.
1. Create a task, using the above steps, to copy a file in the Task Developer.
2. Open Workflow Designer. Workflow -> Create -> Give name and click OK.
3. Drag the session and the Command task to the workspace and link the session to the
Command task.
4. Double-click the link between the Session and the Command task and give the
condition in the editor as:
$S_M_FILTER_EXAMPLE.Status=SUCCEEDED
5. Workflow -> Validate and save the workflow.
EVENT WAIT AND EVENT RAISE TASKS
Pre-defined event: A pre-defined event is a file-watch event. This event waits for a
specified file to arrive at a given location.
User-defined event: A user-defined event is a sequence of tasks in the workflow. We
create events in the workflow properties and then raise them as per need.
Steps for creating user-defined events:
1. Open any workflow where we want to create an event. Click Workflow -> Edit ->
Events tab.
2. Click the Add button to add events and give the names as per need.
3. Click Apply -> OK. Validate the workflow and save it.
EVENT RAISE: The Event-Raise task represents a user-defined event. We use this task to
raise the user-defined event at a specific point in the workflow.
Example1: Use an event wait task and make sure that session s_filter_example runs when
abc.txt file is present in D:\FILES folder.
Steps for creating workflow:
1. Workflow -> Create -> Give name wf_event_wait_file_watch -> Click OK.
2. Task -> Create -> Select Event Wait. Give name. Click Create and Done.
3. Link the Start task to the Event Wait task.
4. Drag session s_filter_example to the workspace and link it to the Event Wait task.
5. Right-click on the Event Wait task and click EDIT -> EVENTS tab.
6. Select the Pre Defined option there. In the blank space, give the directory and filename
to watch. Example: D:\FILES\abc.txt
7. Apply -> OK. Validate the workflow and save it.
Example 2: Raise a user defined event when session s_m_filter_example succeeds. Capture
this event in event wait task and run session S_M_TOTAL_SAL_EXAMPLE
Steps for creating workflow:
1. Workflow -> Create -> Give name wf_event_wait_event_raise -> Click OK.
2. Workflow -> Edit -> Events Tab and add event EVENT1 there.
3. Drag session s_m_filter_example to the workspace and link it to the Start task.
4. Click Tasks -> Create -> Select EVENT RAISE from list. Give name ER_Example. Click
Create and then Done. Link ER_Example to s_m_filter_example.
5. Right-click ER_Example -> EDIT -> Properties Tab -> Open Value for User Defined
Event and select EVENT1 from the list displayed. Apply -> OK.
6. Click the link between ER_Example and s_m_filter_example and give the condition
$S_M_FILTER_EXAMPLE.Status=SUCCEEDED
7. Click Tasks -> Create -> Select EVENT WAIT from list. Give name EW_WAIT. Click
Create and then Done.
8. Right-click EW_WAIT -> EDIT -> EVENTS tab. Select the User Defined option and
select EVENT1 from the events list. Apply -> OK.
9. Drag session S_M_TOTAL_SAL_EXAMPLE to the workspace, link EW_WAIT to it, then
validate the workflow and save it.
TIMER TASK
The Timer task allows us to specify the period of time to wait before the Power Center Server
runs the next task in the workflow. The Timer task has two types of settings:
Absolute time: We specify the exact date and time, or we can choose a user-defined
workflow variable to specify the exact time. The next task in the workflow will run as
per the given time.
Relative time: We instruct the Power Center Server to wait for a specified period of
time after the Timer task, the parent workflow, or the top-level workflow starts.
Example: Run session s_m_filter_example relative to 1 min after the timer task.
Steps for creating workflow:
1. Workflow -> Create -> Give name wf_timer_task_example -> Click OK.
2. Click Tasks -> Create -> Select TIMER from list. Give name TIMER_Example. Click
Create and then Done. Link TIMER_Example to the Start task.
3. Double-click TIMER_Example and go to the TIMER tab.
4. Select the Relative Time option, give 1 min, and select the From start time of this task
option. Apply -> OK.
5. Drag session s_m_filter_example to the workspace and link it to TIMER_Example.
6. Workflow -> Validate and save the workflow.
DECISION TASK
The Decision task allows us to enter a condition that determines the execution of the
workflow, similar to a link condition.
Example: Run the Command task and session S_m_sample_mapping_EMP if either
s_m_filter_example or S_M_TOTAL_SAL_EXAMPLE succeeds.
Steps for creating workflow:
1. Workflow -> Create -> Give name wf_decision_task_example -> Click OK.
2. Drag sessions s_m_filter_example and S_M_TOTAL_SAL_EXAMPLE to the workspace
and link both to the START task.
3. Click Tasks -> Create -> Select DECISION from list. Give name DECISION_Example.
Click Create and then Done. Link DECISION_Example to both s_m_filter_example and
S_M_TOTAL_SAL_EXAMPLE.
4. Right-click DECISION_Example -> EDIT -> GENERAL tab.
5. Set Treat Input Links As to OR. Default is AND. Apply and click OK.
6. Now edit the decision task again and go to the PROPERTIES tab. Open the Expression
editor by clicking the VALUE section of the Decision Name attribute and enter the
following condition: $S_M_FILTER_EXAMPLE.Status = SUCCEEDED OR
$S_M_TOTAL_SAL_EXAMPLE.Status = SUCCEEDED
7. Validate the expression and click OK. Apply -> OK.
8. Drag the Command task and the S_m_sample_mapping_EMP session to the workspace
and link both of them to the DECISION_Example task.
9. Double-click the link between S_m_sample_mapping_EMP and DECISION_Example and
give the condition: $DECISION_Example.Condition = 0. Validate and click OK.
10. Double click link between Command task and DECISION_Example and give the
condition: $DECISION_Example.Condition = 1. Validate and click OK.
11. Workflow Validate and repository Save.
12. Run workflow and see the result.
CONTROL TASK
We can use the Control task to stop, abort, or fail the top-level workflow or the parent
workflow based on an input link condition.
A parent workflow or worklet is the workflow or worklet that contains the Control task.
We give the condition to the link connected to Control Task.
Control Option      Description
Fail Me             Fails the Control task.
Fail Parent         Marks the workflow or worklet that contains the Control task as failed.
Stop Parent         Stops the workflow or worklet that contains the Control task.
Abort Parent        Aborts the workflow or worklet that contains the Control task.
Fail Top-Level WF   Fails the workflow that is running.
Stop Top-Level WF   Stops the workflow that is running.
Abort Top-Level WF  Aborts the workflow that is running.
Example: Drag any 3 sessions and if anyone fails, then Abort the top level workflow.
Steps for creating workflow:
1. Workflow -> Create -> Give name wf_control_task_example -> Click OK.
2. Drag any 3 sessions to the workspace and link all of them to the START task.
3. Click Tasks -> Create -> Select CONTROL from list. Give name cntr_task. Click
Create and then Done.
4. Link all three sessions to the control task cntr_task.
5. Double-click the link between cntr_task and any session, say s_m_filter_example, and
give the condition: $S_M_FILTER_EXAMPLE.Status = SUCCEEDED. Repeat for the
other two sessions.
6. Right-click cntr_task -> EDIT -> GENERAL tab. Set Treat Input Links As to OR. Default
is AND.
7. Go to the PROPERTIES tab of cntr_task and select the value Fail Top-Level Workflow
for the Control Option. Validate the workflow and save it.
ASSIGNMENT TASK
The Assignment task allows us to assign a value to a user-defined workflow variable.
Steps:
1. Edit the workflow and add the user-defined variables you need.
2. Choose Tasks -> Create. Select Assignment Task for the task type.
3. Enter a name for the Assignment task. Click Create. Then click Done.
4. Double-click the Assignment task to open the Edit Task dialog box.
5. On the Expressions tab, click Add to add an assignment.
6. Select the variable for which you want to assign a value. Click OK.
7. Click the Edit button in the Expression field to open the Expression Editor. Enter the
value or expression you want to assign, validate it, and click OK.
SCD Type 1
Slowly Changing Dimensions (SCDs) are dimensions that have data that changes slowly,
rather than changing on a time-based, regular schedule.
For example, you may have a dimension in your database that tracks the sales records of
your company's salespeople. Creating sales reports seems simple enough, until a salesperson
is transferred from one regional office to another. How do you record such a change in your
sales dimension?
You could sum or average the sales by salesperson, but if you use that to compare the
performance of salesmen, that might give misleading information. If the salesperson that was
transferred used to work in a hot market where sales were easy, and now works in a market
where sales are infrequent, her totals will look much stronger than the other salespeople in
her new region, even if they are just as good. Or you could create a second salesperson record
and treat the transferred person as a new sales person, but that creates problems also.
Dealing with these issues involves SCD management methodologies:
Type 1:
The Type 1 methodology overwrites old data with new data, and therefore does not track
historical data at all. This is most appropriate when correcting certain types of data errors,
such as the spelling of a name. (Assuming you won't ever need to know how it used to be
misspelled in the past.)
Here is an example of a database table that keeps supplier information:

Supplier_Key  Supplier_Code  Supplier_Name    Supplier_State
123           ABC            Acme Supply Co   CA
In this example, Supplier_Code is the natural key and Supplier_Key is a surrogate key.
Technically, the surrogate key is not necessary, since the table will be unique by the natural
key (Supplier_Code). However, the joins will perform better on an integer than on a character
string.
Now imagine that this supplier moves their headquarters to Illinois. The updated table would
simply overwrite this record:

Supplier_Key  Supplier_Code  Supplier_Name    Supplier_State
123           ABC            Acme Supply Co   IL
The obvious disadvantage to this method of managing SCDs is that there is no historical
record kept in the data warehouse. You can't tell if your suppliers are tending to move to the
Midwest, for example. But an advantage to Type 1 SCDs is that they are very easy to
maintain.
Explanation with an Example:
Source Table: (01-01-11)
Empno  Ename  Sal
101    A      1000
102    B      2000
103    C      3000

Target Table: (01-01-11)
Empno  Ename  Sal
101    A      1000
102    B      2000
103    C      3000
The necessity of the lookup transformation is illustrated using the above source and target
table.
Source Table: (01-02-11)
Empno  Ename  Sal
101    A      1000
102    B      2500
103    C      3000
104    D      4000

Target Table: (01-02-11)
Empno  Ename  Sal
101    A      1000
102    B      2500
103    C      3000
104    D      4000
In the second month we have one more employee added to the table, with Ename D, and the
salary of employee 102 is changed to 2500 from 2000.
Create a table by the name emp_source with three columns as shown above in Oracle.
In the same way as above, create two target tables with the names emp_target1 and
emp_target2.
Go to the Targets menu and click Generate and Execute to confirm the creation of
the target tables.
The snapshot of the connections using the different kinds of transformations is shown
below.
Here in this mapping we are about to use four kinds of transformations, namely the
Lookup transformation, Expression transformation, Filter transformation, and Update
Strategy transformation. The necessity and usage of all the transformations will be
discussed in detail below.
The first thing that we are going to do is to create a Lookup transformation and
connect the Empno from the Source Qualifier to the transformation.
What the Lookup transformation does in our mapping is look into the target table
(emp_target1), compare it with the Source Qualifier, and determine whether to
insert, update, delete, or reject rows.
In the Ports tab we should add a new column and name it empno1; this is the
column which we are going to connect from the Source Qualifier.
The Input port for the first column should be unchecked, whereas the other boxes like
Output and Lookup should be checked. For the newly created column, only the Input
and Output boxes should be checked.
In the Condition tab, the Lookup Table Column should be Empno, the Transformation
Port should be Empno1, and the Operator should be =.
Expression Transformation: After we are done with the Lookup transformation, we use an
Expression transformation to check whether we need to insert the records or update them.
The steps to create an Expression transformation are shown below.
Drag all the columns from both the source and the look up transformation and drop
them all on to the Expression transformation.
Now double-click on the transformation, go to the Ports tab, and create two new
columns named insert and update. Both these columns are going to be our output
data, so we need to have a check mark only in front of the Output check box.
The Snap shot for the Edit transformation window is shown below.
The conditions that we want to parse through to our output data are listed below:
Insert: ISNULL(EMPNO1)
Update: IIF(NOT ISNULL(EMPNO1) AND DECODE(SAL,SAL1,1,0)=0,1,0)
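Tracing these conditions against the sample data above: for Empno 104 (a new employee), the
lookup finds no match, so EMPNO1 is NULL and the insert flag evaluates to 1. For Empno 102,
the lookup finds a match but the salary changed from 2000 to 2500, so DECODE(SAL,SAL1,1,0)
returns 0 and the update flag evaluates to 1. For the unchanged rows 101 and 103, both flags
evaluate to 0 and the filters drop them.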
Filter Transformation: We are going to have two Filter transformations, one to insert and the
other to update.
Connect the insert column from the Expression transformation to the insert column in
the first Filter transformation, and in the same way connect the update column in the
Expression transformation to the update column in the second Filter.
Now connect Empno, Ename, and Sal from the Expression transformation to both
Filter transformations.
New records (insert flag = 1) pass through Filter transformation 1 to Update Strategy
transformation 1, and the same output appears in the target table.
Changed records (update flag = 1) pass through Filter transformation 2 to Update
Strategy transformation 2, which forwards the updated input to the target table.
Drag the respective Empno, Ename and Sal from the filter transformations and drop
them on the respective Update Strategy Transformation.
Now go to the Properties tab; on the 1st Update Strategy transformation, the value
for the update strategy expression is 0 (DD_INSERT).
On the 2nd Update Strategy transformation, the value for the update strategy
expression is 1 (DD_UPDATE).
We are all set; finally, connect the outputs of the Update Strategy transformations to
the target table.
WORKFLOW VARIABLES
We can create and use variables in a workflow to reference values and record information.
Types of workflow variables:
Task-specific variables:
The Workflow Manager provides a set of task-specific variables for each task in the workflow.
The Workflow Manager lists task-specific variables under the task name in the Expression
Editor.
Task-Specific Variable  Description                                                      Task Type
Condition               Evaluation result of the decision condition expression.         Decision
EndTime                 Date and time the task ended.                                   All tasks
ErrorCode               Last error code for the task; 0 if there is no error.           All tasks
ErrorMsg                Last error message for the task.                                All tasks
FirstErrorCode          Error code for the first error message in the session.          Session
FirstErrorMsg           First error message in the session.                             Session
PrevTaskStatus          Status of the previous task that the Integration Service ran.   All tasks
SrcFailedRows           Number of source rows the session failed to read.               Session
SrcSuccessRows          Number of source rows the session read successfully.            Session
StartTime               Date and time the task started.                                 All tasks
Status                  Status of the previous task, e.g. SUCCEEDED or FAILED.          All tasks
TgtFailedRows           Number of target rows the session failed to write.              Session
TgtSuccessRows          Number of target rows written successfully.                     Session
TotalTransErrors        Total number of transformation errors.                          Session
The Integration Service looks for the start value of a variable in the following order:
1. Value in parameter file
2. Value saved in the repository (if the variable is persistent)
3. User-specified default value
4. Data type default value
Steps for creating user-defined workflow variables:
1. In the Workflow Designer, create a new workflow or edit an existing one.
2. Select the Variables tab and click Add.
3. Enter the name of the new variable (workflow variable names begin with $$).
4. In the Data type field, select the data type for the new variable.
5. Enable the Persistent option if we want the value of the variable retained from one
execution of the workflow to the next.
6. Enter the default value for the variable in the Default field.
7. To validate the default value of the new workflow variable, click the Validate button.
8. Click Apply to save the new workflow variable.
9. Click OK to close the workflow properties.
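For instance, a persistent workflow variable can carry a counter across runs; an Assignment
task might set (the variable name $$Run_Count is hypothetical):

$$Run_Count = $$Run_Count + 1

Because the variable is persistent, the incremented value is saved to the repository at the end
of a successful run and becomes the start value of the next run.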
IDENTIFICATION OF BOTTLENECKS
The performance of Informatica is dependent on the performance of its several components,
like the database, network, transformations, mappings, and sessions. To tune the performance
of Informatica, we have to identify the bottleneck first.
The bottleneck may be present in the source, target, transformations, mapping, session,
database, or network. It is best to identify performance issues in components in the order:
source, target, transformations, mapping, and session. After identifying the bottleneck, apply
the tuning mechanisms in whichever way they are applicable to the project.
Identify bottleneck in Source
If the source is a relational table, put a Filter transformation in the mapping just after the
source qualifier and set the filter condition to FALSE. All records will then be filtered off and
none will proceed to the other parts of the mapping. In the original case, without the test
filter, the total time taken is as follows:
Total Time = time taken by (source + transformations + target load)
Now, because of the filter:
Total Time = time taken by source
So if the source was fine, then in the latter case the session should take less time. If the
session still takes nearly the same time as in the former case, there is a source bottleneck.
Identify bottleneck in Target
If the target is a relational table, substitute it with a flat file and run the session. If the
time taken now is much less than the time taken for the session to load to the table, then the
target table is the bottleneck.
Identify bottleneck in Transformation
Remove the transformation from the mapping and run it; note the time taken. Then put the
transformation back and run the mapping again. If the time taken now is significantly more
than the previous time, then the transformation is the bottleneck.
But removal of a transformation for testing can be a pain for the developer, since that might
require further changes for the session to get into working shape.
So we can put a Filter with a FALSE condition just after the transformation and run the session.
If the session run takes equal time with and without this test filter, then the transformation is
the bottleneck.
Identify bottleneck in sessions
We can use the session log to identify whether the source, target or transformations are the
performance bottleneck. Session logs contain thread summary records like the following:
MASTER> PETL_24018 Thread [READER_1_1_1] created for the read stage of partition point
[SQ_test_all_text_data] has completed: Total Run Time =[11.703201] secs, Total Idle Time =
[9.560945] secs, Busy Percentage =[18.304876].
MASTER> PETL_24019 Thread [TRANSF_1_1_1_1] created for the transformation stage of
partition point [SQ_test_all_text_data] has completed: Total Run Time = [11.764368] secs,
Total Idle Time = [0.000000] secs, Busy Percentage = [100.000000].
If busy percentage is 100, then that part is the bottleneck.
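The busy percentage follows directly from the two times that precede it. For the reader thread
above: (11.703201 - 9.560945) / 11.703201 = 0.183, i.e. about 18.3%, matching the reported
Busy Percentage. Since the reader is mostly idle while the transformation thread is 100% busy,
the transformation stage is the bottleneck in this log.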
Basically we have to rely on thread statistics to identify the cause of performance issues. Once
the Collect Performance Data option (In session Properties tab) is enabled, all the
performance related information would appear in the log created by the session.
PUSHDOWN OPTIMIZATION
Example: A mapping contains an Expression transformation that creates an item ID based on
the store number 5419 and the item ID from the source. To push the transformation logic to
the database, the Integration Service generates the following SQL statement:
INSERT INTO T_ITEMS(ITEM_ID, ITEM_NAME, ITEM_DESC) SELECT CAST((CASE WHEN 5419
IS NULL THEN '' ELSE 5419 END) + '_' + (CASE WHEN ITEMS.ITEM_ID IS NULL THEN '' ELSE
ITEMS.ITEM_ID END) AS INTEGER), ITEMS.ITEM_NAME, ITEMS.ITEM_DESC FROM ITEMS2
ITEMS
The Integration Service generates an INSERT SELECT statement to retrieve the ID, name, and
description values from the source table, create new item IDs, and insert the values into the
ITEM_ID, ITEM_NAME, and ITEM_DESC columns in the target table. It concatenates the store
number 5419, an underscore, and the original ITEM ID to get the new item ID.
Pushdown Optimization Types
You can configure the following types of pushdown optimization:
Source-side pushdown optimization. The Integration Service pushes as much
transformation logic as possible to the source database.
Target-side pushdown optimization. The Integration Service pushes as much
transformation logic as possible to the target database.
Full pushdown optimization. The Integration Service attempts to push all
transformation logic to the target database. If the Integration Service cannot push all
transformation logic to the database, it performs both source-side and target-side
pushdown optimization.
Running Source-Side Pushdown Optimization Sessions
When you run a session configured for source-side pushdown optimization, the Integration
Service analyzes the mapping from the source to the target or until it reaches a downstream
transformation it cannot push to the source database.
The Integration Service generates and executes a SELECT statement based on the
transformation logic for each transformation it can push to the database. Then, it reads the
results of this SQL query and processes the remaining transformations.
Running Target-Side Pushdown Optimization Sessions
When you run a session configured for target-side pushdown optimization, the Integration
Service analyzes the mapping from the target to the source or until it reaches an upstream
transformation it cannot push to the target database. It generates an INSERT, DELETE, or
UPDATE statement based on the transformation logic for each transformation it can push to
the target database. The Integration Service processes the transformation logic up to the point
that it can push the transformation logic to the database. Then, it executes the generated SQL
on the Target database.
Running Full Pushdown Optimization Sessions
To use full pushdown optimization, the source and target databases must be in the same
relational database management system. When you run a session configured for full pushdown
optimization, the Integration Service analyzes the mapping from the source to the target or
until it reaches a downstream transformation it cannot push to the target database. It
generates and executes SQL statements against the source or target based on the
transformation logic it can push to the database.
When you run a session with large quantities of data and full pushdown optimization, the
database server must run a long transaction. Consider the following database performance
issues when you generate a long transaction:
A long transaction locks the database for longer periods of time. This reduces database
concurrency and increases the likelihood of deadlock.
Rules and guidelines apply when you push functions to the database:
When you push LAST_DAY() to Oracle, Oracle returns the date up to the second. If
the input date contains subseconds, Oracle trims the date to the second.
When you push LTRIM, RTRIM, or SOUNDEX to a database, the database treats the
argument (' ') as NULL, but the Integration Service treats the argument (' ') as spaces.
An IBM DB2 database and the Integration Service produce different results for
STDDEV and VARIANCE. IBM DB2 uses a different algorithm than other databases to
calculate STDDEV and VARIANCE.
When you push SYSDATE or SYSTIMESTAMP to the database, the database server
returns the timestamp in the time zone of the database server, not the Integration
Service.
If you push SYSTIMESTAMP to an IBM DB2 or a Sybase database, and you specify the
format for SYSTIMESTAMP, the database ignores the format and returns the complete
time stamp.
You can push SYSTIMESTAMP('SS') to a Netezza database, but not SYSTIMESTAMP('MS')
or SYSTIMESTAMP('US').
When you push TO_CHAR(DATE) or TO_DATE() to Netezza, dates with subsecond
precision must be in the YYYY-MM-DD HH24:MI:SS.US format. If the format is
different, the Integration Service does not push the function to Netezza.
SCD 2 (Complete):
Let us drive the point home using a simple scenario. For example, in the current month, i.e.
(01-01-2010), we are provided with a source table with three columns and three rows in it
(Empno, Ename, Sal). There is a new employee added and one change in the records in the
month (01-02-2010). We are going to use the SCD-2 style to extract and load the records
into the target table.
The thing to be noticed here is: if there is any update in the salary of any employee,
then the history of that employee is kept, with the current date as the start date of the
new record and the previous date as the end date of the old one.
Source Table: (01-01-10)
Empno  Ename  Sal
101    A      1000
102    B      2000
103    C      3000

Target Table: (01-01-10)
Skey  Empno  Ename  Sal   S-date    E-date  Ver  Flag
100   101    A      1000  01-01-10  Null    1    1
200   102    B      2000  01-01-10  Null    1    1
300   103    C      3000  01-01-10  Null    1    1
Source Table: (01-02-10)
Empno  Ename  Sal
101    A      1000
102    B      2500
103    C      3000
104    D      4000

Target Table: (01-02-10)
Skey  Empno  Ename  Sal   S-date    E-date    Ver  Flag
100   101    A      1000  01-02-10  Null      1    1
200   102    B      2000  01-02-10  Null      1    0
300   103    C      3000  01-02-10  Null      1    1
201   102    B      2500  01-02-10  01-01-10  2    1
400   104    D      4000  01-02-10  Null      1    1
In the second month we have one more employee added to the table, with Ename D, and the
salary of employee 102 is changed to 2500 from 2000.
Step 1: Import the source table and the target table.
Create a table by the name emp_source with three columns as shown above in Oracle.
Drag the target table twice on to the mapping designer to facilitate the insert and
update processes.
Go to the Targets menu and click Generate and Execute to confirm the creation of
the target tables.
The snapshot of the connections using the different kinds of transformations is shown
below.
In the target table we are going to add five columns (Skey, Version, Flag, S_date,
E_date).
Here in this mapping we are about to use four kinds of transformations, namely
Lookup transformation (1), Expression transformations (3), Filter transformations (2),
and a Sequence Generator. The necessity and usage of all the transformations will be
discussed in detail below.
Look up Transformation: The purpose of this transformation is to look up the target table
and to compare it with the source using the lookup condition.
The first thing that we are going to do is to create a Lookup transformation and connect
the Empno from the Source Qualifier to the transformation.
Drag the Empno column from the Source Qualifier to the Lookup transformation.
In the Condition tab, the Lookup Table Column should be Empno, the Transformation
Port should be Empno1, and the Operator should be =.
Expression Transformation: After we are done with the Lookup transformation, we use an
Expression transformation to find whether the data in the source table matches the target
table. We specify here the condition whether to insert or to update the table. The steps to
create an Expression transformation are shown below.
Drag all the columns from both the source and the look up transformation and drop
them all on to the Expression transformation.
Now double-click on the transformation, go to the Ports tab, and create two new
columns named insert and update. Both these columns are going to be our output
data, so we need to leave the Input check box unchecked.
The Snap shot for the Edit transformation window is shown below.
The conditions that we want to parse through to our output data are listed below:
Insert: ISNULL(EmpNO1)
Update: IIF(NOT ISNULL(Skey) AND DECODE(SAL,SAL1,1,0)=0,1,0)
Filter Transformation: We need two Filter transformations; the purpose of the first filter is
to pass the records which we are going to insert, and the second is vice versa.
New records pass through Filter transformation 1 to Exp 1, and the same output
appears in the target table.
Changed records pass through Filter transformation 2 to Exp 2, which forwards the
updated input to the target table.
The closer view of the connections from the expression to the filter is shown below.
We are going to have a Sequence Generator; the purpose of the Sequence Generator
is to increment the values of the skey in multiples of 100 (a bandwidth of 100).
Expression Transformation:
Exp 1: It updates the target table with the skey values. The point to be noticed here is that
the skey gets multiplied by 100, and a new row is generated if any new employee is added to
the list. Otherwise, no modification is done on the target table.
Now add a new column N_skey; the expression for it is going to be Nextval1*100.
We are going to make S_date an output port; the expression for it is SYSDATE.
Exp 2: If the same employee is found with any updates in his records, then the skey gets
incremented by 1 and the version changes to the next higher number.
Now add a new column N_skey; the expression for it is going to be Skey+1.
Exp 3: If any record in the source table gets updated, then we make it output only.
Update Strategy: This is the place from where the update instruction is set on the target table.
SCD Type 3:
This method has limited history preservation, and we are going to use skey as the primary key
here.
Source Table (Month 1):
Empno  Ename  Sal
101    A      1000
102    B      2000
103    C      3000

Target Table (Month 1):
Empno  Ename  C-sal  P-sal
101    A      1000   Null
102    B      2000   Null
103    C      3000   Null

Source Table (Month 2):
Empno  Ename  Sal
101    A      1000
102    B      4566
103    C      3000

Target Table (Month 2):
Empno  Ename  C-sal  P-sal
101    A      1000   Null
102    B      4566   Null
103    C      3000   Null
102    B      4544   4566
Step 2: Here we are going to see the purpose and usage of all the transformations that we
have used in the above mapping.
Look up Transformation: The Lookup transformation looks at the target table and compares
it with the source table. Based on the lookup condition, it decides whether we need to
insert, update, or delete the data before loading into the target table.
As usual, we are going to connect the Empno column from the Source Qualifier to the
Lookup transformation. Prior to this, the Lookup transformation has to look at
the target table.
Next, we are going to specify the lookup condition empno = empno1.
Finally, specify the connection information (Oracle) and the lookup policy on multiple
mismatches (use last value) in the Properties tab.
Expression Transformation:
We are using the Expression transformation to separate out the insert records and update
records logically.
Drag all the ports from the Source Qualifier and the Lookup into the Expression.
These two ports (insert and update) are going to be just output ports. Specify the
below conditions in the Expression editor for the ports respectively:
Insert: ISNULL(ENO1)
Update: IIF(NOT ISNULL(ENO1) AND DECODE(SAL,Curr_Sal,1,0)=0,1,0)
Filter Transformation: We are going to use two Filter transformations to filter out the data
physically into two separate sections, one for the insert and the other for the update process.
Filter 1:
Drag the insert port and the other three ports which came from the Source Qualifier
into the Expression, into the first filter.
Filter 2:
Drag the update port and the other four ports which came from the Lookup into the
Expression, into the second filter.
Update Strategy: Finally we need the Update Strategy to insert or to update into the target
table.
Update Strategy 1: This is intended to insert into the target table.
Drag all the ports except the insert from the first filter into this.
In the Properties tab, specify the condition as 0 or dd_insert.
Update Strategy 2: This is intended to update the target table.
Drag all the ports except the update from the second filter into this.
In the Properties tab, specify the condition as 1 or dd_update.
Finally connect both the Update Strategy transformations to the two instances of the target.
Step 3: Create a session for this mapping and run the workflow.
Step 4: Observe the output; it should be the same as the second target table.