A Training - Chennai - 1

Informatica the basics
Trainer: Muhammed Naufal
2004 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice
Purpose of the training

The training is designed to have you start using PowerCenter, not to make you experts Youll know how to:
Create logical and physical data flow Design some simple transformations Choose appropriate transformation for your processing Schedule and execute jobs Examine runtime log files Debug transformations and solve data quality issues Run Informatica logic from command line Manage security in Powercenter
2
Agenda Day 1

Introduction Software Installation and Configuration DW/BI Basics Database, Data Warehouse, OLAP/BI, Enterprise Data Warehouse Architecture, ETL & Data Integration, Fitment - Informatica ETL in Data warehouse architecture. Informatica Architecture Brief Description of Informatica Tool & Components (Repository Administrator, Repository Manager, Workflow Manager, Workflow Monitor, Designers, Repository Server, Repository Agent, Informatica Server) LAB: Informatica Installation & Configuration
Agenda Day 2 & 3
ETL Components Designer (Source Analyzer, Workflow Designer, Task Designer, Mapplet Designer & Mapping Designer) Workflow Manager (Task Designer, Worklet Designer, Workflow Designer, Workflow Monitor) Repository Manager & Server Informatica Administration Basics LAB: Informatica Administration
Agenda Day 3 & 4

Transformations Classifications Active/Passive Re-usable Connected/Un-Connected Transformations and Properties Source Qualifier , Expressions, Lookup, Router, SeqGen, Update Strategy, Targets, Joiner, Filter, Aggregator, Sorter LAB: Transformations - Demo
Agenda Day 5
Mapplets and Mappings Mapping Design Mapping Development Mapping Parameters and variables Incorporating Shortcuts Using Debugger Re-usable transformations & Mapplets Importing Sources & Targets Versioning Overview LAB: Mapplet & Mapping Designing
Agenda Day 6

Development of Mappings Sample mappings LAB: Mapping designing for Flat files loading, DB
table loading, etc
Agenda Day 7
Workflow and Properties

Workflow Manager Tasks (Assignment, Command, Control, Decision, Event, Timer) Session (Re-usable or Local) Worklets Workflow Design Session Configuration (Pre/Post Sessions, Parameters & Variables, Emailing, Source/Target Connections, Memory Properties, Files & Directories, Log/Error Handling, Override/revert properties) Workflow Execution & Monitoring Workflow recovery principles Task recovery strategy Workflow recovery options Command line execution
8
LAB: Workflow Designing, Worklet Designing, Task Designing, Scheduling, Sample Workflows, etc.
Agenda Day 8
Advanced Topics Revisit Informatica Architecture Server Configurations (pmServer/pmRepServer), Memory Management Caches (Lookups, Aggregators, Types of Caches) Performance Tuning Release Management & Versioning Repository Metadata Overview LAB: Performance Tuning Demo, Release
Management Demo, Metadata querying, etc.
Agenda Day 9
GMAS ETL Process & Development Methodology Design/Development Guidelines, Checklists, etc. Best Practices & references my.Informatica.com, Informatica discussion groups etc Question & Clarification sessions LAB Sessions Sample mappings/workflows
10
Overview, Software setup & Configuration
Informatica PowerCenter

PowerCenter is an ETL tool: Extract, Transform, Load A number of Connectivity options (DB-Specific, ODBC, Flat, XML, other) Metadata Repository/Versioning built in Integrated scheduler (possibility to use external) Number of cool features XML Imports/Exports, integrity reports, Save As Picture etc
12
PowerCenter Concepts

Repository: stores all information about definition of processes and execution flow Repository server: provides information from the Repository Server: executes operations on the data
Must have access to sources/targets Has memory allocated for cache, processing
Clients (Designer, workflow manager etc): manage the Repository Sources, Targets can be over any network (local, FTP, ODBC, other)
13
PowerCenter Concepts II
Client s Rep Server Server(s )
Sources
Repositor y
Targets
14
Software installation

Copy the folder Pcenter 7.1.1 from Share??
Install the Client. Do not install the servers or ODBC

After the installation you may delete your local Pcenter 7.1.1 folder
15
Registering the Repository

You need to tell your client tools where is the Repository Sever Go to the Repository Manager, Choose Repository->Add Repository . Repository is <Server Name>, user is your Name
16
Registering the Repository II

Right-click on the newly created repository, choose Connect Fill in additional details. Password, port, etc.
17
Register the Oracle database Instance

Add the connection information to tnsnames.ora on your machine Verify the connection using SQLPLUS
18
Define the ODBC connection to waretl
This connection will be used to connect from Client tools directly to a database (e.g. for imports) Go to Control Panel -> Administrative Tools -> Data Sources (ODBC) On the System DSN tab click Add and choose Microsoft ODBC for Oracle. Fill in details for your TNumberOL
19
Environment misc info

Login to Informatica with your User/Pwd
Login to Oracle with User/Pwd

Work in your INFA folder
First folder you open becomes your working folder.
20
STOP
Wait for the class to finish the connectivity setup
Sources, Targets
Client tools - overview
Designer defines Sources, Targets, Transformations and their groups (Mapplets /Mappings) Workflow Manager defines Tasks and Workflows (+scheduling) Workflow Monitor monitors job execution Repository Manager defines connection to repository Server Administration Console for Admins only. Must have English locale settings on PC
23
The Designer tool
Transformation
selector
Tool Selector
Navigator Workspace
Messages
24
Definition of the data flow

The data flow structure is defined in the Designer client tool Following concepts are used:
Sources: structure of the source data Targets: structure of the destination data Transformation: an operation on the data Mapplet: reusable combination of Transformations Mapping: complete data flow from Source to Target
25
Mapping: definition of an e2e flow
Target(s)
Source(s)
Transformation(s)
26
Sources - overview

Sources define structure of the source data (not where the data is). Source + Connection = complete information It is only the internal Informatica information, not e.g. physical structure in the database. Sources are created using the Source Analyzer in the Designer client tool You can either create or import Source definitions
27
Sources how to create a DB source

Go to Tools -> Source Analyzer Choose Import from the database Choose the waretl ODBC connection and fill in remaining details
If you get unable to resolve TNS name error make sure you have added the waretl server to all tnsnames.ora files
28
Sources how to create a DB source
Choose table SRC_TNumber and press OK
29
Sources Creating cont.

A Source is created You can see its definition on the Workspace
30
Editing Source properties
To Edit a Source doubleclick on it in the Workspace You can manually add/delete/edit columns You can define PK/FK/other relations in INFA You can load based on PK (Unit). See the Update Strategy Transformation
31
Sources: Exercise 1
Create a comma-delimited file

Define header columns: DATE_SHIPPED, CUST, PROD, SALES. Have at least 10 data rows. Date format: dd-mmm-yyyy Use REAL IDs for CUST and PROD that exist in some hierarchies. CUST and PROD tables available on waretl
Define this comma-delimited source in Source Analyzer. Use name SRC_TNumber_Flat Preview data in your file in the Source Analyzer.
32
STOP
Wait for the rest of the class to finish If you finish early, read the Designer Guide -> Chapters 2 and 3
Targets overview

Targets define the structure of the destination objects Target + Connection = complete information It is only the internal Informatica information, not e.g. physical structure in the database. Targets are created using the Warehouse Designer in the Designer client tool You can either create or import Target definitions Defining Targets is works the same way as defining Sources
34
Targets - Columns
35
Targets: Exercise I

Import table TGT_TNumber_1 to Warehouse Designer using Import from Database function Compare your Target with the Flat File Source SRC_TNumber_Flat Modify the your Target to be able to load all data from your Flat File. Remember to have your Oracle and INFA definitions synchronized!
This can be done literally <1 minute :)
36
Targets: Exercise II

Define a new Target called TGT_TNumber_2
Define all columns from our Flat File source plus new ones:
FACT_ID (PK) GEO_ID
Create the Target in Oracle too! (and dont forget grants to dev_rol1..)
This too can be done <1 minute :>
37
STOP
Wait for the rest of the class to finish If you finish early, read the Designer Guide -> Chapter 4
Transformations, Mappings
Transformations - Overview

Transformations modify the data during processing. They are internal INFA objects only Transformations can be reusable Large number of transformations available
E.g. Lookup, Filter, Aggregator.. Every transformation has different configuration options
If none applicable, write your own

PL/SQL execution straight from INFA Can write custom COM transformations Custom transformation executes C code
40
Transformations Overview II
Transformations can be created in:

Transformation Developer: reusable Mapplet/Mapping Designer: processing-specific
Usually you can Override the default SQL generated by a Transformation

E.g. in Lookup transformation
Very good online help available Transformation Guide. Use it for self-study!
41
Transformations - Concepts
Transformations work on Ports:

Input, Output, Variable, other (Lookup, Group, Sort..)
42
Transformations Concepts II
Transformations are configured using Properties Tab. Thats THE job!

HUGE number of properties in total
43
Mappings Overview

Mapping is a definition of an actual end-to-end processing (from Source to Target) You connect Transformations by Ports defining a data flow (SH30->CHAIN_TYPE/CHAIN) The data flows internally in INFA on the Server You can visualize the data flowing row by row from a Source to a Target
Exceptions: Sorters, Aggregators etc
A number of things can make the Mapping invalid

44
Mappings - Creating

Choose Mapping Designer, Mappings->Create
Use name m_TNumberLoad
Now we need to define the data flow in the Mapping
45
Transformations: Source Qualifier

Begins the process flow from physical Source onwards Can select only a couple of columns from Source (will SELECT only those!) Can Join multiple sources (PK/FK relationship inside INFA, not Oracle) For relational sources:
You can override the where condition Can Sort the input data Can Filter the input data
46
Transformations: SQ II

For relational sources you can completely Override the SQL statement Is created by default when you drag a Source to the Mapping Designer Standard naming convention is SQ Some options are available only for Relational sources (e.g. sorting, distinct etc) As usually more info in the Transformation Guide
Self-study on overwriting the default SQL ad Where conditions
47
SQ: Creating

Having Mapping Designer open, drag your Flat File SRC_TNumber_Flat onto Workspace SQ is created automatically for you SQ is often a non-reusable component
48
SQ: Ports + Workspace operations

Sources have only Output ports
SQ has I/O ports

Experiment: drag around your Source/SQ Experiment: select, delete and connect Port connectors Right-click on the Workspace, see the Arrange/Arrange Iconic options
49
Workspace

Objects are named and color-coded
Iconic view very useful for large flows. Use Zoom!
50
Exercise: SQ

Delete the automatically created SQ
Create manually an SQ for your Source, the ports and links should be connected automatically
Hint: theres a Transformations menu available when in Mapping designer Hint: you can drag Ports to create them in the target Transformation Save your work the Mapping is Invalid. Why?
51
Exercise: a complete Mapping

Having Mapping Designer open, drag the TGT_TNumber_1 Target onto Workspace Connect appropriate Ports from SQ to Target
Save - the Mapping is now Valid! :) Our Mapping is actually a SQL-Loader equivalent
52
STOP
Wait for the rest of the class to finish
Connections, Sessions
Execution of a Mapping
To execute a Mapping you need to specify WHERE it runs

Remember, Sources/Targets define only the structure
A single Mapping can be executed over different connections An executable instance of a Mapping is called a Session A series of Sessions is a Workflow Workflows are defined in Workflow Designer
55
Workflow Designer
56
Workflows

Workflows are series of Tasks linked together
Workflows can be executed on demand or scheduled

There are multiple workflow variables defined for every server (Server->Server Configuration)
E.g. $PMSessionLogDir, $PMSourceFileDir etc.
The parameters are used to define physical locations for files, logs etc They are relative to Server, not your local PC! Directories must be accessible from Server
57
Connections

Connections define WHERE to connect for a number of reasons (Sources, Targets, Lookups..) There are many Connections types available (any relational, local/flat file, FTP..) Connections are defined in the Workflow Manager Connections have their own permissions! (owner/group/others)
58
Connections: Exercise
Define a Shared connection to your own Oracle schema on <Schema_Name>

In the Workflow Manager click on Create/Edit Relational Connection:
Choose New Connection of type Oracle and fill in the necessary info
59
Tasks
Tasks are definitions of actual actions executed by Informatica Server

Sessions (instances of Mappings) Email Command More Tasks available within a Workflow (e.g. Event Raise, Event Wait, Timer etc.)
Tasks have huge number of attributes that define WHERE and HOW the task is executed Remember, online manual is your friend :)
60
Sessions : Creating

Set the Workspace to Task Developer: Go to Tools->Task Developer Go to Tasks->Create, choose Session as a Task Type Create a task called tskTNumber_Load_1
61
Sessions : Creating II
Choose your Mapping
62
Sessions: important parameters

Access the Task properties by double-clicking on it General tab

Name of your Task
Properties tab:
Session log file directory (default $PMSessionLogDir\) and file name $Source and $Target variables: for Sources/Targets, Lookups etc. Use Connections here (also variables :) ) Treat Source Rows As: defines how the data is loaded.
Read about the Update Strategy Transformation to understand how to use this property. This is very useful property, e.g. able to substitute concept of a Unit
63
Sessions : important parameters II
Config Object tab

Constraint based load ordering Cache LOOKUP() function
Multiple error handling options Error Log File/Directory ($PMBadFileDir) All options in the Config Object tab are based on the Session Configuration (a set of predefined options). To predefine go to Tasks>Session Configuration
64
Sessions : important parameters III

Mapping tab defines the WHERE Connections, Logs For every Source and Target define type of connection (File Writer, Relational..) and details
65
Sessions: relational connections

For relational Sources and Targets you can/must define owner (schema) Click on the Source/Target on the Connection tab The Attribute is :
Owner Name for Sources Table Name Prefix for Targets
If you use a shared Connection and access private objects (without public synonyms) you MUST populate this attribute
66
Sessions : important parameters IV
Components tab
Pre, post session commands Email settings
67
Sessions : Summary

Lot of options again, read the online manual Most importantly, you define all Wheres: $Source, $Target, Connections for all Sources, Targets, Lookups etc Heres also definition of locations of flat files Define error handling/log locations for every Task Use Session Configs Majority of the Session options can be overwritten in Workflows :)
This allows e.g. to execute the same Session over different Connections!
68
Sessions: Exercise
For your tskTNumber_Load_1 session define:

$Source, $Target variables Source and Target locations Remember, your Source is a flat file and Target is Oracle. Source filetype must be set to Direct. This means that the file contains actual data. Indirect= ? :> Enable the Truncate target table option for your relational Target. This purges the target table before every load During all Exercises use only Normal load mode (NOT Bulk) $Source variable is a tricky one :>
69
STOP
Wait for the class to finish
Workflow execution
Workflows: Creating

Choose Tools-> Workflow Designer
Choose Workflows->Create
Create a Workflow wrkTNumber_Load_1
72
Workflows: Properties

Available when creating a Workflow or using Workflows->Edit General tab:

Name of the workflow Server where the workflow will run
Also avilable from Server->Assign Server
73
Workflows: Properties II
Properties tab:
Parameter filename: holds list of Parameters used in Mappings. See online manual Workflow log (different than Session logs!)
74
Workflows: Properties III
Scheduler tab:
Allows to define complete reusable calendars Explore on your own! First read the online manual, then schedule some jobs and see what happens :)
75
Workflows: Properties IV
Variables tab
Variables are also used during Mapping execution Quick task: Find what is the difference between a Parameter and a Variable
Events tab
Add user-defined Events here These events are used later on to Raise or Wait for a signal (Event)
76
Workflow: Adding tasks
With your Workflow open drag the tskTNumber_Load_1 Session onto the Workspace Go to Tasks and choose Link Tasks Link the Start Task with tskTNumber_Load_1
The Start Task does not have any interesting properties
77
Workflow: editing Tasks
You can edit Task properties in a Workflow the same way as you do it for a single Task
Editing the task properties in Workflow overwrites the default Task properties Overwrite only to change the default Task behavior Use system variables if a Task will be executed e.g. every time on a different instance
78
Workflows: Running
You can run a Workflow automatically (scheduled) or on-demand

Well run on-demand only in this course
Before you run a Workflow, run the Workflow Monitor and connect to Server first!
79
Workflows: Running II
In Workflow Designer right-click on the Workflow wrkTNumber_Load_1 and choose Start Workflow Go to Workflow Monitor to monitor your Workflow
80
Workflows: Running II

Workflow Monitor displays Workflow status by Repository/Server/Workflow/Session Two views available: GANTT view and TASK view
81
Workflows: Logs
You can get the Workflow/Session log by rightclicking on Workflow/Session and choosing the log
Remember, Session log is different than Workflow log!
82
Workflows: Logs II
Most interesting information is in the Session logs (e.g. Oracle errors etc) Exercise:
Where do you define location of Where are the Session/Workflow logs? Manually locate and open the logs for your Workflow run Find out why your Workflow has failed :>
83
Logs: Session Log missing for a failed Session
Why would you get an error like that?:
84
Workflows: Restarting

Restart your Workflow using Restart Workflow from Task (More about job restarting: Workflow Administration Guide -> Chapter 14 ->Working with Tasks and Workflows ) Debug until your Workflow finishes OK
85
Workflow: verifying

Check that the data is in your Target table. The table will be empty initially why? Why theres an Oracle error in the Session Log about Truncating the Target table? Hint: when you modify a Mapping that is already used in a Session, you need to refresh and save the Session. Warning! PowerCenter has problems refreshing objects between tools! Use Task->Edit or File>Close All Tools
86
E2E flow: Exercise
Create a new Mapping m_TNumberLoad_2 that will move all data from TGT_TNumber_1 to TGT_TNumber_2 Create a new session tskTNumber_Load_2 for this Mapping. Define connection information to waretl, check Truncate target option and use Normal load mode (NOT Bulk) Create a new Workflow wrkTNumber_Load_2 that will run tskTNumber_Load_2 Run your Workflow, make sure it finishes OK Check that the data is in TGT_TNumber_2 table.
87
STOP
Summary: what you have learned so far
Now you can:
Define structures of Sources and Targets Define where the Sources and Targets are Create a simple Workflow Run a Workflow Debug the Workflow when it fails
You have all basic skills to learn further by yourself!
Transformations
Expressions, Sequences
Transformations what can you do?

Well be going through a number of Transformations Only some (important) properties will be mentioned Read the Transformation Guide to learn more
91
Transformations: Expression (EXP)

Expression modifies ports values
This is a pure Server Transformation

Remember Ports? Input, Output, I/O, Variables
Convention: name ports IN_ and OUT_
The only Property of EXP is Tracing Level
92
Expression Editor

Available in almost every Transformation
Allows to easily access Ports and Functions
93
EXP: Example

Lets create an non-reusable Expression that will change customer Bolek into customer Lolek You can drag ports to copy them IIF function similar to DECODE
94
Transformations: Sequence Generator

SEQ can generate a sequence of numbers
Very similar to Oracle sequence

Can generate batches of sequences Each target is populated from a batch of cached values
95
SEQ: Ports

Nextval
Currval = Nextval + IncrementBy. No clue how this is useful
96
SEQ: Properties

Start Value Increment By The difference between two consecutive values from the NEXTVAL port. End Value: The maximum value the PowerCenter Server generates. Current Value: The current value of the sequence. Cycle Number of Cached Values: The number of sequential values cached at a time. Use when multiple sessions use the same reusable SEQ at the same time to ensure each session receives unique values. 0 = caching disabled Reset: Rewind to Start Value every time a Session runs Disabled for reusable SEQs
97
SEQ: Batch processing
Guess what happens here?: Start = 1, Increment By = 2, Cache = 1000?
98
Transformations: Exercise
Create a Target called TGT_TNumber_Tmp that will hold FACT_ID, SALES and SALESx2 columns Create appropriate table in Oracle, (remember: grants, synonyms if needed..) Add this Target to your Mapping m_TNumberLoad_2 (so, you should have two Targets). Save. The Mapping is valid even though the Target is not connected why? In the Workflow define Connection for this Target (remember, use Normal load mode). Choose Truncate Target option
99
EXP: Exercise

Create a reusable expression called EXP_X2 that will multiply an integer number by two. The input number must be accessible after the transformation Use this EXP to multiply the SALES field when passing it to TGT_TNumber_Tmp. The original SALES field goes to SALES and the multiplied one to SALESx2
100
SEQ: Exercise
Create a reusable SEQ called SEQ_TNumber_FACT_ID. Start = 1, Increment By = 6, Cache = 2000 Populate the FACT_ID field in both targets to have the same value from the sequence (parent/detail).
101
Exercise: verify

Run your modified Mapping m_TNumberLoad_2 Remember to refresh your Task:

Try Task Editor -> Edit If you get an Error, go to Repository -> Close All Tools
You may need to modify the Workflow before you run it - what information needs to be added? Verify that the data is in both TGT_TNumber_2 and TGT_TNumber_Tmp, the SALES_x2 column equals SALES*2 and the same FACT_ID is used for the same row in _2 and _TMP table. Rerun the workflow a couple of times. What happens with FACT_ID field on every run?
102
Solutions:
Wrong! SEQ initialized for every target. _TMP and _2 tables will have different FACT_IDs
103
Solutions:
Wrong! SEQ initialized for every target. _TMP and _2 tables will have different FACT_IDs
104
The correct solution
SEQ initialized once (one ID for every row)
105
STOP
The Debugger
The Debugger
Debugger allows you to see every row that passes through a Mapping
Every transformation can be debugged in your Mapping Nice feature available: breakpoint The Debugger is available from the Mapping Designer
To start the debugger, open your Mapping in Mapping Designer and go to Mappings -> Debugger -> Start Debugger Well be debugging the m_TNumberLoad_2 Mapping Warning: the Debugger uses a lot of server resources!
108
The Debugger: Setting up
You need a Session definition that holds a Mapping to run Debugger

Best way is to use an existing Session You may create the temporary Debug session. This limits debug capabilities
109
The Debugger: Setting up II

Select the Session you want to use
All properties of this session (Connections etc) will be used in the debug session
110
The Debugger: Setting up III

Select the Targets you want to debug. This defines flow in the Debugger You have an option to discard the loaded data
111
The Debugger: running
The Mapping Designer is now in debug mode
Flow Monitor
Transformation youre debugging
Target data
112
The Debugger: Running II
Select EXP_x2 as the Transformation youll be monitoring Remember, you run the Debug session for one Target! Most optimal operation with the Debuger:
Choose your mapping on the Workspace Choose Debugger -> Step to Instance. Goes directly to your Transformation Or, Choose Debugger -> Next instance. This is actually step-by!
113
The Debugger: View on the data
You can view individual Transformations in the Instance window. The data is shown with regard to given data row:
Continue = run the Session (if there are no Breakpoints the Session with run until OK/Failure When the whole Source is read, the Debugger finishes
114
The Debugger: Breakpoints
You can add breakpoints to have the Debugger stop on given condition
Mappings -> Debugger -> Edit Breakpoints
115
Breakpoints: Setting up

Global breakpoints vs. Instance breakpoints
Error breakpoints vs. Data breakpoints

A number of conditions available..
116
The Debugger: Exercise

Experiment on your own :)
Check the difference between Step Into and Go To

Setup breakpoints
You can actually change the data in Debugger flow! Check out this feature, really useful.
A number of restrictions apply: usually can modify only output ports and group conditions
117
STOP
Transformations
Lookups
Transformations: Lookups (LKP)

Lookups find values
Connected and Unconnected

Cached and Uncached
PowerCenter has a number of ways to cache values Cache can be static or dynamic, persistent or not Lookup caching is a very large subject. To learn more about caching read the Transformation Guide
Question: where is the lookup cache file created?
120
Lookups: Connected vs. Unconnected

Connected lookups receive data from the flow while Unconnected are inline functions Unconnected lookup has one port of type R = Return
Connected Default values Caching #returned columns Supported Any type Multiple Unconnected Not supported (possible using NVL) Static only Only one
121
Lookups: Creating
First, import the definition on Lookup table as either Source or Target

REF_CTRL is owned by user infa_dev
While in the Transformation Designer choose Transformations -> Create and create lkpRefCtrl
122
Lookups: Creating II
Choose the Table for your Lookup

Only one table allowed
123
Lookups: Creating III
Delete the columns not used in Lookup

If this is a reusable lookup, be careful when deleting..
124
Lookups: defining lookup ports and condition
There are two important types of ports: Input and Lookup

Combinations of I/O and O/L ports allowed
Input ports define comparison columns in the INFA data flow Lookup ports are columns in the Lookup table For unconnected Lookup there one R port the return column
125
Lookups: Creating join ports
Create two port groups:

IN_ ports for data in the stream (Input) Comparison in the physical table (Lookup) CTRL_PERD will be the Return column
126
Lookups: Creating join conditions
Match the Iinput ports with Lookup ports on the Condition tab
127
Lookup: important Properties
Lookup Sql Override: you can actually override the Join. More on Overriding in the Designer Guide Lookup table name: Join table Lookup caching enabled: disable for our Exercise Lookup policy on multiple match: first or last Connection Information: very important! Use $Source, $Target or other Connection information
128
Lookups: using Unconnected
Weve created an Unconnected lookup

How can you tell?
Use Unconnected lookup in an Expression as O port using syntax: :LKP.lookup_name(parameters)

For example: :LKP.lkpRefCtrl('BS',31,IN_CUST_STRCT,1)
Parameters can be of course port names You must put the Unconnected lookup in the Mapping (floating)
129
Lookups: using Connected

Connected Lookups dont have Return port (they use Output ports) Data for the lookup comes from the pipeline
130
Lookups: Exercise

Modify Mapping m_TNumberLoad_2 to find a ISO_CNTRY_NUM for every customer. Column GEO_ID should be populated with ISO_CNTRY_NUM. Use default values if CUST_ID not found Choose Connected or Unconnected Lookup Run your modified Workflow
Verify that when your Workflow finishes OK all rows have GEO_ID populated
131
STOP
Transformations
Stored Procedures
Transformations: Stored Procedure

Used to execute an external (database) procedure Naming convention is name of the Procedure Huge number of different configurations possible
Connected, Unconnected (same as Lookups) Pre- and post-session/load Returned parameters
Well do an example of a simple stored procedure Possible applications: e.g. Oracle Partitioning for promotions in Informatica (post-session)
PowerCenter has hard time running inlines!
134
Stored Procedures: Creating
The easiest way is to Import the procedure from DB: Transformation -> Import SP
135
Stored Procedures: Importing
Importing creates required Ports for you..
.. and uses correct function name :)
136
Stored Procedures: watchouts
A number of watchouts must be taken into account when using SP Transformation

Pre/post SPs require unconnected setup When more than one value is returned use Mapping Variables Datatypes of return values must match Ports
Transformation Guide is your friend
137
Stored Procedure

Create a procedure in Oracle
Import the procedure in Designer

Use this procedure as Connected transformation. Run your Workflow
What does this function do? :>
138
STOP
Transformations Overview
Aggregator, Filter, Router, Joiner, Update Strategy, Transaction Control, Sorter, Variables
Overviews

Number of interesting Transformations and techniques outside of the scope of this training Overview gives you an idea that a possibility exists to do something If you want to learn more self study: read the Designer Guide and the Transformation Guide
141
Overview: Aggregation (AGG)

Similar to Oracles group-by, functions available
Active transformation changes #rows

A number of restrictions apply
A number of caching mechanisms available
142
Overview: Filter (FL)
Allows to reject rows that dont meet specified criteria. Rows filtered are not in the reject file.
Active transformation
Use as early in the flow as possible
143
Overview: Router (RTR)
Used to sent data to different Targets

Active transformation Dont split processing using Router and then join back! Often used with Update Strategy preceding
Typical usage:
144
Router: configuring
Ports are only Input (in reality I/O)
Define condition groups (a row is tested against all groups)
145
Router: using in Mapping
Router receives the whole stream and sends it different way depending on conditions
146
Overview: Joiner (JNR)
Joins pipelines on master/detail basis

Special Port available that marks one of the pipeline sources as Master Joiner reads ALL (including duplicate) rows for Master and then looks up the detail rows.
Outer joins available (including full outer) Caching mechanisms available

Sorted input speeds up processing
Restrictions:
Cant use is either input pipeline contains an Update Strategy transformation Cant use if one connects a Sequence Generator transformation directly before the Joiner transformation Allows to join two pipelines. If more joins needed, use consecutive JNRs
147
Joiner: Example
Joining QDF with CUST_ASSOC_DNORM before an aggregation
148
Joiner: Ports
One pipeline is master, the other one is detail. M port denotes which one is which
149
Joiner: Properties
Join Condition tab defines the join ;)
Properties tab lets you define join type (amongst other properties)
150
Overview: Update Strategy (UPD)

This transformation lets you mark a row as Update, Insert, Delete or Reject You do it by a conditional expression in the Properties tab of UPD
E.g. :IIF( ( SALES_DATE > TODAY), DD_REJECT, DD_UPDATE )
UPD can replace the Unit concept from S1..
151
Update Strategy: setting up

To set up, create I/O pass-through ports
Then enter the conditional expression into Properties tab of UPD. Use variables:
Insert DD_INSERT 0 Update DD_UPDATE 1 Delete DD_DELETE 2 Reject DD_REJECT 3
You must set properly Session properties. Read more in the Transformation Guide
For example you must select Treat Source Rows As Session Property option to Data Driven
152
Overview: Transaction Control (TC)

This transformation defines commit points in the pipeline To setup, populate the conditional clause in Properties
For example IIF(value = 1, TC_COMMIT_BEFORE, TC_CONTINUE_TRANSACTION)
153
TC: Defining commit points
Use following system variables:

TC_CONTINUE_TRANSACTION. TC_COMMIT_BEFORE TC_COMMIT_AFTER TC_ROLLBACK_BEFORE TC_ROLLBACK_AFTER
Theres transformation scope in majority of transformations
Transaction control must be effective for every source, otherwise the mapping is invalid
Read more in the Transformation Guide
154
Overview: Sorter (SRT)
Allows to sort on multiple keys. Has only I/O Ports

Option to output only distinct data (all Ports become Keys) Option for case sensitive sorts Option to treat Null as High/Low Caching mechanism available
Sorter speeds up some Transformations (AGG, JNR..)
155
Variable ports
You can use Variable ports to:

Store interim results of complex transformations Capture multiple return values from Stored Procedures Store values from previous rows
Remember, Ports are evaluated in order of dependency:

Input Variable Output
156
Variables: storing values from previous rows

Useful when e.g. running inlines for distinct values You need to create two variable Ports to store values from previous rows (why? :> )
157
Transformations: Exercise

Modify the m_TNumberLoad_2 mapping to use JNR transformation for geography lookup (instead of LKP) Count the number of distinct customers in the pipeline (modify the Target table to have CNT column). Use Variables
Run the modified workflow

Verify that GEO_ID is derived (from CUST.ISO_CNTRU_NUM) and loaded into the Target table
Once you finish, modify the Transformation to complete the task without Variables
Change back your mapping to use Lookup instead of Joiner
158
Transformations: Exercise II

Build a Loader mapping just with the following objects: Source, SQ, UPD, Target Add a PK on your Source (e.g. TRANX_ID) Add TRANX_ID to your flat file Use the UPD strategy to insert new rows and update already existing rows (based on TRANX_ID field)
Remember, set correct Session parameters
Load your flat file Verify: add and modify some rows in the flat file. Load again, check that the Target is updated as needed (rows are added/modified/deleted)
159
STOP
Mapplets, Worklets
Mapplets: Overview

Mapplets are reusable sets of Transformations put together into a logical unit Can contain Sources and some Transformations. Cant contain Targets or other Mapplets
Special Transformations available

Input Output
162
Mapplets in Mapping

Use Mapplet Input and Output ports
Connect at least one I and one O port
163
Worklets: Overview
Worklets are sets of Tasks connected into a logical unit

Can be nested Can be reusable Can pass on persistent variables
Runtime restrictions apply:

You cannot run two instances of the same Worklet concurrently in the same workflow. You cannot run two instances of the same Worklet concurrently across two different workflows.
164
Mapplets: Exercise
Create a Mapplet mplt_EMP_NAME that:

Has Input ports of EMPNO and MGR Looks up the ENAME and DEPTNO fields from EMP table for the input EMPNO Filters out all rows that have the MGR<=0
The Mapplet should have three Output ports:

EMPNO, MGR and DEPTNO concatenated with ENAME as DEPT_NAME
Create a table called EMP_DEPT which has EMPNO,MGR, DEPT_NAME as fields (Take the structure of the fields from EMP table) Create a mapping map_EMP_NAME which has EMP1 as Source and EMP_DEPT as target and use the above mapplet inside this mapping. Run the Mapping, verify results
165
STOP
Advanced Scheduling
Advanced Scheduling
When you build Workflows in the Workflow Designer you can use a number of non-reusable components Different control techniques available To use the non-reusable components go to Tasks -> Create when in Workflow Designer
168
Workflows: Links
Links have conditions that set them to True or False

Double-click on a Link to get its properties If the Link condition evaluated to True (default) the Link executes its target Task
Use Expression Editor to modify Link conditions

Access properties of Tasks, e.g. Start Time You can access the Workflow Variables here!
Number of predefined Workflow Variables are available You can create Variables persistent between Workflow runs! Workflow Variables must be predefined for the Workflow (Workflow -> Edit and then the Variables tab)
Task properties:
169
Expression Editor
Links (and some Tasks) can use the Expression Editor for Workflow events
170
Tasks: Command

Command Task executes any script on the Informatica Server It can be reusable Property Run if previous completed controls execution flow when more than one script is defined
171
Tasks: Decision
The Decision task sets a Workflow variable ($Decision_task_name.Condition)

On the Properties tab edit the Decision Name parameter. You can use Expression Editor here
You can use this variable later in a link

Of course you can evaluate the Decision condition directly in a Link, but use Decision for Workflow clarity
172
Workflow: Variables

Variables are integers available from Workflow
They need to be defined upfront in Workflow definition

Go to Workflows - > Edit
Variables can be persistent between Workflow runs
173
Workflow: Variables II

Persistent variables are saved in the repository
You can check the value of a Variable in the Workflow log

Not in the session log!
174
Tasks: Assignment

The Assignment task sets the Workflow variables
You can use the Expression Editor

One Assignment task can set multiple variables
175
Tasks: Email

Use to send emails! ;) Can be reusable
The waretl server is not set up to send emails
You can use all Workflow Variables and Session Properties

Including $PMSuccessEmailUser or $PMFailureEmailUser server variables
176
Emails: Advanced
Every Session can send an email on success or failure

Additional Email options available! Go to Edit Email > Email Subject and click on the small arrow
177
Emails: Additional options
Additional options are available for Email body when using from within a Session
178
Events: Overview

Events are a way of sending a signal from one Task to another Two types of Events supported:
User-defined (define in Workfow->Events) Filewatcher event
Two Event Tasks available:

Event Raise Event Wait
Actually you can use Links to do the same..
179
Events: User-defined
Create user-defined Events in Workflow properties

Workflows -> Edit
Use them later in Event Raise/Wait Tasks
180
Events: Example
Sending Events = Links (in a way..)
181
Tasks: Event Raise
Raises a user-defined Event (sends a signal)

Remember, the Even must be predefined for a Workflow
Only one Event can be Raised
182
Tasks: Event Wait
Waits for an Event to be raised

User-defined Event (signal), or Filewatcher event
Properties available
Enable Past Events!
183
Event Wait: filewatcher
The filewatcher event is designed to wait for a marker file (e.g. *.end)
Theres an option to delete the file immediately after the filewatcher kicks in No wildcards are allowed
Discussion: how to emulate S1 filewatcher, waiting and loading for multiple files?
184
Tasks: Control

The Control Task fails unconditionally
The abort command can be send to different levels

Read more in the Workflow Administration Guide
185
Tasks: Timer
The Timer Task executes

On a date (absolute time) After relative waiting time
186
Tasks: Exercise

Create a new Workflow that will use two Sessions: ..Load_1 and ..Load_2 Run the ..Load_2 Session after every 3 runs of ..Load_1
Dont cycle the Sessions, rerun the whole Workflow How will you verify success?
Obligatory tasks:
Decision Event Raise/Wait
187
STOP
Command line execution
pmcmd - overview

A command line tool to execute commands directly on the Informatica server Useful e.g. to:
Schedule Powercenter tasks using an external scheduler Get status of the server and its tasks
Big number of commands available see online manual:

Workflow Administration Guide -> Chapter 23: using pmcmd
190
pmcmd - example
pmcmd getserverdetails <-serveraddr|-s> [host:]portno <<-user|-u> username|<-uservar|-uv> userEnvVar> <<-password|-p> password|<-passwordvar|-pv> passwordEnvVar> [-all|-running|-scheduled]
pmcmd getserverdetails -s waretl.emea.cpqcorp.net:4001 -u bartek -p mypassword
191
pmrep - overview
Updates session-related parameters and security information on the Repository

E.g. create new user, create the Connection, import/export objects..
Very useful for e.g. bulk operations (create 10 users) Usage: pmrep command_name [-option1] argument_1 [-option2] argument_2...
192
pmrep - usage
The first pmrep command must be connect
Pmrep connect -r repository_name -n repository_username <-x repository_password | X repository_password_environment_variable> -h repserver_host_name -o repserver_port_number
The last command must be exit Full list of commands in the Repository Guide -> Chapter 16: using pmrep
193
pmrep exit
pmcmd: exercise
Run the workflow wrkTNumber_Load_2 from command line: pmcmd startworkflow <-serveraddr|-s> [host:]portno <<-user|-u> username|<-uservar|-uv> userEnvVar> <<-password|-p> password|<-passwordvar|-pv> passwordEnvVar> [<-folder|-f> folder] [<-startfrom> taskInstancePath] [-recovery] [-paramfile paramfile] [<-localparamfile|-lpf> localparamfile] [-wait|-nowait] workflow
194
STOP
Parameters and Variables
Parameters and Variables
Parameters and Variables are used to make Mappings/Workflows/Sessions more flexible

Example of a Session variable: name of a file to load We had an exercise for Workflow variables already Dont confuse with port variables!
Parameters dont change but Variables change between Session runs. The changes are persistent Both Variables and Parameters can be defined in the Parameter File
Except port variables Variables can initialize without being defined upfront
197
Mapping Parameters and Variables
Used inside a Mapping/Mapplet

E.g. to load data incrementally (a week at a time) Cant mix Mapplet and Mapping parameters
Described in the Designer Guide -> Chapter 8 Use them inside transformations in regular expressions
E.g. in SQs (WHERE), Filters, Routers
198
Mapping Parameters and Variables II
To use a Mapping parameter or variable:

Declare them in Mappings -> Declare Parameters and Variables If required define parameters and variables in the Parameter file (discussed later) For variables set the Aggregation type to define partitioning handling Change the values of variables using special functions
SetVariable, SetMaxVariable
199
Session Parameters

Very useful! Can be used to have the same Session work on different files/connections Must be defined in the Parameter File Conventions:
Parameter Type Database Connection Source File Target File Lookup File Naming Convention $DBConnectionName $InputFileName $OutputFileName $LookupFileName
Reject File
$BadFileName
200
Session Parameters - usage

You can replace majority of the Session attributes with Parameters Described in detail in the Workflow Administration Guide -> Chapter 18: Session Parameters
201
Parameter File
Parameter file is used to define values for:

Workflow variables Worklet variables Session parameters Mapping/Mapplet parameters and variables
The variable values in the file take precedence over the values saved in the Repository
This means that if a Variable is defined in a Parameter File, the change of its value in Mapping will have no effect when the Session runs again!
Described in detail in the Workflow Administration Guide -> Chapter 19: Parameter Files
202
Parameter Files II
Parameter Files can be put on the Informatica Server machine or on a local machine
Local files only for pmcmd use
Parameter files can be defined in two places:

Session Properties for Session/Mapping parameters Workflow properties Dont know why there are two places
A single parameter file can have sections to hold ALL parameters and variables
203
Parameter File Format
You define headers for different sections of your parameter file:

Workflow variables: [folder name.WF:workflow name] Worklet variables: [folder name.WF:workflow name.WT:worklet name] Worklet variables in nested worklets: [folder name.WF:workflow name.WT:worklet name.WT:worklet name...] Session parameters, plus mapping parameters and variables: [folder name.WF:workflow name.ST:session name] or [folder name.session name] or [session name]
Values are defined as:

name=value
204
Parameter File Example

[folder_Production.s_MonthlyCalculations] $$State=MA $$Time=10/1/2000 00:00:00 $InputFile1=sales.txt $DBConnection_target=sales [folder_Test.s_MonthlyCalculations] $$State=MA $$Time=10/1/2000 00:00:00 $InputFile1=sales_test_file.txt $DBConnection_target=sales_test_conn
205
Exercise: Mapping Variables & Parameters
Modify the map_EMP_NAME Mapping to load only one Employee specified by a Parameter
Remember to define the Parameter in the Parameter File
Modify the Mapping to store the SAL as SAL+(SAL*30/100) (increase salary by 30%) Use Mapping Variables Test!
206
Exercise: Session Parameters
Modify the S_EMP_NAME Mapping to use a Parameter for the file name to be loaded
Remember to define the Parameter in the Parameter File
How would you load e.g. 10 files one after another using the same Session?
207
STOP
Security overview
Security in PowerCenter

Security topics are described in Repository Guide -> Chapter 5; Repository Security PowerCenter manages privileges internally
Repository privileges (individual or group) Folder permissions Connection privileges
Authentication can be either internal or using LDAP
Security is managed through the Repository Manager

You need to have appropriate privileges to manage security! :)
210
Users, Groups
Individual (User) and Group privileges are combined to get the overall view on someones permissions The group Administrators has all possible privs
211
Repository privileges

Repository privileges are granted to Groups and Users The Repository privileges work on Objects! Detailed description of Repository privileges is in the Repository Guide -> Chapter 5 -> Repository Privileges
212
Object permissions
Object permissions apply in conjunction with Repository privileges

Folders Connections Other..
213
Performance tuning (basics)
Performance tuning
Workflow performance depends on a number of things:

Mapping performance
Database performance Lookups Complex transformations (aggregator, sorters)
Source/Target performance Power of the Informatica Server/Repository Server machines
Good overview in the Workflow Administration Guide -> Chapter 25: Performance Tuning
215
Performance: What can we tune
Eliminate source and target database bottlenecks

Database/remote system throughout put Lookup logic
Eliminate mapping bottlenecks

Transformation logic
Eliminate session bottlenecks

Performance-relates Session parameters Increase #partitions
Eliminate system bottlenecks

Increase #CPUs, memory
Evaluate bottlenecks in this order!

216
Bottlenecks: Identifying
Target bottlenecks
If Target is relational or remote location, change to local Flat file and compare run time
Source bottlenecks - Usually only if relational or remote source.

Use Filter directly after the SQ Run the SQ query manually and direct output to /dev/null
LAN speed can affect the performance dramatically for remote Sources/Targets
Query remotely/locally to identify LAN problems
217
Bottlenecks: Identifying Mapping
Mapping bottlenecks
Put Filters just before Targets: if the run time about the same you may have a Mapping bottleneck Some transformations are obvious candidates
Lookups
Multiple Transformation Errors slow down transformations Use Performance Details file
218
Performance Detail File
Enable in Session Properties
The Performance Detail File has very useful information about every single transformation
File is created in the SessionLog directory Big number of performance statistics available Workflow Administration Guide -> Chapter 14 Monitoring Workflows -> Creating and Viewing Performance Details
219
Performance File - example
Transformation Name LKP_CUST_GENERIC
Counter Name Lookup_inputrows Lookup_outputrows Lookup_rowsinlookupcache
Counter Value 107295 214590 1239356
220
Bottlenecks: Identifying Session

Usually related to insufficient cache or buffer sizes Use the Performance File
Any value other than zero in the readfromdisk and writetodisk counters for Aggregator, Joiner, or Rank transformations indicate a session bottleneck.
221
Allocating Buffer Memory

By default, a session has enough buffer blocks for 83 sources and targets. If you run a session that has more than 83 sources and targets, you can increase the number of available memory blocks by adjusting the following session parameters: DTM Buffer Size. Increase the DTM buffer size found in the Performance settings of the Properties tab. The default setting is 12,000,000 bytes. Default Buffer Block Size. Decrease the buffer block size found in the Advanced settings of the Config Object tab. The default setting is 64,000 bytes.
To configure these settings, first determine the number of memory blocks the PowerCenter Server requires to initialize the session. Then, based on default settings, you can calculate the buffer size and/or the buffer block size to create the required number of session blocks.
222
Example - Buffer Size/Buffer Block

For example, you create a session that contains a single partition using a mapping that contains 50 sources and 50 targets. 1. You determine that the session requires 200 memory blocks: [(total number of sources + total number of targets)* 2] = (session buffer blocks) 100 * 2 = 200
2. Next, based on default settings, you determine that you can change the DTM Buffer Size to 15,000,000, or you can change the Default Buffer Block Size to 54,000:
(session Buffer Blocks) = (.9) * (DTM Buffer Size) / (Default Buffer Block Size) * (number of partitions) 200 = .9 * 14222222 / 64000 * 1 or 200 = .9 * 12000000 / 54000 * 1
223
Bottlenecks: Identifying System
Obvious to spot on the hardware:

100% CPU High paging/second (low physical memory) High physical disk reads/writes
224
A balanced Session
The Session Log has statistics on Reader/Transformation/Writer threads (at the end of the file) MASTER> PETL_24018 Thread [READER_1_1_1] created Total Run Time = [595.053326] secs, Total Idle Time = [319.658512] secs, Busy Percentage = [46.280695]. MASTER> PETL_24019 Thread [TRANSF_1_1_1] created Total Run Time = [592.979465] secs, Total Idle Time = [248.725231] secs, Busy Percentage = [58.055001]. MASTER> PETL_24022 Thread [WRITER_1_*_1] created Total Run Time = [535.331108] secs, Total Idle
225
Increasing performance

A huge subject in itself
For every bottleneck there is a number of optimization techniques available

Think creatively, having the overall architecture in mind
Relational databases (Sources, Targets, Lookups..) Informatica server
226
Tuning Sources/Targets

Increase the database throughout put
Limit SQs
Limit incoming data (# rows) Tune SQ queries Prepare the data on the source side (if possible)
For Targets use Bulk Loading and avoid PKs/Indexes Increase LAN speed for remote connections
227
Tuning Transformations

Tune Lookups with regard to DB performance
Use appropriate caching techniques

For Lookups: static vs dynamic, persistent
If possible use sorted transformations

Aggregator, Joiner
Use Filters as early in the pipeline as possible Use port variables for complex calculations (factor out common logic) Use single-pass reading
228
Optimizing Sessions/System
Increase physical servers capacity

#CPUs Memory LAN HDD speed
Use appropriate buffer sizes

Big number of options available
Use bigger number of machines

Informatica Grids Oracles RACs
229
Pipeline Partitioning - Overview
Pipeline Partitioning is a way to split a single pipeline into multiple processing threads
Workflow Administration Guide -> Chapter 13: Pipeline Partitioning

230
Default Partition Points
231
Pipeline Partitioning
In a way one partition is a portion of the data

Partition point is where you create boundaries between threads Different partition points can have different #partitions
This means that there can be multiple

Reader threads Transformation threads Writer threads
This requires multi-CPU machines and relational databases with parallel options enables HUGE performance benefits can be achieved
If you know what youre doing, otherwise you may actually lower system performance!
232
Understanding Pipeline Flow
Pipeline partitions are added in the Mapping tab of Session properties (Workflow Manager)
233
Partitioning Limitations
You need to have a streamlined data flow to add partition points
Cant add partition points to these transformations because not all the columns flow through this part of the pipeline
234
Partition Types
Round-robin. The PowerCenter Server distributes data evenly among all partitions. Use round-robin partitioning where you want each partition to process approximately the same number of rows. Hash. The PowerCenter Server applies a hash function to a partition key to group data among partitions. If you select hash auto-keys, the PowerCenter Server uses all grouped or sorted ports as the partition key. If you select hash user keys, you specify a number of ports to form the partition key. Use hash partitioning where you want to ensure that the PowerCenter Server processes groups of rows with the same partition key in the same partition. Key range. You specify one or more ports to form a compound partition key. The PowerCenter Server passes data to each partition depending on the ranges you specify for each port. Use key range partitioning where the sources or targets in the pipeline are partitioned by key range. Pass-through. The PowerCenter Server passes all rows at one partition point to the next partition point without redistributing them. Choose passthrough partitioning where you want to create an additional pipeline stage to improve performance, but do not want to change the distribution of data across partitions. Database partitioning. The PowerCenter Server queries the IBM DB2 system for table partition information and loads partitioned data to the corresponding nodes in the target database. Use database partitioning with IBM DB2 targets stored on a multi-node tablespace. For more information, refer Workflow Administration Guide
235
Migration strategies
Migration Strategies
Theres always a need to migrate objects between stages

E.g. Test -> QA -> Prod
Usual problems with object synchronization There are two main types of migration
Repository per Stage Folder per stage
237
Folder Migrations
One folder per stage

Complex directory structure (multiple stages per project folder)
Not allowed to nest directories
Lower server requirements (one repository) Easier security management (one user login) Folders are created and managed in the Repository Manager
You need to have appropriate privs
238
Repository migrations

In this case you have a separate Repository (not necessarily Repository Server) per Stage Reduces the Repository size/complexity Streamlines folder structure
Test repository
Prod repository
239
Copy Wizard
Copy Wizard assists you to copy Folders or Deployment Groups

Use Edit -> Copy (..) Edit -> Paste
240
Copy Wizard II

You can copy between repositories or within the same repository The Wizard helps you to resolve conflicts
Connections Variables Folder names Other
241
XML Exports/Imports
If not possible to copy between folders or repositories (e.g. no access at all for Dev group to QA repository), one can use XML Export/Import
242
XML Imports/Exports II

You can Export/Import any type of object
When Exporting/Importing there are all dependencies exported (e.g. Sources for Mappings)
When Importing an Import Wizard will help you to resolve any possible conflicts
Different folder names Existing objects other
243
XML other use
How can one use XML data imports?

Transfer of objects between repositories Automatic Transformation creation from existing processes Quick import of Source/Target definitions from different format Backup of PowerCenter objects
244
Deployment Groups

For versioned Repositories you can group objects into Deployment Groups Greater flexibility and reduced migration effort You can define whole application or just a part of it No need to have one folder per application A deployment Group can be Static or Dynamic
Additional complexity (dependant child objects)

Read more in the Repository Guide -> Chapter 9: Grouping Versioned Objects
245
Exercise: Copy Wizard

Create a new folder TNumber_PROD
Copy your entire folder TNumber to folder TNumber_PROD

Modify the m_TNumberLoad_1 Mapping back to use hardcoded file name (instead of a Parameter)
In the TNumber folder
Migrate your change to TNumber_PROD folder Use Advanced options
246
The Aggregator Test
The Test Objectives

This test checks some skills you should have learned during the course Its supposed to prove your knowledge, not your colleagues or mine Its close to real life development work The test requires from you
Application of gained knowledge use training materials and online guides! Creativity Persistance
248
The Test Description
Your task is to load some data into target database, manipulating it on the way
So, its a typical ETL process
Youll have to:

Define and create Informatica and Oracle objects Modify the source information Run Informatica workflows Verify that the data has been correctly loaded and transformed
249
The Test : Workflow I

1.
Define a workflow that will load the data from file agg_src_file_1.txt to Oracle
Create your own target Oracle table for called ODS_TNUMBER If a numerical value is not numerical then load the row anyway, using 0 for numerical value Use an Update Strategy transformation based on the original transaction ID (ORIG_TRX_ID) to insert new rows and update existing rows Verify:
#rows in = #rows out Sum(values) in the source file = sum(vaules) in the target
250
The Test: Workflow II
Move all the data from table ODS_TNUMBER to table QDF_TNUMBER, adding following columns on the way:
TRADE_CHANL_ID from CUST table GEO_NAME from GEO.NAME table, linking via CUST.ISO_CNTRY_NUM
Filter out all rows with sales values <=0 Create your QDF table
251
The Test: Workflow III
Create a report (Oracle table) that will give information how much was daily sales in each Sector
Sector is level 2 in the 710 hierarchy Use DNORM table to get SECTOR information Use most recent CTRL_PERD Create appropriate Oracle report table
252
Test rules
No data row can be dropped

#rows in the source file = #rows in the target file, unless a source file row is supposed to update already loaded row
If an ID is not known it is supposed to be replaced with a replacement code

Product ID replacement key: 82100000 Customer ID replacement key: '9900000003
Dont change any Oracle or source data, however you may create your own objects
253
Task hints

Some values may be a bit different than other try to fix as many data issues as possible Remember about performance! Large lookups, aggregations, joins Use log files and the Debugger Use reusable components if feasible
254
The End
2004 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice

A Training - Chennai - 1

Uploaded by

Document Information

Original Description:

Original Title

Copyright

Available Formats

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Available Formats

A Training - Chennai - 1

Uploaded by

Copyright:

Available Formats

Informatica the basics

Trainer: Muhammed Naufal

Purpose of the training

Agenda Day 2 & 3

Agenda Day 3 & 4

table loading, etc

Workflow and Properties

Management Demo, Metadata querying, etc.

Overview, Software setup & Configuration

Copy the folder Pcenter 7.1.1 from Share??

Install the Client. Do not install the servers or ODBC

Registering the Repository

Registering the Repository II

Register the Oracle database Instance

Define the ODBC connection to waretl

Environment misc info

Login to Informatica with your User/Pwd

Login to Oracle with User/Pwd

Client tools - overview

The Designer tool

Definition of the data flow

Mapping: definition of an e2e flow

Sources how to create a DB source

Sources how to create a DB source

Choose table SRC_TNumber and press OK

Sources Creating cont.

A Source is created You can see its definition on the Workspace

Editing Source properties

Create a comma-delimited file

Define a new Target called TGT_TNumber_2

If none applicable, write your own

Transformations can be created in:

Usually you can Override the default SQL generated by a Transformation

Transformations work on Ports:

Transformations are configured using Properties Tab. Thats THE job!

A number of things can make the Mapping invalid

Choose Mapping Designer, Mappings->Create

Use name m_TNumberLoad

Now we need to define the data flow in the Mapping

Transformations: Source Qualifier

SQ: Ports + Workspace operations

Sources have only Output ports

SQ has I/O ports

Objects are named and color-coded

Iconic view very useful for large flows. Use Zoom!

Delete the automatically created SQ

Exercise: a complete Mapping

To execute a Mapping you need to specify WHERE it runs

Workflows are series of Tasks linked together

Workflows can be executed on demand or scheduled

Define a Shared connection to your own Oracle schema on <Schema_Name>

Tasks are definitions of actual actions executed by Informatica Server

Choose your Mapping

Sessions: important parameters

Access the Task properties by double-clicking on it General tab

Sessions : important parameters II

Config Object tab

Sessions : important parameters III

Sessions: relational connections

Sessions : important parameters IV

For your tskTNumber_Load_1 session define:

Choose Tools-> Workflow Designer