Approvals
Stephen Musgrove
Informatica Cookbook
Change Record
DATE         Author          Version  Reference
19-Apr-2004  Sastry Kolluru  1.00     Added sections 7.7 and 7.8
Reviewers
NAME POSITION
Table of Contents
1.0 OVERVIEW
2.0 GETTING STARTED
2.1 ABOUT INFORMATICA
2.1.1 Version in use
3.0 INFORMATICA DEVELOPMENT CYCLE
3.1 STARTING A NEW PROJECT
3.1.1 Project Initialization
3.1.2 Login
3.1.3 Folders and Groups setup
3.2 DEVELOPMENT AND TESTING PROCESS
3.3 MIGRATION TO PRODUCTION
3.3.1 Information to be provided
3.3.2 Review before movement
3.4 CHANGES TO AN EXISTING PROJECT
4.0 TRANSITION OF PROJECTS FOR SUPPORT
4.1 REQUIREMENTS FOR SUPPORT
4.2 SUPPORT PROCESS ON FAILURE
4.3 SUPPORT WINDOW
5.0 INFORMATICA ENVIRONMENTS
5.1 DEVELOPMENT
5.2 PRODUCTION
6.0 ENGINE MANAGEMENT
6.1 MANAGING THE ENGINE
6.2 RESTARTING THE ENGINE
7.0 BEST PRACTICES
7.1 NAMING STANDARDS
7.1.1 Challenge
7.1.2 Description
7.2 TEMPLATES
7.2.1 Challenge
7.2.2 Description
7.3 USAGE OF CONNECTION OBJECTS
7.3.1 Challenge
7.3.2 Description
7.4 FAILURE SCRIPTS
7.4.1 Challenge
7.4.2 Description
7.5 TRUNCATING DATA
7.5.1 Challenge
7.5.2 Description
7.6 BUILT-IN RE-STARTABILITY
7.6.1 Challenge
7.6.2 Description
7.7 PROJECT DIRECTORY STRUCTURE IN UNIX
7.7.1 Challenge
7.7.2 Description
7.8 PARAMETERIZATION OF SESSION INFORMATION
1.0 OVERVIEW
The objective of the Informatica Cookbook is to provide the Informatica user
community at Fidelity Investments with information regarding:
• Informatica infrastructure at FEB
• Processes for the development life cycle
• Best practices, tips and techniques
The cookbook is intended as a starting point for developers, so that they
understand the standards, processes and best practices before starting work on
the FEB Informatica infrastructure. It also acts as a refresher for experienced
developers on best practices and lessons learned from other users.
3.1.2 Login
Every user should have a login in development as well as in production. The Corp
id will be used as the login for individual users. In development, users will be
given access to both create and execute mappings/sessions, whereas in production
only read access will be given. The request for creating a new login may come as
part of the project initialization mail, or a separate mail may be sent to the
Informatica Support Group. A selective execute privilege can be requested for
some sessions or workflows.
3.3.1 Information to be provided
After coding and testing have been completed in development, the following
information should be provided to the Informatica Development Team to facilitate
movement of the code. The same applies to enhancements and bug fixes to existing
mappings and sessions:
1. Project name
2. Folder in development
3. List of session/mapping names
4. If any scripts need to be moved then the list of the same
5. Date when the movement has to be made
3.3.2 Review before movement
The Informatica Support Group will review mappings and sessions before they are
moved from development to production. The following are some of the important
points:
1. Project name
2. Folder in development
3. List of session/mapping names
4. If any scripts need to be moved then the list of the same.
5. Date when the movement has to be made
4.2 SUPPORT PROCESS ON FAILURE
1. On failure, the session will send a mail or page to the support team. The
Informatica support team shall follow the restart and recovery process provided.
2. A mail will be sent to the primary and secondary contacts summarizing the
reason for the failure and the action taken.
4.3 SUPPORT WINDOW
Monday to Friday
5.1 DEVELOPMENT
The Informatica Development Engine is set up on webstatdev. The repository is in
Oracle and is hosted on smmk94 so that backups of it are taken from time to
time. There are development instances in versions 5.1 and 6.2.
5.2 PRODUCTION
The Informatica Production Engine is set up on smmk94. The repository is in
Oracle and is hosted on smmk94. There are production instances in versions 5.1
and 6.2.
6.2 RESTARTING THE ENGINE
A mail shall be sent to the user community regarding the restart of the engine.
After the engine has been brought up, a confirmation will be sent so that users
can double-check the status of their sessions. If sessions have not been
scheduled properly, users should inform the Informatica support team.
7.1.2 Description
Folders
Folders are a collection of mappings, sources, targets, sessions, and batches.
Syntax:
ProjectName_phase
Description:
Note: not all phases may be required by each development group. Additional
folders can be created to meet the testing needs of the development teams.
Ports
Ports are another name for fields. There are many kinds of Ports: Input, Output,
Variable, Lookup etc.
Variable port names begin with the ‘v_’ prefix. Output ports that have been
added during coding should begin with the ‘o_’ prefix.
All other port names are at the discretion of the programming team.
Transforms
The names of these objects should describe what the transform does. Be as clear
and concise as possible. Prefixes are:
exp_ - Expressions
jnr_ - Joiners
fil_ - Filters
lkp_ - Lookups
agg_ - Aggregators
seq_ - Sequence Generator
sq_ - Source Qualifier
upd_ - Update Strategy
sp_ - Stored Procedure
nrm_ - Normalizer
rnk_ - Rank
rtr_ - Router
xsq_ - XML Source Qualifier
srt_ - Sorter
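The prefix list above lends itself to an automated check during code review. The following is a minimal Python sketch, not an Informatica feature: the table is copied from the list above, while the function name transform_type and the sample object names are hypothetical.

```python
# Prefix table copied from the list above; everything else is illustrative.
TRANSFORM_PREFIXES = {
    "exp_": "Expression",
    "jnr_": "Joiner",
    "fil_": "Filter",
    "lkp_": "Lookup",
    "agg_": "Aggregator",
    "seq_": "Sequence Generator",
    "sq_": "Source Qualifier",
    "upd_": "Update Strategy",
    "sp_": "Stored Procedure",
    "nrm_": "Normalizer",
    "rnk_": "Rank",
    "rtr_": "Router",
    "xsq_": "XML Source Qualifier",
    "srt_": "Sorter",
}


def transform_type(name):
    """Return the transform type implied by the name's prefix, or None."""
    for prefix, kind in TRANSFORM_PREFIXES.items():
        # Require a descriptive part after the prefix, e.g. "lkp_customer".
        if name.startswith(prefix) and len(name) > len(prefix):
            return kind
    return None


# Hypothetical object names, used only to exercise the check.
for candidate in ["lkp_customer", "xsq_orders", "customer_lookup"]:
    print(candidate, "->", transform_type(candidate))
```

A name that yields None either lacks a standard prefix or has nothing after it, and would be flagged for renaming.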
For database tables, default Source and Target names are derived from the
ODBC data source name and the table name/view name of the object in the
DBMS.
For files, default Source names are derived from FLATFILE:name of file.
Mappings
Sessions and Batches are the descriptive components that wrap the mappings
and provide the detail regarding how, when and with what sources/targets to use
during a mapping execution.
Syntax :
Qualifier_Batch/SessionName
Description:
Qualifier - ‘s’ for Session
            ‘b’ for Batch
            ‘wf’ for Workflow
            ‘wl’ for Worklet
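As an illustration of the Qualifier_Name syntax, the Python sketch below splits a name at the first underscore and looks up the qualifier. The function name object_kind is hypothetical; the qualifiers are those listed above.

```python
# Qualifiers as listed above.
QUALIFIERS = {"s": "Session", "b": "Batch", "wf": "Workflow", "wl": "Worklet"}


def object_kind(name):
    """Return the object kind implied by the qualifier, or None if it does not conform."""
    qualifier, separator, rest = name.partition("_")
    # A conforming name has a known qualifier, an underscore and a non-empty remainder.
    if separator and rest:
        return QUALIFIERS.get(qualifier)
    return None


print(object_kind("s_m_first_sample_session"))
print(object_kind("daily_load"))
```

Only the text before the first underscore is treated as the qualifier, so a session name such as s_m_first_sample_session is still recognized as a session.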
Syntax:
database_LogonID
Description:
Example:
CAP1_powerm
Syntax:
server_database_LogonID
Description:
Server - The server name
LogonID - The user id to use when logging into the source/target
Example:
dbp1_powerm
Syntax:
Server_Database_LogonID
Description:
Database - The Database name
LogonID - The user id to use when logging into the source/target
Example:
dbp1_powerm
The PowerMart™ engine requires an external loader on the machine on which the
engine is running in order to use bulk loading utilities to load data into
databases. To establish clear loader names, the following standard should be used:
Syntax:
SQLLDR_Schema_LogonID
Description:
Example:
CAP1_powerm
7.2 TEMPLATES
7.2.1 Challenge
Develop a method by which the code in Informatica can be documented so that
development is easier and the code can be transitioned to a support team.
7.2.2 Description
A template document has been created to document the logic in the Informatica
transforms. This document will be a master list of all activities to be done. One
template document will be created for every mapping. The template document
consists of the following sections
Setup
This section would contain the details of source and target, the intermediate data
elements and any comments at the template level.
Error handling
This section would contain the error conditions and the actions to be taken for
each of the error conditions.
Setup
#    Name             Description
1.   Mapping Name     The name of the mapping document.
2.   Description      Any detailed description found necessary for the
                      document.
3.   Source           Details of the source for the mapping.
4.   Target           Details of the target for the mapping.
5.   Initial Rows     The average number of records expected to be
                      processed; this will be used for database size
                      estimation and load window.
6.   Load Frequency   The frequency of loads; this could be daily,
                      weekly, monthly etc.
7.   Load Window      The time period during which the upload will
                      take place.
8.   Pre-processor    The activities to be done before processing the
                      transformations. Any specific checks will have to
                      be added here.
9.   Post Processing  The activities after the transformation process
                      is complete. Any specific checks will have to be
                      added here.
10.  Remarks          Any remarks applicable at the mapping level.
Sources
1.   Tables           The source table name, the schema/owner
                      name and any filter condition to be applied for
                      the table. If multiple tables are present then all
                      the table names will have to be added. The
                      relationship between the tables will be provided
                      in the relationship column.
2.   File             The source file name, the location of the file, the
                      file type, the file format, relationship between
                      various files and information regarding presence
                      of header and footer.
Target
1.   Tables           The target table name and the schema/owner name.
                      If multiple tables are present then all the table
                      names will have to be added. The relationship
                      between the tables will be provided in the
                      relationship column.
2.   File             The target file name, the location of the file, the
Lookups
1.   Lookup name      The name of the lookup.
2.   Lookup Table     The source of data.
3.   Table Owner      The owner of the table.
4.   Lookup Columns   The columns that are to be included in the
                      lookup.
5.   Filter           The condition to be applied to the data to be
                      fetched from the table.
6.   Comments         The context of usage of the lookup.
#    Name                     Description
1.   Target Table name        The table name of the ODS table
2.   Target field name        Name of the target field
3.   Target datatype          The datatype of the target field
4.   Target mandatory         To indicate if the field is mandatory
5.   Default value            The default value if the field is null
6.   Source Table/File name   The table/file name of the source
7.   Source field name        Name of the source field
8.   Comments and detailed    The details of all transformations to be done
     transformations
Error Handling
Any specific error handling needs can be specified in this section of the template.
7.3.2 Description
• When connecting to the database, the administrative user should not be used;
an application-specific batch user should be used instead
• The naming convention to be followed is as specified in the naming convention
section 7.1
• The name of the connection object in QA and production should be the same
• When using the external loader, do not use
/webstatmmk1/oracle/product/9.2.0.2/bin/sqlldr as the external loader
executable name; use the shell script /webstatmmk1/ia/pm47/sh_load or
/webstatmmk1/ia/pm47/sh_load_parallel_direct instead
7.4.2 Description
7.5.1 Challenge
Truncate data before loading, when an application user is being used to connect
to the database.
7.5.2 Description
Informatica sessions connect to the database using a non-DBA user, so the
truncate is performed through a database procedure. A sample of the procedure
is given below. Only batch IDs should have access to execute this procedure.
This procedure can then be called from Informatica within the mapping, or from
a shell script during pre-processing.
7.6.1 Challenge
Design sessions such that the support and maintenance effort is low
7.6.2 Description
When aggregates are being populated, data for the period being loaded should
first be deleted before the new data is inserted, so that a re-run does not
create duplicates.
Tasks should be broken into different sessions rather than calling all scripts
as part of one session. This way, if a given script fails, restarting is easy.
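The delete-before-insert rule can be illustrated outside Informatica. The Python sketch below uses the standard sqlite3 module purely as a stand-in for the target database. The table name sales_agg, its columns and the sample rows are assumptions made for the demonstration.

```python
import sqlite3


def reload_period(conn, period, rows):
    """Replace the aggregate rows for one period in a single transaction."""
    with conn:  # commits on success, rolls back if an exception is raised
        # Remove whatever an earlier (possibly failed) run left for this period.
        conn.execute("DELETE FROM sales_agg WHERE period = ?", (period,))
        conn.executemany(
            "INSERT INTO sales_agg (period, region, total) VALUES (?, ?, ?)",
            [(period, region, total) for region, total in rows],
        )


# Hypothetical table and data, used only to demonstrate a re-run.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales_agg (period TEXT, region TEXT, total REAL)")

rows = [("EAST", 100.0), ("WEST", 250.0)]
reload_period(conn, "2004-04", rows)
reload_period(conn, "2004-04", rows)  # simulated restart of the same load

count = conn.execute("SELECT COUNT(*) FROM sales_agg").fetchone()[0]
print(count)
```

Because each run first clears its own period, running the load twice leaves the same rows as running it once, which is what makes a restart safe.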
7.7.2 Description
Following directories should be created inside the home directory for each project
• Bin – Directory for all the scripts used in the project (E.g.
/webstatmmk1/post/sample/bin)
• Env – Directory for parameter and environment settings files (E.g.
/webstatmmk1/post/sample/env)
• Incoming – Directory where the files that act as the source for the
project should reside (E.g. /webstatmmk1/post/sample/incoming)
• Outgoing – Directory where the output files created by various processes
should reside (E.g. /webstatmmk1/post/sample/outgoing)
• Temp – Directory for temporary files created by various processes, the
bad files and lookup cache files created by Informatica should also reside
in this directory (E.g. /webstatmmk1/post/sample/temp)
• Log – Directory for the log files generated by various processes in the
project. The Informatica log files should be saved into this directory (E.g.
/webstatmmk1/post/sample/log)
• Archive – Directory for storing files that need to be archived as a part of
the project (E.g. /webstatmmk1/post/sample/archive)
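The directory layout above can be created in one step when a project is set up. The following Python sketch is one possible way to do it. The function name create_project_dirs is hypothetical, and the demonstration runs against a temporary directory rather than a real project home.

```python
import os
import tempfile

# Sub-directory names follow the list above, in lower case as in the example paths.
PROJECT_SUBDIRS = ["bin", "env", "incoming", "outgoing", "temp", "log", "archive"]


def create_project_dirs(project_home):
    """Create the standard sub-directories under project_home; safe to re-run."""
    created = []
    for name in PROJECT_SUBDIRS:
        path = os.path.join(project_home, name)
        os.makedirs(path, exist_ok=True)  # no error if the directory already exists
        created.append(path)
    return created


# Demonstration against a throwaway location instead of a real project home.
demo_home = os.path.join(tempfile.mkdtemp(), "sample")
paths = create_project_dirs(demo_home)
print(len(paths))
```

For a real project the function would be called with the project home, for example /webstatmmk1/post/sample.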
The directory where the log files are stored should be added to the script in the
crontab that checks for the number of errors and warnings in Informatica log
files, so that it becomes easy to track sessions with many errors/warnings.
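The crontab script itself is not reproduced in this document. As a rough illustration of the kind of check it performs, the Python sketch below counts lines per log file. It assumes that a line containing the text ERROR or WARNING should be counted, which may not match the real log format, and the demo log content is invented.

```python
import os
import tempfile


def count_errors_and_warnings(log_dir):
    """Return {file name: (error count, warning count)} for *.log files in log_dir."""
    counts = {}
    for name in sorted(os.listdir(log_dir)):
        if not name.endswith(".log"):
            continue
        n_errors = n_warnings = 0
        with open(os.path.join(log_dir, name), errors="replace") as handle:
            for line in handle:
                # Assumption: a plain substring match is enough to classify a line.
                if "ERROR" in line:
                    n_errors += 1
                elif "WARNING" in line:
                    n_warnings += 1
        counts[name] = (n_errors, n_warnings)
    return counts


# Demonstration with a made-up log file in a temporary directory.
demo_dir = tempfile.mkdtemp()
with open(os.path.join(demo_dir, "s_m_first_sample_session.log"), "w") as handle:
    handle.write("INFO start\nWARNING slow lookup\nERROR row rejected\nERROR row rejected\n")

summary = count_errors_and_warnings(demo_dir)
print(summary)
```

Pointing such a check at each project's log directory gives one count per session log, which is the figure the tracking relies on.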
7.8.1 Challenge
Session information should be parameterized as far as possible so that migration
of code between dev/QA and production can be done with minimum changes. The
log files, bad files, target files etc. can be separated for each application so
that they do not affect each other.
7.8.2 Description
[SAMPLE.s_m_first_sample_session]
$PMSessionLogFile=/webstatmmk1/post/sample/log/s_m_first_sample_session.log
$DBConnection_sample_source=sample_source
$DBConnection_sample_target=sample_target
$RejectFile_sample=/webstatmmk1/post/sample/temp
$TargetFileDir_test=/webstatmmk1/post/sample/outgoing
$SrcFileDir_test=/webstatmmk1/post/sample/incoming
The header should be FolderName.SessionName. The folder name is not required,
but including it is advised.
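Since only the paths and connection names differ between environments, a parameter file in the layout of the sample can be generated rather than edited by hand. The Python sketch below is one way to do that. The function name build_param_file is hypothetical, and the parameter names are simply those used in the sample.

```python
def build_param_file(folder, session, project_home, source_conn, target_conn):
    """Return parameter-file text for one session, following the sample layout."""
    lines = [
        f"[{folder}.{session}]",  # header: FolderName.SessionName
        f"$PMSessionLogFile={project_home}/log/{session}.log",
        f"$DBConnection_sample_source={source_conn}",
        f"$DBConnection_sample_target={target_conn}",
        f"$RejectFile_sample={project_home}/temp",
        f"$TargetFileDir_test={project_home}/outgoing",
        f"$SrcFileDir_test={project_home}/incoming",
    ]
    return "\n".join(lines) + "\n"


text = build_param_file(
    "SAMPLE",
    "s_m_first_sample_session",
    "/webstatmmk1/post/sample",
    "sample_source",
    "sample_target",
)
print(text)
```

Called with the values shown, the function reproduces the sample file. Changing only the project home or the connection names yields the file for another environment.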
The session log file name and directory can be parameterized. If only the file
name needs to be parameterized, then the property “Session Log File Name” needs
to be set to $PMSessionLogFile. If both the log file name and the directory need
to be parameterized, then the property “Session Log File directory” should be
left blank and the property “Session Log File Name” should be set to
$PMSessionLogFile.
Database connection
The Source/Target or Reject file names can be parameterized. If only the file
name needs to be changed, the file name property can be set to
$TargetFileDir_test and the value for the parameter can be set to a different
file name. If the file as well as the directory needs to be changed, then the
property “Output file directory” should be left blank and the file name should
be populated as $TargetFileDir_test.
information
b. Cache file location
i. Unix soft links should be used so that the same string can be
used in Development/QA and Production
2. Parameter Filename in the properties tab, an exception being if the session is
being scheduled by pmcmd. When using pmcmd the parameter file name is
taken as an input parameter.