The Rank transformation is used to select the top or bottom rows by rank. For example, if we have a sales
table in which many employees sell the same product and we need to find the top 5 or 10 employees who
sell the most products, we can use a Rank transformation.
Can anybody write a session parameter file which will change the source and
targets for every session, i.e. different sources and targets for each session run?
You need to define a parameter file. In the parameter file, you can define two
parameters, one for the source and one for the target.
For example:
$Src_file = c:\program files\informatica\server\bin\abc_source.txt
$tgt_file = c:\targets\abc_targets.txt
Then define the parameter file, placing the parameters under the session heading:
[folder_name.WF:workflow_name.ST:s_session_name]
$Src_file =c:\program files\informatica\server\bin\abc_source.txt
$tgt_file = c:\targets\abc_targets.txt
If it is a relational database, you can even supply an overridden SQL query at the session level as a
parameter. Make sure the SQL is on a single line.
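As a rough illustration of changing the source and target on every run, the parameter file can be regenerated before each session run. The Python sketch below is only an assumption about how one might script this; it is not an Informatica utility. The helper name build_param_file is hypothetical, and only the heading format and the two parameter names come from the example above.

```python
def build_param_file(folder, workflow, session, src_file, tgt_file):
    """Return parameter-file text for one session run (hypothetical helper)."""
    # Heading format copied from the example: [folder.WF:workflow.ST:session]
    header = f"[{folder}.WF:{workflow}.ST:{session}]"
    lines = [
        header,
        f"$Src_file = {src_file}",
        f"$tgt_file = {tgt_file}",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Values below are the placeholders used in the example above.
    print(build_param_file(
        "folder_name", "workflow_name", "s_session_name",
        r"c:\program files\informatica\server\bin\abc_source.txt",
        r"c:\targets\abc_targets.txt",
    ))
```

Writing the returned text to the file that the session points at, with different paths each time, gives a different source and target per run.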
Informatica Live Interview Questions
Here are some of the interview questions I could not answer. Can anybody help by giving answers, for
others as well?
Thanks in advance.
Junk Dimension: A dimension is called a junk dimension if it groups together miscellaneous
low-cardinality attributes, such as flags and indicators, that do not belong in any other dimension.
For example, in the banking domain, four attributes from the Overall_Transaction_master table
(tput flag, tcmp flag, del flag, advance flag) can all be part of a junk dimension.
Can anyone explain about incremental aggregation with an example?
When you use an Aggregator transformation, it creates index and data caches to store the data:
1. The index cache holds the group-by columns. 2. The data cache holds the aggregate columns.
Incremental aggregation is used when historical data is already in place and will be used in the
aggregation. It uses the cache that contains the historical data. For each group-by column value
already present in the cache, it adds the incoming data value to the corresponding data cache value
and outputs the row. If an incoming value has no match in the index cache, the new values for the
group-by and output ports are inserted into the cache.
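The cache behaviour described above can be sketched in Python. This is a simplified model, not how the server stores its caches: a single dict (an assumption) stands in for both the index and data caches, and the function name and sample rows are made up for illustration.

```python
def incremental_aggregate(cache, new_rows):
    """Fold new (group_key, amount) rows into a cache of running totals.

    `cache` stands in for the historical cache: it maps each group-by
    value to the total aggregated in earlier runs.
    """
    for key, amount in new_rows:
        # Existing group: add to the stored total. New group: insert it.
        cache[key] = cache.get(key, 0) + amount
    return cache


if __name__ == "__main__":
    cache = {}
    incremental_aggregate(cache, [("east", 100), ("west", 50)])  # first run
    incremental_aggregate(cache, [("east", 25), ("north", 10)])  # later run, new rows only
    print(cache)
```

Only the new rows pass through on the second run; the totals for earlier rows come from the cache.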
Difference between Rank and Dense Rank?
Rank:
1
2<--2nd position
2<--3rd position
4
5
The same rank is assigned to equal totals/numbers. After a tie, the next rank follows the row position,
so ranks are skipped. Golf usually ranks this way, so this is usually called golf ranking.
Dense Rank:
1
2<--2nd position
2<--3rd position
3
4
The same rank is assigned to equal totals/numbers/names. The next rank follows in sequence, with no gaps.
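To make the difference concrete, here is a small Python sketch that computes both rankings for a list of totals sorted in descending order. The function name and the sample totals are made up for illustration.

```python
def rank_and_dense_rank(values):
    """Return (rank, dense_rank) lists for values sorted in descending order."""
    ordered = sorted(values, reverse=True)
    ranks, dense = [], []
    for position, value in enumerate(ordered, start=1):
        if position > 1 and value == ordered[position - 2]:
            # Tie with the previous row: both schemes repeat the previous rank.
            ranks.append(ranks[-1])
            dense.append(dense[-1])
        else:
            # Rank jumps to the row position; dense rank just adds one.
            ranks.append(position)
            dense.append(dense[-1] + 1 if dense else 1)
    return ranks, dense


if __name__ == "__main__":
    ranks, dense = rank_and_dense_rank([500, 400, 400, 300, 200])
    print(ranks)
    print(dense)
```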
About Informatica PowerCenter 7:
1) I want to know which mapping properties can be overridden at the Session task
level.
2) I want to know what types of permissions are needed to run and schedule workflows.
1) I want to know which mapping properties can be overridden at the Session task level?
You can override any properties other than the source and targets. Make sure the source and targets
exist in your database if it is a relational database. If it is a flat file, you can override its properties.
You can override the SQL if it is a relational database, as well as the session log, DTM buffer size,
cache sizes, etc.
2) I want to know what types of permissions are needed to run and schedule workflows.
You need execute permission on the folder to run or schedule a workflow. Read and write permissions
alone are not enough; you need execute permission as well.
Can anyone explain real-time complex mappings or complex transformations in
Informatica,
especially in the sales domain?
The most complex logic we use is denormalization. There is no Denormalizer transformation in
Informatica, so we have to use an Aggregator followed by an Expression. Apart from this, most of
the complexity sits in Expression transformations involving many nested IIF and DECODE
statements. Others are the Union and Joiner transformations.
How do you create a mapping using multiple lookup transformations?
Use an unconnected lookup if the same lookup repeats multiple times.
Suppose the source has duplicate records and we have 2 targets: T1 for unique
values and T2 only for duplicate values. How do we pass the unique values to T1
and the duplicate values to T2 from the source to these 2 different targets in a single
mapping?
Soln1: source ---> sq ---> exp ---> sorter (with the select distinct check box enabled) ---> t1
---> aggregator (with group by enabled and a count function) ---> t2
If you want only duplicates in t2, you can follow this sequence:
---> agg (with group by enabled and the expression decode(count(col),1,1,0)) ---> Filter (condition is 0) ---> t2.
Soln2: Take two source instances. In the first one, enable distinct in the source qualifier and connect
it to target t1.
In the second source instance, write a query that fetches only the duplicate records and connect it to
target t2.
<< If you use an aggregator as suggested by my friend, you will get duplicate as well as distinct records
in the second target. >>
Soln3: Use a Sorter transformation. Sort on the key fields by which you want to find the duplicates, then
use an Expression transformation.
Example:
field1-->
field2-->
SORTER:
field1 --ascending/descending
field2 --ascending/descending
Expression:
--> field1
--> field2
<--> v_field1_curr = field1
<--> v_field2_curr = field2
v_dup_flag = IIF(v_field1_curr = v_field1_prev AND v_field2_curr = v_field2_prev, true, false)
o_dup_flag = IIF(v_dup_flag = true, 'Duplicate', 'Not Duplicate')
<--> v_field1_prev = v_field1_curr
<--> v_field2_prev = v_field2_curr
Use a Router transformation to send rows with o_dup_flag = 'Duplicate' to T2 and 'Not Duplicate' to T1.
Informatica evaluates row by row. Because the rows are sorted, they arrive in order, and each row is
evaluated against the previous row.
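The sort-then-compare idea can be sketched outside Informatica in a few lines of Python. This is an illustration of the logic only; the function name, the dict-based rows, and the choice to compare all key fields together are assumptions of the sketch.

```python
def split_duplicates(rows, key_fields):
    """Sort rows on the key fields, then route repeats of a key to `duplicates`.

    The first row seen for each key goes to `unique` (T1); every later row
    with the same key goes to `duplicates` (T2).
    """
    def key_of(row):
        return tuple(row[f] for f in key_fields)

    unique, duplicates = [], []
    prev_key = None  # plays the role of the v_*_prev variable ports
    for row in sorted(rows, key=key_of):
        curr_key = key_of(row)
        if curr_key == prev_key:
            duplicates.append(row)
        else:
            unique.append(row)
        prev_key = curr_key  # remember the current key for the next row
    return unique, duplicates


if __name__ == "__main__":
    rows = [
        {"field1": "b", "field2": 1},
        {"field1": "a", "field2": 1},
        {"field1": "b", "field2": 1},
    ]
    t1, t2 = split_duplicates(rows, ["field1", "field2"])
    print(t1)
    print(t2)
```

As in the expression above, the previous key is stored only after the comparison, which is why the order of the variable ports matters.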
What are the enhancements made to Informatica 7.1.1 version when compared to
6.2.2 version?
In 7+ versions:
- We can look up a flat file.
- Union and Custom transformations.
- There is a propagate option, i.e. if we change the data type of a field, all the linked
columns will reflect that change.
- We can write to an XML target.
- We can use up to 64 partitions.
What is the difference between Power Centre and Power Mart?
What is the procedure for creating Independent Data Marts from Informatica 7.1?
Power Centre has multiple repositories, whereas Power Mart has a single repository (desktop
repository). Power Centre can also be linked to a global repository to share objects between users.
                     Power Centre     Power Mart
No. of repository    n No.            n No.
Applicability        high end WH      low & mid range WH
Global repository    supported        not supported
You should configure the mapping with the least number of transformations and expressions to do the
most amount of work possible. You should minimize the amount of data moved by deleting unnecessary
links between transformations.
For transformations that use data cache (such as Aggregator, Joiner, Rank, and Lookup
transformations), limit connected input/output or output ports. Limiting the number of connected
input/output or output ports reduces the amount of data the transformations store in the data cache.
You can also perform the following tasks to optimize the mapping:
• Configure single-pass reading.
• Optimize datatype conversions.
• Eliminate transformation errors.
• Optimize transformations.
• Optimize expressions.
What is the difference between a dimension table and a fact table,
and what are the different dimension tables and fact tables?
A fact table contains measurable data. It has fewer columns and many rows.
It contains a primary key, usually made up of foreign keys to the dimension tables.
Different types of facts:
additive, non-additive, semi-additive
A dimension table contains textual descriptions of the data. It has many columns and fewer rows.
It contains a primary key.
What are worklets, what is their use, and in which situations can we use them?
A worklet is a set of tasks. If a certain set of tasks has to be reused in many workflows, we use a
worklet. To execute a worklet, it has to be placed inside a workflow.
The use of a worklet in a workflow is similar to the use of a mapplet in a mapping.
What are mapping parameters and variables, and in which situations can we use them?
If we need to change certain attributes of a mapping every time the session is run, editing the
mapping each time is very difficult. Instead, we use mapping parameters and variables and define
their values in a parameter file. We can then edit the parameter file to change the attribute values,
which makes the process simple.
A mapping parameter value remains constant during a session run. To change it, we need to manually
edit the parameter file after every session run.
The value of a mapping variable, however, can be changed by using variable functions. If we need to
increment an attribute value by 1 after every session run, we can use a mapping variable.
Explain the use of the Update Strategy transformation.
It flags each row for insert, update, delete, or reject, which lets us maintain history data as well as
the most recently changed data in the target.
What is meant by a complex mapping?
A complex mapping is one that involves more logic and more business rules. For example, in my bank
project I was involved in building a data warehouse. The bank has many customers, and some of them
relocate to another place after taking a loan. I found it difficult to maintain both the previous and
current addresses, so I used SCD Type 2. This is a simple example of a complex
mapping.
I have a requirement where the column names in a table (Table A) should appear
in rows of the target table (Table B), i.e. converting columns to rows. Is it possible
through Informatica? If so, how?
If the data in the tables is as follows:
Table A
key_1 char(3);
Table A values
_______
1
2
3
Table B
bkey_a char(3);
bcode char(1);
Table B values
1T
1A
1G
2A
2T
2L
3A
and the required output is:
1, T, A
2, A, T, L
3, A
then the SQL query in the source qualifier should be:
select key_1,
max(decode( bcode, 'T', bcode, null )) t_code,
max(decode( bcode, 'A', bcode, null )) a_code,
max(decode( bcode, 'L', bcode, null )) l_code
from a, b
where a.key_1 = b.bkey_a
group by key_1
/
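To see what the max(decode(...)) query does, the same pivot can be sketched in Python. This is an illustration only: the function name is made up, the join to Table A is left out (an assumption that every bkey_a exists in Table A), and codes that the query does not list, such as G, are dropped just as the query drops them.

```python
def pivot_codes(b_rows, codes=("T", "A", "L")):
    """Pivot (bkey_a, bcode) rows into one row per key with a column per code.

    Mirrors max(decode(bcode, 'T', bcode, null)) etc.: each column holds the
    code if any row for that key has it, otherwise None.
    """
    result = {}
    for key, code in b_rows:
        row = result.setdefault(key, {c: None for c in codes})
        if code in row:  # codes outside the list (e.g. 'G') are ignored
            row[code] = code
    return result


if __name__ == "__main__":
    table_b = [("1", "T"), ("1", "A"), ("1", "G"),
               ("2", "A"), ("2", "T"), ("2", "L"),
               ("3", "A")]
    for key, row in sorted(pivot_codes(table_b).items()):
        print(key, row["T"], row["A"], row["L"])
```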
If a session fails after loading 10,000 records into the target, how can you load the
records from the 10,001st record when you run the session next time in Informatica 6.1?
The simple solution is to run the session again using the Perform Recovery option.
Can we run a group of sessions without using the Workflow Manager?
Yes, it is possible to run the group of sessions with the pmcmd command, without using the Workflow Manager.
What is the difference between stop and abort?
The PowerCenter Server handles the abort command for the Session task like the stop command,
except it has a timeout period of 60 seconds. If the PowerCenter Server cannot finish processing and
committing data within the timeout period, it kills the DTM process and terminates the session.
Stop: If the session you want to stop is part of a batch, you must stop the batch.
If the batch is part of a nested batch, stop the outermost batch.
Abort:
You can issue the abort command. It is similar to the stop command except it has a 60-second timeout.
If the server cannot finish processing and committing data within 60 seconds, it kills the DTM process
and terminates the session.
What is the difference between a cached lookup and an uncached lookup?
Can I run the mapping without starting the Informatica server?
The difference between a cached and an uncached lookup is as follows. When you configure the Lookup
transformation as a cached lookup, it stores all the lookup table data in the cache when the first input
record enters the Lookup transformation. The select statement executes only once, and the values of
each input record are compared with the values in the cache. In an uncached lookup, the select
statement executes for each input record entering the Lookup transformation, so it has to query the
database every time a new record enters.
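The difference in the number of select statements can be shown with a small Python sketch that uses an in-memory SQLite table as a stand-in for the lookup table. The table name lkp, its columns, and the function names are all assumptions made for illustration; this is not how the Informatica server is implemented.

```python
import sqlite3


def make_lookup_db():
    """Create a tiny in-memory lookup table (hypothetical data)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE lkp (id INTEGER PRIMARY KEY, name TEXT)")
    conn.executemany("INSERT INTO lkp VALUES (?, ?)", [(1, "a"), (2, "b"), (3, "c")])
    return conn


def uncached_lookup(conn, input_ids):
    """Run one SELECT per input row; returns (values, number_of_queries)."""
    values, queries = [], 0
    for input_id in input_ids:
        row = conn.execute("SELECT name FROM lkp WHERE id = ?", (input_id,)).fetchone()
        queries += 1
        values.append(row[0] if row else None)
    return values, queries


def cached_lookup(conn, input_ids):
    """Run one SELECT to build the cache, then look up every row in memory."""
    cache = dict(conn.execute("SELECT id, name FROM lkp"))
    return [cache.get(input_id) for input_id in input_ids], 1


if __name__ == "__main__":
    conn = make_lookup_db()
    print(uncached_lookup(conn, [1, 2, 2, 9]))
    print(cached_lookup(conn, [1, 2, 2, 9]))
```

Both functions return the same values; only the number of queries sent to the database differs.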
I want to prepare a questionnaire. The details about it are as follows: -
Can you please tell me what should be those 15 questions to ask from a company,
say a telecom company?
First of all, meet your sponsors and prepare a BRD (business requirement document) describing what
they expect from this data warehouse (the main aim comes from them). For example, they may need the
customer billing process. Next, go to the business management team, who may ask for metrics from the
billing process for their own use. Management may want monthly usage, billing metrics, sales
organization, and rate plan data to perform sales rep and channel performance analysis and rate plan
analysis. So your dimension tables can be:
Customer: customer id, name, city, state, etc.
Sales rep: sales rep number, name, id
Sales org: sales org id
Bill dimension: bill #, bill date, number
Rate plan: rate plan code
And the fact table can be:
Billing details: bill #, customer id, minutes used, call details, etc.
You can follow a star or snowflake schema in this case, depending on the granularity of your data.
Can I start and stop a single session in a concurrent batch?
Just right-click the particular session and go to the recovery option,
or
use Event-Wait and Event-Raise tasks.
What is MicroStrategy? What is it used for? Can anyone explain it in detail?
MicroStrategy is a BI tool of the HOLAP kind. You can create two-dimensional reports and also cubes
in it. It is basically a reporting tool, with a full range of reporting on the web as well as on Windows.
What is the difference between Informatica 7.1 and Ab Initio?
There are many differences between Informatica and Ab Initio:
Ab Initio uses three types of parallelism,
but Informatica uses one type of parallelism.
Ab Initio has no scheduling option; jobs are scheduled manually or through a PL/SQL script,
but Informatica has four scheduling options.
Ab Initio includes the Co>Operating System,
but Informatica does not.
Ramp-up time is much quicker in Ab Initio compared to Informatica.
Ab Initio is more user-friendly than Informatica.
An IQD file is an Impromptu Query Definition. This file is mainly used in the Cognos Impromptu tool:
after creating an imr (report), we save it as an IQD file, which is then used while creating a cube in
PowerPlay Transformer. For the data source type, we select Impromptu Query Definition.
Differences between Normalization and the Normalizer transformation.
Normalizer: It is a transformation mainly used for COBOL sources.
It changes rows into columns and columns into rows.
Normalization: It removes redundancy and inconsistency.
How do I import VSAM files from source to target? Do I need a special plugin?
In the Mapping Designer there is a direct option to import files from VSAM. Navigation: Sources => Import
from file => file from COBOL
What is the procedure or what are the steps for implementing versioning if you are already on
version 7.x? Any gotchas or precautions?
For version control in the ETL layer using Informatica, after doing anything in the Designer or
Workflow Manager, follow these steps:
1> First save the changes or new implementations.
2> Then, in the Navigator window, right-click the specific object you are working on. A pop-up menu
appears. Near the bottom of it, you will find Versioning -> Check In. A window opens. Enter a
comment describing what you have done, such as "modified this mapping", then click the OK button.
Can anyone explain error handling in Informatica with examples, so that it will be easy
to explain the same in an interview?
Go to the session log file. There we will find information about:
the session initiation process,
errors encountered,
load summary.
By looking at the errors encountered while the session was running, we can resolve them.
If you have four lookup tables in the workflow, how do you troubleshoot to improve
performance?
There are many ways to improve a mapping that has multiple lookups.
1) We can create an index on the lookup table if we have permissions (staging area).
2) Divide the lookup mapping into two: (a) dedicate one to inserts (source - target): these are new
rows, so only new rows come into the mapping and the process is fast; (b) dedicate the second one
to updates (source = target): these are existing rows, so only rows that already exist come into the
mapping.
3) We can increase the cache size of the lookup.
If your workflow is running slow in Informatica, where do you start troubleshooting
and what steps do you follow?
SOLN1: When the workflow is running slowly, you have to find the bottlenecks
in this order:
target
source
mapping
session
system
SOLN2: The workflow may be slow for different reasons. One is alpha characters in decimal data, so check
for that. Another is insufficient string length, which you can check with the SQL override.
How do you handle decimal places while importing a flat file into Informatica?
While importing the flat file, the Flat File Wizard helps in configuring the properties of the file. Select
the numeric column and enter the precision and the scale. Precision includes the scale. For
example, if the number is 98888.654, enter precision as 8 and scale as 3, and width as 10 for a
fixed-width flat file.
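As a quick check of the "precision includes the scale" rule, the Python sketch below derives both numbers from a decimal string using the standard decimal module. The function name is made up, and the sketch assumes plain decimal input (no exponents, NaN, or infinity).

```python
from decimal import Decimal


def precision_and_scale(text):
    """Return (precision, scale) for a plain decimal string such as '98888.654'."""
    _sign, digits, exponent = Decimal(text).as_tuple()
    scale = max(-exponent, 0)            # digits after the decimal point
    precision = max(len(digits), scale)  # total digits, including the scale
    return precision, scale


if __name__ == "__main__":
    print(precision_and_scale("98888.654"))
```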
In a sequential batch, how can we stop a single session?
We have a task called Event-Wait; using that, we can stop the session.
We start it again using Event-Raise.
Why are dimension tables denormalized in nature?
how to do this.
SOLN1: After running the mapping, you can do the following in the Workflow Manager:
start --------> session.
Right-click the session to get a menu and go to persistent values. There you will find the last value
stored in the repository for the mapping variable. Remove it and enter the value you want, then run the
session. That should do the task.
SOLN2: It takes the value 51, but you can override the variable value saved in the repository by defining
the value in the parameter file. If there is a parameter file entry for the mapping variable, the session
uses the value in the parameter file, not the value + 1 stored in the repository. For example, assign the
value of the mapping variable as 70. In other words, higher preference is given to the value in the
parameter file.
How do you use mapping parameters, and what is their use?
Mapping parameters and variables make mappings more flexible and avoid the need to create
multiple mappings. They also help in loading incremental data. Mapping parameters and variables are
created in the Mapping Designer by choosing the menu option Mapping ----> Parameters and Variables.
Enter the name for the variable or parameter, which has to be preceded by $$, and choose the type
(parameter or variable) and the data type. Once defined, the variable or parameter can be used in any
expression, for example in the source filter on the Properties tab of the SQ transformation: just enter
the filter condition. Finally, create a parameter file to assign the value for the variable or parameter
and configure the session properties. This final step is optional, however. If the parameter is not
present in a parameter file, the session uses the initial value that was assigned when the variable
was created.
How do you delete duplicate rows in a flat file source? Is there any option in Informatica?
Use a Sorter transformation. It has a "distinct" option; make use of it.
What is the use of incremental aggregation? Explain it in brief with an example.
It is a session option. When the Informatica server performs incremental aggregation, it passes only the
new source data through the mapping and uses the historical cache data to perform the new aggregation
calculations incrementally. We use it for performance.
What is the procedure to load the fact table? Give details.
SOLN1: We use the 2 wizards (i.e. the Getting Started Wizard and the Slowly Changing Dimension Wizard)
to load the fact and dimension tables. Using these 2 wizards, we can create different types of mappings
according to the business requirements and load into the star schema (fact and dimension tables).
SOLN2: First the dimension tables need to be loaded; then the fact tables should be loaded according
to the specifications. Do not think that fact tables are different when it comes to loading; it is a general
mapping like the ones we build for other tables. The specifications play an important role in loading the fact.
How do you look up data on multiple tables?
If you want to look up data on multiple tables at a time, join the tables you want and then
look up that joined table. Informatica provides lookups on joined tables.
How do you retrieve the records from a reject file? Explain with syntax or an example.
SOLN1: There is a utility called "reject loader" with which we can find the rejected records, then
refine and reload them.
SOLN2: During the execution of a workflow, all the rejected rows are stored in bad files (under the
directory where your Informatica server is installed, e.g. C:\Program Files\Informatica Power Center 7.1\Server).
These bad files can be imported as flat file sources, and then a direct mapping can load them in the desired format.
How does the server recognise the source and target databases?
By using an ODBC connection if it is relational, or an FTP connection if it is a flat file. We can verify the
connections for both sources and targets in the session properties.
What are variable ports? List two situations when they can be used.
There are mainly three kinds of ports: input port, output port, and variable port. An input port
represents data flowing into the transformation. An output port is used when data is passed to the next
transformation. A variable port is used when calculations are required.
For example, given price and quantity, we can compute the total in a variable port and then sum
total_amt by giving
sum (total_amt)
A variable port is used to break a complex expression into simpler ones,
and it is also used to store intermediate values.
What is difference between IIF and DECODE function...
You can use nested IIF statements to test multiple conditions. The following example tests for various
conditions and returns 0 if sales is zero or negative:
IIF( SALES > 0, IIF( SALES < 50, SALARY1, IIF( SALES < 100, SALARY2,
IIF( SALES < 200, SALARY3, BONUS))), 0 )
You can use DECODE instead of IIF in many cases. DECODE may improve readability. The following
shows how you can use DECODE instead of IIF:
DECODE( TRUE,
SALES > 0 and SALES < 50, SALARY1,
SALES > 49 AND SALES < 100, SALARY2,
SALES > 99 AND SALES < 200, SALARY3,
SALES > 199, BONUS)
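The two forms can be compared with a short Python sketch. It models DECODE with TRUE as the first argument as "return the value of the first condition that is true"; the function names are made up for illustration. One difference worth noting: the DECODE form above has no default, so in this sketch it yields None for sales of zero or less, whereas the nested IIF yields 0.

```python
def salary_iif(sales, salary1, salary2, salary3, bonus):
    """Python rendering of the nested IIF expression."""
    if sales > 0:
        if sales < 50:
            return salary1
        if sales < 100:
            return salary2
        if sales < 200:
            return salary3
        return bonus
    return 0


def decode_true(pairs, default=None):
    """Model of DECODE(TRUE, cond1, val1, ...): first true condition wins."""
    for condition, value in pairs:
        if condition:
            return value
    return default


def salary_decode(sales, salary1, salary2, salary3, bonus):
    """Python rendering of the DECODE form (no default supplied)."""
    return decode_true([
        (0 < sales < 50, salary1),
        (49 < sales < 100, salary2),
        (99 < sales < 200, salary3),
        (sales > 199, bonus),
    ])


if __name__ == "__main__":
    for sales in (10, 75, 150, 500):
        print(sales, salary_iif(sales, "S1", "S2", "S3", "B"),
              salary_decode(sales, "S1", "S2", "S3", "B"))
```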
In dimensional modeling, is the fact table normalized or denormalized, in the case of a star
schema and in the case of a snowflake schema?
There is no concept of normalization in the case of a star schema, but in the case of a snowflake schema
the dimension tables must be normalized.
Star schema -- denormalized dimensions
Snowflake schema -- normalized dimensions
Which is better, a connected or an unconnected lookup transformation, in
Informatica or any other ETL tool?
Comparing the two, a connected lookup can return multiple values, while an unconnected lookup returns
one value. A connected lookup sits in the same pipeline as the source and supports dynamic caching. An
unconnected lookup does not have that facility, but in some special cases it is the better choice. If the
output of one lookup goes as input to another lookup, unconnected lookups are favorable.
I think the better one is the connected lookup, because we can use a dynamic cache with it. A connected
lookup can also return multiple columns in a single row, whereas an unconnected lookup has a single
return port (as far as Informatica is concerned).
What is the limit on the number of sources and targets you can have in a mapping?
As far as I know, there is no restriction on the number of sources or targets inside a
mapping.
The real question is: if you make N tables participate in processing at the same time, what happens to
your database? From an organization's point of view, it is never encouraged to use N tables at a time,
because it reduces database and Informatica server performance.
The restriction is only on the database side: how many concurrent threads are you allowed to run on the
database server?
Which objects are required by the debugger to create a valid debug session?
To begin with, the session should be a valid session.
The source, target, lookups, and expressions should be available, and at least one breakpoint should be
set for the debugger to debug your session.
An Informatica server object is a must.
What is the query to list the three highest salaries among the employees?
SELECT sal
FROM (SELECT sal FROM my_table ORDER BY sal DESC)
WHERE ROWNUM < 4;
Since this is Informatica, you might as well use the Rank transformation. Check the help file on how to
use it.
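For comparison, the same "top three" selection can be sketched in Python with the standard heapq module. The function name and the sample salaries are made up; like the query above, this sketch does not remove ties, so equal salaries can each take a slot.

```python
import heapq


def top_salaries(salaries, n=3):
    """Return the n highest salaries in descending order (ties kept)."""
    return heapq.nlargest(n, salaries)


if __name__ == "__main__":
    print(top_salaries([1000, 3000, 2000, 3000, 500]))
```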
We are using an Update Strategy transformation in a mapping. How can we know whether the
insert, update, reject, or delete option was selected while the session was
running in Informatica?
In the Designer, while creating the Update Strategy transformation, uncheck "Forward Rejected Rows".
Any rejected rows are then written automatically to the session log file.
Updated or inserted rows can only be identified by checking the target file or table.
Suppose session is configured with commit interval of 10,000 rows and source has
50,000 rows. Explain the commit points for Source based commit and Target based
commit. Assume appropriate value wherever required.
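Source-based commit commits data to the target based on the number of source rows, so with a commit
interval of 10,000 it commits after every 10,000 source rows: at 10,000, 20,000, 30,000, 40,000, and 50,000.
Target-based commit commits data based on the rows written to the target together with the writer
buffer size: the server commits when a buffer block fills at or after the point where the commit interval
is reached, so commits can fall slightly past the interval. Let us assume that the buffer holds 6,000 rows.
The first commit then happens at 12,000 rows, when the second buffer fills, rather than at exactly 10,000.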
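The source-based case is simple arithmetic and can be sketched in Python. This sketch covers source-based commit only; the function name is made up, and the final commit at end of data for a partial last interval is an assumption of the sketch.

```python
def source_based_commit_points(total_rows, interval):
    """Return the source row counts at which a source-based commit is issued."""
    if total_rows <= 0:
        return []
    points = list(range(interval, total_rows + 1, interval))
    if not points or points[-1] != total_rows:
        # Assumed: remaining rows are committed when the source is exhausted.
        points.append(total_rows)
    return points


if __name__ == "__main__":
    print(source_based_commit_points(50000, 10000))
```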
How do we estimate the number of partitions that a mapping really requires? Is it
dependent on the machine configuration?
The upper limit depends on the Informatica version we are using: Informatica 6 supports only 32
partitions, whereas Informatica 7 supports 64 partitions.
Can Informatica be used as a Cleansing Tool? If yes give example of transformations
that can implement a data cleansing routine.
Yes, we can use Informatica for cleansing data. Sometimes we use staging to cleanse the data,
depending on performance; otherwise we can use an Expression transformation to cleanse data.
For example, suppose field X has some values and some nulls and is assigned to a target field that is a
NOT NULL column. Inside an expression, we can assign a space or some constant value to avoid
session failure.
If the input data is in one format and the target is in another, we can change the format in an expression.
We can also assign default values in the target to represent a complete set of data.
How do you decide whether you need it do aggregations at database level or at
Informatica level?
It depends on the requirement. If you have a database with good processing power, you can create an
aggregation table or view at the database level; otherwise it is better to use Informatica. Here is why
we may need to use Informatica.
Informatica is a third-party tool, so it takes more time to process an aggregation than the database
does. However, Informatica has an option called "incremental aggregation", which updates the current
values with current values + new values. There is no need to process all the values again and again,
provided nobody has deleted the cache files. If the cache files are deleted, the total aggregation has to
be executed again in Informatica as well.
The database does not have an incremental aggregation facility.
Identifying bottlenecks in various components of Informatica and resolving them.
The best way to find bottlenecks is to write to a flat file target and see where the bottleneck is.
How do you join two tables without using the Joiner transformation?
SOLN1: It is possible to join two or more tables by using a Source Qualifier, provided the tables
have a relationship.
When you drag and drop the tables, you get a Source Qualifier for each table. Delete all the
Source Qualifiers and add a common Source Qualifier for all of them. Right-click the Source Qualifier and
click Edit. On the Properties tab you will find the SQL query, where you can write your SQL.
SOLN2: The Joiner transformation is used to join n (n>1) tables from the same or different databases, but
the Source Qualifier transformation can only join n tables from the same database.
SOLN3: Use the Source Qualifier transformation to join tables on the SAME database. Under its Properties
tab, you can specify the user-defined join. Any select statement you can run on a database, you can
also run in the Source Qualifier.
Note: you can only join 2 tables with one Joiner transformation, but those two tables can be from
different databases.
In a filter expression we want to compare one date field with a db2 system field
CURRENT DATE.
Our Syntax: datefield = CURRENT DATE (we didn't define it by ports, its a system
field ), but this is not valid (PMParser: Missing Operator)..
Can someone help us.
The DB2 date format is "yyyymmdd", whereas sysdate in Oracle gives "dd-mm-yy", so converting the
DB2 date format to the local database date format is compulsory. Otherwise you will get that type of error.
Use SYSDATE, or use TO_DATE for the current date.
What do the Expression and Filter transformations do in the Informatica Slowly Growing
Target wizard?
The Expression transformation detects and flags the rows from the source.
The Filter transformation filters out the rows that are not flagged and passes the flagged rows to the
Update Strategy transformation.
How do you create the staging area in your database?
A staging area in a DW is used as a temporary space to hold all the records from the source system. So
it should be more or less an exact replica of the source systems, except for the load strategy, where we
use truncate and reload options.
So create it using the same layout as your source tables, or use the Generate SQL option in the
Warehouse Designer tab.
What is the difference between the Informatica PowerCenter Server, the Repository Server, and the
repository?
The PowerCenter Server runs the scheduled sessions and workflows that load data from source to target.
The Repository Server manages connections to the repository from client applications.
The repository contains all the definitions of the mappings created in the Designer.
What are the Differences between Informatica Power Center versions 6.2 and 7.1,
also between Versions 6.2 and 5.1?
The main difference between Informatica 5.1 and 6.1 is that 6.1 introduced the
Repository Server, and in place of the Server Manager (5.1) it introduced the Workflow Manager and
Workflow Monitor.
In version 7.x you have the option of looking up (lookup) on a flat file.
You can write to an XML target.
Versioning
LDAP authentication
Support for 64-bit architectures
Differences between Informatica 6.2 and Informatica 7.0
Features in 7.1 are:
1. Union and Custom transformations
2. Lookup on flat file
3. Grid: servers working on different operating systems can coexist on the same server grid
4. We can use pmcmdrep
5. We can export independent and dependent repository objects
6. We can move mappings into any web application
7. Version control
8. Data profiling
What is the difference between connected and unconnected stored procedures?
Run a stored procedure before or after your session: Unconnected
Run a stored procedure once during your mapping, such as pre- or post-session: Unconnected
Run a stored procedure every time a row passes through the Stored Procedure transformation:
Connected or Unconnected
Run a stored procedure based on data that passes through the mapping, such as when a specific port
does not contain a null value: Unconnected
Pass parameters to the stored procedure and receive a single output parameter: Connected or
Unconnected
Pass parameters to the stored procedure and receive multiple output parameters: Connected or
Unconnected
Note: To get multiple output parameters from an unconnected Stored Procedure transformation, you
must create variables for each output parameter. For details, see Calling a Stored Procedure From an
Expression.
Run nested stored procedures: Unconnected
Call multiple times within a mapping: Unconnected
Discuss which is better among incremental load, normal load and bulk load.
If the database supports the bulk load option from Informatica, then using bulk load for the initial loading
of the tables is recommended.
Depending upon the requirement, we should choose between normal and incremental loading strategies.
If supported by the database, bulk load can load faster than normal load. (Incremental load is a
different concept; do not confuse it with bulk load and normal load.)
Compare the Data Warehousing Top-Down approach with the Bottom-Up approach.
In the top-down approach, we first build the data warehouse and then build the data marts. This
needs more cross-functional skills and is a time-consuming and costly process.
In the bottom-up approach, we first build the data marts and then the data warehouse. The data mart that
is built first serves as a proof of concept for the others. It takes less time and costs less than the
top-down approach.
What is the difference between a summary filter and a detail filter?
A summary filter is applied to a group of rows that contain a common value, whereas a detail filter is
applied to each and every record of the database.
What is the difference between a view and a materialized view?
Materialized views are schema objects that can be used to summarize, precompute, replicate, and distribute data,
e.g. to construct a data warehouse.
A materialized view provides indirect access to table data by storing the results of a query in a separate schema
object. Unlike an ordinary view, which does not take up any storage space or contain any data, a materialized
view contains the rows resulting from its query.
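The difference between a view and a materialized view can be sketched with Python's built-in sqlite3 module. This is only an illustration under assumptions: SQLite has no materialized views, so a table created from the query result stands in for one, and the table and column names (sales, v_total, mv_total) are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount INTEGER)")
conn.executemany("INSERT INTO sales VALUES (?, ?)", [("east", 10), ("west", 20)])

# Ordinary view: only the query text is stored, no rows.
conn.execute("CREATE VIEW v_total AS SELECT SUM(amount) AS total FROM sales")

# Stand-in for a materialized view (assumption: SQLite cannot create a real one):
# the query result is stored as rows in a separate object.
conn.execute("CREATE TABLE mv_total AS SELECT SUM(amount) AS total FROM sales")

# New data arrives after both objects were created.
conn.execute("INSERT INTO sales VALUES ('north', 5)")

# The view re-runs its query, the stored result stays as it was until refreshed.
view_total = conn.execute("SELECT total FROM v_total").fetchone()[0]
snapshot_total = conn.execute("SELECT total FROM mv_total").fetchone()[0]
```

The view sees the new row because it is evaluated at query time, while the stored result has to be refreshed explicitly to pick it up.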
Can we modify the data in a flat file?
Just open the text file with Notepad and change whatever you want (but the datatype should stay the same).
How do you get the first 100 rows from the flat file into the target?
SOLN1: task -----> (link) session (Workflow Manager)
Double-click the link and type $$source success rows (parameter in session variables) = 100.
It should automatically stop the session.
SOLN2: 1. Use the test load option if you want to use it for testing.
2. Put a counter/Sequence Generator in the mapping and use it to limit the rows.
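Outside Informatica, the counter idea in SOLN2 amounts to stopping after the first N records. A minimal Python sketch, assuming a comma-delimited file; the function name first_n_rows and the sample data are hypothetical:

```python
import csv
import io
from itertools import islice

def first_n_rows(lines, n=100):
    """Return at most the first n records from an iterable of CSV lines."""
    reader = csv.reader(lines)
    # islice stops reading as soon as n records have been produced.
    return list(islice(reader, n))

# Hypothetical flat file with 250 records, held in memory for the example.
sample = io.StringIO("".join(f"{i},name{i}\n" for i in range(250)))
rows = first_n_rows(sample, 100)
```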
Can we look up a table from a Source Qualifier transformation (unconnected lookup)?
No, we can't.
I will explain why.
1) Unless you assign the output of the Source Qualifier to another transformation or to a target, there is
no way it will include the field in the query.
2) The Source Qualifier doesn't have any variable fields to use in an expression.
what is a junk dimension
A "junk" dimension is a collection of random transactional codes, flags and/or text attributes that are
unrelated to any particular dimension. The junk dimension is simply a structure that provides a
convenient place to store the junk attributes. A good example would be a trade fact in a company that
brokers equity trades.
What is the difference between Normal load and Bul...
Normal load: Normal load writes information to the database log file, so that it is helpful if any recovery
is needed. When the source file is a text file and we are loading data to a table, we should use normal
load only; otherwise the session will fail.
Bulk mode: Bulk load does not write information to the database log file, so if any recovery is needed we
can't do anything. Comparatively, bulk load is much faster than normal load.
At most, how many transformations can be used in a mapping?
There is no limit on the number of transformations. But from a performance point of view, using
too many transformations will reduce session performance.
My idea is: "if a mapping needs many transformations, it is better to go for a stored
procedure."
What are the main advantages and purpose of using the Normalizer transformation in Informatica?
The Normalizer transformation is used mainly with COBOL sources, where most of the time data is stored in
de-normalized format. Also, the Normalizer transformation can be used to create multiple rows from a single
row of data.
How do you convert rows to columns in the Normalizer? Could you explain?
Normally, it is used to convert columns to rows. For converting rows to columns, we need an Aggregator
and an Expression, and a little coding effort. Denormalization is not possible with a Normalizer
transformation.
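As a rough illustration of both directions, the Python sketch below turns one wide row into several rows (what a Normalizer does) and then pivots them back by grouping on the key (the Aggregator-style step). The field names (store, q1..q4, occurrence, sales) are assumptions made up for the example, not Informatica port names.

```python
def normalize(row, repeating_fields, key_field="store"):
    """Columns to rows: emit one output row per repeating column."""
    out = []
    for occurrence, field in enumerate(repeating_fields, start=1):
        out.append({key_field: row[key_field],
                    "occurrence": occurrence,
                    "sales": row[field]})
    return out

def pivot_back(rows, key_field="store"):
    """Rows to columns: group by the key and spread each occurrence into its own column."""
    grouped = {}
    for r in rows:
        wide = grouped.setdefault(r[key_field], {key_field: r[key_field]})
        wide[f"q{r['occurrence']}"] = r["sales"]
    return list(grouped.values())

wide_row = {"store": "S1", "q1": 100, "q2": 200, "q3": 150, "q4": 250}
tall_rows = normalize(wide_row, ["q1", "q2", "q3", "q4"])
```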
Discuss the advantages and disadvantages of star and snowflake schemas.
In a star schema every dimension will have a primary key.
In a star schema, a dimension table will not have any parent table.
Whereas in a snowflake schema, a dimension table will have one or more parent tables.
Hierarchies for the dimensions are stored in the dimension table itself in a star schema.
Whereas hierarchies are broken into separate tables in a snowflake schema. These hierarchies help to drill
down the data from the topmost level to the lowermost level.
A star schema consists of a single fact table surrounded by some dimension tables. In a snowflake schema the
dimension tables are connected with some sub-dimension tables.
In a star schema the dimension tables are denormalized; in a snowflake schema the dimension tables are normalized.
A star schema is used for report generation; a snowflake schema is used for cubes.
The advantage of the snowflake schema is that the normalized tables are easier to maintain. It also saves
storage space.
The disadvantage of the snowflake schema is that it reduces the effectiveness of navigation across the tables
due to the large number of joins between them.
What is a time dimension? Give an example.
The time dimension is one of the most important dimensions in a data warehouse. Whenever you generate a
report, you access the data through the time dimension.
Fields: date key, full date, day of week, day, month, quarter, fiscal year
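A small Python sketch of how such rows could be generated, one per calendar day. The column names follow the field list above; the YYYYMMDD date key format and treating the fiscal year as equal to the calendar year are assumptions for the example only.

```python
from datetime import date, timedelta

DAY_NAMES = ("Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday")

def build_time_dimension(start, end):
    """Build one time-dimension row per day from start to end (inclusive)."""
    rows = []
    current = start
    while current <= end:
        rows.append({
            "date_key": int(current.strftime("%Y%m%d")),  # assumed key format
            "full_date": current.isoformat(),
            "day_of_week": DAY_NAMES[current.weekday()],
            "day": current.day,
            "month": current.month,
            "quarter": (current.month - 1) // 3 + 1,
            "fiscal_year": current.year,  # assumption: fiscal year = calendar year
        })
        current += timedelta(days=1)
    return rows
```

A fact row then only needs to carry the date key, and reports can group by month, quarter or fiscal year through the dimension.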
What are connected and unconnected transformations?
A connected transformation is part of your data flow in the pipeline, while an unconnected transformation is
not.
It is much like calling a program by name and by reference.
Use unconnected transformations when you want to call the same transformation many times in a single mapping.
An unconnected transformation can't be connected to another transformation, but it can be called inside
another transformation.
Unconnected transformations are not directly connected and can be used in as many other transformations as
needed. If you are using a transformation several times, use an unconnected one. You get better performance.
How can you create or import a flat file definition in the Warehouse Designer?
You can create a flat file definition in the Warehouse Designer. In the Warehouse Designer, create a new target
and select the type as flat file. Save it, and then enter the columns for that target by editing its
properties. Once the target is created, save it. You can import it from the Mapping Designer.
You cannot create or import a flat file definition in the Warehouse Designer directly. Instead, you must analyze the
file in the Source Analyzer, then drag it into the Warehouse Designer. When you drag the flat file source definition
into the Warehouse Designer workspace, the Warehouse Designer creates a relational target definition, not a file
definition. If you want to load to a file, configure the session to write to a flat file. When the Informatica server
runs the session, it creates and loads the flat file.
What are the tasks that the Load Manager process will do?
Manages the session and batch scheduling: When you start the Informatica server, the Load Manager
launches and queries the repository for a list of sessions configured to run on the Informatica
server. When you configure a session, the Load Manager maintains a list of sessions and session start
times. When you start a session, the Load Manager fetches the session information from the repository to perform
the validations and verifications prior to starting the DTM process.
Locking and reading the session: When the Informatica server starts a session, the Load Manager locks the
session in the repository. Locking prevents you from starting the session again and again.
Reading the parameter file: If the session uses a parameter file, the Load Manager reads the parameter file
and verifies that the session-level parameters are declared in the file.
Verifies permissions and privileges: When the session starts, the Load Manager checks whether or not the user
has privileges to run the session.
Creating log files: The Load Manager creates a log file that contains the status of the session.
How do you transfer the data from the data warehouse to a flat file?
You can write a mapping with the flat file as a target using a DUMMY_CONNECTION. A flat file target is
built by pulling a source into target space using Warehouse Designer tool.
Difference between the Informatica repository server and the Informatica server
Informatica Repository Server: It manages connections to the repository from client applications.
Informatica Server: It extracts the source data, performs the data transformation, and loads the
transformed data into the target.
Router transformation
A Router transformation is similar to a Filter transformation because both transformations allow you to
use a condition to test data. A Filter transformation tests data for one condition and drops the rows of
data that do not meet the condition. However, a Router transformation tests data for one or more
conditions and gives you the option to route rows of data that do not meet any of the conditions to a
default output group.
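The contrast can be sketched in a few lines of Python. This is an illustration only: the group names and row fields are hypothetical, and the sketch assumes that a row meeting several group conditions is passed to each matching group.

```python
def filter_rows(rows, condition):
    """Filter: one condition; rows that fail it are dropped."""
    return [r for r in rows if condition(r)]

def route_rows(rows, groups):
    """Router: several named conditions plus a default group for unmatched rows."""
    routed = {name: [] for name in groups}
    routed["DEFAULT"] = []
    for r in rows:
        matched = False
        for name, condition in groups.items():
            if condition(r):
                routed[name].append(r)
                matched = True
        if not matched:
            # Rows meeting none of the conditions go to the default group.
            routed["DEFAULT"].append(r)
    return routed

rows = [
    {"region": "east", "amount": 50},
    {"region": "west", "amount": 500},
    {"region": "north", "amount": 5},
]
groups = {
    "EAST": lambda r: r["region"] == "east",
    "BIG": lambda r: r["amount"] > 100,
}
routed = route_rows(rows, groups)
```

With a Filter, the "north" row would simply be lost; with a Router it is still available in the default group.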
What are the 2 modes of data movement in Informatica Server?
The data movement mode depends on whether Informatica Server should process single-byte or multi-byte
character data. This mode selection can affect the enforcement of code page relationships and code page
validation in the Informatica Client and Server.
a) Unicode - IS allows 2 bytes for each character and uses an additional byte for each non-ASCII
character (such as Japanese characters)
b) ASCII - IS holds all data in a single byte
The IS data movement mode can be changed in the Informatica Server configuration parameters. This
comes into effect once you restart the Informatica Server.
How do you read rejected or bad data from the bad file and reload it to the target?
Correct the rejected data and send it to the target relational tables using the loadorder utility. Find the
rejected data by using the column indicator and row indicator.
Explain the informatica Architecture in detail
The Informatica server connects to source data and target data using native or ODBC drivers.
It also connects to the repository for running sessions and retrieving metadata information.
source ------> informatica server ------> target
                      |
                      |
                 REPOSITORY
[Diagram: the repository is reached through the repository server and its administration console; the
Designer, Workflow Manager and Workflow Monitor client tools connect to it, and the Informatica server
sits between source and target.]
how can we partition a session in Informatica?
When the PowerCenter Server runs a session, the DTM performs the following tasks:
1. Fetches session and mapping metadata from the repository.
2. Creates and expands session variables.
3. Creates the session log file.
4. Validates session code pages if data code page validation is enabled. Checks query
conversions if data code page validation is disabled.
5. Verifies connection object permissions.
6. Runs pre-session shell commands.
7. Runs pre-session stored procedures and SQL.
8. Creates and runs mapping, reader, writer, and transformation threads to extract,transform, and
load data.
9. Runs post-session stored procedures and SQL.
10. Runs post-session shell commands.
11. Sends post-session email.
What is data cleansing?
The process of finding and removing or correcting data that is incorrect, out-of-date, redundant,
incomplete, or formatted incorrectly.
This is nothing but polishing of data. For example, one of the subsystems may store the gender as M
and F, while another may store it as MALE and FEMALE. So we need to polish this data and clean it before it is
added to the data warehouse. Another typical example is addresses. The subsystems that maintain the
customer address can all store it differently. We might need an address cleansing tool to have the customers'
addresses in clean and neat form.
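The gender example can be sketched as a small standardization step in Python. The mapping table and the choice to return None for unrecognized values are assumptions for illustration; a real cleansing rule would follow the warehouse's own standards.

```python
# Assumed target standard: single-letter codes M and F.
GENDER_MAP = {"M": "M", "MALE": "M", "F": "F", "FEMALE": "F"}

def clean_gender(value):
    """Map the different source spellings to one standard code."""
    if value is None:
        return None
    # Unrecognized values become None so they can be reviewed separately.
    return GENDER_MAP.get(value.strip().upper())
```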
To provide support for mainframe source data, which files are used as source definitions?
COBOL copybook files
Where should you place the flat file to import the flat file definition into the Designer?
There is no restriction on where to place the source file. From a performance point of view it is better to place the
file in the server's local src folder. If you need the path, please check the server properties available in the Workflow
Manager.
This doesn't mean we should not place it in any other folder; if we place it in the server src folder, src will
be selected by default at the time of session creation.
In how many ways can you update a relational source definition, and what are they?
Two ways:
1. Edit the definition
2. Reimport the definition
Which transformation do you need when using COBOL sources as source definitions?
The Normalizer transformation, which is used to normalize the data, since
COBOL sources often consist of denormalized data.
What is a mapplet?
For example, suppose we have several fact tables that require a series of dimension keys. Then we can create
a mapplet which contains a series of Lookup transformations to find each dimension key, and use it in each
fact table mapping instead of creating the same Lookup logic in each mapping.
What is a transformation?
It is a repository object that generates, modifies or passes data. A
transformation is a repository object that passes data to the next stage (i.e. to the next transformation or target)
with or without modifying the data.
What are active and passive transformations?
An active transformation can change the number of rows that pass through it. A passive transformation does not
change the number of rows that pass through it.
Transformations can be active or passive. An active transformation can change the number of rows that
pass through it, such as a Filter transformation that removes rows that do not meet the filter condition.
A passive transformation does not change the number of rows that pass through it, such as an
Expression transformation that performs a calculation on data and passes all rows through the
transformation.
What are reusable transformations?
Reusable transformations can be used in multiple mappings. When you need to incorporate this transformation
into a mapping, you add an instance of it to the mapping. Later, if you change the definition of the
transformation, all instances of it inherit the changes. Since the instance of a reusable transformation is a
pointer to that transformation, you can change the transformation in the Transformation Developer and its
instances automatically reflect these changes. This feature can save you a great deal of work.
What are the methods for creating reusable transformations?
Two methods:
1. Design it in the Transformation Developer.
2. Promote a standard transformation from the Mapping Designer. After you add a transformation to the
mapping, you can promote it to the status of reusable transformation.
Once you promote a standard transformation to reusable status, you can demote it to a standard
transformation at any time.
If you change the properties of a reusable transformation in a mapping, you can revert it to the original reusable
transformation properties by clicking the Revert button.
What are the unsupported repository objects for a mapplet?
COBOL source definition
Joiner transformations
Normalizer transformations
Non reusable sequence generator transformations.
Pre or post session stored procedures
Target defintions
Power mart 3.5 style Look Up functions
XML source definitions
IBM MQ source definitions
• Source definitions. Definitions of database objects (tables, views, synonyms) or files that provide source data.
• Target definitions. Definitions of database objects or files that contain the target data.
• Multi-dimensional metadata. Target definitions that are configured as cubes and dimensions.
• Mappings. A set of source and target definitions along with transformations containing business logic that
you build into the transformation. These are the instructions that the Informatica Server uses to transform
and move data.
• Reusable transformations. Transformations that you can use in multiple mappings.
• Mapplets. A set of transformations that you can use in multiple mappings.
• Sessions and workflows. Sessions and workflows store information about how and when the Informatica
Server moves data. A workflow is a set of instructions that describes how and when to run tasks related to
extracting, transforming, and loading data. A session is a type of task that you can put in a workflow. Each
session corresponds to a single mapping.
What are mapping parameters and mapping variables?
A mapping parameter represents a constant value that you can define before running a
session. A mapping parameter retains the same value throughout the entire session.
When you use a mapping parameter, you declare and use the parameter in a mapping or mapplet. Then you define
the value of the parameter in a parameter file for the session.
Unlike a mapping parameter, a mapping variable represents a value that can change throughout the
session. The Informatica server saves the value of the mapping variable to the repository at the end of the session
run and uses that value the next time you run the session.
Can you use the mapping parameters or variables created in one mapping in another mapping?
No.
We can use mapping parameters or variables in any transformation of the same mapping or mapplet in
which you have created them.
Can you use the mapping parameters or variables created in one mapping in any other reusable transformation?
Yes, because a reusable transformation is not contained within any mapplet or mapping.
How can you improve session performance in the Aggregator transformation?
Normal (Default) -- only matching rows from both master and detail
Master outer -- all detail rows and only matching rows from master
Detail outer -- all master rows and only matching rows from detail
Full outer -- all rows from both master and detail ( matching or non matching)
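The four join types listed above can be sketched in Python as follows. This is a simplified illustration, not how the Joiner is implemented: rows are plain dicts, the join is an equi-join on a single key, and columns missing from the other side are simply absent rather than NULL.

```python
def join(master, detail, key, join_type="normal"):
    """Equi-join two lists of dicts using Joiner-style join type names."""
    master_keys = {m[key] for m in master}
    detail_keys = {d[key] for d in detail}
    # Matching rows are part of every join type.
    rows = [{**m, **d} for d in detail for m in master if m[key] == d[key]]
    if join_type in ("master_outer", "full_outer"):
        # Master outer keeps ALL detail rows, so add the unmatched detail rows.
        rows += [dict(d) for d in detail if d[key] not in master_keys]
    if join_type in ("detail_outer", "full_outer"):
        # Detail outer keeps ALL master rows, so add the unmatched master rows.
        rows += [dict(m) for m in master if m[key] not in detail_keys]
    return rows

master = [{"id": 1, "m": "a"}, {"id": 2, "m": "b"}]
detail = [{"id": 2, "d": "x"}, {"id": 3, "d": "y"}]
```

Note the naming: "master outer" keeps the unmatched detail rows, which is easy to get backwards.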
Follow these steps:
1. In the Mapping Designer, choose Transformation-Create. Select the Joiner
transformation. Enter a name, click OK.
The naming convention for Joiner transformations is JNR_TransformationName. Enter a
description for the transformation. This description appears in the Repository Manager, making
it easier for you or others to understand or remember what the transformation does. The
Designer creates the Joiner transformation. Keep in mind that you cannot use a Sequence
Generator or Update Strategy transformation as a source to a Joiner transformation.
2. Drag all the desired input/output ports from the first source into the Joiner
transformation.
The Designer creates input/output ports for the source fields in the Joiner as detail fields by
default. You can edit this property later.
3. Select and drag all the desired input/output ports from the second source into the Joiner
transformation.
The Designer configures the second set of source fields as master fields by default.
4. Double-click the title bar of the Joiner transformation to open the Edit Transformations
dialog box.
5. Select the Ports tab.
6. Click any box in the M column to switch the master/detail relationship for the sources.
Change the master/detail relationship if necessary by selecting the master source in the
M column.
Tip: Designating the source with fewer unique records as master increases performance during a
join.
7. Add default values for specific ports as necessary.
Certain ports are likely to contain NULL values, since the fields in one of the sources may be
empty. You can specify a default value if the target database does not handle NULLs.
8. Select the Condition tab and set the condition.
9. Click the Add button to add a condition. You can add multiple conditions. The master
and detail ports must have matching datatypes. The Joiner transformation only supports
equivalent (=) joins.
10. Select the Properties tab and enter any additional settings for the transformations.
11. Click OK.
12. Choose Repository-Save to save changes to the mapping.
What are the joiner caches?
When a Joiner transformation occurs in a session, the Informatica Server
reads all the records from the master source and builds index and data caches based on the master rows.
After building the caches, the Joiner transformation reads records from the detail source and performs
joins.
What is the Lookup transformation?
Use a Lookup transformation in your mapping to look up data
in a relational table, view or synonym.
The Informatica server queries the lookup table based on the lookup ports in the transformation. It compares
the Lookup transformation port values to lookup table column values based on the lookup condition.
Why use the Lookup transformation?
To perform the following tasks.
Get a related value. For example, if your source table includes employee ID, but you want to include the
employee name in your target table to make your summary data easier to read.
Perform a calculation. Many normalized tables include values used in a calculation, such as gross sales
per invoice or sales tax, but not the calculated value (such as net sales).
Update slowly changing dimension tables. You can use a Lookup transformation to determine whether
records already exist in the target.
What r the types of lookup?
1. Connected lookup
2. Unconnected lookup
1. Persistent cache
2. Re-cache from database
3. Static cache
4. Dynamic cache
5. Shared cache
Differences between connected and unconnected lookup?
Connected lookup: Receives input values directly from the pipeline.
Unconnected lookup: Receives input values from the result of a lookup expression in another transformation.
Connected lookup: Cache includes all lookup columns used in the mapping.
Unconnected lookup: Cache includes all lookup output ports in the lookup condition and the lookup/return port.
Connected lookup: Supports user-defined default values.
Unconnected lookup: Does not support user-defined default values.
What is meant by lookup caches?
The Informatica server builds a cache in memory when it
processes the first row of data in a cached Lookup transformation. It allocates memory for the cache
based on the amount you configure in the transformation or session properties. The Informatica server stores
condition values in the index cache and output values in the data cache.
What are the types of lookup caches?
Persistent cache: You can save the lookup cache files and reuse them the next time the
Informatica server processes a Lookup transformation configured to use the cache.
Recache from database: If the persistent cache is not synchronized with the lookup table, you can configure
the Lookup transformation to rebuild the lookup cache.
Static cache: You can configure a static or read-only cache for any lookup table. By default the Informatica
server creates a static cache. It caches the lookup table and lookup values in the cache for each row that
comes into the transformation. When the lookup condition is true, the Informatica server does not update
the cache while it processes the Lookup transformation.
Dynamic cache: If you want to cache the target table and insert new rows into the cache and the target, you can
create a Lookup transformation that uses a dynamic cache. The Informatica server dynamically inserts data into
the target table.
Shared cache: You can share the lookup cache between multiple transformations. You can share an unnamed cache
between transformations in the same mapping.
Difference between static cache and dynamic cache
Static cache: You cannot insert or update the cache.
Dynamic cache: You can insert rows into the cache as you pass them to the target.
Static cache: The Informatica server returns a value from the lookup table or cache when the condition is true.
When the condition is not true, the Informatica server returns the default value for connected transformations
and null for unconnected transformations.
Dynamic cache: The Informatica server inserts rows into the cache when the condition is false. This indicates
that the row is not in the cache or target table. You can pass these rows to the target table.
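The static versus dynamic behaviour can be sketched with two small Python classes. The class and method names are hypothetical and the cache is just a dict; the point is only that the static cache is read-only during the run, while the dynamic cache inserts a row when the lookup condition is false.

```python
class StaticLookupCache:
    """Read-only cache: built once, never changed while rows are processed."""
    def __init__(self, lookup_rows, default=None):
        self._cache = dict(lookup_rows)
        self._default = default

    def lookup(self, key):
        # Condition true: return the cached value. Condition false: return the default.
        return self._cache.get(key, self._default)

class DynamicLookupCache:
    """Cache that inserts a row when the lookup condition is false."""
    def __init__(self, lookup_rows):
        self._cache = dict(lookup_rows)

    def lookup_or_insert(self, key, value):
        # Returns (value, inserted). inserted=True means the row is new and
        # can be passed on to the target table.
        if key in self._cache:
            return self._cache[key], False
        self._cache[key] = value
        return value, True
```

With the dynamic cache, a second source row carrying the same new key is recognized as already present, which a static cache cannot do within one run.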
Which transformation should we use to normalize COBOL and relational sources?
Normalizer transformation.
When you drag the COBOL source into the Mapping Designer workspace, the Normalizer transformation
automatically appears, creating input and output ports for every column in the source.
How does the Informatica server sort string values in the Rank transformation?
When the Informatica server runs in the ASCII data movement mode, it sorts session data using binary sort order.
If you configure the session to use a binary sort order, the Informatica server calculates the binary value of
each string and returns the specified number of rows with the highest binary values for the string.
What are the rank caches?
During the session, the Informatica server compares an input row with rows in the data cache. If
the input row out-ranks a stored row, the Informatica server replaces the stored row with the input row. The
Informatica server stores group information in an index cache and row data in a data cache.
What is the Rank index in the Rank transformation?
The Designer automatically creates a RANKINDEX port for
each Rank transformation. The Informatica Server uses the Rank Index port to store the ranking position
for each record in a group. For example, if you create a Rank transformation that ranks the top 5
salespersons for each quarter, the rank index numbers the salespeople from 1 to 5.
What is the Router transformation?
A Router transformation is similar to a Filter transformation because both
transformations allow you to use a condition to test data. However, a Filter transformation tests data for
one condition and drops the rows of data that do not meet the condition. A Router transformation tests
data for one or more conditions and gives you the option to route rows of data that do not meet any of the
conditions to a default output group.
If you need to test the same input data based on multiple conditions, use a Router transformation in a
mapping instead of creating multiple Filter transformations to perform the same task.
What are the types of groups in the Router transformation?
Input group
Output group
The Designer copies property information from the input ports of the input group to create a set of output
ports for each output group.
Two types of output groups:
User-defined groups
Default group
You cannot modify or delete default groups.
Why do we use the Stored Procedure transformation?
A Stored Procedure transformation is an important tool for populating and maintaining
databases. Database administrators create stored procedures to automate time-consuming tasks
that are too complicated for standard SQL statements.
What are the types of data that pass between the Informatica server and the stored procedure?
3 types of data:
Input/output parameters
Return values
Status code
What is the status code?
The status code provides error handling for the Informatica server
during the session. The stored procedure issues a status code that notifies whether or not the stored
procedure completed successfully. This value cannot be seen by the user. It is only used by the Informatica
server to determine whether to continue running the session or stop.
What is the Source Qualifier transformation? What are the tasks that the Source Qualifier performs?
When you add a relational or a flat file source definition to a mapping, you need to connect it to a Source
Qualifier transformation. The Source Qualifier represents the rows that the Informatica Server reads
when it executes a session.
• Join data originating from the same source database. You can join two or more tables with
primary-foreign key relationships by linking the sources to one Source Qualifier.
• Filter records when the Informatica Server reads source data. If you include a filter condition, the
Informatica Server adds a WHERE clause to the default query.
• Specify an outer join rather than the default inner join. If you include a user-defined join, the
Informatica Server replaces the join information specified by the metadata in the SQL query.
• Specify sorted ports. If you specify a number for sorted ports, the Informatica Server adds an
ORDER BY clause to the default SQL query.
• Select only distinct values from the source. If you choose Select Distinct, the Informatica Server adds
a SELECT DISTINCT statement to the default SQL query.
• Create a custom query to issue a special SELECT statement for the Informatica Server to read source
data. For example, you might use a custom query to perform aggregate calculations or execute a stored
procedure.
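How these options shape the default query can be sketched as a small Python function. This is only a sketch: build_source_query and its parameters are hypothetical names, and the assumption that sorted ports means "order by the first N selected columns" is a simplification of the behaviour described above.

```python
def build_source_query(table, columns, source_filter=None,
                       sorted_ports=0, select_distinct=False):
    """Assemble a default-style SELECT from Source Qualifier-like options."""
    select = "SELECT DISTINCT" if select_distinct else "SELECT"
    sql = f"{select} {', '.join(columns)} FROM {table}"
    if source_filter:
        # A filter condition becomes a WHERE clause.
        sql += f" WHERE {source_filter}"
    if sorted_ports > 0:
        # Assumption: the first N columns are used for ORDER BY.
        sql += " ORDER BY " + ", ".join(columns[:sorted_ports])
    return sql
```

For example, build_source_query("EMP", ["EMP_ID", "NAME"], source_filter="DEPT = 10", sorted_ports=1) yields a query with both a WHERE and an ORDER BY clause.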
What is the target load order?
You specify the target load order based on the source qualifiers in a
mapping. If you have multiple source qualifiers connected to multiple targets, you can designate the order
in which the Informatica server loads data into the targets.
A target load order group is the collection of source qualifiers, transformations, and targets linked together
in a mapping.
What is the default join that the Source Qualifier provides?
Inner equi-join.
The Joiner transformation supports the following join types, which you set in the Properties tab:
• Normal (Default)
• Master Outer
• Detail Outer
• Full Outer
What are the basic needs to join two sources in a Source Qualifier?
The two sources should have a primary and foreign key relationship.
The two sources should have matching data types.
what is update strategy transformation ?
The model you choose constitutes your update strategy, how to handle changes to existing rows. In
PowerCenter and PowerMart, you set your update strategy at two different levels:
• Within a session. When you configure a session, you can instruct the Informatica Server to
either treat all rows in the same way (for example, treat all rows as inserts), or use instructions
coded into the session mapping to flag rows for different database operations.
• Within a mapping. Within a mapping, you use the Update Strategy transformation to flag rows
for insert, delete, update, or reject.
Describe the two levels at which the update strategy is set.
Within a session. When
you configure a session, you can instruct the Informatica Server to either treat all records in the same way
(for example, treat all records as inserts), or use instructions coded into the session mapping to flag
records for different database operations.
Within a mapping. Within a mapping, you use the Update Strategy transformation to flag records for
insert, delete, update, or reject.
What is the default source option for the Update Strategy transformation?
Data driven.
What is Data driven?
The Informatica server follows instructions
coded into Update Strategy transformations within the session mapping to determine how to flag records for
insert, update, delete or reject. If you do not choose the Data driven option, the Informatica server ignores
all Update Strategy transformations in the mapping.
What are the options in the target session of the Update Strategy transformation?
Insert
Delete
Update
Update as update
Update as insert
Update else insert
Truncate table
Update as Insert:
This option specifies that all the update records from the source are flagged as inserts in the target. In other
words, instead of updating the records in the target, they are inserted as new records.
Update else Insert:
This option enables Informatica to flag the records for update if they are old, or for insert if they are
new records from the source.
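The flagging and the three update options can be sketched in Python. The numeric flag values mirror the DD_INSERT, DD_UPDATE, DD_DELETE and DD_REJECT constants used in Update Strategy expressions; everything else (the function names, the "id" key, the target held as a list of dicts, rejecting rows with a missing key) is an assumption made for the example.

```python
DD_INSERT, DD_UPDATE, DD_DELETE, DD_REJECT = 0, 1, 2, 3

def flag_row(row, target_keys):
    """Data-driven style flagging: decide per row what should happen to it."""
    if row.get("id") is None:
        return DD_REJECT  # assumption: rows without a key are rejected
    return DD_UPDATE if row["id"] in target_keys else DD_INSERT

def apply_update(target_rows, row, option):
    """Apply one row flagged for update under the chosen target option."""
    existing = [t for t in target_rows if t["id"] == row["id"]]
    if option == "update_as_update":
        for t in existing:
            t.update(row)
    elif option == "update_as_insert":
        # The update is written as a new record instead of changing the old one.
        target_rows.append(dict(row))
    elif option == "update_else_insert":
        if existing:
            for t in existing:
                t.update(row)
        else:
            target_rows.append(dict(row))
    else:
        raise ValueError(f"unknown option: {option}")
```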
What are the types of mapping wizards provided in Informatica?
Simple Pass Through
Slowly Growing Target
Slowly Changing Dimension
Type 1: Most recent values
Type 2: Full history (Version, Flag, Date)
Type 3: Current and one previous
What are the types of mapping in the Getting Started Wizard?
Simple Pass Through mapping:
Loads a static fact or dimension table by inserting all rows. Use this mapping when you want to drop all
existing data from your table before loading new data.
Type 2: The Type 2 Dimension Data mapping inserts both new and changed dimensions into the target.
Changes are tracked in the target table by versioning the primary key and creating a version number for
each dimension in the table.
Use the Type 2 Dimension/Version Data mapping to update a slowly changing dimension table when you
want to keep a full history of dimension data in the table. Version numbers and versioned primary keys
track the order of changes to each dimension.
Type 3: The Type 3 Dimension mapping filters source rows based on user-defined comparisons and
inserts only those found to be new dimensions to the target. Rows containing changes to existing
dimensions are updated in the target. When updating an existing dimension, the Informatica Server saves
existing data in different columns of the same row and replaces the existing data with the updates.
What are the different types of Type 2 dimension mapping?
Type 2 Dimension/Version Data mapping: In this mapping, an updated dimension in the source is inserted
into the target with a new version number, and a newly added dimension in the source is inserted into the
target with a primary key.
Type 2 Dimension/Flag Current mapping: This mapping is also used for slowly changing dimensions. In
addition, it creates a flag value for each changed or new dimension. The flag indicates whether the
dimension is new or newly updated. The most recent version of a dimension is saved with a current flag
value of 1, and the versions it replaces are saved with the value 0.
Type 2 Dimension/Effective Date Range mapping: This is another flavour of Type 2 mapping used for slowly
changing dimensions. It also inserts both new and changed dimensions into the target, and changes are
tracked by the effective date range for each version of each dimension.
How can you recognize whether the newly added rows in the source are inserted into the target?
In the Type 2 mapping there are three options for recognizing newly added rows:
Version number
Flag value
Effective date range
What are the two types of processes that Informatica uses to run a session?
Load manager Process: Starts the session, creates the DTM process, and sends post-session email
when the session completes.
The DTM process. Creates threads to initialize the session, read, write, and transform data, and handle
pre- and post-session operations.
What are the new features of the Server Manager in Informatica 5.0?
Command line arguments: You can use command line arguments for a session or batch. This allows you to
change the values of session parameters, mapping parameters, and mapping variables.
Parallel data processing: This feature is available for PowerCenter only. If you run the Informatica
Server on an SMP system, you can use multiple CPUs to process a session concurrently.
Process session data using threads: The Informatica Server runs the session in two processes, as
explained in the previous question.
Can you generate reports in Informatica?
Informatica is an ETL tool, so you cannot build business reports with it. You can, however, generate
metadata reports, which are not intended for business analysis.
What is the Metadata Reporter?
It is a web-based application that enables you to run reports against repository metadata.
With the Metadata Reporter, you can access information about your repository without knowledge of
SQL, the transformation language, or the underlying tables in the repository.
Define mapping and session.
Mapping: A set of source and target definitions linked by transformation objects that define the rules
for transformation.
Session: A set of instructions that describe how and when to move data from sources to targets.
Which tool do you use to create and manage sessions and batches and to monitor and stop the
Informatica Server?
Informatica Server Manager.
What is polling?
Polling displays updated information about the session in the monitor window. The monitor window
displays the status of each session when you poll the Informatica Server.
While importing a relational source definition from a database, what source metadata do you import?
Source name
Database location
Column names
Datatypes
Key constraints
What are the Designer tools for creating transformations?
Mapping Designer
Transformation Developer
Mapplet Designer
How many ways can you create ports?
Two ways:
1. Drag the port from another transformation.
2. Click the Add button on the Ports tab.
Why do we partition a session in Informatica?
Partitioning improves session performance by reducing the time needed to read the source and load the
data into the target.
Performance improves because data is processed in parallel within a single session by creating multiple
partitions of the pipeline.
The Informatica Server achieves high performance by partitioning the pipeline and performing the extract,
transformation, and load for each partition in parallel.
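The idea of splitting one pipeline into partitions that are processed in parallel can be sketched as follows. This is only an analogy in plain Python, assuming a thread pool in place of the server's partition threads; the function names and the doubling "transformation" are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def process_partition(rows):
    """Extract, transform, and load one partition (here: just double each value)."""
    return [value * 2 for value in rows]


def run_partitioned(rows, partitions):
    """Split rows into contiguous ranges and process every range in its own thread."""
    size = max(1, -(-len(rows) // partitions))  # ceiling division, at least 1
    chunks = [rows[i:i + size] for i in range(0, len(rows), size)]
    with ThreadPoolExecutor(max_workers=partitions) as pool:
        results = list(pool.map(process_partition, chunks))  # keeps chunk order
    # Merge the per-partition outputs back into one result.
    return [value for chunk in results for value in chunk]


source_rows = list(range(10))
loaded = run_partitioned(source_rows, partitions=3)
print(loaded)
```

Each chunk plays the role of one partition's range of source data, and the final merge corresponds to merging the per-partition targets.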
What tasks are necessary to partition a session?
Configure the session to partition source data.
Install the Informatica Server on a machine with multiple CPUs.
How does the Informatica Server increase session performance by partitioning the source?
For relational sources, the Informatica Server creates multiple connections, one for each partition of a
single source, and extracts a separate range of data over each connection. It reads multiple partitions of
a single source concurrently. Similarly, for loading, it creates multiple connections to the target and
loads partitions of data concurrently.
For XML and file sources, the Informatica Server reads multiple files concurrently. For loading the data,
it creates a separate file for each partition (of a source file). You can choose to merge the targets.
Why do you use repository connectivity?
Each time you edit or schedule a session, the Informatica Server communicates directly with the
repository to check whether the session and users are valid. All the metadata of sessions and mappings
is stored in the repository.
What is the DTM process?
After the Load Manager performs validations for the session, it creates the DTM process. The DTM creates
and manages the threads that carry out the session tasks. It creates the master thread, which creates and
manages all the other threads.
What are the different threads in the DTM process?
Master thread: Creates and manages all other threads.
Mapping thread: One mapping thread is created for each session. It fetches session and mapping
information.
Pre- and post-session threads: Created to perform pre- and post-session operations.
Reader thread: One thread is created for each partition of a source. It reads data from the source.
Transformation thread: Created to transform data.
What are the data movement modes in Informatica?
The data movement mode determines how the Informatica Server handles character data. You choose the
data movement mode in the Informatica Server configuration settings. Two data movement modes are
available in Informatica:
ASCII mode
Unicode mode
What output files does the Informatica Server create while a session runs?
Informatica Server log: The Informatica Server (on UNIX) creates a log for all status and error messages
(default name: pm.server.log). It also creates an error log for error messages. These files are created
in the Informatica home directory.
Session log file: The Informatica Server creates a session log file for each session. It writes
information about the session into the log file, such as the initialization process, creation of SQL
commands for reader and writer threads, errors encountered, and load summary. The amount of detail in
the session log file depends on the tracing level that you set.
Session detail file: This file contains load statistics for each target in the mapping. Session details
include information such as table name and number of rows written or rejected. You can view this file by
double-clicking the session in the monitor window.
Performance detail file: This file contains information known as session performance details, which
helps you identify where performance can be improved. To generate this file, select the performance
detail option in the session property sheet.
Reject file: This file contains the rows of data that the writer does not write to targets.
Control file: The Informatica Server creates a control file and a target file when you run a session
that uses the external loader. The control file contains information about the target flat file, such as
data format and loading instructions for the external loader.
Post-session email: Post-session email allows you to automatically communicate information about a
session run to designated recipients. You can create two different messages: one if the session
completes successfully, the other if the session fails.
Indicator file: If you use a flat file as a target, you can configure the Informatica Server to create
an indicator file. For each target row, the indicator file contains a number to indicate whether the row
was marked for insert, update, delete, or reject.
Output file: If the session writes to a target file, the Informatica Server creates the target file
based on the file properties entered in the session property sheet.
Cache files: When the Informatica Server creates a memory cache, it also creates cache files. The
Informatica Server creates index and data cache files for the following transformations:
Aggregator transformation
Joiner transformation
Rank transformation
Lookup transformation
In which circumstances does the Informatica Server create reject files?
When it encounters DD_REJECT in an Update Strategy transformation.
When a row violates a database constraint.
When a field in the row was truncated or overflowed.
Can you copy a session to a different folder or repository?
Yes. By using the Copy Session wizard, you can copy a session to a different folder or repository, but
the target folder or repository must contain the mapping of that session. If it does not, you have to
copy that mapping first, before you copy the session.
In addition, you can copy the workflow from the Repository Manager. This automatically copies the
mapping, associated sources, targets, and session to the target folder.
What is a batch, and what are the types of batches?
A grouping of sessions is known as a batch. There are two types of batches:
Sequential: Runs sessions one after the other.
Concurrent: Runs sessions at the same time.
If you have sessions with source-target dependencies, you have to use a sequential batch to start the
sessions one after another. If you have several independent sessions, you can use a concurrent batch,
which runs all the sessions at the same time.
How many sessions can you create in a batch?
Any number of sessions.
When does the Informatica Server mark a batch as failed?
If one of the sessions is configured to "run if previous completes" and that previous session fails.
What command is used to run a batch?
pmcmd is used to start a batch.
What are the options used to configure sequential batches?
Two options:
Run the session only if the previous session completes successfully.
Always run the session.
In a sequential batch, can you run a session if the previous session fails?
Yes, by setting the option to always run the session.
Can you start a batch within a batch?
You cannot. If you want to start a batch that resides in a batch, create a new independent batch and
copy the necessary sessions into the new batch.
Can you start a session inside a batch individually?
You can start an individual session only in a sequential batch. In a concurrent batch you cannot.
How can you stop a batch?
By using the Server Manager or pmcmd.
What are session parameters?
Session parameters, like mapping parameters, represent values you might want to change between
sessions, such as database connections or source files.
The Server Manager also allows you to create user-defined session parameters. The following are
user-defined session parameters:
Database connections
Source file name: Use this parameter when you want to change the name or location of the session source
file between session runs.
Target file name: Use this parameter when you want to change the name or location of the session target
file between session runs.
Reject file name: Use this parameter when you want to change the name or location of the session reject
files between session runs.
What is a parameter file?
A parameter file defines the values for parameters and variables used in a session. It is a plain text
file created with a text editor such as WordPad or Notepad.
You can define the following values in a parameter file:
Mapping parameters
Mapping variables
Session parameters
For Windows command prompt users, the parameter file name cannot have beginning or trailing spaces.
If the name includes spaces, enclose the file name in double quotes:
-paramfile "$PMRootDir\my file.txt"
Note: When you write a pmcmd command that includes a parameter file located on another machine, use
the backslash (\) with the dollar sign ($). This ensures that the machine where the variable is defined
expands the server variable.
pmcmd startworkflow -uv USERNAME -pv PASSWORD -s SALES:6258 -f east -w
wSalesAvg -paramfile '\$PMRootDir/myfile.txt'
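As a rough illustration of what such a parameter file contains, the sketch below writes one with Python. The section header format [folder.WF:workflow.ST:session] and the $Src_file and $tgt_file names follow the example given earlier in this document; the session name s_sales_avg, the helper function, and the file paths are hypothetical.

```python
from pathlib import Path
import tempfile


def write_parameter_file(path, folder, workflow, session, params):
    """Write one [folder.WF:workflow.ST:session] section with name=value lines."""
    lines = [f"[{folder}.WF:{workflow}.ST:{session}]"]
    lines += [f"{name}={value}" for name, value in params.items()]
    Path(path).write_text("\n".join(lines) + "\n")
    return lines


# Hypothetical location and values, for illustration only.
param_path = Path(tempfile.gettempdir()) / "myfile.txt"
written = write_parameter_file(
    param_path,
    folder="east",
    workflow="wSalesAvg",
    session="s_sales_avg",
    params={
        "$Src_file": "/data/in/abc_source.txt",
        "$tgt_file": "/data/out/abc_targets.txt",
    },
)
print(param_path.read_text())
```

The resulting file is what the -paramfile option of pmcmd would point to.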
How can you access a remote source in your session?
Relational source: To access a relational source located in a remote place, you need to configure a
database connection to the data source.
File source: To access a remote source file, you must configure the FTP connection to the host machine
before you create the session.
Heterogeneous: When your mapping contains more than one source type, the Server Manager creates a
heterogeneous session that displays source options for all types.
What is the difference between partitioning of relational targets and partitioning of file targets?
If you partition a session with a relational target, the Informatica Server creates multiple connections
to the target database to write target data concurrently. If you partition a session with a file target,
the Informatica Server creates one target file for each partition. You can configure session properties
to merge these target files.
What are the transformations that restrict the partitioning of sessions?
Advanced External Procedure transformation and External Procedure transformation: These transformations
contain a check box on the Properties tab to allow partitioning.
Aggregator transformation: If you use sorted ports, you cannot partition the associated source.
Joiner transformation: You cannot partition the master source for a Joiner transformation.
Normalizer transformation
The performance of the Informatica Server is related to network connections. Data generally moves
across a network at less than 1 MB per second, whereas a local disk moves data five to twenty times
faster. Network connections therefore often affect session performance, so avoid them where possible.
Flat files: If your flat files are stored on a machine other than the Informatica Server, move those
files to the machine that hosts the Informatica Server.
Relational data sources: Minimize the connections between sources, targets, and the Informatica Server
to improve session performance. Moving the target database onto the server system may improve session
performance.
Staging areas: If you use staging areas, you force the Informatica Server to perform multiple data
passes. Removing staging areas may improve session performance.
You can run multiple Informatica Servers against the same repository. Distributing the session load
across multiple Informatica Servers may improve session performance.
Running the Informatica Server in ASCII data movement mode improves session performance, because ASCII
data movement mode stores a character value in one byte, whereas Unicode mode takes 2 bytes to store a
character.
If a session joins multiple source tables in one Source Qualifier, optimizing the query may improve
performance. Also, single table select statements with an ORDER BY or GROUP BY clause may benefit
from optimization such as adding indexes.
You can improve session performance by configuring the network packet size, which determines how much
data crosses the network at one time. To do this, go to the Server Manager and choose Server Configure
Database Connections.
If your target has key constraints and indexes, they slow the loading of data. To improve session
performance in this case, drop the constraints and indexes before you run the session and rebuild them
after the session completes.
Running parallel sessions by using concurrent batches also reduces the time needed to load the data, so
concurrent batches may also increase session performance.
Partitioning the session improves session performance by creating multiple connections to sources and
targets and loading data in parallel pipelines.
In some cases, if a session contains an Aggregator transformation, you can use incremental aggregation
to improve session performance.
If the session contains a Lookup transformation, you can improve session performance by enabling the
lookup cache.
If your session contains a Filter transformation, place it as close to the sources as possible, or use a
filter condition in the Source Qualifier.
Aggregator, Rank, and Joiner transformations often decrease session performance because they must group
data before processing it. To improve session performance in this case, use the sorted ports option.
What is the difference between a mapplet and a reusable transformation?
A mapplet consists of a set of transformations that is reusable. A reusable transformation is a single
transformation that can be reused.
Variables or parameters created in a mapplet cannot be used in another mapping or mapplet, whereas
variables created in a reusable transformation can be used in any other mapping or mapplet.
You cannot include source definitions in reusable transformations, but you can add sources to a mapplet.
The whole transformation logic is hidden in a mapplet, but it is transparent in a reusable
transformation.
The repository also stores administrative information such as usernames and passwords, permissions
and privileges, and product version.
Use the Repository Manager to create the repository. The Repository Manager connects to the repository
database and runs the code needed to create the repository tables. These tables store metadata in the
specific format that the Informatica Server and client tools use.
What are the types of metadata stored in the repository?
The following types of metadata are stored in the repository:
Database connections
Global objects
Mappings
Mapplets
Multidimensional metadata
Reusable transformations
Sessions and batches
Shortcuts
Source definitions
Target definitions
Transformations
• Source definitions. Definitions of database objects (tables, views, synonyms) or files that provide
source data.
• Target definitions. Definitions of database objects or files that contain the target data.
• Multi-dimensional metadata. Target definitions that are configured as cubes and dimensions.
• Mappings. A set of source and target definitions along with transformations containing business logic
that you build into the transformation. These are the instructions that the Informatica Server uses to
transform and move data.
• Reusable transformations. Transformations that you can use in multiple mappings.
• Mapplets. A set of transformations that you can use in multiple mappings.
• Sessions and workflows. Sessions and workflows store information about how and when the Informatica
Server moves data. A workflow is a set of instructions that describes how and when to run tasks related
to extracting, transforming, and loading data. A session is a type of task that you can put in a
workflow. Each session corresponds to a single mapping.
What is the PowerCenter repository?
The PowerCenter repository allows you to share metadata across repositories to create a data mart domain.
In a data mart domain, you can create a single global repository to store metadata used across an
enterprise, and a number of local repositories to share the global metadata as needed.
• Standalone repository. A repository that functions individually, unrelated and unconnected to other
repositories.
• Global repository. (PowerCenter only.) The centralized repository in a domain, a group of connected
repositories. Each domain can contain one global repository. The global repository can contain common
objects to be shared throughout the domain through global shortcuts.
• Local repository. (PowerCenter only.) A repository within a domain that is not the global repository.
Each local repository in the domain can connect to the global repository and use objects in its shared
folders.
How can you work with a remote database in Informatica? Did you work directly by using remote
connections?
To work with a remote data source, you need to connect to it with remote connections. However, it is
not preferable to work with the remote source directly over remote connections. Instead, bring the
source onto the local machine where the Informatica Server resides. If you work directly with a remote
source, session performance decreases because less data can be passed across the network in a given
time.
You can also work with a remote source by configuring FTP with the following:
Connection details
IP address
User authentication
What is incremental aggregation?
When using incremental aggregation, you apply captured changes in the source to aggregate calculations
in a session. If the source changes only incrementally and you can capture changes, you can configure
the session to process only those changes. This allows the Informatica Server to update your target
incrementally, rather than forcing it to process the entire source and recalculate the same calculations
each time you run the session.
What are the scheduling options to run a session?
You can schedule a session to run at a given time or interval, or you can run the session manually.
Run only on demand: The Informatica Server runs the session only when a user starts the session
explicitly.
Run once: The Informatica Server runs the session only once at a specified date and time.
Run every: The Informatica Server runs the session at regular intervals as configured.
Customized repeat: The Informatica Server runs the session at the dates and times specified in the
Repeat dialog box.
What is tracing level, and what are the types of tracing level?
Tracing level represents the amount of information that the Informatica Server writes to a log file.
Types of tracing level:
Normal
Terse
Verbose init
Verbose data
What is the difference between the Stored Procedure transformation and the External Procedure
transformation?
With a Stored Procedure transformation, the procedure is compiled and executed in a relational data
source. You need a database connection to import the stored procedure into your mapping. With an
External Procedure transformation, the procedure or function is executed outside the data source; that
is, you need to build it as a DLL to access it in your mapping. No database connection is needed for an
External Procedure transformation.
Explain recovering sessions.
If you stop a session or if an error causes a session to stop, refer to the session and error logs to
determine the cause of failure. Correct the errors, and then complete the session. The method you use to
complete the session depends on the properties of the mapping, session, and Informatica Server
configuration.
Use one of the following methods to complete the session:
· Run the session again if the Informatica Server has not issued a commit.
· Truncate the target tables and run the session again if the session is not recoverable.
· Consider performing recovery if the Informatica Server has issued at least one commit.
If a session fails after loading 10,000 records into the target, how can you load the records from the
10,001st record when you run the session next time?
As explained above, the Informatica Server has three methods for recovering sessions. Use Perform
Recovery to load the records from the point where the session failed.
Explain Perform Recovery.
When the Informatica Server starts a recovery session, it reads the OPB_SRVR_RECOVERY table and notes
the row ID of the last row committed to the target database. The Informatica Server then reads all
sources again and starts processing from the next row ID. For example, if the Informatica Server commits
10,000 rows before the session fails, when you run recovery, the Informatica Server bypasses the rows up
to 10,000 and starts loading with row 10,001.
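The bypass logic described above can be sketched in a few lines. This is a simplified model, not the server's actual implementation: a plain integer stands in for the row ID recorded in OPB_SRVR_RECOVERY, a list stands in for the target table, and the function name, commit interval, and simulated failure point are assumptions made for illustration.

```python
def load_rows(source_rows, target, last_committed_row=0, commit_interval=1000, fail_at=None):
    """Load rows (1-based row IDs), skipping everything already committed.

    Returns the row ID of the last committed row. `fail_at` simulates a
    failure just before that row ID is processed (hypothetical test hook).
    """
    pending = []
    for row_id, row in enumerate(source_rows, start=1):
        if row_id <= last_committed_row:
            continue  # bypass rows committed by the earlier run
        if fail_at is not None and row_id == fail_at:
            return last_committed_row  # uncommitted rows in `pending` are lost
        pending.append(row)
        if len(pending) == commit_interval:
            target.extend(pending)  # commit
            pending.clear()
            last_committed_row = row_id
    if pending:
        target.extend(pending)  # final commit at end of data
        last_committed_row += len(pending)
    return last_committed_row


source = list(range(1, 25001))
target = []

# First run commits 10,000 rows, then fails before row 12,500.
first_checkpoint = load_rows(source, target, commit_interval=10000, fail_at=12500)
# Recovery run re-reads the source but starts loading at row 10,001.
final_checkpoint = load_rows(source, target, last_committed_row=first_checkpoint,
                             commit_interval=10000)
print(first_checkpoint, final_checkpoint, len(target))
```

Note that the recovery run still reads the whole source from the start; it only skips writing the rows that were already committed.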
By default, Perform Recovery is disabled in the Informatica Server setup. You must enable Recovery in
the Informatica Server setup before you run a session so the Informatica Server can create and/or write
entries in the OPB_SRVR_RECOVERY table.
How do you recover a standalone session?
A standalone session is a session that is not nested in a batch. If a standalone session fails, you can
run recovery using a menu command or pmcmd. These options are not available for batched sessions.
If you do not clear Perform Recovery, the next time you run the session, the Informatica Server attempts
to recover the previous session.
If you do not configure a session in a sequential batch to stop on failure, and the remaining sessions in
the batch complete, recover the failed session as a standalone session.
How do you recover sessions in concurrent batches?
If multiple sessions in a concurrent batch fail, you might want to truncate all targets and run the batch
again. However, if a session in a concurrent batch fails and the rest of the sessions complete
successfully, you can recover the session as a standalone session.
To recover a session in a concurrent batch:
1. Copy the failed session using Operations-Copy Session.
2. Drag the copied session outside the batch to be a standalone session.
3. Follow the steps to recover a standalone session.
4. Delete the standalone copy.
How can you complete unrecoverable sessions?
Under certain circumstances, when a session does not complete, you need to truncate the target tables
and run the session from the beginning. Run the session from the beginning when the Informatica Server
cannot run recovery or when running recovery might result in inconsistent data.
Under what circumstances does the Informatica Server treat a session as unrecoverable?
The Source Qualifier transformation does not use sorted ports.
You change the partition information after the initial session fails.
Perform Recovery is disabled in the Informatica Server configuration.
The sources or targets change after the initial session fails.
The mapping contains a Sequence Generator or Normalizer transformation.
A concurrent batch contains multiple failed sessions.
If I modify my table in the back end, is the change reflected in the Informatica warehouse, Mapping
Designer, or Source Analyzer?
No. Informatica is not concerned with the back-end database. It displays the information that is stored
in the repository. If you want back-end changes reflected in the Informatica screens, you have to import
the definitions again from the back end over a valid connection and replace the existing definitions
with the imported ones.
After dragging the ports of three sources (SQL Server, Oracle, Informix) to a single Source Qualifier,
can you map these three ports directly to the target?
No. Unless you join those three ports in the Source Qualifier, you cannot map them directly.
If you drag three heterogeneous sources and populate the target without any join, you produce a
Cartesian product. Without a join, not only different sources but also homogeneous sources show the
same error.
If you do not want to use joins at the Source Qualifier level, you can add the joins separately.
What are Target Types on the Server?
Target Types are File, Relational, XML, and ERP.
What are Target Options on the Server?
Target Options for the File target type are FTP File, Loader, and MQ.
There are no target options for the ERP target type.
Target Options for Relational are Insert, Update (as Update), Update (as Insert), Update (else Insert),
Delete, and Truncate Table.
How do you identify existing rows of data in the target table using a Lookup transformation?
You can identify existing rows of data using an unconnected Lookup transformation.
You can also use a connected Lookup with a dynamic cache on the target.
What is the Aggregator transformation?
The Aggregator transformation is much like the GROUP BY clause in traditional SQL.
It is a connected, active transformation that takes the incoming data from the mapping pipeline, groups
it by the specified group-by ports, and calculates aggregate functions (AVG, SUM, COUNT, STDDEV, etc.)
for each of those groups.
From a performance perspective, if your mapping has an Aggregator transformation, use filters and
sorters very early in the pipeline if there is any need for them.
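The GROUP BY behaviour described above can be mimicked in a few lines of Python. This is only an analogy: the function name, the sample rows, and the chosen aggregates are made up for illustration and are not part of Informatica.

```python
from collections import defaultdict
from statistics import mean


def aggregate(rows, group_by, measure):
    """Group rows on one column and compute SUM, AVG, and COUNT per group."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[group_by]].append(row[measure])
    return {
        key: {"sum": sum(values), "avg": mean(values), "count": len(values)}
        for key, values in groups.items()
    }


sales = [
    {"region": "east", "amount": 100},
    {"region": "west", "amount": 50},
    {"region": "east", "amount": 300},
]
summary = aggregate(sales, group_by="region", measure="amount")
print(summary)
```

Three rows go in and two come out, which is why the Aggregator is classed as an active transformation: it changes the number of rows passing through the pipeline.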
What are various types of Aggregation?
Various types of aggregation are SUM, AVG, COUNT, MAX, MIN, FIRST, LAST, MEDIAN,
PERCENTILE, STDDEV, and VARIANCE.
What is Code Page Compatibility?
Compatibility between code pages is used for accurate data movement when the Informatica Server runs in
the Unicode data movement mode. If the code pages are identical, then there will not be any data loss.
One code page can be a subset or superset of another. For accurate data movement, the target code page
must be a superset of the source code page.
Superset - A code page is a superset of another code page when it contains all the characters encoded in
the other code page and also contains additional characters not contained in the other code page.
Subset - A code page is a subset of another code page when all characters in the code page are encoded
in the other code page.
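The superset and subset relationship can be checked empirically. The sketch below uses Python's built-in codecs as stand-ins for Informatica code pages, which is an assumption for illustration only; the helper names and the 0x2000 scan limit are arbitrary choices.

```python
def encodable_chars(codec, limit=0x2000):
    """Return the set of characters below `limit` that `codec` can encode."""
    chars = set()
    for code_point in range(limit):
        ch = chr(code_point)
        try:
            ch.encode(codec)
        except UnicodeEncodeError:
            continue  # this character is not in the code page
        chars.add(ch)
    return chars


def is_superset(target_codec, source_codec):
    """True if every character of the source code page exists in the target."""
    return encodable_chars(target_codec) >= encodable_chars(source_codec)


# Latin-1 contains all of ASCII plus more, so ASCII -> Latin-1 is safe.
latin1_over_ascii = is_superset("latin-1", "ascii")
# The reverse direction would lose characters such as accented letters.
ascii_over_latin1 = is_superset("ascii", "latin-1")
print(latin1_over_ascii, ascii_over_latin1)
```

In terms of the rule above, a Latin-1 target is compatible with an ASCII source, but an ASCII target is not compatible with a Latin-1 source.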
What is Code Page used for?
A code page is used to identify characters that might be in different languages. If you are importing
Japanese data into a mapping, you must select the Japanese code page for the source data.
what is a source qualifier?
It is a transformation which represents the data Informatica server reads from source.
The Source Qualifier represents the rows that the Informatica Server reads when it executes a session. It
represents all data queried from the source.
What are Dimensions and various types of Dimensions?
A dimension is a set of level properties that describe a specific aspect of a business, used for
analyzing the factual measures of one or more cubes that use that dimension. Examples: geography, time,
customer, and product.
What is the Data Transformation Manager?
After the Load Manager performs validations for the session, it creates the DTM process. The DTM process
is the second process associated with the session run. The primary purpose of the DTM process is to
create and manage threads that carry out the session tasks.
· The DTM allocates process memory for the session and divides it into buffers. This is also known as
buffer memory. It creates the main thread, which is called the master thread. The master thread creates
and manages all other threads.
· If we partition a session, the DTM creates a set of threads for each partition to allow concurrent
processing. When the Informatica Server writes messages to the session log, it includes the thread type
and thread ID.
Following are the types of threads that the DTM creates:
Master thread - Main thread of the DTM process. Creates and manages all other threads.
Mapping thread - One thread for each session. Fetches session and mapping information.
Pre- and post-session thread - One thread each to perform pre- and post-session operations.
Reader thread - One thread for each partition for each source pipeline.
Writer thread - One thread for each partition, if a target exists in the source pipeline, to write to
the target.
Transformation thread - One or more transformation threads for each partition.
What are sessions and batches?
Session - A session is a set of instructions that tells the Informatica Server how and when to move data
from sources to targets. After creating the session, we can use either the Server Manager or the command
line program pmcmd to start or stop the session.
Batches - A batch provides a way to group sessions for either serial or parallel execution by the
Informatica Server. There are two types of batches:
Sequential - Runs sessions one after the other.
Concurrent - Runs sessions at the same time.
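The two batch types, and the two sequential options mentioned earlier (run only if the previous session succeeds, or always run), can be sketched as follows. This is a plain Python analogy: each "session" is a hypothetical zero-argument function returning True on success, and a thread pool stands in for concurrent execution.

```python
from concurrent.futures import ThreadPoolExecutor


def run_sequential(sessions, always_run=False):
    """Run sessions one after another.

    With always_run=False the batch stops at the first failed session,
    because later sessions run only if the previous one succeeded.
    """
    results = []
    for session in sessions:
        ok = session()
        results.append(ok)
        if not ok and not always_run:
            break
    return results


def run_concurrent(sessions):
    """Start all sessions at the same time and collect their results in order."""
    with ThreadPoolExecutor(max_workers=len(sessions)) as pool:
        futures = [pool.submit(session) for session in sessions]
        return [future.result() for future in futures]


def load_dimensions():  # hypothetical session that succeeds
    return True


def load_facts():  # hypothetical session that fails
    return False


batch = [load_dimensions, load_facts, load_dimensions]
stop_on_failure = run_sequential(batch)
keep_going = run_sequential(batch, always_run=True)
all_at_once = run_concurrent(batch)
print(stop_on_failure, keep_going, all_at_once)
```

The sequential form suits sessions with source-target dependencies; the concurrent form suits independent sessions.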
Why do we use Lookup transformations?
Lookup transformations can access data from relational tables that are not sources in the mapping. With
a Lookup transformation, we can accomplish the following tasks:
Get a related value - Get the employee name from the Employee table based on the employee ID.
Perform a calculation.
Update slowly changing dimension tables - We can use an unconnected Lookup transformation to determine
whether the records already exist in the target or not.
Informatica allows end users and partners to extend the metadata stored in the repository by associating
information with individual objects in the repository. For example, when you create a mapping, you can
store your contact information with the mapping. You associate information with repository metadata
using metadata extensions.
Informatica Client applications can contain the following types of metadata extensions:
• Vendor-defined. Third-party application vendors create vendor-defined metadata extensions.
You can view and change the values of vendor-defined metadata extensions, but you cannot
create, delete, or redefine them.
• User-defined. You create user-defined metadata extensions using PowerCenter/PowerMart.
You can create, edit, delete, and view user-defined metadata extensions. You can also change
the values of user-defined extensions.
What is ODS (operational data store)?
ANS1: ODS - Operational Data Store.
The ODS comes between the staging area and the data warehouse. The data in the ODS is at a low level of
granularity.
Once data is populated in the ODS, aggregated data is loaded into the EDW through the ODS.
ANS2: An updatable set of integrated operational data used for enterprise-wide tactical decision
making. It contains live data, not snapshots, and retains minimal history.
Can we look up a table from a Source Qualifier transformation, i.e. as an unconnected lookup?
You cannot lookup from a source qualifier directly. However, you can override the SQL in the source
qualifier to join with the lookup table to perform the lookup.
What are the different Lookup methods used in Informatica?
There are mainly two types of Lookup transformation:
1) connected 2) unconnected
Connected lookup: 1) It receives input values directly from the pipeline.
2) It can use both dynamic and static caches.
3) It can return multiple values.
4) It supports user-defined default values.
Unconnected lookup: 1) It receives input values from a :LKP expression in another transformation.
2) It can use only a static cache.
3) It returns only a single value.
4) It does not support user-defined default values.
What are parameter files? Where do we use them?
A parameter file is a text file in which you define values for the parameters and variables used in an
Informatica session. The parameter file is referenced in the session properties. When the session
runs, the values for the parameters are fetched from the specified file. For example, $$ABC is defined
in the Informatica mapping, and its value is defined in a file called abc.txt as
[folder_name.session_name]
$$ABC=hello world
In the session properties, enter abc.txt in the parameter file name field.
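To make the layout of such a file concrete, here is a minimal Python sketch that parses parameter-file text into sections. The helper name parse_param_file, the header form [folder.session] and the sample folder and session names are assumptions for illustration only; this is not how the Informatica server itself reads the file.

```python
def parse_param_file(text):
    """Return {section_header: {parameter_name: value}} for parameter-file text."""
    sections = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("[") and line.endswith("]"):
            # A bracketed header starts a new section (hypothetical names below).
            current = sections.setdefault(line[1:-1], {})
        elif "=" in line and current is not None:
            # Split on the first "=" only, so values may contain "=" themselves.
            name, value = line.split("=", 1)
            current[name.strip()] = value.strip()
    return sections


sample = """
[my_folder.s_my_session]
$$ABC=hello world
"""
params = parse_param_file(sample)
print(params["my_folder.s_my_session"]["$$ABC"])
```

Each header scopes the parameters that follow it, which is why one file can serve several sessions.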
What is a mapping, session, worklet, workflow, mapplet?
Mapping - represents the flow and transformation of data from source to target.
Mapplet - a group of transformations that can be called within a mapping.
Session - a task associated with a mapping to define the connections and other configurations for that
mapping.
Workflow - controls the execution of tasks such as commands, emails and sessions.
Worklet - a workflow that can be called within a workflow.
High-end warehouses
Global as well as local repositories
ERP support.
Can Informatica load heterogeneous targets from heterogeneous sources?
Yes. Informatica can read from heterogeneous sources and load heterogeneous targets.
What are the various tools? - Name a few
The various ETL tools are as follows.
Informatica
Datastage
Business Objects Data Integrator
Ab Initio
Cognos
Business Objects
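What are snapshots? What are materialized views & where do we use them? What is
a materialized view log?
A materialized view is a view whose data is also stored physically in a table. With an ordinary view,
the database stores only the query, and the data is extracted from the base tables each time the view
is called. With a materialized view, the result of the query is stored and is refreshed from the base
tables. Snapshot is the older Oracle name for a materialized view. A materialized view log records the
changes made to the master table so that the materialized view can be fast refreshed.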
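The difference can be sketched in Python with the built-in sqlite3 module. SQLite has no materialized views, so the table mv_emp below is only a stand-in for one, and the table and column names are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO emp VALUES (1, 'Ann')")

# Ordinary view: only the query is stored.
conn.execute("CREATE VIEW v_emp AS SELECT * FROM emp")
# Stand-in for a materialized view: the query result is stored as rows.
conn.execute("CREATE TABLE mv_emp AS SELECT * FROM emp")

# A new row arrives in the base table after both objects were created.
conn.execute("INSERT INTO emp VALUES (2, 'Bob')")

view_count = conn.execute("SELECT COUNT(*) FROM v_emp").fetchone()[0]
stored_count = conn.execute("SELECT COUNT(*) FROM mv_emp").fetchone()[0]

# "Complete refresh": rebuild the stored copy from the base table.
conn.execute("DELETE FROM mv_emp")
conn.execute("INSERT INTO mv_emp SELECT * FROM emp")
refreshed_count = conn.execute("SELECT COUNT(*) FROM mv_emp").fetchone()[0]

print(view_count, stored_count, refreshed_count)
```

The view sees the new row immediately, while the stored copy only sees it after the refresh.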
What is partitioning? What are the types of partitioning?
Partitioning is a part of physical data warehouse design that is carried out to improve performance and
simplify stored-data management. Partitioning is done to break up a large table into smaller,
independently-manageable components because it:
1. reduces work involved with addition of new data.
2. reduces work involved with purging of old data.
Aggregator Transformation
Expression Transformation
Filter Transformation
Joiner Transformation
Lookup Transformation
Normalizer Transformation
Rank Transformation
Router Transformation
Sequence Generator Transformation
Stored Procedure Transformation
Sorter Transformation
Update Strategy Transformation
XML Source Qualifier Transformation
Advanced External Procedure Transformation
External Transformation
A three-tier data warehouse contains three tiers: the bottom tier, the middle tier, and the top tier.
The bottom tier deals with retrieving related data or information from various information
repositories by using SQL.
Middle tier contains two types of servers.
1. ROLAP server
2. MOLAP server
The top tier deals with presentation or visualization of the results. The 3 tiers are:
1. Data tier - bottom tier - consists of the database
2. Application tier - middle tier - consists of the analytical server
3. Presentation tier - tier that interacts with the end-user
Do we need an ETL tool? When do we go for the tools in the market?
ETL Tools are meant to extract, transform and load the data into Data Warehouse for decision making.
Before the evolution of ETL Tools, the above mentioned ETL process was done manually by using SQL
code created by programmers. This task was tedious and cumbersome in many cases since it involved
many resources, complex coding and more work hours. On top of it, maintaining the code placed a great
challenge among the programmers.
These difficulties are eliminated by ETL Tools since they are very powerful and they offer many
advantages in all stages of ETL process starting from extraction, data cleansing, data profiling,
transformation, debugging and loading into data warehouse when compared to the old method.
1. Normally, ETL stands for Extraction, Transformation, and Loading.
3. If you have a requirement like this, you need an ETL tool; otherwise you do not need one.
After creating a variable, we can use it in any expression in a mapping or a mapplet. Variables can
also be used in source qualifier filters, user-defined joins, or extract overrides, and in the
expression editor of reusable transformations.
Their values can change automatically between sessions.
What are the various methods of getting incremental records or delta records from the source
systems?
You can use a Command task to call the shell scripts, in the following ways:
1. Standalone Command task. You can use a Command task anywhere in the workflow or worklet to run
shell commands.
2. Pre- and post-session shell command. You can call a Command task as the pre- or post-session shell
command for a Session task.
Informatica Metadata contains all the information about the source tables, target tables, the
transformations, so that it will be useful and easy to perform transformations during the ETL process.
When the data in the data warehouse changes frequently we need to analyze the tables. Analyze tables
will compute/update the table statistics, that will help to boost the performance of your SQL.
There are pros and cons of both tool based ETL and hand-coded ETL. Tool based ETL provides
maintainability, ease of development and graphical view of the flow. It also reduces the learning curve on
the team.
Handcoded ETL is good when there is minimal transformational logic involved. It is also good when the
sources and targets are in the same environment. However, depending on the skill level of the team, this
can extend the overall development time.
Primary Key Materialized Views
The following statement creates the primary-key materialized view on the table emp located on a
remote database.
SQL> CREATE MATERIALIZED VIEW mv_emp_pk
     REFRESH FAST START WITH SYSDATE
     NEXT SYSDATE + 1/48
     WITH PRIMARY KEY
     AS SELECT * FROM emp@remote_db;
Materialized view created.
Note: When you create a materialized view using the FAST option, you need to create a materialized
view log on the master table(s), as shown below:
SQL> CREATE MATERIALIZED VIEW LOG ON emp;
Materialized view log created.
Rowid Materialized Views
The following statement creates the rowid materialized view on the table emp located on a remote
database:
SQL> CREATE MATERIALIZED VIEW mv_emp_rowid
     REFRESH WITH ROWID
     AS SELECT * FROM emp@remote_db;
Materialized view created.
Subquery Materialized Views
The following statement creates a subquery materialized view based on the emp and dept tables
located on the remote database:
SQL> CREATE MATERIALIZED VIEW mv_empdept
     AS SELECT * FROM emp@remote_db e
     WHERE EXISTS
       (SELECT * FROM dept@remote_db d
        WHERE e.dept_no = d.dept_no);
REFRESH CLAUSE
[refresh [fast|complete|force]
    [on demand | commit]
    [start with date] [next date]
    [with {primary key|rowid}]]
The refresh option specifies:
a. The refresh method used by Oracle to refresh data in materialized view
b. Whether the view is primary key based or row-id based
c. The time and interval at which the view is to be refreshed
Refresh Method - FAST Clause
The FAST refreshes use the materialized view logs (as seen above) to send the rows that have changed
from master tables to the materialized view. You should create a materialized view log for the master
tables if you specify the REFRESH FAST clause.
SQL> CREATE MATERIALIZED VIEW LOG ON emp;
Materialized view log created.
Materialized views are not eligible for fast refresh if the defined subquery contains an analytic
function.
Refresh Method - COMPLETE Clause
The complete refresh re-creates the entire materialized view. If you request a complete refresh,
Oracle performs a complete refresh even if a fast refresh is possible.
Refresh Method - FORCE Clause
When you specify a FORCE clause, Oracle will perform a fast refresh if one is possible or a complete
refresh otherwise. If you do not specify a refresh method (FAST, COMPLETE, or FORCE), FORCE is the
default.
PRIMARY KEY and ROWID Clause
WITH PRIMARY KEY is used to create a primary key materialized view, i.e. the materialized view is
based on the primary key of the master table instead of ROWID (for the ROWID clause). PRIMARY KEY is
the default option. To use the PRIMARY KEY clause you should have defined a PRIMARY KEY on the master
table, or else you should use ROWID based materialized views.
Primary key materialized views allow materialized view master tables to be reorganized without
affecting the eligibility of the materialized view for fast refresh. Rowid materialized views should
have a single master table and cannot contain any of the following:
• Distinct or aggregate functions
• GROUP BY clauses
• Subqueries
• Joins
• Set operations
Timing the refresh
The START WITH clause tells the database when to perform the first replication from the master table
to the local base table. It should evaluate to a future point in time. The NEXT clause specifies the
interval between refreshes.
SQL> CREATE MATERIALIZED VIEW mv_emp_pk
     REFRESH FAST START WITH SYSDATE
     NEXT SYSDATE + 2
     WITH PRIMARY KEY
     AS SELECT * FROM emp@remote_db;
Materialized view created.
In the above example, the first copy of the materialized view is made at SYSDATE and the interval at
which the refresh has to be performed is every two days.
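The NEXT expressions rely on Oracle date arithmetic, in which adding 1 to a date adds one day. The short Python sketch below works out the two intervals used in the examples, SYSDATE + 1/48 and SYSDATE + 2. The function name next_refresh and the fixed timestamp are illustrative assumptions.

```python
from datetime import datetime, timedelta


def next_refresh(now, days):
    """Mimic Oracle date arithmetic, where adding 1 to a date adds one day."""
    return now + timedelta(days=days)


now = datetime(2024, 1, 1, 12, 0, 0)  # arbitrary stand-in for SYSDATE

half_hour_later = next_refresh(now, 1 / 48)  # NEXT SYSDATE + 1/48
two_days_later = next_refresh(now, 2)        # NEXT SYSDATE + 2

print(half_hour_later - now)
print(two_days_later - now)
```

So 1/48 of a day is a 30-minute refresh interval, and 2 is a two-day interval.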
A passive transformation does not change the number of rows that pass through it, such as an
Expression transformation that performs a calculation on data and passes all rows through the
transformation.
What is tracing level and what are the types of tracing levels?
Tracing level represents the amount of information that the Informatica server writes to the session
log file. The tracing levels are Normal, Terse, Verbose Initialization, and Verbose Data.
Mapping thread: One mapping thread is created for each session. It fetches session and
mapping information.
Pre and post session threads: These are created to perform pre and post session
operations.
Reader thread: One thread is created for each partition of a source. It reads data from the
source.
If we are using Update Strategy Transformation in a mapping how can we know whether
insert or update or reject or delete option has been selected during running of sessions
in Informatica?
In the Designer, while creating the Update Strategy transformation, clear the "Forward Rejected
Rows" option. Any rejected rows are then dropped and written to the session log file
automatically.
Updates and inserts can be verified only by checking the target file or table.
When you drag and drop the tables you will be getting the source qualifier for each table. Delete
all the source qualifiers. Add a common source qualifier for all. Right click on the source qualifier
you will find EDIT, click on it. Click on the properties tab and then you will find sql query in that
you can write your sql.
Which is better among incremental load, Normal Load and Bulk load?
It depends on the requirement. Otherwise Incremental load can be better as it takes only that
data which is not available previously on the target.
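As a rough illustration of what "only the data not already in the target" means, the Python sketch below keeps just the source rows whose key is missing from the target. The function name incremental_rows and the id and amt columns are hypothetical; a real incremental load might detect changes by timestamp instead.

```python
def incremental_rows(source_rows, target_keys):
    """Return only the source rows whose key is not already in the target."""
    return [row for row in source_rows if row["id"] not in target_keys]


source = [{"id": 1, "amt": 10}, {"id": 2, "amt": 20}, {"id": 3, "amt": 30}]
target_keys = {1, 2}  # keys loaded by earlier runs

delta = incremental_rows(source, target_keys)
print(delta)
```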
What are the tasks that the Load Manager process performs?
Manages session and batch scheduling: When you start the Informatica server, the Load
Manager launches and queries the repository for a list of sessions configured to run on the
Informatica server. When you configure a session, the Load Manager maintains a list of
sessions and session start times. When you start a session, the Load Manager fetches the session
information from the repository to perform the validations and verifications prior to starting the
DTM process.
Locking and reading the session: When the Informatica server starts a session, the Load Manager
locks the session in the repository. Locking prevents the same session from being started again
while it is running.
Reading the parameter file: If the session uses a parameter file, the Load Manager reads the
parameter file and verifies that the session-level parameters are declared in the file.
Verifies permissions and privileges: When the session starts, the Load Manager checks whether or
not the user has the privileges to run the session.
Creating log files: The Load Manager creates a log file that contains the status of the session.
Target definitions: Definitions of database objects or files that contain the target data.
Multi-dimensional metadata: Target definitions that are configured as cubes and dimensions.
Mappings: A set of source and target definitions along with transformations containing business
logic that you build into the transformation. These are the instructions that the Informatica
Server uses to transform and move data.
Sessions and workflows: Sessions and workflows store information about how and when the
Informatica Server moves data. A workflow is a set of instructions that describes how and when
to run tasks related to extracting, transforming, and loading data. A session is a type of task that
you can put in a workflow. Each session corresponds to a single mapping.
What is the difference between dimension table and fact table and what are different
dimension tables and fact tables?
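A fact table contains measurable data (measures) and foreign keys that refer to the dimension tables.
A dimension table contains descriptive attributes and a primary key.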
Can you use the mapping parameters or variables created in one mapping into any other
reusable transformation?
Yes, because a reusable transformation is not contained within any mapplet or mapping.
Can you use a session Bulk loading options and during this time can you make a
recovery to the session?
If the session is configured to use in bulk mode it will not write recovery information to recovery
tables. So Bulk loading will not perform the recovery as required.
Unconnected lookup:
1) Receives input values from the result of a :LKP expression in another transformation.
2) You can use a static cache.
3) Cache includes all lookup output ports in the lookup condition and the lookup/return port.
4) Does not support user defined default values.
In a sequential batch can you run the session if previous session fails?
Yes, by setting the option "Always runs the session".
What are the basic needs to join two sources in a source qualifier?
Basic need to join two sources using source qualifier:
1) Both sources should be in same database
2) They should have at least one column in common with the same data type.
Mapping parameter values remain constant. If we need to change the parameter value, then we
need to edit the parameter file.
But value of mapping variables can be changed by using variable function. If we need to
increment the attribute value by 1 after every session run then we can use mapping variables.
In a mapping parameter we need to manually edit the attribute value in the parameter file after
every session run.
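The contrast can be sketched in Python as follows. The dictionary repository stands in for wherever variable values are saved between runs, and the names $$RunRegion and $$RunCount are invented for the example; the real server saves and restores variable values itself.

```python
def run_session(repository, parameter_file):
    """Simulate one session run and return the values it used."""
    region = parameter_file["$$RunRegion"]          # parameter: fixed for the whole run
    counter = repository.get("$$RunCount", 0) + 1   # variable: changed during the run
    repository["$$RunCount"] = counter              # saved so the next run starts from it
    return region, counter


repository = {}                            # stand-in for saved variable values
parameter_file = {"$$RunRegion": "EAST"}   # only changes if someone edits the file

results = [run_session(repository, parameter_file) for _ in range(3)]
print(results)
```

The parameter keeps the same value in every run, while the variable moves forward by one each time without anyone editing a file.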
What is the method of loading 5 flat files of having same structure to a single target and
which transformations I can use?
Two methods:
1. Put all the files in one directory, then use a file list (don't forget to set the source file
type to Indirect in the session).
2. Use a Union transformation to combine the multiple input files into a single target.
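The first method can be pictured with a small Python sketch: a list file names the data files, and every file named in it is read into one stream of rows. The helper read_file_list and the file names are made up for the example, and temporary files are created only so the sketch runs on its own.

```python
import os
import tempfile


def read_file_list(list_path):
    """Read every data file named in the list file and return all rows."""
    rows = []
    with open(list_path) as file_list:
        for name in file_list:
            name = name.strip()
            if not name:
                continue
            with open(name) as data_file:
                rows.extend(line.rstrip("\n") for line in data_file)
    return rows


with tempfile.TemporaryDirectory() as folder:
    paths = []
    for i in range(5):
        # Five flat files with the same one-column structure.
        path = os.path.join(folder, f"sales_{i}.txt")
        with open(path, "w") as f:
            f.write(f"row from file {i}\n")
        paths.append(path)
    # The list file holds one data file path per line.
    list_path = os.path.join(folder, "file_list.txt")
    with open(list_path, "w") as f:
        f.write("\n".join(paths) + "\n")
    all_rows = read_file_list(list_path)

print(len(all_rows))
```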
What is the difference between Stored Procedure (DB level) and Stored proc trans
(INFORMATICA level) ? Why should we use SP trans ?
First of all stored procedures (at DB level) are series of SQL statement. And those are stored
and compiled at the server side. In the Informatica it is a transformation that uses same stored
procedures which are stored in the database. Stored procedures are used to automate time-
consuming tasks that are too complicated for standard SQL statements. if you don't want to use
the stored procedure then you have to create expression transformation and do all the coding in
it.
Expression:
Router:
1. For duplicate records, condition: flag = 'Y'
2. For distinct records, condition: flag = 'N'
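The fragment above does not say how the flag is computed, so the following Python sketch assumes one common approach: the expression step marks a row 'Y' when its key has already been seen, and the router step splits rows on that flag. The function names and the id key are hypothetical.

```python
def flag_duplicates(rows, key):
    """Expression step: add flag 'Y' when the key was seen before, else 'N'."""
    seen = set()
    flagged = []
    for row in rows:
        value = row[key]
        flagged.append({**row, "flag": "Y" if value in seen else "N"})
        seen.add(value)
    return flagged


def route(flagged):
    """Router step: split rows into duplicate and distinct groups by flag."""
    duplicates = [r for r in flagged if r["flag"] == "Y"]
    distinct = [r for r in flagged if r["flag"] == "N"]
    return duplicates, distinct


rows = [{"id": 1}, {"id": 2}, {"id": 1}]
duplicates, distinct = route(flag_duplicates(rows, "id"))
print(duplicates, distinct)
```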
What are the real times problems that generally come up while doing/running
mapping/any transformation? Explain with an example?
Here are few real time examples of problems while running informatica mappings:
3) If we have mappings loading multiple target tables we have to provide the Target Load Plan
in the sequence we want them to get loaded.
4) Error: Snapshot too old is a very common error when using Oracle tables. We get this error
while using too large tables. Ideally we should schedule these loads when server is not very
busy (meaning when no other loads are running).
5) We might get some poor performance issues while reading from large tables. All the source
tables should be indexed and updated regularly.
In update strategy target table or flat file which gives more performance? Why?
Pros: Loading, Sorting, Merging operations will be faster as there is no index concept and Data
will be in ASCII mode.
Cons: There is no concept of updating existing records in a flat file. As there are no indexes,
lookups will be slower.
What is the difference between constraint based load ordering and target load plan?
Constraint based load ordering
Example:
Table 1---Master
Table 2---Detail
If the data in Table 2 depends on the data in Table 1, then Table 1 should be loaded first. In
such cases, to control the load order of the tables we need some conditional loading, which is
nothing but constraint based load. In Informatica this feature is implemented by just one check
box at the session level.
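The idea behind constraint based ordering, loading a parent table before the tables that reference it, can be sketched in Python with the standard graphlib module (Python 3.9 or later). The table names and the depends_on mapping are made up for illustration; the server derives the real order from the key constraints on the targets.

```python
from graphlib import TopologicalSorter

# Each table maps to the set of tables it references through foreign keys.
depends_on = {
    "master": set(),
    "detail": {"master"},
}

# static_order() yields every table after all the tables it depends on.
load_order = list(TopologicalSorter(depends_on).static_order())
print(load_order)
```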
For UNIX shell users, enclose the parameter file name in single quotes:
-paramfile '$PMRootDir/myfile.txt'
For Windows command prompt users, the parameter file name cannot have beginning or
trailing spaces. If the name includes spaces, enclose the file name in double quotes:
-paramfile "$PMRootDir\my file.txt"
Note: When you write a pmcmd command that includes a parameter file located on another
machine, use the backslash (\) with the dollar sign ($). This ensures that the machine where the
variable is defined expands the server variable.
pmcmd startworkflow -uv USERNAME -pv PASSWORD -s SALES:6258 -f east -w wSalesAvg
-paramfile '\$PMRootDir/myfile.txt'
What are the difference between joiner transformation and source qualifier
transformation?
The Joiner transformation can be used to join tables from heterogeneous (different) sources, but we
still need a common key from both tables. If we join two tables without a common key, we will
end up with a Cartesian join. The Joiner can be used to join tables from different source systems,
whereas the Source Qualifier can be used to join tables in the same database. We definitely need a
common key to join two tables, no matter whether they are in the same database or different databases.
Explain difference between static and dynamic cache with one example?
Static cache: Once the data is cached, it does not change. Example: an unconnected lookup uses
a static cache.
Dynamic cache: The cache is updated to reflect updates to the table (or source) to
which it refers (e.g. in a connected lookup).
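A small Python sketch shows why the distinction matters when the same new key arrives twice in one run. The function load_with_cache and the sample keys are hypothetical, and a set is used as a simplified stand-in for the lookup cache.

```python
def load_with_cache(source_keys, target_keys, dynamic):
    """Return the keys flagged for insert, given a lookup cache of target keys."""
    cache = set(target_keys)          # built once, before the first row is read
    inserts = []
    for key in source_keys:
        if key not in cache:
            inserts.append(key)
            if dynamic:
                cache.add(key)        # dynamic cache: reflect the pending insert
    return inserts


source = [101, 102, 102]              # key 102 arrives twice in the same run
static_inserts = load_with_cache(source, {101}, dynamic=False)
dynamic_inserts = load_with_cache(source, {101}, dynamic=True)
print(static_inserts, dynamic_inserts)
```

With the static cache the second 102 is flagged for insert again; with the dynamic cache it is recognised as already present.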
Within a session. When you configure a session, you can instruct the Informatica Server to
either treat all rows in the same way (for example, treat all rows as inserts), or use instructions
coded into the session mapping to flag rows for different database operations.
Within a mapping. Within a mapping, you use the Update Strategy transformation to flag rows
for insert, delete, update, or reject.
How does the Informatica server sort string values in the Rank transformation?
We can run the Informatica server in either Unicode data movement mode or ASCII data movement
mode.
Unicode mode: In this mode the Informatica server sorts the data according to the sort order
selected in the session.
ASCII mode: In this mode the Informatica server sorts the data in binary order.
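The effect of binary order versus a language-aware sort order can be seen in a few lines of Python. The case-insensitive key below is only an assumed stand-in for a session sort order; the actual order depends on the sort order chosen for the session.

```python
names = ["apple", "Banana", "cherry"]

# Binary order: compare character codes, so uppercase letters sort first.
binary_order = sorted(names)

# Stand-in for a session sort order: a case-insensitive comparison.
session_order = sorted(names, key=str.casefold)

print(binary_order)
print(session_order)
```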
When do you use an unconnected lookup and connected lookup?
Or
what is the difference between dynamic and static lookup?
Or
Why and when do we use dynamic and static lookup?
With a static lookup cache, all the lookup data is cached at the start of the session and the cache
does not change during the run. With a dynamic lookup cache, the cache is also updated as rows pass
through the lookup during the session. If caching is disabled, the server queries the database to get
the lookup value for each record that needs the lookup. Building the cache adds to the session run
time, but it saves time because Informatica does not need to connect to your database every time it
needs to look up a value. Depending on how many rows in your mapping need a lookup, you can decide on
this. Also remember that a lookup cache uses space, so select only those columns that are needed.
Steps:
1. First validate the mapping
2.Create session on the mapping and then run workflow.
Once the session has succeeded, right-click the session and go to the statistics tab. There you
can see how many source rows were applied, how many rows were loaded into the targets, and how
many rows were rejected. This is called quantitative testing.
Once the rows are loaded successfully, we go for qualitative testing.
Steps:
1.Take the DATM (DATM means where all business rules are mentioned to the corresponding
source columns) and check whether the data is loaded according to the DATM in to target table.
If any data is not loaded according to the DATM then go and check in the code and rectify it.
What are the output files that the informatica server creates during the session
run?
Informatica server log: Informatica server(on Unix) creates a log for all status and
error messages(default name: pm.server.log). It also creates an error log for error
messages. These files will be created in informatica home directory.
Session log file: Informatica server creates session log file for each session. It writes
information about session into log files such as initialization process, creation of sql
commands for reader and writer threads, errors encountered and load summary. The
amount of detail in session log file depends on the tracing level that you set.
Session detail file: This file contains load statistics for each target in mapping. Session
detail include information such as table name, number of rows written or rejected you
can view this file by double clicking on the session in monitor window.
Performance detail file: This file contains information known as session performance
details which helps you where performance can be improved. To generate this file
select the performance detail option in the session property sheet.
Reject file: This file contains the rows of data that the writer does not write to targets.
Control file: Informatica server creates control file and a target file when you run a
session that uses the external loader. The control file contains the information about the
target flat file such as data format and loading instructions for the external loader.
Post session email: Post session email allows you to automatically communicate
information about a session run to designated recipients. You can create two different
messages: one if the session completes successfully, the other if the session fails.
Indicator file: If you use a flat file as a target, you can configure the Informatica
server to create an indicator file. For each target row, the indicator file contains a number to
indicate whether the row was marked for insert, update, delete, or reject.
Output file: If session writes to a target file, the informatica server creates the target file
based on file properties entered in the session property sheet.
Cache files: When the informatica server creates memory cache it also creates cache
files.
For the following circumstances informatica server creates index and data cache
files:
Aggregator transformation
Joiner transformation
Rank transformation
Lookup transformation
How do you handle decimal places while importing a flat file into informatica?
While importing flat file definition just specify the scale for a numeric data type. In the mapping,
the flat file source supports only number data type(no decimal and integer). In the SQ
associated with that source will have a data type as decimal for that number port of the source.
Source - Number data type port - SQ - decimal datatype. Integer is not supported. Hence
decimal is taken care.
The goal of performance tuning is to optimize session performance so that sessions run during the
available load window for the Informatica server. Session performance can be increased in the
following ways:
The performance of the Informatica server is related to network connections. Data generally
moves across a network at less than 1 MB per second, whereas a local disk moves data five to
twenty times faster. Network connections therefore often affect session performance, so avoid
them where possible.
Flat files: If your flat files are stored on a machine other than the Informatica server, move those
files to the machine that hosts the Informatica server.
Relational data sources: Minimize the connections to sources, targets, and the Informatica server to
improve session performance. Moving the target database onto the server system may improve session
performance.
Staging areas: If you use staging areas, you force the Informatica server to perform multiple
data passes. Removing staging areas may improve session performance.
You can run multiple Informatica servers against the same repository. Distributing the
session load across multiple Informatica servers may improve session performance.
Running the Informatica server in ASCII data movement mode improves session performance,
because ASCII data movement mode stores a character value in one byte, whereas Unicode mode takes
2 bytes to store a character.
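The size difference is easy to check in Python. UTF-16 is used below only as a stand-in for a two-byte-per-character representation; the encoding the server uses internally is an assumption here.

```python
text = "SALES"

# One byte per character, as in ASCII data movement mode.
ascii_size = len(text.encode("ascii"))

# Two bytes per character for these letters (UTF-16 used as a stand-in).
two_byte_size = len(text.encode("utf-16-le"))

print(ascii_size, two_byte_size)
```

The same five characters need twice the memory and twice the bytes moved in the two-byte form.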
If a session joins multiple source tables in one Source Qualifier, optimizing the query may
improve performance. Also, single table select statements with an ORDER BY or GROUP BY
clause may benefit from optimization such as adding indexes.
We can improve session performance by configuring the network packet size, which determines how
much data crosses the network at one time. To do this, go to the Server Manager and configure the
database connections.
If your target has key constraints and indexes, they slow the loading of data. To improve
session performance in this case, drop the constraints and indexes before you run the session and
rebuild them after the session completes.
Running parallel sessions by using concurrent batches also reduces the data loading time, so
concurrent batches may also increase session performance.
Partitioning the session improves session performance by creating multiple connections to
sources and targets and loading data in parallel pipelines.
In some cases if a session contains an aggregator transformation, you can use incremental
aggregation to improve session performance.
If the session contained lookup transformation you can improve the session performance by
enabling the look up cache.
If your session contains filter transformation, create that filter transformation nearer to the
sources or you can use filter condition in source qualifier.
Aggregator, Rank, and Joiner transformations often decrease session performance
because they must group data before processing it. To improve session performance in this
case, use the sorted ports option.
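Why sorted input helps can be sketched in Python: when rows arrive already sorted by the group key, each group can be totalled and emitted as soon as the key changes, without holding all groups in memory. The function aggregate_sorted and the sample rows are hypothetical, and this only illustrates the idea, not the server's implementation.

```python
from itertools import groupby
from operator import itemgetter


def aggregate_sorted(rows):
    """Sum amounts per key, assuming rows arrive already sorted by key."""
    totals = []
    for key, group in groupby(rows, key=itemgetter(0)):
        # Each group is complete as soon as the key changes, so it can be emitted.
        totals.append((key, sum(amount for _, amount in group)))
    return totals


rows = [("east", 10), ("east", 5), ("west", 7)]  # pre-sorted by region
totals = aggregate_sorted(rows)
print(totals)
```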
Snowflake schema: A star schema to which normalization principles have been applied is known as a
snowflake schema. Every dimension table is associated with sub-dimension tables.
Differences:
• A dimension table does not have a parent table in a star schema, whereas in a snowflake
schema a dimension table has one or more parent tables.
• In a star schema the dimension table itself holds the hierarchies of the dimension, whereas
in a snowflake schema the hierarchies are split into different tables. Data can be drilled
down from the topmost hierarchy to the lowermost hierarchy.
When a view is created, the data is not stored in the database. The data is created when a
query is fired on the view. Whereas, data of a materialized view is stored.
Describe the foreign key columns in fact table and dimension table?
The primary keys of entity tables are the foreign keys of dimension tables.
The primary keys of dimension tables are the foreign keys of fact tables.
Determining data cardinality is a substantial aspect of data modeling. It is used to
determine the relationships.
Types of cardinalities:
The Link Cardinality - 0:0 relationships
The Sub-type Cardinality - 1:0 relationships
The Physical Segment Cardinality - 1:1 relationship
The Possession Cardinality - 0: M relation
The Child Cardinality - 1: M mandatory relationship
The Characteristic Cardinality - 0: M relationship
The Paradox Cardinality - 1: M relationship.