
Informatica Question and Answers

What is Rank transformation? Where can we use it?

Rank transformation is used to select the top or bottom ranks of data. For example, if we have a sales table and need to find the top 5 or 10 employees who sell the most products, we can use a Rank transformation.

Where is the cache stored in informatica?

The cache is stored on the Informatica server machine.


If you want to create indexes after the load process, which transformation would you choose? Stored Procedure transformation.
In a Joiner transformation, you should specify the source with fewer rows as the master source. Why? In a Joiner transformation, the Informatica server reads all the records from the master source and builds index and data caches based on the master table rows. After building the caches, the Joiner transformation reads records from the detail source and performs the joins.
What happens if you try to create a shortcut to a non-shared folder? It only creates a copy of it.
What is a transaction?

A transaction can be defined as a DML operation, i.e. an insertion, modification or deletion of data performed by users/analysts/applications.

Can anybody write a session parameter file which will change the sources and targets for every session, i.e. different sources and targets for each session run?
You are supposed to define a parameter file, and in the parameter file you can define two parameters, one for the source and one for the target, for example:
$Src_file = c:\program files\informatica\server\bin\abc_source.txt
$tgt_file = c:\targets\abc_targets.txt
Then define them in the parameter file under the session heading:
[folder_name.WF:workflow_name.ST:s_session_name]
$Src_file=c:\program files\informatica\server\bin\abc_source.txt
$tgt_file=c:\targets\abc_targets.txt
If it is a relational database, you can even supply an overridden SQL query at the session level as a parameter. Make sure the SQL is on a single line.
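A minimal sketch of passing such an SQL override through the parameter file (the mapping parameter name $$SourceSQL and the table/column names are hypothetical; the query must stay on one line):
[folder_name.WF:workflow_name.ST:s_session_name]
$$SourceSQL=SELECT cust_id, cust_name FROM customers WHERE status = 'A'
The mapping parameter would then be referenced in the Source Qualifier's SQL override.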
Informatica Live Interview Questions
Here are some of the interview questions I could not answer; anybody can help by giving answers for the others.
Thanks in advance.

Explain grouped cross tab?


Explain reference cursor
What are parallel queries and query hints?
What is metadata and the system catalog?
What is a factless fact schema?
What is a conformed dimension?
Which kind of index is preferred in a DWH?
Why do we use a DSS database for OLAP tools?
Conformed dimension: a dimension that is shared by two (or more) fact tables.
Factless fact table: a fact table without measures that contains only foreign keys; there are two types, event tracking tables and coverage tables.
Bitmap indexes are preferred in data warehousing.
Metadata is data about data; everything is stored there, for example mappings, sessions and privileges. In Informatica we can see the metadata in the repository.
The system catalog is used in Cognos; it also contains data, tables, privileges, predefined filters etc., and using this catalog we generate reports.
A grouped cross tab is a type of report in Cognos where we have to assign three measures to get the result.
What is meant by Junk Attribute in Informatica?

Junk dimension: a dimension is called a junk dimension if it contains attributes which are rarely changed or modified. For example, in the banking domain we can fetch four attributes belonging to a junk dimension from the Overall_Transaction_master table: tput flag, tcmp flag, del flag and advance flag. All these attributes can be part of a junk dimension.
Can anyone explain about incremental aggregation with an example?
When you use an Aggregator transformation, it creates index and data caches to store the data:
1. of the group-by columns 2. of the aggregate columns.
Incremental aggregation is used when we have historical data in place which will be used in the aggregation. Incremental aggregation uses the cache, which contains the historical data; for each group-by column value already present in the cache, it adds the incoming data value to the corresponding data cache value and outputs the row. If an incoming value has no match in the index cache, new entries for the group-by and output ports are inserted into the cache.
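A small illustration with assumed figures: suppose the cache already holds SUM(sales) = 500 for DEPT = 10 from earlier runs. If the next run brings new rows for DEPT 10 totaling 120, incremental aggregation outputs 620 for that group without re-reading the historical rows, while a row for a previously unseen DEPT 30 is simply inserted into the cache as a new group.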
Difference between Rank and Dense Rank?

Rank:
1
2<--2nd position
2<--3rd position
4
5
The same rank is assigned to equal totals/numbers, and the rank that follows is based on position (ranks are skipped after ties); golf scoring usually ranks this way.
Dense Rank:
1
2<--2nd position
2<--3rd position
3
4
The same rank is assigned to equal totals/numbers/names, and the next rank follows in sequence with no gaps.
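A hedged SQL sketch of the two functions side by side (assuming Oracle-style analytic functions; the emp table and sal column are hypothetical):
SELECT ename, sal,
       RANK()       OVER (ORDER BY sal DESC) AS rnk,
       DENSE_RANK() OVER (ORDER BY sal DESC) AS dense_rnk
FROM   emp;
For salaries 100, 90, 90, 80, RANK returns 1, 2, 2, 4 while DENSE_RANK returns 1, 2, 2, 3.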
About Informatica PowerCenter 7:
1) I want to know which mapping properties can be overridden at the Session Task level.
2) I want to know what types of permissions are needed to run and schedule workflows.
1) Which mapping properties can be overridden at the Session Task level?
You can override any properties other than the sources and targets. Make sure the source and targets exist in your database if it is a relational database. If it is a flat file, you can override its properties. You can override the SQL if it is a relational database, the session log, the DTM buffer size, cache sizes etc.

2) What types of permissions are needed to run and schedule workflows?
You need execute permission on the folder to run or schedule a workflow. You may have read and write, but you need execute permission as well.
Can anyone explain real-time complex mappings or complex transformations in Informatica, especially in the sales domain?
The most complex logic we use is denormalization. We don't have a Denormalizer transformation in Informatica, so we have to use an Aggregator followed by an Expression. Apart from this, most of the complexity sits in Expression transformations involving lots of nested IIF and DECODE statements; the Union transformation and the Joiner are other examples.
How do you create a mapping using multiple lookup transformation?
Use unconnected lookup if same lookup repeats multiple times.
In the source, if we also have duplicate records and we have 2 targets, T1- for unique
values and T2- only for duplicate values. How do we pass the unique values to T1
and duplicate values to T2 from the source to these 2 different targets in a single
mapping?
Soln1: source ---> SQ ---> expression ---> sorter (with the Select Distinct check box enabled) ---> T1
---> aggregator (with group by enabled and a count function) ---> T2
If you want only duplicates in T2 you can follow this sequence:
---> aggregator (with group by enabled, write this expression: DECODE(COUNT(col),1,1,0)) ---> filter (condition: flag = 0) ---> T2.
Soln2: Take two source instances; in the first one enable Select Distinct in the Source Qualifier and connect it to target T1, and in the second source instance just write a query to fetch the duplicate records (a sketch follows below) and connect it to target T2.
<< If you use the aggregator as suggested above, you will get duplicate as well as distinct records in the second target. >>
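A hedged sketch of the duplicate-fetch query mentioned in Soln2 (table and column names are hypothetical):
SELECT col1, col2
FROM   src_table
GROUP  BY col1, col2
HAVING COUNT(*) > 1;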
Soln3: Use a Sorter transformation and sort on the key fields by which you want to find the duplicates, then use an Expression transformation.
Example:
field1-->
field2-->
SORTER:
field1 --ascending/descending
field2 --ascending/descending
EXPRESSION:
--> field1
--> field2
<--> v_field1_curr = field1
<--> v_field2_curr = field2
v_dup_flag = IIF(v_field1_curr = v_field1_prev, TRUE, FALSE)
o_dup_flag = IIF(v_dup_flag = TRUE, 'Duplicate', 'Not Duplicate')
<--> v_field1_prev = v_field1_curr
<--> v_field2_prev = v_field2_curr
Use a Router transformation and route rows with o_dup_flag = 'Duplicate' to T2 and 'Not Duplicate' to T1.
Informatica evaluates row by row, so once the rows are sorted they arrive in order and the expression can compare the current row with the previous one.
What are the enhancements made to Informatica 7.1.1 version when compared to
6.2.2 version?
In 7.x versions:
- We can look up a flat file.
- Union and Custom transformations were added.
- There is a propagate option, i.e. if we change the data type of a field, all the linked columns will reflect that change.
- We can write to an XML target.
- We can use up to 64 partitions.
What is the difference between PowerCenter and PowerMart?

What is the procedure for creating Independent Data Marts from Informatica 7.1?
PowerCenter can have multiple repositories, whereas PowerMart has a single (desktop) repository. PowerCenter can also be linked to a global repository to share objects between users.

                        PowerCenter          PowerMart
No. of repositories     n                    n
Applicability           high-end DWH         low- and mid-range DWH
Global repository       supported            not supported
Local repository        supported            supported
ERP support             available            not available

What is lookup transformation and update strategy transformation and explain with
an example.
The Lookup transformation is used to look up data in a relational table, view, synonym or flat file.
The Informatica server queries the lookup source based on the lookup ports used in the transformation.
It compares the lookup port values to the lookup table column values based on the lookup condition.
By using a lookup we can get a related value, perform a calculation, or update a slowly changing dimension.
Two types of lookups
Connected
Unconnected
Update Strategy transformation:
This is used to control how rows are flagged for insert, update, delete or reject.
Flagging of rows can also be defined at the session level as Insert, Delete, Update or Data driven.
For Update we have three options:
Update as Update
Update as Insert
Update else Insert
What logic will you implement to load the data into one fact table from 'n' dimension tables?
To load data into one fact table from more than one dimension table: first you need to create the fact table and dimension tables, then load data into the individual dimensions using sources and transformations (Aggregator, Sequence Generator, Lookup) in the Mapping Designer, and then, in the fact table mapping, connect the surrogate keys to the foreign keys and the columns from the dimensions to the fact.
We load the data into the fact table only after loading the dimension tables, because the dimension tables contain the data related to the fact table.
Loading the data from the dimension tables to the fact table is then simple: treat the dimension tables as source tables and the fact table as the target. That's all.
Can I use the session bulk loading option and still be able to recover the session?
If the session is configured to run in bulk mode, it will not write recovery information to the recovery tables, so bulk loading cannot perform recovery as required.
No, because in bulk load the database log (redo log) is not written; in normal load the log is written. That is why bulk load increases session performance but cannot be recovered.

How do you configure a mapping in Informatica?

You should configure the mapping with the least number of transformations and expressions to do the
most amount of work possible. You should minimize the amount of data moved by deleting unnecessary
links between transformations.
For transformations that use data cache (such as Aggregator, Joiner, Rank, and Lookup
transformations), limit connected input/output or output ports. Limiting the number of connected
input/output or output ports reduces the amount of data the transformations store in the data cache.
You can also perform the following tasks to optimize the mapping:
• Configure single-pass reading.
• Optimize datatype conversions.
• Eliminate transformation errors.
• Optimize transformations.
• Optimize expressions.
What is the difference between a dimension table and a fact table, and what are the different types of dimension tables and fact tables?
A fact table contains measurable data, with fewer columns and many rows, and it contains a primary key.
Different types of fact tables: additive, semi-additive, non-additive.
A dimension table contains textual descriptions of the data, with many columns and fewer rows, and it also contains a primary key.
What is a worklet, what is the use of a worklet, and in which situations can we use it?
A worklet is a set of tasks. If a certain set of tasks has to be reused in many workflows, we use worklets. To execute a worklet, it has to be placed inside a workflow.
The use of a worklet in a workflow is similar to the use of a mapplet in a mapping.
What are mapping parameters and variables, and in which situations can we use them?
If we need to change certain attributes of a mapping after every time the session is run, it will be very
difficult to edit the mapping and then change the attribute. So we use mapping parameters and variables
and define the values in a parameter file. Then we could edit the parameter file to change the attribute
values. This makes the process simple.
Mapping parameter values remain constant. If we need to change the parameter values then we need to
edit the parameter file.
But the value of a mapping variable can be changed by using the variable functions. If we need to increment an attribute value by 1 after every session run, we can use a mapping variable; with a mapping parameter we would have to manually edit the value in the parameter file after every session run.
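A hedged sketch (the variable name $$MaxLoadedID and the port ORDER_ID are hypothetical): in an Expression transformation, SETMAXVARIABLE($$MaxLoadedID, ORDER_ID) keeps the highest ORDER_ID seen during the run; after a successful session the value is saved to the repository, so the next run can filter on ORDER_ID > $$MaxLoadedID for an incremental load.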
Explain the use of the Update Strategy transformation.

It flags rows for insert, update, delete or reject, so we can maintain history data as well as the most recent changes.
what is meant by complex mapping,

A complex mapping is one that involves more logic and more business rules. An example from my own project: in my bank project I was involved in building a data warehouse. The bank has many customers, and after taking loans they often relocate to another place; it was difficult to maintain both the previous and the current addresses, so I used SCD Type 2. This is a simple example of a complex mapping.
I have a requirement wherein the column values in a table (Table A) should appear in rows of the target table (Table B), i.e. converting columns to rows. Is it possible through Informatica? If so, how?
If the data in the tables is as follows:
Table A
key_1 char(3);
Table A values:
1
2
3

Table B
bkey_a char(3);
bcode char(1);
Table B values:
1 T
1 A
1 G
2 A
2 T
2 L
3 A

and the output required is:
1, T, A
2, A, T, L
3, A

then the SQL query in the Source Qualifier should be:
select key_1,
       max(decode(bcode, 'T', bcode, null)) t_code,
       max(decode(bcode, 'A', bcode, null)) a_code,
       max(decode(bcode, 'L', bcode, null)) l_code
from   a, b
where  a.key_1 = b.bkey_a
group  by key_1
/
If a session fails after loading 10,000 records into the target, how can you load the records from the 10,001st record when you run the session the next time in Informatica 6.1?
Simple solution: nothing special, just use the perform recovery option for the session.
Can we run a group of sessions without using workflow manager
Yes, it is possible: using the pmcmd command you can run the group of sessions without using the Workflow Manager.
what is the difference between stop and abort

The PowerCenter Server handles the abort command for the Session task like the stop command, except that it has a timeout period of 60 seconds. If the PowerCenter Server cannot finish processing and committing data within the timeout period, it kills the DTM process and terminates the session.
Stop: if the session you want to stop is part of a batch, you must stop the batch; if the batch is part of a nested batch, stop the outermost batch.
Abort: you can issue the abort command; it is similar to the stop command except that it has a 60-second timeout. If the server cannot finish processing and committing data within 60 seconds, it kills the DTM process and terminates the session.
What is the difference between a cached lookup and an uncached lookup?

Can I run the mapping without starting the Informatica server?
The difference between a cached and an uncached lookup: when you configure the Lookup transformation as cached, it stores all the lookup table data in the cache when the first input record enters the Lookup transformation; the SELECT statement executes only once, and the input record values are compared with the values in the cache. In an uncached lookup, the SELECT statement executes for each input record entering the Lookup transformation, and it has to hit the database every time a new record comes in.
I want to prepare a questionnaire. The details about it are as follows: -

1. Identify a large company/organization that is a prime candidate for DWH project.


(For example Telecommunication, an insurance company, banks, may be the prime
candidate for this)

2. Give at least four reasons for the selecting the organization.

3. Prepare a questionnaire consisting of at least 15 non-trivial questions to collect


requirements/information about the organization. This information is required to
build data warehouse.

Can you please tell me what should be those 15 questions to ask from a company,
say a telecom company?
First of all, meet your sponsors and make a BRD (business requirements document) about their expectations from this data warehouse (the main aim comes from them). For example, they need a customer billing process. Then go to the business management team; they will ask for metrics out of the billing process for their use, e.g. monthly usage, billing metrics, sales organization and rate plan, to perform sales rep and channel performance analysis and rate plan analysis. So your dimension tables can be: Customer (customer id, name, city, state etc.), Sales rep (sales rep number, name, id), Sales org (sales org id), Bill dimension (bill #, bill date, number), Rate plan (rate plan code). And the fact table can be: Billing details (bill #, customer id, minutes used, call details etc.). You can follow a star or snowflake schema in this case, depending upon the granularity of your data.
Can I start and stop a single session in a concurrent batch?
Just right-click on the particular session and go to the recovery option,
or
use the Event-Wait and Event-Raise tasks.
What is MicroStrategy? What is it used for? Can anyone explain it in detail?
MicroStrategy is again a BI tool, a HOLAP tool: you can create two-dimensional reports and also cubes in it; it is basically a reporting tool. It has a full range of reporting on the web as well as on Windows.
What is the difference between Informatica 7.1 and Ab Initio?
There is a lot of difference between Informatica and Ab Initio:
Ab Initio supports 3 kinds of parallelism, whereas Informatica uses 1.
Ab Initio has no built-in scheduling option; we schedule manually or with shell/PL-SQL scripts, whereas Informatica offers 4 scheduling options.
Ab Initio ships with the Co>Operating System, Informatica does not.
Ramp-up time is much quicker in Ab Initio compared to Informatica.
Ab Initio is more user-friendly than Informatica.

What is mystery dimension?

Also known as a junk dimension: a way of making sense of the rogue fields in your fact table.
What are cost-based and rule-based approaches, and what is the difference?
Cost based and rule based approaches are the optimization techniques which are used in related to
databases, where we need to optimize a SQL query.
Basically Oracle provides Two types of Optimizers (indeed 3 but we use only these two techniques. bcz
the third has some disadvantages.)
When ever you process any SQL query in Oracle, what oracle engine internally does is, it reads the query
and decides which will the best possible way for executing the query. So in this process, Oracle follows
these optimization techniques.
1. Cost-based optimizer (CBO): if a SQL query can be executed in 2 different ways (say path 1 and path 2 for the same query), the CBO calculates the cost of each path, analyzes which path has the lower execution cost, and executes that path so that it can optimize the query execution.
2. Rule-based optimizer (RBO): this simply follows a fixed set of rules for executing a query; depending on which rules apply, the optimizer runs the query.
Use:
If the table you are trying to query has already been analyzed, Oracle will go with the CBO.
If the table is not analyzed, Oracle follows the RBO.
For the first time, if the table is not analyzed, Oracle will go with a full table scan.
what are partition points?
Partition points mark the thread boundaries in a source pipeline and divide
the pipeline into stages.
How do we append records to a flat file in Informatica? In DataStage we have the options:
i) overwrite the existing file
ii) append to the existing file
This is not there in Informatica v7, but it is reportedly included in the latest version 8.0, where you can append to a flat file; it is about to ship in the market.
If you had to split the source-level key going into two separate tables, one as a surrogate key and the other as a primary key, and Informatica does not guarantee the keys are loaded in order into those tables, what are the different ways you could handle this type of situation?
Use a foreign key relationship between the two tables.
What is the best way to show metadata (number of rows at source, target and each transformation level, error-related data) in a report format?
When your workflow completes, go to the Workflow Monitor, right-click the session and open the transformation statistics; there we can see the number of rows at the source and target. If we go to the session properties we can see errors related to the data.
You can also select these details from the repository tables; for example, you can use the view REP_SESS_LOG to get this data.
Two relational tables are connected to a Source Qualifier transformation; what are the possible errors that will be thrown?
We can connect two relational tables to one Source Qualifier transformation; no errors will be thrown.
Without using the Update Strategy transformation and session options, how can we update our target table?
Soln1: You can do this by using an "update override" in the target properties.
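A hedged sketch of what such a target update override might look like (table and column names are hypothetical; :TU references the target transformation ports):
UPDATE target_table
SET    cust_name = :TU.cust_name
WHERE  cust_id = :TU.cust_id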
Soln2: In the session properties there is an option for how target rows are treated:
insert
update
insert as update
update as update
and so on; by using this we can easily solve it.
Soln3: By default all the rows in the session are flagged as insert; you can change this in the session general properties by setting "Treat source rows as: Update".
All the incoming rows will then be flagged as update, and the rows in the target table will be updated.
Could anyone please tell me the steps required for a Type 2 dimension/versioned data mapping, and how we can implement it?
Go to the Mapping Designer and choose Mappings > Wizards > Slowly Changing Dimensions.
A new window opens where you give the mapping name, source table, target table and the type of slowly changing dimension; if you select Type 2 and click Finish, the SCD Type 2 mapping is created.
Go to the Warehouse Designer and generate the table, then validate the mapping in the Mapping Designer, save it to the repository and run the session in the Workflow Manager.
Later, update the source table and rerun the session; you will find the difference (new versions) in the target table.
How do we import an Oracle sequence into Informatica?
Create a stored procedure and reference the sequence inside it, then call the procedure from Informatica with the help of a Stored Procedure transformation.
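A minimal sketch of such a procedure, assuming a hypothetical sequence my_seq and procedure name get_next_key; the OUT parameter is then mapped to a port of the Stored Procedure transformation:
CREATE OR REPLACE PROCEDURE get_next_key (p_next_val OUT NUMBER) AS
BEGIN
  SELECT my_seq.NEXTVAL INTO p_next_val FROM dual;
END;
/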
What is data merging, data cleansing, sampling?
Cleansing: to identify and remove redundancy and inconsistency in the data.
Sampling: sending only a sample of the data from the source to the target.
What is IQD file?

An IQD file is nothing but an Impromptu Query Definition. This file is mainly used with the Cognos Impromptu tool: after creating an IMR (report) we save it as an IQD file, which is used while creating a cube in PowerPlay Transformer (the data source type selected is Impromptu Query Definition).
Differences between normalization and the Normalizer transformation.
Normalizer: a transformation mainly used for COBOL sources; it changes rows into columns and columns into rows.
Normalization: removing redundancy and inconsistency from a data model.
How do I import VSAM files from source to target? Do I need a special plug-in?
In the Designer we have a direct option to import such files. Navigation: Sources => Import from File => file from COBOL.
What is the procedure or steps for implementing versioning if you are already in version 7.x? Any gotchas or precautions?
For version control in the ETL layer using Informatica, after doing anything in the Designer or Workflow Manager, follow these steps:
1. First save the changes or new implementations.
2. Then, from the Navigator window, right-click on the specific object you are currently working on. A pop-up window appears; near the lower end you will find Versioning -> Check In. A window will open; leave a comment about what you have done, like "modified this mapping", then click the OK button.
Can anyone explain error handling in Informatica with examples, so that it will be easy to explain in an interview?
Go to the session log file; there we will find information regarding the session initiation process, the errors encountered and the load summary. By looking at the errors encountered during the session run, we can resolve them.
If you have four lookup tables in the workflow, how do you troubleshoot to improve performance?
There are many ways to improve a mapping which has multiple lookups:
1) We can create an index on the lookup table if we have permissions (staging area).
2) Divide the lookup mapping into two: (a) dedicate one to inserts (source - target), so only the new rows come into the mapping and the process will be fast; (b) dedicate the second one to updates (source = target), so only the rows which already exist come into the mapping.
3) We can increase the cache size of the lookup.
If your workflow is running slow in Informatica, where do you start troubleshooting and what are the steps you follow?
SOLN1: When the workflow is running slowly, you have to find out the bottlenecks, in this order:
target
source
mapping
session
system
SOLN2: A workflow may be slow for different reasons; one is alpha characters in decimal data (check this), and another is insufficient string lengths (check with the SQL override).
How do you handle decimal places while importing a flat file into Informatica?
While importing the flat file, the flat file wizard helps in configuring the properties of the file: select the numeric column and just enter the precision and the scale. Precision includes the scale; for example, if the number is 98888.654, enter precision 8, scale 3 and width 10 for a fixed-width flat file.
In a sequential batch, how can we stop a single session?
We have a task called Event-Wait; using that we can stop, and we start again using Event-Raise.
Why are dimension tables denormalized in nature?

Because in data warehousing historical data should be maintained. Maintaining historical data means, for example, keeping an employee's details such as where he previously worked and where he is working now, all in one table. If you enforce a primary key on the natural key, it won't allow duplicate records with the same employee id; so to maintain historical data in data warehousing we use surrogate keys (e.g. an Oracle sequence for the critical column).
Because all the dimensions maintain historical data, they are denormalized: a near-duplicate entry (not exactly a duplicate record, but another record with the same employee number) is maintained in the table.
Can we use aggregator/active transformation after update strategy transformation?
We can use it, but the update flag will not be retained; a passive transformation can be used instead.
Can anyone comment on the significance of Oracle 9i in Informatica when compared to Oracle 8 or 8i? I mean, how is Oracle 9i advantageous compared to Oracle 8 or 8i when used with Informatica?
It's very easy:
Oracle 8i did not allow user-defined data types, but 9i allows them.
BLOB and LOB types are allowed only in 9i, not in 8i.
Moreover, list partitioning is available in 9i only.
In the concept of mapping parameters and variables, the variable value is saved to the repository after completion of the session, and the next time you run the session the server takes the saved variable value from the repository and continues from the next value. For example, I ran a session and at the end it stored a value of 50 to the repository. The next time I run the session, it should start with a value of 70, not with a value of 51.

How do I do this?
SOLN1: You can do one thing: after running the mapping, in the Workflow Manager right-click on the session and, in the menu, go to the persistent values; there you will find the last value stored in the repository for the mapping variable. Remove it and put your desired one, then run the session; your task should be done.
SOLN2: It takes a value of 51, but you can override the saved variable in the repository by defining the value in the parameter file. If there is a parameter file entry for the mapping variable, the session uses the value in the parameter file, not the value+1 in the repository. For example, assign the mapping variable a value of 70; in other words, higher preference is given to the value in the parameter file.
How do we use mapping parameters, and what is their use?
Mapping parameters and variables make mappings more flexible and avoid creating multiple mappings; they also help with incremental data loads. Mapping parameters and variables are created in the Mapping Designer via the menu option Mappings -> Parameters and Variables: enter the name for the variable or parameter (it has to be prefixed with $$), choose the type (parameter/variable) and the data type. Once defined, the variable/parameter can be used in any expression, for example in the Source Qualifier's source filter property: just enter the filter condition using it. Finally, create a parameter file to assign the value for the variable/parameter and configure it in the session properties; this last step is optional, and if no parameter file entry is present, the initial value assigned at the time of creating the variable is used.
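A hedged sketch of this usage (the variable name $$LastLoadDate and the table, column and folder/workflow/session names are hypothetical):
Source filter in the Source Qualifier:
    LAST_UPDATED_DATE > TO_DATE('$$LastLoadDate', 'YYYY-MM-DD')
Parameter file entry:
    [MyFolder.WF:wf_daily_load.ST:s_m_load_orders]
    $$LastLoadDate=2004-01-01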
How do we delete duplicate rows from a flat file source? Is there any option in Informatica?
Use a Sorter transformation; it has a "Distinct" option, so make use of it.
What is the use of incremental aggregation? Explain briefly with an example.
It is a session option. When the Informatica server performs incremental aggregation, it passes only new source data through the mapping and uses the historical cache data to perform the aggregate calculations incrementally. We use it for performance.
What is the procedure to load the fact table? Give details.
SOLN1: We use the two wizards, i.e. the Getting Started wizard and the Slowly Changing Dimensions wizard, to load the fact and dimension tables; using these two wizards we can create different types of mappings according to the business requirements and load the star schema (fact and dimension tables).
SOLN2: First the dimension tables need to be loaded, then according to the specifications the fact tables should be loaded. Don't think that fact tables are loaded differently; it is a general mapping as we do for other tables. The specifications play the important role for loading the fact.
How do we look up data on multiple tables?
If you want to look up data on multiple tables at a time, you can do one thing: join the tables you want, then look up the joined data. Informatica provides lookup on joined tables.
How do we retrieve the records from a reject file? Explain with syntax or an example.
SOLN1: There is a utility called the "reject loader" with which we can find the rejected records and refine and reload them.
SOLN2: During the execution of a workflow, all the rejected rows are stored in bad files (under the Informatica server installation, e.g. C:\Program Files\Informatica PowerCenter 7.1\Server). These bad files can be imported as a flat file source, and then through a direct mapping we can load these records in the desired format.
How does the server recognise the source and target databases?
By using an ODBC connection if it is relational, and an FTP connection if it is a flat file; we specify the connections in the session properties for both sources and targets.
What are variable ports and list two situations when they can be used?
We have mainly three kinds of ports: input, output and variable ports. An input port represents data flowing into the transformation, an output port is used when data is passed to the next transformation, and a variable port is used when mathematical calculations are required.
For example, consider price and quantity: using a variable port for the total, we can compute the total amount once and reuse it in other expressions.
A variable port is used to break a complex expression into simpler ones, and also to store intermediate values.
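A minimal sketch of variable-port usage inside an Expression transformation (port names are hypothetical):
IN:   PRICE, QUANTITY
VAR:  v_total = PRICE * QUANTITY
OUT:  o_total_amt = v_total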
What is the difference between the IIF and DECODE functions?
You can use nested IIF statements to test multiple conditions. The following example tests for various
conditions and returns 0 if sales is zero or negative:
IIF( SALES > 0, IIF( SALES < 50, SALARY1, IIF( SALES < 100, SALARY2,
IIF( SALES < 200, SALARY3, BONUS))), 0 )
You can use DECODE instead of IIF in many cases; DECODE may improve readability. The following shows how you can use DECODE instead of IIF:
DECODE( TRUE,
SALES > 0 and SALES < 50, SALARY1,
SALES > 49 AND SALES < 100, SALARY2,
SALES > 99 AND SALES < 200, SALARY3,
SALES > 199, BONUS)
In dimensional modeling, is the fact table normalized or denormalized, in the case of a star schema and in the case of a snowflake schema?
There is no normalization in the case of a star schema, but in a snowflake schema the dimension tables must be normalized.
Star schema: de-normalized dimensions.
Snowflake schema: normalized dimensions.
which is better among connected lookup and unconnected lookup transformations in
informatica or any other ETL tool?
Comparing the two: a connected lookup can return multiple values while an unconnected lookup returns one value. A connected lookup is in the same pipeline as the source and it supports dynamic caching; an unconnected lookup doesn't have that facility, but in some special cases we use unconnected lookups, e.g. if the output of one lookup goes as input to another lookup, unconnected lookups are favorable.
I think the better one is the connected lookup, because we can use a dynamic cache with it, and a connected lookup can also return multiple columns in a single row, whereas an unconnected lookup has a single return port (as far as ETL with Informatica is concerned).
What is the limit to the number of sources and targets you can have in a mapping?
As far as I know there is no such restriction on the number of sources or targets inside a mapping.
The question is: if you make N tables participate at a time in processing, what is the position of your database? From an organizational point of view it is never encouraged to use N tables at a time, since it reduces database and Informatica server performance.
The restriction is only on the database side: how many concurrent threads are you allowed to run on the database server?
which objects are required by the debugger to create a valid debug session?
Initially the session should be a valid session.
The source, target, lookups and expressions should be available, and at least one breakpoint should be set for the Debugger to debug your session.
An Informatica server object is a must.
What is the procedure to write a query to list the top three salaries of employees?
SELECT sal
FROM (SELECT sal FROM my_table ORDER BY sal DESC)
WHERE ROWNUM < 4;
Since this is Informatica, you might as well use the Rank transformation; check the help file on how to use it.
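A hedged variant of the query above using an analytic function (assuming Oracle 9i+; my_table and sal as before) that handles ties explicitly:
SELECT sal
FROM  (SELECT sal, DENSE_RANK() OVER (ORDER BY sal DESC) rnk
       FROM   my_table)
WHERE  rnk <= 3;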
We are using an Update Strategy transformation in a mapping; how can we know whether the insert, update, reject or delete option has been selected during the running of sessions in Informatica?
In the Designer, while creating the Update Strategy transformation, uncheck "Forward Rejected Rows". Any rejected rows will then automatically be written to the session log file.
Updates or inserts can only be verified by checking the target file or table.
Suppose session is configured with commit interval of 10,000 rows and source has
50,000 rows. Explain the commit points for Source based commit and Target based
commit. Assume appropriate value wherever required.
Source-based commit commits the data into the target based on the commit interval, so for every 10,000 source rows it commits into the target.
Target-based commit commits the data into the target based on the buffer size of the target, i.e. it commits whenever the buffer fills. Let us assume the buffer holds 6,000 rows; then for every 6,000 rows it commits the data.
How do we estimate the number of partitions that a mapping really requires? Is it
dependent on the machine configuration?
It depends upon the Informatica version we are using: Informatica 6 supports only 32 partitions, whereas Informatica 7 supports 64 partitions. Beyond that, the useful number of partitions also depends on the machine configuration (CPUs, memory) and the data volume.
Can Informatica be used as a Cleansing Tool? If yes give example of transformations
that can implement a data cleansing routine.
Yes, we can use Informatica for cleansing data; sometimes we use staging tables to cleanse the data, depending upon performance, and otherwise we can use an Expression transformation to cleanse data.
For example, if a field X has some values and others are NULL, and it is assigned to a target field that is a NOT NULL column, inside an Expression we can assign a space or some constant value to avoid session failure.
If the input data is in one format and the target needs another, we can change the format in an Expression.
We can also assign default values in the target to represent a complete set of data in the target.
How do you decide whether you need to do aggregations at the database level or at the Informatica level?
It depends upon the requirement. If you have a database with good processing power, you can create an aggregate table or view at the database level; otherwise it is better to use Informatica. Here is why you might still use Informatica:
Whatever it may be, Informatica is a third-party tool, so it will take more time to process aggregation compared to the database. But Informatica has an option called "incremental aggregation", which updates the current values with current values + new values; there is no need to process the entire set of values again and again, as long as nobody has deleted the cache files. If that has happened, the total aggregation has to be executed in Informatica as well.
In the database we don't have an incremental aggregation facility.
Identifying bottlenecks in various components of Informatica and resolving them.
The best way to find the bottleneck is to write to a flat file (instead of the real target) and see where the bottleneck is.
How do we join two tables without using the Joiner transformation?
SOLN1: It is possible to join two or more tables by using the Source Qualifier, provided the tables have a relationship.
When you drag and drop the tables you will get a Source Qualifier for each table. Delete all the Source Qualifiers, add a common Source Qualifier for all of them, right-click on the Source Qualifier and choose Edit, click on the Properties tab, and in the SQL Query property you can write your SQL.
SOLN2: The Joiner transformation is used to join n (n>1) tables from the same or different databases, whereas the Source Qualifier transformation is used to join only tables from the same database.
SOLN3: Use a Source Qualifier transformation to join tables on the SAME database. Under its Properties tab you can specify a user-defined join; any SELECT statement you can run on the database, you can also do in a Source Qualifier.
Note: you can only join 2 sources with one Joiner transformation, but those two sources can be from different databases.
In a filter expression we want to compare one date field with a db2 system field
CURRENT DATE.
Our Syntax: datefield = CURRENT DATE (we didn't define it by ports, its a system
field ), but this is not valid (PMParser: Missing Operator)..
Can someone help us.
The DB2 date format is "yyyymmdd", whereas SYSDATE in Oracle gives "dd-mm-yy", so conversion of the DB2 date format to the local database date format is compulsory; otherwise you will get that type of error.
Use SYSDATE or TO_DATE for the current date.
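A hedged sketch of a filter condition that avoids the database keyword entirely by using Informatica's built-in SESSSTARTTIME variable (datefield is the port from the question; this assumes the port is already a date/time data type):
TRUNC(datefield) = TRUNC(SESSSTARTTIME)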
What do the Expression and Filter transformations do in the Informatica Slowly Growing Target wizard?
The Expression transformation detects and flags the rows coming from the source.
The Filter transformation filters out the rows that are not flagged and passes the flagged rows on to the Update Strategy transformation.
How do you create the staging area in your database?

A staging area in a DW is used as a temporary space to hold all the records from the source system, so it should more or less be an exact replica of the source systems, except for the load strategy, where we use truncate-and-reload options.
So create it using the same layout as your source tables, or use the Generate SQL option in the Warehouse Designer tab.
What is the difference between the Informatica PowerCenter server, the Repository Server and the repository?
The PowerCenter server runs the scheduled workflows that load data from source to target.
The Repository Server manages client connections to the repository.
The repository contains all the definitions of the mappings done in the Designer.
What are the Differences between Informatica Power Center versions 6.2 and 7.1,
also between Versions 6.2 and 5.1?
The main difference between Informatica 5.1 and 6.2 is that in 6.x they introduced the Repository Server, and in place of the Server Manager (5.1) they introduced the Workflow Manager and Workflow Monitor.
In version 7.x you have the option of looking up (lookup) on a flat file.
You can write to an XML target.
Versioning.
LDAP authentication.
Support for 64-bit architectures.
Differences between Informatica 6.2 and Informatica 7.0: the features in 7.1 are:
1. Union and Custom transformations
2. Lookup on flat files
3. Grid servers working on different operating systems can coexist on the same server
4. We can use pmcmdrep
5. We can export independent and dependent repository objects
6. We can move mappings to any web application
7. Version control
8. Data profiling
What is the difference between connected and unconnected stored procedures.
Run a stored procedure before or after your session: Unconnected
Run a stored procedure once during your mapping, such as pre- or post-session: Unconnected
Run a stored procedure every time a row passes through the Stored Procedure transformation: Connected or Unconnected
Run a stored procedure based on data that passes through the mapping, such as when a specific port does not contain a null value: Unconnected
Pass parameters to the stored procedure and receive a single output parameter: Connected or Unconnected
Pass parameters to the stored procedure and receive multiple output parameters: Connected or Unconnected
(Note: To get multiple output parameters from an unconnected Stored Procedure transformation, you must create variables for each output parameter. For details, see Calling a Stored Procedure From an Expression.)
Run nested stored procedures: Unconnected
Call multiple times within a mapping: Unconnected
Discuss which is better among incremental load, Normal Load and Bulk load
If the database supports the bulk load option from Informatica, then using BULK LOAD for the initial loading of the tables is recommended.
Depending upon the requirement we should choose between normal and incremental loading strategies.
If supported by the database, bulk load can do the loading faster than normal load. (The incremental load concept is different; do not merge it with bulk load and normal load.)
Compare Data Warehousing Top-Down approach with Bottom-up approach
In the top-down approach we first build the data warehouse and then build the data marts; this needs more cross-functional skills, takes more time and is also more costly.
In the bottom-up approach we first build the data marts and then the data warehouse; the data mart that is built first remains as a proof of concept for the others. It takes less time and costs less compared to the above.
What is the difference between summary filter and detail filter
A summary filter can be applied on a group of rows that contain a common value, whereas a detail filter can be applied on each and every record of the database.
What is the difference between a view and a materialized view?
Materialized views are schema objects that can be used to summarize, precompute, replicate, and distribute data, e.g. to construct a data warehouse.
A materialized view provides indirect access to table data by storing the results of a query in a separate schema object, unlike an ordinary view, which does not take up any storage space or contain any data (it only stores the query definition).
Can we modify the data in a flat file?

Just open the text file with Notepad and change whatever you want (but the data types should stay the same).
How do we get only the first 100 rows from the flat file into the target?
SOLN1: In the Workflow Manager, link a task to the session: task ----->(link) session.
Double-click on the link and set a condition on the source success rows (a predefined session variable) = 100; it should automatically stop the session.
SOLN2: 1. Use the test load option if you want to use it for testing.
2. Put a counter/sequence generator in the mapping and filter on it.
Can we look up a table from a Source Qualifier transformation (unconnected lookup)?
No, we can't.
I will explain why:
1) Unless you connect the output of the Source Qualifier to another transformation or to a target, the field will not be included in the generated query.
2) The Source Qualifier doesn't have any variable fields to use in an expression.
what is a junk dimension

A "junk" dimension is a collection of random transactional codes, flags and/or text attributes that are
unrelated to any particular dimension. The junk dimension is simply a structure that provides a
convenient place to store the junk attributes. A good example would be a trade fact in a company that
brokers equity trades.
What is the difference between Normal load and Bulk load?

Normal load: normal load writes information to the database log file, so that if any recovery is needed it will be helpful. When the source file is a text file and you are loading data to a table, in such cases you should use normal load only, or else the session will fail.
Bulk mode: bulk load does not write information to the database log file, so that if any recovery is needed we can't do anything in such cases. Comparatively, bulk load is much faster than normal load.
At the maximum, how many transformations can be used in a mapping?
There is no such limit on the number of transformations. But from a performance point of view, using too many transformations will reduce the session performance.
My idea is: if more transformations are needed in a mapping, it is better to move some of the logic into a stored procedure.

What are the main advantages and purpose of using the Normalizer transformation in Informatica?

The Normalizer transformation is used mainly with COBOL sources, where most of the time data is stored in denormalized format. Also, the Normalizer transformation can be used to create multiple rows from a single row of data.
How do you convert rows to columns in a Normalizer? Could you explain?
Normally it is used to convert columns to rows, but for converting rows to columns we need an Aggregator and an Expression, and a little coding effort is needed. Denormalization is not possible with a Normalizer transformation.
Discuss the advantages & Disadvantages of star & snowflake schema?
In a star schema every dimension will have a primary key.
In a star schema, a dimension table will not have any parent table.
Whereas in a snow flake schema, a dimension table will have one or more parent tables.
Hierarchies for the dimensions are stored in the dimensional table itself in star schema.
Whereas in a snowflake schema the hierarchies are broken into separate tables; these hierarchies help to drill down the data from the topmost level to the lowermost level.
A star schema consists of a single fact table surrounded by dimension tables; in a snowflake schema the dimension tables are connected to sub-dimension tables.
In a star schema the dimension tables are denormalized; in a snowflake schema the dimension tables are normalized.
A star schema is used for report generation; a snowflake schema is used for cubes.
The advantage of the snowflake schema is that the normalized tables are easier to maintain and it also saves storage space.
The disadvantage of the snowflake schema is that it reduces the effectiveness of navigation across the tables due to the large number of joins between them.
What is a time dimension? Give an example.

The time dimension is one of the important dimensions in a data warehouse. Whenever you generate a report, you access the data through the time dimension.

E.g. a time dimension with fields: date key, full date, day of week, day, month, quarter, fiscal year.
What are connected and unconnected transformations?
A connected transformation is part of your data flow in the pipeline, while an unconnected transformation is not; it is much like calling a program by name versus by reference.
Use unconnected transformations when you want to call the same transformation many times in a single mapping.
An unconnected transformation can't be connected to another transformation, but it can be called from inside another transformation.
Connected transformations are directly connected into the data flow and can feed as many other transformations as needed; if you are using the same lookup several times, use an unconnected one, and you get better performance.
How can you create or import a flat file definition into the Warehouse Designer?
You can create a flat file definition in the Warehouse Designer: create a new target, select the type as flat file, save it, and you can enter the various columns for that target by editing its properties. Once the target is created and saved, you can use it from the Mapping Designer.
Alternatively: you cannot import a flat file definition directly into the Warehouse Designer; instead you must analyze the file in the Source Analyzer and then drag it into the Warehouse Designer. When you drag the flat file source definition into the Warehouse Designer workspace, the Warehouse Designer creates a relational target definition, not a file definition. If you want to load to a file, configure the session to write to a flat file; when the Informatica server runs the session, it creates and loads the flat file.
What are the tasks that the Load Manager process performs?
Manages session and batch scheduling: when you start the Informatica server, the Load Manager launches and queries the repository for a list of sessions configured to run on that server. When you configure a session, the Load Manager maintains a list of sessions and session start times. When you start a session, the Load Manager fetches the session information from the repository to perform the validations and verifications prior to starting the DTM process.

Locking and reading the session: when the Informatica server starts a session, the Load Manager locks the session in the repository; locking prevents you from starting the same session again while it is running.

Reading the parameter file: if the session uses a parameter file, the Load Manager reads the parameter file and verifies that the session-level parameters are declared in the file.

Verifies permissions and privileges: when the session starts, the Load Manager checks whether or not the user has the privileges to run the session.

Creating log files: the Load Manager creates a log file containing the status of the session.
How do you transfer the data from the data warehouse to a flat file?
You can write a mapping with the flat file as a target (using a dummy connection). A flat file target is built by pulling a source into the target space using the Warehouse Designer tool.
Difference between the Informatica Repository Server and the Informatica server.
Informatica Repository Server: it manages connections to the repository from client applications.
Informatica Server: it extracts the source data, performs the data transformations, and loads the transformed data into the target.
Router transformation

A Router transformation is similar to a Filter transformation because both transformations allow you to
use a condition to test data. A Filter transformation tests data for one condition and drops the rows of
data that do not meet the condition. However, a Router transformation tests data for one or more
conditions and gives you the option to route rows of data that do not meet any of the conditions to a
default output group.
What are the 2 modes of data movement in the Informatica Server?
The data movement mode depends on whether the Informatica Server should process single-byte or multi-byte character data. This mode selection can affect the enforcement of code page relationships and code page validation in the Informatica Client and Server.
a) Unicode - IS allows 2 bytes for each character and uses additional byte for each non-
ascii character (such as Japanese characters)
b) ASCII - IS holds all data in a single byte
The IS data movement mode can be changed in the Informatica Server configuration parameters. This
comes into effect once you restart the Informatica Server.
How do we read rejected or bad data from the bad file and reload it to the target?
Correct the rejected data and send it to the target relational tables using the reject loader utility. Find the rejected data by using the column indicator and row indicator in the bad file.
Explain the Informatica architecture in detail.
The Informatica server connects to the source data and target data using native/ODBC drivers, and it connects to the repository for running sessions and retrieving metadata information.

source ------> Informatica server ------> target
                        |
                   repository (managed by the Repository Server and administered through the Repository Server Administration Console)
Client tools: Designer, Workflow Manager, Workflow Monitor.
how can we partition a session in Informatica?

The Informatica PowerCenter Partitioning option optimizes parallel processing on multiprocessor hardware by providing a thread-based architecture and built-in data partitioning.
GUI-based tools reduce the development effort necessary to create data partitions and streamline
ongoing troubleshooting and performance tuning tasks, while ensuring data integrity throughout the
execution process. As the amount of data within an organization expands and real-time demand for
information grows, the PowerCenter Partitioning option
enables hardware and applications to provide outstanding performance and jointly scale to handle large
volumes of data and users.
What is Load Manager?
While running a workflow, the PowerCenter Server uses the Load Manager process and the Data Transformation Manager (DTM) process to run the workflow and carry out workflow tasks. When the PowerCenter Server runs a workflow, the Load Manager performs the following tasks:

1. Locks the workflow and reads workflow properties.
2. Reads the parameter file and expands workflow variables.
3. Creates the workflow log file.
4. Runs workflow tasks.
5. Distributes sessions to worker servers.
6. Starts the DTM to run sessions.
7. Runs sessions from master servers.
8. Sends post-session email if the DTM terminates abnormally.

When the PowerCenter Server runs a session, the DTM performs the following tasks:
1. Fetches session and mapping metadata from the repository.
2. Creates and expands session variables.
3. Creates the session log file.
4. Validates session code pages if data code page validation is enabled. Checks query
conversions if data code page validation is disabled.
5. Verifies connection object permissions.
6. Runs pre-session shell commands.
7. Runs pre-session stored procedures and SQL.
8. Creates and runs mapping, reader, writer, and transformation threads to extract,transform, and
load data.
9. Runs post-session stored procedures and SQL.
10. Runs post-session shell commands.
11. Sends post-session email.
What is data cleansing?
The process of finding and removing or correcting data that is incorrect, out-of-date, redundant, incomplete, or formatted incorrectly.
This is nothing but polishing the data. For example, one sub-system stores the gender as M and F while another may store it as MALE and FEMALE, so we need to polish this data and clean it before it is added to the data warehouse. Another typical example is addresses: the customer addresses maintained by the various sub-systems can differ, and we might need an address-cleansing tool to get the customers' addresses into a clean and consistent form.
To provide support for mainframe source data, which files are used as source definitions? COBOL copybook files.
Where should you place the flat file to import the flat file definition into the Designer?
There is no such restriction on where to place the source file. From a performance point of view it is better to place the file in the server's local src folder; if you need the path, check the server properties available in the Workflow Manager.
It doesn't mean we should not place it in any other folder, but if we place it in the server src folder it will be selected by default at the time of session creation.
How many ways can you update a relational source definition, and what are they? Two ways:
1. Edit the definition
2. Reimport the definition
Which transformation do you need while using COBOL sources as source definitions? The Normalizer transformation, which is used to normalize the data, since COBOL sources often consist of denormalized data.
What is the maplet?

A mapplet is a reusable object that contains a set of transformations, allowing you to reuse that transformation logic in multiple mappings.
For example, suppose we have several fact tables that require a series of dimension keys. We can create a mapplet which contains a series of Lookup transformations to find each dimension key and use it in each fact table mapping, instead of creating the same lookup logic in each mapping.
What is a transformation? It is a repository object that generates, modifies, or passes data. A transformation is a repository object that passes data to the next stage (i.e. to the next transformation or target) with or without modifying the data.
What are the active and passive transformations? An active transformation can change the number of rows that pass through it. A passive transformation does not change the number of rows that pass through it.
Transformations can be active or passive. An active transformation can change the number of rows that
pass through it, such as a Filter transformation that removes rows that do not meet the filter condition.
A passive transformation does not change the number of rows that pass through it, such as an
Expression transformation that performs a calculation on data and passes all rows through the
transformation.
What are the reusable transformations? Reusable transformations can be used in multiple mappings. When you need to incorporate this transformation into a mapping, you add an instance of it to the mapping. Later, if you change the definition of the transformation, all instances of it inherit the changes. Since the instance of a reusable transformation is a pointer to that transformation, you can change the transformation in the Transformation Developer and its instances automatically reflect these changes. This feature can save you a great deal of work.
What are the methods for creating reusable transformations? Two methods:
1. Design it in the Transformation Developer.
2. Promote a standard transformation from the Mapping Designer. After you add a transformation to the mapping, you can promote it to the status of a reusable transformation.
Once you promote a standard transformation to reusable status, you can demote it to a standard transformation at any time.
If you change the properties of a reusable transformation in a mapping, you can revert to the original reusable transformation properties by clicking the Revert button.
What are the unsupported repository objects for a mapplet?
COBOL source definition
Joiner transformations
Normalizer transformations
Non reusable sequence generator transformations.
Pre or post session stored procedures
Target definitions
Power mart 3.5 style Look Up functions
XML source definitions
IBM MQ source definitions
What are the mapping parameters and
mapping variables? A mapping parameter represents a constant value that you can define before running a session. A mapping parameter retains the same value throughout the entire session.
When you use a mapping parameter, you declare and use the parameter in a mapping or mapplet, then define the value of the parameter in a parameter file for the session.
Unlike a mapping parameter, a mapping variable represents a value that can change throughout the session. The Informatica Server saves the value of a mapping variable to the repository at the end of a session run and uses that value the next time you run the session.
Can you use the mapping parameters or variables created in one mapping in another mapping? No. We can use mapping parameters or variables in any transformation of the same mapping or mapplet in which you created the mapping parameters or variables.
Can you use the mapping parameters or variables created in one mapping in any other reusable transformation? Yes, because a reusable transformation is not contained within any mapplet or mapping.
How can U improve session performance in aggregator transformation?

Use sorted input:

1. Use a Sorter transformation before the Aggregator (see the sketch below).
2. Do not forget to check the Sorted Input option on the Aggregator, which tells the Aggregator that the input is sorted on the same keys as the group by ports.
The key order is also very important.
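A minimal SQL sketch of the idea (the SALES table and its columns are hypothetical): the rows fed to the Aggregator arrive already ordered on the group-by keys, so the Aggregator can finish one group before starting the next instead of caching every group in memory.

    -- Feed for the Aggregator: the sort keys must match the group by ports, in the same order
    SELECT REGION, PRODUCT_ID, AMOUNT
    FROM   SALES
    ORDER  BY REGION, PRODUCT_ID;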
What is the aggregate cache in the Aggregator transformation? The Aggregator stores data in the aggregate cache until it completes the aggregate calculations. When you run a session that uses an Aggregator transformation, the Informatica Server creates index and data caches in memory to process the transformation. If the Informatica Server requires more space, it stores overflow values in cache files.
When you run a workflow that uses an Aggregator transformation, the Informatica Server creates index
and data caches in memory to process the transformation. If the Informatica Server requires more space,
it stores overflow values in cache files.
What are the differences between the Joiner transformation and the Source Qualifier transformation? You can join heterogeneous data sources in a Joiner transformation, which you cannot achieve in a Source Qualifier transformation.
You need matching keys to join two relational sources in a Source Qualifier transformation, whereas you do not need matching keys to join two sources in a Joiner transformation.
The two relational sources should come from the same data source in a Source Qualifier; with a Joiner you can also join relational sources that come from different data sources.
In which conditions can we not use a Joiner transformation (limitations of the Joiner transformation)?
- Both pipelines begin with the same original data source.
- Both input pipelines originate from the same Source Qualifier transformation.
- Both input pipelines originate from the same Normalizer transformation.
- Both input pipelines originate from the same Joiner transformation.
- Either input pipeline contains an Update Strategy transformation.
- Either input pipeline contains a connected or unconnected Sequence Generator transformation.
What are the settings that you use to configure the Joiner transformation?
- Master and detail source
- Type of join
- Condition of the join
the Joiner transformation supports the following join types, which you set in the Properties tab:
• Normal (Default)
• Master Outer
• Detail Outer
• Full Outer
What r the join types in joiner transformation?

Normal (Default) -- only matching rows from both master and detail
Master outer -- all detail rows and only matching rows from master
Detail outer -- all master rows and only matching rows from detail
Full outer -- all rows from both master and detail ( matching or non matching)
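As a rough SQL analogy (assuming hypothetical MASTER and DETAIL tables joined on ITEM_ID), the four join types above correspond approximately to:

    -- Normal: only rows that match in both sources
    SELECT * FROM DETAIL D INNER JOIN MASTER M ON D.ITEM_ID = M.ITEM_ID;

    -- Master outer: all detail rows, plus the matching master rows
    SELECT * FROM DETAIL D LEFT OUTER JOIN MASTER M ON D.ITEM_ID = M.ITEM_ID;

    -- Detail outer: all master rows, plus the matching detail rows
    SELECT * FROM MASTER M LEFT OUTER JOIN DETAIL D ON D.ITEM_ID = M.ITEM_ID;

    -- Full outer: all rows from both sources
    SELECT * FROM DETAIL D FULL OUTER JOIN MASTER M ON D.ITEM_ID = M.ITEM_ID;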
To create a Joiner transformation, follow these steps:
1. In the Mapping Designer, choose Transformation-Create. Select the Joiner
transformation. Enter a name, click OK.
The naming convention for Joiner transformations is JNR_TransformationName. Enter a
description for the transformation. This description appears in the Repository Manager, making
it easier for you or others to understand or remember what the transformation does. The
Designer creates the Joiner transformation. Keep in mind that you cannot use a Sequence
Generator or Update Strategy transformation as a source to a Joiner transformation.
2. Drag all the desired input/output ports from the first source into the Joiner
transformation.
The Designer creates input/output ports for the source fields in the Joiner as detail fields by
default. You can edit this property later.
3. Select and drag all the desired input/output ports from the second source into the Joiner
transformation.
The Designer configures the second set of source fields and master fields by default.
4. Double-click the title bar of the Joiner transformation to open the Edit Transformations
dialog box.
5. Select the Ports tab.
6. Click any box in the M column to switch the master/detail relationship for the sources.
Change the master/detail relationship if necessary by selecting the master source in the
M column.
Tip: Designating the source with fewer unique records as master increases performance during a
join.
7. Add default values for specific ports as necessary.
Certain ports are likely to contain NULL values, since the fields in one of the sources may be
empty. You can specify a default value if the target database does not handle NULLs.
8. Select the Condition tab and set the condition.
9. Click the Add button to add a condition. You can add multiple conditions. The master
and detail ports must have matching datatypes. The Joiner transformation only supports
equivalent (=) joins:
10. Select the Properties tab and enter any additional settings for the transformations.
11. Click OK.
12. Choose Repository-Save to save changes to the mapping.

What r the joiner caches?When a Joiner transformation occurs in a session, the Informatica Server
reads all the records from the master source and builds index and data caches based on the master rows.

After building the caches, the Joiner transformation reads records from the detail source and performs the joins.
What is the Lookup transformation? Use a Lookup transformation in your mapping to look up data in a relational table, view, or synonym.
The Informatica Server queries the lookup table based on the lookup ports in the transformation. It compares the Lookup transformation port values to lookup table column values based on the lookup condition.
Why use the Lookup transformation? To perform the following tasks:
Get a related value. For example, if your source table includes employee ID, but you want to include the
employee name in your target table to make your summary data easier to read.
Perform a calculation. Many normalized tables include values used in a calculation, such as gross sales
per invoice or sales tax, but not the calculated value (such as net sales).
Update slowly changing dimension tables. You can use a Lookup transformation to determine whether
records already exist in the target.
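For instance, the 'get a related value' case above is, in SQL terms, roughly a left join from the source rows to the lookup table (SALES_SOURCE, EMPLOYEE, and their columns are hypothetical names):

    SELECT S.EMPLOYEE_ID,
           S.SALE_AMOUNT,
           E.EMPLOYEE_NAME                          -- the related value returned by the lookup
    FROM   SALES_SOURCE S
    LEFT   JOIN EMPLOYEE E
           ON E.EMPLOYEE_ID = S.EMPLOYEE_ID;        -- the lookup condition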
What r the types of lookup?

1. Connected lookup
2. Unconnected lookup
Based on the cache used, a lookup can also be configured as:
1. Persistent cache
2. Re-cache from database
3. Static cache
4. Dynamic cache
5. Shared cache
Differences between connected and unconnected lookup?

Connected lookup:
- Receives input values directly from the pipeline.
- You can use a dynamic or static cache.
- The cache includes all lookup columns used in the mapping.
- Supports user-defined default values.

Unconnected lookup:
- Receives input values from the result of a :LKP expression in another transformation.
- You can use a static cache only.
- The cache includes all lookup/output ports in the lookup condition and the lookup/return port.
- Does not support user-defined default values.
What is meant by lookup caches? The Informatica Server builds a cache in memory when it processes the first row of data in a cached Lookup transformation. It allocates memory for the cache based on the amount you configure in the transformation or session properties. The Informatica Server stores condition values in the index cache and output values in the data cache.
What are the types of lookup caches?
Persistent cache: You can save the lookup cache files and reuse them the next time the Informatica Server processes a Lookup transformation configured to use the cache.
Recache from database: If the persistent cache is not synchronized with the lookup table, you can configure the Lookup transformation to rebuild the lookup cache.

Static cache: You can configure a static, or read-only, cache for any lookup table. By default the Informatica Server creates a static cache. It caches the lookup table and lookup values in the cache for each row that comes into the transformation. When the lookup condition is true, the Informatica Server does not update the cache while it processes the Lookup transformation.

Dynamic cache: If you want to cache the target table and insert new rows into the cache and the target, you can create a Lookup transformation that uses a dynamic cache. The Informatica Server dynamically inserts data into the target table.

Shared cache: You can share the lookup cache between multiple transformations. You can share an unnamed cache between transformations in the same mapping.
Difference between static cache and dynamic cache
Static cache:
- You cannot insert rows into or update the cache.
- The Informatica Server returns a value from the lookup table or cache when the condition is true. When the condition is not true, the Informatica Server returns the default value for connected transformations and NULL for unconnected transformations.

Dynamic cache:
- You can insert rows into the cache as you pass them to the target.
- The Informatica Server inserts rows into the cache when the condition is false. This indicates that the row is not in the cache or the target table. You can pass these rows to the target table.
Which transformation should we use to normalize the COBOL and relational
sources?Normalizer Transformation.
When you drag a COBOL source into the Mapping Designer workspace, the Normalizer transformation automatically appears, creating input and output ports for every column in the source.
How does the Informatica Server sort string values in the Rank transformation? When the Informatica Server runs in the ASCII data movement mode, it sorts session data using a binary sort order. If you configure the session to use a binary sort order, the Informatica Server calculates the binary value of each string and returns the specified number of rows with the highest binary values for the string.
What are the rank caches? During the session, the Informatica Server compares an input row with rows in the data cache. If the input row out-ranks a stored row, the Informatica Server replaces the stored row with the input row. The Informatica Server stores group information in an index cache and row data in a data cache.
What is the RANKINDEX in the Rank transformation? The Designer automatically creates a RANKINDEX port for
each Rank transformation. The Informatica Server uses the Rank Index port to store the ranking position
for each record in a group. For example, if you create a Rank transformation that ranks the top 5
salespersons for each quarter, the rank index numbers the salespeople from 1 to 5:What is the
Router transformation?A Router transformation is similar to a Filter transformation because both
transformations allow you to use a condition to test data. However, a Filter transformation tests data for
one condition and drops the rows of data that do not meet the condition. A Router transformation tests
data for one or more conditions and gives you the option to route rows of data that do not meet any of the
conditions to a default output group.
If you need to test the same input data based on multiple conditions, use a Router Transformation in a
mapping instead of creating multiple Filter transformations to perform the same task.
What are the types of groups in a Router transformation? Input group and output group.
The Designer copies property information from the input ports of the input group to create a set of output ports for each output group.
There are two types of output groups:
- User-defined groups
- Default group
You cannot modify or delete the default group.
Why do we use the Stored Procedure transformation?
A Stored Procedure transformation is an important tool for populating and maintaining
databases. Database administrators create stored procedures to automate time-consuming tasks
that are too complicated for standard SQL statements
What r the types of data that passes between informatica server and stored
procedure?3 types of data
Input/Out put parameters
Return Values
Status code.
What is the status code? The status code provides error handling for the Informatica Server during the session. The stored procedure issues a status code that notifies whether or not the stored procedure completed successfully. This value cannot be seen by the user; it is used only by the Informatica Server to determine whether to continue running the session or to stop.
What is source qualifier transformation? What r the tasks that source qualifier performs?

When you add a relational or a flat file source definition to a mapping, you need to connect it to a Source
Qualifier transformation. The Source Qualifier represents the rows that the Informatica Server reads
when it executes a session.
The Source Qualifier lets you perform the following tasks (a sketch of the generated SQL follows this list):
- Join data originating from the same source database. You can join two or more tables with primary key-foreign key relationships by linking the sources to one Source Qualifier.
- Filter records when the Informatica Server reads source data. If you include a filter condition, the Informatica Server adds a WHERE clause to the default query.
- Specify an outer join rather than the default inner join. If you include a user-defined join, the Informatica Server replaces the join information specified by the metadata in the SQL query.
- Specify sorted ports. If you specify a number for sorted ports, the Informatica Server adds an ORDER BY clause to the default SQL query.
- Select only distinct values from the source. If you choose Select Distinct, the Informatica Server adds a SELECT DISTINCT statement to the default SQL query.
- Create a custom query to issue a special SELECT statement for the Informatica Server to read source data. For example, you might use a custom query to perform aggregate calculations or execute a stored procedure.
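A minimal sketch of how these options shape the generated query, assuming hypothetical ORDERS and CUSTOMERS tables related by CUST_ID, a filter condition on ORDER_DATE, Select Distinct, and sorted ports on CUST_ID:

    SELECT DISTINCT ORDERS.ORDER_ID,
           ORDERS.CUST_ID,
           ORDERS.ORDER_DATE,
           CUSTOMERS.CUST_NAME
    FROM   ORDERS, CUSTOMERS
    WHERE  ORDERS.CUST_ID = CUSTOMERS.CUST_ID       -- join on the primary key-foreign key relationship
      AND  ORDERS.ORDER_DATE >= '2005-01-01'        -- filter condition added to the WHERE clause
    ORDER  BY ORDERS.CUST_ID;                       -- sorted ports add an ORDER BY clause

The exact query the Informatica Server generates depends on the connected ports and the database; the sketch only shows where the filter, join, distinct, and ORDER BY options appear.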
What is the target load order? You specify the target load order based on the Source Qualifiers in a mapping. If you have multiple Source Qualifiers connected to multiple targets, you can designate the order in which the Informatica Server loads data into the targets.
A target load order group is the collection of source qualifiers, transformations, and targets linked together
in a mapping.
What is the default join that the Source Qualifier provides? An inner equi-join: when you join two sources in one Source Qualifier, the default query joins them with equality conditions on the primary key-foreign key columns in the WHERE clause.
What are the basic needs to join two sources in a Source Qualifier?
The two sources should have a primary key-foreign key relationship.
The join columns of the two sources should have matching data types.
what is update strategy transformation ?

The model you choose constitutes your update strategy, how to handle changes to existing rows. In
PowerCenter and PowerMart, you set your update strategy at two different levels:
• Within a session. When you configure a session, you can instruct the Informatica Server to
either treat all rows in the same way (for example, treat all rows as inserts), or use instructions
coded into the session mapping to flag rows for different database operations.
• Within a mapping. Within a mapping, you use the Update Strategy transformation to flag rows
for insert, delete, update, or reject.
Describe two levels in which update strategy transformation sets?Within a session. When
you configure a session, you can instruct the Informatica Server to either treat all records in the same way
(for example, treat all records as inserts), or use instructions coded into the session mapping to flag
records for different database operations.

Within a mapping. Within a mapping, you use the Update Strategy transformation to flag records for
insert, delete, update, or reject.
What is the default source option for the Update Strategy transformation? Data driven.
What is data driven? The Informatica Server follows instructions coded into Update Strategy transformations within the session mapping to determine how to flag records for insert, update, delete, or reject. If you do not choose the data driven option, the Informatica Server ignores all Update Strategy transformations in the mapping.
What are the options in the target session of the Update Strategy transformation?
Insert
Delete
Update
Update as update
Update as insert
Update else insert
Truncate table
Update as Insert:
This option specified all the update records from source to be flagged as inserts in the target. In other
words, instead of updating the records in the target they are inserted as new records.
Update else Insert:
This option enables informatica to flag the records either for update if they are old or insert, if they are
new records from source.
What are the types of mapping wizards provided in Informatica?
Getting Started Wizard:
- Simple Pass Through
- Slowly Growing Target
Slowly Changing Dimensions Wizard:
- Type 1 Dimension (most recent values)
- Type 2 Dimension (full history): Version, Flag, or Date
- Type 3 Dimension (current and one previous value)
What r the types of maping in Getting Started Wizard?Simple Pass through maping :
Loads a static fact or dimension table by inserting all rows. Use this mapping when you want to drop all
existing data from your table before loading new data.

Slowly Growing target :


Loads a slowly growing fact or dimension table by inserting new rows. Use this mapping to load new data
when existing data does not require updates.What r the mapings that we use for slowly
changing dimension table?
Type1: Rows containing changes to existing dimensions are updated in the target by overwriting the
existing dimension. In the Type 1 Dimension mapping, all rows contain current dimension data.
Use the Type 1 Dimension mapping to update a slowly changing dimension table when you do not need
to keep any previous versions of dimensions in the table.

Type 2: The Type 2 Dimension Data mapping inserts both new and changed dimensions into the target.
Changes are tracked in the target table by versioning the primary key and creating a version number for
each dimension in the table.
Use the Type 2 Dimension/Version Data mapping to update a slowly changing dimension table when you
want to keep a full history of dimension data in the table. Version numbers and versioned primary keys
track the order of changes to each dimension.
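A minimal SQL sketch of what the Type 2 Version Data approach produces (the DIM_CUSTOMER table, its columns, and the key scheme are hypothetical): a change never overwrites the existing row; instead a new row with an incremented version number is inserted.

    -- Existing row is left untouched:
    --   CUST_KEY   CUST_ID   CITY     VERSION
    --   1001       C42       London   0

    -- A change to the customer inserts a new versioned row:
    INSERT INTO DIM_CUSTOMER (CUST_KEY, CUST_ID, CITY, VERSION)
    VALUES (1002, 'C42', 'Paris', 1);   -- the versioned key and version number track history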

Type 3: The Type 3 Dimension mapping filters source rows based on user-defined comparisons and
inserts only those found to be new dimensions to the target. Rows containing changes to existing
dimensions are updated in the target. When updating an existing dimension, the Informatica Server saves
existing data in different columns of the same row and replaces the existing data with the updates.
What are the different types of Type 2 dimension mappings?
Type 2 Dimension/Version Data mapping: In this mapping the updated dimension from the source is inserted into the target along with a new version number, and a newly added dimension in the source is inserted into the target with a primary key.

Type 2 Dimension/Flag Current mapping: This mapping is also used for slowly changing dimensions. In addition, it creates a flag value for the changed or new dimension. The flag indicates whether the dimension is new or newly updated. Recent dimensions are saved with a current flag value of 1, and updated dimensions are saved with the value 0.

Type 2 Dimension/Effective Date Range mapping: This is another flavour of the Type 2 mapping used for slowly changing dimensions. It also inserts both new and changed dimensions into the target, and changes are tracked by the effective date range for each version of each dimension.
How can you recognise whether or not the newly added rows in the source get inserted into the target? In the Type 2 mapping we have three options to recognise the newly added rows:
Version number
Flag value
Effective date range
What are the two types of processes that the Informatica Server uses to run a session?
Load manager Process: Starts the session, creates the DTM process, and sends post-session email
when the session completes.
The DTM process. Creates threads to initialize the session, read, write, and transform data, and handle
pre- and post-session operations.What r the new features of the server manager in the
informatica 5.0?U can use command line arguments for a session or batch.This allows U to change
the values of session parameters,and mapping parameters and maping variables.

Parallel data processing: This feature is available for powercenter only.If we use the informatica server on
a SMP system,U can use multiple CPU’s to process a session concurently.

Process session data using threads: Informatica server runs the session in two processes.Explained in
previous question.
Can you generate reports in Informatica?
Informatica is an ETL tool; you cannot create business reports from it, but you can generate metadata reports, which are not used for business analysis.
What is the Metadata Reporter? It is a web-based application that enables you to run reports against repository metadata.
With the Metadata Reporter, you can access information about your repository without having knowledge of SQL, the transformation language, or the underlying tables in the repository.
Define mapping and session.
Mapping: It is a set of source and target definitions linked by transformation objects that define the rules for data transformation.
Session: It is a set of instructions that describe how and when to move data from sources to targets.
Which tool do you use to create and manage sessions and batches and to monitor and stop the Informatica Server? The Informatica Server Manager.
What is polling? It displays the updated information about the session in the Monitor window. The Monitor window displays the status of each session when you poll the Informatica Server.
While importing a relational source definition from the database, what source metadata do you import?
Source name
Database location
Column names
Datatypes
Key constraints
What are the Designer tools for creating transformations?
Mapping Designer
Transformation Developer
Mapplet Designer
How many ways can you create ports? Two ways:
1. Drag the port from another transformation.
2. Click the Add button on the Ports tab.
Why do we use partitioning of the session in Informatica?
Partitioning achieves the session performance by reducing the time period of reading the source and
loading the data into target.
Performance can be improved by processing data in parallel in a single session by creating multiple
partitions of the pipeline.
Informatica server can achieve high performance by partitioning the pipleline and performing the extract ,
transformation, and load for each partition in parallel.
To achieve the session partition what r the necessary tasks u have to do?Configure the
session to partition source data.

Install the informatica server on a machine with multiple CPU’s.How the informatica server
increases the session performance through partitioning the source?For a relational
sources informatica server creates multiple connections for each parttion of a single source and extracts
seperate range of data for each connection.Informatica server reads multiple partitions of a single source
concurently.Similarly for loading also informatica server creates multiple connections to the target and
loads partitions of data concurently.

For XML and file sources,informatica server reads multiple files concurently.For loading the data
informatica server creates a seperate file for each partition(of a source file).U can choose to merge the
targets.
Why do you use repository connectivity? Each time you edit or schedule a session, the Informatica Server communicates directly with the repository to check whether or not the session and users are valid. All the metadata of sessions and mappings is stored in the repository.
What is the DTM process? After the Load Manager performs validations for the session, it creates the DTM process. The DTM creates and manages the threads that carry out the session tasks. It creates the master thread, and the master thread creates and manages all the other threads.
What are the different threads in the DTM process?
Master thread: Creates and manages all other threads

Mapping thread: One mapping thread is created for each session. It fetches session and mapping information.

Pre and post session threads: This will be created to perform pre and post session operations.

Reader thread: One thread will be created for each partition of a source.It reads data from source.

Writer thread: It will be created to load data to the target.

Transformation thread: It will be created to transform data.
What are the data movement modes in Informatica? The data movement mode determines how the Informatica Server handles character data. You choose the data movement mode in the Informatica Server configuration settings. Two data movement modes are available in Informatica:

ASCII mode
Unicode mode
What are the output files that the Informatica Server creates while running a session?
Informatica Server log: The Informatica Server (on UNIX) creates a log for all status and error messages (default name: pm.server.log). It also creates an error log for error messages. These files are created in the Informatica home directory.

Session log file: Informatica server creates session log file for each session.It writes information about
session into log files such as initialization process,creation of sql commands for reader and writer
threads,errors encountered and load summary.The amount of detail in session log file depends on the
tracing level that u set.

Session detail file: This file contains load statistics for each targets in mapping.Session detail include
information such as table name,number of rows written or rejected.U can view this file by double clicking
on the session in monitor window

Performance detail file: This file contains information known as session performance details which helps
U where performance can be improved.To genarate this file select the performance detail option in the
session property sheet.

Reject file: This file contains the rows of data that the writer does not write to the targets.

Control file: The Informatica Server creates a control file and a target file when you run a session that uses the external loader. The control file contains information about the target flat file, such as the data format and loading instructions for the external loader.

Post-session email: Post-session email allows you to automatically communicate information about a session run to designated recipients. You can create two different messages: one if the session completed successfully, the other if the session fails.

Indicator file: If u use the flat file as a target,U can configure the informatica server to create indicator
file.For each target row,the indicator file contains a number to indicate whether the row was marked for
insert,update,delete or reject.

Output file: If the session writes to a target file, the Informatica Server creates the target file based on the file properties entered in the session property sheet.

Cache files: When the Informatica Server creates a memory cache, it also creates cache files. The Informatica Server creates index and data cache files for the following transformations:

Aggregator transformation
Joiner transformation
Rank transformation
Lookup transformation
In which circumstances does the Informatica Server create reject files? When it encounters DD_Reject in an Update Strategy transformation, when a row violates a database constraint, or when a field in the row was truncated or overflowed.
Can you copy a session to a different folder or
repository? Yes. By using the Copy Session wizard you can copy a session into a different folder or repository, but that target folder or repository should contain the mapping of that session.
If the target folder or repository does not have the mapping of the session being copied, you have to copy that mapping first before you copy the session.
In addition, you can copy the workflow from the Repository Manager. This automatically copies the mapping, associated sources, targets, and session to the target folder.
What is a batch, and what are the types of batches? A grouping of sessions is known as a batch. Batches are of two types:
Sequential: runs sessions one after the other.
Concurrent: runs sessions at the same time.

If you have sessions with source-target dependencies, you have to use a sequential batch to start the sessions one after another. If you have several independent sessions, you can use concurrent batches, which run all the sessions at the same time.
How many sessions can you create in a batch? Any number of sessions.
When does the Informatica Server mark a batch as failed? If one of the sessions is configured to "run if previous completes" and that previous session fails.
What is the command used to run a batch? pmcmd is used to start a batch.
What are the different options used to configure sequential batches? Two options: run the session only if the previous session completes successfully, or always run the session.
In a sequential batch, can you run a session if the previous session fails? Yes, by setting the option to always run the session.
Can you start a batch within a batch? You cannot. If you want to start a batch that resides inside a batch, create a new independent batch and copy the necessary sessions into the new batch.
Can you start a session inside a batch individually? We can start the required session individually only in the case of a sequential batch; in the case of a concurrent batch we cannot do this.
How can you stop a batch? By using the Server Manager or pmcmd.
What are the session parameters? Session parameters are like mapping parameters; they represent values you might want to change between sessions, such as database connections or source files.

Server manager also allows U to create userdefined session parameters.Following r user defined
session parameters.
Database connections
Source file names: use this parameter when u want to change the name or location of
session source file between session runs
Target file name : Use this parameter when u want to change the name or location of
session target file between session runs.
Reject file name : Use this parameter when u want to change the name or location of
session reject files between session runs.What is parameter file?Parameter file is to define the
values for parameters and variables used in a session.A parameter
file is a file created by text editor such as word pad or notepad.
You can define the following values in a parameter file:
Mapping parameters
Mapping variables
Session parameters
For Windows command prompt users, the parameter file name cannot have beginning or trailing spaces. If the name includes spaces, enclose the file name in double quotes:
-paramfile "$PMRootDir\my file.txt"
Note: When you write a pmcmd command that includes a parameter file located on another machine, use
the backslash (\) with the dollar sign ($). This ensures that the machine where the variable is defined
expands the server variable.
pmcmd startworkflow -uv USERNAME -pv PASSWORD -s SALES:6258 -f east -w
wSalesAvg -paramfile '$PMRootDir/myfile.txt'
How can you access a remote source in your session?
Relational source: To access a relational source situated in a remote place, you need to configure a database connection to the data source.

File source: To access a remote source file, you must configure the FTP connection to the host machine before you create the session.

Heterogeneous: When your mapping contains more than one source type, the Server Manager creates a heterogeneous session that displays source options for all types.
What is the difference between partitioning of a relational target and partitioning of file targets? If you partition a session with a relational target, the Informatica Server creates multiple connections to the target database to write target data concurrently. If you partition a session with a file target, the Informatica Server creates one target file for each partition. You can configure session properties to merge these target files.
What are the transformations that restrict the partitioning of
sessions?
Advanced External Procedure transformation and External Procedure transformation: these transformations contain a check box on the Properties tab to allow partitioning.

Aggregator transformation: if you use sorted ports, you cannot partition the associated source.
Joiner transformation: you cannot partition the master source for a Joiner transformation.

Normalizer transformation

XML targets.
Performance tuning in Informatica? The goal of performance tuning is to optimize


session performance so sessions run during the available load window for the Informatica Server. You can increase session performance in the following ways.

The performance of the Informatica Server is related to network connections. Data generally moves across a network at less than 1 MB per second, whereas a local disk moves data five to twenty times faster. Thus network connections often affect session performance, so avoid unnecessary network connections.

Flat files: If your flat files are stored on a machine other than the Informatica Server, move those files to the machine on which the Informatica Server runs.
Relational data sources: Minimize the connections to sources, targets, and the Informatica Server to improve session performance. Moving the target database onto the server system may improve session performance.
Staging areas: If you use staging areas, you force the Informatica Server to perform multiple data passes. Removing staging areas may improve session performance.

You can run multiple Informatica Servers against the same repository. Distributing the session load across multiple Informatica Servers may improve session performance.

Running the Informatica Server in ASCII data movement mode improves session performance, because ASCII data movement mode stores a character value in one byte, whereas Unicode mode takes two bytes to store a character.

If a session joins multiple source tables in one Source Qualifier, optimizing the query may improve
performance. Also, single table select statements with an ORDER BY or GROUP BY clause may benefit
from optimization such as adding indexes.

We can improve session performance by configuring the network packet size, which controls how much data crosses the network at one time. To do this, go to the Server Manager and choose Server Configure Database Connections.

If your target contains key constraints and indexes, they slow the loading of data. To improve session performance in this case, drop the constraints and indexes before you run the session and rebuild them after the session completes, as in the sketch below.
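A minimal sketch of the pre- and post-session SQL (the index and table names are hypothetical, and the exact syntax depends on your database):

    -- Pre-session SQL: drop the index so bulk loading is not slowed down
    DROP INDEX IDX_SALES_FACT_DATE;

    -- ... the session loads SALES_FACT ...

    -- Post-session SQL: rebuild the index after the load completes
    CREATE INDEX IDX_SALES_FACT_DATE ON SALES_FACT (SALE_DATE);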

Running parallel sessions by using concurrent batches also reduces the time needed to load the data, so concurrent batches may also increase session performance.

Partitioning the session improves session performance by creating multiple connections to sources and targets and loading data in parallel pipelines.

In some cases, if a session contains an Aggregator transformation, you can use incremental aggregation to improve session performance.

Avoid transformation errors to improve session performance.

If the session contains a Lookup transformation, you can improve session performance by enabling the lookup cache.

If your session contains a Filter transformation, place that Filter transformation as close to the sources as possible, or use a filter condition in the Source Qualifier.
Aggregator, Rank, and Joiner transformations often decrease session performance because they must group data before processing it. To improve session performance in this case, use the sorted ports option.
What is the difference between a mapplet and a reusable transformation? A mapplet consists of a set of transformations that is reusable; a reusable transformation is a single transformation that can be reused.

If you create variables or parameters in a mapplet, they cannot be used in another mapping or mapplet. In contrast, the variables created in a reusable transformation can be used in any other mapping or mapplet.

We cannot include source definitions in reusable transformations, but we can add sources to a mapplet.

The whole transformation logic is hidden in the case of a mapplet, whereas it is transparent in the case of a reusable transformation.

We cannot use COBOL Source Qualifier, Joiner, or Normalizer transformations in a mapplet, whereas we can make them reusable transformations.
Define the Informatica repository. The Informatica repository is a
relational database that stores information, or metadata, used by the Informatica Server and Client tools.
Metadata can include information such as mappings describing how to transform source data, sessions
indicating when you want the Informatica Server to perform the transformations, and connect strings for
sources and targets.

The repository also stores administrative information such as usernames and passwords, permissions
and privileges, and product version.

Use the Repository Manager to create the repository. The Repository Manager connects to the repository database and runs the code needed to create the repository tables. These tables store metadata in a specific format that the Informatica Server and client tools use.
What are the types of metadata stored in the repository? The following types of metadata are stored in the repository:
Database connections
Global objects
Mappings
Mapplets
Multidimensional metadata
Reusable transformations
Sessions and batches
Short cuts
Source definitions
Target definitions
Transformations
- Source definitions. Definitions of database objects (tables, views, synonyms) or files that provide source data.
- Target definitions. Definitions of database objects or files that contain the target data.
- Multi-dimensional metadata. Target definitions that are configured as cubes and dimensions.
- Mappings. A set of source and target definitions along with transformations containing business logic that you build into the transformation. These are the instructions that the Informatica Server uses to transform and move data.
- Reusable transformations. Transformations that you can use in multiple mappings.
- Mapplets. A set of transformations that you can use in multiple mappings.
- Sessions and workflows. Sessions and workflows store information about how and when the Informatica Server moves data. A workflow is a set of instructions that describes how and when to run tasks related to extracting, transforming, and loading data. A session is a type of task that you can put in a workflow. Each session corresponds to a single mapping.
What is the PowerCenter repository? The
PowerCenter repository allows you to share metadata across repositories to create a data mart domain.
In a data mart domain, you can create a single global repository to store metadata used across an
enterprise, and a number of local repositories to share the global metadata as needed.
- Standalone repository. A repository that functions individually, unrelated and unconnected to other repositories.
- Global repository. (PowerCenter only.) The centralized repository in a domain, a group of connected repositories. Each domain can contain one global repository. The global repository can contain common objects to be shared throughout the domain through global shortcuts.
- Local repository. (PowerCenter only.) A repository within a domain that is not the global repository. Each local repository in the domain can connect to the global repository and use objects in its shared folders.
How can you work with a remote database in Informatica? Did you work directly by using remote connections?
To work with a remote data source, you need to connect to it with remote connections, but it is not preferable to work with that remote source directly through remote connections. Instead, bring that source onto the local machine where the Informatica Server resides. If you work directly with the remote source, session performance decreases because only a limited amount of data can pass across the network in a given time.
You can work with remote,

But you have to

Configure FTP
Connection details
IP address
User authentication
what is incremantal aggregation?When using incremental aggregation, you apply captured
changes in the source to aggregate calculations in a session. If the source changes only incrementally
and you can capture changes, you can configure the session to process only those changes. This allows
the Informatica Server to update your target incrementally, rather than forcing it to process the entire
source and recalculate the same calculations each time you run the session.
What are the scheduling options to run a session? You can schedule a session to run at a given time or interval, or you can run the session manually.

The different scheduling options are:

Run only on demand: the server runs the session only when the user starts the session explicitly.
Run once: the Informatica Server runs the session only once at a specified date and time.
Run every: the Informatica Server runs the session at regular intervals, as you configured.
Customized repeat: the Informatica Server runs the session at the dates and times specified in the Repeat dialog box.
What is the tracing level, and what are the types of tracing levels? The tracing level represents the amount of information that the Informatica Server writes in a log file.
Types of tracing level
Normal
Verbose
Verbose init
Verbose data
What is the difference between the Stored Procedure transformation and the External Procedure transformation? In the case of a Stored Procedure transformation, the procedure is compiled and executed in a relational data source; you need a database connection to import the stored procedure into your mapping. In an External Procedure transformation, the procedure or function is executed outside the data source, i.e. you need to make it into a DLL to access it in your mapping; no database connection is needed for an External Procedure transformation.
Explain about recovering sessions. If
you stop a session or if an error causes a session to stop, refer to the session and error logs to determine
the cause of failure. Correct the errors, and then complete the session. The method you use to complete
the session depends on the properties of the mapping, session, and Informatica Server configuration.
Use one of the following methods to complete the session:
· Run the session again if the Informatica Server has not issued a commit.
· Truncate the target tables and run the session again if the session is not recoverable.
· Consider performing recovery if the Informatica Server has issued at least one commit.
If a session fails after loading 10,000 records into the target, how can you load the records from the 10,001st record when you run the session the next time? As explained above, the Informatica Server has three methods for recovering sessions. Use perform recovery to load the records from where the session failed.
Explain about perform recovery. When the Informatica Server starts a recovery
session, it reads the OPB_SRVR_RECOVERY table and notes the row ID of the last row committed to
the target database. The Informatica Server then reads all sources again and starts processing from the
next row ID. For example, if the Informatica Server commits 10,000 rows before the session fails, when
you run recovery, the Informatica Server bypasses the rows up to 10,000 and starts loading with row
10,001.
By default, Perform Recovery is disabled in the Informatica Server setup. You must enable Recovery in
the Informatica Server setup before you run a session so the Informatica Server can create and/or write
entries in the OPB_SRVR_RECOVERY table.How to recover the standalone session?A
standalone session is a session that is not nested in a batch. If a standalone session fails, you can run
recovery using a menu command or pmcmd. These options are not available for batched sessions.

To recover sessions using the menu:


1. In the Server Manager, highlight the session you want to recover.
2. Select Server Requests-Stop from the menu.
3. With the failed session highlighted, select Server Requests-Start Session in Recovery Mode from the
menu.

To recover sessions using pmcmd:


1.From the command line, stop the session.
2. From the command line, start recovery.How can u recover the session in sequential
batches?If you configure a session in a sequential batch to stop on failure, you can run recovery
starting with the failed session. The Informatica Server completes the session and then runs the rest of
the batch. Use the Perform Recovery session property

To recover sessions in sequential batches configured to stop on failure:

1.In the Server Manager, open the session property sheet.


2.On the Log Files tab, select Perform Recovery, and click OK.
3.Run the session.
4.After the batch completes, open the session property sheet.
5.Clear Perform Recovery, and click OK.

If you do not clear Perform Recovery, the next time you run the session, the Informatica Server attempts
to recover the previous session.
If you do not configure a session in a sequential batch to stop on failure, and the remaining sessions in
the batch complete, recover the failed session as a standalone session. How to recover sessions
in concurrent batches?If multiple sessions in a concurrent batch fail, you might want to truncate all
targets and run the batch again. However, if a session in a concurrent batch fails and the rest of the
sessions complete successfully, you can recover the session as a standalone session.
To recover a session in a concurrent batch:
1.Copy the failed session using Operations-Copy Session.
2.Drag the copied session outside the batch to be a standalone session.
3.Follow the steps to recover a standalone session.
4. Delete the standalone copy.
How can you complete unrecoverable sessions? Under certain
circumstances, when a session does not complete, you need to truncate the target tables and run the
session from the beginning. Run the session from the beginning when the Informatica Server cannot run
recovery or when running recovery might result in inconsistent data.
What are the circumstances in which the Informatica Server results in an unrecoverable session?
- The Source Qualifier transformation does not use sorted ports.
- You change the partition information after the initial session fails.
- Perform Recovery is disabled in the Informatica Server configuration.
- The sources or targets change after the initial session fails.
- The mapping contains a Sequence Generator or Normalizer transformation.
- A concurrent batch contains multiple failed sessions.
If I make any modifications to my table in the back end, does it reflect in the Informatica warehouse, Mapping Designer, or Source Analyzer? No. Informatica is not at all concerned with the back-end database; it displays the information that is stored in the repository. If you want back-end changes to be reflected on the Informatica screens, you have to import from the back end into Informatica again using a valid connection, and replace the existing definitions with the imported ones.
After dragging the ports of three sources (SQL
server,oracle,informix) to a single source qualifier, can u map these three ports
directly to the target? No. Unless you join those three ports in the Source Qualifier, you cannot map them directly.
If you drag three heterogeneous sources and populate the target without any join, you are creating a Cartesian product; if you do not use a join, not only different sources but even homogeneous sources show the same error (see the SQL sketch below).
If you do not want to use joins at the Source Qualifier level, you can add the joins separately (for example, with a Joiner transformation).
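In SQL terms (the table names A and B and the ID column are hypothetical), the difference is between a Cartesian product and a keyed join:

    -- No join condition: every row of A pairs with every row of B (Cartesian product)
    SELECT * FROM A, B;

    -- Keyed join: only the matching rows are combined
    SELECT A.*, B.* FROM A JOIN B ON A.ID = B.ID;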
What are Target Types on the Server?Target Types are File, Relational, XML and ERP What are
Target Options on the Servers?Target Options for File Target type are FTP File, Loader and MQ.
There are no target options for ERP target type
Target Options for Relational are Insert, Update (as Update), Update (as Insert), Update (else Insert),
Delete, and Truncate Table
How do you identify existing rows of data in the target table using lookup transformation?
Can identify existing rows of data using unconnected lookup transformation.
You can use a Connected Lookup with dynamic cache on the target
What are Aggregate transformation?
The Aggregator transformation is much like the GROUP BY clause in traditional SQL.
This particular transformation is a connected, active transformation that takes the incoming data from the mapping pipeline, groups it based on the group by ports specified, and calculates aggregate functions like AVG, SUM, COUNT, STDDEV, etc. for each of those groups.
From a performance perspective, if your mapping has an Aggregator transformation, use filters and sorters as early in the pipeline as possible if there is any need for them.
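A minimal SQL sketch of what an Aggregator with a group by port on REGION computes (the SALES table and its columns are hypothetical):

    SELECT REGION,
           SUM(AMOUNT) AS TOTAL_AMOUNT,
           COUNT(*)    AS ROW_COUNT
    FROM   SALES
    GROUP  BY REGION;    -- one output row per group, just as the Aggregator emits one row per group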
What are various types of Aggregation?
Various types of aggregation are SUM, AVG, COUNT, MAX, MIN, FIRST, LAST, MEDIAN,
PERCENTILE, STDDEV, and VARIANCE.
What is Code Page Compatibility?Compatibility between code pages is used for accurate data
movement when the Informatica Sever runs in the Unicode data movement mode. If the code pages are
identical, then there will not be any data loss. One code page can be a subset or superset of another. For
accurate data movement, the target code page must be a superset of the source code page.
Superset - A code page is a superset of another code page when it contains the character encoded in the
other code page, it also contains additional characters not contained in the other code page.
Subset - A code page is a subset of another code page when all characters in the code page are encoded
in the other code page.
What is Code Page used for?
Code Page is used to identify characters that might be in different languages. If you are importing
Japanese data into mapping, u must select the Japanese code page of source data.
what is a source qualifier?
It is a transformation which represents the data Informatica server reads from source.
The Source Qualifier represents the rows that the Informatica Server reads when it executes a session. It
represents all data queried from the source.
What are Dimensions and various types of Dimensions?
set of level properties that describe a specific aspect of a business, used for analyzing the factual
measures of one or more cubes, which use that dimension. Egs. Geography, time, customer and
product.
What is Data Transformation Manager?After the load manager performs validations for the
session, it creates the DTM process. The DTM process is the second process associated with the session
run. The primary purpose of the DTM process is to create and manage threads that carry out the session
tasks.
· The DTM allocates process memory for the session and divide it into buffers. This is
also known as buffer memory. It creates the main thread, which is called the master
thread. The master thread creates and manages all other threads.
· If we partition a session, the DTM creates a set of threads for each partition to allow
concurrent processing.. When Informatica server writes messages to the session log it
includes thread type and thread ID. Following are the types of threads that DTM creates:
Master thread - main thread of the DTM process; creates and manages all other threads.
Mapping thread - one thread for each session; fetches session and mapping information.
Pre- and post-session threads - one thread each to perform pre- and post-session operations.
Reader thread - one thread for each partition for each source pipeline.
Writer thread - one thread for each partition, if a target exists in the source pipeline, to write to the target.
Transformation thread - one or more transformation threads for each partition.
What is Session and Batches?Session - A Session Is A set of instructions that tells the Informatica
Server How And When To Move Data From Sources To Targets. After creating the session, we can use
either the server manager or the command line program pmcmd to start or stop the session.Batches - It
Provides A Way to Group Sessions For Either Serial Or Parallel Execution By The Informatica Server.
There Are Two Types Of Batches :
Sequential - Run Session One after the Other.concurrent - Run Session At The Same Time.
Why we use lookup transformations?Lookup Transformations can access data from relational
tables that are not sources in mapping. With Lookup transformation, we can accomplish the following
tasks:
Get a related value - get the employee name from the Employee table based on the employee ID.
Perform a calculation.
Update slowly changing dimension tables - we can use an unconnected Lookup transformation to determine whether the records already exist in the target or not.

ETL Questions and Answers


what is the metadata extension?

Informatica allows end users and partners to extend the metadata stored in the repository by associating
information with individual objects in the repository. For example, when you create a mapping, you can
store your contact information with the mapping. You associate information with repository metadata
using metadata extensions.
Informatica Client applications can contain the following types of metadata extensions:
• Vendor-defined. Third-party application vendors create vendor-defined metadata extensions.
You can view and change the values of vendor-defined metadata extensions, but you cannot
create, delete, or redefine them.
• User-defined. You create user-defined metadata extensions using PowerCenter/PowerMart.
You can create, edit, delete, and view user-defined metadata extensions. You can also change
the values of user-defined extensions.
What is ODS (Operational Data Store)?
ANS1: ODS - Operational Data Store.
The ODS comes between the staging area and the data warehouse. The data in the ODS is at a low level of
granularity.
Once data is populated in the ODS, aggregated data is loaded into the EDW through the ODS.
ANS2: An updatable set of integrated operational data used for enterprise-wide tactical decision
making. It contains live data, not snapshots, and retains minimal history.
Can we lookup a table from a source qualifier transformation, i.e. an unconnected lookup?
You cannot lookup from a source qualifier directly. However, you can override the SQL in the source
qualifier to join with the lookup table to perform the lookup.
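For illustration, a minimal sketch of such an override, assuming an ORDERS source table and a CUSTOMER lookup table that share a CUST_ID column (all of these names are only examples, not from the original question):

SELECT o.ORDER_ID,
       o.CUST_ID,
       o.ORDER_AMT,
       c.CUST_NAME        -- value that would otherwise come from a Lookup transformation
FROM   ORDERS o
       LEFT OUTER JOIN CUSTOMER c
       ON c.CUST_ID = o.CUST_ID

The outer join keeps source rows even when no matching lookup row exists, which mirrors the behaviour of a lookup that returns NULL on no match.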
What are the different Lookup methods used in Informatica?
In the Lookup transformation there are mainly 2 types:
1) Connected 2) Unconnected lookup
Connected lookup: 1) It receives the value directly from the pipeline.
2) It can use either a dynamic or a static cache.
3) It can return multiple values.
4) It supports user-defined default values.
Unconnected lookup: 1) It receives the value from a :LKP expression in another transformation.
2) It can use only a static cache.
3) It returns only a single value.
4) It does not support user-defined default values.
What are parameter files ? Where do we use them?
A parameter file is any text file where you can define a value for a parameter defined in the Informatica
session; this parameter file can be referenced in the session properties. When the Informatica session
runs, the values for the parameters are fetched from the specified file. For example, $$ABC is defined in the
Informatica mapping and the value for this variable is defined in the file called abc.txt as:
[foldername_session_name]
$$ABC='hello world'

In the session properties you can give abc.txt in the parameter file name field.
What is a mapping, session, worklet, workflow, mapplet?
Mapping - represents the flow and transformation of data from source to target.
Mapplet - a group of transformations that can be called within a mapping.
Session - a task associated with a mapping to define the connections and other configurations for that
mapping.
Workflow - controls the execution of tasks such as commands, emails and sessions.
Worklet - a workflow that can be called within a workflow.
What is the difference between Power Center & Power Mart?


Power Mart is designed for:

Low range of warehouses


only for local repositories
mainly desktop environment.

Power Center is designed for:

High-end warehouses
Global as well as local repositories
ERP support.
Can Informatica load heterogeneous targets from heterogeneous sources?
Yes, Informatica can load heterogeneous targets from heterogeneous sources.
What are the various tools? - Name a few
The various ETL tools are as follows.

Informatica
Datastage
Business Objects Data Integrator
Ab Initio

OLAP tools are as follows.

Cognos
Business Objects
What are snapshots? What are materialized views & where do we use them? What is
a materialized view log?
A materialized view is a view in which the data is also physically stored, i.e. with the ordinary view
concept in the DB we only store the query, and once we call the view it extracts data from the DB. In a
materialized view the data is stored in the database and refreshed from the base tables. A snapshot is the
older Oracle name for a materialized view, and a materialized view log is a log table on the master table
that records changes so that the materialized view can be fast refreshed.
What is partitioning? What are the types of partitioning?
Partitioning is a part of physical data warehouse design that is carried out to improve performance and
simplify stored-data management. Partitioning is done to break up a large table into smaller,
independently-manageable components because it:
1. reduces work involved with addition of new data.
2. reduces work involved with purging of old data.

Two types of partitioning are:


1. Horizontal partitioning.
2. Vertical partitioning (reduces efficiency in the context of a data warehouse).
What is Full load & Incremental or Refresh load?
Full Load is the entire data dump load taking place the very first time.
Gradually to synchronize the target data with source data, there are further 2 techniques:-
Refresh load - Where the existing data is truncated and reloaded completely.
Incremental - Where delta or difference between target and source data is dumped at regular intervals.
Timestamp for previous delta load has to be maintained.

What are the modules in Power Mart?

1. Power Mart Designer


2. Server
3. Server Manager
4. Repository
5. Repository Manager
What is a staging area? Do we need it? What is the purpose of a staging area?
A staging area is a place where you hold temporary tables on the data warehouse server. Staging tables are
connected to the work area or fact tables. We basically need a staging area to hold the data and perform data
cleansing and merging before loading the data into the warehouse.
A staging area is like a large table with data separated from their sources to be loaded into a data
warehouse in the required format. If we attempt to load data directly from OLTP, it might mess up the
OLTP because of format changes between a warehouse and OLTP. Keeping the OLTP data intact is very
important for both the OLTP and the warehouse.
A staging area is a temporary schema used to:
1. Do flat mapping, i.e. dump all the OLTP data into it without applying any business rules. Pushing
data into staging takes less time because no business rules or transformations are applied to it.

2. Do data cleansing and validation (for example using First Logic).


How to determine what records to extract?
Data modeler will provide the ETL developer, the tables that are to be extracted from various sources.
When addressing a table, some dimension key must reflect the need for a record to get extracted. Mostly
it will be from the time dimension (e.g. date >= 1st of current month) or a transaction flag (e.g. Order Invoiced
Status). A foolproof method would be to add an archive flag to each record, which gets reset when the record changes.
What are the various transformation available?

Aggregator Transformation
Expression Transformation
Filter Transformation
Joiner Transformation
Lookup Transformation
Normalizer Transformation
Rank Transformation
Router Transformation
Sequence Generator Transformation
Stored Procedure Transformation
Sorter Transformation
Update Strategy Transformation
XML Source Qualifier Transformation
Advanced External Procedure Transformation
External Procedure Transformation

What is a three tier data warehouse?

Three tier data warehouse contains three tier such as bottom tier, middle tier and top tier.
Bottom tier deals with retrieving related data or information from various information repositories by
using SQL.
Middle tier contains two types of servers.
1. ROLAP server
2. MOLAP server
Top tier deals with presentation or visualization of the results . The 3 tiers are:
1. Data tier - bottom tier - consists of the database
2. Application tier - middle tier - consists of the analytical server
3. Presentation tier - tier that interacts with the end-user
Do we need an ETL tool? When do we go for the tools in the market?

ETL Tools are meant to extract, transform and load the data into Data Warehouse for decision making.
Before the evolution of ETL Tools, the above mentioned ETL process was done manually by using SQL
code created by programmers. This task was tedious and cumbersome in many cases since it involved
many resources, complex coding and more work hours. On top of it, maintaining the code placed a great
challenge among the programmers.

These difficulties are eliminated by ETL Tools since they are very powerful and they offer many
advantages in all stages of ETL process starting from extraction, data cleansing, data profiling,
transformation, debugging and loading into data warehouse when compared to the old method.
1. Normally ETL stands for Extraction, Transformation and Loading.

2. ETL tools help you to extract the data from different ODSs/databases.

3. If you have a requirement like this you need to get an ETL tool; otherwise you do not need any
ETL tool.

How can we use mapping variables in Informatica? Where do we use them?

After creating a variable, we can use it in any expression in a mapping or a mapplet. They can also be
used in the source qualifier filter, user-defined joins or extract overrides, and in the expression editor of
reusable transformations.
Their values can change automatically between sessions.

What are the various methods of getting incremental records or delta records from the source
systems

Incremental or delta records can be extracted from the source systems by filtering the source query on a
timestamp or audit column (typically using a mapping variable that stores the last extract date), by using
change data capture on the source, or, for aggregated targets, by using the incremental aggregation
session option.
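As a sketch of the first technique (the SALES_TXN table, the LAST_UPDATE_DATE column and the $$LAST_EXTRACT_DATE mapping variable are assumed names), the Source Qualifier SQL override or source filter could look like:

-- extract only the rows changed since the previous successful run
SELECT *
FROM   SALES_TXN
WHERE  LAST_UPDATE_DATE > TO_DATE('$$LAST_EXTRACT_DATE', 'YYYY-MM-DD HH24:MI:SS')

Informatica expands the mapping variable as text before running the query, and the variable is advanced after each run (for example with SetMaxVariable) so the next run picks up only newer rows.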
Techniques of Error Handling - Ignore , Rejecting bad records to a flat file , loading
the records and reviewing them (default values)
Records are rejected either at the database, due to constraint key violations, or by the Informatica server when
writing data into the target table. These rejected records can be found in the bad file folder, where a reject file
is created for each session. We can check why a record has been rejected: the bad file contains a row indicator
in the first column and a column indicator for each column.
The row indicators are of four types:
D - valid data,
O - overflowed data,
N - null data,
T - truncated data,
and depending on these indicators we can make changes to load the data successfully to the target.
Can we use procedural logic inside Informatica? If yes, how? If not, how can we use
external procedural logic in Informatica?
We can use the External Procedure transformation to call external procedures. Both COM and Informatica
procedures are supported using the External Procedure transformation.
Can we override a native sql query within Informatica? Where do we do it? How do
we do it?
We can override the SQL query in the SQL override property of a Source Qualifier.
What is latest version of Power Center / Power Mart?
Power Center 7.1
How do we call shell scripts from Informatica?

You can use a Command task to call the shell scripts, in the following ways:
1. Standalone Command task. You can use a Command task anywhere in the workflow or worklet to run
shell commands.
2. Pre- and post-session shell command. You can call a Command task as the pre- or post-session shell
command for a Session task.

What is Informatica Metadata and where is it stored?

Informatica Metadata contains all the information about the source tables, target tables, the
transformations, so that it will be useful and easy to perform transformations during the ETL process.

The Informatica Metadata is stored in Informatica repository


What are active transformation / Passive transformations?
An active transformation can change the number of rows as output after a transformation, while a passive
transformation does not change the number of rows and passes through the same number of rows that
was given to it as input.
Transformations can be active or passive. An active transformation can change the number of rows that
pass through it, such as a Filter transformation that removes rows that do not meet the filter condition. A
passive transformation does not change the number of rows that pass through it, such as an Expression
transformation that performs a calculation on data and passes all rows through the transformation
Active transformations
Advanced External Procedure
Aggregator
Application Source Qualifier
Filter
Joiner
Normalizer
Rank
Router
Update Strategy
Passive transformation
Expression
External Procedure
Mapplet Input
Lookup
Sequence generator
XML Source Qualifier
Mapplet Output
When do we Analyze the tables? How do we do it?

When the data in the data warehouse changes frequently we need to analyze the tables. Analyzing the tables
computes/updates the table statistics, which helps to boost the performance of your SQL.
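In Oracle, for example, the statistics can be gathered in either of the following ways (the schema and table names are illustrative only):

-- older ANALYZE syntax
ANALYZE TABLE SALES_FACT COMPUTE STATISTICS;

-- recommended DBMS_STATS package (run from SQL*Plus)
EXEC DBMS_STATS.GATHER_TABLE_STATS(ownname => 'DWH', tabname => 'SALES_FACT');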

Compare ETL & Manual development?

There are pros and cons of both tool based ETL and hand-coded ETL. Tool based ETL provides
maintainability, ease of development and graphical view of the flow. It also reduces the learning curve on
the team.
Handcoded ETL is good when there is minimal transformational logic involved. It is also good when the
sources and targets are in the same environment. However, depending on the skill level of the team, this
can extend the overall development time.
Primary Key Materialized Views
The following statement creates a primary-key materialized view on the table emp located on a remote database:

SQL> CREATE MATERIALIZED VIEW mv_emp_pk
     REFRESH FAST START WITH SYSDATE NEXT SYSDATE + 1/48
     WITH PRIMARY KEY
     AS SELECT * FROM emp@remote_db;
Materialized view created.

Note: When you create a materialized view using the FAST option you will need to create a view log on the master table(s) as shown below:

SQL> CREATE MATERIALIZED VIEW LOG ON emp;
Materialized view log created.

Rowid Materialized Views
The following statement creates the rowid materialized view on table emp located on a remote database:

SQL> CREATE MATERIALIZED VIEW mv_emp_rowid
     REFRESH WITH ROWID
     AS SELECT * FROM emp@remote_db;
Materialized view created.

Subquery Materialized Views
The following statement creates a subquery materialized view based on the emp and dept tables located on the remote database:

SQL> CREATE MATERIALIZED VIEW mv_empdept
     AS SELECT * FROM emp@remote_db e
        WHERE EXISTS (SELECT * FROM dept@remote_db d
                      WHERE e.dept_no = d.dept_no);

REFRESH CLAUSE
[refresh [fast|complete|force] [on demand | commit] [start with date] [next date] [with {primary key|rowid}]]
The refresh option specifies:
a. The refresh method used by Oracle to refresh data in the materialized view
b. Whether the view is primary key based or rowid based
c. The time and interval at which the view is to be refreshed

Refresh Method - FAST Clause
Fast refreshes use the materialized view logs (as seen above) to send the rows that have changed from the master tables to the materialized view. You should create a materialized view log for the master tables if you specify the REFRESH FAST clause.

SQL> CREATE MATERIALIZED VIEW LOG ON emp;
Materialized view log created.

Materialized views are not eligible for fast refresh if the defining subquery contains an analytic function.

Refresh Method - COMPLETE Clause
The complete refresh re-creates the entire materialized view. If you request a complete refresh, Oracle performs a complete refresh even if a fast refresh is possible.

Refresh Method - FORCE Clause
When you specify a FORCE clause, Oracle will perform a fast refresh if one is possible or a complete refresh otherwise. If you do not specify a refresh method (FAST, COMPLETE, or FORCE), FORCE is the default.

PRIMARY KEY and ROWID Clause
WITH PRIMARY KEY is used to create a primary key materialized view, i.e. the materialized view is based on the primary key of the master table instead of ROWID (for the ROWID clause). PRIMARY KEY is the default option. To use the PRIMARY KEY clause you should have defined a PRIMARY KEY on the master table, or else you should use ROWID based materialized views. Primary key materialized views allow materialized view master tables to be reorganized without affecting the eligibility of the materialized view for fast refresh. Rowid materialized views should have a single master table and cannot contain any of the following:
• Distinct or aggregate functions
• GROUP BY, subqueries, joins & set operations

Timing the refresh
The START WITH clause tells the database when to perform the first replication from the master table to the local base table. It should evaluate to a future point in time. The NEXT clause specifies the interval between refreshes.

SQL> CREATE MATERIALIZED VIEW mv_emp_pk
     REFRESH FAST START WITH SYSDATE NEXT SYSDATE + 2
     WITH PRIMARY KEY
     AS SELECT * FROM emp@remote_db;
Materialized view created.

In the above example, the first copy of the materialized view is made at SYSDATE and the interval at which the refresh is performed is every two days.


Informatica Interview Questions - Part 15


What are active and passive transformations?
Transformations can be active or passive. An active transformation can change the number of
rows that pass through it, such as a Filter transformation that removes rows that do not meet the
filter condition.

A passive transformation does not change the number of rows that pass through it, such as an
Expression transformation that performs a calculation on data and passes all rows through the
transformation.

What is tracing level and what are the types of tracing levels?
Tracing level represents the amount of information that the Informatica server writes in a log file.

Types of tracing level:


Normal
Verbose
Verbose init
Verbose data
How can you say that union Transormation is Active transformation?
By definition, an active transformation is a transformation that can change the number of rows that
pass through it. The Union transformation merges rows from multiple input pipelines into one output
pipeline, so the rows it emits do not correspond one-to-one with the rows of any single input.

Is a fact table normalized or de-normalized?


A fact table is always a DENORMALISED table. It consists of the primary keys of the dimension tables
(as foreign keys) together with the measures.


Informatica Interview Questions - Part 14


What are the different threads in DTM process?
Master thread: Creates and manages all other threads

Mapping thread: One mapping thread will be created for each session. It fetches session and
mapping information.

Pre and post session threads: This will be created to perform pre and post session
operations.

Reader thread: One thread will be created for each partition of a source. It reads data from
source.

Writer thread: It will be created to load data to the target.

Transformation thread: It will be created to transform data.

If we are using Update Strategy Transformation in a mapping how can we know whether
insert or update or reject or delete option has been selected during running of sessions
in Informatica?
In the Designer, while creating the Update Strategy transformation, uncheck the "Forward Rejected Rows"
option. If there are any rejected rows they will then automatically be written to the session log
file.
Updated or inserted rows are known only by checking the target file or table.

How to join two tables without using the Joiner Transformation?


It is possible to join two or more tables by using the Source Qualifier, provided the tables
have a relationship and reside in the same database.

When you drag and drop the tables you will get a Source Qualifier for each table. Delete
all the Source Qualifiers and add one common Source Qualifier for all of them. Right click on the Source Qualifier
and choose Edit, click on the Properties tab, and in the SQL Query property you can write your own SQL.
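As a sketch, the user-defined SQL for the common Source Qualifier could look like the following, assuming the classic EMP and DEPT tables related on DEPTNO (names are illustrative):

-- join performed in the database instead of a Joiner transformation
SELECT e.EMPNO,
       e.ENAME,
       e.DEPTNO,
       d.DNAME
FROM   EMP e,
       DEPT d
WHERE  e.DEPTNO = d.DEPTNO

The column list must match the connected ports of the Source Qualifier in the same order.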

Which is better among incremental load, Normal Load and Bulk load?
It depends on the requirement. Otherwise Incremental load can be better as it takes only that
data which is not available previously on the target.

What is the difference between summary filter and detail filter?


A summary filter can be applied on a group of rows that contain a common value, whereas a detail
filter can be applied on each and every record of the database.

What are the tasks that Load manger process will do?
Manages the session and batch scheduling: When you start the Informatica server the load
manager launches and queries the repository for a list of sessions configured to run on the
Informatica server. When you configure the session, the load manager maintains a list of
sessions and session start times. When you start a session, the load manager fetches the session
information from the repository to perform the validations and verifications prior to starting the DTM
process.

Locking and reading the session: When the informatica server starts a session load manager
locks the session from the repository. Locking prevents starting the session again and again.

Reading the parameter file: If the session uses a parameter file, the load manager reads the
parameter file and verifies that the session level parameters are declared in the file.

Verifies permissions and privileges: When the session starts, the load manager checks whether or
not the user has the privileges to run the session.
Creating log files: The load manager creates a log file that contains the status of the session.


Informatica Interview Questions - Part 13


What is Router transformation?
Router transformation allows you to use a condition to test data. It is similar to filter
transformation. It allows the testing to be done on one or more conditions.

What type of metadata is stored in repository?


Source definitions: Definitions of database objects (tables, views, synonyms) or files that
provide source data.

Target definitions: Definitions of database objects or files that contain the target data.

Multi-dimensional metadata: Target definitions that are configured as cubes and dimensions.

Mappings: A set of source and target definitions along with transformations containing business
logic that you build into the transformation. These are the instructions that the Informatica
Server uses to transform and move data.

Reusable transformations: Transformations that you can use in multiple mappings.

Mapplets: A set of transformations that you can use in multiple mappings.

Sessions and workflows: Sessions and workflows store information about how and when the
Informatica Server moves data. A workflow is a set of instructions that describes how and when
to run tasks related to extracting, transforming, and loading data. A session is a type of task that
you can put in a workflow. Each session corresponds to a single mapping.

How to delete duplicate rows in flat files source?


Use a Sorter transformation; it has a "Distinct" option, so make use of it.
Can you use aggregator/active transformation after update strategy transformation?
You can use aggregator after update strategy. The problem will be, once you perform the
update strategy, say you had flagged some rows to be deleted and you had performed
aggregator transformation for all rows, say you are using SUM function, then the deleted rows
will be subtracted from this aggregator transformation.

What is the difference between dimension table and fact table and what are different
dimension tables and fact tables?
A fact table contains measurable data and a primary key, which is typically composed of the foreign keys to the dimension tables.

Different types of fact tables:


1. Additive
2. Non additive
3. Semi additive

A dimension table contains textual descriptions of the data.

It contains a primary key.


Informatica Interview Questions - Part 12


What is meant by lookup cache?
The Informatica server builds a cache in memory when it processes the first row of data in a
cached Lookup transformation. It allocates memory for the cache based on the amount you
configure in the transformation or session properties. The Informatica server stores condition
values in the index cache and output values in the data cache.

Can you use the mapping parameters or variables created in one mapping into any other
reusable transformation?
Yes, because a reusable transformation is not contained within any mapplet or mapping.

What are reusable transformations?


You can design them using two methods:
1. Using the Transformation Developer
2. Creating a normal transformation and promoting it to reusable

What is Code Page used for?


Code Page is used to identify characters that might be in different languages. If you are
importing Japanese data into mapping, you must select the Japanese code page of source
data.

Can you use a session Bulk loading options and during this time can you make a
recovery to the session?
If the session is configured to use in bulk mode it will not write recovery information to recovery
tables. So Bulk loading will not perform the recovery as required.

What are the differences between connected and unconnected lookup?


Connected lookup:
1) Receives input values directly from the pipe line.
2) you can use a dynamic or static cache.
3) Cache includes all lookup columns used in the mapping.
4) Support user defined default values.

Unconnected lookup:
1) Receives input values from the result of a lkp expression in a another transformation.
2) You can use a static cache.
3) Cache includes all lookup output ports in the lookup condition and the lookup/return port.
4) Does not support user defined default values.


Informatica Interview Questions - Part 11


What are the scheduling options to run a session?
A session can be scheduled to run at a given time or interval, or you can manually run the
session.
Different options of scheduling:
Run only on demand: server runs the session only when user starts session explicitly.
Run once: Informatica server runs the session only once at a specified date and time.
Run every: Informatica server runs the session at regular intervals as you configure.
Customized repeat: Informatica server runs the session at the dates and times specified in the
repeat dialog box.

What is parameter file?


Parameter file is to define the values for parameters and variables used in a session. A
parameter file is a file created by text editor such as word pad or notepad.

You can define the following values in parameter file:


Mapping parameters
mapping variables
session parameters.

What are the session parameters?


Session parameters are like mapping parameters, that represent values you might want to
change between sessions such as database connections or source files.
Server manager also allows you to create user defined session parameters. Following are user
defined session parameters:
Database connections
Source file names: Use this parameter when you want to change the name or location of
session source file between session runs.
Target file name: Use this parameter when you want to change the name or location of session
target file between session runs.
Reject file name: Use this parameter when you want to change the name or location of session
reject files between session runs.

In a sequential batch can you run the session if previous session fails?
Yes, by setting the "Always runs the session" option for that session.

How can you transform row to a column?


1. We can use the Normalizer transformation,
or
2. Use the PIVOT function in Oracle
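As a sketch of the Oracle option (available from Oracle 11g onwards; the SALES table and its PRODUCT/QUARTER/AMOUNT columns are assumed names):

-- turn one row per (product, quarter) into one row per product with a column per quarter
SELECT *
FROM   (SELECT PRODUCT, QUARTER, AMOUNT FROM SALES)
PIVOT  (SUM(AMOUNT) FOR QUARTER IN ('Q1' AS Q1, 'Q2' AS Q2, 'Q3' AS Q3, 'Q4' AS Q4));

On older Oracle versions the same result can be obtained with DECODE or CASE expressions inside an aggregate.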

What are the basic needs to join two sources in a source qualifier?
Basic need to join two sources using source qualifier:
1) Both sources should be in the same database
2) They should have at least one common column with the same data type


Informatica Interview Questions - Part 10


What are two types of processes that informatica runs the session?
Load manager Process: Starts the session, creates the DTM process, and sends post-session
email when the session completes.
The DTM process: Creates threads to initialize the session, read, write, and transform data,
and handle pre- and post-session operations.
What are mapping parameters and variables in which situation we can use it ?
If we need to change certain attributes of a mapping after every time the session is run, it will be
very difficult to edit the mapping and then change the attribute. So we use mapping parameters
and variables and define the values in a parameter file. Then we could edit the parameter file to
change the attribute values. This makes the process simple.

Mapping parameter values remain constant. If we need to change the parameter value then we
need to edit the parameter file.

But value of mapping variables can be changed by using variable function. If we need to
increment the attribute value by 1 after every session run then we can use mapping variables.

In a mapping parameter we need to manually edit the attribute value in the parameter file after
every session run.

What is the method of loading 5 flat files of having same structure to a single target and
which transformations I can use?
Two methods:
1. Write all the files into one directory and then use a file list (don't forget to set the source file
type to Indirect in the session).
2. Use a Union transformation to combine the multiple input files into a single target.

In which circumstances that informatica server creates Reject files?


When it encounters DD_Reject in an Update Strategy transformation.
When a row violates a database constraint.
When a field in the row was truncated or overflowed.

What is the default join that source qualifier provides?


Inner equi join.

What is the difference between Stored Procedure (DB level) and Stored proc trans
(INFORMATICA level) ? Why should we use SP trans ?
First of all, stored procedures (at the DB level) are a series of SQL statements that are stored
and compiled on the server side. In Informatica, the Stored Procedure transformation calls those same stored
procedures which are stored in the database. Stored procedures are used to automate time-
consuming tasks that are too complicated for standard SQL statements. If you don't want to use
a stored procedure then you have to create an Expression transformation and do all the coding in
it.
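As an illustration, a minimal Oracle stored procedure of the kind a Stored Procedure transformation could call (the procedure, table and column names are purely assumed for this sketch):

CREATE OR REPLACE PROCEDURE GET_EXCHANGE_RATE
  (p_currency IN  VARCHAR2,
   p_rate     OUT NUMBER)
AS
BEGIN
  -- look up the conversion rate for the given currency code
  SELECT RATE
    INTO p_rate
    FROM CURRENCY_RATES
   WHERE CURRENCY_CD = p_currency;
END;
/

In the mapping, an input port would feed p_currency and an output port would return p_rate to the downstream transformation.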

Informatica Interview Questions - Part 9


What are variable ports and list two situations when they can be used?
We mainly have three ports: Input, Output and Variable. An Input port represents data flowing into the
transformation. An Output port is used when data is mapped to the next transformation. A Variable port is
used when mathematical calculations are required, for example to hold an intermediate result or to
compare the current row with the value carried over from the previous row.
This is a scenario in which the source has 2 columns:
10 A
10 A
20 C
30 D
40 E
20 C
and there should be 2 targets, one to show the duplicate values and another target for the distinct
rows:
T1 (duplicates): 10 A, 20 C
T2 (distinct): 10 A, 20 C, 30 D, 40 E

Which transformations can be used to load data into the targets?

Step 1: Sort the source data based on the unique key.

Step 2: In an Expression transformation, compare each row with the previous one:

flag = IIF(col1 = prev_col1, 'Y', 'N')
prev_col1 = col1

Step 3: Router:
1. For duplicate records, condition: flag = 'Y'
2. For distinct records, condition: flag = 'N'
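For reference, the same split can be expressed in SQL (purely a sketch, assuming the data sat in a relational table SRC with columns COL1 and COL2):

-- target T2: distinct rows
SELECT COL1, COL2 FROM SRC GROUP BY COL1, COL2;

-- target T1: rows that occur more than once
SELECT COL1, COL2 FROM SRC GROUP BY COL1, COL2 HAVING COUNT(*) > 1;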

What are the types of lookup caches?


1) Static Cache
2) Dynamic Cache
3) Persistent Cache
4) Reusable Cache
5) Shared Cache

What are the real times problems that generally come up while doing/running
mapping/any transformation? Explain with an example?
Here are few real time examples of problems while running informatica mappings:

1) Informatica uses ODBC connections to connect to the databases. The database passwords
(production) are changed periodically and the change is not made on the Informatica
side. Your mappings will fail in this case and you will get a database connectivity error.
2) If you are using Update strategy transformation in the mapping, in the session properties
you have to select Treat Source Rows: Data Driven. If we do not select this Informatica
server will ignore updates and it only inserts rows.

3) If we have mappings loading multiple target tables we have to provide the Target Load Plan
in the sequence we want them to get loaded.
4) Error: Snapshot too old is a very common error when using Oracle tables. We get this error
while using too large tables. Ideally we should schedule these loads when server is not very
busy (meaning when no other loads are running).
5) We might get some poor performance issues while reading from large tables. All the source
tables should be indexed and updated regularly.

Informatica Interview Questions - Part 8


Is sorter an active or passive transformation? What happens if we uncheck the distinct
option in sorter? Will it be under active or passive transformation?
The Sorter is an active transformation. If you don't check the distinct option it is considered a
passive transformation, because it is the distinct option that eliminates the duplicate records from the
data and so changes the row count.

How can we partition a session in Informatica?


Partitioning option optimizes parallel processing on multiprocessor hardware by providing a
thread-based architecture and built-in data partitioning. GUI-based tools reduce the
development effort necessary to create data partitions and streamline ongoing troubleshooting
and performance tuning tasks, while ensuring data integrity throughout the execution process.
As the amount of data within an organization expands and real-time demand for information
grows, the Power Center Partitioning option enables hardware and applications to provide
outstanding performance and jointly scale to handle large volumes of data and users.

In update strategy target table or flat file which gives more performance? Why?
Pros of a flat file target: loading, sorting and merging operations will be faster as there is no index concept and the data
will be in ASCII mode.
Cons: there is no concept of updating existing records in a flat file, and as there are no indexes,
lookups will be slower.

What is the difference between constraint base load ordering and target load plan ?
Constraint based load ordering

Example:
Table 1 --- Master
Table 2 --- Detail

If the data in Table-1 is dependent on the data in Table-2 then Table-2 should be loaded first. In
such cases to control the load order of the tables we need some conditional loading which is
nothing but constraint based load. In Informatica this feature is implemented by just one check
box at the session level.

What is parameter file?


When you start a workflow, you can optionally enter the directory and name of a parameter file.
The Informatica Server runs the workflow using the parameters in the file you specify.

For UNIX shell users, enclose the parameter file name in single quotes:
-paramfile '$PMRootDir/myfile.txt'

For Windows command prompt users, the parameter file name cannot have beginning or
trailing spaces. If the name includes spaces, enclose the file name in double quotes:
-paramfile "$PMRootDir\my file.txt"

Note: When you write a pmcmd command that includes a parameter file located on another
machine, use the backslash (\) with the dollar sign ($). This ensures that the machine where the
variable is defined expands the server variable.
pmcmd startworkflow -UV USERNAME -PV PASSWORD -s SALES:6258 -f east -w wSalesAvg
-paramfile '\$PMRootDir/myfile.txt'

Informatica interview questions - Part 7


Define informatica repository?
Informatica Repository: The Informatica repository is at the center of the Informatica suite. You
create a set of metadata tables within the repository database that the Informatica applications
and tools access. The Informatica client and server access the repository to save and retrieve
metadata.

What are the difference between joiner transformation and source qualifier
transformation?
Joiner Transformation can be used to join tables from heterogeneous (different sources), but we
still need a common key from both tables. If we join two tables without a common key we will
end up in a Cartesian Join. Joiner can be used to join tables from difference source systems
whereas the Source Qualifier can be used to join tables in the same database. We definitely need a
common key to join two tables, no matter whether they are in the same database or in different databases.

How can you improve session performance in aggregator transformation?


One way is supplying the sorted input to aggregator transformation. In situations where sorted
input cannot be supplied, we need to configure data cache and index cache at
session/transformation level to allocate more space to support aggregation.

What is the difference between connected and unconnected stored procedures?


Unconnected: The unconnected Stored Procedure transformation is not connected directly to
the flow of the mapping. It either runs before or after the session, or is called by an expression
in another transformation in the mapping.
Connected: The flow of data through a mapping in connected mode also passes through the
Stored Procedure transformation. All data entering the transformation through the input ports
affects the stored procedure. You should use a connected Stored Procedure transformation
when you need data from an input port sent as an input parameter to the stored procedure, or
the results of a stored procedure sent as an output parameter to another transformation.

Informatica interview questions - Part 6


Explain error handling in informatica with examples?
There is one file called the bad file which generally has the format as *.bad and it contains the
records rejected by informatica server. There are two parameters one for the types of row and
other for the types of columns. The row indicators signify what operation is going to take place
(i.e. insertion, deletion, updating etc.). The column indicators contain information regarding why
the column has been rejected. (Such as violation of not null constraint, value error, overflow
etc.) If one rectifies the error in the data present in the bad file and then reloads the data in the
target, then the table will contain only valid data.

What is power center repository?


Standalone repository: A repository that functions individually, unrelated and unconnected to
other repositories.
Global repository: (Power Center only.) The centralized repository in a domain, a group of
connected repositories. Each domain can contain one global repository. The global repository
can contain common objects to be shared throughout the domain through global shortcuts.
Local repository. (Power Center only.) A repository within a domain that is not the global
repository. Each local repository in the domain can connect to the global repository and use
objects in its shared folders.

Explain difference between static and dynamic cache with one example?
Static cache: once the data is cached, it does not change during the session. For example, an unconnected
lookup can use only a static cache.
Dynamic cache: the cache is updated during the session to reflect the inserts and updates made to the
target it refers to. Only a connected lookup can use a dynamic cache.

What is update strategy transformation?


The model you choose constitutes your update strategy, how to handle changes to existing
rows. In Power Center and Power Mart, you set your update strategy at two different levels:

Within a session. When you configure a session, you can instruct the Informatica Server to
either treat all rows in the same way (for example, treat all rows as inserts), or use instructions
coded into the session mapping to flag rows for different database operations.
Within a mapping. Within a mapping, you use the Update Strategy transformation to flag rows
for insert, delete, update, or reject.

Explain Informatica server Architecture?


The Informatica server components are the load manager, the data transformation manager (DTM), the
reader, the temp server and the writer. First the load manager sends a request to the reader; the reader
reads the data from the source and dumps it into the temp server; the data transformation manager
manages the load and sends the request to the writer on a first-in, first-out basis; and the writer takes the
data from the temp server and loads it into the target.

What is Data driven?


The Informatica server follows instructions coded into Update Strategy transformations within
the session mapping to determine how to flag records for insert, update, delete or reject. If you do
not choose the Data Driven option, the Informatica server ignores all Update Strategy
transformations in the mapping.

How the informatica server sorts the string values in Rank transformation?
We can run the Informatica server in either UNICODE data movement mode or ASCII data movement
mode.
Unicode mode: in this mode the Informatica server sorts the data as per the sort order configured for the
session.
ASCII mode: in this mode the Informatica server sorts the data as per the binary order.
When do you use an unconnected lookup and connected lookup?
Or
what is the difference between dynamic and static lookup?
Or
Why and when do we use dynamic and static lookup?
In a static lookup cache, the Informatica server caches all the lookup data at the start of the session and
the cache does not change while the session runs. In a dynamic lookup cache, the server inserts or
updates rows in the cache as it processes rows, so the cache stays in sync with the target; this is typically
used when the lookup table is also the target, for example when loading slowly changing dimensions and
you need to detect rows inserted earlier in the same run. Caching adds to the session start-up time, but it
saves time overall as Informatica does not need to query the database for every row that needs a lookup.
Depending on how many rows in your mapping need a lookup, you can decide on this. Also remember
that the lookup cache eats up space, so remember to select only those columns which are needed.

How do we do unit testing in informatica? How do we load data in informatica?


Unit testing in informatica are of two types
1. Quantitative testing
2. Qualitative testing

Steps:
1. First validate the mapping
2.Create session on the mapping and then run workflow.

Once the session has succeeded, right click on the session and go to the statistics tab. There you
can see how many source rows were read, how many rows were loaded into the targets and how many
rows were rejected. This is called quantitative testing.

If once rows are successfully loaded then we will go for qualitative testing.

Steps:
1.Take the DATM (DATM means where all business rules are mentioned to the corresponding
source columns) and check whether the data is loaded according to the DATM in to target table.
If any data is not loaded according to the DATM then go and check in the code and rectify it.

This is called Qualitative testing.This is what a developer will do in Unit Testing.

What are the output files that the informatica server creates during the session
run?
Informatica server log: Informatica server(on Unix) creates a log for all status and
error messages(default name: pm.server.log). It also creates an error log for error
messages. These files will be created in informatica home directory.

Session log file: Informatica server creates session log file for each session. It writes
information about session into log files such as initialization process, creation of sql
commands for reader and writer threads, errors encountered and load summary. The
amount of detail in session log file depends on the tracing level that you set.

Session detail file: This file contains load statistics for each target in mapping. Session
detail include information such as table name, number of rows written or rejected you
can view this file by double clicking on the session in monitor window.

Performance detail file: This file contains information known as session performance
details which helps you where performance can be improved. To generate this file
select the performance detail option in the session property sheet.

Reject file: This file contains the rows of data that the writer does not write to targets.

Control file: Informatica server creates control file and a target file when you run a
session that uses the external loader. The control file contains the information about the
target flat file such as data format and loading instructions for the external loader.

Post session email: Post session email allows you to automatically communicate
information about a session run to designated recipients. You can create two different
messages: one if the session completed successfully, the other if the session fails.

Indicator file: If you use the flat file as a target, you can configure the informatica
server to create indicator file. For each target row, the indicator file contains a number to
indicate
whether the row was marked for insert, update, delete or reject.

Output file: If session writes to a target file, the informatica server creates the target file
based on file properties entered in the session property sheet.

Cache files: When the informatica server creates memory cache it also creates cache
files.

For the following circumstances informatica server creates index and data cache
files:
Aggregator transformation
Joiner transformation
Rank transformation
Lookup transformation

How do you handle decimal places while importing a flat file into informatica?
While importing the flat file definition, just specify the scale for the numeric data type. In the mapping,
the flat file source supports only the Number data type (there are no separate decimal and integer types). In the
Source Qualifier associated with that source, the port corresponding to that Number port will have a Decimal
data type. Source (Number data type port) - SQ (Decimal data type). Integer is not supported; hence
decimal takes care of it.

What is the use of incremental aggregation? Explain in brief with an example?


It is a session option. When the Informatica server performs incremental aggregation, it passes
new source data through the mapping and uses historical cache data to perform the new
aggregation calculations incrementally. We use it for performance.

Differences between the Normalizer transformation and normalization?


Normalizer: it is a transformation mainly used for COBOL sources; it changes rows into
columns and columns into rows.
Normalization: the process of removing redundancy and inconsistency from the data.

What is the target load order?


You specify the target load order based on the source qualifiers in a mapping. If you have multiple
source qualifiers connected to multiple targets, you can designate the order in which the
Informatica server loads data into the targets.

What can you do to increase performance? Explain performance tuning in Informatica.

The goal of performance tuning is to optimize session performance so sessions run during the
available load window for the Informatica Server. You can increase session performance as follows:

The performance of the Informatica Server is related to network connections. Data generally
moves across a network at less than 1 MB per second, whereas a local disk moves data five to
twenty times faster. Network connections therefore often affect session performance, so avoid
unnecessary network connections.

Flat files: If your flat files are stored on a machine other than the Informatica server, move those files
to the machine on which the Informatica server runs.

Relational data sources: Minimize the connections to sources, targets and the Informatica server to
improve session performance. Moving the target database onto the server system may improve session
performance.

Staging areas: If you use staging areas you force the Informatica server to perform multiple
data passes. Removing staging areas may improve session performance.

You can run multiple Informatica servers against the same repository. Distributing the
session load across multiple Informatica servers may improve session performance.

Running the Informatica server in ASCII data movement mode improves session performance,
because ASCII data movement mode stores a character value in one byte, whereas Unicode mode takes
2 bytes to store a character.

If a session joins multiple source tables in one Source Qualifier, optimizing the query may
improve performance. Also, single table select statements with an ORDER BY or GROUP BY
clause may benefit from optimization such as adding indexes.

We can improve the session performance by configuring the network packet size, which allows
more data to cross the network at one time. To do this go to the Server Manager and configure the
database connections.

If your target contains key constraints and indexes, they slow the loading of data. To improve the
session performance in this case, drop the constraints and indexes before you run the session and
rebuild them after completion of the session.
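A minimal sketch of what the pre- and post-session SQL could look like (the index, table and column names are assumed):

-- pre-session SQL: drop the index before the bulk load
DROP INDEX IDX_SALES_FACT_DATE;

-- post-session SQL: rebuild it once the load has completed
CREATE INDEX IDX_SALES_FACT_DATE ON SALES_FACT (SALE_DATE);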

Running parallel sessions by using concurrent batches will also reduce the time of loading the
data, so concurrent batches may also increase session performance.

Partitioning the session improves session performance by creating multiple connections to
sources and targets and loading data in parallel pipelines.

In some cases if a session contains an aggregator transformation, you can use incremental
aggregation to improve session performance.

Avoid transformation errors to improve the session performance.

If the session contained lookup transformation you can improve the session performance by
enabling the look up cache.

If your session contains filter transformation, create that filter transformation nearer to the
sources or you can use filter condition in source qualifier.

Aggregator, Rank and Joiner transformations may often decrease session performance,
because they must group data before processing it. To improve session performance in this
case, use the sorted ports option.

What is snow flake scheme design in database?


Snow flake schema is one of the designs that are present in database design. Snow flake
schema serves the purpose of dimensional modeling in data warehousing. If the dimensional
table is split into many tables, where the schema is inclined slightly towards normalization, then
the snow flake design is utilized. It contains joins in depth. The reason is that, the tables split
further.

Explain the difference between star and snowflake schemas?


Star schema: A highly de-normalized technique. A star schema has one fact table and is
associated with numerous dimensions table and depicts a star.

Snow flake schema: The normalized principles applied star schema is known as Snow flake
schema. Every dimension table is associated with sub dimension table.

Differences:

• A dimension table will not have parent table in star schema, whereas snow flake
schemas have one or more parent tables.
• The dimensional table itself consists of hierarchies of dimensions in star schema, where
as hierarchies are split into different tables in snow flake schema. The drilling down data
from top most hierarchies to the lowermost hierarchies can be done.

What is the difference between view and materialized view?


A view is created by combining data from different tables. Hence, a view does not have data of
itself.
On the other hand, a materialized view, usually used in data warehousing, has data. This data
helps in decision making, performing calculations etc. The data is stored by calculating it
beforehand using queries.

When a view is created, the data is not stored in the database. The data is created when a
query is fired on the view. Whereas, data of a materialized view is stored.

What is junk dimension?


A single dimension is formed by lumping a number of small dimensions. This dimension is
called a junk dimension. Junk dimension has unrelated attributes. The process of grouping
random flags and text attributes in dimension by transmitting them to a distinguished sub
dimension is related to junk dimension.

What is degenerate dimension table?


A degenerate table does not have its own dimension table. It is derived from a fact table. The
column (dimension) which is a part of fact table but does not map to any dimension.
E.g. employee_id

What is conformed fact and conformed dimensions use for?


Conformed fact in a warehouse allows itself to have same name in separate tables. They can
be compared and combined mathematically. Conformed dimensions can be used across
multiple data marts. These conformed dimensions have a static structure. Any dimension table
that is used by multiple fact tables can be conformed dimensions.
What is the difference between Informatica 7.0 and 8.0?
The architecture of Power Center 8 has changed a lot:
1. PC8 is service-oriented for modularity, scalability and flexibility.
2. The Repository Service and Integration Service (as replacement for Rep Server and
Informatica Server) can be run on different computers in a network (so called nodes), even
redundantly.
3. Management is centralized, that means services can be started and stopped on nodes via a
central web interface.
4. Client Tools access the repository via that centralized machine, resources are distributed
dynamically.
5. Running all services on one machine is still possible, of course.
6. It has a support for unstructured data which includes spreadsheets, email, Microsoft Word
files, presentations and .PDF documents. It provides high availability, seamless fail over,
eliminating single points of failure.
7. It has added performance improvements (To bump up systems performance, Informatica has
added "push down optimization" which moves data transformation processing to the native
relational database I/O engine whenever it is most appropriate.)
8. Informatica has now added more tightly integrated data profiling, cleansing, and matching
capabilities.
9. Informatica has added a new web based administrative console.
10. Ability to write a Custom Transformation in C++ or Java.
11. Midstream SQL transformation has been added in 8.1.1, not in 8.1.
12. Dynamic configuration of caches and partitioning
13. Java transformation is introduced.
14. User defined functions
15. PowerCenter 8 release has "Append to Target file" feature.

What is Data warehousing?


A data warehouse can be considered as a storage area where interest specific or relevant data
is stored irrespective of the source. What actually is required to create a data warehouse can be
considered as Data Warehousing. Data warehousing merges data from multiple sources into an
easy and complete form.

What are fact tables and dimension tables?


As mentioned, data in a warehouse comes from the transactions. Fact table in a data
warehouse consists of facts and/or measures. The nature of data in a fact table is usually
numerical.
On the other hand, dimension table in a data warehouse contains fields used to describe the
data in fact tables. A dimension table can provide additional and descriptive information
(dimension) of the field of a fact table.
e.g. If I want to know the number of resources used for a task, my fact table will store the actual
measure (of resources) while my Dimension table will store the task and resource details.
Hence, the relation between a fact and dimension table is one to many.

What is ETL process in data warehousing?


ETL stands for Extraction, transformation and loading. That means extracting data from different
sources such as flat files, databases or XML data, transforming this data depending on the
application’s need and loads this data into data warehouse.

Explain the difference between data mining and data warehousing?


Data mining is a method for comparing large amounts of data for the purpose of finding
patterns. Data mining is normally used for models and forecasting. Data mining is the process of
finding correlations and patterns by sifting through large data repositories using pattern recognition
techniques.
Data warehousing is the central repository for the data of several business systems in an
enterprise. Data from various resources extracted and organized in the data warehouse
selectively for analysis and accessibility.
What is an OLTP system and OLAP system?
OLTP stands for OnLine Transaction Processing. Applications that support and manage
transactions involving high volumes of data are supported by an OLTP system. OLTP is based
on client-server architecture and supports transactions across networks.
OLAP stands for OnLine Analytical Processing. Business data analysis and complex
calculations on low volumes of data are performed by OLAP. An insight of data coming from
various resources can be gained by a user with the support of OLAP.

What are cubes?


Multi dimensional data is logically represented by Cubes in data warehousing. The dimension
and the data are represented by the edge and the body of the cube respectively. OLAP
environments view the data in the form of hierarchical cube. A cube typically includes the
aggregations that are needed for business intelligence queries.

What is snow flake scheme design in database?


The snowflake schema is one of the designs used in database design. The snowflake
schema serves the purpose of dimensional modeling in data warehousing. When a
dimension table is split into many tables, so that the schema is inclined slightly towards
normalization, the snowflake design is being used. It involves deeper joins, because the
tables are split further.

Explain the difference between star and snowflake schemas?


Star schema: A highly de-normalized technique. A star schema has one fact table
associated with numerous dimension tables, and it depicts a star.

Snowflake schema: A star schema to which normalization principles have been applied is
known as a snowflake schema. Every dimension table is associated with sub-dimension tables.

Differences:
• A dimension table has no parent table in a star schema, whereas snowflake
schemas have one or more parent tables.
• In a star schema, the dimension table itself contains the hierarchies of the dimension,
whereas in a snowflake schema the hierarchies are split into different tables. Data can
then be drilled down from the topmost hierarchy level to the lowermost one.
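For example, a product dimension that is kept as a single table in a star schema might be split as follows in a snowflake schema (a sketch with hypothetical names):

    -- Star schema: one de-normalized dimension table
    CREATE TABLE dim_product (
        product_key   INTEGER PRIMARY KEY,
        product_name  VARCHAR(100),
        category_name VARCHAR(100)    -- hierarchy kept in the same table
    );

    -- Snowflake schema: the category hierarchy is split into a parent table
    CREATE TABLE dim_category (
        category_key  INTEGER PRIMARY KEY,
        category_name VARCHAR(100)
    );

    CREATE TABLE dim_product_sf (
        product_key   INTEGER PRIMARY KEY,
        product_name  VARCHAR(100),
        category_key  INTEGER REFERENCES dim_category(category_key)
    );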

What is the difference between view and materialized view?


A view is created by combining data from different tables. Hence, a view does not have
data of its own.
A materialized view, on the other hand, which is commonly used in data warehousing, does
have data. This data helps in decision making, performing calculations, etc. The data is
stored by calculating it beforehand using queries.

When a view is created, the data is not stored in the database; the data is produced
when a query is fired on the view. The data of a materialized view, in contrast, is stored.
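A short sketch of the difference (the materialized-view syntax shown is Oracle-style and may vary by database; table and column names are illustrative):

    -- Ordinary view: no data stored; the query runs each time the view is queried
    CREATE VIEW v_region_sales AS
    SELECT region, SUM(amount) AS total_amount
    FROM   sales
    GROUP  BY region;

    -- Materialized view: the result set is computed and stored ahead of time
    CREATE MATERIALIZED VIEW mv_region_sales AS
    SELECT region, SUM(amount) AS total_amount
    FROM   sales
    GROUP  BY region;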

What is junk dimension?


A single dimension formed by lumping together a number of small dimensions is called a
junk dimension. A junk dimension holds unrelated attributes. The process of grouping
random flags and text attributes of a dimension by moving them into a distinct sub-dimension
is what creates a junk dimension.

What is degenerate dimension table?


A degenerate dimension does not have its own dimension table; it is derived from the fact
table. It is a column (dimension) that is part of the fact table but does not map to any
dimension table.
E.g. employee_id

What is conformed fact and conformed dimensions use for?


A conformed fact is a fact that keeps the same name and meaning in separate tables of the warehouse,
so the facts can be compared and combined mathematically. Conformed dimensions can be
used across multiple data marts. These conformed dimensions have a static structure.
Any dimension table that is used by multiple fact tables can be a conformed dimension.

What is Virtual Data Warehousing?


A virtual data warehouse provides a compact view of the data inventory. It contains metadata. It
uses middleware to build connections to different data sources. They can be fast as they allow
users to filter the most important pieces of data from different legacy applications.

What is active data warehousing?


An Active data warehouse aims to capture data continuously and deliver real time data. They
provide a single integrated view of a customer across multiple business lines. It is associated
with Business Intelligence systems.

What is the difference between dependent and independent data warehouse?


A dependent data warehouse stores its data in a central data warehouse. An independent
data warehouse, on the other hand, does not make use of a central data warehouse.

Difference between data modeling and data mining?


Data modeling aims to identify all entities that have data and then defines the relationships between
these entities. Data models can be conceptual, logical or physical. Conceptual
models are typically used to explore high-level business concepts with stakeholders.
Logical models are used to explore domain concepts, while physical models are used to
explore database design.
Data mining is used to examine or explore the data using queries. These queries can be fired
on the data warehouse. Data mining helps in reporting, planning strategies, finding meaningful
patterns, etc. It can be used to convert a large amount of data into a sensible form.

What is the difference between ER Modeling and Dimensional Modeling?


ER modeling produces an ER diagram that represents the entire business or application
process. This diagram can be segregated into multiple dimensional models. That is to say, an
ER model covers both the logical and the physical model, whereas the dimensional model only
covers the physical model.

What is Data Mart?


A data mart stores particular data that is gathered from different sources. This data may
belong to a specific community (group of people) or subject area. Data marts can be used to focus
on specific business needs.

What are various methods of loading Dimension tables?


Conventional load: Here the data is checked for any table constraints before loading.
Direct or Faster load: The data is directly loaded without checking for any constraints.

What is the difference between OLAP and data warehouse?


A data warehouse serves as a repository to store historical data that can be used for analysis.
OLAP is Online Analytical processing that can be used to analyze and evaluate data in a
warehouse. The warehouse has data coming from varied sources. OLAP tool helps to organize
data in the warehouse using multidimensional models.

Describe the foreign key columns in fact table and dimension table?
The primary keys of the entity tables become the foreign keys of the dimension tables.
The primary keys of the dimension tables become the foreign keys of the fact table.

Define the term slowly changing dimensions (SCD)?


SCDs are dimensions whose data changes very slowly. An example of this can be the city of an
employee; this attribute changes very slowly. The row for this data in the dimension table can
either be replaced completely without any track of the old record, OR a new row can be inserted,
OR the change can be tracked in an additional column.
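A rough sketch of the second option (Type 2 style handling, where the old row is closed off and a new row is inserted; column names are illustrative assumptions):

    -- Close the current record for the employee whose city has changed
    UPDATE dim_employee
    SET    current_flag = 'N',
           end_date     = CURRENT_DATE
    WHERE  employee_id  = 101
    AND    current_flag = 'Y';

    -- Insert a new row carrying the changed city
    INSERT INTO dim_employee (employee_id, city, current_flag, start_date, end_date)
    VALUES (101, 'New City', 'Y', CURRENT_DATE, NULL);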

What is a Star Schema?


A star schema comprises fact and dimension tables. The fact table contains the facts or the actual
data, usually numerical data stored in multiple columns and many rows. Dimension tables
contain attributes or smaller granular data. The fact table in a star schema has foreign key
references to the dimension tables.

What is the difference between star and snowflake schema?


Star Schema: A de-normalized technique in which one fact table is associated with several
dimension tables. It resembles a star.
Snowflake Schema: A star schema to which normalization principles have been applied is known as
a snowflake schema. Every dimension table is associated with sub-dimension tables.

Explain the use lookup tables and Aggregate tables?


An aggregate table contains summarized view of data. Lookup tables, using the primary key of
the target, allow updating of records based on the lookup condition.

What is real time data-warehousing?


In real time data-warehousing, the warehouse is updated every time the system performs a
transaction. It reflects the businesses real time information. This means that when the query is
fired in the warehouse, the state of the business at that time will be returned.

Define non-additive facts?


The facts that cannot be summed up over the dimensions present in the fact table are called non-
additive facts. The facts can still be useful if there are changes in dimensions. For example, profit
margin is a non-additive fact, for it has no meaning to add it up at the account level or the
day level.

Define BUS Schema?


A BUS schema is used to identify the common dimensions across business processes, i.e. to
identify the conformed dimensions. A BUS schema has conformed dimensions and standardized
definitions of facts.

What is data cleaning? How can we do that?


Data cleaning is the process of identifying erroneous data. The data is checked for accuracy,
consistency, typos etc.

Data cleaning Methods:


Parsing - Used to detect syntax errors.
Data Transformation - Confirms that the input data matches the expected data in format.
Duplicate elimination - This process gets rid of duplicate entries.
Statistical Methods- values of mean, standard deviation, range, or clustering algorithms etc
are used to find erroneous data.
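For instance, duplicate elimination is often done with a simple grouping query while loading from staging (table and column names below are hypothetical):

    -- Keep a single row per natural key, discarding duplicate staging entries
    INSERT INTO customer_clean (customer_id, customer_name, email)
    SELECT customer_id,
           MAX(customer_name),
           MAX(email)
    FROM   customer_staging
    GROUP  BY customer_id;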

What is the purpose of Fact less Fact Table?


Factless fact tables are so called because they simply contain keys that refer to the dimension
tables. Hence, they don't really hold facts or measures, but are more commonly used for
tracking some information about an event.
E.g. to find the number of leaves taken by an employee in a month.
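A sketch of that leave-tracking example (all names and values are hypothetical): the factless fact table holds only keys, and the "fact" is obtained by counting rows.

    -- Factless fact table: only foreign keys, no measures
    CREATE TABLE fact_employee_leave (
        employee_key INTEGER,
        date_key     INTEGER
    );

    -- Number of leaves taken by an employee in a given month
    SELECT COUNT(*)
    FROM   fact_employee_leave f
    JOIN   dim_date d ON d.date_key = f.date_key
    WHERE  f.employee_key = 101
    AND    d.month = 6 AND d.year = 2009;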

What is a level of Granularity of a fact table?


A fact table is usually designed at a low level of granularity. This means that we need to find the
lowest level of information that can be stored in a fact table.
E.g. employee performance is a very high level of granularity; employee_performance_daily and
employee_performance_weekly can be considered lower levels of granularity.

What is Bit Mapped Index?


Bitmap indexes make use of bit arrays (bitmaps) to answer queries by performing bitwise logical
operations.
They work well with data that has lower cardinality, which means data that takes fewer
distinct values.
Bitmap indexes are useful in the data warehousing applications.
Bitmap indexes have a significant space and performance advantage over other structures for
such data.
Tables that have a small number of insert or update operations can be good candidates.

The advantages of Bitmap indexes are:


They have a highly compressed structure, making them fast to read.
Their structure makes it possible for the system to combine multiple indexes together so that
they can access the underlying table faster.

The Disadvantage of Bitmap indexes is:


The overhead on maintaining them is enormous.
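Creating one is straightforward; the syntax below is Oracle-style (bitmap indexes are not available in every database), and the table and column names are illustrative:

    -- Bitmap index on a low-cardinality column such as a status flag
    CREATE BITMAP INDEX idx_orders_status
    ON orders (status_flag);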

What is Data Cardinality?


Cardinality is the term used in database relations to denote the occurrences of data on either
side of the relation.

There are 3 basic types of cardinality:


High data cardinality: values of a data column are very uncommon (nearly unique).
e.g.: email ids and user names
Normal data cardinality: values of a data column are somewhat uncommon but never unique.
e.g.: a data column containing LAST_NAME (there may be several entries of the same last
name)
Low data cardinality: values of a data column are very common.
e.g.: flag statuses: 0/1

Determining data cardinality is a substantial aspect of data modeling; it is used to
determine the relationships between entities.
Types of cardinalities:
The Link Cardinality - 0:0 relationship
The Sub-type Cardinality - 1:0 relationship
The Physical Segment Cardinality - 1:1 relationship
The Possession Cardinality - 0:M relationship
The Child Cardinality - 1:M mandatory relationship
The Characteristic Cardinality - 0:M relationship
The Paradox Cardinality - 1:M relationship.
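A quick way to gauge the cardinality of a data column is to compare the count of distinct values with the total row count (a sketch with illustrative names):

    -- Few distinct values relative to total rows => low cardinality
    SELECT COUNT(DISTINCT status_flag) AS distinct_values,
           COUNT(*)                    AS total_rows
    FROM   orders;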
