
1. Explain about your project?

The project takes data from the Edward source, masters the records, and loads them into a target
named Master Edward. We use the MDM tool to obtain a single version of the truth.

2. What is your role and responsibilities in your project?

3. What are your daily activities in your work?

4. How many source systems are there and how data will come to landing tables?
Eight (8) sources are available, including MBR, Enrollment, Provider, Cap, Product, Address and CLM.
We mainly focused on enrolled members, Address Doctor (address-related data) and
contact information.

5. Landing tables?
What is Load Job?
The load job sends data from the staging table to the base object (BO).
The load process performs either an update or an insert.
Which one applies is determined by the PKEY_SOURCE and ROWID values in the XREF table:
each staging record is compared with the XREF table.
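The insert-versus-update decision can be illustrated with a small Python sketch. This is not how MDM Hub implements the load job. The function name, the dictionary layout, and the field names pkey_source and rowid_system are hypothetical, and it assumes that the ROWID mentioned above identifies the source system.

```python
def classify_staging_rows(staging_rows, xref_keys):
    """Split staging rows into inserts and updates.

    staging_rows: list of dicts with 'pkey_source' and 'rowid_system' (hypothetical names).
    xref_keys: set of (pkey_source, rowid_system) tuples already present in the XREF table.
    """
    inserts, updates = [], []
    for row in staging_rows:
        key = (row["pkey_source"], row["rowid_system"])
        if key in xref_keys:
            updates.append(row)   # key already known: update the existing XREF
        else:
            inserts.append(row)   # key not seen before: insert a new XREF
    return inserts, updates
```

A row whose key pair is already in the XREF set is treated as an update; any other row is treated as an insert.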

6. Staging tables? How many tables created in staging?


How data is loaded from landing to staging and to base object?
When you map a landing table column that contains a rowid to ROWID_OBJECT in the staging table,
MDM checks the ORIG_ROWID_OBJECT column in the base object to see whether the incoming rowid exists.
If it does, MDM updates the existing XREF (or inserts a new XREF if none exists) and then
updates the BO with the cell values that win based on trust/recency.

7. Base object? How many tables created in BO? Names of base object tables?
A base object is used to describe central business entities, such as customers, accounts,
products, employees, and so on. The base object is the end-point for consolidating data from
multiple source systems.

8. What is trust? How trust scores are given?


If not all source systems are to be treated equally, you need to specify trust levels for each
source system. Trust is a mechanism for measuring the confidence factor associated with each
cell based on its source system, change history, and other business rules. Trust takes into
account the age of data, how much its reliability has decayed over time, and the validity of the
data. Trust is specified at the column level. As an example, you can specify a higher trust level
for Customer Name in Source System 1 and for Phone Number in Source System 2. For each
combination of a column and a source system, the following settings can be specified:
Maximum (initial) trust level for a new value
Minimum trust level for old values
Length of time that the trust level takes to decay from the maximum trust to the minimum
trust
Shape of the decay curve (a straight line or a curve)
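The four settings above can be illustrated with a minimal Python sketch of the straight-line decay case. The function name, the use of days as the time unit, and the numeric values in the usage note are assumptions for illustration only. The actual trust calculation in MDM Hub is configured in the Hub Console and may differ.

```python
def linear_trust(age_days, max_trust, min_trust, decay_days):
    """Trust score for a value of a given age under straight-line decay.

    Starts at max_trust for a new value and falls linearly to min_trust
    once decay_days have elapsed; it stays at min_trust afterwards.
    """
    if decay_days <= 0 or age_days >= decay_days:
        return float(min_trust)
    if age_days <= 0:
        return float(max_trust)
    fraction = age_days / decay_days          # share of the decay period elapsed
    return max_trust - (max_trust - min_trust) * fraction
```

For example, with a maximum trust of 90, a minimum of 30 and a decay period of 100 days, a value that is 50 days old scores halfway between the two limits.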

9. What is downgrade trust?

10. What is validation rule? And what validation rules created in your project?
What are VCT and VXR?
These two tables are used for validation; we can also treat them as control tables.
What is validation and significance?
Can we use join condition in validation rule?
Yes

11. Base object table names?

12. Difference between ETL and MDM?


PowerCenter is for data integration/transformation/migration. MDM stands for master data
management: managing business entities such as customers, products, employees and locations.
MDM provides a technology and governance platform to consolidate master data from different
systems and to maintain it on an ongoing basis. MDM is used by data stewards to create clean,
consistent master data following the data policies and standards set up under data governance.

13. Difference between IDD and MDM?

14. What are cleansing functions you used in project?


Cleanse functions are used to put data into a consistent, proper format.
For example, if customer phone numbers arrive in an improper format, we can define the
proper phone number format by using cleanse functions.
To do this, go to Cleanse Functions; if you want to change an address format, go to the
geographic functions and use the Address Line function. Once the function is created and its
parameters are defined, we apply it to the data that needs to be cleansed.
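As a rough illustration of what a phone number cleanse function does, here is a minimal Python sketch. It is not an MDM Hub cleanse function. The function name and the target format, a 10-digit US-style number, are assumptions chosen only for the example.

```python
import re


def cleanse_us_phone(raw):
    """Normalize a phone number to the form (NNN) NNN-NNNN, or return None."""
    digits = re.sub(r"\D", "", raw or "")      # keep digits only
    if len(digits) == 11 and digits.startswith("1"):
        digits = digits[1:]                    # drop a leading country code 1
    if len(digits) != 10:
        return None                            # cannot be cleansed to the target format
    return f"({digits[:3]}) {digits[3:6]}-{digits[6:]}"
```

Inputs that cannot be brought into the target format return None, so the caller can decide whether to reject the record or keep the raw value.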
What is STOPONHIT?
STOPONHIT is a setting found when initializing cleanse functions. With STOPONHIT, only one
match is returned: after the desired data is hit, the function stops.
When you run the match job, on which server is it running?
Cleanse/Match server
Tell me how to create graph cleanse function?

15. What is load process?


I have two tables, Person and Address, related through a rowid object column. The primary key
of the Address table is Rowid_Object, and I added a column Rowid_Address to the Person
table; these are the columns I used to establish the relationship between the two tables. I used a
load process to insert some records into both tables (Address and Person), and that works. When I
try to insert a record into the Person table and the specified address code (Rowid_Address) does
not exist in the Address table, that record is moved to the reject table, which is also expected.
What I need to know is how to move the rejected record to the corresponding table (Person) after
inserting the missing address into the Address table. Once the address that caused the error has
been inserted, I should be able to insert the record into the Person table, and that is
exactly what I need to know how to do.

16. How data is loaded from landing to staging and to base object?

17. Have you created any cleanse function for your project?
Cleanse Process in MDM?
Data cleansing is the process of:
standardizing data content and layout
decomposing/parsing text values into identifiable elements
verifying identifiable values (such as zip codes) against data libraries and replacing incorrect
values with correct values from data libraries.
Tell me what is cleanse function and how to create cleanse function?

18. Data models in your project?

19. Schemas in your project?

20. Have you involved match and merge process?

21. Types of Match?


Match Process?
Matching is the process of comparing two records for points of similarity. If sufficient points of
similarity are found to indicate that the two records are probably duplicates of each other, then
MDM Hub flags those records for consolidation. Data that you are matching is stored in tables
called base objects. These base objects have columns, some of which you designate for
matching. The columns to be used for comparison purposes are called match columns. Each
match column is based on one or more columns from the base object. Each match column also
has a match type that determines how the match column will be tokenized in preparation for
the match comparison.

22. Fuzzy match and exact match columns?


There are two types of Match Strategy-Fuzzy and Exact. The Tokenization process runs only on
Base Objects with Fuzzy match strategy. Tokenization is the process of identifying the Match
pairs between the records (Fuzzy Match Key) by generating and comparing tokens.
Tokenization generates 1-20 tokens for each record of Fuzzy Match Key depending on the Key
width and the length of the record. A token is an 8 char system generated value that is stored in
BaseObject_STRP table (column-SSA_KEY). If there is a match between the tokens of two
records, then they are identified as Match Pairs. The process of matching is only executed on
these match pairs. So tokenization is basically the first step in Matching Process before
executing the match process. Match tokens are generated in MDM Hub Console by executing
the 'Generate Match Tokens' Batch Job in the Batch viewer after the staging and loading process
are done.
Tokenization Process:
The tokenization process generates match tokens and stores them in a match key table
associated with the base object. MDM Hub cannot generate match keys for columns with
encrypted data. Match tokens are encoded and non-encoded representations of the data in
base object records. Match tokens include match keys, which are fixed-length, compressed
strings consisting of encoded values built from all of the columns in the fuzzy match key of a
fuzzy match base object. Match tokens also include non-encoded data.
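The idea that only records sharing a token become match pairs can be sketched in Python as follows. This is an illustration of the concept only. The function name, the input layout, and the token strings in the usage note are hypothetical and are not real SSA_KEY values.

```python
from collections import defaultdict
from itertools import combinations


def candidate_pairs(tokens_by_rowid):
    """Return the set of record pairs that share at least one token.

    tokens_by_rowid: dict mapping a record's rowid to the set of its tokens.
    """
    rowids_by_token = defaultdict(set)
    for rowid, tokens in tokens_by_rowid.items():
        for token in tokens:
            rowids_by_token[token].add(rowid)

    pairs = set()
    for rowids in rowids_by_token.values():
        # every two records under the same token form one candidate pair
        for a, b in combinations(sorted(rowids), 2):
            pairs.add((a, b))
    return pairs
```

Records that share no token never appear in the result, so the more expensive match comparison runs on far fewer pairs than a full cross-comparison of all records.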

23. How to create query and package and why?

24. Different types of packages?

25. What queries have created in your project?

26. How much data matched once match job runs in your project?

27. How to merge records once they are matched?


28. How much data merged in your project?

29. Have you involved in design and architecture?

30. What is consolidation indicator?


What is consolidation?
The purpose of merging data in Informatica MDM Hub is to combine records into a
single, consolidated record by removing all duplicates. It also maintains traceability
to determine which source systems, and which cells from those systems, contribute to the BVT.
The match process helps to determine duplicate records. Informatica MDM Hub compares
records at the cell level. The data in the cells with the highest trust level is consolidated
together. The consolidation indicator identifies the status of individual records with respect to
consolidation as they progress through the various processes in MDM Hub. All base
objects have a system column named CONSOLIDATION_IND.
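Cell-level consolidation by trust can be illustrated with a short Python sketch. It assumes each cell already carries a numeric trust score. The function name and the dictionary keys are hypothetical, and tie-breaking rules used by MDM Hub are not modeled.

```python
def build_bvt(cells):
    """Pick, for each column, the cell with the highest trust score.

    cells: list of dicts with 'column', 'value', 'trust' and 'source' keys.
    Returns a dict mapping column -> (winning value, contributing source).
    """
    best = {}
    for cell in cells:
        current = best.get(cell["column"])
        if current is None or cell["trust"] > current["trust"]:
            best[cell["column"]] = cell
    # keep the source alongside the value so traceability is preserved
    return {col: (c["value"], c["source"]) for col, c in best.items()}
```

Returning the source together with the value mirrors the traceability requirement: for every cell in the consolidated record you can still see which system contributed it.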

31. When consolidation process will start at staging or landing?

32. Explain Consolidation indicator 1,2,3,4 and 9?


Explain Consolidation indicator 1,2,3,4 and 9?
1 CONSOLIDATED: This record has been consolidated, determined to be unique, and represents
the best version of the truth.
2 Not MERGED or MATCHED: This record has gone through the match process and is ready to be
consolidated. Also, if a record has gone through the manual merge process, where the matching
is done by a data steward, the consolidation indicator is 2.
3 QUEUED_FOR_MATCH: This record is a match candidate in the match batch that is being
processed in the currently executing match process.
4 NEWLY_LOADED: This record is a new insert or is unmerged and must undergo the match
process.
9 ON_HOLD: The data steward has put this record on hold until further notice.
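For code that reads CONSOLIDATION_IND, the values above can be captured in a small Python enumeration. The class and function names are hypothetical; the member names follow the labels in the list above.

```python
from enum import IntEnum


class ConsolidationInd(IntEnum):
    """Values of the CONSOLIDATION_IND system column as listed above."""
    CONSOLIDATED = 1
    NOT_MERGED_OR_MATCHED = 2
    QUEUED_FOR_MATCH = 3
    NEWLY_LOADED = 4
    ON_HOLD = 9


def is_golden(indicator):
    """A record is treated as golden once its indicator is 1."""
    return ConsolidationInd(indicator) is ConsolidationInd.CONSOLIDATED
```

Using an enumeration instead of bare numbers makes queries and reports easier to read, and an unknown value raises a ValueError instead of passing silently.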

33. How will you process records having consolidator indicator 9?

34. When records will turn to golden records?


What will you do after golden records are generated?
A user exit is custom Java code that lets you manipulate the data before it is
treated as golden. Informatica provides many user exit classes, such as
postLoadUserExit and postMatchUserExit. Example: suppose I have a business requirement to
merge my records according to survivorship rules. Say I have a column named Amount, and I
want to pick the highest value of Amount in the group of matched records. I can accomplish this
task with a post-merge user exit.
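The survivorship rule in this example, keeping the highest Amount in a group of matched records, can be sketched in Python as shown below. Real user exits are written in Java against Informatica's user exit interfaces, so this only illustrates the selection logic. The function name and the record layout are hypothetical.

```python
def pick_highest_amount(matched_records):
    """Return the record with the highest 'amount' in a match group.

    Records without an amount are ignored; returns None if no record has one.
    """
    candidates = [r for r in matched_records if r.get("amount") is not None]
    if not candidates:
        return None
    return max(candidates, key=lambda r: r["amount"])
```

Skipping records with a missing amount prevents a null value from winning or from raising a comparison error.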
When records will turn to golden records?
After records are merged and the consolidation indicator changes to 1, the record is said to be
a golden record.
What is consolidation/merge process?
Consolidation is the process of making a golden record.
The match table is the input for the consolidation process. Based on it, the winning record is
decided and the final record is merged into the BO table.

35. What is hub state indicator?

36. Explain tokenization?



37. Explain Match process? Match path? Match columns?


Explain about Match process?
Tokenization is the input, or prerequisite, for the match process, so tokenization must be
completed before we start the match process.
During the match process, records are matched with each other based on the tokens that were
generated during tokenization.
As the final output, the MTC (match) table is populated with the records that are to be
merged.

38. Match Rules? What match rules created in your project?


Match Rules?
Automatic match rules: Your best rules, the ones you are most sure will result in an accurate
match. For matches resulting from automatic match rules, the records are consolidated
automatically.
Manual match rules: Rules that you are less sure will generate accurate matches. These rules
use looser criteria than automatic match rules. For matches resulting from manual match rules,
the records are consolidated manually after being reviewed by a data steward. Manual match
rules identify records that have enough points of similarity to warrant attention from a data
steward, but not enough points of similarity to allow the system to automatically consolidate
the records.

39. Types of merge?

40. Have you implemented user exits?

41. What is delta detection?


Delta detection can be done either by comparing entire records or via a date column. Delta
detection on last update date is the most efficient, as MDM Hub can simply compare the last
update date column for each incoming record against the record's previous last update date.
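Delta detection on a last update date can be sketched in Python as follows. This is an illustration of the comparison only, not of how MDM Hub performs it. The function name, the 'pkey' and 'last_update_date' keys, and the dictionary of previous dates are hypothetical.

```python
from datetime import datetime


def detect_delta(incoming, previous_last_update):
    """Return the incoming records that are new or changed.

    incoming: list of dicts with 'pkey' and 'last_update_date' (datetime).
    previous_last_update: dict mapping pkey -> last update date seen in the prior run.
    """
    changed = []
    for record in incoming:
        previous = previous_last_update.get(record["pkey"])
        # new record, or updated since the previous run
        if previous is None or record["last_update_date"] > previous:
            changed.append(record)
    return changed
```

Only one date comparison is needed per record, which is why this approach is cheaper than comparing every column of every record.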

42. What are issues you faced in your project?


43. If a load job fails what is your next step?

44. How delete records from Base object?

45. Where rejected records are stored and if you want back those records how will you process?

46. Have you worked with IDQ?

47. What is dirty table? And what columns?


The dirty table is used for the tokenization process.
The dirty table has only one column, Rowid object. The tokenization process runs based on
this column.

49. What is a batch process?


What issues did you face while running batch processes?
We run the stage, load, tokenization, match and merge processes.
You need to explain the issues faced, such as why records were rejected during a process and what
steps you took to resolve the issue. During the load job we face issues such as lookup failures.
During the tokenization and match processes we get memory and timeout issues.
How do you do a bulk unmerge?
We have a batch unmerge API with which we can unmerge data in bulk.
How do you schedule the batch job? And what is the run frequency?
Scheduling can be configured with WLM, Tivoli, AutoSys, Tidal or Maestro.
How do you get job IDs for jobs such as the stage job and load job?
Job IDs can be obtained from the C_repos_table_object batch job table.

50. If a load job fails which log file you will verify?

51. Have you implemented cleanse server configuration?

52. What is child missing concept in MDM?

53. What is incremental load and full load?

54. If a load failed at the time of staging what is your next step?
The stage job has failed. Which log files will you look at?
cmxserver.log

55. If a load failed at the time of landing what is your next step?
The load job has failed. Which log file will you look at to debug?
Database log file

56. If a load failed at the time of loading base object what is your next step?

57. What are the steps you will go through with source systems before starting to work on MDM?

58. We can match and delete duplicate in oracle also then what the use of MDM?

59. How did you learn MDM?


My company provided the KT (knowledge transfer). The client introduced this new business for
its feasibility, so my company adopted this new technology and I
learned it accordingly.

60. How would you rate yourself in MDM? And what are the reasons for the points you deducted?

61. What is your role and responsibilities in your project?

62. What are your daily activities in your work?

63. How many source systems are there and how data will come to landing tables?
64. Explain your informatica MDM experience related to MDM hub configuration?
For this, explain how the load process, the stage process and the match and merge process are configured.

65. Explaining about Project?

66. How many sources presented in your project?


67. How many landing, Staging and BO in your project?


The landing tables hold source data populated from the Oracle systems (8
sources (tables) are available, including MBR, Enrollment, Provider, Cap, Product, Address and CLM).
1) System columns for base object
ROWID_OBJECT
CONSOLIDATION_IND
DELETED_IND
DELETED_DATE
DELETED_BY
DIRTY_IND
LAST_ROWID_SYSTEM
CREATE_DATE
CREATOR
LAST_UPDATE_DATE
UPDATED_BY

What is user exit?


When consolidation process will start at staging or landing?


Landing.

How will you process records having consolidator indicator 9?


The data steward checks with the end user about the data that is on hold. After getting permission, we
proceed according to whether the records need to be merged or deleted.

How do we enable the database in the hub?


In the C_Repos table.

What is incremental load and full load?


1) Full load/initial load: loads all the data from the source system to the target system (the starting load).
2) Incremental load: loads only the records that have changed since the full load.

How much data merged in your project?


There are about 1.2 million records in our project. After merging we got around 1 million records, and we
are still working on it.

Have you involved in design and architecture?


No

What are the tables created when you create a base object?
Xref
VCT
VXR(validation related table)
Strp table
Dirty table
History tables

Is it possible to have duplicate P_KEY_SOURCE object in Xref table?


Yes. If we define the timeline for the BO, the Xref table can contain duplicate P_KEY_SOURCE values.

What is Shadow column and what are the Shadow columns in Xref table?

What is FMHA (Match flag audit table)?


It is an audit table which is used to test down. When a data steward pushes a record for audit, those
details are maintained in FMHA.

Will the number of landing tables and BOs be the same, while the number of staging tables depends on the source systems?
Yes. Based on how many sources are available, we can have the same number of BOs in the
project.
Reason behind this is

What are the processes involved in informatica MDM?

Land -> stage -> load -> tokenization -> match -> consolidate (merge) -> publish the master
records (consolidation "1").
Explain all load, stage, match and merge, Tokenization, consolidation?

What is stage process and what are its significance?


The stage process involves several steps, such as delta detection and hard delete detection, and tables such as
PRL, RAW and REJECT.

Where do you configure delta detection and audit trail?

What are the configurations you will do for delta detection and audit trail in your project?
These are configured at the staging level. You need to explain the business reason for them and
where you configured them.

What is the duration you are maintaining audit trail?

What is delta detection and how u will configure delta detection?

What is full data load and what is incremental data load?


At the time of full data load we will use delta detection concept.

How u will use delta detection with incremental data load?


It is not possible to use delta detection with incremental load.

While running a stage job with delta detection enabled.


While the stage job runs, the landing table is compared with the PRL table to check whether the records
have been properly loaded or not.

When PRL, OPL, RAW and reject table will be created?


The PRL, OPL and RAW tables are created only when we create the mapping.
The reject and STG tables are created only when we create the staging table.

What are the reasons for record rejection?


Records are rejected when:
PKEY_source is populated with a duplicate value, or the same PKEY_source already exists
a NOT NULL column contains null
the last update date column contains a future date
a lookup value fails during the load job
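The rejection reasons above can be expressed as simple checks. The following Python sketch is illustrative only and does not reproduce MDM Hub's reject logic. The function name, its parameters, and the field names are hypothetical.

```python
from datetime import datetime


def reject_reasons(row, seen_pkeys, not_null_columns, lookup_values, now):
    """Return the list of reasons a row would be rejected (empty if none).

    seen_pkeys: set of pkey_source values already loaded.
    not_null_columns: column names that must not be null.
    lookup_values: dict mapping a column name to its set of valid lookup values.
    now: reference datetime used for the future-date check.
    """
    reasons = []
    if row["pkey_source"] in seen_pkeys:
        reasons.append("duplicate pkey_source")
    for column in not_null_columns:
        if row.get(column) is None:
            reasons.append(f"null in NOT NULL column {column}")
    if row["last_update_date"] > now:
        reasons.append("last_update_date in the future")
    for column, allowed in lookup_values.items():
        if row.get(column) not in allowed:
            reasons.append(f"lookup failed for {column}")
    return reasons
```

Collecting every reason, instead of stopping at the first one, makes it easier to fix a rejected record in a single pass.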

When do the PRL, REJ, STG and RAW tables get cleared/truncated?
The PRL table is truncated on every run: when you run the stage job, the PRL table is truncated, and
the STG table is also truncated.
The reject table is truncated only when you have run the stage job more times than the load job.
The RAW table is cleared according to the retention time you have configured.

Have you used any data quality tool along with MDM, such as Informatica Data Quality?

What are the different ways by which we can generate tokens?


On load generation of tokens: We can generate tokens at the time of loading the data
Pre match generation of tokens: Before matching the records we can generate tokens
And we can run tokenization process manually and generate tokens

What is the impact of key width on tokenization process?


In mdm generation of tokens will depend on the key width.

What is Path and why it is used in MDM?

What is the difference between search level and match level?

What is segment matching?


When we need to match on a particular segment, i.e. customer segment details or organization details, we use
segment matching.

What are null matches null and null matches not null attributes?

What are the best practices you followed for match rule creation?

What is consolidation indicator and what are valid values?


The consolidation indicator gives the state of a record, that is, which phase the record
is currently in:
4 new record in the BO
3 record has gone to the match process
2 merging
1 unique record
9 on hold

What are queries and packages in MDM?

While running a load job, the job went into an idle state because of a memory issue, and because of this you are
not able to run another load job. How will you fix the issue?
When we run a load job, a lock is applied for that job. If the job goes idle, we have to truncate or delete the
C_repo_applied_lock record that was created during the load process. After removing this record we can run
the job again.

What is GETLIST LIMIT?


The maximum value is 5999. If you want to change it, go to CMX_SYSTEM.C_REPOS_DATABASE.

Which table maintains landing, staging and base object table names?
C_REPOS_TABLE

Why MDM?
1. Multiple products (different databases in one organization)
2. Helps a data warehouse and reporting application (duplicate records and multiple instances)
3. Helps a CRM in retaining and fetching new customers
4. Stopping communication for deceased individuals.

Why informatica MDM?


1. Informatica is the only vendor which sells products like Data integration, Data quality, and
MDM
2. Scaling (combining several servers from different places into one hub console)
3. Robust matching engine

Stage Process:
1. Delta detection
2. Cleansing
3. Raw
4. Reject

Load Process:
1. Apply trust
2. Apply validation
3. Referential integrity (Lookup translation).

RAW table: Data is retained for a certain amount of time or a certain number of loads, depending on how long
we want to keep it in storage.

Reject table: While data is loaded from landing to staging, records that are rejected as duplicates, records with
nulls, and other records that do not satisfy the conditions are transferred to the
reject table.

Example: If there are 50 sources, we can load the data into one landing table, but there will be 50 staging tables.

CMXUE is used for user exit implementation. The CMXUT package consists of utility classes which can be
used for user exits as well as for SIF components.

Last rowid system: Identifies the system from which the record last came.

Dirty indicator: The dirty indicator is used for the tokenization process.

Interaction id: The interaction id is used to see which process the record came from.

Hub state indicator: The hub state indicator shows which state the record is in, i.e.
active, deleted or pending.

Rowid object : Unique identifier for each record.

Consolidation indicator: To check whether record is new, matched, merged or unique.

What log files does Informatica MDM write out and where are they?
Informatica MDM writes to a number of different log files, depending on the process that is running. If
you encounter a problem when running a process, Informatica MDM Support will likely ask you for one
or more of the log files to assist with troubleshooting.

Log files typically requested by Siperian Support


If you have a failure in Staging, Match or Tokenize, Siperian Support will usually ask you for
Siperian cleanse-match server log,
Siperian database debug log
Support might also ask for
SQLLoader logs,
Application Server logs (i.e. WebLogic/WebSphere/JBOSS logs)
If you have a failure in Load, Merge, Automerge or Unmerge, Siperian Support will usually ask you for
Siperian database debug log
Support might also ask for
Siperian cleanse-match server log and WebLogic/WebSphere/JBOSS logs (for Automerge failure
occurring in BuildMatchGroups (BMG) process)
If you have a failure in a SIF API call or BDD, Siperian Support will usually ask you for
Siperian hub server log,
Siperian database debug log
Support might also ask for
Siperian cleanse-match server log,
Application Server logs (i.e. WebLogic/WebSphere/JBOSS logs)
If you have an error reported in the Siperian console, Siperian Support will usually ask you for
Siperian console log,
Siperian hub server log
Support might also ask for
Siperian database debug log,
Application Server logs (i.e. WebLogic/WebSphere/JBOSS logs)
Siperian hub server log
The Siperian hub server log is located in <SIPERIAN_HOME>/hub/server/logs/cmxserver.log.
The logging detail level can be changed in log4j.xml:
For WebLogic and WebSphere, this is located in <SIPERIAN_HOME>/hub/server/conf/log4j.xml
For JBOSS, this is located in <JBOSS_HOME>/server/default/conf/log4j.xml

Why Informatica?
Informatica Master Data Management is the only MDM solution that is both easy to deploy and flexible
enough to solve your unique business challenges.
Agility: Our solution can be deployed rapidly and easily, and includes the full assortment of data
integrity, data quality, and BPM capabilities required to successfully complete any MDM project.
Business user-focused: We deliver value directly to your business users by immediately improving
business processes and helping them discover relationships in the data that give them powerful
insights.
Focused on customer success: A dedicated group within Informatica's MDM division, our customer
success team was created with the sole purpose of ensuring customer satisfaction. As a result,
Informatica has been named number one for customer loyalty nine years in a row by independent
research firm TNS.

What is your role in MDM?

What are the errors you have faced in MDM?

How many landing sources, BO, cleanse functions tables in your project?

What you do once you complete the Match job?

What are the tables?

What are the log files?


What is PIM?

What is the robust matching engine?
