
Data Quality Architecture mainly consists of 4 layers:

1) Client Layer
2) Server Layer
3) Metadata Layer
4) Content Layer

We have 2 clients in the Client Layer, namely:

1) Informatica Analyst - a web-based client used to carry out Data Quality analysis.
2) Informatica Developer - a client tool used to carry out Data Quality activities.
All services under Data Quality serve requests from the client tools, Analyst and Developer. The services that we use are:

Analyst Service - used to perform web-based data quality activities. It needs
to be up and running always.
Model Repository Service - used to take care of repository activities:
metadata, tables, structures, etc. It needs to be up and running always.
Data Integration Service - used to move data from one transformation to
another. It needs to be up and running always.
Content Management Service - used to provide address content or identity
content.

These services are created in the admin console.


Metadata Layer - this is the Model Repository, where the metadata is stored. This is
a central repository, meaning any change we make in the Analyst tool will be visible
in the Developer tool and vice versa. Note that this repository is different from the
PowerCenter/PowerExchange repository.

Reference Data - in order to perform any standardization like str, street, st., etc.,
that is, a simple find-and-replace method, we use reference data (a minimal sketch
follows below).
Address Data - holds the address details, like a master table, for reference.
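Conceptually, this kind of standardization is just a lookup against a reference table. A minimal Python sketch of the idea, with a made-up reference table (this is an illustration only, not IDQ's actual reference data format):

```python
# Hypothetical reference table mapping variant spellings to the
# standard value; IDQ stores such tables as reference data objects.
STREET_REF = {"str": "Street", "st": "Street", "street": "Street"}

def standardize(token: str) -> str:
    """Replace the token with its standard form if it appears in the
    reference data; otherwise return it unchanged."""
    return STREET_REF.get(token.lower().rstrip("."), token)

print(standardize("str"))   # Street
print(standardize("St."))   # Street
print(standardize("Main"))  # Main (not in the reference data)
```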
Example notes
Master Record: Merger
Let's say we have 2 banks, B1 and B2, and Bank B1 has acquired Bank B2. Their
customer records:

Bank    Name    Address
B1      abc     #111, sname, city, zip
B2      abc1    null

After the merger:

We need to identify the duplicates, i.e., identify matching records, so that we can
identify the same customers. To do this matching, Data Quality gives us 2 strategies:
exact matching and probable matching.
In case of exact matching, if we consider Name as the column, then the names don't
match, so there are no duplicates.
Then we use probable matching, where matching abc with abc1 would score, say, a
70% or 80% match. These records are considered matches, or duplicate records.
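IDQ does this scoring inside its matching transformations with its own algorithms; as a standalone illustration of how a similarity score plus a threshold gives a probable match, here is a minimal Python sketch (the 0.70 threshold is an arbitrary choice):

```python
from difflib import SequenceMatcher

def similarity(a: str, b: str) -> float:
    """Return a similarity ratio between 0.0 and 1.0."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

score = similarity("abc", "abc1")   # 2*3/(3+4), roughly 0.86
if score >= 0.70:                   # assumed probable-match threshold
    print(f"probable duplicate ({score:.0%})")
```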
The next step is the creation of good data as a result of these matches, which is
called the Master Record.
Consolidation - these 2 records will be merged into a single record. We can define
consolidation rules here.
Rule 1: keep whichever column value is longer in length. Let's say Full Name is the
input column; the name with more characters is treated as the good data.

abc1    #111, sname, city, zip  -> final result of the data quality tool.
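As a sketch of how this "Rule 1" consolidation could work outside the tool (records are from the example above; the function is hypothetical, not IDQ's API):

```python
# Consolidation "Rule 1" sketch: for each column, keep the longest
# non-null value among the matched records.
def consolidate(records):
    master = {}
    for col in records[0]:
        values = [r[col] for r in records if r.get(col)]
        master[col] = max(values, key=len) if values else None
    return master

b1 = {"name": "abc",  "address": "#111, sname, city, zip"}
b2 = {"name": "abc1", "address": None}
print(consolidate([b1, b2]))
# -> {'name': 'abc1', 'address': '#111, sname, city, zip'}
```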

Other data quality tools are Trillium, SAS DataFlux, IBM InfoSphere QualityStage,
AddressDoctor, and FirstLogic. AddressDoctor is part of IDQ since Informatica
acquired it.

Physical Data Objects - can be brought in by importing them from relational
databases or flat files, or by creating them yourself as either relational or flat
file objects. They can be used for reading in or writing out data in a Mapping, and
Mappings can have transformation logic to modify the data.
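Conceptually, a mapping is a read -> transform -> write pipeline between PDOs. A rough Python sketch of that flow over flat files (file names, column names, and the cleanup rule are all hypothetical):

```python
import csv

# Read from a source "PDO", apply one transformation, write to a
# target "PDO". Everything here is illustrative.
with open("customers_in.csv", newline="") as src, \
     open("customers_out.csv", "w", newline="") as tgt:
    reader = csv.DictReader(src)
    writer = csv.DictWriter(tgt, fieldnames=reader.fieldnames)
    writer.writeheader()
    for row in reader:
        row["name"] = row["name"].strip().title()  # sample cleanup rule
        writer.writerow(row)
```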
The figure below shows a screenshot of Informatica Developer; on the right-hand
side we have the database connections.

Join Analysis Profile


Let's assume we've 3 sources: Customer_Shipping, Orders, and Order_Details. I'm a
customer and I buy dog food, so there would be an order, and the details of that
order would be in Order_Details.
The goal of the example: can we create a Master Source File?
Without any data analysis, can we answer this?
Is there a unique (primary/foreign) key that makes sense to join this data on?
Let's look at Customers & Orders: any orphans? Can we have Orders without
Customers?

Let's look at Orders and Order_Details: same questions, i.e., are orphans allowed?
Could we have Orders without Order_Details? Or could we have Order_Details without
Orders? Or Orders and Order_Details without Customers?
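Done by hand, each of these orphan questions is an anti-join. A small Python sketch with assumed key columns (a customer_id linking Orders to Customers is a hypothetical choice here):

```python
# Orphan check sketch: find orders whose customer_id has no matching
# row in Customers (an anti-join). Key names are assumptions.
customer_ids = {"C1", "C2", "C3"}
orders = {"O1": "C1", "O2": "C9", "O3": None}   # order_id -> customer_id

orphans = {oid for oid, cid in orders.items() if cid not in customer_ids}
print(orphans)   # {'O2', 'O3'} -- orders without a valid customer
```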
There is an easier way in IDQ to do this, called Join Analysis: right-click and add a
Transformation called Joiner. Let's say we are studying Order_Details and Orders. So
on the left-hand side, right-click on Order_Details and click Profile.

Notice what comes up in the next wizard. It shows we can do Multiple Profiles,
Profile and Profile Model.

In this case we do Profile Model.


Generic Data Profiling:
The way to create a Physical Data Object is to right-click on Physical Data Objects
under the project folder and click Create Physical Data Object (PDO).
A PDO answers: where am I reading data from? Where is the data itself? Where am I
going to write the results to? This is commonly known as the source/target, except
that here we call it a PDO. PDOs can be relational, flat file, non-relational, SAP,
or Web Services data objects.

You might also like