Professional Documents
Culture Documents
working with individual sales reps and district managers. What could be
wrong? I told him I would look into the issue and get back to him.
Upon examining the customer information contained within the data
mart, the following was an example of the type of records found for ABC
company.
When all the data from the various records were added together, it
showed that ABC company was, in fact, the largest customer. However,
due to data issues, that analysis could not be seen on any metric,
dashboard or report.
The BI solution worked properly but the underlying data prevented it
from delivering accurate information for some of the metrics. If the
customer learned of this oversight, it could lead to a very awkward
conversation between the vendor and the customer or worse result
in the customer changing vendors.
What kind of a problem was this and what are options for solving it?
The problem was related to Data Quality; the data was all there, but the
problem was that the data was inconsistent between regions, sales
people, and so forth. The data, being inconsistent, could not be added
together to get a true picture of customer activity. Further analysis
showed that the issue of inconsistent data was pervasive and, due to
that fact, it rendered the largest and most profitable customer metrics
meaningless.
Fixing the data in the mart would address the data quality issue for the
sales team. They discussed how, as the data travelled through the ETL
process, they could put transformation logic in place to fix the issue of
having multiple variations of a company name. However, what if the
company name was entered in a format that was outside the rules
implemented to catch and change names to ensure consistent naming?
They would constantly have to review the data for new inconsistencies
and revise the transformation logic.
What about other groups within the organisation? Marketing are they
sending the same customer multiple copies of promotions or new
product information? Accounting are they sending bills to the wrong
address? Customer Service are they unable to find the proper
customer information because they didnt use the correct company
name for the person who is calling? Fixing the data issue for the sales
group is good for the sales team but leaves the root problem untouched.
In this example, the data quality issue was generated from a single
source system. Imagine how data problems can escalate when you
have the same data in multiple systems. For example, what if the
company name in a CRM system differs from the name for the same
company in an accounting system. Add in a third, a fourth, or even more
additional applications with the same data, when there is a difference
in data between the systems, which is correct?
As you can see, data quality can be a significant problem for
companies. Data quality problems can be addressed in one of three
ways (at a high level) and each has times where it is a great choice and
there are also cost and time factors that must be evaluated in deciding
the best resolution for your organisation. The three high level methods
of addressing data quality issues are:
Data Completeness
Data completeness indicates that all requisite data items have been
populated. Not every data item needs to be populated in every instance.
For example, in a purchasing system, if a customer is paying by credit
card, the credit card number is required but if they are paying with cash,
that data item is not necessary. What we need to ensure is that all the
requisite data items are populated for each record.
Data Completeness has a second aspect as well: suppose that a
particular data item is deemed optional and so is entered into the
system sporatically. As an example, perhaps email was an optional field
for a customer but then there is a desire to do an ongoing email
marketing program and for that, they need to have customer email
addresses. Or perhaps there is a desire to analyse information in a new
way such as by gender, another data item that was not required and
therefore is sporadically populated.
This is a quality issue that cannot be found though a few simple rules,
as the necessity of a data item is sometimes not realised until after an
application has been in place for years; it is not until analysis of the data
is attempted that the lack of data in a certain area is discovered.
Additionally, even when the data item is determined to be necessary for
analysis, there may still be cases where null values are to be expected
as there are often exceptions to general rules.
Data Consistency
The problem described in the story above is a data consistency
problem. This problem is most frequently found in cases where an
application allows for free-form entry of text in regard to important
informational items or can also arise when merging data from disparate
systems (more on that in the External Integrity section). The absence of
data consistency makes accurate analysis difficult. While correcting a
data consistency issue can largely be resolved in an ETL process, it
means that the information in the data warehouse does not accurately
reflect the source data and so reports on the same data, one from the
source and one from the warehouse, may not match due to the
differences.
Data Accuracy
This can be a difficult item to validate and correct. What do you do if the
address provided by the customer is invalid or if the person entering the
address makes a mistake? Such information inaccuracies will exist in
the data and can be difficult to detect and correct. While data accuracy
items do not impact analysis as much as consistency or completeness,
they can still impact analysis. If an analysis is being done by city and
the address has the wrong city, the analysis will be incorrect. Depending
on the amount of data, the inaccuracy is often only found in a small
number of records and so the impact is minimised.
Internal Integrity
This problem refers to referential problems within the database; in other
words, how well does the data fit together? Perhaps a company is
deleted from the system but the contacts for that company are not
deleted; with the company gone, those contacts are now orphans as
they have no parent. Depending on the type of analysis being done, this
can lead to analysis problems. This type of issue is often caused by
External Integrity
External Integrity is linked with data consistency. When identical data is
merged between systems, it is hoped that the data between the
systems will match. If merging data between sales and accounting, the
sales database may have customer ABC while the accounting
database may have customer ABC, Inc.. Maintaining external integrity
means ensuring data consistency across source applications.
As you can see, there are various ways in which data quality issues can
impact reporting and analysis. In the next post in this series, Ill discuss
how Master Data Management can be used to help maintain the quality
of organisational data.
QUESTIONS
1. Identify the name of the information system that being used in the
article.
2. Discuss what the system can do?
3. What could be the problem with regards data integrity that
mentioned in the article?
4. What do you understand with below categories of data quality
issues? (Explain in your own words)
a. Data Completeness
b. Data Consistency
c. Data Accuracy
d. Internal Integrity
e. External Integrity
ETL
BI
BI
ABC
ABC
ABC
BI
- -
ETL
-
-
-
CRM
- ETL
''
Acuracy
sporatically
ETL
'' SQL
'ABC'
ABC
1.
2.
3.
4.