The What, Why, and How of Master Data Management

Roger Wolter and Kirk Haselden
Microsoft Corporation
November 2006
Applies to:
Master Data Management (MDM)
Service-Oriented Architecture (SOA)
Software as a Service (SaaS)
Summary: The recent emphasis on regulatory compliance, SOA, and
mergers and acquisitions has made the creating and maintaining of
accurate and complete master data a business imperative. This paper
covers the reasons for adopting master-data management, the process
of developing a solution, and several options for the technological
implementation of the solution. (12 printed pages)
Contents
Introduction
Deciding What to Manage
Why Should I Manage Master Data?
What Is Master Data Management?
Conclusion
Introduction
The pain that organizations are experiencing around consistent
reporting and regulatory compliance, together with strong interest in
Service-Oriented Architecture (SOA) and Software as a Service (SaaS),
has prompted a great deal of interest in Master Data Management (MDM).
This paper explains what MDM is, why it is important, and how to
manage it, while identifying some of the key MDM management patterns
and best practices that are emerging. This paper is a high-level
treatment of the problem space. In subsequent papers, we will drill
down into the technical and procedural issues involved in Master Data
Management.
What Is Master Data?
Most software systems have lists of data that are shared and used by
several of the applications that make up the system. For example, a
typical ERP system as a minimum will have a Customer Master, an
Item Master, and an Account Master. This master data is often one of
the key assets of a company. It's not unusual for a company to be
acquired primarily for access to its Customer Master data.
Rudimentary Definitions
There are some very well-understood and easily identified master-data
items, such as "customer" and "product." In fact, many define master
data by simply reciting a commonly agreed upon master-data item list,
such as: customer, product, location, employee, and asset. But how
you identify elements of data that should be managed by a master-data
management system is much more complex and defies such rudimentary
definitions. In fact, there is a lot of confusion around what master
data is and how it is qualified, necessitating a more comprehensive
treatment.
There are essentially five types of data in corporations:
- Unstructured: This is data found in e-mail, white papers like
  this, magazine articles, corporate intranet portals, product
  specifications, marketing collateral, and PDF files.
- Transactional: This is data related to sales, deliveries, invoices,
  trouble tickets, claims, and other monetary and non-monetary
  interactions.
- Metadata: This is data about other data and may reside in a
  formal repository or in various other forms such as XML
  documents, report definitions, column descriptions in a database,
  log files, connections, and configuration files.
- Hierarchical: Hierarchical data stores the relationships between
  other data. It may be stored as part of an accounting system or
  separately as descriptions of real-world relationships, such as
  company organizational structures or product lines. Hierarchical
  data is sometimes considered a super MDM domain, because it is
  critical to understanding and sometimes discovering the
  relationships between master data.
- Master: Master data are the critical nouns of a business and fall
  generally into four groupings: people, things, places, and
  concepts. Further categorizations within those groupings are
  called subject areas, domain areas, or entity types. For example,
  within people, there are customer, employee, and salesperson.
  Within things, there are product, part, store, and asset. Within
  concepts, there are things like contract, warranty, and licenses.
  Finally, within places, there are office locations and geographic
  divisions. Some of these domain areas may be further divided.
  Customer may be further segmented, based on incentives and
  history. A company may have normal customers, as well as
  premier and executive customers. Product may be further
  segmented by sector and industry. The requirements, life cycle,
  and CRUD cycle for a product in the Consumer Packaged Goods
  (CPG) sector are likely very different from those of the clothing
  industry. The granularity of domains is essentially determined by
  the magnitude of differences between the attributes of the
  entities within them.
Deciding What to Manage
While identifying master data entities is pretty straightforward, not all
data that fits the definition for master data should necessarily be
managed as such. This paper narrows the definition of master data to
the following criteria, all of which should be considered together when
deciding if a given entity should be treated as master data.
Behavior
Master data can be described by the way that it interacts with other
data. For example, in transaction systems, master data is almost
always involved with transactional data. A customer buys a product.
A vendor sells a part, and a partner delivers a crate of materials to
a location. An employee is hierarchically related to their manager, who
reports up through a manager (another employee). A product may be a
part of multiple hierarchies describing its placement within a store.
This relationship between master data and transactional data may be
fundamentally viewed as a noun/verb relationship. Transactional data
capture the verbs, such as sale, delivery, purchase, email, and
revocation; master data are the nouns. This is the same relationship
data-warehouse facts and dimensions share.
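The noun/verb split can be sketched in a few lines of Python. The record types and field names below (Customer, Product, Sale) are hypothetical and chosen only for illustration; the point is that a transaction carries keys that refer to master records rather than copies of them.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Customer:          # master data: a noun
    customer_id: int
    name: str


@dataclass(frozen=True)
class Product:           # master data: a noun
    product_id: int
    description: str


@dataclass(frozen=True)
class Sale:              # transactional data: a verb
    sale_date: date
    customer_id: int     # reference to the customer master
    product_id: int      # reference to the product master
    quantity: int


customers = {1: Customer(1, "William Smith")}
products = {10: Product(10, "Widget")}
sales = [Sale(date(2006, 11, 1), 1, 10, 3), Sale(date(2006, 11, 2), 1, 10, 2)]


def units_by_customer(sales, customers):
    """Total units sold per customer name, resolved through the master."""
    totals = {}
    for sale in sales:
        name = customers[sale.customer_id].name
        totals[name] = totals.get(name, 0) + sale.quantity
    return totals


print(units_by_customer(sales, customers))
```

Because each sale stores only a key, correcting the customer's name in the master corrects it for every transaction that refers to it.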
Life Cycle
Master data can be described by the way that it is created, read,
updated, deleted, and searched. This life cycle is called the CRUD cycle
and is different for different master-data element types and companies.
For example, how a customer is created depends largely upon a
company's business rules, industry segment, and data systems. One
company may have multiple customer-creation vectors, such as
through the Internet, directly through account representatives, or
through outlet stores. Another company may only allow customers to
be created through direct contact over the phone with its call center.
Further, how a customer element is created is certainly different from
how a vendor element is created. The following table illustrates the
differing CRUD cycles for several common master-data subject areas.
Sample CRUD cycle

|         | Customer | Product | Asset |
|---------|----------|---------|-------|
| Create  | Customer visit, such as to Web site or facility; account created | Product purchased or manufactured; SCM involvement | Unit acquired by opening a PO; approval process necessary |
| Read    | Contextualized views based on credentials of viewer | Periodic inventory catalogues | Periodic reporting purposes, figuring depreciation, verification |
| Update  | Address, discounts, phone number, preferences, credit accounts | Packaging changes, raw materials changes | Transfers, maintenance, accident reports |
| Destroy | Death, bankruptcy, liquidation, do-not-call | Canceled, replaced, no longer available | Obsolete, sold, destroyed, stolen, scrapped |
| Search  | CRM system, call-center system, contact-management system | ERP system, orders-processing system | GL tracking, asset DB management |

Cardinality
As cardinality (the number of elements in a set) decreases, the
likelihood of an element being treated as a master-data element (even
a commonly accepted subject area, such as customer) decreases. For
example, if a company has only three customers, most likely they
would not consider those customers master data, at least not in the
context of supporting them with a master-data management solution,
simply because there is no benefit to managing those customers with a
master-data infrastructure. Yet, a company with thousands of
customers would consider Customer an important subject area,
because of the concomitant issues and benefits around managing such
a large set of entities. The customer value to each of these companies
is the same. Both rely upon their customers for business. One needs a
customer master-data solution; the other does not. Cardinality does
not change the classification of a given entity type; however, the
importance of having a solution for managing an entity type increases
as the cardinality of the entity type increases.
Lifetime
Master data tends to be less volatile than transactional data. As it
becomes more volatile, it typically is considered more transactional.
For example, some might consider "contract" a master-data element.
Others might consider it a transaction. Depending on the lifespan of a
contract, it can go either way. An agency promoting professional
athletes might consider their contracts as master data. Each is
different from the other and typically has a lifetime of greater than a
year. It may be tempting to simply have one master-data item called
"athlete." However, athletes tend to have more than one contract at
any given time: one with their teams and others with companies for
endorsing products. The agency would need to manage all those
contracts over time, as elements of the contract are renegotiated or
athletes traded. Other contracts (for example, contracts for detailing
cars or painting a house) are more like a transaction. They are
one-time, short-lived agreements to provide services for payment and
are typically fulfilled and destroyed within hours.
Complexity
Simple entities, even valuable entities, are rarely a challenge to
manage and are rarely considered master-data elements. The less
complex an element, the less likely the need to manage change for
that element. Typically, such assets are simply collected and tallied.
For example, Fort Knox likely would not track information on each
individual gold bar stored there, but rather only keep a count of them.
The value of each gold bar is substantial, the cardinality high, and the
lifespan long; yet, the complexity is low.
Value
The more valuable the data element is to the company, the more likely
it will be considered a master data element. Value and complexity work
together.
Volatility
While master data is typically less volatile than transactional data,
entities with attributes that do not change at all typically do not require
a master-data solution. For example, rare coins would seem to meet
many of the criteria for a master-data treatment. A rare-coin collector
would likely have many rare coins. So, cardinality is high. They are
valuable. They are also complex. For example, rare coins have a
history and description. There are attributes, such as condition of
obverse, reverse, legend, inscription, rim, and field. There are other
attributes, such as designer initials, edge design, layers, and portrait.
Yet, rare coins do not need to be managed as a master-data item,
because they don't change over time, or at least they don't change
enough. There may need to be more information added, as the history
of a particular coin is revealed or if certain attributes must be
corrected. But, generally speaking, rare coins would not be managed
through a master-data management system, because they are not
volatile enough to warrant a solution.
Reuse
One of the primary drivers of master-data management is reuse. For
example, in a simple world, the CRM system would manage everything
about a customer and never need to share any information about the
customer with other systems. However, in today's complex
environments, customer information needs to be shared across
multiple applications. That's where the trouble begins. Because (for a
number of reasons) access to a master datum is not always available,
people start storing master data in various locations, such as
spreadsheets and application private stores. There are still reasons,
such as data-quality degradation and decay, to manage master data
that is not reused across the enterprise. However, if a master-data
entity is reused in multiple systems, it's a sure bet that it should be
managed with a master-data management system.
To summarize, while it is simple to enumerate the various master-data
entity types, it is sometimes more challenging to decide which data
items in a company should be treated as master data. Often, data that
does not normally comply with the definition for master data may need
to be managed as such, and data that does comply with the definition
may not. Ultimately, when deciding on what entity types should be
treated as master data, it is better to categorize them in terms of their
behavior and attributes within the context of the business needs than
to rely on simple lists of entity types.
Why Should I Manage Master Data?
Because it is used by multiple applications, an error in master data can
cause errors in all the applications that use it. For example, an
incorrect address in the customer master might mean orders, bills, and
marketing literature are all sent to the wrong address. Similarly, an
incorrect price on an item master can be a marketing disaster, and an
incorrect account number in an Account Master can lead to huge fines
or even jail time for the CEO, a career-limiting move for the person
who made the mistake!
Here is a typical master-data horror story: A credit-card customer
moves from 2847 North 9th St. to 1001 11th St. North. The customer
changes his billing address immediately, but does not receive a bill for
several months. One day, the customer receives a threatening phone
call from the credit-card billing department, asking why the bill has not
been paid. The customer verifies that they have the new address, and
the billing department verifies that the address on file is 1001 11th St.
N. The customer asks for a copy of the bill, to settle the account. After
two more weeks without a bill, the customer calls back and finds the
account has been turned over to a collection agency. This time, they
find out that even though the address in the file was 1001 11th St. N,
the billing address is 101 11th St. N. After a bunch of phone calls and
letters between lawyers, the bill finally gets resolved and the
credit-card company has lost a customer for life. In this case, the
master copy of the data was accurate, but another copy of it was
flawed. Master data must be both correct and consistent.
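The kind of check that would have caught this problem can be sketched in Python. The function name, record layout, and system names below are assumptions made for illustration; the sketch simply compares each system's copy of a field with the master copy and reports the systems that disagree.

```python
def find_inconsistent_copies(master, copies, field):
    """Return the names of systems whose copy of `field` differs from the master."""
    expected = master[field].strip().lower()
    return sorted(
        system
        for system, record in copies.items()
        if record.get(field, "").strip().lower() != expected
    )


master_record = {"customer_id": 42, "address": "1001 11th St. N"}
system_copies = {
    "customer_service": {"customer_id": 42, "address": "1001 11th St. N"},
    "billing": {"customer_id": 42, "address": "101 11th St. N"},
}

print(find_inconsistent_copies(master_record, system_copies, "address"))
```

A real comparison would need address normalization rather than a simple case-insensitive match, but even this naive version flags the billing copy.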
Even if the master data has no errors, few organizations have just one
set of master data. Many companies grow through mergers and
acquisitions. Each company you acquire comes with its own customer
master, item master, and so forth. This would not be bad if you could
just union the new master data with your current master data, but
unless the company you acquire is in a completely different business in
a faraway country, there's a very good chance that some customers
and products will appear in both sets of master data, usually with
different formats and different database keys. If both companies use
the Dun & Bradstreet number or Social Security number as the
customer identifier, discovering which customer records are for the
same customer is a straightforward issue; but that seldom happens. In
most cases, customer numbers and part numbers are assigned by the
software that creates the master records, so the chances of the same
customer or the same product having the same identifier in both
databases are pretty remote. Item masters can be even harder to
reconcile, if equivalent parts are purchased from different vendors with
different vendor numbers.
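A minimal Python sketch of this point follows. The record layout (customer_no, name, duns) and every sample value are made up for illustration. Matching on a shared external identifier pairs the right records, while matching on system-assigned customer numbers pairs two unrelated companies.

```python
def match_on_shared_identifier(master_a, master_b, identifier):
    """Pair records from two masters that carry the same identifier value."""
    index_b = {row[identifier]: row for row in master_b if row.get(identifier)}
    return [
        (row, index_b[row[identifier]])
        for row in master_a
        if row.get(identifier) in index_b
    ]


company_a = [
    {"customer_no": 1001, "name": "Acme Corp", "duns": "123456789"},
    {"customer_no": 1002, "name": "Bolt Ltd", "duns": "987654321"},
]
company_b = [
    {"customer_no": 1001, "name": "Cog Inc", "duns": "555555555"},
    {"customer_no": 77, "name": "ACME Corporation", "duns": "123456789"},
]

# A shared external identifier finds the real duplicate.
by_duns = match_on_shared_identifier(company_a, company_b, "duns")
print([(a["name"], b["name"]) for a, b in by_duns])

# System-assigned numbers collide by accident and pair unrelated customers.
by_number = match_on_shared_identifier(company_a, company_b, "customer_no")
print([(a["name"], b["name"]) for a, b in by_number])
```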
Merging master lists together can be very difficult. The same customer
may have different names, customer numbers, addresses, and phone
numbers in different databases. For example, William Smith might
appear as Bill Smith, Wm. Smith, and William Smithe. Normal database
joins and searches will not be able to resolve these differences. A very
sophisticated tool that understands nicknames, alternate spellings, and
typing errors will be required. The tool will probably also have to
recognize that different name variations can be resolved, if they all live
at the same address or have the same phone number. While creating a
clean master list can be a daunting challenge, there are many positive
benefits to your bottom line from a common master list:
- A single, consolidated bill saves money and improves customer
  satisfaction.
- Sending the same marketing literature to a customer from
  multiple customer lists wastes money and irritates the customer.
- Before you turn a customer account over to a collection agency,
  it would be good to know if they owe other parts of your company
  money or, more importantly, that they are another division's
  biggest customer.
- Stocking the same item under different part numbers is not only
  a waste of money and shelf space, but can potentially lead to
  artificial shortages.
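To make the name-variation problem described before this list concrete, here is a small Python sketch. It is not how any particular MDM tool works: the nickname table, the 0.85 threshold, and the record fields are all assumptions, and the string comparison uses the standard library's difflib.SequenceMatcher as a stand-in for a real matching engine.

```python
from difflib import SequenceMatcher

# Tiny illustrative nickname table; a real tool would use a much larger one.
NICKNAMES = {"bill": "william", "wm": "william", "will": "william"}


def canonical_name(name):
    """Lowercase, drop periods, and expand known nicknames in the first name."""
    parts = name.lower().replace(".", "").split()
    if parts:
        parts[0] = NICKNAMES.get(parts[0], parts[0])
    return " ".join(parts)


def name_similarity(a, b):
    """Similarity in [0, 1] between two canonicalized names."""
    return SequenceMatcher(None, canonical_name(a), canonical_name(b)).ratio()


def likely_same_person(rec_a, rec_b, name_threshold=0.85):
    """Treat two records as one person if names are close and a contact detail agrees."""
    names_close = name_similarity(rec_a["name"], rec_b["name"]) >= name_threshold
    shared_contact = (
        rec_a["phone"] == rec_b["phone"] or rec_a["address"] == rec_b["address"]
    )
    return names_close and shared_contact


william = {"name": "William Smith", "phone": "555-0100", "address": "1 Main St"}
wm = {"name": "Wm. Smith", "phone": "555-0100", "address": "PO Box 9"}
smithe = {"name": "William Smithe", "phone": "555-0199", "address": "1 Main St"}
other = {"name": "William Smith", "phone": "555-0142", "address": "77 Elm St"}

for candidate in (wm, smithe, other):
    print(candidate["name"], likely_same_person(william, candidate))
```

Requiring a shared phone number or address keeps two different people who happen to be named William Smith from being merged on name alone.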
The recent movements toward SOA and SaaS make Master Data
Management a critical issue. For example, if you create a single
customer service that communicates through well-defined XML
messages, you may think you have defined a single view of your
customers. But if the same customer is stored in five databases with
three different addresses and four different phone numbers, what will
your customer service return? Similarly, if you decide to subscribe to a
CRM service provided through SaaS, the service provider will need a
list of customers for their database. Which one will you send them?
For all these reasons, maintaining a high-quality, consistent set of
master data for your organization is rapidly becoming a necessity. The
systems and processes required to maintain this data are known
as Master Data Management.
What Is Master Data Management?
For purposes of this article, we define Master Data Management (MDM)
as the technology, tools, and processes required to create and
maintain consistent and accurate lists of master data. There are a
couple of things worth noting in this definition. One is that MDM is not
just a technological problem. In many cases, fundamental changes to
business process will be required to maintain clean master data, and
some of the most difficult MDM issues are more political than technical.
The second thing to note is that MDM includes both creating and
maintaining master data. Investing a lot of time, money, and effort in
creating a clean, consistent set of master data is a wasted effort unless
the solution includes tools and processes to keep the master data
clean and consistent as it is updated and expanded.
While MDM is most effective when applied to all the master data in an
organization, in many cases the risk and expense of an enterprise-wide
effort are difficult to justify. It may be easier to start with a few key
sources of Master Data and expand the effort, once success has been
demonstrated and lessons have been learned. If you do start small,
you should include an analysis of all the master data that you might
eventually want to include, so you do not make design decisions or tool
choices that will force you to start over when you try to incorporate a
new data source. For example, if your initial Customer master
implementation only includes the 10,000 customers your direct-sales
force deals with, you don't want to make design decisions that will
preclude adding your 10,000,000 Web customers later.
An MDM project plan will be influenced by requirements, priorities,
resource availability, time frame, and the size of the problem. Most
MDM projects include at least these phases:
1. Identify sources of master data. This step is usually a very
   revealing exercise. Some companies find they have dozens of
   databases containing customer data that the IT department did
   not know existed.
2. Identify the producers and consumers of the master
   data. Which applications produce the master data identified in
   the first step, and (generally more difficult to determine) which
   applications use the master data? Depending on the approach
   you use for maintaining the master data, this step might not be
   necessary. For example, if all changes are detected and handled
   at the database level, it probably does not matter where the
   changes come from.
3. Collect and analyze metadata about your master
   data. For all the sources identified in step one, what are the
   entities and attributes of the data, and what do they mean? This
   should include attribute name, datatype, allowed values,
   constraints, default values, dependencies, and who owns the
   definition and maintenance of the data. The owner is the most
   important and often the hardest to determine. If you have a
   repository loaded with all your metadata, this step is an easy one.
   If you have to start from database tables and source code, this
   could be a significant effort.
4. Appoint data stewards. These should be the people with the
   knowledge of the current source data and the ability to determine
   how to transform the source into the master-data format. In
   general, stewards should be appointed from the owners of each
   master-data source, the architects responsible for the MDM
   systems, and representatives from the business users of the
   master data.
5. Implement a data-governance program and data-governance
   council. This group must have the knowledge and
   authority to make decisions on how the master data is
   maintained, what it contains, how long it is kept, and how
   changes are authorized and audited. Hundreds of decisions must
   be made in the course of a master-data project, and if there is not
   a well-defined decision-making body and process, the project can
   fail, because the politics prevent effective decision making.
6. Develop the master-data model. Decide what the master
   records look like: what attributes are included, what size and
   datatype they are, what values are allowed, and so forth. This
   step should also include the mapping between the master-data
   model and the current data sources. This is normally both the
   most important and most difficult step in the process. If you try to
   make everybody happy by including all the source attributes in
   the master entity, you often end up with master data that is too
   complex and cumbersome to be useful. For example, if you
   cannot decide whether weight should be in pounds or kilograms,
   one approach would be to include both (WeightLb and WeightKg).
   While this might make people happy, you are wasting megabytes
   of storage for numbers that can be calculated in microseconds, as
   well as running the risk of creating inconsistent data (WeightLb =
   5 and WeightKg = 5). While this is a pretty trivial example, a
   bigger issue would be maintaining multiple part numbers for the
   same part. As in any committee effort, there will be fights and
   deals resulting in sub-optimal decisions. It's important to work out
   the decision process, priorities, and final decision maker in
   advance, to make sure things run smoothly.
7. Choose a toolset. You will need to buy or build tools to create
   the master lists by cleaning, transforming, and merging the
   source data. You will also need an infrastructure to use and
   maintain the master list. These functions are covered in detail
   later in the paper.
   You can use a single toolset from a single vendor for all of these
   functions, or you might want to take a best-of-breed approach. In
   general, the techniques to clean and merge data are different for
   different types of data, so there are not a lot of tools that span
   the whole range of master data.
   The two main categories of tools are Customer Data Integration
   (CDI) tools for creating the customer master and Product
   Information Management (PIM) tools for creating the product
   master. Some tools will do both, but generally they are better at
   one or the other.
   The toolset should also have support for finding and fixing
   data-quality issues and maintaining versions and hierarchies.
   Versioning is a critical feature, because understanding the history
   of a master-data record is vital to maintaining its quality and
   accuracy over time. For example, if a merge tool combines two
   records for John Smith in Boston, and you decide there really are
   two different John Smiths in Boston, you need to know what the
   records looked like before they were merged, in order to
   "unmerge" them.
8. Design the infrastructure. Once you have clean, consistent
   master data, you will need to expose it to your applications and
   provide processes to manage and maintain it. This step is a big
   enough issue that we devote a section to it later in the document.
   When this infrastructure is implemented, you will have a number of
   applications that will depend on it being available, so reliability
   and scalability are important considerations to include in your
   design. In most cases, you will have to implement significant
   parts of the infrastructure yourself, because it will be designed to
   fit into your current infrastructure, platforms, and applications.
9. Generate and test the master data. This step is where you
   use the tools you have developed or purchased to merge your
   source data into your master-data list. This is often an iterative
   process requiring tinkering with rules and settings to get the
   matching right. This process also requires a lot of manual
   inspection to ensure that the results are correct and meet the
   requirements established for the project. No tool will get the
   matching done correctly 100 percent of the time, so you will have
   to weigh the consequences of false matches versus missed
   matches to determine how to configure the matching tools. False
   matches can lead to customer dissatisfaction, if bills are
   inaccurate or the wrong person is arrested. Too many missed
   matches make the master data less useful, because you are not
   getting the benefits you invested in MDM to get.
10. Modify the producing and consuming
   systems. Depending on how your MDM implementation is
   designed, you might have to change the systems that produce,
   maintain, or consume master data to work with the new source of
   master data. If the master data is used in a system separate from
   the source systems (a data warehouse, for example), the source
   systems might not have to change. If the source systems are
   going to use the master data, however, there will likely be
   changes required. Either the source systems will have to access
   the new master data or the master data will have to be
   synchronized with the source systems, so that the source
   systems have a copy of the cleaned-up master data to use. If it's
   not possible to change one or more of the source systems, either
   that source system might not be able to use the master data or
   the master data will have to be integrated with the source
   system's database through external processes, such as triggers
   and SQL commands.
   The source systems generating new records should be changed
   to look up existing master record sets before creating new
   records or updating existing master records. This ensures that the
   quality of data being generated upstream is good, so that the
   MDM can function more efficiently and the application itself
   manages data quality. MDM should be leveraged not only as a
   system of record, but also as an application that promotes
   cleaner and more efficient handling of data across all applications
   in the enterprise. As part of MDM strategy, all three pillars of data
   management need to be looked into: data origination, data
   management, and data consumption. It is not possible to have a
   robust enterprise-level MDM strategy if any one of these aspects
   is ignored.
11. Implement the maintenance processes. As we stated
   earlier, any MDM implementation must incorporate tools,
   processes, and people to maintain the quality of the data. All data
   must have a data steward who is responsible for ensuring the
   quality of the master data. The data steward is normally a
   business person who has knowledge of the data, can recognize
   incorrect data, and has the knowledge and authority to correct
   the issues. The MDM infrastructure should include tools that help
   the data steward recognize issues and simplify corrections. A
   good data-stewardship tool should point out questionable
   matches that were made (customers with different names and
   customer numbers that live at the same address, for example).
   The steward might also want to review items that were added as
   new, because the match criteria were close but below the
   threshold. It is important for the data steward to see the history
   of changes made to the data by the MDM systems, to isolate the
   source of errors and undo incorrect changes. Maintenance also
   includes the processes to pull changes and additions into the
   MDM system, and to distribute the cleansed data to the required
   places.
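The versioning requirement from the toolset step (being able to "unmerge" two John Smith records) can be sketched as follows. This is only one possible design, written in Python under stated assumptions: the MasterRecord class, its fields, and the rule that the surviving record wins conflicts are all hypothetical, and the sketch undoes only the most recent merge.

```python
from dataclasses import dataclass, field


@dataclass
class MasterRecord:
    master_id: int
    attributes: dict
    # Snapshots (id, attributes) of the records that were merged into this one.
    merged_from: list = field(default_factory=list)


def merge(survivor, other):
    """Fold `other` into `survivor`, keeping pre-merge snapshots of both.

    Simplification: any merge history `other` already had is not carried over.
    """
    snapshots = [
        (survivor.master_id, dict(survivor.attributes)),
        (other.master_id, dict(other.attributes)),
    ]
    combined = {**other.attributes, **survivor.attributes}  # survivor wins conflicts
    return MasterRecord(survivor.master_id, combined, survivor.merged_from + snapshots)


def unmerge(record):
    """Undo the most recent merge and return the two records as they were."""
    if len(record.merged_from) < 2:
        raise ValueError("record has no merge history to undo")
    *earlier, (id_a, attrs_a), (id_b, attrs_b) = record.merged_from
    return (
        MasterRecord(id_a, dict(attrs_a), list(earlier)),
        MasterRecord(id_b, dict(attrs_b)),
    )


john_a = MasterRecord(1, {"name": "John Smith", "city": "Boston", "phone": "555-0101"})
john_b = MasterRecord(2, {"name": "John Smith", "city": "Boston", "email": "jsmith@example.com"})

merged = merge(john_a, john_b)
restored_a, restored_b = unmerge(merged)
print(merged.attributes)
print(restored_a.attributes, restored_b.attributes)
```

Without the stored snapshots there would be no way to tell which John Smith the phone number and the e-mail address originally belonged to.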
As you can see, MDM is a complex process that can go on for a long
time. Like most things in software, the key to success is to implement
MDM incrementally, so that the business realizes a series of short-term
benefits while the complete project is a long-term process. No MDM
project can be successful without the support and participation of the
business users. IT professionals do not have the domain knowledge to
create and maintain high-quality master data. Any MDM project that
does not include changes to the processes that create, maintain, and
validate master data is likely to fail. The rest of this paper will cover
the details of the technology and processes for creating and
maintaining master data.
How Do I Create a Master List?
Whether you buy a tool or decide to roll your own, there are two basic
steps to creating master data: clean and standardize the data, and
match data from all the sources to consolidate duplicates. Before you
can start cleaning and normalizing your data, you must understand the
data model for the master data. As part of the modeling process, the
contents of each attribute were defined, and a mapping was defined
from each source system to the master-data model. This information is
used to define the transformations necessary to clean your source
data.
Cleaning the data and transforming it into the master data model is
very similar to the Extract, Transform, and Load (ETL) processes used
to populate a data warehouse. If you already have ETL tools and
transformations defined, it might be easier just to modify these as
required for the master data, instead of learning a new tool. Here are
some typical data-cleansing functions:
- Normalize data formats. Make all the phone numbers look the
  same, transform addresses (and so on) to a common format.
- Replace missing values. Insert defaults, look up ZIP codes from
  the address, look up the Dun & Bradstreet number.
- Standardize values. Convert all measurements to metric,
  convert prices to a common currency, change part numbers to an
  industry standard.
- Map attributes. Parse the first name and last name out of a
  contact-name field, move Part# and partno to the PartNumber
  field.
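A few of these cleansing functions are simple enough to sketch in Python. The function names are hypothetical, the phone format assumes ten-digit North American numbers, and the conversion factor is approximate; a production tool would handle far more cases than this.

```python
import re

LB_PER_KG = 2.20462  # approximate conversion factor


def normalize_phone(raw):
    """Keep digits only and format 10-digit numbers as NNN-NNN-NNNN."""
    digits = re.sub(r"\D", "", raw)
    if len(digits) == 11 and digits.startswith("1"):
        digits = digits[1:]          # drop a leading country code
    if len(digits) != 10:
        raise ValueError(f"cannot normalize phone number: {raw!r}")
    return f"{digits[:3]}-{digits[3:6]}-{digits[6:]}"


def pounds_to_kilograms(pounds):
    """Standardize a weight to metric, rounded to three decimal places."""
    return round(pounds / LB_PER_KG, 3)


def split_contact_name(contact_name):
    """Parse 'First Last' or 'Last, First' into (first_name, last_name)."""
    if "," in contact_name:
        last, first = (part.strip() for part in contact_name.split(",", 1))
    else:
        first, _, last = contact_name.strip().rpartition(" ")
    return first, last


def map_part_number(row):
    """Move 'Part#' or 'partno' into a single 'PartNumber' field."""
    cleaned = {k: v for k, v in row.items() if k not in ("Part#", "partno")}
    value = row.get("Part#", row.get("partno"))
    if value is not None:
        cleaned["PartNumber"] = str(value).strip()
    return cleaned


print(normalize_phone("(425) 555-0100"))
print(split_contact_name("Smith, William"))
print(map_part_number({"partno": " A-100 ", "Qty": 2}))
```

Note that normalize_phone raises an error instead of guessing when the input does not fit the expected shape, so that such rows can be handled by hand.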
Most tools will cleanse the data that they can, and put the rest into an
error table for hand processing. Depending on how the matching tool
works, the cleansed data will be put into a master table or a series of
staging tables. As each source is cleansed, the output should be
examined to ensure the cleansing process is working correctly.
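The "cleanse what you can, park the rest" pattern might look like the following Python sketch. The function names and the five-digit postal-code rule are assumptions made for the example; the error list plays the role of the error table.

```python
def cleanse_source(rows, cleanse_row):
    """Apply `cleanse_row` to each row; collect failures instead of stopping."""
    cleansed, errors = [], []
    for row in rows:
        try:
            cleansed.append(cleanse_row(row))
        except (KeyError, ValueError) as exc:
            errors.append({"row": row, "reason": str(exc)})
    return cleansed, errors


def require_postal_code(row):
    """Example rule: a row must carry a five-digit postal code."""
    code = str(row["postal_code"]).strip()
    if not (len(code) == 5 and code.isdigit()):
        raise ValueError(f"bad postal code: {code!r}")
    return {**row, "postal_code": code}


source_rows = [
    {"name": "Acme", "postal_code": " 98052"},
    {"name": "Bolt", "postal_code": "9805"},
    {"name": "Cog"},
]

cleansed, errors = cleanse_source(source_rows, require_postal_code)
print(len(cleansed), "cleansed;", len(errors), "sent for hand processing")
```

Keeping the original row and a reason with each failure gives the person doing the hand processing enough context to fix the record.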
Matching master-data records to eliminate duplicates is both the
hardest and most important step in creating master data. False
matches can actually lose data (two Acme Corporations become one,
for example), and missed matches reduce the value of maintaining a
common list. The matching accuracy of MDM tools is one of the most
important purchase criteria. Some matches are pretty trivial to do. If
you have Social Security numbers for all your customers, or if all your
products use a common numbering scheme, a database JOIN will find
most of the matches. This hardly ever happens in the real world,
however, so matching algorithms are normally very complex and
sophisticated. Customers can be matched on name, maiden name,
nickname, address, phone number, credit-card number, and so on,
while products are matched on name, description, part number,
specifications, and price. The more attributes that match and the closer
the match, the higher the degree of confidence the MDM system has in
the match. This confidence factor is computed for each match, and if it
surpasses a threshold, the records match. The threshold is normally
adjusted depending on the consequences of a false match. For
example, you might specify that if the confidence level is over 95
percent, the records are merged automatically, and if the confidence is
between 80 percent and 95 percent, a data steward should approve
the match before they are merged.
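A confidence factor with the two thresholds from the example can be sketched as follows. This is a deliberately simple stand-in, not the algorithm of any MDM product: the attribute weights are assumptions, and the string similarity comes from Python's standard difflib module, whereas real matching tools use far more sophisticated techniques.

```python
from difflib import SequenceMatcher

# Assumed attribute weights; they sum to 1.0 so the score stays in [0, 1].
WEIGHTS = {"name": 0.5, "address": 0.3, "phone": 0.2}
AUTO_MERGE = 0.95    # over 95 percent: merge automatically
NEEDS_REVIEW = 0.80  # 80 to 95 percent: a data steward approves


def similarity(a, b):
    """Similarity of two strings in [0, 1]; a missing value never matches."""
    if not a or not b:
        return 0.0
    return SequenceMatcher(None, a.strip().lower(), b.strip().lower()).ratio()


def confidence(rec_a, rec_b):
    """Weighted sum of per-attribute similarities."""
    return sum(
        weight * similarity(rec_a.get(field), rec_b.get(field))
        for field, weight in WEIGHTS.items()
    )


def classify(score):
    """Map a confidence score to an action."""
    if score > AUTO_MERGE:
        return "merge"
    if score >= NEEDS_REVIEW:
        return "review"
    return "no_match"
```

Raising AUTO_MERGE makes false matches less likely at the price of more work for the data steward, which is the adjustment the text describes.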
Most merge tools merge one set of input into the master list, so the
best procedure is to start the list with the data in which you have the
most confidence, and then merge the other sources in one at a time. If
you have a lot of data and a lot of problems with it, this process can
take a long time. You might want to start with the data from which you
expect to get the most benefit from consolidating; run a pilot project
with that data, to ensure your processes work and you are seeing the
business benefits you expect; and then start adding other sources, as
time and resources permit. This approach means your project will take
longer and possibly cost more, but the risk is lower. This approach also
lets you start with a few organizations and add more as the project
demonstrates success, instead of trying to get everybody on board
from the start.
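Merging sources one at a time, most trusted first, might look like the sketch below. It makes two simplifying assumptions that are not in the text: records are matched on an exact shared key (customer_id is a hypothetical name) rather than by a confidence-based matcher, and a value already in the master is never overwritten, so later sources only fill gaps.

```python
def merge_source(master, source, key):
    """Merge one source into the master; existing (more trusted) values win."""
    for record in source:
        record_key = record[key]
        if record_key in master:
            target = master[record_key]
            for field, value in record.items():
                # Only fill fields the master does not have a value for yet.
                if target.get(field) in (None, ""):
                    target[field] = value
        else:
            master[record_key] = dict(record)
    return master


def build_master(sources_by_trust, key="customer_id"):
    """Build a master list from sources ordered from most to least trusted."""
    master = {}
    for source in sources_by_trust:
        merge_source(master, source, key)
    return master
```

Because the sources are processed in order, a pilot project can call build_master with one or two sources and add more to the list later.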
Another factor to consider when merging your source data into the
master list is privacy. When customers become part of the customer
master, their information might be visible to any of the applications
that have access to the customer master. If the customer data was
obtained under a privacy policy that limited its use to a particular
application, you might not be able to merge it into the customer
master. You might want to add a lawyer to your MDM planning team.
At this point, if your goal was to produce a list of master data, you are
done. Print it out or burn it to a CD, and move on. If you want your
master data to stay current as data is added and changed, you will
have to develop infrastructure and processes to manage the master
data over time. The next section provides some options on how to do
just that.
How Do I Maintain a Master List?
There are many different tools and techniques for managing and using
master data. We will cover three of the more common scenarios here:
Single-copy approach. In this approach, there is only one
master copy of the master data. All additions and changes are
made directly to the master data. All applications that use master
data are rewritten to use the new data instead of their current
data. This approach guarantees consistency of the master data,
but in most cases it's not practical. Modifying all your applications
to use a new data source with a different schema and different
data is, at least, very expensive; if some of your applications are
purchased, it might even be impossible.
Multiple copies, single maintenance. In this approach,
master data is added or changed in the single master copy of the
data, but changes are sent out to the source systems in which
copies are stored locally. Each application can update the parts of
the data that are not part of the master data, but they cannot
change or add master data. For example, the inventory system
might be able to change quantities and locations of parts, but
new parts cannot be added, and the attributes that are included
in the product master cannot be changed. This reduces the
number of application changes that will be required, but the
applications will minimally have to disable functions that add or
update master data. Users will have to learn new applications to
add or modify master data, and some of the things they normally
do will not work anymore.
Continuous merge. In this approach, applications are allowed
to change their copy of the master data. Changes made to the
source data are sent to the master, where they are merged into
the master list. The changes to the master are then sent to the
source systems and applied to the local copies. This approach
requires few changes to the source systems; if necessary, the
change propagation can be handled in the database, so no
application code is changed. On the surface, this seems like the
ideal solution. Application changes are minimized, and no
retraining is required. Everybody keeps doing what they are
doing, but with higher-quality, more complete data. This
approach does have several issues:
o Update conflicts are possible and difficult to
reconcile. What happens if two of the source systems
change a customer's address to different values? There's no
way for the MDM system to decide which one to keep, so
intervention by the data steward is required; in the
meantime, the customer has two different addresses. This
must be addressed by creating data-governance rules and
standard operating procedures, to ensure that update
conflicts are reduced or eliminated.
o Additions must be remerged. When a customer is
added, there is a chance that another system has already
added the customer. To deal with this situation, all data
additions must go through the matching process again to
prevent new duplicates in the master.
o Maintaining consistent values is more difficult. If the
weight of a product is converted from pounds to kilograms
and then back to pounds, rounding can change the original
weight. This can be disconcerting to a user who enters a
value and then sees it change a few seconds later.
In general, all these things can be planned for and dealt with, making
the user's life a little easier, at the expense of a more complicated
infrastructure to maintain and more work for the data stewards. This
might be an acceptable trade-off, but it's one that should be made
consciously.
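Two of the continuous-merge issues are easy to demonstrate. The sketch below is illustrative only; the change-tuple layout and the two-decimal storage precision are assumptions. The first function separates unambiguous updates from conflicting ones that need a data steward, and the second shows a pounds-to-kilograms round trip that changes the value the user entered.

```python
from collections import defaultdict
from decimal import Decimal, ROUND_HALF_UP

KG_PER_POUND = Decimal("0.45359237")
TWO_PLACES = Decimal("0.01")  # assumed storage precision in both systems


def split_changes(changes):
    """Split (source, record_id, field, value) changes into accepted and conflicting.

    A conflict is two sources setting the same field of the same record
    to different values; those are left for the data steward.
    """
    by_target = defaultdict(dict)
    for source, record_id, field, value in changes:
        by_target[(record_id, field)][source] = value

    accepted, conflicts = {}, {}
    for target, values_by_source in by_target.items():
        if len(set(values_by_source.values())) == 1:
            accepted[target] = next(iter(values_by_source.values()))
        else:
            conflicts[target] = values_by_source
    return accepted, conflicts


def round_trip_pounds(pounds):
    """Convert pounds to kilograms and back, rounding at each step."""
    kg = (Decimal(pounds) * KG_PER_POUND).quantize(TWO_PLACES, rounding=ROUND_HALF_UP)
    return (kg / KG_PER_POUND).quantize(TWO_PLACES, rounding=ROUND_HALF_UP)
```

With these assumptions, a weight entered as 10 pounds is stored as 4.54 kilograms and comes back as 10.01 pounds.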
Versioning and Auditing
No matter how you manage your master data, it's important to be able
to understand how the data got to the current state. For example, if a
customer record was consolidated from two different merged records,
you might need to know what the original records looked like, in case a
data steward determines that the records were merged by mistake and
really should be two different customers. The version management
should include a simple interface for displaying versions and reverting
all or part of a change to a previous version. The normal branching of
versions and grouping of changes that source-control systems use can
also be very useful for maintaining different derivation changes and
reverting groups of changes to a previous branch.
Data stewardship and compliance requirements will often include a
way to determine who made each change and when it was made. To
support these requirements, an MDM system should include a facility
for auditing changes to the master data. In addition to keeping an audit
log, the MDM system should include a simple way to find the particular
change you are looking for. An MDM system can audit thousands of
changes a day, so search and reporting facilities for the audit log are
important.
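A minimal sketch of an audited store follows, assuming an in-memory dictionary of records; the class and field names are hypothetical. Each change records who made it, when, and the old and new values, which is enough to search the log and to revert a single change. A real system would persist the log and would also need the branching and grouping of changes described above.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class AuditEntry:
    when: datetime
    user: str
    record_id: int
    field: str
    old_value: object
    new_value: object


class AuditedStore:
    """Master records plus an append-only log of every change."""

    def __init__(self):
        self.records = {}
        self.log = []

    def update(self, user, record_id, field, new_value, when=None):
        """Apply a change and record who made it, when, and what it replaced."""
        record = self.records.setdefault(record_id, {})
        entry = AuditEntry(when or datetime.now(), user, record_id, field,
                           record.get(field), new_value)
        record[field] = new_value
        self.log.append(entry)
        return entry

    def search(self, user=None, record_id=None, since=None):
        """Return log entries matching every filter that was supplied."""
        return [e for e in self.log
                if (user is None or e.user == user)
                and (record_id is None or e.record_id == record_id)
                and (since is None or e.when >= since)]

    def revert(self, entry, user, when=None):
        """Undo one change by writing its old value back as a new audited change."""
        return self.update(user, entry.record_id, entry.field, entry.old_value, when)
```

Reverting through update keeps the revert itself in the audit log, so the history shows both the mistaken change and its correction.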
Hierarchy Management
In addition to the master data itself, the MDM system must maintain
data hierarchies: for example, bill of materials for products, sales
territory structure, organization structure for customers, and so forth.
It's important for the MDM system to capture these hierarchies, but it's
also useful for an MDM system to be able to modify the hierarchies
independently of the underlying systems. For example, when an
employee moves to a different cost center, there might be impacts to
the Travel and Expense system, payroll, time reporting, reporting
structures, and performance management. If the MDM system
manages hierarchies, a change to the hierarchy in a single place can
propagate the change to all the underlying systems. There might also
be reasons to maintain hierarchies in the MDM system that do not exist
in the source systems. For example, revenue and expenses might need
to be rolled up into territory or organizational structures that do not
exist in any single source system. Planning and forecasting might also
require temporary hierarchies to calculate "what if" numbers for
proposed organizational changes. Historical hierarchies are also
required in many cases to roll up financial information into structures
that existed in the past, but not in the current structure. For these
reasons, a powerful, flexible hierarchy-management feature is an
important part of an MDM system.
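Rolling the same figures up through more than one hierarchy can be sketched as below. The representation is an assumption made for illustration: each hierarchy is a plain child-to-parent mapping, and amounts are recorded only at the lowest level. Current, historical, and "what if" structures are then just different mappings applied to the same amounts.

```python
from collections import defaultdict


def roll_up(parent_of, amounts):
    """Total each node's amount into the node itself and all of its ancestors.

    parent_of maps a node to its parent; nodes missing from it are roots.
    """
    totals = defaultdict(float)
    for node, amount in amounts.items():
        current, seen = node, set()
        while current is not None:
            if current in seen:
                raise ValueError(f"cycle in hierarchy at {current!r}")
            seen.add(current)
            totals[current] += amount
            current = parent_of.get(current)
    return dict(totals)
```

For example, revenue recorded per city can be rolled up once with a territory mapping and once with an organizational mapping, without changing the source figures.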
"on!lusion
The recent emphasis on regulatory compliance, SOA, and mergers and
acquisitions has made the creating and maintaining of accurate and
complete master data a business imperative. Both large and small
businesses must develop data-maintenance and governance processes
and procedures, to obtain and maintain accurate master data. While
it's easy to think of master-data management as a technological issue,
a purely technological solution without corresponding changes to
business processes and controls will likely fail to produce satisfactory
results. This paper has covered the reasons for adopting master-data
management, the process of developing a solution, and several options
for the technological implementation of the solution. Future papers in
this series will explain the technological and procedural issues that
must be resolved to implement an MDM system.
© 2014 Microsoft
