Does the 21st-Century "Big Data" Warehouse Mean the End of the Enterprise Data Warehouse?

25 August 2011 | ID: G00213081
Mark A. Beyer | Donald Feinberg

The ideal enterprise data warehouse has been envisaged as a centralized repository for 25 years, but the time has come for a new type of warehouse to handle "big data." This "logical data warehouse" demands radical realignment of practices and a hybrid architecture of repositories and services.

Overview
The new data warehouse needed for the information management demands of the 21st century is not a replacement for existing practices. Rather, it involves a fundamental realignment of almost every existing practice in order to provide specific functionality within a restyled architecture that capitalizes on the greatest strength of every technique, approach and strategy. At the same time, it introduces fresh techniques and architectural capabilities to meet the demand, created by "big data," cloud utilization, operational technology and social media, for delivery of data to traditional, readily available and consumer-style analytics tools. The focus is on the data-processing or information management logic, not the physical infrastructure: this is a "logical data warehouse" (LDW).

Key Findings
• The vast majority of organizations (judging from over 75% of the data warehouse inquiries received from Gartner clients) select a single deployment style for what they term an enterprise data warehouse (EDW). In doing so they create a compromised environment that fails to deliver on some aspect of the associated SLA.
• Organizations that deploy an EDW almost all create second and third data warehouses or marts to support additional user needs (judging from up to 90% of the data warehouse inquiries received from Gartner clients), despite strict instructions to use the EDW.
• The architectural style of a data warehouse is usually determined by the available skills and tools, and secondarily by time-to-delivery, in preference to the anticipated future flexibility or extensibility of the solution.

Recommendations
• Start your evolution toward an LDW by identifying data assets that are not easily addressed by traditional data integration approaches and/or easily supported by a "single version of the truth." Consider all technology options for data access and do not focus only on consolidated repositories. This is especially relevant to "big data" issues.
• Identify pilot projects in which to use LDW concepts by focusing on highly volatile and significantly interdependent business processes.
• Use an LDW to create a single, logically consistent information resource independent of any semantic layer that is specific to an analytic platform. The LDW should manage reused semantics and reused data.

Table of Contents
Analysis
  Ending the Era of Deficient Compromise
  Service Level and Benefit Expectations - Revisited
  A Combined Services and Information Asset Management Platform
  The Logical Data Warehouse Architecture
  Evolving Toward the Logical Data Warehouse
  How Existing Technology Can Fit In
Recommended Reading

List of Tables
Table 1. Data Warehouse Architecture Principles, Service Drivers and Primary Limitations

List of Figures
Figure 1. Summary of Standard Data Warehouse Service Contracts
Figure 2. Information Capabilities Framework Management and Semantic Services Categories
Figure 3. Traditional Data Warehouse and Business Intelligence Infrastructure
Figure 4. Services-Oriented Analytics Information Management

Analysis
This document was revised on 5 September 2011. For more information, see the Corrections page on gartner.com.

Data warehouse architecture is undergoing an important evolution, as compared with the relative stasis of the previous 25 years. While the term "data warehouse" was coined around 1989, the architectural style predated the term (at American Airlines, Frito-Lay and Coca-Cola). At its core, a data warehouse is a negotiated, consistent logical model that is populated using predefined transformation processes. Over the years, the various options (centralized EDW, federated marts, hub-and-spoke array of central warehouse with dependent marts, and virtual warehouse) have all served to emphasize certain aspects of the service expectations for a data warehouse. The common thread running through all styles is that they were repository-oriented. This, however, is changing: the data warehouse is evolving from competing repository concepts to include a fully enabled data management and information-processing platform. This new warehouse forces a complete rethink of how data is manipulated, and where in the architecture each type of processing occurs to support transformation and integration. It also introduces a governance model that is only loosely coupled with data models and file structures, as opposed to the very tight, physical orientation previously used.

This new type of warehouse, the LDW, is an information management and access engine that takes an architectural approach which de-emphasizes repositories in favor of new guidelines:
• The LDW follows a semantic directive to orchestrate the consolidation and sharing of information assets, as opposed to one that focuses exclusively on storing integrated datasets.
• The semantics are described by governance rules from data creation and use case business processes in a data management layer, instead of via a negotiated, static transformation process located within individual tools or platforms.
• Integration leverages both steady-state data assets in repositories and services in a flexible, audited model via the best optimization and comprehension solution available.

Ending the Era of Deficient Compromise
Some would say that the result of compromise is that everyone is equally unhappy.
The new data warehouse is expected to meet all previous data warehouse service-level expectations and to deliver all the originally intended benefits of a warehouse or integration platform, but without any artificial limitations based on use cases or deficient technology. At the same time, the new warehouse must integrate very non-traditional information assets.

Every data warehouse is deployed essentially to meet specific service-level expectations for the delivery and management of data. These expectations have been met using a wide variety of architectures and approaches. The basic premise behind the new data warehouse is that it will combine the strengths of every engineering approach previously used to create a variety of architectural styles into a new model that supports easy switching between styles or a hybrid of diverse delivery approaches. Existing architectures must be altered radically to meet these new demands.

There are many components and expectations associated with each of the traditional warehouse approaches. But for each of the traditional approaches, there is a principal service expectation, a primary design driver and some predominant limitation (otherwise alternatives would not have been necessary). Table 1 compares traditional data warehouse architectures and the LDW.

Table 1. Data Warehouse Architecture Principles, Service Drivers and Primary Limitations

Warehouse architecture: Centralized repository. Normalized or slightly denormalized data in a single database; the traditional "enterprise data warehouse."
Principal service-level expectation: Integrate and abstract data for reuse in analytics, or serve as a data-sharing platform for transactional systems.
Primary design driver: Need to resolve similar or the same data that was designed and deployed in different applications and systems, because in those systems the data was designed specifically to support transactional roles.
Primary limitations:
• Performance optimization is often difficult due to the more normalized nature of the data.
• Comprehension and usage barriers arise due to users' lack of familiarity with third normal form approaches.
• Inherited data governance from authoring applications makes ongoing rationalization and extensions difficult.

Warehouse architecture: Federated marts. Multiple individual data models with join tables or views of selected information deployed in one or more databases.
Principal service-level expectation: Isolate cost and deployment for rapid deployment, while producing more comprehensible reports in a short time-to-delivery model.
Primary design driver: The demand for dynamic reporting within a well-described business process, based on one business process or business unit's specific information governance demands. Enables analysis by drilling down into well-organized reports.
Primary limitations:
• Perpetuates parochial definitions and data design, merely deferring costs for rationalization of sometimes incompatible data models.
• Forces multiple maintenance points without actually integrating disparate data.

Warehouse architecture: Virtual warehouse. A view or semantic layer over the top of transactional systems data, usually without a dedicated repository but sometimes using cache technology.
Principal service-level expectation: Permit the abstraction of disparate models from disparate locations without actually moving the data.
Primary design driver: Allow for consolidated reporting across multiple systems without having to add to the storage environment, while also avoiding significant additions to the compute/processing environment or server demands.
Primary limitations:
• Dependent on external limitations for data volume, network capacity and source availability.
• Pressured by desired end-user and application connections. Often disrupted by downtime from these issues.
• Even the best-designed virtual warehouse often has to resort to some form of physically stored cache.

Warehouse architecture: Hub-and-spoke array. Summaries, aggregates and even variants of similar dimensions, all derived from a central repository of transformed, remodeled and relocated transactional data. A second variant of the "enterprise data warehouse."
Principal service-level expectation: Provide for integration of designated subsets of data, while delivering high-performance and comprehension-optimized data access.
Primary design driver: The desire for multiple renderings of the same data for different use cases, each optimized for performance.
Primary limitations:
• Time-to-deployment requires phased rollout, and poor planning of initial rollouts often forces a radical redesign two to five iterations later.
• Same issues as for the centralized repository.

Warehouse architecture: Logical data warehouse. Combine the benefits of previous approaches in a "best fit" architecture. Add support for distributed data assets and parallel distribution of processing requirements with predictable and repeatable results, while continuing to support data centralization when appropriate.
Principal service-level expectation: Support all previous forms of data warehouse architecture with easy switching between data management and delivery styles.
Primary design driver: The need to account for the reuse of information transformation, quality and access services, regardless of information/data formats or locations, to support data redistribution or analytics. Also, the need to support new and diverse data types at the same time.
Primary limitations:
• More of a barrier than a limitation, given that existing warehouse platforms and architectures were designed with centralized processing as an underlying assumption (even for federated approaches) and that existing semantics and data processing code will be difficult to adapt and reuse.

Source: Gartner (August 2011)

Service Level and Benefit Expectations - Revisited
Every data warehouse is expected to meet well-established and persistent service-level expectations as part of industry best practices in order to deliver the desired benefits of deploying that warehouse. In the past, many of the architectural, design and engineering approaches used to deploy warehouses equated to a series of compromises that favored some of these "service contracts" to the detriment of others, or even sacrificed some requirements due to time-to-market pressures. Figure 1 summarizes the service contracts of a data warehouse (see Note 1).

Figure 1. Summary of Standard Data Warehouse Service Contracts
Source: Gartner (August 2011)

The new warehouse has the same service expectations, but is not specifically a repository and it now includes a series of information management services. So, what precisely is the new architectural form?

A Combined Services and Information Asset Management Platform
The LDW incorporates best practices for service-oriented architecture, information governance, data warehouses and information management. It shifts the debate and the focus of data warehouse design from choosing between fixed implementation and architectural styles to applying best practices in multiple IT delivery areas for the most appropriate use.

The "old" data warehouse usually favored one specified engineering approach, often using procedural processing to extract data from designated repositories, validating the transformations against somewhat static business rules, and then loading the data. For example, traditional extraction, transformation and loading (ETL) identifies the table and column where the source data is and moves it in some type of processing stream to a target. The format and content are known at both source and target, such as when moving "first name" in a customer table to "given name" in a warehouse. The quality rules might even be coded directly into the ETL system.

By contrast, the LDW takes a data services approach to managing these various requirements. A data services approach separates data access from processing, processing from transformation, and transformation from delivery. In a data services approach, the pieces are written separately to enable flexible job flows and easily coupled processing. Let's assume, for example, that there are seven sources for "given name." One level of services would manage the connection string. Another level would read the metadata table indicating that in three of the systems the column desired is named "fname_24," in two of the systems it is listed as "cus_firstname" and in another two systems "name_given" and "namenerstmal." All of these are equivalent to "given_name" and therefore subject to the same data quality rule. So the service that accesses the data passes each of them to one common quality service. Then, after the completion of quality operations, the data is passed on to a delivery service. Say, however, that one of the targets is a data warehouse that needs an insert service to a database management system (DBMS), that another delivery location is an XML message which needs XML structure around it, and that a third delivery location is an application which needs to write data to an array or cursor, etc. It would then be possible to write code so that the XML is always created and additional services render the insert and the array build, or the three delivery functions (insert, XML and array) could be written as three services. Based partly (or nominally) on the reuse rate of a function, it would be necessary to code that function in a loosely coupled fashion or a tightly coupled procedure.

The LDW participates in, and is a beneficiary of, a wider information capabilities services-style approach (see "Information Management in the 21st Century"). In a 21st-century information management architecture, the new warehouse participates in an information capabilities framework (ICF): see "The Information Capabilities Framework: An Aligned Vision For Information Infrastructure." Since a data warehouse serves primarily as a rationalization and integration engine, it is expected to perform most of its information management duties using data management verbs that integrate and organize data. Additionally, the warehouse is expected to deliver integrated information in an optimized fashion, supporting both comprehension and performance. The new warehouse, therefore, must "decide" when a consolidated repository or a transient (virtual) style of delivery is appropriate. Organization will take place at two levels: first putting information assets together and then determining whether a summary/aggregated dataset is the best organization approach for an end-use case.

An ICF specifies that, regardless of how an application or repository is designed, the information management approach used is expected to perform duties and services from well-established categories of information management functions (see Figure 2). Further, information itself is treated as an object with value, integrity and rules of behavior. Some of these rules can be deployed as logical policies that are enacted against any asset type, as long as the actual content is the same. For example, a person's name is his or her name regardless of whether it is in a database, a document or spoken in an audio clip. The LDW architect simply determines where each of these verb classes will be provided: in a database, on a services bus, in a view layer and so on. Importantly, an EDW uses a "dedicated" semantic style only, but an LDW uses all semantic styles based on which is most appropriate for the applicable SLA.
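
The data services example earlier in this section (several source columns that all mean "given name", one shared quality rule, and three delivery targets) can be illustrated with a minimal Python sketch. It is only an illustration under stated assumptions, not the note's implementation: the column names come from the text, but the function names, the in-memory metadata dictionary, the trim-and-drop-empty quality rule, and the "customer" table and XML element names are all hypothetical.

```python
from typing import Dict, Iterable, List, Tuple
from xml.sax.saxutils import escape

# Hypothetical metadata table: source-specific column name -> canonical name.
# The column names are taken from the example; the dict structure is assumed.
COLUMN_METADATA: Dict[str, str] = {
    "fname_24": "given_name",
    "cus_firstname": "given_name",
    "name_given": "given_name",
    "namenerstmal": "given_name",
}


def access_service(rows: Iterable[dict]) -> List[dict]:
    """Access layer: rename source columns to canonical names via metadata."""
    return [
        {COLUMN_METADATA.get(col, col): val for col, val in row.items()}
        for row in rows
    ]


def quality_service(rows: Iterable[dict]) -> List[dict]:
    """One common quality rule (assumed): trim whitespace, drop empty names."""
    cleaned = []
    for row in rows:
        name = (row.get("given_name") or "").strip()
        if name:
            cleaned.append({**row, "given_name": name})
    return cleaned


def deliver_insert(rows: Iterable[dict]) -> List[Tuple[str, tuple]]:
    """Delivery for a DBMS target: parameterized INSERT statements."""
    sql = "INSERT INTO customer (given_name) VALUES (?)"
    return [(sql, (row["given_name"],)) for row in rows]


def deliver_xml(rows: Iterable[dict]) -> str:
    """Delivery for an XML message target."""
    items = "".join(
        f"<given_name>{escape(row['given_name'])}</given_name>" for row in rows
    )
    return f"<customers>{items}</customers>"


def deliver_array(rows: Iterable[dict]) -> List[str]:
    """Delivery for an application that consumes a plain array."""
    return [row["given_name"] for row in rows]


# The three delivery functions written as three separate services.
DELIVERY_SERVICES = {
    "dbms": deliver_insert,
    "xml": deliver_xml,
    "array": deliver_array,
}


def run_pipeline(rows: Iterable[dict], target: str):
    """Access -> quality -> delivery, each step a free-standing service."""
    return DELIVERY_SERVICES[target](quality_service(access_service(rows)))
```

Because access, quality and delivery are separate callables, a new source only needs a new metadata entry and a new target only needs a new delivery function; the quality rule is written once and reused.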
Figure 2. Information Capabilities Framework Management and Semantic Services Categories
Source: Gartner (August 2011)

The Logical Data Warehouse Architecture
In a services-oriented approach to data management it is imperative to understand that nothing is required to execute in the same order every time. Orchestration of services can be declared or dynamic. Declared orchestration executes almost as modularized procedural code, the difference being that the services are free-standing operations and can also be called by other composite processes to occur in a different order. Dynamic orchestration reacts to metadata instructions that are often received as audits of the environment. For example, an analytic query that anticipates putting multiple sources together could get the information from an integrated repository or from a federated view; it might decide which is best by comparing the latency of the integrated repository data with the intention of the querying user to capture newer or older data.

Within an ICF, the data warehouse, like any other use case, must determine a primary semantic "entry point" to begin using a services architecture. For the data warehouse this primary entry point is defined by the primary service contract, to deliver a consolidated view of disparate data in optimal fashion. A warehouse needs to access sources and deliver that consolidated view. Therefore, the LDW is designed, first and foremost, using a combination of services and physical data repositories. Secondly, it can be designed with a focus on declared or dynamic orchestration. Finally, it is possible to design some of the LDW using any combination of physical repositories, virtual data objects, declared orchestration or dynamic orchestration.
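
The dynamic orchestration example above (choosing between an integrated repository and a federated view by comparing the repository's latency with the user's intent) can be sketched as a small decision function. This is a hedged illustration only: the `RepositoryAudit` record, the `choose_source` function and the idea of expressing the user's intent as a maximum staleness in minutes are assumptions made for the sketch, not something the note specifies.

```python
from dataclasses import dataclass


@dataclass
class RepositoryAudit:
    """Hypothetical audit metadata describing the integrated repository."""
    minutes_since_last_load: float


def choose_source(audit: RepositoryAudit, max_staleness_minutes: float) -> str:
    """Pick a delivery style for one query.

    The integrated repository is used when its data is fresh enough for the
    querying user's stated tolerance; otherwise the query is routed to a
    federated view over the sources.
    """
    if audit.minutes_since_last_load <= max_staleness_minutes:
        return "integrated_repository"
    return "federated_view"


# Usage: the same audit leads to different routes for different users.
audit = RepositoryAudit(minutes_since_last_load=480)
route_for_trend_analyst = choose_source(audit, max_staleness_minutes=1440)
route_for_operations = choose_source(audit, max_staleness_minutes=60)
```

In a declared orchestration the route would be fixed at design time; here the decision is re-evaluated per query from audit metadata, which is the distinction the text draws.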
It is also possible to begin with a physical repository approach with highly dedicated, declared access, and then evolve slowly toward more dynamic and mixed data delivery approaches.

Evolving Toward the Logical Data Warehouse
Traditional data warehouses and BI environments have a fairly consistent architecture (see Figure 3). Some of the capabilities are on different platforms, but there is primarily a unidirectional flow of data toward one set of new models and data governance rules.

Figure 3. Traditional Data Warehouse and Business Intelligence Infrastructure
BI = business intelligence; DBMS = database management system; DW = data warehouse; ETL = extraction, transformation and loading; LDAP = Lightweight Directory Access Protocol; ODS = operational data store; OLAP = online analytical processing; RDBMS = relational database management system
Source: Gartner (August 2011)

If we assume an initial state with a traditional data warehouse, the following points most likely apply:
• You already have a data integration process with "describe" and "organize" functions that specify both the source and target states of the data. They may or may not be deployed as modular code or metadata that drives the process.
• You already have functions that resolve differences between the governance rules of sources and your warehouse target. You also have integration processes that resolve formatting issues.
• You have some implementation rules, sometimes embedded at design time, sometimes deployed when ready for runtime.
• You may or may not have auditing capabilities built into your processing (such as for profiling, record counts of completed versus dropped transformations and data quality "outs"). However, they are probably designed for permanent use of a combined "consolidation" and "dedicated" semantic layer. In other words, they are probably procedural and not dynamic (at least not without returning to the design tools and redeploying).
• Your existing orchestration is most likely not dynamic and, unless you are using a virtual warehouse strategy, the concept of using registries for data sources and target objects is most likely nonexistent.
• There is probably little or no ability to use other repositories as information assets in query responses; such external assets are either loaded directly during a transformation-and-load process or loaded from one of your source systems (as with postal data added to an ERP system and then relayed to the warehouse).

Let's assume that, instead of accepting this unidirectional, static orchestration, you want to develop an LDW. To do this, you start by introducing the LDW concepts used for dynamic consolidation, integration and implementation, as depicted in Figure 4 (note that the diagram uses today's terminology, such as transformation using "ETL/ELT," "federation" and so on, but that these concepts are deconstructed, for evolution toward a modern architecture, in "Information Management in the 21st Century"):
• The data integration process can be broken into sourcing, collation, data quality, formatting and domain governance segments, based on information availability and governance rules. For example, the sourcing/extraction process can be a registry semantic layer using "describe" verbs that tell the service "where" the data is. If data for "person" is located in documents, clickstreams and enterprise systems, one service can use textual analysis and search for documents, another service can use MapReduce to read massive volumes of tags in "clicks," and a traditional native driver access approach can pull data from the enterprise system database. A data quality process can then verify the work done by each service and undertake an enrichment and/or value substitution process, before prepping the data for delivery. If the data is dynamic and constantly changing, the data integration process can deliver a virtual data object, but if the data is already validated by a master data management process and fairly static, it can be loaded into a table or file. A final service can determine the appropriate load or access format and put the data into that format.
• In relation to latency issues, you are no longer bound by load restrictions. It is possible to indicate in a metadata layer that there are different requirements for different analytic end-use cases. For example, one department may require higher-quality data but tolerate higher-latency delivery (it would get data from fully validated tables), while another department might be prepared to risk inconsistencies in data but require low latency (it would get a combined-registry delivery of yesterday's data in the tables with today's data from the OLTP system: "dirty but fast"). Or, instead of this fixed approach, you could have a service that negotiates whether the quality SLA is being met for each of the departments and switches between strategies dynamically. For example, the department requiring low latency might receive data from the warehouse repository in the morning, after the previous night's load had brought everything up to date, but in the afternoon it might receive a composite view. And, instead of switching at a predetermined time of day, the switch would be based on how far out of synchronization the two sources are, based on record counts and data quality ratings.
• A dynamic service that determines when to write summary or aggregate data is generally faster than one that performs a query-time summary of detailed rows. It could even switch on the basis of CPU and storage utilization/performance audits, and change its approach throughout the day. It could also switch dynamically between approaches on the basis of system audits that determine whether more memory is added for caching, or even if it is worthwhile to perform caching.
• Adding external data based on services written to read and analyze those data sources also becomes easier. For example, adding operational-technology data such as the millions of records generated each day by RFID-enabled supply chain management tracking systems or utility smart grid meters requires massive data integration processing in a procedural manner when using traditional warehouses. But by developing two or three variants of the same MapReduce function, the LDW can orchestrate the preferred approach for different analyst audiences and leave the data in the source or the historian software (see "Historian Software and the Smart Grid: Questions and Misconceptions").

Note that with the LDW approach, the differing styles of support, such as federation, data repositories, messaging and reductions, are not mutually exclusive. They are just manifestations of data delivery. The focus is on getting the data first, then figuring out the delivery approach that best achieves the SLA with the querying application. The transformations occur in separate services.

Figure 4. Services-Oriented Analytics Information Management
ETL = extraction, transformation and loading; ELT = extraction, loading and transformation
Source: Gartner (August 2011)

How Existing Technology Can Fit In
Note that Figure 4 does not argue for specific technologies to perform each approach. This is because multiple engineered solutions can be used to deliver the same architecture and design, as noted in the two different design scenarios:

1. Use a BI platform and DBMS stack. While tending toward a more dedicated semantic, a BI platform deployed in tandem with a dedicated DBMS can deliver the entire approach. For example, the BI platform could negotiate with the DBMS when to use a table as opposed to a federated view of data. But any form of dynamic approach to using federation, materialized views or tables would have to be leveraged by the DBMS optimizer, and all the options would have to be maintained in the database. Of course, some semantic layers in BI platforms fail to properly combine platform optimization with DBMS optimization, while others can accomplish this task, and still others are improving. This is one disadvantage of an engineering approach to "use what is available," instead of "designing to purpose."
2. Use an enterprise service bus (ESB), data integration tools and DBMS. An ESB can define discrete services or register services provided by the data integration tool (which becomes a development workbench with orchestration occurring in the ESB). The DBMS works in its usual fashion, optimizing for view and table use to respond to queries (by maintaining cubes, views, indices, etc.). In addition, the DBMS or ESB could manage external calls to nondatabase types of information as service calls to other application services, or by orchestrating calls to the functions of other tools or repositories. This could even include calls to content management systems, sentiment analysis tools and text analysis tools.

In addition, many data integration tool vendors support variations of this infrastructure to some degree. Database vendors support capabilities to deploy access to external information assets and even to externally managed parallel distributed processes. The main shortcoming of an approach that uses existing technologies is the inability to integrate data management with business process management. A business process management tool could add the ability to link analytics data sources with analytics processing, and then provide the results to an operational application for use in on-demand analytics. The ability to link process management with analytics is the first step in a Pattern-Based Strategy.

Recommended Reading
Some documents may not be available as part of your current Gartner subscription.
"The State of Data Warehousing in 2011"
"Magic Quadrant for Data Warehouse Database Management Systems"
"Analytics and Learning Technology: CIOs, CTOs Should Rethink Art of the Possible"
"Magic Quadrant for Data Integration Tools"
"Data Architectures to Support Performance Management Applications"
"Magic Quadrant for Data Quality Tools"
"Hype Cycle for Data Management, 2011"
"Applying Gartner's Pace Layer Model to Human Capital Management"
"Cool Vendors in Data Management and Integration, 2011"

Strategic Planning Assumptions
• By 2014, 85% of organizations will fail to deploy new strategies to address data complexity and volume in their analytics.
• Organizations which fail to deploy strategies to address data complexity and volume issues for their analytics by 2012 will experience more than doubling costs of ownership for their data warehouse and mart environments in disorganized attempts to meet this new demand.
• By 2014, organizations which have deployed analytics to support new complex data types and large volumes of data in analytics will outperform their market peers by more than 20% in revenue, margins, penetration and retention.

Note 1
How "Able" Is Your Data Warehouse?
Many organizations recognize that best practices demand a data warehouse that provides for subject-oriented, integrated, consistent and time-variant data for critical corporate data.
The overall architecture of the warehouse can achieve these objectives by adhering to six basic architectural principles. Data warehouses should be:
• Extensible. It should be easy to add more data sources or to change data sources during the life of the data warehouse.
• Flexible. The data warehouse should be modeled to a level of abstraction that supports modifications to the data model as more data subject areas are added.
• Repeatable. Data warehouses should provide consistent, predictable query response times; as a result, they may themselves introduce redundancy as needed.
• Reusable. Data in the warehouse should be fully qualified to allow multiple departments to use it in a variety of contexts. This relates to the abstraction rules in the data model, and to the data integration transformation rules that consolidate and collate data to support the introduction of commonly held data enrichment and cleansing rules.
• Scalable. The data warehouse must be able to support more rows of data, and the data architecture must account for storage of and access to data, as well as its archive and retirement.
• Available. The data warehouse must be able to operate in virtually nonstop mode, with provisions for reconfiguration, migration, backup, data insertion and performance optimization.

These "-ables," which were originally conceived as a group by other analysts, have existed for years. However, many organizations have attempted to achieve all six in a single data architecture tier, an approach that has proved untenable in the end-user market. It is best to think of these six expectations as clauses in a service contract which the warehouse is expected to fulfill.

Note 2
Gartner's ICF Definitions
Gartner's information capabilities framework (ICF) is the collection of technical capabilities required to create business value from information assets.
It is a conceptual model that is people, process and technology independent and allows IT leaders to think holistically about the capabilities required to describe, organize, integrate, share and govern information in an application-independent manner. It is independent of use case and information source and does not rely on, nor advocate, any technology or architectural style. However, it does take into account the specifics of use cases.

An "information capability" is a representation of the actions needed for the information to be used, treated, organized or developed for the general management of, and for specific purposes throughout, the organization.

An "information use case" represents the usage of information throughout the organization to create business value.

The ICF's common capabilities layer provides the range of functionalities used to describe, organize, integrate, share and govern the information, and the capabilities required to interact with physical data stores (operate), to prepare the information for consumption (provision) and to increase the value of the information by making it more easily used and found, and by providing context (enrich).

The ICF's information semantic styles layer provides the specific entry or "gate" into information management functions or capabilities. These services follow styles or approaches that support specific assumptions on how an application interacts with the data it uses.

The ICF's specialized capabilities layer deals with the range of functionalities used to support use-case-specific requirements.

© 2011 Gartner, Inc. and/or its Affiliates. All Rights Reserved. Reproduction and distribution of this publication in any form without prior written permission is forbidden. The information contained herein has been obtained from sources believed to be reliable. Gartner disclaims all warranties as to the accuracy, completeness or adequacy of such information. Although Gartner's research may discuss legal issues related to the information technology business, Gartner does not provide legal advice or services and its research should not be construed or used as such. Gartner shall have no liability for errors, omissions or inadequacies in the information contained herein or for interpretations thereof. The opinions expressed herein are subject to change without notice.