You are on page 1of 95

IN4392 Cloud Computing Cloud Programming Models

Cloud Computing (IN4392) D. .!. "pema and #. Iosup. $it% &ontri'utions (rom Claudio Martella and )ogdan *%it. 2012-2013
1

Parallel and Distri'uted *roup+,- Del(t

,erms (or ,oda./s Dis&ussion


Programming model = language + libraries + runtime system that create a model of computation (an abstract machine) = an abstraction of a computer system Wikipedia Examples message!passing "s shared memory# data! "s task!parallelism# $ #'stra&tion le0el % What is the best abstraction le"el& = distance from physical machine Examples Assembly lo*!le"el "s Java is high le"el +any design trade!offs performance# ease!of!use# common!task optimi,ation# programming paradigm# $
'(1'!'(1) '

C%ara&teristi&s o( a Cloud Programming Model

1')234-

.ost model (Efficiency) = cost/performance# o"erheads# $ 0calability 1ault!tolerance 0upport for specific ser"ices .ontrol model# e-g-# fine!grained many!task scheduling 5ata model# including partitioning and placement# out!of! memory data access# etc6- 0ynchroni,ation model

'(1'!'(1)

#genda
12. )237ntroduction Cloud Programming in Pra&ti&e (,%e Pro'lem) 8rogramming +odels for .ompute!7ntensi"e Workloads 8rogramming +odels for 9ig 5ata 0ummary

'(1'!'(1)

,oda./s C%allenges

e0cience :he 1ourth 8aradigm :he 5ata 5eluge and 9ig 5ata 8ossibly others

'(1'!'(1)

e1&ien&e2 ,%e $%.


0cience experiments already cost '3;3(< budget
$ and perhaps incur 63< of the delays

+illions of lines of code *ith similar functionality


=ittle code reuse across pro>ects and application domains $ but last t*o decades? science is "ery similar in structure

+ost results difficult to share and reuse


.ase!in!point 0loan 5igital 0ky 0ur"ey digital map of '3< of the sky x spectra 2(:9+ sky sur"ey data '((++ astro!ob>ects (images) 1++ ob>ects *ith spectrum (spectra) @o* to make it *ork for this and the next generation of scientists&
'(1'!'(1) 4

0ource Aim Bray and Clex 0,alay#e0cience !! C :ransformed 0cientific +ethod# http //research-microsoft-com/en!us/um/people/gray/talks/DE.!.0:9Fe0cience-ppt

e1&ien&e (!o%n ,a.lor3 -4 1&i.,e&%.3 1999)


# ne5 s&ienti(i& met%od
.ombine science *ith 7: 1ull scientific process control scientific instrument or produce data from simulations# gather and reduce data# analy,e and model results# "isuali,e results +ostly compute!intensi"e# e-g-# simulation of complex phenomena

I, support
7nfrastructure =@. Brid# Gpen 0cience Brid# 5C0# DorduBrid# $ 1rom programming models to infrastructure management tools

"6amples
'(1'!'(1)

% Why is .omp0ci an example here&


6

H physics# 9ioinformatics# +aterial science# Engineering# Comp1&i

,%e 7ourt% Paradigm2 ,%e $%. (#n #ne&dotal "6ample)

,%e 80er5%elming *ro5t% o( 4no5ledge


When 1' men founded the Dumber of 1II) 1II6 Eoyal 0ociety in 144(# it *as 8ublications 1II6 '((1 possible for an educated person to encompass all of scientific kno*ledge- J$K 7n the last 3( years# such has been the pace of scientific ad"ance that e"en the best scientists cannot keep up *ith disco"eries at frontiers outside their o*n field- :ony 9lair# 8+ 0peech# +ay '((' 5ata Ling#:he scientific impact of nations#Dature?(2-

,%e 7ourt% Paradigm2 ,%e $%at

7rom

.pot%esis to Data
2

:housand years ago science *as empiri&al describing natural phenomena =ast fe* hundred years t%eoreti&al branch using models# generali,ations
. a 4G c2 a = 3 2 a

=ast fe* decades a &omputational branch simulating complex phenomena :oday (t%e 7ourt% Paradigm) data e6ploration %1 What is the 1ourth 8aradigm&

unify theory# experiment# and simulation 5ata captured by instruments or generated by simulator %' What are the dangers 8rocessed by soft*are of the 1ourth 8aradigm& 7nformation/Lno*ledge stored in computer 0cientist analy,es results using data management and statistics
'(1'!'(1)

0ource Aim Bray and :he 1ourth 8aradigm# http //research-microsoft-com/en!us/collaboration/fourthparadigm/

,%e 9Data Deluge:2 ,%e $%.


ME"ery*here you look# the Nuantity of information in the *orld is soaring- Cccording to one estimate# mankind created 13( exabytes (billion gigabytes) of data in '((3- :his year# it *ill create 1#'(( exabytes- +erely keeping up *ith this flood# and storing the bits that might be useful# is difficult enoughCnalysing it# to spot patterns and extract useful information# is harder still- :he 5ata 5eluge# :he Economist# '3 1ebruary '(1(
1(

9Data Deluge:2 ,%e Personal Meme6 "6ample

Oanne"ar 9ush in the 1I2(s record your life +7: +edia =aboratory :he @uman 0peechome 8ro>ect/:otalEecall# data mining/analysis/"isio 5eb Eoy and Eupal 8atel record practically e"ery *aking moment of their son?s first three years ('(< pri"acy time$7s this e"en legal&P 0hould it be&P) 11x1+8/12fps cameras# 12x14b!2QL@, mics# 2-2:9 EC75 + tapes# 1( computersR '((k hours audio!"ideo 5ata si,e '((B9/day# 1-389 total
11

9Data Deluge:2 ,%e *aming #nal.ti&s "6ample

E% 77 '(:9/year all logs @alo) 1-289 ser"ed statistics on player logs


1'

9Data Deluge:2 Datasets in Comp.1&i.


5ataset 0i,e http //g*a-e*i-tudelft-nl :he 1ailure :race Crchi"e 1:9/yr 1:9 1((B9 1(B9 Bam:C 8'8:C

http //fta-inria-fr

8eer!to!8eer :race Crchi"e $ 8WC# 7:C# .ECW5C5# $

1B9 T(4 T(I T1( T11 Sear


1)

1#(((s of scientists 1rom theory to practice


1)

9Data Deluge:2 ,%e Pro(essional $orld *ets Conne&ted

1eb '(1'
'(1'!'(1) 12

0ource Oincen,o .osen,a# :he 0tate of =inked7n# http //"incos-it/the!state!of!linkedin/

$%at is 9)ig Data:;


Oery large# distributed aggregations of loosely structured data# often incomplete and inaccessible Easily exceeds the processing capacity of con"entional database systems 8rinciple of 9ig 5ata When you can# keep e"erythingP :oo big# too fast# and doesn?t comply *ith the traditional database architectures
'(11!'(1' 13

,%e ,%ree 9<:s o( )ig Data


Oolume
+ore data "s- better models 5ata gro*s exponentially Cnalysis in near!real time to extract "alue 0calable storage and distributed Nueries

Oelocity
0peed of the feedback loop Bain competiti"e ad"antage fast recommendations 7dentify fraud# predict customer churn faster

Oariety
:he data can become messy text# "ideo# audio# etc 5ifficult to integrate into applications
'(11!'(1' 14

Cdapted from 5oug =aney# )5 data management# +E:C Broup/Bartner report# 1eb '((1- http //blogs-gartner-com/doug!laney/files/'(1'/(1/adI2I!)5!5ata! +anagement!.ontrolling!5ata!Oolume!Oelocity!and!Oariety-pdf

Data $are%ouse 0s. )ig Data

'(11!'(1'

16

0ource http //*ikibon-org/

#genda
1- 7ntroduction '- .loud 8rogramming in 8ractice (:he 8roblem) 3. Programming Models (or Compute-Intensi0e $or=loads 1. )ags o( ,as=s '- Workflo*s )- 8arallel 8rogramming +odels 2- 8rogramming +odels for 9ig 5ata 3- 0ummary
'(1'!'(1) 1Q

$%at is a )ag o( ,as=s ()o,); # 1.stem <ie5


9o: = set of >obs sent by a user$

$that start at most Us after the first >ob

% What is the user?s "ie*& :ime JunitsK

Why 9ag of :asks& 1rom the perspecti"e of the user# >obs in set are >ust tasks of a larger >ob C single useful result from the complete 9o: Eesult can be combination of all tasks# or a selection of the results of most or e"en a single task
'(1'!'(1) Iosup et al., The Characteristics and Performance of Groups of Jobs in Grids, Euro-Par, LNCS, ol.!"!#, pp. $%&-$'$, '((6.

1I

#ppli&ations o( t%e )o, Programming Model


8arameter s*eeps
.omprehensi"e# possibly exhausti"e in"estigation of a model Oery useful in engineering and simulation!based science

+onte .arlo simulations


0imulation *ith random elements fixed time yet limited inaccuracy Oery useful in engineering and simulation!based science

+any other types of batch processing


8eriodic computation# .ycle sca"enging Oery useful to automate operations and reduce *aste
'(1'!'(1) '(

)o,s )e&ame t%e Dominant Programming Model (or *rid Computing


(V0) :eraBrid!' D.0C (V0) .ondor V-Wisc(EV) EBEE (.C) 0@CE.DE: (V0) Brid) (V0) B=GW (VL) EC= (DG#0E) DorduBrid (1E) BridW3((( (D=) 5C0!'

7rom >o's ?@A (


(V0) :eraBrid!' D.0C (V0) .ondor V-Wisc(EV) EBEE (.C) 0@CE.DE: (V0) Brid) (V0) B=GW (VL) EC= (DG#0E) DorduBrid (1E) BridW3((( (D=) 5C0!'

'(

2(

4(

Q(

1((

7rom CP-,ime ( ?@A


'(1'!'(1)

'(

2(

4(

Q( '1

1((

Iosup and Epema( Grid Computin) *or+loads. IEEE Internet Computin) #,-&.( #'-&" -&/##.

Pra&ti&al #ppli&ations o( t%e )o, Programming Model

Parameter 15eeps in Condor ?1+4A


0ue the scientist *ants to 1ind the "alue of 1(x#y#,) for 1( "alues for x and y# and 4 "alues for , 1olution Eun a parameter s5eep# *ith 1( x 1( x 4 = 4(( parameter "alues 8roblem of the solution
0ue runs one >ob (a combination of x# y# and ,) on her lo*!end machine- 7t takes 4 hours :hat?s 1B0 da.s uninterrupted computation on 0ue?s machineP
'(1'!'(1) ''

0ource .ondor :eam# .ondor Vser?s :utorialhttp //cs-u*isc-edu/condor

Pra&ti&al #ppli&ations o( t%e )o, Programming Model

Parameter 15eeps in Condor ?2+4A


Universe Executable Input Output Error Log = = = = = = vanilla sim.exe input.txt output.txt error.txt sim.log

Requirements = OpSys == WINNT61 && .omplex 0=Cs can be specified easily Arch == INTEL && (Disk >= DiskUsage) && ((Memory * 1024)>=ImageSize)

InitialDir = run_$(Process) Clso passed as Queue 600 parameter to sim.exe


'(1'!'(1) ')

0ource .ondor :eam# .ondor Vser?s :utorialhttp //cs-u*isc-edu/condor

Pra&ti&al #ppli&ations o( t%e )o, Programming Model

Parameter 15eeps in Condor ?3+4A


% condor_submit sim.submit Submitting job(s) ................................................ ................................................ ................................................ ................................................ ................................................ ............... Logging submit event(s) ................................................ ................................................ ................................................ ................................................ ................................................ ............... 600 job(s) submitted to cluster 3.
'(1'!'(1) '2

0ource .ondor :eam# .ondor Vser?s :utorialhttp //cs-u*isc-edu/condor

Pra&ti&al #ppli&ations o( t%e )o, Programming Model

Parameter 15eeps in Condor ?4+4A


% condor_q -- Submitter: x.cs.wisc.edu : <128.105.121.53:510> : x.cs.wisc.edu ID OWNER SUBMITTED RUN_TIME ST PRI SIZE CMD 3.0 frieda 4/20 12:08 0+00:00:05 R 0 9.8 sim.exe 3.1 frieda 4/20 12:08 0+00:00:03 I 0 9.8 sim.exe 3.2 frieda 4/20 12:08 0+00:00:01 I 0 9.8 sim.exe 3.3 frieda 4/20 12:08 0+00:00:00 I 0 9.8 sim.exe ... 3.598 frieda 4/20 12:08 0+00:00:00 I 0 9.8 sim.exe 3.599 frieda 4/20 12:08 0+00:00:00 I 0 9.8 sim.exe 600 jobs; 599 idle, 1 running, 0 held

'(1'!'(1)

'3

0ource .ondor :eam# .ondor Vser?s :utorialhttp //cs-u*isc-edu/condor

#genda
1- 7ntroduction '- .loud 8rogramming in 8ractice (:he 8roblem) 3. Programming Models (or Compute-Intensi0e $or=loads 1- 9ags of :asks 2. $or=(lo5s )- 8arallel 8rogramming +odels 2- 8rogramming +odels for 9ig 5ata 3- 0ummary
'(1'!'(1) '4

$%at is a $o=(lo5;

W1 = set of >obs *ith precedence (think 5irect Ccyclic Braph)

'(1'!'(1)

'6

#ppli&ations o( t%e $or=(lo5 Programming Model


.omplex applications
.omplex filtering of data .omplex analysis of instrument measurements

Cpplications created by non!.0 scientistsH


Workflo*s ha"e a natural correspondence in the real!*orld# as descriptions of a scientific procedure Oisual model of a graph sometimes easier to program

8recursor of the +apEeduce 8rogramming +odel (next slides)


'(1'!'(1) 'Q

HCdapted from .arole Boble and 5a"id de Eoure# .hapter in :he 1ourth 8aradigm# http //research-microsoft-com/en!us/collaboration/fourthparadigm/

$or=(lo5s "6isted in *rids3 'ut Did Not )e&ome a Dominant Programming Model
:races

0elected 1indings

=oose coupling Braph *ith )!2 le"els C"erage W1 si,e is )(/22 >obs 63<+ W1s are si,ed 2( >obs or less# I3< are si,ed '(( >obs or less
'I

'(1'!'(1) 0stermann et al., 0n the Characteristics of Grid *or+flo1s, CoreG2I3 Inte)rated 2esearch in Grid Computin) -CGI*., '((Q.

Pra&ti&al #ppli&ations o( t%e $7 Programming Model

)ioin(ormati&s in ,a0erna

'(1'!'(1)

)(

0ource .arole Boble and 5a"id de Eoure# .hapter in :he 1ourth 8aradigm# http //research-microsoft-com/en!us/collaboration/fourthparadigm/

#genda
1- 7ntroduction '- .loud 8rogramming in 8ractice (:he 8roblem) 3. Programming Models (or Compute-Intensi0e $or=loads 1- 9ags of :asks '- Workflo*s 3. Parallel Programming Models 2- 8rogramming +odels for 9ig 5ata 3- 0ummary
'(1'!'(1) )1

Parallel Programming Models

,-D &ourse in4049 Introdu&tion to PC

Cbstract machines ,as= (groups o( B3 B minutes)2 (5istributed) shared memory dis&uss parallel programming in &louds
5istributed memory +87

,as= (inter (inter-group dis&ussion)2 dis&uss. models .onceptual programming


+aster/*orker 5i"ide and conNuer 5ata / :ask parallelism 908

0ystem!le"el programming models


:hreads on B8Vs and other multi!cores
'(1'!'(1) )'

4arbanescu et al.( To1ards an Effecti e 5nified Pro)rammin) 6odel for 6an7-Cores. IP3PS *S &/#&

#genda
1')4. 37ntroduction .loud 8rogramming in 8ractice (:he 8roblem) 8rogramming +odels for .ompute!7ntensi"e Workloads Programming Models (or )ig Data 0ummary

'(1'!'(1)

))

"&os.stems o( )ig-Data Programming Models


% Where does +E!on!demand %fit& Where does 8regel!on!B8Vs fit&
@igh!=e"el =anguage
1lume 9ig%uery 0%= +eteor AC%= @i"e 8ig 0a*,all 0cope 5ryad=7D% C%=

8rogramming +odel
P#C, MapCedu&e Model Pregel 5ataflo* Clgebrix

Execution Engine
1lume 5remel :era C,ure Nep%ele @aloop Engine 0er"ice 5ata Engine :ree Engine adoop+ *irap% +87/ 5ryad D#CN Erlang @yracks

0torage Engine
0) B10 :era C,ure 5ata 5ata 0tore 0tore D71 Ooldemort = .osmos10 1 0 Csterix 9!tree
)2

H 8lus Yookeeper# .5D# etc-

'(1'!'(1)

Cdapted from 5agstuhl 0eminar on 7nformation +anagement in the .loud# http //***-dagstuhl-de/program/calendar/partlist/&semnr=11)'1X0VGB

#genda
1')4. 7ntroduction .loud 8rogramming in 8ractice (:he 8roblem) 8rogramming +odels for .ompute!7ntensi"e Workloads Programming Models (or )ig Data 1. MapCedu&e '- Braph 8rocessing )- Gther 9ig 5ata 8rogramming +odels 3- 0ummary

'(1'!'(1)

)3

MapCedu&e
+odel for processing and generating large data sets Enables a functional!like programming model 0plits computations into independent parallel tasks +akes efficient use of large commodity clusters @ides the details of paralleli,ation# fault!tolerance# data distribution# monitoring and load balancing
'(11!'(1' )4

MapCedu&e2 ,%e Programming Model


C programmig model# not a programming languageP 1- 7nput/Gutput 0et of key/"alue pairs '- +ap 8hase
8rocesses input key/"alue pair 8roduces set of intermediate pairs map (in_key, in_value) -> list(out_key, interm_value)

3. Reduce Phase:
Combines all intermediate values for a given key Produces a set of merged output values reduce(out_key, list(interm_value)) -> list(out_value)
'(11!'(1' )6

MapCedu&e in t%e Cloud


1acebook use case
0ocial!net*orking ser"ices Cnaly,e connections in the graph of friendships to recommend ne* connections

Boogle use case


Web!base email ser"ices# Boogle5ocs Cnaly,e messages and user beha"ior to optimi,e ad selection and placement

Soutube use case


Oideo!sharing sites Cnaly,e user preferences to gi"e better stream suggestions

'(11!'(1'

)Q

$ord&ount "6ample
1ile 1 :he big data is big- 1ile ' +apEeduce tames big data- Map 8utput2
+apper!1 (:he# 1)# (big# 1)# (data# 1)# (is# 1)# (big# 1) +apper!' (+apEeduce# 1)# (tames# 1)# (big# 1)# (data# 1)

Cedu&e Input
Eeducer!1 Eeducer!' Eeducer!) Eeducer!2 Eeducer!3 Eeducer!4 (:he# 1) (big# 1)# (big# 1)# (big# 1) (data# 1)# (data# 1) (is# 1) (+apEeduce# 1)# (+apEeduce# 1) (tames# 1)

Cedu&e 8utput
Eeducer!1 Eeducer!' Eeducer!) Eeducer!2 Eeducer!3 Eeducer!4 (:he# 1) (big# )) (data# ') (is# 1) (+apEeduce# ') (tames# 1)
)I

'(11!'(1'

Colored 1Euare Counter

'(11!'(1'

2(

MapCedu&e F *oogle2 "6ample 1

Benerating language model statistics


.ount Z of times e"ery 3!*ord seNuence occurs in large corpus of documents (and keep all those *here count [= 2)

+apEeduce solution
+ap extract 3!*ord seNuences =[ count from document Eeduce combine counts# and keep if count large enough

'(11!'(1'

21

http //***-slideshare-net/>hammerb/mapreduce!pact(4!keynote

MapCedu&e F *oogle2 "6ample 2

Aoining *ith other data


Benerate per!doc summary# but include per!host summary (e-g-# Z of pages on host# important terms on host)

+apEeduce solution
+ap extract host name from VE=# lookup per!host info# combine *ith per!doc data and emit Eeduce identity function (emit key/"alue directly)

'(11!'(1'

2'

http //***-slideshare-net/>hammerb/mapreduce!pact(4!keynote

More "6amples o( MapCedu&e #ppli&ations


5istributed Brep .ount of VE= Cccess 1reNuency Ee"erse Web!=ink Braph :erm!Oector per @ost 5istributed 0ort 7n"erted indices 8age ranking +achine learning algorithms $

'(11!'(1'

2)

"6e&ution 80er0ie5 (1+2)


% What is the performance problem raised by this step&

'(11!'(1'

22

"6e&ution 80er0ie5 (2+2)


Gne master# many *orkers example
7nput data split into + map tasks (42 / 1'Q +9) '((#((( maps Eeduce phase partitioned into E reduce tasks 2#((( reduces 5ynamically assign tasks to *orkers '#((( *orkers

+aster assigns each map / reduce task to an idle *orker


+ap *orker 5ata locality a*areness Eead input and generate E local files *ith key/"alue pairs Eeduce *orker Eead intermediate output from mappers 0ort and reduce to produce the output
'(11!'(1' 23

7ailures and )a&=-up ,as=s

'(11!'(1'

24

$%at is

adoop; # MapCedu&e "6e&. "ngine

7nspired by Boogle# supported by SahooP 5esigned to perform fast and reliable analysis of the big data =arge expansion in many domains such as
1inance# technology# telecom# media# entertainment# go"ernment# research institutions

'(11!'(1'

26

adoop F Da%oo
When you "isit yahoo# you are interacting *ith data processed *ith @adoopP

'(11!'(1'

2Q

https //open&irrus-org/system/files/8penCirrus adoop'((I-ppt

adoop -se Cases


1- 0earch Sahoo# Cma,on# Y"ents '- =og processing 1acebook# Sahoo# Aoost# =ast-fm )- Eecommendation 0ystems 1acebook 2- 5ata Warehouse 1acebook# CG= 3- Oideo and 7mage Cnalysis De* Sork :imes# Eyealike
'(11!'(1' 2I

http //cloud-berkeley-edu/data/hdfs-pdf

adoop Distri'uted 7ile 1.stem


Cssumptions and goals
5ata distributed across hundreds or thousands of machines 5etection of faults and automatic reco"ery 5esigned for batch processing "s- interacti"e use @igh throughput of data access "s- lo* latency of data access @igh aggregate band*idth and high scalability Write!once!read!many file access model +o"ing computations is cheaper than mo"ing data minimi,e net*ork congestion and increase throughput of the system

'(11!'(1'

3(

D71 #r&%ite&ture
+aster/sla"e architecture DameDode (DD)
+anages the file system namespace Eegulates access to files by clients Gpen/close/rename files or directories +apping of blocks to 5ataDodes Gne per node in the cluster +anages local storage of the node 9lock creation/deletion/replication initiated by DD 0er"e read/*rite reNuests from clients
'(11!'(1' 31

5ataDode (5D)

D71 Internals
Eeplica 8lacement 8olicy
1irst replica on one node in the local rack 0econd replica on a different node in the local rack :hird replica on a different node in a different rack impro"ed *rite performance ('/) are on the local rack) preser"es data reliability and read performance

.ommunication 8rotocols
=ayered on top of :.8/78 .lient 8rotocol client \ DameDode machine 5ataDode 8rotocol 5ataDodes \ DameDode DameDode responds to E8. reNuests issued by 5ataDodes / clients
'(11!'(1' 3'

adoop 1&%eduler
Aob di"ided into se"eral independent tasks executed in parallel
:he input file is split into chunks of 42 / 1'Q +9 Each chunk is assigned to a map task Eeduce task aggregate the output of the map tasks

:he master assigns tasks to the *orkers in 171G order


Aob:racker maintains a Nueue of running >obs# the states of the :ask:rackers# the tasks assignments :ask:rackers report their state "ia a heartbeat mechanism

5ata =ocality execute tasks close to their data 0peculati"e execution re!launch slo* tasks
'(11!'(1' 3)

MapCedu&e "0olution ?1+BA

adoop is Maturing2 Important Contri'utors


Dumber of patches
'(1'!'(1)

32

0ource http //***-theregister-co-uk/'(1'/(Q/16/communityFhadoop/

MapCedu&e "0olution ?2+BA

Gate 1&%eduler
=C:E 0cheduler
=ongest Cpproximate :ime to End 0peculati"ely execute the task that *ill finish farthest into the future

timeFleft = (1!progress0core) / progressEate


:asks make progress at a roughly constant rate Eobust to node heterogeneity

E.' 0ort running times


=C:E "s- @adoop "s- Do spec-

'(11!'(1'

33

8aharia et al.( Impro in) 6ap2educe performance in hetero)eneous en ironments. 0S3I &//%.

MapCedu&e "0olution ?3+BA

7#IC 1&%eduling
7solation and statistical multiplexing :*o!le"el architecture
1irst# allocates task slots across pools 0econd# each pool allocates its slots among multiple >obs 9ased on a max!min fairness policy

'(11!'(1'

34

8aharia et al.( 3ela7 schedulin)( a simple techni9ue for achie in) localit7 and fairness in cluster schedulin). EuroS7s &/#/. :lso T2 EECS-&//'-,,

MapCedu&e "0olution ?4+BA

Dela. 1&%eduling
5ata locality issues
@ead!of!line scheduling 171G# 1C7E =o* probability for small >obs to achie"e data locality 3Q< of >obs ] 1C.E9GGL ha"e ^ '3 maps Gnly 3< achie"e node locality "s- 3I< rack locality

'(11!'(1'

36

8aharia et al.( 3ela7 schedulin)( a simple techni9ue for achie in) localit7 and fairness in cluster schedulin). EuroS7s &/#/. :lso T2 EECS-&//'-,,

MapCedu&e "0olution ?B+BA

Dela. 1&%eduling
0olution
0kip the head!of!line >ob until you find a >ob that can launch a local task Wait :1 seconds before launching tasks on the same rack Wait :' seconds before launching tasks off!rack :1 = :' = 13 seconds =[ Q(< data locality

'(11!'(1'

3Q

8aharia et al.( 3ela7 schedulin)( a simple techni9ue for achie in) localit7 and fairness in cluster schedulin). EuroS7s &/#/. :lso T2 EECS-&//'-,,

48#G# and MapCedu&e ?1+9A


LGC=C
8lacement X allocation .entral for all +E clusters +aintains +E cluster metadata

+E!Eunner
.onfiguration X deployment +E cluster monitoring Bro*/0hrink mechanism

Gn!demand +E clusters
8erformance isolation 5ata isolation 1ailure isolation Oersion isolation

'(11!'(1'

3I

4oala and MapCedu&e ?2+9A

CesiHing Me&%anism
:*o types of nodes
Core nodes2 fully!functional nodes# *ith :ask:racker and 5ataDode (local disk access) ,ransient nodes2 compute nodes# *ith :ask:racker

8arameters
7 = Dumber of running tasks per number of a"ailable slots 8redefined 7min and 7ma6 thresholds 8redefined constant step gro51tep / s%rin=1tep , = time elapsed bet*een t*o successi"e resource offers

:hree policies
*ro5-1%rin= Poli&. (*1P)2 gro*!shrink but maintain 1 bet*een 7min and 7ma6 *reed.-*ro5 Poli&. (**P)2 gro*# shrink *hen *orkload done *reed.-*ro5-5it%-Data Poli&. (**DP)2 BB8 *ith core nodes (local disk access)
4(

'(11!'(1'

4oala and MapCedu&e ?3+9A

$or=loads


J1K J'K

IQ< of >obs ] 1acebook process 4-I +9 and take less than a minute J1K Boogle reported in '((2 computations *ith :9 of data on 1(((s of machines J'K
S- .hen# 0- Clspaugh# 5- 9orthakur# and E- Lat,# Energy Efficiency for =arge!0cale +apEeduce Workloads *ith 0ignificant 7nteracti"e Cnalysis# pp- 2)\34# '(1'A- 5ean and 0- Bhema*at# +apreduce 0implified 5ata 8rocessing on =arge .lusters# .omm- of the C.+# Ool- 31# no- 1# pp- 1(6\11)# '((Q-

'(11!'(1'

41

4oala and MapCedu&e ?4+9A

$ord&ount (,.pe 03 (ull s.stem)


.8V 5isk

1(( B9 input data 1( core nodes *ith Q map slots each Q(( map tasks executed in 6 *a"es Wordcount is .8V!bound in the map phase
'(11!'(1' 4'

4oala and MapCedu&e ?B+9A

1ort (,.pe 03 (ull s.stem)


.8V 5isk

8erformed by the +E frame*ork during the shuffling phase 7ntermediate key/"alue pairs are processed in increasing key order 0hort map phase *ith 2(<!4(< .8V utili,ation =ong reduce phase *hich is highly disk intensi"e
'(11!'(1' 4)

4oala and MapCedu&e ?I+9A

1peedup ((ull s.stem)


Gptimum

a) Wordcount

(:ype 1)

b) 0ort (:ype ')

0peedup relati"e to an +E cluster *ith 1( core nodes Close to linear speedup on &ore nodes

'(11!'(1'

42

4oala and MapCedu&e ?J+9A

"6e&ution ,ime 0s. ,ransient Nodes

7nput data of 2( B9 Wordcount output data = '( L9 0ort output data = 2( B9 Wordcount scales better than 0ort on transient nodes
'(11!'(1' 43

4oala and MapCedu&e ?K+9A

Per(orman&e o( t%e CesiHing Me&%anism


7minL0.2B 7ma6L1.2B gro51tep=3 s%rin=1tep=' ,*1P =)( s ,**(D)P =1'( s

0tream of 3( +E >obs +E cluster of '( core nodes + '( transient nodes BB8 increases the si,e of the data transferred across the net*ork B08 gro*s/shrinks based on the resource utili,ation of the cluster BB58 enables local *rites on the disks of pro"isioned nodes
'(11!'(1' 44

4oala and MapCedu&e ?9+9A

-se Case F ,-Del(t


Boal 8rocess 1(s :9 of 8'8 traces 5C0!2 constraints
:ime limit per >ob (13 min) Don!persistent storage

0olution
Eeser"e the nodes for se"eral days# import and process the data&

Gur approach
0plit the data into multiple subsets 0maller data sets =[ faster import and processing 0etup multiple +E clusters# one for each subset
46

'(11!'(1'

#genda
1')4. 7ntroduction .loud 8rogramming in 8ractice (:he 8roblem) 8rogramming +odels for .ompute!7ntensi"e Workloads Programming Models (or )ig Data 1- +apEeduce 2. *rap% Pro&essing )- Gther 9ig 5ata 8rogramming +odels 3- 0ummary

'(1'!'(1)

4Q

*rap% Pro&essing "6ample 1ingle-1our&e 1%ortest Pat% (111P)


5i>kstra?s algorithm
0elect node *ith minimal distance Vpdate neighbors G(_E_ + _O_`log_O_) *ith 1ibo@eap

7nitial dataset
C 9 . 5 --'(1'!'(1)

^(# (9# 3)# (5# ))[ ^inf# (E# 1)[ ^inf# (1# 3)[ ^inf# (9# 1)# (.# ))# (E# 2)# (1# 2)[
4I

0ource .laudio +artella# 8resentation on Biraph at :V 5elft# Cpr '(1'-

*rap% Pro&essing "6ample 111P in MapCedu&e


% What is the performance problem& +apper output distances
^9# 3[# ^5# )[# ^.# inf[# ---

9ut also graph structure


^C# ^(# (9# 3)# (5# ))[ ---

Eeducer input distances


9 ^inf# 3[# 5 ^inf# )[ ---

9ut also graph structure D >obs# *here D is the graph diameter


'(1'!'(1)

9 ^inf# (E# 1)[ --6(

0ource .laudio +artella# 8resentation on Biraph at :V 5elft# Cpr '(1'-

,%e Pregel Programming Model (or *rap% Pro&essing

9atch!oriented processing Euns in!memory Oertex!centric C87 1ault!tolerant Euns on +aster!0la"e architecture

GL# the actual model follo*s in the next slides


'(1'!'(1) 61

0ource .laudio +artella# 8resentation on Biraph at :V 5elft# Cpr '(1'-

,%e Pregel Programming Model (or *rap% Pro&essing


8rocessors

9ased on Oaliant?s 9ulk 0ynchronous 8arallel (908)


=ocal .omputation

D processing units *ith fast local memory 0hared communication medium 0eries of 1upersteps Blobal 0ynchroni,ation 9arrier .ommunication Ends *hen all "ote:o@alt

8regel executes initiali,ation# one or se"eral supersteps# shutdo*n


'(1'!'(1)

9arrier 0ynchroni,ation
6'

0ource .laudio +artella# 8resentation on Biraph at :V 5elft# Cpr '(1'-

Pregel ,%e 1uperstep


Each Oertex (execution in parallel)
Eecei"e messages from other "ertices 8erform o*n computation (user!defined function) +odify o*n state or state of outgoing messages +utate topology of the graph 0end messages to other "ertices

:ermination condition
Cll "ertices inacti"e Cll messages ha"e been transmitted

'(1'!'(1)

6)

0ource 8regel article-

Pregel2 ,%e <erte6-)ased #PI


7nput message 7mplements processing algorithm Gutput message

'(1'!'(1)

62

0ource 8regel article-

,%e Pregel #r&%ite&ture Master-$or=er


+aster assigns "ertices to Workers
Braph partitioning

+aster Worker 1
3

+aster coordinates 0upersteps +aster coordinates .heckpoints Workers execute "ertices compute() Workers exchange messages directly
1 ' 2 )
'(1'!'(1)

Worker

k
63

0ource 8regel article-

Pregel Per(orman&e 111P on 1 )illion-<erte6 )inar. ,ree

'(1'!'(1)

64

0ource 8regel article-

Pregel Per(orman&e 111P on Candom *rap%s3 <arious 1iHes

'(1'!'(1)

66

0ource 8regel article-

#pa&%e *irap% #n 8pen-1our&e Implementation o( Pregel


:asktracker
+ap 0lot +ap 0lot

:asktracker
+ap 0lot +ap 0lot

:asktracker
+ap 0lot +ap 0lot

:asktracker
+ap 0lot +ap 0lot

8ersistent computation state

Yookeeper

DD X A:

+aster

=oose implementation of 8regel 0trong community (1acebook# :*itter# =inked7n) Euns 1((< on existing @adoop clusters 0ingle +ap!only >ob
6Q

'(1'!'(1)

0ource .laudio +artella# 8resentation on Biraph at :V 5elft# Cpr '(1'-

%ttp2++in&u'ator.apa&%e.org+girap%+

Page ran= 'en&%mar=s


:iberium :an
Clmost 2((( nodes# shared among numerous groups in SahooP @adoop (-'(-'(2 (secure @adoop) 'x %uad .ore '-2B@,# '2 B9 EC+# 1x 4:9 @5

org-apache-giraph-benchmark-8ageEank9enchmark
Benerates data# number of edges# number of "ertices# Z of supersteps 1 master/YooLeeper '( supersteps Do checkpoints 1 random edge per "ertex
6I

0ource C"ery .hing presentation at @ortonWorkshttp //***-slideshare-net/a"eryching/'(111(12horton*orks/

$or=er s&ala'ilit. (2B0M 0erti&es)


3((( 23(( 2((( ,otal se&onds )3(( )((( '3(( '((( 13(( 1((( 3(( ( ( 3( 1(( 13( '(( M o( 5or=ers '3( )(( )3(
Q(

0ource C"ery .hing presentation at @ortonWorkshttp //***-slideshare-net/a"eryching/'(111(12horton*orks/

<erte6 s&ala'ilit. (300 5or=ers)


'3(( '((( ,otal se&onds 13(( 1((( 3(( ( ( 1(( '(( )(( 2(( 3(( M o( 0erti&es (in 100Ns o( millions) 4((
Q1

0ource C"ery .hing presentation at @ortonWorkshttp //***-slideshare-net/a"eryching/'(111(12horton*orks/

<erte6+5or=er s&ala'ilit.
M o( 0erti&es (100Ns o( millions) '3(( '((( ,otal se&onds 13(( )(( 1((( '(( 3(( ( ( 3( 1(( 13( '(( M o( 5or=ers '3( )(( )3(
Q'

4(( 3(( 2((

1(( (

0ource C"ery .hing presentation at @ortonWorkshttp //***-slideshare-net/a"eryching/'(111(12horton*orks/

#genda
1')4. 7ntroduction .loud 8rogramming in 8ractice (:he 8roblem) 8rogramming +odels for .ompute!7ntensi"e Workloads Programming Models (or )ig Data 1- +apEeduce '- Braph 8rocessing 3. 8t%er )ig Data Programming Models 3- 0ummary

'(1'!'(1)

Q)

1tratosp%ere
+eteor Nuery language# 0upremo operator frame*ork 8rogramming .ontracts (8C.:s) programming model
Extended set of 'nd order functions ("s +apEeduce) 5eclarati"e definition of data parallelism

Dephele execution engine


0chedules multiple dataflo*s simultaneously 0upports 7aa0 en"ironments based on Cma,on E.'# Eucalyptus

@510 storage engine


'(1'!'(1) Q2

1tratosp%ere Programming Contra&ts (P#C,s) ?1+2A


'nd!order function

Clso in +apEeduce
'(1'!'(1) Q3

0ource 8C.: o"er"ie*# https //stratosphere-eu/*iki/doku-php/*iki pactpm

1tratosp%ere Programming Contra&ts (P#C,s) ?2+2A

% @o* can 8C.:s optimi,e data processing&

'(1'!'(1)

Q4

0ource 8C.: o"er"ie*# https //stratosphere-eu/*iki/doku-php/*iki pactpm

1tratosp%ere Nep%ele

'(1'!'(1)

Q6

1tratosp%ere 0s MapCedu&e
8C.: extends +apEeduce
9oth propose 'nd!order functions (3 8C.:s "s +ap X Eeduce) 9oth reNuire from user 1st!order functions (*hat?s inside the +ap) 9oth can benefit from higher!le"el languages 8C.: ecosystem has 7aa0 support

Ley!"alue data model

'(1'!'(1)

QQ

0ource 1abian @ueske# =arge 0cale 5ata Cnalysis 9eyond +apEeduce# @adoop Bet :ogether# 1eb '(1'-

1tratosp%ere 0s MapCedu&e Pair5ise 1%ortest Pat%s ?1+3A


1loyd!Warshall Clgorithm
For k from 1 to n For i from 1 to n For j from 1 to n Di,j min( Di,j , Di,k + Dk,j )

'(1'!'(1)

QI

0ource 0tratosphere example# https //stratosphere-eu/*iki/lib/exe/detail-php/*iki all'allF spFtaskdescription-png&id=*iki<)Ca'aspexample

1tratosp%ere 0s MapCedu&e )- Aoin edges to triads ,riangle "numeration ?1+3A


7nput undirected graph edge!based Gutput triangles 1- Eead input graph '- 0ort by smaller "ertex 75

2- :riads to triangles

'(1'!'(1)

I(

0ource 1abian @ueske# =arge 0cale 5ata Cnalysis 9eyond +apEeduce# @adoop Bet :ogether# 1eb '(1' and 0tratosphere example-

1tratosp%ere 0s MapCedu&e ,riangle "numeration ?2+3A


+apEeduce 1ormulation

'(1'!'(1)

I1

0ource 1abian @ueske# =arge 0cale 5ata Cnalysis 9eyond +apEeduce# @adoop Bet :ogether# 1eb '(1' and 0tratosphere example-

1tratosp%ere 0s MapCedu&e ,riangle "numeration ?3+3A


0tratosphere 1ormulation

'(1'!'(1)

I'

0ource 1abian @ueske# =arge 0cale 5ata Cnalysis 9eyond +apEeduce# @adoop Bet :ogether# 1eb '(1' and 0tratosphere example-

#genda
1')2B. 7ntroduction .loud 8rogramming in 8ractice (:he 8roblem) 8rogramming +odels for .ompute!7ntensi"e Workloads 8rogramming +odels for 9ig 5ata 1ummar.

'(1'!'(1)

I)

Con&lusion ,a=e,a=e- ome Message


Programming model L &omputer s.stem a'stra&tion Programming Models (or Compute-Intensi0e $or=loads
+any trade!offs# fe* dominant programming models +odels bags of tasks# *orkflo*s# master/*orker# 908# $

Programming Models (or )ig Data


)ig data programming models %a0e e&os.stems +any trade!offs# many programming models +odels +apEeduce# 8regel# 8C.:# 5ryad# $ Execution engines @adoop# Loala++E# Biraph# 8C.:/Dephele# 5ryad# $

Cealit. &%e&=2 &loud programming is maturing


Gctober 1(# '(1' http //***-flickr-com/photos/dimitrisotiropoulos/2'(264421Q/ I2

Ceading Material (Ceall. #&ti0e 7ield)


$or=loads
Clexandru 7osup# 5ick @- A- Epema Brid .omputing Workloads- 7EEE 7nternet .omputing 13(') 1I!'4 ('(11)

,%e 7ourt% Paradigm :he 1ourth 8aradigm# http //research-microsoft-com/en!us/collaboration/fourthparadigm/ Programming Models (or Compute-Intensi0e $or=loads
Clexandru 7osup# +athieu Aan# Gmer G,an 0onme,# 5ick @- A- Epema :he .haracteristics and 8erformance of Broups of Aobs in BridsEuro!8ar '((6 )Q'!)I) Gstermann et al-# Gn the .haracteristics of Brid Workflo*s# .oreBE75 7ntegrated Eesearch in Brid .omputing (.B7W)# '((Qhttp //***-pds-e*i-tudelft-nl/aiosup/*ftraces(6charsFcamera-pdf .orina 0tratan# Clexandru 7osup# 5ick @- A- Epema C performance study of grid *orkflo* engines- BE75 '((Q '3!)'

Programming Models (or )ig Data


Aeffrey 5ean# 0an>ay Bhema*at +apEeduce 0implified 5ata 8rocessing on =arge .lusters- G057 '((2 1)6!13( Aeffrey 5ean# 0an>ay Bhema*at +apEeduce a flexible data processing tool- .ommun- C.+ 3)(1) 6'!66 ('(1() +atei Yaharia# Cndy Lon*inski# Cnthony 5- Aoseph# Eandy Lat,# and 7on 0toica- '((Q- 7mpro"ing +apEeduce performance in heterogeneous en"ironments- 7n 8roceedings of the Qth V0ED7b conference on Gperating systems design and implementation (G057W(Q)V0ED7b Cssociation# 9erkeley# .C# V0C# 'I!2'+atei Yaharia# 5hruba 9orthakur# Aoydeep 0en 0arma# Lhaled Elmeleegy# 0cott 0henker# 7on 0toica 5elay scheduling a simple techniNue for achie"ing locality and fairness in cluster scheduling- Euro0ys '(1( '43!'6Q :yson .ondie# Deil .on*ay# 8eter Cl"aro# Aoseph +- @ellerstein# Lhaled Elmeleegy# Eussell 0ears +apEeduce Gnline- D057 '(1( )1)!)'Q Br,egor, +ale*ic,# +atthe* @- Custern# Cart A- .- 9ik# Aames .- 5ehnert# 7lan @orn# Daty =eiser# Br,egor, .,a>ko*ski 8regel a system for large!scale graph processing- 07B+G5 .onference '(1( 1)3!124 5ominic 9attrc# 0tephan E*en# 1abian @ueske# Gde> Lao# Oolker +arkl# 5aniel Warneke Dephele/8C.:s a programming model and execution frame*ork for *eb!scale analytical processing- 0o.. '(1( 11I!1)(
'(1'!'(1) I3

You might also like