useR! 2014
Downloads
This presentation is available at: http://r-pbd.org/tutorial
Installation Instructions
Installation instructions for setting up a pbdR environment are available:
http://r-pbd.org/install.html
This includes instructions for installing R, MPI, and pbdR.
Contents
Introduction
Introduction to pbdMPI
Data Input
MPI Profiling
Wrapup
Introduction

Contents
A Concise Introduction to Parallelism
A Quick Overview of Parallel Hardware
A Quick Overview of Parallel Software
Summary
A Concise Introduction to Parallelism
Parallelism

(Figure: serial programming vs. parallel programming.)
Difficulty in Parallelism
Speedup
Wallclock Time: the time on the clock on the wall from start to finish.
Speedup: a unitless measure of improvement; more is better.

S_{n1,n2} = (wallclock time using n1 cores) / (wallclock time using n2 cores)

n1 is often taken to be 1; in this case, we are comparing a parallel algorithm to the serial algorithm.
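As a small sketch of the calculation (the two functions below are placeholders for a serial and a parallel implementation of the same task):

t1 <- system.time(run_serial())["elapsed"]      # run_serial(): placeholder, n1 = 1 core
t2 <- system.time(run_parallel())["elapsed"]    # run_parallel(): placeholder, n2 cores
speedup <- unname(t1 / t2)                      # S_{1,n2}
speedup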
Good Speedup, Bad Speedup

(Figure: two speedup-vs-cores plots, each comparing the application's speedup against the optimal, linear speedup.)
(Figure: speedup vs. cores and wallclock time vs. cores for the same runs.)
Two Memory Models

Shared memory: direct access to read/change memory (one node).
Distributed memory: no direct access to read/change memory (many nodes); requires communication.
A Quick Overview of Parallel Hardware
(Diagram: three flavors of parallel hardware.)
Distributed memory: many nodes, each with its own processors (PROC + cache) and memory (Mem), connected by an interconnection network.
Shared memory: several cores (CORE + cache) on one node sharing a common memory.
Co-processor: a GPU or MIC accelerator with its own local memory.
(GPU: Graphical Processing Unit; MIC: Many Integrated Core.)
A Server or Cluster

(Same hardware diagram as above.)
Server to Supercomputer

(Same hardware diagram, scaled up to many nodes.)
Introduction
ing
t
r
u
e
ib
st
Clu Distr
Distributed Memory
Interconnection Network
PROC
+ cache
PROC
+ cache
PROC
+ cache
PROC
+ cache
Mem
Mem
Mem
Mem
Mu
CORE
+ cache
e
cor
y
n
a
ing
r M
d
o
a
U
o
ffl
GP
O
Co-Processor
ore
ltic
GPU
or
MIC
Shared Memory
CORE
+ cache
CORE
+ cache
Local Memory
GPU: Graphical Processing Unit
CORE
+ cache
Network
ing
d
a
re
ti t h
l
u
M
Memory
http://r-pbd.org/tutorial
ppbbddR
R Core Team
13/131
A Quick Overview of Parallel Software
(Diagram: parallel software for each flavor of hardware.)
Distributed memory: Sockets, MPI, Hadoop.
Shared memory: OpenMP, OpenACC, Pthreads, fork.
Co-processor (GPU or MIC): CUDA, OpenCL, OpenACC.
(Diagram: the same map with R packages added.)
Distributed memory (Sockets, MPI, Hadoop): snow, Rmpi, pbdMPI, RHadoop.
Shared memory (fork): multicore.
Co-processor and compiled code: foreign language interfaces (.C, .Call, Rcpp, inline, OpenCL, ...).
Summary

Three flavors of hardware: distributed is stable; multicore and co-processor are evolving.
Two memory models: distributed-memory programming also works on multicore.
Parallelism hierarchy: medium to big machines have all three.
Why Profile?
Because performance matters.
Bad practices scale up!
Your bottlenecks may surprise you.
Because R is dumb.
R users claim to be data people... so act like it!
A compiler will notice that this loop computes nothing useful:

int main() {
  int x, i;
  for (i = 0; i < 10; i++)
    x = 1;
  return 0;
}

Compiled without optimization, the generated assembly contains the full loop (initialize, compare, branch, store, jump back). Compiled with optimization, the loop is eliminated entirely:

main:
    .cfi_startproc
# BB#0:
    xorl    %eax, %eax
    ret
R will not!

Dumb Loop

for (i in 1:n) {
  tA <- t(A)
  Y <- tA %*% Q
  Q <- qr.Q(qr(Y))
  Y <- A %*% Q
  Q <- qr.Q(qr(Y))
}

Better Loop

tA <- t(A)
for (i in 1:n) {
  Y <- tA %*% Q
  Q <- qr.Q(qr(Y))
  Y <- A %*% Q
  Q <- qr.Q(qr(Y))
}
Another real example: coercing inside the loop versus hoisting the coercion.

while (i <= N) {
  for (j in 1:i) {
    d.k <- as.matrix(x)[l == j, l == j]
    ...

x.mat <- as.matrix(x)   # coerce once, outside the loops
while (i <= N) {
  for (j in 1:i) {
    d.k <- x.mat[l == j, l == j]
    ...
Some Thoughts

R is slow.
Bad programmers are slower.
R isn't very clever (compared to a compiler).
The bytecode compiler helps, but not nearly as much as a compiler does.
Profiling R Code
Timings

Getting simple timings as a basic measure of performance is easy and valuable.

system.time(): time a block of code.
Rprof(): time the execution of R functions.
Rprofmem(): report memory allocation in R.
tracemem(): detect when a copy of an R object is created.
The rbenchmark package: benchmark comparisons.
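A quick sketch of the first and last of these (the matrix here is just an arbitrary example):

library(rbenchmark)

x <- matrix(rnorm(1e6), ncol = 100)

# time one block of code
system.time(apply(x, 2, mean))

# compare two methods of computing the same thing
benchmark(apply = apply(x, 2, mean),
          colMeans = colMeans(x),
          replications = 100)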
Advanced R Profiling
The pbdPROF workflow:

1. Build pbdPROF and rebuild pbdMPI against it (details in the MPI Profiling section).

2. Run code:

mpirun -np 64 Rscript my_script.R

3. Analyze results:

library(pbdPROF)
prof <- read.prof("output.mpiP")
plot(prof, plot.type = "messages2")
Hardware counter measurements available through pbdPAPI:

Time, floating point instructions, and Mflips
Time, floating point operations, and Mflops
Cache misses, hits, accesses, and reads
Events per cycle
Idle cycles
CPU or RAM bound
CPU utilization
Summary
Profile, profile, profile.
Use system.time() to get a general sense of a method.
Use rbenchmark's benchmark() to compare two methods.
Use Rprof() for more detailed profiling.
Other tools exist for more hardcore applications (pbdPAPI and pbdPROF).
pbdR Packages
pbdR Motivation

Why HPC libraries (MPI, ScaLAPACK, PETSc, ...)?
The HPC community has been at this for decades.
They're tested. They work. They're fast.
You're not going to beat Jack Dongarra at dense linear algebra.
pbdMPI

Types in R

In Rmpi, you must specify the MPI type yourself:

# int
mpi.allreduce(x, type = 1)
# double
mpi.allreduce(x, type = 2)

In pbdMPI, the type is determined automatically:

allreduce(x)
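A small sketch of the point (the values are illustrative; the default operation is a sum):

allreduce(1L)           # integer: every rank contributes 1, so the result is comm.size()
allreduce(1.5)          # double: no type argument needed
allreduce(c(1, 2, 3))   # vectors are reduced element-wise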
(Figure: benchmark timings on 12 to 64 cores; the code uses library(pbdDMAT).)
(Table: run times for 500, 1000, and 2000 predictors across varying core counts.)
(Diagram: the pbdR packages mapped onto the HPC software stack.)

pbdMPI sits on MPI (profiled through pbdPROF, which works with mpiP, fpmpi, and Tau).
pbdDMAT sits on ScaLAPACK, PBLAS, and BLACS (related libraries: PETSc, Trilinos, DPLASMA; tuned BLAS such as MKL, LibSci, and ACML).
pbdPAPI sits on PAPI.
pbdNCDF4 sits on NetCDF4; pbdADIOS sits on ADIOS.
Co-processor libraries: MAGMA, PLASMA, cuBLAS (R packages magma, HiPLAR, HiPLARM).
Using pbdR
pbdR Paradigms

pbdR programs are R programs!

Differences:
Batch execution (non-interactive).
Parallel code uses the Single Program/Multiple Data (SPMD) style.
Emphasizes data parallelism.
Batch Execution

Running a serial R program in batch:

Rscript my_script.r

or

R CMD BATCH my_script.r
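A parallel pbdR script is launched the same way, just through mpirun (two ranks here as an example):

mpirun -np 2 Rscript my_script.r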
Paradigms

Programming models: OOP, functional, SPMD, ...
SIMD: hardware vector instructions (MMX, SSE, ...); not the same thing as SPMD.
Summary

pbdR connects R to scalable HPC libraries.
The pbdDEMO package offers numerous examples and explanations for getting started with distributed R programming.
pbdR programs are R programs.
Introduction to pbdMPI

Contents
Managing a Communicator
Reduce, Gather, Broadcast, and Barrier
Other pbdMPI Tools
Summary
Managing a Communicator
MPI Operations (1 of 2)

Managing a communicator: create and destroy communicators.
  init(): initialize the communicator.
  finalize(): shut down the communicator(s).
Rank query: determine the processor's position in the communicator.
  comm.rank(): who am I?
  comm.size(): how many of us are there?
Printing: print output from various ranks.
  comm.print(x)
  comm.cat(x)

WARNING: only use the printing functions on results, never on yet-to-be-computed expressions.
Quick Example 1

Rank Query: 1_rank.r

library(pbdMPI, quietly = TRUE)
init()

# each process reports its own rank
comm.print(comm.rank(), all.rank = TRUE)

finalize()

Sample Output (2 ranks):

COMM.RANK = 0
[1] 0
COMM.RANK = 1
[1] 1
Quick Example 2

Hello World: 2_hello.r

library(pbdMPI, quietly = TRUE)
init()

comm.print("Hello, world")

comm.print("Hello again", all.rank = TRUE, quietly = TRUE)

finalize()

Sample Output (2 ranks):

COMM.RANK = 0
[1] "Hello, world"
[1] "Hello again"
[1] "Hello again"
Reduce, Gather, Broadcast, and Barrier
MPI Operations

Reduce
Gather
Broadcast
Barrier
(Diagrams of the four operations:)
Reduce: many-to-one.
Gather: many-to-one.
Broadcast: one-to-many.
Barrier: synchronization.
MPI Operations (2 of 2)

Reduction: each processor has a number x; add them all up, find the largest/smallest, ...
  reduce(x, op = "sum"): reduce to one.
  allreduce(x, op = "sum"): reduce to all.
Gather: each processor has a number; create a new object on some processor containing all of those numbers.
  gather(x): gather to one.
  allgather(x): gather to all.
Broadcast: one processor has a number x that every other processor should also have.
  bcast(x)
Barrier: computation wall; no processor can proceed until all processors can proceed.
  barrier()
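allgather() and barrier() do not appear in the quick examples that follow, so here is a minimal sketch:

# every rank contributes its rank number and receives the full set
all.ranks <- allgather(comm.rank())

barrier()   # no rank proceeds past this point until all ranks reach it

comm.print(unlist(all.ranks))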
Quick Example 3

Reduce and Gather: 3_gt.r

library(pbdMPI, quietly = TRUE)
init()

# each rank owns one value; these are chosen to match the sample output below
n <- c(2, 8)[comm.rank() + 1]

# gather every rank's value to rank 0
gt <- gather(n)
comm.print(unlist(gt))

# sum across ranks; the result is available on every rank
sm <- allreduce(n, op = "sum")
comm.print(sm, all.rank = TRUE)

finalize()

Run with:

mpirun -np 2 Rscript 3_gt.r

Sample Output:

COMM.RANK = 0
[1] 2 8
COMM.RANK = 0
[1] 10
COMM.RANK = 1
[1] 10
Quick Example 4

Broadcast: 4_bcast.r

library(pbdMPI, quietly = TRUE)
init()

# only rank 0 owns the data to begin with
if (comm.rank() == 0) {
  x <- matrix(1:4, nrow = 2)
} else {
  x <- NULL
}

y <- bcast(x)
comm.print(y, rank.print = 1)   # print from rank 1 to show the data arrived

finalize()

Sample Output:

COMM.RANK = 1
     [,1] [,2]
[1,]    1    3
[2,]    2    4
Other pbdMPI Tools
Random Seeds

pbdMPI offers a simple interface for managing random seeds:
  comm.set.seed(seed = 1234, diff = TRUE): all processors generate different streams.
  comm.set.seed(seed = 1234, diff = FALSE): all processors generate the same stream.
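A quick sketch of the difference between the two settings:

comm.set.seed(seed = 1234, diff = TRUE)
comm.print(runif(1), all.rank = TRUE)    # a different value on every rank

comm.set.seed(seed = 1234, diff = FALSE)
comm.print(runif(1), all.rank = TRUE)    # the same value on every rank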
Summary

Start by loading the package; every pbdMPI script has the same skeleton:

library(pbdMPI, quietly = TRUE)
init()

# ...

finalize()
GBD
Distributing Data

Problem: how to distribute the data

x = a 10 x 3 matrix with entries x_{1,1}, ..., x_{10,3}
Distributing a Matrix

(Figure: the 10 x 3 matrix x split by rows across four processors, 0 through 3; each processor owns a contiguous block of rows.)
The last row of the local storage of a processor is adjacent (by global row) to the first row of the local storage of the next processor (by communicator number) that owns data.

GBD is (relatively) easy to understand, but can lead to bottlenecks if you have many more columns than rows.
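A sketch of how a rank might work out its share under an even row split (the global row count here is an assumption for illustration):

n <- 10                                   # global number of rows (assumed known)
rows.per.rank <- rep(n %/% comm.size(), comm.size())
rem <- n %% comm.size()
if (rem > 0)
  rows.per.rank[seq_len(rem)] <- rows.per.rank[seq_len(rem)] + 1

my.nrows <- rows.per.rank[comm.rank() + 1]
my.first.row <- c(0, cumsum(rows.per.rank))[comm.rank() + 1] + 1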
Example: x = a 9 x 9 matrix (entries x11, ..., x99) to be distributed across six processors, 0 through 5.
One GBD distribution of x: local dimensions 2 x 9, 2 x 9, 2 x 9, 1 x 9, 1 x 9, 1 x 9 on processors 0 through 5.
The distribution does not have to be uniform: local row counts can differ, and some processors may own no rows at all (a 0 x 9 local matrix).
Summary

Need to distribute your data? Try splitting by row.
This may not work well if your data is square (or wider than it is tall).
Example: Monte Carlo Estimation of pi

(Figure: points sampled uniformly in the unit square; those with x^2 + y^2 <= 1 fall inside the quarter circle, so their fraction estimates pi/4.)

Parallel strategy: each rank counts its own points, then asks everyone else what their answer is and sums it all up.
Serial Code

N <- 50000
X <- matrix(runif(N * 2), ncol = 2)
r <- sum(rowSums(X^2) <= 1)
PI <- 4 * r / N
print(PI)

Parallel Code

library(pbdMPI, quietly = TRUE)
init()

comm.set.seed(seed = 1234, diff = TRUE)

N.gbd <- 50000 / comm.size()               # each rank samples its share of the points
X.gbd <- matrix(runif(N.gbd * 2), ncol = 2)
r.gbd <- sum(rowSums(X.gbd^2) <= 1)

r <- allreduce(r.gbd)                      # sum the local counts across all ranks
PI <- 4 * r / (N.gbd * comm.size())
comm.print(PI)

finalize()
Note
For the remainder, we will exclude loading, init, and finalize calls.
Example: Sample Covariance

cov(x_{n x p}) = 1/(n - 1) * sum_{i=1}^{n} (x_i - xbar)(x_i - xbar)^T
Subtract each column's mean from that column's entries in each local matrix.
Divide by N - 1.
Serial Code

N <- nrow(X)
mu <- colSums(X) / N
X <- sweep(X, STATS = mu, MARGIN = 2)
Cov.X <- crossprod(X) / (N - 1)
print(Cov.X)

Parallel Code

N <- allreduce(nrow(X.gbd))                       # total rows across all ranks
mu <- allreduce(colSums(X.gbd)) / N               # global column means
X.gbd <- sweep(X.gbd, STATS = mu, MARGIN = 2)
Cov.X <- allreduce(crossprod(X.gbd)) / (N - 1)
comm.print(Cov.X)
Example: Linear Regression

Fit the least squares solution beta = (X^T X)^{-1} X^T y.

Locally, compute tX = X^T, then the pieces A = tX X and B = tX y; allreduce them.
Locally, compute solve(A) %*% B.
Serial Code

tX <- t(X)
A <- tX %*% X
B <- tX %*% y
ols <- solve(A) %*% B
print(ols)

Parallel Code

tX.gbd <- t(X.gbd)
A <- allreduce(tX.gbd %*% X.gbd)   # the same small p x p matrix on every rank
B <- allreduce(tX.gbd %*% y.gbd)   # the same p x 1 vector on every rank
ols <- solve(A) %*% B
comm.print(ols)
ppbbddR
R Core Team
78/131
Summary

SPMD programming is (often) a natural extension of serial programming.
More pbdMPI examples can be found in pbdDEMO.
Data Input

Contents
Cluster Computer and File System
Serial Data Input
Parallel Data Input
Summary
Cluster Computer and File System
File System

(Diagram: compute nodes connected to a storage server and its disk.)
(Diagram: compute nodes connected to a parallel file system with multiple storage servers and disks.)
(Diagram: compute nodes connect through I/O nodes to the parallel file system's storage servers and disks.)
(Diagram: pbdADIOS on the compute nodes uses ADIOS to move data through the I/O nodes to the parallel file system.)
Serial Data Input
Read on one process, then distribute:

library(pbdDMAT)

if (comm.rank() == 0) {   # only read on process 0
  x <- read.csv("myfile.csv")
} else {
  x <- NULL
}

dx <- as.ddmatrix(x)
Parallel Data Input
New Issues

How to read in parallel?
CSV, SQL, NetCDF4, HDF, ADIOS, custom binary
How to partition data across nodes?
How to structure it for scalable libraries?
Read directly into the form needed, or restructure afterwards?
...

A lot of work is needed here!
CSV Data

Serial Code

x <- read.csv("x.csv", header = TRUE)
x

Parallel Code

dx <- read.csv.ddmatrix("x.csv", header = TRUE, sep = ",",
                        nrows = 10, ncols = 10, num.rdrs = 2, ICTXT = 0)

dx

finalize()
Reading a binary vector of doubles in parallel: each rank seeks to its own offset and reads its own block.

N <- 1000000                              # global length (assumed known)
size <- 8                                 # bytes per double
my_length <- N %/% comm.size()            # assume comm.size() divides N evenly
my_start <- comm.rank() * my_length * size

con <- file("binary.vector.file", "rb")
seekval <- seek(con, where = my_start, rw = "read")
x <- readBin(con, what = "double", n = my_length, size = size)
close(con)
Reading a binary (column-major) matrix in parallel: each rank reads a contiguous block of columns.

nrow_global <- 1000                       # global dimensions (assumed known)
ncol_global <- 100
size <- 8                                 # bytes per double

ncol_local <- ncol_global %/% comm.size() # assume comm.size() divides ncol_global
my_start <- comm.rank() * ncol_local * nrow_global * size
my_length <- ncol_local * nrow_global

con <- file("binary.matrix.file", "rb")
seekval <- seek(con, where = my_start, rw = "read")
x <- readBin(con, what = "double", n = my_length, size = size)
close(con)

x <- matrix(x, nrow = nrow_global, ncol = ncol_local)
NetCDF4 Data

A sketch using pbdNCDF4 (the file and variable names are placeholders):

library(pbdNCDF4, quietly = TRUE)

nc <- nc_open_par("myfile.nc")            # open the file for parallel access
nc_var_par_access(nc, "myvariable")
x <- ncvar_get(nc, "myvariable")          # use start/count to read a per-rank slab
nc_close(nc)

finalize()
Summary

Mostly do it yourself.
Use a parallel file system for big data.
Use binary files for true parallel reads.
Know the number of readers vs. the number of storage servers.
Distributed Matrices

You can only get so far with one node...

(Figure: script run time vs. cores, 16 to 256 cores.)
(Figures: one-dimensional data distributions, (a) Block, (b) Cyclic, (c) Block-Cyclic, and their two-dimensional analogues, (a) 2d Block, (b) 2d Cyclic, (c) 2d Block-Cyclic.)
(Figure: possible processor grids for 6 processors: (a) 1 x 6, (b) 2 x 3, (c) 3 x 2, (d) 6 x 1.)
A ddmatrix is made up of: Data (the local submatrix), dim (the global dimension), ldim (the local dimension), bldim (the blocking dimension), and ICTXT (the BLACS context, i.e. the processor grid).
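A small sketch of peeking at these slots (the grid, dimensions, and block size are arbitrary choices):

library(pbdDMAT, quietly = TRUE)
init.grid()

dx <- ddmatrix("rnorm", nrow = 9, ncol = 9, bldim = 2)

comm.print(dx@dim)                     # global dimension
comm.print(dx@bldim)                   # blocking dimension
comm.print(dx@ldim, all.rank = TRUE)   # local dimension differs by rank

finalize()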
Example: x = a 9 x 9 matrix with entries x11, ..., x99.
(Figures: x distributed over a 2 x 2 processor grid, processors 0, 1, 2, 3 at grid positions (0,0), (0,1), (1,0), (1,1); the slides step through which blocks of x each processor owns.)
pbdDMAT
(Figures: the 9 x 9 matrix x distributed block-cyclically, with 2 x 2 blocks, over a 2 x 3 processor grid, processors 0 through 5 at grid positions (0,0), (0,1), (0,2), (1,0), (1,1), (1,2), together with the local submatrix stored by each processor.)
Cons
Confusing layout.
Serial Code

cov(x)

Parallel Code

cov(x)

The call is identical: cov() dispatches on the distributed ddmatrix class.
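A minimal end-to-end sketch (the matrix size and seed are arbitrary):

library(pbdDMAT, quietly = TRUE)
init.grid()

comm.set.seed(seed = 1234, diff = TRUE)
dx <- ddmatrix("rnorm", nrow = 1000, ncol = 10)   # distributed random matrix

cv <- cov(dx)            # the same call as in serial code
print(cv)                # prints a description of the distributed result

finalize()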
Summary

Every pbdDMAT program has the same skeleton:

library(pbdDMAT, quietly = TRUE)
init.grid()

# ...

finalize()
RandSVD
Randomized SVD

(Algorithm from Halko, Martinsson, and Tropp, "Finding Structure with Randomness: Probabilistic Algorithms for Constructing Approximate Matrix Decompositions", SIAM Review, 2011.)

Stage A (randomized subspace iteration): given an m x n matrix A and integers l and q, compute an m x l orthonormal matrix Q whose range approximates the range of A.

1. Draw an n x l standard Gaussian matrix Omega.
2. Form Y0 = A Omega and compute its QR factorization Y0 = Q0 R0.
3. for j = 1, 2, ..., q
4.   Form Yj' = A^T Q_{j-1} and compute its QR factorization Yj' = Qj' Rj'.
5.   Form Yj = A Qj' and compute its QR factorization Yj = Qj Rj.
6. end
7. Q <- Qq.

Note: forming Y = (A A^T)^q A Omega directly is vulnerable to round-off error; the extra orthonormalization between each application of A and A^T avoids this.

Stage B: form B = Q^T A, compute an SVD of the small matrix B = U' Sigma V^T, and set U = Q U'.

Serial R

randSVD <- function(A, k, q = 3) {
  n <- ncol(A)
  ## Stage A
  Omega <- matrix(rnorm(n * 2 * k), nrow = n, ncol = 2 * k)
  Y <- A %*% Omega
  Q <- qr.Q(qr(Y))
  At <- t(A)
  for (i in 1:q) {
    Y <- At %*% Q
    Q <- qr.Q(qr(Y))
    Y <- A %*% Q
    Q <- qr.Q(qr(Y))
  }
  ## Stage B
  B <- t(Q) %*% A
  U <- La.svd(B)$u
  U <- Q %*% U
  U[, 1:k]
}
Randomized SVD

Serial R vs. parallel pbdR: the two listings are nearly identical line for line; the pbdDMAT version simply operates on distributed ddmatrix objects.
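A sketch of the distributed call with pbdDMAT (the dimensions are illustrative, and the one assumed change inside randSVD() is that Omega is generated as a ddmatrix; pbdDMAT provides ddmatrix methods for %*%, t, qr, qr.Q, and La.svd, so the rest of the body carries over):

library(pbdDMAT, quietly = TRUE)
init.grid()

dA <- ddmatrix("rnorm", nrow = 100000, ncol = 1000)   # distributed input

# inside randSVD(): Omega <- ddmatrix("rnorm", nrow = n, ncol = 2 * k)
dU <- randSVD(dA, k = 30, q = 3)

finalize()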
Randomized SVD

30 Singular Vectors from a 100,000 by 1,000 Matrix

(Figure: speedup vs. cores, 16 to 128, for the full and the randomized SVD algorithms.)
Summary

pbdDMAT makes distributed (dense) linear algebra easier.
It can enable rapid prototyping at large scale.
MPI Profiling

Contents
Profiling with the pbdPROF Package
Installing pbdPROF
Example
Summary
Profiling with the pbdPROF Package
Introduction to pbdPROF

A successful Google Summer of Code 2013 project.
Available on CRAN.
Enables profiling of MPI-using R scripts.
pbdR packages are officially supported; it can work with others...
Also reads, parses, and plots profiler outputs.
How it works

MPI calls get hijacked by the profiler and logged.
Currently supports the profilers fpmpi and mpiP.
fpmpi is distributed with pbdPROF and installs easily, but offers minimal profiling capabilities.
mpiP is fully supported as well, but you have to install and link it yourself.
Installing pbdPROF
1. Build pbdPROF.
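Since the package is on CRAN, this is typically just (a sketch):

install.packages("pbdPROF")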
2. Rebuild pbdMPI:

R CMD INSTALL pbdMPI_0.2-2.tar.gz --configure-args="--enable-pbdPROF"

Any package which explicitly links with an MPI library must be rebuilt in this way (pbdMPI, Rmpi, ...).
Other pbdR packages link with pbdMPI, and so do not need to be rebuilt.
See the pbdPROF vignette if something goes wrong.
Example
Example Script

my_svd.r

library(pbdDMAT, quietly = TRUE)
init.grid()

n <- 1000
x <- ddmatrix("rnorm", n, n)

s <- La.svd(x)            # the MPI-heavy operation being profiled
comm.print(s$d[1:5])

finalize()
Run the example with 4 ranks:

$ mpirun -np 4 Rscript my_svd.r

mpiP:
mpiP: mpiP V3.3.0 (Build Sep 23 2013/14:00:47)
mpiP: Direct questions and errors to mpip-help@lists.sourceforge.net
mpiP:
Using 2x2 for the default grid size
mpiP:
mpiP: Storing mpiP output in [./R.4.5944.1.mpiP].
mpiP:
Read the profiler output into R:

library(pbdPROF)
prof.data <- read.prof("R.4.28812.1.mpiP")
Generate plots

plot(prof.data, plot.type = "messages2")

(Figures: the plots produced by pbdPROF for the profiled run; several plot types are available.)
Summary

pbdPROF offers tools for profiling MPI-using R code.
It builds fpmpi easily and also supports mpiP.
Wrapup
Summary

Profile your code to understand your bottlenecks.
pbdR makes distributed parallelism with R easier.
Distributing data to multiple nodes requires choosing an appropriate layout.
For truly large data, I/O must be parallel as well.
Questions?
http://r-pbd.org/
Come see our poster on Wednesday at 5:30!