Presentation on
By:
ELSON D’SOUZA (1MS14IS035)
ESHWAR M. S. (1MS14IS132)
ARPITHA (1MS14IS019)
AGENDA
➢ BIG DATA
➢ HADOOP ECOSYSTEM
➢ HDFS
➢ MAP REDUCE
➢ DEMO on BLUEMIX
BIG DATA
HADOOP ECOSYSTEM
[Diagram: Hadoop ecosystem components: Sqoop, Mahout, Hue, Oozie, Flume, Pig, Hive, MapReduce (MR), Impala, HBase, HDFS]
APPLICATIONS OF HADOOP
➢ Data-intensive text processing
➢ Assembly of large genomes
➢ Graph mining
➢ Machine learning and data mining
➢ Large-scale social network analysis
USERS OF HADOOP
Hadoop Distributed File System (HDFS)
HDFS
mydata.txt (150 MB)
blk_1 64 MB
blk_2 64 MB
blk_3 22 MB
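The block split in the mydata.txt example can be reproduced with a few lines of arithmetic. This is a minimal sketch, assuming the 64 MB block size shown on the slide; the function name `split_into_blocks` is hypothetical and is not part of any Hadoop API.

```python
def split_into_blocks(file_size_mb, block_size_mb=64):
    """Return the sizes (in MB) of the blocks needed to hold a file.

    Assumption: a fixed 64 MB block size, as on the slide.
    """
    if file_size_mb < 0 or block_size_mb <= 0:
        raise ValueError("sizes must be non-negative, block size positive")
    full_blocks, remainder = divmod(file_size_mb, block_size_mb)
    sizes = [block_size_mb] * full_blocks
    if remainder:
        # The last block only holds what is left over.
        sizes.append(remainder)
    return sizes


blocks = split_into_blocks(150)
for number, size in enumerate(blocks, start=1):
    print(f"blk_{number} {size} MB")
```

For a 150 MB file this gives two full 64 MB blocks and a final 22 MB block, matching the slide.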
HDFS
➢ HDFS is a file system written in Java, based on Google's GFS (Google File System)
➢ Provides redundant storage for massive amounts of data
➢ Files are split into blocks
➢ Blocks are distributed across many machines at load time
• Different blocks from the same file are stored on different machines
➢ Blocks are replicated across multiple machines
➢ The Name Node keeps track of which blocks make up a file and where they are stored
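The bookkeeping described above can be sketched as two small tables: one mapping a file to its blocks, and one mapping each block to the machines holding a copy. This is only an illustration. The replication factor of 3, the node names `dn1` to `dn4`, and the round-robin placement in the hypothetical `place_blocks` function are assumptions; real HDFS uses a more elaborate placement policy.

```python
def place_blocks(block_ids, data_nodes, replication=3):
    """Assign each block to `replication` distinct data nodes, round-robin.

    Assumption: simple round-robin placement, for illustration only.
    """
    if replication > len(data_nodes):
        raise ValueError("not enough data nodes for the requested replication")
    block_map = {}
    for index, block_id in enumerate(block_ids):
        block_map[block_id] = [
            data_nodes[(index + offset) % len(data_nodes)]
            for offset in range(replication)
        ]
    return block_map


# What the Name Node tracks: which blocks make up a file ...
file_blocks = {"mydata.txt": ["blk_1", "blk_2", "blk_3"]}
# ... and where each block is stored.
block_locations = place_blocks(file_blocks["mydata.txt"],
                               ["dn1", "dn2", "dn3", "dn4"])

for block_id, nodes in block_locations.items():
    print(block_id, "->", ", ".join(nodes))
```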
HDFS
➢ When a client wants to retrieve data, it
• Communicates with the Name Node to determine which blocks make up the file and on which Data Nodes those blocks are stored
• Then communicates directly with the Data Nodes to read the data
➢ HDFS works best with a smaller number of large files
• Millions as opposed to billions of files
• Typically 100 MB or more per file
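The two-step read path on this slide (ask the Name Node where the blocks are, then fetch them from the Data Nodes) can be modelled with toy classes. This is a sketch, not the HDFS client API: the classes `NameNode` and `DataNode`, the method names, and the sample data are all hypothetical.

```python
class DataNode:
    """Toy data node: stores block contents by block id."""

    def __init__(self, name):
        self.name = name
        self.blocks = {}

    def read_block(self, block_id):
        return self.blocks[block_id]


class NameNode:
    """Toy name node: holds metadata only, never file contents."""

    def __init__(self):
        self.file_blocks = {}      # path -> ordered list of block ids
        self.block_locations = {}  # block id -> names of nodes holding it

    def get_block_locations(self, path):
        return [(block_id, self.block_locations[block_id])
                for block_id in self.file_blocks[path]]


def read_file(name_node, data_nodes, path):
    """Step 1: ask the name node. Step 2: read directly from data nodes."""
    data = b""
    for block_id, locations in name_node.get_block_locations(path):
        node = data_nodes[locations[0]]  # pick the first listed replica
        data += node.read_block(block_id)
    return data


# Hypothetical two-node cluster holding a two-block file.
dn1, dn2 = DataNode("dn1"), DataNode("dn2")
dn1.blocks["blk_1"] = b"hello "
dn2.blocks["blk_2"] = b"world"
nn = NameNode()
nn.file_blocks["mydata.txt"] = ["blk_1", "blk_2"]
nn.block_locations = {"blk_1": ["dn1"], "blk_2": ["dn2"]}

content = read_file(nn, {"dn1": dn1, "dn2": dn2}, "mydata.txt")
print(content)
```

Note that the file contents never pass through the name node in this sketch; it only answers the lookup.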
NAME NODE
➢ Stores metadata for the files, like the directory structure of a typical file system
➢ The server holding the Name Node instance is critical: there is only one, making it a single point of failure
➢ Keeps a transaction log of file additions, deletions, etc. Only metadata changes are logged, not whole blocks or file streams
➢ Handles creation of more replica blocks when necessary after a Data Node failure
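The last point, creating new replicas after a Data Node failure, can be sketched as follows. The function `handle_node_failure`, the node names, and the target of 3 replicas are assumptions made for illustration; the sketch only updates the metadata table and does not model the actual copying of block data.

```python
def handle_node_failure(block_map, failed_node, live_nodes, replication=3):
    """Drop the failed node from every block and top replicas back up.

    block_map maps block id -> list of node names holding a replica.
    Assumption: new replicas go to the first live nodes not yet holding
    the block.
    """
    for holders in block_map.values():
        if failed_node in holders:
            holders.remove(failed_node)
        candidates = [node for node in live_nodes if node not in holders]
        while len(holders) < replication and candidates:
            holders.append(candidates.pop(0))
    return block_map


block_map = {
    "blk_1": ["dn1", "dn2", "dn3"],
    "blk_2": ["dn2", "dn3", "dn4"],
    "blk_3": ["dn3", "dn4", "dn1"],
}
handle_node_failure(block_map, "dn2", live_nodes=["dn1", "dn3", "dn4"])

for block_id, holders in block_map.items():
    print(block_id, "->", ", ".join(holders))
```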
DATA NODE
[Figure: sales ledger example]
Records (city, amount):
Mumbai ₹45
Bangalore ₹64
Chennai ₹123
Delhi ₹75
Mumbai ₹68
…
Grouped by city:
Mumbai ₹45 + ₹68
Bangalore ₹64
Chennai ₹123
Delhi ₹75
…
MAP REDUCE
[Figure: Mappers each take a portion of the sales records and emit intermediate records keyed by city (Mum, Bang, Che, Delhi). Reducers each collect the records for their assigned cities: one handles Mum and Ban, the other Che and Del.]
MAP REDUCE
Mappers → Intermediate Records (KEY, VALUE) → Reducers → RESULTS
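The flow on this slide can be imitated in plain Python using the city sales records from the earlier example. This is a single-process sketch of the idea, not Hadoop code: the function names `mapper`, `shuffle`, and `reducer` are illustrative, and the "City amount" record format is an assumption.

```python
from collections import defaultdict


def mapper(line):
    """Turn one input record into an intermediate (KEY, VALUE) pair."""
    city, amount = line.split()
    yield city, int(amount)


def shuffle(pairs):
    """Group intermediate values by key, so each key reaches one reducer."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reducer(city, amounts):
    """Combine all values for one key into a single result."""
    return city, sum(amounts)


records = ["Mumbai 45", "Bangalore 64", "Chennai 123", "Delhi 75", "Mumbai 68"]

intermediate = [pair for line in records for pair in mapper(line)]
results = dict(reducer(city, amounts)
               for city, amounts in sorted(shuffle(intermediate).items()))

for city, total in results.items():
    print(city, total)
```

The two Mumbai records end up at the same reducer and are summed to 113, while every other city has a single record.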
MAP REDUCE
Task Tracker
Job Tracker
MAP REDUCE
➢ Task Tracker
• Keeps track of the progress of an individual mapper or reducer and reports it to the Job Tracker
DEMO