You are on page 1of 28

Best IT Training Institute in

Hyderabad
Office Address:
Flat No 204, Annapurna
Block
Aditya Enclave, Ameerpet
Hyderabad - 500038
Telangana, India.

Quick Contact:
info@OrienIT.com
+91 040 6514 2345
+91 970 320 2345

http://www.orienit.com/
HADOOP
Course
urse Details :
urse Duration 55 Days
Content
ainer: Kalyan
ainer Profile: SeniorBigDataTrainer At ORIENhttp://www.orienit
IT
ode Of Training: Both Online/Classroom .com/
1.Introduction to Big Data and Hadoop
Big Data
o What is Big Data?
o Why all industries are talking about Big Data?
o What are the issues in Big Data?
Storage
What are the challenges for storing big data?
Processing
What are the challenges for processing big data?

o What are the technologies support big data?


Hadoop
Data Bases
Traditional
NO SQL HADOO
Hadoop
o What is Hadoop? P
o History of Hadoop
o Why Hadoop?
o Hadoop Use cases
o Advantages and Disadvantages of Hadoop
Importance of Different Ecosystems of Hadoop
http://www.orienit.com/
Importance of Integration with other Big Data solutions
Big Data Real time Use Cases
2.HDFS (Hadoop Distributed File System)

HDFS architecture
Name Node
Importance ofName Node
What are the roles ofName Node
What are the drawbacks in Name Node

Secondary Name Node


Importance ofSecondary Name Node
What are the roles ofSecondary Name Node
What are the drawbacks in Secondary Name Node
Data Node HADOO
Importance ofData Node
What are the roles ofData Node
What are the drawbacks in Data Node
P
Data Storage in HDFS
o How blocks are storing in DataNodes
o How replication works in Data Nodes
o How to write the files in HDFS
o How to read the files in HDFS http://www.orienit.com/
HDFS Block size
o Importance of HDFS Block size
o Why Block size is so large?
o How it is related to MapReduce split size

HDFS Replication factor


o Importance of HDFS Replication factor in production environment
o Can we change the replication for a particular file or folder
o Can we change the replication for all files or folders

Accessing HDFS
o CLI(Command Line Interface) using hdfs commands
o Java Based Approach

HDFS Commands HADOO


o Importance of each command are
o How to execute the command P
o Hdfs admin related commands explanation

Configurations
o Can we change the existing configurations of hdfs or not?
http://www.orienit.com/
o Importance of configurations
How to overcome the Drawbacks in HDFS
o Name Node failures
o Secondary Name Node failures
o Data Node failures
Where does it fit and Where doesnt fit?
Exploring the Apache HDFS Web UI

How to configure the Hadoop Cluster


o How to add the new nodes ( Commissioning )
o How to remove the existing nodes ( De-Commissioning )
o How to verify the Dead Nodes
o How to start the Dead Nodes
HADOO
Hadoop 2.x.x version features
o Introduction to Namenode federation P
o Introduction to Namenode High Availabilty with NFS
o Introduction to Namenode High Availabilty with QJM

Difference between Hadoop 1.x.x and Hadoop 2.x.x


versions http://www.orienit.com/
3.MAPREDUCE
Map Reduce architecture
o JobTracker
Importance of JobTracker
What are the roles of JobTracker
What are the drawbacks in JobTracker
o TaskTracker
Importance of TaskTracker
What are the roles of TaskTracker
What are the drawbacks in TaskTracker
o Map Reduce Job execution flow
Data Types in Hadoop
o What are the Data types in Map Reduce
o Why these are importance in Map Reduce
o Can we write custom Data Types in MapReduce HADOO
Input Format's in Map Reduce
o Text Input Format P
o Key Value Text Input Format
o Sequence File Input Format
o NLine Input Format
o Importance of Input Format in Map Reduce
o How to use Input Format in Map Reduce http://www.orienit.com/
o How to write custom Input Format's and its Record Readers
Output Format's in Map Reduce
o Text Output Format
o Sequence File Output Format
o Importance of Output Format in Map Reduce
o How to use Output Format in Map Reduce
o How to write custom Output Format's and its Record Writers
Mapper utput Format's in Map Reduce
o What is mapper in Map
TextReduce
OutputJobFormat
o Why we need mapper?
Sequence File Output Format
o What are the Advantages and Disadvantages of mapper
Importance of Output Format in
o Writing mapper programs
Reducer Map Reduce
How to use Output Format in Map
HADOO
o What is reducer in Map Reduce Job
o Why we need reducer Reduce
?
How to
o What are the Advantages andwrite custom Output
Disadvantages of reducer
Format's and its Record Writer
o Writing reducer programs
Combiner
P
o What is combiner in Map Reduce Job
o Why we need combiner?
o http://www.orienit.com/
What are the Advantages and Disadvantages of Combiner
o Writing Combiner programs
Partitioner
o What is Partitioner in Map Reduce Job
o Why we need Partitioner?
o What are the Advantages and Disadvantages of Partitioner
o Writing Partitioner programs

Distributed Cache utput Format's in Map Reduce


o What is Distributed Cache
Text Output FormatJob
in Map Reduce
o Importance of Distributed CacheFile
Sequence in Map Reduce
Output job
Format
o What are the Advantages and Disadvantages of Distributed Cache
Importance of Output Format in
o Writing Distributed Cache programs
Map Reduce
How to use Output Format in Map
Counters

o Why we need Counters


Reduce
o What is Counter in Map Reduce Job
How to write custom Output
in production environment?
HADOO
o How to Write CountersFormat's and its programs
in Map Reduce Record Writer
P
Importance of Writable and Writable Comparable Apis
o How to write custom Map Reduce Keys using Writable
o How to write custom Map Reduce Values using Writable Comparable

http://www.orienit.com/
Joins
o Map Side Join
What is the importance of Map Side Join
Where we are using it
o Reduce Side Join
What is the importance of Reduce Side Join
Where we are usingFormat's
utput it in Map Reduce
o What is the difference between Map Side join and Reduce Side Join?
Text Output Format
Compression techniques
Sequence File Output Format
o Importance of Compression techniques in production environment
Importance of Output Format in
o Compression Types
NONE, RECORD and Map Reduce
BLOCK
o Compression Codecs How to use Output Format in Map
Reduce
Default, Gzip, Bzip, Snappy and LZO
How
o Enabling and Disabling to write
these custom
techniques Output
for all the Jobs
Format's
o Enabling and Disabling
Map Reduce Schedulers
and its Record
these techniques Writer Job
for a particular
HADOO
P
oFIFO Scheduler
oCapacity Scheduler
oFair Scheduler
oImportance of Schedulers in production environment
oHow to use Schedulers in production environment http://www.orienit.com/
Map Reduce Programming Model
o How to write the Map Reduce jobs in Java
o Running the Map Reduce jobs in local mode
o Running the Map Reduce jobs in pseudo mode
o Running the Map Reduce jobs in cluster mode

Debugging Map utput Reduce Format's


Jobs in Map Reduce
o How to debug Map ReduceText Jobs
Output Format
in Local Mode.
o How to debug Map ReduceSequence
Jobs in File Output
Remote Format
Mode.
Importance of Output Format in
Data Locality Map Reduce
o What is Data Locality? How to use Output Format in Map
o Will Hadoop follows Data Locality?
Reduce HADOOP
How to write custom Output
Speculative Execution Format's and its Record Writer
o What is Speculative Execution?
o Will Hadoop follows Speculative Execution?

Map Reduce Commands


o Importance of each command
o How to execute the command
http://www.orienit.com/
o Mapreduce admin related commands explanation
Configurations
o Can we change the existing configurations of mapreduce or not?
o Importance of configurations
Writing Unit Tests for Map Reduce Jobs
Configuring hadoop development environment using Eclipse
Use of Secondary Sorting and how to solve using MapReduce
How to Identify Performance Bottlenecks in MR jobs and tuning MR jobs.
Map Reduce Streaming utput Format's
and Pipes in Map Reduce
with examples
Exploring the MapReduce WebText UI
Output Format
Sequence File Output Format
4.YARN (Next Generation Importance of Output Format in
Map Reduce)
What is YARN? Map Reduce
What is the importance of How
YARN?to use Output Format in Map
Where we can use the concept of YARN in Real Time & it's powered
Reduce
projects How to write custom Output
What is difference between YARN and Map Reduce
HADOOP
Format's and its Record Writer
Yarn Architecture
1. 1.Importance ofResource Manager
2. Importance ofNode Manager
3. Importance ofApplication Manager
4. Yarn Application execution flow
Installing YARN on both windows & Linux
Exploring the YARN Web UI http://www.orienit.com/
Examples on YARN
5.Apache PIG
Introduction to Apache Pig
Map Reduce Vs Apache Pig
SQL Vs Apache Pig
Different data types in Pig
Modes Of Execution in Pig
o Local Mode utput Format's in Map Reduce
o Map Reduce Mode Text Output Format
Execution Mechanism Sequence File Output Format
o Grunt Shell
Importance of Output Format in
o Script
o Embedded
Map Reduce
UDF's How to use Output Format in Map
Reduce
o How to write the UDF's
How
in Pig
to write custom Output
o How to use the UDF's in Pig HADOOP
o Importance of UDF'sFormat's
in Pig and its Record Writer
Filter's
o How to write the Filter's in Pig
o How to use the Filter's in Pig
o Importance of Filter's in Pig

http://www.orienit.com/
Load Functions
o How to write the Load Functions in Pig
o How to use the Load Functions in Pig
o Importance of Load Functions in Pig
Store Functions
o How to use the Store Functions in Pig
o Importance of Store Functions in Pig
utput Format's in Map Reduce
Transformations in Pig
How to write the complexText Output Format
pig scripts
How to integrate the PigSequence
and Hbase File Output Format
Importance of Output Format in
6.Apache HIVE Map Reduce
Hive Introduction How to use Output Format in Map
Hive architecture Reduce
o Driver How to write custom Output HADOO
P
o Compiler Format's and its Record Writer
o Optimizer
o Semantic Analyzer
Hive Query Language(Hive QL)
SQL VS Hive QL
Hive Installation and Configuration
Hive DLL and DML Operations
http://www.orienit.com/
Hive Services
o CLI
o Hiveserver
o Hwi

Metastore
o embedded metastore configuration
utput
o external metastore Format's in Map Reduce
configuration
Text Output Format
UDF's Sequence File Output Format
Importance
o How to write the UDF's in Hive of Output Format in
o How to use the UDF'sMap Reduce
in Hive
o Importance of UDF's in
HowHive
to use Output Format in Map
UDAF's
Reduce
How to write custom Output
o How to use the UDAF's in Hive
HADOO
o
Format's and its Record Writer
Importance of UDAF's in Hive P
UDTF's
o How to use the UDTF's in Hive
o Importance of UDTF's in Hive
How to write a complex Hive queries
What is Hive Data Model? http://www.orienit.com/
Partitions
o Importance of Hive Partitionsin production
environment
o Limitations of Hive Partitions
o How to write Partitions

Buckets
utput Format's in Map Reduce
o Importance of Hive Bucketsin production
Text Output Format
environment Sequence File Output Format
o How to write Buckets
Importance of Output Format in
SerDe Map Reduce
o Importance of HiveHow to use Output
SerDe'sin Format in Map
production
Reduce
environment
o How to write SerDeHow to write custom Output
programs HADOO
How to integrate the Format's
Hive and and its Record Writer
Hbase
P
7.Cloudera Impala
Introduction to Impala
Impala Examples

8.Apache Zookeeper http://www.orienit.com/


Introduction to zookeeper
9.Apache HBase
HBase introduction
HBase use cases
HBase basics
o Importane of Column families
o Basic CRUD operations
create utput Format's in Map Reduce
scan / get Text Output Format
put Sequence File Output Format
delete / drop Importance of Output Format in
o Bulk loading in Hbase
Map Reduce
HBase installationHow to use Output Format in Map
o Local mode
o Psuedo mode
Reduce
How to write custom Output HADOO
o Cluster mode
HBase Architecture P
Format's and its Record Writer

o HMaster
o HRegionServer
o Zookeeper
Mapreduce integration
o Mapreduce over HBase http://www.orienit.com/
10.Apache Phoenix
Introduction to Phoenix
Installing Phoenix
Integrating with Hbase
Comparing Hbase & Phoenix
Practice on Phoenix examples

11.Apache Cassandra
Introduction to Cassandra
Installing Cassandra
Practice on Cassandra examples

12.MongoDB
Introduction to MongoDB
Installing MongoDB HADOO
Practice on MongoDB examples
P
13.Apache Drill
Introduction to Drill
Installing Drill
Practice on Drill examples

http://www.orienit.com/
14.Apache SQOOP
Introduction to Sqoop
MySQL client and Server Installation
Sqoop Installation
How to connect to Relational Database using Sqoop
Examples on Import and Export Sqoop commands

15.Apache FLUME
Introduction to flume
Flume installation
Flume Architecture
o Agent
o Sources
o Channels
o Sinks HADOO
Practice on Flume examples
P
16.Apache Kafka
Introduction to Kafka
Installing Kafka
Practice on Kafka examples

http://www.orienit.com/
17.Apache Spark
Introduction to Spark
Installing Spark
Spark Architecture
Introduction to Spark Components
o Spark Core
o Spark SQL
o Spark Streaming
o Spark MLLib
o Spark GraphX
Practice on Spark examples
Spark and Hive interation

18.Apache OOZIE
Introduction to oozie
Oozie installation
HADOO
Executing different oozie workflow jobs
Monitering Oozie workflow jobs P
19.Real Time Big Data Projects
We willl be sharingEnd-to-End Big Data Projects
We are providingBig Data Project Practice on Our Lab
We are providingImportantRecorded Videos on Our
YouTube Channel http://www.orienit.com/
Any information search inGoogle / YouTubeby keyword
20.Pre-Requisites for this Course
Java Basics like OOPS Concepts, Interfaces, Classes and
Abstract Classes etc (Free Java classes as part of course)
SQL Basic Knowledge ( Free SQL classes as part of course)
Linux Basic Commands (Provided in our blog)

Administration topics:
Hadoop Installations (Windows & Linux)
o Local mode (hands on installation on ur laptop)
o Pseudo mode (hands on installation on ur laptop)
o Cluster mode (hands on 40+node cluster setup
in our lab)
HADOO
o Nodes Commissioning and De-commissioning in
Hadoop Cluster
P
o Jobs Monitoring in Hadoop Cluster
o Fair Scheduler (hands on installation on ur laptop)
o Capacity Scheduler (hands on installation on ur laptop)

http://www.orienit.com/
Hive Installations
o Local mode (hands on installation on ur laptop)
With internal Derby
o Cluster mode (hands on installation on ur laptop)
With external Derby
With external MySql
o Hive Web Interface (HWI) mode (hands on installation on ur
laptop)
o Hive Thrift Server mode (hands on installation on ur laptop)
o Derby Installation (hands on installation on ur laptop)
o MySql Installation (hands on installation on ur laptop)

Pig Installations
o Local mode (hands on installation on ur laptop)
o Mapreduce mode (hands on installation on ur laptop)
HADOO
Hbase Installations
P
o Local mode (hands on installation on ur laptop)
o Psuedo mode (hands on installation on ur laptop)
o Cluster mode (hands on installation on ur laptop)
With internal Zookeeper
With external Zookeeper
http://www.orienit.com/
Zookeeper Installations
o Local mode (hands on installation on ur laptop)
o Cluster mode (hands on installation on ur laptop)

Sqoop Installations
oSqoop installation with MySql (hands on installation on ur
laptop)
oSqoop with hadoop integration (hands on installation on ur
laptop)
oSqoop with hive integration (hands on installation on ur
laptop)
oSqoop with hbase integration (hands on installation on ur
laptop)
HADOO
Flume Installation
o Psuedo mode (hands on installation on ur laptop) P
Oozie Installation
o Psuedo mode (hands on installation on ur laptop)

Advanced Technologies Installations


o Spark http://www.orienit.com/
o Cassandra
Cloudera Hadoop Distribution installation

HortonWorks Hadoop Distribution


installation

21.ORIENIT Hadoop POC's Solution Class

22.Advanced and New technologies


architectural discussions
Spark / Flink (Real time data processing)
Storm / Kafka / Flume (Real time data streaming)
Cassandra / MongoDB (NOSQL database) HADOO
Solr (Search engine)
Nutch (Web Crawler)
Lucene (Indexing data)
P
Mahout (Machine Learning Algorithms)
Ganglia, Nagios (Monitoring tools)
Cloudera, Hortonworks, MapR, Amazon EMR
(Distributions)
How to crack the Cloudera / Hortonworks certification
questions http://www.orienit.com/
Cloudera Distribution
Introduction to Cloudera
Cloudera Installation
Cloudera Certification details
How to use cloudera hadoop
What are the main differences between Cloudera and Apache
hadoop

Hortonworks Distribution
Introduction to Hortonworks
Hortonworks Installation
Hortonworks Certification details
How to use Hortonworks hadoop
What are the main differences between Hortonworks and
Apache hadoop
HADOO
Amazon EMR
P
Introduction to Amazon EMR and Amazon EC2
How to use Amazon EMR and Amazon EC2
Why to use Amazon EMR and Importance of this

http://www.orienit.com/
Hadoop ecosystem Integrations:
Hive and Spark integration
Hive and HBase integration
Pig and HBase integration
Sqoop and RDBMS integration
Hbase and Phoenix integration
Flume and Phoenix integration
Kakfa and Phoenix integraion

Free Big Data Workshops:


Spark & Scala
Cassandra
MongoDB
Search engine & E-commerce solutions HADOO
Big Data Analytics (R, Mahout, Spark ML)
P

http://www.orienit.com/
23.What we are offering to you:
Hadoop installation on bothWindows & Linux
Free WeeklyOnline Hadoop Certification
Real Time Big Data projects will be shared
FreeBig Data Workshopsonnew & advanced technologies
Hands on MapReduce programming around20+programs these will make you to
perfect in MapReduce both concept-wise and programmatically
Hands on5 POC'swill be provided (These POC's will help you perfect in Hadoop
and it's ecosystems)
Hands on practical 40+Node hadoop cluster setup in our Lab.
Well documentedHadoop material with all the topics covering in the course
Well documentedHadoop blogcontains frequent interview questions along with
the answers and latest updates on Big Data technology.
Discussing abouthadoop interview questions & answersdaily base.
Resume preparationwith POC's or Project's based on your experience.

http://www.orienit.com/
Thank You
Office Address:
Flat No 204, Annapurna Block
Aditya Enclave, Ameerpet
Hyderabad - 500038
Telangana, India.
+91 040 6514 2345
+91 970 320 2345
info@OrienIT.com
http://www.orienit.com/

You might also like