Hadoop Developer

Course: Hadoop Developer
Duration: 4 Days of Training

Many enterprises in the modern IT industry must process and distribute large amounts of data on a regular basis. Basic database management systems and tools become ineffective when processing and storing data at this scale, so knowledge and expertise in Big Data management applications has become a necessity within the IT industry.

CloudAce offers two separate Big Data Training programs built around Apache's Hadoop platform. This 30-hour, instructor-led developer training course delivers the key concepts and expertise necessary to create robust data processing applications using Apache Hadoop. Through lecture and interactive, hands-on exercises, attendees will learn Hadoop and its ecosystem components. The training is designed with a vendor-neutral approach. However, upon completion of the course, attendees can take the Hadoop developer certification exams offered by Cloudera or Hortonworks. Certification is a great differentiator; it helps establish individuals as leaders in their field, providing customers with tangible evidence of skills and expertise.

About Our Trainers

By participating in our Big Data Training programs, you will be placed under the guidance of a certified cloud computing professional who has worked with us as a Technical Lead for over 9 years, dealing extensively with Big Data analytics, development, and implementation. Our trainer holds the Hadoop developer and Hadoop administrator certifications and has a wealth of teaching experience. Our trainer also has intensive hands-on experience implementing algorithms such as decision trees, support vector machines, random forests, naive Bayes, neural networks, genetic algorithms, conjoint analysis, and principal component analysis.

Hadoop Developer Training

Our Hadoop Developer Big Data Training program consists of a total of 14 modules that detail the platform's functionalities, advantages, and drawbacks. Participants will benefit from an in-depth understanding of the Apache Hadoop platform and will learn how to program and tune it to perform relevant analytics. Participants will learn how to set up Hadoop clusters and will be introduced to common and advanced algorithms and programs. The program also covers the various components of the Hadoop ecosystem extensively.
CLOUDACE TECHNOLOGIES, Regus Solitaire Business Centre (Hyderabad) Pvt Ltd, 4th Floor, Gumidelli Commercial Complex, 1-10-39 to 44, Old Airport Road, Begumpet, Hyderabad - 500016. www.cloudace.in

The duration of the program is 30 hours, completed over the course of 4 days. The Hadoop Developer training program will be conducted in a classroom. The fee for the Hadoop Developer course is 24,000 INR, exclusive of service taxes. Upon completion of this training, successful participants will receive the Hadoop Developer certification.

The agenda for the course is outlined below.

Module 1 : Big Data - An Overview

o What is Cloud Computing
o What is Grid Computing
o What is Virtualization
o How the above three are interrelated
o What is Big Data
o Introduction to Analytics and the need for big data analytics
o Hadoop Solutions - Big Picture
o Hadoop distributions
o Comparing Hadoop Vs. Traditional systems
o Volunteer Computing
o Data Retrieval - Random Access Vs. Sequential Access
o NoSQL Databases

Module 2 : The Motivation of Hadoop

o Problems with traditional large-scale systems
o Requirements for a new approach

Module 3 : Hadoop Basic Concepts

o What is Hadoop?
o The Hadoop Distributed File System
o How MapReduce Works
o Anatomy of a Hadoop Cluster
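To make the "How MapReduce Works" topic concrete, the following is a minimal single-process sketch of the map, shuffle, and reduce phases applied to a word count. It is plain Python, not Hadoop code: the function names (`map_phase`, `shuffle_phase`, `reduce_phase`, `word_count`) and the sample sentences are illustrative assumptions, and a real cluster distributes each phase across many machines.

```python
from collections import defaultdict


def map_phase(line):
    """Mapper: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle_phase(pairs):
    """Shuffle: group all values by key, as the framework does between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reducer: collapse the list of counts for one word into a single total."""
    return key, sum(values)


def word_count(lines):
    """Run the three phases in sequence on an in-memory list of lines."""
    mapped = [pair for line in lines for pair in map_phase(line)]
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(key, values) for key, values in sorted(grouped.items()))


if __name__ == "__main__":
    sample = ["big data big cluster", "data node"]  # hypothetical input
    print(word_count(sample))
```

The mapper and reducer never see the whole data set, only one line or one key at a time, which is what allows the framework to run them in parallel.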

Module 4 : Hadoop Daemons

o NameNode
o DataNode
o Secondary NameNode
o JobTracker
o TaskTracker

Module 5 : Hadoop in Detail

o Blocks and Splits
o Replication
o Data high availability
o Data Integrity
o Cluster architecture and block placement
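The arithmetic behind blocks and replication can be sketched in a few lines. The example below assumes a 128 MB block size and a replication factor of 3, which are common defaults (older Hadoop 1.x releases default to 64 MB blocks); both values are configurable per cluster, and the helper names are hypothetical.

```python
MB = 1024 * 1024


def block_count(file_size_bytes, block_size_bytes=128 * MB):
    """Number of HDFS blocks needed to hold a file (the last block may be partial)."""
    return (file_size_bytes + block_size_bytes - 1) // block_size_bytes


def raw_storage_bytes(file_size_bytes, replication=3):
    """Total bytes stored across the cluster once every block is replicated."""
    return file_size_bytes * replication


if __name__ == "__main__":
    size = 1000 * MB  # hypothetical 1000 MB file
    print("blocks at 128 MB:", block_count(size))
    print("blocks at 64 MB:", block_count(size, 64 * MB))
    print("raw storage in MB:", raw_storage_bytes(size) // MB)
```

Under these assumptions a 1000 MB file occupies 8 blocks and 3000 MB of raw disk, which is why replication has to be factored into cluster capacity planning.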

Module 6 : Programming Practices and Performance Tuning

o Developing MapReduce Programs in
  - Local Mode
  - Pseudo-distributed Mode
  - Fully distributed Mode

Module 7 : Writing a MapReduce Program

o Examining a Sample MapReduce Program
o Basic API Concepts
o The Driver Code
o The Mapper
o The Reducer
o Hadoop's Streaming API
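Hadoop's Streaming API lets the mapper and reducer be any executables that read lines from standard input and write tab-separated key/value lines to standard output, with the reducer receiving its input sorted by key. The sketch below imitates that contract in one Python file. The input format (`sensor,reading`) and the sensor data are hypothetical, and the in-process `sorted()` call only stands in for the sort that Hadoop performs between the two phases.

```python
from itertools import groupby


def mapper(lines):
    """Turn 'sensor,reading' input lines into 'sensor<TAB>reading' output lines."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank input lines
        sensor, reading = line.split(",")
        yield f"{sensor}\t{reading}"


def reducer(sorted_lines):
    """Consume key-sorted 'sensor<TAB>reading' lines and emit the maximum per sensor."""
    parsed = (line.rstrip("\n").split("\t") for line in sorted_lines)
    for sensor, group in groupby(parsed, key=lambda kv: kv[0]):
        yield f"{sensor}\t{max(float(value) for _, value in group)}"


if __name__ == "__main__":
    raw = ["s1,20.5", "s2,18.0", "s1,23.1", "", "s2,17.2"]  # hypothetical readings
    shuffled = sorted(mapper(raw))  # stands in for Hadoop's sort between phases
    for out in reducer(shuffled):
        print(out)
```

In an actual streaming job the two functions would live in separate scripts that loop over `sys.stdin`, and the scripts would be passed to the streaming jar as the mapper and reducer commands.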

Module 8 : Setup Hadoop Cluster

o Install and configure Apache Hadoop
o Make a fully distributed Hadoop cluster on a single laptop/desktop
o Install and configure Cloudera Hadoop distribution in fully distributed mode
o Install and configure Hortonworks Hadoop distribution in fully distributed mode
o Monitoring the cluster
o Getting used to the management consoles of Cloudera and Hortonworks

Module 9 : Delving Deeper Into the Hadoop API

o Using Combiners
o The configure and close Methods
o SequenceFiles
o Partitioners
o Counters
o Directly Accessing HDFS
o ToolRunner
o Using The Distributed Cache
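Two of the topics above, combiners and partitioners, can be illustrated without a cluster. The sketch below is plain Python under stated assumptions: `partition` mimics the shape of hash partitioning (a non-negative hash taken modulo the number of reducers) but uses CRC32 as a stand-in for Java's `hashCode()`, so the reducer numbers it prints will not match a real Hadoop job. `combine` shows how map-side pre-aggregation shrinks the data sent over the network.

```python
import zlib
from collections import Counter


def partition(key, num_reducers):
    """Pick a reducer for a key; CRC32 is a stand-in for Java's hashCode()."""
    return (zlib.crc32(key.encode("utf-8")) & 0x7FFFFFFF) % num_reducers


def combine(pairs):
    """Combiner: pre-sum (word, count) pairs on the map side before the shuffle."""
    totals = Counter()
    for word, count in pairs:
        totals[word] += count
    return sorted(totals.items())


if __name__ == "__main__":
    map_output = [("data", 1), ("big", 1), ("data", 1), ("data", 1)]  # hypothetical
    combined = combine(map_output)
    print(len(map_output), "pairs before combining,", len(combined), "after")
    for word, count in combined:
        print(word, count, "-> reducer", partition(word, 4))
```

Because the same key always maps to the same partition, every count for a given word reaches the same reducer regardless of which mapper produced it.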

Module 10 : Common MapReduce Algorithms

o Sorting and Searching
o Indexing
o Classification/Machine Learning
o Term Frequency - Inverse Document Frequency
o Word Co-Occurrence
o Hands-On Exercise: Creating an Inverted Index
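As a preview of the inverted index exercise, here is a minimal in-memory sketch in Python. It is not the course's exercise solution: the document ids, sample text, and function names are illustrative assumptions. The mapper emits a (term, document id) pair per distinct term, and the grouping step plays the role of the reducer.

```python
from collections import defaultdict


def map_document(doc_id, text):
    """Mapper: emit (term, doc_id) for each distinct term in a document."""
    return [(term, doc_id) for term in sorted(set(text.lower().split()))]


def build_inverted_index(documents):
    """Group document ids by term, which is the reducer's job in MapReduce."""
    postings = defaultdict(set)
    for doc_id, text in documents.items():
        for term, emitted_id in map_document(doc_id, text):
            postings[term].add(emitted_id)
    return {term: sorted(ids) for term, ids in sorted(postings.items())}


if __name__ == "__main__":
    docs = {  # hypothetical two-document corpus
        "d1": "Hadoop stores data",
        "d2": "Hadoop processes data fast",
    }
    for term, ids in build_inverted_index(docs).items():
        print(term, ids)
```

Looking up a term in the resulting dictionary returns the list of documents that contain it, which is the basic operation behind keyword search.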

Module 11 : Debugging MapReduce Programs

o Testing with MRUnit
o Logging
o Other Debugging Strategies

Module 12 : Advanced MapReduce Programming

o A Recap of the MapReduce Flow
o Custom Writables and WritableComparables
o The Secondary Sort
o Creating InputFormats and OutputFormats
o Pipelining Jobs With Oozie
o Map-Side Joins
o Reduce-Side Joins
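The idea behind a reduce-side join can be sketched as follows: the map step tags each record with its source and emits it under the join key, and the reduce step pairs records from the two sources that share a key. The Python below is a single-process illustration with hypothetical customer and order records. It performs an inner join and leaves out the secondary sort that a production job would typically use to control the order in which tagged records arrive.

```python
from collections import defaultdict


def reduce_side_join(customers, orders):
    """Inner-join two record sets on customer id the way a reduce-side join does.

    Map step: tag every record with its source and emit it under the join key.
    Reduce step: for each key, pair every customer record with every order record.
    """
    grouped = defaultdict(lambda: {"customer": [], "order": []})
    for cust_id, name in customers:
        grouped[cust_id]["customer"].append(name)
    for cust_id, item in orders:
        grouped[cust_id]["order"].append(item)

    joined = []
    for cust_id in sorted(grouped):
        for name in grouped[cust_id]["customer"]:
            for item in grouped[cust_id]["order"]:
                joined.append((cust_id, name, item))
    return joined


if __name__ == "__main__":
    customers = [(1, "Asha"), (2, "Ravi")]            # hypothetical records
    orders = [(1, "disk"), (1, "cable"), (3, "fan")]  # hypothetical records
    for row in reduce_side_join(customers, orders):
        print(row)
```

Keys that appear in only one source (customer 2 and order key 3 here) produce no output, matching inner-join semantics.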

Module 13 : Monitoring and Debugging on Production Cluster

o Counters
o Skipping Bad Records
o Rerunning Failed Tasks with IsolationRunner

Module 14 : Tuning For Performance

o Reducing network traffic with a combiner
o Reducing the amount of input data
o Using Compression
o Reusing the JVM
o Running with speculative execution
o Refactoring code and rewriting algorithms
o Parameters affecting Performance
o Other Performance Aspects

Hadoop Ecosystem covered as part of Hadoop Developer

Ecosystem component: HBase

o HBase concepts
o Install and configure HBase on cluster
o Create database, develop and run sample applications

Ecosystem component: ZooKeeper

o ZooKeeper concepts
o Install and configure ZooKeeper
o Use ZooKeeper for cluster maintenance

Ecosystem component: Hive

o Hive concepts
o Install and configure Hive on cluster
o Create a database and access it from the console
o Develop and run sample applications in Java/Python to access Hive

Ecosystem component: Sqoop

o Install and configure Sqoop on cluster
o Import data from Oracle/MySQL to Hive

Ecosystem component: Pig

o Install and configure Pig
o Write sample Pig Latin scripts

Ecosystem component: Flume and Chukwa

o Flume and Chukwa concepts
o Install and configure Flume on cluster
o Create a sample application to capture logs from Apache using Flume

Overview of other ecosystem components:

Oozie, Avro, Thrift, REST, Mahout, Cassandra, YARN, MR2, etc.

Analytics Basics

o Analytics and big data analytics
o Commonly used analytics algorithms
o Analytics tools like R and Weka
o Mahout

Training Duration - 4 days of classroom training
Course Fee - 24,000 INR + service taxes per participant (excludes exam fees)
