
VERSION 1.0.0
NOVEMBER 2, 2016

HADOOP SETUP ON UBUNTU


HADOOP 2.7.3, UBUNTU 15.10

D BESCHI ANTONY

HADOOP SETUP ON UBUNTU


OBJECTIVE
The objective of this document is to guide the user through setting up a single-node Hadoop environment.

TARGET AUDIENCE
Any novice Linux user who knows how to install software and wants to set up their own Hadoop
environment.

STEPS
INSTALL JAVA AND SETUP JAVA HOME
Update the package index
$sudo apt-get update
Install Open JDK
$sudo apt-get install default-jdk
Installation Verification
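A simple way to verify the installation is to print the installed Java version:
$java -version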

Find the Java installation path to use for JAVA_HOME
$update-alternatives --config java

ADD A DEDICATED HADOOP USER AND GROUP


Create group and user
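One way to do this, using the hadoop group and hdpuser user that appear later in this guide:
$sudo addgroup hadoop
$sudo adduser --ingroup hadoop hdpuser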

Add the hdpuser user to the sudo group
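For example:
$sudo adduser hdpuser sudo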

INSTALL SSH AND CREATE CERTIFICATES


$sudo apt-get install ssh
$su hdpuser
Create ssh Key
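A common approach is to generate a passwordless RSA key and authorize it for logins to the local machine:
$ssh-keygen -t rsa -P "" -f ~/.ssh/id_rsa
$cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys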

Check if ssh works
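For example, logging in to the local machine should now work without a password:
$ssh localhost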

INSTALL HADOOP
$wget http://www-eu.apache.org/dist/hadoop/common/hadoop-2.7.3/hadoop-2.7.3.tar.gz
$tar -xvzf hadoop-2.7.3.tar.gz
$cd hadoop-2.7.3
$sudo mkdir -p /usr/local/hadoop
$sudo mv * /usr/local/hadoop
$sudo chown -R hdpuser:hadoop /usr/local/hadoop
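To confirm the files are in place, the Hadoop version can be printed using the full path (the bin directory is not on the PATH yet):
$/usr/local/hadoop/bin/hadoop version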

UPDATE HADOOP CONFIG FILES


The following files are going to be updated:

~/.bashrc
/usr/local/hadoop/etc/hadoop/hadoop-env.sh
/usr/local/hadoop/etc/hadoop/hdfs-site.xml
/usr/local/hadoop/etc/hadoop/core-site.xml
/usr/local/hadoop/etc/hadoop/mapred-site.xml (created from mapred-site.xml.template)

Add the following at the end of ~/.bashrc


#HADOOP VARIABLES START
export JAVA_HOME=/usr/lib/jvm/java-7-openjdk-amd64
export HADOOP_INSTALL=/usr/local/hadoop
export PATH=$PATH:$HADOOP_INSTALL/bin
export PATH=$PATH:$HADOOP_INSTALL/sbin
export HADOOP_MAPRED_HOME=$HADOOP_INSTALL
export HADOOP_COMMON_HOME=$HADOOP_INSTALL
export HADOOP_HDFS_HOME=$HADOOP_INSTALL
export YARN_HOME=$HADOOP_INSTALL
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_INSTALL/lib/native
export HADOOP_OPTS="-Djava.library.path=$HADOOP_INSTALL/lib"
#HADOOP VARIABLES END
$source ~/.bashrc

Update the hadoop-env.sh file with JAVA_HOME
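Assuming the same OpenJDK path used in ~/.bashrc above, the line in /usr/local/hadoop/etc/hadoop/hadoop-env.sh would read:
export JAVA_HOME=/usr/lib/jvm/java-7-openjdk-amd64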

Update hdfs-site.xml with the XML below, after creating the namenode and datanode directories it references
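These are the directories named in dfs.namenode.name.dir and dfs.datanode.data.dir below; one way to create them and assign them to the Hadoop user is:
$sudo mkdir -p /usr/local/hadoop_store/hdfs/namenode
$sudo mkdir -p /usr/local/hadoop_store/hdfs/datanode
$sudo chown -R hdpuser:hadoop /usr/local/hadoop_store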

<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
    <description>Default block replication.
      The actual number of replications can be specified when the file is created.
      The default is used if replication is not specified in create time.</description>
  </property>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>file:/usr/local/hadoop_store/hdfs/namenode</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>file:/usr/local/hadoop_store/hdfs/datanode</value>
  </property>
</configuration>

Update core-site.xml by placing the given XML between the <configuration> tags, after creating the
hadoop.tmp.dir directory it references
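The directory here is the hadoop.tmp.dir value below; for example:
$sudo mkdir -p /app/hadoop/tmp
$sudo chown -R hdpuser:hadoop /app/hadoop/tmp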

<property>
  <name>hadoop.tmp.dir</name>
  <value>/app/hadoop/tmp</value>
  <description>A base for other temporary directories.</description>
</property>
<property>
  <name>fs.default.name</name>
  <value>hdfs://localhost:54310</value>
  <description>The name of the default file system. A URI whose
    scheme and authority determine the FileSystem implementation. The
    uri's scheme determines the config property (fs.SCHEME.impl) naming
    the FileSystem implementation class. The uri's authority is used to
    determine the host, port, etc. for a filesystem.</description>
</property>

Create mapred-site.xml from mapred-site.xml.template and update it with the given xml.
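One way to create the file:
$cp /usr/local/hadoop/etc/hadoop/mapred-site.xml.template /usr/local/hadoop/etc/hadoop/mapred-site.xml
The property below then goes between its <configuration> tags.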
<property>
  <name>mapred.job.tracker</name>
  <value>localhost:54311</value>
  <description>The host and port that the MapReduce job tracker runs
    at. If "local", then jobs are run in-process as a single map
    and reduce task.
  </description>
</property>

FORMAT HADOOP FILE SYSTEM


$hadoop namenode -format
Sample output

START HADOOP
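With the Hadoop sbin directory on the PATH from ~/.bashrc, the daemons are typically started with the standard 2.7.x scripts; jps can then confirm they are running:
$start-dfs.sh
$start-yarn.sh
$jps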

STOP HADOOP
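They can be shut down with the matching scripts:
$stop-yarn.sh
$stop-dfs.sh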

REFERENCES
http://hadoop.apache.org/
http://www.scratchtoskills.com/install-hadoop-2-7-2-on-ubuntu-15-10-single-node-cluster/
