
Install Hadoop on Ubuntu OS

First, update all software:

chandu@chandu-Inspiron-N5010:~$ sudo apt-get -y update
chandu@chandu-Inspiron-N5010:~$ sudo apt-get -y upgrade


chandu@chandu-Inspiron-N5010:~$ sudo mkdir /usr/local/java
chandu@chandu-Inspiron-N5010:~$ sudo chmod 777 /usr/local/java
chandu@chandu-Inspiron-N5010:~$ cd Desktop
chandu@chandu-Inspiron-N5010:~/Desktop$ cp <jdk archive>.tar.gz /usr/local/java
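Next, extract the JDK archive inside /usr/local/java. The tarball name below is an assumption (the Linux x64 build of Oracle JDK 8u111, which unpacks to the jdk1.8.0_111 directory used for JAVA_HOME below); adjust it to match your download:

chandu@chandu-Inspiron-N5010:~/Desktop$ cd /usr/local/java
chandu@chandu-Inspiron-N5010:/usr/local/java$ sudo tar -xvzf jdk-8u111-linux-x64.tar.gz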

Install java

Set the Java environment variables system-wide in /etc/profile:
chandu@chandu-Inspiron-N5010:~$ sudo vim /etc/profile

JAVA_HOME=/usr/local/java/jdk1.8.0_111
JRE_HOME=$JAVA_HOME/jre
PATH=$PATH:$JAVA_HOME/bin:$JRE_HOME/bin
export JAVA_HOME
export JRE_HOME
export PATH

chandu@chandu-Inspiron-N5010:~$ sudo update-alternatives --install "/usr/bin/java" "java" "/usr/local/java/jdk1.8.0_111/jre/bin/java" 1
sudo update-alternatives --install "/usr/bin/javac" "javac" "/usr/local/java/jdk1.8.0_111/bin/javac" 1
sudo update-alternatives --set java /usr/local/java/jdk1.8.0_111/jre/bin/java
sudo update-alternatives --set javac /usr/local/java/jdk1.8.0_111/bin/javac
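To confirm which JDK the alternatives system now points to, you can optionally display the registered entries:

chandu@chandu-Inspiron-N5010:~$ update-alternatives --display java
chandu@chandu-Inspiron-N5010:~$ update-alternatives --display javac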

chandu@chandu-Inspiron-N5010:~$ . /etc/profile

chandu@chandu-Inspiron-N5010:~$ java -version

java version "1.8.0_111"


Java(TM) SE Runtime Environment (build 1.8.0_111-b14)
Java HotSpot(TM) 64-Bit Server VM (build 25.111-b14, mixed mode)

chandu@chandu-Inspiron-N5010:~$ jps
26824 Jps
chandu@chandu-Inspiron-N5010:~$ javac -version
javac 1.8.0_111
Installing Java automatically from the Ubuntu repositories (for example, the openjdk-8-jdk package):
chandu@chandu-Inspiron-N5010:~$ sudo apt-get install <java package name>
chandu@chandu-Inspiron-N5010:~$ java -version

openjdk version "1.8.0_111"


OpenJDK Runtime Environment (build 1.8.0_111-8u111-b14-2ubuntu0.16.10.2-b14)
OpenJDK 64-Bit Server VM (build 25.111-b14, mixed mode)

Path of Java (for the OpenJDK install)
chandu@chandu-Inspiron-N5010:~$ gedit ~/.bashrc
export JAVA_HOME=/usr/lib/jvm/java-1.8.0-openjdk-amd64
export PATH=$PATH:$JAVA_HOME/bin
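To double-check which binary java now resolves to (readlink follows the symlink chain from /usr/bin/java; the output should point into the JVM directory under /usr/lib/jvm):

chandu@chandu-Inspiron-N5010:~$ readlink -f $(which java)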

Create a separate user for Hadoop

chandu@chandu-Inspiron-N5010:~$ sudo su
root@chandu-Inspiron-N5010:/home/chandu# whoami
root

Create group
root@chandu-Inspiron-N5010:/home/chandu# addgroup hadoop
Create user
root@chandu-Inspiron-N5010:/home/chandu# adduser hduser
Add hduser to group hadoop
root@chandu-Inspiron-N5010:/home/chandu# adduser hduser hadoop
Add hduser to the sudoers list
root@chandu-Inspiron-N5010:/home/chandu# visudo
Add the following line under "# Allow members of group sudo to execute any command":
hduser ALL=(ALL) ALL
Press Ctrl+X, then Y and Enter to save (visudo opens nano by default on Ubuntu).
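To verify that hduser can now use sudo (sudo -l lists the commands a user is allowed to run):

root@chandu-Inspiron-N5010:/home/chandu# su - hduser
hduser@chandu-Inspiron-N5010:~$ sudo -l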

Log out of your session and log back in as hduser.


First, make sure that SSH is installed and a server is running. On Ubuntu, for example,
this is achieved with:
sudo apt-get install ssh

Install the SSH server on your computer:

hduser@chandu-Inspiron-N5010:~$ sudo apt-get install openssh-server
hduser@chandu-Inspiron-N5010:~$ sudo apt-get install vim

An SSH key is used to log in to another machine without typing a password.
To enable passwordless login, generate a new SSH key with an empty passphrase:

Generate key
hduser@chandu-Inspiron-N5010:~$ ssh-keygen -t rsa -P "" -f ~/.ssh/id_rsa
hduser@chandu-Inspiron-N5010:~$ cat $HOME/.ssh/id_rsa.pub >> $HOME/.ssh/authorized_keys
hduser@chandu-Inspiron-N5010:~$ chmod 700 $HOME/.ssh/authorized_keys
If the SSH connection fails, these general tips might help:
Enable debugging with ssh -vvv localhost and investigate the error in detail.
Check the SSH server configuration in /etc/ssh/sshd_config, in particular the options
PubkeyAuthentication (which should be set to yes) and AllowUsers (if this option is
active, add the hduser user to it). If you made any changes to the SSH server
configuration file, you can force a configuration reload with sudo /etc/init.d/ssh reload.

hduser@chandu-Inspiron-N5010:~$ sudo /etc/init.d/ssh restart


Test whether SSH is working properly by connecting to localhost:
hduser@chandu-Inspiron-N5010:~$ ssh localhost
If successful, you should not have to type in a password.

The following message should appear in the console:


The programs included with the Ubuntu system are free software;
the exact distribution terms for each program are described in the
individual files in /usr/share/doc/*/copyright.

Ubuntu comes with ABSOLUTELY NO WARRANTY, to the extent permitted by
applicable law.

To disable IPv6 on Ubuntu LTS, open /etc/sysctl.conf:


hduser@chandu-Inspiron-N5010:~$ sudo vim /etc/sysctl.conf
Press i (insert mode) and paste the following at the end of the file:
# disable ipv6
net.ipv6.conf.all.disable_ipv6=1
net.ipv6.conf.default.disable_ipv6=1
net.ipv6.conf.lo.disable_ipv6=1
Press Esc, then type :wq (write and quit).

hduser@chandu-Inspiron-N5010:~$ cat /proc/sys/net/ipv6/conf/all/disable_ipv6
0
Before the change takes effect this prints 0. Reboot your system; it will then show 1:
hduser@chandu-Inspiron-N5010:~$ cat /proc/sys/net/ipv6/conf/all/disable_ipv6
1
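(A reboot is not strictly required: sudo sysctl -p reloads /etc/sysctl.conf and applies the settings immediately.)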
Download Hadoop 2.7.3 from an Apache mirror:
http://www.apache.org/dyn/closer.cgi/hadoop/common/hadoop-2.7.3/hadoop-2.7.3.tar.gz

Hadoop Installation
hduser@chandu-Inspiron-N5010:~$ cd ~/Desktop
hduser@chandu-Inspiron-N5010:~/Desktop$ sudo mv hadoop-2.7.3.tar.gz /usr/local/
hduser@chandu-Inspiron-N5010:~$ cd /usr/local/
hduser@chandu-Inspiron-N5010:/usr/local/$ sudo tar -xvzf hadoop-2.7.3.tar.gz
hduser@chandu-Inspiron-N5010:/usr/local/$ sudo ln -s hadoop-2.7.3 hadoop
hduser@chandu-Inspiron-N5010:/usr/local/$ sudo chown -R hduser:hadoop hadoop-2.7.3
The symlink makes the installation available at /usr/local/hadoop.
Change permissions on the folder:
hduser@chandu-Inspiron-N5010:/usr/local/$ sudo chmod 777 hadoop-2.7.3
hduser@chandu-Inspiron-N5010:~$ cd /usr/local/hadoop/etc/hadoop/
hduser@chandu-Inspiron-N5010:/usr/local/hadoop/etc/hadoop$ vim hadoop-env.sh
Press i and set JAVA_HOME to the Java path (/usr/local/java/jdk1.8.0_111), adding the following lines:

export HADOOP_OPTS=-Djava.net.preferIPv4Stack=true
export HADOOP_HOME_WARN_SUPPRESS="TRUE"
export JAVA_HOME=/usr/local/java/jdk1.8.0_111
LD_LIBRARY_PATH=/usr/local/hadoop/lib/native/:$LD_LIBRARY_PATH

Press Esc, then :wq to save.
hduser@chandu-Inspiron-N5010:~$ vim ~/.bashrc

#HADOOP VARIABLES START


export HADOOP_HOME=/usr/local/hadoop
export PATH=$PATH:$HADOOP_HOME/bin
export PATH=$PATH:$HADOOP_HOME/sbin
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export YARN_HOME=$HADOOP_HOME
#HADOOP VARIABLES END

# Native Path
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib/native"

# Set JAVA_HOME (we will also configure JAVA_HOME directly for Hadoop later on)

export JAVA_HOME=/usr/local/java/jdk1.8.0_111

# Some convenient aliases and functions for running Hadoop-related commands

unalias fs &> /dev/null
alias fs="hadoop fs"
unalias hls &> /dev/null
alias hls="fs -ls"

# If you have LZO compression enabled in your Hadoop cluster and


# compress job outputs with LZOP (not covered in this tutorial):
# Conveniently inspect an LZOP compressed file from the command
# line; run via:
#
# $ lzohead /hdfs/path/to/lzop/compressed/file.lzo
#
# Requires installed 'lzop' command.
#
#lzohead () {
# hadoop fs -cat $1 | lzop -dc | head -1000 | less
#}
Close the terminal and reopen it so the new variables take effect (or run source ~/.bashrc).
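To confirm the Hadoop binaries are now on the PATH (assuming the variables above have been loaded):

hduser@laptop:~$ hadoop version
The first line of the output should read "Hadoop 2.7.3".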

hduser@laptop:~$ sudo mkdir -p /app/hadoop/tmp


hduser@laptop:~$ sudo chown hduser:hadoop /app/hadoop/tmp
hduser@laptop:~$ sudo chmod -R 777 /app/hadoop/tmp
hduser@laptop:~$ cd /usr/local/hadoop/etc/hadoop/

vim yarn-site.xml (add the following properties inside the <configuration> element)

<property>
<name>yarn.resourcemanager.hostname</name>
<value>localhost</value>
</property>

<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>

<property>
<name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>

hduser@laptop:~$ cd /usr/local/hadoop/etc/hadoop/

vim core-site.xml (again, the properties go inside the <configuration> element)

<property>
<name>hadoop.tmp.dir</name>
<value>/app/hadoop/tmp</value>
<description>A base for other temporary directories.</description>
</property>

<property>
<name>fs.default.name</name>
<value>hdfs://localhost:53410</value>
<description>The name of the default file system. A URI whose
scheme and authority determine the FileSystem implementation. The
uri's scheme determines the config property (fs.SCHEME.impl) naming
the FileSystem implementation class. The uri's authority is used to
determine the host, port, etc. for a filesystem.</description>
</property>

hduser@laptop:~$ cd /usr/local/hadoop/etc/hadoop/
cp mapred-site.xml.template mapred-site.xml
vim mapred-site.xml

<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
<property>
<name>mapred.job.tracker</name>
<value>localhost:54311</value>
<description>The host and port that the MapReduce job
tracker runs at. If "local", then jobs are run in-process as a
single map and reduce task.
</description>
</property>

sudo mkdir -p /usr/local/hadoop_store/hdfs/namenode


sudo mkdir -p /usr/local/hadoop_store/hdfs/datanode
sudo chown -R hduser:hadoop /usr/local/hadoop_store

hduser@laptop:~$ cd /usr/local/hadoop/etc/hadoop/

vim hdfs-site.xml

<property>
<name>dfs.replication</name>
<value>1</value>
<description>Default block replication.
The actual number of replications can be specified when the file is created.
The default is used if replication is not specified in create time.
</description>
</property>

<property>
<name>dfs.namenode.name.dir</name>
<value>file:/usr/local/hadoop_store/hdfs/namenode</value>
</property>

<property>
<name>dfs.datanode.data.dir</name>
<value>file:/usr/local/hadoop_store/hdfs/datanode</value>
</property>

Formatting the HDFS filesystem


Before HDFS can be used for the first time, the filesystem must be formatted. This is
done by running the following command:

hadoop namenode -format
(In Hadoop 2.x this command is deprecated; the current form is hdfs namenode -format.)


Starting and stopping the daemons
To start the HDFS, YARN, and MapReduce daemons, type:

start-dfs.sh
start-yarn.sh
mr-jobhistory-daemon.sh start historyserver
jps
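If everything started, jps should list all the daemons, similar to the following (the process IDs will differ on your machine):

12075 NameNode
12211 DataNode
12400 SecondaryNameNode
12659 ResourceManager
12785 NodeManager
13089 JobHistoryServer
13180 Jps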

Stopping the daemons is done as follows:

mr-jobhistory-daemon.sh stop historyserver


stop-yarn.sh
stop-dfs.sh

Creating a user directory


Create a home directory for yourself by running the following:
hadoop fs -mkdir -p /user/$USER
Equivalently, since $USER here is hduser:
hdfs dfs -mkdir -p /user/hduser
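Verify that the directory exists in HDFS:
hadoop fs -ls /user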

Web UIs:
NameNode                http://localhost:50070
ResourceManager         http://localhost:8088
MapReduce JobHistory    http://localhost:19888

Pig installation
hduser@chandu-Inspiron-N5010:~/Desktop$ sudo mv pig-0.16.0.tar.gz /usr/local/
hduser@chandu-Inspiron-N5010:~$ cd /usr/local/
hduser@chandu-Inspiron-N5010:/usr/local/$ sudo tar -xvzf pig-0.16.0.tar.gz
hduser@chandu-Inspiron-N5010:/usr/local/$ sudo ln -s pig-0.16.0 pig

Add pig path


chandu@chandu-Inspiron-N5010:~$ nano ~/.bashrc

export PIG_HOME=/usr/local/pig
export PIG_CONF_DIR=$PIG_HOME/conf
export PIG_CLASSPATH=$PIG_CONF_DIR
export PATH=$PIG_HOME/bin:$PATH
(Note: there must be no spaces around = in shell variable assignments.)
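Reload ~/.bashrc and check the installed version with pig -version (pig -x local would start the Grunt shell in local mode, which does not need the Hadoop daemons):

hduser@chandu-Inspiron-N5010:~$ source ~/.bashrc
hduser@chandu-Inspiron-N5010:~$ pig -version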

Hive installation
hduser@chandu-Inspiron-N5010:~/Desktop$ sudo mv apache-hive-2.1.0-bin.tar.gz /usr/local/
hduser@chandu-Inspiron-N5010:~$ cd /usr/local/
hduser@chandu-Inspiron-N5010:/usr/local/$ sudo tar -xvzf apache-hive-2.1.0-bin.tar.gz
hduser@chandu-Inspiron-N5010:/usr/local/$ sudo ln -s apache-hive-2.1.0-bin hive
Add hive path (again in ~/.bashrc):
export HIVE_HOME=/usr/local/hive
export PATH=$PATH:$HIVE_HOME/bin

With all Hadoop daemons running, create the Hive warehouse folders:
hdfs dfs -mkdir -p /user/hive/warehouse
hdfs dfs -mkdir -p /tmp/hive
hdfs dfs -chmod 777 /tmp
hdfs dfs -chmod 777 /user/hive/warehouse
hdfs dfs -chmod 777 /tmp/hive

If you get an SLF4J "multiple bindings" warning when starting Hive:

Delete the duplicate log4j-slf4j-impl.jar from Hive's lib directory, since Hadoop
already provides the same binding and we will use the Hadoop-supplied jar:
sudo rm /usr/local/hive/lib/log4j-slf4j-impl.jar
Initialize the metastore database to be used with Hive.
From version 2.1.0 onward we have to tell the schema tool which type of database
we are using:

Change directory: cd /usr/local

schematool -initSchema -dbType derby
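If initialization succeeds, a quick sanity check is to open the Hive shell and list databases (this assumes the daemons and warehouse directories above are in place):

hduser@laptop:~$ hive
hive> show databases;
OK
default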
