
Hadoop Multi-Node Setup in Google Compute Engine

Setting Up Local Access to GCE

Add your local machine's SSH key to the GCE metadata

1. On your local machine, generate an SSH key:

$ sudo apt-get install ssh
$ ssh-keygen -t rsa -P ""
$ cat $HOME/.ssh/id_rsa.pub

2. Copy the output of the last command and replace yourusername@yourhostname at the end with hadoop@yourhostname (a sample entry is shown after this list).

3. Paste the modified key into the Google Developers Console (Compute Engine -> Metadata -> SSH Keys -> Edit).


4. From your local machine you should now be able to reach each cloud VM with ssh hadoop@<external ip>.
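For reference, the entry pasted into the SSH Keys box ends up looking roughly like the line below (key material truncated here); the console derives the login user from the trailing hadoop@yourhostname comment, and once the key has propagated you can log in directly:

ssh-rsa AAAAB3NzaC1yc2E... hadoop@yourhostname

$ ssh hadoop@<masternode external ip>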

Add a firewall rule to the default network:

Name                    Source IP Range   Protocols & Ports
default-allow-external  0.0.0.0/0         tcp:1-65535; udp:1-65535; icmp
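If you prefer the command line to the console, a rule equivalent to the one above can also be created with gcloud. This is only a sketch; flag spellings can differ between gcloud releases:

$ gcloud compute firewall-rules create default-allow-external \
    --network default \
    --allow tcp:1-65535,udp:1-65535,icmp \
    --source-ranges 0.0.0.0/0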

Hadoop Cluster Information (one master, two slaves)

Name        Image                      Roles
MasterNode  debian-7-wheezy-v20140619  NameNode, Secondary NameNode, JobTracker
SlaveNode1  debian-7-wheezy-v20140619  DataNode, TaskTracker
SlaveNode2  debian-7-wheezy-v20140619  DataNode, TaskTracker
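The three VMs themselves can be created from the console or, as a rough sketch, with gcloud. The zone here is only an example, and depending on the gcloud version the image may need to be referenced differently:

$ gcloud compute instances create masternode slavenode1 slavenode2 \
    --image debian-7-wheezy-v20140619 --image-project debian-cloud \
    --zone us-central1-a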

Setting Up JDK 1.7 (on all machines)

Download jdk-7u51-linux-x64.tar.gz from the Oracle site, then:

$ tar xzf jdk-7u51-linux-x64.tar.gz
$ sudo mv jdk1.7.0_51 /opt/
$ vim .bashrc
export JAVA_HOME=/opt/jdk1.7.0_51
export PATH=$PATH:$HOME/bin:$JAVA_HOME/bin
$ source .bashrc
$ java -version
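If JAVA_HOME and PATH are picked up correctly, java -version should report the freshly installed JDK along these lines (exact build strings may differ):

java version "1.7.0_51"
Java(TM) SE Runtime Environment (build 1.7.0_51-b13)
Java HotSpot(TM) 64-Bit Server VM (build 24.51-b03, mixed mode)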

 

Setting Up the Hadoop Package (on all machines)

$ wget http://www.bizdirusa.com/mirrors/apache/hadoop/common/stable1/hadoop-1.2.1.tar.gz
$ tar xzf hadoop-1.2.1.tar.gz
$ cd hadoop-1.2.1

conf/hadoop-env.sh :

Uncomment the export JAVA_HOME= line and set it to:

export JAVA_HOME=/opt/jdk1.7.0_51
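To avoid editing the file by hand on every machine, the same change can be scripted with a one-liner like this (a sketch that assumes the stock commented-out JAVA_HOME line shipped with Hadoop 1.2.1):

$ sed -i 's|^# export JAVA_HOME=.*|export JAVA_HOME=/opt/jdk1.7.0_51|' conf/hadoop-env.sh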

conf/core-site.xml:

<configuration>
 <property>
 <name>fs.default.name</name>
 <value>hdfs://<Master IP>:9000</value>
 </property>
</configuration>

conf/hdfs-site.xml:

<configuration>
 <property>
 <name>dfs.replication</name>
 <value>1</value>
 </property>
 <property>
 <name>dfs.name.dir</name>
 <value>/home/hadoop/yarn/yarn_data/hdfs/namenode</value>
 </property>
 <property>
 <name>dfs.data.dir</name>
 <value>/home/hadoop/yarn/yarn_data/hdfs/datanode</value>
 </property>
</configuration>
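Creating the dfs.name.dir and dfs.data.dir directories referenced above up front, as the hadoop user, avoids permission surprises when the daemons start. Strictly, the master only needs the namenode directory and the slaves only the datanode directory, but creating both everywhere does no harm:

$ mkdir -p /home/hadoop/yarn/yarn_data/hdfs/namenode
$ mkdir -p /home/hadoop/yarn/yarn_data/hdfs/datanode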

conf/mapred-site.xml:

<configuration>
 <property>
 <name>mapred.job.tracker</name>
 <value><Master IP>:9001</value>
 </property>
</configuration>

conf/masters:

<masternode ip>

conf/slaves:

<slavenode1 ip>
<slavenode2 ip>

/etc/hosts: (also on your local machine)

<masternode ip>        master
<slavenode1 ip>        slave1
<slavenode2 ip>        slave2

Set Up Passwordless SSH

On Master

$ sudo apt-get install ssh
$ ssh-keygen -t rsa -P ""
$ cat $HOME/.ssh/id_rsa.pub >> $HOME/.ssh/authorized_keys
$ ssh localhost
$ exit

Copy id_rsa.pub from the master into authorized_keys on all machines (both files live in /home/hadoop/.ssh); one way to do this is sketched below. Make sure you can log in to all the slaves from the master without a password.
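One convenient route on GCE is to add the master's id_rsa.pub to the project SSH Keys metadata exactly as was done for the local machine, which pushes it to every VM; alternatively, where some other login to the slaves already works, ssh-copy-id does the append in one step. Either way, verify the passwordless login from the master:

$ ssh-copy-id hadoop@slave1
$ ssh-copy-id hadoop@slave2
$ ssh slave1
$ exit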

Execution

Format the Hadoop filesystem (on the master):

$ bin/hadoop namenode -format

Start Hadoop (from the master):

$ bin/start-all.sh

To verify that all Hadoop processes are running (on all machines):

$ jps

If the installation is correct, the following daemons should be running:

Name        Daemons
MasterNode  NameNode, SecondaryNameNode, JobTracker
SlaveNode1  DataNode, TaskTracker
SlaveNode2  DataNode, TaskTracker
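For example, on the master jps should print something like the following (the process IDs will of course differ):

$ jps
2912 NameNode
3082 SecondaryNameNode
3194 JobTracker
3321 Jps

and on each slave:

$ jps
2541 DataNode
2687 TaskTracker
2770 Jps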

Create an input folder and a word-count input file:

$ mkdir input
$ cd input
$ vim file

file:

Apache Hadoop is an open-source software framework for storage and large-scale processing of data-sets on clusters of commodity hardware

Upload the input folder to HDFS:

$ bin/hadoop dfs -copyFromLocal input /input
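To confirm the upload, list the directory in HDFS:

$ bin/hadoop dfs -ls /input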

Run the sample word-count program:

$ bin/hadoop jar hadoop-examples-1.2.1.jar wordcount /input /output
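When the job finishes, the word counts land in /output; listing the directory shows the part files produced, and (assuming the usual part-r-00000 name for a single reducer) the result can be printed directly:

$ bin/hadoop dfs -ls /output
$ bin/hadoop dfs -cat /output/part-r-00000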

To take a look at the Hadoop logs:

$ ls -altr /home/hadoop/hadoop-1.2.1/logs/

To stop Hadoop:

$ bin/stop-all.sh

Web UI for Hadoop NameNode: http://<masternode ip>:50070/

Web UI for Hadoop JobTracker: http://<masternode ip>:50030/
