Setting Up GCE Local Access
Add your local machine's SSH key to the GCE metadata:
1. On your local machine, generate an SSH key:
$ sudo apt-get install ssh
$ ssh-keygen -t rsa -P ""
$ cat $HOME/.ssh/id_rsa.pub
2. Copy the contents of id_rsa.pub and replace yourusername@yourhostname at the end of the key with hadoop@yourhostname.
3. Paste the modified key into the Google Developers Console (Compute Engine -> Metadata -> SSH Keys -> Edit).
4. From your local machine you should now be able to access the cloud VMs using ssh hadoop@<instance ip>.
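For reference, the pasted entry ends up looking something like the line below (key body truncated and purely illustrative); the hadoop@yourhostname comment is what maps the key to the hadoop login user, per step 2 above:

ssh-rsa AAAAB3NzaC1yc2EAAA... hadoop@yourhostname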
Add Firewall Rules in the default network:

Name | Source IP Range | Protocols & Ports |
---|---|---|
default-allow-external | 0.0.0.0/0 | tcp:1-65535; udp:1-65535; icmp |
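As an alternative to the web console, a rule like the one above can be created with the gcloud CLI (a sketch, assuming gcloud is installed and authenticated; the rule name and ranges mirror the table):

$ gcloud compute firewall-rules create default-allow-external \
    --network default \
    --source-ranges 0.0.0.0/0 \
    --allow tcp:1-65535,udp:1-65535,icmp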
Hadoop Cluster Information (3-node cluster: 1 master, 2 slaves)
Name | Image | Roles |
---|---|---|
MasterNode | debian-7-wheezy-v20140619 | NameNode, SecondaryNameNode, JobTracker |
SlaveNode1 | debian-7-wheezy-v20140619 | DataNode, TaskTracker |
SlaveNode2 | debian-7-wheezy-v20140619 | DataNode, TaskTracker |
Setting Up JDK 1.7 (on all machines)
$ tar xzf jdk-7u51-linux-x64.tar.gz
$ sudo mv jdk1.7.0_51 /opt/
$ vim .bashrc

Add the following lines to .bashrc:

export JAVA_HOME=/opt/jdk1.7.0_51
export PATH=$PATH:$HOME/bin:$JAVA_HOME/bin

$ source .bashrc
$ java -version
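If .bashrc was sourced correctly, java -version should report 1.7.0_51; the output looks roughly like this (exact build numbers may differ):

java version "1.7.0_51"
Java(TM) SE Runtime Environment (build 1.7.0_51-b13)
Java HotSpot(TM) 64-Bit Server VM (build 24.51-b03, mixed mode)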
Setting Up the Hadoop Package (on all machines)
$ wget http://www.bizdirusa.com/mirrors/apache/hadoop/common/stable1/hadoop-1.2.1.tar.gz
$ tar xzf hadoop-1.2.1.tar.gz
$ cd hadoop-1.2.1
conf/hadoop-env.sh :
Uncomment the export JAVA_HOME= line and set it to:
export JAVA_HOME=/opt/jdk1.7.0_51
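If you prefer a non-interactive edit, a one-liner like this works, assuming the stock commented-out line is still present in conf/hadoop-env.sh:

$ sed -i 's|^# export JAVA_HOME=.*|export JAVA_HOME=/opt/jdk1.7.0_51|' conf/hadoop-env.sh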
conf/core-site.xml:
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://<Master IP>:9000</value>
  </property>
</configuration>
conf/hdfs-site.xml:
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
  <property>
    <name>dfs.name.dir</name>
    <value>/home/hadoop/yarn/yarn_data/hdfs/namenode</value>
  </property>
  <property>
    <name>dfs.data.dir</name>
    <value>/home/hadoop/yarn/yarn_data/hdfs/datanode</value>
  </property>
</configuration>
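The dfs.name.dir and dfs.data.dir paths above need to exist with permissions that let the hadoop user write to them; creating them up front on each node avoids startup failures:

$ mkdir -p /home/hadoop/yarn/yarn_data/hdfs/namenode
$ mkdir -p /home/hadoop/yarn/yarn_data/hdfs/datanode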
conf/mapred-site.xml:
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value><Master IP>:9001</value>
  </property>
</configuration>
conf/masters:
<masternode ip>
conf/slaves:
<slavenode1 ip>
<slavenode2 ip>
/etc/hosts: (also on your local machine)
<masternode ip> master
<slavenode1 ip> slave1
<slavenode2 ip> slave2
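Filled in with (hypothetical) internal IPs, the file would look like:

10.240.0.2 master
10.240.0.3 slave1
10.240.0.4 slave2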
Setup Passwordless SSH
On Master:
$ sudo apt-get install ssh
$ ssh-keygen -t rsa -P ""
$ cat $HOME/.ssh/id_rsa.pub >> $HOME/.ssh/authorized_keys
$ ssh localhost
$ exit
Copy id_rsa.pub from the master into authorized_keys on all machines (both files live in /home/hadoop/.ssh). Make sure you can log in to all the slaves without a password, for example as shown below.
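A minimal sketch of that copy step, run from the master and assuming SSH access to the slaves already works (hostnames come from the /etc/hosts entries above):

$ cat $HOME/.ssh/id_rsa.pub | ssh hadoop@slave1 'mkdir -p ~/.ssh && cat >> ~/.ssh/authorized_keys'
$ cat $HOME/.ssh/id_rsa.pub | ssh hadoop@slave2 'mkdir -p ~/.ssh && cat >> ~/.ssh/authorized_keys'
$ ssh hadoop@slave1 exit
$ ssh hadoop@slave2 exit

The last two commands should return without prompting for a password.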
Execution
Format the Hadoop filesystem (run once, on the master):
$ bin/hadoop namenode -format
Start Hadoop
$ bin/start-all.sh
To verify that all Hadoop processes are running (on all machines):
$ jps
When the installation is correct, the following daemons should be running:
Name | Daemons |
---|---|
MasterNode | NameNode, SecondaryNameNode, JobTracker |
SlaveNode1 | DataNode, TaskTracker |
SlaveNode2 | DataNode, TaskTracker |
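For example, on the master jps should list something like the following (process IDs are illustrative):

$ jps
2384 NameNode
2547 SecondaryNameNode
2642 JobTracker
2811 Jps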
Create an input folder and a word-count input file:
$ mkdir input
$ cd input
$ vim file
$ cd ..
file:
Apache Hadoop is an open-source software framework for storage and large-scale processing of data-sets on clusters of commodity hardware
Upload the input folder to HDFS:
$ bin/hadoop dfs -copyFromLocal input /input
Run the sample word-count program:
$ bin/hadoop jar hadoop-examples-1.2.1.jar wordcount /input /output
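Once the job completes, the result can be read straight from HDFS; the part-* pattern matches the reducer output files:

$ bin/hadoop dfs -cat /output/part-*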
To take a look at the Hadoop logs:
$ ls -altr /home/hadoop/hadoop-1.2.1/logs/
To stop Hadoop:
$ bin/stop-all.sh
Web UI for Hadoop NameNode: http://<masternode ip>:50070/
Web UI for Hadoop JobTracker: http://<masternode ip>:50030/