
In the instructions below, replace XYZ with your group name and abc with your userid.
Install Java
Install Java to /gscratch/XYZ/abc/java
Get Hadoop
mkdir /gscratch/XYZ/abc/hadoop
cp /sw/contrib/hadoop/hadoop-1.2.1.tar.gz /gscratch/XYZ/abc/hadoop
cd /gscratch/XYZ/abc/hadoop
tar -xvf hadoop-1.2.1.tar.gz
cd /gscratch/XYZ/abc/hadoop/hadoop-1.2.1
Configure .bashrc
Add the following lines to your .bashrc (for Hadoop 1.2.1):
export JAVA_HOME=/gscratch/XYZ/abc/java
export HADOOP_INSTALL=/gscratch/XYZ/abc/hadoop/hadoop-1.2.1
export PATH=$PATH:$HADOOP_INSTALL/bin
export HADOOP_VERSION=1.2.1
export HADOOP_HOME=/gscratch/XYZ/abc/hadoop/hadoop-1.2.1
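After editing .bashrc, the settings can be checked in a fresh shell. A minimal sketch, using the placeholder paths from above (XYZ/abc are the same placeholders, not real values):

```shell
# Sketch: set the variables as in .bashrc above and verify that the
# Hadoop bin directory ended up on PATH.
export JAVA_HOME=/gscratch/XYZ/abc/java
export HADOOP_INSTALL=/gscratch/XYZ/abc/hadoop/hadoop-1.2.1
export PATH=$PATH:$HADOOP_INSTALL/bin
export HADOOP_VERSION=1.2.1
export HADOOP_HOME=$HADOOP_INSTALL
echo "$HADOOP_HOME"   # should print the 1.2.1 install directory
case ":$PATH:" in *":$HADOOP_INSTALL/bin:"*) echo "PATH ok";; esac
```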
Get nodes for the Hadoop cluster
For convenience, use a fixed master node. (If the master node varies, you must update the core-site.xml, mapred-site.xml, and hdfs-site.xml files every time, as shown below in the section "Configure Hadoop XML files".)
Issue the command below to get the slave nodes. Change the variables nodes, ppn, and walltime (in hours) as appropriate.
qsub -I -l nodes=4:ppn=16,walltime=4:00:00
Delete old HDFS files
If you have run Hadoop before, ssh to each node listed in the output of the above command and issue the following two commands:
rm -rf /scr/XYZ/abc/HDFS
mkdir /scr/XYZ/abc/HDFS
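Instead of ssh-ing to each node by hand, the cleanup can be looped over the node list. A sketch, assuming passwordless ssh between nodes and that $PBS_NODEFILE is set inside the interactive job; shown here as a dry run against a stand-in node list (drop the leading echo to actually execute):

```shell
# Dry run: print the cleanup command once per unique node.
# /tmp/nodes.$$ stands in for $PBS_NODEFILE; inside the job use the real file.
NODEFILE=/tmp/nodes.$$
printf 'n0001\nn0001\nn0002\n' > "$NODEFILE"
for node in $(uniq "$NODEFILE"); do
  echo ssh "$node" 'rm -rf /scr/XYZ/abc/HDFS && mkdir /scr/XYZ/abc/HDFS'
done
rm -f "$NODEFILE"
```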
Configure Hadoop slaves and masters files
cd /gscratch/XYZ/abc/hadoop/hadoop-1.2.1/conf
cat $PBS_NODEFILE | uniq > slaves
Edit the file masters and replace localhost with the name of the master node (the first hostname in $PBS_NODEFILE).
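Both conf files can also be generated directly from the node list. A sketch using a stand-in node list in place of $PBS_NODEFILE:

```shell
# Sketch: derive slaves and masters from a node list.
cd "$(mktemp -d)"
printf 'n0001\nn0001\nn0002\nn0003\n' > nodefile   # stand-in for $PBS_NODEFILE
uniq nodefile > slaves          # one line per node, as in the step above
head -n 1 slaves > masters      # first hostname in the list is the master node
cat masters   # -> n0001
cat slaves
```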
Configure Hadoop XML files
Copy the sample Hadoop XML configuration files:
cp /sw/contrib/hadoop/sampleconf/core-site.xml /gscratch/XYZ/abc/hadoop/hadoop-1.2.1/conf
cp /sw/contrib/hadoop/sampleconf/mapred-site.xml /gscratch/XYZ/abc/hadoop/hadoop-1.2.1/conf
cp /sw/contrib/hadoop/sampleconf/hdfs-site.xml /gscratch/XYZ/abc/hadoop/hadoop-1.2.1/conf
Edit the sample XML configuration files:
cd /gscratch/XYZ/abc/hadoop/hadoop-1.2.1/conf
edit the core-site.xml, mapred-site.xml and hdfs-site.xml files:
(1) replace n0001 with your master node name
(2) replace XYZ with your group name
(3) replace abc with your userid
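The three substitutions can also be scripted with sed instead of a hand edit. A sketch against a stand-in config line (the property values in the real sample files may differ); run the sed line in the conf directory with your own values:

```shell
# Stand-in config line; the sample XML files in conf/ are the real targets.
cd "$(mktemp -d)"
printf '<value>hdfs://n0001:9000</value> /gscratch/XYZ/abc\n' > core-site.xml
MASTER=n0042 GROUP=mygroup USERID=myuser   # example values, use your own
sed -i "s/n0001/$MASTER/g; s/XYZ/$GROUP/g; s/abc/$USERID/g" \
    core-site.xml   # add mapred-site.xml hdfs-site.xml in the real conf dir
cat core-site.xml   # -> <value>hdfs://n0042:9000</value> /gscratch/mygroup/myuser
```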
Start Hadoop and run a wordcount mapreduce job
cd /gscratch/XYZ/abc/hadoop/hadoop-1.2.1
hadoop namenode -format
start-all.sh
hadoop fs -mkdir /abctemp
hadoop fs -put README.txt /abctemp
hadoop jar hadoop-examples-1.2.1.jar wordcount /abctemp /abctemp_out
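What the wordcount job computes can be previewed with a plain shell pipeline (a local analogue, not Hadoop): one count per distinct word, which is the same tally the MapReduce job produces for its input files.

```shell
# Local analogue of wordcount: split into words, count each distinct word.
# Counts here: 2 to, 2 be, 1 or, 1 not (tie order may vary).
printf 'to be or not to be\n' | tr ' ' '\n' | sort | uniq -c | sort -rn
```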
Stop Hadoop
stop-all.sh
Start Hadoop and run a terasort mapreduce job
cd /gscratch/XYZ/abc/hadoop/hadoop-1.2.1
hadoop namenode -format
start-all.sh
hadoop jar hadoop-examples-1.2.1.jar teragen 1000 /abc_teragen_out
hadoop jar hadoop-examples-1.2.1.jar terasort /abc_teragen_out /abc_terasort_out
hadoop jar hadoop-examples-1.2.1.jar teravalidate /abc_terasort_out /abc_teravalidate_out
(In the teragen command, 1000 specifies the number of rows of input data; each row contains 100 bytes.)
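Since each teragen row is 100 bytes, the row count for a larger target size is simple arithmetic. A sketch for roughly 1 GB of input:

```shell
# 100 bytes per row -> rows needed for ~1 GB of teragen input.
target_bytes=1000000000
echo $((target_bytes / 100))   # -> 10000000
```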
Stop Hadoop
stop-all.sh
Further reading:
See the link below for more details:
https://hadoop.apache.org/docs/r1.2.1/cluster_setup.html
