Hadoop 1.0.2 on Ubuntu Server 10.04.4 LTS

I added 5 nodes to my Hadoop cluster and learned a lot more along the way. I decided to update the whole fleet to Hadoop 1.0.2 and fix a few other configuration items on the cluster.

On the base 64-bit headless server install of Ubuntu 10.04.4 LTS, I included the OpenSSH server package. Starting from that install, I ran the following on all the new nodes.

sudo adduser hadoop

sudo apt-get update
sudo apt-get upgrade
sudo apt-get install python-software-properties
sudo add-apt-repository ppa:sun-java-community-team/sun-java6
sudo apt-get update
sudo apt-get install sun-java6-jre sun-java6-bin
wget http://ftp.wayne.edu/apache/hadoop/common/hadoop-1.0.2/hadoop-1.0.2.tar.gz

From here I set up passphrase-less SSH using one set of keys shared by all the nodes. I extracted the tarball (tar xvf) into /usr/local and made a symlink in there, hadoop -> hadoop-1.0.2.
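Here is a rough sketch of those two steps as shell functions. The function names and arguments are made up for illustration, and the post does not say which key type was used, so RSA is an assumption. Run the install step as root when the prefix is /usr/local.

```shell
#!/bin/sh
# Sketch only: setup_ssh_key and install_hadoop are hypothetical helper names.

# Create a key pair with an empty passphrase and authorize it for this user.
# Copying the same key files to every node gives "one set of keys".
# RSA is an assumption; the post does not name the key type.
setup_ssh_key() {
    ssh_dir="${1:-$HOME/.ssh}"
    mkdir -p "$ssh_dir" && chmod 700 "$ssh_dir"
    [ -f "$ssh_dir/id_rsa" ] || ssh-keygen -q -t rsa -N "" -f "$ssh_dir/id_rsa"
    cat "$ssh_dir/id_rsa.pub" >> "$ssh_dir/authorized_keys"
    chmod 600 "$ssh_dir/authorized_keys"
}

# Unpack the tarball under a prefix and point a version-neutral
# symlink (hadoop -> hadoop-1.0.2) at the unpacked directory.
install_hadoop() {
    tarball="$1"
    prefix="${2:-/usr/local}"
    version_dir="${3:-hadoop-1.0.2}"
    tar xzf "$tarball" -C "$prefix" &&
        ln -sfn "$version_dir" "$prefix/hadoop"
}

# Example, as root on a node:
#   install_hadoop hadoop-1.0.2.tar.gz /usr/local
```

The relative symlink means a later upgrade only needs a new directory next to the old one and a repointed link.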

I added the line ‘export HADOOP_INSTALL=/usr/local’ to /etc/profile and gave the hadoop user ownership of /usr/local/hadoop-1.0.2.
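Since this step gets repeated on every node, it helps to make the /etc/profile change safe to run twice. The helper below is a hypothetical sketch, not something from my original setup. It appends the line only when it is missing.

```shell
#!/bin/sh
# Sketch only: add_profile_line is a hypothetical helper name.

# Append a line to a profile file unless that exact line is already present.
add_profile_line() {
    profile="$1"
    line="$2"
    grep -qxF "$line" "$profile" 2>/dev/null ||
        printf '%s\n' "$line" >> "$profile"
}

# Example, as root on a node:
#   add_profile_line /etc/profile 'export HADOOP_INSTALL=/usr/local'
#   chown -R hadoop /usr/local/hadoop-1.0.2
```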

I read more of the Hadoop book and followed its practice of creating /var/hadoop_tmp and /var/hdfs on each Hadoop node to use as its temp and data space. The 3 main XML configuration files (core-site.xml, hdfs-site.xml, and mapred-site.xml) are identical on all 10 data nodes. The namenode has the same 3 XML files, but its masters and slaves files contain the actual node names; all the datanodes have blank masters and slaves files. Setting up this Hadoop cluster has been fun, and there is more work to do.
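For reference, the per-node directory layout could be scripted roughly like this. Treat it as a sketch under assumptions: the function name and its scratch-root argument are hypothetical, and while hadoop.tmp.dir and dfs.data.dir are standard Hadoop 1.x property names, they are a guess at which properties point at these two directories. The files written here are minimal and would replace existing ones, and a working cluster needs more settings (fs.default.name in core-site.xml, for one).

```shell
#!/bin/sh
# Sketch only: write_node_config is a hypothetical helper name.
# Pass a scratch directory as $1 for a dry run; leave it empty on a real node.
write_node_config() {
    root="${1:-}"
    conf="$root/usr/local/hadoop/conf"
    mkdir -p "$root/var/hadoop_tmp" "$root/var/hdfs" "$conf"

    # Assumption: temp space is wired up through hadoop.tmp.dir.
    cat > "$conf/core-site.xml" <<EOF
<?xml version="1.0"?>
<configuration>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/var/hadoop_tmp</value>
  </property>
</configuration>
EOF

    # Assumption: datanode block storage is wired up through dfs.data.dir.
    cat > "$conf/hdfs-site.xml" <<EOF
<?xml version="1.0"?>
<configuration>
  <property>
    <name>dfs.data.dir</name>
    <value>/var/hdfs</value>
  </property>
</configuration>
EOF
}

# Example dry run:
#   write_node_config /tmp/hadoop-dryrun
```

On a real node the two /var directories also need to be owned by the hadoop user.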


