Philip D. Coyne

Hadoop

What I am trying to do: experiment with Hadoop to analyze large quantities of data for scientific and text-mining applications. I will start by installing and configuring a single-node Hadoop setup, then load the needed data, using Pig and Hive to transform it and load it onto HDFS.

Single Node

  • Prerequisite software tools
  • OS: Ubuntu 14.04.1 on a 64-bit machine with 16GB RAM
  • JVM 1.7 (OpenJDK 7, available for Ubuntu 14.04.1)
  • Hadoop 2.6.0 download (hadoop-2.6.0.tar.gz)
  • WinSCP, to copy files from Windows to the Ubuntu environment
  • PuTTY on Windows, or TotalTerminal on Mac OS
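Before starting, it helps to confirm on the Ubuntu machine that the command-line pieces are actually there. This is a minimal sketch; `have_cmd` and `check_prereqs` are my own hypothetical helper names, and the default tool list (`java`, `tar`) is an assumption you can extend.

```shell
#!/usr/bin/env bash
# Sketch: report whether the tools used on this page are on PATH.
# The default list (java, tar) is an assumption; pass your own to extend it.

# have_cmd NAME: succeeds if NAME is found on PATH.
have_cmd() {
  command -v "$1" >/dev/null 2>&1
}

# check_prereqs [CMD...]: print found/missing per command; return the count missing.
check_prereqs() {
  if [ "$#" -eq 0 ]; then
    set -- java tar   # default list, replaces this function's arguments
  fi
  local missing=0 cmd
  for cmd in "$@"; do
    if have_cmd "$cmd"; then
      echo "found: $cmd"
    else
      echo "missing: $cmd"
      missing=$((missing + 1))
    fi
  done
  if have_cmd java; then
    # `java -version` writes to stderr; a 1.7 JVM reports version "1.7.0_..."
    java -version 2>&1 | sed -n '1p'
  fi
  return "$missing"
}

check_prereqs "$@" || echo "install the missing tools before continuing"
```

Run it as `bash check-prereqs.sh` (a hypothetical file name), or pass your own list, e.g. `bash check-prereqs.sh java tar ssh`.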
Prepare the Environment
  • Assume that when Ubuntu was installed, the hostname was set to ubuntuboninc and a user named boninc was created. From Windows, use PuTTY to ssh into ubuntuboninc; from Mac OS, use the TotalTerminal application:
  • ssh boninc@ubuntuboninc
  • Add a Hadoop user named hdUser
  • sudo useradd -m hdUser
  • Add password for hdUser
  • sudo passwd hdUser (enter the password when prompted)
  • Change hdUser shell to use bash shell
  • sudo chsh -s /bin/bash hdUser
  • Allow hdUser to use sudo (on Ubuntu 14.04 this is granted through the sudo group)
  • sudo usermod -aG sudo hdUser
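The user steps above can be scripted. This sketch folds the separate `chsh` step into `useradd -s /bin/bash` and uses the `sudo` group, which is what grants sudo rights on a fresh Ubuntu 14.04 install. `DRY_RUN` and `HADOOP_USER` are hypothetical knobs for this sketch: by default it only prints the commands it would run.

```shell
#!/usr/bin/env bash
# Sketch of the hdUser preparation steps, assuming Ubuntu 14.04.
# DRY_RUN and HADOOP_USER are hypothetical knobs; by default nothing is executed.
set -euo pipefail

HADOOP_USER="${HADOOP_USER:-hdUser}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = print the commands, 0 = run them

# run CMD...: echo the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

setup_hadoop_user() {
  local user="$1"
  if id "$user" >/dev/null 2>&1; then
    echo "user $user already exists; skipping useradd"
  else
    # -m creates /home/$user; -s sets bash as the login shell (replaces chsh)
    run sudo useradd -m -s /bin/bash "$user"
  fi
  run sudo passwd "$user"            # prompts for the new password
  run sudo usermod -aG sudo "$user"  # -a appends; without it other supplementary groups are dropped
}

setup_hadoop_user "$HADOOP_USER"
```

Once the printed commands look right, run it for real with `DRY_RUN=0 bash setup-user.sh` (a hypothetical file name).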

Hadoop Installation (Installation directory: /usr/local/hadoop-2.6.0)

  • On the ubuntuboninc machine, in /home/hdUser, create an install directory
  • mkdir install
  • Use WinSCP to copy hadoop 2.6.0 tar file to ubuntuboninc machine, directory /home/hdUser/install
  • Uncompress the Hadoop tarball into /usr/local (this creates /usr/local/hadoop-2.6.0)
  • cd /usr/local
  • sudo tar -xzf /home/hdUser/install/hadoop-2.6.0.tar.gz
  • Create a group named hadoop
  • sudo groupadd hadoop
  • Change the owner of the Hadoop files to user hdUser and group hadoop
  • sudo chown -R hdUser:hadoop /usr/local/hadoop-2.6.0
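The unpack and ownership steps can be wrapped in one reusable function. This sketch assumes the tarball unpacks to a top-level `hadoop-2.6.0/` directory; `install_hadoop` and the script name are my own hypothetical names. It uses `tar -C` to choose the destination, because a path given after the archive name is read by tar as an archive member to extract, not as a target directory.

```shell
#!/usr/bin/env bash
# Sketch of the installation steps. Assumes the tarball unpacks to hadoop-2.6.0/
# and that the script runs with sudo when PREFIX is a system directory.
set -euo pipefail

# install_hadoop TARBALL PREFIX [OWNER]
#   Extracts TARBALL under PREFIX (e.g. /usr/local -> /usr/local/hadoop-2.6.0)
#   and, if OWNER (user:group) is given, hands the whole tree to that owner.
install_hadoop() {
  local tarball="$1" prefix="$2" owner="${3:-}"
  local home="$prefix/hadoop-2.6.0"

  if [ ! -f "$tarball" ]; then
    echo "tarball not found: $tarball" >&2
    return 1
  fi

  # -C switches into PREFIX before extracting.
  tar -xzf "$tarball" -C "$prefix"

  if [ ! -x "$home/bin/hadoop" ]; then
    echo "unexpected layout: $home/bin/hadoop missing" >&2
    return 1
  fi

  if [ -n "$owner" ]; then
    chown -R "$owner" "$home"
  fi
  echo "$home"
}

# Act only when arguments are supplied.
if [ "$#" -gt 0 ]; then
  install_hadoop "$@"
fi
```

For example: `sudo bash install-hadoop.sh /home/hdUser/install/hadoop-2.6.0.tar.gz /usr/local hdUser:hadoop`.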

Prepare Hadoop Environment Variables

  • Use an editor to edit the file:
  • hadoop-env.sh
  • in directory: /usr/local/hadoop-2.6.0/etc/hadoop
  • Ensure that the following two environment variables are set:
  • export JAVA_HOME=${JAVA_HOME} # no spaces around '='; JAVA_HOME itself is set in /home/hdUser/.bashrc
  • export HADOOP_PREFIX=/usr/local/hadoop-2.6.0
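Because `hadoop-env.sh` only re-exports `JAVA_HOME`, the actual value has to come from `.bashrc`. Here is a sketch of the lines I would append to `/home/hdUser/.bashrc`. The JDK path is an assumption for the OpenJDK 7 package on 64-bit Ubuntu 14.04, so check it against your own machine; the PATH line is optional.

```shell
# Lines to append to /home/hdUser/.bashrc (sketch).
# The JDK path is an assumption for OpenJDK 7 on 64-bit Ubuntu 14.04;
# check where java really lives with: readlink -f "$(command -v java)"
export JAVA_HOME=/usr/lib/jvm/java-7-openjdk-amd64
export HADOOP_PREFIX=/usr/local/hadoop-2.6.0
# Optional: put the hadoop command on PATH so it runs from any directory
export PATH="$PATH:$HADOOP_PREFIX/bin"
```

Run `source ~/.bashrc` (or log in again) for the changes to take effect.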
Test:
  • source /usr/local/hadoop-2.6.0/etc/hadoop/hadoop-env.sh ## load the variables into the current shell; running the file as a script would set them only in a child process
  • From /usr/local/hadoop-2.6.0, try the following command:
  • $ bin/hadoop
If the environment variables above are set up correctly, this command displays the usage documentation for the hadoop script.
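Beyond the usage text, `hadoop version` is a quick way to confirm that the JVM starts and that the expected release is installed. This sketch wraps that check; `check_hadoop` is a hypothetical helper name, and it assumes the first line of the output reads `Hadoop 2.6.0`.

```shell
#!/usr/bin/env bash
# Sketch: confirm the unpacked distribution runs. Assumes JAVA_HOME is exported
# (see ~/.bashrc) and that `hadoop version` prints "Hadoop <release>" first.
set -euo pipefail

# check_hadoop [PREFIX] [EXPECTED_VERSION]
check_hadoop() {
  local prefix="${1:-${HADOOP_PREFIX:-/usr/local/hadoop-2.6.0}}"
  local expected="${2:-2.6.0}"
  local first_line

  if [ -z "${JAVA_HOME:-}" ]; then
    echo "JAVA_HOME is not set" >&2
    return 1
  fi
  if [ ! -x "$prefix/bin/hadoop" ]; then
    echo "no executable at $prefix/bin/hadoop" >&2
    return 1
  fi
  # sed reads all of its input, so hadoop is never cut off mid-write
  first_line="$("$prefix/bin/hadoop" version | sed -n '1p')"
  if [ "$first_line" != "Hadoop $expected" ]; then
    echo "unexpected version line: $first_line" >&2
    return 1
  fi
  echo "OK: $first_line"
}

# Run only when a prefix is given.
if [ "$#" -gt 0 ]; then
  check_hadoop "$@"
fi
```

For example, `bash check-hadoop.sh /usr/local/hadoop-2.6.0` should print a line starting with `OK:` when the installation is in order.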
HDFS Configuration <TBA>
Multi-Node <To be Added>
