Installing Hadoop on OS X El Capitan (And probably Sierra)

This information is primarily sourced (read: heavily copied and modified) from getblueshift.com, with references to stackoverflow.com.  This also assumes you have Homebrew and Java 1.7+ installed.

When I did this, it installed Hadoop 2.7.3.

Steps:
1.  Set JAVA_HOME in your bash profile.

  $ export JAVA_HOME=$(/usr/libexec/java_home)

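To have this persist across sessions, you can append the line to your bash profile and reload it (a minimal sketch; adjust the path if you keep your shell config elsewhere):

  $ echo 'export JAVA_HOME=$(/usr/libexec/java_home)' >> ~/.bash_profile
  $ source ~/.bash_profile
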
2.  Install Hadoop with brew.  As of this writing it will download and install 2.7.3

  $ brew install hadoop

3.  To make Hadoop work as a single-node cluster you have to go through several configuration steps.  Here they are in brief.

4.  Setup ssh to connect to localhost without login

  $ ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
  $ cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys

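Note: newer versions of OpenSSH (including the one shipped with Sierra) disable DSA keys by default.  If the DSA key is rejected, an RSA key set up the same way should work (an alternative, not from the original article):

  $ ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
  $ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
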
5. Test that you can log in.  If you are not able to, turn on Remote Login in System Preferences -> Sharing

  $ ssh localhost

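If you prefer the command line over System Preferences, macOS's systemsetup utility can check and enable Remote Login (requires an administrator password):

  $ sudo systemsetup -getremotelogin
  $ sudo systemsetup -setremotelogin on
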
6.  Brew usually installs Hadoop in /usr/local/Cellar/hadoop/

  $ cd /usr/local/Cellar/hadoop/2.7.3

Note that the version number may be different for your install.

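To avoid hard-coding the version, you can also let brew resolve the current install path (a small convenience, not from the original article); brew --prefix prints the opt path, which is a symlink to the currently installed Cellar version:

  $ cd $(brew --prefix hadoop)
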
7.  Edit the following config files in /usr/local/Cellar/hadoop/2.7.3/libexec/etc/hadoop

  $ vi hdfs-site.xml

    <configuration>
      <property>
        <name>dfs.replication</name>
        <value>1</value>
      </property>
    </configuration>

  $ vi core-site.xml

    <configuration>
      <property>
        <name>fs.defaultFS</name>
        <value>hdfs://localhost:9000</value>
      </property>
    </configuration>

  $ vi mapred-site.xml

    <configuration>
      <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
      </property>
    </configuration>

  $ vi yarn-site.xml

    <configuration>
      <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
      </property>
      <property>
        <name>yarn.resourcemanager.address</name>
        <value>127.0.0.1:8032</value>
      </property>
      <property>
        <name>yarn.resourcemanager.scheduler.address</name>
        <value>127.0.0.1:8030</value>
      </property>
      <property>
        <name>yarn.resourcemanager.resource-tracker.address</name>
        <value>127.0.0.1:8031</value>
      </property>
    </configuration>

8.  Format and start HDFS and Yarn.  See the Troubleshooting Note at the bottom.

  $ cd /usr/local/Cellar/hadoop/2.7.3
  $ ./bin/hdfs namenode -format
  $ ./sbin/start-dfs.sh
  $ ./bin/hdfs dfs -mkdir /user
  $ ./bin/hdfs dfs -mkdir /user/<username>
  $ ./sbin/start-yarn.sh

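To confirm the daemons came up, jps (which ships with the JDK) lists the running Java processes; for a healthy single-node setup you should see something like NameNode, DataNode, SecondaryNameNode, ResourceManager, and NodeManager:

  $ jps
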
9.  Hadoop talks to itself using two addresses: localhost and <machine-name>.  Out of the box I had communication issues.  The three address properties we added to yarn-site.xml fix one of them; the other occurs when MapReduce starts to run and tries to connect to <machine-name> (Serenity in my case) but can't resolve it.  To fix this issue we need to modify /etc/hosts

  $ sudo vi /etc/hosts

The very first line will be:
    127.0.0.1       localhost
And we need to change it to:
    127.0.0.1       localhost Serenity
This defines both “localhost” and “Serenity” as aliases for 127.0.0.1.

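A quick way to verify the alias resolves before running any jobs (substitute your own machine name for Serenity):

  $ ping -c 1 Serenity
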
10.  Test the example code that came with the Hadoop distribution (the example jar's version number matches the installed Hadoop version)

  $ ./bin/hdfs dfs -put libexec/etc/hadoop input
  $ ./bin/hadoop jar libexec/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.3.jar grep input output 'dfs[a-z.]+'
  $ ./bin/hdfs dfs -get output output
  $ cat output/*

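If you just want to inspect the result, you can also read it directly from HDFS instead of copying it back with -get:

  $ ./bin/hdfs dfs -cat output/*
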
11.  Remove tmp files

  $ ./bin/hdfs dfs -rm -r /user/<username>/input
  $ ./bin/hdfs dfs -rm -r /user/<username>/output
  $ rm -rf output/

12.  Stop HDFS and Yarn after you are done

  $ ./sbin/stop-yarn.sh
  $ ./sbin/stop-dfs.sh

13.  Add HADOOP_HOME and HADOOP_CONF_DIR to your bashrc for future use

  $ export HADOOP_HOME=/usr/local/Cellar/hadoop/2.7.3
  $ export HADOOP_CONF_DIR=$HADOOP_HOME/libexec/etc/hadoop

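Optionally, you can also put the Hadoop scripts on your PATH so hdfs, hadoop, start-dfs.sh, etc. can be run from any directory (not part of the original article):

  $ export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
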
14.  Complete!  See the original article for a note about installing Pig

Troubleshooting Note

Note: HDFS attempts to put its data folder in /home.  On one of my machines this failed due to permission issues and I had to move the HDFS data to a new location.  I chose /usr/local/share/hduser/.  If you need to move the folder location, you will need to create the directories, add two more properties to hdfs-site.xml, and add one to core-site.xml

  $ mkdir -p /usr/local/share/hduser/mydata/hdfs/namenode
  $ mkdir -p /usr/local/share/hduser/mydata/hdfs/datanode
  $ mkdir -p /usr/local/share/hduser/tmp

  $ vi hdfs-site.xml

    <configuration>
      <property>
        <name>dfs.replication</name>
        <value>1</value>
      </property>
      <property>
        <name>dfs.namenode.name.dir</name>
        <value>file:/usr/local/share/hduser/mydata/hdfs/namenode</value>
      </property>
      <property>
        <name>dfs.datanode.data.dir</name>
        <value>file:/usr/local/share/hduser/mydata/hdfs/datanode</value>
      </property>
    </configuration>

  $ vi core-site.xml

    <configuration>
      <property>
        <name>fs.defaultFS</name>
        <value>hdfs://localhost:9000</value>
      </property>
      <property>
        <name>hadoop.tmp.dir</name>
        <value>file:/usr/local/share/hduser/tmp</value>
      </property>
    </configuration>

If you change the folder, you will need to rerun “hdfs namenode -format” to format the new location.
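If HDFS was already running when you made these changes, the restart-and-reformat sequence looks roughly like this (note that formatting erases anything already stored in HDFS):

  $ ./sbin/stop-dfs.sh
  $ ./bin/hdfs namenode -format
  $ ./sbin/start-dfs.sh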
