This information is primarily sourced (read: heavily copied and modified) from getblueshift.com, with references to stackoverflow.com. It assumes you already have Homebrew and Java 1.7+ installed.
When I did this, brew installed Hadoop 2.7.3.
Steps:
1. Set JAVA_HOME in your bash profile.
$ export JAVA_HOME=$(/usr/libexec/java_home)
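To confirm it points at a real JDK (a quick sanity check, not part of the original write-up):
$ echo $JAVA_HOME
$ java -version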
2. Install Hadoop with brew. As of this writing it will download and install 2.7.3
$ brew install hadoop
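Assuming brew links the formula as usual, you can confirm which version it installed:
$ hadoop version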
3. To make Hadoop work as a single-node cluster you have to go through several configuration steps; here they are in brief
4. Set up ssh so you can connect to localhost without a password
$ ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
$ cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys
5. Test that you can log in. If you cannot, you will have to turn on Remote Login in System Preferences -> Sharing
$ ssh localhost
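If ssh still asks for a password, overly loose permissions on the key files are a common cause; tightening them usually fixes it:
$ chmod 700 ~/.ssh
$ chmod 600 ~/.ssh/authorized_keys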
6. Brew usually installs Hadoop in /usr/local/Cellar/hadoop/
$ cd /usr/local/Cellar/hadoop/2.7.3
Note that the version number may be different for your install
7. Edit the following config files in /usr/local/Cellar/hadoop/2.7.3/libexec/etc/hadoop
$ vi hdfs-site.xml
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>
$ vi core-site.xml
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>
$ vi mapred-site.xml
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>
$ vi yarn-site.xml
<configuration>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <name>yarn.resourcemanager.address</name>
    <value>127.0.0.1:8032</value>
  </property>
  <property>
    <name>yarn.resourcemanager.scheduler.address</name>
    <value>127.0.0.1:8030</value>
  </property>
  <property>
    <name>yarn.resourcemanager.resource-tracker.address</name>
    <value>127.0.0.1:8031</value>
  </property>
</configuration>
8. Format and start HDFS and Yarn. See the Troubleshooting Note at the bottom if this step fails
$ cd /usr/local/Cellar/hadoop/2.7.3
$ ./bin/hdfs namenode -format
$ ./sbin/start-dfs.sh
$ ./bin/hdfs dfs -mkdir /user
$ ./bin/hdfs dfs -mkdir /user/<username>
$ ./sbin/start-yarn.sh
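As a sanity check (my addition, not in the original article), jps should now list the NameNode, DataNode, SecondaryNameNode, ResourceManager and NodeManager processes:
$ jps
You can also browse the NameNode web UI at http://localhost:50070 and the ResourceManager UI at http://localhost:8088, the defaults for 2.7.x.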
9. Hadoop talks to itself using two addresses: localhost and <machine-name>. Out of the box I had communication issues. The three address properties we added to yarn-site.xml fix one of them; the other occurs when MapReduce starts to run and tries to connect to <machine-name> (Serenity in my case) but cannot resolve it. To fix this we need to modify /etc/hosts
$ sudo vi /etc/hosts
The very first line will be:
127.0.0.1 localhost
And we need to change it to:
127.0.0.1 localhost Serenity
This defines both “localhost” and “Serenity” (substitute your own machine name) as aliases for 127.0.0.1
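If you are unsure of your machine name, hostname will print it, and a quick ping confirms the new alias resolves (again, substitute your own name for Serenity):
$ hostname
$ ping -c 1 Serenity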
10. Test the example code that came with the Hadoop install
$ ./bin/hdfs dfs -put libexec/etc/hadoop input
$ ./bin/hadoop jar libexec/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.3.jar grep input output 'dfs[a-z.]+'
$ ./bin/hdfs dfs -get output output
$ cat output/*
11. Remove the temporary files
$ ./bin/hdfs dfs -rm -r /user/<username>/input
$ ./bin/hdfs dfs -rm -r /user/<username>/output
$ rm -rf output/
12. Stop HDFS and Yarn after you are done
$ ./sbin/stop-yarn.sh
$ ./sbin/stop-dfs.sh
13. Add HADOOP_HOME and HADOOP_CONF_DIR to your bashrc for future use
$ export HADOOP_HOME=/usr/local/Cellar/hadoop/2.7.3
$ export HADOOP_CONF_DIR=$HADOOP_HOME/libexec/etc/hadoop
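Optionally (my addition, assuming the same layout as the steps above), put the Hadoop scripts on your PATH so you don't have to cd into the Cellar directory each time:
$ export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin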
14. Complete! See the original article for a note about installing Pig.
Troubleshooting Note
Note: HDFS attempts to put the HDFS folder in /home. On one of my machines this failed due to permission issues and I had to move HDFS to a new location; I chose /usr/local/share/hduser/. If you need to move the folder location, you will need to create the directories and add a few more properties to the config files.
$ mkdir -p /usr/local/share/hduser/mydata/hdfs/namenode
$ mkdir -p /usr/local/share/hduser/mydata/hdfs/datanode
$ mkdir -p /usr/local/share/hduser/tmp
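Depending on how /usr/local/share is owned on your machine, those mkdirs may need sudo, followed by a chown so the Hadoop daemons can write there (an assumption on my part, not from the original article):
$ sudo chown -R $(whoami) /usr/local/share/hduser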
$ vi hdfs-site.xml
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>file:/usr/local/share/hduser/mydata/hdfs/namenode</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>file:/usr/local/share/hduser/mydata/hdfs/datanode</value>
  </property>
</configuration>
$ vi core-site.xml
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:9000</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>file:/usr/local/share/hduser/tmp</value>
  </property>
</configuration>
If you change the folder, you will need to rerun “hdfs namenode -format” to format the new location.
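In brief, that means stopping the daemons, reformatting, and starting them again (the same commands as steps 8 and 12, repeated here for convenience):
$ ./sbin/stop-yarn.sh
$ ./sbin/stop-dfs.sh
$ ./bin/hdfs namenode -format
$ ./sbin/start-dfs.sh
$ ./sbin/start-yarn.sh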