Installing Hadoop on OS X El Capitan (And probably Sierra)

This information is primarily sourced (read: heavily copied and modified) from getblueshift.com, with references to stackoverflow.com. It also assumes you have Homebrew and Java 1.7+ installed.

When I did this, it installed Hadoop 2.7.3.

Steps:
1.  Set JAVA_HOME in your bash profile.

  $ export JAVA_HOME=$(/usr/libexec/java_home)

2.  Install Hadoop with brew; as of this writing, it downloads and installs 2.7.3.

  $ brew install hadoop

3.  Making Hadoop work as a single-node cluster takes several steps, outlined in the original guide. Here they are in brief:

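4.  Set up SSH so you can connect to localhost without a password. (Use an RSA key: OpenSSH 7.0 and later, as shipped with Sierra, disables DSA keys by default.)

  $ ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
  $ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
  $ chmod 0600 ~/.ssh/authorized_keys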

5.  Test that you can log in. If you can't, turn on Remote Login in System Preferences -> Sharing.

  $ ssh localhost

6.  Brew usually installs Hadoop in /usr/local/Cellar/hadoop/

  $ cd /usr/local/Cellar/hadoop/2.7.3

Note that the version number may be different for your install.

7.  Edit the following config files in the directory /usr/local/Cellar/hadoop/2.7.3/libexec/etc/hadoop:

  $ vi hdfs-site.xml

    <configuration>
      <property>
        <name>dfs.replication</name>
        <value>1</value>
      </property>
    </configuration>

  $ vi core-site.xml

    <configuration>
      <property>
        <name>fs.defaultFS</name>
        <value>hdfs://localhost:9000</value>
      </property>
    </configuration>

  $ vi mapred-site.xml

    <configuration>
      <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
      </property>
    </configuration>

  $ vi yarn-site.xml

    <configuration>
      <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
      </property>
      <property>
        <name>yarn.resourcemanager.address</name>
        <value>127.0.0.1:8032</value>
      </property>
      <property>
        <name>yarn.resourcemanager.scheduler.address</name>
        <value>127.0.0.1:8030</value>
      </property>
      <property>
        <name>yarn.resourcemanager.resource-tracker.address</name>
        <value>127.0.0.1:8031</value>
      </property>
    </configuration>
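If you set up more than one machine, typing these files by hand gets old. Below is a minimal Python 3.9+ sketch (Python is my choice here, not part of the original steps) that renders the same properties into Hadoop's configuration/property layout using the standard-library xml.etree.ElementTree. The SITE_PROPERTIES dict and the function names are hypothetical; the values are the ones listed above. It overwrites the target files, so back up the originals first.

```python
import xml.etree.ElementTree as ET
from pathlib import Path

# Properties from step 7, keyed by file name (hypothetical helper structure).
SITE_PROPERTIES = {
    "hdfs-site.xml": {"dfs.replication": "1"},
    "core-site.xml": {"fs.defaultFS": "hdfs://localhost:9000"},
    "mapred-site.xml": {"mapreduce.framework.name": "yarn"},
    "yarn-site.xml": {
        "yarn.nodemanager.aux-services": "mapreduce_shuffle",
        "yarn.resourcemanager.address": "127.0.0.1:8032",
        "yarn.resourcemanager.scheduler.address": "127.0.0.1:8030",
        "yarn.resourcemanager.resource-tracker.address": "127.0.0.1:8031",
    },
}


def render_site_xml(properties):
    """Return a *-site.xml document for a {name: value} dict."""
    root = ET.Element("configuration")
    for name, value in properties.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    ET.indent(root)  # pretty-print; needs Python 3.9+
    return '<?xml version="1.0"?>\n' + ET.tostring(root, encoding="unicode") + "\n"


def write_site_files(conf_dir, site_properties=SITE_PROPERTIES):
    """Write (and overwrite!) one file per entry in site_properties."""
    for filename, props in site_properties.items():
        Path(conf_dir, filename).write_text(render_site_xml(props))


# Example (adjust the version in the path to match your install):
# write_site_files("/usr/local/Cellar/hadoop/2.7.3/libexec/etc/hadoop")
```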

8.  Format and start HDFS and YARN. See the Troubleshooting Note at the bottom.

  $ cd /usr/local/Cellar/hadoop/2.7.3
  $ ./bin/hdfs namenode -format
  $ ./sbin/start-dfs.sh
  $ ./bin/hdfs dfs -mkdir /user
  $ ./bin/hdfs dfs -mkdir /user/<username>
  $ ./sbin/start-yarn.sh
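Before moving on, it can help to confirm that the daemons are actually listening. This is a hedged Python sketch that only attempts TCP connections. Ports 9000 and 8032 come from the config above, while 50070 (NameNode web UI) and 8088 (ResourceManager web UI) are the Hadoop 2.x defaults, which I'm assuming you haven't changed. The function names are hypothetical. The JDK's jps command is another quick way to see which Java daemons are running.

```python
import socket

# 9000 and 8032 come from core-site.xml / yarn-site.xml above;
# 50070 and 8088 are assumed Hadoop 2.x default web UI ports.
HADOOP_PORTS = {
    "NameNode RPC (fs.defaultFS)": 9000,
    "ResourceManager (yarn.resourcemanager.address)": 8032,
    "NameNode web UI": 50070,
    "ResourceManager web UI": 8088,
}


def port_open(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, or timed out
        return False


def check_hadoop(host="127.0.0.1", ports=HADOOP_PORTS):
    """Map each service label to True (listening) or False."""
    return {label: port_open(host, port) for label, port in ports.items()}


# for label, up in check_hadoop().items():
#     print(f"{'up  ' if up else 'DOWN'}  {label}")
```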

9.  Hadoop talks to itself using two addresses, localhost and <machine-name>. Out of the box I had communication issues. The three address properties we added to yarn-site.xml fix one of them. The other shows up when MapReduce starts to run: it tries to connect to <machine-name> (Serenity in my case) but can't resolve it. To fix this, we need to modify /etc/hosts.

  $ sudo vi /etc/hosts

The first entry (below the comment lines) will be:
    127.0.0.1       localhost
And we need to change it to (substituting your own machine name for Serenity):
    127.0.0.1       localhost Serenity
This defines both “localhost” and “Serenity” as aliases for 127.0.0.1.
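If you'd rather script the hosts change, or need to repeat it, the edit amounts to appending your machine name to the 127.0.0.1 entry. Here is a minimal Python sketch. It assumes the usual macOS layout: comment lines starting with #, then a 127.0.0.1 localhost entry. add_loopback_alias is a hypothetical helper that only transforms text, so writing the result back to /etc/hosts still needs sudo, and making a backup first is a good idea.

```python
def add_loopback_alias(hosts_text, hostname):
    """Append hostname to the first 127.0.0.1 entry of an /etc/hosts file's text.

    Returns the text unchanged if the alias is already there, and appends a
    new 127.0.0.1 entry if none exists.
    """
    lines = hosts_text.splitlines(keepends=True)
    for i, line in enumerate(lines):
        ending = "\n" if line.endswith("\n") else ""
        # Keep any trailing comment separate from the address/alias fields.
        content, hash_mark, comment = line.rstrip("\n").partition("#")
        fields = content.split()
        if fields and fields[0] == "127.0.0.1":
            if hostname in fields[1:]:
                return hosts_text
            new_line = content.rstrip() + " " + hostname
            if hash_mark:
                new_line += " #" + comment
            lines[i] = new_line + ending
            return "".join(lines)
    separator = "" if not hosts_text or hosts_text.endswith("\n") else "\n"
    return f"{hosts_text}{separator}127.0.0.1\tlocalhost {hostname}\n"


# Example: print(add_loopback_alias(open("/etc/hosts").read(), "Serenity"))
```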

10.  Test the example code that comes with Hadoop

  $ ./bin/hdfs dfs -put libexec/etc/hadoop input
  $ ./bin/hadoop jar libexec/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.3.jar grep input output 'dfs[a-z.]+'
  $ ./bin/hdfs dfs -get output output
  $ cat output/*
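Here is what that job computes, to my understanding. The bundled grep example counts every match of the regular expression across the input files, then sorts the matches by count, highest first, and writes count-TAB-match lines. Below is a rough single-machine Python equivalent you can run against the same libexec/etc/hadoop directory to sanity-check the output. The function names are hypothetical, and the ordering of ties may differ from Hadoop's.

```python
import re
from collections import Counter
from pathlib import Path


def grep_count(texts, pattern=r"dfs[a-z.]+"):
    """Count regex matches across texts; most frequent first, ties alphabetical."""
    regex = re.compile(pattern)
    counts = Counter()
    for text in texts:
        counts.update(m.group(0) for m in regex.finditer(text))
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


def grep_dir(directory, pattern=r"dfs[a-z.]+"):
    """Run grep_count over every regular file directly inside directory."""
    files = sorted(p for p in Path(directory).iterdir() if p.is_file())
    return grep_count((p.read_text(errors="ignore") for p in files), pattern)


# for match, n in grep_dir("libexec/etc/hadoop"):
#     print(f"{n}\t{match}")
```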

11.  Remove temporary files

  $ ./bin/hdfs dfs -rm -r /user/<username>/input
  $ ./bin/hdfs dfs -rm -r /user/<username>/output
  $ rm -rf output/

12.  Stop HDFS and YARN when you are done

  $ ./sbin/stop-yarn.sh
  $ ./sbin/stop-dfs.sh

13.  Add HADOOP_HOME and HADOOP_CONF_DIR to your ~/.bashrc for future use

  $ export HADOOP_HOME=/usr/local/Cellar/hadoop/2.7.3
  $ export HADOOP_CONF_DIR=$HADOOP_HOME/libexec/etc/hadoop
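Because the Cellar path changes with every Hadoop upgrade, hard-coding 2.7.3 will eventually bite. One way around this is Homebrew's version-independent symlink at /usr/local/opt/hadoop. Alternatively, here is a small Python sketch (hypothetical helper names) that picks the newest version directory under the Cellar and builds the two export lines.

```python
from pathlib import Path


def version_key(name):
    """Sortable key for Cellar directory names such as '2.7.3' or '2.8.0_1'."""
    return tuple(int(p) if p.isdigit() else 0 for p in name.replace("_", ".").split("."))


def find_hadoop_home(cellar="/usr/local/Cellar/hadoop"):
    """Return the newest version directory under the Cellar, or None if absent."""
    root = Path(cellar)
    if not root.is_dir():
        return None
    versions = [p for p in root.iterdir() if p.is_dir()]
    return max(versions, key=lambda p: version_key(p.name), default=None)


def export_lines(home):
    """Shell lines to append to ~/.bashrc for the given HADOOP_HOME."""
    return [
        f"export HADOOP_HOME={home}",
        "export HADOOP_CONF_DIR=$HADOOP_HOME/libexec/etc/hadoop",
    ]


# home = find_hadoop_home()
# if home:
#     print("\n".join(export_lines(home)))
```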

14.  Complete! See the original article for a note about installing Pig.

Troubleshooting Note

Note: HDFS attempts to put the HDFS folder in /home. On one of my machines this failed due to permission issues, and I had to move HDFS to a new location. I chose /usr/local/share/hduser/. If you need to move the folder, you will need to create the directories and then add two more properties to hdfs-site.xml and one to core-site.xml:

  $ mkdir -p /usr/local/share/hduser/mydata/hdfs/namenode
  $ mkdir -p /usr/local/share/hduser/mydata/hdfs/datanode
  $ mkdir -p /usr/local/share/hduser/tmp

  $ vi hdfs-site.xml

    <configuration>
      <property>
        <name>dfs.replication</name>
        <value>1</value>
      </property>
      <property>
        <name>dfs.namenode.name.dir</name>
        <value>file:/usr/local/share/hduser/mydata/hdfs/namenode</value>
      </property>
      <property>
        <name>dfs.datanode.data.dir</name>
        <value>file:/usr/local/share/hduser/mydata/hdfs/datanode</value>
      </property>
    </configuration>

  $ vi core-site.xml

    <configuration>
      <property>
        <name>fs.defaultFS</name>
        <value>hdfs://localhost:9000</value>
      </property>
      <property>
        <name>hadoop.tmp.dir</name>
        <value>/usr/local/share/hduser/tmp</value>
      </property>
    </configuration>

If you change the folder, you will need to rerun “hdfs namenode -format” to format the new location.
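The directory setup in this note can also be done in one go. This is a minimal Python sketch that assumes the same /usr/local/share/hduser base. It creates the three directories (like mkdir -p) and returns the matching property values, which you can paste into the XML files. It gives hadoop.tmp.dir as a plain local path, since Hadoop uses it as a local base directory rather than a URI. The helper name is hypothetical.

```python
from pathlib import Path


def prepare_hdfs_dirs(base="/usr/local/share/hduser"):
    """Create the namenode, datanode and tmp directories under base.

    Returns {file name: {property: value}} for hdfs-site.xml and core-site.xml.
    """
    base = Path(base)
    namenode = base / "mydata" / "hdfs" / "namenode"
    datanode = base / "mydata" / "hdfs" / "datanode"
    tmp = base / "tmp"
    for directory in (namenode, datanode, tmp):
        directory.mkdir(parents=True, exist_ok=True)  # same as mkdir -p
    return {
        "hdfs-site.xml": {
            "dfs.replication": "1",
            "dfs.namenode.name.dir": f"file:{namenode}",
            "dfs.datanode.data.dir": f"file:{datanode}",
        },
        "core-site.xml": {
            "fs.defaultFS": "hdfs://localhost:9000",
            "hadoop.tmp.dir": str(tmp),
        },
    }


# props = prepare_hdfs_dirs()
# After updating the XML files, re-run: hdfs namenode -format
```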

Story Continued…

Chapter 2

Derwood was conceived in Singapore. His mother was a Korean woman of little renown, a systems engineer for a declining computer manufacturer. Although she worked diligently to nurture his growth, she was emotionally distant and uninvolved beyond what was required. His father was of somewhat greater renown, having been committed to an asylum some eight months after Derwood’s conception, following a nationally covered and highly destructive manic spree protesting for computer rights.

Unaware of any of this, Derwood was shipped at the age of 6 months, far younger than any child would normally be allowed to travel, to the new mining colony Tau Ceti, where he sat forgotten in a shipping container for three years.

And now for something completely different…

Chapter 1

When humans expanded beyond the atmosphere of Earth, it was not NASA, the United States or Chinese governments, nor any other government that led the way; it was the corporations. Three of the major oil companies banded together in the late 22nd century and founded the first extra-terrestrial colony on Alpha Centauri. The first prospecting ship took twelve years to travel the mere four light-years. The 4th planet supported no life; its atmosphere was too hot and thick with carbon dioxide, but the ground was rich. With no governmental regulations and oversight, no environmental protection groups, no employee safety organizations, the conglomerates raped the planet. The demand for resources on Earth was high. The profits were obscene.

Other companies soon followed; the space race was born again.  Most expeditions were prospectors, although there was the expected assortment of religious communes, a few extremist governments, and even the occasional pleasure colony.  Coca-Cola founded a colony, just to prove they could.

Within 50 years, space technology had greatly improved. The trip to Alpha Centauri was down to 6 years. A limited form of stasis was developed that allowed passengers to sleep through the trip, aging a month for every year traveled, so that workers and families could be shipped like so much cargo to their destination. The more automated mining colonies would ferry raw materials to the orbital shipping port, a large automated facility that collected incoming shipments. The port then launched the outgoing exports via mass driver to the even larger deep-space starships that waited outside the solar system's gravity well to ferry the goods to their final destination.

Ruby Serialport on Windows

At work, we design and build directional sensors (glorified digital compasses). I like to use Ruby for my scripting needs at work. Most of the time this works well, but sometimes I need to talk to one of our sensors through the serial port, and this gets a little messy on the Windows side of the street. Ruby has a decent serial-port library, but it typically needs to be built from source with the Visual C++ 6.0 compiler, which I don’t have. There is a pre-compiled (Windows) version of the library on the Ruby serial-port project page, but it was built some time ago and needs a little help to install properly. There is a support issue about getting it to install with VC++ 2003 that might help some readers, but this method gets it working with no compiler installed. These instructions assume that the Ruby 1.8.7 one-click installer (with RubyGems configured) is set up on the system. The installer does not add Ruby to the path; that isn't required, but it does make life easier. To add Ruby to the path for the current command prompt, type

set PATH=%PATH%;C:\ruby\bin

Once Ruby has been added to the path, we can call programs like irb or gem from anywhere. The next step is to update RubyGems to the latest version (1.3.5 at the time of this writing).

gem update --system

After that is complete, you can optionally update the rest of the installed gems:

gem update

The next step is to download the pre-compiled Ruby serial-port gem. Unzip the file to a new folder somewhere and, from the command prompt, change directory into it. You should see a Rakefile and two directories, lib and pkg. Before the library can be installed it needs to be raked, and before it can be raked the Rakefile needs some tweaking. First, however, navigate into the pkg folder and rename the file there, serialport-0.6.0-mswin32.gem, to serialport-0.6.0-mswin32.gem.bak; when we rake, a new file will be created.

Now navigate back up a level to the Rakefile and try raking it if you like (I’ll wait). You probably got an error of the form “undefined method `manage_gems’ for Gem:Module”. That’s because manage_gems has been removed from RubyGems. A quick search shows that we just need to comment out the manage_gems line and insert require 'rake/gempackagetask' instead. When we make the change and run rake again, we get a different error, “uninitialized constant Gem::Platform::WIN32”. Searching for this one didn’t yield good results (for me). However, we can use Ruby itself to look into Gem::Platform and see which constants are defined. From the command line, type

irb

which starts the interactive Ruby interpreter. From there, type

require 'rubygems'
Gem::Platform.constants

We can see that our options are RUBY or CURRENT. CURRENT tells RubyGems to use the platform of the machine building the gem (here, x86-mswin32-60, which shows up in the file name below). In the Rakefile, comment out the line that uses Gem::Platform::WIN32, make a copy of it underneath, and change WIN32 to CURRENT in the copy. When we run rake this time, it completes without error. However, our gem is not installed quite yet. Navigate to the pkg directory from the command line, where you should see a new file named serialport-0.6.0-x86-mswin32-60.gem. To install it, simply type

gem install --local serialport-0.6.0-x86-mswin32-60.gem

The gem is now installed. To test it, type “irb” to start the Ruby interpreter and then type

require 'serialport'

If the gem is properly installed, the command should return true. You can now write scripts that use the serial port.
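If you end up repeating the Rakefile surgery (say, on a second PC), the two edits described above are mechanical enough to script. This is a rough Python sketch, not part of the original procedure. It assumes the Rakefile contains a line calling manage_gems (for example Gem::manage_gems) and a line referencing Gem::Platform::WIN32. Your copy may not match exactly, so check the result before raking. patch_rakefile and its helpers are hypothetical names.

```python
def leading_whitespace(line):
    """Return the indentation at the start of line."""
    return line[: len(line) - len(line.lstrip())]


def comment_out(line):
    """Turn a Ruby line into a comment, keeping its indentation."""
    return f"{leading_whitespace(line)}# {line.lstrip()}"


def patch_rakefile(text):
    """Apply the manage_gems and WIN32 -> CURRENT fixes to a Rakefile's text.

    Lines that are already comments are left alone, so running it twice is safe.
    """
    out = []
    for line in text.splitlines():
        is_comment = line.lstrip().startswith("#")
        if "manage_gems" in line and not is_comment:
            # manage_gems is gone from RubyGems; load the gem packaging task instead.
            out.append(comment_out(line))
            out.append(leading_whitespace(line) + "require 'rake/gempackagetask'")
        elif "Gem::Platform::WIN32" in line and not is_comment:
            # Keep the old line as a comment and add a CURRENT copy underneath.
            out.append(comment_out(line))
            out.append(line.replace("Gem::Platform::WIN32", "Gem::Platform::CURRENT"))
        else:
            out.append(line)
    return "\n".join(out) + "\n"


# from pathlib import Path
# rakefile = Path("Rakefile")
# rakefile.write_text(patch_rakefile(rakefile.read_text()))
```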

Ruby-on-Rails: Mongrel Configuration

I actually forgot I had this site until Alec Jacobson made a comment and reminded me of its existence (oops). I will attempt to remember to post more, but I’m just not the blogging sort of person.

So this post pertains to some work I actually did about a month ago, which would have made excellent material here, but (see paragraph above)… At work we needed a way to keep track of production issues requiring engineering support. I had already come across the most excellent Retrospectiva and had tried to set it up before, but failed at the Apache/FastCGI integration step. And so it was, a little over a year later, that I tried again. The initial setup went smoothly, but once again I found myself having difficulty integrating with Apache.

Retrospectiva up and running

This time I opted for Mongrel.  Unfortunately I couldn’t find a lot of information on configuring Mongrel, so it took a lot of trial and error.  Below, reconstructed from my notes, are the steps that I had to take to configure it and get it working.  Note that this transpired on Fedora Core 11.

  1. Follow the Retrospectiva installation instructions for setting up and installing gems.
    Note that the command for installing the mysql gem on Fedora is:

       gem install mysql -- --with-mysql-config=/usr/bin/mysql_config
  2. Install Mongrel per the Retrospectiva install guide
  3. Configure Mongrel to run as a cluster:
       mongrel_rails cluster::configure -e production -p 8000 -a 127.0.0.1 -N 3 -c /var/www/retrospectiva
  4. Check that the cluster works (you should be able to browse to http://127.0.0.1:8000 and see the website):
       mongrel_rails cluster::start
  5. This step is partially referenced from another blog. Set up Mongrel to start on boot:
       ln -s /usr/lib64/ruby/gems/1.8/gems/mongrel_cluster-X.Y.Z/resources/mongrel_cluster /etc/init.d/mongrel_cluster
       chmod +x /etc/init.d/mongrel_cluster

    Note that depending on your setup, you might need to use /usr/lib rather than /usr/lib64.

  6. Edit /etc/init.d/mongrel_cluster and set user=root (yes, I was bad. You should use a limited account here, but I didn’t feel like troubleshooting permission errors.)
  7. Add the mongrel_cluster as a service:
       /sbin/chkconfig --level 345 mongrel_cluster on
  8. Finally, link your mongrel_cluster.yml from your app’s config directory to /etc/mongrel_cluster:
       mkdir -p /etc/mongrel_cluster
       ln -s /var/www/retrospectiva/config/mongrel_cluster.yml /etc/mongrel_cluster/retrospectiva.yml
  9. Mongrel should be starting on boot now. You can test by rebooting and issuing:
       /etc/init.d/mongrel_cluster start (or stop)
  10. Configure Apache, following these directions. Note that Apache might not be able to hand requests off to Mongrel (check the error log). It may be necessary to adjust SELinux to allow Apache scripts to access the network (in my case I enabled all TCP ports using the seedit-gui program).
  11. Make sure Apache is starting on boot:
        chkconfig --list|grep http

    You get something like this:  httpd 0:off 1:off 2:off 3:off 4:off 5:off 6:off

    Now run:

        chkconfig httpd on

    This will set it up to start each time you boot.  Verify that it set correctly:

        chkconfig --list|grep http

    You should get:  httpd 0:off 1:off 2:on 3:on 4:on 5:on 6:off
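To check the whole chain after a reboot, here is a small Python 3 sketch that requests the front page from each Mongrel and from Apache. It assumes the cluster settings from step 3 (first port 8000, three servers). It also assumes that mongrel_cluster numbers the servers on consecutive ports starting from -p, i.e. 8000 to 8002, and that Apache is on port 80. The function names are hypothetical.

```python
import urllib.error
import urllib.request


def cluster_ports(first_port=8000, servers=3):
    """Ports for a cluster configured with -p first_port -N servers (assumed consecutive)."""
    return [first_port + i for i in range(servers)]


def http_status(url, timeout=3.0):
    """Return the HTTP status code for url, or None if nothing answered."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:  # server answered with 4xx/5xx
        return err.code
    except OSError:  # refused, timed out, DNS failure (URLError is an OSError)
        return None


def check_stack(host="127.0.0.1", first_port=8000, servers=3, apache_port=80):
    """Map each URL to its status code (None means down)."""
    urls = [f"http://{host}:{port}/" for port in cluster_ports(first_port, servers)]
    urls.append(f"http://{host}:{apache_port}/")
    return {url: http_status(url) for url in urls}


# for url, status in check_stack().items():
#     print(url, status or "DOWN")
```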

That’s all I have.  Note that this sets up your rails application as the sole Apache output.  Hosting other static content or web applications requires more configuration than I could figure out at this time.

Qt and Ruby on Windows and Mac

I use Ruby at work (Windows) from time to time for my lightweight scripting needs (breaking out Visual Studio 2005/C# to reformat a couple of hundred lines of text is a bit of overkill). If I need any user interaction (which I typically do not), I just prompt for input on the command line, but occasionally I’ve wanted an actual GUI. I did a search and discovered the cross-platform GUI toolkit Qt and a Qt-Ruby binding that lets the two work together.

Qt on Windows

Some more searching revealed a pre-packaged installer for Windows. I followed the directions in a handy guide. The install went almost perfectly, except that I already had MinGW installed. I had the most recent version (I even ran the updater to make sure), but my version of w32api.h was lower than the one required by the Qt installer. I checked the SourceForge repository, and I did have the latest version listed. I tried to complete the installation anyway, and when Qt finished and ran its demo app, the app was messed up (garbled screen).

During the Qt installation, the installer gives you the option to use an existing MinGW or install one from the package. On my second attempt, I had the installer provide MinGW. It downloaded a copy from Trolltech (the Qt maker) and installed it over my existing copy of MinGW (I have no clue whether this has broken anything yet; I haven’t used it since). The installer completed, and the Qt (officially pronounced “cute”, but I still say “Queue – Tee”) demo app ran correctly. A quick test later, I had verified it working in Ruby as well.

Qt on Mac

Installing Qt on the Mac is a tad different: both harder and easier. I primarily followed this guide, but I had a few issues, which are documented here. Note that the guide was written for OS X 10.4 Tiger. The Ruby included in that version of the OS was broken, so the first step in the guide is installing a later version of Ruby and disabling the one that comes by default on the system. This is not necessary on 10.5 Leopard.

The first thing to do is download and install Qt. I downloaded Qt 4.5, LGPL/Free, Framework Only. After that, you need to download Mac Ports.

If you aren’t aware, OS X is based on BSD. Rather than the Red Hat RPM package management system that many Linux distributions use, BSD uses the ports system. The ports system is basically a directory tree of all the applications available for BSD, organized by category (math, games, etc.). Each application has its own folder, maintained by some person. Inside the folder are pointers to all the files needed to install the program. The person maintaining the folder is responsible for making sure the files are up to date and always compile. Mac Ports is a Mac-specific version of the ports system: all the programs are maintained to compile and run on OS X. (Fink is a similar program that can help you get and install *nix programs.)

We will be using Mac Ports to get cmake, a specialized makefile generator that is required to build the qt-ruby bindings (which you also need to download). I downloaded qt4-qtruby-2.0.3, the latest version at the time.

Now it is time to install.  As per the guide:

Install Qt and Mac Ports.

Install cmake from the terminal:

cd ~/Desktop
sudo port install cmake   ## CMake is a cool build system

This is where I had to deviate from the guide. When I ran:

cmake .
make && sudo make install

cmake completed but building failed:

[ 38%] Building CXX object smoke/qtwebkit/CMakeFiles/smokeqtwebkit.dir/smokedata.o
[ 40%] Building CXX object smoke/qtwebkit/CMakeFiles/smokeqtwebkit.dir/x_1.o
i686-apple-darwin9-g++-4.0.1: /Users/Zack/Desktop/qt4-qtruby-2.0.3/smoke/qtwebkit/x_1.cpp: No such file or directory
i686-apple-darwin9-g++-4.0.1: no input files
make[2]: *** [smoke/qtwebkit/CMakeFiles/smokeqtwebkit.dir/x_1.o] Error 1
make[1]: *** [smoke/qtwebkit/CMakeFiles/smokeqtwebkit.dir/all] Error 2
make: *** [all] Error 2

I break out my Google-Foo(tm) and start searching for answers. I track down a post on RubyForge that gives me the answer: Qt did not install with WebKit support, so cmake could not find the QtWebKit headers. The easiest thing to do (for me at least) is to disable QtWebKit support (the other option is to build and reinstall Qt from source).

cmake -DENABLE_QTWEBKIT_SMOKE=off -DENABLE_QTWEBKIT_RUBY=off .

I run the build again and get a new error:

[100%] Building CXX object ruby/qttest/CMakeFiles/qttest.dir/qttesthandlers.o
Linking CXX shared module qttest.so
ld: library not found for -lsmokekde
collect2: ld returned 1 exit status
make[2]: *** [ruby/qttest/qttest.so] Error 1
make[1]: *** [ruby/qttest/CMakeFiles/qttest.dir/all] Error 2
make: *** [all] Error 2

The answer lies further up in the previous post.

cmake -DENABLE_QTWEBKIT_SMOKE=off -DENABLE_QTWEBKIT_RUBY=off -DENABLE_QTTEST=off -DENABLE_QTTEST_SMOKE=off . 
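If you need to rebuild later, the working configure line can live in a small Python helper instead of your shell history. This is a minimal sketch: the four ENABLE_* options are exactly the ones from the command above, the helper names are hypothetical, and the actual build call is left commented out so nothing runs by accident.

```python
import shlex
import subprocess

# Bindings to skip: QtWebKit (missing from the binary Qt framework install) and
# QtTest (failed to link against smokekde on my setup).
DISABLED_OPTIONS = [
    "ENABLE_QTWEBKIT_SMOKE",
    "ENABLE_QTWEBKIT_RUBY",
    "ENABLE_QTTEST",
    "ENABLE_QTTEST_SMOKE",
]


def cmake_command(source_dir=".", disabled=DISABLED_OPTIONS):
    """Argument list for configuring qt4-qtruby with the given options turned off."""
    return ["cmake", *(f"-D{name}=off" for name in disabled), source_dir]


def command_line():
    """The same command as a single shell-quoted string."""
    return shlex.join(cmake_command())


def configure_and_build(source_dir):
    """Run cmake and then make inside source_dir; raises CalledProcessError on failure."""
    subprocess.run(cmake_command("."), cwd=source_dir, check=True)
    subprocess.run(["make"], cwd=source_dir, check=True)


# print(command_line())
# configure_and_build("qt4-qtruby-2.0.3")  # then: sudo make install
```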

Finally, everything builds and installs.

Qt Ruby Demo

You can find a nice little qt-ruby tutorial here: http://www.darshancomputing.com/qt4-qtruby-tutorial/

XHTML Image Space

While working on a website for a friend, I had an odd issue with a mysterious space underneath the banner image:

Screenshot of the strange space underneath the banner

I set the background of the image container to blue; the blue sliver under the banner image is my mysterious space. After making no progress expunging it with CSS (setting margins and padding to zero, and so on), I was initially able to hide it with a -4px margin. Unfortunately (fortunately?) I am a purist and couldn’t leave it that way, as I KNOW I shouldn’t need a -4px margin for a space that by all reason shouldn’t be there.

The next day at work I explained the problem to my minion (yes, I had a minion at the time; it was awesome). We started looking at the HTML, stripping out everything that wasn’t necessary in an effort to isolate the problem. We pared the HTML down to:

<div>
<img src="logo.png" />
</div>

And still the space remained! At this point I removed the XHTML strict DTD from the top of the document, and lo, everything was rendered correctly (or at least rendered as I expected it to be). So I whipped out my google-foo and searched for image spacing issues with XHTML strict.

As it turns out, browsers have for ages “incorrectly” rendered images in the above situation as block elements rather than inline elements. Inline elements have some space underneath them (room for letter descenders, just like this text). When writing the XHTML strict standard, the W3C explicitly defined the inline behavior (transitional XHTML uses the older rendering). There are two ways to get rid of the space if it is unwanted in this situation. The first is to use “vertical-align: bottom” on the image (I tried this, but it didn’t work for me; maybe I misspelled something or put it on the wrong element). The other option is to use “display: block” on the image.

Voilà! Adding that one simple CSS property fixed my problem. Now I just need to find a non-convoluted way to add a shadow to my main content in CSS, but that can wait for another post.