Monday, November 10, 2014

Basic Guide to Setting Up Single Node Hadoop 2.5.1 Cluster on Ubuntu



So, you have decided you are interested in big data and data science, and you want to explore what you can do with Hadoop and Map Reduce.

But... you find most of the tutorials too hard to wade through, inconsistent, or you simply encounter problems that you just can't solve. Hadoop is evolving so fast that often the documentation is unable to keep up. 

Here I will run you through the process I followed to get the latest version of Hadoop (2.5.1) running so I could use it to test my Map Reduce programs. 

You can see the official Apache Docs here.


Part One: Java

You need to make sure you have a compatible version of Java on your machine.

Jump into your terminal and type
java -version
You preferably need an installation of Java 7.
When I run this I get:

java version "1.7.0_55"
OpenJDK Runtime Environment (IcedTea 2.4.7) (7u55-2.4.7-1ubuntu1~0.12.04.2)
OpenJDK 64-Bit Server VM (build 24.51-b03, mixed mode)
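
If you want to script this check, here is a rough sketch. One thing to know is that java -version prints to standard error, so it has to be redirected before it can be piped anywhere. The helper name java_major is hypothetical, and it only understands the old 1.x version numbering shown above.

```shell
# Hypothetical helper: read `java -version` output on stdin and print the
# major version (7 for "1.7.0_55"). Assumes the old 1.x version scheme.
java_major() {
    sed -n '1s/.*version "1\.\([0-9]*\)\..*/\1/p'
}

# java -version writes to stderr, hence the 2>&1 redirect.
if command -v java >/dev/null 2>&1; then
    echo "Java major version: $(java -version 2>&1 | java_major)"
fi
```

If the helper prints nothing, the version string is in a format it does not recognise and you should just read the output by eye.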


Part Two: Other Software

You will need ssh and rsync installed. Chances are that they already are, but if not, just run:
sudo apt-get install ssh
sudo apt-get install rsync


Part Three: Grab a Release

Head to the Apache Hadoop Releases page, choose a mirror and grab the tarball (.tar.gz). Make sure you do not grab the source tarball (the one with src in its name) by mistake.
Remember: in this walk-through I have grabbed release: 2.5.1

Part Four: Unpack & Configure

Copy the tarball to wherever you want Hadoop to reside. I like to put it in the directory
/usr/local/hadoop
and then extract the contents with
tar -xvf hadoop-2.5.1.tar.gz
Then you will need to do some configuration. Open the file
vi hadoop-2.5.1/etc/hadoop/hadoop-env.sh
You will need to modify the line that currently looks like this
export JAVA_HOME=${JAVA_HOME}

You need to point this to your Java installation. If you are not sure where that is, just run
which java

and then copy the path (minus the bin/java at the end) into the hadoop config file to replace the text ${JAVA_HOME}.
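
On Ubuntu, which java often prints a symlink such as /usr/bin/java rather than the real installation directory. If you would rather point JAVA_HOME at the real location, the following is a minimal sketch, assuming GNU readlink (standard on Ubuntu). The helper name java_home_from_bin is hypothetical.

```shell
# Hypothetical helper: strip the trailing /bin/java from a java binary path.
java_home_from_bin() {
    printf '%s\n' "${1%/bin/java}"
}

# Follow any symlinks to the real java binary, then print the directory
# that could be used as JAVA_HOME in hadoop-env.sh.
if command -v java >/dev/null 2>&1; then
    java_home_from_bin "$(readlink -f "$(command -v java)")"
fi
```

Whatever path it prints is what goes on the export JAVA_HOME= line.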



Part Five: Test

First, run a quick test to check that you have configured Java correctly. The following command should show you the version of Hadoop and its compilation information.

hadoop-2.5.1/bin/hadoop version

Part Six: Run Standalone

The simplest thing you can do with Hadoop is run a map reduce job in standalone mode.

The Apache Docs give a great simple example: grepping a collection of files.

Run these commands:
mkdir input
cp hadoop-2.5.1/etc/hadoop/*.xml input
hadoop-2.5.1/bin/hadoop jar hadoop-2.5.1/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.5.1.jar grep input output 'dfs[a-z.]+'

When hadoop completes that process you can open up the results file and have a look.
vi output/part-r-00000
You should see one line for each distinct match of the regular expression, along with the number of times it occurred. Try changing the expression and seeing what you get. Now you can use this installation to test your map reduce jars against Hadoop 2.5.1.
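
Before re-running the job with a new expression, it can help to preview what the pattern matches using ordinary grep. The sketch below only approximates the shape of the example job's output (a count next to each matched string); it is not how Hadoop does it, and the helper name count_matches is hypothetical.

```shell
# Hypothetical helper: count each distinct match of an extended regex
# across the given files, most frequent first.
count_matches() {
    pattern=$1
    shift
    grep -ohE -- "$pattern" "$@" | sort | uniq -c | sort -rn
}

# Only run against the input directory from this walk-through if it exists.
if [ -d input ]; then
    count_matches 'dfs[a-z.]+' input/*.xml
fi
```

Also keep in mind that Hadoop refuses to write into an output directory that already exists, so remove the output directory (or pick a new name) before each re-run.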


Coming Next: Running Hadoop 2.5.1 in Pseudo Distributed Mode

Sunday, November 9, 2014

Wittgenstein's Beetle Book Review



Wittgenstein's Beetle by Martin Cohen 

Summary: Very disappointing.

What could have been a great primer on one of the essential tools of philosophy is held back by the author's mediocre understanding of many of the issues he discusses. The prime example is the 'thought experiment' by Wittgenstein that serves as the name of the book. Wittgenstein held that the idea of private language was incoherent because languages were games played between people. His beetle experiment was designed to make this idea concrete by proposing a world in which we all owned a private box containing a beetle. Mr Cohen provides a direct quote from Wittgenstein's Investigations in which he (Wittgenstein) clearly states that the word beetle, if used in such a society, could not be referring to the thing in the box. Mr Cohen then turns around and tells us that the point of Wittgenstein's experiment is to show that we assume that because we use the same word as other people we are talking about the same thing. This is not what Wittgenstein said, as the quoted text itself makes clear.

To make matters worse, Mr Cohen returns to pick on Wittgenstein's Beetle at the end of the book as an example of a poorly done thought experiment. It fails to meet several of Mr Cohen's criteria for successful thought experiments. One needs to note that it is Mr Cohen who has massaged the definition of a thought experiment to get Wittgenstein's beetle in, and then he criticises its performance, all the while failing to understand it.

I am not going to mention the numerous fallacies the author pens on many topics of science, or his horrendous attempts at jokes. The only reason I am giving the book 2 stars is that the discussion of Searle's Chinese room argument is excellent. Read this chapter and then throw the book away.

Saturday, November 1, 2014

Appcelerator Titanium Android Woes on Mac OSX

I have been having ongoing problems getting Appcelerator to build and install Android Apps again.

The very first time I built an Android App it took me some time to get the configuration right. Now that I have been through system upgrades I seem to be back at step one again. As before, the official Appcelerator Guide helped me remember how to get the device itself configured. However, it will not prepare you for the grand cluster of configuration issues you will face getting all the toys to play nicely together.

Problem 

Appcelerator does not recognize your Android device, even though you can see it listed when you run adb devices.

Solution

I still don't have a solution for this (most people suggest uninstalling everything and starting again, which to my mind constitutes giving up, not solving it). I do have a workaround though: build the app without installing it and then use adb to install it independently. This definitely works in the absence of a better solution.

To build

Try the command titanium build,
- or -
Just use the distribute app dialog in Titanium Studio.
You can generate a signed APK easily this way.

To install

Just use the adb command line utility:

   adb install ../Desktop/MyApp.apk

Problem solved... sort of.


Problem

adb does not even recognize your android device.
This seems to happen randomly, depending on what I had for breakfast.


Solution

I generally find this requires a little fiddling around. This particular combination is currently working for me:
1) Unplug your device.
2) Kill the adb server.
3) Plug your device back in.
4) Run adb devices
This seems to kickstart the adb server in such a way that it correctly finds the attached devices.
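
For reference, steps 2 and 4 look roughly like this on the command line. The has_attached_device helper is hypothetical; it just reads the adb devices listing and checks that at least one entry is in the "device" state. Paste the commands one at a time so you can plug the device back in between them.

```shell
# Hypothetical helper: read `adb devices` output on stdin and succeed only
# if at least one entry is reported in the "device" state.
has_attached_device() {
    awk '$2 == "device" { found = 1 } END { exit !found }'
}

if command -v adb >/dev/null 2>&1; then
    adb kill-server    # step 2: run this while the device is unplugged
    # Step 3 happens here: plug the device back in before continuing.
    if adb devices | has_attached_device; then    # step 4 restarts the server
        echo "device found"
    else
        echo "no device listed yet"
    fi
fi
```

A device shown as offline or unauthorized does not count as attached here, which is usually what you want before trying an install.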

Problem

Your Android App almost builds an APK but red errors flash up at the end. Appcelerator tells you it was built but there is nothing in the build directory. You see a bunch of uninformative Python error messages referring to problems in the file builder.py, for example:

line 2528, in <module>
[ERROR]     builder.build_and_run(False, avd_id, debugger_host=debugger_host, profiler_host=profiler_host)

For me it turned out that this happens because some executables got moved around between distributions of the Android SDK.

This problem is outlined in this note from the Appcelerator forums, which fixed it for me.

Solution

Create symlinks to aapt and dx in /Applications/Android-sdk/platform-tools:

ln -s /Applications/Android-sdk/build-tools/17.0.0/aapt aapt

ln -s /Applications/Android-sdk/build-tools/17.0.0/dx dx
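
Your build-tools version may not be 17.0.0, so here is the same fix as a small sketch that takes the SDK path and version as arguments. The helper name link_build_tools is hypothetical, and the path and version in the last lines are simply the ones used above. It leaves anything that already exists in platform-tools alone.

```shell
# Hypothetical helper: link aapt and dx from build-tools/<version> into
# platform-tools, skipping any tool that is missing or already present.
link_build_tools() {
    sdk=$1
    version=$2
    for tool in aapt dx; do
        src="$sdk/build-tools/$version/$tool"
        dst="$sdk/platform-tools/$tool"
        if [ ! -e "$src" ]; then
            echo "missing: $src" >&2
        elif [ -e "$dst" ]; then
            echo "already present: $dst" >&2
        else
            ln -s "$src" "$dst"
        fi
    done
}

# Path and version taken from this post; adjust them to your own install.
if [ -d /Applications/Android-sdk/platform-tools ]; then
    link_build_tools /Applications/Android-sdk 17.0.0
fi
```

You can check which versions you have by listing the build-tools directory inside your SDK.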