Wednesday, May 29, 2013

Machine Learning for Hackers

I recently read "Machine Learning for Hackers" by Drew Conway and John Myles White.

I'd picked it up because I heard it was a good way to get familiar with the data mining capabilities of R. I also expected the case-study-based approach to be a good way to see how the authors tackle a broad array of machine learning problems. In these respects I was reasonably well rewarded. You will find a bunch of R code scraps that can be reused with a little effort. Unfortunately, the explanation of what the code does (and how) is often absent. In this sense the book is true to its name: you will learn some recipes for tackling certain problems, but you may not understand how the code works, let alone the technique being applied.

The one issue I found unforgivable is that in the instances where the authors talk about machine learning theory, or use its terms, they are often wrong. One example is the application of naive Bayes to spam classification. The scoring function they use is the commonly used likelihood times the prior, leaving off the evidence divisor.

As a scoring method this is appropriate, because the score is proportional to the full posterior probability and much cheaper to compute. However, the resulting score is not a probability, yet the authors continually refer to it as one. This may seem minor, but to me it undermined my confidence in their ability to communicate necessary details about the techniques they are applying.
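To make the distinction concrete, here is a minimal sketch in Python (not R, and not the book's code). The priors and word likelihoods are made-up numbers chosen purely for illustration. It computes the likelihood-times-prior score for each class, then divides by the evidence to recover actual posterior probabilities.

```python
from math import prod

# Hypothetical toy numbers, not taken from the book.
priors = {"spam": 0.2, "ham": 0.8}

# Assumed P(word | class) for the two words observed in one message.
likelihoods = {
    "spam": [0.30, 0.10],
    "ham": [0.01, 0.05],
}


def unnormalised_scores(priors, likelihoods):
    """Return likelihood * prior per class, with no evidence divisor."""
    return {c: priors[c] * prod(likelihoods[c]) for c in priors}


def posteriors(scores):
    """Divide each score by the evidence (the sum of all scores)."""
    evidence = sum(scores.values())
    return {c: s / evidence for c, s in scores.items()}


scores = unnormalised_scores(priors, likelihoods)
post = posteriors(scores)

# The scores rank the classes correctly but do not sum to 1.
print("scores:", scores, "sum =", sum(scores.values()))
# The normalised posteriors do sum to 1.
print("posteriors:", post, "sum =", sum(post.values()))
```

Both dictionaries pick the same winning class, which is why the shortcut is fine for classification. Only the second one contains numbers that can fairly be called probabilities.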

Another example: in the section on distance metrics the authors state that multiplying a matrix by its transpose computes “the correlation between every pair of columns in the original matrix.” This is also wrong. What they want to say is that it produces a matrix of scores that give a rough indication of the correlation between the rows. It is only an approximation because each score is an unnormalised dot product: it depends on the length of the vectors and on whether they have been normalised. These values would not be comparable between matrices. What would be comparable between matrices is a correlation coefficient, but this is not what is being computed.
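The difference is easy to see with a small sketch, again in Python rather than R and with a made-up matrix. The helper names here are my own, not anything from the book. Scaling the data changes the dot-product scores by a large factor, while the Pearson correlation coefficient between the same two rows stays put.

```python
from math import sqrt


def row_dot_products(m):
    """Compute M times its transpose: entry [i][j] is the dot product of rows i and j."""
    return [[sum(a * b for a, b in zip(ri, rj)) for rj in m] for ri in m]


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length vectors."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)


# Hypothetical data: two perfectly correlated rows.
m = [[1.0, 2.0, 3.0],
     [2.0, 4.0, 6.0]]

# The same data measured in units ten times smaller.
scaled = [[10 * v for v in row] for row in m]

raw = row_dot_products(m)[0][1]
raw_scaled = row_dot_products(scaled)[0][1]
r = pearson(m[0], m[1])
r_scaled = pearson(scaled[0], scaled[1])

print("dot products:", raw, "vs", raw_scaled)   # differ by a factor of 100
print("correlations:", r, "vs", r_scaled)       # both approximately 1
```

The dot-product score depends on the scale of the data, so it cannot be compared across matrices. The correlation coefficient can.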

I am not suggesting that a hacker's guide to machine learning should include a thorough theoretical treatment of the subject. I think only that where terms and theory are introduced they should be used correctly. By this criterion the book is a failure. However, for my purposes (grabbing some code snippets for doing analysis with R) it was moderately successful. My largest disappointment was that, given the mistakes I noticed in the topics about which I have reasonable knowledge, I have no confidence in the authors' explanations of those areas where I am ignorant.

Thursday, May 2, 2013

Top 8 Essential Tweaks for New Installations of Ubuntu 12.04

Having just upgraded to 12.04, I found there were a bunch of things I needed to do to get it working the way I wanted.

1) Install the Classic Application menu

It is beyond me why the hierarchical applications menu has been removed in this version of Ubuntu. It also seems that the new left-hand launcher only displays apps installed from the 'Ubuntu Software Centre.' Applications installed from Synaptic get lost and don't always seem to show up in the new Dash.

So to get the classic application menu: Open a terminal ( Ctrl – Alt – T ) and add the following PPA.

sudo apt-add-repository ppa:diesch/testing

Then update and install the classic menu

sudo apt-get update && sudo apt-get install classicmenu-indicator

2) Install the restricted extras

Allows you to listen to mp3s and watch loads of restricted (proprietary or patent-encumbered) video formats.

sudo apt-get install ubuntu-restricted-extras

3) Enable 'Show Remaining Space Left' Option in Nautilus File Browser

Again, why this is not on by default is beyond me. Extremely useful.

Open Nautilus. Go to View - Statusbar. Enable it, nuff said.

4) Calculator Lens/Scope for Ubuntu 12.04

One upside of the new Ubuntu Dash is the bunch of information-rich widgets integrated into the OS. You can get info on weather, cities and films, and do calculations directly from the Dash.

sudo add-apt-repository ppa:scopes-packagers/ppa
sudo apt-get update
sudo apt-get install unity-lens-utilities unity-scope-calculator
sudo apt-get install unity-scope-rottentomatoes
sudo apt-get install unity-scope-cities

5) Open in Terminal Nautilus Extension

Allows you to open a terminal that is already inside the folder you are currently browsing with Nautilus. This saves me oodles of time.

sudo apt-get install nautilus-open-terminal

6) Install CPU/Memory Indicator Applet

Sweet little widget to view system resource usage stats.

sudo add-apt-repository ppa:indicator-multiload/stable-daily
sudo apt-get update
sudo apt-get install indicator-multiload

7) Install Spotify

Music streaming service desktop client. This info comes directly from their laboratories:

Add the spotify repo by editing /etc/apt/sources.list
Add the line:
deb stable non-free

sudo apt-key adv --keyserver --recv-keys 94558F59

sudo apt-get update && sudo apt-get install spotify-client

8) Install Synergy

Synergy is an application that lets you share your mouse and keyboard across computers. More than that, it also shares your clipboard, so you can copy text between machines. You can't copy files for the moment, but maybe if we all donate to the cause we can request that feature.

You can download a Debian package here:

Then just install it with

sudo dpkg -i <synergy package name here>