I recently read "Machine Learning for Hackers" by Drew Conway and John Myles White.
I'd picked it up because I heard it was a good way to get familiar with the data mining capabilities of R, and I expected its case-study-based approach to be a good way to see how the authors tackle a broad array of machine learning problems. In these respects I was reasonably well rewarded: you will find a bunch of R code scraps that can be reused with a little effort. Unfortunately, the explanation of what the code does (and how) is often absent. In this sense the book is true to its name: you will learn some recipes for tackling certain problems, but you may not understand how the code works, let alone the technique being applied.
The one issue I found unforgivable is that in the instances where the authors talk about machine learning theory, or use its terms, they are often wrong. One example is the application of naive Bayes to spam classification. The scoring function they use is the commonly used likelihood times the prior, leaving off the evidence divisor.
As a method of scoring in Bayesian methods this is appropriate: it is proportional to the full posterior probability and much more efficient to compute. However, the resulting score is not a probability, yet the authors repeatedly refer to it as one. This may seem minor, but it undermined my confidence in their ability to communicate the necessary details of the techniques they are applying.
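To make the distinction concrete, here is a minimal sketch in R of the kind of scoring involved (my own toy example with made-up numbers, not the authors' code). The prior times the likelihood ranks the classes correctly because it is proportional to the posterior, but it only becomes a probability once you divide by the evidence:

```r
# Toy two-class naive Bayes score (illustrative numbers, not from the book).
spam.prior <- 0.2
ham.prior  <- 0.8

# Hypothetical P(word | class) for the three words in a message.
p.words.given.spam <- c(0.050, 0.001, 0.020)
p.words.given.ham  <- c(0.001, 0.010, 0.005)

# The "score": prior times the product of the likelihoods. It is
# proportional to the posterior, so the larger score picks the right
# class, but it is not itself a probability.
spam.score <- spam.prior * prod(p.words.given.spam)
ham.score  <- ham.prior  * prod(p.words.given.ham)

# Dividing by the evidence (the sum over classes) recovers an actual
# posterior probability.
p.spam <- spam.score / (spam.score + ham.score)
p.spam  # a genuine probability in [0, 1]
```

In practice you would sum log probabilities to avoid numerical underflow, but the point stands: the raw score only becomes a probability after normalising by the evidence.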
Another example: in the section on distance metrics the authors state that multiplying a matrix by its transpose computes “the correlation between every pair of columns in the original matrix.” This is also wrong. What they want to say is that it produces a matrix of scores that indicate the correlation between the rows. Even then it is only an approximation, because the score depends on the length of the columns and on whether they have been normalised, so the values are not comparable between matrices. What would be comparable between matrices is a correlation coefficient, but that is not what is being computed.
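A small sketch (again mine, not the book's) showing the difference in R: `M %*% t(M)` gives dot products between rows, which are scale-dependent similarity scores, while `cor(t(M))` gives actual correlation coefficients between the rows. The two only coincide once each row has been centred and scaled:

```r
set.seed(1)
M <- matrix(rnorm(12), nrow = 3)  # 3 rows, 4 columns

# Entry (i, j) is the dot product of rows i and j: an unnormalised,
# scale-dependent similarity score, not a correlation coefficient.
scores <- M %*% t(M)

# Actual correlation coefficients between the rows of M.
rho <- cor(t(M))

# Only after centring and scaling each row do the dot products
# (divided by ncol(M) - 1) match the correlation matrix.
Z <- t(scale(t(M)))
all.equal(Z %*% t(Z) / (ncol(M) - 1), rho, check.attributes = FALSE)
```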
I am not suggesting that a hacker's guide to machine learning should include a thorough theoretical treatment of the subject; only that where terms and theory are introduced they should be used correctly. By that criterion this book is a failure. However, for my purposes (grabbing some code snippets for doing analysis with R) it was moderately successful. My largest disappointment is that, given the mistakes I noticed in the topics I know reasonably well, I have no confidence in their explanations of the areas where I am ignorant.