*Occam's Razor*. This is the notion that whenever two competing theories explain the known facts equally well, the simpler theory is generally correct.

As a rule of thumb, *Occam's Razor* helps us wade through the infinite number of potential theories that might be put forward to explain any given phenomenon. To demonstrate this I will use a trivial, not particularly scientific example.

When trying to come up with a principle to describe the observation that the sun comes over the horizon every 24 hours, we could generate an infinite set of theories as follows:

1) The earth rotates at a constant speed such that the sun appears on the horizon at regular intervals.

Then we may add an infinite set of exceptions.

2) The earth rotates at a constant speed such that the sun appears on the horizon at regular intervals. Except on Thursday the 6th December 2018, when the earth will stop rotating for 24 hours and then resume.

3) The earth rotates at a constant speed such that the sun appears on the horizon at regular intervals. Except on Thursday the 6th December 2018, when the earth will stop rotating for 24 hours and then rotate backwards.

4) The earth rotates at a constant speed such that the sun appears on the horizon at regular intervals. Except when the New York Yankees have won the World Series 100 times in a row, after which it will slow to half its speed.

Etc, etc.

Finding a set of theories that all explain the given evidence equally well is easy. In this case we could decide between these theories through empirical means: because they make slightly different predictions, we need only wait for the predicted outcomes to diverge. However, since there are in fact infinitely many of these alternate theories, this is not possible in practice. Instead, we rely on the rule of thumb known as *Occam's Razor* to discard all the alternative theories.

In the realm of data mining, Occam's Razor turns out to be incredibly practical. If you can fit multiple models to a data set with approximately equal error, then the simplest model will more often than not produce the best predictions. This principle has been critical in the design of many modern machine learning algorithms.
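A minimal sketch of this effect, using NumPy to compare a simple and a complex model fitted to the same data (the synthetic data set and the polynomial degrees are my own illustrative choices, not from any particular algorithm): both fit the training points reasonably well, but the simpler model typically generalizes better to unseen points.

```python
import numpy as np

rng = np.random.default_rng(0)

# Noisy training samples drawn from an underlying linear relationship.
x_train = np.linspace(0, 1, 15)
y_train = 2 * x_train + 1 + rng.normal(0, 0.2, x_train.size)

# Noise-free held-out points for judging predictive accuracy.
x_test = np.linspace(0, 1, 200)
y_test = 2 * x_test + 1

def held_out_error(degree):
    """Fit a polynomial of the given degree and return its test RMSE."""
    coeffs = np.polyfit(x_train, y_train, degree)
    preds = np.polyval(coeffs, x_test)
    return np.sqrt(np.mean((preds - y_test) ** 2))

simple_err = held_out_error(1)    # a straight line
complex_err = held_out_error(9)   # a degree-9 polynomial

print(f"degree 1 test RMSE: {simple_err:.3f}")
print(f"degree 9 test RMSE: {complex_err:.3f}")
```

The degree-9 polynomial has enough freedom to chase the noise in the fifteen training points, so its held-out error ends up larger than the straight line's, even though its training error is lower: the simpler of the two roughly-equally-fitting models predicts better.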

Interestingly, the predictive power of simplicity extends beyond this. As Google research director Peter Norvig discusses in the presentation below, the critical factor in solving many modern computer science problems is turning out to be data volume. As our data sets grow in size, we see that the best predictive models come not from painstakingly building custom models for particular data, but from using an array of simple models and letting the data speak.

Read more about this issue in the paper *The Unreasonable Effectiveness of Data*.