Professional Documents
Culture Documents
Ockham’s Razor
Nicole Lazar∗
controversy, historically or in modern times. For of the Razor found in the philosophical literature and
instance, Ockham’s contemporary Walter Chatton in Ockham’s own writing, but is close enough for
took an opposing view, stating that explanations a working definition.
should be added until a phenomenon or proposition
could be affirmed (if three entities are not enough,
add a fourth, or fifth, and so on; Ref 4). Wears and OCKHAM’S RAZOR AS A TOOL
Lewis5 raise potential objection to the principle of OF SCIENCE
parsimony (which they equate with Ockham’s Razor)
in the context of statistical modeling and inference; a The Razor, with its emphasis on simplicity and
simpler explanation of a phenomenon that ignores elegance of explanation, merges well with the
important variables may lead to poor predictive current state of scientific practice. Good theories
power and difficulty in validating models. Part of the are those that explain all the known facts in a
difficulty evidently centers on the meaning of ‘simple’. fashion as uncomplicated as possible. Ockham’s
The objections of Chatton, as well as of Wears Razor may therefore be used by scientists to pare away
and Lewis, involve an understanding of the Razor unnecessary hypotheses and arrive at simpler theories.
as demanding utmost—even excessive—simplicity. Jefferys and Berger6 discuss competing explanations
These ‘anti-Razors’, as they are sometimes called, early in the 20th century for the motion of the
approach the parsimony question from the opposite planet Mercury, which was not fully explained by
direction, namely, that an entity should be shown to Newtonian physics. One theory held that another, as
be lacking in importance before it is removed from the yet unobserved planet (tentatively named ‘Vulcan’)
explanation. But this does not contradict Ockham’s existed between the Sun and Mercury, and it was
Razor, which stresses ‘unnecessary’ multiplication Vulcan’s orbit that was causing the perturbances in
of explanations or assumptions. Clearly, an effect Mercury. Another theory was that the effect was
that has important explanatory power should not caused by rings around the Sun. In still another, the
be ignored. Or, as Einstein put it: ‘everything Sun was taken to be oblate. Yet another explanation
should be made as simple as possible, but not was that the law of gravity was not quite right (see
simpler’. Ref 6 for more details). All of these theories had
unknown parameters (for instance, the amount of
deviation from the law of gravity, or the size and
THE INTUITIVE USE OF OCKHAM’S location of Vulcan), and over time some of them
RAZOR became more and more implausible (for instance, lack
of confirmed sightings of Vulcan made the existence of
One appeal of Ockham’s Razor is that it cuts through such a planet more and more unlikely as time went on).
layers of assumptions and reveals the core of an In 1915, Einstein’s theory of general relativity was able
argument, thereby enabling the impartial outsider to resolve the discrepancy; even more, the perturbance
to evaluate the validity of competing theories. As in the motion of Mercury is a consequence of Einstein’s
a somewhat trivial example, consider the case of the theory, lending this explanation an important element
‘Cottingley Fairies’, a series of photographs of fairies of parsimony. Although Ockham’s Razor could not
taken by two young girls in 1917. These photographs, have led a scientist to the theory of general relativity,
which purportedly show the girls playing with fairies, once that theory was proposed the Razor could (and
were widely believed at the time to be authentic; did) lend it increased credibility and scientific appeal. If
Sir Arthur Conan Doyle, the author of the Sherlock one theory can explain multiple scientific phenomena,
Holmes mysteries, was one believer. But which is and even predict others (as general relativity has),
the simpler explanation: that fairies do actually exist it is to be preferred over multiple theories, one for
but have somehow eluded being caught on film each observed outcome. This is yet another view on
(either prior to 1917 or subsequently)? Or that Ockham’s Razor, one that is still consistent with the
the photographs were fake? The latter explanation previous ideas.
requires fewer assumptions (for most people), and
hence the Razor indicates it should be favored. Indeed,
of course, the photographs were faked, although this THE LAW OF PARSIMONY
was not definitively proven until many years later.
IN STATISTICAL MODELING
Uses such as deciding on the veracity of the Cottingley
Fairies correspond to the layman’s understanding of When building statistical models, the law of parsi-
Ockham’s Razor as ‘the simplest theory is the correct mony is often invoked, at least implicitly. Parsimo-
one’. This diverges quite substantially from statements nious models, with a comparatively small number of
parameters, may be preferred because they are easier of what is meant by ‘simple’, and accords well with
to interpret and explain; they are less prone to overfit- the way science tends to be practiced (we start with
ting the particular sample in hand; and they tend to be simple hypotheses and build them up as needed
more generalizable to other samples taken under simi- when more evidence accumulates), the definition is
lar conditions. Model selection criteria such as Akaike somewhat circular, in the absence of any objective way
Information Criterion (AIC)7 and Bayes Information of assigning the priors. Jeffreys’ argument was based
Criterion (BIC),8 to name just two of the many that on exactly this consideration of prior probabilities.6
have been proposed in the statistics literature, penal- A different tactic is therefore needed. In particular,
ize models that are too complex, i.e., have too large a definition of simplicity that does not appeal
number of parameters. Smith and Spiegelhalter9 argue
directly to the prior probabilities attached to the
that Bayes Factors for comparing nested linear models
competing theories will resolve the circularity. Jefferys
work as a sort of ‘automatic’ Ockham’s Razor, select-
and Berger6 call hypothesis H0 ‘simpler’ than the
ing the simpler model ‘whenever there is nothing to be
lost by so doing’ (Ref 9, p. 216). Some procedures for alternative H1 if it makes sharper predictions about
variable selection, such as the LASSO,10 enforce parsi- future data observations. Their reasoning is that a
mony by shrinking small parameter estimates to zero, more complex model will usually be able to explain a
thereby creating so-called ‘sparse’ models. Informally, wider variety of possible outcomes, whereas a simpler
these procedures and criteria embody the notion that model is more narrowly focused. Consequently, also,
‘simpler models should be preferred until the data jus- a simpler model is easier to invalidate; all that
tify more complex models’ (Ref 11, p. 349), a version is required are observed data that contradict the
of Ockham’s Razor.11 That is, if two competing mod- prediction of that model (and there will be many such
els both fit the data equally well (an idea that in and of possibilities). On the other hand, if data are observed
itself perhaps needs further consideration), we should that match the prediction of the simpler model, its
prefer the simpler one, which typically will have fewer credibility is greatly enhanced, even if the observed
parameters. The objection of Wears and Lewis5 is also data are consistent with the more complicated model
addressed by Balasubramanian’s formulation, because as well. In probability terms, the simpler model puts
the latter would require including any variables of high probability on a small subset of the space of
potential explanatory power; a model that excluded possible (future) observations; the complex model
important predictor variables presumably would not distributes the probability more evenly. These can
fit the data as well as one that included those variables. be thought of as the prior probablities in a Bayesian
Hence, it is again evident that agreement upon what formulation. Combined with models for the observed
is meant by ‘simple’ in any given context is crucial for
data, posterior probabilities for the two hypotheses
reconciling apparent discrepancies in the scientific use
can be calculated using Bayes’ rule. And, further, as
of the Razor.
discussed by Jefferys and Berger, the simpler model
will often have higher posterior probability, so that
OCKHAM’S RAZOR AND BAYESIAN a comparison using, say, Bayes Factors, would lead
STATISTICS to favoring the simpler explanation. Similar results
will hold in many settings even if the two competing
From early days, apparently dating back to the work hypotheses have the same prior probabilities; that is,
of Jeffreys (see, for example, Ref 12), connections even with no a priori preference for one explanation
have been drawn between Ockham’s Razor and over the other, Bayesian considerations will often
the Bayesian approach to statistical analysis. These
lead to selection of the more parsimonious one.
connections are at a philosophical as well as a
Following Jefferys and Berger,6 there are three
computational level, and were elaborated on by
Bayesian interpretations of Ockham’s Razor: (1) as
Jefferys and Berger,6 Balasubramanian,11 and others.
The context for the connection is again framed in in Jeffreys, to set prior probabilities under the logic
terms of selecting between competing models or that experience has shown that simpler theories are
hypotheses for the state of nature, but now one is more likely to be correct; (2) from the modeling
assumed to be (for all practical purposes) true. perspective, whereby simple models (those with fewer
As a first attempt at framing Ockham’s Razor parameters) make sharper inferences and therefore
from a Bayesian viewpoint, one might take the will have higher posterior probabilities; (3) from a
approach that the simpler model is preferable because model selection perspective (Bayesian or frequentist),
it has higher prior probability. While this is appealing in which criteria such as AIC and BIC favor simpler
intuitively, tries to provide an answer to the question models.
REFERENCES
1. Spade PV. William of Ockham. In: Zalta EN 7. Akaike H. Information theory and an extension of the
(ed.). The Stanford Encyclopedia of Philosophy maximum likelihood principle. In Petrov BN, Czaki F,
(Fall 2008 Edition) 2008, http://plato.stanford.edu/ eds. Second International Symposium on Information
archives/fall2008/entries/ockham. Theory. Budapest: Akademiai Kiado; 1973, 267–281.
2. Stigler SM. Stigler’s law of eponymy. Trans New York 8. Schwarz G. Estimating the dimension of a model. Ann
Acad Sci, Ser 2 1980, 39:147–158. Stat 1978, 6:461–464.
3. Thorburn WM. The myth of Occam’s razor. Mind 9. Smith AFM, Spiegelhalter DJ. Bayes factors and choice
1918, 27:345–353. criteria for linear models. J R Stat Soc., Ser A 1980,
4. Keele R. Walter Chatton. In: Zalta EN (ed.) The 42:213–220.
Stanford Encyclopedia of Philosophy (Fall 2008 Edi- 10. Tibshirani R. Regression shrinkage and selection via
tion) 2008, http://plato.stanford.edu/archives/fall2008/ the LASSO. J R Stat Soc., Ser A 1996, 58:267–288.
entries/walter-chatton. 11. Balasubramanian V. Statistical inference, Occam’s
5. Wears RL, Lewis RJ. Statistical models and Occam’s razor, and statistical mechanics on the space of proba-
razor. Acad Emerg Med 1999, 6:93–94. bility distributions. Neural Comput 1997, 9:349–368.
6. Jefferys WH, Berger JO. Sharpening Ockham’s razor 12. Robert CP, Chopin N, Rousseau J. Harold Jeffreys’
on a Bayesian strop. Technical Report # 91-44C. Pur- Theory of Probability revisited. Stat Sci 2009,
due University, 1991. 24:141–172.