Breakthrough mathematics, fundamental research and ideas

During my time on the train last week, I read some of the book ‘God Created the Integers: The Mathematical Breakthroughs That Changed History’, edited by Stephen Hawking, and several free hotel newspapers: the Globe and Mail, the Toronto Star and the National Post. This served as a supplement to my general musings on how to be more imaginative in my research and the innovation agenda.

The book title is based on a quote by the nineteenth-century mathematician Leopold Kronecker; in full, ‘God created the integers. All the rest is the work of Man.’ The quote speaks to the fact that modern mathematics is a magnificent outgrowth of the most humble beginnings: the whole numbers. The book starts by quoting Euclid as writing that “The Pythagoreans… having been brought up in the study of mathematics, thought that things are numbers… and the whole cosmos is a scale and a number.” In the first chapter, what caught my interest was the Pythagorean cult and their treatment of mathematical results such as the square root of 2 being irrational:

 The Pythagoreans carefully guarded this great discovery [irrational numbers] because it created a crisis that reached to the very roots of their cosmology. When the Pythagoreans learned that one of their members had divulged the secret to someone outside their circle they quickly made plans to throw the betrayer overboard and drown him while on the high seas. – p. 3

Next, I read the Intellectual Property supplement to the National Post, and in reading about intellectual property, I noted that priority in developing new technologies such as Google Glass* is protected by patents, yet throwing people overboard to protect new advances in fundamental research is no longer appropriate. In fact, amongst scientists, insights and new results are freely shared. Arguably, as a consequence, advances in fundamental research have no market value – if they are keenly given away to anyone, of any company, of any country (or so my reasoning goes).

Back to the book: the next chapters covered Archimedes, Diophantus, Rene Descartes, Isaac Newton and Leonhard Euler. Despite making advances in fundamental research, some of these mathematicians also worked on very applied projects: Archimedes on identifying counterfeit coins, and Euler on numerous projects including how to set up ship masts, correcting the level of the Finow canal, advising the government on pensions, annuities and insurance, supervising work on a plumbing system, and the Seven Bridges of Konigsberg Problem. With regard to the Seven Bridges of Konigsberg Problem,

Euler quickly realized he could solve the problem of the bridges simply by enumerating all possible walks that crossed bridges no more than once. However, that approach held no interest to him. Instead, he generalized the problem… – p. 388

On the shoulders of giants – perhaps (perhaps necessary, but not sufficient). Irrespective of the boost: uncommonly brilliant and arguably unmatched. Photo sourced from Andrew Dunn (http://www.andrewdunnphoto.com/)

… and perhaps that quote speaks to the tension in advancing applied research at the expense of fundamental research.

In reading the book, so far I’m most impressed by Newton**. How on earth did he think of that? By studying pendulums on earth he arrives at a mechanistic model of planetary motion? Swinging pendulums and falling apples? Swinging and thudding? This doesn’t naturally evoke ideas of elliptical motion for me, let alone that these events over such small distances are generalizable to a cosmic scale. Setting that aside, and continuing to generalize: every object I have ever pushed has… stopped. Yet, according to Newton’s first law, when it comes to objects in motion, earthly observations are the exception to the rule (not generalizable), and it takes an extra twist (external forces) to explain why, on earth, things always stop. Generalize for the universal theory of gravity; don’t generalize for the first law. I find it so non-obvious! And consequently, I’m so very impressed.

Footnotes

*Google is amazing **and Newton, much more so.


How to not make bad models

Levin (1980)* is a concise and insightful discussion of where mathematical modelling can go wrong. It is quite relevant to my investigation of The Art of Mathematical Modelling and does a nice job of addressing my ‘why make models?’ question.

Vito Volterra is referred to as the father of mathematical biology in Levin (1980).

This paper answered one of the questions that I had long been wondering about: who is considered to be the father of mathematical biology? Levin’s answer is Vito Volterra** – at least for mathematical biologists who come from a mathematical background. Levin then says that modern-day mathematical biologists, as the descendants of Vito Volterra, lack his imagination, too often investigating special cases or making only small extensions of existing theory. It’s a fair point, but thinking takes time and time is often in short supply. My take on Levin’s comment is ‘aspire to be imaginative, but remember to be productive too’. Furthermore, Levin identifies one of the ingredients that make great models great: imagination – I’m adding that to my notes.

A second piece of advice is that mathematical models that make qualitative predictions are more valuable than those that make quantitative predictions. Levin’s reasoning is that ‘mathematical ecology as a predictive science is weaker than as an explanatory science because ecological principles are usually empirical generalizations that sit uneasily as axioms.’ That is quite eloquent – but is it really quite that simple? For example, if you make a quantitative prediction with a stated level of confidence (i.e., error bars) is that really that much worse than making a qualitative prediction? The sentiment of the quote appears to be to not overstate the exactness of the conclusions, but to me this seems equally applicable to quantitative or qualitative models.

Levin coins the phrase ‘mathematics dressed up as biology’. I have my own version of that: I like to say ‘that’s just math and a story’. Both phrases are for use whenever there are weak links between the empirical observations and the model structure.

To conclude, this paper discusses why the different approaches of biologists and mathematicians to problem solving can result in mathematicians who are keen to analyze awkwardly derived models and in biologists who lack an appreciation for the mathematician’s take on a cleanly formulated problem. Rather than discussing what makes great models great, Levin’s paper reads like advice on how not to make bad models, and because it’s so hard to distill the essence of good models, looking at the art of mathematical modelling from that angle is a constructive line of inquiry.

References

Levin, S.A. 1980. Mathematics, ecology and ornithology. Auk 74: 422-425.

Footnotes

*Suggested by lowendtheory, see Crowdsourcing from Oikos blog.

**Do you agree? For me, if this is true then the timing is interesting: Vito Volterra (1926), Ronald Ross (1908), Michaelis-Menten (1913), P.F. Verhulst (1838), JBS Haldane (1924) and the Law of Mass Action dates to 1864.

Levin also hits on several items from my ‘why make models’ list and so I have updated that post.

Interpreting probabilities

Twice as a student my professors off-handedly remarked that the parameterization of probabilistic models for real world situations lacked a sound philosophical basis. The first time I heard it, I figured if I ignored it maybe it would go away. Or perhaps I had misheard. The second time it came up, I made a mental note that I should revisit this at a later date. Let’s do this now.

The question is how we should interpret a probability. So, for example, if I want to estimate the probability that a coin will land heads on a single toss, how should I construct the experiment? My professors had said that there was no non-circular real-world interpretation of what a probability is. At the time, this bothered me because I think of distributions like the Binomial distribution as the simplest type of mathematical model: the models with the best predictive abilities and the most reasonable assumptions. Models in mathematical biology, on the other hand, are usually quite intricate, with assumptions that are a lot less tractable. My thinking was that if it was impossible to estimate the probability that a coin lands heads on solid philosophical grounds, then there was no hope for me, trying to estimate parameters for mathematical models in biology.

Upon further investigation, I’m now not so sure. Below I provide Elliott Sober’s discussion of some of the different interpretations of probabilities (pp. 61-70).

1. The relative frequency interpretation. A probability can be interpreted in terms of how often the event happens within a population of events, i.e., a coin that has a 0.5 probability of landing heads on a single toss will yield 50 heads on 100 tosses.

My view: This interpretation is not good because it’s not precise enough: a fair coin might very well not yield exactly 50 heads on 100 tosses (a quick calculation below makes this concrete).
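To make that concrete – this is my own quick check, not something from Sober – under the Binomial model the chance that a fair coin gives exactly 50 heads in 100 tosses is only about 8%:

```python
from math import comb

# Probability that a fair coin gives exactly 50 heads in 100 tosses (Binomial pmf)
p_exact = comb(100, 50) / 2**100
print(round(p_exact, 4))  # ~0.0796, i.e. "exactly 50 heads" fails more than 90% of the time
```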

2. Subjective interpretation. A probability describes the ‘degree of belief that a certain character is true’, i.e., the probability describes the degree of belief we have that the coin will land heads before we toss it.

My view: conceptually, regarding how we interpret probabilities with respect to future events, this is a useful interpretation, but this is not a ‘real world’ interpretation and it doesn’t offer any insight into how to estimate probabilities.

3. Hypothetical relative frequency interpretation. The definition of the probability, p, is,

Pr(|f − p| > ε) → 0 as the number of trials, n, goes to infinity, for all ε > 0,

where f is the proportion of successes for n trials. Sober says this definition is circular because a probability is defined in terms of a probability converging to 0.

My view: This is a helpful conceptual interpretation of what a probability is, but again it’s unworkable as a real-world definition because it requires an infinite number of trials (the short simulation below illustrates the role of the limit).
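As an aside – my own illustration, not Sober’s – a small Monte Carlo sketch shows how interpretation 3 behaves at finite n: the probability that the relative frequency strays more than ε from p shrinks as n grows, but only the limit takes it to zero. The choices of ε = 0.05, the 20,000 replicate batches and the seed are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
p, eps, batches = 0.5, 0.05, 20_000

for n in (10, 100, 1_000, 10_000):
    # Relative frequency of heads in each of `batches` experiments of n tosses
    f = rng.binomial(n, p, size=batches) / n
    # Monte Carlo estimate of Pr(|f - p| > eps); it shrinks toward 0 as n grows
    print(n, np.mean(np.abs(f - p) > eps))
```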

4. Propensity interpretation. Characteristics of the object can be interpreted as translating into probabilities. For example, if the coin has equally balanced mass then it will land heads with probability 0.5. Sober says that this interpretation lacks generality and that ‘propensity’ is just a renaming of the concept of probability and so this isn’t a helpful advance.

My view: This is a helpful real world definition as long as we are able to produce a mechanistic description that can be recast in terms of the probability we are trying to estimate.

So far I don’t see too much wrong with 2-4 and I still think that I can estimate probabilities from data. Perhaps the issue is that Sober wants to understand what a probability is and I just want to estimate a probability from data; our goals are different.

I would go about my task of parameter estimation using maximum likelihood. The likelihood function tells me how likely it is that a parameter (which could be a probability) is equal to a particular value, given the data. The likelihood isn’t a probability, but I can generate confidence intervals for my parameter estimates given the data, and similarly, I could generate estimates of the probabilities for different estimates of the parameter. In terms of Sober’s question – understanding what a probability is – I now have a probability of a probability, and so maybe I’m no further ahead (this is the circularity mentioned in 3). However, for estimating my parameter this is not an issue: I have a parameter estimate (which is a probability) and a confidence interval (which was generated by a probability density).
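As a concrete sketch of that workflow (my own toy example, not from Sober): suppose a hypothetical 62 heads are observed in 100 tosses. The maximum likelihood estimate of p is just the sample proportion, and a Wald-type 95% confidence interval follows from the curvature of the log-likelihood; the data and the choice of interval are assumptions on my part.

```python
from math import sqrt

heads, n = 62, 100   # hypothetical data: 62 heads observed in 100 tosses
p_hat = heads / n    # maximum likelihood estimate of Pr(heads)

# Wald 95% confidence interval (one of several options), based on the
# curvature (observed information) of the Binomial log-likelihood at the maximum
se = sqrt(p_hat * (1 - p_hat) / n)
ci = (p_hat - 1.96 * se, p_hat + 1.96 * se)

print(p_hat, ci)     # 0.62, roughly (0.52, 0.72)
```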

Maybe… but I’m becoming less convinced that there really is a circularity in 3 in terms of understanding what a probability is. I think f(x)=f(x) is a circular definition, but f(f(x)) just requires applying the function twice. It’s a nested definition, not a circular definition. So which is this?

Word for word, this is Sober’s definition:

P(the coin lands heads | the coin is tossed) = 0.5 if, and only if, P(the frequency of heads = 0.5 ± ε | the coin is tossed n times) = 1 in the limit as n goes to infinity,

which he then says is circular because ‘the probability concept appears on both sides of the if-and-only-if’. It is the same probability concept, but strictly speaking, the probabilities on either side refer to different events. So, while that might not work for understanding the concept of probability, the definition is helpful for estimating probabilities from relative frequencies, provided we can work around the issue of not being able to conduct an infinite number of trials. For me, that’s where the likelihood framework helps: given a finite number of trials, for most situations of interest we won’t be able to estimate the parameter with 100% certainty, and so we need to apply our understanding of what a probability is a second time to reach our understanding of our parameter estimate.

But is that really a circular definition?

I’m not an expert on this, I just thought it was interesting. Is anyone familiar with these arguments?

References

Sober, E. 2000. Philosophy of biology, 2nd ed. Westview Press, USA.

Snail shells: the logarithmic spiral

There’s a great post up on The Atavism by David Winter where he explains why the shape of the snail’s shell is a logarithmic spiral. What I find interesting is that the shell has the logarithmic spiral shape under two assumptions:

  • The radius of the shell increases exponentially as the snail ages (i.e., the rate of radial growth is proportional to the shell radius) and,
  • The angle of the line from the centre of the shell to its growing edge increases at a constant rate – like the second hand of a clock, which moves at a constant rate of 360 degrees per minute (a short derivation follows this list).
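For what it’s worth, here is a minimal derivation of why those two assumptions give a logarithmic spiral; the symbols k (the relative growth rate of the radius), c (the constant angular rate) and r0 (the starting radius) are my own notation, not David Winter’s.

```latex
\documentclass{article}
\usepackage{amsmath}
\begin{document}
With $r(t)$ the shell radius and $\theta(t)$ the winding angle, the two assumptions read
\[ \frac{dr}{dt} = k\,r \qquad \text{and} \qquad \frac{d\theta}{dt} = c . \]
Dividing the two rates eliminates time,
\[ \frac{dr}{d\theta} = \frac{k}{c}\, r , \]
and separating variables and integrating gives the logarithmic spiral
\[ r(\theta) = r_0\, e^{(k/c)\,\theta} . \]
\end{document}
```

The ratio k/c controls how quickly the spiral opens out with each turn.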

Does anyone know what kind of snail this is? Photo credit: Robyn Hurford

I was hoping that we could use this simple model to do a couple of quick experiments.

Firstly, can we validate the hypothesis that snail shells are logarithmic spirals using something like a Turing test?

That is, among a set of spirals are blog-readers going to pick out the logarithmic spiral as being snail-like?

Secondly, since the art of model derivation is subjective, I want to solicit opinions on this particular model derivation – to see if everyone has the same instinctive appraisal of the model assumptions or if there is a range of different tastes on the matter.

… and after you’ve voted be sure to go check out The Atavism for a nice explanation and some great snail pictures!

Getting the most information from your models: 6 keys to model selection

Today’s post is a guest blog by Shawn Leroux. Shawn is a postdoctoral fellow at the University of Ottawa and he’s going to write about model selection. Model selection techniques, and in particular the Akaike Information Criterion (AIC), weigh the trade-off between goodness of fit and the number of parameters – exactly the type of consideration that goes into choosing a model that is Just Simple Enough. Take it away, Shawn!

————————–

I just got back from attending a model selection workshop delivered by Dr. David Anderson (yes, from Burnham & Anderson 2002) and organized by the Quebec Center for Forest Research. I have been using the information-theoretic approach for several years, but Dr. Anderson provided some insights that I thought would be useful to share. Below are the six key messages I am taking home from the workshop.

  1. Think, think, think. Seems obvious, but it cannot be over-emphasized. Model selection is useless if the candidate set of models is not carefully chosen and represented with appropriate mathematical models. Candidate models should be justified and derived a priori.
  2. Adjusted R2 should not be used for model selection. Burnham & Anderson (2002; pp. 94-96) present convincing evidence for this. Below is a subset of their Table 2.1. The table presents two (of nine) a priori models of avian species-accumulation curves from the Breeding Bird Survey (from Flather 1996). The table includes the model, number of parameters (K), delta AIC, Akaike weights and adjusted R2. If we simply consider the adjusted R2, we conclude that both models are excellent fits to the data. However, model selection based on AIC shows that Model 1 is poor relative to Model 2. In fact, the evidence ratio (see pt 3 below) for Model 2 vs Model 1 is 3.0 × 10^35! There is little model selection uncertainty (see pt 4 below) in this two-model set. A quick look at the residuals or plots of observed vs predicted values for both models helps to understand why adjusted R2 can be misleading for model selection.
  3. Use evidence ratios. Evidence ratios are a concise way to quantify model selection uncertainty (see pt 4 below) and the weight of evidence for each model. A quick approximation for the evidence ratio of the Best Model (the model with the highest Akaike weight) vs Model 2 is given by the Akaike weight of the Best Model divided by the Akaike weight of Model 2. More formally, the evidence ratio of model i over model j is exp(−Δi/2) / exp(−Δj/2), where Δ is a model’s AIC difference from the best model in the set.
  4. Akaike weights provide a measure of model selection uncertainty. An Akaike weight is the “probability that model i is the actual (fitted) Kullback-Leibler best model in the set” (Anderson 2008, p. xix). We have high model selection uncertainty if more than a couple of models in our set carry appreciable Akaike weight (e.g., Akaike weights for five models in a five-model candidate set are 0.4, 0.2, 0.175, 0.125, 0.1). We have low model selection uncertainty if all or most of the weight lies in one model (e.g., Akaike weights for five models in a five-model candidate set are 0.95, 0.0, 0.05, 0.0, 0.0). Model averaging (see pt 5 below) should be done if you have high model selection uncertainty.
  5. Use multimodel inference. Usually we make inference from the estimated best model in our candidate set of models. However, if we thought hard about which models to include in our a priori set and we have some model selection uncertainty (see pt 4 above), then many models may include precious information that can be used. If we use only the estimated best model, we may be throwing away useful information contained in other models in our set. Multimodel inference allows us to gain information from all models in our set. Model averaging predictions and model averaging parameters within models are two useful methods for multimodel inference. Model averaging for prediction is a weighted average of the predictions (Y) from each of the n models (Anderson 2008, p. 108): Y_avg = Σ ωi Yi, where ωi is the Akaike weight of model i and Yi its prediction (a small numerical sketch follows this list). Model averaging parameters within models is done similarly, except we average parameter estimates instead of model predictions.
  6. Beware of pretending variables! Pretending variables occur when an unrelated variable enters the model set with a ΔAIC ~ 2, therefore causing us to consider this model to be a “good” model. We know we have a pretending variable when adding this variable does not change the deviance. If the deviance has not changed, we have not improved the fit of the model with the added parameter. Parameter estimates and confidence intervals around these estimates should be investigated to confirm the presence of a pretending variable. Pretending variable parameter estimates are usually ~ 0 with large confidence intervals. Pretending variables can skew Akaike weights by increasing model selection uncertainty and may bias multimodel inference, so they should be removed from a candidate model set.
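To make points 3-5 concrete, here is a small numerical sketch in Python. The AIC values and per-model predictions are invented purely for illustration; the formulas (Δi = AICi − AICmin, wi ∝ exp(−Δi/2), evidence ratios as ratios of weights, and a weight-averaged prediction) follow Burnham & Anderson (2002) and Anderson (2008).

```python
import numpy as np

# Hypothetical AIC values for a five-model candidate set (illustration only)
aic = np.array([102.3, 104.1, 110.7, 111.2, 118.9])

delta = aic - aic.min()          # AIC differences from the best model
w = np.exp(-0.5 * delta)
w /= w.sum()                     # Akaike weights (sum to 1)

evidence_vs_best = w.max() / w   # evidence ratio of the best model vs. each model

# Model-averaged prediction: Akaike-weight-averaged predictions of the five models
y_hat = np.array([3.1, 3.4, 2.8, 2.9, 4.0])   # hypothetical per-model predictions
y_avg = float(np.sum(w * y_hat))

print(np.round(w, 3), np.round(evidence_vs_best, 1), round(y_avg, 2))
```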

Thanks to Dr. Anderson for leading this workshop and to Marc Mazerolle at CEF for organizing it.

References:

Anderson, D.R. 2008. Model based inference in the life sciences: A primer on evidence. Springer, New York.

Burnham, K.P. & Anderson, D.R. 2002. Model selection and multimodel inference: A practical information-theoretic approach, 2nd Ed. Springer, New York.

Flather, C.H. 1996. Fitting species-accumulation functions and assessing regional land use impacts on avian diversity. Journal of Biogeography 23: 155-168.