Testing mass-action

UPDATE: I wrote this, discussing that I don’t really know the justification for the law of mass action, however, comments from Martin and Helen suggest that a derivation is possible using moment closure/mean field methods. I recently found this article:

Use, misuse and extensions of “ideal gas” models of animal encounter. JM Hutchinson, PM Waser. 2007. Biological Reviews. 82:335-359.

I haven’t have time to read it yet, but from the title it certainly sounds like it answers some of my questions.


Yesterday, I came across this paper from PNAS: Parameter-free model discrimination criterion based on steady-state coplanarity by Heather A. Harrington, Kenneth L. Ho, Thomas Thorne and Michael P.H. Strumpf.

The paper outlines a method for testing the mass-action assumption of a model without non-linear fitting or parameter estimation. Instead, the method constructs a transformation of the model variables so that all the steady-state solutions lie on a common plane irrespective of the parameter values. The method then describes how to test if empirical data satisfies this relationship so as to reject (or fail to reject) the mass-action assumption. Sounds awesome!

One of the reasons I like this contribution is that I’ve always found mass-action to be a bit confusing, and consequently, I think developing simple methods to test the validity of this assumption is a step in the right direction.  Thinking about how to properly represent interacting types of individuals in a model is hard because there are lots of different factors at play (see below). For me, mass-action has always seemed a bit like a magic rabbit from out of the hat; just multiply the variables; don’t sweat the details of how the lion stalks its prey; just sit back and enjoy the show.

Figure 1. c x (1 Lion x 1 Eland) = 1 predation event per unit time where c is a constant.

Before getting too far along, let’s state the law:

Defn. Let x_1 be the density of species 1, let x_2 be the density of species 2, and let f be the number of interactions that occur between individuals of the different species per unit time. Then, the law of mass-action states that f \propto x_1 \times x_2.

In understanding models, I find it much more straight forward to explain processes that just involve one type of individual – be it the logistic growth of a species residing on one patch of a metapopulation, or the constant per capita maturation rates of juveniles to adulthood. It’s much harder for me to think about interactions: infectious individuals that contact susceptibles, who then become infected, and predators that catch prey, and then eat them. Because in reality:

Person A walks around, sneezes, then touches the door handle that person B later touches; Person C and D sit next to each other on the train, breathing the same air.

There are lots of different transmission routes, but to make progress on understanding mass-action, you want to think about what happens on average, where the average is taken across all the different transmission routes. In reality, also consider that:

Person A was getting a coffee; Person B was going to a meeting; and Persons C and D were going to work.

You want to think about averaging over all of a person’s daily activities, and as such, all the people in the population might be thought of as being uniformly distributed across the entire domain. Then, the number of susceptibles in the population that find themselves in the same little \Delta x as an infectious person is probably \beta S(t) \times I(t).

Part of it is, I don’t think I understand how I am supposed to conceptualize the movement of individuals in such a population. Individuals are going to move around, but at every point in time the density of the S’s and the I’s still needs to be uniform. Let’s call this the uniformity requirement. I’ve always heard that a corollary of the assumption of mass-action was an assumption that individuals move randomly. I can believe that this type of movement rule might be sufficient to satisfy the uniformity requirement, however, I can’t really believe that people move randomly, or for that matter, that lions and gazelles do either.  I think I’d be more willing to understand the uniformity requirement as being met by any kind of movement where the net result of all the movements of the S’s, and of the I’s, results in no net change in the density of S(t) and I(t) over the domain.

That’s why I find mass-action a bit confusing. With that as a lead in:

How do you interpret the mass-action assumption? Do you have a simple and satisfying way of thinking about it?


Related reading

This paper is relevant since the author’s derive a mechanistic movement model and determine the corresponding functional response:

How linear features alter predator movement and the functional response by Hannah McKenzie, Evelyn Merrill, Raymond Spiteri and Mark Lewis.

Do we need to derive mechanistic error distributions for deterministic models?

In fitting mathematical models to empirical data, one challenge is that deterministic models make exact predictions and empirical observations usually do not match up perfectly. Changing a parameter value may reduce the distance between the model predictions and some of the data points, but increase the distance to others. To estimate the best-fit model parameters, it is necessary to assign a probability to deviations of a given magnitude. The function that assigns these probabilities is called the error distribution. In this post, I ask:

Do mechanistic, deterministic mathematical models necessarily have to have error distributions that are mechanistically derived?

One of the simplest approaches to model fitting is to use a probability density function, such as the normal or Poisson distribution, for the error function and to use a numerical maximization algorithm to identify the best-fit parameters. The more parameters there are to estimate the more time consuming this numerical search becomes, but in most cases this approach to parameter estimation is successful.

In biology, the processes that give rise to deviations between model predictions and data are measurement error, process error, or both. Some simple definitions are:

  • Measurement error: y(t) = f(x(t), b) + e
  • Process (or demographic) error: y(t) = f(x(t)+e, b)

where x(t) is a variable, such as the population size at time t, b is a vector of parameters, f(x,b) is the solution to the deterministic model, e is the error as generated by a specified probability density function, and y(t) is the model prediction including the error. As examples, counting the number of blue ducks each year might be subject to measurement error if a major source of error is in correctly identifying the colour of the duck, whereas extreme weather events that affect duckling survivorship are a source of process error.

In the simple approach described above, to keep it simple, I intended to implement the measurement error formulation of the full model. Under this formulation, many of the probability density functions that might be chosen as the error distribution have a process-based interpretation. For example, the normal distribution arises if (1) there are many different types of measurement errors, (2) these errors arise from the same distribution, and (3) total measurement error is the sum of all the errors. In biological data, all of that might be true, to some degree, but in general this explanation is likely incomplete.

A second justification of the simple approach, could be that the error distribution is not intended to be mechanistic, and here, the normal distribution is simply a function that embodies the necessary characteristics – it’s a decreasing function of the absolute value of the deviation. But if you have derived a mechanistic deterministic model, is it really okay to have an error distribution that isn’t justified on mechanistic grounds? Does such an error distribution undermine the mechanistic model formulation to the point where you might as well have started with a more heuristic formulation of the whole model? Would this be called semi-mechanistic – if the model is mechanistic, but the error distribution is heuristic?

If this all seems like no big deal, consider that measurement error does not compound under the effect of the deterministic model, while process error does. When only measurement error operates the processes occur as hypothesized and only the measurements are off. When process error occurs – slightly higher duck mortality than average – there are fewer breeding ducks in the next year, and this change feeds back into the process affecting the predictions made for future years. This makes model fitting to y(t) quite difficult. This is because model fitting is easier when the model and the error can be separated so that numerical methods for solving deterministic models can be used. If the error and the model can not be disentangled then fitting to y(t) will usually involve solving a stochastic model of some sort, which is more difficult, and more time consuming.

An easier alternative for the process error formulation, is to fit using gradient matching. This is because deterministic models are usually differential equations, f'(t) = g(x(t),b). Let z(t) be a transformation of the data, such that, z(t) = [y(t+Δt)-y(t)]/Δt, then we can fit the model as z(t) = g(x(t),b) +e1 where e1 are deviations between the empirical estimate of the gradient and the gradient as predicted by the model. Derivations from the model predicted gradient can be viewed as errors in the model formation or error that arises due to variation in the processes described by the model. If we have a mixture of measurement error and process error then we could do something nice like generalized profiling.

Anyway, this all has been my long-winded response to a couple of great posts about error at Theoretical Ecology by Florian Hartig. I wanted to revisit Florian’s question ‘what is error?’ Is error stochasticity? The latter would mean that e is a random variable, and I have a hard time imagining any good reason why e would not be a random variable. However, I think there are more issues to resolve if we want to understand how to define error. Specifically, how do we decide which processes are subsumed under f(x(t)) and which go under e? Is this a judgment call or should all the deterministic processes be part of f(x(t),b) and all the stochastic processes be put into e and therefore be considered error?

Interpreting probabilities

Twice as a student my professors off-handedly remarked that the parameterization of probabilistic models for real world situations lacked a sound philosophical basis. The first time I heard it, I figured if I ignored it maybe it would go away. Or perhaps I had misheard. The second time it came up, I made a mental note that I should revisit this at a later date. Let’s do this now.

The question is how should we interpret a probability. So for example, if I want to estimate the probability that a coin will land heads on a single toss how should I construct the experiment? My professors had said that there was no non-circular real world interpretation of what a probability is. At the time, this bothered me because I think of distributions like the Binomial distribution as the simplest types of mathematical models; the mathematical models with the best predictive abilities and with the most reasonable assumptions. Models in mathematical biology, on the other hand, are usually quite intricate with assumptions that are a lot less tractable. My thinking was that if it was impossible to estimate the probability that a coin lands heads on solid philosophical grounds then there was no hope for me, trying to estimate parameters for mathematical models in biology.

Upon further investigation, now I’m not so sure. Below I provide Elliot Sober’s discussion of some of the different interpretations of probabilities (p.61-70).

1. The relative frequency interpretation. A probability can be interpreted in terms of how often the event happens within a population of events, i.e., a coin that has a 0.5 probability of landing heads on a single toss will yield 50 heads on 100 tosses.

My view: This interpretation is not good because it’s not precise enough: a fair coin might very well not yield 50 heads on 100 tosses.

2. Subjective interpretation. A probability describes the ‘degree of belief that a certain character is true’, i.e., the probability describes the degree of belief we have that the coin will land heads before we toss it.

My view: conceptually, regarding how we interpret probabilities with respect to future events, this is a useful interpretation, but this is not a ‘real world’ interpretation and it doesn’t offer any insight into how to estimate probabilities.

3. Hypothetical relative frequency interpretation. The definition of the probability, p, is,

Pr(|f-p|>ε)=0 in the limit as the number of trials, n, goes to infinity for all ε>0,

where f is the proportion of successes for n trials. Sober says this definition is circular because a probability is defined in terms of a probability converging to 0.

My view: This is a helpful conceptual interpretation of what a probability is, but again it’s unworkable as a real world definition because it requires an infinite number of trials.

4. Propensity interpretation. Characteristics of the object can be interpreted as translating into probabilities. For example, if the coin has equally balanced mass then it will land heads with probability 0.5. Sober says that this interpretation lacks generality and that ‘propensity’ is just a renaming of the concept of probability and so this isn’t a helpful advance.

My view: This is a helpful real world definition as long as we are able to produce a mechanistic description that can be recast in terms of the probability we are trying to estimate.

So far I don’t see too much wrong with 2-4 and I still think that I can estimate probabilities from data. Perhaps the issue is that Sober wants to understand what a probability is and I just want to estimate a probability from data; our goals are different.

I would go about my task of parameter estimation using maximum likelihood. The likelihood function will tell me the how likely it is likelihood that a parameter (which could be a probability) is equal to a particular value given the data. The likelihood isn’t a probability, but I can generate confidence intervals for my parameter estimates given the data, and similarly, I could generate estimates of the probabilities for different estimates of the parameter. In terms of Sober’s question, understanding what a probability is, I now have a probability of a probability, and so maybe I’m no further ahead (this is the circularity mentioned in 3.). However, for estimating my parameter this is not an issue: I have a parameter estimate (this is a probability) and a confidence interval (that was generated by a probability density).

Maybe… but I’m becoming less convinced that there really is a circularity in 3 in terms of understanding what a probability is. I think f(x)=f(x) is a circular definition, but f(f(x)) just requires applying the function twice. It’s a nested definition, not a circular definition. So which is this?

Word for word, this is Sober’s definition:

P(the coin lands heads | the coin is tossed) = 0.5 if, and only if, P(the frequency of heads = 0.5 ± ε | the coin is tossed n times) = 1 in the limit as n goes to infinity,

which he then says is circular because ‘the probability concept appears on both sides of the if-and-only-if’. It is the same probability concept, but strictly speaking, the probabilities on either side refer to different events and so while that might not work to understand the concept of probability, that definition is helpful for estimating probabilities from relative frequencies if we can only work around the issue of not being able to conduct an infinite number of trials. But for me, that’s how the likelihood framework helps: given a finite number of trials, for most situations we might be interested in we won’t be able to estimate the parameter with 100% certainty and so we need to apply our understanding of what a probability is a second time to reach our understanding of our parameter estimate.

But is that really a circular definition?

I’m not an expert on this, I just thought it was interesting. Is anyone familiar with these arguments?


Sober, E. 2000. Philosophy of biology, 2 ed. Westview Press, USA.

Snails shells: the logarithmic spiral

There’s a great post up on The Atavism by David Winter where he explains why the shape of the snail’s shell is a logarithmic spiral. What I find interesting is that the shell has the logarithmic spiral shape under two assumptions:

  • The radius of the shell increases exponentially as the snail ages (i.e., the rate of radial growth is proportional to the shell radius) and,
  • The angle between the centre of the snail and the end of the shell changes at a constant rate – like the second hand of a clock which moves at a constant rate of 360-degrees per minute.

Does anyone know what kind of snail this is? Photo credit: Robyn Hurford

I was hoping that we could use this simple model to do a couple of quick experiments.

Firstly, can we validate the hypothesis that snail shells are logarithmic spirals using something like a Turing test?

That is, among a set of spirals are blog-readers going to pick out the logarithmic spiral as being snail-like?

Secondly, since the art of model derivation is subjective, I want to solicit opinions on this particular model derivation – to see if everyone has the same instinctive appraisal of model assumptions or if there are a range of different tastes on the matter.

… and after you’ve voted be sure to go check out The Atavism for a nice explanation and some great snail pictures!

Blog challenge: Sheep cyclone

In relation to multiscale modelling, a category of problem that receives a good amount of attention is understanding how collective (group) movement arises from movement rules defined at the individual level. For example, the direction of each individual’s next move might depend on where that individual’s nearest neighbours are. What type of rules are necessary for the group to stay together? If the group stays together, what does the collective movement look like (directed or meandering)?

Some nice examples are:

and more generally see here, here, here and here. So this is my question:

What type of individual movement rules are needed to produce a Sheep cyclone?

Bonus question: How close together do the parallel walls (relative to the car width) need to be?

Mechanistic models: what is the value of understanding?

My recent thinking has been shaped by my peripheral involvement in discussions between colleagues at the University of Ottawa. What I will discuss now, is the confluence of ideas expressed by several people, and I say this because these have been stimulating discussions, and I don’t want to appear as taking full credit for these ideas by virtue of this solo author blog post.


All other things being equal, mechanistic models are more powerful since they tell you about the underlying processes driving patterns. They are more likely to work correctly when extrapolating beyond the observed conditions.

-Bolker (2008) Ecological models and Data in R, p7.

My question for today is:

Given a mechanistic model and a phenomenological (or statistical) model, if we are trying to determine which model is best, shouldn’t the mechanistic model score some ‘points’ by virtue of it being mechanistic?

Assume a data set that both models are intended to describe. Define mechanistic and phenomenological as follows,

Mechanistic model: a hypothesized relationship between the variables in the data set where the nature of the relationship is specified in terms of the biological processes that are thought to have given rise to the data. The parameters in the mechanistic model all have biological definitions and so they can be measured independently of the data set referenced above.

Phenomenological/Statistical model: a hypothesized relationship between the variables in the data set, where the relationship seeks only to best describe the data.

These definitions are taken from the Ecological Detective by Ray Hilborn and Marc Mangel. Here are some additional comments from Hilborn and Mangel:

A statistical model foregoes any attempt to explain why the variables interact the way they do, and simply attempts to describe the relationship, with the assumption that the relationship extends past the measured values. Regression models are the standard form of such descriptions, and Peters (1991) argued that the only predictive models in ecology should be statistical ones; we consider this an overly narrow viewpoint.

Having defined mechanistic and phenomenological, now the final piece of the puzzle is to define ‘best’. Conventional wisdom is that mechanistic models facilitate a biological understanding, however, I think that’s only one step removed from prediction – you want to take your new found understanding and do something with it, specifically make a prediction and test it. Therefore, the goal of both mechanistic and phenomenological models are to predict and the performance of the models in this respect is referred to as model validation.

But, validation data is not always available. One reason is that if the models predict into the future, we will have to wait until the validation data appears. The other reason is that if we don’t have to wait, it’s a bit tempting to take a sneak peek at the validation data. For both model types, you want to present a model that is good given all the information available – it’s tough to write a paper where your conclusion is that your model is poor when the apparent poorness of the model can be ‘fixed’ by using the validation data to calibrate/parameterize the model (which then leaves no data to validate the model, something that, if anything, is a relief because your previous try at model validation didn’t go so well).

In absence of any validation data, one way to select the best model is using Akaike Information Criterion (AIC) (or a similar test). AIC will choose a model that fits the data well without involving too many parameters, but does AIC tell me which model is best, given my above definition of best, when comparing a mechanistic and a statistical model? Earlier this week, I said that if we wanted to settle this – which is better mechanistic or phenomenological – then we could settle it in the ring with an AIC battle-to-the-death.

As the one who was championing the mechanistic approach, I now feel like I didn’t quite think that one through. Of the set of all models that are phenomenological versus the set of all models that are mechanistic (with respect to a particular data set), it’s not rocket science to figure out which set is a subset of the other one. If one model is a relationship that comes with a biological explanation too, then you’re getting something extra than the model that just describes a relationship. Shouldn’t I get some points for that? Didn’t I earn that when I took the mechanistic approach to modelling because my options for candidate models is much more limited?

There is one way that mechanistic models are already getting points from AIC. If I did a good job of parameterizing my mechanistic model there should be few fitted parameters – hopefully even none. But is that enough of an advantage? Exactly what advantage do I want? I think what I am hoping for is related to the span of data sets that the model could then be applied to for prediction or validation. I feel pretty confident taking my mechanistic model off to another setting and testing it out, but if my model was purely statistical I might be less confident in doing so. Possibly because if my mechanistic model failed in the new setting I could say ‘what went wrong?’ (in terms of my process-based assumptions) and I’d have a starting point for revising my model. If my statistical model didn’t do so well in the new setting, I might not have much to go on if I wanted to try and figure out why.

But, if the objective is only to predict then you don’t need to know about mechanisms and so the phenomenological/statistical approach is the most direct and arguably best way of generating a good predictive model. Perhaps, what this issue revolves around is that mechanistic models make general and inaccurate predictions (i.e., the predictions might apply to a number of different settings) and that phenomenological models make accurate, narrow predictions.

Truth be known, this issue is tugging at my faith (mechanistic models), and I’m not really happy with my answers to some of the fundamental questions about why I favour the mechanistic approach, as I do. And let me say, too, that I definitely don’t think that mechanistic models are better than phenomenological models; I think that each have their place and I’m just wondering about which places those are.