Performance of chironomid-temperature transfer functions

Chironomids (non-biting midges) are sensitive to temperature (Eggermont and Heiri 2011), and because their head-capsules preserve well in lake sediments, their fossil assemblages can be used to reconstruct past temperature with a transfer function trained on the chironomid-temperature relationship in a modern calibration set of chironomid assemblages and associated temperatures. They are often used to provide an independent temperature reconstruction against which pollen- or macrofossil-inferred vegetation changes can be compared (e.g., Samartin et al. 2012).

The performance of transfer functions for reconstructing past temperatures from chironomid assemblages in lake sediments varies enormously. The root-mean-squared error of prediction (RMSEP) for transfer functions in a compilation by Eggermont and Heiri (2011), augmented by a few recent studies1, ranges between 0.55 and 2.37 °C. The lowest RMSEP suggests that chironomids give very precise reconstructions; the highest suggests that chironomids have limited utility for anything other than reconstructing glacial-interglacial changes. What causes this 4.3-fold difference in RMSEP? Understanding this might suggest where efforts to improve chironomid-temperature, and other, transfer functions should be focused.
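For concreteness, RMSEP is just the root-mean-squared difference between observed temperatures and their cross-validated predictions. A minimal base-R sketch, with made-up numbers standing in for a calibration set's observed temperatures and their leave-one-out predictions:

```r
# 'observed' and 'predicted' are made-up numbers standing in for
# calibration-set temperatures and their leave-one-out predictions
observed  <- c(10.2, 12.5, 14.1, 16.8, 18.0)
predicted <- c(11.0, 11.9, 15.2, 16.1, 18.9)

rmsep <- sqrt(mean((predicted - observed)^2))
round(rmsep, 2)
## 0.84
```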


why would anyone not trust the author????

I have already shown that ordinations of the Lake Żabińskie fossil assemblages are weirder than you ever imagined possible. Now it is time to look at figure 2 of Larocque-Tobler et al (2015) which shows an ordination of the chironomid assemblages in the modern Polish-Canadian calibration set with lakes colour-coded by temperature. A number of lakes are marked as “misplaced”, that is their temperature differs from neighbouring lakes in the ordination. This figure should be easy to replicate with the archived data.


Figure 1. Full published version of LT15 figure 2.

Before attempting to replicate this figure, I want to look at its history by examining the versions included in the manuscript that became LT15 (Disclosure: I was one of the reviewers). I understand that this could be viewed as a breach of peer review confidentiality, a policy designed to prevent harm to the authors “by premature disclosure of any or all of a manuscript’s details” (ICMJE, 2016). The duty of confidentiality is not without limitations: “Confidentiality may have to be breached” (ICMJE, 2016). In this case, since the final version of the figure is published and there is no additional information in the review versions, I maintain that the duty of confidentiality was largely discharged on publication – this is not a “premature disclosure”. Further, I argue that my duty to expose possible issues with LT15 outweighs any residual duty of confidence1. I told QSR of my plans to publish this post: they did not object.

When I first reviewed LT15, circles encompassed groups of lakes that supposedly had the same temperatures.


Figure 2. A. Principal Component Analysis (PCA) of calibration set lakes. The lakes, separated by their chironomid assemblages, were grouped by their mean August air temperature (group 1, 3-11°C; group 2, 11-16°C; group 3, 16.3-17.4°C and group 4, 17.5-27.5°C). The plot of the species scores is identical in all versions of figure 2, so is omitted.

Note that the axes of an ordination should be on the same scale, but they are not in any version of figure 2. As with Supplementary Data Figure 1, this indicates that the figures were not plotted in CanoDraw or C2, but probably in something like Excel.

I doubted that each circle encompassed all lakes of the correct temperature and omitted all others as the figure implied. In my review I asked

Figure 2. The circles on this plot beg the reader to trust the author. Does each site within each circle fall within the temperature range given? If not, this figure is misleading.

To which the authors responded

Again, why would anyone not trust the author???? … We added the eleven “misplaced” lakes on the graph. The sentence “Of the 122 lakes, 11(9%) had a chironomid composition which did not correspond to those of lakes with similar mean August air temperature” was added.

From this response alone, I can think of eleven reasons not to trust the authors.

I also thought the percentage of the variance assigned to the first (46%) and second (31%) axes was implausibly high for inevitably noisy community data.

Reformulated to: PCA Axis 1 explained 22.3% of the variance in the chironomid data and PCA Axis 2 explained 17.5%.

No explanation for this abuse of the verb “reformulate” was given.

The second review version of the manuscript had a new version of figure 2.


Figure 3. Second review version.

This figure should be identical to the original, except for the symbols showing which lakes are “misplaced” (and which country each lake belongs to). Indeed it mostly is, but curiously, some lakes have moved position. One lake has vanished from the upper left quadrant, which is otherwise identical. The lower left quadrant is identical. The right hand side has both additions and deletions. I have no idea what is going on here. It cannot be a change in the data transformation used, as that would affect all lakes (and the species scores), as would alterations to the data.
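Had the coordinates behind the review figures been archived, the transformation explanation could be tested formally with a Procrustes comparison: changing the transformation shifts every site, not a select few. A sketch with vegan, using a made-up species matrix since those coordinates are unavailable:

```r
library(vegan)

# Made-up species matrix standing in for the calibration-set assemblages
set.seed(42)
spp_demo <- matrix(rpois(50 * 10, lambda = 2), nrow = 50)

# PCA of the same data under two transformations
pca_raw  <- rda(spp_demo)
pca_sqrt <- rda(sqrt(spp_demo))

# Procrustes rotation: residuals are non-zero for (almost) every site,
# showing that a transformation change moves all points, not just a few
pro <- procrustes(scores(pca_raw,  display = "sites"),
                  scores(pca_sqrt, display = "sites"))
summary(residuals(pro))
```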

Now compare the second version of figure 2 with the published version.


Figure 4. Published version.

Now there are arrows to show where each lake should belong, and a different temperature scale that only extends to 23°C, whereas the paper reports lakes as warm as 27.5°C. Where are the four lakes warmer than 23°C? And why are there only 103 lakes, not 121? Where are the missing 18 lakes?

The location of lakes in the ordination should be exactly the same as the previous versions. Most lakes are in the same position, but the lower right quadrant is missing a cluster of several lakes, and the upper left quadrant is greatly thinned.

There are now only eight lakes marked as being misplaced. Strangely, six of these eight misplaced were not marked as misplaced in the second review version. Indeed, why would anyone not trust the author?

But what of the archived data? Can any version of figure 2 be replicated? Remember, the figure shows a PCA so it should be possible to reproduce the figure exactly (with the exception that mirror images are possible).

A principal component analysis of square-root transformed species data generates an ordination that looks very similar to those shown above. In the figures below, I’ve inverted both axes so they match the published figures.

It is not clear how the temperature scale has been cut into levels for the published figure. What happens to points between 10 and 11°C or between 16 and 17°C? I’ve done my best.

library(vegan)    # rda
library(ggvegan)  # fortify method for ordination objects
library(ggplot2)

#spp = archived species data, env = mean August air temperature
keep <- colSums(spp > 0) >= 3 #Eukiefferiella fittkaui is included

pca <- rda(sqrt(spp[, keep]))

fpca <- fortify(pca, display = "sites", scaling = 1)

lab <- labs(x = "PCA1", y = "PCA2", colour = "Temperature °C")
cols <- scale_colour_manual(values = c("skyblue", "salmon", "black", "green"))

#temperature cuts to match version 2
fpca$temp <- cut(env, breaks = c(3, 11, 16, 17.4, 27.5), include.lowest = TRUE)
fpca$country <- c(rep("Poland", 48), rep("Canada", 73))

g <- ggplot(fpca, aes(-Dim1, -Dim2, colour = temp, shape = country)) +
  geom_point(size = 2) +
  coord_equal() + lab + cols +
  geom_vline(xintercept = 0) +
  geom_hline(yintercept = 0)


Figure 5. Attempt to replicate review versions of figure 2

The arrangement of lakes in the ordination greatly resembles the first review version of the analysis, except that the lakes above 23°C (which all lie in or close to the left side of the figure) are not to be found in the first version, and the cluster of lakes on the lower right in the first review version does not exist in the replication.

Some of the lakes marked as misplaced in the second review version, especially those in the lower left quadrant, actually have the expected temperature. Curiously, some of the lakes marked as being Canadian in the review copy are actually Polish. As with the issues in Supplementary Data Figure 1, this acts to make the Polish and Canadian assemblages appear more similar than they really are.

The widespread mixing of lakes of different temperature classes means the implicit claim that there is a clean separation of lakes into different temperature classes demarcated by circles is misleading.

#Find missing lakes
#plot(-fpca$Dim1, -fpca$Dim2)
#abline(v = 0, h = 0)
#missing <- identify(-fpca$Dim1, -fpca$Dim2)
missing <-  c(3, 48, 66, 67, 68, 69, 70, 71, 72, 73, 114, 115, 116, 117, 118, 119, 120, 121)

#temperature cuts to match published version
fpca$temp <- cut(env, breaks = c(3, 11, 16, 23, 28), include.lowest = TRUE)
g %+% fpca +
geom_point(data = fpca[missing, ], colour = "red", pch = 21, size = 3)

Figure 6. Attempt to replicate published version of figure 2. Lakes enclosed by a red circle have been omitted from the published figure.

The arrangement of lakes also greatly resembles the published figure, with the major exception that the cluster of lakes on the upper far left has been greatly thinned and the lakes above 23°C omitted. Some of the misplaced lakes have arrows attached, others are omitted. Omitting misplaced lakes makes the relationship between chironomid assemblages and temperature appear stronger than can be justified by the data.

The amount of variance explained by the first two axes is considerably lower than that claimed by LT15.

PCA Axis 1 explained 22.3% of the variance in the chironomid data and PCA Axis 2 explained 17.5%.

pc <- pca$CA$eig[1:2] / pca$tot.chi * 100
round(pc, 1)
##  PC1  PC2
## 16.8 11.4

The question arises as to whether the lakes were deleted before or after the ordination was run. This can be tested by running the ordination on the data without the missing lakes.

pca2 <- rda(sqrt(spp[-missing, keep]))
fpca2 <- fortify(pca2, display = "sites", scaling = 1)

#data cuts to match version 2
fpca2$temp <- cut(env[-missing], breaks = c(3, 11, 16, 17.4, 27.5), include.lowest = TRUE)

g <- ggplot(fpca2, aes(-Dim1, -Dim2, colour = temp)) +
  geom_point() +
  coord_equal() + lab + cols +
  geom_vline(xintercept = 0) +
  geom_hline(yintercept = 0)

Figure 7. PCA with missing lakes deleted prior to running the ordination.

While this figure is similar to the published figure, the configuration of lakes in the ordination of the full dataset is a much better match, so it appears that the lakes were lost after the ordination was run.

LT15 figure 2 cannot be replicated (neither by me nor the authors). My evaluation of the different versions of this figure and my attempt at reanalysis hint that the analysis may have been tampered with, omitting lakes that did not fit the narrative of “chironomids = temperature”.

If we consider the implausibly precise reconstruction, the false claim that counts were all at least 50 chironomids, the data that evolved into impossible non-integer counts, the weird ordinations, the failed reconstruction diagnostics, the missing rare taxa and now a possibly tweaked ordination, it becomes clear that the question

why would anyone not trust the author????

has many, many answers.

I have been assured by someone who ought to care about the wandering/disappearing lakes that figure 2

does not amount to prima facie evidence of data fabrication2 …  Other explanations are equally possible, if not more so.

What these other explanations might be was left to my imagination, which rapidly conjured time-travelling sub-aquatic arachnids devouring the chironomids, and thereby changing the ordinations. Discounting these manifestations, I suppose my correspondent was obliquely suggesting that the three erroneous versions of figure 2 are the result of a succession of errors (nearly all of which happen to make the relationship between chironomids and temperature appear stronger than it is).

The errors in figure 2, and Supplementary Data Figure 1, are the direct result of the odd choice to use Excel for plotting, with the attendant (but minimal) risk of copy-paste errors between CANOCO and Excel. This risk would have been avoided entirely by using CanoDraw, which the authors have used previously (see Fig 4 in Larocque et al 2001: JoPL).

Wise counsel enjoins us never to attribute to malice that which is adequately explained by negligence. And everybody makes mistakes: I hope the errors in my papers are neither numerous nor critical. It is not my role to determine how the errors, inconsistencies and improbable results in LT15 came to be. Others have the task of determining whether the results in LT15 are the result of extreme good luck and sloppiness, or something else.

But what I know is this: LT15 is not credible. And, whatever the cause of the problems in LT15, I would read all of the first author’s papers with a certain amount of caution.

  1. Other avenues were explored before publishing this series of posts. 
  2. This is not the f.* word I would have used. 

The breakpoint is broken and other tales from the comment on Lyons et al

A few weeks ago, Nature finally published the comment I wrote with colleagues on Lyons et al (2016). In this post, I explore some of the details that did not fit into the constraints of a 1200 word comment, and address some of the issues raised in the reply by Lyons et al.

How foreign is the past?

In his News and Views, Dietl (whose article I have already written about), argues that

the world we live in today … has no analogue in the geological past. As a consequence, referencing ‘natural experiments’ in the distant past as a guide to predict what might happen, now or in the future, is a flawed strategy. Out is the use of uniformitarianism as a guiding principle, and in is a new kind of ‘post-normal’ science.

If this is true, then one of the major justifications for palaeoecology is invalid. Before giving up, it’s worth checking whether Lyons et al, the paper Dietl is promoting (and was a reviewer of), is well founded.

Lyons et al, in their paper “Holocene shifts in the assembly of plant and animal communities implicate human impacts”, collate presence-absence assemblage data-sets from the last 300 million years. Within each data-set they determine the co-occurrence structure – are taxa aggregated or segregated?


Figure 1. In the left panel, the taxa are distributed across the landscape at random; in the centre panel, the taxa don’t co-occur: they are segregated; in the right panel, the taxa are aggregated. If species were all as easy to identify as this, I could be a field ecologist. (Source: Dietl 2016)

Lyons et al find that the proportion of aggregated taxon-pairs begins to change in the mid-Holocene, about 6000 years ago. This change

coincided with increasing human population size and the spread of agriculture in North America

They conclude

The organization of modern and late Holocene plant and animal assemblages differs fundamentally from that of assemblages over the past 300 million years that predate the large-scale impacts of humans. Our results suggest that the rules governing the assembly of communities have recently been changed by human activity.

The authors’ favoured hypothesis is that habitat fragmentation and dispersal limitation are responsible.


Figure 2. Changes in the proportion of aggregated taxon-pairs over the last three hundred million years. (Source Lyons et al.)

North American mid-Holocene agriculture

If mid-Holocene agriculture was responsible for changing the structure of animal and plant assemblages, one might expect the agriculture to have been intense and extensive. Zeder & Smith (2009), one of two papers by Smith cited by Lyons et al as evidence of mid-Holocene agriculture, write

Five different cultivated varieties of crop plants were brought under domestication in eastern North America between 5000 and 3800 cal. BP: squash Cucurbita pepo, sunflower Helianthus annuus var. macrocarpus, marsh elder Iva annua var. macrocarpa, and two cultivated varieties of chenopod Chenopodium berlandieri (Smith 2006). Clear evidence of the development of a crop complex based on these indigenous eastern seed plants is present by 3800 cal. BP (Smith and Yarnell 2009), but indications of a dominant subsistence role for any crop plant do not appear until after AD 700, when a widespread shift to maize‐centered agriculture is documented by a dramatic change in stable carbon isotope values in human burial populations (Smith 1990).

It would appear that the claim that agriculture at 6000 BP had fundamentally affected the ecological patterns has fallen at the first hurdle: there is no evidence of agriculture this early in eastern North America. That is not to say that humans had no impact on the landscape: just ask any giant sloth.

Poor data-set selection

Any analysis critically depends on the quality and relevance of the data to the hypothesis being tested. Unfortunately, there are serious issues with the modern data used by Lyons et al. They use:

  • 53 fossil mainland data-sets
  • 48 modern mainland data-sets
  • 53 modern island data-sets

The modern island data are only used in the breakpoint analysis (see below) but their relevance is doubtful as different biogeographic processes are important on islands and none of the fossil data-sets are from islands. If Lyons et al were convinced that they were comparable, they would have used them throughout.

Most of the modern mainland data-sets (and all the island data) are a subset of data from a compilation by Atmar & Patterson (1995) for testing their nestedness temperature method. Gotelli & Ulrich (2010) are cited – but they just used the earlier compilation. Once I had been sent the list of data-sets used, I was able to track down the metadata for each data-set. There were a large number of issues in the modern mainland data. For example,

  • Two of the data-sets from the Canary Islands were mis-categorised as being mainland.
  • Two data-sets were perfect duplicates of each other.
  • One data-set was a transposed (taxa became sites, and vice versa) near-duplicate of another data-set (Atmar & Patterson had not stored the species or site names so this was only obvious when checking the source publication).
  • Some of the data-sets were very small. The smallest island data-set had only four samples; the smallest mainland data-set had seven.
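Exact duplicates and transposed duplicates of unnamed matrices are straightforward to screen for programmatically (near-duplicates would need a similarity threshold rather than exact equality). A sketch, assuming the compilation is held as a hypothetical list of presence-absence matrices:

```r
# 'mats' is a hypothetical list of presence-absence (0/1) matrices without
# row or column names, as in the Atmar & Patterson compilation
is_duplicate <- function(a, b) {
  identical(dim(a), dim(b)) && all(a == b)
}
is_transposed_duplicate <- function(a, b) {
  identical(dim(a), rev(dim(b))) && all(a == t(b))
}

# Tiny demonstration: the second matrix is the transpose of the first
mats <- list(matrix(c(1, 0, 1, 1), 2), t(matrix(c(1, 0, 1, 1), 2)))
for (i in seq_along(mats)) {
  for (j in seq_len(i - 1)) {
    if (is_duplicate(mats[[i]], mats[[j]]))
      message("data-sets ", j, " and ", i, " are duplicates")
    if (is_transposed_duplicate(mats[[i]], mats[[j]]))
      message("data-sets ", j, " and ", i, " are transposed duplicates")
  }
}
```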

There were many overlapping data-sets. For example:

  • Two data-sets of Western Australian snails differed only in how sub-species were treated.
  • The bird assemblages in 12 Illinois woods sampled in 1979 and 1980 were treated as independent data-sets.
  • A total of 12 data-sets covered rodents from three habitats in four deserts from the SW USA. The 45 samples from Sonoran desert scrub data-set were included in the 48 sample Sonoran all habitats data-set. They were also included in the all-desert scrub data-set and the all-desert all-habitat data-set.

These overlaps should not have been a surprise as they are documented by Wright et al (1997).

These duplicate and near duplicate data-sets will probably bias the analysis as they give more weight to these data-sets and underestimate the p-value. A similar problem occurs in the fossil pollen data which includes 1000 year time slices of the North American pollen database.

Some of the modern data-sets are from dispersal-limited systems, such as ‘sky-islands’ of boreal forest on mountains in the desert and terrestrial cave invertebrates. The logic that excludes the real island data-sets should also exclude these.

Although Lyons et al are hypothesising that human activities, especially agriculture, are responsible for the change in assembly rules, some of the included data-sets are of minimal relevance to this hypothesis, for example the cave invertebrates mentioned above and the invertebrate assemblages colonising jars of artificial lake water at different distances from a Texan lake.

Lyons et al reply

Lyons et al published a corrigendum that dealt with the mis-categorised, the completely duplicated and the transposed data-sets. It also added some accidentally omitted island/marine data-sets (marine fouling organisms on tile plates suspended underneath Duke University Marine Laboratory dock) which should have been deliberately excluded as irrelevant.

Perhaps not surprisingly, the authors disagree with our assertion that many of the modern data-sets are inappropriate for comparison with fossil data-sets.

Their main argument in this section seems to be that since the modern landscape is highly fragmented, data-sets from relict or patchy habitats should not be excluded. We did not argue that data-sets from landscapes that have been fragmented by human activity should be excluded, but that naturally dispersal limited systems, akin to islands, should be excluded for the same reason that islands are excluded. Most of the data-sets we think should be excluded as ‘islands’ are also small, so should have been excluded because they were small anyway (see below).

The authors’ rejoinder that some fossil data-sets were from edaphically distinct habitat islands is less persuasive when you look at the papers cited and see that they refer to floodplain deposits – which tend to be linear systems rather than islands.

Lyons et al re-run their analysis without small, overlapping, or dispersal-limited modern data-sets and report that the results are consistent with the original results. The mean proportion of aggregated taxon pairs for the modern data-sets rose from 0.33 to 0.53 with the strictest criteria (no overlaps, n ≥ 50), which left eight modern data-sets out of the original 50 (there was some confusion as to which data-sets had originally been included).

However, Lyons et al explicitly did not delete any pollen assemblages

because they are separated by 1,000 to 20,000 years, do not necessarily contain the same cores, and vary in their geographic extent. They are not subsets of one another, even if some of the genera and sites may recur.


Figure 3. Proportion of pollen cores in each 1000 year time-slice that were in the previous time-slice.

For the Holocene, typically over 90% of pollen cores in each time-slice are shared with the previous time-slice. The geographic area is typically very similar between adjacent time slices. Treating each time-slice as if it is independent over-inflates the importance of the pollen data, which constitutes the vast majority of data-sets in the 100 – 100,000 year range.
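The proportions in figure 3 come from a simple overlap calculation; a sketch with made-up site identifiers per 1000-year time-slice:

```r
# Hypothetical pollen-site IDs for three consecutive time-slices
slices <- list(
  ka0 = c("a", "b", "c", "d"),
  ka1 = c("a", "b", "c", "e"),
  ka2 = c("a", "b", "f", "g")
)

# Proportion of each slice's sites shared with the previous slice
shared <- sapply(seq_along(slices)[-1], function(i) {
  mean(slices[[i]] %in% slices[[i - 1]])
})
names(shared) <- names(slices)[-1]
shared
##  ka1  ka2
## 0.75 0.50
```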

The breakpoint is broken

Lyons et al use a breakpoint analysis to determine when the change in proportion of aggregated taxon pair starts. This analysis fits separate regression lines to the data either side of a breakpoint, the position of which is estimated. The breakpoint analysis gives the authors the headline result that there is a mid-Holocene change in assembly rules and hence that human activity might be responsible.
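As a sketch of the method (not the authors' exact script), a one-breakpoint piecewise regression can be fitted with the segmented package, which Lyons et al used; all the numbers below are simulated:

```r
library(segmented)

# Simulated data with a change in slope, loosely mimicking the proportion
# of aggregated pairs against log10(age); all values are made up
set.seed(1)
la   <- runif(100, 2, 8)   # log10(age in years)
prop <- 0.35 + 0.03 * la - 0.08 * pmax(5 - la, 0) + rnorm(100, sd = 0.03)

m0 <- lm(prop ~ la)
m1 <- segmented(m0, seg.Z = ~la, psi = 4)  # psi = starting breakpoint guess

summary(m1)
confint(m1)  # confidence interval for the breakpoint position
```

The width of that confidence interval, not just the point estimate, is what matters for the claims discussed below.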


Figure 4. Breakpoint analysis from Lyons et al.

The confidence intervals in figure 4 are clearly wrong: they appear to be drawn for each regression line separately rather than for the model as a whole.

All the data, island and mainland, are used in the breakpoint analysis:

  The breakpoint analysis was run using all the data resolved to the best possible dates to allow for the greatest amount of power in detecting where the switch occurred.

But the reader is assured that

the results were similar when island data were excluded.

The paper only reports the results with the island data included. So I asked for the results without the island data (which I think should have been given, at least in supplementary material).

With the island data included, the breakpoint occurred 5998 years ago with a 95% confidence interval of 40 – 893,000 years. This breakpoint could have occurred between the mid-Pleistocene transition and 1970. Difficult to pin this on early agriculture.

Without the island data, the breakpoint is at 5998 years and the 95% confidence interval is 7 – 4,940,000 years. So the estimate is exactly the same, even though a third of the data have been removed, and the confidence interval now extends into the mid-Pliocene.

The authors interpret the extreme stability of the estimate as evidence that the result is robust. We viewed it as an indicator of pathological behaviour in the segmented R package used. So we (or rather Joe) re-implemented the breakpoint analysis using Markov chain Monte Carlo.

Re-running the breakpoint analysis


Figure 5. The result of our breakpoint analysis.

Importantly, this is the same underlying model that Lyons et al use. All that has changed is the fitting mechanism. When fitted with one breakpoint, the model finds a mid-Holocene breakpoint with extremely wide credibility intervals. However, a model with no breakpoint has a lower deviance information criterion score. The timing, and indeed existence, of a breakpoint is highly uncertain. The claim of a mid-Holocene breakpoint by Lyons et al is very weak.

Lyons et al reply

Apparently we “misconstrue” the authors’ use of the break-point analysis.

We did not use it as a formal statistical hypothesis test. … Rather, we used the break-point analysis to identify the approximate time the decline became more pronounced.

Why don’t they just admit that the breakpoint analysis was flawed?

They report the most parsimonious model is one that does not have a break point but instead an “exponential decline” in the proportion of aggregated pairs through time.

This is becoming misleading. It might suggest that we are using an exponential model as an alternative to a linear model with a breakpoint. We are not. Following Lyons et al, we use log(age) as the predictor. Since we do not find evidence for a breakpoint, we prefer the simpler no-breakpoint model. The choice by Lyons et al of a log(age) predictor has the natural consequence that any change is exponential against age. The exponential decline is not a finding of the model: it is an assumption of the model. It is not a very sensible model. For example, it is now almost a year after the paper was published: what is the predicted percent of aggregated taxon pairs? You cannot tell: log(-1) is not a number. Following their model, at age = 0, the percent of aggregated taxon pairs reaches minus infinity. Clearly, the domain in which the model is valid is restricted.
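The domain problem is easy to demonstrate with a toy version of a model of this form (the intercept and slope here are made up, not the fitted coefficients):

```r
# A toy model of the Lyons et al form: proportion ~ a + b * log(age),
# with made-up coefficients
a <- 0.3
b <- 0.03
pred <- function(age) a + b * log(age)

pred(6000)  # ~0.56: fine for a positive age
pred(0)     # -Inf: the model "predicts" minus infinity today
pred(-1)    # NaN (with a warning): no prediction after publication
```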

Contrary to what Lyons et al argue in their reply, we did not confirm that the decline is non-linear, nor that the decline becomes steeper in the Holocene. These are direct consequences of the model they selected.

It is interesting to contrast the little emphasis the authors would now give to the breakpoint with quotes from the authors in press coverage at the time the paper was published (I’m just taking quotes as other aspects of the press release might have been written by journalists).

When early humans started farming and became dominant in the terrestrial landscape, we see this dramatic restructuring of plant and animal communities. Gotelli

The fraction of plant and animal species that were positively associated with each other was mostly unchanged for 300 million years. Then that fraction sharply declined over the last 6,000 years. Waller

the study even provides a way to date the start of the anthropocene. Waller

This tells us that humans have been having a massive effect on the environment for a very long time. Lyons

The pattern of co-occurring species remained stable through the evolution of land organisms from the earliest tetrapods through dinosaurs, flowering plants and mammals. This pattern didn’t change because of previous mass extinctions or ancient climate variability, but instead, early human activities 6,000 years ago suddenly began resetting a basic property of natural communities. Behrensmeyer

But more and more, research into the Holocene and the Pleistocene is showing that when humans started migrating around the globe, they started having major, unprecedented impacts. Lyons

Biases in the statistics

Often the interesting figures that show problems with a paper are buried deep in the supplementary material. So it is with Lyons et al. The supplementary material includes the presence-absence matrix and the taxon co-occurrence plot for each fossil data-set.


Figure 6. Upper panel – presence-absence matrix for the North America pollen 0 ka with taxa ordered by occupancy and sites ordered by richness. Lower panel – co-occurrence plot with taxa ordered by occupancy.

Many of the co-occurrence plots resemble figure 6, with aggregated and segregated taxon-pairs occupying distinct places in the plot. The general pattern is this: two very rare taxa (bottom left) or two very common taxa (top right) cannot be identified as either aggregated or segregated, as there is not enough statistical power. Segregated pairs tend to have one common and one rare taxon (top left). Aggregated pairs tend to have taxa with similar occupancies (near the diagonal).

Since it is reasonable to expect aggregated and segregated taxon pairs to exist over (almost) the full range of occupancies, the observed pattern probably reflects differences in the statistical power available to detect aggregation and segregation at different levels of occupancy.

Realising there were problems with statistical power, I decided to check if the proportion of aggregated taxon pairs was affected by data-set size. I am surprised that this does not appear to have been done before. Small data-sets will have less statistical power, but will detection of aggregated and segregated pairs be equally affected? The first simple test of this is to plot the proportion of aggregated taxon pairs against data-set size.
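This first check needs nothing beyond the supplementary spreadsheet; a sketch using the numSites and percAgg columns of nature20097-s1.xlsx:

```r
library(readxl)
library(ggplot2)

# nature20097-s1.xlsx is the Lyons et al supplementary data spreadsheet;
# numSites and percAgg are columns within it
lyons <- read_excel("nature20097-s1.xlsx")

ggplot(lyons, aes(x = numSites, y = percAgg)) +
  geom_point() +
  scale_x_log10() +
  labs(x = "Number of sites in data-set",
       y = "Proportion of aggregated taxon pairs")
```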


Figure 7. Proportion aggregated taxon pairs against data-set size.

There is a strong relationship between data-set size and the proportion of aggregated pairs. Very small data-sets have a low proportion of aggregated pairs. The proportion stabilises once there are more than ~50 sites in the data-set.

The relationship can be explored more directly by taking different sized random sub-samples of a large data-set (ensuring that the number of species is constant). I did this for four data-sets. There are strong, but perhaps inconsistent, patterns in each data-set.


Figure 8. Effect of number of observations on percentage of aggregated taxon pairs for four data-sets used by Lyons et al. US desert rodents (a), Holocene mammals (b), 1,000-year-old North American pollen (c), and 1950 Wisconsin understorey vegetation (d). Grey band shows 95% confidence interval of a local regression smoother fitted with a quasi-binomial distribution.

Holocene mammals and Wisconsin understorey vegetation have the most similar patterns. Initially, there is little change in the proportion of aggregated taxon pairs as the data-set is reduced in size. Then there is a slight rise followed by a strong fall.

The North American pollen 1 ka data-set has a similar pattern but is noisier because there are fewer non-random pairs in this data-set.

The desert rodent data-set shows only a rise in proportion aggregated pairs. Very few subsets with fewer than fifty sites had any non-random taxon pairs, so it is not possible to tell what happens for small data-sets.

Part of this inconsistency probably relates to patterns of occupancy within the data-sets.

Lyons et al reply

Lyons et al ignore figure 7 completely and focus on the inconsistent patterns shown in figure 8. Yes, the patterns are inconsistent – though probably understandable with more work – but the existence of any pattern with data-set size should worry the authors.

They claim that the effect is only found with very small samples

Using data from Telford et al. simulations, we note that the purported sample size effect in their extended data fig. 1 is found only at low sample sizes (rarefying to ten samples). Even slightly elevated sampling removes this effect; when rarefying to 20, two datasets were non-significant and two showed a relationship in the opposite direction.

They are trying to fit a linear trend to data with a non-linear relationship. Not a cunning plan.

Lyons et al repeat their analyses without the small data-sets and report that:

Redoing our analyses while eliminating datasets with less than 10, 20 or even 50 sites does not change our results substantially (Extended Data Table 1).

With the n = 50 criterion, which will remove most of the sample size bias, only eight modern mainland data-sets remain out of the initial fifty. When I subset the data – it is in the supplementary material – I find nine data-sets meet the criteria.

library(readxl)
library(dplyr)

lyons <- read_excel("nature20097-s1.xlsx")

modern <- lyons %>%
  filter(
    numSites >= 50,
    age <= 100,
    land.island.ete.corrected != "island",
    land.island.ete.corrected.overlapping.removed != "NA"
  ) %>%
  select(Dataset, numSites, percAgg)


Dataset                   numSites   percAgg
------------------------  --------  --------
crypticTaxa                    171      0.66
NorthAm-mammals-Modern          67      0.41
WI-overstory-SPP-1950          168      0.53
WI-understory-SPP-1950         152      0.46
WI-overstory-SPP-2000          168      0.62
WI-understory-SPP-2000         152      0.44
NorthAm-pollen-0ka             252      0.30
swusrodco.txt                  202      0.53
wausnailco.txt                  55      0.82

However, this includes two pairs of data-sets that were resampled and should have been excluded under the no overlaps criterion. That leaves just seven modern data-sets. The analysis will be very sensitive to the properties of these data-sets.

Next Lyons et al stratify the data-sets into three groups:

Deep Time (>1 million years ago; n = 29 data sets), Shallow Time (1 million–100 years ago; n = 24), and Modern (<100 years ago; n = 46).

They argue that the deep time and modern groups have similar data-set sizes, but differ in the proportion of aggregated taxa, and hence that our objections are ill-founded. I’m not going to criticise what they do here, except to note that they appear to have used a t-test, which will not perform well with the non-normally distributed number of sites, and that a Wilcoxon test is significant at the p = 0.05 level. I will quote this though:
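The distributional point is easy to check. A sketch: `deep` and `modern` here are hypothetical vectors of data-set sizes for the Deep Time and Modern groups, extracted from the supplementary table; they are not columns that exist under these names.

```r
# numSites is strongly right-skewed, so the t-test's normality assumption
# is doubtful; the rank-based Wilcoxon test is the safer comparison.
t.test(deep, modern)      # roughly what Lyons et al appear to have done
wilcox.test(deep, modern) # significant at the p = 0.05 level
```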

Third, in re-testing for a Holocene shift, Bertelsmeier and Ollier … break the data into three arbitrary time bins that bear no correspondence to accepted geological time periods. Their ‘ancient’ is 300 million years ago to 1 million years ago, which encompasses two of the largest recorded mass extinctions, the break-up of Pangaea, considerable changes in climate, the rise of angiosperms and a change from dinosaur- to mammal-dominated terrestrial ecosystems. Moreover, their definitions merge all Pleistocene and non-modern Holocene data into a single bin, obscuring a Holocene shift.

What drives aggregated and segregated taxon pairs

Non-random co-occurrence patterns can be driven by a variety of factors.

  • Large scale dispersal limitations (polar bears and walrus vs penguins)
  • Niche separation (polar bears vs pandas)
  • Biotic interactions (mutualists, competitive exclusion)
  • Small scale dispersal limitations (between islands, habitat fragments)

Because of the large spatial scale of some of the data-sets in Lyons et al, niche separation is important, but probably little affected by human activity. For example, looking closely at the North American pollen 0ka slice (figure 5) we see that of the 48 segregated pairs, 13 (27%) include Rhizophora (species 33, red mangrove). Rhizophora is only found in Florida mangrove swamps and does not co-occur with temperate and boreal trees. Humans have not altered this relationship, but because Rhizophora only occurs in the most recent two pollen time-slices (probably due to sea-level rise submerging early Holocene mangrove swamps), it contributes to the apparent decrease in the proportion of aggregated taxon pairs.


It is exciting to see palaeoecological data being used to address important ecological questions, but devastating that such bold claims are supported by inappropriate data and analyses.

Given the number and severity of problems with Lyons et al (2016), we thought that they might retract the paper. Instead they essentially admit to all the problems, but carry on regardless.

Posted in Peer reviewed literature | Tagged , | Leave a comment

OxCal and R

OxCal is perhaps the most powerful of the three main Bayesian age-depth modelling procedures, with many, many options and the potential to build far more complicated models than your typical palaeolimnologist needs. Unlike Bchron and Bacon, OxCal is not R based. Much as I want to use OxCal, I also want to use R.

In this post, I show how OxCal models can be built in R, the OxCal command line program called from R, and the output imported into R and plotted. The code does not deal with all possible outputs from OxCal (it may need altering if your model is different from mine) and absolutely does not absolve you from checking the log files for any problems.

You need to start by downloading OxCal and installing it somewhere convenient on your computer.

We need a couple of packages for data manipulation and plotting.
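A plausible minimal set (my reconstruction, judging from the functions used below: `data_frame` and the joins from dplyr, `ggplot` from ggplot2; plyr is also used later, but only via `plyr::adply`):

```r
library(dplyr)   # data_frame, left_join, full_join, inner_join
library(ggplot2) # plotting the dates and the age-depth model
```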


And we need some radiocarbon dates. Here are a few of the dates from a model I’m currently working on (the full set of dates takes about six hours to process).

dates <- read.table(header = TRUE, text =
"depth_corr    Lab_Code    raw_14C_age    error
312.5    X-1    16010    100
352.0    X-2    16670     70
406.5    X-3    17730    130
440.5    X-4    18690    120
490.5    X-5    20150    220
535.5    X-6    20930    170")

OxCal was written by archaeologists so you need to think like an archaeologist to use it: oldest dates need to come first in the model. (I suspect many palaeolimnologists get this wrong and then wait days before the model refuses to converge. At least I’ve done this a couple of times.)

##sort so oldest first
dates <- dates[order(dates$depth_corr, decreasing = TRUE), ]

I want to use a P sequence model to fit an age-depth model to the dates. I’m using the Marine13 calibration curve with a local reservoir offset of 40 +/-30 years. I’m using the P sequence model with a k selected from a prior distribution by OxCal and with an outlier model. All these commands and their arguments are explained in the on-line help for OxCal. writeLines prints the formatted model to the screen. I’ve used lots of white space (don’t worry, it’s free) to make the model human readable.

#Calibration information
curve <- ' Curve("marine13");'
DR <- ' Delta_R("Local Marine",40,30);'

#The dates
ageDepths <- with(dates,
  sprintf('  R_date("Dep%s",%d,%d){\n    z=%s;\n    Outlier(0.05);\n  };',
    depth_corr, raw_14C_age, error, depth_corr))
ageDepths <- paste(ageDepths, collapse = "\n")

#Putting it all together
commandp <- paste(
  curve,
  DR,
  ' Outlier_Model("General",T(5),U(0,4),"t");',
  ' P_Sequence("k",1,1,U(-2,2)){',
  '  Boundary("Bottom");',
  ageDepths,
  '  Boundary("Top");',
  ' };',
  sep = "\n")

#Wrapping in Plot()
commandp <- paste('Plot(){', commandp, '};', sep = "\n")
writeLines(commandp)
## Plot(){
##  Curve("marine13");
##  Delta_R("Local Marine",40,30);
##  Outlier_Model("General",T(5),U(0,4),"t");
##  P_Sequence("k",1,1,U(-2,2)){
##   Boundary("Bottom");
##   R_date("Dep535.5",20930,170){
##     z=535.5;
##     Outlier(0.05);
##   };
##   R_date("Dep490.5",20150,220){
##     z=490.5;
##     Outlier(0.05);
##   };
##   R_date("Dep440.5",18690,120){
##     z=440.5;
##     Outlier(0.05);
##   };
##   R_date("Dep406.5",17730,130){
##     z=406.5;
##     Outlier(0.05);
##   };
##   R_date("Dep352",16670,70){
##     z=352;
##     Outlier(0.05);
##   };
##   R_date("Dep312.5",16010,100){
##     z=312.5;
##     Outlier(0.05);
##   };
##   Boundary("Top");
##  };
## };

Once we’re happy with the model, we can save the file and call the OxCal command line utility. There are different versions of this for Linux, Mac and Windows.

writeLines(commandp, con = "oxcal_demo.input")
system(paste("~/oxcal/OxCal/bin/OxCalLinux", "oxcal_demo.input"))

Depending on how complicated your model is, you now have time for a cup of tea or a round the world cruise. During this time, R will sit there apparently doing nothing, but in the background, OxCal is busy shunting electrons round.

OxCal will produce three files: oxcal_demo.log, oxcal_demo.txt, and oxcal_demo.js. You need to look in the .log file and check for any problems. The .txt file contains a summary of the model output and the .js file contains all the details. It is this file we need to read into R. Unfortunately, it is a large and complicated file that takes a bit of effort to parse. I’ve written some functions to help import the data into a data.frame that I can plot with ggplot. The functions extract what I need – they will need modifying if you need other information or if you have used a different naming convention for the dates.

get.values <- function(js, target, name = "value", numeric = TRUE) {
  x <- js[grep(target, js)]
  val <- gsub(".+=(.+);", "\\1", x)
  lab <- gsub("^(ocd\\[\\d+\\]).*", "\\1", x)
  if (numeric) {
    val <- as.numeric(val)
  }
  res <- data_frame(lab, val)
  setNames(res, c("lab", name))
}

get.vectors <- function(js, target) {
  x <- js[grep(target, js)]
  lab <- gsub("^(ocd\\[\\d+\\]).*", "\\1", x)
  val <- gsub(".+=\\[(.+)\\];", "\\1", x)
  val <- strsplit(val, ", ")
  res <- sapply(val, as.numeric)
  setNames(res, lab)
}

get.OxCal <- function(js, posterior = TRUE) {
  what <- if (posterior) "posterior" else "likelihood"
  depths <- get.values(js, "\\.data\\.z", "depth")
  comment <- get.values(js, paste0(what, "\\.comment\\[0\\]"), "comment", FALSE)
  depth_comment <- left_join(depths, comment)

  start <- get.values(js, paste0(what, "\\.start"), "start")
  resolution <- get.values(js, paste0(what, "\\.resolution"), "resolution")
  meta <- full_join(start, resolution)
  meta <- inner_join(depth_comment, meta)
  probs <- get.vectors(js, paste0(what, "\\.prob="))

  out <- plyr::adply(meta, 1, function(r) {
    prob <- probs[[r$lab]]
    date <- r$start + 0:(length(prob) - 1) * r$resolution
    data_frame(lab = r$lab, depth = r$depth, date = date, prob = prob,
               comment = r$comment, posterior = posterior)
  })
  out$start <- NULL
  out$resolution <- NULL
  out$date <- 1950 - out$date # convert to years BP
  out
}

These functions use regular expressions to find and extract the required information. If you don’t know how to use regular expressions, it is well worth spending a morning learning how to use them as they are very useful for processing data.
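As a toy example of the kind of extraction these functions do (the line below is a made-up but representative line from OxCal's .js output):

```r
line <- "ocd[9].posterior.start=-24000.5;"

# the backreference \\1 keeps only the parenthesised capture group
val <- as.numeric(gsub(".+=(.+);", "\\1", line)) # -24000.5
lab <- gsub("^(ocd\\[\\d+\\]).*", "\\1", line)   # "ocd[9]"
```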

With these functions, we need to first read the .js file into memory and then extract the posterior and likelihood information separately.

js <- readLines("oxcal_demo.js")
posterior <- get.OxCal(js, posterior = TRUE)
likelihood <- get.OxCal(js, posterior = FALSE)
allprob <- rbind(posterior, likelihood)

##      lab depth    date     prob               comment posterior
## 1 ocd[9] 535.5 25544.5 0.000910 "Dep535.5 Posterior "      TRUE
## 2 ocd[9] 535.5 25539.5 0.000455 "Dep535.5 Posterior "      TRUE
## 3 ocd[9] 535.5 25534.5 0.000455 "Dep535.5 Posterior "      TRUE
## 4 ocd[9] 535.5 25529.5 0.003095 "Dep535.5 Posterior "      TRUE
## 5 ocd[9] 535.5 25524.5 0.001729 "Dep535.5 Posterior "      TRUE
## 6 ocd[9] 535.5 25519.5 0.002367 "Dep535.5 Posterior "      TRUE

Another function will summarise the age-depth model, extracting, by default, the 95.4% credibility interval and the median.

get.ageModel <- function(x, ci = c(0.023, 0.977)) {
  x %>%
    filter(posterior == TRUE) %>%
    group_by(depth, comment) %>%
    mutate(sumProb = cumsum(prob)) %>%
    mutate(sumProb = sumProb / max(sumProb)) %>%
    summarise(
      medianDate = approx(x = sumProb, y = date, xout = 0.5)$y,
      lowCI = approx(x = sumProb, y = date, xout = ci[1])$y,
      highCI = approx(x = sumProb, y = date, xout = ci[2])$y
    )
}

mod <- get.ageModel(x = allprob)

A final function plots the model

plot.dates_model <- function(dates, model, title) {
  dates %>%
    filter(grepl("Dep", comment)) %>% #just the dates
    ggplot(aes(x = depth, y = date, size = prob, colour = posterior, group = comment)) +
    geom_ribbon(data = model, aes(x = depth, ymax = highCI, ymin = lowCI),
                inherit.aes = FALSE, alpha = 0.2) +
    geom_line(data = model, aes(x = depth, y = medianDate),
              inherit.aes = FALSE, colour = "grey40") +
    geom_line(alpha = 0.6) +
    scale_x_reverse() +
    scale_size(range = c(0, 6)) +
    coord_flip() +
    labs(x = "Depth cm", y = "Date yr BP") +
    theme_bw() +
    guides(size = "none",
           colour = guide_legend(title = "Posterior", override.aes = list(size = 4))) +
    ggtitle(title)
}

g <- plot.dates_model(allprob, mod, "My Core")
g


The red lozenges show the probability distribution function of each date when calibrated individually. The blue lozenges show the posterior probability distribution functions of each date when modelled with the P sequence model.

This code does not extract all the important information from OxCal, but it is a start.

Posted in Age-depth modelling, R, Uncategorized | Tagged | Leave a comment

73 lakes become 78

I have detailed many curious aspects of the remarkable Lake Żabińskie August air-temperature reconstruction by Larocque-Tobler et al (2015). This post describes yet more – this time in Supplementary Data Figure 1, which shows a redundancy analysis (a type of constrained ordination).

The method

A Redundancy Analysis (RDA) created in CANOCO was also used to determine the fit of downcore samples to the transfer function samples. The RDA constrained to temperature was created with the transfer function samples and the downcore samples added passively to the RDA.

The result

An RDA with downcore samples passively added to the transfer function samples show that no downcore sample was located outside the field determined by the transfer function samples. The eight warmer lakes (17–27.5 °C) in the Canadian transfer function were also found in the same quadrant as the Polish lakes (16.3–18.8 °C) suggesting a similarity between the assemblages in warmer lakes in Canada and those which cover the same temperature gradient in Poland.

The first result is somewhat inevitable given that only the first two axes of the RDA are considered. As a reconstruction diagnostic tool, passive plotting is only really useful if residual distances are small, otherwise the fossil samples might be highly aberrant on axis three and no-one would ever know.
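The residual distances are easy to check with `residLen` in Gavin Simpson's analogue package. This is a sketch of a diagnostic LT15 did not report; I assume the same square-root transformation and objects (`spp`, `env`, `fos`) as used elsewhere in this post.

```r
library(analogue)

# Squared residual lengths of the passive fossil samples off the
# temperature-constrained axis, compared with the calibration-set lakes.
rl <- residLen(X = sqrt(spp), env = env, passive = sqrt(fos), method = "rda")
plot(rl)
```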

The second result is critically important to the paper which combines calibration sets from Canada and Poland. If the warm Polish and Canadian lakes do not overlap in ordination space, they do not have similar species assemblages and it is likely that an environmental variable other than temperature is driving the difference between them. This would severely damage the motivation for making a combined calibration set.

The figure


A few points are immediately obvious.

  • Unlike perhaps every other RDA I have seen, there is no arrow showing the constraining environmental variable. It should point right from the origin.
  • The axis scales are not identical as they should be (giving undue prominence to the first axis).
  • The second axis of the ordination is mis-labelled as RDA Axis 2. When there is only one constraining variable, there can only be a single constrained axis. The second axis should be PCA axis 1.

None of these points is of any great importance, except that they strongly indicate that the figure was not made with CanoDraw, C2, or the vegan/analogue packages in R, which would not make these mistakes.

Much more seriously, while there are 73 Canadian lakes in the archived calibration set (not 72 as the paper reports), there are 78 red squares (perhaps 77 – some overlap) which indicate Canadian lakes in the figure.

The replication

This figure should be easy to replicate from the archived data (full code at …). First, I want to plot just the calibration set lakes, omitting the passive fossil samples.


library(vegan)   # rda
library(ggvegan) # provides the fortify method for rda objects
library(ggplot2)

mod <- rda(sqrt(spp) ~ env)

scaling <- "sites"
frda <- fortify(mod, display = "sites", scaling = scaling)

#country information
frda$country <- c(rep("Poland", 48), rep("Canada", 73))

g <- ggplot(frda, aes(-Dim1, Dim2, colour = country, shape = country)) + #axis 1 flipped to match published figure
  geom_point(size = 2) +
  coord_equal() +
  labs(x = "RDA1", y = "PCA1", shape = "Country", colour = "Country") +
  geom_vline(xintercept = 0) +
  geom_hline(yintercept = 0) +
  scale_shape_manual(values = c(15, 18)) +
  scale_colour_manual(values = c(2, 4))




Generally, the replication is similar to the published figure. However, contrary to what LT15 write and show in their figure, there are only two Canadian lakes in the lower-left quadrant with the Polish lakes. The extra Canadian lakes in the Polish quadrant are not due to mis-classification of the lakes, there are no lakes in these positions in the replication. Other changes are the deletion of two lakes at the left side of the upper left quadrant and the addition of one point at the upper-right corner of the upper right quadrant.

I honestly cannot explain how these differences could have occurred.

Adding the fossil sites passively is most easily done with the function timetrack in Gavin Simpson’s analogue package.

tt <- analogue::timetrack(X = spp, passive = fos, env = env, method = "rda", transform = "sqrt", scaling = scaling)

g + geom_point(aes(x = -RDA1, y = PC1), data = as.data.frame(tt$fitted.values),
      inherit.aes = FALSE, colour = "green", shape = 17) +
  scale_shape_manual(name = "", limits = c("Canada", "Poland", "Fossil"), values = c(15, 18, 17)) +
  scale_colour_manual(name = "", limits = c("Canada", "Poland", "Fossil"), values = c(2, 4, 3))


Well, most of the fossil samples are in the same quadrant as LT15 find them, but the pattern is not the same. I have no idea whether this is because of different species-inclusion rules or something else.

An earlier version of this post contained a faulty version of the timetrack plot due to my misunderstanding of how analogue::join worked. Gavin kindly updated this function so my code now works.

Posted in Peer reviewed literature, transfer function | Tagged , | 6 Comments

Comment on Lyons et al finally published

After many months, our comment on Lyons et al is finally published [link should give free access].

Lyons et al looked at many community data sets from the last 300 million years and tested for pairs of species that are more aggregated (co-occurring) or segregated than expected by chance. They found that modern data sets contain more segregated than aggregated species pairs, with the opposite pattern in fossil data sets and a breakpoint about 6,000 years ago, suggesting a severe impact from early agriculture. This was a surprising finding, especially given the presence-absence data used by the authors.

We argue that the Lyons et al result is a product of:

  • poor data set selection (some of which were addressed by a corrigendum)
  • artefacts in the breakpoint analysis
  • data-set size related artefacts in the identification of segregated and aggregated species pairs

Together, these problems mean that the results of Lyons et al cannot be substantiated.

Lyons and co-authors, predictably, disagree with our conclusion.

Once I get back to Bergen from Makerere, I plan to write something about the reply, and some of our findings that we didn’t have space for in our comment.


Posted in Peer reviewed literature | Tagged | Leave a comment

There is no scientific misconduct …

The youngest patient was two years old when she died, three months after Professor Paolo Macchiarini’s experimental surgery to replace her trachea. Since then, Macchiarini has been accused of scientific malpractice. An external investigation found that he had reported in a paper that ethics approval had been obtained when it had not, that he had misrepresented patients’ outcomes in several papers, and identified other issues rated as malpractice.

The vice-chancellor at the Karolinska Institute, where Macchiarini worked, decided that Macchiarini acted “without due care”, but that his behaviour “does not qualify as scientific misconduct”.

That might have been the end of the matter but for the work of journalists. Vanity Fair reported Macchiarini’s plans to marry the producer of a TV documentary about his work in a ceremony officiated by Pope Francis at the Pope’s summer residence, with guests including Vladimir Putin and Barack Obama. In reality, the Pope’s plans took him to South America rather than to the wedding of the already-married surgeon. Macchiarini’s CV was only slightly less fanciful. Sveriges Television alleged that some Russian patients Macchiarini operated on were not ill enough to warrant such a risky procedure.

The misconduct investigation into Macchiarini’s work was subsequently reopened: he was fired (and is under investigation for manslaughter by Swedish prosecutors); and the vice-chancellor resigned, as did several eminent scientists, including Nobel-prize judges.

Macchiarini’s work hastened the deaths of several patients, yet until pressurised by the media, the Karolinska Institute was prepared to overlook misconduct. There is no scientific misconduct so severe that distinguished scientists might not seek to ignore it. How can we ensure that university investigations into research misconduct (or indeed other types of misconduct) are thorough and fair, and as importantly, seen to be thorough and fair? Quis custodiet ipsos custodes?

Posted in Misconduct, Uncategorized | Tagged | 1 Comment