Chironomid vs pollen: Holocene climate change in southern Europe

Pollen-inferred summer temperature reconstructions from southern Europe show cool early Holocene summers and warmer late Holocene summers (Davis et al 2003; Mauri et al 2015). In contrast, warm early Holocene summers are reconstructed elsewhere in Europe and across most of the mid- to high-latitude Northern Hemisphere, as expected from the increased summer insolation due to orbital forcing.

Since one important use of palaeoclimate reconstructions is to validate climate models by testing whether model simulations of past climate match the climate reconstructed from proxies, it is important to establish whether these cool early Holocene summer reconstructions are valid. Uncritical use of palaeodata to validate models helps nobody.

Time to call in the cavalry: chironomids.

Samartin et al 2017 (which we covered in our reading group last week – not all ideas here are my own) report chironomid-inferred summer temperature reconstructions from two lakes in the Italian mountains. Both lakes (yes, there is replication) show a warm early Holocene, consistent with the insolation changes, marine proxy records and climate models, but not with the pollen-based reconstructions.


Samartin et al figure 3. Chironomid-inferred mean July air temperature for Lakes Gemini and Verdarolo compared with other palaeotemperature records.

Samartin et al use a combined Swiss–Norwegian chironomid calibration set and report that all the fossil samples have fair or good analogues in this calibration set. I would have liked to have seen a plot of the analogue quality in the supplementary material. Otherwise, I like this paper.
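Such a plot is easy to make from the raw data. As a sketch (not Samartin et al's method or data: `calib` and `fossil` are made-up proportion matrices), the minimum squared-chord distance of each fossil sample to the calibration set – a standard measure of analogue quality – can be computed in base R:

```r
# Sketch: minimum squared-chord distance from each fossil sample to the
# calibration set. `calib` and `fossil` are hypothetical matrices of
# proportions (rows = samples, columns = taxa).
set.seed(1)
calib  <- prop.table(matrix(runif(50 * 10), 50, 10), margin = 1)
fossil <- prop.table(matrix(runif(20 * 10), 20, 10), margin = 1)

sq_chord <- function(a, b) sum((sqrt(a) - sqrt(b))^2)

# for each fossil sample, the distance to its closest modern analogue
min_dist <- apply(fossil, 1, function(f) {
  min(apply(calib, 1, sq_chord, b = f))
})

summary(min_dist)
```

Plotting `min_dist` downcore, against a cut-off such as a low percentile of the within-calibration-set distances, would show where the analogues are poor.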

Samartin et al suggest that the pollen-based reconstructions are so different because temperature is not the only control on vegetation in southern Europe. Moisture availability is important, and several millennia of human disturbance have replaced the natural vegetation with “humanized vegetation types” such as garrigue. Thus pollen assemblages from the early Holocene may lack appropriate modern analogues, and so pick inappropriate analogues from higher elevations where there are remnant forests. (In June, I am going to a PAGES workshop on improving pollen-based reconstructions. I must remember that in some cases the best reconstruction is no reconstruction.)

There is one problem with Samartin et al. Nature has strict guidelines about the availability of data:

A condition of publication in a Nature journal is that authors are required to make materials, data, code, and associated protocols promptly available to readers without undue qualifications.

A data availability statement is to be included in papers, which will

report the availability of the ‘minimal data set’ necessary to interpret, replicate and build on the findings reported in the paper.

This is the full data availability statement from Samartin et al:

The chironomid-inferred mean July air temperature records from Gemini and Verdarolo are available at the NOAA National Centers for Environmental Information webpage ( Data from reference 1 (shown in Fig. 3) have been digitized from the original publication. Data from ref. 3 (online resource, ref. 21 (IGBP PAGES/World Data Center for Paleoclimatology Data Contribution Series no. 2006-106) and ref. 46 (IGBP PAGES/World Data Center for Paleoclimatology Data Contribution Series no. 92-007) have been downloaded from the NOAA National Centers for Environmental Information webpage.

Climate model data from refs 36,37 (shown in Fig. 3) are available at the NCAR climate data webpage ( Climate model data from ref. 6 can be obtained from H.R. on request.

Notice anything missing?

There is no mention of where the chironomid calibration set and fossil data are archived or how they can be accessed. I do hope this omission was just an oversight.

There is no better explanation of why the data should be made available than this sentence from the statement.

Data from reference 1 (shown in Fig. 3) have been digitized from the original publication.


Posted in Peer reviewed literature, transfer function | Tagged , , , , , | Leave a comment

Resilience workshop in Finse: deadline soon

At the end of March, there is a workshop on Measuring Components of Resilience in Long-term Ecological Datasets.

The workshop is at Finse, high in the mountains between Bergen and Oslo. Along with debate about how ecosystem resilience can be estimated from palaeoenvironmental data, and work to implement our ideas in R, I can promise there will be lots of snow, sauna and skiing. If you have size 39 feet, I can lend you some skis (other skis will be available). Northern lights are possible if the weather is good (we saw them last year). The food is excellent, and the train line to Finse from Bergen or Oslo is one of the most scenic anywhere.


Not just ptarmigan and arctic hares & foxes at Finse

There are still a few days left to apply. All travel within Norway (i.e. trains to Finse and back), food, and accommodation at the Finse Research Station are provided. Some funds are available to support travel by early-career researchers.


2016 in numbers

Highest temperature: 120 °C

At least that is what the (possibly broken) thermometer in the wood-fired sauna near Tallinn read when I was taken there and beaten with a viht by PhD students Annika and Agata. After a quick plunge into a hole in the ice-covered Viitna Pikkjärv and a discussion of the lake’s palaeoecological record, I repaid the vihtlema, concentrating on the soles of their feet.


The viht and the steam

Longest hiatus between submission and publication: 8 years

The diatom stratigraphy of Gårdsfjärden is now published.

New R packages learnt: three and a half

I started the year having used ggplot2 only once, under duress. I now use it for almost all my figures. I had never used dplyr or tidyr, and thought the pipe syntax (%>%) looked rather odd. I’m now a convert, finding that dplyr makes it easy to write understandable code. I also learnt to use plyr a bit. I find it very useful for some jobs, but wish I wouldn’t get such unfathomable bugs when I accidentally load dplyr and plyr at the same time.
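For the curious, a minimal example of the pipe with dplyr (the data are invented):

```r
library("dplyr")

# hypothetical data: three temperature reconstructions per lake
recon <- data.frame(
  lake = rep(c("Gemini", "Verdarolo"), each = 3),
  temp = c(12.1, 12.5, 12.3, 14.0, 14.4, 14.2)
)

# %>% passes the result of one step to the next, so the code reads
# top-to-bottom rather than inside-out
recon %>%
  group_by(lake) %>%
  summarise(mean_temp = mean(temp))
```

(The unfathomable bugs arise because plyr also has a summarise(); loaded after dplyr, plyr's version masks it and silently ignores the grouping.)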

Coldest temperature: -13 °C

It seemed like a good idea to go skiing before breakfast at the Palaeo-Resilience workshop at Finse. Beautiful but cold before the sun came up. I should have worn a third pair of gloves.

You can get your chance to think about ecosystem resilience while watching the northern lights as you cool down from a sauna this March at the workshop “Measuring Components of Resilience in Long-term Ecological Datasets“. Alas, no viht.


Dawn at Finse

Most giraffes: ~15

In 2015, 15 Rothschild’s giraffes were translocated to Lake Mburo from other Ugandan National Parks in the hope that this red-listed species would re-establish and help to control the acacia that is converting the savannah into woodland.

I went to Uganda for the PhD defence of Perpetra Akite, who was my PhD student. I managed to get away from Kampala for a couple of days to Lake Mburo National Park, where I saw most, perhaps all, of the herd. And lots of zebra.


Steffi & co

Number of blog posts on solar variability: 0

I found a few papers reporting links between palaeoecological data and solar variability that I wanted to discuss, but never found the time. I’ll try to cover more this year as I expect to write fewer posts about Polish lakes.

Number of emails deleted: 431

At least that is how many are in my email trash folder. It seems like many more than that.

Most bison: 0

There are a lot of trees in Białowieża Forest, and it only takes one to hide a bison. Fortunately, I’m going back at the end of January to work on a 52-year phenological data-set (understorey plants, not bison), and the bison should be more visible when they come out of the forest to graze.


Performance of chironomid-temperature transfer functions

Chironomids (non-biting midges) are sensitive to temperature (Eggermont and Heiri 2011), and because their head capsules preserve well in lake sediments, their fossil assemblages can be used to reconstruct past temperature with a transfer function trained on the chironomid–temperature relationship in a modern calibration set of chironomid assemblages and associated temperatures. They are often used to provide an independent temperature reconstruction against which pollen- or macrofossil-inferred vegetation changes can be compared (e.g., Samartin et al. 2012).

The performance of transfer functions for reconstructing past temperatures from chironomid assemblages in lake sediments varies enormously. The root-mean-squared-error of prediction (RMSEP) for transfer functions in a compilation by Eggermont and Heiri (2011), augmented by a few recent studies1, ranges between 0.55 and 2.37 °C. The lowest RMSEP suggests that chironomids give very precise reconstructions; the highest suggests that chironomids have limited utility for anything other than reconstructing glacial–interglacial changes. What causes this 4.3-fold difference in RMSEP? Understanding this might suggest where efforts to improve chironomid–temperature, and other, transfer functions should be focused.
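RMSEP is just the square root of the mean squared difference between observed values and cross-validated predictions. A sketch with a toy weighted-averaging transfer function under leave-one-out cross-validation (synthetic assemblages, not any published calibration set; real implementations, such as those in the rioja package, also deshrink the predictions):

```r
set.seed(42)
n <- 60; m <- 25
temp <- runif(n, 5, 20)   # 'observed' temperatures for n lakes
opt  <- runif(m, 5, 20)   # temperature optima of m synthetic taxa

# synthetic assemblages: Gaussian responses to temperature plus noise
spp <- sapply(opt, function(o) exp(-(temp - o)^2 / 8)) +
  matrix(runif(n * m, 0, 0.1), n, m)

# weighted-averaging prediction for one sample from a training set
wa_predict <- function(train_spp, train_env, test_spp) {
  optima <- colSums(train_spp * train_env) / colSums(train_spp)
  sum(test_spp * optima) / sum(test_spp)
}

# leave-one-out cross-validation: predict each lake from the others
pred <- sapply(seq_len(n), function(i) {
  wa_predict(spp[-i, ], temp[-i], spp[i, ])
})
rmsep <- sqrt(mean((pred - temp)^2))
rmsep
```

The same calculation, with bootstrap rather than leave-one-out cross-validation, underlies most of the published performance statistics.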



why would anyone not trust the author????

I have already shown that ordinations of the Lake Żabińskie fossil assemblages are weirder than you ever imagined possible. Now it is time to look at figure 2 of Larocque-Tobler et al (2015), which shows an ordination of the chironomid assemblages in the modern Polish–Canadian calibration set, with lakes colour-coded by temperature. A number of lakes are marked as “misplaced”, that is, their temperature differs from that of neighbouring lakes in the ordination. This figure should be easy to replicate with the archived data.


Figure 1. Full published version of LT15 figure 2.

Before attempting to replicate this figure, I want to look at its history by examining the versions included in the manuscript that became LT15 (Disclosure: I was one of the reviewers). I understand that this could be viewed as a breach of peer review confidentiality, a policy designed to prevent harm to the authors “by premature disclosure of any or all of a manuscript’s details” (ICMJE, 2016). The duty of confidentiality is not without limitations: “Confidentiality may have to be breached” (ICMJE, 2016). In this case, since the final version of the figure is published and there is no additional information in the review versions, I maintain that the duty of confidentiality was largely discharged on publication – this is not a “premature disclosure”. Further, I argue that my duty to expose possible issues with LT15 outweighs any residual duty of confidence1. I told QSR of my plans to publish this post: they did not object.

When I first reviewed LT15, circles encompassed groups of lakes that supposedly had the same temperatures.


Figure 2. A. Principal Component Analysis (PCA) of calibration set lakes. The lakes, separated by their chironomid assemblages, were grouped by their mean August air temperature (group 1, 3-11°C; group 2, 11-16°C; group 3, 16.3-17.4°C and group 4, 17.5-27.5°C). The plot of the species scores is identical in all versions of figure 2, so is omitted.

Note that the axes of an ordination should have the same scale, but they do not in any version of figure 2. As with Supplementary Data Figure 1, this indicates that the figures were not plotted in CanoDraw or C2, but probably in something like Excel.

I doubted that each circle encompassed all lakes of the correct temperature and omitted all others as the figure implied. In my review I asked

Figure 2. The circles on this plot beg the reader to trust the author. Does each site within each circle fall within the temperature range given? If not, this figure is misleading.

To which the authors responded

Again, why would anyone not trust the author???? … We added the eleven “misplaced” lakes on the graph. The sentence “Of the 122 lakes, 11(9%) had a chironomid composition which did not correspond to those of lakes with similar mean August air temperature” was added.

From this response alone, I can think of eleven reasons not to trust the authors.

I also thought the percent of the variance assigned to the first (46%) and second (31%) axis was implausibly high for inevitably noisy community data.

Reformulated to: PCA Axis 1 explained 22.3% of the variance in the chironomid data and PCA Axis 2 explained 17.5%.

No explanation for this abuse of the verb “reformulate” was given.

The second review version of the manuscript had a new version of figure 2.


Figure 3. Second review version.

This figure should be identical to the original, except for the symbols showing which lakes are “misplaced” (and which country each lake belongs to). Indeed it mostly is, but curiously, some lakes have moved position. One lake has vanished from the upper left quadrant, which is otherwise identical. The lower left quadrant is identical. The right-hand side has both additions and deletions. I have no idea what is going on here. It cannot be a change in the data transformation used, as that would affect all lakes (and the species scores), as would alterations to the data.

Now compare the second version of figure 2 with the published version.


Figure 4. Published version.

Now there are arrows to show where each lake should belong, and a different temperature scale which only extends to 23°C whereas the paper reports lakes as warm as 27.5°C. Where are the four lakes warmer than 23°C? And why are there only 103 lakes not 121? Where are the missing 18 lakes?

The location of lakes in the ordination should be exactly the same as the previous versions. Most lakes are in the same position, but the lower right quadrant is missing a cluster of several lakes, and the upper left quadrant is greatly thinned.

There are now only eight lakes marked as being misplaced. Strangely, six of these eight misplaced were not marked as misplaced in the second review version. Indeed, why would anyone not trust the author?

But what of the archived data? Can any version of figure 2 be replicated? Remember, the figure shows a PCA so it should be possible to reproduce the figure exactly (with the exception that mirror images are possible).

A principal component analysis of square-root transformed species data generates an ordination that looks very similar to those shown above. In the figures below, I’ve inverted both axes so they match the published figures.

It is not clear how the temperature scale has been cut into levels for the published figure. What happens to points between 10 and 11°C or between 16 and 17°C? I’ve done my best.
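My best is R's `cut()`, whose default intervals are open on the left and closed on the right, so a lake at exactly 11°C falls into the cooler class – a behaviour worth checking explicitly:

```r
# with breaks = c(3, 11, 16, 17.4, 27.5) and include.lowest = TRUE,
# the classes are [3,11], (11,16], (16,17.4] and (17.4,27.5]
x <- c(10.5, 11, 11.1, 16.2)
cut(x, breaks = c(3, 11, 16, 17.4, 27.5), include.lowest = TRUE)
# 10.5 and 11 fall in [3,11]; 11.1 in (11,16]; 16.2 in (16,17.4]
```

No single set of breaks can reproduce the published groups exactly, as they have a gap between 16 and 16.3°C.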

library("vegan")    # for rda()
library("ggvegan")  # fortify() method for ordination objects
library("ggplot2")

keep <- colSums(spp > 0) >= 3 # Eukiefferiella fittkaui is included

pca <- rda(sqrt(spp[, keep]))
fpca <- fortify(pca, display = "sites", scaling = 1)

lab <- labs(x = "PCA1", y = "PCA2", colour = "Temperature °C")
cols <- scale_colour_manual(values = c("skyblue", "salmon", "black", "green"))

# temperature cuts to match version 2
fpca$temp <- cut(env, breaks = c(3, 11, 16, 17.4, 27.5), include.lowest = TRUE)
fpca$country <- c(rep("Poland", 48), rep("Canada", 73))

g <- ggplot(fpca, aes(-Dim1, -Dim2, colour = temp, shape = country)) +
  geom_point(size = 2) +
  coord_equal() + lab + cols +
  geom_vline(xintercept = 0) +
  geom_hline(yintercept = 0)
g


Figure 5. Attempt to replicate review versions of figure 2

The arrangement of lakes in the ordination greatly resembles the first review version of the analysis, except that the lakes above 23°C (which all lie in or close to the left side of the figure) are not to be found in the first review version, and the cluster of lakes on the lower right of the first review version does not exist in the replication.

Some of the lakes marked as misplaced in the second review version, especially those in the lower left quadrant, actually have the expected temperature. Curiously, some of the lakes marked as being Canadian in the review copy are actually Polish. As with the issues with Supplementary Data Figure 1, this acts to make the Polish and Canadian assemblages appear more similar than they really are.

The widespread mixing of lakes of different temperature classes means the implicit claim that there is a clean separation of lakes into different temperature classes demarcated by circles is misleading.

# find missing lakes (identified interactively, then hard-coded)
# plot(-fpca$Dim1, -fpca$Dim2)
# abline(v = 0, h = 0)
# missing <- identify(-fpca$Dim1, -fpca$Dim2)
missing <- c(3, 48, 66, 67, 68, 69, 70, 71, 72, 73, 114, 115, 116, 117, 118, 119, 120, 121)

# temperature cuts to match published version
fpca$temp <- cut(env, breaks = c(3, 11, 16, 23, 28), include.lowest = TRUE)
g %+% fpca +
  geom_point(data = fpca[missing, ], colour = "red", pch = 21, size = 3)

Figure 6. Attempt to replicate published version of figure 2. Lakes enclosed by a red circle have been omitted from the published figure.

The arrangement of lakes also greatly resembles the published figure, with the major exception that the cluster of lakes on the upper far left has been greatly thinned and the lakes above 23°C omitted. Some of the misplaced lakes have arrows attached, others are omitted. Omitting misplaced lakes makes the relationship between chironomid assemblages and temperature appear stronger than can be justified by the data.

The amount of variance explained by the first two axes is considerably lower than that claimed by LT15.

PCA Axis 1 explained 22.3% of the variance in the chironomid data and PCA Axis 2 explained 17.5%.

pc <- pca$CA$eig[1:2] / pca$tot.chi * 100
round(pc, 1)
##  PC1  PC2
## 16.8 11.4

The question arises as to whether the lakes were deleted before or after the ordination was run. This can be tested by running the ordination on the data without the missing lakes.

pca2 <- rda(sqrt(spp[-missing, keep]))
fpca2 <- fortify(pca2, display = "sites", scaling = 1)

# temperature cuts to match version 2
fpca2$temp <- cut(env[-missing], breaks = c(3, 11, 16, 17.4, 27.5), include.lowest = TRUE)

ggplot(fpca2, aes(-Dim1, -Dim2, colour = temp)) +
  geom_point() +
  coord_equal() + lab + cols +
  geom_vline(xintercept = 0) +
  geom_hline(yintercept = 0)

Figure 7. PCA with missing lakes deleted prior to running the ordination.

While this figure is similar to the published figure, the configuration of lakes in the ordination of the full dataset is a much better match, so it appears that the lakes were lost after the ordination was run.
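The visual comparison can be made quantitative with a Procrustes rotation (vegan's procrustes()), which rotates and scales one configuration to best match the other. A sketch on simulated counts standing in for the calibration set (not the real data):

```r
library("vegan")

# simulated stand-in for the 121-lake calibration set
set.seed(1)
spp_sim <- matrix(rpois(121 * 30, lambda = 2), nrow = 121)
missing_sim <- c(3, 48, 66:73, 114:121)

pca_full    <- rda(sqrt(spp_sim))
pca_reduced <- rda(sqrt(spp_sim[-missing_sim, ]))

# rotate the reduced configuration onto the shared sites of the full one;
# a small sum of squares indicates near-identical configurations
pro <- procrustes(
  scores(pca_full, display = "sites", choices = 1:2)[-missing_sim, ],
  scores(pca_reduced, display = "sites", choices = 1:2)
)
pro$ss
```

protest() adds a permutation test of the same comparison.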

LT15 figure 2 cannot be replicated (neither by me nor by the authors). My evaluation of the different versions of this figure and my attempted reanalysis hint that the analysis may have been tampered with, omitting lakes that did not fit the narrative of “chironomids = temperature”.

If we consider the implausibly precise reconstruction, the false claim that counts were all at least 50 chironomids, the data that evolved into impossible non-integer counts, the weird ordinations, the failed reconstruction diagnostics, the missing rare taxa and now a possibly tweaked ordination, it becomes clear that the question

why would anyone not trust the author????

has many, many answers.

I have been assured by someone who ought to care about the wandering/disappearing lakes, that figure 2

does not amount to prima facie evidence of data fabrication2 …  Other explanations are equally possible, if not more so.

What these other explanations might be was left to my imagination, which rapidly conjured time-travelling sub-aquatic arachnids devouring the chironomids, and thereby changing the ordinations. Discounting these manifestations, I suppose my correspondent was obliquely suggesting that the three erroneous versions of figure 2 are the result of a succession of errors (nearly all of which happen to make the relationship between chironomids and temperature appear stronger than it is).

The errors in figure 2, and Supplementary Data Figure 1, are the direct result of the odd choice to use Excel for plotting, with the attendant (but minimal) risk of copy-paste errors between CANOCO and Excel. This risk would have been avoided entirely by using CanoDraw, which the authors have used previously (see Fig 4 in Larocque et al 2001: JoPL).

Wise counsel enjoins us never to attribute to malice that which is adequately explained by negligence. And everybody makes mistakes: I hope the errors in my papers are neither numerous nor critical. It is not my role to determine how the errors, inconsistencies and improbable results in LT15 came to be. Others have this task of determining whether the results in LT15 are the result of extreme good luck and sloppiness, or something else.

But what I know is this: LT15 is not credible. And, whatever the cause of the problems in LT15, I would read all of the first author’s papers with a certain amount of caution.

  1. Other avenues were explored before publishing this series of posts. 
  2. This is not the f.* word I would have used. 

The breakpoint is broken and other tales from the comment on Lyons et al

A few weeks ago, Nature finally published the comment I wrote with colleagues on Lyons et al (2016). In this post, I explore some of the details that did not fit into the constraints of a 1200 word comment, and address some of the issues raised in the reply by Lyons et al.

How foreign is the past?

In his News and Views, Dietl (whose article I have already written about), argues that

the world we live in today … has no analogue in the geological past. As a consequence, referencing ‘natural experiments’ in the distant past as a guide to predict what might happen, now or in the future, is a flawed strategy. Out is the use of uniformitarianism as a guiding principle, and in is a new kind of ‘post-normal’ science.

If this is true, then one of the major justifications for palaeoecology is invalid. Before giving up, it’s worth checking whether Lyons et al, the paper Dietl is promoting (and was a reviewer of), is well founded.

Lyons et al, in their paper “Holocene shifts in the assembly of plant and animal communities implicate human impacts”, collate presence-absence assemblage data-sets from the last 300 million years. Within each data-set they determine the co-occurrence structure – are taxa aggregated or segregated?


Figure 1. In the left panel, the taxa are distributed across the landscape at random; in the centre panel, the taxa don’t co-occur: they are segregated; in the right panel, the taxa are aggregated. If species were all as easy to identify as this, I could be a field ecologist. (Source: Dietl 2016)

Lyons et al find that the proportion of aggregated taxon-pairs begins to change in the mid-Holocene, about 6000 years ago. This change

coincided with increasing human population size and the spread of agriculture in North America

They conclude

The organization of modern and late Holocene plant and animal assemblages differs fundamentally from that of assemblages over the past 300 million years that predate the large-scale impacts of humans. Our results suggest that the rules governing the assembly of communities have recently been changed by human activity.

The authors’ favoured hypothesis is that habitat fragmentation and dispersal limitation are responsible.


Figure 2. Changes in the proportion of aggregated taxon-pairs over the last three hundred million years. (Source Lyons et al.)

North American mid-Holocene agriculture

If mid-Holocene agriculture was responsible for changing the structure of animal and plant assemblages, one might expect that agriculture to have been intense and extensive. Zeder & Smith (2009), one of two papers by Smith cited by Lyons et al as evidence of mid-Holocene agriculture, write

Five different cultivated varieties of crop plants were brought under domestication in eastern North America between 5000 and 3800 cal. BP: squash Cucurbita pepo, sunflower Helianthus annuus var. macrocarpus, marsh elder Iva annua var. macrocarpa, and two cultivated varieties of chenopod Chenopodium berlandieri (Smith 2006). Clear evidence of the development of a crop complex based on these indigenous eastern seed plants is present by 3800 cal. BP (Smith and Yarnell 2009), but indications of a dominant subsistence role for any crop plant do not appear until after AD 700, when a widespread shift to maize‐centered agriculture is documented by a dramatic change in stable carbon isotope values in human burial populations (Smith 1990).

It would appear that the claim that agriculture at 6000 BP had fundamentally affected the ecological patterns has fallen at the first hurdle: there is no evidence of agriculture this early in eastern North America. That is not to say that humans had no impact on the landscape: just ask any giant sloth.

Poor data-set selection

Any analysis critically depends on the quality and relevance of the data to the hypothesis being tested. Unfortunately, there are serious issues with the modern data used by Lyons et al. They use:

  • 53 fossil mainland data-sets
  • 48 modern mainland data-sets
  • 53 modern island data-sets

The modern island data are only used in the breakpoint analysis (see below) but their relevance is doubtful as different biogeographic processes are important on islands and none of the fossil data-sets are from islands. If Lyons et al were convinced that they were comparable, they would have used them throughout.

Most of the modern mainland data-sets (and all the island data) are a subset of data from a compilation by Atmar & Patterson (1995) for testing their nestedness temperature method. Gotelli & Ulrich (2010) are cited, but they just used the earlier compilation. Once I had been sent the list of data-sets used, I was able to track down the metadata for each data-set. There were a large number of issues in the modern mainland data. For example,

  • Two of the data-sets from the Canary Islands were mis-categorised as being mainland.
  • Two data-sets were perfect duplicates of each other.
  • One data-set was a transposed (taxa became sites, and vice versa) near-duplicate of another data-set (Atmar & Patterson had not stored the species or site names so this was only obvious when checking the source publication).
  • Some of the data-sets were very small. The smallest island data-set had only four samples; the smallest mainland data-set had seven.

There were many overlapping data-sets. For example:

  • Two data-sets of Western Australian snails differed only in how sub-species were treated.
  • The bird assemblages in 12 Illinois woods sampled in 1979 and 1980 were treated as independent data-sets.
  • A total of 12 data-sets covered rodents from three habitats in four deserts from the SW USA. The 45 samples from Sonoran desert scrub data-set were included in the 48 sample Sonoran all habitats data-set. They were also included in the all-desert scrub data-set and the all-desert all-habitat data-set.

These overlaps should not have been a surprise as they are documented by Wright et al (1997).

These duplicate and near-duplicate data-sets will probably bias the analysis, as they give extra weight to the duplicated data and underestimate the p-value. A similar problem occurs in the fossil pollen data, which include 1000-year time slices of the North American pollen database.

Some of the modern data-sets are from dispersal-limited systems, such as ‘sky-islands’ of boreal forest on mountains in the desert and terrestrial cave invertebrates. The logic that excludes the real island data-sets should also exclude these.

Although Lyons et al are hypothesising that human activities, especially agriculture, are responsible for the change in assembly rules, some of the included data-sets are of minimal relevance to this hypothesis, for example the cave invertebrates mentioned above and the invertebrate assemblages colonising jars of artificial lake water at different distances from a Texan lake.

Lyons et al reply

Lyons et al published a corrigendum that dealt with the mis-categorised, the completely duplicated and the transposed data-sets. It also added some accidentally omitted island/marine data-sets (marine fouling organisms on tile plates suspended underneath Duke University Marine Laboratory dock) which should have been deliberately excluded as irrelevant.

Perhaps not surprisingly, the authors disagree with our assertion that many of the modern data-sets are inappropriate for comparison with fossil data-sets.

Their main argument in this section seems to be that since the modern landscape is highly fragmented, data-sets from relict or patchy habitats should not be excluded. We did not argue that data-sets from landscapes that have been fragmented by human activity should be excluded, but that naturally dispersal limited systems, akin to islands, should be excluded for the same reason that islands are excluded. Most of the data-sets we think should be excluded as ‘islands’ are also small, so should have been excluded because they were small anyway (see below).

The authors’ rejoinder that some fossil data-sets were from edaphically distinct habitat islands is less persuasive when you look at the papers cited and see that they refer to floodplain deposits – which tend to be linear systems rather than islands.

Lyons et al re-run their analysis without small, overlapping, or dispersal limited modern data-sets and report that the results are consistent with the original results. The mean proportion of aggregated taxon pairs for the modern data-sets rose from 0.33 to 0.53 with the strictest criteria (no overlaps, n = 50) which left eight modern data-sets out of the original 50 (there was some confusion as to which data-sets had originally been included).

However, Lyons et al explicitly did not delete any pollen assemblages

because they are separated by 1,000 to 20,000 years, do not necessarily contain the same cores, and vary in their geographic extent. They are not subsets of one another, even if some of the genera and sites may recur.


Figure 3. Proportion of pollen cores in each 1000 year time-slice that were in the previous time-slice.

For the Holocene, typically over 90% of pollen cores in each time-slice are shared with the previous time-slice. The geographic area is typically very similar between adjacent time slices. Treating each time-slice as if it is independent over-inflates the importance of the pollen data, which constitutes the vast majority of data-sets in the 100 – 100,000 year range.

The breakpoint is broken

Lyons et al use a breakpoint analysis to determine when the change in proportion of aggregated taxon pair starts. This analysis fits separate regression lines to the data either side of a breakpoint, the position of which is estimated. The breakpoint analysis gives the authors the headline result that there is a mid-Holocene change in assembly rules and hence that human activity might be responsible.
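Lyons et al fitted this model with the segmented R package; the approach can be sketched on synthetic data (the variable names are mine, and this is not their data):

```r
library("segmented")

set.seed(7)
# synthetic: flat until log(age) = 4, then rising with age
log_age <- runif(100, 2, 8)
prop_agg <- 0.4 + 0.05 * pmax(log_age - 4, 0) + rnorm(100, sd = 0.03)

m0 <- lm(prop_agg ~ log_age)                    # ordinary linear model
m1 <- segmented(m0, seg.Z = ~log_age, psi = 5)  # psi: starting breakpoint

m1$psi        # estimated breakpoint position and its standard error
confint(m1)   # the confidence interval is the output that matters here
```

With a clear breakpoint and little noise, as here, the method behaves well; the question is how it behaves on the noisy Lyons et al data.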


Figure 4. Breakpoint analysis from Lyons et al.

The confidence intervals in figure 4 are clearly wrong. The confidence intervals appear to be drawn for each regression line separately rather than for the model as a whole.

All the data, island and mainland, are used in the breakpoint analysis

  The breakpoint analysis was run using all the data resolved to the best possible dates to allow for the greatest amount of power in detecting where the switch occurred.

But the reader is assured that

the results were similar when island data were excluded.

The paper only reports the results with the island data included. So I asked for the results without the island data (which I think should have been given, at least in supplementary material).

With the island data included, the breakpoint occurred 5998 years ago with a 95% confidence interval of 40 – 893,000 years. This breakpoint could have occurred between the mid-Pleistocene transition and 1970. Difficult to pin this on early agriculture.

Without the island data, the breakpoint is at 5998 years and the 95% confidence interval is 7 – 4,940,000 years. So the estimate is exactly the same, even though a third of the data have been removed, and the confidence interval now extends into the mid-Pliocene.

The authors interpret the extreme stability of the estimate as evidence that the result is robust. We viewed it as an indicator of pathological behaviour in the segmented R package used. So we (or rather Joe) re-implemented the breakpoint analysis using Markov chain Monte Carlo.

Re-running the breakpoint analysis


Figure 5. The result of our breakpoint analysis.

Importantly, this is the same underlying model that Lyons et al use; all that has changed is the fitting mechanism. When fitted with one breakpoint, the model finds a mid-Holocene breakpoint with extremely wide credible intervals. However, a model with no breakpoint has a lower deviance information criterion score. The timing, and indeed existence, of a breakpoint is highly uncertain. The claim of a mid-Holocene breakpoint by Lyons et al is very weak.

Lyons et al reply

Apparently we “misconstrue” the authors’ use of the break-point analysis.

We did not use it as a formal statistical hypothesis test. … Rather, we used the break-point analysis to identify the approximate time the decline became more pronounced.

Why don’t they just admit that the breakpoint analysis was flawed?

They report that the most parsimonious model is one that does not have a breakpoint but instead an “exponential decline” in the proportion of aggregated pairs through time.

This is becoming misleading. It might suggest that we used an exponential model as an alternative to a linear model with a breakpoint. We did not. Following Lyons et al, we use log(age) as the predictor. Since we find no evidence for a breakpoint, we prefer the simpler no-breakpoint model. The choice by Lyons et al of a log(age) predictor has the natural consequence that any change is exponential against age. The exponential decline is not a finding of the model: it is an assumption of the model. Nor is it a very sensible model. For example, it is now almost a year after the paper was published: what is the predicted percentage of aggregated taxon pairs? You cannot tell: log(-1) is not a number. And at age = 0, the model predicts that the percentage of aggregated taxon pairs reaches minus infinity. Clearly, the domain in which the model is valid is restricted.
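The point about the log(age) predictor can be made concrete with a toy model (the coefficients and ages here are invented): a model that is linear in log(age) is necessarily exponential against age itself, and it cannot make predictions at or beyond the present.

```r
# Invented, noise-free values: proportion aggregated as a linear
# function of log(age)
age <- c(100, 1000, 10000, 100000, 1e6)
prop <- 0.6 + 0.02 * log(age)
fit <- lm(prop ~ log(age))

# The fitted model is linear in log(age), i.e. exponential in age
coef(fit)[["log(age)"]] # recovers the slope, 0.02

# The model breaks down at and beyond the present:
log(0)                    # -Inf: prediction at age = 0 is minus infinity
suppressWarnings(log(-1)) # NaN: no prediction for negative ages
```

So the "exponential decline" is baked into the model specification, not discovered from the data.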

Contrary to what Lyons et al argue in their reply, we did not confirm that the decline is non-linear, nor that the decline becomes steeper in the Holocene. These are direct consequences of the model they selected.

It is interesting to contrast the little emphasis the authors would now give to the breakpoint with quotes from the authors in press coverage at the time the paper was published (I’m just taking quotes as other aspects of the press release might have been written by journalists).

When early humans started farming and became dominant in the terrestrial landscape, we see this dramatic restructuring of plant and animal communities. Gotelli

The fraction of plant and animal species that were positively associated with each other was mostly unchanged for 300 million years. Then that fraction sharply declined over the last 6,000 years. Waller

the study even provides a way to date the start of the anthropocene. Waller

This tells us that humans have been having a massive effect on the environment for a very long time. Lyons

The pattern of co-occurring species remained stable through the evolution of land organisms from the earliest tetrapods through dinosaurs, flowering plants and mammals. This pattern didn’t change because of previous mass extinctions or ancient climate variability, but instead, early human activities 6,000 years ago suddenly began resetting a basic property of natural communities. Behrensmeyer

But more and more, research into the Holocene and the Pleistocene is showing that when humans started migrating around the globe, they started having major, unprecedented impacts. Lyons

Biases in the statistics

Often the interesting figures that show problems with a paper are buried deep in the supplementary material. So it is with Lyons et al. The supplementary material includes the presence-absence matrices for each fossil data-set and the taxon co-occurrence plots.


Figure 6. Upper panel – presence-absence matrix for the North America pollen 0 ka with taxa ordered by occupancy and sites ordered by richness. Lower panel – co-occurrence plot with taxa ordered by occupancy.

Many of the co-occurrence plots resemble figure 6, with aggregated and segregated taxon pairs occupying distinct regions of the plot. The general pattern is this: pairs of two very rare (bottom left) or two very common taxa (top right) cannot be identified as either aggregated or segregated because there is not enough statistical power. Segregated pairs tend to have one common and one rare taxon (top left). Aggregated pairs tend to have taxa with similar occupancies (near the diagonal).

Since it is reasonable to expect aggregated and segregated taxon pairs to exist over (almost) the full range of occupancies, the observed pattern probably reflects differences in the statistical power available to detect aggregation and segregation at different levels of occupancy.
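The asymmetry in power can be illustrated by simulation. With one common and one rare taxon that never co-occur, a test of association can reach significance; with two very rare taxa, even perfect segregation is undetectable. A sketch using Fisher's exact test as a stand-in for whatever null-model test is applied (the counts are invented):

```r
# 2x2 tables of site counts over 100 sites:
# cells are both-present, A-only, B-only, neither

# Common taxon (60 sites) and rare taxon (5 sites), never co-occurring
common_rare <- matrix(c(0, 60, 5, 35), nrow = 2)

# Two rare taxa (3 sites each), never co-occurring
rare_rare <- matrix(c(0, 3, 3, 94), nrow = 2)

fisher.test(common_rare)$p.value # < 0.05: segregation is detectable
fisher.test(rare_rare)$p.value   # > 0.05: no power to detect anything
```

The second table is perfectly segregated, yet the expected overlap under the null is so small that the observed zero overlap carries no evidence.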

Realising there were problems with statistical power, I decided to check if the proportion of aggregated taxon pairs was affected by data-set size. I am surprised that this does not appear to have been done before. Small data-sets will have less statistical power, but will detection of aggregated and segregated pairs be equally affected? The first simple test of this is to plot the proportion of aggregated taxon pairs against data-set size.


Figure 7. Proportion aggregated taxon pairs against data-set size.

There is a strong relationship between data-set size and the proportion of aggregated pairs. Very small data-sets have a low proportion of aggregated pairs. The proportion stabilises once there are more than ~50 sites in the data-set.

The relationship can be explored more directly by taking different sized random sub-samples of a large data-set (ensuring that the number of species is constant). I did this for four data-sets. There are strong, but perhaps inconsistent, patterns in each data-set.
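The subsampling procedure can be sketched as follows: draw random subsets of sites from a presence–absence matrix, rejecting any draw that loses a taxon so that the number of species stays constant, and then recompute the co-occurrence statistic on each subset. A sketch on a random matrix (the co-occurrence step itself is omitted; the matrix is synthetic, not one of the Lyons et al data-sets):

```r
set.seed(42)
# Synthetic presence-absence matrix: 200 sites x 30 taxa
pa <- matrix(rbinom(200 * 30, 1, 0.3), nrow = 200)

subsample_sites <- function(pa, n_sites) {
  repeat {
    sub <- pa[sample(nrow(pa), n_sites), , drop = FALSE]
    # keep the number of taxa constant: reject draws that lose any taxon
    if (all(colSums(sub) > 0)) return(sub)
  }
}

sub <- subsample_sites(pa, 50)
nrow(sub)             # 50 sites
all(colSums(sub) > 0) # no taxon lost
```

Repeating this over a range of subset sizes and plotting the proportion of aggregated pairs against size gives curves like those in figure 8.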


Figure 8. Effect of number of observations on percentage of aggregated taxon pairs for four data-sets used by Lyons et al. US desert rodents (a), Holocene mammals (b), 1,000-year-old North American pollen (c), and 1950 Wisconsin understorey vegetation (d). Grey band shows 95% confidence interval of a local regression smoother fitted with a quasi-binomial distribution.

Holocene mammals and Wisconsin understorey vegetation have the most similar patterns. Initially, there is little change in the proportion of aggregated taxon pairs as the data-set is reduced in size. Then there is a slight rise followed by a strong fall.

The North American pollen 1ka data-set had a similar pattern but is noisier because there are fewer non-random pairs in this data-set.

The desert rodent data-set shows only a rise in the proportion of aggregated pairs. Very few subsets with fewer than fifty sites had any non-random taxon pairs, so it is not possible to tell what happens for small data-sets.

Part of this inconsistency probably relates to patterns of occupancy within the data-sets.

Lyons et al reply

Lyons et al ignore figure 7 completely and focus on the inconsistent patterns shown in figure 8. Yes, the patterns are inconsistent – though probably understandable with more work – but the existence of any relationship with data-set size should worry the authors.

They claim that the effect is only found with very small samples

Using data from Telford et al. simulations, we note that the purported sample size effect in their extended data fig. 1 is found only at low sample sizes (rarefying to ten samples). Even slightly elevated sampling removes this effect; when rarefying to 20, two datasets were non-significant and two showed a relationship in the opposite direction.

They are trying to fit a linear trend to data with a non-linear relationship. Not a cunning plan.

Lyons et al repeat their analyses without the small data-sets and report that

Redoing our analyses while eliminating datasets with less than 10, 20 or even 50 sites does not change our results substantially (Extended Data Table 1).

With the n = 50 criterion, which will remove most of the sample size bias, only eight modern mainland data-sets remain out of the initial fifty. When I subset the data – it is in the supplementary material – I find nine data-sets meet the criteria.

library("readxl")
library("dplyr")

lyons <- read_excel("nature20097-s1.xlsx")

modern <- lyons %>%
  filter(numSites >= 50,
         age <= 100,
         land.island.ete.corrected != "island",
         land.island.ete.corrected.overlapping.removed != "NA") %>%
  select(Dataset, numSites, percAgg)


Dataset                    numSites   percAgg
------------------------   --------   -------
crypticTaxa                     171      0.66
NorthAm-mammals-Modern           67      0.41
WI-overstory-SPP-1950           168      0.53
WI-understory-SPP-1950          152      0.46
WI-overstory-SPP-2000           168      0.62
WI-understory-SPP-2000          152      0.44
NorthAm-pollen-0ka              252      0.30
swusrodco.txt                   202      0.53
wausnailco.txt                   55      0.82

However, this includes two pairs of data-sets that were resampled and should have been excluded under the no overlaps criterion. That leaves just seven modern data-sets. The analysis will be very sensitive to the properties of these data-sets.

Next Lyons et al stratify the data-sets into three groups:

Deep Time (>1 million years ago; n = 29 data sets), Shallow Time (1 million–100 years ago; n = 24), and Modern (<100 years ago; n = 46).

They argue that the deep time and modern data-sets have similar sizes but differ in the proportion of aggregated taxa, and hence that our objections are ill-founded. I’m not going to criticise what they do here, except to note that they appear to have used a t-test, which won’t perform well with the non-normally distributed numbers of sites, and that a Wilcoxon test is significant at the p = 0.05 level. I will quote this though:

Third, in re-testing for a Holocene shift, Bertelsmeier and Ollier … break the data into three arbitrary time bins that bear no correspondence to accepted geological time periods. Their ‘ancient’ is 300 million years ago to 1 million years ago, which encompasses two of the largest recorded mass extinctions, the break-up of Pangaea, considerable changes in climate, the rise of angiosperms and a change from dinosaur- to mammal-dominated terrestrial ecosystems. Moreover, their definitions merge all Pleistocene and non-modern Holocene data into a single bin, obscuring a Holocene shift.

What drives aggregated and segregated taxon pairs

Non-random co-occurrence patterns can be driven by a variety of factors.

  • Large scale dispersal limitations (polar bears and walrus vs penguins)
  • Niche separation (polar bears vs pandas)
  • Biotic interactions (mutualists, competitive exclusion)
  • Small scale dispersal limitations (between islands, habitat fragments)

Because of the large spatial scale of some of the data-sets in Lyons et al, niche separation is important, but probably little affected by human activity. For example, looking closely at the North American pollen 0 ka slice (figure 6), we see that of the 48 segregated pairs, 13 (27%) include Rhizophora (species 33, red mangrove). Rhizophora is only found in Florida mangrove swamps and does not co-occur with temperate and boreal trees. Humans have not altered this relationship, but because Rhizophora only occurs in the most recent two pollen time-slices (probably because sea-level rise submerged early Holocene mangrove swamps), it contributes to the apparent decrease in the proportion of aggregated taxon pairs.


It is exciting to see palaeoecological data being used to address important ecological questions, but devastating that such bold claims are supported by inappropriate data and analyses.

Given the number and severity of problems with Lyons et al (2016), we thought that they might retract the paper. Instead they essentially admit to all the problems, but carry on regardless.


OxCal and R

OxCal is perhaps the most powerful of the three main Bayesian age-depth modelling procedures, with many, many options and the potential for building far more complicated models than your typical palaeolimnologist needs. Unlike Bchron and Bacon, OxCal is not R based. Much as I want to use OxCal, I also want to use R.

In this post, I show how OxCal models can be built in R, the OxCal command line program called from R, and the output imported into R and plotted. The code does not deal with all possible outputs from OxCal (it may need altering if your model is different from mine) and absolutely does not absolve you from checking the log files for any problems.

You need to start by downloading OxCal and installing it somewhere convenient on your computer.

We need a couple of packages for data manipulation and plotting.

library("dplyr")   # data manipulation
library("ggplot2") # plotting

And we need some radiocarbon dates. Here are a few of the dates from a model I’m currently working on (the full set of dates takes about six hours to process).

dates <- read.table(header = TRUE, text =
"depth_corr    Lab_Code    raw_14C_age    error
312.5    X-1    16010    100
352.0   X-2    16670     70
406.5    X-3    17730    130
440.5    X-4    18690    120
490.5    X-5    20150    220
535.5    X-6    20930    170")

OxCal was written by archaeologists so you need to think like an archaeologist to use it: oldest dates need to come first in the model. (I suspect many palaeolimnologists get this wrong and then wait days before the model refuses to converge. At least I’ve done this a couple of times.)

##sort so oldest first
dates <- dates[order(dates$depth_corr, decreasing = TRUE), ]

I want to use a P sequence model to fit an age-depth model to the dates. I’m using the Marine13 calibration curve with a local reservoir offset of 40 +/-30 years. I’m using the P sequence model with a k selected from a prior distribution by OxCal and with an outlier model. All these commands and their arguments are explained in the on-line help for OxCal. writeLines prints the formatted model to the screen. I’ve used lots of white space (don’t worry, it’s free) to make the model human readable.

#Calibration information
curve <- ' Curve("marine13");'
DR <- ' Delta_R("Local Marine",40,30);'

#The dates
ageDepths <- with(dates,
  sprintf('  R_date("Dep%s",%d,%d){\n    z=%s;\n    Outlier(0.05);\n  };',
    depth_corr, raw_14C_age, error, depth_corr)
)
ageDepths <- paste(ageDepths, collapse = "\n")

#Putting it all together
commandp <- paste(
  curve,
  DR,
  ' Outlier_Model("General",T(5),U(0,4),"t");',
  ' P_Sequence("k",1,1,U(-2,2)){',
  '  Boundary("Bottom");',
  ageDepths,
  '  Boundary("Top");',
  ' };',
  sep = "\n")

#Wrapping in Plot()
commandp <- paste('Plot(){', commandp, '};', sep = "\n")
writeLines(commandp)
## Plot(){
##  Curve("marine13");
##  Delta_R("Local Marine",40,30);
##  Outlier_Model("General",T(5),U(0,4),"t");
##  P_Sequence("k",1,1,U(-2,2)){
##   Boundary("Bottom");
##   R_date("Dep535.5",20930,170){
##     z=535.5;
##     Outlier(0.05);
##   };
##   R_date("Dep490.5",20150,220){
##     z=490.5;
##     Outlier(0.05);
##   };
##   R_date("Dep440.5",18690,120){
##     z=440.5;
##     Outlier(0.05);
##   };
##   R_date("Dep406.5",17730,130){
##     z=406.5;
##     Outlier(0.05);
##   };
##   R_date("Dep352",16670,70){
##     z=352;
##     Outlier(0.05);
##   };
##   R_date("Dep312.5",16010,100){
##     z=312.5;
##     Outlier(0.05);
##   };
##   Boundary("Top");
##  };
## };

Once we’re happy with the model, we can save the file and call the OxCal command line utility. There are different versions of this for Linux, Mac and Windows.

writeLines(commandp, con = "oxcal_demo.input")
system(paste("~/oxcal/OxCal/bin/OxCalLinux", "oxcal_demo.input"))

Depending on how complicated your model is, you now have time for a cup of tea or a round the world cruise. During this time, R will sit there apparently doing nothing, but in the background, OxCal is busy shunting electrons round.

OxCal will produce three files, oxcal_demo.log, oxcal_demo.txt, and oxcal_demo.js. You need to look in the .log file and check for any problems. The .txt file contains a summary of the model output and the .js file contains all the details. It is this file we need to read into R. Unfortunately, it is a large and complicated file that takes a bit of effort to read. I’ve made some functions to help import the data into a data.frame that I can plot with ggplot. The functions extract what I need – they will need modifying if you need other information or if you have used a different naming convention for the dates.

get.values <- function(js, target, name = "value", numeric = TRUE) {
  x <- js[grep(target, js)]
  val <- gsub(".+=(.+);", "\\1", x)
  lab <- gsub("^(ocd\\[\\d+\\]).*", "\\1", x)
  if (numeric) {
    val <- as.numeric(val)
  }
  res <- data_frame(lab, val)
  setNames(res, c("lab", name))
}

get.vectors <- function(js, target) {
  x <- js[grep(target, js)]
  lab <- gsub("^(ocd\\[\\d+\\]).*", "\\1", x)
  val <- gsub(".+=\\[(.+)\\];", "\\1", x)
  val <- strsplit(val, ", ")
  res <- lapply(val, as.numeric)
  setNames(res, lab)
}

get.OxCal <- function(js, posterior = TRUE) {
  what <- if (posterior) "posterior" else "likelihood"
  depths <- get.values(js, "\\.data\\.z", "depth")
  comment <- get.values(js, paste0(what, "\\.comment\\[0\\]"), "comment", FALSE)
  depth_comment <- left_join(depths, comment)

  start <- get.values(js, paste0(what, "\\.start"), "start")
  resolution <- get.values(js, paste0(what, "\\.resolution"), "resolution")
  meta <- full_join(start, resolution)
  meta <- inner_join(depth_comment, meta)
  probs <- get.vectors(js, paste0(what, "\\.prob="))

  out <- plyr::adply(meta, 1, function(r) {
    prob <- probs[[r$lab]]
    date <- r$start + 0:(length(prob) - 1) * r$resolution
    data_frame(lab = r$lab, depth = r$depth, date = date,
               prob = prob, comment = r$comment, posterior = posterior)
  })
  out$start <- NULL
  out$resolution <- NULL
  out$date <- 1950 - out$date # convert to calendar years BP
  out
}

These functions use regular expressions to find and extract the required information. If you don’t know how to use regular expressions, it is well worth spending a morning learning how to use them as they are very useful for processing data.
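The workhorse in these functions is gsub with a backreference: the bracketed part of the pattern is captured and `\\1` returns only that part. A toy line in the style of the OxCal .js output (the line itself is invented):

```r
# A line like those in the OxCal .js output (invented for illustration)
line <- 'ocd[9].likelihood.start=-24000.5;'

# Capture the part between "=" and ";"
gsub(".+=(.+);", "\\1", line)           # "-24000.5"

# Capture the ocd[...] label at the start of the line
gsub("^(ocd\\[\\d+\\]).*", "\\1", line) # "ocd[9]"
```

Because the whole line matches the pattern, gsub replaces it with just the captured group, which is an easy way to pull single values out of semi-structured text.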

With these functions, we need to first read the .js file into memory and then extract the posterior and likelihood information separately.

js <- readLines("oxcal_demo.js")
posterior <- get.OxCal(js, posterior = TRUE)
likelihood <- get.OxCal(js, posterior = FALSE)
allprob <- rbind(posterior, likelihood)

##      lab depth    date     prob               comment posterior
## 1 ocd[9] 535.5 25544.5 0.000910 "Dep535.5 Posterior "      TRUE
## 2 ocd[9] 535.5 25539.5 0.000455 "Dep535.5 Posterior "      TRUE
## 3 ocd[9] 535.5 25534.5 0.000455 "Dep535.5 Posterior "      TRUE
## 4 ocd[9] 535.5 25529.5 0.003095 "Dep535.5 Posterior "      TRUE
## 5 ocd[9] 535.5 25524.5 0.001729 "Dep535.5 Posterior "      TRUE
## 6 ocd[9] 535.5 25519.5 0.002367 "Dep535.5 Posterior "      TRUE

Another function will summarise the age-depth model, extracting, by default, the 95.4% credibility interval and the median.

get.ageModel <- function(x, ci = c(0.023, 0.977)) {
  x %>%
    filter(posterior == TRUE) %>%
    group_by(depth, comment) %>%
    mutate(sumProb = cumsum(prob)) %>%
    mutate(sumProb = sumProb / max(sumProb)) %>%
    summarise(
      medianDate = approx(x = sumProb, y = date, xout = 0.5)$y,
      lowCI = approx(x = sumProb, y = date, xout = ci[1])$y,
      highCI = approx(x = sumProb, y = date, xout = ci[2])$y
    )
}

mod <- get.ageModel(x = allprob)

A final function plots the model.

plot.dates_model <- function(dates, model, title) {
  dates %>%
    filter(grepl("Dep", comment)) %>% # just the dates
    ggplot(aes(x = depth, y = date, size = prob, colour = posterior, group = comment)) +
    geom_ribbon(data = model, aes(x = depth, ymax = highCI, ymin = lowCI),
                inherit.aes = FALSE, alpha = 0.2) +
    geom_line(data = model, aes(x = depth, y = medianDate),
              inherit.aes = FALSE, colour = "grey40") +
    geom_line(alpha = 0.6) +
    scale_x_reverse() +
    scale_size(range = c(0, 6)) +
    coord_flip() +
    labs(x = "Depth cm", y = "Date yr BP", title = title) +
    theme_bw() +
    guides(size = "none",
           colour = guide_legend(title = "Posterior", override.aes = list(size = 4)))
}

g <- plot.dates_model(allprob, mod, "My Core")
print(g)


The red lozenges show the probability distribution function of each date when calibrated individually. The blue lozenges show the posterior probability distribution function of each date when modelled with the P sequence model.

This code does not extract all the important information from OxCal, but it is a start.
