OxCal and R

OxCal is perhaps the most powerful of the three main Bayesian age-depth modelling procedures, with many many options and the potential for building far more complicated than your typical palaeolimnologist needs to use. Unlike Bchron and Bacon, OxCal is not R based. Much as I want to use OxCal, I also want to use R.

In this post, I show how OxCal models can be built in R, the OxCal command line program called from R, and the output imported into R and plotted. The code does not deal will all possible outputs from OxCal (it may need altering if your model is different from mine) and absolutely does not absolve you from checking the log files for any problems.

You need to start by downloading OxCal and installing it somewhere convenient on your computer.

We need a couple of packages for data manipulation and plotting.


And we need some radiocarbon dates. Here are a few of the dates from a model I’m currently working on (the full set of dates takes about six hours to process).

dates <- read.table(header = TRUE, text =
"depth_corr    Lab_Code    raw_14C_age    error
312.5    X-1    16010    100
352.0   X-2    16670     70
406.5    X-3    17730    130
440.5    X-4    18690    120
490.5    X-5    20150    220
535.5    X-6    20930    170

OxCal was written by archaeologists so you need to think like an archaeologist to use it: oldest dates need to come first in the model. (I suspect many palaeolimnologists get this wrong and then wait days before the model refuses to converge. At least I’ve done this a couple of times.)

##sort so oldest first
dates <- dates[order(dates$depth_corr, decreasing = TRUE), ]

I want to use a P sequence model to fit an age-depth model to the dates. I’m using the Marine13 calibration curve with a local reservoir offset of 40 +/-30 years. I’m using the P sequence model with a k selected from a prior distribution by OxCal and with an outlier model. All these commands and their arguments are explained in the on-line help for OxCal. writeLines prints the formatted model to the screen. I’ve used lots of white space (don’t worry, it’s free) to make the model human readable.

#Calibration information
curve <- ' Curve("marine13");'
DR <- ' Delta_R("Local Marine",40,30);'

#The dates
ageDepths <- with(dates,
  sprintf('  R_date("Dep%s",%d,%d){\n    z=%s;\n    Outlier(0.05);\n  };',
    depth_corr, raw_14C_age, error, depth_corr
ageDepths <- paste(ageDepths, collapse = "\n")

#Putting it all together
commandp <- paste(
' Outlier_Model("General",T(5),U(0,4),"t");',
' P_Sequence("k",1,1,U(-2,2)){',
'  Boundary("Bottom");',
'  Boundary("Top");',
' };',
sep = "\n")

#Wrapping in Plot()
commandp <- paste('Plot(){', commandp, '};', sep = "\n")
## Plot(){
##  Curve("marine13");
##  Delta_R("Local Marine",40,30);
##  Outlier_Model("General",T(5),U(0,4),"t");
##  P_Sequence("k",1,1,U(-2,2)){
##   Boundary("Bottom");
##   R_date("Dep535.5",20930,170){
##     z=535.5;
##     Outlier(0.05);
##   };
##   R_date("Dep490.5",20150,220){
##     z=490.5;
##     Outlier(0.05);
##   };
##   R_date("Dep440.5",18690,120){
##     z=440.5;
##     Outlier(0.05);
##   };
##   R_date("Dep406.5",17730,130){
##     z=406.5;
##     Outlier(0.05);
##   };
##   R_date("Dep352",16670,70){
##     z=352;
##     Outlier(0.05);
##   };
##   R_date("Dep312.5",16010,100){
##     z=312.5;
##     Outlier(0.05);
##   };
##   Boundary("Top");
##  };
## };

Once we’re happy with the model, we can save the file and call the OxCal command line utility. There are different versions of this for Linux, Mac and Windows.

writeLines(commandp, con = "oxcal_demo.input")
system(paste("~/oxcal/OxCal/bin/OxCalLinux", "oxcal_demo.input"))

Depending on how complicated your model is, you now have time for a cup of tea or a round the world cruise. During this time, R will sit there apparently doing nothing, but in the background, OxCal is busy shunting electrons round.

OxCal will produce three files, oxcal_demo.log, oxcal_demo.txt, and oxcal_demo.js. You need to look in the .log file and check for any problems. The .txt file contains a summary of the model output and the .js file contains all the details. It is this file we need to read into R. Unfortunately it is a large and complicated file which takes a bit of effort to read. I’ve made some functions to help import the data in to a data.frame that I can plot with ggplot. The functions extract what I need – they will need modifying if you need other information or if you have used a different naming convention for the dates.

get.values <- function(js, target, name = "value", numeric = TRUE) {
x <- js[grep(target, js)]
val <- gsub(".+=(.+);", "\\1", x)
lab <- gsub("^(ocd\\[\\d+\\]).*", "\\1", x )
if(numeric) {
val <- as.numeric(val)
res <- data_frame(lab, val)
setNames(res, c("lab", name))
get.vectors <- function(js, target) {
x <- js[grep(target, js)]
lab <- gsub("^(ocd\\[\\d+\\]).*", "\\1", x )
val <- gsub(".+=\\[(.+)\\];", "\\1", x)
val <- strsplit(val, ", ")
res <- sapply(val, as.numeric)
setNames(res, lab)
get.OxCal <- function(js, posterior = TRUE){
what <- "posterior"
what <- "likelihood"
depths <- get.values(js, "\\.data\\.z", "depth")
comment <- get.values(js, paste0(what,"\\.comment\\[0\\]"), "comment", FALSE)
depth_comment <- left_join(depths, comment)

start <- get.values(js, paste0(what,"\\.start"), "start")
resolution <- get.values(js, paste0(what,"\\.resolution"), "resolution")
meta <- full_join(start, resolution)
meta <- inner_join(depth_comment, meta)
probs <- get.vectors(js, paste0(what,"\\.prob="))

out <- plyr::adply(meta, 1, function(r){
prob <- probs[[r$lab]]
date <- r$start + 0:(length(prob) - 1) * r$resolution
data_frame(lab = r$lab, depth = r$depth, date = date, prob = prob, comment  = r$comment, posterior = posterior)
out$start <- NULL
out$resolution <- NULL
out$date <- 1950 - out$date#convert to BP

These functions use regular expressions to find and extract the required information. If you don’t know how to use regular expressions, it is well worth spending a morning learning how to use them as they are very useful for processing data.

With these functions, we need to first read the .js file into memory and then extract the posterior and likelihood information separately.

js <- readLines("oxcal_demo.js")
posterior <- get.OxCal(js, posterior = TRUE)
likelihood <- get.OxCal(js, posterior = FALSE)
allprob <- rbind(posterior, likelihood)

##      lab depth    date     prob               comment posterior
## 1 ocd[9] 535.5 25544.5 0.000910 "Dep535.5 Posterior "      TRUE
## 2 ocd[9] 535.5 25539.5 0.000455 "Dep535.5 Posterior "      TRUE
## 3 ocd[9] 535.5 25534.5 0.000455 "Dep535.5 Posterior "      TRUE
## 4 ocd[9] 535.5 25529.5 0.003095 "Dep535.5 Posterior "      TRUE
## 5 ocd[9] 535.5 25524.5 0.001729 "Dep535.5 Posterior "      TRUE
## 6 ocd[9] 535.5 25519.5 0.002367 "Dep535.5 Posterior "      TRUE

Another function will summarise the age-depth model, extracting, by default, the 95.4% credibility interval and the median.

get.ageModel <- function(x, ci = c(0.023, 0.977)){
x %>%
filter(posterior == TRUE) %>%
group_by(depth, comment) %>%
mutate(sumProb = cumsum(prob)) %>%
mutate(sumProb = sumProb/max(sumProb)) %>%
medianDate = approx(x = sumProb, y = date, xout =  0.5)$y,
lowCI = approx(x = sumProb, y = date, xout = ci[1])$y,
highCI = approx(x = sumProb, y = date, xout = ci[2])$y
mod <- get.ageModel(x = allprob)

A final function plots the model

plot.dates_model <- function(dates, model, title){
dates %>% filter(grepl("Dep", comment)) %>% #just the dates
ggplot(aes(x = depth, y = date, size = prob, colour = posterior, group = comment)) +
geom_ribbon(data = model, aes(x = depth, ymax = highCI, ymin = lowCI), inherit.aes = FALSE, alpha  = 0.2) +
geom_line(data = mod, aes(x = depth, y = medianDate), inherit.aes = FALSE, colour = "grey40") +
geom_line(alpha = 0.6) +
scale_x_reverse() +
scale_size(range = c(0, 6)) +
coord_flip() +
labs(x = "Depth cm", y = "Date yr BP") +
theme_bw() +
guides(size = "none", colour = guide_legend(title = "Posterior", override.aes = list(size = 4))) +
g <- plot.dates_model(allprob, mod, "My Core")


The red lozenges show the probability distribution function of the each date when calibrated individually. The blue lozenges show the posterior probability distribution functions of each date when modelled with the p-sequence model.

This code does not extract all the important information from OxCal, but it is a start.

Posted in Age-depth modelling, R, Uncategorized | Tagged | Leave a comment

73 lakes become 78

I have detailed many curious aspects of the remarkable Lake Żabińskie August air-temperature reconstruction by Larocque-Tobler et al (2015). This posts describes yet more – this time in Supplementary Data Figure 1 which shows a redundancy analysis (a type of constrained ordination).

The method

A Redundancy Analysis (RDA) created in CANOCO was also used to determine the fit of downcore samples to the transfer function samples. The RDA constrained to temperature was created with the transfer function samples and the downcore samples added passively to the RDA.

The result

An RDA with downcore samples passively added to the transfer function samples show that no downcore sample was located outside the field determined by the transfer function samples. The eight warmer lakes (17–27.5 °C) in the Canadian transfer function were also found in the same quadrant as the Polish lakes (16.3–18.8 °C) suggesting a similarity between the assemblages in warmer lakes in Canada and those which cover the same temperature gradient in Poland.

The first result is somewhat inevitable given that only the first two axes of the RDA are considered. As a reconstruction diagnostic tool, passive plotting is only really useful if residual distances are small, otherwise the fossil samples might be highly aberrant on axis three and no-one would ever know.

The second result is critically important to the paper which combines calibration sets from Canada and Poland. If the warm Polish and Canadian lakes do not overlap in ordination space, they do not have similar species assemblages and it is likely that an environmental variable other than temperature is driving the difference between them. This would severely damage the motivation for making a combined calibration set.

The figure


A few points are immediately obvious.

  • Unlike perhaps every other RDA I have seen, there is no arrow showing the constraining environmental variable. It should point right from the origin.
  • The axis scales are not identical as the should be (giving undue prominence to the first axis).
  • The second axis of the ordination is mis-labelled as RDA Axis 2. When there is only one constraining variable, there can only be a single constrained axis. The second axis should be PCA axis 1.

None of these points are of any great importance, except that they strongly indicate that the figure was not made in CanoDraw, C2 or the vegan/analogue packages in R which would not make these mistakes.

Much more seriously, while there are 73 Canadian lakes in the archived calibration set (not 72 as the paper reports), there are 78 red squares (perhaps 77 – some overlap) which indicate Canadian lakes in the figure.

The replication

This figure should be easy to replicate from the archived data (full code at https://github.com/richardjtelford/Zabinskie). First, I want to plot just the calibration set lakes, omitting the passive fossil samples.


mod <- rda(sqrt(spp) ~ env)

scaling <- "sites"
frda <- fortify(mod, display = "sites", scaling = scaling)

#country information
frda$country <- c(rep("Poland", 48), rep("Canada", 73))

g <- ggplot(frda, aes(-Dim1, Dim2, colour = country, shape = country)) + #axis 1 flipped to match published figure
geom_point(size = 2) +
coord_equal() +
labs(x = "RDA1", y = "PCA1", shape = "Country", colour = "Country") +
geom_vline(xintercept = 0) +
geom_hline(yintercept = 0) +
scale_shape_manual(values = c(15, 18)) +
scale_colour_manual(values = c(2, 4))




Generally, the replication is similar to the published figure. However, contrary to what LT15 write and show in their figure, there are only two Canadian lakes in the lower-left quadrant with the Polish lakes. The extra Canadian lakes in the Polish quadrant are not due to mis-classification of the lakes, there are no lakes in these positions in the replication. Other changes are the deletion of two lakes at the left side of the upper left quadrant and the addition of one point at the upper-right corner of the upper right quadrant.

I honestly cannot explain how these difference could have occurred.

Adding the fossil sites passively is most easily done with the function timetrack in Gavin Simpson’s analogue package.

tt <- analogue::timetrack(X = spp, passive = fos, env = env, method = "rda", transform = "sqrt", scaling = scaling)

g + geom_point(aes(x = -RDA1, y = PC1), data = as.data.frame(tt$fitted.values), inherit.aes = FALSE, colour = "green", shape = 17) +
scale_shape_manual(name = "", limits = c("Canada", "Poland", "Fossil"), values = c(15, 18, 17)) +
scale_colour_manual(name = "", limits = c("Canada", "Poland", "Fossil"), values = c(2, 4, 3))


Well most of the fossil samples are in the same quadrant as LT15 find them, but the pattern is not the same. I have no idea if this is because of different species inclusion rules or what

An earlier version of this post contained a faulty version of the timetrack plot due to my misunderstanding of how analogue::join worked. Gavin kindly updated this function so my code now works.

Posted in Peer reviewed literature, transfer function | Tagged , | 6 Comments

Comment on Lyons et al finally published

After many months, our comment on Lyons et al is finally published [link should give free access].

Lyons et al looked at many community  data sets from the last 300 million years and tested for pairs of species that are more aggregated (co-occur) or segregated than expected by chance. They found that in modern data sets, there are more segregated than aggregated species pairs, and the opposite pattern in fossil data sets, with a breakpoint about 6000 years ago, suggesting a severe impact from early agriculture. This was a surprising finding, especially given the presence-absence data used by the authors.

We argue that Lyons et al result is a product of

  • poor data set selection (some of which were addressed by a corrigendum)
  • artefacts in the breakpoint analysis
  • data-set size related artefacts in the identification of segregated and aggregated species pairs

Together, these problems mean that the results of Lyons et al cannot be substantiated.

Lyons and co-authors, predictable, disagree with our conclusion.

Once I get back to Bergen from Makerere, I plan to write something about the reply, and some of our findings that we didn’t have space for in our comment.


Posted in Peer reviewed literature | Tagged | Leave a comment

There is no scientific misconduct …

The youngest patient was two years old when she died three months after Professor Paolo Macchiarini’s experimental surgery to replace her trachea. Since then, Macchiarini has been accused of scientific malpractice. An external investigation found that he reported in a paper that ethics approval had been obtained when it had not, misrepresented patients’ outcomes in several papers, and other issues rated as malpractice.

The vice-chancellor at the Karolinska Institute, where Macchiarini worked, decided that Macchiarini acted “without due care”, but that his behaviour “does not qualify as scientific misconduct”.

That might have been the end of the matter but for the work of journalists. Vanity Fair reported Macchiarini’s plans to marry the producer of a TV documentary about his work in a ceremony officiated by Pope Francis at the Pope’s summer residence with guests including Vladimir Putin and Barack Obama. In reality, the Pope plans took him to South America instead of the wedding of the already-married surgeon. Macchiarini’s CV was only slightly less fanciful. Sveriges Television alleged that some Russian patients Macchiarini operated onwere not ill enough to warrant such a risky procedure.

The misconduct investigation into Macchiarini’s work was subsequently reopened: he was fired (and is under investigation for manslaughter by Swedish prosecutors); and the vice-chancellor resigned, as did several eminent scientists, including Nobel-prize judges.

Macchiarini’s work hastened the deaths of several patients, yet until pressurised by the media, the Karolinska Institute was prepared to overlook misconduct. There is no scientific misconduct so severe that distinguished scientists might not seek to ignore it. How can we ensure that university investigations into research misconduct (or indeed other types of misconduct) are thorough and fair, and as importantly, seen to be thorough and fair? Quis custodiet ipsos custodes?

Posted in Misconduct, Uncategorized | Tagged | 1 Comment

Bob Irvine’s zombie paper (hide the tin foil)

A couple of years ago, I criticised a paper by Bob Irvine published by WIT (a publisher on Beal’s List of possibly predatory publishers). Shortly afterwards, the paper was retracted with the editor writing “I have now received the result of a peer evaluation carried out urgently yesterday on the paper you brought into question, and have decided to withdraw it from our eLibrary.”

Recently, a commenter told me that a revised version of the paper was now published.

I don’t know what’s changed in the paper as I didn’t keep a copy of the paper and the journal does not disclose what changed. But I can compare the abstract. It was one paragraph. It is now four paragraphs. The text is otherwise identical (except that “greenhouse” has been mis-corrected to “Green House”). The declaration that Irvine does not understand climate models is still there

Most Coupled Model Intercomparison Project Phase 5 (CMIP5) climate models assume that the efficacy of a solar forcing is close to the efficacy of a similar sized Green House Gas (GHG) forcing.

As are the tin foil experiments.

If the paper was bad enough to merit retraction two years ago, it really isn’t clear why it merits publication now.




Posted in Fake climate sceptics | Tagged | Leave a comment

Data archiving at The Holocene: policy and practice

When I read a recent paper in The Holocene,  I wondered, the way one does, if the data were available, and turned to The Holocene’s submission guidelines.

SAGE [the publisher] acknowledges the importance of research data availability as an integral part of the research and verification process for academic journal articles.

The Holocene requests all authors submitting any primary data used in their research articles if the articles are accepted to be published in the online version of the journal, or provide detailed information in their articles on how the data can be obtained. This information should include links to third-party data repositories or detailed contact information for third-party data sources. Data available only on an author-maintained website will need to be loaded onto either the journal’s platform or a third-party platform to ensure continuing accessibility. Examples of data types include but are not limited to statistical data files, replication code, text files, audio files, images, videos, appendices, and additional charts and graphs necessary to understand the original research. The editor can also grant exceptions for data that cannot legally or ethically be released. All data submitted should comply with Institutional or Ethical Review Board requirements and applicable government regulations. For further information, please contact the editorial office.

The policy is not as strong as I would like – it requests rather than requires – and the first sentence of the main paragraph is difficult to parse. What is missing is a requirement for a data availability statement. Nature announced yesterday that this will be required in their journals.

But how well is the data archiving mandate followed? I’m going to look at the latest issue of The Holocene, and see if data has been archived for the papers. Note, I don’t know when this policy came into force so this test of compliance might be unfair if papers were submitted before the policy was announced.

The issue contains twelve research papers: one paper has most of the data in tables within the paper; four have supplementary online material; none link to third-party (e.g., figshare or datadryad), institutional or personal data repositories.

Unfortunately none of the supplementary online material are currently online (8th September). This is a tiny bit hopeless as the papers have been online since April. It would be so cunning to publish the supplementary material at the same time as the paper, ideally with a link from the PDF.

The absence of the supplemental material means that I cannot tell if they archive data or not. Even being optimistic about their contents, there is less than 50% compliance with the data archiving mandate.

It doesn’t have to be this way. The Journal of Ecology had a 93% data archiving compliance rate in 2015.





Posted in Peer reviewed literature | Tagged , | Leave a comment

Czarny Staw Gąsienicowy

Whenever I see a lake, especially one as beautiful as Czarny Staw Gąsienicowy in the High Tatras, I wonder about what hypotheses could be tested with a sediment core.


The lake, in a glacial cirque 1624 m a.s.l., is 51 m deep. At that depth, a Kajak corer or a micro-Kullenberg corer would be the obvious corers to use. Both are line-operated, the former would be good for sediment cores up to 50 cm long, the latter has a piston which improved sediment recovery when collecting longer cores (perhaps 2m). Fortunately, that is probably about as much lacustrine sediment as there is in the lake. The lake could be cored from the ice in spring (I’ve never done this – I was supposed to core a Finnish bay, but the ice was too thin), or from a small boat.

There are, not surprisingly, several palaeoecological studies on Czarny Staw Gąsienicowy and neighbouring lakes in the High Tatras (this list does not pretend to be complete – please add any I have missed in the comments).

Sienkiewicz and Gąsiorowsk (2014) take short cores from Czarny Staw Gąsienicowy and two other lakes and investigate the diatom stratigraphies over the last millennium and use EDDI to reconstruct nutrient status. The two other lakes have tourist cabins in their catchment and show eutrophication (the Secchi-disc depth is 12 m – these are not your pea soup eutrophic pond).

Gąsiorowski and Sienkiewicz (2010) investigate diatom and cladoceran stratigraphies from short cores from two lakes south of Czarny Staw Gąsienicowy (not the lakes in Sienkiewicz and Gąsiorowsk (2014)). They infer recent acidification-driven change in the stratigraphies following earlier climate-driven changes.

Kubovčík and Bitušík (2006) examine the chironomid response to pH changes in three lakes with different susceptibility to acidification from the Slovakian side of the Tatra. The best buffered lake shows no change in chironomid assemblages, whereas the least buffered lake has a large change in assemblage composition and a large drop in chironomid abundance.

Šporka et al (2002) investigate several proxies from Nižné Terianske pleso in Slovakia. Spherical carbonaceous particles give a good indication of the timing of atmospheric pollution. The pigment record is ambiguous (as it often is): downcore changes may be driven entirely by diagenesis, but there may also be a signal from changes in trophic state revealed by the other proxies. There was “no clear relationship between chironomid assemblage and temperature change”. Diatom and chrysophyte assemblages appear to have been influenced by acidification and perhaps also by warming.

On a longer time-scale, Marciniak (1986) presents a diatom stratigraphy from a 3 m-long core from Przedni Staw Lake that starts in the Older Dryas. It is a very descriptive paper, the likes of which would be difficult to publish now, and mentions some previous work on the same lake, including Cladoceran analyses.

There are also pollen analytical work including Rybníčková and Rybníček (2006) and Kłapyta et al (2015).

An overview of limnological studies in the Carpathian region, of which the Tatras are a part, is given by Buczkó et al (2009).

Posted in Uncategorized | Leave a comment