Making a pollen diagram from Neotoma

Last week I gave a course on R for palaeoecologists covering data handling using tidyverse, reproducibility and some some ordinations and transfer functions. One of the exercises was to download some pollen data from Neotoma and make a pollen diagram. This is my solution. It uses a lot of tidyverse. Everything could be done with base R, but I find tidyverse more expressive.

#load packages
library("tidyverse")
library("neotoma")
library("rioja")
#devtools::install_github("richardjtelford/ggpalaeo") #NEW VERSION
library("ggpalaeo") #Need new version

I’ve chosen to plot the pollen data from Lake Bambili. First we need to download the data with the neotoma package

#download data
get_dataset(40940)#check correct site
## A dataset_list containing 1 objects:
## Accessed from 2019-10-10 16:41h to 2019-10-10 16:41h.
## Datasets:
## dataset.id site.name long lat type
## 40940 Bambili 1 10.24383 5.93487 pollen
bambili_raw <- get_download(40940)
## API call was successful. Returned record for Bambili 1
eco_types <- get_table("EcolGroupTypes")

Then we need to extract the components we need from the downloaded data. get_download returns a list with one element per dataset which we can extract with the $ notation.

#extract components
cnts = counts(bambili_raw$`40940`) #table of counts
meta = bambili_raw$`40940`$sample.meta #sample depths/ages
taxa = bambili_raw$`40940`$taxon.list %>% #taxon info
  mutate_all(as.character) #convert everything from a factor to prevent warnings later

I want to add the age/depth information to the counts. Here I am using bind_cols to do this, which assumes the rows are in the same order. The paranoid might want to convert the row names of the counts into a column (rownames_to_column) and then join on sample.id. Next I use pivot_longer (the replacement of tidyr::gather) to convert the data from a species x samples data.frame to a long format, and join to the taxon information. Note that some datasets have an alias column in taxon.list and you need to join by that instead.

bambili <- meta %>% select(age, depth) %>%
bind_cols(cnts) %>%
pivot_longer(cols = -c("age", "depth"), names_to = "species", values_to = "count") %>%
left_join(taxa, by = c("species" = "taxon.name"))

Now is a good time to check what we have. It’s useful to have the Neotoma database manual open to check what terms mean.

bambili %>% count(variable.element)
## # A tibble: 3 x 2
## variable.element n
##
## 1 pollen 36750
## 2 pollen/spore 175
## 3 spore 875

variable.element contains just pollen and spores. It can include a variety of other things (such as stomata) that we might want to filter out.

eco_types %>%
semi_join(bambili, by = c("EcolGroupID" = "ecological.group")) %>%
select(EcolGroupID, EcolGroup)
## EcolGroupID EcolGroup
## 1 AQVP Aquatic Vascular Plants
## 2 MANG Mangroves
## 3 PALM Palms
## 4 SEED Spermatophyte rank or clade above order
## 5 TRSH Trees and Shrubs
## 6 UNID Unknown and Indeterminable Palynomorphs
## 7 UPHE Upland Herbs
## 8 VACR Terrestrial Vascular Cryptogams

I don’t want to include the AQVP or UNID in the diagram. I’m not sure about SEED.

bambili %>% filter(ecological.group == "SEED", count > 0) %>%
select(species, count) %>%
group_by(species) %>%
summarise(n = n(), max = max(count))
## # A tibble: 1 x 3
## species n max
##
## 1 Monocotyledoneae undiff. 33 6

So SEED is an unidentified monocot present at low abundances. I’m going to delete it.

#remove unwanted variable.element/ecological.group

bambili = bambili %>% 
  filter(!ecological.group %in% c("AQVP", "UNID", "SEED"))
#use `%in%` not `==`

This is a good time to check the count sums. It might be prudent to delete any samples with very low counts.

#check count sums
bambili %>%
group_by(depth) %>%
summarise(s = sum(count)) %>%
arrange(s) %>%
slice(1:5)
## # A tibble: 5 x 2
## depth s
##
## 1 510. 72
## 2 220. 157
## 3 200. 176
## 4 240. 177
## 5 280. 233

Seventy four isn't great, but I'm going to keep it.

Now we can calculate percent and remove the rare taxa

#calculate percent
bambili = bambili %>% 
  group_by(depth) %>% 
  mutate(percent = count/sum(count) * 100) 

#remove rare taxa
bambili1 = bambili %>% 
  group_by(species) %>% 
  filter(
    sum(percent > 0) >= 3, #must be in at least three samples
    max(percent) > 3) #must have a max percent > 3

Now we can use pivot_wider to reshape the data back into a species x samples data.frame that we can plot with rioja::strat.plot. For convenience, I’m separating the species data from the age/depth data.

bambili2 <- bambili1 %>%
select(age, depth, species, percent) %>%
pivot_wider(names_from = "species", values_from = "percent")

bambili_spp <- bambili2 %>% select(-age, -depth) %>%
as.data.frame()#needed as strat.plot does not like tibble - pull request to fix this submitted.

Now we can plot the stratigraphy.

#strat.plot
strat.plot(bambili_spp, yvar = bambili2$depth)

basic-plot-1

There are a variety of aspects of this plot that need improving. We need to:

  • plot on constant scale for all taxa
  • reverse y axis so deeper levels are lower
  • arrange the taxa in some logical order
  • rotate the species names and set the figure margins
  • add a cluster diagram and zones
  • add a secondary scale

Some of these can be fixed by setting an argument in strat.plot (there are a lot of arguments – see ?strat.plot), but to reorder the species, we need to reprocess the data.

bambili2 <- bambili1 %>%
mutate(
#make ecological.group a factor with TRSH first
ecological.group = factor(ecological.group, levels = c("TRSH", "UPHE", "VACR")),
mean_percent = mean(percent)) %>%
#arrange by ecological.group and mean_percent (largest first)
arrange(ecological.group, desc(mean_percent)) %>%
ungroup() %>%
#make species into a factor so we can perserve the order
mutate(species = factor(species, levels = unique(species)))

#reshape using tidyr::spread as pivot_wider (currently?) ignores factor order
bambili3 <- bambili2 %>%
select(age, depth, species, percent) %>%
spread(key = "species", value = "percent")

bambili_spp <- bambili3 %>%
select(-age, -depth) %>%
as.data.frame(bambili_spp)

#set up for ecological group colours
ecological_groups <- bambili2 %>%
  distinct(species, ecological.group) %>% 
  pull(ecological.group)
ecological_colours <- c("red", "green", "orange")

And we run a constrained cluster analysis

bambili_dist <- dist(sqrt(bambili_spp/100))#chord distance
clust <- chclust(bambili_dist, method = "coniss")
#bstick(clust)#five groups

Now we can make a better plot.

#set up mgp (see ?par)
mgp <- c(2, 0.25, 0)
par(tcl = -0.15, mgp = mgp)#shorter axis ticks - see ?par
pt <- strat.plot(
d = bambili_spp,
yvar = bambili3$depth,
y.rev = TRUE, #reverse direction of y-axis
scale.percent = TRUE, #use constant scale for all taxa
srt.xlabel = 45, #rotate x-label by 45 degrees
cex.xlabel = 0.8, #smaller font
mgp = mgp,
xRight = 0.98, #right margin
xLeft = 0.21, #left margin with space for 2nd axis
yTop = 0.60, #top margin
yBottom = 0.1, #bottom margin
col.line = ecological_colours[ecological_groups],#colours
col.bar = ecological_colours[ecological_groups], #colours
ylabel = "Depth cm",
clust = clust
)

#add zone boundaries
addClustZone(pt, clust = clust, nZone = 5)

#add a secondary scale
secondary_scale(pt, yvar = bambili3$depth, 
                yvar2 = bambili3$age, 
                ylabel2 = "Date yr BP",
                n = 10)

better-plot-1

It’s beginning to get there. There are probably too many taxa plotted. Merging the various Polypodiophyta as the names are very long and the ecological interpretation of the different types is unclear. I also want to reduce the space between the y-axis and the ylabel. Unfortunately this is hard coded in strat.plot but it would only take a minute to make it into an argument. I’d also like to add a % sign to to the axis, but I cannot see an argument that will do that (again it shouldn’t be hard to code – adding yet another argument).

I would like to have some text on top describing the ecological groups, but I have the feeling that that would be very difficult to do.

About richard telford

Ecologist with interests in quantitative methods and palaeoenvironments
This entry was posted in Data manipulation, R, reproducible research and tagged . Bookmark the permalink.

1 Response to Making a pollen diagram from Neotoma

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s