Category Archives: EDA

How many is fifty? Sanity checks for assemblage data.

This week I’m at the Palaeolimnology Symposium in Stockholm. I have a couple of presentations. I gave the first this morning to the chironomid “DeadHead” meeting. I showed some sanity checks for assemblage data, some of which are … Continue reading

Posted in Data manipulation, EDA | 1 Comment

Falling for ggplot2

I spent a long time resisting the lure of ggplot2. I was proficient with the plotting functions in base graphics; why did I need to learn an entirely new graphics system? Yes, setting up colour ramps could be a real … Continue reading

Posted in EDA, R | Leave a comment

Variance inflation factors and ordination model selection

Variance inflation factors (VIF) give a measure of the extent of multicollinearity in the predictors of a regression. A high VIF indicates that the predictor is highly correlated with other predictors and contains little or no unique information, and that there is redundancy … Continue reading

Posted in EDA, R | 4 Comments

REDFIT’s rule of thumb

Because REDFIT tests many frequencies, some are likely to appear statistically significant just by chance, a classic multiple testing problem. Schulz & Mudelsee (2002) “follow Thomson (1990) and select a false-alarm level of (1-1/n)*100%, where n is the number of data points … Continue reading

Posted in EDA, Peer reviewed literature, R | 1 Comment

REDFIT & false alarms

REDFIT is a useful tool for palaeoecologists who like to test their data for periodicities, as it uses the Lomb-Scargle Fourier transform, which tolerates unequal time intervals and so avoids the problems inherent in interpolating data to equal intervals. Several of the papers reporting the … Continue reading

Posted in EDA, Peer reviewed literature, R | 3 Comments

Collaboration networks in BIO

I’ve been intrigued by social network analysis since reading the Wegman report, which seemed to find that Michael Mann was at the centre of the network of co-authors on papers on which he was a co-author (about as surprising as … Continue reading

Posted in EDA, R | 1 Comment

Running correlations – running into problems

Running correlations are a useful technique to explore how correlations between two variables vary in time or space. The correlation is calculated in a window of the first n observations, then the window is moved by one position, and the … Continue reading

Posted in EDA, Peer reviewed literature, R | 17 Comments

Colour coding points in a graph

Sometimes it is useful to colour code points on a graph according to a categorical variable. There are, as always, several ways to do this. It would be possible to use nested ifelse() statements, but that way lies insanity if … Continue reading

Posted in EDA, R | Leave a comment

Overlaying core and calibration set samples in an ordination

One visualisation technique I frequently use is to make an ordination showing the modern calibration set sites, the fossil core data, and contours of an environmental variable. Here, I’ve used non-metric multidimensional scaling to ordinate the core and calibration data … Continue reading

Posted in EDA, R | 4 Comments

Doubly re-ordered data matrices

A useful method for exploring and visualising the main patterns in community data is a doubly reordered data matrix, with the species abundance indicated by either the size of a symbol or shading. In this example, I’ve used the dune … Continue reading

Posted in EDA, R | Leave a comment