Category Archives: EDA

How many is fifty? Sanity checks for assemblage data.

This week I’m at the Palaeolimnology Symposium in Stockholm. I have a couple of presentations. I gave the first this morning to the chironomid “DeadHead” meeting. I showed some sanity checks for assemblage data, some of which are … Continue reading

Posted in Data manipulation, EDA | 1 Comment

Falling for ggplot2

I spent a long time resisting the lure of ggplot2. I was proficient with the plotting functions in base graphics; why did I need to learn an entirely new graphics system? Yes, setting up colour ramps could be a real … Continue reading

Posted in EDA, R | Leave a comment

Variance inflation factors and ordination model selection

Variance inflation factors (VIF) measure the extent of multicollinearity among the predictors of a regression. A high VIF indicates that the predictor is highly correlated with other predictors and so contains little or no unique information, and there is redundancy … Continue reading
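To make the idea concrete, here is a minimal sketch of a VIF calculation in plain Python. It is not the code from the post. It relies on the standard identity that the VIF of predictor j, 1 / (1 - R²) from regressing j on the other predictors, equals the j-th diagonal element of the inverse correlation matrix. The helper names `pearson`, `invert` and `vif`, and the simulated predictors, are all hypothetical.

```python
import random
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def invert(m):
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting.

    Raises ZeroDivisionError if the matrix is singular, which happens when
    one predictor is an exact linear combination of the others.
    """
    n = len(m)
    # Augment the matrix with the identity on the right-hand side.
    aug = [list(row) + [float(i == j) for j in range(n)] for i, row in enumerate(m)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def vif(columns):
    """VIF of each predictor: the diagonal of the inverse correlation matrix."""
    k = len(columns)
    corr = [[1.0 if i == j else pearson(columns[i], columns[j]) for j in range(k)]
            for i in range(k)]
    inv = invert(corr)
    return [inv[i][i] for i in range(k)]


# Simulated predictors (an assumption for illustration only):
# c is nearly a copy of a, while b is unrelated to both.
random.seed(1)
a = [random.gauss(0, 1) for _ in range(200)]
b = [random.gauss(0, 1) for _ in range(200)]
c = [ai + 0.1 * random.gauss(0, 1) for ai in a]
vifs = vif([a, b, c])
print([round(v, 2) for v in vifs])
```

In this simulated example `a` and `c` should come out with large VIFs because each is almost fully predicted by the other, while `b` should stay close to 1.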

Posted in EDA, R | 4 Comments

REDFIT’s rule of thumb

Because REDFIT tests many frequencies, some are likely to appear statistically significant just by chance: a classic multiple testing problem. Schulz & Mudelsee (2002) “follow Thomson (1990) and select a false-alarm level of (1-1/n)*100%, where n is the number of data points … Continue reading
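The arithmetic behind the quoted rule is easy to sketch. The snippet below is not part of REDFIT. It evaluates the quoted false-alarm level (1-1/n)*100% for a given n, and then the number of spectral peaks expected to exceed any given level by chance when m frequencies are tested. The function names are hypothetical, and because the quotation above is cut off, n is taken simply as "the number of data points" as far as the excerpt defines it.

```python
def false_alarm_level(n):
    """False-alarm level in percent, (1 - 1/n) * 100, as quoted above.

    n is taken as "the number of data points"; the excerpt is truncated,
    so the full definition of n is an assumption here.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    return (1 - 1 / n) * 100


def expected_false_alarms(m, level_percent):
    """Expected number of the m tested frequencies exceeding the level by chance.

    Each frequency exceeds the level with probability 1 - level/100 under the
    null hypothesis, so the expected count is m times that probability.
    """
    return m * (1 - level_percent / 100)


n = 100  # hypothetical series length
level = false_alarm_level(n)
print(level)
print(expected_false_alarms(n, level))    # about one chance exceedance
print(expected_false_alarms(n, 95.0))     # about five at the 95% level
```

The point of the rule shows up in the last two lines: at the (1-1/n)*100% level roughly one of n tested frequencies is expected to exceed the line by chance, whereas a conventional 95% level lets through several.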

Posted in EDA, Peer reviewed literature, R | 1 Comment

REDFIT & false alarms

REDFIT is a useful tool for palaeoecologists who like to test their data for periodicities, as it uses the Lomb-Scargle Fourier transform, which tolerates unequal time intervals and so avoids the problems inherent in interpolating data to equal intervals. Several of the papers reporting the … Continue reading

Posted in EDA, Peer reviewed literature, R | 3 Comments

Collaboration networks in BIO

I’ve been intrigued by social network analysis since reading the Wegman report that seemed to find that Michael Mann was at the centre of the network of co-authors on papers on which he was a co-author (about as surprising as … Continue reading

Posted in EDA, R | 1 Comment

Running correlations – running into problems

Running correlations are a useful technique to explore how correlations between two variables vary in time or space. The correlation is calculated in a window of the first n observations, then the window is moved by one position, and the … Continue reading
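The windowing described above can be sketched in a few lines of plain Python. This is an illustration rather than the code used in the post, and the function names `pearson` and `running_correlation` are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation; returns NaN if either window has zero variance."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")
    return sxy / sqrt(sxx * syy)


def running_correlation(x, y, window):
    """Correlation of x and y in a window that slides by one observation.

    The first value uses observations 0..window-1, the next 1..window, and
    so on, giving len(x) - window + 1 values in total.
    """
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    if not 2 <= window <= len(x):
        raise ValueError("window must be between 2 and len(x)")
    return [pearson(x[i:i + window], y[i:i + window])
            for i in range(len(x) - window + 1)]


# Toy series (made up for illustration): y tracks x at first, then reverses.
x = [1, 2, 3, 4, 5]
y = [2, 4, 6, 5, 4]
print(running_correlation(x, y, window=3))
```

Adjacent windows share all but one observation, so consecutive values in the output are not independent of one another.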

Posted in EDA, Peer reviewed literature, R | 17 Comments