“spatial autocorrelation is not a major issue” de Vernal and Radi (2011)

At the Dino09 conference, Anne de Vernal and Taoufik Radi gave a workshop “Dinocyst assemblages as proxy in late Cenozoic paleoceanography: towards quantitative reconstructions using transfer functions”.

The training manual for the workshop contains this:

When the database is very large, the leave-one-out method may result in underestimation of the error because of spatial autocorrelation (e.g., Telford and Birks, 2005). However, in the case of dinocyst database characterized by large environmental gradients, spatial autocorrelation is not a major issue and it does not affect more MAT than other transfer function techniques (Guiot and de Vernal, 2011).

Some of this is true. It is easy to demonstrate that leave-one-out cross-validation underestimates uncertainty in spatially autocorrelated environments: by h-block cross-validation (where the test sample and all samples within h km of it are omitted from the training set), by using spatially independent test sets, or by showing how well artificial spatially autocorrelated variables can be reconstructed.
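The h-block scheme is simple to implement. The following is a minimal illustrative sketch (not the code behind any published result): it assumes sample positions are given as latitude/longitude in decimal degrees, uses great-circle distance in km, and takes the prediction method as a user-supplied function.

```python
import numpy as np

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between points given in decimal degrees."""
    lat1, lon1, lat2, lon2 = map(np.radians, (lat1, lon1, lat2, lon2))
    a = (np.sin((lat2 - lat1) / 2) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * np.arcsin(np.sqrt(a))

def h_block_cv(X, y, lat, lon, h_km, fit_predict):
    """h-block leave-one-out: for each test sample, drop it and every
    training sample within h_km of it, then predict with the remainder.
    fit_predict(X_train, y_train, x_test) -> scalar prediction."""
    preds = np.empty(len(y))
    for i in range(len(y)):
        d = haversine_km(lat[i], lon[i], lat, lon)
        keep = d > h_km  # also excludes the test sample itself (d == 0)
        preds[i] = fit_predict(X[keep], y[keep], X[i])
    return preds
```

With h = 0 this reduces to ordinary leave-one-out; raising h progressively removes the geographic neighbours that spatial autocorrelation makes artificially good analogues.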

But the database does not have to be large for spatial autocorrelation to be a potential problem. I would worry about a foram sea-level training set composed of 20 samples on a transect across a salt marsh being influenced by spatial autocorrelation. With regard to ocean-scale training sets, the more samples the more serious the potential problem. I doubt that autocorrelation is a problem for Imbrie & Kipp’s (1971) 61-sample foram-SST training set, but when there are an order of magnitude more samples, there is a clear problem. It is density of samples rather than simply number of samples that causes problems.

It is not immediately obvious why large environmental gradients should save dinocyst training sets from autocorrelation: this sounds like special pleading. Certainly the large gradients don’t help the foram-SST training set, where autocorrelation makes the modern analogue technique (MAT) underestimate the uncertainty by a factor of ~2.

Contrary to what de Vernal and Radi write here, we should expect autocorrelation to be a more severe problem for MAT than for methods like weighted averaging or weighted averaging partial least squares (WAPLS). MAT considers only the most taxonomically similar analogues, and completely ignores the remainder of the dataset. If there is strong spatial autocorrelation, the best analogues will tend to be geographically close to the test site during cross-validation, and so will have apparently good estimates for any spatially structured environmental variable, whether meaningful or not. In contrast, WAPLS considers the entire training set when calculating the species optima, so it is much less sensitive to local conditions.
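The local-versus-global contrast can be made concrete with minimal sketches of the two predictors. These are illustrative only: the function names, the choice of squared chord distance, k = 5 analogues, and the omission of deshrinking are my assumptions, and plain weighted averaging stands in for WAPLS (which adds further PLS components).

```python
import numpy as np

def mat_predict(X_train, y_train, x_test, k=5):
    """MAT sketch: predict the environment at a test sample as the mean of
    the k taxonomically closest training samples, using squared chord
    distance on proportional abundances. Only the k analogues matter."""
    d = ((np.sqrt(X_train) - np.sqrt(x_test)) ** 2).sum(axis=1)
    nearest = np.argsort(d)[:k]
    return y_train[nearest].mean()

def wa_predict(X_train, y_train, x_test):
    """Weighted-averaging sketch (standing in for WAPLS, no deshrinking):
    each taxon's optimum is its abundance-weighted mean of the
    environmental variable over the WHOLE training set; the prediction is
    the abundance-weighted mean of those optima at the test sample."""
    optima = (X_train * y_train[:, None]).sum(axis=0) / X_train.sum(axis=0)
    return (x_test * optima).sum() / x_test.sum()
```

Deleting a test sample’s geographic neighbours changes `mat_predict` drastically, because its k analogues were probably those neighbours; `wa_predict` barely moves, because every optimum is averaged over the entire training set.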

This contrast between WAPLS and MAT can be demonstrated by using h-block cross-validation on a spatially autocorrelated training set. With MAT, performance tends to be very high when h is small, but falls rapidly as h increases. WAPLS tends to perform less well than MAT when h is small, but because its performance degrades less markedly as h increases, it performs better when h is large.

[Figure: r² of a dinocyst-winter sea surface salinity transfer function using MAT (black) or WAPLS (red).]

About richard telford

Ecologist with interests in quantitative methods and palaeoenvironments
This entry was posted in transfer function.
