At Climate of the Past, there is a pre-print by Darrell Kaufman and others on the data stewardship policies adopted by the PAGES 2k special issue.
Abstract. Data stewardship is an essential element of the publication process. Knowing how to enact generally described data policies can be difficult, however. Examples are needed to model the implementation of open-data policies in actual studies. Here we explain the procedure used to attain a high and consistent level of data stewardship across a special issue of the journal, Climate of the Past. We discuss the challenges related to (1) determining which data are essential for public archival, (2) using previously published data, and (3) understanding how to cite data. We anticipate that open-data sharing in paleo sciences will accelerate as the advantages become more evident and the practice becomes a standard part of publication.
The policy was closely aligned to the regular Climate of the Past policies (which are among the better policies in palaeo journals), but with more “must”. The discussion/review is ongoing, with a co-editor-in-chief encouraging further contributions to the discussion.
Two of the comments posted so far, while generally supportive of data archiving, raise concerns. Both express concern about the impact on early career scientists.
From Karoly et al
The impact of rigid data policies formulated in a top-down manner by experienced researchers (often those involved in modelling or multi-proxy synthesis) with large teams will generally be negative on early-career researchers who are often working to schedules around their PhD study and cannot as rapidly produce the final products of their work as can a larger group. With a desire to succeed and contribute to the science, this leaves them vulnerable to ’scientific exploitation’ and, in more serious cases, may compromise the successful completion of their postgraduate studies and future careers.
From Cook
It is quite clear that the ramifications of this “pilot” have not been thought out well by its authors, especially given the way that it forces graduate students and early-career scientists to give up their sensitive new data prematurely before their degrees or projects are completed. I know because this is a very real concern of my two graduate students and they deserve to be concerned given this so-called “best practices” data stewardship policy that prompted the earlier comment.
Karoly et al and Cook are presumably concerned that early career scientists who archive their data on publication will be scooped, one of six common fears about archiving data (Bishop 2015).
Is there any justification for this fear? Does anyone have any examples of scientists being scooped because they archived data on publication? Or better still, know of a study of the prevalence of scooping?
I am aware of people being scooped after making unpublished data publicly available. Ongoing time series are a particular problem – please follow their data usage policy.
See also the discovery of the dwarf planet Haumea.
I think the risk of being scooped because of archived data is low.
- A well-designed project plans the analyses in advance – the authors should know what papers they plan to write before the data are gathered – so they have a large head start on anyone else.
- Publishing a paper usually takes months: during that time the authors are the only people with access to the data, which gives them a further head start.
- The second paper will rarely just use the same data as the first (if it does, care should be taken to avoid salami-slicing).
- Most people have a backlog of papers that need writing, and also have the courtesy not to write a paper on a single, recently published dataset without including the authors.
If scooping is a significant risk, one option is to allow data to be archived but protected from download for some time. The downside of this is that the data are not immediately available for replicating the paper. Another option would be to make the data available under embargo, so the study can be replicated but the data cannot be included in any paper until after a certain date.
We need to know the prevalence of scooping using data archived on publication. Without knowing the prevalence, we don’t know whether we need to adjust policies and practices to reduce the risk, or to put more effort into assuaging the fears of early career scientists.