Been scooped? A discussion on data stewardship

Posted on 26/02/2018 by richard telford

At Climate of the Past, there is a pre-print by Darrell Kaufman and others on the data stewardship policies adopted by the PAGES 2k special issue.

Abstract. Data stewardship is an essential element of the publication process. Knowing how to enact generally described data policies can be difficult, however. Examples are needed to model the implementation of open-data policies in actual studies. Here we explain the procedure used to attain a high and consistent level of data stewardship across a special issue of the journal, Climate of the Past. We discuss the challenges related to (1) determining which data are essential for public archival, (2) using previously published data, and (3) understanding how to cite data. We anticipate that open-data sharing in paleo sciences will accelerate as the advantages become more evident and the practice becomes a standard part of publication.

The policy was closely aligned to the regular Climate of the Past policies (which are among the better policies in palaeo journals), but with more “must”. The discussion/review is ongoing, with a co-editor-in-chief encouraging further contributions to the discussion.

Two of the comments posted so far, while generally supportive of data archiving, raise concerns. Both express concern about the impact on early career scientists.

From Karoly et al

The impact of rigid data policies formulated in a top-down manner by experienced researchers (often those involved in modelling or multi-proxy synthesis) with large teams will generally be negative on early-career researchers who are often working to schedules around their PhD study and cannot as rapidly produce the final products of their work as can a larger group. With a desire to succeed and contribute to the science, this leaves them vulnerable to ‘scientific exploitation’ and, in more serious cases, may compromise the successful completion of their postgraduate studies and future careers.

From Cook

It is quite clear that the ramifications of this “pilot” have not been thought out well by its authors, especially given the way that it forces graduate students and early-career scientists to give up their sensitive new data prematurely before their degrees or projects are completed. I know because this is a very real concern of my two graduate students and they deserve to be concerned given this so-called “best practices” data stewardship policy that prompted the earlier comment.

Karoly et al and Cook are presumably concerned that early career scientists who archive their data on publication will be scooped, one of six common fears about archiving data (Bishop 2015).

Is there any justification for this fear? Does anyone have any examples of scientists being scooped because they archived data on publication? Or better still, know of a study of the prevalence of scooping?

I am aware of people being scooped after making unpublished data publicly available. Ongoing time series are a particular problem – please follow their data usage policy.

1/2 On the subject of data parasites: If someone publishes one of your publicly available datasets (that you have not yet published)…

— Trina McMahon 🌪 (@quendi) March 24, 2016

2/2 WITHOUT contacting you, what would you do? I am all for open science/data but also encourage common courtesy.

— Trina McMahon 🌪 (@quendi) March 24, 2016

I think the risk of being scooped because of archived data is low.

- A well-designed project plans the analyses in advance – the authors should know what papers they plan to write before the data are gathered – so they have a large head start on anyone else.
- Publishing a paper usually takes months: during that time the authors are the only people with access to the data, giving a further head start.
- The second paper will rarely just use the same data as the first (if it does, care should be taken to avoid salami-slicing).
- Most people have a backlog of papers that need writing, and also have the courtesy not to write a paper on a single, recently published dataset without including the authors.

If scooping is a significant risk, one option is to allow data to be archived but protected from download for some time. The downside of this is that the data are not immediately available for replicating the paper. Another option would be to make the data available under embargo, so the study can be replicated but the data cannot be included in any paper until after a certain date.

We need to know the prevalence of scooping using data archived on publication. Without knowing the prevalence, we don’t know whether we need to adjust policies and practices to reduce the risk, or to put more effort into assuaging the fears of early career scientists.

About richard telford

Ecologist with interests in quantitative methods and palaeoenvironments


18 Responses to Been scooped? A discussion on data stewardship

Magma says:

26/02/2018 at 9:48 pm

To me, the likelihood that ECRs would have their doctoral or post-doctoral research data scooped seems low. First of all, such individuals tend to be low profile and their main challenge seems to be how to GET noticed, not the potential negative consequences of this happening. Second, the injustice of using a starting scientist’s data without their permission and to their publishing and career detriment would seem to be very clear, and a strong disincentive to all but the most unscrupulous.

Gavin Simpson says:

27/02/2018 at 6:12 pm

Whilst I too see the risk of being “scooped” as low, it is undeniably a concern of some in the community. Even if we could demonstrate that the risk of being scooped is, as I believe, low, that may not assuage the concerns of our colleagues.

I am surprised that no-one has (yet) suggested alternatives that would, I believe, mitigate the concerns of ECRs and their supervisors. The most important aspect of data stewardship at this stage in the transition to Open Data is to actively archive data. There is no reason that data cannot be archived in a repository and held under embargo until a predetermined time. Once the embargo period has passed, the data would automatically become publicly available. In the interim, the data could be referenced with a DOI or similar identifier, just not accessed without permission from the data generators. This should satisfy those concerned about impacts on ECRs or research projects (there are timelines for completion of degrees and projects that would set reasonable limits on any embargo) as well as those pushing for better data stewardship (the data would be archived to a standard).

I believe Pangaea is set up in such a way that this may be possible (we’ve deposited data there for under-review papers), but it may require some changes to the service. Engaging all members of the community, not just the data generators and those organizing large collaborative data syntheses, would be productive in finding workable solutions in the short term.

Note that I am no fan of embargoes in general. However, there are concerns among our community that could be addressed through their appropriate and limited use during this period of transition. It is vitally important that we bring the community with us because they want to come, rather than kicking and screaming.

Now, how do I submit a discussion comment to CoP…

Willis Eschenbach says:

27/02/2018 at 6:59 pm

IF under the noble-sounding banner of avoiding being “scooped” we allow people to publish their results and hide their data, it means that their work cannot be replicated.

Since work that cannot be replicated is MEANINGLESS in science, this is a very very very very very bad idea … it would lead to the emergence of more people like Lonnie Thompson and his wife. Professor Thompson has managed to keep publicly funded data secret for decades, and this proposal would make legal everything he’s done with his data.

Seriously, I don’t care how many PhD candidates don’t want to archive their data. Cry me a river. You want to run with the big dogs, you gotta piss with the big dogs.

No data, no code, no science is my rule.

w.

- richard telford says:
  
  27/02/2018 at 9:29 pm
  
  I allow you to comment here. I allow you to criticise scientists when supported by evidence (which was lacking here). I do not allow you to be rude on my blog, hence I have edited your comment.
  
  I am not sure from your comment whether you appreciate that I am in favour of mandatory archiving of data and code on publication, with as few exceptions as possible. I do not believe that an exaggerated fear of being scooped should be allowed as an exception, but concede that we need to work to assuage this fear.
  
  - Willis Eschenbach says:
    
    27/02/2018 at 9:42 pm
    
    Thanks for the reply, Richard. I didn’t realize that so many people were unaware of the actions of the Thompsons. I’ve provided heaps of evidence below.
    
    w.
Magma says:

27/02/2018 at 7:21 pm

Your blog, Richard, and obviously your rules. But you may want to consider whether or not to keep a long-time Wattite’s defamatory comments about a research scientist.

https://earthsciences.osu.edu/people/thompson.3
http://research.bpcrc.osu.edu/Icecore/data/

Willis Eschenbach says:

27/02/2018 at 8:30 pm

Magma, fortunately for me, the truth is an absolute defense against charges of libel … it’s not defamatory when it’s true.

https://climateaudit.org/category/proxies/thompson/

w.

Pingback: Weekend reads: 20th anniversary of a fraud; uses and misuses of doubt; how common is scooping? – Retraction Watch
Eli Rabett says:

04/03/2018 at 10:37 pm

The issue with graduate students and early career people is that they do not need the added pressure about being scooped on their own data. Things are difficult enough for them without that.

- Willis Eschenbach says:
  
  05/03/2018 at 12:01 am
  
  So what is your solution, Eli? Allow them to keep their data secret so that their results are not verifiable? What is your plan to fix “the issue with graduate students”?
  
  Thanks,
  
  w.
  
  - Marco says:
    
    05/03/2018 at 9:40 am
    
    How’s McIntyre’s verification of the PARCA data going?
  - Eli Rabett says:
    
    06/03/2018 at 2:19 pm
    
    NASA, for example, has a policy of allowing data from satellites to be held by mission scientists for a period (a year?). Most funding agencies have agreed that publications can be held behind a paywall for a year or so.
    
    For a graduate student or young career person a grant period of three years would not IEHO be unreasonable.
  - richard telford says:
    
    06/03/2018 at 3:26 pm
    
    Is this a year from data collection or a year from publication?
Eli Rabett says:

08/03/2018 at 2:42 am

For NASA it’s (or at least was when Eli knew about these things) from collection. It made for interesting times. For example, there were a Euro and a US group collaborating on an instrument on Cassini, but the US group was busy flying the probe and the Euro group scooped them. There are no perfect answers.

Willis Eschenbach says:

08/03/2018 at 3:11 am

Thanks for the clarification, Eli. I’d have no problem with a year’s delay on data collected by a scientist IF the scientist is 100% self-funded.

But if he’s collecting data on my taxpayer’s dime, I’m not paying for him to squat on the data while he covers himself in glory. He is merely a public servant hired to collect important data, and HE HAS ALREADY BEEN PAID FOR IT, so he has no claim to it at all. If that’s the case, immediate publication should be the rule. Since you and I bought the data, we should be able to use it right away.

w.

Marco says:

08/03/2018 at 6:13 pm

Most scientists aren’t hired to collect important data, they are hired to do science, of which collecting important data is often just one part.

- Willis Eschenbach says:
  
  08/03/2018 at 6:14 pm
  
  Marco, you do realize that you just contradicted yourself?
  
  w.
  
  - Marco says:
    
    08/03/2018 at 8:19 pm
    
    I didn’t. We’re paid to do more than just collect the data – and it’s the important step of analyzing and interpreting that data (which we are also paid to do), which you may well throw out when you have to put your collected data out in the open right away. Thus, in that case scientists aren’t doing what they are being paid for.
    
    Of course, when I look at the average hours a week that I work, in principle I should keep about 20% of all the data to myself. The taxpayer didn’t pay for that – I paid for it with my own free time (I already subtracted costs for the use of equipment). I’m pretty sure that goes for quite a few of us. If you really want what you pay for, expect a lot less!
