One could call the year 2020 many names – many of which are not fit for publication. However, one name that can be published, and which we should be calling it more often, is the Year of Open Data. The incredible global response to the COVID-19 pandemic demonstrated the true power of data sharing. We saw academics and commercial companies collaborating on vaccine development, governments and non-governmental organizations pooling healthcare data, and citizens around the world volunteering their data to public health initiatives.
This enthusiasm for data sharing, of course, didn’t come out of nowhere. Over the last decade there have been considerable advances in relation to Open Data. We have seen the development of a plethora of repositories and streamlined storage and sharing pathways. We have seen the development of metadata standards and curation practices. Moreover, the advent of the FAIR (findable, accessible, interoperable and re-usable) data principles, together with data sharing policies, has firmly positioned data sharing practices at the forefront of how we view responsible data.
Nonetheless, despite this widespread enthusiasm for the concept of data sharing, many studies have highlighted the concerns that individual researchers feel when they think about sharing their own data. Key issues such as “being scooped” or “not getting credit for data shared” are common topics of discussion. As a result, many educational initiatives on Open Data focus on addressing these concerns, highlighting the importance of data sharing for global research and the increased visibility and citations that come from open research practices.
Just as many researchers are wary of sharing their data, many are also hesitant about re-using data from sources (organizations or individuals) that they do not know personally. This is the “flip side” of the data sharing coin that is less often talked about. And yet, talking about it is so necessary. What, one may ask, is the point of investing all this time and money into building data infrastructures and advocating for data sharing if the data itself will not be re-used?
I have studied these issues for a number of years, and of course the primary justification for not re-using data is usually that researchers do not trust the data sources. Many researchers prefer not to re-use data even if it is properly annotated and curated and stored in a trusted repository. That, of course, is their prerogative, but it does raise an interesting double standard. If, as researchers, we are concerned about our data being properly curated and rely on trusted repositories to ensure that this happens, why do we not trust the other datasets within these repositories?
A similar issue crops up when we consider giving credit for the re-use of data. As I mentioned before, failing to accrue credit for data sharing practices is a key concern amongst researchers across the globe. Creative Commons (CC) licenses are often used as a means of assuaging these concerns as they enable the data owner to set the terms for data re-use. Similarly, assigning DOIs (digital object identifiers) to datasets within repositories ensures that a persistent identifier is attached to the resource, enabling citations beyond the lifespan of individual research projects. Nonetheless, many researchers have yet to really understand both CC licenses and DOIs – something that is not helped by the increasingly automated processes through which these are assigned to any data deposition. As a result, they may fall short of their responsibilities as data re-users, and inadvertently contribute to the persistent problems within the Open Data landscape (crediting and control of re-use).
As we reflect on the £$%& year that 2020 was, we should also use it as a means of reflecting on the enormous strides that the Open Data movement has taken around the world. We are now in a position where we are starting to see the power of sharing research data – how it can speed up research, lower costs and create global research endeavours. However, unless we understand how we as researchers fit into this picture, it will not work. We need to not only be willing to share our data and know how to protect it from misuse, we also need to educate ourselves enough to ensure that we do not – even entirely by accident – misuse, mis-cite, or mis-attribute any data that we re-use.