Reusing data: bridging the distance between data consumers and producers
The development of open research data has generally been approached from the perspective of producers sharing data; we argue that it is vital to also include the consumer perspective. In this blog post, we report insights gained from a workshop held in November 2024 exploring data reuse from different perspectives.

Open science initiatives have increased the amount of (openly) available research data. Such data are often shared with the idea that they will be reused, although reuse is far from guaranteed. Bridging the distance between data producers and data consumers has been suggested as one way to facilitate the reuse of this existing supply of data. To this end, we conducted a workshop, funded by Open Science Netherlands (OSNL), which brought together data infrastructure professionals, researchers, and data stewards to discuss and brainstorm key questions related to barriers and drivers for reusing data. We structured the workshop around three perspectives on these questions:
- What do we currently know about data reuse?
- How can we encourage and foster data reuse?
- How can we address social and organisational factors for reusing data?
1. What do we currently know about data reuse?
To explore this first perspective, data infrastructure providers gave short presentations centring on how they cultivate audiences for their data and what they know about how their data are currently used.
Madeleine de Smaele from 4TU.ResearchData highlighted that while having an audience for data is important, it is not easy to trace or develop audiences for research data. Currently, 4TU collects metrics such as the number of data downloads, views, or (sometimes) citations. Ricarda Braukmann from the Data Archiving and Networked Services (DANS) further discussed difficulties in tracing information about what happens to data once they are ‘outside’ of the repository. While DANS can trace the number of visitors to the repository and data downloads, many key questions about data reuse (e.g. who downloaded data and for which purposes) remain hidden from view. She also emphasised the need to treat data metrics with caution, as the meaning behind such numbers is not always obvious; fifty data downloads could mean that one person downloaded a dataset fifty times or that fifty different people did so once each. Interestingly, as Kati Mozygemba from QualidataNet and Qualiservice emphasised, more is often known about the reuse of restricted-access data, as reusers of these data need to complete detailed requests including their demographic information and reasons for reuse.
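To make the download-count ambiguity concrete, here is a minimal Python sketch with entirely hypothetical log entries: the same raw total can correspond to very different numbers of actual reusers.

```python
# Hypothetical repository download log: (user, dataset) pairs.
# Real logs are rarely this clean, and users are often not identifiable at all.
download_log = [
    ("user_a", "doi:10.0000/example-dataset"),  # hypothetical identifiers
    ("user_a", "doi:10.0000/example-dataset"),
    ("user_a", "doi:10.0000/example-dataset"),
    ("user_b", "doi:10.0000/example-dataset"),
]

total_downloads = len(download_log)
unique_downloaders = len({user for user, _ in download_log})

print(f"total downloads:    {total_downloads}")     # 4
print(f"unique downloaders: {unique_downloaders}")  # 2
```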

2. How can we encourage and foster data reuse?
To address this second perspective, participants split into break-out groups to discuss one of four topics related to how to foster data reuse: (1) alternative (non-academic) data platforms, (2) peer review of data, (3) measuring data reuse, and (4) publication packages.
Alternative (non-academic) data platforms: Participants focusing on this topic explored several data platforms, including innovative repositories in the arts, humanities, and ethnographic research (e.g. the Platform for Experimental Ethnography and Research Platform Art) and government-run platforms (such as Statistics Netherlands). A shared takeaway was that academic repositories can learn a great deal from their alternative/non-academic counterparts. Participants had the sense that the platforms they looked at had well-developed brand identities, provided users with intuitive interfaces for interacting with and reusing the data, described the data in accessible human-readable language, and created visualisations that supported non-experts’ understanding of these data – all things that are less commonly found in academic data repositories. Conversely, both groups found that academic repositories tend to do a much better job implementing certain standards that make searching, accessing, and referencing data much more straightforward.
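As one concrete illustration of what those standards buy you: because academic repositories register datasets under persistent identifiers, a DOI can be resolved to machine-readable metadata via standard content negotiation. The sketch below assumes the requests library and uses a placeholder DOI; substituting any real DataCite-registered dataset DOI should work the same way.

```python
import requests

# Placeholder DOI; replace with a real DataCite-registered dataset DOI.
doi = "10.0000/example-dataset"

# DOI content negotiation: ask doi.org for machine-readable metadata
# instead of the human-readable landing page.
response = requests.get(
    f"https://doi.org/{doi}",
    headers={"Accept": "application/vnd.datacite.datacite+json"},
    timeout=10,
)
response.raise_for_status()

metadata = response.json()
print(metadata.get("titles"))
print(metadata.get("creators"))
```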
Peer review of data: While peer review is common practice for academic publications, it is largely unknown territory for datasets. To develop peer review of data into a common practice, participants felt that the community would first need to agree on what such review actually entails, both in terms of timelines and the breadth of review. For example, should “peer review” consist only of metadata checks or screening for identifying personal information? Or should it include more detailed review of the data themselves, such as inspecting variables or running code? One of the main obstacles voiced was the difficulty of finding reviewers with the right (scientific and disciplinary) expertise to perform such detailed reviews.
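The lightweight end of that spectrum could in principle be partly automated. Below is a minimal sketch of a metadata completeness check; the required fields are purely illustrative, not an agreed community standard.

```python
# Illustrative list of fields a data reviewer might expect to be present;
# an actual community agreement would define this list.
REQUIRED_FIELDS = ("title", "creators", "description", "licence", "methodology")

def review_metadata(record: dict) -> list[str]:
    """Return a list of problems found in a dataset's metadata record."""
    problems = []
    for field in REQUIRED_FIELDS:
        if not record.get(field):
            problems.append(f"missing or empty field: {field}")
    return problems

# Example record that would fail the check.
record = {"title": "Hypothetical survey data", "creators": ["A. Researcher"]}
for problem in review_metadata(record):
    print(problem)  # flags description, licence, and methodology
```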
Measuring data reuse: The topic of measuring data reuse was regarded with some caution during the discussions. In general, the sentiment was that we should learn from, and avoid, the negative consequences of publication metrics. Data reuse metrics should be put into context, possibly by combining quantitative with qualitative information. Rather than focusing on individual performance, such metrics should be used to study trends over time, highlight positive developments, and spotlight interesting cases of data reuse. Finally, the practice of measuring data reuse should be tied to Recognition and Rewards practices, both for people who reuse data responsibly and for those whose data are reusable.
Publication packages: “Publication packages” – essentially bundles of files including data, code, and methodological documentation – are designed to archive all related materials that might be needed to replicate, reproduce, or support a study’s results. Participants highlighted that the rationale behind putting together such packages (for instance, as outlined in a set of guidelines specific to Faculties of Social & Behavioural Sciences in the Netherlands) is often rooted in ideas about accountability and research integrity rather than data sharing and reuse. To support a more reuse-oriented approach to publication packages, participants suggested that researchers would need to include richer methodological documentation in their packages, as well as clear statements of their precise research questions. These could then be used to query and search packages that might hold relevant data for reuse. Discussions also highlighted that, to be effective at encouraging data reuse, such packages would need wider uptake across disciplines and could benefit from more standardised workflows.
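One way to support the kind of querying participants described would be a small machine-readable manifest alongside the package files. The layout and field names below are hypothetical; the point is that recording research questions next to data and code gives a reuse-oriented search something to index.

```python
import json
from pathlib import Path

# Hypothetical manifest for a publication package; field names are illustrative.
manifest = {
    "title": "Hypothetical study of X and Y",
    "research_questions": [
        "Does X predict Y in population Z?",
    ],
    "contents": {
        "data/": "raw and processed data files",
        "code/": "analysis scripts",
        "docs/methods.md": "methodological documentation",
    },
    "licence": "CC-BY-4.0",
}

# Write the manifest at the top level of the package so that search tools
# can index it without opening the data files themselves.
package = Path("publication_package")
package.mkdir(exist_ok=True)
(package / "manifest.json").write_text(json.dumps(manifest, indent=2))
```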

3. How can we address social and organisational factors for reusing data?
This third and final perspective aimed to consider drivers and barriers to data reuse, particularly in relation to Recognition and Rewards efforts. To do this, we invited Leiden-based researchers from different disciplines who both produce and reuse data to share their experiences.
Tom Heyman, a psychology researcher, explained that Recognition and Rewards were not per se a factor in his sharing data openly; rather, he was intrinsically motivated to do so. For Tom, reusing data was a fruitful but at times frustrating experience, compounded by problems such as dead links or a lack of robust data. Eric Schultes, a researcher at the Leiden Academic Centre for Drug Research, promoted the idea of nanopublications – essentially, the publication and reuse of statement-level assertions about the natural or social world that are derived from the research process. While this approach allows fine-grained attribution and context (see the sketch below for what a nanopublication looks like), it also requires certain technical expertise and infrastructure with significant computing power to search through the large quantity of resulting data. Finally, Benjamin Storme discussed his experience conducting comparative linguistic analysis at a global scale. This line of research relies heavily on the availability of databases with annotated speech data – data which would be impossible to collect individually for all the languages of the world. As a data consumer, he also stressed the importance of rich metadata and the harmonisation of metadata and datasets to facilitate reuse.
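For readers unfamiliar with nanopublications: they are typically expressed as RDF named graphs, separating the assertion itself from its provenance and publication information. The sketch below builds a minimal, entirely hypothetical example with the rdflib Python library; the assertion terms and identifiers are made up for illustration.

```python
from rdflib import Dataset, Namespace, URIRef
from rdflib.namespace import RDF, PROV

NP = Namespace("http://www.nanopub.org/nschema#")  # the nanopublication schema
EX = Namespace("http://example.org/mynanopub#")    # hypothetical base namespace

ds = Dataset()
ds.bind("np", NP)
ds.bind("ex", EX)
ds.bind("prov", PROV)

head = ds.graph(EX.head)
assertion = ds.graph(EX.assertion)
provenance = ds.graph(EX.provenance)
pubinfo = ds.graph(EX.pubinfo)

# Head graph: declares the nanopublication and links its three parts.
head.add((EX.np1, RDF.type, NP.Nanopublication))
head.add((EX.np1, NP.hasAssertion, EX.assertion))
head.add((EX.np1, NP.hasProvenance, EX.provenance))
head.add((EX.np1, NP.hasPublicationInfo, EX.pubinfo))

# Assertion graph: a single statement-level claim (hypothetical terms).
assertion.add((EX.compoundX, EX.inhibits, EX.enzymeY))

# Provenance graph: where the assertion came from (hypothetical DOI).
provenance.add((EX.assertion, PROV.wasDerivedFrom,
                URIRef("https://doi.org/10.0000/hypothetical-study")))

# Publication info graph: metadata about the nanopublication itself.
pubinfo.add((EX.np1, PROV.wasAttributedTo,
             URIRef("https://orcid.org/0000-0000-0000-0000")))

print(ds.serialize(format="trig"))
```

Each named graph can then be cited, attributed, and queried independently, which is where the fine-grained attribution benefits come from, and also where the computational cost of searching many small graphs arises.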
Main takeaways
While our workshop focused on thinking about how to encourage and foster the reuse of data, we also encountered many challenges and points for further thought. Across the perspectives discussed here, we see that while data reuse is happening – often in exciting ways – it continues to be challenging from technical, social, and organisational standpoints. Our workshop further highlighted the need to continue to carefully consider appropriate and meaningful metrics for data reuse which capture context and avoid the potential pitfalls of other metrics common in academic work. At the same time, we also found value in looking both “inside and outside” of academia – to other academic practices, such as peer review, or to external data platforms – when thinking about how to make data more usable. The real value of the workshop, however, was in bringing together diverse perspectives and stakeholders to continue the discussion as we work towards bridging the distance between data consumers and producers.
Acknowledgements
We would like to thank Open Science Netherlands for their generous funding and support of this workshop.
Header image by Modestas Urbonas on Unsplash.
DOI: 10.59350/h4f1k-00y10