Independent researchers: traces in bibliographic data
This blog post presents preliminary findings from an exploratory study on independent researchers, using OpenAlex data to trace their research interests and career paths. We also discuss our experience working with open and federated data solutions during the study.
Introduction
Independent researchers are a diverse category of researchers, ranging from those who want to pursue their own path to those who are struggling to stay within the competitive world of academia. We define independent researchers broadly as people who conduct research without an institutional affiliation. A recent blogpost describes how researchers trying to work outside of research institutions face various difficulties due to their lack of access to institutional resources. These difficulties include, for example, access to funding opportunities, as research funders often presuppose an institutional connection. To this, we can add the difficulty of accessing resources such as academic journals, which are commonly available through subscriptions that institutions support and offer to their employees.
As scholars using bibliometric methods, we were interested in how independent researchers can be traced in bibliographic data. Interest in independent researchers has been extremely limited in the past, with only two previous attempts made in bibliometric studies: one using data from Scopus and the other using Web of Science data. For our study, we used affiliation data as a starting point and began by querying Web of Science, Scopus, and OpenAlex. These three data sources treat data on independent researchers differently, and we decided to use OpenAlex as our data source for two reasons: it has the highest number of records authored by at least one independent researcher, and it offers data under a CC0 license, making it possible for us to reshare the data. To be able to trace independent researchers throughout their careers, we also made use of ORCID data (another open data source).


Two interesting findings about independent researchers
While this is very much an ongoing study, we can already share some preliminary results. First, our findings show that independent researchers are a small but seemingly growing group of researchers. Regarding possible growth, it is rather difficult to assess whether the group is actually growing or whether changes in metadata/affiliation practices have affected the data in the period studied (from 2010 onwards). We chose to limit our study to Social Sciences and Humanities (SSH), as the literature we studied revealed that many independent researchers belong to these fields. By examining the data, it becomes apparent that topics studied by independent researchers seem to differ somewhat from topics studied by researchers in SSH in general. In the case of independent researchers, we find that the most researched topic is History of Science. There also seems to be a tendency for lower open access uptake among these researchers (see graph above on the right), which could be related to the common practice of the author having to pay to publish open access. If these researchers have trouble finding funding in the first place, which can be gleaned from testimonies of independent researchers themselves (see blogpost in the introduction), it is probably difficult to find the funds to pay article processing charges, especially if the independent researcher is first or sole author of the publication.
Through affiliation information, we are also able to make some inferences about the career paths of independent researchers. Many researchers who publish as ‘independent’ started their career at a research organisation and may move back and forth between ‘independence’ and academic positions. This latter group can be divided into people who are mainly affiliated but publish as independent for a short period of time and, conversely, people who publish mostly as independent but occasionally publish with an affiliation. There are also researchers who remain independent throughout their whole career. While only a minority of independent researchers had an updated employment history in their ORCID profiles, the employment histories available are still a potentially interesting avenue for the study of career trajectories.
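To make the grouping above concrete, here is a minimal Python sketch of how an author's publication record could be sorted into these trajectory types. It is an illustration only, not the code used in the study: the function name, the labels, and the 50% cut-off between "mainly affiliated" and "mostly independent" are all assumptions made for this example.

```python
def classify_trajectory(independent_flags):
    """Classify one author's record into a trajectory type.

    `independent_flags` is a list of booleans, one per publication in
    chronological order: True if the author published as independent,
    False if an institutional affiliation was given.
    The labels and the 0.5 threshold are assumptions for illustration.
    """
    if not independent_flags:
        raise ValueError("at least one publication is required")

    # Share of publications on which the author appears as independent.
    share = sum(independent_flags) / len(independent_flags)

    if share == 1.0:
        return "always independent"
    if share == 0.0:
        return "never independent"
    # Mixed records: split on the (assumed) 50% threshold.
    return "mostly independent" if share > 0.5 else "mainly affiliated"


# Hypothetical example records, not data from the study.
examples = {
    "author_a": [True, True, True],
    "author_b": [False, True, False, False],
    "author_c": [True, True, False],
}
labels = {name: classify_trajectory(flags) for name, flags in examples.items()}
```

A share-based rule like this ignores the order of publications, so it cannot tell someone who left academia for good from someone who moves back and forth; capturing that would need the publication years as well.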
Experience of working with open data
In line with the principles of the Barcelona Declaration on Open Research Information, the 2024 STI conference in Berlin, where we presented our work in September, encouraged all presenters to openly share their data and code. We see many benefits in doing this, not just for us, but also in reaching a broader audience and giving others the possibility to validate and build on our work. A recent post on this blog describes a possible federated solution and another post describes a practical example of this. As described in these blog posts, we used Google BigQuery and Colab to search OpenAlex and ORCID data within the InSySPo database. As we work at universities in different countries, the Google solution makes collaboration and the process of sharing data easier.
We had to use a two-step approach to find publications from the times when researchers were independent and from when they were affiliated to an institution, as many researchers transition between these roles throughout their careers. First, we retrieved all articles with independent researchers among the authors, and in the next step we searched for all works by these identified researchers to track their career trajectories. Throughout this process we relied on OpenAlex author IDs for author identification. OpenAlex data contains ORCID information harvested from publications but also added in the internal author disambiguation process of OpenAlex. However, adding ORCID information to internal author disambiguation is complex and may lead to errors. Fortunately, these can be openly discussed in the OpenAlex discussion group, raising awareness and hopefully improving algorithms and data accuracy.
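The two-step approach can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not our BigQuery code: the record layout (work_id, year, authorships, author_id, affiliations) is a hypothetical stand-in for the real OpenAlex schema, the sample records are invented, and matching affiliation strings against a short list of markers is an assumed way of flagging independent authors.

```python
from collections import defaultdict

# Hypothetical, simplified work records (not the actual OpenAlex schema).
WORKS = [
    {"work_id": "W1", "year": 2012, "authorships": [
        {"author_id": "A1", "affiliations": ["Independent Researcher"]},
        {"author_id": "A2", "affiliations": ["Example University"]},
    ]},
    {"work_id": "W2", "year": 2015, "authorships": [
        {"author_id": "A1", "affiliations": ["Example University"]},
    ]},
    {"work_id": "W3", "year": 2018, "authorships": [
        {"author_id": "A2", "affiliations": ["Example University"]},
        {"author_id": "A3", "affiliations": ["Independent Scholar"]},
    ]},
    {"work_id": "W4", "year": 2020, "authorships": [
        {"author_id": "A4", "affiliations": ["Example University"]},
    ]},
]

# Assumed markers for an "independent" affiliation string.
INDEPENDENT_MARKERS = ("independent researcher", "independent scholar")


def is_independent(affiliations):
    """True if any affiliation string contains an independent marker."""
    return any(
        marker in affiliation.lower()
        for affiliation in affiliations
        for marker in INDEPENDENT_MARKERS
    )


def find_independent_authors(works):
    """Step 1: author IDs that appear as independent on at least one work."""
    return {
        authorship["author_id"]
        for work in works
        for authorship in work["authorships"]
        if is_independent(authorship["affiliations"])
    }


def collect_works_by_authors(works, author_ids):
    """Step 2: all works by the identified authors, independent or not.

    Returns {author_id: [(year, work_id, published_as_independent), ...]}
    with each author's list sorted chronologically.
    """
    by_author = defaultdict(list)
    for work in works:
        for authorship in work["authorships"]:
            if authorship["author_id"] in author_ids:
                by_author[authorship["author_id"]].append(
                    (work["year"], work["work_id"],
                     is_independent(authorship["affiliations"]))
                )
    for records in by_author.values():
        records.sort()
    return dict(by_author)


independent_ids = find_independent_authors(WORKS)
careers = collect_works_by_authors(WORKS, independent_ids)
```

The second step deliberately drops the independence filter: it is what brings in the affiliated publications of authors who were independent at some other point in their career.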
Further work and some final thoughts
We find that bibliometric traces of independent researchers are increasing. However, how we define them and which data we use will lead to differences in results, which warrants further discussion. The various ways in which independent researchers may face difficulties and exclusions from the academic workplace make them an important group to continue to investigate. The InSySPo project has been particularly useful for accessing data in one place (in our case OpenAlex and ORCID data). Using an open data source enables us to share the data freely, which seems particularly appropriate when discussing a group of researchers who do not have access to subscription-based bibliographic data. In the emerging federated data landscape, ongoing discussions about best practices (such as data collection, hosting, and connectivity) are essential.
Our further work on independent researchers will focus on career trajectories and gender. From the literature we found that women seem to be more common as independent researchers, which may point to gendered bias in the academic workplace. Career trajectories may give rise to new discussions, as different paths may reveal distinct types of independent researchers. For example, some may be fully independent, publishing exclusively as such, while others may be affiliated but publish occasionally as independents. So, what drives individuals to shift between those statuses, and what factors influence them to spend longer periods in one or the other? Could it reflect funding, chosen research topics, or something else? Starting out in academia and then moving to independence might reflect the choice to become more independent, whereas moving back and forth may reflect the difficulties of finding (permanent) positions and funding. With regard to the data, we find that OpenAlex is adding more metadata and records at a fast rate, thus increasing the amount of data we can gather about this group of researchers operating somewhat on the boundary of the academic landscape.
We would like to thank InSySPo/Alysson Mazoni for their support throughout this study.
Our data and code can be found on Zenodo and GitHub. This work was also presented as a poster at the STI conference in Berlin in September 2024, and the paper can be found here.
Header image by Ralph Katieb on Unsplash.
DOI: 10.59350/hr83z-8x491