COVID-19 research in the news: Visualizing the sentiment and topics in science news about the pandemic

Every day, news outlets around the world play a central role in disseminating the latest COVID-19 research. In this post, we explore how COVID-19 findings are received in the news by applying state-of-the-art sentiment analysis, and we present some interesting preliminary results. Stay tuned!

There are many reasons why we should be concerned with how science is portrayed in the news media, particularly given the ‘infodemic’ related to COVID-19. For example, over-hyped research results can lead to misinterpretation that may contribute, among other things, to public skepticism and distrust towards science. Because of that, we began to wonder how we could explore the reception in the news of the science related to the pandemic. More specifically, we decided to explore the potential of natural language processing (NLP), and in particular of sentiment analysis, as an indicator of the sentiment that news media express about COVID-19 findings. As a disclaimer, the analysis presented in this blog post should be seen as a preliminary exploration of how sentiment approaches can be implemented in the study of the reception of scientific content in social and news media outlets.

In our experiment, we used an existing dataset of scientific publications related to research on COVID-19, updated through April 24th, 2020, and matched it with data from Altmetric.com (Figure 1). From this dataset, we selected publications related to the pandemic as indicated by the WHO or Dimensions. Since our analysis focused on texts, we filtered out publications without an abstract. From the data obtained from Altmetric.com, we also removed news articles that did not come with a summary text (this summary typically contains about the first 250 characters of the news media text). We ended up with a dataset of 1,910 publications that have an abstract and are mentioned in 38,611 different news media posts.
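The filtering steps described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the field names (`doi`, `abstract`, `news_id`, `summary`) and the toy records are hypothetical and do not reflect the actual WHO, Dimensions, or Altmetric.com schemas.

```python
# Hypothetical toy records; field names and DOIs are illustrative assumptions only.
publications = [
    {"doi": "10.1000/a", "abstract": "We study transmission of the virus."},
    {"doi": "10.1000/b", "abstract": ""},    # no abstract: dropped
    {"doi": "10.1000/c", "abstract": None},  # no abstract: dropped
]
news_mentions = [
    {"doi": "10.1000/a", "news_id": "n1", "summary": "Researchers report ..."},
    {"doi": "10.1000/a", "news_id": "n2", "summary": None},               # no summary: dropped
    {"doi": "10.1000/b", "news_id": "n3", "summary": "A new study ..."},  # paper has no abstract: dropped
]


def has_text(value):
    """Return True when value is a non-empty, non-whitespace string."""
    return isinstance(value, str) and bool(value.strip())


# Keep only publications that come with an abstract.
kept_pubs = {p["doi"]: p for p in publications if has_text(p["abstract"])}

# Keep only news mentions that have a summary and point to a kept publication.
kept_news = [
    n for n in news_mentions
    if has_text(n["summary"]) and n["doi"] in kept_pubs
]

# Publications that still have at least one usable news mention.
matched_dois = {n["doi"] for n in kept_news}
print(len(matched_dois), "publications,", len(kept_news), "news items")
```

Treating empty strings and missing values alike is a design choice of this sketch; the real data may encode missing abstracts or summaries differently.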

The Sentiment Analysis

To obtain the sentiments apparent in the news articles, we used a sentiment extraction transformer built on top of BERT (Bidirectional Encoder Representations from Transformers) (see Vaswani et al., 2017). We used the bert-base-multilingual-uncased-sentiment model, which covers six languages (English, Dutch, German, French, Spanish and Italian) and is fine-tuned on a set of 500,000 product reviews with sentiment labels ranging from 0 to 4, where 0 is a bad review and 4 is a good review (the pretrained model is publicly available). Thus, sentiment scores range between 0 and 4 and can be interpreted as follows: 0=‘very negative’, 1=‘negative’, 2=‘neutral’, 3=‘positive’, 4=‘very positive’.
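As a rough sketch of how such a classifier could be applied, the snippet below uses the Hugging Face `transformers` pipeline. Several things here are assumptions rather than details taken from our setup: the Hub identifier `nlptown/bert-base-multilingual-uncased-sentiment`, the expectation that the model reports labels of the form "1 star" to "5 stars", and the helper names `stars_to_score` and `score_sentences`, which are hypothetical. The star labels are shifted by one to match the 0 to 4 scale used in this post.

```python
import re

# Assumed Hub identifier for the model named in the text.
MODEL_ID = "nlptown/bert-base-multilingual-uncased-sentiment"

SCORE_NAMES = {
    0: "very negative",
    1: "negative",
    2: "neutral",
    3: "positive",
    4: "very positive",
}


def stars_to_score(label):
    """Map a label such as '1 star' or '5 stars' to the 0-4 scale."""
    match = re.match(r"\s*([1-5]) stars?\s*$", label)
    if match is None:
        raise ValueError(f"unexpected label: {label!r}")
    return int(match.group(1)) - 1


def score_sentences(sentences):
    """Classify sentences with the pretrained model; returns a list of 0-4 scores."""
    # Imported lazily so the helpers above work without the dependency;
    # requires the `transformers` package and a backend such as PyTorch.
    from transformers import pipeline

    classifier = pipeline("sentiment-analysis", model=MODEL_ID)
    return [stars_to_score(result["label"]) for result in classifier(list(sentences))]
```

A call such as `score_sentences(["The vaccine trial showed promising results."])` would download the model on first use and return one score per input sentence.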


Figure 1. The figure illustrates the process of cross-matching WHO/Dimensions COVID-19 data with Altmetric.com data to extract news sentiment for different publication topics. A BERT-based sentiment classifier is used to extract the sentiment of news posts, initially at the sentence level, and then at the levels of news posts and publications.
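The step from sentence-level scores to news posts and then to publications can be sketched as a two-stage aggregation. The identifiers and scores below are made up, and using the mean at both levels is an assumption of this sketch.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical sentence-level scores: (publication id, news post id, score on the 0-4 scale).
sentence_scores = [
    ("pub1", "news1", 2), ("pub1", "news1", 3),
    ("pub1", "news2", 1),
    ("pub2", "news3", 0), ("pub2", "news3", 2),
]

# Step 1: average the sentence scores within each news post.
by_post = defaultdict(list)
for pub_id, news_id, score in sentence_scores:
    by_post[(pub_id, news_id)].append(score)
post_scores = {key: mean(scores) for key, scores in by_post.items()}

# Step 2: average the post-level scores within each publication.
by_pub = defaultdict(list)
for (pub_id, _news_id), score in post_scores.items():
    by_pub[pub_id].append(score)
publication_scores = {pub_id: mean(scores) for pub_id, scores in by_pub.items()}

print(publication_scores)
```

Averaging post-level means gives every news post the same weight regardless of its length; pooling all sentences of a publication directly would instead weight longer summaries more heavily.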

How has the science around COVID-19 been received in the news?

To get a sense of how well BERT dealt with topics related to COVID-19 research in the news, we plotted a term map of the most commonly co-occurring terms in the scientific articles. We then overlaid the average BERT sentiment score of the news items corresponding to each paper in the dataset, in order to represent the sentiment of news items around COVID-19 research. As we can see, BERT seems to identify news about paper topics related to solutions, such as vaccines and treatments, as neutral to slightly positive. On the other hand, articles on symptoms such as fever and hypertension, and on policy measures to control the virus, are reported more negatively in the news.

Term map - This visualization illustrates a co-occurrence map of the most common terms used in the titles and abstracts of the selected papers. Warm colors (red) indicate where the positive attention of the media has focused, while cool colors (blue) indicate negative attention. Terms such as “fever” and “vaccine” are among the most commonly used, and “drug”, “clinical trial”, “antibody”, and “control measures” are also among the more frequent terms. News items talk positively about publication topics related to vaccines, antibodies and drugs, while news posts about control measures and fever tend to have a more negative sentiment.
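One way to compute such an overlay value per term is to average the news sentiment of all papers in which the term occurs. The sketch below assumes this definition; the terms per paper and the sentiment values are invented for illustration and are not results from our data.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical inputs: terms extracted per paper, and the average news sentiment per paper (0-4).
paper_terms = {
    "pub1": {"vaccine", "antibody"},
    "pub2": {"fever", "control measures"},
    "pub3": {"vaccine", "fever"},
}
paper_sentiment = {"pub1": 2.5, "pub2": 1.0, "pub3": 2.0}

# Collect the sentiment of every paper in which a term occurs.
scores_by_term = defaultdict(list)
for pub_id, terms in paper_terms.items():
    for term in terms:
        scores_by_term[term].append(paper_sentiment[pub_id])

# Overlay value per term: mean sentiment of the papers mentioning it.
term_overlay = {term: mean(scores) for term, scores in scores_by_term.items()}

# List terms from most negative to most positive for inspection.
for term, score in sorted(term_overlay.items(), key=lambda item: item[1]):
    print(f"{term}: {score:.2f}")
```

The resulting dictionary could then be supplied to a mapping tool as a per-term score that drives the color scale.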


We also analyzed the temporal dynamics of the news items and aggregated the average sentiment of the sentences of all the news on a given day (Figures 2 and 3). The number of news items around COVID-19 related scientific publications has increased over time, particularly from mid-March onwards, a pattern that has also been observed for Twitter, other social media sources, and in The Conversation. During the period of higher news activity (March-April), the mean sentiment scores oscillate between slight negativity (1.5) and neutrality (2) (Figure 2).

Figure 2. Trend analysis of the number of distinct news items regarding COVID-19-related publications and their average sentiment score.

In Figure 3, we aggregate the sentiment scores at the month level to show the overall increase in the sentiment inferred from the news items, from the early months to the more recent ones.

Figure 3. Box plot analysis of the sentiment score of the sentences from news items mentioning COVID-19 research, per month of publication of the news items.
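The monthly summary behind a box plot like this can be sketched with the Python standard library. The records are invented, and the quartile method (`inclusive`) is an assumption; plotting libraries may use slightly different conventions.

```python
from collections import defaultdict
from datetime import date
from statistics import quantiles

# Hypothetical sentence-level records: (publication date of the news item, score on the 0-4 scale).
records = [
    (date(2020, 2, 3), 1), (date(2020, 2, 10), 1),
    (date(2020, 2, 20), 2), (date(2020, 2, 25), 0),
    (date(2020, 4, 1), 2), (date(2020, 4, 8), 2),
    (date(2020, 4, 15), 3), (date(2020, 4, 20), 1),
]

# Group the sentence scores by the month in which the news item appeared.
by_month = defaultdict(list)
for day, score in records:
    by_month[day.strftime("%Y-%m")].append(score)

# Compute the three quartiles that define the box for each month.
summary = {}
for month, scores in sorted(by_month.items()):
    q1, q2, q3 = quantiles(scores, n=4, method="inclusive")
    summary[month] = {"n": len(scores), "q1": q1, "median": q2, "q3": q3}

for month, stats in summary.items():
    print(month, stats)
```

Grouping by `day` instead of by month in the same loop would give the daily averages and counts used for a trend line.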

Another interesting piece of information recorded by Altmetric.com is the source of each news item. This enables the study of the sentiment expressed by the different news providers (Figure 4).

Figure 4. Top 35 news outlets characterized by the average sentiment in their news items.

Interestingly, some of the most popular news outlets related to medical research (e.g. MedicalXpress, The Conversation or Medscape) exhibit values very close to 2, suggesting a high degree of neutrality in their dissemination of COVID-19 science-related news. In contrast, business-related news outlets (the Business Insider editions for Malaysia, Singapore, Australia, India or the Netherlands) tend to have a more negative sentiment in their news items, perhaps due to the negativity around the critical economic situation caused by the pandemic. News aggregators such as Yahoo! News, MSN, or Google News also exhibit rather negative sentiments, in line with news media such as the New York Times, CNN News or The Guardian. An interesting exception is the conservative channel Fox News, with fairly positive coverage of the research around the pandemic.

What did we learn from this exercise?

This is a first analysis of the sentiment of news items covering scientific articles about COVID-19. Overall, we observe that news coverage of scientific findings moves from a slightly negative sentiment in the early months towards a more neutral sentiment in recent ones. On average, paper topics related to solutions like vaccines and treatments tend to be treated more neutrally or positively in the news, while paper topics about transmission and control measures are covered more negatively. Medical news sources tend to present more neutral views, while general-interest and business-related news outlets write more negatively about scientific research related to the virus.

However, this exercise is by no means in its final stages. Given the lack of abstracts in many of the publications, and occasionally of summary text for news items, we could only study a limited selection of publications and news media items. In the future, we will consider larger sets of publications and news media items. Another concern is that we used an already trained BERT model, fine-tuned for sentiment analysis on product reviews, to classify news items about research. While models like BERT can be generalized to different contexts (especially social media), we would likely have obtained more accurate classification by fine-tuning the model on a corpus of research articles about COVID-19 and related news items instead.

Nevertheless, BERT reveals interesting findings that we think are worth sharing in this blog post. It also shows the potential of machine learning techniques such as text classification for further studying and characterizing the online and social media reception of scientific outputs. Tips for improvement would be greatly appreciated!

Acknowledgements

We thank William Schueller and Jonathan Dudek for their helpful comments on an early version of this blog post.
