Unveiling the potentials and limitations of Sentiment analysis algorithms to study news media

This study was conducted under the CLEOPATRA project. It has received funding from the European Union’s Horizon 2020 research and innovation programme under the Marie Skłodowska-Curie grant agreement no. 812997, and the Center for Advanced Internet Studies (CAIS) via the Ministerium für Kultur und Wissenschaft des Landes Nordrhein-Westfalen (2023).

Paper | Combining sentiment analysis classifiers to explore multilingual news articles covering the London 2012 and Rio 2016 Olympics


About: This study aims to present an approach to the challenges of working with Sentiment Analysis (SA) applied to news articles in a multilingual corpus. It looks at the use and combination of multiple algorithms to explore news articles published in English and Portuguese. It presents a methodology that starts by evaluating and combining four SA algorithms (SenticNet, SentiStrength, Vader and BERT, being BERT trained in two datasets) to improve the quality of outputs. A thorough review of the algorithms’ limitations is conducted using SHAP, an explainable AI tool, resulting in a list of issues that researchers must consider before using SA to interpret texts. We propose a combination of the three best classifiers (Vader, Amazon BERT and Sent140 BERT) to identify contradictory results, improving the quality of the positive, neutral and negative labels assigned to the texts. Challenges with translation are addressed, indicating possible solutions for non-English corpora. As a case study, the method is applied to the study of the media coverage of the London 2012 and Rio 2016 Olympic legacies. The combination of different classifiers has proved to be efficient, revealing the unbalance between the media coverage of London 2012, much more positive, and Rio 2016, more negative.