Research Interests
One tablespoon of uncertainty for large language models, with a pinch of computational social science.
Publications
Characterizing Linguistic Shifts in Croatian News via Diachronic Word Embeddings
David Dukić, Ana Barić, Marko Čuljak, Josip Jukić, and Martin Tutek
In Proceedings of the 10th Workshop on Slavic Natural Language Processing (Slavic NLP 2025), pages 108–115, Vienna, Austria. Association for Computational Linguistics.
Measuring how the semantics of words change over time improves our understanding of how cultures and perspectives change. Diachronic word embeddings help us quantify this shift, although previous studies leveraged substantial temporally annotated corpora. In this work, we use a corpus of 9.5 million Croatian news articles spanning the past 25 years and quantify semantic change using skip-gram word embeddings trained on five-year periods. Our analysis finds that word embeddings capture linguistic shifts of terms pertaining to major topics in this timespan (COVID-19, Croatia joining the European Union, technological advancements). We also find evidence that embeddings from the post-2020 period encode increased positivity in sentiment analysis tasks, in contrast to studies reporting a decline in mental health over the same period.

Target Two Birds With One SToNe: Entity-Level Sentiment and Tone Analysis in Croatian News Headlines
Ana Barić, Laura Majer, David Dukić, Marijana Grbeša-Zenzerović, and Jan Šnajder
In Proceedings of the 9th Workshop on Slavic Natural Language Processing 2023 (SlavicNLP 2023), pages 78–85, Dubrovnik, Croatia. Association for Computational Linguistics.
Sentiment analysis is often used to examine how different actors are portrayed in the media, and analysis of news headlines is of particular interest due to their attention-grabbing role. We address the task of entity-level sentiment analysis from Croatian news headlines. We frame the task as targeted sentiment analysis (TSA), explicitly differentiating between sentiment toward a named entity and the overall tone of the headline. We describe SToNe, a new dataset for this task with sentiment and tone labels. We implement several neural benchmark models, utilizing single- and multi-task training, and show that TSA can benefit from tone information. Finally, we gauge the difficulty of this task by leveraging dataset cartography.

