Publications
You Are What You Talk About: Inducing Evaluative Topics for Personality Analysis
Josip Jukić, Iva Vukojević, Jan Šnajder
Findings of the Association for Computational Linguistics: EMNLP 2022, pages 3986–3999, Abu Dhabi, United Arab Emirates. Association for Computational Linguistics.
Expressing attitude or stance toward entities and concepts is an integral part of human behavior and personality. Recently, evaluative language data has become more accessible with social media’s rapid growth, enabling large-scale opinion analysis. However, surprisingly little research examines the relationship between personality and evaluative language. To bridge this gap, we introduce the notion of evaluative topics, obtained by applying topic models to pre-filtered evaluative text from social media. We then link evaluative topics to individual text authors to build their evaluative profiles. We apply evaluative profiling to Reddit comments labeled with personality scores and conduct an exploratory study on the relationship between evaluative topics and Big Five personality facets, aiming for a more interpretable, facet-level analysis. Finally, we validate our approach by observing correlations consistent with prior research in personality psychology.

Personality adjectives in the digital world: A natural language processing study of Big Five adjectives and their usage on Reddit
Iva Vukojević, Irina Masnikosa, Matej Gjurković, Nina Drobac, Ana Butković, Martina Lozić, Denis Bratko, Jan Šnajder
Journal of Research in Personality
Psycholexical studies explore the intricate interplay between language and personality traits, focusing on trait representation in language. One aspect of such representation is the frequency of personality adjective usage. This study examines how linguistic and trait-label properties of personality adjectives relate to their usage frequency. Utilizing a corpus from the social media platform Reddit, we employ natural language processing to analyze Big Five adjectives in person-descriptions. Our results show that trait-label properties exhibit different patterns when considered together with, rather than separately from, linguistic properties; for instance, prefixal composition nullifies the expected effect of polarity on frequency. These findings highlight the importance of considering both linguistic and trait-label properties when assessing the usage of personality adjectives.

SIMPA: statement-to-item matching personality assessment from text
Matej Gjurković, Iva Vukojević, Jan Šnajder
Future Generation Computer Systems
Automated text-based personality assessment (ATBPA) methods can analyze large amounts of text data and identify nuanced linguistic personality cues. However, current approaches lack the interpretability, explainability, and validity offered by standard questionnaire instruments. To address these weaknesses, we propose an approach that combines questionnaire-based and text-based approaches to personality assessment. Our Statement-to-Item Matching Personality Assessment (SIMPA) framework uses natural language processing methods to detect self-referencing descriptions of personality in a target’s text and utilizes these descriptions for personality assessment. The core of the framework is the notion of a trait-constrained semantic similarity between the target’s freely expressed statements and questionnaire items. The conceptual basis is provided by the realistic accuracy model (RAM), which describes the process of accurate personality judgments and which we extend with a feedback loop mechanism to improve the accuracy of judgments. We present a simple proof-of-concept implementation of SIMPA for ATBPA on the social media site Reddit. We show how the framework can be used directly for unsupervised estimation of a target’s Big Five scores and indirectly to produce features for a supervised ATBPA model, demonstrating state-of-the-art results for the personality prediction task on Reddit.
