
About Us
TakeLab is an academic research group at the Faculty of Electrical Engineering and Computing, University of Zagreb, Croatia, focused on advancing artificial intelligence, machine learning, and natural language processing (NLP). Our work centers on large language models (LLMs), with a commitment to refining methods for language comprehension and analyzing complex, unstructured data. Our main research directions are:
- Advancing LLM research, with a focus on enhancing their generalization, robustness, and interpretability.
- Creating representation learning techniques to improve semantic and contextual understanding in computational systems.
- Exploring computational social science, using data-driven methods to study social interactions and societal trends.
Our research focuses on multiple aspects of representation learning, seeking a deeper understanding of the internal workings of LLMs. We also engage in interdisciplinary work within computational social science, utilizing NLP tools to analyze large datasets that reveal insights into human behavior, communication patterns, and evolving societal trends.

Latest Research
Explore our recent research studies. Select a publication to read more about it.
Disentangling Latent Shifts of In-Context Learning with Weak Supervision
NeurIPS 2025
In-context learning (ICL) enables large language models to perform few-shot learning by conditioning on labeled examples in the prompt. Despite its flexibility, ICL suffers from instability, especially as prompt length increases with more demonstrations. To address this, we treat ICL as a source of weak supervision and propose a parameter-efficient method that disentangles demonstration-induced latent shifts from those of the query. An ICL-based teacher generates pseudo-labels on unlabeled queries, while a student predicts them using only the query input, updating a lightweight adapter. This captures demonstration effects in a compact, reusable form, enabling efficient inference while remaining composable with new demonstrations. Although trained on noisy teacher outputs, the student often outperforms its teacher through pseudo-label correction and coverage expansion, consistent with the weak-to-strong generalization effect. Empirically, our method improves generalization, stability, and efficiency across both in-domain and out-of-domain tasks, surpassing standard ICL and prior disentanglement methods.
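The sketch below illustrates the general teacher-student setup described above, not the paper's implementation: a frozen "teacher" produces pseudo-labels for unlabeled queries, and a lightweight adapter is trained to reproduce them from the query alone. All module names, shapes, and hyperparameters are illustrative assumptions.

```python
# Minimal, self-contained sketch of weak supervision from an ICL-style teacher.
# Stand-in linear layers play the role of a frozen LLM; only the adapter is trained.
import torch
import torch.nn as nn

torch.manual_seed(0)
D_MODEL, N_CLASSES, N_UNLABELED = 64, 3, 256

frozen_encoder = nn.Linear(16, D_MODEL).requires_grad_(False)       # query -> hidden state
teacher_head = nn.Linear(D_MODEL, N_CLASSES).requires_grad_(False)  # "ICL" teacher (frozen)

class Adapter(nn.Module):
    """Lightweight bottleneck adapter that captures the demonstration-induced shift."""
    def __init__(self, d_model, n_classes, r=8):
        super().__init__()
        self.down = nn.Linear(d_model, r)
        self.up = nn.Linear(r, d_model)
        self.head = nn.Linear(d_model, n_classes)

    def forward(self, h):
        return self.head(h + self.up(torch.relu(self.down(h))))

student = Adapter(D_MODEL, N_CLASSES)
optim = torch.optim.AdamW(student.parameters(), lr=1e-3)

queries = torch.randn(N_UNLABELED, 16)  # unlabeled queries (toy features)
with torch.no_grad():
    pseudo_labels = teacher_head(frozen_encoder(queries)).argmax(dim=-1)  # noisy pseudo-labels

for epoch in range(20):  # distill the teacher's pseudo-labels into the adapter
    logits = student(frozen_encoder(queries))
    loss = nn.functional.cross_entropy(logits, pseudo_labels)
    optim.zero_grad(); loss.backward(); optim.step()

print(f"final distillation loss: {loss.item():.3f}")
```

The adapter is small and separate from the backbone, which is what makes the captured demonstration effect cheap to store and to compose with new demonstrations at inference time.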
Supervised In-Context Fine-Tuning for Generative Sequence Labeling
arXiv preprint
Sequence labeling (SL) tasks, where labels are assigned to tokens, are abundant in NLP (e.g., named entity recognition and aspect-based sentiment analysis). Owing to the intuition that they require bidirectional context, SL tasks are commonly tackled with encoder-only models. Recent work also shows that removing the causal mask in fine-tuning enables decoder-based LLMs to become effective token classifiers. Less work, however, has focused on (supervised) generative SL, a more natural setting for causal LLMs. Due to their rapid scaling, causal LLMs applied to SL are expected to outperform encoders, whose own development has stagnated. In this work, we propose supervised in-context fine-tuning (SIFT) for generative SL. SIFT casts SL tasks as constrained response generation, natural to LLMs, combining in-context learning (ICL) from demonstrations with supervised fine-tuning. SIFT considerably outperforms both ICL and decoder-as-encoder fine-tuning baselines on a range of standard SL tasks. We further find that although long context hinders the performance of generative SL in both ICL and SIFT, this deficiency can be mitigated by removing the instruction, as instructions are shown to be largely unnecessary for achieving strong SL performance with SIFT. Our findings highlight strengths and limitations of SL with LLMs, underscoring the importance of a response-based generative task formulation for effective SL performance.
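As a rough illustration (not the exact format from the paper), a sequence labeling instance can be serialized for generative fine-tuning with demonstrations in the prompt and the tag sequence as the target response; the template below is an assumption.

```python
# Toy serialization of a sequence labeling example as (prompt, target) for a causal LM.
def format_sl_example(demonstrations, query_tokens, query_labels=None):
    """Build an (input, target) pair: demos plus query in the prompt, tags as the response."""
    lines = []
    for tokens, labels in demonstrations:
        lines.append("Input: " + " ".join(tokens))
        lines.append("Labels: " + " ".join(labels))
    lines.append("Input: " + " ".join(query_tokens))
    lines.append("Labels:")
    prompt = "\n".join(lines)
    target = " " + " ".join(query_labels) if query_labels is not None else None
    return prompt, target  # the pair feeds standard supervised fine-tuning

demos = [(["Zagreb", "is", "in", "Croatia"], ["B-LOC", "O", "O", "B-LOC"])]
prompt, target = format_sl_example(demos, ["TakeLab", "is", "in", "Zagreb"],
                                   ["B-ORG", "O", "O", "B-LOC"])
print(prompt)
print(target)
```

Generation is "constrained" in the sense that the response is expected to be a valid tag sequence of the same length as the query, which decoding can enforce.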
Characterizing Linguistic Shifts in Croatian News via Diachronic Word Embeddings
In Proceedings of the 10th Workshop on Slavic Natural Language Processing (Slavic NLP 2025), pages 108–115, Vienna, Austria. Association for Computational Linguistics.
Measuring how the semantics of words change over time improves our understanding of how cultures and perspectives change. Diachronic word embeddings help us quantify this shift, although previous studies leveraged substantial temporally annotated corpora. In this work, we use a corpus of 9.5 million Croatian news articles spanning the past 25 years and quantify semantic change using skip-gram word embeddings trained on five-year periods. Our analysis finds that word embeddings capture linguistic shifts of terms pertaining to major topics in this timespan (COVID-19, Croatia joining the European Union, technological advancements). We also find evidence that embeddings from the post-2020 period encode increased positivity in sentiment analysis tasks, contrasting with studies reporting a decline in mental health over the same period.
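A toy sketch of the general recipe: train skip-gram embeddings on separate time slices, align the embedding spaces, and score a word's shift by cosine distance. The use of orthogonal Procrustes alignment and all corpora and parameters here are illustrative assumptions, not necessarily the paper's choices.

```python
# Toy diachronic shift measurement with two tiny corpora standing in for time slices.
import numpy as np
from gensim.models import Word2Vec
from scipy.linalg import orthogonal_procrustes

corpus_a = [["virus", "spreads", "in", "europe"], ["union", "talks", "continue"]] * 50
corpus_b = [["virus", "vaccine", "rollout", "begins"], ["union", "membership", "grows"]] * 50

m_a = Word2Vec(corpus_a, vector_size=50, sg=1, min_count=1, seed=1, epochs=20)
m_b = Word2Vec(corpus_b, vector_size=50, sg=1, min_count=1, seed=1, epochs=20)

shared = sorted(set(m_a.wv.index_to_key) & set(m_b.wv.index_to_key))
A = np.stack([m_a.wv[w] for w in shared])
B = np.stack([m_b.wv[w] for w in shared])
R, _ = orthogonal_procrustes(A, B)  # rotate period A's space into period B's space

def shift(word):
    """Cosine distance between a word's aligned embeddings in the two periods."""
    va, vb = m_a.wv[word] @ R, m_b.wv[word]
    return 1 - va @ vb / (np.linalg.norm(va) * np.linalg.norm(vb))

for w in ["virus", "union"]:
    print(w, round(float(shift(w)), 3))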

Measuring Chain of Thought Faithfulness by Unlearning Reasoning Steps
EMNLP 2025
When prompted to think step-by-step, language models (LMs) produce a chain of thought (CoT), a sequence of reasoning steps that the model supposedly used to produce its prediction. Despite much work on CoT prompting, it is unclear if reasoning verbalized in a CoT is faithful to the models' parametric beliefs. We introduce a framework for measuring parametric faithfulness of generated reasoning, and propose Faithfulness by Unlearning Reasoning steps (FUR), an instance of this framework. FUR erases information contained in reasoning steps from model parameters, and measures faithfulness as the resulting effect on the model's prediction. Our experiments with four LMs and five multi-hop multi-choice question answering (MCQA) datasets show that FUR is frequently able to precisely change the underlying models' prediction for a given instance by unlearning key steps, indicating when a CoT is parametrically faithful. Further analysis shows that CoTs generated by models post-unlearning support different answers, hinting at a deeper effect of unlearning.
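The following sketch conveys the unlearn-then-remeasure idea in miniature, not FUR itself: a few gradient-ascent steps reduce the likelihood of a verbalized reasoning step, after which the model's preferred answer is re-scored. Model choice, prompts, and hyperparameters are assumptions for illustration.

```python
# Rough sketch: suppress a reasoning step's likelihood, then check if the answer flips.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

name = "gpt2"  # any small causal LM, used only for illustration
tok = AutoTokenizer.from_pretrained(name)
model = AutoModelForCausalLM.from_pretrained(name)

question = "Q: The capital of Croatia is a) Zagreb b) Split. A:"
reasoning_step = "Zagreb is the capital of Croatia."
options = [" a", " b"]

def option_scores(m):
    """Log-probability score of each answer option given the question."""
    scores = []
    for opt in options:
        ids = tok(question + opt, return_tensors="pt").input_ids
        with torch.no_grad():
            logits = m(ids).logits[:, :-1]
        logp = torch.log_softmax(logits, -1).gather(-1, ids[:, 1:, None]).squeeze(-1)
        scores.append(logp[:, -1].item())  # score of the final (option) token
    return scores

before = option_scores(model)

# "Unlearn" the reasoning step: a few gradient-ascent steps on its negative log-likelihood.
optim = torch.optim.SGD(model.parameters(), lr=5e-4)
ids = tok(reasoning_step, return_tensors="pt").input_ids
for _ in range(3):
    loss = -model(ids, labels=ids).loss  # negate the NLL to ascend it
    optim.zero_grad(); loss.backward(); optim.step()

after = option_scores(model)
print("before:", before, "after:", after)  # a flipped preference suggests the step was used
```

A prediction that changes only when the key step is erased is the kind of signal the framework treats as evidence of parametric faithfulness.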

TakeLab Retriever: AI-Driven Search Engine for Articles from Croatian News Outlets
arXiv preprint
TakeLab Retriever is an AI-driven search engine designed to discover, collect, and semantically analyze news articles from Croatian news outlets. It offers a unique perspective on the history and current landscape of Croatian online news media, making it an essential tool for researchers seeking to uncover trends, patterns, and correlations that general-purpose search engines cannot provide. TakeLab Retriever utilizes cutting-edge natural language processing (NLP) methods, enabling users to sift through articles by named entities, phrases, and topics via the web application. This technical report is divided into two parts: the first explains how TakeLab Retriever is utilized, while the second provides a detailed account of its design. In the second part, we also address the software engineering challenges involved and propose solutions for developing a microservice-based semantic search engine capable of handling over ten million news articles published over the past two decades.
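As a concept-level illustration only (TakeLab Retriever's actual architecture and APIs are not shown here), a minimal inverted index over named entities conveys how articles can be filtered by entity rather than by keyword; all data and function names below are hypothetical.

```python
# Toy inverted index mapping named entities to the articles that mention them.
from collections import defaultdict

articles = {
    1: {"title": "Talks in Brussels", "entities": ["European Union", "Croatia"]},
    2: {"title": "Vaccine rollout", "entities": ["COVID-19", "Croatia"]},
    3: {"title": "Tech expo opens", "entities": ["Zagreb"]},
}

index = defaultdict(set)
for art_id, art in articles.items():
    for ent in art["entities"]:
        index[ent].add(art_id)

def search_by_entities(required):
    """Return ids of articles mentioning all required entities."""
    sets = [index[e] for e in required]
    return set.intersection(*sets) if sets else set()

print(search_by_entities(["Croatia"]))                    # {1, 2}
print(search_by_entities(["Croatia", "European Union"]))  # {1}
```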
Projects
Explore our projects.
Retriever
TakeLab Retriever is a platform that collects articles and their metadata from Croatian news outlets and performs text mining on them in real time.
Alanno
Alanno is an annotation platform powered by active learning and designed to support a wide range of machine learning and deep learning models.
PsyTxt
With this project, we aim to lay the groundwork for a truly interdisciplinary perspective on computational personality research by developing datasets and models for personality prediction and analysis based on online textual interactions.
Teaching
We take great pride and care in teaching the things we're good at and that inspire us. We design our courses mainly around key topics in artificial intelligence, machine learning, NLP, and IR that we deem relevant for our students' career success and professional development. Here's a list of courses we currently offer at the Faculty of Electrical Engineering and Computing, University of Zagreb.
Intro to AI
An introductory course covering fundamental concepts and techniques in artificial intelligence.
Machine Learning 1
A foundational course in machine learning, focused on key algorithms and exploring their underlying mechanisms.
Text Analysis and Retrieval
Examines modern approaches to text analysis and retrieval, grounded in fundamental principles.
Selected Topics in Natural Language Processing
Advanced topics in natural language processing, covering current research and applications.
News
Stay up to date with the latest news and updates from TakeLab.
Team
Get to know the people behind the work.

Ana Barić
Caporegime
I love deadlines. I love the whooshing sound they make as they go by.

David Dukić
Caporegime
Diamonds are made under pressure 💎

Iva Vukojević
Caporegime

Jan Šnajder
Don

Josip Jukić
Caporegime
It’s no coincidence a 90° angle is called the right one.

Laura Majer
Caporegime
She doesn't even go here!

Martin Tutek
Sottocapo
I love sleep. My life has the tendency to fall apart when I'm awake.

Matej Gjurković
Consigliere