Smooth Sailing: Improving Active Learning for Pre-trained Language Models with Representation Smoothness Analysis

Josip Jukić, Jan Šnajder

In Proceedings of the 2023 CLASP Conference on Learning with Small Data (LSD), pages 11–24, Gothenburg, Sweden. Association for Computational Linguistics.

Developed to alleviate prohibitive labeling costs, active learning (AL) methods aim to reduce label complexity in supervised learning. While recent work has demonstrated the benefit of using AL in combination with large pre-trained language models (PLMs), it has often overlooked the practical challenges that hinder the effectiveness of AL. We address these challenges by leveraging representation smoothness analysis to ensure AL is feasible, that is, both effective and practicable. Firstly, we propose an early stopping technique that does not require a validation set – often unavailable in realistic AL conditions – and observe significant improvements over random sampling across multiple datasets and AL methods. Further, we find that task adaptation improves AL, whereas standard short fine-tuning in AL does not provide improvements over random sampling. Our work demonstrates the usefulness of representation smoothness analysis for AL and introduces an AL stopping criterion that reduces label complexity.

Introduction

In this paper, we address the issue of high labeling costs in machine learning by improving active learning (AL) methods for pre-trained language models (PLMs). Active learning can reduce labeling effort by selecting the most informative data points, but it faces practical challenges such as the absence of validation sets and unstable fine-tuning. We propose a solution by leveraging representation smoothness analysis to make AL more effective and practical, especially in situations where validation sets are unavailable.

Key Contributions

  • Early Stopping Without a Validation Set: We introduce Besov Early Stopping (BEAST), a technique that uses representation smoothness to stop training without requiring a validation set. This method improves model performance and generalization during AL (an illustrative sketch follows this list).
  • Task Adaptation for AL: We show that task adaptation, specifically through Task-Adaptive Pre-Training (TAPT), enhances AL for PLMs, unlike short fine-tuning which does not provide significant improvements.
  • Representation Smoothness Analysis: By analyzing the smoothness of PLM layers, we demonstrate that smoother representations, particularly in earlier layers, lead to better model generalization and AL performance.
  • AL Stopping Criterion: We propose ALSBI (Active Learning Stopping by Besov Index), a new stopping criterion based on the smoothness of actively acquired samples. ALSBI helps detect when the model has reached the point of diminishing returns in label acquisition.
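
The paper grounds BEAST in Besov-space smoothness analysis of layer representations; the exact estimator is described in the paper itself. The sketch below only illustrates the general idea of validation-free early stopping driven by a layer-wise smoothness signal: the proxy `layer_smoothness` and the helpers `train_one_epoch`, `get_layer_reps`, and `probe_batch` are illustrative assumptions, not the authors' implementation, and ALSBI applies a related smoothness comparison to the actively acquired samples rather than to training epochs.

```python
import numpy as np
import torch


def layer_smoothness(reps: torch.Tensor) -> float:
    # Hypothetical smoothness proxy. The paper estimates Besov smoothness of
    # PLM layer representations; here we substitute the inverse of the mean
    # pairwise distance between examples, purely for illustration.
    flat = reps.reshape(reps.shape[0], -1)
    return 1.0 / (torch.cdist(flat, flat).mean().item() + 1e-8)


def beast_like_early_stopping(model, train_one_epoch, get_layer_reps,
                              probe_batch, max_epochs=15, patience=2):
    # Validation-free early stopping in the spirit of BEAST: stop when the
    # smoothness of earlier-layer representations stops improving.
    # `train_one_epoch`, `get_layer_reps`, and `probe_batch` are assumed
    # helpers, not part of the paper's code.
    best_score, epochs_without_gain = -float("inf"), 0
    for _ in range(max_epochs):
        train_one_epoch(model)
        with torch.no_grad():
            per_layer = get_layer_reps(model, probe_batch)  # one tensor per layer
        # Earlier layers are reported to track generalization best, so the
        # proxy is averaged over the first half of the layers.
        early = per_layer[: max(1, len(per_layer) // 2)]
        score = float(np.mean([layer_smoothness(r) for r in early]))
        if score > best_score:
            best_score, epochs_without_gain = score, 0
        else:
            epochs_without_gain += 1
            if epochs_without_gain >= patience:
                break
    return model
```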

Methodology

We conducted experiments on five NLP datasets to evaluate several AL methods, including random selection, maximum entropy, and a new representation gradient-based method (RG). The experiments covered multiple training regimes and examined how early stopping with BEAST affects the effectiveness of AL. By analyzing the smoothness of PLM representations of actively acquired samples, we developed a new criterion (ALSBI) for stopping the AL process once label complexity has been sufficiently reduced.
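
To make the acquisition step concrete, here is a minimal sketch of a pool-based AL loop with maximum-entropy acquisition, one of the strategies compared above. The helpers `predict_logits` and `fine_tune` are placeholders introduced for this example, and the paper additionally pairs each round with task adaptation and BEAST-style early stopping, which are not shown here.

```python
import numpy as np
import torch
import torch.nn.functional as F


def entropy_scores(logits: torch.Tensor) -> torch.Tensor:
    # Predictive entropy per unlabeled example (higher = more uncertain).
    probs = F.softmax(logits, dim=-1)
    return -(probs * torch.log(probs + 1e-12)).sum(dim=-1)


def active_learning_loop(model, labeled_idx, pool_idx, predict_logits,
                         fine_tune, query_size=50, rounds=10):
    # Pool-based AL with maximum-entropy acquisition.
    # `fine_tune(model, labeled_idx)` retrains the PLM on the current labels;
    # `predict_logits(model, indices)` scores the unlabeled pool.
    labeled_idx, pool_idx = list(labeled_idx), list(pool_idx)
    for _ in range(rounds):
        model = fine_tune(model, labeled_idx)
        with torch.no_grad():
            logits = predict_logits(model, pool_idx)
        scores = entropy_scores(logits).cpu().numpy()
        top = set(np.argsort(-scores)[:query_size].tolist())
        queried = [idx for j, idx in enumerate(pool_idx) if j in top]
        labeled_idx.extend(queried)              # send to the oracle for labels
        pool_idx = [idx for j, idx in enumerate(pool_idx) if j not in top]
    return model, labeled_idx
```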

Results

Our results show that task adaptation combined with Besov Early Stopping consistently improves AL performance across all datasets and methods. AL with PLMs proves to be both effective and feasible when combined with the right training regimes. Additionally, representation smoothness analysis allows for more stable and efficient active learning, reducing label complexity while maintaining model performance.

Conclusion

This paper demonstrates how representation smoothness analysis can make active learning more practical and efficient for PLMs, particularly in low-resource settings where validation sets are unavailable. The proposed methods—BEAST and ALSBI—offer new ways to reduce labeling costs without sacrificing model performance, and they hold potential for further exploration in other NLP tasks and training regimes.

Future Work

Future research could expand on our findings by applying these techniques to different types of NLP tasks and models. Additionally, exploring alternative training regimes that improve the synergy between active learning and pre-trained language models could yield further advancements in label-efficient learning.