Smooth Sailing: Improving Active Learning for Pre-trained Language Models with Representation Smoothness Analysis
In Proceedings of the 2023 CLASP Conference on Learning with Small Data (LSD), pages 11–24, Gothenburg, Sweden. Association for Computational Linguistics.

Introduction
In this paper, we address the high cost of labeling data by improving active learning (AL) methods for pre-trained language models (PLMs). AL reduces labeling effort by selecting the most informative data points, but in practice it is hampered by the absence of validation sets and by unstable fine-tuning. We leverage representation smoothness analysis to make AL for PLMs more effective and practical, especially when no validation set is available.
Key Contributions
- Early Stopping Without a Validation Set: We introduce Besov Early Stopping (BEAST), a technique that uses representation smoothness to stop training without requiring a validation set, improving model performance and generalization during AL (a minimal sketch follows this list).
- Task Adaptation for AL: We show that task adaptation, specifically Task-Adaptive Pre-Training (TAPT), enhances AL for PLMs, whereas short fine-tuning alone yields no significant improvement (a standard TAPT recipe is also sketched after this list).
- Representation Smoothness Analysis: By analyzing the smoothness of PLM layers, we demonstrate that smoother representations, particularly in earlier layers, lead to better model generalization and AL performance.
- AL Stopping Criterion: We propose ALSBI (Active Learning Stopping by Besov Index), a new stopping criterion based on the smoothness of actively acquired samples. ALSBI helps detect when the model has reached the point of diminishing returns in label acquisition.
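To make the early-stopping idea concrete, here is a minimal Python sketch of BEAST-style training without a validation set. The k-nearest-neighbour label-agreement score is a crude stand-in for the paper's actual Besov smoothness estimate (which is not reproduced here), and `train_one_epoch` and `encode` are hypothetical helpers for the fine-tuning pipeline.

```python
import copy

import numpy as np


def smoothness_score(reps: np.ndarray, labels: np.ndarray, k: int = 10) -> float:
    """Crude smoothness proxy: mean label agreement among each point's k
    nearest neighbours in representation space (higher = smoother).
    NOTE: a stand-in for the Besov smoothness estimate used by BEAST."""
    dists = np.linalg.norm(reps[:, None, :] - reps[None, :, :], axis=-1)
    np.fill_diagonal(dists, np.inf)            # exclude self-matches
    knn = np.argsort(dists, axis=1)[:, :k]     # indices of k nearest neighbours
    return float((labels[knn] == labels[:, None]).mean())


def train_with_beast_style_stopping(model, train_loader, encode, labels,
                                    max_epochs=15, patience=2):
    """Fine-tune until the smoothness of an early layer's representations
    stops improving -- no validation set needed. `train_one_epoch` and
    `encode` are hypothetical helpers."""
    best_score, best_state, stale = -np.inf, None, 0
    for _ in range(max_epochs):
        train_one_epoch(model, train_loader)   # hypothetical: one pass over the labeled data
        reps = encode(model, train_loader)     # hypothetical: (n, d) early-layer representations
        score = smoothness_score(reps, labels)
        if score > best_score:
            best_score, best_state, stale = score, copy.deepcopy(model.state_dict()), 0
        else:
            stale += 1
            if stale >= patience:              # smoothness plateaued: stop early
                break
    if best_state is not None:
        model.load_state_dict(best_state)      # roll back to the smoothest checkpoint
    return model
```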
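The TAPT step itself is standard continued masked-language-model pretraining on the task's unlabeled text. A minimal recipe with HuggingFace transformers might look as follows; `unlabeled_task_dataset` is an assumed pre-tokenized dataset, and the model choice and hyperparameters are illustrative rather than the paper's.

```python
from transformers import (AutoModelForMaskedLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForMaskedLM.from_pretrained("bert-base-uncased")

# Mask 15% of tokens: the usual MLM objective.
collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm_probability=0.15)

args = TrainingArguments(output_dir="tapt-checkpoint",
                         num_train_epochs=3,
                         per_device_train_batch_size=16)

# `unlabeled_task_dataset`: an assumed pre-tokenized dataset of the task's raw text.
trainer = Trainer(model=model, args=args, data_collator=collator,
                  train_dataset=unlabeled_task_dataset)
trainer.train()
model.save_pretrained("tapt-checkpoint")   # starting point for AL fine-tuning
```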
Methodology
We conducted experiments on five NLP datasets to evaluate several AL methods, including random selection, maximum entropy, and a new representation-gradient method (RG). The experiments covered multiple training regimes and examined how early stopping techniques such as BEAST affect the effectiveness of AL. By analyzing the smoothness of PLM representations, we derived a new criterion for stopping the AL process once label complexity has been sufficiently reduced, as sketched below.
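To show how acquisition, training, and stopping fit together, here is a hedged sketch of the AL loop with maximum-entropy acquisition and an ALSBI-style stop. The paper bases its criterion on the Besov smoothness of the actively acquired samples; this sketch approximates that with the proxy `smoothness_score` from the earlier sketch plus a simple plateau test. `train_fn`, `predict_proba`, and `acquire_labels` are hypothetical callbacks standing in for the experimental pipeline.

```python
import numpy as np


def entropy_acquisition(probs: np.ndarray, budget: int) -> np.ndarray:
    """Indices of the `budget` pool examples with the highest predictive entropy."""
    ent = -(probs * np.log(probs + 1e-12)).sum(axis=1)
    return np.argsort(-ent)[:budget]


def active_learning_loop(train_fn, predict_proba, acquire_labels,
                         seed_idx, pool_idx, budget=50, max_steps=20, eps=0.01):
    """AL loop with an ALSBI-style stop: halt once the smoothness proxy of
    newly acquired batches plateaus, i.e. new labels stop adding information."""
    labeled_idx = np.asarray(seed_idx, dtype=int)  # start from a small labeled seed set
    history, model = [], None
    for _ in range(max_steps):
        model = train_fn(labeled_idx)              # hypothetical: TAPT-adapted PLM + BEAST-style stopping
        probs = predict_proba(model, pool_idx)     # hypothetical: (n_pool, n_classes) predictions
        picked = pool_idx[entropy_acquisition(probs, budget)]
        reps, labels = acquire_labels(picked)      # hypothetical: label the batch, return its representations
        pool_idx = np.setdiff1d(pool_idx, picked)
        labeled_idx = np.concatenate([labeled_idx, picked])
        history.append(smoothness_score(reps, labels))   # proxy from the earlier sketch
        # Plateau test: two consecutive batches change the score by less than eps.
        if len(history) >= 3 and all(abs(history[-i] - history[-i - 1]) < eps for i in (1, 2)):
            break
    return model, labeled_idx
```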
Results
Our results show that task adaptation combined with Besov Early Stopping consistently improves AL performance across all datasets and methods. AL with PLMs is both effective and feasible when paired with suitable training regimes. Additionally, representation smoothness analysis enables more stable and efficient active learning, reducing label complexity while maintaining model performance.
Conclusion
This paper demonstrates how representation smoothness analysis can make active learning more practical and efficient for PLMs, particularly in low-resource settings where validation sets are unavailable. The proposed methods, BEAST and ALSBI, offer new ways to reduce labeling costs without sacrificing model performance, and they hold potential for further exploration in other NLP tasks and training regimes.
Future Work
Future research could expand on our findings by applying these techniques to different types of NLP tasks and models. Additionally, exploring alternative training regimes that improve the synergy between active learning and pre-trained language models could yield further advancements in label-efficient learning.