LLMs in Citation Intent Classification: Progress, Precision, and Reproducibility Challenges

Alex Fogelson

Ana Trišović

Neil Thompson

October 21, 2025
Understanding the intent behind scientific citations is critical for advancing scholarly search and knowledge mapping. This paper reflects on the methodological use of large language models (LLMs) for multi-class citation intent classification. Our experiments, evaluating a diverse range of models and approaches, reveal striking disagreement among state-of-the-art (SotA) systems. This inconsistency suggests that citation intent classification remains a challenging task for LLMs, raising questions about the robustness, reliability, and replicability of current methods. Moreover, our findings highlight a concerning dependency on proprietary LLMs, which, even when compute resources were available, remained necessary to achieve sufficient accuracy. This introduces new challenges: silent updates, lack of versioning, and opaque training pipelines threaten methodological transparency and long-term reproducibility in LLM-enabled research.