TY - JOUR
T1 - Jagged competencies
T2 - Measuring the reliability of generative AI in academic research
AU - Thomas, Llewellyn D. W.
AU - Romasanta, Angelo Kenneth G.
AU - Priego, Laia Pujol
PY - 2026/1
Y1 - 2026/1
N2 - Large Language Models (LLMs) are increasingly viewed as a valuable tool for academic research. While LLMs have some benefits, a 'crisis of replicability' in management scholarship militates against unrestrained use. In this paper we investigate the reproducibility of LLM analyses. We analyze three LLMs (ChatGPT, Llama and Mistral) over fifteen weeks, testing their consistency, their accuracy, and the interaction between the two, using the same prompts on the same data corpus. While our results demonstrate significant variations in reliability and consistency across the three LLMs, we also show that LLMs can exhibit deterministic and reliable behavior under specific, well-defined constraints. We argue that replicable LLM-based research will rely on understanding and validating the task-specific operational boundaries of the LLM. To ensure the responsible integration of LLMs into management research, we highlight a need for robust frameworks, transparency, ethical guidelines, and ongoing evaluation. We conclude with actionable guidance for management researchers.
AB - Large Language Models (LLMs) are increasingly viewed as a valuable tool for academic research. While LLMs have some benefits, a 'crisis of replicability' in management scholarship militates against unrestrained use. In this paper we investigate the reproducibility of LLM analyses. We analyze three LLMs (ChatGPT, Llama and Mistral) over fifteen weeks, testing their consistency, their accuracy, and the interaction between the two, using the same prompts on the same data corpus. While our results demonstrate significant variations in reliability and consistency across the three LLMs, we also show that LLMs can exhibit deterministic and reliable behavior under specific, well-defined constraints. We argue that replicable LLM-based research will rely on understanding and validating the task-specific operational boundaries of the LLM. To ensure the responsible integration of LLMs into management research, we highlight a need for robust frameworks, transparency, ethical guidelines, and ongoing evaluation. We conclude with actionable guidance for management researchers.
KW - Accuracy
KW - Consistency
KW - Generative AI
KW - LLM
KW - Replication
KW - Reproducibility
UR - https://www.webofscience.com/api/gateway?GWVersion=2&SrcApp=pure_univeritat_ramon_llull&SrcAuth=WosAPI&KeyUT=WOS:001608282400002&DestLinkType=FullRecord&DestApp=WOS_CPL
U2 - 10.1016/j.jbusres.2025.115804
DO - 10.1016/j.jbusres.2025.115804
M3 - Article
SN - 0148-2963
VL - 203
JO - Journal of Business Research
JF - Journal of Business Research
M1 - 115804
ER -