Comparing Chatbots to
Psychometric Tests in Hiring
Reduced social desirability bias, but lower predictive validity — a peer-reviewed study of AI chatbots for personality inference in real-world recruitment. Ingram built and deployed the chatbot at the centre of it.
Read the paper
About the Research
AI is being adopted in recruitment far faster than it is being validated — a practitioner-academia gap. This study set out to close part of it with a direct question: can an AI chatbot infer personality for hiring as reliably as an established psychometric test?
Can AI-driven chatbots match validated psychometric instruments for personality inference in professional hiring — and what do they trade off when they try?
The chatbot at the centre of the study was built on Fabrile, Ingram's no-code agent platform, in collaboration with the Montenegrin HR-tech startup Recrewty and the paper's authors. Fabrile was chosen because it could be customised to run the assessment in Montenegrin and Serbian — a language pairing unavailable on the platforms used in earlier work. The chatbot ran on OpenAI's GPT-4 API, with a novel one-question-per-facet design: an open-ended question for each facet of the Big Five, scored individually for a more granular read on its psychometric properties.
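The one-question-per-facet idea can be sketched in a few lines. This is an illustrative stand-in, not the study's actual instrument: the facet names follow the common NEO taxonomy, and the scorer is a toy placeholder where the study used GPT-4 to rate each open-ended answer.

```python
# Hypothetical sketch of the one-question-per-facet design: each Big Five
# facet gets one open-ended question, each answer is scored individually,
# and per-facet scores aggregate into trait scores.

BIG_FIVE_FACETS = {
    "Extraversion": ["Warmth", "Gregariousness", "Assertiveness",
                     "Activity", "Excitement-Seeking", "Positive Emotions"],
    "Conscientiousness": ["Competence", "Order", "Dutifulness",
                          "Achievement-Striving", "Self-Discipline",
                          "Deliberation"],
    # remaining traits omitted for brevity
}

def score_facet(answer: str) -> float:
    """Stand-in for the LLM scorer. In the study, GPT-4 rated each
    open-ended answer; here a toy heuristic returns a 1-5 score."""
    return min(5.0, 1.0 + len(answer.split()) / 20)

def trait_scores(answers: dict[str, float]) -> dict[str, float]:
    """Average the per-facet scores into one score per trait."""
    out = {}
    for trait, facets in BIG_FIVE_FACETS.items():
        scores = [score_facet(answers[f]) for f in facets if f in answers]
        if scores:
            out[trait] = sum(scores) / len(scores)
    return out
```

Scoring each facet separately, rather than inferring a trait from one long conversation, is what gives the researchers a granular read on which parts of the instrument hold up psychometrically.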
The study used a quasi-experimental design with propensity score matching. 159 professionals across Serbia and Montenegro — a control group not in a hiring process, and a candidate group going through real bank selections — completed both a traditional 50-item Big Five questionnaire and the chatbot assessment. It is, to the authors' knowledge, the first study of its kind in the Western Balkans, and one of few to test AI psychometrics outside WEIRD (Western, Educated, Industrialised, Rich, Democratic) populations.
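The matching step behind a design like this can be illustrated with greedy 1:1 nearest-neighbour matching on propensity scores. The scores below are invented numbers for illustration; in practice they would come from a model of group membership fitted on observed covariates, and the paper's exact matching procedure may differ.

```python
# Illustrative sketch of the matching step in propensity score matching:
# each candidate (treatment unit) is paired with the control participant
# whose estimated propensity score is closest, without replacement.

def nearest_neighbour_match(treated: dict[str, float],
                            control: dict[str, float]) -> dict[str, str]:
    """Greedy 1:1 matching on propensity score, without replacement."""
    available = dict(control)
    pairs = {}
    # Process treated units in score order for deterministic output.
    for t_id, t_score in sorted(treated.items(), key=lambda kv: kv[1]):
        if not available:
            break
        c_id = min(available, key=lambda c: abs(available[c] - t_score))
        pairs[t_id] = c_id
        del available[c_id]
    return pairs

# Invented scores: two candidates, three controls.
pairs = nearest_neighbour_match(
    treated={"cand_1": 0.3, "cand_2": 0.7},
    control={"ctrl_1": 0.25, "ctrl_2": 0.65, "ctrl_3": 0.9},
)
```

Matching candidates to comparable controls is what lets a quasi-experimental design approximate a randomised comparison when, as in real bank selections, random assignment is impossible.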
The result is a clear-eyed finding rather than a marketing claim. The chatbot showed good structural, substantive, and convergent validity for Extraversion and Conscientiousness — but not for Neuroticism, Agreeableness, or Openness. Crucially, AI-inferred scores were robustly less susceptible to social desirability bias than traditional self-report tests. But they did not significantly predict real-world outcomes such as job role or education level — traditional tests still won on predictive validity. The work was accepted to Frontiers in Psychology (Vol. 16, 2025), co-authored by Danilo Djukanovic and Dario Krpan of the London School of Economics' Department of Psychological and Behavioural Science.
The Study at a Glance
The question
Can an AI chatbot infer personality traits for hiring as reliably as an established psychometric test — and can it resist the social desirability bias that distorts self-report measures in high-stakes selection?
The method
A quasi-experimental design with propensity score matching. 159 professionals across Serbia and Montenegro completed both a traditional Big Five questionnaire and an AI chatbot assessment, using a novel one-question-per-facet approach.
The result
The chatbot showed good structural, substantive, and convergent validity for Extraversion and Conscientiousness — and AI-inferred scores were robustly less susceptible to social desirability bias than traditional tests.
The honest caveat
The chatbot was weaker on Neuroticism, Agreeableness, and Openness, and AI-inferred scores did not significantly predict real-world outcomes like job role or education — traditional tests still won on predictive validity.
Ingram's Role
Ingram built and customised the specialised chatbot on the Fabrile platform, developing it alongside the paper's authors and providing it to the research team free of charge. Research only produces meaningful conclusions if the instrument is sound, so the chatbot had to be consistent and measurable enough to be evaluated as a psychometric instrument, not demoed as a product.
That it held up — good psychometric properties on several traits and clear resistance to social desirability bias, with no language-specific training phase — is the research-to-deployment thesis in practice: the same team that engages with original research builds and ships the systems it informs. For Recrewty, that meant a hiring chatbot whose claims are backed by peer review, not a pitch deck.