In an era defined by rapid advancements in artificial intelligence, a groundbreaking study published in Communications Psychology reveals that large language models (LLMs) can predict human cognition and educational outcomes with an accuracy rivaling, and sometimes surpassing, traditional genomic analyses and even expert assessments. This paradigm-shifting research brings to the forefront the potential for AI to revolutionize how we understand intellectual capabilities and educational trajectories, fundamentally altering the landscape of cognitive science and educational psychology.
The premise rests on the extraordinary progress of LLMs, sophisticated AI systems trained on vast amounts of textual data from the web, books, and academic literature. These models, initially designed for natural language processing tasks like translation and summarization, have evolved into remarkably nuanced predictors of complex human traits. Wolfram’s study methodically benchmarks the predictive power of LLMs against genomic data and expert human evaluations, uncovering insights that could redefine assessment metrics in psychology and education.
Genomics, which has long been heralded as a critical avenue to understanding individual differences in cognition, relies on identifying specific gene variants linked to intelligence and learning ability. While powerful, genomic predictors often require extensive datasets, are prone to ethical controversies, and frequently struggle to capture the environmental and sociocultural components influencing cognitive development. Wolfram’s research posits that LLMs, grounded in linguistic and contextual world knowledge, offer a complementary—and in some cases superior—approach.
The methodology deployed in the study involves applying state-of-the-art LLMs to naturally occurring textual outputs associated with individuals, such as essays, social media posts, and academic writing. By analyzing syntactic complexity, semantic richness, and thematic coherence, the models generate cognitive profiles without explicit phenotype data. These AI-derived predictions are then directly compared to polygenic scores derived from genome-wide association studies (GWAS) and to expert assessments conducted by seasoned psychologists and educators.
Notably, the results demonstrate that LLMs achieve predictive accuracy on par with genomic methods, a finding that challenges the long-held assumption that genetic markers are the gold standard for identifying cognitive aptitude. The AI’s ability to contextualize language within broader narratives and cultural frameworks allows it to capture subtle cognitive and educational signals that genetic data may overlook. Moreover, when combined with expert assessments, LLM-generated predictions enhance overall accuracy, indicating a complementary relationship rather than a competitive one.
The implications of this research extend beyond academic curiosity into practical applications. In education, for instance, AI-powered assessments could provide real-time, scalable, and non-invasive evaluations of student learning styles, comprehension, and potential cognitive challenges, facilitating personalized learning experiences at an unprecedented scale. This prospect could democratize access to educational resources, particularly in under-resourced settings where expert evaluators are scarce.
Furthermore, the study addresses concerns related to privacy and data security by emphasizing that LLM predictions can be made from publicly available or consented textual data without the need for genetic sampling, which is costlier and more intrusive. This advantage positions large language models as ethically favorable tools, provided that transparency and consent are rigorously maintained in data collection practices.
Critically, Wolfram also explores the limitations inherent in relying solely on AI models. While LLMs demonstrate remarkable capacity, they are sensitive to biases encoded in training data, including cultural, socioeconomic, and linguistic biases. These factors could skew predictive outcomes if not carefully mitigated through refined model training and validation techniques. The study calls for an interdisciplinary approach where AI specialists collaborate closely with cognitive scientists and ethicists to ensure equitable and responsible deployment.
In the realm of cognitive science, the ability to quantify mental constructs such as working memory, fluid intelligence, and verbal reasoning through language-based AI tools opens new avenues for research. These dimensions, traditionally challenging to measure with precision, become accessible to LLMs that analyze discourse patterns and conceptual complexity. This reframing could accelerate hypothesis testing and theory development, transforming the way intelligence is operationalized and measured.
Moreover, the predictive use of large language models may influence neuropsychological assessments, psychiatric evaluations, and even workplace talent identification. Early indications suggest that nuanced verbal outputs captured by LLMs correlate with cognitive function and educational attainment, offering auxiliary data points that can supplement clinical and administrative decision-making processes. The integration of these models could streamline assessments and offer continuous monitoring capabilities unattainable with conventional methods.
Wolfram’s study further engages with the ethical dimensions of employing AI in predictive psychology. The paper underscores the necessity of safeguarding individuals from potential misuse of predictive data, highlighting risks such as stigmatization, discrimination, and privacy breaches. It advocates for stringent regulatory frameworks and continuous monitoring to balance innovation with respect for human rights.
Looking ahead, the research hints at the prospect of synergistic models that integrate genomic, linguistic, and expert inputs, leveraging the strengths of each modality. Such hybrid approaches promise more comprehensive and nuanced forecasts of cognitive ability and educational outcomes, establishing a new frontier in predictive accuracy.
Importantly, this emerging AI-driven paradigm democratizes knowledge by enabling non-invasive, cost-effective, and scalable approaches to measure cognition and learning. It offers a potent tool to bridge disparities in educational achievement and cognitive science research infrastructure worldwide, potentially transforming policy development and individualized support services.
In summary, the study by Wolfram marks a watershed moment in cognitive and educational assessment, revealing that large language models offer a predictive capacity that challenges long-established methodologies. By harnessing the intrinsic link between language and cognition, these AI systems stand poised to revolutionize our understanding of the human mind, with profound implications for education, psychology, and beyond.
As large language models continue to evolve, their integration into scientific inquiry and practical applications must be guided by ethical considerations, interdisciplinary collaboration, and rigorous validation. The promise of AI as a complementary or even superior predictor of cognition beckons a future where technology and human expertise converge to unlock unprecedented insights into the fabric of intelligence and learning.
Subject of Research: Cognitive and educational outcome prediction using large language models compared to genomics and expert assessment.
Article Title: Large language models predict cognition and education close to or better than genomics or expert assessment.
Article References:
Wolfram, T. Large language models predict cognition and education close to or better than genomics or expert assessment. Commun Psychol 3, 95 (2025). https://doi.org/10.1038/s44271-025-00274-x