In an era defined by rapid advancements in artificial intelligence and natural language processing, researchers have introduced an innovative method to bridge the gap between abstract concepts and tangible understanding across multiple languages. The breakthrough centers on a novel multimodal transformer-based tool designed for the automatic generation of concreteness ratings—a fundamental linguistic and cognitive measure of how concrete or abstract a word or concept is perceived to be. This development, detailed in a recent publication in Communications Psychology, promises to reshape how machines comprehend the nuances of human language and how multilingual systems can achieve deeper semantic insight.
Concreteness ratings have traditionally played a vital role in psycholinguistics, cognitive science, and language education. Words like “apple” or “dog” are inherently concrete; they evoke vivid sensory experiences, objects one can see or touch. Conversely, terms such as “justice” or “freedom” sit at the abstract end of the spectrum, referencing ideas or concepts without immediate sensory correlates. Historically, compiling concreteness ratings has depended heavily on human judgments collected through extensive surveys and experiments—a large-scale, time-consuming endeavor usually limited to individual languages. The advent of this new transformer-based model revolutionizes this landscape by automating these ratings and transcending linguistic boundaries.
At the core of this breakthrough is a transformer architecture, a class of deep learning models that have powered some of the most impressive achievements in natural language understanding and generation. Unlike prior models that rely solely on textual data, this model operates in a multimodal space, integrating linguistic information with visual and contextual cues. This fusion allows the system to produce an informed concreteness rating by effectively “experiencing” the concept through data modalities beyond just text. The implications of this approach extend far beyond simple word classification—it equips AI with a richer and more human-like grasp of semantic content.
One of the most striking features of this tool is its capacity for multilinguality. Because languages encode meaning in rich and culturally distinct ways, directly transferring concreteness assessments from one language to another has historically presented a substantial challenge. This model circumvents the issue by leveraging aligned representations in the transformer’s latent space, learning patterns of concreteness that generalize across languages without depending on language-specific training data alone. Consequently, it can generate ratings for languages with minimal or no previously available concreteness databases, thereby democratizing access to semantic analysis tools worldwide.
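The cross-lingual idea can be illustrated with a toy sketch (not the authors’ code): if translation-equivalent words from two languages land near each other in a shared embedding space, a concreteness regressor fitted on one language’s human norms can score words from another language directly. All embeddings and ratings below are simulated for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
dim = 16
true_w = rng.normal(size=dim)            # latent "concreteness direction" (simulated)

# Simulated aligned embeddings: each Spanish word sits near its English counterpart.
en_emb = rng.normal(size=(200, dim))
es_emb = en_emb + 0.05 * rng.normal(size=(200, dim))

en_ratings = en_emb @ true_w             # stand-in for human norms (English only)

# Ridge regression fit exclusively on English data.
lam = 1e-2
w_hat = np.linalg.solve(en_emb.T @ en_emb + lam * np.eye(dim),
                        en_emb.T @ en_ratings)

# Zero-shot predictions for Spanish words via the shared space.
es_pred = es_emb @ w_hat
es_true = es_emb @ true_w
corr = np.corrcoef(es_pred, es_true)[0, 1]
print(f"cross-lingual correlation: {corr:.3f}")
```

Because the two embedding sets are aligned, the English-trained regressor transfers almost losslessly; the same logic, at far larger scale and with learned alignment, is what lets such a model rate words in languages lacking concreteness databases.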
Technical intricacies of the model reveal how it integrates multimodal embeddings generated from large-scale datasets combining images, texts, and metadata. The researchers utilized transformer layers that attend to varied forms of input, creating joint embeddings that synthesize and balance information. Training included contrastive learning objectives that align visual features with linguistic descriptors, facilitating a refined understanding of concreteness as a spectrum rather than a binary attribute. The model’s architecture allows it to adapt and recalibrate its weights dynamically, depending on language-specific semantic profiles, resulting in high-fidelity concreteness estimates.
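A minimal sketch of a contrastive objective of the kind described here, in the InfoNCE style common to vision-language models: matched image/text embedding pairs are pulled together while mismatched pairs are pushed apart. The function name, shapes, and data are illustrative assumptions, not the paper’s implementation.

```python
import numpy as np

def info_nce(img, txt, temperature=0.07):
    """Symmetric InfoNCE loss over a batch of paired image/text embeddings."""
    img = img / np.linalg.norm(img, axis=1, keepdims=True)
    txt = txt / np.linalg.norm(txt, axis=1, keepdims=True)
    logits = img @ txt.T / temperature          # pairwise cosine similarities
    labels = np.arange(len(img))                # i-th image matches i-th text

    def xent(l):
        # cross-entropy where the correct "class" is the diagonal entry
        l = l - l.max(axis=1, keepdims=True)
        logp = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -logp[labels, labels].mean()

    # average the image->text and text->image directions
    return 0.5 * (xent(logits) + xent(logits.T))

rng = np.random.default_rng(1)
txt = rng.normal(size=(8, 32))
aligned = txt + 0.01 * rng.normal(size=(8, 32))   # well-aligned image features
random_ = rng.normal(size=(8, 32))                # unrelated image features

loss_aligned = info_nce(aligned, txt)
loss_random = info_nce(random_, txt)
print(loss_aligned, loss_random)
```

Minimizing this loss drives visual and linguistic embeddings of the same concept toward each other, which is what lets a downstream head treat concreteness as a continuous, grounded quantity.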
Evaluation of the system involved rigorous benchmarking against existing human-annotated concreteness datasets in multiple languages, including English, Spanish, and Italian. Results demonstrated correlations with human judgments that are competitive with or exceed traditionally used psycholinguistic norms. Notably, the model exhibited the ability to capture subtle cultural and linguistic variations in concreteness perception. For example, certain words with disparate concreteness ratings in different linguistic communities were accurately contextualized, indicating the system’s refined sensitivity to semantic nuance shaped by culture and usage.
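Benchmarking of this kind typically reports a rank correlation between model-generated ratings and human-annotated norms. The sketch below shows that step with a hand-rolled Spearman correlation; the ratings themselves are invented for illustration and are not from the paper.

```python
import numpy as np

def spearman(a, b):
    """Spearman rank correlation (no tie handling; inputs here have no ties)."""
    ra = np.argsort(np.argsort(a)).astype(float)   # ranks of a
    rb = np.argsort(np.argsort(b)).astype(float)   # ranks of b
    return np.corrcoef(ra, rb)[0, 1]

# Hypothetical ratings on a 1-5 scale (e.g. "apple", "dog", "justice", ...).
human = np.array([4.9, 4.8, 1.4, 1.6, 3.2, 2.1])   # human-annotated norms
model = np.array([4.7, 4.9, 1.7, 1.5, 3.0, 2.4])   # model-generated ratings

rho = spearman(human, model)
print(f"Spearman rho: {rho:.2f}")
```

A high rho indicates the model orders words by concreteness much as human raters do, which is the sense in which the reported results are “competitive with or exceed” existing norms.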
The research team highlighted potential real-world applications for this innovation. In natural language understanding, automatic concreteness ratings can improve tasks such as sentiment analysis, metaphor detection, and text simplification. For educational technologies, this means enhanced tools for vocabulary teaching that are sensitive to learners’ conceptual stages. Additionally, the ability to generate concreteness ratings in under-resourced languages opens pathways for more inclusive and accessible AI models worldwide. Multimodal transformers, therefore, emerge not only as linguistic tools but as cultural mediators bridging semantic divides.
Underlying this breakthrough is a growing recognition within the AI community of the importance of multimodal data integration. Human cognition naturally combines sensory experiences with linguistic knowledge; computational models that mimic this process tend to produce more accurate and intuitive results. By extending this principle to the domain of concreteness rating, the researchers provide a compelling case study of how cross-domain signal fusion significantly advances machine understanding. This sets a precedent for future models to consider complexities of human language and cognition beyond purely textual realms.
Moreover, the tool’s architecture is designed with scalability in mind. It can incorporate new forms of data—including auditory or haptic signals—potentially enabling even more nuanced concreteness assessments in the future. This modularity ensures that as datasets grow and diversify, the model can evolve accordingly without complete retraining. This property is particularly valuable considering the fast pace of data generation and the multiplicity of languages and dialects worldwide, making the system adaptable and future-proof in the rapidly evolving field of computational linguistics.
The study also opens intriguing questions about the cognitive and neurological underpinnings of concreteness. By providing automated yet human-like concreteness ratings, the model offers researchers a new lens to examine how concepts are mentally represented and differ across individuals and cultures. It can serve as a hypothesis generator or validation tool for psycholinguistic experiments, helping to map which features or modalities contribute most to perceived concreteness. This bidirectional benefit—informing both AI development and cognitive science—illustrates the symbiotic relationship that modern interdisciplinary research can foster.
Critical reception within the scientific community has been overwhelmingly positive, with experts acknowledging the contribution as a milestone in both natural language processing and psycholinguistics. The introduction of a multimodal, multilingual approach to concreteness estimation addresses longstanding methodological limitations while simultaneously pushing AI closer to human-level semantic understanding. The potential for integration with other language technologies such as machine translation, question answering, and semantic search makes it a versatile and impactful tool.
Ethical and societal implications of this development should also be considered. Enhanced AI understanding of abstract and concrete concepts can improve communication aids, accessibility technologies, and user-centered design in digital interfaces. Conversely, the ability of machines to grasp subtle semantic distinctions raises questions about privacy, data use, and the risk of linguistic homogenization. Responsible deployment, transparent methodology, and linguistic inclusivity must be key considerations as this technology advances and becomes integrated into everyday AI systems.
Looking forward, the development team envisions expanding this approach into a broader framework for semantic evaluation encompassing other psycholinguistic variables such as emotional valence, imageability, and familiarity. By doing so, they aim to create a comprehensive, multimodal semantic profiling tool that can enrich numerous AI applications that interface with human language. The work calls for collaborative efforts across disciplines—including linguistics, cognitive science, and computer science—to continue refining models that reflect the complexity of human semantic processing.
This pioneering work underscores a fundamental shift in AI research: the move away from isolated, unidimensional data representations toward richer, context-aware, and culturally sensitive models. The tool presented not only propels current state-of-the-art forward but also invites reconsideration of how semantics are encoded and interpreted across media and languages. As AI’s role in society becomes increasingly consequential, innovations like this play a crucial role in ensuring that machines understand the world in ways that resonate with human experience.
Overall, the multimodal transformer-based tool for automatic generation of concreteness ratings exemplifies the power of integrating cutting-edge machine learning with insights from human cognition and linguistics. It stands as a landmark achievement with profound implications for the future of language technologies, enabling more nuanced, flexible, and universally applicable AI language systems. Its potential to democratize semantic understanding across languages and cultures represents a significant step toward AI that truly comprehends the rich texture of human communication.
Subject of Research: Automatic generation of concreteness ratings in language using multimodal transformer models.
Article Title: A multimodal transformer-based tool for automatic generation of concreteness ratings across languages.
Article References:
Kewenig, V., Skipper, J.I. & Vigliocco, G. A multimodal transformer-based tool for automatic generation of concreteness ratings across languages. Commun Psychol 3, 100 (2025). https://doi.org/10.1038/s44271-025-00280-z
Image Credits: AI Generated