AI May Learn Languages More Effectively by Forgetting

Giving AI a human-like limitation on memory may actually improve its ability to learn language. In a new proof-of-concept study, Abishek Thamma of the University of Amsterdam and Micha Heilbron of the Max Planck Institute for Psycholinguistics found that small language models with a short-lived memory learned grammar more effectively when trained on language input comparable to what children receive. Their results suggest that insights from psycholinguistics can help shape more effective approaches to AI language learning. The researchers published the study in Transactions of the Association for Computational Linguistics.

The study draws on a long-standing theory in cognitive science that memory limitations may actually facilitate language learning. As people process speech and text, the precise details of words and sentences fade rapidly from memory. Rather than hindering learning, this forgetting may help individuals detect recurring patterns and develop a more abstract understanding of grammar.

To test whether the same principle applies to artificial intelligence, the researchers built a human-like memory constraint into modern neural language models. Today’s AI systems, unlike humans, typically retain access to far more detailed linguistic information. The findings nonetheless indicate that introducing a transient memory can make learning more efficient and improve grammatical generalization when training data are scarce.

Memory Fading

Thamma and Heilbron added a simple memory decay mechanism to Transformer language models, producing what they call fleeting memory transformers. Heilbron explained that the models were trained on the BabyLM benchmark, a dataset intended to mirror the amount of linguistic exposure humans receive during development. This setup allowed a controlled comparison between standard models and those with built-in memory constraints under realistic data conditions.

The findings consistently suggest that transient memory improves language learning. Across multiple training runs and initializations, models with memory decay outperformed standard Transformers in language modeling and also showed stronger performance on evaluations targeting syntactic knowledge.

Heilbron added that these advantages only appeared when memory decay was combined with a short “echoic memory” buffer that retained the most recent three to seven words. He noted that, together, these components seem to aid learning by balancing immediate access to nearby context with a progressive fading of earlier word forms.
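The interplay between the echoic buffer and the fading of earlier words can be illustrated with a small sketch. This is not the authors' implementation: the article does not give the decay formula, so the linear penalty, the function names, and the parameters `decay_rate` and `echoic_window` below are all assumptions chosen for illustration. The idea shown is an additive bias on attention scores that leaves the most recent words untouched and makes older words progressively harder to attend to.

```python
import math


def fleeting_bias(seq_len, decay_rate=0.5, echoic_window=5):
    """Build a hypothetical additive bias for causal attention scores.

    Assumed form (not taken from the study): zero penalty inside the
    echoic window, a linearly growing penalty beyond it, and -inf for
    future positions so a word never attends to words that follow it.
    """
    bias = []
    for i in range(seq_len):              # position of the word being processed
        row = []
        for j in range(seq_len):          # position of the word being recalled
            distance = i - j
            if distance < 0:
                row.append(-math.inf)     # future word: masked out
            elif distance < echoic_window:
                row.append(0.0)           # recent word: fully available
            else:
                # Older word: penalty grows with distance past the buffer.
                row.append(-decay_rate * (distance - echoic_window + 1))
        bias.append(row)
    return bias


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def attention_weights(scores, bias):
    """Add the bias to raw attention scores and normalize each row."""
    return [
        softmax([s + b for s, b in zip(score_row, bias_row)])
        for score_row, bias_row in zip(scores, bias)
    ]


# Demo with uniform raw scores, so any difference comes from the bias alone.
scores = [[0.0] * 8 for _ in range(8)]
weights = attention_weights(scores, fleeting_bias(8))
print([round(w, 3) for w in weights[-1]])
```

With uniform scores, the last word in this toy example attends equally to the five most recent positions and gives steadily less weight to each older one, which is the balance between nearby context and fading history that the paragraph above describes.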

Transient Memory

The results support a long-standing idea in cognitive science, first advanced in influential connectionist work by Elman (1993), that limitations in memory may actually aid language learning rather than simply hinder it. They also indicate that the strong performance of modern Transformer architectures does not necessarily mean that unlimited memory is ideal for acquiring language.

At the same time, the study revealed an unexpected split, according to Thamma. While fleeting memory improved language-learning outcomes, it made the models’ surprisal estimates worse predictors of human reading times. This runs counter to the usual finding that gains in language-model performance tend to align with better predictions of human language processing behavior.
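For readers unfamiliar with the term, surprisal is the negative log probability a language model assigns to a word given the words before it: the less expected the word, the higher its surprisal. The sketch below shows only that standard definition. The probabilities are made-up values for illustration, the function name is hypothetical, and the article does not describe how the study related surprisal to reading times.

```python
import math


def surprisal_bits(probability):
    """Return the surprisal, in bits, of a word with the given probability."""
    if not 0.0 < probability <= 1.0:
        raise ValueError("probability must be in the interval (0, 1]")
    return -math.log2(probability)


# Hypothetical model probabilities for three successive words in a sentence.
word_probabilities = [0.5, 0.25, 0.125]
surprisals = [surprisal_bits(p) for p in word_probabilities]
print(surprisals)
```

In reading-time research, per-word values like these are compared against how long people actually spend reading each word, on the expectation that less predictable words take longer to read.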

Further analyses showed that this mismatch could not be accounted for by established explanations of why more powerful language models sometimes fail to better predict human reading-time data. As a result, the authors suggest that the mechanisms that promote effective language learning may not be the same as those that enable accurate modeling of real-time language processing.

Overall, the study indicates that imposing memory constraints can improve language learning in contemporary neural networks, while also underscoring a key distinction between learning language well and capturing human behavioral patterns.

Main Results

Adding human-like memory decay to Transformer models enhances language learning performance.
Models with transient memory show better language-modeling results and improved syntactic generalization.
These gains rely on a short-term echoic buffer that retains roughly the last 3–7 words.
Although language learning improves, transient memory weakens the models’ ability to predict human reading times using surprisal-based measures.
Current explanations for the gap between language-model performance and behavioral predictions do not fully explain this finding.

This study reexamines a long-standing question in cognitive science using modern language models. The results indicate that memory limitations can still facilitate language learning in today’s neural networks, while also raising new questions about the relationship between linguistic knowledge and human language processing.

Read the original article on: Tech Xplore
