
In an MIT classroom, a professor delivers a lecture as students take careful notes to review and absorb essential material in preparation for an exam.
Humans naturally learn and retain new information, but large language models (LLMs) lack this ability. Once deployed, a trained LLM has a fixed “brain” that can’t permanently incorporate new knowledge.
As a result, if a user shares important information with an LLM today, it won’t recall it in future conversations.
MIT Develops Method for LLMs to Self-Update Like Students
MIT researchers have now introduced a new method that allows LLMs to self-update and permanently absorb new information. Much like a student, the model creates its own study notes from user input and uses them to adjust its internal parameters. This work is detailed in a paper published on the arXiv preprint server.
The model produces several self-edits based on a single input and tests each to determine which yields the greatest performance boost. Through this trial-and-error process, it learns how to optimize its own training.
The researchers discovered that this method enhanced LLM accuracy in both question-answering and pattern-recognition tasks, even allowing a smaller model to surpass the performance of much larger ones.
Although challenges remain, this technique could eventually enable AI systems to continuously adapt to new tasks and dynamic objectives in ever-changing environments.
“Like humans, advanced AI systems can’t stay static throughout their lifetimes. LLMs operate in dynamic settings where they constantly encounter new user inputs. Our goal is to build a model that’s more human-like—one that can continuously improve itself,” says Jyothish Pari, an MIT graduate student and co-lead author of the paper describing the technique.
Pari co-authored the work with Adam Zweiger, an MIT undergraduate and fellow co-lead author; graduate students Han Guo and Ekin Akyürek; and senior authors Yoon Kim, an assistant professor in MIT’s Department of Electrical Engineering and Computer Science (EECS) and member of the Computer Science and Artificial Intelligence Laboratory (CSAIL), and Pulkit Agrawal, also an EECS assistant professor and CSAIL member.
The research will be presented at the Conference on Neural Information Processing Systems.
Training the Model to Acquire Knowledge
LLMs are neural network models with billions of parameters, known as weights, which store the model’s knowledge and help it generate predictions from inputs. During training, these weights are adjusted to incorporate information from the training data.
Once deployed, however, the weights become fixed and can no longer be permanently modified.
LLMs excel at in-context learning, where they learn a new task by observing a few examples. While these examples influence the model’s responses in the moment, the learned knowledge does not persist beyond the current interaction.
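The distinction above can be made concrete: in-context learning lives entirely inside the prompt string, so nothing about the model persists afterward. A minimal sketch (the examples and formatting here are illustrative assumptions, not from the paper):

```python
# In-context learning supplies worked examples inside the prompt itself;
# the model's weights never change, so the "learning" vanishes with the prompt.
few_shot_examples = [
    ("cold", "hot"),
    ("up", "down"),
]

def build_prompt(query: str) -> str:
    """Assemble a few-shot prompt: the task knowledge lives only in this string."""
    lines = [f"{a} -> {b}" for a, b in few_shot_examples]
    lines.append(f"{query} ->")
    return "\n".join(lines)

prompt = build_prompt("left")
print(prompt)
```

Once this conversation ends, the prompt (and with it the few-shot "knowledge") is gone, which is exactly the limitation SEAL targets.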
MIT researchers aimed to harness a model’s strong in-context learning abilities to train it to permanently adjust its weights when it acquires new information.
They developed a framework called SEAL, short for “self-adapting LLMs,” which allows an LLM to create synthetic data from an input and then figure out the most effective way to update itself using that data. Each piece of synthetic data serves as a candidate self-edit the model can apply to itself.

LLMs Learn by Creating and Testing Synthetic Study Sheets
For language tasks, the LLM generates synthetic data by rephrasing the information and its implications from an input passage, much like students create study sheets by summarizing and rewriting lecture notes.
The model produces multiple versions and then tests each one to determine which self-edit yields the largest improvement on a downstream task, such as question answering. This trial-and-error process uses reinforcement learning, rewarding the model for the edits that boost performance the most.
Finally, the LLM internalizes the information from the most effective study sheet by updating its weights.
“Our goal is for the model to craft the most effective study sheet—one that provides the right level of detail and a balanced set of information—so that applying it to update the model enhances its overall performance,” Zweiger explains.
Selecting the Optimal Approach
The framework also lets the model decide how it wants to learn. It can choose which synthetic data to use, set its learning rate, and determine how many training iterations to perform.
In this way, the model not only generates its own training data but also manages how that data is applied to update its weights.
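One way to picture this is that a self-edit bundles not just the synthetic data but also the training configuration for applying it. The field names below are hypothetical, chosen only to illustrate the kinds of knobs the paper says the model controls:

```python
from dataclasses import dataclass

@dataclass
class SelfEdit:
    synthetic_data: list[str]  # rephrased facts the model wrote for itself
    learning_rate: float       # step size the model chose for this update
    epochs: int                # how many passes to make over the synthetic data

edit = SelfEdit(
    synthetic_data=[
        "The capital of France is Paris.",
        "Paris is France's capital city.",
    ],
    learning_rate=1e-4,
    epochs=2,
)
# A training loop would then fine-tune the weights using exactly this config;
# here we just count the toy number of gradient steps it implies.
total_steps = edit.epochs * len(edit.synthetic_data)
print(total_steps)
```

Letting the model emit the configuration alongside the data is what gives it control over *how* the update is applied, not just *what* is learned.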
“As humans, we understand the methods that help us learn best. We aim to give LLMs a similar ability. By letting the model control how it processes information, it can determine the most effective way to handle the incoming data,” Pari explains.
SEAL outperformed several baseline approaches across a variety of tasks, including learning new skills from a few examples and integrating knowledge from a text passage. On question-answering tasks, SEAL increased model accuracy by nearly 15%, and for certain skill-learning tasks, it raised the success rate by over 50%.
A key limitation of this method is catastrophic forgetting: as the model continually adapts to new information, its performance on previously learned tasks gradually declines.
The researchers aim to address catastrophic forgetting in future work and explore applying this technique in multi-agent scenarios, where multiple LLMs train one another.
“One major obstacle for LLMs to perform meaningful scientific research is their current inability to update themselves when exposed to new information. While fully deployed self-adapting models are still a long way off, we hope that systems capable of learning in this manner could eventually address this limitation and contribute to scientific progress,” Zweiger says.
Read the original article on Tech Xplore.
