Beyond AlphaFold: AI Excels At Creating New Proteins


Proteins designed with an ultra-rapid software tool called ProteinMPNN were much more likely to fold up as intended. Credit: Ian Haydon, UW Medicine Institute for Protein Design

Over the past two years, machine learning has revolutionized protein structure prediction. Now, three papers in Science describe a similar revolution in protein design.

In the new papers, biologists at the University of Washington School of Medicine show that machine learning can be used to create protein molecules much more accurately and quickly than was previously possible. The scientists hope this advance will lead to many new vaccines, treatments, tools for carbon capture, and sustainable biomaterials.

“Proteins are fundamental across biology, but we know that all the proteins found in every plant, animal, and microbe make up far less than one percent of what is possible. With these new software tools, researchers should be able to find solutions to long-standing challenges in medicine, energy, and technology,” said senior author David Baker, professor of biochemistry at the University of Washington School of Medicine and recipient of a 2021 Breakthrough Prize in Life Sciences.

Proteins are often described as the “building blocks of life” because they are essential for the structure and function of all living things. They are involved in virtually every process inside cells, including growth, division, and repair. Proteins are made of long chains of chemicals called amino acids. The sequence of amino acids in a protein determines its three-dimensional shape, and this intricate shape is crucial for the protein to function.

Recently, powerful machine learning algorithms, including AlphaFold and RoseTTAFold, have been trained to predict the detailed shapes of natural proteins based solely on their amino acid sequences. Machine learning is a type of artificial intelligence that allows computers to learn from data without being explicitly programmed. It can be used to model complex scientific problems that are too difficult for humans to work out unaided.

To go beyond the proteins found in nature, Baker’s team broke down the challenge of protein design into three parts and used new software solutions for each.

Artificial intelligence hallucinated these symmetric protein assemblies, much as other generative A.I. tools produce output based on simple prompts. Credit: Ian Haydon, UW Medicine Institute for Protein Design

First, a new protein shape must be generated. In a paper published July 21 in the journal Science, the team showed that artificial intelligence can produce new protein shapes in two ways. The first, called “hallucination,” is akin to DALL-E or other generative A.I. tools that produce output based on simple prompts. The second, called “inpainting,” is analogous to the autocomplete feature found in modern search bars.

Second, to speed up the next step, the team devised a new algorithm for generating amino acid sequences. Described in the Sept. 15 issue of Science, this software tool, called ProteinMPNN, runs in about one second, more than 200 times faster than the previous best software. Its results are superior to those of earlier tools, and it requires no expert customization to run.

“Neural networks are easy to train if you have a lot of data, but with proteins, we do not have as many examples as we would like. We had to go in and identify which features in these molecules are the most important. It was a bit of trial and error,” said project scientist Justas Dauparas, a postdoctoral fellow at the Institute for Protein Design.

Third, the team used AlphaFold, a tool developed by Alphabet’s DeepMind, to independently assess whether the amino acid sequences they came up with were likely to fold into the intended shapes.

“Software for predicting protein structures is part of the solution, but it cannot come up with anything new on its own,” explained Dauparas.

“ProteinMPNN is to protein design what AlphaFold was to protein structure prediction,” added Baker.
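To make the three-step workflow easier to picture, here is a minimal Python sketch of the generate, design, and check loop described above. It is not the Baker lab’s code and does no real protein modeling: the functions generate_backbone, design_sequence, and predict_confidence are hypothetical toy stand-ins for hallucination or inpainting, ProteinMPNN, and AlphaFold, and a “backbone” is reduced to a single burial score per position. What the sketch does show is how the stages feed one another: many candidates are generated, each gets a designed sequence, and only those that an independent check scores above a threshold are kept.

```python
import random
from dataclasses import dataclass

# Toy residue classes (a subset of the 20 amino acids): hydrophobic residues
# tend to be buried inside a folded protein, polar ones tend to be exposed.
HYDROPHOBIC = "AILMFVW"
POLAR = "DEKNQRST"


@dataclass
class Design:
    backbone: list[float]  # toy "burial" score per position: 0 = exposed, 1 = buried
    sequence: str
    confidence: float


def generate_backbone(length: int, rng: random.Random) -> list[float]:
    """Hypothetical stand-in for step 1 (hallucination or inpainting)."""
    return [rng.random() for _ in range(length)]


def design_sequence(backbone: list[float], rng: random.Random, noise: float = 0.2) -> str:
    """Hypothetical stand-in for step 2 (ProteinMPNN): choose one residue per
    position, hydrophobic if buried and polar if exposed, but flip the choice
    with probability `noise` to mimic an imperfect designer."""
    residues = []
    for burial in backbone:
        buried = burial > 0.5
        if rng.random() < noise:
            buried = not buried
        residues.append(rng.choice(HYDROPHOBIC if buried else POLAR))
    return "".join(residues)


def predict_confidence(backbone: list[float], sequence: str) -> float:
    """Hypothetical stand-in for step 3 (the AlphaFold check): the fraction of
    positions whose residue class matches the intended burial pattern."""
    matches = sum(
        (residue in HYDROPHOBIC) == (burial > 0.5)
        for burial, residue in zip(backbone, sequence)
    )
    return matches / len(sequence)


def run_pipeline(n_designs: int, length: int, threshold: float,
                 seed: int = 0, noise: float = 0.2) -> list[Design]:
    """Generate, design, then keep only candidates that pass the independent check."""
    rng = random.Random(seed)
    kept = []
    for _ in range(n_designs):
        backbone = generate_backbone(length, rng)
        sequence = design_sequence(backbone, rng, noise)
        confidence = predict_confidence(backbone, sequence)
        if confidence >= threshold:
            kept.append(Design(backbone, sequence, confidence))
    return kept


if __name__ == "__main__":
    designs = run_pipeline(n_designs=100, length=50, threshold=0.85)
    print(f"{len(designs)} of 100 toy designs passed the check")
```

In this toy setting, lowering the noise parameter (a rough analogue of a more accurate sequence designer) raises the share of candidates that pass the final check, which mirrors, very loosely, why a better sequence-design step matters to the whole pipeline.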

Detail of a protein designed using a rapid tool called ProteinMPNN, another advance in the use of artificial intelligence and machine learning in protein design. Credit: Ian Haydon, UW Medicine Institute for Protein Design

In another paper appearing in Science on Sept. 15, a team from the Baker lab confirmed that the combination of new machine learning tools could reliably generate new proteins that worked in the laboratory.

“We found that proteins made using ProteinMPNN were much more likely to fold up as intended, and we could create very complex protein assemblies using these methods,” said project scientist Basile Wicky, a postdoctoral fellow at the Institute for Protein Design.

Among the new proteins made were nanoscale rings that the scientists believe could become parts for custom nanomachines. Electron microscopes were used to observe the rings, which are roughly a billion times smaller than a poppy seed.

“This is only the beginning of machine learning in protein design. In the coming months, we will be working to improve these tools to create even more dynamic and functional proteins,” said Baker.

Computer resources for this work were donated by Microsoft and Amazon Web Services.


More information:

B. I. M. Wicky et al., Hallucinating symmetric protein assemblies, Science (2022). DOI: 10.1126/science.add1964. www.science.org/doi/10.1126/science.add1964

Read the original article on PHYS.
