Rice University, Apr 13 2026

Protein engineering is a field primed for artificial intelligence research. Each protein is made up of amino acids; to optimize a protein function, researchers modify proteins by switching out one of 20 different amino acids for another. For a protein that is just 50 amino acids in length, this leads to approximately 1.13 × 10^65 potential combinations to test - that's 113 followed by 63 zeros, a number with more than five times as many zeros as a trillion has. This number of potential combinations, impossible to test in the lab, makes protein engineering an ideal challenge for AI. Modeling which of these combinations will give the best results is a perfect problem for the technology's massive computing power. But AI is only as good as the data used to train it, and in some areas of protein engineering, the right data just didn't exist.
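The size of that search space can be checked with a few lines of arithmetic. The following is a minimal Python sketch, assuming each of the 50 positions can independently hold any of the 20 amino acids; the variable names are illustrative and do not come from the study.

```python
# Count every possible sequence for a protein of a given length,
# assuming each position can independently hold any of the 20 amino acids.
AMINO_ACIDS = 20
LENGTH = 50

combinations = AMINO_ACIDS ** LENGTH      # exact integer arithmetic

print(f"{combinations:.2e}")              # 1.13e+65
print(len(str(combinations)))             # 66 (total digits in the number)
```

Testing even a billion variants per second would not make a dent in a number with 66 digits, which is why a predictive model is needed to narrow the search.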
'One of the biggest bottlenecks in AI-guided protein engineering is not coming up with machine-learning models. It is generating the right and enough experimental data to train them. For engineering protein activity, which optimizes what a protein does, we had a very clear problem: There simply were not enough datasets to train accurate models.'
Han Xiao, Rice University professor of chemistry, biosciences and bioengineering and director of the SynthX Center
To be able to generate AI models that could accurately predict how to optimize a protein's function, or activity, Xiao's team had to first generate enough activity data about any given protein to train an AI model. In a recent Nature Biotechnology publication, Xiao's team and collaborators from Johns Hopkins University and Microsoft did just that, sharing an approach that provided the needed data and created accurate models in just three days.
This approach, called Sequence Display, can generate more than 10 million data points in a single experiment. These data points are then fed into protein language AI models, which use them to predict which changes to a protein's amino acids will create the desired change for the protein's activity or function.
'We were able to develop an activity-based barcoding system that records the activity of individual protein variants and generates the kind of dataset needed to train a machine learning model,' said Linqi Cheng, a Rice graduate student and first author on the study. 'Then the model was able to predict mutations that significantly improved the activity of the protein we were studying.'
The team chose a small CRISPR-Cas protein for proof of concept. This protein was valued for its small size, but it could target and cut only a limited range of DNA stretches. The researchers wanted to identify a version that could cut a wider variety of DNA targets.
First, they mutated the DNA that codes for the Cas9 protein, creating many variations. A blank DNA barcode was attached to each variant, along with a special editor that would change the barcode in response to the protein's activity level. As the protein's activity levels increased, so did the editor's. This meant that the most active protein variations had the biggest changes in their barcodes. The DNA barcodes were then read by next-generation sequencing, which would essentially scan the barcode and classify each sequence by level of activity.
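The barcode readout described above can be illustrated with a toy example. This is only a sketch under stated assumptions, not the team's actual pipeline: the blank barcode sequence, the read format (variant ID plus barcode), the scoring rule (fraction of changed positions) and the thresholds are all hypothetical, and the reads are invented data.

```python
from collections import defaultdict

BLANK_BARCODE = "CCCCCCCC"  # hypothetical unedited barcode


def edit_fraction(barcode: str, blank: str = BLANK_BARCODE) -> float:
    """Fraction of barcode positions that differ from the blank barcode."""
    if len(barcode) != len(blank):
        raise ValueError("barcode length does not match the blank barcode")
    changed = sum(1 for base, ref in zip(barcode, blank) if base != ref)
    return changed / len(blank)


def score_variants(reads):
    """Average the edit fraction over all reads that share a variant ID."""
    per_variant = defaultdict(list)
    for variant_id, barcode in reads:
        per_variant[variant_id].append(edit_fraction(barcode))
    return {v: sum(fracs) / len(fracs) for v, fracs in per_variant.items()}


def classify(score: float) -> str:
    """Bin a score into coarse activity levels (thresholds are arbitrary)."""
    if score >= 0.5:
        return "high"
    if score >= 0.2:
        return "medium"
    return "low"


# Invented sequencing reads: (variant ID, barcode as read by the sequencer)
reads = [
    ("variant_A", "TTTTTCCC"),
    ("variant_A", "TTTCCCCC"),
    ("variant_B", "CCCCCCCC"),
    ("variant_B", "TCCCCCCC"),
]

scores = score_variants(reads)
labels = {variant: classify(score) for variant, score in scores.items()}
print(labels)
```

In this toy data, the variant whose barcodes carry more changes ends up in a higher activity bin, mirroring the idea that more active variants leave bigger marks on their barcodes. Pairs of sequence and activity label like these are the kind of data a model could then be trained on.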
'The AI is not replacing the experiment here. It instead depends on the experiment,' Cheng said. 'Sequence Display gives us the data foundation, and the models help us search a much larger data space for strong candidates.'
The team successfully repeated this process with other proteins, including aminoacyl-tRNA synthetases, cytosine deaminase and uracil glycosylase inhibitor. In each case, the barcoding experiment generated enough data points to train AI models. 'What this approach provides is a practical framework for integrating AI with protein engineering,' said Xiao, who is also a Cancer Prevention and Research Institute Scholar. 'Rather than relying on machine learning as a stand-alone solution, we couple it with an experimental platform that generates high-quality training data. This synergy enables more efficient discovery of advanced research tools and next-generation therapeutic proteins.'
This work was supported by a SynthX Seed Award, the National Institutes of Health, the Robert A. Welch Foundation, the U.S. Department of Defense, a 2024 Rice Synthetic Biology Institute Seed Grant and a Medical Research Award from the Robert J. Kleberg, Jr. and Helen C. Kleberg Foundation.

Source: Rice University

Journal reference: Cheng, L., et al. Sequence Display enables large-scale sequence–activity datasets for rapid protein evolution. Nature Biotechnology. DOI: 10.1038/s41587-026-03087-3. https://www.nature.com/articles/s41587-026-03087-3
