Revolutionizing Science: AI Visualizes All Known Proteins
Proteins are long molecules that contain strings of amino acids and can take on a variety of shapes. Their structure dictates their functions, ranging from transporting molecules to operating as valves and pumps.
In 2021, Alphabet-owned artificial intelligence company DeepMind published the details of its AlphaFold AI software, and in 2022 it released AlphaFold's predicted shapes for nearly every protein known to science. This catalog of more than 200 million structures is now freely available to scientists through the public AlphaFold Protein Structure Database.
AI’s Breakthrough in Protein Visualization
Researchers have taken a huge step toward unlocking the secrets of proteins, the workhorses that drive nearly every biological process. They have predicted the three-dimensional shapes of more than 200 million proteins from their sequences of amino acids, the proteins' building blocks. It is a gargantuan accomplishment for artificial intelligence, and in particular for DeepMind, the London-based company owned by Google parent Alphabet.
The achievement paves the way for untold avenues of scientific exploration into proteins, which are long molecules that string together amino acids to form peptides and polypeptides that then fold themselves into their final shapes. Proteins carry out a vast range of tasks within organisms from bacteria to humans: they transport molecules, act as chemical catalysts, operate as valves and pumps, and much more.
Their three-dimensional shapes are key to understanding what they do. For example, the protein hemoglobin carries oxygen in your red blood cells to the rest of your body. Knowing the shapes of so many proteins has major implications for research into diseases and for developing new drugs that target proteins.
Every Protein Mapped: AI’s Stunning Achievement
Proteins are the building blocks of life, and their shape is fundamental to understanding how they work. DeepMind's method for predicting protein structures is a huge step forward for the UK-based AI company, and one that could help scientists tackle major global challenges such as developing malaria vaccines and combating plastic pollution.
The expanded AlphaFold Protein Structure Database is now available to researchers and covers more than 200 million predicted protein shapes, nearly every catalogued protein from plants, bacteria, animals and humans. Because predictions already exist for almost any catalogued sequence, researchers can often look up a structure instead of computing it themselves, which could lead to faster breakthroughs in areas like drug development.
Researchers can also combine the database with separate computational tools such as cBioPortal, CRAVAT, Jalview, MutDB and STRUM to take a list of genetic variants (for example, from VCF files) and map them onto the protein structures. They can then explore the 3D structure of the resulting mutant and compare it to the wild-type protein to assess the variant's likely impact on structure and function.
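Before a variant can be shown on a structure, it has to be expressed as a change at a specific residue of the protein. The sketch below illustrates only that step and makes several assumptions: the variant is already written in HGVS-style protein notation such as `p.Arg273His` (converting genomic VCF coordinates into protein changes is a separate annotation step it skips), only simple missense changes are handled, and the helper `apply_protein_variant` is hypothetical, not part of any of the tools named above.

```python
import re

# Three-letter to one-letter codes for the 20 standard amino acids.
THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
}

# Matches simple missense notation such as "p.Arg273His".
MISSENSE = re.compile(r"^p\.([A-Z][a-z]{2})(\d+)([A-Z][a-z]{2})$")


def apply_protein_variant(sequence: str, variant: str) -> tuple[int, str]:
    """Hypothetical helper: return (1-based position, mutant sequence).

    Raises ValueError if the notation is unsupported or if the stated
    wild-type residue does not match the reference sequence.
    """
    match = MISSENSE.match(variant)
    if match is None:
        raise ValueError(f"unsupported variant notation: {variant}")
    wt3, pos_text, mut3 = match.groups()
    wild_type, mutant_aa = THREE_TO_ONE[wt3], THREE_TO_ONE[mut3]
    pos = int(pos_text)
    if not 1 <= pos <= len(sequence):
        raise ValueError(f"position {pos} is outside the sequence")
    if sequence[pos - 1] != wild_type:
        raise ValueError(f"expected {wild_type} at {pos}, found {sequence[pos - 1]}")
    # Swap in the new residue; the rest of the chain is unchanged.
    mutant = sequence[: pos - 1] + mutant_aa + sequence[pos:]
    return pos, mutant


# Toy peptide: replace the Leu at position 3 with Pro.
position, mutant = apply_protein_variant("MVLSPADK", "p.Leu3Pro")
print(position, mutant)  # 3 MVPSPADK
```

The wild-type check matters in practice: a mismatch usually means the variant was annotated against a different isoform or reference sequence than the structure being viewed.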
Transforming Bio-Science with AI Protein Maps
Proteins are built from chains of amino acids that fold and twist to form intricate shapes. These molecule-sized machines perform crucial tasks inside our bodies and cells, from keeping our hearts beating to creating hormones that control appetite and reproduction. But until recently, mapping a protein's full structure down to the atomic level required painstaking laboratory work, such as X-ray crystallography, so only a small fraction of known proteins had been mapped.
That changed in 2020, when deep learning algorithms swept CASP, a long-running competition in protein structure prediction. The top-rated method was developed by DeepMind, a London-based AI company owned by Alphabet, the parent company of Google. The program, known as AlphaFold, achieved a median score of about 92 on the Global Distance Test (GDT), a 0-100 accuracy scale on which scores around 90 are informally considered competitive with experimental methods. That milestone would have seemed impossible just two years earlier.
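The GDT total score (GDT_TS) averages, over distance cutoffs of 1, 2, 4 and 8 Å, the percentage of a model's alpha-carbon atoms that lie within that cutoff of their positions in the experimental structure. The sketch below is a simplified version that assumes the two coordinate sets have already been superimposed; the official scoring program searches over many superpositions to maximize each percentage, which this sketch skips.

```python
import math

Coord = tuple[float, float, float]


def gdt_ts(model: list[Coord], reference: list[Coord]) -> float:
    """Simplified GDT_TS (0-100) for two pre-superimposed C-alpha traces."""
    if len(model) != len(reference) or not model:
        raise ValueError("need two non-empty traces of equal length")
    # Per-residue deviation between model and experimental positions.
    deviations = [math.dist(m, r) for m, r in zip(model, reference)]
    percentages = []
    for cutoff in (1.0, 2.0, 4.0, 8.0):
        within = sum(1 for d in deviations if d <= cutoff)
        percentages.append(100.0 * within / len(deviations))
    # GDT_TS is the mean of the four percentages.
    return sum(percentages) / len(percentages)


reference = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0)]
# Toy model whose residues deviate by 0.5, 1.5, 3.0 and 9.0 angstroms.
model = [(0.5, 0.0, 0.0), (3.8, 1.5, 0.0), (7.6, 0.0, 3.0), (20.4, 0.0, 0.0)]
print(gdt_ts(model, reference))  # 56.25
```

Here the percentages are 25, 50, 75 and 75, so the score is 56.25. Averaging several cutoffs rewards models that are roughly right everywhere as well as those that are precise in places.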
To understand how the proteins in our bodies function, researchers also need to know where they are located inside cells, but mapping their precise positions is painstaking. One common approach, part of the field known as spatial proteomics, tags individual proteins with fluorescent markers and images the cells to pinpoint where each protein sits. This can be expensive and time-consuming. Now, a new AI system called HCPL (Hybrid subCellular Protein Localiser) has shown world-leading speed and accuracy in identifying the localization patterns of proteins in individual cells.
AI and Proteins: A New Era of Discovery
Proteins do much of the work in biology: they build muscles and organs, digest food, fight viruses and perform many other essential tasks. But there are still many biomedical and industrial challenges that evolution hasn't yet compelled proteins to solve.
Scientists are working to unlock new potential by designing and testing bespoke proteins with functions that haven’t been found in nature. To do this, they need to understand how real-world proteins are built – and now artificial intelligence is giving them the tools.
Machine learning models like AlphaFold and RoseTTAFold have already proven capable of predicting the 3D shapes of natural proteins based on their amino acid sequences. But creating a new protein shape has proved more difficult.
In one study, researchers used an AI language model trained on protein sequences to steer the design process. The model generated a million candidate amino acid sequences, and the team selected 100 of them to produce and test for enzyme activity. The approach worked: among the functional designs were artificial lysozymes, including one with 69% sequence identity to the enzyme found in egg whites that defends against bacteria and fungi.
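Sequence identity, as in the 69% figure for the lysozyme, is typically the share of aligned positions at which two sequences carry the same amino acid. A minimal sketch, assuming the two sequences have already been aligned (gaps written as `-`) and that identity is counted only over columns where neither sequence has a gap; tools differ in the denominator they use, so the same pair can yield somewhat different percentages.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over the gap-free columns of a pairwise alignment."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    # Keep only columns where both sequences have a residue.
    columns = [(a, b) for a, b in zip(aligned_a, aligned_b) if a != "-" and b != "-"]
    if not columns:
        return 0.0
    matches = sum(1 for a, b in columns if a == b)
    return 100.0 * matches / len(columns)


# Toy alignment: 8 matches across 9 gap-free columns.
print(percent_identity("KVFGR-CELA", "KVYGRSCELA"))
```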
Unveiling Protein Structures: AI’s Key Role
In a landmark achievement, AI-powered algorithms have solved one of biology’s most enduring mysteries by predicting how proteins curl up from linear chains of amino acids into the 3D shapes that carry out life’s tasks. The results are a boon for science, opening the door to faster development of new drugs, more resilient crops and bacteria-fuelled recycling of plastic waste.
The protein structures predicted by the London-based DeepMind team's algorithm, AlphaFold, have now been made available to all scientists through a searchable database hosted by EMBL's European Bioinformatics Institute (EMBL-EBI). They will be used to develop potential malaria vaccines, improve the understanding of Parkinson's disease and a host of other diseases, work out how to protect honeybee health and uncover clues about human evolution.
Besides helping scientists understand how a given protein works, the technology can also aid in designing proteins with desired properties. With structure predictors such as AlphaFold and Meta AI's ESMFold, which let them check whether a designed sequence is likely to fold into the intended shape, researchers can modify existing proteins or create completely new ones to perform a desired task, such as a drug that can unclog arteries.
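The database can also be queried programmatically. The sketch below assumes the public REST endpoint `https://alphafold.ebi.ac.uk/api/prediction/{accession}`, which returns a JSON list of entries that each carry a `pdbUrl` field; check the EMBL-EBI documentation before relying on either detail. The helper names are hypothetical, not part of any official client, and only the parsing step runs without network access.

```python
import json
import urllib.request

# Assumed public endpoint of the AlphaFold Protein Structure Database API.
API_BASE = "https://alphafold.ebi.ac.uk/api/prediction/"


def prediction_url(uniprot_accession: str) -> str:
    """Build the metadata URL for one UniProt accession."""
    return API_BASE + uniprot_accession.strip().upper()


def model_file_urls(payload: str) -> list[str]:
    """Extract PDB-format model URLs from the JSON response.

    Assumes the response is a JSON list of entries with a "pdbUrl" field.
    """
    entries = json.loads(payload)
    return [entry["pdbUrl"] for entry in entries if "pdbUrl" in entry]


def fetch_model_urls(uniprot_accession: str) -> list[str]:
    """Query the live service (requires network access)."""
    url = prediction_url(uniprot_accession)
    with urllib.request.urlopen(url, timeout=30) as response:
        return model_file_urls(response.read().decode("utf-8"))
```

For example, `fetch_model_urls("P69905")` would ask the service for the predicted model of human hemoglobin subunit alpha. Keeping URL building and JSON parsing separate from the network call makes the parsing easy to check against a saved response.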
Protein Science Redefined by Advanced AI
In 2022, the artificial intelligence program AlphaFold, developed by DeepMind, a company owned by Google parent Alphabet, made a stunning breakthrough: it predicted the shapes of nearly all known proteins, more than 200 million of them. The work earned AlphaFold's developers Demis Hassabis and John Jumper one of the $3-million 2023 Breakthrough Prizes in Life Sciences.
The AlphaFold method was published in a peer-reviewed paper in Nature, and the expert response has been enthusiastic. In a news piece in the journal Science, for example, the computational protein scientist Janet Thornton says: "This is a big deal and will change the future of protein structural biology."
But it's worth keeping in mind that these predictions are still limited by what we know about proteins' biochemical functions. In addition, the algorithm predicts only a single state of a protein, whereas in reality many proteins exist in multiple states. For example, a protein might undergo major rearrangements when it binds other proteins, as in the case of the mitotic protein Mad2 (pictured in Fig. 4). A related challenge involves intrinsically disordered proteins, which lack a single fixed shape. Forman-Kay's group has studied such examples and found that, for the most part, AlphaFold accurately predicted the structures of these disordered proteins.
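One practical way to see where a single predicted state is least trustworthy: AlphaFold reports a per-residue confidence score, pLDDT (0-100), and stores it in the B-factor column of its PDB-format model files. Stretches scoring below about 50 often correspond to disordered regions. The sketch below reads those scores from the alpha-carbon records; the three-line excerpt is made up for illustration, and the cutoff of 50 is a common rule of thumb rather than a hard boundary.

```python
def ca_plddt(pdb_text: str) -> dict[int, float]:
    """Map residue number -> pLDDT, read from C-alpha ATOM records.

    Uses fixed PDB columns (1-based): atom name 13-16, residue number
    23-26, B-factor 61-66, the field AlphaFold models use for pLDDT.
    """
    scores: dict[int, float] = {}
    for line in pdb_text.splitlines():
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            scores[int(line[22:26])] = float(line[60:66])
    return scores


def low_confidence(scores: dict[int, float], threshold: float = 50.0) -> list[int]:
    """Residues whose pLDDT falls below the threshold (often disordered)."""
    return sorted(res for res, score in scores.items() if score < threshold)


# Made-up excerpt: two C-alpha atoms and one backbone carbon, which is ignored.
SAMPLE = "\n".join([
    "ATOM      1  CA  MET A   1      10.000  10.000  10.000  1.00 92.50           C",
    "ATOM      2  C   MET A   1      11.000  10.000  10.000  1.00 92.50           C",
    "ATOM      9  CA  SER A   2      13.800  10.000  10.000  1.00 35.10           C",
])
print(low_confidence(ca_plddt(SAMPLE)))  # [2]
```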
Navigating the Protein Universe with AI
In a watershed moment for protein exploration, an AI tool has predicted the shapes of nearly every protein so far catalogued by scientists. The breakthrough, announced by DeepMind, the AI lab owned by Google parent company Alphabet, makes the massive catalog accessible to researchers and opens the door to new discoveries and applications in fields as diverse as medicine, food security and climate science.
The latest update to the free-to-use AlphaFold Protein Structure Database has expanded its scope to include proteins from plants, bacteria and animals as well as humans, which opens up the possibility of using it to tackle “important global issues such as sustainability, food insecurity and neglected diseases”, according to the team behind it. AlphaFold's predictions have already been used to help develop drugs and to understand how malaria parasite proteins work, for example by predicting where antibodies could attach themselves.
The latest milestone builds on decades of work by academics including David Baker, who heads the Institute for Protein Design at the University of Washington and whose lab developed Rosetta, a software suite that uses both physical rules and statistics to predict a protein's three-dimensional structure. The rapidly evolving machine learning tools and huge reservoirs of biological data now being tapped by this new generation of algorithms are transforming how scientists forge proteins into pharmaceutical drugs, industrial enzymes, biosensors and food products.
From Data to Discovery: AI’s Protein Journey
Proteins are the building blocks of life, made of chains of amino acids that fold up into complex shapes. Their 3D structure largely determines their function. DNA provides the initial blueprint, the order of amino acids in the chain, but predicting how that chain folds into a functional protein has been a formidable challenge (ref 1).
For years, scientists have competed in CASP, a contest held every two years in which organizers release the amino acid sequences of proteins whose structures have been determined experimentally but not yet published, and teams try to predict those structures from the sequences alone. The team whose predictions best match the experimental structures ranks first. In 2018, the surprise winner was a newcomer, an AI program called AlphaFold, fielded by DeepMind, the London-based artificial intelligence laboratory of Google parent company Alphabet Inc.
The key to AlphaFold's success lies in its underlying algorithm, a deep neural network trained on experimentally determined protein structures. From a protein's amino acid sequence, the version that won in 2018 predicted the distances between pairs of amino acids and the angles of the chemical bonds connecting them. It then refined a candidate structure step by step until the structure best matched those predictions.
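The first half of that path, from DNA to an amino acid sequence, follows a fixed lookup table: in the standard genetic code, each three-letter codon specifies one amino acid or a stop signal. A minimal sketch of that step (the function name is illustrative); the hard problem, predicting the fold, begins only after it.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code in TCAG order for codon positions 1, 2 and 3 ('*' = stop).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str) -> str:
    """Translate coding DNA, stopping at the first stop codon.

    Raises KeyError on characters other than A, C, G and T.
    """
    dna = dna.upper()
    protein = []
    for i in range(0, len(dna) - 2, 3):
        amino_acid = CODON_TABLE[dna[i : i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


print(translate("ATGGTGCTGTCTCCTGCCGACAAGTAA"))  # MVLSPADK
```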
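The refinement step can be illustrated with a toy version of the idea: start from random 3D coordinates and use gradient descent to nudge them until their pairwise distances match a set of target distances. This is a heavily simplified sketch that assumes the "predicted" distances are given exactly; the 2018 AlphaFold instead optimized a score built from predicted distance distributions and angles, and the later AlphaFold 2 predicts coordinates directly.

```python
import math
import random


def stress(coords, targets):
    """Sum of squared differences between current and target pairwise distances."""
    total = 0.0
    n = len(coords)
    for i in range(n):
        for j in range(i + 1, n):
            diff = math.dist(coords[i], coords[j]) - targets[i][j]
            total += diff * diff
    return total


def fit_coordinates(targets, steps=2000, learning_rate=0.01, seed=0):
    """Gradient descent on 3D points so their distances approach `targets`."""
    rng = random.Random(seed)
    n = len(targets)
    coords = [[rng.uniform(-5.0, 5.0) for _ in range(3)] for _ in range(n)]
    for _ in range(steps):
        grads = [[0.0, 0.0, 0.0] for _ in range(n)]
        for i in range(n):
            for j in range(i + 1, n):
                r = math.dist(coords[i], coords[j]) or 1e-9  # avoid division by zero
                # d/dx_i of (r - target)^2 is 2 * (r - target) * (x_i - x_j) / r.
                scale = 2.0 * (r - targets[i][j]) / r
                for k in range(3):
                    g = scale * (coords[i][k] - coords[j][k])
                    grads[i][k] += g
                    grads[j][k] -= g
        for i in range(n):
            for k in range(3):
                coords[i][k] -= learning_rate * grads[i][k]
    return coords


# Toy target: four points 3.8 angstroms apart (a regular tetrahedron); 3.8 is
# roughly the spacing between neighboring alpha-carbons in a protein chain.
n = 4
targets = [[0.0 if i == j else 3.8 for j in range(n)] for i in range(n)]
start = fit_coordinates(targets, steps=0)
fitted = fit_coordinates(targets)
print(stress(start, targets), stress(fitted, targets))
```

A small, fixed learning rate keeps each step stable; real structure prediction pipelines add many more terms, such as bond geometry and steric clashes, to the quantity being minimized.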