Unsupervisedly Prompting AlphaFold2 for Accurate Few-Shot Protein Structure Prediction.

Journal of Chemical Theory and Computation (2023)

Changping Lab

Abstract
Data-driven predictive methods that can efficiently and accurately transform protein sequences into biologically active structures are highly valuable for scientific research and medical development. Determining an accurate folding landscape from coevolutionary information is fundamental to the success of modern protein structure prediction methods. As the state of the art, AlphaFold2 has dramatically raised the accuracy without performing explicit coevolutionary analysis. Nevertheless, its performance still shows a strong dependence on available sequence homologues. Based on an interrogation of the cause of this dependence, we present EvoGen, a meta generative model, to remedy the underperformance of AlphaFold2 for poor-MSA targets. By prompting the model with calibrated or virtually generated homologue sequences, EvoGen helps AlphaFold2 fold accurately in the low-data regime and even achieve encouraging performance with single-sequence predictions. Being able to make accurate predictions with few-shot MSAs not only generalizes AlphaFold2 better to orphan sequences but also democratizes its use in high-throughput applications. Moreover, EvoGen combined with AlphaFold2 yields a probabilistic structure generation method that can explore alternative conformations of protein sequences, and its task-aware differentiable algorithm for sequence generation will benefit other related tasks, including protein design.
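The prompting scheme described in the abstract can be captured in a short sketch. This is a minimal illustration under stated assumptions, not EvoGen's actual implementation: `sample_msa`, `denoise_msa`, and `fold` are hypothetical callables standing in for an EvoGen-style generator and an AlphaFold2 wrapper, since the real APIs are not given here.

```python
from typing import Callable, List, Optional

# Minimal sketch of MSA-prompted folding. All three callables are
# hypothetical stand-ins, not real package APIs.
def predict_structure(
    query: str,
    fold: Callable[[str, List[str]], object],            # e.g., an AlphaFold2 wrapper
    sample_msa: Callable[[str, int], List[str]],         # EvoGen-style virtual-MSA sampler
    denoise_msa: Callable[[str, List[str]], List[str]],  # EvoGen-style MSA calibrator
    searched_msa: Optional[List[str]] = None,
    n_virtual: int = 32,
) -> object:
    """Prompt a folding model with a calibrated or virtually generated MSA."""
    if searched_msa:
        # Few-shot regime: calibrate (denoise) the shallow searched MSA.
        msa = denoise_msa(query, searched_msa)
    else:
        # Single-sequence regime: generate virtual homologues from the query alone.
        msa = sample_msa(query, n_virtual)
    # The real or virtual homologues act as the prompt that reshapes the
    # folding landscape seen by the structure predictor.
    return fold(query, msa)
```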
Keywords
Homology Modeling, Secondary Structure Prediction, Support Vector Machines

Key point: This paper proposes EvoGen, a meta generative model designed to improve AlphaFold2's structure prediction performance on proteins with few sequence homologues.

Method: EvoGen manipulates the folding landscape by denoising searched multiple sequence alignments (MSAs) or by generating virtual MSAs.

Experiments: Experiments show that EvoGen combined with AlphaFold2 folds protein structures accurately even in the low-data regime and achieves encouraging performance with single-sequence predictions, extending AlphaFold2's applicability to orphan sequences; the resulting probabilistic structure generation method can also be used to explore alternative conformations of protein sequences.
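The probabilistic side of the method can also be sketched. Because a virtual-MSA sampler is stochastic, drawing several MSAs and folding each one yields an ensemble of structures that may expose alternative conformations. As before, `sample_msa` and `fold` are illustrative stand-ins, not the paper's actual interfaces.

```python
from typing import Callable, List

def sample_conformations(
    query: str,
    fold: Callable[[str, List[str]], object],
    sample_msa: Callable[[str, int], List[str]],
    n_samples: int = 8,
    n_virtual: int = 32,
) -> List[object]:
    """Fold the query once per independently sampled virtual MSA."""
    structures = []
    for _ in range(n_samples):
        msa = sample_msa(query, n_virtual)   # fresh virtual homologues per draw
        structures.append(fold(query, msa))  # one predicted structure per prompt
    return structures
```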