Unsupervisedly Prompting AlphaFold2 for Accurate Few-Shot Protein Structure Prediction.

Journal of Chemical Theory and Computation (2023)

Changping Lab

Abstract
Data-driven predictive methods that can efficiently and accurately transform protein sequences into biologically active structures are highly valuable for scientific research and medical development. Determining an accurate folding landscape from coevolutionary information is fundamental to the success of modern protein structure prediction methods. As the state of the art, AlphaFold2 has dramatically raised the accuracy without performing explicit coevolutionary analysis. Nevertheless, its performance still shows a strong dependence on available sequence homologues. Based on an interrogation of the cause of this dependence, we present EvoGen, a meta generative model, to remedy the underperformance of AlphaFold2 for poor-MSA targets. By prompting the model with calibrated or virtually generated homologue sequences, EvoGen helps AlphaFold2 fold accurately in the low-data regime and even achieve encouraging performance with single-sequence predictions. Being able to make accurate predictions with few-shot MSAs not only generalizes AlphaFold2 better to orphan sequences but also democratizes its use in high-throughput applications. Moreover, EvoGen combined with AlphaFold2 yields a probabilistic structure generation method that can explore alternative conformations of protein sequences, and its task-aware differentiable algorithm for sequence generation will benefit other related tasks, including protein design.
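The prompting scheme described in the abstract can be captured in a short sketch. This is a minimal illustration under stated assumptions, not EvoGen's actual implementation: `sample_msa`, `denoise_msa`, and `fold` are hypothetical callables standing in for an EvoGen-style generator and an AlphaFold2 wrapper, since the real APIs are not given here.

```python
from typing import Callable, List, Optional

# Minimal sketch of MSA-prompted folding. All three callables are
# hypothetical stand-ins, not real package APIs.
def predict_structure(
    query: str,
    fold: Callable[[str, List[str]], object],            # e.g., an AlphaFold2 wrapper
    sample_msa: Callable[[str, int], List[str]],         # EvoGen-style virtual-MSA sampler
    denoise_msa: Callable[[str, List[str]], List[str]],  # EvoGen-style MSA calibrator
    searched_msa: Optional[List[str]] = None,
    n_virtual: int = 32,
) -> object:
    """Prompt a folding model with a calibrated or virtually generated MSA."""
    if searched_msa:
        # Few-shot regime: calibrate (denoise) the shallow searched MSA.
        msa = denoise_msa(query, searched_msa)
    else:
        # Single-sequence regime: generate virtual homologues from the query alone.
        msa = sample_msa(query, n_virtual)
    # The real or virtual homologues act as the prompt that reshapes the
    # folding landscape seen by the structure predictor.
    return fold(query, msa)
```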
Keywords
Homology Modeling, Secondary Structure Prediction, Support Vector Machines

Key point: This paper proposes EvoGen, a meta generative model designed to improve AlphaFold2's structure prediction performance on proteins with few sequence homologues.

Method: EvoGen manipulates the folding landscape by denoising searched multiple sequence alignments (MSAs) or by generating virtual MSAs.

Experiments: Experiments show that EvoGen combined with AlphaFold2 folds protein structures accurately even in the low-data regime and achieves encouraging performance with single-sequence predictions, extending AlphaFold2's applicability to orphan sequences; the resulting probabilistic structure generation method can also be used to explore alternative conformations of protein sequences.
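The probabilistic side of the method can also be sketched. Because a virtual-MSA sampler is stochastic, drawing several MSAs and folding each one yields an ensemble of structures that may expose alternative conformations. As before, `sample_msa` and `fold` are illustrative stand-ins, not the paper's actual interfaces.

```python
from typing import Callable, List

def sample_conformations(
    query: str,
    fold: Callable[[str, List[str]], object],
    sample_msa: Callable[[str, int], List[str]],
    n_samples: int = 8,
    n_virtual: int = 32,
) -> List[object]:
    """Fold the query once per independently sampled virtual MSA."""
    structures = []
    for _ in range(n_samples):
        msa = sample_msa(query, n_virtual)   # fresh virtual homologues per draw
        structures.append(fold(query, msa))  # one predicted structure per prompt
    return structures
```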