
The Open Catalyst 2020 (OC20) Dataset and Community Challenges.

ACS Catalysis (2021)

Facebook AI Research (FAIR) | Carnegie Mellon University | Stanford University | National Energy Research Scientific Computing Center (NERSC)

Cited 325 | Views 159
Abstract
Catalyst discovery and optimization is key to solving many societal and energy challenges, including solar fuels synthesis, long-term energy storage, and renewable fertilizer production. Despite considerable effort by the catalysis community to apply machine learning models to the computational catalyst discovery process, it remains an open challenge to build models that can generalize across both elemental compositions of surfaces and adsorbate identity/configurations, perhaps because datasets have been smaller in catalysis than in related fields. To address this, we developed the OC20 dataset, consisting of 1,281,040 Density Functional Theory (DFT) relaxations (~264,890,000 single point evaluations) across a wide swath of materials, surfaces, and adsorbates (nitrogen, carbon, and oxygen chemistries). We supplemented this dataset with randomly perturbed structures, short-timescale molecular dynamics, and electronic structure analyses. The dataset comprises three central tasks indicative of day-to-day catalyst modeling and comes with predefined train/validation/test splits to facilitate direct comparisons with future model development efforts. We applied three state-of-the-art graph neural network models (CGCNN, SchNet, DimeNet++) to each of these tasks as baseline demonstrations for the community to build on. In almost every task, no upper limit on model size was identified, suggesting that even larger models are likely to improve on initial results. The dataset and baseline models are both provided as open resources, along with a public leaderboard to encourage community contributions to solve these important tasks.
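To make the notion of a "relaxation" concrete, here is a minimal sketch using ASE with its cheap EMT calculator as a stand-in for the DFT calculations actually used to build OC20; the Cu slab, oxygen adsorbate, and all parameter values are illustrative choices, not taken from the paper.

```python
# Sketch of one adsorbate-on-surface relaxation: move atoms downhill in
# energy until the maximum force falls below a threshold. OC20 records the
# full trajectory of such relaxations, computed with DFT rather than the
# toy EMT potential used here.
from ase.build import fcc111, add_adsorbate
from ase.calculators.emt import EMT
from ase.constraints import FixAtoms
from ase.optimize import BFGS

slab = fcc111("Cu", size=(2, 2, 3), vacuum=10.0)      # small Cu(111) slab
add_adsorbate(slab, "O", height=1.5, position="fcc")  # place an O adsorbate
slab.set_constraint(FixAtoms(indices=range(4)))       # freeze the bottom layer
slab.calc = EMT()                                     # cheap stand-in for DFT

opt = BFGS(slab, logfile=None)
opt.run(fmax=0.05)                  # relax until max force < 0.05 eV/Å
print(slab.get_potential_energy())  # energy of the relaxed system
```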
Keywords
catalysis, renewable energy, datasets, machine learning, graph convolutions, force field
Chat Paper

Key points: OC20 is a large-scale catalysis dataset comprising 1,281,040 Density Functional Theory (DFT) relaxations, along with associated randomly perturbed structures, molecular dynamics simulations, and electronic structure analyses. The dataset covers a wide range of combinations of materials, surfaces, and adsorbates, and provides predefined train/validation/test splits for three representative catalyst modeling tasks to enable direct comparison of model development efforts. The paper also provides three baseline models as a foundation for researchers to build on.

Methods: The OC20 dataset was built by collecting and processing data with Density Functional Theory calculations and related techniques; three graph neural network models (CGCNN, SchNet, DimeNet++) were then applied for modeling.

Experiments: The three graph neural network models were trained and tested on the three representative catalyst modeling tasks using OC20, establishing initial results. In almost every task, no upper limit on model size was identified, suggesting that larger models are likely to further improve on these results.
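The structure-to-energy-and-forces task behind these experiments can be sketched in a few lines of PyTorch. The toy model below only illustrates the task's input/output contract, not one of the paper's baselines (CGCNN, SchNet, and DimeNet++ operate on interatomic graphs, not raw coordinates); all names here are hypothetical.

```python
# Toy setup: predict a scalar system energy from atomic positions and
# recover per-atom forces as the negative gradient of that energy, the
# same way force-field-style GNNs expose forces.
import torch
import torch.nn as nn

class ToyEnergyModel(nn.Module):
    def __init__(self, hidden: int = 64):
        super().__init__()
        # Real OC20 baselines embed atoms and message-pass over a graph of
        # interatomic distances; a per-atom MLP keeps the sketch minimal.
        self.mlp = nn.Sequential(
            nn.Linear(3, hidden), nn.SiLU(), nn.Linear(hidden, 1)
        )

    def forward(self, pos: torch.Tensor) -> torch.Tensor:
        return self.mlp(pos).sum()  # sum per-atom terms -> system energy

pos = torch.randn(10, 3, requires_grad=True)   # 10 atoms, xyz coordinates
model = ToyEnergyModel()
energy = model(pos)
forces = -torch.autograd.grad(energy, pos)[0]  # forces = -dE/d(positions)
print(energy.item(), forces.shape)             # scalar energy, (10, 3) forces
```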