
Regression with Large Language Models for Materials and Molecular Property Prediction

CoRR (2024)

Abstract
We demonstrate the ability of large language models (LLMs) to perform material and molecular property regression tasks, a significant departure from the conventional LLM use case. We benchmark the Large Language Model Meta AI (LLaMA) 3 on several molecular properties in the QM9 dataset and on 24 materials properties. Only composition-based input strings are used as model input, and we fine-tune on the generative loss alone. We broadly find that LLaMA 3, when fine-tuned using the SMILES representation of molecules, provides useful regression results that can rival standard materials property prediction models such as random forests or fully connected neural networks on the QM9 dataset. Not surprisingly, LLaMA 3 errors are 5-10x higher than those of state-of-the-art models trained on far more granular representations of molecules (e.g., atom types and their coordinates) for the same task. Interestingly, LLaMA 3 provides improved predictions compared to GPT-3.5 and GPT-4o. This work highlights the versatility of LLMs, suggesting that LLM-like generative models can potentially transcend their traditional applications to tackle complex physical phenomena, thus paving the way for future research and applications in chemistry, materials science, and other scientific domains.
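
As a rough illustration of the setup the abstract describes (the paper does not include code here), the sketch below shows how a composition or SMILES string paired with the property value rendered as text might be fine-tuned with only the standard next-token (generative) loss, using a Hugging Face causal language model. The checkpoint name, prompt wording, and property values are assumptions for illustration only.

```python
# Minimal sketch (not the authors' code) of generative-loss fine-tuning for
# property regression: the prompt is a SMILES string and the target property
# is rendered as text, so the ordinary next-token loss is the only training
# signal. Checkpoint id, prompt wording, and values are illustrative.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "meta-llama/Meta-Llama-3-8B"  # assumed checkpoint identifier
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token

# Toy (prompt, target-as-text) pairs; real data would come from QM9 etc.
pairs = [
    ("HOMO-LUMO gap of C1=CC=CC=C1 is", " 0.2531"),
    ("HOMO-LUMO gap of CCO is", " 0.3012"),
]

optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)
model.train()
for prompt, target in pairs:
    enc = tokenizer(prompt + target, return_tensors="pt")
    # Labels equal to input_ids -> standard causal-LM (generative) loss
    # over the whole sequence, including the numeric target tokens.
    out = model(input_ids=enc["input_ids"],
                attention_mask=enc["attention_mask"],
                labels=enc["input_ids"])
    out.loss.backward()
    optimizer.step()
    optimizer.zero_grad()
```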

Key points: This paper demonstrates the ability of large language models (LLMs) to perform materials and molecular property regression, and finds that, after fine-tuning, an LLM's predictions can rival those of conventional models, indicating broad application potential.

Methods: The large language model LLaMA 3 is used to regress molecular and materials properties, taking only composition-based input strings as input and fine-tuning on the generative loss alone.

Experiments: LLaMA 3 was fine-tuned on the QM9 dataset and also evaluated on 24 materials properties; when fine-tuned on SMILES representations of molecules, it produced useful regression results, with errors improved over GPT-3.5 and GPT-4o.
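
Because such a model emits property values as generated text, evaluating regression quality requires parsing the output back into numbers before computing a metric such as mean absolute error. The helper below is a hypothetical sketch of that post-processing step, not code from the paper; the prediction strings and target values are placeholders.

```python
# Hypothetical post-processing sketch: parse the model's generated text into
# floats and compute mean absolute error. All values are placeholders.
import re
from typing import Optional

def parse_prediction(text: str) -> Optional[float]:
    """Return the first decimal number found in the generated text, if any."""
    match = re.search(r"-?\d+(?:\.\d+)?", text)
    return float(match.group()) if match else None

generated = ["0.251", "the gap is 0.312 eV", "no number emitted"]
targets = [0.2531, 0.3012, 0.2877]

scored = [(parse_prediction(g), t) for g, t in zip(generated, targets)]
valid = [(p, t) for p, t in scored if p is not None]
mae = sum(abs(p - t) for p, t in valid) / len(valid)
print(f"MAE over {len(valid)}/{len(targets)} parsable predictions: {mae:.4f}")
```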