Com^2: A Causal-Guided Benchmark for Exploring Complex Commonsense Reasoning in Large Language Models

Kai Xiong, Xiao Ding, Yixin Cao, Yuxiong Yan, Li Du, Yufei Zhang, Jinglong Gao, Jiaqian Liu, Bing Qin, Ting Liu

arXiv · Computation and Language (2025)

Abstract

Large language models (LLMs) have acquired a wealth of simple, explicit commonsense knowledge through pre-training, which lets them achieve human-like performance on simple commonsense reasoning. Nevertheless, LLMs struggle to reason with complex, implicit commonsense knowledge that is derived from simpler knowledge (such as understanding the long-term effects of certain events), which is the kind of reasoning humans tend to care about more. Existing work focuses on complex tasks such as math and code, while complex commonsense reasoning remains underexplored because of its uncertainty and lack of structure. To fill this gap and align with real-world concerns, we propose Com^2, a benchmark focusing on complex commonsense reasoning. We first incorporate causal event graphs as a structured representation of complex commonsense. We then apply causal theory (e.g., intervention) to modify the causal event graphs and obtain different scenarios that reflect human concerns. Finally, an LLM synthesizes examples with slow thinking, guided by the logical relationships in the modified causal graphs. Furthermore, we use detective stories to construct a more challenging subset. Experiments show that LLMs struggle with reasoning depth and breadth, while post-training and slow thinking can alleviate this. The code and data are available at https://github.com/Waste-Wood/Com2.
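
To make "causal event graph" and "intervention" concrete, here is a minimal sketch of the standard do-operation on a small directed graph. It is an illustration under stated assumptions, not the authors' construction: the adjacency-dict representation, the toy events, and the helper names `intervene` and `downstream_effects` are all hypothetical. Intervening on an event removes every edge pointing into it (its value is now set externally), and a breadth-first search then lists the multi-hop effects, i.e. the "long-term effects" the abstract mentions, together with their hop distance.

```python
from collections import deque

# Hypothetical causal event graph: each key is a cause event,
# each value lists its direct effects.
Graph = dict[str, list[str]]


def intervene(graph: Graph, target: str) -> Graph:
    """Apply do(target): drop every edge pointing into `target`,
    so it is no longer determined by its usual causes."""
    return {
        cause: [effect for effect in effects if effect != target]
        for cause, effects in graph.items()
    }


def downstream_effects(graph: Graph, source: str) -> dict[str, int]:
    """Map every event reachable from `source` to its minimum hop distance."""
    hops = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for effect in graph.get(node, []):
            if effect not in hops:
                hops[effect] = hops[node] + 1
                queue.append(effect)
    del hops[source]  # the source is not its own effect
    return hops


graph: Graph = {
    "heavy rain": ["flooded roads"],
    "flooded roads": ["traffic jam"],
    "traffic jam": ["late for work"],
    "late for work": ["missed meeting"],
    "road closure": ["traffic jam"],
}

do_graph = intervene(graph, "traffic jam")
print(downstream_effects(graph, "heavy rain"))
# {'flooded roads': 1, 'traffic jam': 2, 'late for work': 3, 'missed meeting': 4}
print(downstream_effects(do_graph, "heavy rain"))
# {'flooded roads': 1}
print(downstream_effects(do_graph, "traffic jam"))
# {'late for work': 1, 'missed meeting': 2}
```

After the intervention, rain no longer propagates past the flooded roads, while the traffic jam's own consequences are unchanged; contrasting the two graphs is one way a modified scenario can differ from the original.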

Chat Paper

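[Key points]: This paper proposes Com^2, a causal-graph-guided benchmark for complex commonsense reasoning, designed to explore how well large language models handle complex commonsense reasoning tasks.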

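[Method]: The study uses causal event graphs to represent structured complex commonsense and modifies these graphs with causal theory (e.g., intervention) to produce different scenarios that humans care about. A large language model then generates examples through slow thinking, guided by the logical relationships in the modified causal graphs.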

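[Experiments]: A more challenging subset is constructed from detective stories. Experiments show that large language models struggle with reasoning depth and breadth, and that post-training and slow thinking can alleviate this problem. The dataset is the custom-built Com^2 benchmark; the code and data are available at https://github.com/Waste-Wood/Com2.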
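
The method summary says example synthesis is "guided by the logical relationships in the modified causal graphs." One plausible way to expose those relationships to an LLM is to serialize the graph into an ordered list of cause-to-effect steps, so that no step appears before the steps that establish its cause. The sketch below does this with the standard-library `graphlib.TopologicalSorter`. The helper name `reasoning_outline` and the "Step N: cause -> effect" wording are hypothetical choices for illustration, not the paper's actual prompt format.

```python
from graphlib import TopologicalSorter


def reasoning_outline(graph: dict[str, list[str]]) -> list[str]:
    """Turn a cause -> effects graph into numbered 'cause -> effect' steps,
    emitting a cause's edges only after all of that cause's own causes."""
    # TopologicalSorter expects node -> predecessors, so invert the edges.
    predecessors: dict[str, set[str]] = {}
    for cause, effects in graph.items():
        predecessors.setdefault(cause, set())
        for effect in effects:
            predecessors.setdefault(effect, set()).add(cause)

    # static_order() raises graphlib.CycleError if the graph is not a DAG.
    order = TopologicalSorter(predecessors).static_order()
    edges = [f"{cause} -> {effect}" for cause in order for effect in graph.get(cause, [])]
    return [f"Step {i}: {edge}" for i, edge in enumerate(edges, start=1)]


chain = {
    "heavy rain": ["flooded roads"],
    "flooded roads": ["traffic jam"],
    "traffic jam": ["late for work"],
}
for step in reasoning_outline(chain):
    print(step)
# Step 1: heavy rain -> flooded roads
# Step 2: flooded roads -> traffic jam
# Step 3: traffic jam -> late for work
```

Topological order is used here because a chain of reasoning only makes sense if each premise is stated before it is used. For branching graphs the relative order of independent branches is not fixed, so the outline guarantees only that causes precede their effects.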