
Improving Alignment of Dialogue Agents Via Targeted Human Judgements

Computing Research Repository (CoRR), 2022

Cited 527 | Views 604
Abstract
We present Sparrow, an information-seeking dialogue agent trained to be more helpful, correct, and harmless compared to prompted language model baselines. We use reinforcement learning from human feedback to train our models with two new additions to help human raters judge agent behaviour. First, to make our agent more helpful and harmless, we break down the requirements for good dialogue into natural language rules the agent should follow, and ask raters about each rule separately. We demonstrate that this breakdown enables us to collect more targeted human judgements of agent behaviour and allows for more efficient rule-conditional reward models. Second, our agent provides evidence from sources supporting factual claims when collecting preference judgements over model statements. For factual questions, evidence provided by Sparrow supports the sampled response 78% of the time. Sparrow is preferred more often than baselines while being more resilient to adversarial probing by humans, violating our rules only 8% of the time when probed. Finally, we conduct extensive analyses showing that though our model learns to follow our rules it can exhibit distributional biases.
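The abstract describes rule-conditional reward models: the requirements for good dialogue are broken into separate natural-language rules, each judged independently, and these per-rule judgements shape the reward alongside overall preference. A minimal sketch of that idea follows; it is not the paper's implementation, and all names, scoring functions, and the `penalty` weight are illustrative assumptions.

```python
# Hypothetical sketch, not the paper's actual models: combine a preference
# reward with per-rule violation penalties, scoring each rule separately
# in the same spirit as the paper's per-rule human judgements.

from dataclasses import dataclass, field
from typing import Callable, Dict

@dataclass
class RuleConditionalReward:
    # preference_model: scores how preferred a response is (stand-in).
    preference_model: Callable[[str, str], float]
    # rule_models: per-rule probability that the response violates that rule.
    rule_models: Dict[str, Callable[[str, str], float]] = field(default_factory=dict)
    penalty: float = 1.0  # weight on rule violations (assumed value)

    def score(self, dialogue: str, response: str) -> float:
        pref = self.preference_model(dialogue, response)
        # One penalty term per natural-language rule, judged separately.
        violation = sum(m(dialogue, response) for m in self.rule_models.values())
        return pref - self.penalty * violation

# Toy usage with stand-in scoring functions.
rm = RuleConditionalReward(
    preference_model=lambda d, r: 1.0 if "evidence" in r else 0.5,
    rule_models={
        "no_medical_advice": lambda d, r: 1.0 if "diagnose" in r else 0.0,
        "stay_on_topic": lambda d, r: 0.0,
    },
)
print(rm.score("Is the Earth round?", "Yes, and here is evidence."))  # 1.0
print(rm.score("Am I ill?", "I diagnose you with a cold."))           # -0.5
```

The point of the decomposition is that a violated rule can pull the reward down even when the response is otherwise preferred, which mirrors how the paper collects targeted per-rule judgements rather than a single holistic rating.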
Key words
Spoken Dialogue Systems, Language Modeling, Topic Modeling, Part-of-Speech Tagging, Machine Translation

Chat Paper

Key points: The paper introduces Sparrow, an information-seeking dialogue agent whose behaviour is evaluated and trained with two new additions to human-rater judgement, aiming to make it more helpful, correct, and harmless. The study shows that this targeted evaluation collects more focused judgements for building reward models, and that providing supporting evidence for model statements helps gather preference judgements on factual questions.

Methods: The agent is trained with reinforcement learning from human feedback; the requirements for good dialogue are broken down into natural-language rules, and raters judge each rule separately.

Experiments: By citing sources that support its factual claims, Sparrow provides evidence that supports the sampled response 78% of the time on factual questions. Sparrow is preferred over baselines and is more resilient to adversarial probing, violating the rules only 8% of the time when probed. Extensive analysis shows that although the model learns to follow the rules, it can still exhibit distributional biases.