SIRL: Self-Imitation Reinforcement Learning for Single-step Hitting Tasks.

ICARM(2023)

Citations: 1 | Views: 16
Keywords
actual interaction, delayed reward, gradient information, human demonstrations, interaction sample, learning methods, MuJoCo simulation, optimal samples, optimal successful samples, policy optimization, RL policy, RL-based method, sample efficiency, self-imitation learning, self-imitation reinforcement learning, sequential decision-making tasks, single-step hitting tasks, single-step robotic, SIRL algorithm, standard RL frameworks, standard RL methods, supervised learning methods