SIRL: Self-Imitation Reinforcement Learning for Single-step Hitting Tasks.
ICARM (2023)
Keywords
actual interaction, delayed reward, gradient information, human demonstrations, interaction sample, learning methods, MuJoCo simulation, optimal samples, optimal successful samples, policy optimization, RL policy, RL-based method, sample efficiency, self-imitation learning, self-imitation reinforcement learning, sequential decision-making tasks, single-step hitting tasks, single-step robotic, SIRL algorithm, standard RL frameworks, standard RL methods, supervised learning methods