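Notice: the original paper is named in the title. If this post infringes any rights, please contact the author and it will be taken down.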

Published as a conference paper at ICLR 2020
ABSTRACT
1 INTRODUCTION
2 BACKGROUND
2.1 TRAINING SETUP
2.2 BAYESIAN REINFORCEMENT LEARNING
3 BAYES-ADAPTIVE DEEP RL VIA META-LEARNING
3.1 APPROXIMATE INFERENCE
3.2 TRAINING OBJECTIVE
4 RELATED WORK
5 EXPERIMENTS
5.1 GRIDWORLD
5.2 MUJOCO CONTINUOUS CONTROL META-LEARNING TASKS
6 CONCLUSION & FUTURE WORK
Supplementary Material
A FULL ELBO DERIVATION
B EXPERIMENTS: GRIDWORLD
B.1 ADDITIONAL REMARKS
B.2 HYPERPARAMETERS
B.3 COMPARISON TO RL²
C EXPERIMENTS: MUJOCO
C.1 LEARNING CURVES
C.2 TRAINING DETAILS AND COMPARISON TO RL²
C.3 CHEETAHDIR TEST TIME BEHAVIOUR
C.4 RUNTIME COMPARISON
C.5 LATENT SPACE VISUALISATION
C.6 HYPERPARAMETERS