Advantage-Weighted
Off-policy RL | Advantage-Weighted Regression (AWR): combining previous policies to obtain a new base policy
Advantage-Weighted Regression: Simple and Scalable Off-Policy Reinforcement Learning. Paper title: Advantage-Weighted Regression: Simple and Scalable Off-Polic ......