Dream to Control: Learning Behaviors by Latent Imagination (ICLR ‘20)
HackMD
Raj |
This paper learns long-horizon behaviors by propagating analytic value gradients through trajectories imagined with a recurrent state-space model (PlaNet, Hafner et al.).
The Value Equivalence Principle for Model-Based Reinforcement Learning (NeurIPS ‘20) |
HackMD |
Raj |
This paper introduces and studies the concept of value equivalence for reinforcement-learning models with respect to a set of policies and value functions. It further shows that this principle can be leveraged to find models that, under limited representational capacity, outperform their maximum-likelihood counterparts.
Stackelberg Actor-Critic: A Game-Theoretic Perspective
HackMD |
Sharath |
This paper formulates the interaction between the actor and critic as a Stackelberg game and leverages the implicit function theorem to compute accurate gradient updates for the actor and critic.
Curriculum Learning for Reinforcement Learning Domains
HackMD |
Sharath |
This is a survey paper on curriculum learning methods in reinforcement learning. |
Policy Gradient Methods for Reinforcement Learning with Function Approximation (NIPS 1999) |
HackMD |
Raj |
This paper introduces the policy gradient theorem and gives the first convergence proof for policy iteration with arbitrary differentiable function approximation.
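For reference, the paper's central result (the policy gradient theorem) expresses the gradient of the expected return in terms of the discounted state distribution d^π and the action-value function Q^π, with no gradient of the state distribution required:

```latex
\nabla_\theta J(\theta)
  = \sum_{s} d^{\pi}(s) \sum_{a} \nabla_\theta \pi_\theta(a \mid s)\, Q^{\pi}(s, a)
```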
Reinforcement Learning via Fenchel-Rockafellar Duality
HackMD |
Sharath |
This paper reviews the basic concepts of Fenchel duality and f-divergences, and shows how these tools can be applied in the context of reinforcement learning to derive theoretically grounded as well as practically robust algorithms.
High-Dimensional Continuous Control Using Generalized Advantage Estimation |
HackMD |
Raj |
This paper combines an exponentially weighted advantage estimator (GAE) with TRPO, trading a small amount of bias for reduced variance in the policy-gradient estimate and yielding stable policy improvement on high-dimensional continuous control tasks.
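A minimal sketch of the GAE recursion (function and variable names here are illustrative, not from the paper's code): each advantage is an exponentially weighted sum of TD residuals, accumulated backward over a trajectory.

```python
import numpy as np

def gae_advantages(rewards, values, gamma=0.99, lam=0.95):
    """Compute Generalized Advantage Estimates for one trajectory.

    rewards: sequence of length T
    values:  sequence of length T+1 (V(s_0) ... V(s_T), bootstrap value last)
    """
    T = len(rewards)
    adv = np.zeros(T)
    gae = 0.0
    for t in reversed(range(T)):
        # One-step TD residual: delta_t = r_t + gamma * V(s_{t+1}) - V(s_t)
        delta = rewards[t] + gamma * values[t + 1] - values[t]
        # Exponentially weighted accumulation controlled by lambda
        gae = delta + gamma * lam * gae
        adv[t] = gae
    return adv
```

Setting lam=0 recovers the one-step TD residual (low variance, high bias), while lam=1 recovers the Monte Carlo advantage (high variance, low bias).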
Off-Policy Actor-Critic (ICML ‘12) |
HackMD |
Sharath |
This paper presents the first off-policy version of the actor-critic algorithm, deriving a simple and elegant method that outperforms existing algorithms on standard reinforcement-learning benchmark problems.
Combining Physical Simulators and Object-Based Networks for Control (ICRA ‘19) |
HackMD |
Sharath |
In this paper the authors propose a hybrid dynamics model, Simulation-Augmented Interaction Networks, which incorporates interaction networks into a physics engine for solving complex real-world robotic control tasks.
Learning Agile and Dynamic Motor Skills for Legged Robots |
HackMD |
Sharath |
This paper tackles the sim-to-real transfer problem for legged robots, training control policies in simulation with a learned actuator model and deploying them on the ANYmal quadruped.
PAC Bounds for Multi-Armed Bandit (COLT ‘02)
HackMD |
Raj |
This paper provides an algorithm with PAC guarantees that exploits the reward distribution of the particular problem to achieve better sample complexity.
Deep Reinforcement Learning for Dialogue Generation |
HackMD |
Om |
This paper discusses how better dialogue generation can be achieved using RL, providing a technique to encode conversational properties such as informativity, coherence, and ease of answering as reward functions.
Rainbow: Combining Improvements in Deep Reinforcement Learning |
HackMD |
Om |
The paper combines six extensions to DQN: Double DQN, prioritized experience replay, the dueling network architecture, multi-step learning (as in A3C), distributional Q-learning, and noisy networks, and shows that their combination substantially improves performance.
The Option-Critic Architecture |
HackMD |
Om |
This paper presents the option-critic architecture, a hierarchical reinforcement learning method based on temporal abstractions (options), in which intra-option policies and termination conditions are learned end-to-end.
Addressing Distribution Shift in Online Reinforcement Learning with Offline Datasets |
HackMD |
Om |
The paper proposes and experimentally validates methods for mitigating distribution shift when fine-tuning agents online from offline datasets.
FeUdal Networks for Hierarchical Reinforcement Learning |
HackMD |
Om |
This paper describes the FeUdal Networks model, which employs a Manager-Worker hierarchy: the Manager sets abstract goals that the Worker learns to fulfill.