DQN Agent playing MountainCar-v0

This is a trained DQN agent that plays MountainCar-v0. The Q-network is a three-layer MLP, trained on transitions that we store in a replay buffer. Once the network has converged, we stop training and evaluate the agent against a random baseline.
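
The replay buffer mentioned above could look roughly like the following. This is a minimal sketch and not the code used to train this model: the `Transition` field names, the `ReplayBuffer` class, and uniform sampling are assumptions made for illustration. Only the default capacity (10000) and batch size (64) come from the parameters listed in this card.

```python
import random
from collections import deque, namedtuple

# One environment step. Field names are illustrative, not taken from the model's code.
Transition = namedtuple(
    "Transition", ["state", "action", "reward", "next_state", "done"]
)


class ReplayBuffer:
    """Fixed-size FIFO store of transitions with uniform random sampling."""

    def __init__(self, capacity=10000):
        # A deque with maxlen silently drops the oldest item once it is full.
        self.memory = deque(maxlen=capacity)

    def push(self, state, action, reward, next_state, done):
        self.memory.append(Transition(state, action, reward, next_state, done))

    def sample(self, batch_size=64):
        # Uniform sampling without replacement (an assumption; the card does
        # not say how batches are drawn).
        return random.sample(list(self.memory), batch_size)

    def __len__(self):
        return len(self.memory)
```

A training loop would typically call `push` after every environment step and start calling `sample` once `len(buffer)` reaches the batch size.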

Parameters:

hidden_size = 64
gamma = 0.99
epsilon_decay = 0.999
buffer_size = 10000
batch_size = 64
episodes = 10000
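
To make the roles of `gamma` and `epsilon_decay` concrete, here is a hedged sketch of how these two values commonly enter a DQN training loop. The starting epsilon, the epsilon floor, the multiplicative form of the decay, and how often the decay is applied are all assumptions; the card only states the values 0.99 and 0.999.

```python
GAMMA = 0.99           # discount factor, from the parameter list
EPSILON_DECAY = 0.999  # decay factor, from the parameter list


def decayed_epsilon(n_updates, eps_start=1.0, eps_min=0.01):
    """Exploration rate after n_updates multiplicative decays.

    eps_start and eps_min are assumed values, not taken from the card.
    """
    return max(eps_min, eps_start * EPSILON_DECAY ** n_updates)


def td_target(reward, next_q_values, done, gamma=GAMMA):
    """One-step Q-learning target: r + gamma * max_a' Q(s', a').

    Terminal transitions do not bootstrap from the next state.
    """
    if done:
        return reward
    return reward + gamma * max(next_q_values)
```

If the decay were applied once per episode, epsilon would fall from 1.0 to roughly 0.37 after 1000 episodes and would sit at the assumed floor well before the 10000 episodes listed above.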

Evaluation results

mean_reward on MountainCar-v0 (self-reported): -120.10 +/- 19.30
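
A figure in this "mean +/- spread" form can be computed from a list of per-episode returns as sketched below. This is an illustration only: the function names are hypothetical, and the use of the population standard deviation is an assumption, since the card does not say how the spread was calculated or how many evaluation episodes were run.

```python
import statistics


def summarize_returns(episode_returns):
    """Return (mean, standard deviation) of per-episode total rewards."""
    mean = statistics.fmean(episode_returns)
    # Population standard deviation; a sample standard deviation
    # (statistics.stdev) would be slightly larger.
    std = statistics.pstdev(episode_returns)
    return mean, std


def format_result(mean, std):
    """Format a summary in the same style as the reported result."""
    return f"{mean:.2f} +/- {std:.2f}"
```

For context, MountainCar-v0 gives a reward of -1 per step and caps episodes at 200 steps, so an agent that never reaches the goal scores -200 per episode.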