Reinforcement Learning with Variational Bayesian Inference and Elastic Weight Consolidation (EWC).
In this project, we present an implementation of a deep reinforcement learning agent based on the Deep Deterministic Policy Gradient (DDPG) algorithm.
The underlying Artificial Neural Network (ANN) has an architecture similar to a typical ResNet, though with a few differences, since architectures such as ResNet-50 are simply too large for this project.
Our agent uses Variational Bayesian Networks: in broad terms, the weights of the network are represented by probability distributions rather than single point values. This lets the agent quantify uncertainty in its weights and, in turn, in its predictions.
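As a minimal sketch of the idea (illustrative only, not this repository's exact layer), a variational linear layer in PyTorch can store a mean and a log-variance per weight and sample via the reparameterization trick; the KL term is what a factor such as the --kl_weight flag would scale in the training loss:

import torch
import torch.nn as nn
import torch.nn.functional as F

class VariationalLinear(nn.Module):
    # Weights are diagonal Gaussians (mean + log-variance) instead of
    # single point values. Class and attribute names are illustrative.
    def __init__(self, in_features, out_features):
        super().__init__()
        self.weight_mu = nn.Parameter(torch.zeros(out_features, in_features))
        self.weight_logvar = nn.Parameter(
            torch.full((out_features, in_features), -5.0))

    def forward(self, x):
        # Reparameterization trick: w = mu + sigma * eps keeps the
        # sampling step differentiable with respect to mu and logvar.
        sigma = torch.exp(0.5 * self.weight_logvar)
        weight = self.weight_mu + sigma * torch.randn_like(sigma)
        return F.linear(x, weight)

    def kl_divergence(self):
        # KL(q(w) || N(0, 1)) for a diagonal Gaussian posterior; added
        # to the loss, scaled by the KL weight.
        return 0.5 * torch.sum(
            self.weight_mu ** 2 + self.weight_logvar.exp()
            - 1.0 - self.weight_logvar
        )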
Further, the agent uses Elastic Weight Consolidation (EWC), a method for preventing catastrophic forgetting. Intuitively, EWC makes weights resistant to change in proportion to their importance: weights that matter for the current task are penalized more heavily when they drift, and are therefore less likely to be overwritten when the environment changes.
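Concretely, EWC adds a quadratic penalty that anchors each weight to its value from the previous task, scaled by an importance estimate (typically the diagonal of the Fisher information). The helper below is a hypothetical sketch of that penalty, not this repository's API; fisher, old_params, and lam are assumed inputs:

import torch

def ewc_penalty(model, fisher, old_params, lam):
    # fisher and old_params are dicts keyed by parameter name, recorded
    # after training on the previous task; lam scales the penalty.
    # Weights with a large Fisher value (important for the old task)
    # are pulled strongly back toward their old values.
    loss = torch.zeros(())
    for name, param in model.named_parameters():
        loss = loss + (fisher[name] * (param - old_params[name]) ** 2).sum()
    return 0.5 * lam * loss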
The agent is demonstrated on Atari games such as Space Invaders and Breakout. However, it is not limited to these games, and can be used in any environment with a discrete action space.
Install the dependencies:
pip3 install -r requirements.txt
Install the ROMs for the Atari games; a guide can be found here:
https://github.com/openai/atari-py
The following is an example command, which should run out of the box.
python3 train.py --render --verbose --epsilon_decay 0.99911 --environment space_invaders --n_step 4 --lr_model 0.001 --frames 4 --frame_skip 4 --batch_size 32 --gamma 0.99 --epsilon_min 0.01 --save --max_episodes 1000 --params_update 4 --memory_capacity 5000 --backup
If your device supports CUDA, you can use the GPU to train the model. To do so, add the --use_cuda flag to the command above.
More information about the flags can be found by running python3 train.py --help.
Once training has completed, or has been stopped via Ctrl+C, the model will be saved inside a new directory, and the metrics will be saved there as a .npy file.
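The saved metrics can then be inspected with NumPy; the path below is a placeholder, since the actual directory name depends on the run:

import numpy as np

# "run_dir/metrics.npy" is a hypothetical path; substitute the directory
# created by your training run.
metrics = np.load("run_dir/metrics.npy", allow_pickle=True)
print(metrics)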
usage: train.py [-h] [--environment {breakout,space_invaders,tennis}]
[--max_episodes MAX_EPISODES] [--max_steps MAX_STEPS]
[--memory_capacity MEMORY_CAPACITY] [--batch_size BATCH_SIZE]
[--epsilon EPSILON] [--epsilon_decay EPSILON_DECAY]
[--epsilon_min EPSILON_MIN] [--lr_model LR_MODEL]
[--seed SEED] [--kl_weight KL_WEIGHT] [--gamma GAMMA] [--save]
[--plot] [--verbose] [--use_cuda] [--frame_skip FRAME_SKIP]
[--test_model TEST_MODEL] [--backup] [--n_step N_STEP]
[--frames FRAMES] [--params_update PARAMS_UPDATE] [--imwrite]
[--render | --slow_render]
optional arguments:
-h, --help show this help message and exit
--environment {breakout,space_invaders,tennis}
--max_episodes MAX_EPISODES
--max_steps MAX_STEPS
--memory_capacity MEMORY_CAPACITY
--batch_size BATCH_SIZE
--epsilon EPSILON
--epsilon_decay EPSILON_DECAY
--epsilon_min EPSILON_MIN
--lr_model LR_MODEL
--seed SEED
--kl_weight KL_WEIGHT
--gamma GAMMA
--save
--plot
--verbose
--use_cuda
--frame_skip FRAME_SKIP
--test_model TEST_MODEL
--backup
--n_step N_STEP
--frames FRAMES
--params_update PARAMS_UPDATE
--imwrite
--render
--slow_render
RuntimeError: cuDNN error: CUDNN_STATUS_MAPPING_ERROR
This is caused by the frequency at which the parameters are updated. To fix it, decrease the value of the --params_update flag in the command above.
RuntimeError: CUDA out of memory
Apply the same fix as above; reducing --batch_size or --memory_capacity may also help.
Rendering doesn't work; env.render() complains.
This is caused by the --render flag. To fix this, remove the --render flag from the command above if it is present.
ValueError: crop_width has an invalid length: 3
This happens if you've installed the ROMs via ale-py (the Arcade Learning Environment) instead of atari-py. To fix this, uninstall ale-py and install atari-py instead. (See https://github.com/openai/atari-py)