This repo aims to replicate the results of Expected Policy Gradients (EPG) in PyTorch.
Dependencies:
- Python
- PyTorch (tested on 1.8.1+cpu and 1.9.0+cpu)
- OpenAI Gym
- MuJoCo (Warning: MuJoCo is not supported on Apple Silicon)
- numpy
- numdifftools (only for epg_rb_target_numdifftools.py and epg_vanilla.py)
- matplotlib and pandas (for graphing)
Install the dependencies:
pip install -r requirements.txt
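After installing, it can help to confirm that the packages listed above are importable before starting a long training run. The following is a minimal sketch, not part of the repo. The import names (in particular `mujoco_py` for MuJoCo and `gym` for OpenAI Gym) are assumptions and may differ depending on the versions pinned in requirements.txt.

```python
import importlib.util

# Import names are assumptions; adjust them to match requirements.txt.
REQUIRED_MODULES = [
    "torch",
    "gym",
    "mujoco_py",
    "numpy",
    "numdifftools",  # only needed by epg_rb_target_numdifftools.py and epg_vanilla.py
    "matplotlib",
    "pandas",
]


def find_missing(module_names):
    """Return the top-level module names that cannot be found on this system."""
    return [name for name in module_names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = find_missing(REQUIRED_MODULES)
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All listed modules were found.")
```

This only checks that each module can be located. It does not import the modules or verify their versions.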
Environments:
- InvertedPendulum-v2
- HalfCheetah-v2
- Reacher-v2
- Walker2d-v2
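A quick way to see whether the environments listed above can be created on a given machine is to try building each one. The sketch below is illustrative and not part of the repo. `check_envs` is a hypothetical helper, and it assumes a Gym version that still registers the `-v2` MuJoCo environments.

```python
ENV_IDS = [
    "InvertedPendulum-v2",
    "HalfCheetah-v2",
    "Reacher-v2",
    "Walker2d-v2",
]


def check_envs(env_ids, make_env):
    """Try to build each environment; map each id to None or an error message."""
    results = {}
    for env_id in env_ids:
        try:
            env = make_env(env_id)
            env.close()
            results[env_id] = None
        except Exception as exc:  # e.g. MuJoCo missing or id not registered
            results[env_id] = f"{type(exc).__name__}: {exc}"
    return results


if __name__ == "__main__":
    try:
        import gym
    except ImportError:
        print("gym is not installed")
    else:
        for env_id, error in check_envs(ENV_IDS, gym.make).items():
            print(env_id, "OK" if error is None else f"FAILED ({error})")
```

Passing the factory function (`gym.make`) as an argument keeps the helper usable with other environment constructors.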
Train a model by running one of the training scripts (spg.py, ddpg.py, or any epg_*.py):
python [spg.py|ddpg.py|epg_*.py]
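The bracket notation above means "pick one script". As a rough sketch, the matching scripts in a checkout can be listed like this. `list_training_scripts` is a hypothetical helper, not something the repo provides, and it assumes the scripts sit in the directory passed to it.

```python
from pathlib import Path


def list_training_scripts(repo_dir="."):
    """Return spg.py and ddpg.py (if present) followed by all epg_*.py scripts, sorted."""
    root = Path(repo_dir)
    scripts = [name for name in ("spg.py", "ddpg.py") if (root / name).is_file()]
    scripts += sorted(path.name for path in root.glob("epg_*.py"))
    return scripts


if __name__ == "__main__":
    # Print one runnable command per training script found in the current directory.
    for script in list_training_scripts():
        print(f"python {script}")
```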
Generate a graph from the existing data:
python figure.py
Generate a graph comparing variation4 and DDPG on HalfCheetah-v2:
python variation4/test_curve.py