Welcome to the CognitiveRobotics_Robo_Control repository!
This repository contains the code developed for the final project of the course Cognitive Robotics offered at the University of Groningen during the academic year 2019-2020.
The goal of the project can be summarized as follows:
- Developing a novel controller that generates trajectories for a robotic arm.
More precisely, the attempt was made to train a Reinforcement Learning (RL) agent to control a robotic arm in joint space without having to perform possibly expensive Inverse Kinematics operations. The controller's task is to compute changes to the current joint angles of all controlled joints such that the robotic arm reaches a given goal location and the fingers of the end-effector's gripper point towards that location. The RL agent was only given the following information (a rough sketch of how such an observation vector might be assembled follows the list):
- The goal position in Cartesian space
- Its end-effector's Center of Mass (COM) Cartesian position
- The normalized vector expressing the orientation of the end-effector's fingers
- The normalized vector pointing from the end-effector's COM towards the goal location
- The set of the robot's current joint angles in radians
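For illustration only, this observation could be assembled roughly as sketched below; the function and variable names are placeholders and do not reflect the exact code in customRobotEnv.py:

```python
import numpy as np

def build_observation(goal_pos, ee_com, finger_axis, joint_angles):
    """Illustrative assembly of the agent's observation from the quantities listed above."""
    to_goal = goal_pos - ee_com
    to_goal = to_goal / np.linalg.norm(to_goal)               # normalized vector towards the goal
    finger_axis = finger_axis / np.linalg.norm(finger_axis)   # normalized finger orientation
    return np.concatenate([goal_pos, ee_com, finger_axis, to_goal, joint_angles])
```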
In order to achieve this goal, the physics simulation software Pybullet is employed, in which a Franka Emika Panda, i.e. a robotic arm, is simulated.
The simulated arm is controlled by a Proximal Policy Optimization (PPO) Reinforcement Learning agent, where the implementation of the PPO algorithm is provided by Stable-Baselines.
A customized Gym environment, called PandaRobotEnv and defined in customRobotEnv.py, acts as a bridge between the Pybullet simulation and the PPO agent.
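As a rough, simplified sketch of this bridge pattern (not the actual PandaRobotEnv; observation and action dimensions, reward, and simulation setup are placeholders):

```python
import gym
import numpy as np
import pybullet as p
from stable_baselines import PPO2

class MinimalPandaLikeEnv(gym.Env):
    """Toy Gym environment wrapping a Pybullet simulation (illustrative only)."""

    def __init__(self):
        self.client = p.connect(p.DIRECT)   # headless physics server
        self.action_space = gym.spaces.Box(-1.0, 1.0, shape=(7,), dtype=np.float32)
        self.observation_space = gym.spaces.Box(-np.inf, np.inf, shape=(19,), dtype=np.float32)

    def reset(self):
        p.resetSimulation(physicsClientId=self.client)   # load the robot and goal here
        return np.zeros(19, dtype=np.float32)

    def step(self, action):
        # Interpret the action as changes to the joint angles, step the physics,
        # and compute a reward from the distance and orientation towards the goal.
        p.stepSimulation(physicsClientId=self.client)
        obs, reward, done = np.zeros(19, dtype=np.float32), 0.0, False
        return obs, reward, done, {}

if __name__ == "__main__":
    model = PPO2("MlpPolicy", MinimalPandaLikeEnv(), verbose=1)
    model.learn(total_timesteps=10000)
```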
The model of the robotic arm is provided by pybullet_robots.
When designing the aforementioned PandaRobotEnv, the KukaGymEnv, which is included in this repository as a reference environment and originally shipped with the Pybullet installation, served as inspiration for some core functions.
However, the Gym environment's functionality has been thoroughly redesigned and augmented to meet our custom goals and to be compatible with both Stable-Baselines' PPO implementation and the Franka Emika Panda.
This repository contains functionality to train PPO agents on controlling a Franka Emika Panda in joint space using different reward functions and modes, as well as to both visually render and record the performance of trained PPO agents performing their assigned task.
Furthermore, drawing upon a separate repository devoted to the evaluation of this project, which is included as a Git-submodule, the repository contains a set of trained agents, the evaluation of their training outcomes, and the functionality used to perform the evaluation.
An example video showing the evolution of the training progress of one trained PPO agent can be found on YouTube.
Note: The repository has been set up using Python 3.
Software needed for running the code used in this project can be installed using pip as follows:
pip install tensorflow
pip install pybullet
pip install stable-baselines
pip install argparse
For recording videos of trained agents, additional software is needed. Under Ubuntu, it can be installed with the following command:
sudo apt-get install ffmpeg
To use the included submodules containing the robot models, trained models, and evaluation data, they have to be loaded manually by executing the following command once:
git submodule update --init --recursive
To get updated versions of the submodules at some later point, call:
git submodule update --recursive --remote
Note: In case of problems with the submodules, check out StackOverflow
In the following, the separate functionalities are briefly introduced.
For training a new or existing PPO agent, the main.py file can be used.
The file takes two optional arguments when being started:
- -p: A path to a JSON file containing parameter specifications to be used for training.
- -r: A path to a trained model which is to be loaded for the continuation of its training. When continuing training, a new folder is created and the counting of weight updates starts at 0 again. However, the trained model is reused, and the path to the read-in model is recorded in the documentation of used parameters, which is stored in both params.csv and params.json.
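A minimal sketch of how these two flags could be handled (the long option names are invented for the sketch, and the actual argument handling in main.py may differ):

```python
import argparse
import json

parser = argparse.ArgumentParser(description="Train a PPO agent on the Panda environment.")
parser.add_argument("-p", "--params", default=None,
                    help="Path to a JSON file with training parameter specifications.")
parser.add_argument("-r", "--resume", default=None,
                    help="Path to a trained model whose training is to be continued.")
args = parser.parse_args()

params = {}
if args.params is not None:
    with open(args.params) as f:
        params = json.load(f)   # e.g. reward mode, number of time steps, learning rate
```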
For the training process, a folder Results/models_unique_folder is created in the repository, where models_unique_folder is a unique identifier for each model.
This folder contains all data associated with the model's training process, such as checkpoints and documentation files.
Example: Starting training a new agent with parameter settings specified in the file params_6.json:
python3 main.py -p ParameterSettings/params_6.json
Example: Starting training a new agent with default parameter settings:
python3 main.py
A trained model can be visually inspected using the run_trained_model.py file.
When starting the script, 0, 1, or 2 arguments can be provided.
Example: Observe how a given trained default model performs:
python3 run_trained_model.py
Example: Run a specific model provided to the code as an argument:
python3 run_trained_model.py Evaluation_CognitiveRobotics_Robo_Control/Results/PPO2/PandaController_2019_08_11__15_41_05__262730fzyxnprhgl/final_model.zip
Example: Run a specific model provided to the code as a first(!) argument for 1000 time steps given as a second(!) argument:
python3 run_trained_model.py Evaluation_CognitiveRobotics_Robo_Control/Results/PPO2/PandaController_2019_08_11__15_41_05__262730fzyxnprhgl/final_model.zip 1000
Note: In case that the data cannot be found, make sure to load the submodules (as described above).
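Loading a saved Stable-Baselines model and stepping it through the environment generally follows the pattern below; the model path is a placeholder and the PandaRobotEnv constructor arguments are omitted, so treat this as a sketch rather than the script's exact code:

```python
from stable_baselines import PPO2
from customRobotEnv import PandaRobotEnv   # assumption: importable from this repository

model = PPO2.load("final_model.zip")       # placeholder path to a trained model
env = PandaRobotEnv()                      # constructor arguments omitted for brevity

obs = env.reset()
for _ in range(1000):                      # number of time steps to watch
    action, _states = model.predict(obs, deterministic=True)
    obs, _reward, done, _info = env.step(action)
    if done:
        obs = env.reset()
```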
record_video_of_performing_trained_model.py is the file to record video sequences of a trained agent.
It will create the file structure VideoRecordings/model_name/Recording_date_some_info.mp4.
It can be called without any arguments to record videos of a default model. Alternatively, it can also be called given an argument, which is supposed to be a path to a trained model.
Example: Record a video sequence of a default model:
python3 record_video_of_performing_trained_model.py
Example: Record a video sequence of a specific model:
python3 record_video_of_performing_trained_model.py Evaluation_CognitiveRobotics_Robo_Control/Results/PPO2/PandaController_2019_08_11__15_41_05__262730fzyxnprhgl/final_model.zip
Note: By default, all video sequences are supposed to last 1000 time steps of the simulation.
To change this, adjust the value of VIDEO_LENGTH in said file.
However, due to technical issues, videos still tend to encompass more time steps than the provided number.
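For reference, recording with Stable-Baselines typically relies on wrapping a vectorized environment in a VecVideoRecorder, roughly as sketched below; this shows the general pattern, not necessarily the exact approach taken in record_video_of_performing_trained_model.py, and the model path is a placeholder:

```python
from stable_baselines import PPO2
from stable_baselines.common.vec_env import DummyVecEnv, VecVideoRecorder
from customRobotEnv import PandaRobotEnv   # assumption: importable from this repository

VIDEO_LENGTH = 1000                        # requested number of recorded time steps

model = PPO2.load("final_model.zip")       # placeholder path to a trained model
env = DummyVecEnv([lambda: PandaRobotEnv()])
env = VecVideoRecorder(env, "VideoRecordings/",
                       record_video_trigger=lambda step: step == 0,
                       video_length=VIDEO_LENGTH)

obs = env.reset()
for _ in range(VIDEO_LENGTH + 1):
    action, _ = model.predict(obs)
    obs, _, _, _ = env.step(action)
env.close()                                # finalizes the .mp4 via ffmpeg
```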
The included git-submodule Evaluation_CognitiveRobotics_Robo_Control contains a set of trained models, the evaluations of both the training and the final training outcome,
and the tools used for the evaluation.
The tools contain extensive inline comments and class definitions explaining how the evaluation is done. Feel free to consult the attached project report for an overview.
All trained models are saved in separate folders. Each folder contains training checkpoints, files describing which parameters were used for training, and documentation of the training process.
callback.py is used by the PPO agent to log training progress and to save checkpoints.
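In Stable-Baselines 2, such a callback is simply a callable that learn() invokes during training; below is a minimal sketch of the pattern (the actual callback.py is more elaborate, and the locals_ keys used here are assumptions about PPO2's training loop):

```python
import os

def checkpoint_callback(locals_, globals_):
    """Called by model.learn() during training; returning False would abort training."""
    update = locals_.get("update", 0)      # assumption: PPO2 exposes its update counter here
    if update % 50 == 0:                   # save a checkpoint every 50 weight updates
        locals_["self"].save(os.path.join("Results", "checkpoint_{}".format(update)))
    return True

# Usage: model.learn(total_timesteps=100000, callback=checkpoint_callback)
```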
start.sh is not particularly important to the project, but is the script for running the training process on the University's Peregrine cluster.
It has been kept for the convenience of the developers.
kukaGymEnv.py served as inspiration for designing our own Gym environment. It is copied from the example environments shipped with the Pybullet installation and kept for comparison.
That's it. Have fun with the repository!