This is the replication package for SEIDR, a framework for AI-assisted program synthesis. Given a problem description and some input-output examples, the framework generates a program that solves the problem. SEIDR was published in the GECCO '23 proceedings and is undergoing revision for an extended version in the ACM TELO journal.
- TELO journal paper: Fully Autonomous Programming using Iterative Multi-Agent Debugging with Large Language Models
- Original GECCO'23 conference paper, extended for the TELO journal: Fully Autonomous Programming with Large Language Models
Consider citing the work if you use SEIDR in your research.
The src/ folder is organized as follows:
- seidr - code for running SEIDR on PSB2 (with benchmark.py) or HumanEval (with benchmark_humaneval.py)
- scripts - Slurm scripts used to run SEIDR with a specific model on a specific dataset
- config - CSV tables of the experimental setup, where each row corresponds to one problem in a dataset; each table is in a subfolder named after the dataset
- psb2-meta - natural language descriptions of PSB2 problems
To explore the package documentation in a Python session, run:
from seidr import dev
help(dev)
The experiments are contained in benchmark.py and benchmark_humaneval.py. When you run one of these files, the AI-generated programs are committed to a dedicated GitHub repository, while the metrics (i.e., how many tests each program passes) are logged in your Weights and Biases account.
Either install the project with Poetry or install seidr from PyPI.

From PyPI:
pip install seidr

With Poetry and Python 3.11 (or a later version):
cd SEIDR_TELO
poetry env use python3.11
poetry install

With the Python venv module:
cd SEIDR_TELO
python3.11 -m venv venv
source venv/bin/activate
pip install -r src/requirements_src.txt

Note that depending on your Python version management, you may need to change python3.11 to another alias or Python executable.
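A quick way to verify the installation (a minimal sanity check; it assumes the environment created above is active):

# Confirm that the package can be imported
python -c "import seidr; print('seidr imported successfully')"
# Show the installed version and location
pip show seidr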
- Create an account on Weights and Biases
- Install the Weights and Biases library
- Run wandb login and follow the instructions
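Concretely, the last two steps amount to the following (a minimal sketch; wandb is the official Weights and Biases client):

# Install the client, then authenticate with the API key from your account settings
pip install wandb
wandb login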
- Go to GitHub and log in to the account that will push AI-generated code. Remember the $username and $email for that account.
- Generate a personal access $token (on GitHub: Settings → Developer settings → Personal access tokens)
- Set GIT_USER to "Bot" or whatever the name of the committer shall be
- Set GIT_EMAIL to $email
- Set GIT_REMOTE to https://$username:$[email protected]/$repo

Note that you can use a non-GitHub git hosting service.
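To confirm that the token and remote URL work before running experiments, you can query the remote (a minimal check; it assumes GIT_REMOTE is already exported):

# List remote references; this fails if the URL or token is invalid
git ls-remote "$GIT_REMOTE"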
An OpenAI account with access to gpt-3.5-turbo is needed, with an OPENAI_API_KEY environment variable set to your OpenAI API access token.
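A quick way to check the key (a minimal sketch against OpenAI's public model-listing endpoint):

# A successful response that lists gpt-3.5-turbo confirms access
curl -s https://api.openai.com/v1/models \
  -H "Authorization: Bearer $OPENAI_API_KEY" | grep gpt-3.5-turbo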
Run Ollama with Llama 3 8B or another model locally or on a server. In the latter case, start the Ollama server with the following commands and note the URL:PORT pair:
OLLAMA_HOST=URL:PORT ollama serve &
OLLAMA_HOST=URL:PORT ollama pull llama3 &
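To verify that the server is reachable and the model has been pulled (a minimal check using Ollama's standard HTTP API):

# Lists the models available on the server; llama3 should appear in the output
curl http://URL:PORT/api/tags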
Example .config file layout:
# Github
export GIT_REMOTE=https://USERNAME:[email protected]/SOLUTIONS_REPO
export GIT_USER=...
export GIT_EMAIL=...
# Data
export DATA_PATH=...
# OpenAI
export OPENAI_API_KEY=...
export OPENAI_ORG=...
# WandB
export WANDB_ENTITY=...
export WANDB_DIR=...

If you're using Slurm, write a run.sh file that calls python benchmark.py
and run it with sbatch --array=1-500 run.sh.
If not, run TASK_ID=n python benchmark.py to re-run one of our experiments exactly,
or set the parameters yourself as below.
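A run.sh along these lines should work (a sketch; it assumes benchmark.py reads the experiment index from the TASK_ID environment variable, as in the command above):

#!/bin/bash
# Load the environment variables (git, OpenAI, WandB) from the config file
source .config
# Map the Slurm array index to the experiment's TASK_ID
TASK_ID=$SLURM_ARRAY_TASK_ID python benchmark.py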
For example, for the bowling problem in PSB2, run SEIDR without lexicase selection as follows:
python3 benchmark.py \
--task_id 0 \
--problem bowling \
--language Python \
--branching_factor 2 \
--max_programs 100 \
--drafts_per_prompt 2 \
--explanations_per_program 2 \
--repairs_per_explanation 2 \
--beam_width 2 \
--log INFO \
--lexicase_selection False \
--dataset psb2 \
--model_name gpt-3.5-turbo \
--valid_examples 50 \
--experiment_id 0
To run SEIDR with Llama 3 served by Ollama at URL:PORT on HumanEval with lexicase selection, run the following:
python3 benchmark_humaneval.py \
--task_id 0 \
--problem Python/0 \
--language Python \
--branching_factor 2 \
--max_programs 100 \
--drafts_per_prompt 2 \
--explanations_per_program 2 \
--repairs_per_explanation 2 \
--beam_width 2 \
--log INFO \
--lexicase_selection True \
--dataset humaneval \
--model_name llama3 \
--experiment_id 0 \
--ollama_url "http://URL:PORT"
Example Slurm scripts are stored in scripts/, and tables with hyperparameters in config/.