Scaling Laws For Scalable Oversight

This repo contains the code to reproduce the plots for our paper Scaling Laws For Scalable Oversight.

Reproducing each figure

Below are instructions for reproducing each figure. They are aspirational, so some steps may not work exactly as written.

Requirements

The Python packages required to run this repo are:

numpy
matplotlib
pandas
python-dotenv
tqdm

We recommend creating a new Python venv, activating it, and then installing these packages, e.g.

python -m venv scaling
source scaling/bin/activate
pip install numpy matplotlib pandas python-dotenv tqdm

Let us know if anything does not work with this environment!
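
As a quick sanity check before running any of the scripts, something like the following sketch can confirm that the packages above are importable from the active environment. It is not part of the repo; the helper name `find_missing` is hypothetical, and it assumes each package is imported under the name shown in the list (the `dotenv` module is distributed on PyPI as `python-dotenv`).

```python
import importlib.util

# Import names for the packages listed above (assumption: the repo imports
# them under these names; python-dotenv is imported as "dotenv").
REQUIRED_MODULES = ["numpy", "matplotlib", "pandas", "dotenv", "tqdm"]


def find_missing(modules):
    """Return the module names that cannot be located in this environment."""
    return [name for name in modules if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = find_missing(REQUIRED_MODULES)
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All required packages are importable.")
```

Run it with the venv activated; any package it reports as missing can be installed with pip as shown above.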

Mafia

cd mafia
./script.sh

Debate

cd debate
./script.sh

Backdoor Code

Wargames

Wargames plots can be created by running:

cd wargames
# Run ./run_many_ai_control.sh repeatedly to collect more samples. Note that each invocation already runs each game many times in parallel.
./run_many_ai_control.sh
./run_analyze_strategies.sh
python3 analyze_ai_control.py
python3 analyze_strategies.py

Combined Plots

Combined plots can be generated by running the notebook elo/all_analysis.ipynb.

Theory

All theory plots can be generated by running:

cd theory
./experiment_with_optimal_strat.sh
python3 plot_theory.py
python3 git.py

subhashk01/oversight-scaling-laws
