Pinned
-
dataset_foundry (Public)
A toolkit for building validated datasets. Uses data pipelines to load, generate, and validate datasets, especially those used in AI safety evaluations.
Python 1
-
single_file_backdoors (Public)
Evaluates how AI models might inject backdoors when refactoring single files and how to detect and defend against such insertions.
-
full-repo-refactor (Public)
A Control Arena setting for evaluating agents that insert backdoors while refactoring full repos.
Python
-
full_repo_datasets (Public)
Contains datasets of full repos for AI safety research, along with Dataset Foundry pipelines for generating repos and datasets.
Python
-
llm-action-evals (Public)
A framework that allows non-programmers to build AI safety evals of LLMs taking actions in the real world via function-calling.
Python
-
ai-digest-demo (Public)
Demo of targeted AI persuasion that builds a profile of a user from their Facebook Likes. Developed as an 8-hour take-home test for AI Digest.
TypeScript