A calculator for estimating the compute cost of building a sparse autoencoder layer into an LLM to make the concepts inside that LLM interpretable. Auto-published at https://huge.github.io/interpretable-layer-cost
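A minimal sketch of the kind of estimate the calculator targets, assuming the common "~6 FLOPs per parameter per token" rule of thumb for training and a one-hidden-layer sparse autoencoder (encoder, decoder, biases) over a single activation stream, as in the monosemantic-features setup. All names and default values below (`d_act`, `n_features`, the 7B base model, the token count) are illustrative assumptions, not measured settings of this project.

```python
def sae_params(d_act: int, n_features: int) -> int:
    """Parameter count of a sparse autoencoder over a d_act-dimensional
    activation vector with n_features dictionary features:
    encoder (d_act x n_features), decoder (n_features x d_act),
    plus feature and output biases."""
    return 2 * d_act * n_features + n_features + d_act

def training_flops(params: int, tokens: int) -> float:
    """Rough training cost: ~6 FLOPs per parameter per token
    (forward plus backward pass)."""
    return 6.0 * params * tokens

def activation_harvest_flops(llm_params: int, tokens: int) -> float:
    """Cost of running the base LLM forward to collect the activations
    the autoencoder trains on: ~2 FLOPs per parameter per token."""
    return 2.0 * llm_params * tokens

if __name__ == "__main__":
    d_act = 4096             # assumed activation width of the host LLM layer
    n_features = 8 * d_act   # assumed dictionary expansion factor of 8
    tokens = int(8e9)        # assumed number of training activations (tokens)
    llm_params = int(7e9)    # assumed base-model size (e.g. a 7B model)

    p = sae_params(d_act, n_features)
    total = training_flops(p, tokens) + activation_harvest_flops(llm_params, tokens)
    print(f"SAE parameters: {p:,}")
    print(f"Estimated total FLOPs: {total:.3e}")
```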
todo:
- add an intro about http://transformer-circuits.pub/2023/monosemantic-features/index.html#problem-setup (a short read)
- explain the parameters (as in http://transformer-circuits.pub/2023/monosemantic-features/index.html#problem-setup ) and draw the structure of the inserted layer
- sketch OSS replication efforts and ideas to be explored/developed on public-weights LLMs
- instead of a parameter for the raw number of training samples, allow setting a target precision from a scaling law (see the sketch after this list)
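A hedged sketch of the last idea, assuming a simple power-law fit of loss over training tokens, `loss(D) ≈ a * D**(-b)`, which can be inverted to turn a target loss into a sample count. The functional form and the coefficients `a`, `b` are placeholders to be fitted on real runs, not published values.

```python
def tokens_for_target_loss(target_loss: float, a: float, b: float) -> float:
    """Invert the assumed scaling law loss = a * D**(-b) for the token count D."""
    if target_loss <= 0 or a <= 0 or b <= 0:
        raise ValueError("target_loss, a and b must be positive")
    return (a / target_loss) ** (1.0 / b)

# Example with made-up coefficients: a=50.0, b=0.3 and a target
# reconstruction loss of 0.05 give roughly 1e10 training tokens.
print(f"{tokens_for_target_loss(0.05, a=50.0, b=0.3):.3e}")
```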