Open
Description
As discussed on #1270, we might want to add a small integration test that builds a transformer from scratch using our blocks and runs a small amount of training on it.
The test should assert that the loss stays finite and decreases over the training run.
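To make the assertion part concrete, here is a rough sketch of what the check might look like. The helper name `check_loss_curve`, the `window` parameter, and the fake loss values are all hypothetical. The model construction and training loop are left out because they depend on our blocks. "Decreasing" is read loosely here (late average below early average), since a strict step-by-step decrease would likely be flaky with minibatch noise.

```python
import math
from statistics import mean
from typing import Sequence


def check_loss_curve(losses: Sequence[float], window: int = 5) -> None:
    """Raise AssertionError unless every loss is finite and the curve trends down.

    Assumption: "decreasing" means the mean of the last `window` losses is
    lower than the mean of the first `window` losses, not a strict decrease
    at every step.
    """
    assert len(losses) >= 2 * window, "need at least 2 * window recorded losses"
    for step, loss in enumerate(losses):
        assert math.isfinite(loss), f"non-finite loss {loss!r} at step {step}"
    start, end = mean(losses[:window]), mean(losses[-window:])
    assert end < start, f"loss did not decrease: {start:.4f} -> {end:.4f}"


# Stand-in for the losses the real training loop would record
# (hypothetical values, not measured results).
fake_losses = [4.0 * 0.9 ** step + 0.5 for step in range(20)]
check_loss_curve(fake_losses)
```

In the actual test, `fake_losses` would be replaced by the per-step losses collected while training the small transformer for a fixed number of steps with a fixed seed.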