Description
Our `generate()` passes are probably the single most complex compiled graphs we ship. It would be useful to have a test that asserts that the numerics of these routines are stable across torch, tf, and jax for real model checkpoints.
One version of this would be to add an integration test that loads a small generative preset, prompts the model with a fixed text string, and runs a short amount of generation with beam, contrastive, and greedy sampling (which are all deterministic). A rough sketch of what that could look like is below.
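A minimal sketch of such a test, assuming a small preset like `gpt2_base_en`, the public sampler classes, and golden strings that would be recorded once from a reference run and checked in (the values below are placeholders, not real outputs):

```python
import pytest
import keras_nlp

PROMPT = "The quick brown fox"
MAX_LENGTH = 16

# Placeholder golden outputs; these would be recorded from a reference
# backend and checked in before the test is enabled.
GOLDEN = {
    "greedy": None,
    "beam": None,
    "contrastive": None,
}

SAMPLERS = {
    "greedy": keras_nlp.samplers.GreedySampler(),
    "beam": keras_nlp.samplers.BeamSampler(num_beams=4),
    "contrastive": keras_nlp.samplers.ContrastiveSampler(k=4, alpha=0.6),
}


@pytest.mark.parametrize("name", sorted(SAMPLERS))
def test_generate_is_stable_across_backends(name):
    # Load a small generative preset and generate from a fixed prompt.
    model = keras_nlp.models.GPT2CausalLM.from_preset("gpt2_base_en")
    model.compile(sampler=SAMPLERS[name])
    output = model.generate(PROMPT, max_length=MAX_LENGTH)
    if GOLDEN[name] is None:
        pytest.skip("Golden output not recorded yet.")
    # The same golden string should be reproduced on torch, tf, and jax.
    assert output == GOLDEN[name]
```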
We could consider going further and testing random samplers with a fixed seed, but this would need to be done per backend, so it's a little more annoying.
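For the seeded random-sampler case, one possible shape (again a hypothetical sketch, with per-backend placeholder goldens keyed on `keras.backend.backend()` since each backend has its own RNG):

```python
import keras
import keras_nlp
import pytest

SEED = 1337

# Placeholder goldens, recorded once per backend.
GOLDEN_RANDOM = {"tensorflow": None, "jax": None, "torch": None}


def test_random_sampler_with_fixed_seed():
    # Fix the global seed so the random sampler is deterministic per backend.
    keras.utils.set_random_seed(SEED)
    model = keras_nlp.models.GPT2CausalLM.from_preset("gpt2_base_en")
    model.compile(sampler=keras_nlp.samplers.RandomSampler(seed=SEED))
    output = model.generate("The quick brown fox", max_length=16)
    expected = GOLDEN_RANDOM[keras.backend.backend()]
    if expected is None:
        pytest.skip("Golden output not recorded for this backend yet.")
    assert output == expected
```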