(issue opened after discussion with @lhoestq)
In `InferenceClient.chat_completion`, one can pass a `response_format` which constrains the output format. It must be either a regex or a JSON schema. A typical use case is that you have a dataclass or a Pydantic model and you want the LLM to generate an instance of that class. This can currently be done like this:
```python
client.chat_completion(..., response_format={"type": "json", "value": MyCustomModel.schema()})
```
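For concreteness, here is a minimal end-to-end sketch of the current workflow (the model name and the `Recipe` class are illustrative; `.schema()` is the Pydantic v1 spelling, v2 uses `model_json_schema()`):

```python
from huggingface_hub import InferenceClient
from pydantic import BaseModel

# Hypothetical model we want the LLM to instantiate.
class Recipe(BaseModel):
    name: str
    ingredients: list[str]
    minutes: int

client = InferenceClient("meta-llama/Meta-Llama-3-8B-Instruct")

# Today, the schema has to be serialized by hand.
response = client.chat_completion(
    messages=[{"role": "user", "content": "Give me a quick pasta recipe."}],
    response_format={"type": "json", "value": Recipe.model_json_schema()},
    max_tokens=500,
)

# The answer is a JSON string matching the schema, so it can be
# validated straight back into the Pydantic model.
recipe = Recipe.model_validate_json(response.choices[0].message.content)
```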
It would be good to either:
- document this particular use case for convenience
- or even allow passing `client.chat_completion(..., response_format=MyCustomModel)` and handle the serialization automatically before making the HTTP call (a sketch of such a shim follows this list). If we do so, pydantic shouldn't become a dependency.
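A minimal sketch of what the automatic serialization could look like, assuming we duck-type the class instead of importing pydantic (the helper name is hypothetical):

```python
from typing import Any

def _normalize_response_format(response_format: Any) -> Any:
    """Hypothetical helper: turn a Pydantic model class into the
    {"type": "json", "value": <schema>} dict expected by the server,
    without importing pydantic (duck-typing only)."""
    if isinstance(response_format, type):
        if hasattr(response_format, "model_json_schema"):  # Pydantic v2
            return {"type": "json", "value": response_format.model_json_schema()}
        if hasattr(response_format, "schema"):  # Pydantic v1
            return {"type": "json", "value": response_format.schema()}
    return response_format  # already a dict -> pass through unchanged
```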
Note: the same should be done for `client.text_generation(..., grammar=...)`.
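For reference, the equivalent call on `text_generation` today, which would need the same treatment (reusing the illustrative `Recipe` model from above):

```python
output = client.text_generation(
    "Give me a quick pasta recipe as JSON.",
    grammar={"type": "json", "value": Recipe.model_json_schema()},
    max_new_tokens=500,
)
```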
Note: it seems that it's also possible to handle simple dataclasses with something like this. Unsure if it's worth the hassle though. If we add that, we should not add a dependency but simply copy the code + license into a submodule, given how tiny and unmaintained the code is.
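To give an idea of the scale involved, here is a rough, hypothetical version of what such a dataclass-to-JSON-schema converter boils down to (the code referenced above presumably covers more type constructs):

```python
import dataclasses
from typing import get_type_hints

_PRIMITIVES = {str: "string", int: "integer", float: "number", bool: "boolean"}

def dataclass_to_json_schema(cls: type) -> dict:
    """Hypothetical minimal converter: handles only flat dataclasses with
    primitive fields. Nested, optional, and generic types need more work."""
    properties = {
        name: {"type": _PRIMITIVES[hint]}
        for name, hint in get_type_hints(cls).items()
    }
    return {
        "type": "object",
        "properties": properties,
        "required": [f.name for f in dataclasses.fields(cls)],
    }
```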