(issue opened after discussion with @lhoestq)
In `InferenceClient.chat_completion`, one can pass a `response_format` which constrains the output format. It must be either a regex or a JSON schema. A typical use case is that you have a dataclass or a Pydantic model and you want the LLM to generate an instance of that class. This can currently be done like this:
```python
client.chat_completion(..., response_format={"type": "json", "value": MyCustomModel.schema()})
```
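For concreteness, here is a minimal end-to-end sketch of the current workflow (the model name and the `Recipe` class are illustrative; `.schema()` is the Pydantic v1 spelling, v2 uses `model_json_schema()`):

```python
from huggingface_hub import InferenceClient
from pydantic import BaseModel

# Hypothetical model we want the LLM to instantiate.
class Recipe(BaseModel):
    name: str
    ingredients: list[str]
    minutes: int

client = InferenceClient("meta-llama/Meta-Llama-3-8B-Instruct")

# Today, the schema has to be serialized by hand.
response = client.chat_completion(
    messages=[{"role": "user", "content": "Give me a quick pasta recipe."}],
    response_format={"type": "json", "value": Recipe.model_json_schema()},
    max_tokens=500,
)

# The answer is a JSON string matching the schema, so it can be
# validated straight back into the Pydantic model.
recipe = Recipe.model_validate_json(response.choices[0].message.content)
```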
It would be good to either:
- document this particular use case for convenience
- or even allow passing `client.chat_completion(..., response_format=MyCustomModel)` and handle the serialization automatically before making the HTTP call (a sketch of such a shim follows this list). If we do so, pydantic shouldn't become a dependency.
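A minimal sketch of what the automatic serialization could look like, assuming we duck-type the class instead of importing pydantic (the helper name is hypothetical):

```python
from typing import Any

def _normalize_response_format(response_format: Any) -> Any:
    """Hypothetical helper: turn a Pydantic model class into the
    {"type": "json", "value": <schema>} dict expected by the server,
    without importing pydantic (duck-typing only)."""
    if isinstance(response_format, type):
        if hasattr(response_format, "model_json_schema"):  # Pydantic v2
            return {"type": "json", "value": response_format.model_json_schema()}
        if hasattr(response_format, "schema"):  # Pydantic v1
            return {"type": "json", "value": response_format.schema()}
    return response_format  # already a dict -> pass through unchanged
```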
Note: the same should be done for `client.text_generation(..., grammar=...)`.
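For reference, the equivalent call on `text_generation` today, which would need the same treatment (reusing the illustrative `Recipe` model from above):

```python
output = client.text_generation(
    "Give me a quick pasta recipe as JSON.",
    grammar={"type": "json", "value": Recipe.model_json_schema()},
    max_new_tokens=500,
)
```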
Note: it seems that it's also possible to handle simple dataclasses with something like this. Unsure if it's worth the hassle though. If we add that, we should not add a dependency but simply copy the code + license into a submodule, given how tiny and unmaintained the code is.
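To give an idea of the scale involved, here is a rough, hypothetical version of what such a dataclass-to-JSON-schema converter boils down to (the code referenced above presumably covers more type constructs):

```python
import dataclasses
from typing import get_type_hints

_PRIMITIVES = {str: "string", int: "integer", float: "number", bool: "boolean"}

def dataclass_to_json_schema(cls: type) -> dict:
    """Hypothetical minimal converter: handles only flat dataclasses with
    primitive fields. Nested, optional, and generic types need more work."""
    properties = {
        name: {"type": _PRIMITIVES[hint]}
        for name, hint in get_type_hints(cls).items()
    }
    return {
        "type": "object",
        "properties": properties,
        "required": [f.name for f in dataclasses.fields(cls)],
    }
```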