Structured Output & JSON mode response support #131
Conversation
I like this approach! Very simple implementation.
Take a look at #90. It was a schema builder I thought about adding, similar to how tools are defined.
connection: @connection,
  &
)
response = @provider.with_response_schema(@response_schema) do
@crmne should we reset the @response_schema after the completion so it doesn't apply to subsequent messages in the chat?
Learning from your comment on my other PR about temperature, I now realize that your `with_*` pattern is meant to apply only to the next completion, so that's why I'm asking.
no, `with_*` applies to all subsequent messages
Oh, I actually already updated the code to reset the response schema after completion, following your comment about temperature on my other PR!
I've been battle-testing this code at my company Osello (which is now also a sponsor of this project) and I realized it does make more sense to reset the response_schema after completion, because in practice subsequent chat messages are most likely not meant to follow the same format:
chat.with_response_format(type: :string, enum: %w[Toronto Ottawa])
.ask("What's the capital of Canada?")
.content
# => "Ottawa"
chat.ask("How long has it been the capital?")
# => "Ottawa has been the capital of Canada since 1857."
chat.with_response_format(type: :integer).ask("How many years is that?")
# => 168
Given the design of the `with_*` prefix, I think it would be best not to break the applies-to-all interface. What about splitting it into a different prefix?
# Applies to all messages
chat = RubyLLM.chat.with_response_format(type: :string)
chat.ask("What's the capital of Canada?")
# => "Ottawa"
# Applies to current message
chat.as(type: :integer).ask("How many years is that?")
# => 168
# Resets back
chat.ask("How long has it been the capital?")
# => "Ottawa has been the capital of Canada since 1857."
@sirwolfgang good idea!
Although `as` might be a bit too ambiguous. Maybe `.with_next_response_in_format(...)` or something like that?
Another idea is that I could make `with_response_format` take a block, which can be used to reset the format afterwards:
chat.with_response_format(type: :string, enum: %w[Toronto Ottawa]) do
chat.ask("What's the capital of Canada?").content
end
# => "Ottawa"
chat.ask("How long has it been the capital?")
# => "Ottawa has been the capital of Canada since 1857."
chat.with_response_format(type: :integer)
chat.ask("How many years ago is that?")
# => 168
chat.ask("How many years ago will that be next year?")
# => 169
@jayelkaake I think linguistically/ergonomically it would be better to split it off and not use the `.with` prefix at all, making it a shorter scan/parse.
Totally open to ideas other than `.as`, but also curious what you think it might collide with? I don't think it's structurally more ambiguous than `.with`. The only other `as` convention I can think of is Rails routes, so I also don't think it's mnemonically overloaded.
Could extend it to `.as_format`, `.as_type`, `.as_response_type` or something else, if we want to preserve the root namespace.
Otherwise, if we could delay execution the way ActiveRecord does, I can see the argument for making it a postfix setting; something like:
agent.ask("...?").in(type: :integer)
agent.ask("...?").as(type: :integer)
agent.ask("...?").structured_as(type: :integer)
agent.ask("...?").formated_as(type: :integer)
I like the idea, it just doesn't read well in English ("ask someone to do something as...")... and it should also be clear that you're not modifying the query like you are with AR; it's modifying the response.
I like the postfix format better. Maybe something like `agent.ask("...?").response_as(:integer)`, but I think that might require some major updates to this library to get there.
Most APIs don't let you mutate the response format in real-time, so LLMs are kind of introducing the need for a new pattern maybe? I've been scratching my head about these things a lot over the last couple weeks! 😅
Yeah, I think refactoring to support more dynamic method chaining should be a different PR, but we could set up the expected syntax here and build towards that, since it should functionally work in either order.
agent.<token>.ask("...?") => agent.ask("...?").<token>
response/d feels a little weird to me. I could see this interface also making sense for loading personas, like `respond_as(:support_agent)`, which might be the process of chaining assistants for processing. Like:
timekeeper = RubyLLM.chat.with_tool(TIME)
groot = RubyLLM.chat.with_instructions(GROOT)
timekeeper.ask("What time is it?").respond_as(groot) => "I am grooooooot"
You rock @jayelkaake!
LMK if there's anything I can contribute to help get this over the line @crmne @jayelkaake. I'd like to get this in along with #144 to move forward on the path of using structured responses with OpenRouter.
@message.try('tool_call_id=', tool_call_id)
@message.try('input_tokens=', message.input_tokens)
@message.try('output_tokens=', message.output_tokens)
@message.try('content_schema=', message.schema)
@jayelkaake I think this should be `@message.try('content_schema=', message.content_schema)`, right?
Yes, thanks for pointing that out @tkoenig.
It was actually a merge problem from working with multiple branches, since this one hasn't been reviewed yet 😛 (it was already fixed in my dev branch). I think I need to stop making changes for a little while until it gets a review, so I'm not just writing vapourware.
@jayelkaake is this one ready to use in production? Should I try out this branch? I'm still running my own version but would like to use this if it will be merged.
@jayelkaake FYI, I just rebased on main and worked through the tricky parts like this; hoping this PR gets merged soon.
Thank you for the work on this PR. It's definitely a common use case I was sad to see wasn't in the gem already. Anything I can do to help get it over the finish line?
Thanks for the work on this @jayelkaake, and for the sponsorship support. I've taken a close look at this PR and while I appreciate the effort, I don't think this API design fits with RubyLLM's philosophy.
The core issue: This approach is inconsistent with how RubyLLM handles structured data elsewhere. Compare your proposed API:
chat.with_response_format(:object, properties: { hobbies: { type: :array, items: { type: :string, enum: %w[Soccer Golf Hockey] } } })
To how we handle Tools:
class Weather < RubyLLM::Tool
description "Gets current weather for a location"
param :latitude, desc: "Latitude (e.g., 52.5200)"
param :longitude, desc: "Longitude (e.g., 13.4050)"
def execute(latitude:, longitude:)
# implementation
end
end
The Tools API is self-documenting, reusable, and feels Ruby-native. Your approach is essentially raw JSON Schema dumped into method calls, which I have to backtrack on.
Additional concerns:
- Parameter overloading is confusing - sometimes a symbol, sometimes symbol + hash, sometimes just hash (see the sketch after this list)
- Thread.current conflicts with Context - we just shipped Context in 1.3.0 specifically to avoid thread-local state
- Magic behavior switching between `json_schema` and `json_object` modes is too implicit
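For illustration only, a rough sketch of the overloaded call styles referred to above, pieced together from examples elsewhere in this thread (the bare-symbol form is an assumption, not a confirmed signature):

```ruby
# Sketch of the call styles in question - not a definitive API.
chat.with_response_format(:json)                                              # just a symbol (assumed form)
chat.with_response_format(:object, properties: { name: { type: :string } })  # symbol + hash
chat.with_response_format(type: :integer)                                     # just a hash
```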
What structured responses could look like in RubyLLM:
(need to think this through)
class CustomerData < RubyLLM::Schema
string :name, required: true
string :email
integer :age, range: 18..100
end
class HobbyList < RubyLLM::Schema
array :hobbies, items: { type: :string, enum: %w[Soccer Golf Hockey] }
end
# Usage
response = chat.with_schema(CustomerData).ask("Generate a customer")
# or
response = chat.ask("Generate a customer", schema: CustomerData)
puts response.name # => "John Doe"
puts response.email # => "[email protected]"
puts response.age # => 42
This approach would be:
- Consistent with our existing patterns
- Reusable across conversations
- Self-documenting
- Properly integrated with Context instead of Thread.current
I know this means starting over, but I'd rather ship something that feels like RubyLLM than rush a feature that fights the library's design. When structured responses land, they need to feel like they belong.
Given the strong community interest in this feature (as seen in #11), I'm prioritizing getting this right. Would you be interested in exploring this class-based approach, or would you prefer I handle the implementation?
@message.update!(
  role: message.role,
  content: message.content,
  model_id: message.model_id,
  tool_call_id: tool_call_id,
  input_tokens: message.input_tokens,
  output_tokens: message.output_tokens
)

# These are required fields:
@message.role = message.role
@message.content = message.content

# These are optional fields:
@message.try('model_id=', message.model_id)
@message.try('tool_call_id=', tool_call_id)
@message.try('input_tokens=', message.input_tokens)
@message.try('output_tokens=', message.output_tokens)
@message.try('content_schema=', message.schema)

@message.save!
what does this have to do with structured outputs?
There are two things here, depending on what you're asking about, @crmne:
Why use `try`?
The repo documentation here says that there are required fields and the other fields are optional.
Either the docs need to be changed and the major version bumped to v2 to signal the semantic-versioning incompatibility, or using `try` lets those fields stay optional for now.
Why store the schema in the DB?
Since the LLMs send the entire thread on each request, they also require the schema for each message sent. Without it, you can't continue conversations that were stored in the DB at a later date.
In fact, I later realized that the tool call param schemas need to be stored as well, for the same reason.
There are probably ways to not store the schema and instead link it to a class locally, but that would break all past conversations if the response schema in the code changed.
@crmne let me know if you're wondering about something else.
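A hypothetical sketch of what persisting the schema could look like on the Rails side; the column name mirrors the `content_schema=` assignments above, and this migration is not part of the PR as shown here:

```ruby
# Hypothetical migration sketch - column name assumed from the content_schema= calls above.
class AddContentSchemaToMessages < ActiveRecord::Migration[7.1]
  def change
    # Persist the JSON schema that was in effect when the message was generated,
    # so stored conversations can be resent to the LLM with the same response format later.
    add_column :messages, :content_schema, :jsonb # use :json on non-Postgres databases
  end
end
```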
We don't need to use a `try`, as the `message` referenced there is a `Message` instance, which sets these parameters to `nil` or a suitable default in case they don't exist.
Totally understand the schema needing to be in the message and up for that.
@crmne Unless fallbacks were added to the `ActsAs` concern after this commit, you will get an error on save if you follow the docs exactly for the Rails models and don't add the `output_tokens` column to your Rails model.
That's why the `try` is needed. You can try it for yourself if you want.
Note that `@message.class != message.class` in this method - maybe that's what's causing the confusion? I'm trying to add more docs to methods to reduce that confusion in my fork (which I'll add into this PR).
Let me know if I missed something here.
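A hypothetical illustration of the failure mode being described, assuming an app whose messages table lacks the optional `output_tokens` column:

```ruby
# Sketch only. ActiveSupport's Object#try with a method the receiver doesn't respond to
# returns nil instead of raising, so the save can still proceed.
@message.output_tokens = message.output_tokens
# => NoMethodError (undefined method `output_tokens=' for an instance of Message)

@message.try('output_tokens=', message.output_tokens)
# => nil (no-op), so @message.save! still works
```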
Ah I see what you mean now! In the docs I have mistakenly kept the comment that says that these fields are optional. My bad. Fixing.
Haha thanks - if you're updating the docs I'll just remove the `try`s, because others wouldn't have been able to proceed with basic functionality anyway if they ran into the same issue as me. Do you agree?
I did: c48c8eb
That's a good idea. Let's remove the `try`s.
##
# @return [::RubyLLM::Schema, NilClass]
def response_schema
  Thread.current['RubyLLM::Provider::Methods.response_schema']
end

##
# @param response_schema [::RubyLLM::Schema]
def with_response_schema(response_schema)
  prev_response_schema = Thread.current['RubyLLM::Provider::Methods.response_schema']

  result = nil
  begin
    Thread.current['RubyLLM::Provider::Methods.response_schema'] = response_schema

    result = yield
  ensure
    Thread.current['RubyLLM::Provider::Methods.response_schema'] = prev_response_schema
  end

  result
end
what's all this about threads? we have contexts now
This was a thread-safe way to set the response schema.
I'll be able to refactor it to use context instead (assuming the context system is thread-safe; I haven't looked yet).
Hey @crmne thanks for the feedback! I'd be happy to update the PR with your suggestions. Adding the context system is amazing - I was going to suggest that, so now that it exists it'll greatly simplify my implementation. I'll work on it and submit my update for re-review.
FYI, I did a bit of work towards class-defined schemas here: #90
…lying API payload (#265)

## What this does

Implements `with_request_options` (renamed from `with_options` due to ActiveRecord conflict -- see conversation) with @crmne's suggestions from comment #130 (review) and tested against all providers.

This allows users to set arbitrary options on the payload before it's sent to the provider's API endpoint. The render_payload takes precedence.

Demo:

```ruby
chat = RubyLLM
  .chat(model: "qwen3", provider: :ollama)
  .with_request_options(response_format: {type: "json_object"})
  .with_instructions("Answer with a JSON object with the key `result` and a numerical value.")

response = chat.ask("What is the square root of 64?")
response.content
# => "{\n \"result\": 8\n}"
```

This is a power-user feature, and is specific to each provider (and model, to a lesser extent). I added a brief section to the docs.

For tests: different providers supported different options, so tests are divided by provider. (Note that `deep_merge` is required for Gemini in particular because it relies on a top-level `generationConfig` object.)

## Type of change

- [ ] Bug fix
- [X] New feature
- [ ] Breaking change
- [ ] Documentation
- [ ] Performance improvement

## Scope check

- [X] I read the [Contributing Guide](https://github.com/crmne/ruby_llm/blob/main/CONTRIBUTING.md)
- [X] This aligns with RubyLLM's focus on **LLM communication**
- [X] This isn't application-specific logic that belongs in user code
- [X] This benefits most users, not just my specific use case

## Quality check

- [ ] I ran `overcommit --install` and all hooks pass
- [X] I tested my changes thoroughly
- [X] I updated documentation if needed
- [X] I didn't modify auto-generated files manually (`models.json`, `aliases.json`)

## API changes

- [ ] Breaking change
- [X] New public methods/classes
- [ ] Changed method signatures
- [ ] No API changes

## Related issues

- #130
- #131
- #221

---------

Co-authored-by: Carmine Paolino <[email protected]>
Glad new
Thank you for this PR! Your ideas have been incorporated into the structured output implementation that just landed in be85257. I've implemented many of your suggestions:
Key differences from the original PR:
Full documentation: https://rubyllm.com/guides/chat#structured-output-with-json-schemas-with_schema
The feature will be available in v1.4.0. Thanks again for the initial implementation - it was very helpful in shaping the final design!
Features
✅ Supports all features of `json_schema` with minimal text.
✅ Automatically switches to `json_mode`. No need to think about it. Just ask for JSON.
✅ Does not require maintenance when the APIs inevitably increase their support
✅ Thread-safe implementation
✅ Full test coverage
❌ To reduce initial complexity, Anthropic, Deepseek and Gemini are not yet supported. I can add this in another PR.
Sample Code
You can now ensure the responses follow a schema you define like this:
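A minimal sketch of this usage, assuming the `with_response_format` API discussed throughout this PR; the hobby schema and the output shown are only illustrative:

```ruby
# Sketch only - assumes the with_response_format API from this PR.
chat = RubyLLM.chat
response = chat
  .with_response_format(
    :object,
    properties: {
      hobbies: { type: :array, items: { type: :string, enum: %w[Soccer Golf Hockey] } }
    }
  )
  .ask("What hobbies should I pick up?")
response.content
# => { "hobbies" => ["Soccer", "Hockey"] }   (illustrative output)
```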
You can also provide the JSON schema you want directly to the method like this:
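A sketch of passing a raw JSON Schema hash straight to the method; the exact keyword handling is assumed:

```ruby
# Sketch only - a raw JSON Schema hash handed directly to with_response_format.
chat.with_response_format(
  type: :object,
  properties: {
    name: { type: :string },
    age:  { type: :integer }
  },
  required: %w[name]
).ask("Generate a sample customer as JSON.")
```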
In this example the code is automatically switching to OpenAI's json_mode since no object properties are requested:
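A sketch of that fallback behaviour; the `:json` shorthand and the output are assumptions for illustration:

```ruby
# Sketch only - no object properties are requested here, so the request would fall back
# to OpenAI's json_object mode rather than json_schema.
chat.with_response_format(:json)
    .ask("List three Canadian cities as JSON.")
    .content
# => { "cities" => ["Toronto", "Ottawa", "Vancouver"] }   (illustrative output)
```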
How was it tested?
Shout-outs
Related
#90 (could share the `Schema` class once either PR is merged)