Image to image with gemini-2.0-flash-preview-image-generation #248
Thinking I need to move to content with attachments so the image gets sent properly on the next call.
I know it's a draft and you mentioned it in a comment, but you shouldn't add an images attribute to the Message object since we have the Content object for a reason.
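To illustrate the review comment above, here is a minimal sketch of keeping generated images on the content object rather than adding a separate attribute to the message. The Attachment, Content, and Message classes below are hypothetical stand-ins written for this example; they are not RubyLLM's actual classes and make no claim about how the PR implements it.

```ruby
# Hypothetical stand-ins for illustration only; NOT RubyLLM's real classes.
Attachment = Struct.new(:mime_type, :data)

class Content
  attr_reader :text, :attachments

  def initialize(text = nil, attachments = [])
    @text = text
    @attachments = attachments
  end

  # Images are just attachments filtered by MIME type.
  def images
    attachments.select { |a| a.mime_type.start_with?("image/") }
  end
end

class Message
  attr_reader :role, :content

  def initialize(role:, content:)
    @role = role
    # Wrap plain strings so callers always get a Content back.
    @content = content.is_a?(Content) ? content : Content.new(content)
  end
end

# An assistant reply carrying a generated image inside its content.
generated = Attachment.new("image/png", "<binary image data>")
reply = Message.new(
  role: :assistant,
  content: Content.new("Here is your ring.", [generated])
)
```

With this shape, a follow-up turn can read reply.content.images and resend them, and Message itself needs no image-specific attribute.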
I realize this is a very different approach than the This has similar value as #152 but there is a bit of a clash as this introduces an I also am not sure exactly how/where to document this in the guides. @crmne Looking forward to your feedback/thoughts.
This document describes the two approaches pretty well, I think. I could see an implementation of Imagen in RubyLLM that looks more like the #152 approach. It looks like OpenAI supports conversational image generation through the responses API and a built-in tool called "image_generation" - see here.
I like how OpenAI allows you to reference the previous images via IDs. We really need to get support for these built-in tools via the responses API into RubyLLM. We are already doing it in a fork to get web_search_preview (see diff here) but it's pretty messy.
What this does
Enable image-to-image generation with gemini-2.0-flash-preview-image-generation
Type of change
Scope check
Quality check
overcommit --install and all hooks pass
(models.json, aliases.json)
API changes
Related issues
Screenshots
Here's what the test did.
Input

put this in a ring
Output

Second input
'change the background to blue'
Second output

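The two-turn flow in the screenshots can be sketched as request payloads. This is a rough illustration that assumes the public Gemini generateContent REST shape (contents with role and parts, inline_data for image bytes, and responseModalities set to TEXT and IMAGE). The helper names text_part, image_part, and build_request are made up for this example, the image bytes are placeholders, and none of this is taken from the PR's code.

```ruby
require "base64"
require "json"

# Hypothetical helpers for illustration; not part of RubyLLM.
def text_part(text)
  { text: text }
end

def image_part(bytes, mime_type: "image/png")
  { inline_data: { mime_type: mime_type, data: Base64.strict_encode64(bytes) } }
end

# turns is an array of [role, parts] pairs in conversation order.
def build_request(turns)
  {
    contents: turns.map { |role, parts| { role: role, parts: parts } },
    generationConfig: { responseModalities: %w[TEXT IMAGE] }
  }
end

input_image  = "fake-input-bytes"   # placeholder for the uploaded image
output_image = "fake-output-bytes"  # placeholder for the model's first result

# First call: the prompt plus the input image.
first = build_request([
  ["user", [text_part("put this in a ring"), image_part(input_image)]]
])

# Second call: the earlier turns are replayed, including the generated
# image as a model turn, so the edit applies to that image.
second = build_request([
  ["user", [text_part("put this in a ring"), image_part(input_image)]],
  ["model", [image_part(output_image)]],
  ["user", [text_part("change the background to blue")]]
])

body = JSON.generate(second) # JSON body for the second request
```

The point of the sketch is that the generated image has to travel back in the history on the next call; otherwise the second prompt has nothing to edit.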