feat: #1614 gpt-realtime migration (Realtime API GA) #1646
base: main
Conversation
examples/realtime/app/server.py
Outdated
# Disable server-side interrupt_response to avoid truncating assistant audio
session_context = await runner.run(
    model_config={
        "initial_model_settings": {
            "turn_detection": {"type": "semantic_vad", "interrupt_response": False}
        }
    }
)
do we need to do this by default? why?
I explored some changes to improve the audio output quality, but they're not related to the gpt-realtime migration, so I've reverted all of them. I will keep working on improvements for this example app, but that can be done in a separate pull request.
I was testing switching to the new voices; this is taken from the examples (examples/realtime/app):
model_settings: RealtimeSessionModelSettings = {
    "model_name": "gpt-realtime",
    "modalities": ["text", "audio"],
    "voice": "marin",
    "speed": 1.0,
    "input_audio_format": "pcm16",
    "output_audio_format": "pcm16",
    "input_audio_transcription": {
        "model": "gpt-4o-mini-transcribe",
    },
    "turn_detection": {"type": "semantic_vad", "threshold": 0.5},
    # "instructions": "…",  # optional
    # "prompt": "…",  # optional
    # "tool_choice": "auto",  # optional
    # "tools": [],  # optional
    # "handoffs": [],  # optional
    # "tracing": {"enabled": False},  # optional
}

config = RealtimeRunConfig(model_settings=model_settings)
runner = RealtimeRunner(starting_agent=get_starting_agent())
I noticed that the voice changed, but I lost all the agent handoffs, tools, etc.
I set the config via RealtimeRunConfig and RealtimeModelConfig; the same thing happened in both cases.
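Not a confirmed fix for the lost handoffs, but for comparison, here is a minimal sketch of how the example app wires things: tools and handoffs live on the RealtimeAgent itself, while RealtimeRunConfig (or model_config at run() time, as in the server excerpt above) only carries model settings. The weather tool and the haiku agent are illustrative placeholders, and exact import paths and constructor details may differ between SDK versions.

import asyncio

from agents import function_tool
from agents.realtime import RealtimeAgent, RealtimeRunner
from agents.realtime.config import RealtimeRunConfig, RealtimeSessionModelSettings


@function_tool
def get_weather(city: str) -> str:
    """Illustrative tool; replace with the real ones from the example app."""
    return f"The weather in {city} is sunny."


haiku_agent = RealtimeAgent(
    name="Haiku Agent",
    instructions="Only respond in haikus.",
)

# Tools and handoffs are configured on the agent, not in the model settings.
starting_agent = RealtimeAgent(
    name="Assistant",
    instructions="Answer briefly; hand off to the haiku agent when asked for poetry.",
    tools=[get_weather],
    handoffs=[haiku_agent],
)

model_settings: RealtimeSessionModelSettings = {
    "model_name": "gpt-realtime",
    "voice": "marin",
    "input_audio_transcription": {"model": "gpt-4o-mini-transcribe"},
}


async def main() -> None:
    runner = RealtimeRunner(
        starting_agent=starting_agent,
        config=RealtimeRunConfig(model_settings=model_settings),
    )
    # The same settings can also be passed at connect time, as the server
    # excerpt above does with model_config={"initial_model_settings": ...}.
    session_context = await runner.run()
    async with session_context as session:
        async for event in session:
            ...  # handle RealtimeSessionEvent


asyncio.run(main())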
examples/realtime/app/server.py
Outdated
@@ -93,7 +111,9 @@ async def _serialize_event(self, event: RealtimeSessionEvent) -> dict[str, Any]:
            base_event["tool"] = event.tool.name
            base_event["output"] = str(event.output)
        elif event.type == "audio":
            base_event["audio"] = base64.b64encode(event.audio.data).decode("utf-8")
            # Coalesce raw PCM and flush on a steady timer for smoother playback.
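For context on what that new comment describes, here is a standalone sketch of the coalesce-and-flush idea (not the PR's actual implementation): incoming PCM deltas are appended to a buffer, and a timer task periodically emits one larger base64-encoded chunk. The send_json callable is a placeholder for whatever the example server uses to push events to the browser.

import asyncio
import base64


class PcmCoalescer:
    """Buffer raw PCM chunks and flush them on a steady timer (sketch only)."""

    def __init__(self, send_json, flush_interval_s: float = 0.04) -> None:
        self._send_json = send_json  # placeholder for the websocket sender
        self._flush_interval_s = flush_interval_s
        self._buffer = bytearray()
        self._flush_task: asyncio.Task | None = None

    def add(self, pcm_bytes: bytes) -> None:
        # Called for each "audio" event instead of sending it immediately.
        self._buffer.extend(pcm_bytes)
        if self._flush_task is None:
            self._flush_task = asyncio.create_task(self._flush_loop())

    async def _flush_loop(self) -> None:
        while True:
            await asyncio.sleep(self._flush_interval_s)
            if not self._buffer:
                continue
            chunk, self._buffer = bytes(self._buffer), bytearray()
            await self._send_json(
                {"type": "audio", "audio": base64.b64encode(chunk).decode("utf-8")}
            )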
is this just a quality improvement? would be nice to make it a separate PR if so
yeah, same as above (I won't repeat this for the rest)
Force-pushed from a4333dd to f02b096
Hello, any ETA on this one? I could be using it right now. :) Cheers, Thomas
Hi @seratch, do you know if this PR is going to be merged this week? No pressure, just want to know the ETA in these cases. Thank you very much! By the way, the class OpenAIRealtimeWebSocketModel(RealtimeModel) has "gpt-4o-realtime-preview" by default (and you can't change it). It would be nice to set it to "gpt-realtime".
Not to speak for @seratch, but this mostly depends on the review from @rm-openai.
@seratch: FYI, I noted that with OpenAI 1.107.0, I get this import error using your branch:

File "\.venv\Lib\site-packages\agents\realtime\__init__.py", line 84, in <module>
    from .openai_realtime import (
    ...<3 lines>...
    )
File "\.venv\Lib\site-packages\agents\realtime\openai_realtime.py", line 32, in <module>
    from openai.types.realtime.realtime_audio_config import (
    ...<3 lines>...
    )
ImportError: cannot import name 'Input' from 'openai.types.realtime.realtime_audio_config' (\.venv\Lib\site-packages\openai\types\realtime\realtime_audio_config.py)
@KelSolaar Thanks for letting me know about this! Will resolve the conflicts.
You're very welcome! The new model has also mostly solved the issue I reported here: #1681
@rm-openai @seratch What about changing the OpenAIRealtimeWebSocketModel(RealtimeModel) default model from "gpt-4o-realtime-preview" to "gpt-realtime"? It would be nice to have that as the default, or better, to make it possible to select which realtime model to use.
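For what it's worth, until the default changes you can already pick the model per session through the model settings, the same way the earlier config example in this thread does; a minimal sketch, assuming model_name is honored when the websocket model connects:

import asyncio

from agents.realtime import RealtimeAgent, RealtimeRunner


async def main() -> None:
    agent = RealtimeAgent(name="Assistant", instructions="Be brief.")
    runner = RealtimeRunner(starting_agent=agent)

    # Per-session override; the library default is still
    # "gpt-4o-realtime-preview" until this PR lands.
    session_context = await runner.run(
        model_config={"initial_model_settings": {"model_name": "gpt-realtime"}}
    )
    async with session_context as session:
        async for event in session:
            ...  # handle events


asyncio.run(main())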
@na-proyectran This pull request already makes that change. Once this is released, the default model will be changed. Right now, we're waiting for the underlying …
Not the only thing; in openai-python (release 1.107.0) they removed other things, like:

from openai.types.realtime.realtime_tools_config_union import (
from openai.types.realtime.realtime_audio_config import (
Sounds great! Do you have an idea when that will be? Should I think days, weeks, or months? Thanks!
The pull request is essentially functional as is and can be tested; just make sure that you pin your requirements:
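The pin itself wasn't captured above; a hypothetical equivalent, based only on the openai 1.107.0 import error reported earlier in this thread, might look like:

# requirements.txt (hypothetical pin, not the one from the original comment)
# The upper bound follows the report above that openai 1.107.0 breaks this branch.
openai<1.107.0
# ...plus however you install this PR branch itself.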
Hello, I'm looking for image input, and unless I'm missing something, it is not supported at the moment, right? From:

@classmethod
def convert_user_input_to_conversation_item(
    cls, event: RealtimeModelSendUserInput
) -> OpenAIConversationItem:
    user_input = event.user_input
    if isinstance(user_input, dict):
        return RealtimeConversationItemUserMessage(
            type="message",
            role="user",
            content=[
                Content(
                    type="input_text",
                    text=item.get("text"),
                )
                for item in user_input.get("content", [])
            ],
        )
    else:
        return RealtimeConversationItemUserMessage(
            type="message",
            role="user",
            content=[Content(type="input_text", text=user_input)],
        )

The API should look like this:

{
"type": "conversation.item.create",
"previous_item_id": null,
"item": {
"type": "message",
"role": "user",
"content": [
{
"type": "input_image",
"image_url": "data:image/{format(example: png)};base64,{some_base64_image_bytes}"
}
]
}
}
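Purely as an illustration of the payload quoted above (not the change this PR will actually make), a hypothetical helper that builds the raw conversation.item.create event for an image could look like this:

import base64


def build_image_message_item(image_bytes: bytes, image_format: str = "png") -> dict:
    """Hypothetical helper mirroring the API shape quoted above, using plain
    dicts rather than the SDK's typed conversation-item models."""
    data_url = (
        f"data:image/{image_format};base64,"
        + base64.b64encode(image_bytes).decode("utf-8")
    )
    return {
        "type": "conversation.item.create",
        "previous_item_id": None,
        "item": {
            "type": "message",
            "role": "user",
            "content": [
                {"type": "input_image", "image_url": data_url},
            ],
        },
    }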
@KelSolaar Thanks for pointing out the gap. Image input should be supported, but it's missing here right now. I will update the code to cover that use case too.
Thanks a ton, and sorry for making this PR harder to push through!
It's fine, I was just pointing out the new openai release. I mean, it would be nice to sync with the latest openai release.
Besides the default model defined in it, I think the realtime model in master also uses beta data structures defined in the OpenAI SDK package. I hope this PR can solve this issue. I don't want to press, but is there any ETA on the release? Thanks.
Force-pushed from 30bbd8d to 7afde98
this is still in progress but will resolve #1614