Description
Summary
- There is no reliable, built-in way to interrupt an ongoing bidirectional audio stream using LiveRequestQueue without tearing down the session. Attempts to interrupt via send_content() or by sending fake realtime audio bytes result in the agent resuming the previous context or the session getting stuck/paused.
- Request: Provide a safe interrupt mechanism that immediately cancels ongoing audio generation on demand, similar to how audio input interruption is handled.
Bug Description
There is no safe, built-in way to interrupt an ongoing bidirectional audio stream using LiveRequestQueue. When I try to redirect the audio agent to talk about a new topic mid-utterance via text, it either:
- Pauses briefly, then resumes talking about the previous context, only acknowledging the new send_content() afterward; or
- Gets stuck/paused when using realtime empty audio bytes, and the session does not reliably resume.
Minimal Reproduction (Pseudocode; follows ADK patterns)
- Start a session with a Runner and a LiveRequestQueue. Request long-form audio output from the agent.
- Attempt to interrupt mid-stream.
Option A: Interrupt via content only
live_request_queue.send_activity_start()
live_request_queue.send_content(content=content)
live_request_queue.send_activity_end()
Observed:
- Sometimes the audio appears to be interrupted briefly, but the agent continues the previous context. Only after finishing does it process the newly sent content.
Option B: Use realtime frames as an interrupt signal, then send content
live_request_queue.send_activity_start()
live_request_queue.send_realtime(Blob(data=b"", mime_type="audio/pcm"))
live_request_queue.send_activity_end()
# Send content afterward
live_request_queue.send_activity_start()
live_request_queue.send_content(content=content)
live_request_queue.send_activity_end()
Observed:
- Using send_realtime with a text/plain Blob as an interrupt marker triggers a 1007 “invalid frame payload data” WebSocket error in the LLM flow.
- Using an empty audio Blob (audio/pcm with zero bytes) sometimes “pauses” the decoder and the session fails to resume properly on the next content, even when wrapped with activity_start/content/activity_end.
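For reference, the two call sequences above can be written as small helpers. This is only a sketch: RecordingQueue is a hypothetical stand-in that records calls and is not part of ADK, and the helper names are made up for illustration. The method names (send_activity_start, send_content, send_realtime, send_activity_end) are the ones used in the snippets above.

```python
from dataclasses import dataclass, field
from typing import Any, List, Tuple


@dataclass
class RecordingQueue:
    """Hypothetical stand-in for LiveRequestQueue; it only records calls."""

    calls: List[Tuple[str, Any]] = field(default_factory=list)

    def send_activity_start(self) -> None:
        self.calls.append(("activity_start", None))

    def send_activity_end(self) -> None:
        self.calls.append(("activity_end", None))

    def send_content(self, content: Any) -> None:
        self.calls.append(("content", content))

    def send_realtime(self, blob: Any) -> None:
        self.calls.append(("realtime", blob))


def interrupt_with_content(queue: Any, content: Any) -> None:
    """Option A: wrap the new content in an activity start/end pair."""
    queue.send_activity_start()
    queue.send_content(content=content)
    queue.send_activity_end()


def interrupt_with_empty_audio(queue: Any, empty_blob: Any, content: Any) -> None:
    """Option B: send an empty realtime frame first, then the content."""
    queue.send_activity_start()
    queue.send_realtime(empty_blob)
    queue.send_activity_end()
    interrupt_with_content(queue, content)
```

Passing a real LiveRequestQueue (and a real Blob for Option B) in place of the stand-in should reproduce the behavior described under "Observed"; the stub only makes the exact ordering of calls explicit.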
Additional Observations
- If we close the LiveRequestQueue to interrupt, it tears down the entire session, which degrades UX.
- If we call activity_end without a proper interrupt, the LLM still treats audio as ongoing; subsequent input lags or is ignored until a timeout.
- Net effect: users cannot reliably “soft cancel” current audio and redirect the agent without risking stuck/paused behavior or session teardown.
Expected Behavior
send_content() (or a dedicated interrupt API) should quickly and reliably interrupt ongoing audio generation and immediately switch focus to the new content, similar to how the system responds when new audio input is received.
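To make the expected "soft cancel" semantics concrete, here is a toy model. It is not ADK code: ToyAudioSession and its methods are hypothetical, and interrupt_current_activity() is only a suggested name for an API that does not exist today. The point is that unplayed audio from the old turn is dropped, the session stays open, and the new content is handled next.

```python
from collections import deque
from typing import Deque, List


class ToyAudioSession:
    """Toy model of one live session; hypothetical, not ADK code."""

    def __init__(self) -> None:
        self.pending: Deque[str] = deque()  # generated chunks not yet played
        self.played: List[str] = []         # chunks the user actually heard
        self.is_open = True

    def generate(self, topic: str, chunks: int) -> None:
        """Queue `chunks` audio chunks for the given topic."""
        for i in range(chunks):
            self.pending.append(f"{topic}-{i}")

    def play_next(self) -> bool:
        """Play one pending chunk; return False when nothing is left."""
        if not self.pending:
            return False
        self.played.append(self.pending.popleft())
        return True

    def interrupt_current_activity(self) -> int:
        """Hypothetical soft cancel: drop unplayed audio, keep the session open."""
        dropped = len(self.pending)
        self.pending.clear()
        return dropped


session = ToyAudioSession()
session.generate("old", 5)            # long-form answer in progress
session.play_next()                   # user hears the first chunk
session.interrupt_current_activity()  # soft cancel mid-utterance
session.generate("new", 2)            # new content takes over immediately
while session.play_next():
    pass
```

After this runs, only the first chunk of the old answer has been played, followed directly by the new content, and the session is still open. That is the behavior I would expect from send_content() or a dedicated interrupt call.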
Environment
- OS: macOS, Windows
- Python: 3.13.3
- ADK: 1.14.1 (pip show google-adk)
Model Information
- LiteLLM: No (using google-adk for Python)
- Model: gemini-live-2.5-flash-preview
Impact
UI events (sent as content text) do not interrupt the ongoing BiDi audio stream, which results in a worse user experience.
Workarounds Tried
- send_content() during an active stream: intermittent pause, then resume previous context; new content applied only after prior output completes.
- send_realtime with text/plain: causes a 1007 WebSocket error.
- send_realtime with empty audio/pcm: sometimes pauses/locks the decoder; session does not reliably resume.
- Closing LiveRequestQueue: interrupts but tears down the session.
I'll be happy to test a proposed method (e.g., interrupt_current_activity()) or to follow guidance on the canonical pattern for a soft cancel.