Realtime Voice
Bidirectional voice over a single WebSocket — bring your own client, pick a managed provider or the in-process HybrIE voice engine, and meter minutes the same way you meter tokens.
Endpoint
/api/v1/inference/realtime
The console exposes the runtime's realtime WebSocket at wss://api.stimulir.com/api/v1/inference/realtime. Authenticate by sending Authorization: Bearer hyb_… on the upgrade request, using the same hyb_* API key you use for chat completions. In BYOC deployments the runtime serves the raw endpoint directly:
/v1/realtime
Provider modes
Realtime exposes a neutral voice session over WebSocket and bridges it to the active provider. Pick a provider with the provider query parameter; omit it to use the in-process HybrIE voice engine.
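As a rough illustration of the provider switch, the sketch below builds the session URL and the upgrade header in Python. The helper names realtime_url and upgrade_headers are hypothetical and not part of any SDK; the URL, the provider query parameter, and the Bearer scheme come from this page. Pass the resulting URL and headers to whichever WebSocket client you already use.

```python
from urllib.parse import urlencode

# Console endpoint as documented on this page.
BASE_URL = "wss://api.stimulir.com/api/v1/inference/realtime"


def realtime_url(provider=None, base_url=BASE_URL):
    """Return the session URL (hypothetical helper).

    With no provider, the URL carries no query string, which selects the
    in-process HybrIE voice engine.
    """
    if provider is None:
        return base_url
    return f"{base_url}?{urlencode({'provider': provider})}"


def upgrade_headers(api_key):
    """Headers to send on the WebSocket upgrade request (hypothetical helper)."""
    return {"Authorization": f"Bearer {api_key}"}


print(realtime_url())          # HybrIE-native session
print(realtime_url("gemini"))  # managed Gemini session
```

Keeping the provider choice in one function means the rest of the client stays identical for both modes.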
Managed (Gemini)
Append ?provider=gemini to route the session to Google's native-audio Gemini models. A typical model id is gemini-2.5-flash-native-audio-preview-12-2025. Billing falls under Managed Inference; voice minutes are metered separately from text tokens.
wss://api.stimulir.com/api/v1/inference/realtime?provider=gemini
# Upgrade header: Authorization: Bearer hyb_...
# Session model: gemini-2.5-flash-native-audio-preview-12-2025
HybrIE-native (Qwen2.5-Omni, experimental)
Omit the provider query parameter and the session is served in-process by the HybrIE voice engine on the GPU node, currently Qwen2.5-Omni-3B. The model is configurable on the runtime with HYBRIE_REALTIME_MODEL. Qwen2.5-Omni is the same voice model used for the local /v1/audio/speech TTS endpoint.
wss://api.stimulir.com/api/v1/inference/realtime
# Upgrade header: Authorization: Bearer hyb_...
# Session model: Qwen2.5-Omni-3B
Event shape
Clients exchange JSON events over the socket. The schema is provider-neutral and bridged to the active provider's voice protocol — open a session, stream input audio frames, and receive output audio deltas. Shape only:
// client → server
{"type": "session.update", "session": {"model": "Qwen2.5-Omni-3B"}}
{"type": "input_audio_buffer.append", "audio": "<base64 pcm16>"}
{"type": "input_audio_buffer.commit"}
// server → client
{"type": "session.created"}
{"type": "response.output_audio.delta", "audio": "<base64 pcm16>"}
{"type": "response.done"}
Field names mirror the widely deployed OpenAI realtime envelope for client familiarity. The runtime translates this neutral schema into the active provider's protocol, so your client code does not change when you swap ?provider=gemini for the HybrIE-native session.
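The event flow above could be handled on the client roughly as follows. This is a sketch only: the function names are hypothetical, the little-endian byte order for PCM16 is an assumption, and the events are limited to the types listed on this page. It covers serialization and parsing, not the socket itself.

```python
import base64
import json
import struct


def session_update(model):
    """Serialize a session.update event for the given model id."""
    return json.dumps({"type": "session.update", "session": {"model": model}})


def append_audio(pcm16_bytes):
    """Serialize one input audio frame as base64-encoded PCM16."""
    audio = base64.b64encode(pcm16_bytes).decode("ascii")
    return json.dumps({"type": "input_audio_buffer.append", "audio": audio})


def commit_audio():
    """Serialize the event that commits the buffered input audio."""
    return json.dumps({"type": "input_audio_buffer.commit"})


def collect_output_audio(server_messages):
    """Decode and join output audio deltas, stopping at response.done."""
    chunks = []
    for raw in server_messages:
        event = json.loads(raw)
        if event["type"] == "response.output_audio.delta":
            chunks.append(base64.b64decode(event["audio"]))
        elif event["type"] == "response.done":
            break
    return b"".join(chunks)


# Four samples of silence as 16-bit PCM (little-endian is an assumption).
frame = struct.pack("<4h", 0, 0, 0, 0)
outgoing = [session_update("Qwen2.5-Omni-3B"), append_audio(frame), commit_audio()]
```

Send each string in outgoing as a text message, then feed the received text messages to collect_output_audio to obtain the raw output audio bytes.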
Metering
Voice minutes are metered per session and surface as usage events with modality: "voice_realtime". Group billing by this field with /api/v1/usage/summary?group_by=modality — see Usage & Billing.
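To illustrate grouping by modality, the sketch below builds the summary request URL and totals voice minutes from a list of usage events. Several details are assumptions rather than documented behavior: the HTTPS host is assumed to match the WebSocket host, and the "minutes" field and the "text" modality value in the sample events are hypothetical placeholders. Only the modality value "voice_realtime" and the summary path are taken from this page.

```python
from urllib.parse import urlencode

# Assumption: the REST API shares the host of the WebSocket endpoint.
API_BASE = "https://api.stimulir.com"


def usage_summary_url(group_by="modality", api_base=API_BASE):
    """Build the usage summary URL grouped by the given field."""
    return f"{api_base}/api/v1/usage/summary?{urlencode({'group_by': group_by})}"


def voice_minutes(usage_events):
    """Sum minutes over events tagged modality == "voice_realtime".

    The "minutes" key is a hypothetical field name used for illustration.
    """
    return sum(
        event["minutes"]
        for event in usage_events
        if event.get("modality") == "voice_realtime"
    )


# Hypothetical sample events; the real payload shape may differ.
sample_events = [
    {"modality": "voice_realtime", "minutes": 2.5},
    {"modality": "text", "minutes": 0.0},
    {"modality": "voice_realtime", "minutes": 1.5},
]
total = voice_minutes(sample_events)
```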
