
Realtime Voice

Bidirectional voice over a single WebSocket — bring your own client, pick a managed provider or the in-process HybrIE voice engine, and meter minutes the same way you meter tokens.

Endpoint

WS /api/v1/inference/realtime

The console exposes the runtime's realtime WebSocket at wss://api.stimulir.com/api/v1/inference/realtime. Authenticate by sending Authorization: Bearer hyb_… on the upgrade request — the same hyb_* API key you use for chat completions. In BYOC deployments the runtime serves the raw endpoint directly:

WS /v1/realtime

Provider modes

Realtime exposes a neutral voice session over WebSocket and bridges it to the active provider. Pick a provider with the provider query parameter; omit it to use the in-process HybrIE voice engine.

Managed (Gemini)

Append ?provider=gemini to route the session to Google's native-audio Gemini models. A typical model id is gemini-2.5-flash-native-audio-preview-12-2025. Billing falls under Managed Inference; voice minutes are metered separately from text tokens.

Connect — managed Gemini
wss://api.stimulir.com/api/v1/inference/realtime?provider=gemini
# Upgrade header: Authorization: Bearer hyb_...
# Session model:  gemini-2.5-flash-native-audio-preview-12-2025

HybrIE-native (Qwen2.5-Omni) (Experimental)

Omit the provider query parameter and the session is served in-process by the HybrIE voice engine on the GPU node, currently running Qwen2.5-Omni-3B. The model is configurable on the runtime with HYBRIE_REALTIME_MODEL. Qwen2.5-Omni is the same voice model used for the local /v1/audio/speech TTS endpoint.

Connect — HybrIE-native
wss://api.stimulir.com/api/v1/inference/realtime
# Upgrade header: Authorization: Bearer hyb_...
# Session model:  Qwen2.5-Omni-3B
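The two connection targets differ only in the provider query parameter, so a client can derive both from one helper. The following is a minimal sketch, not an official SDK: the function names realtime_url and upgrade_headers are hypothetical, and only the URL, the provider parameter, and the Authorization header come from this page.

```python
from urllib.parse import urlencode

# Console endpoint as documented above.
CONSOLE_REALTIME_URL = "wss://api.stimulir.com/api/v1/inference/realtime"


def realtime_url(provider=None, base=CONSOLE_REALTIME_URL):
    """Return the WebSocket URL for a realtime session.

    Passing provider=None omits the query parameter, which selects the
    in-process HybrIE voice engine. (Hypothetical helper name.)
    """
    if provider is None:
        return base
    return f"{base}?{urlencode({'provider': provider})}"


def upgrade_headers(api_key):
    """Headers to send on the WebSocket upgrade request.

    The key is the same hyb_* API key used for chat completions.
    (Hypothetical helper name.)
    """
    return {"Authorization": f"Bearer {api_key}"}
```

Pass the returned URL and headers to whichever WebSocket client you already use; the parameter name for extra upgrade headers varies by library.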

Event shape

Clients exchange JSON events over the socket. The schema is provider-neutral and bridged to the active provider's voice protocol. The flow is: open a session, stream input audio frames, and receive output audio deltas. The events below show the shape only:

bidirectional events
// client → server
{"type": "session.update", "session": {"model": "Qwen2.5-Omni-3B"}}
{"type": "input_audio_buffer.append", "audio": "<base64 pcm16>"}
{"type": "input_audio_buffer.commit"}

// server → client
{"type": "session.created"}
{"type": "response.output_audio.delta", "audio": "<base64 pcm16>"}
{"type": "response.done"}

Field names mirror the widely deployed OpenAI realtime envelope for client familiarity. The runtime translates this neutral schema into the active provider's protocol, so your client code does not change when you swap ?provider=gemini for the HybrIE-native session.
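The event shapes above can be built and consumed with a few small helpers. This is a sketch under stated assumptions: the helper names are hypothetical, the audio field is treated as base64-encoded PCM16 bytes as the placeholders suggest, and the sample rate and channel layout are not stated on this page, so they are left to the caller.

```python
import base64
import json


def session_update(model):
    """Client event that opens or updates the session with a model id."""
    return json.dumps({"type": "session.update", "session": {"model": model}})


def audio_append(pcm16):
    """Client event carrying one frame of raw PCM16 bytes, base64-encoded."""
    encoded = base64.b64encode(pcm16).decode("ascii")
    return json.dumps({"type": "input_audio_buffer.append", "audio": encoded})


def audio_commit():
    """Client event that commits the buffered input audio."""
    return json.dumps({"type": "input_audio_buffer.commit"})


def handle_server_event(raw, out):
    """Process one server event.

    Decoded audio from response.output_audio.delta is appended to `out`
    (a bytearray). Returns True once response.done arrives, else False.
    Other event types, such as session.created, are ignored here.
    """
    event = json.loads(raw)
    kind = event.get("type")
    if kind == "response.output_audio.delta":
        out.extend(base64.b64decode(event["audio"]))
    return kind == "response.done"
```

A client loop would send session_update once, stream audio_append frames followed by audio_commit, then feed every incoming message to handle_server_event until it returns True.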

Metering

Voice minutes are metered per session and surface as usage events with modality: "voice_realtime". Group billing by this field with /api/v1/usage/summary?group_by=modality — see Usage & Billing.
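The summary query can be prepared as an ordinary HTTPS request. The sketch below only builds the request object and does not send it. It assumes the usage endpoint lives on the same api.stimulir.com host and accepts the same bearer key as the realtime endpoint; neither is stated on this page, and the response format is not shown here either. The function name usage_summary_request is hypothetical.

```python
from urllib.parse import urlencode
from urllib.request import Request


def usage_summary_request(api_key, base="https://api.stimulir.com"):
    """Build a GET request for usage grouped by modality.

    Assumptions: `base` host and bearer authentication are inferred from
    the realtime endpoint, not confirmed for the usage API.
    """
    query = urlencode({"group_by": "modality"})
    return Request(
        f"{base}/api/v1/usage/summary?{query}",
        headers={"Authorization": f"Bearer {api_key}"},
    )
```

Send the request with urllib.request.urlopen or translate it to your HTTP client of choice, then look for the voice_realtime modality in the grouped result.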