Engineering Workspace

Deployment Modes

Three ways to serve tokens: your provider keys, Stimulir-managed capacity, or your own compute. All three are metered through the same console.

BYOK — Bring Your Own Keys

You attach your own provider API keys as BYOK credentials. The console routes matching inference traffic upstream with your key and meters every request. The provider bills you directly for tokens at their rates; Stimulir adds routing, key management, and usage visibility on top.

Control: full — your provider account, your rate limits, your data agreements.
Billing: token cost paid to the provider on your own account.
Tradeoff: you manage provider accounts and quotas yourself.

Managed Inference (MI)

Stimulir-managed upstream capacity. Requests that don't match a BYOK credential are served from managed providers: nebius, aws_bedrock, together_ai, and azure_openai. No provider accounts to set up — create a key and call the endpoint.

Control: Stimulir selects and operates the upstream capacity.
Billing: metered through your Stimulir workspace.
Tradeoff: simplest to start; less direct control over the upstream provider.

BYOC — Bring Your Own Compute

Run the HybrIE runtime on your own nodes or edge hardware and point the CLI at it. Inference, training, and adapters execute on hardware you own — local Qwen3 / Qwen3-Coder models via Candle on Metal or CUDA, with optional cloud fallback. See Edge Deployment for the local, hybrid, and P2P topologies.

Control: maximum — your hardware, your network, data never has to leave it.
Billing: you pay for your own hardware/cloud nodes; no per-token provider cost for local models.
Tradeoff: you operate the runtime and provision the GPUs.

Comparison

	BYOK	Managed Inference	BYOC
Who runs the models	Your provider accounts	Stimulir-managed providers (nebius, aws_bedrock, together_ai, azure_openai)	You — HybrIE runtime on your nodes
Setup effort	Add provider keys	None beyond an API key	Provision nodes, run the runtime
Token billing	Provider bills you directly	Metered via Stimulir	Your own hardware costs
Data path	Console → your provider	Console → managed provider	Stays on your infrastructure
Metering	Per request via console	Per request via console	Runtime-level

Modes are not exclusive: a single workspace can use BYOK for some model prefixes, fall back to Managed Inference for others, and run BYOC nodes for local workloads. The billing snapshot breaks usage down by mode (mi / byok / byoc).

BYOK — Bring Your Own Keys#

Managed Inference (MI)#

BYOC — Bring Your Own Compute#

Comparison#

BYOK — Bring Your Own Keys

Managed Inference (MI)

BYOC — Bring Your Own Compute

Comparison