Deployment Modes
Three ways to serve tokens: your provider keys, Stimulir-managed capacity, or your own compute. All three are metered through the same console.
BYOK — Bring Your Own Keys
Attach your own provider API keys as BYOK credentials. The console routes matching inference traffic upstream with your key and meters every request. The provider bills you directly for tokens at its own rates; Stimulir adds routing, key management, and usage visibility on top.
- Control: full — your provider account, your rate limits, your data agreements.
- Billing: token cost paid to the provider on your own account.
- Tradeoff: you manage provider accounts and quotas yourself.
Managed Inference (MI)
Managed Inference runs on Stimulir-managed upstream capacity. Requests that don't match a BYOK credential are served from the managed providers: nebius, aws_bedrock, together_ai, and azure_openai. There are no provider accounts to set up: create a key and call the endpoint.
- Control: Stimulir selects and operates the upstream capacity.
- Billing: metered through your Stimulir workspace.
- Tradeoff: simplest to start; less direct control over the upstream provider.
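The split between BYOK and Managed Inference can be read as a routing decision. The sketch below is illustrative only: it assumes BYOK credentials are matched by model-name prefix and that the longest matching prefix wins. The `ByokCredential` type, the `route` function, and the example provider and prefix values are hypothetical names, not part of the Stimulir API.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple


@dataclass(frozen=True)
class ByokCredential:
    """Hypothetical record: a provider key bound to a model-name prefix."""
    provider: str
    model_prefix: str


def route(model: str, credentials: Sequence[ByokCredential]) -> Tuple[str, Optional[str]]:
    """Return (mode, provider) for a request.

    Assumption: the longest matching prefix wins. With no match, the request
    falls through to Managed Inference, where Stimulir picks the upstream,
    so no provider is returned.
    """
    matches = [c for c in credentials if model.startswith(c.model_prefix)]
    if matches:
        best = max(matches, key=lambda c: len(c.model_prefix))
        return ("byok", best.provider)
    return ("mi", None)
```

With a single credential for the placeholder prefix `acme/`, a request for `acme/chat-large` would resolve to `("byok", "example_provider")`, while any other model name would resolve to `("mi", None)`.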
BYOC — Bring Your Own Compute
Run the HybrIE runtime on your own nodes or edge hardware and point the CLI at it. Inference, training, and adapter workloads run on hardware you own: local Qwen3 and Qwen3-Coder models execute via Candle on Metal or CUDA, with optional cloud fallback. See Edge Deployment for the local, hybrid, and P2P topologies.
- Control: maximum, since the hardware and network are yours and data never has to leave them.
- Billing: you pay for your own hardware/cloud nodes; no per-token provider cost for local models.
- Tradeoff: you operate the runtime and provision the GPUs.
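The "optional cloud fallback" behaviour can be sketched as a local-first call that only reaches for the cloud when the local runtime is unreachable. This is a minimal illustration, not the HybrIE implementation: `generate_with_fallback` and the `local` / `cloud` callables are hypothetical stand-ins, and treating `ConnectionError` and `TimeoutError` as the fallback triggers is an assumption.

```python
from typing import Callable, Optional

# Hypothetical signature: a backend takes a prompt and returns generated text.
Backend = Callable[[str], str]


def generate_with_fallback(prompt: str, local: Backend, cloud: Optional[Backend] = None) -> str:
    """Try the local backend first; use the cloud backend only if it is
    configured and the local call fails with a connectivity error."""
    try:
        return local(prompt)
    except (ConnectionError, TimeoutError):
        if cloud is None:
            # No fallback configured: surface the local failure unchanged.
            raise
        return cloud(prompt)
```

Leaving `cloud` unset keeps the strict data-locality property: a failed local call raises instead of sending the prompt off your infrastructure.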
Comparison
| | BYOK | Managed Inference | BYOC |
|---|---|---|---|
| Who runs the models | Your provider accounts | Stimulir-managed providers (nebius, aws_bedrock, together_ai, azure_openai) | You — HybrIE runtime on your nodes |
| Setup effort | Add provider keys | None beyond an API key | Provision nodes, run the runtime |
| Token billing | Provider bills you directly | Metered via Stimulir | Your own hardware costs |
| Data path | Console → your provider | Console → managed provider | Stays on your infrastructure |
| Metering | Per request via console | Per request via console | Runtime-level |
Modes are not exclusive: a single workspace can use BYOK for some model prefixes, fall back to Managed Inference for others, and run BYOC nodes for local workloads. The billing snapshot breaks usage down by mode (mi / byok / byoc).
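As a rough picture of the per-mode breakdown, the sketch below totals token usage under the three mode labels. The record shape (a `(mode, tokens)` pair) and the `usage_by_mode` helper are assumptions for illustration; the actual billing snapshot schema is not described here.

```python
from typing import Dict, Iterable, Tuple

# Mode labels as they appear in the billing snapshot.
MODES = ("mi", "byok", "byoc")


def usage_by_mode(records: Iterable[Tuple[str, int]]) -> Dict[str, int]:
    """Sum token counts per deployment mode.

    Assumption: each record is a (mode, tokens) pair. Modes with no traffic
    are reported as 0 so the result always has all three keys.
    """
    totals = {mode: 0 for mode in MODES}
    for mode, tokens in records:
        if mode not in totals:
            raise ValueError(f"unknown mode: {mode!r}")
        totals[mode] += tokens
    return totals
```

For example, `usage_by_mode([("mi", 100), ("byok", 50), ("mi", 25)])` returns `{"mi": 125, "byok": 50, "byoc": 0}`.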
