Engineering Workspace

Deployment Modes

Three ways to serve tokens: your provider keys, Stimulir-managed capacity, or your own compute. All three are metered through the same console.

BYOK — Bring Your Own Keys

You attach your own provider API keys as BYOK credentials. The console routes matching inference traffic upstream with your key and meters every request. The provider bills you directly for tokens at their rates; Stimulir adds routing, key management, and usage visibility on top.

  • Control: full — your provider account, your rate limits, your data agreements.
  • Billing: token cost paid to the provider on your own account.
  • Tradeoff: you manage provider accounts and quotas yourself.

Managed Inference (MI)

Stimulir-managed upstream capacity. Requests that don't match a BYOK credential are served from managed providers: nebius, aws_bedrock, together_ai, and azure_openai. No provider accounts to set up — create a key and call the endpoint.

  • Control: Stimulir selects and operates the upstream capacity.
  • Billing: metered through your Stimulir workspace.
  • Tradeoff: simplest to start; less direct control over the upstream provider.

BYOC — Bring Your Own Compute

Run the HybrIE runtime on your own nodes or edge hardware and point the CLI at it. Inference, training, and adapters execute on hardware you own — local Qwen3 / Qwen3-Coder models via Candle on Metal or CUDA, with optional cloud fallback. See Edge Deployment for the local, hybrid, and P2P topologies.

  • Control: maximum — your hardware, your network, data never has to leave it.
  • Billing: you pay for your own hardware/cloud nodes; no per-token provider cost for local models.
  • Tradeoff: you operate the runtime and provision the GPUs.

Comparison

BYOKManaged InferenceBYOC
Who runs the modelsYour provider accountsStimulir-managed providers (nebius, aws_bedrock, together_ai, azure_openai)You — HybrIE runtime on your nodes
Setup effortAdd provider keysNone beyond an API keyProvision nodes, run the runtime
Token billingProvider bills you directlyMetered via StimulirYour own hardware costs
Data pathConsole → your providerConsole → managed providerStays on your infrastructure
MeteringPer request via consolePer request via consoleRuntime-level

Modes are not exclusive: a single workspace can use BYOK for some model prefixes, fall back to Managed Inference for others, and run BYOC nodes for local workloads. The billing snapshot breaks usage down by mode (mi / byok / byoc).