Compute Workspace
Edge Deployment
Run the HybrIE runtime at the edge in three topologies: fully local, hybrid Metal + CUDA, or a P2P mesh.
1. Local-only edge
Run the full runtime on a single Mac or Linux GPU box. All inference happens on-device — nothing leaves the machine. Restrict providers to local:
INFERENCE_ALLOWED_PROVIDERS=localLocal models (Qwen3 / Qwen3-Coder) run via Candle on Metal (Apple Silicon) or CUDA, serving the standard OpenAI-compatible API on port 8080.
2. Hybrid Metal + CUDA
Split a pipeline across machines: text encoding and VAE run locally on Metal, while heavy diffusion/decoding runs on a CUDA cloud node. Configure the remote endpoints in the runtime config:
[hybrid]
cloud_endpoints = ["http://<cuda-node>:8080"]Pair this with GPU instances launched from a Lambda offer for the CUDA side.
3. P2P mesh
Multiple runtime nodes discover each other and distribute model weights over a private Tailscale mesh, using the D2L desktop peer identity registry:
/v1/peers/v1/peers/register# See known peers on the mesh
curl http://localhost:8080/v1/peers
# Register this node with a peer
curl -X POST http://<peer>:8080/v1/peers/register \
-H "Content-Type: application/json" \
-d '{"node": "<this-node-tailscale-address>"}'Node discovery and model weight distribution work today. Remote per-component execution across the mesh isExperimental
Two peer registries
The runtime keeps two separate registries — don't confuse them:
| Registry | Purpose |
|---|---|
/v1/compute/peers | Compute worker mesh (new in v0.1.65): service endpoints — gRPC and realtime — used to route inference and realtime sessions to reachable nodes. Managed on Workers. |
/v1/peers | D2L desktop peer identity registry: node identity and discovery for the P2P mesh and model weight distribution, shown above. |
Choose local-only when data must stay on one machine, hybrid when a single device can't carry the heavy stages, and P2P when you have several nodes that should share models. All three are BYOC topologies.
