Compute Workspace

Edge Deployment

Run the HybrIE runtime at the edge in three topologies: fully local, hybrid Metal + CUDA, or a P2P mesh.

1. Local-only edge

Run the full runtime on a single Mac or Linux GPU box. All inference happens on-device — nothing leaves the machine. Restrict providers to local:

Environment
INFERENCE_ALLOWED_PROVIDERS=local

Local models (Qwen3 / Qwen3-Coder) run via Candle on Metal (Apple Silicon) or CUDA, serving the standard OpenAI-compatible API on port 8080.

2. Hybrid Metal + CUDA

Split a pipeline across machines: text encoding and VAE run locally on Metal, while heavy diffusion/decoding runs on a CUDA cloud node. Configure the remote endpoints in the runtime config:

config
[hybrid]
cloud_endpoints = ["http://<cuda-node>:8080"]

Pair this with GPU instances launched from a Lambda offer for the CUDA side.

3. P2P mesh

Multiple runtime nodes discover each other and distribute model weights over a private Tailscale mesh, using the D2L desktop peer identity registry:

GET/v1/peers
POST/v1/peers/register
curl
# See known peers on the mesh
curl http://localhost:8080/v1/peers

# Register this node with a peer
curl -X POST http://<peer>:8080/v1/peers/register \
  -H "Content-Type: application/json" \
  -d '{"node": "<this-node-tailscale-address>"}'

Node discovery and model weight distribution work today. Remote per-component execution across the mesh isExperimental

Two peer registries

The runtime keeps two separate registries — don't confuse them:

RegistryPurpose
/v1/compute/peersCompute worker mesh (new in v0.1.65): service endpoints — gRPC and realtime — used to route inference and realtime sessions to reachable nodes. Managed on Workers.
/v1/peersD2L desktop peer identity registry: node identity and discovery for the P2P mesh and model weight distribution, shown above.

Choose local-only when data must stay on one machine, hybrid when a single device can't carry the heavy stages, and P2P when you have several nodes that should share models. All three are BYOC topologies.