PEFT Tuning (LoRA)
Classical parameter-efficient fine-tuning on HybrIE — the primary, stable training route. Train small LoRA adapters on top of frozen base models, version and register them with the runtime, and score every artifact against the zero-LoRA baseline.
What PEFT LoRA is
PEFT (parameter-efficient fine-tuning) trains a small set of LoRA weight matrices on top of a frozen base model. The base weights never change — an adapter only adds low-rank deltas with an explicit rank (r) and alpha scaling to the attention/MLP modules listed in target_modules. The result is a checkpoint that is orders of magnitude smaller than a full fine-tune, cheap to train, and cheap to swap at serving time.
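To make the size claim concrete, here is a minimal sketch of the parameter arithmetic for one adapted weight matrix. It assumes the conventional LoRA formulation, in which the delta is a product of a (d_out x r) matrix and an (r x d_in) matrix scaled by alpha / r; the exact scaling HybrIE applies is not stated here, and the 4096 x 4096 layer shape is purely illustrative.

```python
def full_param_count(d_in: int, d_out: int) -> int:
    """Parameters in the frozen base weight matrix (d_out x d_in)."""
    return d_in * d_out


def lora_param_count(d_in: int, d_out: int, r: int) -> int:
    """Trainable parameters in one LoRA pair: A is (r x d_in), B is (d_out x r)."""
    return r * (d_in + d_out)


def lora_scaling(alpha: float, r: int) -> float:
    """Conventional LoRA scale factor applied to the low-rank delta (assumed, not confirmed)."""
    return alpha / r


# Illustrative layer shape only: a square 4096 x 4096 projection, rank 16, alpha 32.
d = 4096
full = full_param_count(d, d)
lora = lora_param_count(d, d, r=16)
print(full, lora, full // lora, lora_scaling(32, 16))
```

For this single matrix the adapter holds 131,072 trainable values against 16,777,216 frozen ones, a 128x reduction. The whole-checkpoint ratio depends on how many modules target_modules covers.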
Adapters can be trained against these base model families:
| Family | base_model value |
|---|---|
| Qwen3 4B | qwen3-4b |
| Qwen3 0.6B | qwen3-0.6b |
| Mistral 7B | mistral-7b |
PEFT LoRA vs Doc-to-LoRA. These are different things — never conflate them. PEFT LoRA is classical parameter-efficient fine-tuning: you train one adapter with an explicit rank, alpha, and target modules, and you tune those knobs. Doc-to-LoRA is context internalization: a trained hypernetwork generates adapters from documents on the fly — there is no rank or alpha for you to tune, and its training endpoint trains the hypernetwork itself, not an adapter.
Adapter artifacts are versioned and immutable — retraining the same adapter produces a new version rather than mutating the existing one, so a version you serve today behaves identically tomorrow.
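The immutability rule can be pictured with a toy in-memory model. This is only a sketch of the semantics described above, not the runtime's storage layer: the AdapterVersions class, its method names, and the v4 path are hypothetical.

```python
class AdapterVersions:
    """Toy model of versioned, immutable adapter artifacts (hypothetical class)."""

    def __init__(self) -> None:
        self._artifacts = {}  # (adapter_id, version) -> artifact_ref

    def publish(self, adapter_id: str, version: str, artifact_ref: str) -> None:
        key = (adapter_id, version)
        if key in self._artifacts:
            # Immutable: an existing version is never overwritten.
            raise ValueError(f"{adapter_id}@{version} already exists; publish a new version")
        self._artifacts[key] = artifact_ref

    def resolve(self, adapter_id: str, version: str) -> str:
        return self._artifacts[(adapter_id, version)]


store = AdapterVersions()
store.publish("refund-policy", "v3", "/var/hybrie/adapters/refund-policy/v3")
# Retraining yields a new version (hypothetical path); v3 keeps resolving to the same artifact.
store.publish("refund-policy", "v4", "/var/hybrie/adapters/refund-policy/v4")
```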
Train an adapter (SFT)
POST /v1/train/sft
Supervised fine-tuning of one plain PEFT LoRA adapter: the standard supervised route, and the SFT counterpart of RL training's peft-lora policy. Context is inlined in the prompt, as in standard SFT; /v1/train/d2l remains the hypernetwork path. The run writes a real PEFT adapter directory (adapter_model.safetensors + adapter_config.json) that is directly consumable by POST /v1/eval/adapter, by POST /v1/eval/rl with policy=peft-lora, and as an init_checkpoint_dir warm-start for /v1/train/rl.
| Parameter | Type | Description |
|---|---|---|
| family (required) | string | Base model family: qwen3-4b, qwen3-0.6b, or mistral-7b. |
| lora_rank | integer | LoRA rank (r) for the trained adapter; overrides the family preset. |
| init_adapter_dir | string | Warm-start from an exported PEFT adapter directory instead of a fresh initialization. |
| num_examples | integer | Number of training examples. |
| epochs | integer | Training epochs over the example set. |
| lr | float | Learning rate. |
| l1_coef | float | L1 regulariser coefficient applied to the adapter weights. |
| eval_examples | integer | Held-out eval size. When > 0, a held-out eval runs automatically once training finishes. |
| seed | integer | Seed for reproducible example generation. |
| device | string | auto, cuda, metal, or cpu. |
| model_dir | string | Optional base-model directory override. |
| checkpoint_dir | string | Artifact destination for the trained adapter directory. Default ~/hybrie-mounts/d2l-artifacts/sft-<job_id>/. |
curl -X POST http://localhost:8080/v1/train/sft \
-H "Content-Type: application/json" \
-d '{
"family": "qwen3-4b",
"lora_rank": 16,
"epochs": 3,
"eval_examples": 100
}'
Response (202 Accepted):
{
"job_id": "sft-1718102400",
"checkpoint_dir": "~/hybrie-mounts/d2l-artifacts/sft-1718102400/"
}
SFT runs appear in Training Jobs as kind sft-peft-lora. Progress points stream step / loss samples; for SFT jobs the reward field carries the cross-entropy term. Runs are cancellable via DELETE /v1/train/jobs/:id, with partial checkpoints kept.
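The same SFT submission and the cancellation call can be assembled from a script. The sketch below uses only the Python standard library and builds the requests without sending them; the helper names are hypothetical, and the host and port are simply copied from the curl example.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8080"  # host and port taken from the curl example
FAMILIES = {"qwen3-4b", "qwen3-0.6b", "mistral-7b"}


def build_sft_request(family: str, **params) -> urllib.request.Request:
    """Build (but do not send) a POST /v1/train/sft request."""
    if family not in FAMILIES:
        raise ValueError(f"unknown family: {family!r}")
    body = json.dumps({"family": family, **params}).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/v1/train/sft",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def build_cancel_request(job_id: str) -> urllib.request.Request:
    """Build (but do not send) a DELETE /v1/train/jobs/:id request."""
    return urllib.request.Request(f"{BASE_URL}/v1/train/jobs/{job_id}", method="DELETE")


req = build_sft_request("qwen3-4b", lora_rank=16, epochs=3, eval_examples=100)
# To submit against a running server:
#   with urllib.request.urlopen(req) as resp:
#       job = json.load(resp)  # keys shown in the sample response: job_id, checkpoint_dir
```

Validating family on the client is a convenience choice in this sketch; the server remains the authority on which families it accepts.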
SFT → GRPO workflow. Train with SFT first, then optionally continue with reward-driven GRPO: pass the SFT checkpoint directory as init_checkpoint_dir to /v1/train/rl with policy=peft-lora.
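As a rough sketch of that hand-off, the helper below turns an SFT response into a request body for /v1/train/rl. The function name is hypothetical, and it assumes that the SFT job has already finished and that the GRPO run uses the same base model family as the SFT run.

```python
def grpo_payload_from_sft(sft_response: dict, family: str, **rl_params) -> dict:
    """Build a /v1/train/rl body that warm-starts from an SFT checkpoint."""
    return {
        **rl_params,
        "family": family,  # assumed to be the family the SFT run used
        "policy": "peft-lora",
        "init_checkpoint_dir": sft_response["checkpoint_dir"],
    }


# Sample SFT response, copied from the response shown for /v1/train/sft.
sft_response = {
    "job_id": "sft-1718102400",
    "checkpoint_dir": "~/hybrie-mounts/d2l-artifacts/sft-1718102400/",
}
payload = grpo_payload_from_sft(sft_response, "qwen3-4b", prompts=64)
```

The fixed keys are written after rl_params so that a stray policy or init_checkpoint_dir in the extra arguments cannot silently override the warm-start.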
Train an adapter (GRPO)
POST /v1/train/rl
For reward-driven training, set policy=peft-lora: the run trains one plain PEFT adapter directly against a verifiable environment with GRPO. It writes the same standard adapter directory (adapter_model.safetensors + adapter_config.json), which you can register below and score with POST /v1/eval/adapter.
| Parameter | Type | Description |
|---|---|---|
| family (required) | string | Base model family: qwen3-4b, qwen3-0.6b, or mistral-7b. |
| policy | string | Set to peft-lora to train a classical PEFT LoRA adapter (default is hypernet). |
| lora_rank | integer | LoRA rank (r) for the trained adapter; overrides the family preset. |
| init_checkpoint_dir | string | Warm-start from an existing PEFT adapter directory (e.g. an SFT checkpoint) instead of a fresh initialization. |
| environment | string | Verifiable environment providing the reward signal. Default niah. |
| prompts | integer | Number of environment prompts to train on. |
| group_size | integer | GRPO group size: completions sampled per prompt. |
| lr | float | Learning rate. |
| kl_beta | float | KL penalty coefficient against the reference policy. |
| seed | integer | Seed for reproducible task generation. |
| device | string | auto, metal, cuda, or cpu. |
| checkpoint_dir | string | Artifact destination for the trained adapter directory. |
curl -X POST http://localhost:8080/v1/train/rl \
-H "Content-Type: application/json" \
-d '{
"family": "qwen3-4b",
"policy": "peft-lora",
"lora_rank": 16,
"prompts": 64
}'
Response (202 Accepted):
{
"job_id": "train-1718102400",
"checkpoint_dir": "~/hybrie-mounts/d2l-artifacts/train-1718102400/"
}
See RL Training (GRPO) for the full training mechanics (group-relative advantages, environments, cancellation) and Training Jobs for tracking the run.
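For readers who have not met group-relative advantages, the sketch below shows the commonly published GRPO formulation: each completion's reward is normalized against the mean and standard deviation of its own prompt group, whose size corresponds to group_size. This is an illustration under that assumption, not HybrIE's implementation, and the eps value is arbitrary.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards: list, eps: float = 1e-8) -> list:
    """Normalize each completion's reward against its own prompt group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population standard deviation of the group
    # eps keeps the division finite when every completion earns the same reward.
    return [(r - mu) / (sigma + eps) for r in rewards]


# One prompt, group_size = 4, binary rewards from a verifiable check.
advantages = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
```

Completions that beat their group's average receive a positive advantage and the rest a negative one. A group in which every completion scores the same contributes no learning signal.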
Four ways to produce a PEFT-format adapter: SFT training via /v1/train/sft (the primary supervised route), GRPO RL training via /v1/train/rl with policy=peft-lora, exporting from the D2L pipeline (its adapter artifacts are PEFT-format), or registering an externally trained adapter below.
Register an external adapter
POST /v1/adapters
Adapters trained outside the platform can be registered directly with the runtime: point artifact_ref at the adapter weights and the runtime serves them like any other registered adapter.
| Parameter | Type | Description |
|---|---|---|
| id (required) | string | Stable identifier used to reference the adapter at load and inference time. |
| version | string | Version of the adapter artifact. Versions are immutable once registered. |
| base_model (required) | string | Base model family the adapter was trained on: qwen3-4b, qwen3-0.6b, or mistral-7b. |
| rank (required) | integer | LoRA rank (r): the dimensionality of the low-rank update matrices. |
| alpha | integer | LoRA alpha scaling factor applied to the update. |
| target_modules | string[] | Model modules the LoRA matrices attach to (e.g. attention projections). |
| artifact_ref | string | Path or URI to the adapter weights. |
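When the external adapter is a standard PEFT directory, most of these fields can be read from its adapter_config.json rather than typed by hand. The sketch below assumes the usual PEFT key names (r, lora_alpha, target_modules); the helper name is hypothetical, and base_model is passed explicitly because the registry expects a family value such as qwen3-4b rather than whatever model path the config recorded.

```python
import json
from pathlib import Path


def registration_payload(adapter_dir: str, adapter_id: str, version: str, base_model: str) -> dict:
    """Map a PEFT adapter_config.json onto the registration fields in the table above."""
    cfg = json.loads((Path(adapter_dir) / "adapter_config.json").read_text())
    modules = cfg["target_modules"]
    # PEFT allows a single string here; normalize to the string[] the table describes.
    modules = [modules] if isinstance(modules, str) else sorted(modules)
    return {
        "id": adapter_id,
        "version": version,
        "base_model": base_model,
        "rank": cfg["r"],
        "alpha": cfg["lora_alpha"],
        "target_modules": modules,
        "artifact_ref": str(adapter_dir),
    }
```

Calling registration_payload("/var/hybrie/adapters/refund-policy/v3", "refund-policy", "v3", "qwen3-4b") would yield a body of the same shape as the request that follows, ready to be serialized with json.dumps.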
curl -X POST http://localhost:8080/v1/adapters \
-H "Content-Type: application/json" \
-d '{
"id": "refund-policy",
"version": "v3",
"base_model": "qwen3-4b",
"rank": 16,
"alpha": 32,
"target_modules": ["q_proj", "k_proj", "v_proj", "o_proj"],
"artifact_ref": "/var/hybrie/adapters/refund-policy/v3"
}'
Score it
POST /v1/eval/adapter
Score any PEFT adapter directory on held-out NIAH against the zero-LoRA baseline. The report includes adapter_acc, base_acc, and lift, plus the adapter's own r / lora_alpha / target_modules so the score is self-describing. See Evaluation for the full parameter and report reference.
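One way to consume such a report in a script is sketched below. It assumes lift is the absolute difference adapter_acc - base_acc, which this page does not spell out (the Evaluation reference is authoritative), and the is_regression helper and the sample numbers are hypothetical.

```python
def lift(adapter_acc: float, base_acc: float) -> float:
    """Absolute accuracy gain over the zero-LoRA baseline (assumed definition)."""
    return adapter_acc - base_acc


def is_regression(report: dict, tolerance: float = 0.0) -> bool:
    """True when the adapter scores worse than the baseline by more than tolerance."""
    return lift(report["adapter_acc"], report["base_acc"]) < -tolerance


# Hypothetical report values, for illustration only.
report = {"adapter_acc": 0.75, "base_acc": 0.5}
print(lift(report["adapter_acc"], report["base_acc"]), is_regression(report))
```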
stimulir lab eval adapter --family qwen3-4b --adapter-dir <dir>
Adapter registry
The runtime keeps a registry of adapters it knows how to serve.
GET /v1/adapters
List all registered adapters. The response also reports the serving configuration: lora_mode, max_active_loras, and loaded_loras[], the adapters currently loaded into the runtime.
GET /v1/adapters/:id
Metadata for a single adapter: base model, rank/alpha, target modules, and versions.
GET /v1/adapters/status
Health of the adapter subsystem: what is registered, what is currently loaded, and serving readiness.
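A client might use the listing's serving fields to check whether another adapter can be loaded. The sketch below assumes the parsed response is a dictionary that carries max_active_loras as an integer and loaded_loras as a list with one entry per loaded adapter; the exact response shape is not documented on this page, and the helper name and sample values are hypothetical.

```python
def free_lora_slots(listing: dict) -> int:
    """Number of additional adapters that could be loaded, never below zero."""
    return max(0, listing["max_active_loras"] - len(listing["loaded_loras"]))


# Hypothetical, partial listing; lora_mode is omitted because its values are not documented here.
listing = {"max_active_loras": 4, "loaded_loras": ["refund-policy"]}
print(free_lora_slots(listing))
```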
From the CLI
# Train a classical PEFT LoRA adapter with SFT — the standard supervised route
stimulir lab train sft --family qwen3-4b --lora-rank 16 --epochs 3
# Or train with GRPO against a verifiable environment
stimulir lab train rl --family qwen3-4b --policy peft-lora --lora-rank 16 --prompts 64
# Score it on held-out NIAH
stimulir lab eval adapter --family qwen3-4b --adapter-dir <dir>
# List registered adapters
stimulir lab adapters list
Next
- Full GRPO mechanics — environments, KL penalty, cancellation — in RL Training (GRPO).
- Serve adapters at runtime — load, unload, and per-request routing — in Hot-swap Inference.
- Internalize documents as adapters with a hypernetwork in Doc-to-LoRA (Context Internalization).
- Track asynchronous training runs in Training Jobs.
