PEFT Tuning (LoRA)

Classical parameter-efficient fine-tuning on HybrIE — the primary, stable training route. Train small LoRA adapters on top of frozen base models, version and register them with the runtime, and score every artifact against the zero-LoRA baseline.

What PEFT LoRA is

PEFT (parameter-efficient fine-tuning) trains a small set of LoRA weight matrices on top of a frozen base model. The base weights never change — an adapter only adds low-rank deltas with an explicit rank (r) and alpha scaling to the attention/MLP modules listed in target_modules. The result is a checkpoint that is orders of magnitude smaller than a full fine-tune, cheap to train, and cheap to swap at serving time.
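
To make the size claim concrete, the sketch below counts trainable parameters for a single weight matrix. It assumes the standard LoRA formulation, in which a frozen weight of shape d_out x d_in receives the delta (alpha / r) * B * A, with B of shape d_out x r and A of shape r x d_in. The function names and the 4096 x 4096 layer size are illustrative assumptions, not HybrIE values.

```python
def lora_param_count(d_in: int, d_out: int, r: int) -> int:
    """Trainable parameters LoRA adds to one d_out x d_in weight matrix."""
    # A is r x d_in and B is d_out x r; the base matrix itself stays frozen.
    return r * d_in + d_out * r


def full_param_count(d_in: int, d_out: int) -> int:
    """Parameters updated when the whole matrix is fine-tuned instead."""
    return d_in * d_out


def lora_scaling(alpha: float, r: int) -> float:
    """Scale applied to the low-rank product B @ A in standard LoRA."""
    return alpha / r


if __name__ == "__main__":
    d_in = d_out = 4096  # illustrative layer size (assumption)
    r, alpha = 16, 32
    lora = lora_param_count(d_in, d_out, r)
    full = full_param_count(d_in, d_out)
    print(f"LoRA params: {lora}, full params: {full}, ratio: {full // lora}x")
    print(f"scaling alpha / r = {lora_scaling(alpha, r)}")
```

With these illustrative numbers the adapter trains 131,072 parameters for that matrix instead of 16,777,216, a 128x reduction per targeted module.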

Adapters can be trained against these base model families:

Family | base_model value
Qwen3 4B | qwen3-4b
Qwen3 0.6B | qwen3-0.6b
Mistral 7B | mistral-7b

PEFT LoRA vs Doc-to-LoRA. These are different things — never conflate them. PEFT LoRA is classical parameter-efficient fine-tuning: you train one adapter with an explicit rank, alpha, and target modules, and you tune those knobs. Doc-to-LoRA is context internalization: a trained hypernetwork generates adapters from documents on the fly — there is no rank or alpha for you to tune, and its training endpoint trains the hypernetwork itself, not an adapter.

Adapter artifacts are versioned and immutable — retraining the same adapter produces a new version rather than mutating the existing one, so a version you serve today behaves identically tomorrow.
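
The immutability rule can be pictured as an append-only store. The class below is a hypothetical in-memory model of that behaviour, not the runtime's actual registry code: a given (id, version) pair can be written once, and a retrain has to arrive under a new version.

```python
class AdapterVersionStore:
    """Append-only store: a registered (id, version) pair is never overwritten."""

    def __init__(self) -> None:
        # adapter id -> {version -> artifact reference}
        self._versions = {}

    def register(self, adapter_id: str, version: str, artifact_ref: str) -> None:
        versions = self._versions.setdefault(adapter_id, {})
        if version in versions:
            # Mutating an existing version would change what is already served.
            raise ValueError(f"{adapter_id}@{version} already exists and is immutable")
        versions[version] = artifact_ref

    def versions(self, adapter_id: str) -> list:
        """Versions in registration order (dicts preserve insertion order)."""
        return list(self._versions.get(adapter_id, {}))
```

Registering `refund-policy` as `v3` and later as `v4` succeeds, while a second attempt to register `v3` raises `ValueError`.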

Train an adapter (SFT)

POST /v1/train/sft

Supervised fine-tuning of a single plain PEFT LoRA adapter. This is the standard supervised route and the SFT counterpart of the peft-lora policy in RL training. Context is inlined in the prompt, as in standard SFT; /v1/train/d2l remains the hypernetwork path. The run writes a real PEFT adapter directory (adapter_model.safetensors + adapter_config.json) that can be consumed directly by POST /v1/eval/adapter, by POST /v1/eval/rl with policy=peft-lora, and as an init_checkpoint_dir warm start for /v1/train/rl.

Parameter | Type | Description
family (required) | string | Base model family: qwen3-4b, qwen3-0.6b, or mistral-7b.
lora_rank | integer | LoRA rank (r) for the trained adapter; overrides the family preset.
init_adapter_dir | string | Warm-start from an exported PEFT adapter directory instead of a fresh initialization.
num_examples | integer | Number of training examples.
epochs | integer | Training epochs over the example set.
lr | float | Learning rate.
l1_coef | float | L1 regulariser coefficient applied to the adapter weights.
eval_examples | integer | Held-out eval size. When > 0, a held-out eval runs automatically once training finishes.
seed | integer | Seed for reproducible example generation.
device | string | One of auto, cuda, metal, or cpu.
model_dir | string | Optional base-model directory override.
checkpoint_dir | string | Artifact destination for the trained adapter directory. Defaults to ~/hybrie-mounts/d2l-artifacts/sft-<job_id>/.
curl
curl -X POST http://localhost:8080/v1/train/sft \
  -H "Content-Type: application/json" \
  -d '{
    "family": "qwen3-4b",
    "lora_rank": 16,
    "epochs": 3,
    "eval_examples": 100
  }'

Response (202 Accepted):

json
{
  "job_id": "sft-1718102400",
  "checkpoint_dir": "~/hybrie-mounts/d2l-artifacts/sft-1718102400/"
}
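
The same request can be issued from Python using only the standard library. This is a minimal sketch: `build_sft_request` and `submit` are hypothetical helper names, the base URL is copied from the curl example, and the response is assumed to be the JSON object shown above.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8080"  # host and port taken from the curl example
FAMILIES = {"qwen3-4b", "qwen3-0.6b", "mistral-7b"}


def build_sft_request(family: str, **options) -> urllib.request.Request:
    """Build (but do not send) a POST /v1/train/sft request."""
    if family not in FAMILIES:
        raise ValueError(f"unsupported family: {family!r}")
    payload = {"family": family, **options}
    return urllib.request.Request(
        f"{BASE_URL}/v1/train/sft",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def submit(request: urllib.request.Request) -> dict:
    """Send the request and decode the JSON body of the 202 response."""
    with urllib.request.urlopen(request) as response:
        return json.load(response)


# Usage, with a runtime listening on BASE_URL:
#   job = submit(build_sft_request("qwen3-4b", lora_rank=16, epochs=3, eval_examples=100))
#   print(job["job_id"], job["checkpoint_dir"])
```

Keeping request construction separate from sending makes the payload easy to inspect or log before a job is launched.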

SFT runs appear in Training Jobs with kind sft-peft-lora. Progress points stream step and loss samples; for SFT jobs, the reward field carries the cross-entropy term. Runs can be cancelled via DELETE /v1/train/jobs/:id, and any partial checkpoints are kept.
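
Cancellation is a plain DELETE on the job resource. The helper below is a hypothetical sketch that only builds the request; the base URL is assumed to match the curl examples, and the shape of the cancellation response is not specified here, so it is not parsed.

```python
import urllib.parse
import urllib.request

BASE_URL = "http://localhost:8080"  # assumption: same host as the curl examples


def build_cancel_request(job_id: str) -> urllib.request.Request:
    """Build (but do not send) DELETE /v1/train/jobs/:id for a training job."""
    # Percent-encode the id so unexpected characters cannot alter the path.
    safe_id = urllib.parse.quote(job_id, safe="")
    return urllib.request.Request(
        f"{BASE_URL}/v1/train/jobs/{safe_id}", method="DELETE"
    )


# Usage, with a runtime listening on BASE_URL:
#   urllib.request.urlopen(build_cancel_request("sft-1718102400"))
```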

SFT → GRPO workflow. Train with SFT first, then optionally continue with reward-driven GRPO: pass the SFT checkpoint directory as init_checkpoint_dir to /v1/train/rl with policy=peft-lora.
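
As a sketch of that hand-off, the helper below turns an SFT response into a request body for /v1/train/rl. The function name is hypothetical, and it assumes the GRPO run uses the same base model family as the SFT run that produced the checkpoint.

```python
def rl_payload_from_sft(sft_response: dict, family: str, **options) -> dict:
    """Build a /v1/train/rl body that warm-starts from a finished SFT run."""
    return {
        "family": family,  # assumed to match the family used for SFT
        "policy": "peft-lora",  # train a plain PEFT adapter, not the hypernetwork
        "init_checkpoint_dir": sft_response["checkpoint_dir"],
        **options,
    }


if __name__ == "__main__":
    sft_response = {
        "job_id": "sft-1718102400",
        "checkpoint_dir": "~/hybrie-mounts/d2l-artifacts/sft-1718102400/",
    }
    print(rl_payload_from_sft(sft_response, "qwen3-4b", prompts=64))
```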

Train an adapter (GRPO)

POST /v1/train/rl

For reward-driven training, set policy=peft-lora: the RL run then trains one plain PEFT adapter directly against a verifiable environment using GRPO. The run writes the same standard adapter directory (adapter_model.safetensors + adapter_config.json), which you can register below and score with POST /v1/eval/adapter.

Parameter | Type | Description
family (required) | string | Base model family: qwen3-4b, qwen3-0.6b, or mistral-7b.
policy | string | Set to peft-lora to train a classical PEFT LoRA adapter (default is hypernet).
lora_rank | integer | LoRA rank (r) for the trained adapter; overrides the family preset.
init_checkpoint_dir | string | Warm-start from an existing PEFT adapter directory (for example, an SFT checkpoint) instead of a fresh initialization.
environment | string | Verifiable environment providing the reward signal. Default niah.
prompts | integer | Number of environment prompts to train on.
group_size | integer | GRPO group size: completions sampled per prompt.
lr | float | Learning rate.
kl_beta | float | KL penalty coefficient against the reference policy.
seed | integer | Seed for reproducible task generation.
device | string | One of auto, cuda, metal, or cpu.
checkpoint_dir | string | Artifact destination for the trained adapter directory.
curl
curl -X POST http://localhost:8080/v1/train/rl \
  -H "Content-Type: application/json" \
  -d '{
    "family": "qwen3-4b",
    "policy": "peft-lora",
    "lora_rank": 16,
    "prompts": 64
  }'

Response (202 Accepted):

json
{
  "job_id": "train-1718102400",
  "checkpoint_dir": "~/hybrie-mounts/d2l-artifacts/train-1718102400/"
}

See RL Training (GRPO) for the full training mechanics — group-relative advantages, environments, cancellation — and Training Jobs for tracking the run.
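
For orientation, group-relative advantages in GRPO as it is commonly described work like this: the group_size completions sampled for one prompt are scored, and each reward is normalised against the mean and standard deviation of its own group. The sketch below illustrates that general technique only; whether HybrIE uses population or sample standard deviation, or a different normalisation, is not stated here and is an assumption.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards: list, eps: float = 1e-6) -> list:
    """Normalise each completion's reward against its own prompt group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std over the group (assumption)
    # eps keeps the division finite when every completion scored the same.
    return [(r - mu) / (sigma + eps) for r in rewards]


if __name__ == "__main__":
    # One prompt, four sampled completions, binary rewards from the environment.
    print(group_relative_advantages([1.0, 0.0, 0.0, 1.0]))
```

Completions that beat their group average get a positive advantage and the rest get a negative one; a group in which every completion earns the same reward contributes no learning signal.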

Four ways to produce a PEFT-format adapter: SFT training via /v1/train/sft (the primary supervised route), GRPO RL training via /v1/train/rl with policy=peft-lora, exporting from the D2L pipeline (its adapter artifacts are PEFT-format), or registering an externally trained adapter below.

Register an external adapter

POST /v1/adapters

Adapters trained outside the platform can be registered directly with the runtime — point artifact_ref at the adapter weights and the runtime serves them like any other registered adapter.

Parameter | Type | Description
id (required) | string | Stable identifier used to reference the adapter at load and inference time.
version | string | Version of the adapter artifact. Versions are immutable once registered.
base_model (required) | string | Base model family the adapter was trained on: qwen3-4b, qwen3-0.6b, or mistral-7b.
rank (required) | integer | LoRA rank (r): the dimensionality of the low-rank update matrices.
alpha | integer | LoRA alpha scaling factor applied to the update.
target_modules | string[] | Model modules the LoRA matrices attach to (e.g. attention projections).
artifact_ref | string | Path or URI to the adapter weights.
curl
curl -X POST http://localhost:8080/v1/adapters \
  -H "Content-Type: application/json" \
  -d '{
    "id": "refund-policy",
    "version": "v3",
    "base_model": "qwen3-4b",
    "rank": 16,
    "alpha": 32,
    "target_modules": ["q_proj", "k_proj", "v_proj", "o_proj"],
    "artifact_ref": "/var/hybrie/adapters/refund-policy/v3"
  }'
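
Rather than typing rank, alpha, and target modules by hand, they can be read from the adapter's own adapter_config.json. The sketch below assumes the usual PEFT key names (r, lora_alpha, target_modules) and that target_modules is stored as a list; the helper names are hypothetical.

```python
import json
from pathlib import Path


def payload_from_config(config: dict, adapter_id: str, version: str,
                        base_model: str, artifact_ref: str) -> dict:
    """Map PEFT adapter_config.json fields onto the POST /v1/adapters body."""
    modules = config["target_modules"]
    if not isinstance(modules, list):
        # Other encodings (such as a single pattern string) are not handled here.
        raise TypeError("expected target_modules to be a list of module names")
    return {
        "id": adapter_id,
        "version": version,
        "base_model": base_model,
        "rank": config["r"],
        "alpha": config["lora_alpha"],
        "target_modules": modules,
        "artifact_ref": artifact_ref,
    }


def payload_from_dir(adapter_dir: str, adapter_id: str, version: str,
                     base_model: str) -> dict:
    """Load adapter_config.json from a directory and build the request body."""
    config = json.loads((Path(adapter_dir) / "adapter_config.json").read_text())
    return payload_from_config(config, adapter_id, version, base_model, adapter_dir)
```

Deriving the fields from the artifact keeps the registered metadata consistent with the weights that are actually served.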

Score it

POST /v1/eval/adapter

Score any PEFT adapter directory on held-out NIAH (needle-in-a-haystack) tasks against the zero-LoRA baseline. The report includes adapter_acc, base_acc, and lift, plus the adapter's own r / lora_alpha / target_modules, so the score is self-describing. See Evaluation for the full parameter and report reference.

bash
stimulir lab eval adapter --family qwen3-4b --adapter-dir <dir>
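
A report like this lends itself to a simple promotion gate. The sketch below assumes the report is a JSON object with numeric top-level adapter_acc, base_acc, and lift fields; the exact report layout is documented under Evaluation, and the function name, the threshold, and the sample numbers are made up for illustration.

```python
def summarize_eval(report: dict, min_lift: float = 0.0) -> str:
    """Return a one-line verdict for an adapter evaluation report."""
    adapter_acc = report["adapter_acc"]
    base_acc = report["base_acc"]
    lift = report["lift"]  # read as reported, not recomputed
    verdict = "keep" if lift > min_lift else "reject"
    return (
        f"{verdict}: adapter_acc={adapter_acc:.3f} "
        f"base_acc={base_acc:.3f} lift={lift:+.3f}"
    )


if __name__ == "__main__":
    # Invented numbers, purely to show the output format.
    print(summarize_eval({"adapter_acc": 0.9, "base_acc": 0.6, "lift": 0.3}))
```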

Adapter registry

The runtime keeps a registry of adapters it knows how to serve.

GET /v1/adapters

List all registered adapters. The response also reports the serving configuration: lora_mode, max_active_loras, and loaded_loras[] — the adapters currently loaded into the runtime.
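
One use of those serving fields is checking whether the runtime has room for another adapter. The sketch below assumes the response is a JSON object in which loaded_loras is a list and max_active_loras is an integer; the element type of loaded_loras is not specified here, so only its length is used.

```python
def can_load_another(listing: dict) -> bool:
    """True when fewer adapters are loaded than max_active_loras allows."""
    return len(listing["loaded_loras"]) < listing["max_active_loras"]


if __name__ == "__main__":
    # Hypothetical response fragment, for illustration only.
    sample = {"max_active_loras": 2, "loaded_loras": ["refund-policy"]}
    print(can_load_another(sample))
```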

GET /v1/adapters/:id

Metadata for a single adapter — base model, rank/alpha, target modules, and versions.

GET /v1/adapters/status

Health of the adapter subsystem: what is registered, what is currently loaded, and serving readiness.

From the CLI

bash
# Train a classical PEFT LoRA adapter with SFT — the standard supervised route
stimulir lab train sft --family qwen3-4b --lora-rank 16 --epochs 3

# Or train with GRPO against a verifiable environment
stimulir lab train rl --family qwen3-4b --policy peft-lora --lora-rank 16 --prompts 64

# Score it on held-out NIAH
stimulir lab eval adapter --family qwen3-4b --adapter-dir <dir>

# List registered adapters
stimulir lab adapters list
